5. Quality characteristics of the Coronavirus (COVID-19) Infection Survey
Relevance
(The degree to which statistical outputs meet current and potential user needs.)
The Coronavirus (COVID-19) Infection Survey looks to estimate the percentage of the population testing positive for COVID-19 and helps track the current extent of infection and transmission of COVID-19 among the community population as a whole.
We use the number of people testing positive for COVID-19 with PCR tests via nose and throat swabs to calculate the proportion of the community population who test positive for the infection at a given point in time (positivity rate), and the number of new infections over a given time period (incidence rate).
We calculate the positivity rate for COVID-19 in England, Wales, Northern Ireland, and Scotland as well as regions of England and, when the positivity rate allows, also for sub-regional geographies for the UK. Note that we report the positivity rate rather than the prevalence rate, because calculating the latter would require an accurate understanding of the swab test's sensitivity (true-positive rate) and specificity (true-negative rate).
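To illustrate why sensitivity and specificity matter here, the following minimal sketch applies the standard Rogan-Gladen correction, which converts an observed positivity rate into an estimate of true prevalence. It is not the method used in the survey; the function name and all input values are hypothetical.

```python
def adjusted_prevalence(positivity: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimate of true prevalence from an observed positivity rate.

    All arguments are proportions between 0 and 1.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1 (test better than chance)")
    estimate = (positivity + specificity - 1.0) / denominator
    # Clamp to [0, 1]: sampling noise can push the raw estimate outside the valid range.
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs: 1% observed positivity, 90% sensitivity, 99.9% specificity.
print(f"{adjusted_prevalence(0.01, 0.90, 0.999):.4%}")
```

When positivity is low, small changes in the assumed specificity move the result noticeably, which illustrates why an accurate understanding of both quantities would be needed before reporting prevalence.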
The incidence rate is a measure of only new infections in a given time period. A statistical model calculates the clearance time (the time between the first positive test and last time a participant would have tested positive). New infections are then deduced from these clearance times in England, Wales, Northern Ireland, and Scotland.
We use blood test results to identify individuals who have antibodies against SARS-CoV-2, which helps us to understand who has had COVID-19 in the past and the impact of vaccinations. These blood tests are taken from individuals aged 16 years and over from a randomly selected subsample of households. We also present data on the characteristics of people testing positive for COVID-19 in our fortnightly bulletin. Topics to be included vary depending on user need.
Statistics from the COVID-19 Infection Survey are used to aid government decision-making, providing insights on how the infection is spreading in the community. This helps the government to make informed decisions on important policies such as changes to restrictions and planning for services and vaccination rollout.
Analysis feeding into calculation of the reproduction number, R
The statistics produced from this survey contribute to modelling, which is used to calculate the reproduction number (R) of the virus.
R is the average number of secondary infections produced by one infected person. The Scientific Pandemic Influenza Group on Modelling (SPI-M), a sub-group of the Scientific Advisory Group for Emergencies (SAGE), has built a consensus on the value of R based on expert scientific advice from multiple academic groups.
Accuracy and reliability
(The accuracy of statistical outputs is the degree of closeness between an estimate and the true value that the statistics were intended to measure. Reliability refers to the closeness of the initial estimated value to the subsequent estimated value.)
Uncertainty
The estimates presented in our weekly COVID-19 Infection Survey bulletin and fortnightly characteristics and antibody and vaccination bulletins contain uncertainty. There are many sources of uncertainty, but the main sources in the information presented include each of the following:
uncertainty in the test (false-positives, false-negatives and timing of the infection)
data are based on a sample of people rather than the whole population, so there is some statistical uncertainty in the estimates
uncertainty in the model
uncertainty in the quality of data collected in the questionnaire and in the swabbing procedure
Results come directly from the laboratory that performs the PCR test, and no test is perfect. There will be false-positive and false-negative results from the tests, and false-negatives could also come from the fact that participants in this study are self-swabbing. More information about the potential impact of false-positives and false-negatives is provided in the Test sensitivity and specificity section.
Any estimate based on a random sample contains some uncertainty. If we were to repeat the whole process many times, we would expect the true value to lie in the 95% confidence interval on 95% of occasions. A wider interval indicates more uncertainty in the estimate.
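As a rough illustration of how sample size drives the width of a 95% confidence interval, the sketch below computes a Wilson score interval for a simple unweighted proportion. This assumes a simple random sample and is not the survey's own calculation; the function name and the counts are hypothetical.

```python
import math


def wilson_interval(positives: int, sample_size: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if sample_size <= 0 or not 0 <= positives <= sample_size:
        raise ValueError("need 0 <= positives <= sample_size and sample_size > 0")
    p = positives / sample_size
    z2 = z * z
    denominator = 1.0 + z2 / sample_size
    centre = (p + z2 / (2 * sample_size)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / sample_size + z2 / (4 * sample_size**2)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical counts: the same 0.5% positivity observed in two samples of different size.
large = wilson_interval(50, 10_000)
small = wilson_interval(5, 1_000)
print("larger sample: ", large)
print("smaller sample:", small)  # wider interval, so more uncertainty
```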
As in any survey, some data can be incorrect or missing. For example, participants and interviewers sometimes misinterpret questions, record information that is not entirely accurate, or skip questions by accident. To minimise the impact of this, we clean the data, editing or removing data that are clearly incorrect. For more information, see our methodology page on statistical uncertainty.
Response rates
Participants selected and invited to take part in the Coronavirus (COVID-19) Infection Survey are not given a specified date by which to respond, and as a result reported response rates will increase as time progresses. Although most responses occur within the first few weeks after invitation letters are sent, they can continue to increase for some time after that.
We have used two approaches to selecting households for the survey: the first was to re-contact named previous respondents from other ONS surveys who agreed to further contact about other research (this now makes up a small subset of the overall sample), and the second is by writing to "the householder" at addresses selected from the AddressBase (a sampling frame).
For more information on the sampling process, see Section 2: Study design: sampling in our methods article. For up-to-date information on our response rates, please see our most recent bulletin.
Response rates for each nation are found in the dataset that accompanies this bulletin. We provide response rates separately for the different sampling phases of the study.
Communicating uncertainty
The data that are modelled are drawn from a sample and so there is uncertainty around the estimates that the model produces, which is based on a number of assumptions. Because a Bayesian regression model was used, we present estimates along with credible intervals. These 95% credible intervals can be interpreted as there being a 95% probability that such intervals will contain the true value being estimated. Again, a wider interval indicates more uncertainty in the estimate.
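To make the idea of a credible interval concrete, the sketch below uses a much simpler Bayesian model than the regression model described above: a Beta-Binomial model with a uniform prior, with the interval read off from simulated posterior draws. The model choice, function name, counts and seed are all assumptions made for illustration.

```python
import random


def beta_credible_interval(positives: int, sample_size: int, level: float = 0.95,
                           draws: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Equal-tailed credible interval for a proportion under a uniform Beta(1, 1) prior."""
    if sample_size <= 0 or not 0 <= positives <= sample_size:
        raise ValueError("need 0 <= positives <= sample_size and sample_size > 0")
    rng = random.Random(seed)
    # Conjugate update: the posterior is Beta(1 + positives, 1 + negatives).
    alpha = 1 + positives
    beta = 1 + sample_size - positives
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical counts: 50 positive swabs out of 10,000.
print(beta_credible_interval(50, 10_000))
```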
For our weighted estimates, confidence intervals are provided. These again are calculated so that if we were to repeat the survey many times on the same occasion and in the same conditions, in 95% of these surveys the true population value would be contained within the 95% confidence intervals. Smaller intervals suggest greater certainty in the estimate, whereas wider intervals suggest greater uncertainty in the estimate.
Further information on confidence and credible intervals can be found in Section 13: Confidence intervals and credible intervals in our methods article.
Representativeness
Ensuring a representative sample of the general population is important for producing survey-based estimates broken down by characteristics such as age, sex, region and ethnicity. In the Coronavirus (COVID-19) Infection Survey, this is important because estimates of COVID-19 positivity rates and antibody rates are required to help us understand trends in different population sub-groups and different parts of the country.
The ONS regularly produces information on the representativeness of the survey. Findings show that the swabs sample is representative of both males and females at a UK level and for all the nations of the UK. All age groups are well represented, and the swab sample is representative of all regions and representative of Wales, Scotland, and Northern Ireland in terms of population share. The white group is overrepresented at the UK level. Also at the UK level, households of three or more people are overrepresented, while households of one person or two people are underrepresented.
The following tables provide an example of some of the representativeness analysis for the swabs sample for the UK for the week of 15 May 2021. The unweighted response population is the actual number of people taking part in the survey, while the weighted population has been adjusted to be representative of the target population. The calibration step of the weighting ensures coherence for those variables and categories used in the weighting. Hence the "actual proportion" and "weighted proportion" agree for every category of age and sex because they are used in the weighting.
Table 1a: Actual UK population by sex
Table 1b: Response population for the COVID-19 Infection Survey by sex
Table 1c: Representativeness of the response population compared with the actual population by sex
Table 1d: Actual UK population by age
Table 1e: Response population for the COVID-19 Infection Survey by age
Table 1f: Representativeness of the response population compared with the actual population by age
Table 1g: Actual UK population by ethnicity
Table 1h: Response population for the COVID-19 Infection Survey by ethnicity
Table 1i: Representativeness of the response population compared with the actual population by ethnicity
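The reason the "actual proportion" and "weighted proportion" agree for variables used in the weighting can be seen in a simplified, single-variable version of calibration (post-stratification). The sketch below is illustrative only: the survey's weighting uses several variables, and the function names and all figures here are hypothetical.

```python
def post_stratification_weights(sample_counts: dict[str, int],
                                population_shares: dict[str, float]) -> dict[str, float]:
    """Weight per category = population share divided by unweighted sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for category, count in sample_counts.items():
        if count == 0:
            raise ValueError(f"no respondents in category {category!r}")
        weights[category] = population_shares[category] / (count / total)
    return weights


def weighted_shares(sample_counts: dict[str, int], weights: dict[str, float]) -> dict[str, float]:
    """Share of each category after applying the weights."""
    weighted = {c: sample_counts[c] * weights[c] for c in sample_counts}
    total = sum(weighted.values())
    return {c: value / total for c, value in weighted.items()}


# Hypothetical figures, not taken from the survey tables.
sample_counts = {"female": 5400, "male": 4600}
population_shares = {"female": 0.51, "male": 0.49}

weights = post_stratification_weights(sample_counts, population_shares)
print(weights)                                  # the underrepresented group gets a weight above 1
print(weighted_shares(sample_counts, weights))  # matches population_shares by construction
```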
To address the fact that some individuals in the survey will have dropped out and others will not respond to the initial invite and to reduce potential bias, the regression models used to produce our estimates adjust the survey results to be more representative of the overall population in terms of age, sex, region (for England) and ethnicity (for England, Scotland and Wales). For more information see our methods article.
We are also looking at further ways we can improve the representativeness of individuals taking part in the survey. For example, we have a programme of work to look at increasing the representativeness of ethnicity in the sample. This includes strategies such as sending out reminders and using community engagement officers to go into communities with underrepresented ethnic groups to explain why taking part in the survey is important.
Coherence and comparability
(Coherence is the degree to which data that are derived from different sources or methods, but refer to the same topic, are similar. Comparability is the degree to which data can be compared over time and between geographic areas.)
The ONS and its academic partners carry out extensive quality assurance in producing these statistics, from checking that data received are in the expected format and that the statistics produced look plausible, to triangulation with other COVID-19 data sources. These checks are detailed further in this section, along with information on why estimates from the different data sources may differ.
NHS Test and Trace
Each nation of the UK (England, Wales, Northern Ireland and Scotland) has a Test and Trace system. These ensure that anyone who develops symptoms of COVID-19 can quickly be tested to find out if they have the virus. Some nations also include targeted asymptomatic testing of NHS and social care staff and care home residents. Additionally, these systems help trace close recent contacts of anyone who tests positive for COVID-19 and, if necessary, notify them that they must self-isolate. We have published an article that compares the methods used in the COVID-19 Infection Survey and NHS Test and Trace in England.
In comparison with Public Health data and Test and Trace data, the statistics presented in our weekly bulletin take a representative sample of the community population (those in private residential households), including people who are not otherwise prioritised for testing. This means that we can estimate the number of people in the community population with COVID-19 who do not report any evidence of symptoms, which is one of the unique features of the Coronavirus (COVID-19) Infection Survey.
Laboratory confirmed cases in the UK
Public Health England (PHE) presents data on the total number of laboratory-confirmed cases in the UK, which capture the cumulative number of people in the UK who have tested positive for COVID-19. These statistics present all known cases of COVID-19, both current and historical, for the UK, and by nation, by regions of England, and because of the large sample size, by local authority. Further information can be found on the Coronavirus Dashboard. A summary for England, Wales, Scotland and Northern Ireland is also available.
Other studies
This study is one of a number of studies that look to provide information around the coronavirus pandemic within the UK.
COVID Symptom Study (ZOE app and King's College London), UK
The COVID Symptom Study app allows users to log their health each day, including whether or not they have symptoms of COVID-19. The study aims to predict which combination of symptoms indicates that someone is likely to test positive for COVID-19. The app was developed by the health science company ZOE with data analysis conducted by King's College London. Anyone over the age of 18 years can download the app and take part in the study. Respondents can report symptoms of children.
The study estimates the total number of people with symptomatic COVID-19 and the daily number of new cases of COVID-19 based on app data and swab tests taken in conjunction with the Department of Health and Social Care (DHSC). The study investigates the "predictive power of symptoms", and so the data do not capture people who are infected with COVID-19 but who do not display symptoms.
Unlike the data presented in the COVID-19 Infection Survey bulletins, the COVID Symptom Study may not be a representative sample of the population. It is reliant on app users and so captures only some cases in hospitals, care homes and other communities where fewer people use the app. To account for this, the model adjusts for age and deprivation when producing UK estimates. The larger sample size allows for detailed geographic breakdown.
Real-time Assessment of Community Transmission-1 and -2 (REACT-1 and -2), England
Like our study, the Real-time Assessment of Community Transmission-1 (REACT-1) survey, led by Imperial College London, involves taking swab samples to test for COVID-19 antigens to estimate the prevalence and transmission of the virus that causes COVID-19 in the community. Each round of the study currently involves around 160,000 participants aged five years and over, selected from a random cross-section sample of the general public from GP registration data. It is also possible to look at trends in infection rates by different characteristics, such as age, sex, ethnicity, symptoms and key worker status through the study.
One of the main differences from our COVID-19 Infection Survey is that the REACT surveys do not require follow-up visits, as the study is interested primarily in prevalence at a given time point.
Public Health England surveillance
Public Health England (PHE) also publishes an estimate of the prevalence of antibodies in the blood in England using blood samples from healthy adult blood donors. PHE provides estimates by region and currently does not scale these up to England.
Estimates in our bulletins and those published by PHE are based on different tests; PHE estimates are based on testing using the Euroimmun assay method, while blood samples in our survey are tested for antibodies by research staff at the University of Oxford using a novel ELISA.
For more information about our antibody tests, see the COVID-19 Infection Survey protocol.
Insights
The ONS's latest insights tool provides an overview of the coronavirus (COVID-19) pandemic in the UK bringing together data from across the ONS and other data sources to explore the latest data and trends.
Accessibility and clarity
(Accessibility is the ease with which users are able to access the data, also reflecting the format in which the data are available and the availability of supporting information. Clarity refers to the quality and sufficiency of the release details, illustrations and accompanying advice.)
Our recommended format for accessible content is a combination of HTML webpages for narrative, charts and graphs, with data being provided in usable formats such as Excel spreadsheets. Our website also offers users the option to download the narrative in PDF format. Our outputs conform to the ONS Web accessibility policy in terms of formats and font sizes and the presentation of tables and charts.
More details on related releases can be found on the release calendar on GOV.UK. If there are any changes to the pre-announced release schedule, public attention will be drawn to the change and the reasons for the change will be explained fully.
Early management information from the Coronavirus (COVID-19) Infection Survey is made available to government decision-makers to inform their response to COVID-19. Occasionally, we may publish figures early if it is considered in the public interest. We will ensure that we pre-announce any ad hoc or early publications as soon as possible. These will include supporting information where possible to aid user understanding. This is consistent with guidance from the Office for Statistics Regulation.
In addition to this Quality and Methodology Information Report, quality and methods information is included in our weekly bulletin.
COVID-19 Infection Survey data are available in our Secure Research Service (SRS); this provides access to microdata and disclosive data, which have the potential to identify individuals. Access to such data requires Approved Researcher accreditation.
Timeliness and punctuality
(Timeliness describes the length of time between data availability and the event they describe. Punctuality is the time lag between the actual delivery of data and the target date on which they were scheduled for release as announced in an official release calendar.)
Survey fieldwork for the Coronavirus (COVID-19) Infection Survey began in England on 26 April 2020 and was expanded to cover Wales, Northern Ireland and Scotland across summer and autumn 2020. Headline figures were provided for each country as soon as sample sizes were sufficiently large to allow for good quality estimates to be produced.
The main aim of the COVID-19 Infection Survey is to provide data on the spread of infection to inform the public and organisations involved in decision-making. The survey also provides valuable information on characteristics of people testing positive (such as symptoms or amount of contact with others) and estimates of the population who would test positive for antibodies. Therefore, these data need to be collected, processed and published within a short time frame. Our typical publications are as follows:
Other products such as blogs and technical articles are also published on an ad hoc basis. For more details on related releases, the GOV.UK release calendar is available online and provides advance notice of release dates.
Reference dates
We aim to provide the estimates of positivity rates and incidence that are most timely and most representative of each week. We decide the most recent week we can report on based on the availability of test results for visits that have already happened, accounting for the fact that swabs have to be couriered to the labs, tested and results returned. Typically, the cut-off date for data that are published on the Friday will be the previous Saturday. For example, our bulletin published on Friday 16 July 2021 included data related to 4 to 10 July 2021.
Within the most recent week, we provide an official estimate for positivity rate and incidence based on a reference point from the modelled trends. For positivity rates, we can include all swab test results, even from the most recent visits. Therefore, although we are still expecting further swab test results from the labs, there are sufficient data for the official estimate for infection to be based on a reference point after the start of the reference week. To improve stability in our modelling while maintaining relative timeliness of our estimates, we report our official estimates based on the midpoint of the reference week.
The calculation of incidence uses time between two tests; so, for example, a participant who was last seen two weeks ago and is not due their next visit for another two weeks only contributes to the model up to two weeks ago. Our official estimates of incidence are therefore based on the first day of the reference week.
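The date rules in this section can be sketched as follows. This is a minimal illustration of the typical case only (a Friday bulletin with a cut-off on the previous Saturday); the function names are hypothetical, and taking the fourth day of the seven-day week as its "midpoint" is an assumption.

```python
from datetime import date, timedelta


def reference_week(publication_date: date) -> tuple[date, date]:
    """Return (first day, last day) of the reference week for a Friday bulletin."""
    if publication_date.weekday() != 4:  # Monday is 0, so Friday is 4
        raise ValueError("this sketch only covers the typical Friday publication")
    week_end = publication_date - timedelta(days=6)  # the previous Saturday (cut-off date)
    week_start = week_end - timedelta(days=6)        # the Sunday that opens the week
    return week_start, week_end


def reference_points(publication_date: date) -> dict[str, date]:
    """Reference dates for the official estimates: midpoint for positivity, first day for incidence."""
    week_start, _ = reference_week(publication_date)
    return {
        "positivity": week_start + timedelta(days=3),  # assumed midpoint of a seven-day week
        "incidence": week_start,
    }


# The example from the text: the bulletin published on Friday 16 July 2021.
print(reference_week(date(2021, 7, 16)))
print(reference_points(date(2021, 7, 16)))
```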
Why you can trust our data
The Office for National Statistics (ONS) is the UK's largest independent producer of statistics and its National Statistical Institute. The Data Policies and Information Charter details how data are collected, secured and used in the publication of statistics. We treat the data that we hold with respect, keeping it secure and confidential, and we use statistical methods that are professional, ethical and transparent. View more information about our data policies.
The COVID-19 Infection Survey has been carefully designed and tested and is being delivered in partnership with the University of Oxford, the University of Manchester, Public Health England and the Wellcome Trust.
Output quality trade-offs
(Trade-offs are the extent to which different dimensions of quality are balanced against each other.)
Provisional estimates and revisions
The general principle applied to the Coronavirus (COVID-19) Infection Survey will be that when data are found to be in error, both the data and any associated analysis that has been published by the Office for National Statistics (ONS) will be revised in line with our revisions and corrections policy.
There are a number of reasons why we may wish to revise Coronavirus (COVID-19) Infection Survey estimates once they have been published and/or the datasets disseminated, including:
errors are discovered in raw, or derived variables
initial estimates are released with the expectation that these may be revised and updated as further data become available; for example, the use of models where later data points will affect the modelled estimate at earlier time periods
a significant methods change is made
Revisions made because of errors discovered in raw, or derived variables
While every effort is made to thoroughly check the data before they are either published or released for dissemination, errors do on occasion occur. This can include errors with the analysis produced, such as categories not including the correct people, or errors made when data are inputted into spreadsheets. When errors occur, corrections are made in a timely manner, announced and clearly explained to users in line with the ONS guide to statistical revisions. Work is also undertaken to mitigate the same error happening again, for example, by reviewing and improving code.
Revisions made when more recent estimates become available
Modelling is used to produce time series estimates of positivity. Without modelling, changes in the point estimates of positivity over time could be quite erratic, caused by statistical uncertainty in the data (small sample sizes and low prevalence rates). This could provide time series that would not be considered a credible description of real-world changes, which would be much smoother. However, the use of modelling means that the estimate for any specified time point will be subject to revision as more time points are added to the model.
Therefore, estimates presented in our weekly bulletin are provisional results and subject to revision. Modelled estimates include all swab results that are available at the time the official estimates are produced. This is done to provide timely estimates to government decision-makers.
Official estimates should be used to understand the positivity rate for a single point in time. This is based on the modelled estimate for the latest week and is our best and most stable estimate. Additional swab tests that become available after this are included in subsequent models, meaning that modelled estimates can change slightly as additional data are included.
A new model for the most recent six-week period available is produced for each weekly bulletin, meaning that estimates for days within that six-week period that were covered in previous bulletins are revised. The modelled estimate is more suited to understand the recent trend, given it is regularly updated to include new test results and smooths the trend over time. In line with the ONS guide to statistical revisions, it is made clear in the bulletin that figures are initial estimates and subject to revision later.
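Why smoothed estimates for recent days are revised when new data arrive can be shown with a toy example. In the sketch below a simple centred moving average stands in for the survey's model, which is an assumption made purely for illustration; the function name and the values are hypothetical.

```python
def centred_average(values: list[float], half_window: int = 1) -> list[float]:
    """Centred moving average; the window is truncated at the ends of the series."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half_window)
        stop = min(len(values), i + half_window + 1)
        window = values[start:stop]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical daily point estimates of positivity (%), before and after one more day arrives.
first_release = centred_average([1.0, 2.0, 3.0])
second_release = centred_average([1.0, 2.0, 3.0, 7.0])

print(first_release[2])   # estimate for the third day using data up to that day
print(second_release[2])  # the same day, revised once the following day's data are included
```

Earlier days, whose windows were already complete, are unchanged between the two releases; only the most recent estimate moves.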
Revisions made due to significant method changes
The COVID-19 Infection Survey was rapidly set up in response to the COVID-19 pandemic and launched on 26 April 2020. Because the survey is relatively new and there is an ongoing need for analysis to be responsive to the changing nature of the pandemic, methodological changes are inevitable.
In line with the ONS guide to statistical revisions and the Code of Practice for Statistics (Quality 2.5), when possible users are consulted and provided with advance notice about changes to methods, explaining why the changes are being made. When a methods change is made, a consistent time series is produced, with back series provided where possible. Users are made aware of the nature and extent of the change within the publications.
Concepts and definitions
(Concepts and definitions describe the legislation governing the output as well as harmonisation principles and classifications used in the output.)
Community
The Coronavirus (COVID-19) Infection Survey presents estimates for the number of current COVID-19 infections within the community population; community in this instance refers to private residential households and it excludes those in hospitals, care homes and/or other institutional and communal establishment settings.
Positivity rate
The positivity rate is the percentage of people who test positive for COVID-19 at a given point in time. We use current COVID-19 infections to mean testing positive for SARS-CoV-2, with or without having symptoms, on a swab taken from the nose and throat. This is different to the incidence rate, which is a measure of only the new infections in a given time period.
Incidence rate
The estimates of incidence of polymerase chain reaction (PCR)-positive cases use a new method based on our positivity estimate. This gives the rate at which new positives occur, and subsequently become detectable, within the population. The new incidence method uses an estimate of the length of time for which an individual will test positive, based on modelling the time from first positive to first subsequent negative test in the survey. This estimate is used alongside the positivity model to produce an incidence estimate. For more information on this method of incidence please see our methods article.
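The link between positivity, the length of time a person tests positive and incidence can be illustrated with the textbook steady-state relation, in which the odds of testing positive equal the incidence rate multiplied by the mean duration of positivity. The sketch below applies that relation; it is a simplification that assumes a stable epidemic and is not the survey's incidence model. The function name and the inputs are hypothetical.

```python
def approximate_incidence(positivity: float, mean_days_positive: float) -> float:
    """Approximate new infections per person per day under a steady-state assumption.

    Uses: positivity / (1 - positivity) = incidence * mean duration of positivity.
    """
    if not 0 <= positivity < 1:
        raise ValueError("positivity must be a proportion in [0, 1)")
    if mean_days_positive <= 0:
        raise ValueError("mean_days_positive must be positive")
    return positivity / (1.0 - positivity) / mean_days_positive


# Hypothetical inputs: 1% positivity and a mean of 10 days testing positive.
rate = approximate_incidence(0.01, 10.0)
print(f"{rate * 10_000:.2f} new infections per 10,000 people per day")
```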
Characteristics
Participants are asked to provide their ethnicity and occupation (among other things) in the participant questionnaire to allow analysis of the characteristics of those testing positive for COVID-19.
The options provided on the questionnaire for ethnicity are harmonised to allow for consistency and comparability of statistical outputs from different sources across the UK. The participant's occupation is provided in a free text box and responses are coded using the Standard Occupation Classification, again to allow for consistency and comparability of statistical outputs from different sources across the UK.
Geographic coverage
Survey fieldwork for the pilot study began in England on 26 April 2020. Survey fieldwork in Wales began on 29 June 2020, and since 7 August 2020 we have reported headline figures for Wales. Survey fieldwork began in Northern Ireland on 26 July 2020 and since 25 September 2020 we have reported headline figures for Northern Ireland. Survey fieldwork in Scotland began on 21 September 2020, and we have reported headline figures for Scotland since 23 October 2020.
Sub-regional analysis
Where possible, we present modelled estimates for the most recent week of data at the sub-regional level. This analysis was first presented in our weekly COVID-19 Infection Survey bulletin on 20 November 2020. To balance the granularity with the statistical power, we have grouped together local authorities into COVID-19 Infection Survey sub-regions. The geographies are a rules-based composition of local authorities, and local authorities with a population over 200,000 have been retained separately where possible.
The boundaries for these COVID-19 Infection Survey sub-regions can be found on the Open Geography Portal.