1. Background

"The UK's decennial population census is central to decisions in all areas of society - whether by businesses, councils, the health service or charities. It is the basis of population estimates; it underpins funding formulae; it provides insight into the wellbeing and needs of communities throughout UK society. It is also the most expensive single statistical undertaking - getting it right matters, both in terms of providing value for this spending of public money, as well as in ensuring that users base these important decisions on trusted and high-quality data."

Special Assessment of the 2011 Censuses in the UK, UK Statistics Authority, 2015.

2. Introduction

The Census 2021 results will only be of value when they are fit for purpose and when they are trusted by users as a basis for decisions. This paper builds on the How the Office for National Statistics is ensuring the 2021 Census will serve the public report. It describes our initial proposal for how we will check the Census 2021 processes and results to make sure users can rely on them.

We expect our plans to evolve in the light of:

  • engagement with our stakeholders
  • our experience from the census rehearsal in 2019 and 2020
  • greater availability of alternative data sources that can be used for quality assurance purposes

We welcome comments on our proposed approach. In particular, we welcome proposals for other analyses or data sources we could adopt in the validation of the estimates and suggestions of evidence that would provide reassurance on the reliability of the census results.

Please send any comments to census.quality.assurance@ons.gov.uk.

3. Quality assurance strategy

The quality assurance of Census 2021 data needs to:

  • ensure the census results provide a reliable basis for decision-making
  • give data-users confidence that the census results are fit for purpose
  • allow the release of census results as soon as possible
  • leave a legacy of methods, tools and skills for the quality assurance of future statistics from a transformed population statistics system

To achieve this, we have adopted the following strategic principles. We will:

  • run quality assurance processes to ensure that census processes have worked correctly
  • validate estimates to check that the final results are credible and consistent with other evidence
  • approach quality assurance with the "eyes of a user" and use checks and comparator data sources that a user would be likely to use
  • build trust in our methods by inviting external review of our proposed approach
  • build trust in the census results by being open about their strengths and limitations
  • work in partnership with data-users to understand what evidence would give them confidence in the census estimates
  • quantify accuracy wherever possible - this will allow us to prioritise our checks, help make evidence-based decisions on any interventions required, assess the success of the census against success criteria, and help users understand the reliability of the results
  • use flexible methods and tools so we can react quickly to unexpected anomalies in the census data
  • work closely with teams developing transformed population statistics to develop, as far as possible, standardised approaches to quality assuring these
  • use expertise across the Office for National Statistics (ONS), making sure that topic experts on subjects such as demography, housing, labour market and health are assessing the stories shown by the provisional census results in the context of other evidence and trends
  • be outward looking, sharing experiences and learning from other census-taking organisations and fostering harmonisation and efficiency by collaborating with teams working on the censuses in Scotland and Northern Ireland
  • conduct quality assurance in parallel with census data processing, starting checks as soon as the first data comes into the ONS so that issues are identified as early as possible
  • prioritise national and local authority population estimates by age and sex - these will be included in the first release of the Census 2021 results and are the fundamental outputs from the census
  • take explicit account of all lessons learnt from 2011

The following sections describe:

  • the initial proposal for how we will conduct our quality assurance activities
  • how we will develop this initial proposal
  • our plans for publishing information on quality of the results
  • how we expect to manage the work during the census

4. Assurance of processes

The publication of the census results is the culmination of a series of processes, from the respondent completing the form to adjusting the census database to correct for estimated undercount. We will quality assure each process that could introduce error into the final census results. These include, for example:

  • people completing their census questionnaire

  • how we correct for people who are not included on a census return

  • how we deal with people not answering all the questions

  • coding responses - for example, assigning a description of an occupation to a corresponding code

It is inevitable that these processes will introduce some degree of error or uncertainty in the data. However, it is important that we are able to identify and address any issues that might affect the usefulness of the census results.

Our approach to assuring each of the processes is to:

  • understand each process and the nature of error that might be introduced within it

  • estimate the expected size and impact of the error within each process in order to prioritise our checks effectively

  • define metrics that allow us to understand the size and nature of the error relating to each process during Census 2021

One important aspect of this work is assessing how accurately people completed their census return. In 2011, we did this by running a Census Quality Survey. This involved interviewing a sample of census respondents and checking the consistency of their answers from the interview and their census response. We will evaluate whether this remains the best approach to this work, or whether we can achieve the same result more efficiently by using other methods or sources of data.

See Annex A - Processes for further detail on this strand of work.

5. Validation of estimates

This strand of work considers the likely accuracy of the census results themselves.

For this work, we will use a suite of tools and methods that will provide standard automated checks for all areas. We will use the results of these checks to identify areas, population groups or topics where there are inconsistencies or a need for further analysis. Assurance Panels will review this evidence. They will assess whether the estimates are fit for purpose or require further investigation or adjustment.

We are working to a constraint of publishing the first census outputs by the end of March 2022. This means we will need to maximise the amount of quality assurance work we conduct in parallel with processing.

While the initial focus will be on the population estimates for local authorities, we will also quality assure results for each topic included in the census.

We will look to develop a set of methods and tools for quality assuring population statistics that we could also use, for example, with the annual population estimates and new administrative data-based population statistics. Annex B summarises the planned systems and standard checks described in the following sections.

Internal plausibility and consistency

The first part of our validation checks will look at whether the census data are plausible as a stand-alone data set. We will identify:

  • unlikely demographic patterns - for example, in sex ratios (see the sketch after this list), implied fertility and mortality rates, or the distribution of very high ages
  • highly unusual records or outliers - for example, travel-to-work anomalies or groups of records with unlikely commuting patterns
  • groups of records with unusual characteristics - for example, a high number of people with the same occupation within a small geography
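
As an illustration of the first kind of check, the sketch below flags local authority and age-band combinations whose sex ratio is far out of line with the pattern across all areas. It is a minimal sketch only: the record-level DataFrame and its column names (area, age, sex) are illustrative assumptions, not the actual census processing schema.

```python
import pandas as pd


def flag_unusual_sex_ratios(records: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Flag area and age-band combinations with implausible sex ratios.

    Assumes one row per person with columns 'area', 'age' and 'sex' ('M'/'F');
    these names are illustrative only.
    """
    records = records.copy()
    records["age_band"] = (records["age"] // 5) * 5

    # Males per 100 females for each area and age band
    counts = (records.groupby(["area", "age_band", "sex"]).size()
                     .unstack("sex", fill_value=0))
    counts["sex_ratio"] = 100 * counts["M"] / counts["F"].clip(lower=1)

    # Compare each area with the spread across all areas in the same age band
    spread = counts.groupby("age_band")["sex_ratio"].agg(["mean", "std"])
    counts = counts.join(spread, on="age_band")
    counts["z"] = (counts["sex_ratio"] - counts["mean"]) / counts["std"].replace(0, 1)

    return counts[counts["z"].abs() > z_threshold].reset_index()
```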

Comparison of demographic statistics with other sources

The second part of our validation checks will compare the census results for households and the sex and age distribution of the population with other sources of data.

Quality assurance of 2011 Census estimates used a range of independent sources of population information as comparators. These included GP Patient Registers, Council Tax and School Census information, as well as information from the 2001 Census. We used these comparators to assess the numbers and characteristics of key population groups.

In 2021, we will aim to make best use of all sources available including:

  • the mid-year population estimates
  • administrative data-based population estimates
  • administrative or survey sources

We will use alternative sources after assessing both their statistical quality and users' likely interest in that source as a comparator for the census results. For example, we will need to be able to explain differences between the census population estimate and the number of people registered with a GP in an area. Annex C - Comparator data sources contains a list of the main data sources we are considering for use in the validation.

We are especially interested in data sources available to local authorities (such as Council Tax data) that could provide additional local intelligence and comparators to support our assurance processes at small and local area levels.

The evaluation of 2011 Census quality assurance recommended comparisons with the previous census. This is especially useful in highlighting any issues affecting small areas, for example the placement of communal establishments.

We will use information generated in the Census 2021 collection operation to understand the quality of estimates. We will use census field operation indicators, including overall return rates, local variation and other field intelligence in reviewing our analysis and checks.
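
To illustrate the kind of comparison we have in mind, the sketch below matches census age-sex estimates for each local authority against a comparator source and flags large differences for investigation. The inputs, column names and tolerance are illustrative assumptions rather than agreed parameters, and a large difference is a prompt for investigation rather than evidence of census error.

```python
import pandas as pd


def compare_with_source(census: pd.DataFrame, comparator: pd.DataFrame,
                        tolerance: float = 0.05) -> pd.DataFrame:
    """Flag local authority, age-band and sex cells where census and comparator counts differ.

    Both inputs are assumed to have columns 'la_code', 'age_band', 'sex' and 'count'
    (illustrative names only); comparator sources such as GP registrations have known
    coverage issues of their own, so differences need interpreting with care.
    """
    merged = census.merge(comparator, on=["la_code", "age_band", "sex"],
                          suffixes=("_census", "_comparator"))
    merged["pct_diff"] = ((merged["count_census"] - merged["count_comparator"])
                          / merged["count_comparator"])
    return merged[merged["pct_diff"].abs() > tolerance].sort_values("pct_diff")
```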

Topic analysis

The third part of our validation checks will look at the results for each topic included in the census.

Our plans for these checks will reflect what users have told us are their priorities and concerns. Topic experts within the ONS will design or review the checks themselves, reflecting their knowledge of the ways in which the data will be used.

We will design specific checks for each census topic. We will use the 2011 Census data to check results for all topics covered in both censuses. At the micro data level, we will use the Longitudinal Study, which will link a sample of individual census records from 2011 and 2021 to records from other administrative data sources, to assess consistency of responses over time. This data-driven approach will be complemented by analysis by ONS topic experts to assess whether the Census figures seem consistent with other data sources and with societal changes since 2011.

We expect that checks will incorporate appropriate standardisation to ensure that effects due to, for example, changes in the age-sex structure of the population are reflected in the analyses. As an extension of this standardisation, we will use loglinear models to identify changes in relationships between multiple variables since 2011, or unusual patterns in these relationships for particular geographical areas in 2021.
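
As a simple illustration of the loglinear approach, the sketch below pools a two-way table across the 2011 and 2021 Censuses and uses a likelihood-ratio test for the three-way interaction to ask whether the association between two variables has changed between censuses. The variables, data layout and choice of software are assumptions for illustration; the models actually used will be developed as part of this work.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2


def association_change_test(cells: pd.DataFrame) -> tuple[float, float, float]:
    """Test whether the age_band-by-tenure association differs between 2011 and 2021.

    Assumes one row per contingency-table cell with columns 'year', 'age_band',
    'tenure' and 'count' (illustrative names only).
    """
    # Null model: the age_band-tenure association is the same in both years
    null = smf.glm("count ~ C(year) * (C(age_band) + C(tenure)) + C(age_band):C(tenure)",
                   data=cells, family=sm.families.Poisson()).fit()

    # Alternative model: the association is allowed to differ between the two censuses
    alt = smf.glm("count ~ C(year) * C(age_band) * C(tenure)",
                  data=cells, family=sm.families.Poisson()).fit()

    # Likelihood-ratio test for the three-way (change-in-association) term
    lr_stat = 2 * (alt.llf - null.llf)
    df_diff = alt.df_model - null.df_model
    return lr_stat, df_diff, chi2.sf(lr_stat, df_diff)
```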

6. Developing the approach to quality assurance

The development of the approach to quality assurance of the census data reflects the strategic principles outlined earlier.

Making quality assurance user-centred and transparent

We will seek the views of stakeholders in the development of our proposals for the quality assurance process. This will help us make sure that our quality assurance reflects good practice and includes, where possible, checks that other stakeholders would plan to conduct. It will also help us understand what evidence census data users need to understand the reliability of the census results. Ways in which we will seek user views and report on our plans include:

  • seeking comment on proposals from Census Advisory Groups covering main users of census results
  • seeking endorsement of the proposed approach from the Census Methodological Assurance Review Panel
  • attending groups such as the Local Authority Operational Management Group (LAOMG) that are likely to attract users of census results
  • establishing a Local Authority Engagement Group of a small number of local authorities representing a range of different characteristics, to understand their requirements and issues around the quality of census data
  • inviting comments from all users on the proposals set out in this document
  • publishing a final plan for the approach to quality assurance in autumn 2020, reporting comments and suggestions we have received from users and how these are reflected in the plan

Using expertise within the Office for National Statistics

We will make use of expertise within the ONS by:

  • seeking comment on early proposals from the Census Research Assurance Group (a group of methodological and demographic specialists from inside and outside the ONS and staff who have previously worked on quality assurance of census data)
  • taking account of experience from 2011, reflecting on lessons learnt and the evaluation of quality assurance in that census
  • drawing on expertise of topic leads and demographic experts elsewhere in the ONS in developing plans

Being outward looking

We will look beyond the ONS to develop our approach by:

  • establishing a UK Census Quality Assurance Harmonisation Working Group along with National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency
  • using that group to share ideas and approaches and to work towards a harmonised approach to quality assurance
  • drawing on experiences and approaches of other census-taking organisations through their published reports and inviting comment on proposals, including setting up a specific quality assurance group

7. Published information on quality

Releases of census results will be accompanied by information on the quality of those results.

We expect this information to:

  • include confidence intervals for national and local authority population estimates reflecting sampling error due to estimation of the population not covered on census questionnaires
  • include a description of the size and nature of errors or uncertainty introduced through each part of the census data process
  • provide comparisons with data sources that users are likely to use as checks of the census results
  • cover all topics included in the census results
  • be accessible to all in an easily used and interpreted form

Information on planned publications is available on our Census 2021 outputs page.

8. Management of the quality assurance process

We will manage the quality assurance process as follows.

The joint heads of census quality assurance will be responsible for agreeing and co-ordinating the work of the various teams involved in quality assurance of the data. These teams will be:

  • the Process Quality Assurance team - ensuring that the work described in Assurance of Processes is completed, working closely with census teams responsible for each process
  • Topic Quality Assurance Groups (housing, labour market, education, health, income, ethnicity, national identity, language, religion, gender identity and sexual orientation), which will be informal groups bringing expertise from across the ONS
  • Population Estimates Quality Assurance team - ensuring that the main demographic statistics are plausible for England and Wales and its constituent areas

The Population Estimates Quality Assurance team will be closely linked with the teams working on the estimation and adjustment for under-enumeration in the census and draw on expertise from Population Statistics Division, which produces the official population estimates, and teams working on transformed population statistics.

Each of these teams will be responsible for producing a stream of evidence on the reliability of the census processes and estimates.

If we identify any data issues during the operational period (that is, while we are collecting or processing the data), we will raise them at a daily Data Quality Management Forum. This forum will assess whether there is any need for a change in how the processes work.

We currently propose that Quality Assurance Panels would consider the final census estimates (population estimates for each local authority, and results for each topic covered in the census). These panels would consist of a small number of ONS experts on that subject who were not involved in the quality assurance process.

We would pass the recommendations of these panels - whether to accept the data or whether an intervention is needed - to a High-level Quality Assurance Panel containing ONS staff and external experts, for agreement. We would then give the final conclusions of this High-level Quality Assurance Panel to the National Statistician as the evidence needed to obtain final sign-off of the data.

9. Annex A – Processes

This annex describes the approach to understanding where error or uncertainty in the data could be introduced within the census process. Other aspects of operational quality or progress towards programme quality goals are not discussed in this paper.

Approach

The initial approach required that we understand where processes impact on the data and what concerns there might be around that impact. By doing this, we can ensure that quality is considered throughout the development of methods and processes.

We then considered the complexity and impact on the data of each process, which allowed us to categorise the processes into themes. These themes are:

  • H - there is a high likelihood that the process may introduce error
  • M - there is a medium likelihood that the process may introduce error
  • L - there is a low likelihood that the process may introduce error
  • C - the service has little impact on the data or is controlled via key performance indicators

This helps us prioritise our quality assurance work. Understanding any concerns and possible impacts also allows us to identify the mitigating actions needed to reduce the risk of error or the impact of an error. This also means we can identify the appropriate actions to take if error does occur.

The remainder of this annex outlines initial proposals for the quality assurance of each process.

Address Frame

There are two aspects of the Address Frame that are of particular importance to the accuracy of the census results, even after we collect the data. The first is that census returns are accurately georeferenced (that is, an address being correctly assigned to a geographical point). The second is that the Address Frame's coverage and classification of household and communal establishment addresses is sufficiently accurate to avoid systematic inaccuracies in the type and location of imputed records.

We will base the quality assurance of this geographical information on quality standards for the completeness of the Address Frame and the accuracy of the georeferencing. This assurance is likely to include:

  • a field address check covering many addresses in each local authority
  • desk research for larger communal establishments
  • evaluation of the accuracy of the georeferencing, and, if possible, identification of any large communal establishments that cross local authority (or other area) boundaries

Quality assurance following the census data collection might include:

  • checks of census Output Area (OA) population estimates against those implied by the Address Frame
  • checks of estimates of communal establishment population within a given OA against those for 2011, for that and neighbouring OAs (to identify whether a large communal establishment might have been missed or assigned to the wrong OA)
  • evaluation of information (for example requests for additional census questionnaires from respondents or notification of apparently unoccupied dwellings) from the census collection operation

Data collection

Completeness of enumeration

While the census process is designed to adjust for non-response, the quality of results at a local level may be compromised if there is a failure to conduct the enumeration effectively in that area. This could, for example, result in a very poor response rate within an area or a large communal establishment being missed.

Any such problems may be identified as part of the response management work conducted within the census operations. The Quality Assurance team will be part of a Data Quality Management Forum, which will discuss these issues. This will ensure that the team is aware of issues that might be relevant for later quality assurance work.

Accuracy of collected data

We know that the information collected on a census questionnaire is not always correct. For example, the respondent may have misunderstood the question, accidentally selected the wrong option, or provided incorrect information as a proxy response for someone else in the household. The obvious approach to understanding the scale of these errors is to compare the information from a particular census questionnaire with that collected for the same household or person in another source.

To estimate the expected scale of this error, we can use:

  • the results of the 2011 Census Quality Survey
  • the analysis of the linkage of the 2011 Census data with the Longitudinal Study
  • information from testing of the 2021 questions and collection tools

Possible approaches in 2021 include:

  • linking census returns to a 2021 Census Quality Survey - allowing coverage of all the census questions
  • linking census returns to the Census Coverage Survey - providing a larger sample for a subset of the census questions
  • linking census returns to other data sources, including the 2011 Census and the Longitudinal Study - this would be likely to provide a large number of matched records, but a limited set of variables that could be compared
  • analysis of 2021 data by mode of collection, focusing on possible quality effects related to mode
  • outlier detection methods and the identification of spurious (that is, clearly fictional) responses

Once we have estimated the likely scale and nature of this error, we can estimate the likely impact of the error. If, for example, 1% of respondents answer one of the questions incorrectly, this may have a large impact if this substantially distorts the estimates for a small group. Alternatively, it could have a very small impact if the individual errors largely "cancel each other out" when aggregating results.
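
A small simulation makes this point concrete. The 1% error rate and the prevalences below are illustrative assumptions, not estimates of actual census error.

```python
import numpy as np

rng = np.random.default_rng(42)


def relative_distortion(true_rate: float, misclass_rate: float = 0.01,
                        n: int = 1_000_000) -> float:
    """Relative error in an estimated proportion when each yes/no answer is
    independently flipped with probability `misclass_rate` (illustrative figures only)."""
    truth = rng.random(n) < true_rate
    flipped = rng.random(n) < misclass_rate
    observed = np.where(flipped, ~truth, truth)
    return (observed.mean() - truth.mean()) / truth.mean()


# For a common characteristic the flips largely cancel out...
print(f"true rate 50%:  {relative_distortion(0.50):+.1%}")
# ...but the same 1% error rate roughly triples the estimate of a rare characteristic
print(f"true rate 0.5%: {relative_distortion(0.005):+.1%}")
```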

Data load and coding

Returned paper questionnaires need to be scanned. Both paper and online returns need to have responses to some questions coded. Both of these processes risk introducing error.

To estimate the expected scale of this error we can use:

  • the evaluation of the 2011 capture and coding
  • the quality standards agreed with the supplier for this work
  • the results of the 2011 Census Quality Survey (as part of the total error in responses)

Possible approaches in 2021 include:

  • checking a sample of returns to understand the rate of error in capture and coding and the "transition matrix" of errors (this will be important in understanding any potential bias in the results; see the sketch after this list)
  • evaluation of the completeness of the coding lists (for example, how many write-in answers could be automatically assigned to an existing code)
  • using the results of the dummy imputation exercise described in Item imputation to identify possible systematic errors
  • checks on specific identified risks (for example "major" in job title being interpreted as military officer when not appropriate)
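
As a sketch of the transition matrix check referred to in the first bullet, the code below builds an error matrix from a hypothetical clerical sample in which each return carries both the automatically assigned code and a manually verified code. The column names are illustrative assumptions.

```python
import pandas as pd


def coding_transition_matrix(sample: pd.DataFrame) -> pd.DataFrame:
    """Share of records assigned to each code, broken down by their verified code.

    Assumes columns 'verified_code' (clerical truth) and 'assigned_code' (automatic
    capture and coding output); asymmetric off-diagonal mass points to potential bias
    rather than random noise.
    """
    return pd.crosstab(sample["verified_code"], sample["assigned_code"], normalize="index")


def coding_error_rate(sample: pd.DataFrame) -> float:
    """Overall share of the clerical sample where the assigned code disagrees with the truth."""
    return (sample["assigned_code"] != sample["verified_code"]).mean()
```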

Resolution of multiple records and removal of false persons

The resolution of multiple records process seeks to identify and remove incorrect census returns that relate to a person already correctly covered on another return in the same location. The removal of false persons process removes census returns that do not contain sufficient information to be treated as a response.

To estimate the expected scale of this error, we can use evidence from the linkage of the 2011 census data with the Longitudinal Study, together with assumptions on changes in response patterns as a result of greater use of online or individual response.

Possible approaches in 2021 include:

  • analysis of the linkages of the census returns with other data sources
  • analysis of names and other variables to identify false people not identified in the removal of false persons process

Item imputation

Item imputation is the process of imputing for a missing or clearly incorrect value on a census return. It is conducted in the census using a donor-imputation methodology. Quality issues around item imputation include:

  • predictive accuracy - how accurate the method is in imputing the correct value for an individual
  • distributional accuracy - how accurate the method is in reflecting the true multivariate distributions of variables
  • "spikes" - where the same donor is selected for multiple cases of missing values when that donor has rare characteristics

To estimate the expected scale of this error we can use the evaluation of the 2001 and 2011 imputation processes. This involves looking at both the rate of imputation for each item and the accuracy of that imputation.

Possible approaches in 2021 include:

  • comparing imputed values against those collected in the Census Coverage Survey, Census Quality Survey and other data sources
  • measuring the effect of randomness in the application of the imputation method
  • dummy imputation on 2021 data - applying the observed pattern of missingness to records without missing data, applying imputation to those records, and comparing the results with the collected values (sketched below)

For each of these approaches, it will be important to consider impacts on multivariate distributions rather than looking at each variable in isolation.
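
The dummy imputation check in the final bullet of the list above might look something like the sketch below: knock out observed values at the rate of missingness seen in the live data (in practice mirroring its pattern across subgroups), re-impute them, and compare the imputed values against those originally collected. The function names and inputs are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)


def dummy_imputation_check(complete: pd.DataFrame, target: str, missing_rate: float,
                           impute) -> float:
    """Estimate the predictive accuracy of an imputation method for one variable.

    `complete` holds records with no missing values for `target`; `missing_rate` is the
    missingness rate observed in the live data; `impute` is any function that fills
    missing values of `target` in a DataFrame and returns the resulting Series.
    """
    trial = complete.copy()
    knocked_out = rng.random(len(trial)) < missing_rate
    truth = trial.loc[knocked_out, target].copy()
    trial.loc[knocked_out, target] = np.nan        # apply the observed level of missingness
    imputed = impute(trial, target)                # re-impute the artificially missing values
    return (imputed.loc[knocked_out] == truth).mean()


# Hypothetical usage with the donor_impute sketch from the item imputation section:
# accuracy = dummy_imputation_check(complete, "tenure", 0.03,
#                                   lambda df, t: donor_impute(df, t, by=["age_band"]))
```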

Coverage matching, estimation and adjustment

These processes are closely related. The coverage matching process matches census responses with those from the Census Coverage Survey. The estimation process uses that matched data to estimate the number of people not included on a census response. The adjustment process adjusts the census database to provide the best estimate of the population of England and Wales.

We will monitor diagnostics from each of these processes to ensure that the processes are performing as expected. The adjusted estimates resulting from these processes will be quality assured as described in Validation of estimates.
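
Coverage estimation for the 2011 Census was based on dual-system (capture-recapture) estimation using the census returns matched to the Census Coverage Survey. Assuming a broadly similar approach in 2021, the basic estimator for a single estimation cell looks like the sketch below; the production methodology adds many refinements (estimation classes, dependence and overcount adjustments, ratio estimation down to local authorities) that are not shown.

```python
def dual_system_estimate(n_census: int, n_ccs: int, n_matched: int) -> float:
    """Basic dual-system (capture-recapture) estimate of the true population in one cell.

    n_census  - people counted by the census in this cell
    n_ccs     - people counted by the Census Coverage Survey in this cell
    n_matched - people found in both sources

    Assumes the two collections are independent; all figures below are illustrative.
    """
    return n_census * n_ccs / n_matched


# Illustrative figures: 9,200 census responses, 460 CCS responses, 437 matched
# gives an estimated population of about 9,680, implying roughly 5% undercoverage
print(dual_system_estimate(9_200, 460, 437))
```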

Statistical disclosure control

The statistically adjusted census data is subject to two statistical disclosure control processes before publication. These processes are targeted record swapping, where the geographical location of some records is changed, and post-tabulation cell key perturbation.

We can estimate the expected scale of the error introduced by record swapping by looking at the evidence from 2011, in conjunction with any change in the parameters and rules being applied in this process.

The approach to quality assurance in 2021 is expected to be a combination of the above and a simple comparison of the adjusted figures with the unadjusted.

Cell key perturbation is a statistical disclosure control method that introduces small adjustments to cells within output tables. We can confidently estimate the expected scale of error introduced by the post-tabulation adjustment from first principles - that is, derived mathematically from the properties of the perturbation algorithm. As with record swapping, the quality assurance approach in 2021 is expected to be a combination of the above and a simple comparison of the adjusted figures for a particular set of tables with the unadjusted figures.
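
For illustration, the sketch below shows the general cell key mechanism: every record carries a fixed random "record key", each cell's key is the sum of its records' keys modulo a fixed value, and a lookup rule maps the cell count and cell key to a small adjustment. The modulus and the lookup rule here are placeholders only; the parameters actually used for Census 2021 are set as part of the disclosure control design and are not reproduced here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
KEY_MODULUS = 256  # placeholder value, not the real parameter


def assign_record_keys(records: pd.DataFrame) -> pd.DataFrame:
    """Give every record a fixed random key, so the same cell always receives the same adjustment."""
    records = records.copy()
    records["record_key"] = rng.integers(0, KEY_MODULUS, size=len(records))
    return records


def perturbed_table(records: pd.DataFrame, dims: list[str]) -> pd.DataFrame:
    """Tabulate `records` by `dims` and apply a toy cell key perturbation."""
    table = (records.groupby(dims)
                    .agg(count=("record_key", "size"), cell_key=("record_key", "sum"))
                    .reset_index())
    table["cell_key"] %= KEY_MODULUS

    # Toy lookup rule standing in for the real perturbation table:
    # most cells unchanged, a minority nudged by +/-1 depending on the cell key
    adjustment = np.select(
        [table["cell_key"] % 10 == 0, table["cell_key"] % 10 == 1],
        [1, -1],
        default=0,
    )
    table["published_count"] = (table["count"] + adjustment).clip(lower=0)
    return table
```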

Create, amend or revise geographies

Whilst provisional 2021 OA boundaries will be derived before the census data is collected, the final boundaries will be derived using the census database following coverage adjustment. We will check the final set of OAs to ensure they meet the required size thresholds.

Published outputs

Published outputs will go through a thorough and standard quality assurance process. We will use an independent system to check tabulations are consistent with the census database and that tables have been correctly specified.

10. Annex B – Validation tools and checks

The tools described below will allow analysis on the data as they stand, or stood, at any point in time or processing cycle. Standard checks are identified for each system, but these will be a starting point for further analysis, as required. In particular, we will develop specific checks for each topic covered in the census.

11. Annex C – Potential comparator sources

The table below lists the main comparator data sources that are planned for use in the validation of Census 2021 data. We are also looking to use a range of ONS survey data sets to help validate topic results.
