1. Why population and migration statistics are important

Population estimates are among the most critical statistical outputs that we produce at the Office for National Statistics (ONS). We produce estimates for England and Wales and work alongside the devolved administrations to produce UK-wide estimates. Population estimates are used for local funding allocations and to inform decisions by government, such as migration policy. They also underpin many other statistical outputs, as survey weights or denominators.

We are being ambitious, working to provide more timely statistics that reflect changes in society and meet the needs of users. We are establishing methods which make greater use of administrative data (collections of data maintained for administrative reasons such as registrations, referred to throughout as “admin” data) to create improved population estimates.

These admin-based estimates will be updated regularly to provide timely estimates on changing populations. The statistical processes involved in producing our new population and migration statistics, alongside their limitations, and plans related to revisions and quality are outlined below, in addition to our planned future development work.

It is crucial that our statistics reflect what our users need. In line with our User engagement strategy for statistics, we will produce and disseminate statistics which benefit everyone, improve lives and help solve society’s biggest problems. We aim to inform and help all users, and encourage non-users, to ensure they understand the statistics as they evolve and have confidence in our population and migration statistics.


2. Why we are evolving population and migration statistics

A survey-based census has taken place every 10 years since 1801 (except for 1941). Since the mid-1800s, the population has been estimated each year by starting with the census, ageing the population on by one year, adding births from birth registrations, subtracting deaths from death registrations, and adding or subtracting net migration.

There are further adjustments for specific groups of the population using data from administrative systems, for example using Ministry of Defence data for armed forces and Higher Education Statistics Agency (HESA) data for students. This method is known as the cohort component method (see our Population estimates for England and Wales methods guide for more information). However, every time there is a new census, we find that these population estimates have “drifted” from the census estimate of the population. This means that the accuracy of population estimates reduces the further we move away from a census.
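The roll-forward arithmetic behind the cohort component method can be sketched in a few lines. This is a minimal illustration only, not our production method: the function name, the four age groups and all of the counts are made up, and deaths and net migration are assumed to be indexed by age at the end of the year.

```python
def roll_forward(population, births, deaths, net_migration):
    """Roll a population by single year of age forward by one year.

    population, deaths and net_migration are lists indexed by age, where the
    last index is an open-ended oldest age group. Deaths and net migration
    are indexed by age at the end of the year (an illustrative simplification).
    """
    oldest = len(population) - 1
    aged = [0] * len(population)
    # Age everyone on by one year; the oldest group keeps its survivors
    # and absorbs those ageing into it.
    for age, count in enumerate(population):
        aged[min(age + 1, oldest)] += count
    # Births registered during the year enter at age 0.
    aged[0] = births
    # Subtract deaths and add (or subtract) net migration at each age.
    return [a - d + m for a, d, m in zip(aged, deaths, net_migration)]


start = [100, 110, 120, 300]  # made-up counts for ages 0, 1, 2 and 3+
end = roll_forward(start, births=95, deaths=[1, 0, 1, 10], net_migration=[2, 3, -1, 4])
print(end)  # [96, 103, 108, 414]
```

Any error in the flow inputs is carried into the next year's starting population, which is one way to see how estimates can drift between censuses.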

In recent years, improvements to computing power and the availability of admin data sources (for example, tax and benefits records, visas issued, and NHS data) mean that we may be able to use new approaches to estimate the population to a more consistent level of accuracy, in greater detail, more frequently. While we have already been using admin data to measure the population, for example birth and death registrations, the Personal Demographic Service (PDS) and HESA data, our aim is to make greater use of admin data to improve the quality, timeliness and inclusion of our population statistics.

This methodology article sets out the new statistical design of how administrative sources will be used to produce accurate and timely admin-based population estimates (ABPEs). This involves several steps including acquiring and preparing admin sources for statistical purposes, producing separate estimates of international and internal migration, and then applying statistical modelling techniques including coverage adjustment. This is shown in Figure 1, and then described in Section 3: Overview of statistical design for future population and migration statistics. Finally, Section 4: Producing population estimates explains how these components are brought together to produce the ABPEs.


3. Overview of statistical design for future population and migration statistics

Administrative data

Building on the Statistics and Registration Service Act 2007, the Digital Economy Act 2017 provides a legal gateway for us to access data held by public authorities and commercial undertakings to support the production of official and accredited official statistics, including the census. These data will be accessed for statistical purposes only and personal information will be removed during the processing so that individuals cannot be identified.

At the heart of our statistical design is the acquisition and use of a range of data sources to cover the population and its characteristics. These range from birth registrations and school censuses for children to further and higher education datasets for students. HM Revenue and Customs (HMRC) and Department for Work and Pensions (DWP) tax and benefits data are used to cover people of working age and pensioners, while Home Office (HO) data are used to cover special populations such as migrants, refugees, and asylum seekers. As with the current population statistics system, we also continue to use NHS health registration data, which provide good coverage of the population across all age groups. Further details of datasets used in our research and statistics are provided in our Data source overview report.

It is important that we understand how new data sources can be used to measure the population, such as mobile phone data or advanced passenger information data (the data provided to airlines before travel), to improve the quality, timeliness and inclusion of our population statistics. Therefore, we work closely with data suppliers to develop our understanding of these data sources (including any changes over time that may affect content and quality). We use a variety of approaches, including working groups and secondments, where our staff work together with data experts in the supplier organisations. We currently have secondments in place with the DWP and the HO.

Linkage and de-identification

Using several data sources together relies on having these data sources integrated and accessible in a consistent and secure way. This underpins our statistical design. To create outputs that meet user needs, we need to use the best available linkage methods and have a robust understanding of the linkage quality. Data or record linkage is a method of bringing together information about the same person or address from different sources to create a new, richer dataset. Data linkage is now commonly used for improving data accuracy and quality over time, to allow the reuse of existing data sources for new studies, and to reduce the cost and effort in data collection.

Once the data are linked, identifying information (such as a person’s name and address) is removed from the dataset used in further processing. The reference data management framework (RDMF) is our model for handling data securely and consistently for linkage purposes. Our data strategy provides further information on the RDMF and how it fits into our plans to develop data capabilities.

The RDMF is made up of five indexes that link and match data on addresses, businesses, classifications, demographics and location. One of these indexes is the Demographic Index (DI), which combines records from health, tax, education, and the births register to provide an ever-registered population that may include people who are not resident. The DI does not include any demographic or characteristic information. It simply links records and references the data by removing personal identifiers and replacing them with a unique Office for National Statistics (ONS) identifier for further use.

The methods used are a mixture of deterministic and probabilistic methods. Deterministic methods use exact matches: for instance, James Andrew Smith, date of birth (DOB) 11 July 1984, appears identically on both records. Probabilistic methods consider the likelihood of two records being a match based on a set of criteria, such as common nicknames: for instance, Jim Andrew Smith, DOB 11 July 1984. By developing our linkage methods in this way, we can improve both the quality of our data linkage and our understanding of that quality, including how it is likely to affect our statistical outputs.
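The difference between the two approaches can be illustrated with a small sketch. This is a simplified scoring example, not the linkage method we use: the nickname lookup, the field weights and the threshold are all hypothetical, and real probabilistic linkage typically derives its weights from estimated match and non-match probabilities rather than fixing them by hand.

```python
# Hypothetical nickname lookup and agreement weights, for illustration only.
NICKNAMES = {"jim": "james", "jimmy": "james", "liz": "elizabeth"}
WEIGHTS = {"forename": 4.0, "middle_name": 2.0, "surname": 5.0, "dob": 6.0}
FIELDS = tuple(WEIGHTS)


def deterministic_match(a, b):
    """Deterministic linkage: every field must agree exactly."""
    return all(a[f] == b[f] for f in FIELDS)


def standardise(field, value):
    """Lower-case the value and map known nicknames to a full forename."""
    value = value.strip().lower()
    return NICKNAMES.get(value, value) if field == "forename" else value


def match_score(a, b):
    """Sum the weights of the fields that agree after standardisation."""
    return sum(
        WEIGHTS[f] for f in FIELDS if standardise(f, a[f]) == standardise(f, b[f])
    )


def probabilistic_match(a, b, threshold=15.0):
    """Treat two records as a match if the agreement score reaches the threshold."""
    return match_score(a, b) >= threshold


rec_a = {"forename": "James", "middle_name": "Andrew", "surname": "Smith", "dob": "1984-07-11"}
rec_b = {"forename": "Jim", "middle_name": "Andrew", "surname": "Smith", "dob": "1984-07-11"}

print(deterministic_match(rec_a, rec_b))   # False: "James" and "Jim" differ
print(match_score(rec_a, rec_b))           # 17.0: all four fields agree after standardisation
print(probabilistic_match(rec_a, rec_b))   # True
```

The threshold controls the trade-off between missed matches and false matches, which is why understanding linkage quality matters for the outputs built on linked data.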

Filtered dataset of the usual resident population

The DI on its own simply provides data on people who have ever registered within the administrative systems used. It is not intended to only include the usual resident population that we want to measure.

We apply a set of activity-based rules using “signs of life” across health, education and income sources, to filter records from the DI to approximate the usual resident population at a reference date (for example, the midpoint of the year at 30 June). We also use those rules to help determine demographic information such as age, sex and location. The dataset created through this process is referred to as the Statistical Population Dataset (SPD).
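A minimal sketch of an activity-based filter is shown below. The actual rules are not set out in this article, so everything specific here is an assumption made for illustration: the 12-month look-back window, the requirement for activity on at least one source, the record identifiers and the interaction dates.

```python
from datetime import date, timedelta


def is_active(activity_dates, reference_date, window_days=365):
    """True if any interaction falls within the look-back window before the reference date."""
    window_start = reference_date - timedelta(days=window_days)
    return any(window_start <= d <= reference_date for d in activity_dates)


def filter_usual_residents(records, reference_date, min_sources=1):
    """Keep record identifiers showing recent activity on at least min_sources sources."""
    kept = []
    for record_id, activity_by_source in records.items():
        active_sources = [
            source
            for source, dates in activity_by_source.items()
            if is_active(dates, reference_date)
        ]
        if len(active_sources) >= min_sources:
            kept.append(record_id)
    return kept


# Hypothetical de-identified records: identifier -> source -> interaction dates.
records = {
    "A1": {"health": [date(2023, 2, 1)], "income": [date(2023, 5, 15)]},
    "B2": {"health": [date(2019, 3, 10)]},      # no recent sign of life
    "C3": {"education": [date(2022, 9, 5)]},
}

print(filter_usual_residents(records, reference_date=date(2023, 6, 30)))  # ['A1', 'C3']
```

Tightening the rules (for example, setting min_sources=2) reduces overcoverage but increases undercoverage, which is the trade-off that coverage adjustment later has to address.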

People lead different lives and interact with services differently, so there will be variations in how people are recorded on administrative systems. There will be both undercoverage (where people are not included) and overcoverage (where a person may be recorded more than once, or should not be included; for example, if they have emigrated). Our Transforming population statistics article shows the need for a robust coverage adjustment approach to produce admin-based population estimates (ABPEs) by local authority (LA), age and sex, to the required quality.

As well as providing a main input for the ABPEs, the SPD provides a population spine that enables us to link across characteristic attributes such as ethnicity. This has the potential to provide the foundation for estimates of those characteristics once we have completed development of our coverage adjustment process to support this.

Admin-based international migration estimates

A long-term international migrant is defined as someone who changes their country of usual residence for 12 months or more. To produce estimates of international migration at a UK level, we use a combination of data sources and methods, selecting the best data source for each group. The methodology to derive the latest admin-based international migration estimates (ABMEs) in our Long-term international migration bulletin uses different admin data sources and methods from the ABPEs to produce migration estimates at a UK level.

To estimate non-EU migration, Home Office Borders and Immigration (HOBI) data provide information about the numbers of people arriving from non-EU countries who require a visa to move to the UK long term. Our EU migration estimates are derived from the Department for Work and Pensions’ Registration and Population Interaction Database (RAPID). This provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance number (NINo). See our Long-term international migration: quality assuring administrative data (QAAD) for more information on data sources.

For British national migration estimates, we use International Passenger Survey data, as the complexity associated with identifying British migrants in admin data means we cannot use such data at this time. However, we are continuing to explore potential sources of data that capture actual migration behaviour. For further information on the ABME methodology, see our Technical user guide.

International migration estimates are produced at the UK level, with further methods required to produce estimates at LA level, by single year of age and sex.

The ABMEs are an important input to our model for producing ABPEs. This model takes the population and migration inputs, alongside births and deaths, and produces coherent population and migration (stocks and flows) statistics.

Internal migration estimates (including cross-border flows)

Estimates of internal migration and cross-border flows are also inputs to the ABPEs. Internal migration describes moves between LAs in England and Wales. Cross-border flows are moves between England and Wales and the rest of the UK and are agreed with devolved administrations.

In the current system, we use health data to produce internal migration and cross-border flow estimates. They are derived from the Personal Demographic Service (PDS) data, which records when people change their address on NHS systems (for example, with their GP). We also link the PDS to Higher Education Statistics Agency (HESA) data to better identify moves made by higher education students to and from places of study as these moves are less well captured in health data alone.

We use the internal migration and cross-border flow estimates produced for the mid-year estimates as the input for the ABPEs.

However, to produce timelier provisional ABPEs (six months after the reference period), we have developed alternative internal migration estimates to use the data available at that point. These PDS-based internal migration estimates are scaled using the ratio between previous years’ PDS-based estimates and mid-year population estimates of internal migration to ensure consistency in the timeseries and adjust for moves less well captured by the PDS alone. These internal migration estimates are then updated with HESA data for the updated ABPEs available the next summer.
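The scaling step can be sketched as follows. This is an illustration under stated assumptions rather than our exact calculation: the function names and all figures are made up, and the ratio is pooled across the previous years for a single flow, whereas in practice it could be calculated separately by local authority, age or sex.

```python
def scaling_factor(previous_pds, previous_mye):
    """Ratio of mid-year-estimate flows to PDS-based flows, pooled over previous years."""
    return sum(previous_mye) / sum(previous_pds)


def scale_provisional(current_pds, previous_pds, previous_mye):
    """Scale the latest PDS-based flow to be consistent with the mid-year series."""
    return current_pds * scaling_factor(previous_pds, previous_mye)


# Made-up flows into one local authority for three earlier years.
previous_pds = [900, 950, 1000]    # PDS-based estimates
previous_mye = [990, 1045, 1100]   # mid-year estimates, which also use HESA data

provisional = scale_provisional(1020, previous_pds, previous_mye)
print(round(provisional, 1))  # 1122.0
```

In this example the mid-year series is 10% higher than the PDS-based series, so the latest PDS-based figure of 1,020 is scaled up by the same proportion.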


4. Producing population estimates

Admin-based population estimates

The admin-based population estimates (ABPEs) are produced by bringing together a range of admin and other data sources and applying statistical modelling techniques. For further information, see our Understanding mid-year admin-based population estimates for local authorities in England and Wales article. The statistical model is referred to as the dynamic population model (DPM), which uses available information on the usual resident population (stocks) and movement into and out of the population (flows) at specific points in time. Similarly to the census-based mid-year estimates, the DPM is also based on the approach set out in our Population estimates for the UK methods guide. However, it uses a range of sources to estimate the stock population each year rather than use the decennial census as a baseline.

To produce the ABPEs, we start with extracts from administrative systems at specific points in time. To produce mid-year ABPEs, the DPM utilises the available information on the usual resident population as close to the mid-year reference period as possible. This uses data on stocks in addition to data that show changes in the population over time. The stocks data include data in the SPD, and census-based mid-year estimates. The flows data include births, deaths, international migration (ABMEs), internal migration, and cross-border flows.

The DPM balances stocks and flows to produce a coherent set of population and migration estimates (for example, so that the change in population stocks over time is equal to the net flows). If the information on stocks and flows is not consistent, we use information about the uncertainty of the estimates to help determine the most likely true stocks and flows.
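The idea of using uncertainty to reconcile inconsistent stocks and flows can be illustrated with a deliberately simple example. This is not the DPM itself: it swaps in a basic inverse-variance weighted average of two estimates of the same stock, one implied by the flows and one measured directly, and all function names, figures and variances are made up.

```python
def accounting_stock(previous_stock, births, deaths, net_migration):
    """Stock implied by last year's stock plus the net flows."""
    return previous_stock + births - deaths + net_migration


def reconcile(flow_based, flow_based_var, direct, direct_var):
    """Combine two estimates of one stock, weighting each by the inverse of its variance."""
    weight_on_flow_based = direct_var / (flow_based_var + direct_var)
    combined = weight_on_flow_based * flow_based + (1 - weight_on_flow_based) * direct
    combined_var = (flow_based_var * direct_var) / (flow_based_var + direct_var)
    return combined, combined_var


# Made-up inputs: the flows imply 10,080 people, a direct stock measure says 10,140.
flow_based = accounting_stock(10_000, births=120, deaths=90, net_migration=50)
combined, combined_var = reconcile(flow_based, 400.0, 10_140, 1_600.0)

print(flow_based)           # 10080
print(round(combined))      # 10092
print(round(combined_var))  # 320
```

The combined estimate sits closer to the flow-based figure because that figure was given the smaller variance. The same principle, that more certain inputs pull the result harder, carries over to richer statistical models.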

The DPM has advantages, particularly its flexibility which will improve the quality of the statistics. It can account for quality limitations in the underlying data sources and use strengths across the wide range of sources being used. The statistical models can account for underlying demographic trends and differing levels of coverage and uncertainty associated with input data. It can also incorporate other data sources as and when they become available. This could be helpful to address the challenge of some admin data sources being more reliable to measure particular population groups than others, for example, older people being more likely to interact with healthcare systems, or a local data source could more accurately measure local populations than a national-level data source.

Coverage adjustment

To produce accurate population estimates, it is important to have unbiased population stock estimates for each year broken down by local authority, single year of age and sex. Coverage adjustment is required to address gaps or duplicated records in the admin data sources.

We are currently exploring methods for coverage adjustment, including using admin data sources. Work to date has focused on applying dual system estimation (DSE) to available admin data sources as a possible approach. DSE is a well-recognised and established method typically used to ensure that estimates resulting from a census have maximum coverage. It uses a coverage survey following the census to estimate how many people responded. We are currently considering if a similar method could be applied with different sources of admin data. Further work could cover approaches, such as multiple system estimation, use of additional admin data sources and potentially the use of surveys. In the meantime, we will continue to use Census 2021 results to apply coverage adjustment to the ABPEs and will continue consulting with additional methodological experts as we develop our methods.
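The basic form of dual system estimation can be sketched as below. This shows the textbook estimator only, with made-up counts and a hypothetical function name. It assumes that the two sources capture people independently, that records are linked without error and that neither source contains overcoverage; these are strong assumptions for admin data and part of what our research needs to address.

```python
def dual_system_estimate(count_a, count_b, count_both):
    """Estimate the total population from two overlapping lists.

    count_a and count_b are the numbers of people found on each source, and
    count_both is the number found on both after linkage.
    """
    if count_both <= 0:
        raise ValueError("At least one linked record is needed to form an estimate.")
    return count_a * count_b / count_both


# Made-up counts: 900 people on source A, 800 on source B, 720 on both.
estimate = dual_system_estimate(900, 800, 720)
print(estimate)  # 1000.0
```

Here 720 of the 800 people on source B (90%) also appear on source A, so source A is estimated to cover 90% of the population, giving 900 / 0.9 = 1,000.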


5. Local area population estimates

Our users have told us of the need for more frequent, detailed statistics about the population, including information at lower levels of geography on a more regular basis. Local authorities need small area estimates to plan and monitor services, service use and uptake more accurately.

Like the census-based mid-year estimates, our current methods for local area population estimates roll the population forward and use administrative sources to estimate the population change each year. We are exploring options for improving our methods using a wider range of administrative sources and using geospatial data. The geospatial approach shows good potential as it relates information about the infrastructure on the ground to population density and doesn’t necessarily rely on the census. We will be publishing investigatory research outputs later in the year. See our Population estimates quality and methodology information (QMI) report for more information on our methods.

We have focused on the use of statistical models for producing population and migration statistics and the potential for geospatial models for local area population estimates. Both are examples of evolving the current methods based on the current definitions (mid-year). Using admin and other types of data opens opportunities to measure temporary resident populations such as "daytime populations" and a diverse range of migration patterns such as seasonal migrants. We have produced experimental research outputs on these alternative definitions showing how we are responding to the needs of a rapidly changing population. Our Population and migration estimates – exploring alternative definitions article provides further examples of this research, including Estimating population by time of day.

For local area population estimates, we need to consider our statistical disclosure control methods in our future design to maintain confidentiality in our outputs. In December 2023, we published our Disclosure control proposal for Future Population and Migration Statistics.


6. Bringing data sources together using longitudinal data

The previous sections outline our use of admin data for measuring and analysing the population size and structure at a point in time. We can also use admin data to look at the outcomes of specific population groups over time (longitudinally). Information on longitudinal outcomes helps users to inform decision making, policy development and target services to address inequalities. It also helps users to assess the effectiveness of interventions over time and adjust as needed to improve people's lives and outcomes. Currently, the Office for National Statistics (ONS) Longitudinal Study contains census data linked to life events data for 1% of the population. We are looking to build on this approach by investigating linked admin data sources which might help researchers to explore societal outcomes over time, such as for refugees. Beginning with Census 2021 data and updated to account for births and deaths as well as internal and international migration, we hope to extend the coverage beyond the 1% in the current study.

Our next steps will be to produce a “proof of concept” Longitudinal population dataset (LPD, previously referred to as census data asset), a longitudinal representation of the population of England and Wales that will support the creation and maintenance of a range of longitudinal cohort studies, such as veterans. Initially, this will incorporate our ONS 1% longitudinal study and Refugee integration outcomes study (RIO) before expanding to include other cohort studies.


7. Strategy for quality and uncertainty

We have updated our Population and migration statistics transformation in England and Wales quality strategy, which outlines our approach for assessing and reporting on the quality of the data sources (input quality), data processing (process quality) and the resulting final statistical outputs, ensuring they meet user needs (output quality). We have published an Admin-based population estimates for England and Wales quality and methodology information (QMI) report, which details the strengths and limitations of the data, methods used, and data uses and users.

To develop high-quality methods, we have investigated the best current methods used across other National Statistical Institutions (NSIs) and academia during our research, and developed radical new methods where required. We have secured quality assurance and peer review from additional experts and have received favourable feedback.

Our methodological work has been presented to our independent Methodological Assurance Review Panel (MARP), chaired by Sir Bernard Silverman, Emeritus Professor of Statistics at the University of Oxford, and consisting of a panel of recognised experts. MARP has provided external, independent assurance and guidance on our developing methods, and all papers presented to the panel are available on the Papers section of the UK Statistics Authority website.

As with all statistical estimates, the published admin-based population estimates (ABPEs) are subject to uncertainty related to the measurement of population stocks at specific points in time and the components of population change over time. Alongside the estimates, we include credible intervals that provide the range of plausible values and illustrate the uncertainty around the estimates (see Section 11: Glossary for a full definition of credible intervals). We are working towards building measures of uncertainty for all our statistical outputs, and identifying how well these measures are understood and how they can help users interpret the statistics.


8. Ethical considerations

Our research and processes are ethically and legally compliant, demonstrated by completed ethics assessments for each research project. We published an equality impact assessment on 29 June 2023 to fulfil the requirements of the Public Sector Equality Duty as set out in section 149 of the Equality Act 2010.


9. Revisions

To provide the most up-to-date data using admin sources, we initially produce provisional estimates (six months after the reference period), and update these as more data become available, the first update being 12 months after the reference period. Revisions are a standard part of producing timely estimates.

While we have always recognised the potential need for revisions to our estimates, reflected in our existing revisions policy and the Code of Practice, the transition to a new population and migration statistics system means that how we think about them is changing (see our Statistics transformation article for more information). The replacement of survey-based data sources with admin data as the main source for migration and population statistics gives us both the opportunity for an improved regular schedule of provisional and updated estimates and a different context for other revisions.

As population statistics evolve, the cycle of revisions may need to be amended. It is important we understand the impact of any changes. Please get in touch if you would like to be involved in any discussions.


10. Future developments

We have outlined our statistical approaches for producing timely, inclusive population statistics that are important for decision making. We acknowledge there are several challenges and opportunities that influence our future research plans including:

  • working with data suppliers to mature the processes and systems used to ensure timely and reliable supplies of the data we currently use, including the automation of delivery processes and the strengthening of data sharing agreements

  • investigating new data sources for use in the future; for example, exploring the potential from tax and benefits data and mobile phone data to improve our internal migration estimates

  • developing our statistical models including a robust coverage adjustment strategy

  • delivering coherence across the full range of population and migration outputs including admin-based population estimates (ABPEs), admin-based migration estimates (ABMEs), statistics about households and information about population characteristics

  • providing an admin-based census for approved researchers (see the Integrated data service access page) which will contain de-identified person level records by age, sex, Lower level Super Output Area (LSOA) and Demographic Index reference

  • developing coherence across our population and migration estimates for England and Wales and across the UK, considering our best approach for producing consistent migration estimates from our population model, with an aim to publish international migration and internal migration and cross-border flows separately

  • developing the criteria on which to base our decision to use ABPEs as our official estimates of the population, in consultation with methodological experts and users of the data, and publishing these criteria later in the year

  • ensuring harmonisation and consistency of definitions across our data sources and outputs; for example, we are working across the government statistical service (GSS) to promote the adoption of their standards, ensuring greater usefulness of statistics

  • working with the Office for Statistics Regulation (OSR) on the assessment of our admin-based population estimates and working to ensure that these, and our long-term international migration estimates, meet the standards expected of accredited official statistics by summer 2025

  • addressing user feedback, including recommendations from the OSR report

User needs are central to us achieving these aims. Understanding how well our statistics meet user needs is essential for informing our workplans. We continuously seek this feedback in various ways such as through our published research, public consultations, conferences, webinars and detailed meetings with stakeholders. If you would like to hear more or get involved, let us know by contacting us at pop.info@ons.gov.uk.


11. Glossary

Activity-based rules or signs of life

“Activity” can be defined as an individual interacting with an administrative system, for example, for National Insurance or tax purposes, when claiming a benefit, attending hospital or updating information on government systems in some other way. Only demographic information (such as name, date of birth and address) and dates of interaction are needed from such data sources to improve the coverage of our population estimates.

“Activity” data can also be referred to as “interactions”, “signs of life” and “signals” data.

Admin-based population estimates

Admin-based population estimates (ABPEs) are population estimates derived from administrative sources, such as NHS health registration data, Department for Work and Pensions tax and benefits data, and Home Office data.

Admin-based migration estimates

Admin-based migration estimates (ABMEs) bring data together and estimate how many people arriving and departing in a specific period are long-term international migrants. Our ABMEs are used to estimate EU and non-EU nationals migrating to and from the UK and are a core component of population estimates.

Administrative data

These are data that people have already provided to government, for example, when accessing public services. Some of these data could be re-used by us to produce statistics about the population.

We have been using administrative data for many years. For example, annual births and deaths registrations are used, as well as NHS patient registrations, to roll forward the population estimates between censuses.

Coverage adjustment

This is a method that adjusts the population estimates to account for the fact that some admin sources will include people who are not usual residents and others may be missed. Coverage adjustment is essential to produce accurate population estimates.

Credible interval

A credible interval gives an indication of the uncertainty of an estimate from data analysis. The 90% credible intervals are calculated so that there is a 90% probability of the true value lying in the interval.
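As an illustration, a 90% credible interval can be read off a set of simulated values from a model as the range between the 5th and 95th percentiles. The sketch below uses made-up draws from a normal distribution in place of real model output; the function name, the seed and the population figures are all assumptions.

```python
import random
import statistics


def credible_interval_90(draws):
    """Equal-tailed 90% interval: the 5th and 95th percentiles of the draws."""
    cut_points = statistics.quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cut_points[0], cut_points[-1]


# Made-up draws standing in for simulated values of a local authority population.
random.seed(2024)
draws = [random.gauss(250_000, 2_000) for _ in range(10_000)]

lower, upper = credible_interval_90(draws)
share_inside = sum(lower <= d <= upper for d in draws) / len(draws)
print(round(lower), round(upper), round(share_inside, 2))
```

About 90% of the simulated values fall inside the interval, which is the sense in which the interval expresses a 90% probability.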

Demographic Index (DI)

Integrates education, health, and tax and benefit administrative data to provide a composite data source of the population interacting with administrative data sources.

Dual system estimation (DSE)

This is a statistical method that was used to adjust census estimates to ensure that census outputs included people who were estimated to be missed from the census. It is based on the use of a separate Census Coverage Survey (CCS), which takes a sample of addresses and estimates the number of people who have been missed by the census. This is one of the approaches being considered for a suitable coverage adjustment method for the ABPEs.

Dynamic population model

A dynamic population model (DPM) is a statistical modelling approach that uses a range of data to measure the population and population changes in a fully coherent way.

Higher Education Statistics Agency

The Higher Education Statistics Agency (HESA) provides data that contain a list of students who are registered on a Higher Education course in England and Wales.

Flows

This refers to population change between two time points, because of births, deaths and movements of people through internal and international migration.

Internal migration

This refers to residential moves between different geographic areas within the UK. This may be between local authorities, regions or countries of the UK. It excludes moves within a single local authority, as well as international moves into or out of the UK.

International Passenger Survey data

The International Passenger Survey (IPS) collects information about passengers entering and leaving the UK.

International migration

We use the United Nations (UN)-recommended definition of a long-term international migrant: “a person who moves to a country other than that of his or her usual residence for a period of at least a year (12 months), so that the country of destination effectively becomes his or her new country of usual residence.”

Personal Demographic Service (PDS) data

Personal Demographic Service (PDS) from NHS Digital. A national electronic database of NHS patients, which contains only demographic information with no medical details. The PDS differs from the Patient Register (PR), since it is updated more frequently and by a wider range of NHS services. The PDS data available to us consist of a subset of the records, including those which show a change of postcode recorded throughout the year or a new NHS registration.

Registration and Population Interaction Database (RAPID)

Registration and Population Interaction Database (RAPID) is a database created by the Department for Work and Pensions. It provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance Number (NINo).

Reference data management framework (RDMF)

The framework used by the Office for National Statistics (ONS) for linking data. The RDMF enables us to separate the data linkage function (where identifiers such as name and address are used to link datasets) from subsequent data processing (where de-identified linked data is then used).

Statistical design

Statistical design is any aspect of the process to produce and assure statistics. In the context of Future Population and Migration Statistics (FPMS), this is all aspects of how we get from input data sources (including what these sources are) to the outputs we publish, including ensuring they meet user needs (in terms of relevance, timeliness and quality). It includes:

  • choice and detail of statistical methods
  • choice of data sources, and how the quality of those sources is managed or measured
  • approach to linkage between data sources
  • any modelling or other assumptions
  • choices in decisions which affect quality (for example, sample sizes and response targets)
  • choices about assurance and approach to governance

Statistical Population Dataset (SPD)

A single, coherent dataset that forms the basis for estimating the size of the resident population. It is produced by linking records across multiple administrative data sources and applying a set of inclusion and distribution rules.

Stocks

Refers to a snapshot of the population at a point in time.

Usually resident population

We are currently adopting the UN definition of “usually resident” – that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).


13. Cite this methodology

Office for National Statistics (ONS), released 15 July 2024, ONS website, methodology, The future of population and migration: a statistical design


Contact details for this methodology

Population Statistical Design team
pop.info@ons.gov.uk
Telephone: +44 1329 444661