1. Introduction

In this report, you will find information about the administrative and commercial data sources that have been used by the Office for National Statistics (ONS) in the development of population and migration statistics in England and Wales. This includes the work undertaken as part of the Census and Data Collection Transformation Programme (CDCTP). Administrative data are collected by government and other organisations primarily for administrative (not research or statistical) purposes, such as registration, transaction and record keeping, usually for the provision of public services.

The use of administrative data is enabling the ONS to improve its population and migration statistics, delivering against the core principles of the UK Statistics Authority Strategy, Statistics for the Public Good. The research to develop these improved statistics will inform the UK Statistics Authority’s recommendation to government in 2024 on the future of population and migration statistics in England and Wales.  

The research covers various topics of significant importance to users, including: 

  • population and migration statistics

  • population sub-groups and characteristics

  • households and living arrangements

  • housing and housing characteristics

  • longitudinal analysis and outcomes

Further information about our research on these topics and the administrative data used is provided on our Research outputs using administrative data page and in our Population and migration statistics transformation in England and Wales, population characteristics update: 2023 article. To enable this research, administrative data are acquired from organisations outside of the ONS. This involves us understanding the needs of our users, the public, and governing bodies so that we can find the best data to use for future statistical development. As we continue to evolve our research, we will work to identify new data sources that will support improvements to our official statistics that are used for the public good.

Administrative data are acquired in line with our Data acquisition policy and Data ethics policy. The data are subject to robust controls to ensure that individuals cannot be identified. The ONS does not share or disclose any personal information. Furthermore, the ONS complies with all data protection legislation, including the General Data Protection Regulation, the Data Protection Act 2018, the Statistics and Registration Service Act 2007 and the Digital Economy Act. Further information, including the ONS's privacy statement and data protection policy, can be found on our Data protection page.

2. Overview of administrative data sources

The sections that follow provide an overview of the administrative data sources, including information about why a source has been used in our research and its importance. Data sources have been chosen based on their coverage of the population, how well they capture the required attributes of the population, and their quality. The sources are important for statistics that are used to ensure there are the right services and associated infrastructure to support the current and future population. This includes health, education, employment, housing, transport, retail, and recreation services. The statistics are also essential for understanding and addressing inequalities across regions and groups of the population. 

3. Health data sources

Birth and death registrations, and birth notifications  

The Local Registration Service, in partnership with the General Register Office (GRO), records all births and deaths in England and Wales. The data are collected under the Births and Deaths Registration Act 1953, where there is a legal duty for parents to register the birth within 42 days.

Births are also recorded by a midwife or doctor, which is generally done soon after a baby is born (a birth notification). These data are timelier than birth registrations and include other information, such as ethnicity. 

Births and deaths data are essential for research into administrative-based population estimates, as they account for natural change in the population (births minus deaths). The data are also used for ethnicity statistics.

Further information about the births and deaths data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Hospital Episode Statistics and the Emergency Care Dataset 

The Hospital Episode Statistics (HES) and the Emergency Care Dataset (ECDS) record attendances, appointments, and admissions to NHS Hospitals in England. An extract of the data (excluding information about health) is used along with the Personal Demographics Service (PDS) to ensure our administrative-based population estimates adequately capture the resident population of England through the population's interactions with health services. The data also include information about characteristics of the population, including ethnicity, so are an important source for administrative-based ethnicity statistics. 

Further information about the HES and ECDS data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Patient Episode Database for Wales and Emergency Department Data Set Wales  

Patient Episode Database for Wales (PEDW) and the Emergency Department Data Set (EDDS) include information on attendances, appointments, and admissions to NHS Hospitals in Wales. In common with HES and ECDS, the datasets are important for our population and ethnicity statistics, providing coverage for Wales. 

NHS Talking Therapies data set

The NHS Talking Therapies data set, previously known as Improving Access to Psychological Therapies (IAPT), contains information about the population that has accessed NHS-commissioned adult psychological therapies and services in England. Data on ethnicity from Talking Therapies are used in combination with other sources to produce admin-based ethnicity statistics.

Further information about the data and their quality is provided in our Producing admin-based ethnicity statistics for England: methods, data and quality article.

Personal Demographics Service 

The Personal Demographics Service (PDS) contains demographic data for those who have interacted with an NHS service in England, Wales, and the Isle of Man, including through GP practices and hospital visits. PDS data have been used by the ONS since 2016. Before then, the ONS used the Patient Register (PR), which was updated from GP registrations.

The PDS provides information on the resident population in England and Wales through people's interaction with NHS services. Despite known time lags and some known coverage limitations (which we account for using statistical methods), it is one of the most important sources for our administrative-based population estimates. It is also a long-established source for capturing population moves between local authorities and across the countries of the UK for our existing population National Statistics.

Further information about the PDS data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021. In addition, our report on Understanding quality of linked administrative data sources in England and Wales, using the 2021 Census – Demographic Index linkage provides information about the PDS when compared with Census 2021.

More information about health data sources that are used by the Office for National Statistics (ONS), along with ONS's health data policy, is available on our Sources of data page. The page also includes information about the use of the data for health statistics, including statistics on the impact of COVID-19.

4. Housing data sources 

Valuation Office Agency property attributes data 

The Valuation Office Agency (VOA) is an executive agency, sponsored by HM Revenue and Customs (HMRC). Since the 1990s, it has been responsible for banding dwellings liable for Council Tax (CT) in England and Wales. To fulfil this function, VOA collects data on property attributes for residential properties. This includes information on property type, number of rooms and floor area, which is used to produce administrative-based statistics on accommodation type and overcrowding. 

The ONS has applied the Quality Assurance of Administrative Data (QAAD) Toolkit to the VOA property attribute data. The summary of this assessment can be found in our Valuation Office Agency property attribute data: quality assurance of administrative data used in Census 2021 methodology. The data were also used as part of Census 2021 to provide information on number of rooms.

Further information about the VOA data and their quality is provided in our Administrative data used in Census 2021, England and Wales methodology.

Local authority supplied Council Tax

Each local authority (LA) in England and Wales is responsible for the collection of Council Tax (CT), a yearly charge for all domestic properties. The CT data include dwelling-level information about exemptions, discounts, and premiums applied to certain types of properties.

CT data provide important information about population change at local level, including by type of household. The data were used in our Quality assurance of Census 2021 and are also important for our future research into admin-based population estimates.   

Further information about the CT data and their quality is provided in our Administrative data used in Census 2021, England and Wales methodology.

Energy Performance Certificate 

The Energy Performance of Buildings Register holds all Energy Performance Certificates (EPCs) for England and Wales. EPCs are valid for 10 years and are published on the Department for Levelling Up, Housing and Communities (DLUHC) website. EPCs indicate the energy efficiency of a building to prospective tenants or buyers, with the intention of improving it. EPCs also include information on floor area and number of rooms in a property.

EPC data have been used for administrative-based statistics on Energy efficiency of housing in England and Wales and in Developing admin-based property floor area statistics for England and Wales. The data were also used as part of Census 2021 quality assurance, to validate the statistics on central heating type and accommodation type.

Further information about the EPC data and their quality is provided in our Administrative data used in Census 2021, England and Wales methodology.

Tenancy Deposit Protection

Tenancy Deposit Protection (TDP) data are provided by the DLUHC. DLUHC receives tenancy deposit agreement data from government-approved schemes to fulfil its legislative role in protecting tenancy deposits and providing a dispute resolution service.

TDP covers tenancy agreements in the private rental sector in England and Wales. The TDP data will be used alongside DLUHC Continuous Recording of Lettings and Sales in Social Housing in England (CORE) data for research into tenure statistics. 

Continuous Recording of Lettings and Sales in Social Housing in England

The Continuous Recording of Lettings and Sales in Social Housing in England (CORE) dataset is a national information source collected by DLUHC that records information on the characteristics of both private registered providers' and local authorities' new social housing rentals and purchases. CORE data will be used alongside TDP data for research into tenure statistics.

Additional housing datasets 

For housing and household statistics, it is important to maintain a comprehensive list of residential addresses, which is achieved via the ONS Census Address Frame. The Address Frame was used for Census 2021 collection and covers all residential addresses in England and Wales. The Address Frame is built using several administrative and commercial data sources linked to AddressBase Premium (ABP). Further information about the Address Frame and the associated administrative sources is provided in our Administrative data sources used in Census 2021, England and Wales methodology. 

5. Education data sources 

National Pupil Database 

The National Pupil Database (NPD) is a collection of datasets held by the Department for Education. These datasets relate to population, attainment, or post-school destinations. The ONS is currently working on procuring the Welsh equivalents to NPD datasets.

The NPD is used alongside Individualised Learner Record data (ILR) and Higher Education Statistics Agency (HESA) data for research into population and education statistics, including highest level of qualification attained. 

English and Welsh School Censuses

The English School Census (ESC) and Welsh School Census (WSC) include all pupils in state-funded schools in England and Wales, respectively, along with characteristics data such as a pupil's ethnicity. They are important sources for ensuring children are adequately represented in our research into admin-based population estimates and our statistics on ethnicity.

Further information about the English School Census (ESC) and Welsh School Census (WSC) data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Individualised Learner Record 

The Individualised Learner Record (ILR) data contain information about individuals who are registered on a further education course or receive training from providers in the Further Education and Skills sector in England. The ILR data are important for admin-based population estimates, as they capture those who are in further education and may be missing from other administrative data sources. The ILR is used alongside the National Pupil Database (NPD) and Higher Education Statistics Agency (HESA) data for research into population and education statistics.

Further information about the ILR data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

Lifelong Learning Wales Record 

The Lifelong Learning Wales Record (LLWR) is a collection of data on learners, and the learning they undertake, at post-16 education providers funded directly or in part by the Welsh Government. The data cover further education institutions, other work-based learning providers and community learning provision. School sixth forms are not included.

The LLWR provides excellent coverage of students in further education and is an important source for population and education statistics. Information in LLWR on students' characteristics, such as ethnicity, is also important for administrative-based ethnicity statistics.

Higher Education Statistics Agency

Higher Education Statistics Agency (HESA) student data contain information about students at higher education institutions in the United Kingdom, including international students. Students in higher education drive changes in populations at local levels, as they move to be close to their place of study. This includes the movement of international students into the UK. For this reason, the data are needed to adequately capture students in the admin-based population and migration estimates. HESA data are used alongside National Pupil Database and Individualised Learner Record data for research into education statistics, including highest level of qualification attained. The data are also important for admin-based ethnicity, labour market status and education statistics.

Further information about the HESA data and their quality is provided in publications outlining the administrative data sources used for both our Statistical Population Dataset (SPD) and Census 2021.

6. Income, tax, and benefits data sources 

Department for Work and Pensions' Customer Information System and Benefits and Income Datasets

The Department for Work and Pensions (DWP) Customer Information System (CIS) contains demographic information on everyone who has a National Insurance Number (NINo) in the United Kingdom. The data include children whose parent or parents have claimed Child Benefit, as well as individuals who require a NINo to work or receive benefits in the UK, including migrants.

DWP's Benefits and Income Datasets (BIDs) contain information on benefits distributed by DWP (including the state pension); HM Revenue and Customs (HMRC) Pay As You Earn (PAYE), Tax Credits and Child Benefit data; and local authority (LA) Housing Benefit data.

The CIS and BIDs data are used to ensure administrative-based population statistics adequately capture the working age and pensioner population. The data are also essential for research into admin-based labour market status and income statistics, which are important for understanding inequalities down to local levels of geography.

Further information about the BIDs and CIS data and their quality is provided in publications outlining the administrative data sources used for the Statistical Population Dataset (SPD) and in our Admin-based income statistics Quality and Methodology Information (QMI).

From 2023, the deliveries of Child Benefit and Tax Credits data into the ONS are directly provided by HMRC.

The Registration and Population Interaction Database

The Registration and Population Interaction Database (RAPID) is created by the DWP to provide a single coherent view of citizens' interactions across the breadth of systems in DWP, HMRC, and LAs via Housing Benefit.

RAPID data have proven important for research into admin-based migration estimates, particularly for migrants from the European Union. The ONS currently receives aggregate data from RAPID for use in migration statistics.

As part of work on admin-based migration estimates, RAPID is also used alongside the Migrant Worker Scan, which is a record of non-UK nationals who have been issued with a NINo.

Further information about RAPID and its quality is provided in our Methods for measuring international migration using RAPID administrative data methodology and our Administrative data used in Census 2021, England and Wales methodology.

HMRC Self-Assessment

Self-Assessment is the system used by HMRC to collect Income Tax from individuals who are self-employed or have other forms of income not registered to a Pay As You Earn (PAYE) scheme. Tax is usually deducted automatically from wages, pensions and savings. However, individuals and businesses with other types of income must report their income in a Self-Assessment tax return. The data are used alongside data on other sources of income, such as income from employment and benefits, to produce statistics on labour market status, and individual and occupied address incomes.

HMRC Pay As You Earn (PAYE)

HMRC's PAYE data have historically been received as part of the BIDs delivery from DWP. However, the latest and future deliveries will come directly from HMRC, via its Real Time Information (RTI) system. RTI provides detailed information on the earnings, tax deductions, National Insurance Contributions (NICs) and workplace pensions from employers.

The PAYE data are used to ensure administrative-based population statistics adequately capture the working age and pensioner population. The data are also essential for research into admin-based labour market status and income statistics, which are important for understanding inequalities down to local levels of geography.

Further information about the HMRC PAYE data and their quality is provided in our Admin-based income statistics Quality and Methodology Information (QMI) publication.

7. Migration and travel data sources

Home Office Borders and Immigration data

Home Office Borders and Immigration (HOBI) data (previously referred to as Exit Checks) are derived from a linked database that combines data from Home Office systems to build travel histories that consist of an individual's travel into or out of the UK, together with data relating to their immigration status. The data are an essential part of our research into producing migration statistics.

Information on how the ONS uses the HOBI data to estimate immigration and emigration is published in our Long-term international migration, provisional: year ending June 2023 bulletin and our Methods to produce provisional long-term international migration estimates methodology. The ONS also receives additional data from the Home Office on asylum applicants and returns to supplement the long-term migration analysis. Information about the quality of the data, along with other data sources used for international migration statistics, is provided in the Long-term international migration: quality assuring administrative data report.

In addition to the estimates of immigration and emigration, data relating to refugees have also been used in combination with other data sources for important research into the outcomes experienced by refugees. The section that follows provides further details about this research.

Refugee Integration Outcomes Cohort Study

Resettled and Asylum Refugee (AR) data are used with other sources, including the HOBI data, NHS Personal Demographics Service (PDS), birth registrations, death registrations and Census 2021 data, as part of the Refugee Integration Outcomes (RIO) Cohort Study. RIO is a collaboration between the ONS, Home Office and Department for Levelling Up, Housing and Communities (DLUHC) aimed at improving the evidence base around integration outcomes for refugees in the UK.

RIO uses data for refugees resettled in England and Wales under the Vulnerable Persons Resettlement Scheme (VPRS) and Vulnerable Children’s Resettlement Scheme (VCRS), the UK Resettlement Scheme (UKRS) and Afghan Citizens Resettlement Scheme (ACRS). The study has used data between the years 2015 and 2021 but is also being extended to cover subsequent years of data. Further data for resettled refugees are available in regularly published Home Office Immigration Systems statistics.

RIO also contains individuals who were granted asylum in England and Wales (excluding those who are still awaiting a decision on their asylum claim, or those who were denied asylum). Further data for asylum refugees are available in regularly published Home Office Immigration Systems statistics.

Detailed information about the RIO Cohort Study is provided in:

8. Other population groups data sources

Electoral Register 

The Electoral Register (ER), sometimes called the "electoral roll", includes everyone registered to vote in the UK. The dataset is being used for our research into admin-based population statistics and was also used to quality assure Census 2021 for England and Wales.

Further information about the ER data and their quality is provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Ministry of Justice (prisoners' data) 

Annual prisoner data are supplied to the ONS by HM Prison and Probation Service (HMPPS), an executive agency within the Ministry of Justice (MOJ). The data cover all prison establishments in England and Wales, which are required to record prisoner details on the Prison National Offender Management Information System (Prison-NOMIS). The data include length and type of sentence, which are used to determine whether to count someone as resident at the prison or at their home address. They provide a useful snapshot of the resident prison population and will be used in future admin-based population estimates, in addition to their use in current official population statistics.

Armed forces  

The ONS receives aggregated UK armed forces (UKAF) data from the Ministry of Defence (MOD). These data include military personnel counts by age, sex and local authority of base. Separate aggregate data, by sex and age, are received from British Forces Germany (BFG) on dependants (partners and children) who accompany members of UKAF stationed in Germany.

In addition, data for US Air Forces (USAF) based in England and Wales are supplied to the ONS annually, providing the number of USAF personnel and their dependants by sex, age and base in England and Wales.

Further information about the MOD data previously used by the ONS and their quality is provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Service Leavers 

The Ministry of Defence (MOD) Service Leavers Database (SLD) provides information on service personnel who have left the UK armed forces, irrespective of regular or reserve status and length of service. The data are sourced from legacy personnel systems and the current system, Joint Personnel Administration (JPA). The ONS receives a subset of variables from the SLD for data back to 1975.

The MOD has collaborated with the ONS to set up a data linkage study looking at the feasibility of producing statistics on UK armed forces veterans by linking data from the SLD and Census 2021 to our Statistical Population Dataset version 4.2 (SPD V4.2). Further information is provided in our Feasibility research on producing UK armed forces veteran statistics for England and Wales: 2021 article.

Driver and Vehicle Licensing Agency driving licence data

The Driver and Vehicle Licensing Agency (DVLA) provides the ONS with driving licence data. DVLA is an executive agency of the Department for Transport, responsible for maintaining the registration and licensing of drivers in Great Britain. DVLA also maintains the registration and licensing of vehicles, together with the collection and enforcement of Vehicle Excise Duty (VED), in the UK.

The data the ONS receives are for England and Wales and are based on driving licence records, including driver transactions, such as when applying for a new licence or renewing an existing licence. These data are important for administrative-based population statistics, providing good coverage of the adult population of England and Wales.

Aggregate data on whether people accessed DVLA services online were also used to support the Census 2021 data collection design. Further information on the use of the data in the census is provided in our Administrative data sources used in Census 2021, England and Wales methodology.

Mobile network travel and location data

The ONS receives mobility data through a partnership with Virgin Media O2 Business to support delivering statistics for the public good. The data are based on mobile network insights from Virgin Media O2 Business’ anonymised and aggregated data service, O2 Motion.

The data are important for population and migration statistics, providing extremely timely information about population movement (inferred from mobile devices moving between masts in different areas). The data provide useful insights into travel patterns and how local populations change in size through the day and over longer time periods. This is important for understanding what services and infrastructure are needed to support the population, including emergency service planning. Mobility data also proved important for understanding population movement during the pandemic (Understanding mobility during the COVID-19 pandemic).

The data include connections for devices of UK residents, but also for those held by travellers from abroad who have used their phone in the UK. This has supported the use of the data for overseas travel and tourism statistics (Using Mobile Phone Data for Enhancing International Passenger Survey Traveller Statistics).

9. Data sources used to transform and carry out a successful Census 2021 

Further details on the administrative data sources that were used to support a high-quality Census 2021 in England and Wales are provided in our Administrative data sources used in Census 2021, England and Wales methodology. The report explains how the different administrative sources were used in the census, with details about the coverage, accuracy and timeliness of the sources against the needs of the census.
