This year’s assessment reflects an important step forward in the Office for National Statistics (ONS) being able to access the range of data needed to produce Administrative Data Census outputs. The Digital Economy Act 2017 was passed into law in April 2017. The Act gives ONS a right of access to information held by government departments, other public bodies, charities and large and medium-sized businesses, for statistics and research purposes.
Over the last year, we’ve done a lot of ground work to enable us to achieve our expected assessment post 2021 (see Figure 1). As well as the Digital Economy Act 2017, progress has been made in:
- improving the accuracy of administrative data based population estimates through improved linking, use of new data sources and improved methodology; these population estimates are now closer to our official estimates
- producing new estimates on the number of occupied addresses (“households”) and estimates of personal income direct from administrative data sources
- identifying potential administrative sources and starting to describe methods which will enable us to produce estimates of population and household characteristics using these sources in combination with survey and other data
Our plans for the next year build on this ground work. They continue to demonstrate the potential for an Administrative Data Census to produce outputs that meet information needs. This will involve working with data suppliers to access data sources through the new powers of the Digital Economy Act 2017. The Act will enable us to make progress against the other high-level criteria over the coming years. This progress is reflected in the “expected progress by 2018” indicator in the main assessment as shown in Figure 1.
In particular, we’ll be working within the legally operational framework of the Digital Economy Act 2017 to:
- access more “activity” data (see note 1), which would be used in combination with a coverage survey to improve the quality of population estimates
- access a wider range of data covering population characteristics and use it to demonstrate more of the range of outputs that are possible through an Administrative Data Census
Together with accessing data sources, we will develop new methods to enable the production of small area multivariate outputs (see note 2) about population characteristics.
Notes for: Main points
1. “Activity” data: information from administrative data sources about when individuals have interacted with systems or services, such as the National Insurance, tax or benefit systems, or a hospital visit through the NHS system.
2. Multivariate outputs: cross tabulated outputs using more than one variable, for example unemployment rates by ethnic group for small areas.
In March 2014, the National Statistician made a recommendation that the census in 2021 should be predominantly online, making increased use of administrative data and surveys to both enhance the statistics from the 2021 Census and improve statistics between censuses. The government’s response to this recommendation was an ambition that “censuses after 2021 will be conducted using other sources of data”.
A move towards an Administrative Data Census is our response to this challenge.
An Administrative Data Census offers a number of opportunities. These include producing key census outputs on a more timely, frequent basis (possibly every year), for less money than the current system. It also offers opportunities to produce new census outputs that aren’t available through the current approach, such as income, fuel poverty and housing affordability. This will result in better decision-making across government in line with UK Statistics Authority strategy – Better Statistics, Better Decisions.
This is ONS’s second assessment of its ability to move to an Administrative Data Census in the next decade. It is our ambition to produce the type of information that is collected by a 10-yearly census (on housing, households and people) from an Administrative Data Census. Doing this will require a combination of:
- record-level administrative data held by government
- a population coverage survey
- a population characteristics survey
- some commercial and other non-survey data sources
You can find further information on what an Administrative Data Census is in last year’s assessment.
The assessment in this report is made against the following five high-level criteria:
- Rapid access to new and existing data sources
- The ability to link data efficiently and accurately
- Methods to produce statistical outputs that meet priority information needs of users
- Acceptability to stakeholders
- Value for money
These criteria reflect what needs to be in place for ONS to move to an Administrative Data Census. Further details on these criteria can be found in Annex A.
In the previous report, the focus of our assessment was whether ONS would be able to move to an Administrative Data Census post-2021. We described the goal of comparing outputs based on administrative data and targeted surveys against the 2021 Census.
We have now developed our plans for this comparison in more detail.
To make this comparison as fair and robust as possible, we will need to produce the best possible Administrative Data Census outputs in 2021. We plan to have the following in place by 2021:
Administrative Data Census-based population statistics by 2020
An Administrative Data Census-based approach to producing population statistics, using a combination of administrative data and a population coverage survey, will need to be in place in 2020 to demonstrate the ability to produce:
- annual estimates of the size of the usual resident population by age and sex for national, local and small areas
- components of population change (births, deaths and migration)
These estimates provide a base for further outputs about the population including population projections.
The annual Population Coverage Survey (PCS) would measure and adjust for coverage errors in the Statistical Population Dataset (SPD) as shown in the framework in Figure 2. We are developing plans to test a PCS, to commence later in 2017. In the next 3 years, we plan to implement a comprehensive testing strategy to explore appropriate sampling and data collection methods, response rates and estimation methodologies.
Earlier this year, we produced a first set of research outputs on the numbers of occupied addresses (“households”). These were produced from the same resident population base as that used for Administrative Data Census Outputs on the size of the population.
An Administrative Data Census producing characteristics of the population, housing and households by 2021, supported by an integrated approach to collecting survey information on characteristics of the population
This approach will need methods that make best use of a combination of administrative, survey and commercial data to produce outputs about household and population characteristics. A range of methods may be needed, depending on the availability and quality of these sources. Some topics will be predominantly administrative data-based, others will be based on survey information alone. We expect most will need integration of both sources (plus in some cases commercial data).
For census topics for which there is limited or no administrative data available, such as the number of hours of unpaid care, outputs would be largely based on a sample survey. As such, outputs may be more limited in frequency and/or detail. For example, for such topics it might be possible to produce estimates only at local authority level. We are currently investigating how such a survey might fit alongside other ONS surveys.
The systems, services and technologies in place to support this transformation
We are exploring how best to make use of new corporate platforms for the acquisition, management, integration, processing, and analysis of data from multiple sources.
We’re in the early stages of development for a range of new technologies to support the transition to an Administrative Data Census including:
- online collection of census and survey data
- acquisition and validation of administrative data directly from suppliers
- matching services for an address and business “spine” and for data about people
- access control and data management
Future assessments will measure our progress towards achieving these objectives. The final assessment (due in spring 2023) will form the basis for the National Statistician’s recommendation, at the end of 2023, about the future of the census and population statistics. A public consultation based on our final assessment will take place to ensure the National Statistician’s recommendation reflects user needs.
Figure 2 shows a framework for operating an Administrative Data Census. It sets out how each of the components fit together, as follows:
- a range of administrative, survey and other data sources are combined using linkage methods (our current methods are described in our methodology paper)
- we then apply a set of rules and methods to create a Statistical Population Dataset (SPD)
- we will then link the SPD to a Population Coverage Survey and use estimation methods to produce outputs on the size of the population and households
- we will also link to a characteristics survey, and use other methods to produce outputs about the characteristics of the population
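The coverage adjustment step in this framework can be illustrated with a small sketch. This report does not state which estimator will be used with the Population Coverage Survey, so the example below assumes a simple dual-system (capture-recapture) estimator, which is one standard way of combining a list with an independent survey. The function name and all figures are hypothetical, and the sketch only corrects for people missing from the SPD, not for records that should not be on it.

```python
def dual_system_estimate(spd_count, survey_count, matched_count):
    """Estimate the true population of a sampled area (assumed estimator).

    spd_count     -- records on the SPD for the area
    survey_count  -- people found by the coverage survey in the area
    matched_count -- people found in both sources after linkage
    """
    if matched_count <= 0:
        raise ValueError("need at least one matched record")
    # Lincoln-Petersen form: the share of survey respondents also on the
    # SPD estimates the SPD's coverage rate, and we divide the SPD count by it.
    return spd_count * survey_count / matched_count


# Hypothetical area: 900 SPD records, 500 survey respondents, 450 linked.
estimate = dual_system_estimate(spd_count=900, survey_count=500, matched_count=450)
```

In this invented example the survey finds 90% of its respondents on the SPD, so the 900 SPD records are scaled up to an estimate of 1,000 people.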
The framework has been mapped to the high level evaluation criteria (see Annex A) and coloured using the Red Amber Green (RAG) status to illustrate our progress.
The diagram shows that having access to a few key data sources enables us to produce population and household estimates for the total population (denominators) which are fairly close to the official estimates. We have published two sets of research outputs on population estimates in October 2015 and November 2016 – around 95% of administrative data based population estimates for local authorities are of similar quality to those produced by the Census in 2011.
We also published a first set of estimates of occupied addresses (“households”) from administrative data in February 2017, again showing good potential. The green/amber colour on the diagram reflects this.
Our greatest challenge is our ability to produce estimates of the characteristics of the population and households (numerators), currently red/amber on the diagram.
Our assessment of our ability to produce statistics on the topics traditionally collected in the census is directly related to the limited access that we currently have to administrative sources that have information about such characteristics (coloured red/amber). Gaining access to more key data sources, using the new powers in the Digital Economy Act 2017, together with developing new methods to produce such outputs, will improve our progress in this area.
The next section provides a more detailed assessment on our progress to date and expected progress over the next year against each of the high-level criteria.
This high level assessment allows a direct comparison to be made with last year’s assessment. Work is ongoing to produce more detailed criteria. This includes describing definitions for the Red Amber Green (RAG) status for each of the criteria. Figure 1 shows the assessment, demonstrating where we are now and where we expect to be by 2023. The rest of this section provides the evidence behind the assessment and a description of what will be done in the future to improve the assessment. Each assessment indicates the expected progress over the next year.
The Digital Economy Bill received Royal Assent in April 2017. The Digital Economy Act 2017 gives the UK Statistics Authority a statutory right of access to information held by government departments, other public bodies, charities, and large and medium-sized businesses, for statistics and research purposes. This will help ensure ONS has access to the data it needs to produce fit-for-purpose official statistics that meet the challenges of a modern administration and the evolving needs of statistical users.
Figure 3 shows an updated view on the availability of administrative and other non-survey data sources for the key topics traditionally included in the census. This is based on recent exploratory work to assess the potential for administrative and other sources to provide information about characteristics of the population. In this table, a green assessment relates to whether there is data currently available to us on those topics. This doesn’t necessarily mean that there is enough data to produce direct outputs based on administrative data alone. In many cases, these data sources would need to be combined with survey sources to produce characteristic outputs.
An amber assessment has been given to those topics where administrative data may be available, but ONS does not currently have access to it. A red assessment indicates topics we don’t think are covered on any administrative data sources. In particular, this highlights the challenges for certain topics such as “mode of travel to work” and “hours of unpaid care”, where there are no data sources available.
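The colour rules described for Figure 3 can be written down as a small decision function. This is only a sketch of the rule as worded above; the function name and the two placeholder topics are hypothetical and do not reproduce actual entries from Figure 3.

```python
def availability_rag(covered_by_admin_source, ons_has_access):
    """Return the Figure 3 style colour for one census topic."""
    if not covered_by_admin_source:
        return "red"      # no administrative source is thought to cover the topic
    if ons_has_access:
        return "green"    # data on the topic is currently available to ONS
    return "amber"        # data may exist, but ONS does not currently have access


# "hours of unpaid care" is named in the text as having no data source;
# the other two rows are hypothetical placeholders, not Figure 3 entries.
topics = {
    "hours of unpaid care": (False, False),
    "hypothetical topic A": (True, False),
    "hypothetical topic B": (True, True),
}
ratings = {name: availability_rag(*flags) for name, flags in topics.items()}
```

Note that a green rating here only records availability; as stated above, it does not mean the data is sufficient to produce outputs from administrative data alone.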
The table demonstrates that there are data sources which collect potentially useful information for most topics. However, more work is needed to determine whether the identified sources are of a suitable quality to produce these outputs. A preliminary assessment of the quality of these sources is presented in the Population, housing and household characteristics section.
Since last year’s report, we’ve assessed the statistical quality of two new data sources. We published our findings in data source overviews: one covers income and benefits data from the Department for Work and Pensions (DWP) and HM Revenue and Customs (HMRC), and the other covers the Personal Demographics Service (PDS) data from NHS Digital. We used these sources to produce new research outputs on income and improved estimates of the size of the population.
What we’re doing to improve the assessment
Now that the Digital Economy Act 2017 has passed into law, the UK Statistics Authority and ONS are putting together a new legally operational framework for sharing data. This will include developing new codes of practice and setting out high level principles that will guide the exercise of the new powers.
Annex B provides a list of administrative data sources by topic.
Once we start to obtain access to new sources we will be able to carry out further statistical research to improve our current methods and our ability to produce new statistical outputs.
As described earlier, some variables are not fully available on administrative data. For key topics, one solution might be to explore whether it is possible to collect these topics on administrative data, or if they are already collected, ensure this is done on a consistent basis. An example of this could be to improve the collection of data about ethnicity across the health service.
The assessment is now at amber, due to the Digital Economy Act 2017. The expected assessment of amber/green is on the basis that we can now access the required data sources and that they are of the expected and required quality to produce outputs.
9.1 Population estimates
The second set of Administrative Data Census Research Outputs on the size of the population was published in November 2016. Estimates were provided down to Lower Layer Super Output Area (LSOA) and by single year of age (at local authority level) for 2011 and 2015, in response to user feedback.
We also produced an improved version of the Statistical Population Dataset (SPD v2.0), which generally produced estimates that were more similar to the 2011 Census estimates than those produced from SPD v1.0. Some of the most notable improvements have been for the female population and children aged 5 to 14. Improvements are largely as a result of:
- including school census records to improve the coverage of children aged 5 to 14
- including additional records that have been linked between the NHS Patient Register (PR), the Department for Work and Pensions (DWP) Customer Information System (CIS), and Higher Education Statistics Agency (HESA) data using improved matching methodologies
- assigning records on the SPD to the most likely address using “activity” data from DWP benefit interactions and address moves recorded on the Personal Demographics Service (PDS)
Following feedback from users, we also improved the way we published the Research Outputs. We did this by using a new web based format, improved data visualisation tools and a SlideShare presentation to help users understand the story behind the small area statistics.
Following the publication, users were asked to provide feedback on the methods and these outputs. This feedback will be summarised and published later in the year. Feedback we received from users included the following:
- SPD v2.0 has shown an improvement in quality, which has the potential to be furthered by including “activity” data
- the production of population estimates for output areas (OA) would be useful
- we could work with local authorities, who have a detailed knowledge of their areas, to improve estimates and understanding of the data
- users were generally happy with the new release format
Users also suggested additional data sources that could improve the quality of estimates for particular population groups. These included the housing benefit register, electoral roll registrations and Driver and Vehicle Licensing Agency (DVLA) data.
What we’re doing to improve the assessment
The next set of Administrative Data Census Research Outputs on the size of the population is due to be published later in 2017. We aim to respond to users’ feedback and show developments to the methods and to produce outputs at output area level.
We will continue to explore the use of new ‘activity’ data for identifying and removing records from the SPD relating to individuals that are no longer usually resident in the population. This version of the SPD (v3.0) will be used in combination with a simulated Population Coverage Survey (PCS) drawn from 2011 Census data to produce coverage adjusted population estimates for 2011 by LA. This will take forward earlier research findings on adjusting for population coverage issues covered in Beyond 2011. We are currently working with DWP and HMRC to acquire further data for this purpose.
We are also developing plans to test a PCS, to commence later in 2017. In the next three years, we plan to implement a comprehensive testing strategy to explore appropriate sampling and data collection methods, response rates and estimation methodologies. Our aim is to have a PCS in place by 2020.
9.2 Households and families
The first set of Administrative Data Census Research Outputs on the number of occupied addresses (households) was published in February 2017. These examined the feasibility of producing household estimates from administrative sources. The Research Outputs provide estimates for the number of occupied addresses (households) at regional and local authority level for 2011, providing a comparison to the 2011 Census. They also provide estimates at regional level for 2015, making comparisons with the 2015 Labour Force Survey (LFS) household estimates.
The term “household” used for these outputs is based on the concept of occupied address from administrative sources. This is rather than the definition used in census and LFS estimates, which use a “shared facilities” definition of households, and from which communal addresses are excluded. It’s unlikely that administrative data sources will be able to provide information on the “shared facilities” definition. This is consistent with other countries that have moved to a register-based approach to census-taking, and we need to understand the impact of this for our users.
Following the publication users were asked to provide feedback on the methods and these outputs. This feedback will be summarised and published later this year. Feedback from users included the following:
- generally users were content that the definition of “occupied address” satisfied their requirements for statistics on households
- as well as time series data, users expressed a need for households cross tabulated with other characteristics such as income, employment, age and ethnicity
- suggestions on additional data sources that could potentially improve the quality of estimates included credit reference data, Council Tax data and electoral registrations
There is very limited information on families or relationships in administrative data sources. This provides challenges in producing household composition and family analysis based on ONS’s existing definitions. Survey information on household relationships would be needed alongside the administrative sources to meet these definitions.
What we’re doing to improve the assessment
We aim to improve these estimates by:
- developing methods to improve coverage in the estimates, using a similar estimation approach to that being tested for the population estimates, using the PCS
- identifying and removing communal establishments from the list of occupied addresses using AddressBase (see note 1); this will improve coverage by removing addresses that shouldn’t be included in our household estimates, and will also help us adjust for some of the definitional differences described earlier
- seeking users’ views on whether a definition of households based on “occupied addresses” would meet their needs; we are jointly hosting a user workshop later in July to understand the impact of the change in definition
We also intend to increase the breadth and depth of the “household” estimates in the next year. We’re aiming to publish Administrative Data Research Outputs on the number of households for small areas (sub-local authority) and on the size and composition of “households”.
9.3 Population, housing and household characteristics
The first Administrative Data Census Research Outputs on individual gross income distributions at local authority level were published for England and Wales. The report focused on the coverage and statistical quality of these initial income outputs with a view to improving this in future publications.
Users were asked for their feedback on the outputs. A brief summary of the feedback provided is included below:
- household income would be useful to provide insight into living standards, financial inequalities and deprivation
- income by local authority and Lower Layer Super Output Area (LSOA) would be the most useful, alongside income cross tabulated with other characteristics such as employment status, tenure and household type
- for some areas (notably London), the top income band (£60,000.01 plus) didn’t reflect the disparities of income in this category, with a notable percentage of the population earning over this threshold
- there was a user requirement for looking at variations in income over time
As mentioned in the earlier “Access to data” section, we’ve carried out an initial exploration into the availability and quality of administrative sources and their potential for producing outputs on population and housing characteristics. Figure 3 demonstrated the varied availability of administrative and other data sources in relation to census related topics. The next step is to understand how good these data sources are and how they could be best used to produce statistical outputs that meet user needs.
Assessing the quality of administrative data
Our initial investigation into the potential for administrative data to produce information on population characteristics assessed each source against recognised quality dimensions. These included relevance, accuracy, coverage, timeliness and comparability. This was challenging due to the limited information available on many administrative data sources, particularly those we don’t have access to.
The information we could find tended to be about the coverage, the data definitions and possible errors in the data. We’ve drawn out these key indicators to provide a framework for measuring the potential for the administrative sources to produce characteristics outputs. This is shown in Figure 4, which provides a summary of our assessment.
This diagram simplifies what is a much more complicated picture. Each topic has its own issues and opportunities and there is a subjective element to scoring each source but it does provide a tool to show our progress now and into the future as our knowledge grows.
The full quality assessment, including further details on the scoring system and how the quality framework will be used in the future, can be found in Annex C. We are interested to hear your views about how useful it is to present our findings in this way and would appreciate your feedback.
The further a topic appears towards the top right of the diagram (full coverage, quality “very close”, lightest shading), the more we believe we can produce outputs based on administrative data alone using survey data to help quality assure the estimates. The nearer the topic appears to the bottom left, the greater reliance we’ll need on survey data with the consequence of less frequent and/or less detailed outputs.
The majority of topics will sit somewhere in between these two areas. These topics may be available from some administrative sources, but there is likely to be insufficient data to produce direct administrative-based estimates and there will be a need for further support from survey or other data sources. Where the characteristic measured in the administrative source is very close to the concept we’re trying to measure but there is only partial coverage (partial coverage, quality “very close”), imputation methods are required to fill gaps in the data. Another approach would be to use surveys to measure and adjust for any coverage issues. Conversely, if the administrative data source has good coverage but imperfect quality (full coverage, quality “needs work”), an integrated approach with surveys could be used to adjust for this error. An example of this would be “general health”: information on health conditions collected across the health service could be used alongside survey responses of self-reported “general health” to provide a general health indicator.
Modelling approaches could be used when the information available in the administrative sources is either not of good quality or has imperfect coverage but is closely related to the characteristic (full or partial coverage, quality “poor” or “needs work”). We have applied two types of small area estimation methods: a model-based regression approach and a model-based structural approach. The first approach has been used to model unemployment estimates by combining unemployment benefit claimant counts from DWP with information on unemployment from the Labour Force Survey.
A model-based structural approach is useful when the administrative and other data sources have the same category structure and where the administrative source has limited coverage. The generalised structure preserving estimation (GSPREE) approach is currently being tested to produce ethnic group population estimates.
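To give a feel for what a structure preserving approach does, the sketch below implements the classical, simplified form of structure preserving estimation by iterative proportional fitting. It is not the generalised (GSPREE) method being tested, and it is not ONS code. It assumes an administrative table of areas by categories that supplies the association structure, and trusted area totals and category totals from another source. The function name and all numbers are invented for illustration.

```python
def spree(admin_table, area_totals, category_totals, iterations=100):
    """Rake admin_table so rows sum to area_totals and columns to category_totals.

    admin_table     -- list of rows (areas), each a list of category counts
    area_totals     -- trusted population total for each area
    category_totals -- trusted total for each category across all areas
    Assumes both sets of totals add up to the same grand total and that
    no row or column of admin_table sums to zero.
    """
    table = [[float(v) for v in row] for row in admin_table]
    for _ in range(iterations):
        # Scale each row (area) to match its trusted total.
        for i, row in enumerate(table):
            row_sum = sum(row)
            table[i] = [v * area_totals[i] / row_sum for v in row]
        # Scale each column (category) to match its trusted total.
        for j, target in enumerate(category_totals):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
    return table


# Hypothetical example: two areas, two categories.
admin_counts = [[30, 10], [20, 40]]   # partial-coverage administrative counts
estimates = spree(admin_counts, area_totals=[100, 200], category_totals=[120, 180])
```

The adjusted table matches both sets of trusted totals while keeping the relative pattern between areas and categories (the odds ratios) seen in the administrative counts, which is why limited coverage in the administrative source matters less than a shared category structure.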
There will be some characteristics traditionally provided by a census that can’t be obtained from administrative data. In this case, survey data alone would be needed to provide estimates.
The aim for an Administrative Data Census is to replicate as many census outputs as possible using administrative data and surveys. The idea is to produce these in a flexible way to produce aggregate cross tabulated estimates. This will be easier for those characteristics for which there are good quality administrative data sources available. For those characteristics with limited administrative sources, further estimation modelling and imputation methods will be required.
We will also be developing our requirements for a population characteristics survey that will supplement the production of characteristics outputs produced by an Administrative Data Census. This work is being done alongside the transformation of ONS’s surveys as part of the Data Collection Transformation Programme.
A further description of the methodological framework will be published later this year to accompany our future outputs.
What we’re doing to improve the assessment
We are investigating the use of Valuation Office Agency (VOA) data to produce statistics on housing characteristics (number of rooms and bedrooms). The VOA data also contains information on age of property and floor space (topics not previously included in the census). The findings from this work will be published shortly. In the next year, we will be exploring the feasibility of producing Administrative Data Census Research Outputs from data that are currently available to us, on:
- ethnic group population estimates by local authority
- household size by local authority
- qualifications for school/university age population
- travel to work outputs using mobile phone data
- further outputs about income (both household and personal income)
- mother’s income in the (tax) year before birth
The new outputs above are being produced to demonstrate the wider opportunities from using administrative data.
Notes for: Current assessment – Ability to meet information needs of users
1. AddressBase: an Ordnance Survey address product compiled from local authority, Ordnance Survey and Royal Mail address lists.
The assessment reflects the fact that until we can demonstrate that our methods can deliver outputs that meet the information needs of users, it’s challenging for an Administrative Data Census to be acceptable to stakeholders. We’ve demonstrated improvements in the quality of the population estimates which have been well received by users, as has the production of a first set of research outputs on income and occupied addresses (“households”).
We have engaged with a range of stakeholders about this work over the last year. This includes:
- a research conference held in June 2016 where we shared research updates, invited feedback on research plans and gauged confidence in our ability to move to an Administrative Data Census.
- continued engagement with local authority and regional user groups with whom we shared research findings on our recent Administrative Data Census Research Outputs on population, “households” and “income” estimates. Feedback from these sessions is summarised in a report to be released later this year.
- communicating progress through presentations delivered to, for example, the Government Statistical Service Methodology Symposium, British Society for Population Studies, RSS Stats User Forum, International Population Data Linkage Network, and European Conference on Quality in Official Statistics.
- biannual meetings that have been initiated between the devolved administrations to explore the use of administrative data in censuses
- representation on the United Nations Economic Commission for Europe (UNECE) taskforce on register based and combined censuses; part of the taskforce’s work is to agree on a set of guiding principles and a common framework for register based and combined censuses, which included ONS’s ambition to move towards an Administrative Data Census.
- working closely with other countries that are also considering the potential for moving away from a traditional, 10-yearly census; for example, Canada and New Zealand are doing similar research.
We are working closely with data suppliers to understand and improve statistical quality issues identified in the data. The Data Suppliers Group has been expanded to include members from key departments across Whitehall and the devolved administrations. This group is a forum for discussing issues relating to data sharing, sharing research and building stronger relationships.
We’ve also set up statistical quality working groups with a number of supplier departments which meet several times a year. The aim of these working groups is to share knowledge on the quality of the data sources and to identify mutual benefits to improve the overall quality of the data.
To safeguard the privacy of individuals, ONS have adopted the “five safes” framework to provide assurance that the data collected from individuals is only used for research using the following principles:
- data are handled by people who have been trained and accredited
- data are only used for research projects that deliver clear public benefits
- data are stored in a secure setting
- all outputs are checked and confirmed as non-disclosive before they are made available
- data are de-identified and have names, addresses and any other identifiable variables removed beforehand
The “five safes” are safe people; safe project; safe settings; safe outputs; safe data. They are designed to address concerns raised by the public during research into the public acceptability of sharing their information. The common principles of the framework are used across government and academia in the UK and internationally.
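As an illustration of the "safe data" principle only, the de-identification step can be sketched in a few lines of Python. The field names and the example record are hypothetical; the report does not specify which variables are removed or how, so this is a minimal sketch rather than a description of ONS processing.

```python
# Hypothetical list of directly identifying fields (an assumption for
# illustration; the report only says names, addresses and other identifiable
# variables are removed).
IDENTIFYING_FIELDS = {"name", "address", "nhs_number"}


def de_identify(record: dict) -> dict:
    """Return a copy of the record without directly identifying fields."""
    return {key: value for key, value in record.items()
            if key not in IDENTIFYING_FIELDS}


# Made-up example record.
record = {
    "name": "A Person",
    "address": "1 Example Street",
    "nhs_number": "0000000000",
    "age_band": "30-34",
    "local_authority": "Example LA",
}

safe_record = de_identify(record)
print(safe_record)
```

Only the non-identifying variables (here, age band and local authority) remain available for research.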
The National Statistician’s Data Ethics Advisory Committee (NSDEC) was established in 2015 to advise the National Statistician that the access, use and sharing of public data, for research and statistical purposes, is ethical and for the public good. The proposals for the personal “income” Administrative Data Census Research Outputs from combined Pay As You Earn (PAYE) and benefits data were considered by the committee in October 2016 before they were published in December.
Parliament is likely to be concerned about the usability of the outputs, protecting the confidentiality and security of data, and public acceptability. Therefore all the evidence provided across the stakeholders is relevant. Additionally, the next section outlines what is being done to deliver value for money from an Administrative Data Census which is of high interest to Parliament.
What we’re doing to improve the assessment
We plan to further improve the methods and expand the range of outputs to demonstrate our ability to meet the information needs of users. We’ll be submitting our methods and research for independent review later this year. The outcome from this review will be published.
Plans to further understand the requirements of users are underway. We are jointly hosting a user workshop in July on household information requirements. This will have a particular focus on understanding the impact of a change in household definition for users that would result from a move to an Administrative Data Census.
The relationship forged through the Data Suppliers Group will be vital to unlocking data across government. Following the Digital Economy Act 2017, work will need to be done to put in place the practical arrangements for sharing data with approved safeguards.
Proposals for new Administrative Data Census ‘Income’ Research Outputs will be taken to NSDEC for guidance on the ethical considerations for the use of administrative data. Where appropriate, future research on the use of administrative data in combination with survey and other data sources will also follow this process. We will conduct further public acceptability research to understand the public’s concerns around the use of their data. In 2023, a public consultation will gather views about an Administrative Data Census.
As outlined previously, the work that is being done to improve the assessment with regards to users, data suppliers and the public should also improve the acceptability of an Administrative Data Census to Parliament.
Evidence and what we’re doing to improve the assessment
In the Beyond 2011 Programme, we published a Summary of benefits of census information. We will be reviewing the benefits associated with moving to an Administrative Data Census.
This report assesses our current progress towards an Administrative Data Census. In it, we have described our plans to improve this assessment in the future, and we have identified the challenges we must overcome so that we can compare our outputs with those from the 2021 Census.
Despite these challenges, moving towards an Administrative Data Census offers a number of opportunities. These include producing key census outputs on a more timely and frequent basis, potentially for less money than the current system. It also offers opportunities to produce new outputs that aren’t available through the current approach such as income, that could better meet user needs.
This year’s assessment has focused on what we have been able to produce with the data we have. We’ve investigated the potential for administrative data sources to produce population, housing and household characteristics. Work to build on this will gather pace as the new powers from the Digital Economy Act 2017 facilitate better access to the data sources we need. Next year’s assessment will focus on the main uses of census-related information and how we are placed to produce outputs that meet these needs.
Over the next year, we’ll:
- publish an expanded range of Administrative Data Research Outputs and seek feedback from users
- continue progress on acquiring administrative data and understanding the statistical quality of the data that are accessed
- submit our methods for External Assurance
- continue to engage with users and data suppliers through the Census Advisory Groups, Data Supplier Groups and other user working groups
We’d appreciate it if you could complete our feedback survey to help us develop our assessment in the future.
There are four key challenges to delivering an Administrative Data Census:
- Accessing the range of data needed to produce outputs that are currently provided by the ten-yearly census
- Linking together lots of independently collected data accurately whilst preserving the privacy and security of the data
- Developing methods that can transform the linked data into outputs that meet the needs of users
- Making an Administrative Data Census acceptable to key stakeholders, for example by providing value for money, and providing reassurance that data will be kept safe through this approach
To address these challenges, the following would need to be in place.
Rapid access to existing and new data sources
Criteria for assessment – access to data
To maximise the breadth and quality of statistics that could be provided by an Administrative Data Census, ONS would need to have rapid access to new and existing data sources from across government. This would also need to extend to other sources of existing data that would add value. ONS would also need to be consulted before changes are made to the administrative data that may affect the quality and stability of outputs from an Administrative Data Census over time. The Digital Economy Act 2017 offers a solution to these requirements.
The ability to link data efficiently and accurately
Criteria for assessment – ability to link
An Administrative Data Census would involve linking together multiple administrative data sources and surveys to produce statistics on the range of topics that the census currently includes. This isn’t a simple task. Individuals in the UK don’t have a single unique reference number that is carried across all government-held data, making this linkage challenging. For example, data about tax and benefits from DWP and HMRC use the National Insurance Number, while GP Register data uses NHS number and School Census uses a unique pupil reference number.
We need methods that enable us to link together these independent data sources accurately to enable the production of high quality statistics. An additional challenge is to do this while preserving the privacy and security of the data.
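To make the linkage challenge concrete, the sketch below links two made-up datasets that carry different identifiers by deriving a common "match key" from attributes they share. Everything here is an assumption for illustration: the field names, the salt, the choice of surname, date of birth and postcode as matching variables, and the use of a salted SHA-256 hash so that identifying values are not compared in the clear. It is not a description of the methods ONS uses.

```python
import hashlib

SALT = "example-salt"  # hypothetical secret value


def match_key(surname: str, dob: str, postcode: str) -> str:
    """Build a hashed key from normalised matching variables."""
    normalised = "|".join([
        surname.strip().lower(),
        dob,
        postcode.replace(" ", "").upper(),
    ])
    return hashlib.sha256((SALT + normalised).encode("utf-8")).hexdigest()


def link(left: list, right: list) -> list:
    """Return (left id, right id) pairs where the match key is unambiguous."""
    index = {}
    for rec in right:
        key = match_key(rec["surname"], rec["dob"], rec["postcode"])
        index.setdefault(key, []).append(rec)
    links = []
    for rec in left:
        key = match_key(rec["surname"], rec["dob"], rec["postcode"])
        candidates = index.get(key, [])
        if len(candidates) == 1:  # skip missing or ambiguous matches
            links.append((rec["id"], candidates[0]["id"]))
    return links


# Made-up records: one source keyed by a tax-style id, the other by a GP-style id.
tax_records = [
    {"id": "TAX-001", "surname": "Smith ", "dob": "1980-01-31", "postcode": "ab1 2cd"},
    {"id": "TAX-002", "surname": "Jones", "dob": "1975-06-15", "postcode": "EF3 4GH"},
]
gp_records = [
    {"id": "GP-101", "surname": "smith", "dob": "1980-01-31", "postcode": "AB1 2CD"},
    {"id": "GP-102", "surname": "Jones", "dob": "1975-06-15", "postcode": "ZZ9 9ZZ"},
]

links = link(tax_records, gp_records)
print(links)
```

The second pair of records is not linked because the postcodes differ, which shows why exact matching alone tends to miss people who have moved or whose details are recorded inconsistently.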
Methods to produce statistical outputs of sufficient quality that meet priority information needs of users
Criteria for assessment – ability to meet information needs of users
We need to deliver methods that can transform the linked administrative and survey data into statistical outputs that meet the priority information needs of users. This means providing statistics on the topics users need, at the right level of detail (for example, for small areas), and at the right quality. In response to a public consultation in 2013, users told us we need to develop statistical methodologies that:
- provide robust estimates about the size of the population and the number of households
- provide estimates about population characteristics at a point in time to allow similar areas to be compared with one another
- provide the granularity of information that users need to measure change over time (for example being able to spot changes over a decade in unemployment rates by ethnicity for small areas)
Another important area is developing the detail of the survey design that will be needed and the methods to model from surveys and administrative data.
Acceptability to stakeholders (users, suppliers, public and Parliament)
Criteria for assessment – acceptability to stakeholders
In order to successfully move to an Administrative Data Census in the next decade, users of the data, data suppliers, the public and Parliament need to be convinced that this approach meets their needs. Acceptability to the four main stakeholders (users, suppliers, public and Parliament) will be influenced by ensuring that:
- main information needs of users are met
- data are held, processed and linked while protecting privacy and confidentiality and maintaining security safeguards
Value for money – including benefits and costs
Criteria for assessment – Value for money
An Administrative Data Census will need to demonstrate that it provides value for money compared with a 10-yearly census. This means showing either that it can deliver the benefits that users get from a 10-yearly census at a lower cost, or that the cost saving is sufficient to justify lower benefits. For example the Administrative Data Census may not be able to deliver all the outputs that a 10-yearly census provides but it may include additional benefits such as more timely, frequent data and new outputs that are not currently provided by a 10-yearly census. This is the key trade-off that will need to be taken into account.
This annex provides an update on data that we have obtained access to, and data that we are focusing on next.
Data source overviews have been published for a number of sources. These include statistical quality assessments which were carried out during the Beyond 2011 Programme on the key data sources which we now have access to. These reports covered primarily demographic and geographic variables with little characteristic information. The findings were written up as source reports and published on our website. The reports are:
- Administrative Data Sources Report: NHS Patient Register (S1)
- Administrative Data Sources Report: Electoral Register (S2)
- Administrative Data Sources Report: The English School Census and the Welsh School Census (S3)
- Administrative Data Sources Report: Higher Education Statistics Agency: Student Record (S4)
- Administrative Data Sources Report: CIS combined (S5)
Data sources that we are currently focusing on
We regularly review our priorities for the datasets we want to pursue and use. This means that some datasets referenced in the 2016 annual assessment publication may have made little progress this year.
Personal level income and benefit information
Personal level income and benefit administrative data, if of sufficient statistical quality and legally accessible, will provide income, household and “activity” data. We have access to a subset of variables within these data from HM Revenue and Customs (HMRC) and the Department for Work and Pensions (DWP). This data includes some income and benefits variables for the population receiving benefits (including Universal Credit, Personal Independence Payments and Child Benefit) and tax credits, as well as information from the Pay As You Earn (PAYE) system (this excludes self-assessment and self-employment).
A source overview has been published for these datasets. The data has been used as an indicator of activity in Statistical Population Dataset (SPD) V2.0 and to produce the Income Research Outputs. We received an updated version of these datasets in December 2016. This will allow research to continue into how this data can be best used and enable us to refine our requirements for a more detailed data supply.
Access to the full data is a priority. The Digital Economy Act 2017 may help us to access the more detailed variables required and some specific subsets, such as self-assessment returns.
All education dataset for England
We are working with the Department for Education (DfE) to develop a longitudinal education dataset using the existing National Pupil Database, further education and higher education data. (Previously this project was owned by the Department for Business, Innovation and Skills.) The dataset is expected to include variables on attainment and qualifications for everyone born from 1985 onwards through their secondary, further and higher education as well as a range of socio-demographic variables. Coverage is for England only at present.
The data can potentially be used to improve estimates of specific population groups such as schoolchildren and students. Records would be used as an indication of activity, to determine where people are resident in the country and to identify patterns of movement during the period of study. The data are also expected to help with estimating the population by characteristics such as age, sex, ethnic group and qualifications held.
We compared a first version of this dataset (with a subset of variables) with the 2011 Census in order to assess the coverage and statistical quality of the matching used to create the data. The findings of this analysis have been provided to DfE and will inform further development of the dataset. A legal assessment is being completed to establish how the full dataset can be accessed. Our aim is to publish the first qualification research outputs in late 2017. This is dependent on accessing an initial supply which is suitable for use.
We have access to demographic data from health datasets from NHS Digital for population statistics purposes. This data comes from the Personal Demographic Service (PDS).
A source overview was published on the PDS – movers’ extract. These data show all changes in postcode that took place within a set period of time. These data can be used as evidence of activity, from patients interacting with NHS systems through new registration or by updating their address or other details. This activity data was used within SPD V2.0.
We now also have access to a subset of the PDS stock extract. This covers similar information to the GP Patient Register. We’re working with NHS Digital to understand what additional information can be provided through the PDS stock extract and to understand how it could be used in the future to improve our research outputs.
Access to “activity” and demographic data from Hospital Episode Statistics (such as ethnicity, but not including clinical information) is a priority for 2017.
Currently we have focused on health data for England only. We will need to expand this to include Welsh health data in the future.
Property attribute data
We have access to data from the Valuation Office Agency (VOA). The data contains variables on characteristics of addresses such as property type (for example detached, terrace) and size of property. These data are currently being reviewed to assess whether or not they can be used to produce estimates for the number of rooms and size of property.
Property website data
We have acquired two datasets from the Zoopla website about properties for sale or for rent:
- We bought cleaned Zoopla data for seven local authorities from WhenFresh, a data analytics company
- We acquired Zoopla data from the Urban Big Data Centre for the UK free of charge but this dataset has only had minimal data cleaning undertaken.
Using the property description from these datasets will help to add insight about properties, particularly those which are hard to count or access, such as gated communities or caravan homes (including whether the homes are more likely to be holiday or residential homes). As both datasets provide details about properties over time, they can provide an indication about the churn of people moving into or out of an area which could help to improve population estimates.
Council tax information
Council tax is being considered to indicate the extent of population churn at an address and to indicate activity at an address. Data on individuals could provide evidence on their location which would be used to improve population research outputs. This year we have been working with a few local authorities to test the legal basis for sharing data. We’ve received council tax data from two local authorities and are currently assessing its potential use.
Mobile phone data
Mobile phones transmit data on their location back to mobile network providers. These location measurements have the potential to create a variety of estimates relating to population densities and flows. We’ve obtained a sample of commuting flows derived from the movement patterns of mobile phone users. We’re comparing them against Census travel to work data and other official data. The data provided to us only reveals commuting flows greater than 15 commuters so as to be non-disclosive.
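The disclosure rule described above, where only flows of more than 15 commuters are revealed, amounts to a simple threshold filter. The sketch below shows that rule applied to made-up origin and destination counts; the area names and numbers are hypothetical, and the data supplier's actual processing is not described in this report.

```python
DISCLOSURE_THRESHOLD = 15  # flows must be greater than this to be revealed


def non_disclosive_flows(flows: dict) -> dict:
    """Keep only origin-destination flows above the disclosure threshold."""
    return {pair: count for pair, count in flows.items()
            if count > DISCLOSURE_THRESHOLD}


# Made-up commuting flows keyed by (origin, destination).
flows = {
    ("Area A", "Area B"): 240,
    ("Area A", "Area C"): 15,   # not greater than 15, so suppressed
    ("Area B", "Area C"): 16,
    ("Area C", "Area A"): 3,
}

published = non_disclosive_flows(flows)
print(published)
```

One consequence of this kind of suppression is that totals built from the released flows will undercount areas with many small flows, which matters when comparing against Census travel to work data.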
Administrative data sources we plan to pursue access to in the future
Feedback from the Census topic consultations and users (through feedback to the Administrative Data Research Outputs) has pointed to various administrative data sources for further investigation. These will be analysed and investigated based on their potential for meeting our aspirations including replicating census outputs.
Further health data
Further health data collected by the NHS could provide additional activity information (such as receipt of prescription) and non-health characteristics (such as language) information. These could be used to improve the population and characteristics outputs.
Vehicle and driver data
We have held an initial requirements meeting with the Driver and Vehicle Licensing Agency (DVLA) to discuss accessing vehicle and driver administrative data. These administrative data may allow the production of information about the number of cars or vans in households (although there would be limitations related to how business cars are recorded) which assists central and local government with transport and new housing planning. The data may also improve the administrative data population statistics research outputs (particularly among young males who may be less likely to update other administrative sources). Due to a change in priorities, less work has been done on accessing these data.
Electoral register data was assessed in 2013 as not sufficient to be used as the sole source of information for the production of population and small area socio-demographic statistics. In June 2014 there was a move to individual electoral registration; we continue to work with the Electoral Commission to understand these changes and assess whether the electoral register data may be used in future.
Data from TV Licences may be another source which could provide additional information such as churn at an address or “activity” data (when individuals transfer their licence) as well as information on addresses which may not be captured on other sources (such as caravan parks).
Other datasets for consideration:
- Home Office data – exit checks
- Land Registry data
- Stamp Duty data from HMRC
- data to measure private sector housing tenure
- data to measure public sector housing tenure
- Royal Mail forwarding address data
- dwelling stocks and Department for Communities and Local Government (DCLG) and Welsh Government social housing returns
- electricity meter data
- business Valuation Office Agency (VOA) data
- destination of leavers from Higher Education Survey
Census topics by availability, and specific data sources
Looking at which data sources collect information on a particular topic is useful for targeting our efforts to access the right data sources. Some data sources could cover a range of population characteristics; for example, ethnicity and religion are collected across education data.
The table below shows the data sources which are available for each census topic.
Please note: datasets in bold include more than one census topic.
Table 1: Census topics by availability, and specific data sources
|Census topic||Some data available to ONS||Some data available but ONS doesn’t currently have access|
|Demographics||Household Composition||Single Housing Benefit Extract (SHBE)|
|Marital or Legal Partnership Status||Births, Marriages and Deaths Registers||Marriage Allowance (HMRC)|
|Single Housing Benefit Extract (SHBE): Lone Parent Indicator|
|Education||Qualifications held||All Education Dataset for England (AEDE)|
|Higher Education Statistics Agency Student Record (HESA)|
|Individualized Learner Record (ILR)|
|The Universities & Colleges Admissions Service (UCAS)|
|Term Time Address||Higher Education Statistics Agency Student Record (HESA)||All Education Dataset for England (AEDE)|
|Ethnicity, Identity, Language and Religion||Ethnic group||English and Welsh School Census||Customer Information System (CIS)|
|Higher Education Statistics Agency Student Record (HESA)||Hospital Episode Statistics (HES)|
|Patient Demographic Service (PDS)|
|Work and Pensions Longitudinal Study (WPLS)|
|Citizenship (passport held) or Nationality||Migrant Worker Scan||Citizenship Data (Home Office, Immigration Statistics)|
|Central Reference System (CRS)- Visa Data|
|Higher Education Statistics Agency Student Record (HESA)|
|Main languages used||English School Census|
|Welsh School Census|
|English Language Proficiency||Citizenship Data|
|Individualized Learner Record (ILR)|
|Patient Demographic Service (PDS)|
|Welsh language||Customer Information System (CIS) -noted preference for Welsh documents|
|Religion||Higher Education Statistics Agency Student Record (HESA)|
|Health||Amount of unpaid care provided||Carers Allowance- Department for Work and Pensions (DWP)||Hospital Episode Statistics (HES)|
|Disability and Long Term Health Conditions||National Benefits Database (NBD)||Hospital Episode Statistics (HES)|
|Housing||Accommodation Type||Land Registry (LR)||Housing websites including Zoopla|
|Valuation Office Agency (VOA)|
|Number of Rooms||Valuation Office Agency (VOA)||Housing websites including Zoopla|
|Number of Bedrooms||Valuation Office Agency (VOA)||Housing websites including Zoopla|
|Second Residence||Council tax data|
|Self Containment of Accommodation||Valuation Office Agency (VOA)|
|Tenure and Landlord (if renting)||Single Housing Benefit Extract (SHBE)||Housing websites eg Zoopla|
|Valuation Office Agency (VOA)|
|Type of Address||Valuation Office Agency (VOA)|
|Labour Market||Employed (including students, excluding self employed)||Pay As You Earn (PAYE)- HMRC||Higher Education Statistics Agency Student Record (HESA)|
|Pay As You Earn (PAYE)- HMRC – to include pension earnings breakdown|
|Self Assessment Data – HMRC (includes self employed)|
|Hours Worked||Pay As You Earn Data (PAYE) - HMRC|
|Industry||Interdepartmental Business Register (IDBR)||Pay As You Earn (PAYE)- HMRC- IDBR linked to PAYE|
|Economically Inactive, retired||National Benefits Database (NBD)||Pay As You Earn (PAYE)- HMRC – to include pension earnings breakdown|
|Economically inactive, unable to work i.e. long term sick||National Benefits Database (NBD)||Universal Credit (DWP)|
|Unemployed (including students)||National Benefits Database (NBD)|
|Year Last Worked||Pay As You Earn (PAYE)- HMRC – to include pension earnings breakdown|
|Migration||Country of Birth||Birth registers||Central Reference System (CRS)- Visa Data|
|Patient Demographic Service (PDS)|
|Internal/International Migration (including address one year ago)||Central Reference System (CRS)- Visa Data|
|Migrant Worker Scan|
|Patient Demographic Service (PDS)|
|Travel||Number of Cars/Vans||Driver and Vehicle Licensing Agency (DVLA)|
|“Activity”||Interacting with a system from which data is taken||Child Benefit (HMRC)||Department for Business, Energy and Industrial Strategy (BEIS)/Utility Companies|
|English and Welsh School Census||Driver and Vehicle Licensing Agency (DVLA)|
|Higher Education Statistics Agency Student Record (HESA)||Hospital Episode Statistics (HES) Individualised Learner Record (ILR)|
|National Benefits Database (NBD)||Personal Independence Payments (DWP)|
|Patient Demographic Service (PDS)||Self Assessment (DWP)|
|Pay As You Earn (PAYE)- HMRC||Universal Credit (DWP)|
|Single Housing Benefit Extract (SHBE)|
|Tax Credits (HMRC)|
|Source: Office for National Statistics|
The framework is built on the quality dimensions common to ONS outputs, which are based on those described by the United Nations Economic Commission for Europe (UNECE): Relevance, Accuracy, Timeliness, Accessibility, Interpretability, and Coherence. We have used these to draw out several “indicators”, which aim to capture the work and limitations involved in estimating characteristics from admin data. The dimensions and the corresponding indicators are described in Table 2.
Table 2: Current structure of the quality framework for Admin data-based estimates for characteristics
|Dimension||Indicator: Admin data application||Definition of Indicator|
|Relevance||Data Definition||Does the definition provided by the admin data meet user needs?|
|Accuracy||Coverage||Does the available admin data capture the population of interest?|
|Linkage||Can the admin data be linked?|
|Errors in Data (Missing data and incorrect entries)||To what extent are there errors within records: specifically, missing data and incorrect entries?|
|Delays in updating data source||If individuals “join” an admin dataset late, or if data collectors are slow to “clean” individuals no longer in the population of interest from records, then the admin data may not accurately represent the target population for a characteristic.|
|Timeliness||Frequency||How often do we receive admin data from the data supplier?|
|Time between event and available outputs||Time from when data are collected to when they are “ready-to use”: includes the time between collection and receipt by ONS, plus the time required to process the data and produce outputs.|
|Accessibility||Collected & available||Does admin data exist for this characteristic (is it collected)? Is existing admin data available to the ONS for this characteristic?|
|Interpretability1||Potentially relating to the availability of supporting metadata||This might include: Do we have the required information for interpreting the data correctly?|
|Coherence||Comparability over space||Does the quality of the other indicators vary over geography?|
|Comparability over time||Does the quality of the other indicators vary over time? (e.g. data collection may be discontinued, replaced with new sources or changes in policy, according to the needs of services)|
|Source: Office for National Statistics|
|1. This indicator is still not fully defined, which is a reflection of our being at a relatively early stage of exploration. We will define this indicator more fully in time.|
The UK Statistics Authority has developed a quality assurance toolkit that requires users, producers and suppliers of administrative data to have a clear understanding of the quality of administrative sources. The toolkit creates a shared understanding across stakeholders from collection to dissemination and use. ONS is developing guidance on how to assess the quality of administrative data sources for a variety of specific purposes; we will develop our framework alongside this work from collection to dissemination and use.
Developing the framework
Our approach is similar to that taken by Statistics New Zealand, who made a framework to describe their progress towards admin data-based estimates for characteristics in 2016. Key features include:
- The framework is for outputs about population and housing characteristics rather than individual data sources. So while individual data sources are considered, overall statements about quality are about the potential outputs.
- Quality is scored for each indicator in a subjective way, relative to other characteristics.
- Findings are visualised on a chart that compares coverage with a summary of the other quality indicators. This emphasis on coverage reflects the importance of this indicator when using admin data for estimation.
Evaluating the characteristics against the framework
Characteristics are scored for each indicator, on a scale of 1 to 5. The criteria used to score indicators for the current Assessment are provided at the end of this section.
Scoring criteria are based on two principles:
- Objective definitions for "poor" (1) and "excellent" (5), where possible.
- Comparison of indicators between characteristics, using subjective judgement (e.g. "coverage is better overall for this characteristic than that characteristic").
The subjectivity of scoring reflects the limitations inherent to exploratory research: fully objective assessments of quality are not possible, or appropriate, at this early stage. Instead, the framework is pragmatic and useful for supporting and communicating our work as it develops.
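The idea of charting coverage against a summary of the other quality indicators can be sketched as follows. The indicator names follow Table 2, but the scores are made up and the use of a simple mean as the "summary" is an assumption; the report does not state how the other indicators are combined.

```python
from statistics import mean


def chart_point(scores: dict) -> tuple:
    """Return (coverage score, summary of the other indicator scores).

    Scores are expected on the 1 (poor) to 5 (excellent) scale.
    The summary is a plain mean, which is an illustrative assumption.
    """
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name} must be between 1 and 5")
    others = [score for name, score in scores.items() if name != "coverage"]
    return scores["coverage"], mean(others)


# Hypothetical scores for a single characteristic.
example_scores = {
    "coverage": 4,
    "data_definition": 3,
    "linkage": 5,
    "errors_in_data": 4,
}

point = chart_point(example_scores)
print(point)
```

Keeping coverage on its own axis mirrors the emphasis the framework places on that indicator, while the second value gives a single position for everything else.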
We intend the framework to fulfil two roles: to describe our progress towards producing admin data-based estimates for characteristics, including a snapshot of where we are now; and as a tool to support future Admin Data Census research towards this goal, for example, by helping us to locate gaps in our knowledge and decide how to address them, resulting in better quality estimates.
Using the framework to assess ONS’s progress towards an Administrative Data Census
Figure 1 provides an at-a-glance summary of quality across characteristics. The information available to us was not sufficient to support scores for every indicator in this figure.
Instead, we have assessed the accessibility, accuracy (coverage, linkability and level of error), and relevance of the data sources for each characteristic.
Developing the quality framework for future use
The framework is a useful way for us to demonstrate progress towards our twin goals of producing Administrative Data Census outputs for comparison with the 2021 Census, and of producing new outputs.
It will provide a starting point for discussion with our stakeholders on the potential of admin data to produce characteristics outputs that better meet user needs. It will also help us to identify gaps in our knowledge or quality issues in the administrative sources and to articulate these concerns with our data suppliers and users. Furthermore, the framework will allow us to identify potentially desirable trade-offs between the indicators, for example, producing outputs more frequently, but permitting more error.
In the future we hope to explore the potential of combining admin data with other data sources to meet user needs. As our research advances we will require our quality framework to develop, so that it can continue to articulate our findings and so support discussion with users. Our intention is to develop the framework, through its use and by seeking feedback, into a pragmatic tool for supporting our progress towards a better Census.
Table 3: Scoring Chart for this year’s Assessment
|Indicator (Dimension)||1 (poor)||Intermediate||5 (excellent)|
|Data definition (Relevance)||Very poor or no match on dataset and therefore cannot answer the same question as the desired output||Plausibly related variables e.g. subjective general health vs measured health||Data answers exactly the same question as the respective census question or desired output|
|Coverage (Accuracy)||Population covered is not similar to the target population for the desired output at all||Doesn't completely cover target population or over-covers target population, for example exclusions like communal housing etc.||The population covered in admin data is the same as the target population|
|Error in Data (Accuracy)||Lots of incorrect entries and missingness||||No ‘mistakes’ in data; no/few missing data|
|Linkage (Accuracy)||Cannot link to other data sources: no potential linking variables identified||Variables that are likely to allow linking have been identified||Have shown that linking can be done|
|Timeliness||Data has been received in a one-off dataset, with no regular updates.||Data is received regularly, but there is a large delay between collection and it becoming ready to use||Data can be accessed by the ONS frequently, and with a short time between collection and being ready to use.|
|Accessibility||No data sources identified, or data source identified but with no process underway to obtain it||Data sources identified and in the process of obtaining||The data is available in an appropriate digital format to be used by the ONS.|
|Source: Office for National Statistics|