1. Overview of administrative data

This article outlines the qualitative methodology adopted by the Methodological Research Hub in the Methodology and Quality Directorate at the Office for National Statistics (ONS) to gain greater insight into, and understanding of, administrative data quality for our statistical and research purposes. It discusses our approach to conducting qualitative interviews, the findings the interviews generated about administrative data quality, and recommendations for future work and research.

In the Methodology and Quality Directorate at the ONS, we aim to optimise the collection of data to better inform our society by producing statistics for the public good. We are carrying out research to explore alternative data sources for use within our official statistics. One of these types of data is "administrative data".

Administrative data are data that have been collected during the operations of an organisation. The government produces a large amount of administrative data, which are a valuable resource if used correctly. There are legal gateways that allow accredited and approved researchers to access and link administrative data for research and statistical purposes. Certain criteria must be met for this to happen, including the assurance that a person cannot be identified from the information disclosed for research and statistics.

Administrative data are generally not collected for the sole purpose of producing statistics. This can lead to challenges when using them for that purpose, which David Hand describes in his paper, Statistical challenges of administrative and transaction data.

At the ONS, we want to understand how inclusive and representative administrative data are when we use them for our statistical purposes. We do this to understand what statistical adjustments or additional sources may be needed to ensure the resulting statistics are of high quality.

The research in this paper contributes towards the UK's National Statistician's Inclusive Data Taskforce (IDTF) implementation plan. In October 2020, the UK's National Statistician established the IDTF to ensure inclusivity in UK data in a broad range of areas, including protected characteristics, areas associated with sustainable development goals, and equalities. The taskforce identified 46 recommendations, aligned to eight inclusive data principles, which are required to ensure UK data and evidence is inclusive.

The research presented within this report falls within principle six to "broaden the range of methods that are routinely used and create new approaches to understanding experiences across the population of the UK":

"ONS is researching the coverage of specific administrative datasets to better understand how certain groups within the population are represented. Qualitative research methods are also being developed to give us a greater insight into any inclusivity issues for such data sources."

The IDTF highlighted that, for administrative data to be used effectively and responsibly for statistical and research purposes, analysts require an understanding of the extent to which it can be considered inclusive and representative. Throughout this paper, we will refer to both inclusivity and representativeness.

Inclusivity is the extent to which groups or individuals are included in administrative data. An example of a lack of inclusivity would be members of a group or groups not being present on administrative data. There may also be groups where there is over-coverage. Over-coverage may happen when groups are counted in administrative sources when they should not be present. These can include groups that have left the UK (emigrated), but remain registered with public services (for example, services provided by government organisations) that feed into administrative data.

Representativeness is the extent to which administrative data reflect the characteristics of groups or individuals. Examples of a lack of representativeness include people being present in administrative data without their characteristics being recorded, or being present but classified differently from how they should be recorded.
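As a rough illustration of these definitions, the sketch below compares a hypothetical administrative register with a hypothetical reference population. The identifiers, names and values are all invented for illustration, and this is not a method described in this article; it only shows how under-coverage, over-coverage and a missing characteristic differ from one another.

```python
# Hypothetical person identifiers; all values are invented for illustration.
reference_population = {"p1", "p2", "p3", "p4"}  # people who should be counted
admin_register = {"p2", "p3", "p4", "p5"}        # people present on the source

# Hypothetical recorded characteristic; None means it was not recorded.
recorded_ethnic_group = {
    "p2": "Black African",
    "p3": None,
    "p4": "Polish",
    "p5": "Black Caribbean",
}

# Lack of inclusivity: in the population but missing from the source.
under_coverage = reference_population - admin_register

# Over-coverage: on the source when they should not be (for example, emigrated).
over_coverage = admin_register - reference_population

# One form of lack of representativeness: present, but characteristic not recorded.
missing_characteristic = {
    person
    for person in admin_register & reference_population
    if recorded_ethnic_group.get(person) is None
}

print(sorted(under_coverage), sorted(over_coverage), sorted(missing_characteristic))
```

In practice a complete reference population is rarely available, which is one reason qualitative evidence on how people register with services can be useful.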

The Methodology and Quality Directorate at the ONS has adopted an innovative approach to help us understand inclusivity and representativeness in administrative data for our statistical purposes. There are two main innovations in the research presented in this paper. The first is that we looked at administrative data inclusivity and representativeness at the data collection stage. We gained this intelligence by collecting information from members of the public and charities to understand how some groups enter the information that we later use as administrative data. The second is our choice of methods: using a qualitative approach deepened our understanding of administrative data inclusivity and representativeness. To our knowledge, this qualitative approach has not previously been used in research to explore inclusivity and representativeness in administrative data at the data collection stage.

Our research included understanding the inclusivity and representativeness of three separate groups. A different sample was produced for each group. The three groups we researched were:

  • individuals from selected ethnic groups (Black African, Polish and Eastern European, and Black Caribbean)

  • individuals who had emigrated

  • individuals from charities who have knowledge and insight of people experiencing homelessness

We aim to:

  • investigate whether we can gain greater insight and understanding of administrative data by exploring inclusivity and representativeness for statistical and research purposes using qualitative approaches

  • explore whether qualitative findings can provide insights into how data are collected, and into potential alternative data sets or methods to use

If you have any questions, comments or would like to collaborate with the Methodological Research Hub, please contact us at Methods.Research@ons.gov.uk.


2. Methods

This research is the first iteration of using qualitative methods at the data collection stage of the administrative data journey to understand inclusivity and representativeness. From the project's inception, the research was guided by a working group of demographic topic experts. Decisions about which groups to focus on were based on input from these topic experts, research, and Office for National Statistics (ONS) needs.

At the ONS, we use the recommended definition of an emigrant as being "a person who moves to a country other than that of his or her usual residence for a period of at least a year (12 months), so that the country of destination effectively becomes his or her new country of usual residence".

We took a broad approach to the definition of homelessness and explained to participants that we are interested in the experience of people in different and fluctuating states of homelessness. These include:

  • those rough sleeping

  • those living in unconventional accommodation, such as living in cars or sheds

  • those in temporary accommodation, such as at a bed and breakfast, night shelters, and hostels

  • those temporarily staying with others, such as sleeping on someone else's floor or sofa

  • other unsuitable housing situations  

Recruitment was conducted through a purposive sampling approach. We conducted qualitative interviews with members of the public from the outlined ethnicity and emigration groups. For the individuals experiencing homelessness group, we interviewed individuals from charities who have knowledge or experience of working with individuals experiencing homelessness. We selected charities that had worked with a wide range of people experiencing homelessness. We sampled charity representatives because people experiencing homelessness are harder to contact to arrange interviews. They are also an extremely vulnerable group to interview, so interviewing charity representatives was considered a more sensitive approach.

Data were collected through semi-structured interviews conducted in a cognitive interviewing style. By interviewing our sample, we aimed to gain insight into whether, and how, members of the public complete their information when filling in registration forms for access to public services. Participants were presented with examples of registration forms that feed into administrative data sources, and with example questions, as elicitation aids to facilitate discussion. We also collected similar information from homelessness charities to explore whether they can provide insight into how individuals experiencing homelessness complete their information.

During the interviews, we did not collect any participants' actual administrative data. Instead, we gained insights into how members of the public complete their information when filling out registration forms to access public services or to inform public services of changes. For example, we explored whether there are questions they find difficult to answer because the response options do not provide an appropriate choice for them. We asked members of the public and individuals from charities questions to understand:

  • whether selected population groups are registered with any public services and whether they access them 

  • how selected population groups provide information about themselves when registering for public services (for instance, through a form, or via telephone) 

  • how selected population groups complete questions 

  • whether and how selected population groups update information held by public services

Analysis was conducted through a deductive thematic approach, meaning themes relevant to the project's aims were decided before analysis began. However, the project retained the potential for emergent themes to be added where appropriate.


3. Inclusivity and representativeness findings

Inclusivity findings

Findings indicate that inclusivity in administrative data varies across the groups sampled in this project and administrative data sources. For the ethnicity groups, it was found that members of these groups were included on at least one administrative data source. They reported that they were registered with at least one service which feeds into administrative data.

In the emigration group, participants reported that they are included in certain administrative data sources when they should not be present. Reasons for this included:

  • a lack of knowledge that there is a need to inform public services of their emigration

  • a lack of knowledge of how to inform public services of their emigration

  • wanting to be able to access public services whilst emigrated and feeling that informing public services of their emigration could hinder their ability to access these public services upon their return

This contrasts directly with the group of individuals experiencing homelessness. It was reported that this group may not be included in some administrative data sources. One potential reason is that individuals experiencing homelessness have limited interactions with providers of public services. Another is that members of this group may eventually become de-registered.

This would suggest that each administrative data set is different in terms of inclusivity. As such, it will take time to explore and understand each individual source's inclusivity and representativeness.

Inclusivity appears to be context dependent

Findings suggest inclusivity for all sampled groups can be context dependent.

Both the ethnicity and individuals experiencing homelessness groups reported that they prioritise which public services they are registered with based on public services they want to receive and perceive as being important.

Findings indicate that, for the individuals experiencing homelessness and emigration groups, there may be barriers to registering or updating their information with public services. For the individuals experiencing homelessness group, it was reported that digital exclusion, English language proficiency, documentation, and trust in organisations all act as barriers to registering with public services. In the emigration group, findings indicate that awareness of the requirement to inform public services of their emigration and knowledge of how to provide this information may be barriers to informing public services of their emigration.

The groups may also be included in alternative sources of data that we do not currently have access to; we aim to explore this further.

Representativeness findings

Based on the interviews within this research, there may be differences in how the same individuals have their characteristics recorded across the different administrative data sources. For example, address or ethnic group might be different across the various administrative data sources for the same person.  

This could be a result of two things. The first would be differences across public services in how questions are asked and designed on their registration forms, potentially causing differences in responses on these forms. The second would be that the information the public provide can be determined by the type of service the registration form is for.

Understanding how people make decisions on how to respond to questions on registration forms could help our understanding of how responses might differ across the different administrative sources.

Some questions on registration forms for public services may be more open to interpretation in particular contexts. This is likely to vary depending on the group, and might affect consistency in how these groups are represented across the different administrative data sources.

We found that respondents may think about what the information will be used for and why it is being asked. This can influence how they answer, leading to different responses being provided on different forms.

Different responses by the same individual on different registration forms affect the consistency of the administrative data. Poor consistency across administrative sources negatively affects the ability to link data across multiple administrative sources.
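To make the idea of consistency concrete, the sketch below computes a simple agreement rate for one characteristic recorded on two sources. The function name `agreement_rate`, the two sources and every value are hypothetical and invented for illustration; this is one possible quantitative check, not a measure used in this research.

```python
def agreement_rate(source_a, source_b):
    """Share of people present in both sources whose recorded values match.

    Returns None when no one appears in both sources.
    """
    shared = source_a.keys() & source_b.keys()
    if not shared:
        return None
    matches = sum(1 for person in shared if source_a[person] == source_b[person])
    return matches / len(shared)


# Hypothetical ethnic group recorded for the same people on two registration forms.
health_form = {"p1": "Black African", "p2": "Polish", "p3": "Black Caribbean"}
education_form = {
    "p1": "Black African",
    "p2": "Other White",  # same person, different response on a different form
    "p3": "Black Caribbean",
    "p4": "Polish",       # only present on one source, so not compared
}

rate = agreement_rate(health_form, education_form)
print(f"Agreement on shared records: {rate:.2f}")
```

A low agreement rate for a particular question or group could point to where question design or context is driving different responses.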

Understanding how open to interpretation a question is, as well as understanding which groups this may be more likely to affect, may help to understand consistency. Registration form question design can lead people to select an answer that they do not fully identify with. Understanding how question design affects responses to service registration forms will help to understand the administrative data we use for our statistical and research purposes at the Office for National Statistics (ONS).


4. Future developments

This research was designed to add to the research that is already taking place with administrative data. These insights may be brand new or provide further evidence to support pre-existing assumptions and theories about administrative data inclusivity and representativeness. These insights can be combined with our broader knowledge of the topic area, such as through quantitative methods or research, to provide recommendations on future actions.

It is recommended that the outcomes from this research are considered alongside other information to help inform decisions and next steps. These decisions can include:

  • how the Office for National Statistics (ONS) processes the data

  • what methods to use

  • what additional sources may be needed

  • how to communicate our statistics and findings to the public

These findings could also aid research developing harmonised standards for administrative data collection from public services.

As the project was successful in providing new insights around inclusivity and representativeness in administrative data, we intend to conduct further research using this method on other groups and specific administrative data sources. These decisions will be based on input from demographic topic experts, research, and user needs.

Recommendations

  • Quantitative analysis could be carried out to explore if this project's findings around inclusivity and representativeness are reflected in the actual administrative data; for example, can qualitative insights be linked to quantitative data?

  • Further research could be conducted to investigate and discover potential alternative data sources that groups might be included on; these alternative data sources may be used in the future to help support the ONS's current administrative data holdings.

  • The qualitative findings could be used as intelligence on how best to use and implement quantitative methods that measure or improve quality in administrative data; for example, when considering over-coverage, potentially caused as a result of emigrants still being included in administrative data, the qualitative findings could be helpful in understanding what potential parameters and predictors could be included in methods, such as fractional counting or signs of life modelling.
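As a minimal sketch of what a "signs of life" rule might look like, the example below flags records with no recent recorded interaction as possible over-coverage. The function `has_sign_of_life`, the 365-day window, the reference date and the records are all assumptions made for illustration and do not describe any ONS model; qualitative findings of the kind reported here could help decide which interactions and time windows are meaningful for a given group.

```python
from datetime import date


def has_sign_of_life(last_activity, reference_date, window_days=365):
    """True if the most recent recorded interaction falls within the window.

    window_days is a hypothetical parameter; a missing date counts as no sign.
    """
    if last_activity is None:
        return False
    days_since = (reference_date - last_activity).days
    return 0 <= days_since <= window_days


# Hypothetical reference date and last recorded interaction with a public service.
reference_date = date(2023, 4, 4)
last_interaction = {
    "p1": date(2023, 1, 10),  # recent interaction
    "p2": date(2020, 6, 1),   # no interaction for several years
    "p3": None,               # registered, but no interaction recorded
}

# Records without a recent sign of life are candidates for over-coverage checks.
possible_over_coverage = sorted(
    person
    for person, last_seen in last_interaction.items()
    if not has_sign_of_life(last_seen, reference_date)
)
print(possible_over_coverage)
```

A rule like this would flag people who remain registered after emigrating, but it could also wrongly flag residents who rarely use a service, so the window and the choice of sources would need careful testing.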


6. Cite this methodology

Office for National Statistics (ONS), released 4 April 2023, ONS website, methodology, Exploring the quality of administrative data using qualitative methods
