1. About the Reference Data Management Framework
The Reference Data Management Framework (RDMF) is a collection of datasets and processes that link important pieces of information together and create reference data, which help make other data more useful. The RDMF enhances security by de-identifying data (removing directly identifiable information) and by reducing the number of people who need access to data to carry out linkage.
The framework allows us to connect de-identified information with data from other government departments so that we can work on more useful analysis.
It is made up of four collections of data, or indexes, where people's information (for example, names, addresses, or national identity numbers) is removed to ensure they cannot be directly identified. This information is then replaced with a reference number that is used within our analysis instead. There are three core indexes and one supporting index:
- the Business Index (core index)
- the Demographic Index (core index)
- the Location Index, which consists of the Address Index and the Geography Index (core index)
- the Classification Index, which covers people, locations and organisations (supporting index)
The Demographic Index shows the population of England and Wales over time and is supported by the Classification Index, which holds information about jobs.
The Business Index is a list of UK organisations and is also supported by the Classification Index, which includes industry information.
The Address Index includes all UK addresses, from houses to care homes, universities, and even castles. This is supported by the Geography Index, which includes postal and statistical geographies. Together, these are known as the Location Index.
The Cross-Index Association (CIA, or sometimes XIA) allows us to make connections across indexes, such as linking a business on the Business Index to an address and its related local authority in the Location Index. Together, the indexes help us understand things like where people live, what jobs they have, and where businesses are located. They allow us to combine different pieces of information to find out, for example, how education affects employment in different places. The RDMF helps us to do this quickly and safely.
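As an illustration only, the idea of following reference numbers across indexes can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the example: the reference numbers, field names, and the `describe_business` function are invented, and the real indexes and their linkage methods are far more complex.

```python
# Hypothetical, de-identified extracts. All IDs, codes and field names are
# invented for illustration and do not reflect the real RDMF structures.
business_index = {"B001": {"sic_code": "47110"}}
address_index = {"A100": {"local_authority_code": "LA01"}}
geography_index = {"LA01": {"local_authority_name": "Example District"}}

# Cross-index association: pairs of (business reference, address reference).
cross_index = [("B001", "A100")]


def describe_business(business_id):
    """Follow reference numbers from a business to its local authority."""
    for b_id, a_id in cross_index:
        if b_id == business_id:
            la_code = address_index[a_id]["local_authority_code"]
            return {
                "sic_code": business_index[b_id]["sic_code"],
                "local_authority": geography_index[la_code]["local_authority_name"],
            }
    return None  # no association recorded for this business


result = describe_business("B001")
print(result)
```

The lookup uses only reference numbers, so no names or addresses are needed to make the connection.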
Using this framework saves time and money, makes data more reliable, and ensures that personal information is protected.
For more information, see A Quick Introduction to the Reference Data Management Framework (PowerPoint, 588KB) and the Reference Data Management Framework Overview Digital Booklet (PowerPoint, 1.33MB).
2. How we keep data safe
The Office for National Statistics (ONS) takes privacy seriously. The Reference Data Management Framework (RDMF) allows us to quickly link and re-use data and to securely analyse data that we have not been able to access before. An important part of our secure approach is de-identification, which is where directly identifiable personal data are removed or hidden. For example, de-identification allows us to remove names, dates of birth, or NHS numbers from hospital data, leaving just the clinical information and a reference number used only within the ONS.
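The hospital data example above can be sketched in Python. This is a minimal illustration, not the method the ONS uses: the field names, the `REF` number format, and the choice to match records on an exact NHS number are all assumptions made for the example.

```python
import itertools

# Assumed set of directly identifying fields (hypothetical field names).
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "nhs_number"}


def de_identify(records, identifier_fields=DIRECT_IDENTIFIERS):
    """Strip direct identifiers and attach an internal reference number.

    Assumption: records belong to the same person when their NHS numbers
    match exactly. Real linkage is considerably more sophisticated.
    """
    counter = itertools.count(1)
    lookup = {}  # identifier -> reference number; seen only by vetted linkers
    cleaned_records = []
    for record in records:
        key = record["nhs_number"]
        if key not in lookup:
            lookup[key] = f"REF{next(counter):06d}"
        cleaned = {k: v for k, v in record.items() if k not in identifier_fields}
        cleaned["reference_number"] = lookup[key]
        cleaned_records.append(cleaned)
    return cleaned_records, lookup


# Invented example records.
hospital_data = [
    {"name": "A Person", "date_of_birth": "1980-01-01", "nhs_number": "111", "diagnosis": "asthma"},
    {"name": "A Person", "date_of_birth": "1980-01-01", "nhs_number": "111", "diagnosis": "fracture"},
    {"name": "B Person", "date_of_birth": "1975-06-30", "nhs_number": "222", "diagnosis": "asthma"},
]

de_identified, lookup = de_identify(hospital_data)
for row in de_identified:
    print(row)
```

Analysts would work only with the de-identified rows, in which the two records for the same person share one reference number. The lookup table stays with the linkers.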
Using de-identified data is important because it helps researchers study information without directly using personal identifiers in statistics and analysis. However, we need identifiable data to ensure the RDMF is built correctly. A handful of highly skilled and vetted linkers are allowed to see personal data, do the linkage, and then ensure those identifiable data are completely removed. The linkage is done ahead of time and the linked data are de-identified, so there is never a need for analysts or researchers to see the raw personal data for a person or business. This means they can focus instead on what the data are telling us, to learn new things and make better decisions without invading anyone's privacy. It is like looking at a completed puzzle without seeing the individual pieces. This helps keep everyone's personal information private and secure.
The more information you have about a person, the easier it is to identify who that person is. This can be true with linked data even when they are de-identified. For example, suppose your local shopkeeper told you about a new rare medication they were taking. If we joined our hospital data to some local area employment data, it might be possible to identify their record by finding a shopkeeper taking that medication, even if their name was not visible. This is called re-identification. We take great care to prevent this.
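To make the shopkeeper example concrete, the short Python sketch below flags records whose combination of characteristics is unique in a linked dataset. It is a simple uniqueness check written for illustration, not the Statistical Disclosure Control process the ONS applies; the records, field names, and the `unique_combinations` function are all invented.

```python
from collections import Counter

# Invented, de-identified linked records (no names present).
linked_records = [
    {"reference_number": "REF000001", "occupation": "shopkeeper",
     "area": "Area A", "medication": "rare drug X"},
    {"reference_number": "REF000002", "occupation": "teacher",
     "area": "Area A", "medication": "common drug Y"},
    {"reference_number": "REF000003", "occupation": "teacher",
     "area": "Area A", "medication": "common drug Y"},
]


def unique_combinations(records, fields):
    """Return reference numbers whose combination of `fields` occurs only once."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [
        r["reference_number"]
        for r in records
        if counts[tuple(r[f] for f in fields)] == 1
    ]


at_risk = unique_combinations(linked_records, ("occupation", "area", "medication"))
print(at_risk)
```

Only the shopkeeper's record is flagged, because nobody else in the data shares that occupation, area, and medication. Someone who already knew those three facts could pick out the record even though it contains no name.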
The Five Safes framework is a set of principles that enable data services to provide safe research access to data. These are like five big locks that protect our data. Only trained or accredited people can see the data, and they must follow strict rules. The data are stored in a secure place and permission must be given before anyone can use them. The data can only be used for a good reason, and projects are subject to ethical review.
Most importantly, data will never be removed from our secure systems unless they have been through strict Statistical Disclosure Control (PDF, 964KB), which means that data are truly anonymised and can never be re-identified. For more information on de-identified data, please contact RDMF.Products@ons.gov.uk.
We are partnering with the National Centre for Social Research (NatCen) to look at how the RDMF can be used to support surveys, as we transform the way we collect data. For more information, please see our What is the RDMF guidance (PDF, 388KB).
3. Reference Data Management Framework indexes
We have created Reference Data Management Framework (RDMF) diagrams that show what each index is and what sources of data are included in each index. To request these, or explanatory text describing the diagrams, please contact RDMF.Products@ons.gov.uk.
Demographic Index
The Demographic Index contains information about people. It draws on administrative data for England and Wales starting in 2016, though some data go back to 2011. It includes things like patient registrations, student records, and birth and death records. The index itself is not available to the public, so personal data are protected. Even within the ONS, most researchers cannot access the data themselves. These data are used to help securely link other de-identified datasets together for analysis.
Business Index
The Business Index helps us understand the businesses and organisations in the country. It includes information about companies, like where they are located and what they do. This helps us to keep track of how many organisations and businesses there are, their size, their location, and how they are changing over time. This helps the government understand aspects of employment and the economy.
Location Index
The Location Index is made up of the Address Index, which is a list of all the places in the UK, and the Geography Index, which includes statistical, health and administrative geographies. The Location Index tells us about places like cities or towns and uses data from things like addresses and maps. This allows us to understand where things happen, to explore regional differences in statistics, or to analyse data on households.
Classification Index
Classification Indexes include reference data that support the other indexes. The two main Classification Indexes are:
- the Standard Industrial Classification (SIC), which is a system used to classify and categorise businesses and industries
- the Standard Occupational Classification (SOC), which is a system used to classify and characterise occupations based on the tasks and responsibilities of each job
4. Our work on data quality
Using the Reference Data Management Framework (RDMF) enables us to provide access to data more quickly and allows for those data to be reused. It increases privacy, permitting the use of data that would not otherwise be accessible to researchers.
The RDMF is helpful for aspects of quality, such as timeliness, coverage, and accessibility, that users have told us are important.
These indexes are updated frequently. For example, the Business Index is updated daily and the Demographic Index is updated every six months. This provides users with data that are current and meaningful.
The RDMF uses several large datasets that cover most of the population.
The administrative data included in the RDMF have been selected for their comparability and coherence with other datasets.
The RDMF aims to establish a quality standard for indexed data based on commonly used measures, such as precision (the proportion of records we matched correctly) and recall (the proportion of matches we found out of all those we could have found). This helps us provide transparent and comparable quality assurance.
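The two measures can be worked through with a small Python example. The counts are invented for illustration and are not RDMF results. The function assumes at least one match was made and at least one true match exists, so that neither division is by zero.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall for a set of record matches.

    true_positives:  matches made that are correct
    false_positives: matches made that are wrong
    false_negatives: true matches that were missed
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Invented counts: 100 matches made, 90 of them correct, 30 true matches missed.
precision, recall = precision_recall(
    true_positives=90, false_positives=10, false_negatives=30
)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here 90 of the 100 matches made are correct, giving a precision of 0.90, and 90 of the 120 true matches were found, giving a recall of 0.75.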
Because the RDMF is a new way of working, we have ethical and methodological oversight to ensure our methods are correct and our operations have run properly. We use clerical review, in which trained and vetted staff check the outputs, to give us confidence in our indexing. Our experts are also developing new quality measures for such complex data. We will compare RDMF methods against our usual processes, population datasets, and estimates to make sure they work together. We also carry out user research, which is helping us understand what our users need, to ensure the RDMF is fit for purpose.