The aim of this policy is to give producers of statistical outputs of births or deaths data the information they need to minimise the risk of unlawful disclosure of personal information. The policy will be used mainly by staff of the Office for National Statistics (ONS), other government departments and NHS bodies, and by approved researchers. Its scope is the data on births and deaths, including stillbirths, collected primarily through civil registration in England and Wales. It applies to outputs such as tables, charts or individual figures, whether in a scheduled publication or an ad hoc query.
All official statistics activities and outputs are subject to the UK Statistics Authority Code of Practice for Official Statistics, the Statistics and Registration Service Act 2007, the Data Protection Act 2018 and the General Data Protection Regulation (GDPR) (2016/679). The GDPR and the Data Protection Act 2018 replaced the Data Protection Act 1998 from 25 May 2018. This policy describes how these provisions apply to births and deaths data.
Protective data management (Section 3.1) is the overall management of data, taking account of the applicable legislation and procedures, to maximise its statistical use while minimising the risk of unlawful disclosure. This term is intended to focus attention on the need to take active and coherent steps to prevent disclosure throughout the “lifecycle” of the data, not only at the point of publication.
Unlawful disclosure (Section 3.2) may occur when an output contains sufficient detail that an intruder can identify an individual and find out protected information about them. Identification might be on the basis of the published information alone, or combined with some other information.
Protected information (Section 3.3) is information which, if it can be related to an identified individual, reveals a fact about them or someone else that is confidential, or is likely to cause substantial damage or distress to someone. Birth and death registrations contain information about several people in addition to the newborn baby or the deceased individual.
An important issue to consider when assessing the disclosure risk of an output is the possibility of combining it with other information (Section 3.4), which may be either publicly available data or private knowledge that a user has.
Overall protective data management (Section 4.1) includes the need for data providers to apply appropriate standards and policies for IT and physical security, data protection, staff training and awareness. Systematic records must be kept of policies and procedures affecting confidentiality and data protection, showing their effective implementation. Requests for data and the resulting outputs should be considered in the context of clearly articulated user need, existing information availability, and judgement on the balance of risk and benefit. An audit trail of data releases should be kept, including the justification for statistical disclosure control.
Assessing outputs for disclosure risk (Section 4.2) is key to addressing the level of risk in the most appropriate way, taking account of both the sensitivity of the data and the benefit of its use. Assessment requires application of the concepts of protected information and unlawful disclosure explained in earlier sections.
The mere fact of birth or death does not in itself reveal any protected information. For that reason, tabulations which reveal only that a certain number of births or deaths took place in a certain area and time period, even if it might be possible to identify an individual involved, do not usually need to be disclosure controlled.
When assessing the risks, a key point is that small numbers, even unique cases, are not necessarily disclosive. The question to ask is – could an intruder discover any protected information from these figures? This breaks down into the following three questions:
can any individual be identified from the table, with any degree of certainty?
if so, is any new information revealed about them (attribute disclosure)?
is any information revealed about any other living person connected with them?
It is particularly important to be aware that, if a specific birth or death is revealed to be the only one of a particular age and sex combination or other readily identifiable characteristic in a known geographical area, there is a high risk of disclosure. It is therefore essential to check, for example, any sparse table containing cause of death by local authority (or smaller areas) against the corresponding “all causes” figures for instances of obvious uniqueness.
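The uniqueness check described above can be partly automated. The following is a minimal sketch, assuming a hypothetical table layout of (area, sex, age group, cause, deaths) rows; the layout and function name are illustrative and are not an ONS format. It derives the "all causes" total for each area, sex and age group, and flags any cause-specific cell that is the only death in its group.

```python
from collections import defaultdict

# Hypothetical row layout (not an ONS format): (area, sex, age_group, cause, deaths)
Row = tuple[str, str, str, str, int]


def find_unique_cells(rows: list[Row]) -> list[Row]:
    """Flag cause-specific cells that reveal the cause of death of the only
    person of that sex and age group who died in the area."""
    # Derive the "all causes" total for each area/sex/age-group combination.
    all_causes = defaultdict(int)
    for area, sex, age_group, _cause, deaths in rows:
        all_causes[(area, sex, age_group)] += deaths

    flagged = []
    for row in rows:
        area, sex, age_group, _cause, deaths = row
        # Exactly one death in the group: publishing its cause reveals a fact
        # about an individual who may be identifiable from other sources.
        if all_causes[(area, sex, age_group)] == 1 and deaths == 1:
            flagged.append(row)
    return flagged


rows = [
    ("Area A", "F", "100+", "Ischaemic heart disease", 1),
    ("Area A", "M", "85-89", "Ischaemic heart disease", 3),
    ("Area A", "M", "85-89", "Stroke", 2),
]
print(find_unique_cells(rows))
# [('Area A', 'F', '100+', 'Ischaemic heart disease', 1)]
```

A check like this only finds obvious uniqueness within one table; it does not replace judgement about other information an intruder could combine with the output.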
Common strategies for statistical disclosure control (SDC) are outlined in Section 4.3, but it should be noted that there is a large methodological literature on SDC, and the policy does not mandate any specific methods. Generally speaking, approaches which avoid suppressing or perturbing the data are preferred. Common SDC techniques include:
collapsing categories to reduce the sparsity of the table (for example, aggregating single year ages to five-year groups, or five-year age groups to 10-year groups) (non-perturbative)
aggregating the data over a greater period of time, or a larger geographical area (non-perturbative)
rounding to a specific base to avoid very small numbers (usually three or five) (perturbative)
suppressing very small numbers (usually numbers less than three) (perturbative)
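Collapsing categories, the first non-perturbative option above, can be illustrated with a short sketch that aggregates counts by single year of age into five-year groups. The open-ended top band of 90+ and the function names are assumptions for illustration only; the standard age groups to use are those in Annex D.

```python
from collections import Counter


def five_year_band(age: int, top: int = 90) -> str:
    """Map a single year of age to a five-year band, with an open-ended top band."""
    if age >= top:
        return f"{top}+"
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


def collapse_ages(counts_by_age: dict[int, int], top: int = 90) -> dict[str, int]:
    """Aggregate counts by single year of age into five-year bands."""
    grouped = Counter()
    for age, count in counts_by_age.items():
        grouped[five_year_band(age, top)] += count
    return dict(grouped)


# Sparse single-year counts become less revealing once grouped.
print(collapse_ages({86: 1, 87: 2, 92: 1, 101: 1}))
# {'85-89': 3, '90+': 2}
```

Because the underlying counts are unchanged and only regrouped, the output stays accurate while the single death at age 101 is no longer shown on its own.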
When applying statistical disclosure control the aim is to balance utility and risk. The published outputs should assist the user as much as possible in their need for statistics (for example, for developing policy) while at the same time ensuring that the risk of protected information being released is reduced to as close to zero as possible.
If the output carries an increased risk of disclosure or differencing, it may be appropriate to release it directly to a user under a Non-Disclosive Data Access Agreement and avoid publication.
Increasing public access to data through explorable datasets and online tools (Section 4.4) may allow an intruder to exploit the flexibility of customised tabulations to reveal protected information. Consequently, we advise precautions such as requiring users to register, and rounding or suppressing small numbers at a higher threshold, for example suppressing numbers less than five.
2.1.1 What data or statistics does this policy cover?
This policy is about statistical outputs produced from the data on births and deaths, including stillbirths, which are collected primarily through the civil registration system in England and Wales. These data and some linked NHS variables are held for statistical purposes by ONS, who in turn provide copies to other government organisations and approved researchers.
Where births or deaths data are combined with data from other sources, such as NHS hospital records (Hospital Episode Statistics, HES) or cancer registrations, statistical outputs from the combined data must comply both with this policy and with any rules or policy applicable to the other data source(s).
It should be remembered that all tables of rates or other calculations based on births or deaths data have to be treated as births or deaths outputs for purposes of disclosure control, and this policy applied, if the underlying numbers could be inferred.
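To see why a rate can be as disclosive as a count, note that a rate published alongside its denominator (for example, a published population estimate) lets a reader recover the underlying number. A minimal sketch, assuming a crude rate expressed per 100,000 population; the figures and function name are hypothetical.

```python
def implied_count(rate_per_100k: float, population: int) -> float:
    """Recover the count underlying a rate published with its denominator."""
    return rate_per_100k * population / 100_000


# A crude death rate of 12.5 per 100,000 in an area of 8,000 people
# implies exactly one death, so the rate is as revealing as the count.
print(implied_count(12.5, 8_000))  # 1.0

# Rounding the rate to a whole number does not hide the count here:
print(round(implied_count(13, 8_000)))  # 1
```

For small populations even a coarsely rounded rate can pin down a count of one, which is why such rates are treated as births or deaths outputs.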
This policy is about disclosure control of aggregate outputs (tables) only, not about the secure handling or confidentiality protection of individual records (microdata). For information about access to births and deaths microdata, see the Approved Researcher Scheme.
Statistics based on births and deaths in Scotland or Northern Ireland have to comply with the relevant legislation and policy; see the National Records of Scotland, ISD Scotland and NI Statistics and Research Agency websites.
2.1.2 Who is this policy for?
This policy is addressed mainly to individuals and organisations who produce statistical outputs of birth, death or health data for publication, such as staff of ONS, other government departments and NHS bodies, and approved researchers.
These people will normally either have access to the underlying microdata routinely through their work, or have received a confidential dataset by specific agreement, such as for a research project as an approved researcher.
This policy applies equally whether the output tables are produced from microdata files or by summarising more detailed tables, and whether the medium of publication is (for example) a website, an academic journal, or the answer to a parliamentary question.
2.1.3 How does this policy relate to other documents on disclosure control?
All official statistics activities and outputs are subject to the UK Statistics Authority Code of Practice for Official Statistics, the Statistics and Registration Service Act 2007, the Data Protection Act (DPA) 2018 and the EU General Data Protection Regulation (GDPR) (2016/679). The GDPR and the DPA 2018 replaced the DPA 1998 from 25 May 2018.
However, the aspects of confidentiality that need to be considered, and how disclosure control of outputs applies in practice, differ depending on factors such as:
the specific legal basis for the original data collection
any written or verbal confidentiality undertakings given to the respondents
whether the data subject is a living individual, a deceased individual, or a company or other non-personal legal entity
whether the data source is a sample or complete enumeration of the population
whether anonymisation, record swapping or other treatments can be applied
As a result, this policy is specific to births and deaths data. It should not be assumed that the same rules apply to data from other sources, or that policies or precedents relating to confidentiality of other data are appropriate in the context of birth and death statistics.
2.2 Purpose and use
2.2.1 What is the aim of this policy?
The aim of this policy is to give any person producing a statistical output of births or deaths data the information they need to minimise the risk of an unlawful disclosure of personal information about an individual mentioned in the data. While it may be virtually impossible to completely remove the risk of disclosure while retaining the usefulness of the statistics, following the principles of the policy should reduce the risk to an acceptable minimum and meet the publisher’s legal obligations.
A secondary aim is to inform the general public, as users of statistics and the original providers of the underlying data, about the methods used to protect their confidentiality and that of their families in the context of birth and death statistics.
2.2.2 Why has this policy been produced now?
This policy is the latest in a series of such documents published by ONS, and supersedes previous versions. The content and form of disclosure control policy for births and deaths statistics, as for other topic areas, has changed over time to reflect changes in the law, the needs of statistics users and producers, and other considerations.
The current revision of the policy comes in the context of the availability of an increasing range of detailed “open” data products across government and from other sources, combined with constantly improving technologies for internet searching, data linkage and exploitation, and the ubiquity of online social networking. ONS and other providers of birth and death data publish many customised “ad hoc” tables. Data exploration tools may in future give end users greater power to “drill down” to some level of detail themselves.
These developments bring serious threats to confidentiality at the same time as opening up new avenues for the positive use of official statistics. In particular, there is a need to consider the level of disclosure control which may be needed to prevent potentially unlawful access to personal information caused by the increased technical ability of an intruder to associate unique instances in a large statistical table with identifiable individuals, with the aid of other readily available data sources.
We now consider, on the advice of cyber-security experts, that there is a significant risk of an intruder deducing, using only publicly available information, the identity of any deceased person tabulated by year of death, age and a defined geographical area such as a local authority of average size, if (for example) the age at death is uncommon. A similar risk exists for other data topics, and even at higher geographical levels if there are unusual combinations of characteristics.
2.2.3 Guidance on initial implementation of this policy
All producers of births and deaths statistics are expected to implement this policy as soon as possible after the date of publication, with reasonable flexibility for the publication of outputs already completed and any need to change procedures or rewrite programs.
For the reasons noted above, this policy is in some respects more restrictive than the previous edition published in 2014. There is no evidence that any unlawful disclosure has in fact occurred under the previous policy, and producers are not expected to remove or re-process any outputs which are already in the public domain.
However, when producing an output for the first time under this policy, it will be necessary to assess the risk of differencing with related tables published under the previous, less restrictive rules. In some cases this may mean applying a stricter threshold to the first edition of a series under this policy than to subsequent editions.
3.1 Protective data management
Protective data management is a term used in this policy to describe the overall process of managing births and deaths data, from its raw state as microdata through to its release to the public or to a data customer. These steps are summarised in this section and discussed in detail in Section 4.
This term is intended to focus the attention of the organisations and individuals who handle such data, and produce outputs from it, on the need to take active and coherent steps to prevent disclosure throughout the “lifecycle” of the data, not only at the point of publication.
Definition: Protective data management is the overall management of data, taking account of the applicable legislation and procedures, to maximise its statistical use while minimising the risk of unlawful disclosure.
It is unrealistic to expect that every possibility of unlawful disclosure can be predicted or prevented. By following protective data management practices thoroughly but in a proportionate way to the circumstances, statistics providers can reduce the risk to a minimum and comply with their obligations under legislation and the Code of Practice for Official Statistics (CoP). The protective data management process applies to:
both microdata and tables containing protected information
staff who have access to microdata or protected information
systems and processes for processing or storing data and producing statistical outputs
all statistical outputs, whether for general publication, or for supply to a specific customer (whether or not subsequently published).
The key aims throughout the process are to ensure that:
all relevant legal and policy requirements are properly considered and applied
risks to confidentiality are systematically assessed and documented
decisions are taken which consciously balance risk with benefit
disclosure control is applied proportionately and effectively
It can be seen that risk assessment and the application of disclosure control methods are together only one aspect of the protective management of births and deaths data.
The General Data Protection Regulation (GDPR) lays particular emphasis on the need for systematic documentation to demonstrate compliance with its provisions. This includes policies and procedures relevant to confidentiality and data protection, records of activities showing proper implementation, justification of decisions made, and periodic audits of compliance.
3.2 Unlawful disclosure
The legal and policy framework is described in detail in Annex A to this policy, and the different types of disclosure risk are described in detail in Annex B.
Definition: An unlawful disclosure occurs when an output contains sufficient detail that an intruder can both identify an individual and find out protected information about them. The identification might be on the basis of the published information alone, or that information combined with some other information.
The meaning of “protected information” is explained in section 3.3.
The implications of “combined with some other information” are discussed in section 3.4.
There is not usually an unlawful disclosure when:
a person can be identified (recognition) but nothing new can be learned about them
a person can identify themselves (self-recognition) but others cannot identify them
a person believes they can be identified or can recognise someone else, but there is no certainty that this is the case
For the purposes of this policy, the above distinction between recognition of an individual’s identity without revealing any new information about them (“simple recognition”) and revealing new information about the person so identified (sometimes called “attribute disclosure”) is of central importance.
3.3 Protected information
The terminology and legal concepts used in the Statistics and Registration Service Act 2007 (SRSA) and the Data Protection legislation are different, and these Acts together apply to births and deaths data in a complex way. The expression “protected information” is not a legal term, and is used here to indicate the approach to be applied in practice, taking account of their combined effects.
Definition: Protected information is information which, if it can be related to an identified individual, either:
reveals a fact about them that is confidential; or
reveals a fact about someone else that is confidential; or
is likely to cause significant harm or distress to someone
The CoP says (principle T6.4): “Organisations should be transparent and accountable about the procedures used to protect personal data when preparing the statistics and data including the choices made in balancing competing interests.” The concept of balancing possible harm against benefit also exists in the Data Protection legislation. It should be explicitly considered when making decisions about the supply of outputs in accordance with this policy.
3.3.1 Information that needs to be protected
Protected information can be considered under the following headings.
Information about a deceased person themselves (except as noted in section 3.3.2 below) – for example cause of death, occupation, place of birth, marital status. It should be noted that such information, or combinations of it, can constitute either a source of identification, a fact of interest to an intruder, or both at the same time. (Throughout this policy, references to a deceased person should be taken to include a stillborn child.)
Information about a deceased person which reveals something about another (living) person – for example, the marital status of the deceased reveals by implication the marital status and the sex or sexual identity of a surviving partner.
Information about a deceased person which could cause substantial damage or distress to another person – for example, the fact that an infant death was referred to a coroner could be mistakenly thought to imply that there was suspicion on the parents. It should be noted, however, that due to changing attitudes and the public health importance of monitoring trends in causes of death, no extra protection is given on this ground to specific causes or types of death such as AIDS or suicides.
Information which relates to any other living person, such as:
newborn baby registered as a birth or stillbirth – such as gestation, birthweight
mother or father of a newborn baby or stillborn child – such as age, occupation, place of birth
death registration informant
partner of a deceased person
mother of a deceased infant or stillbirth
Information which relates to an identifiable legal entity (“body corporate”), for example a private nursing home or a charitable sector hospice; an example would be the number of deaths that occurred in a specific nursing home. However, information relating to a public authority such as an NHS hospital is not protected beyond the needs of protecting individual patients or staff.
Disclosure control issues about public officials and professionals carrying out their duties, for example the doctor who completed a death certificate or the registrar who recorded a birth registration – although not often of statistical interest – may be subject to particular legal considerations, and expert guidance should be obtained.
3.3.2 Information that does not need to be protected
The mere fact of birth or death does not in itself reveal any protected information. For that reason, tabulations which reveal only that a certain number of births or deaths took place in a certain area and time period, even if it might be possible to identify an individual involved, do not usually need to be disclosure controlled.
The relatively common possibility that a table might reveal or confirm a potentially identifiable individual’s sex or their age in years is considered too trivial to require protection for its own sake, bearing in mind the need to balance confidentiality against the utility of the statistics. Therefore, breakdowns of births or deaths by sex and age group, without any other data dimensions, for any area and time period, do not generally need to be disclosure controlled.
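The rule above can serve as a first-pass screen before any detailed assessment. A minimal sketch, assuming hypothetical dimension names for a table specification; it only identifies tables that clearly fall outside the low-risk case and is not a substitute for the judgement described in the rest of this section (for example, exact ages are deliberately not treated as low risk).

```python
# Hypothetical dimension names. Under section 3.3.2, a table broken down
# only by these is not generally expected to need disclosure control.
LOW_RISK_DIMENSIONS = frozenset({"area", "period", "sex", "age_group"})


def needs_disclosure_assessment(dimensions: set[str]) -> bool:
    """Return True if any dimension goes beyond area, time period, sex and
    standard age group, so the table should be assessed for disclosure risk."""
    return not dimensions <= LOW_RISK_DIMENSIONS


print(needs_disclosure_assessment({"area", "period", "sex", "age_group"}))  # False
print(needs_disclosure_assessment({"area", "period", "sex", "cause"}))      # True
print(needs_disclosure_assessment({"area", "period", "exact_age"}))         # True
```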
While there is no absolute reason to protect exact age, standard age groups are to be preferred (see Annex D). Careful consideration should be given to the possibility that tabulation by exact age (or non-standard age groups, which could allow differencing) might allow individuals to be identified in a way that could then contribute to the disclosure of protected information, and/or could expose living persons connected to the deceased to distress. This may be a particular concern where the output involves small numbers at very young or very old ages, including births to unusually young or old mothers.
Information of a very broad nature, such as area characteristics which are not specific to the individual, does not necessarily need to be protected. Consideration should be given in each case to the possible sensitivity of the data and the balance of risk and benefit. Examples are tables which might disclose the Index of Multiple Deprivation (IMD) score or census-based area classification of an individual’s area of residence, but not the address itself.
3.4 Combining with other information
The basic principle of “combined with other information” is that it should not be possible for an intruder to deduce some protected information by putting together the statistics which are published with some other information they have from another source.
Example: I see in the local news that a woman aged 101 years died in October 2016. Her name and address are mentioned. Statistics later show that there was only one female death at that age, and the cause was ischaemic heart disease. Since I can combine the two sources of information to (a) identify an individual, and (b) discover some protected information about them, there has been an unlawful disclosure.
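The reasoning in this example can be written out as the steps an intruder would follow. A minimal sketch, assuming a hypothetical published table keyed by (year, sex, age, cause): the news report supplies the year, sex and age, and the table supplies the cause. If every matching death has the same cause, that cause is revealed; if the matching deaths have different causes, the person may be recognised but their cause of death is not pinned down.

```python
# Hypothetical published table: (year, sex, age, cause) -> number of deaths.
PUBLISHED = {
    (2016, "F", 101, "Ischaemic heart disease"): 1,
    (2016, "M", 88, "Ischaemic heart disease"): 2,
    (2016, "M", 88, "Stroke"): 1,
}


def revealed_cause(table, year, sex, age):
    """Return the cause of death an intruder can infer for a person known
    (from another source) to have died with these characteristics, or None."""
    causes = {
        cause
        for (y, s, a, cause), deaths in table.items()
        if (y, s, a) == (year, sex, age) and deaths > 0
    }
    # If every matching death has the same cause, that cause is revealed.
    return causes.pop() if len(causes) == 1 else None


print(revealed_cause(PUBLISHED, 2016, "F", 101))  # Ischaemic heart disease
print(revealed_cause(PUBLISHED, 2016, "M", 88))   # None
```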
The SRSA, which applies to information about deaths, requires that the “other information” should have been publicly available – information which was only in the intruder’s private possession is not counted. However, the Data Protection legislation requires that “other information”, whether public or private, be taken into consideration.
When thinking about what information could be combined with a given statistical output to cause a potential disclosure, it is necessary to consider:
other statistical outputs that have been published, or supplied on an ad hoc basis to the same data customer
other sources of government data that could be in the public domain or that an intruder could obtain without disproportionate effort
publicly available information that might be relevant, such as news reports of deaths
social media such as Facebook, LinkedIn and Twitter and other websites such as charity fundraising sites
These are only examples, not a complete list.
Note: It is particularly important to be aware that, if a specific birth or death is revealed to be the only one of a particular age and sex combination or other readily identifiable characteristic in a known geographical area, there is a high risk of disclosure. It is therefore essential to check, for example, any sparse table containing cause of death by local authority (or smaller areas) against the corresponding “all causes” figures for instances of obvious uniqueness.
There is no requirement for data producers to make an actual search for all sources of information that could potentially be combined with a particular output with disclosive results. Rather, they should be aware of the general possibilities – particularly relating to named small areas and unusual individual characteristics – and assess the risks accordingly.
3.5 Disclosure control
Disclosure control refers to a set of techniques for reducing the amount of detailed information in a statistical output (or a dataset), so as to prevent an unlawful disclosure, or at least reduce the risk to a very low level. Specific techniques are discussed in detail in Section 4.3 but typical methods include:
collapsing categories to reduce the sparsity of the table (for example, aggregating single year ages to five-year groups, or five-year age groups to 10-year groups)
aggregating the data over a greater period of time, or a larger geographical area
rounding to a specific base to avoid very small numbers (usually three or five)
rounding rates to a lower level of precision (fewer decimal places)
suppressing very small numbers (usually numbers less than three)
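Rounding to a base and suppressing small counts can be sketched as two simple cell-level functions. This is a minimal illustration, assuming round-to-nearest (with exact halves rounded up) and that zero counts are left unsuppressed; both are assumptions, as is the "[c]" marker. The threshold parameter allows the higher level suggested for online tools (for example, less than five).

```python
from typing import Union

SUPPRESSED = "[c]"  # illustrative marker for a suppressed cell


def round_to_base(count: int, base: int = 5) -> int:
    """Round a count to the nearest multiple of `base` (exact halves round up)."""
    return base * ((count + base // 2) // base)


def suppress_small(count: int, threshold: int = 3) -> Union[int, str]:
    """Replace non-zero counts below `threshold` with a suppression marker."""
    return SUPPRESSED if 0 < count < threshold else count


row = [0, 1, 2, 3, 7, 12]
print([round_to_base(n) for n in row])                # [0, 0, 0, 5, 5, 10]
print([suppress_small(n) for n in row])               # [0, '[c]', '[c]', 3, 7, 12]
print([suppress_small(n, threshold=5) for n in row])  # [0, '[c]', '[c]', '[c]', 7, 12]
```

Where row or column totals are also published, a suppressed cell can often be recovered by subtraction, so complementary suppression of further cells, or rounding of the totals, may also be needed.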
Although suggestions are made, this policy does not mandate any specific disclosure control techniques; a large methodological literature exists on the subject.
4.1 Overall protective data management
When applying statistical disclosure control to outputs the aim is to balance utility and risk. The published outputs should assist the user as much as possible in their need for statistics (for example, for developing policy) while at the same time ensuring that the risk of protected information being released is reduced to as close to zero as possible.
4.1.1 Systems and processes
Protective data management requires that those responsible for the data take appropriate steps to ensure that:
both microdata and tables containing protected information are stored, processed and communicated securely, in accordance with the relevant government approved standards and with the guidance of qualified IT security and Information Governance personnel, as appropriate.
appropriate security precautions are followed at all times, including clear labelling of files containing protected information, storage in designated password-protected areas, and communication using only secure methods.
all those who have access to microdata or protected information have appropriate security clearance and disclosure control training, or equivalent arrangements according to organisational policies.
policies on data security, protective data management and disclosure control are readily available to staff, correctly applied and periodically audited.
all statistical outputs, whether for general publication or for supply to a specific customer, are checked for disclosure risk and disclosure control techniques applied as required.
statistical outputs for supply to a specific customer, whether or not subsequently published, are subject to “due diligence” and an audit trail kept which considers the following points, as well as recording what data were provided and any disclosure control decisions.
Preliminary questions to ask when preparing data for release are:
Who is the customer, is their identity clear and is there any doubt about their bona fides? (for example are they a user within government or the NHS, a well-known researcher at a UK university, or a member of the public with only a free email address?)
Why do they need the statistics? In particular, a precise justification is needed for any small area breakdowns or other tabulations which have the potential to reveal protected information, in terms of the purpose and methodology of the research or other data use.
Has any previous data been published, or provided to that customer (or one of their close colleagues), which might have the potential for differencing with the new output?
A procedure is in place to divert legitimate needs for detailed information about births and deaths which cannot be met using disclosure controlled statistical outputs to the appropriate microdata release arrangements.
A procedure is in place for the designation of selected statistical outputs as carrying a raised but still proportionate risk of disclosure, leading to release to the customer under a Non-Disclosive Data Access Agreement (see section 4.3).
4.1.2 Security policies and training
As well as protecting specific data which are to be released, it is necessary to consider broader security issues. There are a number of security policies to be followed when working at the ONS. Other organisations which handle protected information and publish official statistics will have their own policies and procedures covering similar topics.
The clear desk policy ensures that no sensitive information will be left unattended on the desk or other working environment. Documents are to be stored in a lockable drawer and not removed from the secure environment.
The data protection policy lists the principles to follow to ensure the organisation complies with the Data Protection legislation. From May 2018 the General Data Protection Regulation and the Data Protection Act (DPA) 2018 replaced the DPA 1998. Data providers will need to be aware of how this will impact on any local data protection policies – Government Statistical Service (GSS) guidance will be produced.
For the entire range of ONS security procedures (not all of which are relevant to the production of birth and death statistics) ONS staff should see the Security section of the intranet.
E-learning on general issues around data protection and handling confidential information can be found on websites including Civil Service Learning.
Training specific to statistical disclosure control is also essential before working on the production of birth and death outputs. This document provides specific guidance, but to obtain a rounded appreciation of disclosure control it is advisable to attend the ONS SDC Awareness raising course, which covers the introductory concepts of data confidentiality for a wide range of outputs.
4.1.3 Audit trail
An audit trail showing the proper application of policies and procedures and justifying the decisions made is essential. Each step in the process of the request for data should be recorded to ensure there is documentary evidence of the procedures followed when answering a request for births and deaths data. This will help answer any questions if specific outputs cannot be released and enable a record to be kept of the steps followed.
Main steps in the audit trail:
Name and contact details of individual or organisation requesting data, including any clarification needed about their identity, institutional affiliation, and legal status relevant to the data release (for example, approved researcher)
Data requested, including the requester’s explanation of the “public good” purpose, and any discussion of data needs, leading to a final data specification
Have they requested data previously? If so, is this request similar to previous requests? This is relevant both for using previous experience as a guide where appropriate and for considering the risks of differencing
Is disclosure control necessary? Section 4.3 should be followed here in order to answer this step.
- If no, then release the data to the requester and publish on website.
- If yes, then apply disclosure control. Once this step is complete then publish.
If the needs of the requester cannot be met without a potentially disclosive level of detail, document the reasons and supply under a Non-Disclosive Data Access Agreement. Do not publish.
Keep a record of the supplied or published table to enable comparisons with future requests, both to avoid duplicating a release and to check for slightly different tables which could lead to disclosure by differencing.
Any organisation which releases births and deaths data should have a procedure or system in place for these steps to be routinely documented, including the rationale for any decisions. The records should be available for audit and reviewed periodically by the responsible manager(s).
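As a minimal sketch of how these steps might be recorded routinely, the Python below holds each request as a structured record and looks up earlier releases to the same requester, or with the same variables, so that slightly different tables can be compared before release. The class name, field names and the similarity rule (same variables, at least one year in common) are illustrative assumptions, not an ONS system; a real procedure would also need secure storage and access control.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReleaseRecord:
    """One audit-trail entry for a births or deaths data request (illustrative fields)."""
    requester: str
    purpose: str                  # the stated "public good" purpose
    variables: frozenset          # e.g. frozenset({"geography", "cause"})
    geography_level: str          # e.g. "local authority"
    years: tuple                  # e.g. (2016, 2017)
    sdc_applied: bool             # was disclosure control needed and applied?
    decision: str                 # e.g. "published", "released under DAA", "refused"
    rationale: str                # reasons for the decision
    released_on: date = field(default_factory=date.today)


def previous_requests(log, requester):
    """Earlier records for the same requester (to reuse past decisions as a guide)."""
    return [old for old in log if old.requester == requester]


def similar_releases(log, new):
    """Earlier records with the same variables and at least one year in common.

    Such tables may differ only in geography or grouping, so they should be
    compared with the new output for disclosure by differencing.
    """
    return [old for old in log
            if old.variables == new.variables and set(old.years) & set(new.years)]
```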
4.2 Assessing births and deaths outputs for disclosure risk
The aim is to protect against an unlawful disclosure relating to either the subject of a registration (the deceased person, for deaths) or any living individual connected with them. Can individuals be identified in the data, or can their presence be deduced from the data, either directly or in combination with other information, whether published or private?
In the context of this dataset, these living individuals will typically include:
Births: A newborn infant, mother or father of the infant or stillborn baby
Deaths: Partner of the deceased, registration informant
Example 1: If the identity of a stillbirth is disclosed, this also reveals that the mother had a stillborn baby – a sensitive fact about her health.
Example 2: If the sexual orientation of a deceased person is disclosed, this also reveals the sexual orientation of a surviving partner.
A good starting point is to consider who might be interested in discovering information from any potentially disclosive data. Some may have a malicious interest in finding out information about an estranged family member while others may find an acquaintance in a table more or less by chance. A journalist looking for a “good story” is also a possibility. Different types of intruder may have different sources of relevant information already available to them. Generally speaking, however, the table characteristics which may be disclosive will be similar whatever the motives of the intruder.
Anything that stands out in a table may be (but is not definitely) disclosive. Uniqueness or rarity in a table is one thing to look for when thinking about whether disclosure control might be necessary. Distribution of counts within rows and columns in a table also needs to be considered closely. All counts in a particular cell will also be saying something about all individuals in the table.
Remember to think of the table in wider context. What might an intruder discover from the table; how confident could they be in their discovery? Is the information sensitive? What “substantial damage or distress” might be caused to someone?
Thought should be given to whether protection ought to be given against self-identification. If the “intruder” is the discovered individual in the table, is this a problem? In most cases neither simple recognition nor self-identification is a problem, but thought is needed if the subject of the table seems particularly sensitive.
Always look to publish as much as possible. If there are clear reasons not to publish then do not publish. As always, the aim is to ensure the risk of unlawful disclosure is minimised. The data provider will use their judgement on the likelihood of disclosure based on their knowledge of the data and the information provided in this document.
small numbers, even unique ones, are not necessarily disclosive
the question to ask is – could an intruder discover any protected information from these figures?
this breaks down into the following questions:
a) Can any individual be identified from the table, with any degree of certainty?
b) If so, is any new information revealed about them (attribute disclosure)?
c) Is any information revealed about any other living person connected with them?
The questions below are provided as a starting point for assessing the disclosure risk of an output.
4.2.1 The table
How many dimensions has the table?
The more complex the table, the more information is available to the intruder. Single dimension tables (that is, counts of events by only one criterion, for example, geography) will only be problematic if dealing with a sensitive variable such as death by a specific cause.
It should be remembered that in many cases the criteria for inclusion in the table already represent one or more effective dimensions for disclosure purposes. For example, a table of male deaths from cancer by geography appears to be one-dimensional, but is in fact three-dimensional, since sex and cause of death are differentiated relative to the total deaths (all persons, all causes) for those geographical areas. In most cases, it should be assumed that the total count of births or deaths for an area is publicly available information.
What variables are being tabulated?
Are they especially sensitive such as counts of stillbirths? This sensitive information might be of greater interest to an intruder.
Are any of the variables likely to be public knowledge?
Variables such as sex, age group and geography might be known by an intruder, who could use this information to search for details of a sensitive variable in the table. There might be only a single death of a boy aged five to nine years in an Output Area (OA) in a particular year. If the intruder knows these facts, they could find out more, such as the cause of death if this was included in the table.
What is the level of detail within each variable?
Lower levels of geography, such as local authority district, are more likely to lead to a correct identification than region or country. Single years of age, compared with five-year age bands, make identification easier for an intruder.
What else is publicly available?
What similar tables have been previously published as part of a standard release or through ad-hoc requests? Disclosure by differencing could be a concern here if, for example, tables with slightly different geographies or different grouping of causes of death are published. Two consecutive rolling three-year aggregates potentially reveal a single year’s figures. A “sliver” (a small overlap or intersection of geographical areas or other categories) could be found showing a single or small number of deaths in a small area or due to a specific cause. This is a reason for having a good audit trail.
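The arithmetic behind a sliver is simple subtraction. In the sketch below the two tables are hypothetical: counts of deaths by cause for an area, and for the same area with one small piece (for example a single output area) removed. Treating positive differences below 3 as risky is an assumption that mirrors the concern with counts of 1 and 2 described in Section 4.2.2.

```python
def sliver_counts(wider, narrower):
    """Differences between two sets of published counts for the same categories.

    `wider` holds counts for an area that contains the `narrower` area plus a
    small extra piece (a sliver); each difference is the count for the sliver alone.
    """
    return {key: wider[key] - narrower[key] for key in narrower if key in wider}


def risky_slivers(wider, narrower, threshold=3):
    """Slivers whose count is positive but below the (assumed) threshold."""
    return {key: n for key, n in sliver_counts(wider, narrower).items()
            if 0 < n < threshold}


# Hypothetical deaths by cause for two slightly different boundaries.
wider = {"circulatory": 120, "cancer": 98, "rare cause": 4}
narrower = {"circulatory": 118, "cancer": 98, "rare cause": 3}
print(risky_slivers(wider, narrower))
```

Neither table contains a small count on its own, but together they reveal one death from the rare cause in the small piece of geography that separates them.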
4.2.2 The numbers
Are there low counts in the table?
If the table is sparse and there are many zeros, there is often potential for the table to be disclosive, depending on the variables being tabulated. However, this has to be considered alongside the number of ones and twos. A cell with a count of one is a potential disclosure risk as it allows for possible immediate identification. If the variables are sensitive and the categories detailed then any identification will enable very specific attributes to be determined.
Similar consideration needs to be given to a count of two. The intruder will target cells of one and two when there is a chance of finding out sensitive information about a member of the table. For larger counts, the risk of finding out attributes relating to an individual in the table is reduced. The producer of the tables will need to make a judgement call for other low counts, depending on the sensitivity of the table as a whole.
A rate or other statistic may be disclosive if it is possible to infer that the underlying counts contain small numbers, even if the latter are not published. Rates should not be published to a greater precision than is required for the purpose, and a released version of the data (for example in a spreadsheet) should never contain rates of greater precision that have been concealed by formatting.
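To illustrate how a rate can give away its numerator, the sketch below lists every whole-number count consistent with a published rate. It assumes the intruder knows the population denominator and that the rate was calculated as the count divided by the population, multiplied by a constant (here per 100,000 unless stated), then rounded. The figures are hypothetical.

```python
import math


def candidate_counts(published_rate, decimals, population, per=100_000):
    """Whole-number counts consistent with a rate rounded to `decimals` places.

    Assumes rate = count / population * per, rounded to `decimals` places,
    and that the intruder knows the population.
    """
    half_step = 0.5 * 10 ** (-decimals)
    low = (published_rate - half_step) * population / per
    high = (published_rate + half_step) * population / per
    return list(range(max(0, math.ceil(low)), math.floor(high) + 1))
```

With 3 deaths in a population of 250,000, a rate of 1.2 per 100,000 published to one decimal place (`candidate_counts(1.2, 1, 250_000)`) pins the count at exactly 3, whereas the same rate published as 1 with no decimal places leaves 2 or 3 as possibilities. For a population of 2,487, even a rate per 1,000 of 0.40 (`candidate_counts(0.40, 2, 2487, per=1000)`) reveals a single death.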
It should be remembered that differencing any two larger numbers may reveal a small number. It is important to be aware of situations where slivers are likely. In most cases, potentially small residuals such as an “Other” category will be visible in the table and can be assessed for risk within the overall context.
When producing a time series of rolling multi-year aggregates (for example, 2013 to 2015, 2014 to 2016…) care should be taken to avoid revealing disclosive small numbers in the “overlap”. In practice, this means that unless the time periods are non-overlapping, each individual year’s data has to be assessed for disclosure risk before aggregation.
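The sketch below shows why overlapping aggregates need each single year checked. Differencing consecutive three-year totals gives the change between the first year of one window and the last year of the next, so an intruder who already knows the first two single-year figures (for example from releases made before the rolling series began, which is an assumption here) can recover every later year. The annual counts are hypothetical.

```python
def rolling_totals(annual, window=3):
    """Rolling multi-year aggregates, e.g. 2013 to 2015, 2014 to 2016, ..."""
    return [sum(annual[i:i + window]) for i in range(len(annual) - window + 1)]


def recover_annual(rolling, known_first, window=3):
    """Rebuild every single-year figure from rolling totals.

    Uses R[t] - R[t-1] = y[t+window-1] - y[t-1], so knowing the first
    (window - 1) single years is enough to unpick the whole series.
    """
    if len(known_first) != window - 1:
        raise ValueError("need exactly window - 1 known single-year figures")
    annual = list(known_first)
    annual.append(rolling[0] - sum(annual))          # last year of first window
    for t in range(1, len(rolling)):
        annual.append(annual[t - 1] + rolling[t] - rolling[t - 1])
    return annual


# Hypothetical annual deaths for 2013 to 2017 in one small area.
annual = [4, 0, 3, 1, 5]
published = rolling_totals(annual)                    # [7, 4, 9]
print(recover_annual(published, known_first=[4, 0]))
```

Here the published totals of 7, 4 and 9 look safe, but they expose a single death in 2016.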
How are the counts distributed in rows and columns?
For tables of two or more dimensions any rows or columns where all the counts are in one or two cells (the remaining cells are zero) could be considered disclosive. Any intruder who knows one characteristic of an individual would then be able to find out another possibly more sensitive characteristic.
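A simple check for this row and column pattern is sketched below. It flags any row or column whose counts all fall in one or two cells while at least one other cell in that line is zero. The function name and the example table (deaths by area in rows, broad cause of death in columns) are hypothetical, and a flag is a prompt for judgement rather than a decision.

```python
def concentrated_lines(table, max_nonzero=2):
    """Rows and columns whose counts all fall in one or two cells.

    `table` is a list of rows of counts. A line is flagged only if it has
    some counts, at most `max_nonzero` non-zero cells, and at least one zero.
    """
    flagged = []
    columns = [list(col) for col in zip(*table)]
    for kind, lines in (("row", table), ("column", columns)):
        for index, line in enumerate(lines):
            nonzero = sum(1 for v in line if v > 0)
            if 0 < nonzero <= max_nonzero and nonzero < len(line):
                flagged.append((kind, index))
    return flagged


# Hypothetical deaths: rows are four areas, columns are four broad causes.
table = [[12, 9, 7, 5],
         [0, 3, 0, 0],
         [8, 4, 6, 2],
         [5, 6, 3, 4]]
print(concentrated_lines(table))
```

In the example, all 3 deaths in the second area fall under one cause, so anyone who knows that a death occurred in that area learns its cause.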
Can the level of risk be quantified?
Overall, a mean number of individuals/cell for the table is sometimes regarded as a guide as to whether the table is sparse enough to require protection (an average of 1 count/cell is sometimes used). This is a useful guide – but not to be followed without thought, as this method is context free with no consideration given to the nature of the variables defining the table.
Consider a 0, 1 or 2 in any table cell as having potential disclosure risk.
Take into account row and column totals, whether presented in the table or not, when assessing the risk.
A table with many dimensions, variables or small cell counts is likely to be riskier than one with fewer.
Overlapping geographical areas and rolling multi-year aggregates can easily give rise to disclosure through differencing.
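The checklist above can be partly automated. The sketch below, which assumes a two-dimensional table held as a list of rows, flags cells and marginal totals of 0, 1 or 2 and reports the mean count per cell used as the rough guide mentioned above. It does not consider the sensitivity of the variables, which still needs human judgement, and the function name is hypothetical.

```python
def screen_table(table, low=(0, 1, 2)):
    """Quick risk screen of a two-dimensional frequency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_cells = len(table) * len(table[0])
    return {
        # Cells with a count of 0, 1 or 2.
        "low_cells": [(r, c) for r, row in enumerate(table)
                      for c, v in enumerate(row) if v in low],
        # Marginal totals matter whether or not they are shown in the table.
        "low_row_totals": [r for r, t in enumerate(row_totals) if t in low],
        "low_column_totals": [c for c, t in enumerate(col_totals) if t in low],
        # Rule-of-thumb sparsity measure: mean count per cell.
        "mean_per_cell": sum(row_totals) / n_cells,
    }


print(screen_table([[0, 1, 4], [2, 0, 3]]))
```

The example has a mean of about 1.7 per cell, above the rule-of-thumb value of 1, yet four of its six cells and two of its column totals are 0, 1 or 2; this is why the mean should not be relied on alone.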
Annex C describes how to apply the approach here to example tables.
4.3 Strategies for disclosure control
4.3.1 Methods of table protection
If the output is assessed and found to be disclosive, a decision has to be made on what needs to be done for the data to be released. It is important to consider the purpose and user(s) of the data. For a regular release, it will be necessary to ensure that analysis carried out on each annual (or, for example, quarterly) release is not hindered excessively by any data protection strategies. For ad-hoc requests, it is important to discuss requirements with the user. Do they need a particular sensitive category for their work, or (for example) would a higher-level geography suffice?
A number of techniques can be applied to protect outputs. These all “damage” the data in some way and thus reduce the utility of the statistics. The aim is to try to ensure that the output is still of use to the user without disclosing protected information.
Standard applicable techniques can be divided into those which are perturbative (change or hide the ‘true’ figures in some way) and those which are non-perturbative. The most common non-perturbative approach is to redesign the table. There are a number of ways to do this.
Collapse categories, so that categories with low counts are merged with neighbouring ones. For example, if causes of death (CoD) are requested it may be possible to combine the least common categories with similar ones, by presenting CoD using a higher level of the classification hierarchy. (It should be noted that collapsing CoD categories aims here to increase the numbers in a table cell. A small number cannot itself be made non-disclosive only by making the CoD less specific.)
Collapse breakdowns over the whole table, such as combining data years (either summing or averaging the data as more appropriate), combining sexes or using broader age groups. However, multi-year aggregation is not necessarily protective if the time periods overlap in a way that reveals figures for a single year.
Collapse or recode selected categories or breakdowns, such as younger age groups where deaths are sparser, leaving unchanged others where the counts are higher. A common way to do this is to broaden only the youngest and perhaps the oldest age groups, depending on the nature of the data (that is, apply top and/or bottom coding).
Anonymise categories by aggregating/recoding, for example replace a list of named areas with a breakdown by a relevant geographically-related characteristic such as deprivation quintile of the areas.
Split a table which has multiple dimensions into two or more separate tables, each with only part of the information. For example, births by area, mother’s age and birthweight might be split into (a) mother’s age by area, and (b) birthweight by area. Each table then has to be re-assessed for disclosure risk.
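Recoding amounts to mapping detailed categories onto broader ones and summing. The sketch below applies bottom coding to hypothetical counts of deaths by age group, merging the sparse youngest groups into a single band; the category labels and mapping are assumptions for illustration. As noted above, the recoded table still has to be re-assessed.

```python
from collections import Counter


def recode(counts, mapping):
    """Sum counts into broader categories.

    `mapping` sends each detailed category to a broader one; categories not
    listed in `mapping` are kept unchanged.
    """
    collapsed = Counter()
    for category, n in counts.items():
        collapsed[mapping.get(category, category)] += n
    return dict(collapsed)


# Hypothetical deaths by five-year age group in a small area.
counts = {"0 to 4": 1, "5 to 9": 0, "10 to 14": 2, "15 to 19": 3, "20 to 24": 7}
# Bottom coding: merge the youngest groups into one broader band.
mapping = {"0 to 4": "0 to 19", "5 to 9": "0 to 19",
           "10 to 14": "0 to 19", "15 to 19": "0 to 19"}
print(recode(counts, mapping))
```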
Recoding is a straightforward method of applying disclosure control in order to enable publication. No actual counts are altered but there may be a loss of information due to a reduction in the number of categories. There are also some disadvantages.
Certain categories are always used in standard releases. For example, age groups may have to be 0 to 4 years, 5 to 9 years and so on up to 80 to 84 years, then 85 years and over. Tables cannot be redesigned by altering these categories if inconsistencies in time series or confusion among users might result.
If similar tables are published using different categories, there is potential for disclosure by differencing. Subtracting one table from another could leave small geographies or single age groups with low counts.
Annex D contains examples of standard tabulation groupings used by ONS. These are not mandatory, but may be helpful to promote consistency and reduce the risk of differencing.
A common perturbative method is suppression. Here, low counts in a table are replaced by a symbol such as a “:” or “c”. In order to ensure that the value cannot be discovered by differencing from the row or column total, additional cells will have to be suppressed (secondary suppression).
One advantage of this method is that no data values are altered in the table: all values in the published table are actual values. However, if a large number of cells require suppression, many additional cells will also need to be suppressed as a consequence. The resulting loss of information will be high and any resulting analysis could be compromised. In addition, suppression symbols may not be handled easily by publication or analysis software.
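The following is a naive sketch of primary and secondary suppression for a two-dimensional table whose row and column totals are published. It suppresses counts of 1 and 2 (zeros are left visible, which is an assumption) and then, for any row or column containing exactly one suppressed cell, suppresses the smallest remaining cell, preferring non-zero ones. This greedy heuristic is not intended to represent the method used by ONS or by dedicated tools such as τ-ARGUS, which use optimisation, and it does not check the ranges an intruder could still deduce for hidden cells.

```python
def suppress(table, threshold=3, symbol=":"):
    """Primary suppression of counts from 1 to threshold - 1, then naive secondary suppression."""
    n_rows, n_cols = len(table), len(table[0])
    hidden = {(r, c) for r in range(n_rows) for c in range(n_cols)
              if 0 < table[r][c] < threshold}
    lines = ([[(r, c) for c in range(n_cols)] for r in range(n_rows)]
             + [[(r, c) for r in range(n_rows)] for c in range(n_cols)])
    changed = True
    while changed:
        changed = False
        for line in lines:
            # A single hidden cell in a line could be recovered from the line total.
            if sum(cell in hidden for cell in line) == 1:
                candidates = [cell for cell in line if cell not in hidden]
                if candidates:
                    # Prefer non-zero cells, then the smallest, to limit information loss.
                    pick = min(candidates,
                               key=lambda rc: (table[rc[0]][rc[1]] == 0, table[rc[0]][rc[1]]))
                    hidden.add(pick)
                    changed = True
    return [[symbol if (r, c) in hidden else table[r][c] for c in range(n_cols)]
            for r in range(n_rows)]


print(suppress([[5, 1, 8], [7, 6, 9], [4, 10, 12]]))
```

In the example a single primary cell leads to five secondary suppressions, illustrating the loss of information described above.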
Rounding each cell count is another perturbative method. There are a number of rounding approaches, the most common being to round to the nearest multiple of a small number such as 5 or 10. Rounding is a useful technique when the table consists of large counts. However, for sparse tables with many low frequencies, information loss is likely to be high.
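Rounding to the nearest multiple of a base can be sketched in one line of integer arithmetic. The version below rounds halves up; the base of 5 is only the example given above. Random or controlled rounding are common alternatives and are not shown.

```python
def round_to_base(count, base=5):
    """Round a non-negative count to the nearest multiple of `base` (halves round up)."""
    return base * ((count + base // 2) // base)


cells = [1, 1, 1, 1, 1, 1]
rounded_cells = [round_to_base(n) for n in cells]
rounded_total = round_to_base(sum(cells))
print(rounded_cells, rounded_total)
```

Because cells and totals are rounded independently, the table may no longer add up: six cells of 1 each round to 0, while their total of 6 rounds to 5.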
In the case of statistics such as mortality rates, rounding may sometimes be applied to reduce the precision of the published rates so that underlying small numbers can no longer be inferred. This should be done only if the interpretation of the figures will not be significantly compromised as a result.
4.3.2 Using a Non-Disclosive Data Access Agreement
If none of the feasible means of disclosure control (as above) can be applied without making the data unusable for the intended purpose, an alternative option may be to release the table under a Non-Disclosive Data Access Agreement (DAA). A DAA is appropriate in situations where:
provision of the data to a customer is in the public interest, and
there is a higher than acceptable risk of disclosure, but
not if the data are clearly disclosive or so detailed as to effectively constitute microdata (individual records). In the latter cases, microdata release procedures must be followed instead
Examples could include tables where the user requests individual ages or a detailed cause of death at a detailed geographical level. Outputs such as these are not necessarily disclosive but the level of risk is higher than for the majority of publications.
If the data are released under a DAA, the tables should not be published on a website or made available to any other user (except following a similar request to ONS). The list of people with access to each table should be checked regularly to ensure that disclosure by differencing one table from another is not a possibility.
A non-disclosive DAA has a similar purpose to those used in microdata release procedures, namely to document:
the data supplied: details of the table requested and supplied, including variables, categories and date of data collection
purpose of the data supply and reasons why full disclosure control was not feasible.
name of the main contact and any others who will have access to the data, with their institutional affiliations and roles.
arrangements for keeping the data secure
obligations of and prohibitions on the user(s), including:
- Use of data to follow the DPA, SRSA and any conditions put on it by the provider
- Obligation to follow the disclosure control policy in any publications
- Prohibition on any attempt to identify individuals, or match or link the data to any other data source
arrangements at end of access period. Will the data be returned or destroyed?
A model non-disclosive DAA is provided with this policy in a separate download, Data access agreement for non-disclosive data template (PDF, 212 KB).
4.4 Explorable datasets and online tools
Births and deaths data are usually published either as standard releases or in response to ad-hoc requests to the provider from researchers with a specific interest. As government policy requires more data to be made available, the public is likely to have greater opportunity to access births and deaths data.
For many users, the ideal solution may be to access the data online through portals such as NOMIS and create their own tables. This enables casual users to spend time understanding what they could gain from using these data, while experienced users can quickly target specific tables of interest. Some ONS life events data have been available on NOMIS, experimentally, since 2016.
Disclosure control will need to be applied prior to the user receiving the table. ONS and other providers will have limited control of the outputs created, with users able to generate multiple tables and potentially to attempt to breach confidentiality by specifying very detailed or overlapping outputs. As a result of this, it is likely that any disclosure control applied to online data will have to be stricter than is applied to outputs which the provider is able to assess individually for risk.
There are two basic approaches to ensuring the confidentiality of data in such an online scenario:
To apply disclosure control to the underlying dataset, for example in the form of a large multi-dimensional table or data cube. This option ensures effective disclosure control, but limits the usefulness and flexibility of the service to users. It is also potentially more difficult to produce.
To include the underlying data in the online database without disclosure control, and have the table production or querying tool apply disclosure control at the point of presenting results to the user. This can be implemented relatively simply by rounding or suppressing cells below an agreed threshold, with secondary rounding or suppression.
However these datasets are made available, a cautious approach to disclosure control is recommended. A higher minimum frequency will be necessary to ensure that comparisons with previously published tables do not lead to individuals and their characteristics being identified. A minimum cell count of five is suggested.
It is also recommended that any online database or tool which allows users to extract data flexibly and with possible potential for disclosure should:
require user registration with a validated email address, name and contact details, and keep a log of data requests by user which is audited for potentially suspicious activity
require users to read and acknowledge a statement on their legal obligations
if possible, place a limit on the number of similar tables which can be requested, to avoid disclosure by differencing.
Comparing similar tables where the categories for one of the variables are coded differently could result in slivers containing low counts becoming apparent.
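The recommendations above could be combined in a query gateway along the lines of the sketch below. Everything here is hypothetical: the class, the rule that two requests are "similar" when they use the same set of variables, the default limit of three similar tables, and the assumption that cell counts have already been computed from the underlying data. Cells below the suggested minimum of five are replaced by "c"; on its own this is not sufficient, and a real tool would also need secondary suppression or rounding.

```python
from collections import defaultdict


class QueryGate:
    """Hypothetical gatekeeper for a flexible online table tool."""

    def __init__(self, registered_users, max_similar=3, min_count=5):
        self.registered = set(registered_users)
        self.max_similar = max_similar
        self.min_count = min_count
        self.log = []                          # (user, variables) kept for audit
        self.similar_counts = defaultdict(int)

    def request(self, user, variables, cells):
        """Return protected cells, or raise PermissionError if the request is refused."""
        if user not in self.registered:
            raise PermissionError("user is not registered")
        # Crude similarity rule (assumption): same set of variables.
        key = (user, frozenset(variables))
        if self.similar_counts[key] >= self.max_similar:
            raise PermissionError("limit on similar tables reached")
        self.similar_counts[key] += 1
        self.log.append((user, tuple(sorted(variables))))
        # Threshold only; secondary protection is not shown here.
        return {k: ("c" if 0 < v < self.min_count else v) for k, v in cells.items()}
```

The log kept by the gateway is what would be audited for potentially suspicious activity.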
The use of online data access tools is still developing, and appropriate controls will have to be kept under review.
Statistical disclosure control covers a range of methods to protect individuals, households, businesses and their attributes (characteristics) from identification in published tables (and microdata). There is a large literature base now established on disclosure risk, disclosure control and its methodology, notably Hundepool and others (2012). Moreover, this is underpinned by legislation and policy specific to different countries, statistical institutes, public and private bodies. This section considers those applicable to the UK that are relevant to the provision of statistics on births and deaths.
A.2 EU General Data Protection Regulation and Data Protection Act 2018
The EU General Data Protection Regulation (GDPR) and the Data Protection Act (DPA) 2018 replaced the DPA 1998 from 25 May 2018. Government Statistical Service (GSS) guidance specific to the processing of data by government departments and production of statistics has been produced.
The GDPR is known in full as “Regulation (EU) 2016/679 of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC”. There are important changes especially with regard to the rights of the individual data subject, the need for data controllers and processors to actively demonstrate compliance, and the penalties for breaches.
The Data Protection Act 2018 covers those elements of data protection which are excluded from the GDPR (for example, on national security), defines various issues which have been left by the regulation for national decision, and sets out organisational arrangements in the UK. These matters include the operation of exemptions and derogations.
The GDPR applies to “personal data”, meaning any information relating to an identifiable person who can be directly or indirectly identified, in particular by reference to an identifier. Personal data that has been pseudonymised can fall within the scope of the GDPR, depending on how difficult it is to attribute the pseudonym to a particular individual.
There are exemptions for personal data which are kept solely for processing for archiving purposes and for scientific or historical research and statistical purposes. The processing and release of births and deaths data will generally come under the statistical exemption, and for disclosure control, there is little change from the previous legislation.
However, the GDPR puts much greater responsibility on data controllers and processors to actively demonstrate their compliance, for example through written policies, comprehensive record-keeping, and audit trails of data release actions with their justification.
The main provisions relevant to disclosure control are similar to the DPA 1998, in that all data providers (whether data controllers or data processors) must consider whether any processing or release might cause substantial damage or substantial distress to a living individual. In the case of births this includes the baby and its parents, but anyone mentioned in the data is covered – for example the person who registered a death (the informant).
The possibility of substantial damage or distress to the living relatives of those who have died also needs to be considered, and the Information Commissioner’s Office has provided guidance on this latter point. For example, a disclosure about the marital or legal partnership status of a deceased person also relates to potentially identifiable information about another individual who may be still living, their spouse or civil partner, that may relate to ‘sexual life’ as a sensitive variable.
It should be noted that particular care should be taken when dealing with “special personal data”, which is information coming into any of the following categories (and some others):
racial or ethnic origin
religious or philosophical beliefs
sex life or sexual orientation
The processing of special personal data is only allowed in certain circumstances, which include where there is substantial public interest, or where the processing is necessary for statistical purposes.
The data provider has to consider privately held information as well as that in the public domain. The data provider must take account of all reasonable sources that might be used to try to identify an individual. If it is possible to identify an individual in the data using either public or private sources, the data are termed “personal data” under the DPA, not to be confused with “personal information” under the SRSA (see section A.3). So if it is reasonable to expect that a statistical output will be seen by someone with private information that allows an individual to be identified, and that this would cause substantial damage or distress, the data provider should consider withholding that release.
A.3 Statistics and Registration Service Act 2007
A.3.1 The effect of Section 39
ONS has legal obligations to protect the confidentiality of data it holds, under the Statistics and Registration Service Act 2007 (SRSA). A disclosure could lead to criminal proceedings against an individual who has released or authorised release of personal information, as defined under Section 39 of the SRSA. Any person who receives data from ONS is, in turn, bound by these provisions of the Act.
The SRSA defines “personal information” as information that identifies a particular person if the identity of that person:
(a) is specified in the information,
(b) can be deduced from the information, or
(c) can be deduced from the information taken together with any other published information.
Personal information as in (a) would occur if there are direct identifiers in the data, such as name, address, date of birth, NHS number, specific location (e.g. 6-digit northings and eastings, so to within the nearest metre – though in a sparsely populated area less precise geographical data could still be disclosive).
Under (b) it would occur where there is no direct identifier, but characteristics are so distinctive that the individual could be well known and easily recognised, e.g. a male aged over 65 living in a caravan in a relatively small geographical area.
Under (c) it would occur for example, where the information could be matched against other information in the public domain, such as on a business or personal website, press and media articles, or other statistical information from the same or alternative source.
There are situations where information can be disclosed, for example where it has already lawfully been made publicly available, is made with consent of the person, or to an Approved Researcher. Note that it is not a breach under the SRSA (as opposed to the DPA) to release information that could lead to an identification of an individual, where private knowledge is necessary in order to make that identification.
The distinction is thus made between:
a disclosure that is possible for a limited number of individual(s), because they have some information that is known only to those individual(s)
a disclosure that is possible since there is some information available in the public domain.
A.3.2 A note on birth and death certificates
Birth and death certificates are “discoverable” – that is, it is possible for anyone to obtain one or more certificates from the General Register Office (GRO). This might be thought to mean that the information contained in them, which makes up a large part (though not all) of the births and deaths data, is at least in theory “publicly available” and not subject to disclosure control.
However, the fact that the information can be obtained only by applying through a statutory procedure and paying a fee means that it is not legally in the public domain, and is not “published information” for the purposes of the SRSA. At the same time, the legal framework is such that ONS (and any other provider of statistics to whom ONS has supplied data) has no power to publish personal data about any individual such as would appear on the birth or death certificate: that is the sole province of GRO. As a result, all birth and death variables are equally confidential from the point of view of the SRSA and disclosure control as described in this guidance always applies.
A.3.3 Reasonable belief and due process
In practice, it will often be impossible for a data provider to consider every possibility of disclosure using publicly available sources. The key point, in s39(9) and s39(10), is that where the individual reasonably believes the release is not a disclosure, they are not committing an offence. To “reasonably believe” this, the individual should be able to demonstrate that they have followed due process and approved protocols, taken reasonable precautions and not been reckless in the disclosure.
A.4 Freedom of Information Act 2000 (FOIA)
The Freedom of Information Act 2000 (FOIA) makes provision for the disclosure of information held by public authorities in response to a request from a member of the public, and information must normally be supplied within 20 working days of receipt of the request. However, supplying statistical information in response to a “freedom of information” (FOI) request does not override concerns about the confidentiality of information about individuals. Information is exempt from release if it constitutes personal data under the Data Protection Act.
Where information is refused, the response to the applicant has to state that the reason for refusal is an exemption under FOIA Section 40. In all FOI cases, the data provider should contact their legal team or specialist team dealing with FOI requests for specific advice.
A.5 UK Statistics Authority Code of Practice for Official Statistics (CoP)
Compliance with the Code of Practice for Official Statistics is a statutory requirement for all UK bodies that are responsible for official statistics. The Code was revised in February 2018. The code says that:
T6.1 – All statutory obligations governing the collection of data, confidentiality, data sharing, data linking and release should be followed.
T6.4 – Organisations should be transparent and accountable about the procedures used to protect personal data when preparing the statistics and data, including the choices made in balancing competing interests. Appropriate disclosure control methods should be applied before releasing statistics and data.
This section explains the types of disclosure risk that a data provider should consider before releasing statistical outputs in the form of frequency tables. Not all will always apply. The balance of risk and utility (the value of the statistics to users) and some other considerations are also discussed.
B.2 Types of disclosure risk
R1: Identity disclosure
Identification as a disclosure risk involves finding oneself or another individual or group within a table. Many data providers will not consider that self-identification alone poses a disclosure risk. An individual that can recall their circumstances at the time of data collection will be likely to be able to deduce to which cell in a published table their information contributes. In other words, they will be able to identify themselves but only due to knowing all the attributes that are present in the output, as provided at the time of collection, along with any other information about themselves which may assist in this detection. It is a moot point whether identification means recognising that you are the 1 person in a table cell, or 1 of 10 people in a table cell. Generally, the former position is taken, given it can reveal an attribute that is absent in the rest of the population.
Identification or self-identification can thus lead to the discovery of rareness, or even uniqueness, in the population of the statistic, which is something an individual might not have known about themselves before. This is most likely to occur where a cell has a small value, for example a 1, or where it becomes in effect a population of 1 through subtraction or deduction using other available information. For certain types of information, particularly those that are sensitive or newsworthy, rareness or uniqueness may encourage others to seek out the individual or their relatives (in the case of the statistic relating to the individual’s death). The threat or reality of such a situation could cause harm or distress to the individual, or may lead them to claim that the statistics offer inadequate disclosure protection for them, and therefore others.
When thinking about the likelihood of this, one might compare, for example, the only woman in an output area who gave birth aged 25 to 29 years with the only woman in a local authority who gave birth aged 14. The former may be more likely to be identified by other people resident in that area (who might know all the information anyway). The latter may be more difficult to identify, but may be more subject to (unwanted) local and media interest.
Identification itself poses a relatively low disclosure risk, but its tendency to lead to other types of disclosure, together with the perception issues (see Risk R6) it raises means that many data providers choose to protect against identification disclosure.
R2: Individual attribute disclosure
Attribute disclosure involves the uncovering of new information about a person through the use of published data. An individual attribute disclosure occurs when someone who has some information about an individual could, with the help of data from the table (or from a different table with a common attribute, perhaps even from a different data source), discover details that were not previously known to them. This is most likely to occur where there is a cell containing a 1 in the margin of the table and the corresponding row or column is dominated by zeros. The individual is identified on the basis of some of the variables spanning the table and a new attribute is then revealed about the individual from other variables. Note that identification is a necessary precondition for individual attribute disclosure to occur, and marginal totals of 1 should therefore be avoided.
This type of disclosure is a particular problem when many tables are released from one data set. If an intruder can identify an individual, then additional tables provide more detail about that person. Full coverage sources, like the census or, in this case, birth and death registrations, are a particular concern for disclosure control because of their compulsory nature, which creates an expectation that all eligible individuals will be found in the output. Although there may be some missing data, coding errors and so on, data collectors and providers work to minimise these, and the data issues are unlikely to be randomly distributed in the output. Some SDC techniques can be adjusted to target particular variables (or tables) with more or less inherent data error. For example, a data provider could employ more cell suppression for variables that are known to be of better quality and have fewer data issues.
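The check for marginal totals of 1 described under individual attribute disclosure can be automated as a first screen. The following Python sketch is illustrative only: it assumes a two-way table held as a rectangular list of lists of counts (a hypothetical layout, not an ONS format) and reports every row or column whose total is 1, since such a margin identifies a single event whose other attributes the table then reveals.

```python
def marginal_ones(table):
    """Return (rows, cols): indices of rows and columns whose total is exactly 1.

    `table` is a rectangular list of lists of non-negative integer counts,
    with one classifying variable on the rows and another on the columns
    (a hypothetical layout used for illustration).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    rows = [i for i, total in enumerate(row_totals) if total == 1]
    cols = [j for j, total in enumerate(col_totals) if total == 1]
    return rows, cols


# Illustrative counts, not real data. Row 1 has a margin of 1, so anyone who
# knows that person's row category also learns their column category.
example = [
    [4, 2, 3],
    [0, 1, 0],
    [5, 0, 6],
]
print(marginal_ones(example))  # ([1], [])
```

Passing this screen does not make a table safe on its own: a margin of 1 can also emerge in a table derived from other outputs by subtraction.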
R3: Group attribute disclosure
Another disclosure risk involves learning a new attribute about an identifiable group, or learning that a group does not have a particular attribute. It is an extension of individual attribute disclosure and is termed group attribute disclosure, though often just “attribute disclosure”. It can occur when all respondents fall into a subset of categories for a particular variable, that is, where a row or column contains all zeros except one non-zero cell count. This type of disclosure is a much-neglected threat to the disclosure protection of frequency tables and, in contrast to individual attribute disclosure, it does not require individual identification. To protect against group attribute disclosure, it is essential to avoid a situation where all events fall into only one category (all other categories being zero) whenever this carries any associated disclosure risk.
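A simple screen for group attribute disclosure looks for rows or columns in which every event falls into a single category. The Python sketch below assumes the same kind of hypothetical list-of-lists table of counts; whether a flagged line is actually a problem still depends on whether the revealed attribute carries any disclosure risk.

```python
def single_category_lines(table):
    """Return (rows, cols) in which all events share one category.

    A line is flagged when it has exactly one non-zero cell, so that knowing an
    event belongs to that row (or column) reveals its value on the other variable.
    """
    def one_nonzero(line):
        return sum(1 for v in line if v != 0) == 1

    rows = [i for i, row in enumerate(table) if one_nonzero(row)]
    cols = [j for j, col in enumerate(zip(*table)) if one_nonzero(col)]
    return rows, cols


# Illustrative counts: all 7 events in row 0 fall into column 1.
# Columns 0 and 2 are also flagged, because all their events fall into row 1.
example = [
    [0, 7, 0],
    [3, 2, 4],
]
print(single_category_lines(example))  # ([0], [0, 2])
```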
R4: Within group disclosure
This occurs when all respondents fall into two response categories for a particular variable and one of these response categories has a cell value of 1. In this case, the person in the cell with a 1 can discover the value of the attribute for all other individuals in the row or column. It combines identity and attribute disclosure: where a row or column has two non-zero entries, one of which is a 1 relating to oneself, it is possible to learn something new about a number of other respondents.
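Within-group disclosure can be screened for in a similar way. This Python sketch, again assuming a hypothetical list-of-lists table of counts, flags any row or column with exactly two non-zero cells where one of them is 1: the single person in that cell can infer that everyone else in the line shares the other category.

```python
def within_group_lines(table):
    """Return (rows, cols) with exactly two non-zero cells, one of which equals 1."""
    def risky(line):
        nonzero = [v for v in line if v != 0]
        return len(nonzero) == 2 and 1 in nonzero

    rows = [i for i, row in enumerate(table) if risky(row)]
    cols = [j for j, col in enumerate(zip(*table)) if risky(col)]
    return rows, cols


# Illustrative counts: the one person counted in row 0, column 0 can infer
# that the other 6 people in row 0 all fall into column 2.
example = [
    [1, 0, 6],
    [2, 3, 4],
]
print(within_group_lines(example))  # ([0], [0])
```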
R5: Disclosure by differencing
This is one of the key disclosure scenarios in births and deaths statistics. Differencing involves an intruder using two or more overlapping tables and subtraction to gather additional information about the differences between them. A disclosure by differencing can occur when this comparison of two or more tables enables a small cell (0, 1, or perhaps 2, but most specifically 1) to be calculated. Disclosures by differencing can result from three different scenarios which will be explained in turn.
Disclosure by geographical differencing may result when there are several published tables from the same dataset and they relate to similar geographical areas. If these tables are compared they can reveal a new, previously unpublished table for the differenced area, which might contain one or more of the previously mentioned disclosure risk types. For example, providing statistics for one middle super output area (MSOA) and for some (but not all) of its constituent lower super output areas (LSOAs) might lead to a disclosure risk pertaining to the LSOA(s) not supplied. Other examples could be providing outputs for both MSOAs and wards, or for local authorities and clinical commissioning groups (CCGs). It is usually worth calculating the differenced table to be sure.
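Calculating the differenced table, as recommended above, is a matter of subtraction. The Python sketch below assumes each published table is a mapping from category (here, hypothetical age groups) to count; it subtracts the published sub-area tables (for example LSOAs) from the parent area table (for example an MSOA) to reconstruct the table for the sub-area that was not supplied.

```python
from collections import Counter


def differenced_table(parent, published_parts):
    """Reconstruct the unpublished remainder of `parent` by subtraction.

    parent: mapping of category -> count for the larger area (e.g. an MSOA).
    published_parts: list of mappings for published sub-areas (e.g. LSOAs).
    """
    remainder = Counter(parent)
    for part in published_parts:
        remainder.subtract(part)  # keeps zero and negative results
    if any(v < 0 for v in remainder.values()):
        # The tables do not describe the same events (e.g. different periods).
        raise ValueError("published parts exceed the parent table")
    return dict(remainder)


# Illustrative deaths by age group (not real data).
msoa = {"0 to 64": 12, "65 to 84": 30, "85 and over": 20}
lsoa_a = {"0 to 64": 5, "65 to 84": 10, "85 and over": 8}
lsoa_b = {"0 to 64": 6, "65 to 84": 12, "85 and over": 7}
remainder = differenced_table(msoa, [lsoa_a, lsoa_b])
print(remainder)  # {'0 to 64': 1, '65 to 84': 8, '85 and over': 5}
print([k for k, v in remainder.items() if v == 1])  # ['0 to 64']
```

The resulting table can then be put through the same checks as any published table.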
Disclosure by linking can occur when published tables relating to the same base population are linked by common variables. These new linked tables were not published, but could be constructed by an intruder, and therefore may reveal the statistical disclosure control methods applied and/or unsafe cell counts. Importantly, when linked tables are produced from the same dataset it is not sufficient to consider the protection for each table separately. If a cell requires protection in one table then it will require protection in all tables, otherwise the protection in the first table would be undone. Note that the ability to link a statistical output to a previously published press or media article could constitute a disclosure if the output included a variable that was not previously in the public domain. ONS recommends that a cautious approach should be taken when a very sparse table affords possible opportunities for attribute disclosure by linkage to news reports about deaths or other publicly available information.
The last type of disclosure by differencing involves differencing of sub-population tables. Sub-populations are specific groups into which data may be subset before a table is produced. Differencing can occur when a published table definition corresponds to a sub-population of another published table, resulting in the production of a new, previously unpublished table. If the total population is known and the sub-population is gathered from another table, the remainder can be deduced. For example, a table of births may use a population of “all live births to mothers aged 16 to 44 years”, while another might use the base of “all live births”; subtracting the first from the second gives live births to mothers aged under 16 or 45 and over, which will often be small counts.
Tables based on categorical variables that have been recoded in different ways may also result in this kind of differencing. To reduce the disclosure risk resulting from having many different versions of variables, it is good practice, where possible, to have a set of standard classifications to use to release data.
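The sub-population deduction described above can be run across areas before release. In the Python sketch below, the two input mappings (area name to count) and the threshold of 2 are hypothetical; it derives, for each area, the count for the unpublished remainder (for example, live births to mothers outside the 16 to 44 age range) and reports the areas where that count is 1 or 2.

```python
def small_remainders(total_by_area, subpop_by_area, threshold=2):
    """Return {area: derived count} where total minus sub-population is 1..threshold.

    Derived zeros are not reported here, although a zero can also be
    disclosive (see group attribute disclosure).
    """
    flagged = {}
    for area, total in total_by_area.items():
        remainder = total - subpop_by_area.get(area, 0)
        if 1 <= remainder <= threshold:
            flagged[area] = remainder
    return flagged


# Illustrative figures only; area names are hypothetical.
all_live_births = {"Area A": 1520, "Area B": 980, "Area C": 2210}
births_mothers_16_to_44 = {"Area A": 1519, "Area B": 971, "Area C": 2208}
print(small_remainders(all_live_births, births_mothers_16_to_44))
# {'Area A': 1, 'Area C': 2}
```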
R6: Perception of disclosure risk
In addition to providing actual disclosure control protection for sensitive information, data providers need to be seen to be providing this protection. The public may have a different understanding of disclosure control risks and their perception is likely to be influenced by what they see in tables. There are two types of perception that are considered as threats:
the perception that because there are small counts, no protection (or at least insufficient protection) has been applied
the perception of vulnerability by the individual who has identified themselves, that others can also identify them, and either find out something new about them or target them for various purposes, for example as the subject of an interesting or newsworthy story
To protect against negative perceptions, data providers should be transparent about the SDC methods applied. Managing perceptions is important to maintain credibility and responsibility towards respondents. Negative perceptions may impact future co-operation if respondents, users or the public perceive that there is little concern about protecting their confidentiality. More emphasis has been placed on this type of disclosure risk in recent years due to declining response rates and decreasing data quality. It is important to provide clear explanations to the public about the protection afforded by the SDC method, as well as guidance on the impact of the SDC methods on the quality and utility of the outputs. Explanations should provide details of the methods used but (unless the method is obvious, for example suppression of all counts lower than 3, or rounding to base 5) avoid stating the exact values of parameters as this may allow intruders to partly or wholly unpick the protection.
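The two methods this paragraph cites as obvious, suppression of counts lower than 3 and rounding to base 5, are simple to express. The Python sketch below is illustrative and is not a description of ONS production code; it uses conventional rounding to the nearest multiple of the base and marks suppressed cells as None.

```python
def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`.

    With base 5 no ties are possible for integers (remainders 0 to 2 round
    down, 3 and 4 round up); for an even base, halves round up.
    """
    return base * ((count + base // 2) // base)


def suppress_below(counts, threshold=3):
    """Primary suppression: replace any count below `threshold` with None."""
    return [None if c < threshold else c for c in counts]


counts = [0, 1, 2, 3, 7, 12, 18]
print([round_to_base(c) for c in counts])  # [0, 0, 0, 5, 5, 10, 20]
print(suppress_below(counts))  # [None, None, None, 3, 7, 12, 18]
```

Primary suppression alone is rarely enough where totals are also published: if a row total is shown and only one cell in that row is suppressed, the suppressed value can be recovered by subtraction, so further (secondary) suppression or another method is needed.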
B.3 Balance of risk and utility
It is unreasonable to aim to completely remove any disclosure risk from statistical outputs. Indeed, it is impossible for anyone to have a complete record of all public and private data sources against which one might difference, or to which one might link. Creating outputs with zero risk would probably be useless for users and researchers and serve little purpose of any sort. The SRSA s7 refers to the objective of promoting and safeguarding the production and publication of official statistics that serve the public good. The reference to serving the public good includes in particular (a) informing the public about social and economic matters, and (b) assisting in the development and evaluation of public policy. Insisting on zero, or even very low risk, within birth and death statistical outputs often jeopardises this important (legislated) function. So the law does require us to consider what can be released as well as what must be protected.
B.4 High profile cases and sensitive circumstances
It is sometimes thought that special treatment is needed for high-profile individual cases, such as the birth of a baby in the Royal Family or the death of a TV celebrity. It should be noted that, on the one hand, there will be greater public interest and therefore perhaps more motivation for an intruder to seek further information; but, on the other hand, much information (say, the broad cause of death or the baby’s birthweight) will already have been released into the public domain. Generally speaking, no efforts should be made either to find these cases in the data or to protect them from disclosure in any way differently from other cases.
Causes of death which may have previously been considered especially “sensitive”, for example suicides, AIDS or HIV-related deaths or maternal deaths, should be treated in the same way as all other causes of death. There is a requirement under the DPA to consider the possibility of “substantial harm or distress” (for example to living relatives) as a result of releasing sensitive personal information. In these cases, ONS considers that the risks of disclosure or of causing harm or distress by publishing small numbers are normally outweighed by the public importance of these statistics. However, if there is a clear risk of exposing information about a living individual, confidentiality under the DPA will take precedence.
The following examples show cases where there is potential disclosure, both lawful and unlawful. There are low counts in most tables leading to the possibility of individuals in the data being identifiable. The examples show how identification and attribute disclosure differ and are intended to help the data provider to assess the level of risk and determine if a particular table can be published as it stands.
The tables referred to are contained in a separate Excel download, Protecting confidentiality in tables of birth and death statistics, illustrative examples which has been published with this policy. Where appropriate, the original figures have been altered to prevent disclosure.
C1. Counts of births and deaths by Output Area (OA) in England and Wales, July 2013 to June 2014
A series of one-dimensional tables: counts of births by OA for males and for females, and counts of deaths by OA for males and for females.
Variables are counts of an event (birth or death) with no further breakdown
The level of geography is very detailed; Output Areas contain approximately 300 people.
As there is no breakdown in the cause of death or issues surrounding the births it is unlikely that any similar tables could be used to enable disclosure by differencing
There are low counts in the table with many 0s and 1s; this is to be expected when the low level of geography is considered.
Counts are distributed without following any obvious pattern between OAs; most counts are <10.
It might be thought that the detailed geography would be a confidentiality concern. However, although identification is not difficult where there are unique cases at this geographical level, the intruder would not find anything other than the fact that a small number of births or deaths occurred in a particular small area. This shows that considering counts only without context is not a suitable approach. This table can be published.
C2. Hydrogen sulfide suicides by sex and five-year age group, England and Wales, occurred 2007 to 2013
Two dimensions (year of death by age group).
Variable is number of suicides by a very specific definition; this is sensitive information. How much distress might the release cause?
High level of geography - England and Wales; this might make it difficult for an intruder to identify anybody with confidence.
Unlikely to be similar tables as this is a specific request; there may be details of inquests in local media or online.
There are low counts in the table. Most cells are 0, 1 or 2; in some years there are no suicides due to this cause.
Distribution looks random, no obvious pattern.
The high level of geography should ensure that these data can be published. The output displays sensitive information but it is difficult to see how this can be easily connected with an individual and thus cause distress to a relative.
C3. Number of dentist suicides for the period 1995 to 2014 by sex and 5-year age breakdowns
Three dimension table (year by sex by age group).
Variable is a count of suicides in a small defined population, this is sensitive information; identification of an individual could be considered to be unlawful.
High level of geography - England and Wales - but underlying population (one occupation) is relatively small
Very low counts, mainly zeros.
No discernible pattern to the table
The high level of geography affords some protection but this needs to be considered alongside the interest that an intruder could have in a table with such specific information. Overall the table is very sparse and it would be very difficult for an intruder to relate their knowledge about the death of a specific dentist to a count in this table. This table can therefore be published.
C4. Suicides for 10- to 25-year-olds in Leicester, East Midlands and England 2001 to 2013
Three tables of two dimensions (year by two age groups); each table is for a different level of geography (local authority (LA), region, country)
Counts of number of suicides by two age groups; from a disclosure point of view most interest will be for the lowest geography level (a single LA)
Low counts at LA level especially for the lower age group.
No apparent trend in the data; this could discourage an intruder from investigating further.
Suicide inquests are often reported in local newspapers. This could enable identification of an individual. No other attributes could be discovered so this table could be published.
C5. Number of deaths by underlying cause, sex, 5-year age group and country of birth, deaths registered in England and Wales between 2009 and 2013
Table is divided into many sub-tables of four dimensions; more detailed than previous examples, although geography is at country level.
Counts are generally low. Where country of birth (CoB) is England the counts are higher, as expected.
Counts of zero not shown in table
There is information here, some of which could be known to an intruder (unusual CoB, age group). This could enable them to identify an individual in the table (or think they have identified somebody) and find out their cause of death, which could be distressing to surviving relatives. This table should not be released.
C6. Rheumatoid arthritis deaths by sex and age group, registered in Wales 1981 to 2013
Four dimensions (Wales LA by age group by year by sex).
Number of deaths from unusual cause of death.
Very sparse table.
No pattern to frequencies.
Low counts but little disclosure risk in this table. There is enough detail for a confident identification to be made if all the information in the table was already known. No sensitive attributes can be found from the table. This table can be published.
C7. Deaths from cystic fibrosis, England, 1990 to 2013
Three dimensions (individual age by year by sex at national level).
Number of deaths from unusual cause of death.
Relatively sparse table.
Most deaths occur in late childhood or early adulthood.
There are many low counts where there is some possibility of identification but little or no risk of attribute disclosure. This table can be published.
C8. Deaths by sex, age group, cause and local authority, England, deaths registered 1993 to 2013
Five dimensions (sex by 5-year age group by cause of death by local authority by year).
Number of deaths broken down by the above variables.
Many low counts, most of which are 1.
This is an unwieldy table, but an intruder could use knowledge about somebody in the data (age, location, sex) to find out cause of death. This disclosure could be particularly sensitive if the cause of death is unusual or it involves the death of a baby or young child. The data should not be released in this form.
C9. Number of suicides in selected counties, by sex, age group, deaths registered in 2011 to 2014
Four dimensions (specific geographies – counties in the East region by age group, by sex by year)
Counts of number of suicides, sensitive information
Low counts throughout table, much sparser for females.
There are many low counts, but it is difficult to find detailed information that is not already in the public domain. Unlikely to be disclosive; the table can be published.
C10. Number of deaths from all causes broken down by middle super output area (MSOA), by sex and five-year age group, for the local authorities of Lambeth, Southwark and Nottingham for 1997 to 2011
Four dimensions (5-year age group by year by sex by geography (MSOA level in three large local authorities))
The population of an MSOA can be between 5,000 and 15,000.
Counts of number of deaths from all causes
Many low counts especially for the young age groups and it is not easy to find out something sensitive directly from this table; the population for each MSOA is sufficiently large to make any identification difficult even when broken down by age group and sex.
One confidentiality problem with this table is that it is detailed in many respects, leading to low counts in many cells – but it does not include particularly sensitive variables. If an additional table was released showing a breakdown of causes of death at a similar geography, even with coarser age groupings and no male or female breakdown, the tables could be combined to provide disclosive information due to similar low counts in both tables.
Therefore caution is needed and it may be unwise to release this table in its current form.
C11. CVD, CHD and stroke deaths by local authority, deaths registered 2012 to 2014
Four dimensions (cause of death, three causes by 5-year age group by sex by local authority)
Counts from deaths by three defined common causes
Table is very sparse in parts, many zeros and low counts
The main risk is of identification. Limited likelihood of attribute disclosure. This table can be published.
C12. Live births by sex and LSOA in Wales, 2011
Two dimensions (LSOA by sex)
LSOAs contain between 1,000 and 3,000 people
Counts of live births
Counts in the table are low, but there are very few 0s, 1s or 2s (under 5% of cells for both males and females)
The information an intruder could deduce from this table is the sex of a child given that a live birth took place. This is not sufficient grounds for applying disclosure control to the table, so it can be published.
C13. Cancer and all deaths in local authorities in the East and West Midlands and wards of Birmingham 2014
Four dimensions (cause of death (All causes, Cancer) by sex by age group (under-16 years, 16 years and older) by ward in Birmingham; and by LAs in the Midlands with more 5-year age groups)
Wards vary considerably in size, those in Birmingham are larger than most
Counts of deaths from all causes and from cancer
Low counts for < 16, both male and female. Many 0s, 1s and 2s.
Low counts at LA level due to the age groupings
These are counts of sensitive information. Although the detail of the cancer is not given, protection ought to be given particularly when there is a single individual dying from cancer present in the table. The release of this information could cause distress to relatives. This table should not be released.
D.1 Causes of death
Causes of death are normally tabulated using the version of the International Classification of Diseases (ICD) appropriate for the relevant years – ICD-10 since 2001. ICD-10 codes may be three or four characters long. Four character codes are not commonly used in tabulations except at national level or for special purposes. Commonly used groupings are:
ONS short list of causes of death – see the User Guide for Mortality Statistics
Leading causes of death – see the published ONS definition.
It is important to be aware that, for the purpose of disclosure control, aggregation of causes of death into larger groups is of value primarily to reduce the sparsity of the tabulation and does not necessarily make the information content non-disclosive. A more general statement of cause of death (say, ICD chapter) is not legally less disclosive than a precise one (specific ICD code) if the individual can be identified.
Geographical breakdowns should follow policies and good practice guidelines which are available through the ONS Open Geography Portal, particularly the GSS Geography Policy. Important points for minimising disclosure risk are:
As far as possible, use the same geographies for similar outputs, and avoid publishing the same data using overlapping geographies where slivers could expose small numbers of events.
Use consistent geographical hierarchies (for example region, local authority, MSOA, LSOA) that nest inside each other without slivers.
If it is essential to publish using overlapping geographies, for example to produce mortality indicators for CCGs when LAD equivalents are already published, (i) ensure that all slivers are identified in the production process, and (ii) check all slivers for disclosure risk and apply disclosure control measures accordingly.
The Isles of Scilly and the City of London are usually combined with neighbouring areas because of their very small populations and numbers of events.
If the use of non-standard geographical areas is essential, build these up from standard smaller building blocks such as LSOAs whenever possible, not from individual data points.
Grouping areas using the ONS Area Classification or quantiles of the various deprivation classifications generally has lower disclosure risk than using named administrative or geographical areas, since in such tabulations it is rarely possible to identify the specific location of any event.
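Identifying slivers, as required above when overlapping geographies must be published, is straightforward when both geographies are built from the same LSOAs. The Python sketch below assumes two hypothetical lookups (LSOA to local authority district, LSOA to CCG) and event counts by LSOA; an intersection is treated as a sliver when it is neither a whole LAD nor a whole CCG, because it can then be derived by subtraction from published figures. The threshold of 2 is an illustrative parameter.

```python
from collections import defaultdict


def small_slivers(lsoa_to_lad, lsoa_to_ccg, events_by_lsoa, threshold=2):
    """Return {(lad, ccg): count} for slivers whose event count is <= threshold.

    Both lookups are assumed to cover the same set of LSOAs.
    """
    cells = defaultdict(int)
    for lsoa, lad in lsoa_to_lad.items():
        cells[(lad, lsoa_to_ccg[lsoa])] += events_by_lsoa.get(lsoa, 0)

    ccgs_per_lad = defaultdict(set)
    lads_per_ccg = defaultdict(set)
    for lad, ccg in cells:
        ccgs_per_lad[lad].add(ccg)
        lads_per_ccg[ccg].add(lad)

    # An intersection equal to a whole LAD or a whole CCG is already published;
    # anything else is a sliver that differencing could expose.
    return {
        (lad, ccg): n
        for (lad, ccg), n in cells.items()
        if len(ccgs_per_lad[lad]) > 1 and len(lads_per_ccg[ccg]) > 1 and n <= threshold
    }


# Hypothetical codes and illustrative counts: CCG1 covers all of LAD1 plus
# one LSOA of LAD2, so CCG1 minus LAD1 exposes the single death in L3.
lsoa_to_lad = {"L1": "LAD1", "L2": "LAD1", "L3": "LAD2", "L4": "LAD2"}
lsoa_to_ccg = {"L1": "CCG1", "L2": "CCG1", "L3": "CCG1", "L4": "CCG2"}
deaths = {"L1": 20, "L2": 15, "L3": 1, "L4": 30}
print(small_slivers(lsoa_to_lad, lsoa_to_ccg, deaths))  # {('LAD2', 'CCG1'): 1}
```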
D.3 Demographic characteristics
Data are most often tabulated by age in five-year groups. Since the overall risk of death is lowest at younger ages, while the population is smallest at the oldest ages, both ends of the age range are subject to sparsity in many tables. When producing ad hoc outputs, it is common to collapse the younger age groups and progressively lower the starting age of the open-ended upper age group until each cell of the table contains sufficient events.
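The progressive collapsing described above can be sketched as a simple loop. The Python below is a hypothetical illustration: groups are (start age, end age, count) tuples in age order with the last group open-ended (end age None), and the minimum count is an arbitrary parameter, not an ONS threshold. It only merges at the two ends of the age range, so sparse groups in the middle still need separate checking.

```python
def collapse_age_groups(groups, min_count):
    """Merge sparse groups at both ends of an ordered age breakdown.

    groups: list of (start, end, count) tuples in age order; the last group is
    open-ended, with end set to None.
    """
    groups = list(groups)
    # Collapse the youngest groups until the first group is large enough.
    while len(groups) > 1 and groups[0][2] < min_count:
        (start, _, a), (_, end, b) = groups[0], groups[1]
        groups[0:2] = [(start, end, a + b)]
    # Lower the start of the open-ended upper group until it is large enough.
    while len(groups) > 1 and groups[-1][2] < min_count:
        (start, _, a), (_, end, b) = groups[-2], groups[-1]
        groups[-2:] = [(start, end, a + b)]
    return groups


# Illustrative counts only.
deaths = [(0, 4, 3), (5, 14, 2), (15, 44, 8), (45, 64, 25),
          (65, 74, 40), (75, 84, 52), (85, 94, 9), (95, None, 2)]
print(collapse_age_groups(deaths, min_count=10))
# [(0, 44, 13), (45, 64, 25), (65, 74, 40), (75, 84, 52), (85, None, 11)]
```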
Some breakdowns used in published ONS mortality statistics are shown in Table D1.
The most detailed breakdown is shown in column A. Deaths under one year (infant deaths) are separated from others under five years because of the differences in common causes of death. In some cases, deaths under 28 completed days (neonatal deaths) are also shown separately or excluded from the figures. The upper age group is open-ended. The 110+ upper group is used only at national level and in a minority of tables, because of the very small number of deaths.
Columns B and C show commonly used age breakdowns, with open-ended upper groups of 95+ and 85+ respectively.
Column D shows abridged age groups which reflect the different frequency of death at different ages.
Column E shows broad age groups used in a number of specialised mortality publications, such as those on suicides. These groups reflect the smaller numbers of events involved and therefore the need to reduce the sparsity of the tables.
Table D2 below shows the standard breakdown by maternal age used in ONS births publications.
Specific age groups are used in publications relating to teenage pregnancy, and particular care should be taken with such tables because of the sensitivity of the topic.
Sex or gender
The only sex breakdown available for birth and death statistics is male, female, and all persons. In some tabulations involving infants, a very small number of “sex unknown” cases has historically been included in the all persons figure, but this is unusual.
Particular care should be taken to prevent disclosure in any tabulations where there are causes of death which conflict, or appear to conflict, with the sex of the deceased. Such cases may involve individuals who are entitled to protection under the Gender Recognition Act 2004.
Table D1: Common breakdowns by age
| A | B | C | D | E |
|---|---|---|---|---|
| Under 1 | 0 to 4 | 0 to 4 | 0 to 4 | 0 to 9 |
| 1 to 4 | | | | |
| 5 to 9 | 5 to 9 | 5 to 9 | 5 to 14 | |
| 10 to 14 | 10 to 14 | 10 to 14 | | 10 to 29 |
| 15 to 19 | 15 to 19 | 15 to 19 | 15 to 44 | |
| 20 to 24 | 20 to 24 | 20 to 24 | | |
| 25 to 29 | 25 to 29 | 25 to 29 | | |
| 30 to 34 | 30 to 34 | 30 to 34 | | 30 to 44 |
| 35 to 39 | 35 to 39 | 35 to 39 | | |
| 40 to 44 | 40 to 44 | 40 to 44 | | |
| 45 to 49 | 45 to 49 | 45 to 49 | 45 to 64 | 45 to 59 |
| 50 to 54 | 50 to 54 | 50 to 54 | | |
| 55 to 59 | 55 to 59 | 55 to 59 | | |
| 60 to 64 | 60 to 64 | 60 to 64 | | 60 to 74 |
| 65 to 69 | 65 to 69 | 65 to 69 | 65 to 74 | |
| 70 to 74 | 70 to 74 | 70 to 74 | | |
| 75 to 79 | 75 to 79 | 75 to 79 | 75 to 84 | 75 and over |
| 80 to 84 | 80 to 84 | 80 to 84 | | |
| 85 to 89 | 85 to 89 | 85 and over | 85 and over | |
| 90 to 94 | 90 to 94 | | | |
| 95 to 99 | 95 and over | | | |
| 100 to 104 | | | | |
| 105 to 109 | | | | |
| 110 and over | | | | |
Table D2: Common breakdown by maternal age
| Age of mother (years) |
|---|
| 20 to 24 |
| 25 to 29 |
| 30 to 34 |
| 35 to 39 |
| 40 and over |
For births and stillbirths, standard categories of birthweight and gestational age (or collapsed versions of those), type of registration and place of birth category should normally be used. The table layouts can be seen in the ONS annual births publication.
Breakdowns by socioeconomic status should follow the standards for the analytical classes of the National Statistics Socio-Economic Classification (NS-SEC) reduced derivation.
In most circumstances it will be possible to tabulate events by the major groups of the Standard Occupational Classification (current or historically most appropriate version). However, breakdowns by the minor groups or individual unit codes are likely to have disclosure risk in the case of less common occupations, especially at sub-regional geographies.
- Data access agreement for non-disclosive data template (PDF, 215 KB)
- Protecting confidentiality in tables of birth and death statistics, illustrative examples
- SUPERSEDED Disclosure Control Guidance for Birth and Death Statistics
- SUPERSEDED Disclosure Control Briefing Note for Birth and Death Statistics