1. Main points

  • Over 46,000 establishments were classified as communal establishments (CEs) in Census 2021.

  • The estimation strategy used to assess potential population undercount depended on the size of the establishment.

  • Small CEs (establishments with fewer than 50 usual residents) were sampled within the Census Coverage Survey (CCS) to measure population coverage.

  • Large CEs (establishments with 50 or more usual residents) were primarily considered for estimation based on the presence of high-quality administrative (admin) data to address potential undercount.

  • Over 200,000 usual residents were added to the CE population as a result of large and small CE estimation.

2. Overview of communal establishments estimation

Communal establishments (CEs) are defined as establishments that provide managed residential accommodation. In a similar manner to the household population, Census 2021 sought to "enumerate all people resident on the census date in communal establishments such as hospitals, nursing and residential homes and hotels", as outlined in HM Government's Help Shape Our Future: The 2021 Census of Population and Housing in England and Wales White Paper (PDF, 967 KB).

Each CE received a form for the manager to complete and either paper questionnaires or initial contact letters with unique access codes (UACs) for the usual residents to complete. A total of 1,140 CE Field Officers supported residents in filling out these forms and made sure the CE Manager form was completed.

Despite certain complications with the collection operation for education establishments because of the coronavirus pandemic and students not being present at accommodations on Census Day, the collection operation’s performance for CEs was excellent overall with high return rates seen for specific types of CE. Estimation was conducted to account for the remaining undercount in the CE population.

We estimated the undercount for small CEs using the dual system estimation (DSE) process. This draws on the Census Coverage Survey (CCS), which takes a sample of addresses and is used to estimate the number of people who have been missed by the census. Results were produced by age, sex and local authority. This is outlined in The coverage estimation strategy for small communal establishment of the 2021 Census of England & Wales (PDF, 496 KB).

It was impractical to use the CCS for large CEs, as discussed in the Design of Address Frame, Collection and Coverage Assessment and Adjustment of Communal Establishments in 2021 Census paper. Instead, admin data sources, either collated by the Office for National Statistics (ONS), or sourced from other public bodies, were used to correct for population undercounts.

Counts listed in this publication have been derived using the "nature of establishment" variable on the census database. An alternative variable, "CE management type", has been used in some other census releases as it allows more consistent comparison with results from the 2011 Census. Therefore, there may be small differences in the counts stated in this report when compared with other census releases.

3. Overall impact

Over 200,000 usual residents were added into the communal establishment (CE) population because of the large and small CE estimation processes. The overall impact of estimation varied depending on factors such as the age and sex of the residents, and the type of establishment and region of the country in which they are usually resident.

Table 1 shows that the age categories that received the largest impact from the estimation processes were those aged 10 to 19 years and those aged 20 to 29 years. This is because of the large number of students who were estimated into student halls of residence. While the female response rate for each age group never went below 74%, a similar response rate is not seen for men aged between 10 and 49 years. This is primarily because of the low male return rate in detention accommodations and defence establishments, in comparison with women of a similar age.

Table 2 shows that most of the regions in England and Wales achieved a similar response rate for CEs. The scale of adjustment taking place in regions such as the North East was smaller than the adjustment for the South East. This is primarily because a larger proportion of the CE population is concentrated in the South East.

Finally, Table 3 shows that the impact of CE estimation differed based upon the type of establishment. While halls of residence received the largest adjustment within the estimation process, this is because they make up the largest proportion of the CE population. The table shows that our estimation's impact was greatest in CE types such as detention centres, defence establishments and prisons which, based on the final census estimate, achieved a low return rate during the collection operation.

4. Small communal establishment (CE) estimation

Scope and data collection

Small communal establishments (CEs) in the 2011 Census were defined as establishments with fewer than 100 bed spaces. For Census 2021, this definition changed to establishments with fewer than 50 usual residents. This was because the census did not capture information on the number of bed spaces in an establishment and because of the difficulties in performing the Census Coverage Survey (CCS) on larger (greater than 49 usual residents) CEs. Further explanations on the definitions for large and small CEs can be found in the Design of Address Frame, Collection and Coverage Assessment and Adjustment of Communal Establishments in 2021 Census paper and the Estimating Populations in Large Communal Establishments (CEs) paper (PDF, 208 KB).

Small CEs were counted using a CE questionnaire in both the census and CCS. This was used to identify the type of establishment. All usual residents were asked to complete an individual form for the CCS, which contained the same questions as those in the census individual form.

We did not estimate over coverage for small CEs. Over coverage is small compared with under coverage, which means that it is harder to estimate (especially for a relatively small population such as those in CEs) and has less impact.

Estimation methods

We estimate under coverage in small CEs in much the same way as we estimate it for the general household population. By matching CCS responses to the census, we can identify people who responded to the CCS but not the census. From this, we can build a model that estimates how likely people were to respond to the census. In principle, this model can reflect any relevant characteristics available on both the census and CCS. We can then apply that model to all census respondents to estimate non-response across England and Wales.

The CCS sample design did not explicitly take CEs into account and therefore there was no direct control over the size of the small CE sample. Lower CCS response, especially within CEs, meant that we had less data. This restricted the flexibility of the model. However, because the modelling approach uses data covering all of England and Wales, the data proved sufficient for the logistic-regression-based dual system estimator (DSE).

The model was chosen in line with the approach used for Census 2021 household coverage estimation. More information can be found in The coverage estimation strategy for small communal establishment of the 2021 Census of England and Wales (PDF, 496 KB) and Coverage Estimation Strategy for the 2021 Census of England and Wales (DOCX, 183KB). Once the model is formed, using the matched data, it can be applied to each census CE resident using their corresponding characteristics. This allows us to estimate response probabilities for every CE resident within the census. Estimated census non-response weights can then be calculated as the reciprocals of the estimated census response probabilities. Summing the weights within a domain allows us to estimate the population across that domain. Further details of the estimation system can be found in our Coverage estimation for Census 2021 in England and Wales methodology.
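The weighting step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the production method: it uses a single hypothetical categorical covariate (`group`), in which case the fitted probabilities from a logistic regression coincide with the observed census response rate among matched CCS records in each group. The record layouts, function names and all figures are invented for illustration.

```python
from collections import defaultdict

def response_probabilities(ccs_records):
    """Estimate the census response probability for each group.

    ccs_records: iterable of (group, in_census) pairs, one per CCS respondent,
    where in_census is True when the person was also matched to a census return.
    Assumes every group has at least one matched record (otherwise p would be 0).
    """
    seen = defaultdict(int)
    matched = defaultdict(int)
    for group, in_census in ccs_records:
        seen[group] += 1
        matched[group] += int(in_census)
    return {g: matched[g] / seen[g] for g in seen}

def estimate_population(census_records, probs):
    """Sum non-response weights (1 / p) over census residents within each domain.

    census_records: iterable of (domain, group) pairs, one per census resident.
    """
    totals = defaultdict(float)
    for domain, group in census_records:
        totals[domain] += 1.0 / probs[group]
    return dict(totals)

# Hypothetical matched CCS data: 5 of 10 "younger" and 8 of 10 "older"
# CCS respondents were also found in the census.
ccs = [("younger", i < 5) for i in range(10)] + [("older", i < 8) for i in range(10)]
probs = response_probabilities(ccs)

# Hypothetical census returns for one local authority.
census = [("LA1", "younger")] * 4 + [("LA1", "older")] * 8
estimates = estimate_population(census, probs)
print(probs, estimates)
```

With these invented figures, each younger census resident carries a weight of 2 and each older resident a weight of 1.25, so the 12 census returns represent an estimated population of 18.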

The estimates of under enumeration of persons in small CEs, and thus the census results, are based upon a sample survey, the CCS, and are therefore subject to sampling error.

Model selection

Selecting a robust model to estimate under coverage of persons in small CEs was challenging because of the relatively small number of observations in the CCS.

The purposeful model selection strategy outlined in Model selection for the coverage estimation of the 2021 Census of England (PDF, 338 KB) was used. This approach stresses careful univariate analysis of each variable that can enter the model and exploring possible transformation of continuous variables. The strategy helps to avoid various modelling issues as the model gets more complicated. Univariate and bivariate analysis of all variables helps to check which variables are good candidates for modelling. Decisions to collapse categories are made based on the information generated through univariate and bivariate analysis of the variables.

Initially, the process involved forcing all the key variables into the model to see if they met the pre-specified significance level. If a variable was not significant, a decision was made on whether it should be kept in. The process then created multiple branches of datasets with a combination of different key variables. After the list of key variables was selected, further branches of datasets were created, with different collapsing options for each categorical covariate. Multiple iterations were undertaken before the potential best model was identified.

The resulting model included the hard-to-count index, collapsed age-sex groups and collapsed regions. More details of the model selection strategy for small CEs can be found in our Model selection for coverage estimation for Census 2021 in England and Wales methodology.

Census Coverage Survey

The Census Coverage Survey response for people in small CEs was around 33.38%. The CCS sample contained around 193 small CEs. This sample size was enough to run country-level regression-based DSE by collapsed age-sex groups, collapsed region and hard-to-count index. The sample included a range of different types of communal establishments.

Table 4 shows the response rate, estimates and adjustments of residents by age groups in small CEs. People aged 65 years and over accounted for over half (59% or 212,300) of all small CE residents. People aged 15 to 24 years represented only 11% of total small CE residents. The highest response rate was recorded for people aged 85 years and over, and the lowest response rate was recorded for people aged 15 to 24 years.

Table 5 represents the broader categories of the establishment types. Medical and care establishments represent three-quarters of the total small CE population and had the highest response rate. Education establishments had the lowest response rate.

Table 6 summarises the geographical distribution of the population in small CEs. The South East had the largest population of residents in small CEs of any region (16%, 59,000). The North East had the lowest proportion of the residents in small CEs (5%, 16,600). Wales had 7% of the small CE population of England and Wales. The highest response rate was recorded for the South East and South West regions and the lowest response rate recorded was for London.

5. Large communal establishment (CE) estimation

Definition of a large communal establishment (CE)

An establishment was categorised as a large communal establishment (CE) if it contained 50 or more residents, as indicated by:

  • the number of census returns against the establishment

  • the number of residents living in the establishment on Census Day, according to the CE manager form

  • the comparator admin data sources used in large CE estimation
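As a rough sketch of this classification rule: the text does not state how the three indicators were combined, so the function below assumes (hypothetically) that an establishment is treated as large when any available indicator reaches the threshold. The function and parameter names are illustrative only.

```python
LARGE_CE_THRESHOLD = 50  # 50 or more usual residents

def is_large_ce(census_returns=None, manager_form_residents=None, admin_count=None):
    """Return True if any available indicator shows 50 or more residents.

    Assumption: a single indicator reaching the threshold is sufficient.
    None marks an indicator that is unavailable for the establishment.
    """
    indicators = (census_returns, manager_form_residents, admin_count)
    return any(v is not None and v >= LARGE_CE_THRESHOLD for v in indicators)

# 36 census returns, but the CE manager reported 50 residents.
print(is_large_ce(census_returns=36, manager_form_residents=50))
# All available indicators fall below the threshold.
print(is_large_ce(census_returns=10, manager_form_residents=20, admin_count=49))
```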

During the estimation process, we identified that multiple entries existed for the same CE in the data. Some of these entries were recorded as small CEs and others as large CEs. To prevent the same establishment being factored into both small and large CE estimation, we factored all separate responses that related to a large CE into the large CE estimation. Therefore, while over 5,000 (11%) CEs were classified as large, the number of large CEs actually present in England and Wales is smaller than the number stated.

Information available to determine adjustments

Census sources

Each CE manager was asked to complete a form detailing information about the nature of their establishment, who was responsible for managing it and how many people were currently living there. This can be viewed on our Census 2021 paper questionnaires web page. This information was used to determine:

  • whether an establishment should be classified as large

  • the establishment type and whether it should be prioritised for estimation

  • the size of the undercount, in cases where no high-quality admin data existed 

Prior to large CE estimation, several statistical quality issues needed to be remedied within the census data:

  • for a small number of CEs, multiple CE Manager forms had been submitted, often containing different information - to resolve this issue, clerical work was undertaken to select the response with the most accurate information

  • for some CEs, the response to the "nature of establishment" question in the CE Manager form was either missing or differed from the one suggested by our admin data sources - in these instances, the nature of establishment was changed to the one indicated by the trusted admin data

  • some CEs failed to respond to the census or responded on household forms - in these cases, a new CE case was created in census data and, where appropriate, residents were either transferred from the household forms or imputed into it based on the estimated number indicated by the admin data sources

  • for a small number of CEs, the figure stated on the CE Manager form for the number of individuals living in the establishment was unfeasibly high - if this was the only source of information, no estimation was conducted, to prevent overestimation

Admin data sources 

The admin data sources used for the establishment types that were prioritised in large CE estimation were as follows:

  • halls of residence and boarding schools - student occupancy data

  • care homes with and without nursing - Personal Demographic Service and NHS Capacity Tracker data (NHS Capacity Tracker data were only used in quality assurance as they did not contain demographic information on care home residents. The Personal Demographic Service was used to estimate the care home population)

  • prisons - Ministry of Justice data

  • immigration detention centres - Home Office data

  • approved premises and bail hostels - Her Majesty's Prison Service data

  • defence establishments, education other, staff accommodations and religious establishments - post-census follow-up survey

The post-census follow-up survey was conducted by the Office for National Statistics (ONS) to obtain admin data from specific establishment types. This involved contacting establishments after census collection, requesting data on the age and sex of usual residents in the establishment as of Census Day. More information about the other admin data sources used in large CE estimation can be found in our Administrative data used in Census 2021, England and Wales methodology.

Most of the work conducted in large CE estimation was for the establishment types listed in this section. This was because of a lack of high-quality admin data sources available to accurately conduct estimation for the remaining CE types. More information can be found in the Estimating Populations in Large Communal Establishments (CEs) paper (PDF, 208 KB). Roughly 4,500 large CEs were considered for estimation.

Where possible, admin data were used that conformed to the census usual resident definitions and related to Census Day, or as close to it as possible. An important requirement for all the admin data sources was that they contained both age and sex information on the usual residents at an establishment, as the estimates were generated along these variables. There were some statistical quality issues with a few of our admin data sources that needed to be examined prior to large CE estimation. These included: 

  • some providers in the Student Occupancy data were only able to provide age and sex data as of the date they completed the survey (May to July 2022) rather than as of Census Day - as the data had been collected in age and sex bands rather than by single year of age, it was assumed that the distribution would not differ greatly from that on Census Day, so no action was taken

  • some providers were only able to provide either age or sex data but not both - in this instance, the missing variable was populated based upon the national distribution within the admin data sources

  • some providers were only able to provide a usual resident count with no demographic information - in these cases, the existing census age and sex distribution for the establishment was replicated in the admin data 

  • some providers might have supplied data on both usual residents and short-term residents - as such, short-term residents were also included in large CE estimation

  • the Ministry of Defence provided rounded age and sex figures - we assumed that the rounding averaged out (that is, a figure was as likely to have been rounded down as rounded up), so no action was taken; where a count of "five or lower" for an age and sex category existed, a method was implemented that selected a value between one and five with equal probability
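The treatment of "five or lower" counts in the last bullet can be illustrated with a short sketch. The data layout, the marker string and the function name are hypothetical; only the rule itself (a value between one and five chosen with equal probability) comes from the text.

```python
import random

def resolve_suppressed(count, rng):
    """Replace a 'five or lower' cell with an integer drawn uniformly from 1 to 5.

    Any other count is returned unchanged.
    """
    if count == "five or lower":
        return rng.randint(1, 5)  # randint includes both end points
    return count

rng = random.Random(2021)  # fixed seed so the illustration is reproducible

# Hypothetical age and sex cells for one establishment.
cells = {("male", "18 to 24"): 120, ("female", "18 to 24"): "five or lower"}
resolved = {key: resolve_suppressed(value, rng) for key, value in cells.items()}
print(resolved)
```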

While the admin data sources used were able to provide good coverage for specific types of large CE, this coverage was not complete. In roughly 19% of cases that were prioritised for estimation, either no admin data existed, or there was evidence to suggest that it would be inaccurate to use it to correct for potential undercount. Instead, a "borrowing strength" method was developed that generated estimates without the presence of admin data. This worked by harnessing insights on the scale of estimation occurring at a national level for each establishment type and the age and sex breakdown, then applying these as scalar values to constrained targets for the remaining establishments.

Methodology of estimation

Estimation only took place when the total census resident count was less than the total admin resident count for the establishment. In cases where no admin data existed, the borrowing strength method was only conducted when the CE Manager form stated that the number of residents was higher than the number of census returns received. In 22.5% of cases, no estimation was required, as the number of census returns equalled or exceeded the admin data count. As such, only 3,500 establishments required estimating.
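The decision rule in this paragraph can be sketched as a small function. The strategy labels and parameter names are hypothetical, and admin data judged unreliable is assumed to be passed as `None`, so that it is handled in the same way as missing admin data.

```python
def choose_strategy(census_count, admin_count=None, manager_count=None):
    """Pick an estimation route for one establishment.

    census_count:  total census returns for the establishment
    admin_count:   total residents in the admin data, or None if unavailable
    manager_count: residents stated on the CE Manager form, or None if missing
    """
    if admin_count is not None:
        # Estimate only when the census count falls short of the admin count.
        return "standard" if census_count < admin_count else "none"
    if manager_count is not None and manager_count > census_count:
        # No admin data, but the manager form implies an undercount.
        return "borrowing_strength"
    return "none"

print(choose_strategy(36, admin_count=50))    # census below admin
print(choose_strategy(50, admin_count=50))    # census equals admin
print(choose_strategy(36, manager_count=50))  # no admin data available
```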

The standard estimation approach can be demonstrated in the example of a large boarding school in Table 7. 

For each age and sex category, the original census resident count for an establishment was compared with the admin data. A shortfall, which is the resident non-response, was then calculated. The census resident count was subsequently raised to match the admin data, so that the shortfall was addressed in the final population statistics.
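A minimal sketch of this standard calculation is shown below. The counts are invented and do not reproduce Table 7, and the function name is hypothetical. The sketch deliberately rejects categories where the admin count is below the census count, because that situation is handled differently.

```python
def standard_adjustment(census, admin):
    """Return (shortfall, final) dictionaries keyed by age and sex category.

    The shortfall is admin minus census; the final count is the census count
    plus the shortfall. Assumes no category has an admin count below the
    census count.
    """
    shortfall = {key: admin[key] - census[key] for key in census}
    if any(value < 0 for value in shortfall.values()):
        raise ValueError("admin count below census count in at least one category")
    final = {key: census[key] + shortfall[key] for key in census}
    return shortfall, final

# Hypothetical boarding school counts.
census = {("male", "12 to 16"): 30, ("female", "12 to 16"): 28}
admin = {("male", "12 to 16"): 35, ("female", "12 to 16"): 30}
shortfall, final = standard_adjustment(census, admin)
print(shortfall, final)
```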

There were some cases where the admin data resident count for a specific age and sex category was lower than the number of census returns.

As can be seen in Table 8, the admin count for males aged 17 years and over is lower than the census one, despite the total census count being lower than the admin count. Without resolving this, the standard process would create a final estimate for the establishment that was greater than the figure suggested by the admin data source. We tended to trust the overall admin data number, while acknowledging uncertainties around the age and sex breakdown.

To overcome this, the absolute value of the negative shortfall was taken and deducted from the age and sex categories with a positive shortfall, in proportion to each category's share of the total positive shortfall. In the example in Table 8, females aged 17 years and over have a 75% share of the positive shortfall (6 divided by 8), and therefore receive the same percentage of the negative shortfall allocation, 3 (0.75 multiplied by 4). This ensured that the post-estimation resident count equalled the admin data total.
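The reallocation can be sketched as follows. The category counts are invented so that the shortfalls match the figures quoted above (a positive shortfall of 8 in total, of which 6 is for females aged 17 years and over, and a negative shortfall of 4); they are not the actual values in Table 8. The function name is hypothetical, and the sketch does not cover how any fractional results were rounded.

```python
def adjust_with_negative_shortfall(census, admin):
    """Raise census counts towards admin counts while holding the admin total.

    Categories where admin is below census keep their census count. The
    resulting excess is deducted from the positive-shortfall categories in
    proportion to their share of the total positive shortfall.
    Assumes the total census count is below the total admin count.
    """
    shortfall = {key: admin[key] - census[key] for key in census}
    positive_total = sum(s for s in shortfall.values() if s > 0)
    negative_total = -sum(s for s in shortfall.values() if s < 0)
    final = {}
    for key, s in shortfall.items():
        if s > 0:
            deduction = negative_total * s / positive_total
            final[key] = census[key] + s - deduction
        else:
            final[key] = census[key]
    return final

# Hypothetical counts giving shortfalls of +2, 0, -4 and +6.
census = {"M 12-16": 10, "F 12-16": 10, "M 17+": 12, "F 17+": 14}
admin = {"M 12-16": 12, "F 12-16": 10, "M 17+": 8, "F 17+": 20}
final = adjust_with_negative_shortfall(census, admin)
print(final, sum(final.values()))
```

Here females aged 17 years and over receive 3 of their 6 shortfall (6 minus 0.75 multiplied by 4), and the adjusted total of 50 equals the admin total.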

Table 9 highlights a case where no administrative data existed for the establishment and the CE Manager form implied that there had been an undercount, with the manager reporting there were 50 residents. At a national level, for all boarding schools with an admin data record, a scalar value was generated that calculated the ratio of the aggregate admin data count to the aggregate CE Manager form resident count. The justification for this was that we should use the insights gleaned from comparing the admin data with the CE Manager form values to adjust the CE Manager form values for all cases without admin data but eligible for estimation.

In this example, the scalar value is 1.1 and so the adjusted CE Manager value for the establishment is 55 (50 multiplied by 1.1). Therefore, the estimated undercount is 19 (55 minus the 36 census returns). An additional set of scalar values is calculated at the aggregate level, from the estimation ratios for each age and sex category for establishments with admin data. For example, the post-estimation values for males aged 12 to 16 years were 80% higher than the original census resident count, therefore the scalar value is 1.8. These scalar values are then applied to the original resident counts in each CE with an undercount, to create the initial estimates. A final round of scaling then occurs to round down the initial estimates and constrain them to the adjusted CE Manager value.
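The "borrowing strength" arithmetic can be sketched as below. Only the figures 50, 1.1, 55, 36, 19 and 1.8 come from the text; the national aggregates, the other category scalars and the per-category census counts are invented. The text does not specify how the final rounding was carried out, so the sketch assumes a largest-remainder rule: scaled values are rounded down and any leftover units go to the categories with the largest fractional parts.

```python
from fractions import Fraction
from math import floor

def borrow_strength(census, manager_count, manager_scalar, category_scalars):
    """Return (adjusted manager value, final counts per category)."""
    target = manager_scalar * manager_count  # adjusted CE Manager value
    # Initial estimates: census counts scaled by the national category ratios.
    initial = {key: census[key] * category_scalars[key] for key in census}
    # Constrain the initial estimates so that they sum to the target.
    factor = Fraction(target) / sum(initial.values())
    scaled = {key: value * factor for key, value in initial.items()}
    final = {key: floor(value) for key, value in scaled.items()}
    # Assumed rule: hand out leftover units by largest fractional remainder.
    leftover = int(target) - sum(final.values())
    by_remainder = sorted(scaled, key=lambda k: scaled[k] - final[k], reverse=True)
    for key in by_remainder[:leftover]:
        final[key] += 1
    return target, final

# Hypothetical national aggregates for boarding schools with admin data.
manager_scalar = Fraction(1100, 1000)  # admin total / CE Manager total = 1.1

# Hypothetical census returns for the establishment (36 in total).
census = {"M 12-16": 10, "F 12-16": 12, "M 17+": 6, "F 17+": 8}
category_scalars = {
    "M 12-16": Fraction(9, 5),  # 1.8, as quoted in the text
    "F 12-16": Fraction(3, 2),  # invented
    "M 17+": Fraction(6, 5),    # invented
    "F 17+": Fraction(7, 5),    # invented
}

target, final = borrow_strength(census, 50, manager_scalar, category_scalars)
undercount = target - sum(census.values())
print(target, undercount, final)
```

Exact fractions are used so that 50 multiplied by 1.1 gives exactly 55, which avoids floating-point noise in the rounding step.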

Addressing complexities for halls of residence

As in the 2011 Census, large CE halls of residence "were often subdivided into component buildings, for example blocks, [houses] or colleges" for census enumeration purposes, as outlined in Estimation and Adjustment for Communal Establishments (PDF, 153 KB). However, the student occupancy data did not always match this addressing structure, and information was sometimes requested at a higher level (for example, at a hall of residence, complex or college level) to encourage institutions to provide the data. When the addressing structures did not align, a method was developed that aggregated the census responses to match the admin data structure. If, after comparison with the aggregated census figures, the admin data indicated an undercount in the census, the shortfall was disaggregated proportionally among the component buildings. This was done to address the undercount and reduce the potential impact on the geographical distribution of the CE population.
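The aggregation and disaggregation step can be sketched as follows. The text does not say what the shortfall was made proportional to, so the sketch assumes (hypothetically) that each component building receives a share in proportion to its census count. The building names, counts and function name are invented, and the rounding of fractional shares is not covered.

```python
def disaggregate_shortfall(block_census_counts, admin_total):
    """Split a hall-level shortfall across its component buildings.

    block_census_counts: census returns per component building
    admin_total:         admin count supplied for the hall as a whole
    Assumption: shares are proportional to each building's census count,
    and at least one building has a census return.
    """
    census_total = sum(block_census_counts.values())  # aggregate to hall level
    shortfall = admin_total - census_total
    if shortfall <= 0:
        # The admin data do not indicate an undercount.
        return {block: 0.0 for block in block_census_counts}
    return {
        block: shortfall * count / census_total
        for block, count in block_census_counts.items()
    }

blocks = {"Block A": 40, "Block B": 60}
additions = disaggregate_shortfall(blocks, admin_total=120)
print(additions)
```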

Quality assurance of results

Every result of large CE estimation was quality assured, focusing on ensuring: 

  • all census responses were factored into large CE estimation

  • the estimation strategy being implemented for each establishment was correct

  • all the methods created as part of estimation had been implemented correctly

  • the estimation results were consistent with other sources of comparator data

When instances were identified where the admin data or manager form values appeared inaccurate, they were removed from estimation and consequently either no estimation was conducted, or the "borrowing strength" method was applied.

Furthermore, if there was no clear consensus on whether an undercount existed in the census, then no estimation was conducted. The resulting risk of underestimating the population was judged preferable to the risk of overestimating it.

Results

In cases where multiple entries existed for the same CE and they had different natures of establishment, the results were consolidated under the single case with the correct nature of establishment. In addition, a 100% response rate often reflects a lack of comparator admin data sources with which to confirm whether there was under coverage in the census.

The results from Table 10 highlight that large CE estimation had the largest relative impact on prisons, defence establishments and detention centres. While halls of residence received the largest absolute increase from the large CE estimation process, this was because they were the most populous type of CE.

7. Cite this methodology

Office for National Statistics (ONS), released 5 January 2023, ONS website, methodology, Communal establishment (CE) estimation and adjustment: Census 2021

Contact details for this methodology

Census customer services
Census.customerservices@ons.gov.uk
Telephone: +44 1392 444972