1. Summary: purpose and recommendations

The purpose of this review has been to examine and assess existing, proposed and other potentially effective methods of producing population estimates by ethnic group (PEEGs) for local authority districts of England and Wales, through:

  • study of documents including a summary prepared for this review, an unpublished review dated June 2012, the comparison with the census published by the Office for National Statistics (ONS) (2013), and PEEGs documents published in 2012 and earlier

  • consideration of administrative and other relevant data sources, and methods used elsewhere

  • consideration of analysis, which would help to prioritise and choose the most effective methods for the current decade

This report recommends that:

PEEGs should be resumed after confirming the granularity of output required by users. A plausible required granularity and secondary priorities are provided in this report (section 4.2).

Promising methods should be progressed quickly to a short-list of “beta test” tools and these developed to be tested against the 2011 Census (section 4.5). This test is necessary because none of the methods clearly outshines all others and each has clear weaknesses.

The most effective method or combination of methods should then be developed for annual production. A review of the promising methods and strategies is provided in this report (sections 5.1 to 5.4).

Since it is expected that no single method will provide the most reliable estimates and that further relevant data will become available during the next decade, a robust strategy should be sought for early implementation. It should be designed so that additional estimates can be integrated when shown to improve on detail or accuracy in some or all sub-populations.

2. Production of ONS PEEGs 2001 to 2009

Office for National Statistics (ONS) provided population estimates by ethnic group (PEEGs) for local authority districts (LAD) in England and Wales for mid year in each of 2001 to 2009, as Experimental Statistics.

For these estimates 2001 to 2009, the ONS methodological strategy was to disaggregate the cohort component population accounts for the mid year estimates (MYE) for each local authority, with an ethnic group dimension. Each of births, deaths, flows of migration into and out of each local authority district (LAD) and the special populations of armed forces, prisoners and school boarders, were estimated for males and females and each single year of age, summing across ethnic groups to the corresponding component in the MYE accounts.

The output was published by quinary age group by sex for England and for Wales, and for three broad age groups by sex for each LAD, along with the national totals of births, deaths, net migration and “other changes”. Results for primary care organisation areas (PCOs) were also published, derived by allocation of LAD results to PCOs. Results were published after rounding to the nearest 100 people. The method also provided the detailed age and sex structure of components of change for each group in each LAD, without rounding, which were felt to be insufficiently reliable to be published but could be made available for research projects.

The latest reports and statistics together with a description of the methodology and quality information and evaluations are available.

3. Evaluation of ONS PEEGs 2001 to 2009, prior to 2014

Concerns about the accuracy of Office for National Statistics (ONS) population estimates by ethnic group (PEEGs) led to ONS reviews after the publication in 2011 of the 2009 PEEGs. ONS announced in June 2012 their decision to stop further production, pending evaluation against the outputs from the 2011 Census. The main findings of the concerns and reviews are listed in this section.

The concerns point to information that is available to evaluate the PEEGs and is therefore potentially useful in designing a future methodology.

The PEEGs showed faster movement of minorities out of the areas where they were the highest proportion of the population, than did either the 2001 Census, or the experience of 1991 to 2001. A summary of dispersal is the change in geographical concentration of minorities within England and Wales, measured by the index of dissimilarity between White groups and the rest of the population. The index decreased between 1991 and 2001 from 0.519 to 0.515, but according to the PEEGs for 2006 had already rapidly decreased in 5 years to 0.429 (Simpson, 2010: 3). The index of dissimilarity for 2011 is 0.494, confirming the PEEGs’ large over-estimate of the dispersal of minorities.
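The report does not spell out the formula for the index of dissimilarity. The sketch below assumes the conventional definition: half the sum, across areas, of the absolute difference between each group's share of its own total. The district counts are invented for illustration and are not PEEG or census figures.

```python
def index_of_dissimilarity(group_a, group_b):
    """Half the sum over areas of |a_i/A - b_i/B|.

    0 means both groups are spread identically across areas;
    1 means they share no area at all.
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups need one count per area")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs a non-zero total")
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Invented counts for three hypothetical districts.
white = [900, 800, 300]
rest = [100, 200, 700]
d = index_of_dissimilarity(white, rest)
```

A falling value of the index over time, computed across all LADs, corresponds to the dispersal described above.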

Net migration to each local authority district (LAD) without an ethnic group dimension is different when taken from the census (as used in the PEEGs with its ethnic group dimension) and when taken from patient re-registration (as used in mid year estimates (MYEs)) to which the PEEGs were subsequently controlled. The differences were highly related to LAD ethnic diversity for 2000 to 2001 (Fry, 2010), with unknown but likely impact on the PEEGs. For example, if the most diverse areas’ migration was controlled upwards and the least diverse areas controlled downwards, further tests might show that the result was faster movement out of diverse areas, noted in the previous point.

The PEEGs showed a higher White British population in London than published survey estimates (Travers, 2011).

ONS provided Quality and Methodology Information for PEEGs (ONS, Feb 2012). Its conclusion begins: “At present the PEEGs are Experimental Statistics and should not be confidently relied on in making major policy decisions. The estimates are likely to provide a reasonable broad estimate of the ethnic group composition of the population of England and Wales”. The report lists empirical results and observations on methodology that support this limited endorsement of the PEEGs.

PEEGs showed an ethnic distribution in 2009 different from the Annual Population Survey, for example, twice the size of the Chinese population (0.8% versus 0.4%).

PEEGs showed a higher percentage of White British children aged 5 to 15 than School Census (81.6% versus 76.9%) and more discrepancies for broad ethnic groups in London than elsewhere.

PEEGs showed more White British and fewer White Other than birth registrations linked to NHS birth notifications in 2008. London was again most discrepant and more so than could be explained by the 10% of records without ethnicity recorded, which were concentrated outside London and not related to ethnic diversity (ONS, 2011).

The PEEGs rely on assumptions about patterns of migration between LADs that are unlikely to hold, with insufficient graduation between LADs or types of LAD.

The PEEG assumptions used to allocate ethnic group to international migration, based on International Passenger Survey information on country of birth, could have been framed differently, with an impact of over 5% on the final population estimates of African, White: Irish and Other White.

In June 2013, ONS released a comparison of unpublished PEEGs for 2010 with the 2011 Census ethnic group distribution. It confirmed discrepancies that were most evident in the region of London but equally large in other ethnically diverse LADs. The comparison was limited because cross-tabulations of ethnic group with age had not been released from the census and relative confidence intervals round the census estimates of ethnic group were wrongly applied as absolute values.

An unpublished ONS review dated June 2012 and a summary paper provided for this report, proposed alternative methods for future production of PEEGs. The ONS description of alternatives is reproduced at Appendix 1 and these and other potential strategies are discussed in section 5.

4. Potential for further evaluation

Office for National Statistics (ONS) considers quality dimensions of relevance, timeliness and punctuality, comparability and coherence, accuracy, output quality trade-offs, user needs and perceptions, and accessibility and clarity (ONS, 2012).

For relevance and user needs, the clarification of the purposes of population estimates by ethnic group (PEEGs) would be useful. They are understood for this report to be to help in (a) the identification of social inequalities that government seeks to reduce, and (b) the identification of diversity of demand for services based on culture or tradition, that government seeks to satisfy. These services and policies vary subnationally and are delivered by local as well as central government.

It is assumed in this review that users require PEEGs (a) for local authority district (LAD) areas and (b) which identify ethnic groups more finely than the broad headings of White, Asian and Black. The ONS unpublished review from June 2012 suggested that future estimates would merge White: Irish with White: Other and this aggregation of categories was used for the comparison with the 2011 Census (ONS, 2013). Such a reduction in granularity seems unnecessary and unhelpful.

A high but secondary priority is broad age structure, for example 0 to 4, 5 to 15, 16 to 24, 25 to 44, 45 to 64, 65 and over, to address policy areas such as adult care, youth services and employment. Important but lesser priorities are single year of age structure for re-aggregation to users’ needs, disaggregation by sex and smaller geographical units.

It is assumed in this review that users require PEEGs referring to mid-2014 to be produced by the end of 2015, by which time the 2011 Census will be considered out of date, given the considerable annual change in ethnic diversity. Average annual growth of minority populations as a whole was 6% in the 2000s and considerably greater for some groups. Some may consider this timescale too slow.

Limited comparability is an important issue for evaluation of PEEGs. Assessing the “accuracy” of PEEGs by measurement against another source is limited by the known patterns of unreliability in any measurement tool for ethnic group. One must accept that ethnic group will differ significantly when recorded for the same person at different times on the same register, and expect larger differences when question layout or categories change, or when the context, mode and purpose of the record-filling changes. The unreliability is greater for all categories other than White British, greater for mixed groups than for “single” ethnic groups and is very high for residual groups titled “Other” in the census classification (Simpson and Akinwale 2007; Saunders et al. 2013; Simpson et al. 2014).

Coherence should be used to evaluate potential methods. There are two structural aspects of changing ethnic composition, which should be observed in successful methods.

First, there is considerable “ageing in place” of each ethnic group, such that its age structure in later years is predictable from its age structure at earlier years, because the number aged “a” at an earlier year is related to the number aged “a plus t” at a time “t” years later. Since numbers of births and deaths are highly dependent on age structure, not only the future age structure but the growth of each ethnic group is predictable. Migration and mortality do reduce this predictability, but the relationships should be observable broadly.
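The “ageing in place” relationship can be illustrated with a deliberately simplified sketch that moves a single-year-of-age vector forward by t years. It ignores births, deaths and migration, which a real cohort component method would add, and the function name and counts are hypothetical.

```python
def age_forward(pop_by_age, years):
    """Shift a single-year-of-age vector forward by `years`.

    Births, deaths and migration are ignored, so the youngest `years`
    ages are left empty. The last element is treated as an open-ended
    oldest age group and accumulates everyone who ages into it.
    """
    n = len(pop_by_age)
    aged = [0] * n
    for age, count in enumerate(pop_by_age):
        new_age = min(age + years, n - 1)
        aged[new_age] += count
    return aged


# Invented counts for ages 0, 1, 2, 3 and 4 and over.
base = [50, 40, 30, 20, 10]
later = age_forward(base, 2)
```

Comparing such an aged-on vector with an estimate for the later year gives a first check on whether a method respects the cohort relationship.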

For example, a projection of 12% growth for Birmingham was accounted for by age momentum, which was particularly responsible for growth in the Indian, Pakistani and Bangladeshi populations (Simpson, 2007: 14 to 15). The proposed methods apart from cohort component estimates suffer from ignoring this relationship. All the methods upset the relationship when they constrain results to totals that have been independently estimated without an ethnic group dimension. Indicators of cohort stability are discussed at section 4.5.8. It may be possible to use cohort stability to improve constraining methods, but we are not aware of existing methods to do this.

Second, the geographical spreading of immigrants and their descendants from areas in which they have settled has been observed in the UK and other countries over many decades and generations. The scale of this “spreading” or dispersal is well known in Britain and only upset by large student populations or other points of attraction to new streams of immigration. The existence and approximate pace of this structural change to ethnic composition of areas should be reproduced in PEEGs.

The potential of learning from comparison with the 2011 Census has not yet been realised. A further evaluation against the 2011 Census should be a high priority in order to test out the current and alternative methods. Without such an evaluation, it is hard to judge any method as suitable.

Alternative methods should now be developed to a “beta test” stage where it is shown they (a) can be practically implemented and (b) promise potentially accurate updates to the LAD ethnic group distribution going forward from the 2011 Census.

The closest possible implementation of each of these “beta test” methods should be applied to mid-2011 without use of the 2011 Census information.

The evaluation should include age and sex dimensions, for those methods that provide it. This is important in its own right, but also allows insights from the separate analysis of age groups highly dependent on fertility (age 0 to 9), on migration (age 16 to 34) and on mortality (age 65 and over).

The methods should include a benchmark of no change since the 2001 Census.

Methods that depend on the mid year estimate (MYE) will need to be constrained to the 2011 Census estimates without an ethnic group dimension, so that discrepancies due to the MYE are not included. However, it may also be of use to evaluate estimates both with and without constraint to the MYE, when this is possible, as the constraint itself may introduce a bias.

Accuracy should be represented by the absolute percentage distance of a PEEG from the 2011 Census estimate. The approach taken in ONS (2013) compared absolute differences between ethnic group distributions, leading inevitably but misleadingly to the conclusion that smaller groups were relatively well estimated.
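A small worked example may help show why the two accuracy measures lead to different conclusions. The sketch assumes accuracy is computed per group as the absolute percentage error against the census count, and contrasts it with the difference in percentage-point shares. The figures are invented.

```python
def absolute_percentage_error(estimate, census):
    """Absolute error as a percentage of the census count for the group."""
    if census == 0:
        raise ValueError("census count must be non-zero")
    return 100.0 * abs(estimate - census) / census


def share_difference(estimate, census, estimate_total, census_total):
    """Absolute difference, in percentage points, between group shares."""
    return 100.0 * abs(estimate / estimate_total - census / census_total)


# Invented district of 100,000 people in both sources.
total = 100_000
large = {"estimate": 81_000, "census": 80_000}
small = {"estimate": 1_500, "census": 1_000}

large_ape = absolute_percentage_error(large["estimate"], large["census"])
small_ape = absolute_percentage_error(small["estimate"], small["census"])
large_gap = share_difference(large["estimate"], large["census"], total, total)
small_gap = share_difference(small["estimate"], small["census"], total, total)
```

In this example the small group has the smaller share difference (0.5 against 1.0 percentage points) even though its estimate is out by 50% rather than 1.25%, which is the misleading effect noted above.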

A regression analysis will allow the separate impacts on accuracy to be assessed of: methods, ethnic group, age, sex, type of area including its ethnic composition, population change and characteristics such as presence of a university or armed forces. Interactions between these independent variables will indicate if one method appears to have particular strengths or weaknesses for types of population or area. Such an analysis is likely to first transform the accuracy variable to achieve an approximate normal distribution to allow tests of significance (see, for example, Lunn et al. 1998 for a similar analysis without the dimension of ethnic group).

Summary measures in the evaluation should include not only the average accuracy achieved across all LADs, but also:

  • the geographical spread of each group (for example, its index of dissimilarity with the rest of the population across all LADs)

  • cohort stability, which can be measured by mean percentage deviation (MPD) and mean absolute percentage deviation (MAPD) of a group’s current age a plus t compared with age a at the previous census year t years before, with the mean taken across each age estimated within 0 to 15 and 34 to 59 (that is, before mortality is effective and omitting the years of highest migration); if the MPD is similar to the MAPD it suggests that cohorts are being affected similarly by migration as one would expect, if the MPD is much smaller than MAPD, it suggests that age cohorts are being differently affected, consistent with errors introduced by constraining; the variation in MPD across the age groups would be an alternative measure of stability, lower variation indicating greater stability
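The cohort stability indicators in the list above can be sketched as follows. This is one reading of the description rather than a definitive specification: each deviation is taken as the percentage difference between a group's current count at age a plus t and its count at age a at the previous census, and the caller chooses which ages to include. The counts are invented.

```python
def cohort_deviations(previous, current, t, ages):
    """Percentage deviation per cohort.

    `previous` maps age a to the count at the earlier census;
    `current` maps age a + t to the count t years later;
    `ages` lists the starting ages a to include.
    """
    return [
        100.0 * (current[a + t] - previous[a]) / previous[a] for a in ages
    ]


def mean_percentage_deviation(deviations):
    """MPD: signed mean, so opposite deviations cancel out."""
    return sum(deviations) / len(deviations)


def mean_absolute_percentage_deviation(deviations):
    """MAPD: mean of absolute deviations, so nothing cancels."""
    return sum(abs(d) for d in deviations) / len(deviations)


# Invented counts: three cohorts followed over t = 2 years.
previous = {0: 100, 1: 200, 2: 400}
current = {2: 110, 3: 180, 4: 400}
devs = cohort_deviations(previous, current, t=2, ages=[0, 1, 2])
mpd = mean_percentage_deviation(devs)
mapd = mean_absolute_percentage_deviation(devs)
```

Here the deviations are +10%, -10% and 0%, so the MPD is zero while the MAPD is about 6.7: the pattern described above as cohorts being differently affected.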

Alternative methods of disaggregating PEEGs from LADs to smaller areas should be included in the evaluation against the 2011 Census.

5. Methods with potential for PEEGs

The intention of this section is to help identify the most likely “beta test” methods for evaluation against the 2011 Census. A table describing potential methods is followed by commentary on how the methods may be combined in a robust strategy for population estimates by ethnic group (PEEGs). A further table provides specific comments on methods and data sources.

The following table comments on proposed methods. It is assumed that each method’s results will also be considered after constraint to the current mid year estimate (MYE).

A successful strategy is likely to combine more than one methodological approach. These should be evaluated against the 2011 Census at the same time as each method is assessed individually. The following three types of combining methods are likely to be of practical importance for the PEEGs.

An evaluation will identify whether two or more methods’ errors have low or negative correlation, an indication that their average is likely to be a more accurate estimate than any method alone. In such a (possibly weighted) average, the aim is that each method counterbalances the major errors of the other(s). Evaluation against the 2011 Census will confirm whether feasible combinations outperform individual methods.
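The logic of averaging methods whose errors are negatively correlated can be shown with a minimal sketch. The two sets of estimates, the benchmark values and the equal weighting are all invented for illustration; in practice the weights would be chosen from the evaluation against the 2011 Census.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def combine(est_a, est_b, weight_a=0.5):
    """Weighted average of two methods' estimates, area by area."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(est_a, est_b)]


def mean_absolute_error(estimates, benchmark):
    return sum(abs(e - b) for e, b in zip(estimates, benchmark)) / len(benchmark)


# Invented benchmark counts and two hypothetical methods for four areas.
census = [100, 200, 300, 400]
method_a = [110, 190, 320, 380]
method_b = [92, 212, 284, 424]

errors_a = [e - c for e, c in zip(method_a, census)]
errors_b = [e - c for e, c in zip(method_b, census)]
error_correlation = pearson(errors_a, errors_b)

combined = combine(method_a, method_b)
mae_a = mean_absolute_error(method_a, census)
mae_b = mean_absolute_error(method_b, census)
mae_combined = mean_absolute_error(combined, census)
```

In this constructed case the errors are strongly negatively correlated, so the simple average has a much smaller mean absolute error than either method alone.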

Methods that work well nationally or for regions but not for local authority districts (LADs), may be subject to “hierarchical constraining”. For example, the APS might be used for a national estimate, to constrain regional estimates based on a combination of cohort progression and the APS, which in turn could constrain LAD estimates based on modelled administrative data.
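A minimal sketch of hierarchical constraining for a single ethnic group is given below, assuming simple proportional scaling at each level. The region names and figures are invented, and the sketch constrains in one dimension only; constraining simultaneously to LAD totals across all groups would need an iterative method such as iterative proportional fitting.

```python
def scale_to_total(values, target_total):
    """Scale values proportionally so that they sum to target_total."""
    current = sum(values)
    if current == 0:
        raise ValueError("cannot scale values that sum to zero")
    return [v * target_total / current for v in values]


# Invented figures for one ethnic group.
national_estimate = 1000.0                       # for example, survey-based
regional_raw = {"North": 300.0, "South": 500.0}  # unconstrained regional estimates
lad_raw = {                                      # unconstrained LAD estimates
    "North": [100.0, 150.0],
    "South": [200.0, 300.0],
}

# Step 1: constrain the regions to the national estimate.
scaled = scale_to_total(list(regional_raw.values()), national_estimate)
regional = dict(zip(regional_raw.keys(), scaled))

# Step 2: constrain each region's LADs to the constrained regional figure.
lad = {
    region: scale_to_total(values, regional[region])
    for region, values in lad_raw.items()
}
```

Each level keeps the relative pattern of its own source while inheriting the total from the level above.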

Methods may be appropriate only for some sub-populations. If the principle can be accepted that estimates should be the best possible in all cases, a method may be supplemented in some sub-populations (by area, group or age), so long as the decision to do so is triggered by evidence. This may be the case when administrative data is missing or of poor quality in some areas. It may also be appropriate where two datasets have inconsistent categories recorded for ethnic group (for example, from name analysis), suggesting a different method should be used for some ethnic groups.

The following table is intended to help reduce the promising avenues of research when developing the potential methods into practical implementation. It lists concerns and suggestions about methods and data sources, arising from Office for National Statistics (ONS) documents or during this review. It begins with aspects of specific methods and then lists concerns that apply to more than one method.

6. References

Fry, R (2010). Internal migration comparison (PEEGs compared against census). Email from Rob Fry, ONS, to Ludi Simpson. 13 January 2010.

Lunn, D. J., Simpson, S. N., Diamond, I. and Middleton, E. (1998). The accuracy of age-specific population estimates for small areas in Britain. Population Studies, 52, 327 to 344.

Mateos, P., Longley, P. and O’Sullivan, D. (2011). Ethnicity and Population Structure in Personal Naming Networks. PLoS ONE, 6(9): e22943.

Mathur, R., Grundy, E. and Smeeth, L. (2013). Availability and use of UK based ethnicity data for health research. NCRM Working Paper 01/13. http://eprints.ncrm.ac.uk/3040/1/Mathur-_Availability_and_use_of_UK_based_ethnicity_data_for_health_res_1.pdf

ONS (2011). Quality of ethnicity and gestation data subnationally for births and infant deaths in England and Wales, 2005-2008. Statistical Bulletin, 13 September. http://www.ons.gov.uk/ons/dcp171778_232681.pdf

ONS (2012). Population Estimates by Ethnic Group: Quality and Methodology Information. 6 February. http://www.ons.gov.uk/ons/guide-method/method-quality/quality/quality-information/social-statistics/summary-quality-report-for-population-estmates-by-ethnic-group.pdf

ONS (2013). Comparison of mid-2010 population estimates by ethnic group against the 2011 Census. 25 July. http://www.ons.gov.uk/ons/guide-method/method-quality/specific/population-and-migration/pop-ests/population-estimates-by-ethnic-group/comparison-of-pop-estimates-by-ethnic-group-against-2011-census-estimates.pdf

Petersen, J., Longley, P., Gibin, M., Mateos, P. and Atkinson, P. (2011). Names-based classification of accident and emergency department users. Health and Place, 17: 1162 to 1169.

Rees, P. (2014). Personal communication including the file 'Asian Indian 2001 Internal Mig V3.xlsx'.

Rees, P., Wohland, P. and Norman, P. (2013). Using 2011 Census data to evaluate and update ethnic group projections. Presentation at the Census Research User Conference, Friday 27 September 2013, Birkbeck College, London.

Saunders C. L., Abel G. A., El Turabi A., et al. (2013) Accuracy of routinely recorded ethnic group information compared with self-reported ethnicity: evidence from the English Cancer Patient Experience survey. British Medical Journal Open, 2013.

Simpson, L. (2007). Population forecasts for Birmingham, with an ethnic group dimension. Birmingham City Council, Birmingham. Reproduced as CCSR Working Paper 2007-12, University of Manchester. http://hummedia.manchester.ac.uk/institutes/cmist/archive-publications/working-papers/2007/2007-12-population-forecasts-for-birmingham.pdf

Simpson, L. (2010). ONS experimental population estimates with ethnic group dimension (PEEG): does their UK internal migration reflect evidence from the 2001 Census? Note to Rob Fry, Office for National Statistics. Ludi Simpson, University of Manchester, 4 January 2010.

Simpson, L. and Akinwale, B. (2007). Quantifying stability and change in ethnic group. Journal of Official Statistics 23, 185 to 208.

Simpson, L., Jivraj, S. and Warren, J. (2014). The stability of ethnic group and religion in the Censuses of England and Wales 2001-2011. CoDE Working Paper, University of Manchester.

Travers, T. (2011). Correspondence between Tony Travers of London School of Economics and ONS after the publication of 2009 PEEGs.

Wohland, P., Rees, P., Norman, P., Boden, P. and Jasinska, M. (2010). Ethnic Population Projections for the UK and Local Areas, 2001-2051, Working Paper 10/2. School of Geography, University of Leeds.

7. Appendix 1: Potential methods as listed and reviewed by ONS in March 2014

1 Apply census distributions directly to the mid-year estimates.

Pros

Simple to apply and understand.

Less prone to error in production.

Cons

Heavy reliance on census.

Reliability drops over time since the census.

Comparison against population estimates by ethnic group (PEEGs) shows no real improvement in the estimates using this approach.

2 Use a combination of social survey sources (Annual Population Survey or Integrated Household Survey)

Pros

Sample sizes are reliable at Government Region level.

Reliability can be improved by merging 3 or 5 years’ data.

The survey ethnicity question is harmonised with 2011 Census ethnicity.

Cons

Sampling error and non-response create bias.

Despite the large sample sizes, estimates are not typically reliable at local authority level.

Does not cover the population living in communal establishments.

3 Improve the current PEEG methodology

Components of change could be improved, for example by using births data for fertility rates.

Administrative or survey data could be applied to allocate ethnicity to people born outside the UK.

Pros

The revisions could be made to previous years’ estimates to allow back series comparisons.

Cons

Although the estimates should be enhanced, they may also draw criticism for their complexity and heavy reliance on census.

4 Hierarchical constraining

Figure 1 summarises the proposed hierarchical constraining methodology. The intention is for a simplified alternative methodology based on social survey and census data in the short-term, with the later addition of administrative sources as these become available and are considered adequately robust.

Pros

Flexibility to incorporate new administrative or survey sources and cope with ethnicity or geography reclassifications.

Combines census, survey and administrative sources and so overcomes over-reliance on any one of these.

More likely to produce accurate estimates for areas with large non-White populations such as London and Birmingham.

Cons

In the short-term, there is still a reliance on census for local authority-level estimates.

Possibly less accurate for areas with small non-White populations.

5 Use small area estimation

Small area estimation may provide an alternative framework for combining survey, administrative and census data to improve the precision of population estimates by ethnic group. Robust estimates are made directly from the Annual Population Survey at regional level but the sample data are insufficient to provide direct estimates at local authority level.

A model-based approach may provide robust estimates if auxiliary information available in administrative data (such as the School Census or the personal demographic spine) is sufficiently related to the variable of interest. The standard approach uses regression models to estimate the small area characteristics of interest and incorporates random area effects to account for between area variations beyond that explained by the model covariates. The feasibility of this approach would depend on the existence of suitable methods for estimating variables with multiple categories.

Pros

Breaks the reliance on census so the estimates will capture changes over time more reliably.

Small area estimation can include direct and synthetic estimates, using, for example, direct estimates where social survey data are adequately robust (for example, London or Birmingham) and drawing strength from auxiliary data, using synthetic estimation, for areas with little ethnic mix or population turnover.

Can incorporate new data sources as they become available.

Provides a formal framework for combining information from different data sources and involves less complex data manipulation than the current method.

The calculation of variance for these estimates will be straightforward.

Cons

The method for estimating variables with multiple categories is still in development.

The method relies on the availability of auxiliary data with a strong relationship to the variable of interest.

This methodology is less intuitive to communicate to stakeholders.

Contact details for this article

Office for National Statistics
Pop.info@ons.gov.uk
Telephone: +44 (0) 1329 444661