1. Introduction

This user guide provides an overview of the Classification of Workplace Zones for the UK (COWZ-UK). It describes the aims, purpose and scope of the classification, the data and methods employed to create it and finally provides details of the outputs available.

COWZ-UK is a geodemographic classification of Workplace Zones (WZs) based on data from the 2011 UK Censuses, which were administered by the Office for National Statistics (ONS), National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency (NISRA). COWZ-UK classifies WZs according to their similarity across a range of census variables. It therefore provides new information about the characteristics of workers and workplaces across the UK. It was produced by ONS and the University of Southampton, in collaboration with NRS and NISRA.

COWZ-UK extends the methods used to create the Classification of Workplace Zones for England and Wales (COWZ-EW) (Cockings and others, 2015) in order to produce a UK-wide classification. Except for some minor differences in the definitions of variables in Scotland and Northern Ireland compared with England and Wales, which are described in Section 4, the input variables and methods employed in COWZ-UK are almost identical to those in COWZ-EW. The final classification is also very similar, but the number of Group-level clusters within some Supergroups has changed and some cluster names have been modified to reflect their different characteristics in this new classification. Note that the coding convention used in COWZ-UK is deliberately different to that of COWZ-EW in order to clearly differentiate between the two area classifications.

2. Background, aims and scope

The UK censuses collect data about workers and workplaces. For the 2001 Census, Output Areas (OAs) were the lowest level of geography for which Office for National Statistics (ONS) released workplace data for England and Wales. OAs are designed to represent the geographical distributions of residents and residences, but these distributions are very different to those of workplaces and workers. ONS was therefore only able to publish four univariate workplace tables at the OA level in 2001 and the workplace population in these OAs ranged from 0 to 80,145 workers. Scotland published five workplace tables at OA level and Northern Ireland only published univariate workplace population data at ward level and above.

Following the 2011 Census, ONS employed automated zone design techniques developed by the University of Southampton to create a set of geographical areas optimised for the release of workplace data for England and Wales (Martin and others, 2013). These Workplace Zones (WZs) were produced by splitting, merging or retaining the 2011 OAs, which had themselves been maintained using similar methods (Cockings and others, 2011).

Boundaries for these 53,578 WZs were released in January 2013 (ONS, 2014a) and corresponding aggregate data were released in May 2014 (ONS, 2014b). Twenty-one univariate tables were published at WZ level for the 2011 Census (compared with the four released in 2001) and workplace population size was much more uniform across WZs than across OAs (range: 101 to 11,985; mean: 493), reflecting the benefits of this bespoke zone design.

The WZs provide much greater detail in areas with high numbers of workers and workplaces (such as city centres, retail and business parks), while the merging of OAs containing low numbers of workers (such as in suburban or rural areas) aids robust analysis. As no equivalent WZs existed for Scotland or Northern Ireland at that time, National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency (NISRA) released 2011 workplace data for OAs (one table of counts of workplace population) and for the much larger Council Areas (eight tables) in Scotland, and Super Output Areas (SOAs) (22 tables) in Northern Ireland.

Following the successful release of WZs and workplace data for England and Wales and subsequent user demand for UK-wide coverage, ONS and the University of Southampton produced a set of WZs for Scotland and Northern Ireland. A UK set of 60,709 WZ boundaries is available from the Open Geography portal. The Scottish and Northern Irish boundaries are also available separately on the NRS and NISRA websites respectively. The completion of a set of UK WZ boundaries provided the opportunity to extend the COWZ from England and Wales to UK coverage.

3. Data and definitions

The geographical coverage of the Classification of Workplace Zones for the UK (COWZ-UK) is the United Kingdom (England, Wales, Scotland and Northern Ireland). COWZ-UK is based entirely on 2011 Census data. The population base is the 2011 Census workplace population, defined as “All usual residents aged 16 to 74 years in employment in the area the week before the census.” This includes people who were working in any paid work (including casual or temporary work) within the week prior to the 2011 Census Day.

The workplace population encompasses:

  • employees

  • self-employed (with or without employees)

  • people on a government-sponsored training scheme

  • people working for their own or family’s business

  • people on sick leave, maternity leave, holiday or temporarily laid off

  • full-time students who are working

It does not include:

  • those usually resident in one country (for example, England) but working in another (for example, Scotland)

  • those usually resident in the UK but working outside the UK or on offshore installations

  • those with a place of work in the UK but who are not usually resident in the UK

  • short-term residents

  • full-time students who are not working

Respondents answer questions related to their main job (or last main job), that is, the one in which they usually work (or worked) the most hours; any secondary employment is therefore not considered. As part of the 2011 Census processing of workplace data, people who work mainly at or from home, or who do not have a fixed place of work, are geo-referenced to their area of usual residence, while workers who report to a depot are geo-referenced to the depot’s address.

It is important to note that in the context of the census a “workplace” is defined as a place of work recorded by a worker on their census form; workplaces themselves are not explicitly surveyed or listed. As a result, a workplace in the censuses may differ from other entities, such as businesses, enterprises or companies, recorded in other datasets such as the Inter-Departmental Business Register (IDBR).

4. Methods

The methodology used to create the Classification of Workplace Zones for the UK (COWZ-UK) is almost identical to that employed for the Classification of Workplace Zones for England and Wales (COWZ-EW) (Cockings and others, 2015) and very similar to that used for the Output Area Classification (OAC) (Gale and others, 2016).

First, the main domains related to workers and workplaces were identified and a set of candidate variables representing each of these domains was selected. Exploratory analysis was undertaken to understand the statistical and geographical distributions of the variables and to evaluate the strength of correlation between them. Various methods of transformation were explored prior to the correlation analysis in order to aid inference.

Based on the results of this exploratory analysis, the set of variables was then reduced to the final set employed in the classification. The variables were standardised to ensure that all contributed equally to the clustering process. A k-means clustering algorithm was then employed to produce specified numbers of clusters for the top level of the hierarchy. These clusters were evaluated using a range of objective and subjective statistical and graphical methods. The k-means algorithm was then applied to each top-level cluster in turn to subdivide them, thus creating the next level of the hierarchy. Again, various values of k were tested.

This hierarchical subdivision continued until meaningful results were no longer achieved. Neither the number of clusters at each level nor the number of levels in the hierarchy was predetermined. Once these numbers were confirmed, the clusters were profiled and named to enable users to interpret and use them more readily. The rest of this section describes the implementation of these methods in more detail.

Definition of domains

The 2001 and 2011 versions of OAC both identified five domains to represent the main characteristics of residential areas:

  • demographic structure

  • household composition

  • housing

  • socio-economic group

  • employment

Prior to the creation of COWZ-EW, equivalent domains did not exist for representation of the area-based characteristics of workers and workplaces. In creating COWZ-EW, four domains were identified with reference to literature and consultation with Office for National Statistics (ONS) and users:

  • composition of the workplace population

  • composition of the built environment

  • socio-economic characteristics of the workplace population

  • employment characteristics of the workplace population

These four domains were retained for COWZ-UK. The composition of the built environment domain in COWZ-UK is the equivalent of the housing domain in OAC but differs in that it is explicitly designed to capture the workplace or residential mix and workplace population density of a Workplace Zone (WZ).

Aspects of transportation and travel to work are included within the socio-economic characteristics of workplace population domain (as per OAC) rather than introducing a separate, narrowly focused, transportation domain. Note that there is no direct workplace equivalent for household composition, because characteristics of each workplace, such as size or type of employer, are not measured by the census.

Selection and refinement of variables

Aggregate WZ-level data for England and Wales were downloaded from NOMIS. For Scotland and Northern Ireland, National Records of Scotland (NRS) and the Northern Ireland Statistics and Research Agency (NISRA) provided a custom extract of census workplace microdata to ONS. University of Southampton researchers accessed these data as Approved Researchers under secure conditions at ONS Titchfield.

Individual records for Scotland and Northern Ireland were geo-referenced by workplace postcode using the May 2012 version of the Office for National Statistics Postcode Directory (ONSPD). In Scotland, some postcodes that spanned local authority boundaries had previously been split by NRS and the fragments assigned a modified postcode: these postcodes were geo-referenced using a lookup file provided by NRS. The geo-referenced postcodes were then allocated to a WZ using a geographic information system (GIS) point-in-polygon operation and the microdata were geographically aggregated to WZs.
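The allocation of geo-referenced postcodes to WZs can be illustrated with a minimal sketch of a point-in-polygon test. This is not the GIS software or workflow used for COWZ-UK, which this guide does not specify; it uses the standard ray-casting approach, and the zone codes, polygon coordinates and postcode labels are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the first vertex is not repeated


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count crossings of a ray running east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def allocate_to_zone(point: Point, zones: Dict[str, Polygon]) -> Optional[str]:
    """Return the code of the first zone containing the point, or None."""
    for code, polygon in zones.items():
        if point_in_polygon(point, polygon):
            return code
    return None


# Hypothetical zones (two adjacent unit squares) and postcode coordinates.
zones = {
    "WZ_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "WZ_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
postcode_points = {"PC1": (0.25, 0.5), "PC2": (1.5, 0.5), "PC3": (5.0, 5.0)}
allocation = {pc: allocate_to_zone(xy, zones) for pc, xy in postcode_points.items()}
```

A postcode that falls outside every zone (here PC3) is returned as None, so that such records can be inspected rather than silently dropped.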

For COWZ-EW, an initial long list of 504 candidate variables (intended to represent the four domains) was evaluated, with the intention of reducing this pool to a short list that best captured the variation in workers and workplaces across the countries. This initial long list comprised the variables from 13 of the 21 WZ-level tables published for England and Wales (see Table 1 in Cockings and others, 2015) plus three other bespoke variables:

  • number of workplace postcodes (obtained directly from ONS)

  • density of workplace postcodes (per hectare)

  • ratio of the number of OAs to WZs

These additional variables were included as proxies for the composition of the built environment because no variables in the published WZ tables reflect this. The other eight published WZ-level tables were excluded because they were not within the scope of the classification, or because they were highly correlated with variables in other tables. For COWZ-EW, following in-depth analysis, this long list was reduced to 63 and then to 48 variables. These 48 variables formed the final inputs to the COWZ-EW cluster analysis.

For COWZ-UK, the full long list of 504 variables was not re-evaluated because the previous analysis had shown that many of these variables contributed little useful information to the classification for England and Wales and there was no reason to expect this to be any different for the whole of the UK. Analysis for COWZ-UK instead focused on the 63 variables that had comprised the intermediate set of candidate variables for COWZ-EW. This set of variables is shown in Annex A.

In most cases, the variable definitions for Scotland and Northern Ireland were the same as those in the published tables for England and Wales, or were readily mapped to their equivalents. For a few variables, there were important differences, which were overcome in the following ways.

White British [WP201_WhiteBrit]

The categorisation of White British was slightly different in Scotland and Northern Ireland compared with England and Wales. For England and Wales this was obtained from the variable White: English/Welsh/Scottish/Northern Irish/British in Table WP201EW. For Scotland, White Scottish and Other White British from ETHNICID_S were summed. For Northern Ireland, it was not possible to isolate just White British: instead, White from ETHPNI11 was employed. This means that this variable may measure slightly different ethnic groupings in each country.

Highest level of qualification [WP501_GE_L4, WP501_L3, WP501_NoQual]

England and Wales (WP501EW) and Northern Ireland (HLQPUK11) had separate categories for “Apprenticeship” and “Other: Vocational/Work-related Qualifications, Qualifications gained outside the UK (Not stated/level unknown)”, whereas Scotland (HLQPS11) did not. In England and Wales, and Northern Ireland, these categories were combined with Level 1 and Level 2 qualifications (and ultimately not used in the final classification) but in Scotland there was no such correspondence; workers in these categories may therefore have been placed in the Level 3, 4 or 5 categories that were used in the final classification.

Full-time students [WP601_FT_Stud]

In England and Wales, and Northern Ireland, full-time students (in employment) were included as a separate category in the Employment status table (WP601EW and EMPSTAT respectively). They were also included as a category in the National Statistics Socio-Economic Classification (NS-SEC) table (WP607EW) in England and Wales and as a special (STUDENT) variable for Northern Ireland. The counts from these sources all corresponded, so the Employment Status table was used for England and Wales and the STUDENT variable for Northern Ireland. In Scotland, the NS-SEC variable was employed to identify full-time students as it was the only available source.

Travel to work [WP702 and WP703]

For travel-related questions in England and Wales, the census questionnaire asked only about place of work, whereas in Scotland and Northern Ireland it asked about travel to place of work or study (including school). This meant that, for full-time students in employment, it was not clear whether their response would have related to their place of work or place of study. It was not possible to infer from the routing of the questions, or from post-enumeration processing, whether students were more likely to have reported their place of work or study. In order to reduce the impact of this discrepancy, full-time students were excluded from calculation of the travel to work variables for Scotland and Northern Ireland (but still included for all other variables).

Prior to analysis, all variables (other than workplace population density and the ratio of OAs to WZs, which were kept as ratios) were converted from counts to percentages using the workplace population (that is, all usual residents aged 16 to 74 years in employment in the area the week before the census) as the denominator. The exception was WP613EW (Approximated social grade), for which the denominator was all usual residents aged 16 to 64 years in employment in the area the week before Census Day.
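As a simple illustration of this conversion, the sketch below expresses counts for a single WZ as percentages of a chosen denominator. The variable names and counts are hypothetical and are not taken from the census tables.

```python
from typing import Dict


def to_percentages(counts: Dict[str, int], denominator: int) -> Dict[str, float]:
    """Express each count as a percentage of the denominator population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return {name: 100.0 * value / denominator for name, value in counts.items()}


# Hypothetical counts for one Workplace Zone; names are illustrative only.
workplace_population = 400
counts = {"full_time_students": 20, "no_qualifications": 50}
percentages = to_percentages(counts, workplace_population)
```

For a variable with a different population base, such as approximated social grade, the same function would be called with that variable's own denominator.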

Geo-demographic classification is predominantly data-driven, but the analyst designing the classification must make important decisions concerning the selection and refinement of variables. In common with Vickers and Rees (2007), objective and subjective criteria were employed in the design of COWZ-UK, including:

  • the exclusion of variables representing small percentages of the population

  • the reduction of redundancy by exclusion of one of any pair of very highly-correlated variables

  • the removal of variables with highly-skewed distributions

  • the calculation of composite variables

  • the avoidance of variables for which there were data quality concerns (ONS, 2015a)

  • the exclusion of variables with relatively uniform geographical distributions that added little to the classification

A range of analytical methods was employed to assess the statistical and geographical distributions of the 63 intermediate variables and their inter-relationships at WZ-level across the UK, including summary statistics (mean, median, range, standard deviation), histograms, Quantile-Quantile plots (to assess normality), maps and a correlation matrix (using Pearson’s Product Moment Correlation coefficient) of the normalised variables.
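The correlation step can be sketched as follows: compute Pearson's coefficient for every pair of variables and flag pairs above a threshold as candidates for the removal of one member. The threshold of 0.9, the variable names and the values are assumptions made purely for illustration; this guide does not state the cut-off that was applied.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson's product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def highly_correlated_pairs(
    variables: Dict[str, List[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """List variable pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(sorted(variables), 2):
        r = pearson(variables[a], variables[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical variables measured across five zones.
variables = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly twice var_a
    "var_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
flagged = highly_correlated_pairs(variables, threshold=0.9)  # threshold is an assumption
```

Only the pair (var_a, var_b) is flagged here; which member of a flagged pair to drop remains a judgement for the analyst.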

Three transformation methods were evaluated for the purposes of normalising the data: log, Box-Cox and inverse hyperbolic sine (IHS). Theoretically, the log transformation deals with extreme outliers well, the Box-Cox method should deal with a range of different distributions better than the log and the IHS is particularly suited to distributions with a large number of zero values (ONS, 2015b). For COWZ-UK, as for COWZ-EW, the Box-Cox transformation was found to consistently perform best for the broad range of variable distributions.
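The three transformations have standard textbook forms, sketched below for a single value. How the Box-Cox parameter (here called lam) was chosen for each variable, and how zero values were handled before the log and Box-Cox transformations, are not stated in this guide, so the sketch simply rejects non-positive inputs for Box-Cox.

```python
import math


def log_transform(x: float) -> float:
    """Natural log; only defined for x > 0."""
    return math.log(x)


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform for x > 0; reduces to the natural log when lam is 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def ihs(x: float) -> float:
    """Inverse hyperbolic sine, log(x + sqrt(x^2 + 1)); defined at zero."""
    return math.asinh(x)


# Illustrative values only; lam = 0.5 is an arbitrary choice.
example_box_cox = box_cox(4.0, 0.5)
example_ihs_at_zero = ihs(0.0)
```

The IHS is defined at zero, which is why it suits variables with many zero values, whereas the log and Box-Cox forms need strictly positive inputs.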

Following this detailed exploratory analysis, the set of 63 variables was reduced to 48 (shown in Table 1, grouped by domain). These 48 variables, which formed the input to the COWZ-UK cluster analysis, are the same as those employed in COWZ-EW, other than the slight differences between the countries described previously. Note that the variable codes and names employed in Table 1 are based on the England and Wales workplace tables (for example, WP102) but the variables include data for the entire UK.

Standardisation of variables

It was important to ensure that all variables were measured on the same scale; otherwise, variables with a greater range in their potential values could carry a disproportionate weight in the classification. The 48 variables were therefore standardised using the range standardisation technique, which produces output values in the range 0 to 1.
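Range standardisation subtracts the minimum from each value and divides by the range. A minimal sketch, using made-up values:

```python
from typing import List


def range_standardise(values: List[float]) -> List[float]:
    """Rescale values to the interval 0 to 1 using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot range-standardise a constant variable")
    return [(v - lo) / (hi - lo) for v in values]


standardised = range_standardise([10.0, 20.0, 30.0])
```

Because the result depends on the minimum and maximum, a single extreme value compresses all other values towards zero.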

Workplace population density had a few very extreme outliers (artefacts arising from the WZ design process where large numbers of workers work in workplaces that are all stacked on the same geographical location, such as large office buildings). These outliers prevented the range standardisation technique from working effectively. The standardisation process for this variable (only) was therefore undertaken with the top 0.01% of values excluded. These outlier values were assigned a value of 1 and added back to the distribution after standardisation of the rest of the values.
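The treatment of these outliers might be sketched as below: the largest values are set aside, the remaining values are range-standardised, and the excluded values are assigned 1. How the number of excluded values was rounded is an assumption of this sketch, and the example uses an exaggerated fraction of 25% (rather than 0.01%) so that it works on four made-up values.

```python
from typing import List


def standardise_with_outliers(values: List[float], top_fraction: float) -> List[float]:
    """Range-standardise all but the top fraction of values; outliers are set to 1."""
    n_outliers = int(len(values) * top_fraction)  # rounding down is an assumption
    ordered = sorted(values)
    lo = ordered[0]
    hi = ordered[len(ordered) - 1 - n_outliers]  # largest value that is retained
    if hi == lo:
        raise ValueError("retained values are constant")
    # Values above the retained maximum are the excluded outliers.
    return [1.0 if v > hi else (v - lo) / (hi - lo) for v in values]


# Made-up densities with one extreme value.
densities = [0.0, 5.0, 10.0, 1000.0]
standardised_density = standardise_with_outliers(densities, top_fraction=0.25)
```

Without the exclusion, the value 1000 would push the other three values into the bottom 1% of the 0 to 1 range.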

Cluster analysis

A k-means clustering method was employed to group the WZs into clusters based on their similarity in terms of the 48 variables, with repeated application to create a nested hierarchy. This was implemented in R using the kmeans function and the default Hartigan-Wong algorithm. The squared Euclidean distance was used to evaluate the degree of similarity within and between clusters, and 10,000 random starts were employed.
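The idea of running k-means from many random starts and keeping the most compact solution can be sketched in a few lines. Note that this sketch uses the simpler Lloyd algorithm, not the Hartigan-Wong algorithm used for COWZ-UK, and the data, seed and number of starts are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centres: List[List[float]]) -> int:
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda j: sq_dist(point, centres[j]))


def lloyd(points: List[Vector], k: int, rng: random.Random,
          max_iter: int = 100) -> Tuple[List[int], List[List[float]], float]:
    """One k-means run (Lloyd's algorithm) from randomly sampled initial centres."""
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [nearest(p, centres) for p in points]
    for _ in range(max_iter):
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    # Total within-cluster sum of squares: lower means more compact clusters.
    wss = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, centres, wss


def kmeans_best_of(points: List[Vector], k: int, n_starts: int, seed: int = 0):
    """Repeat k-means from n_starts random starts and keep the most compact run."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(n_starts)), key=lambda run: run[2])


# Two well-separated made-up groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres, wss = kmeans_best_of(points, k=2, n_starts=10)
```

Multiple starts matter because any single run may settle on a poor local optimum; in R's kmeans function the equivalent setting is the nstart argument.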

Based on the insights gained from the COWZ-EW analysis, where seven clusters were found to be the most suitable number for the top level of the hierarchy, six to eight clusters were evaluated for the top level of COWZ-UK. Each of these solutions was then further subdivided into two to six clusters. All potential solutions were systematically evaluated using a combination of: prior expectations based on theory and practice, statistical and graphical techniques (such as compactness of cluster solution, homogeneity of cluster size, and stability and robustness throughout the hierarchy), together with mapping to confirm whether the outputs made sense on the ground. Analysis took place at various geographical scales (local, regional and national) to ensure a thorough understanding of the spatial patterns.

Once the number of clusters and levels had been determined, each workplace zone was allocated to the cluster with the lowest squared Euclidean distance between itself and all clusters’ centres. COWZ-UK, like COWZ-EW, is a two-tiered nested hierarchical classification, consisting of seven Supergroups and 29 Groups. Subdivision below this level led to excessively small cluster sizes and fragmentation of the clusters.
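The allocation rule can be illustrated with a short sketch that computes the squared Euclidean distance from one zone to each cluster centre and picks the smallest. The Supergroup codes, centre coordinates and zone values are hypothetical, and only two variables are used rather than 48.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of standardised values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def allocate(zone_values: Vector, centres: Dict[str, Vector]) -> Tuple[str, Dict[str, float]]:
    """Return the code of the nearest centre and the distance to every centre."""
    distances = {code: squared_euclidean(zone_values, c) for code, c in centres.items()}
    return min(distances, key=distances.get), distances


# Hypothetical cluster centres and one zone, each with two standardised variables.
centres = {"SG1": (0.2, 0.2), "SG2": (0.8, 0.8)}
allocated, distances = allocate((0.3, 0.1), centres)
```

Keeping the full set of distances, not just the winning code, shows how close a zone is to its second-nearest cluster.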

To ensure comparability with COWZ-EW, OAC and other classifications, the levels of the COWZ-UK hierarchy are termed “Supergroups” (top level), and “Groups” (second level), although a different coding convention is employed for COWZ-UK to ensure clear differentiation from COWZ-EW.

5. Outputs: COWZ-UK

There are a number of outputs from the Classification of Workplace Zones for the UK (COWZ-UK) and these are available from the workplace-based area classification web pages on the ONS website. The outputs are summarised as follows:

  • cluster membership: a spreadsheet containing all Workplace Zones in the UK with a corresponding Supergroup code and name, Group code and name for each Workplace Zone

  • cluster distances: a spreadsheet containing the squared Euclidean distance for each possible combination of Workplace Zone and Supergroup; this shows that each Workplace Zone was allocated to the Supergroup for which its distance was smallest

  • Annex A contains the list of 63 intermediate candidate variables that were initially considered for inclusion in COWZ-UK; this set was subsequently refined to the set of 48 variables that were used to produce COWZ-UK

  • Annexes B and C provide a full profile for each of the Supergroups and Groups respectively; these profiles include pen portraits, radial plots, example locations and example images

  • maps have been created by the Consumer Data Research Centre (CDRC) showing the spatial distribution of COWZ-UK by Supergroups and Groups for the whole of the UK; these are available from the CDRC website

The codes and names provided in this guide and its related outputs are specific to the COWZ-UK version of the classification. Users should specifically note the following when using COWZ-UK and/or COWZ-EW:

  • while the total numbers of clusters at the Supergroup and Group levels are the same for COWZ-EW and COWZ-UK, there are differences in the number of Group-level clusters in some of the Supergroups

  • different coding conventions have been implemented for COWZ-EW and COWZ-UK in order to differentiate between the two classifications and users should ensure that they use the appropriate set

  • some of the COWZ-UK Supergroup and Group names are the same as those in COWZ-EW, others are slightly modified and others are completely new, reflecting the differing characteristics of the clusters in the England and Wales, and UK versions

Availability of constituent data

Users should note that the full set of variables employed to create COWZ-UK is not publicly available. It is therefore not possible to fully replicate the methods employed in the creation of COWZ-UK. All of the data employed for England and Wales are publicly available from the NOMIS website. In January 2018, NRS released some workplace data for Scottish WZs, but not all of the relevant tables or variable categorisations required for COWZ-UK are available, and some of the publicly-released data will differ from those used in COWZ-UK because of differences in postcode geo-referencing. No data are publicly available at WZ level for Northern Ireland because of disclosure control concerns.

6. Acknowledgements

Classification of Workplace Zones for the UK (COWZ-UK) has been produced by Office for National Statistics in partnership with Samantha Cockings, David Martin and Andrew Harfoot from the University of Southampton and in collaboration with National Records of Scotland and the Northern Ireland Statistics and Research Agency.

We would like to thank National Records of Scotland (NRS) for supplying the 2011 Census data for Scotland and the Northern Ireland Statistics and Research Agency (NISRA) for the equivalent data for Northern Ireland, supplied to ONS under a data sharing agreement.

COWZ-UK is based on National Statistics data © Crown copyright and database right 2015. It contains Ordnance Survey data © Crown copyright and database right (2015, 2016) and public sector information licensed under the terms of the Open Government Licence version 3.0.

7. References

Cockings S, Harfoot A, Martin D, Hornby D (2011) Maintaining existing zoning systems using automated zone design techniques: methods for creating the 2011 Census output geographies for England and Wales, Environment and Planning A, 43(10), pages 2399 to 2418

Cockings S, Martin D, Harfoot A (2015) A Classification of Workplace Zones for England and Wales (COWZ-EW): User Guide, University of Southampton [Accessed 3 March 2017]

Gale C, Singleton A, Bates A, Longley P (2016) Creating the 2011 Area Classification for Output Areas (2011 OAC), Journal of Spatial Information Science, 12, pages 1 to 27

Martin D, Cockings S, Harfoot A (2013) Development of a Geographical Framework for Census Workplace Data, Journal of the Royal Statistical Society, Series A, 176(2), pages 585 to 602

ONS (2014a) Workplace Zones: A new geography for workplace statistics [Accessed 3 March 2017]

ONS (2014b) 2011 Census: Workplace population analysis [Accessed 3 March 2017]

ONS (2015a) 2011 Census: General report for England and Wales [Accessed 3 March 2017]

ONS (2015b) Methodology note for the 2011 Area Classification for Output Areas [Accessed 3 March 2017]

Vickers D and Rees P (2007) Creating the UK National Statistics 2001 output area classification, Journal of the Royal Statistical Society, Series A, 170(2), pages 379 to 403
