1. Main points

  • We have developed experimental daytime population estimates at small area geographies for 14 local authorities (LAs) using a modelling framework known as Population 24/7.

  • We have compared these estimates against mid-year population estimates (MYEs) and observe mobility patterns that are broadly consistent with our expectation that the population moves from residential to non-residential areas during the daytime. We also observe distinct mobility patterns for urban and rural areas and for different types of LAs; where results have differed from our expectations, it has generally been possible to find plausible reasons for these discrepancies.

  • We have also developed experimental daytime population estimates derived from anonymised and aggregated mobile phone crowd movement data from O2 Motion and compared these with Population 24/7 estimates.

  • We have observed that mobile-derived daytime population estimates tend to be larger than comparable Population 24/7 outputs in urban areas, and smaller in rural areas.

  • An initial exploratory comparison with mobile-derived estimates has been successful in identifying and validating model limitations.

  • Further work is required to quantify uncertainties in mobile-derived daytime population estimates and to ensure the accuracy of our experimental daytime statistics, particularly in the coverage of mobility associated with tourism, leisure, transport, and higher education.


2. Overview of estimating population by time of day

There is longstanding demand for estimates of population size by time of day at small area geographies. Daytime population statistics have many potential uses, including:

  • infrastructure planning by local government

  • emergency planning and response by emergency services

  • commercial planning in the private sector

Traditional population statistics like the census estimate the size of the usually resident population at their usual place of residence. These statistics can be misleading if used as estimates of daytime population. Their accuracy as daytime estimates will fluctuate as the population moves between locations across the course of a day. These mobility patterns can vary by sub-population and be cyclical, seasonal, and disrupted by planned and unplanned events.

Daytime statistics aim to estimate the population present: the total number of people present in an area at a particular date and time, including residents who are not temporarily absent, day trip visitors, and staying visitors. Such statistics could be:

  • nowcast in real-time or near real-time (how many people are in an area now)

  • forecast (how many people will be in an area at a specific date and time in the future)

  • historical (how many people were in an area at a specific date and time in the past)

  • scenario-based (how many people would be in an area in a given situation, for example, if everyone were at the location where they spend most of their day)

Previous methods for estimating daytime population have mainly been limited to movements of subpopulations, such as workers. Examples include the Centre for Ecology and Hydrology's UK gridded population 2011 based on Census 2011 and Land Cover Map 2015 dataset, and the US Census Bureau's Remote Work During the Pandemic Shifted Daytime Population of Cities article. Previous methods have also mainly been constrained to broad timeframes, for example daytime versus night time, such as the EU ENACT project, available on the European Commission's website, and LandScan USA, available on Springer Link's website.

Most of these methods assume that a person's daytime location is entirely based on their primary activity (for example, a worker at their workplace for the entirety of the day, a student at their school or university). They do not allow for population movements for other activities, which could have a significant effect on population numbers for certain locations at particular times, or transit to and from these locations. An exception is work by Data Ventures (the commercial arm of Statistics New Zealand), who have conducted significant work on highly granular population mobility using mobile phone data. For more information, see the COVID-19 Impact on Local Councils' CBD population through to 2021 article, on Data Ventures' website.

We are now assessing Population 24/7, a modelling framework developed by Martin, Cockings, and Leung to produce daytime population statistics. For more information on the framework, see Developing a Flexible Framework for Spatiotemporal Population Modeling, on Taylor and Francis Online's website. In this article we demonstrate the use of this framework in a case study of 14 local authorities (LAs) previously used in our Dynamic population model for local authority case studies in England and Wales: 2011 to 2022 article (listed in Table 1). In this case study we present a scenario-based historical daytime population corresponding to 2pm on a typical weekday during academic term time in 2019. We compare this against our mid-year population estimates and an equivalent experimental daytime population estimate using anonymised and aggregated mobile phone crowd movement data from O2 Motion.


3. Current aims

The work presented in this paper aims to:

  • provide a case study of 14 local authorities (LAs) at 2pm on a typical termtime weekday in 2019 to evaluate the Population 24/7 framework for daytime population estimation

  • contrast Population 24/7 daytime population outputs with our mid-year population estimates (MYEs) to highlight where statistics that estimate the population at their usual place of residence may be inaccurate when used as daytime estimates

  • compare the level of agreement between Population 24/7 daytime population estimates and equivalent estimates derived from anonymised and aggregated mobile phone crowd movement data from O2 Motion

  • highlight important caveats of the method and detail plans for future improvements to the input data and methodology


4. Methodology 

Data

Our case study is referenced to 2019. This is the most recent year in which estimates can be produced with all data available while avoiding the impact of coronavirus (COVID-19) on population mobility at this early stage of our research. While most of our input data are referenced to 2019, we have used older sources where these were the most recently available data as of 2019.

Sources used

  • We geolocated residential locations of the population using output area (OA) centroids from the 2011 Census, and mid-year estimates (MYEs) and Higher Education Statistics Agency (HESA) data from 2019 to define the usually resident population of each residential location.

  • We geolocated workplace destinations using Workplace Zones centroids from the 2011 Census and used the Inter-Departmental Business Register (IDBR) from 2019 to define the maximum capacity of each workplace destination.

  • We used HESA data, English School Census and Welsh School Census data from 2019 to calculate maximum capacity of education destinations, which were geolocated using the UK Register for Learning Providers and additional sources like Get Information about Schools.

  • We used a hospital address list from 2019 and a list of acute treatment units from 2017 to geolocate healthcare destinations, and 2019 Hospital Episode Statistics data (which include A&E attendance, inpatient admissions and outpatient appointments) to estimate occupancy rates.

  • We used Geolytix data on supermarket locations up to 2022 to geographically locate some retail destinations, with maximum capacity of these locations estimated based on a combination of further Geolytix data, the IDBR, and MYEs.

  • Geolocation of all destinations used the National Statistics Postcode Lookup (November 2019 release).

  • Data describing population presence at destinations by time of day (time profiles) were constructed by the University of Southampton using data sources including the United Kingdom Time Use Survey, 2014 to 2015 (for a full list of data sources, see Population 24/7 Near Real Time: Data Library, Sample Outputs and Batch Files for England, 2011, on the UK Data Service's website).

  • We created background layers using Ordnance Survey maps of the coastline and the major road network. Maximum capacity and time profiles for road usage were derived from 2019 Annual Average Daily Flows data, together with Department for Transport data on the proportion of traffic flow by time of day and on vehicle occupancy.

  • For comparisons of the Population 24/7 outputs to mobile-derived estimates we used aggregated and anonymised mobile phone crowd movement data from O2 Motion (see section "Mobile phone method" for more details), and for comparisons to usually resident statistics we used MYE 2019.

Population 24/7 method

The Population 24/7 framework developed by Martin, Cockings, and Leung produces time-specific gridded population estimates (grid cell size defined by the user) from a variety of administrative and survey-based data. For more information on the framework, see Developing a Flexible Framework for Spatiotemporal Population Modeling, on Taylor and Francis Online's website.

In the framework, estimates of population present are modelled for origins (the residential population of an area) and destinations (employment, education, healthcare, retail, leisure, and other locations). Each destination is assigned a time profile related to that destination type, indicating the proportion of the location's capacity expected to be present or in transit at a given time. When creating an estimate for a specific time, each destination is queried in turn, and people are reallocated to it from origins within defined catchment distances. Populations are dispersed over a small area around each origin and destination centroid, with the extent of this area defined by the expected size of the site. In-transit populations are distributed to an area around the destination they are allocated to, based on a background map of weights representing the relative volume of traffic in that location. This travel network currently only includes major roads and does not account for rail or non-motorised travel. As a result, estimates for areas with high volumes of population movement via these uncaptured modes of travel may be underestimates. Populations are not allocated to some types of impassable terrain where masking data are available, for example beyond the coastline.
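As a rough illustration of the reallocation step described above, the Python sketch below moves people from origins to destinations for a single time slice. It is not the Population 24/7 implementation: the names `Origin`, `Destination` and `reallocate`, the proportional draw from origins inside the catchment, and the split of each time profile into "present" and "in transit" fractions are all assumptions made for illustration. It also omits dispersal around centroids, road-network weighting of in-transit people, and terrain masking.

```python
import math
from dataclasses import dataclass


@dataclass
class Origin:
    x: float            # easting in metres (hypothetical coordinates)
    y: float            # northing in metres
    population: float   # residents still available to be reallocated


@dataclass
class Destination:
    x: float
    y: float
    capacity: float     # maximum capacity of the site
    profile: dict       # hour -> (fraction present, fraction in transit)
    catchment_m: float  # catchment distance in metres
    present: float = 0.0
    in_transit: float = 0.0


def reallocate(origins, destinations, hour):
    """Query each destination in turn and draw people from origins in its catchment."""
    for dest in destinations:
        frac_present, frac_transit = dest.profile.get(hour, (0.0, 0.0))
        demand = dest.capacity * (frac_present + frac_transit)
        nearby = [o for o in origins
                  if math.hypot(o.x - dest.x, o.y - dest.y) <= dest.catchment_m]
        available = sum(o.population for o in nearby)
        if demand <= 0 or available <= 0:
            continue
        taken = min(demand, available)
        # Assumption: each nearby origin contributes in proportion to its remaining population.
        for o in nearby:
            o.population -= taken * o.population / available
        share_present = frac_present / (frac_present + frac_transit)
        dest.present += taken * share_present
        dest.in_transit += taken * (1 - share_present)


origins = [Origin(0, 0, 1000), Origin(500, 0, 1000), Origin(50_000, 0, 1000)]
school = Destination(200, 0, capacity=600, profile={14: (0.9, 0.1)}, catchment_m=5_000)
reallocate(origins, [school], hour=14)
print(round(school.present), round(school.in_transit), [round(o.population) for o in origins])
# 540 60 [700, 700, 1000]
```

In this toy example the two origins inside the 5 km catchment each give up 300 people and the distant origin is untouched, so the total population across origins and the destination is conserved.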

Our research applies the Population 24/7 framework to sensitive and closed-source data (not freely available to the public), generating daytime population estimates at a resolution of 200 metre by 200 metre grid squares. For easy comparison with traditional statistics, the population in each grid cell is then apportioned to OA statistical geographies in proportion to the area of the grid cell covered by each OA, and further aggregated to produce lower layer super output area (LSOA), middle layer super output area (MSOA), and local authority (LA) statistics. For results in the high-resolution grid format, see the Population 24/7 - A method to account for daily population mobility in spatiotemporal population estimates article, on the UK Statistics Authority website (PDF, 984KB).
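The grid-to-OA step is an area-weighted apportionment followed by lookup-based aggregation. The sketch below assumes that each cell's population is spread evenly over the part of the cell covered by OAs (so masked areas such as sea receive nothing), that overlap areas in square metres have already been computed, and that OA-to-LSOA lookups are plain dictionaries. Function names and codes are hypothetical placeholders.

```python
from collections import defaultdict


def apportion_grid_to_oa(cell_population, overlap_area):
    """cell_population: cell -> people; overlap_area: (cell, oa) -> overlap in m2."""
    covered = defaultdict(float)
    for (cell, _oa), area in overlap_area.items():
        covered[cell] += area
    oa_population = defaultdict(float)
    for (cell, oa), area in overlap_area.items():
        if covered[cell] > 0:
            # Share of the cell's population proportional to the area this OA covers.
            oa_population[oa] += cell_population.get(cell, 0.0) * area / covered[cell]
    return dict(oa_population)


def aggregate(populations, lookup):
    """Sum populations to a higher geography (e.g. OA -> LSOA) via a lookup dict."""
    totals = defaultdict(float)
    for code, pop in populations.items():
        totals[lookup[code]] += pop
    return dict(totals)


cells = {"cell_1": 100.0, "cell_2": 40.0}
overlap = {("cell_1", "OA_1"): 30_000.0, ("cell_1", "OA_2"): 10_000.0,
           ("cell_2", "OA_2"): 40_000.0}
oa_pop = apportion_grid_to_oa(cells, overlap)
lsoa_pop = aggregate(oa_pop, {"OA_1": "LSOA_1", "OA_2": "LSOA_1"})
print(oa_pop, lsoa_pop)
```

The same `aggregate` call can be repeated with LSOA-to-MSOA and MSOA-to-LA lookups to build the higher geographies.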

To produce outputs, we define a square study area centred on each LA surrounded by an 80 kilometre (km) buffer zone in each cardinal direction (north, south, east, and west). This buffer contains origins and destinations, acting as a source of inflowing temporary visitors and a sink for outflowing temporarily absent residents. An 80km buffer was chosen to capture the majority of local movement in and out of LAs, but different buffer sizes could result in different estimates, particularly for LAs with higher proportions of long-distance travel. Additionally, because the study areas contain areas outside each LA, some LAs will have an effective buffer area greater than others after the transformation from grid to OA, particularly if the shape of the LA is tall or wide; this may have a minor effect on the final population estimates. To implement our chosen scenario (2pm on a typical termtime weekday in 2019) we produce outputs using a 2pm time slice, weekday time profiles for all destinations, termtime time profiles for educational destinations (universities, primary, and secondary schools), and termtime origins for student populations.
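The study-area construction can be sketched as follows, assuming British National Grid coordinates in metres, an LA bounding box as input, and a square whose half-width is half the LA's larger dimension plus the 80 km buffer, rounded up to whole 200 m cells. The function name and rounding rule are assumptions; the source does not specify how the square is derived.

```python
import math

BUFFER_M = 80_000   # 80 km buffer in each cardinal direction
CELL_M = 200        # 200 m x 200 m grid cells


def study_area(la_bbox, buffer_m=BUFFER_M, cell_m=CELL_M):
    """Return a square (min_x, min_y, max_x, max_y) centred on the LA, and cells per side."""
    min_x, min_y, max_x, max_y = la_bbox
    cx, cy = (min_x + max_x) / 2, (min_y + max_y) / 2
    half = max(max_x - min_x, max_y - min_y) / 2 + buffer_m
    # Round the half-width up to a whole number of grid cells.
    half = math.ceil(half / cell_m) * cell_m
    bounds = (cx - half, cy - half, cx + half, cy + half)
    return bounds, int(2 * half // cell_m)


bounds, n_cells = study_area((0, 0, 20_000, 10_000))  # hypothetical 20 km x 10 km LA
print(bounds, n_cells)
```

For this hypothetical LA, 20 km wide and 10 km tall, the square reaches 85 km north and south of the LA but only 80 km east and west, which illustrates why wide or tall LAs end up with a larger effective buffer.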

The main strengths of Population 24/7 are:

  • its flexibility to include various types of data that describe different sorts of population mobility depending on user need

  • the ability to constrain to known population totals at specific times and places

  • the fine spatiotemporal granularity of its outputs; we report LSOA and MSOA results for interpretability but can produce outputs down to OA or bespoke grid cell sizes

However, its main limitation is its dependence on such a large variety of data to model population mobility. Maintaining these input data is a substantial task and carries a risk that some data may be discontinued or altered without notification or recourse, potentially introducing discontinuities in the outputs over time. We therefore take a pragmatic approach to data inclusion whereby the additional work and risk of including data must be justified by its impact on the daytime population estimates. Using data released on different schedules (up to and including annual data) also poses challenges to our aspiration to work towards more timely statistics. Additionally, we cannot currently quantify the uncertainty of the outputs.

For further details on the methodology, including how datasets are processed for use in the framework, please see the Population 24/7 - A method to account for daily population mobility in spatiotemporal population estimates article, on the UK Statistics Authority website (PDF, 984KB).

We use MYEs to compare Population 24/7 daytime population estimates against traditional usual resident statistics. We quantify the comparison between these two methods using absolute and percentage differences in population totals at LA and LSOA level, and differences in the mean and median population over all LSOAs in each LA.
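A minimal sketch of these comparison measures for one LA, assuming the percentage difference is taken relative to the MYE (so a 267% increase means the daytime estimate is 3.67 times the MYE). The function names and LSOA codes are hypothetical.

```python
from statistics import mean, median


def percentage_difference(daytime, resident):
    """Percentage change of the daytime estimate relative to the MYE baseline."""
    return 100 * (daytime - resident) / resident


def compare_la(lsoa_mye, lsoa_daytime):
    """Compare one LA's LSOAs; both arguments map LSOA code -> population."""
    codes = sorted(lsoa_mye)
    mye = [lsoa_mye[c] for c in codes]
    day = [lsoa_daytime[c] for c in codes]
    return {
        "abs_diff": sum(day) - sum(mye),
        "pct_diff": percentage_difference(sum(day), sum(mye)),
        "mean_diff": mean(day) - mean(mye),
        "median_diff": median(day) - median(mye),
        "lsoa_pct_diff": {c: percentage_difference(lsoa_daytime[c], lsoa_mye[c]) for c in codes},
    }


# Hypothetical LA with three LSOAs: two residential, one central.
result = compare_la({"A": 1000, "B": 1000, "C": 1000}, {"A": 800, "B": 900, "C": 2500})
print(result["median_diff"], result["mean_diff"])
```

In this toy LA the median LSOA loses 100 people while the mean rises by 400, the same shape as the pattern reported for 8 of the 14 case study LAs: most areas shrink slightly while a few concentrated destinations grow sharply.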

Mobile phone method

The capability to directly measure daytime population could allow us to move away from traditional survey and model-based methods. Deriving estimates from mobile phone crowd movement data is an innovative approach to measuring daytime population that has been used by other National Statistical Institutes, including Statistics New Zealand in their COVID-19 Impact on Local Councils' CBD Population through to 2021 article.

We have conducted an exploratory comparison of Population 24/7 outputs with daytime population estimates derived from O2 Motion mobile phone crowd movement data. The objective of this comparison is not to attribute causality to any observable differences between these methods. Instead, it is a preliminary investigation into the use of mobile data for deriving daytime population estimates and their suitability for validating other methods.

The mobile-derived population data are aggregated from anonymised records of geolocated mobile phone events. Events are those that use the provider's cellular network, for example calls, SMS messages, and data communications via 3G, 4G and 5G.

Counts within an MSOA are aggregated from users who are geolocated in a consistent location for a period of time defined by O2 Motion. Anonymised age and gender information are attributed to each count using contract details where possible. Because mobile phone contracts can only be held by adults, only users aged 18 years and over are theoretically represented in these data. Each count is also classified as resident, visitor, or worker within the MSOA, predicting the purpose of the user's presence based on prior activity.

O2's market share is scaled to UK-wide population estimates by weighting user counts based on MYEs. This weighting has been applied by O2 Motion prior to aggregation at MSOA level. These UK-wide insights have a one-hour frequency, providing a high-resolution timeseries of population estimates.
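O2 Motion's weighting method is not public, so the following is only a generic illustration of ratio weighting, not O2 Motion's method. It assumes each user has an inferred home area and carries a weight equal to that area's MYE population divided by the number of provider users living there; weighted users are then summed wherever they are observed. All names are hypothetical.

```python
from collections import Counter, defaultdict


def weighted_presence(user_home, user_location, home_population):
    """
    user_home: user -> inferred home area (hypothetical)
    user_location: user -> area where the user is observed at the time of interest
    home_population: area -> MYE population of that area
    """
    users_per_home = Counter(user_home.values())
    # Each user stands in for MYE / (provider users resident in the same area) people.
    weight = {u: home_population[h] / users_per_home[h] for u, h in user_home.items()}
    present = defaultdict(float)
    for u, loc in user_location.items():
        present[loc] += weight[u]
    return dict(present)


present = weighted_presence(
    user_home={"u1": "A", "u2": "A", "u3": "B"},
    user_location={"u1": "A", "u2": "B", "u3": "B"},
    home_population={"A": 1000, "B": 600},
)
print(present)
```

Weighting by residence like this preserves the MYE total when every user is observed, which is why any such approach still depends on traditional population estimates.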

There are strengths to mobile-based population estimation methods. The automated processing of events provides near real-time insights with fine temporal and spatial granularity. This compares favourably with survey and model-based approaches, whose long data collection and processing periods limit how often their results can be released. A mobile-based approach also reduces respondent burden and provides additional insight into the purpose of presence in an area compared with traditional survey methods. This approach, when compared with modelling, also benefits from making direct measurements of device locations. This means it detects changes in mobility patterns following events or changes to an area without having to tweak modelling assumptions or input data sources. It also inherently measures population movement without having to incorporate specific settings, like education, retail, and healthcare, into a model.

Mobile-derived estimates also have weaknesses, which can be split between general methods and the method used in this work.

Generally, validating the quality of these data is complex because there is no existing baseline to measure estimates against; this is also true of high-resolution spatiotemporal population estimation as a whole. Furthermore, because these data are based on the number of mobile devices, they are a proxy measurement of population size and movement. As a result, data quality could be affected by biases including:

  • under-coverage in younger populations (who cannot legally undertake a contract)

  • cases where contract information does not match the device user (this risks mistakenly including those aged under 18 years)

  • those who do not own or regularly use a mobile phone (including people with devices switched off for long periods)

  • those who use multiple mobile phones

  • those who do not always use the provider's cellular network for communication (for example, calls or messaging via Wi-Fi)

  • variation in signal coverage nationwide

Missingness in the data may also exist for technical or contractual reasons. Inaccuracies in geolocating events and aggregating them to MSOAs could introduce significant measurement errors. Finally, because each provider has only a fractional market share, estimates still depend on traditional methods for the weighting that approximates coverage of the entire population.

Within the context of this work's data, there are specific weaknesses. At present, for intellectual property reasons, the specific methods the data provider uses to process events and derive outputs are not publicly available. This conflicts with the Code of Practice for Statistics, which calls for trustworthiness through transparency. There is also the consideration of ownership, control, and responsibility for generating and maintaining estimates. From a methodology perspective, only considering users who remain in a consistent location misses the subset of the population that is "in-transit". Furthermore, these mobile-derived estimates apply to a one-hour window in which user events are aggregated. This has the effect of "smoothing out" or missing high-frequency mobility patterns. In contrast, modelling approaches like Population 24/7 can generate a "snapshot" view of population estimates at a single timestamp. This limits the mobile data to applications for which an hourly temporal resolution is sufficient. It also introduces a definitional difference between the Population 24/7 and mobile-derived populations.

The balance between, and uncertainty around, these strengths and weaknesses is the main justification for only performing a preliminary comparison using mobile-derived estimates in this work. For information on further developments that aim to mitigate these uncertainties and allow more comprehensive use of these data, see Section 6: Mobile-derived population estimates.

To perform a comparison between Population 24/7 outputs and mobile-derived estimates, it is necessary to pre-process both datasets so that their underlying assumptions are as close as practically possible. For this specific comparison, the Population 24/7 outputs represent a scenario at 2pm during a termtime weekday in 2019. To reflect this in the mobile estimates, we used the median of MSOA population estimates at 2pm across all university termtime weekdays in the available 2019 data (a total of 82 days). University term dates (as published online) were used, rather than school term dates, because only those aged 18 years and over are included in the mobile data. The median was used because population mobility is not random, and its central tendency may not be well described by the mean. The mobile-derived MSOA population estimates include all genders, ages, and purposes to match Population 24/7 as closely as possible. The Population 24/7 outputs were also filtered to remove counts associated with people aged under 18 years (a group the mobile data do not represent), and those considered "in-transit" (to match the assumption that a mobile user must remain in a fixed location to be counted). Note that the Population 24/7 outputs were not filtered to remove these populations when compared with MYEs.
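The pre-processing steps above can be sketched in Python under a few assumptions: term dates are supplied as (start, end) pairs, the interquartile range (shown as error bars in Figure 6) uses Python's "inclusive" quartile method, and Population 24/7 MSOA outputs are available broken down by hypothetical age-band and status labels. The term dates below are invented for illustration.

```python
from datetime import date
from statistics import median, quantiles


def is_term_weekday(d, terms):
    """True if d is Monday to Friday and falls inside any (start, end) term range."""
    return d.weekday() < 5 and any(start <= d <= end for start, end in terms)


def summarise_days(daily_counts):
    """Median and interquartile range of one MSOA's 2pm estimates across selected days."""
    q1, _, q3 = quantiles(daily_counts, n=4, method="inclusive")
    return {"median": median(daily_counts), "q1": q1, "q3": q3}


def comparable_pop247(counts_by_group):
    """Sum a Population 24/7 MSOA breakdown, dropping under-18s and in-transit people."""
    return sum(count for (age_band, status), count in counts_by_group.items()
               if age_band != "under_18" and status != "in_transit")


terms = [(date(2019, 10, 1), date(2019, 12, 13))]  # invented term dates
print(is_term_weekday(date(2019, 10, 2), terms), is_term_weekday(date(2019, 10, 5), terms))
print(summarise_days([10, 20, 30, 40, 50]))
print(comparable_pop247({("under_18", "present"): 100, ("18_plus", "present"): 500,
                         ("18_plus", "in_transit"): 50, ("under_18", "in_transit"): 10}))
```

In practice, `summarise_days` would be applied per MSOA to the 2pm values from every date that passes `is_term_weekday`.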

To quantify the comparison between these two methods, we use percentage difference and median absolute difference. Spearman's rank correlation coefficient is calculated to explore the correlation between estimates.
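A self-contained sketch of the two summary statistics, computing Spearman's coefficient as the Pearson correlation of average ranks (tied values share the mean of their ranks). In practice, `scipy.stats.spearmanr` computes the same coefficient and also returns a p-value; this sketch does not compute a p-value. The MSOA values are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_absolute_difference(x, y):
    return median([abs(a - b) for a, b in zip(x, y)])


pop247 = [1200, 3400, 800, 5000]   # hypothetical MSOA estimates
mobile = [1500, 3000, 700, 6500]
print(spearman(pop247, mobile), median_absolute_difference(pop247, mobile))
```

The example shows why both measures are reported: the two series rank the MSOAs identically (a coefficient of 1), yet typical disagreement in level is still 350 people.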


5. Results 

Contrasting daytime statistics with traditional statistics

All daytime population results estimate the population present at 2pm on a typical weekday during termtime in 2019. Results are given for 14 local authorities (LAs), listed in Table 1, with their total mid-year estimate (MYE) population, as well as daytime population from the Population 24/7 framework and the difference between the two estimates. Table 2 gives the mean and median population estimates across all lower layer super output areas (LSOAs) in each LA from both MYEs and Population 24/7 outputs, with the difference between the two.

Several figures are presented in this section that contrast daytime populations from Population 24/7 with resident populations from MYE. Figure 1 presents choropleth maps of each LA showing:

  • MYE population by LSOA

  • the Population 24/7 daytime estimate

  • the percentage difference between the two

Figure 2 shows the relationship between MYE and Population 24/7 population totals across all LSOAs in each LA. Figure 3 shows the percentage differences between MYEs and Population 24/7 outputs at LSOA level. Figures 4 and 5 also illustrate the percentage differences between MYEs and Population 24/7 outputs, this time by our LSOA Rural Urban Classification 2011 (RUC) (PDF, 805KB) and our 2011 area classification for local authorities (version 2), respectively. The RUC categorises statistical geographies based on resident population size (an Output Area (OA) in a settlement with a population greater than 10,000 people is classed as "urban"). The RUC is assigned at LSOA level based on the majority category of the constituent OAs.
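The majority rule for assigning RUC to LSOAs can be sketched as a modal vote over constituent OAs. This simplifies the classification to two labels, and the tie-break (first category encountered) is an assumption, because the source does not say how ties are resolved. OA and LSOA codes are placeholders.

```python
from collections import Counter, defaultdict


def lsoa_ruc(oa_to_lsoa, oa_ruc):
    """Assign each LSOA the most common RUC category among its constituent OAs."""
    votes = defaultdict(Counter)
    for oa, lsoa in oa_to_lsoa.items():
        votes[lsoa][oa_ruc[oa]] += 1
    # most_common(1) returns the modal category; on a tie it keeps the category
    # encountered first, an arbitrary choice made only for this illustration.
    return {lsoa: counts.most_common(1)[0][0] for lsoa, counts in votes.items()}


oa_to_lsoa = {"OA_1": "LSOA_1", "OA_2": "LSOA_1", "OA_3": "LSOA_1",
              "OA_4": "LSOA_2", "OA_5": "LSOA_2"}
oa_ruc = {"OA_1": "urban", "OA_2": "urban", "OA_3": "rural",
          "OA_4": "rural", "OA_5": "rural"}
print(lsoa_ruc(oa_to_lsoa, oa_ruc))  # {'LSOA_1': 'urban', 'LSOA_2': 'rural'}
```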

We see town and city-centre populations increase in Population 24/7 outputs compared with MYEs, but populations decrease in primarily residential areas (Figures 1 to 4). This is consistent with expected population movement from residential areas to workplaces during the working day, and highlights how usually resident statistics can be inaccurate when used as daytime estimates. For example, within the Swansea LA, LSOAs in the residential areas of Brynmill and Sandfields (such as Swansea 026C, Swansea 026D, and Swansea 026A) have a reduced population at 2pm in Population 24/7 compared with the MYE (reductions of 14%, 24%, and 24%, respectively), but the marina (Swansea 021B; 53% increase) and city centre (Swansea 025F; 267% increase) have increased populations. Similar examples from other areas are the outlier LSOA in Westminster (Westminster 018B, with a 4,137% increase in Population 24/7 outputs compared with MYE), which includes multiple universities and institutions with a relatively large registered workforce (for example, The Royal Courts of Justice, Royal Opera House, theatres, hotels, restaurants, and shops), and a predominantly residential LSOA in Newham (Newham 037E, with a 78% decrease in Population 24/7 outputs compared with MYE).

LAs with many rural LSOAs, such as Boston, Ceredigion, Gwynedd, and North Norfolk, typically have LSOA populations that are higher in Population 24/7 outputs than in MYEs (Table 2, Figures 3 and 4). This may be because rural LSOAs tend to be larger and less densely populated than urban LSOAs, and so have a relatively larger in-transit population on the road network. Additionally, destinations in rural LAs may be more evenly dispersed across LSOAs than in urban LAs, where high-population destinations tend to be concentrated in urban centres.

Within each LA, most LSOAs have similar or slightly smaller populations in the Population 24/7 outputs than the MYEs, with a small number of remaining LSOAs having a much larger population in Population 24/7 outputs than MYEs (Figures 2 and 3). This is highlighted by comparing the median and mean LSOA populations, where in 8 of the 14 LAs the median LSOA shows a reduced population, but the mean LSOA increases in population (Table 2). This is consistent with our expectation that populations across a wide residential area will move into concentrated areas of daytime activity, for example to go to schools or workplaces clustered in a small geographical area. Indeed, we see the largest estimated daytime population increases in LAs that are predominantly business, education, industrial and service areas (Figure 5).

While the origin population that is redistributed by the Population 24/7 framework is equal to the MYE for each LA, we see that the total LA population varies between MYE and Population 24/7 (Table 1). Increases and decreases in population within each LA occur when population is drawn from or redistributed to the "buffer zone" outside the LA boundary (see Section 4: Methodology for an explanation).

Islington and Newham LAs, both in London, as well as the Blackpool LA, show an overall reduction in population compared with the MYEs (Table 1). This suggests a net outflow of residents from these LAs who travel to other LAs for work or education. However, the overall population in these LAs may not be decreasing by the amount suggested by the Population 24/7 framework. This framework does not consider all types of population movement, and comparisons against estimates derived from mobile data (discussed in more detail later in this publication) suggest that the Population 24/7 framework may be undercounting people in these LAs.

The Guildford LA is unusual as, despite being an LA known for having a large commuter population because of its proximity to London, both the mean and median LSOA population increase compared with MYEs (Table 2). Examining the percentage difference map for the Guildford LA in Figure 1, we can see that this increase seems to be mainly in the larger LSOAs, again suggesting that this could be caused by lower population density, more evenly dispersed destinations, and relatively larger in-transit populations on the road network.

Some LSOAs have a substantially higher Population 24/7 estimate than the MYE (for example, in Manchester, Swansea and Westminster; Figure 2). These LSOAs usually include specific destinations where a very high daytime population is expected (for example, university campuses and hospitals). Swansea 027D, which contains both the Singleton Campus of Swansea University, as well as Singleton Hospital, has a population increase of 1,568% in the Population 24/7 estimates compared with the MYE.

For universities, this may be an artefact of known issues in our specific implementation of the Population 24/7 framework. In the current implementation of our method, secondary campuses are not accounted for, and all students are allocated to the primary location of the university. This causes an overestimate of students at the main university campus, as no students are allocated to any university buildings outside of a small radius at the primary location, including secondary campuses in other areas. This can be seen in Swansea 027D, where all students are being allocated to the main campus, and none to its equally-sized Bay Campus (located just outside of the LA). This can also cause issues for universities that are spread across a city, such as the University of Manchester, which is geolocated entirely in Manchester 018B (762% increase compared with MYE) in our current implementation, but has buildings located in other areas in the city.

Additionally, we currently assign hospital patients to the hospital closest to their home address, rather than the hospital they were treated at, because of issues with the data used. The magnitude of the difference between the MYE and the Population 24/7 output in these specific cases is extreme and difficult to validate; these types of LSOAs are discussed further in the comparison with mobile-derived estimates later in this section.

Figure 1: Comparison between Population 24/7 daytime population estimates and MYEs

Spatial distribution of Population 24/7 daytime population estimates and MYEs, England and Wales, 2019


Notes:
  1. Estimates are given by LSOA for each LA.
  2. Each LA can be viewed in terms of its MYE population, the Population 24/7 daytime population estimate, or the percentage difference between the two.

Figure 2: The relationship between Population 24/7 and MYE at LSOA level

LSOA population totals according to Population 24/7 daytime population estimates and MYEs, England and Wales, 2019


Notes:
  1. The dashed line indicates where the estimates are equal (y=x).

Figure 3: Differences between Population 24/7 and MYE at LSOA level

Percentage differences between LSOA population totals according to Population 24/7 daytime population estimates and MYEs, England and Wales, 2019


Figure 4: Differences between Population 24/7 and MYE at LSOA level by rural and urban classification

Percentage differences, England and Wales, 2019


Figure 5: Differences between Population 24/7 and MYE at LSOA level by LA area classification

Percentage differences, England and Wales, 2019


Notes:
  1. Area classification supergroups taken from 2011 area classification for local authorities version 2.
  2. Supergroups that are not represented by any of our case study LAs are not shown.

Contrasting modelling with mobile methods

This section shares the results of an initial exploratory comparison between Population 24/7 model outputs and mobile-derived population estimates. All figures in this section show daytime population estimates at 2pm during a university term time weekday in 2019. They are at the MSOA level for all 14 LAs investigated in this work.

As described in Section 4: Methodology, there are two important considerations when interpreting the results in this section. Firstly, people aged under 18 years and those "in-transit" were excluded from the Population 24/7 outputs when comparing against the mobile-derived estimates. Secondly, the 2pm mobile-derived estimates should be read as aggregates of user events between 2pm and 3pm.

Figure 6 shows a comparison of the two methods as a scatter plot. The x-axis shows the Population 24/7 estimate. The y-axis denotes the median mobile-derived population estimate. The dashed diagonal line represents the ideal scenario in which both of these estimates are the same.

Figure 7 maps the comparison between Population 24/7 outputs and mobile-derived estimates. The first subplot is a choropleth map of the percentage difference between the two methods; the other two are choropleth maps of the estimated population totals from each method.

Figures 8 and 9 show the distribution in percentage difference between the two methods, split by LA and RUC, respectively.

Figure 6: The relationship between mobile-derived estimates and Population 24/7 at MSOA level

Comparison of the two methods, England and Wales, 2019

Notes:
  1. Error bars denote the interquartile range of the mobile-derived estimate.
  2. The dashed line indicates where the estimates are equal (y=x).
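
The summary statistics behind Figures 6 to 9 can be sketched as below. This assumes, without confirmation from the source, that the median and interquartile range are taken over repeated mobile-derived estimates for the same MSOA and hour, and that the percentage difference is expressed relative to the Population 24/7 estimate (positive when the mobile-derived value is larger). Function names are hypothetical.

```python
import statistics


def percentage_difference(mobile: float, p247: float) -> float:
    """Percentage difference of a mobile-derived estimate relative to Population 24/7.

    Assumed convention: positive values mean the mobile-derived estimate is larger.
    """
    if p247 == 0:
        raise ValueError("Population 24/7 estimate must be non-zero")
    return (mobile - p247) * 100 / p247


def median_and_iqr(estimates: list[float]) -> tuple[float, tuple[float, float]]:
    """Median and interquartile range (Q1, Q3) of repeated mobile-derived estimates."""
    q1, q2, q3 = statistics.quantiles(estimates, n=4)  # default "exclusive" method
    return q2, (q1, q3)


median, (q1, q3) = median_and_iqr([900.0, 1000.0, 1100.0, 1200.0, 1300.0])
print(median, q1, q3)                         # 1100.0 950.0 1250.0
print(percentage_difference(median, 1000.0))  # 10.0
```

Quartile conventions differ between tools, so the error-bar ends may not match other software exactly.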

Figure 7: Comparison between mobile-derived daytime population estimates and Population 24/7 estimates

Spatial distribution of mobile-derived daytime population median and comparable Population 24/7 estimates, England and Wales, 2019

Notes:
  1. Estimates are given by MSOA for each LA.
  2. Each LA can be viewed in terms of the mobile-derived median daytime population, the comparable Population 24/7 estimate, or the percentage difference between the two.

Figure 8: Differences between mobile-derived daytime estimates and Population 24/7 at MSOA level

Percentage differences, England and Wales, 2019

Figure 9: Differences between mobile-derived estimates and Population 24/7 estimates at MSOA level by rural and urban classification

Percentage differences, England and Wales, 2019

Mobile-derived population estimates only partly agree with Population 24/7 estimates. The two are strongly correlated (sr(327) = 0.76, p<0.001), but the differences between them are larger in urban areas. This pattern is clear in Figure 9 and in the median absolute difference, which is 969 people in rural areas and 1,447 people in urban areas.

Mobile-derived estimates also tend to be larger in urban areas. This is likely because of types of population movement not currently included in the Population 24/7 estimates, such as leisure, tourism, and a subset of retail and healthcare settings. Additionally, our implementation of the Population 24/7 framework does not apportion travellers to railway stations and airports; in these settings the model only counts employees. This is unrepresentative, since in reality people may be waiting at these locations and would therefore be counted by a mobile-derived method. Mobile-derived overestimates may also arise from variations in mobile usage patterns.

The three London LAs (Islington, Westminster, and Newham) in Figures 6 to 8 illustrate these points well. Most MSOAs in these authorities show considerably larger populations in the mobile-derived estimates than in Population 24/7. Because the Population 24/7 model does not account for tourism, non-supermarket retail, or travellers at rail stations, it will likely underestimate the population in such densely populated and heavily visited areas.

Additionally, the Population 24/7 estimates are low compared with the mobile-derived estimates in the Blackpool LA, particularly around the coastline. Similar to the London authorities, this is likely because of large tourist populations that are not currently captured by the Population 24/7 estimates.

Population 24/7 underestimates in urban areas may be further compounded by another modelling assumption: the starting population is fixed to people living within a buffer zone around the LA being analysed. This is not representative of real-world behaviour, since it excludes people who travel in from outside the buffer. By contrast, mobile-derived estimates count people regardless of origin, including those from further afield. This assumption is made purely to reduce computational cost and could be removed in future iterations of this work.
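
The effect of the buffer assumption can be illustrated with a simplified sketch. It is not the model's actual geometry: a real buffer would be drawn around the LA boundary, whereas here distance is measured from a single hypothetical centre point using the haversine great-circle formula. Zone IDs, coordinates and counts are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_by_buffer(origins, centre, buffer_km):
    """Partition origin zones into those inside and outside a circular buffer.

    origins: list of (zone_id, lat, lon, residents); centre: (lat, lon).
    Residents of zones outside the buffer never enter the simulation.
    """
    inside, outside = [], []
    for zone_id, lat, lon, residents in origins:
        d = haversine_km(centre[0], centre[1], lat, lon)
        (inside if d <= buffer_km else outside).append((zone_id, residents))
    return inside, outside


origins = [("A", 51.51, -0.1, 500), ("B", 52.5, -0.1, 800)]
inside, outside = split_by_buffer(origins, (51.5, -0.1), buffer_km=20)
print(inside, outside)  # [('A', 500)] [('B', 800)]
```

The 800 residents of zone B would be absent from the model's daytime totals even if some of them commute into the study area, which is the source of the potential urban underestimate.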

The figures in this section also show that mobile-derived estimates are slightly lower in rural areas. From a Population 24/7 perspective, there may be an overestimate because usual residents are left at their origin locations, again because of missing destinations such as leisure and tourism. From a mobile perspective, it is difficult to pinpoint a likely cause (as outlined in Section 4: Methodology), but the underestimate may be because of poorer mobile infrastructure in rural areas and/or mobile usage patterns.

Where there are major discrepancies between the two population counts, it is often possible to identify causes in the Population 24/7 methodology. In the Swansea LA, the outlier MSOA containing Swansea University's Singleton Campus and Singleton Hospital shows a considerably higher population in the Population 24/7 estimates than in the mobile data. This difference is at least partly because of modelling assumptions. Firstly, the student population may not be properly split across Swansea University's two main campuses. Secondly, a lack of data makes it difficult to assign people to the correct hospital in the healthcare component of the framework. We can also hypothesise that mobile usage patterns differ in a university context, which could lead to an underestimate in the mobile values (for example, students may make greater use of readily available Wi-Fi networks). Furthermore, geolocating events in urban settings with large university and hospital buildings could lead to greater measurement errors.

In conclusion, when making comparisons between these methods it is not possible to quantify the impact of potential measurement errors in the mobile-derived outputs at this stage. As discussed in Section 4: Methodology, this is because of the opacity of the methodologies used, complexity surrounding validation, data quality, and potential biases. Therefore, the absolute magnitude of differences between these two methods cannot be definitively interpreted as over- or underestimates. However, this initial comparison has been useful in identifying and validating model limitations, as discussed.

6. Future developments

Improving input data

There are many additional types of mobility not currently included in our framework. When adding further data, consideration needs to be given to the impact of those data on total estimated population movement.

Our results suggest that our spatiotemporal allocation of university students is inaccurate because they are only allocated within a small radius at the university's main site. Future work is needed to accurately allocate students who attend secondary campuses and peripheral buildings on large campuses.
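
One possible approach, sketched here purely as an assumption, is to apportion each university's students across all of its sites in proportion to a site weight (for example, teaching capacity) rather than placing them all at the main site. The sketch uses the largest-remainder method so that integer site totals always sum to the enrolment. Site names, weights and the function name are hypothetical.

```python
import math


def apportion(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split an integer total across sites in proportion to weights (largest remainder method)."""
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    shares = {site: total * w / weight_sum for site, w in weights.items()}
    result = {site: math.floor(s) for site, s in shares.items()}
    leftover = total - sum(result.values())
    # Give the remaining students to the sites with the largest fractional parts
    by_remainder = sorted(shares, key=lambda s: shares[s] - result[s], reverse=True)
    for site in by_remainder[:leftover]:
        result[site] += 1
    return result


print(apportion(10, {"main_campus": 2.0, "second_campus": 1.0}))
# {'main_campus': 7, 'second_campus': 3}
```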

The most significant destination types not represented in our estimates are leisure and tourism. This will lead to underestimates of daytime population in areas with leisure venues and high levels of tourism. Future work will include these destination types.

Retail destinations are also underdeveloped in the current estimates. Future work is planned to extend time profiles and magnitude estimation for retail data, using Wi-Fi connection and other remote sensor data for shopping centres and high streets.

There are substantial gaps in our coverage of travel for healthcare reasons, including GP visits, specialist clinics, dentists, and private healthcare locations. We hope to address this using additional data, such as GP Episode Statistics and the Private Healthcare Information Network.

While some data on immobile populations are already included, our estimation of these populations could be improved. We are obtaining more granular data on populations residing in prisons and are looking into additional sources to provide more detailed information on populations in residential care.

Our outputs currently redistribute the usually resident population by time of day, and so do not include short-term immigrants, international tourists, and other non-domestic temporary visitors. Capturing these populations in our origin data will be a goal of future work.

Additionally, second home ownership is not well represented. Owners of second homes will be considered residents of their primary residence, but their activity around their second home may still be measured by our destination data. This could lead to underestimates in residential areas where second homes are common, especially during times of peak usage. Data that capture second home occupancy over time could help address this.

Another limitation to the current estimates is that our travel network only covers major roads. We continue to liaise with collaborators from the University of Southampton and within the Office for National Statistics (ONS) to develop the travel network.

Finally, the time profiles used in the current model use data from 2011 to 2015. We will be incorporating more recent sources to reflect changing travel patterns for more timely population mobility estimation.

Validation and estimating quality

Validating our outputs is challenging because we already use the best available data as inputs to the model, so there is no independent, higher-quality source to compare against. A full discussion of potential methods to validate Population 24/7 outputs is given in the Population 24/7 - A method to account for daily population mobility in spatiotemporal population estimates article, on the UK Statistics Authority website (PDF, 984KB). Some potential approaches are:

  • qualitative sense-checking of specific test areas

  • quality assessment of input sources (for example, timeliness, trustworthiness, completeness, purpose)

  • comparisons with daytime population statistics published from other sources, for example the EU ENACT project, on the European Commission's website

  • comparisons with mobile data and other high-frequency low-latency mobility data

An important challenge when estimating quality is accounting for the spatial and temporal dimensions of our outputs; quality may not be consistent at different times of day, types of day, or at different levels of statistical geography.

We will be seeking input from the local authorities included in this case study to use their local knowledge to assess the accuracy of these experimental statistics, and to discuss their needs as potential users of daytime population statistics.

Population mobility across the daytime

Population mobility varies substantially across each day, and daily mobility patterns vary over time. Future analysis will investigate daytime populations across different times of day and types of day.

Mobile-derived population estimates

Our use of O2 Motion data is limited to an initial exploratory comparison with Population 24/7 estimates. In the future we will assess the feasibility of using mobile data for both daytime population estimation and validation of Population 24/7 estimates. This may include updating destination time profiles and capturing population movements not captured in other data.

However, the uncertainties surrounding the use of mobile phone data to derive daytime population estimates must be understood and mitigated further to move from this preliminary comparison to fully assessing feasibility. We will also be attempting to validate the accuracy of the mobile-derived estimates, as previously described.

An important limitation to overcome is the use of data derived using opaque, proprietary methodologies. This conflicts with the ONS's statistical standards for transparency and public good. A means of mitigating this would be to set up a close partnership with the data suppliers. This would make the methodologies more transparent and could also improve them by combining the expertise of the supplier (as technology and market experts) and the ONS (as population estimation experts). This will be a priority when acquiring any mobile phone data in the future.

We will also be focussing more on the ethical considerations of mobile-derived population estimates and will engage with bodies like the Geospatial Commission to discuss issues around ethics and public acceptance.

Acknowledgements

We are grateful to Professor David Martin and Dr Samantha Cockings for their input throughout this research.

8. Cite this methodology

Office for National Statistics (ONS), published 30 May 2023, ONS website, methodology, ONS working paper no 31 - Estimating population by time of day
