FOI reference: FOI-2025-3126

You asked

Under the Freedom of Information Act 2000, I am writing to request information regarding datasets currently produced by the Office for National Statistics that utilise modelled data rather than direct survey data or other official primary sources. 

I request the following information: 

  1. A complete list of all ONS datasets that are currently being produced using modelled data, statistical models, or synthetic data rather than being derived directly from surveys, censuses, or other official primary data collection methods. 

For each dataset identified in point 1 above, please provide: 

  1. The date from which the dataset began being produced using statistical models or modelled data (i.e., when the methodology changed from survey/primary data collection to modelling approaches). 
  2. The type and nature of statistical modelling being employed for each dataset, including but not limited to: 
     • The specific modelling methodology (e.g., econometric models, time series models, regression analysis, machine learning algorithms, nowcasting models, demographic projection models, etc.) 
     • Whether the models use synthetic data generation techniques 
     • The primary data sources or inputs used to feed the models 
     • Whether the modelling approach involves interpolation, extrapolation, or estimation techniques 
  3. Whether there is a current plan, policy, or timeline in place for the dataset to return to being produced through surveys, censuses, or other official primary data collection methods. If such plans exist, please provide: 
     • The expected timeline for implementation 
     • The rationale for the planned change 
     • Any interim milestones or review dates 
  4. For datasets where no plan exists to return to survey-based methodology, please provide the rationale for maintaining the modelled approach. 

Clarification: 

  • By "modelled data" I mean datasets where the published figures are derived through statistical modelling, estimation techniques, synthetic data generation, or interpolation rather than direct measurement or survey responses. 

  • This includes but is not limited to: nowcasting models, econometric models, demographic projections beyond standard population estimates, and any datasets using machine learning or AI-based estimation methods.

We said

Thank you for your request. 

You have asked us to distinguish between statistics that are based on modelling and estimation and those that are directly measured from surveys or other primary data sources, and to address a number of questions about those that fall into the former category. As we outline in more detail below, this distinction is not, in practice, clear-cut, as many statistics are derived from both. 

The use of models and imputations is a normal and essential part of turning a wide range of data into consistent statistical datasets. Within economic statistics and population frameworks, the outputs are naturally complex and highly integrated for coherence. Our statistics draw on a large range of data sources depending on the topic, including surveys, administrative records and external datasets. For example, the supply-use tables, which assess the level of the UK economy, are compiled annually on a 112-product by 112-industry breakdown and then also by sector, giving over 10,000 time series in that production process alone. The ONS website contains over 50,000 published time series.

The ONS uses complex, reproducible statistical production systems that take input data at very detailed levels and apply well-established standard methods of estimation, data confrontation, and imputation to address gaps, inconsistencies, or missing data. This ensures that the resulting figures are timely, robust, and internationally comparable, while reflecting the best available evidence at the point of publication. Updated input data, such as late or corrected data returns, then naturally lead to revisions to published estimates once these updated data are available. An example of revision analysis, which shows how published figures change as the latest data replace earlier estimates, is available here: GDP revisions in Blue Book - Office for National Statistics.

Our use of modelling, estimation and integration is an integral part of the production of all statistics. For example, population statistics also depend on combining multiple sources, such as census data, administrative systems, and surveys, with modelling approaches used to support estimates between census years or for small geographic areas. 

The ONS is transparent in publishing Quality and Methodology Information (QMI) reports, which are available on the ONS website for a wide range of outputs and cover data, methods and compilation practices. The QMIs are available in a searchable repository here: All methodology - Office for National Statistics.

Given the above, we recommend narrowing the scope of your request significantly so that we can answer it appropriately. It is difficult to suggest specific ways to narrow the request because of the breadth of information you are interested in, but a more manageable request could cover a much more restricted timeframe and focus on a specific time series.