FOI Reference: FOI/2022/3995

You asked

Can you please publish a list of data (or documents that show the list of data) from non-government organisations that (a) is web scraped by ONS, (b) has been requested by the ONS, and (c) has been received and is used by ONS. Where data is being used by ONS, please say if it is published in its own right or, if not, which published variables each data set contributes to.

We said

Thank you for your request.

ONS uses multiple data sources from government and non-government organisations to support the ONS statutory function. The following information relates to some of our non-government data sources.

Financial transaction data (business to consumer, business to business) solely used for statistical analysis of UK card spending. You can find the data sources here: Statement on data sources

Point of sale data from UK retailers, and other commercial prices data (used car sales and public transport fares) to support the transformation of inflation and retail sales statistics. Several web scraped data sets covering clothing, electronic items and package holidays are additionally used for the consumer price statistics. An example of the retailer data used is here: Transformation of consumer price statistics: November 2021

Mobility data to support analysis of the impact of mobility restrictions during the Covid-19 pandemic. How we used these data sources is described here: Understanding mobility during the COVID-19 pandemic

UK Business websites text data from glass.ai to inform survey response chasing efforts and gain insight into the impact of Covid-19: Extracting text data from business website Covid-19 notices.

Utilities, student hall, holiday home, and hospital and care home address data, used to validate the active addresses of households for Census 2021 only.

Trade data from numerous sources including CAA, providers of road freight surveys, Vessels Value data and UK Chamber of Shipping data which, combined with other government data, help form the national accounts: UK National Accounts, The Blue Book

National expenditure data from commercial sources, including people's spending on horses, private airplanes, and boats which is used within the calculation of GNI within the national accounts: All data related to gross domestic product (gdp)

Data relating to the 17 sustainable development goals is provided from open licence data in the public domain and is published on the SDG website: UK data for the Sustainable Development Goals

Data for weekly real-time Faster Indicators, including Adzuna job vacancy data and Springboard retail footfall data: Economic activity and social change in the UK, real-time indicators

Other non-government data sources that ONS has acquired and that contain personal information are all published here: External data sources containing personal data

Please find information relating to our web scraped data sources in the associated downloads.

We intend to publish the information requested in parts b) and c) of your request. Therefore, the information is exempt under s.22(1) of the Freedom of Information Act (FOIA) 2000, as there is an intention to publish this information in the future.

This exemption is subject to a public interest test. We recognise the desirability of information being freely available. The Office for National Statistics has given serious consideration to transparency, both in setting out to create and publish lists of our data sources and in disclosing everything we reasonably can in response to this request.

A large part of this project is discussing the publication of these data sources with our suppliers, particularly balancing the concepts of transparency and confidentiality. Early disclosure, prior to completion of this project and appropriate discussions with our data suppliers, could undermine our supplier relationships and our functions should access to data sources be withdrawn.

As stated above, we obtain these data sources to fulfil our functions not only as a producer of official statistics that inform the public good, but also as a supporter of statistical research and the research community. If our access to such data sources were affected, this could undermine our ability to fulfil our statutory functions, which is not in the public interest.