The previous 6 articles on Value Added Tax (VAT) turnover outlined progress we are making in the development of this administrative data source. We have established a new approach to implement VAT turnover data within the national accounts by the end of 2017.
ONS is in the process of creating a new VAT turnover processing platform which incorporates recently undertaken methodological research and makes use of new technologies. This platform will provide the means to combine estimates of output for UK industries that are currently covered by the Monthly Business Survey (MBS), estimates that are primarily used to form part of the output measure of gross domestic product, GDP(O).
The new approach announced today, 2 February 2017, involves initially introducing VAT turnover data covering small and medium-sized businesses alongside existing MBS data, with MBS data alone continuing to provide estimates of output for the largest businesses.
The use of VAT turnover data provides us with the opportunity to produce economic output time series for a range of businesses that have formerly been too small to survey effectively on a frequent basis. To highlight these opportunities, this latest article includes detailed breakdowns of experimental data in the sports, amusement and recreation sectors of the economy. However, as these data are derived from an experimental dataset and are both non-seasonally adjusted and published in current prices, they are not directly comparable with headline GDP data.
This article summarises our research and analysis findings on the administrative dataset, referred to as “VAT turnover data”, created from Value Added Tax (VAT) returns data provided by HM Revenue and Customs (HMRC). We are undertaking a programme of work for the initial implementation of VAT turnover data within the national accounts, which will see first official publication in the Quarterly National Accounts (QNA), due for release on 22 December 2017.
A new ONS VAT processing platform is being developed, involving a substantial review and update of our methodology and making use of new technologies. The new platform will provide the capability for us to process large complex datasets, of which VAT is a significant example. The updated methodology and processes that we aim to employ are addressing a range of specific data challenges highlighted in our previous article, as well as more general data challenges for the use of administrative data.
Ongoing research and analysis alongside the production of the new ONS VAT processing platform aims to establish further value of the VAT turnover data for estimating the output of other industries, such as agriculture, where quality improvements could be introduced for particular parts of the agriculture industry.
For this article, we have been able to make use of the latest available snapshot of the VAT turnover data, taken on 3 January 2017. It brings updates to our previously published data and includes analysis of the revisions we see to the time series due to the latest VAT returns. The data presented in this research article are preliminary and subject to change, following the ongoing methodological, analytical and processing improvements. They are presented as a guide to the possibilities, challenges and the characteristics of the VAT turnover data. As they are based on an experimental dataset, that is non-seasonally adjusted and is on a current price basis, they will differ from the data that will be implemented in the national accounts. The next article, currently planned for publication in May 2017, will make use of data and metrics produced with the new processing platform to provide analytical insights into the VAT turnover data, the impact our new methodology has on the quality of our estimates and an update on our progress towards implementation within the national accounts.
The Office for National Statistics (ONS) is committed to addressing the strategic recommendations made in the Bean Review of UK Economic Statistics, in line with our Economic Statistics and Analysis Strategy (ESAS). We intend to make use of VAT turnover data within the national accounts by the end of 2017 as part of this commitment.
The use of the VAT turnover dataset is one of the first steps towards transforming the way that we use large externally collected administrative data in preference to data collected via ONS survey. With this development and implementation of the VAT turnover dataset, we are helping to realise the potential of new technologies and methods that could deliver useful statistical data from a range of other externally collected data, including those from administrative data sources.
3.2 Scope of implementation and coverage
The use of VAT turnover data across relevant statistics within the national accounts is being considered for further strategic deployment. For 2017, we are working to implement VAT turnover in specific areas of the output measure of gross domestic product GDP(O), where the value and effectiveness of its use is carefully balanced against the risks and challenges that are to be overcome in order to deliver the solution.
We evaluated the options following an internal review of our methodology and consultation with stakeholders, academic associates and international experts. We have agreed that the best implementation strategy for the use of VAT turnover in 2017 is to create monthly time series data for industries as identified within the Standard Industrial Classification 2007 (SIC) structure, providing appropriate levels of detail for VAT turnover time series.
This data will primarily be used to create time series data representing output estimates for industries covered by the Monthly Business Survey (MBS). Details of the industries covered by the MBS can be found in the associated Quality and Methodology Information report.
Our agreed model for the initial implementation of VAT turnover data within the national accounts involves a combination of output estimates from both the MBS and the newly developed VAT turnover dataset. MBS data pertaining to the largest businesses will continue to be used as a timely and effective measure of these entities. This form of data retains detailed information about the typically complex structures of these business enterprises, which can be better understood via the usual liaison with survey respondents when necessary. The VAT turnover data will be initially employed to improve coverage of smaller businesses alongside existing MBS data, where coverage as a proportion of observations of all businesses can be vastly increased.
As part of our development of the new VAT processing platform, we are performing analysis to establish the suitability of our agreed approach for implementation in particular industries and the feasibility of replacing survey data with high-quality administrative data where possible to help realise the ESAS goal of compiling statistics at lower cost without compromising the quality of our statistics.
An important issue currently under analysis is the degree to which the VAT data can be used for detailed size bands for small businesses, for which we propose to initially investigate data for construction (SIC 41.2 to 43) and retail industries (SIC 47). Other industries to be covered by the VAT turnover dataset will make use of grouped employment size bands, though further investigation aims to assess the viability of using more detailed size band information.
3.3 Comparison of data source coverage fundamentals
For the selected industries, construction and retail (SIC 41.2 to 43 and 47, respectively), we have provided MBS paradata to compare with the VAT turnover for the same industries. We provided information helpful for understanding the MBS paradata more generally in section 3 of our December 2015 VAT article.
Table 1: A comparison of the relative numbers of observations available from MBS, the VAT turnover dataset and the Inter-Departmental Business Register – Construction (SIC 41.2-43) – Correct as of January 2017
|Construction (SIC 41.2-43)||Monthly Business Survey employment size-band share of universe turnover||Monthly Business Survey forms despatched (November 2016)||VAT reporting units used in this article|
|Employment size band|
|<100 employment and turnover >£60m||3.3%||67||0.9%||Not applicable in VAT as already included within smaller employment bands above|
|Source: Office for National Statistics|
Table 2: A comparison of the relative numbers of observations available from MBS, the VAT turnover dataset and the Inter-Departmental Business Register – Retail (SIC 47) – Correct as of January 2017
|Retail (SIC 47)||Monthly Business Survey percentage share of universe turnover||Monthly Business Survey forms despatched (November 2016)||VAT reporting units used in this article|
|Employment size band|
|<100 employment and turnover >£60m||1.6%||16||0.3%||Not applicable in VAT as already included within smaller employment bands above|
|Source: Office for National Statistics|
For both the construction industry section and the retail industry section, the large number of returns available in the VAT turnover dataset reveals huge potential for improved coverage. This is especially noticeable for the smaller employment bands, where consideration of respondent burden and other survey efficiency challenges lead to proportionately lower coverage. When comparing this low coverage with the significant contribution made by the smallest construction businesses (22.9%), the benefits of a fully realised VAT turnover dataset that covers a much greater proportion of these businesses are patent.
The VAT turnover data we use is open to revisions for 25 months previous to the most recent snapshot of data received from HM Revenue and Customs (HMRC). This has been set to mirror the revision policy of short-term output indicators which fall in line with the National Accounts revisions policy. Short-term output indicators have the latest 13 months open for revision. In order for the current cleaning methodology to work, the 12 months prior to this point are also required and are subject to minimal revisions, hence the requirement for 25 months to be open for revisions at each monthly snapshot of the data. This means that many of the reasons for revisions outlined in the following section will, for this publication, only affect data since January 2015. Revisions outside this period are a result of manual adjustments made to the data. Revisions since the previous article are to be expected considering the experimental nature of our current processes.
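The arithmetic behind the 25-month window can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and counting the window inclusively from the snapshot month is an assumption, chosen because it reproduces the January 2015 start date quoted above for the January 2017 snapshot.

```python
from datetime import date


def months_back(d: date, n: int) -> date:
    """Return the first day of the month that lies n months before d's month."""
    idx = d.year * 12 + (d.month - 1) - n
    return date(idx // 12, idx % 12 + 1, 1)


def revision_window(snapshot: date, open_months: int = 13, cleaning_months: int = 12):
    """Return (first open month, window length in months).

    open_months: months open under the short-term indicator revisions policy.
    cleaning_months: extra prior months needed by the cleaning methodology.
    Assumption: the window is counted inclusively from the snapshot month.
    """
    total = open_months + cleaning_months
    return months_back(snapshot, total - 1), total


first_open, length = revision_window(date(2017, 1, 3))
print(first_open, length)
```

Under a different counting convention (for example, starting from the latest reference period rather than the snapshot month) the start of the window would shift by a month.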
4.1 New VAT returns
When a VAT-registered business submits a VAT return to HMRC, the turnover reported for the period they are returning for will be incorporated into the following month's snapshot of VAT data. We do not currently estimate for non-response; therefore, aggregate turnover for a given reference period will increase as more returns are received. This generally causes upward revisions to our dataset. In some cases new returns may lead to downward revisions where cleaning has previously taken place. As more data become available, our cleaning methodology may revise figures given for previous snapshots, replacing them with figures that are more coherent with the data now available. The subsequent revisions can be positive or negative.
The rate at which HMRC receives VAT returns for each month from businesses in the manufacturing industries is demonstrated in Figure 1, which shows the rate of response for the months January 2016 to June 2016. This pattern is generally consistent throughout the snapshots of data we are using for VAT, showing the same trend regardless of the time period analysed. The majority of VAT-registered firms report quarterly and are subject to a stagger, and approximately 60% of VAT returns are available 2 months after the reference period (T plus 2 months). For example, 61% of returns relating to January were submitted by March. By T plus 5 months, we have a close to complete dataset, with approximately 98% of returns submitted. However, this only provides an indication of how complete the dataset is, as the number of businesses due to return in any period fluctuates and figures are subject to slight alterations. More returns may be added to the individual reference periods in following deliveries, which could affect some of the percentage values.
Figure 1 shows that, at the time of our last publication, we already had an almost complete picture of the data. Using the September 2016 snapshot to publish data up to and including March 2016 meant that we had approximately 99% of the returns due for the period. The result is a lack of large upward revisions in the data that we have published. Had we included up to June 2016 in our last publication, the data would see upward revisions as we receive additional returns for the latest period.
Details on the estimation methods we are currently looking to employ in order to address the issues presented by the timeliness of response can be found in section 7 of this article: ‘New methodology and data challenges’. The effect of new returns on the data will generally have the greatest impact in the most recent months, but as we are publishing a mature dataset this issue is largely diminished. However, late submission of returns and businesses opting to submit their VAT return annually can cause revisions to the data further back, though these revisions are typically small.
Data cleaning also causes revisions to the data. The new cleaning methods that we have developed for use in the new VAT processing platform are more comprehensive and better analysis of the impact that such cleaning introduces is now possible. It is important to recognise that for this article, many of the larger revisions to the data since the previous article were the result of the current cleaning and processing methodology developed for initial research and analysis purposes.
4.3 Business errors
It is possible for businesses to make clerical errors when submitting their VAT return. Businesses are able to change the level of turnover they have submitted for a reporting period and any such change is then updated in the VAT turnover data for the following monthly snapshot. This can create either positive or negative revisions depending on whether a firm overstated or understated their turnover.
Businesses that make these revisions to their turnover are identifiable by the “return type” variable in our VAT dataset. We can use this administrative variable to identify the particular reason for the revision and make appropriate amendments to our data. VAT payment enforcement can also cause revisions across historical time periods of our data.
4.4 Changes to apportionment as a result of partial VAT return response in the business structure
Some complex VAT structures may experience revisions as a result of the apportionment method we currently use. An illustrative example involves an enterprise that is linked to 3 VAT registration units and 3 reporting units within its business structure. Within one monthly snapshot, if only 2 out of 3 VAT registration units have submitted their turnover for a given reference period, the turnover from the 2 VAT registration units would be linked to the enterprise and then apportioned over the 3 reporting units. If, in the following month, the third VAT registration unit submitted a VAT return for the same reference period, this additional turnover would be linked to the enterprise total and again apportioned to the reporting units, thus revising their turnover between snapshots. This is another example of how additional returns can affect the data; however, this is a more specific issue related to our apportionment methodology and is being improved for use in our new VAT processing platform.
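The 3-unit example can be sketched in a few lines of code. The apportionment weights are an assumption made purely for illustration (here, hypothetical employment counts for each reporting unit); the article does not state which variable is used to apportion turnover, and all figures below are invented.

```python
def apportion(vat_returns, weights):
    """Split the enterprise total over reporting units in proportion to weights.

    vat_returns: turnover by VAT registration unit; None means not yet returned.
    weights: hypothetical apportionment weights by reporting unit.
    """
    total = sum(v for v in vat_returns.values() if v is not None)
    weight_sum = sum(weights.values())
    return {ru: total * w / weight_sum for ru, w in weights.items()}


# Illustrative weights and turnover (not real data).
weights = {"RU1": 50, "RU2": 30, "RU3": 20}

# Snapshot 1: only 2 of the 3 VAT registration units have returned.
first = apportion({"VAT1": 400.0, "VAT2": 100.0, "VAT3": None}, weights)

# Snapshot 2: the third unit returns for the same reference period.
second = apportion({"VAT1": 400.0, "VAT2": 100.0, "VAT3": 500.0}, weights)

# Every reporting unit is revised, even though only one new return arrived.
revisions = {ru: second[ru] - first[ru] for ru in weights}
print(first, second, revisions)
```

The point of the sketch is that a single late return changes the enterprise total, so all 3 reporting units are revised between snapshots, not only one.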
The data in this article are similar to those presented in the previous article but have been updated using the latest available snapshot of the HM Revenue and Customs (HMRC) VAT turnover dataset (dated 3 January 2017). They provide data for an additional quarter, Quarter 2 (Apr to June) 2016, and include revisions across the full time series of this experimental data.
We have used this data to highlight some of the revisions that have been made since the previous article and elaborated more specifically on the provenance of these revisions. The data issues that we describe in section 4: “Revisions overview and analysis” and section 5.3: “Data revisions” are to be addressed and resolved using the new ONS VAT processing platform and the latest methodology derived for its development. The overall VAT turnover processing issues and methodology are summarised in section 7: “New methodology and data challenges”.
5.1 Manufacturing – SIC section C
Nominal, non-seasonally adjusted (NSA) turnover for manufacturing (UK Standard Industrial Classification 2007 [SIC] Section C: 10 to 33) [note 1], derived from the VAT database, was £142.9 billion in Quarter 2 2016 compared with £140.1 billion in Quarter 2 2015. Table 3 provides a time series of nominal NSA manufacturing turnover from Quarter 1 (Jan to Mar) 2014 [note 2].
Table 3: VAT-derived non-seasonally adjusted current price turnover for UK manufacturing, without adjustment for non-response, £ million
|Period||Q1 2014||Q2 2014||Q3 2014||Q4 2014||Q1 2015||Q2 2015||Q3 2015||Q4 2015||Q1 2016||Q2 2016|
|Feb 2017 article manufacturing data (£m)||137,602||139,101||140,032||145,064||140,337||140,132||139,061||143,184||136,831||142,910|
|Oct 2016 article manufacturing data (£m)||137,569||139,068||140,032||145,061||139,535||139,913||139,194||142,870||137,814||–|
|Source: Experimental Data, Office for National Statistics|
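The size of the revisions between the two articles can be read directly from Table 3. The short sketch below computes the percentage revision for each quarter published in both articles; the figures are copied from the table, while the variable names are illustrative.

```python
quarters = ["Q1 2014", "Q2 2014", "Q3 2014", "Q4 2014", "Q1 2015",
            "Q2 2015", "Q3 2015", "Q4 2015", "Q1 2016", "Q2 2016"]

# Manufacturing turnover in £ million, taken from Table 3.
feb_2017 = [137602, 139101, 140032, 145064, 140337,
            140132, 139061, 143184, 136831, 142910]
oct_2016 = [137569, 139068, 140032, 145061, 139535,
            139913, 139194, 142870, 137814, None]  # Q2 2016 not yet published

# Percentage revision relative to the previously published figure.
revisions = {
    q: (new - old) / old * 100
    for q, new, old in zip(quarters, feb_2017, oct_2016)
    if old is not None
}

for q, r in revisions.items():
    print(f"{q}: {r:+.2f}%")
```

Quarter 2 2016 is excluded because it has no previously published value to compare against.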
5.2 Services industry selection
Data covering six 2-digit SIC industries in the services sectors have also been updated and can be found in the dataset accompanying this article. These industries have been selected to demonstrate a diverse range of industries covered by the VAT turnover data.
5.3 Data revisions
Additional data produced for our research and analysis are presented in the dataset accompanying this article. Comparing this data with that presented in the previous article shows that “air transport” (SIC 51) and “food and beverage service activities” (SIC 56) have experienced modest revisions.
A moderate revision to Quarter 1 2016 in SIC section C is the result of improved cleaning of our dataset, which detected a business classified within “manufacture of other transport equipment” (SIC 30) that was reporting unrealistic monthly turnover figures. The reported turnover was then adjusted to bring this period back into alignment with previous periods, hence causing the negative revision to this industry section.
Further analysis into SIC 93 led to the detection of 2 businesses with reporting errors in 2014. These were then adjusted downward to correct figures, hence the revisions of negative 4.8% and negative 4.3% to Quarter 2 and Quarter 3 (July to Sept) 2014 respectively. These detections provide further examples of data challenges that will be addressed by our new methodology.
The changes are the result of minor adjustments to the data due to general improvements in cleaning and further analysis of the individual industries. More recent snapshots of the VAT dataset from HMRC contain late VAT returns from businesses, which cause small, positive increases to recent quarters for these industries. Our current methods do not estimate for this non-response type missingness and as such an upward revision bias introduced by more recent returns is present. This missingness will be estimated and imputed for using the new methodology developed for use in the new VAT turnover processing platform.
These revisions are similar in magnitude to revisions seen in other short-term output indicators, which are themselves subject to late or revised returns and survey methodology changes. An example of such revisions can be seen in the Revisions to the Index of Production published on the ONS website.
The industry of “accommodation” (SIC 55), although generally unchanged, experienced a large adjustment of negative 4.7% in the first quarter of 2014 compared with the total turnover reported in October 2016. This was the result of improved cleaning and further investigation into this industry, which revealed the presence of a single business that had submitted a “£,000” reporting error. These errors are typical of the data cleaning adjustments that we have built into our current processing system and are being improved within the new VAT turnover processing platform.
In the “advertising and market research” industry (SIC 73), analysis revealed a business that had consistently been reporting a “£,000” error throughout the series. This was then adjusted, which explains the large revisions to this industry.
The “travel agency, tour operator and other reservation service and related activities” industry (SIC 79) saw a large revision due to a processing error, which was increasing the turnover for one business between Quarter 3 2015 and Quarter 1 2016. The additional data available since our publication in October 2016 have enabled our current processing system to identify a “£,000” error and make the necessary correction.
The use of VAT turnover data provides time series information for all industries across the UK. As such, it provides a measure of industries that are not surveyed by our Monthly Business Survey (MBS). An industry in this category of interest is the agriculture industry, SIC 01.
5.4.1 Current agriculture data
Our national accounts make use of data from the Department for Environment, Food and Rural Affairs (DEFRA) and the Annual Business Survey (ABS) to provide economic estimates regarding the agriculture industry.
We currently use short-term indicators (provided by DEFRA) in terms of current price output with reference to 2010 prices. The indicators in use are based on annual, monthly and quarterly estimates of the production of a range of agricultural products:
- monthly – cattle, sheep, pigs, poultry, milk
- quarterly – eggs
- annual – wheat, barley, potatoes, oilseed rape
For further information regarding agriculture in the output approach to estimating gross domestic product, GDP(O), please refer to the GDP source catalogue.
Analysis based on DEFRA’s 2010 Aggregated Annual Accounts revealed that the indicators that we currently use provide a combined coverage of approximately 71.2% of the total current price output. Better coverage of this industry would bring benefits through improved understanding and analysis of the agricultural economy. Three components have been highlighted that could improve coverage of the industry: “fruit” (3.2%), “fresh vegetables” (6.8%) and “plants and flowers” (5.4%).
Furthermore, some indicators employed are of annual periodicity only and are reported on the year-to-date before harvest later in the same year. Whilst these figures from DEFRA are of high quality generally, the periodicity and timeliness dimensions of the indicators reduce the suitability in early estimates of GDP. The annual harvest estimates are a significant part of our short-term output indicators for agriculture, without which coverage of the industry falls from approximately 71.2% to approximately 52.6%.
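The coverage figures quoted above can be combined with simple arithmetic. The sketch below assumes that all percentages are shares of the same 2010 total current price output and that the 3 highlighted components could be covered in full; both are assumptions, so the final figure should be read as an upper bound and not as an estimate.

```python
current_coverage = 71.2          # % of output covered by current indicators
coverage_without_annual = 52.6   # % covered if annual harvest estimates are excluded

# Components highlighted as candidates for improving coverage (% of output).
candidates = {"fruit": 3.2, "fresh vegetables": 6.8, "plants and flowers": 5.4}

# Share of output that depends on the annual harvest indicators.
annual_share = round(current_coverage - coverage_without_annual, 1)

# Combined share of the candidate components.
candidate_share = round(sum(candidates.values()), 1)

# Upper bound on coverage if every candidate component were fully covered.
upper_bound = round(current_coverage + candidate_share, 1)

print(annual_share, candidate_share, upper_bound)
```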
5.4.2 VAT turnover data
The use of monthly VAT turnover to augment the indicators that we already use could potentially improve the coverage and timeliness of data in some components of the industry.
Whereas the agricultural industry is not surveyed in the MBS, it is sampled in the ABS. The ABS is used to benchmark the short-term indicators provided by DEFRA in conjunction with the other indicators previously mentioned. The ABS for SIC 01 [note 3] only covers businesses classified to SIC 01.6 and SIC 01.7 (excluding groups 01.1 to 01.5) that have a VAT registered turnover of £83,000 or more, and also includes businesses that are registered voluntarily.
By comparing businesses that feature in the Inter-Departmental Business Register (IDBR) with those that are registered with HMRC for VAT, we are aware that there are a large number of agriculture businesses (approximately 40%) that are below the legal turnover threshold and therefore are not registered with HMRC to pay VAT. These small unregistered agriculture businesses account for approximately 20% of total turnover within the agricultural industry. Although the reduced coverage for small businesses makes creation of short-term indicators from VAT turnover data more difficult, further analysis should reveal whether data pertaining to parts of the agriculture industry would prove suitable for use within the national accounts.
5.4.3 Data characteristics
One complication to consider when comparing VAT turnover data with DEFRA output estimates is the basis of measurement. VAT turnover data present economic output of a particular industry class whereas the DEFRA data covering the industries of interest are provided on a product basis.
Whilst businesses are classified to industries according to their main activity and product, it is common for larger businesses to produce more than one product. Products are classified according to the Classification of Products by Activity (2008), which is coherent with the SIC, as part of the overall framework of the Statistical classification of economic activities in the European Community (NACE). The subsequent problem is that output indicators from DEFRA on a product basis cannot be directly linked with VAT turnover data on an industry basis. Whilst the difference of measurement basis means that direct comparisons cannot be made between the 2 datasets, their co-measurement of economic output encourages the use of both together, to improve the overall estimation of output by this industry.
5.4.4 Further analysis
We aim to further analyse the agriculture industry in the new ONS VAT processing platform, with particular attention on whether it can provide data of suitable quality and coverage of the SIC 01 groups excluded from the ABS and MBS. We will compare the economic output time series that we derive from VAT turnover data and perform correlation and coherence analyses against the data we derive from our other data sources, including current price output estimates from DEFRA. This procedure will also be applied for SIC 01.6 to 01.7, as we can gather data from the ABS to compare the results to our VAT data in a similar way.
We aim to analyse the VAT turnover data in this industry more carefully before proceeding to evaluate the suitability of making these comparisons with the data from DEFRA. Following our analysis in this area, we will consult our relevant stakeholders and experts to assess the suitability of the use of this data following the initial use of VAT turnover data elsewhere in the national accounts.
In addition to the further investigation of the potential uses of VAT turnover data in the compilation of output estimates for the agriculture industry, VAT data is currently being utilised as part of quality assurance of the current estimates of agriculture gross value added (GVA).
Notes for VAT Turnover data, January 2014 to June 2016:
1. The estimates provided in this article are classified to the 2007 UK Standard Industrial Classification (SIC 2007).
2. As provided in all VAT returns received by 31 December 2016 and delivered to ONS in January 2017.
3. For a detailed breakdown of what is included in Division 01, see section 12.1: ‘SIC(2007): 01 explanatory breakdown’.
The use of VAT turnover data provides us with the opportunity to produce economic output time series for a range of businesses that have formerly been too small to survey effectively on a frequent basis. Additionally, such businesses are subject to a reduced survey selection that is consistent with the Osmotherly guarantee on reducing the burden for survey respondents.
In order to demonstrate this possibility, we have extracted VAT turnover data on the Standard Industrial Classification 2007 (SIC) Division 93, which includes businesses whose major economic activity involves sports, amusement and recreation activities. Our current VAT turnover processing methods allow us to produce experimental data down to the SIC “class” or 4-digit level. We have provided a number of time series visualisations demonstrating this data in this section.
Please note that for a detailed breakdown of what is included in Division 93, see section 12.2 ‘SIC(2007): 93 explanatory breakdown’.
Figure 3 shows the time series for the entirety of Division 93. It follows a strong seasonal path with significant growth from Quarter 1 (Jan to Mar) to Quarter 3 (July to Sept) each year. A trend of growth is evident for the industry across the time series, as highlighted by the line of best fit. Part of the contribution to growth in this industry is due to the increasing number of businesses over time. For example, the number of received VAT returns within a monthly snapshot has increased from 23,294 in January 2015 to 24,066 in January 2016.
Figure 4 demonstrates a class that is dominant within the division and helps drive the seasonal path. This class contains professional sports clubs from sports such as football, rugby and cricket. This class makes up a significant share of turnover for the division (approximately 40 to 50% across the time series based on VAT turnover).
Figure 5 shows the remaining classes within SIC(2007) Division 93, revealing the comparable characteristics of the related industries.
SIC 93.11 – Operation of sports facilities
This class includes businesses that operate sports facilities such as sporting arenas and stadiums. This class is the second most dominant class within the division and follows a seasonal path and a trend of growth similar to the parent division.
SIC 93.13 – Fitness facilities
This class illustrates a relatively stable picture with growth of approximately 11% across the time series. The majority of turnover here is attributed to fitness clubs.
SIC 93.19 – Other sports activities
This class does not follow the divisional quarterly profile, but has displayed strong growth over the period, approximately 50% since Quarter 1 2014. This class is where activities of sports leagues, sports regulation bodies and other related sports ventures are classified.
SIC 93.21 – Activities of amusement parks and theme parks
This is the smallest class within Division 93. It follows a similar quarterly path and trend as the overall division. As one might expect anecdotally, amusement parks and theme parks show a well-defined “summer holiday” effect.
SIC 93.29 – Other amusement and recreation activities
This class also follows a seasonal path similar to that of the division and similar to class 93.21. The types of businesses classified to this SIC class also look to raise the majority of their turnover during holiday and recreation periods.
Following an internal methodological review, consultation with stakeholders, academic associates and international experts, we are creating a comprehensive process for the creation of a VAT turnover-derived dataset that can supplement short-term output indicators currently used within the national accounts. A major economic statistic that we hope to improve with this data is the output measure of gross domestic product, GDP(O).
The new processing platform and the methods employed therein aim to address a number of administrative data challenges, including those which we have mentioned in the previous article. Visualisations are being created as an aid to interpretation of the data across the many combinations of variables available. The progress of our understanding and approach to each of the data challenges is presented in this section.
7.1 Estimation for incompleteness (non-response)
As part of our methodological review of VAT processing, we have selected a number of methods for testing with the VAT turnover data, including expansion estimation (note 1), ratio estimation (note 2) and winsorisation of outliers (note 3). These methods are to be tested using the full VAT dataset in order to discover any biases that each technique might introduce to our output statistics and to find the impact on turnover reported for each Standard Industrial Classification (SIC) industry.
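To make the first two estimators concrete, the following is a minimal sketch for a single stratum. It assumes simple random sampling within the stratum and uses employment as the auxiliary variable; the function names, figures and choice of auxiliary variable are illustrative assumptions, not the methods specified for the new platform.

```python
def expansion_estimate(sample_turnover, population_size):
    """Expansion estimator: weight each received return by N / n."""
    n = len(sample_turnover)
    weight = population_size / n  # equal weight for every unit in the stratum
    return weight * sum(sample_turnover)


def ratio_estimate(sample_turnover, sample_aux, population_aux_total):
    """Ratio estimator: scale a known auxiliary total by the sample ratio."""
    ratio = sum(sample_turnover) / sum(sample_aux)
    return ratio * population_aux_total


# Hypothetical stratum: 4 returns received out of 10 registered businesses.
turnover = [120.0, 80.0, 200.0, 100.0]  # reported turnover (GBP thousand)
employment = [6, 4, 10, 5]              # auxiliary variable for the same units
population_employment = 60              # assumed known for all 10 businesses

expansion_total = expansion_estimate(turnover, population_size=10)
ratio_total = ratio_estimate(turnover, employment, population_employment)
print(expansion_total, ratio_total)
```

The two estimates differ because the ratio estimator uses the auxiliary total for the whole stratum, whereas the expansion estimator relies only on the count of businesses.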
7.2 Cleaning and validation
The processing methods used to date to better understand the VAT turnover data have relied on strictly logical rules for identifying and correcting errors, such as magnitude-of-entry errors (that is, the “thousands of pounds” rule). Our research team then performs further rigorous analysis, using anomaly detection techniques to discover suspicious values that automated cleaning has missed. With such a large dataset to process, this method is resource-inefficient, is subject to human error and may introduce bias to the results.
The new processing methodology improves the overall cleaning stage and reduces the need for manual intervention by our research team. It applies more comprehensive cleaning rules, derived from our research on the VAT dataset to date and from our VAT-specific methodology review and recommendation process.
The cleaning of the VAT data is performed at multiple stages within our new methods process. The raw data delivered by HMRC are cleaned using the available information pertaining to each return:
- duplicate returns are identified:
  - using information about what the “return types” variables tell us and how we use this information
- invalid returns are cleaned:
  - HMRC-applied error codes are used to remove returns that cannot be used and are to be imputed for
- suspicious values:
  - are cleaned automatically for common form entry errors (for example, “£,000”)
  - are flagged for further investigation by process analysts
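The cleaning stages listed above can be sketched as a single pass over the raw returns. This is an illustration only: the record fields (`vat_ref`, `period`, `turnover`, `error_code`), the comparison against a previous-period value and every threshold are assumptions made for the example, not the rules used in the platform.

```python
def clean_returns(returns, previous_turnover):
    """Return (cleaned, flagged) from a list of raw return records.

    Each record is a dict with hypothetical keys:
    'vat_ref', 'period', 'turnover' and 'error_code'.
    """
    seen = set()
    cleaned, flagged = [], []
    for r in returns:
        key = (r["vat_ref"], r["period"])
        if key in seen:                  # duplicate return: keep the first only
            continue
        seen.add(key)
        if r["error_code"] is not None:  # unusable return: leave for imputation
            continue
        value = r["turnover"]
        prev = previous_turnover.get(r["vat_ref"])
        if prev and value:
            ratio = value / prev
            if 650 <= ratio <= 1350:             # about 1,000 times too large
                value = value / 1000
            elif 1 / 1350 <= ratio <= 1 / 650:   # about 1,000 times too small
                value = value * 1000
            elif ratio > 10 or ratio < 0.1:      # suspicious: pass to an analyst
                flagged.append(key)
        cleaned.append({**r, "turnover": value})
    return cleaned, flagged


raw = [
    {"vat_ref": "A", "period": "2016Q2", "turnover": 50_000.0, "error_code": None},
    {"vat_ref": "A", "period": "2016Q2", "turnover": 50_000.0, "error_code": None},
    {"vat_ref": "B", "period": "2016Q2", "turnover": 40.0, "error_code": None},
    {"vat_ref": "C", "period": "2016Q2", "turnover": 10_000.0, "error_code": "E1"},
    {"vat_ref": "D", "period": "2016Q2", "turnover": 900_000.0, "error_code": None},
]
previous = {"A": 48_000.0, "B": 41_000.0, "C": 9_000.0, "D": 30_000.0}

cleaned, flagged = clean_returns(raw, previous)
```

In this example the duplicate for business A and the error-coded return for C are dropped, the return for B is rescaled by a factor of 1,000, and the return for D is kept but flagged for investigation.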
7.3 Apportionment
We have previously described how the apportionment of turnover to reporting unit level is currently undertaken using business register employment. As we develop the new ONS VAT processing platform, we will test a number of apportionment methods to find the most effective and accurate way of apportioning turnover from VAT unit to enterprise unit and from enterprise unit to reporting unit.
Our primary options for testing in the apportionment module of the new platform include:
- comparison of “apportionment by business register employment” and “apportionment by business register turnover-per-employee”
- weighted apportionment across industries according to weights derived from VAT turnover-per-employee of simple business structures
- more detailed weighting information derived from turnover-per-head dependent on reporting unit size (by employment)
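The difference between the first two options above can be sketched for a single enterprise with two reporting units. The reporting units, SIC codes and turnover-per-employee figures are hypothetical values chosen for the example, and the weighting shown is only one plausible reading of “weighted apportionment by industry”.

```python
def apportion(total_turnover, weights):
    """Share total_turnover across units in proportion to their weights."""
    total_weight = sum(weights.values())
    return {unit: total_turnover * w / total_weight for unit, w in weights.items()}


# Hypothetical enterprise with two reporting units in different industries.
reporting_units = {
    "RU1": {"employment": 30, "sic": "47"},
    "RU2": {"employment": 10, "sic": "70.1"},
}
# Assumed industry turnover-per-employee, for example derived from
# businesses with simple structures.
turnover_per_employee = {"47": 100.0, "70.1": 300.0}

enterprise_turnover = 1200.0

# Option 1: weights are business register employment alone.
by_employment = apportion(
    enterprise_turnover,
    {u: d["employment"] for u, d in reporting_units.items()},
)

# Option 2: weights are employment scaled by industry turnover-per-employee.
by_weighted = apportion(
    enterprise_turnover,
    {u: d["employment"] * turnover_per_employee[d["sic"]]
     for u, d in reporting_units.items()},
)
print(by_employment, by_weighted)
```

Under employment-only apportionment the larger unit receives three-quarters of the turnover; once the industry weights are applied, the two units receive equal shares.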
Testing these different methods will help us meet the challenges mentioned in our previous article regarding industries that often contain large and complex businesses, whose turnover accounts for a large proportion of their respective industries. Industries where we have noted difficulty in proper apportionment when using the methods developed to date are “construction” (SIC 41 to 43), “wholesale” (SIC 46), “retail” (SIC 47), “real estate” (SIC 68), “head offices” (SIC 70.1) and “business administration” (SIC 82). A methodological approach now undergoing analysis, “weighted apportionment by industry”, will take into account the differing turnover-per-head seen across all industries and should address the difficulties with these highlighted industries.
An industry group that we have previously highlighted as being distinct when analysing the turnover-per-head methodology is the financial services industries group (SIC 64 to 66). Financial services are notable for higher turnover values per head of employment than other industries. Although this might suggest a more labour-productive industry, the assumption does not hold when gross value added by the financial services industries is compared. Establishing an effective means of apportioning turnover to this industry group will prove most beneficial in the overall allocation of VAT turnover within the economy. We are currently developing these methods and analysing their results, which we will communicate in the next article.
7.4 Calendarisation
The method employed to date has been a simple frequency conversion of annual and quarterly series to monthly series. This interpolation method is well established for creating higher frequency data but does not take account of any seasonality that may be present in a particular industry.
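As an illustration of why such a conversion ignores seasonality, the sketch below assumes that the simple conversion divides each return equally across the months it covers. The function name and figures are hypothetical.

```python
def spread_evenly(total, first_month, n_months):
    """Split a return's turnover equally over the months it covers.

    first_month is a (year, month) tuple for the start of the period.
    """
    year, month = first_month
    share = total / n_months
    monthly = {}
    for i in range(n_months):
        offset = month - 1 + i
        monthly[(year + offset // 12, offset % 12 + 1)] = share
    return monthly


# A quarterly return covering April to June 2016.
quarterly = spread_evenly(90.0, (2016, 4), 3)

# An annual return covering November 2015 to October 2016.
annual = spread_evenly(120.0, (2015, 11), 12)
print(quarterly)
```

Every month within a return receives the same value, so any within-period seasonal pattern, such as a summer peak, is flattened.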
The characteristics of VAT returns make for a rich dataset of mixed frequencies, giving us different options for creating monthly series from annual and quarterly series. The current proportions of VAT returns at annual, quarterly and monthly reference frequencies are approximately 1%, 89% and 10% respectively (by number of returns).
Each of the non-monthly returns is assigned a regular stagger by HMRC to ensure that each business is reporting for regular periods and returns are received at even spreads across a year or quarter, easing the administrative burden faced by HMRC when processing the returns.
This staggering of the reported periods is one of the challenges introduced by the use of administrative data as opposed to survey data where such factors can be controlled. We are using a number of techniques to analyse the impact that this staggered reporting has on output estimates derived from the VAT turnover data.
Because the staggers assigned to returns of each frequency spread receipts evenly for HMRC, the proportion of returns available at a given interval after the reference period remains consistent for any particular month, as discussed in section 4: ‘Revisions overview and analysis’.
The data presented in this article is up to the period Quarter 2 (Apr to June) 2016. At this point in time we have made use of a January 2017 (T plus 7 months) delivery of VAT turnover data, making for a relatively mature dataset. More timely output estimates require methodologically rigorous treatment to ensure the reliability and accuracy of data processing across the different frequencies of returns.
Appropriate calendarisation methods have been established in our methodological review of all the options available to us. We intend to test a range of methods for interpolation, extrapolation and imputation of data at sequential stages of processing.
7.5 Matching
A particular challenge, with a potentially high impact on the output statistics we can derive from VAT turnover data, is the matching of VAT registration units to the enterprise units on the Inter-Departmental Business Register (IDBR).
A new set of matching algorithms is currently being developed and tested within the new ONS VAT processing platform. These will address the issues previously found when matching, specifically the resolution of non-matches, where VAT registration units cannot be matched with the IDBR.
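The matching algorithms themselves are not described here, so the following is only a sketch of the general shape of the problem: an exact lookup of a normalised VAT reference against a register extract, with non-matches set aside for resolution. The lookup table, reference formats and enterprise identifiers are all hypothetical.

```python
def match_to_enterprises(vat_refs, register_lookup):
    """Split VAT references into matched units and non-matches.

    register_lookup maps a normalised VAT reference to an enterprise
    identifier (a hypothetical extract from the business register).
    """
    matched, non_matches = {}, []
    for ref in vat_refs:
        key = ref.replace(" ", "")  # normalise spacing before the lookup
        enterprise = register_lookup.get(key)
        if enterprise is None:
            non_matches.append(ref)  # needs resolving by another method
        else:
            matched[ref] = enterprise
    return matched, non_matches


register = {"123456782": "ENT001", "987654321": "ENT002"}
vat_refs = ["123 4567 82", "987654321", "555000111"]

matched, non_matches = match_to_enterprises(vat_refs, register)
print(matched, non_matches)
```

Keeping the non-matches as a separate output makes it possible to measure how much turnover is affected before deciding how they should be resolved.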
7.6 Seasonal adjustment
Seasonal adjustment methods will be developed for VAT turnover upon implementation in the national accounts. Seasonal patterns can affect the cleaning, validation, apportionment and calendarisation methods. Analysing and understanding these patterns will therefore help ensure that our methods for analysing and processing the data take them into account.
7.7 Turnover and gross value-added
You should be aware of the significant conceptual differences between turnover, output and gross value added (the sum of outputs minus the sum of inputs). Although turnover can be a very good proxy for output in many industries, this is not a universal relationship. In particular, turnover in the financial sector is a poor proxy for output and value added.
The Index of Production and Index of Services assign a weight, as a proportion of gross value added, to each industry – it does not necessarily hold that the higher the turnover, the higher the gross value added weight. This is most easily understood when considering that, in our early research, the financial services industries (SIC 64 to 66) accounted for an estimated 38% of VAT turnover in 2014, whereas they accounted for only 7.6% of value added in 2013 (the latest year for which value-added weights are available).
Notes for New methodology and data challenges:
1. Expansion estimation produces population estimates by calculating a weighted sum of sampled returns and using this weighting to proportionally uplift the sample’s contribution to a value consistent with the rest of the population.
2. Ratio estimation is a technique that uses available auxiliary information which is correlated with the variable of interest.
3. Winsorisation limits the impact of extreme values within a dataset by revising their values to a less extreme value threshold that is proportional to the distribution of the dataset.
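As a minimal illustration of winsorisation, the sketch below caps values above a threshold set as a multiple of the median. Both the one-sided form and the threshold rule are assumptions made for the example; the thresholds to be used with VAT turnover are not specified here.

```python
import statistics


def winsorise(values, multiplier=5.0):
    """Cap values above a threshold tied to the distribution's median.

    The threshold (multiplier x median) is a hypothetical choice.
    """
    threshold = multiplier * statistics.median(values)
    return [min(v, threshold) for v in values]


turnover = [10.0, 12.0, 9.0, 11.0, 400.0]
capped = winsorise(turnover)
print(capped)
```

The outlying value is pulled back to the threshold rather than removed, so the business still contributes to the estimate but with reduced influence.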
We have recently met with the Economic Experts Working Group to consult them on our methods and on the use of VAT turnover in the national accounts. We will continue to work with the ONS economic experts and ONS fellows, in coordination with the Economic Statistics Centre of Excellence.
We are currently engaging with experts from other National Statistics Institutes, such as those of the Netherlands and Sweden, to share their knowledge of the implementation of similar consumption tax-based administrative datasets into their own official economic statistics.
The inaugural International Economic Statistics Conference – “Economic Statistics in a Digital Age: meeting the challenges of an evolving, modern economy” is being held in Newport, Wales, UK on 21 and 22 February 2017. We are presenting our progress on VAT turnover implementation and discussing the use of administrative data with representatives from other National Statistics Institutes.
The next article will be published in May 2017 and will include an update on the latest progress of our new ONS VAT processing platform and provide initial results of analysis of the VAT turnover dataset within this new platform.
Contact details for this article
Telephone: +44 (0)1633 456578