Vacancies data processing

The data processing pipeline has three stages, each written in Python:

Data sources

The sources for this release are:

These datasets are taken from our own Economic Data Dashboard (EDD) repository, created and managed by Christian Spence, which automatically extracts labour market statistics as they are released and processes them into a single CSV extract.

VACS01 processing

For the VACS01 figures, the transform script extracts the following variable from the EDD extract:

We then save one CSV of estimated vacancies by quarter and another by rolling 3-month period for visualisation on the vacancies dashboard.
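As a rough illustration of this step, the sketch below filters a small made-up extract for one variable and produces the rolling and quarterly outputs. The column names (`variable`, `period`, `value`), the variable name `example_variable`, the period label format and the rule that a quarter is a rolling period ending in March, June, September or December are all assumptions for the example, not details of the actual transform script.

```python
import csv
import io

# Made-up sample data; the real EDD extract's columns and values may differ.
SAMPLE_EDD = """variable,period,value
example_variable,Jan-Mar 2023,10
example_variable,Feb-Apr 2023,11
example_variable,Mar-May 2023,12
example_variable,Apr-Jun 2023,13
other_variable,Apr-Jun 2023,99
"""

# Assumption: a "quarter" is a rolling period that ends in one of these months.
QUARTER_END_MONTHS = {"Mar", "Jun", "Sep", "Dec"}


def is_quarter(period):
    """Return True for labels such as 'Jan-Mar 2023' that end on a quarter month."""
    months = period.split(" ")[0]          # 'Jan-Mar'
    return months.split("-")[1] in QUARTER_END_MONTHS


def to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = list(csv.DictReader(io.StringIO(SAMPLE_EDD)))

# Keep only the variable of interest, with the columns the dashboard needs.
rolling = [
    {"period": r["period"], "value": r["value"]}
    for r in rows
    if r["variable"] == "example_variable"
]
quarterly = [r for r in rolling if is_quarter(r["period"])]

rolling_csv = to_csv(rolling, ["period", "value"])
quarterly_csv = to_csv(quarterly, ["period", "value"])
```

In a real script the two strings would be written to files rather than held in memory; they are kept as strings here so the example runs without touching the file system.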

VACS02 processing

VACS02 processing is similar, with the following alterations:

From the EDD extract, we query the following measures for each sector:

Quarterly values are extracted for these measures and saved as a CSV file of all vacancies by sector.
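One way to picture the by-sector output is as a wide table with one row per period and one column per sector. The sketch below builds such a table from made-up data. The measure codes, sector labels, column names and the `MEASURE_TO_SECTOR` lookup are hypothetical, and the wide layout itself is an assumption about how the CSV might be organised.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; measure codes and column names are hypothetical.
SAMPLE_EDD = """measure,period,value
measure_a,Jan-Mar 2023,40
measure_b,Jan-Mar 2023,25
measure_a,Apr-Jun 2023,38
measure_b,Apr-Jun 2023,27
measure_c,Apr-Jun 2023,99
"""

# Hypothetical lookup from measure code to sector label.
MEASURE_TO_SECTOR = {"measure_a": "Sector A", "measure_b": "Sector B"}


def pivot_by_sector(rows, measure_to_sector):
    """Group values as period -> {sector: value}, skipping unlisted measures."""
    table = defaultdict(dict)
    for row in rows:
        sector = measure_to_sector.get(row["measure"])
        if sector is None:
            continue
        table[row["period"]][sector] = row["value"]
    return table


def write_wide_csv(table, sectors):
    """Serialise the pivoted table with one column per sector."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["period", *sectors], restval="", lineterminator="\n"
    )
    writer.writeheader()
    for period, values in table.items():
        writer.writerow({"period": period, **values})
    return buf.getvalue()


rows = list(csv.DictReader(io.StringIO(SAMPLE_EDD)))
table = pivot_by_sector(rows, MEASURE_TO_SECTOR)
sector_csv = write_wide_csv(table, list(MEASURE_TO_SECTOR.values()))
```

Periods appear in the order they are first seen in the extract, and a sector with no value for a period is written as an empty cell.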

In the Prepare script, these sectors are broken down further and saved as separate CSV files that are used to power the visualisations on the vacancies dashboard.
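A minimal sketch of how an all-sectors table could be split into one CSV per grouping is shown here. The group names, sector labels and file names are placeholders invented for the example, and the real Prepare script may organise its outputs differently.

```python
import csv
import io

# Made-up all-sectors table; sector labels are placeholders.
SECTOR_CSV = """period,Sector A,Sector B,Sector C
Jan-Mar 2023,40,25,12
Apr-Jun 2023,38,27,15
"""

# Hypothetical groupings; a sector may belong to more than one group.
GROUPS = {
    "key_sectors": ["Sector A", "Sector B"],
    "targeted_sectors": ["Sector B", "Sector C"],
}


def split_by_group(rows, groups):
    """Return {file name: CSV text}, keeping only each group's sector columns."""
    outputs = {}
    for name, sectors in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf,
            fieldnames=["period", *sectors],
            extrasaction="ignore",  # silently drop columns outside the group
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
        outputs[f"{name}.csv"] = buf.getvalue()
    return outputs


rows = list(csv.DictReader(io.StringIO(SECTOR_CSV)))
group_files = split_by_group(rows, GROUPS)
```

Keeping the groupings in a single dictionary means a sector can be added to or removed from a group without changing the splitting logic.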

The sector groupings are below:

All sectors (as above)

Key sectors for young people (where young people are most likely to work):

Sectors targeted by young people (where young people want to work):

Headline statistics

The above datasets are summarised by the following methods.