QLFS data processing

The data processing pipeline has three stages, each written in Python.

Data sources

The source for this data is the Labour Market Survey. The data is pulled from NOMIS via the LMS extract in the YFF Data Pipelines repository.

We extract the data of interest to a monthly rolling file.

Employment status processing

For the A06 figures, the transform script extracts the following measures from the 'People' sheet (i.e. not split by Gender)

We then select every third period, starting from the most recent row, to avoid overlapping quarters. We also convert each quarter from a string to a date representing the start of the quarter (e.g. Jan-Mar 2023 is converted to a datetime object for 1 January 2023).
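The selection and date conversion described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the transform script itself. The function names `quarter_start` and `every_third_from_latest` are hypothetical, the rows are assumed to be ordered oldest first, and the handling of quarters that cross a year boundary is an assumption about the label format.

```python
from datetime import datetime

# Month abbreviations as they appear in labels like "Jan-Mar 2023".
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def quarter_start(label: str) -> datetime:
    """Return the first day of the quarter named by a label like 'Jan-Mar 2023'."""
    span, year_text = label.split()
    first, last = (MONTHS[name] for name in span.split("-"))
    year = int(year_text)
    # Assumption: for a quarter that crosses a year boundary
    # (e.g. "Nov-Jan 2023"), the year in the label belongs to the final month.
    if first > last:
        year -= 1
    return datetime(year, first, 1)


def every_third_from_latest(rows: list) -> list:
    """Keep every third row, counting back from the most recent one.

    Assumes rows are ordered oldest first; the result keeps that order.
    """
    return rows[::-1][::3][::-1]


# Illustrative labels only: four overlapping rolling quarters, oldest first.
labels = ["Oct-Dec 2022", "Nov-Jan 2023", "Dec-Feb 2023", "Jan-Mar 2023"]
selected = every_third_from_latest(labels)
starts = [quarter_start(label) for label in selected]
```

Counting back from the most recent row, rather than forward from the oldest, means the latest published quarter is always included regardless of how many rows the rolling file holds.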

Finally, we save a CSV of unemployment by education status for further processing.

Long-term unemployment status processing

This processing is similar, with the following alterations:

We extract the levels of unemployment and of unemployment over 12 months, as well as the rate, for both the 16-17 and 18-24 age groups.

We convert the quarters as described above, and then sum the total unemployment and over-12-months levels across the two age ranges to produce aggregated figures for ages 16-24. We then calculate the resulting rate by simple division of the aggregated levels.
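As a rough sketch of the aggregation, the levels for the two age ranges are summed first and the rate is derived from the sums. The field names and figures below are hypothetical, and the code assumes the rate is the over-12-months level divided by the total unemployment level.

```python
from typing import Optional


def aggregate_16_24(row: dict) -> dict:
    """Combine the 16-17 and 18-24 levels into a single 16-24 figure."""
    total = row["unemployed_16_17"] + row["unemployed_18_24"]
    over_12m = row["over_12m_16_17"] + row["over_12m_18_24"]
    # Assumption: rate = long-term unemployed / all unemployed.
    # Guard against an empty denominator rather than raising.
    rate: Optional[float] = over_12m / total if total else None
    return {
        "unemployed_16_24": total,
        "over_12m_16_24": over_12m,
        "over_12m_rate_16_24": rate,
    }


# Made-up figures for illustration only, not survey data.
example = {
    "unemployed_16_17": 100,
    "over_12m_16_17": 10,
    "unemployed_18_24": 400,
    "over_12m_18_24": 90,
}
combined = aggregate_16_24(example)
```

The rate is recomputed from the summed levels because averaging the two published rates would weight both age groups equally even though their sizes differ. With the made-up figures above, the group rates are 10% and 22.5%, while the combined rate is 100 / 500 = 20%.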

Finally, we save the last three years of data to a CSV of long-term unemployment data, as before.
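A minimal sketch of the final filtering and export step is shown below. It assumes that "the last three years" is measured back from the most recent quarter present in the data, and the column names and helper functions are hypothetical.

```python
import csv
import io
from datetime import datetime


def last_three_years(rows: list) -> list:
    """Keep rows whose quarter starts within three years of the latest quarter."""
    latest = max(row["quarter_start"] for row in rows)
    # Quarter starts fall on the first of a month, so replacing the year is safe.
    cutoff = latest.replace(year=latest.year - 3)
    return [row for row in rows if row["quarter_start"] > cutoff]


def write_csv(rows: list, handle) -> None:
    """Write the rows to an open text handle with ISO-formatted dates."""
    writer = csv.DictWriter(handle, fieldnames=["quarter_start", "over_12m_rate"])
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "quarter_start": row["quarter_start"].date().isoformat(),
            "over_12m_rate": row["over_12m_rate"],
        })


# Placeholder quarterly rows from Jan 2019 to Jan 2023 (values are not real data).
rows = [
    {"quarter_start": datetime(year, month, 1), "over_12m_rate": 0.2}
    for year in range(2019, 2023)
    for month in (1, 4, 7, 10)
]
rows.append({"quarter_start": datetime(2023, 1, 1), "over_12m_rate": 0.2})

recent = last_three_years(rows)
buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_csv(recent, buffer)
```

With non-overlapping quarterly rows, this cutoff retains twelve quarters, ending at the most recent one.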