NEET data processing

The data processing pipeline has three stages, each written in Python:

extract: where we get a copy of the file from an appropriate source
transform: where we convert it into a simpler form by selecting rows, filtering columns and converting formats to suit our needs
prepare: where we build files which will directly drive our visualisations. These may be summarised or transposed data, or in a completely different format (e.g. JSON).
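
As a rough illustration of the transform and prepare stages described above, here is a minimal sketch using only the standard library. The function names, the column names (`period`, `value`, `note`) and the sample rows are hypothetical placeholders, not the project's actual code or data. The extract stage is left out because it depends on the source.

```python
import json


def transform(rows, keep_columns, keep_row):
    """Select rows with keep_row, then keep only the named columns."""
    return [
        {col: row[col] for col in keep_columns}
        for row in rows
        if keep_row(row)
    ]


def prepare(rows, key, value):
    """Reshape rows into a JSON object keyed by one column,
    ready to be loaded by a visualisation."""
    return json.dumps({row[key]: row[value] for row in rows}, sort_keys=True)


if __name__ == "__main__":
    # Placeholder rows, not real figures.
    raw = [
        {"period": "A", "value": 1, "note": "x"},
        {"period": "B", "value": None, "note": "y"},
    ]
    tidy = transform(raw, ["period", "value"], lambda r: r["value"] is not None)
    print(prepare(tidy, "period", "value"))
```

Keeping each stage as a separate function that takes and returns plain data means a stage can be rerun or tested without repeating the download.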

Data sources

The sources for this are:

The extract scripts find the latest Excel link on each of these pages and download the linked workbook for transformation.
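
One way such an extract step could locate and fetch the workbook is sketched below. This is an assumption-laden illustration, not the project's script: `ExcelLinkParser`, `find_excel_links` and `download` are hypothetical names, the sketch assumes the page exposes the workbook as an ordinary `<a href>` ending in `.xls` or `.xlsx`, and it does not decide which link is the latest, since that depends on how each page is laid out.

```python
import shutil
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ExcelLinkParser(HTMLParser):
    """Collect href values that point at Excel workbooks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith((".xls", ".xlsx")):
            self.links.append(href)


def find_excel_links(html, base_url):
    """Return absolute URLs of Excel links, in document order."""
    parser = ExcelLinkParser()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]


def download(url, path):
    """Stream the workbook at url to a local file."""
    with urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

A caller would fetch the page HTML, pass it to `find_excel_links`, choose the appropriate entry (for example the first one, if the page lists the newest release first), and hand that URL to `download`.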

We extract the following figures for 16-24 year olds:

This processing is repeated for the People - SA, Men - SA and Women - SA sheets.

In each case we

We then combine the three data sets into a single file and save it as a NEET CSV file. Alongside it we create a metadata file.
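
The combining step could look something like the following sketch. It assumes each sheet has already been reduced to a list of row dictionaries with the same keys; the `sheet` column, the `period` and `value` column names, and the metadata fields are all hypothetical, since the actual layout of the CSV and metadata files is not specified here.

```python
import csv
import io
import json


def combine(datasets):
    """Stack the rows from every sheet, tagging each row with its sheet name."""
    combined = []
    for sheet_name, rows in datasets.items():
        for row in rows:
            combined.append({"sheet": sheet_name, **row})
    return combined


def write_csv(rows, handle):
    """Write the combined rows to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def build_metadata(rows, datasets):
    """Describe the combined file (hypothetical fields)."""
    return {"sheets": list(datasets), "row_count": len(rows)}


if __name__ == "__main__":
    # Placeholder rows, not real figures.
    datasets = {
        "People - SA": [{"period": "P1", "value": 1}],
        "Men - SA": [{"period": "P1", "value": 2}],
        "Women - SA": [{"period": "P1", "value": 3}],
    }
    rows = combine(datasets)
    buffer = io.StringIO()
    write_csv(rows, buffer)
    print(buffer.getvalue())
    print(json.dumps(build_metadata(rows, datasets), indent=2))
```

Tagging each row with its sheet name keeps the three series distinguishable once they share a single file.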