CPI data processing

The data processing pipeline has four stages, each written in Python:

Data sources

The source for this is:

The extract scripts find the latest Excel link on these pages and download the file for transformation. However, we are increasingly moving to EDD to source the datasets (since they have already been transformed into a consistent format) and to a DVC remote URL to download them. In the CPI pipeline, the extract script is no longer used and will be removed in the near future.

CPI processing

For the CPI figures, the transform script extracts the following measures:

Here, percentage change shows by how much the consumer price index has increased relative to its value in the previous month, quarter and year. To calculate each percentage change we take the most recent month's index value (the last line in the file), subtract the value from the previous month, quarter or year, divide by that same earlier value, and multiply by 100. As an equation this can be written as ((current - previous) / previous) * 100. The result is rounded to one decimal place.
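The calculation above can be sketched in Python as follows. This is a minimal illustration, not the transform script itself: the function name, the sample index values, and the row offsets of 1, 3 and 12 (which assume one row per month, oldest first) are all hypothetical.

```python
def percentage_change(current: float, previous: float) -> float:
    """Return the percentage change from previous to current, rounded to 1 decimal place."""
    return round((current - previous) / previous * 100, 1)


# Hypothetical monthly index values, oldest first; the last entry is the latest month.
index_values = [120.0, 120.5, 121.0, 121.4, 122.0, 122.3, 122.9,
                123.5, 124.0, 124.6, 125.0, 125.8, 126.0]

# Assumed offsets: how many rows back to look for each comparison period.
offsets = {"month": 1, "quarter": 3, "year": 12}

latest = index_values[-1]
changes = {
    period: percentage_change(latest, index_values[-1 - back])
    for period, back in offsets.items()
}
```

With these sample values, the year-on-year figure compares 126.0 with 120.0, giving a change of 5.0.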

Separately, we compare these figures with the same set of values from the previous update (usually last month) to determine whether each has increased or decreased since the last update.
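One way to express this comparison is sketched below. The function name, the labels "increased", "decreased" and "unchanged", and the sample figures are assumptions made for illustration; the pipeline may record the direction differently.

```python
def direction(latest: float, previous: float) -> str:
    """Describe how a figure moved between the previous update and the latest one."""
    if latest > previous:
        return "increased"
    if latest < previous:
        return "decreased"
    return "unchanged"


# Hypothetical percentage changes from this update and from the previous update.
this_update = {"month": 0.2, "quarter": 1.1, "year": 5.0}
last_update = {"month": 0.4, "quarter": 1.1, "year": 4.6}

directions = {
    period: direction(this_update[period], last_update[period])
    for period in this_update
}
```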

We explicitly show the sign to indicate whether the percentage change is positive or negative. A positive percentage change means the cost of goods is rising, and a negative change means the cost of goods is falling.
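In Python, an explicit sign can be produced with the `+` flag of the format specification. The sketch below assumes the value is displayed with one decimal place and a trailing percent symbol; the function name is hypothetical.

```python
def format_change(value: float) -> str:
    """Format a percentage change with an explicit sign and one decimal place."""
    return f"{value:+.1f}%"


rising = format_change(0.2)     # "+0.2%"
falling = format_change(-1.14)  # "-1.1%"
```

Note that with this format a change of exactly zero is shown as "+0.0%", so a real implementation may want to handle that case separately.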

The youth-focused average is the mean of the index values for the four categories that most affect young people: Food and Non-Alcoholic Beverages (D7BU), Housing and fuels (D7BX), Transport (D7C2), and Recreation and Culture (D7C4). These are the top four categories on which people under 30 spend their money each week, calculated from the ONS dataset Family spending workbook 1: detailed expenditure and trends.
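As a rough sketch, the youth-focused average for a single month could be computed as below. The series codes come from the text above; the function name and the index values are hypothetical, and the mean is assumed to be unweighted.

```python
# Series codes for the four categories used in the youth-focused average.
YOUTH_SERIES = ("D7BU", "D7BX", "D7C2", "D7C4")


def youth_average(index_by_series: dict[str, float]) -> float:
    """Return the unweighted mean of the index values for the four youth categories."""
    values = [index_by_series[code] for code in YOUTH_SERIES]
    return sum(values) / len(values)


# Hypothetical index values for one month, keyed by series code.
latest_month = {"D7BU": 130.0, "D7BX": 128.0, "D7C2": 124.0, "D7C4": 118.0}
average = youth_average(latest_month)
```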

For the line chart, we take monthly data for the 10 years up to the most recent release date.
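The 10-year window could be selected along the following lines. This sketch assumes each row is a (date, value) pair dated on the first of the month and sorted oldest first, and that the window excludes the month exactly 10 years before the latest one, giving 120 monthly points. The function name and sample data are hypothetical.

```python
from datetime import date


def last_ten_years(rows: list[tuple[date, float]]) -> list[tuple[date, float]]:
    """Keep rows dated within the 10 years up to and including the latest row."""
    latest = rows[-1][0]
    # Safe for first-of-month dates; a 29 February date would need extra handling.
    cutoff = latest.replace(year=latest.year - 10)
    return [(d, v) for d, v in rows if d > cutoff]


# Hypothetical monthly series from January 2010 to June 2024.
rows = [
    (date(year, month, 1), 100.0 + 0.1 * i)
    for i, (year, month) in enumerate(
        (y, m) for y in range(2010, 2025) for m in range(1, 13) if (y, m) <= (2024, 6)
    )
]

chart_rows = last_ten_years(rows)
```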

Finally, we save CSV files to drive our visualisations.