CPI data processing
The data processing pipeline has four stages, each written in Python:
- extract: where we get a copy of the file from an appropriate source
- transform: where we convert it into a simpler form by selecting rows and filtering columns, and transforming formats to meet what we need
- prepare: where we build files which will directly drive our visualisations. These may be summarised or transposed data, or in a completely different format (e.g. JSON).
- vis: where we update the axis labels and minimum/maximum values to fit the data nicely.
Data sources
The source for this is:
The extract scripts find the latest Excel link on these pages and download the file for transformation. However, we are increasingly moving to EDD as the source for datasets (since they have already been transformed into a consistent format), using a DVC remote URL to download them. In the CPI pipeline, the extract script is no longer used and will be removed in the near future.
CPI processing
For the CPI figures, the transform script extracts the following measures:
- D7BT: All CPI Categories
- D7BU: Food and non-alcoholic beverages
- D7BV: Alcoholic beverages and tobacco
- D7BW: Clothing and footwear
- D7BX: Housing, water, electricity, gas and other fuels
- D7BY: Furniture, household equipment and maintenance
- D7BZ: Health
- D7C2: Transport
- D7C3: Communication
- D7C4: Recreation and culture
- D7C5: Education
- D7C6: Restaurants and hotels
- D7C7: Miscellaneous goods and services
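The column selection can be sketched roughly as follows. This is only an illustration, not the pipeline's actual transform script: it assumes the input is CSV text with a `date` column plus one column per series code, and the names `CPI_MEASURES` and `select_measures` are hypothetical.

```python
import csv
import io

# Series codes and labels as listed above.
CPI_MEASURES = {
    "D7BT": "All CPI Categories",
    "D7BU": "Food and non-alcoholic beverages",
    "D7BV": "Alcoholic beverages and tobacco",
    "D7BW": "Clothing and footwear",
    "D7BX": "Housing, water, electricity, gas and other fuels",
    "D7BY": "Furniture, household equipment and maintenance",
    "D7BZ": "Health",
    "D7C2": "Transport",
    "D7C3": "Communication",
    "D7C4": "Recreation and culture",
    "D7C5": "Education",
    "D7C6": "Restaurants and hotels",
    "D7C7": "Miscellaneous goods and services",
}


def select_measures(csv_text, date_column="date"):
    """Keep only the date column and the CPI measure columns.

    The column layout (a date column plus one column per series code)
    is an assumption made for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        kept = {date_column: row[date_column]}
        for code in CPI_MEASURES:
            if code in row:
                kept[code] = float(row[code])
        rows.append(kept)
    return rows
```

Any column whose header is not one of the thirteen codes is dropped, which matches the "filtering columns" part of the transform stage.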
Here, percentage change shows by how much the consumer price index has increased relative to its value in the previous month, quarter and year. To calculate each percentage change, we take the most recent month's index value (the last line in the file), subtract the previous month's, quarter's or year's value, divide by that same previous value, and multiply by 100. As an equation, this can be written as ((current - previous) / previous) * 100. The result is rounded to one decimal place.
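A minimal sketch of this calculation is shown below. It assumes the index values are held in a list of monthly figures, oldest first, and that "previous quarter" and "previous year" mean 3 and 12 months before the latest month. Those offsets and the function names are assumptions for illustration, not taken from the pipeline code.

```python
def percentage_change(current, previous):
    """Return ((current - previous) / previous) * 100, rounded to one decimal place."""
    return round((current - previous) / previous * 100, 1)


def changes_for_latest(values):
    """Compute month, quarter and year changes for the most recent value.

    `values` is a list of monthly index values, oldest first, with at
    least 13 entries. The 1/3/12-month offsets are assumptions.
    """
    current = values[-1]
    return {
        "month": percentage_change(current, values[-2]),
        "quarter": percentage_change(current, values[-4]),
        "year": percentage_change(current, values[-13]),
    }
```

Note that Python's built-in `round` works on binary floating-point values, so a result that looks like an exact half at the second decimal place may not round the way decimal arithmetic would suggest.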
Separately, we also compare this to the same set of values from the previous update (usually last month) to determine whether it has increased or decreased relative to the last update.
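One way to express this comparison is sketched below, assuming the percentage change from the previous update is available alongside the current one. The function name and the three labels are hypothetical.

```python
def direction(current_change, previous_change):
    """Describe how a percentage change moved since the previous update."""
    if current_change > previous_change:
        return "increased"
    if current_change < previous_change:
        return "decreased"
    return "unchanged"
```

For example, an annual change of 3.2 following 3.4 in the previous update would be reported as having decreased, even though prices are still rising.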
We explicitly show the sign to indicate whether the percentage change is positive or negative. A positive percentage change means the cost of goods is rising, and a negative change means the cost of goods is falling.
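Python's format specification supports an explicit sign directly, so the display value could be produced along these lines. The trailing `%` and the function name are assumptions about the output format.

```python
def format_signed(change):
    """Format a percentage change with an explicit sign and one decimal place."""
    # The "+" flag makes Python emit a sign for positive as well as negative numbers.
    return f"{change:+.1f}%"
```

With this approach a change of exactly zero is rendered with a plus sign, so a pipeline that wants an unsigned zero would need to handle that case separately.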
†The youth-focused average is the mean of the index for the four categories that most affect young people: Food and Non-Alcoholic Beverages (D7BU), Housing and fuels (D7BX), Transport (D7C2), and Recreation and Culture (D7C4). These are the top four categories that people under 30 spend their money on each week, calculated from the ONS dataset Family spending workbook 1: detailed expenditure and trends.
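As a sketch, the youth-focused figure for a single month is an unweighted mean of the four series. It assumes each month is held as a dictionary keyed by series code; the names used here are hypothetical.

```python
from statistics import mean

# The four categories named in the note above.
YOUTH_CATEGORIES = ("D7BU", "D7BX", "D7C2", "D7C4")


def youth_focused_index(row):
    """Return the unweighted mean of the four youth-focused index values.

    `row` maps series codes to index values for one month.
    """
    return mean(row[code] for code in YOUTH_CATEGORIES)
```

The same percentage-change calculation used for the individual categories can then be applied to this averaged index.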
For the line chart, we take monthly data for the past 10 years from the most recent release date.
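The ten-year window could be selected as in the sketch below. It assumes each observation is a `(date, value)` pair dated on the first of the month and sorted oldest first, and that the window is the 120 months ending at the latest observation. Whether the pipeline counts the boundary month is not stated, so that choice is an assumption.

```python
from datetime import date


def last_ten_years(rows, years=10):
    """Keep the observations from the `years` years up to the latest date.

    `rows` is a list of (date, value) tuples, oldest first, with dates on
    the first of each month (so replacing the year is always valid).
    """
    latest = rows[-1][0]
    cutoff = latest.replace(year=latest.year - years)
    # Strictly after the cutoff: 120 monthly points for a 10-year window.
    return [(d, v) for d, v in rows if d > cutoff]
```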
Finally, we save CSV files to drive our visualisations.