Read this article to find out how we create data for custom areas, and about the quality assurance process all our data goes through to ensure it is accurate.
Data for custom areas
For indicators within Local Insight, data for custom areas is created in the following way:
For custom areas where an entire Lower Layer Super Output Area (LSOA) is within the boundary of the custom area, the data value for this whole area is included.
For custom areas where only part of an LSOA is within the boundary of the custom area, the data is apportioned down to Output Area (OA) level. An OA is included in the custom area if more than 50% of the residential postcodes within that OA fall within the custom area.
The OA-level data is produced in different ways, depending on the geographic level at which the data is published (e.g. OA or LSOA level):
1. Where data is available at OA level: We use the actual OA counts. This is the case for Census data, for example.
2. Where data is available below OA level: We aggregate the data to OA level. This applies, for example, to Police.uk and Land Registry data, which are published at point (postcode or latitude/longitude) level. Where the data is a count, it can be summed from the sub-OA geography to get an OA value. Where the data is a ratio, score, rate or average value, we apply population-weighted aggregation using a relevant weighting value, e.g. average house prices would be weighted by the number of housing transactions.
3. Where data is not available at OA level: As we do not know the exact data value for the OAs, we apportion the published data to OA level. Population data from the Census is available down to OA level, and we use it to calculate a weighting for each OA: the population of the OA divided by the population of the LSOA (or whichever area type the data is published for). We then apply this weighting to the published value to approximate the data value for each OA.
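The two calculations described in points 2 and 3 can be sketched roughly as below. This is a minimal illustration in Python rather than the MySQL stored procedures used in practice; the function names, area codes and figures are all hypothetical, and the sketch assumes the parent area's population equals the sum of its OA populations.

```python
def weighted_average(values_and_weights):
    """Aggregate ratio/average values using a relevant weight.

    `values_and_weights` is a list of (value, weight) pairs, e.g.
    (average house price, number of housing transactions).
    """
    total_weight = sum(weight for _, weight in values_and_weights)
    return sum(value * weight for value, weight in values_and_weights) / total_weight


def apportion_to_oas(parent_value, oa_populations):
    """Split a published parent-area count across OAs by population share.

    Assumption: the parent population is the sum of the OA populations.
    """
    parent_population = sum(oa_populations.values())
    return {
        oa: parent_value * population / parent_population
        for oa, population in oa_populations.items()
    }


# Hypothetical figures: two sub-OA price averages weighted by transactions.
oa_price = weighted_average([(200_000, 3), (300_000, 1)])

# Hypothetical figures: an LSOA count of 200 shared across three OAs.
oa_counts = apportion_to_oas(200, {"OA_A": 300, "OA_B": 500, "OA_C": 200})
```

With these made-up inputs, the weighted price is 225,000 and the apportioned counts are 60, 100 and 40, which sum back to the published LSOA value of 200.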
Finally, the data for the custom area is created by summing the data for the OAs and/or LSOAs, as described above.
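As a rough sketch of how these pieces fit together for count data, the example below applies the "more than 50% of residential postcodes" rule to pick OAs and then sums the whole LSOAs and the selected OAs. It is illustrative only: the function names, the input layout and the area codes are assumptions, not the production implementation.

```python
from collections import defaultdict


def included_oas(postcodes):
    """Return OAs where more than 50% of residential postcodes are inside.

    `postcodes` is an iterable of (oa_code, inside_custom_area) pairs,
    one per residential postcode (a hypothetical layout).
    """
    total = defaultdict(int)
    inside = defaultdict(int)
    for oa_code, is_inside in postcodes:
        total[oa_code] += 1
        if is_inside:
            inside[oa_code] += 1
    # Strictly more than half, so an exact 50/50 split is excluded.
    return {oa for oa in total if inside[oa] * 2 > total[oa]}


def custom_area_count(whole_lsoa_values, oa_values, postcodes):
    """Sum LSOAs wholly inside the area plus OAs that pass the 50% rule."""
    selected = included_oas(postcodes)
    oa_total = sum(value for oa, value in oa_values.items() if oa in selected)
    return sum(whole_lsoa_values.values()) + oa_total


postcodes = [
    ("OA_A", True), ("OA_A", True), ("OA_A", False),   # 2 of 3 inside
    ("OA_B", True), ("OA_B", False),                   # exactly half
    ("OA_C", True), ("OA_C", False), ("OA_C", False),  # 1 of 3 inside
]
total = custom_area_count(
    whole_lsoa_values={"LSOA_1": 120, "LSOA_2": 80},
    oa_values={"OA_A": 60, "OA_B": 100, "OA_C": 40},
    postcodes=postcodes,
)
```

Here only OA_A passes the rule, so the custom area count is 120 + 80 + 60 = 260.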
Rate / percentage data is created by carrying out the above process separately for both the numerators and denominators, and using these to calculate the custom area rate / percentage.
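A small sketch of the rate calculation, assuming the numerator and denominator have already been produced for each component OA or LSOA. The function name and the figures are hypothetical.

```python
def custom_area_percentage(components):
    """Compute a percentage from summed numerators and denominators.

    `components` is a list of (numerator, denominator) pairs, one per
    OA or LSOA that makes up the custom area.
    """
    numerator = sum(n for n, _ in components)
    denominator = sum(d for _, d in components)
    return 100.0 * numerator / denominator


# Hypothetical components: 30 out of 200 people, and 10 out of 300 people.
rate = custom_area_percentage([(30, 200), (10, 300)])
```

With these figures the custom area percentage is 8.0 (40 out of 500). Simply averaging the two component percentages (15% and about 3.3%) would give about 9.2%, which is why the numerators and denominators are summed separately first.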
This process runs in our internal databases, using MySQL stored procedures to carry out the aggregation and apportioning.
The Quality Assurance process
We apply several quality assurance checks to ensure the aggregation/apportioning process is successful.
1) We apply stringent database checks to ensure that only certain kinds of aggregation and apportion processes can be applied to certain types of data.
2) Where data is published at multiple geographic levels we run validation checks to ensure that our derived aggregations match up to published data.
3) To validate the stored procedure code developed in MySQL, we performed the same aggregation approaches in three different tools (SQL, Stata, Python) on a large subset of our data to ensure they each produced the same results.
4) For sum aggregation, we run checks to ensure the component areas add up to the aggregated area.
5) For apportioning, we run the reverse check to ensure that the count data for the areas we have produced adds up to the parent area.
6) For apportioning of derived rate data (data where we have both the numerator and denominator), we run a check to ensure the derived rate is the same for the area we have apportioned to and the parent area it was apportioned from.
7) Where data is copied from larger geographies to smaller geographies (for data which is ratio/score/life expectancy/rate data where the numerator or denominator is not published) we check that the data value is the same for the larger and smaller area.
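Checks 5 and 6 could look something like the sketch below. It is an illustration only, not the checks as implemented in our databases: the function names are hypothetical, and the floating-point tolerance is an assumption.

```python
import math


def counts_add_up_to_parent(child_values, parent_value, rel_tol=1e-9):
    """Check 5: apportioned counts should sum back to the parent value."""
    return math.isclose(sum(child_values), parent_value, rel_tol=rel_tol)


def rates_match_parent(child_pairs, parent_num, parent_den, rel_tol=1e-9):
    """Check 6: each apportioned area should keep the parent's rate.

    `child_pairs` is a list of (numerator, denominator) pairs produced by
    apportioning the parent numerator and denominator with the same weight.
    """
    parent_rate = parent_num / parent_den
    return all(
        math.isclose(num / den, parent_rate, rel_tol=rel_tol)
        for num, den in child_pairs
    )


# Hypothetical parent count of 200 split 60 / 100 / 40 across three OAs.
counts_ok = counts_add_up_to_parent([60.0, 100.0, 40.0], 200)

# Hypothetical parent rate of 40 / 500 apportioned with weights 0.3 and 0.7.
rates_ok = rates_match_parent([(12, 150), (28, 350)], 40, 500)
```

Both checks pass for these made-up values; a set of child counts that summed to 199 instead of 200 would fail the first check.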