https://github.com/jplusplus/overlap
Extrapolate data from one set of administrative entities to another
- Host: GitHub
- URL: https://github.com/jplusplus/overlap
- Owner: jplusplus
- Created: 2015-03-30T10:35:46.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2015-03-30T10:48:06.000Z (about 11 years ago)
- Last Synced: 2024-04-14T04:55:28.500Z (about 2 years ago)
- Language: Python
- Size: 125 KB
- Stars: 0
- Watchers: 12
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
README
The problem
-----------
You have two different, overlapping administrative divisions, and want statistics based on one of them extrapolated to the other.
Case: We have hardly any demographic data for the roughly 15,000 Swedish postal codes, but plenty of interesting data for the ≈ 6,000 electoral districts. Given a fairly large dataset in which the postal codes are known, we can extrapolate statistics from the electoral districts and get a fair approximation.
The solution
------------
Use your favourite GIS software to intersect the two administrative systems. Create a .dbf (QGIS) or .csv file containing an area column for the intersections. Run the file through `create_factors.py` to create a table of weighting factors. Then run your statistics through `run_stats.py` to apply them.
This will obviously produce useful results only for fairly small and homogeneous administrative entities and fairly large datasets. Common sense is your friend here.
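To make the idea of a weighting factor concrete, here is a minimal sketch of how such a table could be derived from the area column. It assumes that a factor is the share of a source entity's total area that falls inside a target entity; that definition is inferred from the example in this README, not taken from `create_factors.py`. The function name `compute_factors`, the column names and the sample rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def compute_factors(rows, target_col, source_col, area_col):
    """Return {target: {source: share of the source's area inside target}}.

    Assumption: a factor is intersection area / total area of the source
    entity. This is an illustration, not the logic of create_factors.py.
    """
    pair_area = defaultdict(float)     # area per (target, source) pair
    source_total = defaultdict(float)  # total area per source entity
    for row in rows:
        area = float(row[area_col])
        pair_area[(row[target_col], row[source_col])] += area
        source_total[row[source_col]] += area

    factors = defaultdict(dict)
    for (target, source), area in pair_area.items():
        factors[target][source] = area / source_total[source]
    return dict(factors)


# Hypothetical intersection table, as it might be exported from a GIS tool.
intersections_csv = """target,source,area
A,X,30
B,X,10
B,Y,50
"""

rows = csv.DictReader(io.StringIO(intersections_csv))
factors = compute_factors(rows, "target", "source", "area")
print(factors)
```

Under this definition the factors for each source entity sum to 1 across the target entities, so totals are preserved when the statistics are redistributed.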
Example
-------
We have two administrative systems: *counties* and *provinces*. We know the number of camels in each *province*:
`province, num_camels`
`Värmland, 12`
`Dalarna, 20`
Now we want to know the approximate number of camels in each *county*.
1. Using QGIS, we produce a .dbf file with all intersections:
`ID | county | province | area`
`1 | Värmlands län | Värmland | 190`
`2 | Värmlands län | Dalarna | 6`
`3 | Dalarnas län | Dalarna | 180`
2. Then we run `weighted_data --id_1=county --id_2=province --area=area` to produce a JSON file, `factors.json`, with weighting factors:
`"Värmlands län": {"Värmland": 1, "Dalarna": .03},`
`"Dalarnas län": {"Dalarna": .97}`
3. Finally, we run our camel data through this filter, `run_stats --id=province --value=num_camels --factors=factors.json --input=input.csv`, to get the approximate count of camels in each county:
`Värmlands län, 12.6`
`Dalarnas län, 19.4`
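The arithmetic behind step 3 can be sketched in a few lines of Python. This is an illustration of the weighted sum only, not the code in `run_stats.py`; the function name `apply_factors` is hypothetical. The factors are the rounded values from step 2: the province of Dalarna has a total intersected area of 6 + 180 = 186, so 6/186 ≈ .03 of it falls in Värmlands län and 180/186 ≈ .97 in Dalarnas län.

```python
def apply_factors(values, factors):
    """Estimate a value per target entity as a weighted sum of source values.

    values:  {source: value}, e.g. camels per province
    factors: {target: {source: weight}}, e.g. the contents of factors.json
    """
    return {
        target: sum(values[source] * weight for source, weight in weights.items())
        for target, weights in factors.items()
    }


camels = {"Värmland": 12, "Dalarna": 20}
factors = {
    "Värmlands län": {"Värmland": 1, "Dalarna": 0.03},
    "Dalarnas län": {"Dalarna": 0.97},
}

estimates = apply_factors(camels, factors)
for county, count in estimates.items():
    print(f"{county}, {count:.1f}")
# Värmlands län, 12.6
# Dalarnas län, 19.4
```

Värmlands län gets all 12 camels from Värmland plus 3% of the 20 in Dalarna; Dalarnas län gets the remaining 97% of Dalarna's camels. The total of 32 camels is preserved.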