https://github.com/makepath/census-parquet
Python tools for creating Parquet files from 2020 Census Data
- Host: GitHub
- URL: https://github.com/makepath/census-parquet
- Owner: makepath
- License: MIT
- Created: 2021-08-31T20:01:38.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2022-09-29T20:34:52.000Z (about 2 years ago)
- Last Synced: 2024-09-24T18:36:43.751Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 84 KB
- Stars: 16
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md
# census-parquet
Python tools for creating and maintaining Parquet files from [US 2020 Census Data](https://www.census.gov/programs-surveys/decennial-census/decade/2020/2020-census-main.html).

## Installation
The data download shell scripts require [wget](https://en.wikipedia.org/wiki/Wget), so install it first.
To install the census-parquet package, run:
```
pip install census-parquet
```

This also installs the required Python dependencies:
1. [click](https://github.com/pallets/click)
2. [dask](https://docs.dask.org/en/latest/install.html)
3. [dask_geopandas](https://github.com/geopandas/dask-geopandas)
4. [geopandas](https://geopandas.org/getting_started/install.html)
5. [numpy](https://numpy.org/install/)
6. [openpyxl](https://openpyxl.readthedocs.io/en/stable/#installation)
7. [pandas](https://pandas.pydata.org/docs/getting_started/install.html)
8. [pyarrow](https://arrow.apache.org/docs/python/install.html)

## Usage
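Before running anything, it can help to confirm that `wget` and the Python dependencies listed under Installation are actually available. The following is a minimal sketch, not part of census-parquet itself: the helper names `missing_modules` and `has_executable` are hypothetical, and the module names are assumed to match the import names of the packages listed above.

```python
import importlib.util
import shutil

# Import names assumed for the dependencies listed in this README.
REQUIRED_MODULES = [
    "click", "dask", "dask_geopandas", "geopandas",
    "numpy", "openpyxl", "pandas", "pyarrow",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_executable(name):
    """Return True if an executable called `name` is on the PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if not has_executable("wget"):
        print("wget not found on PATH; the download scripts need it")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing Python module: {name}")
```

If the script prints nothing, the prerequisites appear to be in place.
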
To run the full census-parquet pipeline, use:
```
run_census_parquet
```

This runs the following scripts in order:
1. `download_boundaries.sh` - Downloads the Census Boundary data needed to run `process_boundaries.py`.
2. `download_population_stats.sh` - Downloads the population statistics data needed to run `process_blocks.py`.
3. `download_blocks.sh` - Downloads the Census Block data needed to run `process_blocks.py`.
4. `process_boundaries.py` - Processes the Census Boundary data and creates Parquet files, which are written to a `boundary_outputs` folder.
5. `process_blocks.py` - Processes the Census Block data and creates Parquet files. The final combined Parquet file is named `tl_2020_FULL_tabblock20.parquet`.
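
The ordering above matters because each processing script depends on data fetched by an earlier download script. As an illustration only, the sequence could be sketched as follows. This is not the actual implementation of `run_census_parquet`: the helpers `build_command` and `run_pipeline` are hypothetical, and the sketch assumes the scripts sit in the current directory and that `bash` is available.

```python
import subprocess
import sys

# Step order as listed in this README.
STEPS = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
    "process_boundaries.py",
    "process_blocks.py",
]


def build_command(script):
    """Pick an interpreter from the file extension (an assumption of this sketch)."""
    if script.endswith(".sh"):
        return ["bash", script]
    if script.endswith(".py"):
        return [sys.executable, script]
    raise ValueError(f"unsupported script type: {script}")


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order; check=True makes subprocess.run raise on the first failure."""
    for script in steps:
        runner(build_command(script), check=True)
```

Passing `check=True` means a failed download stops the run before any processing script sees incomplete data. The `runner` parameter exists only so the ordering can be exercised without executing the real scripts.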