https://github.com/makepath/census-parquet

Python tools for creating Parquet files from 2020 Census Data

Host: GitHub
URL: https://github.com/makepath/census-parquet
Owner: makepath
License: mit
Created: 2021-08-31T20:01:38.000Z (almost 4 years ago)
Default Branch: master
Last Pushed: 2022-09-29T20:34:52.000Z (over 2 years ago)
Last Synced: 2025-02-10T10:21:13.476Z (4 months ago)
Language: Python
Homepage:
Size: 84 KB
Stars: 16
Watchers: 2
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt
- Code of conduct: CODE_OF_CONDUCT.md

# census-parquet
Python tools for creating and maintaining Parquet files from [US 2020 Census Data](https://www.census.gov/programs-surveys/decennial-census/decade/2020/2020-census-main.html).

## Installation

The data download shell scripts depend on [wget](https://en.wikipedia.org/wiki/Wget), so install it first.

To install the census-parquet package, run:
```
pip install census-parquet
```

This also installs the required Python dependencies:
1. [click](https://github.com/pallets/click)
2. [dask](https://docs.dask.org/en/latest/install.html)
3. [dask_geopandas](https://github.com/geopandas/dask-geopandas)
4. [geopandas](https://geopandas.org/getting_started/install.html)
5. [numpy](https://numpy.org/install/)
6. [openpyxl](https://openpyxl.readthedocs.io/en/stable/#installation)
7. [pandas](https://pandas.pydata.org/docs/getting_started/install.html)
8. [pyarrow](https://arrow.apache.org/docs/python/install.html)
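
As a quick sanity check after installation, the snippet below is a minimal sketch (not part of census-parquet) that reports whether `wget` is on the `PATH` and whether each dependency listed above can be found by Python. The helper names `missing_modules` and `wget_available` are hypothetical, and the import names are assumed to match the package names in the list.

```python
import importlib.util
import shutil

# Import names assumed to match the dependency list above.
REQUIRED_MODULES = [
    "click", "dask", "dask_geopandas", "geopandas",
    "numpy", "openpyxl", "pandas", "pyarrow",
]


def missing_modules(names):
    """Return the module names that Python cannot locate (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def wget_available():
    """Return True if a wget executable is found on the PATH (hypothetical helper)."""
    return shutil.which("wget") is not None


if __name__ == "__main__":
    print("wget on PATH:", wget_available())
    print("missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "missing modules" list suggests the Python side of the installation is complete; it does not verify package versions.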

## Usage
To run census-parquet, use:
```
run_census_parquet
```

This runs the following scripts in order:
1. `download_boundaries.sh` - Downloads the Census Boundary data needed by `process_boundaries.py`.
2. `download_population_stats.sh` - Downloads the population statistics data needed by `process_blocks.py`.
3. `download_blocks.sh` - Downloads the Census Block data needed by `process_blocks.py`.
4. `process_boundaries.py` - Processes the Census Boundary data and creates Parquet files, which are written to a `boundary_outputs` folder.
5. `process_blocks.py` - Processes the Census Block data and creates Parquet files. The final combined Parquet file is named `tl_2020_FULL_tabblock20.parquet`.
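
Once the run finishes, one way to confirm that the outputs named above exist is sketched below. This is an illustration only, not part of census-parquet: the helper `summarize_outputs` is hypothetical, and it assumes that both outputs are written relative to a directory you pass in (the current directory by default) and that the boundary files sit directly inside `boundary_outputs` with a `.parquet` suffix.

```python
from pathlib import Path

# Output names taken from the script descriptions above.
BOUNDARY_DIR = "boundary_outputs"
BLOCKS_FILE = "tl_2020_FULL_tabblock20.parquet"


def summarize_outputs(base_dir="."):
    """Report which expected outputs exist under base_dir (hypothetical helper).

    Assumes outputs are written relative to base_dir; adjust if your run
    places them elsewhere.
    """
    base = Path(base_dir)
    boundary_dir = base / BOUNDARY_DIR
    if boundary_dir.is_dir():
        boundary_files = sorted(p.name for p in boundary_dir.glob("*.parquet"))
    else:
        boundary_files = []
    return {
        "boundary_files": boundary_files,
        "blocks_present": (base / BLOCKS_FILE).exists(),
    }


if __name__ == "__main__":
    print(summarize_outputs())
```

If the combined block output is present, it can presumably be opened with `geopandas.read_parquet`, though whether it is written as a single file or as a partitioned dataset is not stated here.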
