Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script
Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
- Host: GitHub
- URL: https://github.com/GreenInfo-Network/nyc-crash-mapper-etl-script
- Owner: GreenInfo-Network
- Created: 2017-02-14T03:29:45.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-10-20T04:49:48.000Z (about 1 year ago)
- Last Synced: 2024-06-21T12:26:17.605Z (5 months ago)
- Topics: carto, carto-api, extract-transform-load, heroku-scheduler, nyc, open-data, python, socrata-api
- Language: Python
- Homepage:
- Size: 4.29 MB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-starred - GreenInfo-Network/nyc-crash-mapper-etl-script - Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO. (python)
README
# ETL SCRIPT FOR NYC CRASH MAPPER
Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
## Python
This script is written for Python 3.8. The `python` and `pip` commands below reflect this.
## Setup
Set the following environment variables in your shell. Copy in the values from the Heroku panel, or from `heroku config -a nyc-crash-mapper-etl` if you use the Heroku CLI.
```
export CARTO_API_KEY=''
export CARTO_MASTER_KEY=''
export SOCRATA_APP_TOKEN_SECRET=''
export SOCRATA_APP_TOKEN_PUBLIC=''
export SENDGRID_API_KEY=''
export SENDGRID_USERNAME=''
export SENDGRID_TO_EMAIL=""
```
You may find it useful to create a file called `.env` that contains these commands, then use `source .env` to load the variables into your shell.
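To check all seven variables at once rather than one at a time, a small helper along these lines could be used. This is a minimal sketch, not part of the repository: the function name `missing_vars` is hypothetical, and it assumes an empty value should count as missing, since the template above exports empty strings.

```python
import os

# Variable names copied from the export list above.
REQUIRED_VARS = (
    "CARTO_API_KEY",
    "CARTO_MASTER_KEY",
    "SOCRATA_APP_TOKEN_SECRET",
    "SOCRATA_APP_TOKEN_PUBLIC",
    "SENDGRID_API_KEY",
    "SENDGRID_USERNAME",
    "SENDGRID_TO_EMAIL",
)


def missing_vars(env=None):
    """Return the required names that are unset or empty in env.

    Hypothetical helper; env defaults to the current process environment.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` with no arguments inspects the current shell environment; an empty list means every variable is set.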
Double-check that the variables are set in your environment, for example:
```
echo $SENDGRID_USERNAME
```
Install Python requirements:
```
pip3.8 install -r requirements.txt
```
## Running Locally
Run the script using Python 3.8 by doing:
```
python3.8 main.py
```
## Running via a Heroku Scheduler
To run on Heroku, fill in the values and send them to Heroku with commands such as these. Set every variable from the environment variable list in the Setup section.
```
heroku git:remote -a nyc-crash-mapper-etl
heroku config:set CARTO_API_KEY=
heroku config:set CARTO_MASTER_KEY=
heroku config:set SOCRATA_APP_TOKEN_SECRET=
heroku config:set SOCRATA_APP_TOKEN_PUBLIC=
heroku config:set SENDGRID_API_KEY=''
heroku config:set SENDGRID_USERNAME=''
heroku config:set SENDGRID_TO_EMAIL=""
```
Then provision the Heroku Scheduler and add a job with the following command:
```
python3.8 main.py
```
## Deploying the Scheduled Task
After making changes to the script, push them to Heroku so that the updated script is used for the next day's scheduled run.
To deploy the script, push the code to the Heroku remote:
```
heroku git:remote -a nyc-crash-mapper-etl
git push heroku master
```
## Note about qgtunnel
In 2023, we needed a static IP for this service so that it could be safelisted for use with a MySQL database the client is using for the Walkmapper project. Heroku does not offer static IPs itself, but there is an add-on for it. The `.qgtunnel` file in the root of this repo is the config for that add-on. Settings and docs are reachable via the add-on section of the control panel on heroku.com.