https://github.com/usds/dataficator
- Host: GitHub
- URL: https://github.com/usds/dataficator
- Owner: usds
- Created: 2022-05-23T16:38:46.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-09T06:31:42.000Z (over 2 years ago)
- Last Synced: 2025-01-24T21:28:38.848Z (5 months ago)
- Language: Python
- Size: 2.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
# dataficator
[etl workflow status](https://github.com/usds/dataficator/actions/workflows/etl.yml)
This repo contains sample code for reading data from a Cloudflare R2 bucket, applying a transformation to the data, and writing the results back to a second Cloudflare R2 bucket. All of those operations run inside a GitHub Actions workflow associated with the repository, triggered on a schedule (the GitHub Actions equivalent of a cron job).
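
A minimal sketch of that flow in Python, assuming boto3 pointed at R2's S3-compatible endpoint; the bucket names match the setup section below, but the object key, the transformation, and the environment variable names are illustrative rather than taken from [etl.py](./etl.py):

```python
import json
import os

import boto3  # R2 exposes an S3-compatible API, so boto3 can act as the client

# Hypothetical configuration: the real etl.py may structure this differently.
# Credentials would come from GitHub Actions secrets exposed as environment variables.
r2 = boto3.client(
    "s3",
    endpoint_url=f"https://{os.environ['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
)


def run_etl(key: str = "latest.json") -> None:
    # Extract: read the raw object from the inbound bucket.
    raw = r2.get_object(Bucket="incoming", Key=key)["Body"].read()

    # Transform: any pure-Python reshaping of the data goes here (illustrative filter).
    records = json.loads(raw)
    processed = [r for r in records if r.get("quantity", 0) > 0]

    # Load: write the processed results to the results bucket.
    r2.put_object(
        Bucket="inventory",
        Key=key,
        Body=json.dumps(processed).encode("utf-8"),
        ContentType="application/json",
    )


if __name__ == "__main__":
    run_etl()
```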
## How to run this manually
The entrypoint for this system is [etl.py](./etl.py).
There is a GitHub Actions workflow defined in [.github/workflows/etl.yml](./.github/workflows/etl.yml) which specifies an environment and triggers for running etl.py.
Two triggers are defined (sketched below):
- A cron schedule (see the YAML file for the current interval)
- A mechanism allowing manual invocation

To manually invoke etl.py inside a GitHub Action, click the "Actions" tab in GitHub, click "etl" on the left, and then click "Run workflow".
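
For orientation, this is roughly what the trigger section of such a workflow looks like; the actual schedule, job steps, and secret names live in the repo's etl.yml and may differ:

```yaml
name: etl

on:
  schedule:
    - cron: "0 6 * * *"   # illustrative interval; see etl.yml for the real one
  workflow_dispatch: {}   # enables the manual "Run workflow" button in the Actions tab

jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install boto3          # illustrative; install whatever etl.py imports
      - run: python etl.py
        env:                            # assumed secret names, matching the sketch above
          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
```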

## How to set up Cloudflare R2 buckets
1. Get a Cloudflare account.
2. Log into the [Cloudflare dashboard](https://dash.cloudflare.com).
3. Click on "Workers" in the left-hand side nav menu and enable a Workers subdomain.
4. Click on R2 in the left-hand side nav menu and create two R2 buckets. The etl.py code assumes that they'll be called `incoming` (for inbound data) and `inventory` (for processed results).
5. Make the `inventory` bucket publicly readable by setting up a Cloudflare Worker to intercept incoming requests.