Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/usds/dataficator
JSON representation
- Host: GitHub
- URL: https://github.com/usds/dataficator
- Owner: usds
- Created: 2022-05-23T16:38:46.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-09T06:31:42.000Z (about 2 years ago)
- Last Synced: 2024-04-14T05:40:49.937Z (9 months ago)
- Language: Python
- Size: 2.2 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# dataficator
[![etl](https://github.com/usds/dataficator/actions/workflows/etl.yml/badge.svg)](https://github.com/usds/dataficator/actions/workflows/etl.yml)
This repo contains sample code for reading data from a Cloudflare R2 bucket, applying a transformation to the data, and writing the results back to a Cloudflare R2 bucket. All of these operations are run by a GitHub Actions workflow associated with the repository, triggered on a schedule (the GitHub Actions equivalent of a cron job).
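In outline, the ETL step looks something like the sketch below. This is not the repo's actual etl.py: the object key, the credential variable names, and the placeholder transformation are all assumptions; only the bucket names `incoming` and `inventory` come from this README. R2 is S3-compatible, so the sketch points boto3 at the R2 endpoint.

```python
# etl_sketch.py -- a minimal sketch, not the repo's actual etl.py.
import os

import boto3  # R2 speaks the S3 API, so boto3 works against it

# Assumed environment variable names; the real workflow may differ.
ACCOUNT_ID = os.environ["R2_ACCOUNT_ID"]

s3 = boto3.client(
    "s3",
    endpoint_url=f"https://{ACCOUNT_ID}.r2.cloudflarestorage.com",
    aws_access_key_id=os.environ["R2_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["R2_SECRET_ACCESS_KEY"],
)

def run_etl() -> None:
    # Read raw data from the `incoming` bucket (key name is a placeholder).
    obj = s3.get_object(Bucket="incoming", Key="data.csv")
    raw = obj["Body"].read().decode("utf-8")

    # Apply some transformation (placeholder: uppercase everything).
    transformed = raw.upper()

    # Write the processed result to the `inventory` bucket.
    s3.put_object(Bucket="inventory", Key="data.csv",
                  Body=transformed.encode("utf-8"))

if __name__ == "__main__":
    run_etl()
```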
## How to run this manually
The entrypoint for this system is [etl.py](./etl.py).
There is a GitHub Actions workflow defined in [.github/workflows/etl.yml](./.github/workflows/etl.yml) that specifies the environment and triggers for running etl.py.
Two triggers are defined:
- A cron job (see the YAML file for the current schedule)
- A mechanism allowing manual invocation

To manually invoke etl.py inside a GitHub Action, click the "Actions" tab in GitHub, click "etl" on the left, and then click "Run workflow":
![howto](img/actions.png)
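For reference, a workflow with those two triggers looks roughly like the sketch below. This is not a copy of the repo's etl.yml: the cron interval, Python version, secret names, and the requirements.txt install step are all assumptions.

```yaml
name: etl

on:
  schedule:
    - cron: "0 6 * * *"   # placeholder interval; etl.yml defines the real one
  workflow_dispatch:       # enables the manual "Run workflow" button

jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt  # assumes a requirements file
      - run: python etl.py
        env:
          # assumed secret names for the R2 credentials
          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
```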
## How to set up Cloudflare R2 buckets
1. Get a Cloudflare account.
2. Log into the [Cloudflare dashboard](https://dash.cloudflare.com).
3. Click on "Workers" in the left-hand nav menu and enable a Workers subdomain.
4. Click on "R2" in the left-hand nav menu and create two R2 buckets. The etl.py code assumes they'll be called `incoming` (for inbound data) and `inventory` (for processed results).
5. Make the `inventory` bucket publicly readable by setting up a Cloudflare Worker to intercept incoming requests and serve objects from the bucket (a sketch of such a Worker follows below).
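The Worker itself isn't included in this repo. A minimal read-only proxy, following Cloudflare's documented pattern for serving R2 objects from a Worker, might look like the sketch below; the binding name `INVENTORY_BUCKET` is an assumption and must be configured in the Worker's settings (or wrangler.toml) to point at the `inventory` bucket.

```js
// worker.js -- a minimal sketch of a read-only proxy for the `inventory`
// bucket, assuming it is bound to the Worker as INVENTORY_BUCKET.
export default {
  async fetch(request, env) {
    if (request.method !== "GET") {
      return new Response("Method not allowed", { status: 405 });
    }
    // Map the request path to an object key (strip the leading "/").
    const key = new URL(request.url).pathname.slice(1);
    const object = await env.INVENTORY_BUCKET.get(key);
    if (object === null) {
      return new Response("Not found", { status: 404 });
    }
    const headers = new Headers();
    object.writeHttpMetadata(headers); // copy stored Content-Type, etc.
    headers.set("etag", object.httpEtag);
    return new Response(object.body, { headers });
  },
};
```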