https://github.com/splitgraph/socrata-to-seafowl
Sync the Splitgraph Socrata dataset catalog history into Seafowl
- Host: GitHub
- URL: https://github.com/splitgraph/socrata-to-seafowl
- Owner: splitgraph
- Created: 2022-10-29T14:21:11.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2023-04-13T18:04:47.000Z (over 2 years ago)
- Last Synced: 2025-03-10T03:55:52.474Z (10 months ago)
- Language: Shell
- Size: 62.5 KB
- Stars: 0
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Socrata-to-Seafowl sync job
This repository contains the code used to sync the data from
[our index of open data on Socrata](https://www.splitgraph.com/splitgraph/socrata)
into a [Seafowl](https://github.com/splitgraph/seafowl) instance.
This data will power the [SocFeed app](https://socfeed.vercel.app) in the future.
In the meantime, see the [Observable notebook](https://observablehq.com/@seafowl/socrata)
that showcases this dataset.
## How it works
- The job is meant to run nightly (for now it is triggered on demand): we ask Splitgraph to export
the latest snapshot of our [Socrata Discovery API](https://socratadiscovery.docs.apiary.io/) index
in Parquet format
- Splitgraph returns a pre-signed S3 URL for the exported file
- We run [`CREATE EXTERNAL TABLE`](https://seafowl.io/docs/guides/csv-parquet-http-external)
on Seafowl against this URL and append the data to a history table, so the file never has to be
downloaded onto the GitHub Actions runner
- Finally, a ["not dbt"](https://github.com/splitgraph/socrata-to-seafowl/blob/master/src/s2sf/notdbt.py) script
builds derived tables (daily, weekly and monthly summaries) used by the [SocFeed app](https://socfeed.vercel.app)
(actual dbt support coming soon!)
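The external-table step could look roughly like the sketch below. It is a minimal illustration, not the repository's code: the endpoint `http://localhost:8080/q`, the table names `socrata.dataset_history` and `socrata_snapshot`, and the helper functions are all hypothetical. It assumes that Seafowl places external tables in its `staging` schema, that the external table only exists for the duration of one request (so both statements are sent together, separated by a semicolon), and that writes are authorized with a `Bearer` password.

```python
import json
import urllib.request
from typing import List

# Hypothetical names: not taken from the repository.
SEAFOWL_URL = "http://localhost:8080/q"
HISTORY_TABLE = "socrata.dataset_history"


def sql_string_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling any embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_append_queries(presigned_url: str, staging_name: str = "socrata_snapshot") -> List[str]:
    """Return the SQL statements that append one Parquet snapshot to the history table."""
    return [
        # Register the Parquet file behind the pre-signed URL as an external table.
        # Seafowl reads it over HTTP, so the CI runner never downloads the file.
        f"CREATE EXTERNAL TABLE {staging_name} STORED AS PARQUET "
        f"LOCATION {sql_string_literal(presigned_url)}",
        # Assumption: external tables are exposed under the `staging` schema.
        f"INSERT INTO {HISTORY_TABLE} SELECT * FROM staging.{staging_name}",
    ]


def build_sync_request(presigned_url: str, password: str) -> urllib.request.Request:
    """Build (but do not send) one POST request carrying both statements."""
    query = "; ".join(build_append_queries(presigned_url))
    return urllib.request.Request(
        SEAFOWL_URL,
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Seafowl requires a password for write queries.
            "Authorization": f"Bearer {password}",
        },
        method="POST",
    )
```

Sending it would then be a matter of `urllib.request.urlopen(build_sync_request(url, password))`. Keeping the statements in a single request avoids depending on the external table outliving the request that created it.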
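The derived-table step amounts to rebuilding a few summary tables from the history table. The sketch below only generates the SQL; it is not the contents of `notdbt.py`. The history table `socrata.dataset_history`, its columns `snapshot_time`, `domain` and `dataset_id`, and the `socrata.*_summary` target names are hypothetical. It also assumes Seafowl accepts `DROP TABLE IF EXISTS`, `CREATE TABLE ... AS SELECT`, and DataFusion's `date_trunc` with `'day'`, `'week'` and `'month'`.

```python
from typing import Dict, List

# Hypothetical schema: illustrative only, not the repository's actual tables.
HISTORY_TABLE = "socrata.dataset_history"
# Summary table name -> date_trunc unit used to bucket snapshots.
SUMMARIES = {"daily": "day", "weekly": "week", "monthly": "month"}


def summary_statements(name: str, unit: str) -> List[str]:
    """SQL that drops and rebuilds one summary table at the given granularity."""
    target = f"socrata.{name}_summary"
    period = f"date_trunc('{unit}', snapshot_time)"
    return [
        f"DROP TABLE IF EXISTS {target}",
        # Count distinct datasets per domain in each time bucket.
        f"CREATE TABLE {target} AS "
        f"SELECT {period} AS period, domain, COUNT(DISTINCT dataset_id) AS datasets "
        f"FROM {HISTORY_TABLE} GROUP BY {period}, domain",
    ]


def build_all() -> Dict[str, List[str]]:
    """Statements for every summary table, keyed by summary name."""
    return {name: summary_statements(name, unit) for name, unit in SUMMARIES.items()}
```

Rebuilding each summary from scratch keeps the job idempotent: rerunning it after a failed night produces the same tables, at the cost of rescanning the whole history table.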