Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/simonw/scrape-open-data
Scrape various open data directories to create an index of what's available out there
- Host: GitHub
- URL: https://github.com/simonw/scrape-open-data
- Owner: simonw
- Created: 2022-06-07T23:31:14.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-16T02:53:12.000Z (2 months ago)
- Last Synced: 2024-11-16T03:26:09.165Z (2 months ago)
- Topics: git-scraping, socrata
- Language: Python
- Homepage: https://open-data.datasette.io
- Size: 5.14 GB
- Stars: 31
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - simonw/scrape-open-data - Scrape various open data directories to create an index of what's available out there (Python)
README
# scrape-open-data
[![Scrape latest data](https://github.com/simonw/scrape-open-data/actions/workflows/scrape.yml/badge.svg)](https://github.com/simonw/scrape-open-data/actions/workflows/scrape.yml)
Scrapes every available dataset from Socrata and stores them as newline-delimited JSON in this repository, to track changes over time through [Git scraping](https://simonwillison.net/2020/Oct/9/git-scraping/).
- `socrata/data.delaware.gov.jsonl` contains the latest datasets for a specific domain. This is updated twice a day.
- `socrata/data.delaware.gov.stats.jsonl` contains information on page views and download numbers. This is updated only once a week, so that each fetch does not record changed counts for many different datasets.

The resulting database is deployed to https://open-data.datasette.io/
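
Because the files are newline-delimited JSON, each non-empty line can be parsed on its own as one JSON document. The following is a minimal sketch of reading such a file with the Python standard library. The helper name `read_jsonl` is hypothetical and not part of this repository, and the sketch makes no assumptions about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a .jsonl file.

    Hypothetical helper for illustration; not taken from this repository.
    """
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `sum(1 for _ in read_jsonl("socrata/data.delaware.gov.jsonl"))` would count the records currently stored for that domain, assuming the file exists in a local checkout.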
## scrape_socrata.py
Run `python scrape_socrata.py socrata/` to scrape the data from Socrata and save it in the `socrata/` directory.
Add `--stats` to include page view and download statistics in separate files.
Add `--verbose` for verbose output.
## build_socrata_db.py
Run this command to build a SQLite database from the `.jsonl` files in `socrata/`:

    python build_socrata_db.py socrata.db socrata
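
To illustrate the general idea of turning a directory of `.jsonl` files into a SQLite database, here is a rough sketch using only the standard library. It is not the repository's `build_socrata_db.py`, whose schema and dependencies are not described here. The function name `build_db`, the single `records` table, and its column names are all assumptions made for the example.

```python
import json
import sqlite3
from pathlib import Path


def build_db(db_path, jsonl_dir):
    """Load every .jsonl file in jsonl_dir into one 'records' table.

    Illustrative only: the table layout is an assumption, not the
    schema produced by the real build_socrata_db.py script.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " source TEXT, line_no INTEGER, data TEXT,"
        " PRIMARY KEY (source, line_no))"
    )
    # Note: "*.jsonl" also matches the "*.stats.jsonl" files.
    for path in sorted(Path(jsonl_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            rows = [
                # Parse and re-serialize so that invalid JSON fails early.
                (path.name, n, json.dumps(json.loads(line)))
                for n, line in enumerate(f, start=1)
                if line.strip()
            ]
        conn.executemany(
            "INSERT OR REPLACE INTO records VALUES (?, ?, ?)", rows
        )
    conn.commit()
    return conn
```

Storing each record as a JSON string keeps the sketch independent of the actual field names. A real build would more likely map fields to columns so they can be queried directly.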