https://github.com/simonw/scrape-instances-social
https://instances.social/instances.json
Last synced: 26 days ago
- Host: GitHub
- URL: https://github.com/simonw/scrape-instances-social
- Owner: simonw
- Created: 2022-11-20T01:13:25.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-01T22:55:47.000Z (6 months ago)
- Last Synced: 2024-05-01T23:17:54.763Z (6 months ago)
- Topics: gitscraping
- Language: Shell
- Homepage:
- Size: 4.32 GB
- Stars: 19
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# scrape-instances-social
For full details about how this works, see [Tracking Mastodon user numbers over time with a bucket of tricks](https://simonwillison.net/2022/Nov/20/tracking-mastodon/) on my blog.
https://instances.social/instances.json is a list of Mastodon instances, including their number of statuses and users.
This repo scrapes that and records the history of the file, a form of [Git scraping](https://simonwillison.net/2020/Oct/9/git-scraping/).
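As a rough illustration of the Git scraping pattern (not the workflow this repository actually runs, which the linked blog post describes), the sketch below saves a fetched JSON document in a stable form and reports whether the saved copy changed. A change is the signal to make a new commit. The function name `save_snapshot` and the pretty-printing choices are assumptions made for this example.

```python
import json
from pathlib import Path


def save_snapshot(path, data):
    """Write data as stable, pretty-printed JSON; return True if the file changed."""
    # Sorted keys and fixed indentation keep diffs between snapshots small
    # (an assumption of this sketch, not something the README specifies).
    new_text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    target = Path(path)
    old_text = target.read_text(encoding="utf-8") if target.exists() else None
    if new_text == old_text:
        return False  # nothing new to commit
    target.write_text(new_text, encoding="utf-8")
    return True  # the caller would now run `git add` and `git commit`
```

Because unchanged data produces no new file content, repeated runs only add history when the upstream file actually differs.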
Visit this Observable notebook to see users-over-time figures plotted on charts:
https://observablehq.com/@simonw/mastodon-users-and-statuses-over-time
You can browse the most recent copy of the scraped data using Datasette Lite here: https://lite.datasette.io/?json=https%3A%2F%2Fraw.githubusercontent.com%2Fsimonw%2Fscrape-instances-social%2Fmain%2Finstances.json#/data/instances?_sort=users&_sort_by_desc=on
## Building a database
You can use the [git-history](https://datasette.io/tools/git-history) tool to build a SQLite database of the history of the instances:

    pip install -r requirements.txt
    # (or just pip install git-history)
    ./build-instance-history.sh

You can run that script multiple times and it will only update the database with new commits that have not been seen before.
You can also build a much smaller SQLite database of just the counts of users and statuses over time:

    ./build-count-history.sh
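
To make the idea of a counts-only history concrete, here is a minimal sketch of totalling a single snapshot. It is not the code in `build-count-history.sh`. It assumes each entry in `instances.json` is an object with `users` and `statuses` fields; those field names, and the handling of missing or string-valued counts, are assumptions.

```python
def total_counts(instances):
    """Return (total_users, total_statuses) across a list of instance dicts."""
    total_users = 0
    total_statuses = 0
    for instance in instances:
        # Treat missing or null counts as zero; int() also accepts numeric strings.
        total_users += int(instance.get("users") or 0)
        total_statuses += int(instance.get("statuses") or 0)
    return total_users, total_statuses


# Hypothetical sample data, not taken from the real file.
sample = [
    {"name": "one.example", "users": 10, "statuses": 250},
    {"name": "two.example", "users": "5", "statuses": None},
]
print(total_counts(sample))  # (15, 250)
```

Storing one such pair of totals per commit, rather than every instance record, is what would keep a counts database small.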
## Accessing the database
A script in this repository builds and publishes the `counts.db` database to S3. You can download the latest copy from the URL below. It is pretty small, as it records only the total number of users and statuses over time, summed across all tracked instances.
https://scrape-instances-social.s3.amazonaws.com/counts.db
You can open this in [Datasette Lite](https://lite.datasette.io/) like so:
https://lite.datasette.io/?url=https://scrape-instances-social.s3.amazonaws.com/counts.db
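If you would rather inspect a downloaded copy locally, Python's built-in `sqlite3` module is enough. The README does not document the schema of `counts.db`, so this sketch only lists the tables and their row counts rather than assuming any column names. The helper name `table_row_counts` is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(
                'SELECT COUNT(*) FROM "{}"'.format(name.replace('"', '""'))
            ).fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

After saving the download as `counts.db` (the local filename is an assumption), `table_row_counts("counts.db")` returns a dictionary mapping each table name to its number of rows.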