An open API service indexing awesome lists of open source software.

https://github.com/chibuzordev/bluesky-scraper

This is a work of art that enables you to scrape data off BlueSky.
https://github.com/chibuzordev/bluesky-scraper

analytics bluesky bluesky-api bluesky-client data datascraper-framework datascraping scraping social-media web webscraping

Last synced: 7 months ago
JSON representation

This is a work of art that enables you to scrape data off BlueSky.

Awesome Lists containing this project

README

          

### Blue Sky Scraper

A lightweight, modular FastAPI service for collecting and analyzing social and institutional data across Bluesky.

##### Overview

The Social Data Intelligence API enables you to collect text-based data from public online platforms. With keyword-based searches and retrieve clean, structured datasets, the results will be saved incrementally to avoid data loss. The methods used maintain compliance with rate limits and pagination controls that can be extended to new sources using a unified schema.

It’s built for researchers, analysts, and developers who need quick access to social discourse data for analytics, monitoring, or policy intelligence.

The system supports:
- keyword searches for profiles and 'skeets'
- smart pagination & rate Limiting backed by deduplication & incremental Saving
- modular and deployment as a microservice or standalone backend via FastAPI
```
social_scraper/

├── app/
│ ├── __init__.py
│ ├── main.py # Entry point (FastAPI app)
│ ├── routes/
│ │ ├── __init__.py
│ │ └── scrape_routes.py # Contains API route logic (optional modularity)
│ │
│ ├── scrapers/
│ │ ├── __init__.py
│ │ ├── scraper_bluesky.py # Bluesky scraper
│ │ ├── scraper_reddit.py # Reddit scraper (to be added)
│ │ └── scraper_factory.py # Returns scraper dynamically
│ │
│ ├── cache/
│ │ ├── __init__.py
│ │ └── cache_manager.py # Cache system for csv/json/sqlite
│ │
│ └── utils/
│ ├── __init__.py
│ └── logger.py # (optional) unified logging system

├── .env # Environment variables (API keys, credentials)
├── requirements.txt # Dependencies
├── Dockerfile
├── render.yaml # Render deployment config (optional)
└── README.md
```

| Layer | Purpose |
| ------------------ | ------------------------------------------------------------------------ |
| `app/main.py` | Entry point of the FastAPI server |
| `app/scrapers/` | Contains scraper implementations (one file per platform) |
| `app/cache/` | Handles caching & persistence for all platforms |
| `app/routes/` | Keeps route logic clean and modular (optional if you prefer fewer files) |
| `app/utils/` | For shared logic such as logging, formatting, or error handling |
| `.env` | Keeps credentials & settings out of code |
| `requirements.txt` | Lists all dependencies for local, Docker, or Render builds |
| `render.yaml` | Automates Render deployment |
| `Dockerfile` | Ensures reproducible build & deployment environment |