{"id":32662527,"url":"https://github.com/chibuzordev/bluesky-scraper","last_synced_at":"2025-10-31T20:01:51.683Z","repository":{"id":318249853,"uuid":"1070493719","full_name":"chibuzordev/bluesky-scraper","owner":"chibuzordev","description":"This is a work of art that enables you to scrape data off BlueSky. ","archived":false,"fork":false,"pushed_at":"2025-10-27T03:54:09.000Z","size":54,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-27T05:36:05.341Z","etag":null,"topics":["analytics","bluesky","bluesky-api","bluesky-client","data","datascraper-framework","datascraping","scraping","social-media","web","webscraping"],"latest_commit_sha":null,"homepage":"https://bluesky-scraper.onrender.com/docs","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chibuzordev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-06T02:31:45.000Z","updated_at":"2025-10-27T03:54:12.000Z","dependencies_parsed_at":"2025-10-06T04:28:39.223Z","dependency_job_id":"7984c378-7ca2-4eeb-b483-c396fa12d00c","html_url":"https://github.com/chibuzordev/bluesky-scraper","commit_stats":null,"previous_names":["chibuzordev/bluesky-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chibuzordev/bluesky-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chibuzordev%2Fbluesky-scraper","tags_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chibuzordev%2Fbluesky-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chibuzordev%2Fbluesky-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chibuzordev%2Fbluesky-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chibuzordev","download_url":"https://codeload.github.com/chibuzordev/bluesky-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chibuzordev%2Fbluesky-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":282049664,"owners_count":26605488,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-31T02:00:07.401Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","bluesky","bluesky-api","bluesky-client","data","datascraper-framework","datascraping","scraping","social-media","web","webscraping"],"created_at":"2025-10-31T20:00:41.134Z","updated_at":"2025-10-31T20:01:51.677Z","avatar_url":"https://github.com/chibuzordev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"### Blue Sky Scraper\n\nA lightweight, modular FastAPI service for collecting and analyzing social and institutional data across Bluesky.\n\n##### Overview\n\nThe Social Data 
Intelligence API enables you to collect text-based data from public online platforms. You can run keyword-based searches and retrieve clean, structured datasets. Results are saved incrementally to avoid data loss. The scraping methods respect rate limits and pagination controls, and can be extended to new sources using a unified schema.\n\nIt’s built for researchers, analysts, and developers who need quick access to social discourse data for analytics, monitoring, or policy intelligence.\n\nThe system supports:\n- keyword searches for profiles and 'skeets'\n- smart pagination \u0026 rate limiting, backed by deduplication \u0026 incremental saving\n- modular design and deployment as a microservice or standalone backend via FastAPI\n```\nsocial_scraper/\n│\n├── app/\n│   ├── __init__.py\n│   ├── main.py                 # Entry point (FastAPI app)\n│   ├── routes/\n│   │   ├── __init__.py\n│   │   └── scrape_routes.py    # Contains API route logic (optional modularity)\n│   │\n│   ├── scrapers/\n│   │   ├── __init__.py\n│   │   ├── scraper_bluesky.py  # Bluesky scraper\n│   │   ├── scraper_reddit.py   # Reddit scraper (to be added)\n│   │   └── scraper_factory.py  # Returns scraper dynamically\n│   │\n│   ├── cache/\n│   │   ├── __init__.py\n│   │   └── cache_manager.py    # Cache system for csv/json/sqlite\n│   │\n│   └── utils/\n│       ├── __init__.py\n│       └── logger.py           # (optional) unified logging system\n│\n├── .env                        # Environment variables (API keys, credentials)\n├── requirements.txt            # Dependencies\n├── Dockerfile\n├── render.yaml                 # Render deployment config (optional)\n└── README.md\n```\n\n\n| Layer              | Purpose                                                                  |\n| ------------------ | ------------------------------------------------------------------------ |\n| `app/main.py`      | Entry point of the FastAPI server                                        |\n| 
`app/scrapers/`    | Contains scraper implementations (one file per platform)                 |\n| `app/cache/`       | Handles caching \u0026 persistence for all platforms                          |\n| `app/routes/`      | Keeps route logic clean and modular (optional if you prefer fewer files) |\n| `app/utils/`       | For shared logic such as logging, formatting, or error handling          |\n| `.env`             | Keeps credentials \u0026 settings out of code                                 |\n| `requirements.txt` | Lists all dependencies for local, Docker, or Render builds               |\n| `render.yaml`      | Automates Render deployment                                              |\n| `Dockerfile`       | Ensures reproducible build \u0026 deployment environment                      |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchibuzordev%2Fbluesky-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchibuzordev%2Fbluesky-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchibuzordev%2Fbluesky-scraper/lists"}