{"id":49655201,"url":"https://github.com/nedhmn/streeteasy-scraper","last_synced_at":"2026-05-06T08:43:40.233Z","repository":{"id":291032193,"uuid":"976347021","full_name":"nedhmn/streeteasy-scraper","owner":"nedhmn","description":"StreetEasy data scraper built as a Dockerized monorepo with uv workspaces. Supports synchronous and asynchronous scraping via BrightData, featuring a FastAPI webhook and Cloudflared tunnels.","archived":false,"fork":false,"pushed_at":"2025-07-31T20:37:03.000Z","size":3941,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-31T23:22:05.566Z","etag":null,"topics":["cloudflare-tunnel","docker-compose","fastapi","monorepo","postgresql","python","scraping","uv"],"latest_commit_sha":null,"homepage":"https://nedhmn.github.io/streeteasy-scraper/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nedhmn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-02T00:21:25.000Z","updated_at":"2025-07-31T20:34:45.000Z","dependencies_parsed_at":"2025-05-02T12:46:14.180Z","dependency_job_id":null,"html_url":"https://github.com/nedhmn/streeteasy-scraper","commit_stats":null,"previous_names":["nedhmn/streeteasy-scraper","nedhmn/streeteasy-scraper-monorepo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/nedhmn/streeteasy-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nedhmn%2Fstreeteasy-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nedhmn%2Fstreeteasy-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nedhmn%2Fstreeteasy-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nedhmn%2Fstreeteasy-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nedhmn","download_url":"https://codeload.github.com/nedhmn/streeteasy-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nedhmn%2Fstreeteasy-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32685751,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloudflare-tunnel","docker-compose","fastapi","monorepo","postgresql","python","scraping","uv"],"created_at":"2026-05-06T08:43:39.536Z","updated_at":"2026-05-06T08:43:40.225Z","avatar_url":"https://github.com/nedhmn.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# StreetEasy Scraper\n\n[![Documentation](https://img.shields.io/badge/Documentation-Link-blue)](https://nedhmn.github.io/streeteasy-scraper/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](./LICENSE)\n\nThe **StreetEasy Scraper** is a data collection monorepo that scrapes property listing information from StreetEasy.com. It uses a PostgreSQL database to track the addresses to be scraped and to store the collected data.\n\n\u003cdiv align=\"center\" style=\"margin-bottom: 20px\"\u003e\n    \u003cimg src=\"./docs/src/assets/scraper-hero.png\" alt=\"Scraper mascot hero section\" height=\"350px\"\u003e\n\u003c/div\u003e\n\n## ✨ Features\n\n- **Monorepo Structure:** Organized as a **monorepo** with `uv` workspaces, so shared code (packages) and independent applications live side by side.\n- **Flexible Scraping Methods:** Choose between a simple multi-threaded **synchronous** approach and a scalable **asynchronous** workflow driven by BrightData callbacks.\n- **BrightData Integration:** Fetches pages through BrightData's Web Unlocker, which handles blocking, CAPTCHAs, and other anti-scraping measures at scale.\n- **Efficient Asynchronous Workflow:** A dedicated **FastAPI webhook** receives BrightData callbacks and hands the results off to background processing.\n- **Secure Webhook Exposure:** Exposes the webhook publicly through **Cloudflared tunnels**, without opening inbound ports on your network.\n- **Enhanced Security:** Optional **Cloudflare IP whitelisting and SSL/TLS encryption** for the webhook endpoint.\n- **Robust Data Management:** Stores both the addresses to be scraped and the collected data in a **PostgreSQL** database.\n- **Simplified Deployment:** Runs all project services with **Docker Compose**, using separate profiles for synchronous and asynchronous modes.\n- **Automated Address Seeding:** Includes a tool that seeds the database with initial addresses from nyc.gov.\n\n## 🚀 Getting Started (Synchronous Quickstart)\n\nThis quickstart gets the simpler synchronous scraping profile running with Docker.\n\n### Prerequisites\n\nYou need **Docker** and **Docker Compose** installed on your machine.\n\n### Clone the Repository\n\n```bash\ngit clone https://github.com/nedhmn/streeteasy-scraper.git\ncd streeteasy-scraper\n```\n\n### Configure Environment Variables\n\nCopy the `.env.example` file to `.env` and fill in the required values. At a minimum, the synchronous profile needs the database credentials and basic BrightData Web Unlocker proxy details.\n\n```bash\ncp .env.example .env\n# Edit the .env file with your credentials\n```\n\nRefer to the **[Configuration section in the documentation](https://nedhmn.github.io/streeteasy-scraper/getting-started/configuration/)** for detailed instructions on all environment variables.\n\n### Build and Run with Docker Compose (Sync Profile)\n\n```bash\ndocker-compose --profile sync up --build\n```\n\nThis command builds the necessary images, starts the PostgreSQL database, runs the prestart script (which includes initial address seeding), and launches the synchronous scraper.\n\nFor details on the asynchronous setup, system architecture, providing addresses, and accessing scraped data, refer to the **[full documentation](https://nedhmn.github.io/streeteasy-scraper/)**.\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnedhmn%2Fstreeteasy-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnedhmn%2Fstreeteasy-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnedhmn%2Fstreeteasy-scraper/lists"}