{"id":31737593,"url":"https://github.com/dappsar/ethglobal-crawler","last_synced_at":"2025-10-09T09:26:17.909Z","repository":{"id":311371642,"uuid":"1043509688","full_name":"dappsar/ethglobal-crawler","owner":"dappsar","description":"A web crawler that scrapes and aggregates projects from ETHGlobal  hackathons. It collects project details such as title, description, team members, tech stack, and links, providing structured data for analysis, discovery, or integration with other tools.","archived":false,"fork":false,"pushed_at":"2025-08-24T02:46:05.000Z","size":76,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-24T11:56:56.660Z","etag":null,"topics":["crawler","ethglobal","python"],"latest_commit_sha":null,"homepage":"https://ethglobal.com/showcase","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dappsar.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-24T02:29:32.000Z","updated_at":"2025-08-24T02:46:08.000Z","dependencies_parsed_at":"2025-08-24T12:07:05.413Z","dependency_job_id":null,"html_url":"https://github.com/dappsar/ethglobal-crawler","commit_stats":null,"previous_names":["dappsar/ethglobal-crawler"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/dappsar/ethglobal-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dappsar%2Fethglobal-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dappsar%2Fethglobal-crawler/tag
s","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dappsar%2Fethglobal-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dappsar%2Fethglobal-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dappsar","download_url":"https://codeload.github.com/dappsar/ethglobal-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dappsar%2Fethglobal-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001126,"owners_count":26083021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","ethglobal","python"],"created_at":"2025-10-09T09:26:12.681Z","updated_at":"2025-10-09T09:26:17.903Z","avatar_url":"https://github.com/dappsar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ETHGlobal Project Crawler\r\n\r\nThis repository provides Python scripts to **scrape and download projects from [ETHGlobal](https://ethglobal.com/)**.  
\r\nThe scripts work around ETHGlobal’s limited search functionality, enabling **comprehensive offline access** to project data for exploration, analysis, or integration.\r\n\r\n## Features\r\n\r\n- **Complete project listing:** Crawls all project listings from ETHGlobal and saves them to a single CSV file.  \r\n- **Detailed project extraction:** Fetches individual project pages and stores structured details (description, tech stack, awards, links, etc.) in one CSV file per project.  \r\n- **Resumable execution:** Automatically resumes from the last processed project if interrupted.  \r\n- **Rate-limit handling:** Detects HTTP 429 (Too Many Requests) responses and waits before retrying, to avoid being blocked.  \r\n- **Duplicate cleanup:** Removes consecutive duplicate entries from the final project list.  \r\n\r\n## Files\r\n\r\n### `ethglobal_crawler.py`\r\nScrapes the ETHGlobal `/showcase` pages:\r\n\r\n- Iterates through all paginated project listings.  \r\n- Extracts **name, short description, and project URL** for each project.  \r\n- Appends results to `ethglobal_projects.csv`, creating it if missing.  \r\n- Removes consecutive duplicate rows once the crawl finishes.  \r\n- Configurable **rate-limiting** (delay between requests, retry on HTTP 429).  \r\n\r\n### `ethglobal_crawler_detail.py`\r\nFetches detailed information for each project listed in `ethglobal_projects.csv`:\r\n\r\n- Visits each project page and extracts:\r\n  - **Name** and **short description**.  \r\n  - **Live demo** and **source code URLs** (if available).  \r\n  - **Full project description** and **“How it’s Made”** section.  \r\n  - **Creation hackathon link** and **awards** (ETHGlobal or track prizes).  \r\n- Saves each project’s details to a **separate CSV** in `project_details/`.  \r\n- Supports **resume-on-crash** (continues from the last saved project).  \r\n- Logs errors to `project_details/error_log.txt` when saving fails.  
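
A minimal sketch of the rate-limit, resume and duplicate-cleanup behavior described above, written with only the Python standard library so it runs without the dependencies installed below. It is not the repository's code: the helper names, the `url` column name, the `Retry-After` handling and the one-file-per-URL-slug naming are all assumptions for illustration.

```python
import csv
import itertools
import os
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_retries=5, default_wait=30.0):
    # Hypothetical helper: GET a page, sleeping and retrying while the server answers HTTP 429.
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.read().decode('utf-8')
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other HTTP errors are not retried
            # Honor a numeric Retry-After header if present; otherwise wait a fixed delay.
            retry_after = err.headers.get('Retry-After', '')
            time.sleep(float(retry_after) if retry_after.isdigit() else default_wait)
    raise RuntimeError(f'still rate-limited after {max_retries} attempts: {url}')


def detail_filename(url):
    # Assumed naming scheme: last path segment of the project URL plus '.csv'.
    return url.rstrip('/').split('/')[-1] + '.csv'


def pending_urls(list_csv, details_dir):
    # Resume support: return listed URLs that do not yet have a detail CSV in details_dir.
    with open(list_csv, newline='', encoding='utf-8') as f:
        urls = [row['url'] for row in csv.DictReader(f)]  # assumed column name 'url'
    done = set(os.listdir(details_dir)) if os.path.isdir(details_dir) else set()
    return [u for u in urls if detail_filename(u) not in done]


def drop_consecutive_duplicates(rows):
    # Collapse runs of identical adjacent rows into one; non-adjacent repeats are kept.
    return [row for row, _ in itertools.groupby(rows)]
```

`itertools.groupby` only merges adjacent equal items, which matches the consecutive (not global) deduplication the scripts describe.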
\r\n\r\n## Setup\r\n\r\n```bash\r\n# Create virtual environment (optional but recommended)\r\npython -m venv venv\r\nsource venv/bin/activate   # On Linux/macOS\r\nvenv\\Scripts\\activate      # On Windows\r\n\r\n# Install dependencies\r\npip install requests beautifulsoup4 pandas\r\n```\r\n\r\n## Usage\r\n\r\n```bash\r\n# 1. Fetch the general list of projects (creates ethglobal_projects.csv)\r\npython ethglobal_crawler.py\r\n\r\n# 2. Fetch detailed data for each project (saves CSVs to project_details/)\r\npython ethglobal_crawler_detail.py\r\n```\r\n\r\n## Output\r\n\r\n- `ethglobal_projects.csv` – All ETHGlobal projects with **name, description, URL**.  \r\n- `project_details/` – One CSV per project with **full metadata**.  \r\n- `project_details/error_log.txt` – Records any file save errors.  \r\n\r\n---\r\n\r\nThis setup provides **structured, offline-accessible ETHGlobal hackathon data**, ready for analysis or integration into your own tools.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdappsar%2Fethglobal-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdappsar%2Fethglobal-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdappsar%2Fethglobal-crawler/lists"}