{"id":30770406,"url":"https://github.com/bigdata5911/etsy-scrapper","last_synced_at":"2026-05-18T06:05:29.325Z","repository":{"id":307275481,"uuid":"1027437572","full_name":"bigdata5911/etsy-scrapper","owner":"bigdata5911","description":"A Python-based web scraper built with Selenium that extracts listing and shop information from Etsy based on user-defined search terms and pagination limits.","archived":false,"fork":false,"pushed_at":"2025-07-30T10:00:41.000Z","size":2223,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-04T23:15:21.875Z","etag":null,"topics":["etsy","scraping","selenium","selenium-webdriver"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bigdata5911.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-28T02:32:23.000Z","updated_at":"2025-08-02T07:32:42.000Z","dependencies_parsed_at":"2025-07-30T12:50:02.153Z","dependency_job_id":null,"html_url":"https://github.com/bigdata5911/etsy-scrapper","commit_stats":null,"previous_names":["bigdata5911/etsy-scrapper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bigdata5911/etsy-scrapper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigdata5911%2Fetsy-scrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigdata5911%2Fetsy-scrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigdata5911%2Fetsy-scrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigdata5911%2Fetsy-scrapper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bigdata5911","download_url":"https://codeload.github.com/bigdata5911/etsy-scrapper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigdata5911%2Fetsy-scrapper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33167430,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-18T05:43:36.989Z","status":"ssl_error","status_checked_at":"2026-05-18T05:43:19.133Z","response_time":71,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etsy","scraping","selenium","selenium-webdriver"],"created_at":"2025-09-04T23:04:43.577Z","updated_at":"2026-05-18T06:05:29.292Z","avatar_url":"https://github.com/bigdata5911.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Etsy Web Scraper\n\nA Python-based web scraper built with Selenium that extracts listing and shop information from Etsy based on user-defined search terms and pagination limits.\n\n## Features\n\n- Extract comprehensive listing data including pricing, ratings, reviews, and shop information\n- Configurable search terms and page limits (up to 240 pages per term)\n- Built-in IP blocking mitigation\n- Shop name anonymization for privacy\n- CSV output with 16 data columns\n- Detailed progress tracking and summary reporting\n\n## Prerequisites\n\n- **Python:** 3.9+\n- **Dependencies:** See `requirements.txt`\n- **WebDriver:** Chrome/Firefox WebDriver (path configurable)\n\n## Installation\n\n1. Clone the repository:\n```bash\ngit clone https://github.com/bigdata5911/Etsy-Scraping.git\ncd Etsy-Scraping\n```\n\n2. Create and activate a virtual environment:\n```bash\npython -m venv venv\nsource venv/bin/activate  # On Windows: venv\\Scripts\\activate\n```\n\n3. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n## Configuration\n\nBefore running the scraper, configure the following in `scraper_options.py`:\n\n1. **WebDriver Path:** Set the path to your WebDriver executable\n2. **Search Terms:** Define your list of search terms in the `search_terms` variable\n3. **Page Limit:** Set the number of pages to scrape per search term (max: 240)\n\n## Usage\n\nRun the main scraper:\n```bash\npython main_scraper.py\n```\n\nThe scraper will:\n- Display progress for each page scraped\n- Provide summary statistics for each search term\n- Output results to `raw_data.csv`\n\n### File Structure\n\n- `main_scraper.py` - Main execution script\n- `scraper_functions.py` - Core scraping functions\n- `scraper_options.py` - Configuration settings\n- `raw_data.csv` - Example output data\n\n## Data Schema\n\nThe scraper extracts 16 columns of data per listing:\n\n| Column | Description | Type |\n|--------|-------------|------|\n| `Title` | Listing title | String |\n| `Shop_Name` | Anonymized shop identifier | Integer |\n| `Is_Ad` | Advertisement flag (1/0) | Integer |\n| `Star_Rating` | Shop's overall star rating | Float |\n| `Num_Reviews` | Total shop reviews | Integer |\n| `Price` | Item price | Float |\n| `Is_Bestseller` | Bestseller tag flag (1/0) | Integer |\n| `Num_Sales` | Total shop sales | Integer |\n| `Num_Basket` | Items in customer baskets | Integer |\n| `Description` | Listing description | String |\n| `Days_to_Arrival` | Estimated delivery days | Integer |\n| `Cost_Delivery` | Shipping cost | Float |\n| `Returns_Accepted` | Return policy flag (1/0) | Integer |\n| `Dispatched_From` | Origin country | String |\n| `Num_Images` | Number of listing images | Integer |\n| `Category` | Search term used | String |\n\n## Performance\n\n- **Current Speed:** ~15,500 records in 24 hours\n- **Rate Limiting:** Built-in delays to prevent IP blocking\n- **Memory Usage:** Optimized for large datasets\n\n## Customization Options\n\n### Remove Shop Name Anonymization\nTo retain actual shop names, comment out the anonymization code in `main_scraper.py`.\n\n### Potential Enhancements\n- **Performance:** Implement headless browsing and concurrent processing\n- **Additional Data:** Sale status, personalization options, bulk availability\n- **Review Analysis:** Extract and analyze customer reviews\n- **Advanced Filtering:** Category-specific data extraction\n\n## Known Limitations\n\n- Scraping speed is conservative to avoid IP blocking\n- Maximum 240 pages per search term (Etsy limitation)\n- Delivery costs displayed in user's local currency\n- Requires active internet connection and WebDriver maintenance\n\n## Troubleshooting\n\n### Common Issues\n- **WebDriver errors:** Ensure WebDriver version matches your browser\n- **IP blocking:** Increase delays in scraper settings\n- **Missing data:** Check Etsy's page structure for changes\n\n## Contributing\n\nContributions are welcome! Please feel free to:\n- Submit bug reports and feature requests\n- Improve scraping efficiency\n- Add new data extraction capabilities\n- Enhance error handling\n\n## Resources\n\n- [Web Scraping Best Practices](https://medium.com/swlh/improve-your-web-scraper-with-limited-retry-loops-python-35e21730cbf5)\n- [Selenium Documentation](https://selenium-python.readthedocs.io/)\n\n## License\n\nThis project is provided as-is for educational and research purposes. Please ensure compliance with Etsy's Terms of Service and robots.txt when using this scraper.\n\n## Contact\n\nFor questions or support, please open an issue on GitHub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigdata5911%2Fetsy-scrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbigdata5911%2Fetsy-scrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigdata5911%2Fetsy-scrapper/lists"}