{"id":35240838,"url":"https://github.com/oceanside-chess/email-scraper","last_synced_at":"2026-03-17T20:04:28.095Z","repository":{"id":285164979,"uuid":"906870059","full_name":"oceanside-chess/email-scraper","owner":"oceanside-chess","description":"Scrape emails from a website using recursive crawling, the best anti-obfuscation techniques, and validate all addresses before saving to a file.","archived":false,"fork":false,"pushed_at":"2025-03-29T23:34:55.000Z","size":26,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-02T10:32:42.402Z","etag":null,"topics":["bot","email-extraction","email-extractor","email-scraper","email-validation","go","go-package","golang","spider","web-crawler","web-scraper","web-scraping","web-scraping-software","website-scraper"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oceanside-chess.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-22T06:37:16.000Z","updated_at":"2025-06-10T15:13:28.000Z","dependencies_parsed_at":"2025-03-30T00:24:19.212Z","dependency_job_id":"b7336161-c70e-4160-9ede-806b733bf0ce","html_url":"https://github.com/oceanside-chess/email-scraper","commit_stats":null,"previous_names":["pythoript/email-scraper","oceanside-chess/email-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oceanside-chess/email-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oceanside-chess%2Femail-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oceanside-chess%2Femail-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oceanside-chess%2Femail-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oceanside-chess%2Femail-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oceanside-chess","download_url":"https://codeload.github.com/oceanside-chess/email-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oceanside-chess%2Femail-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30630038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T17:32:55.572Z","status":"ssl_error","status_checked_at":"2026-03-17T17:32:38.732Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","email-extraction","email-extractor","email-scraper","email-validation","go","go-package","golang","spider","web-crawler","web-scraper","web-scraping","web-scraping-software","website-scraper"],"created_at":"2025-12-30T05:00:44.708Z","updated_at":"2026-03-17T20:04:28.090Z","avatar_url":"https://github.com/oceanside-chess.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Email Scraper\n\nThis project is designed to defeat as many email obfuscation methods as possible, creating a single bot capable of crawling the web and harvesting emails. It supports common and uncommon obfuscation methods such as Cloudflare email protection, ROT Cipher, HTML entity decoding, RTL (Right-to-Left) obfuscation, JavaScript-based obfuscation, SVG-encoded emails, Hex and Unicode obfuscation, object and iframe embedded addresses, JavaScript hrefs, splitting addresses with comments, Base64 encoding, basic AJAX and API request obfuscation, text-based obfuscation, and many more coming soon!\n\n## Features\n\n- **Email Extraction**: Scrapes email addresses from HTML content.\n- **Obfuscation Handling**: Decodes obfuscated emails, including JavaScript-based methods.\n- **Depth-based Crawling**: Crawls through websites up to a specified depth, staying within the domain or subdirectories.\n- **Email Validation**: Validates email addresses against known standards and checks DNS records for each domain.\n- **Logging**: Outputs logs to a file for debugging and analysis.\n\n## Installation\n\n1. Ensure Go is installed on your system. [Download Go](https://golang.org/dl/).\n2. Clone the repository or download the source code.\n\n```bash\ngit clone https://github.com/Pythoript/email-scraper.git\ncd email-scraper\n```\n\n3. Install dependencies:\n\n```bash\ngo mod tidy\n```\n\n4. Compile the project:\n\n```bash\ngo build -o run\n```\n\n### Command-Line Arguments\n\n- `URL` (required): The URL where the crawl starts.\n- `-v`, `--verbose`: Enable verbose logging.\n- `--disable-cookies`: Disable cookies during requests.\n- `--log \u003clogfile\u003e`: Log output to the specified file.\n- `-o`, `--output \u003cfilename\u003e`: Output file to save scraped emails (default: `emails.txt`).\n- `--skip-validation`: Skip the email validation.\n- `--user-agent \u003cuser-agent\u003e`: Custom User-Agent string for requests.\n- `--max-depth \u003cdepth\u003e`: Set the maximum crawling depth (default: 3).\n- `--domain-mode \u003cmode\u003e`: Set crawling domain mode:\n  - `1`: Stay within the current site (default).\n  - `2`: Explore subdirectories.\n  - `3`: Unrestricted.\n\n### Example\n\nTo run the crawler with verbose output, skip email validation, and save emails to a file:\n\n```bash\n./run https://example.com --verbose --skip-validation --output emails.txt\n```\n\n## Functionality Breakdown\n\n### Email Extraction\n\n- Extracts emails from:\n  - Normal email addresses found in the page content.\n  - Obfuscated emails (like `data-cfemail` attributes).\n  - Emails encoded in SVG images.\n  - Emails obfuscated in JavaScript.\n\n### Depth-based Crawling\n\nThe crawler supports multiple levels of recursion, allowing it to traverse deeper into a website. The `--max-depth` flag controls how many levels deep the crawler will go.\n\n### Logging\n\nLogs are generated for important actions, errors, and other debugging information. You can specify a log file using the `--log` flag.\n\n## TODO\n\n- Add OCR support.\n- Capture redirects to `mailto`.\n- Support CSS pseudo-element encoding.\n- Remove non-visible HTML elements.\n\n## License\n\nThis project is licensed under the AGPL-3.0 License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foceanside-chess%2Femail-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foceanside-chess%2Femail-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foceanside-chess%2Femail-scraper/lists"}