{"id":29604764,"url":"https://github.com/levonhak/email-scraper-google-sheets","last_synced_at":"2026-05-12T23:38:37.947Z","repository":{"id":304813643,"uuid":"1020073582","full_name":"LevonHak/email-scraper-google-sheets","owner":"LevonHak","description":"A robust Python scraper that extracts contact emails, LinkedIn \u0026 Facebook URLs from websites, syncing results into Google Sheets.","archived":false,"fork":false,"pushed_at":"2025-07-15T15:25:48.000Z","size":37,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-15T23:00:06.600Z","etag":null,"topics":["automation","colab","contact-finder","data-extraction","email-scraper","facebook-scraper","google-sheets","linkedin-scraper","playwright","python-script","regex","web-scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LevonHak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-15T09:50:26.000Z","updated_at":"2025-07-15T15:25:51.000Z","dependencies_parsed_at":"2025-07-16T00:38:42.622Z","dependency_job_id":"5f2a90cf-8bcd-4c7f-a5c3-ca5102beb32c","html_url":"https://github.com/LevonHak/email-scraper-google-sheets","commit_stats":null,"previous_names":["levonhak/email-scraper-google-sheets"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/LevonHak/email-scraper-google-sheets","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LevonHak%2Femail-scraper-google-sheets","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LevonHak%2Femail-scraper-google-sheets/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LevonHak%2Femail-scraper-google-sheets/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LevonHak%2Femail-scraper-google-sheets/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LevonHak","download_url":"https://codeload.github.com/LevonHak/email-scraper-google-sheets/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LevonHak%2Femail-scraper-google-sheets/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278159465,"owners_count":25939995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","colab","contact-finder","data-extraction","email-scraper","facebook-scraper","google-sheets","linkedin-scraper","playwright","python-script","regex","web-scraper"],"created_at":"2025-07-20T16:00:25.918Z","updated_at":"2025-10-03T12:16:54.914Z","avatar_url":"https://github.com/LevonHak.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# email-scraper-google-sheets\nA robust Python scraper that extracts contact emails, LinkedIn 
\u0026amp; Facebook URLs from websites, syncing results into Google Sheets.\n\n# PLEASE NOTE THAT THE CODE IS WRITTEN ON GOOGLE COLAB.\n\n# Email Scraper with Google Sheets Integration\n\nA powerful Python web scraper that extracts contact emails, Facebook and LinkedIn profile links, and source URLs from websites - then syncs everything into Google Sheets using a service account.\n\n##  Features\n\n-  Multi-phase scraping: homepage, contact pages, and subpages\n-  Smart email extraction: handles obfuscation and scores matches based on domain, keywords, and frequency\n-  Social media detection: identifies relevant Facebook \u0026 LinkedIn links while skipping generic ones\n-  Google Sheets sync: writes results directly to a specified sheet with retry and validation logic\n-  Timeout control and interruption support for long-running processes\n-  Real-time logging with timestamped outputs for easy traceability\n\n##  Setup Instructions\n\n### 1. Google Colab or Local Jupyter Notebook\n\nThis scraper is designed to run in [Google Colab](https://colab.research.google.com) or any Python notebook environment. If you're using a restricted device (e.g., admin-controlled macOS), Colab is ideal.\n\n### 2. Service Account Setup\n\nTo authenticate with Google Sheets and Drive:\n\n- Go to Google Cloud Console → create a **Service Account**\n- Enable the **Google Sheets API** and **Google Drive API**\n- Download credentials and name the file exactly:  \n  **`service-account.json`**\n- Upload it to your Colab environment or place it in your working directory\n\n\u003e 🔐 Your script specifically looks for the file at `/content/service-account.json`, so do not rename it!\n\n### 3. 
Google Sheet Structure\n\nYour sheet should have these columns, with headers in the first row:\n\n| Column | Column Name         | Description                                     |\n|--------|---------------------|-------------------------------------------------|\n| A      | `Website`           | Website URL to analyze                          |\n| B      | `Emails`            | Valid domain-matching email                     |\n| C      | `Suspicious Email`  | Valid email that doesn't match the domain       |\n| D      | `Facebook URL`      | Scraped Facebook profile link                   |\n| E      | `LinkedIn URL`      | Scraped LinkedIn page/company profile           |\n| F      | `Email Source URL`  | Page where the email was found                  |\n\n\u003e 📝 If any column is missing, the script will automatically create and populate it with default values.\n\n### 4. Paste Your Google Sheet ID\n\nInside the `main()` function, set this variable to your sheet's ID:\n\n`sheet_id = \"PASTE-YOUR-ID-HERE\"`\n\n### 5. Run the Script\n\nCall `main()` to launch threaded scraping.\n\n## 📦 Dependencies\n\nMake sure these packages are installed (the script handles it automatically):\n\n\n## 🧑‍💻 Author\n\nBuilt by [LevonHak](https://github.com/LevonHak) - automation enthusiast \u0026 web scraping architect.\n\n\n---\n\n## ⚖️ LICENSE\n\nMIT License\n\nCopyright (c) 2025 LevonHak\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the 
Software...\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevonhak%2Femail-scraper-google-sheets","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flevonhak%2Femail-scraper-google-sheets","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flevonhak%2Femail-scraper-google-sheets/lists"}