{"id":13826124,"url":"https://github.com/darideveloper/phone-emails-scraper-multithreading","last_synced_at":"2026-03-11T10:36:43.299Z","repository":{"id":141224054,"uuid":"586142453","full_name":"darideveloper/phone-emails-scraper-multithreading","owner":"darideveloper","description":"Project for extract emails and phones from a list of web pages, with multithreading, using requests, bs4, regex and selenium for get more data.","archived":false,"fork":false,"pushed_at":"2023-11-27T05:08:30.000Z","size":42,"stargazers_count":6,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-11-20T04:33:32.578Z","etag":null,"topics":["python","script","web-automation","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/darideveloper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-01-07T04:39:34.000Z","updated_at":"2024-01-11T00:44:33.000Z","dependencies_parsed_at":"2023-11-27T06:24:45.743Z","dependency_job_id":null,"html_url":"https://github.com/darideveloper/phone-emails-scraper-multithreading","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/darideveloper/phone-emails-scraper-multithreading","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darideveloper%2Fphone-emails-scraper-multithreading","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darideveloper%2Fphone-emails-scraper-multithreading/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/d
arideveloper%2Fphone-emails-scraper-multithreading/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darideveloper%2Fphone-emails-scraper-multithreading/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/darideveloper","download_url":"https://codeload.github.com/darideveloper/phone-emails-scraper-multithreading/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darideveloper%2Fphone-emails-scraper-multithreading/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264365456,"owners_count":23596837,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["python","script","web-automation","web-scraping"],"created_at":"2024-08-04T09:01:32.424Z","updated_at":"2026-03-11T10:36:43.257Z","avatar_url":"https://github.com/darideveloper.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\u003cdiv\u003e\u003ca href='https://github.com/darideveloper/phone-emails-scraper-multithreading/blob/master/LICENSE' target='_blank'\u003e\n            \u003cimg src='https://img.shields.io/github/license/darideveloper/phone-emails-scraper-multithreading.svg?style=for-the-badge' alt='MIT License' height='30px'/\u003e\n        \u003c/a\u003e\u003ca href='https://www.linkedin.com/in/francisco-dari-hernandez-6456b6181/' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=LinkedIn\u0026color=0A66C2\u0026logo=LinkedIn\u0026logoColor=FFFFFF\u0026label=' 
alt='Linkedin' height='30px'/\u003e\n            \u003c/a\u003e\u003ca href='https://t.me/darideveloper' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=Telegram\u0026color=26A5E4\u0026logo=Telegram\u0026logoColor=FFFFFF\u0026label=' alt='Telegram' height='30px'/\u003e\n            \u003c/a\u003e\u003ca href='https://github.com/darideveloper' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=GitHub\u0026color=181717\u0026logo=GitHub\u0026logoColor=FFFFFF\u0026label=' alt='Github' height='30px'/\u003e\n            \u003c/a\u003e\u003ca href='https://www.fiverr.com/darideveloper?up_rollout=true' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=Fiverr\u0026color=222222\u0026logo=Fiverr\u0026logoColor=1DBF73\u0026label=' alt='Fiverr' height='30px'/\u003e\n            \u003c/a\u003e\u003ca href='https://discord.com/users/992019836811083826' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=Discord\u0026color=5865F2\u0026logo=Discord\u0026logoColor=FFFFFF\u0026label=' alt='Discord' height='30px'/\u003e\n            \u003c/a\u003e\u003ca href='mailto:darideveloper@gmail.com?subject=Hello Dari Developer' target='_blank'\u003e\n                \u003cimg src='https://img.shields.io/static/v1?style=for-the-badge\u0026message=Gmail\u0026color=EA4335\u0026logo=Gmail\u0026logoColor=FFFFFF\u0026label=' alt='Gmail' height='30px'/\u003e\n            \u003c/a\u003e\u003c/div\u003e\u003cdiv align='center'\u003e\u003cbr\u003e\u003cbr\u003e\u003cimg src='https://github.com/darideveloper/phone-emails-scraper-multithreading/raw/master/imgs/logo.png' alt='Phone Emails Scraper Multithreading' height='80px'/\u003e\n\n# Phone Emails Scraper Multithreading\n\nProject for extract emails and phones from a list of web 
pages, with multithreading, using requests, bs4, regex and selenium to get more data.\n\nProject type: **client**\n\n\u003c/div\u003e\u003cbr\u003e\u003cdetails\u003e\n            \u003csummary\u003eTable of Contents\u003c/summary\u003e\n            \u003col\u003e\n\u003cli\u003e\u003ca href='#buildwith'\u003eBuild With\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#media'\u003eMedia\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#details'\u003eDetails\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#install'\u003eInstall\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#settings'\u003eSettings\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#run'\u003eRun\u003c/a\u003e\u003c/li\u003e\n\u003cli\u003e\u003ca href='#roadmap'\u003eRoadmap\u003c/a\u003e\u003c/li\u003e\u003c/ol\u003e\n        \u003c/details\u003e\u003cbr\u003e\n\n# Build with\n\n\u003cdiv align='center'\u003e\u003ca href='https://www.python.org/' target='_blank'\u003e \u003cimg src='https://cdn.svgporn.com/logos/python.svg' alt='Python' title='Python' height='50px'/\u003e \u003c/a\u003e\u003ca href='https://requests.readthedocs.io/en/latest/' target='_blank'\u003e \u003cimg src='https://requests.readthedocs.io/en/latest/_static/requests-sidebar.png' alt='Requests' title='Requests' height='50px'/\u003e \u003c/a\u003e\u003ca href='https://www.crummy.com/software/BeautifulSoup/' target='_blank'\u003e \u003cimg src='https://github.com/darideveloper/darideveloper/blob/main/imgs/logo%20bs4.png?raw=true' alt='BeautifulSoup4' title='BeautifulSoup4' height='50px'/\u003e \u003c/a\u003e\u003ca href='https://www.selenium.dev/' target='_blank'\u003e \u003cimg src='https://cdn.svgporn.com/logos/selenium.svg' alt='Selenium' title='Selenium' height='50px'/\u003e \u003c/a\u003e\u003c/div\u003e\n\n# Details\n\nThis project extracts emails and phones from a list of web pages, with multithreading, using requests, bs4, regex and selenium to get more data.\n\nThe 
script extracts emails and phones from the web pages in the `input .txt` file, and saves the output in the `output.csv` file.\n\nThe script uses multithreading to extract data from the web pages faster.\n\nThe script can use selenium (Google Chrome) to get more data from the web pages, because some web pages use JavaScript to show the data. Its use is optional (see the `USE_SELENIUM` variable in the `.env` file).\n\nYou can set the number of threads in the `.env` file (see the `THREADS` variable).\n\n# Install\n\n## Prerequisites\n\n* [Google Chrome](https://www.google.com/intl/es-419/chrome/)\n* [Python \u003e=3.10](https://www.python.org/)\n* [Git](https://git-scm.com/)\n\n## Installation\n\n1. Clone the repo\n   ```sh\n   git clone https://github.com/darideveloper/phone-emails-scraper-multithreading\n   ```\n2. Install the Python packages (open a terminal in the project folder)\n   ```sh\n   python -m pip install -r requirements.txt \n   ```\n\n# Settings\n\n1. Set your options in the `.env` file\n2. Put the web pages in the `input.csv` file\n\n# Run\n\n1. Run the project folder with Python: \n    ```sh\n    python .\n    ```\n2. Wait until the script finishes, then check the `output.csv` file in the project folder\n\n# Roadmap\n\n- [x] Extract email and phone using requests and bs4\n- [x] Extract email and phone using regex\n- [x] Extract email and phone using selenium\n- [x] Multithreading\n- [x] `.env` file for options\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarideveloper%2Fphone-emails-scraper-multithreading","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarideveloper%2Fphone-emails-scraper-multithreading","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarideveloper%2Fphone-emails-scraper-multithreading/lists"}