Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ashad001/emailscraping
- Host: GitHub
- URL: https://github.com/ashad001/emailscraping
- Owner: Ashad001
- Created: 2023-11-05T17:14:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-29T10:43:03.000Z (8 months ago)
- Last Synced: 2024-03-30T10:58:19.125Z (8 months ago)
- Language: Python
- Size: 56.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Email Scraping with Selenium
This project is a Python-based educational demonstration of scraping email addresses from web pages using Selenium and BeautifulSoup. It is intended as a hands-on learning exercise for anyone interested in web scraping and data extraction.
## Project Description
The script navigates through Google search results, looking for LinkedIn profiles related to specific marketing tags. It then extracts email addresses found on these pages. The extracted emails are stored in a DataFrame along with the associated tag and country, and finally exported to a CSV file.
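The extraction and export steps described above can be sketched without a live browser. The snippet below is a minimal illustration, not the repository's actual code: the HTML, the regex, the image-extension filter, and the `tag`/`country` values are all assumptions standing in for what Selenium's `driver.page_source` and the real pipeline would provide.

```python
import csv
import re

# Hypothetical HTML, standing in for what Selenium's driver.page_source
# would return for a real search-result page.
page_source = """
<html><body>
  <p>Contact: jane.doe@example.com for marketing inquiries.</p>
  <img src="logo@2x.png">
  <p>Also reach sales@example.org</p>
</body></html>
"""

# A simple email pattern; the repository's actual regex may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Filenames like "logo@2x.png" also match the pattern above, so filter
# out common image extensions to reduce false positives.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def extract_emails(html: str) -> list[str]:
    """Return unique email-like strings found in the page text."""
    emails = []
    for match in EMAIL_RE.findall(html):
        if match.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip false positives such as image filenames
        if match not in emails:
            emails.append(match)
    return emails

emails = extract_emails(page_source)

# Write the results with an associated tag and country, mirroring the
# DataFrame -> CSV step (column names and values are illustrative).
with open("emails.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email", "tag", "country"])
    for email in emails:
        writer.writerow([email, "digital-marketing", "US"])

print(emails)
```

In the real script, `page_source` would come from a Selenium WebDriver session and the rows would be accumulated in a pandas DataFrame before export; the stdlib `csv` module is used here only to keep the sketch self-contained.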
## Technologies Used
- Python
- Selenium WebDriver
- BeautifulSoup
- pandas

## How to Run
1. Install the required Python libraries with pip:
```bash
pip install selenium beautifulsoup4 pandas
```
2. Run the script:
```bash
python main.py
```

## Note
This project is for educational purposes only. Web scraping should be done responsibly and in accordance with the terms of service of the website being scraped. Always respect privacy and do not use this for spam or any form of unsolicited communication.
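One practical way to honor a site's scraping rules is to consult its `robots.txt` before fetching pages. The sketch below uses the standard-library `urllib.robotparser`; the rules shown are made up for illustration, and in practice you would call `set_url(...)` and `read()` against the live site instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt inline (hypothetical rules); a real check
# would use parser.set_url("https://example.com/robots.txt"); parser.read()
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /search",
    "Allow: /public",
])

# Check specific URLs before scraping them.
print(parser.can_fetch("*", "https://example.com/search?q=emails"))
print(parser.can_fetch("*", "https://example.com/public/page.html"))
```

`can_fetch` returns `False` for the disallowed `/search` path and `True` for the allowed `/public` path, so the scraper can skip URLs it should not visit.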
## Future Improvements
- Implement more robust error handling.
- Improve the email extraction process to reduce false positives.

## Acknowledgements
- [Selenium](https://www.selenium.dev/)
- [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [pandas](https://pandas.pydata.org/)