Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ashad001/emailscraping
- Host: GitHub
- URL: https://github.com/ashad001/emailscraping
- Owner: Ashad001
- Created: 2023-11-05T17:14:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-29T10:43:03.000Z (8 months ago)
- Last Synced: 2024-03-30T10:58:19.125Z (8 months ago)
- Language: Python
- Size: 56.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Email Scraping with Selenium
This project is a Python-based educational demonstration of scraping email addresses from web pages using Selenium and BeautifulSoup. It is intended as a hands-on learning exercise for anyone interested in web scraping and data extraction.
## Project Description
The script navigates through Google search results, looking for LinkedIn profiles related to specific marketing tags. It then extracts email addresses found on these pages. The extracted emails are stored in a DataFrame along with the associated tag and country, and finally exported to a CSV file.
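The extraction and export steps described above can be sketched without a live browser. The snippet below is a minimal illustration, not the repository's actual code: the HTML, the regex, the image-extension filter, and the `tag`/`country` values are all assumptions standing in for what Selenium's `driver.page_source` and the real pipeline would provide.

```python
import csv
import re

# Hypothetical HTML, standing in for what Selenium's driver.page_source
# would return for a real search-result page.
page_source = """
<html><body>
  <p>Contact: jane.doe@example.com for marketing inquiries.</p>
  <img src="logo@2x.png">
  <p>Also reach sales@example.org</p>
</body></html>
"""

# A simple email pattern; the repository's actual regex may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Filenames like "logo@2x.png" also match the pattern above, so filter
# out common image extensions to reduce false positives.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def extract_emails(html: str) -> list[str]:
    """Return unique email-like strings found in the page text."""
    emails = []
    for match in EMAIL_RE.findall(html):
        if match.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip false positives such as image filenames
        if match not in emails:
            emails.append(match)
    return emails

emails = extract_emails(page_source)

# Write the results with an associated tag and country, mirroring the
# DataFrame -> CSV step (column names and values are illustrative).
with open("emails.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email", "tag", "country"])
    for email in emails:
        writer.writerow([email, "digital-marketing", "US"])

print(emails)
```

In the real script, `page_source` would come from a Selenium WebDriver session and the rows would be accumulated in a pandas DataFrame before export; the stdlib `csv` module is used here only to keep the sketch self-contained.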
## Technologies Used
- Python
- Selenium WebDriver
- BeautifulSoup
- pandas

## How to Run
1. Install the required Python libraries with pip:
```bash
pip install selenium beautifulsoup4 pandas
```
2. Run the script:
```bash
python main.py
```

## Note
This project is for educational purposes only. Web scraping should be done responsibly and in accordance with the terms of service of the website being scraped. Always respect privacy and do not use this for spam or any form of unsolicited communication.
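One practical way to honor a site's scraping rules is to consult its `robots.txt` before fetching pages. The sketch below uses the standard-library `urllib.robotparser`; the rules shown are made up for illustration, and in practice you would call `set_url(...)` and `read()` against the live site instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt inline (hypothetical rules); a real check
# would use parser.set_url("https://example.com/robots.txt"); parser.read()
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /search",
    "Allow: /public",
])

# Check specific URLs before scraping them.
print(parser.can_fetch("*", "https://example.com/search?q=emails"))
print(parser.can_fetch("*", "https://example.com/public/page.html"))
```

`can_fetch` returns `False` for the disallowed `/search` path and `True` for the allowed `/public` path, so the scraper can skip URLs it should not visit.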
## Future Improvements
- Implement more robust error handling.
- Improve the email extraction process to reduce false positives.

## Acknowledgements
- [Selenium](https://www.selenium.dev/)
- [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [pandas](https://pandas.pydata.org/)