https://github.com/ddayto21/lead-scraper
This repository contains a web crawler that searches webpages for email addresses, along with a web-scraping script that collects leads from various webpages, filters the discovered links against some criteria, and adds new links to a queue. The HTML, or specific information extracted from it, is passed to a separate pipeline for processing.
- Host: GitHub
- URL: https://github.com/ddayto21/lead-scraper
- Owner: ddayto21
- Created: 2022-07-18T23:10:19.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-07-19T00:22:42.000Z (about 3 years ago)
- Last Synced: 2025-04-30T08:54:05.070Z (6 months ago)
- Topics: beautifulsoup4, python, requests, webcrawler, webscraper, yellow-pages
- Language: Python
- Homepage:
- Size: 50.8 KB
- Stars: 15
- Watchers: 1
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Repository Overview
This repository was built to help business owners save time by collecting thousands of business leads from Yellow Pages, a directory of over 27 million businesses in the United States.
We use `requests`, a Python HTTP library, to collect large amounts of unstructured data from Yellow Pages. We then use BeautifulSoup to parse the relevant information out of the HTML. Finally, we use Pandas to build dataframes and save the leads to `.csv` files that can be used for marketing campaigns.
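The end-to-end flow above (fetch HTML, parse it with BeautifulSoup, load the rows into Pandas, export CSV) can be sketched on a small inline sample. Note that the `result`, `business-name`, and `phones` class names here are hypothetical placeholders; the real class names on Yellow Pages pages may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical listing markup standing in for a fetched Yellow Pages page;
# real class names on the live site may differ.
SAMPLE_HTML = """
<div class="result"><a class="business-name">Acme Plumbing</a><div class="phones">(555) 123-4567</div></div>
<div class="result"><a class="business-name">Best Roofing</a><div class="phones">(555) 987-6543</div></div>
"""

def extract_leads(html: str) -> pd.DataFrame:
    """Parse listing HTML and return one row per business."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for result in soup.find_all("div", class_="result"):
        rows.append({
            "name": result.find(class_="business-name").get_text(strip=True),
            "phone": result.find(class_="phones").get_text(strip=True),
        })
    return pd.DataFrame(rows)

leads = extract_leads(SAMPLE_HTML)
csv_text = leads.to_csv(index=False)  # in the real pipeline: leads.to_csv("leads.csv")
```

In the real script the HTML would come from `requests.get(...)` rather than an inline string, and the CSV would be written to disk.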
## Install the Requests Library
```
$ pip install requests
```

## Import the Requests Library
```python
import requests
```
## Send an HTTP Request to the Server
```python
response = requests.get(url)
```
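Before sending the request, the search URL has to be assembled. One way to do this, letting `requests` handle the query-string encoding, is sketched below; the `search_terms` and `geo_location_terms` parameter names are assumptions about the Yellow Pages search endpoint, not confirmed values.

```python
import requests

# Hypothetical Yellow Pages search endpoint; the real URL scheme may differ.
BASE_URL = "https://www.yellowpages.com/search"

def build_search_url(term: str, location: str) -> str:
    """Build a search URL with properly encoded query parameters.

    Using requests.Request + prepare() encodes the params without
    sending any network traffic.
    """
    req = requests.Request(
        "GET",
        BASE_URL,
        params={"search_terms": term, "geo_location_terms": location},
    )
    return req.prepare().url
```

The resulting URL can then be passed to `requests.get(url)` as shown above.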
## Extract Relevant Data from Response
We use BeautifulSoup, a Python library that makes it easy to parse data from HTML files.

### Install the Beautiful Soup Library
```
$ pip install beautifulsoup4
```

### Import the Beautiful Soup Library
```python
from bs4 import BeautifulSoup
```
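As a concrete parsing example tied to the email crawler described at the top of this page, here is a minimal sketch of extracting email addresses from a page's text with BeautifulSoup and a regular expression. The sample HTML and the regex are illustrative assumptions, not the repository's exact implementation.

```python
import re
from bs4 import BeautifulSoup

# Simple email pattern for illustration; real-world email matching
# is more nuanced than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Inline sample standing in for a fetched response.text.
SAMPLE_PAGE = """
<html><body>
<p>Contact us at <a href="mailto:sales@example.com">sales@example.com</a></p>
<p>Support: support@example.com</p>
</body></html>
"""

def find_emails(html: str) -> set[str]:
    """Return the set of email addresses found in the page's visible text."""
    soup = BeautifulSoup(html, "html.parser")
    return set(EMAIL_RE.findall(soup.get_text()))
```

Running `find_emails` over each crawled page and merging the resulting sets would give a deduplicated list of contact emails for the lead pipeline.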