https://github.com/hjsblogger/web-crawling-with-python
Demonstration of Web Crawling using Python and Beautiful Soup
- Host: GitHub
- URL: https://github.com/hjsblogger/web-crawling-with-python
- Owner: hjsblogger
- Created: 2025-05-05T05:40:55.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-03T17:17:54.000Z (11 months ago)
- Last Synced: 2025-07-03T18:30:01.029Z (11 months ago)
- Topics: beautifulsoup, beautifulsoup4, lambdatest, python, python3, web-crawler, web-crawling, web-crawling-and-scraping
- Language: Python
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Web Crawling with Python
Image generated using Grok
In this 'Web Crawling with Python' repo, we have covered the following scenario:
Unique links from the [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/) are crawled using Beautiful Soup. Product metadata is then scraped from the crawled pages, also with Beautiful Soup. I have a detailed blog and repo on **Web Scraping with Python**; details below:
* [Blog - Web Scraping with Python](https://www.lambdatest.com/blog/web-scraping-with-python/)
* [Repo - Web Scraping with Python](https://github.com/hjsblogger/web-scraping-with-python)
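The crawl-then-scrape flow starts by collecting links from a page. The sketch below is not the repo's code: it swaps Beautiful Soup for the standard library's `html.parser` so it runs without extra dependencies, and it assumes product URLs can be recognised by a `product_id=` query parameter (the hypothetical `marker` argument). Links are resolved to absolute URLs with `urljoin`, restricted to the page's host, and de-duplicated in first-seen order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "index.php?..." to absolute URLs
            self.links.append(urljoin(self.base_url, href))


def extract_product_links(html: str, base_url: str,
                          marker: str = "product_id=") -> list[str]:
    """Return unique same-host links containing `marker`, in first-seen order.

    `marker` is an assumed pattern for product URLs, not taken from the repo.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    unique: dict[str, None] = {}  # dict keys keep insertion order
    for link in collector.links:
        if urlparse(link).netloc == host and marker in link:
            unique.setdefault(link, None)
    return list(unique)
```

With Beautiful Soup, the collector class would roughly reduce to iterating over `soup.find_all("a", href=True)`.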
## Pre-requisites for test execution
**Step 1**
Create a virtual environment by running the *virtualenv venv* command in the terminal:
```bash
virtualenv venv
```

**Step 2**
Activate the newly created virtual environment by running the *source venv/bin/activate* command in the terminal:
```bash
source venv/bin/activate
```
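Before installing anything, you can optionally confirm that the activated environment is the one in use. This small standard-library check is not part of the repo; it compares `sys.prefix` with `sys.base_prefix`, which differ inside a `venv` or modern `virtualenv` environment (older `virtualenv` releases set `sys.real_prefix` instead):

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    # Legacy virtualenv (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```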
Follow Step 3 to install the packages required for web crawling and scraping:
**Step 3**
Run the *make install* command in the terminal to install the required packages (or dependencies), such as Beautiful Soup and urllib3.
```bash
make install
```
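To check that the installation worked, a sketch like the following (a hypothetical helper, not in the repo) asks `importlib.util.find_spec` whether each import name resolves in the active environment. The names `bs4` (the import name of the beautifulsoup4 package) and `urllib3` come from the step above; the repo's actual dependency list may be longer.

```python
import importlib.util

# Import names assumed from the README ("Beautiful Soup, urllib3, etc.").
REQUIRED_MODULES = ("bs4", "urllib3")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing modules:", ", ".join(missing) if missing else "none")
```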

With this, all the dependencies and environment variables are set. We are all set for web crawling with Beautiful Soup (bs4).
## Web Crawling using Beautiful Soup
Follow the steps below to crawl the [LambdaTest E-commerce Playground](https://ecommerce-playground.lambdatest.io/):
**Step 1**
Run the ```make clean``` command to remove the `__pycache__` folder(s) and `.pyc` files.
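The Makefile's `clean` recipe isn't shown here; as a rough, assumed equivalent, the same cleanup can be expressed in Python with `pathlib` and `shutil` (the function name is hypothetical):

```python
import shutil
from pathlib import Path


def clean_bytecode(root: str = ".") -> int:
    """Delete __pycache__ directories and stray .pyc files under root.

    Returns the number of paths removed.
    """
    base = Path(root)
    removed = 0
    # Materialise the matches first so deleting does not disturb the walk.
    for cache_dir in list(base.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed += 1
    # Then remove any .pyc files that live outside a __pycache__ folder.
    for pyc in list(base.rglob("*.pyc")):
        if pyc.is_file():
            pyc.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    print("removed", clean_bytecode("."), "path(s)")
```

This walks every subdirectory, including `venv`; deleting bytecode there is harmless because Python regenerates it on the next import.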
**Step 2**
Run the ```make crawl-ecommerce-playground``` command in the terminal to crawl the LambdaTest E-commerce Playground.
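Conceptually, crawling is a breadth-first traversal: visit a page, queue the same-site links it contains, and skip anything already seen. Below is a minimal sketch, not the repo's implementation, assuming a caller-supplied `get_links(url)` function (hypothetical) that fetches a page and returns its absolute links, for instance an HTTP request followed by Beautiful Soup link extraction. Injecting it keeps the traversal testable offline; `max_pages` is an assumed safety cap, and URL fragments are stripped with `urldefrag` so `page#reviews` and `page` count as one page.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urlparse


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl restricted to the start URL's host.

    Returns the visited URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    start, _ = urldefrag(start_url)
    visited: list[str] = []
    seen = {start}            # every URL ever queued, to avoid revisits
    queue = deque([start])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            link, _ = urldefrag(link)
            # Stay on the same site and never queue a URL twice
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```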


The content from the LambdaTest E-commerce Playground was crawled successfully! Fifty-five unique product links are exported to a JSON file (i.e., ecommerce_crawled_urls.json) and are now ready to be scraped.
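The exported file lets the scraping step run independently of the crawl. Assuming the file holds a flat JSON array of URL strings (the repo's actual layout may differ), writing and reading it could look like this; `save_urls` and `load_urls` are hypothetical names:

```python
import json
from pathlib import Path


def save_urls(urls, path: str = "ecommerce_crawled_urls.json") -> int:
    """De-duplicate urls (keeping first-seen order) and write a JSON array.

    Returns the number of unique URLs written.
    """
    unique = list(dict.fromkeys(urls))
    Path(path).write_text(json.dumps(unique, indent=2), encoding="utf-8")
    return len(unique)


def load_urls(path: str = "ecommerce_crawled_urls.json") -> list[str]:
    """Read back the JSON array written by save_urls."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```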
**Step 3**
Now that we have the crawled links, run the ```make scrap-ecommerce-playground``` command in the terminal to scrape the product information (i.e., product name, product price, product availability, etc.) from the URLs in the exported JSON file.
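For the scraping step itself, here is a sketch of pulling named fields out of a product page. The tag/class pairs in `SELECTORS` are hypothetical placeholders, not the playground's real markup, so inspect a product page and adjust them; as before, the standard library's `html.parser` stands in for Beautiful Soup. Each field takes the text of the first matching element, with whitespace collapsed.

```python
from html.parser import HTMLParser

# Hypothetical selectors: field -> (tag, CSS class or None).
# These are placeholders, NOT the playground's actual markup.
SELECTORS = {
    "name": ("h1", None),
    "price": ("span", "price"),
    "availability": ("span", "stock"),
}


class ProductParser(HTMLParser):
    """Captures the text of the first element matching each selector."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.fields = {}
        self._cur_field = None  # field currently being captured
        self._cur_tag = None    # tag that opened the captured element
        self._depth = 0         # nesting depth of that same tag
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._cur_field is not None:
            if tag == self._cur_tag:
                self._depth += 1  # same tag nested inside the captured one
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.selectors.items():
            if field not in self.fields and tag == want_tag and (
                    want_class is None or want_class in classes):
                self._cur_field, self._cur_tag = field, tag
                self._depth, self._buf = 1, []
                return

    def handle_endtag(self, tag):
        if self._cur_field is not None and tag == self._cur_tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the captured text
                self.fields[self._cur_field] = " ".join("".join(self._buf).split())
                self._cur_field = None

    def handle_data(self, data):
        if self._cur_field is not None:
            self._buf.append(data)


def scrape_product(html: str, selectors=SELECTORS) -> dict:
    """Return {field: text} for every selector that matched."""
    parser = ProductParser(selectors)
    parser.feed(html)
    parser.close()
    return parser.fields
```

In Beautiful Soup terms, each selector corresponds roughly to `soup.find(tag, class_=css_class)` followed by `.get_text(strip=True)`.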


All 55 links are scraped without any issues!
## Have feedback or need assistance?
Feel free to fork the repo and contribute to make it better! Email [himanshu[dot]sheth[at]gmail[dot]com](mailto:himanshu.sheth@gmail.com) with any queries, or ping me on the following social media sites:
LinkedIn: [@hjsblogger](https://linkedin.com/in/hjsblogger)
Twitter: [@hjsblogger](https://www.twitter.com/hjsblogger)