Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dilkushsingh/webscraping-with-selenium-and-beautifulsoup
Scraped a popular tech gadgets website using Selenium and BeautifulSoup, then performed data analysis on the scraped data.
beautifulsoup data datacleaning datagathering eda exploratory-data-analysis python selenium webscraping
Last synced: 28 days ago
- Host: GitHub
- URL: https://github.com/dilkushsingh/webscraping-with-selenium-and-beautifulsoup
- Owner: dilkushsingh
- License: mit
- Created: 2024-08-09T09:26:55.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-08-24T11:21:34.000Z (3 months ago)
- Last Synced: 2024-10-11T18:22:29.109Z (28 days ago)
- Topics: beautifulsoup, data, datacleaning, datagathering, eda, exploratory-data-analysis, python, selenium, webscraping
- Language: Jupyter Notebook
- Homepage: https://medium.com/@dilkushsingh/web-scraping-tech-gadgets-site-unleashing-the-power-of-selenium-and-beautifulsoup-e97f6f47689f
- Size: 1.47 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web Scraping with Selenium and BeautifulSoup
This project scrapes smartphone data from a popular tech gadgets website, then cleans the data and runs Exploratory Data Analysis (EDA) on it. The goal is to gather comprehensive data on the smartphones listed on that website and analyze it for insights.
## Project Overview
1. **Web Scraping**: [scrapping.py](https://github.com/dilkushsingh/WebScraping-with-Selenium-and-BeautifulSoup/blob/main/scrapping.py) fetches the HTML source code from the website. It handles multiple pages dynamically and stores the data in a single HTML file.
2. **Data Gathering**: The scraped HTML is loaded into a BeautifulSoup object, the relevant fields are extracted into a DataFrame, and the DataFrame is exported to a CSV file for further analysis.
## Web Scraping
The web scraping is implemented using Selenium. The script dynamically handles page navigation to scrape data from multiple pages. Here’s a brief overview of the scraping process:
- **Desired Page**: The smartphone data lives on a specific page, which the script reaches by clicking specific elements.
- **Filtering Data**: A filter restricts the results to phones priced above Rs. 5000.
- **Navigating Pages**: The scraper starts from the first page and moves through every available page until no pages remain.
- **Data Storage**: The HTML of each page is appended to a single HTML file.
## Data Fetching
The scraped data is available directly on GitHub, along with the notebooks for data cleaning and EDA. To pull the dataset straight into a Jupyter notebook instead, run the following in a notebook cell:
```bash
!pip3 install kaggle
!kaggle datasets download -d dilkushsingh/smartphones-dataset-upto-july24
```
## Prerequisites
Ensure you have the following installed:
- **Python 3.x**
- **Selenium**: if it is missing, install it with `pip install selenium`.
- **BeautifulSoup**: if it is missing, install it with `pip install beautifulsoup4`.
- **Web Browser**: preferably one that has a compatible web driver.
- **ChromeDriver**: the web driver that matches the author's browser; use the driver that matches your own browser.
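
With these prerequisites in place, the page-by-page loop described in the Web Scraping section could look roughly like the sketch below. It is not the contents of `scrapping.py`: the function names, the `next_selector` parameter, the fixed `delay`, and the `max_pages` guard are all assumptions made for illustration. The clicks that reach the smartphones page and apply the price filter are left out. The sketch also assumes that the last page is recognisable by the absence of a "next" element, which the README does not state.

```python
import time
from pathlib import Path

# Selenium's By.CSS_SELECTOR constant is the string "css selector"; using the
# literal keeps the helper below usable without importing Selenium.
CSS_SELECTOR = "css selector"


def save_all_pages(driver, next_selector, out_path, delay=2.0, max_pages=500):
    """Append each page's HTML to out_path, clicking "next" until it is gone.

    Returns the number of pages saved. All parameter names are hypothetical.
    """
    out_file = Path(out_path)
    out_file.write_text("", encoding="utf-8")  # start from an empty file
    pages = 0
    while pages < max_pages:  # guard against an endless "next" loop
        with out_file.open("a", encoding="utf-8") as fh:
            fh.write(driver.page_source)
        pages += 1
        # find_elements returns an empty list (no exception) when nothing matches
        next_buttons = driver.find_elements(CSS_SELECTOR, next_selector)
        if not next_buttons:
            break
        next_buttons[0].click()
        time.sleep(delay)  # crude fixed wait for the next page to render
    return pages


def scrape(start_url, next_selector, out_path):
    """Run the loop in a real Chrome session (needs Selenium and a driver)."""
    from selenium import webdriver  # imported lazily so save_all_pages stays standalone

    driver = webdriver.Chrome()
    try:
        driver.get(start_url)
        return save_all_pages(driver, next_selector, out_path)
    finally:
        driver.quit()
```

A call such as `scrape(start_url, "a.next", "pages.html")` would then write every page into `pages.html`; the selector `"a.next"` and the file name are placeholders, and the real values depend on the target site's markup. A fixed sleep is the simplest possible wait, and an explicit wait on a page element would be more robust on slow connections.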