Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/dilkushsingh/webscraping-with-selenium-and-beautifulsoup

Web-scraped a popular tech gadgets website using Selenium and BeautifulSoup, and performed data analysis on the scraped data.

beautifulsoup data datacleaning datagathering eda exploratory-data-analysis python selenium webscraping

Last synced: about 1 month ago


Host: GitHub
URL: https://github.com/dilkushsingh/webscraping-with-selenium-and-beautifulsoup
Owner: dilkushsingh
License: mit
Created: 2024-08-09T09:26:55.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-08-24T11:21:34.000Z (6 months ago)
Last Synced: 2024-11-21T08:53:36.172Z (3 months ago)
Topics: beautifulsoup, data, datacleaning, datagathering, eda, exploratory-data-analysis, python, selenium, webscraping
Language: Jupyter Notebook
Homepage: https://medium.com/@dilkushsingh/web-scraping-tech-gadgets-site-unleashing-the-power-of-selenium-and-beautifulsoup-e97f6f47689f
Size: 1.47 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


README

# Web Scraping with Selenium and BeautifulSoup

This project scrapes smartphone data from a popular tech gadgets website, then cleans it and performs exploratory data analysis (EDA). The goal is to gather comprehensive data on the smartphones listed on that website and analyze it for insights.

## Project Overview

1. **Web Scraping**: [scrapping.py](https://github.com/dilkushsingh/WebScraping-with-Selenium-and-BeautifulSoup/blob/main/scrapping.py) fetches the HTML source from the website. It handles pagination dynamically and stores the HTML of all pages in a single file.
2. **Data Gathering**: The scraped HTML is loaded into a BeautifulSoup object, the relevant fields are extracted, and a DataFrame is built from them. The DataFrame is then exported to a CSV file for further analysis.
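The Data Gathering step can be sketched as follows. This is a minimal illustration, not the project's notebook: it assumes each product on the saved page sits in a `div.product` card with `.name`, `.price`, and `.rating` children. Those class names and the `parse_products` helper are hypothetical, so the real site's selectors will differ. It uses BeautifulSoup's built-in `html.parser` and pandas for the DataFrame.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical field selectors; replace them with the real site's markup.
FIELDS = {"name": ".name", "price": ".price", "rating": ".rating"}


def parse_products(html: str) -> pd.DataFrame:
    """Extract one row per product card from the saved HTML.

    Values are kept as raw strings; converting them to numbers is left
    to the data cleaning step.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product"):  # hypothetical card selector
        row = {}
        for field, selector in FIELDS.items():
            tag = card.select_one(selector)
            # Missing fields become None instead of raising AttributeError
            row[field] = tag.get_text(strip=True) if tag else None
        rows.append(row)
    return pd.DataFrame(rows, columns=list(FIELDS))
```

With the combined HTML file from the scraper, something like `parse_products(open("smartphones.html", encoding="utf-8").read()).to_csv("smartphones.csv", index=False)` would produce the CSV; both file names are placeholders.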

## Web Scraping

Scraping is implemented with Selenium, and the script handles page navigation dynamically to collect data from multiple pages. Here’s a brief overview of the process:
- **Desired Page**: The smartphone listings are on a specific page, which the script reaches by clicking through specific elements.
- **Filtering Data**: A filter is applied so that only phones priced above Rs 5,000 are included.
- **Navigating Pages**: The scraper starts from the first page and moves through the results until no more pages remain.
- **Data Storage**: The HTML of each page is appended to a single HTML file.
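The pagination and storage steps above might look roughly like this. It is a hedged sketch rather than the code in `scrapping.py`: the `scrape_all_pages` helper and the `next_css` selector are hypothetical, and the loop assumes the "next page" control disappears on the last page. It takes an already-open WebDriver, so navigating to the listing and applying the price filter happen before it is called.

```python
import time
from pathlib import Path

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR,
# so this helper can be defined without importing Selenium.
CSS_SELECTOR = "css selector"


def scrape_all_pages(driver, out_path, next_css, delay=2.0):
    """Append the HTML of every results page to one file; return the page count.

    `driver` is a Selenium WebDriver already showing the first filtered
    results page. `next_css` is a hypothetical CSS selector for the
    "next page" control; the loop stops once no element matches it.
    """
    pages = 0
    with Path(out_path).open("w", encoding="utf-8") as out:
        while True:
            out.write(driver.page_source)  # rendered HTML of the current page
            out.write("\n")
            pages += 1
            # find_elements returns an empty list (no exception) when nothing matches
            next_links = driver.find_elements(CSS_SELECTOR, next_css)
            if not next_links:
                break  # no "next" control: last page reached
            next_links[0].click()
            time.sleep(delay)  # crude wait; WebDriverWait is more robust
    return pages
```

With Selenium installed, a run could create `webdriver.Chrome()`, call `driver.get(...)` on the listing page, apply the filter, call `scrape_all_pages(driver, "smartphones.html", "a.next")`, and finish with `driver.quit()`; the URL and selector are placeholders. A fixed `time.sleep` keeps the sketch short, but an explicit wait on a page element is usually more reliable.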

## Data Fetching
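The scraped data is available in this GitHub repository along with notebooks for data cleaning and EDA. It is also published as a Kaggle dataset, which you can download directly with the commands below (in a Jupyter notebook cell, prefix each command with `!`). The Kaggle CLI needs a Kaggle API token (`kaggle.json`) set up before it can download.
```bash
pip3 install kaggle
kaggle datasets download -d dilkushsingh/smartphones-dataset-upto-july24
```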
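Once the CSV is loaded into a DataFrame (for example with `pd.read_csv`), a natural first cleaning step is turning price strings into integers and checking that the Rs 5,000 filter from the scraping step actually held. This sketch assumes a `price` column holding strings such as `₹12,999` or `Rs. 4,499`; the column name, the formats, and the `parse_price`/`clean_prices` helpers are all hypothetical and may not match the published dataset or the author's cleaning notebook.

```python
import re

import pandas as pd

MIN_PRICE = 5000  # the scraping filter: only phones priced above Rs 5,000


def parse_price(raw):
    """Turn a price string like '₹12,999' or 'Rs. 4,499' into an int (None if absent)."""
    if pd.isna(raw):
        return None
    # Drop thousands separators, then take the first number (decimals allowed)
    match = re.search(r"\d+(?:\.\d+)?", str(raw).replace(",", ""))
    return int(float(match.group())) if match else None


def clean_prices(df, column="price"):
    """Return (cleaned copy, rows priced below MIN_PRICE) for a quick sanity check."""
    out = df.copy()
    # Nullable Int64 keeps missing prices as <NA> instead of forcing floats
    out[column] = out[column].map(parse_price).astype("Int64")
    # Comparisons against <NA> yield <NA>; treat those rows as not below the minimum
    below = out[(out[column] < MIN_PRICE).fillna(False)]
    return out, below
```

If `below` comes back non-empty, those rows slipped past the website filter and are worth inspecting before EDA.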

## Prerequisites

Ensure you have the following installed:

- **Python 3.x**
- **Selenium**: install with `pip install selenium` if it is missing.
- **BeautifulSoup**: install with `pip install beautifulsoup4` if it is missing.
- **Web Browser**: preferably one with a compatible WebDriver.
- **ChromeDriver**: the WebDriver version that matches your installed Chrome (Selenium 4.6 and later can also fetch a matching driver automatically via Selenium Manager).
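To confirm these prerequisites are visible from the Python environment you will run the scraper in, a small check like the one below may help. It is an illustrative helper (`check_prerequisites` is a hypothetical name): it only looks for the importable modules `selenium` and `bs4` (the module that `beautifulsoup4` installs) and for a `chromedriver` executable on `PATH`. It does not check versions or browser compatibility.

```python
import importlib.util
import shutil

# Import names, not pip package names: beautifulsoup4 installs as "bs4".
REQUIRED_MODULES = ["selenium", "bs4"]


def check_prerequisites():
    """Map each prerequisite to True if it is found in this environment."""
    status = {
        name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES
    }
    # chromedriver only needs to be on PATH if Selenium Manager is not used
    status["chromedriver"] = shutil.which("chromedriver") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:<13} {'found' if ok else 'missing'}")
```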