https://github.com/gajendrasharma-github/web-scraping
Using Selenium and Beautiful Soup
beautifulsoup python scraping-websites selenium
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/gajendrasharma-github/web-scraping
- Owner: gajendrasharma-github
- Created: 2024-08-16T14:08:05.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-19T18:22:55.000Z (almost 2 years ago)
- Last Synced: 2025-05-16T07:10:04.387Z (about 1 year ago)
- Topics: beautifulsoup, python, scraping-websites, selenium
- Language: Jupyter Notebook
- Homepage:
- Size: 721 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Guide to the notebooks:
`1.` Scraping Ambition Box to gather company reviews, ratings, and other important information [Link](https://github.com/gajendrasharma-github/Web-Scraping/blob/main/Ambition%20Box%20Data%20Scraping.ipynb)
`2.` Scraping Ajio using Selenium, with a step-by-step process and guiding notes [Link](https://github.com/gajendrasharma-github/Web-Scraping/blob/main/Webscraping%20Selenium%20Ajio%20with%20Notes.ipynb)
`3.` Scraping Laptop Details from Amazon for a comprehensive analysis [Link](https://github.com/gajendrasharma-github/Web-Scraping/blob/main/Scraping%20Laptop%20Details%20from%20Amazon.ipynb)
`4.` Scraping Laptop Details from Amazon for the brand Asus [Link](https://github.com/gajendrasharma-github/Web-Scraping/blob/main/Extracting%20Laptop%20Details%20for%20Brand%20Asus%20Using%20Selenium.ipynb)
`5.` Scraping Election Outcomes from the Election Commission of India Results Website [Link](https://github.com/gajendrasharma-github/Web-Scraping/blob/main/Election%20Results%20Scraping.ipynb)
## Introduction to Web Scraping
Web scraping is the automated process of extracting information from websites. It involves fetching the HTML content of a web page, parsing it, and extracting the desired information for analysis or further processing. It is an efficient way to gather large amounts of data and is often used for tasks such as price comparison, product review analysis, and job listing aggregation.
### Key Components of Web Scraping:
1. **HTML Parsing:** The process of breaking down the HTML structure of a webpage to access specific elements like text, images, and links.
2. **Data Extraction:** Identifying and extracting the relevant information from the parsed HTML, such as product prices, names, reviews, and more.
3. **Handling Dynamic Content:** Many websites use JavaScript to load content after the initial page load. Scraping such sites often requires driving a real browser so that the scripts run before the data is captured.
4. **Data Storage:** Once the data is extracted, it is often stored in a structured format such as CSV, JSON, or a database for further analysis.
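
The parsing, extraction, and storage components above can be sketched on a small inline page. This is a minimal illustration, not code from the notebooks: the markup in `SAMPLE_HTML`, the CSS class names, and the helpers `text_or_none`, `extract_products`, and `to_csv` are all hypothetical, and real sites will need their own selectors. Dynamic content is not covered here.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched product listing page.
SAMPLE_HTML = """
<div class="product">
  <h2 class="title">Laptop A</h2>
  <span class="price">54,990</span>
  <span class="rating">4.3</span>
</div>
<div class="product">
  <h2 class="title">Laptop B</h2>
  <span class="price">72,500</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def extract_products(html):
    """HTML parsing and data extraction: one dict per product card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product"):
        rows.append({
            "title": text_or_none(card, ".title"),
            "price": text_or_none(card, ".price"),
            "rating": text_or_none(card, ".rating"),
        })
    return rows


def to_csv(rows):
    """Data storage: serialize the extracted rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "price", "rating"])
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty fields
    return buffer.getvalue()


products = extract_products(SAMPLE_HTML)
print(to_csv(products))
```

Returning `None` for a missing element, rather than raising, keeps one incomplete card (here, the second laptop has no rating) from aborting the whole run.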
## Project Overview
This repository contains a series of scripts and notebooks developed from scratch to scrape data from popular websites, including Amazon, Ambition Box, Ajio, and the Election Commission of India results website. Each one demonstrates a practical approach to web scraping, from simple static pages to dynamic websites that require browser automation.
### Websites Scraped:
- **Amazon:** Extracting product details such as names, prices, ratings, and reviews.
- **Ambition Box:** Gathering company reviews, ratings, and employee feedback.
- **Ajio:** Scraping product listings, prices, discounts, and availability.
- **Election Commission of India results website:** Scraping election outcomes.
### Libraries and Tools Used:
- **BeautifulSoup:** For parsing HTML and navigating the page structure to extract specific elements.
- **Requests:** For sending HTTP requests to fetch the HTML content of web pages.
- **Selenium:** For handling dynamic content and interacting with JavaScript-rendered pages.
- **Pandas:** For storing and manipulating the scraped data in a tabular format.
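
As a rough sketch of how these four libraries can fit together: Requests or Selenium obtains the HTML, BeautifulSoup parses it, and Pandas holds the result. The function names, the `div.item`, `.name`, and `.price` selectors, and the demo markup are assumptions made for illustration and are not taken from the notebooks. The Selenium helper also assumes that Chrome and a matching driver are available locally.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_static_html(url):
    """Fetch a page whose content is present in the initial HTML response."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def fetch_rendered_html(url, wait_selector, timeout=15):
    """Load a JavaScript-rendered page in a browser and return its HTML.

    Selenium is imported here so the other helpers work without it installed.
    Assumes Chrome and a matching driver are available on this machine.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one matching element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even on timeout


def parse_listings(html):
    """Turn product cards into a DataFrame (selectors are hypothetical)."""
    soup = BeautifulSoup(html, "html.parser")
    records = [
        {
            "name": card.select_one(".name").get_text(strip=True),
            "price": float(card.select_one(".price").get_text(strip=True)),
        }
        for card in soup.select("div.item")
    ]
    return pd.DataFrame(records, columns=["name", "price"])


# Offline demonstration with inline markup instead of a live page.
demo_html = """
<div class="item"><span class="name">Shirt</span><span class="price">799</span></div>
<div class="item"><span class="name">Jeans</span><span class="price">1499</span></div>
"""
df = parse_listings(demo_html)
print(df)
```

For a live page, the HTML would come from `fetch_static_html(url)` or, when the listings are rendered by JavaScript, from `fetch_rendered_html(url, "div.item")`. The resulting DataFrame can then be saved with `df.to_csv("listings.csv", index=False)`.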