Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pb319/scrap_with_selenium
Let's dive deeper into the domain of web scraping using Selenium.
- Host: GitHub
- URL: https://github.com/pb319/scrap_with_selenium
- Owner: pb319
- Created: 2024-08-24T00:40:44.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T07:23:40.000Z (5 months ago)
- Last Synced: 2024-10-11T18:22:06.119Z (4 months ago)
- Topics: beautifulsoup, pandas, pandas-dataframe, python, python-script, selenium
- Language: HTML
- Homepage:
- Size: 584 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Scrap_with_Selenium
Let's dive deeper into the domain of web scraping using Selenium. This repository uses the browser-automation tool Selenium to scrape web pages and then uses Beautiful Soup to parse the saved HTML files and extract the specific elements of interest.

### Table of Contents
- [Resources](https://github.com/pb319/Scrap_with_Selenium#resource)
- [Objective](https://github.com/pb319/Scrap_with_Selenium#objective)
- [Approach](https://github.com/pb319/Scrap_with_Selenium#approach)
- [Output Files](https://github.com/pb319/Scrap_with_Selenium#output-files)

#### Resource:
- YouTube Video Link: [Click Here](https://www.youtube.com/watch?v=XI5_nsClCYI&t=197s)
- Tech Stack: `Selenium`, `Beautiful Soup`, `Pandas`
- Selenium Getting Started: [Selenium](https://selenium-python.readthedocs.io/getting-started.html)
- Beautiful Soup: [Beautiful Soup](https://beautiful-soup-4.readthedocs.io/en/latest/#quick-start)

#### Objective:
- Create a database of laptops available on `amazon.in`.

#### Approach:
- Export the HTML-formatted search results from each available results page to the local machine, one page at a time.
- Extract the required elements (`title`, `price`, `link`) from each saved HTML file.
- Finally, export everything as a single CSV file (see the sketch below).
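The sketch below illustrates this three-step flow with Selenium, Beautiful Soup, and pandas. The search URL, page count, CSS selectors, and file names are illustrative assumptions and are not taken from the repository's `collect.py`.

```python
# Illustrative sketch only: the search URL, page count, CSS selectors, and
# file names below are assumptions and may differ from the actual collect.py.
from pathlib import Path

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver

DATA_DIR = Path("Data")
DATA_DIR.mkdir(exist_ok=True)

# Step 1: use Selenium to save each search-results page as a local HTML file.
driver = webdriver.Chrome()
for page in range(1, 4):  # number of pages is an assumption
    driver.get(f"https://www.amazon.in/s?k=laptop&page={page}")
    (DATA_DIR / f"page_{page}.html").write_text(driver.page_source, encoding="utf-8")
driver.quit()

# Step 2: parse the saved files with Beautiful Soup and pull out title, price, link.
rows = []
for html_file in sorted(DATA_DIR.glob("*.html")):
    soup = BeautifulSoup(html_file.read_text(encoding="utf-8"), "html.parser")
    for card in soup.select("div[data-component-type='s-search-result']"):
        title = card.h2.get_text(strip=True) if card.h2 else None
        price = card.select_one("span.a-price-whole")
        link = card.select_one("h2 a")
        rows.append({
            "title": title,
            "price": price.get_text(strip=True) if price else None,
            "link": link["href"] if link else None,
        })

# Step 3: export everything as a single CSV file with pandas.
pd.DataFrame(rows).to_csv("data.csv", index=False)
```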
#### Output Files:
- [Python Script](https://github.com/pb319/Scrap_with_Selenium/blob/main/collect.py)
- [HTML Files](https://github.com/pb319/Scrap_with_Selenium/tree/main/Data)
- [CSV File](https://github.com/pb319/Scrap_with_Selenium/blob/main/data.csv)
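Once the CSV is generated, it can be inspected with pandas. In this minimal check, the file name and column names (`title`, `price`, `link`) are assumed from the Approach section above.

```python
# Quick sanity check on the exported CSV; the file name and column names
# (title, price, link) are assumptions based on the Approach section.
import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())
print(f"Collected {len(df)} laptop listings")
```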