https://github.com/nikhleshshukla123/web-scraping-using-python
Scrapes multiple pages of Amazon search results using Python.
- Host: GitHub
- URL: https://github.com/nikhleshshukla123/web-scraping-using-python
- Owner: Nikhleshshukla123
- Created: 2025-10-03T13:50:10.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-10-03T14:09:17.000Z (4 months ago)
- Last Synced: 2025-10-03T16:10:05.041Z (4 months ago)
- Topics: beautifulsoup4, numpy, pandas, python
- Language: Jupyter Notebook
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
README
# Amazon Product Scraper
## Project Overview
This project is a Python-based web scraper that collects product information from Amazon search results. Using **Requests** and **BeautifulSoup**, it extracts details such as product title, price, rating, number of reviews, and availability status, and saves the data into a CSV file for further analysis.
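As a rough illustration of that workflow, the sketch below fetches a single search-results page and extracts the fields named above. The `User-Agent` header and the CSS selectors are assumptions for illustration; Amazon's real markup changes frequently, so the notebook's actual selectors may differ.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like headers are an assumption: Amazon usually rejects the
# default requests User-Agent string.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def scrape_page(url):
    """Fetch one Amazon search-results page and extract product fields."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    products = []
    # Illustrative selectors only; Amazon's real class names change over time.
    for card in soup.select("div.s-result-item"):
        title = card.select_one("h2 span")
        price = card.select_one("span.a-price span.a-offscreen")
        rating = card.select_one("span.a-icon-alt")
        products.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
            "rating": rating.get_text(strip=True) if rating else None,
        })
    return products
```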
---
## Features
- Scrapes multiple pages of Amazon search results.
- Extracts product information:
- Product Title
- Price
- Rating
- Number of Reviews
- Availability
- Saves the collected data to a CSV file.
- Includes **error handling** and polite delays to avoid being blocked by Amazon (see the sketch below).
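A minimal sketch of that retry-and-delay pattern, using only `time` and `random` from the standard library; the retry count and delay window are illustrative choices, not values taken from the notebook.

```python
import random
import time

import requests

def polite_get(url, headers=None, retries=3):
    """GET a URL, retrying on failure with a randomized pause in between."""
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
            return response
        except requests.RequestException as exc:
            print(f"Attempt {attempt + 1} failed: {exc}")
        # Sleep 2-5 seconds so requests do not arrive at a fixed rhythm.
        time.sleep(random.uniform(2, 5))
    return None
```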
---
## Technologies Used
- Python 3.x
- Libraries:
- `requests` – for making HTTP requests
- `BeautifulSoup` – for parsing HTML
- `pandas` – for data storage and manipulation
  - `numpy` – for handling missing values (see the example after this list)
- `time` and `random` – for delays between requests
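As a concrete example of the `numpy` point above, fields that could not be scraped are typically stored as `np.nan` rather than empty strings, so pandas recognizes them as missing. This snippet is illustrative, not code from the notebook.

```python
import numpy as np
import pandas as pd

# A product whose price was not found gets np.nan instead of "",
# so pandas' isna()/fillna()/dropna() handle it correctly.
df = pd.DataFrame([
    {"title": "Example product", "price": np.nan},
    {"title": "Another product", "price": "$499.99"},
])
print(df["price"].isna())  # True for the missing price, False otherwise
```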
---
## Project Structure
```
amazon_scraper/
│
├── amazon_scraper.ipynb   # Jupyter notebook with the full scraping workflow
├── amazon_data.csv        # CSV output file with scraped data
├── README.md              # Project documentation
└── requirements.txt       # Required Python libraries
```
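The contents of `requirements.txt` are not reproduced in this README; judging from the Technologies section, it would list something like the following (`time` and `random` ship with Python and need no entry):

```text
requests
beautifulsoup4
pandas
numpy
```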
---
## Usage
1. Clone the repository, or open the notebook on Kaggle or in a local Jupyter environment.
2. Install the required libraries:

   ```bash
   pip install -r requirements.txt
   ```
3. Open the notebook and run the cells in order.
4. Scraped product data will be saved as:
   - `/kaggle/working/amazon_data.csv` (Kaggle)
   - `amazon_data.csv` (local)
5. You can modify the search term and the number of pages to scrape (a sketch of how these settings drive the loop follows this list):

   ```python
   BASE_URL = "https://www.amazon.com/s?k=playstation+5&crid=3G12O79UMR7B1&sprefix=playstation+5%2Caps%2C414&ref=nb_sb_noss_1"
   TOTAL_PAGES = 5
   ```
6. Output example: the CSV contains one row per product, with the fields listed under Features (title, price, rating, number of reviews, availability); an illustrative header row follows below.
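A plausible way the two settings from step 5 drive the multi-page loop and the final CSV write. Here `scrape_page` is the hypothetical parser sketched in the Project Overview, and the `&page=` query parameter is Amazon's usual pagination mechanism; the notebook's exact loop may differ.

```python
import pandas as pd

# Hypothetical driver loop: scrape_page() is the sketch from the
# Project Overview section, not necessarily the notebook's function.
rows = []
for page in range(1, TOTAL_PAGES + 1):
    rows.extend(scrape_page(f"{BASE_URL}&page={page}"))

# Save everything to the CSV path mentioned in step 4.
pd.DataFrame(rows).to_csv("amazon_data.csv", index=False)
```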
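Assuming the column names mirror the fields listed under Features (the notebook's exact names are not shown here), the first line of `amazon_data.csv` would look roughly like:

```text
title,price,rating,reviews,availability
```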