https://github.com/thepravin/amazon-web-scripting
amazon jupyter-notebook python web webscraper webscraping webscraping-data
- Host: GitHub
- URL: https://github.com/thepravin/amazon-web-scripting
- Owner: thepravin
- Created: 2025-02-25T10:51:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-25T10:57:48.000Z (over 1 year ago)
- Last Synced: 2025-10-22T05:20:24.886Z (8 months ago)
- Topics: amazon, jupyter-notebook, python, web, webscraper, webscraping, webscraping-data
- Language: Jupyter Notebook
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Amazon Product Scraper for Samsung Products
This Jupyter Notebook scrapes product information from Amazon's search results for "Samsung" and extracts details such as product titles, prices, and ratings. The data is then cleaned and saved to a CSV file for further analysis.
## Features
- **Web Scraping**: Extracts product details from Amazon search results.
- **Data Cleaning**: Filters out invalid entries (e.g., "Page 1 of 1" prices).
- **CSV Export**: Saves the cleaned dataset to `amaon_data.csv` (note the typo in the filename).
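The cleaning and export steps could look roughly like the sketch below. It assumes the scraped rows are already in a pandas `DataFrame` and that a valid price is a dollar amount such as `$499.99`. The helper name `clean_products`, the regular expression, and the sample rows are illustrative assumptions, not taken from the notebook.

```python
import pandas as pd


def clean_products(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose price looks like a dollar amount (assumed format)."""
    price = df["price"].fillna("").astype(str).str.strip()
    # Assumption: a valid price is "$" followed by digits, optional thousands
    # separators, and optional cents, e.g. "$499.99" or "$1,299".
    is_valid = price.str.fullmatch(r"\$\d[\d,]*(?:\.\d{1,2})?")
    return df[is_valid].reset_index(drop=True)


# Made-up rows for illustration; the second one mimics the "Page 1 of 1" case.
raw = pd.DataFrame(
    {
        "title": ["Example phone", "Example widget", "Example earbuds"],
        "price": ["$499.99", "Page 1 of 1", "$249.99"],
        "rating": ["4.5", "4.0", "4.1"],
    }
)

cleaned = clean_products(raw)
# The filename keeps the spelling used by the project.
cleaned.to_csv("amaon_data.csv", index=False)
```

Matching against a price pattern, rather than dropping one known bad string, also removes other non-price text that the selector might pick up.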
## Requirements
- Python 3.x
- Libraries: `beautifulsoup4`, `requests`, `pandas`, `numpy`
## Installation
1. Install the required libraries:
```bash
pip install beautifulsoup4 requests pandas numpy
```
## Usage
1. Run the Jupyter Notebook `Main.ipynb`.
2. The script will:
   - Send a request to Amazon's search page for "Samsung".
   - Extract product links from the search results.
   - Scrape title, price, and rating from each product page.
   - Clean the data by removing entries with invalid prices.
   - Save the results to `amaon_data.csv`.
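As a minimal sketch of the per-product scraping step, the function below parses one product page with BeautifulSoup. The `productTitle` and `a-offscreen` selectors come from this README. The `a-icon-alt` rating selector, the "4.5 out of 5 stars" text format, the helper names, and the sample HTML are assumptions made for illustration and may not match the notebook or the live site.

```python
from bs4 import BeautifulSoup


def _text(tag) -> str:
    """Return the stripped text of a tag, or "" when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


def parse_product_page(html: str) -> dict:
    """Extract title, price, and rating strings from one product page."""
    soup = BeautifulSoup(html, "html.parser")
    title = _text(soup.find(id="productTitle"))
    price = _text(soup.find("span", class_="a-offscreen"))
    # Assumption: the rating is rendered as text like "4.5 out of 5 stars"
    # inside a span with class "a-icon-alt".
    rating_text = _text(soup.find("span", class_="a-icon-alt"))
    rating = rating_text.split()[0] if rating_text else ""
    return {"title": title, "price": price, "rating": rating}


# Hand-written HTML standing in for a downloaded product page.
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">   Example Samsung Phone   </span>
  <span class="a-price"><span class="a-offscreen">$499.99</span></span>
  <span class="a-icon-alt">4.5 out of 5 stars</span>
</body></html>
"""

product = parse_product_page(SAMPLE_HTML)
print(product)
# {'title': 'Example Samsung Phone', 'price': '$499.99', 'rating': '4.5'}
```

Missing elements come back as empty strings instead of raising an error, so incomplete rows can be dropped later in the cleaning step.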
## Data Output
The final dataset includes the following columns:
- `title`: Product name.
- `price`: Product price (formatted as a string, e.g., `$499.99`).
- `rating`: Product rating (out of 5, extracted as a string like `4.5`).
Example output:
| title | price | rating |
|------------------------------------------------------|-----------|--------|
| SAMSUNG Galaxy S24 FE AI Phone, 128GB Unlocked... | $499.99 | 4.5 |
| SAMSUNG Galaxy Buds 3 Pro AI True Wireless Blu... | $249.99 | 4.1 |
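Because `price` and `rating` are stored as strings, further analysis usually needs a numeric conversion first. The sketch below shows one way to do that with pandas. It reads the two example rows from an in-memory CSV instead of `amaon_data.csv`, and the `price_usd` and `rating_num` column names are hypothetical additions.

```python
import io

import pandas as pd

# The two example rows above, written as CSV text. Titles contain commas,
# so they are quoted.
CSV_TEXT = '''title,price,rating
"SAMSUNG Galaxy S24 FE AI Phone, 128GB Unlocked...",$499.99,4.5
"SAMSUNG Galaxy Buds 3 Pro AI True Wireless Blu...",$249.99,4.1
'''

# dtype=str keeps every column as text, matching how the data is described.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# Strip "$" and thousands separators, then convert. Values that cannot be
# parsed become NaN instead of raising an error.
df["price_usd"] = pd.to_numeric(
    df["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)
df["rating_num"] = pd.to_numeric(df["rating"], errors="coerce")

print(df[["price_usd", "rating_num"]])
```

To work with the real output, replace `io.StringIO(CSV_TEXT)` with the path to the exported file.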
## Notes
- **Selectors**: The script uses specific HTML class/ID selectors (e.g., `productTitle`, `a-offscreen`). These may change over time, requiring updates to the code.
- **User-Agent**: The request sends a browser-like `User-Agent` header (`USER_AGENT`) so that it resembles a real browser request.
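A hedged sketch of how such a request might be built with `requests` follows. The search URL, the `k` query parameter, the header values, and the function names are assumptions for illustration. The notebook's actual values may differ, and the User-Agent string is only a placeholder.

```python
import requests

# Placeholder browser-like headers (assumed values, not from the notebook).
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.5",
}

# Assumption: the search endpoint and its "k" query parameter.
SEARCH_URL = "https://www.amazon.com/s"


def build_search_request(query: str) -> requests.PreparedRequest:
    """Prepare (but do not send) a GET request for the search page."""
    request = requests.Request(
        "GET", SEARCH_URL, params={"k": query}, headers=HEADERS
    )
    return request.prepare()


def fetch_search_page(query: str, timeout: float = 10.0) -> str:
    """Send the prepared request and return the HTML, raising on HTTP errors."""
    with requests.Session() as session:
        response = session.send(build_search_request(query), timeout=timeout)
    response.raise_for_status()
    return response.text


# Inspect the request without touching the network.
prepared = build_search_request("Samsung")
print(prepared.url)  # https://www.amazon.com/s?k=Samsung
```

Preparing the request separately from sending it makes it possible to check the URL and headers offline. Only `fetch_search_page` performs network access.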
🧑‍💻 Happy coding!