https://github.com/mariam-khediri/amazon-scraper

A configurable web scraper that extracts product data from Amazon while evading bot detection
https://github.com/mariam-khediri/amazon-scraper

custom-chrome-flags pandas python selenium selenium-webdriver webdriver-manager

Last synced: about 2 months ago
JSON representation

A configurable web scraper that extracts product data from Amazon while evading bot detection

Host: GitHub
URL: https://github.com/mariam-khediri/amazon-scraper
Owner: mariam-khediri
Created: 2025-05-14T16:50:54.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-15T15:13:58.000Z (about 1 year ago)
Last Synced: 2025-07-18T13:45:54.209Z (11 months ago)
Topics: custom-chrome-flags, pandas, python, selenium, selenium-webdriver, webdriver-manager
Language: Python
Homepage:
Size: 7.81 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Amazon Product Scraper with Selenium

![Python](https://img.shields.io/badge/python-3.8%2B-blue)
![Selenium](https://img.shields.io/badge/selenium-4.15.2-red)

A configurable web scraper that extracts product data from Amazon while evading bot detection.

## Features

- **Stealth scraping** with anti-detection techniques
- **CAPTCHA handling** with manual intervention support
- **CSV export** of product data (title, price, rating)
- **Pagination support** for multi-page scraping
- **Human-like behavior** with randomized delays

## Tech Stack

| Component | Technology |
|-----------|------------|
| Core Language | Python 3.8+ |
| Browser Automation | Selenium WebDriver |
| Chrome Management | webdriver-manager |
| Data Export | pandas |
| Anti-Detection | Custom Chrome flags |

## Installation

1. Clone the repository:
```bash
git clone https://github.com/yourusername/amazon-scraper.git
cd amazon-scraper
```

2. Set up virtual environment:
```bash
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
```

3. Install dependencies:
```bash
pip install -r requirements.txt
```

## Configuration

Edit `config.py`:
```python
SEARCH_TERM = "wireless headphones" # Your target product
BASE_URL = "https://www.amazon.com" # Regional domain if needed
MAX_PAGES = 3 # Pages to scrape
```

## Usage

Run the scraper:
```bash
python scraper.py
```

For debugging:
```bash
python -u scraper.py # Unbuffered output
```

## File Structure

```
amazon-scraper/
├── scraper.py # Main scraping logic
├── config.py # Configuration settings
├── requirements.txt # Dependencies
└── outputs/ # Generated CSV files
```

## Legal Disclaimer

This project is for educational purposes only. Always:
- Check Amazon's Terms of Service
- Respect robots.txt rules
- Limit request frequency
- Consider using official APIs when available

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mariam-khediri/amazon-scraper

Awesome Lists containing this project

README