An open API service indexing awesome lists of open source software.

https://github.com/metaphysics0/scrape-amazon-reviews


https://github.com/metaphysics0/scrape-amazon-reviews

Last synced: 10 months ago
JSON representation

Awesome Lists containing this project

README

          

# Amazon Review Extractor

A simple JavaScript tool to extract 4 and 5-star Amazon reviews from product pages and save them as structured JSON data.

## What it does

This script automatically extracts high-rating reviews (4-5 stars) from Amazon product review pages and organizes them into a clean JSON format with:

- Author name
- Review date
- Review title/header
- Review text
- Product details (size, color, etc.)
- Star rating
- Verification status
- Profile images
- And more metadata

## How to use

### For Non-Developers

1. **Go to an Amazon product's review page**
- Navigate to any Amazon product
- Click on the customer reviews section
- Make sure you can see the individual reviews on the page

2. **Open Browser Developer Tools**
- **Chrome/Edge**: Press `F12` or right-click → "Inspect"
- **Firefox**: Press `F12` or right-click → "Inspect Element"
- **Safari**: Enable Developer menu first, then press `Option + Cmd + I`

3. **Go to the Console tab**
- In the developer tools panel, click on the "Console" tab

4. **Copy and paste the script**
- Copy the entire JavaScript code from `amazon-review-extractor.js`
- Paste it into the console and press Enter

5. **Navigate through review pages**
- Go to the next page of reviews (click "Next" button)
- Run the script again by typing `extractHighRatingReviews()` and pressing Enter
- Repeat for as many pages as you want

6. **Download your data**
- Type `downloadReviewsAsJSON()` and press Enter
- A JSON file will automatically download with all your collected reviews

### For Developers

```javascript
// Initialize and extract reviews from current page
extractHighRatingReviews();

// Get all collected reviews
const allReviews = getAllReviews();

// Clear the collection
clearReviewCollection();

// Extract without adding to global collection
const currentPageOnly = extractHighRatingReviews(false);

// Download as file
downloadReviewsAsJSON('my-reviews.json');
```

## Output Format

The script generates a JSON array with this structure:

```json
[
{
"author": "John Doe",
"date": "Reviewed in the United States on January 15, 2025",
"review": "Great product! Highly recommend it.",
"header": "Excellent quality",
"metadata": {
"starRating": 5.0,
"productDetails": "Size: Large | Color: Blue",
"verifiedPurchase": "Verified Purchase",
"reviewId": "R1JWWMD5G5H2WV",
"avatarImage": "https://...",
"images": ["https://..."]
}
}
]
```

## Features

- ✅ **Automatic deduplication** - Won't add the same review twice
- ✅ **Multi-page support** - Collect reviews across multiple pages
- ✅ **Persistent storage** - Data persists as you navigate between pages
- ✅ **Error handling** - Gracefully handles missing or malformed data
- ✅ **Export functionality** - Download results as JSON file
- ✅ **Metadata rich** - Captures images, product details, and verification status

## Helpful Commands

While in the browser console:

```javascript
// Check how many reviews you've collected
console.log(`Total reviews: ${getAllReviews().length}`);

// View all collected reviews
console.log(getAllReviews());

// Start over with a fresh collection
clearReviewCollection();

// Download your data
downloadReviewsAsJSON();
```

## Use Cases

- **Market Research**: Analyze customer sentiment and feedback
- **Product Development**: Understand what customers love about products
- **Content Creation**: Gather authentic customer testimonials
- **Data Analysis**: Build datasets for sentiment analysis or ML projects
- **Academic Research**: Study consumer behavior and reviews

## Browser Compatibility

Works in all modern browsers:
- Chrome 60+
- Firefox 55+
- Safari 12+
- Edge 79+

## Important Notes

⚠️ **Legal and Ethical Use**: This tool is for personal research and analysis. Always respect Amazon's Terms of Service and robots.txt. Don't use this for commercial scraping or to violate any terms of service.

⚠️ **Rate Limiting**: Don't run this too aggressively. Navigate pages at a normal human pace to be respectful to Amazon's servers.

⚠️ **Data Privacy**: Be mindful of the personal information in reviews when sharing or publishing extracted data.

## Troubleshooting

**Script not finding reviews?**
- Make sure you're on an Amazon product review page
- Check that reviews are visible on the page
- Try refreshing the page and running the script again

**No data being collected?**
- Ensure you're looking at 4-5 star reviews (the script filters out lower ratings)
- Check the browser console for any error messages

**Downloaded file is empty?**
- Make sure you've run `extractHighRatingReviews()` at least once before downloading
- Check that `getAllReviews().length` returns a number greater than 0

## Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

## License

MIT License - feel free to use this tool for your projects!