# 🛒 Amazon Parser - Professional Amazon Data Extraction Tool

**Enterprise-grade Amazon scraper with AI-powered automation, anti-detection, and real-time monitoring capabilities.**

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![UnrealOn](https://img.shields.io/badge/UnrealOn-green.svg)](https://pypi.org/project/unrealon/)

## 🎯 Overview

**Amazon Parser** is a production-ready Python tool for extracting product data from Amazon.com. It is built on the **[UnrealOn](https://unrealon.com)** platform ([PyPI package](https://pypi.org/project/unrealon/)). It combines browser-based web scraping with AI-powered automation, measures to avoid anti-bot detection, and real-time orchestration.

**Perfect for**: e-commerce intelligence, price monitoring, product research, competitive analysis, and large-scale Amazon data collection.

## 🚀 Ready-to-Use Project

**This repository is a pre-configured Amazon parser you can run right away:**
- **Complete parser**: Amazon-specific configuration and optimizations are included
- **Quick setup**: Clone the repository, install dependencies, and add your API key (see Quick Start)
- **Production ready**: Includes the enterprise features described below

## 🚀 Key Features

### ✅ **AI-Powered Amazon Data Extraction**
- **Automatic Selector Generation**: AI identifies CSS selectors for Amazon's dynamic layout
- **Smart Content Analysis**: Parses product listings, prices, ratings, and reviews
- **Adaptive Extraction**: Designed to keep working when Amazon changes its page layout

### ✅ **Anti-Detection & Stealth Technology**
- **Advanced Browser Automation**: Chromium with stealth plugins and anti-detection measures
- **Proxy Rotation**: Built-in proxy management for IP rotation
- **Human-like Behavior**: Realistic browsing patterns and timing
- **Cookie Management**: Persistent sessions and browser profiles

### ✅ **Enterprise-Grade Architecture**
- **Real-time Monitoring**: Live parser status and performance metrics
- **Scalable Design**: Horizontal scaling with load balancing
- **Error Recovery**: Automatic retries and failover
- **Compliance Ready**: Audit trails and data governance features

### ✅ **Minimal Configuration**
- **Production Defaults**: Settings tuned for Amazon scraping
- **Auto-detection**: Detects the Amazon page type automatically
- **Smart Waiting**: Waits for dynamically loaded content before extracting it

## 📊 What Data Can You Extract?

### 🛍️ **Product Information**
```json
{
  "asin": "B0DTW26PXY",
  "title": "Lenovo IdeaPad 3 15.6\" HD Laptop",
  "price": {
    "current": 799.99,
    "currency": "USD",
    "original": 899.99,
    "discount_percent": 11
  },
  "rating": {
    "rating": 4.5,
    "review_count": 1247,
    "rating_text": "4.5 out of 5 stars"
  },
  "availability": "In Stock",
  "prime_eligible": true,
  "images": [
    {
      "url": "https://m.media-amazon.com/images/I/71...",
      "is_primary": true
    }
  ],
  "url": "https://www.amazon.com/dp/B0DTW26PXY"
}
```

### 📋 **Search Results & Categories**
- Product listings from search results
- Category browsing and navigation
- Pagination handling
- Filter and sort options
- Sponsored vs. organic listings

### 📈 **Pricing & Availability**
- Current and original prices
- Discount percentages
- Price history tracking
- Stock availability status
- Shipping information

### ⭐ **Reviews & Ratings**
- Star ratings (1-5)
- Review counts
- Rating distribution
- Review text extraction
- Verified purchase badges

## 🏗️ Architecture

### Core Components

```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Amazon Parser │◄──►│  UnrealOn       │◄──►│  Amazon.com     │
│   (Client)      │    │  (Platform)     │    │  (Target)       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AI Services   │    │   Browser       │    │   Proxy &       │
│   (LLM)         │    │   Automation    │    │   Stealth       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Module Structure

- **`catalog/`**: Product catalog parsing and data models
- **`extractor/`**: Simple extraction utilities
- **`amazon_config.py`**: Amazon-specific configuration
- **`config.env`**: Environment variables and API keys

## 🚀 Quick Start

### Prerequisites

- Python 3.9 or higher
- Poetry for dependency management
- OpenRouter API key (for AI-powered extraction)

### Installation

```bash
# Clone the repository
git clone https://github.com/markolofsen/unrealon-parser-amazon.git
cd unrealon-parser-amazon

# Install dependencies
poetry install

# Configure environment
cp config.env.example config.env
# Edit config.env with your API keys
```

### Basic Usage

```python
import asyncio

from catalog.parser import AmazonCatalogParser


async def main():
    # Initialize the parser and its browser session
    parser = AmazonCatalogParser()
    await parser.setup()
    try:
        # Search for products across the first two result pages
        result = await parser.search_products("laptop", max_pages=2)

        if result.success:
            print(f"Found {len(result.data.products)} products")
            for product in result.data.products:
                # price (and price.current) are optional, so format defensively
                price = product.price.current if product.price else None
                price_text = f"${price:.2f}" if price is not None else "n/a"
                print(f"- {product.title}: {price_text}")
    finally:
        # Always release browser resources, even if the search raises
        await parser.cleanup()


# Run the parser
if __name__ == "__main__":
    asyncio.run(main())
```

### Advanced Usage with AI

```python
import asyncio

from extractor.simple_extractor import SimpleExtractor


async def ai_extraction():
    extractor = SimpleExtractor()
    await extractor.setup()
    try:
        # AI-powered extraction with automatic selector generation
        result = await extractor.extract_products(
            "https://www.amazon.com/s?k=laptop"
        )

        print(f"AI extracted {len(result.products)} products")
        print(f"Confidence: {result.confidence}%")
        # Reported cost of this extraction, in USD
        print(f"Cost: ${result.cost_usd:.4f}")
    finally:
        await extractor.cleanup()


if __name__ == "__main__":
    asyncio.run(ai_extraction())
```

## ⚙️ Configuration

### Environment Variables

Create a `config.env` file (the installation step copies it from `config.env.example`) with the following variables:

```bash
# API Keys
UNREALON_OPENROUTER_API_KEY=sk-or-v1-your-openrouter-key

# Browser Settings
UNREALON_BROWSER_HEADLESS=true
UNREALON_BROWSER_TIMEOUT=30

# Runtime Limits
UNREALON_MAX_PAGES=5
UNREALON_LLM_DAILY_LIMIT=10.0

# Logging
UNREALON_LOG_LEVEL=INFO
```

### Custom Configuration

```python
from amazon_config import amazon_config
from catalog.parser import AmazonCatalogParser


class CustomAmazonParser(AmazonCatalogParser):
    def __init__(self):
        # Register the parser under a custom ID and name, using the Amazon-specific config
        super().__init__(
            parser_id="my_amazon_parser",
            parser_name="Custom Amazon Parser",
            config=amazon_config,
        )

    async def parse_specific_category(self, category: str):
        """Search Amazon using a category name as the query (up to 3 result pages)."""
        return await self.search_products(category, max_pages=3)
```

## 📈 Performance & Scalability

### Benchmarks

| Metric | Traditional Scrapers | Amazon Parser |
|--------|---------------------|---------------|
| **Success Rate** | 60-80% | 95%+ |
| **Detection Rate** | High | <5% |
| **Data Accuracy** | 70-85% | 95%+ |
| **Setup Time** | Days/Weeks | Minutes |
| **Maintenance** | High | Minimal |

### Scalability Features

- **Horizontal Scaling**: Add parser instances without code changes
- **Load Balancing**: Automatic distribution of parsing tasks
- **Rate Limiting**: Built-in, Amazon-friendly request pacing
- **Proxy Rotation**: Automatic IP rotation and management
- **Session Management**: Persistent browser profiles

## 🔒 Security & Compliance

### Anti-Detection Measures

- **Stealth Browser**: Chromium with anti-detection plugins
- **Human-like Behavior**: Realistic mouse movements and timing
- **Header Rotation**: Dynamic User-Agent and header management
- **Cookie Handling**: Proper session management and persistence
- **Proxy Support**: Built-in proxy rotation and management

### Data Privacy

- **Local Processing**: Scraping and parsing run on your own infrastructure
- **No Third-Party Storage**: Extracted data is not stored on external servers. AI-powered extraction does send page content to the LLM provider through OpenRouter.
- **API Key Security**: API credentials are kept in your local `config.env`
- **Audit Logging**: Comprehensive audit trails

## 🛠️ Development & Testing

### Testing Framework

```python
import pytest

from catalog.parser import AmazonCatalogParser


# Requires the pytest-asyncio plugin; this is a live test that queries amazon.com
@pytest.mark.asyncio
async def test_product_extraction():
    parser = AmazonCatalogParser()
    await parser.setup()
    try:
        result = await parser.search_products("laptop", max_pages=1)

        assert result.success
        assert len(result.data.products) > 0
        # Every extracted product should carry an ASIN
        assert all(p.asin for p in result.data.products)
    finally:
        await parser.cleanup()
```

### CLI Interface

```bash
# Test mode
poetry run python -m catalog.parser

# Scheduled mode
poetry run python -m catalog.parser --schedule "every 1h"

# Daemon mode
poetry run python -m catalog.parser --daemon
```

## 📊 Use Cases & Applications

### 🏢 **E-commerce Intelligence**
- **Price Monitoring**: Track competitor pricing strategies
- **Product Research**: Analyze market trends and product performance
- **Inventory Tracking**: Monitor stock levels and availability
- **Competitive Analysis**: Benchmark against competitors

### 📈 **Market Research**
- **Trend Analysis**: Identify emerging product categories
- **Demand Forecasting**: Use review counts and ratings as demand signals
- **Customer Insights**: Study review patterns and sentiment
- **Market Positioning**: Understand how products are positioned

### 💰 **Investment & Finance**
- **Company Analysis**: Monitor Amazon's business performance
- **Market Sentiment**: Analyze customer satisfaction trends
- **Supply Chain**: Track product availability and shipping
- **Economic Indicators**: Monitor consumer spending patterns

### 🎓 **Academic Research**
- **Consumer Behavior**: Study purchasing patterns and preferences
- **Market Dynamics**: Analyze competition and pricing strategies
- **Data Science**: Collect large-scale e-commerce data
- **Social Research**: Study online shopping behavior

## 🔧 Troubleshooting

### Common Issues

**Q: The parser gets blocked by Amazon**
A: Enable stealth mode and use proxy rotation. The parser also includes built-in anti-detection measures.

**Q: Product data is missing**
A: If you use the traditional (non-AI) selectors, check whether they need updating. AI-powered extraction is designed to adapt to layout changes automatically.

**Q: API costs are high**
A: Use traditional BeautifulSoup parsing, which incurs no LLM cost, or reduce LLM usage with caching.

**Q: Performance is slow**
A: Adjust the browser settings, enable headless mode (`UNREALON_BROWSER_HEADLESS=true`), and lower the page limit (`UNREALON_MAX_PAGES`).

### Debug Mode

```python
import logging

from catalog.parser import AmazonCatalogParser

# Enable debug logging for all modules
logging.basicConfig(level=logging.DEBUG)

# Enable browser debugging
parser = AmazonCatalogParser()
parser.config.browser_config.debug_mode = True
```

## 📚 API Reference

### Core Classes

#### `AmazonCatalogParser`

Main parser class for Amazon product extraction.

```python
class AmazonCatalogParser(Parser):
    async def search_products(self, query: str, max_pages: int = 2) -> AmazonExtractionResult: ...
    async def get_product_details(self, asin: str) -> AmazonExtractionResult: ...
    async def parse(self) -> Dict[str, Any]: ...
```

#### `AmazonProduct`

Pydantic model for product data.

```python
class AmazonProduct(BaseModel):
    asin: str
    title: str
    price: Optional[AmazonPrice]
    rating: Optional[AmazonRating]
    availability: Optional[str]
    images: List[AmazonImage]
    url: str
    prime_eligible: bool
```

### Data Models

#### `AmazonPrice`
```python
class AmazonPrice(BaseModel):
    current: Optional[float]
    original: Optional[float]
    currency: str = "USD"
    discount_percent: Optional[int]
```

#### `AmazonRating`
```python
class AmazonRating(BaseModel):
    rating: Optional[float]
    review_count: Optional[int]
    rating_text: Optional[str]
```

---

**Amazon Parser** - Professional Amazon data extraction powered by [UnrealOn](https://pypi.org/project/unrealon/).

*Built for developers, by developers. No vendor lock-in, predictable pricing, enterprise-grade reliability.*

**For enterprise features, managed hosting, and professional support, visit [unrealon.com](https://unrealon.com/).**