{"id":24316051,"url":"https://github.com/venkat-0706/amazon-webscraper","last_synced_at":"2025-06-26T02:33:51.890Z","repository":{"id":269193485,"uuid":"906688831","full_name":"venkat-0706/Amazon-WebScraper","owner":"venkat-0706","description":" An Amazon web scraper extracts product data like prices, reviews, and ratings using tools like BeautifulSoup or Scrapy, aiding in market research while adhering to ethical and legal guidelines.","archived":false,"fork":false,"pushed_at":"2024-12-21T16:14:37.000Z","size":8,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T22:11:13.053Z","etag":null,"topics":["api-and-data-parsing","automation","beautifulsoup","data-extraction","ethical-scraping","python-programming","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/venkat-0706.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-21T16:05:48.000Z","updated_at":"2025-03-08T06:54:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"c084a89e-0f2c-4a6a-a810-4b1e03315501","html_url":"https://github.com/venkat-0706/Amazon-WebScraper","commit_stats":null,"previous_names":["venkat-0706/python-task","venkat-0706/amazon-webscraper"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/venkat-0706%2FAmazon-WebScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/venkat-0706%2FAmazon-WebScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/venkat-0706%2FAmazon-WebScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/venkat-0706%2FAmazon-WebScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/venkat-0706","download_url":"https://codeload.github.com/venkat-0706/Amazon-WebScraper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248667400,"owners_count":21142437,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-and-data-parsing","automation","beautifulsoup","data-extraction","ethical-scraping","python-programming","webscraping"],"created_at":"2025-01-17T12:17:51.627Z","updated_at":"2025-04-13T05:26:22.842Z","avatar_url":"https://github.com/venkat-0706.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Amazon Best Sellers Web Scraper\n\n## Overview\nThis Python script uses Selenium to scrape product information from Amazon's Best Sellers section. It focuses on products offering discounts greater than 50% in 10 different categories and saves the data into structured formats (CSV or JSON). The script automates login using valid Amazon credentials and extracts key product details from each category.\n\n---\n\n## Features\n- **Authentication:** Logs in to Amazon using provided credentials.\n- **Data Collection:** Scrapes details of up to 1500 best-selling products from each category.\n  - Product Name\n  - Product Price\n  - Sale Discount\n  - Best Seller Rating\n  - Ship From\n  - Sold By\n  - Rating\n  - Product Description\n  - Number Bought in the Past Month (if available)\n  - Category Name\n  - All Available Images\n- **Error Handling:** Robust handling of missing elements, timeouts, and page load issues.\n- **Data Storage:** Saves scraped data into a CSV or JSON file for analysis.\n\n---\n\n## Prerequisites\n1. **Python:** Install Python 3.7 or later.\n2. **Libraries:**\n   - Selenium: Install using `pip install selenium`.\n3. **WebDriver:**\n   - Download the appropriate WebDriver (e.g., [ChromeDriver](https://chromedriver.chromium.org/downloads)) and ensure it's in your system PATH.\n4. **Amazon Account:** Provide valid Amazon credentials for authentication.\n\n---\n\n## Setup Instructions\n\n1. **Clone the Repository:**\n   ```bash\n   git clone \u003crepository_url\u003e\n   cd amazon-scraper\n   ```\n\n2. **Install Dependencies:**\n   ```bash\n   pip install selenium\n   ```\n\n3. **Download WebDriver:**\n   - Download ChromeDriver from [here](https://chromedriver.chromium.org/downloads).\n   - Place it in your system PATH or the script directory.\n\n4. **Update Credentials:**\n   - Replace `your_email@example.com` and `your_password` in the script with your Amazon login credentials.\n\n5. **Run the Script:**\n   ```bash\n   python amazon_scraper.py\n   ```\n\n---\n\n## How It Works\n\n1. **Authentication:**\n   - The script navigates to the Amazon login page and authenticates using the provided email and password.\n\n2. **Category Navigation:**\n   - Visits the URLs of the 10 specified Best Seller categories.\n\n3. **Data Extraction:**\n   - Collects product details, including the name, price, rating, and more.\n   - Skips products with missing or inaccessible data.\n\n4. **Data Storage:**\n   - Saves the scraped data as `amazon_best_sellers.csv` or `amazon_best_sellers.json` in the script's directory.\n\n---\n\n## Output Format\n- **CSV File:**\n  - Columns include `Name`, `Price`, `Discount`, `Rating`, `Ship From`, `Sold By`, etc.\n- **JSON File:**\n  - Structured JSON with the same details.\n\n---\n\n## Example URLs\n- **Best Seller Section:**\n  - [Best Sellers](https://www.amazon.in/gp/bestsellers/?ref_=nav_em_cs_bestsellers_0_1_1_2)\n- **Sample Categories:**\n  - [Kitchen](https://www.amazon.in/gp/bestsellers/kitchen/ref=zg_bs_nav_kitchen_0)\n  - [Shoes](https://www.amazon.in/gp/bestsellers/shoes/ref=zg_bs_nav_shoes_0)\n  - [Computers](https://www.amazon.in/gp/bestsellers/computers/ref=zg_bs_nav_computers_0)\n  - [Electronics](https://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_electronics_0)\n\n---\n\n## Notes\n- Scraping Amazon may violate their [Terms of Service](https://www.amazon.in/gp/help/customer/display.html). Ensure you comply with their policies.\n- If the page structure changes, you may need to update the script's XPath or CSS selectors.\n\n---\n\n## Troubleshooting\n1. **Login Issues:**\n   - Ensure your credentials are correct.\n   - Check for CAPTCHA prompts during login.\n\n2. **Missing WebDriver:**\n   - Verify that ChromeDriver is installed and in your PATH.\n\n3. **Slow Page Load:**\n   - Increase wait times using Selenium's `WebDriverWait`.\n\n4. **Blocked Requests:**\n   - Reduce the scraping speed to avoid being flagged by Amazon.\n\n---\n\n## License\nThis project is for educational purposes only. Use responsibly and adhere to Amazon's terms of service.\n\n---\n\n## Contact\nFor questions or suggestions, reach out to: `chanduabbireddy247@gmail.com`. \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvenkat-0706%2Famazon-webscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvenkat-0706%2Famazon-webscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvenkat-0706%2Famazon-webscraper/lists"}