{"id":25733723,"url":"https://github.com/thepravin/amazon-web-scripting","last_synced_at":"2026-05-16T17:35:46.706Z","repository":{"id":279401763,"uuid":"938689163","full_name":"thepravin/amazon-web-scripting","owner":"thepravin","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-25T10:57:48.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-22T05:20:24.886Z","etag":null,"topics":["amazon","jupyter-notebook","python","web","webscraper","webscraping","webscraping-data"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thepravin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-02-25T10:51:06.000Z","updated_at":"2025-02-25T11:00:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"2b2a9593-e8fb-4e9d-875d-e17c1f82dcb2","html_url":"https://github.com/thepravin/amazon-web-scripting","commit_stats":null,"previous_names":["thepravin/amazon-web-scripting"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thepravin/amazon-web-scripting","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepravin%2Famazon-web-scripting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepravin%2Famazon-web-scripting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepr
avin%2Famazon-web-scripting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepravin%2Famazon-web-scripting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thepravin","download_url":"https://codeload.github.com/thepravin/amazon-web-scripting/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thepravin%2Famazon-web-scripting/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33111962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T04:41:52.686Z","status":"ssl_error","status_checked_at":"2026-05-16T04:41:52.009Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon","jupyter-notebook","python","web","webscraper","webscraping","webscraping-data"],"created_at":"2025-02-26T04:22:32.517Z","updated_at":"2026-05-16T17:35:46.701Z","avatar_url":"https://github.com/thepravin.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Amazon Product Scraper for Samsung Products\n\nThis Jupyter Notebook scrapes product information from Amazon's search results for \"Samsung\" and extracts details such as product titles, prices, and ratings. 
The data is then cleaned and saved to a CSV file for further analysis.\n\n## Features\n- **Web Scraping**: Extracts product details from Amazon search results.\n- **Data Cleaning**: Filters out invalid entries (e.g., \"Page 1 of 1\" prices).\n- **CSV Export**: Saves the cleaned dataset to `amaon_data.csv` (note the typo in the filename).\n\n## Requirements\n- Python 3.x\n- Libraries: `beautifulsoup4`, `requests`, `pandas`, `numpy`\n\n## Installation\n1. Install the required libraries:\n   ```bash\n   pip install beautifulsoup4 requests pandas numpy\n   ```\n\n## Usage\n1. Run the Jupyter Notebook `Main.ipynb`.\n2. The script will:\n   - Send a request to Amazon's search page for \"Samsung\".\n   - Extract product links from the search results.\n   - Scrape title, price, and rating from each product page.\n   - Clean the data by removing entries with invalid prices.\n   - Save the results to `amaon_data.csv`.\n\n## Data Output\nThe final dataset includes the following columns:\n- `title`: Product name.\n- `price`: Product price (formatted as a string, e.g., `$499.99`).\n- `rating`: Product rating (out of 5, extracted as a string like `4.5`).\n\nExample output:\n| title                                                | price     | rating |\n|------------------------------------------------------|-----------|--------|\n| SAMSUNG Galaxy S24 FE AI Phone, 128GB Unlocked...    | $499.99   | 4.5    |\n| SAMSUNG Galaxy Buds 3 Pro AI True Wireless Blu...    | $249.99   | 4.1    |\n\n## Notes\n- **Selectors**: The script uses specific HTML class/ID selectors (e.g., `productTitle`, `a-offscreen`). 
These may change over time, requiring updates to the code.\n- **User-Agent**: A valid `USER_AGENT` header is included to mimic a real browser request.\n\n\n\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e🧑‍💻 Happy coding!\u003c/h1\u003e\n\u003c/div\u003e\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthepravin%2Famazon-web-scripting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthepravin%2Famazon-web-scripting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthepravin%2Famazon-web-scripting/lists"}