{"id":24864755,"url":"https://github.com/emmanuel10701/data_scraping","last_synced_at":"2025-09-23T07:42:08.521Z","repository":{"id":272574037,"uuid":"917056517","full_name":"Emmanuel10701/Data_Scraping","owner":"Emmanuel10701","description":"Data-Scraping","archived":false,"fork":false,"pushed_at":"2025-03-19T19:22:15.000Z","size":11,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-23T07:42:08.357Z","etag":null,"topics":["beautifulsoup","csv","excel","numpy","pandas","python","web-scapping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Emmanuel10701.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-15T09:17:02.000Z","updated_at":"2025-05-16T12:54:32.000Z","dependencies_parsed_at":"2025-01-15T11:23:25.939Z","dependency_job_id":"f639c93a-5cf1-4e1c-ad08-f6c5e5973d40","html_url":"https://github.com/Emmanuel10701/Data_Scraping","commit_stats":null,"previous_names":["emmanuel10701/data_scraping"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Emmanuel10701/Data_Scraping","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Emmanuel10701%2FData_Scraping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Emmanuel10701%2FData_Scraping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Emmanuel10701%2FData_Scraping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Emmanuel10701%2FData_Scraping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Emmanuel10701","download_url":"https://codeload.github.com/Emmanuel10701/Data_Scraping/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Emmanuel10701%2FData_Scraping/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276538271,"owners_count":25659932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-23T02:00:09.130Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","csv","excel","numpy","pandas","python","web-scapping"],"created_at":"2025-01-31T23:55:34.729Z","updated_at":"2025-09-23T07:42:08.512Z","avatar_url":"https://github.com/Emmanuel10701.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# E-commerce Reviews Scraping\n\nThis Python project scrapes customer reviews from an e-commerce website (or a local HTML file) and saves the extracted data in both **CSV** and **Excel** formats. It uses **Requests** to fetch pages, **BeautifulSoup** to parse the HTML, **Pandas** to organize and save the data, and **openpyxl** to write the Excel file.\n\n## Requirements\n\nBefore using this project, make sure **Python** and the required libraries are installed:\n\n1. 
**Python Installation**:\n   - If Python is not already installed, download it from [python.org](https://www.python.org/downloads/).\n   - Verify the installation by running one of the following commands in your terminal:\n     ```bash\n     python --version\n     ```\n     or\n     ```bash\n     python3 --version\n     ```\n     This should print the Python version (e.g., `Python 3.x.x`).\n\n2. **Library Installation**:\n   The following Python libraries are required to run this project:\n   - **pandas**: For handling and saving the scraped data.\n   - **beautifulsoup4**: For parsing and extracting data from HTML.\n   - **requests**: For sending HTTP requests to scrape data from a live URL.\n   - **openpyxl**: The engine pandas uses to write the Excel (`.xlsx`) file.\n\n   To install the required libraries, open your terminal and run the following command:\n   \n   ```bash\n   pip install pandas beautifulsoup4 requests openpyxl\n   ```\n\n## Usage\n\n1. **Scraping from a Live Website**:\n   - Update the `url` variable in the script with the target e-commerce website URL.\n   - Run the script; it fetches the page and parses the reviews.\n\n2. **Scraping from a Local HTML File**:\n   - Save the e-commerce page as an HTML file.\n   - Update the script to read from the local file instead of making an HTTP request.\n\n3. 
**Saving the Data**:\n   - The script extracts each review's text, rating, author, and date.\n   - The data is then saved to both `reviews.csv` and `reviews.xlsx`.\n\n## Example Output\n\nA sample of the extracted data:\n\n| Author | Rating | Review | Date |\n|--------|--------|--------|------|\n| JohnDoe | 5 | \"Great product!\" | 2024-03-10 |\n| JaneSmith | 4 | \"Good value for money.\" | 2024-03-11 |\n\n## Notes\n- Ensure compliance with the website's **robots.txt** and terms of service before scraping.\n- If the website loads reviews dynamically with JavaScript, Requests and BeautifulSoup will only see the initial HTML. Use a browser-automation tool such as **Selenium** to render the page first; **Scrapy** helps with larger crawls but does not execute JavaScript on its own.\n\n## Future Enhancements\n- Add multi-threading for faster scraping.\n- Add export to additional formats (JSON, SQLite).\n- Integrate sentiment analysis to summarize review insights.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmanuel10701%2Fdata_scraping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femmanuel10701%2Fdata_scraping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femmanuel10701%2Fdata_scraping/lists"}