https://github.com/emmanuel10701/data_scraping
Data-Scraping
https://github.com/emmanuel10701/data_scraping
beautifulsoup csv excel numpy pandas python web-scapping
Last synced: about 1 month ago
JSON representation
Data-Scraping
- Host: GitHub
- URL: https://github.com/emmanuel10701/data_scraping
- Owner: Emmanuel10701
- Created: 2025-01-15T09:17:02.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-03-19T19:22:15.000Z (7 months ago)
- Last Synced: 2025-09-23T07:42:08.357Z (about 1 month ago)
- Topics: beautifulsoup, csv, excel, numpy, pandas, python, web-scapping
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# E-commerce Reviews Scraping
This Python project scrapes customer reviews from an e-commerce website (or a local HTML file) and saves the extracted data into both **CSV** and **Excel** formats. It uses libraries like **BeautifulSoup**, **Pandas**, **Requests**, and **openpyxl** to achieve this.
## Requirements
Before using this project, ensure that **Python** is installed on your machine, and the necessary libraries are set up:
1. **Python Installation**:
- Make sure you have Python installed on your system. You can download it from [python.org](https://www.python.org/downloads/).
- After installation, verify by running the following command in your terminal:
```bash
python --version
```
or
```bash
python3 --version
```
This should print the Python version (e.g., `Python 3.x.x`).
2. **Library Installation**:
The following Python libraries are required to run this project:
- **pandas**: For handling and saving the scraped data.
- **beautifulsoup4**: For parsing and extracting data from HTML.
- **requests**: For sending HTTP requests to scrape data from a live URL.
- **openpyxl**: For saving data to an Excel file.
To install the required libraries, open your terminal and run the following command:
```bash
pip install pandas beautifulsoup4 requests openpyxl
```
## Usage
1. **Scraping from a Live Website**:
- Update the `url` variable in the script with the target e-commerce website URL.
- Run the script, and it will fetch and parse the reviews.
2. **Scraping from a Local HTML File**:
- Save the e-commerce page as an HTML file.
- Update the script to read from the local file instead of making an HTTP request.
3. **Saving the Data**:
- The script extracts key information such as review text, rating, author, and date.
- The data is then saved into both `reviews.csv` and `reviews.xlsx`.
## Example Output
A sample of the extracted data:
| Author | Rating | Review | Date |
|--------|--------|--------|------|
| JohnDoe | 5 | "Great product!" | 2024-03-10 |
| JaneSmith | 4 | "Good value for money." | 2024-03-11 |
## Notes
- Ensure compliance with the website's **robots.txt** and terms of service before scraping.
- If the website uses JavaScript to load reviews dynamically, consider using **Selenium** or **Scrapy** for advanced scraping techniques.
## Future Enhancements
- Implement multi-threading for faster scraping.
- Support for additional data formats (JSON, SQLite database).
- Integration with sentiment analysis for review insights.