https://github.com/yash22222/bmc-product-reviews-web-scrapping-sentiment-analysis
- Host: GitHub
- URL: https://github.com/yash22222/bmc-product-reviews-web-scrapping-sentiment-analysis
- Owner: Yash22222
- License: mit
- Created: 2025-07-05T08:37:39.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-07-26T15:55:19.000Z (2 months ago)
- Last Synced: 2025-07-26T20:56:09.312Z (2 months ago)
- Topics: gssoc2025
- Homepage:
- Size: 21.5 KB
- Stars: 4
- Watchers: 0
- Forks: 8
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
# BMC Product Review Scrapping & Sentiment Analysis
This project performs **Web Scraping** & **Sentiment Analysis** on verified Gartner reviews of popular **BMC Software Products**, using **Python NLP Techniques** and **Data Visualization**.
It is an open-source project that scrapes customer reviews of BMC Software products from public platforms such as Gartner, then applies Natural Language Processing (NLP) techniques and visualization tools to extract actionable insights from those reviews.
This project is perfect for beginners and intermediate contributors who want hands-on experience with web scraping, NLP, data visualization, and open source collaboration.
It includes:
- Web scraping from [Gartner Peer Insights](https://www.gartner.com/reviews)
- Preprocessing text with NLP
- VADER-based sentiment scoring
- Charts, word clouds, and Excel exports

## Products Covered
We scrape verified reviews from the following Gartner pages:
| Product Name | Review Page |
|--------------|-------------|
| BMC Helix ITSM | [Link](https://www.gartner.com/reviews/market/software-asset-management-tools/vendor/bmc/product/bmc-helix-itsm/reviews) |
| BMC Helix Operations Management | [Link](https://www.gartner.com/reviews/market/aiops-platforms/vendor/bmc/product/bmc-helix-operations-management-with-aiops/reviews) |
| TrueSight Server Automation | [Link](https://www.gartner.com/reviews/market/integrated-systems/vendor/bmc/product/bmc-truesight-automation-for-servers/reviews) |
| Control-M | [Link](https://www.gartner.com/reviews/market/service-orchestration-and-automation-platforms/vendor/bmc/product/bmc-control-m/reviews) |

---
## Output Format
Your final analysis should look like this (in Excel or CSV):
| Product Name | Review Title | Overall Rating | Industry | Function | Date | Other Vendors | Country | Pros | Cons | Overall Comment | Sentiment |
|--------------|--------------|----------------|----------|----------|------|----------------|---------|------|------|------------------|-----------|

Visuals like pie charts and word clouds should be stored in the `outputs/` folder.
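As a rough illustration of the output schema above, the sketch below writes review records to CSV using only the standard library. The column names come from the table; the function name `write_reviews_csv` and the sample row are hypothetical placeholders, not part of the repository and not real scraped data.

```python
import csv
import io

# Column order taken from the output table above.
COLUMNS = [
    "Product Name", "Review Title", "Overall Rating", "Industry", "Function",
    "Date", "Other Vendors", "Country", "Pros", "Cons", "Overall Comment",
    "Sentiment",
]


def write_reviews_csv(rows, handle):
    """Write review dicts to an open text handle, one row per review.

    Missing keys become empty cells, so a partially scraped review
    does not abort the export. Unknown keys are ignored.
    """
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical placeholder row, not real scraped data.
sample = [{
    "Product Name": "Control-M",
    "Review Title": "Example title",
    "Overall Rating": 4.0,
    "Sentiment": "Positive",
}]

buffer = io.StringIO()
write_reviews_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass a handle opened with `open(path, "w", newline="", encoding="utf-8")` instead of the in-memory buffer. Since the tech stack lists Pandas, the same rows could presumably be exported to Excel with `DataFrame.to_excel` instead.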
---
## Example Directory Structure
```bash
BMC-Product-Review-Scrapping-and-Sentiment-Analysis/
│
├── data/                     # Sample scraped data files (Excel/CSV)
├── notebooks/                # Jupyter notebooks for quick experimentation
├── scripts/
│   ├── scraper.py            # Scraper module
│   ├── nlp_preprocessing.py  # Text cleaning + POS + lemmatization
│   ├── sentiment.py          # VADER-based sentiment scoring
│   └── visualize.py          # Wordclouds, pie charts, bar graphs
│
├── outputs/                  # Saved images, processed files
│
├── requirements.txt          # Install dependencies
├── README.md                 # Project overview
├── CONTRIBUTING.md           # Contribution guidelines
├── LICENSE                   # Open-source license
└── .gitignore
```

---
### Key Features
1. Robust product review scraper for BMC products
2. Text cleaning with:
   - Tokenization
   - Lemmatization
   - POS tagging
   - Stopword removal
3. Sentiment classification using VADER
4. Sentiment reports and dashboards
5. Modular structure for easy expansion and contributions
6. Export of the analysis to Excel and visual graphs

---
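The text-cleaning feature listed above can be sketched in a few lines. This is a simplified stand-in that covers only tokenization and stopword removal with the standard library. The stopword set and the name `clean_review` are assumptions for illustration. The project itself lists NLTK, where `nltk.corpus.stopwords`, `nltk.pos_tag`, and `nltk.stem.WordNetLemmatizer` would handle stopwords, POS tagging, and lemmatization.

```python
import re

# Tiny illustrative stopword set (an assumption); the project would more
# likely load NLTK's list via nltk.corpus.stopwords.words("english").
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "for", "but", "was"}

# Match runs of letters/digits, keeping simple contractions such as "doesn't".
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def clean_review(text):
    """Lowercase, tokenize, and drop stopwords from one review string."""
    tokens = TOKEN_RE.findall(text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


tokens = clean_review("The dashboard is easy to use, but setup was slow.")
```

Returning a token list rather than a rejoined string keeps the result usable for word clouds and frequency counts. Note that VADER is usually run on the original, uncleaned text, because it relies on punctuation and capitalization cues.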
## Tech Stack
- **Python 3.x**
- **Selenium / Playwright** (for scraping)
- **NLTK, VADER** (for sentiment)
- **Pandas, Matplotlib, WordCloud**
- **Excel output (xlsxwriter/openpyxl)**
- **Any**
---

## Getting Started
### Installation
```bash
git clone https://github.com/Yash22222/BMC-Product-Review-Scrapping-and-Sentiment-Analysis.git
cd BMC-Product-Review-Scrapping-and-Sentiment-Analysis
pip install -r requirements.txt
```

### Run Sentiment Analysis
1. Scrape reviews using the `scraper.py` script.
2. Clean and preprocess with `nlp_preprocessing.py`.
3. Analyze sentiment using `sentiment.py`.
4. Visualize using `visualize.py`.

---
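To make the sentiment step above more concrete, here is a minimal sketch of turning VADER compound scores into labels. The ±0.05 cut-offs follow the convention suggested in the VADER documentation; whether `sentiment.py` uses the same thresholds or label names is an assumption. `label_from_compound`, `score_reviews`, and `StubAnalyzer` are hypothetical names, and the stub returns fixed scores only so that the example runs without any third-party package.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def score_reviews(texts, analyzer):
    """Return (text, compound, label) tuples for each review text.

    `analyzer` is any object with a polarity_scores(text) method returning a
    dict with a "compound" key, as VADER's SentimentIntensityAnalyzer does.
    """
    results = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((text, compound, label_from_compound(compound)))
    return results


class StubAnalyzer:
    """Stand-in for demonstration only; returns fixed scores, not real VADER output."""

    def __init__(self, scores):
        self._scores = scores

    def polarity_scores(self, text):
        return {"compound": self._scores[text]}


demo = StubAnalyzer({"great tool": 0.6, "it works": 0.0, "awful support": -0.5})
results = score_reviews(["great tool", "it works", "awful support"], demo)
```

In real use, the stub would be replaced by `SentimentIntensityAnalyzer()` from `vaderSentiment.vaderSentiment`, or by the NLTK version in `nltk.sentiment.vader`, which needs the `vader_lexicon` resource downloaded first.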
## How to Contribute (for GSSoC'25)
We welcome contributions from **GSSoC contributors and all open source enthusiasts**!
### Steps to Contribute
1. **Fork** the repository
2. **Clone** your fork
   ```bash
   git clone https://github.com/YOUR_USERNAME/BMC-Product-Review-Scrapping-and-Sentiment-Analysis.git
   ```
3. Create a branch for your change, for example with `git checkout -b feature/your-feature-name`
4. Commit your changes
   ```bash
   git commit -m "✨ Added sentiment model for XYZ"
   ```
5. Push to your fork
   ```bash
   git push origin feature/your-feature-name
   ```
6. Open a **Pull Request** with a clear explanation.

## Contribution Ideas
| Type | Ideas |
| ----------------------------- | --------------------------------------- |
| Add new BMC products          | Expand the scraper                      |
| Streamlit UI                  | Upload reviews & analyze sentiment      |
| PDF/Excel report generator    | Auto reports for each product           |
| Add BERT                      | Use HuggingFace transformer models      |
| Multi-language support        | Translate & analyze non-English reviews |
| Docker Support                | Add Dockerfile for easy setup           |

---
## License
This project is licensed under the **MIT License**.
---
## Credits
* Proudly open for contributions under GSSoC 2025