https://github.com/subh888999/stackoverflow-tag-predtiction
A machine learning-powered Streamlit app that predicts relevant Stack Overflow tags based on question content, using NLP and multi-label classification for accurate and real-time tag suggestions.
machine-learning matplotlib multilabel-classification nlp nltk pandas python sns stackoverflow-api statistics webscraping
- Host: GitHub
- URL: https://github.com/subh888999/stackoverflow-tag-predtiction
- Owner: subh888999
- License: mit
- Created: 2025-07-10T12:33:13.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-07-10T13:03:53.000Z (11 months ago)
- Last Synced: 2025-07-10T19:55:15.516Z (11 months ago)
- Topics: machine-learning, matplotlib, multilabel-classification, nlp, nltk, pandas, python, sns, stackoverflow-api, statistics, webscraping
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/spaces/Subh777/stackoverflow_tag_prediction
- Size: 8.87 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🧠 Stack Overflow Tag Predictor
An AI-powered web app that **automatically predicts relevant tags** for Stack Overflow questions using **Machine Learning** and **Natural Language Processing**.
---
## 📌 Business Problem
Stack Overflow hosts millions of developer questions, but many are tagged incorrectly or inconsistently.
Tags play a vital role in content organization, searchability, and directing questions to the right experts.
However, **manual tagging is error-prone and time-consuming**, affecting content discoverability and user experience.
---
## 🎯 Project Goal
To build a smart, automated system that predicts relevant tags based on question content.
The system aims to enhance **accuracy**, **speed**, and **consistency** in tag assignment using ML/NLP techniques.
---
## ✅ Objectives
- Predict **multiple relevant tags** from a question's text.
- Preprocess noisy HTML/code using **NLP techniques**.
- Use **TF-IDF + Logistic Regression** for efficient multi-label classification.
- Support real-time predictions via a **Streamlit web interface**.
- Ensure the solution is lightweight and deployment-ready.
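
The preprocessing objective can be illustrated with a minimal, standard-library-only sketch. It is a stand-in, not the project's actual code: the README lists BeautifulSoup and NLTK for this step, which would replace the regex-based tag stripping here and add lemmatization. The function name `clean_text` and the sample question are hypothetical.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")          # anything that looks like an HTML tag
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # everything except lowercase letters and whitespace


def clean_text(raw: str) -> str:
    """Strip HTML tags, lowercase, and drop non-alphabetic characters."""
    text = TAG_RE.sub(" ", raw)           # remove tags before decoding entities
    text = html.unescape(text)            # turn &amp;, &lt;, ... into plain characters
    text = text.lower()
    text = NON_ALPHA_RE.sub(" ", text)    # digits and punctuation become spaces
    return " ".join(text.split())         # collapse repeated whitespace


sample = "<p>How do I merge two DataFrames in pandas?</p><pre><code>df1.merge(df2)</code></pre>"
cleaned = clean_text(sample)
print(cleaned)
```

Note that dropping non-alphabetic characters also discards digits and symbols inside code snippets, so tags such as `c++` or `python-3.x` would need to be recognized from the surrounding words instead.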
---
## 📊 Data Understanding
| Feature | Description | Importance |
|--------|-------------|------------|
| `Body` | Main content of the question (may include code, text, HTML). | Primary input for prediction. |
| `Tags` | List of correct tags for the question. | Supervised multi-label target. |
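
To make the multi-label target concrete, here is a small sketch of how a list of tags per question can be turned into a binary indicator matrix with scikit-learn's `MultiLabelBinarizer`. The three tag lists are invented for illustration, and whether the project encodes `Tags` this way is an assumption.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag lists, one per question
tag_lists = [
    ["python", "pandas", "dataframe"],
    ["javascript", "html"],
    ["python"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tag_lists)   # one row per question, one column per distinct tag

print(list(mlb.classes_))          # column order (sorted tag names)
print(Y)

# The encoding is reversible, which is how predictions are mapped back to tag names
decoded = mlb.inverse_transform(Y)
```

Each column of `Y` then becomes an independent binary classification problem: "does this question carry this tag or not?"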
---
## ⚙️ Model Pipeline
- **Text Cleaning**: Strip HTML tags, remove non-alphabetic characters, and convert text to lowercase
- **Tokenization & Lemmatization**: Normalize words using NLTK
- **TF-IDF Vectorization**: Convert processed text into feature vectors
- **Multi-Label Classification**: One-vs-Rest strategy using Logistic Regression
- **Evaluation**: Micro-averaged F1 Score
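
The modeling steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the repository's code: the six questions and their tags are made up, the hyperparameters are scikit-learn defaults apart from `max_iter`, and the score is computed on the training data only to show the API call (a real evaluation would use a held-out split).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical, already-cleaned questions and their tags
questions = [
    "how to merge two dataframes in pandas",
    "pandas groupby sum of a column",
    "python list comprehension with condition",
    "add click event listener to button in javascript",
    "center a div with html and css",
    "javascript promise async await example",
]
tags = [
    ["python", "pandas"],
    ["python", "pandas"],
    ["python"],
    ["javascript", "html"],
    ["html"],
    ["javascript"],
]

# Encode tag lists as a binary indicator matrix
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# TF-IDF features feeding one logistic regression per tag (One-vs-Rest)
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(questions, Y)

# Micro-averaged F1 pools true/false positives across all tags
Y_pred = model.predict(questions)
micro_f1 = f1_score(Y, Y_pred, average="micro", zero_division=0)
print(f"micro F1 (training data, illustration only): {micro_f1:.2f}")

# Map a prediction back to tag names
print(mlb.inverse_transform(model.predict(["group rows of a pandas dataframe"])))
```

Micro-averaging suits this task because tag frequencies are highly skewed: it weights every (question, tag) decision equally rather than giving rare tags the same weight as common ones.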
---
## 🖥️ Tech Stack
- **Programming**: Python
- **Libraries**: Pandas, Scikit-learn, NLTK, BeautifulSoup
- **Modeling**: TF-IDF, Logistic Regression
- **UI**: Streamlit
- **Model Persistence**: Joblib
- **Deployment**: Hugging Face Spaces
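
As a rough illustration of the persistence step, the sketch below saves a fitted pipeline together with its label binarizer using Joblib and loads it back, which is the usual pattern for serving a model from a separate app process. The file name `tag_model.joblib`, the dictionary layout, and the toy data are all hypothetical; the repository may store its artifacts differently.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data
questions = [
    "merge two dataframes in pandas",
    "read csv file with pandas in python",
    "add click handler in javascript",
    "python list comprehension syntax",
]
tags = [["python", "pandas"], ["python", "pandas"], ["javascript"], ["python"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(questions, Y)

# Persist the model and the binarizer together so tag names survive the round trip
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tag_model.joblib")  # hypothetical file name
    joblib.dump({"model": model, "binarizer": mlb}, path)
    bundle = joblib.load(path)

original_pred = model.predict(questions)
restored_pred = bundle["model"].predict(questions)
print((original_pred == restored_pred).all())
```

Saving the binarizer alongside the classifier matters: without it, the app would get back a row of 0/1 values with no way to tell which column corresponds to which tag.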
---
## 🌟 Output
- **Predicted Tags**: e.g., `['python', 'pandas', 'dataframe']`
- **Real-Time Prediction**: Users can input a question and receive instant tag predictions
- **Lightweight App**: Fast and suitable for public demos or small-scale production
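
One plausible way to turn per-tag probabilities into the final tag list is sketched below. The helper `select_tags`, the 0.5 threshold, and the example scores are assumptions for illustration; the cap of five reflects Stack Overflow's limit of five tags per question.

```python
from typing import Dict, List


def select_tags(scores: Dict[str, float], threshold: float = 0.5, max_tags: int = 5) -> List[str]:
    """Return tags whose score meets the threshold, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, score in ranked if score >= threshold][:max_tags]


# Hypothetical per-tag probabilities for one question
example_scores = {"python": 0.91, "pandas": 0.77, "dataframe": 0.64, "java": 0.08}
predicted = select_tags(example_scores)
print(predicted)  # ['python', 'pandas', 'dataframe']
```

Lowering the threshold trades precision for recall: more tags are suggested, but more of them will be wrong.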
---
## 🚀 Deployment
The app is deployed on **Hugging Face Spaces** for live demo and usage.
> 🔗 [Live Demo](https://huggingface.co/spaces/Subh777/stackoverflow_tag_prediction)
---
## 📝 License
This project is licensed under the [MIT License](LICENSE).