{"id":29855025,"url":"https://github.com/subh888999/stackoverflow-tag-predtiction","last_synced_at":"2026-05-08T04:02:58.841Z","repository":{"id":303976930,"uuid":"1017369251","full_name":"subh888999/Stackoverflow-tag-predtiction","owner":"subh888999","description":"A machine learning-powered Streamlit app that predicts relevant Stack Overflow tags based on question content, using NLP and multi-label classification for accurate and real-time tag suggestions.","archived":false,"fork":false,"pushed_at":"2025-07-10T13:03:53.000Z","size":9300,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-10T19:55:15.516Z","etag":null,"topics":["machine-learning","matplotlib","multilabel-classification","nlp","nltk","pandas","python","sns","stackoverflow-api","statistics","webscraping"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/Subh777/stackoverflow_tag_prediction","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/subh888999.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-10T12:33:13.000Z","updated_at":"2025-07-10T13:09:30.000Z","dependencies_parsed_at":"2025-07-10T20:05:41.375Z","dependency_job_id":null,"html_url":"https://github.com/subh888999/Stackoverflow-tag-predtiction","commit_stats":null,"previous_names":["subh888999/stackoverflow-tag-predtiction"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/subh888999/Stackoverflow-tag-predtiction","repository_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subh888999%2FStackoverflow-tag-predtiction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subh888999%2FStackoverflow-tag-predtiction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subh888999%2FStackoverflow-tag-predtiction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subh888999%2FStackoverflow-tag-predtiction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/subh888999","download_url":"https://codeload.github.com/subh888999/Stackoverflow-tag-predtiction/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subh888999%2FStackoverflow-tag-predtiction/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267771974,"owners_count":24142083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","matplotlib","multilabel-classification","nlp","nltk","pandas","python","sns","stackoverflow-api","statistics","webscraping"],"created_at":"2025-07-29T22:21:56.175Z","updated_at":"2026-05-08T04:02:53.795Z","avatar_url":"https://github.com/subh888999.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧠 
Stack Overflow Tag Predictor\n\nAn AI-powered web app that **automatically predicts relevant tags** for Stack Overflow questions using **Machine Learning** and **Natural Language Processing**.\n\n---\n\n## 📌 Business Problem\n\nStack Overflow hosts millions of developer questions, but many are tagged incorrectly or inconsistently.  \nTags organize content, make questions searchable, and route them to the right experts.  \nHowever, **manual tagging is error-prone and time-consuming**, which hurts content discoverability and user experience.\n\n---\n\n## 🎯 Project Goal\n\nBuild an automated system that predicts relevant tags from a question's content.  \nThe system uses ML/NLP techniques to improve the **accuracy**, **speed**, and **consistency** of tag assignment.\n\n---\n\n## ✅ Objectives\n\n- Predict **multiple relevant tags** from a question's text.\n- Preprocess noisy HTML and code snippets using **NLP techniques**.\n- Use **TF-IDF + Logistic Regression** for efficient multi-label classification.\n- Support real-time predictions through a **Streamlit web interface**.\n- Keep the solution lightweight and deployment-ready.\n\n---\n\n## 📊 Data Understanding\n\n| Feature | Description | Role |\n|---------|-------------|------|\n| `Body` | Main content of the question (text, code, and HTML markup). | Primary input for prediction. |\n| `Tags` | Ground-truth tags assigned to the question. | Supervised multi-label target. 
|\n\n---\n\n## ⚙️ Model Pipeline\n\n- **Text Cleaning**: Strip HTML tags, remove non-alphabetic characters, and convert text to lowercase  \n- **Tokenization \u0026 Lemmatization**: Split text into tokens and reduce words to their base forms using NLTK  \n- **TF-IDF Vectorization**: Convert the processed text into TF-IDF feature vectors  \n- **Multi-Label Classification**: One-vs-Rest strategy with one Logistic Regression classifier per tag  \n- **Evaluation**: Micro-averaged F1 score\n\n---\n\n## 🖥️ Tech Stack\n\n- **Programming**: Python  \n- **Libraries**: Pandas, Scikit-learn, NLTK, BeautifulSoup  \n- **Modeling**: TF-IDF, Logistic Regression  \n- **UI**: Streamlit  \n- **Model Persistence**: Joblib  \n- **Deployment**: Hugging Face Spaces\n\n---\n\n## 🌟 Output\n\n- **Predicted Tags**: e.g., `['python', 'pandas', 'dataframe']`  \n- **Real-Time Prediction**: Users enter a question and receive tag predictions instantly  \n- **Lightweight App**: Fast enough for public demos and small-scale production use\n\n---\n\n## 🚀 Deployment\n\nThe app is deployed on **Hugging Face Spaces** as a live demo.\n\n\u003e 🔗 [Live Demo](https://huggingface.co/spaces/Subh777/stackoverflow_tag_prediction)\n\n---\n\n## 📝 License\n\nThis project is licensed under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubh888999%2Fstackoverflow-tag-predtiction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsubh888999%2Fstackoverflow-tag-predtiction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubh888999%2Fstackoverflow-tag-predtiction/lists"}
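The text-cleaning step in the README's Model Pipeline section can be approximated with the Python standard library. This is a minimal sketch under stated assumptions, not the project's code: it swaps in `html.parser` for BeautifulSoup, omits NLTK lemmatization (which needs downloaded WordNet data), and the names `TagStripper` and `clean_text` are hypothetical.

```python
import re
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content of an HTML fragment, dropping the tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_text(html: str) -> list[str]:
    """Hypothetical cleaner: strip HTML, keep letters only, lowercase, split into tokens."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    # Join text runs with a space so words from adjacent elements do not fuse.
    text = " ".join(stripper.parts)
    # Replace every run of non-alphabetic characters (digits, punctuation) with one space.
    text = re.sub(r"[^A-Za-z]+", " ", text)
    return text.lower().split()


print(clean_text("<p>How do I merge two <code>DataFrame</code> objects in pandas 2.0?</p>"))
# ['how', 'do', 'i', 'merge', 'two', 'dataframe', 'objects', 'in', 'pandas']
```

Because every non-letter is dropped, language names such as `c++` and `c#` both collapse to `c`; the README does not say whether the project special-cases these, so a production cleaner may need an allow-list for them.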
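The vectorization, classification, and persistence steps listed in the README (TF-IDF features, one binary Logistic Regression per tag via One-vs-Rest, Joblib for saving the model) map onto standard scikit-learn and joblib components. The sketch below is one plausible wiring, assuming already-cleaned text; the toy questions, the 0.5 probability threshold, the `top_k` cap, the never-empty fallback, and the helper `predict_tags` are hypothetical choices, not taken from the project.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data (hypothetical): already-cleaned question bodies and their tags.
bodies = [
    "merge two dataframe objects in pandas",
    "pandas dataframe groupby sum column",
    "python list comprehension filter values",
    "python dictionary iterate keys values",
    "java spring boot rest controller",
    "java hashmap iterate entries",
]
tags = [
    ["python", "pandas"],
    ["python", "pandas"],
    ["python"],
    ["python"],
    ["java", "spring"],
    ["java"],
]

# One binary column per tag; mlb.classes_ holds the tag names in column order.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# TF-IDF features feeding one independent Logistic Regression per tag (One-vs-Rest).
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(bodies, Y)


def predict_tags(model, mlb, text, threshold=0.5, top_k=3):
    """Hypothetical helper: keep up to top_k tags whose probability >= threshold,
    falling back to the single most likely tag so the result is never empty."""
    probs = model.predict_proba([text])[0]
    ranked = sorted(zip(mlb.classes_, probs), key=lambda pair: pair[1], reverse=True)
    chosen = [str(tag) for tag, p in ranked[:top_k] if p >= threshold]
    return chosen or [str(ranked[0][0])]


# Persist the fitted pipeline and label binarizer together, then reload them.
path = os.path.join(tempfile.mkdtemp(), "tag_model.joblib")
joblib.dump({"model": model, "mlb": mlb}, path)
saved = joblib.load(path)

print(predict_tags(saved["model"], saved["mlb"], "how to merge pandas dataframe"))
```

Thresholding per-tag probabilities, rather than always returning a fixed number of tags, lets a question receive one tag or several, which fits the multi-label framing; in practice the threshold would usually be tuned on held-out data against the evaluation metric.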
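Micro-averaged F1, the evaluation metric named in the README, pools true positives, false positives, and false negatives across every tag before computing precision and recall. A stdlib-only sketch follows; the function name `micro_f1` and the sample tag sets are hypothetical. On binarized label matrices, scikit-learn's `f1_score(y_true, y_pred, average="micro")` computes the same quantity.

```python
def micro_f1(true_tags, pred_tags):
    """Micro-averaged F1 over paired (true, predicted) tag lists, one pair per question."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, pred_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)  # tags predicted and correct
        fp += len(pred - truth)  # tags predicted but wrong
        fn += len(truth - pred)  # correct tags that were missed
    if tp == 0:
        return 0.0
    # Equivalent to 2PR / (P + R) with P = tp / (tp + fp) and R = tp / (tp + fn).
    return 2 * tp / (2 * tp + fp + fn)


y_true = [["python", "pandas"], ["java"], ["python"]]
y_pred = [["python"], ["java", "spring"], ["python", "pandas"]]
print(micro_f1(y_true, y_pred))  # tp=3, fp=2, fn=1 -> 0.6666666666666666
```

Because counts are pooled before dividing, frequent tags such as `python` dominate the score; a macro average would instead weight every tag equally.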