{"id":25609844,"url":"https://github.com/saksham-jain177/text_classification","last_synced_at":"2026-05-14T22:46:44.208Z","repository":{"id":276996985,"uuid":"930994818","full_name":"saksham-jain177/text_classification","owner":"saksham-jain177","description":"ML pipeline for classifying IMDb reviews as positive or negative using TF-IDF and Logistic Regression. Features an interactive Streamlit UI with caching for efficient predictions.","archived":false,"fork":false,"pushed_at":"2025-02-11T14:56:48.000Z","size":7,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-18T04:06:53.311Z","etag":null,"topics":["machine-learning","nlp","sentiment-analysis","streamlit","text-classification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saksham-jain177.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-11T14:54:24.000Z","updated_at":"2025-02-11T15:53:54.000Z","dependencies_parsed_at":"2025-02-11T15:42:49.999Z","dependency_job_id":"a94769b0-0902-44c1-af30-a07412c9dbf7","html_url":"https://github.com/saksham-jain177/text_classification","commit_stats":null,"previous_names":["saksham-jain177/text_classification"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/saksham-jain177/text_classification","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saksham-jain177%2Ftext_classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/saksham-jain177%2Ftext_classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saksham-jain177%2Ftext_classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saksham-jain177%2Ftext_classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saksham-jain177","download_url":"https://codeload.github.com/saksham-jain177/text_classification/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saksham-jain177%2Ftext_classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280878346,"owners_count":26406641,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-24T02:00:06.418Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","nlp","sentiment-analysis","streamlit","text-classification"],"created_at":"2025-02-21T21:58:44.349Z","updated_at":"2025-10-24T22:34:57.611Z","avatar_url":"https://github.com/saksham-jain177.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Customer Review Sentiment Classification\n\n## Overview\nThis project implements a machine learning pipeline for classifying customer reviews from the IMDb dataset as positive or negative. 
The solution covers data loading, text preprocessing, TF-IDF feature extraction, model training, evaluation, and an interactive Streamlit interface for real-time predictions. **The training pipeline also uses caching and persistent model saving to avoid retraining on every run.**\n\n## Objectives\n- **Data Collection:** Load reviews from the [aclImdb dataset](https://ai.stanford.edu/~amaas/data/sentiment/).\n- **Preprocessing:** Clean and tokenize reviews.\n- **Feature Extraction:** Convert text to TF-IDF features.\n- **Model Training:** Train a classifier (Logistic Regression) to predict sentiment.\n- **Evaluation:** Assess model performance using standard metrics.\n- **User Interface:** Provide an interactive UI for evaluation and review classification.\n\n## Components\n\n### Data Collection \u0026 Preprocessing\n- Load the [aclImdb dataset](https://ai.stanford.edu/~amaas/data/sentiment/) (organized into train/test with positive and negative reviews).\n- Clean and tokenize review texts.\n\n### Feature Extraction\n- Use TF-IDF vectorization to convert reviews into numerical features.\n\n### Model Training \u0026 Evaluation\n- Split the data into training and test sets.\n- Train a Logistic Regression model.\n- Evaluate the model using accuracy, precision, recall, F1-score, and a confusion matrix.\n\n### User Interface\n- A Streamlit app to run the entire pipeline, display evaluation metrics, and classify new reviews.\n- Caching and model persistence to avoid retraining on every run.\n\n## How to Run\n\n1. **Clone the Repository:**\n\n   ```\n   git clone https://github.com/saksham-jain177/text_classification.git\n   cd text_classification\n   ```\n2. **Install Dependencies:**\n   ```\n   pip install -r requirements.txt\n   ```\n3. 
**Run the Application:**\n   ```\n   streamlit run app.py\n   ```\n\n## Directory Structure\n    text_classification/\n    ├── app.py                     # Main application file (Streamlit interface)\n    ├── data/\n    │   └── aclImdb/               # IMDb dataset organized into train/test with pos/neg reviews\n    ├── evaluation.py              # Metrics and visualization\n    ├── feature_extraction.py      # TF-IDF feature extraction\n    ├── model.py                   # Model training\n    ├── preprocessing.py           # Data loading and text preprocessing\n    ├── requirements.txt\n    └── README.md\n\n## Challenges and Insights\n- Balancing data cleaning and feature extraction to capture meaningful signals.\n- Tuning the TF-IDF vectorizer for effective text representation.\n- Achieving robust model performance given the variability in customer reviews.\n\n## Future Improvements\n- Experiment with alternative classifiers (e.g., Naive Bayes, SVM) and ensemble methods.\n- Integrate hyperparameter tuning for optimized performance.\n- Enhance the UI with additional visualizations and batch prediction capabilities.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaksham-jain177%2Ftext_classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaksham-jain177%2Ftext_classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaksham-jain177%2Ftext_classification/lists"}