{"id":18478303,"url":"https://github.com/ayusharma03/codsoft_internship","last_synced_at":"2026-01-29T10:02:43.071Z","repository":{"id":252124074,"uuid":"839494393","full_name":"ayusharma03/CodSoft_Internship","owner":"ayusharma03","description":"CodSoft Internship Projects containing, SMS Spam prediction Model, Customer Churn Prediction and Movie Classification System Based On the Movie's Summary","archived":false,"fork":false,"pushed_at":"2024-08-07T18:53:42.000Z","size":558,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-05T05:04:58.981Z","etag":null,"topics":["bag-of-words","codsoft","codsoft-internship","codsoft-machine-learning","codsoft-virtual-internship","codsoftinternship","machine-learning","nltk","tfidf-vectorizer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ayusharma03.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-07T18:12:50.000Z","updated_at":"2024-08-07T19:30:02.000Z","dependencies_parsed_at":"2024-08-07T21:41:25.252Z","dependency_job_id":null,"html_url":"https://github.com/ayusharma03/CodSoft_Internship","commit_stats":null,"previous_names":["ayusharma03/codsoft_internship"],"tags_count":0,"template":true,"template_full_name":null,"purl":"pkg:github/ayusharma03/CodSoft_Internship","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayusharma03%2FCodSoft_Internship","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/ayusharma03%2FCodSoft_Internship/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayusharma03%2FCodSoft_Internship/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayusharma03%2FCodSoft_Internship/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ayusharma03","download_url":"https://codeload.github.com/ayusharma03/CodSoft_Internship/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ayusharma03%2FCodSoft_Internship/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28875446,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-29T09:47:23.353Z","status":"ssl_error","status_checked_at":"2026-01-29T09:47:19.357Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bag-of-words","codsoft","codsoft-internship","codsoft-machine-learning","codsoft-virtual-internship","codsoftinternship","machine-learning","nltk","tfidf-vectorizer"],"created_at":"2024-11-06T12:09:34.844Z","updated_at":"2026-01-29T10:02:43.046Z","avatar_url":"https://github.com/ayusharma03.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CodSoft Internship 🚀\n\nThis project is the first of three projects I am working on during my internship at Codsoft. 
The internship provides an opportunity to apply theoretical knowledge to real-world problems, enhance technical skills, and gain valuable industry experience.\n\n## SMS Spam Prediction Machine Learning Project 📱🤖\n\nThis project builds a machine learning model that predicts whether an SMS message is spam. The dataset used for training and evaluation is imported from Kaggle. The project uses numpy, pandas, os, and nltk for data manipulation and natural language processing.\n\n![alt text](/SpamSms/image.png)\n\n## Table of Contents\n\n1. [Project Overview](#project-overview)\n2. [Technologies Used](#technologies-used)\n3. [Setup and Installation](#setup-and-installation)\n4. [Dataset](#dataset)\n5. [Data Preprocessing](#data-preprocessing)\n6. [Results](#results)\n\n## Project Overview 📝\n\nThe objective of this project is to classify SMS messages as spam or not spam using machine learning techniques. The project involves data preprocessing, feature extraction, model training, and evaluation.\n\n## Technologies Used 🛠️\n\n- **Python**: The primary programming language used.\n- **numpy**: For numerical operations.\n- **pandas**: For data manipulation and analysis.\n- **os**: For handling file paths and directories.\n- **nltk**: For natural language processing tasks.\n\n## Setup and Installation ⚙️\n\n1. Clone the repository to your local machine:\n    ```bash\n    git clone https://github.com/ayusharma03/sms-spam-prediction.git\n    ```\n\n2. Navigate to the project directory:\n    ```bash\n    cd sms-spam-prediction\n    ```\n\n3. Install the required dependencies:\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n4. Ensure you have the necessary NLTK data:\n    ```python\n    import nltk\n    nltk.download('stopwords')\n    nltk.download('punkt')\n    ```\n\n## Dataset 📊\n\nThe dataset used in this project is sourced from Kaggle. 
It contains a collection of SMS messages labeled as spam or not spam.\n\nThe dataset is available on Kaggle: [Dataset](https://kaggle.com/datasets/uciml/sms-spam-collection-dataset/code)\n\nTo load the dataset, the following code is used:\n```python\nimport os\n\nimport pandas as pd\n\n# List every file available under the Kaggle input directory\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n    for filename in filenames:\n        print(os.path.join(dirname, filename))\n\n# relative_path is a placeholder: substitute the directory that contains spam.csv\ndf1 = pd.read_csv(\"/relative_path/spam.csv\", encoding='latin1')\n```\n\n## Data Preprocessing 🧹\n\nData preprocessing involves cleaning the text data, tokenizing, removing stop words, and vectorizing the text. The `nltk` library is used extensively for these tasks.\n\n## Results 📈\n\nThe model reaches an accuracy of 0.9524 (95.24%) on the test dataset, which indicates that it identifies spam messages effectively.\n\n\n---\n\nFeel free to reach out if you have any questions or suggestions. Happy coding! 💻","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fayusharma03%2Fcodsoft_internship","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fayusharma03%2Fcodsoft_internship","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fayusharma03%2Fcodsoft_internship/lists"}