{"id":19618957,"url":"https://github.com/arjunan-k/spilter","last_synced_at":"2025-09-01T01:33:47.960Z","repository":{"id":118925588,"uuid":"581581772","full_name":"arjunan-k/Spilter","owner":"arjunan-k","description":"Build a model for classifying the email \u0026 SMS messages into spam or not spam using standard classifiers.","archived":false,"fork":false,"pushed_at":"2022-12-23T16:21:02.000Z","size":9156,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-26T18:32:02.673Z","etag":null,"topics":["machine-learning","naive-bayes-algorithm","spam-detection","spam-filtering"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arjunan-k.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-23T16:07:14.000Z","updated_at":"2022-12-23T16:18:01.000Z","dependencies_parsed_at":"2023-12-26T03:16:40.506Z","dependency_job_id":null,"html_url":"https://github.com/arjunan-k/Spilter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arjunan-k/Spilter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arjunan-k%2FSpilter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arjunan-k%2FSpilter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arjunan-k%2FSpilter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arjunan-k%2FSpilter/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arjunan-k","download_url":"https://codeload.github.com/arjunan-k/Spilter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arjunan-k%2FSpilter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273064382,"owners_count":25039259,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","naive-bayes-algorithm","spam-detection","spam-filtering"],"created_at":"2024-11-11T11:11:35.850Z","updated_at":"2025-09-01T01:33:47.940Z","avatar_url":"https://github.com/arjunan-k.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  \u003cbr\u003e\n  SPILTER\n  \u003cbr\u003e\n  \u003ch4 align=\"center\"\u003eIn this project I build a machine learning model that classifies email and SMS messages as spam or not spam.\u003c/h4\u003e\n  \u003cbr\u003e\n  \u003ca href=\"https://github.com/arjunan-k/Spilter\"\u003e\u003cimg src=\"https://github.com/arjunan-k/Spilter/blob/main/jupyter/bg.jpg?raw=true\" alt=\"SPILTER\"\u003e\u003c/a\u003e\n\u003c/h1\u003e\n\n## What It Does:\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003cimg 
src=\"https://github.com/arjunan-k/Spilter/blob/main/jupyter/1.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## Preview:\nhttps://user-images.githubusercontent.com/104669486/208367453-f84989f2-83f6-46ca-bf51-fbc92506aa89.mp4\n\n\u003c!-- \u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003cimg src=\"https://github.com/arjunan-k/Spilter/blob/main/jupyter/demo.gif?raw=true\"\u003e\n\u003c/p\u003e --\u003e\n\n## How It Works:\nThe text and the target class (spam or ham) are extracted from the dataset. A TF-IDF vectorizer converts the message text into numeric input features, and a standard Multinomial Naive Bayes classifier (`MultinomialNB`) then classifies each message as spam or not spam.\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003cimg src=\"https://github.com/arjunan-k/Spilter/blob/main/jupyter/2.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## Prerequisites:\nBefore you start, I would highly recommend having a toolchain and development environment installed and ready. If you have no idea where to start, try a combination like:\n-  `Python`\n-  `scikit-learn` / `sklearn`\n-  `Pandas`\n-  `NumPy`\n-  `matplotlib`\n-  An environment to work in, such as `Jupyter` or `Spyder`\n\nOn Linux, your package manager should be able to handle all of this. If it can't, install at least Python and pip, then use pip to install the packages above.\n\n## Dataset:\nThe SMS/Email Spam Collection is a set of tagged SMS messages collected for SMS/email spam research. It contains 5,567 SMS messages in English, each tagged as ham (legitimate) or spam.\n\n\u003e You can download the raw dataset from [here](https://raw.githubusercontent.com/arjunan-k/Spilter/main/Jupyter/spam.csv).\n\nThe file contains one message per line. 
Each line is composed of two columns:\n- `Class` - contains the label (ham or spam)\n- `Message` - contains the raw text.\n\n## Model Pipeline:\n\u003cp align=\"center\"\u003e\n  \u003cbr\u003e\n  \u003cimg src=\"https://github.com/arjunan-k/Spilter/blob/main/jupyter/3.jpg?raw=true\"\u003e\n\u003c/p\u003e\n\n## Components:\n-  Use TF-IDF to extract features from the message text.\n-  Use splits suited to skewed data (ham messages far outnumber spam messages, so the classes are imbalanced).\n-  Use different standard classifiers to classify the SMS/email messages.\n-  Compare the classifiers using standard classification metrics such as accuracy and precision.\n\n## Accuracy Result:\n```python\n# Assumes `df` is the preprocessed DataFrame built earlier in the notebook, with a\n# \"transformed_text\" column (cleaned message text) and an encoded \"target\" column\n# (assumed 0 = ham, 1 = spam).\nfrom sklearn.feature_extraction.text import TfidfVectorizer\nfrom sklearn.metrics import accuracy_score, precision_score\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.naive_bayes import MultinomialNB\n\n# Represent each message as a TF-IDF vector over the 3,000 most frequent terms\ntfidf = TfidfVectorizer(max_features=3000)\nx = tfidf.fit_transform(df[\"transformed_text\"]).toarray()\ny = df[\"target\"].values\n\n# Hold out 20% of the messages as the test set\nx_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=2)\n\nmnb = MultinomialNB()\nmnb.fit(x_train, y_train)\ny_pred = mnb.predict(x_test)\n\nprint(\"MultinomialNB TfidfVectorizer with max_features=3000\")\nprint(f\"accuracy: {accuracy_score(y_test, y_pred)}\")\n# precision_score uses label 1 (assumed spam) as the positive class by default\nprint(f\"precision: {precision_score(y_test, y_pred)}\")\n```\n`Multinomial Naive Bayes with TfidfVectorizer having max_features=3000`\n```text\naccuracy: 0.9709864603481625\nprecision: 1.0\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farjunan-k%2Fspilter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farjunan-k%2Fspilter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farjunan-k%2Fspilter/lists"}