{"id":25026141,"url":"https://github.com/amruta33/email_classification_app","last_synced_at":"2026-04-14T15:32:44.993Z","repository":{"id":229166523,"uuid":"775956072","full_name":"amruta33/Email_Classification_App","owner":"amruta33","description":"Developing an application to analyze whether a message is spam or ham using Flask.","archived":false,"fork":false,"pushed_at":"2024-03-30T10:51:49.000Z","size":3404,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-30T15:47:36.473Z","etag":null,"topics":["css","flask","html","nlp","pickle","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amruta33.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-22T11:39:52.000Z","updated_at":"2024-03-30T10:50:30.000Z","dependencies_parsed_at":"2025-03-30T15:44:47.247Z","dependency_job_id":"830bbe6f-9626-4d31-bc17-831cf92a071e","html_url":"https://github.com/amruta33/Email_Classification_App","commit_stats":null,"previous_names":["amruta33/email_classification_app"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/amruta33/Email_Classification_App","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amruta33%2FEmail_Classification_App","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amruta33%2FEmail_Classification_App/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amruta33%2FEmail_Cl
assification_App/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amruta33%2FEmail_Classification_App/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amruta33","download_url":"https://codeload.github.com/amruta33/Email_Classification_App/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amruta33%2FEmail_Classification_App/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31803331,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T11:13:53.975Z","status":"ssl_error","status_checked_at":"2026-04-14T11:13:53.299Z","response_time":153,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["css","flask","html","nlp","pickle","python"],"created_at":"2025-02-05T17:19:26.974Z","updated_at":"2026-04-14T15:32:44.976Z","avatar_url":"https://github.com/amruta33.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\r\n## Email Spam Classification using NLP and ML Techniques\r\n\r\n\u003cdiv style=\"text-align: center; background: #ff8c00; font-family: 'Montserrat', sans-serif; color: white; padding: 15px; font-size: 30px; font-weight: bold; line-height: 1; border-radius: 20px 20px 0 0; margin-bottom: 20px; box-shadow: 0px 4px 6px rgba(0, 0, 0, 
0.2);\"\u003e🚀 SMS Spam Classification: Detecting Unwanted Messages 🚀\u003c/div\u003e  \r\n\r\nOverview\r\n\r\nThe objective of this project is to build a classifier that can differentiate between spam and non-spam emails using Natural Language Processing (NLP) techniques and Machine Learning (ML) algorithms. The dataset consists of email messages labeled as either \"ham\" (non-spam) or \"spam\". Through the utilization of NLP methods and ML algorithms, we aim to develop a model capable of accurately classifying incoming emails as either spam or non-spam, thereby assisting in the identification and filtering of unwanted or potentially harmful messages.\r\n\r\n\r\nKey Insights:\r\nImbalanced Dataset: The dataset exhibits an imbalance, with a higher proportion of spam emails.\r\nHigher Word Counts: Spam emails tend to have higher word counts compared to non-spam emails,\r\nas observed during Exploratory Data Analysis (EDA).\r\n\r\nNLP Techniques Utilized:\r\nText Preprocessing:\r\nConvert text to lowercase.\r\nTokenization: Split text into individual words.\r\nRemoval of stopwords and punctuation.\r\nStemming: Reduce words to their root form.\r\n\r\nTF-IDF Vectorization:\r\nTerm Frequency-Inverse Document Frequency considers the importance of terms in the entire dataset.\r\n  1. Term Frequency (TF): This measures how frequently a term appears in a document. It is calculated by dividing the number of occurrences of a term in a document by the total number of terms in the document.\r\n  2. Inverse Document Frequency (IDF): This measures the rarity of a term across the entire dataset. 
It is calculated by taking the logarithm of the ratio of the total number of documents to the number of documents containing the term.\r\nThis is advantageous for text classification tasks, especially with an imbalanced dataset.\r\n\r\nML Algorithms Employed:\r\nMultinomial Naive Bayes (MNB):\r\n\r\nInitially explored for its suitability in text classification tasks.\r\n\r\nSupport Vector Machine (SVM):\r\n\r\nDemonstrated high accuracy in classifying emails.\r\n\r\nK-Nearest Neighbors (KNN):\r\n\r\nAchieved decent accuracy but lower precision compared to MNB and SVM.\r\n\r\nRandom Forest:\r\n\r\nYielded high accuracy and precision.\r\n\r\nUltimate Selection:\r\nMultinomial Naive Bayes (MNB):\r\nChosen ultimately for its effectiveness in text classification tasks, particularly when combined with TF-IDF vectorization.\r\nRandom Forest Accuracy: 97.69%\r\n\r\n\r\nFurther Enhancements:\r\nPrecision over Accuracy: Due to the dataset's imbalance, emphasis was placed on precision over accuracy in model evaluation.\r\n\r\nTF-IDF vs Bag-of-Words: TF-IDF was chosen over Bag-of-Words due to its ability to assign higher weights to rare terms, making it more suitable for imbalanced datasets.\r\n\r\nConclusion:\r\nThis project showcases the efficacy of Natural Language Processing (NLP) techniques when coupled with Machine Learning (ML) algorithms for the classification of spam emails. With a focus on precision, the classifier prioritizes accurately identifying spam emails while minimizing false positives. Leveraging the Multinomial Naive Bayes algorithm, the classifier demonstrates reliable performance. 
Through meticulous feature selection and model training, the classifier achieves a high level of accuracy in distinguishing spam from legitimate emails, thereby enhancing email security and user experience.\r\n\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famruta33%2Femail_classification_app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famruta33%2Femail_classification_app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famruta33%2Femail_classification_app/lists"}