Naïve Bayes

1. What is the Naïve Bayes Algorithm?

Naive Bayes is a simple but powerful classification algorithm based on Bayes' theorem, with an assumption of independence among the predictors.
The Naive Bayes classifier assumes that the presence of a feature in a class is unrelated to the presence of any other feature, given the class. It can be used for both binary and multi-class classification problems.

2. Bayes Theorem

Bayes' theorem describes the probability of an event based on prior knowledge of conditions that may be related to that event, which gives a way to compute conditional probabilities.
Assume we have a hypothesis (H) and evidence (E). According to Bayes' theorem, the probability of the hypothesis before observing the evidence, P(H), and the probability of the hypothesis after observing the evidence, P(H|E), are related by:

P(H|E) = P(E|H) * P(H) / P(E)

Prior probability: P(H) is the probability before observing the evidence.
Posterior probability: P(H|E) is the probability after observing the evidence.

In general,

P(class|data) = (P(data|class) * P(class)) / P(data)

Bayes Theorem Example:
Suppose we want the probability that a randomly picked card is a King, given that it is a face card.
There are 4 Kings in a deck of 52 cards, so P(King) = 4/52.
All Kings are face cards, so P(Face|King) = 1.
There are 3 face cards in each 13-card suit and 4 suits in total, so P(Face) = 12/52.
Therefore,
P(King|Face) = P(Face|King) * P(King) / P(Face) = 1 * (4/52) / (12/52) = 1/3

Types of Naïve Bayes:

Three distributions are used so commonly that a Naive Bayes implementation is often named after the distribution it uses.
For example:
Binomial Naive Bayes: Naive Bayes that uses a binomial distribution.
Multinomial Naive Bayes: Naive Bayes that uses a multinomial distribution.
Gaussian Naive Bayes: Naive Bayes that uses a Gaussian distribution.
A dataset with mixed data types for its input variables may require a different type of distribution for each variable.
Using one of the three common distributions is not mandatory. For example, if a real-valued variable is known to follow a different specific distribution, such as the exponential distribution, then that distribution may be used instead. If a real-valued variable does not follow a well-defined distribution (for example, it is bimodal or multimodal), then a kernel density estimator can be used to estimate its probability distribution instead.

1 The Classifier
The Naive Bayes classifier selects the most likely classification v_NB given the attribute values a_1, a_2, ..., a_n. This results in:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{i} P(a_i \mid v_j) \qquad (1)$$

We generally estimate P(a_i|v_j) using m-estimates:

$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m} \qquad (2)$$

where:
n = the number of training examples for which v = v_j
n_c = the number of examples for which v = v_j and a = a_i
p = an a priori estimate for P(a_i|v_j)
m = the equivalent sample size

2 Car Theft Example
The attributes are Color, Type, and Origin, and the target, Stolen, can be either Yes or No.

2.1 Data set

| Example No. | Color | Type | Origin | Stolen? |
|---|---|---|---|---|
| 1 | Red | Sports | Domestic | Yes |
| 2 | Red | Sports | Domestic | No |
| 3 | Red | Sports | Domestic | Yes |
| 4 | Yellow | Sports | Domestic | No |
| 5 | Yellow | Sports | Imported | Yes |
| 6 | Yellow | SUV | Imported | No |
| 7 | Yellow | SUV | Imported | Yes |
| 8 | Yellow | SUV | Domestic | No |
| 9 | Red | SUV | Imported | No |
| 10 | Red | Sports | Imported | Yes |

2.2 Training example
We want to classify a Red Domestic SUV. Note that there is no example of a Red Domestic SUV in our data set. Looking back at equation (1), we can see how to compute this.
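Equations (1) and (2) can be sketched in plain Python on the data set above, which also cross-checks the hand calculation in this section. This is a minimal sketch, not code from the repository's notebook. The names `DATA`, `m_estimate`, `score`, and `classify` are hypothetical. The sketch assumes p = 0.5 and m = 3 for every attribute, which are the values chosen in the worked example.

```python
from math import prod

# Car theft data set from section 2.1: (Color, Type, Origin, Stolen?)
DATA = [
    ("Red", "Sports", "Domestic", "Yes"),
    ("Red", "Sports", "Domestic", "No"),
    ("Red", "Sports", "Domestic", "Yes"),
    ("Yellow", "Sports", "Domestic", "No"),
    ("Yellow", "Sports", "Imported", "Yes"),
    ("Yellow", "SUV", "Imported", "No"),
    ("Yellow", "SUV", "Imported", "Yes"),
    ("Yellow", "SUV", "Domestic", "No"),
    ("Red", "SUV", "Imported", "No"),
    ("Red", "Sports", "Imported", "Yes"),
]


def m_estimate(n_c, n, p, m):
    """Equation (2): P(a_i | v_j) = (n_c + m * p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def score(query, label, data=DATA, p=0.5, m=3):
    """P(v_j) times the product of the m-estimates P(a_i | v_j), as inside the argmax of equation (1)."""
    rows = [r for r in data if r[-1] == label]  # training examples with v = v_j
    n = len(rows)
    prior = n / len(data)
    likelihoods = [
        m_estimate(sum(1 for r in rows if r[i] == value), n, p, m)  # n_c for attribute i
        for i, value in enumerate(query)
    ]
    return prior * prod(likelihoods)


def classify(query, labels=("Yes", "No")):
    """Return the label that maximizes score(), i.e. v_NB in equation (1)."""
    return max(labels, key=lambda label: score(query, label))


if __name__ == "__main__":
    query = ("Red", "SUV", "Domestic")
    for label in ("Yes", "No"):
        print(label, round(score(query, label), 4))  # Yes 0.0385, then No 0.0692
    print(classify(query))  # No
```

The printed scores are the unrounded products (0.0385 for Yes and 0.0692 for No). They lead to the same prediction as the calculation done by hand.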
We need to calculate the probabilities P(Red|Yes), P(SUV|Yes), P(Domestic|Yes), P(Red|No), P(SUV|No), and P(Domestic|No), and multiply them by P(Yes) and P(No) respectively. We can estimate these values using equation (2).

| Attribute value | Class | n | n_c | p | m |
|---|---|---|---|---|---|
| Red | Yes | 5 | 3 | .5 | 3 |
| Red | No | 5 | 2 | .5 | 3 |
| SUV | Yes | 5 | 1 | .5 | 3 |
| SUV | No | 5 | 3 | .5 | 3 |
| Domestic | Yes | 5 | 2 | .5 | 3 |
| Domestic | No | 5 | 3 | .5 | 3 |

Looking at P(Red|Yes), we have 5 cases where v_j = Yes, and in 3 of those cases a_i = Red. So for P(Red|Yes), n = 5 and n_c = 3. Note that all attributes are binary (two possible values). We are assuming no other information, so p = 1/(number of attribute values) = 0.5 for all of our attributes. Our m value is arbitrary (we will use m = 3) but consistent for all attributes. Now we simply apply equation (2) using the precomputed values of n, n_c, p, and m:

P(Red|Yes) = (3 + 3 * .5) / (5 + 3) = .5625
P(SUV|Yes) = (1 + 3 * .5) / (5 + 3) = .3125
P(Domestic|Yes) = (2 + 3 * .5) / (5 + 3) = .4375
P(Red|No) = (2 + 3 * .5) / (5 + 3) = .4375
P(SUV|No) = (3 + 3 * .5) / (5 + 3) = .5625
P(Domestic|No) = (3 + 3 * .5) / (5 + 3) = .5625

We have P(Yes) = .5 and P(No) = .5, so we can apply equation (1). For v = Yes, we have
P(Yes) * P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) = .5 * .5625 * .3125 * .4375 ≈ .038
and for v = No, we have
P(No) * P(Red|No) * P(SUV|No) * P(Domestic|No) = .5 * .4375 * .5625 * .5625 ≈ .069
Since .069 > .038, our example is classified as 'No'.

Task
About the dataset: it is for non-functional requirement analysis and has 5 different classes.

Explore the dataset carefully:
1. Plot the class counts.
2. Encode the labels.
3. Count the words in each row.
4. Convert the text to lower case and split it into words.
5. Remove alphanumeric tokens.
6. Remove stop words, e.g. the, is, an, a, here, their, there (without nltk).
7. Split the dataset 75/25 into training and test sets.
8. Use Bag of Words for vectorization (feature extraction).
9. Implement the models (variations of Naive Bayes).
10. Measure accuracy and, because of class imbalance, the F1-score.
11. Compare the different variations.
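Several of the task steps above can be sketched in plain Python:

- lower-casing and splitting the text
- dropping tokens that contain digits
- stop-word removal
- a shuffled 75/25 index split
- bag-of-words count vectors
- a multinomial Naive Bayes model
- a macro-averaged F1-score

This is a minimal sketch, not the repository's notebook code. Every function and class name is hypothetical, as are the stop-word list, the smoothing parameter `alpha`, and the toy documents and labels. Reading "remove alphanumeric tokens" as "drop tokens that contain digits" is also an assumption.

```python
import math
import random
import re
from collections import Counter

# Hypothetical hand-written stop-word list (the task asks for stop-word removal without nltk).
STOP_WORDS = {"the", "is", "an", "a", "here", "their", "there", "and", "of", "to", "be"}


def tokenize(text):
    """Lower-case and split into words, drop tokens containing digits, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w.isalpha() and w not in STOP_WORDS]


def train_test_split(n_samples, test_size=0.25, seed=0):
    """Shuffle indices and return (train_idx, test_idx) for a 75/25 split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_size)
    return idx[n_test:], idx[:n_test]


def build_vocab(docs):
    """Map each word seen in the (training) documents to a column index."""
    return {w: i for i, w in enumerate(sorted({w for d in docs for w in tokenize(d)}))}


def bag_of_words(docs, vocab):
    """One count vector per document; words missing from vocab are ignored."""
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for w, count in Counter(tokenize(doc)).items():
            if w in vocab:
                row[vocab[w]] = count
        rows.append(row)
    return rows


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing, scored in log space."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_, self.log_likelihood_ = {}, {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            totals = [sum(col) + self.alpha for col in zip(*rows)]  # smoothed word counts per class
            denom = sum(totals)
            self.log_prior_[c] = math.log(len(rows) / len(X))
            self.log_likelihood_[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, X):
        def log_score(x, c):
            # log P(c) + sum over words of count * log P(word | c)
            return self.log_prior_[c] + sum(n * lp for n, lp in zip(x, self.log_likelihood_[c]))

        return [max(self.classes_, key=lambda c: log_score(x, c)) for x in X]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy requirements and labels, NOT the repository's dataset.
    docs = [
        "The system shall respond within 2 seconds",
        "Search results load fast under heavy traffic",
        "Only authorised staff can view patient records",
        "Passwords are stored encrypted",
    ]
    labels = ["performance", "performance", "security", "security"]
    vocab = build_vocab(docs)
    model = MultinomialNB().fit(bag_of_words(docs, vocab), labels)
    print(model.predict(bag_of_words(["Patient records must stay encrypted"], vocab)))  # ['security']
```

The multinomial variant fits bag-of-words counts directly. A binomial (Bernoulli) variant would instead model whether each word is present or absent, and a Gaussian variant would model continuous features. Comparing the variations therefore mostly means swapping the likelihood computed in `fit`. Macro-averaged F1 weights every class equally, which is why it is less misleading than accuracy when the classes are imbalanced.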