Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/anuraganalog/navie-bayes
Implemented Gaussian Naive Bayes Classifier from scratch
- Host: GitHub
- URL: https://github.com/anuraganalog/navie-bayes
- Owner: AnuragAnalog
- License: gpl-3.0
- Created: 2020-11-02T04:39:45.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-21T14:50:45.000Z (about 4 years ago)
- Last Synced: 2025-01-05T21:39:46.077Z (29 days ago)
- Topics: bayes, classifier, dataset, datasets, gaussian, guassian-naive-bayes, iris, multinomial-naive-bayes, naive, naive-bayes-classifiers, numpy, py, scratch
- Language: Python
- Size: 25.1 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Naive Bayes Classifiers
The file `naivebayes.py` contains implementations of the Gaussian and Multinomial Naive Bayes classifiers.
To keep things simple, I have restricted the inputs to NumPy arrays.
![formula](./formula.png)
## Gaussian Naive Bayes
The formula above is used to compute the posterior probability. In Gaussian Naive Bayes, the likelihood is computed with a univariate or multivariate normal distribution, depending on the number of features.
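In its standard form (the notation may differ from the image above), Gaussian Naive Bayes assumes the features $x_1, \dots, x_n$ are conditionally independent given the class $c$ and models each feature with a univariate normal distribution whose mean $\mu_{c,i}$ and standard deviation $\sigma_{c,i}$ are estimated per class, presumably what `class_mean_` and `class_std_` hold. Taking logs turns the product into a sum and avoids numerical underflow:

$$
P(c \mid \mathbf{x}) \propto P(c) \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}}\right),
\qquad
\hat{y} = \arg\max_{c} \left[\log P(c) - \frac{1}{2}\sum_{i=1}^{n} \left(\log\left(2\pi\sigma_{c,i}^{2}\right) + \frac{(x_i - \mu_{c,i})^{2}}{\sigma_{c,i}^{2}}\right)\right]
$$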
```python3
self.n_features_ # Number of features
self.n_classes_ # Number of classes
self.class_mean_ # Class means
self.class_std_ # Class standard deviations
self.prior_proba_ # Prior probability of each class
self.class_encoding_ # Class encodings
```

### Example using Iris dataset
```ipython
In [1]: import pandas as pd
In [2]: from naivebayes import GaussianNB
In [3]: from sklearn.model_selection import train_test_split
In [4]: data = pd.read_csv('Iris.csv', index_col='Id')
In [5]: train_X, test_X, train_y, test_y = train_test_split(data.loc[:, data.columns != 'Species'], data['Species'], test_size=0.2)
In [6]: clf = GaussianNB()
In [7]: clf.fit(train_X.values, train_y.values)
In [8]: clf.predict(test_X.values)
Out[8]: array([2, 2, 0, 1, 1, 2, 0, 0, 0, 2, 0, 2, 1, 1, 2, 1, 1, 0, 1, 1, 2, 0, 0, 2, 1, 0, 2, 0, 0, 0])
In [9]: clf.evaluate(train_X.values, train_y.values) # R^2 score on training data
Out[9]: 0.9248747913188647
In [10]: clf.evaluate(test_X.values, test_y.values) # R^2 score on testing data
Out[10]: 0.9486301369863014
```

> Adjust the data-loading step to match your dataset file.
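The fitted attributes listed for `GaussianNB` suggest how such a classifier can be built with NumPy alone. The sketch below is a hypothetical re-implementation, not the code in `naivebayes.py`: the class name `SketchGaussianNB` and the small variance floor `eps` are assumptions. It reuses the README's attribute names, estimates per-class feature means, standard deviations and priors, and predicts the class that maximizes the log prior plus the sum of per-feature Gaussian log-densities.

```python
import numpy as np


class SketchGaussianNB:
    """Hypothetical minimal Gaussian Naive Bayes (not the code in naivebayes.py)."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # assumed variance floor so a zero std never divides by zero

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Map arbitrary labels (e.g. species strings) to integer codes 0..k-1
        self.class_encoding_, y_enc = np.unique(y, return_inverse=True)
        self.n_classes_ = len(self.class_encoding_)
        self.n_features_ = X.shape[1]
        # Per-class, per-feature mean and standard deviation, shape (k, n)
        self.class_mean_ = np.array([X[y_enc == k].mean(axis=0) for k in range(self.n_classes_)])
        self.class_std_ = np.array([X[y_enc == k].std(axis=0) for k in range(self.n_classes_)])
        # Prior: fraction of training samples in each class
        self.prior_proba_ = np.bincount(y_enc) / len(y_enc)
        return self

    def _joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)
        var = self.class_std_ ** 2 + self.eps                 # (k, n)
        diff = X[:, None, :] - self.class_mean_[None, :, :]   # (m, k, n)
        # Sum of univariate normal log-densities over features -> (m, k)
        log_lik = -0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var).sum(axis=2)
        return np.log(self.prior_proba_) + log_lik

    def predict(self, X):
        # Integer class codes; index class_encoding_ to recover the labels
        return np.argmax(self._joint_log_likelihood(X), axis=1)
```

Working in log space avoids the underflow that multiplying many small densities can cause, and `clf.class_encoding_[clf.predict(X)]` maps the integer predictions back to the original labels.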
## Multinomial Naive Bayes
The figure above shows the formula used by Multinomial Naive Bayes.
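In the standard multinomial model, each sample is a vector of non-negative feature counts $x_1, \dots, x_n$ (for example word counts), and each class $c$ has feature probabilities $\theta_{c,i}$ estimated from $N_{c,i}$, the total count of feature $i$ over the training samples of class $c$. The smoothing constant $\alpha$ ($\alpha = 1$ gives Laplace smoothing) is an assumption: the README does not say whether `naivebayes.py` smooths the counts. The multinomial coefficient does not depend on the class, so it is dropped from the decision rule:

$$
\hat{\theta}_{c,i} = \frac{N_{c,i} + \alpha}{\sum_{j=1}^{n} N_{c,j} + \alpha n},
\qquad
\hat{y} = \arg\max_{c} \left[\log P(c) + \sum_{i=1}^{n} x_i \log \hat{\theta}_{c,i}\right]
$$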
```python3
self.n_features_ # Number of features
self.n_classes_ # Number of classes
self.prior_proba_ # Prior probability of each class
self.class_encoding_ # Class encodings
```

### Example using IMDB dataset
```ipython
In [1]: import pandas as pd
In [2]: from naivebayes import MultinomialNB
In [3]: from sklearn.model_selection import train_test_split
In [4]: data = pd.read_csv('imdb.zip', compression='zip')
In [5]: train_X, test_X, train_y, test_y = train_test_split(data.loc[:, data.columns != 'sentiment'], data['sentiment'], test_size=0.2)
In [6]: clf = MultinomialNB()
In [7]: clf.fit(train_X.values, train_y.values)
In [8]: clf.predict(test_X.values)
Out[8]: array([0 0 0 ... 1 0 0])
In [9]: clf.evaluate(train_X.values, train_y.values) # MSE on training data
Out[9]: 0.4501
In [10]: clf.evaluate(test_X.values, test_y.values) # MSE on testing data
Out[10]: 0.4517
```
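A comparable hypothetical sketch for the multinomial variant follows. It assumes each row of `X` holds non-negative feature counts (for example word counts) and that additive smoothing with `alpha=1.0` is applied; `SketchMultinomialNB` and `feature_log_prob_` are illustrative names, not part of `naivebayes.py`.

```python
import numpy as np


class SketchMultinomialNB:
    """Hypothetical minimal Multinomial Naive Bayes on count features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed additive (Laplace) smoothing constant

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Map labels to integer codes 0..k-1
        self.class_encoding_, y_enc = np.unique(y, return_inverse=True)
        self.n_classes_ = len(self.class_encoding_)
        self.n_features_ = X.shape[1]
        self.prior_proba_ = np.bincount(y_enc) / len(y_enc)
        # counts[c, i]: total count of feature i over training samples of class c
        counts = np.array([X[y_enc == k].sum(axis=0) for k in range(self.n_classes_)])
        smoothed = counts + self.alpha
        # log theta[c, i] = log((N[c, i] + alpha) / (sum_j N[c, j] + alpha * n))
        self.feature_log_prob_ = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # (m, n) @ (n, k) -> (m, k) count-weighted log-likelihood, plus log prior
        jll = X @ self.feature_log_prob_.T + np.log(self.prior_proba_)
        return np.argmax(jll, axis=1)
```

Because the log-likelihood is linear in the counts, prediction reduces to a single matrix product, which stays cheap even for large vocabularies.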