Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dayyass/extended-naive-bayes
[WIP] Extension of sklearn Naive Bayes models that allows sampling and more feature distributions.
- Host: GitHub
- URL: https://github.com/dayyass/extended-naive-bayes
- Owner: dayyass
- License: mit
- Created: 2021-05-22T19:53:48.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-06-18T08:44:13.000Z (over 3 years ago)
- Last Synced: 2023-03-06T20:35:49.682Z (almost 2 years ago)
- Topics: data-science, distributions, generative-model, machine-learning, naive-bayes, python, sampling, scikit-learn
- Language: Python
- Homepage:
- Size: 120 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Extended Naive Bayes
### Installation
```bash
# clone repo
git clone https://github.com/dayyass/naive_bayes.git

# install dependencies
cd naive_bayes
pip install -r requirements.txt
```

### Usage
The repository consists of two modules:
1) `distributions` - contains different parametric distributions (*univariate/multivariate*, *discrete/continuous*) to fit to the data;
2) `models` - contains different naive Bayes models.

#### 1. Distributions
The `distributions` module contains different parametric distributions (*univariate/multivariate*, *discrete/continuous*) to fit to the data.
All distributions share the same interface (methods):
- `.fit(X, y)` - compute the MLE given `X` (data); if `y` is provided, compute the MLE of `X` for each class `y`;
- `.predict_log_proba(X)` - compute log probabilities given `X` (data).

> :warning: If `y` was provided to `.fit`, then `.predict_log_proba` computes log probabilities for each class `y`.
List of available distributions:

Distribution | Discrete | Continuous
--- | --- | ---
**Univariate** | `Bernoulli`<br>`Binomial`<br>`Categorical`<br>`Geometric`<br>`Poisson` | `Normal`<br>`Exponential`<br>`Gamma`<br>`Beta`
**Multivariate** | `Multinomial` | `MultivariateNormal`

There are also two special kinds of distributions:
- [x] `ContinuousUnivariateDistribution` - any continuous univariate distribution from `scipy.stats` that implements `.fit` (*scipy.stats.rv_continuous.fit*) (see *example 3*);
- [x] `KernelDensityEstimator` - Kernel Density Estimation (*Parzen–Rosenblatt window method*), a non-parametric method (see *example 4*).

#### Example 1: Bernoulli distribution
```python3
import numpy as np

from naive_bayes.distributions import Bernoulli
n_classes = 3
n_samples = 100
X = np.random.randint(low=0, high=2, size=n_samples)
y = np.random.randint(
    low=0, high=n_classes, size=n_samples
)  # categorical feature

# if only X is provided to fit, fit the marginal distribution p(X)
distribution = Bernoulli()
distribution.fit(X)
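# .fit computes the MLE (see the interface above); for a Bernoulli,
# the estimated parameter p is simply the sample mean of X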
distribution.predict_log_proba(X)

# if X and y are provided to fit, fit the conditional distribution p(X|y)
distribution = Bernoulli()
distribution.fit(X, y)
distribution.predict_log_proba(X)
```

#### Example 2: Normal distribution
```python3
import numpy as np

from naive_bayes.distributions import Normal
n_classes = 3
n_samples = 100
X = np.random.randn(n_samples)
y = np.random.randint(
    low=0, high=n_classes, size=n_samples
)  # categorical feature

# if only X is provided to fit, fit the marginal distribution p(X)
distribution = Normal()
distribution.fit(X)
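# for a Normal, the MLE is the sample mean and the (biased, 1/n) sample variance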
distribution.predict_log_proba(X)

# if X and y are provided to fit, fit the conditional distribution p(X|y)
distribution = Normal()
distribution.fit(X, y)
distribution.predict_log_proba(X)
```

#### Example 3: ContinuousUnivariateDistribution
```python3
import numpy as np
from scipy import stats

from naive_bayes.distributions import ContinuousUnivariateDistribution
n_classes = 3
n_samples = 100
X = np.random.randn(n_samples)
y = np.random.randint(
    low=0, high=n_classes, size=n_samples
)  # categorical feature

# if only X is provided to fit, fit the marginal distribution p(X)
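# any continuous univariate scipy.stats distribution can be wrapped here,
# e.g. stats.expon; parameters are estimated via scipy.stats.rv_continuous.fit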
distribution = ContinuousUnivariateDistribution(stats.norm)
distribution.fit(X)
distribution.predict_log_proba(X)

# if X and y are provided to fit, fit the conditional distribution p(X|y)
distribution = ContinuousUnivariateDistribution(stats.norm)
distribution.fit(X, y)
distribution.predict_log_proba(X)
```

#### Example 4: KernelDensityEstimator
```python3
import numpy as np

from naive_bayes.distributions import KernelDensityEstimator
n_classes = 3
n_samples = 100
X = np.random.randn(n_samples)
y = np.random.randint(
    low=0, high=n_classes, size=n_samples
)  # categorical feature

# if only X is provided to fit, fit the marginal distribution p(X)
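# KDE is non-parametric (Parzen–Rosenblatt window method):
# no fixed parametric form for p(X) is assumed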
distribution = KernelDensityEstimator()
distribution.fit(X)
distribution.predict_log_proba(X)

# if X and y are provided to fit, fit the conditional distribution p(X|y)
distribution = KernelDensityEstimator()
distribution.fit(X, y)
distribution.predict_log_proba(X)
```

#### 2. Models
The `models` module contains different naive Bayes models.
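
For reference, models of this kind classify with the standard naive Bayes decision rule, combining a class prior with per-feature log-likelihoods (the notation below is the textbook formulation, not taken from this repository):

```latex
\hat{y} = \arg\max_{y} \Big( \log p(y) + \sum_{d=1}^{D} \log p(x_d \mid y) \Big)
```

Here `D` is the number of features, and each per-feature likelihood `p(x_d | y)` can be any distribution from the `distributions` module above.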
All models share the same interface (methods):
- `.fit(X, y)` - fit the model;
- `.predict(X)` - compute model predictions;
- `.predict_proba(X)` - compute class probabilities;
- `.predict_log_proba(X)` - compute class log probabilities;
- `.score(X, y)` - compute mean accuracy.

List of available models:
- [x] `NaiveBayes` - model with parameterizable feature distribution;
- [x] `BernoulliNaiveBayes` - model with Bernoulli feature distribution;
- [x] `CategoricalNaiveBayes` - model with Categorical feature distribution;
- [x] `GaussianNaiveBayes` - model with Normal feature distribution.

#### Example 1: Bernoulli-distributed data
```python3
import numpy as np
from sklearn.model_selection import train_test_split

from naive_bayes import BernoulliNaiveBayes
n_samples = 1000
n_features = 10
n_classes = 3

X = np.random.randint(low=0, high=2, size=(n_samples, n_features))
y = np.random.randint(low=0, high=n_classes, size=n_samples)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = BernoulliNaiveBayes(n_features=n_features)
model.fit(X_train, y_train)
model.predict(X_test)
```

#### Example 2: Normally distributed data
```python3
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from naive_bayes import GaussianNaiveBayes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = GaussianNaiveBayes(n_features=X.shape[1])
model.fit(X_train, y_train)
model.predict(X_test)
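
# the shared model interface (see the method list above) also exposes
# class probabilities and accuracy
model.predict_proba(X_test)  # class probabilities for each test sample
model.score(X_test, y_test)  # mean accuracy on the held-out split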
```

#### Example 3: Mix of Bernoulli- and Normal-distributed data
```python3
import numpy as np
from sklearn.model_selection import train_test_split

from naive_bayes import NaiveBayes
from naive_bayes.distributions import Bernoulli, Normal

n_samples = 1000
bernoulli_features = 3
normal_features = 3
n_classes = 3

X_bernoulli = np.random.randint(
    low=0, high=2, size=(n_samples, bernoulli_features)
)
X_normal = np.random.randn(n_samples, normal_features)

X = np.hstack(
    [X_bernoulli, X_normal]
)  # shape (n_samples, bernoulli_features + normal_features)
y = np.random.randint(low=0, high=n_classes, size=n_samples)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = NaiveBayes(
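    # one distribution per feature, matching the column order of X:
    # three Bernoulli columns followed by three Normal columns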
    distributions=[
        Bernoulli(),
        Bernoulli(),
        Bernoulli(),
        Normal(),
        Normal(),
        Normal(),
    ]
)
model.fit(X_train, y_train)
model.predict(X_test)
```

### Requirements
- `Python>=3.6`
- `numpy>=1.20.3`
- `pre-commit>=2.13.0`
- `scikit-learn>=0.24.2`

To install requirements use:

`pip install -r requirements.txt`

### Tests
All implemented distributions and models are covered by `unittest` tests.
To run tests use:
`python -m unittest discover tests`

### Citations
If you use **extended_naive_bayes** in a scientific publication, we would appreciate references to the following BibTeX entry:
```bibtex
@misc{dayyass2021naivebayes,
    author = {El-Ayyass, Dani},
    title = {Extension of Naive Bayes Classificator},
    howpublished = {\url{https://github.com/dayyass/extended_naive_bayes}},
    year = {2021}
}
```