https://github.com/haydenhigg/bengal
Easy-to-use Go implementation of the multinomial Naive Bayes classifier for multilabel text classification.
go golang machine-learning multilabel-classification naive-bayes naive-bayes-classifier text-classification text-classifier
- Host: GitHub
- URL: https://github.com/haydenhigg/bengal
- Owner: haydenhigg
- License: mit
- Created: 2020-11-18T22:52:52.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2022-10-31T18:55:36.000Z (over 3 years ago)
- Last Synced: 2025-12-17T10:42:12.041Z (6 months ago)
- Topics: go, golang, machine-learning, multilabel-classification, naive-bayes, naive-bayes-classifier, text-classification, text-classifier
- Language: Go
- Homepage:
- Size: 67.4 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# bengal
Optimized Go implementation of Naive Bayes classifiers for multilabel text classification.
## install
In your project:
`$ go get github.com/haydenhigg/bengal`
Then, import it as:
```go
import "github.com/haydenhigg/bengal"
```
## use
### modeling
- `TrainMultinomial(xs, ys [][]string, smoothing float64) NaiveBayesModel`: Creates and trains a multinomial model.
- `(model *NaiveBayesModel) PredictMultinomial(x []string) []string`: Predicts the labels for an input using only the tokens present in it.
- `NewBernoulli(xs, ys [][]string, smoothing float64) NaiveBayesModel`: Creates and trains a Bernoulli model.
- `(model *NaiveBayesModel) PredictBernoulli(x []string) []string`: Predicts the labels for an input using both the presence and the absence of tokens.
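The difference between "presence only" and "presence and absence" can be illustrated with a minimal textbook-style sketch. This is not bengal's code: `presenceOnlyScore`, `presenceAbsenceScore`, and the `prob` map are hypothetical names, class priors are omitted, and the per-token probabilities are assumed to be already smoothed (strictly between 0 and 1).

```go
package main

import (
	"fmt"
	"math"
)

// presenceOnlyScore (hypothetical) sums log P(token | label) over the tokens
// that occur in x. Tokens missing from the input contribute nothing, and
// tokens unknown to the model are skipped.
func presenceOnlyScore(x []string, prob map[string]float64) float64 {
	score := 0.0
	for _, tok := range x {
		if p, ok := prob[tok]; ok {
			score += math.Log(p)
		}
	}
	return score
}

// presenceAbsenceScore (hypothetical) walks the whole vocabulary: a token that
// occurs in x contributes log(p), and a token that does not contributes
// log(1 - p).
func presenceAbsenceScore(x []string, prob map[string]float64) float64 {
	present := make(map[string]bool, len(x))
	for _, tok := range x {
		present[tok] = true
	}
	score := 0.0
	for tok, p := range prob {
		if present[tok] {
			score += math.Log(p)
		} else {
			score += math.Log(1 - p)
		}
	}
	return score
}

func main() {
	// Assumed probabilities that each token appears in a document of one label.
	prob := map[string]float64{"cat": 0.5, "dog": 0.25}
	x := []string{"cat"}

	fmt.Println(presenceOnlyScore(x, prob))    // log(0.5)
	fmt.Println(presenceAbsenceScore(x, prob)) // log(0.5) + log(1 - 0.25)
}
```

The presence-only loop runs over the input tokens, while the presence-and-absence loop runs over the whole vocabulary, which is why the former tends to be cheaper for short inputs.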
### example
```go
package main

import (
	"fmt"

	"github.com/haydenhigg/bengal"
)

func main() {
	// Each input is a tokenized document.
	inputs := [][]string{
		{"the", "cat", "was", "crying"},
		{"dogs", "like", "to", "smile"},
		// ...
	}

	// outputs[i] holds the labels for inputs[i].
	outputs := [][]string{
		{"cat", "sad"},
		{"dog", "happy"},
		// ...
	}

	smoothing := 1.0 // avoids the zero-probability problem; 1.0 is a common choice

	model := bengal.NewBernoulli(inputs, outputs, smoothing)

	fmt.Println(model.PredictBernoulli([]string{ /* ... */ }))
}
```
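The training and prediction functions take pre-tokenized `[]string` inputs, so raw text has to be split into tokens first. A minimal sketch of one way to do that with the standard library follows; `tokenize` is a hypothetical helper, not part of bengal, and the splitting rule (lowercase, break on anything that is not a letter or digit) is only an assumption about what suits your data.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize is a hypothetical helper (not part of bengal). It lowercases the
// text and splits it on every rune that is neither a letter nor a digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	fmt.Println(tokenize("The cat was crying!")) // prints [the cat was crying]
}
```

Whatever rule is chosen, the same tokenization should be applied to the training examples and to the inputs passed to the prediction functions.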
## notes
- Stem every token with a stemmer such as [dchest/stemmer](https://github.com/dchest/stemmer) before training or predicting, so that inflected forms of a word map to the same token.
- The model works with log probabilities, which avoids floating-point underflow, and applies smoothing, which avoids zero probabilities for unseen tokens.
- The training and prediction functions can be mixed. For example, you can train with `NewBernoulli` and predict with `PredictMultinomial`, which is faster and similarly accurate on short documents.
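Why log probabilities and smoothing help can be seen in a small standalone sketch. It is a generic illustration of additive smoothing, not bengal's internals: `smoothedLogProb` and its parameters (`count`, `total`, `vocabSize`, `alpha`) are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// smoothedLogProb (hypothetical) returns
// log((count + alpha) / (total + alpha*vocabSize)),
// an additively smoothed estimate of log P(token | label).
func smoothedLogProb(count, total, vocabSize int, alpha float64) float64 {
	return math.Log((float64(count) + alpha) / (float64(total) + alpha*float64(vocabSize)))
}

func main() {
	// A token never seen with this label still gets a finite score.
	fmt.Println(smoothedLogProb(0, 10, 5, 1.0)) // log(1/15)

	// Without smoothing, the estimate would be log(0).
	fmt.Println(math.Log(0)) // prints -Inf

	// Multiplying many small probabilities underflows to zero,
	// while summing their logarithms stays finite.
	product, logSum := 1.0, 0.0
	for i := 0; i < 100; i++ {
		product *= 1e-5
		logSum += math.Log(1e-5)
	}
	fmt.Println(product) // prints 0
	fmt.Println(logSum)  // finite, large negative number
}
```

With `alpha` set to 1.0 this is the common Laplace (add-one) choice mentioned in the example's `smoothing` comment.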