https://github.com/chief/clj-naive-bayes
Yet another naive bayes implementation in Clojure
- Host: GitHub
- URL: https://github.com/chief/clj-naive-bayes
- Owner: chief
- Created: 2014-11-03T10:20:01.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2016-06-15T11:21:34.000Z (almost 10 years ago)
- Last Synced: 2024-05-07T18:18:52.836Z (about 2 years ago)
- Language: Clojure
- Size: 588 KB
- Stars: 6
- Watchers: 6
- Forks: 0
- Open Issues: 13
Metadata Files:
- Readme: README.md
README
# clj-naive-bayes
_Warning_: This project is under heavy development. Things will break!
## Usage
First of all you will need a new classifier:
```clojure
(require '[clj_naive_bayes.core :as nb])
(def my-classifier (nb/new-classifier {:name :ngram-nb :ngram-size 2 :ngram-type :multinomial}))
```
### Available options are
* __:name__ : Currently `:ngram-nb`, `:multinomial-nb` and `:binary-nb` are
  supported. (Default `:multinomial-nb`)
* __:ngram-size__ : Sets the ngram size. (Default 2)
* __:ngram-type__ : Whether the ngram should be `:binary` or `:multinomial`.
* __:boost-start__ : Boolean. (Default `false`). This flag only has an effect
  with ngrams.
* __:keep-sorted__ : Boolean. (Default `false`). With this flag on, all tokens
  in ngram keys are stored in alphabetical order.
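
To illustrate what `:ngram-size` and `:keep-sorted` describe, here is a minimal sketch in plain Clojure. The `ngrams` function is hypothetical and written for this example only; it is not the library's implementation, and the token vector is assumed to be already split.

```clojure
;; Hypothetical helper, not part of clj-naive-bayes.
(defn ngrams
  "Returns every window of n consecutive tokens as a vector.
  When keep-sorted? is true, the tokens inside each window are
  put in alphabetical order, so word order no longer matters."
  [n keep-sorted? tokens]
  (map (fn [window] (vec (if keep-sorted? (sort window) window)))
       (partition n 1 tokens)))

(ngrams 2 false ["iphone" "6s" "case"])
;; => (["iphone" "6s"] ["6s" "case"])

(ngrams 2 true ["iphone" "6s" "case"])
;; => (["6s" "iphone"] ["6s" "case"])
```

With sorting on, "iphone 6s" and "6s iphone" produce the same key, which trades word-order information for fewer distinct keys.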
## Train
Suppose you have a training dataset. This should be a CSV file in which each
line holds either a sample pair or a sample pair followed by a `count` column.
In the second case, the `count` column should contain the number of occurrences
of that sample. This is purely for space-saving purposes: instead of repeating
the same pair on five lines, a single line with a `count` of 5 can be used.
```clojure
(require '[clj_naive_bayes.train :as train])
(train/parallel-train-from-file my-classifier "resources/train.csv" :limit 400000)
```
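
The `count` form of the training file can be produced by collapsing repeated rows. The sketch below assumes each row is a document followed by a category, in that order; this column order, the sample values, and the helper names `compact-rows` and `->csv-line` are assumptions made for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helpers, not part of clj-naive-bayes.
(defn compact-rows
  "Collapses repeated [document category] rows into
  [document category count] rows."
  [rows]
  (map (fn [[[document category] n]] [document category n])
       (frequencies rows)))

(defn ->csv-line
  "Joins the fields of one row with commas. Fields that themselves
  contain commas or quotes are NOT escaped here."
  [row]
  (str/join "," row))

(def rows
  [["iphone 6s" "40"] ["iphone 6s" "40"] ["iphone 6s" "40"] ["galaxy s7" "40"]])

(set (map ->csv-line (compact-rows rows)))
;; => #{"iphone 6s,40,3" "galaxy s7,40,1"}
```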
## Classify
Now we can try classifying a new document:
```clojure
(nb/classify my-classifier "iphone 6s")
=> "40"
```
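
For readers who want to see what a classification step of this kind computes, the following is a minimal sketch of textbook multinomial naive Bayes with add-one smoothing in plain Clojure. It is not the code of this library: the function names `tokenize`, `train-model`, `score` and `classify-doc`, the whitespace tokenizer, and the sample data are all assumptions made for the example.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Lower-cases a document and splits it on whitespace."
  [doc]
  (re-seq #"\S+" (str/lower-case doc)))

(defn train-model
  "Builds document counts per category, token counts per category,
  and the vocabulary from [document category] pairs."
  [samples]
  (reduce (fn [model [doc category]]
            (let [tokens (tokenize doc)]
              (-> model
                  (update-in [:cats category] (fnil inc 0))
                  (update-in [:terms category]
                             #(merge-with + % (frequencies tokens)))
                  (update :vocab into tokens))))
          {:cats {} :terms {} :vocab #{}}
          samples))

(defn score
  "Log prior of the category plus the smoothed log likelihood
  of every token in the document."
  [model category tokens]
  (let [n-docs (reduce + (vals (:cats model)))
        counts (get-in model [:terms category])
        total  (reduce + (vals counts))
        v      (count (:vocab model))]
    (reduce + (Math/log (/ (get-in model [:cats category]) (double n-docs)))
            (map #(Math/log (/ (inc (get counts % 0)) (double (+ total v))))
                 tokens))))

(defn classify-doc
  "Returns the category with the highest score for the document."
  [model doc]
  (let [tokens (tokenize doc)]
    (apply max-key #(score model % tokens) (keys (:cats model)))))

(def model
  (train-model [["iphone 6s" "phones"] ["galaxy s7" "phones"]
                ["iphone case" "accessories"] ["leather case" "accessories"]]))

(classify-doc model "iphone 6s")
;; => "phones"
```

Scores are summed in log space so that multiplying many small probabilities does not underflow.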
## Export Probabilities to a Hashmap
This could be useful for e.g. persisting the classifier:
```clojure
(def out (nb/export my-classifier))
=> #'user/out
(keys out)
=> (:terms :cats)
```
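
If the exported map holds only plain Clojure data (an assumption; the values under `:terms` and `:cats` are not described here), it can be round-tripped through EDN text. A minimal sketch using a made-up stand-in for the exported map:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical stand-in for the result of nb/export; the real
;; contents of :terms and :cats may look different.
(def exported {:terms {"iphone" {"40" 3}} :cats {"40" 5}})

;; Serialize to a string, then read it back as data.
(def restored (edn/read-string (pr-str exported)))

(= exported restored)
;; => true
```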
## Evaluate Performance
```clojure
(use 'clj_naive_bayes.core)
(use 'clj_naive_bayes.eval)
(def logs (parallel-classifications my-classifier "resources/test.json"))
```
## Persist classifiers
Currently only persistence to a file on disk is supported. Suppose you have a
trained classifier named `my-classifier`; you can write it to a file:
```clojure
(use 'clj_naive_bayes.utils)
(persist-classifier my-classifier "resources/data.clj")
```
And later on load it:
```clojure
(use 'clj_naive_bayes.utils)
(load-classifier my-classifier "resources/data.clj")
```
## Testing
`lein test` will run all tests.

`lein test [TESTS]` will run only the tests in the given TESTS namespaces.
## Tooling
### Kibit
`lein kibit` will analyze the code.
### Marginalia
`lein marg` will produce documentation under `/docs`.