{"id":13507385,"url":"https://github.com/fredwu/simple_bayes","last_synced_at":"2025-04-04T13:12:56.343Z","repository":{"id":57548208,"uuid":"63347581","full_name":"fredwu/simple_bayes","owner":"fredwu","description":"A Naive Bayes machine learning implementation in Elixir.","archived":false,"fork":false,"pushed_at":"2017-09-25T19:48:26.000Z","size":100,"stargazers_count":392,"open_issues_count":1,"forks_count":24,"subscribers_count":23,"default_branch":"master","last_synced_at":"2024-05-03T00:18:15.455Z","etag":null,"topics":["bayes","classifier","machine-learning","naive-bayes-classifier"],"latest_commit_sha":null,"homepage":"","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fredwu.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-07-14T15:21:22.000Z","updated_at":"2024-01-12T15:46:16.000Z","dependencies_parsed_at":"2022-08-28T11:22:18.634Z","dependency_job_id":null,"html_url":"https://github.com/fredwu/simple_bayes","commit_stats":null,"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredwu%2Fsimple_bayes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredwu%2Fsimple_bayes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredwu%2Fsimple_bayes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fredwu%2Fsimple_bayes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fredwu","download_url":"https://codeload.github.com/fredwu/simple_bayes/tar.gz/refs/heads/master","host":{"nam
e":"GitHub","url":"https://github.com","kind":"github","repositories_count":247182401,"owners_count":20897381,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayes","classifier","machine-learning","naive-bayes-classifier"],"created_at":"2024-08-01T02:00:32.693Z","updated_at":"2025-04-04T13:12:56.318Z","avatar_url":"https://github.com/fredwu.png","language":"Elixir","readme":"# Simple Bayes [![Travis](https://img.shields.io/travis/fredwu/simple_bayes.svg)](https://travis-ci.org/fredwu/simple_bayes) [![Coverage](https://img.shields.io/coveralls/fredwu/simple_bayes.svg)](https://coveralls.io/github/fredwu/simple_bayes?branch=master) [![Hex.pm](https://img.shields.io/hexpm/v/simple_bayes.svg)](https://hex.pm/packages/simple_bayes)\n\nA [Naive Bayes](https://en.wikipedia.org/wiki/Naive_Bayes_classifier) machine learning implementation in Elixir.\n\n\u003e In machine learning, __naive Bayes classifiers__ are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.\n\n\u003e Naive Bayes has been studied extensively since the 1950s. It was introduced under a different name into the text retrieval community in the early 1960s, and remains a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate preprocessing, it is competitive in this domain with more advanced methods including support vector machines. 
It also finds application in automatic medical diagnosis.\n\n\u003e Naive Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by expensive iterative approximation as used for many other types of classifiers. - [Wikipedia](https://en.wikipedia.org/wiki/Naive_Bayes_classifier)\n\n## Features\n\n- Naive Bayes algorithm with different models\n  - Multinomial\n  - Binarized (boolean) multinomial\n  - Bernoulli\n- Multiple storage options\n  - In-memory (default)\n  - File system\n  - [Dets](http://erlang.org/doc/man/dets.html) (Disk-based Erlang Term Storage)\n- Ignores stop words\n- [Additive smoothing](https://en.wikipedia.org/wiki/Additive_smoothing)\n- [TF-IDF](https://en.wikipedia.org/wiki/Tf-idf)\n- Optional keywords weighting\n- Optional word [stemming](https://en.wikipedia.org/wiki/Stemming) via [Stemmer](https://github.com/fredwu/stemmer)\n\n### Feature Matrix\n\n|                    | Multinomial | Binarized multinomial | Bernoulli |\n|--------------------|-------------|-----------------------|-----------|\n| Stop words         | ✅          | ✅                    | ✅       |\n| Additive smoothing | ✅          | ✅                    |          |\n| TF-IDF             | ✅          |                       |          |\n| Keywords weighting | ✅          |                       |          |\n| Stemming           | ✅          | ✅                    | ✅       |\n\n## Usage\n\nInstall by adding `:simple_bayes` and optionally `:stemmer` (for the default\nstemming functionality) to `deps` in your\n`mix.exs`:\n\n```elixir\ndefp deps do\n  [\n    {:simple_bayes, \"~\u003e 0.12\"},\n    {:stemmer,      \"~\u003e 1.0\"}\n  ]\nend\n```\n\nIf you're on Elixir 1.3 or below, ensure `:simple_bayes` and optionally\n`:stemmer` are started before your 
application:\n\n```elixir\ndef application do\n  [applications: [:logger, :simple_bayes, :stemmer]]\nend\n```\n\n```elixir\nbayes = SimpleBayes.init()\n        |\u003e SimpleBayes.train(:apple, \"red sweet\")\n        |\u003e SimpleBayes.train(:apple, \"green\", weight: 0.5)\n        |\u003e SimpleBayes.train(:apple, \"round\", weight: 2)\n        |\u003e SimpleBayes.train(:banana, \"sweet\")\n        |\u003e SimpleBayes.train(:banana, \"green\", weight: 0.5)\n        |\u003e SimpleBayes.train(:banana, \"yellow long\", weight: 2)\n        |\u003e SimpleBayes.train(:orange, \"red\")\n        |\u003e SimpleBayes.train(:orange, \"yellow sweet\", weight: 0.5)\n        |\u003e SimpleBayes.train(:orange, \"round\", weight: 2)\n\nbayes |\u003e SimpleBayes.classify_one(\"Maybe green maybe red but definitely round and sweet.\")\n# =\u003e :apple\n\nbayes |\u003e SimpleBayes.classify(\"Maybe green maybe red but definitely round and sweet.\")\n# =\u003e [\n#   apple:  0.18519202529366116,\n#   orange: 0.14447781772131096,\n#   banana: 0.10123406763124557\n# ]\n\nbayes |\u003e SimpleBayes.classify(\"Maybe green maybe red but definitely round and sweet.\", top: 2)\n# =\u003e [\n#   apple:  0.18519202529366116,\n#   orange: 0.14447781772131096,\n# ]\n```\n\nWith and without word stemming (requires a stem function, we recommend [Stemmer](https://github.com/fredwu/stemmer)):\n\n```elixir\nSimpleBayes.init()\n|\u003e SimpleBayes.train(:apple, \"buying apple\")\n|\u003e SimpleBayes.train(:banana, \"buy banana\")\n|\u003e SimpleBayes.classify(\"buy apple\")\n# =\u003e [\n#   banana: 0.05719389206673358,\n#   apple: 0.05719389206673358\n# ]\n\nSimpleBayes.init(stem: \u0026Stemmer.stem/1) # or any other stemming function\n|\u003e SimpleBayes.train(:apple, \"buying apple\")\n|\u003e SimpleBayes.train(:banana, \"buy banana\")\n|\u003e SimpleBayes.classify(\"buy apple\")\n# =\u003e [\n#   apple: 0.18096114003107086,\n#   banana: 0.15054767928902865\n# ]\n```\n\n## Configuration 
(Optional)\n\nFor application-wide configuration, in your application's `config/config.exs`:\n\n```elixir\nconfig :simple_bayes, model: :multinomial\nconfig :simple_bayes, storage: :memory\nconfig :simple_bayes, default_weight: 1\nconfig :simple_bayes, smoothing: 0\nconfig :simple_bayes, stem: false\nconfig :simple_bayes, top: nil\nconfig :simple_bayes, stop_words: ~w(\n  a about above after again against all am an and any are aren't as at be\n  because been before being below between both but by can't cannot could\n  couldn't did didn't do does doesn't doing don't down during each few for from\n  further had hadn't has hasn't have haven't having he he'd he'll he's her here\n  here's hers herself him himself his how how's i i'd i'll i'm i've if in into\n  is isn't it it's its itself let's me more most mustn't my myself no nor not of\n  off on once only or other ought our ours ourselves out over own same shan't\n  she she'd she'll she's should shouldn't so some such than that that's the\n  their theirs them themselves then there there's these they they'd they'll\n  they're they've this those through to too under until up very was wasn't we\n  we'd we'll we're we've were weren't what what's when when's where where's\n  which while who who's whom why why's with won't would wouldn't you you'd\n  you'll you're you've your yours yourself yourselves\n)\n```\n\nAlternatively, you may pass in the configuration options when you initialise:\n\n```elixir\nSimpleBayes.init(\n  model:          :multinomial,\n  storage:        :memory,\n  default_weight: 1,\n  smoothing:      0,\n  stem:           false,\n  top:            nil,\n  stop_words:     []\n)\n```\n\n### Available options for `:model` are:\n\n- `:multinomial` (default)\n- `:binarized_multinomial`\n- `:bernoulli`\n\n### Available options for `:storage` are:\n\n- `:memory` (default, can also be used with any database, [see below](#in-memory-save2-load1-and-the-encoded_data-option) for more details)\n- `:file_system`\n- 
`:dets`\n\nStorage options have extra configurations:\n\n#### Memory\n\n- `:namespace` - optional, it's only useful when you want to `load` by the namespace\n\n#### File System\n\n- `:file_path`\n\n#### Dets\n\n- `:file_path`\n\n#### File System vs Dets\n\nThe file system storage encodes and decodes data using base64, whereas Dets is a native Erlang library. Performance-wise, the file system with base64 tends to be faster with less data, and Dets faster with more data. YMMV, so please run your own comparison.\n\n#### Configuration Examples\n\n```elixir\n# application-wide configuration:\nconfig :simple_bayes, storage: :file_system\nconfig :simple_bayes, file_path: \"path/to/the/file.txt\"\n\n# per-initialization configuration:\nSimpleBayes.init(\n  storage: :file_system,\n  file_path: \"path/to/the/file.txt\"\n)\n```\n\n#### Storage Usage\n\n```elixir\nopts = [\n  storage:   :file_system,\n  file_path: \"test/temp/file_system_test.txt\"\n]\n\nSimpleBayes.init(opts)\n|\u003e SimpleBayes.train(:apple, \"red sweet\")\n|\u003e SimpleBayes.train(:apple, \"green\", weight: 0.5)\n|\u003e SimpleBayes.train(:apple, \"round\", weight: 2)\n|\u003e SimpleBayes.train(:banana, \"sweet\")\n|\u003e SimpleBayes.save()\n\nSimpleBayes.load(opts)\n|\u003e SimpleBayes.train(:banana, \"green\", weight: 0.5)\n|\u003e SimpleBayes.train(:banana, \"yellow long\", weight: 2)\n|\u003e SimpleBayes.train(:orange, \"red\")\n|\u003e SimpleBayes.train(:orange, \"yellow sweet\", weight: 0.5)\n|\u003e SimpleBayes.train(:orange, \"round\", weight: 2)\n|\u003e SimpleBayes.save()\n\nSimpleBayes.load(opts)\n|\u003e SimpleBayes.classify(\"Maybe green maybe red but definitely round and sweet\")\n```\n\n#### In-memory `save/2`, `load/1` and the `encoded_data` option\n\nCalling `SimpleBayes.save/2` is unnecessary for `:memory` storage. However, when using the in-memory storage, you are able to get the encoded data - this is useful if you would like to store the encoded data in your persistence layer of choice. 
For example:\n\n```elixir\n{:ok, _pid, encoded_data} = SimpleBayes.init()\n|\u003e SimpleBayes.train(:apple, \"red sweet\")\n|\u003e SimpleBayes.train(:apple, \"green\", weight: 0.5)\n|\u003e SimpleBayes.train(:apple, \"round\", weight: 2)\n|\u003e SimpleBayes.train(:banana, \"sweet\")\n|\u003e SimpleBayes.save()\n\n# now store `encoded_data` in your database of choice\n# once `encoded_data` is fetched again from the database, you are then able to:\n\nSimpleBayes.load(encoded_data: encoded_data)\n|\u003e SimpleBayes.train(:banana, \"green\", weight: 0.5)\n|\u003e SimpleBayes.train(:banana, \"yellow long\", weight: 2)\n|\u003e SimpleBayes.train(:orange, \"red\")\n|\u003e SimpleBayes.train(:orange, \"yellow sweet\", weight: 0.5)\n|\u003e SimpleBayes.train(:orange, \"round\", weight: 2)\n|\u003e SimpleBayes.classify(\"Maybe green maybe red but definitely round and sweet\")\n```\n\n## Changelog\n\nPlease see [CHANGELOG.md](CHANGELOG.md).\n\n## License\n\nLicensed under [MIT](http://fredwu.mit-license.org/).\n","funding_links":[],"categories":["Artificial Intelligence","Elixir"],"sub_categories":["Tools","[Tools](#tools-1)","Speech Recognition"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffredwu%2Fsimple_bayes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffredwu%2Fsimple_bayes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffredwu%2Fsimple_bayes/lists"}