{"id":19575060,"url":"https://github.com/chief/clj-naive-bayes","last_synced_at":"2025-04-27T06:30:51.349Z","repository":{"id":22767745,"uuid":"26113584","full_name":"chief/clj-naive-bayes","owner":"chief","description":"Yet another naive bayes implementation in Clojure","archived":false,"fork":false,"pushed_at":"2016-06-15T11:21:34.000Z","size":602,"stargazers_count":6,"open_issues_count":13,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-05-07T18:18:52.836Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Clojure","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chief.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-11-03T10:20:01.000Z","updated_at":"2023-12-13T00:13:10.000Z","dependencies_parsed_at":"2022-08-21T12:00:52.719Z","dependency_job_id":null,"html_url":"https://github.com/chief/clj-naive-bayes","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chief%2Fclj-naive-bayes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chief%2Fclj-naive-bayes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chief%2Fclj-naive-bayes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chief%2Fclj-naive-bayes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chief","download_url":"https://codeload.github.com/chief/clj-naive-bayes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories
_count":224062762,"owners_count":17249289,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-11T06:45:36.778Z","updated_at":"2024-11-11T06:45:38.229Z","avatar_url":"https://github.com/chief.png","language":"Clojure","funding_links":[],"categories":[],"sub_categories":[],"readme":"# clj-naive-bayes\n\n_Warning_: This project is under heavy development. Things will break!\n\n## Usage\n\nFirst of all, you will need a new classifier:\n\n```clojure\n\n(require '[clj_naive_bayes.core :as nb])\n\n(def my-classifier (nb/new-classifier {:name :ngram-nb :ngram-size 2 :ngram-type :multinomial}))\n\n```\n\n### Available options are\n\n* __:name__ : Currently `:ngram-nb`, `:multinomial-nb` and `:binary-nb` are\n  supported. (Default `:multinomial-nb`)\n\n* __:ngram-size__ : Sets ngram size. (Default 2)\n\n* __:ngram-type__ : Whether the ngram should be `:binary` or `:multinomial`\n\n* __:boost-start__ : Boolean. (Default `false`). This flag only has an effect\n  with ngrams.\n\n* __:keep-sorted__ : Boolean. (Default `false`). With this flag on, all tokens\n  in ngram keys are stored in alphabetical order.\n\n## Train\n\nSuppose you have a training dataset. This should be a CSV file, consisting of\nlines with `\u003cdocument,class\u003e` or `\u003cdocument,class,count\u003e` elements. In the\nsecond case, the `count` column should contain the number of occurrences of each\nsample. This is purely for space-saving purposes, so e.g. 
instead of using five\nlines of the same `\u003cdocument,class\u003e` pair, a single `\u003cdocument,class,5\u003e` line\ncan be used.\n\n```clojure\n\n(require '[clj_naive_bayes.train :as train])\n\n(train/parallel-train-from-file my-classifier \"resources/train.csv\" :limit 400000)\n\n```\n\n## Classify\n\nNow we can try classifying a new document:\n\n```clojure\n(nb/classify my-classifier \"iphone 6s\")\n=\u003e \"40\"\n```\n\n## Export Probabilities to a Hashmap\n\nThis could be useful for e.g. persisting the classifier:\n\n```clojure\n(def out (nb/export my-classifier))\n=\u003e #'user/out\n(keys out)\n=\u003e (:terms :cats)\n```\n\n## Evaluate Performance\n\n```clojure\n\n(use 'clj_naive_bayes.core)\n(use 'clj_naive_bayes.eval)\n\n(def logs (parallel-classifications my-classifier \"resources/test.json\"))\n\n```\n\n## Persist classifiers\n\nCurrently only persistence to a file on disk is supported. Suppose you have a trained\nclassifier named `my-classifier`; you can write it to a file:\n\n```clojure\n\n(use 'clj_naive_bayes.utils)\n\n(persist-classifier my-classifier \"resources/data.clj\")\n\n```\n\nAnd later on load it:\n\n```clojure\n\n(use 'clj_naive_bayes.utils)\n\n(load-classifier my-classifier \"resources/data.clj\")\n\n```\n\n### Testing\n\n`lein test` will run all tests.\n\n`lein test [TESTS]` will run only the tests in the TESTS namespaces.\n\n## Tooling\n\n### Kibit\n\n`lein kibit` will analyze the code.\n\n### Marginalia\n\n`lein marg` will produce documentation under `/docs`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchief%2Fclj-naive-bayes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchief%2Fclj-naive-bayes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchief%2Fclj-naive-bayes/lists"}