{"id":24196407,"url":"https://github.com/namsor/java-naive-bayes-classifier-jnbc","last_synced_at":"2025-09-21T21:31:51.453Z","repository":{"id":53291123,"uuid":"140424036","full_name":"namsor/Java-Naive-Bayes-Classifier-JNBC","owner":"namsor","description":"A scalable, explainable Java Naive Bayes Classifier that works either in memory or on persistent fast key-value store (MapDB, RocksDB or LevelDB)","archived":false,"fork":false,"pushed_at":"2023-06-14T22:29:23.000Z","size":1313,"stargazers_count":6,"open_issues_count":1,"forks_count":4,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-08-20T14:58:42.966Z","etag":null,"topics":["explain-classifiers","explainable-ai","explainable-artificial-intelligence","leveldb","mapdb","naive-bayes-algorithm","naive-bayes-classification","naive-bayes-classifier","naivebayesclassifier","rocksdb"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/namsor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-07-10T11:47:53.000Z","updated_at":"2023-02-06T10:21:19.000Z","dependencies_parsed_at":"2022-08-19T23:40:16.863Z","dependency_job_id":null,"html_url":"https://github.com/namsor/Java-Naive-Bayes-Classifier-JNBC","commit_stats":null,"previous_names":[],"tags_count":6,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/namsor%2FJava-Naive-Bayes-Classifier-JNBC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/namsor%2FJava-Naive-Bayes-Classifier-JNBC/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/namsor%2FJava-Naive-Bayes-Classifier-JNBC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/namsor%2FJava-Naive-Bayes-Classifier-JNBC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/namsor","download_url":"https://codeload.github.com/namsor/Java-Naive-Bayes-Classifier-JNBC/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233798123,"owners_count":18731919,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["explain-classifiers","explainable-ai","explainable-artificial-intelligence","leveldb","mapdb","naive-bayes-algorithm","naive-bayes-classification","naive-bayes-classifier","naivebayesclassifier","rocksdb"],"created_at":"2025-01-13T19:35:11.593Z","updated_at":"2025-09-21T21:31:45.363Z","avatar_url":"https://github.com/namsor.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Java-Naive-Bayes-Classifier-JNBC\nA Java Naive Bayes Classifier that works in-memory or off the heap on fast key-value stores (MapDB, LevelDB or RocksDB). Naive Bayes Classification is fast. The objective of this ground-up implementation is to provide a self-contained, vertically scalable and explainable classifier.  
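\n\nAs a sketch of the model, assuming the unsmoothed count-based estimates that appear in the explainer output of the example below, each category $c$ is scored from $N$ (all learned examples), $N_c$ (examples of category $c$), $N_{c,i}$ (examples of $c$ that have feature $i$) and $N_{c,i=v_i}$ (examples of $c$ whose feature $i$ has the observed value $v_i$), then normalized over all categories:\n\n$$\nL(c) = \\frac{N_c}{N} \\prod_{i=1}^{n} \\frac{N_{c,i=v_i}}{N_{c,i}}, \\qquad \\hat{P}(c \\mid v_1, \\dots, v_n) = \\frac{L(c)}{\\sum_{c'} L(c')}\n$$\n\nIn the explainer output, `gL` is $N$, `gL_cA_No` is $N_c$ for $c$ = No, `gL_cA_No_fE_temp` is $N_{c,i}$ for the temp feature and `gL_cA_No_fE_temp_is_Cool` is the matching $N_{c,i=v_i}$.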
\n\nSee https://naivebayesclassifier.org/ for more information.\n\n\nMaven Quick-Start\n------------------\n\nThis Java Naive Bayes Classifier can be added to a Maven build like any other dependency:\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.namsor\u003c/groupId\u003e\n    \u003cartifactId\u003eJava-Naive-Bayes-Classifier-JNBC\u003c/artifactId\u003e\n    \u003cversion\u003ev2.0.4\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\nExample\n------------------\n\nThe example below is adapted from http://ai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML-Classification-NaiveBayes-2014.pdf. \n\n```java\n\npackage com.namsor.oss.samples;\n\nimport com.namsor.oss.classify.bayes.ClassifyException;\nimport com.namsor.oss.classify.bayes.NaiveBayesClassifierMapImpl;\nimport com.namsor.oss.classify.bayes.PersistentClassifierException;\nimport java.util.HashMap;\nimport java.util.Map;\nimport java.util.logging.Level;\nimport java.util.logging.Logger;\nimport com.namsor.oss.classify.bayes.IClassification;\nimport com.namsor.oss.classify.bayes.IClassificationExplained;\nimport com.namsor.oss.classify.bayes.NaiveBayesExplainerImpl;\nimport javax.script.ScriptEngine;\nimport javax.script.ScriptEngineManager;\n\n/**\n * Simple example of Naive Bayes Classification (Sport / No Sport) inspired by\n * http://ai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML-Classification-NaiveBayes-2014.pdf\n *\n * @author elian\n */\npublic class MainSample1 {\n\n    public static final String YES = \"Yes\";\n    public static final String NO = \"No\";\n    /**\n     * Header row (column names) as per https://taylanbil.github.io/boostedNB or\n     * http://ai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML-Classification-NaiveBayes-2014.pdf\n     */\n    public static final String[] colName = {\n        \"outlook\", \"temp\", \"humidity\", \"wind\", \"play\"\n    };\n\n    /**\n     * Data table as per https://taylanbil.github.io/boostedNB or\n     * 
http://ai.fon.bg.ac.rs/wp-content/uploads/2015/04/ML-Classification-NaiveBayes-2014.pdf\n     */\n    public static final String[][] data = {\n        {\"Sunny\", \"Hot\", \"High\", \"Weak\", \"No\"},\n        {\"Sunny\", \"Hot\", \"High\", \"Strong\", \"No\"},\n        {\"Overcast\", \"Hot\", \"High\", \"Weak\", \"Yes\"},\n        {\"Rain\", \"Mild\", \"High\", \"Weak\", \"Yes\"},\n        {\"Rain\", \"Cool\", \"Normal\", \"Weak\", \"Yes\"},\n        {\"Rain\", \"Cool\", \"Normal\", \"Strong\", \"No\"},\n        {\"Overcast\", \"Cool\", \"Normal\", \"Strong\", \"Yes\"},\n        {\"Sunny\", \"Mild\", \"High\", \"Weak\", \"No\"},\n        {\"Sunny\", \"Cool\", \"Normal\", \"Weak\", \"Yes\"},\n        {\"Rain\", \"Mild\", \"Normal\", \"Weak\", \"Yes\"},\n        {\"Sunny\", \"Mild\", \"Normal\", \"Strong\", \"Yes\"},\n        {\"Overcast\", \"Mild\", \"High\", \"Strong\", \"Yes\"},\n        {\"Overcast\", \"Hot\", \"Normal\", \"Weak\", \"Yes\"},\n        {\"Rain\", \"Mild\", \"High\", \"Strong\", \"No\"},};\n\n    public static void main(String[] args) {\n\n        try {\n            String[] cats = {YES, NO};\n            // Create a new bayes classifier with string categories and string features.\n            NaiveBayesClassifierMapImpl bayes = new NaiveBayesClassifierMapImpl(\"tennis\", cats);\n            \n            // Examples to learn from.\n            for (int i = 0; i \u003c data.length; i++) {\n                Map\u003cString, String\u003e features = new HashMap\u003c\u003e();\n                for (int j = 0; j \u003c colName.length - 1; j++) {\n                    features.put(colName[j], data[i][j]);\n                }\n                // The last column is the category, e.g. learn Category=Yes from Sunny, Cool, Normal and Weak.\n                bayes.learn(data[i][colName.length - 1], features);\n            }\n\n            Map\u003cString, String\u003e features = new HashMap\u003c\u003e();\n            features.put(\"outlook\", \"Sunny\");\n            features.put(\"temp\", \"Cool\");\n            features.put(\"humidity\", \"High\");\n            features.put(\"wind\", \"Strong\");\n\n            // Shall we play given a Sunny outlook, Cool temperature, High humidity and Strong wind?\n            IClassification predict = bayes.classify(features, true);\n            for (int i = 0; i \u003c predict.getClassProbabilities().length; i++) {\n                System.out.println(\"P(\" + predict.getClassProbabilities()[i].getCategory() + \")=\" + predict.getClassProbabilities()[i].getProbability());\n            }\n            if (predict.getExplanationData() != null) {\n                NaiveBayesExplainerImpl explainer = new NaiveBayesExplainerImpl();\n                IClassificationExplained explained = explainer.explain(predict);\n                System.out.println(explained.toString());\n\n                ScriptEngineManager scriptEngineManager = new ScriptEngineManager();\n                ScriptEngine scriptEngine = scriptEngineManager.getEngineByName(\"JavaScript\");\n                if (scriptEngine == null) {\n                    // Nashorn was removed from the JDK in Java 15: add a standalone JavaScript engine to the classpath to evaluate the explanation.\n                    System.out.println(\"No JavaScript engine available to evaluate the explanation\");\n                } else {\n                    // The explanation is valid JavaScript; its last expression is the highest probability estimate.\n                    Number proba = (Number) scriptEngine.eval(explained.toString());\n                    System.out.println(\"Result of evaluating mathematical expressions in String = \" + proba);\n                }\n            }\n        } catch (PersistentClassifierException ex) {\n            Logger.getLogger(MainSample1.class.getName()).log(Level.SEVERE, null, ex);\n        } catch (ClassifyException ex) {\n            Logger.getLogger(MainSample1.class.getName()).log(Level.SEVERE, null, ex);\n        } catch (Throwable ex) {\n            Logger.getLogger(MainSample1.class.getName()).log(Level.SEVERE, null, ex);\n        }\n    }\n}\n\n```\nExplaining output 
\n------------------\nRunning the above example shows that we are unlikely to play given a Sunny outlook, Cool temperature, High humidity and Strong wind:\n```\nP(No)=0.795417348608838\nP(Yes)=0.204582651391162\n```\nWe can further explain how the likelihoods were calculated by calling the explainer (NaiveBayesExplainerImpl). Its output is human-readable, and its formulae and expressions are also valid JavaScript that can be evaluated. \n\n```javascript\n// observation table variables \nvar gL=14\nvar gL_cA_No=5\nvar gL_cA_No_fE_humidity=5\nvar gL_cA_No_fE_humidity_is_High=4\nvar gL_cA_No_fE_outlook=5\nvar gL_cA_No_fE_outlook_is_Sunny=3\nvar gL_cA_No_fE_temp=5\nvar gL_cA_No_fE_temp_is_Cool=1\nvar gL_cA_No_fE_wind=5\nvar gL_cA_No_fE_wind_is_Strong=3\nvar gL_cA_Yes=9\nvar gL_cA_Yes_fE_humidity=9\nvar gL_cA_Yes_fE_humidity_is_High=3\nvar gL_cA_Yes_fE_outlook=9\nvar gL_cA_Yes_fE_outlook_is_Sunny=2\nvar gL_cA_Yes_fE_temp=9\nvar gL_cA_Yes_fE_temp_is_Cool=3\nvar gL_cA_Yes_fE_wind=9\nvar gL_cA_Yes_fE_wind_is_Strong=3\nvar gL_fE_humidity=14\nvar gL_fE_outlook=14\nvar gL_fE_temp=14\nvar gL_fE_wind=14\n\n\n// likelyhoods by category \n\n// likelyhoods for category No\nvar likelyhoodOfNo=gL_cA_No / gL * (gL_cA_No_fE_temp_is_Cool / gL_cA_No_fE_temp * gL_cA_No_fE_humidity_is_High / gL_cA_No_fE_humidity * gL_cA_No_fE_outlook_is_Sunny / gL_cA_No_fE_outlook * gL_cA_No_fE_wind_is_Strong / gL_cA_No_fE_wind * 1 )\nvar likelyhoodOfNoExpr=5 / 14 * (1 / 5 * 4 / 5 * 3 / 5 * 3 / 5 * 1 )\nvar likelyhoodOfNoValue=0.020571428571428574\n\n// likelyhoods for category Yes\nvar likelyhoodOfYes=gL_cA_Yes / gL * (gL_cA_Yes_fE_temp_is_Cool / gL_cA_Yes_fE_temp * gL_cA_Yes_fE_humidity_is_High / gL_cA_Yes_fE_humidity * gL_cA_Yes_fE_outlook_is_Sunny / gL_cA_Yes_fE_outlook * gL_cA_Yes_fE_wind_is_Strong / gL_cA_Yes_fE_wind * 1 )\nvar likelyhoodOfYesExpr=9 / 14 * (3 / 9 * 3 / 9 * 2 / 9 * 3 / 9 * 1 )\nvar likelyhoodOfYesValue=0.005291005291005291\n\n\n// probability estimates by category \n\n// probability estimate for category No\nvar 
probabilityOfNo=likelyhoodOfNo/(likelyhoodOfNo+likelyhoodOfYes+0)\nvar probabilityOfNoValue=0.795417348608838\n\n// probability estimate for category Yes\nvar probabilityOfYes=likelyhoodOfYes/(likelyhoodOfNo+likelyhoodOfYes+0)\nvar probabilityOfYesValue=0.204582651391162\n\n\n// return the highest probability estimate for evaluation \nprobabilityOfNo\n```\n\nPerformance \n------------------\nBinomial (two-category) classifiers: the AbstractNaiveBayesClassifierMapImpl backed by an in-memory ConcurrentHashMap can learn from billions of facts and classify new data very fast.\nOff-heap persistent key-value stores can help scale vertically to even larger volumes. For example, the MapDB implementation on SSDs is only about 3 to 5 times slower than the in-memory one, and it scales to large volumes. \n\nMultinomial (many-category) classifiers: with many class categories and many features, you may need the in-memory ConcurrentHashMap implementation and a larger Java heap. This implementation is known to run smoothly on servers with 192 GB of RAM. \nFurther optimization is needed to use MapDB, LevelDB or RocksDB effectively when classification must read a lot of data. \n\n\nThe GNU LGPLv3 License\n------------------\nCopyright (c) 2018 - Elian Carsenat, NamSor SAS\nhttps://www.gnu.org/licenses/lgpl-3.0.en.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnamsor%2Fjava-naive-bayes-classifier-jnbc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnamsor%2Fjava-naive-bayes-classifier-jnbc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnamsor%2Fjava-naive-bayes-classifier-jnbc/lists"}