{"id":16729843,"url":"https://github.com/dfdx/naivebayes.jl","last_synced_at":"2026-02-06T07:20:57.734Z","repository":{"id":24292789,"uuid":"27687867","full_name":"dfdx/NaiveBayes.jl","owner":"dfdx","description":"Naive Bayes classifier","archived":false,"fork":false,"pushed_at":"2025-02-11T22:39:34.000Z","size":124,"stargazers_count":25,"open_issues_count":8,"forks_count":19,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-18T05:22:56.990Z","etag":null,"topics":["julia","machine-learning"],"latest_commit_sha":null,"homepage":null,"language":"Julia","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dfdx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-12-07T22:48:51.000Z","updated_at":"2025-02-11T22:39:38.000Z","dependencies_parsed_at":"2024-10-28T11:34:43.701Z","dependency_job_id":"89b68581-8832-4692-a577-63dd86125f55","html_url":"https://github.com/dfdx/NaiveBayes.jl","commit_stats":{"total_commits":91,"total_committers":17,"mean_commits":5.352941176470588,"dds":0.7252747252747253,"last_synced_commit":"fad0cecb7cd646b5351dcc0115a808aae89485e7"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfdx%2FNaiveBayes.jl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfdx%2FNaiveBayes.jl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfdx%2FNaiveBayes.jl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dfdx%2FNai
veBayes.jl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dfdx","download_url":"https://codeload.github.com/dfdx/NaiveBayes.jl/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244874161,"owners_count":20524576,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["julia","machine-learning"],"created_at":"2024-10-12T23:30:08.170Z","updated_at":"2025-10-06T18:43:42.508Z","avatar_url":"https://github.com/dfdx.png","language":"Julia","funding_links":[],"categories":[],"sub_categories":[],"readme":"NaiveBayes.jl\n=============\n\n\u003e :warning: This package was created years ago and has never been modernized. Its usage\n\u003e is restricted to concrete types (e.g. `Vector{Float64}` instead of `AbstractVector{\u003c:Real}`).\n\u003e The API is inconsistent and sometimes confusing.\n\u003e [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) wraps NaiveBayes.jl, fixing some of\n\u003e these issues, but ghosts of the past still show up. You have been warned!\n\n[![Build Status](https://travis-ci.org/dfdx/NaiveBayes.jl.svg)](https://travis-ci.org/dfdx/NaiveBayes.jl)\n[![codecov.io](http://codecov.io/github/dfdx/NaiveBayes.jl/coverage.svg)](http://codecov.io/github/dfdx/NaiveBayes.jl)\n\nNaive Bayes classifier. Currently 3 types of NB are supported:\n\n * **MultinomialNB** - Assumes variables have a multinomial distribution. Good for text classification. See `examples/nums.jl` for usage.\n * **GaussianNB** - Assumes variables have a multivariate normal distribution. 
Good for real-valued data. See `examples/iris.jl` for usage.\n * **HybridNB** - A hybrid empirical naive Bayes model for a mixture of continuous and discrete features. The distributions of the continuous features are estimated using kernel density estimation.\n*Note*: fit/predict methods take `Dict{Symbol/AbstractString, Vector}` rather than a `Matrix`. Also, discrete features must be integers while continuous features must be floats. If all features are continuous, `Matrix` input is supported.\n\n\nSince `GaussianNB` models a multivariate distribution, it's not really a \"naive\" classifier (i.e. no independence assumption is made), so the name may change in the future.\n\nAs a by-product, this package also provides a `DataStats` type that may be used for incremental calculation of common data statistics such as mean and covariance matrix. See `test/datastatstest.jl` for a usage example.\n\n### Examples:\n1. Continuous and discrete features as `Dict{Symbol, Vector}`\n\n    ```julia\n    using NaiveBayes\n\n    f_c1 = randn(10)\n    f_c2 = randn(10)\n    f_d1 = rand(1:5, 10)\n    f_d2 = rand(3:7, 10)\n    training_features_continuous = Dict{Symbol, Vector{Float64}}(:c1=\u003ef_c1, :c2=\u003ef_c2)\n    training_features_discrete   = Dict{Symbol, Vector{Int}}(:d1=\u003ef_d1, :d2=\u003ef_d2) # discrete features as Int64\n\n    labels = rand(1:3, 10)\n\n    hybrid_model = HybridNB(labels)\n\n    # train the model\n    fit(hybrid_model, training_features_continuous, training_features_discrete, labels)\n\n    # predict the classification for new events (points): features_c, features_d\n    features_c = Dict{Symbol, Vector{Float64}}(:c1=\u003erandn(10), :c2=\u003erandn(10))\n    features_d = Dict{Symbol, Vector{Int}}(:d1=\u003erand(1:5, 10), :d2=\u003erand(3:7, 10))\n    y = predict(hybrid_model, features_c, features_d)\n    ```\n\n2. 
Continuous features only as a `Matrix`\n    ```julia\n    X_train = randn(3,400);\n    X_classify = randn(3,10)\n    labels = rand(1:3, 400) # one label per column (observation) of X_train\n\n    hybrid_model = HybridNB(labels) # the number of discrete features is 0 so it's not needed\n    fit(hybrid_model, X_train, labels)\n    y = predict(hybrid_model, X_classify)\n    ```\n3. Continuous and discrete features as a `Matrix{Float64}`\n    ```julia\n    # X is a matrix of features\n    # the first 3 rows are continuous\n    training_features_continuous = restructure_matrix(X[1:3, :])\n    # the last 2 rows are discrete and must be integers\n    training_features_discrete = map(Int, restructure_matrix(X[4:5, :]))\n    # train the model\n    hybrid_model = train(HybridNB, training_features_continuous, training_features_discrete, labels)\n\n    # predict the classification for new events (points): features_c, features_d\n    y = predict(hybrid_model, features_c, features_d)\n    ```\n\n\n### Write/Load models to files\n\nIt is useful to train a model once and then use it for prediction many times later. For example, train your classifier on a local machine and then use it on a cluster to classify points in parallel.\n\nThere is support for writing `HybridNB` models to HDF5 files via the methods `write_model` and `load_model`. This is useful for interacting with other programs/languages. If the model file will only be read from Julia, it is easier to use **JLD.jl** for saving and loading the file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfdx%2Fnaivebayes.jl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdfdx%2Fnaivebayes.jl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdfdx%2Fnaivebayes.jl/lists"}