{"id":19288340,"url":"https://github.com/techascent/tech.ml","last_synced_at":"2025-12-12T01:12:27.282Z","repository":{"id":53982944,"uuid":"156102743","full_name":"techascent/tech.ml","owner":"techascent","description":"This library has been superceded by https://github.com/scicloj/scicloj.ml.","archived":false,"fork":false,"pushed_at":"2021-10-16T22:00:04.000Z","size":955,"stargazers_count":96,"open_issues_count":3,"forks_count":4,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-10-13T06:57:54.981Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"epl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/techascent.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-11-04T16:37:48.000Z","updated_at":"2023-10-24T16:49:42.000Z","dependencies_parsed_at":"2022-08-13T05:40:49.898Z","dependency_job_id":null,"html_url":"https://github.com/techascent/tech.ml","commit_stats":null,"previous_names":["techascent/tech.ml-base"],"tags_count":156,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techascent%2Ftech.ml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techascent%2Ftech.ml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techascent%2Ftech.ml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/techascent%2Ftech.ml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/techascent","download_url":"https://codeload.github.com/techascent/tech.ml/tar.gz/refs/heads/master","host":{"name":"Git
Hub","url":"https://github.com","kind":"github","repositories_count":223888372,"owners_count":17220083,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T22:08:45.705Z","updated_at":"2025-12-12T01:12:22.199Z","avatar_url":"https://github.com/techascent.png","language":"Clojure","funding_links":[],"categories":["Clojure"],"sub_categories":["[Tools](#tools-1)"],"readme":"# tech.ml\n\n## This Library Has Been Superseded by scicloj.ml!\n\n\nThis is great news!  The Clojure community has come together and built\na more stable and wider-ranging ML subsystem.  Please head over to \n[scicloj.ml](https://github.com/scicloj/scicloj.ml) for the best and most \nup-to-date machine learning toolkit available for Clojure.\n\n\n\n[![Clojars Project](https://img.shields.io/clojars/v/techascent/tech.ml.svg)](https://clojars.org/techascent/tech.ml)\n\nBasic machine learning library for use with `tech.v3.dataset`.\n\n* [API Documentation](https://techascent.github.io/tech.ml/)\n\n\n\n## Simple Regression And Classification\n\nWe start with a require:\n\n```clojure\nuser\u003e (require '[tech.v3.dataset :as ds])\n```\n\nAnd move to a dataset.  
We will use the famous Iris dataset:\n\n\n```clojure\nuser\u003e (def ds (ds/-\u003edataset \"https://raw.githubusercontent.com/techascent/tech.ml/master/test/data/iris.csv\"))\n#'user/ds\nuser\u003e (ds/head ds)\nhttps://raw.githubusercontent.com/techascent/tech.ml/master/test/data/iris.csv [5 5]:\n\n| sepal_length | sepal_width | petal_length | petal_width | species |\n|--------------|-------------|--------------|-------------|---------|\n|          5.1 |         3.5 |          1.4 |         0.2 |  setosa |\n|          4.9 |         3.0 |          1.4 |         0.2 |  setosa |\n|          4.7 |         3.2 |          1.3 |         0.2 |  setosa |\n|          4.6 |         3.1 |          1.5 |         0.2 |  setosa |\n|          5.0 |         3.6 |          1.4 |         0.2 |  setosa |\n```\n\n### Preparing The Dataset\n\nWe need to have all numeric columns in our dataset.  The species column is a categorical\ncolumn and we will need to convert it to a numeric column while remembering\nwhat mapping we used.  
We introduce the column filters namespace that performs\nvarious filtering operations on the columns themselves returning new datasets:\n\n```clojure\nuser\u003e (require '[tech.v3.dataset.column-filters :as cf])\nnil\nuser\u003e (ds/head (cf/numeric ds))\nhttps://raw.githubusercontent.com/techascent/tech.ml/master/test/data/iris.csv [5 4]:\n\n| sepal_length | sepal_width | petal_length | petal_width |\n|--------------|-------------|--------------|-------------|\n|          5.1 |         3.5 |          1.4 |         0.2 |\n|          4.9 |         3.0 |          1.4 |         0.2 |\n|          4.7 |         3.2 |          1.3 |         0.2 |\n|          4.6 |         3.1 |          1.5 |         0.2 |\n|          5.0 |         3.6 |          1.4 |         0.2 |\nuser\u003e (ds/head (cf/categorical ds))\nhttps://raw.githubusercontent.com/techascent/tech.ml/master/test/data/iris.csv [5 1]:\n\n| species |\n|---------|\n|  setosa |\n|  setosa |\n|  setosa |\n|  setosa |\n|  setosa |\nuser\u003e (def numeric-ds (ds/categorical-\u003enumber ds cf/categorical))\n#'user/numeric-ds\nuser\u003e (ds/head numeric-ds)\nhttps://raw.githubusercontent.com/techascent/tech.ml/master/test/data/iris.csv [5 5]:\n\n| sepal_length | sepal_width | petal_length | petal_width | species |\n|--------------|-------------|--------------|-------------|---------|\n|          5.1 |         3.5 |          1.4 |         0.2 |     1.0 |\n|          4.9 |         3.0 |          1.4 |         0.2 |     1.0 |\n|          4.7 |         3.2 |          1.3 |         0.2 |     1.0 |\n|          4.6 |         3.1 |          1.5 |         0.2 |     1.0 |\n|          5.0 |         3.6 |          1.4 |         0.2 |     1.0 |\nuser\u003e (meta (numeric-ds \"species\"))\n{:categorical? 
true,\n :name \"species\",\n :datatype :float64,\n :n-elems 150,\n :categorical-map\n {:lookup-table {\"versicolor\" 0, \"setosa\" 1, \"virginica\" 2},\n  :src-column \"species\",\n  :result-datatype :float64}}\n\n;;More transforms like this including one-hot and inverting the mapping are\n;;available in tech.v3.dataset.categorical.\n```\n\n### Regression\n\nFor regression we will mark `petal_width` as the field we want to infer and\nuse the default xgboost regression model.  Moving into actual modelling, we will\ninclude the `tech.v3.dataset.modelling` namespace and the xgboost bindings.\n\n\nWe will use a method where we split the dataset into train/test datasets\nvia random sampling, train the model, and calculate the loss using the\ntest dataset and the default regression loss function, mean absolute error or\n`mae`:\n\n\n```clojure\nuser\u003e (require '[tech.v3.dataset.modelling :as ds-mod])\nnil\nuser\u003e (def regression-ds (ds-mod/set-inference-target numeric-ds \"petal_width\"))\n#'user/regression-ds\nuser\u003e (require '[tech.v3.libs.xgboost])\nnil\n;; Also tech.v3.libs.smile.regression and tech.v3.libs.smile.classification provide quite\n;; a few models.\nuser\u003e (require '[tech.v3.ml :as ml])\nnil\nuser\u003e (def model (ml/train-split regression-ds {:model-type :xgboost/regression}))\n#'user/model\nuser\u003e (:loss model)\n0.1272654171784719\n```\n\nWe can also split the dataset into k datasets via the k-fold algorithm and get more error information:\n\n```clojure\nuser\u003e (def k-fold-model (ml/train-k-fold regression-ds {:model-type :xgboost/regression}))\n#'user/k-fold-model\nuser\u003e (select-keys k-fold-model [:min-loss :avg-loss :max-loss])\n{:min-loss 0.1050250555238416,\n :avg-loss 0.1569393319631037,\n :max-loss 0.19597491553196542}\n```\n\nGiven a model we can predict what the answer will be for the column the model\nwas trained for:\n\n```clojure\nuser\u003e (ds/head (ml/predict regression-ds k-fold-model))\n:_unnamed [5 1]:\n\n| 
petal_width |\n|-------------|\n|      0.2615 |\n|      0.1569 |\n|      0.1862 |\n|      0.1780 |\n|      0.2410 |\n```\n\nCalculating our own loss is then easy:\n\n```clojure\nuser\u003e (require '[tech.v3.ml.loss :as loss])\nnil\nuser\u003e (def predictions (ml/predict regression-ds k-fold-model))\n#'user/predictions\nuser\u003e (loss/mae (predictions \"petal_width\")\n                (regression-ds \"petal_width\"))\n0.04563391995429992\n```\n\nThe loss in this case is artificially low because we are testing on the same\ndata that we trained on.\n\n\n### Classification\n\nFor classification we will attempt to predict the species.\n\n\n```clojure\nuser\u003e (def classification-ds (ds-mod/set-inference-target numeric-ds \"species\"))\n#'user/classification-ds\nuser\u003e (def k-fold-model (ml/train-k-fold classification-ds {:model-type :xgboost/classification}))\n#'user/k-fold-model\nuser\u003e (select-keys k-fold-model [:min-loss :avg-loss :max-loss])\n{:min-loss 0.0, :avg-loss 0.04516129032258065, :max-loss 0.09677419354838712}\n```\n\nThe XGBoost system has a powerful classification engine!\n\n\nWe can ask the model which columns it found the most useful:\n\n```clojure\nuser\u003e (ml/explain k-fold-model)\n_unnamed [4 3]:\n\n| :importance-type |     :colname |      :gain |\n|------------------|--------------|------------|\n|             gain | petal_length | 3.38923719 |\n|             gain |  petal_width | 2.50506807 |\n|             gain | sepal_length | 0.22811251 |\n|             gain |  sepal_width | 0.22783045 |\n```\n\n\nWhen we predict on a classification dataset we get back a probability distribution along with\nthe original column mapped to whatever index had the max probability:\n\n\n```clojure\nuser\u003e (ds/head (ml/predict classification-ds k-fold-model))\n:_unnamed [5 4]:\n\n| versicolor | setosa | virginica | species |\n|------------|--------|-----------|---------|\n|   0.006784 | 0.9902 |  0.003023 |     1.0 |\n|   0.006032 | 0.9900 |  
0.003945 |     1.0 |\n|   0.006348 | 0.9906 |  0.003025 |     1.0 |\n|   0.006348 | 0.9906 |  0.003031 |     1.0 |\n|   0.006348 | 0.9906 |  0.003025 |     1.0 |\n\n;;The columns are marked with their type:\n\nuser\u003e (map meta (vals *1))\n({:name \"versicolor\",\n  :datatype :object,\n  :n-elems 5,\n  :column-type :probability-distribution}\n {:name \"setosa\",\n  :datatype :object,\n  :n-elems 5,\n  :column-type :probability-distribution}\n {:name \"virginica\",\n  :datatype :object,\n  :n-elems 5,\n  :column-type :probability-distribution}\n {:categorical? true,\n  :categorical-map\n  {:lookup-table {\"versicolor\" 0, \"setosa\" 1, \"virginica\" 2},\n   :src-column \"species\",\n   :result-datatype :float64},\n  :name \"species\",\n  :datatype :float64,\n  :n-elems 5,\n  :column-type :prediction})\n```\n\nDue to the metadata saved on the `species` column we can reverse map back to the\noriginal column values using the `tech.v3.dataset.categorical` namespace:\n\n\n```clojure\nuser\u003e (require '[tech.v3.dataset.categorical :as ds-cat])\nnil\nuser\u003e (def predictions (ml/predict classification-ds k-fold-model))\n#'user/predictions\nuser\u003e (ds/head (ds-cat/reverse-map-categorical-xforms predictions))\n:_unnamed [5 4]:\n\n| versicolor | setosa | virginica | species |\n|------------|--------|-----------|---------|\n|   0.006784 | 0.9902 |  0.003023 |  setosa |\n|   0.006032 | 0.9900 |  0.003945 |  setosa |\n|   0.006348 | 0.9906 |  0.003025 |  setosa |\n|   0.006348 | 0.9906 |  0.003031 |  setosa |\n|   0.006348 | 0.9906 |  0.003025 |  setosa |\n```\n\n### Concluding\n\n\nWe have generic support for xgboost and smile.  This gives you quite a few models and\nthey are all grid-searchable.  
We put this forward in an attempt to simplify\nthe ML work that we do and to move the Clojure ML conversation forward\ntowards getting the best possible results for a dataset in the least amount of\n(developer) time.\n\n\n\n* For a more in-depth walkthrough of XGBoost features, check out the\n  [XGBoost topic](topics/xgboost_metrics.md).\n\n## License\n\nCopyright © 2019 Tech Ascent, LLC\n\nDistributed under the Eclipse Public License either version 1.0 or (at\nyour option) any later version.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechascent%2Ftech.ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftechascent%2Ftech.ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftechascent%2Ftech.ml/lists"}