{"id":27430982,"url":"https://github.com/bigmlcom/clj-bigml","last_synced_at":"2025-09-16T04:05:50.133Z","repository":{"id":3337825,"uuid":"4382014","full_name":"bigmlcom/clj-bigml","owner":"bigmlcom","description":"Clojure bindings for the BigML.io API","archived":false,"fork":false,"pushed_at":"2018-03-27T18:38:45.000Z","size":107,"stargazers_count":49,"open_issues_count":1,"forks_count":14,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-02-15T08:15:26.825Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"tony56a/Instagram-Java","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bigmlcom.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-05-20T01:24:32.000Z","updated_at":"2023-01-04T00:26:15.000Z","dependencies_parsed_at":"2022-09-06T21:52:58.355Z","dependency_job_id":null,"html_url":"https://github.com/bigmlcom/clj-bigml","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigmlcom%2Fclj-bigml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigmlcom%2Fclj-bigml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigmlcom%2Fclj-bigml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bigmlcom%2Fclj-bigml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bigmlcom","download_url":"https://codeload.github.com/bigmlcom/clj-bigml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_c
ount":248905859,"owners_count":21181065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-14T15:28:11.895Z","updated_at":"2025-09-16T04:05:45.103Z","avatar_url":"https://github.com/bigmlcom.png","language":"Clojure","funding_links":[],"categories":["Machine Learning"],"sub_categories":[],"readme":"# clj-bigml\n\n**clj-bigml** provides Clojure bindings for the\n[BigML.io API](https://bigml.com/developers/overview).\n\n## Installation\n\n`clj-bigml` is available as a Maven artifact from\n[Clojars](http://clojars.org/bigml/clj-bigml).\n\nFor [Leiningen](https://github.com/technomancy/leiningen):\n\n[![Clojars Project](https://img.shields.io/clojars/v/bigml/clj-bigml.svg)](https://clojars.org/bigml/clj-bigml)\n\n## Overview\n\nBigML offers a REST-style service, BigML.io, for creating and managing\nBigML resources programmatically. You can use BigML.io for basic\nsupervised and unsupervised machine learning tasks and also to create\nsophisticated machine learning pipelines.\n\n`clj-bigml` aims to make it easier to use BigML.io features from\nClojure. BigML.io features are always growing and adapting to BigML\ncustomers' requirements, and `clj-bigml` currently supports only\na limited subset of those features.\n\nBigML.io takes a white-box approach where it makes sense. 
This\nincludes the possibility of downloading your datasets, models,\nclusters and anomaly detectors and using them locally (either in BigML's\nnative JSON format or as\n[PMML](http://www.dmg.org/v4-1/GeneralStructure.html)) in addition to\nbeing able to use them through the BigML API.\n\nPlease note, all code samples in this document assume that the\nfollowing namespaces are already required:\n\n```clojure\n(require '[bigml.api [core :as api]\n                     [source :as source]\n                     [dataset :as dataset]\n                     [model :as model]\n                     [cluster :as cluster]\n                     [centroid :as centroid]\n                     [anomaly-detector :as anomaly-detector]\n                     [anomaly-score :as anomaly-score]\n                     [prediction :as prediction]\n                     [evaluation :as evaluation]\n                     [script :as script]])\n```\n\n## Authentication\n\nTo use this library you'll need\nan [account with BigML](https://bigml.com/accounts/register/), your\nuser name, and your [API key](https://bigml.com/account/apikey).\n\nWhile new BigML accounts come with some free credits, you may avoid\nspending your credits by enabling development mode.  
Development mode\nallows free access to the API but limits the size of the data you can\nmodel (~1 MB limit).\n\nThere are three approaches to authentication when using the library.\nThe first, and perhaps easiest, is to set environment variables:\n\n```console\nexport BIGML_USERNAME=johndoe\nexport BIGML_API_KEY=0123456789\nexport BIGML_DEV_MODE=true\n```\n\nIf the environment variables are set, you may make calls to the client\nwithout specifying the authentication information or the development\nmode (this next code sample creates a data source, which will be\ndiscussed later):\n\n```clojure\n(source/create \"some_file.csv\")\n```\n\nAlternatively, you can use `with-connection` and `make-connection`:\n\n```clojure\n(api/with-connection (api/make-connection \"johndoe\" \"0123456789\" true)\n  (source/create \"some_file.csv\"))\n```\n\nFinally, you can add the authentication information and the\ndevelopment mode as parameters when calling client functions:\n\n```clojure\n(source/create \"some_file.csv\"\n               :username \"johndoe\"\n               :api_key \"0123456789\"\n               :dev_mode true)\n```\n\n## Resources\n\nThe BigML API provides access to a growing number\nof [ML resources](https://bigml.com/developers/overview#ov_bigml_resources),\nincluding such resources as [sources](https://bigml.com/developers/sources),\n[datasets](https://bigml.com/developers/datasets),\n[models](https://bigml.com/developers/models), etc.\nFor more about them, see the\n[tutorial videos on YouTube](http://www.youtube.com/playlist?list=PL16FC91153F8C47A7\u0026feature=plcp).\n\n`bigml.api.core` provides a set of functions that act as primitives\nfor all the resource types:\n\n  - `list` - Returns a paginated list of the desired resource.\n  - `create` - Creates a resource (although we recommend avoiding\n               this and using the friendlier resource specific `create`\n               fns).\n  - `update` - Updates a resource (usually limited to textual 
descriptions\n               like `name`).\n  - `delete` - Deletes a resource.\n  - `get` - Retrieves a resource.\n  - `get-final` - Repeatedly attempts to `get` a resource until it is\n                  finalized.\n\nThe other namespaces (`bigml.api.source`, `bigml.api.dataset`, …)\noffer functions specific to that resource type.  At a minimum this\nincludes resource specific `list` and `create` functions which are\nmore convenient than the generic version.\n\n### Sources\n\n[Sources](https://bigml.com/developers/sources) represent the raw data\nthat you wish to analyze.\n\n`bigml.api.source/create` makes it convenient to create sources.  It\nsupports three types of sources and will create the appropriate type\ndepending on the input to `create`.\n\n**Local sources** are created from local files:\n\n```clojure\n(source/create \"test/data/iris.csv.gz\")\n```\n\nThis will upload the file from your computer to BigML.  BigML supports\nmultiple formats such as CSVs (with space, tab, comma, or semicolon\nseparators), Excel spreadsheets, iWork Numbers, and Weka's ARFF files.\n\nA variety of compression formats are also supported such as `.Z`\n(Unix-compressed), `gz`, and `bz2`.\n\n**Remote sources** are created from URLs:\n\n```clojure\n(source/create \"https://static.bigml.com/csv/iris.csv\")\n```\n\nBigML also supports [s3, azure, and odata URLs](http://blog.bigml.com/2012/12/07/bigmler-in-da-cloud-machine-learning-made-even-easier/).\n\n**Inline sources** are created directly from Clojure data:\n\n```clojure\n(source/create [[\"Make\" \"Model\" \"Year\" \"Weight\" \"MPG\"]\n                [\"AMC\" \"Gremlin\" 1970 2648 21]\n                [\"AMC\" \"Matador\" 1973 3672 14]\n                [\"AMC\" \"Gremlin\" 1975 2914 20]\n                [\"Honda\" \"Civic\" 1974 2489 24]\n                [\"Honda\" \"Civic\" 1976 1795 33]])\n```\n\nPlease note that inline sources support only small-ish amounts of data\n(~5 MB limit).\n\n### 
Datasets\n\n[Datasets](https://bigml.com/developers/datasets) represent processed\ndata ready for modeling. They are created from sources or other\ndatasets and contain statistical summarizations for each field (or\ncolumn) in the data.  `bigml.api.dataset/create` makes dataset\ncreation convenient.\n\nIn this example, from the well\nknown [Iris data](http://en.wikipedia.org/wiki/Iris_flower_data_set)\nwe create a source, wait for completion, initiate a dataset, and wait\nfor its completion.\n\n```clojure\n(def iris-source\n  (api/get-final (source/create \"https://static.bigml.com/csv/iris.csv\")))\n\n(def iris-dataset\n  (api/get-final (dataset/create iris-source)))\n```\n\nOnce we've created a dataset, we can transform it, optionally sampling\nor filtering its rows, using the `dataset/clone` function.\n\n### Models\n\n[Models](https://bigml.com/developers/models) are tree-based\npredictive models built from datasets.\n\nContinuing the Iris example, we now initialize a model and wait for it\nto complete.  Since we don't specify an objective field or input\nfields when building the model, it will default the objective as the\nlast field (in this case, \"species\").  The other fields become the\ndefault inputs (\"sepal length\", \"sepal width\", \"petal length\" and\n\"petal width\").\n\n```clojure\n(def iris-model\n  (api/get-final (model/create iris-dataset)))\n```\n\n### Predictions\n\n[Predictions](https://bigml.com/developers/predictions) may be\ngenerated through the API.  Creating a prediction requires a model and\na set of inputs.  Prediction inputs can be formed two ways. 
They may\neither be a map from field-id (assigned by the dataset when it's\ncreated) to value, or it may be a list of input values that appear in\nthe same order as they did in the data source.\n\n```clojure\n(def iris-remote-prediction\n  (prediction/create iris-model [7.6 3.0 6.6 2.1]))\n\n(:prediction iris-remote-prediction)\n;; --\u003e {:000004 \"Iris-virginica\"}\n\n;; Also valid:\n;; (prediction/create iris-model {\"000000\" 7.6\n;;                                \"000001\" 3.0\n;;                                \"000002\" 6.6\n;;                                \"000003\" 2.1})\n```\n\nAlternatively, we can use the model to create a local Clojure fn for\nmaking predictions.\n\n```clojure\n(def iris-local-predictor\n  (prediction/predictor iris-model))\n\n(iris-local-predictor {\"000000\" 7.6\n                       \"000001\" 3.0\n                       \"000002\" 6.6\n                       \"000003\" 2.1}) ;; --\u003e \"Iris-virginica\"\n\n(iris-local-predictor [7.6 3.0 6.6 2.1]) ;; --\u003e \"Iris-virginica\"\n```\n\nThe local prediction fn will also accept `:details` as an optional\nparameter.  When true the fn will return extra information about the\nprediction.  This includes the number of training instances that\nreached this point in the tree, their objective field distribution,\nand the [confidence](https://bigml.com/developers/predictions#p_confidence)\nof the prediction.\n\n```clojure\n(iris-local-predictor [7.6 3.0 6.6 2.1] :details true)\n;; --\u003e {:confidence 0.90819,\n;;      :count 38,\n;;      :objective_summary {:categories [[\"Iris-virginica\" 38]]},\n;;      :prediction {:000004 \"Iris-virginica\"}}\n```\n\n### Evaluations\n\nAn [evaluation](https://bigml.com/developers/evaluations) of\na model on a dataset may be generated through the API.\n\nWe continue the Iris example by evaluating our model on its own\ntraining data.  
This is poor form for a data scientist, but it will do\nas a demonstration.\n\n```clojure\n(def iris-evaluation\n  (api/get-final (evaluation/create iris-model iris-dataset)))\n\n(:accuracy (:model (:result iris-evaluation)))\n;; --\u003e 1\n\n(:confusion_matrix (:model (:result iris-evaluation)))\n;; --\u003e [[50 0 0] [0 50 0] [0 0 50]]\n```\n\nWe have perfect accuracy and a spotless confusion matrix.  But of\ncourse, never trust evaluations on training data.\n\n### Clusters\n\nA [cluster](https://bigml.com/developers/clusters) is a set of groups\nof instances of a dataset that have been automatically classified\ntogether according to a distance measure computed using the fields of\nthe dataset.  A clustering is an example of unsupervised learning.\n\n```clojure\n(def iris-cluster\n  (api/get-final (cluster/create iris-dataset)))\n```\n\n### Centroids\n\nA [centroid](https://bigml.com/developers/centroids) represents the\ncenter of a cluster and is computed using the mean for each numeric\nfield and the mode for each categorical field.  You can create a\ncentroid for a given cluster and a new data instance to identify the\ncluster's centroid that is closest to the given instance.\n\n```clojure\n(def iris-centroid\n    (api/get-final (centroid/create iris-cluster {\"000000\" 7.6\n                                                  \"000001\" 3.0\n                                                  \"000002\" 6.6\n                                                  \"000003\" 2.1})))\n```\n\n### Anomaly Detectors\n\nAn [anomaly detector](https://bigml.com/api/anomalies) helps find\nunusual instances in your data.\n\n```clojure\n(def iris-anomaly-detector\n  (api/get-final (anomaly-detector/create iris-dataset)))\n```\n\n### Anomaly Scores\n\nAn [anomaly score](https://bigml.com/api/anomalyscores) captures how\nstrange a data point appears to be given an anomaly detector within a\n0-1 range. 
Scores above 0.6 can generally be considered unusual.\n\n```clojure\n(def iris-anomaly-score\n    (api/get-final (anomaly-score/create iris-anomaly-detector\n                                         {\"000000\" 7.6\n                                          \"000001\" 3.0\n                                          \"000002\" 6.6\n                                          \"000003\" 2.1})))\n```\n\nAlternatively, we can use the detector to create a local Clojure fn\nfor generating scores.\n\n```clojure\n(def iris-local-detector\n  (anomaly-score/detector iris-anomaly-detector))\n\n(iris-local-detector {\"000000\" 5.2,\n                      \"000001\" 3.5,\n                      \"000002\" 1.5,\n                      \"000003\" 0.2,\n                      \"000004\" \"Iris-virginica\"}) ;; --\u003e 0.6125\n\n(iris-local-detector [5.2 3.5 1.5 0.2 \"Iris-virginica\"]) ;; --\u003e 0.6125\n```\n\n### Scripts\n\nA [script](https://bigml.com/api/scripts) is compiled\nsource code written in WhizzML, BigML's custom scripting\nlanguage for automating Machine Learning workflows.\n\nThe source code itself can be provided as a string:\n\n```clojure\n(def simple-script\n   (api/get-final (script/create \"(define n (+ k 1))\"\n                          :name \"Add one\"\n                          :inputs [{:name \"k\" :type \"number\" :default 0}]\n                          :outputs [{:name \"n\" :type \"number\"}])))\n```\n\nAlternatively, you can provide an input stream from which the source\ncode will be read:\n\n```clojure\n(def simple-script\n   (api/get-final (script/create (clojure.java.io/input-stream \"/path/to/source-code.whizzml\")\n                          :name \"Add one\"\n                          :inputs [{:name \"k\" :type \"number\" :default 0}]\n                          :outputs [{:name \"n\" :type \"number\"}])))\n```\n\n\n### Clean up resources\n\nIf you've been following along in your REPL, you can clean up the\nartifacts generated in these examples like 
so:\n\n```clojure\n(mapv api/delete [iris-source iris-dataset iris-model\n                  iris-remote-prediction iris-evaluation\n                  iris-cluster iris-centroid iris-anomaly-detector\n                  iris-anomaly-score simple-script])\n```\n\n## More Examples\n\nSee `test/bigml/test/api/examples.clj` for more examples of the API in\naction.  We show how to break up a dataset into proper training and\ntesting sets through sampling options, and we show how to grow and\npredict with a random forest.\n\n## Support\n\nPlease report problems and bugs to our\n[BigML.io issue tracker](https://github.com/bigmlcom/io/issues).\n\nDiscussions about language bindings take place in the general\n[BigML mailing list](http://groups.google.com/group/bigml). Or join us\nin our [Campfire chatroom](https://bigmlinc.campfirenow.com/f20a0).\n\n## License\n\nCopyright (C) 2012-2016 BigML Inc.\n\nDistributed under the Apache License, Version 2.0.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigmlcom%2Fclj-bigml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbigmlcom%2Fclj-bigml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbigmlcom%2Fclj-bigml/lists"}