{"id":14966008,"url":"https://github.com/antononcube/raku-ml-clustering","last_synced_at":"2026-02-06T23:01:00.023Z","repository":{"id":51028087,"uuid":"520209437","full_name":"antononcube/Raku-ML-Clustering","owner":"antononcube","description":"Raku package for Machine Learning (ML) clustering algorithms","archived":false,"fork":false,"pushed_at":"2024-06-01T14:12:43.000Z","size":237,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-18T20:04:16.723Z","etag":null,"topics":["clustering","clustering-algorithm","machine-learning","machine-learning-algorithms","raku"],"latest_commit_sha":null,"homepage":"https://raku.land/zef:antononcube/ML::Clustering","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antononcube.png","metadata":{"files":{"readme":"README-work.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-01T17:49:42.000Z","updated_at":"2024-06-01T14:12:46.000Z","dependencies_parsed_at":"2024-09-14T01:22:14.568Z","dependency_job_id":null,"html_url":"https://github.com/antononcube/Raku-ML-Clustering","commit_stats":{"total_commits":56,"total_committers":2,"mean_commits":28.0,"dds":0.0357142857142857,"last_synced_commit":"4f66bae1c6893e017e5141f1acb31446ac339e03"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/antononcube/Raku-ML-Clustering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-Clustering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/antononcube%2FRaku-ML-Clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-Clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-Clustering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antononcube","download_url":"https://codeload.github.com/antononcube/Raku-ML-Clustering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antononcube%2FRaku-ML-Clustering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29179561,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T22:12:24.066Z","status":"ssl_error","status_checked_at":"2026-02-06T22:12:09.859Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","clustering-algorithm","machine-learning","machine-learning-algorithms","raku"],"created_at":"2024-09-24T13:35:41.282Z","updated_at":"2026-02-06T23:00:59.988Z","avatar_url":"https://github.com/antononcube.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Raku 
ML::Clustering\n\n[![MacOS](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/macos.yml/badge.svg)](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/macos.yml)\n[![Linux](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/linux.yml/badge.svg)](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/linux.yml)\n[![Win64](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/windows.yml/badge.svg)](https://github.com/antononcube/Raku-ML-Clustering/actions/workflows/windows.yml)\n[![https://raku.land/zef:antononcube/ML::Clustering](https://raku.land/zef:antononcube/ML::Clustering/badges/version)](https://raku.land/zef:antononcube/ML::Clustering)\n[![License: Artistic-2.0](https://img.shields.io/badge/License-Artistic%202.0-0298c3.svg)](https://opensource.org/licenses/Artistic-2.0)\n\nThis repository has the code of a Raku package for\nMachine Learning (ML)\n[Clustering (or Cluster analysis)](https://en.wikipedia.org/wiki/Cluster_analysis)\nfunctions, [Wk1].\n\nThe Clustering framework includes:\n\n- The algorithms \n  [K-means](https://en.wikipedia.org/wiki/K-means_clustering) \n  and \n  [K-medoids](https://en.wikipedia.org/wiki/K-medoids), \n  and others\n\n- The distance functions Euclidean, Cosine, Hamming, Manhattan, and others,\n  and their corresponding similarity functions\n\nThe data in the examples below is generated and manipulated with the packages\n[\"Data::Generators\"](https://raku.land/zef:antononcube/Data::Generators),\n[\"Data::Reshapers\"](https://raku.land/zef:antononcube/Data::Reshapers), and\n[\"Data::Summarizers\"](https://raku.land/zef:antononcube/Data::Summarizers), described in the article\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n[AA1].\n\nThe plots are made with the package\n[\"Text::Plot\"](https://raku.land/zef:antononcube/Text::Plot), [AAp6].\n\n-------\n\n## 
Installation\n\nVia zef-ecosystem:\n\n```\nzef install ML::Clustering\n```\n\nFrom GitHub:\n\n```\nzef install https://github.com/antononcube/Raku-ML-Clustering\n```\n\n-------\n\n## Usage example\n\nHere we generate a set of random points and summarize it:\n\n```perl6\nuse Data::Generators;\nuse Data::Summarizers;\nuse Text::Plot;\n\nmy $n = 100;\nmy @data1 = (random-variate(NormalDistribution.new(5,1.5), $n) X random-variate(NormalDistribution.new(5,1), $n)).pick(30);\nmy @data2 = (random-variate(NormalDistribution.new(10,1), $n) X random-variate(NormalDistribution.new(10,1), $n)).pick(50);\nmy @data3 = [|@data1, |@data2].pick(*);\nrecords-summary(@data3)\n```\n\nHere we plot the points:\n\n```perl6\nuse Text::Plot;\ntext-list-plot(@data3)\n```\n\n**Problem:** Group the points in such a way that each group has close (or similar) points.\n\nHere is how we use the function `find-clusters` to give an answer:\n\n```perl6\nuse ML::Clustering;\nmy %res = find-clusters(@data3, 2, prop =\u003e 'All');\n%res\u003cClusters\u003e\u003e\u003e.elems\n```\n\n**Remark:** The first argument is the data points, given as a list of numeric lists. \nThe second argument is the number of clusters to be found. \n(Automatic determination of the number of clusters is on the TODO list; currently it is not implemented.)  
\n\n**Remark:** The function `find-clusters` can return results of different types, controlled by the named argument \"prop\".\nUsing `prop =\u003e 'All'` returns a hash with all properties of the cluster-finding result.\n\nHere are sample points from each found cluster:\n\n```perl6\n.say for %res\u003cClusters\u003e\u003e\u003e.pick(3);\n```\n\nHere are the centers of the clusters (the mean points):\n\n```perl6\n%res\u003cMeanPoints\u003e\n```\n\nWe can verify the result by looking at the plot of the found clusters:\n\n```perl6\ntext-list-plot((|%res\u003cClusters\u003e, %res\u003cMeanPoints\u003e), point-char =\u003e \u003c▽ ☐ ●\u003e, title =\u003e '▽ - 1st cluster; ☐ - 2nd cluster; ● - cluster centers')\n```\n\n**Remark:** By default `find-clusters` uses the K-means algorithm. The functions `k-means` and `k-medoids`\ncall `find-clusters` with the option settings `method=\u003e'K-means'` and `method=\u003e'K-medoids'` respectively.\n\n------\n\n## More interesting looking data\n\nHere is more interesting looking two-dimensional data, `data2D5`:\n\n```perl6\nuse Data::Reshapers;\nmy $pointsPerCluster = 200;\nmy @data2D5 = [[10,20,4],[20,60,6],[40,10,6],[-30,0,4],[100,100,8]].map({ \n    random-variate(NormalDistribution.new($_[0], $_[2]), $pointsPerCluster) Z random-variate(NormalDistribution.new($_[1], $_[2]), $pointsPerCluster)\n   }).Array;\n@data2D5 = flatten(@data2D5, max-level=\u003e1).pick(*);\n@data2D5.elems\n```\n\nHere is a plot of that data:\n\n```perl6\ntext-list-plot(@data2D5)\n```\n\nHere we find clusters and plot them together with their mean points:\n\n```perl6\nsrand(32);\nmy %clRes = find-clusters(@data2D5, 5, prop=\u003e'All');\ntext-list-plot([|%clRes\u003cClusters\u003e, %clRes\u003cMeanPoints\u003e], point-char=\u003e\u003c1 2 3 4 5 ●\u003e)\n```\n\n-------\n\n## Detailed function pages\n\nDetailed parameter explanations and usage examples for the functions provided by the package are given in:\n\n- [\"K-means function 
page\"](./doc/K-means-function-page.md)\n\n- [\"K-medoids function page\"]()\n\n- [\"Bi-sectional-K-means function page\"]()\n\n-------\n\n## Implementation considerations\n\n### UML diagram\n\nHere is a UML diagram that shows the package's structure (in Mermaid-JS):\n\n```shell, output.prompt=NONE, output.lang=mermaid\nto-uml-spec ML::Clustering --format=mermaid\n```\n\n**Remark:** Maybe it is a good idea to have an abstract class named, say,\n`ML::Clustering::AbstractFinder` that is a parent of\n`ML::Clustering::KMeans`, `ML::Clustering::KMedoids`, `ML::Clustering::BiSectionalKMeans`, etc.,\nbut I have not found it to be necessary at this point of development.\n\n**Remark:** It seems it is better to have a separate package for the distance functions, named, say,\n\"ML::DistanceFunctions\". (Although distance functions are not just for ML...)\nAfter thinking over package and function names I will make such a package. \n\n-------\n\n## TODO\n\n- [X] DONE Factor out the distance functions into a separate package.\n\n- [ ] TODO Implement the Bi-sectional K-means algorithm, [AAp1].\n\n- [ ] TODO Implement the K-medoids algorithm.\n\n- [ ] TODO Automatic determination of the number of clusters.\n\n- [ ] TODO Allow data points to be `Pair` objects whose keys are point labels.\n\n   - Hence, the returned clusters consist of those labels, not the points themselves.\n\n- [ ] TODO Implement Agglomerate algorithm.\n\n-------\n\n## References\n\n### Articles\n\n[Wk1] Wikipedia entry, [\"Cluster Analysis\"](https://en.wikipedia.org/wiki/Cluster_analysis).\n\n[AA1] Anton Antonov,\n[\"Introduction to data wrangling with Raku\"](https://rakuforprediction.wordpress.com/2021/12/31/introduction-to-data-wrangling-with-raku/),\n(2021),\n[RakuForPrediction at WordPress](https://rakuforprediction.wordpress.com).\n\n### Packages\n\n[AAp1] Anton Antonov,\n[Bi-sectional K-means algorithm in 
Mathematica](https://github.com/antononcube/MathematicaForPrediction/blob/master/BiSectionalKMeans.m),\n(2020),\n[MathematicaForPrediction at GitHub/antononcube](https://github.com/antononcube/MathematicaForPrediction/).\n\n[AAp2] Anton Antonov,\n[Data::Generators Raku package](https://github.com/antononcube/Raku-Data-Generators),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp3] Anton Antonov,\n[Data::Reshapers Raku package](https://github.com/antononcube/Raku-Data-Reshapers),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp4] Anton Antonov,\n[Data::Summarizers Raku package](https://github.com/antononcube/Raku-Data-Summarizers),\n(2021),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp5] Anton Antonov,\n[UML::Translators Raku package](https://github.com/antononcube/Raku-UML-Translators),\n(2022),\n[GitHub/antononcube](https://github.com/antononcube).\n\n[AAp6] Anton Antonov,\n[Text::Plot Raku package](https://raku.land/zef:antononcube/Text::Plot),\n(2022),\n[GitHub/antononcube](https://github.com/antononcube).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-ml-clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantononcube%2Fraku-ml-clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantononcube%2Fraku-ml-clustering/lists"}