{"id":18728817,"url":"https://github.com/rubyonworld/kmeans-clusterer","last_synced_at":"2025-11-12T05:30:20.500Z","repository":{"id":174007975,"uuid":"542157809","full_name":"RubyOnWorld/kmeans-clusterer","owner":"RubyOnWorld","description":"⭐ k-means clustering in Ruby. Uses NArray under the hood for fast calculations.","archived":false,"fork":false,"pushed_at":"2022-09-27T17:39:41.000Z","size":400,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-12-28T14:26:59.442Z","etag":null,"topics":["cluster","clusterer","k","kmeans","ruby"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RubyOnWorld.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"MIT-LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-27T15:28:38.000Z","updated_at":"2022-09-27T18:01:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"7fd5acbb-de2b-486d-b219-9599482b5a90","html_url":"https://github.com/RubyOnWorld/kmeans-clusterer","commit_stats":null,"previous_names":["rubyonworld/kmeans-clusterer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyOnWorld%2Fkmeans-clusterer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyOnWorld%2Fkmeans-clusterer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyOnWorld%2Fkmeans-clusterer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RubyOnWorld%2Fk
means-clusterer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RubyOnWorld","download_url":"https://codeload.github.com/RubyOnWorld/kmeans-clusterer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239599040,"owners_count":19665911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","clusterer","k","kmeans","ruby"],"created_at":"2024-11-07T14:24:26.892Z","updated_at":"2025-11-12T05:30:20.452Z","avatar_url":"https://github.com/RubyOnWorld.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"KMeansClusterer\n===\n\n[k-means clustering](http://en.wikipedia.org/wiki/K-means_clustering) in Ruby. 
Uses [NArray](https://github.com/masa16/narray) under the hood for fast calculations.\n\nJump to the [examples](examples/) directory to see this in action.\n\n\nFeatures\n---\n\n- Runs multiple clustering attempts and keeps the best result (single runs are susceptible to falling into non-optimal local minima)\n- Initializes centroids via the [k-means++](http://en.wikipedia.org/wiki/K-means%2B%2B) algorithm, for faster convergence\n- Calculates [silhouette](http://en.wikipedia.org/wiki/Silhouette_%28clustering%29) score for evaluation\n- Option to scale data before clustering, so that output isn't biased by different feature scales\n- Works with high-dimensional data\n\n\nInstall\n---\n```\ngem install kmeans-clusterer\n```\n\n\nUsage\n---\n\nSimple example:\n\n```ruby\nrequire 'kmeans-clusterer'\n\ndata = [[40.71,-74.01],[34.05,-118.24],[39.29,-76.61],\n        [45.52,-122.68],[38.9,-77.04],[36.11,-115.17]]\n\nlabels = ['New York', 'Los Angeles', 'Baltimore', \n          'Portland', 'Washington DC', 'Las Vegas']\n\nk = 2 # find 2 clusters in data\n\nkmeans = KMeansClusterer.run k, data, labels: labels, runs: 5\n\nkmeans.clusters.each do |cluster|\n  puts  cluster.id.to_s + '. ' + \n        cluster.points.map(\u0026:label).join(\", \") + \"\\t\" +\n        cluster.centroid.to_s\nend\n\n# Use existing clusters for prediction with new data:\npredicted = kmeans.predict [[41.85,-87.65]] # Chicago\nputs \"\\nClosest cluster to Chicago: #{predicted[0]}\"\n\n# Clustering quality score. Value between -1.0..1.0 (1.0 is best)\nputs \"\\nSilhouette score: #{kmeans.silhouette.round(2)}\"\n```\n\nOutput of simple example:\n\n```\n0. New York, Baltimore, Washington DC [39.63, -75.89]\n1. 
Los Angeles, Portland, Las Vegas [38.56, -118.7]\n\nClosest cluster to Chicago: 0\n\nSilhouette score: 0.91\n```\n\n### Options\n\nThe following options can be passed in to ```KMeansClusterer.run```:\n\noption | default | description\n------ | ------- | -----------\n:labels | nil | optional array of Ruby objects to collate with data array\n:runs   | 10 | number of times to run kmeans\n:log    | false | print stats after each run\n:init   | :kmpp | algorithm for picking initial cluster centroids. Accepts :kmpp, :random, or an array of k centroids\n:scale_data | false | scales features before clustering using formula (data - mean) / std\n:float_precision | :double | float precision to use. :double or :single\n:max_iter | 300 | max iterations per run\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubyonworld%2Fkmeans-clusterer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frubyonworld%2Fkmeans-clusterer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frubyonworld%2Fkmeans-clusterer/lists"}