{"id":22793650,"url":"https://github.com/cmtt/kmpp","last_synced_at":"2025-10-07T17:33:58.329Z","repository":{"id":4298700,"uuid":"5430056","full_name":"cmtt/kmpp","owner":"cmtt","description":"k-means clustering algorithm with k-means++ initialization.","archived":false,"fork":false,"pushed_at":"2022-12-03T12:11:37.000Z","size":1262,"stargazers_count":32,"open_issues_count":11,"forks_count":9,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-06-05T00:14:45.520Z","etag":null,"topics":["clustering-algorithm","javascript","kmeans-algorithm","kmeansplusplus"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cmtt.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2012-08-15T18:45:03.000Z","updated_at":"2025-03-08T20:03:40.000Z","dependencies_parsed_at":"2023-01-11T16:35:35.243Z","dependency_job_id":null,"html_url":"https://github.com/cmtt/kmpp","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/cmtt/kmpp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmtt%2Fkmpp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmtt%2Fkmpp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmtt%2Fkmpp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cmtt%2Fkmpp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cmtt","download_url":"https://codeload.github.com/cmtt/kmpp/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/cmtt%2Fkmpp/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259444954,"owners_count":22858548,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering-algorithm","javascript","kmeans-algorithm","kmeansplusplus"],"created_at":"2024-12-12T03:28:09.552Z","updated_at":"2025-10-07T17:33:53.304Z","avatar_url":"https://github.com/cmtt.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kmpp\n\n[![Travis CI](https://travis-ci.org/cmtt/kmpp.svg)](https://travis-ci.org/cmtt/kmpp)\n\nWhen dealing with lots of data points, clustering algorithms may be used to group them. The k-means algorithm partitions _n_ data points into _k_ clusters and finds the centroids of these clusters iteratively.\n\nThe algorithm assigns each data point to the closest centroid and then recalculates the centroid of each cluster. These steps are repeated until the centroids no longer change.\n\nThe basic k-means algorithm is initialized with _k_ centroids at random positions. 
This implementation addresses some disadvantages of the arbitrary initialization method with the k-means++ algorithm (see \"Further reading\" at the end).\n\n## Installation\n\n### Installing via npm\n\nInstall kmpp as a Node.js module via npm:\n````bash\n$ npm install kmpp\n````\n\n## Example\n\n```javascript\nvar kmpp = require('kmpp');\n\nkmpp([\n  [x1, y1, ...],\n  [x2, y2, ...],\n  [x3, y3, ...],\n  ...\n], {\n  k: 3\n});\n\n// =\u003e\n// { converged: true,\n//   centroids: [[xm1, ym1, ...], [xm2, ym2, ...], [xm3, ym3, ...]],\n//   counts: [ 7, 6, 7 ],\n//   assignments: [ 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1 ]\n// }\n```\n\n## API\n\n### `kmpp(points[, opts])`\n\nExecutes the k-means++ algorithm on `points`.\n\nArguments:\n- `points` (`Array`): An array-of-arrays containing the points in the format `[[x1, y1, ...], [x2, y2, ...], [x3, y3, ...], ...]`\n- `opts`: object containing configuration parameters. Parameters are:\n  - `distance` (`function`): Optional function that takes two points and returns the distance between them.\n  - `initialize` (`Boolean`): Perform initialization. If false, uses the initial state provided in `centroids` and `assignments`. Otherwise discards any initial state and performs initialization.\n  - `k` (`Number`): number of centroids. If not provided, `sqrt(n / 2)` is used, where `n` is the number of points.\n  - `kmpp` (`Boolean`, default: `true`): If true, uses k-means++ initialization. Otherwise uses naive random assignment.\n  - `maxIterations` (`Number`, default: `100`): Maximum allowed number of iterations.\n  - `norm` (`Number`, default: `2`): L-norm used for distance computation. `1` is Manhattan norm, `2` is Euclidean norm. Ignored if `distance` function is provided.\n  - `centroids` (`Array`): An array of centroids. If `initialize` is false, used as initialization for the algorithm, otherwise overwritten in-place if of the correct size.\n  - `assignments` (`Array`): An array of assignments. 
If `initialize` is false, used as initialization for the algorithm, otherwise overwritten.\n  - `counts` (`Array`): An output array used to avoid extra allocation. Values are discarded and overwritten.\n\nReturns an object containing information about the centroids and point assignments. Values are:\n- `converged`: `true` if the algorithm converged successfully\n- `centroids`: a list of centroids\n- `counts`: the number of points assigned to each respective centroid\n- `assignments`: a list of integer assignments of each point to the respective centroid\n- `iterations`: number of iterations used\n\n# Credits\n\n* [Jared Harkins](https://github.com/hDeraj) improved the performance by\n  reducing the number of function calls and reverting to Manhattan distance\n  for measurements, and improved the random initialization by choosing from\n  the data points\n\n* [Ricky Reusser](https://github.com/rreusser) refactored the API\n\n# Further reading\n\n* [Wikipedia: k-means clustering](https://en.wikipedia.org/wiki/K-means_clustering)\n* [Wikipedia: Determining the number of clusters in a data set](https://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set)\n* [k-means++: The advantages of careful seeding, Arthur and Vassilvitskii](http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf)\n* [k-means++: The advantages of careful seeding, Arthur and Vassilvitskii (presentation slides)](http://theory.stanford.edu/~sergei/slides/BATS-Means.pdf)\n\n# License\n\n\u0026copy; 2017-2019. MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmtt%2Fkmpp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcmtt%2Fkmpp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcmtt%2Fkmpp/lists"}