{"id":13413348,"url":"https://github.com/e-XpertSolutions/go-cluster","last_synced_at":"2025-03-14T19:31:59.998Z","repository":{"id":57496885,"uuid":"105765300","full_name":"e-XpertSolutions/go-cluster","owner":"e-XpertSolutions","description":"k-modes and k-prototypes clustering algorithms implementation in Go","archived":false,"fork":false,"pushed_at":"2022-11-29T13:37:05.000Z","size":3749,"stargazers_count":41,"open_issues_count":0,"forks_count":9,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-07-31T20:52:15.368Z","etag":null,"topics":["algorithm","cluster","clustering","clustering-algorithm","go","golang","k-modes","k-prototypes","machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/e-XpertSolutions.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-04T12:24:52.000Z","updated_at":"2024-06-05T02:20:57.000Z","dependencies_parsed_at":"2023-01-22T01:57:49.794Z","dependency_job_id":null,"html_url":"https://github.com/e-XpertSolutions/go-cluster","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e-XpertSolutions%2Fgo-cluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e-XpertSolutions%2Fgo-cluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e-XpertSolutions%2Fgo-cluster/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/e-XpertSolutions%2Fgo-cluster/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/e-XpertSolutions","download_url":"https://codeload.github.com/e-XpertSolutions/go-cluster/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221498737,"owners_count":16833055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algorithm","cluster","clustering","clustering-algorithm","go","golang","k-modes","k-prototypes","machine-learning"],"created_at":"2024-07-30T20:01:38.337Z","updated_at":"2024-10-26T05:30:44.854Z","avatar_url":"https://github.com/e-XpertSolutions.png","language":"Go","funding_links":[],"categories":["Machine Learning","Go","机器学习","Repositories","\u003cspan id=\"机器学习-machine-learning\"\u003e机器学习 Machine Learning\u003c/span\u003e","Relational Databases"],"sub_categories":["Advanced Console UIs","检索及分析资料库","SQL 查询语句构建库","Search and Analytic Databases","\u003cspan id=\"高级控制台用户界面-advanced-console-uis\"\u003e高级控制台用户界面 Advanced Console UIs\u003c/span\u003e","交流"],"readme":"# 
go-cluster\n\n[![GoDoc](https://godoc.org/github.com/e-XpertSolutions/go-cluster/cluster?status.png)](http://godoc.org/github.com/e-XpertSolutions/go-cluster/cluster)\n[![License](https://img.shields.io/badge/license-BSD%203--Clause-yellow.svg?style=flat)](https://github.com/e-XpertSolutions/go-cluster/blob/master/LICENSE)\n[![GoReport](https://goreportcard.com/badge/github.com/e-XpertSolutions/go-cluster)](https://goreportcard.com/report/github.com/e-XpertSolutions/go-cluster)\n[![Travis](https://travis-ci.org/e-XpertSolutions/go-cluster.svg?branch=master)](https://travis-ci.org/e-XpertSolutions/go-cluster)\n[![cover.run go](https://cover.run/go/github.com/e-XpertSolutions/go-cluster/cluster.svg)](https://cover.run/go/github.com/e-XpertSolutions/go-cluster/cluster)\n\nGo implementation of clustering algorithms: k-modes and k-prototypes.\n\nThe k-modes algorithm is very similar to the well-known k-means clustering algorithm. The difference is how the distance is computed. In k-means, the Euclidean distance between two vectors is most commonly used. While it works well for numerical, continuous data, it is not suitable for categorical data, as there is no meaningful numeric distance between values like ‘Europe’ and ‘Africa’. This is why k-modes uses the Hamming distance between vectors: it counts the positions at which two vectors differ. It is a good alternative to one-hot encoding when a feature has a large number of categories. K-prototypes is used to cluster mixed data (both categorical and numerical).\n\nThe implementation is based on the papers [HUANG97](#references), [HUANG98](#references) and [CAO09](#references), and is partially inspired by a Python implementation of the same algorithms: [KMODES](#references).\n\n## Installation\n\n```\ngo get github.com/e-XpertSolutions/go-cluster/v2\n```\n\n## Usage\n\nThis is a basic configuration and usage example for the KModes and KPrototypes algorithms. 
For more information, please refer to the documentation.\n\n```go\npackage main\n\nimport (\n    \"fmt\"\n    \"github.com/e-XpertSolutions/go-cluster/cluster\"\n)\n\nfunc main() {\n\n    //categorical input data must first be dictionary-encoded as numbers - for example, the values\n    //\"blue\", \"red\", \"green\" can be encoded as 1,2,3\n\n    //lineNumber, columnNumber, rawData and their new* counterparts are placeholders\n    //that must be defined by the caller\n    data := cluster.NewDenseMatrix(lineNumber, columnNumber, rawData)\n    newData := cluster.NewDenseMatrix(newLineNumber, newColumnNumber, newRawData)\n\n\n    //input parameters for the algorithm\n\n    //distance and initialization functions may be chosen from the package, or one may use \n    //custom functions with the proper arguments\n    distanceFunction := cluster.WeightedHammingDistance\n    initializationFunction := cluster.InitCao\n\n    //number of clusters and maximum number of iterations \n    clustersNumber := 5\n    maxIteration := 20\n\n    //weight vector - sets the importance of the features; a bigger number means a greater \n    //contribution to the cost function\n    //the vector must have the same length as the number of features in the dataset\n    //it is optional: if 'nil', all features are treated equally (weight = 1)  \n    weights := []float64{1,1,2}\n    wvec := [][]float64{weights}\n\n    //path to the file where the model is saved to or loaded from using LoadModel(), SaveModel()\n    //if there is no need to load or save the model, it can be set to an empty string\n    path := \"km.txt\"\n\n    //KModes algorithm\n    //initialization\n    km := cluster.NewKModes(distanceFunction, initializationFunction, clustersNumber, 1, \n    maxIteration, wvec, path)\n\n\n    //training\n    //after training, the cluster center vectors and computed labels can be accessed\n    //using km.ClusterCentroids and km.Labels\n    err := km.FitModel(data)\n    if err != nil {\n        fmt.Println(err)\n    }\n\n    //predicting labels for new data\n    newLabels, err := km.Predict(newData)\n    if err != nil {\n        
fmt.Println(err)\n    }\n    //print the predicted labels (Go rejects unused variables)\n    fmt.Println(newLabels)\n\n\n    //KPrototypes algorithm\n    //it needs two more parameters than k-modes:\n    //categorical - vector with numbers indicating columns with categorical features\n    //gamma - float number, importance of cost contribution for numerical values\n    categorical := []int{1} // means that only column number one contains categorical data\n    gamma := 0.2 //cost from distance function for numerical data will be multiplied by 0.2\n\n    //initialization\n    kp := cluster.NewKPrototypes(distanceFunction, initializationFunction, categorical, \n    clustersNumber, 1, maxIteration, wvec, gamma, \"km.txt\")\n\n    //training\n    //err is already declared above, so plain assignment is used here\n    err = kp.FitModel(data)\n    if err != nil {\n        fmt.Println(err)\n    }\n\n    //predicting labels for new data\n    newLabelsP, err := kp.Predict(newData)\n    if err != nil {\n        fmt.Println(err)\n    }\n    fmt.Println(newLabelsP)\n}\n\n```\n\n\n## Contributing\n\nContributions are greatly appreciated. The project follows the typical\n[GitHub pull request model](https://help.github.com/articles/using-pull-requests/)\nfor contribution.\n\n\n## License\n\nThe sources are released under a BSD 3-Clause License. The full terms of that\nlicense can be found in the `LICENSE` file of this repository.\n\n## References\n[HUANG97] Huang, Z.: Clustering large data sets with mixed numeric and\n   categorical values, Proceedings of the First Pacific Asia Knowledge\n   Discovery and Data Mining Conference, Singapore, pp. 21-34, 1997.\n\n[HUANG98] Huang, Z.: Extensions to the k-modes algorithm for clustering\n   large data sets with categorical values, Data Mining and Knowledge\n   Discovery 2(3), pp. 283-304, 1998.\n\n[CAO09] Cao, F., Liang, J., Bai, L.: A new initialization method for\n   categorical data clustering, Expert Systems with Applications 36(7),\n   pp. 
10223-10228, 2009.\n\n[KMODES] Python implementation of k-modes: https://github.com/nicodv/kmodes","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe-XpertSolutions%2Fgo-cluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fe-XpertSolutions%2Fgo-cluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fe-XpertSolutions%2Fgo-cluster/lists"}