{"id":15517060,"url":"https://github.com/deric/clustering-benchmark","last_synced_at":"2025-04-24T00:23:37.203Z","repository":{"id":33505956,"uuid":"37151921","full_name":"deric/clustering-benchmark","owner":"deric","description":null,"archived":false,"fork":false,"pushed_at":"2019-06-17T13:32:31.000Z","size":24439,"stargazers_count":167,"open_issues_count":0,"forks_count":119,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-30T05:11:21.767Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deric.png","metadata":{"files":{"readme":"README-old.asc","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-09T18:57:12.000Z","updated_at":"2025-02-12T19:41:43.000Z","dependencies_parsed_at":"2022-08-24T14:23:27.864Z","dependency_job_id":null,"html_url":"https://github.com/deric/clustering-benchmark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fclustering-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fclustering-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fclustering-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deric%2Fclustering-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deric","download_url":"https://codeload.github.com/deric/clustering-benchmark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_
count":250536446,"owners_count":21446734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-02T10:11:04.439Z","updated_at":"2025-04-24T00:23:37.158Z","avatar_url":"https://github.com/deric.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Clustering datasets\n\n## Datasets\n\nThis project contains a collection of labeled clustering problems that can be found in the literature. Most of the datasets were artificially created.\n\nAll datasets can be found in the link:https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial[data folder].\n\n### 2d-10c\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 2990  |  10 |  2\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/2d-10c.png[\"2d-10c\",400,float=\"left\"]\n\n[.float .right]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/2d-10c.arff[ARFF]\n* link:https://github.com/deric/handl-data-generators[generator]\n\n\u003e J. Handl and J. Knowles, “Multiobjective clustering with automatic\n\u003e determination of the number of clusters,” UMIST, Tech. 
Rep., 2004.\n\n### atom\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 800         |        2 |  3\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/atom.png[\"atom\",400,float=\"left\"]\n\n[.float .right]\n* source: link:https://www.uni-marburg.de/fb12/datenbionik/data?language_sync=1[FCPS]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/atom.arff[ARFF]\n\n### aggregation\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 788  |  7 |  2\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/aggregation.png[aggregation,400,float=\"left\"]\n\n[.float .right]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/aggregation.arff[ARFF]\n* link:http://cs.joensuu.fi/sipu/datasets/[original source]\n\n\u003e Gionis, A., H. Mannila, and P. Tsaparas, Clustering aggregation.\n\u003e ACM Transactions on Knowledge Discovery from Data (TKDD), 2007. 1(1): p. 1-30.\n\n### chainlink\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 1000        |        2 |  3\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/chainlink.png[\"chainlink\",400,float=\"left\"]\n\n[.float .right]\n* source: link:https://www.uni-marburg.de/fb12/datenbionik/data?language_sync=1[FCPS]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/chainlink.arff[ARFF]\n\n\u003e Alfred Ultsch, Clustering with SOM: U*C,\n\u003e in Proc. 
Workshop on Self-Organizing Feature Maps, pp. 31-37, Paris, 2005.\n\n### D31\n\n[align=\"right\",style=\"asciidoc\",options=\"noborders,wide\"]\n|===\n| data points |  3100\n| clusters    | 31\n| dimensions  | 2\n| image::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/D31.png[\"D31\",400,float=\"left\"] | * link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/D31.arff[ARFF]\n|===\n\n\u003e Veenman, C.J., M.J.T. Reinders, and E. Backer,\n\u003e A maximum variance cluster algorithm. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002. 24(9): p. 1273-1280.\n\n### 3MC\n\n[align=\"right\",options=\"header\",style=\"literal\"]\n|===\n| data points | clusters | dimension\n| 400         |        3 |  2\n|===\n\n[.float .right]\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/3MC.png[\"3MC\",400,float=\"left\"]\n\n\n### DS577\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 577        |        3 |  2\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/DS577.png[\"DS577\",400,float=\"left\"]\n\n[.float .right]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/DS577.arff[ARFF]\n\n\u003e M. C. Su, C. H. Chou, and C. C. Hsieh, “Fuzzy C-Means Algorithm with a Point Symmetry Distance,”\n\u003e International Journal of Fuzzy Systems, vol. 7, no. 4, pp. 175-181, 2005.\n\n\n### cluto-t4_8k\n\n[align=\"right\",options=\"header\"]\n|===\n| data points | clusters | dimension\n| 8000        |        7 |  2\n|===\n\nimage::https://github.com/deric/clustering-benchmark/blob/images/fig/artificial/cluto-t4_8k.png[\"cluto-t4_8k\",400,float=\"left\"]\n\n[.float .right]\n* link:https://github.com/deric/clustering-benchmark/blob/master/src/main/resources/datasets/artificial/cluto-t4.8k.arff[ARFF]\n\n\u003e G. 
Karypis, “CLUTO - A Clustering Toolkit,”\n\u003e Dept. of Computer Science, University of Minnesota, Tech. Rep. 02-017, 2002, available at\n\u003e http://www.cs.umn.edu/~cluto.\n\n\n## Experiments\n\nThis project contains a set of benchmarks of clustering methods on various datasets. It depends on the link:https://github.com/deric/clueminer[Clueminer project].\n\nIn order to run the benchmarks, compile the project and its dependencies into a single JAR file:\n\n    mvn assembly:assembly\n\n### Consensus experiment\n\nThe consensus experiment performs repeated runs of the same algorithm:\n\n```\n./run consensus --dataset \"triangle1\" --repeat 10\n```\n\nBy default, the k-means algorithm is used.\n\nFor the available datasets, see the link:https://github.com/deric/clustering-benchmark/tree/master/src/main/resources/datasets/artificial[resources folder].\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderic%2Fclustering-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fderic%2Fclustering-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderic%2Fclustering-benchmark/lists"}