{"id":21432501,"url":"https://github.com/gagolews/clustering-data-v1","last_synced_at":"2025-04-19T16:45:32.412Z","repository":{"id":59521669,"uuid":"525659556","full_name":"gagolews/clustering-data-v1","owner":"gagolews","description":"A framework for benchmarking clustering algorithms – Benchmark suite, version 1","archived":false,"fork":false,"pushed_at":"2023-10-22T03:24:06.000Z","size":181894,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-13T01:46:23.373Z","etag":null,"topics":["benchmark","benchmark-datasets","clustering","data","dataset","datasets","machine-learning"],"latest_commit_sha":null,"homepage":"https://clustering-benchmarks.gagolewski.com/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gagolews.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-08-17T06:00:06.000Z","updated_at":"2025-03-10T12:46:44.000Z","dependencies_parsed_at":"2022-09-18T08:52:15.184Z","dependency_job_id":"3dd368b1-77ea-4b93-8979-06693caad5eb","html_url":"https://github.com/gagolews/clustering-data-v1","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fclustering-data-v1","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fclustering-data-v1/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fclustering-data-v1/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fclustering-data-v1/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gagolews","download_url":"https://codeload.github.com/gagolews/clustering-data-v1/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249741035,"owners_count":21318725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","benchmark-datasets","clustering","data","dataset","datasets","machine-learning"],"created_at":"2024-11-22T23:18:47.769Z","updated_at":"2025-04-19T16:45:32.394Z","avatar_url":"https://github.com/gagolews.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [A Framework for Benchmarking Clustering Algorithms](https://clustering-benchmarks.gagolewski.com)\n## Benchmark Suite Version 1 (with updates)\n\nThe aim of this project is to **aggregate, polish, and standardise the\nexisting clustering benchmark batteries** referred to across the machine\nlearning and data mining literature, and to introduce **new datasets**\nof different dimensionalities, sizes, and cluster types.\n\nThis repository is part of the\n[**Framework for Benchmarking Clustering Algorithms**](https://clustering-benchmarks.gagolewski.com).\nIt hosts the datasets from version 1 of the benchmark suite.\n\nRefer to \u003chttps://clustering-benchmarks.gagolewski.com\u003e\nfor a detailed description, file format specification,\nexample Python/R/MATLAB code, datasets explorer,\nand literature references.\n\n\n\n**Editor/Maintainer**:\n[Marek Gagolewski](https://www.gagolewski.com).\n\n\n**How to Cite**: Please cite the following paper which describes\nthe overall benchmarking methodology:\n\n\u003e Gagolewski M., A framework for benchmarking clustering algorithms,\n*SoftwareX* **20**, 2022, 101270, \u003chttps://clustering-benchmarks.gagolewski.com\u003e,\nDOI: [10.1016/j.softx.2022.101270](https://doi.org/10.1016/j.softx.2022.101270).\n\nAdditionally, mention the exact version of this benchmark suite\n(see *Changelog* below for version information):\n\n\u003e Gagolewski M. et al. (Eds.), *A benchmark suite for clustering algorithms:\nVersion 1.1.0*, 2022,\n\u003chttps://github.com/gagolews/clustering-data-v1/releases/tag/v1.1.0\u003e,\nDOI: [10.5281/zenodo.7088171](https://doi.org/10.5281/zenodo.7088171).\n\n\nThe datasets are provided **solely for research purposes**,\nunless stated otherwise. Please cite the literature references mentioned\nin the corresponding dataset description files in any publications\nthat make use of these.\n\n\n\n\n## Changelog\n\nThe datasets and the reference labels included in this suite\nare versioned. This ensures reproducibility.\n\nSee \u003chttps://github.com/gagolews/clustering-data-v1/releases/\u003e for\ndownloadable snapshots.\n\n\n###  1.1.0 (2022-09-17)\n\n-   Each battery is now equipped with a README.txt file.\n\n-   New label vectors:\n    wut/x2.labels1,\n    wut/x3.labels1.\n\n-   Prettified (slightly) label vectors:\n    graves/fuzzyx.labels[1-4],\n    graves/parabolic.labels1.\n\n-   Deleted now redundant label vectors:\n    graves/fuzzyx.labels5.\n\n-   The historical snapshot of this release is available at\n    DOI: [10.5281/zenodo.7088171](https://doi.org/10.5281/zenodo.7088171).\n\n\n###  1.0.1 (2022-09-10)\n\n-   Updated dataset description files, e.g., fixed broken links.\n\n-   The code and the data repositories were separated; see\n    \u003chttps://github.com/gagolews/clustering-benchmarks\u003e and\n    \u003chttps://github.com/gagolews/clustering-data-v1\u003e.\n\n-   The project's homepage has been created. It is available at\n    \u003chttps://clustering-benchmarks.gagolewski.com\u003e.\n\n-   The historical snapshot of this release is available at\n    DOI: [10.5281/zenodo.7066690](https://doi.org/10.5281/zenodo.7066690).\n\n\n###  1.0.0 (2020-05-08)\n\n-   Datasets in the 1st (v1.0.0) version of the benchmark\n    battery are now frozen.\n\n-   The historical snapshot of this release is available at\n    DOI: [10.5281/zenodo.3815066](https://doi.org/10.5281/zenodo.3815066).\n\n\n###  0.0.0 (2015-12-29)\n\n-   Version 0 of the benchmark suite consists of the datasets\n    studied in: Gagolewski M., Bartoszuk M., Cena A.,\n    Genie: A new, fast, and outlier-resistant hierarchical\n    clustering algorithm, *Information Sciences* **363**, 2016, pp. 8–23,\n    DOI: [10.1016/j.ins.2016.05.003](https://doi.org/10.1016/j.ins.2016.05.003).\n    The datasets have been archived at\n    \u003chttps://github.com/gagolews/clustering-data-v0\u003e.\n\n\n## See Also\n\n\u003chttps://clustering-benchmarks.gagolewski.com\u003e gives a detailed description\nof the whole framework for benchmarking clustering algorithms.\n\nIt also mentions where to find raw and aggregated results generated\nby many clustering methods when run on the datasets from this repository.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fclustering-data-v1","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgagolews%2Fclustering-data-v1","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fclustering-data-v1/lists"}