{"id":13423709,"url":"https://github.com/gagolews/genie","last_synced_at":"2025-07-14T12:33:37.248Z","repository":{"id":151189610,"uuid":"51749979","full_name":"gagolews/genie","owner":"gagolews","description":"Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)","archived":false,"fork":false,"pushed_at":"2022-08-16T01:38:12.000Z","size":419,"stargazers_count":22,"open_issues_count":0,"forks_count":3,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-26T23:12:22.258Z","etag":null,"topics":["cluster","cluster-analysis","clustering","data-analysis","data-mining","data-science","datascience","genie","hierarchical-clustering-algorithm","machine-learning","machine-learning-algorithms","outliers","r"],"latest_commit_sha":null,"homepage":"http://genieclust.gagolewski.com/","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gagolews.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-15T11:04:19.000Z","updated_at":"2024-10-22T11:37:45.000Z","dependencies_parsed_at":null,"dependency_job_id":"85ee7896-5114-45a0-8b60-a811504c52d0","html_url":"https://github.com/gagolews/genie","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fgenie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fgenie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fgeni
e/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gagolews%2Fgenie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gagolews","download_url":"https://codeload.github.com/gagolews/genie/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225976753,"owners_count":17554271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster","cluster-analysis","clustering","data-analysis","data-mining","data-science","datascience","genie","hierarchical-clustering-algorithm","machine-learning","machine-learning-algorithms","outliers","r"],"created_at":"2024-07-31T00:00:41.020Z","updated_at":"2024-11-22T23:18:36.993Z","avatar_url":"https://github.com/gagolews.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"# Genie (R Package)\n\n\u003e This project has been superseded by\n[genieclust](https://genieclust.gagolewski.com)\n(see also: [GitHub](https://github.com/gagolews/genieclust/)),\nwhich features a faster and more feature-rich implementation\nof Genie (available for both R and Python).\n\n\n## A Fast and Robust Hierarchical Clustering Algorithm\n\nThe time needed to apply a hierarchical clustering algorithm\nis most often dominated by the number of computations of a pairwise\ndissimilarity measure. Such a constraint, for larger data sets,\nputs the use of all the classical linkage criteria at a disadvantage,\nwith the exception of the single linkage one. 
However, it is known that the single\nlinkage clustering algorithm is very sensitive to outliers and produces highly\nskewed dendrograms, so it usually does not reflect the true\nunderlying structure of the analysed data unless the clusters are well separated.\nTo overcome its limitations, we proposed a hierarchical clustering linkage\ncriterion called *Genie*. Namely, our algorithm links two clusters in such\na way that the Gini measure of inequity of the cluster sizes\ndoes not exceed a given threshold.\nThis method most often outperforms the Ward or average linkage in terms of\nthe clustering quality on benchmark data. At the same time,\nGenie retains the high speed of the single linkage approach,\nso it is also suitable for analysing larger data sets.\nThe algorithm is easily parallelisable and thus may be run\non multiple threads to speed up its execution further.\nIts memory overhead is small: there is no need to precompute the complete\ndistance matrix to obtain a desired\nclustering.\n\nA detailed description of the algorithm can be found in:\n\nGagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant\nhierarchical clustering algorithm, *Information Sciences* **363**, 2016, 8–23.\n[doi:10.1016/j.ins.2016.05.003](https://dx.doi.org/10.1016/j.ins.2016.05.003).\n\nSee also:\n\nGagolewski M., genieclust: Fast and robust hierarchical clustering,\n*SoftwareX* **15**, 2021, 100722.\n[doi:10.1016/j.softx.2021.100722](https://dx.doi.org/10.1016/j.softx.2021.100722).\n\n\n**Authors**: [Marek Gagolewski](https://www.gagolewski.com/),\n[Maciej Bartoszuk](http://bartoszuk.rexamine.com), and\n[Anna Cena](http://cena.rexamine.com)\n\n**CRAN entry**: \u003chttps://cran.r-project.org/web/packages/genie/\u003e\n\n**genieclust**: 
\u003chttps://genieclust.gagolewski.com/\u003e,\n\u003chttps://github.com/gagolews/genieclust/\u003e,\n\u003chttps://cran.r-project.org/web/packages/genieclust/\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fgenie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgagolews%2Fgenie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgagolews%2Fgenie/lists"}