{"id":25369960,"url":"https://github.com/tschechlovdev/effens","last_synced_at":"2025-04-09T07:18:25.065Z","repository":{"id":225118434,"uuid":"693569428","full_name":"tschechlovdev/EffEns","owner":"tschechlovdev","description":null,"archived":false,"fork":false,"pushed_at":"2024-05-13T11:46:05.000Z","size":1243,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-15T01:38:21.922Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tschechlovdev.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-19T09:34:39.000Z","updated_at":"2024-05-13T11:46:08.000Z","dependencies_parsed_at":"2024-05-13T13:03:15.135Z","dependency_job_id":null,"html_url":"https://github.com/tschechlovdev/EffEns","commit_stats":null,"previous_names":["tschechlovdev/effens"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tschechlovdev%2FEffEns","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tschechlovdev%2FEffEns/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tschechlovdev%2FEffEns/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tschechlovdev%2FEffEns/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tschechlovdev","download_url":"https://codeload.github.com/tschechlovdev/EffEns/tar.gz/refs/heads/main",
"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247994135,"owners_count":21030051,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-15T01:38:24.211Z","updated_at":"2025-04-09T07:18:25.030Z","avatar_url":"https://github.com/tschechlovdev.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EffEns: Efficient Ensemble Clustering based on Meta-Learning and Hyperparameter Optimization \n\nPrototypical Implementation in Python of the submitted Paper \"Ensemble  Clustering based on Meta-Learning and Hyperparameter Optimization\" at VLDB 2024.\nIn the following, we provide an overview of the code structure, an installation instruction, and an example on how to use EffEns.\n\n## Overview\n\nThe main code is in the \"src\" folder. 
It contains the following modules:\n\n- ``automlclustering``: Contains the adapted code from [AutoML4Clust](https://github.com/tschechlovdev/Automl4Clust) and [ML2DAC](https://github.com/tschechlovdev/ml2dac/tree/main), which provide \n    implementations of AutoML systems for clustering and of different meta-feature sets.\n- ``consensus_functions``: Contains implementations of the five consensus functions \"ABV\", \"ACV\", \"MCLA\", \"MM\", and \"QMI\", which we used in our paper.\n- ``ConsensusCS``: Provides the consensus functions and their hyperparameters as a configuration space for the optimizer.\n- ``EffEnsMKR``: Contains a script that stores the path to the Meta-Knowledge Repository (MKR) and the filenames of the \"evaluated ensembles\" and the meta-features.\n- ``EnsMetaLearning``: Contains all functionality for our meta-learning procedure, in particular the learning phase, which evaluates different ensemble subsets and extracts the meta-features.\n    It also contains ``EffEns``, which can be applied to new datasets.\n- ``EnsOptimizer``: Contains the optimizer that we use for hyperparameter optimization of the consensus functions. \n  We use SMAC as the optimizer and provide a wrapper class as well as the black-box function for optimization.\n- ``Experiments``: Contains the code for the experiments on the synthetic and real-world datasets of our paper (cf. Section 7).\n- ``Utils``: Contains utility code such as functions to process the optimizer results or to clean up temporary directories.\n\nNote that the directory ``real_world_datasets`` contains the datasets that we used in our experiments.\nFurther, ``evaluation_results`` contains all results from our experimental evaluation. 
\n\n## Installation\n\nOur implementation is based on Python and requires Python 3.9.\nFurthermore, as SMAC only runs on Linux, we also require a Linux system.\nWe have tested on Ubuntu 20.04.\n\nBefore installing EffEns, first install the following packages, which some of the libraries require:\n- ``sudo apt-get install build-essential``\n- ``sudo apt-get install gcc``\n\nThe easiest way to install EffEns is to use Anaconda. Follow https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html\nto install Anaconda.\nThen create the prepared Python 3.9 environment:\n- ``conda env create -f environment.yml``\n\nThis should create a conda environment with the name \"automated_ensemble_clustering\".\nNext, install ib_base, as it is not available as a package:\n\n```\ngit clone https://collaborating.tuhh.de/cip3725/ib_base.git\ncd ib_base\npython setup.py install\ncd ..\n```\n\nAfter finishing this, you have to add the \"src\" folder of EffEns and the path to \"ib_base\" to your PYTHONPATH.\nYou may also have to add them to your conda path:\n``gedit  ~/anaconda3/envs/automated_ensemble_clustering/lib/python3.9/site-packages/conda.pth``\n\nNow everything should be set up and you can try to run ``python src/Experiments/SyntehticData/EffEns_Experiment_synthetic.py``.\nThis should run without any errors.\n\n## Examples\n\nIn the following, we provide a simple example of how to use EffEns on new, unseen datasets with the provided Meta-Knowledge Repository:\n\n```Python\nfrom sklearn.datasets import make_blobs\nfrom EnsMetaLearning.EffEns import EffEns\nfrom automlclustering.ClusterValidityIndices import CVIHandler\nfrom Utils.Utils import process_result_to_dataframe\n\n# Generate simple synthetic data\nX, y = make_blobs()\n\n# Instantiate EffEns. 
# Use provided path to MKR.\neffens = EffEns(path_to_mkr=\"./EffEnsMKR/\")\n\n# Choose CVI to evaluate results\ncvi = CVIHandler.CVICollection.CALINSKI_HARABASZ\n\n# Apply EffEns on Data X\nresult, _ = effens.apply_ensemble_clustering(X, cvi=cvi, n_loops=5)\n\n# Parse Result\nresult = process_result_to_dataframe(result, {\"cvi\": cvi.get_abbrev()},\n                                     # compare against ground-truth clustering\n                                     ground_truth_clustering=y\n                                     )\n\nprint(result[[\"iteration\", \"config\", \"CVI score\", \"Best NMI\"]])\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftschechlovdev%2Feffens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftschechlovdev%2Feffens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftschechlovdev%2Feffens/lists"}