{"id":47325386,"url":"https://github.com/isg-siegen/assembled","last_synced_at":"2026-03-17T19:01:39.982Z","repository":{"id":37425049,"uuid":"471025423","full_name":"ISG-Siegen/assembled","owner":"ISG-Siegen","description":"A framework to find better ensembles for (Automated) Machine Learning ","archived":false,"fork":false,"pushed_at":"2023-07-04T07:23:21.000Z","size":3690,"stargazers_count":5,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-01-29T06:17:35.623Z","etag":null,"topics":["automl","benchmark","ensemble","evaluation","openml"],"latest_commit_sha":null,"homepage":"https://isg-siegen.github.io/assembled","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ISG-Siegen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-03-17T14:35:31.000Z","updated_at":"2024-01-27T09:03:08.000Z","dependencies_parsed_at":"2023-11-09T13:00:56.190Z","dependency_job_id":"dfa82a0c-345f-4f74-ad9e-7bd38ee2f2dc","html_url":"https://github.com/ISG-Siegen/assembled","commit_stats":{"total_commits":71,"total_committers":1,"mean_commits":71.0,"dds":0.0,"last_synced_commit":"42254621693aed38963c522c3006482cb7c20f00"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/ISG-Siegen/assembled","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ISG-Siegen%2Fassembled","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ISG-Siegen%2Fassembled/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ISG-Siegen%2Fassembled/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ISG-Siegen%2Fassembled/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ISG-Siegen","download_url":"https://codeload.github.com/ISG-Siegen/assembled/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ISG-Siegen%2Fassembled/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30628727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T17:32:55.572Z","status":"ssl_error","status_checked_at":"2026-03-17T17:32:38.732Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","benchmark","ensemble","evaluation","openml"],"created_at":"2026-03-17T19:01:36.152Z","updated_at":"2026-03-17T19:01:39.975Z","avatar_url":"https://github.com/ISG-Siegen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# assembled\n\nAssembled is planned to be a framework for ensemble evaluation. It shall run, benchmark, and evaluate ensemble techniques\nwithout the overhead of training base models.\n\nCurrently, its main features are:\n\n* **Metatasks**: A metatask is a meta-dataset and a class interface. Metatasks contain the predictions and\n  confidences (e.g. 
sklearn's predict_proba) of specific base models and the data of an original (OpenML) task.\n  Moreover, its class interface contains several useful methods to simplify the evaluation and benchmarking of ensemble\n  techniques. A collection of metatasks can be used to benchmark ensemble techniques without the computational overhead\n  of training and evaluating base models.\n* **Assembled-OpenML**: an extension of Assembled to build metatasks with data from OpenML. Only an OpenML Task ID must\n  be passed to the code to generate a metatask for a specific OpenML task. Technically, any ID can be passed. In\n  practice, only supervised classification tasks are supported so far. Moreover, this tool was built for and tested\n  against curated benchmarks (like tasks in OpenML-CC18). Other classification tasks should be supported as well, but\n  bugs might be more likely.\n* **FakedClassifiers**: Code to simulate the behavior of a base model by passing appropriate data to it during the\n  initialization. This allows us to evaluate most ensemble techniques without code changes to the original implementation's\n  code.\n* **Supporting Ensemble Techniques**: We created code to make ensemble techniques usable with (pre-fitted) base models.\n  This is not part of Assembled itself but rather an additional example of how to use ensemble techniques with Assembled.\n  Some implementations support (pre-fitted) base models by default; others do not. 
See `/ensemble_techniques/` for more details.\n\nCurrently, its main use-cases are:\n\n* Ensembles After AutoML (Post-Processing)\n\nThis repository/branch also contains the Assembled-OpenML extension.\n\n## Publicly Available Data for Assembled\n\nThe following projects collected data for Assembled and share it publicly:\n\n* Metatasks containing the data for base models produced by executing [AutoGluon](https://auto.gluon.ai/) on the 71\n  classification datasets from the AutoML benchmark: [Code](https://doi.org/10.6084/m9.figshare.23609226)\n  and [Data](https://doi.org/10.6084/m9.figshare.23609361)\n* Metatasks containing the data for base models produced by\n  executing [Auto-Sklearn 1](https://automl.github.io/auto-sklearn) on the 71 classification datasets from the AutoML\n  benchmark: [Code](https://doi.org/10.6084/m9.figshare.23613624)\n  and [Data](https://doi.org/10.6084/m9.figshare.23613627)\n\n## Assembled-OpenML\n\n_For the original code of the workshop paper on Assembled-OpenML, see the `automl_workshop_paper` branch_\n\nAssembled-OpenML builds Metatasks from OpenML. In this first version of Assembled-OpenML, the predictions correspond to\nthe top-n best runs (configurations) of an OpenML task. It shall simulate the use case of post-processing an AutoML\ntool's top-n set of configurations.\n\nAssembled-OpenML enables the user to quickly generate a benchmark set by entering a list of OpenML Task IDs as input\n(see our code examples). In general, Assembled-OpenML is an affordable/efficient alternative to creating benchmarks by\nhand. 
It is affordable/efficient because you do not need to train and evaluate the base models but can directly\nevaluate ensemble techniques.\n\n## Installation\n\nTo install Assembled and Assembled-OpenML, use:\n\n```bash\npip install assembled[openml]\n```\n\nIf you only want to use Assembled, omit `[openml]`.\n\nTo install the newest version (from the main branch), use:\n\n```bash\npip install git+https://github.com/ISG-Siegen/assembled.git#egg=assembled[openml]\n```\n\n### Other Installations\n\nFor experiments, work-in-progress code, or other non-packaged code stored in this repository, we\nprovide `requirements.txt` files. These can be used to re-create the environments needed for the code.\n\nAn example workflow for the installation on Linux is:\n\n```bash\ngit clone https://github.com/ISG-Siegen/assembled.git\ncd assembled\npython3 -m venv venv_assembled\nsource venv_assembled/bin/activate\npip install -r requirements.txt\n```\n\nPlease be aware that any relevant-enough subdirectory keeps track of its own requirements through a `requirements.txt`.\nHence, if you want to use only parts of this project, it might be a better idea to only install the requirements of the\ncode that you want to use.\n\n## Usage\n\nFor example usage of Assembled-OpenML, see the `./examples/` directory, which contains code examples and more details.\n\nA simple example of using Assembled-OpenML to get a Metatask and using Assembled to evaluate an ensemble technique on\nthe Metatask is:\n\n```python\nfrom assembledopenml.openml_assembler import OpenMLAssembler\nfrom assembled.ensemble_evaluation import evaluate_ensemble_on_metatask\n\n# Import an adapted version of auto-sklearn's Ensemble Selection\n# (requires the ensemble_techniques directory to be in your local directory)\nfrom ensemble_techniques.autosklearn.ensemble_selection import EnsembleSelection\nfrom ensemble_techniques.util.metrics import OpenMLAUROC\n\n# -- Use Assembled-OpenML to build a metatask for the OpenML task with 
ID 3\nomla = OpenMLAssembler(nr_base_models=50, openml_metric_name=\"area_under_roc_curve\")\nmt = omla.run(openml_task_id=3)\n\n# -- Benchmark the ensemble technique on the metatask\ntechnique_run_args = {\"ensemble_size\": 50, \"metric\": OpenMLAUROC}\nfold_scores = evaluate_ensemble_on_metatask(mt, EnsembleSelection, technique_run_args, \"autosklearn.EnsembleSelection\",\n                                            pre_fit_base_models=True,\n                                            meta_train_test_split_fraction=0.5,\n                                            meta_train_test_split_random_state=0,\n                                            return_scores=OpenMLAUROC)\nprint(fold_scores)\nprint(\"Average Performance:\", sum(fold_scores) / len(fold_scores))\n```\n\n## Limitations\n\n* **Regression is not supported** so far, as OpenML does not have enough data (runs) on regression tasks. Supporting it would require some\n  additional implementation work.\n* Assembled-OpenML ignores OpenML repetitions (most runs/datasets do not provide repetitions).\n* The file format for the predictions file is not fully standardized in OpenML and hence requires manual adjustment for\n  each format in use. Hopefully, we found most of the relevant formats with Assembled-OpenML.\n* Some files that store predictions seem to have malicious or corrupted predictions/confidence values. If we cannot\n  fix such a case, we store these bad predictors in the Metatask object to be manually validated later on. Moreover,\n  these predictors can be filtered from the Metatask if one wants to (we do this for every example or experiment).\n\n## A Comment on Validation Data\n\nBy default, and by design, Metatasks created only from OpenML data do not have inner-fold validation data. To train an\nensemble technique on metatasks created only from OpenML data, we split a fold's predictions on the fold's test data\ninto ensemble_train and ensemble_test data. 
Here, ensemble_train is used to build/train the ensemble, and ensemble_test\nis used to evaluate the ensemble.\n\nAlternatively, if a metatask and the base models stored in the metatask were initialized/created with validation data,\nwe can also use the validation data to train the ensemble technique and then test it on all test data/predictions of a\nfold.\n\n## Relevant Publication\n\nIf you use Assembled or Assembled-OpenML in scientific publications, we would appreciate citations.\n\n**Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML**, _Lennart Purucker and Joeran\nBeel,_\n_First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022_\n\nLink to\npublication: [AutoML Conference](https://2022.automl.cc/wp-content/uploads/2022/08/assembled_openml_creating_effi-Main-Paper-And-Supplementary-Material.pdf)\nand [arXiv](https://arxiv.org/abs/2307.00285)\n\nLink to teaser video: [YouTube](https://www.youtube.com/watch?v=8OI8pWfWzM8)\n\nLink to full video: [YouTube](https://www.youtube.com/watch?v=WC-ndeKr_Ms)\n\n```\n@inproceedings{purucker2022assembledopenml,\n    title={Assembled-Open{ML}: Creating Efficient Benchmarks for Ensembles in Auto{ML} with Open{ML}},\n    author={Lennart Purucker and Joeran Beel},\n    booktitle={First International Conference on Automated Machine Learning (Late-Breaking Workshop)},\n    year={2022}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisg-siegen%2Fassembled","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fisg-siegen%2Fassembled","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fisg-siegen%2Fassembled/lists"}