{"id":19631736,"url":"https://github.com/vanderschaarlab/autoprognosis","last_synced_at":"2025-05-16T12:08:59.558Z","repository":{"id":61791908,"uuid":"444142709","full_name":"vanderschaarlab/autoprognosis","owner":"vanderschaarlab","description":"A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.","archived":false,"fork":false,"pushed_at":"2025-03-26T14:09:12.000Z","size":983,"stargazers_count":147,"open_issues_count":7,"forks_count":28,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-05-12T15:59:33.071Z","etag":null,"topics":["automl","healthcare","interpretability","survival-analysis"],"latest_commit_sha":null,"homepage":"https://www.autoprognosis.vanderschaar-lab.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vanderschaarlab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-01-03T17:19:38.000Z","updated_at":"2025-05-10T16:27:52.000Z","dependencies_parsed_at":"2024-09-13T07:49:19.635Z","dependency_job_id":"6eae2b46-c0a5-457f-b4bf-bbed77b35f83","html_url":"https://github.com/vanderschaarlab/autoprognosis","commit_stats":null,"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vanderschaarlab%2Fautoprognosis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vanderschaarlab%2Fautoprognosis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vanderschaarlab%2Fau
toprognosis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vanderschaarlab%2Fautoprognosis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vanderschaarlab","download_url":"https://codeload.github.com/vanderschaarlab/autoprognosis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254527087,"owners_count":22085918,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automl","healthcare","interpretability","survival-analysis"],"created_at":"2024-11-11T12:11:15.691Z","updated_at":"2025-05-16T12:08:59.495Z","avatar_url":"https://github.com/vanderschaarlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AutoPrognosis - A system for automating the design of predictive modeling pipelines tailored for clinical prognosis.\n\n\n\u003cdiv align=\"center\"\u003e\n\n\n[![Test In Colab](https://img.shields.io/badge/Tutorial-Model%20Search-orange)](https://colab.research.google.com/drive/1sFVnnxjRMCNVIn-Ikc--Ja44U0Ll4joY?usp=sharing)\n[![Test In 
Colab](https://img.shields.io/badge/Tutorial-Build%20a%20Demonstrator-orange)](https://colab.research.google.com/drive/1ZwjD9RkosCtboyblH4C8sQV1DuGY1H2X?usp=sharing)\n[![arXiv](https://img.shields.io/badge/arXiv-2210.12090-b31b1b.svg)](https://arxiv.org/abs/2210.12090)\n\n\n[![Tests](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_pr.yml/badge.svg)](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_pr.yml)\n[![Tests](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_full.yml/badge.svg)](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_full.yml)\n\u003c!-- [![Tests R](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_R.yml/badge.svg)](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_R.yml) --\u003e\n[![Tutorials](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_tutorials.yml/badge.svg)](https://github.com/vanderschaarlab/autoprognosis/actions/workflows/test_tutorials.yml)\n[![Documentation Status](https://readthedocs.org/projects/autoprognosis/badge/?version=latest)](https://autoprognosis.readthedocs.io/en/latest/?badge=latest)\n\n[![](https://pepy.tech/badge/autoprognosis)](https://pypi.org/project/autoprognosis/)\n[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/vanderschaarlab/autoprognosis/blob/main/LICENSE)\n[![about](https://img.shields.io/badge/about-The%20van%20der%20Schaar%20Lab-blue)](https://www.vanderschaar-lab.com/)\n[![slack](https://img.shields.io/badge/chat-on%20slack-purple?logo=slack)](https://join.slack.com/t/vanderschaarlab/shared_invite/zt-1pzy8z7ti-zVsUPHAKTgCd1UoY8XtTEw)\n\n\u003c/div\u003e\n\n\n![image](https://github.com/vanderschaarlab/autoprognosis/raw/main/docs/arch.png \"AutoPrognosis\")\n\n## :key: Features\n\n- :rocket: Automatically learns ensembles of pipelines for classification, regression or survival analysis tasks.\n- :cyclone: Easy 
to extend, plugin-based architecture.\n- :fire: Interpretability and uncertainty quantification tools.\n- :adhesive_bandage: Data imputation using [HyperImpute](https://github.com/vanderschaarlab/hyperimpute).\n- :zap: Build demonstrators using [Streamlit](https://streamlit.io/).\n- :notebook: [Python](#high_brightness-tutorials) and [R tutorials](https://github.com/vanderschaarlab/autoprognosis/tree/main/tutorials/bindings/R) available.\n- :book: [Read the docs](https://autoprognosis.readthedocs.io/)\n\n__Note__: The R bindings have been tested using R version 4.2 and Python 3.8.\n\n## :rocket: Installation\n\n### Using pip\n\nThe library can be installed from PyPI using\n```bash\n$ pip install autoprognosis\n```\nor from source, using\n```bash\n$ pip install .\n```\n### Redis (Optional, but recommended)\nAutoPrognosis can use Redis as a backend to improve the performance and quality of the searches.\n\nTo enable it, install the redis-server package following the steps described on the [official site](https://redis.io/topics/quickstart).\n\n## Environment variables\nThe library can be configured through the following environment variables.\n\n| Variable       | Description                                                     |\n|----------------|-----------------------------------------------------------------|\n| `N_OPT_JOBS`     | Number of cores to use for the hyperparameter search. Default: 1 |\n| `N_LEARNER_JOBS` | Number of cores used by individual learners. Default: all CPUs |\n| `REDIS_HOST`     | IP address of the Redis server. Default: 127.0.0.1 |\n| `REDIS_PORT`     | Redis port. 
Default: 6379                                       |\n\n_Example_: `export N_OPT_JOBS=2` to use 2 cores for the hyperparameter search.\n\n## :boom: Sample Usage\n\n__Advanced Python tutorials__ can be found in the [Python tutorials section](#high_brightness-tutorials).\n\n\n__R examples__ can be found in the [R tutorials section](https://github.com/vanderschaarlab/autoprognosis/tree/main/tutorials/bindings/R).\n\nList the available classifiers\n```python\nfrom autoprognosis.plugins.prediction.classifiers import Classifiers\nprint(Classifiers().list_available())\n```\n\nCreate a study for classifiers\n```python\nfrom sklearn.datasets import load_breast_cancer\n\nfrom autoprognosis.studies.classifiers import ClassifierStudy\n\n\nX, Y = load_breast_cancer(return_X_y=True, as_frame=True)\n\ndf = X.copy()\ndf[\"target\"] = Y\n\nstudy_name = \"example\"\n\nstudy = ClassifierStudy(\n    study_name=study_name,\n    dataset=df,  # pandas DataFrame\n    target=\"target\",  # the label column in the dataset\n)\nmodel = study.fit()\n\n# Predict the probabilities of each class using the model\nmodel.predict_proba(X)\n```\n\n\n__(Advanced)__ Customize the study for classifiers\n```python\nfrom pathlib import Path\n\nfrom sklearn.datasets import load_breast_cancer\n\nfrom autoprognosis.studies.classifiers import ClassifierStudy\nfrom autoprognosis.utils.serialization import load_model_from_file\nfrom autoprognosis.utils.tester import evaluate_estimator\n\n\nX, Y = load_breast_cancer(return_X_y=True, as_frame=True)\n\ndf = X.copy()\ndf[\"target\"] = Y\n\nworkspace = Path(\"workspace\")\nstudy_name = \"example\"\n\nstudy = ClassifierStudy(\n    study_name=study_name,\n    dataset=df,  # pandas DataFrame\n    target=\"target\",  # the label column in the dataset\n    num_iter=100,  # how many trials to do for each candidate\n    timeout=60,  # seconds\n    
classifiers=[\"logistic_regression\", \"lda\", \"qda\"],\n    workspace=workspace,\n)\n\nstudy.run()\n\noutput = workspace / study_name / \"model.p\"\nmodel = load_model_from_file(output)\n\n# \u003cmodel\u003e contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.\n# This way, we can further benchmark the selected model on the training set.\nmetrics = evaluate_estimator(model, X, Y)\n\nprint(f\"model {model.name()} -\u003e {metrics['str']}\")\n\n# Train the model\nmodel.fit(X, Y)\n\n# Predict the probabilities of each class using the model\nmodel.predict_proba(X)\n```\n\nList the available regressors\n```python\nfrom autoprognosis.plugins.prediction.regression import Regression\nprint(Regression().list_available())\n```\n\nCreate a Regression study\n```python\n# third party\nimport pandas as pd\n\n# autoprognosis absolute\nfrom autoprognosis.studies.regression import RegressionStudy\n\n# Load dataset\ndf = pd.read_csv(\n    \"https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat\",\n    header=None,\n    sep=\"\\t\",\n)\nlast_col = df.columns[-1]\ny = df[last_col]\nX = df.drop(columns=[last_col])\n\ndf = X.copy()\ndf[\"target\"] = y\n\n# Search the model\nstudy_name = \"regression_example\"\nstudy = RegressionStudy(\n    study_name=study_name,\n    dataset=df,  # pandas DataFrame\n    target=\"target\",  # the label column in the dataset\n)\nmodel = study.fit()\n\n# Predict using the model\nmodel.predict(X)\n```\n\n__Advanced__ Customize the Regression study\n```python\n# stdlib\nfrom pathlib import Path\n\n# third party\nimport pandas as pd\n\n# autoprognosis absolute\nfrom autoprognosis.utils.serialization import load_model_from_file\nfrom autoprognosis.utils.tester import evaluate_regression\nfrom autoprognosis.studies.regression import 
RegressionStudy\n\n# Load dataset\ndf = pd.read_csv(\n    \"https://archive.ics.uci.edu/ml/machine-learning-databases/00291/airfoil_self_noise.dat\",\n    header=None,\n    sep=\"\\t\",\n)\nlast_col = df.columns[-1]\ny = df[last_col]\nX = df.drop(columns=[last_col])\n\ndf = X.copy()\ndf[\"target\"] = y\n\n# Search the model\nworkspace = Path(\"workspace\")\nworkspace.mkdir(parents=True, exist_ok=True)\n\nstudy_name = \"regression_example\"\nstudy = RegressionStudy(\n    study_name=study_name,\n    dataset=df,  # pandas DataFrame\n    target=\"target\",  # the label column in the dataset\n    num_iter=10,  # how many trials to do for each candidate. Default: 50\n    num_study_iter=2,  # how many outer iterations to do. Default: 5\n    timeout=50,  # optimization timeout for each regressor. Default: 600 seconds\n    regressors=[\"linear_regression\", \"xgboost_regressor\"],\n    workspace=workspace,\n)\n\nstudy.run()\n\n# Test the model\noutput = workspace / study_name / \"model.p\"\n\nmodel = load_model_from_file(output)\n# \u003cmodel\u003e contains the optimal architecture, but the model is not trained yet. 
You need to call fit() to use it.\n# This way, we can further benchmark the selected model on the training set.\n\nmetrics = evaluate_regression(model, X, y)\n\nprint(f\"Model {model.name()} score: {metrics['str']}\")\n\n# Train the model\nmodel.fit(X, y)\n\n# Predict using the model\nmodel.predict(X)\n```\n\nList available survival analysis estimators\n```python\nfrom autoprognosis.plugins.prediction.risk_estimation import RiskEstimation\nprint(RiskEstimation().list_available())\n```\n\nCreate a Survival analysis study\n```python\n# third party\nimport numpy as np\nfrom pycox import datasets\n\n# autoprognosis absolute\nfrom autoprognosis.studies.risk_estimation import RiskEstimationStudy\n\ndf = datasets.gbsg.read_df()\ndf = df[df[\"duration\"] \u003e 0]\n\n# Covariates only: exclude both the time-to-event and the event columns\nX = df.drop(columns=[\"duration\", \"event\"])\nT = df[\"duration\"]\nY = df[\"event\"]\n\neval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]\n\nstudy_name = \"example_risks\"\n\nstudy = RiskEstimationStudy(\n    study_name=study_name,\n    dataset=df,\n    target=\"event\",\n    time_to_event=\"duration\",\n    time_horizons=eval_time_horizons,\n)\n\nmodel = study.fit()\n\n# Predict using the model\nmodel.predict(X, eval_time_horizons)\n```\n\n__Advanced__ Customize the Survival analysis study\n```python\n# stdlib\nfrom pathlib import Path\n\n# third party\nimport numpy as np\nfrom pycox import datasets\n\n# autoprognosis absolute\nfrom autoprognosis.studies.risk_estimation import RiskEstimationStudy\nfrom autoprognosis.utils.serialization import load_model_from_file\nfrom autoprognosis.utils.tester import evaluate_survival_estimator\n\ndf = datasets.gbsg.read_df()\ndf = df[df[\"duration\"] \u003e 0]\n\n# Covariates only: exclude both the time-to-event and the event columns\nX = df.drop(columns=[\"duration\", \"event\"])\nT = df[\"duration\"]\nY = df[\"event\"]\n\neval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]\n\nworkspace = 
Path(\"workspace\")\nstudy_name = \"example_risks\"\n\nstudy = RiskEstimationStudy(\n    study_name=study_name,\n    dataset=df,\n    target=\"event\",\n    time_to_event=\"duration\",\n    time_horizons=eval_time_horizons,\n    num_iter=10,\n    num_study_iter=1,\n    timeout=10,\n    risk_estimators=[\"cox_ph\", \"survival_xgboost\"],\n    score_threshold=0.5,\n    workspace=workspace,\n)\n\nstudy.run()\n\noutput = workspace / study_name / \"model.p\"\n\nmodel = load_model_from_file(output)\n# \u003cmodel\u003e contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.\n# This way, we can further benchmark the selected model on the training set.\n\nmetrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)\n\nprint(f\"Model {model.name()} score: {metrics['str']}\")\n\n# Train the model\nmodel.fit(X, T, Y)\n\n# Predict using the model\nmodel.predict(X, eval_time_horizons)\n```\n\n## :high_brightness: Tutorials\n\n### Plugins\n\n\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QO7K3JqW8l4pgVSLxjVezTu5IfD9yHB-?usp=sharing) [ Imputation](tutorials/plugins/tutorial_00_imputation_plugins.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WQGZXQkQs0Wg5stB9fk-RvYey35ADIZu?usp=sharing)[ Preprocessing](tutorials/plugins/tutorial_01_preprocessing_plugins.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WTzO_2hqaEOvvATHPSIcW220xc1WaJlC?usp=sharing)[ Classification](tutorials/plugins/tutorial_02_classification_plugins.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/17bLtKUjN8ilHw4Cm7-53kiC0vCJO_pVb?usp=sharing)[ Pipelines](tutorials/plugins/tutorial_03_pipelines.ipynb)\n- [![Test In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1K0yVwm4jQrXRbMKJ-em7tTYgHXWtoK5c?usp=sharing)[ Interpretability](tutorials/plugins/tutorial_04_interpretability.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1bY4CbiqMe2uoqeUu2d49aIdYRbtP156X?usp=sharing)[ Survival Analysis](tutorials/plugins/tutorial_05_survival_analysis_plugins.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1UK6WbsviT5nOQ_BAHSFYIjhpKtwppnUU?usp=sharing)[ Regression](tutorials/plugins/tutorial_06_regression_plugins.ipynb)\n\n### AutoML\n\n\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-lPuQAtjHESl32ahFQYsFl8ujAnDWxEJ?usp=sharing)[ Classification tasks](tutorials/automl/tutorial_00_classification_study.ipynb)\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/16UDaA3F5JGw_YVY8XlYqWjfxcUV1OHJo?usp=sharing)[ Classification tasks with imputation](tutorials/automl/tutorial_01_automl_classification_with_imputation.ipynb)\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1DtZCqebhaYdKB3ci5dr3hT0KvZPaTUOi?usp=sharing)[ Survival analysis tasks](tutorials/automl/tutorial_02_survival_analysis_study.ipynb)\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sFVnnxjRMCNVIn-Ikc--Ja44U0Ll4joY?usp=sharing)[ Survival analysis tasks with imputation](tutorials/automl/tutorial_03_automl_survival_analysis_with_imputation.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1HLhWI-tRZn4e9ijQ6iEIuppuDszgWkCC?usp=sharing)[ Regression 
tasks](tutorials/automl/tutorial_04_regression.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1eHw1l79_m3vq9y-0WpllCMBSD7DQajWO?usp=sharing)[ Classifiers with explainers](tutorials/automl/tutorial_05_classification_with_explainers.ipynb)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1dm3cRmo-jD6x7V5WePciDpcauqtUt6lS?usp=sharing)[ Multiple imputation example](tutorials/automl/tutorial_06_automl_multiple_imputation_example.ipynb)\n\n### Building a demonstrator\n\n\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lqbElEVJa2Q0JDsXPgb8K_QUTDcZvUQq?usp=sharing)[ Classification demonstrator](tutorials/demonstrators/tutorial_00_build_a_demonstrator_classification.ipynb)\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ZwjD9RkosCtboyblH4C8sQV1DuGY1H2X?usp=sharing)[ Survival analysis demonstrator](tutorials/demonstrators/tutorial_01_build_a_demonstrator_survival_analysis.ipynb)\n\n### AutoPrognosis 101 Tutorial Series\n - [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1axBQRnGCeh6vqhisfMNBLe9jHNKhVvG2)[ 00. Run a classification study](https://colab.research.google.com/drive/1axBQRnGCeh6vqhisfMNBLe9jHNKhVvG2)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RIxuYuFyTlaE1RZ0-y-W2kx54_vlWYTL)[ 01. Run a regression study](https://colab.research.google.com/drive/1RIxuYuFyTlaE1RZ0-y-W2kx54_vlWYTL)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/13shRlJgADizDgWJP0tN8HhIVfT8vnCeS)[ 02. 
Run a survival analysis study](https://colab.research.google.com/drive/13shRlJgADizDgWJP0tN8HhIVfT8vnCeS)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/137KvWXXYZXaajbrVBW_vEeCEJmyXA4OZ)[ 03. Run a study and interpret the model](https://colab.research.google.com/drive/137KvWXXYZXaajbrVBW_vEeCEJmyXA4OZ)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/13QMhuDfuXJNllvCa09T27YSyaJlmpBjP)[ 04. What’s a plugin? Survival analysis example](https://colab.research.google.com/drive/13QMhuDfuXJNllvCa09T27YSyaJlmpBjP)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1nl0ZUWbglmJQzTowLoSOz2pmHIzBZm-z)[ 05. Pipelines](https://colab.research.google.com/drive/1nl0ZUWbglmJQzTowLoSOz2pmHIzBZm-z)\n- [![Test In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Y47ortMzmqvc0JPvptIsiNiHFoynni-j)[ 06. 
[Advanced] Creating your own plugin: preprocessing example](https://colab.research.google.com/drive/1Y47ortMzmqvc0JPvptIsiNiHFoynni-j)\n\n\n## :zap: Plugins\n\n### Imputation methods\n\n\n```python\nfrom autoprognosis.plugins.imputers import Imputers\n\nimputer = Imputers().get(\u003cNAME\u003e)\n```\n\n| Name | Description |\n|--- | --- |\n|**hyperimpute**|Iterative imputer using both regression and classification methods based on linear models, trees, XGBoost, CatBoost and neural nets|\n|**mean**|Replace the missing values using the mean along each column with [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)|\n|**median**|Replace the missing values using the median along each column with [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html) |\n|**most_frequent**|Replace the missing values using the most frequent value along each column with [`SimpleImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html)|\n|**missforest**|Iterative imputation method based on Random Forests using [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer) and [`ExtraTreesRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html)|\n|**ice**| Iterative imputation method based on regularized linear regression using [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer) and [`BayesianRidge`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html)|\n|**mice**| Multiple imputations based on ICE using [`IterativeImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer) and 
[`BayesianRidge`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html)|\n|**softimpute**|[`Low-rank matrix approximation via nuclear-norm regularization`](https://jmlr.org/papers/volume16/hastie15a/hastie15a.pdf)|\n|**EM**|Iterative procedure that uses the other variables to impute a value (Expectation), then checks whether that value is the most likely one (Maximization). [`EM imputation algorithm`](https://joon3216.github.io/research_materials/2019/em_imputation.html)|\n|**gain**|[`GAIN: Missing Data Imputation using Generative Adversarial Nets`](https://arxiv.org/abs/1806.02920)|\n\n\n### Preprocessing methods\n```python\nfrom autoprognosis.plugins.preprocessors import Preprocessors\n\npreprocessor = Preprocessors().get(\u003cNAME\u003e)\n```\n| Name | Description |\n|--- | --- |\n| **maxabs_scaler**  | Scale each feature by its maximum absolute value. [`MaxAbsScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html)|\n| **scaler** |Standardize features by removing the mean and scaling to unit variance. [`StandardScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)|\n|**feature_normalizer** | Normalize samples individually to unit norm. 
[`Normalizer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer)|\n|**normal_transform** |Transform features to follow a normal distribution using quantile information. [`QuantileTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)|\n|**uniform_transform** |Transform features to follow a uniform distribution using quantile information. [`QuantileTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)|\n|**minmax_scaler** |Transform features by scaling each feature to a given range. [`MinMaxScaler`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)|\n\n\n### Classification\n```python\nfrom autoprognosis.plugins.prediction.classifiers import Classifiers\n\nclassifier = Classifiers().get(\u003cNAME\u003e)\n```\n\n| Name | Description |\n|--- | --- |\n| **neural_nets**  | PyTorch-based neural net classifier.|\n| **logistic_regression**  | [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)|\n| **catboost**  |Gradient boosting on decision trees - [`CatBoost`](https://catboost.ai/)|\n| **random_forest**  | A random forest classifier. 
[`RandomForestClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)|\n| **tabnet**  |[`TabNet: Attentive Interpretable Tabular Learning`](https://github.com/dreamquark-ai/tabnet)|\n| **xgboost**  |[`XGBClassifier`](https://xgboost.readthedocs.io/en/stable/)|\n\n\n### Survival Analysis\n```python\nfrom autoprognosis.plugins.prediction.risk_estimation import RiskEstimation\n\npredictor = RiskEstimation().get(\u003cNAME\u003e)\n```\n\n| Name | Description |\n|--- | --- |\n| **survival_xgboost**  | [`XGBoost Survival Embeddings`](https://github.com/loft-br/xgboost-survival-embeddings)|\n| **loglogistic_aft**  | [`Log-Logistic AFT model`](https://lifelines.readthedocs.io/en/latest/fitters/regression/LogLogisticAFTFitter.html)|\n| **deephit**  | [`DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks`](https://github.com/chl8856/DeepHit)|\n| **cox_ph**  | [`Cox’s proportional hazard model`](https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html)|\n| **weibull_aft**  | [`Weibull AFT model`](https://lifelines.readthedocs.io/en/latest/fitters/regression/WeibullAFTFitter.html)|\n| **lognormal_aft**  | [`Log-Normal AFT model`](https://lifelines.readthedocs.io/en/latest/fitters/regression/LogNormalAFTFitter.html)|\n| **coxnet**  | [`CoxNet: a Cox proportional hazards model, also referred to as DeepSurv`](https://github.com/havakv/pycox)|\n\n### Regression\n```python\nfrom autoprognosis.plugins.prediction.regression import Regression\n\nregressor = Regression().get(\u003cNAME\u003e)\n```\n\n| Name | Description |\n|--- | --- |\n| **tabnet_regressor**  |[`TabNet: Attentive Interpretable Tabular Learning`](https://github.com/dreamquark-ai/tabnet)|\n| **catboost_regressor**  |Gradient boosting on decision trees - [`CatBoost`](https://catboost.ai/)|\n| **random_forest_regressor**  
|[`RandomForestRegressor`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)|\n| **xgboost_regressor**  |[`XGBRegressor`](https://xgboost.readthedocs.io/en/stable/)|\n| **neural_nets_regression**  |PyTorch-based neural net regressor.|\n| **linear_regression**  |[`LinearRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)|\n\n\n### Explainers\n```python\nfrom autoprognosis.plugins.explainers import Explainers\n\nexplainer = Explainers().get(\u003cNAME\u003e)\n```\n| Name | Description |\n|--- | --- |\n| **risk_effect_size**  | Feature importance using the Cohen's d effect size between probabilities|\n| **lime**  |[`Lime: Explaining the predictions of any machine learning classifier`](https://github.com/marcotcr/lime)|\n| **symbolic_pursuit**  |`Symbolic Pursuit` (Learning outside the black-box: at the pursuit of interpretable models)|\n| **shap_permutation_sampler**  |[`SHAP Permutation Sampler`](https://shap.readthedocs.io/en/latest/generated/shap.explainers.Permutation.html)|\n| **kernel_shap**  |[`SHAP KernelExplainer`](https://shap-lrjball.readthedocs.io/en/latest/generated/shap.KernelExplainer.html)|\n| **invase**  |[`INVASE: Instance-wise Variable Selection`](https://github.com/vanderschaarlab/invase)|\n\n\n\n### Uncertainty\n```python\nfrom autoprognosis.plugins.uncertainty import UncertaintyQuantification\nmodel = UncertaintyQuantification().get(\u003cNAME\u003e)\n```\n| Name | Description |\n|--- | --- |\n| **cohort_explainer**  ||\n| **conformal_prediction**  ||\n| **jackknife**  ||\n\n\n## :hammer: Test\nAfter installing the library with its development dependencies, the tests can be executed using `pytest`:\n```bash\n$ pip install \".[dev]\"\n$ pytest -vxs -m \"not slow\"\n```\n\n## Citing\nIf you use this code, please cite the associated paper:\n\n```\n@misc{https://doi.org/10.48550/arxiv.2210.12090,\n  doi = {10.48550/ARXIV.2210.12090},\n  url = {https://arxiv.org/abs/2210.12090},\n  author = 
{Imrie, Fergus and Cebere, Bogdan and McKinney, Eoin F. and van der Schaar, Mihaela},\n  keywords = {Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {AutoPrognosis 2.0: Democratizing Diagnostic and Prognostic Modeling in Healthcare with Automated Machine Learning},\n  publisher = {arXiv},\n  year = {2022},\n  copyright = {Creative Commons Attribution 4.0 International}\n}\n```\n\n## References\n1. [AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning](https://arxiv.org/abs/1802.07207)\n2. [Prognostication and Risk Factors for Cystic Fibrosis via Automated Machine Learning](https://www.nature.com/articles/s41598-018-29523-2)\n3. [Cardiovascular Disease Risk Prediction using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants](https://www.ncbi.nlm.nih.gov/pubmed/31091238)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvanderschaarlab%2Fautoprognosis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvanderschaarlab%2Fautoprognosis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvanderschaarlab%2Fautoprognosis/lists"}