{"id":14958232,"url":"https://github.com/modal-python/modal","last_synced_at":"2025-05-14T19:09:14.579Z","repository":{"id":40614225,"uuid":"110697473","full_name":"modAL-python/modAL","owner":"modAL-python","description":"A modular active learning framework for Python","archived":false,"fork":false,"pushed_at":"2024-02-26T15:38:11.000Z","size":9570,"stargazers_count":2284,"open_issues_count":106,"forks_count":325,"subscribers_count":44,"default_branch":"master","last_synced_at":"2025-05-14T19:09:11.301Z","etag":null,"topics":["active-learning","active-learning-module","bayesian-optimization","machine-learning","machine-learning-algorithms","machine-learning-api","machine-learning-library","python","scikit-learn"],"latest_commit_sha":null,"homepage":"https://modAL-python.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/modAL-python.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-14T14:01:15.000Z","updated_at":"2025-05-13T16:47:08.000Z","dependencies_parsed_at":"2023-02-16T08:10:26.840Z","dependency_job_id":"bc64bc01-30d8-4e30-b777-9c3cbf431c0b","html_url":"https://github.com/modAL-python/modAL","commit_stats":null,"previous_names":["cosmic-cortex/modal"],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modAL-python%2FmodAL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modAL-python%2FmodAL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m
odAL-python%2FmodAL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/modAL-python%2FmodAL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/modAL-python","download_url":"https://codeload.github.com/modAL-python/modAL/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254209859,"owners_count":22032897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","active-learning-module","bayesian-optimization","machine-learning","machine-learning-algorithms","machine-learning-api","machine-learning-library","python","scikit-learn"],"created_at":"2024-09-24T13:16:33.127Z","updated_at":"2025-05-14T19:09:12.842Z","avatar_url":"https://github.com/modAL-python.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg src=\"https://modal-python.readthedocs.io/en/latest/_static/modAL_b.png\" alt=\"modAL\" style=\"width: 400px;\"\u003e\n\nModular Active Learning framework for Python3\n\n[![travis-ci-master](https://travis-ci.org/modAL-python/modAL.svg?branch=master)](https://travis-ci.org/modAL-python/modAL) [![codecov-master](https://codecov.io/gh/modAL-python/modAL/branch/master/graph/badge.svg)](https://codecov.io/gh/modAL-python/modAL) [![readthedocs](https://readthedocs.org/projects/modal-python/badge/?version=latest)](http://modal-python.readthedocs.io/en/latest/?badge=latest)\n\n## Page contents\n- [Introduction](#introduction)  \n- [Active learning from bird's-eye view](#active-learning)  \n- [modAL in 
action](#modAL-in-action)\n  - [From zero to one in a few lines of code](#initialization)  \n  - [Replacing parts quickly](#replacing-parts)  \n  - [Replacing parts with your own solutions](#replacing-parts-with-your-own-solutions)  \n  - [An example with active regression](#active-regression)\n  - [Additional examples](#additional-examples)  \n- [Installation](#installation)  \n- [Documentation](#documentation)  \n- [Citing](#citing)  \n- [About the developer](#about-the-developer)\n\n# Introduction\u003ca name=\"introduction\"\u003e\u003c/a\u003e\nmodAL is an active learning framework for Python3, designed with *modularity, flexibility* and *extensibility* in mind. Built on top of scikit-learn, it allows you to rapidly create active learning workflows with nearly complete freedom. What is more, you can easily replace parts with your custom-built solutions, allowing you to design novel algorithms with ease.\n\n# Active learning from bird's-eye view\u003ca name=\"active-learning\"\u003e\u003c/a\u003e\nWith the recent explosion of available data, you can have millions of unlabelled examples with a high cost to obtain labels. For instance, when trying to predict the sentiment of tweets, obtaining a training set can require immense manual labour. But worry not, active learning comes to the rescue! In general, AL is a framework allowing you to increase classification performance by intelligently querying you to label the most informative instances. To give an example, suppose that you have the following data and classifier with shaded regions signifying the classification probability.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://modal-python.readthedocs.io/en/latest/_images/motivating-example.png\" height=\"600px\" width=\"600px\"/\u003e\n\u003c/p\u003e\n\nSuppose that you can query the label of an unlabelled instance, but it costs you a lot. Which one would you choose? 
By querying an instance in the uncertain region, you obtain more information than by querying at random. Active learning gives you a set of tools to handle problems like this. In general, an active learning workflow looks like the following.\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"https://modal-python.readthedocs.io/en/latest/_images/active-learning.png\"/\u003e\n\u003c/p\u003e\n\nThe key components of any workflow are the **model** you choose, the **uncertainty** measure you use and the **query** strategy you apply to request labels. With modAL, instead of choosing from a small set of built-in components, you have the freedom to seamlessly integrate scikit-learn or Keras models into your algorithm and easily tailor your custom query strategies and uncertainty measures.\n\n# modAL in action\u003ca name=\"modAL-in-action\"\u003e\u003c/a\u003e\nLet's see what modAL can do for you!\n\n## From zero to one in a few lines of code\u003ca name=\"initialization\"\u003e\u003c/a\u003e\nActive learning with a scikit-learn classifier, for instance RandomForestClassifier, can be as simple as the following.\n```python\nfrom modAL.models import ActiveLearner\nfrom sklearn.ensemble import RandomForestClassifier\n\n# initializing the learner\nlearner = ActiveLearner(\n    estimator=RandomForestClassifier(),\n    X_training=X_training, y_training=y_training\n)\n\n# query for labels\nquery_idx, query_inst = learner.query(X_pool)\n\n# ...obtaining new labels from the Oracle...\n\n# supply label for queried instance\nlearner.teach(X_pool[query_idx], y_new)\n```\n\n## Replacing parts quickly\u003ca name=\"replacing-parts\"\u003e\u003c/a\u003e\nIf you would like to use different uncertainty measures and query strategies than the default uncertainty sampling, you can either replace them with one of several built-in strategies or design your own by following a few very simple design principles. 
For instance, replacing the default uncertainty measure with classification entropy looks like the following.\n```python\nfrom modAL.models import ActiveLearner\nfrom modAL.uncertainty import entropy_sampling\nfrom sklearn.ensemble import RandomForestClassifier\n\nlearner = ActiveLearner(\n    estimator=RandomForestClassifier(),\n    query_strategy=entropy_sampling,\n    X_training=X_training, y_training=y_training\n)\n```\n\n## Replacing parts with your own solutions\u003ca name=\"replacing-parts-with-your-own-solutions\"\u003e\u003c/a\u003e\nmodAL was designed to make it easy for you to implement your own query strategy. For example, implementing and using a simple random sampling strategy is as easy as the following.\n```python\nimport numpy as np\nfrom modAL.models import ActiveLearner\nfrom sklearn.ensemble import RandomForestClassifier\n\ndef random_sampling(classifier, X_pool):\n    # pick one pool index uniformly at random\n    n_samples = len(X_pool)\n    query_idx = np.random.choice(range(n_samples))\n    return query_idx, X_pool[query_idx]\n\nlearner = ActiveLearner(\n    estimator=RandomForestClassifier(),\n    query_strategy=random_sampling,\n    X_training=X_training, y_training=y_training\n)\n```\nFor more details on how to implement your custom strategies, visit the page [Extending modAL](https://modal-python.readthedocs.io/en/latest/content/overview/Extending-modAL.html)!\n\n## An example with active regression\u003ca name=\"active-regression\"\u003e\u003c/a\u003e\nTo see modAL in *real* action, let's consider an active regression problem with Gaussian processes! In this example, we shall try to learn the *noisy sine* function:\n```python\nimport numpy as np\n\nX = np.random.choice(np.linspace(0, 20, 10000), size=200, replace=False).reshape(-1, 1)\ny = np.sin(X) + np.random.normal(scale=0.3, size=X.shape)\n```\nFor active learning, we shall define a custom query strategy tailored to Gaussian processes. In a nutshell, a *query strategy* in modAL is a function taking (at least) two arguments (an estimator object and a pool of examples), outputting the index of the queried instance. 
In our case, the arguments are ```regressor``` and ```X```.\n```python\ndef GP_regression_std(regressor, X):\n    _, std = regressor.predict(X, return_std=True)\n    return np.argmax(std)\n```\nAfter setting up the query strategy and the data, the active learner can be initialized.\n```python\nfrom modAL.models import ActiveLearner\nfrom sklearn.gaussian_process import GaussianProcessRegressor\nfrom sklearn.gaussian_process.kernels import WhiteKernel, RBF\n\nn_initial = 5\ninitial_idx = np.random.choice(range(len(X)), size=n_initial, replace=False)\nX_training, y_training = X[initial_idx], y[initial_idx]\n\nkernel = RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e3)) \\\n         + WhiteKernel(noise_level=1, noise_level_bounds=(1e-10, 1e+1))\n\nregressor = ActiveLearner(\n    estimator=GaussianProcessRegressor(kernel=kernel),\n    query_strategy=GP_regression_std,\n    X_training=X_training.reshape(-1, 1), y_training=y_training.reshape(-1, 1)\n)\n```\nThe initial regressor is not very accurate.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://modal-python.readthedocs.io/en/latest/_images/gp-initial.png\"\u003e\n\u003c/p\u003e\n\nThe blue band enveloping the regressor represents the standard deviation of the Gaussian process at the given point. 
Now we are ready to do active learning!\n```python\n# active learning\nn_queries = 10\nfor idx in range(n_queries):\n    query_idx, query_instance = regressor.query(X)\n    regressor.teach(X[query_idx].reshape(1, -1), y[query_idx].reshape(1, -1))\n```\nAfter a few queries, we can see that the prediction is much improved.\n\n\u003cp align=\"center\"\u003e\n \u003cimg src=\"https://modal-python.readthedocs.io/en/latest/_images/gp-final.png\"\u003e\n\u003c/p\u003e\n\n## Additional examples\u003ca name=\"additional-examples\"\u003e\u003c/a\u003e\nIncluding this, many examples are available:\n- [Pool-based sampling](https://modal-python.readthedocs.io/en/latest/content/examples/pool-based_sampling.html)  \n- [Stream-based sampling](https://modal-python.readthedocs.io/en/latest/content/examples/stream-based_sampling.html)  \n- [Active regression](https://modal-python.readthedocs.io/en/latest/content/examples/active_regression.html)  \n- [Ensemble regression](https://modal-python.readthedocs.io/en/latest/content/examples/ensemble_regression.html)  \n- [Bayesian optimization](https://modal-python.readthedocs.io/en/latest/content/examples/bayesian_optimization.html)  \n- [Query by committee](https://modal-python.readthedocs.io/en/latest/content/examples/query_by_committee.html)  \n- [Bootstrapping and bagging](https://modal-python.readthedocs.io/en/latest/content/examples/bootstrapping_and_bagging.html)  \n- [Keras integration](https://modal-python.readthedocs.io/en/latest/content/examples/Keras_integration.html)\n\n# Installation\u003ca name=\"installation\"\u003e\u003c/a\u003e\nmodAL requires\n- Python \u003e= 3.5\n- NumPy \u003e= 1.13\n- SciPy \u003e= 0.18\n- scikit-learn \u003e= 0.18\n\nYou can install modAL directly with pip:  \n```\npip install modAL-python\n```\nAlternatively, you can install modAL directly from source:  \n```\npip install git+https://github.com/modAL-python/modAL.git\n```\n\n# Documentation\u003ca name=\"documentation\"\u003e\u003c/a\u003e\nYou can 
find the documentation of modAL at [https://modAL-python.github.io](https://modAL-python.github.io), where several tutorials and working examples are available, along with a complete API reference. For running the examples, Matplotlib \u003e= 2.0 is recommended.\n\n# Citing\u003ca name=\"citing\"\u003e\u003c/a\u003e\nIf you use modAL in your projects, you can cite it as\n```\n@article{modAL2018,\n    title={mod{AL}: {A} modular active learning framework for {P}ython},\n    author={Tivadar Danka and Peter Horvath},\n    url={https://github.com/modAL-python/modAL},\n    note={available on arXiv at \\url{https://arxiv.org/abs/1805.00979}}\n}\n```\n\n# About the developer\u003ca name=\"about-the-developer\"\u003e\u003c/a\u003e\nmodAL is developed by me, [Tivadar Danka](https://www.tivadardanka.com) (aka [cosmic-cortex](https://github.com/cosmic-cortex) on GitHub). I have a PhD in pure mathematics, but I fell in love with biology and machine learning right after I finished my PhD. I have changed fields and now I work in the [Bioimage Analysis and Machine Learning Group of Peter Horvath](http://group.szbk.u-szeged.hu/sysbiol/horvath-peter-lab-index.html), where I am working to develop active learning strategies for intelligent sample analysis in biology. During my work I realized that in Python, creating and prototyping active learning workflows can be made really easy and fast with scikit-learn, so I ended up developing a general framework for this. The result is modAL :) If you have any questions, requests or suggestions, you can contact me at \u003ca href=\"mailto:85a5187a@opayq.com\"\u003e85a5187a@opayq.com\u003c/a\u003e! I hope you'll find modAL useful!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodal-python%2Fmodal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmodal-python%2Fmodal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmodal-python%2Fmodal/lists"}