{"id":27833598,"url":"https://github.com/ElementAI/baal","last_synced_at":"2025-05-02T11:01:13.162Z","repository":{"id":37702984,"uuid":"211948492","full_name":"baal-org/baal","owner":"baal-org","description":"Bayesian active learning library for research and industrial usecases.","archived":false,"fork":false,"pushed_at":"2024-06-27T20:02:41.000Z","size":47017,"stargazers_count":895,"open_issues_count":20,"forks_count":87,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-04-21T04:43:33.847Z","etag":null,"topics":["active-learning","ai","bayesian-active-learning","deep-learning","machine-learning","python","pytorch"],"latest_commit_sha":null,"homepage":"https://baal.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/baal-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":"docs/support/faq.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-09-30T20:16:26.000Z","updated_at":"2025-04-07T19:46:22.000Z","dependencies_parsed_at":"2023-01-31T07:01:12.684Z","dependency_job_id":"5d1fde3a-bb67-4437-b98c-715b626234a0","html_url":"https://github.com/baal-org/baal","commit_stats":{"total_commits":192,"total_committers":18,"mean_commits":"10.666666666666666","dds":0.5104166666666667,"last_synced_commit":"7e4036384d5a8e2979bc3fc97e90859ae5deb478"},"previous_names":["elementai/baal"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baal-org%2Fbaal","tags_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/baal-org%2Fbaal/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baal-org%2Fbaal/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/baal-org%2Fbaal/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/baal-org","download_url":"https://codeload.github.com/baal-org/baal/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252028283,"owners_count":21682954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","ai","bayesian-active-learning","deep-learning","machine-learning","python","pytorch"],"created_at":"2025-05-02T11:00:35.963Z","updated_at":"2025-05-02T11:01:13.156Z","avatar_url":"https://github.com/baal-org.png","language":"Python","funding_links":[],"categories":["3.3 AL in AI Fields - 人工智能背景中的主动学习"],"sub_categories":["**Tutorials - 教程**"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg height=15% width=25% src=\"https://i.imgur.com/Zdzb2QZ.png\" style=\"max-width: 100%;border-radius: 25%;\"\u003e\n  \u003ch1 align=\"center\"\u003eBayesian Active Learning (Baal)\n   \u003cbr\u003e\n  \u003ca href=\"https://github.com/baal-org/baal/actions/workflows/pythonci.yml\"\u003e\n    \u003cimg alt=\"Python CI\" src=\"https://github.com/baal-org/baal/actions/workflows/pythonci.yml/badge.svg\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://baal.readthedocs.io/en/latest/?badge=latest\"\u003e\n    \u003cimg alt=\"Documentation Status\" 
src=\"https://readthedocs.org/projects/baal/badge/?version=latest\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://join.slack.com/t/baal-world/shared_invite/zt-z0izhn4y-Jt6Zu5dZaV2rsAS9sdISfg\"\u003e\n    \u003cimg alt=\"Slack\" src=\"https://img.shields.io/badge/slack-chat-green.svg?logo=slack\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/Elementai/baal/blob/master/LICENSE\"\u003e\n    \u003cimg alt=\"Licence\" src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://calendly.com/baal-org/30min\"\u003e\n    \u003cimg alt=\"Office hours\" src=\"https://img.shields.io/badge/Office hours-Calendly-blue.svg\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/project/baal\"\u003e\n    \u003cimg alt=\"Downloads\" src=\"https://static.pepy.tech/badge/baal\"/\u003e\n  \u003c/a\u003e\n\n  \u003c/h1\u003e\n\u003c/p\u003e\n\nBaal is an active learning library that supports both industrial applications and research use cases.\n\nRead the documentation at https://baal.readthedocs.io.\n\nOur paper can be read on [arXiv](https://arxiv.org/abs/2006.09916). 
It includes tips and tricks to make active learning\nusable in production.\n\nFor a quick introduction to Baal and Bayesian active learning, please see these links:\n\n- [Seminar with Label Studio](https://www.youtube.com/watch?v=HG7imRQN3-k)\n- [User guide](https://baal.readthedocs.io/en/latest/user_guide/index.html)\n- [Bayesian active learning presentation](https://drive.google.com/file/d/13UUDsS1rvqDnXza7L0j4bnqyhOT5TDSt/view?usp=sharing)\n\n*Baal was initially developed at [ElementAI](https://www.elementai.com/) (acquired by ServiceNow in 2021), but is now independent.*\n\n\n## Installation and requirements\n\nBaal requires `Python\u003e=3.8`.\n\nTo install Baal using pip: `pip install baal`\n\nWe use [Poetry](https://python-poetry.org/) as our package manager.\nTo install Baal from source: `poetry install`\n\n## Papers using Baal\n\n- [Bayesian active learning for production, a systematic study and a reusable library\n  ](https://arxiv.org/abs/2006.09916) (Atighehchian et al. 2020)\n- [Synbols: Probing Learning Algorithms with Synthetic Datasets\n  ](https://nips.cc/virtual/2020/public/poster_0169cf885f882efd795951253db5cdfb.html) (Lacoste et al. 2020)\n- [Can Active Learning Preemptively Mitigate Fairness Issues?\n  ](https://arxiv.org/pdf/2104.06879.pdf) (Branchaud-Charron et al. 2021)\n- [Active learning with MaskAL reduces annotation effort for training Mask R-CNN](https://arxiv.org/abs/2112.06586) (\n  Blok et al. 2021)\n- [Stochastic Batch Acquisition for Deep Active Learning](https://arxiv.org/abs/2106.12059) (Kirsch et al. 
2022)\n\n# What is active learning?\n\nActive learning is a special case of machine learning in which a learning algorithm is able to interactively query the\nuser (or some other information source) to obtain the desired outputs at new data points\n(to understand the concept in more depth, refer to our [tutorial](https://baal.readthedocs.io/en/latest/)).\n\n## Baal Framework\n\nAt the moment, Baal supports the following methods to perform active learning.\n\n- Monte-Carlo Dropout (Gal et al. 2015)\n- MCDropConnect (Mobiny et al. 2019)\n- Deep ensembles\n- Semi-supervised learning\n\nIf you want to propose new methods, please submit an issue.\n\nThe **Monte-Carlo Dropout** method is a known approximation for Bayesian neural networks. In this method, the Dropout\nlayer is used at both training and test time. By running the model multiple times whilst randomly dropping weights, we\ncalculate the uncertainty of the prediction using one of the uncertainty measurements\nin [heuristics.py](baal/active/heuristics/heuristics.py).\n\nThe framework consists of four main parts, as demonstrated in the flowchart below:\n\n- ActiveLearningDataset\n- Heuristics\n- ModelWrapper\n- ActiveLearningLoop\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/learn/literature/images/Baalscheme.svg\"\u003e\n\u003c/p\u003e\n\nTo get started, wrap your dataset in our _[**ActiveLearningDataset**](baal/active/dataset/pytorch_dataset.py)_ class. This will ensure\nthat the dataset is split into\n`training` and `pool` sets. The `pool` set represents the portion of the training set which is yet to be labelled.\n\nWe provide a lightweight object _[**ModelWrapper**](baal/modelwrapper.py)_ similar to `keras.Model` to make it easier to\ntrain and test the model. 
If your model is not ready for active learning, we provide Modules to prepare it.\n\nFor example, the _[**MCDropoutModule**](baal/bayesian/dropout.py)_ wrapper changes the existing dropout layer so that it is used\nat both training and inference time, and the `ModelWrapper` specifies the number of iterations to run at\ntraining and inference.\n\nFinally, _[**ActiveLearningLoop**](baal/active/active_loop.py)_ automatically computes the uncertainty and labels the most\nuncertain items in the pool.\n\nIn conclusion, your script should be similar to this:\n\n```python\ndataset = ActiveLearningDataset(your_dataset)\ndataset.label_randomly(INITIAL_POOL)  # label some data\nmodel = MCDropoutModule(your_model)\nwrapper = ModelWrapper(model, args=TrainingArgs(...))\nexperiment = ActiveLearningExperiment(\n    trainer=wrapper, # Huggingface or ModelWrapper to train\n    al_dataset=dataset, # Active learning dataset\n    eval_dataset=test_dataset, # Evaluation Dataset\n    heuristic=BALD(), # Uncertainty heuristic to use\n    query_size=100, # How many items to label per round.\n    iterations=20, # How many MC samples to draw per item.\n    pool_size=None, # Optionally limit the size of the unlabelled pool.\n    criterion=None # Stopping criterion for the experiment.\n)\n# The experiment will run until all items are labelled.\nmetrics = experiment.start()\n```\n\nFor a complete experiment, see _[experiments/vgg_mcdropout_cifar10.py](experiments/vgg_mcdropout_cifar10.py)_ .\n\n### Re-run our Experiments\n\n```bash\ndocker build [--target base_baal] -t baal .\ndocker run --rm --gpus all baal python3 experiments/vgg_mcdropout_cifar10.py\n```\n\n### Use Baal for YOUR Experiments\n\nSimply clone the repo, and create your own experiment script similar to the example\nat _[experiments/vgg_mcdropout_cifar10.py](experiments/vgg_mcdropout_cifar10.py)_. Make sure to use the four main parts of the Baal\nframework. 
_Happy running experiments_\n\n### Contributing!\n\nTo contribute, see [CONTRIBUTING.md](./CONTRIBUTING.md).\n\n### Who We Are!\n\n\"There is passion, yet peace; serenity, yet emotion; chaos, yet order.\"\n\nThe Baal team tests and implements the most recent papers on uncertainty estimation and active learning.\n\nCurrent maintainers:\n\n- [Parmida Atighehchian](mailto:patighehchian@twitter.com)\n- [Frédéric Branchaud-Charron](mailto:frederic.branchaud-charron@gmail.com)\n- [George Pearse](mailto:georgehwp26@gmail.com)\n\n### How to cite\n\nIf you used Baal in one of your projects, we would greatly appreciate it if you cited this library using this BibTeX:\n\n```\n@misc{atighehchian2019baal,\n  title={Baal, a bayesian active learning library},\n  author={Atighehchian, Parmida and Branchaud-Charron, Frederic and Freyberg, Jan and Pardinas, Rafael and Schell, Lorne\n          and Pearse, George},\n  year={2022},\n  howpublished={\\url{https://github.com/baal-org/baal/}},\n}\n```\n\n### Licence\n\nFor information on the licence of this API, please read [LICENCE](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FElementAI%2Fbaal","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FElementAI%2Fbaal","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FElementAI%2Fbaal/lists"}