{"id":13677734,"url":"https://github.com/webis-de/small-text","last_synced_at":"2025-05-14T18:02:30.411Z","repository":{"id":37363619,"uuid":"370275343","full_name":"webis-de/small-text","owner":"webis-de","description":"Active Learning for Text Classification in Python","archived":false,"fork":false,"pushed_at":"2025-04-06T17:40:46.000Z","size":3182,"stargazers_count":614,"open_issues_count":18,"forks_count":70,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-05-14T18:02:04.288Z","etag":null,"topics":["active-learning","deep-learning","language-models","looking-for-contributors","machine-learning","natural-language-processing","nlp","python","pytorch","small-language-models","text-classification","transformers"],"latest_commit_sha":null,"homepage":"https://small-text.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/webis-de.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-05-24T08:06:41.000Z","updated_at":"2025-05-07T11:37:32.000Z","dependencies_parsed_at":"2023-02-18T19:45:26.188Z","dependency_job_id":"ff6f4776-84f9-41c2-8ae3-a69ca1a8e991","html_url":"https://github.com/webis-de/small-text","commit_stats":{"total_commits":414,"total_committers":4,"mean_commits":103.5,"dds":0.09178743961352653,"last_synced_commit":"3d99fb5637490b7a4e0e7b776c33a76c015e5320"},"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webis-de%2Fsmall-text","
tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webis-de%2Fsmall-text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webis-de%2Fsmall-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/webis-de%2Fsmall-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/webis-de","download_url":"https://codeload.github.com/webis-de/small-text/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254198453,"owners_count":22030964,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","deep-learning","language-models","looking-for-contributors","machine-learning","natural-language-processing","nlp","python","pytorch","small-language-models","text-classification","transformers"],"created_at":"2024-08-02T13:00:46.361Z","updated_at":"2025-05-14T18:02:30.294Z","avatar_url":"https://github.com/webis-de.png","language":"Python","readme":"[![PyPI](https://img.shields.io/pypi/v/small-text/v2.0.0.dev2)](https://pypi.org/project/small-text/)\n[![Conda Forge](https://img.shields.io/conda/v/conda-forge/small-text?label=conda-forge)](https://anaconda.org/conda-forge/small-text)\n[![codecov](https://codecov.io/gh/webis-de/small-text/branch/master/graph/badge.svg?token=P86CPABQOL)](https://codecov.io/gh/webis-de/small-text)\n[![Documentation Status](https://readthedocs.org/projects/small-text/badge/?version=v2.0.0.dev2)](https://small-text.readthedocs.io/en/v2.0.0.dev2/) \n![Maintained 
Yes](https://img.shields.io/badge/maintained-yes-green)\n[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen)](CONTRIBUTING.md)\n[![MIT License](https://img.shields.io/github/license/webis-de/small-text)](LICENSE)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.15163677.svg)](https://zenodo.org/records/15163677)\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"450\" src=\"https://github.com/webis-de/small-text/blob/dev/docs/_static/small-text-logo.png?raw=true\" alt=\"small-text logo\" /\u003e\n\u003c/p\u003e\n\n\u003e Active Learning for Text Classification in Python.\n\u003chr\u003e\n\n[Installation](#installation) | [Quick Start](#quick-start) | [Contribution](CONTRIBUTING.md) | [Changelog][changelog] | [**Docs**][documentation_main]\n\nSmall-Text provides state-of-the-art **Active Learning** for Text Classification. \nSeveral pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided, \nwhich can be easily mixed and matched to build active learning experiments or applications.\n\n## What is Active Learning?\n[Active Learning](https://small-text.readthedocs.io/en/latest/active_learning.html) allows you to efficiently label training data for supervised learning in a scenario where you have little to no labeled data.\n\n\u003cp align=\"center\"\u003e\n\n\u003cimg src=\"https://raw.githubusercontent.com/webis-de/small-text/dev/docs/_static/learning-curve-example.gif?raw=true\" alt=\"Learning curve example for the TREC-6 dataset.\" width=\"60%\"\u003e\n\n\u003c/p\u003e\n\n\n## Features\n\n- Provides unified interfaces for Active Learning so that you can \n  easily mix and match query strategies with classifiers provided by [sklearn](https://scikit-learn.org/), [PyTorch](https://pytorch.org/), or [transformers](https://github.com/huggingface/transformers).\n- Supports GPU-based [PyTorch](https://pytorch.org/) models and integrates 
[transformers](https://github.com/huggingface/transformers) \n  so that you can use state-of-the-art Text Classification models for Active Learning.\n- GPU is supported but not required. For CPU-only use cases, \n  a lightweight installation requires only a minimal set of dependencies.\n- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).\n\n---\n\n## News\n\n**Version 2.0.0 dev2** ([v2.0.0.dev2][changelog_2.0.0dev2]) - April 6th, 2025\n  - This is a development release with the most changes so far. You can consider it an alpha release, which does not guarantee stable interfaces yet \n    but is otherwise ready to use.\n  - Version 2.0.0 offers cleaned-up interfaces, new query strategies, improved classifiers, and new functionality such as vector indices. See the [changelog][changelog_2.0.0dev2] for a full list of changes.\n\n**Version 1.4.1** ([v1.4.1][changelog_1.4.1]) - August 18th, 2024\n  - Bugfix release.\n\n**Version 1.4.0** ([v1.4.0][changelog_1.4.0]) - June 9th, 2024\n  - New query strategy: [AnchorSubsampling](https://small-text.readthedocs.io/en/v1.3.3/components/query_strategies.html#small_text.query_strategies.subsampling.AnchorSubsampling) (aka [AnchorAL](https://arxiv.org/abs/2404.05623)).  \n    Special thanks to [Pietro Lesci](https://github.com/pietrolesci) for the correspondence and code review. \n\n**Paper published at EACL 2023 🎉**\n  - The [paper][paper_published] introducing small-text has been accepted at [EACL 2023](https://2023.eacl.org/). Meet us at the conference in May!\n  - Update: the paper was awarded [EACL Best System Demonstration](https://aclanthology.org/2023.eacl-demo.11/). 
Thank you for your support!\n\n[For a complete list of changes, see the changelog.][changelog]\n\n---\n\n## Installation\n\nSmall-Text can be easily installed via pip:\n\n```bash\npip install small-text\n```\n\nThe command results in a [slim installation][documentation_install] with only the necessary dependencies. \nFor a full installation via pip, you just need to include the `transformers` extra requirement:\n\n```bash\npip install small-text[transformers]\n```\n\nThe library requires Python 3.9 or newer. To use the GPU, CUDA 10.1 or newer is required. \nMore information regarding the installation can be found in the \n[documentation][documentation_install].\n\n\n## Quick Start\n\nFor a quick start, see the provided examples for [binary classification](examples/examplecode/binary_classification.py),\n[PyTorch multi-class classification](examples/examplecode/pytorch_multiclass_classification.py), and \n[transformer-based multi-class classification](examples/examplecode/transformers_multiclass_classification.py),\nor check out the notebooks.\n\n### Notebooks\n\n\u003cdiv align=\"center\"\u003e\n\n| # | Notebook                                                                                                                                                                                                       |                                                                                                                                                                                                                                                  |\n| --- 
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| \n| 1 | [Intro: Active Learning for Text Classification with Small-Text](https://github.com/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/01-active-learning-for-text-classification-with-small-text-intro.ipynb) |\n| 2 | [Using Stopping Criteria for Active Learning](https://github.com/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb)                                           | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/02-active-learning-with-stopping-criteria.ipynb)                        |\n| 3 | [Active Learning using SetFit](https://github.com/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/03-active-learning-with-setfit.ipynb)                                                                     | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/03-active-learning-with-setfit.ipynb)                                   |\n| 4 | [Using SetFit's Zero Shot Capabilities for Cold Start 
Initialization](https://github.com/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/04-zero-shot-cold-start.ipynb)                                     | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/webis-de/small-text/blob/v2.0.0.dev2/examples/notebooks/04-zero-shot-cold-start.ipynb)                                          |\n\n\u003c/div\u003e\n\n### Showcase\n\n- [Tutorial: 👂 Active learning for text classification with small-text][argilla_al_tutorial] (Use small-text conveniently from the [argilla][argilla] UI.)\n\nA full list of showcases can be found [in the docs][documentation_showcase].\n\n🎀 **Would you like to share your use case?** Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add it to the [showcase section][documentation_showcase] or even feature it here.\n\n## Documentation\n\nRead the latest documentation [here][documentation_main]. 
Noteworthy pages include:\n\n- [Overview of Query Strategies][documentation_query_strategies]\n- [Reproducibility Notes][documentation_reproducibility_notes]\n\n---\n\n## Scope of Features\n\n\u003ctable align=\"center\"\u003e\n  \u003ccaption\u003eExtension of Table 1 in the \u003ca href=\"https://aclanthology.org/2023.eacl-demo.11v2.pdf\" target=\"_blank\"\u003eEACL 2023 paper\u003c/a\u003e.\u003c/caption\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eName\u003c/th\u003e\n      \u003cth colspan=\"2\"\u003eActive Learning\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003cth\u003e\u003c/th\u003e\n      \u003cth\u003eQuery Strategies\u003c/th\u003e\n      \u003cth\u003eStopping Criteria\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003esmall-text v1.3.0\u003c/td\u003e\n      \u003ctd\u003e14\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003esmall-text v2.0.0\u003c/td\u003e\n      \u003ctd\u003e19\u003c/td\u003e\n      \u003ctd\u003e5\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nWe use the numbers only to show the tremendous progress that small-text has made over time. \nThere are many features and improvements that are not reflected in these numbers.\n\n## Alternatives\n\n[modAL](https://github.com/modAL-python/modAL), [ALiPy](https://github.com/NUAA-AL/ALiPy), [libact](https://github.com/ntucllab/libact), [ALToolbox](https://github.com/AIRI-Institute/al_toolbox)\n\n---\n\n## Contribution\n\nContributions are welcome. Details can be found in [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## Acknowledgments\n\nThis software was created by Christopher Schröder ([@chschroeder](https://github.com/chschroeder)) at Leipzig University's [NLP group](http://asv.informatik.uni-leipzig.de/), \nwhich is a part of the [Webis](https://webis.de/) research network. 
\nThe encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.\n\n## Citation\n\nSmall-Text was introduced in detail in the EACL 2023 System Demonstration paper [\"Small-Text: Active Learning for Text Classification in Python\"](https://aclanthology.org/2023.eacl-demo.11/), which can be cited as follows:\n```\n@inproceedings{schroeder2023small-text,\n    title = \"Small-Text: Active Learning for Text Classification in Python\",\n    author = {Schr{\\\"o}der, Christopher  and  M{\\\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},\n    booktitle = \"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\",\n    month = may,\n    year = \"2023\",\n    address = \"Dubrovnik, Croatia\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2023.eacl-demo.11\",\n    pages = \"84--95\"\n}\n```\n\n## License\n\n[MIT License](LICENSE)\n\n\n[documentation_main]: https://small-text.readthedocs.io/en/v2.0.0.dev2/\n[documentation_install]: https://small-text.readthedocs.io/en/v2.0.0.dev2/install.html\n[documentation_query_strategies]: https://small-text.readthedocs.io/en/v2.0.0.dev2/components/query_strategies.html\n[documentation_showcase]: https://small-text.readthedocs.io/en/v2.0.0.dev2/showcase.html\n[documentation_reproducibility_notes]: https://small-text.readthedocs.io/en/v2.0.0.dev2/reproducibility_notes.html\n[changelog]: https://small-text.readthedocs.io/en/latest/changelog.html\n[changelog_1.4.0]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-0-2024-06-09\n[changelog_1.4.1]: https://small-text.readthedocs.io/en/latest/changelog.html#version-1-4-1-2024-08-18\n[changelog_2.0.0dev2]: https://small-text.readthedocs.io/en/latest/changelog.html#version-2-0-0-dev2-2025-04-06\n[argilla]: https://github.com/argilla-io/argilla\n[argilla_al_tutorial]: 
https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-smalltext-activelearning.html\n[paper_published]: https://aclanthology.org/2023.eacl-demo.11v2.pdf\n","funding_links":[],"categories":["public repositories","Python","Categories","3.3 AL in AI Fields - 人工智能背景中的主动学习"],"sub_categories":["Sampling as a step of the publication","📖 Natural Language Processing (NLP)","**Tutorials - 教程**"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebis-de%2Fsmall-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwebis-de%2Fsmall-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwebis-de%2Fsmall-text/lists"}