{"id":27833600,"url":"https://github.com/MLSysOps/alaas","last_synced_at":"2025-05-02T11:01:12.602Z","repository":{"id":45265822,"uuid":"492430884","full_name":"HuaizhengZhang/Active-Learning-as-a-Service","owner":"HuaizhengZhang","description":"A scalable \u0026 efficient active learning/data selection system for everyone.","archived":false,"fork":false,"pushed_at":"2024-07-08T18:05:34.000Z","size":1564,"stargazers_count":214,"open_issues_count":10,"forks_count":17,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-26T03:17:07.476Z","etag":null,"topics":["active-learning","automl","deep-learning","machine-learning","mlops","mlsys","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HuaizhengZhang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-05-15T08:35:12.000Z","updated_at":"2024-11-25T08:03:05.000Z","dependencies_parsed_at":"2024-08-03T03:09:50.539Z","dependency_job_id":"872268d6-d475-4e7f-adc0-fc4af3889f93","html_url":"https://github.com/HuaizhengZhang/Active-Learning-as-a-Service","commit_stats":null,"previous_names":["mlsysops/alaas","huaizhengzhang/active-learning-as-a-service","mlsysops/active-learning-as-a-service"],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HuaizhengZhang%2FActive-Learning-as-a-Service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HuaizhengZhang%
2FActive-Learning-as-a-Service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HuaizhengZhang%2FActive-Learning-as-a-Service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HuaizhengZhang%2FActive-Learning-as-a-Service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HuaizhengZhang","download_url":"https://codeload.github.com/HuaizhengZhang/Active-Learning-as-a-Service/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252028283,"owners_count":21682954,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["active-learning","automl","deep-learning","machine-learning","mlops","mlsys","pytorch"],"created_at":"2025-05-02T11:00:36.101Z","updated_at":"2025-05-02T11:01:12.585Z","avatar_url":"https://github.com/HuaizhengZhang.png","language":"Python","funding_links":[],"categories":["3.3 AL in AI Fields - 人工智能背景中的主动学习"],"sub_categories":["**Tutorials - 教程**"],"readme":"# ALaaS: Active Learning as a Service.\n\n![PyPI](https://img.shields.io/pypi/v/alaas?color=green) ![Downloads](https://static.pepy.tech/badge/alaas) ![GitHub Actions Workflow Status](https://img.shields.io/github/actions/workflow/status/MLSysOps/Active-Learning-as-a-Service/test.yml)\n ![GitHub](https://img.shields.io/github/license/MLSysOps/alaas) ![Docker Pulls](https://img.shields.io/docker/pulls/huangyz0918/alaas)\n\n![](./docs/images/logo.svg)\n\nActive Learning as a Service (ALaaS) is a fast and scalable framework for automatically selecting a subset to be 
labeled\nfrom a full dataset in order to reduce labeling cost. It provides an out-of-the-box and standalone experience for users to quickly\nutilize active learning.\n\n\nKey features of ALaaS:\n\n- :hatching_chick: **Easy-to-use** Start the system and employ active learning with \u003c10 lines of code.\n- :rocket: **Fast** Uses stage-level parallelism to achieve over 10x speedup compared with an under-optimized active learning process.\n- :collision: **Elastic** Scales multiple active workers up and down, depending on the number of GPU devices.\n\n*The project is still under active development. You are welcome to join us!*\n\n- [Installation](https://github.com/MLSysOps/Active-Learning-as-a-Service#installation-construction)\n- [Quick Start](https://github.com/MLSysOps/Active-Learning-as-a-Service#quick-start-truck)\n- [ALaaS Server Customization (for Advanced users)](https://github.com/MLSysOps/Active-Learning-as-a-Service#alaas-server-customization-wrench)\n- [Strategy Zoo](https://github.com/MLSysOps/Active-Learning-as-a-Service#strategy-zoo-art)\n- [Citation](https://github.com/MLSysOps/Active-Learning-as-a-Service#citation)\n\n\n## Installation :construction:\n\nYou can install ALaaS from [PyPI](https://pypi.org/project/alaas/):\n\n```bash\npip install alaas\n```\n\nThe ALaaS package contains both client and server parts. 
You can build an active data selection service on your own\nservers or just use the client to perform data selection.\n\n:warning: Deep learning frameworks such as [TensorFlow](https://www.tensorflow.org/) and [PyTorch](https://pytorch.org/) may need to be installed manually, since the version that suits your deployment can differ (the same applies to [transformers](https://pypi.org/project/transformers/) if you are running models from it).\n\nYou can also use [Docker](https://www.docker.com/) to run ALaaS:\n\n```bash\ndocker pull huangyz0918/alaas\n```\n\nand start a service with the following command:\n\n```bash\ndocker run -it --rm -p 8081:8081 \\\n        --mount type=bind,source=\u003cconfig path\u003e,target=/server/config.yml,readonly huangyz0918/alaas:latest\n```\n\n## Quick Start :truck:\n\nAfter installing ALaaS, you can start a local server. Here is the simplest example, which takes only 2 lines of code.\n\n```python\nfrom alaas.server import Server\n\nServer.start()\n```\n\nBy default, the example code starts an HTTP server on port `8081` for image data selection (using PyTorch ResNet-18 for image classification). 
You can then request selection results on your own image dataset. A client-side example looks like this:\n\n\n```bash\ncurl \\\n-X POST http://0.0.0.0:8081/post \\\n-H 'Content-Type: application/json' \\\n-d '{\"data\":[{\"uri\": \"https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png\"},\n            {\"uri\": \"https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png\"},\n            {\"uri\": \"https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png\"},\n            {\"uri\": \"https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png\"},\n            {\"uri\": \"https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png\"}], \n    \"parameters\": {\"budget\": 3},\n    \"execEndpoint\":\"/query\"}'\n```\n\nYou can also use `alaas.Client` to build the query request (for both the `http` and `grpc` protocols):\n\n\n```python\nfrom alaas.client import Client\n\nurl_list = [\n    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane1.png',\n    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane2.png',\n    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane3.png',\n    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane4.png',\n    'https://www.cs.toronto.edu/~kriz/cifar-10-sample/airplane5.png'\n]\nclient = Client('http://0.0.0.0:8081')\nprint(client.query_by_uri(url_list, budget=3))\n```\n\nThe output is a subset of the URIs/data in your input dataset: the samples selected for further labeling.\n\n\n## ALaaS Server Customization :wrench:\n\nYou can start your server in two ways: 1. by input parameters, or 2. 
by YAML configuration.\n\n\n### Input Parameters\n\nYou can customize your server by setting different input parameters:\n\n```python\nfrom alaas.server import Server\n\nServer.start(proto='http',                      # the server proto, can be 'grpc', 'http' and 'https'.\n    port=8081,                                  # the access port of your server.\n    host='0.0.0.0',                             # the access IP address of your server.\n    job_name='default_app',                     # the server name.\n    model_hub='pytorch/vision:v0.10.0',         # the active learning model hub, the server will automatically download it for data selection.\n    model_name='resnet18',                      # the active learning model name (should be available in your model hub).\n    device='cpu',                               # the deploy location/device (can be something like 'cpu', 'cuda' or 'cuda:0'). \n    strategy='LeastConfidence',                 # the selection strategy (read the document to see what ALaaS supports).\n    batch_size=1,                               # the batch size of data processing.\n    replica=1,                                  # the number of workers to select/query data.\n    tokenizer=None,                             # the tokenizer name (should be available in your model hub), only for NLP tasks.\n    transformers_task=None                      # the NLP task name (for Hugging Face [Pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines)), only for NLP tasks.\n)\n```\n\n### YAML Configuration\n\nYou can also start the server from a YAML configuration file:\n\n```python\nfrom alaas import Server\n\n# start the server by an input configuration file.\nServer.start_by_config('path_to_your_configuration.yml')\n```\n\nDetails about building a configuration for your deployment scenarios can be found [here](./docs/configuration.md).\n\n\n\n## Strategy Zoo :art:\n\nALaaS currently supports several active learning 
strategies, shown in the following table:\n\n|Type|Setting|Abbr|Strategy|Year|Reference|\n|:--:|:--:|:--:|:--:|:--:|:--:|\n|Random|Pool|RS|Random Sampling|-|-|\n|Uncertainty|Pool|LC|Least Confidence Sampling|1994|[DD Lewis et al.](https://arxiv.org/pdf/cmp-lg/9407020)|\n|Uncertainty|Pool|MC|Margin Confidence Sampling|2001|[T Scheffer et al.](https://link.springer.com/chapter/10.1007/3-540-44816-0_31)|\n|Uncertainty|Pool|RC|Ratio Confidence Sampling|2009|[B Settles et al.](https://research.cs.wisc.edu/techreports/2009/TR1648.pdf)|\n|Uncertainty|Pool|VRC|Variation Ratios Sampling|1965|[EH Johnson et al.](https://academic.oup.com/sf/article-abstract/44/3/455/2228590?redirectedFrom=fulltext)|\n|Uncertainty|Pool|ES|Entropy Sampling|2009|[B Settles et al.](https://research.cs.wisc.edu/techreports/2009/TR1648.pdf)|\n|Uncertainty|Pool|MSTD|Mean Standard Deviation|2016|[M Kampffmeyer et al.](https://ieeexplore.ieee.org/document/7789580)|\n|Uncertainty|Pool|BALD|Bayesian Active Learning by Disagreement|2017|[Y Gal et al.](https://arxiv.org/abs/1703.02910)|\n|Clustering|Pool|KCG|K-Center Greedy Sampling|2017|[Ozan Sener et al.](https://www.semanticscholar.org/paper/A-Geometric-Approach-to-Active-Learning-for-Neural-Sener-Savarese/82fb7661d892a7412726de6ead14269139d0310c)|\n|Clustering|Pool|KM|K-Means Sampling|2011|[Z Bodó et al.](http://proceedings.mlr.press/v16/bodo11a/bodo11a.pdf)|\n|Clustering|Pool|CS|Core-Set Selection Approach|2018|[Ozan Sener et al.](https://arxiv.org/abs/1708.00489?context=cs)|\n|Diversity|Pool|DBAL|Diverse Mini-batch Sampling|2019|[Fedor Zhdanov](https://arxiv.org/abs/1901.05954)|\n|Adversarial|Pool|DFAL|DeepFool Active Learning|2018|[M Ducoffe et al.](https://arxiv.org/abs/1802.09841)|\n\n\n## Citation\n\nOur tech report on ALaaS is available on [arXiv](https://arxiv.org/abs/2207.09109) and [NeurIPS 2022](https://neurips.cc/media/PosterPDFs/NeurIPS%202022/64402.png?t=1668072815.3376932). 
Please cite as:\n\n```bash\n@article{huang2022active,\n  title={Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI},\n  author={Huang, Yizheng and Zhang, Huaizheng and Li, Yuanming and Lau, Chiew Tong and You, Yang},\n  journal={arXiv preprint arXiv:2207.09109},\n  year={2022}\n}\n```\n\n## Contributors ✨\n\nThanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):\n\n\u003c!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section --\u003e\n\u003c!-- prettier-ignore-start --\u003e\n\u003c!-- markdownlint-disable --\u003e\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"http://huangyz.name\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/15646062?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eYizheng Huang\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"#infra-huangyz0918\" title=\"Infrastructure (Hosting, Build-Tools, etc)\"\u003e🚇\u003c/a\u003e \u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=huangyz0918\" title=\"Tests\"\u003e⚠️\u003c/a\u003e \u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=huangyz0918\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca href=\"https://huaizhengzhang.github.io\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/5894780?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eHuaizheng\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"#content-HuaizhengZhang\" title=\"Content\"\u003e🖋\u003c/a\u003e \u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=HuaizhengZhang\" title=\"Tests\"\u003e⚠️\u003c/a\u003e \u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=HuaizhengZhang\" title=\"Documentation\"\u003e📖\u003c/a\u003e\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://github.com/YuanmingLeee\"\u003e\u003cimg src=\"https://avatars.githubusercontent.com/u/36268431?v=4?s=100\" width=\"100px;\" alt=\"\"/\u003e\u003cbr /\u003e\u003csub\u003e\u003cb\u003eYuanming Li\u003c/b\u003e\u003c/sub\u003e\u003c/a\u003e\u003cbr /\u003e\u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=YuanmingLeee\" title=\"Tests\"\u003e⚠️\u003c/a\u003e \u003ca href=\"https://github.com/MLSysOps/ALaaS/commits?author=YuanmingLeee\" title=\"Code\"\u003e💻\u003c/a\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003c!-- markdownlint-restore --\u003e\n\u003c!-- prettier-ignore-end --\u003e\n\n\u003c!-- ALL-CONTRIBUTORS-LIST:END --\u003e\n\nThis project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind are welcome!\n\n## Acknowledgement\n\n- [Jina](https://github.com/jina-ai/jina) - Build cross-modal and multimodal applications on the cloud.\n- [Transformers](https://github.com/huggingface/transformers) - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.\n\n\n## License\n\nThe project is available as open source under the terms of the [Apache 2.0 License](./LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMLSysOps%2Falaas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMLSysOps%2Falaas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMLSysOps%2Falaas/lists"}