# Active Learning Playground

## Introduction

This is a Python module for experimenting with different active learning
algorithms. There are a few key components to running active learning
experiments:

*   The main experiment script is
    [`run_experiment.py`](run_experiment.py),
    with many flags for different run options.

*   Supported datasets can be downloaded to a specified directory by running
    [`utils/create_data.py`](utils/create_data.py).

*   Supported active learning methods are in
    [`sampling_methods`](sampling_methods/).

Below I will go into each component in more detail.

DISCLAIMER: This is not an official Google product.

## Setup

The dependencies are in [`requirements.txt`](requirements.txt). Please make sure these packages are
installed before running experiments. If GPU-capable `tensorflow` is desired, please follow
the instructions [here](https://www.tensorflow.org/install/).

It is highly recommended that you install all dependencies into a separate `virtualenv` for
easy package management.

## Getting benchmark datasets

By default the datasets are saved to `/tmp/data`.
You can specify another directory via the `--save_dir` flag.

Downloading all the datasets will take a long time, so please be patient.
You can download only a subset of the data by passing a comma-separated
string of datasets via the `--datasets` flag.

## Running experiments

There are a few key flags for
[`run_experiment.py`](run_experiment.py):

*   `dataset`: name of the dataset; must match the save name used in
    `create_data.py` and must also exist in `data_dir`.

*   `sampling_method`: active learning method to use. Must be specified in
    [`sampling_methods/constants.py`](sampling_methods/constants.py).

*   `warmstart_size`: initial batch of uniformly sampled examples to use as seed
    data. A float indicates a percentage of the total training data and an
    integer indicates a raw size.

*   `batch_size`: number of datapoints to request in each batch. A float
    indicates a percentage of the total training data and an integer indicates
    a raw size.

*   `score_method`: model to use to evaluate the performance of the sampling
    method. Must be in the `get_model` method of
    [`utils/utils.py`](utils/utils.py).

*   `data_dir`: directory with saved datasets.

*   `save_dir`: directory to save results.

This is just a subset of all the flags.
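
The float-versus-integer convention shared by `warmstart_size` and `batch_size` can be illustrated with a small helper. This is a hedged sketch and not code from `run_experiment.py`. `resolve_size` is a hypothetical name, and the sketch assumes that a float is read as a fraction of the training set (so `0.02` means 2%) while an integer is already a raw count.

```python
def resolve_size(value, n_train):
    """Turn a size flag into a raw number of examples (hypothetical helper).

    Assumption: a float is a fraction of the n_train training examples,
    and an int is already an absolute count.
    """
    if isinstance(value, bool):
        raise TypeError("size flags must be int or float, not bool")
    if isinstance(value, float):
        if not 0.0 < value <= 1.0:
            raise ValueError("a float size must be a fraction in (0, 1]")
        # Round down, but always request at least one example.
        return max(1, int(value * n_train))
    if isinstance(value, int):
        if value < 1:
            raise ValueError("an integer size must be positive")
        return value
    raise TypeError("size flags must be int or float")


# Example with 5,000 training points.
print(resolve_size(0.02, 5000))  # fraction -> 100
print(resolve_size(250, 5000))   # raw count -> 250
```

The helper clamps the result to at least one example so that a very small fraction does not silently produce an empty batch. The README does not say whether the real script does the same.
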
`run_experiment.py` also has options for preprocessing, introducing labeling
noise, dataset subsampling, and using a different model to select examples than
to score/evaluate them.

## Available active learning methods

All named active learning methods are in
[`sampling_methods/constants.py`](sampling_methods/constants.py).

You can also specify a mixture of active learning methods by following the
pattern `[sampling_method]-[mixture_weight]`, with all terms separated by
dashes, e.g.
`mixture_of_samplers-margin-0.33-informative_diverse-0.33-uniform-0.34`.

Some supported sampling methods include:

*   Uniform: samples are selected via uniform sampling.

*   Margin: uncertainty-based sampling method.

*   Informative and diverse: margin- and cluster-based sampling method.

*   k-center greedy: representative strategy that greedily forms a batch of
    points to minimize the maximum distance between any point and its nearest
    labeled point.

*   Graph density: representative strategy that selects points in dense regions
    of the pool.

*   Exp3 bandit: meta-active learning method that tries to learn the optimal
    sampling method using a popular multi-armed bandit algorithm.

### Adding new active learning methods

Implement either a base sampler that inherits from
[`SamplingMethod`](sampling_methods/sampling_def.py)
or a meta-sampler that calls base samplers and inherits from
[`WrapperSamplingMethod`](sampling_methods/wrapper_sampler_def.py).

The only method that must be implemented by any sampler is `select_batch_`,
which can have arbitrary named arguments. The only restriction is that the name
for the same input must be consistent across all the samplers (i.e. the indices
for already selected examples must have the same name in every sampler).
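
As a rough illustration of this interface, the sketch below defines a hypothetical stand-in for `SamplingMethod` and a k-center greedy sampler built on top of it. The real base class in `sampling_methods/sampling_def.py` and the repository's own k-center implementation may differ. The argument names `already_selected` and `N` are assumptions chosen for illustration. The `**kwargs` catch-all is what lets one sampler ignore named arguments that are meant for other samplers.

```python
import abc

import numpy as np


class SamplingMethodSketch(abc.ABC):
    """Hypothetical stand-in for SamplingMethod; not the repository's class."""

    def __init__(self, X, y, seed, **kwargs):
        self.X = np.asarray(X, dtype=float)
        self.y = y
        self.seed = seed

    @abc.abstractmethod
    def select_batch_(self, **kwargs):
        """Return indices of the examples to label next."""

    def select_batch(self, **kwargs):
        # Every sampler receives the same named arguments; each one picks
        # out what it needs and ignores the rest via **kwargs.
        return self.select_batch_(**kwargs)


class KCenterGreedySketch(SamplingMethodSketch):
    """Greedy k-center: repeatedly pick the point farthest from its nearest center."""

    def select_batch_(self, already_selected, N, **kwargs):
        n = self.X.shape[0]
        centers = [int(i) for i in already_selected]
        if centers:
            # Distance from every point to its nearest already-labeled point.
            diffs = self.X[:, None, :] - self.X[centers][None, :, :]
            min_dist = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
        else:
            min_dist = np.full(n, np.inf)

        rng = np.random.default_rng(self.seed)
        batch = []
        for _ in range(min(N, n - len(set(centers)))):
            if not centers and not batch:
                # No labeled points yet: start from a random point.
                idx = int(rng.integers(n))
            else:
                idx = int(np.argmax(min_dist))
            batch.append(idx)
            # A new center can only shrink each point's nearest-center distance.
            new_dist = np.linalg.norm(self.X - self.X[idx], axis=1)
            min_dist = np.minimum(min_dist, new_dist)
        return batch


X = np.array([[0.0], [1.0], [5.0], [10.0]])
sampler = KCenterGreedySketch(X, y=None, seed=0)
# `model` is not used by this sampler and is simply absorbed by **kwargs.
print(sampler.select_batch(already_selected=[0], N=2, model=None))  # [3, 2]
```
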
Adding a new named argument that hasn't been used in other sampling methods
requires passing it into the `select_batch` call in
[`run_experiment.py`](run_experiment.py).

After implementing your sampler, be sure to add it to
[`constants.py`](sampling_methods/constants.py)
so that it can be called from
[`run_experiment.py`](run_experiment.py).

## Available models

All available models are in the `get_model` method of
[`utils/utils.py`](utils/utils.py).

Supported models:

*   Linear SVM: scikit-learn method with a grid search wrapper for the
    regularization parameter.

*   Kernel SVM: scikit-learn method with a grid search wrapper for the
    regularization parameter.

*   Logistic Regression: scikit-learn method with a grid search wrapper for the
    regularization parameter.

*   Small CNN: 4-layer CNN optimized using RMSprop, implemented in Keras with a
    TensorFlow backend.

*   Kernel Least Squares Classification: block gradient descent solver that can
    use multiple cores, so it is often faster than the scikit-learn kernel SVM.

### Adding new models

New models must follow the scikit-learn API and implement the following methods:

*   `fit(X, y[, sample_weight])`: fit the model to the input features and
    target.

*   `predict(X)`: predict the target for the input features.

*   `score(X, y)`: return the target metric given test features and test
    targets.

*   `decision_function(X)` (optional): return class probabilities, distances to
    decision boundaries, or another metric that the margin sampler can use as a
    measure of uncertainty.

See
[`small_cnn.py`](utils/small_cnn.py)
for an example.

After implementing your new model, be sure to add it to the `get_model` method of
[`utils/utils.py`](utils/utils.py).

Currently models must be added on a one-off basis, and not all scikit-learn
classifiers are supported, because user input is needed on whether and how to
tune the hyperparameters of each model.
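
The interface listed under "Adding new models" can be made concrete with a minimal sketch. The model below is a hypothetical nearest-centroid classifier that uses only NumPy. It is not one of the repository's models and does no hyperparameter tuning. Its `decision_function` returns one column per class, the negative squared distance to each class centroid, so larger values mean more confidence. This assumes that a consumer such as the margin sampler accepts a two-dimensional score array.

```python
import numpy as np


class NearestCentroidSketch:
    """Hypothetical model following the scikit-learn estimator API."""

    def fit(self, X, y, sample_weight=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
        self.classes_ = np.unique(y)
        # One (optionally weighted) mean vector per class.
        self.centroids_ = np.stack([
            np.average(X[y == c], axis=0, weights=w[y == c]) for c in self.classes_
        ])
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Negative squared distance to each centroid, shape (n_samples, n_classes).
        sq_dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return -sq_dist

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]

    def score(self, X, y):
        # Accuracy is used as the target metric here.
        return float(np.mean(self.predict(X) == np.asarray(y)))


X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = [0, 0, 1, 1]
model = NearestCentroidSketch().fit(X, y)
print(model.predict([[1, 1], [9, 9]]))  # [0 1]
print(model.score(X, y))                # 1.0
```

Returning `self` from `fit` follows the scikit-learn convention and allows the chained call used above.
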
That said, it is easy to add a scikit-learn model with a hyperparameter search
wrapped around it as a supported model.

## Collecting results and charting

The
[`utils/chart_data.py`](utils/chart_data.py)
script handles processing of data and charting for a specified dataset and
source directory.