https://github.com/cleanlab/multiannotator-benchmarks
Benchmarking algorithms for assessing quality of data labeled by multiple annotators
- Host: GitHub
- URL: https://github.com/cleanlab/multiannotator-benchmarks
- Owner: cleanlab
- License: agpl-3.0
- Created: 2022-06-30T00:32:31.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-07T05:47:50.000Z (over 2 years ago)
- Last Synced: 2025-03-26T20:34:54.623Z (2 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 18 MB
- Stars: 32
- Watchers: 3
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Benchmarking methods for classification data labeled by multiple annotators
Code to reproduce results from the paper:
**[CROWDLAB: Supervised learning to infer consensus labels and quality scores for data with multiple annotators](https://arxiv.org/abs/2210.06812)**
*NeurIPS 2022 Human in the Loop Learning Workshop*

This repository benchmarks algorithms that estimate:
1. A consensus label for each example that aggregates the individual annotations.
2. A confidence score for the correctness of each consensus label.
3. A rating for each annotator which estimates the overall correctness of their labels.

This repository is intended only for scientific purposes.
To apply the CROWDLAB algorithm to your own multi-annotator data, you should instead use [the implementation](https://docs.cleanlab.ai/stable/tutorials/multiannotator.html) from the official [cleanlab](https://github.com/cleanlab/cleanlab) library.

Code to benchmark methods for active learning with multiple data annotators can be found in the [active_learning_benchmarks](active_learning_benchmarks) folder.
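For orientation, the snippet below sketches how the official library's multiannotator API can be applied to your own data. The file names and data-loading details are placeholders, and the exact columns returned may vary across cleanlab versions; see the linked tutorial for the current API.

```
import numpy as np
import pandas as pd
from cleanlab.multiannotator import get_label_quality_multiannotator

# Placeholder inputs -- replace with your own data:
# - labels_multiannotator: (num_examples, num_annotators) DataFrame,
#   NaN where an annotator did not label that example
# - pred_probs: (num_examples, num_classes) out-of-sample predicted
#   class probabilities from any trained classifier
labels_multiannotator = pd.read_csv("labels_multiannotator.csv")  # hypothetical file
pred_probs = np.load("pred_probs.npy")                            # hypothetical file

results = get_label_quality_multiannotator(labels_multiannotator, pred_probs)

# Consensus labels and a quality score for each consensus label:
print(results["label_quality"].head())
# Overall quality rating for each annotator:
print(results["annotator_stats"].head())
```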
## Install Dependencies
To run the model training and benchmark, you need to install the following dependencies:
```
pip install ./cleanlab
pip install ./crowd-kit
pip install -r requirements.txt
```

Note that our `cleanlab/` and `crowd-kit/` folders here contain forks of the [cleanlab](https://github.com/cleanlab/cleanlab) and [crowd-kit](https://github.com/Toloka/crowd-kit) libraries. These forks differ from the main libraries as follows:
- The `cleanlab` fork contains various multi-annotator algorithms studied in the benchmark (to obtain consensus labels and compute consensus and annotator quality scores) that are not present in the main library.
- The `crowd-kit` fork addresses some numeric underflow issues in the original library (needed for properly ranking examples by their quality). Instead of operating directly on probabilities, our fork does calculations on log-probabilities with the log-sum-exp trick (sketched below).
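For illustration, here is a minimal standalone sketch of the log-sum-exp trick (not code from the fork itself): normalizing in log-space avoids the underflow that occurs when many small probabilities are multiplied directly.

```
import numpy as np
from scipy.special import logsumexp

# Per-annotator log-likelihoods for one example over 3 classes (made-up numbers).
# Multiplying the raw likelihoods of many annotators can underflow to 0.0,
# which destroys the ranking of examples by quality.
log_likelihoods = np.array([
    [-700.0, -701.0, -705.0],   # annotator 1
    [-702.0, -700.5, -706.0],   # annotator 2
])

# Combine annotators by summing log-likelihoods (i.e. multiplying likelihoods).
combined = log_likelihoods.sum(axis=0)

# Naive normalization underflows: exp(-1400) == 0.0 in float64, so we get 0/0.
naive = np.exp(combined) / np.exp(combined).sum()   # -> array of NaNs

# Log-sum-exp normalization stays finite and preserves the ranking.
log_posterior = combined - logsumexp(combined)
posterior = np.exp(log_posterior)
print(naive)       # [nan nan nan]
print(posterior)   # well-defined probabilities summing to 1
```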
## Run Benchmarks

To benchmark various multi-annotator algorithms using predictions from already-trained classifier models, run the following notebooks:
1. [benchmark.ipynb](2_benchmark.ipynb) - runs the benchmarks and saves results to csv
2. benchmark_results_[...].ipynb - visualizes benchmark results in plots
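As a usage sketch (the file name and column names below are hypothetical, not taken from the benchmark code), the saved CSV results can be inspected with pandas before plotting:

```
import pandas as pd

# Hypothetical results file written by the benchmark notebook;
# substitute the actual CSV path and columns produced by 2_benchmark.ipynb.
results = pd.read_csv("results/benchmark_results.csv")
print(results.head())

# Example: compare methods by an accuracy-style metric, if such columns exist.
if {"method", "accuracy"} <= set(results.columns):
    print(results.groupby("method")["accuracy"].mean().sort_values(ascending=False))
```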
## Generate Data and Train Classifier Model

To generate the multi-annotator datasets and train the image classifier considered in our benchmarks, run the following notebooks:
1. [preprocess_data.ipynb](0_preprocess_data.ipynb) - preprocesses the dataset
2. [create_labels_df.ipynb](0_create_labels_df.ipynb) - generates correct absolute label paths for images in preprocessed data
3. [xval_model_train.ipynb](1_xval_model_train.ipynb) / [xval_model_train_perfect_model.ipynb](1_xval_model_train_perfect_model.ipynb) - trains a model and obtains predicted class probabilities for each image
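For intuition, out-of-sample predicted class probabilities of the kind produced by the cross-validation training notebooks can be obtained as sketched below. The dataset and classifier here are simple placeholders, not the image model used in the benchmark.

```
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Placeholder data and model standing in for the benchmark's image classifier.
X, y = load_digits(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Out-of-sample predicted probabilities via K-fold cross-validation:
# each example's probabilities come from a model that never saw it in training.
pred_probs = cross_val_predict(model, X, y, cv=5, method="predict_proba")

print(pred_probs.shape)                # (num_examples, num_classes)
np.save("pred_probs.npy", pred_probs)  # save for downstream quality scoring
```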