https://github.com/taharallouche/hakeem

Flexible crowdsourced data labeling solutions for scarce and incomplete annotations
crowdsourcing data-science datalabeling python

Host: GitHub
URL: https://github.com/taharallouche/hakeem
Owner: taharallouche
Created: 2021-12-03T12:45:34.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2024-11-09T16:10:08.000Z (11 months ago)
Last Synced: 2024-11-09T16:38:47.045Z (11 months ago)
Topics: crowdsourcing, data-science, datalabeling, python
Language: Python
Homepage:
Size: 248 KB
Stars: 5
Watchers: 1
Forks: 1
Open Issues: 1
Metadata Files:
- Readme: README.md

# :mage_man: hakeem (حَكِيمْ) :mage_man:

Apply state-of-the-art data labelling methods to your own datasets. 🛠️🗃️

## The vote-size-matters collective labelling method

If you have an unlabeled dataset of 📷 images, 🔊 sounds, 🎥 videos, or ✉️ texts, and you have collected crowdsourced annotations that you want to aggregate to deduce the correct label for each instance, then `hakeem` is the solution you're seeking! 🚀

The package implements the size-matters truth tracking principle, 💡 which has consistently shown superior performance compared to other voter-agnostic aggregation rules :chart_with_upwards_trend:. One notable advantage of this method is its reliance on a simple intuition, making the results it produces entirely explainable! :dart:🌟

In fact, the method's key principles include:

1. Granting hesitant voters the flexibility to select more than one possible label. 🤔🔄

2. Relying on mathematically proven [payment schemes](https://proceedings.mlr.press/v37/shaha15.html) to ensure the sincerity of voters. 📊✅

3. Assigning greater weight to voters who choose fewer labels. After all, a voter who knows the correct label would likely choose only that label, whereas a voter who selects many labels probably doesn't know the correct answer. ⚖️

Various weighting schemes are provided to the user, with each one being optimal under different assumptions. The choice of the right scheme is yours to make!
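
The third principle can be illustrated with a minimal sketch in plain Python. This is not `hakeem`'s API: the function name `size_weighted_vote` and the `1 / size` weight are hypothetical choices made only for illustration, and the weighting schemes shipped with the package may differ.

```python
from collections import defaultdict


def size_weighted_vote(ballots, weight=lambda size: 1.0 / size):
    """Aggregate approval ballots for a single instance.

    ballots: iterable of sets of labels, one set per voter.
    weight:  hypothetical function mapping a ballot's size to its weight
             (an illustrative assumption, not one of hakeem's schemes).
    """
    scores = defaultdict(float)
    for ballot in ballots:
        if not ballot:
            continue  # an empty ballot carries no information
        w = weight(len(ballot))
        for label in ballot:
            scores[label] += w
    if not scores:
        raise ValueError("no non-empty ballots")
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=scores.get)


# Two confident voters pick only "cat"; three hesitant voters pick two labels.
ballots = [{"cat"}, {"cat"}, {"dog", "fox"}, {"dog", "fox"}, {"dog", "fox"}]

size_weighted = size_weighted_vote(ballots)
unweighted = size_weighted_vote(ballots, weight=lambda size: 1.0)
```

With equal weights, "dog" and "fox" each collect three approvals against two for "cat". Once smaller ballots weigh more, the two single-label votes for "cat" outweigh the three hesitant ballots, which is the intuition behind the size-matters principle.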

## Installation

You can install the `hakeem` package directly from `PyPI` using `pip`:

```bash
pip install hakeem
```

## Note: paper results reproduction

The code for reproducing the original [AAAI-2022 paper](https://ojs.aaai.org/index.php/AAAI/article/view/20403)'s experiments 📚🧪📊, benchmarking the **vote-size-matters** crowdsourcing data labelling method, has been moved to a [dedicated repo](https://github.com/taharallouche/truth-tracking-aaai-2022).
