https://github.com/taharallouche/hakeem
Flexible crowdsourced data labeling solutions for scarce and incomplete annotations
crowdsourcing data-science datalabeling python
- Host: GitHub
- URL: https://github.com/taharallouche/hakeem
- Owner: taharallouche
- Created: 2021-12-03T12:45:34.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-11-09T16:10:08.000Z (11 months ago)
- Last Synced: 2024-11-09T16:38:47.045Z (11 months ago)
- Topics: crowdsourcing, data-science, datalabeling, python
- Language: Python
- Homepage:
- Size: 248 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# :mage_man: hakeem (حَكِيمْ) :mage_man:
Apply state-of-the-art data labelling methods to your own datasets. 🛠️🗃️
## The vote-size-matters collective labelling method
If you have an unlabeled dataset of 📷 images, 🔊 sounds, 🎥 videos, or ✉️ texts, and you have collected crowdsourced annotations that you want to aggregate to infer the correct label for each instance, then `hakeem` is the solution you're seeking! 🚀

The package implements the size-matters truth-tracking principle 💡, which has consistently outperformed voter-agnostic aggregation rules :chart_with_upwards_trend:. The method rests on a simple intuition, which makes the results it produces entirely explainable! :dart:🌟
The method's key principles are:
1. Granting hesitant voters the flexibility to select more than one possible label. 🤔🔄
2. Relying on mathematically proven [payment schemes](https://proceedings.mlr.press/v37/shaha15.html) to ensure that voters answer sincerely. 📊✅
3. Assigning greater weight to voters who choose fewer labels. After all, a voter who knows the correct label would likely choose only that option, whereas a voter who selects many labels probably doesn't know the correct answer. ⚖️

Several weighting schemes are provided, each one optimal under different assumptions. The choice of the right scheme is yours to make!
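To make the third principle concrete, here is a minimal sketch in plain Python. It does not use the `hakeem` API: the function `size_weighted_vote` and the `1 / size` weighting are hypothetical choices made purely for illustration, and the weighting schemes shipped with the package may differ.

```python
from collections import defaultdict


def size_weighted_vote(ballots, weight=lambda size: 1.0 / size):
    """Aggregate the approval ballots cast for a single instance.

    ballots: list of sets of labels, one set per voter.
    weight:  hypothetical function mapping a ballot's size to the voter's
             weight (illustrative assumption, not a scheme from hakeem).
    """
    scores = defaultdict(float)
    for ballot in ballots:
        if not ballot:
            continue  # an empty ballot carries no information
        w = weight(len(ballot))
        for label in ballot:
            scores[label] += w
    if not scores:
        raise ValueError("at least one non-empty ballot is required")
    best = max(scores.values())
    # Break ties alphabetically so the result is deterministic.
    return min(label for label, score in scores.items() if score == best)


ballots = [{"cat"}, {"dog", "fox"}, {"dog", "fox", "wolf"}]

# Every voter counts equally: "dog" and "fox" tie with two approvals each.
plain = size_weighted_vote(ballots, weight=lambda size: 1.0)

# Smaller ballots weigh more: the single confident voter now prevails.
weighted = size_weighted_vote(ballots)

print(plain, weighted)
```

With equal weights the hesitant voters outvote the confident one, while the size-based weighting lets the single-label ballot decide the outcome.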
## Installation
You can install the `hakeem` package directly from PyPI using `pip`:
```bash
pip install hakeem
```

## Note: paper results reproduction
The code for reproducing the experiments of the original [AAAI-2022 paper](https://ojs.aaai.org/index.php/AAAI/article/view/20403) 📚🧪📊, which benchmark the **vote-size-matters** crowdsourcing data labelling method, has been moved to a [dedicated repo](https://github.com/taharallouche/truth-tracking-aaai-2022).