{"id":24750896,"url":"https://github.com/taharallouche/hakeem","last_synced_at":"2025-10-10T21:32:00.456Z","repository":{"id":40455785,"uuid":"434593322","full_name":"taharallouche/hakeem","owner":"taharallouche","description":"Flexible crowdsourced data labeling solutions for scarce and incomplete annotations","archived":false,"fork":false,"pushed_at":"2024-11-09T16:10:08.000Z","size":254,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-09T16:38:47.045Z","etag":null,"topics":["crowdsourcing","data-science","datalabeling","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taharallouche.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-03T12:45:34.000Z","updated_at":"2024-11-09T16:10:10.000Z","dependencies_parsed_at":"2024-04-25T13:55:49.857Z","dependency_job_id":"21d797d1-aba5-424e-a8ac-a3c58b920017","html_url":"https://github.com/taharallouche/hakeem","commit_stats":null,"previous_names":["taharallouche/crowd-label-py"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taharallouche%2Fhakeem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taharallouche%2Fhakeem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taharallouche%2Fhakeem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taharallouche%2Fhakeem/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taharallouche","download_url":"https://codeload.github.com/taharallouche/hakeem/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235996563,"owners_count":19078469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crowdsourcing","data-science","datalabeling","python"],"created_at":"2025-01-28T09:51:06.562Z","updated_at":"2025-10-10T21:31:55.172Z","avatar_url":"https://github.com/taharallouche.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# :mage_man: hakeem (حَكِيمْ) :mage_man:\n\nApply state-of-the-art data labelling methods to your own datasets.🛠️🗃️\n\n\n## The vote-size-matters collective labelling method\nIf you possess an unlabeled dataset comprising 📷 images, 🔊 sounds, 🎥 videos, or ✉️ texts, and you have collected some crowdsourced annotations with the aim of aggregating them optimally to deduce the correct label for each instance, then `hakeem` is the solution you're seeking! 🚀 \n\nThe package implements the size-matters truth tracking principle, 💡 which has consistently shown superior performance compared to other voter-agnostic aggregation rules :chart_with_upwards_trend:. One notable advantage of this method is its reliance on a simple intuition, making the results it produces entirely explainable! :dart:🌟\n\nIn fact, the method's key principles include:\n1. Granting hesitant voters the flexibility to select more than one possible label. 🤔🔄\n2. 
Relying on mathematically proven [payment schemes](https://proceedings.mlr.press/v37/shaha15.html) to ensure the sincerity of voters.📊✅\n3. Assigning greater weight to voters who choose fewer labels. After all, a voter familiar with the correct label would likely choose only that option, whereas a voter who selects too many labels probably doesn't know the correct answer.⚖️\n\nVarious weighting schemes are provided to the user, with each one being optimal under different assumptions. The choice of the right scheme is yours to make!\n\n## Installation\n\nYou can install the `hakeem` package directly from `PyPI` using `pip`:\n\n```bash\npip install hakeem\n```\n\n## Note: paper results reproduction\nThe code for reproducing the original [AAAI-2022 paper](https://ojs.aaai.org/index.php/AAAI/article/view/20403)'s experiments 📚🧪📊, benchmarking the **vote-size-matters** crowdsourcing data labelling method, has been moved to a [dedicated repo](https://github.com/taharallouche/truth-tracking-aaai-2022).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaharallouche%2Fhakeem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaharallouche%2Fhakeem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaharallouche%2Fhakeem/lists"}