{"id":13472801,"url":"https://github.com/Toloka/crowd-kit","last_synced_at":"2025-03-26T17:31:21.531Z","repository":{"id":37798195,"uuid":"343581364","full_name":"Toloka/crowd-kit","owner":"Toloka","description":"Control the quality of your labeled data with the Python tools you already know.","archived":false,"fork":false,"pushed_at":"2025-01-14T07:19:16.000Z","size":1494,"stargazers_count":220,"open_issues_count":2,"forks_count":16,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-17T02:40:28.563Z","etag":null,"topics":["aggregations","annotation","crowd","crowdsourcing","data-mining","data-science","labeling","python","quality-control","toloka","truth-inference"],"latest_commit_sha":null,"homepage":"https://crowd-kit.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Toloka.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-01T23:02:08.000Z","updated_at":"2025-03-13T21:45:04.000Z","dependencies_parsed_at":"2023-10-16T21:59:17.536Z","dependency_job_id":"22044de4-bb20-4ee6-885f-3413e2035fed","html_url":"https://github.com/Toloka/crowd-kit","commit_stats":{"total_commits":211,"total_committers":23,"mean_commits":9.173913043478262,"dds":0.7867298578199052,"last_synced_commit":"139463fbe7d79c443119de65a4f905216e4537c1"},"previous_names":[],"tags_count":24,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Toloka%2Fcrowd-kit","tags_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/Toloka%2Fcrowd-kit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Toloka%2Fcrowd-kit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Toloka%2Fcrowd-kit/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Toloka","download_url":"https://codeload.github.com/Toloka/crowd-kit/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245702250,"owners_count":20658574,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aggregations","annotation","crowd","crowdsourcing","data-mining","data-science","labeling","python","quality-control","toloka","truth-inference"],"created_at":"2024-07-31T16:00:58.119Z","updated_at":"2025-03-26T17:31:21.222Z","avatar_url":"https://github.com/Toloka.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Crowd-Kit: Computational Quality Control for Crowdsourcing\n\n[![Crowd-Kit](https://tlk.s3.yandex.net/crowd-kit/Crowd-Kit-GitHub.png)](https://github.com/Toloka/crowd-kit)\n\n[![PyPI Version][pypi_badge]][pypi_link]\n[![GitHub Tests][github_tests_badge]][github_tests_link]\n[![Codecov][codecov_badge]][codecov_link]\n[![Documentation][docs_badge]][docs_link]\n[![Paper][paper_badge]][paper_link]\n\n[pypi_badge]: https://badge.fury.io/py/crowd-kit.svg\n[pypi_link]: https://pypi.python.org/pypi/crowd-kit\n[github_tests_badge]: 
https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml/badge.svg?branch=main\n[github_tests_link]: https://github.com/Toloka/crowd-kit/actions/workflows/tests.yml\n[codecov_badge]: https://codecov.io/gh/Toloka/crowd-kit/branch/main/graph/badge.svg\n[codecov_link]: https://codecov.io/gh/Toloka/crowd-kit\n[docs_badge]: https://readthedocs.org/projects/crowd-kit/badge/\n[docs_link]: https://crowd-kit.readthedocs.io/\n[paper_badge]: https://joss.theoj.org/papers/10.21105/joss.06227/status.svg\n[paper_link]: https://doi.org/10.21105/joss.06227\n\n**Crowd-Kit** is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.\n\nCurrently, Crowd-Kit contains:\n\n* implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;\n* metrics of uncertainty, consistency, and agreement with the aggregate;\n* loaders for popular crowdsourced datasets.\n\nAlso, the `learning` subpackage contains PyTorch implementations of methods for deep learning from crowds and advanced aggregation algorithms.\n\n## Installing\n\nTo install Crowd-Kit, run the following command: `pip install crowd-kit`. 
If you also want to use the `learning` subpackage, run `pip install crowd-kit[learning]`.\n\nIf you are interested in contributing to Crowd-Kit, use [uv](https://github.com/astral-sh/uv) to manage the dependencies:\n\n```shell\nuv venv\nuv pip install -e '.[dev,docs,learning]'\nuv tool run pre-commit install\n```\n\nWe use [pytest](https://pytest.org/) for testing and a variety of linters, including [pre-commit](https://pre-commit.com/), [Black](https://github.com/psf/black), [isort](https://github.com/pycqa/isort), [Flake8](https://github.com/pycqa/flake8), [pyupgrade](https://github.com/asottile/pyupgrade), and [nbQA](https://github.com/nbQA-dev/nbQA), to simplify code maintenance.\n\n## Getting Started\n\nThis example shows how to use Crowd-Kit for categorical aggregation with the classical Dawid-Skene algorithm.\n\nFirst, import the necessary modules.\n\n````python\nfrom crowdkit.aggregation import DawidSkene\nfrom crowdkit.datasets import load_dataset\n\nimport pandas as pd\n````\n\nThen, you need to read your annotations into a pandas DataFrame with the columns `task`, `worker`, and `label`. 
Alternatively, you can download an example dataset:\n\n````python\ndf = pd.read_csv('results.csv')  # should contain columns: task, worker, label\n# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset\n````\n\nThen, you can aggregate the workers' responses using the `fit_predict` method, which follows the **scikit-learn** convention:\n\n````python\naggregated_labels = DawidSkene(n_iter=100).fit_predict(df)\n````\n\n[More usage examples](https://github.com/Toloka/crowd-kit/tree/main/examples)\n\n## Implemented Aggregation Methods\n\nBelow is the list of currently implemented methods, marked as already available (✅) or in progress (🟡).\n\n### Categorical Responses\n\n| Method | Status |\n| ------------- | :-------------: |\n| [Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.majority_vote.MajorityVote) | ✅ |\n| [One-coin Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.OneCoinDawidSkene) | ✅ |\n| [Dawid-Skene](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.dawid_skene.DawidSkene) | ✅ |\n| [Gold Majority Vote](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.gold_majority_vote.GoldMajorityVote) | ✅ |\n| [M-MSR](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.m_msr.MMSR) | ✅ |\n| [Wawa](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.wawa.Wawa) | ✅ |\n| [Zero-Based Skill](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.zero_based_skill.ZeroBasedSkill) | ✅ |\n| [GLAD](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.glad.GLAD) | ✅ |\n| [KOS](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.kos.KOS) | ✅ |\n| [MACE](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.classification.mace.MACE) | ✅ |\n\n### Multi-Label 
Responses\n\n|Method|Status|\n|-|:-:|\n|[Binary Relevance](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.multilabel.binary_relevance.BinaryRelevance)|✅|\n\n### Textual Responses\n\n| Method | Status |\n| ------------- | :-------------: |\n| [RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.rasa.RASA) | ✅ |\n| [HRRASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.embeddings.hrrasa.HRRASA) | ✅ |\n| [ROVER](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.texts.rover.ROVER) | ✅ |\n\n### Image Segmentation\n\n| Method | Status |\n| ------------------ | :------------------: |\n| [Segmentation MV](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_majority_vote.SegmentationMajorityVote) | ✅ |\n| [Segmentation RASA](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_rasa.SegmentationRASA) | ✅ |\n| [Segmentation EM](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.image_segmentation.segmentation_em.SegmentationEM) | ✅ |\n\n### Pairwise Comparisons\n\n| Method | Status |\n| -------------- | :---------------------: |\n| [Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.bradley_terry.BradleyTerry) | ✅ |\n| [Noisy Bradley-Terry](https://toloka.ai/docs/crowd-kit/reference/crowdkit.aggregation.pairwise.noisy_bt.NoisyBradleyTerry) | ✅ |\n\n### Learning from Crowds\n\n|Method|Status|\n|-|:-:|\n|[CrowdLayer](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.crowd_layer.CrowdLayer)|✅|\n|[CoNAL](https://toloka.ai/docs/crowd-kit/reference/crowdkit.learning.conal.CoNAL)|✅|\n\n## Citation\n\n* Ustalov D., Pavlichenko N., Tseitlin B. (2024). [Learning from Crowds with Crowd-Kit](https://doi.org/10.21105/joss.06227). 
Journal of Open Source Software, 9(96), 6227\n\n```bibtex\n@article{CrowdKit,\n  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},\n  title     = {{Learning from Crowds with Crowd-Kit}},\n  year      = {2024},\n  journal   = {Journal of Open Source Software},\n  volume    = {9},\n  number    = {96},\n  pages     = {6227},\n  publisher = {The Open Journal},\n  doi       = {10.21105/joss.06227},\n  issn      = {2475-9066},\n  eprint    = {2109.08584},\n  eprinttype = {arxiv},\n  eprintclass = {cs.HC},\n  language  = {english},\n}\n```\n\n## Support and Contributions\n\nPlease use [GitHub Issues](https://github.com/Toloka/crowd-kit/issues) to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub in accordance with the guidelines in [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## License\n\n\u0026copy; Crowd-Kit team authors, 2020\u0026ndash;2024. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FToloka%2Fcrowd-kit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FToloka%2Fcrowd-kit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FToloka%2Fcrowd-kit/lists"}