{"id":15633437,"url":"https://github.com/JieyuZ2/wrench","last_synced_at":"2025-10-13T23:31:07.947Z","repository":{"id":39924329,"uuid":"398994329","full_name":"JieyuZ2/wrench","owner":"JieyuZ2","description":"[NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark","archived":false,"fork":false,"pushed_at":"2024-02-13T23:30:41.000Z","size":1901,"stargazers_count":223,"open_issues_count":11,"forks_count":34,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-25T11:07:25.808Z","etag":null,"topics":["benchmark-framework","data-centric-ai","data-programming","dataset","deep-learning","machine-learning","nlp","robust-learning","sequence-labeling","weak-supervision","weakly-supervised-learning"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2109.11377","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JieyuZ2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-23T06:18:23.000Z","updated_at":"2025-04-27T01:47:24.000Z","dependencies_parsed_at":"2024-02-14T00:27:41.662Z","dependency_job_id":"754adeed-6b56-4f50-8b53-104861354db7","html_url":"https://github.com/JieyuZ2/wrench","commit_stats":{"total_commits":139,"total_committers":8,"mean_commits":17.375,"dds":"0.38129496402877694","last_synced_commit":"af9b77fb919abd57a9bb6f9e49a5febb61cf6a9a"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/JieyuZ2/wrench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JieyuZ2%2Fwrench","tags_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/JieyuZ2%2Fwrench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JieyuZ2%2Fwrench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JieyuZ2%2Fwrench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JieyuZ2","download_url":"https://codeload.github.com/JieyuZ2/wrench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JieyuZ2%2Fwrench/sbom","scorecard":{"id":72715,"data":{"date":"2025-08-11","repo":{"name":"github.com/JieyuZ2/wrench","commit":"2eed5b45cd5f20d04c3372c2a7fc8409f65ba534"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.6,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":4,"reason":"Found 8/18 approved changesets -- score normalized to 4","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous 
patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 20 are checked with a 
SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-15T04:11:14.555Z","repository_id":39924329,"created_at":"2025-08-15T04:11:14.555Z","updated_at":"2025-08-15T04:11:14.555Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279017239,"owners_count":26086015,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark-framework","data-centric-ai","data-programming","dataset","deep-learning","machine-learning","nlp","robust-learning","sequence-labeling","weak-supervision","weakly-supervised-learning"],"created_at":"2024-10-03T10:49:20.363Z","updated_at":"2025-10-13T23:31:07.594Z","avatar_url":"https://github.com/JieyuZ2.png","language":"Python","funding_links":[],"categories":["Dataset and Benchmark"],"sub_categories":[],"readme":"\u003ch1 style=\"text-align:center\"\u003e\n\u003cimg style=\"vertical-align:middle\" width=\"500\" height=\"180\" src=\"./images/wrench_logo.png\" 
/\u003e\n\u003c/h1\u003e\n\n[![made-with-python](https://img.shields.io/badge/Made%20with-Python3-1f425f.svg?color=purple)](https://www.python.org/)\n[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/JieyuZ2/wrench/commits/main)\n[![license](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![repo size](https://img.shields.io/github/repo-size/JieyuZ2/wrench.svg)](https://github.com/JieyuZ2/wrench/archive/master.zip)\n[![Total lines](https://img.shields.io/tokei/lines/github/JieyuZ2/wrench?color=red)](https://github.com/JieyuZ2/wrench)\n![visitors](https://visitor-badge.glitch.me/badge?page_id=JieyuZ2/wrench)\n![GitHub stars](https://img.shields.io/github/stars/JieyuZ2/wrench.svg?color=green)\n![GitHub forks](https://img.shields.io/github/forks/JieyuZ2/wrench?color=9cf)\n[![Arxiv](https://img.shields.io/badge/ArXiv-2109.11377-orange.svg)](https://arxiv.org/abs/2109.11377) \n\n\n## 🔧 New\n**1/25/23**\n1. Add [Hyper label model](https://github.com/JieyuZ2/wrench/tree/main/examples/run_hyper_label_model.py). Please find more details in our [paper](https://openreview.net/forum?id=aCQt_BrkSjC).\n\n\n**4/20/22**\n1. Add [WS explainer](https://github.com/JieyuZ2/wrench/tree/main/examples/run_explainer.py). Please find more details in our [paper](https://openreview.net/forum?id=7CONgGdxsV).\n\n**4/20/22**\n1. We have updated the `setup.py` to make installation more flexible.\n\nPlease use `pip install ws-benchmark==1.1.2rc0` to install the latest version. We strongly suggest creating a new environment to install Wrench. We will bring better compatibility in the next stable release.\nIf you have any problems with installation, please let us know.\n\nKnown incompatibilities:\n\n`tensorflow==2.8.0`, `albumentations==0.1.12`\n\n**3/18/22**\n1. 
Wrench is now available on PyPI as [ws-benchmark](https://pypi.org/project/ws-benchmark/); use `pip install ws-benchmark` for a quick install.\n\n**2/13/22** \n\n1. Add a [script](https://github.com/JieyuZ2/wrench/tree/main/datasets/tabular_data) to generate LFs for any tabular dataset, as well as 5 new tabular datasets: mushroom, spambase, PhishingWebsites, Bioresponse, and bank-marketing.\n\n**11/04/21** \n\n1. (beta) Add `parallel_fit` for torch models to support PyTorch DistributedDataParallel ([example](https://github.com/JieyuZ2/wrench/blob/main/examples/run_torch_ddp.py))\n\n**10/15/21** \n\n1. A bunch of new methods: WeaSEL, ImplyLoss, ASTRA, MeanTeacher, Meta-Weight-Net, Learning-to-Reweight\n2. Support image classification (dataset class / torchvision backbone) as well as DomainNet/Animals-with-Attributes2 datasets (check out the `datasets` folder)\n\n## 🔧 What is it?\n\n**Wrench** is a **benchmark platform** containing diverse weak supervision tasks. It also provides a **common and easy framework** for development and evaluation of your own weak supervision models within the benchmark.\n\nFor more information, check out our publications: \n- [WRENCH: A Comprehensive Benchmark for Weak Supervision](https://arxiv.org/abs/2109.11377) (NeurIPS 2021)\n- [A Survey on Programmatic Weak Supervision](https://arxiv.org/pdf/2202.05433.pdf)\n\nIf you find this repository helpful, feel free to cite our publication:\n\n```\n@inproceedings{\nzhang2021wrench,\ntitle={{WRENCH}: A Comprehensive Benchmark for Weak Supervision},\nauthor={Jieyu Zhang and Yue Yu and Yinghao Li and Yujing Wang and Yaming Yang and Mao Yang and Alexander Ratner},\nbooktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},\nyear={2021},\nurl={https://openreview.net/forum?id=Q9SKS5k8io}\n}\n```\n\n## 🔧 What is weak supervision?\n\n**Weak Supervision** is a paradigm for automated training data creation without manual annotations. 
\n\nFor a brief overview, please check out this [blog](https://www.snorkel.org/blog/weak-supervision).\n\nFor more context, please check out this [survey](https://arxiv.org/abs/2202.05433).\n\nTo track recent advances in weak supervision, please follow this [repo](https://github.com/JieyuZ2/Awesome-Weak-Supervision).\n\n## 🔧 Installation\n[1] Install Anaconda:\nInstructions here: https://www.anaconda.com/download/\n\n[2] Clone the repository:\n```\ngit clone https://github.com/JieyuZ2/wrench.git\ncd wrench\n```\n\n[3] Create a virtual environment:\n```\nconda env create -f environment.yml\nsource activate wrench\n```\nIf this does not work or you want to use only a subset of Wrench's modules, check out this [wiki page](https://github.com/JieyuZ2/wrench/wiki/Environment-Installation).\n\n[4] Download datasets:\n```python\nfrom huggingface_hub import snapshot_download\npath = \"path to local dir\"\nsnapshot_download(repo_id=\"jieyuz2/WRENCH\", repo_type=\"dataset\", local_dir=path)\n```\n\n## 🔧 Available Datasets\n\n**Note that some datasets may have more training examples than what is reported in the README/paper because we include the dev set, whose indices can be found in labeled_id.json if it exists.**\n\nDocumentation of the dataset format and usage can be found on this [wiki page](https://github.com/JieyuZ2/wrench/wiki/Dataset:-Format-and-Usage).\n\n### classification:\n| Name | Task | # class | # LF | # train | # validation | # test | data source | LF source |\n|:--------|:---------|:------|:------|:------|:-------------|:-------|:------|:------|\n| Census | income classification | 2 | 83 | 10083 | 5561         | 16281  | [link](http://archive.ics.uci.edu/ml/datasets/Census+Income) | [link](https://openreview.net/forum?id=SkeuexBtDr) |\n| Youtube | spam classification | 2 | 10 | 1586 | 120          | 250    | [link](https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection) | [link](https://github.com/snorkel-team/snorkel-tutorials/tree/master/spam) |\n| SMS | spam 
classification | 2 | 73 | 4571 | 500          | 500    | [link](https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection) | [link](https://openreview.net/forum?id=SkeuexBtDr) |\n| IMDB | sentiment classification | 2 | 8 | 20000 | 2500         | 2500   | [link](https://dl.acm.org/doi/10.5555/2002472.2002491) | [link](https://arxiv.org/abs/2010.04582) |\n| Yelp | sentiment classification | 2 | 8 | 30400 | 3800         | 3800   | [link](https://arxiv.org/abs/1509.01626) | [link](https://arxiv.org/abs/2010.04582) |\n| AGNews | topic classification | 4 | 9 | 96000 | 12000        | 12000  | [link](https://arxiv.org/abs/1509.01626) | [link](https://arxiv.org/abs/2010.04582) |\n| TREC | question classification | 6 | 68 | 4965 | 500          | 500    | [link](https://aclanthology.org/C02-1150.pdf) | [link](https://openreview.net/forum?id=SkeuexBtDr) |\n| Spouse | relation classification | 2 | 9 | 22254 | 2801         | 2701   | [link](http://ceur-ws.org/Vol-1568/paper8.pdf) | [link](https://arxiv.org/abs/1711.10160) |\n| SemEval | relation classification | 9 | 164 | 1749 | 178            | 600      | [link](https://aclanthology.org/S10-1006/) | [link](https://arxiv.org/abs/1909.02177) |\n| CDR | bio relation classification | 2 | 33 | 8430 | 920          | 4673   | [link](https://pubmed.ncbi.nlm.nih.gov/27651457/) | [link](https://arxiv.org/abs/1711.10160) |\n| Chemprot | chemical relation classification | 10 | 26 | 12861 | 1607         | 1607   | [link](https://www.semanticscholar.org/paper/Overview-of-the-BioCreative-VI-chemical-protein-Krallinger-Rabal/eed781f498b563df5a9e8a241c67d63dd1d92ad5) | [link](https://arxiv.org/abs/2010.07835) |\n| Commercial | video frame classification | 2 | 4 | 64130 | 9479         | 7496   | [link](https://arxiv.org/pdf/2002.11955.pdf) | [link](https://arxiv.org/abs/2002.11955) |\n| Tennis Rally | video frame classification | 2 | 6 | 6959 | 746          | 1098   | [link](https://arxiv.org/pdf/2002.11955.pdf) | 
[link](https://arxiv.org/abs/2002.11955) |\n| Basketball | video frame classification | 2 | 4 | 17970 | 1064         | 1222   | [link](https://arxiv.org/pdf/2002.11955.pdf) | [link](https://arxiv.org/abs/2002.11955) |\n| [DomainNet](https://github.com/JieyuZ2/wrench/tree/main/datasets/domainnet) | image classification | - | - | - | -            | -      | [link](https://arxiv.org/pdf/1812.01754.pdf) | [link](http://cs.brown.edu/people/sbach/files/mazzetto-icml21.pdf) |\n\n### sequence tagging:\n| Name | # class | # LF | # train | # validation | # test | data source | LF source |\n|:--------|:---------|:------|:------|:------|:------|:------|:------|\n| CoNLL-03 | 4 | 16 | 14041 | 3250 | 3453 | [link](https://arxiv.org/abs/cs/0306050) | [link](https://arxiv.org/abs/2004.14723) |\n| WikiGold | 4 | 16 | 1355 | 169 | 170 | [link](https://dl.acm.org/doi/10.5555/1699765.1699767) | [link](https://arxiv.org/abs/2004.14723) |\n| OntoNotes 5.0 | 18 | 17 | 115812 | 5000 | 22897 | [link](https://catalog.ldc.upenn.edu/LDC2013T19) | [link]() |\n| BC5CDR | 2 | 9 | 500 | 500 | 500 | [link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4860626/) | [link](https://arxiv.org/abs/2105.12848) |\n| NCBI-Disease  | 1 | 5 | 592 | 99 | 99 | [link](https://pubmed.ncbi.nlm.nih.gov/24393765/) | [link](https://arxiv.org/abs/2105.12848) |\n| Laptop-Review | 1 | 3 | 2436 | 609 | 800 | [link](https://aclanthology.org/S14-2004/) | [link](https://arxiv.org/abs/2105.12848) |\n| MIT-Restaurant | 8 | 16 | 7159 | 500 | 1521 | [link](https://groups.csail.mit.edu/sls/publications/2013/Liu_ASRU_2013.pdf) | [link]() |\n| MIT-Movies | 12 | 7 | 9241 | 500 | 2441 | [link](http://people.csail.mit.edu/jrg/2013/liu-icassp13.pdf) | [link]() |\n\n\nDetailed documentation is coming soon.\n\n## 🔧 Available Models\n\n**If you find that any of the implementations is wrong or problematic, don't hesitate to open an issue or pull request. We really appreciate it!**\n\nTODO-list: check 
[this](https://github.com/JieyuZ2/wrench/wiki/TODO-List) out! \n\n### classification:\n| Model                    | Model Type  | Reference                                            | Link to Wrench                                                                                |\n|:-------------------------|:------------|:-----------------------------------------------------|:----------------------------------------------------------------------------------------------|\n| Majority Voting          | Label Model | --                                                   | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/majority_voting.py#L44)  |\n| Weighted Majority Voting | Label Model | --                                                   | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/majority_voting.py#L14)  |\n| Dawid-Skene              | Label Model | [link](https://www.jstor.org/stable/2346806)         | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/dawid_skene.py#L15)      |\n| Data Programming         | Label Model | [link](https://arxiv.org/abs/1605.07723)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/generative_model.py#L18) |\n| MeTaL                    | Label Model | [link](https://arxiv.org/abs/1810.02840)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/snorkel.py#L17)          |\n| FlyingSquid              | Label Model | [link](https://arxiv.org/pdf/2002.11955)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/flyingsquid.py#L16)      |\n| EBCC                     | Label Model | [link](https://proceedings.mlr.press/v97/li19i.html) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/ebcc.py#L12)             |\n| IBCC                     | Label Model | [link](https://proceedings.mlr.press/v97/li19i.html) | 
[link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/ibcc.py#L11)             |\n| FABLE                    | Label Model | [link](https://arxiv.org/abs/2210.02724)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/fable.py#L331)           |\n|Hyper Label Model         | Label Model |  [link](https://openreview.net/forum?id=aCQt_BrkSjC) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/labelmodel/hyper_label_model.py)           |\n| Logistic Regression      | End Model   | --                                                   | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/endmodel/linear_model.py#L52)       |\n | MLP                      | End Model   | --                                                   | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/endmodel/neural_model.py#L21)       |\n | BERT                     | End Model   | [link](https://huggingface.co/models)                | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/endmodel/bert_model.py#L23)         |\n | COSINE                   | End Model   | [link](https://arxiv.org/abs/2010.07835)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/endmodel/cosine.py#L68)             |\n| ARS2                     | End Model   | [link](https://arxiv.org/abs/2210.03092)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/endmodel/ars2.py#L53)               |\n | Denoise                  | Joint Model | [link](https://arxiv.org/abs/2010.04582)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/classification/denoise.py#L72)      |\n | WeaSEL                   | Joint Model | [link](https://arxiv.org/abs/2107.02233)             | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/classification/weasel.py#L72)       |\n | SepLL                   | Joint Model | [link](https://arxiv.org/abs/2210.13898)             | 
[link](https://github.com/JieyuZ2/wrench/blob/main/wrench/classification/sepll.py#L120)        |\n \n\n### sequence tagging:\n| Model | Model Type | Reference | Link to Wrench |\n|:--------|:---------|:------|:------|\n| Hidden Markov Model | Label Model | [link](https://arxiv.org/abs/2004.14723) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seq_labelmodel/hmm.py#L81) |\n| Conditional Hidden Markov Model | Label Model | [link](https://arxiv.org/abs/2105.12848) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seq_labelmodel/chmm.py#L33) |\n| LSTM-CNNs-CRF | End Model | [link](https://arxiv.org/abs/1603.01354) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seq_endmodel/lstm_crf_model.py#L86) |\n| BERT-CRF | End Model | [link](https://huggingface.co/models) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seq_endmodel/bert_crf_model.py#L23) |\n| LSTM-ConNet | Joint Model | [link](https://arxiv.org/abs/1910.04289) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seqtagging/connet.py#L45) |\n| BERT-ConNet | Joint Model | [link](https://arxiv.org/abs/1910.04289) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/seqtagging/connet.py#L210) |\n\n### classification-to-sequence-tagging wrapper:\nWrench also provides a [`SeqLabelModelWrapper`](https://github.com/JieyuZ2/wrench/blob/main/wrench/seq_labelmodel/seq_wrapper.py#L43) that adapts a classification label model to sequence tagging tasks.\n\n### methods from related domains:\n\n#### Robust Learning methods as end model:\n\n| Model | Model Type | Reference | Link to Wrench |\n|:--------|:---------|:------|:------|\n| Meta-Weight-Net | End Model | [link](https://arxiv.org/abs/1902.07379) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/metalearning/meta_weight_net.py#L34) |\n| Learning2ReWeight | End Model | [link](https://arxiv.org/abs/1803.09050) | 
[link](https://github.com/JieyuZ2/wrench/blob/main/wrench/metalearning/learn_to_reweight.py#L20) |\n\n#### Semi-Supervised Learning methods as end model:\n\n| Model | Model Type | Reference | Link to Wrench |\n|:--------|:---------|:------|:------|\n| MeanTeacher | End Model | [link](https://arxiv.org/abs/1703.01780) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/semisupervisedlearning/meanteacher.py#L61) |\n\n#### Weak Supervision with cleaned labels (Semi-Weak Supervision):\n\n| Model | Model Type | Reference | Link to Wrench |\n|:--------|:---------|:------|:------|\n| ImplyLoss | Joint Model | [link](https://arxiv.org/abs/2004.06025) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/classification/implyloss.py#L42) |\n| ASTRA | Joint Model | [link](https://www.microsoft.com/en-us/research/publication/self-training-weak-supervision-astra/) | [link](https://github.com/JieyuZ2/wrench/blob/main/wrench/classification/astra.py#L87) |\n\n# 🔧  Quick examples\n\n\n## 🔧  Label model with parallel grid search for hyper-parameters\n\n```python\nimport logging\nimport numpy as np\nimport pprint\n\nfrom wrench.dataset import load_dataset\nfrom wrench._logging import LoggingHandler\nfrom wrench.search import grid_search\nfrom wrench import labelmodel \nfrom wrench.evaluation import AverageMeter\n\n#### Just some code to print debug information to stdout\nlogging.basicConfig(format='%(asctime)s - %(message)s',\n                    datefmt='%Y-%m-%d %H:%M:%S',\n                    level=logging.INFO,\n                    handlers=[LoggingHandler()])\nlogger = logging.getLogger(__name__)\n\n#### Load dataset \ndataset_home = '../datasets'\ndata = 'youtube'\ntrain_data, valid_data, test_data = load_dataset(dataset_home, data, extract_feature=False)\n\n\n#### Specify the hyper-parameter search space for grid search\nsearch_space = {\n    'Snorkel': {\n        'lr': np.logspace(-5, -1, num=5, base=10),\n        'l2': np.logspace(-5, -1, num=5, base=10),\n     
   'n_epochs': [5, 10, 50, 100, 200],\n    }\n}\n\n#### Initialize label model\nlabel_model_name = 'Snorkel'\nlabel_model = getattr(labelmodel, label_model_name)\n\n#### Search best hyper-parameters using validation set in parallel\nn_trials = 100\nn_repeats = 5\ntarget = 'acc'\nsearched_paras = grid_search(label_model(), dataset_train=train_data, dataset_valid=valid_data,\n                             metric=target, direction='auto', search_space=search_space[label_model_name],\n                             n_repeats=n_repeats, n_trials=n_trials, parallel=True)\n\n#### Evaluate the label model with searched hyper-parameters and average meter\nmeter = AverageMeter(names=[target])\nfor i in range(n_repeats):\n    model = label_model(**searched_paras)\n    history = model.fit(dataset_train=train_data, dataset_valid=valid_data)\n    metric_value = model.test(test_data, target)\n    meter.update(target=metric_value)\n\nmetrics = meter.get_results()\npprint.pprint(metrics)\n```\n\nFor detailed guidance of `grid_search`, please check out [this wiki page](https://github.com/JieyuZ2/wrench/wiki/Hyperparameter-Search).\n\n\n## 🔧  Run a standard supervised learning pipeline\n\n```python\nimport logging\nimport torch\n\nfrom wrench.dataset import load_dataset\nfrom wrench._logging import LoggingHandler\nfrom wrench.endmodel import MLPModel\n\n#### Just some code to print debug information to stdout\nlogging.basicConfig(format='%(asctime)s - %(message)s',\n                    datefmt='%Y-%m-%d %H:%M:%S',\n                    level=logging.INFO,\n                    handlers=[LoggingHandler()])\nlogger = logging.getLogger(__name__)\n\n#### Load dataset \ndataset_home = '../datasets'\ndata = 'youtube'\n\n#### Extract data features using pre-trained BERT model and cache it\nextract_fn = 'bert'\nmodel_name = 'bert-base-cased'\ntrain_data, valid_data, test_data = load_dataset(dataset_home, data, extract_feature=True, extract_fn=extract_fn,\n                                          
       cache_name=extract_fn, model_name=model_name)\n\n\n#### Train an MLP classifier\ndevice = torch.device('cuda:0')\nn_steps = 100000\nbatch_size = 128\ntest_batch_size = 1000 \npatience = 200\nevaluation_step = 50\ntarget = 'acc'\n\nmodel = MLPModel(n_steps=n_steps, batch_size=batch_size, test_batch_size=test_batch_size)\nhistory = model.fit(dataset_train=train_data, dataset_valid=valid_data, device=device, metric=target, \n                    patience=patience, evaluation_step=evaluation_step)\n\n#### Evaluate the trained model\nmetric_value = model.test(test_data, target)\n```\n\n## 🔧  Build a two-stage weak supervision pipeline\n\n```python\nimport logging\nimport torch\n\nfrom wrench.dataset import load_dataset\nfrom wrench._logging import LoggingHandler\nfrom wrench.endmodel import MLPModel\nfrom wrench.labelmodel import MajorityVoting\n\n#### Just some code to print debug information to stdout\nlogging.basicConfig(format='%(asctime)s - %(message)s',\n                    datefmt='%Y-%m-%d %H:%M:%S',\n                    level=logging.INFO,\n                    handlers=[LoggingHandler()])\nlogger = logging.getLogger(__name__)\n\n#### Load dataset \ndataset_home = '../datasets'\ndata = 'youtube'\n\n#### Extract data features using pre-trained BERT model and cache it\nextract_fn = 'bert'\nmodel_name = 'bert-base-cased'\ntrain_data, valid_data, test_data = load_dataset(dataset_home, data, extract_feature=True, extract_fn=extract_fn,\n                                                 cache_name=extract_fn, model_name=model_name)\n\n#### Generate soft training labels via a label model\n#### The weak labels provided by supervision sources are already encoded in the dataset object\nlabel_model = MajorityVoting()\nlabel_model.fit(train_data, valid_data)\nsoft_label = label_model.predict_proba(train_data)\n\n\n#### Train an MLP classifier with soft labels\ndevice = torch.device('cuda:0')\nn_steps = 100000\nbatch_size = 128\ntest_batch_size = 1000 \npatience = 
200\nevaluation_step = 50\ntarget = 'acc'\n\nmodel = MLPModel(n_steps=n_steps, batch_size=batch_size, test_batch_size=test_batch_size)\nhistory = model.fit(dataset_train=train_data, dataset_valid=valid_data, y_train=soft_label, \n                    device=device, metric=target, patience=patience, evaluation_step=evaluation_step)\n\n#### Evaluate the trained model\nmetric_value = model.test(test_data, target)\n\n#### We can also train an MLP classifier with hard labels\nfrom snorkel.utils import probs_to_preds\nhard_label = probs_to_preds(soft_label)\nmodel = MLPModel(n_steps=n_steps, batch_size=batch_size, test_batch_size=test_batch_size)\nmodel.fit(dataset_train=train_data, dataset_valid=valid_data, y_train=hard_label, \n          device=device, metric=target, patience=patience, evaluation_step=evaluation_step)\n```\n\n## 🔧  Procedural labeling function generator\n\n```python\nimport logging\nimport torch\n\nfrom wrench.dataset import load_dataset\nfrom wrench._logging import LoggingHandler\nfrom wrench.synthetic import ConditionalIndependentGenerator, NGramLFGenerator\nfrom wrench.labelmodel import FlyingSquid\n\n#### Just some code to print debug information to stdout\nlogging.basicConfig(format='%(asctime)s - %(message)s',\n                    datefmt='%Y-%m-%d %H:%M:%S',\n                    level=logging.INFO,\n                    handlers=[LoggingHandler()])\nlogger = logging.getLogger(__name__)\n\n\n#### Generate synthetic dataset\ngenerator = ConditionalIndependentGenerator(\n    n_class=2,\n    n_lfs=10,\n    alpha=0.75, # mean accuracy\n    beta=0.1, # mean propensity\n    alpha_radius=0.2, # radius of accuracy\n    beta_radius=0.1 # radius of propensity\n)\ntrain_data = generator.generate_split('train', 10000)\nvalid_data = generator.generate_split('valid', 1000)\ntest_data = generator.generate_split('test', 1000)\n\n#### Evaluate label model on synthetic dataset\nlabel_model = FlyingSquid()\nlabel_model.fit(dataset_train=train_data, 
dataset_valid=valid_data)\ntarget_value = label_model.test(test_data, metric_fn='auc')\n\n#### Load dataset \ndataset_home = '../datasets'\ndata = 'youtube'\n\n#### Load real-world dataset\ntrain_data, valid_data, test_data = load_dataset(dataset_home, data, extract_feature=False)\n\n#### Generate procedural labeling functions\ngenerator = NGramLFGenerator(dataset=train_data, min_acc_gain=0.1, min_support=0.01, ngram_range=(1, 2))\napplier = generator.generate(mode='correlated', n_lfs=10)\nL_test = applier.apply(test_data)\nL_train = applier.apply(train_data)\n\n\n#### Evaluate label model on real-world dataset with semi-synthetic labeling functions\nlabel_model = FlyingSquid()\nlabel_model.fit(dataset_train=L_train, dataset_valid=valid_data)\ntarget_value = label_model.test(L_test, metric_fn='auc')\n```\n\n## 🔧  Contact\n\nContact person: Jieyu Zhang, [jieyuzhang97@gmail.com](mailto:jieyuzhang97@gmail.com)\n\nDon't hesitate to send us an e-mail if you have any questions.\n\nWe're also open to any collaboration!\n\n## 🔧  Contributing Dataset and Model\n\nWe sincerely welcome any contribution to the datasets or models!\n\n## 🔧  Citation\n```\n@inproceedings{\nzhang2021wrench,\ntitle={{WRENCH}: A Comprehensive Benchmark for Weak Supervision},\nauthor={Jieyu Zhang and Yue Yu and Yinghao Li and Yujing Wang and Yaming Yang and Mao Yang and Alexander Ratner},\nbooktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},\nyear={2021},\nurl={https://openreview.net/forum?id=Q9SKS5k8io}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJieyuZ2%2Fwrench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJieyuZ2%2Fwrench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJieyuZ2%2Fwrench/lists"}