{"id":30539066,"url":"https://github.com/viktor-shcherb/bond","last_synced_at":"2026-05-14T13:34:59.960Z","repository":{"id":45621430,"uuid":"434498972","full_name":"viktor-shcherb/bond","owner":"viktor-shcherb","description":"Self-training framework for training on noisy datasets","archived":false,"fork":false,"pushed_at":"2022-11-04T15:13:59.000Z","size":5275,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-02T05:40:38.288Z","etag":null,"topics":["named-entity-recognition","ner","pytorch","self-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/viktor-shcherb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-03T07:06:01.000Z","updated_at":"2024-07-18T11:21:36.000Z","dependencies_parsed_at":"2023-01-21T14:00:57.600Z","dependency_job_id":null,"html_url":"https://github.com/viktor-shcherb/bond","commit_stats":null,"previous_names":["viktor-shcherb/bond"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/viktor-shcherb/bond","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viktor-shcherb%2Fbond","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viktor-shcherb%2Fbond/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viktor-shcherb%2Fbond/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viktor-shcherb%2Fbond/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/viktor-shcherb","download_url":"h
ttps://codeload.github.com/viktor-shcherb/bond/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/viktor-shcherb%2Fbond/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33026983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"online","status_checked_at":"2026-05-14T02:00:06.663Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["named-entity-recognition","ner","pytorch","self-learning"],"created_at":"2025-08-27T21:25:03.705Z","updated_at":"2026-05-14T13:34:59.932Z","avatar_url":"https://github.com/viktor-shcherb.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Datasets \nModels are trained on the relabelled CoNLL 2003 dataset provided by [Liang et al.](https://arxiv.org/abs/2006.15509).\nThe dataset was created using knowledge-base (KB) matching; for more details, see [BOND](https://github.com/cliang1453/BOND).\n\nKB-matching scores evaluated against the original train split of CoNLL 2003:\n\n| F1     | Precision | Recall | \n|--------|-----------|--------|\n| 0.7097 | 0.8238    | 0.6233 |\n\nModels were evaluated on both the original test split of CoNLL 2003 and the corrected test split provided by [Wang et al.](https://arxiv.org/pdf/1909.01441v1.pdf)\n\n# Baseline\nThe baseline is a fine-tuned RoBERTa with document-level context. 
As expected, it effectively memorizes the incorrect annotations \nproduced by KB matching. The results (**median** ± std, reported as F1 / Precision / Recall) from 5 runs with different RNG seeds are as follows:\n\n| original test                                         | corrected test                                         |\n|-------------------------------------------------------|--------------------------------------------------------|\n| **71.38** ± 0.56 / **81.50** ± 0.6 / **63.49** ± 0.54 | **71.94** ± 0.54 / **82.59** ± 0.59 / **63.72** ± 0.52 |\n\nExperiment configs: `experiments/configs/baseline/*.json`\n\n# BOND\nBOND is an extremely volatile method that requires careful hyperparameter tuning. The hyperparameters were tuned \nwith RNG seed 42 and evaluated with seeds 1-5. While the best model reached an F1 score of **82.97**\n(`experiments/relabel/bond.json`), the results were less impressive once the seed was changed:\n\n| original test                                          | corrected test                                         |\n|--------------------------------------------------------|--------------------------------------------------------|\n| **77.97** ± 1.67 / **79.92** ± 1.25 / **77.50** ± 2.73 | **78.93** ± 1.71 / **81.21** ± 1.12 / **78.36** ± 2.81 |\n\nThis shows that model performance depends heavily on parameter initialization.\n\nExperiment configs: `experiments/configs/bond/*.json`\n\n# Co-regularization\n\nCo-regularization is a robust method for filtering out noisy annotations.\n\n\n| original test                                          | corrected test                                         |\n|--------------------------------------------------------|--------------------------------------------------------|\n| **77.17** ± 0.23 / **89.64** ± 0.25 / **67.72** ± 0.32 | **77.77** ± 0.21 / **90.78** ± 0.21 / **67.95** ± 0.31 |\n\n\nExperiment configs: `experiments/configs/coregularization/*.json`\n\n# Results reproduction\n\nUse TRIAGE to run any 
experiment config. Use the `--help` option to see the \navailable command-line arguments.\n\n## Example: baseline\n\n```bash\n./init.sh\ntriage experiments/configs/baseline/*.json\n```\n\n# Acknowledgements\nMost of the code is based on the work of Liang et al., [BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision](https://arxiv.org/abs/2006.15509) ([GitHub](https://github.com/cliang1453/BOND)).\n\nThe co-regularization technique was adapted from [wzhouad](https://github.com/wzhouad/NLL-IE).\n\nThe `MarginalCRF` implementation was taken from [kajyuuen](https://github.com/kajyuuen/pytorch-partial-crf).\n\nCrossWeigh dataset weighting was adapted from [ZihanWangKi](https://github.com/ZihanWangKi/CrossWeigh).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviktor-shcherb%2Fbond","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fviktor-shcherb%2Fbond","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fviktor-shcherb%2Fbond/lists"}