{"id":13810717,"url":"https://github.com/HKUST-KnowComp/ComHyper","last_synced_at":"2025-05-14T15:31:08.761Z","repository":{"id":76529636,"uuid":"302668991","full_name":"HKUST-KnowComp/ComHyper","owner":"HKUST-KnowComp","description":"Code for EMNLP'20 paper \"When Hearst Is not Enough: Improving Hypernymy Detection from Corpus with Distributional Models\"","archived":false,"fork":false,"pushed_at":"2020-11-10T06:42:19.000Z","size":320,"stargazers_count":11,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-04-28T05:06:28.007Z","etag":null,"topics":["conceptualization","hypernymy-detection","lexical-semantics","semantic-relations"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HKUST-KnowComp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-10-09T14:41:55.000Z","updated_at":"2024-08-04T03:27:35.768Z","dependencies_parsed_at":"2023-05-24T16:30:24.184Z","dependency_job_id":null,"html_url":"https://github.com/HKUST-KnowComp/ComHyper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUST-KnowComp%2FComHyper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUST-KnowComp%2FComHyper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUST-KnowComp%2FComHyper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HKUST-KnowComp%2FComHyper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HKUST-KnowComp","download_url":"https://codeload.github.com/HKUST-KnowComp/ComHyper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254171707,"owners_count":22026497,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conceptualization","hypernymy-detection","lexical-semantics","semantic-relations"],"created_at":"2024-08-04T03:00:24.068Z","updated_at":"2025-05-14T15:31:03.750Z","avatar_url":"https://github.com/HKUST-KnowComp.png","language":"Python","funding_links":[],"categories":["Hypernymy Discovery \u0026 Lexical Entailment"],"sub_categories":[],"readme":"# ComHyper [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nCode for the EMNLP'20 paper \"When Hearst Is Not Enough: Improving Hypernymy Detection from Corpus with Distributional Models\" ([arXiv](https://arxiv.org/abs/2010.04941v1))\n\nIn a nutshell, ComHyper is a complementary framework for hypernymy detection that targets the blind spots of Hearst pattern-based methods. As shown in the left figure, long-tailed nouns are not well covered by Hearst patterns and thus form non-negligible sparsity types. For such cases, we propose using supervised distributional models to complement the pattern-based models, as shown in the right figure.\n\n
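
One simple way to picture this complementary design is as a dispatcher: score a candidate pair with the pattern-based model when both words are covered by Hearst patterns (the IP words of step 1 below), and fall back to a distributional model when either word is not (an OOP word). The sketch is illustrative only and is not taken from this repository. `ComplementaryScorer`, `toy_pattern_score`, and `toy_distributional_score` are hypothetical names, and the toy scores stand in for a real pattern-based model and for the trained encoders described under Use ComHyper.

```python
from typing import Callable, Dict, Set, Tuple

# A scorer maps a (hyponym, hypernym) candidate pair to a real-valued score.
Scorer = Callable[[str, str], float]


class ComplementaryScorer:
    # Hypothetical sketch, not the repository's implementation:
    # use pattern-based scores when both words occur in Hearst patterns (IP words),
    # and a distributional fallback whenever a pair involves an OOP word.

    def __init__(self, pattern_vocab: Set[str], pattern_score: Scorer,
                 distributional_score: Scorer) -> None:
        self.pattern_vocab = pattern_vocab
        self.pattern_score = pattern_score
        self.distributional_score = distributional_score

    def is_ip(self, word: str) -> bool:
        return word in self.pattern_vocab

    def score(self, hypo: str, hyper: str) -> float:
        if self.is_ip(hypo) and self.is_ip(hyper):
            return self.pattern_score(hypo, hyper)
        return self.distributional_score(hypo, hyper)


# Toy stand-ins with made-up counts (not data from the paper).
counts: Dict[Tuple[str, str], int] = {('dog', 'animal'): 12, ('cat', 'animal'): 9}
vocab = {word for pair in counts for word in pair}


def toy_pattern_score(hypo: str, hyper: str) -> float:
    # A real pattern-based model would use e.g. PPMI or SVD over Hearst counts.
    return float(counts.get((hypo, hyper), 0))


def toy_distributional_score(hypo: str, hyper: str) -> float:
    # A trained encoder would predict this score from the words' corpus contexts.
    return 0.5


scorer = ComplementaryScorer(vocab, toy_pattern_score, toy_distributional_score)
print(scorer.score('dog', 'animal'))      # 12.0 (both words are IP)
print(scorer.score('axolotl', 'animal'))  # 0.5 (axolotl is OOP)
```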
\n\n\u003cp align=\"center\"\u003e\u003cimg width=\"45%\" src=\"img/dis.png\"/\u003e\u003cimg width=\"45%\" src=\"img/framework.png\"/\u003e\u003c/p\u003e\n\n## Use ComHyper\n\n### 1. Download the Hearst pattern files and a corpus.\n\nFirst, prepare the extracted Hearst pattern pairs, such as `hearst_counts.txt.gz` from the [hypernymysuite](https://github.com/facebookresearch/hypernymysuite) repo or `data-concept.zip` from the Microsoft Concept Graph (also known as [Probase](https://concept.research.microsoft.com/Home/Download)). Set the `pattern_filename` parameter in the `config` file to the location of the downloaded file.\n\n```console\nwget https://github.com/facebookresearch/hypernymysuite/raw/master/hearst_counts.txt.gz\ncurl -L \"https://concept.research.microsoft.com/Home/StartDownload\" \u003e data-concept.zip\n```\n\nNext, extract the contexts of the words from a large-scale corpus such as Wikipedia + Gigaword or ukWaC. All contexts of a word should be stored in a single `.txt` file, with one context per line.\n\nPlace the context files of words that appear in the Hearst patterns (**IP words**) in the directory specified by `context` in the `config` file, and those of words that do not appear in any Hearst pattern (**OOP words**) in the directory specified by `context_oov`.\n\n### 2. Train and evaluate ComHyper.\n\nTo train the distributional models supervised by the output of the pattern-based models, three context encoders are provided, one per script:\n\n```console\npython train_word2score.py config/word.cfg\npython train_context2score.py config/context.cfg\npython train_bert2score.py config/bert.cfg\n```\n\nThe same evaluation scripts work for all settings. To reproduce the results, run:\n\n```console\npython evaluation/evaluation_all_context.py ../config/context.cfg\n```\n\nNote that we chose not to report results for the `BERT` encoder in our original paper for efficiency reasons, but we release the relevant code for incorporating effective pre-trained contextualized encoders to further improve performance.\n\n
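
The data preparation in step 1 can be scripted roughly as follows. This is a minimal sketch under several assumptions, not the repository's own preprocessing: the pattern file is taken to be tab-separated with the two words of each pair in its first two columns (verify this against the `hearst_counts.txt.gz` or Probase file you downloaded), contexts are assumed to be already grouped per word, and the helper names `load_pattern_vocab` and `write_context_files`, as well as the `<word>.txt` file naming, are hypothetical. Check the repo's data loaders for the exact file names they expect. The `__main__` block builds a toy pattern file to show IP words landing in `context` and OOP words in `context_oov`.

```python
import csv
import gzip
import os
import tempfile
from typing import Dict, Iterable, Set


def load_pattern_vocab(path: str) -> Set[str]:
    # Hypothetical helper: collect every word that occurs in an extracted pattern pair.
    # Assumes tab-separated rows whose first two columns hold the two words of a pair
    # (e.g. hyponym/hypernym or concept/instance); their order does not matter here.
    opener = gzip.open if path.endswith('.gz') else open
    vocab: Set[str] = set()
    with opener(path, 'rt', encoding='utf-8', newline='') as f:
        for row in csv.reader(f, dialect='excel-tab', quoting=csv.QUOTE_NONE):
            if len(row) >= 2:
                vocab.update(row[:2])
    return vocab


def write_context_files(contexts: Dict[str, Iterable[str]], ip_vocab: Set[str],
                        context_dir: str, context_oov_dir: str) -> None:
    # One <word>.txt file per word (hypothetical naming), one context per line.
    # IP words go to the `context` directory, OOP words to `context_oov`.
    # Words containing path separators would need escaping (not handled here).
    os.makedirs(context_dir, exist_ok=True)
    os.makedirs(context_oov_dir, exist_ok=True)
    for word, sentences in contexts.items():
        target = context_dir if word in ip_vocab else context_oov_dir
        with open(os.path.join(target, word + '.txt'), 'w', encoding='utf-8') as f:
            for sentence in sentences:
                # Collapse internal whitespace so that each context stays on one line.
                print(' '.join(sentence.split()), file=f)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        # Toy pattern file with a single made-up pair.
        pattern_path = os.path.join(tmp, 'toy_pairs.txt.gz')
        with gzip.open(pattern_path, 'wt', encoding='utf-8', newline='') as f:
            csv.writer(f, dialect='excel-tab').writerow(['dog', 'animal', 12])
        vocab = load_pattern_vocab(pattern_path)
        toy_contexts = {
            'dog': ['the dog barked at the mailman'],
            'axolotl': ['an axolotl is a kind of salamander'],
        }
        write_context_files(toy_contexts, vocab,
                            os.path.join(tmp, 'context'), os.path.join(tmp, 'context_oov'))
        print(sorted(os.listdir(os.path.join(tmp, 'context'))))      # ['dog.txt']
        print(sorted(os.listdir(os.path.join(tmp, 'context_oov'))))  # ['axolotl.txt']
```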
Pull requests and emails (cyuaq # cse.ust.hk) are welcome!\n\n## Citation\n\nPlease cite the following paper if you find our method helpful. Thanks!\n\n```bibtex\n@inproceedings{yu-etal-2020-hearst,\n    title = \"When Hearst Is Not Enough: Improving Hypernymy Detection from Corpus with Distributional Models\",\n    author = \"Yu, Changlong and Han, Jialong and Wang, Peifeng and Song, Yangqiu and Zhang, Hongming and Ng, Wilfred and Shi, Shuming\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    month = \"nov\",\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/2020.emnlp-main.502\",\n    pages = \"6208--6217\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKUST-KnowComp%2FComHyper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHKUST-KnowComp%2FComHyper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHKUST-KnowComp%2FComHyper/lists"}