{"id":13754389,"url":"https://github.com/czhang99/SynonymNet","last_synced_at":"2025-05-09T22:32:17.981Z","repository":{"id":39739280,"uuid":"262902311","full_name":"czhang99/SynonymNet","owner":"czhang99","description":"Entity Synonym Discovery via Multipiece Bilateral Context Matching (IJCAI'20) https://arxiv.org/abs/1901.00056","archived":false,"fork":false,"pushed_at":"2023-03-24T22:22:27.000Z","size":15,"stargazers_count":30,"open_issues_count":6,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-16T07:33:31.825Z","etag":null,"topics":["deep-learning","ijcai2020","synonym-detection","synonym-discovery","synonym-matching"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/czhang99.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-11T00:35:22.000Z","updated_at":"2022-11-16T17:48:41.000Z","dependencies_parsed_at":"2024-08-03T09:17:23.284Z","dependency_job_id":null,"html_url":"https://github.com/czhang99/SynonymNet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czhang99%2FSynonymNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czhang99%2FSynonymNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czhang99%2FSynonymNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/czhang99%2FSynonymNet/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/czhang99","download_url":"https://codeload.github.com/czhang99/SynonymNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253336037,"owners_count":21892776,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","ijcai2020","synonym-detection","synonym-discovery","synonym-matching"],"created_at":"2024-08-03T09:01:57.982Z","updated_at":"2025-05-09T22:32:12.963Z","avatar_url":"https://github.com/czhang99.png","language":"Python","readme":"# Entity Synonym Discovery via Multipiece Bilateral Context Matching\n\nThis project provides the source code and data for SynonymNet, a model that detects entity synonyms via multipiece bilateral context matching.\n\nDetails about SynonymNet are available [here](https://arxiv.org/abs/1901.00056); the implementation is based on the Tensorflow library.\n\n## Quick Links\n- [Installation](#installation)\n- [Usage](#usage)\n- [Data](#data)\n- [Reference](#reference)\n\n## Installation\n\nFor training, a GPU is recommended to accelerate training.\n\n### Tensorflow\n\nThe code was developed on Tensorflow 1.5 and also runs on Tensorflow 1.15.0. You can find installation instructions [here](https://www.tensorflow.org/install).\n\n### Dependencies\n\nThe code is written in Python 3.7. Its dependencies are listed in ```requirements.txt```: 
\n\ntensorflow_gpu==1.15.0\u003cbr\u003e\nnumpy==1.14.0\u003cbr\u003e\npandas==0.25.1\u003cbr\u003e\ngensim==3.8.1\u003cbr\u003e\nscikit_learn==0.21.2\n\nYou can install these dependencies with:\n```\npip3 install -r requirements.txt\n```\n## Usage\n* Run the model on the Wikipedia+Freebase dataset with the siamese architecture and the default hyperparameter settings:\u003cbr\u003e\n```cd src```\u003cbr\u003e\n```python3 train_siamese.py --dataset=wiki```\u003cbr\u003e\n\n* For all available hyperparameter settings, use\u003cbr\u003e\n```python3 train_siamese.py -h```\n\n* Run the model on the Wikipedia+Freebase dataset with the triplet architecture and the default hyperparameter settings:\u003cbr\u003e\n```cd src```\u003cbr\u003e\n```python3 train_triplet.py --dataset=wiki```\u003cbr\u003e\n\n## Data\n### Format\nEach dataset is a folder under the ```./input_data``` folder, where each sub-folder indicates a train/val/test split:\n```\n./input_data\n└── wiki\n    ├── train\n    |   ├── siamese_contexts.txt\n    |   └── triple_contexts.txt\n    ├── valid\n    |   ├── siamese_contexts.txt\n    |   └── triple_contexts.txt\n    ├── test\n    |   ├── knn-siamese_contexts.txt\n    |   ├── knn_triple_contexts.txt\n    |   ├── siamese_contexts.txt\n    |   └── triple_contexts.txt\n    └── skipgram-vec200-mincount5-win5.bin\n    └── fasttext-vec200-mincount5-win5.bin\n    └── in_vocab (built during training)\n```\nIn each sub-folder,\u003cbr\u003e\n* The ```siamese_contexts.txt``` file contains entities and contexts for the siamese architecture. Each line has five columns, separated by \\t:\n```entity_a \\t entity_b \\t context_a1@@context_a2@@...@@context_an \\t context_b1@@context_b2@@...@@context_bn \\t label```.\u003cbr\u003e\n    * ```entity_a``` and ```entity_b``` indicate two entities, e.g. ```u.s._government||m.01bqks||``` and ```united_states||m.01bqks||```.\n    * The next two columns indicate the contexts of the two entities, e.g. 
```context_a1@@context_a2@@...@@context_an``` indicates n pieces of context in which ```entity_a``` is mentioned; ```@@``` is used to separate contexts.\n    * ```label``` is a binary value indicating synonymity.\n\n* The ```triple_contexts.txt``` file contains entities and contexts for the triplet architecture. Each line has six columns, separated by \\t:\n```entity_a \\t entity_pos \\t entity_neg \\t context_a1@@context_a2@@...@@context_an \\t context_pos_1@@context_pos_2@@...@@context_pos_p \\t context_neg_1@@context_neg_2@@...@@context_neg_q```,\u003cbr\u003e\nwhere ```entity_a``` denotes an entity, ```entity_pos``` denotes a synonym of ```entity_a```, and ```entity_neg``` denotes a negative sample for ```entity_a```.\n\n* ```*-vec200-mincount5-win5.bin``` is a binary file that stores pre-trained word embeddings trained on the dataset's corpus.\n\n* ```in_vocab``` is a vocabulary file generated automatically during training.\n\n### Download\nPre-trained word vectors and datasets can be downloaded here:\u003cbr\u003e\n\n| Dataset  | Link |\n| ------------- | ------------- |\n| Wikipedia + Freebase  | https://drive.google.com/open?id=1uX4KU6ws9xIIJjfpH2He-Yl5sPLYV0ws  |\n| PubMed + UMLS  | https://drive.google.com/open?id=1cWHhXVd_Pb4N3EFdpvn4Clk6HVeWKVfF |\n\n### Work on your own data\nPrepare your dataset in a folder following the [format](#format) above, put it under ```./input_data/```, and pass `--dataset=foldername` during training.\n\nFor example, if your dataset is in `./input_data/mydata`, use the flag `--dataset=mydata` for ```train_triplet.py```.\u003cbr\u003e\nYour dataset should be split into three sub-folders, named 'train', 'valid', and 'test' under the default settings of ```train_triplet.py``` and ```train_siamese.py```. 
\n\n## Reference\n```\n@inproceedings{zhang2020entity,\n  title={Entity Synonym Discovery via Multipiece Bilateral Context Matching},\n  author={Zhang, Chenwei and Li, Yaliang and Du, Nan and Fan, Wei and Yu, Philip S},\n  booktitle={Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI)},\n  year={2020}\n}\n```\n","funding_links":[],"categories":["Other_NLP Natural Language Processing"],"sub_categories":["Other_Text Generation and Dialogue"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczhang99%2FSynonymNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fczhang99%2FSynonymNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fczhang99%2FSynonymNet/lists"}