{"id":13535492,"url":"https://github.com/malllabiisc/RESIDE","last_synced_at":"2025-04-02T01:30:59.586Z","repository":{"id":39738655,"uuid":"144647823","full_name":"malllabiisc/RESIDE","owner":"malllabiisc","description":"EMNLP 2018: RESIDE: Improving Distantly-Supervised Neural Relation Extraction using Side Information","archived":false,"fork":false,"pushed_at":"2023-03-24T22:44:45.000Z","size":5850,"stargazers_count":247,"open_issues_count":5,"forks_count":48,"subscribers_count":13,"default_branch":"master","last_synced_at":"2024-11-02T23:32:53.424Z","etag":null,"topics":["deep-learning","distant-supervision","graph-convolutional-networks","natural-language-processing","neural-relation-extraction","relation-extraction"],"latest_commit_sha":null,"homepage":"","language":"CSS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/malllabiisc.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-08-14T00:26:37.000Z","updated_at":"2024-01-04T16:25:24.000Z","dependencies_parsed_at":"2023-01-21T04:00:19.841Z","dependency_job_id":"25030dd0-cdae-4ab8-9b00-c884c2205c84","html_url":"https://github.com/malllabiisc/RESIDE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malllabiisc%2FRESIDE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malllabiisc%2FRESIDE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malllabiisc%2FRESIDE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/malllabiisc%2FRESIDE/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/malllabiisc","download_url":"https://codeload.github.com/malllabiisc/RESIDE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246738490,"owners_count":20825788,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","distant-supervision","graph-convolutional-networks","natural-language-processing","neural-relation-extraction","relation-extraction"],"created_at":"2024-08-01T08:00:57.277Z","updated_at":"2025-04-02T01:30:58.691Z","avatar_url":"https://github.com/malllabiisc.png","language":"CSS","funding_links":[],"categories":["Relation Extraction:"],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  RESIDE\n\u003c/h1\u003e\n\n\u003ch4 align=\"center\"\u003eImproving Distantly-Supervised Neural Relation Extraction using Side Information\u003c/h4\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://2018.emnlp.org/\"\u003e\u003cimg src=\"http://img.shields.io/badge/EMNLP-2018-4b44ce.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://arxiv.org/abs/1812.04361\"\u003e\u003cimg src=\"http://img.shields.io/badge/Paper-PDF-red.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://vimeo.com/305199302\"\u003e\u003cimg src=\"http://img.shields.io/badge/Video-Vimeo-green.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://shikhar-vashishth.github.io/assets/pdf/reside_supp.pdf\"\u003e\u003cimg src=\"http://img.shields.io/badge/Supplementary-PDF-B31B1B.svg\"\u003e\u003c/a\u003e\n  \u003ca 
href=\"https://shikhar-vashishth.github.io/assets/pdf/reside_poster.pdf\"\u003e\u003cimg src=\"http://img.shields.io/badge/Poster-PDF-9cf.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://shikhar-vashishth.github.io/assets/pdf/slides_reside.pdf\"\u003e\u003cimg src=\"http://img.shields.io/badge/Slides-PDF-orange.svg\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/malllabiisc/RESIDE/blob/master/LICENSE\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/License-Apache%202.0-blue.svg\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003ch2 align=\"center\"\u003e\n  Overview of RESIDE\n  \u003cimg align=\"center\"  src=\"./images/overview.png\" alt=\"...\"\u003e\n\u003c/h2\u003e\n\nRESIDE first encodes each sentence in the bag by concatenating, for each token, its Bi-GRU and Syntactic GCN embeddings (concatenation is denoted by ⊕), followed by word attention. Then, the sentence embedding is concatenated with relation alias information, which comes from the Side Information Acquisition section, before attention is computed over sentences. Finally, the bag representation, together with entity type information, is fed to a softmax classifier. 
Please refer to the paper for more details.\n\nThe repository also includes implementations of the [PCNN](http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP203.pdf), [PCNN+ATT](https://www.aclweb.org/anthology/P16-1200), [CNN](https://www.aclweb.org/anthology/C14-1220), CNN+ATT, and [BGWA](https://arxiv.org/pdf/1804.06987.pdf) models.\n\n### Dependencies\n\n- Compatible with TensorFlow 1.x and Python 3.x.\n- Dependencies can be installed from `requirements.txt`.\n\n### Dataset:\n\n- We use the [Riedel NYT](http://iesl.cs.umass.edu/riedel/ecml/) and [Google IISc Distant Supervision (GIDS)](https://arxiv.org/pdf/1804.06987.pdf) datasets for evaluation.\n\n- The datasets in JSON list format, with side information, can be downloaded here: [RiedelNYT](https://drive.google.com/open?id=1D7bZPvrSAbIPaFSG7ZswYQcPA3tmouCw) and [GIDS](https://drive.google.com/open?id=1gTNAbv8My2QDmP-OHLFtJFlzPDoCG4aI).\n\n- The processed versions of the datasets can be downloaded from [RiedelNYT](https://drive.google.com/file/d/1UD86c_6O_NSBn2DYirk6ygaHy_fTL-hN/view?usp=sharing) and [GIDS](https://drive.google.com/file/d/1UMS4EmWv5SWXfaSl_ZC4DcT3dk3JyHeq/view?usp=sharing). 
The structure of the processed input data is as follows:\n\n  ```json\n  {\n      \"voc2id\":   {\"w1\": 0, \"w2\": 1, ...},\n      \"type2id\":  {\"type1\": 0, \"type2\": 1, ...},\n      \"rel2id\":   {\"NA\": 0, \"/location/neighborhood/neighborhood_of\": 1, ...},\n      \"max_pos\": 123,\n      \"train\": [\n          {\n              \"X\":        [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],\n              \"Y\":        [bag_label],\n              \"Pos1\":     [[s1_p1_1, s1_p1_2, ...], [s2_p1_1, s2_p1_2, ...], ...],\n              \"Pos2\":     [[s1_p2_1, s1_p2_2, ...], [s2_p2_1, s2_p2_2, ...], ...],\n              \"SubPos\":   [s1_sub, s2_sub, ...],\n              \"ObjPos\":   [s1_obj, s2_obj, ...],\n              \"SubType\":  [s1_subType, s2_subType, ...],\n              \"ObjType\":  [s1_objType, s2_objType, ...],\n              \"ProbY\":    [[s1_rel_alias1, s1_rel_alias2, ...], [s2_rel_alias1, ...], ...],\n              \"DepEdges\": [[s1_dep_edges], [s2_dep_edges], ...]\n          },\n          {}, ...\n      ],\n      \"test\":  { same as \"train\" },\n      \"valid\": { same as \"train\" }\n  }\n  ```\n\n  * `voc2id` maps each word to its id.\n  * `type2id` maps each entity type to its id.\n  * `rel2id` maps each relation to its id. 
\n  * `max_pos` is the maximum position considered for positional embeddings.\n  * Each entry of `train`, `test`, and `valid` is a bag of sentences, where\n    * `X` holds the sentences in the bag as a list of lists of word indices.\n    * `Y` is the relation expressed by the sentences in the bag.\n    * `Pos1` and `Pos2` give the position of each word relative to target entity 1 and entity 2, respectively.\n    * `SubPos` and `ObjPos` contain the positions of target entity 1 and entity 2 in each sentence.\n    * `SubType` and `ObjType` contain the types of target entity 1 and entity 2, obtained from the knowledge graph (KG).\n    * `ProbY` is the relation alias side information (see the paper) for the bag.\n    * `DepEdges` is the edge list of the dependency parse of each sentence (required by the GCN).\n\n### Evaluate pretrained model:\n\n- `reside.py` contains the TensorFlow (1.x) implementation of **RESIDE** (the proposed method).\n- Download the pretrained model parameters for [RiedelNYT](https://drive.google.com/file/d/1CUk10FTncaaZspAoh8YkHTML3RJHfW7e/view?usp=sharing) and [GIDS](https://drive.google.com/file/d/1X5pKkL6eOkGXw39baq0n9noBXa--5EhE/view?usp=sharing) and place the downloaded folders in the `checkpoint` directory.\n- Run `evaluate.sh` to compare the pretrained **RESIDE** model against the baselines (it plots precision-recall curves).\n\n### Side Information:\n\n- **Entity Type** information for both datasets is provided in `side_info/type_info.zip`.\n  * Entity type information can be used directly in the model.\n- **Relation Alias Information** for both datasets is provided in `side_info/relation_alias.zip`.\n  * The preprocessing code for relation alias information is in `rel_alias_side_info.py`. 
\n  * The following figure summarizes the method:\n  ![Relation alias side information](https://github.com/malllabiisc/RESIDE/blob/master/images/relation_alias.png)\n\n### Training from scratch:\n- Run `setup.sh` to download the GloVe embeddings.\n- To train **RESIDE**, run:\n  ```shell\n  python reside.py -data data/riedel_processed.pkl -name new_run\n  ```\n\n- To match the performance reported in the paper, the model above must then be trained for a few more epochs with the SGD optimizer. To do so, run:\n\n  ```shell\n  python reside.py -name new_run -restore -opt sgd -lr 0.001 -l2 0.0 -epoch 4\n  ```\n\n- Finally, run `python plot_pr.py -name new_run` to generate the precision-recall plot.\n\n### Baselines:\n\n* The repository also includes code for the [PCNN](http://www.emnlp2015.org/proceedings/EMNLP/pdf/EMNLP203.pdf), [PCNN+ATT](https://www.aclweb.org/anthology/P16-1200), [CNN](https://www.aclweb.org/anthology/C14-1220), CNN+ATT, and [BGWA](https://arxiv.org/pdf/1804.06987.pdf) models.\n\n* To train **PCNN+ATT**:\n\n  ```shell\n  python pcnnatt.py -data data/riedel_processed.pkl -name new_run -attn # remove -attn for PCNN\n  ```\n\n* Similarly, to train **CNN+ATT**:\n\n  ```shell\n  python cnnatt.py -data data/riedel_processed.pkl -name new_run -attn # remove -attn for CNN\n  ```\n\n* To train **BGWA**:\n\n  ```shell\n  python bgwa.py -data data/riedel_processed.pkl -name new_run\n  ```\n\n### Preprocessing a new dataset:\n\n* The `preproc` directory contains code for converting a new dataset into the format required by `reside.py` (`riedel_processed.pkl`).\n* Put the raw data in the same format as [riedel_raw](https://drive.google.com/file/d/1D7bZPvrSAbIPaFSG7ZswYQcPA3tmouCw/view?usp=sharing) (Riedel NYT) or [gids_raw](https://drive.google.com/open?id=1gTNAbv8My2QDmP-OHLFtJFlzPDoCG4aI) (GIDS).\n* Finally, run `preprocess.sh`. `make_bags.py` generates bags from sentences. 
`generate_pickle.py` converts the data into the required pickle format.\n\n### Running the pretrained model on new samples:\n\n- The code for running the pretrained model on a sample is in the `online` directory.\n\n- A [Flask](http://flask.pocoo.org/)-based server is also provided. Run `python online/server.py` to start it.\n\n  - [riedel_test_bags.json](https://drive.google.com/open?id=1tIczJKU5NrZJvR-XHUEh7IrFrvbS_aHn) and the other required [files](https://drive.google.com/open?id=17UNttRDo14O_Zgfr6y9tvY57fc0BGEjw) can be downloaded from these links.\n\n  ![RESIDE demo](./images/demo.png)\n\n### Citation:\nPlease cite the following paper if you use this code in your work.\n\n```bibtex\n@inproceedings{reside2018,\n  author = \t\"Vashishth, Shikhar and \n  \t\tJoshi, Rishabh and\n\t\tPrayaga, Sai Suman and\n\t\tBhattacharyya, Chiranjib and\n\t\tTalukdar, Partha\",\n  title = \t\"{RESIDE}: Improving Distantly-Supervised Neural Relation Extraction using Side Information\",\n  booktitle = \t\"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing\",\n  month = \toct # \"-\" # nov,\n  address = \t\"Brussels, Belgium\",\n  year = \t\"2018\",\n  publisher = \t\"Association for Computational Linguistics\",\n  pages = \t\"1257--1266\",\n  url = \t\"http://aclweb.org/anthology/D18-1157\"\n}\n```\n\nFor any clarification, comments, or suggestions, please create an issue or contact [Shikhar](http://shikhar-vashishth.github.io).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalllabiisc%2FRESIDE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmalllabiisc%2FRESIDE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmalllabiisc%2FRESIDE/lists"}