{"id":31944443,"url":"https://github.com/johngiorgi/seq2rel","last_synced_at":"2025-10-14T10:29:39.801Z","repository":{"id":36980574,"uuid":"327695506","full_name":"JohnGiorgi/seq2rel","owner":"JohnGiorgi","description":"The corresponding code for our paper: A sequence-to-sequence approach for document-level relation extraction.","archived":false,"fork":false,"pushed_at":"2024-05-27T17:53:02.000Z","size":1781,"stargazers_count":58,"open_issues_count":17,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-05-28T00:33:46.360Z","etag":null,"topics":["allen","coreference-resolution","entity-extraction","information-extraction","named-entity-recognition","pytorch","relation-extraction","seq2rel","seq2seq"],"latest_commit_sha":null,"homepage":"https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JohnGiorgi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-07T18:36:48.000Z","updated_at":"2024-05-27T17:53:02.000Z","dependencies_parsed_at":"2024-05-27T18:07:45.753Z","dependency_job_id":null,"html_url":"https://github.com/JohnGiorgi/seq2rel","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/JohnGiorgi/seq2rel","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnGiorgi%2Fseq2rel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnGiorgi%2Fseq2rel/tags","releases_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/JohnGiorgi%2Fseq2rel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnGiorgi%2Fseq2rel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JohnGiorgi","download_url":"https://codeload.github.com/JohnGiorgi/seq2rel/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JohnGiorgi%2Fseq2rel/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279018780,"owners_count":26086452,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allen","coreference-resolution","entity-extraction","information-extraction","named-entity-recognition","pytorch","relation-extraction","seq2rel","seq2seq"],"created_at":"2025-10-14T10:29:36.485Z","updated_at":"2025-10-14T10:29:39.794Z","avatar_url":"https://github.com/JohnGiorgi.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# seq2rel: A sequence-to-sequence approach for document-level relation 
extraction\n\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-sequence-to-sequence-approach-for-document/joint-entity-and-relation-extraction-on-cdr)](https://paperswithcode.com/sota/joint-entity-and-relation-extraction-on-cdr?p=a-sequence-to-sequence-approach-for-document)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-sequence-to-sequence-approach-for-document/joint-entity-and-relation-extraction-on-gda)](https://paperswithcode.com/sota/joint-entity-and-relation-extraction-on-gda?p=a-sequence-to-sequence-approach-for-document)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-sequence-to-sequence-approach-for-document/joint-entity-and-relation-extraction-on-3)](https://paperswithcode.com/sota/joint-entity-and-relation-extraction-on-3?p=a-sequence-to-sequence-approach-for-document)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-sequence-to-sequence-approach-for-document/relation-extraction-on-gda)](https://paperswithcode.com/sota/relation-extraction-on-gda?p=a-sequence-to-sequence-approach-for-document)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-sequence-to-sequence-approach-for-document/relation-extraction-on-cdr)](https://paperswithcode.com/sota/relation-extraction-on-cdr?p=a-sequence-to-sequence-approach-for-document)\n\n---\n\n[![ci](https://github.com/JohnGiorgi/seq2rel/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/JohnGiorgi/seq2rel/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/JohnGiorgi/seq2rel/branch/main/graph/badge.svg?token=RKJ7EV4WQK)](https://codecov.io/gh/JohnGiorgi/seq2rel)\n[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)\n![GitHub](https://img.shields.io/github/license/JohnGiorgi/seq2rel?color=blue)\n[![Open in 
Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py)\n\nThe corresponding code for our paper: [A sequence-to-sequence approach for document-level relation extraction](https://aclanthology.org/2022.bionlp-1.2/). Check out our demo [here](https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py)!\n\n## Table of contents\n\n- [seq2rel: A sequence-to-sequence approach for document-level relation extraction](#seq2rel-a-sequence-to-sequence-approach-for-document-level-relation-extraction)\n  - [Table of contents](#table-of-contents)\n  - [Notebooks](#notebooks)\n  - [Installation](#installation)\n    - [Setting up a virtual environment](#setting-up-a-virtual-environment)\n    - [Installing the library and dependencies](#installing-the-library-and-dependencies)\n  - [Usage](#usage)\n    - [Preparing a dataset](#preparing-a-dataset)\n    - [Training](#training)\n    - [Inference](#inference)\n      - [Running the demo locally](#running-the-demo-locally)\n    - [Reproducing results](#reproducing-results)\n  - [Citing](#citing)\n\n## Notebooks\n\nThe easiest way to get started is to follow along with one of our [notebooks](notebooks):\n\n- Training your own model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnGiorgi/seq2rel/blob/main/notebooks/training.ipynb)\n- Reproducing results [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnGiorgi/seq2rel/blob/main/notebooks/reproducing_results.ipynb)\n\nOr to open the demo:\n\n[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/johngiorgi/seq2rel/main/demo.py)\n\n\u003e __Note__: Unfortunately, the demo is liable to crash as the free resources provided by Streamlit are insufficient to run the model. 
To run the demo locally, please follow the [instructions below](#running-the-demo-locally).\n\n## Installation\n\nThis repository requires Python 3.8 or later.\n\n### Setting up a virtual environment\n\nBefore installing, you should create and activate a Python virtual environment. If you need pointers on setting up a virtual environment, please see the [AllenNLP install instructions](https://github.com/allenai/allennlp#setting-up-a-virtual-environment).\n\n### Installing the library and dependencies\n\nIf you _do not_ plan on modifying the source code, install from `git` using `pip`:\n\n```bash\npip install git+https://github.com/JohnGiorgi/seq2rel.git\n```\n\nOtherwise, clone the repository and install from source using [Poetry](https://python-poetry.org/):\n\n```bash\n# Install poetry for your system: https://python-poetry.org/docs/#installation\n# E.g. for Linux, macOS, Windows (WSL)\ncurl -sSL https://install.python-poetry.org | python3 -\n\n# Clone and move into the repo\ngit clone https://github.com/JohnGiorgi/seq2rel\ncd seq2rel\n\n# Install the package with poetry\npoetry install\n```\n\n## Usage\n\n### Preparing a dataset\n\nDatasets are tab-separated files where each example is contained on its own line. The first column contains the text, and the second column contains the relations. Relations themselves must be serialized to strings.\n\nTake the following example, which expresses a _gene-disease association_ (`\"@GDA@\"`) between _ESR1_ (`\"@GENE@\"`) and _schizophrenia_ (`\"@DISEASE@\"`):\n\n```\nVariants in the estrogen receptor alpha (ESR1) gene and its mRNA contribute to risk for schizophrenia. estrogen receptor alpha ; ESR1 @GENE@ schizophrenia @DISEASE@ @GDA@\n```\n\nFor convenience, we provide a second package, [seq2rel-ds](https://github.com/JohnGiorgi/seq2rel-ds), which makes it easy to generate data in this format for various popular corpora. 
See [our paper](https://aclanthology.org/2022.bionlp-1.2/) for more details on serializing relations.\n\n### Training\n\nTo train the model, use the [`allennlp train`](https://docs.allennlp.org/main/api/commands/train/) command with [one of our configs](https://github.com/JohnGiorgi/seq2rel/tree/main/training_config) (or write your own!)\n\nFor example, to train a model on the [BioCreative V CDR task corpus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4860626/), first, preprocess this data with [seq2rel-ds](https://github.com/JohnGiorgi/seq2rel-ds)\n\n```bash\nseq2rel-ds cdr main \"path/to/preprocessed/cdr\"\n```\n\nThen, call `allennlp train` with the [CDR config we have provided](https://github.com/JohnGiorgi/seq2rel/tree/main/training_config/cdr.jsonnet)\n\n```bash\ntrain_data_path=\"path/to/preprocessed/cdr/train.tsv\" \\\nvalid_data_path=\"path/to/preprocessed/cdr/valid.tsv\" \\\ndataset_size=500 \\\nallennlp train \"training_config/cdr.jsonnet\" \\\n    --serialization-dir \"output\" \\\n    --include-package \"seq2rel\" \n```\n\nThe best model checkpoint (measured by micro-F1 score on the validation set), vocabulary, configuration, and log files will be saved to `--serialization-dir`. This can be changed to any directory you like. Please see the [training](https://colab.research.google.com/github/JohnGiorgi/seq2rel/blob/main/notebooks/training.ipynb) notebook for more details.\n\n### Inference\n\nTo use the model to extract relations, import `Seq2Rel` and pass it some text\n\n```python\nfrom seq2rel import Seq2Rel\nfrom seq2rel.common import util\n\n# Pretrained models are stored on GitHub and will be downloaded and cached automatically.\n# See: https://github.com/JohnGiorgi/seq2rel/releases/tag/pretrained-models.\npretrained_model = \"gda\"\n\n# Models are loaded via a simple interface\nseq2rel = Seq2Rel(pretrained_model)\n\n# Flexible inputs. 
You can provide...\n# - a string\n# - a list of strings\n# - a text file (local path or URL)\ninput_text = \"Variations in the monoamine oxidase B (MAOB) gene are associated with Parkinson's disease (PD).\"\n\n# Pass any of these to the model to generate the raw output\noutput = seq2rel(input_text)\noutput == [\"monoamine oxidase b ; maob @GENE@ parkinson's disease ; pd @DISEASE@ @GDA@\"]\n\n# To get a more structured (and useful!) output, use the `extract_relations` function\nextract_relations = util.extract_relations(output)\nextract_relations == [\n  {\n    \"GDA\": [\n      (((\"monoamine oxidase b\", \"maob\"), \"GENE\"),\n      ((\"parkinson's disease\", \"pd\"), \"DISEASE\"))\n    ]\n  }\n]\n```\n\nSee the list of available `PRETRAINED_MODELS` in [seq2rel/seq2rel.py](seq2rel/seq2rel.py)\n\n```bash\npython -c \"from seq2rel import PRETRAINED_MODELS ; print(list(PRETRAINED_MODELS.keys()))\"\n```\n\n#### Running the demo locally\n\nTo run the demo locally, you will need to additionally install `streamlit` and `pyvis` (see [here](https://github.com/JohnGiorgi/seq2rel/blob/f757d6cc9da87ac527a9485d54843b6a5739657f/pyproject.toml#L58)), then\n\n```bash\nstreamlit run demo.py\n```\n\n### Reproducing results\n\nTo reproduce the main results of the paper, use the [`allennlp evaluate`](https://docs.allennlp.org/main/api/commands/evaluate/) command with [one of our pretrained models](https://github.com/JohnGiorgi/seq2rel/releases/tag/pretrained-models)\n\nFor example, to reproduce our results on the [BioCreative V CDR task corpus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4860626/), first, preprocess this data with [seq2rel-ds](https://github.com/JohnGiorgi/seq2rel-ds)\n\n```bash\nseq2rel-ds cdr main \"path/to/preprocessed/cdr\"\n```\n\nThen, call `allennlp evaluate` with the [pretrained CDR model](https://github.com/JohnGiorgi/seq2rel/releases/download/pretrained-models/cdr.tar.gz)\n\n```bash\nallennlp evaluate 
\"https://github.com/JohnGiorgi/seq2rel/releases/download/pretrained-models/cdr.tar.gz\" \\\n    \"path/to/preprocessed/cdr/test.tsv\" \\\n    --output-file \"output/test_metrics.jsonl\" \\\n    --cuda-device 0 \\\n    --predictions-output-file \"output/test_predictions.jsonl\" \\\n    --include-package \"seq2rel\"\n```\n\nThe results and predictions will be saved to `--output-file` and `--predictions-output-file`. Please see the [reproducing-results](https://colab.research.google.com/github/JohnGiorgi/seq2rel/blob/main/notebooks/reproducing_results.ipynb) notebook for more details.\n\n## Citing\n\nIf you use seq2rel in your work, please consider citing our paper:\n\n```\n@inproceedings{giorgi-etal-2022-sequence,\n\ttitle        = {A sequence-to-sequence approach for document-level relation extraction},\n\tauthor       = {Giorgi, John and Bader, Gary and Wang, Bo},\n\tyear         = 2022,\n\tmonth        = may,\n\tbooktitle    = {Proceedings of the 21st Workshop on Biomedical Language Processing},\n\tpublisher    = {Association for Computational Linguistics},\n\taddress      = {Dublin, Ireland},\n\tpages        = {10--25},\n\tdoi          = {10.18653/v1/2022.bionlp-1.2},\n\turl          = {https://aclanthology.org/2022.bionlp-1.2}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohngiorgi%2Fseq2rel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohngiorgi%2Fseq2rel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohngiorgi%2Fseq2rel/lists"}