Transition-based Neural Parser
============================

State-of-the-art Abstract Meaning Representation (AMR) parsing; see [papers
with code](https://paperswithcode.com/task/amr-parsing). Models both the
distribution over graphs and the alignments with a transition-based approach.
The parser supports generic text-to-graph parsing as long as the graph is
expressed in [Penman notation](https://penman.readthedocs.io/en/latest/notation.html).

Some of the main features:

- [Smatch](https://github.com/snowblink14/smatch) wrapper providing [significance testing](scripts/README.md#paired-boostrap-significance-test-for-Smatch) for Smatch and [MBSE](scripts/README.md#maximum-bayesian-smatch-ensemble-mbse) ensembling.
- `Structured-BART` [(Zhou et al 2021b)](https://aclanthology.org/2021.emnlp-main.507/) with [trained checkpoints](#available-pretrained-model-checkpoints) for document-level AMR [(Naseem et al 2022)](https://aclanthology.org/2022.naacl-main.256), MBSE [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) and latent alignments training [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464)
- `Structured-mBART` for multi-lingual support (EN, DE, Zh, IT) [(Lee et al 2022)](https://arxiv.org/abs/2112.07790)
- Action-Pointer Transformer (`APT`) [(Zhou et al 2021)](https://www.aclweb.org/anthology/2021.naacl-main.443); check out the `action-pointer` branch
- `Stack-Transformer` [(Fernandez Astudillo et al 2020)](https://www.aclweb.org/anthology/2020.findings-emnlp.89); check out the `stack-Transformer` branch

## Install Instructions

Create and activate a virtual environment with Python 3.8, for example

```bash
conda create -y -p ./cenv_x86 python=3.8
conda activate ./cenv_x86
```

or alternatively use `virtualenv` and `pyenv` to manage Python versions. Note
that all scripts source a `set_environment.sh` script, which you can use to
activate your virtual environment as above and to set environment variables.
If you do not need it, just create an empty file

```bash
# or e.g. put inside conda activate ./cenv_x86
touch set_environment.sh
```
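
For example, a non-empty `set_environment.sh` might simply activate the environment created above and export whatever variables you need (the exported variable below is only an illustration, not required by the parser):

```bash
# set_environment.sh (sourced by the scripts in this repo)
conda activate ./cenv_x86
# illustrative example of an environment variable you may want to set
export CUDA_VISIBLE_DEVICES=0
```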

Then install the parser package using pip. You will need to manually install
`torch-scatter`, since it is built against a specific torch/CUDA combination.
Here we give the command for `torch 1.13.1` and `cuda 11.7`; see the
[torch-scatter repository](https://pypi.org/project/torch-scatter/) for the
appropriate installation instructions.

**For MacOS users**

(Please install the CPU version of `torch-scatter`; note that model training is not fully supported on MacOS.)

```bash
pip install transition-neural-parser
# for linux users
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
# for cpu installation for MacOS
# pip install torch-scatter
```

If you plan to edit the code, clone the repository and install it in editable mode instead

```bash
# clone this repo (see link above), then
cd transition-neural-parser
pip install --editable .
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
```

If you want to train a document-level AMR parser, you will also need to install DocAMR

```bash
git clone https://github.com/IBM/docAMR.git
cd docAMR
pip install .
cd ..
```

## Parse with a pretrained model

Here is an example of how to download and use a pretrained AMR parser in Python

```python
from transition_amr_parser.parse import AMRParser

# Download and save a model named AMR3.0 to cache
parser = AMRParser.from_pretrained('AMR3-structbart-L')
tokens, positions = parser.tokenize('The girl travels and visits places')

# Use parse_sentence() for single sentences or parse_sentences() for a batch
annotations, machines = parser.parse_sentence(tokens)

# Print Penman notation
print(annotations)

# Print Penman notation without JAMR, with ISI
amr = machines.get_amr()
print(amr.to_penman(jamr=False, isi=True))

# Plot the graph (requires matplotlib)
amr.plot()

```

Note that Smatch does not support ISI-type alignments and will give worse
results when they are present. Set `isi=False` to remove them.

You can also use the command line to run a pretrained model to parse a file:

```bash
amr-parse -c $in_checkpoint -i $input_file -o file.amr
```

A pretrained model can also be downloaded and used directly by passing its
name with the `-m` flag instead of a checkpoint path.

Note that Smatch does not support ISI alignments and gives worse results when
they are present. Use `--no-isi` to store alignments in the `::alignments`
metadata instead, and `--jamr` to add JAMR annotations to the metadata.
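
For example (a sketch, assuming `-m` accepts the pretrained model names listed in the table further below):

```bash
# download the named model to the torch cache and parse with it
amr-parse -m AMR3-structbart-L -i $input_file -o file.amr
# keep alignments as ::alignments metadata and add JAMR annotations
amr-parse -c $in_checkpoint -i $input_file -o file.amr --no-isi --jamr
```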

## Document-level Parsing

Document-level parsing represents co-reference using `:same-as` edges. To change
the representation and merge the co-referent nodes as in the paper, please refer
to [the DocAMR repo](https://github.com/IBM/docAMR.git).

```python
from transition_amr_parser.parse import AMRParser

# Download and save the docamr model to cache
parser = AMRParser.from_pretrained('doc-sen-conll-amr-seed42')

# Sentences in the doc
doc = ["Hailey likes to travel." ,"She is going to London tomorrow.", "She will walk to Big Ben when she goes to London."]

# tokenize sentences if not already tokenized
tok_sentences = []
for sen in doc:
    tokens, positions = parser.tokenize(sen)
    tok_sentences.append(tokens)

# parse docs takes a list of docs as input
annotations, machines = parser.parse_docs([tok_sentences])

# Print Penman notation
print(annotations[0])

# Print Penman notation without JAMR, with ISI
amr = machines[0].get_amr()
print(amr.to_penman(jamr=False, isi=True))

# Plot the graph (requires matplotlib)
amr.plot()

```

To parse a document from the command line, the input file `$doc_input_file` is a
text file where each line is a sentence of the document and a newline
(`'\n'`) separates every document (including at the end); a sketch of the
format follows the command below.

```bash
amr-parse -c $in_checkpoint --in-doc $doc_input_file -o file.docamr
```
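
A minimal sketch of the expected input format, assuming the empty line acts as the document separator (the sentences are taken from the examples above; the file name is just an illustration):

```bash
cat > doc_input_file.txt << 'EOF'
Hailey likes to travel.
She is going to London tomorrow.
She will walk to Big Ben when she goes to London.

The girl travels and visits places.

EOF
# then parse it with the command above, e.g.
# amr-parse -c $in_checkpoint --in-doc doc_input_file.txt -o file.docamr
```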

## Available Pretrained Model Checkpoints

The models downloaded using `from_pretrained()` will be stored in the PyTorch
cache folder:
```python
cache_dir = torch.hub._get_torch_home()
```
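
To see where the downloaded checkpoints live on disk, you can list that folder from the shell (the exact subfolder layout under the cache directory may vary):

```bash
ls "$(python -c 'import torch; print(torch.hub._get_torch_home())')"
```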

This table shows the available pretrained model names to download:

| pretrained model name | corresponding file name | paper | beam10-Smatch |
|:--------------------------:|:-----------------------------------------------------:|:---------------------------------------------------------------:|:-------------:|
| AMR3-structbart-L-smpl | amr3.0-structured-bart-large-neur-al-sampling5-seed42 | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) PR | 82.9 (beam1) |
| AMR3-structbart-L | amr3.0-structured-bart-large-neur-al-seed42 | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) MAP | 82.6 |
| AMR2-structbart-L | amr2.0-structured-bart-large-neur-al-seed42 | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) MAP | 84.0 |
| AMR2-joint-ontowiki-seed42 | amr2joint_ontowiki2_g2g-structured-bart-large-seed42 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9 |
| AMR2-joint-ontowiki-seed43 | amr2joint_ontowiki2_g2g-structured-bart-large-seed43 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9 |
| AMR2-joint-ontowiki-seed44 | amr2joint_ontowiki2_g2g-structured-bart-large-seed44 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9 |
| AMR3-joint-ontowiki-seed42 | amr3joint_ontowiki2_g2g-structured-bart-large-seed42 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4 |
| AMR3-joint-ontowiki-seed43 | amr3joint_ontowiki2_g2g-structured-bart-large-seed43 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4 |
| AMR3-joint-ontowiki-seed44 | amr3joint_ontowiki2_g2g-structured-bart-large-seed44 | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4 |
| doc-sen-conll-amr-seed42 | both_doc+sen_trainsliding_ws400x100-seed42 | | 82.3<sup>1</sup> / 71.8<sup>2</sup> |

<sup>1</sup> Smatch on AMR3.0 sentences

<sup>2</sup> Smatch on the AMR3.0 Multi-Sentence dataset

Contact the authors to obtain the trained `ibm-neural-aligner`. For the
ensemble we provide the three seeds. Following fairseq conventions, to run the
ensemble just pass the three checkpoint paths joined by `:` to the normal
checkpoint argument `-c`. Note that the checkpoints were trained with the
`v0.5.1` tokenizer; this reduces performance by `0.1` on `v0.5.2` tokenized
data.
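
For example (a sketch; the checkpoint paths below are hypothetical, substitute the locations of your three downloaded seeds):

```bash
# fairseq-style ensemble: join the three checkpoints with ':'
amr-parse \
    -c seed42/checkpoint.pt:seed43/checkpoint.pt:seed44/checkpoint.pt \
    -i $input_file -o file.amr
```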

Note that we always report the average of three seeds in the papers, while
these are individual models. A fast way to test a model standalone is

    bash tests/standalone.sh configs/<config>.sh

## Training a model

You first need to pre-process and align the data. For AMR2.0, do

```bash
conda activate ./cenv_x86 # activate parser environment
python scripts/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/
```

You will also need to unzip the precomputed BLINK cache. See issues in this repository to get the cache file (or the link above for IBM-ers).

```bash
unzip /path/to/linkcache.zip
```

To launch training and testing, use the following (this will also run the aligner)

```bash
bash run/run_experiment.sh configs/amr2.0-structured-bart-large.sh
```

Training will store and evaluate all checkpoints by default (see the config's
`EVAL_INIT_EPOCH`) and select the one with the best dev Smatch. This needs a lot
of disk space, but you can launch a parallel job that performs evaluation and
deletes checkpoints not in the top `5`

```bash
bash run/run_model_eval.sh configs/amr2.0-structured-bart-large.sh
```

You can check training status with

```bash
python run/status.py -c configs/amr2.0-structured-bart-large.sh
```

Use `--results` to check the scores once the models have finished training.
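
For example, once the models above have finished training:

```bash
python run/status.py -c configs/amr2.0-structured-bart-large.sh --results
```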

We include code to launch parallel jobs with the LSF job scheduler. This can be
adapted for other schedulers, e.g. Slurm; see [here](run/lsf/README.md).

## Initialize with WatBART

To load WatBART instead of BART, just uncomment the following line in the config and provide the checkpoint path

```bash
initialize_with_watbart=/path/to/checkpoint_best.pt
```