# AMR annotation and feature generation

[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]

Provides support for AMR graph manipulation, annotation, and feature
generation.

Features:
* Annotation in AMR metadata, such as the sentence types found in the Proxy
  report AMR corpus.
* AMR token alignment as [spaCy] components.
* Integration of natural language parsing and features with the Zensols
  [zensols.nlparse] library.
* A scoring API that includes [Smatch] and [WLK] and extends a more general
  [NLP scoring module].
* AMR parsing ([amrlib]) and AMR co-reference ([amr_coref]).
* Command line and API utilities for Penman AMR graphs, debugging, and file
  handling.
* Tools for [training and evaluating](training) new AMR parse (text to graph)
  and generation (graph to text) models.
* A method for re-indexing and updating AMR graph variables so that all
  variables in a document collection are unique (see the sketch after this
  list).
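
As an illustration of the variable re-indexing feature, the following sketch
uses the [penman](https://github.com/goodmami/penman) library directly; it is
not this library's own implementation:

```python
import penman
from penman.graph import Graph

# two sentence graphs that both use the variable 'b'
doc = [penman.decode('(w / want-01 :ARG0 (b / boy))'),
       penman.decode('(b / believe-01 :ARG0 (g / girl))')]
reindexed = []
for i, g in enumerate(doc):
    # suffix each variable with its sentence index to make it unique
    ren = {v: f'{v}{i}' for v in g.variables()}
    triples = [(ren.get(s, s), r, ren.get(t, t)) for s, r, t in g.triples]
    reindexed.append(Graph(triples))
# the second sentence's variables are now 'b1' and 'g1'
print(penman.encode(reindexed[1]))
```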

## Documentation

* [Full documentation](https://plandes.github.io/amr/index.html).
* [API reference](https://plandes.github.io/amr/api.html).

## Installing

The library can be installed with pip from the [pypi] repository:
```bash
pip3 install zensols.amr
```

### Installing the Gsii Model

The [Gsii model] download link expires, so the model must be downloaded
manually. To install it, do the following:

1. Download the [Gsii model] (click "direct download").
1. Move the file to the current directory.
1. Install the file by forcing a test parse:
```bash
amr parse 'Test sentence.' --override \
  amr_parse_gsii_resource.url=file:model_parse_gsii-v0_1_0.tar.gz
```

## Usage

```python
from penman.graph import Graph
from zensols.nlp import FeatureDocument, FeatureDocumentParser
from zensols.amr import AmrDocument, AmrSentence, Dumper, ApplicationFactory

sent: str = """

He was George Washington and first president of the United States.
He was born on February 22, 1732.

""".replace('\n', ' ').strip()

# get the AMR document parser
doc_parser: FeatureDocumentParser = ApplicationFactory.get_doc_parser()

# the parser creates an NLP-centric feature document as provided in the
# zensols.nlp package
doc: FeatureDocument = doc_parser(sent)

# the AMR object graph data structure is provided in the feature document
amr_doc: AmrDocument = doc.amr

# dump a human readable output of the AMR document
amr_doc.write()

# get the first AMR sentence instance
amr_sent: AmrSentence = amr_doc.sents[0]
print('sentence:')
print(' ', amr_sent.text)
print('tuples:')

# show the Penman graph representation
pgraph: Graph = amr_sent.graph
print(f'variables: {", ".join(pgraph.variables())}')
for t in pgraph.triples:
    print(' ', t)
print('edges:')
for e in pgraph.edges():
    print(' ', e)

# visualize the graph as a PDF
dumper: Dumper = ApplicationFactory.get_dumper()
dumper(amr_doc)
```

As used in the example, the [t5.conf](test-resources/t5.conf) and
[gsii.conf](test-resources/gsii.conf) configuration files show how to include
the configuration needed for each AMR model. These files can also be used
directly with the `amr` command using the `--config` option.

However, the other resources in the example must be imported unless you
redefine them yourself.
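
The same files can be loaded programmatically. The following is a minimal
sketch using the [zensols.util configuration framework]; the `doc_parser`
section name is an assumption for illustration (see
[test-resources](test-resources) for the sections the project actually
defines):

```python
from zensols.config import ImportIniConfig, ImportConfigFactory

# parse the application configuration, then create instances defined in it
factory = ImportConfigFactory(ImportIniConfig('test-resources/t5.conf'))
# 'doc_parser' is a hypothetical section name used only for illustration
doc_parser = factory.instance('doc_parser')
```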

### Library

When adding the `amr` spaCy pipeline component, the `doc._.amr` attribute is
set on the `Doc` instance. You can either configure spaCy yourself, or you can
use the configuration files in [test-resources](test-resources) as an example
of using the [zensols.util configuration framework]. The command line
application provides an example of how to do this, along with the [test
case](test/python/test_amr.py).
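
As a minimal sketch of the spaCy route, the following assumes that importing
the package registers the `amr` component and that a default model is
configured:

```python
import spacy
import zensols.amr  # assumed to register the 'amr' pipeline component

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('amr')
doc = nlp('He was George Washington.')

# the component sets the parsed AMR document on the Doc instance
print(doc._.amr)
```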

### Command Line

This library is written mostly to be used by other programs, but the command
line utility `amr` is also available to demonstrate its usage and to generate
AMR graphs on the command line.

To parse:
```lisp
$ amr parse -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
# ::snt This is a test of the AMR command line utility.
(t / test-01
   :ARG1 (u / utility
            :mod (c / command-line)
            :name (n / name
                     :op1 "AMR"
                     :toki1 "6")
            :toki1 "9")
   :domain (t2 / this
               :toki1 "0")
   :toki1 "3")
```

To generate graphs in PDF format:
```bash
$ amr plot -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
wrote: amr-graph/this-is-a-test-of-the-amr-comm.pdf
```

## Training

This package uses [amrlib] for training, but adds a command line interface
and a downloadable corpus aggregation API. To train:

1. Choose a model (e.g., SPRING, T5).
1. Optionally edit the [train configuration](train-config) file for the model
   you choose.
1. Optionally edit the `resources/train.yml` to select/add more corpora (see
[Adding Corpora](adding-corpora)).
1. Train the model: `./amr --config train-config/.conf`

### Pretrained Models

This library was used to train all of the [amrlib] models (using the same
checkpoints as [amrlib]), except the T5 Base v1 model, with additional
examples from publicly available human-annotated corpora. These trained
models differ from the originals as follows:

* None of the models were evaluated against a held-out test set; only the
  development set SMATCH scores are available. This was intentional, to
  provide more training examples.
* The AMR Release 3.0 ([LDC2020T02]) test set was added to the training set.
* The [Little Prince and Bio AMR](https://amr.isi.edu/download.html) corpora
  were used to train the models: the first 85% of the AMR sentences were added
  to the training set and the remaining 15% were added to the development set
  (see the sketch after this list).
* The mini-batch size was changed for `generate-t5wtense-base` due to memory
  constraints.
* The number of training epochs was increased to account for the additional
  training examples.
* The models have the same naming conventions but are prefixed with `zsl`.
* Generative models were trained on graphs with metadata annotated by the
  scispaCy `en_core_sci_md` model.
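
For reference, the 85%/15% corpus split described above amounts to the
following (an illustrative sketch, not the project's corpus preparation
code):

```python
from typing import List, Tuple

def split_corpus(sents: List[str], train_portion: float = 0.85) -> \
        Tuple[List[str], List[str]]:
    """Split AMR sentences: the first 85% go to the training set, the
    remaining 15% to the development set."""
    n_train = int(len(sents) * train_portion)
    return sents[:n_train], sents[n_train:]

train, dev = split_corpus([f'(s{i} / sentence)' for i in range(100)])
print(len(train), len(dev))  # -> 85 15
```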

The performance of these models:

| Model Name | Model Type | Checkpoint | Performance |
|----------------------|------------|------------------------|---------------|
| `zsl_spring` | parse | [facebook/bart-large] | SMATCH: 81.26 |
| `zsl_xfm_bart_base` | parse | [facebook/bart-base] | SMATCH: 80.5 |
| `zsl_xfm_bart_large` | parse | [facebook/bart-large] | SMATCH: 82.7 |
| `zsl_t5wtense_base` | generative | [t5-base] | BLEU: 42.20 |
| `zsl_t5wtense_large` | generative | [google/flan-t5-large] | BLEU: 44.01 |

These models are available upon request.

### Adding Corpora

You can train your own model with additional corpora by modifying the list of
`${amr_prep_manager:preppers}` in `resources/train.yml`. This file defines the
downloads for the Little Prince and Bio AMR corpora. To use the AMR 3.0
release, add the file downloaded from the LDC to a (new) `download`
directory.

## Attribution

This project, or reference model code, uses:

* Python 3.11
* [amrlib] for AMR parsing.
* [amr_coref] for AMR co-reference.
* [spaCy] for natural language parsing.
* [zensols.nlparse] for natural language features.
* [Smatch] (Cai and Knight, 2013) and [WLK] (Opitz et al., 2021) for scoring.

## Citation

If you use this project in your research please use the following BibTeX entry:

```bibtex
@inproceedings{landes-etal-2023-deepzensols,
    title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
    author = "Landes, Paul and
      Di Eugenio, Barbara and
      Caragea, Cornelia",
    editor = "Tan, Liling and
      Milajevs, Dmitrijs and
      Chauhan, Geeticka and
      Gwinnup, Jeremy and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlposs-1.16",
    pages = "141--146"
}
```

## Changelog

An extensive changelog is available [here](CHANGELOG.md).

## Community

Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback, and any other input are welcome.

## License

[MIT License](LICENSE.md)

Copyright (c) 2021 - 2025 Paul Landes

[pypi]: https://pypi.org/project/zensols.amr/
[pypi-link]: https://pypi.python.org/pypi/zensols.amr
[pypi-badge]: https://img.shields.io/pypi/v/zensols.amr.svg
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/amr/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/amr/actions

[spaCy]: https://spacy.io
[amrlib]: https://github.com/bjascob/amrlib
[amr_coref]: https://github.com/bjascob/amr_coref
[Smatch]: https://github.com/snowblink14/smatch
[WLK]: https://github.com/flipz357/weisfeiler-leman-amr-metrics
[zensols.nlparse]: https://github.com/plandes/nlparse
[zensols.util configuration framework]: https://plandes.github.io/util/doc/config.html
[NLP scoring module]: https://plandes.github.io/nlparse/api/zensols.nlp.html#zensols-nlp-score
[LDC2020T02]: https://catalog.ldc.upenn.edu/LDC2020T02

[facebook/bart-large]: https://huggingface.co/facebook/bart-large
[facebook/bart-base]: https://huggingface.co/facebook/bart-base
[t5-base]: https://huggingface.co/google-t5/t5-base
[google/flan-t5-large]: https://huggingface.co/google/flan-t5-large
[Gsii model]: https://u.pcloud.link/publink/show?code=XZD2z0XZOqRtS2mNMHhMG4UhXOCNO4yzeaLk