https://github.com/plandes/amr
AMR annotation and feature generation
https://github.com/plandes/amr
abstract-meaning-representation amr machine-learning natural-language-processing nlp
Last synced: 6 months ago
JSON representation
AMR annotation and feature generation
- Host: GitHub
- URL: https://github.com/plandes/amr
- Owner: plandes
- License: mit
- Created: 2023-11-29T18:00:25.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2025-01-26T02:25:43.000Z (9 months ago)
- Last Synced: 2025-04-18T01:47:58.708Z (6 months ago)
- Topics: abstract-meaning-representation, amr, machine-learning, natural-language-processing, nlp
- Language: Python
- Homepage:
- Size: 694 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
Awesome Lists containing this project
README
# AMR annotation and feature generation
[![PyPI][pypi-badge]][pypi-link]
[![Python 3.11][python311-badge]][python311-link]
[![Build Status][build-badge]][build-link]Provides support for AMR graph manipulation, annotations and feature
generation.Features:
* Annotation in AMR metadata. For example, sentence types found in the Proxy
report AMR corpus.
* AMR token alignment as [spaCy] components.
* Integrates natural language parsing and features with Zensols
[zensols.nlparse] library.
* A scoring API that includes [Smatch] and [WLK], which extends a more general
[NLP scoring module].
* AMR parsing ([amrlib]) and AMR co-reference ([amr_coref]).
* Command line and API utilities for AMR graph Penman graphs, debugging and
files.
* Tools for [training and evaluating](training) new AMR parse (text to graph)
and generation (graph to text) models.
* A method for re-indexing and updating AMR graph variables so that all in a
document collection are unique.## Documentation
* [Full documentation](https://plandes.github.io/amr/index.html).
* [API reference](https://plandes.github.io/amr/api.html)## Installing
The library can be installed with pip from the [pypi] repository:
```bash
pip3 install zensols.amr
```### Installing the Gsii Model
The Gsii model link expires and requires a manual download of the model. To
install it, do the following:1. Download the [Gsii model] (click "direct download").
1. Move the file to the local directory.
1. Install the file by forcing a test parse:
```bash
amr parse 'Test sentence.' --override \
amr_parse_gsii_resource.url=file:model_parse_gsii-v0_1_0.tar.gz
```## Usage
```python
from penman.graph import Graph
from zensols.nlp import FeatureDocument, FeatureDocumentParser
from zensols.amr import AmrDocument, AmrSentence, Dumper, ApplicationFactorysent: str = """
He was George Washington and first president of the United States.
He was born On February 22, 1732.""".replace('\n', ' ').strip()
# get the AMR document parser
doc_parser: FeatureDocumentParser = ApplicationFactory.get_doc_parser()# the parser creates a NLP centric feature document as provided in the
# zensols.nlp package
doc: FeatureDocument = doc_parser(sent)# the AMR object graph data structure is provided in the feature document
amr_doc: AmrDocument = doc.amr# dump a human readable output of the AMR document
amr_doc.write()# get the first AMR sentence instance
amr_sent: AmrSentence = amr_doc.sents[0]
print('sentence:')
print(' ', amr_sent.text)
print('tuples:')# show the Penman graph representation
pgraph: Graph = amr_sent.graph
print(f'variables: {", ".join(pgraph.variables())}')
for t in pgraph.triples:
print(' ', t)
print('edges:')
for e in pgraph.edges():
print(' ', e)# visualize the graph as a PDF
dumper: Dumper = ApplicationFactory.get_dumper()
dumper(amr_doc)
```Per the example, the [t5.conf](test-resources/t5.conf) and
[gsii.conf](test-resources/gsii.conf) configuration show how to include
configuration needed per AMR model. These files can also be used directly with
the `amr` command using the `--config` option.However, the other resources in the example must be imported unless you
redefine them yourself.### Library
When adding the `amr` spaCy pipeline component, the `doc._.amr` attribute is
set on the `Doc` instance. You can either configure spaCy yourself, or you can
use the configuration files in [test-resources](test-resources) as an example
using the [zensols.util configuration framework]. The command line application
provides an example how to do this, along with the [test
case](test/python/test_amr.py).### Command Line
This library is written mostly to be used by other program, but the command
line utility `amr` is also available to demonstrate its usage and to generate
ARM graphs on the command line.To parse:
```lisp
$ amr parse -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
# ::snt This is a test of the AMR command line utility.
(t / test-01
:ARG1 (u / utility
:mod (c / command-line)
:name (n / name
:op1 "AMR"
:toki1 "6")
:toki1 "9")
:domain (t2 / this
:toki1 "0")
:toki1 "3")
```To generate graphs in PDF format:
```bash
$ amr plot -c test-resources/t5.conf 'This is a test of the AMR command line utility.'
wrote: amr-graph/this-is-a-test-of-the-amr-comm.pdf
```## Training
This package uses the [amrlib] training, but adds a command line and
downloadable corpus aggregation / API. To train:1. Choose a model (i.e. SPRING, T5).
1. Optionally edit the [train configuration](train-config) directory of the
model you choose.
1. Optionally edit the `resources/train.yml` to select/add more corpora (see
[Adding Corpora](adding-corpora)).
1. Train the model: `./amr --config train-config/.conf`### Pretrained Models
This library was used to train all of the [amrlib] models (using the same
checkpoints as [amrlib]), except the T5 Base v1 model, with additional
examples from publicly available human annotated corpora. The differences of
these trained models include:* None of the models were tested against a training set, only the development
SMATCH scores are available. This was intentional to provide more training
examples.
* The AMR Release 3.0 ([LDC2020T02]) test set was added to the training set.
* The [Little Prince and Bio AMR](https://amr.isi.edu/download.html) corpora
where used to train the models. The first 85% of the AMR sentences were
added to training set and the remaining 15% were added to the development
set.
* The mini-batch size changed for `generate-t5wtense-base` due to memory
constraints.
* The number of training epochs were increased to account for the additional
number of training examples.
* Models have the same naming conventions but are prefixed with `zsl`.
* Generative models were trained on graphs metadata annotated by the Sci spaCy
`en_core_sci_md` model.The performance of these models:
| Model Name | Model Type | Checkpoint | Performance |
|----------------------|------------|------------------------|---------------|
| `zsl_spring` | parse | [facebook/bart-large] | SMATCH: 81.26 |
| `zsl_xfm_bart_base` | parse | [facebook/bart-base] | SMATCH: 80.5 |
| `zsl_xfm_bart_large` | parse | [facebook/bart-large] | SMATCH: 82.7 |
| `zsl_t5wtense_base` | generative | [t5-base] | BLEU: 42.20 |
| `zsl_t5wtense_large` | generative | [google/flan-t5-large] | BLEU: 44.01 |These models are available upon request.
### Adding Corpora
You can retrain your own model and add additional training corpora by modifying
the list of `${amr_prep_manager:preppers}` in `resources/train.yml`. This file
defines downloaded corpora for the Little Prince and Bio AMR corpora. To use
the AMR 3.0 release, add the LDC downloaded file to (a new) `download`
directory.## Attribution
This project, or reference model code, uses:
* Python 3.11
* [amrlib] for AMR parsing.
* [amr_coref] for AMR co-reference
* [spaCy] for natural language parsing.
* [zensols.nlparse] for natural language features.
* [Smatch] (Cai and Knight. 2013) and [WLK] (Opitz et. al. 2021) for scoring.## Citation
If you use this project in your research please use the following BibTeX entry:
```bibtex
@inproceedings{landes-etal-2023-deepzensols,
title = "{D}eep{Z}ensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility",
author = "Landes, Paul and
Di Eugenio, Barbara and
Caragea, Cornelia",
editor = "Tan, Liling and
Milajevs, Dmitrijs and
Chauhan, Geeticka and
Gwinnup, Jeremy and
Rippeth, Elijah",
booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.nlposs-1.16",
pages = "141--146"
}
```## Changelog
An extensive changelog is available [here](CHANGELOG.md).
## Community
Please star this repository and let me know how and where you use this API.
Contributions as pull requests, feedback and any input is welcome.## License
[MIT License](LICENSE.md)
Copyright (c) 2021 - 2025 Paul Landes
[pypi]: https://pypi.org/project/zensols.amr/
[pypi-link]: https://pypi.python.org/pypi/zensols.amr
[pypi-badge]: https://img.shields.io/pypi/v/zensols.amr.svg
[python37-badge]: https://img.shields.io/badge/python-3.7-blue.svg
[python37-link]: https://www.python.org/downloads/release/python-370
[python38-badge]: https://img.shields.io/badge/python-3.8-blue.svg
[python38-link]: https://www.python.org/downloads/release/python-380
[python311-badge]: https://img.shields.io/badge/python-3.11-blue.svg
[python311-link]: https://www.python.org/downloads/release/python-3110
[build-badge]: https://github.com/plandes/amr/workflows/CI/badge.svg
[build-link]: https://github.com/plandes/amr/actions[spaCy]: https://spacy.io
[amrlib]: https://github.com/bjascob/amrlib
[amr_coref]: https://github.com/bjascob/amr_coref
[Smatch]: https://github.com/snowblink14/smatch
[WLK]: https://github.com/flipz357/weisfeiler-leman-amr-metrics
[zensols.nlparse]: https://github.com/plandes/nlparse
[zensols.util configuration framework]: https://plandes.github.io/util/doc/config.html
[NLP scoring module]: https://plandes.github.io/nlparse/api/zensols.nlp.html#zensols-nlp-score
[LDC2020T02]: https://catalog.ldc.upenn.edu/LDC2020T02[facebook/bart-large]: https://huggingface.co/facebook/bart-large
[facebook/bart-base]: https://huggingface.co/facebook/bart-base
[t5-base]: https://huggingface.co/google-t5/t5-base
[google/flan-t5-large]: https://huggingface.co/google/flan-t5-large
[Gsii model]: https://u.pcloud.link/publink/show?code=XZD2z0XZOqRtS2mNMHhMG4UhXOCNO4yzeaLk