{"id":13675620,"url":"https://github.com/IBM/transition-amr-parser","last_synced_at":"2025-04-28T23:30:49.259Z","repository":{"id":45059592,"uuid":"213517609","full_name":"IBM/transition-amr-parser","owner":"IBM","description":"SoTA Abstract Meaning Representation (AMR) parsing with word-node alignments in Pytorch. Includes checkpoints and other tools such as statistical significance Smatch.","archived":false,"fork":false,"pushed_at":"2025-01-01T16:44:30.000Z","size":5748,"stargazers_count":251,"open_issues_count":12,"forks_count":50,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-04-18T17:07:28.639Z","etag":null,"topics":["abstract-meaning-representation","amr","amr-graphs","amr-parser","amr-parsing","machine-learning","nlp","semantic-parsing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IBM.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-08T01:03:49.000Z","updated_at":"2025-04-04T02:09:25.000Z","dependencies_parsed_at":"2025-01-02T01:31:08.478Z","dependency_job_id":null,"html_url":"https://github.com/IBM/transition-amr-parser","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ftransition-amr-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ftransition-amr-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ftransition-amr-pars
er/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBM%2Ftransition-amr-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IBM","download_url":"https://codeload.github.com/IBM/transition-amr-parser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251404426,"owners_count":21584090,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstract-meaning-representation","amr","amr-graphs","amr-parser","amr-parsing","machine-learning","nlp","semantic-parsing"],"created_at":"2024-08-02T12:00:50.497Z","updated_at":"2025-04-28T23:30:44.250Z","avatar_url":"https://github.com/IBM.png","language":"Python","funding_links":[],"categories":["Papers and Models","Python"],"sub_categories":[],"readme":"Transition-based Neural Parser\n============================\n\nState-of-the-Art Abstract Meaning Representation (AMR) parsing, see [papers\nwith code](https://paperswithcode.com/task/amr-parsing). Models both the\ndistribution over graphs and alignments with a transition-based approach. 
The parser\nsupports generic text-to-graph parsing as long as the graph is expressed in [Penman\nnotation](https://penman.readthedocs.io/en/latest/notation.html).\n\nSome of the main features:\n\n- [Smatch](https://github.com/snowblink14/smatch) wrapper providing [significance testing](scripts/README.md#paired-boostrap-significance-test-for-Smatch) for Smatch and [MBSE](scripts/README.md#maximum-bayesian-smatch-ensemble-mbse) ensembling.\n- `Structured-BART` [(Zhou et al 2021b)](https://aclanthology.org/2021.emnlp-main.507/) with [trained checkpoints](#available-pretrained-model-checkpoints) for document-level AMR [(Naseem et al 2022)](https://aclanthology.org/2022.naacl-main.256), MBSE [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) and latent alignments training [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464)\n- `Structured-mBART` for multi-lingual support (EN, DE, ZH, IT) [(Lee et al 2022)](https://arxiv.org/abs/2112.07790)\n- Action-Pointer Transformer (`APT`) [(Zhou et al 2021)](https://www.aclweb.org/anthology/2021.naacl-main.443), check out the `action-pointer` branch\n- `Stack-Transformer` [(Fernandez Astudillo et al 2020)](https://www.aclweb.org/anthology/2020.findings-emnlp.89), check out the `stack-Transformer` branch\n\n## Install Instructions\n\nCreate and activate a virtual environment with Python 3.8, for example\n\n```\nconda create -y -p ./cenv_x86 python=3.8\nconda activate ./cenv_x86\n```\n\nAlternatively, use `virtualenv` and `pyenv` to manage Python versions. Note that\nall scripts source a `set_environment.sh` script that you can use to activate\nyour virtual environment as above and set environment variables. If you do not need it,\njust create an empty version\n\n```bash\n# or, e.g., put `conda activate ./cenv_x86` inside\ntouch set_environment.sh\n```\n\nThen install the parser package using pip. You will need to manually install\n`torch-scatter` since it is custom built for CUDA. Here we specify the\ncall for `torch 1.13.1` and `cuda 11.7`. 
See the [torch-scatter\nrepository](https://pypi.org/project/torch-scatter/) to find the appropriate\ninstallation instructions.\n\n**For macOS users**\n\nInstall the CPU version of torch-scatter; model training is not fully supported on this platform.\n\n```bash\npip install transition-neural-parser\n# for Linux users\npip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html\n# CPU installation for macOS\n# pip install torch-scatter\n```\n\nIf you plan to edit the code, clone and install instead\n\n```bash\n# clone this repo (see link above), then\ncd transition-neural-parser\npip install --editable .\npip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html\n```\n\nIf you want to train a document-level AMR parser you will also need\n\n```bash\ngit clone https://github.com/IBM/docAMR.git\ncd docAMR\npip install .\ncd ..\n```\n\n## Parse with a pretrained model\n\nHere is an example of how to download and use a pretrained AMR parser in Python\n\n```python\nfrom transition_amr_parser.parse import AMRParser\n\n# Download the AMR3-structbart-L model and save it to the cache\nparser = AMRParser.from_pretrained('AMR3-structbart-L')\ntokens, positions = parser.tokenize('The girl travels and visits places')\n\n# Use parse_sentence() for single sentences or parse_sentences() for a batch\nannotations, machines = parser.parse_sentence(tokens)\n\n# Print Penman notation\nprint(annotations)\n\n# Print Penman notation without JAMR, with ISI\namr = machines.get_amr()\nprint(amr.to_penman(jamr=False, isi=True))\n\n# Plot the graph (requires matplotlib)\namr.plot()\n\n```\n\nNote that Smatch does not support ISI-type alignments and gives worse results when they are present.\nSet `isi=False` to remove them. 
\n\nYou can also use the command line to run a pretrained model to parse a file:\n\n```bash\namr-parse -c $in_checkpoint -i $input_file -o file.amr\n```\n\nDownloadable models can also be invoked with `-m \u003cconfig\u003e`.\n\nNote that Smatch does not support ISI and gives worse results. Use `--no-isi`\nto store alignments in `::alignments` meta data. Also use `--jamr` to add JAMR\nannotations in meta-data.\n\n## Document-level Parsing\n\nDocument-level parsing represents co-reference using `:same-as` edges. To change\nthe representation and merge the co-referent nodes as in the paper, please refer\nto [the DocAMR repo](https://github.com/IBM/docAMR.git).\n\n```python\nfrom transition_amr_parser.parse import AMRParser\n\n# Download and save the docamr model to cache\nparser = AMRParser.from_pretrained('doc-sen-conll-amr-seed42')\n\n# Sentences in the doc\ndoc = [\"Hailey likes to travel.\", \"She is going to London tomorrow.\", \"She will walk to Big Ben when she goes to London.\"]\n\n# tokenize sentences if not already tokenized\ntok_sentences = []\nfor sen in doc:\n    tokens, positions = parser.tokenize(sen)\n    tok_sentences.append(tokens)\n\n# parse_docs() takes a list of docs as input\nannotations, machines = parser.parse_docs([tok_sentences])\n\n# Print Penman notation\nprint(annotations[0])\n\n# Print Penman notation without JAMR, with ISI\namr = machines[0].get_amr()\nprint(amr.to_penman(jamr=False, isi=True))\n\n# Plot the graph (requires matplotlib)\namr.plot()\n\n```\n\nTo parse a document from the command line, provide an input file `$doc_input_file`: a\ntext file where each line is a sentence in the document and there is a newline\n('\\n') separating every doc (even at the end).\n\n\n```bash\namr-parse -c $in_checkpoint --in-doc $doc_input_file -o file.docamr\n```\n\n\n## Available Pretrained Model Checkpoints\n\nThe models downloaded using 
`from_pretrained()` will be stored to the pytorch\ncache folder under:\n```python\ncache_dir = torch.hub._get_torch_home()\n```\n\nThis table shows you available pretrained model names to download;\n\n| pretrained model name      | corresponding file name                               | paper                                                           | beam10-Smatch |\n|:--------------------------:|:-----------------------------------------------------:|:---------------------------------------------------------------:|:-------------:|\n| AMR3-structbart-L-smpl     | amr3.0-structured-bart-large-neur-al-sampling5-seed42 | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) PR     | 82.9 (beam1)  |\n| AMR3-structbart-L          | amr3.0-structured-bart-large-neur-al-seed42           | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) MAP    | 82.6          |\n| AMR2-structbart-L          | amr2.0-structured-bart-large-neur-al-seed42           | [(Drozdov et al 2022)](https://arxiv.org/abs/2205.01464) MAP    | 84.0          |\n| AMR2-joint-ontowiki-seed42 | amr2joint_ontowiki2_g2g-structured-bart-large-seed42  | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9          |\n| AMR2-joint-ontowiki-seed43 | amr2joint_ontowiki2_g2g-structured-bart-large-seed43  | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9          |\n| AMR2-joint-ontowiki-seed44 | amr2joint_ontowiki2_g2g-structured-bart-large-seed44  | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 85.9          |\n| AMR3-joint-ontowiki-seed42 | amr3joint_ontowiki2_g2g-structured-bart-large-seed42  | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4          |\n| AMR3-joint-ontowiki-seed43 | amr3joint_ontowiki2_g2g-structured-bart-large-seed43  | [(Lee et al 2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4          |\n| AMR3-joint-ontowiki-seed44 | amr3joint_ontowiki2_g2g-structured-bart-large-seed44  | [(Lee et al 
2022)](https://arxiv.org/abs/2112.07790) (ensemble) | 84.4          |\n| doc-sen-conll-amr-seed42   | both_doc+sen_trainsliding_ws400x100-seed42            |                                                                 | 82.3\u003csup\u003e1\u003c/sup\u003e/71.8\u003csup\u003e2\u003c/sup\u003e |\n\n\u003csup\u003e1 Smatch on AMR3.0 sentences\u003c/sup\u003e\n\n\u003csup\u003e2 Smatch on AMR3.0 Multi-Sentence dataset \u003c/sup\u003e\n\nContact the authors to obtain the trained `ibm-neural-aligner`. For the\nensemble we provide the three seeds. Following fairseq conventions, to run the\nensemble just give the three checkpoint paths joined by `:` to the normal\ncheckpoint argument `-c`. Note that the checkpoints were trained with the\n`v0.5.1` tokenizer; this reduces performance by `0.1` on `v0.5.2` tokenized\ndata.\n\nNote that we always report the average of three seeds in papers, while these are\nindividual models. A fast way to test models standalone is\n\n    bash tests/standalone.sh configs/\u003cconfig\u003e.sh\n\n## Training a model\n\nYou first need to pre-process and align the data. For AMR2.0 do\n\n```bash\nconda activate ./cenv_x86 # activate parser environment\npython scripts/merge_files.py /path/to/LDC2017T10/data/amrs/split/ DATA/AMR2.0/corpora/\n```\n\nYou will also need to unzip the precomputed BLINK cache. See issues in this repository to get the cache file (or the link above for IBM-ers).\n\n```\nunzip /path/to/linkcache.zip\n```\n\nTo launch train/test use (this will also run the aligner)\n\n```\nbash run/run_experiment.sh configs/amr2.0-structured-bart-large.sh\n```\n\nTraining will store and evaluate all checkpoints by default (see config's\n`EVAL_INIT_EPOCH`) and select the one with the best dev Smatch. 
This needs a lot of\nspace, but you can launch a parallel job that will perform evaluation and delete\ncheckpoints not in the top `5`\n\n```\nbash run/run_model_eval.sh configs/amr2.0-structured-bart-large.sh\n```\n\nYou can check training status with\n\n```\npython run/status.py -c configs/amr2.0-structured-bart-large.sh\n```\n\nUse `--results` to check for scores once models are finished.\n\nWe include code to launch parallel jobs in the LSF job scheduler. This can be\nadapted for other schedulers, e.g. Slurm; see [here](run/lsf/README.md).\n\n## Initialize with WatBART\n\nTo load WatBART instead of BART, just uncomment the following line and provide the path\n\n```\ninitialize_with_watbart=/path/to/checkpoint_best.pt\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIBM%2Ftransition-amr-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIBM%2Ftransition-amr-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIBM%2Ftransition-amr-parser/lists"}