# srl-dep
**Author:** Tianze Shi
## About this repo
This repo contains the research code and scripts used in the paper [Semantic Role Labeling as Syntactic Dependency Parsing](https://arxiv.org/abs/2010.11170). This README gives a basic overview of the code structure and its major components. For further questions, please contact the authors directly.
## Code structure
The entry point to the package is [here](srl/parser.py). Calls to this package can be chained through the Fire CLI. The usual calling order is `build-vocab`, `create-parser`, `load-embeddings`, `train`, and finally `finish`.
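For example, a full training run might be chained into a single command line. The sketch below is only illustrative: the file names and per-command arguments are hypothetical placeholders (Fire separates chained calls with `-`), so consult [srl/parser.py](srl/parser.py) for the actual signatures.

```sh
# Hypothetical chained invocation via the Fire CLI; "-" ends the arguments
# of one call so the next can begin. All paths below are placeholders.
python srl/parser.py \
    build-vocab train.conllu - \
    create-parser - \
    load-embeddings embeddings.txt - \
    train train.conllu dev.conllu - \
    finish model.out
```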
An example inference script is [here](test.py); it calls `parser.evaluate(data)` after loading the model and embeddings. The official CoNLL evaluation script is available at [https://www.cs.upc.edu/~srlconll/soft.html](https://www.cs.upc.edu/~srlconll/soft.html). The F1 scores displayed during model training are NOT official F1 scores (though they are usually very close).
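In outline, inference follows the shape below. Everything except `parser.evaluate(data)` is a hypothetical placeholder standing in for the repo's actual loading routines; see [test.py](test.py) for the real calls.

```python
# Sketch of inference, loosely following test.py.
# load_parser / read_data are hypothetical placeholders, not the repo's API;
# only parser.evaluate(data) is taken from this README.
parser = load_parser("model.out")          # hypothetical: restore a trained model
parser.load_embeddings("embeddings.txt")   # hypothetical: attach pre-trained embeddings
data = read_data("test.conllu")            # hypothetical: read evaluation data
print(parser.evaluate(data))               # from the README: run evaluation
```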
### SRL parsing module
The main parsing module is the Python class `SRLDepParser` in [this file](srl/modules.py). The back-and-forth conversion algorithms between SRL structures and dependency trees, tuned on OntoNotes 5.0 data, are in [this file](srl/conversion.py).
### Pre-trained word embeddings
To speed up loading, the embedding files can be trimmed down to only the vocabulary seen in the data. The trimming script is [here](srl/filter_embeddings.py).
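The underlying idea is simple enough to sketch. The snippet below is an illustrative re-implementation, not the repo's `filter_embeddings.py`: it keeps only the rows of a text-format embedding file (e.g., GloVe) whose first field is a token in the given vocabulary.

```python
# Illustrative embedding-trimming sketch (not the repo's filter_embeddings.py).
# Text-format embeddings store one "token dim1 dim2 ..." row per line;
# we keep a row only if its token appears in the data's vocabulary.
def trim_embeddings(emb_path: str, vocab: set, out_path: str) -> None:
    with open(emb_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            token = line.split(" ", 1)[0]  # first field is the word itself
            if token in vocab:
                dst.write(line)

# Placeholder usage: the vocabulary would normally come from build-vocab.
trim_embeddings("glove.6B.100d.txt", {"the", "dog", "barks"}, "glove.trimmed.txt")
```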
### Data preparation
Data preparation scripts live under the `data_prep` folder; a consolidated run order is sketched after the steps below.
Prerequisite: [Stanford CoreNLP with English and Chinese models v3.9.2](https://stanfordnlp.github.io/CoreNLP/history.html)
1. Follow http://cemantix.org/data/ontonotes.html to prepare the data, using the v12 data release
2. For the Chinese data (http://conll.cemantix.org/2012/data.html), copy the folders under the correct splits
3. Use the train, dev, and conll-2012-test splits for English, and the train, dev, and test splits for Chinese
4. Run `aggregate.sh`
5. Run `space_to_tab.sh`
6. Run `constituency_tree.sh`
7. Run `english_dep_tree.sh` and `english_fuse.py` for English data preparation
8. Run `chinese_dep_tree.sh` and `chinese_fuse.py` for Chinese data preparation
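Put together, a run of the pipeline looks roughly like the sketch below. The scripts may expect arguments or environment paths not shown here, so read each one before running.

```sh
# Hypothetical end-to-end run of the data_prep scripts in the order above.
# Any arguments or paths the scripts require are omitted; check each script first.
cd data_prep
./aggregate.sh
./space_to_tab.sh
./constituency_tree.sh
./english_dep_tree.sh && python english_fuse.py   # English preparation
./chinese_dep_tree.sh && python chinese_fuse.py   # Chinese preparation
```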