# srl-dep
**Author:** Tianze Shi
## About this repo
This repo contains the research code and scripts used in the paper [Semantic Role Labeling as Syntactic Dependency Parsing](https://arxiv.org/abs/2010.11170). This README gives a basic overview of the code structure and its major components. For further questions, please contact the authors directly.
## Code structure
The entry point to the package is [here](srl/parser.py). Calls to this package can be chained through the Fire CLI. The usual calling order is `build-vocab`, `create-parser`, `load-embeddings`, `train`, and finally `finish`.
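For example, a full training run might be chained into a single command line. The sketch below is only illustrative: the file names and per-command arguments are hypothetical placeholders (Fire separates chained calls with `-`), so consult [srl/parser.py](srl/parser.py) for the actual signatures.

```sh
# Hypothetical chained invocation via the Fire CLI; "-" ends the arguments
# of one call so the next can begin. All paths below are placeholders.
python srl/parser.py \
    build-vocab train.conllu - \
    create-parser - \
    load-embeddings embeddings.txt - \
    train train.conllu dev.conllu - \
    finish model.out
```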
An example inference script is [here](test.py); it calls `parser.evaluate(data)` after loading the model and embeddings. The official CoNLL evaluation script is available at [https://www.cs.upc.edu/~srlconll/soft.html](https://www.cs.upc.edu/~srlconll/soft.html). The F1 scores displayed during model training are NOT official F1 scores (though they are usually very close).
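In outline, inference follows the shape below. Everything except `parser.evaluate(data)` is a hypothetical placeholder standing in for the repo's actual loading routines; see [test.py](test.py) for the real calls.

```python
# Sketch of inference, loosely following test.py.
# load_parser / read_data are hypothetical placeholders, not the repo's API;
# only parser.evaluate(data) is taken from this README.
parser = load_parser("model.out")          # hypothetical: restore a trained model
parser.load_embeddings("embeddings.txt")   # hypothetical: attach pre-trained embeddings
data = read_data("test.conllu")            # hypothetical: read evaluation data
print(parser.evaluate(data))               # from the README: run evaluation
```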
### SRL parsing module
The main parsing module is the Python class `SRLDepParser` in [this file](srl/modules.py). The back-and-forth conversion algorithms between SRL structures and dependency trees, tuned on OntoNotes 5.0 data, are in [this file](srl/conversion.py).
### Pre-trained word embeddings
To speed up loading, the embedding files can be trimmed down to only the vocabulary seen in the data. The trimming script is [here](srl/filter_embeddings.py).
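The underlying idea is simple enough to sketch. The snippet below is an illustrative re-implementation, not the repo's `filter_embeddings.py`: it keeps only the rows of a text-format embedding file (e.g., GloVe) whose first field is a token in the given vocabulary.

```python
# Illustrative embedding-trimming sketch (not the repo's filter_embeddings.py).
# Text-format embeddings store one "token dim1 dim2 ..." row per line;
# we keep a row only if its token appears in the data's vocabulary.
def trim_embeddings(emb_path: str, vocab: set, out_path: str) -> None:
    with open(emb_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            token = line.split(" ", 1)[0]  # first field is the word itself
            if token in vocab:
                dst.write(line)

# Placeholder usage: the vocabulary would normally come from build-vocab.
trim_embeddings("glove.6B.100d.txt", {"the", "dog", "barks"}, "glove.trimmed.txt")
```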
### Data preparation
Data preparation scripts live under the `data_prep` folder; a consolidated run order is sketched after the steps below.
Prerequisite: [Stanford CoreNLP with English and Chinese models v3.9.2](https://stanfordnlp.github.io/CoreNLP/history.html)
1. Follow http://cemantix.org/data/ontonotes.html to prepare the data, using the v12 data release
2. For the Chinese data (http://conll.cemantix.org/2012/data.html), copy the folders under the correct splits
3. Use the train, dev, and conll-2012-test splits for English, and the train, dev, and test splits for Chinese
4. Run `aggregate.sh`
5. Run `space_to_tab.sh`
6. Run `constituency_tree.sh`
7. Run `english_dep_tree.sh` and `english_fuse.py` for English data preparation
8. Run `chinese_dep_tree.sh` and `chinese_fuse.py` for Chinese data preparation
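Put together, a run of the pipeline looks roughly like the sketch below. The scripts may expect arguments or environment paths not shown here, so read each one before running.

```sh
# Hypothetical end-to-end run of the data_prep scripts in the order above.
# Any arguments or paths the scripts require are omitted; check each script first.
cd data_prep
./aggregate.sh
./space_to_tab.sh
./constituency_tree.sh
./english_dep_tree.sh && python english_fuse.py   # English preparation
./chinese_dep_tree.sh && python chinese_fuse.py   # Chinese preparation
```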