https://github.com/grig-guz/tree-content-structuring
Content structuring for NLG with discourse dependency trees.
- Host: GitHub
- URL: https://github.com/grig-guz/tree-content-structuring
- Owner: grig-guz
- Created: 2020-11-30T00:14:35.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-11-30T00:18:04.000Z (over 4 years ago)
- Last Synced: 2025-01-20T10:34:22.818Z (5 months ago)
- Topics: content-structuring, nlg, nlg-dataset, rst
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Domain-Independent Neural Text Structuring
This is the code for our [paper](https://www.aclweb.org/anthology/2020.findings-emnlp.281/) on text structuring with silver-standard discourse trees from the [MEGA-DT treebank](https://www.cs.ubc.ca/cs-research/lci/research-groups/natural-language-processing/mega_dt.html).
## Requirements
* Python (3.6+)
* [Pytorch](https://pytorch.org/) (1.3.0+)
* [dgl](https://www.dgl.ai/) (exactly 0.4.2)
* [Transformers](https://huggingface.co/transformers/) (3.0.2)
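A minimal environment setup along these lines should work (a sketch; package names as published on PyPI, and the exact PyTorch install command may vary by platform, see pytorch.org):

```bash
# Version pins mirror the requirements above; dgl must be exactly 0.4.2.
pip install "torch>=1.3.0" dgl==0.4.2 transformers==3.0.2
```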
## Running experiments
1. Create a folder named "data".
2. Download the pickled versions of MEGA-DT [here](https://www.todo) (100k train, 250k train, 5k val, 15k test) and place them in the "data" folder.
3. Run the training/testing scripts as described below. Each script accepts a single numeric argument (1 or 2) indicating whether the model should be trained on the 100k or the 250k version of MEGA-DT; see the example after this list.
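For instance, passing 1 selects the 100k split (all scripts below follow this pattern):

```bash
bash scripts/train_dep.sh 1   # train the dependency model on the 100k split
bash scripts/train_dep.sh 2   # same, but on the 250k split
```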
#### Dependency Model
To train or evaluate the dependency model:
```bash
bash scripts/train_dep.sh dataset_id
bash scripts/eval_dep.sh dataset_id
```
#### Pointer Model
```bash
bash scripts/train_pointer.sh dataset_id
bash scripts/eval_pointer.sh dataset_id
```
#### Dependency no-pointer Baseline
```bash
bash scripts/train_dep_treetrain_baseline.sh dataset_id
bash scripts/eval_dep_treetrain_baseline.sh dataset_id
```
#### Language Model Decoding Baseline
```bash
bash scripts/eval_lm_baseline.sh
```
## Configuration
You can set hyperparameters and the device type in the training/testing scripts for each model individually; the parameter values used in our experiments are already specified there.
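The exact variable names depend on each script, but an edit would take roughly this shape (the names below are illustrative, not taken from the repository):

```bash
# Hypothetical excerpt from a training script; actual variable names may differ.
DEVICE="cuda:0"   # set to "cpu" to run without a GPU
LR=1e-4           # learning rate
BATCH_SIZE=16
```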
## Citation
```bibtex
@inproceedings{guz-carenini-2020-towards,
title = "Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks",
author = "Guz, Grigorii and
Carenini, Giuseppe",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.281",
pages = "3141--3152",
}
```