https://github.com/borealisai/cross_domain_coherence

A Cross-Domain Transferable Neural Coherence Model https://arxiv.org/abs/1905.11912

Host: GitHub
URL: https://github.com/borealisai/cross_domain_coherence
Owner: BorealisAI
License: other
Created: 2019-06-14T21:55:37.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2020-07-08T15:19:08.000Z (almost 6 years ago)
Last Synced: 2025-07-09T03:07:28.997Z (11 months ago)
Language: Python
Size: 32.2 KB
Stars: 24
Watchers: 4
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Cross-Domain Coherence Modeling

A Cross-Domain Transferable Neural Coherence Model

Paper published in ACL 2019: [arxiv.org/abs/1905.11912](https://arxiv.org/abs/1905.11912)

This implementation is based on PyTorch 0.4.1.

### Dataset

To download the dataset:

```
python prepare_data.py
```

This downloads the WikiCoherence dataset that we constructed, 300-dimensional GloVe embeddings, and the pre-trained InferSent model.

The WikiCoherence data contains:

- 7 categories under **Person**:
    - Artist
    - Athlete
    - Politician
    - Writer
    - MilitaryPerson
    - OfficeHolder
    - Scientist
- 3 categories from unrelated domains:
    - Plant
    - EducationalInstitution
    - CelestialBody
- `parsed_wsj`: the original split for the Wall Street Journal (WSJ) corpus
- `parsed_random`: a random split of all paragraphs from the seven **Person** categories into a training set and a test set

Check `config.py` for the `data_name` of each setting.

### Preprocessing

Permute the original documents or paragraphs to obtain the negative samples used for evaluation:

```
python preprocess.py --data_name <data_name>
```
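
The permutation step can be sketched as below. This is an illustration only, not the code in `preprocess.py`: the function name `permute_document`, the number of negatives, the seed, and the retry limit are all assumptions made for the example.

```python
import random


def permute_document(sentences, num_negatives=3, seed=0, max_tries=100):
    """Return up to `num_negatives` distinct shuffled copies of `sentences`.

    Each copy holds the same sentences in an order that differs from the
    original, so it can serve as an incoherent (negative) sample.
    """
    rng = random.Random(seed)
    original = tuple(sentences)
    negatives = set()
    tries = 0
    # Bound the number of attempts so very short documents cannot loop forever.
    while len(negatives) < num_negatives and tries < max_tries:
        tries += 1
        shuffled = list(original)
        rng.shuffle(shuffled)
        candidate = tuple(shuffled)
        if candidate != original:
            negatives.add(candidate)
    # Sort so the result does not depend on set iteration order.
    return [list(n) for n in sorted(negatives)]
```

A document with a single sentence has no alternative ordering, so the function returns an empty list for it.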

### LM Pre-training

Train the LM with the following command:

```
python train_lm.py --data_name <data_name>
python train_lm.py --data_name <data_name> --reverse
```

The pre-trained models will be saved in `./checkpoint`.
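
To launch both runs for one dataset from a script, the two commands can be assembled programmatically. The sketch below is a convenience wrapper, not part of the repository: `lm_commands` and `train_both` are hypothetical helpers, and only the script name and flags are taken from this README.

```python
import subprocess


def lm_commands(data_name):
    """Build the two LM training commands (without and with --reverse)."""
    base = ["python", "train_lm.py", "--data_name", data_name]
    return [base, base + ["--reverse"]]


def train_both(data_name):
    """Run both commands in sequence, stopping at the first failure."""
    for command in lm_commands(data_name):
        subprocess.run(command, check=True)
```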

### Training and Evaluation

To evaluate our proposed models:

```
python run_bigram_coherence.py --data_name <data_name> --sent_encoder <sent_encoder> [--bidirectional]
```

where `sent_encoder` can be `average_glove`, `infersent`, or `lm_hidden`.
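
As an illustration of the simplest of these encoders, averaging word vectors could look like the sketch below. It assumes that `average_glove` means the mean of the GloVe vectors of a sentence's tokens; the function name `average_embedding`, the dictionary-based lookup, and the handling of unknown tokens are assumptions for this example, not the repository's implementation.

```python
def average_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens found in `embeddings`.

    Tokens without a vector are skipped. If no token has a vector,
    a zero vector of length `dim` is returned.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue
        for i, value in enumerate(vector):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


# Toy 2-dimensional vectors, made up for the example.
toy = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
sentence_vector = average_embedding(["the", "cat", "sat"], toy, dim=2)
```

Here `sat` has no vector and is ignored, so `sentence_vector` is the mean of the two known vectors.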

```
python eval.py --data_name <data_name> --sent_encoder <sent_encoder> [--bidirectional]
```

Running the above script repeats the experiment multiple times and reports the mean and standard deviation of the results.
The log is saved in `./log`.
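
The aggregation over repeated runs amounts to the following. This is a minimal sketch: `summarize_runs` is a hypothetical helper, the accuracy values are made up, and the use of the sample standard deviation is an assumption, since the README does not say which variant `eval.py` reports.

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) over per-run accuracies."""
    mean = statistics.mean(accuracies)
    # A single run has no spread to estimate, so report 0.0 in that case.
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


# Made-up accuracies from three hypothetical runs.
mean, std = summarize_runs([0.90, 0.92, 0.94])
```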

### Cite

If you find this codebase or our work useful, please cite:

```
@InProceedings{xu2019cross,
author = {Xu, Peng and Saghir, Hamidreza and Kang, Jin Sung and Long, Teng and Bose, Avishek Joey and Cao, Yanshuai and Cheung, Jackie Chi Kit},
title = {A Cross-Domain Transferable Neural Coherence Model},
booktitle = {The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
month = {July},
year = {2019},
publisher = {ACL}
}
```
