https://github.com/borealisai/cross_domain_coherence
A Cross-Domain Transferable Neural Coherence Model https://arxiv.org/abs/1905.11912
- Host: GitHub
- URL: https://github.com/borealisai/cross_domain_coherence
- Owner: BorealisAI
- License: other
- Created: 2019-06-14T21:55:37.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-07-08T15:19:08.000Z (almost 6 years ago)
- Last Synced: 2025-07-09T03:07:28.997Z (11 months ago)
- Language: Python
- Size: 32.2 KB
- Stars: 24
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Cross-Domain Coherence Modeling
A Cross-Domain Transferable Neural Coherence Model
Paper published in ACL 2019: [arxiv.org/abs/1905.11912](https://arxiv.org/abs/1905.11912)
This implementation is based on PyTorch 0.4.1.
### Dataset
To download the dataset:
```
python prepare_data.py
```
This downloads the WikiCoherence dataset we constructed, 300-dimensional GloVe embeddings, and a pre-trained InferSent model.
WikiCoherence contains:
- 7 categories under **Person**:
  - Artist
  - Athlete
  - Politician
  - Writer
  - MilitaryPerson
  - OfficeHolder
  - Scientist
- 3 categories from other, unrelated domains:
  - Plant
  - EducationalInstitution
  - CelestialBody
- parsed\_wsj: the original split for the Wall Street Journal (WSJ) corpus
- parsed\_random: all paragraphs of the seven categories under **Person**, randomly split into a training part and a testing part
Check `config.py` for the `data_name` value of each setting.
### Preprocessing
Permute the original documents or paragraphs to obtain the negative samples for evaluation:
```
python preprocess.py --data_name <data_name>
```
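As a rough illustration of what permuting a document means here, the sketch below shuffles the sentence order of a document to produce incoherent negative samples. It is not taken from `preprocess.py`; the function name `permuted_negatives` and its parameters `num_negatives` and `seed` are hypothetical and chosen only for this example.

```python
import random


def permuted_negatives(sentences, num_negatives=3, seed=0):
    """Return up to `num_negatives` distinct shuffled orderings of `sentences`.

    Hypothetical helper for illustration; the repository's own
    preprocessing may differ in how many permutations it draws.
    """
    rng = random.Random(seed)
    original = list(sentences)
    negatives = []
    # Bound the attempts so very short documents cannot loop forever.
    max_attempts = 50 * num_negatives
    attempts = 0
    while len(negatives) < num_negatives and attempts < max_attempts:
        attempts += 1
        candidate = list(original)
        rng.shuffle(candidate)
        # Keep only orderings that differ from the original and are new.
        if candidate != original and candidate not in negatives:
            negatives.append(candidate)
    return negatives


if __name__ == "__main__":
    doc = ["Sentence one.", "Sentence two.", "Sentence three.", "Sentence four."]
    for negative in permuted_negatives(doc):
        print(negative)
```

A document with a single sentence has no ordering other than the original, so the helper returns an empty list for it.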
### LM Pre-training
Train the language model (LM) with the following commands:
```
python train_lm.py --data_name <data_name>
python train_lm.py --data_name <data_name> --reverse
```
The pre-trained models will be saved in `./checkpoint`.
### Training and Evaluation
To evaluate our proposed models:
```
python run_bigram_coherence.py --data_name <data_name> --sent_encoder <sent_encoder> [--bidirectional]
```
where `sent_encoder` can be `average_glove`, `infersent`, or `lm_hidden`.
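To give a sense of the simplest of these options, the sketch below averages word vectors into a single sentence vector, which is what the name `average_glove` suggests. This is an assumption based on the name only, not the repository's code; `average_embedding` and the toy two-dimensional vectors are made up for illustration (the real GloVe vectors are 300-dimensional).

```python
def average_embedding(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped.

    Hypothetical helper: `vectors` maps a token to a list of `dim` floats.
    Returns a zero vector if no token is found in `vectors`.
    """
    total = [0.0] * dim
    count = 0
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            continue  # out-of-vocabulary token
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [x / count for x in total]


if __name__ == "__main__":
    # Toy vectors for illustration only.
    toy_vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
    print(average_embedding(["the", "cat", "zzz"], toy_vectors, dim=2))
```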
```
python eval.py --data_name <data_name> --sent_encoder <sent_encoder> [--bidirectional]
```
Running the above script repeats the experiment multiple times and reports the mean and standard deviation of the results.
The log will be saved in `./log`.
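For reference, a mean and standard deviation over repeated runs can be computed as in the sketch below. It is not the code in `eval.py`: `summarize_runs` is a hypothetical helper, the accuracy values are placeholders rather than results from the paper, and whether the script reports the sample or the population standard deviation is not stated here (the sketch uses the sample version).

```python
import statistics


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    mean = statistics.mean(accuracies)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(accuracies) if len(accuracies) > 1 else 0.0
    return mean, std


if __name__ == "__main__":
    # Placeholder values for illustration, not measured results.
    placeholder_runs = [0.90, 0.92, 0.94]
    mean, std = summarize_runs(placeholder_runs)
    print(f"mean={mean:.4f} std={std:.4f}")
```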
### Cite
If you found this codebase or our work useful, please cite:
```
@InProceedings{xu2019cross,
author = {Xu, Peng and Saghir, Hamidreza and Kang, Jin Sung and Long, Teng and Bose, Avishek Joey and Cao, Yanshuai and Cheung, Jackie Chi Kit},
title = {A Cross-Domain Transferable Neural Coherence Model},
booktitle = {The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
month = {July},
year = {2019},
publisher = {ACL}
}
```