https://github.com/hangyav/biadapt
- Host: GitHub
- URL: https://github.com/hangyav/biadapt
- Owner: hangyav
- License: apache-2.0
- Created: 2018-05-01T14:52:24.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-07-31T12:31:46.000Z (over 6 years ago)
- Last Synced: 2024-11-12T17:50:22.041Z (2 months ago)
- Language: Jupyter Notebook
- Size: 431 KB
- Stars: 1
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Methods for Domain Adaptation of Bilingual Tasks
This repository contains the implementation of the work *[Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable](http://aclweb.org/anthology/P18-1075)*.
We use off-the-shelf systems for the downstream tasks. The modified code can also be found in the repository.
## Cite
```
@InProceedings{P18-1075,
author = "Hangya, Viktor
and Braune, Fabienne
and Fraser, Alexander
and Sch{\"u}tze, Hinrich",
title = "Two Methods for Domain Adaptation of Bilingual Tasks: Delightfully Simple and Broadly Applicable",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "810--820",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1075"
}
```
## Requirements
* Python 3.5
* Dependencies in requirements.txt
```sh
pip install -r requirements.txt
```
## Cross Lingual Sentiment Classification
### Target-ignorant system
* Follow the procedure in the _Semi-supervised_ section below
* Set the visit and walker weights to 0.0
### Target-aware system
* As the target-aware system, we used the method of [(Zhang et al., 2016)](https://github.com/SUTDNLP/NNTargetedSentiment)
* To convert the data to IOB format, use the script *scripts/to_iob.py*
### Semi-supervised
* For the semi-supervised sentiment system, we modified the original implementation of [(Haeusser et al. 2017)](https://github.com/haeusser/learning_by_association)
* We added the implementation of [(Kim (2014)’s CNN-non-static)](https://github.com/yoonkim/CNN_sentence)
* An example script demonstrating the use of the system: *scripts/run_semisup_sentiment.sh*
### Data
* Domain specific data: We provide the tweet IDs for the 22M_tweets dataset. To download them, run:
```
wget http://www.cis.uni-muenchen.de/~hangyav/data/22M_tweet_ids.tar.bz2 -O - | tar -xj
```
* General domain data: [OpenSubtitles](http://opus.nlpl.eu/OpenSubtitles2016.php) parallel corpus
* Bilingual lexicon: BNC included in this repository
* Sentiment data: [RepLab](http://nlp.uned.es/replab2013/)
## Bilingual Lexicon Induction
### Cosine similarity
* *scripts/bll_with_threshold.py*: also used for fine-tuning the threshold on the development set (use *-h* to list the input parameters)
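The general idea of cosine-similarity lexicon induction with a threshold can be sketched as follows. This is a minimal illustration and does not reproduce *scripts/bll_with_threshold.py*: the function names, the toy embeddings and the choice of F1 as the tuning criterion are all assumptions made for the example. Run the script with *-h* for its real parameters.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def induce_lexicon(src_emb, tgt_emb, threshold):
    """Pair each source word with its nearest target word if similarity >= threshold."""
    pairs = {}
    for src_word, src_vec in src_emb.items():
        best_word, best_sim = None, -1.0
        for tgt_word, tgt_vec in tgt_emb.items():
            sim = cosine(src_vec, tgt_vec)
            if sim > best_sim:
                best_word, best_sim = tgt_word, sim
        if best_word is not None and best_sim >= threshold:
            pairs[src_word] = best_word
    return pairs


def f1_score(predicted, gold):
    """Simplified F1: gold maps each source word to a set of valid translations."""
    correct = sum(1 for s, t in predicted.items() if t in gold.get(s, set()))
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(src_emb, tgt_emb, dev_gold, candidates):
    """Return the candidate threshold with the highest dev-set F1 (first wins ties)."""
    return max(
        candidates,
        key=lambda t: f1_score(induce_lexicon(src_emb, tgt_emb, t), dev_gold),
    )


# Toy embeddings in a shared bilingual space, invented for illustration only.
src_emb = {"hund": [1.0, 0.1], "katze": [0.1, 1.0], "xyz": [1.0, 1.0]}
tgt_emb = {"dog": [0.9, 0.2], "cat": [0.2, 0.9]}
dev_gold = {"hund": {"dog"}, "katze": {"cat"}}

best = tune_threshold(src_emb, tgt_emb, dev_gold, [0.5, 0.9, 0.99])
lexicon = induce_lexicon(src_emb, tgt_emb, best)
print(best, lexicon)
```

In this toy setup, the lowest threshold also accepts a pair for the noise word `xyz`, which lowers precision. Raising the threshold drops that pair while keeping the two correct ones.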
### Classification
As the classifier to perform BLI we used the method introduced by [(Heyman et al., 2017)](http://liir.cs.kuleuven.be/software_pages/bilingual_classifier_eacl.php).
#### Requirements
* A different environment is needed due to the use of Python 2.7 in the original code of the classifier
* An easy way to deal with different environments is [Conda](https://conda.io)
* Dependencies in BLI_classifier/requirements.txt
```sh
pip install -r BLI_classifier/requirements.txt
```
#### Classifier
* To download the data, embeddings and lexicon released by Heyman et al. (2017), run: *scripts/get_eacl_data.sh*
* An example script demonstrating the use of the system: *scripts/run_BLI_classifier.sh*
#### Semi-supervised
* An example script demonstrating the use of the system: *scripts/run_BLI_classifier_semisup.sh*
### Data
* Domain specific data and train/dev/test lexicons: [Link](http://liir.cs.kuleuven.be/software_pages/bilingual_classifier_eacl.php) or by running: *scripts/get_eacl_data.sh*
* General domain data: [Europarl (v7)](http://www.statmt.org/europarl)
* Bilingual lexicon: BNC included in this repository