Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Linear95/BinarySentEmb
Code for ACL 2019 oral paper - Learning Compressed Sentence Representations for On-Device Text Processing.
- Host: GitHub
- URL: https://github.com/Linear95/BinarySentEmb
- Owner: Linear95
- Created: 2019-06-18T17:31:29.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-01T21:11:07.000Z (about 4 years ago)
- Last Synced: 2024-08-02T13:24:22.952Z (3 months ago)
- Language: Python
- Homepage:
- Size: 14.6 KB
- Stars: 44
- Watchers: 6
- Forks: 7
- Open Issues: 2
Metadata Files:
- Readme: README.md
# BinarySentEmb
Code for the ACL 2019 paper: *Learning Compressed Sentence Representations for On-Device Text Processing*.

This repository contains the source code necessary to reproduce the results presented in the following paper:
* [*Learning Compressed Sentence Representations for On-Device Text Processing*](https://arxiv.org/pdf/1906.08340.pdf) (ACL 2019)

This project is maintained by [Pengyu Cheng](https://linear95.github.io/). Feel free to contact [email protected] for any relevant issues.
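The paper's core idea is to compress continuous sentence embeddings into binary codes that are cheap to store and compare. As a rough illustration only (the paper *learns* the binarization; a fixed zero threshold is just a sketch), hard-threshold binarization and Hamming similarity could look like:

```python
def binarize(vec):
    # Hard-threshold each dimension at zero: positive -> 1, else 0.
    # (Illustrative only; the paper trains this mapping.)
    return [1 if x > 0 else 0 for x in vec]

def hamming_similarity(a, b):
    # Fraction of matching bits -- cheap to compute on-device.
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

u = binarize([0.5, -0.3, 0.1, -0.9])  # -> [1, 0, 1, 0]
v = binarize([0.4, 0.2, 0.1, -0.7])   # -> [1, 1, 1, 0]
sim = hamming_similarity(u, v)        # -> 0.75
```

Comparing bit vectors avoids the floating-point dot products that cosine similarity on continuous embeddings requires.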
## Dependencies
This code is written in Python. The dependencies are:
* Python 3.6
* PyTorch>=0.4 (0.4.1 is recommended)
* NLTK>=3

## Download pretrained models
First, download [GloVe](https://nlp.stanford.edu/projects/glove/) pretrained word embeddings:
```bash
mkdir dataset/GloVe
curl -Lo dataset/GloVe/glove.840B.300d.zip http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip dataset/GloVe/glove.840B.300d.zip -d dataset/GloVe/
```
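After unzipping, each line of the GloVe text file holds a token followed by its space-separated float components. InferSent's `data.py` handles this loading in the actual pipeline; a minimal parser for the format, purely for illustration, might be:

```python
def parse_glove_line(line):
    # GloVe text format: "<token> <v1> <v2> ... <vN>"
    # (Most tokens in glove.840B.300d contain no spaces; edge cases
    # would need extra handling.)
    parts = line.rstrip("\n").split(" ")
    return parts[0], [float(x) for x in parts[1:]]

word, vec = parse_glove_line("the 0.418 0.24968 -0.41242")
# word == "the", vec == [0.418, 0.24968, -0.41242]
```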
Then, follow the instructions of [InferSent](https://github.com/facebookresearch/InferSent) to download the pretrained universal sentence encoder:
```bash
mkdir encoder
curl -Lo encoder/infersent1.pkl https://dl.fbaipublicfiles.com/infersent/infersent1.pkl
```

Furthermore, download our pretrained binary sentence encoder from [here](https://drive.google.com/open?id=12lzqtxQwktywXRc1HsQ36ptHGfGOTcIJ). Make sure the binary encoder is also in the `./encoder/` folder.
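One reason binary codes suit on-device use is storage: a 4096-dimensional float32 InferSent embedding takes 16 KB, while the same number of bits packs into 512 bytes. A hedged sketch (not code from this repository) of packing a 0/1 bit list into bytes:

```python
def pack_bits(bits):
    # Pack a list of 0/1 bits into bytes, 8 bits per byte, MSB first.
    byte_list = []
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        byte <<= 8 - len(chunk)  # left-align a partial final chunk
        byte_list.append(byte)
    return bytes(byte_list)

packed = pack_bits([1] * 4096)
# len(packed) == 512, versus 16384 bytes for 4096 float32 values
```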
## Train a binary encoder
To train a binary sentence encoder, first download `data.py`, `mutils.py`, and `models.py` from [InferSent](https://github.com/facebookresearch/InferSent). Then, run the command:
```bash
python train.py
```

## Evaluate the binary encoder on transfer tasks
Follow the instructions of [SentEval](https://github.com/facebookresearch/SentEval) to download the sentence-embedding evaluation toolkit and datasets. Download the original InferSent encoder model from [here](https://github.com/facebookresearch/InferSent).
To reproduce results of our pretrained binary sentence encoder, run the command:
```bash
python evaluate.py
```

## Citation
Please cite our ACL 2019 paper if you find the code useful.
```latex
@article{shen2019learning,
title={Learning Compressed Sentence Representations for On-Device Text Processing},
author={Shen, Dinghan and Cheng, Pengyu and Sundararaman, Dhanasekar and Zhang, Xinyuan and Yang, Qian and Tang, Meng and Celikyilmaz, Asli and Carin, Lawrence},
journal={arXiv preprint arXiv:1906.08340},
year={2019}
}
```