https://github.com/kostyaev/sentence2vec
Deep sentence embedding using Sequence to Sequence learning
- Host: GitHub
- URL: https://github.com/kostyaev/sentence2vec
- Owner: kostyaev
- License: mit
- Created: 2016-05-26T14:42:23.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2017-01-04T10:11:07.000Z (over 9 years ago)
- Last Synced: 2024-10-28T09:53:43.560Z (over 1 year ago)
- Topics: cuda, sentence2vec, seq2seq, torch
- Language: Jupyter Notebook
- Homepage:
- Size: 103 KB
- Stars: 22
- Watchers: 3
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-torch - sentence2vec
README
# Deep sentence embedding using Sequence to Sequence learning

## Installing
1. [Install Torch](http://torch.ch/docs/getting-started.html).
2. Install the following additional Lua libs:
```sh
luarocks install nn
luarocks install rnn
luarocks install penlight
```
To train with CUDA, install the latest CUDA drivers and toolkit, then run:
```sh
luarocks install cutorch
luarocks install cunn
```
To train with OpenCL, install the latest OpenCL Torch libraries:
```sh
luarocks install cltorch
luarocks install clnn
```
3. Download the [Cornell Movie-Dialogs Corpus](http://www.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html) and extract all the files into `data/cornell_movie_dialogs`.
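Before training, it can help to confirm that the corpus ended up in the expected directory. The following is a minimal sketch, not part of this repository: `check_corpus` is a hypothetical helper, and the two file names are an assumption based on the standard corpus archive.

```sh
# Hypothetical helper: report whether the corpus files are present
# in the given directory (defaults to the path used in step 3).
check_corpus() {
  dir="${1:-data/cornell_movie_dialogs}"
  missing=0
  # Assumed file names from the standard Cornell Movie-Dialogs archive.
  for f in movie_lines.txt movie_conversations.txt; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $dir/$f"
    else
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is absent.
  return "$missing"
}

check_corpus data/cornell_movie_dialogs || echo "Extract the corpus before training."
```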
## Training
```sh
th train.lua [-h / options]
```
Use the `--dataset NUMBER` option to limit the size of the training dataset. Training on the full dataset takes about 5 hours per epoch.
The model is saved to `data/model.t7` after each epoch in which the error has decreased.
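Since a full epoch is slow, a first run on a reduced dataset is a cheap way to check the setup. A minimal sketch, where the value `5000` is an arbitrary example and the `th` availability check is only a convenience:

```sh
# Arbitrary example value; pick whatever size fits your time budget.
DATASET_SIZE=5000

if command -v th >/dev/null 2>&1; then
  # --dataset is the option described above; no other flags are assumed.
  th train.lua --dataset "$DATASET_SIZE"
else
  echo "Torch (th) is not on PATH; install it before training." >&2
fi
```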
## Getting a pretrained model
Download:
1. The pretrained [model.t7](https://drive.google.com/file/d/0BwsDa5L6bdMpTC1GUEtPbWE2Zms/view?usp=sharing)
2. The vocabulary [vocab.t7](https://drive.google.com/file/d/0BwsDa5L6bdMpQV9zOTRhZlNPWG8/view?usp=sharing)
Put them into the `data` directory.
## Extracting embeddings from sentences
Run the following command:
```sh
th -i extract_embeddings.lua --model_file data/model.t7 --input_file data/test_sentences.txt --output_file data/embeddings.t7 --cuda
```
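To embed your own sentences, you can prepare a separate input file first. The sketch below assumes the script expects plain text with one sentence per line, which this README does not state explicitly. The file name `data/my_sentences.txt` and the sentences themselves are made up for illustration.

```sh
# Hypothetical input file; the one-sentence-per-line layout is an assumption.
INPUT_FILE=data/my_sentences.txt
mkdir -p "$(dirname "$INPUT_FILE")"

# printf reuses the format for each argument, so every sentence
# ends up on its own line.
printf '%s\n' \
  "How are you doing today?" \
  "Where did you go last night?" \
  "I have no idea what you mean." > "$INPUT_FILE"

# Sanity check: print the number of lines written.
wc -l < "$INPUT_FILE"
```

The file can then be passed to the command above with `--input_file data/my_sentences.txt`.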
To visualize 2D projections of the embeddings, refer to [example.ipynb](https://github.com/kostyaev/sentence2vec/blob/master/example.ipynb).
## Acknowledgments
This implementation uses code from [Marc-André Cournoyer's repo](https://github.com/macournoyer/neuralconvo).
## License
MIT License