https://github.com/kostyaev/sentence2vec
Deep sentence embedding using Sequence to Sequence learning
- Host: GitHub
- URL: https://github.com/kostyaev/sentence2vec
- Owner: kostyaev
- License: mit
- Created: 2016-05-26T14:42:23.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-01-04T10:11:07.000Z (about 8 years ago)
- Last Synced: 2024-07-31T05:10:38.379Z (6 months ago)
- Topics: cuda, sentence2vec, seq2seq, torch
- Language: Jupyter Notebook
- Homepage:
- Size: 103 KB
- Stars: 22
- Watchers: 3
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deep sentence embedding using Sequence to Sequence learning
![screenshot](images/2d_pca_projection.png)
## Installing
1. [Install Torch](http://torch.ch/docs/getting-started.html).
2. Install the following additional Lua libraries:

```sh
luarocks install nn
luarocks install rnn
luarocks install penlight
```

To train with CUDA, install the latest CUDA drivers and toolkit, then run:

```sh
luarocks install cutorch
luarocks install cunn
```

To train with OpenCL, install the latest OpenCL Torch libraries:

```sh
luarocks install cltorch
luarocks install clnn
```

3. Download the [Cornell Movie-Dialogs Corpus](http://www.mpi-sws.org/~cristian/Cornell_Movie-Dialogs_Corpus.html) and extract all the files into `data/cornell_movie_dialogs`.
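
Before training, it can help to confirm that the corpus was extracted into the expected directory. The following is a minimal sketch, not part of the repository: `check_corpus` is a hypothetical helper, and the two file names it looks for are an assumption about which files from the corpus archive the training script reads.

```sh
# Hypothetical helper; not part of the repository.
# Reports which expected corpus files are missing from a directory.
check_corpus() {
    dir="${1:-data/cornell_movie_dialogs}"
    missing=0
    # Assumption: these two files from the corpus archive are the ones the loader needs.
    for name in movie_lines.txt movie_conversations.txt; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

Calling `check_corpus` with no argument checks `data/cornell_movie_dialogs`; pass another directory as the first argument if the corpus lives elsewhere.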
## Training
```sh
th train.lua [-h / options]
```
Use the `--dataset NUMBER` option to control the size of the dataset. Training on the full dataset takes about 5 hours for a single epoch.
The model will be saved to `data/model.t7` after each epoch if it has improved (error decreased).
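
For a quick trial run on a reduced dataset, the `--dataset` option can be wrapped in a small function. This is only a sketch under stated assumptions: `train_subset` and the `TH` override variable are hypothetical names introduced here, and `--dataset` is the only option taken from this README (run `th train.lua -h` for the actual list).

```sh
# Hypothetical wrapper; not part of the repository.
# Runs train.lua on a dataset limited to the given size.
train_subset() {
    n="$1"
    # Reject empty or non-numeric sizes before starting Torch.
    case "$n" in
        ''|*[!0-9]*) echo "usage: train_subset NUMBER" >&2; return 2 ;;
    esac
    # TH is an assumed override hook: set TH=echo to print the command instead of running it.
    ${TH:-th} train.lua --dataset "$n"
}
```

For example, `TH=echo train_subset 5000` prints the command that would be run, and `train_subset 5000` runs it.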
## Getting a pretrained model
Download:

1. The pretrained [model.t7](https://drive.google.com/file/d/0BwsDa5L6bdMpTC1GUEtPbWE2Zms/view?usp=sharing)
2. The vocabulary [vocab.t7](https://drive.google.com/file/d/0BwsDa5L6bdMpQV9zOTRhZlNPWG8/view?usp=sharing)

Put them into the `data` directory.
## Extracting embeddings from sentences
Run the following command:
```sh
th -i extract_embeddings.lua --model_file data/model.t7 --input_file data/test_sentences.txt --output_file data/embeddings.t7 --cuda
```
To visualize 2D projections of the embeddings, refer to [example.ipynb](https://github.com/kostyaev/sentence2vec/blob/master/example.ipynb).
## Acknowledgments
This implementation utilizes code from [Marc-André Cournoyer's repo](https://github.com/macournoyer/neuralconvo).
## License
MIT License