Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/prashantranjan09/wordembeddings-elmo-fasttext-word2vec
Using pre-trained word embeddings (FastText, Word2Vec)
- Host: GitHub
- URL: https://github.com/prashantranjan09/wordembeddings-elmo-fasttext-word2vec
- Owner: PrashantRanjan09
- Created: 2018-06-14T18:20:54.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-06-19T12:37:12.000Z (over 6 years ago)
- Last Synced: 2024-12-09T16:44:11.937Z (about 2 months ago)
- Topics: ai2, allennlp, classification, elmo-8, fair, fasttext, fasttext-python, gensim, gensim-word2vec, glove, glove-embeddings, nlp, word2vec, wordembedding, wordembeddings
- Language: Python
- Homepage:
- Size: 27.3 KB
- Stars: 158
- Watchers: 5
- Forks: 31
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Word Embeddings: ELMo, FastText (FAIR), FastText (Gensim) and Word2Vec
This implementation gives you the flexibility of choosing which word embeddings to use on your corpus. One option is ELMo (https://arxiv.org/pdf/1802.05365.pdf), recently introduced by AllenNLP: these word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), pre-trained on a large text corpus. FastText embeddings (https://arxiv.org/pdf/1712.09405.pdf), published at LREC by Tomas Mikolov and team, are also available.
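For intuition on what distinguishes fastText from Word2Vec, here is a minimal sketch (my own illustration, not code from this repo): fastText represents each word as a bag of character n-grams plus the word itself, which is why it can build vectors for out-of-vocabulary words.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Bag of character n-grams fastText uses to represent a word.

    The word is wrapped in boundary markers < and >, and the full
    wrapped word is kept as one extra "n-gram".
    """
    wrapped = f"<{word}>"
    grams = {wrapped}
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    return grams

# A word's vector is the sum of its n-gram vectors, so even unseen or
# misspelled words get a representation from their shared subwords.
```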
On a simple IMDB sentiment classification task (Keras dataset), ELMo embeddings outperformed fastText, GloVe and Word2Vec by roughly 2 to 2.5% on average.

### USAGE:
To run it on the IMDB dataset, run:

python main.py

To run it on your own data: comment out lines 32-40 and uncomment lines 41-53.
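For a sense of how per-word embeddings become a single feature vector for a classifier, here is a hedged sketch of the common averaging baseline (my own illustration; main.py may build features differently):

```python
import numpy as np

# Toy embedding table; real vectors would come from Word2Vec,
# fastText, GloVe or ELMo.
embeddings = {
    "great": np.array([0.9, 0.1]),
    "awful": np.array([-0.8, 0.2]),
}

def doc_vector(tokens, emb, dim=2):
    """Average the vectors of in-vocabulary tokens; zeros if none match."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```

The resulting fixed-size vector can be fed to any standard classifier, regardless of document length.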
### FILES:
* word_embeddings.py - contains all the functions for embedding and for choosing which word embedding model you want to use.
* config.json - specify all your parameters here (embedding dimension, maxlen for padding, etc.).
* model_params.json - specify all your model parameters here (epochs, batch size, etc.).
* main.py - the main file; run this file from the terminal.
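The maxlen parameter in config.json controls sequence padding. A minimal sketch of that operation (a hypothetical helper for illustration, not the repo's exact code):

```python
def pad_to_maxlen(seqs, maxlen, pad_value=0):
    """Truncate each token-id sequence to maxlen, then left-pad with
    pad_value so every sequence has the same fixed length."""
    padded = []
    for s in seqs:
        s = s[:maxlen]
        padded.append([pad_value] * (maxlen - len(s)) + s)
    return padded
```

Fixed-length input is what lets variable-length reviews be stacked into one batch tensor for the model.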
You have the option of choosing the word vector model. In **config.json**, specify "option" as: 0 - Word2Vec, 1 - Gensim FastText, 2 - FastText (FAIR), 3 - ELMo.
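The option switch above could be dispatched like this (my own hedged sketch of the mapping; `choose_embedding` is a hypothetical helper, not a function from the repo):

```python
# Mapping documented above: 0 = Word2Vec, 1 = Gensim FastText,
# 2 = FastText (FAIR), 3 = ELMo.
OPTIONS = {0: "word2vec", 1: "fasttext_gensim", 2: "fasttext_fair", 3: "elmo"}

def choose_embedding(config):
    """Return the embedding backend named by config["option"]."""
    option = config["option"]
    if option not in OPTIONS:
        raise ValueError(f"option must be one of {sorted(OPTIONS)}, got {option}")
    return OPTIONS[option]

# e.g. with the parsed contents of config.json:
# choose_embedding({"option": 3}) returns "elmo"
```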
The model is very generic. You can change your model as per your requirements.
Feel free to reach out in case you need any help.
Special thanks to Jacob Zweig for the write-up: https://towardsdatascience.com/elmo-embeddings-in-keras-with-tensorflow-hub-7eb6f0145440. It's a good 2-minute read.