# awesome-embedding-models
A curated list of awesome embedding models tutorials, projects and communities.
Please feel free to send pull requests to add links.

## Table of Contents

* **[Papers](#papers)**
* **[Researchers](#researchers)**
* **[Courses and Lectures](#courses-and-lectures)**
* **[Datasets](#datasets)**
* **[Implementations and Tools](#implementations-and-tools)**

## Papers
### Word Embeddings

**Word2vec, GloVe, FastText**

* Efficient Estimation of Word Representations in Vector Space (2013), T. Mikolov et al. [[pdf]](
* Distributed Representations of Words and Phrases and their Compositionality (2013), T. Mikolov et al. [[pdf]](
* word2vec Parameter Learning Explained (2014), Xin Rong [[pdf]](
* word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (2014), Yoav Goldberg, Omer Levy [[pdf]](
* GloVe: Global Vectors for Word Representation (2014), J. Pennington et al. [[pdf]](
* Improving Word Representations via Global Context and Multiple Word Prototypes (2012), EH Huang et al. [[pdf]](
* Enriching Word Vectors with Subword Information (2016), P. Bojanowski et al. [[pdf]](
* Bag of Tricks for Efficient Text Classification (2016), A. Joulin et al. [[pdf]](
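The subword model in Bojanowski et al. represents a word as the sum of the vectors of its character n-grams. The word is first wrapped in `<` and `>` boundary symbols, and the whole bracketed word is kept as an extra feature. Below is a minimal pure-Python sketch of the n-gram extraction step only. `char_ngrams` is a hypothetical helper, and the 3 to 6 character range follows the paper's setup. The real fastText library also hashes n-grams into a fixed number of buckets, which this sketch omits.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of `word` in the style of fastText's subword model.

    Hypothetical helper: unlike the real library, n-grams are not hashed.
    """
    token = f"<{word}>"  # boundary symbols distinguish prefixes and suffixes
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    # The full bracketed word is kept as its own feature (added only once).
    if token not in grams:
        grams.append(token)
    return grams


print(char_ngrams("where", 3, 3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

With n = 3 this reproduces the paper's own example for *where*. Because of the boundary symbols, the tri-gram `her` is kept distinct from the word `<her>`.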

**Language Model**

* Semi-supervised sequence tagging with bidirectional language models (2017), Peters, Matthew E., et al. [[pdf]](
* Deep contextualized word representations (2018), Peters, Matthew E., et al. [[pdf]](
* Contextual String Embeddings for Sequence Labeling (2018), Akbik, Alan, Duncan Blythe, and Roland Vollgraf. [[pdf]](
* BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018), J. Devlin et al. [[pdf]](

**Embedding Enhancement**

* Sentence Embedding: Learning Semantic Sentence Embeddings using Pair-wise Discriminator (2018), Patro et al. [[Project Page]]( [[Paper]](
* Retrofitting Word Vectors to Semantic Lexicons (2014), M. Faruqui et al. [[pdf]](
* Better Word Representations with Recursive Neural Networks for Morphology (2013), T. Luong et al. [[pdf]](
* Dependency-Based Word Embeddings (2014), Omer Levy, Yoav Goldberg [[pdf]](
* Not All Neural Embeddings are Born Equal (2014), F. Hill et al. [[pdf]](
* Two/Too Simple Adaptations of Word2Vec for Syntax Problems (2015), W. Ling [[pdf]](
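Retrofitting (Faruqui et al., listed above) post-processes existing vectors. Words linked in a lexicon such as WordNet or PPDB are pulled closer together, while each vector stays near its original position. The iterative update sets each vector to a weighted average of its original vector and its neighbours' current vectors. The sketch below assumes the paper's reported settings (α = 1, β equal to one over the word's degree, 10 iterations) and uses plain Python lists. `retrofit` and the toy lexicon are illustrative and are not the authors' released code.

```python
def retrofit(vectors, lexicon, iterations=10, alpha=1.0):
    """Retrofit word vectors to a semantic lexicon (illustrative sketch).

    vectors: dict word -> list[float] (original embeddings, left unchanged)
    lexicon: dict word -> list of related words (e.g. synonyms)
    Update: q_i = (alpha * q_hat_i + sum_j beta_ij * q_j) / (alpha + sum_j beta_ij),
    with beta_ij = 1 / degree(i).
    """
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, related in lexicon.items():
            neighbours = [n for n in related if n in new]
            if word not in new or not neighbours:
                continue  # skip words without a vector or without in-vocabulary neighbours
            beta = 1.0 / len(neighbours)
            numerator = [alpha * x for x in vectors[word]]
            for n in neighbours:
                numerator = [acc + beta * x for acc, x in zip(numerator, new[n])]
            denominator = alpha + beta * len(neighbours)
            new[word] = [x / denominator for x in numerator]
    return new


vecs = {"happy": [1.0, 0.0], "glad": [0.0, 1.0], "table": [0.5, 0.5]}
lex = {"happy": ["glad"], "glad": ["happy"]}
print(retrofit(vecs, lex, iterations=1))
# {'happy': [0.5, 0.5], 'glad': [0.25, 0.75], 'table': [0.5, 0.5]}
```

Updates are applied in place within each sweep, so `glad` already sees the updated `happy`. Words absent from the lexicon, such as `table`, are returned unchanged.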

**Comparing count-based vs. predict-based methods**

* Linguistic Regularities in Sparse and Explicit Word Representations (2014), Omer Levy, Yoav Goldberg [[pdf]](
* Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors (2014), M. Baroni [[pdf]](
* Improving Distributional Similarity with Lessons Learned from Word Embeddings (2015), Omer Levy [[pdf]](
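The count-based side of these comparisons usually starts from a word-context co-occurrence matrix. That matrix is reweighted with positive pointwise mutual information, `PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c))))`. The sketch below builds a sparse PPMI matrix from symmetric window contexts in plain Python. The helper names and the toy corpus are hypothetical. The optional `cds` exponent implements the context-distribution smoothing studied by Levy et al. (2015), who recommend 0.75; the default of 1.0 gives plain PPMI. A full count-based model would typically follow this step with truncated SVD, which is omitted here.

```python
import math
from collections import Counter


def window_pairs(tokens, window=2):
    """(word, context) pairs from a symmetric window around each token."""
    pairs = []
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs


def ppmi(pairs, cds=1.0):
    """Sparse PPMI matrix as a dict (word, context) -> value > 0.

    cds < 1 smooths the context distribution: P(c) ~ count(c) ** cds.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    ctx_norm = sum(n ** cds for n in ctx_counts.values())
    matrix = {}
    for (w, c), n_wc in pair_counts.items():
        p_wc = n_wc / total
        p_w = word_counts[w] / total
        p_c = ctx_counts[c] ** cds / ctx_norm
        value = math.log(p_wc / (p_w * p_c))
        if value > 0:  # negative PMI values are clipped to zero and not stored
            matrix[(w, c)] = value
    return matrix


tokens = "the cat sat on the mat".split()
for (w, c), v in sorted(ppmi(window_pairs(tokens, window=1)).items()):
    print(f"{w:>4} {c:<4} {v:.3f}")
```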

**Evaluation, Analysis**

* Evaluation methods for unsupervised word embeddings (2015), T. Schnabel [[pdf]](
* Intrinsic Evaluation of Word Vectors Fails to Predict Extrinsic Performance (2016), B. Chiu [[pdf]](
* Problems With Evaluation of Word Embeddings Using Word Similarity Tasks (2016), M. Faruqui [[pdf]](
* Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure (2016), Oded Avraham, Yoav Goldberg [[pdf]](
* Evaluating Word Embeddings Using a Representative Suite of Practical Tasks (2016), N. Nayak [[pdf]](

### Phrase, Sentence and Document Embeddings


* [Skip-Thought Vectors](
* [A Simple but Tough-to-Beat Baseline for Sentence Embeddings](
* [An efficient framework for learning sentence representations](
* [Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning](
* [Universal Sentence Encoder](


* [Distributed Representations of Sentences and Documents](
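The SIF baseline from *A Simple but Tough-to-Beat Baseline for Sentence Embeddings* works in two steps:

1. Average each sentence's word vectors with weights `a / (a + p(w))`, where `p(w)` is the word's unigram probability.
2. Subtract each sentence vector's projection onto the first singular vector of the set of sentence vectors.

The dependency-free sketch below rests on several assumptions. Word vectors and probabilities are supplied as plain dicts. The top singular direction is found by power iteration rather than a library SVD, which is more robust. The default `a = 1e-3` is an assumed value. `sif_embeddings` is a hypothetical name, not the authors' code.

```python
import math


def sif_embeddings(sentences, vectors, word_prob, a=1e-3, iters=50):
    """Smooth-inverse-frequency sentence vectors with common-component removal.

    sentences: list of token lists; vectors: dict word -> list[float];
    word_prob: dict word -> unigram probability p(w) (missing words get weight 1).
    """
    dim = len(next(iter(vectors.values())))

    # 1. Weighted average of the in-vocabulary word vectors of each sentence.
    embs = []
    for sent in sentences:
        words = [w for w in sent if w in vectors]
        acc = [0.0] * dim
        for w in words:
            weight = a / (a + word_prob.get(w, 0.0))
            acc = [s + weight * x for s, x in zip(acc, vectors[w])]
        embs.append([s / max(len(words), 1) for s in acc])

    # 2. Top right-singular vector u of the (uncentered) matrix of sentence
    #    vectors, found by power iteration on sum_s v_s v_s^T.
    u = [1.0] * dim
    for _ in range(iters):
        mu = [0.0] * dim
        for v in embs:
            proj = sum(x * y for x, y in zip(v, u))
            mu = [m + proj * x for m, x in zip(mu, v)]
        norm = math.sqrt(sum(m * m for m in mu))
        if norm == 0.0:
            return embs  # degenerate input: nothing to remove
        u = [m / norm for m in mu]

    # 3. Remove each sentence vector's projection onto u.
    result = []
    for v in embs:
        proj = sum(x * y for x, y in zip(v, u))
        result.append([x - proj * y for x, y in zip(v, u)])
    return result


vectors = {"the": [0.2, 0.1], "cat": [0.9, 0.3], "dog": [0.8, 0.4], "sat": [0.1, 0.9]}
word_prob = {"the": 0.05, "cat": 0.001, "dog": 0.001, "sat": 0.002}
for vec in sif_embeddings([["the", "cat", "sat"], ["the", "dog", "sat"]], vectors, word_prob):
    print([round(x, 4) for x in vec])
```

Frequent words such as `the` receive small weights. The removed direction captures what all sentences share, so only sentence-specific variation remains.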

### Sense Embeddings

* [SensEmbed: Learning Sense Embeddings for Word and Relational Similarity](
* [Multi-Prototype Vector-Space Models of Word Meaning](

### Neural Language Models

* [Recurrent neural network based language model](
* [A Neural Probabilistic Language Model](
* [Linguistic Regularities in Continuous Space Word Representations](
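Mikolov et al.'s *Linguistic Regularities* paper showed that analogies such as man : king :: woman : ? can be answered with vector offsets. Levy and Goldberg (listed above) recast that offset method as 3CosAdd and proposed the multiplicative 3CosMul. Below is a pure-Python sketch of both objectives. It assumes word vectors are stored in a plain dict of non-zero vectors. The toy 3-dimensional vectors are made up for illustration, and ε = 0.001 follows Levy and Goldberg.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(vectors, a, a_star, b, method="add", eps=1e-3):
    """Answer a : a_star :: b : ? by searching every other word in `vectors`.

    method="add" (3CosAdd): cos(x, b) - cos(x, a) + cos(x, a_star)
    method="mul" (3CosMul): cos(x, b) * cos(x, a_star) / (cos(x, a) + eps),
    with each cosine first shifted from [-1, 1] to [0, 1].
    """
    best, best_score = None, -math.inf
    for word, x in vectors.items():
        if word in (a, a_star, b):
            continue  # the query words themselves are not valid answers
        c_a, c_star, c_b = (cosine(x, vectors[w]) for w in (a, a_star, b))
        if method == "add":
            score = c_b - c_a + c_star
        elif method == "mul":
            c_a, c_star, c_b = ((c + 1) / 2 for c in (c_a, c_star, c_b))
            score = c_b * c_star / (c_a + eps)
        else:
            raise ValueError(f"unknown method: {method!r}")
        if score > best_score:
            best, best_score = word, score
    return best


toy = {  # made-up dimensions: royalty, male, female
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [-0.5, 0.1, 0.1],
}
print(analogy(toy, "man", "king", "woman"))                # queen
print(analogy(toy, "man", "king", "woman", method="mul"))  # queen
```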

## Researchers

* [Tomas Mikolov](
* [Yoshua Bengio](
* [Yoav Goldberg](
* [Omer Levy](
* [Kai Chen](

## Courses and Lectures

* [CS224d: Deep Learning for Natural Language Processing](
* [Udacity Deep Learning](

## Datasets
### Training

* [Wikipedia](
* [WestburyLab.wikicorp.201004](

### Evaluation

* [SemEval-2012 Task 2](
* [WordSimilarity-353](
* [Stanford's Contextual Word Similarities (SCWS)](
* [Stanford Rare Word (RW) Similarity Dataset](

### Pre-Trained Language Models

Below are pre-trained [ELMo]( models. In the ELMo paper, adding ELMo to existing NLP systems significantly improved the state of the art on every task considered.

* [ELMo by AllenNLP](
* [ELMo by TensorFlow Hub](

Below are pre-trained [sent2vec]( models.

* [BioSentVec: sent2vec pretrained vector for biomedical text](

### Pre-Trained Word Vectors
Convenient downloader for pre-trained word vectors:
* [chakin](

Links for pre-trained word vectors:
* [Word2vec pretrained vectors (English only)](
* [Word2vec pretrained vectors for 30+ languages](
* [FastText pretrained vectors for 157 languages](
* [FastText pretrained vector for Japanese with NEologd](
* [word vectors trained by GloVe](
* [Dependency-Based Word Embeddings](
* [Meta-Embeddings](
* [Lex-Vec](
* [Huang et al. (2012)'s embeddings (HSMN+csmRNN)](
* [Collobert et al. (2011)'s embeddings (CW+csmRNN)](
* [BPEmb: subword embeddings for 275 languages](
* [Wikipedia2Vec: pretrained word and entity embeddings for 12 languages](
* [word2vec-slim](
* [BioWordVec: fastText pretrained vector for biomedical text](
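Many of the text-format downloads above, for example GloVe `.txt` files and fastText `.vec` files, store one word per line followed by its space-separated vector components. Word2vec and fastText text files add a `vocab_size dim` header line; GloVe files do not. For binary word2vec files, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is the usual route. The dependency-free reader below is a sketch that assumes UTF-8 text. It detects the optional header and uses the vector dimension to keep tokens that themselves contain spaces intact. `load_text_vectors` is a hypothetical name.

```python
import io


def load_text_vectors(lines, limit=None):
    """Read "word v1 v2 ... vd" lines into a dict of word -> list[float].

    lines: any iterable of strings, such as an open text file.
    limit: stop after this many vectors (files are usually frequency-sorted).
    """
    vectors = {}
    dim = None
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.rstrip().split(" ")
        if lineno == 1 and len(parts) == 2 and all(p.isdigit() for p in parts):
            dim = int(parts[1])  # word2vec / fastText header: "vocab_size dim"
            continue
        if dim is None:
            dim = len(parts) - 1  # headerless (GloVe-style) file
        if dim < 1 or len(parts) <= dim:
            raise ValueError(f"line {lineno}: expected a word and {dim} values")
        # Everything before the last `dim` fields is the token, which may contain spaces.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


sample = io.StringIO("2 3\nking 0.1 0.2 0.3\nqueen 0.1 0.25 0.35\n")
print(load_text_vectors(sample)["queen"])  # [0.1, 0.25, 0.35]
```

To read a downloaded file, pass an open handle such as `open(path, encoding="utf-8")`, where `path` is wherever the file was saved.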

## Implementations and Tools
### Word2vec

* [Original](
* [gensim](
* [TensorFlow](
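Word2vec's skip-gram model is often trained with negative sampling. For each observed (center, context) pair, the objective pushes σ(u_context · v_center) towards 1. It also pushes σ(u_neg · v_center) towards 0 for a few negative words drawn from the unigram distribution raised to the 3/4 power. The pure-Python sketch below shows a single SGD update under those assumptions. Context-side vectors are updated while the center-vector gradient is accumulated from their old values. The function names, toy vocabulary, counts and learning rate are hypothetical. Real implementations add subsampling, learning-rate decay and vectorised math.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def make_negative_sampler(counts, power=0.75, seed=0):
    """Draw negatives from the unigram distribution raised to `power`."""
    words = list(counts)
    weights = [counts[w] ** power for w in words]
    rng = random.Random(seed)
    return lambda k: rng.choices(words, weights=weights, k=k)


def sgns_step(center_vecs, context_vecs, center, context, negatives, lr=0.025):
    """One SGD step on -log s(u_o . v_c) - sum_k log s(-u_k . v_c); returns the loss before the step."""
    v_c = center_vecs[center]
    grad_c = [0.0] * len(v_c)
    loss = 0.0
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = context_vecs[word]
        p = sigmoid(sum(a * b for a, b in zip(u, v_c)))
        loss -= math.log(p if label else 1.0 - p)
        g = p - label  # derivative of this term's loss w.r.t. the score u . v_c
        for d in range(len(v_c)):
            grad_c[d] += g * u[d]    # uses u before it is updated
            u[d] -= lr * g * v_c[d]  # update the context-side vector in place
    for d in range(len(v_c)):
        v_c[d] -= lr * grad_c[d]
    return loss


dim = 8
rng = random.Random(0)
counts = {"the": 50, "cat": 5, "sat": 4, "on": 20, "mat": 3}
# Center vectors start small and random, context vectors at zero, as in word2vec.
center_vecs = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in counts}
context_vecs = {w: [0.0] * dim for w in counts}
sample_negatives = make_negative_sampler(counts)
for step in range(3):
    negatives = [w for w in sample_negatives(5) if w != "sat"]  # never use the true context as a negative
    print(step, round(sgns_step(center_vecs, context_vecs, "cat", "sat", negatives), 4))
```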

### GloVe

* [Original](
* [GloVe as an optimized TensorFlow GPU Layer using chakin](
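GloVe fits word and context vectors so that their dot product plus two bias terms approximates the logarithm of the co-occurrence count. It does so by minimising a weighted least-squares objective over non-zero counts:

J = Σ f(X_ij)(w_i · w̃_j + b_i + b̃_j − log X_ij)²

The weighting function is f(x) = (x / x_max)^α below x_max and 1 otherwise; the paper uses x_max = 100 and α = 3/4. The sketch below only evaluates that objective in plain Python. The training loop, which the paper runs with AdaGrad, is omitted. The dict-based data layout and function names are assumptions for illustration.

```python
import math


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): (x / x_max) ** alpha below x_max, 1 otherwise."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(cooc, w, w_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """Weighted least-squares objective summed over non-zero co-occurrences.

    cooc: dict (i, j) -> X_ij > 0 (zero counts are excluded: log 0 is undefined
    and f(0) = 0 would drop those terms anyway).
    w, w_tilde: dict index -> list[float]; b, b_tilde: dict index -> float.
    """
    total = 0.0
    for (i, j), x in cooc.items():
        dot = sum(p * q for p, q in zip(w[i], w_tilde[j]))
        diff = dot + b[i] + b_tilde[j] - math.log(x)
        total += glove_weight(x, x_max, alpha) * diff * diff
    return total


cooc = {(0, 1): 10.0, (1, 0): 10.0, (0, 0): 150.0}
w = {0: [0.3, -0.1], 1: [0.2, 0.4]}
w_tilde = {0: [0.1, 0.2], 1: [-0.3, 0.5]}
b = {0: 0.0, 1: 0.0}
b_tilde = {0: 0.0, 1: 0.0}
print(round(glove_loss(cooc, w, w_tilde, b, b_tilde), 4))
```

The cap at x_max keeps very frequent pairs from dominating the objective. The fractional power down-weights rare, noisy co-occurrences without discarding them.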