# awesome-sentence-embedding

A curated list of pretrained sentence and word embedding models.

https://github.com/chaosgen/awesome-sentence-embedding
## Contextualized Word Embeddings
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - [TF](https://github.com/google-research/text-to-text-transfer-transformer) | [T5](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
- CamemBERT: a Tasty French Language Model - [CamemBERT](https://camembert-model.fr/#download)
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
- Unsupervised Cross-lingual Representation Learning at Scale - XLM-R (XLM-RoBERTa)([xlmr.large](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.tar.gz), [xlmr.base](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz))
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - ProphetNet([ProphetNet-large-16GB](https://drive.google.com/file/d/1PctDAca8517_weYUUBW96OjIPdolbQkd/view?usp=sharing), [ProphetNet-large-160GB](https://drive.google.com/file/d/1_nZcF-bBCQvBBcoPzA1nPZsz-Wo7hzEL/view?usp=sharing))
 - CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators - [TF](https://github.com/google-research/electra) | ELECTRA([ELECTRA-Small](https://storage.googleapis.com/electra-data/electra_small.zip), [ELECTRA-Base](https://storage.googleapis.com/electra-data/electra_base.zip), [ELECTRA-Large](https://storage.googleapis.com/electra-data/electra_large.zip))
- MPNet: Masked and Permuted Pre-training for Language Understanding
- ParsBERT: Transformer-based Model for Persian Language Understanding
- Language Models are Few-Shot Learners
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training
- Language Models are Unsupervised Multitask Learners - [TF](https://github.com/openai/gpt-2) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) <br>[Keras](https://github.com/CyberZHG/keras-gpt-2) | GPT-2([117M](https://github.com/openai/gpt-2), [124M](https://github.com/openai/gpt-2), [345M](https://github.com/openai/gpt-2), [355M](https://github.com/openai/gpt-2), [774M](https://github.com/openai/gpt-2), [1558M](https://github.com/openai/gpt-2))
 - Learned in Translation: Contextualized Word Vectors
- Universal Language Model Fine-tuning for Text Classification - ULMFiT([Zoo](https://forums.fast.ai/t/language-model-zoo-gorilla/14623/1))
- Deep contextualized word representations - [TF](https://github.com/allenai/bilm-tf) | ELMo([AllenNLP](https://allennlp.org/elmo), [TF-Hub](https://tfhub.dev/google/elmo/2))
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling - [Pytorch](https://github.com/LiyuanLucasLiu/LD-Net) | [LD-Net](https://github.com/LiyuanLucasLiu/LD-Net#language-models)
- Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation - [Pytorch](https://github.com/HIT-SCIR/ELMoForManyLangs) | [ELMo](https://github.com/HIT-SCIR/ELMoForManyLangs#downloads)
- Direct Output Connection for a High-Rank Language Model - [DOC](https://drive.google.com/open?id=1ug-6ISrXHEGcWTk5KIw8Ojdjuww-i-Ci)
- Improving Language Understanding by Generative Pre-Training - [TF](https://github.com/openai/finetune-transformer-lm) <br>[Keras](https://github.com/Separius/BERT-keras) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) | [GPT](https://github.com/openai/finetune-transformer-lm)
- Multi-Task Deep Neural Networks for Natural Language Understanding - [Pytorch](https://github.com/namisan/mt-dnn) | [MT-DNN](https://github.com/namisan/mt-dnn/blob/master/download.sh)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - [TF](https://github.com/google-research/bert) <br>[Keras](https://github.com/Separius/BERT-keras) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) <br>[MXNet](https://github.com/imgarylai/bert-embedding) <br>[PaddlePaddle](https://github.com/PaddlePaddle/ERNIE) <br>[TF](https://github.com/hanxiao/bert-as-service/) <br>[Keras](https://github.com/CyberZHG/keras-bert) | BERT([BERT](https://github.com/google-research/bert#pre-trained-models), [ERNIE](https://github.com/PaddlePaddle/ERNIE), [KoBERT](https://github.com/SKTBrain/KoBERT))
- BioBERT: pre-trained biomedical language representation model for biomedical text mining - [TF](https://github.com/dmis-lab/biobert) | [BioBERT](https://github.com/naver/biobert-pretrained)
- Cross-lingual Language Model Pretraining - [XLM](https://github.com/facebookresearch/XLM#pretrained-models)
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text - [SciBERT](https://github.com/allenai/scibert#downloading-trained-models)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context - [TF](https://github.com/kimiyoung/transformer-xl/tree/master/tf) <br>[Pytorch](https://github.com/kimiyoung/transformer-xl/tree/master/pytorch) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) | [Transformer-XL](https://github.com/kimiyoung/transformer-xl/tree/master/tf)
- Efficient Contextual Representation Learning Without Softmax Layer
 - Publicly Available Clinical BERT Embeddings
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission
- ERNIE: Enhanced Language Representation with Informative Entities
- Unified Language Model Pre-training for Natural Language Understanding and Generation - UniLMv1([unilm1-large-cased](https://unilm.blob.core.windows.net/ckpt/unilm1-large-cased.bin), [unilm1-base-cased](https://unilm.blob.core.windows.net/ckpt/unilm1-base-cased.bin))
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
- Pre-Training with Whole Word Masking for Chinese BERT - [TF](https://github.com/ymcui/Chinese-BERT-wwm) | [BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm#pytorch%E7%89%88%E6%9C%AC%E8%AF%B7%E4%BD%BF%E7%94%A8-%E7%9A%84pytorch-bert--06%E5%85%B6%E4%BB%96%E7%89%88%E6%9C%AC%E8%AF%B7%E8%87%AA%E8%A1%8C%E8%BD%AC%E6%8D%A2)
- XLNet: Generalized Autoregressive Pretraining for Language Understanding - [XLNet](https://github.com/zihangdai/xlnet#released-models)
 - ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
- TinyBERT: Distilling BERT for Natural Language Understanding
- SpanBERT: Improving Pre-training by Representing and Predicting Spans - [SpanBERT](https://github.com/facebookresearch/SpanBERT#pre-trained-models)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach - [RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta#pre-trained-models)
- Subword ELMo
- Knowledge Enhanced Contextual Word Representations
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - [Pytorch](https://github.com/NVIDIA/Megatron-LM) | Megatron-LM([BERT-345M](https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m), [GPT-2-345M](https://ngc.nvidia.com/catalog/models/nvidia:megatron_lm_345m))
- MultiFiT: Efficient Multi-lingual Language Model Fine-tuning - [code](https://github.com/n-waves/ulmfit-multilingual)
- Extreme Language Model Compression with Optimal Subwords and Shared Projections
- MULE: Multimodal Universal Language Embedding
- Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
- K-BERT: Enabling Language Representation with Knowledge Graph
- UNITER: Learning UNiversal Image-TExt Representations
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
 - BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
 - DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
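
Most models in this section have a port in the Hugging Face [Transformers](https://github.com/huggingface/transformers) library linked above. As a minimal sketch (assuming the `transformers` and `torch` packages; the checkpoint name is just one example), mean-pooling the final hidden states gives a quick sentence vector:

```python
# Hedged sketch: mean-pooled contextual embeddings from a pretrained BERT
# checkpoint via Hugging Face Transformers. Any listed model with a
# Transformers port works the same way; "bert-base-uncased" is an example.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

batch = tokenizer(["The quick brown fox jumps over the lazy dog."],
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq_len, dim)

# Mask out padding tokens before averaging into one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_vecs = (hidden * mask).sum(1) / mask.sum(1)
print(sentence_vecs.shape)                      # e.g. torch.Size([1, 768])
```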
 
## Pooling Methods
- SIF - [A Simple but Tough-to-Beat Baseline for Sentence Embeddings](https://openreview.net/pdf?id=SyK00v5xx) (sketched after this list)
- TF-IDF - [Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF](https://arxiv.org/abs/1710.06524)
- P-norm - [Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations](https://arxiv.org/abs/1803.01400)
- DisC - [A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs](https://openreview.net/pdf?id=B1e5ef-C-)
- GEM - [Zero-Training Sentence Embedding via Orthogonal Basis](https://arxiv.org/abs/1810.00438)
- SWEM - [Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms](https://arxiv.org/abs/1805.09843)
 - Efficient Sentence Embedding using Discrete Cosine Transform
 - Efficient Sentence Embedding via Semantic Subspace Analysis
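
The SIF entry above is simple enough to sketch end-to-end: weight each word vector by `a / (a + p(w))`, average per sentence, then remove the projection onto the first principal component of the sentence matrix. A minimal sketch in plain NumPy; `word_vecs` (word -> vector) and `word_freq` (word -> corpus count) are assumed inputs, not a library API:

```python
# Hedged sketch of SIF pooling (weighted average + common-component removal).
import numpy as np

def sif_embeddings(sentences, word_vecs, word_freq, a=1e-3):
    total = sum(word_freq.values())
    dim = next(iter(word_vecs.values())).shape[0]
    embs = []
    for sent in sentences:
        words = [w for w in sent.split() if w in word_vecs]
        if not words:
            embs.append(np.zeros(dim))
            continue
        # Smooth inverse-frequency weight for each word.
        weights = [a / (a + word_freq.get(w, 0) / total) for w in words]
        embs.append(np.average([word_vecs[w] for w in words],
                               axis=0, weights=weights))
    X = np.vstack(embs)
    # Remove the projection onto the first singular vector (common component).
    u = np.linalg.svd(X, full_matrices=False)[2][0]
    return X - X @ np.outer(u, u)
```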
 
## Encoders
- Distributed Representations of Sentences and Documents - [Python](https://github.com/jhlau/doc2vec) | Doc2Vec
- Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models - [code](https://github.com/ryankiros/visual-semantic-embedding) <br>[Pytorch](https://github.com/linxd5/VSE_Pytorch) | VSE
- Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books - [code](https://github.com/ryankiros/skip-thoughts) <br>[TF](https://github.com/tensorflow/models/tree/master/research/skip_thoughts) <br>[Pytorch, Torch](https://github.com/Cadene/skip-thoughts.torch) | SkipThought
- Order-Embeddings of Images and Language - [code](https://github.com/ivendrov/order-embedding) | order-embedding
 - Towards Universal Paraphrastic Sentence Embeddings
 - From Word Embeddings to Document Distances
 - Learning Distributed Representations of Sentences from Unlabelled Data
 - Charagram: Embedding Words and Sentences via Character n-grams
 - Learning Generic Sentence Representations Using Convolutional Neural Networks
 - Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
- Learning to Generate Reviews and Discovering Sentiment - [TF](https://github.com/openai/generating-reviews-discovering-sentiment) <br>[Pytorch](https://github.com/guillitte/pytorch-sentiment-neuron) <br>[Pytorch](https://github.com/NVIDIA/sentiment-discovery) | Sentiment Neuron
 - Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
 - Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
 - VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
 - Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
 - StarSpace: Embed All The Things!
 - DisSent: Learning Sentence Representations from Explicit Discourse Relations
- Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations - [code](https://github.com/jwieting/para-nmt-50m) | para-nmt
- Dual-Path Convolutional Image-Text Embedding with Instance Loss - [code](https://github.com/layumi/Image-Text-Embedding) | Image-Text-Embedding
- An efficient framework for learning sentence representations - Quick-Thought
- Universal Sentence Encoder - [TF-Hub](https://tfhub.dev/google/universal-sentence-encoder-large/2) | USE
 - End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
 - Learning general purpose distributed sentence representations via large scale multi-task learning
- Embedding Text in Hyperbolic Spaces - HyperText
- Representation Learning with Contrastive Predictive Coding - CPC
- Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations - [code](https://github.com/context-mover/context-mover-distance-and-barycenters) | CMD
- Learning Universal Sentence Representations with Mean-Max Attention Autoencoder - Mean-MaxAAE
- Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model - [TF-Hub](https://tfhub.dev/s?q=universal-sentence-encoder-xling) | USE-xling
- Improving Sentence Representations with Consensus Maximisation - Multi-view
- BioSentVec: creating sentence embeddings for biomedical texts - [code](https://github.com/ncbi-nlp/BioSentVec) | BioSentVec
 - Word Mover's Embedding: From Word2Vec to Document Embedding
 - A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
 - Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
 - No Training Required: Exploring Random Encoders for Sentence Classification
 - CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model
- GLOSS: Generative Latent Optimization of Sentence Representations - GLOSS
- Multilingual Universal Sentence Encoder - [TF-Hub](https://tfhub.dev/google/universal-sentence-encoder-multilingual/1) | MultilingualUSE
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - [Pytorch](https://github.com/UKPLab/sentence-transformers) | Sentence-BERT (usage sketch after this list)
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models - [Pytorch](https://github.com/BinWang28/SBERT-WK-Sentence-Embedding) | SBERT-WK
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
- Language-agnostic BERT Sentence Embedding - [TF-Hub](https://tfhub.dev/google/LaBSE/1) | LaBSE
- On the Sentence Embeddings from Pre-trained Language Models - [TF](https://github.com/bohanli/BERT-flow) | BERT-flow
- Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings - AraSIF
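
For the Sentence-BERT entry above, the `sentence-transformers` package gives a one-line encoder. A minimal sketch (the checkpoint name is one published example; `util.cos_sim` assumes a reasonably recent package version):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint
embeddings = model.encode(["A cat sits on the mat.",
                           "A feline rests on a rug."])
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity
```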
 
## Evaluation
- decaNLP
 - SentEval
- GLUE - [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/abs/1804.07461)
 - Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
 - Word Embeddings Benchmarks
- MLDoc - [A Corpus for Multilingual Document Classification in Eight Languages](http://www.lrec-conf.org/proceedings/lrec2018/pdf/658.pdf)
 - LexNET
- wordvectors.net
 - Evaluation of sentence embeddings in downstream and linguistic probing tasks
- QVEC - [Evaluation of Word Vector Representations by Subspace Alignment](https://www.aclweb.org/anthology/D15-1243)
 - Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments
- EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
- Evaluating Word Embedding Models: Methods and Experimental Results
 - How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
 - LINSPECTOR
 - Pitfalls in the Evaluation of Sentence Embeddings
 - Probing Multilingual Sentence Representations With X-Probe
 - Exploring Semantic Properties of Sentence Embeddings
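
Many of these benchmarks reduce to the same loop: score sentence pairs with the embedding's cosine similarity and correlate against human judgments, as the STS tasks in SentEval do. A hedged sketch (`encode`, `pairs`, and `gold` are assumed inputs, not any benchmark's API):

```python
# Hedged sketch of an intrinsic similarity evaluation with Spearman rank
# correlation, the standard metric reported for STS-style tasks.
import numpy as np
from scipy.stats import spearmanr

def sts_spearman(encode, pairs, gold):
    """encode: sentence -> vector; pairs: [(s1, s2), ...]; gold: [float, ...]"""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    preds = [cos(encode(a), encode(b)) for a, b in pairs]
    return spearmanr(preds, gold).correlation
```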
 
## Misc
- Word Embedding Dimensionality Selection
 - Half-Size
- magnitude (usage sketch after this list)
 - To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
 - Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
- The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
 - Misspelling Oblivious Word Embeddings
 - Compressing Word Embeddings via Deep Compositional Code Learning
- [UER-py](https://github.com/dbiir/UER-py)
 - Improving Distributional Similarity with Lessons Learned from Word Embeddings
 - German BERT
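
The magnitude entry above wraps pretrained vectors in a memory-mapped file format with built-in out-of-vocabulary handling. A minimal usage sketch with the `pymagnitude` package (the `.magnitude` file path is a placeholder):

```python
from pymagnitude import Magnitude

vectors = Magnitude("glove.6B.300d.magnitude")  # placeholder file path
print(vectors.query("cat"))                     # dense vector for "cat"
print(vectors.similarity("cat", "dog"))         # cosine similarity
print(vectors.query("uncopyrightable"))         # OOV words still get a vector
```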
 
## Vector Mapping
- Cross-lingual Word Vectors Projection Using CCA - [Improving Vector Space Word Representations Using Multilingual Correlation](https://www.aclweb.org/anthology/E14-1049)
- vecmap - [A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings](https://arxiv.org/abs/1805.06297) (Procrustes step sketched after this list)
 - MUSE
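
Supervised variants of vecmap and MUSE share one core step: solving the orthogonal Procrustes problem over a seed dictionary, so the source space is rotated onto the target space. A hedged sketch of just that step (plain NumPy, synthetic data):

```python
# Orthogonal Procrustes: given paired source/target matrices X, Y, the best
# orthogonal map W (with W x ~ y) is U V^T from the SVD of Y^T X.
import numpy as np

def procrustes(X, Y):
    """X, Y: (n_pairs, dim) source/target vectors for a seed lexicon."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt  # apply as X @ W.T to map source vectors into target space

# Sanity check on synthetic data: recover a known orthogonal map.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))  # random orthogonal matrix
Y = X @ W_true.T
W = procrustes(X, Y)
print(np.allclose(X @ W.T, Y, atol=1e-8))  # True: mapping recovered
```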
 
## Articles
- Comparing Sentence Similarity Methods
 - The Current Best of Universal Word Embeddings and Sentence Embeddings
 - On sentence representations, pt. 1: what can you fit into a single #$!%@*&% blog post?
 - Deep-learning-free Text and Sentence Embedding, Part 1
 - Deep-learning-free Text and Sentence Embedding, Part 2
 - An Overview of Sentence Embedding Methods
 - Word embeddings in 2017: Trends and future directions
 - A survey of cross-lingual word embedding models
 
## Word Embeddings
- GloVe: Global Vectors for Word Representation - [GloVe](https://github.com/stanfordnlp/GloVe#download-pre-trained-word-vectors)
- Sparse Overcomplete Word Vector Representations - [code](https://github.com/mfaruqui/sparse-coding)
- From Paraphrase Database to Compositional Paraphrase Model and Back - [code](https://github.com/jwieting/paragram-word) | [PARAGRAM](http://ttic.uchicago.edu/~wieting/paragram-word-demo.zip)
- Non-distributional Word Vector Representations - [code](https://github.com/mfaruqui/non-distributional) | [WordFeat](https://github.com/mfaruqui/non-distributional/blob/master/binary-vectors.txt.gz)
- Joint Learning of Character and Word Embeddings
 - Topical Word Embeddings
- Swivel: Improving Embeddings by Noticing What's Missing
- Counter-fitting Word Vectors to Linguistic Constraints - [code](https://github.com/nmrksic/counter-fitting) | [counter-fitting](http://mi.eng.cam.ac.uk/~nm480/counter-fitted-vectors.txt.zip) (broken)
- Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
- Siamese CBOW: Optimizing Word Embeddings for Sentence Representations - [code](https://bitbucket.org/TomKenter/siamese-cbow/src/master/) | [Siamese CBOW](https://bitbucket.org/TomKenter/siamese-cbow/src/master/)
- Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations - [LexVec](https://github.com/alexandres/lexvec#pre-trained-vectors)
- Enriching Word Vectors with Subword Information - [fastText](https://fasttext.cc/docs/en/pretrained-vectors.html)
- Morphological Priors for Probabilistic Neural Word Embeddings
- A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge - [code](https://github.com/commonsense/conceptnet-numberbatch) | [Numberbatch](https://github.com/commonsense/conceptnet-numberbatch#downloads)
- Offline bilingual word vectors, orthogonal transformations and the inverted softmax
- Multimodal Word Distributions
- Poincaré Embeddings for Learning Hierarchical Representations - [Pytorch](https://github.com/facebookresearch/poincare-embeddings)
- Context encoders as a simple but powerful extension of word2vec
- Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints - [code](https://github.com/nmrksic/attract-repel) | [Attract-Repel](https://github.com/nmrksic/attract-repel#available-word-vector-spaces)
- Learning Chinese Word Representations From Glyphs Of Characters
- Making Sense of Word Embeddings - [sensegram](http://ltdata1.informatik.uni-hamburg.de/sensegram/)
- Hash Embeddings for Efficient Word Representations
- BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages - [BPEmb](https://github.com/bheinzerling/bpemb#downloads-for-each-language)
 - SPINE: SParse Interpretable Neural Embeddings
- AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP - [AraVec](https://github.com/bakrianoo/aravec)
- Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components - [code](https://github.com/hkust-knowcomp/jwe)
- Representation Tradeoffs for Hyperbolic Embeddings - [h-MDS](https://github.com/HazyResearch/hyperbolics)
- Dynamic Meta-Embeddings for Improved Sentence Representations - [DME](https://github.com/facebookresearch/DME#pre-trained-models)
- Analogical Reasoning on Chinese Morphological and Semantic Relations - [ChineseWordVectors](https://github.com/Embedding/Chinese-Word-Vectors)
- Probabilistic FastText for Multi-Sense Word Embeddings - [code](https://github.com/benathi/multisense-prob-fasttext) | [Probabilistic FastText](https://github.com/benathi/multisense-prob-fasttext#3-loading-and-analyzing-pre-trained-models)
 - Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
- FRAGE: Frequency-Agnostic Word Representation
- Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia
- Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings - [ChineseEmbedding](https://ai.tencent.com/ailab/nlp/en/embedding.html)
- cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information
 - VCWE: Visual Character-Enhanced Word Embeddings
- Learning Cross-lingual Embeddings from Twitter via Distant Supervision
- An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning
- ViCo: Word Embeddings from Visual Co-occurrences
- Spherical Text Embedding - [code](https://github.com/yumeng5/Spherical-Text-Embedding)
- Unsupervised word embeddings capture latent knowledge from materials science literature
 - Efficient Estimation of Word Representations in Vector Space
- Word Representations via Gaussian Embedding
- A Probabilistic Model for Learning Multi-Prototype Word Embeddings
- Dependency-Based Word Embeddings
- SensEmbed: Learning Sense Embeddings for Word and Relational Similarity - [SensEmbed](http://lcl.uniroma1.it/sensembed/sensembed_vectors.gz)
- Learning Word Meta-Embeddings - [Meta-Emb](http://cistern.cis.lmu.de/meta-emb/) (broken)
- Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics
- Dict2vec: Learning Word Embeddings using Lexical Dictionaries - [Dict2vec](https://github.com/tca19/dict2vec#download-pre-trained-vectors)
- WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models - [RusVectōrēs](http://rusvectores.org/en/models/)
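
Most of the downloads above arrive in word2vec text or binary format, which gensim loads directly. A minimal sketch (the file name is a placeholder; GloVe's headerless text format needs `no_header=True`, available in gensim >= 4.0):

```python
from gensim.models import KeyedVectors

# Placeholder path: any word2vec-format file from the entries above.
kv = KeyedVectors.load_word2vec_format("glove.6B.300d.txt",
                                       binary=False, no_header=True)
print(kv.most_similar("king", topn=3))  # nearest neighbors by cosine
print(kv.similarity("cat", "dog"))      # pairwise cosine similarity
```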
 
## OOV Handling
- ALaCarte - [A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors](https://www.aclweb.org/anthology/P18-1002)
- Mimick - [Mimicking Word Embeddings using Subword RNNs](https://www.aclweb.org/anthology/D17-1010)
- CompactReconstruction - [Subword-based Compact Reconstruction of Word Embeddings](https://www.aclweb.org/anthology/N19-1353)
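
A common thread in these OOV methods is composing a vector for an unseen word from subword units. A hedged sketch of the character-n-gram flavor (fastText-style composition; `ngram_vecs` is an assumed trained n-gram table, not a real API):

```python
# Build an approximate vector for an out-of-vocabulary word by averaging
# the embeddings of its character n-grams (with boundary markers).
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"  # mark word boundaries, as fastText does
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def oov_vector(word, ngram_vecs, dim=300):
    grams = [g for g in char_ngrams(word) if g in ngram_vecs]
    if not grams:
        return np.zeros(dim)  # nothing known about this word
    return np.mean([ngram_vecs[g] for g in grams], axis=0)
```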
 
 