# awesome-sentence-embedding
A curated list of pretrained sentence and word embedding models
https://github.com/chaosgen/awesome-sentence-embedding
## Contextualized Word Embeddings
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer - [TF](https://github.com/google-research/text-to-text-transfer-transformer) |[T5](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)|
- CamemBERT: a Tasty French Language Model - |[CamemBERT](https://camembert-model.fr/#download )|
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations - |
- Unsupervised Cross-lingual Representation Learning at Scale - XLM-R (XLM-RoBERTa)([xlmr.large](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.large.tar.gz), [xlmr.base](https://dl.fbaipublicfiles.com/fairseq/models/xlmr.base.tar.gz))|
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - ProphetNet([ProphetNet-large-16GB](https://drive.google.com/file/d/1PctDAca8517_weYUUBW96OjIPdolbQkd/view?usp=sharing), [ProphetNet-large-160GB](https://drive.google.com/file/d/1_nZcF-bBCQvBBcoPzA1nPZsz-Wo7hzEL/view?usp=sharing))|
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training - |
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators - [TF](https://github.com/google-research/electra) |ELECTRA([ELECTRA-Small](https://storage.googleapis.com/electra-data/electra_small.zip), [ELECTRA-Base](https://storage.googleapis.com/electra-data/electra_base.zip), [ELECTRA-Large](https://storage.googleapis.com/electra-data/electra_large.zip))|
- MPNet: Masked and Permuted Pre-training for Language Understanding - training/MPNet/mpnet.base.tar.gz )|
- ParsBERT: Transformer-based Model for Persian Language Understanding - base-parsbert-uncased )|
- Language Models are Few-Shot Learners - |-|
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training - |
- Language Models are Unsupervised Multitask Learners - [TF](https://github.com/openai/gpt-2) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) <br>[Keras](https://github.com/CyberZHG/keras-gpt-2) |GPT-2([117M](https://github.com/openai/gpt-2), [124M](https://github.com/openai/gpt-2), [345M](https://github.com/openai/gpt-2), [355M](https://github.com/openai/gpt-2), [774M](https://github.com/openai/gpt-2), [1558M](https://github.com/openai/gpt-2))|
- Learned in Translation: Contextualized Word Vectors
- Universal Language Model Fine-tuning for Text Classification - tuning-a-language-model), [Zoo](https://forums.fast.ai/t/language-model-zoo-gorilla/14623/1))|
- Deep contextualized word representations - [TF](https://github.com/allenai/bilm-tf) |ELMO([AllenNLP](https://allennlp.org/elmo), [TF-Hub](https://tfhub.dev/google/elmo/2))|
- Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling - [Pytorch](https://github.com/LiyuanLucasLiu/LD-Net) |[LD-Net](https://github.com/LiyuanLucasLiu/LD-Net#language-models)|
- Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation - [Pytorch](https://github.com/HIT-SCIR/ELMoForManyLangs) |[ELMo](https://github.com/HIT-SCIR/ELMoForManyLangs#downloads)|
- Direct Output Connection for a High-Rank Language Model - nlp/doc_lm ) |[DOC](https://drive.google.com/open?id=1ug-6ISrXHEGcWTk5KIw8Ojdjuww-i-Ci )|
- Multi-Task Deep Neural Networks for Natural Language Understanding - [Pytorch](https://github.com/namisan/mt-dnn) |[MT-DNN](https://github.com/namisan/mt-dnn/blob/master/download.sh)|
- BioBERT: pre-trained biomedical language representation model for biomedical text mining - [TF](https://github.com/dmis-lab/biobert) |[BioBERT](https://github.com/naver/biobert-pretrained)|
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - [TF](https://github.com/google-research/bert) <br>[Keras](https://github.com/Separius/BERT-keras) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) <br>[MXNet](https://github.com/imgarylai/bert-embedding) <br>[PaddlePaddle](https://github.com/PaddlePaddle/ERNIE) <br>[TF](https://github.com/hanxiao/bert-as-service/) <br>[Keras](https://github.com/CyberZHG/keras-bert) |BERT([BERT](https://github.com/google-research/bert#pre-trained-models), [ERNIE](https://github.com/PaddlePaddle/ERNIE), [KoBERT](https://github.com/SKTBrain/KoBERT))|
- Improving Language Understanding by Generative Pre-Training - [TF](https://github.com/openai/finetune-transformer-lm) <br>[Keras](https://github.com/Separius/BERT-keras) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) |[GPT](https://github.com/openai/finetune-transformer-lm)|
- Cross-lingual Language Model Pretraining - models )|
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context - [TF](https://github.com/kimiyoung/transformer-xl/tree/master/tf) <br>[Pytorch](https://github.com/kimiyoung/transformer-xl/tree/master/pytorch) <br>[Pytorch, TF2.0](https://github.com/huggingface/transformers) |[Transformer-XL](https://github.com/kimiyoung/transformer-xl/tree/master/tf)|
- Efficient Contextual Representation Learning Without Softmax Layer - C ) |-|
- SciBERT: Pretrained Contextualized Embeddings for Scientific Text - trained-models )|
- Publicly Available Clinical BERT Embeddings
- ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission - r88Q5-sfC993x2Tjt1pu--A900/view )|
- ERNIE: Enhanced Language Representation with Informative Entities - YB-4j1ISNDlk5oZjpPF2El7vn6f )|
- Unified Language Model Pre-training for Natural Language Understanding and Generation - v1 ) |UniLMv1([unilm1-large-cased](https://unilm.blob.core.windows.net/ckpt/unilm1-large-cased.bin), [unilm1-base-cased](https://unilm.blob.core.windows.net/ckpt/unilm1-base-cased.bin))|
- HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization - |
- Pre-Training with Whole Word Masking for Chinese BERT - [code](https://github.com/ymcui/Chinese-BERT-wwm) |[BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm#pytorch%E7%89%88%E6%9C%AC%E8%AF%B7%E4%BD%BF%E7%94%A8-%E7%9A%84pytorch-bert--06%E5%85%B6%E4%BB%96%E7%89%88%E6%9C%AC%E8%AF%B7%E8%87%AA%E8%A1%8C%E8%BD%AC%E6%8D%A2)|
- XLNet: Generalized Autoregressive Pretraining for Language Understanding - models )|
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
- SpanBERT: Improving Pre-training by Representing and Predicting Spans - trained-models )|
- RoBERTa: A Robustly Optimized BERT Pretraining Approach - trained-models )|
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism - [Pytorch](https://github.com/NVIDIA/Megatron-LM) |Megatron-LM([BERT-345M](https://ngc.nvidia.com/catalog/models/nvidia:megatron_bert_345m), [GPT-2-345M](https://ngc.nvidia.com/catalog/models/nvidia:megatron_lm_345m))|
- Subword ELMo - Li/Subword-ELMo/ ) |-|
- Knowledge Enhanced Contextual Word Representations - |
- TinyBERT: Distilling BERT for Natural Language Understanding - |
- MultiFiT: Efficient Multi-lingual Language Model Fine-tuning - waves/ulmfit-multilingual ) |-|
- Extreme Language Model Compression with Optimal Subwords and Shared Projections - |
- MULE: Multimodal Universal Language Embedding - |
- Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks - |
- K-BERT: Enabling Language Representation with Knowledge Graph - |
- UNITER: Learning UNiversal Image-TExt Representations - |
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations - |
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
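Several of the entries above point at the Hugging Face `transformers` library as a common way to load these checkpoints. Below is a minimal sketch, assuming `transformers` and `torch` are installed; the checkpoint name `bert-base-uncased` is only an example, and mean-pooling the last hidden states is just one simple way to turn contextual token vectors into a sentence vector.

```python
# Sketch: contextual embeddings via the Hugging Face transformers API.
# The model name is an example; any Hub-hosted checkpoint from the list
# above should work the same way.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["A curated list of sentence embeddings.", "Contextual vectors differ per token."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state          # (batch, seq_len, dim)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_vectors = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_vectors.shape)                           # e.g. torch.Size([2, 768])
```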
## Pooling Methods
- SIF - [A Simple but Tough-to-Beat Baseline for Sentence Embeddings](https://openreview.net/pdf?id=SyK00v5xx)
- TF-IDF - [Unsupervised Sentence Representations as Word Information Series: Revisiting TF-IDF](https://arxiv.org/abs/1710.06524)
- P-norm - [Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations](https://arxiv.org/abs/1803.01400)
- DisC - [A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs](https://openreview.net/pdf?id=B1e5ef-C-)
- GEM - [Zero-Training Sentence Embedding via Orthogonal Basis](https://arxiv.org/abs/1810.00438)
- SWEM - [Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms](https://arxiv.org/abs/1805.09843)
- Efficient Sentence Embedding using Discrete Cosine Transform
- Efficient Sentence Embedding via Semantic Subspace Analysis
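These pooling baselines are simple to reimplement. The sketch below follows the SIF recipe (smooth inverse-frequency weighted averaging followed by removal of the first principal component); `word_vectors` and `word_freq` are placeholder inputs (a word-to-vector dict and unigram probabilities), not artifacts of any particular repository.

```python
import numpy as np

def sif_embeddings(sentences, word_vectors, word_freq, a=1e-3, dim=300):
    """Sketch of SIF pooling: weighted average of word vectors, then
    subtraction of the projection onto the first principal component."""
    rows = []
    for tokens in sentences:
        vecs = [word_vectors[w] * (a / (a + word_freq.get(w, 0.0)))
                for w in tokens if w in word_vectors]
        rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    X = np.vstack(rows)

    # First principal component of the sentence matrix (common discourse direction).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc = vt[0]
    return X - np.outer(X @ pc, pc)
```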
## Encoders
- Distributed Representations of Sentences and Documents - vectors ) <br>[Python](https://github.com/jhlau/doc2vec ) |Doc2Vec|
- Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models - semantic-embedding ) <br>[Pytorch](https://github.com/linxd5/VSE_Pytorch ) |VSE|
- Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books - thoughts ) <br>[TF](https://github.com/tensorflow/models/tree/master/research/skip_thoughts ) <br>[Pytorch, Torch](https://github.com/Cadene/skip-thoughts.torch ) |SkipThought|
- Order-Embeddings of Images and Language - embedding ) |order-embedding|
- Towards Universal Paraphrastic Sentence Embeddings
- From Word Embeddings to Document Distances
- Learning Distributed Representations of Sentences from Unlabelled Data
- Charagram: Embedding Words and Sentences via Character n-grams
- Learning Generic Sentence Representations Using Convolutional Neural Networks
- Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
- Learning to Generate Reviews and Discovering Sentiment - reviews-discovering-sentiment ) <br>[Pytorch](https://github.com/guillitte/pytorch-sentiment-neuron ) <br>[Pytorch](https://github.com/NVIDIA/sentiment-discovery ) |Sentiment Neuron|
- Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings
- Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
- VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
- Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
- StarSpace: Embed All The Things!
- DisSent: Learning Sentence Representations from Explicit Discourse Relations
- Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations - nmt-50m ) |para-nmt|
- Dual-Path Convolutional Image-Text Embedding with Instance Loss - Text-Embedding ) |Image-Text-Embedding|
- An efficient framework for learning sentence representations - Quick-Thought|
- Universal Sentence Encoder - [TF-Hub](https://tfhub.dev/google/universal-sentence-encoder-large/2)|USE|
- End-Task Oriented Textual Entailment via Deep Explorations of Inter-Sentence Interactions
- Learning general purpose distributed sentence representations via large scale multi-task learning
- Embedding Text in Hyperbolic Spaces - research/hyperbolictext ) |HyperText|
- Representation Learning with Contrastive Predictive Coding - predictive-coding ) |CPC|
- Context Mover’s Distance & Barycenters: Optimal transport of contexts for building representations - mover/context-mover-distance-and-barycenters ) |CMD|
- Learning Universal Sentence Representations with Mean-Max Attention Autoencoder - Mean-MaxAAE|
- Learning Cross-Lingual Sentence Representations via a Multi-task Dual-Encoder Model - [TF-Hub](https://tfhub.dev/s?q=universal-sentence-encoder-xling)|USE-xling|
- Improving Sentence Representations with Consensus Maximisation - |Multi-view|
- BioSentVec: creating sentence embeddings for biomedical texts - [code](https://github.com/ncbi-nlp/BioSentVec) |BioSentVec|
- Word Mover's Embedding: From Word2Vec to Document Embedding
- A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
- Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond
- No Training Required: Exploring Random Encoders for Sentence Classification
- CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model
- GLOSS: Generative Latent Optimization of Sentence Representations - |GLOSS|
- Multilingual Universal Sentence Encoder - [TF-Hub](https://tfhub.dev/google/universal-sentence-encoder-multilingual/1)|MultilingualUSE|
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - [Pytorch](https://github.com/UKPLab/sentence-transformers) |Sentence-BERT|
- SBERT-WK: A Sentence Embedding Method By Dissecting BERT-based Word Models - [Pytorch](https://github.com/BinWang28/SBERT-WK-Sentence-Embedding) |SBERT-WK|
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
- Language-agnostic BERT Sentence Embedding - [TF-Hub](https://tfhub.dev/google/LaBSE/1)|LaBSE|
- On the Sentence Embeddings from Pre-trained Language Models - flow ) |BERT-flow|
- Incremental Domain Adaptation for Neural Machine Translation in Low-Resource Settings - Interactive-Machine-Learning/AraSIF ) |AraSIF|
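For the Sentence-BERT entry above, the `sentence-transformers` package gives a one-call encoder. A minimal usage sketch follows; the model name is just an example checkpoint, and cosine similarity is computed with plain NumPy rather than any library helper.

```python
# Sketch: encoding sentences with the sentence-transformers package
# referenced by the Sentence-BERT entry above. Model name is an example.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bert-base-nli-mean-tokens")
embeddings = model.encode(["How old are you?", "What is your age?"])

a, b = embeddings[0], embeddings[1]
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(round(cosine, 3))
```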
## Evaluation
- decaNLP
- SentEval
- GLUE - [GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding](https://arxiv.org/abs/1804.07461)
- Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks
- Word Embeddings Benchmarks
- MLDoc - [A Corpus for Multilingual Document Classification in Eight Languages](http://www.lrec-conf.org/proceedings/lrec2018/pdf/658.pdf)
- LexNET
- wordvectors.net - vecdemo.pdf)
- Evaluation of sentence embeddings in downstream and linguistic probing tasks
- QVEC - 1243)
- Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments
- EQUATE: A Benchmark Evaluation Framework for Quantitative Reasoning in Natural Language Inference
- Evaluating Word Embedding Models: Methods and Experimental Results
- How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
- LINSPECTOR
- Pitfalls in the Evaluation of Sentence Embeddings
- Probing Multilingual Sentence Representations With X-Probe
- jiant
- Exploring Semantic Properties of Sentence Embeddings
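Many of the sentence-level benchmarks above reduce to correlating model similarity scores with human judgments. Here is a library-agnostic sketch of that loop; the `embed` callable, the sentence pairs, and the gold scores are placeholders, not part of any toolkit listed here.

```python
# Library-agnostic sketch of an STS-style evaluation: Spearman correlation
# between cosine similarities and human similarity judgments.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sts_spearman(pairs, gold_scores, embed):
    """pairs: list of (sent_a, sent_b); gold_scores: human ratings;
    embed: any callable mapping a sentence to a vector (placeholder)."""
    predictions = [cosine(embed(a), embed(b)) for a, b in pairs]
    return spearmanr(predictions, gold_scores).correlation
```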
## Misc
- Word Embedding Dimensionality Selection
- Half-Size
- magnitude
- To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
- Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors
- The Pupil Has Become the Master: Teacher-Student Model-Based Word Embedding Distillation with Ensemble Learning
- Misspelling Oblivious Word Embeddings
- Compressing Word Embeddings via Deep Compositional Code Learning
- [UER-py](https://github.com/dbiir/UER-py)
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
- German BERT
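The `magnitude` entry above refers to the pymagnitude package for fast, memory-mapped vector lookup. A rough usage sketch, assuming a converted `.magnitude` file is available locally (the file path below is a placeholder):

```python
# Sketch of querying vectors with pymagnitude; the file path is a placeholder.
from pymagnitude import Magnitude

vectors = Magnitude("glove.6B.300d.magnitude")     # any converted .magnitude file
vec = vectors.query("embedding")                   # vector lookup (OOV words get an
                                                   # approximate subword-based vector)
neighbours = vectors.most_similar("sentence", topn=3)
print(vec.shape, neighbours)
```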
## Vector Mapping
- Cross-lingual Word Vectors Projection Using CCA - 1049)
- vecmap - [A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings](https://arxiv.org/abs/1805.06297)
- MUSE
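The projects above learn a linear map between two embedding spaces. The supervised core shared by several of them is an orthogonal Procrustes solution over a seed dictionary; the NumPy sketch below illustrates that step only (the matrices `X` and `Y` are synthetic placeholders, not data from any of the toolkits).

```python
# Sketch: orthogonal Procrustes mapping between two embedding spaces,
# given row-aligned seed-dictionary matrices X (source) and Y (target).
import numpy as np

def orthogonal_map(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))                  # placeholder "source" vectors
R, _ = np.linalg.qr(rng.standard_normal((300, 300)))  # a known orthogonal rotation
Y = X @ R                                             # placeholder "target" vectors

W = orthogonal_map(X, Y)
print(np.allclose(W, R))                              # recovers the true rotation
```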
## Articles
- Comparing Sentence Similarity Methods
- The Current Best of Universal Word Embeddings and Sentence Embeddings
- On sentence representations, pt. 1: what can you fit into a single #$!%@*&% blog post?
- Deep-learning-free Text and Sentence Embedding, Part 1
- Deep-learning-free Text and Sentence Embedding, Part 2
- An Overview of Sentence Embedding Methods
- Word embeddings in 2017: Trends and future directions
- A survey of cross-lingual word embedding models
## Word Embeddings
- GloVe: Global Vectors for Word Representation - pre-trained-word-vectors )|
- Sparse Overcomplete Word Vector Representations - coding ) |-|
- From Paraphrase Database to Compositional Paraphrase Model and Back - word ) |[PARAGRAM](http://ttic.uchicago.edu/~wieting/paragram-word-demo.zip )|
- Non-distributional Word Vector Representations - [code](https://github.com/mfaruqui/non-distributional) |[WordFeat](https://github.com/mfaruqui/non-distributional/blob/master/binary-vectors.txt.gz)|
- Joint Learning of Character and Word Embeddings - Xu/CWE ) |-|
- Topical Word Embeddings
- Swivel: Improving Embeddings by Noticing What's Missing - |
- Counter-fitting Word Vectors to Linguistic Constraints - fitting ) |[counter-fitting](http://mi.eng.cam.ac.uk/~nm480/counter-fitted-vectors.txt.zip )(broken)|
- Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec - |
- Siamese CBOW: Optimizing Word Embeddings for Sentence Representations - cbow/src/master/ )|[Siamese CBOW](https://bitbucket.org/TomKenter/siamese-cbow/src/master/ )|
- Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations - trained-vectors )|
- Enriching Word Vectors with Subword Information - vectors.html )|
- Morphological Priors for Probabilistic Neural Word Embeddings - |
- A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks - tokyo.ac.jp/~hassy/publications/arxiv2016jmt/jmt_pre-trained_embeddings.tar.gz )|
- ConceptNet 5.5: An Open Multilingual Graph of General Knowledge - [code](https://github.com/commonsense/conceptnet-numberbatch) |[Numberbatch](https://github.com/commonsense/conceptnet-numberbatch#downloads)|
- Offline bilingual word vectors, orthogonal transformations and the inverted softmax - |
- Multimodal Word Distributions - model )|
- Poincaré Embeddings for Learning Hierarchical Representations - embeddings ) |-|
- Context encoders as a simple but powerful extension of word2vec - |
- Semantic Specialisation of Distributional Word Vector Spaces using Monolingual and Cross-Lingual Constraints - [code](https://github.com/nmrksic/attract-repel) |[Attract-Repel](https://github.com/nmrksic/attract-repel#available-word-vector-spaces)|
- Learning Chinese Word Representations From Glyphs Of Characters - |
- Making Sense of Word Embeddings - lt/sensegram ) |[sensegram](http://ltdata1.informatik.uni-hamburg.de/sensegram/ )|
- Hash Embeddings for Efficient Word Representations - |
- BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages - for-each-language )|
- SPINE: SParse Interpretable Neural Embeddings
- AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP - grams-models-1 )|
- Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components - knowcomp/jwe ) |-|
- Representation Tradeoffs for Hyperbolic Embeddings - MDS](https://github.com/HazyResearch/hyperbolics )|
- Dynamic Meta-Embeddings for Improved Sentence Representations - trained-models )|
- Analogical Reasoning on Chinese Morphological and Semantic Relations - |[ChineseWordVectors](https://github.com/Embedding/Chinese-Word-Vectors )|
- Probabilistic FastText for Multi-Sense Word Embeddings - [code](https://github.com/benathi/multisense-prob-fasttext) |[Probabilistic FastText](https://github.com/benathi/multisense-prob-fasttext#3-loading-and-analyzing-pre-trained-models)|
- Incorporating Syntactic and Semantic Information in Word Embeddings using Graph Convolutional Networks
- FRAGE: Frequency-Agnostic Word Representation - Agnostic ) |-|
- Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia
- Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings - |[ChineseEmbedding](https://ai.tencent.com/ailab/nlp/en/embedding.html )|
- cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information - |
- VCWE: Visual Character-Enhanced Word Embeddings
- Learning Cross-lingual Embeddings from Twitter via Distant Supervision - twitter ) |-|
- An Unsupervised Character-Aware Neural Approach to Word and Context Representation Learning - word-embeddings ) |-|
- ViCo: Word Embeddings from Visual Co-occurrences - give-me-pretrained-vico )|
- Spherical Text Embedding - Text-Embedding ) |-|
- Unsupervised word embeddings capture latent knowledge from materials science literature - |
- WebVectors: A Toolkit for Building Web Interfaces for Vector Semantic Models - |[RusVectōrēs](http://rusvectores.org/en/models/ )|
- Efficient Estimation of Word Representations in Vector Space
- Word Representations via Gaussian Embedding - |
- A Probabilistic Model for Learning Multi-Prototype Word Embeddings - |
- Dependency-Based Word Embeddings - based-word-embeddings/ )|
- SensEmbed: Learning Sense Embeddings for Word and Relational Similarity - |[SensEmbed](http://lcl.uniroma1.it/sensembed/sensembed_vectors.gz )|
- Learning Word Meta-Embeddings - |[Meta-Emb](http://cistern.cis.lmu.de/meta-emb/ )(broken)|
- Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics - |
- Dict2vec : Learning Word Embeddings using Lexical Dictionaries - pre-trained-vectors )|
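Most of the static word-vector releases above ship as plain-text or word2vec-format files. A hedged sketch of loading and querying one with gensim (the file name is a placeholder; GloVe-format text files need the conversion step shown in the comment):

```python
# Sketch: loading a word2vec-format vector file with gensim.
# The file name is a placeholder for any of the downloads listed above.
from gensim.models import KeyedVectors

# For GloVe-format text files, convert first, e.g.:
#   python -m gensim.scripts.glove2word2vec --input glove.txt --output vectors.txt
kv = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)

print(kv["sentence"][:5])                  # raw vector lookup
print(kv.most_similar("sentence", topn=3)) # nearest neighbours
print(kv.similarity("sentence", "phrase")) # cosine similarity
```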
## OOV Handling
- ALaCarte - 1002)
- Mimick - 1010)
- CompactReconstruction - [Subword-based Compact Reconstruction of Word Embeddings](https://www.aclweb.org/anthology/N19-1353)
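A much simpler fallback than the methods above (and not what any of them specifically proposes) is to back off to character n-gram vectors in the spirit of fastText's subword averaging. A rough sketch, where `ngram_vectors` is a placeholder dict from character n-grams to vectors:

```python
# Rough sketch of a fastText-style back-off for out-of-vocabulary words:
# average the vectors of the word's character n-grams that have entries.
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    padded = f"<{word}>"                   # boundary markers, as in fastText
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def oov_vector(word, ngram_vectors, dim=300):
    hits = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(hits, axis=0) if hits else np.zeros(dim)
```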