Projects in Awesome Lists tagged with doc2vec
A curated list of projects in awesome lists tagged with doc2vec.
https://github.com/sudharsan13296/hands-on-deep-learning-algorithms-with-python
Master Deep Learning Algorithms with Extensive Math by Implementing them using TensorFlow
adagrad autoencoder capsule-network cnn-math contractive-autonencoders cyclegan deep-learning-math deep-learning-mathematics deep-learning-scratch doc2vec few-shot-learning gans gru lstm-math nadam quick-thought rnn-derivation skip-thoughts tensorflow word-embeddings
Last synced: 13 Sep 2025
https://github.com/ibrahimsharaf/doc2vec
Long(er) text representation and classification using Doc2Vec embeddings
doc2vec gensim nlp-machine-learning scikit-learn sentiment-analysis text-classification
Last synced: 09 Mar 2026
https://github.com/thiswillbeyourgithub/anna_anki_neuronal_appendix
Using machine learning on your Anki collection to enhance scheduling via semantic clustering and semantic similarity
ai anki bert clustering doc2vec embedding flashcards kmeans latent machinelearning neighbourhood nlp pca sbert scheduler sementics sentence-embeddings umap
Last synced: 10 Apr 2025
https://github.com/bnosac/doc2vec
Distributed Representations of Sentences and Documents
doc2vec embeddings natural-language-processing paragraph2vec r-package word2vec
Last synced: 29 Apr 2025
https://github.com/nphdang/GE-FSG
Graph Embedding via Frequent Subgraphs
deep-learning doc2vec frequent-subgraphs graph graph-classification graph-embedding graph-representation-learning machine-learning word2vec
Last synced: 27 Mar 2025
https://github.com/lokicui/doc2vec-golang
doc2vec and word2vec implemented in Golang. Word embedding representation.
doc2vec doc2vec-golang golang word2vec
Last synced: 16 Jan 2026
https://github.com/dsxiangli/embedding
Summary of embedding model code and study notes
bookcorpus cnn-lstm doc2vec embedding encoder-decoder fasttext genism hierarchical-softmax negative-sampling quick-thought seq2seq skip-thoughts tf-estimator transformer word2vec
Last synced: 24 Apr 2025
https://github.com/miteshputhran/document_classification
Python code for classification of documents into different classes using machine learning
doc2vec docfileanalysis docs document-classification jupyter-notebook machine-learning naive-bayes-classifier pdf python random-forest supervised-learning text-classification textfile word2vec xgboost
Last synced: 14 Apr 2025
https://github.com/andrewtavis/wikirec
Recommendation engine framework based on Wikipedia data
bert bert-embeddings books doc2vec lda machine-learning multilingual natural-language-processing neural-network nlp open-source python python3 recommendation-engine recommender-system text-mining tfidf unsupervised-learning wikipedia wikipedia-data
Last synced: 05 Jul 2025
https://github.com/ihabbendidi/sentiment_embeddings
A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets
3d-visualization benchmark bert colab doc2vec embedding-evaluation keras logistic-regression lstm nlp notebook python pytorch sentiment-analysis sentiment-embeddings textblob twitter-data visualization word2vec
Last synced: 05 Oct 2025
https://github.com/papachristoumarios/sade
Code for paper: Software clusterings with vector semantics and the call graph
c cflow cscout doc2vec layering layering-violations natural-language-processing refactoring word-embeddings
Last synced: 12 Mar 2026
https://github.com/ryogrid/anime-illust-image-searcher
Anime Style Illustration Specific Image Search App with ViT Tagger x BM25/Doc2Vec
anime anime-style bm25 deep-learning doc2vec gensim illustration image-search machine-learning onnxruntime python pytorch search-engine streamlit transformer vector-search vision-transformer
Last synced: 22 Apr 2025
https://github.com/moindalvs/resume_screening_and_parser
Business objective: The document classification solution should significantly reduce manual effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.
data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data
Last synced: 23 Apr 2025
https://github.com/shreyansh26/revopid
Review Opinion Diversification - Shared task in IJCNLP 2017, Taipei, Taiwan
clustering doc2vec infersent liu mining-opinion-features review-opinion-diversification
Last synced: 03 Oct 2025
https://github.com/atsukoba/LabelEstimator
Simple Unsupervised Document Labeling with MeCab and Pretrained Doc2Vec Model and some experiments about `Doc2Vec.infer_vector()`
Last synced: 29 Apr 2025
https://github.com/howardyclo/learn-to-highlight-movies
Highlight Stephen Chow's famous movies using bullet-screen comments, document vectors, and a neural network.
doc2vec doc2vec-word2vec highlight-movies keras solr supervised-learning text-classification word2vec
Last synced: 28 Apr 2025
https://github.com/natserract/natserract-ai
Using Doc2Vec, Langchain and OpenAI to chat with Natserract blog https://engineering-natserract.vercel.app/
chatbot doc2vec gpt4 langchain natserract openai pgvector qatool
Last synced: 17 Aug 2025
https://github.com/kozistr/movie-rate-prediction
Movie Rate Prediction with TensorFlow
char2vec doc2vec gensim korean-nlp mecab-ko movie-review-classifier naver soynlp tensorflow text-classification textcnn textrnn word2vec
Last synced: 20 Jul 2025
https://github.com/markedg/wfmu-universe
Graph to show the relationships of WFMU DJs and playlists
Last synced: 18 Oct 2025
https://github.com/singhmnprt01/nlp-and-pytorch
NLP use cases using popular solutions: frequency embeddings, word embeddings (word2vec, doc2vec, GloVe), RNN, LSTM, Transformers (BERT), Sentence Transformers, etc., with PyTorch.
doc2vec nlp-machine-learning sentence-transformers tfidf word2vec
Last synced: 22 Feb 2026
https://github.com/joaquimgomez/bachelorsthesis-textsimilaritymeasures
Code and models used in my Bachelor's thesis on similarity measures for large texts. The similarity measures are combined with machine-learning-based embeddings. This repository also contains raw results obtained from the tasks and experiments.
bert cosine-similarity doc2vec elmo embeddings fasttext glove normalized-relative-compression soft-cosine-similarity text-similarity thesis-project word-mover-distance word2vec
Last synced: 07 Aug 2025
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Jun 2025
https://github.com/vsoch/arxiv-equations
Looking for patterns in equation use in arXiv papers
arxiv doc2vec equations word2vec
Last synced: 22 Mar 2025
https://github.com/bluella/text-clusterization-overview
This project tests different text vectorization techniques as a basis for subsequent clustering.
doc2vec gensim matplotlib mds nltk pca python3 sklearn text-clustering tfidf
Last synced: 01 Mar 2026
https://github.com/elifftosunn/bert-bank-model
A Turkish BERT-based model that analyzes people's bank complaints and classifies each one into one of eight categories.
countvectorizer doc2vec f1-score huggingface huggingface-transformer huggingface-transformers nlp nltk python3 scikit-learn stopwords tagged tfidf-transformer train-test-split word-tokenizer wordnetlemmatizer
Last synced: 15 Mar 2025
https://github.com/aarryasutar/hate_speech_detection
This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.
confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud
Last synced: 30 Dec 2025
https://github.com/aditeyabaral/doc2sim
A simple command line utility to find similarity in content between documents using Doc2Vec.
doc2vec gensim machine machine-learning nlp python python3 word2vec
Last synced: 09 Mar 2025
https://github.com/isa1asn/plagiarism-detector
Plagiarism detection for Amharic language text
amharic cosine-similarity doc2vec fastapi gensim nlp numpy plagiarism-detection preprocessing stopwords-removal vector-embeddings
Last synced: 23 Mar 2025
https://github.com/faezeh-gholamrezaie/vectorization-techniques-tutorial
Vectorization Techniques in Natural Language Processing Tutorial for Deep Learning Researchers
bert-embeddings bow cbow doc2vec doc2vec-word2vec fasttext-embeddings glove-embeddings infersent nlp sentence-bert sentence-embeddings skipgram text-analysis text-classification tf-idf universal-sentence-encoder vectorization word2vec
Last synced: 16 Oct 2025
https://github.com/ototot/doc2vecc
GPU-accelerated implementation of Doc2VecC
doc2vec doc2vecc word-embedding word-embeddings
Last synced: 26 Feb 2026
https://github.com/papachristoumarios/software-clusterings-with-vector-semantics-and-call-graph
FSE 2019 Paper - Software Clusterings with Vector Semantics and the Call Graph
community-detection doc2vec software-architecture software-engineering vector-semantics
Last synced: 07 Mar 2026
https://github.com/daandouwe/svd-doc2vec
Turn documents into vectors by decomposing a PPMI co-occurrence matrix.
Last synced: 20 Mar 2025
https://github.com/aakashjhawar/twitter-sentiment-analysis
Sentiment analysis of tweets to detect negative tweets.
bagofwords data-analysis data-science doc2vec logistic-regression machine-learning nlp-machine-learning nltk regex sentiment-analyser sentiment-analysis support-vector-machine textblob tf-idf-features twitter twitter-sentiment-analysis word2vec xgboost
Last synced: 29 Mar 2025
https://github.com/hailiang-wang/doc2vec
Doc2vec implementation using Gensim and TensorFlow
doc2vec natural-language-processing word2vec
Last synced: 13 Oct 2025
https://github.com/jayfunc/doc2vec_server
A Python server for invoking a doc2vec method, which uses TensorFlow and a BERT model.
bert doc2vec python tensorflow
Last synced: 12 Mar 2025
https://github.com/dahsie/spam_classification
This was my first NLP project, in which I performed spam detection using embedding algorithms to encode my texts. I used Random Forest and Multi-Layer Perceptrons for the classification phase, which yielded precision scores of 97% and 98%, respectively. I also learned to document my code with Sphinx.
doc2vec fasttext-embeddings gensim glove-embeddings python scikit-learn sphinx-doc word2vec-algorithm
Last synced: 24 Jul 2025
https://github.com/erickorsi/mtg-learner
Text classifier with a two-part process: a grammar check and a vocabulary check.
classification-algorithm doc2vec ensemble lstm machine-learning natual-language-processing neural-network parts-of-speech word2vec
Last synced: 16 Jan 2026
https://github.com/jayfunc/Doc2Vec-Server
A Python server for invoking a doc2vec method, which uses TensorFlow and a BERT model.
bert doc2vec python tensorflow
Last synced: 03 Oct 2025