An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with doc2vec

A curated list of projects in awesome lists tagged with doc2vec.

https://github.com/danielfrg/word2vec

Python interface to Google word2vec

doc2vec python word2vec

Last synced: 20 Oct 2025

https://github.com/ibrahimsharaf/doc2vec

Long(er) text representation and classification using Doc2Vec embeddings

doc2vec gensim nlp-machine-learning scikit-learn sentiment-analysis text-classification

Last synced: 09 Mar 2026

https://github.com/thiswillbeyourgithub/anna_anki_neuronal_appendix

Uses machine learning on your Anki collection to improve scheduling via semantic clustering and semantic similarity

ai anki bert clustering doc2vec embedding flashcards kmeans latent machinelearning neighbourhood nlp pca sbert scheduler sementics sentence-embeddings umap

Last synced: 10 Apr 2025

https://github.com/bnosac/doc2vec

Distributed Representations of Sentences and Documents

doc2vec embeddings natural-language-processing paragraph2vec r-package word2vec

Last synced: 29 Apr 2025

https://github.com/lokicui/doc2vec-golang

doc2vec and word2vec implemented in Go, for word embedding representation

doc2vec doc2vec-golang golang word2vec

Last synced: 16 Jan 2026

https://github.com/ihabbendidi/sentiment_embeddings

A scientific benchmark and comparison of the performance of sentiment analysis models in NLP on small to medium datasets

3d-visualization benchmark bert colab doc2vec embedding-evaluation keras logistic-regression lstm nlp notebook python pytorch sentiment-analysis sentiment-embeddings textblob twitter-data visualization word2vec

Last synced: 05 Oct 2025

https://github.com/papachristoumarios/sade

Code for paper: Software clusterings with vector semantics and the call graph

c cflow cscout doc2vec layering layering-violations natural-language-processing refactoring word-embeddings

Last synced: 12 Mar 2026

https://github.com/moindalvs/resume_screening_and_parser

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample dataset details: resumes and financial documents.

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 23 Apr 2025

https://github.com/shreyansh26/revopid

Review Opinion Diversification - Shared task in IJCNLP 2017, Taipei, Taiwan

clustering doc2vec infersent liu mining-opinion-features review-opinion-diversification

Last synced: 03 Oct 2025

https://github.com/atsukoba/LabelEstimator

Simple unsupervised document labeling with MeCab and a pretrained Doc2Vec model, plus some experiments with `Doc2Vec.infer_vector()`

doc2vec jupyter-notebook

Last synced: 29 Apr 2025

https://github.com/howardyclo/learn-to-highlight-movies

Highlight Stephen Chow's famous movies using bullet-screen comments, document vector and neural network.

doc2vec doc2vec-word2vec highlight-movies keras solr supervised-learning text-classification word2vec

Last synced: 28 Apr 2025

https://github.com/natserract/natserract-ai

Using Doc2Vec, Langchain and OpenAI to chat with Natserract blog https://engineering-natserract.vercel.app/

chatbot doc2vec gpt4 langchain natserract openai pgvector qatool

Last synced: 17 Aug 2025

https://github.com/markedg/wfmu-universe

Graph showing the relationships between WFMU DJs and playlists

doc2vec wfmu

Last synced: 18 Oct 2025

https://github.com/singhmnprt01/nlp-and-pytorch

NLP use cases built with popular techniques: frequency embeddings, word embeddings (word2vec, doc2vec, GloVe), RNN, LSTM, Transformers (BERT), Sentence Transformers, etc., using PyTorch

doc2vec nlp-machine-learning sentence-transformers tfidf word2vec

Last synced: 22 Feb 2026

https://github.com/joaquimgomez/bachelorsthesis-textsimilaritymeasures

Code and models used in my Bachelor's Degree thesis on similarity measures for large texts. The similarity measures have been combined with machine-learning-based embeddings. This repository also contains the raw results of the tasks and experiments.

bert cosine-similarity doc2vec elmo embeddings fasttext glove normalized-relative-compression soft-cosine-similarity text-similarity thesis-project word-mover-distance word2vec

Last synced: 07 Aug 2025

https://github.com/duyet/doc2vec-compare-doc-demo

Demo of using Doc2Vec to compare two or more documents (for Vietnamese data)

bootstrap demo doc2vec web word2vec

Last synced: 21 Mar 2025

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 13 Jun 2025

https://github.com/vsoch/arxiv-equations

Looking for patterns in equation use in arXiv papers

arxiv doc2vec equations word2vec

Last synced: 22 Mar 2025

https://github.com/bluella/text-clusterization-overview

This project tests different text vectorization techniques as a basis for subsequent text clustering.

doc2vec gensim matplotlib mds nltk pca python3 sklearn text-clustering tfidf

Last synced: 01 Mar 2026

https://github.com/elifftosunn/bert-bank-model

A Turkish BERT-based model that analyzes people's bank complaints and classifies each one into one of eight categories.

countvectorizer doc2vec f1-score huggingface huggingface-transformer huggingface-transformers nlp nltk python3 scikit-learn stopwords tagged tfidf-transformer train-test-split word-tokenizer wordnetlemmatizer

Last synced: 15 Mar 2025

https://github.com/aarryasutar/hate_speech_detection

This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.

confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud

Last synced: 30 Dec 2025

https://github.com/aditeyabaral/doc2sim

A simple command line utility to find similarity in content between documents using Doc2Vec.

doc2vec gensim machine machine-learning nlp python python3 word2vec

Last synced: 09 Mar 2025

https://github.com/ototot/doc2vecc

GPU-accelerated implementation of Doc2VecC

doc2vec doc2vecc word-embedding word-embeddings

Last synced: 26 Feb 2026

https://github.com/daandouwe/svd-doc2vec

Turn documents into vectors by decomposing a PPMI co-occurrence matrix.

doc2vec ppmi svd wikitext

Last synced: 20 Mar 2025

https://github.com/hailiang-wang/doc2vec

Doc2Vec implementation using Gensim and TensorFlow

doc2vec natural-language-processing word2vec

Last synced: 13 Oct 2025

https://github.com/jayfunc/doc2vec_server

A Python server for invoking a doc2vec method, built on TensorFlow and a BERT model.

bert doc2vec python tensorflow

Last synced: 12 Mar 2025

https://github.com/dahsie/spam_classification

This was my first NLP project, in which I performed spam detection using embedding algorithms to encode my texts. I used Random Forest and Multi-Layer Perceptrons for the classification phase, which achieved precision scores of 97% and 98%, respectively. I also learned to document my code with Sphinx.

doc2vec fasttext-embeddings gensim glove-embeddings python scikit-learn sphinx-doc word2vec-algorithm

Last synced: 24 Jul 2025

https://github.com/erickorsi/mtg-learner

Text classifier with a two-part process: a grammar check and a vocabulary check.

classification-algorithm doc2vec ensemble lstm machine-learning natual-language-processing neural-network parts-of-speech word2vec

Last synced: 16 Jan 2026

https://github.com/jayfunc/Doc2Vec-Server

A Python server for invoking a doc2vec method, built on TensorFlow and a BERT model.

bert doc2vec python tensorflow

Last synced: 03 Oct 2025