Projects in Awesome Lists tagged with tfidf
A curated list of projects in awesome lists tagged with tfidf.
https://github.com/paulmcinnis/jobfunnel
Scrape job websites into a single spreadsheet with no duplicates.
automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml
Last synced: 14 May 2025
https://github.com/andrewtavis/kwx
BERT, LDA, and TFIDF based keyword extraction in Python
bert data-analysis data-science data-visualization keyword-extraction latent-dirichlet-allocation lda machine-learning multilingual natural-language-processing nlp open-source python python3 text-analysis text-classification text-mining tfidf topic-modeling unsupervised-learning
Last synced: 14 Apr 2025
https://github.com/winkjs/wink-bm25-text-search
Fast Full Text Search based on BM25
bm25 bm25f full-text-search in-memory-search natural-language-processing nlp semantic-search tf-idf tfidf
Last synced: 20 Aug 2025
https://github.com/NISH1001/tag-generator
A simple tool to generate tags for the given text (document) using TF-IDF.
Last synced: 05 Apr 2025
https://github.com/faizann24/phishytics-machine-learning-for-phishing
Machine Learning for Phishing Website Detection
artificial-intelligence bpe cybersecurity data-science machine-learning phishing phishing-detection random-forest security security-tools tfidf
Last synced: 14 Jul 2025
https://github.com/zayedrais/documentsearchengine
Document Search Engine project with TF-IDF and Google Universal Sentence Encoder model
data-science deep-learning document-search document-similarity juypter machine-learning python python-text-analysis semantic-search semantic-search-engine tensorflow tensorflow-models tensorflow-tutorials text-analysis text-search text-semantic-similarity tfidf tfidf-text-analysis tfidf-vectorizer universal-sentence-encoder
Last synced: 02 May 2025
https://github.com/ahmedbesbes/overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
bag-of-words blog character-ngrams convolutional-neural-networks deeplearning glove-embeddings keras nlp recurrent-neural-networks sentiment-analysis sklearn text-classification tfidf tutorial tweets word-embeddings word-ngrams
Last synced: 14 Jul 2025
https://github.com/97k/spam-ham-web-app
A web app that classifies text as spam or ham. I am using my own ML algorithm in the backend; the code for it can be found under machine_learning_section.
bag-of-words data-visualization django heroku-deployment jupyter-notebook machine-learning machine-learning-projects multinomial-naive-bayes nlp nltk spam-classification text-classification tfidf
Last synced: 27 Apr 2025
https://github.com/jldbc/gutenberg
A content-based recommender system for books using the Project Gutenberg text corpus
gutenberg knn pyspark recommender tfidf
Last synced: 07 May 2025
https://github.com/cereja-project/cereja
Cereja is a bundle of useful functions we don't want to rewrite and .. just pure fun!
array-manipulations colab console data-tools datapreprocessing file-converter freq freqitems hacktoberfest hacktoberfest2024 progress-bar progress-view python python-library python3 tfidf tokenizer utilities
Last synced: 06 Apr 2025
https://github.com/tiagomantunes/karen
KAREN: Unifying Hatespeech Detection and Benchmarking
abuse benchmark bert deep-learning detection framework hate hatespeech huggingface natural-language-processing nlp offensive offensive-language pytorch sentence-classification speech tfidf twitter
Last synced: 22 Aug 2025
https://github.com/goldbattle/mangadexrecomendations
Finding recommendations between them all. Work in progress.
manga manga-recommendations mangadex neko recommendation-algorithm tfidf
Last synced: 10 Apr 2025
https://github.com/andrewtavis/wikirec
Recommendation engine framework based on Wikipedia data
bert bert-embeddings books doc2vec lda machine-learning multilingual natural-language-processing neural-network nlp open-source python python3 recommendation-engine recommender-system text-mining tfidf unsupervised-learning wikipedia wikipedia-data
Last synced: 05 Jul 2025
https://github.com/brunoarine/findlike
Command-line tool that finds lexically similar documents in relation to a reference text file or ad-hoc query
bm25 nlp similarity-search tfidf
Last synced: 18 Jul 2025
https://github.com/aquatiko/sentiment-analysis-tfidf-vectorizer-method
Sentiment Analysis of movie reviews by sklearn's naive bayes and TfIdf word vectorizer.
confusion-matrix movie-reviews naive-bayes-classifier sentiment-analysis sklearn-vectorizer tfidf wordnet
Last synced: 10 Apr 2025
https://github.com/fukurosan/jhaystack
A JavaScript search engine with zero dependencies.
bitap bm25 clustering fulltext fulltext-search fuzzy javascript nlp query search search-engine spellcheck tfidf typescript
Last synced: 12 Apr 2025
https://github.com/wittline/tf-idf
Term Frequency-Inverse Document Frequency from Scratch
feature-engineering python text-analytics tfidf
Last synced: 13 Apr 2025
https://github.com/nikhiljsk/preprocess_nlp
A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with parallel processing using custom number of processes.
cleaning-data feature-extraction glove natural-language-processing nlp parallel-processing preprocess python3 reduction spacy stages tfidf vectorization word2vec
Last synced: 12 Apr 2025
https://github.com/byukan/chatbots-nlp
Chatbots and other NLP applications: Topic Modeling on text from Codechef and OkCupid
lda machine-learning nlp nmf tfidf topic-modeling
Last synced: 15 Jul 2025
https://github.com/raphaelsty/cherche-api
Deploy Cherche using FastAPI and Docker
bm25 docker fastapi neural-search question-answering summarization tfidf
Last synced: 25 Oct 2025
https://github.com/dbozhinovski/relatinator
A humble library for finding related posts and content. Uses tf-idf and BM25 under the hood. Primarily aimed at static site generators.
astro bm25 related-posts static-site tfidf
Last synced: 10 Apr 2025
https://github.com/bent10/boox
Search anything, instantly
boox document-search full-text-search fulltext-search fuzzy-matching fuzzy-search instantsearch inverted-index nlp search search-engine search-index tf-idf tfidf vector-search vector-space-model
Last synced: 10 Apr 2025
https://github.com/laertispappas/mapreduce_python
TFIDF algorithm on Hadoop - Python
Last synced: 10 Aug 2025
https://github.com/i-umairkhan/news-ranking-tool
NLP project to extract relevant data from the DAWN news dataset.
Last synced: 04 Jan 2026
https://github.com/avannaldas/emailsclassification
Classification of emails received on a mass distribution group
countvectorizer email-classifier scikit-learn sklearn text-classification tfidf
Last synced: 01 Jul 2025
https://github.com/stefantaubert/quora-competition
Code for Quora Competition on Kaggle
data-science dataset evaluation jaccard kaggle lemmatization levenshtein nlp python quora quora-competition quora-question-pairs random-search tfidf tokenization xgboost
Last synced: 14 Oct 2025
https://github.com/germabyte/obsidian-deduper
The Obsidian Duplicate Finder is a user-friendly tool designed to help users manage duplicate files within their Obsidian vaults. It identifies similar Markdown files based on their content, making it easier to organize and declutter note-taking environments.
cosine-similarity deduplication duplicate-files markdown notes obsidian python tfidf tkinter vault
Last synced: 02 Mar 2025
https://github.com/mwoss/mors
Application of topic models for information retrieval and search engine optimization.
common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf
Last synced: 13 Jun 2025
https://github.com/aymenkhs/information-retrieval-on-cacm-collection
Information retrieval on the CACM collection using Python
boolean-model cacm information-retrieval pyqt5 python tfidf vectorial-model
Last synced: 31 Oct 2025
https://github.com/amir78729/information-retrieval
information-retrieval inverted-index ir search-engine tfidf
Last synced: 20 Jun 2025
https://github.com/amirhosseinhonardoust/fake-review-detector
An AI-powered Fake Review Detector built with Python, Streamlit, and Scikit-learn. Uses TF-IDF vectorization, Logistic Regression, and behavioral text analytics (sentiment, exclamations, clichés) to identify synthetic or spammy product reviews. Includes training scripts and a full interactive dashboard.
ai-project dashboard data-science data-visualization fake-review-detection logistic-regression machine-learning natural-language-processing nlp python sentiment-analysis sklearn streamlit text-classification tfidf
Last synced: 06 Nov 2025
https://github.com/hanifhefaz/elm-tf-idf
Elm implementation of Term Frequency-Inverse Document Frequency (TF-IDF) for text analysis
tf-idf tfidf tfidf-text-analysis
Last synced: 04 Jul 2025
https://github.com/singhmnprt01/nlp-and-pytorch
NLP use cases using popular solutions: Frequency Embeddings, Word embedding (word2vec, doc2vec, Glove), RNN,LSTM, Transformers-BERT, Sentence_Transformers etc. PyTorch
doc2vec nlp-machine-learning sentence-transformers tfidf word2vec
Last synced: 29 Oct 2025
https://github.com/notshrirang/lyrics-analysis-and-music-recommendation-with-pair-similarities
Song recommendation system for film makers and music makers. Developed using Term Frequency - Inverse Document Frequency to vectorize lyrics of song and then cosine similarity to calculate similarity between the songs. This system recommends songs with similar lyrics.
cosine-similarity nlp spotify tfidf
Last synced: 05 Apr 2025
https://github.com/jhaayush2004/hybrid-retrieval-systems
Hybrid Retrieval System combining keyword matching (BM25) with semantic similarity (Vectorstore) for improved retrieval.
bm25-okapi chromadb huggingface-pipeline langchain python rag tfidf
Last synced: 13 Jul 2025
https://github.com/recker-dev/exploring-nlp
This repo explores various processes for sentiment analysis using the Amazon Customer Review dataset.
bagofwords distilbert huggingface machine-learning nlp-machine-learning tfidf word2vec
Last synced: 16 Jun 2025
https://github.com/rochimfn/question-answering-konstitusi
Indonesia Constitution Question Answering System (Telegram Bot, Streamlit Page, and HTTP API)
gensim-doc2vec gensim-word2vec indonesia tfidf
Last synced: 26 Jun 2025
https://github.com/moindalvs/text_mining_nlp
Natural Language Processing
bag-of-words classifier data-science fake-news lemmatization nlp pipeline sentiment-analysis sentiment-classification spacy spacy-pipeline stemming text-classification text-mining tfidf tokenization vectorizer
Last synced: 24 Dec 2025
https://github.com/asthavashisth/resume-screening-system
A simple AI-powered web app to automatically categorize resumes, recommend suitable job roles, and extract key details like skills, email, and contact number using NLP techniques.
css flask html jinja2 nlp-machine-learning pycharm-ide python random-forest tfidf
Last synced: 30 Dec 2025
https://github.com/codeasarjun/docbuddy
DocBuddy is a Flask web app that lets users upload and interact with PDF files by summarizing content, suggesting keywords, and providing a basic Q&A feature, all through an intuitive interface.
abstrative-text-summarization documentation-tool end-to-end-machine-learning end-to-end-project extractive-question-answering extractive-summarization natural-language-processing natural-language-understanding naturallanguageprocessing nlp nlp-machine-learning question-answering summarization textsummarization tfidf
Last synced: 03 Jan 2026
https://github.com/alessandromonolo/descriptive-texts-classification-by-usage-purposes-of-estate-properties
The project aims to identify the best model for the classification of texts derived from descriptions of assets subject to Italian judicial auctions. The employed models include both conventional models, such as Logistic Regression, Naive Bayes, SVM, and XGBoost, and neural network models, such as Fasttext and XLM-Roberta.
fasttext logistic-regression naive-bayes nlp python pytorch scikit-learn seaborn spacy svm text-classification tfidf tokenizer xgboost xlm-roberta
Last synced: 30 Dec 2025
https://github.com/akhand-pratap-tiwari/automatic-extractive-text-summarization-using-tf-idf
Text Summarization using TF-IDF technique in Python.
natural-language-processing nltk python python-3 python3 sklearn tfidf tfidf-text-analysis vectorization
Last synced: 23 Mar 2025
https://github.com/geekquad/text-learning
Basic usage of NLTK. Implementation of concepts like Stemmer, TfIdf, and text.CountVectors
corpus countvectorizer nltk sklearn stopwords tfidf
Last synced: 24 Feb 2025
https://github.com/moindalvs/sentiment_analysis_on_-elon_musk_tweets
Perform sentiment analysis on Elon Musk's tweets (Elon-musk.csv)
bag-of-words cleaning-data elon-musk feature-engineering nlp nltk polarity sentiment-analysis sentiment-intensity sentiment-polarity spacy subjectivity text-mining text-processing textblob-sentiment-analysis tfidf tfidf-vectorizer tokenizer tweet-analysis twitter-sentiment-analysis
Last synced: 24 Dec 2025
https://github.com/anishlearnstocode/bow-representation
Different Bag of Words representations such as One Hot Vector, TF (Term Frequency) and TF-IDF in NLP.
natural-language-processing nlp one-hot-vector term-frequency tf tfidf
Last synced: 18 Mar 2025
https://github.com/ternion-1121/yt-comments-clustering
An NLP project to cluster YouTube comments on the basis of their similarity of words.
clustering google-youtube-api grouping kmeans kmeans-clustering matplotlib-pyplot natural-language-processing nlp pandas python python3 sentiment-analysis tfidf wordcloud youtube youtube-api
Last synced: 28 Dec 2025
https://github.com/nandahkrishna/sarcasmdetection
Detecting sarcasm in Reddit comments
bert-embeddings classification explainable-ml jupyter-notebook machine-learning natural-language-processing python reddit sarcasm sarcasm-detection tfidf
Last synced: 13 Oct 2025
https://github.com/jash271/youglance
Package for analyzing Youtube Videos from searching by relevant entities to analyzing sentiments and clustering different parts of the video according to your liking
cosine-similarity named-entity-recognition ner nlp nltk python sentiment-analysis spacy tfidf topic-modeling
Last synced: 04 Jan 2026
https://github.com/antonio-f/multilabel-classification
Predict tags on StackOverflow with linear models - Week 1 assignment of Coursera's Natural Language Processing course from the Advanced Machine Learning Specialization.
bag-of-words logistic-regression multilabel-classification nltk-library one-vs-rest sklearn-library tfidf tfidf-vectorizer
Last synced: 30 Mar 2025
https://github.com/alvinmurimi/lexiful
Specialized Intelligent Text Matching and Correction Engine
fuzzy-matching machine-learning natural-language-processing nlp phonetic-matching spelling-correction tfidf
Last synced: 04 Jul 2025
https://github.com/ffreemt/similarity-matrix
Similarity matrix based on doc-term-scores from textacy
Last synced: 15 Mar 2025
https://github.com/ychaaby/real-time-coursecompass
A real-time course recommendation system powered by Apache Spark and Kafka for scalable big data processing. It uses content-based filtering and AI-generated keywords to deliver personalized learning suggestions, all orchestrated with Docker for seamless deployment.
data-engineering docker gemini kafka mllib oracle pyspark recommender-system selenium spark-streaming streaming streamlit tfidf
Last synced: 09 Oct 2025
https://github.com/williamcorsel/berteval
Compare BERT-based models for document-level sentiment analysis using the SemEval 2017 Twitter dataset.
bert-model semeval sentiment-analysis tfidf twitter
Last synced: 08 Apr 2025
https://github.com/louislefevre/information-retrieval-models
Ranks passages against queries using various models and techniques.
bm25 dirichlet-smoothing information-retrieval laplace-smoothing lidstone-smoothing query-likelihood tfidf vectorspace
Last synced: 30 Mar 2025
https://github.com/rubenhari/data-science-projects
A collection of my data science projects
Last synced: 09 Jul 2025
https://github.com/antoinewg/ocr-tfidf
TF-IDF with Hadoop Streaming
hadoop-streaming mapreduce ocr tfidf
Last synced: 09 Apr 2025
https://github.com/sambhu431/medicine-recommendation-system
The project aims to recommend medicines based on product uses similarity, side effects, and product review weightages. Powered by NLP techniques like TF-IDF and Cosine Similarity, the system provides intelligent and user-centric recommendations.
cosine-similarity flask machine-learning medicine medicine-recommendation medicine-search pickle recommendation-system tfidf tfidf-vectorizer
Last synced: 09 Apr 2025
https://github.com/rajputpritesh1/cyberguard
This project detects cyberbullying in Bhojpuri user-generated content using Logistic Regression and TF-IDF features.
bhojpuri cyberbullying-detection flask logistic-regression machine-learning tfidf
Last synced: 24 Apr 2025
https://github.com/oyebamiji-micheal/quora-insincere-questions-classification-using-tf-idf
A web app that classifies a given Quora question as sincere or insincere using TF-IDF - a beginner's approach to NLP
classification nlp quora-questions tfidf xgboost
Last synced: 11 Aug 2025
https://github.com/zaaim-halim/java_multilanguage_searchengine_tfidf_based
Java implementation of a multilanguage search engine based on the TFIDF approach
arabic-nlp search-engine tfidf
Last synced: 16 Mar 2025
https://github.com/asaficontact/stack_classifier_project
We classified Stack Overflow Python questions from 2008-2016 with Natural Language Processing and Deep Learning. Using Regular Expressions, we removed HTML tags and punctuation. We also utilized spaCy to tokenize, lemmatize and remove stop words. Using Keras, we built a 4 layered artificial neural network with a 20% dropout rate using relu and softmax activation functions. We also utilized the adam optimizer and categorical cross-entropy loss function which classified 11 tags 88% successfully.
cross-entropy-loss deep-learning deep-neural-networks keras lemmatization neural-networks object-oriented-programming pandas python3 regular-expressions relu sklearn spacy spacy-nlp stackoverflow tfidf tokenization
Last synced: 08 Apr 2025
https://github.com/rid17pawar/sentiment-analysis-model-experiments
Experiments in the field of Sentiment Analysis using ML Algorithms namely Logistic Regression, Naive Bayes along with tfidf, one hot encoding, bag of words vectorization. Different MLP and RNN models viz. LSTM, GRU, Bidirectional LSTM. Lastly, state of the art BERT model
bag-of-words bert bidirectional-lstm gru logistic-regression lstm ml-algorithms naive-bayes neural-networks one-hot-encoding rnn sentiment-analysis sentiment-classification text-vectorization tfidf tfidf-vectorizer transformer-architecture twitter-sentiment-analysis
Last synced: 12 Dec 2025
https://github.com/yugalsoni18/counterfeit_review_detection
Fake review detection using TF-IDF & SVM (AUC 0.98), plus Counterfeit Risk Score with clustering & anomaly detection.
business-analytics fraud-detection isolation-forest kmeans nlp python risk-scoring scikit-learn svm tfidf
Last synced: 05 Oct 2025
https://github.com/amabna/quran-verse-similarity
A simple NLP tool to find conceptually similar Quranic verses. Uses Selenium to scrape English verse texts from clearquran.com, applies TF-IDF and CountVectorizer for similarity analysis, and displays top 5 similar verses via a Tkinter GUI.
ai deep lemmatization machine-learning natural-language-processing nlp quran sickit-learn sklearn tfidf
Last synced: 07 Oct 2025
https://github.com/das-debjit/emotion-detection
A simple ML-powered web app for real-time emotion detection from text using Streamlit and TF-IDF-based classification.
machine-learning nlp python scikit-learn sentiment-analysis streamlit text-classification tfidf web-app
Last synced: 25 Oct 2025
https://github.com/mamiglia/adm_hw_3
Algorithms for Data Mining 2022 - Homework 3 - Group 7
Last synced: 11 Mar 2025
https://github.com/fusi3/natural_language_coursework
Assessing the impact of different pre-processing techniques for classifying the sentiment of movie reviews
bag-of-words latent-semantic-analysis lemmatization multilayer-perceptron nlp sentiment-analysis stemming support-vector-machines tfidf
Last synced: 18 Mar 2025
https://github.com/susheel-1999/nlp-spam_ham_classification_using_different_techniques
Spam and Ham classification
deep-learning doc2v ham keras nlp nlp-machine-learning rnn spam spam-filtering tfidf word2vec
Last synced: 06 Apr 2025
https://github.com/satyavardhan2k4/legal-outcome-classifier
A simple legal case outcome prediction project using Logistic Regression, NLP and TF-IDF vectorization on case facts text data. Includes model training, evaluation, and feature importance visualizations to interpret influential words impacting the prediction.
classification legal-analytics logistic-regression machine-learning nlp pandas python tfidf
Last synced: 26 Jun 2025
https://github.com/btc/flavor
Inspiring Culinary Creativity 🍃 Flavor Search on iOS
cooking information-retrieval ios parser-combinators rust search swift tfidf
Last synced: 27 Jun 2025
https://github.com/vasgat/company-mapping
Mapping incoming companies based on a given Corpus (of Companies).
Last synced: 29 Mar 2025
https://github.com/tahirzia-1/nlp-textclassify
A hands-on NLP project comparing classic ML models (Naïve Bayes, SVM, Logistic Regression) and ANNs for text classification using SMS Spam and 20 Newsgroups datasets.
adam-optimizer ann cbow deep-learning lemmatization logistic-regression naive-bayes-classifier nlp nlp-machine-learning skipgram-algorithm svm tensorflow tfidf tfidf-vectorizer tokenization vectorization word2vec
Last synced: 14 Sep 2025
https://github.com/huangjunxin/lite-sklearn-tfidfvectorizer
A Lite Implementation of sklearn TfidfVectorizer
implementation sklearn tfidf tfidfvectorizer
Last synced: 06 Apr 2025
https://github.com/pedrofracassi/insper-nlp-relevance-search
Search for posts on Bluesky using TFIDF to rank the relevance of the results
Last synced: 27 Mar 2025
https://github.com/brej-29/disaster-tweets-nlp-model-benchmarks
Benchmark NLP models on Kaggle “Disaster Tweets”: TF-IDF + Naive Bayes baseline, Keras deep nets (Dense/LSTM/GRU/BiRNN/Conv1D), and TensorFlow Hub Universal Sentence Encoder transfer learning—compared using accuracy, precision, recall, and F1.
bidirectional-rnn cnn conv1d deep-learning disaster-tweets gru kaggle keras lstm machine-learning naive-bayes nlp rnn scikit-learn tensorflow tensorflow-hub text-classification tfidf
Last synced: 30 Dec 2025
https://github.com/prajwalsde/fake-news-detection
A Machine Learning-based Fake News Detection system using TF-IDF and Logistic Regression, with a Streamlit app for real-time predictions.
data-science fake-news-detection machine-learning news-classifier nlp python sklearn streamlit text-classification tfidf
Last synced: 28 Jun 2025
https://github.com/akshat48002/youtube-sentiment-analysis
A complete end-to-end Machine Learning + MLOps project that analyzes YouTube video comments and classifies them into Positive, Negative, or Neutral sentiments.
aws chrome-extension cicd data-preprocessing docker flask github-actions lightgbm machine-learning mlflow natural-language-processing optuna sentiment-analysis smote-oversampler tfidf youtube
Last synced: 04 Oct 2025
https://github.com/daedalus/distiller
Model distiller automator
ai bloom-filter huggingface large-language-models model-distillation ngrams openai scikit-learn sqlite tfidf torch transformers unsloth
Last synced: 17 Jul 2025
https://github.com/shimaa83/twitter_disaster
Twitter classification using classic ML models
cat-boast light-gm naive-bayes-classifier nlp random-forest tfidf word-cloud
Last synced: 28 Jul 2025
https://github.com/inscapist/gensim-similarity-task
Use Gensim's TFIDF model to compute document similarity
Last synced: 07 Nov 2025
https://github.com/ummtushar/domain-analysis-nlp-thesis
Source Code Analysis of Jupyter Notebooks using Natural Language Processing
bert bow cosine-similarity gensim glove huggingface-transformers jupyter-notebook lda nlp scipy softmax-regression tfidf vector-embeddings word2vec
Last synced: 30 Jul 2025
https://github.com/abhibisht89/tfidf_example
Simple way to get the top features from text using TFIDF
analytics feature-extraction python text tfidf
Last synced: 31 Jul 2025
https://github.com/hamedzarei/simple-tfidf
Information retrieval - search in 70 documents on different topics
information-retrieval nlp python tfidf
Last synced: 31 Jul 2025
https://github.com/0xkibh/simple-nlp
A simple NLP clustering program to cluster the text using TF-IDF and Word2Vec as feature extraction and K-Means Clustering as an algorithm
gensim kmeans-clustering nlp pandas python tfidf word2vec
Last synced: 31 Jul 2025
https://github.com/sayamalt/text-similarity-quantifier
A machine learning model for computing the similarity score between two text paragraphs taken as input from a webpage.
bag-of-words cosine-similarity cosine-similarity-scores countvectorizer flask machine-learning nlp pandas python text-preprocessing tfidf
Last synced: 09 Nov 2025
https://github.com/armanjscript/rag-driven-generative-ai
Generative AI has made remarkable strides in creating human-like text, images, and even code. However, traditional models like GPT rely solely on pre-trained knowledge, which can lead to outdated, inaccurate, or hallucinated responses. Retrieval-Augmented Generation (RAG) addresses these limitations. We offer various types of RAG here
cosine-similarity langchain langchain-ollama qwen2-5 spacy spacy-nlp tfidf tfidf-vectorizer wordnet
Last synced: 17 Aug 2025
https://github.com/kathrin-92/unsupervised-ml-trends-in-science-dlbdsmlusl01
Analyzing trends in scientific publications through NLP, including clustering research articles and identifying overarching subjects within the data.
kmeans-clustering nlp nlp-keywords-extraction pca text-analysis tfidf topic-modeling unsupervised-machine-learning
Last synced: 15 Jul 2025
https://github.com/blockfeed/ai-playlist-heuristic
A tongue-in-cheek 'AI' playlist generator: TF-IDF + tempo/heuristics. Offline, reproducible.
arch-linux audio m3u music playlist python rockbox tfidf xspf
Last synced: 31 Aug 2025
https://github.com/meinhere/news-clasification
Classification of online news from KOMPAS for the Web Search and Mining course (Pencarian dan Penambangan Web) using the Logistic Regression method
logistic-regression python streamlit tfidf vsm
Last synced: 14 Apr 2025
https://github.com/singhxtushar/bow-tfidf-spambuster
This project is an SMS spam classifier that detects whether an SMS is spam or ham using the multinomial Naive Bayes algorithm alongside BOW/TF-IDF in NLP
bow naive-bayes nlp sms-classification tfidf
Last synced: 12 Nov 2025
https://github.com/vickshan001/friends-character-classifier-vector-semantics-nlp
NLP coursework using vector space semantics to classify Friends character dialogue. Includes TF-IDF, POS, sentiment, and context-aware features.
distributional-semantics document-classification friends-tv-show nlp pos-tagging python sentiment-analysis tfidf vector-space-model
Last synced: 31 Aug 2025
https://github.com/seekai-786/resume-analyzer
Resume Analyzer is a prototype web application that allows users to upload multiple resumes and compare them against a job description using vectorization and cosine similarity. The project is built using Python, Flask, and scikit-learn.
backend-development css document-vectorization flask flask-app html javascript job-matching machine-learning ml nlp nlp-project osine-similarity python pythonanywhere resume-analyzer resume-matching resume-screening-app sckiit-learn tfidf
Last synced: 08 Aug 2025
https://github.com/hackerslash/dsarch
A search engine for Data Structure and Algorithm problems
data-structures dsa leetcode search-engine tfidf
Last synced: 16 May 2025
https://github.com/joaooliveirapro/indexergo
IndexerGo 🔎 is a Go-based application designed to analyse and index HTML documents for efficient content search and ranking (using TF-IDF algorithm). It provides detailed insights into document structure and text content.
go golang indexing text-analysis tfidf
Last synced: 03 Mar 2025
https://github.com/minhosong88/corpusreader_tdidf
CorpusReader_TFIDF is a custom Python class designed to calculate TF-IDF (Term Frequency-Inverse Document Frequency) for documents in a corpus. It was developed as part of an assignment for the Introduction to Natural La
Last synced: 27 Dec 2025
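Many of the projects above describe the same basic pipeline: vectorize documents with TF-IDF, then compare them with cosine similarity. The following is a minimal, standard-library-only sketch of that idea, not the implementation used by any listed repository. The function names `tfidf_vectors` and `cosine`, the whitespace tokenizer, and the smoothed IDF variant `log((1 + N) / (1 + df)) + 1` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the rug",
        "stock prices fell sharply",
    ]
    vecs = tfidf_vectors(docs)
    # Documents that share vocabulary should score higher than unrelated ones.
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Real projects typically add proper tokenization, stop-word removal, and sparse matrix storage on top of this, so treat the sketch only as a reading aid for the descriptions in the list.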