An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with tfidf

A curated list of projects in awesome lists tagged with tfidf.

https://github.com/paulmcinnis/jobfunnel

Scrape job websites into a single spreadsheet with no duplicates.

automated beautifulsoup beautifulsoup4 csv glassdoor indeed international job jobs monster python scraper search tfidf waterloo yaml

Last synced: 14 May 2025

https://github.com/NISH1001/tag-generator

A simple tool to generate tags for the given text (document) using TF-IDF.

nlp tagging tf-idf tfidf

Last synced: 05 Apr 2025

https://github.com/97k/spam-ham-web-app

A web app that classifies text as spam or ham. It uses my own ML algorithm in the backend; the code for it can be found under machine_learning_section. For a live demo, check out this link.

bag-of-words data-visualization django heroku-deployment jupyter-notebook machine-learning machine-learning-projects multinomial-naive-bayes nlp nltk spam-classification text-classification tfidf

Last synced: 27 Apr 2025

https://github.com/jldbc/gutenberg

A content-based recommender system for books using the Project Gutenberg text corpus

gutenberg knn pyspark recommender tfidf

Last synced: 07 May 2025

https://github.com/goldbattle/mangadexrecomendations

Finding recommendations between them all. Work in progress.

manga manga-recommendations mangadex neko recommendation-algorithm tfidf

Last synced: 10 Apr 2025

https://github.com/brunoarine/findlike

Command-line tool that finds lexically similar documents in relation to a reference text file or ad-hoc query

bm25 nlp similarity-search tfidf

Last synced: 18 Jul 2025
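findlike lists BM25 among its methods. As a rough illustration of how BM25 ranks documents lexically against a query, here is a minimal Python sketch of Okapi BM25. It is not findlike's code: the whitespace tokenizer, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Okapi BM25 with the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1).
    k1 and b are common defaults, not values taken from any particular tool.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "dogs and cats living together".split(),
    "the quick brown fox".split(),
]
scores = bm25_scores("cat mat".split(), docs)
```

Unlike plain TF-IDF, the `k1` term makes repeated occurrences of a word give diminishing returns, and `b` controls how strongly long documents are penalized.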

https://github.com/aquatiko/sentiment-analysis-tfidf-vectorizer-method

Sentiment analysis of movie reviews using sklearn's naive Bayes classifier and TF-IDF word vectorizer.

confusion-matrix movie-reviews naive-bayes-classifier sentiment-analysis sklearn-vectorizer tfidf wordnet

Last synced: 10 Apr 2025

https://github.com/wittline/tf-idf

Term Frequency-Inverse Document Frequency from Scratch

feature-engineering python text-analytics tfidf

Last synced: 13 Apr 2025
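A minimal from-scratch sketch of the TF-IDF weighting itself (not wittline's code), assuming length-normalized term frequency and the unsmoothed IDF `ln(N / df)`. Libraries differ on these choices; scikit-learn's `TfidfVectorizer`, for example, uses a smoothed IDF by default.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is how many documents contain t
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
w = tfidf(corpus)
# 'cat' occurs only in document 0, so it gets a positive weight there;
# 'the' occurs in every document, so its IDF (and weight) is 0.
```

Terms that appear everywhere are driven to zero, which is why TF-IDF surfaces words that are distinctive for a document rather than merely frequent.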

https://github.com/zengfr/svm-neuro-matching

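SVM Neuro Matching: C# machine learning, LibSVM support vector machines, neural networks, matching, and Chinese text word segmentation, classification, and clustering.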

accrod aforge csharp hotel java learning libsvm matching neuro nlp room svm svm-neuro-matching tfidf zengfr

Last synced: 11 Apr 2025

https://github.com/nikhiljsk/preprocess_nlp

A fast framework for pre-processing (Cleaning text, Reduction of vocabulary, Feature extraction and Vectorization). Implemented with parallel processing using custom number of processes.

cleaning-data feature-extraction glove natural-language-processing nlp parallel-processing preprocess python3 reduction spacy stages tfidf vectorization word2vec

Last synced: 12 Apr 2025

https://github.com/byukan/chatbots-nlp

Chatbots and other NLP applications: Topic Modeling on text from Codechef and OkCupid

lda machine-learning nlp nmf tfidf topic-modeling

Last synced: 15 Jul 2025

https://github.com/raphaelsty/cherche-api

Deploy Cherche using FastAPI and Docker

bm25 docker fastapi neural-search question-answering summarization tfidf

Last synced: 25 Oct 2025

https://github.com/dbozhinovski/relatinator

A humble library for finding related posts and content. Uses tf-idf and BM25 under the hood. Primarily aimed at static site generators.

astro bm25 related-posts static-site tfidf

Last synced: 10 Apr 2025

https://github.com/laertispappas/mapreduce_python

TF-IDF algorithm on Hadoop (Python)

hadoop python tfidf

Last synced: 10 Aug 2025
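On Hadoop, TF-IDF is usually split into jobs that each group records by a key. The sketch below simulates two such jobs in memory with plain Python rather than running on Hadoop; the job split, the helper names `run_job` and `tfidf_mapreduce`, and the use of raw counts as TF are assumptions for illustration, not taken from the repository.

```python
import math
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    output = []
    for key in sorted(groups):  # reducers receive keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


def tfidf_mapreduce(docs):
    n_docs = len(docs)

    # Job 1: raw term counts per (term, doc_id).
    def count_mapper(doc_id, text):
        for term in text.split():
            yield (term, doc_id), 1

    def count_reducer(key, ones):
        yield key, sum(ones)

    # Job 2: regroup by term so one reducer call sees every document containing it.
    def df_mapper(key, count):
        term, doc_id = key
        yield term, (doc_id, count)

    def tfidf_reducer(term, postings):
        idf = math.log(n_docs / len(postings))  # len(postings) is the document frequency
        for doc_id, count in postings:
            yield (term, doc_id), count * idf

    counts = run_job(docs.items(), count_mapper, count_reducer)
    return dict(run_job(counts, df_mapper, tfidf_reducer))


docs = {"d1": "big data big ideas", "d2": "small data"}
scores = tfidf_mapreduce(docs)
```

The key design point is the second job's regrouping by term: document frequency is only known once all postings for a term meet at the same reducer.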

https://github.com/i-umairkhan/news-ranking-tool

NLP project to extract relevant data from the DAWN news dataset.

nlp tfidf

Last synced: 04 Jan 2026

https://github.com/avannaldas/emailsclassification

Classification of emails received on a mass distribution group

countvectorizer email-classifier scikit-learn sklearn text-classification tfidf

Last synced: 01 Jul 2025

https://github.com/germabyte/obsidian-deduper

The Obsidian Duplicate Finder is a user-friendly tool designed to help users manage duplicate files within their Obsidian vaults. It identifies similar Markdown files based on their content, making it easier to organize and declutter note-taking environments.

cosine-similarity deduplication duplicate-files markdown notes obsidian python tfidf tkinter vault

Last synced: 02 Mar 2025

https://github.com/mwoss/mors

Application of topic models for information retrieval and search engine optimization.

common-crawl crawler django doc2vec gensim hacktoberfest lda python scrapy search search-engine tfidf

Last synced: 13 Jun 2025

https://github.com/amirhosseinhonardoust/fake-review-detector

An AI-powered Fake Review Detector built with Python, Streamlit, and Scikit-learn. Uses TF-IDF vectorization, Logistic Regression, and behavioral text analytics (sentiment, exclamations, clichés) to identify synthetic or spammy product reviews. Includes training scripts and a full interactive dashboard.

ai-project dashboard data-science data-visualization fake-review-detection logistic-regression machine-learning natural-language-processing nlp python sentiment-analysis sklearn streamlit text-classification tfidf

Last synced: 06 Nov 2025

https://github.com/hanifhefaz/elm-tf-idf

Elm implementation of Term Frequency-Inverse Document Frequency (TF-IDF) for text analysis

tf-idf tfidf tfidf-text-analysis

Last synced: 04 Jul 2025

https://github.com/singhmnprt01/nlp-and-pytorch

NLP use cases using popular solutions: frequency embeddings, word embeddings (word2vec, doc2vec, GloVe), RNN, LSTM, Transformers (BERT), Sentence Transformers, etc., in PyTorch.

doc2vec nlp-machine-learning sentence-transformers tfidf word2vec

Last synced: 29 Oct 2025

https://github.com/notshrirang/lyrics-analysis-and-music-recommendation-with-pair-similarities

Song recommendation system for filmmakers and music makers. It uses Term Frequency-Inverse Document Frequency (TF-IDF) to vectorize song lyrics and then cosine similarity to calculate the similarity between songs, recommending songs with similar lyrics.

cosine-similarity nlp spotify tfidf

Last synced: 05 Apr 2025
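The pipeline described above (TF-IDF vectors of the lyrics, then cosine similarity) ends in a nearest-neighbour lookup. A minimal sketch of that last step, assuming each song is already represented as a sparse `{term: weight}` dict; the song names and weights are hypothetical, and this is not the project's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def recommend(song, vectors, k=2):
    """Return the k other songs whose vectors are most similar to `song`'s."""
    others = [name for name in vectors if name != song]
    others.sort(key=lambda name: cosine(vectors[song], vectors[name]), reverse=True)
    return others[:k]


# Hypothetical TF-IDF weights for three songs' lyrics.
vectors = {
    "song_a": {"love": 0.8, "night": 0.3},
    "song_b": {"love": 0.7, "heart": 0.4},
    "song_c": {"road": 0.9, "night": 0.1},
}
top = recommend("song_a", vectors, k=1)
```

Cosine similarity ignores vector length, so a long song and a short song with the same word proportions still count as similar.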

https://github.com/jhaayush2004/hybrid-retrieval-systems

Hybrid Retrieval System combining keyword matching (BM25) with semantic similarity (Vectorstore) for improved retrieval.

bm25-okapi chromadb huggingface-pipeline langchain python rag tfidf

Last synced: 13 Jul 2025
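The description does not say how the BM25 and vector-store results are combined. One common approach is reciprocal rank fusion (RRF); the sketch below assumes RRF with the frequently used constant `k=60`, and the document ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Documents missing from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical outputs from a keyword (BM25) retriever and a vector-store retriever.
bm25_ranking = ["doc3", "doc1", "doc7"]
vector_ranking = ["doc1", "doc5", "doc3"]
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

RRF only uses ranks, so it sidesteps the problem that BM25 scores and embedding similarities live on different, incomparable scales.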

https://github.com/recker-dev/exploring-nlp

This repo explores various approaches to sentiment analysis using the Amazon Customer Review dataset.

bagofwords distilbert huggingface machine-learning nlp-machine-learning tfidf word2vec

Last synced: 16 Jun 2025

https://github.com/rochimfn/question-answering-konstitusi

Indonesia Constitution Question Answering System (Telegram Bot, Streamlit Page, and HTTP API)

gensim-doc2vec gensim-word2vec indonesia tfidf

Last synced: 26 Jun 2025

https://github.com/agnjason/tf-idf

Implements TF-IDF and applies it to a dataset.

python tfidf

Last synced: 08 Jul 2025

https://github.com/asthavashisth/resume-screening-system

A simple AI-powered web app to automatically categorize resumes, recommend suitable job roles, and extract key details like skills, email, and contact number using NLP techniques.

css flask html jinja2 nlp-machine-learning pycharm-ide python random-forest tfidf

Last synced: 30 Dec 2025

https://github.com/codeasarjun/docbuddy

DocBuddy is a Flask web app that lets users upload and interact with PDF files by summarizing content, suggesting keywords, and providing a basic Q&A feature, all through an intuitive interface.

abstrative-text-summarization documentation-tool end-to-end-machine-learning end-to-end-project extractive-question-answering extractive-summarization natural-language-processing natural-language-understanding naturallanguageprocessing nlp nlp-machine-learning question-answering summarization textsummarization tfidf

Last synced: 03 Jan 2026

https://github.com/alessandromonolo/descriptive-texts-classification-by-usage-purposes-of-estate-properties

The project aims to identify the best model for classifying texts derived from descriptions of assets subject to Italian judicial auctions. The models employed include both conventional models, such as Logistic Regression, Naive Bayes, SVM, and XGBoost, and neural network models, such as fastText and XLM-RoBERTa.

fasttext logistic-regression naive-bayes nlp python pytorch scikit-learn seaborn spacy svm text-classification tfidf tokenizer xgboost xlm-roberta

Last synced: 30 Dec 2025

https://github.com/geekquad/text-learning

Basic usage of NLTK, with implementations of concepts such as stemmers, TF-IDF, and text.CountVectors.

corpus countvectorizer nltk sklearn stopwords tfidf

Last synced: 24 Feb 2025

https://github.com/anishlearnstocode/bow-representation

Different bag-of-words representations in NLP, such as one-hot vectors, TF (term frequency), and TF-IDF.

natural-language-processing nlp one-hot-vector term-frequency tf tfidf

Last synced: 18 Mar 2025

https://github.com/jash271/youglance

Package for analyzing YouTube videos: from searching by relevant entities to analyzing sentiment and clustering different parts of the video according to your liking.

cosine-similarity named-entity-recognition ner nlp nltk python sentiment-analysis spacy tfidf topic-modeling

Last synced: 04 Jan 2026

https://github.com/antonio-f/multilabel-classification

Predict tags on StackOverflow with linear models - Week 1 assignment of Coursera's Natural Language Processing course from the Advanced Machine Learning Specialization.

bag-of-words logistic-regression multilabel-classification nltk-library one-vs-rest sklearn-library tfidf tfidf-vectorizer

Last synced: 30 Mar 2025

https://github.com/ffreemt/similarity-matrix

Similarity matrix based on doc-term-scores from textacy

bm25 nlp textacy tfidf

Last synced: 15 Mar 2025

https://github.com/anuraganalog/bullshit-detector

Trying a Different Approach on Fake News Detection

count dataset detection ensemble fake learning liar news nlp python3 sklearn tfidf vectorizer

Last synced: 12 Mar 2025

https://github.com/ychaaby/real-time-coursecompass

A real-time course recommendation system powered by Apache Spark and Kafka for scalable big data processing. It uses content-based filtering and AI-generated keywords to deliver personalized learning suggestions, all orchestrated with Docker for seamless deployment.

data-engineering docker gemini kafka mllib oracle pyspark recommender-system selenium spark-streaming streaming streamlit tfidf

Last synced: 09 Oct 2025

https://github.com/williamcorsel/berteval

Compare BERT-based models for document-level sentiment analysis using the SemEval 2017 Twitter dataset.

bert-model semeval sentiment-analysis tfidf twitter

Last synced: 08 Apr 2025

https://github.com/rubenhari/data-science-projects

A collection of my data science projects

data-science python tfidf

Last synced: 09 Jul 2025

https://github.com/antoinewg/ocr-tfidf

TF-IDF with Hadoop Streaming

hadoop-streaming mapreduce ocr tfidf

Last synced: 09 Apr 2025

https://github.com/sambhu431/medicine-recommendation-system

The project aims to recommend medicines based on product uses similarity, side effects, and product review weightages. Powered by NLP techniques like TF-IDF and Cosine Similarity, the system provides intelligent and user-centric recommendations.

cosine-similarity flask machine-learning medicine medicine-recommendation medicine-search pickle recommendation-system tfidf tfidf-vectorizer

Last synced: 09 Apr 2025

https://github.com/rajputpritesh1/cyberguard

This project detects cyberbullying in Bhojpuri user-generated content using Logistic Regression and TF-IDF features.

bhojpuri cyberbullying-detection flask logistic-regression machine-learning tfidf

Last synced: 24 Apr 2025

https://github.com/oyebamiji-micheal/quora-insincere-questions-classification-using-tf-idf

A web app that classifies whether a given Quora question is sincere or insincere using TF-IDF: a beginner's approach to NLP.

classification nlp quora-questions tfidf xgboost

Last synced: 11 Aug 2025

https://github.com/zaaim-halim/java_multilanguage_searchengine_tfidf_based

Java implementation of a multilanguage search engine based on the TF-IDF approach.

arabic-nlp search-engine tfidf

Last synced: 16 Mar 2025

https://github.com/asaficontact/stack_classifier_project

We classified Stack Overflow Python questions from 2008-2016 with Natural Language Processing and Deep Learning. Using regular expressions, we removed HTML tags and punctuation. We also used spaCy to tokenize, lemmatize, and remove stop words. Using Keras, we built a 4-layer artificial neural network with a 20% dropout rate, using ReLU and softmax activation functions. We also used the Adam optimizer and a categorical cross-entropy loss function, and the model classified 11 tags with 88% success.

cross-entropy-loss deep-learning deep-neural-networks keras lemmatization neural-networks object-oriented-programming pandas python3 regular-expressions relu sklearn spacy spacy-nlp stackoverflow tfidf tokenization

Last synced: 08 Apr 2025

https://github.com/rid17pawar/sentiment-analysis-model-experiments

Experiments in sentiment analysis using ML algorithms, namely Logistic Regression and Naive Bayes, along with TF-IDF, one-hot encoding, and bag-of-words vectorization; different MLP and RNN models, viz. LSTM, GRU, and Bidirectional LSTM; and, lastly, the state-of-the-art BERT model.

bag-of-words bert bidirectional-lstm gru logistic-regression lstm ml-algorithms naive-bayes neural-networks one-hot-encoding rnn sentiment-analysis sentiment-classification text-vectorization tfidf tfidf-vectorizer transformer-architecture twitter-sentiment-analysis

Last synced: 12 Dec 2025

https://github.com/yugalsoni18/counterfeit_review_detection

Fake review detection using TF-IDF & SVM (AUC 0.98), plus Counterfeit Risk Score with clustering & anomaly detection.

business-analytics fraud-detection isolation-forest kmeans nlp python risk-scoring scikit-learn svm tfidf

Last synced: 05 Oct 2025

https://github.com/amabna/quran-verse-similarity

A simple NLP tool to find conceptually similar Quranic verses. Uses Selenium to scrape English verse texts from clearquran.com, applies TF-IDF and CountVectorizer for similarity analysis, and displays top 5 similar verses via a Tkinter GUI.

ai deep lemmatization machine-learning natural-language-processing nlp quran sickit-learn sklearn tfidf

Last synced: 07 Oct 2025

https://github.com/das-debjit/emotion-detection

A simple ML-powered web app for real-time emotion detection from text using Streamlit and TF-IDF-based classification.

machine-learning nlp python scikit-learn sentiment-analysis streamlit text-classification tfidf web-app

Last synced: 25 Oct 2025

https://github.com/mamiglia/adm_hw_3

Algorithms for Data Mining 2022 - Homework 3 - Group 7

homework search-engine tfidf

Last synced: 11 Mar 2025

https://github.com/fusi3/natural_language_coursework

Assessing the impact of different pre-processing techniques for classifying the sentiment of movie reviews

bag-of-words latent-semantic-analysis lemmatization multilayer-perceptron nlp sentiment-analysis stemming support-vector-machines tfidf

Last synced: 18 Mar 2025

https://github.com/davidulloa6310/search-engine

Search engine implemented in Golang using Postgres for storing documents and computing TF-IDF values

go golang nlp postgres tfidf

Last synced: 13 Apr 2025

https://github.com/satyavardhan2k4/legal-outcome-classifier

A simple legal case outcome prediction project using Logistic Regression, NLP and TF-IDF vectorization on case facts text data. Includes model training, evaluation, and feature importance visualizations to interpret influential words impacting the prediction.

classification legal-analytics logistic-regression machine-learning nlp pandas python tfidf

Last synced: 26 Jun 2025

https://github.com/btc/flavor

Inspiring Culinary Creativity 🍃 Flavor Search on iOS

cooking information-retrieval ios parser-combinators rust search swift tfidf

Last synced: 27 Jun 2025

https://github.com/vasgat/company-mapping

Mapping incoming companies based on a given Corpus (of Companies).

data-integration tfidf

Last synced: 29 Mar 2025

https://github.com/tahirzia-1/nlp-textclassify

A hands-on NLP project comparing classic ML models (Naïve Bayes, SVM, Logistic Regression) and ANNs for text classification using SMS Spam and 20 Newsgroups datasets.

adam-optimizer ann cbow deep-learning lemmatization logistic-regression naive-bayes-classifier nlp nlp-machine-learning skipgram-algorithm svm tensorflow tfidf tfidf-vectorizer tokenization vectorization word2vec

Last synced: 14 Sep 2025

https://github.com/huangjunxin/lite-sklearn-tfidfvectorizer

A Lite Implementation of sklearn TfidfVectorizer

implementation sklearn tfidf tfidfvectorizer

Last synced: 06 Apr 2025

https://github.com/pedrofracassi/insper-nlp-relevance-search

Search for posts on Bluesky, using TF-IDF to rank the relevance of results.

tfidf tfidf-vectorizer

Last synced: 27 Mar 2025

https://github.com/brej-29/disaster-tweets-nlp-model-benchmarks

Benchmark NLP models on Kaggle “Disaster Tweets”: TF-IDF + Naive Bayes baseline, Keras deep nets (Dense/LSTM/GRU/BiRNN/Conv1D), and TensorFlow Hub Universal Sentence Encoder transfer learning—compared using accuracy, precision, recall, and F1.

bidirectional-rnn cnn conv1d deep-learning disaster-tweets gru kaggle keras lstm machine-learning naive-bayes nlp rnn scikit-learn tensorflow tensorflow-hub text-classification tfidf

Last synced: 30 Dec 2025

https://github.com/prajwalsde/fake-news-detection

A Machine Learning-based Fake News Detection system using TF-IDF and Logistic Regression, with a Streamlit app for real-time predictions.

data-science fake-news-detection machine-learning news-classifier nlp python sklearn streamlit text-classification tfidf

Last synced: 28 Jun 2025

https://github.com/akshat48002/youtube-sentiment-analysis

A complete end-to-end Machine Learning + MLOps project that analyzes YouTube video comments and classifies them into Positive, Negative, or Neutral sentiments.

aws chrome-extension cicd data-preprocessing docker flask github-actions lightgbm machine-learning mlflow natural-language-processing optuna sentiment-analysis smote-oversampler tfidf youtube

Last synced: 04 Oct 2025

https://github.com/shimaa83/twitter_disaster

Twitter classification using classic ML models.

cat-boast light-gm naive-bayes-classifier nlp random-forest tfidf word-cloud

Last synced: 28 Jul 2025

https://github.com/inscapist/gensim-similarity-task

Use Gensim's TFIDF model to compute document similarity

gensim similarity-score tfidf

Last synced: 07 Nov 2025

https://github.com/abhibisht89/tfidf_example

Simple way to get the top features from text using TFIDF

analytics feature-extraction python text tfidf

Last synced: 31 Jul 2025

https://github.com/hamedzarei/simple-tfidf

Information retrieval: search across 70 documents on different topics.

information-retrieval nlp python tfidf

Last synced: 31 Jul 2025

https://github.com/0xkibh/simple-nlp

A simple NLP clustering program to cluster the text using TF-IDF and Word2Vec as feature extraction and K-Means Clustering as an algorithm

gensim kmeans-clustering nlp pandas python tfidf word2vec

Last synced: 31 Jul 2025

https://github.com/sayamalt/text-similarity-quantifier

Successfully developed a machine learning model for computing the similarity score between two text paragraphs taken as input from a webpage.

bag-of-words cosine-similarity cosine-similarity-scores countvectorizer flask machine-learning nlp pandas python text-preprocessing tfidf

Last synced: 09 Nov 2025

https://github.com/armanjscript/rag-driven-generative-ai

Generative AI has made remarkable strides in creating human-like text, images, and even code. However, traditional models like GPT rely solely on pre-trained knowledge, which can lead to outdated, inaccurate, or hallucinated responses. Retrieval-Augmented Generation (RAG) addresses these limitations. We offer various types of RAG here.

cosine-similarity langchain langchain-ollama qwen2-5 spacy spacy-nlp tfidf tfidf-vectorizer wordnet

Last synced: 17 Aug 2025

https://github.com/kathrin-92/unsupervised-ml-trends-in-science-dlbdsmlusl01

Analyzing trends in scientific publications through NLP, including clustering research articles and identifying overarching subjects within the data.

kmeans-clustering nlp nlp-keywords-extraction pca text-analysis tfidf topic-modeling unsupervised-machine-learning

Last synced: 15 Jul 2025

https://github.com/blockfeed/ai-playlist-heuristic

A tongue-in-cheek 'AI' playlist generator: TF-IDF + tempo/heuristics. Offline, reproducible.

arch-linux audio m3u music playlist python rockbox tfidf xspf

Last synced: 31 Aug 2025

https://github.com/meinhere/news-clasification

Classification of online news from KOMPAS for the Web Search and Mining course, using the Logistic Regression method.

logistic-regression python streamlit tfidf vsm

Last synced: 14 Apr 2025

https://github.com/singhxtushar/bow-tfidf-spambuster

This project is an SMS spam classifier that detects whether an SMS is spam or ham using the multinomial Naive Bayes algorithm alongside BOW/TF-IDF features.

bow naive-bayes nlp sms-classification tfidf

Last synced: 12 Nov 2025
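A minimal sketch of a multinomial Naive Bayes spam/ham classifier over bag-of-words counts, with add-one (Laplace) smoothing. This is an illustration, not the project's code; the tiny training set is made up, and TF-IDF weights could stand in for the raw counts as features.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for t in texts for w in t.split()}
        self.log_prior, self.word_counts, self.total_words = {}, {}, {}
        for c in self.classes:
            class_texts = [t for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log(len(class_texts) / len(texts))
            counts = Counter(w for t in class_texts for w in t.split())
            self.word_counts[c] = counts
            self.total_words[c] = sum(counts.values())
        return self

    def predict(self, text):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]  # work in log space to avoid underflow
            for w in text.split():
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # P(w | c) with add-one smoothing over the vocabulary.
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


texts = ["win cash prize now", "free prize claim now", "lunch at noon", "see you at lunch"]
labels = ["spam", "spam", "ham", "ham"]
model = MultinomialNB().fit(texts, labels)
```

Smoothing matters here: without the `+ 1`, a single word unseen in one class would zero out that class's probability entirely.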

https://github.com/vickshan001/friends-character-classifier-vector-semantics-nlp

NLP coursework using vector space semantics to classify Friends character dialogue. Includes TF-IDF, POS, sentiment, and context-aware features.

distributional-semantics document-classification friends-tv-show nlp pos-tagging python sentiment-analysis tfidf vector-space-model

Last synced: 31 Aug 2025

https://github.com/seekai-786/resume-analyzer

Resume Analyzer is a prototype web application that allows users to upload multiple resumes and compare them against a job description using vectorization and cosine similarity. The project is built using Python, Flask, and scikit-learn.

backend-development css document-vectorization flask flask-app html javascript job-matching machine-learning ml nlp nlp-project osine-similarity python pythonanywhere resume-analyzer resume-matching resume-screening-app sckiit-learn tfidf

Last synced: 08 Aug 2025

https://github.com/hackerslash/dsarch

A search engine for Data Structure and Algorithm problems

data-structures dsa leetcode search-engine tfidf

Last synced: 16 May 2025

https://github.com/joaooliveirapro/indexergo

IndexerGo 🔎 is a Go-based application designed to analyse and index HTML documents for efficient content search and ranking (using TF-IDF algorithm). It provides detailed insights into document structure and text content.

go golang indexing text-analysis tfidf

Last synced: 03 Mar 2025
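A minimal sketch of an inverted index ranked by TF-IDF, written in Python rather than Go. It assumes plain text has already been extracted from the HTML and uses a simple regex tokenizer; the `InvertedIndex` class is hypothetical and not IndexerGo's API. The point of the inverted index is that a query only touches documents whose posting lists contain the query terms.

```python
import math
import re
from collections import Counter, defaultdict


class InvertedIndex:
    """Hypothetical in-memory inverted index with TF-IDF ranking."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term count}
        self.doc_lengths = {}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        tokens = self.tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def search(self, query, k=3):
        n_docs = len(self.doc_lengths)
        scores = Counter()
        for term in self.tokenize(query):
            docs = self.postings.get(term, {})
            if not docs:
                continue
            idf = math.log(n_docs / len(docs))
            # Only documents in this term's posting list are scored.
            for doc_id, count in docs.items():
                scores[doc_id] += (count / self.doc_lengths[doc_id]) * idf
        return [doc_id for doc_id, _ in scores.most_common(k)]


index = InvertedIndex()
index.add("a.html", "Go makes concurrency simple")
index.add("b.html", "Python makes scripting simple")
index.add("c.html", "Concurrency in Go with goroutines")
results = index.search("go concurrency")
```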

https://github.com/minhosong88/corpusreader_tdidf

CorpusReader_TFIDF is a custom Python class designed to calculate TF-IDF (Term Frequency-Inverse Document Frequency) for documents in a corpus. It was developed as part of an assignment for the Introduction to Natural La

corpus nlp tfidf

Last synced: 27 Dec 2025