Projects in Awesome Lists tagged with tf-idf-vectorizer
A curated list of projects in awesome lists tagged with tf-idf-vectorizer.
https://github.com/jalajthanaki/basic_ecommerce_recomendation_system
This repository contains the code for a basic e-commerce recommendation engine, built using TF-IDF and cosine similarity.
cosine-similarity recommendation-system tf-idf-vectorizer
Last synced: 09 Apr 2025
https://github.com/kodiks/turkish-news-classification
Turkish News Category Classification Tutorial
datasets huggingface machine-learning news-classification nlp svm-classifier text-classification tf-idf-vectorizer turkish-nlp
Last synced: 05 Feb 2026
https://github.com/iamkankan/natural-language-processing-nlp-tutorial
NLP tutorials and guidelines to learn efficiently
bigrams bow cbow glove lemmatization one-hot-encoding stemming stopwords tf-idf-vectorizer tokenization unigram word-embeddings word2vec
Last synced: 08 Jan 2026
https://github.com/jalajthanaki/medical_notes_extractive_summarization
Extractive summarization of medical transcriptions
extractive-text-summarization medical-data ranking-algorithm summarization tf-idf-vectorizer
Last synced: 09 Apr 2025
https://github.com/nivesayee/recipe-genie
Recipe Genie is a recipe recommendation system that recommends recipes to users based on the ingredients they have at home.
cosine-similarity cosine-similarity-scores recipe recipe-recommendation recipe-recommender recipe-search recipes recommendation-engine recommendation-system recommender-system tf-idf tf-idf-vectorizer
Last synced: 31 Oct 2025
https://github.com/kvarun07/ecom-product-classifier
Text Classification: Predicting product categories from their text descriptions.
count-vectorizer logistic-regression naive-bayes-classifier natural-language-processing neural-network support-vector-machine text-classification tf-idf-vectorizer
Last synced: 10 Apr 2025
https://github.com/kanchanraiii/context-aware-bot
Pseudo RAG Based Bot
chatbot gemini-api streamlit tf-idf-vectorizer
Last synced: 15 May 2026
https://github.com/rasti37/most-similar-string-to-given-query
In this project I am using the TF-IDF algorithm and cosine similarity to find the similarity of two strings.
cosine-similarity cosine-similarity-scores document-frequency idf inverse-document-frequency query string-similarity term-frequency tf tf-idf tf-idf-vectorizer
Last synced: 18 Mar 2025
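The TF-IDF plus cosine similarity approach named in the entry above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code from the listed repository. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`), the whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so every term keeps a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the candidate string closest to the query, with its score."""
    # The query is vectorized together with the candidates so that all
    # vectors share one vocabulary and one set of IDF weights.
    vectors = tfidf_vectors(candidates + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, vec) for vec in vectors[:-1]]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Calling `most_similar("cat on a mat", ["the cat sat on the mat", "dogs chase cats in the park"])` picks the first candidate, since the second shares no exact tokens with the query under this tokenizer.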
https://github.com/coderjolly/news-recommender
This is a news recommender system that uses beautiful-soup to scrape news articles, their categories and descriptions to create a data dump. It then uses word embedding techniques such as TF-IDF and word2vec for content-based news recommendation, and LightRF and LightFM to explore hybrid and collaborative-filtering-based recommender models.
collaborative-filtering lightrf nlp-machine-learning nltk-python recommendation-system recommender-system tf-idf-vectorizer
Last synced: 27 Mar 2025
https://github.com/armanjscript/fusion-rag
A powerful web-based application designed to answer questions based on the content of uploaded PDF documents. This project leverages the **Fusion-in-Decoder (FiD)** approach for **Retrieval-Augmented Generation (RAG)**, combining semantic similarity, technical term relevance, and recency to deliver accurate and contextually relevant responses.
chroma chromadb fusion-rag langchain langchain-ollama ollama pypdf qwen2-5 rag rag-chatbot scikit-learn streamlit tf-idf-score tf-idf-vectorizer vector-database
Last synced: 10 Apr 2026
https://github.com/tressos-aristomenis/most-similar-string-to-given-query
In this project I am using the TF-IDF algorithm and cosine similarity to find the similarity of two strings.
cosine-similarity cosine-similarity-scores document-frequency idf inverse-document-frequency query string-similarity term-frequency tf tf-idf tf-idf-vectorizer
Last synced: 24 Feb 2025
https://github.com/aarryasutar/hate_speech_detection
This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.
confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud
Last synced: 09 Apr 2026
https://github.com/lasithaamarasinghe/movie-recommender-system
This ML model recommends movies that may align with the user's preferences based on a TF-IDF matrix.
jupyter-notebook machine-learning movie-recommendation movielens-dataset numpy pandas python regex scikit-learn tf-idf-vectorizer
Last synced: 12 Apr 2026
https://github.com/varunkhurana07/ecom-product-classifier
Text Classification: Predicting product categories from their text descriptions.
count-vectorizer logistic-regression naive-bayes-classifier natural-language-processing neural-network support-vector-machine text-classification tf-idf-vectorizer
Last synced: 06 Jul 2025
https://github.com/steveee27/multiclass-text-classification-of-presidential-campaign-tweets
Explore the Indonesian presidential campaign of 2024 through advanced text classification. This project transforms tweets into insights on national resilience using cutting-edge machine learning models and text preprocessing techniques. Dive into the intersection of politics and data science!
data-science machine-learning nlp text-classification tf-idf-vectorizer twitter-analysis
Last synced: 09 Oct 2025
https://github.com/rohithgowdam/cyberbullying-classification
The project identifies the most accurate of several candidate models for detecting cyberbullying in text, training each on a dataset that is preprocessed and vectorized with TF-IDF.
classification cyberbullying-detection decision-trees logistic-regression machine-learning mlproject naive-bayes-classifier preprocessing random-forest tf-idf tf-idf-vectorizer tweets vectorization
Last synced: 10 Jun 2026
https://github.com/codeasarjun/moviemingle
Movie Recommendation System is a web application designed to provide personalized movie recommendations to users based on their input movie titles.
collaborative-filtering content-based-filtering content-based-recommendation machine-learning machine-learning-algorithms movie-recomendation-system movie-recommendation-app movie-web-app recommender-system similarity-search tf-idf-vectorizer tmdb-api
Last synced: 16 Mar 2025
https://github.com/ashithapallath/abusive_comment_detection_malayalam
This project detects abusive and non-abusive comments in the Malayalam language using the MuRIL BERT model and compares its performance with TF-IDF + SVM and XGBoost. MuRIL outperforms classical models.
classical-machine-learning muril nlp tf-idf-vectorizer
Last synced: 07 May 2025
https://github.com/codeasarjun/rewordify
Returns alternative sentences (paraphrases) for the provided inputs.
api end-to-end-machine-learning end-to-end-project nltk nltk-python paraphrase-generation python rest-api tf-idf-vectorizer
Last synced: 16 Mar 2025
https://github.com/armahdavi/nlp_document_tracking_construction_management
Summary of NLP work to automate construction management for non-compliance, punch list, and database creation.
bert bidirectional-lstm construction-management distill-bert glove-vectors nlp nlp-machine-learning recall-precision sklearn tf-idf-vectorizer word2vec
Last synced: 12 Jan 2026
https://github.com/prneidhardt/natural-language-processing
Twitter US Airline Sentiment
count-vectorizer sentiment-analysis text-processing tf-idf-vectorizer vectorization
Last synced: 03 Feb 2026
https://github.com/vlada-pv/prediction-sociolinguistic-data-based-on-the-diaries-texts-of-the-prozhito-project
The repository contains notebooks for collecting and preprocessing a corpus of diary entries and for experiments with models that predict the gender and age group of authors and the time period in which a text was written.
author-profiling bag-of-words bilstm convol convolutional-neural-networks deep-learning diary-entries logistic-regression naive-bayes-classifier neural-networks recurrent-neural-networks sociolinguistics text-preprocessing text-vectorization tf-idf-vectorizer word-embeddings
Last synced: 13 Jul 2025
https://github.com/roaajadaa/content-based-recommender-system
Build a content-based recommender system that suggests items to users based on their preferences (favorite products)
cosine-similarity fastapi pymongo tf-idf-vectorizer
Last synced: 30 Apr 2026
https://github.com/chaitanyac22/cross_platform_product_mapping_algorithm_for_products
This repository contains a product ID mapping solution using a TF-IDF vectorizer for weighted text vectors, Facebook AI Similarity Search (FAISS) for coarse filtering with cosine similarity, and Levenshtein distance for refined matching against the Blinkit catalog. It achieved an 11.45% match for Zepto and 11.48% for Instamart.
exploratory-data-analysis faiss levenshtein-distance nlp numpy pandas similarity-search tf-idf-vectorizer
Last synced: 20 Mar 2025
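The two-stage idea in the entry above (a coarse cosine-similarity filter followed by Levenshtein refinement) might look roughly like the sketch below. It is not the repository's code: cosine similarity over raw token counts stands in for TF-IDF vectors searched with FAISS, and the function names, the `top_k` parameter, and the product names in the usage note are hypothetical.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic program)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def count_cosine(a, b):
    """Cosine similarity over raw token counts (stand-in for TF-IDF weights)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def match_product(name, catalog, top_k=3):
    """Shortlist catalog entries by cosine similarity, then refine by edit distance."""
    # Stage 1: coarse filter, keep the top_k most similar entries.
    shortlist = sorted(catalog,
                       key=lambda item: count_cosine(name, item),
                       reverse=True)[:top_k]
    # Stage 2: refined match, pick the shortlist entry with the fewest edits.
    return min(shortlist, key=lambda item: levenshtein(name.lower(), item.lower()))
```

For example, `match_product("amul butter 100g", ["Amul Butter 100 g", "Amul Cheese Slices 200 g", "Tata Salt 1 kg"], top_k=2)` shortlists the two entries sharing tokens with the query and then selects `"Amul Butter 100 g"`, which is one edit away once both strings are lowercased.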
https://github.com/sayande01/fake_news_detection_logisticregression
This project detects fake news using Logistic Regression with NLP techniques, including NLTK stopword removal, Porter Stemmer for text normalization, and TF-IDF vectorization for feature extraction. It achieves high accuracy and precision, offering a reliable solution to combat misinformation.
logistic-regression nltk porter-stemmer stopwords tf-idf-vectorizer
Last synced: 06 Apr 2025
https://github.com/yash1th-yerra/simple-search-engine-tfidf
A Flask-based Search Engine that allows users to search for songs using lyrics snippets! This project demonstrates how to implement a basic text search functionality with TF-IDF Vectorization and Cosine Similarity for ranking results.
search-engine tf-idf-vectorizer vector-search-engine
Last synced: 06 Nov 2025
https://github.com/2003harsh/sms-spam-classifier
ML model for spam detection using Naive Bayes & TF-IDF. Achieved 0.98 accuracy. Utilized Scikit-learn, Numpy, nltk. Implements NLP concepts. Explore precise spam classification effortlessly. #MachineLearning #SpamDetection 🚀✉️📱
naive-bayes-classifier natural-language-processing tf-idf-vectorizer
Last synced: 09 Jun 2026
https://github.com/razamehar/sentiment-analysis-using-deep-learning---machine-learning
Sentiment analysis on the IMDB dataset using Bag of Words models (Unigram, Bigram, Trigram, Bigram with TF-IDF) and Sequence to Sequence models (one-hot vectors, word embeddings, pretrained embeddings like GloVe, and transformers with positional embeddings).
bag-of-words glove-embeddings imdb-dataset multinomial-naive-bayes one-hot-encoded-vectors python sentiment-analysis sequence-to-sequence-models tensorflow term-frequency-inverse-document-frequency tf-idf-vectorizer transformer-architecture word-embeddings
Last synced: 18 Apr 2026
https://github.com/sabin74/movie_recommendation_system
A Python-based movie recommendation engine built using the MovieLens Dataset that supports:
collaborative-filtering content-based-filtering cosine-similarity movie-lens movie-recomendation-system pyhton3 scikit-learn tf-idf-vectorizer
Last synced: 24 Apr 2026
https://github.com/nirmaldeepponnada/codeclauseinternshipproject1
This project involves Customer Segmentation using K-Means clustering to group customers based on Recency, Frequency, and Monetary (RFM) analysis from the Online Retail dataset. It also performs Sentiment Analysis on Amazon Product Reviews using Natural Language Processing techniques & Logistic Regression to classify reviews as positive or negative.
kmeans logistic-regression numpy pandas python3 regular-expressions scikit-learn tf-idf-vectorizer
Last synced: 11 Apr 2026
https://github.com/alaazameldev/text-based-search-engine
Implementation of a search engine using TF-IDF and Word Embedding-based vectorization techniques for efficient document retrieval
chromadb fastapi gensim-word2vec nltk numpy precision-recall python scikit-learn tf-idf-vectorizer
Last synced: 20 Jan 2026