An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with tf-idf-vectorizer

A curated list of projects in awesome lists tagged with tf-idf-vectorizer.

https://github.com/jalajthanaki/basic_ecommerce_recomendation_system

This repository contains the code for a basic e-commerce recommendation engine built using TF-IDF and cosine similarity.

cosine-similarity recommendation-system tf-idf-vectorizer

Last synced: 09 Apr 2025
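As a rough illustration of the TF-IDF plus cosine-similarity idea behind this kind of recommender, here is a minimal pure-Python sketch. It is not the repository's code. It assumes each product is described by a short text string, uses a simplified lowercase alphanumeric tokenizer, and applies the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, ln((1 + n) / (1 + df)) + 1, before L2-normalizing each vector. The function names and sample products are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, matching scikit-learn's TfidfVectorizer default.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: count * idf[t] for t, count in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Both vectors are unit-length, so the dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(descriptions, item_index, k=2):
    """Indices of the k items most similar to descriptions[item_index]."""
    vecs = tfidf_vectors(descriptions)
    scores = [(cosine(vecs[item_index], v), i)
              for i, v in enumerate(vecs) if i != item_index]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]
```

For example, with the hypothetical catalog `["red cotton t-shirt", "blue cotton t-shirt", "stainless steel water bottle", "insulated steel bottle"]`, `recommend(catalog, 0, k=1)` returns `[1]`, the other cotton T-shirt.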

https://github.com/nivesayee/recipe-genie

Recipe Genie is a recipe recommendation system that recommends recipes to users based on the ingredients they have at home.

cosine-similarity cosine-similarity-scores recipe recipe-recommendation recipe-recommender recipe-search recipes recommendation-engine recommendation-system recommender-system tf-idf tf-idf-vectorizer

Last synced: 31 Oct 2025

https://github.com/rasti37/most-similar-string-to-given-query

In this project, I use the TF-IDF algorithm and cosine similarity to find the similarity of two strings.

cosine-similarity cosine-similarity-scores document-frequency idf inverse-document-frequency query string-similarity term-frequency tf tf-idf tf-idf-vectorizer

Last synced: 18 Mar 2025
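A minimal sketch of the query-to-string case, under assumptions similar to a plain TF-IDF setup: hypothetical helper names, a simplified tokenizer, and scikit-learn-style smoothed IDF. The detail it illustrates is that the IDF weights are fitted on the candidate strings only and the query is projected into that vocabulary, so query words never seen in the candidates are ignored. This is an illustration, not the project's code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus):
    # Smoothed IDF fitted on the candidate strings only.
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf):
    # Terms outside the fitted vocabulary are dropped; result is L2-normalized.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def most_similar(query, corpus):
    """Return (best_string, cosine_score) for the candidate closest to query."""
    idf = fit_idf(corpus)
    q = vectorize(query, idf)

    def score(doc):
        d = vectorize(doc, idf)
        return sum(w * d.get(t, 0.0) for t, w in q.items())

    best = max(corpus, key=score)
    return best, score(best)
```

For instance, `most_similar("a cat on a mat", ["the cat sat on the mat", "dogs chase cats", "the stock market fell today"])` returns the first string with a positive score. Note that "cat" and "cats" count as different tokens here because no stemming is applied.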

https://github.com/coderjolly/news-recommender

This is a news recommender system that uses Beautiful Soup to scrape news articles, their categories, and their descriptions into a data dump. It then uses text representation techniques such as TF-IDF and word2vec for content-based news recommendation, and LightRF and LightFM to explore hybrid and collaborative-filtering recommender models.

collaborative-filtering lightrf nlp-machine-learning nltk-python recommendation-system recommender-system tf-idf-vectorizer

Last synced: 27 Mar 2025

https://github.com/armanjscript/fusion-rag

A powerful web-based application designed to answer questions based on the content of uploaded PDF documents. This project leverages the Fusion-in-Decoder (FiD) approach for Retrieval-Augmented Generation (RAG), combining semantic similarity, technical term relevance, and recency to deliver accurate and contextually relevant responses.

chroma chromadb fusion-rag langchain langchain-ollama ollama pypdf qwen2-5 rag rag-chatbot scikit-learn streamlit tf-idf-score tf-idf-vectorizer vector-database

Last synced: 10 Apr 2026
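The description does not say how the three signals are combined. One simple possibility, shown here purely as a hypothetical sketch, is a weighted sum of a semantic-similarity score (for example, from a vector store such as ChromaDB), a TF-IDF cosine score for technical-term relevance, and an exponentially decaying recency score. The `Candidate` fields, the default weights, and the 30-day half-life are all assumptions, not values taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    semantic_score: float  # hypothetical: embedding cosine similarity, assumed in [0, 1]
    tfidf_score: float     # hypothetical: TF-IDF cosine similarity with the query, in [0, 1]
    age_days: float        # hypothetical: age of the source document in days


def recency_score(age_days, half_life_days=30.0):
    # Exponential decay: 1.0 for a brand-new document, 0.5 after one half-life.
    return 0.5 ** (age_days / half_life_days)


def fused_score(c, w_sem=0.6, w_tfidf=0.3, w_rec=0.1, half_life_days=30.0):
    # Assumed linear fusion of the three signals; the weights are illustrative.
    return (w_sem * c.semantic_score
            + w_tfidf * c.tfidf_score
            + w_rec * recency_score(c.age_days, half_life_days))


def rerank(candidates, top_k=3, **weights):
    """Return the top_k candidates by fused score, highest first."""
    return sorted(candidates, key=lambda c: fused_score(c, **weights),
                  reverse=True)[:top_k]
```

A linear blend keeps each signal's influence easy to tune; a chunk with slightly lower semantic similarity but strong term overlap and a recent date can outrank an older, purely semantic match.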

https://github.com/aarryasutar/hate_speech_detection

This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.

confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud

Last synced: 09 Apr 2026

https://github.com/lasithaamarasinghe/movie-recommender-system

This ML model recommends movies that may align with the user's preferences, based on a TF-IDF matrix.

jupyter-notebook machine-learning movie-recommendation movielens-dataset numpy pandas python regex scikit-learn tf-idf-vectorizer

Last synced: 12 Apr 2026

https://github.com/steveee27/multiclass-text-classification-of-presidential-campaign-tweets

Explore the Indonesian presidential campaign of 2024 through advanced text classification. This project transforms tweets into insights on national resilience using cutting-edge machine learning models and text preprocessing techniques. Dive into the intersection of politics and data science!

data-science machine-learning nlp text-classification tf-idf-vectorizer twitter-analysis

Last synced: 09 Oct 2025

https://github.com/rohithgowdam/cyberbullying-classification

The project identifies the most accurate of several models for detecting cyberbullying in text, training each one on a dataset that has been preprocessed and vectorized with TF-IDF.

classification cyberbullying-detection decision-trees logistic-regression machine-learning mlproject naive-bayes-classifier preprocessing random-forest tf-idf tf-idf-vectorizer tweets vectorization

Last synced: 10 Jun 2026

https://github.com/ashithapallath/abusive_comment_detection_malayalam

This project detects abusive and non-abusive comments in the Malayalam language using the MuRIL BERT model and compares its performance with TF-IDF + SVM and XGBoost. MuRIL outperforms the classical models.

classical-machine-learning muril nlp tf-idf-vectorizer

Last synced: 07 May 2025

https://github.com/armahdavi/nlp_document_tracking_construction_management

A summary of NLP work that automates parts of construction management, covering non-compliance, punch lists, and database creation.

bert bidirectional-lstm construction-management distill-bert glove-vectors nlp nlp-machine-learning recall-precision sklearn tf-idf-vectorizer word2vec

Last synced: 12 Jan 2026

https://github.com/vlada-pv/prediction-sociolinguistic-data-based-on-the-diaries-texts-of-the-prozhito-project

The repository contains notebooks for collecting and preprocessing a corpus of diary entries, and for experiments with models that predict an author's gender and age group and the time period in which a text was written.

author-profiling bag-of-words bilstm convol convolutional-neural-networks deep-learning diary-entries logistic-regression naive-bayes-classifier neural-networks recurrent-neural-networks sociolinguistics text-preprocessing text-vectorization tf-idf-vectorizer word-embeddings

Last synced: 13 Jul 2025

https://github.com/roaajadaa/content-based-recommender-system

A content-based recommender system that suggests items to users based on their preferences (favorite products).

cosine-similarity fastapi pymongo tf-idf-vectorizer

Last synced: 30 Apr 2026
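One common way to turn item-level TF-IDF vectors into recommendations for a specific user, sketched here as an assumption rather than as this repository's method, is to average the vectors of the user's favorite products into a profile and rank the remaining items by similarity to it. Helper names and sample data are hypothetical, and the FastAPI and MongoDB layers are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # L2-normalized TF-IDF vectors with scikit-learn-style smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    out = []
    for toks in tokenized:
        vec = {t: c * idf[t] for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        out.append({t: w / norm for t, w in vec.items()})
    return out


def recommend_for_user(items, favorite_ids, k=2):
    """Rank non-favorite items by similarity to the user's favorites."""
    vecs = tfidf_vectors(items)
    # User profile: element-wise mean of the favorites' TF-IDF vectors.
    profile = {}
    for i in favorite_ids:
        for t, w in vecs[i].items():
            profile[t] = profile.get(t, 0.0) + w / len(favorite_ids)
    candidates = [i for i in range(len(items)) if i not in favorite_ids]
    # Item vectors are unit-length and the profile norm is the same for every
    # candidate, so ranking by dot product gives the same order as cosine.
    candidates.sort(
        key=lambda i: sum(w * vecs[i].get(t, 0.0) for t, w in profile.items()),
        reverse=True)
    return candidates[:k]
```

With the hypothetical items `["wireless mouse", "wireless keyboard", "ceramic coffee mug", "travel coffee mug", "usb mouse pad"]` and favorites `[0]`, the sketch ranks the wireless keyboard first and the mouse pad second.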

https://github.com/chaitanyac22/cross_platform_product_mapping_algorithm_for_products

This repository contains a product ID mapping solution that uses a TF-IDF vectorizer to build weighted text vectors, Facebook AI Similarity Search (FAISS) for coarse filtering by cosine similarity, and Levenshtein distance for refined matching against the Blinkit catalog. It achieved match rates of 11.45% for Zepto and 11.48% for Instamart.

exploratory-data-analysis faiss levenshtein-distance nlp numpy pandas similarity-search tf-idf-vectorizer

Last synced: 20 Mar 2025
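A minimal sketch of the two-stage idea (coarse TF-IDF cosine filtering, then Levenshtein re-ranking), under stated assumptions: a brute-force cosine scan stands in for FAISS, names are compared case-insensitively, and the `top_n` and `max_ratio` thresholds are hypothetical. Returning `None` lets a product go unmatched when no candidate is close enough. The catalog names in the example are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def unit_vector(tokens, idf):
    # TF-IDF weights for in-vocabulary tokens, L2-normalized.
    vec = {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def build_index(catalog):
    # Fit smoothed IDF on the target catalog and vectorize every name.
    n = len(catalog)
    toks = [tokenize(name) for name in catalog]
    df = Counter(t for ts in toks for t in set(ts))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [unit_vector(ts, idf) for ts in toks]


def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def match_product(name, catalog, idf, vectors, top_n=5, max_ratio=0.3):
    q = unit_vector(tokenize(name), idf)
    # Coarse stage: brute-force cosine similarity (FAISS would replace this at scale).
    sims = sorted(((sum(w * v.get(t, 0.0) for t, w in q.items()), i)
                   for i, v in enumerate(vectors)), reverse=True)[:top_n]
    best, best_ratio = None, float("inf")
    # Refined stage: smallest length-normalized edit distance among candidates.
    for sim, i in sims:
        if sim <= 0:
            continue
        a, b = name.lower(), catalog[i].lower()
        ratio = levenshtein(a, b) / max(len(a), len(b), 1)
        if ratio < best_ratio:
            best, best_ratio = catalog[i], ratio
    return best if best_ratio <= max_ratio else None
```

The coarse stage keeps the expensive character-level comparison to a handful of candidates per product, which is the usual reason for pairing a vector search with an edit-distance check.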

https://github.com/sayande01/fake_news_detection_logisticregression

This project detects fake news using Logistic Regression with NLP techniques, including NLTK stopword removal, Porter Stemmer for text normalization, and TF-IDF vectorization for feature extraction. It achieves high accuracy and precision, offering a reliable solution to combat misinformation.

logistic-regression nltk porter-stemmer stopwords tf-idf-vectorizer

Last synced: 06 Apr 2025
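To show how TF-IDF features feed a logistic regression classifier, here is a small pure-Python sketch trained with plain stochastic gradient descent on the log-loss. It is a toy built on assumptions, not the project's pipeline: it skips NLTK stopword removal and Porter stemming, uses no regularization, and the sample headlines and labels (1 = fake) are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF, as in scikit-learn's TfidfVectorizer default.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def transform(text, idf):
    # L2-normalized TF-IDF vector; unseen terms are dropped.
    vec = {t: c * idf[t] for t, c in Counter(tokenize(text)).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(vectors, labels, lr=0.5, epochs=200):
    """Plain SGD on the log-loss over sparse {term: weight} vectors."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(v * weights.get(t, 0.0) for t, v in x.items()))
            grad = p - y  # derivative of the log-loss with respect to the logit
            bias -= lr * grad
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * grad * v
    return weights, bias


def predict_proba(text, idf, weights, bias):
    # Probability that the text belongs to class 1 (fake, in this toy).
    x = transform(text, idf)
    return sigmoid(bias + sum(v * weights.get(t, 0.0) for t, v in x.items()))
```

Because the IDF weights are fitted on the training headlines, new text must go through the same `transform` with that fitted `idf` before scoring.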

https://github.com/yash1th-yerra/simple-search-engine-tfidf

A Flask-based search engine that lets users search for songs using lyric snippets. The project demonstrates how to implement basic text search with TF-IDF vectorization and cosine similarity for ranking results.

search-engine tf-idf-vectorizer vector-search-engine

Last synced: 06 Nov 2025

https://github.com/2003harsh/sms-spam-classifier

An ML model for SMS spam detection using Naive Bayes and TF-IDF, achieving 0.98 accuracy. Built with scikit-learn, NumPy, and NLTK, it applies core NLP concepts for precise spam classification.

naive-bayes-classifier natural-language-processing tf-idf-vectorizer

Last synced: 09 Jun 2026
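A sketch of multinomial Naive Bayes applied to TF-IDF weights, written in pure Python so it runs without scikit-learn. It follows the same estimate as scikit-learn's `MultinomialNB` (per-class summed feature weights with additive smoothing `alpha`), but the helper names and sample messages are hypothetical and no NLTK preprocessing is included.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF, as in scikit-learn's TfidfVectorizer default.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def transform(text, idf):
    # L2-normalized TF-IDF vector; unseen terms are dropped.
    vec = {t: c * idf[t] for t, c in Counter(tokenize(text)).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def train_mnb(vectors, labels, alpha=1.0):
    """Log priors and smoothed log P(term | class) from TF-IDF weights."""
    vocab = {t for x in vectors for t in x}
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(vectors, labels) if y == c]
        prior[c] = math.log(len(rows) / len(vectors))
        totals = Counter()
        for x in rows:
            totals.update(x)  # adds the (fractional) TF-IDF weights per term
        denom = sum(totals.values()) + alpha * len(vocab)
        cond[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return prior, cond


def predict(x, prior, cond):
    # Highest joint log-likelihood wins; terms outside the vocabulary are ignored.
    def log_score(c):
        return prior[c] + sum(w * cond[c][t] for t, w in x.items() if t in cond[c])
    return max(prior, key=log_score)
```

Multinomial Naive Bayes is defined for counts, but fractional TF-IDF weights work the same way in practice, which is why the combination is a common baseline for spam filtering.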

https://github.com/razamehar/sentiment-analysis-using-deep-learning---machine-learning

Sentiment analysis on the IMDB dataset using Bag of Words models (Unigram, Bigram, Trigram, Bigram with TF-IDF) and Sequence to Sequence models (one-hot vectors, word embeddings, pretrained embeddings like GloVe, and transformers with positional embeddings).

bag-of-words glove-embeddings imdb-dataset multinomial-naive-bayes one-hot-encoded-vectors python sentiment-analysis sequence-to-sequence-models tensorflow term-frequency-inverse-document-frequency tf-idf-vectorizer transformer-architecture word-embeddings

Last synced: 18 Apr 2026
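The difference between the unigram, bigram, and trigram models comes down to which n-grams become features. Below is a minimal sketch of n-gram extraction followed by TF-IDF weighting, assuming a simple tokenizer and hypothetical helper names; it is not the repository's code. With bigrams included, a phrase such as "not good" becomes its own feature instead of being split into "not" and "good", which matters for sentiment.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n_min=1, n_max=2):
    """All contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_ngram_vectors(docs, n_min=1, n_max=2):
    # Same smoothed-IDF, L2-normalized weighting, but over n-gram features.
    grams = [ngrams(tokenize(d), n_min, n_max) for d in docs]
    n = len(docs)
    df = Counter(g for gs in grams for g in set(gs))
    idf = {g: math.log((1 + n) / (1 + c)) + 1 for g, c in df.items()}
    vectors = []
    for gs in grams:
        vec = {g: c * idf[g] for g, c in Counter(gs).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors
```

Setting `n_max=3` adds trigram features; the vocabulary grows quickly with `n_max`, which is the usual trade-off when moving from unigram to trigram models.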

https://github.com/nirmaldeepponnada/codeclauseinternshipproject1

This project involves Customer Segmentation using K-Means clustering to group customers based on Recency, Frequency, and Monetary (RFM) analysis from the Online Retail dataset. It also performs Sentiment Analysis on Amazon Product Reviews using Natural Language Processing techniques & Logistic Regression to classify reviews as positive or negative.

kmeans logistic-regression numpy pandas python3 regular-expressions scikit-learn tf-idf-vectorizer

Last synced: 11 Apr 2026

https://github.com/alaazameldev/text-based-search-engine

An implementation of a search engine that uses TF-IDF and word-embedding-based vectorization techniques for efficient document retrieval.

chromadb fastapi gensim-word2vec nltk numpy precision-recall python scikit-learn tf-idf-vectorizer

Last synced: 20 Jan 2026