An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with tfidf-vectorizer

A curated list of projects in awesome lists tagged with tfidf-vectorizer.

https://github.com/soumyajit4419/ai_for_social_good

Using natural language processing to analyze the sentiments of people and detect suicidal ideation on online social content.

lstm natural-language-processing random-forest tfidf-vectorizer web-scraping

Last synced: 30 Apr 2025

https://github.com/ksdkamesh99/spam-classifier

An NLP project on SMS data that predicts whether a message is spam or ham. It compares the accuracy of several ML algorithms (multinomial naive Bayes, logistic regression, SVM, decision trees) and applies data cleaning and processing techniques such as PorterStemmer, CountVectorizer, TF-IDF Vectorizer, and WordNetLemmatizer. An LSTM with word embeddings is also implemented, reaching an accuracy of 97.84%.

bag-of-words count-vectorizer decision-tree-classifier embeddings logistic-regression lstm-neural-networks multinomial-naive-bayes naive-bayes-classifier porter-stemmer sms-spam-detection support-vector-machines tfidf-vectorizer wordnetlemmatizer

Last synced: 12 May 2025

https://github.com/rjarman/bus-mama

Bus-Mama is a bus-tracking mobile application for the students of BSMRSTU. It shows students the available routes, the buses, and each bus's exact location. The app provides real-time bus tracking to solve a problem that university students have faced for many years: students often miss their buses or cannot keep to the bus schedule. Since the university runs many buses, students can easily catch one if they know where and when it will pass by. My goal is to build a combined hardware, mobile application, and machine learning solution that tracks the buses, so that students stop missing them and use them efficiently. GPS trackers attached to every bus report each bus's current position and sync it automatically to the server. The Bus-Mama mobile application, installed on students' phones, shows the real-time position of every bus on Google Maps. Each bus has its own marker, and clicking a marker shows that bus's details: how far away it is, which direction it is coming from, how long it will take to arrive, how much longer it will take if there is traffic on the road, and so on. A search option shows the details of any specific bus, and a list of all buses gives students a complete overview. Every student has an account through which they can access bus data. Another main objective is the Bus-Mama chatbot in Bengali, so that students can easily ask about the buses.
For now, the chatbot can only converse about bus-related information. If someone asks anything unrelated to buses, it cannot answer; instead, it replies with a tag assigned to the question. Because the chatbot works in Bengali, a library was designed to lemmatize Bengali words using a "trie" data structure. Nearly 63,205 Bengali words were lemmatized with this library to train the SVM machine learning model.

angular bangla chatbot distancematrixservice googlemap gps iot javascript lemmatization machine-learning mongodb nodejs nosql python scss socket svm tfidf-vectorizer trie typescript

Last synced: 09 Mar 2026
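The Bus-Mama description says its Bengali lemmatizer is built on a trie, but not how the trie is used. A minimal sketch, assuming the library stores known root words in a trie and maps an inflected word to the longest stored root that is a prefix of it; the class and function names and the English toy roots are hypothetical, chosen only to show the data structure.

```python
class TrieNode:
    """One trie node; children are keyed by a single character."""

    def __init__(self):
        self.children = {}
        self.is_root_word = False  # True if a stored root word ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_root_word = True

    def longest_root(self, word):
        """Return the longest stored root word that is a prefix of `word`, or None."""
        node = self.root
        best = None
        for i, ch in enumerate(word):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_root_word:
                best = word[: i + 1]
        return best


def lemmatize(word, trie):
    """Hypothetical prefix-based lemmatizer: fall back to the word itself."""
    root = trie.longest_root(word)
    return root if root is not None else word


trie = Trie()
for root_word in ["walk", "play"]:
    trie.insert(root_word)

print(lemmatize("walking", trie))  # walk
print(lemmatize("run", trie))      # run
```

Lookups cost time proportional to the word length rather than the vocabulary size, which is the usual reason for choosing a trie over scanning a word list. Python strings are sequences of Unicode code points, so the same code runs on Bengali text, although real Bengali morphology likely needs more than prefix matching.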

https://github.com/venkat-0706/twalyze

Twitter sentiment analysis project using machine learning to classify tweets and understand audience mood, opinions, and behavior trends in real-time.

logistic-regression machine-learning model-evaluation naive-bayes-classifier pandas python scikitlearn-machine-learning tfidf-vectorizer tokenization

Last synced: 07 May 2026

https://github.com/abhishtagatya/text2meme

🖼️ Text2Meme is a Meme Classification Experiment based on Caption Text (Implemented as a Discord Bot)

discord-bot kaggle linear-svc meme-generator tfidf-vectorizer

Last synced: 06 May 2025

https://github.com/chiraag-kakar/fund

An NLP model that detects fake news and accurately classifies a piece of news as REAL or FAKE, trained on a dataset provided by Kaggle.

confusion-matrix fake-news machine-learning-algorithms news-article passive-aggressive-classifier project sklearn tf-idf tfidf-text-analysis tfidf-vectorizer tfidfvectorizer

Last synced: 07 May 2025
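The fund entry is tagged passive-aggressive-classifier. As a sketch of the PA-I update rule (the variant scikit-learn's `PassiveAggressiveClassifier` uses with its default `loss='hinge'`), here is a dependency-free version over sparse feature dicts such as TF-IDF weights. It omits the intercept for brevity, and the function names and toy features are illustrative, not taken from the repository.

```python
def dot(w, x):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def pa1_update(w, x, y, C=1.0):
    """One PA-I step. y is -1 or +1; C caps the step size (aggressiveness)."""
    loss = max(0.0, 1.0 - y * dot(w, x))  # hinge loss at margin 1
    sq_norm = sum(v * v for v in x.values())
    if loss == 0.0 or sq_norm == 0.0:
        return  # passive: the example is already classified with enough margin
    tau = min(C, loss / sq_norm)  # aggressive: smallest (capped) step that fixes it
    for f, v in x.items():
        w[f] = w.get(f, 0.0) + tau * y * v


def train_pa(samples, epochs=5, C=1.0):
    w = {}
    for _ in range(epochs):
        for x, y in samples:
            pa1_update(w, x, y, C)
    return w


def predict_pa(w, x):
    return 1 if dot(w, x) >= 0 else -1


# Hypothetical toy data: -1 = FAKE, +1 = REAL.
samples = [({"shocking": 1.0, "miracle": 1.0}, -1), ({"minister": 1.0, "said": 1.0}, 1)]
w = train_pa(samples)
print(predict_pa(w, {"miracle": 1.0}))  # -1
```

The classifier stays unchanged on examples it already gets right with margin, which makes it cheap for streaming text such as news articles.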

https://github.com/shaadclt/password-strength-checker-randomforestclassifier

This project is a password strength checker that utilizes a Random Forest Classifier to determine the strength of a given password. The Random Forest Classifier is trained on a dataset of passwords labeled with their corresponding strength levels.

random-forest-classifier tfidf-vectorizer

Last synced: 10 Oct 2025

https://github.com/faseeh41/ai_for_social_good

Using natural language processing to analyze the sentiments of people and detect suicidal ideation on online social content.

lstm natural-language-processing random-forest tfidf-vectorizer web-scraping

Last synced: 25 Feb 2026

https://github.com/psychomita/intellicv

IntelliCV is an AI-driven platform for efficient and intelligent resume screening.

jupyter-notebook numpy pandas python scikit-learn seaborn streamlit svc-model tfidf-vectorizer

Last synced: 19 Apr 2025

https://github.com/pmadruga/ds-jobindex

Machine learning techniques (NLP) applied to the jobindex.dk dataset

bert deep-learning machine-learning natural-language-processing nlp python pytorch tfidf-vectorizer transformers

Last synced: 19 Feb 2026

https://github.com/saheedniyi02/krecommend

A Python package for creating content-based text recommender systems on pandas DataFrames and SQLAlchemy tables

cosine-similarity flask-sqlalchemy nlp numpy pandas python recommendation-algorithms recommendation-engine recommendation-system recommender-system scikit-learn sql sqlalchemy sqlite3 tfidf-vectorizer

Last synced: 10 Mar 2026

https://github.com/ayusharma03/codsoft_internship

CodSoft internship projects: an SMS spam prediction model, customer churn prediction, and a movie classification system based on the movie's summary.

bag-of-words codsoft codsoft-internship codsoft-machine-learning codsoft-virtual-internship codsoftinternship machine-learning nltk tfidf-vectorizer

Last synced: 29 Jan 2026

https://github.com/tushard48/sms-spam-detection

This repository contains code and models for identifying spam SMS messages. It utilizes machine learning techniques to classify messages as spam or ham (non-spam).

machine-learning spam-detection streamlit tfidf-vectorizer

Last synced: 19 May 2026

https://github.com/antonio-f/multilabel-classification

Predict tags on StackOverflow with linear models - Week 1 assignment of Coursera's Natural Language Processing course from the Advanced Machine Learning Specialization.

bag-of-words logistic-regression multilabel-classification nltk-library one-vs-rest sklearn-library tfidf tfidf-vectorizer

Last synced: 30 Mar 2025

https://github.com/chengetanaim/sentimentanalysisforfinancialnewsnotebook

Building the model of a financial news sentiment classifier. Financial news headlines will be classified as positive, negative or neutral (from an investor point of view)

logistic-regression machine-learning natural-language-processing scikit-learn tfidf-vectorizer

Last synced: 04 May 2026

https://github.com/shubhamgoyal575/spam_detective

This project uses machine learning to classify messages as spam or ham based on text analysis. It includes data preprocessing, feature extraction (TF-IDF), and classification models like Logistic Regression and Naive Bayes for accurate spam detection. Built with Python and Scikit-Learn. 🚀

count-vectorizer data-analysis data-analytics data-cleaning data-preprocessing data-science data-visualization data-wrangling exploratory-data-analysis logistic-regression machine-learning machine-learning-algorithms naive-bayes natural-language-processing spam-detection tfidf-vectorizer

Last synced: 02 Jul 2025

https://github.com/singhkunwardeep/twitter_sentiment_analysis

A machine learning project that classifies Twitter sentiment into positive and negative categories using logistic regression and TF-IDF vectorization. The project covers data preprocessing, feature extraction, model training, and evaluation of tweet sentiment. Built with Python, NLTK, and scikit-learn.

logistic-regression nltk-python pandas-dataframe python3 scikit-learn tfidf-vectorizer

Last synced: 05 May 2026
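Several entries pair TF-IDF features with logistic regression. A minimal sketch of binary logistic regression trained by plain stochastic gradient descent on log loss, assuming sparse feature dicts (for example TF-IDF weights) and labels in {0, 1}; it has no regularization, and the names, learning rate, and toy data are illustrative assumptions rather than any listed project's settings.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(samples, epochs=100, lr=0.5):
    """samples: list of (sparse feature dict, label in {0, 1})."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(b + sum(w.get(f, 0.0) * v for f, v in x.items()))
            g = p - y  # derivative of the log loss with respect to the logit
            for f, v in x.items():
                w[f] = w.get(f, 0.0) - lr * g * v
            b -= lr * g
    return w, b


def predict_proba(model, x):
    """Probability that x belongs to the positive class."""
    w, b = model
    return sigmoid(b + sum(w.get(f, 0.0) * v for f, v in x.items()))


# Hypothetical toy tweets: 1 = positive, 0 = negative.
samples = [({"love": 1.0, "great": 1.0}, 1), ({"hate": 1.0, "awful": 1.0}, 0)]
model = train_logreg(samples)
print(predict_proba(model, {"love": 1.0}) > 0.5)  # True
```

In practice a library implementation such as scikit-learn's `LogisticRegression` adds regularization and a better optimizer; the sketch only shows the gradient step.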

https://github.com/priyam-hub/inside-medium

Inside-Medium is an AI-powered content recommendation engine designed to help readers find the most relevant and high-quality Medium articles based on their interests or selected articles.

natural-language-processing non-negative-matrix-factorization tfidf-vectorizer

Last synced: 25 Jul 2025

https://github.com/akarshankapoor7/automated-complaint-triage-system-using-nlp-and-machine-learning

Automated Severity Classification of Forum Complaints for Resolution Teams - Emphasizes automation and the end goal for resolution teams.

data-science datamining kmeans-clustering naive-bayes-classifier nlp tfidf-vectorizer

Last synced: 27 Mar 2025

https://github.com/parag000/content-based-movie-recommender

This project builds a content-based movie recommendation system using the TMDB dataset. By combining metadata features such as cast, genres, and directors into a "metadata soup," it computes movie similarity using a count vectorizer and cosine similarity. Ideal for learning content-based filtering and text vectorization techniques.

cosine-similarity countvectorizer recommendation-system scikit-learn tfidf-vectorizer vectorization

Last synced: 18 Apr 2026

https://github.com/armanjscript/rag-driven-generative-ai

Generative AI has made remarkable strides in creating human-like text, images, and even code. However, traditional models like GPT rely solely on pre-trained knowledge, which can lead to outdated, inaccurate, or hallucinated responses. Retrieval-Augmented Generation (RAG) addresses these limitations. We offer various types of RAG here.

cosine-similarity langchain langchain-ollama qwen2-5 spacy spacy-nlp tfidf tfidf-vectorizer wordnet

Last synced: 09 Apr 2026

https://github.com/dynamicanupam/classification_of_customer_complaints_using_nlp

Create a solution that will help in identifying the type of complaint ticket raised by the customers of a multinational bank using NLP and Topic Modelling (NMF)

nlp nmf tfidf-vectorizer topicmodelling

Last synced: 22 Aug 2025

https://github.com/somjit101/nlp-stackeroverflow-tag-prediction

A multi-class classification problem where the objective is to read a question posted on the popular reference website StackOverflow and predict the primary topics it deals with, i.e., the tags the question will be associated with.

bag-of-words countvectorizer logistic-regression multi-class-classification multiclass-logistic-regression natural-language-processing nlp one-vs-rest onevsrestclassifier stackoverflow-tags stemming text-mining tf-idf tfidf-vectorizer word-cloud

Last synced: 06 Mar 2025

https://github.com/lightxlk/smbdunlp

A project for detecting bots and fraud on social media using deep learning and NLP.

bot botdetection histgram-gradient-boosting kde nlp-machine-learning random-forest shap social-media tfidf-vectorizer

Last synced: 16 May 2025

https://github.com/srijaadhya12/project-to-interview

Your ultimate interview preparation for personal project related questions

flask gemini-api random-forest-classifier react sklearn tailwind tfidf-vectorizer

Last synced: 11 Apr 2026

https://github.com/tahirzia-1/nlp-textclassify

A hands-on NLP project comparing classic ML models (Naïve Bayes, SVM, Logistic Regression) and ANNs for text classification using SMS Spam and 20 Newsgroups datasets.

adam-optimizer ann cbow deep-learning lemmatization logistic-regression naive-bayes-classifier nlp nlp-machine-learning skipgram-algorithm svm tensorflow tfidf tfidf-vectorizer tokenization vectorization word2vec

Last synced: 12 Apr 2026

https://github.com/sambhu431/medicine-recommendation-system

The project aims to recommend medicines based on the similarity of product uses, side effects, and weighted product reviews. Powered by NLP techniques like TF-IDF and cosine similarity, the system provides intelligent and user-centric recommendations.

cosine-similarity flask machine-learning medicine medicine-recommendation medicine-search pickle recommendation-system tfidf tfidf-vectorizer

Last synced: 09 Apr 2025

https://github.com/supriya811106/twitter-sentiment-analysis

Analyzing the mood of tweets! We sort tweets on popular topics into positive, negative, or neutral categories to gauge public opinion. See what Twitter really thinks!

bernoulli-naive-bayes jupyter-notebook matplotlib nlp-machine-learning nltk numpy pandas python scikit-learn seaborn sentiment-analysis text-classification tfidf-vectorizer wordcloud

Last synced: 05 Apr 2026

https://github.com/ankulmaurya88/zomato-content-based-restaurant

Content-based restaurant recommendation system using Zomato data with TF-IDF and cosine similarity.

content-based-recommendation data-science machine-learning python3 recommender-system tfidf-vectorizer zomato

Last synced: 21 May 2026

https://github.com/arufonsekun/covid-topic-modeling

Covid news topic modeling using TFIDF feature extractor and non-negative matrix factorization (NMF)

covid-19 nlp spacy-nlp tfidf-vectorizer

Last synced: 17 Mar 2025
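This entry and several others (inside-medium, the customer-complaint classifier, Quora topic modeling) use non-negative matrix factorization on a TF-IDF document-term matrix. A dependency-free sketch of NMF with the Lee-Seung multiplicative updates for the Frobenius loss, assuming a small dense matrix given as nested lists; the helper names and the toy matrix are hypothetical. scikit-learn's `NMF` offers this update rule as `solver='mu'`, but its defaults differ.

```python
import random


def matmul(A, B):
    """Dense matrix product of nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) @ H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps avoids division by zero.
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5


# Hypothetical 4-document x 4-term matrix with two obvious topics.
V = [[1.0, 1.0, 0.0, 0.0],
     [2.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 3.0],
     [0.0, 0.0, 2.0, 6.0]]
W, H = nmf(V, k=2)
```

Each row of `H` is a topic, and its largest entries point to the terms that describe it; each row of `W` gives a document's topic weights. The multiplicative form keeps every entry non-negative as long as the initialization is non-negative.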

https://github.com/otuemre/emailphishingdetection

A real-time phishing email detection system using Machine Learning (SVM, Logistic Regression, Naive Bayes) with FastAPI backend and custom domain deployment.

cybersecurity fastapi huggingface machine-learning nlp real-time scikit-learn spam-detection svm-classifier tfidf-vectorizer

Last synced: 13 Apr 2026

https://github.com/rid17pawar/sentiment-analysis-model-experiments

Sentiment analysis experiments using ML algorithms (logistic regression and naive Bayes) with TF-IDF, one-hot encoding, and bag-of-words vectorization; several MLP and RNN models (LSTM, GRU, bidirectional LSTM); and, lastly, the state-of-the-art BERT model.

bag-of-words bert bidirectional-lstm gru logistic-regression lstm ml-algorithms naive-bayes neural-networks one-hot-encoding rnn sentiment-analysis sentiment-classification text-vectorization tfidf tfidf-vectorizer transformer-architecture twitter-sentiment-analysis

Last synced: 30 May 2026

https://github.com/soumyapro/movie-recommendation-system

A machine learning model to recommend movies. It is built entirely in Python using cosine similarity. This type of recommendation system takes a movie the user currently likes as input, then analyzes the movie's content, popularity, etc. to find other movies with similar content.

cosine-similarity tfidf-vectorizer

Last synced: 01 Mar 2025

https://github.com/04bhavyaa/sms-spam-classification-system

A Machine Learning project that identifies whether a given message is spam or not. It uses Natural Language Processing (NLP) techniques (Stemming and TF-IDF Vectorization) for text transformation and a trained Multinomial Naive Bayes Classifier for predictions.

bernoulli-naive-bayes nlp-machine-learning nltk-library spam-classification stemming streamlit tfidf-vectorizer

Last synced: 24 Apr 2026

https://github.com/chandkund/sms-spam-detection

The goal is to develop a classification model that can accurately differentiate between spam and non-spam messages. This is crucial for applications like email filtering, SMS spam detection, and improving overall user experience by reducing the influx of unwanted or malicious content.

matplotlib nlp-machine-learning numpy pandas seaborn stemming tfidf-vectorizer tokenization

Last synced: 19 Jan 2026

https://github.com/soumyapro/sms-spam-classifier

A machine learning project that detects spam SMS messages using natural language processing techniques. The model analyzes text messages and accurately classifies them as spam or legitimate (ham).

multinomial-naive-bayes nltk sklearn tfidf-vectorizer tokenizer

Last synced: 15 Apr 2026
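Many of the spam/ham entries name multinomial naive Bayes. A minimal sketch with additive (Laplace) smoothing, computed in log space; for brevity it uses raw whitespace token counts rather than TF-IDF weights (scikit-learn's `MultinomialNB` also accepts fractional TF-IDF values). The function names and toy messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes on token counts with additive smoothing."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc.lower().split())
    vocab = {t for counts in token_counts.values() for t in counts}
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    likelihoods = {}
    for c, counts in token_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return priors, likelihoods


def predict_mnb(model, doc):
    priors, likelihoods = model
    scores = {}
    for c in priors:
        # Tokens never seen in training carry no learned weight, so skip them.
        scores[c] = priors[c] + sum(
            likelihoods[c][t] for t in doc.lower().split() if t in likelihoods[c]
        )
    return max(scores, key=scores.get)


docs = ["win a free prize now", "free cash prize claim now",
        "are we still meeting for lunch", "see you at lunch tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_mnb(docs, labels)
print(predict_mnb(model, "claim your free prize"))  # spam
```

Working with log probabilities avoids numeric underflow when a message contains many tokens.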

https://github.com/floressek/languageprocessinglab

Collection of Natural Language Processing laboratory exercises exploring text processing, linguistic analysis, and statistical methods.

pca-analysis tfidf-vectorizer word-frequency-analysis

Last synced: 31 Jan 2026

https://github.com/inddrsingh/email-sms-spam-classifier

Given a text, the ML model can predict whether it is "SPAM" or "NOT SPAM"

machine-learning-algorithms naive-bayes-classifier python3 tfidf-vectorizer vectorization

Last synced: 15 Feb 2026

https://github.com/rohansardar/speechflowguard

A machine learning web API that detects toxic language in user comments using classical ML

docker logistic-regression machine-learning python3 scikit-learn tf-idf tfidf-text-analysis tfidf-vectorizer

Last synced: 17 Apr 2026

https://github.com/somjit101/nlp-casestudy-amazon-fine-foods-review

Efficient sentence encoding and vectorization techniques applied to customer reviews from product pages of the popular e-commerce website Amazon, using proven NLP techniques for sentiment analysis.

amazon-fine-food-reviews amazon-fine-food-reviews-dataset featurization natural-language-processing nlp text-classification text-preprocessing tfidf-vectorizer vectorization word2vec

Last synced: 20 Apr 2026

https://github.com/abdelrahman-amen/active_learning_in_nlp_using_small_text_technique

This project demonstrates active learning for text classification using the Small-Text library on the IMDB dataset. A logistic regression model is trained iteratively, selecting the most uncertain samples for labeling with a smart query strategy. The approach highlights efficient learning with minimal labeled data, improving model performance.

activelearning imdb logistic-regression nlp python sklearn smalltext tfidf-vectorizer uncertainty

Last synced: 20 Apr 2026
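The active-learning entry selects "the most uncertain samples" for labeling. The exact Small-Text query strategy is not stated here, so this is only a sketch of uncertainty sampling for binary classification: rank unlabeled samples by how close the predicted positive-class probability is to 0.5. For two classes, least-confidence, margin, and entropy sampling all produce this same ranking. The function name and probabilities are hypothetical.

```python
def most_uncertain(probabilities, unlabeled_ids, batch_size):
    """Return the unlabeled ids whose positive-class probability is closest to 0.5."""
    ranked = sorted(unlabeled_ids, key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled reviews.
probs = {0: 0.95, 1: 0.52, 2: 0.10, 3: 0.40}
print(most_uncertain(probs, [0, 1, 2, 3], 2))  # [1, 3]
```

In an active-learning loop, the selected ids go to a human annotator, the model is retrained on the enlarged labeled set, and the query is repeated.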

https://github.com/jash271/news_classifier

Classifies news text as true or fake.

fake-news nlp pipelines pkl python sklearn tf-idf tfidf-vectorizer

Last synced: 20 Apr 2026

https://github.com/chandadiya2004/movie-recommendation-system

A Movie Recommendation System built using TfidfVectorizer and cosine similarity. The model processes a large dataset of movies and recommends similar movies based on a given input movie by analyzing textual features and calculating similarity scores.

cosine-similarity numpy pandas python sklearn tfidf-vectorizer

Last synced: 29 Apr 2026
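Many entries in this list (krecommend and the movie, medicine, and restaurant recommenders) pair TF-IDF with cosine similarity. A dependency-free sketch of that idea follows. It aims to mirror the defaults documented for scikit-learn's `TfidfVectorizer` (lowercasing, the default token pattern, raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2-normalized rows), but it has not been checked number-for-number against the library. The function names and movie data are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # tokens of two or more word characters


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, L2-normalized."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: cnt * idf[t] for t, cnt in Counter(toks).items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    # Rows are unit length, so cosine similarity reduces to a dot product.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def recommend(titles, descriptions, liked_title, k=2):
    vecs = tfidf_vectors(descriptions)
    i = titles.index(liked_title)
    scores = [(cosine(vecs[i], vecs[j]), titles[j]) for j in range(len(titles)) if j != i]
    scores.sort(reverse=True)
    return [title for _, title in scores[:k]]


titles = ["Space Odyssey", "Star Voyage", "Kitchen Love"]
descriptions = ["astronauts travel through space to a distant planet",
                "a crew of astronauts explores deep space",
                "two chefs fall in love in a busy kitchen"]
print(recommend(titles, descriptions, "Space Odyssey", k=1))  # ['Star Voyage']
```

IDF down-weights terms that appear in many descriptions, so shared rare terms such as "astronauts" dominate the similarity score.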

https://github.com/kaustavmodak/business-aided-customer-feedback-assessment-system

A Streamlit-based sentiment analysis app that classifies customer reviews into Positive, Neutral, or Negative using a pre-trained ML model.

framework machine-learning matplotlib nlp nltk numpy pandas pickle regex scikit-learn seaborn sentiment-analysis streamlt tfidf-vectorizer

Last synced: 03 May 2026

https://github.com/chengetanaim/sentimentanalysisforfinancialnews

This is a Django application for predicting whether the sentiment of a financial news headline is positive, negative or neutral (from an investor point of view)

beautifulsoup4 chartjs django html-css-javascript logistic-regression machine-learning natural-language-processing scikit-learn tfidf-vectorizer webscraping

Last synced: 10 May 2026

https://github.com/kkeshav1101/nlp

Based on Natural Language Processing lab coursework completed as part of my degree.

bag-of-words keras-tensorflow lstm nlp nltk python rnn-tensorflow tensorflow tfidf-vectorizer word2vec

Last synced: 11 May 2026

https://github.com/jash271/topic-modeling

Segregating Quora questions into 8 categories

nlp nmf-decomposition sklearn tfidf-vectorizer topic-modeling wordcloud

Last synced: 15 May 2026

https://github.com/ramneek2003/movie-recommendation-system

Developed as a warm-up project, this machine learning-based movie recommendation system utilizes cosine similarity to find and suggest similar films. By combining content-based filtering with popularity metrics, it provides personalized movie recommendations based on user preferences and trends, enhancing the overall user experience.

cosine-similarity machine-learning tfidf-vectorizer

Last synced: 15 May 2026

https://github.com/himank-khatri/spamham

NLP models trained using Bag of Words (BoW) and Term Frequency - Inverse Document Frequency (TF-IDF) to classify SMS as Spam or Ham.

bag-of-words naive-bayes-algorithm nlp nlp-machine-learning spam-detection tfidf-vectorizer

Last synced: 02 Mar 2025

https://github.com/souravxbera/movie-recommendation

Movie Recommender - A Smart Movie Recommendation System, built using NLP, TF-IDF & FastAPI

ml nlp-machine-learning tfidf-vectorizer

Last synced: 15 May 2026

https://github.com/abinashsahoo007/project-resume-classification

The document classification solution should significantly reduce manual human effort in human resource management (HRM). It should achieve a higher level of accuracy and automation with minimal human intervention.

corpus count-vectorizer label-encoding lemmitization machine-learning nltk part-of-speech-tagging resume-classification spacy stemming text-mining text-preprocessing textract tfidf-vectorizer tokenization wordcloud

Last synced: 02 Feb 2026

https://github.com/veer-parikh/amazon-review-helpfulness

A machine learning project that predicts the helpfulness of Amazon customer reviews using NLP techniques, TF-IDF, and a Random Forest classifier.

amazon-reviews machine-learning natural-language-processing random-forest sentiment-analysis tfidf-vectorizer

Last synced: 21 Jun 2025

https://github.com/beenish-ishtiaq/dep-task-2-spam-email-classifier

This project focuses on building a classifier to distinguish between spam and ham emails using Logistic Regression. Key steps include data preprocessing, feature extraction with TF-IDF vectorization, and model evaluation with accuracy metrics and a confusion matrix.

data-science email-filtering logistic-regression machine-learning natural-language-processing python spam-detection text-classification tfidf-vectorizer

Last synced: 17 May 2026

https://github.com/jeffreywijaya100/youtube-comment-textmining

Scraping at least 100 Indonesian-language YouTube comments related to machine learning.

api-key count-vectorizer machine-learning scraping text-mining tfidf-vectorizer word-cloud youtube-api-v3 youtube-comment-scraper

Last synced: 28 Mar 2025

https://github.com/pedrofracassi/insper-nlp-relevance-search

Search for Bluesky posts using TF-IDF to rank the relevance of the results.

tfidf tfidf-vectorizer

Last synced: 27 Mar 2025

https://github.com/aasthaj28/ai-for-social-good

Using natural language processing to analyze the sentiments of people and detect suicidal ideation on online social content.

lstm natural-language-processing random-forest tfidf-vectorizer web-scraping

Last synced: 05 Apr 2025

https://github.com/sanjanahombal/study-on-sentiment-analysis

This project explores the optimal combination of Bag-of-Words and TF-IDF vectorization with Naive Bayes and SVM for sentiment analysis. It evaluates performance using accuracy, precision, recall, and F1-score, addressing ethical concerns like data privacy and bias to improve sentiment classification in real-world applications.

bag-of-words confusionmatrix googlecollab gridsearch-crossvalidation matplotlib-pyplot naive-bayes-classifier numpy pandas seaborn sklearn svm-classifier tfidf-vectorizer

Last synced: 07 Jan 2026

https://github.com/sanjurajveer/moview_review_analysis_nlp

Analysing movie reviews using NLP and categorising them into good and bad.

nlp-machine-learning nltk-python perplexity tfidf-vectorizer tsne-algorithm

Last synced: 25 Jun 2025

https://github.com/sridharyadav07/ai--powered-task-management-system

An intelligent Task Management System that integrates Sentiment Analysis, Task Optimization, and Forecasting to streamline project and task handling. This AI-powered tool is designed to assist teams and project managers in making data-driven decisions by understanding emotional context, forecasting productivity, and optimizing workload distribution

arima flask joblib jupyter-notebook naive-bayes-classifier nltk numpy pandas pickle-file python randomforestregressor scikit-learn stopwords-removal streamlit tfidf-vectorizer

Last synced: 08 Apr 2026

https://github.com/bhaskrr/restaurant-reviews-5-class-rating-prediction-

This repo contains the dataset and notebook for the Kaggle restaurant reviews five-class rating prediction.

kaggle-dataset machine-learning natural-language-processing randomoversampler rating-prediction tfidf-vectorizer

Last synced: 27 Jun 2025

https://github.com/pthmhatre/stylescribe-using-generative-adversarial-network

A fashion AI-based model capable of generating images from textual descriptions. The model should take natural language text as input and generate images that visually represent the given text. This text-to-image generation system bridges the gap between textual descriptions and visual content.

deep-neural-networks flask-application generative-adversarial-network generative-ai googlecloudplatform hyperparameter-tuning keras-tensorflow neural-networks nlp os pillow rdp-connection scipy sklearn-metrics spacy-nlp texttoimage tfidf-vectorizer

Last synced: 30 Jan 2026

https://github.com/snehawk20/log_anomaly_detection

Detecting anomalous log entries

logistic-regression tfidf-vectorizer

Last synced: 10 Sep 2025

https://github.com/ahmad-ali-rafique/mail-spam-detection-ml

This repository contains a machine learning project for email spam detection. It includes data preprocessing, model training, evaluation, and deployment using Python and scikit-learn.

artificial-intelligence data-science dataanalysis datavisualization linear-regression machine-learning modeling scikitlearn-machine-learning spam-detection tfidf-vectorizer

Last synced: 05 Mar 2025

https://github.com/kush1912/text-classification

This is one of the projects done out of interest in learning the differences between classification algorithms and drawing a solid conclusion from them. It scrapes data from YouTube related to six different classes, then classifies it using different classification algorithms.

beautifulsoup4 naive-bayes-algorithm neural-network randomforest selenium-webdriver svm-classifier text-classification tfidf-vectorizer webscrapping xgboost-algorithm youtube

Last synced: 09 Apr 2026