An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/gurpreet0022/nlp_exploration

This repository explores various Natural Language Processing (NLP) techniques using the NLTK library in Python. It demonstrates these techniques on a sample dataset and performs sentiment analysis on movie reviews.

beginner-friendly nlp nlp-machine-learning nltk scikit-learn

Last synced: 30 Apr 2026

https://github.com/coder5omkar/logistic-regression-customer-churn-prediction

This project uses Logistic Regression to predict customer churn in the telecom industry. To run, clone the repository, install dependencies, and run the Jupyter notebook for full analysis and predictions.

logistic-regression ml pandas scikit-learn seaborn statistics

Last synced: 20 Apr 2026

https://github.com/abdiasarsene/customer_segmentation_for_a_marketing_campaign

Use unsupervised learning techniques to segment a company’s customers into distinct groups in order to personalize marketing campaigns. The ultimate goal is to propose specific marketing strategies for each customer segment based on the insights obtained.

acp kmeans-clustering matplotlib pandas plotly python scikit-learn seaborn

Last synced: 08 Mar 2025

https://github.com/paulinhok14/property-insight-sample

Property Insight is an app that helps you identify amazing real estate opportunities, leveraging AI models to estimate a property's fair value and compare it to current prices.

ai docker fastapi python pytorch real-estate scikit-learn streamlit

Last synced: 11 Apr 2026

https://github.com/jo-minseok/global-warming-100year

🌡️ ML predictions of global temperature, sea level, Arctic ice, and carbon through 2100 [Completed]

arima-model global-warming machine-learning matplotlib numpy pandas scikit-learn seaborn

Last synced: 11 Apr 2026

https://github.com/ahmedheakl/diabetes_classification_svm

Classifying whether patients have diabetes using a Support Vector Machine model.

machine-learning python scikit-learn

Last synced: 13 Apr 2026

https://github.com/daniil-leshchev/spotify_ml

Track Popularity Prediction based on Spotify Data

eda keras ml pandas scikit-learn

Last synced: 12 Apr 2026

https://github.com/kishankrishna1/spam-classifier

Developed a Machine Learning-based Spam Classifier using Multinomial Naive Bayes to identify and filter spam messages with high precision

matplotlib numpy pandas python scikit-learn seaborn

Last synced: 02 Apr 2026

https://github.com/bishopce16/credit_risk_analysis

An analysis that builds and evaluates supervised machine learning models on a LendingClub dataset to assess credit risk.

imbalanced-learning jupyter-notebook machine-learning machine-learning-algorithms pandas python scikit-learn visual-studio-code

Last synced: 11 Apr 2026

https://github.com/anibalalpizar/python-machine-learning-example

This code reads and preprocesses a dataset for classification using pandas, numpy, matplotlib and scikit-learn. The dataset is split into three parts for training, validation and testing. The data is then scaled and optionally oversampled for balanced classes.

machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 11 Apr 2026
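
The workflow described for python-machine-learning-example (three-way split, scaling, optional oversampling) might look roughly like the sketch below. This is not the repository's code: the 60/20/20 proportions, the synthetic data, the helper name `split_scale_oversample`, and the use of `sklearn.utils.resample` for random oversampling are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample


def split_scale_oversample(X, y, oversample=True, seed=0):
    """Hypothetical helper: split into train/val/test, scale, then balance train."""
    # Assumed 60/20/20 split, stratified so class ratios are preserved.
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)

    # Fit the scaler on the training part only to avoid leaking statistics.
    scaler = StandardScaler().fit(X_train)
    X_train, X_val, X_test = (
        scaler.transform(a) for a in (X_train, X_val, X_test))

    if oversample:
        # Random oversampling: top up each smaller class with resampled rows.
        classes, counts = np.unique(y_train, return_counts=True)
        target = counts.max()
        parts_X, parts_y = [], []
        for c, n in zip(classes, counts):
            Xc = X_train[y_train == c]
            if n < target:
                extra = resample(Xc, replace=True, n_samples=target - n,
                                 random_state=seed)
                Xc = np.vstack([Xc, extra])
            parts_X.append(Xc)
            parts_y.append(np.full(len(Xc), c))
        X_train = np.vstack(parts_X)
        y_train = np.concatenate(parts_y)

    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Synthetic, imbalanced data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.array([0] * 160 + [1] * 40)
train, val, test = split_scale_oversample(X, y)
```

Only the training part is oversampled, so the validation and test parts keep the original class ratio and still reflect how the model would behave on unseen data.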

https://github.com/vishal-verma-96/pre-owned-car-price-prediction-using-streamlit-app

Capstone project by Skill Academy: exploratory analysis, visualization, and prediction of used car prices. Deploys the highest-scoring model with a Streamlit web app.

data-analysis data-science jupyter-notebook machine-learning machine-learning-algorithms matplotlib numpy pandas python3 regression-algorithms scikit-learn seaborn streamlit

Last synced: 11 Apr 2026

https://github.com/nfordumass/hot-seat

Machine Learning Dashboard and Engine for Predicting NFL Coach Firings

astro machine-learning react scikit-learn supabase typescript

Last synced: 09 Mar 2025

https://github.com/krish57-bit/diabetes-prediction-

A comprehensive machine learning pipeline to predict the onset of diabetes using the PIMA Indian Diabetes dataset. This includes data cleaning, visualization, outlier detection, standardization, SMOTE-based imbalance handling, and multiple classification algorithms (Logistic Regression, Naive Bayes, and KNN).

classification data-science diabetes healthcare jupyter-notebook machine-learning python scikit-learn smote

Last synced: 07 May 2026

https://github.com/nikhilchaudhary1/commodity-price-prediction

A Python application for predicting commodity prices (e.g., Pulses, Bread) based on state, city, year, and month using a Linear Regression model. Trained on over 1 million government dataset entries, featuring efficient data processing and prediction capabilities.

commodity-price-prediction data-processing linear-regression machine-learning pandas python scikit-learn

Last synced: 20 Apr 2026

https://github.com/itsdawei/qsc-airplane

A regression model predicting airline stock prices based on public flight data.

regression-analysis scikit-learn statsmodels stock-price-prediction

Last synced: 17 May 2026

https://github.com/balavenkatesh3322/loan-default-prediction

An end-to-end machine learning project to predict loan default risk. Includes Exploratory Data Analysis (EDA), feature engineering, a Gradient Boosting model, and a proposed system architecture for deployment.

data-science deep-learning feature-engineering gradient-boosting loan-default-prediction machine-learning scikit-learn tutorial-exercises

Last synced: 17 May 2026

https://github.com/anandparayil/sign-language-translator

Real-Time AI-Based Sign Language Translator using MediaPipe, Random Forest, and Tkinter GUI.

jupyter-notebook mediapipe opencv python pyttsx3 scikit-learn tkinter-gui

Last synced: 07 Apr 2026

https://github.com/arnoldchrisoduor1/machines

Testing the limits of machines

pytorch scikit-learn tensorflow

Last synced: 11 Apr 2026

https://github.com/Gamowy/Music-Classification

Music genre classification using a k-nearest neighbors classifier on the GTZAN dataset

machinelearning python scikit-learn university-assignment

Last synced: 17 Jul 2025

https://github.com/ayushtiwari134/machine_learning_models

A repo where I upload all the models I train during my journey of learning machine learning from scratch

linear-regression logistic-regression machinelearning matplotlib numpy pandas python random-forest scikit-learn

Last synced: 11 Apr 2026

https://github.com/broodhoney/titanic-ml-from-disaster

This repository contains my analysis and solutions for the Titanic: Machine Learning from Disaster competition on Kaggle. The notebook explores the dataset, performs extensive Exploratory Data Analysis (EDA), applies feature engineering techniques, and builds predictive models to determine survival outcomes based on passenger data

machine-learning numpy pandas python scikit-learn scikitlearn-machine-learning

Last synced: 11 Apr 2026

https://github.com/shru924/ecommerce_customer_behavior_analysis

A machine learning project that analyzes and segments e-commerce customers based on behavior patterns using Python, Random Forest, and data visualization.

customer-segmentation data-analysis jupyter-notebook machine-learning matplotlib pandas python scikit-learn

Last synced: 11 Apr 2026

https://github.com/jgavinb/customer-churn-ml

Customer Churn prediction using various ML models. Interactive predictions via Streamlit webapp.

joblib machine-learning pkl python scikit-learn streamlit streamlit-application streamlit-webapp

Last synced: 11 Apr 2026

https://github.com/ansh-info/industrial-scale-penicillin-simulation

Optimizing industrial-scale penicillin production using machine learning and data analysis.

jupyter-notebook machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 11 Apr 2026

https://github.com/ahmedshahriar/restaurant-menu-pricing

Predict menu prices from 5M+ UberEats menus with an end-to-end MLOps pipeline: crawl → DWH → curate → train → deploy on Azure ML (MLflow) via APIM & CLIs.

azure azureml bert-embeddings docker fastapi github-actions huggingface machine-learning mlflow mlops optuna python restaurant-menu scikit-learn scrapy tensorflow transformers uber-eats web-crawler

Last synced: 03 Feb 2026

https://github.com/sshBuilder/Movie-recommendation-system

The primary goal of this project is to provide personalized movie recommendations to users based on their preferences and the characteristics of the movies. This is achieved through a multi-step process involving data preprocessing, text vectorization, and recommendation generation.

anaconda-environment data-science jupyter-notebook machine-learning movie-recommendation movies pandas python3 recommendation-system recommender-system scikit-learn scikitlearn-machine-learning

Last synced: 28 Apr 2025

https://github.com/rakibhhridoy/differentprojects

Some of the learning projects I practiced on while getting started in data science. Not all of them, just a few that were stored in my local repository. They may be useful for beginner data science enthusiasts. Explore and learn!

data-science deep-learning machine-learning mathematics matplotlib numpy pandas python scikit-learn seaborn statistics

Last synced: 11 Apr 2026

https://github.com/rakibhhridoy/appliedmachinelearninghousing-regression

This project uses the Housing dataset, which contains information about different houses in Boston. The data was originally part of the UCI Machine Learning Repository and has since been removed from it. It was also accessible from the scikit-learn library through `load_boston`, which was removed in scikit-learn 1.2. The objective is to predict house prices using the given features.

deep-learning housing-market housing-prices machine-learning numpy pandas python real-estate regression scikit-learn

Last synced: 05 Apr 2026

https://github.com/amnydv17/landmark-detection

This project aims to leverage the power of deep learning models to automatically detect and pinpoint landmarks such as famous monuments, buildings, natural landmarks, and other recognizable structures within images.

machine-learning matplotlib numpy pandas python3 scikit-learn seaborn tensorflow

Last synced: 11 Apr 2026

https://github.com/dllllb/ds-pipeline

Data science model pipeline based on the scikit-learn Estimator API

data-science machine-learning python scikit-learn

Last synced: 16 Apr 2026

https://github.com/amiriiw/text_classification

Welcome to the Text Classification Project! This project is designed to train a model for classifying texts based on their emotional content and then using it to categorize new texts into corresponding emotional categories.

keras numpy pandas pickle scikit-learn tensorflow text-classification

Last synced: 20 Jan 2026

https://github.com/samuele-lolli/data-analytics-techniques

A practical approach to data analytics pipeline.

numpy pandas pytorch scikit-learn

Last synced: 11 Apr 2026

https://github.com/ismaelvr1999/air-quality-clustering

This project focuses on analyzing air quality data and categorizing it into clusters using the K-Means algorithm.

jupyter-notebook machine-learning matplotlib pandas python scikit-learn

Last synced: 05 Mar 2026

https://github.com/luona-zhang/kaggle-data-science-competitions

This repository contains code developed for participating in Kaggle Data Science competitions.

fitting-algorithm machine-learning model-evaluation numpy pandas scikit-learn seaborn tensorflow

Last synced: 07 Apr 2026

https://github.com/dinuka-rp/python-machine-learning

This repository contains the projects that I followed to learn Machine Learning with Python

machine-learning python scikit-learn

Last synced: 11 Apr 2026

https://github.com/jofaval/iris-flowers

Multiclass classification of the famous Iris flower dataset introduced by Ronald Aylmer Fisher in 1936

classification data-analysis data-science data-visualization google-colab iris-flowers kaggle machine-learning python scikit-learn xgboost

Last synced: 05 Apr 2026

https://github.com/hazim-hf/machine-learning

This repository contains materials and implementations related to machine learning concepts, techniques, and algorithms. The focus is on building self-learning computer systems that improve through experience and data. The course explores fundamental and advanced topics in machine learning, with applications in Big Data across various fields.

decision-trees neural-network pytorch reinforcement-learning scikit-learn support-vector-machine tensorflow unsupervised-learning

Last synced: 07 Apr 2026

https://github.com/vyjayanthipolapragada/fraud_detection_creditcard

Detecting fraudulent credit card transactions by training a Decision Tree model using scikit-learn and SnapML

classification-model data-preprocessing decision-tree-classifier kaggle-dataset machine-learning numpy pandas python scikit-learn snapml time tree-model

Last synced: 11 Apr 2026

https://github.com/divs-spec/skysync

SkySyncSwarm is a unified drone swarm simulation and control platform that merges the best of UAV simulators, swarm coordination libraries, deep learning models, and autonomous mission planning systems into one cohesive project.

ai-agents flask matlab python3 rrt scikit-learn scipy tcp

Last synced: 11 Apr 2026

https://github.com/infinitode/scikit-learn-decisiontreeclassifier-updater

An open-source tool to convert older Scikit-learn DecisionTreeClassifier models to the newer version.

ai classifier cli converter decisiontree python scikit-learn sklearn tools

Last synced: 31 Mar 2025

https://github.com/dharma-acha/resnet18_imageclassification_cnn

In this part of the project, we implement ResNet-18 from scratch using PyTorch and train it on an image dataset to achieve over 75% accuracy. We apply techniques to prevent overfitting and optimize performance, aiming for an accuracy of 80% or higher.

matplotlib numpy python3 pytorch scikit-learn seaborn

Last synced: 11 Apr 2026

https://github.com/mborrillo/ranking-ciudades-espana

End-to-end multi-criteria analysis system that evaluates 50 Spanish cities on quality of life using official data

business-intelligence data-analysis multi-criteria-decision-analysis pandas python3 quality-of-life ranking-system scikit-learn scoring-models

Last synced: 13 Jan 2026

https://github.com/tejaswirupa/early-prediction-of-diabetes-risk-using-machine-learning

Built a predictive model using CDC health data to identify individuals at risk of developing diabetes. Achieved 90.6% F1-score using Logistic Regression and revealed key health indicators like BMI and blood pressure as top predictors.

data-science datacleaning exploratory-data-analysis modelevaluation preprocessing-data python scikit-learn supervised-machine-learning

Last synced: 15 Jul 2025

https://github.com/agnivchtj/us-census-classifier

Find the optimal classification algorithm that can predict salaries above $50k, based on US Census data.

census-data decision-tree-classifier jupyter-notebooks knn-classifier logistic-regression naive-bayes-classifier python scikit-learn svm-classifier

Last synced: 07 May 2026

https://github.com/shakeel-data/amazon-sales-forecasting-python-bigquery-ml

An end-to-end analytics project using Python, SQL, & ML to forecast Amazon sales and segment customers. We build predictive models (LightGBM, Prophet) and clustering (KMeans) to deliver actionable insights for revenue growth and targeted marketing.

bigquery kmeans-clustring lightgbm linear-regression prophet-facebook scikit-learn

Last synced: 09 May 2026

https://github.com/snghrsw/kikagaku-ml-learning

Simple and multiple regression analysis in Python, plus regression and classification with deep learning

liner-regestion multiple-regression numpy pandas python scikit-learn

Last synced: 11 Apr 2026

https://github.com/epomatti/python-machine-learning

Simple examples of ML using Python

machine-learning python scikit-learn

Last synced: 11 Apr 2026

https://github.com/anthippi/naive-bayes-imdb-classification

A custom Naive Bayes classifier for sentiment analysis of movie reviews from the IMDb dataset, utilizing feature selection based on Information Gain and comparing its performance with scikit-learn's BernoulliNB.

classification imdb matplotlib naive-bayes-classifier numpy pandas scikit-learn sklearn

Last synced: 09 Apr 2026

https://github.com/praatibhsurana/breast-cancer-prediction-svm

An SVM classifier coded in Python using scikit-learn to classify whether a patient's tumor is malignant or benign.

kaggle-dataset linear-classifier machine-learning-algorithms python scikit-learn svm-classifier

Last synced: 16 May 2026

https://github.com/rohan3122k/predicting-energy-consumption-using-ann-with-pca

This project leverages Artificial Neural Networks (ANNs) with Principal Component Analysis (PCA) to predict energy demand efficiently. By reducing dimensionality while retaining 95% variance, the model achieves an R² score of 0.9815 and MAE of 523.71 MW. Deployed via Streamlit & GitHub.

artificial-neural-networks electricity git keras machine-learning pca predictive-modeling python scikit-learn streamlit-webapp tensorflow

Last synced: 10 Apr 2026
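
The idea of reducing dimensionality while retaining 95% of the variance before a neural network, as described for predicting-energy-consumption-using-ann-with-pca, can be sketched as follows. The project itself lists Keras; here scikit-learn's `MLPRegressor` is swapped in as a stand-in, and the synthetic data, layer size, and iteration count are assumptions, so the scores reported in the entry are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=300)

model = make_pipeline(
    StandardScaler(),
    # A float n_components asks PCA to keep the smallest number of
    # components whose cumulative explained variance reaches that fraction.
    PCA(n_components=0.95, svd_solver="full"),
    # Small multilayer perceptron standing in for the project's ANN.
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(X, y)

pca = model.named_steps["pca"]
retained = pca.explained_variance_ratio_.sum()
print(f"kept {pca.n_components_} of {X.shape[1]} components, "
      f"explaining {retained:.1%} of the variance")
```

Putting the scaler and PCA inside the pipeline means both are fitted only on the data passed to `fit`, which keeps cross-validation or a later train/test split free of leakage.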

https://github.com/pramodyasahan/learn-ml

This repository serves as both a personal learning diary and a resource for others interested in understanding and applying machine learning concepts. The projects are categorized based on the type of ML model and are implemented in Python using libraries like scikit-learn, pandas, and numpy.

classification clustering machine-learning matplotlib numpy pandas regression scikit-learn supervised-learning unsupervised-learning

Last synced: 07 Apr 2026

https://github.com/joemathew2004/cancer_prediction

This project implements a machine learning model (Logistic Regression) trained on the Breast Cancer dataset to predict if a tumor is benign or malignant. It includes a Python script for training the model, a terminal-based prediction tool, and a web application built with Streamlit for interactive predictions.

cancer-prediction classification csv joblib logistic-regression machine-learning python scikit-learn streamlit web-application

Last synced: 07 May 2026

https://github.com/aneeshmurali-n/ann-diabetes-prediction

Predicting diabetes progression using an Artificial Neural Network (ANN). This project leverages the scikit-learn diabetes dataset for training and evaluation. Includes data preprocessing, model building, and performance visualization.

ann data-preprocessing data-visualization deep-learning diabetes-prediction exploratory-data-analysis keras machine-learning matplotlib neural-network numpy pandas regression scikit-learn seaborn tensorflow visualization

Last synced: 07 Apr 2026

https://github.com/abdiasarsene/developpement_tableau_de_bord_de_la_chaine_approvisionnement_power_bi

Develop a complete solution to visualize, analyze, and predict supply chain data.

ci-cd docker fastapi github-actions mysql-database randomizedsearchcv scikit-learn seaborn-plots

Last synced: 23 Jun 2025

https://github.com/nirmaldeepponnada/codeclauseinternshipproject1

This project involves Customer Segmentation using K-Means clustering to group customers based on Recency, Frequency, and Monetary (RFM) analysis from the Online Retail dataset. It also performs Sentiment Analysis on Amazon Product Reviews using Natural Language Processing techniques & Logistic Regression to classify reviews as positive or negative.

kmeans logistic-regression numpy pandas python3 regular-expressions scikit-learn tf-idf-vectorizer

Last synced: 11 Apr 2026
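
The RFM-based K-Means segmentation mentioned for codeclauseinternshipproject1 could be sketched along these lines. The tiny transaction table, its column names, and the choice of two clusters are hypothetical and only illustrate how Recency, Frequency, and Monetary values are derived before clustering; the Online Retail dataset itself is not used here.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction log; column names are assumptions.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2, 3, 4, 4],
    "date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10", "2024-02-20",
        "2024-03-15", "2023-11-30", "2024-03-10", "2024-03-18",
    ]),
    "amount": [20.0, 35.0, 5.0, 7.5, 6.0, 120.0, 60.0, 80.0],
})

# Recency is measured from the day after the last recorded purchase.
snapshot = tx["date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (snapshot - d.max()).days),  # days since last order
    frequency=("date", "count"),                            # number of orders
    monetary=("amount", "sum"),                             # total spend
)

# Standardize so no single RFM dimension dominates the distance metric.
scaled = StandardScaler().fit_transform(rfm)
rfm["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
print(rfm)
```

In practice the number of clusters is usually chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.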

https://github.com/richardbmk/datascience_machinelearning

Projects related to data science and machine learning.

data-science machine-learning matplotlib numpy pandas scikit-learn scipy seaborn

Last synced: 11 Apr 2026

https://github.com/agnivchtj/ann

Develop an Artificial Neural Network that can classify inputs based on a number of features

backpropagation-algorithm jupyter-notebooks python scikit-learn

Last synced: 07 May 2026

https://github.com/dmarks84/coursework_project_ml-classifier-eval-selection

Project for University of Michigan Applied Data Science Specialization -- Predicted viewer engagement based on features related to video metrics; evaluated a large set of classifiers under different scoring metrics to select the "optimal" one.

classification cross-validation data-modeling data-reporting data-visualization databases dataframes eda grid-search matplotlib numpy pandas python scikit-learn statistics supervised-ml

Last synced: 02 Apr 2026

https://github.com/akapich/clustermatic

Python AutoML library for clustering tasks

automl clustering machine-learning scikit-learn

Last synced: 11 Feb 2026

https://github.com/hariprasath-v/av-dataverse-hack---insurance-claim-prediction

Create a machine learning model to predict if the policyholder will file a claim in the next 6 months or not based on the set of car and policy features.

analyticsvidhya classification exploratory-data-analysis f1-score matplotlib numpy pandas python randomforest-classification scikit-learn seaborn shap

Last synced: 11 Apr 2026

https://github.com/lorenzorottigni/dl-lending-club

Deep Learning python bootcamp: deep learning on Lending Club dataset

deep-learning ipynb keras machine-learning numpy pandas python scikit-learn seaborn tensorflow

Last synced: 11 Apr 2026

https://github.com/fikri-rouzan/energy-consumption-prediction

Final Project for the AI/ML Weekly Class by Google Developer Group on Campus (GDGoC) UIN Jakarta.

jupyter-notebook matplotlib numpy pandas python scikit-learn scipy seaborn

Last synced: 07 Apr 2026

https://github.com/pb319/california_house-price-prediction

This is my first end-to-end ML project implementation, covering all required stages, with guidance from the book "Hands-On Machine Learning".

evaluation-metrics hyperparameter-tuning jupyter-notebook kfold-cross-validation machine-learning matplotlib numpy pandas python scikit-learn seaborn train-test-split

Last synced: 11 Apr 2026

https://github.com/atharvapathak/size_estimator_project

This project in Python aims to provide a tool for estimating the size of objects in images or videos. Using computer vision techniques, the project analyzes the input media, detects objects of interest, and provides an estimation of their size based on known reference points or objects.

cicd cnn opencv python pytorch rnn scikit-learn sql tensorflow

Last synced: 11 Apr 2026

https://github.com/nicolas-giacomelli/modelo-previsao-colesterol-com-gradio

Cholesterol prediction model based on input information. The model includes pipelines for data processing. Made available via Gradio.

gradio machine-learning matplotlib pandas pingouin python3 saude scikit-learn scipy seaborn

Last synced: 11 Apr 2026

https://github.com/devinw03/movie-genre-nlp

🎬 Classify movie genres from plot summaries using various models, including Transformers, with clear EDA and MLflow tracking for reproducible results.

cosine-similarity countvectorizer datascience distilroberta gru huggingface imdb machine-learning multi-label-classification nlp python pytorch recommendation-engine scikit-learn slack tabulate text-classification word2vec

Last synced: 11 Apr 2026

https://github.com/gayathri2200/car-price-prediction---machine-learning

Car price prediction with machine learning: predicts the price of used cars based on their features.

data-science machine-learning modeldeployment pandas price-prediction python regression scikit-learn streamlit visual-studio visualization

Last synced: 11 Apr 2026

https://github.com/ejw-data/ml-classification-exoplanet

Classification of planets identified by the Kepler telescope using multiple models, tuned with GridSearchCV

classification python scikit-learn

Last synced: 09 May 2026

https://github.com/ejw-data/ml-clustering-personality

Analysis of the Big Five personality test survey results using clustering techniques.

clustering machine-learning python scikit-learn unsupervised-learning

Last synced: 04 May 2026

https://github.com/ejw-data/ml-classification-grants

Compares several machine learning classification models including a neural network to determine whether to approve or reject a grant applicant

classification neural-network python scikit-learn

Last synced: 10 May 2026

https://github.com/muhdhammad/machine-learning

Crafted for hands-on learning and implementation of ML with scikit-learn

data-science jupyter-notebook machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 07 Apr 2026

https://github.com/kaguya163/marketing_campaigns

Analysis of marketing effectiveness in sports retail.

ab-testing machine-learning matplotlib numpy pandas python scikit-learn scipy sqlite3

Last synced: 11 Apr 2026

https://github.com/nauxqouh/python-for-data-science-labs

This repo contains the weekly practical code for my university Python for Data Science course.

data-science jupyter-notebook numpy pandas python pytorch scikit-learn

Last synced: 11 Apr 2026

https://github.com/dyarleniber/hands-on-machine-learning

This repository contains code examples, exercises, and projects related to the concepts covered in the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 3rd Edition" by Aurélien Géron.

ai artificial-intelligence keras machine-learning matpolotlib numpy pandas scikit-learn tensorflow

Last synced: 11 Apr 2026

https://github.com/sudarsann27/basic_machine_learning_algorithms

Basic Machine learning algorithms using scikit-learn and other fundamental libraries

data-science data-visualization ensemble-model kaggle numpy pandas scikit-learn supervised-machine-learning

Last synced: 20 Jan 2026

https://github.com/stella4444/linear-regression

Learning about linear regression (currently a work in progress) and working with data

linear-regression machine-learning numpy scikit-learn

Last synced: 20 Jan 2026

https://github.com/abdiasarsene/healthpredict-api-smart-medical-diagnosis-system

This project provides an intelligent API built with FastAPI to predict diseases from patients' medical data. The application relies on a machine learning model (Logistic Regression) managed via MLflow and can be easily deployed with Docker.

bentoml docker-compose dockerfiles jenkinsfiles mlflow pandas ray-serve scikit-learn taskfile

Last synced: 11 Apr 2026

https://github.com/vickshan001/tweet-sentiment-classifier-nlp-svm-project

NLP coursework project using SVM to classify tweet sentiments. Features custom preprocessing, error analysis, and cross-validation.

natural-language-processing nlp python scikit-learn sentiment-analysis svm text-classification tweets

Last synced: 31 Mar 2025

https://github.com/adi3042/sensor-fault-detection

🔍⚙️ Ensure Reliable Operations! Detect anomalies and prevent disruptions with our Sensor Fault Detection system. Explore advanced classification and regression techniques to identify and address sensor faults effectively. Your path to robust and accurate sensor data begins here! 🚨🔧 SensorFaultTech

classification css datetime fault-detection flask functools html ipykernel jupternotebook machine-learning numpy pandas python3 readme regression scikit-learn sensor setuptools venv

Last synced: 11 Apr 2026