An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/cinnaavox/loan-prediction

Machine Learning project predicting loan approvals using Decision Tree Classification. Includes data cleaning, feature engineering, model evaluation and key business insights.

decision-trees matplotlib numpy pandas python scikit-learn seaborn

Last synced: 14 Apr 2026

https://github.com/khushirajurkar/exoplanet-habitability-prediction-model

Predicts whether an exoplanet is habitable using ML. Handles class imbalance with ADASYN, tests multiple models, and saves the best one. Includes confusion matrices, ROC curves, and a clean Jupyter notebook

adasyn astroinformatics confusion-matrix exoplanets logistic-regression machine-learning multiclass-classification python roc-curve scikit-learn smote

Last synced: 06 May 2026

https://github.com/katjaweb/king-county-house-price-prediction

This project aims to predict house prices based on various features such as square footage, number of rooms or location.

machine-learning python regression scikit-learn

Last synced: 19 Jan 2026

https://github.com/afkewolczyk/data_science_bootcamp

A data science project for learning essentials such as pandas, Matplotlib, and scikit-learn.

ai data-science machine-learning pandas scikit-learn

Last synced: 07 May 2026

https://github.com/gabrielmazzotta/nlp-clustering--movie-similarity-from-plot-summaries

A Python-based movie recommendation system leveraging NLP and clustering techniques. This project includes data processing, vectorization of plot summaries, and the implementation of recommendation algorithms to suggest similar movies based on user input.

clustering cosine-similarity hierarchical-clustering kmeans lemmatization nlp recommendation-engine scikit-learn similarity-score spacy tokenization

Last synced: 21 Jan 2026

https://github.com/nisch-mhrzn/house_prediction

This project predicts house prices using data exploration, feature engineering, and machine learning models like Linear Regression and Random Forest. It demonstrates how to optimize models and evaluate their performance to accurately forecast house prices.

matplotlib numpy pandas python scikit-learn seaborn

Last synced: 08 Apr 2026

https://github.com/ayorick23/python-data-science-cheat-sheet

Quick, practical guide to the essential syntax, commands, and functions of Python for data science. Ideal for recalling how to use the most common libraries, such as NumPy, Pandas, Matplotlib, and Scikit-learn, in your day-to-day analyses.

cheat-sheet data-analysis data-science data-visualization deep-learning jupyter-notebook machine-learning matplotlib ml numpy pandas python scikit-learn scipy seaborn statistics sympy tensorflow

Last synced: 07 Apr 2026

https://github.com/harris-giki/cancerdetectionmodel_ml

Simple Logistic Regression and Neural Network powered Machine Learning models that predict whether a breast tumor is malignant or benign based on input features extracted from a breast cancer dataset.

cancer-detection development keras keras-tensorflow logistic-regression machine-learning neural-network scikit-learn streamlit tensorflow

Last synced: 13 Apr 2026

https://github.com/farhad-here/predict_student_performance

Predict Student Performance is a data analysis and machine learning project aimed at predicting students' final performance (g3) based on demographic, family, and academic features. The project supports both regression (predicting exact grades) and classification (Pass/Fail categories).

classification data-analysis data-visualization linear-regression machine-learning numpy pandas postgresql powerbi scikit-learn streamlit

Last synced: 14 Apr 2026

https://github.com/icepanorama/internship-visualizations-and-demonstrations

A collection of some of the programs that I've written over the course of my internship.

artificial-intelligence machine-learning matplotlib numpy pandas python3 pytorch scikit-learn

Last synced: 14 Apr 2026

https://github.com/haseeeb21/machine-learning-models

Machine learning models trained on scikit-learn datasets. This repository contains the code files and saved models trained on toy datasets (classification and regression) and a real-world dataset.

anaconda classification classification-models jupyter-notebook knn knn-classification machine-learning machine-learning-algorithms python3 regression regression-models scikit-learn scikit-learn-python scikitlearn-machine-learning svm svm-classifier vscode

Last synced: 07 May 2026

https://github.com/alexliap/sk_serve

Deployment of a Scikit-Learn model and its column transformations made easy.

machine-learning mlops model-deployment scikit-learn

Last synced: 24 Oct 2025

https://github.com/jofaval/sonar

Binary Classification of Sonar Signals of Rocks and Metal cylinders in 1987

data-analysis data-science data-visualization machine-learning python scikit-learn sonar uci

Last synced: 09 Apr 2026

https://github.com/pradeep31747/smartsuggest-personalized_product_recommendations

This project implements a personalized product recommendation system using machine learning techniques to enhance user experience and drive engagement.

jupyter-notebook keras numpy pandas pyhton scikit-learn sql tensorflow vscode

Last synced: 28 Jan 2026

https://github.com/itssahilwhat/ai-fundamentals

A curated collection of fundamental AI concepts, algorithms, and code implementations — including Machine Learning, Deep Learning, and Computer Vision — built from scratch and with practical examples.

computer-vision deep-learning machine-learning numpy pandas python pytorch scikit-learn

Last synced: 15 Apr 2026

https://github.com/manu-karenite/medical-insurance-cost-predictor

Medical Insurance Cost Generator is a linear-regression-based predictor that estimates the cost a person has to pay when buying medical insurance.

kaggle-dataset linear-regression machine-learning matplotlib numpy pandas python3 reactjs scikit-learn

Last synced: 15 Apr 2026

https://github.com/christiansandovalgarcia01-creator/megaline-plan-classifier

Classification model for recommending the Smart vs. Ultra plan (Megaline). 60/20/20 split, RandomForest as the winning model, test accuracy ≥ 0.75. Includes a confusion matrix and classification report. Stack: Python, Pandas, scikit-learn, Jupyter.

classification data-science jupyter-notebook machine-learning python random-forest scikit-learn telecom

Last synced: 15 Apr 2026

https://github.com/moustafamohamed01/breast-cancer-prediction

A machine learning model built with PyTorch to predict if a tumor is malignant or benign using the Breast Cancer Dataset. The model uses a neural network to classify the data and shows how to train, evaluate, and visualize results.

ai data-science deep-learning machine-learning neural-network python pytorch scikit-learn

Last synced: 15 Apr 2026

https://github.com/idaraabasiudoh/telco-churn-logistic-regression

A predictive model using logistic regression to identify customers likely to churn from a telecommunications company.

logistic-regression machine-learning python3 scikit-learn

Last synced: 01 Feb 2026

https://github.com/khanovico/python-stock-analyzer

A web app implemented in Python with several data science frameworks, enabling online stock trend analysis.

amcharts-js-charts data-analysis data-visualization flask javascript pandas python scikit-learn

Last synced: 02 Feb 2026

https://github.com/vladimiracunadev-create/python-data-science-program

Python Data Science Program: 197 lessons in 9 parts. Advanced syllabus derived from Géron, VanderPlas, Huyen, ISLP, and Barocas/Hardt/Narayanan. A personal resource for learning, teaching, and continuous improvement.

bootcamp data-analysis data-science education jupyter machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 01 Jun 2026

https://github.com/cego669/dirtycategoriesencoding

Repository containing two classes (StringAgglomerativeEncoder and StringDistanceEncoder) useful for grouping or visualizing the distance between dirty categorical variables. They are compatible with the scikit-learn API.

category clustering dimensionality-reduction dirty hierarchical-clustering machine-learning scikit-learn singular-value-decomposition svd

Last synced: 11 Feb 2026

https://github.com/selcia25/iris-dataset-classification

☘This repository contains a Python script for classifying the Iris dataset using the Random Forest algorithm.

data-processing iris-classification pandas random-forest-classifier scikit-learn

Last synced: 16 Apr 2026

https://github.com/sergeimakarovv/energy-data-analytics-ml

Analyzing global data on sustainable energy, predicting CO2 emissions per capita

machine-learning pandas plotly python scikit-learn streamlit

Last synced: 12 Feb 2026

https://github.com/quran-yeamen/serverlifecycleml

Predictive modeling of server lifecycle stages using synthetic data and machine learning.

data-science machine-learning predictive-modeling python scikit-learn synthetic-data

Last synced: 15 Feb 2026

https://github.com/hafidaso/predicting-industrial-machine-downtime-level-3

This project aims to develop a predictive model using machine learning techniques to forecast machine failures based on historical operational data.

imbalanced-learning numpy pandas python scikit-learn seaborn xgboost

Last synced: 16 Apr 2026

https://github.com/sergeimakarovv/solar-panel-detection

Applying deep learning models to detect solar panel installations in satellite imagery and estimating their generation capacity

albumentations convolutional-neural-networks deep-learning geopandas pandas pvlib python pytorch rasterio scikit-learn wms-service

Last synced: 16 Apr 2026

https://github.com/meiyor/abatech_ai_test

This repository contains the files for deploying an Exploratory Data Analysis (EDA) for participant demographic and company-based data collected by the outsourcing service given by the company ABATech located in Colombia. This repository also includes the evaluation of three different classifiers to decode the level of satisfaction of the users.

keras python scikit-learn scikitlearn-machine-learning tensorflow

Last synced: 16 Apr 2026

https://github.com/shreeparab1890/indian-cricketer-classifier

This notebook builds a model that predicts an Indian cricketer from a given image. The project covers 8 Indian cricketers and builds a model to classify a given image among these 8 cricketers.

image-classification matplotlib numpy opencv pandas python random-forest-classifier scikit-learn sklearn streamlit

Last synced: 01 Apr 2026

https://github.com/supershivam5/python_projects

💻 Python programming with NumPy, Pandas, Matplotlib. 🌟 Love exploring new technologies. Check out my projects!

matplotlib-pyplot numpy pandas scikit-learn seaborn

Last synced: 17 Apr 2026

https://github.com/zenklinov/regression_logistic_-_sentiment_analysis_movie_data

This repository contains code for performing sentiment analysis using scikit-learn and logistic regression

llm natural-language-processing nlp nltk scikit-learn sentiment-analysis

Last synced: 10 May 2026

https://github.com/dimdasci/car-price-prediction-demo

Demo project of EDA and regression task solution: Pandas, Jupyter Notebook, Scikit-learn, LightGBM

eda lightgbm-regressor regression scikit-learn

Last synced: 03 Jun 2026

https://github.com/amirmohammadgholampour/mall-customer-segmentation

Project for segmenting customers in a shopping mall using the Clustering algorithm.

numpy pandas python scikit-learn

Last synced: 02 Apr 2026

https://github.com/satyas567/weatherdataanalysis

Comprehensive Weather Data Analysis with Python: Explore trends, visualize patterns, detect outliers, and predict temperature using humidity and wind speed

jupyter-notebook linear-regression matplotlib numpy pandas python scikit-learn seaborn

Last synced: 02 Apr 2026

https://github.com/anshvaid4/ml_practice

A new repository containing notebooks that demonstrate the usage of various transformers and models for supervised and unsupervised algorithms.

anaconda jupyter-notebook machine-learning machine-learning-algorithms python scikit-learn

Last synced: 17 Apr 2026

https://github.com/a-poor/sample-model-serve

Demo for using Flask to serve a scikit-learn model as an API

api data-science docker flask machine-learning scikit-learn

Last synced: 30 Apr 2026

https://github.com/ngangawairimu/linear-regression-

This project builds a linear regression model in Python to predict outcomes and derive insights from feature data. It covers data cleaning, feature analysis, and model evaluation, showcasing predictive modeling techniques using scikit-learn, pandas, and visualization libraries.

data-analysis linear-regression machine-learning predictive-modeling python scikit-learn

Last synced: 17 Apr 2026

https://github.com/rohansardar/speechflowguard

A machine learning web API that detects toxic language in user comments using classical ML

docker logistic-regression machine-learning python3 scikit-learn tf-idf tfidf-text-analysis tfidf-vectorizer

Last synced: 17 Apr 2026

https://github.com/shaharband/calcofi-oceanographic-analysis

This repository contains an analysis of the CalCOFI (California Cooperative Oceanic Fisheries Investigations) dataset, which represents one of the longest and most complete time series of oceanographic and larval fish data in the world.

pandas regression scikit-learn

Last synced: 10 May 2026

https://github.com/minhtran241/ml-dl-llm-genai

Showcasing ML/DL fundamentals, paper implementations, deep learning models, and other projects. The purpose of this repository is to provide a playground for me to explore and learn about PyTorch, deep learning, and generative AI.

deep-learning generative-ai llm machine-learning paper-implementations pytorch scikit-learn

Last synced: 18 Apr 2026

https://github.com/gattsu001/telecom-churn-predictor

Predicts which telecom customers are likely to churn with about 95% accuracy using engineered features from usage, billing, and support data. Implements Sturges-based binning, one-hot encoding, stratified 80/20 train-test split, and a two-level ensemble pipeline with soft voting. Achieves 94.60% accuracy, 0.8968 AUC, 0.8675 precision, 0.7423 recall.

churn-prediction classification classification-algorithm customer-retention data-science data-visualization feature-engineering joblib jupyter-notebook machine-learning pandas scikit-learn supervised-learning svm

Last synced: 18 Apr 2026

https://github.com/pedroteixeiraw/variational_quantum_circuit_binary_classification

This project focuses on developing a Variational Quantum Circuit capable of performing Binary Classification between two classes: red wine and white wine, based on their characteristics using machine learning.

binary-classification cost-function json machine-learning matplotlib numpy pandas qiskit qiskit-machine-learning quantum-machine-learning scikit-learn training-data variational-circuit

Last synced: 04 Apr 2026

https://github.com/alainlebret/python-et-ia-1

Personal resources for the "Python & IA" (Python & AI) course, 2nd year of GPSE at ENSICAEN

artificial-intelligence image-processing machine-learning matplotlib numpy python scikit-image scikit-learn

Last synced: 04 Apr 2026

https://github.com/mnitin-reddy/a-b-testing-and-regression-analysis-for-ad-performance-optimization

Analyzed the performance of Facebook and AdWords ads using A/B testing and regression analysis to identify trends, correlations, and cost-effectiveness. Key insights included distribution of clicks and conversions, monthly trends, and cost-per-conversion analysis to optimize ROI.

abtesting data-science hypothesis-testing machine-learning matplotlib numpy pandas scikit-learn scipy seaborn statsmodels

Last synced: 04 Apr 2026

https://github.com/chengetanaim/high-school-alcoholism-and-academic-performance

Student Alcoholism and Academic Performance Data Analysis

jupyter-notebook scikit-learn

Last synced: 18 Apr 2026

https://github.com/eugen-goebel/predictive-analytics-agent

Automated ML pipeline — data profiling, preprocessing, model training, and evaluation report generation

automation data-science docker machine-learning predictive-analytics python scikit-learn streamlit

Last synced: 05 Apr 2026

https://github.com/alezoon/movie-revenue-prediction

scikit-learn practice using linear regression, along with ML workflow practice.

jupyter machine-learning matplotlib-pyplot numpy pandas python scikit-learn

Last synced: 05 Apr 2026

https://github.com/jeffandyalltogether/mlrecommendationsystem

Project code for a recommendation system for Amazon using collaborative filtering, ranking, and matrix factorization to enhance customer satisfaction and product discovery.

eda matplotlib pandas python scikit-learn seaborn tensorflow

Last synced: 05 Apr 2026

https://github.com/emilyfelker/ieee_cis_fraud_detection

Which online transactions are fraudulent? Program that uses various machine learning algorithms to detect fraud.

decision-trees kaggle logistic-regression machine-learning neural-network pandas poetry pytest python scikit-learn sklearn tensorflow xgboost

Last synced: 05 Apr 2026

https://github.com/nowon1/insurance-claim-prediction_version

This project aims to predict the insurance claim amounts based on various customer attributes using machine learning techniques. The project involves data preprocessing, exploratory data analysis, feature engineering, and model training and evaluation.

data-preprocessing data-science data-visualization exploratory-data-analysis feature-engineering insurance jupyter-notebook machine-learning numpy pandas predictive-modeling python random-forest regression-analysis scikit-learn

Last synced: 05 Apr 2026

https://github.com/lorenzorottigni/ml-movies

Machine Learning python bootcamp: Recommender Systems on movies dataset

ipynb machine-learning numpy pandas python recommender-system scikit-learn seaborn

Last synced: 05 Apr 2026

https://github.com/vijaykumarr1452/black_friday_sales_analysis

Black Friday Sales Analysis: a Python machine learning project using pandas and scikit-learn for data preprocessing, model training, and performance evaluation.

confusion-matrix jupyter-notebook machine-learning pandas python random-forest-classifier sales-analysis scikit-learn

Last synced: 19 Apr 2026

https://github.com/namratha2301/carprice_analysisandprediction

This project analyzes factors influencing vehicle prices using a dataset of various attributes, including Engine capacity, Power, Mileage, and Seating capacity.

data-analysis data-visualization exploratory-data-analysis machine-learning pandas predictive-modeling random-forest-classifier regression scikit-learn seaborn

Last synced: 20 Apr 2026

https://github.com/grandechowhiskey/harvard-cs50-ai-projects

This project contains a collection of programming assignments from CS50’s Introduction to Artificial Intelligence with Python course.

html python scikit-learn tensorflow

Last synced: 20 Apr 2026

https://github.com/abdel-17/facial-recognition

Facial recognition using Machine Learning in Python

machine-learning pca python scikit-learn

Last synced: 20 Apr 2026

https://github.com/ghufranbarcha/linear-regression-training-app

This project is a Streamlit application that allows users to upload a CSV file, select variables, and train a linear regression model. The app provides an easy-to-use interface for selecting dependent and independent variables, scaling data, applying polynomial regression, and evaluating model performance.

data-science machine-learning python scikit-learn streamlit

Last synced: 20 Apr 2026

https://github.com/h-sarhan/hate-speech-classifier

Automatic Detection of Hate Speech and Offensive Content

nlp python scikit-learn

Last synced: 22 Apr 2026

https://github.com/deliprofesor/cinematic-data-analytics-and-recommendation-platform

This project analyzes a movie dataset using machine learning algorithms to predict success, explore revenue-popularity relationships, and develop recommendation systems. It employs techniques like K-Means, DBSCAN, GMM, decision trees, PCA, and NLP for insights and personalized suggestions.

clustering content-based-recommendation data-analysis data-visualization decision-tree gmm k-means machine-learning natural-language-processing nlp pca predictive-modeling python recommendation-system scikit-learn user-based-recommendation

Last synced: 26 Apr 2026

https://github.com/leolion3/smartnanotubes-smellinspector-companion

Companion software for the SmellInspector Devices from SmartNanoTubes. Allows specifying substances, connecting multiple devices, collecting data and performing machine learning.

docker machine-learning python3 reactjs scikit-learn smartnanotubes smellinspector

Last synced: 27 Apr 2026

https://github.com/davidrpugh/kaust-dsa-201

Course materials for KAUST DSA 201

deep-learning machine-learning pytorch scikit-learn

Last synced: 27 Apr 2026

https://github.com/renoyegon/customer_segmentation_using_kmeans_clustering

This project applies KMeans clustering to segment customers in the Online Retail II dataset. Using powerful Python libraries such as pandas, scikit-learn, matplotlib, and seaborn, we uncover meaningful customer behavior patterns

kmeans-clustering matplotlib scikit-learn seaborn

Last synced: 28 Apr 2026

https://github.com/lmriccardo/moments-learning

Repository for the First-Second Moments Learning project. In this repo you will find an implementation of a learning model to learn the relationship between time-series model parameters and the first two moments of its outputs

machine-learning mean mlp-regressor models random-forest scikit-learn time-series torch variance

Last synced: 28 Apr 2026

https://github.com/ronverse17/loan-recovery-strategy

End-to-end ML project for predicting high-risk borrowers and recommending recovery actions

classification data-science kmeans-clustering machine-learning matplotlib random-forest-classifier scikit-learn seaborn

Last synced: 28 Apr 2026

https://github.com/brenofariasdasilva/dagster-education-model

Dagster Education Model using Dagster 1.3.11 and Python 3.7.17.

dagster makefile matplotlib pandas pyenv python3 scikit-learn seaborn shellscript

Last synced: 28 Apr 2026

https://github.com/arnab-0053/song-identifier

Identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.

bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing

Last synced: 28 Apr 2026

https://github.com/razalkr70/customer-segmentation-using-dataset

A data science project that segments mall customers using K-Means clustering. Based on age, income, and spending score, it identifies customer groups and visualizes them with 2D and 3D plots for targeted marketing insights.

clustering customer-segmentation data-science data-visualization kmeans machine-learning pca python scikit-learn

Last synced: 28 Apr 2026

https://github.com/arizdn234/spotify-api-with-colab

Crawling, analyzing, and clustering music data from the Spotify API

machile-learning scikit-learn spotify-api spotipy-library

Last synced: 28 Apr 2026

https://github.com/michaelzheng67/ml_classification_optimizer

Algorithm that determines the best machine learning classification model to use for a given dataset. Written in Python.

classification machine-learning python scikit-learn

Last synced: 29 Apr 2026

https://github.com/cserajdeep/elm-python-iris

Different Python implementations of Extreme Learning Machine (ELM) on Iris dataset

ann elm iris python scikit-learn

Last synced: 29 Apr 2026

https://github.com/christopherkindl/spotify-artist-success

Predicting artists’ success by using machine learning approaches on features identified in Spotify data

pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/inclinedadarsh/regression-metrics

A simple Jupyter notebook demonstrating how to use different metrics from the scikit-learn library.

jupyter-notebook machine-learning notebook scikit-learn

Last synced: 29 Apr 2026

https://github.com/nahom32/mlp-assignment

This repository implements a machine learning assignment demonstrating the machine learning process.

eda logistic-regression machine-learning scikit-learn

Last synced: 29 Apr 2026

https://github.com/andreaschatzopoulos/face-landmark-detector

Facial landmark detection using HOG features and Ridge Regression. Simple, effective, and fast – no deep learning required.

computer-vision face-detection hog image-processing landmark-detection python ridge-regression scikit-learn

Last synced: 29 Apr 2026

https://github.com/matheusvazdata/retail-sales-forecast-linreg-sklearn

Minimal project for retail sales forecasting using linear regression (scikit-learn).

forecasting linear-regression machine-learning matplotlib numpy pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/karmaniket/gtavcontrol

Created a dataset of different hand gestures and trained an ML model for real-time in-game control of GTA V. Have fun!

gaming gta5 machine-learning mediapipe opencv python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026

https://github.com/rishi-sutar/healwise-ai-your-way-to-wellness

Healwise-AI is a health diagnostic tool that uses a Support Vector Classifier (SVC) model to predict diseases based on user-reported symptoms. After predicting, it offers detailed health advice, including descriptions, diets, medications, and workouts related to the diagnosis.

machine-learning scikit-learn support-vector-machine

Last synced: 30 Apr 2026

https://github.com/das-debjit/emotion-detection

A simple ML-powered web app for real-time emotion detection from text using Streamlit and TF-IDF-based classification.

machine-learning nlp python scikit-learn sentiment-analysis streamlit text-classification tfidf web-app

Last synced: 30 Apr 2026