An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of NumPy and SciPy.

https://github.com/tbarlow12/learn-it-your-way

Using Python Flask, I wanted to create a simple web API that lets users upload a dataset, choose one or more models, store them server-side, and then call an endpoint to get a prediction.

flask machine-learning python scikit-learn tensorflow

Last synced: 29 Apr 2026

https://github.com/ledsouza/machine-learning-semisupervisionado

This project uses semi-supervised machine learning algorithms to classify milk quality as high, medium, or low.

data-science joblib machine-learning machine-learning-algorithms pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/pramodyasahan/grade-predictor

This project aims to predict student performance based on various features such as job, study time, failures, absences, and first and second period grades. The project utilizes a linear regression model from the scikit-learn library in Python.

machine-learning matplotlib numpy pandas python regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/fikri-rouzan/burnaway-capstone-data-science

An interactive analytics dashboard that maps the physical factors and work patterns that trigger burnout in software developers.

jupyter-notebook matplotlib pandas pillow plotly python scikit-learn seaborn statsmodels streamlit

Last synced: 08 Jun 2026

https://github.com/kumailn/machinelearning

Machine learning with Python

machine-learning python scikit-learn tensorflow

Last synced: 30 Apr 2026

https://github.com/boladjivinny/fire-prediction

Notebook for the CMU-Africa "Fighting Fire with Data" hackathon on Zindi. Ranked 5th on the public leaderboard and 8th on the private leaderboard. https://zindi.africa/hackathons/cmu-africa-fighting-fire-with-data

feature-engineering hackhathon machine-learning regression scikit-learn stacking

Last synced: 30 Apr 2026

https://github.com/harshitwaldia/disease_detection

A disease detection system in Python that uses a Random Forest classifier and a GUI to identify illnesses from the symptoms a user enters.

pandas-python python3 random-forest-classifier scikit-learn tkinter-gui

Last synced: 01 May 2026

https://github.com/kristishqau/sentimentanalysis_nlp

A project for sentiment analysis of tweets using various NLP techniques and machine learning models.

datascience jupyter-notebook machine-learning nlp nltk python scikit-learn sentiment-analysis xgboost

Last synced: 01 May 2026

https://github.com/sundanc/btcprediction

Predict Bitcoin prices based on historical data using machine learning techniques

bitcoin-prediction keras machine-learning pandas python python3 scikit-learn scikitlearn-machine-learning

Last synced: 02 May 2026

https://github.com/viniciusds2020/ml_pycaret_classificacao

A system for preprocessing data and training machine learning models with PyCaret. A low-code methodology for MLOps processes.

machine-learning mlops preprocessing pycaret python scikit-learn

Last synced: 03 May 2026

https://github.com/arrhythmia-detection/authorprovidedfeaturescombineddt

Deploys a vanilla Decision Tree for arrhythmia classification, trained on the Chapman ECG dataset, to an Arduino UNO board

arduino-uno arrhythmia-classification atmega328p chapman-ecg decision-tree-classifier eloquent scikit-learn

Last synced: 09 Jun 2026

https://github.com/zhenglinlei/zdmp

Industry 4.0 Optimization with Machine Learning AI

industry-4 knn-classification machine-learning pandas python scikit-learn

Last synced: 03 May 2026

https://github.com/apfirebolt/movie_recommendation_using_scikitlearn_and_pyqt5

A movie recommendation system built using a KNN model from the scikit-learn library. The GUI components are powered by PyQt5, a library for creating GUI applications in Python.

cosine-similarity jupyter-notebook knn-algorithm movie-recommedation pandas python scikit-learn

Last synced: 03 May 2026

https://github.com/pramodyasahan/binary-classifier

This repository houses the code for a machine learning model designed to predict customer churn. The model is built using Support Vector Machine (SVM) from the scikit-learn library and incorporates preprocessing, pipeline, and grid search techniques for optimal performance.

numpy pandas scikit-learn

Last synced: 03 May 2026
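The churn entry above combines an SVM with preprocessing, a pipeline, and grid search. A minimal sketch of that combination in scikit-learn, assuming synthetic data from `make_classification` in place of the repository's churn table and an illustrative parameter grid (this is not the project's actual code):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for a churn table (300 customers, 8 numeric features).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling sits inside the pipeline, so every CV fold is scaled on its own training split.
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Grid keys follow scikit-learn's "<step name>__<parameter>" convention.
# Assumption: an illustrative grid, not the repository's.
param_grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out split.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping `StandardScaler` inside the `Pipeline` means validation statistics never leak into the scaler during the search.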

https://github.com/atchayaah/home-value-insights-kc

Data-driven project predicting King County housing prices using EDA, regression models, and ML techniques, developed as part of IBM’s Data Analysis with Python course on Coursera.

joblib matplotlib numpy pandas pickle python scikit-learn seaborn

Last synced: 03 May 2026

https://github.com/darenr/gradientboostingmachines

Notebooks exploring the strengths and weaknesses of GBM-based classifiers

jupyter-notebook lightgbm pandas scikit-learn xgboost

Last synced: 03 May 2026

https://github.com/ceodaniyal/telecom_customer_churn_prediction

A machine learning project that predicts whether a telecom customer will churn (leave the service) using customer demographics, account information, and service usage. The repository includes data preprocessing, model training (with logistic regression), feature scaling, and example predictions.

classification customer-churn-prediction data-science logistic-regression machine-learning ml-project pandas prediction python scikit-learn streamlit telecom

Last synced: 04 May 2026

https://github.com/baponkar/scikit-logisticregression-application

A simple, detailed application and analysis of the scikit-learn LogisticRegression model.

classification-algorithm logistic-regression machine-learning python3 scikit-learn

Last synced: 04 May 2026

https://github.com/dakii24/credit-card-fraud-detection

This repository contains a machine learning project focused on detecting fraudulent credit card transactions. The project includes data preprocessing, model training, and evaluation to identify and prevent fraudulent activities.

capstone-project class-imbalance classification-algorithm credit-card credit-card-fraud data-science decision-trees fraud machine-learning open-data python scikit-learn svm svm-classifier

Last synced: 04 May 2026

https://github.com/msikorski93/protein-tertiary-structure

Performing a regression task for estimating residue size based on given physicochemical properties of protein tertiary structures (CASP 5-9).

bioinformatics gradient-boosting multilayer-perceptron-network protein-structure-prediction regression-algorithms scikit-learn tensorflow

Last synced: 04 May 2026

https://github.com/aqueeqazam/machine-learning-using-scikit

This repository contains all of the algorithms used to train machine learning models with the scikit-learn library.

numpy scikit-learn

Last synced: 04 May 2026

https://github.com/pierrealexandre78/deathpredict

Predict the hospital mortality rate of patients admitted to the ICU (Intensive Care Unit) using machine learning

healthcare hospital machine-learning predictions python random-forest-classifier scikit-learn xgboost-classifier

Last synced: 05 May 2026

https://github.com/simpl1fy/spam-classifier-project

A web application to classify spam texts or emails.

multinomial-naive-bayes nltk python render scikit-learn text-classification

Last synced: 05 May 2026

https://github.com/smaddanki/pattern-pursuit-challenge

A personal challenge to build a production-ready trading signal system for S&P 500 stocks using deep learning. This project progresses from basic ML models to a complete trading infrastructure, focusing on 5-day forward return prediction and signal generation.

deep-learning machine-learning pytorch quantative-trading quantitative-finance quantitative-research scikit-learn

Last synced: 05 May 2026

https://github.com/rohra-mehak/sciencesync

A system for processing and managing personalized Google Scholar alerts, with ML-based clustering analysis

agglomerative-clustering clustering crossref-api customtkinter google-api google-scholar graph-api machine-learning numpy pandas python3 scientific-article-analysis scikit-learn sqlite3

Last synced: 05 May 2026

https://github.com/aysenurcftc/breast_cancer_streamlit

Breast Cancer Wisconsin Dataset Classifier with Scikit-learn and Streamlit

breast-cancer classification gridsearch scikit-learn streamlit

Last synced: 05 May 2026

https://github.com/kefrankk/ml-fraud-detection

I built a predictive model to detect fraud in financial transactions.

pandas python scikit-learn

Last synced: 05 May 2026

https://github.com/kunalpisolkar24/dsbda_lab

Collection of practical codes for Savitribai Phule Pune University's Data Science and Big Data Analytics Laboratory (310256).

data-analytics data-preprocessing data-science data-wrangling descriptive-statistics linear-regression logistic-regression mapreduce scala scikit-learn sppu-computer-engineering tf-idf

Last synced: 05 May 2026

https://github.com/sevilaymuni/project-no.6-tree-based-models

Random Forest Assisted Suggestions for Salifort Motors Employee Retention: Plan, Analyze, Construct and Execute

data-science decision-trees evaluation-metrics gridsearchcv logistic-regression machine-learning matplotlib python random-forest-classifier scikit-learn seaborn-plots

Last synced: 05 May 2026

https://github.com/sadmansakib93/mental-resilience-analysis-using-machine-learning

Utilized supervised and unsupervised ML techniques to analyze the mental health and resilience levels of medical students [Project completed in December 2019]

artificial-intelligence classification clustering correlation linear-regression machine-learning machine-learning-algorithms mental-health python regression resilience scikit-learn statistical-analysis

Last synced: 06 May 2026

https://github.com/billgewrgoulas/recommendation-systems

Algorithms for joke rating prediction using the joke dataset from Kaggle.

algorithm clustering collaborative-filtering machine-learning numpy pandas recommender-system scikit-learn scypi

Last synced: 06 May 2026

https://github.com/adesartika33/proyek-analisis-data-dataset-iris

This project analyzes the Iris dataset, one of the classic datasets in Machine Learning and Data Science. The dataset consists of 150 samples of Iris flowers from three species (Setosa, Versicolor, and Virginica).

classification data-science data-visualization eda exploratory-data-analysis iris-dataset machine-learning python random-forest scikit-learn

Last synced: 06 May 2026

https://github.com/ejw-data/ml-playground

Testing the strengths and limitations of models with synthetic data

machine-learning python scikit-learn

Last synced: 06 May 2026

https://github.com/cycle-sync-ai/student-score-analysis

A data-driven student performance analysis project using a UCI dataset (396 students, 33 features). It implements machine learning models (K-means, PCA, Decision Tree, Random Forest, Linear Regression) to analyze academic patterns and predict student scores based on lifestyle, health, and study habits.

clustering clustering-algorithm decision-trees feature-engineering learning-management-system linear-regression machine-learning machine-learning-algorithms matplotlib numpy pandas pca pickle prediction prediction-algorithm scikit-learn score seaborn student

Last synced: 06 May 2026

https://github.com/kartheekdama/salary-prediction

This salary prediction model leverages machine learning techniques, including Random Forest, Decision Tree, and Linear Regression, to estimate salaries based on individual attributes such as age, gender, education level, job title, and years of experience. The Random Forest model outperforms the others, achieving the highest R-squared score.

decision-tree exploratory-data-analysis feature-importance linear-regression machine-learning random-forest scikit-learn

Last synced: 06 May 2026

https://github.com/galaxy092/samsung-innovation-campus-big-data-capstone-project

Samsung Innovation Campus Big Data Capstone Project - Weather Prediction

hadoop jupyter-notebook pandas pyspark scikit-learn sparksql

Last synced: 06 May 2026

https://github.com/jbizzlefoshizzle/ibm_capstone_project

Used K-means clustering and mapping libraries to determine best cities in San Diego to open a Mexican restaurant

beautifulsoup4 folium-maps geopy pandas-python scikit-learn

Last synced: 06 May 2026

https://github.com/kirillshiryaev61/customer_activity_prediction

Forecasting declines in customer purchasing activity at an online store. An ML-based model identifies customers at risk of churn in order to improve retention. A study project.

jupyter pandas python scikit-learn

Last synced: 07 May 2026

https://github.com/garimarao24/customer-churn-project

This repository contains a Customer Churn Prediction project that leverages Machine Learning techniques to predict customer churn and segment customers using clustering.

customer-churn kmeans-clustering logistic-regression machine-learning pca scikit-learn

Last synced: 07 May 2026

https://github.com/rishi035/advanced-house-price-predictions

This is my first project, which I also entered in a Kaggle competition

linear-regression machine-learning python random random-forest regressor-models scikit-learn

Last synced: 07 May 2026

https://github.com/pspanoudakis/machine-learning-nlp

NLP 🤖 📖 projects on Vaccine Sentiment Classification 💉 and Question Answering 💬

bert-fine-tuning glove-embeddings neural-networks pytorch question-answering rnn scikit-learn sentiment-classification softmax-regression squad

Last synced: 07 May 2026

https://github.com/andrewsy1004/linear-regression-model-for-house-price-prediction

A linear regression model to predict house prices based on features like size, location, and number of rooms. This project demonstrates the application of machine learning in real estate price estimation

linear-regression python scikit-learn xgbregressor

Last synced: 07 May 2026

https://github.com/tedim52/discjockey

a content-based recommender system for your party playlist preferences

jupyter-notebook matplotlib pandas scikit-learn spotify-web-api

Last synced: 07 May 2026

https://github.com/cnoret/hexa-watts

Interactive data visualization and machine learning app for energy consumption analysis and prediction in France, built with Streamlit. (Text in French)

data-visualization electricity-forecasting energy-analysis france machine-learning scikit-learn streamlit

Last synced: 07 May 2026

https://github.com/moustafamohamed01/mall-customer-segmentation-data

Customer segmentation using K-Means clustering based on annual income and spending score.

data-science data-visualization k-means-clustering machine-learning python scikit-learn unsupervised-learning

Last synced: 08 May 2026
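A minimal K-Means sketch of two-feature customer segmentation. It assumes hypothetical (annual income, spending score) points generated around three made-up centres rather than the Mall Customers data, and the choice of three clusters is also an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: (annual income in k$, spending score 1-100) around three invented centres.
centres = np.array([[25, 80], [55, 50], [90, 20]])
X = np.vstack([c + rng.normal(scale=5, size=(50, 2)) for c in centres])

# Standardise so income and spending score contribute on comparable scales.
X_scaled = StandardScaler().fit_transform(X)

# Assumption: k = 3; n_init=10 restarts K-Means from 10 seeds and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Number of customers assigned to each segment.
print(np.bincount(labels))
```

On real data, the number of clusters is usually chosen by comparing `kmeans.inertia_` or silhouette scores across several values of `n_clusters`.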

https://github.com/samjoesilvano/password_strength_prediction_using_nlp

Developed a predictive model to categorize passwords as Strong, Good, or Weak, enhancing security and reducing breach risks. The project involves cleaning and analyzing data from an SQL database, using the TF-IDF technique for transformation, and implementing a Logistic Regression model to achieve accurate classifications.

data-analysis data-classification data-cleaning data-visualization logistic-regression machine-learning natural-language-processing pandas password-security password-strength python scikit-learn sql tf-idf

Last synced: 08 May 2026
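The password-strength entry pairs TF-IDF with logistic regression. A minimal sketch of that pairing, assuming a tiny hypothetical list of passwords labelled 0 = Weak, 1 = Good, 2 = Strong. It also assumes character n-grams, since the entry does not say which TF-IDF tokenisation the project uses:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy sample; labels: 0 = Weak, 1 = Good, 2 = Strong.
passwords = [
    "123456", "password", "qwerty", "abc123",
    "Summer2021", "blue42sky", "kitten77x", "Maple2020",
    "x9#Lq!v2Rt$8", "T7&kP@3z!Wq1", "m$Z8q#R2!pL7", "Q!4w#E6r%T8y",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# analyzer="char" splits each password into 1- to 3-character n-grams;
# lowercase=False keeps upper/lower case, which carries strength information.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
    LogisticRegression(max_iter=1000),
)
model.fit(passwords, labels)

predictions = model.predict(["letmein", "Zx!9#qR4&w"])
print(predictions)
```

Character n-grams suit passwords because they have no word boundaries for a word tokenizer to split on.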

https://github.com/jatin-mehra119/churn_modeling

This repository is dedicated to predicting customer churn using machine learning techniques. It includes comprehensive scripts for data preprocessing, model training, and evaluation, along with detailed visualizations and insights.

classification-model datavisualization pandas scikit-learn

Last synced: 08 May 2026

https://github.com/vijaykumarr1452/customer-churn-prediction

Analysis of a telecom company's data, with insights for reducing customer churn.

anaconda jupyter-notebook machine-learning pandas prediction scikit-learn

Last synced: 09 May 2026

https://github.com/santiagoasp98/spam-detection

SMS spam detection using Logistic Regression and Multinomial Naive Bayes.

classification logistic-regression machine-learning multinomial-naive-bayes python scikit-learn spam-detection

Last synced: 09 May 2026
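A minimal side-by-side sketch of the two classifiers named in the SMS spam entry. It assumes a handful of made-up messages and plain word counts as features, since the project's own preprocessing is not described here:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages; 1 = spam, 0 = ham.
texts = [
    "WIN a free prize now, call 09061234567",
    "Claim your free cash reward today",
    "URGENT: your account won a holiday, reply WIN",
    "Free entry to win a new phone, text YES",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class",
    "I'll be home late tonight",
    "Happy birthday! See you at the party",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# MultinomialNB expects non-negative count-like features, so raw word counts fit it directly.
models = {
    "naive_bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "logistic_regression": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
}

new_messages = ["free prize waiting, call now", "see you at lunch"]
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(new_messages).tolist()

print(predictions)
```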

https://github.com/alphacrypto246/employee-attrition

This project analyzes employee attrition data to uncover key factors driving employee turnover. Using Python, it employs data preprocessing, exploratory data analysis, and machine learning models to predict attrition and provide actionable insights for improving employee retention strategies.

decision-tree-classifier machine-learning machine-learning-algorithms python scikit-learn scikitlearn-machine-learning

Last synced: 09 May 2026

https://github.com/callmerajesh/ames-housing-price-prediction

Predicting house prices using Decision Tree Regressor on the Ames dataset

ames-housing data-science decision-tree machine-learning python regression scikit-learn

Last synced: 09 May 2026

https://github.com/saahilanande/naivebayes

Implementing a Naive Bayes classifier from scratch for sentiment analysis of the IMDB dataset

machine-learning naive-bayes-classifier python-3 scikit-learn

Last synced: 09 May 2026

https://github.com/malisha4065/flightdelaypredictiongroup99

This project focuses on predicting flight delays in the United States domestic air traffic system by applying machine learning to more than 500,000 records. Using a 2020 dataset from the Bureau of Transportation Statistics, we aim to develop a predictive model that anticipates flight delays with 93.1% accuracy.

k-nearest-neighbors machine-learning python scikit-learn support-vector-machine

Last synced: 09 May 2026

https://github.com/samuelson777/iris-flower-classification

Iris Flower Classification: A machine learning project that classifies iris flowers into three species based on sepal and petal dimensions. Includes data exploration, visualization, and model evaluation using Python and scikit-learn.

classification data-science data-visualization iris-dataset jupyter-notebook machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/suvasish114/house-price-estimation

A machine learning model that estimates housing prices in California using California census data

jupyter-notebook machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/bhoomikaniranjan/pulmotrainer

A Deep Learning-based Lung Cancer Detection application using a 3D CNN model with TensorFlow and OpenCV, featuring an interactive Tkinter GUI for easy data processing and training.

matplotlib numpy-pandas opencv python scikit-learn seaborn tensorflow-keras

Last synced: 09 May 2026

https://github.com/mpolinowski/fisher-discriminant-analysis

LDA is a widely used dimensionality reduction technique built on Fisher’s linear discriminant.

linear-discriminant-analysis matplotlib-pyplot python scikit-learn

Last synced: 10 May 2026
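Because LDA is supervised, it can project data onto at most `n_classes - 1` discriminant axes. A minimal sketch with scikit-learn's `LinearDiscriminantAnalysis` on the bundled Iris dataset, chosen here for illustration and not necessarily the repository's data:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 species

# LDA uses the labels to find directions that maximise between-class scatter
# relative to within-class scatter; 3 classes allow at most 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)

print(X_2d.shape)  # (150, 2)
# Share of between-class variance captured by each discriminant axis.
print(lda.explained_variance_ratio_)
```

Unlike PCA, which ignores `y`, these axes are chosen to separate the species, which is why LDA is often used as a supervised dimensionality reduction step before classification or plotting.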

https://github.com/laavanjan/real_estate_price_prediction

This project predicts the house price per unit area based on various real estate features using a Linear Regression model. The application is built with Dash, a Python framework for building interactive web apps.

dash linear-regression pandas scikit-learn

Last synced: 10 May 2026

https://github.com/macdon112/credit-card-fraud-detection

Comparing ML models (Random Forest, KNN, Decision Tree) for credit card fraud detection using SMOTE and stratified cross-validation.

classification data-analysis fraud-detection imbalanced-data machine-learning python scikit-learn

Last synced: 10 May 2026
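A minimal sketch of comparing the three models named in the fraud entry under stratified cross-validation. It assumes a synthetic imbalanced dataset (about 5% positives) and average precision as the metric. SMOTE is left out because it lives in the separate imbalanced-learn package. If added, it should be fitted inside each training fold (for example through imbalanced-learn's own pipeline) so synthetic samples never reach the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic imbalanced data standing in for card transactions (~5% fraud).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)

# Stratified folds keep roughly the same fraud ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # KNN is distance-based, so features are scaled inside the pipeline.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

# Average precision (area under the precision-recall curve) is more informative
# than accuracy when positives are rare.
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="average_precision").mean()
    for name, model in models.items()
}
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```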

https://github.com/tnleite/real-estate-opportunities-analysis

This repository presents an analysis of opportunities in the real estate market, combining time series, clustering, and forecasting to identify the states with the greatest growth potential and guide efficient expansion strategies.

catboostregressor cluster-analysis data-science kmeans-clustering lightgbm-regressor machine-learning-algorithms numpy regression-models scikit-learn xgboost-regression

Last synced: 10 May 2026

https://github.com/i30101/mathworks2024

Coding tools for 2024 MathWorks Math Modeling Challenge

machine-learning mathematical-modelling python scikit-learn

Last synced: 10 Jun 2026

https://github.com/alphacrypto246/student-learning-style-prediction

An interactive web application built with Streamlit that predicts a student's preferred learning style (visual, auditory, or kinesthetic) using machine learning, aiding educators in personalizing teaching strategies.

machine-learning scikit-learn scikitlearn-machine-learning streamlit

Last synced: 11 May 2026

https://github.com/mpolinowski/tstochastic-neighbor-embedding

Improve data quality by discarding non-correlating, noisy dimensions

matplotlib-pyplot python scikit-learn t-sne

Last synced: 11 May 2026

https://github.com/anras5/criteo-search-data

EDA and statistical tests on CriteoSearchData dataset

data-science pandas scikit-learn statistics

Last synced: 11 May 2026

https://github.com/rajireddy15/student_grade_pred

A machine learning project to predict student final grades using academic and demographic data. Built with pandas, scikit-learn, and visualized with seaborn and matplotlib to gain insights and support early intervention for students.

academic-insights data-science eda education-analytics grade-prediction machine-learning ml-project pandas regression-models scikit-learn student-performance-analysis

Last synced: 11 May 2026

https://github.com/xunchiasg/nyc_property_sales

Exploratory Data Analysis of rolling property sales data in NYC from March 2023-2025

matplotlib-pyplot plotly python scikit-learn

Last synced: 12 May 2026

https://github.com/srosalino/prediction_of_seoul_bikes_demand

The objective of this project is to predict the number of bicycles that need to be available each hour in order to make the service as efficient as possible

cross-validation data-exploration-and-preprocessing hyperparameter-tuning machine-learning regularization-methods scikit-learn

Last synced: 13 May 2026

https://github.com/msikorski93/heart-failure-prediction

This repository performs binary classification based on features collected from respondents (age, cholesterol level, fasting blood sugar, thallium stress test results, etc.).

classification knn-classifier logistic-regression random-forest-classifier roc-curves scikit-learn svm-classifier

Last synced: 13 May 2026

https://github.com/fgebhart/handson-ml

hands-on machine learning notebooks collection

jupyter-notebook machine-learning scikit-learn

Last synced: 13 May 2026

https://github.com/janek1842/mlbyjan-sandbox

Testbed for private ML investigations

ml scikit-learn

Last synced: 14 May 2026

https://github.com/fulviofavilla/cvd-prediction-ml

Comparative ML analysis for CVD prediction. Winner of the 2023 HPCC Systems Poster Competition.

data-science ecl healthcare hpcc-systems machine-learning pandas python scikit-learn

Last synced: 11 Jun 2026

https://github.com/jayemscript/lab-to-code

A complete Python learning roadmap for scientists and researchers — covering data science, biology, chemistry, physics, and mathematics with curated libraries, tools, and resources.

bioinformatics chemistry data-science jupyter-notebook machine-learning mathematics numpy pandas physics python research roadmap scientific-computing scikit-learn

Last synced: 19 Jun 2026

https://github.com/royxlead/production-drift-detection

Production ML monitoring library - KL, PSI, MMD, and ADWIN drift detectors with empirical benchmarks, confidence tracking, and a 6-page FastAPI dashboard.

data-drift drift-detection fastapi kl-divergence mlops mmd model-monitoring production-ml psi pytorch scikit-learn uncertainty-quantification

Last synced: 23 Jun 2026
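Of the detectors listed in the drift-detection entry, the population stability index (PSI) is the simplest to write out. A minimal NumPy sketch, assuming decile bins taken from the reference sample; the function name `population_stability_index` is hypothetical and is not the library's API:

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """Hypothetical helper: PSI of one feature between a reference and a current sample."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges from reference quantiles, so each bin holds ~equal reference mass.
    edges = np.unique(np.quantile(reference, np.linspace(0, 1, n_bins + 1)))
    # Clip current values into the reference range so outliers land in the end bins.
    current = np.clip(current, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    # eps guards against log(0) and division by zero for empty bins.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected).
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, rng.normal(1.0, 1.0, 5000))
print(psi_same, round(psi_shifted, 3))
```

A frequently quoted rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant drift, though thresholds are usually tuned per feature.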

https://github.com/imosudi/model_training

Breast Cancer Diagnosis: Logistic Regression, Random Forest, k-NN, and Decision Tree classifiers with feature importance analysis. Includes data exploration, train/test splitting, feature scaling, cross-validation, and model evaluation metrics with confusion matrices and decision boundary visualisation

classification data-science decision-tree educational feature-importance k-nearest-neighbors linear-regression machine-learning model-evaluation python3 random-forest scikit-learn

Last synced: 25 Jun 2026

https://github.com/sundarmd/breast-cancer-detection

Breast-Cancer-Detection is a machine learning project that utilizes logistic regression to predict whether a tumor is benign or malignant based on the Breast Cancer Wisconsin (Diagnostic) dataset. The project demonstrates data preprocessing, model training, and evaluation using the `scikit-learn` library.

logistic-regression machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/jeus0522/7-explore-different-classifier-ml-app

A project exploring various classification algorithms, showcasing their implementation, comparison, and evaluation using Python and scikit-learn.

k-nearest-neighbours knn random-forest scikit-learn streamlit support-vector-machine svm

Last synced: 21 Jan 2026

https://github.com/fanyicharllson/mobile-money-transaction-analysis

Machine learning pipeline for classifying mobile money users (MTN MoMo & Orange Money) into activity segments — CSC 3221 Final Project, ICT University Cameroon.

cameroon data-science ict-university jupyter jupyter-notebook machine-learning mtn-momo orange-money python scikit-learn

Last synced: 31 May 2026