An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/vanilladucky/housing-prediction

This is a data analytics and machine learning project that I undertook using a housing dataset from Kaggle in order to put my machine learning knowledge into practice.

data-science machine-learning python scikit-learn

Last synced: 05 May 2026

https://github.com/sevilaymuni/project-no.6-tree-based-models

Random Forest Assisted Suggestions for Salifort Motors Employee Retention: Plan, Analyze, Construct and Execute

data-science decision-trees evaluation-metrics gridsearchcv logistic-regression machine-learning matplotlib python random-forest-classifier scikit-learn seaborn-plots

Last synced: 05 May 2026

https://github.com/zenitsu272/fault-detection-ml

Machine Learning based Fault Detection in machines using sensor data

artificial-intelligence decsion-tree machine-learning pandas pandas-dataframe pandas-python scikit-learn

Last synced: 05 May 2026

https://github.com/fahrettinsolak/ai-based-salary-scale-calculation-project

This project demonstrates a Polynomial Regression model using a dataset related to experience and salary. The model is built using Python with the pandas, matplotlib, and sklearn libraries. The dataset includes information on years of experience and corresponding salary.

artificial-intelligence deep-learning jupyter-notebook machine-learning matplotlib pandas pyhton scikit-learn

Last synced: 05 May 2026

https://github.com/rishisolanke/twitter-sentiment-analysis-using-machine-learning-

A research project that classifies tweets as positive, negative, or neutral using ML algorithms (Logistic Regression, Naïve Bayes, SVM) with NLP preprocessing.

data-science data-visualization logistic-regression machine-learning ml-models naive-bayes natural-language-processing nlp scikit-learn sentiment-analysis svm text-classification twitter-data

Last synced: 06 May 2026

https://github.com/eshansugeesh/fico-score-loan-default-modeling-project

Credit risk assessment using FICO score segmentation, loan default modeling, discretization techniques, and log-likelihood evaluation for predictive analytics in financial services.

bucketing classification credit-risk customer-segmentation data-science discretization fico-score financial-analytics loan-analysis loan-default log-likelihood machine-learning numpy pandas predictive-modeling risk-modeling scikit-learn segmentation statistical-modelling

Last synced: 06 May 2026
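
One common way to score a FICO bucketing with log-likelihood: for each bucket i with n_i borrowers, of whom k_i defaulted, estimate p_i = k_i / n_i and sum k_i ln(p_i) + (n_i - k_i) ln(1 - p_i) over all buckets. The plain-Python sketch below assumes that formulation; the function name, the cut points, and the sample scores are hypothetical and not taken from the repository, which may define its objective differently.

```python
import math
from bisect import bisect_right

def bucket_log_likelihood(scores, defaults, boundaries):
    """Log-likelihood of a bucketing of credit scores (hypothetical helper).

    boundaries: sorted cut points; bucket j holds scores s with
    boundaries[j-1] <= s < boundaries[j] (first and last buckets are open-ended).
    defaults: 1 if the borrower defaulted, else 0.
    """
    n = [0] * (len(boundaries) + 1)  # borrowers per bucket
    k = [0] * (len(boundaries) + 1)  # defaults per bucket
    for s, d in zip(scores, defaults):
        j = bisect_right(boundaries, s)  # index of the bucket containing s
        n[j] += 1
        k[j] += d
    ll = 0.0
    for n_i, k_i in zip(n, k):
        if n_i == 0:
            continue  # empty buckets contribute nothing
        p = k_i / n_i  # per-bucket default rate (maximum-likelihood estimate)
        # Convention 0 * ln(0) = 0 handles buckets with p == 0 or p == 1
        if k_i > 0:
            ll += k_i * math.log(p)
        if n_i - k_i > 0:
            ll += (n_i - k_i) * math.log(1 - p)
    return ll

# Hypothetical sample data
scores = [520, 580, 610, 650, 700, 720, 760, 800]
defaults = [1, 1, 0, 1, 0, 0, 0, 0]
print(bucket_log_likelihood(scores, defaults, [600, 700]))  # three buckets
print(bucket_log_likelihood(scores, defaults, []))          # one bucket
```

Higher (less negative) values mean the buckets separate defaulters from non-defaulters more sharply. Splitting a bucket never lowers this quantity, so in practice the number of buckets is usually fixed in advance and only the cut points are optimized.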

https://github.com/radoslawregula/binary-classification-metrics

A model that solves a binary classification problem, evaluated with several accuracy metrics.

binary-classification classification jupyter-notebook machine-learning matplotlib pandas python scikit-learn stochastic-gradient-descent

Last synced: 06 May 2026
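
Typical binary classification metrics can be illustrated with a small, dependency-free sketch. It assumes labels are encoded as 0/1 with 1 as the positive class; the function name `binary_metrics` is hypothetical and does not come from the repository. scikit-learn offers equivalents in `sklearn.metrics` (for example `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Return 0.0 instead of dividing by zero when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Accuracy alone can be misleading on imbalanced data, which is why precision, recall, and F1 are usually reported alongside it.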

https://github.com/kaoutarmi/predition_price-old-cars

This car price prediction project uses machine learning to estimate the value of vehicles based on their characteristics.

car-price-prediction data-preprocessing data-science decision-tree feature-engineering machine-learning regression scikit-learn

Last synced: 06 May 2026

https://github.com/lazarust/jupyternotebooks

Storage spot for all my Jupyter Notebooks. Check some of them out!

jupyter-notebook jupyter-notebooks keras scikit-learn sklearn

Last synced: 06 May 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/andrewsy1004/logistic-regression-spam-classifier

This project implements a spam email classifier using Logistic Regression.

numpy pandas scikit-learn

Last synced: 06 May 2026

https://github.com/5hraddha/optimize-oil-well-locations

In the quest for harnessing valuable energy resources, the OilyGiant mining company wants to expand its operations by discovering new oil well locations. To achieve this, a data-driven approach is adopted, leveraging geological exploration data from three distinct regions and employing techniques in data analysis and modeling.

linear-regression numpy pandas scikit-learn supervised-learning

Last synced: 06 May 2026

https://github.com/adesartika33/proyek-analisis-data-dataset-iris

This project aims to analyze the Iris dataset, one of the classic datasets in Machine Learning and Data Science. The dataset consists of 150 samples of Iris flowers from three species (Setosa, Versicolor, and Virginica).

classification data-science data-visualization eda exploratory-data-analysis iris-dataset machine-learning python random-forest scikit-learn

Last synced: 06 May 2026

https://github.com/pimakarov/textkd-p4-fewshot-distilbert

📊 Compare few-shot text classification with DistilBERT and TF-IDF + SVM using IMDB data, analyzing performance across various sample sizes.

bert distilbert few-shot-learning nlp python pytorch scikit-learn text-classification transfer-learning trasformer

Last synced: 06 May 2026

https://github.com/dwade-eng/customer-lead-conversion-analysis

This project explores a real-world lead conversion dataset, using a structured machine learning pipeline to classify leads into likely or unlikely converters. It includes complete steps from data wrangling and visualization to feature engineering and model evaluation.

html matplotlib pandas python3 scikit-learn seaborn

Last synced: 06 May 2026

https://github.com/ejw-data/ml-playground

Testing the strengths and limitations of models with synthetic data

machine-learning python scikit-learn

Last synced: 06 May 2026

https://github.com/douglas-data-analyst/predictive-analysis

Predictive model for sales forecasting using scikit-learn and machine learning

data-science machine-learning predictive-analytics python sales-forecasting scikit-learn time-series

Last synced: 06 May 2026

https://github.com/pradeep-r04/spam-email-classification

Spam Email Classification Using NLP and Machine Learning involves building a system to identify and categorize emails as either spam or non-spam (ham). This process typically uses Natural Language Processing (NLP) techniques to analyze and preprocess text data and machine learning algorithms to train a model for classification.

artificial-intelligence machine-learning naive-bayes-classifier nlp pkl python scikit-learn streamlit

Last synced: 06 May 2026

https://github.com/cycle-sync-ai/student-score-analysis

A data-driven student performance analysis project using the UCI dataset (396 students, 33 features). Implements machine learning models (K-means, PCA, Decision Tree, Random Forest, Linear Regression) to analyze academic patterns and predict student scores based on lifestyle, health, and study habits.

clustering clustering-algorithm decision-trees feature-engineering learning-management-system linear-regression machine-learning machine-learning-algorithms matplotlib numpy pandas pca pickle prediction prediction-algorithm scikit-learn score seaborn student

Last synced: 06 May 2026

https://github.com/williyam-m/company-registration-trends

Utilized Linear Regression from scikit-learn to predict future company registration trends.

flask matplotlib numpy pandas-python scikit-learn

Last synced: 06 May 2026

https://github.com/lintangwisesa/pdb_mti_ui_lab1_k6

Lab Assignment 1, Big Data Management, MTI UI 2023

machine-learning python3 scikit-learn

Last synced: 06 May 2026

https://github.com/bhavyac16/flairifyme

FlairifyMe is a Reddit flair detector for the r/india subreddit that takes a post's URL as input and predicts the post's flair using a Logistic Regression model.

flair-prediction flask hacktoberfest linear-svm logistic-regression naive-bayes-classifier nltk praw-reddit reddit-flair-detector scikit-learn scraped-data subreddit text-classification

Last synced: 06 May 2026

https://github.com/sahilmate/ebm-breast-cancer-classifier

This repository implements an Explainable Boosting Machine (EBM) model for breast cancer classification using scikit-learn and interpret. The project includes data preprocessing, model training, accuracy evaluation, and feature importance visualization.

breast-cancer-classification data-visualization explainable-boosting-machine feature-importance interpret machine-learning scikit-learn

Last synced: 06 May 2026

https://github.com/rafay-imraan/recommendation-system

A machine learning model that recommends similar movies to users based on the movies they have rated positively.

machine-learning pandas python scikit-learn

Last synced: 06 May 2026

https://github.com/ccastleberry/hands_on_machine_learning

Notebooks and files created while working through the book Hands-On Machine Learning

data-science jupyter-notebook scikit-learn tensorflow

Last synced: 06 May 2026

https://github.com/avtorgenii/ml-playground

A repository for exploring and experimenting with datasets, building machine learning models, and testing various techniques in data preprocessing, feature engineering, and model evaluation.

matplotlib ml pandas scikit-learn

Last synced: 06 May 2026

https://github.com/blacknahil/spam-detection

A simple web application for detecting spam messages using a machine learning model. The application is built with Flask and provides an interactive interface where users can enter a message and get a prediction of whether it is spam or ham, along with the probability.

flask html-css-javascript pandas scikit-learn

Last synced: 06 May 2026

https://github.com/jbizzlefoshizzle/ibm_capstone_project

Used K-means clustering and mapping libraries to determine the best cities in San Diego to open a Mexican restaurant

beautifulsoup4 folium-maps geopy pandas-python scikit-learn

Last synced: 06 May 2026

https://github.com/kianaabrisham/svm-from-scratch

Linear SVM from scratch with hinge loss + decision boundaries

classification from-scratch fundamentals hinge-loss numpy optimization scikit-learn svm

Last synced: 07 May 2026
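
As a rough illustration of the technique this entry names, the sketch below minimizes the L2-regularized hinge loss (lam/2)*||w||^2 + (1/n) * sum(max(0, 1 - y_i*(w . x_i + b))) with full-batch subgradient descent in plain Python. The function names, step size `lr`, regularization strength `lam`, epoch count, and toy data are assumptions for illustration, not the repository's code.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on L2-regularized hinge loss; labels y in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam/2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # margin violated: hinge term is active for this point
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X):
    """Sign of the decision function w . x + b."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]

# Toy, linearly separable data
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, predict(w, b, X))
```

Only points inside or on the wrong side of the margin contribute to the hinge subgradient, which is why the learned boundary depends mainly on the points closest to it. A constant step size keeps the sketch short; decaying step sizes are the more common choice for subgradient methods.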

https://github.com/taquynhnga2001/regression-calories-burnt-prediction

Develops regression models that predict the total calories a person has burned during a workout based on biological measurements.

machine-learning python regression-analysis scikit-learn

Last synced: 07 May 2026

https://github.com/eshrathaziz/heart-disease-risk-assessment

Predicting heart disease risk using machine learning for Healthcare Insights.

data-science jupyter-notebook learning machine python scikit-learn

Last synced: 07 May 2026

https://github.com/ayaarbi/prediction_des_maladies_cardiovasculaires_avec_ml

This project, developed as part of a Machine Learning course, uses supervised classification algorithms to predict the presence of cardiovascular disease from medical data published on Kaggle.

cardiovascular-diseases jupyter-notebook machine-learning matplotlib pandas python scikit-learn

Last synced: 07 May 2026

https://github.com/z-fran/walmart-store-sales-forecasting

Data analysis and machine learning solution in Python for the Kaggle competition Walmart Recruiting - Store Sales Forecasting.

machine-learning sales-analysis sales-forecasting sales-prediction scikit-learn walmart-sales-forecasting

Last synced: 07 May 2026

https://github.com/garimarao24/customer-churn-project

This repository contains a Customer Churn Prediction project that leverages Machine Learning techniques to predict customer churn and segment customers using clustering.

customer-churn kmeans-clustering logistic-regression machine-learning pca scikit-learn

Last synced: 07 May 2026

https://github.com/rishi035/advanced-house-price-predictions

This is my first project, and I also entered it in a Kaggle competition.

linear-regression machine-learning python random random-forest regressor-models scikit-learn

Last synced: 07 May 2026

https://github.com/tony123105/comp4423_garbage_classification

Garbage classification using traditional machine learning approaches (HOG, LBP, SIFT features with SVM, KNN, Random Forest classifiers) and an ensemble method to categorize waste into 10 types.

computer-vision feature-extraction garbage-classification hog image-classification knn lbp machine-learning opencv python random-forest scikit-learn sift svm

Last synced: 07 May 2026

https://github.com/pspanoudakis/machine-learning-nlp

NLP 🤖 📖 projects on Vaccine Sentiment Classification 💉 and Question Answering 💬

bert-fine-tuning glove-embeddings neural-networks pytorch question-answering rnn scikit-learn sentiment-classification softmax-regression squad

Last synced: 07 May 2026

https://github.com/nicovandenhooff/wids-datathon-2022

This repository contains my solution to the 2022 Women in Data Science Kaggle competition, which placed in the top 10% of the leaderboard.

catboost data-visualization datascience energy-consumption ensemble-learning exploratory-data-analysis kaggle lightgbm machine-learning scikit-learn women-in-data-science xgboost

Last synced: 07 May 2026

https://github.com/dynle/2020f-ml

2020F Keio University - Machine Learning Laboratory

machine-learning python scikit-learn

Last synced: 07 May 2026

https://github.com/tedim52/discjockey

A content-based recommender system for your party playlist preferences

jupyter-notebook matplotlib pandas scikit-learn spotify-web-api

Last synced: 07 May 2026

https://github.com/cnoret/hexa-watts

Interactive data visualization and machine learning app for energy consumption analysis and prediction in France, built with Streamlit. (Text in French)

data-visualization electricity-forecasting energy-analysis france machine-learning scikit-learn streamlit

Last synced: 07 May 2026

https://github.com/mark-mdo47/family-machine-learning-project-2017

We are doing a two-part machine learning project this summer with scikit-learn and Keras/TensorFlow.

machine-learning python scikit-learn tensorflow

Last synced: 07 May 2026

https://github.com/henrytseng/example_docker_scikit-learn

A quick example of using Scikit-Learn from a Docker container

docker scikit-learn

Last synced: 08 May 2026

https://github.com/anusha-me/disease-x-detection-ml-project

A machine learning classification system for early detection of Disease X based on patient symptoms using Python, Scikit-learn, and Streamlit.

classification data-science disease-prediction healthcare-ai machine-learning medicaldata scikit-learn streamlit

Last synced: 08 May 2026

https://github.com/samjoesilvano/password_strength_prediction_using_nlp

Developed a predictive model to categorize passwords as Strong, Good, or Weak, enhancing security and reducing breach risks. The project involves cleaning and analyzing data from an SQL database, using the TF-IDF technique for transformation, and implementing a Logistic Regression model to achieve accurate classifications.

data-analysis data-classification data-cleaning data-visualization logistic-regression machine-learning natural-language-processing pandas password-security password-strength python scikit-learn sql tf-idf

Last synced: 08 May 2026

https://github.com/prajjwal6969/recommender-system-using-python

A collection of content-based recommendation systems for songs and movies using Python and machine learning.

content-based-filtering cosine-similarity machine-learning movie-recommendation python recommender-system scikit-learn song-recommendation

Last synced: 08 May 2026
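
Content-based filtering with cosine similarity can be sketched in a few lines: each item is described by a feature vector, and the recommended items are those whose vectors point in the most similar direction. The sketch below is plain Python with hypothetical movie titles and genre flags; real projects typically build the vectors with something like TF-IDF over item metadata, and scikit-learn provides `sklearn.metrics.pairwise.cosine_similarity` for the similarity step.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(item, features, top_n=2):
    """Return the top_n items most similar to `item`, excluding the item itself."""
    target = features[item]
    scores = [(other, cosine_similarity(target, vec))
              for other, vec in features.items() if other != item]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return [name for name, _ in scores[:top_n]]

# Hypothetical genre flags: [action, comedy, drama, sci-fi]
movies = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [1, 0, 0, 1],
    "Movie C": [0, 1, 1, 0],
    "Movie D": [1, 1, 0, 0],
}
print(recommend("Movie A", movies))
```

Cosine similarity ignores vector length, so an item with many tags is not automatically favored over one with few tags that point in the same direction.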

https://github.com/jatin-mehra119/churn_modeling

This repository is dedicated to predicting customer churn using machine learning techniques. It includes comprehensive scripts for data preprocessing, model training, and evaluation, along with detailed visualizations and insights.

classification-model datavisualization pandas scikit-learn

Last synced: 08 May 2026

https://github.com/mpolinowski/local-linear-embedding

Improve data quality by discarding non-correlating, noisy dimensions

locally-linear-embedding pyplot python scikit-learn

Last synced: 08 May 2026

https://github.com/deepanshkhurana/udacityproject-prediciting-boston-housing-prices

This is a Udacity Project for the Machine Learning Nanodegree. Here, we are trying to predict Boston Housing Prices using sklearn.

data-analysis data-science machine-learning python scikit-learn udacity

Last synced: 08 May 2026

https://github.com/gregoritsch3/dl_cv_e2e_potatodiseaseclassification

A guided CodeBasics Deep Learning Project where a Convolutional Model is deployed onto a Website (FastAPI) and Mobile App (React Native, Google Cloud). Its purpose is the classification of potato plant images into "healthy", "Early Blight" and "Late Blight" categories.

cnn-classification gcp model-deployment scikit-learn tensorflow

Last synced: 08 May 2026

https://github.com/oriolventur/assignment-2-model-creation

Assignment 2 from Artificial Intelligence 1 course: Model creation using synthetic data and scikit-learn.

jupyter-notebook model-creation python scikit-learn

Last synced: 08 May 2026

https://github.com/seyha1007/amazon-reviews-analysis

🧐 This project analyzes Amazon Fine Food Reviews to investigate whether negative reviews are more emotionally intense and lexically repetitive than positive ones. Using R, we apply sentiment analysis and lexical diversity metrics to uncover patterns in consumer review language.

acp amazon-reviews bert data-analytics glove jupyter-notebook lstm-sentiment-analysis machine-learning nltk random-forest scikit-learn sentiment-classification sentimental-analysis support-vector-machine

Last synced: 08 May 2026

https://github.com/sundarmd/breast-cancer-detection

Breast-Cancer-Detection is a machine learning project that utilizes logistic regression to predict whether a tumor is benign or malignant based on the Breast Cancer Wisconsin (Diagnostic) dataset. The project demonstrates data preprocessing, model training, and evaluation using the `scikit-learn` library.

logistic-regression machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/shingiraibhengesa/house-price-predictor

A machine learning project that predicts house prices based on user input features such as square footage, number of bedrooms, and more.

machine-learning-models matplotlib numpy python scikit-learn seaborn

Last synced: 09 May 2026

https://github.com/vijaykumarr1452/customer-churn-prediction

Analysis of a telecom company's data, with insights for reducing customer churn.

anaconda jupyter-notebook machine-learning pandas prediction scikit-learn

Last synced: 09 May 2026

https://github.com/davidrpugh/kaust-cs-294w

Course materials for KAUST CS 294W

deep-learning machine-learning pytorch scikit-learn

Last synced: 09 May 2026

https://github.com/radoslawregula/iris-classification

Jupyter notebook implementing an efficient machine learning method to classify flowers from the Iris data set.

classification iris-dataset jupyter-notebook machine-learning python scikit-learn softmax-classifier

Last synced: 09 May 2026

https://github.com/l1ght14/customer-churn-prediction

Predict customer churn using machine learning models like Logistic Regression and Random Forest. Includes data preprocessing, model evaluation, feature importance, and insights to drive retention strategies.

churn-prediction classification customer-churn customer-churn-prediction data-analysis logistic-regression machine-learning python random-forest scikit-learn telecom

Last synced: 09 May 2026

https://github.com/alphacrypto246/employee-attrition

This project analyzes employee attrition data to uncover key factors driving employee turnover. Using Python, it employs data preprocessing, exploratory data analysis, and machine learning models to predict attrition and provide actionable insights for improving employee retention strategies.

decision-tree-classifier machine-learning machine-learning-algorithms python scikit-learn scikitlearn-machine-learning

Last synced: 09 May 2026

https://github.com/akwardhan/loan-default-prediction-xgboost-streamlit

Full-scale loan default prediction system using XGBoost, trained on 1.3M LendingClub loans. Includes feature-rich preprocessing, class imbalance handling, recall-focused ML pipeline, and Streamlit web deployment for real-time borrower risk scoring.

credit-risk data-science google-colab loan-default-prediction machine-learning python real-world-project scikit-learn streamlit xgboost

Last synced: 09 May 2026

https://github.com/peterchain/titanic

Script that uses the Titanic dataset to predict which passengers survived

kaggle machine-learning pandas-dataframe python3 scikit-learn

Last synced: 09 May 2026

https://github.com/otuemre/viginids

VigiNIDS: A machine learning-based system for detecting malicious network traffic using the UNSW-NB15 dataset. It distinguishes between normal and attack activities, providing a data-driven approach to network security.

classification cybersecurity intrusion-detection-system machine-learning network-intrusion-detection python scikit-learn unsw-nb15 xgboost

Last synced: 09 May 2026

https://github.com/roggersanguzu/tomato-disease-detector

This project uses transfer learning with MobileNetV2 to accurately classify tomato leaf diseases, including Mosaic Virus, Septoria Leaf Spot, Blight, and Healthy leaves.

deep-learning python scikit-learn transfer-learning

Last synced: 09 May 2026

https://github.com/mpolinowski/multi-dimensional-scaling

Multidimensional Scaling (MDS) is a family of statistical methods that map items into a low-dimensional space based on the distances between them.

matplotlib-pyplot multi-dimensional-scaling python scikit-learn

Last synced: 09 May 2026
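
The simplest member of the MDS family, classical (Torgerson) MDS, recovers coordinates from a distance matrix in closed form: double-center the squared distances to get a Gram matrix B = -1/2 * J D^2 J with J = I - (1/n) 11^T, then keep the top-k eigenvectors scaled by the square roots of their eigenvalues. The NumPy sketch below assumes that variant; it is not necessarily what the repository uses, and scikit-learn's `sklearn.manifold.MDS` instead fits metric MDS iteratively with the SMACOF algorithm.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n points in k dimensions from an n x n distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances (Gram matrix)
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]          # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[idx], 0, None))  # clip tiny negatives caused by rounding
    return eigvecs[:, idx] * scale

def pairwise_distances(P):
    """Euclidean distance matrix between the rows of P."""
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

# Four corners of a unit square: their 2-D layout is recovered up to rotation/reflection
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = pairwise_distances(pts)
Y = classical_mds(D, k=2)
print(np.allclose(D, pairwise_distances(Y)))
```

When the distances are exactly Euclidean in k dimensions, the embedding reproduces them; otherwise the discarded eigenvalues measure how much structure the k-dimensional map cannot represent.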

https://github.com/saahilanande/naivebayes

Implementing a Naive Bayes classifier from scratch for sentiment analysis of the IMDB dataset

machine-learning naive-bayes-classifier python-3 scikit-learn

Last synced: 09 May 2026
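
A from-scratch Naive Bayes sentiment classifier usually follows the multinomial model: estimate each class prior and per-class word probabilities with add-one (Laplace) smoothing, then pick the class with the highest sum of log-probabilities. The sketch below assumes whitespace tokenization, ignores words never seen in training, and uses a tiny hypothetical review set; it is an illustration, not the repository's implementation. scikit-learn's counterpart is `sklearn.naive_bayes.MultinomialNB`.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing on lowercase whitespace tokens."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token counts
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        model[c] = {
            "log_prior": math.log(class_counts[c] / len(docs)),
            # Smoothed log P(word | class) for every vocabulary word
            "log_lik": {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab},
        }
    return model

def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    tokens = doc.lower().split()
    best, best_score = None, -math.inf
    for c, params in model.items():
        score = params["log_prior"] + sum(
            params["log_lik"][t] for t in tokens if t in params["log_lik"])
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical toy reviews
docs = ["a great movie", "great acting and a great plot",
        "a terrible movie", "boring and terrible plot"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "great plot"), predict_nb(model, "terrible boring movie"))
```

Working in log space avoids numerical underflow when multiplying many small word probabilities for long reviews.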

https://github.com/adadalshabab/human-stress-analysis-greadsearch-classifier

The project leverages data from physiological signals, self-reported surveys, behavioral observations, or other relevant sources to infer and analyze stress levels.

classification knn-classification machine-learning machine-learning-algorithms matplotlib pandas scikit-learn

Last synced: 09 May 2026

https://github.com/rajan-bhateja/aqi-predictor

Several models trained on data from Indian cities to predict the Air Quality Index (AQI)

machine-learning-algorithms model-comparison neural-networks scikit-learn tensorflow

Last synced: 09 May 2026

https://github.com/samuelson777/iris-flower-classification

Iris Flower Classification: A machine learning project that classifies iris flowers into three species based on sepal and petal dimensions. Includes data exploration, visualization, and model evaluation using Python and scikit-learn.

classification data-science data-visualization iris-dataset jupyter-notebook machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/njaffe/eda_example_2025

Sample end-to-end data analysis walkthrough using Python and Scikit-learn.

data-science data-visualization jupyter-notebooks machine-learning python regression scikit-learn

Last synced: 09 May 2026

https://github.com/suvasish114/house-price-estimation

A machine learning model that estimates housing prices in California using California census data

jupyter-notebook machine-learning python scikit-learn

Last synced: 09 May 2026

https://github.com/mpolinowski/fisher-discriminant-analysis

LDA is a widely used dimensionality reduction technique built on Fisher’s linear discriminant.

linear-discriminant-analysis matplotlib-pyplot python scikit-learn

Last synced: 10 May 2026
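
For two classes, Fisher's linear discriminant picks the projection direction w proportional to S_W^(-1) (mu1 - mu0), where S_W is the within-class scatter matrix; this maximizes the squared distance between projected class means relative to the projected within-class scatter. The NumPy sketch below assumes the two-class case and synthetic Gaussian data with correlated features; names and data are illustrative. scikit-learn's multi-class version is `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher linear discriminant: unit vector w proportional to S_W^-1 (mu1 - mu0)."""
    X0, X1 = np.asarray(X0, dtype=float), np.asarray(X1, dtype=float)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: each class's scatter around its own mean, summed
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(S_w, mu1 - mu0)  # solve the system instead of inverting S_w
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])  # correlated features, shared by both classes
X0 = rng.multivariate_normal([0, 0], cov, size=200)
X1 = rng.multivariate_normal([2, 0], cov, size=200)
w = fisher_direction(X0, X1)
z0, z1 = X0 @ w, X1 @ w  # 1-D projections used for dimensionality reduction
print(w, z0.mean(), z1.mean())
```

Because the features are correlated, w differs noticeably from the raw mean-difference direction: S_W^(-1) rotates it away from the directions along which the classes spread the most.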

https://github.com/laavanjan/real_estate_price_prediction

This project predicts the house price per unit area based on various real estate features using a Linear Regression model. The application is built with Dash, a Python framework for building interactive web apps.

dash linear-regression pandas scikit-learn

Last synced: 10 May 2026

https://github.com/amirdora/python_ml_supervisedlearning_example

Building Classification Models with scikit-learn

machine-learning python3 scikit-learn

Last synced: 10 May 2026

https://github.com/chengetanaim/sentimentanalysisforfinancialnews

This is a Django application for predicting whether the sentiment of a financial news headline is positive, negative, or neutral (from an investor's point of view).

beautifulsoup4 chartjs django html-css-javascript logistic-regression machine-learning natural-language-processing scikit-learn tfidf-vectorizer webscraping

Last synced: 10 May 2026

https://github.com/hassanislam463/nyc_airbnb_eda

This project is a comprehensive data analysis of Airbnb listings in New York City, exploring pricing trends, seasonality effects, host market dynamics, rental preferences, and revenue estimation. It provides valuable insights for hosts, investors, and policymakers to optimize Airbnb operations and understand the short-term rental landscape in NYC.

exploratory-data-analysis matplotlib python scikit-learn seaborn

Last synced: 10 May 2026

https://github.com/ejw-data/ml-classification-credit-risk

Compares several machine learning classification models to determine whether to approve or reject a loan request

classification python scikit-learn

Last synced: 10 May 2026

https://github.com/afonsojramos/feup-iart

Projects developed for an Artificial Intelligence class.

feup feup-iart iart neural-network python scikit-learn tensorflow

Last synced: 10 May 2026

https://github.com/i30101/mathworks2024

Coding tools for 2024 MathWorks Math Modeling Challenge

machine-learning mathematical-modelling python scikit-learn

Last synced: 10 Jun 2026

https://github.com/alphacrypto246/student-learning-style-prediction

An interactive web application built with Streamlit that predicts a student's preferred learning style (visual, auditory, or kinesthetic) using machine learning, aiding educators in personalizing teaching strategies.

machine-learning scikit-learn scikitlearn-machine-learning streamlit

Last synced: 11 May 2026

https://github.com/vijaykumarr1452/ipl-first-innings-score-prediction-deployment

Deployment of the IPL Score Prediction Analyser model (https://github.com/vijaykumarr1452/IPL-First-Innings-Score-Prediction).

css deployment gunicorn html machine-learning ml predictive-analytics python scikit-learn

Last synced: 11 May 2026

https://github.com/mpolinowski/tstochastic-neighbor-embedding

Improve data quality by discarding non-correlating, noisy dimensions

matplotlib-pyplot python scikit-learn t-sne

Last synced: 11 May 2026