An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely-used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/sarowarahmed/advertising-sales-app

📈 Advertising Sales Predictor: A web app powered by a Machine Learning model, built with Numpy, Pandas, Scikit-learn, and Streamlit, to forecast sales based on TV, Newspaper, and Online Advertising. Deployed on Streamlit Cloud for real-time, easy-to-use predictions.

advertising app machine-learning multiple-linear-regression numpy pandas sales scikit-learn streamlit

Last synced: 07 Feb 2026

https://github.com/danicaalana/wine-dataset-decision-tree

This project is developed as part of Digital Skill Fair (DSF) 35.0 - Data Science by Dibimbing. I am using Wine Recognition Dataset from scikit-learn, which is the results of a chemical analysis of wines grown in the same region in Italy by three different cultivators.

data data-analysis-python data-science decision-tree-classification machine-learning python scikit-learn wine-dataset

Last synced: 18 Apr 2026

https://github.com/max00358/sign_language_detection

A sign language detector that recognizes ASL(American Sign Language) alphabet

mediapipe opencv scikit-learn

Last synced: 09 Feb 2026

https://github.com/cego669/dirtycategoriesencoding

Repository containing two classes (StringAgglomerativeEncoder and StringDistanceEncoder) useful for grouping or visualizing the distance between dirty categorical variables. They are compatible with the scikit-learn API.

category clustering dimensionality-reduction dirty hierarchical-clustering machine-learning scikit-learn singular-value-decomposition svd

Last synced: 11 Feb 2026

https://github.com/mindkerchief/baselineml

A collection of machine learning task performed during my studies in computer science major in intelligent system.

decision-tree dummy gaussian-mixture-models kmeans-clustering linear-regression logistic-regression machine-learning matplotlib numpy pandas random-forest scikit-learn seaborn tensorflow

Last synced: 16 Apr 2026

https://github.com/mattia-hulathduwage/wine-quality-analyzer

A machine learning project that analyzes wine quality using clustering, regression, and classification techniques. The model predicts wine quality scores based on chemical properties and determines the most influential features affecting quality.

machine-learning matplotlib numpy pandas python scikit-learn seaborn

Last synced: 16 Apr 2026

https://github.com/silky-x0/spam-detector

An machine learning algorithm to detect spam emails or such.

jupyter-notebook nltk-python pandas python3 scikit-learn

Last synced: 16 Apr 2026

https://github.com/sanikamal/deep-learning-atoz

A collection of deep learning architectures ,model, code snippets, tips and mini projects.

computer-vision deep-learning nlp scikit-learn skimage tensorflow

Last synced: 16 Apr 2026

https://github.com/drorata/mnist-examples

ML examples for the MNIST dataset

machine-learning ml mnist python scikit-learn torch

Last synced: 19 Apr 2026

https://github.com/ejw-data/proj-food-inspections

Analyzing Chicago Food Inspection data for interesting insights by combining multiple data resources and performing feature engineering.

decision-trees pandas preprocessing python scikit-learn

Last synced: 17 Apr 2026

https://github.com/vaishnavis03/finlatics_ml_program

This repository contains the .ipynb files for 3 datasets, along with a PPT for each. The datasets included are Facebook Marketplace Data, Sales Prediction Data, and Wine Quality data.

correlation data-analysis data-science data-visualization knn linear-regression machine-learning matplotlib numpy pandas random-forest-classifier scikit-learn

Last synced: 17 Apr 2026

https://github.com/kheriberto/knn_project

This is a simple project that uses dummie data to practice and demonstrate my knowledge of the KNN algorithm.

data-analysis knn-classifier numpy python scikit-learn seaborn

Last synced: 02 Apr 2026

https://github.com/broodhoney/blue-book-for-bulldozers

This repository holds the project which solves a regression problem on predicting the futures sales of bulldozers. This is from a kaggle competition.

matplotlib numpy pandas python scikit-learn

Last synced: 02 Apr 2026

https://github.com/prashver/end-to-end-model-deployment-on-aws

Student Performance Analysis with Machine Learning analyzes factors impacting student outcomes using a robust machine learning pipeline. Achieving an impressive R2 score, it predicts student performance effectively. With extensive data preprocessing and deployment on AWS Elastic Beanstalk, it ensures scalability and high availability.

amazon-web-services aws-elastic-beanstalk end-to-end-deployment flask machine-learning-algorithms matplotlib numpy pandas scikit-learn seaborn

Last synced: 02 Apr 2026

https://github.com/nikhilgugwad/sentiment-analysis

Sentiment analysis for the Kannada language to classify Kannada sentences into different emotions.

numpy pandas scikit-learn

Last synced: 17 Apr 2026

https://github.com/mangesh-balkawade/pythonautomationsscripts

This is the repository which contains the python automations scripts and machine learning case studies , and Python Projects that I have write to learn automations and ML using python.

automation data-science machine-learning-algorithms matplotlib mongodb pandas python3 scikit-learn seaborn webscraping

Last synced: 13 Apr 2026

https://github.com/pablonunes/houseprediction

This a simple model to predict housing price in King County in Washingthon. Uses Scikit Learn, Numpy. Seaborn, Pandas, Scipy.

housing-data housing-prices scikit-learn scikitlearn-machine-learning seaborn

Last synced: 17 Apr 2026

https://github.com/shaharband/calcofi-oceanographic-analysis

This repository contains an analysis of the CalCOFI (California Cooperative Oceanic Fisheries Investigations) dataset, which represents one of the longest and most complete time series of oceanographic and larval fish data in the world.

pandas regression scikit-learn

Last synced: 10 May 2026

https://github.com/mnj-tothetop/english-handwritten-characters-recognizer

A handwritten english character recognizer [0-9, A-Z, a-z] made by using a Dataset of 3409 images. Tensorflow, Keras, Scikit-learn, and OpenCV was used to implement the Convolution Neural Network (CNN). Matplotlib and Seaborn were used to visualize the data.

artificial-intelligence convolutional-neural-networks keras matplotlib opencv-python scikit-learn seaborn tensorflow

Last synced: 18 Apr 2026

https://github.com/justsecret123/nba-players-stats-analysis

A quick interactive Notebook to visualize some NBA players stats (points, assists, steals, blocks...) and totals, rankings and comparisons. Feel free to add any player in the .csv data files. 🏀

csv ipython-notebook ipywidgets jupyter-notebook jupyterlab matplotlib pandas python scikit-learn seaborn

Last synced: 18 Apr 2026

https://github.com/sentinel-ml/sentinel_ai

Machine Learning Model to detect fraud in financial systems

ai python pytorch scikit-learn security security-tools tensorflow

Last synced: 04 Apr 2026

https://github.com/mnitin-reddy/a-b-testing-and-regression-analysis-for-ad-performance-optimization

Analyzed the performance of Facebook and AdWords ads using A/B testing and regression analysis to identify trends, correlations, and cost-effectiveness. Key insights included distribution of clicks and conversions, monthly trends, and cost-per-conversion analysis to optimize ROI.

abtesting data-science hypothesis-testing machine-learning matplotlib numpy pandas scikit-learn scipy seaborn statsmodels

Last synced: 04 Apr 2026

https://github.com/chengetanaim/high-school-alcoholism-and-academic-performance

Student Alcoholism and Academic Performance Data Analysis

jupyter-notebook scikit-learn

Last synced: 18 Apr 2026

https://github.com/sundanc/weatherprediction

This project implements a weather prediction system that predicts the temperature based on real-time weather data, including features like humidity, wind speed, and day-related features (day of the week, month

machine-learning machinelearning numpy pandas programming python scikit-learn scikitlearn-machine-learning weather-prediction

Last synced: 18 Apr 2026

https://github.com/hariprasath-v/machinehack-analytics-olympiad-2022

Create a machine learning model to help an insurance company understand which claims are worth rejecting and the claims which should be accepted for reimbursement.

catboost-classifier exploratory-data-analysis logloss machinehack numpy optuna pandas python scikit-learn shap

Last synced: 18 Apr 2026

https://github.com/simrandalal/semantic-book-recommender

A semantic content-based book recommender using sentence-transformer embeddings, cosine similarity, and a Streamlit interface.

dotenv huggingface-transformers nlp-machine-learning pandas python scikit-learn similarity-search streamlit

Last synced: 05 Apr 2026

https://github.com/deliprofesor/game-search-volume-prediction-machine-learning-models-and-forecasting

This repository uses machine learning models like Random Forest, XGBoost, LightGBM, and time-series forecasting with Prophet to predict game search volumes. Additionally, Grid Search is applied for hyperparameter tuning of the LightGBM model.

data-cleaning data-science data-visualization feature-selection forecasting-models game-search grid-search hyperparameter-tuning lightgbm machine-learning pandas prophet python random-forest scikit-learn time-series-analysis time-series-forecasting xgboost

Last synced: 18 Apr 2026

https://github.com/malick08012/heart-disease-prediction

A machine learning project that predicts the risk of heart disease based on patient health data. Includes data cleaning, EDA, visualization, model training, evaluation and feature importance analysis

artificial-intelligence heartdisease-prediction logistic-regression machine-learning python scikit-learn

Last synced: 18 Apr 2026

https://github.com/jeffandyalltogether/mlrecommendationsystem

project code for a recommendation system for Amazon using collaborative filtering, ranking, and matrix factorization to enhance customer satisfaction and product discovery.

eda matplotlib pandas python scikit-learn seaborn tensorflow

Last synced: 05 Apr 2026

https://github.com/yashrajgithub/crop-recommendation

KrishiGyaan is a web app designed to help farmers make informed decisions on crop selection. By analyzing soil and environmental factors, the app provides personalized crop recommendations, enhancing agricultural productivity and promoting sustainable farming practices.

api artificial-intelligence crop-recommendation-system data-preprocessing data-visualization json machine-learning-algorithms pickle python random-forest-classifier scikit-learn streamlit supervised-learning train-test-split user-interface

Last synced: 05 Apr 2026

https://github.com/emilyfelker/ieee_cis_fraud_detection

Which online transactions are fraudulent? Program that uses various machine learning algorithms to detect fraud.

decision-trees kaggle logistic-regression machine-learning neural-network pandas poetry pytest python scikit-learn sklearn tensorflow xgboost

Last synced: 05 Apr 2026

https://github.com/lexxai/goit_python_ds_hw_04

Модуль 4. Класифікація та оцінка роботи моделі. Лінійна регресія: перенавчання та регуляризація

lasso-regression linear-regression numpy pandas python red regression ridge-regression scikit-learn

Last synced: 05 Apr 2026

https://github.com/kheriberto/linear_regression_ecommerce

Simple project showcasing crafting a linear regression model with SciKit Learn

data-analysis jupyter-notebook linear-regression pandas python scikit-learn seaborn

Last synced: 19 Apr 2026

https://github.com/sulcer/fair-boost-regression

FairBoosting Regression

python scikit-learn

Last synced: 12 Jun 2026

https://github.com/prahaladhchandrahasan/housingprices_adavanced_regression

A machine learning model for "House Prices: Advanced Regression Techniques" kaggle competition.

machine-learning-algorithms matplotlib-pyplot numpy pandas python3 scikit-learn

Last synced: 20 Apr 2026

https://github.com/ghufranbarcha/linear-regression-training-app

This project is a Streamlit application that allows users to upload a CSV file, select variables, and train a linear regression model. The app provides an easy-to-use interface for selecting dependent and independent variables, scaling data, applying polynomial regression, and evaluating model performance.

data-science machine-learning python scikit-learn streamlit

Last synced: 20 Apr 2026

https://github.com/5hraddha/megaline-plan-recommendations

Megaline is a telecom operator and it offers its clients two prepaid plans, Surf and Ultimate.Megaline has found out that many of their subscribers use legacy plans. They want to develop a model that would analyze subscribers' behavior and recommend one of Megaline's newer plans: Smart or Ultra.

decision-tree-classifier logistic-regression random-forest-classifier scikit-learn supervised-learning

Last synced: 22 Apr 2026

https://github.com/sarangs1621/weather-prediction

Weather Prediction Using Machine Learning is a project that leverages machine learning algorithms to predict weather conditions based on historical data. It evaluates three popular ML models (Decision Tree, KNN, and Logistic Regression) and provides performance insights through metrics and visualizations.

data-analysis decision-tree jupyter-notebook knn logistic-regression machine-learning predictive-modeling python scikit-learn weather-prediction

Last synced: 25 Apr 2026

https://github.com/a-n-i-t-t-a/credit_card_fraud_detection

Fraudulent transactions are a growing concern in the financial sector, and leveraging machine learning can help detect anomalies in real-time. I built a Credit Card Fraud Detection System using the K-Nearest Neighbors (KNN) algorithm, trained on a dataset with key transaction patterns.

flask knn-classifier machine-learning pandas python scikit-learn

Last synced: 27 Apr 2026

https://github.com/mihirmakwana03/ci7521-cw1-notebook

Multi-class classification on imbalanced data — 8 sklearn classifiers + SMOTE + ROC-AUC benchmarking. Kingston CI7521 CW1.

classification hyperparameter-tuning imbalanced-data machine-learning scikit-learn smote

Last synced: 27 Apr 2026

https://github.com/sundanc/movierecommendation

Movie recommendation system based on user input. Built with Streamlit

movie-recommendation-app python scikit-learn scikitlearn-machine-learning streamlib

Last synced: 27 Apr 2026

https://github.com/davidrpugh/kaust-dsa-201

Course materials for KAUST DSA 201

deep-learning machine-learning pytorch scikit-learn

Last synced: 27 Apr 2026

https://github.com/hai4320/ml_ai_notebook

All my note about ML, AI and Data Science

ai machine-learning numpy pandas scikit-learn

Last synced: 28 Apr 2026

https://github.com/emmanuelletocs/steam-game-recommender

A powerful recommendation system for Steam games, combining Content-Based and Collaborative Filtering techniques. Built with Python, Scikit-learn, and Streamlit to deliver accurate, real-time game recommendations. Perfect for gamers and data scientists interested in building intelligent recommendation engines.

als-algorithm data-analysis gaming-industry knn machine-learning mds mysql ncf neural-network pyspark recommendation-engine recommendation-system scikit-learn spark

Last synced: 28 Apr 2026

https://github.com/findthehead/pentestpayload

A KNN algorithm based Web Application Payload search and modification engine with a nice red FLASK based GUI

knn-classification knn-regression machine-learning pentest-tool scikit-learn websecurity

Last synced: 28 Apr 2026

https://github.com/arnab-0053/song-identifier

It identifies songs and artists from lyric snippets using two distinct methods - simple NLP based approach and BM25(Best Match 25) approach.

bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing

Last synced: 28 Apr 2026

https://github.com/rakibhhridoy/customersegmentation-clustering

Customer segmentation heavily use in business purpose. It is needed skill for business intelligence and applied machine learning engineer. This represent quite basic way the customer segmentation is done. In python the task is quite easy to do.

agglomerative-clustering clustering-algorithm customer ecommerce kmeans-clustering machine-learning scikit-learn scikitlearn-machine-learning segmentation unsupervised-learning unsupervised-machine-learning

Last synced: 28 Apr 2026

https://github.com/belsabbagh/employee-turnover-and-customer-churn-classification

A data science project that tests mutliple models on an employee tunronver and customer churn problem

machine-learning pandas python scikit-learn

Last synced: 28 Apr 2026

https://github.com/jarif87/text-key-extractor

A Django web app that uses TF-IDF to extract keywords from text, featuring a modern, responsive UI with animated gradients and glassmorphism.

django-application keywords-extraction pandas python scikit-learn

Last synced: 29 Apr 2026

https://github.com/m-muecke/text-normalizer

Text normalizer integration for sklearn.pipeline.Pipeline class

nlp nltk python scikit-learn

Last synced: 29 Apr 2026

https://github.com/nahom32/mlp-assignment

This repository is an implementation for machine learning assignment demonstrating the machine learning process.

eda logistic-regression machine-learning scikit-learn

Last synced: 29 Apr 2026

https://github.com/henriqueotogami/imersao-dados-3-alura

Terceira edição da Imersão Dados da Alura (03 a 07/05/21). O projeto dessa edição foi inspirado em um desafio do Laboratory Innovation Science at Harvard disponibilizado no Kaggle.

alura bioinformatics data-science drug-discovery google-collab harvard-university imersaodados jupyter-notebook kaggle-challenge laboratory-innovation-science matplotlib pandas python3 scikit-learn seaborn

Last synced: 29 Apr 2026

https://github.com/diestok/bmlb2025

Material for the BMLB2025 course

classification keras learning machine regression scikit-learn

Last synced: 29 Apr 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026

https://github.com/tbarlow12/learn-it-your-way

Using Python Flask, I wanted to create a simple web API that allows users to upload a dataset, choose one or more models, store them server side, and then hit an endpoint to get a prediction.

flask machine-learning python scikit-learn tensorflow

Last synced: 29 Apr 2026

https://github.com/jarif87/tune-popularity-app

Flask web app to predict song popularity using CatBoost. Enter five song features for instant predictions. Modern, responsive UI, no CSRF for development.

catboost-classifier eda flask-application matplotlib-python music-classification python scikit-learn seaborn

Last synced: 30 Apr 2026

https://github.com/boladjivinny/fire-prediction

Notebook for the Fire fighting using data on Zindi. Ranked number 5 on the public leaderboard and 8 on the private leaderboard. https://zindi.africa/hackathons/cmu-africa-fighting-fire-with-data

feature-engineering hackhathon machine-learning regression scikit-learn stacking

Last synced: 30 Apr 2026

https://github.com/fadlani-aditya/iris-plant-classification

This project focuses on classifying different species of Iris flowers using the Random Forest algorithm. The dataset, sourced from Scikit-learn, contains four key features: sepal length, sepal width, petal length, and petal width, which are used to predict the flower species (Setosa, Versicolor, and Virginica).

agriculture data-science iris-dataset machine-learning python scikit-learn supervised-learning

Last synced: 01 May 2026

https://github.com/deepthipathlawath20/emotion-recognition-bimodal

Bimodal emotion recognition (face + speech) with feature-level fusion and classic ML classifiers.

audio computer-vision emotion-recognition knn mfcc multimodal navie-bayes-algorithm python scikit-learn svm tensorflow

Last synced: 01 May 2026

https://github.com/barbarahayd/com410-ml

atividades aula machine learning

decision-tree scikit-learn

Last synced: 01 May 2026

https://github.com/luthfiwulandari/machine-learning-breast-cancer

This project is a simple application that uses logistic regression to detect breast cancer. It classifies tumors as either malignant or benign based on the dataset provided by Scikit-learn.

datascience jupyter logistic-regression machine-learning python scikit-learn

Last synced: 01 May 2026

https://github.com/dhruvv1402/spam-detection-python-

This project is a Spam Detection System built using Python. It classifies SMS messages as spam or ham (not spam) using machine learning techniques.

countvectorizer kaggle-dataset nlp-machine-learning nltk numpy pandas python scikit-learn supervised-machine-learning tf-idf

Last synced: 01 May 2026

https://github.com/dmschauer/aws-sagemaker-deployment-test

I did a simple test to see how deploying a machine learning model on AWS Sagemaker and thus turning it into an API works. Since scikit-learn models require less dependencies than e.g. TensorFlow models I went with them for this test. To do so I used a tutorial.

aws boto3 python sagemaker scikit-learn

Last synced: 02 May 2026

https://github.com/pierrekieffer/datapreprocessing

Custom data preprocessing library made for machine learning

data-preparation data-preprocessing machine-learning preprocessing scikit-learn

Last synced: 02 May 2026

https://github.com/viniciusds2020/ml_pycaret_classificacao

Sistema de preprocessamento e treinamento de modelos de machine learning utilizando PyCaret. Uma metodologia low-code para processos de MLops

machine-learning mlops preprocessing pycaret python scikit-learn

Last synced: 03 May 2026

https://github.com/alessandromonolo/fraud-detection-binary-classification-model

This project builds a machine learning model to classify fraudulent clients using a banking dataset. Data preprocessing, statistical analysis, and feature selection were performed before training KNN and Random Forest Classifier. Model performance was evaluated using accuracy, precision, recall, and F1-score.

classification-model fraud-detection knn-classification machine-learning pandas python random-forest scikit-learn statistical-analysis

Last synced: 03 May 2026

https://github.com/arrhythmia-detection/authorprovidedfeaturescombineddt

Deploys a vanilla Decision Tree for Arrhythmia classification using Chapman ECG dataset on Arduino UNO board

arduino-uno arrhythmia-classification atmega328p chapman-ecg decision-tree-classifier eloquent scikit-learn

Last synced: 09 Jun 2026

https://github.com/srisaihariharan/mic_sentiment_analysis_v

Sentiment analysis of IMDb movie reviews using Python, Scikit-learn, and TF-IDF.

machine-learning natural-language-processing nlp python scikit-learn sentiment-analysis sentiment-classification

Last synced: 03 May 2026

https://github.com/abdiasarsene/predictive-churn-management-data-driven-customer

Use unsupervised learning techniques to segment a company’s customers into distinct groups in order to personalize marketing campaigns. To ultimately propose specific marketing strategies for each customer segment based on the insights obtained.

acp kmeans-clustering matplotlib pandas plotly python scikit-learn seaborn

Last synced: 03 May 2026

https://github.com/pramodyasahan/binary-classifier

This repository houses the code for a machine learning model designed to predict customer churn. The model is built using Support Vector Machine (SVM) from the scikit-learn library and incorporates preprocessing, pipeline, and grid search techniques for optimal performance.

numpy pandas scikit-learn

Last synced: 03 May 2026

https://github.com/atchayaah/home-value-insights-kc

Data-driven project predicting King County housing prices using EDA, regression models, and ML techniques, developed as part of IBM’s Data Analysis with Python course on Coursera.

joblib matplotlib numpy pandas pickle python scikit-learn seaborn

Last synced: 03 May 2026

https://github.com/samarth4023/shell-internship-2

🤖 AICTE Shell Internship - NLP Chatbot This repository contains the implementation of a Chatbot using NLP, developed as part of the AICTE Shell Internship. The chatbot is designed to understand and respond to user queries using Natural Language Processing (NLP) techniques.

ai artificial-intelligence chatbot natural-language-processing nlp nltk python scikit-learn streamlit

Last synced: 04 May 2026

https://github.com/baponkar/scikit-logisticregression-application

A simple and detail application analysis of sci kit learn LogisticRegression model .

classification-algorithm logistic-regression machine-learning python3 scikit-learn

Last synced: 04 May 2026

https://github.com/homebackend/pdf-title-page-splitter

Splits a pdf based on identified title pages using ML trained model

machine-learning opencv pdf-splitter pdf2image pypdf2 scikit-learn tensorflow

Last synced: 04 May 2026

https://github.com/satvikpraveen/sklearn-mastery

Enterprise-grade ML framework showcasing advanced Scikit-Learn implementations with production-ready pipelines, algorithm-optimized synthetic data generation, comprehensive evaluation suite with statistical testing, custom transformers, ensemble methods, and real-world industry applications across healthcare, finance, and manufacturing domains.

artificial-intelligence ci-cd classification custom-transformers data-science docker ensemble-methods feature-engineering fintech fraud-detection healthcare-ai hyperparameter-tuning jupyter-notebooks machine-learning mlops model-evaluation pipeline-architecture predictive-maintenance python scikit-learn

Last synced: 04 May 2026

https://github.com/bhawnamehbubani/airline-passenger-referral-program-development-with-classification-techniques

Prediction of airline passenger referrals using Logistic Regression, GridSearchCV, and TF-IDF vectorization with Python, Pandas, Scikit-learn, and Excel.

excel gridsearchcv logistic-regression pandas python3 scikit-learn tf-idf-vectorization

Last synced: 04 May 2026