An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/sulcer/fair-boost-regression

FairBoosting Regression

python scikit-learn

Last synced: 12 Jun 2026

https://github.com/prahaladhchandrahasan/housingprices_adavanced_regression

A machine learning model for the "House Prices: Advanced Regression Techniques" Kaggle competition.

machine-learning-algorithms matplotlib-pyplot numpy pandas python3 scikit-learn

Last synced: 20 Apr 2026

https://github.com/bruceunx/ai-simulator

aiplayground: an artificial intelligence learning playground

ai maching-learning scikit-learn

Last synced: 20 Apr 2026

https://github.com/tryomar/data-miner

DataMiner is an interactive web application for data mining and machine learning. It helps users upload, clean, transform, and analyze datasets while building predictive models — all through a simple and powerful Streamlit interface.

data-cleaning data-mining data-preprocessing data-science data-visualization interactive-dashboards pandas python scikit-learn streamlit

Last synced: 20 Apr 2026

https://github.com/sayan-mondal2022/mlops-assignment

A project for validating machine learning models.

machine-learning scikit-learn streamlit

Last synced: 22 Apr 2026

https://github.com/waikato-datamining/spectral-data-converter-sklearn

Scikit-learn plugins for the spectral-data-converter library.

kasperl scikit-learn sdc seppl spectral-data

Last synced: 24 Apr 2026

https://github.com/hoccyy/house-price-prediction

Machine learning model built with Scikit-learn to predict house prices based on various features.

linear-regression machine-learning ml pickle prediction-model scikit-learn scikitlearn-machine-learning

Last synced: 24 Apr 2026

https://github.com/leolion3/smartnanotubes-smellinspector-companion

Companion software for the SmellInspector Devices from SmartNanoTubes. Allows specifying substances, connecting multiple devices, collecting data and performing machine learning.

docker machine-learning python3 reactjs scikit-learn smartnanotubes smellinspector

Last synced: 27 Apr 2026

https://github.com/tillscode/personal-finance-ml-analysis

Machine learning analysis of personal financial data with predictive modeling and interactive dashboard

dashboard data-analysis finance machine-learning python scikit-learn

Last synced: 28 Apr 2026

https://github.com/dwade-eng/amazon-product-recommender-prototype-

This project is a content-based product recommendation engine inspired by Amazon's "Customers who viewed this item also viewed" feature. It uses a dataset of product metadata and user interactions to suggest similar items based on product titles, brands, and categories using TF-IDF vectorization and cosine similarity.

html numpy pandas python3 scikit-learn

Last synced: 28 Apr 2026

https://github.com/rajivaleaakash/customer-churn-prediction

A machine learning project focused on predicting customer churn using various data analysis and modeling techniques. The repository includes data preprocessing, feature engineering, exploratory data analysis (EDA), model training, evaluation, and visualization to help businesses identify customers at risk of leaving.

churn-prediction classification customer-churn data-analysis data-science gridsearchcv imblearn machine-learning numpy pandas pyhton randomsearchcv scikit-learn

Last synced: 28 Apr 2026

https://github.com/findthehead/pentestpayload

A KNN-based web application payload search and modification engine with a nice red Flask-based GUI.

knn-classification knn-regression machine-learning pentest-tool scikit-learn websecurity

Last synced: 28 Apr 2026

https://github.com/catcoder27/ai-portfolio

Reusable ML scaffold: notebooks, model cards, reports

data-science kaggle machine-learning pandas scikit-learn

Last synced: 28 Apr 2026

https://github.com/incalculable-driverslicence975/data-projects-portfolio

📊 Showcase data projects that highlight analytics, machine learning, and MLOps with reproducible code and clear business insights.

ai computer-vision dashboard data-science-projects data-visualization deep-learning etl excel finance hadoop hiveq keras machine-learning nlp pandas portfolio-project scikit-learn tableau-dashboards

Last synced: 28 Apr 2026

https://github.com/arnab-0053/song-identifier

It identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.

bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing

Last synced: 28 Apr 2026

https://github.com/razalkr70/customer-segmentation-using-dataset

A data science project that segments mall customers using K-Means clustering. Based on age, income, and spending score, it identifies customer groups and visualizes them with 2D and 3D plots for targeted marketing insights.

clustering customer-segmentation data-science data-visualization kmeans machine-learning pca python scikit-learn

Last synced: 28 Apr 2026

https://github.com/alessine/predicting_pirate_attack_success

Using machine learning to predict the success or failure of pirate attacks; developed during the Data Science Bootcamp at Propulsion Academy.

bokeh fine-tuning interactive-visualizations machine-learning modelling overfitting plotly prediction scikit-learn

Last synced: 28 Apr 2026

https://github.com/michaelzheng67/ml_classification_optimizer

Algorithm that determines the best machine learning classification model to use for a given dataset. Written in Python.

classification machine-learning python scikit-learn

Last synced: 29 Apr 2026

https://github.com/thekartikeyamishra/interactive-ai-model-trainer

A Google Colab notebook for interactively training simple AI/ML classification models. Features CSV upload, dummy data generation, feature/target selection, model choice, and basic evaluation. Includes a user-friendly UI. Ideal for educational purposes. See screenshots below!

google googlecolab ipywidgets matpl numpy pandas python scikit-learn seaborn

Last synced: 29 Apr 2026

https://github.com/skypse/santander-coders-data_science-course

Data Science course offered by Santander, using Python!

jupyter-notebook numpy pandas-python python scikit-learn

Last synced: 29 Apr 2026

https://github.com/jarif87/text-key-extractor

A Django web app that uses TF-IDF to extract keywords from text, featuring a modern, responsive UI with animated gradients and glassmorphism.

django-application keywords-extraction pandas python scikit-learn

Last synced: 29 Apr 2026

https://github.com/vaishnavijain25/pca-based-digit-classification

A machine learning project that uses Principal Component Analysis (PCA) for dimensionality reduction and Logistic Regression for classifying handwritten digit images from the scikit-learn digits dataset.

digit-recognition dimensionality-reduction image-classification logistic-regression machine-learning pca-analysis scikit-learn

Last synced: 29 Apr 2026

https://github.com/fx31337/predict_zigzag

Prototype code to predict zigzag pattern prices.

machine-learning ml scikit-learn

Last synced: 29 Apr 2026

https://github.com/adnanrahin/sentiment_classification_logistic_regeression

Sentiment Analysis extracts subjective information in the source material. It's widely used in modern business, to understand the business module, product quality and consumer point of view regarding the products or the business.

logistic-regression machine-learning natural-language-processing preprocessing python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/gustaminas/ai_primer---flatland

A project from the AI_primer course at Vilnius University.

cnn-keras data-augmentation data-mixup dropout-keras scikit-learn shape-classification

Last synced: 29 Apr 2026

https://github.com/nahom32/mlp-assignment

This repository implements a machine learning assignment that demonstrates the machine learning process.

eda logistic-regression machine-learning scikit-learn

Last synced: 29 Apr 2026

https://github.com/pdoup/ml-codes

Python source files and notebooks for the Machine Learning course weekly tasks

machine-learning scikit-learn

Last synced: 29 Apr 2026

https://github.com/andreaschatzopoulos/face-landmark-detector

Facial landmark detection using HOG features and Ridge Regression. Simple, effective, and fast – no deep learning required.

computer-vision face-detection hog image-processing landmark-detection python ridge-regression scikit-learn

Last synced: 29 Apr 2026

https://github.com/karimosman89/energy-consumption-forecasting

Predict future energy consumption based on historical data. Create a model that predicts energy consumption in households or businesses to optimize energy distribution and reduce costs. Assist energy companies in planning and managing supply efficiently.

arima lstm matplotlib pandas python scikit-learn

Last synced: 29 Apr 2026

https://github.com/matheusvazdata/retail-sales-forecast-linreg-sklearn

Minimal project for retail sales forecasting using linear regression (scikit-learn).

forecasting linear-regression machine-learning matplotlib numpy pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/henriqueotogami/imersao-dados-3-alura

Third edition of Alura's Imersão Dados (03 to 07/05/21). This edition's project was inspired by a challenge from the Laboratory Innovation Science at Harvard made available on Kaggle.

alura bioinformatics data-science drug-discovery google-collab harvard-university imersaodados jupyter-notebook kaggle-challenge laboratory-innovation-science matplotlib pandas python3 scikit-learn seaborn

Last synced: 29 Apr 2026

https://github.com/karmaniket/gtavcontrol

Created a dataset of different hand gestures and trained an ML model for real-time in-game control of GTA V. Have fun!

gaming gta5 machine-learning mediapipe opencv python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/mertafacan/fertilizer-prediction-kaggle-playground-s05e06

Top 9% in Kaggle Playground Series - Predicting Optimal Fertilizers - Season 5, Episode 6

catboost kaggle kaggle-competition machine-learning optuna scikit-learn xgboost

Last synced: 29 Apr 2026

https://github.com/mukeshthenraj/fraud-detection-model

Logistic Regression, Grid Search, and ROC-PR curve evaluation on a fraud detection dataset.

classification fraud-detection machine-learning numpy pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/diestok/bmlb2025

Material for the BMLB2025 course

classification keras learning machine regression scikit-learn

Last synced: 29 Apr 2026

https://github.com/xbants/recommendation-api

🎬 Intelligent movie recommendation system with FastAPI backend, Streamlit frontend, and collaborative filtering ML. Rate movies, get personalized suggestions, and enjoy automatic model retraining.

fastapi machine-learning movie-recommedation python3 scikit-learn streamlit

Last synced: 29 Apr 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026

https://github.com/tasnimtalha09/la-crime-analysis-from-2020-to-2025

As part of an academic project, this analysis digs into the crime statistics of the Los Angeles Police Department (LAPD) from 2020 to 2025.

jupyter jupyter-notebook jupyter-notebooks machine-learning matplotlib pandas python python-3 python3 scikit-learn seaborn sklearn

Last synced: 29 Apr 2026

https://github.com/nikhil-donthusaram/loanapprovalprediction-randomforest

A machine learning web app built using Random Forest Classifier to predict whether a loan will be approved or not based on applicant details. Built with Python, Streamlit, and scikit-learn.

classification jupyter-notebook machine-learning python random-forest scikit-learn streamlit vscode

Last synced: 29 Apr 2026

https://github.com/hexbyte-lab/resumatch

AI-powered resume-to-job matching tool with NLP analysis | Python + Flask + Machine Learning

cosine-similarity flask job-search machine-learning nltk portfolio-project python resume scikit-learn tfidf

Last synced: 29 Apr 2026

https://github.com/tbarlow12/learn-it-your-way

Using Python Flask, I wanted to create a simple web API that allows users to upload a dataset, choose one or more models, store them server side, and then hit an endpoint to get a prediction.

flask machine-learning python scikit-learn tensorflow

Last synced: 29 Apr 2026

https://github.com/jarif87/dna-based-identification-of-e.coli

Django web app predicting E. coli in DNA sequences using a machine learning model, with a responsive interface and client-side validation. Files generated by project.py.

classification django-application dna-sequences html-css-javascript mlp-classifier python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/abhinav330/instagram-influencers-analysis

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.

data data-science data-visualization exploratory-data-analysis exploratory-data-visualizations influncer-products instagram scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/rishi-sutar/healwise-ai-your-way-to-wellness

Healwise-AI is a health diagnostic tool that uses a Support Vector Classifier (SVC) model to predict diseases based on user-reported symptoms. After predicting, it offers detailed health advice, including descriptions, diets, medications, and workouts related to the diagnosis.

machine-learning scikit-learn support-vector-machine

Last synced: 30 Apr 2026

https://github.com/jarif87/tune-popularity-app

Flask web app to predict song popularity using CatBoost. Enter five song features for instant predictions. Modern, responsive UI, no CSRF for development.

catboost-classifier eda flask-application matplotlib-python music-classification python scikit-learn seaborn

Last synced: 30 Apr 2026

https://github.com/sjain2580/simple-linear-regression-model

This project demonstrates a simple, yet robust, multiple linear regression model built with Python and scikit-learn to predict median house values in California.

joblib linear-regression matplotlib matplotlib-pyplot numpy python scikit-learn

Last synced: 30 Apr 2026

https://github.com/ledsouza/machine-learning-semisupervisionado

This project uses semi-supervised machine learning algorithms to classify milk quality as high, medium, or low.

data-science joblib machine-learning machine-learning-algorithms pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/das-debjit/emotion-detection

A simple ML-powered web app for real-time emotion detection from text using Streamlit and TF-IDF-based classification.

machine-learning nlp python scikit-learn sentiment-analysis streamlit text-classification tfidf web-app

Last synced: 30 Apr 2026

https://github.com/andrewjmack/cryptoclustering

The purpose of this project is to use Python and unsupervised learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes. Methods for analysis include K-Means clustering and dimensionality reduction through Principal Component Analysis ("PCA").

jupyter-notebook pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/maguids/supervised-learning---video-games

This project consists of exploratory data analysis and the application of supervised learning models for classification using a Video Games dataset. Second Semester of the First Year of the Bachelor's Degree in Artificial Intelligence and Data Science.

jupyter-notebook machine-learning matplotlib numpy pandas scikit-learn seaborn supervised-learning

Last synced: 30 Apr 2026

https://github.com/rokuu010/boxing-match-predictor

Machine learning project to predict the outcomes of pro boxing matches using Dataset/web-scraped data

boxing data-science machine-learning prediction-model python scikit-learn selenium sports-analytics

Last synced: 30 Apr 2026

https://github.com/pramodyasahan/grade-predictor

This project aims to predict student performance based on various features such as job, study time, failures, absences, and first and second period grades. The project utilizes a linear regression model from the scikit-learn library in Python.

machine-learning matplotlib numpy pandas python regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/smakde/learning-resource-recommender

A lightweight recommender that helps you discover your next learning resource. It blends patterns from similar users with content keywords, and explains each suggestion in the UI.

als content-based-filtering evaluation-metrics explainable-ai hybrid-recommender implicit-feedback implicit-lib lightfm logistic-matrix-factorization mapk matrix-factorization ndcg pandas precision-at-k python recommender-system scikit-learn streamlit tf-idf top-n-recommendations

Last synced: 30 Apr 2026

https://github.com/sayed-ashfaq/delhivery-dataanalysis

In this project, I conducted basic analysis, feature engineering, normalization, and outlier handling, along with statistical and non-parametric testing to extract insights.

feature-engineering normalization outlier-detection pandas python scikit-learn statistcal-tests statistical-analysis

Last synced: 30 Apr 2026

https://github.com/moritzkoerber/tune_preprocessing_algos

Files for this blog post: https://moritzkoerber.github.io/python/tutorial/2019/11/18/blogpost/

cross-validation hyperparameter-tuning machine-learning python scikit-learn

Last synced: 30 Apr 2026

https://github.com/tinaland101/credit-risk-classification

The purpose of this project is to build a credit risk classification model using machine learning techniques. This model helps identify the creditworthiness of borrowers based on historical lending data. Specifically, it uses a logistic regression model to predict whether a loan is healthy (0) or high-risk (1).

numpy pandas pathlib scikit-learn

Last synced: 30 Apr 2026

https://github.com/fikri-rouzan/student-stress-levels-classification

A machine learning modeling project to classify university students' stress levels based on academic and psychological input parameters.

joblib jupyter-notebook matplotlib numpy pandas python scikit-learn seaborn streamlit

Last synced: 08 Jun 2026

https://github.com/fikri-rouzan/burnaway-capstone-data-science

An interactive analytics dashboard for mapping the physical factors and work patterns that trigger burnout in software developers.

jupyter-notebook matplotlib pandas pillow plotly python scikit-learn seaborn statsmodels streamlit

Last synced: 08 Jun 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/abhivur/connections-ai

Contributors: Meet Gamdha, Gaurav Nimmagadda

bert python scikit-learn word2vec

Last synced: 30 Apr 2026

https://github.com/kumailn/machinelearning

Machine learning with Python

machine-learning python scikit-learn tensorflow

Last synced: 30 Apr 2026

https://github.com/dharma-acha/explanability_in_deepneuralnetworks

Our project aims to enhance the transparency and trustworthiness of the VGG model in critical fields like healthcare imaging and self-driving cars. By integrating explainability methods into the VGG model for image classification, we will clarify its decision-making process.

colab-notebook matplotlib numpy pandas scikit-learn seaborn

Last synced: 30 Apr 2026

https://github.com/fbarffmann/credit-risk-classification

Classified 19,000+ loans as high-risk or healthy using logistic regression. Achieved 100% precision for healthy loans and 84% precision for high-risk loans.

classification credit-risk data-analysis logistic-regression machine-learning model-evaluation pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/boladjivinny/fire-prediction

Notebook for the "Fighting Fire with Data" hackathon on Zindi. Ranked number 5 on the public leaderboard and 8 on the private leaderboard. https://zindi.africa/hackathons/cmu-africa-fighting-fire-with-data

feature-engineering hackhathon machine-learning regression scikit-learn stacking

Last synced: 30 Apr 2026

https://github.com/themihirmathur/mihir-clickpost-data-science-intern-round-1-assignment-submission

The objective of this project is to predict the predicted_exact_sla, which is the number of days between the shipment and delivery of an order, using historical shipment data.

data-science machine-learning pandas python random-forest-regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/harshitwaldia/disease_detection

A disease detection system using Random Forest Classifier and GUI in Python, identifying illnesses based on user symptoms.

pandas-python python3 random-forest-classifier scikit-learn tkinter-gui

Last synced: 01 May 2026

https://github.com/fadlani-aditya/iris-plant-classification

This project focuses on classifying different species of Iris flowers using the Random Forest algorithm. The dataset, sourced from Scikit-learn, contains four key features: sepal length, sepal width, petal length, and petal width, which are used to predict the flower species (Setosa, Versicolor, and Virginica).

agriculture data-science iris-dataset machine-learning python scikit-learn supervised-learning

Last synced: 01 May 2026

https://github.com/myahninsi/customer-segmentation-recommendation-ml

This project addressed challenges in understanding customer behavior and personalizing shopping experiences for an e-commerce platform. Developed ML solutions including K-Means clustering for segmentation, Random Forest regression for CLV prediction, and collaborative filtering for product recommendations.

collaborative-filtering k-means-clustering pandas python random-forest scikit-learn

Last synced: 01 May 2026

https://github.com/himanshugoyal77/shell-detection-frontend

Fraud detection of companies using machine learning and Django.

django scikit-learn

Last synced: 01 May 2026

https://github.com/arturovaine/n8n-nodes-sklearn

Custom n8n nodes for integrating scikit-learn machine learning algorithms into your n8n workflows.

machine-learning n8n n8n-nodes scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/deepthipathlawath20/emotion-recognition-bimodal

Bimodal emotion recognition (face + speech) with feature-level fusion and classic ML classifiers.

audio computer-vision emotion-recognition knn mfcc multimodal navie-bayes-algorithm python scikit-learn svm tensorflow

Last synced: 01 May 2026

https://github.com/kristishqau/sentimentanalysis_nlp

A project for sentiment analysis of tweets using various NLP techniques and machine learning models.

datascience jupyter-notebook machine-learning nlp nltk python scikit-learn sentiment-analysis xgboost

Last synced: 01 May 2026

https://github.com/barbarahayd/com410-ml

Machine learning class activities

decision-tree scikit-learn

Last synced: 01 May 2026

https://github.com/antonio-f/housing-simplemlexample

Basic example with California Housing Prices dataset from the StatLib repository using scikit-learn

housing-simplemlexample machine-learning scikit-learn simple

Last synced: 01 May 2026

https://github.com/luthfiwulandari/machine-learning-breast-cancer

This project is a simple application that uses logistic regression to detect breast cancer. It classifies tumors as either malignant or benign based on the dataset provided by Scikit-learn.

datascience jupyter logistic-regression machine-learning python scikit-learn

Last synced: 01 May 2026

https://github.com/jlee9503/medical-readmission

Conduct an analysis of medical readmission status using hospital patient data and the Social Determinants of Health dataset. Identify key factors influencing readmission rates to provide insights for improving healthcare outcomes.

python random-forest-regression scikit-learn tableau

Last synced: 01 May 2026

https://github.com/dhruvv1402/spam-detection-python-

This project is a Spam Detection System built using Python. It classifies SMS messages as spam or ham (not spam) using machine learning techniques.

countvectorizer kaggle-dataset nlp-machine-learning nltk numpy pandas python scikit-learn supervised-machine-learning tf-idf

Last synced: 01 May 2026