An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/chengetanaim/high-school-alcoholism-and-academic-performance

Student Alcoholism and Academic Performance Data Analysis

jupyter-notebook scikit-learn

Last synced: 18 Apr 2026

https://github.com/eugen-goebel/predictive-analytics-agent

Automated ML pipeline — data profiling, preprocessing, model training, and evaluation report generation

automation data-science docker machine-learning predictive-analytics python scikit-learn streamlit

Last synced: 05 Apr 2026

https://github.com/hariprasath-v/machinehack-analytics-olympiad-2022

Create a machine learning model to help an insurance company understand which claims are worth rejecting and which should be accepted for reimbursement.

catboost-classifier exploratory-data-analysis logloss machinehack numpy optuna pandas python scikit-learn shap

Last synced: 18 Apr 2026

https://github.com/alezoon/movie-revenue-prediction

scikit-learn practice using linear regression, plus ML workflow practice.

jupyter machine-learning matplotlib-pyplot numpy pandas python scikit-learn

Last synced: 05 Apr 2026

https://github.com/simrandalal/semantic-book-recommender

A semantic content-based book recommender using sentence-transformer embeddings, cosine similarity, and a Streamlit interface.

dotenv huggingface-transformers nlp-machine-learning pandas python scikit-learn similarity-search streamlit

Last synced: 05 Apr 2026

https://github.com/manalisbhavsar/mall-customers-clustering

Applies K-Means clustering to mall customer data, segmenting customers based on their annual income and spending score to identify patterns and group customers for targeted marketing.

data-analysis data-visualization matplotlib numpy pandas python scikit-learn

Last synced: 18 Apr 2026

https://github.com/naren1704/ml-approach-for-employee-performance-prediction

A Flask UI that predicts employee performance using a trained XGBoost model.

css flask html python scikit-learn xgboost

Last synced: 05 Apr 2026

https://github.com/barek2k2/ml_ruby

A Ruby gem that uses machine learning (ML) techniques to make predictions and classifications, powered by Python 3 under the hood.

artificial-intelligence data-science machine-learning pandas prediction python3 ruby ruby-on-rails scikit-learn

Last synced: 05 Apr 2026

https://github.com/emilyfelker/ieee_cis_fraud_detection

Which online transactions are fraudulent? Program that uses various machine learning algorithms to detect fraud.

decision-trees kaggle logistic-regression machine-learning neural-network pandas poetry pytest python scikit-learn sklearn tensorflow xgboost

Last synced: 05 Apr 2026

https://github.com/oadultradeepfield/galaxy10-anomaly-detection

A public API and experimental PyTorch pipeline for anomaly detection in the Galaxy10 DECals dataset using ResNet50, autoencoders, and clustering techniques

flask google-cloud-run kaggle pytorch scikit-learn

Last synced: 05 Apr 2026

https://github.com/nowon1/insurance-claim-prediction_version

This project aims to predict the insurance claim amounts based on various customer attributes using machine learning techniques. The project involves data preprocessing, exploratory data analysis, feature engineering, and model training and evaluation.

data-preprocessing data-science data-visualization exploratory-data-analysis feature-engineering insurance jupyter-notebook machine-learning numpy pandas predictive-modeling python random-forest regression-analysis scikit-learn

Last synced: 05 Apr 2026

https://github.com/perpendicooler/elementary-research-for-steamboat-willie-s-store-in-poland

Elementary research for a company opening a store in a city, using Gurobi and PuLP optimization.

christofides-algorithm gurobipy numpy pandas pulp python3 scikit-learn travelling-salesman-problem

Last synced: 05 Apr 2026

https://github.com/lorenzorottigni/ml-movies

Machine learning Python bootcamp: recommender systems on a movies dataset

ipynb machine-learning numpy pandas python recommender-system scikit-learn seaborn

Last synced: 05 Apr 2026

https://github.com/thekartikeyamishra/ai-customer-feedback-summarizer

The AI Customer Feedback Summarizer is a Python-based application that processes customer feedback, extracts insights, and summarizes reviews. The basic version uses extractive summarization techniques, and the advanced version adds sentiment analysis, visualization, and industry-specific fine-tuning.

ai chatbot gpt machine-learning matplotlib nltk pandas python scikit-learn streamlit

Last synced: 18 Apr 2026

https://github.com/vijaykumarr1452/black_friday_sales_analysis

A Python machine learning project for Black Friday sales analysis, using pandas and scikit-learn for data preprocessing, model training, and performance evaluation.

confusion-matrix jupyter-notebook machine-learning pandas python random-forest-classifier sales-analysis scikit-learn

Last synced: 19 Apr 2026

https://github.com/kheriberto/linear_regression_ecommerce

A simple project showcasing how to build a linear regression model with scikit-learn

data-analysis jupyter-notebook linear-regression pandas python scikit-learn seaborn

Last synced: 19 Apr 2026

https://github.com/yassin522/heartbeat-categorization

This project is aimed at developing a machine learning model that can accurately classify heartbeats as either normal or abnormal. The model is trained on a dataset of ECG (electrocardiogram) signals, which were collected from patients and labeled by medical professionals.

cnn deep-learning keras machine-learning scikit-learn tensorflow

Last synced: 20 Apr 2026

https://github.com/sulcer/fair-boost-regression

FairBoosting Regression

python scikit-learn

Last synced: 12 Jun 2026

https://github.com/vyjayanthipolapragada/car_mileage_prediction

Predicting the mileage of a car using a linear regression model with scikit-learn

kaggle-titanic linear-regression machine-learning numpy pandas predictive-modeling python scikit-learn

Last synced: 20 Apr 2026

https://github.com/namratha2301/carprice_analysisandprediction

This project analyzes factors influencing vehicle prices using a dataset of various attributes, including Engine capacity, Power, Mileage, and Seating capacity.

data-analysis data-visualization exploratory-data-analysis machine-learning pandas predictive-modeling random-forest-classifier regression scikit-learn seaborn

Last synced: 20 Apr 2026

https://github.com/bruceunx/ai-simulator

aiplayground: an artificial intelligence learning playground

ai maching-learning scikit-learn

Last synced: 20 Apr 2026

https://github.com/grandechowhiskey/harvard-cs50-ai-projects

This project contains a collection of programming assignments from CS50’s Introduction to Artificial Intelligence with Python course.

html python scikit-learn tensorflow

Last synced: 20 Apr 2026

https://github.com/himasnhu-at/freecodecamp--ml

ML Models I built for my freeCodeCamp's Machine Learning with Python certification

freecodecamp freecodecamp-project machine-learning machine-learning-algorithms matplotlib pandas python scikit-learn

Last synced: 20 Apr 2026

https://github.com/tr-3n/-ai-powered-resume-analyzer-multi-source-job-matcher

AI-Powered Resume Analyzer & Multi-Source Job Matcher is a web application built using Python and Streamlit that helps job seekers find the best job opportunities based on their resume. The app extracts text from uploaded resumes, matches it with job listings from multiple sources, and displays the most relevant jobs.

ai api html-css job job-recommendation job-search jobmatching natural-language-processing pandas pypdf2 python resume-analyzer scikit-learn streamlit web-development

Last synced: 20 Apr 2026

https://github.com/alphacrypto246/customer-churn

This project predicts customer churn using machine learning. It includes data preprocessing, exploratory analysis, model training, and evaluation to identify key factors driving churn and provide actionable insights for retention.

knn-classification machine-learning machine-learning-algorithms python scikit-learn scikitlearn-machine-learning

Last synced: 20 Apr 2026

https://github.com/chdl17/lead-score-case-study

Lead scoring is the process of assigning a numerical value or score to each lead, based on factors such as demographics and behavior, to determine their potential value as customers.

machine-learning-algorithms matplotlib-pyplot python scikit-learn

Last synced: 20 Apr 2026

https://github.com/ghufranbarcha/linear-regression-training-app

This project is a Streamlit application that allows users to upload a CSV file, select variables, and train a linear regression model. The app provides an easy-to-use interface for selecting dependent and independent variables, scaling data, applying polynomial regression, and evaluating model performance.

data-science machine-learning python scikit-learn streamlit

Last synced: 20 Apr 2026

https://github.com/yogeshsinghkatoch9/advanced_nyc_housing_price_prediction

A robust ensemble learning framework for advanced NYC housing price prediction, leveraging global, clustered, and local ensembles with hyperparameter tuning.

data-science ensemble-learning housing-prices machine-learning new-york python scikit-learn

Last synced: 21 Apr 2026

https://github.com/sayan-mondal2022/mlops-assignment

A project for validating machine learning models

machine-learning scikit-learn streamlit

Last synced: 22 Apr 2026

https://github.com/5hraddha/megaline-plan-recommendations

Megaline is a telecom operator that offers its clients two prepaid plans, Surf and Ultimate. Megaline has found that many of its subscribers use legacy plans and wants to develop a model that analyzes subscribers' behavior and recommends one of Megaline's newer plans: Smart or Ultra.

decision-tree-classifier logistic-regression random-forest-classifier scikit-learn supervised-learning

Last synced: 22 Apr 2026

https://github.com/hoccyy/house-price-prediction

Machine learning model built with Scikit-learn to predict house prices based on various features.

linear-regression machine-learning ml pickle prediction-model scikit-learn scikitlearn-machine-learning

Last synced: 24 Apr 2026

https://github.com/capsuleismail/parkinsons-telemonitoring-dataset

Dataset used to predict Parkinson’s disease severity based on biomedical voice measurements.

data-science jupyter-notebook machinelearning-python scikit-learn

Last synced: 25 Apr 2026

https://github.com/sarangs1621/weather-prediction

Weather Prediction Using Machine Learning is a project that leverages machine learning algorithms to predict weather conditions based on historical data. It evaluates three popular ML models (Decision Tree, KNN, and Logistic Regression) and provides performance insights through metrics and visualizations.

data-analysis decision-tree jupyter-notebook knn logistic-regression machine-learning predictive-modeling python scikit-learn weather-prediction

Last synced: 25 Apr 2026

https://github.com/bp0609/decision-tree-implementation-from-scratch

This repo contains the decision tree implementation from scratch for all possible cases i) discrete features, discrete output; ii) discrete features, real output; iii) real features, discrete output; iv) real features, real output.

decision-tree-classifier decision-tree-regressor scikit-learn

Last synced: 26 Apr 2026

https://github.com/leolion3/smartnanotubes-smellinspector-companion

Companion software for the SmellInspector Devices from SmartNanoTubes. Allows specifying substances, connecting multiple devices, collecting data and performing machine learning.

docker machine-learning python3 reactjs scikit-learn smartnanotubes smellinspector

Last synced: 27 Apr 2026

https://github.com/mihirmakwana03/ci7521-cw1-notebook

Multi-class classification on imbalanced data — 8 sklearn classifiers + SMOTE + ROC-AUC benchmarking. Kingston CI7521 CW1.

classification hyperparameter-tuning imbalanced-data machine-learning scikit-learn smote

Last synced: 27 Apr 2026

https://github.com/toscdom/spam_detection

This repository contains a project focused on analyzing and classifying emails to detect spam. It includes training a machine learning classifier for spam detection, identifying key topics in spam emails using NLP techniques, and calculating semantic distances to evaluate topic similarity. Tools used include Python libraries such as NLP frameworks.

classifier nlp nltk scikit-learn semantic-analysis spam-detection

Last synced: 27 Apr 2026

https://github.com/sundanc/movierecommendation

Movie recommendation system based on user input. Built with Streamlit

movie-recommendation-app python scikit-learn scikitlearn-machine-learning streamlib

Last synced: 27 Apr 2026

https://github.com/capsuleismail/spambase

Classifying Email as Spam or Non-Spam with RandomForestClassifier

datascience jupyter-notebook machinelearning-python scikit-learn

Last synced: 28 Apr 2026

https://github.com/tillscode/personal-finance-ml-analysis

Machine learning analysis of personal financial data with predictive modeling and interactive dashboard

dashboard data-analysis finance machine-learning python scikit-learn

Last synced: 28 Apr 2026

https://github.com/hai4320/ml_ai_notebook

All my notes about ML, AI, and data science

ai machine-learning numpy pandas scikit-learn

Last synced: 28 Apr 2026

https://github.com/dwade-eng/amazon-product-recommender-prototype-

This project is a content-based product recommendation engine inspired by Amazon's "Customers who viewed this item also viewed" feature. It uses a dataset of product metadata and user interactions to suggest similar items based on product titles, brands, and categories using TF-IDF vectorization and cosine similarity.

html numpy pandas python3 scikit-learn

Last synced: 28 Apr 2026
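
The TF-IDF and cosine-similarity approach named in the entry above can be sketched in plain Python. This is an illustrative sketch only, not the repository's code: the product titles are hypothetical, tokenization is a simple whitespace split, and the smoothed IDF formula is chosen to mirror the default used by scikit-learn's `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(index, docs):
    """Index of the document most similar to docs[index]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[index], v), i) for i, v in enumerate(vecs) if i != index]
    return max(scores)[1]


# Hypothetical product titles (assumed data, for illustration only).
products = [
    "acme wireless optical mouse",
    "acme wireless keyboard",
    "zenith stainless steel water bottle",
]
print(most_similar(0, products))
```

Here the first two titles share the terms "acme" and "wireless", so the keyboard is returned as the closest match to the mouse.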

https://github.com/akash-47-tank/predictive-customer-churn-analyzer

A professional-grade customer churn prediction system that not only predicts customer churn but also provides clear explanations for the predictions. Built with Python, XGBoost, and SHAP.

machine-learning pandas python scikit-learn shap streamlit xgboost

Last synced: 28 Apr 2026

https://github.com/nexus69420/movie-recommender-streamlit

A hybrid movie recommendation system that combines content-based filtering using NLP and collaborative filtering using SVD. Built with Python, Streamlit, and trained on TMDB and MovieLens data. Delivers personalized recommendations with a simple web interface.

collaborative-filtering content-based-recommendation data-science machine-learning nlp python recommendation-system scikit-learn streamlit svd

Last synced: 28 Apr 2026

https://github.com/arnab-0053/song-identifier

It identifies songs and artists from lyric snippets using two distinct methods: a simple NLP-based approach and a BM25 (Best Match 25) approach.

bm25 nlp nltk python rank-bm25 scikit-learn song-lyrics spotify-dataset text-preprocessing

Last synced: 28 Apr 2026

https://github.com/abhi227070/car-price-prediction

This project implements a machine learning model to predict the price of cars based on various features such as mileage, manufacturing date, fuel type, and more. Users can input car information, and the model will estimate the price of the car based on the provided data. This tool can be useful for both car buyers and sellers to estimate car price.

data-analysis machine-learning machine-learning-algorithms machinelearning python3 regression regression-models scikit-learn scikitlearn-machine-learning

Last synced: 28 Apr 2026

https://github.com/razalkr70/customer-segmentation-using-dataset

A data science project that segments mall customers using K-Means clustering. Based on age, income, and spending score, it identifies customer groups and visualizes them with 2D and 3D plots for targeted marketing insights.

clustering customer-segmentation data-science data-visualization kmeans machine-learning pca python scikit-learn

Last synced: 28 Apr 2026
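
The K-Means segmentation described in the entry above alternates two steps: assign each customer to the nearest centroid, then move each centroid to the mean of its members. A minimal plain-Python sketch follows. It is not the repository's code; the (income, spending score) pairs and the choice of initial centroids are assumptions made for illustration, and a real project would more likely call scikit-learn's `KMeans`.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical (annual income in k$, spending score 1-100) pairs.
customers = [(15, 80), (17, 76), (16, 82), (85, 15), (88, 20), (90, 12)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)
```

With these well-separated points, the low-income, high-spending customers end up in one cluster and the high-income, low-spending customers in the other.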

https://github.com/tom-uchida/introduction_to_machine_learning

Machine learning private lesson.

machine-learning scikit-learn

Last synced: 28 Apr 2026

https://github.com/arizdn234/spotify-api-with-colab

Crawling, analyzing, and clustering music data from the Spotify API

machile-learning scikit-learn spotify-api spotipy-library

Last synced: 28 Apr 2026

https://github.com/michaelzheng67/ml_classification_optimizer

An algorithm that determines the best machine learning classification model to use for a given dataset. Written in Python.

classification machine-learning python scikit-learn

Last synced: 29 Apr 2026

https://github.com/thekartikeyamishra/interactive-ai-model-trainer

A Google Colab notebook for interactively training simple AI/ML classification models. Features CSV upload, dummy data generation, feature/target selection, model choice, and basic evaluation. Includes a user-friendly UI. Ideal for educational purposes. See screenshots below!

google googlecolab ipywidgets matpl numpy pandas python scikit-learn seaborn

Last synced: 29 Apr 2026

https://github.com/jarif87/text-key-extractor

A Django web app that uses TF-IDF to extract keywords from text, featuring a modern, responsive UI with animated gradients and glassmorphism.

django-application keywords-extraction pandas python scikit-learn

Last synced: 29 Apr 2026

https://github.com/pymc-learn/pymc-learn-sphinx-theme

Sphinx theme for Pymc-learn documentation

pymc3 pymc4 scikit-learn sphinx sphinx-theme

Last synced: 29 Apr 2026

https://github.com/christopherkindl/spotify-artist-success

Predicting artists’ success by applying machine learning approaches to features identified in Spotify data

pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/m-muecke/text-normalizer

Text normalizer integration for sklearn.pipeline.Pipeline class

nlp nltk python scikit-learn

Last synced: 29 Apr 2026

https://github.com/adnanrahin/sentiment_classification_logistic_regeression

Sentiment Analysis extracts subjective information in the source material. It's widely used in modern business, to understand the business module, product quality and consumer point of view regarding the products or the business.

logistic-regression machine-learning natural-language-processing preprocessing python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/gustaminas/ai_primer---flatland

A project from the AI_primer course at Vilnius University.

cnn-keras data-augmentation data-mixup dropout-keras scikit-learn shape-classification

Last synced: 29 Apr 2026

https://github.com/saikumar787/car_price_prediction_using_linear-regression

A machine learning project to predict the selling price of used cars using regression techniques. Includes data preprocessing, model training, evaluation, and testing on new data.

car-price-prediction-with-machine-learning data-analysis joblib jupiter-notebook linear-regression-models model-deployment python scikit-learn standardscaler

Last synced: 29 Apr 2026

https://github.com/nahom32/mlp-assignment

This repository is an implementation of a machine learning assignment demonstrating the machine learning process.

eda logistic-regression machine-learning scikit-learn

Last synced: 29 Apr 2026

https://github.com/karimosman89/energy-consumption-forecasting

Predict future energy consumption based on historical data. Create a model that predicts energy consumption in households or businesses to optimize energy distribution and reduce costs. Assist energy companies in planning and managing supply efficiently.

arima lstm matplotlib pandas python scikit-learn

Last synced: 29 Apr 2026

https://github.com/karmaniket/gtavcontrol

Created a dataset using different hand gestures and trained an ML model for real-time in-game control of GTA V. Have fun!

gaming gta5 machine-learning mediapipe opencv python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/mukeshthenraj/fraud-detection-model

Logistic Regression, Grid Search, and ROC-PR curve evaluation on fraud detection dataset

classification fraud-detection machine-learning numpy pandas scikit-learn

Last synced: 29 Apr 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026
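
Stratified K-Fold and the F1 score, both named in the entry above, matter for fraud data because the positive class is rare: each fold should keep roughly the same fraud ratio, and accuracy alone would hide missed fraud cases. The plain-Python sketch below illustrates both ideas under stated assumptions. The labels are hypothetical, the fold assignment is a simple round-robin without shuffling, and the helper names `stratified_folds` and `f1` are made up for this example. scikit-learn offers `StratifiedKFold` and `f1_score` for real use.

```python
from collections import defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold, spreading every class evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds


def f1(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical imbalanced labels: 1 = fraud, 0 = legitimate.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
folds = stratified_folds(y, n_splits=2)
print(folds)
print(f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

Each of the two folds receives exactly one fraud sample, which an unstratified split of ten samples would not guarantee.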

https://github.com/nikhil-donthusaram/loanapprovalprediction-randomforest

A machine learning web app built using Random Forest Classifier to predict whether a loan will be approved or not based on applicant details. Built with Python, Streamlit, and scikit-learn.

classification jupyter-notebook machine-learning python random-forest scikit-learn streamlit vscode

Last synced: 29 Apr 2026

https://github.com/jarif87/dna-based-identification-of-e.coli

Django web app predicting E. coli in DNA sequences using a machine learning model, with a responsive interface and client-side validation. Files generated by project.py.

classification django-application dna-sequences html-css-javascript mlp-classifier python3 scikit-learn

Last synced: 29 Apr 2026

https://github.com/abhinav330/instagram-influencers-analysis

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.

data data-science data-visualization exploratory-data-analysis exploratory-data-visualizations influncer-products instagram scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/sjain2580/simple-linear-regression-model

This project demonstrates a simple, yet robust, multiple linear regression model built with Python and scikit-learn to predict median house values in California.

joblib linear-regression matplotlib matplotlib-pyplot numpy python scikit-learn

Last synced: 30 Apr 2026

https://github.com/andrewjmack/cryptoclustering

The purpose of this project is to use Python and unsupervised learning to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes. Methods for analysis include K-Means clustering and dimensionality reduction through Principal Component Analysis (PCA).

jupyter-notebook pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/pramodyasahan/grade-predictor

This project aims to predict student performance based on various features such as job, study time, failures, absences, and first and second period grades. The project utilizes a linear regression model from the scikit-learn library in Python.

machine-learning matplotlib numpy pandas python regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/moritzkoerber/tune_preprocessing_algos

Files for this blog post: https://moritzkoerber.github.io/python/tutorial/2019/11/18/blogpost/

cross-validation hyperparameter-tuning machine-learning python scikit-learn

Last synced: 30 Apr 2026

https://github.com/tinaland101/credit-risk-classification

The purpose of this project is to build a credit risk classification model using machine learning techniques. This model helps identify the creditworthiness of borrowers based on historical lending data. Specifically, it uses a logistic regression model to predict whether a loan is healthy (0) or high-risk (1).

numpy pandas pathlib scikit-learn

Last synced: 30 Apr 2026

https://github.com/fikri-rouzan/student-stress-levels-classification

A machine learning modeling project for classifying university students' stress levels based on academic and psychological input parameters.

joblib jupyter-notebook matplotlib numpy pandas python scikit-learn seaborn streamlit

Last synced: 08 Jun 2026

https://github.com/samuelpillai/machine-learning-classification-regression-nlp

A curated collection of machine learning mini-projects covering classification, regression, and natural language processing (NLP). This project demonstrates model training, evaluation, feature engineering, and pipeline integration using real-world datasets and Python tools like Scikit-learn, pandas, and NLTK.

classification data-analysis data-science data-visualization feature-engineering jupyter-notebook machine-learning ml-pipeline model-evaluation nlp python regression-models scikit-learn supervised-learning text-mining

Last synced: 30 Apr 2026

https://github.com/dharma-acha/explanability_in_deepneuralnetworks

Our project aims to enhance the transparency and trustworthiness of the VGG model in critical fields like healthcare imaging and self-driving cars. By integrating explainability methods into the VGG model for image classification, we will clarify its decision-making process.

colab-notebook matplotlib numpy pandas scikit-learn seaborn

Last synced: 30 Apr 2026

https://github.com/fbarffmann/credit-risk-classification

Classified 19,000+ loans as high-risk or healthy using logistic regression. Achieved 100% precision for healthy loans and 84% precision for high-risk loans.

classification credit-risk data-analysis logistic-regression machine-learning model-evaluation pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/themihirmathur/mihir-clickpost-data-science-intern-round-1-assignment-submission

The objective of this project is to predict the predicted_exact_sla, which is the number of days between the shipment and delivery of an order, using historical shipment data.

data-science machine-learning pandas python random-forest-regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/fadlani-aditya/iris-plant-classification

This project focuses on classifying different species of Iris flowers using the Random Forest algorithm. The dataset, sourced from Scikit-learn, contains four key features: sepal length, sepal width, petal length, and petal width, which are used to predict the flower species (Setosa, Versicolor, and Virginica).

agriculture data-science iris-dataset machine-learning python scikit-learn supervised-learning

Last synced: 01 May 2026

https://github.com/myahninsi/customer-segmentation-recommendation-ml

This project addressed challenges in understanding customer behavior and personalizing shopping experiences for an e-commerce platform. Developed ML solutions including K-Means clustering for segmentation, Random Forest regression for CLV prediction, and collaborative filtering for product recommendations.

collaborative-filtering k-means-clustering pandas python random-forest scikit-learn

Last synced: 01 May 2026