An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely-used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/inesruizblach/data-science-project

A data science project exploring Portuguese "Vinho Verde" wine quality prediction. Features EDA, feature engineering, ML models, and evaluation using Python, pandas, scikit-learn, and visualization tools.

binary-classification classification data-science exploratory-data-analysis feature-engineering imbalanced-learn jupyter-notebook machine-learning model-evaluation pandas regression scikit-learn seaborn uci-dataset wine-quality

Last synced: 09 May 2026

https://github.com/kianaabrisham/naive-bayes-sentiment

Sentiment classification using Multinomial NB (scratch + sklearn)

bag-of-words naive-bayes nlp scikit-learn sentiment-analysis text-classification

Last synced: 14 May 2026

https://github.com/scorchinghot/core-machine-learning-exploration

This repository provides a hands-on exploration of classical machine learning algorithms applied to the MovieLens 100k dataset, aiming to build intuition and understanding of core ML concepts.

core-ml data-science hands-on machine-learning ml-algorithms python scikit-learn tutorial

Last synced: 05 Oct 2025

https://github.com/vedanty3/bulldozer-price-prediction

A machine learning project that builds a model to predict the sale price of bulldozers.

andrew-ng-machine-learning ensemble-machine-learning gridsearchcv jupyter-notebook machine-learning matplotlib numpy pandas python randomforestregressor randomizedsearchcv scikit-learn ztm

Last synced: 05 Apr 2026

https://github.com/xbants/recommendation-api

🎬 Intelligent movie recommendation system with FastAPI backend, Streamlit frontend, and collaborative filtering ML. Rate movies, get personalized suggestions, and enjoy automatic model retraining.

fastapi machine-learning movie-recommedation python3 scikit-learn streamlit

Last synced: 29 Apr 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026
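
The evaluation setup named in the entry above (Random Forest, Stratified K-Fold cross-validation, F1-score) might look roughly like the following sketch. It is not the repository's code: the synthetic imbalanced dataset from `make_classification`, the 90/10 class split, the fold count, and every hyperparameter are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for a fraud dataset (assumption): roughly 10% positives.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Hyperparameters are illustrative, not taken from the listed project.
model = RandomForestClassifier(n_estimators=50, random_state=0)

# Stratified folds keep the class ratio roughly constant in every split,
# which matters when the positive class is rare.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# scoring="f1" reports the F1-score of the positive class for each fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
print("F1 per fold:", scores)
print("Mean F1:", scores.mean())
```

F1 is often preferred over accuracy on imbalanced data like this, because a model that always predicts the majority class would still score high on accuracy.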

https://github.com/nikhil-donthusaram/loanapprovalprediction-randomforest

A machine learning web app built using Random Forest Classifier to predict whether a loan will be approved or not based on applicant details. Built with Python, Streamlit, and scikit-learn.

classification jupyter-notebook machine-learning python random-forest scikit-learn streamlit vscode

Last synced: 29 Apr 2026

https://github.com/tbarlow12/learn-it-your-way

Using Python Flask, I wanted to create a simple web API that allows users to upload a dataset, choose one or more models, store them server side, and then hit an endpoint to get a prediction.

flask machine-learning python scikit-learn tensorflow

Last synced: 29 Apr 2026

https://github.com/abhinav330/instagram-influencers-analysis

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.

data data-science data-visualization exploratory-data-analysis exploratory-data-visualizations influncer-products instagram scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/ledsouza/machine-learning-semisupervisionado

This project uses semi-supervised machine learning algorithms to classify milk quality as high, medium, or low.

data-science joblib machine-learning machine-learning-algorithms pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/andrewjmack/cryptoclustering

The purpose of this project is to use Python and unsupervised learning to predict whether cryptocurrencies are affected by 24-hour or 7-day price changes. Methods for analysis include K-Means clustering and dimensionality reduction through Principal Component Analysis ("PCA").

jupyter-notebook pandas python scikit-learn

Last synced: 30 Apr 2026

https://github.com/rokuu010/boxing-match-predictor

Machine learning project to predict the outcomes of pro boxing matches using Dataset/web-scraped data

boxing data-science machine-learning prediction-model python scikit-learn selenium sports-analytics

Last synced: 30 Apr 2026

https://github.com/pramodyasahan/grade-predictor

This project aims to predict student performance based on various features such as job, study time, failures, absences, and first and second period grades. The project utilizes a linear regression model from the scikit-learn library in Python.

machine-learning matplotlib numpy pandas python regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/sayed-ashfaq/delhivery-dataanalysis

In this project, I conducted basic analysis, feature engineering, normalization, and outlier handling, along with statistical and non-parametric testing to extract insights.

feature-engineering normalization outlier-detection pandas python scikit-learn statistcal-tests statistical-analysis

Last synced: 30 Apr 2026

https://github.com/fikri-rouzan/student-stress-levels-classification

A machine learning modeling project to classify university students' stress levels based on academic and psychological input parameters.

joblib jupyter-notebook matplotlib numpy pandas python scikit-learn seaborn streamlit

Last synced: 08 Jun 2026

https://github.com/fikri-rouzan/burnaway-capstone-data-science

An interactive analytics dashboard for mapping the physical factors and work patterns that trigger burnout in software developers.

jupyter-notebook matplotlib pandas pillow plotly python scikit-learn seaborn statsmodels streamlit

Last synced: 08 Jun 2026

https://github.com/dharma-acha/explanability_in_deepneuralnetworks

Our project aims to enhance the transparency and trustworthiness of the VGG model in critical fields like healthcare imaging and self-driving cars. By integrating explainability methods into the VGG model for image classification, we will clarify its decision-making process.

colab-notebook matplotlib numpy pandas scikit-learn seaborn

Last synced: 30 Apr 2026

https://github.com/boladjivinny/fire-prediction

Notebook for the "Fighting Fire with Data" hackathon on Zindi. Ranked 5th on the public leaderboard and 8th on the private leaderboard. https://zindi.africa/hackathons/cmu-africa-fighting-fire-with-data

feature-engineering hackhathon machine-learning regression scikit-learn stacking

Last synced: 30 Apr 2026

https://github.com/themihirmathur/mihir-clickpost-data-science-intern-round-1-assignment-submission

The objective of this project is to predict the predicted_exact_sla, which is the number of days between the shipment and delivery of an order, using historical shipment data.

data-science machine-learning pandas python random-forest-regression scikit-learn

Last synced: 30 Apr 2026

https://github.com/fadlani-aditya/iris-plant-classification

This project focuses on classifying different species of Iris flowers using the Random Forest algorithm. The dataset, sourced from Scikit-learn, contains four key features: sepal length, sepal width, petal length, and petal width, which are used to predict the flower species (Setosa, Versicolor, and Virginica).

agriculture data-science iris-dataset machine-learning python scikit-learn supervised-learning

Last synced: 01 May 2026

https://github.com/myahninsi/customer-segmentation-recommendation-ml

This project addressed challenges in understanding customer behavior and personalizing shopping experiences for an e-commerce platform. Developed ML solutions including K-Means clustering for segmentation, Random Forest regression for CLV prediction, and collaborative filtering for product recommendations.

collaborative-filtering k-means-clustering pandas python random-forest scikit-learn

Last synced: 01 May 2026

https://github.com/arturovaine/n8n-nodes-sklearn

Custom n8n nodes for integrating scikit-learn machine learning algorithms into your n8n workflows.

machine-learning n8n n8n-nodes scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/antonio-f/housing-simplemlexample

Basic example with California Housing Prices dataset from the StatLib repository using scikit-learn

housing-simplemlexample machine-learning scikit-learn simple

Last synced: 01 May 2026

https://github.com/luthfiwulandari/machine-learning-breast-cancer

This project is a simple application that uses logistic regression to detect breast cancer. It classifies tumors as either malignant or benign based on the dataset provided by Scikit-learn.

datascience jupyter logistic-regression machine-learning python scikit-learn

Last synced: 01 May 2026

https://github.com/jlee9503/medical-readmission

Conduct an analysis of medical readmission status using hospital patient data and the Social Determinants of Health dataset. Identify key factors influencing readmission rates to provide insights for improving healthcare outcomes.

python random-forest-regression scikit-learn tableau

Last synced: 01 May 2026

https://github.com/dhruvv1402/spam-detection-python-

This project is a Spam Detection System built using Python. It classifies SMS messages as spam or ham (not spam) using machine learning techniques.

countvectorizer kaggle-dataset nlp-machine-learning nltk numpy pandas python scikit-learn supervised-machine-learning tf-idf

Last synced: 01 May 2026

https://github.com/maxwelllzh/linearizer

Linearizing parameters for linear regression

data-analysis machine-learning scikit-learn

Last synced: 02 May 2026

https://github.com/dmschauer/aws-sagemaker-deployment-test

I ran a simple test to see how deploying a machine learning model on AWS SageMaker, and thus turning it into an API, works. Since scikit-learn models require fewer dependencies than, e.g., TensorFlow models, I used them for this test, following a tutorial.

aws boto3 python sagemaker scikit-learn

Last synced: 02 May 2026

https://github.com/pierrekieffer/datapreprocessing

Custom data preprocessing library made for machine learning

data-preparation data-preprocessing machine-learning preprocessing scikit-learn

Last synced: 02 May 2026

https://github.com/moritzkoerber/data_science_posts

This repository hosts the code for my data science related blog posts.

hyperparameter-tuning machine-learning pipeline python scikit-learn

Last synced: 03 May 2026

https://github.com/viniciusds2020/ml_pycaret_classificacao

A system for preprocessing and training machine learning models using PyCaret. A low-code methodology for MLOps processes.

machine-learning mlops preprocessing pycaret python scikit-learn

Last synced: 03 May 2026

https://github.com/rohitinu6/tesla-price-prediction

A machine learning project that predicts future stock price movements using Logistic Regression, SVC, and XGBoost with engineered financial features.

data-analysis data-visualization feature-engineering financial-analysis logistic-regression machine-learning matplotlib python scikit-learn seaborn stock-market stock-price-prediction support-vector-machine time-series xgboost

Last synced: 03 May 2026

https://github.com/alessandromonolo/fraud-detection-binary-classification-model

This project builds a machine learning model to classify fraudulent clients using a banking dataset. Data preprocessing, statistical analysis, and feature selection were performed before training KNN and Random Forest Classifier. Model performance was evaluated using accuracy, precision, recall, and F1-score.

classification-model fraud-detection knn-classification machine-learning pandas python random-forest scikit-learn statistical-analysis

Last synced: 03 May 2026

https://github.com/zhenglinlei/zdmp

Industry 4.0 Optimization with Machine Learning AI

industry-4 knn-classification machine-learning pandas python scikit-learn

Last synced: 03 May 2026

https://github.com/srisaihariharan/mic_sentiment_analysis_v

Sentiment analysis of IMDb movie reviews using Python, Scikit-learn, and TF-IDF.

machine-learning natural-language-processing nlp python scikit-learn sentiment-analysis sentiment-classification

Last synced: 03 May 2026

https://github.com/abdiasarsene/predictive-churn-management-data-driven-customer

Use unsupervised learning techniques to segment a company’s customers into distinct groups in order to personalize marketing campaigns. The ultimate goal is to propose specific marketing strategies for each customer segment based on the insights obtained.

acp kmeans-clustering matplotlib pandas plotly python scikit-learn seaborn

Last synced: 03 May 2026

https://github.com/kaustavmodak/business-aided-customer-feedback-assessment-system

A Streamlit-based sentiment analysis app that classifies customer reviews as Positive, Neutral, or Negative using a pre-trained ML model.

framework machine-learning matplotlib nlp nltk numpy pandas pickle regex scikit-learn seaborn sentiment-analysis streamlt tfidf-vectorizer

Last synced: 03 May 2026

https://github.com/jonad/finding_donors

Predicting income with UCI Census Income Dataset using supervised machine learning algorithms

numpy pandas scikit-learn scikitlearn-machine-learning

Last synced: 03 May 2026

https://github.com/srilaasya/breast-cancer-classifier

Used several Python libraries to make a K-Nearest Neighbor classifier that is trained to predict whether a patient has breast cancer

knearest-neighbor-classifier python scikit-learn

Last synced: 03 May 2026

https://github.com/darenr/gradientboostingmachines

Notebooks exploring strengths and weaknesses of GBM based classifiers

jupyter-notebook lightgbm pandas scikit-learn xgboost

Last synced: 03 May 2026

https://github.com/lucs1590/commom_segmentations

The purpose of this repository is to document and expose code samples using common threading techniques.

computational-vision machine-learning open-source opencv python scikit-image scikit-learn segmentation sklearn

Last synced: 03 May 2026

https://github.com/ceodaniyal/telecom_customer_churn_prediction

A machine learning project that predicts whether a telecom customer will churn (leave the service) using customer demographics, account information, and service usage. The repository includes data preprocessing, model training (with logistic regression), feature scaling, and example predictions.

classification customer-churn-prediction data-science logistic-regression machine-learning ml-project pandas prediction python scikit-learn streamlit telecom

Last synced: 04 May 2026

https://github.com/codejsha/machine-learning-examples

Examples of machine learning using scikit-learn

machine-learning scikit-learn

Last synced: 04 May 2026

https://github.com/baponkar/scikit-logisticregression-application

A simple, detailed application and analysis of the scikit-learn LogisticRegression model.

classification-algorithm logistic-regression machine-learning python3 scikit-learn

Last synced: 04 May 2026

https://github.com/homebackend/pdf-title-page-splitter

Splits a PDF at identified title pages using a trained ML model.

machine-learning opencv pdf-splitter pdf2image pypdf2 scikit-learn tensorflow

Last synced: 04 May 2026

https://github.com/joel-beck/airbnb-oslo

Price Prediction Models for Airbnb Apartments in Oslo | Winter Term 2021/22

prediction python pytorch scikit-learn

Last synced: 04 May 2026

https://github.com/bhawnamehbubani/airline-passenger-referral-program-development-with-classification-techniques

Prediction of airline passenger referrals using Logistic Regression, GridSearchCV, and TF-IDF vectorization with Python, Pandas, Scikit-learn, and Excel.

excel gridsearchcv logistic-regression pandas python3 scikit-learn tf-idf-vectorization

Last synced: 04 May 2026

https://github.com/keven-rdr/rio-airbnb-predictor

An AI study using predictive models, such as a regressor, to determine property value.

airbnb ia kaggle php price regression-models scikit-learn

Last synced: 04 May 2026

https://github.com/dakii24/credit-card-fraud-detection

This repository contains a machine learning project focused on detecting fraudulent credit card transactions. The project includes data preprocessing, model training, and evaluation to identify and prevent fraudulent activities.

capstone-project class-imbalance classification-algorithm credit-card credit-card-fraud data-science decision-trees fraud machine-learning open-data python scikit-learn svm svm-classifier

Last synced: 04 May 2026

https://github.com/madhu26sree/diabetes-prediction

This project uses the Support Vector Machine (SVM) algorithm to predict whether a person is likely to have diabetes, using the Diabetes dataset. It covers data preprocessing, model building, and evaluation using Python.

machine-learning python scikit-learn

Last synced: 04 May 2026

https://github.com/drod75/nyc-arrests-analysis

This is a simple Data Science Project made to analyze and display data and trends found within the NYC Arrests Year to Date Dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/chathumiamarasinghe/nn-training-model

A comprehensive project for training neural networks to solve real-world problems. This repository includes customizable code for building, training, and evaluating neural network architectures using popular deep learning frameworks.

jupyter-notebook matplotlib numpy phyton scikit-learn

Last synced: 04 May 2026

https://github.com/siddhantborse/atmosviz

Atmos Viz is a Python-based project designed to analyze, visualize, and predict global temperature trends across various cities and countries using time-series analysis and advanced data science techniques. Leveraging historical climate data, this project integrates machine learning models, geospatial mapping, and interactive visualizations to unco

geopandas geospatial-analysis gis matplotlib numpy pandas plotly python scikit-learn seaborn shapefiles time timeseries-analysis timeseries-data

Last synced: 05 May 2026

https://github.com/sxv357/xtern-artificial-intelligence-work-based-assessment

This application takes in data regarding undergraduate college students in the state of Indiana such as their year, what major they're pursuing, which university they attend, and makes a prediction about their food order.

jupyter-notebook matplotlib pandas pickle scikit-learn seaborn

Last synced: 05 May 2026

https://github.com/pierrealexandre78/deathpredict

Predict Hospital mortality rate using Machine Learning for patients admitted in ICU (Intensive Care Unit)

healthcare hospital machine-learning predictions python random-forest-classifier scikit-learn xgboost-classifier

Last synced: 05 May 2026

https://github.com/thekartikeyamishra/resumeevaluatorapp

The Automated Resume Evaluator is a Python-based application that helps evaluate resumes against job descriptions. It calculates an Applicant Tracking System (ATS) score, which is the percentage of keywords from the job description found in the resume.

flask machine-learning matplotlib nlp nltk pypdf python scikit-learn spacy textblob

Last synced: 05 May 2026
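
The ATS score in the entry above is described as the percentage of job-description keywords found in the resume. A minimal sketch of that calculation follows. The function name `keyword_match_score` and the tokenization rule are hypothetical, and every distinct token in the job description is treated as a keyword, which is a simplifying assumption; the listed project may extract keywords differently, for example with spaCy or NLTK.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def keyword_match_score(job_description: str, resume: str) -> float:
    """Percentage of distinct job-description tokens that also occur in the resume.

    Hypothetical helper: every token counts as a keyword (no stop-word removal).
    """
    keywords = tokenize(job_description)
    if not keywords:
        return 0.0  # avoid division by zero for an empty job description
    found = keywords & tokenize(resume)
    return 100.0 * len(found) / len(keywords)


score = keyword_match_score(
    "Python SQL Docker Kubernetes",
    "Built Python services with Docker",
)
print(score)  # 2 of 4 keywords matched -> 50.0
```

A real evaluator would likely filter stop words and normalize word forms first, so that filler words in the job description do not lower the score.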

https://github.com/himanshkr03/comparative_performance_on_fashionmnist

This repository explores various machine learning and deep learning models for classifying images from the Fashion MNIST dataset. It includes data exploration, model training, evaluation, and visualization techniques to gain insights into the classification task.

deep-learning fashion-mnist fine hybrid-model image-classification keras machine-learning scikit-learn tensorflow xgboost-algorithm

Last synced: 05 May 2026

https://github.com/hallowshaw/text-emotion-classification-using-lstm-and-tokenization

This repository provides a machine learning and deep learning pipeline for text emotion detection. It includes a pretrained LSTM model, tokenizer, and preprocessing steps to classify emotions such as joy, sadness, and anger from text input. Easily deployable with provided resources and scripts.

emotion-classification emotion-detection feature-engineering lstm nltk nltk-python scikit-learn scikitlearn-machine-learning sentiment-analysis sequential-models text-classification text-classification-multi-label tokenization tokenizer

Last synced: 05 May 2026

https://github.com/hitthecodelabs/petalanalyticsstreamlit

Web application developed with Streamlit that predicts the Iris flower type based on its physical features

matplotlib model numpy pickle python scikit-learn sklearn streamlit

Last synced: 05 May 2026

https://github.com/smaddanki/pattern-pursuit-challenge

A personal challenge to build a production-ready trading signal system for S&P 500 stocks using deep learning. This project progresses from basic ML models to a complete trading infrastructure, focusing on 5-day forward return prediction and signal generation.

deep-learning machine-learning pytorch quantative-trading quantitative-finance quantitative-research scikit-learn

Last synced: 05 May 2026

https://github.com/rohansardar/iris_flower

A basic ML project on the iris flower classification

data-science iris-classification iris-dataset ml python scikit-learn

Last synced: 05 May 2026

https://github.com/gbourniq/cnn-multiclass-classification-gear

Using Machine Learning and Deep Learning to predict the category of outdoor equipment

image-classification keras-tensorflow multiclass-classification python scikit-learn tensorboard-visualizations

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026

https://github.com/antoniskl/un-general-debate-corpus-classification

The aim of this project is to classify UNGDC speeches with regard to climate change. As a secondary objective, a correlation is examined between these speeches, the forestation, and the happiness index of the countries.

classification data-science jupyter-notebook machine-learning nlp python regression scikit-learn text-classification text-preprocessing

Last synced: 05 May 2026

https://github.com/kefrankk/ml-fraud-detection

I built a predictive model to detect fraud in financial transactions.

pandas python scikit-learn

Last synced: 05 May 2026

https://github.com/kunalpisolkar24/dsbda_lab

Collection of practical codes for Savitribai Phule Pune University's Data Science and Big Data Analytics Laboratory (310256).

data-analytics data-preprocessing data-science data-wrangling descriptive-statistics linear-regression logistic-regression mapreduce scala scikit-learn sppu-computer-engineering tf-idf

Last synced: 05 May 2026

https://github.com/divinenaman/color-extraction-api

Extract colours from images using K-means, along with FastAPI pipeline.

fastapi k-means-clustering scikit-learn

Last synced: 05 May 2026

https://github.com/zenitsu272/fault-detection-ml

Machine Learning based Fault Detection in machines using sensor data

artificial-intelligence decsion-tree machine-learning pandas pandas-dataframe pandas-python scikit-learn

Last synced: 05 May 2026

https://github.com/grandechowhiskey/fcc-machine_learning-boilerplates

A collection of projects completed as part of the FreeCodeCamp "Machine Learning with Python" certification. These projects focus on implementing machine learning models, data preprocessing, and predictive analysis using libraries like scikit-learn and TensorFlow.

ai ml python3 scikit-learn tensorflow

Last synced: 06 May 2026

https://github.com/rishisolanke/twitter-sentiment-analysis-using-machine-learning-

A research project that classifies tweets as positive, negative, or neutral using ML algorithms (Logistic Regression, Naïve Bayes, SVM) with NLP preprocessing.

data-science data-visualization logistic-regression machine-learning ml-models naive-bayes natural-language-processing nlp scikit-learn sentiment-analysis svm text-classification twitter-data

Last synced: 06 May 2026

https://github.com/keneandita/iris-intel

Iris Flower Classifier is a simple web app built with Streamlit that predicts the species of an Iris flower based on user-input flower features. It uses pre-trained machine learning models including Logistic Regression, K-Nearest Neighbors, SVM, and Decision Tree to make real-time predictions.

iris-classification jupyter-notebook machine-learning python scikit-learn streamlit

Last synced: 06 May 2026

https://github.com/eshansugeesh/fico-score-loan-default-modeling-project

Credit risk assessment using FICO score segmentation, loan default modeling, discretization techniques, and log-likelihood evaluation for predictive analytics in financial services.

bucketing classification credit-risk customer-segmentation data-science discretization fico-score financial-analytics loan-analysis loan-default log-likelihood machine-learning numpy pandas predictive-modeling risk-modeling scikit-learn segmentation statistical-modelling

Last synced: 06 May 2026

https://github.com/nicolas-giacomelli/modelo_regressao_linear_vendas

Linear regression model for sales forecasting. Challenge from RocketSeat's AI course.

matplotlib pandas python3 scikit-learn

Last synced: 06 May 2026

https://github.com/billgewrgoulas/recommendation-systems

Algorithms for joke rating prediction using the joke dataset from Kaggle.

algorithm clustering collaborative-filtering machine-learning numpy pandas recommender-system scikit-learn scypi

Last synced: 06 May 2026

https://github.com/kaoutarmi/predition_price-old-cars

This car price prediction project uses machine learning to estimate the value of vehicles based on their characteristics.

car-price-prediction data-preprocessing data-science decision-tree feature-engineering machine-learning regression scikit-learn

Last synced: 06 May 2026

https://github.com/erick957/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

🏠 Predict house prices using advanced regression techniques with this comprehensive analysis and cleaning project, from data loading to model deployment.

data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 06 May 2026

https://github.com/andrewsy1004/logistic-regression-spam-classifier

This project implements a spam email classifier using Logistic Regression.

numpy pandas scikit-learn

Last synced: 06 May 2026

https://github.com/5hraddha/optimize-oil-well-locations

In the quest for harnessing valuable energy resources, the OilyGiant mining company wants to expand its operations by discovering new oil well locations. To achieve this, a data-driven approach is adopted, leveraging geological exploration data from three distinct regions and employing techniques in data analysis and modeling.

linear-regression numpy pandas scikit-learn supervised-learning

Last synced: 06 May 2026

https://github.com/sabin74/boston_house_prediction

This project aims to predict the median value of owner-occupied homes in Boston suburbs using various machine learning regression models. Multiple regression techniques were applied, including Linear Regression, Decision Tree, Random Forest, Gradient Boosting and dimensionality reduction with PCA. Hyperparameter tuning was performed.

boston-housing-price-prediction hyperparameter-tuning kaggle-dataset pca-analysis python3 regression-models scikit-learn

Last synced: 06 May 2026