An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely-used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/antonio-f/find-duplicate-questions

Find duplicate questions on StackOverflow by their embeddings. From the Natural Language Processing course - Coursera's Advanced Machine Learning specialization.

cosine-similarity discounted-cumulative-gain embeddings gensim natural-language-processing nlp nltk scikit-learn starspace text-similarity word2vec

Last synced: 27 Apr 2026

https://github.com/the-developer-306/house-price-predictor

House Price Predictor: Harnessing machine learning algorithms to forecast housing prices in Boston, empowering buyers and sellers with accurate predictions based on key factors like location, crime rate, rooms, accessibility, and more.

csv ipynb-jupyter-notebook joblib matplotlib numpy pandas python scikit-learn

Last synced: 23 Feb 2026

https://github.com/tddschn/hack-ncsu-2024

ML and doc part of our Hack_NCState project, built in less than 1 day | Racial Bias in Criminal Justice Visualized: Code Black

bias machine-learning scikit-learn

Last synced: 08 May 2026

https://github.com/canayter/unsupervised-machine-learning

Utilizing Python and unsupervised learning to predict if cryptocurrencies are affected by 24-hour or 7-day price changes.

k-means-clustering python scikit-learn unsupervised-machine-learning

Last synced: 08 May 2026

https://github.com/kento75/keiba_machine_learning

Horse racing prediction scripts using scikit-learn

machine-learning matplotlib pandas postgresql psycopg2 python3 scikit-learn

Last synced: 18 Apr 2026

https://github.com/anarya22/heart-disease-classification

Predicting heart disease using machine learning. This notebook explores various Python-based ML and data science libraries in an attempt to build a machine learning model capable of predicting whether or not someone has heart disease based on their medical attributes.

data-cleaning data-visualization machine-learning matplotlib numpy pandas scikit-learn

Last synced: 01 May 2026

https://github.com/alam025/customer-churn-prediction

🎯 Predict customer churn with 96%+ accuracy using Random Forest ML. Beautiful visualizations, production-ready code, and real business impact. Save revenue before customers leave! 🚀

churn-prediction classification customer-analytics customer-churn customer-retention data-science machine-learning pandas predictive-analytics python random-forest scikit-learn

Last synced: 11 Jun 2026

https://github.com/franpog859/titanic-competition

❄️🚢 Machine Learning project workflow reference. The model predicts whether given people survive the Titanic disaster based on, among other things, their age, sex, and name.

classification data-science kaggle machine-learning scikit-learn titanic workflow

Last synced: 05 May 2026

https://github.com/soumya6tiwari/customer-segmentation-using-rfm-analysis

This project focuses on customer segmentation using RFM (Recency, Frequency, Monetary) analysis and K-Means clustering. It enables businesses to identify high-value customers, optimize marketing strategies, and improve customer retention through data-driven insights.

backend clustering flask frontend kmeans-clustering matplotlib numpy pandas python rfm-analysis scikit-learn unsupervised-learning

Last synced: 16 Feb 2026

https://github.com/ahmetcansolak/decision-tree-classifier-scikit-learn

A simple decision tree classifier example using scikit-learn

decision-tree-classifier python scikit-learn

Last synced: 28 Apr 2026

https://github.com/dhavaltaunk08/gender-classification

I did this project during my internship at IIT Guwahati. It aimed to perform gender classification in video streaming.

deep-learning librosa opencv-python python scikit-learn

Last synced: 14 May 2026

https://github.com/nemeslaszlo/heart-disease

Heart disease classification project with different models (LogisticRegression, KNeighborsClassifier, RandomForestClassifier) and detailed reports.

classification knearest-neighbor-classifier logistic-regression mathplotlib numpy pandas randomforest-classification scikit-learn seaborn

Last synced: 15 Apr 2026

https://github.com/antoniskl/amsterdam-metro-crowdedness-prediction

The aim of this full-stack project is to predict with RandomForest and visualize crowdedness for metro stations of Amsterdam by using external factors.

amsterdam covid-19 crowded-areas dash full-stack metro prediction-model python random-forest regression scikit-learn ticketmaster-api

Last synced: 14 May 2026

https://github.com/official-biswadeb941/clopimedi---your-healths-trusted-care

ClopiMedi is an AI-driven healthcare application that simplifies doctor appointment bookings, offering personalized recommendations based on medical conditions to enhance patient-provider connections.

adam ai flask flask-api flask-api-backend full-stack-web-development joblib machine-learning scikit-learn tensorflow

Last synced: 28 Apr 2026

https://github.com/anishshinde01/machine-learning-exercises

Python implementations of machine learning, statistics, and mathematical foundations.

linear-algebra machine-learning machine-learning-algorithms matplotlib numerical-analysis numpy python scikit-learn scipy statistics

Last synced: 11 Jun 2026

https://github.com/charmee123/krishakvriddhi-final

This site is also deployed on Replit, where you can try it: https://replit.com/@charmee123/KrishakVriddhi?v=1

bootstrap css flask html javascript machine-learning python replit scikit-learn weather-api

Last synced: 14 Apr 2026

https://github.com/nirmalyabag20/breast-cancer-prediction-using-machine-learning

This project leverages machine learning to classify breast cancer as malignant or benign based on tumor characteristics. By applying and evaluating multiple algorithms, the model achieves high accuracy, demonstrating the practical application of data-driven solutions in medical diagnostics.

logistic-regression matplotlib numpy pandas python scikit-learn seaborn

Last synced: 12 Feb 2026

https://github.com/kohlerhector/trex-tree-reward-exploration

Uses tree estimators of the MDP models, counting the leaves that group similar transitions, to do count-based exploration.

decision-trees drl exploration rl scikit-learn stable-baselines3

Last synced: 04 May 2026

https://github.com/ccharlesss/financeml

Machine learning web application using Python's FastAPI and scikit-learn to predict S&P 500 stock price trends and cluster stocks based on average annual returns and volatility. Uses the MVC design pattern to structure the application. Implements a decision tree classifier with 84% accuracy.

cicd docker fastapi finance javascript jenkins machine-learning restful-api scikit-learn webapplication

Last synced: 15 Apr 2026

https://github.com/analitico-771/creditworthiness_classification_model

An application that trains a model using supervised learning and the imbalanced-learn library to classify and identify the creditworthiness of borrowers.

artificial-intelligence credit-risk fintech imbalanced-learning machine-learning python quantitative-finance scikit-learn supervised-machine-learning

Last synced: 04 May 2026

https://github.com/kbo-data-portal/pipeline

Automates KBO data collection and deployment with Airflow.

airflow dbt kbo lightgbm python scikit-learn

Last synced: 07 Oct 2025

https://github.com/antim21/spamsense-ai

Classifying emails into Spam or Not Spam categories using Machine Learning techniques

machine-learning nlp python scikit-learn

Last synced: 04 May 2026

https://github.com/kritimbist/365-days-of-github-challenge-ai-machine-learning

This repository is part of my 365 Days Challenge: AI × Machine learning, where I combine my passion for Machine Learning 🤖 to learn, build, and document projects every single day for one year.

data-science data-visualization deep-learning machine-learning matplotlib numpy python scikit-learn

Last synced: 28 Apr 2026

https://github.com/francescopaolol/logisticregression

Predicting survival on the Titanic and getting familiar with ML basics.

jupyter-notebook kaggle logistic-regression machine-learning ml pandas scikit-learn

Last synced: 16 Apr 2026

https://github.com/adzialocha/notebook

Jupyter notebooks for random experiments with audio processing, data analysis and machine learning

jupyter-notebook keras learning librosa music21 scikit-learn

Last synced: 15 Apr 2026

https://github.com/khaymanii/big_mart_prediction_model

This model was built using Python and the logistic regression algorithm.

matplotlib numpy pandas python scikit-learn seaborn

Last synced: 01 May 2026

https://github.com/aakanksha1406/fake-news-classifier

Identifies when an article might be fake news.

keras lstm lstm-neural-networks nltk python scikit-learn tensorflow

Last synced: 13 Feb 2026

https://github.com/vectominist/mednlp

Mandarin Medical Dialogue Analysis with PyTorch.

dialog huggingface mandarin medical pytorch scikit-learn transformers

Last synced: 04 May 2026

https://github.com/adithaker/falafel

🤖 A from-scratch implementation of a small-scale federated learning application.

cli-app distributed-systems federated-learning logistic-regression python scikit-learn

Last synced: 28 Apr 2026

https://github.com/nirmalyabag20/crop-yield-prediction-using-machine-learning

This project uses machine learning to predict crop yields based on factors like region, crop type, rainfall, temperature, and pesticide use. By analyzing a dataset of over 28,000 records, the models provide accurate yield forecasts, helping optimize farming decisions and resource management, ultimately contributing to sustainable agriculture.

jupyter-notebook matplotlib numpy pandas python scikit-learn seaborn

Last synced: 06 Feb 2026

https://github.com/chengetanaim/sentimentanalysisforfinancialnewsnotebook

Building the model of a financial news sentiment classifier. Financial news headlines will be classified as positive, negative or neutral (from an investor point of view)

logistic-regression machine-learning natural-language-processing scikit-learn tfidf-vectorizer

Last synced: 04 May 2026

https://github.com/bhuvaneshwarguttula/student-performance-indicator

To understand and predict how the student's performance (test scores) is affected by the other variables (Gender, Ethnicity, Parental level of education, Lunch, Test preparation course).

exploratory-data-analysis machine-learning pandas python scikit-learn student-performance-analysis

Last synced: 07 Mar 2026

https://github.com/deliprofesor/ridge-regression-for-sales-prediction-model-evaluation-and-hyperparameter-tuning

This project builds and optimizes a model on a dataset using Ridge regression and polynomial features. Model accuracy is enhanced through regularization and polynomial transformations. Grid search and cross-validation are used to find the best parameters, and the model's performance is evaluated.

cross-validation data-science data-visualization grid-search machine-learning model-optimization mse overfitting-prevention polynomial-regression python r2-score regression-analysis regularization ridge-regression rmse scikit-learn

Last synced: 03 May 2026
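
The approach described in this entry (Ridge regression on polynomial features, tuned with grid search and cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the synthetic data, the parameter grid, and the scaling step are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption): y is quadratic in x plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.3, size=200)

# Polynomial expansion, then scaling, then L2-regularized linear regression.
pipeline = make_pipeline(
    PolynomialFeatures(include_bias=False),
    StandardScaler(),
    Ridge(),
)

# Hypothetical grid: polynomial degree and regularization strength.
param_grid = {
    "polynomialfeatures__degree": [1, 2, 3],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validated grid search, scored by (negated) mean squared error.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

best_params = search.best_params_
best_rmse = float(np.sqrt(-search.best_score_))
print(best_params, round(best_rmse, 3))
```

Putting the feature expansion inside the pipeline means it is refit on each training fold, so the cross-validation score is not contaminated by the held-out data.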

https://github.com/vishal-038/attendance_by_face_recogination

This project is a face recognition-based attendance system that uses Python, OpenCV, Scikit-learn, Streamlit, and various other libraries like Pandas, Numpy, Datetime, and OS for different functionalities. It enables adding faces to the database, taking attendance based on face recognition, and showing live attendance through a web interface built

opencv python scikit-learn

Last synced: 14 Feb 2026

https://github.com/loong64/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ai-framework deep-learning hardware-acceleration loong64 loongarch64 machine-learning neural-networks onnx pytorch scikit-learn tensorflow

Last synced: 09 May 2026

https://github.com/lakshitalearning/churninsight

Customer Churn prediction means knowing which customers are likely to leave or unsubscribe from your service.

churn-prediction data-science flask google-colab machine-learning predictive-analytics python scikit-learn user-retention web-development

Last synced: 09 May 2026

https://github.com/davidcamilo0710/hate_speech_analysis

Hate speech detection using NLP for linguistic analysis and machine learning (XGBoost) for classification with Python and SpaCy.

hate-speech-detection linguistic-analysis nlp scikit-learn spacy xgboost

Last synced: 09 May 2026

https://github.com/hq969/customer-churn-prediction-with-hyperparameter-optimization-and-model-deployment

A complete end-to-end machine learning project that predicts customer churn using the Telco dataset. It includes data preprocessing, exploratory data analysis (EDA), model training with Random Forest, hyperparameter tuning, evaluation, and deployment via a Flask API.

flask numpy pandas python scikit-learn xgboost

Last synced: 02 Apr 2026

https://github.com/RickContreras/StudentPerformancePredictionSaberPro

Classification model for predicting student performance on the Saber Pro tests in Colombia. Includes exploratory data analysis, preprocessing, and machine learning models.

classification colombia data-analysis data-science education educational-assessment exploratory-data-analysis jupyter-notebook machine-learning python saber-pro scikit-learn student-performance

Last synced: 24 Oct 2025

https://github.com/rakibhhridoy/supportvectormachinein-medical

Support vector machines in medical disease detection. Both linear and non-linear data can be fitted with an SVM through its choice of kernel. In medical applications, the focus is on precision or recall rather than accuracy.

diabetes-prediction machine-learning medical precision-medicine recall-precision scikit-learn support-vector-machines svm

Last synced: 29 Apr 2026
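
To illustrate the kernel choice and the precision/recall focus mentioned in this entry, here is a small sketch that fits a linear and an RBF-kernel SVM and reports both metrics. It uses a synthetic two-moons dataset as a stand-in (an assumption; it is not medical data and not taken from the repository).

```python
from sklearn.datasets import make_moons
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linearly separable data (an assumption for illustration).
X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit one SVM per kernel and record precision and recall on the test split.
results = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[kernel] = {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
    }

for kernel, scores in results.items():
    print(kernel, scores)
```

Which metric matters more depends on the cost of each error type: recall when missing a positive case is costly, precision when false alarms are.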

https://github.com/snehilsanyal/ee524

Course webpage for IIT Guwahati EE524 Machine Learning Lab (Jul-Nov 2020) Session

course-webpage machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 01 May 2026

https://github.com/nmsby/pca-machine-learning-lab

Principal Component Analysis (PCA) implementation and analysis lab for Machine Learning. Features manual PCA implementation, scikit-learn applications, data compression, and feature extraction with detailed visualizations.

data-analysis dimensionality-reduction jupyter-notebook machine-learning numpy pca python scikit-learn visualization

Last synced: 01 May 2026
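
As a rough idea of what comparing a manual PCA with scikit-learn's can look like, the sketch below computes principal components from the covariance matrix with NumPy and checks them against `sklearn.decomposition.PCA`. The random data and variable names are assumptions for illustration, not the lab's actual material.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random correlated data (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

# Manual PCA: center, eigendecompose the covariance matrix, sort by variance.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

manual_ratio = eigvals[:2] / eigvals.sum()  # explained variance ratio
X_manual = X_centered @ eigvecs[:, :2]      # projection onto 2 components

# scikit-learn PCA on the same data.
pca = PCA(n_components=2).fit(X)
sklearn_ratio = pca.explained_variance_ratio_
X_sklearn = pca.transform(X)

# Components are only defined up to sign, so compare absolute values.
print(np.allclose(manual_ratio, sklearn_ratio))
print(np.allclose(np.abs(X_manual), np.abs(X_sklearn), atol=1e-6))
```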

https://github.com/garcane/Income-Prediction-ML

This is a machine learning project aimed at predicting whether an individual's annual income exceeds $50,000 based on their demographic and personal information.

data data-science machine-learning ml numpy pandas python random-forest scikit-learn

Last synced: 24 Oct 2025

https://github.com/andresmg07/real-time-sign-language-translator

AI-driven real-time American Sign Language translator. Implemented with Support Vector Machines (SVM), the OpenCV library, and the MediaPipe hands module.

ai computer-vision machine-learning mediapipe opencv pattern-recognition scikit-learn support-vector-machines

Last synced: 16 Apr 2026

https://github.com/jesly-joji/spam-ham-classifier

Uses the Naive Bayes algorithm and NLP text preprocessing techniques.

naive-bayes-classifier nlp scikit-learn streamlit text-preprocessing

Last synced: 03 May 2026

https://github.com/jasper-koops/easy-gscv

This library allows you to quickly train machine learning classifiers by automatically splitting the data set and using both grid search and cross validation in the training process.

classification machine-learning python3 scikit-learn

Last synced: 14 Feb 2026

https://github.com/byigitt/smartmove

Fake data generation and analysis for Ankara metro station

ankara cv2 metro numpy pandas scikit-learn

Last synced: 03 May 2026

https://github.com/siam29/ensemble-majority-voting-hard

In this project, we implemented an ensemble learning approach using majority voting (hard voting) with five machine learning classifiers: DT, RF, XGBC, ANN, and KNN. The ensemble model achieved an impressive accuracy score of 99.95% and an F1 score of 85.51%.

credit-card-fraud ensemble-learning machine-learning matplotlib pandas scikit-learn

Last synced: 09 May 2026
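
Hard (majority) voting as described in this entry can be sketched with scikit-learn's `VotingClassifier`. This is only an illustration under stated assumptions: it uses three of the five classifier types (decision tree, random forest, KNN) and a synthetic imbalanced dataset rather than the project's credit card fraud data, so the scores it prints are unrelated to the figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (an assumption): roughly 5% positive class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hard voting: each fitted classifier casts one vote and the majority label wins.
ensemble = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.4f} f1={f1:.4f}")
```

On heavily imbalanced data, accuracy and F1 can diverge widely, which is why reporting both is useful.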

https://github.com/m-rishab/credbet

A loan prediction web app that tells you whether or not you are eligible for a loan!

decision-tree-classifier matplotlib numpy pandas python scikit-learn

Last synced: 02 Apr 2026

https://github.com/md-emon-hasan/ai-from-university

🎓 Collection of academic resources, projects, and exercises related to artificial intelligence concepts learned in university coursework.

ai artificial-intelligence linear-regression logestic-regression mahcine-learning ml scikit-learn

Last synced: 17 Apr 2026

https://github.com/iakshatgandhi/fake-news-classification-model-main

A machine learning-based project designed to classify news articles as real or fake. This system combines advanced natural language processing (NLP), robust machine learning models, and intuitive visualizations to deliver accurate and scalable predictions.

matplotlib nltk pickle python scikit-learn seaborn

Last synced: 09 Oct 2025

https://github.com/h-fuzzy-logic/python-finding-nsf-award-themes

Using NLP to find themes and concepts in NSF Awards

nltk pandas python scikit-learn

Last synced: 03 May 2026

https://github.com/baggiponte/ta-statistics-for-big-data-2022

🎓 Introduction to Python and Machine Learning [UniMi • AY 2021/2022]

clustering data-science data-visualization machine-learning python scikit-learn

Last synced: 03 May 2026

https://github.com/siam29/credit-card-fraud-detection-in-real-time

This project delivers a fast and efficient fraud detection methodology, providing predictions in under a second, emphasizing the importance of both high performance and quick response times.

ensemble-machine-learning feature-selection genetic-algorithm machine-learning matplotlib pandas pca scikit-learn

Last synced: 03 May 2026

https://github.com/ricardouchub/colab-ml-pipeline-agent

A Colab agent that, given a CSV dataset, plans and executes an end-to-end machine learning pipeline: initial analysis, preprocessing, training with Scikit-Learn, and automatic reporting with evalcards.

agent ai deepseek evalcards langchain llm ml pipeline-agent scikit-learn

Last synced: 16 Apr 2026

https://github.com/msikorski93/alzheimer-s-disease-classification

Multi-class classification using scikit-learn and TensorFlow models on MRI scans of patients' brains.

alzheimers-disease classification efficientnetb0 inceptionv3 knn-classifier mri-brain random-forest scikit-learn svc tensorflow

Last synced: 01 May 2026

https://github.com/carmoreno/analisisaccidentalidadbogota

Data analysis of traffic accidents in Bogotá, Colombia.

data-analysis data-science jupyer-notebook matplotlib numpy pandas scikit-learn

Last synced: 17 Apr 2026

https://github.com/t-abishek/embedded-intent-classifier

A production-grade FastAPI application that uses sentence embeddings to classify user prompts into 4 categories. Built using Python, BGE SentenceTransformer, Scikit-learn, and FastAPI.

classifier embedded huggingface pandas scikit-learn transformer

Last synced: 10 May 2026

https://github.com/zachpinto/xc-rankings-predictions

Applied ML Project predicting cross-country team rankings based on individual-level performances

random-forest scikit-learn

Last synced: 29 Apr 2026

https://github.com/ayyucedemirbas/solar_power_elasticnet

ElasticNet Linear Regression on Solar Power Generation

elasticnet-regression scikit-learn skops tabular-regression

Last synced: 29 Apr 2026

https://github.com/bestmahdi2/uni__dataminningstackoverflowproject

A university project for a data mining course, working on StackOverflow website data with Python.

cart csv data-mining logistic-regression matplotlib mlp naive-bayes nltk numpy pandas python scikit-learn scipy seaborn stackoverflow svc textblob tqdm xgboost

Last synced: 16 Feb 2026

https://github.com/aryansk/customer-segmentation-analysis

Advanced customer segmentation project using K-Means clustering to analyze customer behavior based on annual income, spending score, and age.

elbow-method exploratory-data-analysis machine-learning machine-learning-algorithms python scikit-learn sentiment-analysis sentiment-classification

Last synced: 29 Apr 2026

https://github.com/harshitwaldia/stock-price-prediction

An AI-driven stock market analysis dashboard that predicts next-day stock prices using a deep learning LSTM model. The project features: 🔮 AI Predictions for stock movements 🌍 Global market support (US, India, China, Japan, UK) 📊 Interactive React dashboard with charts & recent searches ⚡ Flask backend powered by TensorFlow/Keras & Yahoo Finance

dashboard flask flask-cors keras-tensorflow lstm-neural-networks machine-learning numpy react-typescript scikit-learn stock-price-prediction

Last synced: 03 May 2026

https://github.com/njorogepaul-moghul/iris-flower-classification

This project predicts the species of an Iris flower (Setosa, Versicolor, Virginica) based on its sepal and petal measurements. We trained and evaluated multiple ML models, with Logistic Regression performing best at 93% accuracy. Finally, we deployed it on Streamlit: https://irisflowerapp-ripwlmfmctrzqphjapj97t.streamlit.app/

iris-classification jupyter-notebook logistic-regression machine-learning python random-forest-classifier scikit-learn

Last synced: 29 Apr 2026

https://github.com/ivanyu/kaggle-digit-recognizer

Kaggle's "Digit Recognizer" competition

kaggle keras machine-learning scikit-learn

Last synced: 17 Apr 2026

https://github.com/kshula/cipatala-hospital-management-system

Cipatala Hospital management system powered by AI and machine learning, built with Django and Bootstrap

bootstrap django django-project html-css-javascript python scientific-computing scikit-learn tensorflow

Last synced: 01 Mar 2026

https://github.com/mg380/ibm-applied-data-science-capstone

This Capstone is the 10th (final) course in the IBM Data Science Professional Certificate specialization, and it summarises, in the form of a project, all the material learned during the specialization.

capstone data data-analysis data-science datascience ibm machine-learning plotly python scikit-learn sql

Last synced: 05 Mar 2026

https://github.com/mijisu0103/data-driven-decision-making-risk-analysis

This repository contains my coursework project for ECS7005P - Risk and Decision-Making for Data Science and AI. It applies probabilistic models, Bayesian networks, and decision analysis using Python and PyAgrum to evaluate risk and optimise decision-making under uncertainty.

machine-learning pandas probability-and-statistics pyagrum python quantitative-decision-making risk-assessment scikit-learn

Last synced: 10 May 2026

https://github.com/chitralputhran/drive-curve-machine-learning-app

🚙 Drive Curve is a web application made with the help of Flask, a microframework for Python based on Werkzeug, Jinja 2, and good intentions. On the backend, a Machine Learning model is used for predicting the price of the car. The machine learning model was trained on the Automobile Dataset from the UCI Machine Learning Repository.

flask machine-learning python scikit-learn webapp

Last synced: 03 May 2026

https://github.com/zazi2002/machine-learning-project

Introduction to Machine Learning project with the goal of improving the classification performance on a dataset by optimizing the number of features and weak learners.

dimentionality-reduction ensemble-learning numpy pca random-forest scikit-learn

Last synced: 02 May 2026

https://github.com/prashver/titanic-survival-prediction

This project tackles the Titanic challenge on Kaggle, predicting passenger survival based on variables like age, sex, and passenger class. The Jupyter notebook covers essential steps of a data science pipeline, including exploratory data analysis, data cleaning, feature engineering, and modeling. The dataset used is the Titanic dataset.

classification-algorithm machine-learning-algorithms matplotlib numpy pandas scikit-learn seaborn

Last synced: 02 May 2026

https://github.com/rakibhhridoy/machinelearning-featureselection

Before training or feeding a model, the first priority is the data, not the model. The more the data is preprocessed and engineered, the more the model will learn. Feature selection is one of the methods for processing data before feeding the model. Various feature selection techniques are shown here.

extratreesclassifier feature-selection gridsearchcv lasso-regression logistic-regression machine-learning numpy pandas pca rfe rfecv scikit-learn selectkbest

Last synced: 02 May 2026

https://github.com/artemxdata/car-price-prediction

Car Price Prediction – Machine learning project for estimating car prices based on technical specifications and market data. The goal is to achieve an RMSE below 2500 by comparing multiple models (Linear Regression, Random Forest, LightGBM) and analyzing training vs. prediction time.

car-price-prediction data-science lightgbm machine-learning notebook python regression rmse scikit-learn supervised-learning used-cars vehicle-pricing

Last synced: 01 May 2026