An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/takkii/pylean

Data analysis ( 🐍 💎 📈 )

analayze matplotlib numpy pandas python scikit-learn

Last synced: 09 Sep 2025

https://github.com/filiplangiewicz/automltunability

📈 Analyzing the impact of hyperparameter optimization

automl machine-learning scikit-learn

Last synced: 18 Feb 2026

https://github.com/offchan42/thai-thesis-classification

Classify each document in the corpus using the Python machine learning module scikit-learn.

nlp python python2 scikit-learn segment thai thai-language thai-thesis-classification

Last synced: 13 Aug 2025

https://github.com/edisedis777/pyspark-ml-features

A PySpark implementation of six lesser-known scikit-learn features optimized for Azure Databricks. This project translates powerful machine learning techniques from scikit-learn into PySpark's distributed computing framework.

azure databricks databricks-notebooks large-scale machine-learning pyspark python scikit-learn scikitlearn-machine-learning

Last synced: 13 Apr 2026

https://github.com/somenath203/titanic-survival-project-backend

Click the link below to view the live Swagger documentation for the website.

fastapi pandas python render scikit-learn seaborn titanic-survival-predictor

Last synced: 05 Apr 2026

https://github.com/shanmukhsrisaivedullapalli/automatic-ticket-classification

This project processes customer complaint data using pandas for data manipulation and applies text preprocessing techniques, including lemmatization, to clean and normalize complaint text. The `tqdm` library provides progress bars for efficient tracking of text processing tasks.

matplotlib neural-networks nlp numpy pandas python3 scikit-learn seaborn tensorflow tqdm wordcloud

Last synced: 11 Apr 2026

https://github.com/jersongb22/datascience_ibm_stockpredictionlstm_project

In the IBM Advanced Data Science specialization, an interactive real-time web application was developed using LSTM networks in TensorFlow to predict stock market trends for global companies.

apache-spark data-science deep-learning lstm-neural-networks machine machine-learning plotly python scikit-learn streamlit tensorflow

Last synced: 13 Apr 2026

https://github.com/virajbhutada/article-recommendation-system

This project aims to redefine content discovery by delivering personalized article recommendations tailored to individual user preferences. We use advanced machine learning techniques like PCA and K-means clustering to analyze user behavior and article characteristics to provide highly accurate recommendations.

anaconda article-recommendation clustering-algorithm data-analysis data-science keras-tensorflow machine-learning machine-learning-algorithms ml-models numpy pandas plotly python scikit-learn scipy

Last synced: 06 Jan 2026
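
The PCA plus K-means combination in the description above can be sketched with scikit-learn as below. This is a minimal illustration, not the repository's code: the random user-by-feature matrix, the 5 components and the 4 clusters are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 users x 20 behavioural features (random, illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))

# Standardize, project onto 5 principal components, then group users into 4 clusters.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5, random_state=0),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
labels = model.fit_predict(X)

# Cluster sizes; articles popular within a user's cluster could then be recommended.
print(np.bincount(labels))
```

Reducing dimensions before clustering keeps K-means' Euclidean distances from being dominated by many weak or noisy features.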

https://github.com/jibbs1703/classic-ml-models

This repository contains scripts for developing, training and evaluating machine learning models using several Python frameworks.

aws data-preprocessing data-science deep-learning feature-engineering machine-learning multiclass-classification neural-networks predictive-modeling pyspark-mllib pytest scikit-learn xgboost-classifier

Last synced: 10 Apr 2026

https://github.com/haloapping/ml-workflow

Machine learning workflow template.

mahine-learning numpy pandas python3 scikit-learn

Last synced: 11 Apr 2026

https://github.com/alessandromonolo/descriptive-texts-classification-by-usage-purposes-of-estate-properties

The project aims to identify the best model for classifying texts derived from descriptions of assets subject to Italian judicial auctions. The models used include both conventional models, such as Logistic Regression, Naive Bayes, SVM, and XGBoost, and neural network models, such as fastText and XLM-RoBERTa.

fasttext logistic-regression naive-bayes nlp python pytorch scikit-learn seaborn spacy svm text-classification tfidf tokenizer xgboost xlm-roberta

Last synced: 08 Apr 2026

https://github.com/asosnovsky/analyzing-blood-vessel-aneurysm

A few simple scripts to identify aneurysms in blood vessels (research projects).

machine-learning meanshift medical-image-processing scikit-learn

Last synced: 20 May 2026

https://github.com/khaymanii/parkinsons-disease-detection-model

This model was built with Python and the Support Vector Machine (SVM) algorithm.

matplotlib numpy pandas python scikit-learn

Last synced: 19 Apr 2026

https://github.com/alam025/ai-email-guardian

🛡️ AI-Powered Email Guardian: 99.2% accurate spam detection using machine learning. Open-source, privacy-focused email security. ⚡ 50ms detection time.

artificial-intelligence email-filter email-security hishing-detection machine-learning-cybersecurity nlp open-source privacy python scikit-learn security-tools spam-detection tensorflow text-classification

Last synced: 10 Mar 2026

https://github.com/camilajaviera91/prediction-of-housing-prices-using-linear-regression

This project provides tools to search for datasets on Kaggle, download and preprocess them, and perform predictions using a Linear Regression model. It includes interactive text-based user interfaces built with `curses`.

curses kaggle linear-regression matplotlib-pyplot mean-absolute-error mean-square-error numpy pandas pathlib python scikit-learn train-test-split

Last synced: 10 Apr 2026
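
The modelling step named in the tags (train/test split, Linear Regression, MAE and MSE) might look roughly like this. It is a sketch only: the Kaggle download and the `curses` interface are left out, and the synthetic area/rooms/price data is an assumption standing in for a real housing dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical housing data: price depends linearly on area and rooms, plus noise.
rng = np.random.default_rng(42)
area = rng.uniform(50, 250, size=300)      # square metres
rooms = rng.integers(1, 6, size=300)       # 1 to 5 rooms
price = 3000 * area + 10000 * rooms + rng.normal(0, 20000, size=300)
X = np.column_stack([area, rooms])

# Hold out 20% of the rows to measure error on unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
mse = mean_squared_error(y_test, pred)
print(f"MAE: {mae:,.0f}  MSE: {mse:,.0f}")
```

MAE is in the same units as the price, while MSE penalizes large errors more heavily, which is why projects often report both.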

https://github.com/soumyagautam/sign-sense

Sign Sense, a deep learning and neural network based 'Sign Language' to speech converter, is a desktop app that can detect hand signs in a frame and convert them to speech according to their meaning. Conversely, it can also recognise your voice and convert it to sign language.

ai cv2 dataprocessing deep-learning keras machine-learning mediapipe moviepy-library neural-network openai-whisper scikit-learn tensorflow tkinter-python

Last synced: 10 Apr 2026

https://github.com/bahar15984/obesity-classification

Machine Learning Pipeline for Obesity Classification using Azure ML & Python

azure azure-ml classification data-science healthcare machine-learning mlops obesity pandas pipeline python scikit-learn

Last synced: 03 Nov 2025

https://github.com/karimosman89/legal-document-nlp

Create a tool that uses NLP to extract key information from legal documents, contracts, or agreements. Use NLP techniques for named entity recognition and text classification. Streamline the review process for legal teams by automating information extraction.

nltk python scikit-learn spacy

Last synced: 11 Apr 2026

https://github.com/lordmitrii/win-prediction-django

A web application built on the Django framework. It predicts the winning team from given sets of Dota 2 heroes.

django dota2 jupyter-notebook machine-learning python scikit-learn web

Last synced: 13 Apr 2026

https://github.com/md-emon-hasan/ml-project-car-price-prediction

🚗 End-to-end ML project for predicting car prices based on various features. Includes data preprocessing, model training, and a Flask web app for predictions.

car-price-prediction car-price-predictor data-science feature-engineering ml predictive-modeling scikit-learn

Last synced: 10 Mar 2026

https://github.com/abdullah321umar/internee.pk-dataanalytics_internship-assignment4

🌟 Fraud Detection in Application 🌟 Through Isolation Forest and K-Means Clustering, the project detects suspicious patterns like inconsistent income, duplicate entries, and unrealistic employment data. This end-to-end workflow transforms raw data into actionable fraud insights — enhancing trust and accuracy.

anomaly-detection csv-handling data-cleaning data-exporting data-import data-normalization exploratory-data-analysis export interpretation matplotlib model-evaluation pandas pca python reporting scaling scikit-learn seaborn

Last synced: 06 May 2026
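
The Isolation Forest half of this workflow can be sketched as follows. The two features (monthly income, years employed), the injected implausible records and `contamination=0.01` are all assumptions for illustration; the project's actual columns and settings are not given here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical application records: [monthly_income, years_employed].
rng = np.random.default_rng(0)
normal = np.column_stack([rng.normal(4000, 800, 500), rng.normal(8, 3, 500)])
# A few implausible records, e.g. very high income with zero years of employment.
odd = np.array([[60000, 0], [55000, 45], [0, 60]])
X = np.vstack([normal, odd])

# contamination is the expected share of anomalies; 0.01 is an assumed value.
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
flags = iso.predict(X)             # 1 = inlier, -1 = anomaly
print(np.where(flags == -1)[0])    # indices of suspicious records
```

Isolation Forest scores points by how few random splits it takes to isolate them, so records far from the bulk of the data get low scores without needing fraud labels.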

https://github.com/lfenzo/ml-solar-sao-paulo

Implementation of a scientific project on the use of machine learning for solar radiation prediction.

forecasting machine-learning python scikit-learn

Last synced: 11 Apr 2026

https://github.com/tasninanika/k-means-clustering

An interactive and insightful customer segmentation project using K-Means Clustering.

matplotlib numpy pandas plotly python3 scikit-learn seaborn

Last synced: 11 Apr 2026

https://github.com/squadron-leader/ecopredict-ai

EcoPredict AI is a powerful, AI-driven solution for predicting Greenhouse Gas (GHG) emissions based on user-input industry data. Designed for environmental sustainability initiatives, EcoPredict AI utilizes machine learning models to deliver accurate carbon emission predictions and is deployed via Streamlit for real-time access.

epa-data linear-regression python regression-model scikit-learn streamlit

Last synced: 12 Apr 2026

https://github.com/finite-sample/stagecoachml

Build two-stage models when your features arrive in two batches at different times.

machine-learning scikit-learn two-stage-models

Last synced: 14 Jan 2026
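
One generic way to build a two-stage model with plain scikit-learn is sketched below: stage 1 uses only the early features, and stage 2 combines the stage-1 prediction with the late features. This is an assumed stacking approach, not stagecoachml's API; the data, the feature split and the helper `predict_two_stage` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

# Hypothetical data: 3 features known early, 2 features that arrive later.
rng = np.random.default_rng(1)
X_early = rng.normal(size=(400, 3))
X_late = rng.normal(size=(400, 2))
y = X_early @ [1.0, -2.0, 0.5] + X_late @ [3.0, 1.0] + rng.normal(0, 0.1, 400)

# Stage 1: predict from early features only (usable as soon as they arrive).
stage1 = Ridge(alpha=1.0)
# Out-of-fold predictions, so stage 2 is not trained on in-sample stage-1 fits.
early_pred = cross_val_predict(stage1, X_early, y, cv=5)
stage1.fit(X_early, y)

# Stage 2: refine using the stage-1 prediction plus the late features.
stage2 = LinearRegression().fit(np.column_stack([early_pred, X_late]), y)

def predict_two_stage(x_early, x_late):
    """Hypothetical helper returning (provisional, refined) predictions."""
    p1 = stage1.predict(x_early)
    p2 = stage2.predict(np.column_stack([p1, x_late]))
    return p1, p2

p1, p2 = predict_two_stage(X_early[:5], X_late[:5])
```

Using out-of-fold stage-1 predictions for training mimics what stage 2 will see at inference time, when the stage-1 model has not seen the row before.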

https://github.com/adrien-1997/bike-forecast-paris-velib

Bike-sharing demand forecasting in Paris (Vélib’). A data science and machine learning project leveraging open urban mobility data to predict bike availability, analyze time series usage patterns, and provide interactive dashboards for visualization.

bike-sharing dashboard data-science duckdb forecasting machine-learning matplotlib open-data pandas paris predictive-modeling python scikit-learn streamlit transportation urban-mobility velib

Last synced: 11 Apr 2026

https://github.com/swimshahriar/heart-attack-prediction

Heart attack prediction from 13 features.

jupyter-notebook pandas python3 scikit-learn

Last synced: 18 Apr 2026

https://github.com/vatshayan/hospital-discharge-analysis

Analysis of hospitalization discharge rates in Lake County, Illinois, across attributes such as anxiety, alcohol, mood, diabetes, and asthma.

data-analysis data-visualization jupyter-notebook machine machine-learning machine-learning-algorithms scikit-learn

Last synced: 04 Mar 2025

https://github.com/salmandeveloperz/ml_house_prediction

A project for house price prediction using classification and regression models. Includes a Docker setup for easy deployment.

classification-model clustering deep-learning machine-learning matplotlib numpy pandas python3 regression-models scikit-learn

Last synced: 10 Apr 2026

https://github.com/uhstray-io/pyrizon

Data Collection, Analysis, Mapping, Pipelining & Transformation, & API using Python

api data-engineering etl numpy pandas plotly python pytorch raw-data scikit-learn seaborne sql sqlite tensorflow

Last synced: 09 Apr 2026

https://github.com/davidyen1124/cowculator

COWCULATOR: AI-driven catering cost forecasting in Python. Trains order-level and daily time series models, exports an edge-ready JSON bundle, and includes a demo web UI.

cli data-science edge-ai forecasting github-actions machine-learning mypy pandas python ruff scikit-learn time-series uv

Last synced: 05 May 2026

https://github.com/varun-khorgade/churnshield-customer-retention-predictor

Built an ML-based classification model to predict customer churn. Applied data preprocessing, feature engineering, and ensemble algorithms to improve prediction accuracy and help businesses implement retention strategies.

classification-algorithm datapreprocessing f1-score feature-engineering hyperparameter-tuning logistic-regression matplotlib model-evaluation numpy pandas python ran roc-auc scikit-learn seaborn xgboost

Last synced: 07 May 2026

https://github.com/rakibhhridoy/easywaydiveinto-datascience

Data science is not as easy as it seems at first. The biggest problems new learners face are a lack of knowledge about resources and confusion about how to use the various resources. I hope this repository will help confused learners.

algorithms algorithms-implemented bayesian-statistics data-science deep-learning deep-neural-networks linear-algebra machine-learning matplotlib multivariate-calculus numpy optimization pandas python scikit-learn scipy seaborn statistics statsmodels tensorflow

Last synced: 06 Apr 2026

https://github.com/paulj1989/bulgarian-constitutional-court-decisions

Developing NLP models for text and sentence classification using legal texts from the Bulgarian constitutional court.

keras neural-network nlp scikit-learn tensorflow tesseract

Last synced: 04 May 2026

https://github.com/elifirinci/mushrooms-plants-classification

This project features AI models for identifying mushrooms and plants as poisonous or edible using image-based predictions. Both models are tested through an interactive Gradio interface, ensuring user-friendly and accurate identification for foragers and researchers.

classification cnn cnn-classification gradio image-classification machine-learning mushroom-classification plant-classification scikit-learn

Last synced: 17 May 2026

https://github.com/fohlen/stats-experiment

A tiny stats experiment with GENESIS data

matplotlib python3 scikit-learn

Last synced: 17 May 2026

https://github.com/jersongb22/datascience_mlops_movierecommendations_project

Simulating a Data Scientist's role in a startup aggregating streaming platforms. Building movie queries and an ML-based recommendation system with an MLOps focus. The ML model web app is deployed with Render.

data-science fastapi machine-learning matplotlib pandas python render scikit-learn stopwords

Last synced: 10 Apr 2026

https://github.com/evangks/hierarchical-clustering-mall-customers

A comprehensive machine learning project demonstrating hierarchical clustering for customer segmentation on the Mall Customers dataset. Includes EDA, preprocessing, multiple linkage/distance comparisons, and professional visualizations.

clustering data-science hierarchical-clustering jupyter-notebook machine-learning mall-customers portfolio-project python scikit-learn unsupervised-learning

Last synced: 07 Mar 2026
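
A linkage comparison like the one this project describes can be sketched with `AgglomerativeClustering` and the silhouette score. The synthetic blobs are an assumed stand-in for the Mall Customers features, and 5 clusters is an arbitrary choice for the example.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D blobs standing in for e.g. [annual_income, spending_score].
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)

# Fit the same data with each linkage criterion and compare silhouette scores.
scores = {}
for linkage in ("ward", "complete", "average", "single"):
    labels = AgglomerativeClustering(n_clusters=5, linkage=linkage).fit_predict(X)
    scores[linkage] = silhouette_score(X, labels)
    print(f"{linkage:>8}: silhouette = {scores[linkage]:.3f}")
```

Ward linkage merges the pair of clusters that least increases within-cluster variance, while single linkage uses the closest pair of points and tends to chain clusters together, so the scores can differ noticeably.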

https://github.com/somjit101/human-activity-recognition

This project builds a model that predicts human activities such as Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing or Laying from the readings of sensors on a smartphone carried by the user.

decision-tree-classifier eda feature-engineering gradient-boosting-classifier grid-search human-activity-recognition keras logistic-regression lstm random-forest-classifier rbf-kernel scikit-learn seaborn-plots signal-processing support-vector-classifier support-vector-machine t-sne tensorflow uci-har-dataset uci-machine-learning

Last synced: 23 Feb 2026

https://github.com/udityamerit/curafind-powered-by-ai

CuraFind AI is a web-based application leveraging Natural Language Processing (NLP) to intelligently recommend medicines. Users can search using symptoms, medicine names, or free-text descriptions, and receive suggestions along with brand substitutes for drugs

ai machine-learning nlp numpy pandas scikit-learn

Last synced: 18 Sep 2025

https://github.com/tasninanika/mammographic-masses-analysis-dt

This project uses a Decision Tree Classifier to predict whether a detected mammographic mass is benign (0) or malignant (1) based on input features.

decision-tree-classifier numpy pandas pyhton3 scikit-learn

Last synced: 11 Apr 2026
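
A Decision Tree Classifier for a binary benign/malignant target can be sketched as below. The synthetic 5-feature data is an assumption standing in for the mammographic masses dataset, and `max_depth=4` is an arbitrary choice; labels follow the description's convention of 0 = benign, 1 = malignant.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in with 5 features; label 0 = benign, 1 = malignant.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Limiting depth keeps the tree readable and reduces overfitting.
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
# Print the learned if/else rules; feature names here are placeholders.
print(export_text(clf, feature_names=[f"f{i}" for i in range(5)]))
```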

https://github.com/queirozpedro/cluesuspeitosrna

Studying how the game Clue Suspeitos works and implementing the training of a neural network. The Clue card game is a card game in which players go through rounds of questions and answers to uncover the crime scenario, which consists of a suspect, a location, and a weapon.

matplotlib mlp-classifier python scikit-learn

Last synced: 16 May 2026

https://github.com/aarryasutar/hate_speech_detection

This project aims to detect hate speech on Twitter using advanced NLP and machine learning techniques, exploring feature extraction methods like TF-IDF and sentiment analysis, and evaluating models such as Logistic Regression and SVM.

confusion-matrix doc2vec gensim logistic-regression matplotlib naive-bayes nltk numpy pandas python random-forest scikit-learn seaborn stemming stopwords-removal svm tf-idf-vectorizer tokenization vader word-cloud

Last synced: 09 Apr 2026
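
The TF-IDF feature extraction with Logistic Regression and SVM classifiers mentioned above can be sketched as two scikit-learn pipelines. The six-sentence corpus and its labels are made up for illustration; a real experiment needs a labelled tweet dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up corpus; 1 = hateful, 0 = not hateful.
texts = [
    "you people are worthless, get out",
    "nobody wants you here, go away",
    "you are disgusting and worthless",
    "what a lovely sunny day in the park",
    "congrats on the new job, well deserved",
    "this recipe turned out great, thanks for sharing",
]
labels = [1, 1, 1, 0, 0, 0]

# Same TF-IDF features (unigrams and bigrams), two different linear classifiers.
models = {
    "logreg": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression()),
    "svm": make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC()),
}
for name, model in models.items():
    model.fit(texts, labels)
    print(name, model.predict(["get out, you are worthless"]))
```

Wrapping the vectorizer inside each pipeline means the vocabulary is learned only from training text, which avoids leakage when the pipelines are later cross-validated.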

https://github.com/rayyan9477/machine-learning-driven-backorder-prediction-system

Experience a state-of-the-art Django web application designed to predict product backorders with exceptional accuracy. This platform leverages advanced machine learning techniques, incorporating pre-trained Random Forest Classifier, Decision Tree, and LGBM models.

matplotlib notebook numpy pandas python scikit-learn

Last synced: 12 Apr 2026

https://github.com/mgobeaalcoba/survival_predictor_on_the_titanic_scikit_learn

Titanic Survival Predictor using Scikit-Learn: Machine learning model and analysis to predict passenger survival on the Titanic based on historical data.

matplotlib numpy pandas python3 scikit-learn seaborn titanic-dataset titanic-kaggle titanic-survival-prediction

Last synced: 10 Apr 2026

https://github.com/gdapriana/clickbait-detector-backend

This repository contains the backend logic for the “Clickbait Detector” app. Built using Python, it employs an Artificial Neural Network (ANN) to predict the likelihood of a news headline being clickbait. It provides REST API endpoints to interact with the model.

flask python scikit-learn tensorflow

Last synced: 11 Apr 2026

https://github.com/soroush-04/incrementalsvm-road-accident-prediction

Enhance SVM and incremental SVM machine learning models for road accident severity prediction

incremental-learning machine-learning python scikit-learn svm

Last synced: 09 Apr 2026
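
Incremental training of a linear SVM is possible in scikit-learn through `SGDClassifier(loss="hinge")` and `partial_fit`; the sketch below uses that as a stand-in, which may differ from the incremental SVM method in this repository. The synthetic data, the 3 severity classes and the batch size of 500 are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for accident records with 3 severity classes.
X, y = make_classification(n_samples=3000, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

# loss="hinge" makes SGDClassifier a linear SVM trained by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.unique(y)   # partial_fit must be told every class up front

# Feed the first 2500 rows in batches of 500, as if new records kept arriving.
for start in range(0, 2500, 500):
    batch = slice(start, start + 500)
    clf.partial_fit(X[batch], y[batch], classes=classes)

# Evaluate on the last 500 rows, which were never used for training.
print(f"held-out accuracy: {clf.score(X[2500:], y[2500:]):.3f}")
```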

https://github.com/tasninanika/heart-disease-analysis

The Heart Disease Analysis project is a comprehensive machine learning study aimed at predicting the presence of heart disease using the Heart Disease UCI Dataset.

knn logistic-regression matplotlib numpy pandas python3 random-forest scikit-learn seaborn

Last synced: 09 Apr 2026

https://github.com/kr1shnasomani/sentimentscope

Sentiment analysis on movie reviews using TensorFlow and GloVe embeddings

deep-learning keras matplotlib natural-language-processing neural-networks numpy pandas scikit-learn tensorflow

Last synced: 12 Apr 2026

https://github.com/ladityagogoi/shadowguard

The ShadowGuard Browser Extension is a powerful tool designed to enhance user experience by identifying and highlighting potential dark patterns on websites. Our extension employs a combination of machine learning algorithms and natural language processing (NLP) models to detect and classify various deceptive design practices

css flask html javascript joblib numpy pandas python scikit-learn

Last synced: 11 Apr 2026

https://github.com/tasninanika/will-you-survive-frontend

A full-stack machine learning app to predict Titanic passenger survival with a modern, interactive UI. Powered by FastAPI, scikit-learn, and a React frontend.

fastapi framer-motion python3 react react-router scikit-learn

Last synced: 12 Apr 2026

https://github.com/zen204/airbnb-availability

A machine learning model that predicts Airbnb listing availability, utilizing feature engineering and supervised learning techniques to improve guest experience and optimize host management.

binary-classification data-analysis data-preprocessing data-visualization feature-engineering machine-learning matplotlib model-evaluation nlp pandas predictive-modeling python scikit-learn seaborn supervised-learning

Last synced: 21 Jan 2026

https://github.com/mgobeaalcoba/linear_algebra_for_machine_learning

Explore fundamental linear algebra concepts essential for machine learning in this repository, with code examples and explanations. Get a solid foundation for ML!

machine-learning matplotlib numpy pandas python3 scikit-learn scipy seaborn

Last synced: 12 Apr 2026

https://github.com/artikumari28/movie-recommender-system

This project is a content-based movie recommendation system, where movies are recommended based on their similarity in content. The system analyzes various features such as genres, cast, and descriptions to suggest similar movies.

google-colab machine-learning nltk numpy pandas pickle scikit-learn streamlit

Last synced: 06 Apr 2026
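
Content-based recommendation by similarity can be sketched by turning each movie's combined text (genres, cast, description) into TF-IDF vectors and ranking by cosine similarity. The five movies and their tag strings are invented, and `recommend` is a hypothetical helper, not the project's code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical movie "tags": genres, cast and description words concatenated.
movies = {
    "Space Quest": "sci-fi adventure space crew alien planet",
    "Star Voyage": "sci-fi space exploration crew starship",
    "Love in Paris": "romance comedy paris couple",
    "Paris Hearts": "romance drama paris love",
    "Alien Harbor": "sci-fi horror alien crew",
}
titles = list(movies)
tfidf = TfidfVectorizer().fit_transform(list(movies.values()))
sim = cosine_similarity(tfidf)   # sim[i, j] = similarity between movie i and movie j

def recommend(title, k=2):
    """Return the k titles most similar to `title`, excluding itself."""
    i = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: sim[i, j], reverse=True)
    return [titles[j] for j in ranked if j != i][:k]

print(recommend("Love in Paris"))
```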

https://github.com/vicperal/ai-genai_projects

Python projects covering LLM and ML use cases. I use modules such as Pandas, NumPy, Plotly, scikit-learn, Transformers, Flask and JSON to analyze data, make predictions, generate insights and create text from models such as LLMs, linear regression and ensemble methods. The server and front end are built with Flask.

assembly clinical-trials flask json linear-regression llm ml numpy pandas plotly price-prediction python rag random-forest scikit-learn sentimental-analysis sql text-summarization tokens-counter transformers

Last synced: 02 Apr 2026

https://github.com/selcia25/sleep-disorder-detection

💤This project aims to develop an automated method for detecting sleep disorders from heart rate signals.

cnn-classification kmeans-clustering machine-learning matplotlib scikit-learn scipy sleep-disorders tensorflow

Last synced: 05 Jan 2026

https://github.com/armanjscript/fusion-rag

A powerful web-based application designed to answer questions based on the content of uploaded PDF documents. This project leverages the **Fusion-in-Decoder (FiD)** approach for **Retrieval-Augmented Generation (RAG)**, combining semantic similarity, technical term relevance, and recency to deliver accurate and contextually relevant responses

chroma chromadb fusion-rag langchain langchain-ollama ollama pypdf qwen2-5 rag rag-chatbot scikit-learn streamlit tf-idf-score tf-idf-vectorizer vector-database

Last synced: 10 Apr 2026

https://github.com/hayatoy/gcpml-notebook

A Dockerfile for a Jupyter machine learning environment that also includes the Google Cloud SDK.

dockerfile google-cloud-platform jupyter scikit-learn tensorflow

Last synced: 12 Apr 2026

https://github.com/ayberkyavuz/ml_model_server_docker_deployment

This repository contains the source code for deploying a machine learning model server.

deployment docker flask machine-learning model python random-forest scikit-learn

Last synced: 08 Apr 2026

https://github.com/andystmc/nextflownyc

Developed a machine learning model (Bidirectional LSTM) to forecast NYC traffic volumes using 10 years of automated traffic count data. Achieved strong predictive accuracy, demonstrating the power of deep learning for urban traffic analysis.

data-analysis data-cleaning data-science data-visualization exploratory-data-analysis feature-engineering hyperparameter-tuning jupyter-notebook lstm-neural-networks machine-learning numpy pandas predictive-modeling python3 scikit-learn tensorflow-keras traffic-flow-forecasting

Last synced: 07 Apr 2026

https://github.com/jai0212/cash-app-bias-busters

A platform developed with Cash App to help ML engineers detect and visualize biases in models using Fairlearn. Features include a collaborative and interactive dashboard (React, Chart.js), a Flask backend, and a secure MySQL database for data storage and analysis.

bias-detection chartjs fairlearn flask machine-learning mysql numpy pandas pytest python react scikit-learn scipy

Last synced: 16 Feb 2026

https://github.com/tasninanika/coded_data_prediction-knn

K-Nearest Neighbors (KNN) is a supervised machine learning algorithm

knn pandas python3 scikit-learn

Last synced: 07 Apr 2026
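
Since KNN is distance-based, a typical setup scales the features and then tunes the number of neighbours `k`. The sketch below assumes synthetic data in place of the repository's coded dataset and compares a few arbitrary values of `k` with 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the coded dataset.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# Scale inside the pipeline so each CV fold is scaled using only its training part.
results = {}
for k in (1, 5, 15):
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    results[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k:>2}: mean CV accuracy = {results[k]:.3f}")
```

A very small `k` follows individual training points closely and can overfit, while a larger `k` averages over more neighbours and gives smoother decision boundaries.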

https://github.com/shreeyas-48/creditcardfrauddetection

Project for detecting credit card fraud using neural networks and logistic regression

autoencoder keras logistic-regression matplotlib neural-networks numpy pandas python scikit-learn

Last synced: 12 Apr 2026

https://github.com/evangks/k-means-clustering-synthetic-dataset

Customer Segmentation using K-Means Clustering: A complete machine learning workflow for segmenting customers based on synthetic demographic and spending data, with visualizations, evaluation metrics, and reproducible Jupyter notebook.

clustering customer-segmentation data-science jupyter-notebook k-means-clustering machine-learning portfolio-project python27 scikit-learn unsupervised-learning

Last synced: 10 Mar 2026

https://github.com/headless-start/cs2-endtoend-chatbot

This repository contains a simple end-to-end Counter-Strike 2 chatbot.

chatbot counter-strike-2 css flask html5 nltk python3 scikit-learn streamlit

Last synced: 11 Apr 2026

https://github.com/grachale/predict_pass_exam

Creating an AdaBoost classifier with decision trees to predict whether a student will pass or fail an exam (classification) based on the number of study hours and their score in the previous exam.

adaboost cross-validation decision-tree jupyter-notebook matplotlib python scikit-learn seaborn

Last synced: 06 May 2026
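
An AdaBoost model over the two features named here (study hours and previous exam score) can be sketched as follows. The student data and the pass rule that generates the labels are made up for illustration; scikit-learn's `AdaBoostClassifier` uses a depth-1 decision tree as its default base learner.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical students: hours studied (0-10) and previous exam score (30-100).
rng = np.random.default_rng(7)
hours = rng.uniform(0, 10, 200)
prev = rng.uniform(30, 100, 200)
# Made-up noisy rule: more hours and a higher previous score make passing likelier.
passed = (5 * hours + 0.5 * prev + rng.normal(0, 5, 200) > 60).astype(int)
X = np.column_stack([hours, prev])

# Default base learner is a depth-1 decision tree (a "stump"); 50 boosting rounds.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
print(f"5-fold CV accuracy: {cross_val_score(clf, X, passed, cv=5).mean():.3f}")

clf.fit(X, passed)
# A well-prepared student vs. an unprepared one.
print(clf.predict([[8.0, 85.0], [1.0, 40.0]]))
```

Each boosting round reweights the students the previous stumps misclassified, so later stumps concentrate on the cases near the pass/fail boundary.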

https://github.com/guoshijiang/scikit-learn

Learn scikit-learn together with me

nlp-machine-learning scikit-learn

Last synced: 14 Sep 2025

https://github.com/f-aguzzi/ChemFuseKit

Chemometrics library for data fusion, model training and prediction of data from multiple sensor sources.

chemometrics datafusion knn lda pca plsda scikit-learn svm

Last synced: 21 Sep 2025

https://github.com/hvignolo87/marketing-campaign-classification

A real-world machine learning classification case. Analysis of real data from telemarketing campaigns of a Portuguese bank.

binary-classification data-science pandas python scikit-learn xgbclassifier xgboost

Last synced: 12 Apr 2026

https://github.com/medyessinkhlif/medclaimml

An AI-powered machine learning application designed to process healthcare reimbursement claims. It analyzes medical documents, client information, insurance policies, and legal regulations to predict accurate reimbursement amounts, ensuring efficiency, compliance, and fraud detection.

healthcare jest-tests mern-stack mongodb nodejs nosql numpy pytorch react scikit-learn tailwindcss

Last synced: 13 May 2025