An open API service indexing awesome lists of open source software.

scikit-learn

scikit-learn is a widely used Python module for classic machine learning. It is built on top of SciPy.

https://github.com/jesly-joji/spam-ham-classifier

Uses the Naive Bayes algorithm and NLP text preprocessing techniques.

naive-bayes-classifier nlp scikit-learn streamlit text-preprocessing

Last synced: 03 May 2026

https://github.com/iakshatgandhi/fake-news-classification-model-main

A machine learning-based project designed to classify news articles as real or fake. This system combines advanced natural language processing (NLP), robust machine learning models, and intuitive visualizations to deliver accurate and scalable predictions.

matplotlib nltk pickle python scikit-learn seaborn

Last synced: 09 Oct 2025

https://github.com/chengetanaim/sentimentanalysisforfinancialnewsnotebook

Builds a financial news sentiment classifier. Financial news headlines are classified as positive, negative, or neutral from an investor's point of view.

logistic-regression machine-learning natural-language-processing scikit-learn tfidf-vectorizer

Last synced: 04 May 2026

https://github.com/nirmalyabag20/crop-yield-prediction-using-machine-learning

This project uses machine learning to predict crop yields based on factors like region, crop type, rainfall, temperature, and pesticide use. By analyzing a dataset of over 28,000 records, the models provide accurate yield forecasts, helping optimize farming decisions and resource management, ultimately contributing to sustainable agriculture.

jupyter-notebook matplotlib numpy pandas python scikit-learn seaborn

Last synced: 06 Feb 2026

https://github.com/vectominist/mednlp

Mandarin Medical Dialogue Analysis with PyTorch.

dialog huggingface mandarin medical pytorch scikit-learn transformers

Last synced: 04 May 2026

https://github.com/adzialocha/notebook

Jupyter notebooks for random experiments with audio processing, data analysis and machine learning

jupyter-notebook keras learning librosa music21 scikit-learn

Last synced: 15 Apr 2026

https://github.com/antim21/spamsense-ai

Classifying emails into Spam or Not Spam categories using Machine Learning techniques

machine-learning nlp python scikit-learn

Last synced: 04 May 2026

https://github.com/analitico-771/creditworthiness_classification_model

An application that trains a supervised learning model with the imbalanced-learn library to classify the creditworthiness of borrowers.

artificial-intelligence credit-risk fintech imbalanced-learning machine-learning python quantitative-finance scikit-learn supervised-machine-learning

Last synced: 04 May 2026

https://github.com/soumya6tiwari/customer-segmentation-using-rfm-analysis

This project focuses on customer segmentation using RFM (Recency, Frequency, Monetary) analysis and K-Means clustering. It enables businesses to identify high-value customers, optimize marketing strategies, and improve customer retention through data-driven insights.

backend clustering flask frontend kmeans-clustering matplotlib numpy pandas python rfm-analysis scikit-learn unsupervised-learning

Last synced: 16 Feb 2026

https://github.com/kohlerhector/trex-tree-reward-exploration

Uses tree estimators of the MDP models, then counts the leaves that group similar transitions to do count-based exploration.

decision-trees drl exploration rl scikit-learn stable-baselines3

Last synced: 04 May 2026

https://github.com/ccharlesss/financeml

A machine learning web application using Python's FastAPI and scikit-learn to predict S&P 500 stock price trends and cluster stocks based on average annual returns and volatility. Utilised the MVC design pattern to structure the application effectively. Implemented a decision tree classifier with 84% accuracy.

cicd docker fastapi finance javascript jenkins machine-learning restful-api scikit-learn webapplication

Last synced: 15 Apr 2026

https://github.com/deliprofesor/ridge-regression-for-sales-prediction-model-evaluation-and-hyperparameter-tuning

This project builds and optimizes a model on a dataset using Ridge regression and polynomial features. Model accuracy is enhanced through regularization and polynomial transformations. Grid search and cross-validation are used to find the best parameters, and the model's performance is evaluated.

cross-validation data-science data-visualization grid-search machine-learning model-optimization mse overfitting-prevention polynomial-regression python r2-score regression-analysis regularization ridge-regression rmse scikit-learn

Last synced: 03 May 2026
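
As a rough illustration of the approach this entry describes (Ridge regression on polynomial features, tuned by grid search with cross-validation), a minimal scikit-learn sketch might look like the following. The synthetic data, the parameter grid, and the pipeline step names are assumptions for illustration and are not taken from the repository.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (an assumption; the repository uses its own dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.3, size=200)

# Expand features polynomially, standardize them, then fit a Ridge model.
pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("ridge", Ridge()),
])

# Hypothetical search space: polynomial degree and regularization strength.
param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validated grid search, scored by (negated) mean squared error.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)

best_params = search.best_params_
best_mse = -search.best_score_
print(best_params, round(best_mse, 3))
```

Putting the feature expansion inside the pipeline means each cross-validation fold refits it on its own training split, so the reported score is not inflated by information from the held-out fold.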

https://github.com/franpog859/titanic-competition

❄️🚢 Machine Learning project workflow reference. The model predicts whether given people survived the Titanic disaster based on, among other features, their age, sex, and name.

classification data-science kaggle machine-learning scikit-learn titanic workflow

Last synced: 05 May 2026

https://github.com/nemeslaszlo/heart-disease

Heart disease classification project with different models (LogisticRegression, KNeighborsClassifier, RandomForestClassifier) and detailed reports.

classification knearest-neighbor-classifier logistic-regression mathplotlib numpy pandas randomforest-classification scikit-learn seaborn

Last synced: 15 Apr 2026

https://github.com/kento75/keiba_machine_learning

Scripts for horse racing prediction using scikit-learn

machine-learning matplotlib pandas postgresql psycopg2 python3 scikit-learn

Last synced: 18 Apr 2026

https://github.com/solanovisitor/keratoconusdetector

A repository to train and evaluate CNN-LSTM models aiming to detect Keratoconus on Galilei G6 optical biometer data.

cnn deep-learning keras lstm machine-learning pandas python scikit-learn tensorflow

Last synced: 05 Apr 2026

https://github.com/anupam0202/contextual-rag-chatbot

Contextual RAG Chatbot that processes PDF documents using the Google Gemini API

google-generativeai numpy pypdf2 scikit-learn streamlit

Last synced: 05 May 2026

https://github.com/shuddha2021/stellar-candidate-selector

A sophisticated candidate selection algorithm leveraging multi-criteria analysis and machine learning to identify top software engineering candidates. This tool features flexible filtering, score adjustment, and detailed visualizations to streamline the recruitment process.

candidate-selection data-analysis data-visualization machine-learning pandas plotting-in-python python python-data-analysis recruitment scikit-learn

Last synced: 05 May 2026

https://github.com/markoshb/my-data-science-learning-projects

Short but illustrative notebooks to showcase data analysis in Python

data-science matplotlib-pyplot pandas python pythorch scikit-learn tensorflow

Last synced: 05 Apr 2026

https://github.com/grampers-dev/co2oracle

The CO2 Oracle project uses machine learning and AI to analyze and predict CO2 emissions for environmental management. Using a Kaggle dataset, it demonstrates predictive analytics to understand and forecast emissions. Written in Python, it employs libraries like Pandas, NumPy, and Scikit-Learn.

artificial-intelligence machine-learning numpy pandas python scikit-learn

Last synced: 07 Feb 2026

https://github.com/mgobeaalcoba/data_champions_meli

Algorithms and work carried out within the framework of data champions by Mercado Libre

algorithms canvas classification clustering data-science machine-learning python3 scikit-learn

Last synced: 18 Apr 2026

https://github.com/sigilbyte/choquet-classifier

Implementation of the Choquet classifier using the scikit-learn API design.

machine-learning regression regression-models scikit-learn scikitlearn-machine-learning

Last synced: 05 May 2026

https://github.com/codenexa/nairobi

Quantifying Integrity in the Digital Age: misinformation spreads rapidly, accountability often falters, and the lines between transparency and manipulation blur.

csv ipynb-jupyter-notebook matpotlib pkl-model python scikit-learn

Last synced: 05 May 2026

https://github.com/rakshit-vasava/predictive-analytics-for-insurance-purchase

Predicting customer insurance purchases using stacking models and SMOTE for the Homesite Quote Conversion Problem on Kaggle.

k-nearest-neighbours kaggle-competition multilayer-perceptron python random-forest scikit-learn smote support-vector-machines

Last synced: 05 May 2026

https://github.com/tromesh/sinhala-parser

A Sinhala parser project based on natural language processing (NLP).

flux-architecture natural-language-processing nlp python react scikit-learn sinhala

Last synced: 05 May 2026

https://github.com/pngo1997/yelp-business-recommender-system

Building an item-based collaborative recommendation system using embeddings for establishments from the Yelp dataset.

content-based-recommendation embeddings geo-mapping geospatial information-retrieval python recommender-system scikit-learn spacy

Last synced: 05 May 2026

https://github.com/myounus-codes/saleprice-prediction-dataset-analysis-and-cleaning-advance-regression

In this project I have cleaned the data for the model. Project Google Colab Link: https://colab.research.google.com/drive/1vQY-XEFJSdEkW2PQOSf1j13Yk8L-XXNw?usp=sharing

algorithms data-analysis data-science eda google-colab machine-learning numpy pandas python scikit-learn scikit-learn-python

Last synced: 05 May 2026

https://github.com/grachale/predict_life_expect

Predicting life expectancy (regression) using a custom random forest, linear regression, and a decision tree regressor from scikit-learn.

decision-tree-regression jupyter-notebook linear-regression pandas python random-forest regression scikit-learn

Last synced: 05 May 2026

https://github.com/intscription/python-programs

Python programs, from basics to advanced

numpy pandas scikit-learn

Last synced: 05 May 2026

https://github.com/kbo-data-portal/pipeline

Automates KBO data collection and deployment with Airflow.

airflow dbt kbo lightgbm python scikit-learn

Last synced: 07 Oct 2025

https://github.com/tomwassing/brane-project

Brane example project using the Scikit-learn and Matplotlib packages

brane branescript matplotlib scikit-learn

Last synced: 17 Oct 2025

https://github.com/somjit101/nlp-casestudy-quora-question-similarity

An application of NLP and classical ML algorithms to an interesting real-world use case of predicting similarity between two questions on Quora. This allows the platform to combine similar questions into one and combine their answers to avoid duplication and unnecessary confusion.

cross-validation feature-engineering feature-extraction gradient-boosting kaggle logistic-regression machine-learning model-calibration natural-language-processing nlp quora-question-pairs scikit-learn svm text-mining xgboost

Last synced: 05 Apr 2026

https://github.com/sarthak-1408/rain-fall-prediction

An end-to-end machine learning project for rainfall prediction in Australia.

heroku heroku-deployment machine-learning numpy pandas rain-fall rain-fall-prediction scikit-learn xgboost-algorithm

Last synced: 05 May 2026

https://github.com/ayushsaksena30/cosmic-classifier

This notebook implements a structured machine learning pipeline to classify cosmic data using the CatBoost Classifier, known for its efficiency with categorical features and minimal preprocessing requirements.

catboost-classifier label-encoder machine-learning matplotlib numpy pandas robust-scaler scikit-learn seaborn simple-imputer

Last synced: 15 Apr 2026

https://github.com/venky-1710/stress-level-predection

Stress Level Prediction is a web app using machine learning to estimate user stress levels. It takes inputs like anxiety, sleep quality, and academic performance, then predicts stress using a Decision Tree Classifier. Built with Python, Flask, and scikit-learn, it's useful for students, researchers, and those interested in stress management.

css flask html machine-learning numpy pandas python python-sklearn scikit-learn

Last synced: 05 Apr 2026

https://github.com/assamirzafar/learning

My roadmaps and challenges are in this repo. I will add my Colab and Kaggle notebook links along with Python script files here.

calculus convolutional-neural-networks deep-learning deep-neural-networks keras linear-algebra machine-learning numpy opencv probability python3 pytorch scikit-learn scipy statistics

Last synced: 05 Apr 2026

https://github.com/jordandeklerk/pygridge

A scikit-learn compatible Python package for data-driven group regularized ridge regression

python regression regularized-regression scikit-learn

Last synced: 05 May 2026

https://github.com/rhazra-003/fake_news_detector

A Machine Learning model to detect fake news with more than 95% accuracy

fake-news numpy pandas scikit-learn

Last synced: 18 Apr 2026

https://github.com/siam29/hybrid-feature-engineering-and-ensemble-learning

In this ML project, I propose a methodology that outperforms an existing paper. The comparison focuses mainly on F1, accuracy, AUC, and ROC scores. The methodology achieves a 99.96% accuracy score and a 90.05% F1 score.

feature-selection keras-tensorflow machine-learning matplotlib python scikit-learn

Last synced: 18 Apr 2026

https://github.com/joaoassalim/class-by-description-classifier-with-nlp

Enhancing Item Classification through Natural Language Processing: Leveraging Text Descriptions for Precise Categorization

bert fine-tuning nlp nlp-machine-learning scikit-learn sklearn tensorflow

Last synced: 06 May 2026

https://github.com/khaymanii/diabetes_prediction_model

A machine learning model built using Python.

matplotlib numpy pandas python scikit-learn

Last synced: 19 Apr 2026

https://github.com/drcbeatz/machine-learning-tool

Machine Learning Tool - Train and test supervised ML algorithms (incl. binary classification and regression) on custom data sets and visualize your results without knowing how to code.

data-science data-visualization django machine-learning python scikit-learn

Last synced: 06 May 2026

https://github.com/himendersharma0712/life_expectancy_pred

This repository is for a hackathon project.

jupyter-notebook machine-learning python scikit-learn

Last synced: 06 May 2026

https://github.com/shubhranpara/heart-disease-predictor

I created this project as my Python term assignment. It trains an ML model to predict heart disease using the scikit-learn library in Python.

google-colab jupyter-notebook machine-learning medical prediction-model python scikit-learn

Last synced: 06 May 2026

https://github.com/sandeepbalachandran/predictor

A collection of prediction algorithms for different purposes

collection jupyter-notebook machine-learning notebook predictor regression-models scikit-learn

Last synced: 06 May 2026

https://github.com/varun-khorgade/cvinsight-ai-resume-analyzer

AI tool that analyzes resumes, extracts keywords, and matches them with job descriptions.

css django html5 nlp python scikit-learn textparse

Last synced: 06 May 2026

https://github.com/nurulashraf/ann-cancer-prediction

An Artificial Neural Network built with TensorFlow and Keras to predict breast cancer based on the Wisconsin Breast Cancer dataset.

artificial-neural-network breast-cancer-prediction deep-learning keras machine-learning python scikit-learn tensorflow

Last synced: 06 May 2026

https://github.com/kieranlitschel/kerassearchcv

Built for the implementation of Keras in TensorFlow. Behaves similarly to GridSearchCV and RandomizedSearchCV in scikit-learn, but allows progress to be saved between folds and folds to be fitted and scored in parallel.

classification grid-search keras keras-tensorflow multithreading randomized-search scikit-learn

Last synced: 20 Apr 2026

https://github.com/khaymanii/house-price-prediction-model

This model was built using Python and the XGBoost regression algorithm.

matplotlib numpy pandas python scikit-learn

Last synced: 06 May 2026

https://github.com/k-ashik/genescout-ai-genetic-disease-pathologist

GeneScout: An interpretable AI Pathologist that predicts 5 genetic diseases with 93.5% accuracy using an Ensemble Voting Classifier and SHAP for clinical explainability.

data-science explainable-ai healthcare-ai machine-learning precision-medicine python scikit-learn shap streamlit

Last synced: 20 Apr 2026

https://github.com/deaneeth/telco-churn-prediction-mlops

Production-ready ML pipeline for telco customer churn prediction using advanced ensemble methods (XGBoost, CatBoost, Random Forest). Handles class imbalance, provides business insights, and includes modular MLOps architecture. Built with scikit-learn, featuring comprehensive EDA, feature engineering, and business impact analysis.

catboost data-preprocessing ensemble-methods feature-engineering machine-learning mlops pipeline-development python random-forest scikit-learn telco-analytics xgboost

Last synced: 15 Apr 2026

https://github.com/jagadishdas21/brain-tumor-detection

This repository contains the implementation of a deep learning model to detect brain tumors from MRI images using Convolutional Neural Networks (CNN). The goal of this project is to classify MRI images as either having a brain tumor (Positive) or not having one (Negative).

computer-vision convolutional-neural-networks matplotlib scikit-learn tensorflow

Last synced: 26 Feb 2026

https://github.com/prajwalsinha/unveiling-climate-change-dynamics-through-earth-surface-temperature-analysis

Climate change analysis through global surface temperature data. Includes data preprocessing, statistical analysis, visualizations, and forecasting. Python-based project using Pandas, Matplotlib, and Scikit-learn.

data dataanalysis dynamic-mapping pyplot python scikit-learn seaborn

Last synced: 10 Feb 2026

https://github.com/elcorto/gp_playground

Explore selected topics related to Gaussian processes

gaussian-processes gpy gpytorch kernel-ridge-regression machine-learning scikit-learn tinygp

Last synced: 06 May 2026

https://github.com/mohammadvhossein/ml-gym

The ML-GYM repository showcases machine learning projects using scikit-learn, covering classification, regression, and clustering. It offers educational resources for beginners and practical examples for experienced users, complete with detailed instructions.

classification-algorithms clustering-methods cross-validation data-preprocessing data-science decision-trees feature-engineering machine-learning model-evaluation neural-networks python-programming random-forests regression-techniques scikit-learn supervised-learning unsupervised-learning

Last synced: 06 May 2026

https://github.com/magnuss0/movie-rec-system

The project extracts movie data using TheMovieDB API, processes it using TF-IDF and cosine similarity for generating recommendations, and stores the data in a DuckDB database. The system is encapsulated within a FastAPI web application and can be deployed using Docker. It provides movie recommendations in JSON format.

cosine-similarity docker duckdb movies-recommendation moviesdb-api ploomber poetry-python scikit-learn streamlit tf-idf

Last synced: 14 Apr 2026
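
The TF-IDF plus cosine similarity step named in this entry can be sketched roughly as follows with scikit-learn. The movie titles and overviews are made up for illustration (they are not TheMovieDB data), and the `recommend` helper is a hypothetical name, not the repository's API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical movie overviews (made up for illustration).
movies = {
    "Space Quest": "astronauts travel through space to explore a distant planet",
    "Star Voyage": "a crew of astronauts explore deep space and a strange planet",
    "Kitchen Wars": "two chefs compete in a cooking contest at a restaurant",
}
titles = list(movies)

# Turn each overview into a TF-IDF vector, dropping common English stop words.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(list(movies.values()))

# Pairwise cosine similarity between all overview vectors.
similarity = cosine_similarity(tfidf)


def recommend(title, top_n=1):
    """Return the top_n titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]  # most similar first
    return [titles[i] for i in ranked if i != idx][:top_n]


recommendations = recommend("Space Quest")
print(recommendations)
```

Overviews that share no vocabulary get a similarity of zero, so only items with overlapping terms rank near the top.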

https://github.com/rixiiz/using-knn-to-predict-the-obp-of-mlb-players

Using KNN to predict the On Base Percentage (OBP) of Major League Baseball (MLB) players at the end of the season

artificial-intelligence dataset f1-score jupyter-notebook knn-regression machine-learning matplotlib mse numpy pandas python scikit-learn supervised-learning

Last synced: 05 Apr 2026

https://github.com/omanshu209/ml-basics-2022

Machine learning (AI) models developed using the scikit-learn library in Python.

jupyter-notebook machine-learning python python3 scikit-learn

Last synced: 06 May 2026

https://github.com/glencrawford/matchmaker

A k-nearest neighbors machine learning project to perform similarity matching using a dataset of OkCupid dating profiles.

django machine-learning python scikit-learn scipy

Last synced: 06 May 2026

https://github.com/sorna-fast/breast-cancer-diagnosis-neural-network

ANN-based breast cancer classifier using the Wisconsin Diagnostic Dataset. Implements advanced feature engineering and achieves 98.25% test accuracy. Includes comprehensive EDA, model training, and clinical impact analysis

keras-classification-models keras-neural-networks keras-tensorflow matplotlib-pyplot pandas-dataframe scikit-learn seaborn-plots sklearn-library tensorflow

Last synced: 20 Apr 2026

https://github.com/sralter/happy_customers

Predicting whether a customer is happy based on the results from a survey.

eda ensemble-classifier hyperopt lazypredict ml scikit-learn

Last synced: 21 Apr 2026

https://github.com/mpolinowski/isometric-mapping

Non-linear dimensionality reduction through Isometric Mapping

isomap matplotlib-pyplot python scikit-learn

Last synced: 06 May 2026

https://github.com/kartikdixit2468/advanced-jarvis-ai-using-python

An AI voice assistant in Python using simple machine learning algorithms and BardAPI.

bard bardapi jarvis machine-learning python scikit-learn voice-assistant voice-recognition

Last synced: 16 Apr 2026

https://github.com/flexycode/ccmaclrl

🤖 This repository is intended for our Machine Learning CCMACLRL COM231ML by Professor Elizer Ponio Jr

artificial-intelligence linnear-regression machine-learning machine-learning-algorithms python random-forest scikit-learn supervised-learning tensorflow

Last synced: 07 May 2026

https://github.com/marksikaundi/handson-machinelearning

Complete Collection about Machine Learning

matplotlib pandas-python scikit-learn tensorflow

Last synced: 07 May 2026

https://github.com/cbjuan/paper-ijimai-ml-employability

Jupyter notebook developed to support the research presented in the paper "Proposing a machine learning approach to analyze and predict employment and its factors"

jupyter-notebook python research scikit-learn

Last synced: 07 May 2026

https://github.com/kashifmoin1410/computer-vision-traditional-vs.-deep-learning-approaches

This project compares traditional Bag-of-Words with SVM and a custom ResNet-style CNN for image classification on the CIFAR-10 dataset. It covers the full workflow: feature extraction, model building, training, evaluation, and visualization. Results demonstrate the superior accuracy and robustness of deep learning models over classic ML pipelines.

bag-of-words cifar10 cnn comparative-analysis computer-vision deep-learning feature-extraction image-classification keras knn-classification machine-learning model-evaluation neural-network python3 resnet scikit-learn sift-algorithm svm-classifier

Last synced: 06 May 2026

https://github.com/nirmalyabag20/loan-status-prediction-using-machine-learning

This project focuses on predicting the loan status (approved or not approved) based on various applicant details. The goal is to develop a machine learning model that accurately classifies whether a loan should be approved, helping financial institutions make informed lending decisions.

matplotlib numpy pandas python scikit-learn seaborn support-vector-machine

Last synced: 19 Jan 2026

https://github.com/asut00/machine-learning-program_42ai

Comprehensive Machine Learning path by 42AI: hands-on modules on regression, gradient descent, and real-world ML applications.

linear-regression machine-learning matplotlib numpy pandas python scikit-learn

Last synced: 07 May 2026

https://github.com/aymanmansur/insider-threat-detection-using-cert-dataset-logon-

Detecting anomalies in user logon behavior using the CERT Insider Threat Detection Dataset. This project extracts key features like session duration and logon frequency during non-working hours and applies Isolation Forest to identify suspicious activity.

matplotlib pandas python scikit-learn

Last synced: 07 May 2026
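
For readers unfamiliar with the technique this entry mentions, a minimal sketch of Isolation Forest applied to logon features might look like the following. The two features, the synthetic values, and the `contamination` setting are assumptions for illustration; they are not drawn from the CERT dataset or from the repository's code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical per-user features: [mean session duration in hours,
# number of logons outside working hours]. Values are synthetic.
normal = np.column_stack([rng.normal(8, 1, 300), rng.poisson(0.5, 300)])
suspicious = np.array([[0.2, 15.0], [14.0, 20.0]])  # rows 300 and 301
X = np.vstack([normal, suspicious])

# contamination is the assumed share of anomalies in the data.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)  # -1 marks an anomaly, 1 marks a normal point

flagged = sorted(np.where(labels == -1)[0].tolist())
print(flagged)
```

Isolation Forest scores points by how few random splits are needed to isolate them, so rows far from the bulk of the data are flagged first.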

https://github.com/rajikaimal/emma

🎅 Intelligent mention bot for GitHub organizations

bot emma machine-learning python scikit-learn

Last synced: 24 Apr 2026

https://github.com/noahtigner/discoverdaily

A Spotify Recommender System. Trains a Classifier on your musical tastes and recommends songs daily. Uses the Spotify API and scikit-learn for machine learning.

machine-learning recommender-system scikit-learn spotify spotify-api

Last synced: 24 Apr 2026

https://github.com/haloapping/ml-with-me

When you hear the term ML, it is usually a bit ambiguous, because it can stand for several things, such as Mobile Legend, Makan Lontong (eating rice cake), and so on. But this repo is about Machine Learning :)

ml pusing python3 scikit-learn stress tau-ah-gelap

Last synced: 14 Apr 2026

https://github.com/piyush1927/flightforecast

ML model to predict flight prices based on various features like departure time, arrival time, duration, airline, source, destination, and number of stops.

machine-learning mathplotlib numpy pandas scikit-learn seaborn

Last synced: 16 Apr 2026

https://github.com/shimazadeh/total-perspective-vortex

This subject aims to create a brain-computer interface based on electroencephalographic (EEG) data with the help of machine learning algorithms. Using a subject's EEG reading, you'll have to infer what he or she is thinking about or doing, (motion) A or B, in a t0 to tn timeframe.

ai algorithm classification datascience dimensionality-reduction eeg scikit-learn

Last synced: 25 Apr 2026

https://github.com/idaraabasiudoh/knn-customer-classification

Assigns telecommunication customers to groups to determine the service type each customer requires.

data-analysis jupyter-notebook machine-learning pyhton3 scikit-learn

Last synced: 07 May 2026

https://github.com/joseprsm/nectarine

🍑 Neural Enhanced Collaborative Tool for Automated Recommendation and INtelligent Exploration

argo-workflows recommender-systems scikit-learn tensorflow tensorflow-recommenders

Last synced: 07 May 2026

https://github.com/md-emon-hasan/6-classification-iris-ml-apps

A ML project on the classification of the Iris dataset, demonstrating data preprocessing, model training, and evaluation using Python and scikit-learn.

classification data-science iris-classification iris-dataset iris-flower-classification predictive-modeling scikit-learn

Last synced: 26 Apr 2026

https://github.com/singhrahuldps/myscikitlearn

My implementation of some Machine Learning Algorithms from scratch.

classifier-model decision-trees machine-learning scikit-learn

Last synced: 27 Apr 2026

https://github.com/chirindaopensource/measuring_corruption_from_text_data

End-to-End Python implementation of Muço’s (2025) corruption measurement framework. Combines NLP pipeline (regex extraction, Porter stemming, TF-IDF), PCA-based dimensionality reduction, and fixed-effects OLS to quantify institutional quality from Brazilian audit reports. Includes supervised learning robustness checks and LOO sensitivity analysis.

audit-analysis brazilian-data corruption-measurement dictionary-based-classification dimensionality-reduction econometrics fixed-effects government-transparency institutional-quality natural-language-processing nltk political-economy portuguese-nlp principal-component-analysis research-replication scikit-learn supervised-learning text-as-data text-classification text-mining

Last synced: 27 Apr 2026