An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with imbalanced-data

A curated list of projects in awesome lists tagged with imbalanced-data.

https://github.com/ufoym/imbalanced-dataset-sampler

A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.

data-sampling image-classification imbalanced-data pytorch

Last synced: 14 May 2025

https://github.com/analyticalmindsltd/smote_variants

A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features

imbalanced-data imbalanced-learning oversampling smote

Last synced: 21 Oct 2025

https://github.com/30lm32/ml-projects

ML-based projects such as Spam Classification, Time Series Analysis, and Text Classification using Random Forest, Deep Learning, Bayesian methods, and XGBoost in Python

ab-testing deep-learning docker gensim geolocation imbalanced-data kdtree keras lstm-neural-networks machine-learning mlflow nlp random-forest spam-classification svm tensorboard tensorflow text-classification timeseries-analysis word2vec

Last synced: 08 May 2025

https://github.com/zhiningliu1998/self-paced-ensemble

[ICDE'20] ⚖️ A general, efficient, and robust ensemble framework for imbalanced classification.

class-imbalance classification ensemble ensemble-learning ensemble-methods ensemble-model imbalance-classification imbalanced-data imbalanced-learn imbalanced-learning machine-learning pypi python3

Last synced: 05 Apr 2025

https://github.com/jrzaurin/lightgbm-with-focal-loss

An implementation of the focal loss to be used with LightGBM for binary and multi-class classification problems

focal-loss imbalanced-data lightgbm python3

Last synced: 03 Oct 2025

https://github.com/dialnd/imbalanced-algorithms

Python-based implementations of algorithms for learning on imbalanced data.

data-science imbalanced-data machine-learning notre-dame python

Last synced: 11 Apr 2025

https://github.com/solegalli/machine-learning-imbalanced-data

Code repository for the online course Machine Learning with Imbalanced Data

data-science imbalanced-classification imbalanced-data imbalanced-learning machine-learning python

Last synced: 16 May 2025

https://github.com/zhiningliu1998/mesa

[NeurIPS’20] ⚖️ Build powerful ensemble class-imbalanced learning models via a meta-knowledge-powered resampler.

class-imbalance ensemble ensemble-machine-learning ensemble-model imbalance-classification imbalanced-data imbalanced-learn imbalanced-learning mesa meta-learning-algorithms meta-sampler meta-training

Last synced: 01 Apr 2026

https://github.com/ashishpatel26/datascienv

datascienv is a package that helps you set up your environment in a single line of code with all dependencies. It also includes pyforest, which provides a single-line import of all required ML libraries.

catboost data-science data-science-env datascienv imbalanced-data lightgbm matplotlib numpy pandas pycaret scikit-learn seaborn tensorflow2 xgboost

Last synced: 24 Oct 2025

https://github.com/tgsmith61591/skoot

A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.

data-science imbalanced-data machine-learning pandas python scikit-learn skutil

Last synced: 11 Sep 2025

https://github.com/jiequancui/ResLT

ResLT: Residual Learning for Long-tailed Recognition (TPAMI 2022)

class-imbalance imbalance-classification imbalanced-data imbalanced-learning long-tail long-tailed-recognition

Last synced: 08 May 2025

https://github.com/wangz10/class_imbalance

Jupyter Notebook presentation for class imbalance in binary classification

classification imbalanced-data machine-learning tutorial

Last synced: 11 May 2025

https://github.com/borealisai/ranksim-imbalanced-regression

[ICML 2022] RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression

imbalanced-data imbalanced-learning imbalanced-regression regression

Last synced: 07 Apr 2025

https://github.com/zhiningliu1998/bat

[ICML'24] BAT: 🚀 Boost class-imbalanced node classification from a topological perspective with <10 lines of code

data-augmentation graph-algorithms graph-machine-learning graph-mining imbalanced-data imbalanced-learning machine-learning node-classification

Last synced: 05 May 2025

https://github.com/lirongwu/graphmixup

Code for ECML-PKDD 2022 paper "GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction"

graph-algorithms graph-self-supervised-learning imbalanced-classification imbalanced-data reinforcement-learning

Last synced: 27 Jul 2025

https://github.com/kyosek/probability-calibration-imbalanced

This repository implements the probability calibration method for imbalanced data from Dal Pozzolo et al. (2015).

bayesian-methods classification creditcard-fraud imbalanced-data machine-learning probability-calibration

Last synced: 15 May 2025

https://github.com/predict-idlab/headachedss

Repository containing all code and data required to reproduce the experiments of 'A decision support system to follow up and diagnose chronic primary headache patients using semantically enriched data'

decision-support-system imbalanced-data knowledge-based-systems unsupervised-features

Last synced: 07 Jul 2025

https://github.com/sergio11/online_payment_fraud

Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. 🚨🤖 Complete process from EDA and data preprocessing to model training and evaluation. 📊🔍

classification data-preprocessing data-science deep-neural-networks dnn exploratory-data-analysis financial-fraud fraud-detection fraud-detection-model imbalanced-data keras machine-learning neural-network python smote tensorflow

Last synced: 17 Aug 2025

https://github.com/predict-idlab/tpehgdb-experiments

Experiments conducted on the TPEHGDB dataset to reproduce the reported results from "A critical look at studies applying over-sampling on the TPEHGDB dataset"

data-leakage imbalanced-data oversampling tpehgdb-dataset

Last synced: 07 Jul 2025

https://github.com/majobasgall/smote-mr

SMOTE-MR: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data that applies a MapReduce-based approach. SMOTE-MR is categorized as an `approximated/non-exact` solution. There is also an `exact` solution called SMOTE-BD written by the same author (see: https://github.com/majobasgall/smote-bd)

big-data imbalanced-data machile-learning scala smote spark

Last synced: 17 May 2026

https://github.com/oyebamiji-micheal/employee-status-prediction-web-app

A machine learning web app which predicts whether an employee gets promoted or not

imbalanced-data machine-learning random-forest streamlit xgboost

Last synced: 20 May 2026

https://github.com/splch/qbs

An effective and flexible Quantile-Based Balanced Sampling algorithm for addressing class imbalance in datasets while preserving the underlying data distribution, improving model performance across various machine learning applications.

classification data-analysis imbalanced-classification imbalanced-data machine-learning resampling

Last synced: 01 Apr 2025

https://github.com/amirreza81/applied-data-science-course

Comprehensive notes, practical exercises, and problem-solving solutions from the Applied Data Science course, covering data preprocessing, machine learning algorithms, statistical analysis, data visualization, and real-world applications.

accuracy-measure boosting classification data-cleaning data-preprocessing data-science data-visualisation deep-learning dimensionality-reduction eda feature-engineering image-classification imbalanced-data kaggle-dataset machine-learning multiclass-classification pandas regression scikit-learn stroke-prediction

Last synced: 22 Mar 2025

https://github.com/cyprianfusi/frauddetectionmodel-with-gretl

With this model, the backlog would be reduced significantly, the number of staff needed to do the job would be reduced drastically, the processing time would be shortened significantly, and more cases of fraudulent transactions would be tracked down in a given amount of data processed: more than a 40% increase in efficiency!

adaboostclassifier backlog cap-curve imbalanced-data logistic-regression

Last synced: 21 Mar 2025

https://github.com/debugger404/imbalanced-classification

Imbalanced Data Classification Repository - 📦🤖 Code for classifying products into categories using deep learning. Divided into dataset creation, model development, and transfer learning sections. Implements TensorFlow for efficient training, tackles imbalanced classes, and includes saved models and one-hot encoded labels.

classification deep-learning imbalanced-data one-hot-encoding python tensoflow

Last synced: 08 Apr 2025

https://github.com/ekellbuch/longtail_ensembles

Evaluating ensemble performance in long-tailed datasets (NeurIPS 2023 Heavy Tails Workshop)

class-imbalance ensemble-learning fairness-ml imbalanced-classes imbalanced-classification imbalanced-data imbalanced-learning

Last synced: 15 May 2026

https://github.com/amr-yasser226/intrusion-detection-kaggle

End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew-correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 91.46% public / 91.63% private.

catboost cyber-security data-preprocessing ensemble-learning feature-engineering imbalanced-data jupyter-notebooks kaggle lightgbm machine-learning outlier-detection random-forest xgboost

Last synced: 18 May 2026

https://github.com/musadiqpasha/imbalance-learning

Tackles imbalanced classification on a fake job detection dataset using various resampling techniques and model evaluations.

classification imbalanced-data ipython-notebook resampling smote-sampling

Last synced: 09 May 2025

https://github.com/aman5319/bank-marketing-analysis

The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

bank-marketing-analysis desiciontree imbalanced-data logistic-regression roc-auc sklearn smote

Last synced: 06 May 2026

https://github.com/quantumcoderrr/credit-card-fraud-detection

💴 A machine learning project that detects fraudulent credit card transactions using classification algorithms. Includes data preprocessing, EDA, model training & evaluation with techniques like Random Forest, Logistic Regression, and SMOTE for class imbalance. Built for secure financial insights and real-world fraud detection use cases.

classification credit-card data-science financial-security fraud-detection imbalanced-data logistic-regression machine-learning

Last synced: 30 Aug 2025

https://github.com/mohitgupta0123/fraud_detection_mlops

End-to-end Fraud Detection MLOps pipeline integrating MLflow, FastAPI, Streamlit, Docker, Kubernetes, Prometheus, and Grafana for real-time fraud prediction, experiment tracking, and monitoring.

anomaly-detection docker end-to-end-pipeline fastapi fraud-detection grafana imbalanced-data kubernetes machine-learning mlflow mlops-project prometheus python streamlit

Last synced: 11 Apr 2026

https://github.com/lazerlambda/team09applieddl

Model Distillation for Unlabeled and Imbalanced Data for Amino-Acid-Strings

data-science deep-learning distillation imbalanced-data lmu munich python statistics transformer unlabeled-data

Last synced: 25 Mar 2025

https://github.com/pydevcasts/churn_modeling_article

Customer churn prediction system for banking institutions using advanced feature engineering and ensemble learning techniques. The model addresses highly imbalanced datasets (10:1 ratio) by combining SMOTE oversampling with a Soft Voting Classifier (Random Forest, Gradient Boosting, and XGBoost).

customer-churn-prediction ensemble-learning imbalanced-data smote

Last synced: 30 May 2026

https://github.com/omidghadami95/efficientnetv2_catvsdog

Binary classification, SHAP (Explainable Artificial Intelligence), and Grid Search (for tuning hyperparameters) using EfficientNetV2-B0 on Cat VS Dog dataset.

binary binary-classification catvsdog catvsdog-classifier deep-learning efficientnet efficientnetv2 efficientnetv2-b0 explainable-ai explainable-ml fairness-ai fairness-ml gridsearch imbalanced-data imbalanced-dataset keras shap tensorflow2

Last synced: 17 Apr 2026

https://github.com/kyosek/analyzing-online-prices-by-using-machine-learning-techniques-analysis

Analyzing Online Prices by Using Machine Learning Techniques (master thesis) - Analysis part source code

classification imbalanced-data logistic-regression price-changes

Last synced: 29 Jul 2025

https://github.com/rajireddy15/employee-attrition-prediction-hr-analytics-

Employee Attrition Prediction (HR Analytics) helps organizations analyze employee data, identify factors driving turnover, and predict attrition using machine learning and visual dashboards, enabling data-driven HR decisions and retention strategies.

data-cleaning data-collection data-manipulation data-preprocessing data-science data-visualization eda feature-engineering imbalanced-data machine-learning mysql-database numpy pandas

Last synced: 04 May 2026

https://github.com/khyatimahendru/balancing-data-smote

In this project, I worked on the problem of Credit Card Fraud Detection. The data is highly imbalanced, with the positive class (fraud) accounting for merely 0.172% of the data. In classification problems, balancing your data is extremely important. Here I describe why accuracy should not be the only criterion for judging model performance.

classification classification-algorithims data-science imbalanced-data machine-learning smote

Last synced: 07 Apr 2025

https://github.com/mohamedlotfy989/credit-card-fraud-detection

This repository focuses on credit card fraud detection using machine learning models, addressing class imbalance with SMOTE and undersampling, and optimizing performance via Grid Search and RandomizedSearchCV. It explores Logistic Regression, Random Forest, Voting Classifier, and XGBoost, balancing precision-recall trade-offs for fraud detection.

classic-machine-learning credit-card-fraud ensemble-learning fraud-detection grid-search-hyperparameters hyperparameter-tuning imbalanced-data logistic-regression precision-recall random-forest randomizedsearchcv smote threshold-tuning undersampling voting-classifier xgboost

Last synced: 19 Oct 2025

https://github.com/mxagar/advanced_ml_techniques

This repository collects material, guides and links related to intermediate-advanced techniques used in professional machine learning.

feature-engineering feature-selection hyperparameter-optimization imbalanced-data machine-learning

Last synced: 03 Feb 2026

https://github.com/dennismgoetz/dac

"Data Analytics Challenge" course at the Catholic University of Eichstätt-Ingolstadt

asn-smote credit-fraud imbalanced-data oversampling-technique smote

Last synced: 16 May 2025

https://github.com/rakibhhridoy/handlingimbalanceddataset-business

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items purchased illegally by others. Some very large transactions may also be made by suspicious parties, and these need to be caught as well.

auc business-intelligence fraud-detection imbalanced-data imbalanced-learning machine-learning oversampling precision recall smote transcations

Last synced: 14 May 2025

https://github.com/christopher-w-murphy/class-imbalance-in-ww-polarization

Treating the measurement of the same-sign W polarization fraction as a class imbalance problem

high-energy-physics imbalanced-data keras maximum-likelihood-estimation

Last synced: 28 May 2026

https://github.com/kyosek/analyzing-online-prices-by-using-machine-learning-techniques-ml

Analyzing Online Prices by Using Machine Learning Techniques (master thesis) - Analysis part source code

imbalanced-data machine-learning price-changes

Last synced: 21 Jul 2025

https://github.com/andreazoccatelli/tabular_data_augmentation_continuous

This repository contains the scripts used for my master's degree thesis project: "Augmentation of tabular data with continuous features for binary imbalanced classification problems"

cgan copula data-augmentation imbalanced-classification imbalanced-data imbalanced-learning

Last synced: 21 Apr 2026

https://github.com/konnik88/heart-disease-ml-practice

Practice notebook on heart-disease risk with a small/noisy dataset: EDA → preprocessing → classic ML baselines (scikit-learn). Not for clinical use

classification eda healthcare heart-disease imbalanced-data jupyter-notebook machine-learning model-evaluation optuna reproducibility scikit-learn

Last synced: 18 May 2026

https://github.com/aryanpillai2007/credit-card-fraud-detection

The primary goal of this project is to develop a comprehensive fraud detection system that enhances the security and trustworthiness of financial transactions.

anomaly-detection classification credit-card-fraud data-preprocessing data-science data-visualization fraud-detection imbalanced-data logistic-regression machine-learning outlier-detection pca pca-analysis python roc-curve scikit-learn

Last synced: 18 May 2026

https://github.com/jamnicki/yoga-pose-ensemble-clf

University course project: Machine learning -- Ensemble learning strategies comparison on classification quality for imbalanced data

ensemble-learning imbalanced-data machine-learning multi-class-classification

Last synced: 07 Apr 2025

https://github.com/zuzann18/credit-risk-classification

End-to-end machine learning project for predicting loan defaults on the HMEQ home equity loan dataset. Includes data preprocessing, EDA, feature engineering, model training (Logistic Regression, Random Forest, XGBoost), hyperparameter tuning, model comparison, SHAP-based interpretability, and business recommendations

imbalanced-data logistic-regression mice random-forest-classifier shap smote xgboost

Last synced: 04 Feb 2026

https://github.com/macdon112/credit-card-fraud-detection

Comparing ML models (Random Forest, KNN, Decision Tree) for credit card fraud detection using SMOTE and stratified cross-validation.

classification data-analysis fraud-detection imbalanced-data machine-learning python scikit-learn

Last synced: 10 May 2026

https://github.com/obirikan/ml_model_fraud_detection

This project demonstrates how to use Logistic Regression to detect fraudulent transactions, using SMOTE to handle imbalanced data

imbalanced-data logistic-regression smote-oversampler

Last synced: 18 Jul 2025

https://github.com/ricardorobledo/malicious_server_hack_detection

Predictive model to detect malicious hacking patterns in banking servers. Utilizes advanced Machine Learning techniques such as SMOTE, Gradient Boosting, and probability calibration to predict attacks befor

anaconda cibersecurity imbalanced-data imbalanced-learning imblearn kaggle matplotlib numpy pandas pandas-library python3 sklearn

Last synced: 14 Apr 2026

https://github.com/halacoded/bodyperformance_imbalanced

Exploring imbalanced classification and techniques to handle imbalance.

coded imbalanced-classification imbalanced-data machine-learning

Last synced: 21 Jun 2025

https://github.com/mindful-ai-assistants/2-social-buzz-ai-gboost-and-lowdefault-modeling

2-Gradient Boosting Machines and Low-Default Modeling: A repository for research, implementation, and best practices with Gradient Boosting methods (GBM, XGBoost, LightGBM), H2O AutoML, and robust strategies for modeling extreme class imbalance ("Low Default") in data science for finance and risk.

anomaly-detection auto-machine-learning credit-risk disease-prediction financia-lmodeling fraud-detection gbm gradientboosting h2o imbalanced-data lightgbm low-default-modeling machinelearning model-interpretability natural-language-processing oneness-consciousness randomforest risk-analytics smote xgboost

Last synced: 09 Oct 2025

https://github.com/miozilla/conmatrix

conmatrix 🔢👽🏁 : Confusion Matrix # Data Imbalance # Evaluation # Weights & Biases

accuracy azureml bias binary-classification classification-model confusion-matrix dataset evaluate false-positive-rate fn fp imbalanced-data precision recall roc sensitivity specificity tn tp weight

Last synced: 10 Oct 2025

https://github.com/ahsankhizar5/fraud-detection-ml-pipeline

A complete machine learning pipeline for fraud detection using balanced classification, feature engineering, and ROC-based evaluation.

anomaly-detection classification data-science fraud-detection imbalanced-data ml-pipeline python random-forest

Last synced: 31 May 2026

https://github.com/msikorski93/retinal-vessel-segmentation-using-deep-learning

Retinal vessel segmentation is the task of segmenting vessels in retina imagery. This binary task was performed with a U-Net network.

computer-vision efficientnet healthcare imbalanced-data keras retinal-images retinal-vessel-segmentation segmentation-models tensorflow u-net

Last synced: 14 May 2026

https://github.com/leabrodyheine/ml-kaggle-cirrhosis-data

This project showcases skills in machine learning, data preprocessing, and model evaluation using Python libraries such as scikit-learn, XGBoost, and Optuna. It involves implementing various machine learning models, handling imbalanced data, and employing imputation techniques to enhance model performance for predicting cirrhosis outcomes.

data-analysis data-pre imbalanced-data imputation machine-learning optuna pipeline scikit-learn xgboost

Last synced: 14 May 2026

https://github.com/ricardorobledo/spamemailclassification

Spam email classification using machine learning (Random Forest, SVC, Logistic Regression, etc.) with data balancing techniques (SMOTE, BorderlineSMOTE, ADASYN). Final calibrated Random Forest model achieves ROC-AUC 0.982 and PR-AUC 0.979 on the Spam Email Classification dataset.

imbalanced-data imbalanced-learning numpy pandas python3 sklearn

Last synced: 05 Apr 2026

https://github.com/0zean/hellingerforest

A Python library built in Rust that implements the Hellinger distance splitting criterion in a Random Forest Classifier to address imbalanced data. Work in progress.

decision-trees imbalanced-classification imbalanced-data random-forest-classifier

Last synced: 05 Jun 2026

https://github.com/mihirmakwana03/ci7521-cw1-notebook

Multi-class classification on imbalanced data — 8 sklearn classifiers + SMOTE + ROC-AUC benchmarking. Kingston CI7521 CW1.

classification hyperparameter-tuning imbalanced-data machine-learning scikit-learn smote

Last synced: 27 Apr 2026

https://github.com/louis-alexandre-laguet/rain-prediction-dl-ml

This project aims to predict rainfall using machine learning and deep learning models. It includes data analysis, preprocessing, and the application of algorithms like Logistic Regression, SVM, Random Forest, and deep learning models like LSTM for the Kaggle Rain Prediction Challenge.

classification data-preprocessing deep-learning hyperparameter-optimization imbalanced-data kaggle-challenge logistic-regression lstm machine-learning pytorch pytorch-lightning rain-prediction random-forest svm weather-prediction

Last synced: 28 Apr 2026

https://github.com/ammahmoudi/credit-risk-prediction

Predicting credit risk when a person requests a loan, using random forest on the South German dataset (fixing imbalanced data)

credit-risk imbalanced-data machine-learning ml random-forest

Last synced: 08 Jun 2026

https://github.com/shahzadmustafa15/credit-card-fraud-detection

Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.

classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn

Last synced: 29 Apr 2026

https://github.com/abdelrahman-amen/active_learning_using_imbalanced_dataset

This project explores active learning techniques, focusing on query strategies to optimize informative data selection for model training. It aims to reduce labeled data while improving model performance, especially with imbalanced datasets where certain classes are underrepresented.

activelearning bmi entropy imbalanced-data margin python querybycommittee smote uncertainty

Last synced: 01 May 2026

https://github.com/viniciusds2020/ml_balaceamento_allknn

This repository contains machine learning code that uses the AllKNN algorithm from the imblearn package to balance data.

allknn imbalanced-data imblearn machine-learning sklearn

Last synced: 01 May 2026