Projects in Awesome Lists tagged with imbalanced-data
A curated list of projects in awesome lists tagged with imbalanced-data.
https://github.com/ufoym/imbalanced-dataset-sampler
A (PyTorch) imbalanced dataset sampler for oversampling low-frequency classes and undersampling high-frequency ones.
data-sampling image-classification imbalanced-data pytorch
Last synced: 14 May 2025
https://github.com/YyzHarry/imbalanced-regression
[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression
computer-vision healthcare icml icml-2021 imbalance imbalanced-classification imbalanced-data imbalanced-learning imbalanced-regression long-tail natural-language-processing regression
Last synced: 09 May 2025
https://github.com/YyzHarry/imbalanced-semi-self
[NeurIPS 2020] Semi-Supervision (Unlabeled Data) & Self-Supervision Improve Class-Imbalanced / Long-Tailed Learning
class-imbalance imbalanced-classification imbalanced-data imbalanced-learning long-tail long-tailed-recognition neurips neurips-2020 self-supervised-learning semi-supervised-learning unlabeled-data
Last synced: 08 May 2025
https://github.com/analyticalmindsltd/smote_variants
A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features
imbalanced-data imbalanced-learning oversampling smote
Last synced: 21 Oct 2025
https://github.com/zhiningliu1998/imbalanced-ensemble
🛠️ Class-imbalanced Ensemble Learning Toolbox. | Class-imbalanced / long-tailed machine learning library
class-imbalance classification data-mining data-science ensemble ensemble-imbalanced-learning ensemble-learning ensemble-model imbalanced-classification imbalanced-data imbalanced-learning long-tail machine-learning multi-class-classification python python3 scikit-learn sklearn
Last synced: 15 May 2025
https://github.com/ZhiningLiu1998/imbalanced-ensemble
🛠️ Class-imbalanced Ensemble Learning Toolbox. | Class-imbalanced / long-tailed machine learning library
class-imbalance classification data-mining data-science ensemble ensemble-imbalanced-learning ensemble-learning ensemble-model imbalanced-classification imbalanced-data imbalanced-learning long-tail machine-learning multi-class-classification python python3 scikit-learn sklearn
Last synced: 11 Apr 2025
https://github.com/30lm32/ml-projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
ab-testing deep-learning docker gensim geolocation imbalanced-data kdtree keras lstm-neural-networks machine-learning mlflow nlp random-forest spam-classification svm tensorboard tensorflow text-classification timeseries-analysis word2vec
Last synced: 08 May 2025
https://github.com/zhiningliu1998/self-paced-ensemble
[ICDE'20] ⚖️ A general, efficient ensemble framework for imbalanced classification. | A general, efficient, and robust class-imbalanced learning framework
class-imbalance classification ensemble ensemble-learning ensemble-methods ensemble-model imbalance-classification imbalanced-data imbalanced-learn imbalanced-learning machine-learning pypi python3
Last synced: 05 Apr 2025
https://github.com/dvlab-research/parametric-contrastive-learning
Parametric Contrastive Learning (ICCV2021) & GPaCo (TPAMI 2023)
class-imbalance contrastive-learning iccv2021 image-classification imagenet imbalanced-data imbalanced-learning long-tailed-recognition parametric-contrastive-learning pytorch supervised-contrastive-learning supervised-learning tpami
Last synced: 03 Jul 2025
https://github.com/jrzaurin/lightgbm-with-focal-loss
An implementation of the focal loss to be used with LightGBM for binary and multi-class classification problems
focal-loss imbalanced-data lightgbm python3
Last synced: 03 Oct 2025
https://github.com/dvlab-research/Parametric-Contrastive-Learning
Parametric Contrastive Learning (ICCV2021) & GPaCo (TPAMI 2023)
class-imbalance contrastive-learning iccv2021 image-classification imagenet imbalanced-data imbalanced-learning long-tailed-recognition parametric-contrastive-learning pytorch supervised-contrastive-learning supervised-learning tpami
Last synced: 08 May 2025
https://github.com/dialnd/imbalanced-algorithms
Python-based implementations of algorithms for learning on imbalanced data.
data-science imbalanced-data machine-learning notre-dame python
Last synced: 11 Apr 2025
https://github.com/solegalli/machine-learning-imbalanced-data
Code repository for the online course Machine Learning with Imbalanced Data
data-science imbalanced-classification imbalanced-data imbalanced-learning machine-learning python
Last synced: 16 May 2025
https://github.com/YyzHarry/multi-domain-imbalance
[ECCV 2022] Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization, and Beyond
deep-learning domain-adaptation domain-generalization eccv eccv-2022 imbalance imbalanced-classification imbalanced-data imbalanced-learning long-tail long-tailed-recognition multi-domain multi-domain-learning ood ood-generalization
Last synced: 08 May 2025
https://github.com/zhiningliu1998/mesa
[NeurIPS’20] ⚖️ Build powerful ensemble class-imbalanced learning models via a meta-knowledge-powered resampler. | Designing a meta-knowledge-driven sampler to solve the class imbalance problem
class-imbalance ensemble ensemble-machine-learning ensemble-model imbalance-classification imbalanced-data imbalanced-learn imbalanced-learning mesa meta-learning-algorithms meta-sampler meta-training
Last synced: 01 Apr 2026
https://github.com/ZhiningLiu1998/mesa
[NeurIPS’20] ⚖️ Build powerful ensemble class-imbalanced learning models via a meta-knowledge-powered resampler. | Designing a meta-knowledge-driven sampler to solve the class imbalance problem
class-imbalance ensemble ensemble-machine-learning ensemble-model imbalance-classification imbalanced-data imbalanced-learn imbalanced-learning mesa meta-learning-algorithms meta-sampler meta-training
Last synced: 11 Apr 2025
https://github.com/ashishpatel26/datascienv
datascienv is a package that helps you set up your environment in a single line of code with all dependencies. It also includes pyforest, which provides a single-line import of all required ML libraries.
catboost data-science data-science-env datascienv imbalanced-data lightgbm matplotlib numpy pandas pycaret scikit-learn seaborn tensorflow2 xgboost
Last synced: 24 Oct 2025
https://github.com/tgsmith61591/skoot
A package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.
data-science imbalanced-data machine-learning pandas python scikit-learn skutil
Last synced: 11 Sep 2025
https://github.com/jiequancui/ResLT
ResLT: Residual Learning for Long-tailed Recognition (TPAMI 2022)
class-imbalance imbalance-classification imbalanced-data imbalanced-learning long-tail long-tailed-recognition
Last synced: 08 May 2025
https://github.com/wangz10/class_imbalance
Jupyter Notebook presentation for class imbalance in binary classification
classification imbalanced-data machine-learning tutorial
Last synced: 11 May 2025
https://github.com/borealisai/ranksim-imbalanced-regression
[ICML 2022] RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression
imbalanced-data imbalanced-learning imbalanced-regression regression
Last synced: 07 Apr 2025
https://github.com/priyavrat-misra/xrays-and-gradcam
Classification and Gradient-based Localization of Chest Radiographs using PyTorch.
cnn covid-19 deep-learning densenet121 early-stopping fine-tuning gradcam imbalanced-data localization oversampling pneumonia pytorch-implementation radiographs resnet18 transfer-learning vgg16 xrays
Last synced: 14 Apr 2025
https://github.com/ncordon/imbalance
binary-classification imbalanced-data oversampling r
Last synced: 13 Apr 2025
https://github.com/zhiningliu1998/bat
[ICML'24] BAT: 🚀 Boost Class-imbalanced Node Classification with <10 lines of Code | Improve class-imbalanced node classification from a topological perspective in 10 lines of code
data-augmentation graph-algorithms graph-machine-learning graph-mining imbalanced-data imbalanced-learning machine-learning node-classification
Last synced: 05 May 2025
https://github.com/lirongwu/graphmixup
Code for ECML-PKDD 2022 paper "GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction"
graph-algorithms graph-self-supervised-learning imbalanced-classification imbalanced-data reinforcement-learning
Last synced: 27 Jul 2025
https://github.com/LirongWu/GraphMixup
Code for ECML-PKDD 2022 paper "GraphMixup: Improving Class-Imbalanced Node Classification by Reinforcement Mixup and Self-supervised Context Prediction"
graph-algorithms graph-self-supervised-learning imbalanced-classification imbalanced-data reinforcement-learning
Last synced: 15 Aug 2025
https://github.com/mdh266/textclassificationapp
Building and Deploying A Serverless Text Classification Web App
data-science docker document-classification fastapi imbalanced-data imbalanced-learning machine-learning naive-bayes natural-language-processing nlp nltk scikit-learn support-vector-machine text-classification
Last synced: 30 Jul 2025
https://github.com/kyosek/probability-calibration-imbalanced
This repository implements the probability calibration for imbalanced data of Pozzolo et al. (2015).
bayesian-methods classification creditcard-fraud imbalanced-data machine-learning probability-calibration
Last synced: 15 May 2025
https://github.com/pegah-ardehkhani/customer-churn-prediction-and-analysis
Analysis and Prediction of the Customer Churn Using Machine Learning Models (Highest Accuracy) and Plotly Library
accuracy churn-prediction classification classification-algorithm cross-validation customer-churn customer-churn-analysis customer-churn-prediction data-science feature-engineering feature-importance gridsearchcv imbalanced-data machine-learning machine-learning-algorithms plotly python roc-auc sklearn telco
Last synced: 29 Jun 2025
https://github.com/predict-idlab/headachedss
Repository containing all code and data required to reproduce the experiments of 'A decision support system to follow up and diagnose chronic primary headache patients using semantically enriched data'
decision-support-system imbalanced-data knowledge-based-systems unsupervised-features
Last synced: 07 Jul 2025
https://github.com/amajji/multi-class-classification
Deployment of a classification model on a web app using Flask for the backend and HTML/CSS/JS for the frontend
analyse-data app classification data flask flask-application imbala imbalanced-classes imbalanced-classification imbalanced-data machine-learning machine-learning-algorithms preprocessing webapp webapplication
Last synced: 24 Sep 2025
https://github.com/sergio11/online_payment_fraud
Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. 🚨🤖 Complete process from EDA and data preprocessing to model training and evaluation. 📊🔍
classification data-preprocessing data-science deep-neural-networks dnn exploratory-data-analysis financial-fraud fraud-detection fraud-detection-model imbalanced-data keras machine-learning neural-network python smote tensorflow
Last synced: 17 Aug 2025
https://github.com/predict-idlab/tpehgdb-experiments
Experiments conducted on the TPEHGDB dataset to reproduce the reported results from "A critical look at studies applying over-sampling on the TPEHGDB dataset"
data-leakage imbalanced-data oversampling tpehgdb-dataset
Last synced: 07 Jul 2025
https://github.com/mariomarroquim/eplogic
Scripts for Anti-HLA antibody target prediction via machine learning
cross-validation hla-eplets hla-matching imbalanced-data immunoinformatics immunology machine-learning scikit-learn single-antigen-beads
Last synced: 11 Nov 2025
https://github.com/majobasgall/smote-mr
SMOTE-MR: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data which applies a MapReduce-based approach. SMOTE-MR is categorized as an `approximated/non-exact` solution. There is also an `exact` solution called SMOTE-BD written by the author (see: https://github.com/majobasgall/smote-bd)
big-data imbalanced-data machile-learning scala smote spark
Last synced: 17 May 2026
https://github.com/oyebamiji-micheal/employee-status-prediction-web-app
A machine learning web app which predicts whether an employee gets promoted or not
imbalanced-data machine-learning random-forest streamlit xgboost
Last synced: 20 May 2026
https://github.com/splch/qbs
An effective and flexible Quantile-Based Balanced Sampling algorithm for addressing class imbalance in datasets while preserving the underlying data distribution, improving model performance across various machine learning applications.
classification data-analysis imbalanced-classification imbalanced-data machine-learning resampling
Last synced: 01 Apr 2025
https://github.com/amirreza81/applied-data-science-course
Comprehensive notes, practical exercises, and problem-solving solutions from the Applied Data Science course, covering data preprocessing, machine learning algorithms, statistical analysis, data visualization, and real-world applications.
accuracy-measure boosting classification data-cleaning data-preprocessing data-science data-visualisation deep-learning dimensionality-reduction eda feature-engineering image-classification imbalanced-data kaggle-dataset machine-learning multiclass-classification pandas regression scikit-learn stroke-prediction
Last synced: 22 Mar 2025
https://github.com/solegalli/imbalanced-data-myths-mistakes-solutions
Code repository for the book "Imbalanced Data: Myths, Mistakes and Modern Solutions".
cost-sensitive-learning data-preparation data-preprocessing data-science imbalanced-classification imbalanced-data imbalanced-learning imblearn machine-learning machine-learning-algorithms python scikit-learn
Last synced: 22 Jun 2026
https://github.com/sushant1827/machine-learning-for-predictive-maintenance
Demonstrate the application of machine learning on a real-world predictive maintenance dataset, using measurements from actual industrial equipment.
binary-classification classification-report confusion-matrix data-imbalance data-visualization decision-tree-classifier exploratory-data-analysis feature-engineering feature-importance feature-selection gradient-boosting-classifier imbalanced-data multi-class-classification roc-auc-curve
Last synced: 12 Apr 2025
https://github.com/sushantdhumak/credit-card-fraud-detection
Demonstrates the use of ML for Anomaly Detection for Credit Card Transactions: Identifying Fraudulent Activity using Imbalanced Data
anamoly-detection correlation-analysis data-scaling data-visualization decision-tree-classifier exploratory-data-analysis gridsearchcv imbalanced-data knn-classifier logistic-regression near-miss outlier-detection outlier-removal precision-recall-curve random-forest-classifier roc-auc roc-auc-curve svc under-sampling
Last synced: 26 Mar 2025
https://github.com/cyprianfusi/frauddetectionmodel-with-gretl
With this model: the amount of backlog would be reduced significantly, the amount of staff needed to do the job would be reduced drastically, the processing time would be shortened significantly and more cases of fraudulent transactions would be tracked down in a given amount of data processed - more than 40% increase in efficiency!
adaboostclassifier backlog cap-curve imbalanced-data logistic-regression
Last synced: 21 Mar 2025
https://github.com/sushantdhumak/machine-learning-for-predictive-maintenance
Demonstrate the application of machine learning on a real-world predictive maintenance dataset, using measurements from actual industrial equipment.
binary-classification classification-report confusion-matrix data-imbalance data-visualization decision-tree-classifier exploratory-data-analysis feature-engineering feature-importance feature-selection gradient-boosting-classifier imbalanced-data multi-class-classification roc-auc-curve
Last synced: 26 Mar 2025
https://github.com/sushant1827/credit-card-fraud-detection
Demonstrates the use of ML for Anomaly Detection for Credit Card Transactions: Identifying Fraudulent Activity using Imbalanced Data
anamoly-detection correlation-analysis data-scaling data-visualization decision-tree-classifier exploratory-data-analysis gridsearchcv imbalanced-data knn-classifier logistic-regression near-miss outlier-detection outlier-removal precision-recall-curve random-forest-classifier roc-auc roc-auc-curve svc under-sampling
Last synced: 15 Feb 2026
https://github.com/debugger404/imbalanced-classification
Imbalanced Data Classification Repository - 📦🤖 Code for classifying products into categories using deep learning. Divided into dataset creation, model development, and transfer learning sections. Implements TensorFlow for efficient training, tackles imbalanced classes, and includes saved models and one-hot encoded labels.
classification deep-learning imbalanced-data one-hot-encoding python tensoflow
Last synced: 08 Apr 2025
https://github.com/ekellbuch/longtail_ensembles
Evaluating ensemble performance in long-tailed datasets (NeurIPS 2023 Heavy Tails Workshop)
class-imbalance ensemble-learning fairness-ml imbalanced-classes imbalanced-classification imbalanced-data imbalanced-learning
Last synced: 15 May 2026
https://github.com/amr-yasser226/intrusion-detection-kaggle
End-to-end pipeline for multi-class cyber-attack detection using per-flow network features: data profiling, deduplication, skew-correction, outlier treatment, feature engineering, imbalance handling, and tree-based modeling (XGBoost, LightGBM, CatBoost, stacking), with a final Kaggle submission scoring 91.46% public / 91.63% private.
catboost cyber-security data-preprocessing ensemble-learning feature-engineering imbalanced-data jupyter-notebooks kaggle lightgbm machine-learning outlier-detection random-forest xgboost
Last synced: 18 May 2026
https://github.com/musadiqpasha/imbalance-learning
Tackles imbalanced classification on a fake job detection dataset using various resampling techniques and model evaluations.
classification imbalanced-data ipython-notebook resampling smote-sampling
Last synced: 09 May 2025
https://github.com/aman5319/bank-marketing-analysis
The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
bank-marketing-analysis desiciontree imbalanced-data logistic-regression roc-auc sklearn smote
Last synced: 06 May 2026
https://github.com/vbabashov/loan-classification
Loan Eligibility - Classification (Python)
classification imbalanced-data lgbmclassifier logistic-regression random-forest-classifier xgboost-classifier
Last synced: 18 Sep 2025
https://github.com/quantumcoderrr/credit-card-fraud-detection
💴 A machine learning project that detects fraudulent credit card transactions using classification algorithms. Includes data preprocessing, EDA, model training & evaluation with techniques like Random Forest, Logistic Regression, and SMOTE for class imbalance. Built for secure financial insights and real-world fraud detection use cases.
classification credit-card data-science financial-security fraud-detection imbalanced-data logistic-regression machine-learning
Last synced: 30 Aug 2025
https://github.com/mohitgupta0123/fraud_detection_mlops
End-to-end Fraud Detection MLOps pipeline integrating MLflow, FastAPI, Streamlit, Docker, Kubernetes, Prometheus, and Grafana for real-time fraud prediction, experiment tracking, and monitoring.
anomaly-detection docker end-to-end-pipeline fastapi fraud-detection grafana imbalanced-data kubernetes machine-learning mlflow mlops-project prometheus python streamlit
Last synced: 11 Apr 2026
https://github.com/lazerlambda/team09applieddl
Model Distillation for Unlabeled and Imbalanced Data for Amino-Acid-Strings
data-science deep-learning distillation imbalanced-data lmu munich python statistics transformer unlabeled-data
Last synced: 25 Mar 2025
https://github.com/pydevcasts/churn_modeling_article
Customer churn prediction system for banking institutions using advanced feature engineering and ensemble learning techniques. The model addresses highly imbalanced datasets (10:1 ratio) by combining SMOTE oversampling with a Soft Voting Classifier (Random Forest, Gradient Boosting, and XGBoost).
customer-churn-prediction ensemble-learning imbalanced-data smote
Last synced: 30 May 2026
https://github.com/omidghadami95/efficientnetv2_catvsdog
Binary classification, SHAP (Explainable Artificial Intelligence), and Grid Search (for tuning hyperparameters) using EfficientNetV2-B0 on Cat VS Dog dataset.
binary binary-classification catvsdog catvsdog-classifier deep-learning efficientnet efficientnetv2 efficientnetv2-b0 explainable-ai explainable-ml fairness-ai fairness-ml gridsearch imbalanced-data imbalanced-dataset keras shap tensorflow2
Last synced: 17 Apr 2026
https://github.com/kyosek/analyzing-online-prices-by-using-machine-learning-techniques-analysis
Analyzing Online Prices by Using Machine Learning Techniques (master thesis) - Analysis part source code
classification imbalanced-data logistic-regression price-changes
Last synced: 29 Jul 2025
https://github.com/rajireddy15/employee-attrition-prediction-hr-analytics-
Employee Attrition Prediction (HR Analytics) helps organizations analyze employee data, identify factors driving turnover, and predict attrition using machine learning and visual dashboards, enabling data-driven HR decisions and retention strategies.
data-cleaning data-collection data-manipulation data-preprocessing data-science data-visualization eda feature-engineering imbalanced-data machine-learning mysql-database numpy pandas
Last synced: 04 May 2026
https://github.com/das-amlan/customer-churn-prediction
Predicting customer churn using machine learning algorithms
customer-churn-prediction imbalanced-data keras-tensorflow machine-learning pandas prediction-model python scikit-learn seaborn tensorflow
Last synced: 11 Apr 2026
https://github.com/khyatimahendru/balancing-data-smote
In this project, I have worked on the problem of Credit Card Fraud Detection. The data is highly imbalanced, with the positive class (fraud) accounting for merely 0.172% of the data. In classification problems, balancing your data is extremely important. Here I describe why accuracy should not be the only criterion for judging model performance.
classification classification-algorithims data-science imbalanced-data machine-learning smote
Last synced: 07 Apr 2025
https://github.com/mohamedlotfy989/credit-card-fraud-detection
This repository focuses on credit card fraud detection using machine learning models, addressing class imbalance with SMOTE & undersampling, and optimizing performance via Grid Search & RandomizedSearchCV. It explores Logistic Regression, Random Forest, Voting Classifier, and XGBoost, balancing precision-recall trade-offs for fraud detection.
classic-machine-learning credit-card-fraud ensemble-learning fraud-detection grid-search-hyperparameters hyperparameter-tuning imbalanced-data logistic-regression precision-recall random-forest randomizedsearchcv smote threshold-tuning undersampling voting-classifier xgboost
Last synced: 19 Oct 2025
https://github.com/mxagar/advanced_ml_techniques
This repository collects material, guides and links related to intermediate-advanced techniques used in professional machine learning.
feature-engineering feature-selection hyperparameter-optimization imbalanced-data machine-learning
Last synced: 03 Feb 2026
https://github.com/dennismgoetz/dac
"Data Analytics Challenge" course at the Catholic University of Eichstätt-Ingolstadt
asn-smote credit-fraud imbalanced-data oversampling-technique smote
Last synced: 16 May 2025
https://github.com/rakibhhridoy/handlingimbalanceddataset-business
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that were purchased illegally by others rather than by themselves. Some very large transactions may also be made by suspicious parties, and these need to be caught.
auc business-intelligence fraud-detection imbalanced-data imbalanced-learning machine-learning oversampling precision recall smote transcations
Last synced: 14 May 2025
https://github.com/muhammedhossam/pneumonia-detection-
A deep learning-based project for classifying chest X-ray images to detect pneumonia
binary-classification cnn computer-vision data-preprocessing deep-learning image-classification imbalanced-data keras machine-learning medical-imaging pneumonia-detection tensorflow xray-images
Last synced: 10 May 2026
https://github.com/christopher-w-murphy/class-imbalance-in-ww-polarization
Treating the measurement of the same-sign W polarization fraction as a class imbalance problem
high-energy-physics imbalanced-data keras maximum-likelihood-estimation
Last synced: 28 May 2026
https://github.com/kyosek/analyzing-online-prices-by-using-machine-learning-techniques-ml
Analyzing Online Prices by Using Machine Learning Techniques (master thesis) - Analysis part source code
imbalanced-data machine-learning price-changes
Last synced: 21 Jul 2025
https://github.com/andreazoccatelli/tabular_data_augmentation_continuous
This repository contains the scripts used to write my master's degree thesis project: "Augmentation of tabular data with continuous features for binary imbalanced classification problems"
cgan copula data-augmentation imbalanced-classification imbalanced-data imbalanced-learning
Last synced: 21 Apr 2026
https://github.com/konnik88/heart-disease-ml-practice
Practice notebook on heart-disease risk with a small/noisy dataset: EDA → preprocessing → classic ML baselines (scikit-learn). Not for clinical use
classification eda healthcare heart-disease imbalanced-data jupyter-notebook machine-learning model-evaluation optuna reproducibility scikit-learn
Last synced: 18 May 2026
https://github.com/aryanpillai2007/credit-card-fraud-detection
The primary goal of this project is to develop a comprehensive fraud detection system that enhances the security and trustworthiness of financial transactions.
anomaly-detection classification credit-card-fraud data-preprocessing data-science data-visualization fraud-detection imbalanced-data logistic-regression machine-learning outlier-detection pca pca-analysis python roc-curve scikit-learn
Last synced: 18 May 2026
https://github.com/filipusarif/imbalance-data-svm-python
Working with an imbalanced dataset for classification using an SVM model
classification-algorithm imbalanced-data jupyter-notebook machine-learning-algorithms machine-learning-models
Last synced: 04 Sep 2025
https://github.com/jamnicki/yoga-pose-ensemble-clf
University course project: Machine learning -- Ensemble learning strategies comparison on classification quality for imbalanced data
ensemble-learning imbalanced-data machine-learning multi-class-classification
Last synced: 07 Apr 2025
https://github.com/zuzann18/credit-risk-classification
End-to-end machine learning project for predicting loan defaults on the HMEQ home equity loan dataset. Includes data preprocessing, EDA, feature engineering, model training (Logistic Regression, Random Forest, XGBoost), hyperparameter tuning, model comparison, SHAP-based interpretability, and business recommendations
imbalanced-data logistic-regression mice random-forest-classifier shap smote xgboost
Last synced: 04 Feb 2026
https://github.com/macdon112/credit-card-fraud-detection
Comparing ML models (Random Forest, KNN, Decision Tree) for credit card fraud detection using SMOTE and stratified cross-validation.
classification data-analysis fraud-detection imbalanced-data machine-learning python scikit-learn
Last synced: 10 May 2026
https://github.com/pdoup/atml-notebooks
Proposed assignment notebooks for Advanced Topics in Machine Learning tasks
active-learning cost-sensitive-learning imbalanced-data machine-learning multi-instance-learning multi-label-classification numpy scikit-learn
Last synced: 11 May 2026
https://github.com/obirikan/ml_model_fraud_detection
This project demonstrates how to use Logistic Regression to detect fraudulent transactions, using SMOTE on imbalanced data
imbalanced-data logistic-regression smote-oversampler
Last synced: 18 Jul 2025
https://github.com/rubyyy1118/nlp_spam_detection_and_text_generation
The Natural Language Analysis - Assignment in my MSc Business Analytics course
data-preprocessing deep-learning imbalanced-data long-short-term-memory-models machine-learning natural-language-processing python recurrent-neural-networks spam-detection support-vector-machines text-classification text-generation tf-idf word2vec
Last synced: 17 May 2026
https://github.com/ricardorobledo/malicious_server_hack_detection
Predictive model to detect malicious hacking patterns in banking servers. Utilizes advanced Machine Learning techniques such as SMOTE, Gradient Boosting, and probability calibration to predict attacks befor
anaconda cibersecurity imbalanced-data imbalanced-learning imblearn kaggle matplotlib numpy pandas pandas-library python3 sklearn
Last synced: 14 Apr 2026
https://github.com/steviecurran/fraud-detection-ml
Fraud Detection with Machine Learning
ab-testing confusionmatrix fraud-detection imbalanced-data machine-learning matplotlib pandas python recall-precision skit-learn
Last synced: 31 May 2026
https://github.com/halacoded/bodyperformance_imbalanced
Exploring imbalanced classification and techniques to handle imbalance.
coded imbalanced-classification imbalanced-data machine-learning
Last synced: 21 Jun 2025
https://github.com/mindful-ai-assistants/2-social-buzz-ai-gboost-and-lowdefault-modeling
2-Gradient Boosting Machines and Low-Default Modeling: A repository for research, implementation, and best practices with Gradient Boosting methods (GBM, XGBoost, LightGBM), H2O AutoML, and robust strategies for modeling extreme class imbalance ("Low Default") in data science for finance and risk.
anomaly-detection auto-machine-learning credit-risk disease-prediction financia-lmodeling fraud-detection gbm gradientboosting h2o imbalanced-data lightgbm low-default-modeling machinelearning model-interpretability natural-language-processing oneness-consciousness randomforest risk-analytics smote xgboost
Last synced: 09 Oct 2025
https://github.com/miozilla/conmatrix
conmatrix: Confusion Matrix, Data Imbalance, Evaluation, Weights & Biases
accuracy azureml bias binary-classification classification-model confusion-matrix dataset evaluate false-positive-rate fn fp imbalanced-data precision recall roc sensitivity specificity tn tp weight
Last synced: 10 Oct 2025
https://github.com/ahsankhizar5/fraud-detection-ml-pipeline
A complete machine learning pipeline for fraud detection using balanced classification, feature engineering, and ROC-based evaluation.
anomaly-detection classification data-science fraud-detection imbalanced-data ml-pipeline python random-forest
Last synced: 31 May 2026
https://github.com/desininja/employee-attrition-analysis
Identifying the main reasons for employee attrition.
attrition classification classifier data-analytics data-science data-visualization hr hr-analytics imbalanced-data logistic-regression smote smote-sampling undersampling
Last synced: 16 Oct 2025
https://github.com/omidghadami95/metaphor-detection-cnn-lstm
Metaphor detection using CNN-LSTM
bidirectional-lstm cnn dynamic-learning-rate imbalanced-data lemmatization lstm lstm-cnn metaphor metaphor-detection natural-language-processing nlp tokenization
Last synced: 19 Oct 2025
https://github.com/msikorski93/retinal-vessel-segmentation-using-deep-learning
Retinal vessel segmentation is the task of segmenting vessels in retina imagery. This binary task was performed with a U-Net network.
computer-vision efficientnet healthcare imbalanced-data keras retinal-images retinal-vessel-segmentation segmentation-models tensorflow u-net
Last synced: 14 May 2026
https://github.com/trilokida/credit-card-fraud-detection
Synthetic Financial Datasets For Fraud Detection
classification credit-card credit-card-fraud downsampling ensemble-machine-learning fraud-detection imbalanced-data imblearn random-forest synthetic-data upsampling xgb xgboost
Last synced: 16 Apr 2026
https://github.com/leabrodyheine/ml-kaggle-cirrhosis-data
This project showcases skills in machine learning, data preprocessing, and model evaluation using Python libraries such as scikit-learn, XGBoost, and Optuna. It involves implementing various machine learning models, handling imbalanced data, and employing imputation techniques to enhance model performance for predicting cirrhosis outcomes.
data-analysis data-pre imbalanced-data imputation machine-learning optuna pipeline scikit-learn xgboost
Last synced: 14 May 2026
https://github.com/ricardorobledo/spamemailclassification
Spam email classification using machine learning (Random Forest, SVC, Logistic Regression, etc.) with data balancing techniques (SMOTE, BorderlineSMOTE, ADASYN). Final calibrated Random Forest model achieves ROC-AUC 0.982 and PR-AUC 0.979 on the Spam Email Classification dataset.
imbalanced-data imbalanced-learning numpy pandas python3 sklearn
Last synced: 05 Apr 2026
https://github.com/0zean/hellingerforest
A Python library built in Rust for implementing the Hellinger distance splitting criterion in a Random Forest Classifier to address imbalanced data. Work in progress.
decision-trees imbalanced-classification imbalanced-data random-forest-classifier
Last synced: 05 Jun 2026
https://github.com/muhakbarhamid21/student-dropout-prediction
Student dropout risk prediction & interactive academic performance dashboard.
catboost dashboard data-science imbalanced-data looker-studio machine-learning prediction streamlit student-dropout
Last synced: 24 Apr 2026
https://github.com/mihirmakwana03/ci7521-cw1-notebook
Multi-class classification on imbalanced data — 8 sklearn classifiers + SMOTE + ROC-AUC benchmarking. Kingston CI7521 CW1.
classification hyperparameter-tuning imbalanced-data machine-learning scikit-learn smote
Last synced: 27 Apr 2026
https://github.com/louis-alexandre-laguet/rain-prediction-dl-ml
This project aims to predict rainfall using machine learning and deep learning models. It includes data analysis, preprocessing, and the application of algorithms like Logistic Regression, SVM, Random Forest, and deep learning models like LSTM for the Kaggle Rain Prediction Challenge.
classification data-preprocessing deep-learning hyperparameter-optimization imbalanced-data kaggle-challenge logistic-regression lstm machine-learning pytorch pytorch-lightning rain-prediction random-forest svm weather-prediction
Last synced: 28 Apr 2026
https://github.com/ammahmoudi/credit-risk-prediction
Predicting credit risk when a person requests a loan, using random forest on the South German dataset (fixing imbalanced data)
credit-risk imbalanced-data machine-learning ml random-forest
Last synced: 08 Jun 2026
https://github.com/shahzadmustafa15/credit-card-fraud-detection
Credit card fraud detection using Random Forest with Stratified K-Fold cross-validation and F1-score evaluation.
classification confusion cross-validation f1-score fraud-detection imbalanced-data kaggle machine-learning python random-forest scikit-learn
Last synced: 29 Apr 2026
https://github.com/abdelrahman-amen/active_learning_using_imbalanced_dataset
This project explores active learning techniques, focusing on query strategies to optimize informative data selection for model training. It aims to reduce labeled data while improving model performance, especially with imbalanced datasets where certain classes are underrepresented.
activelearning bmi entropy imbalanced-data margin python querybycommittee smote uncertainty
Last synced: 01 May 2026
https://github.com/viniciusds2020/ml_balaceamento_allknn
This repository contains machine learning code that uses the AllKNN algorithm from the imblearn package to balance data.
allknn imbalanced-data imblearn machine-learning sklearn
Last synced: 01 May 2026