Projects in Awesome Lists tagged with smote
A curated list of projects in awesome lists tagged with smote.
https://github.com/analyticalmindsltd/smote_variants
A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features
imbalanced-data imbalanced-learning oversampling smote
Last synced: 11 Apr 2025
https://github.com/tgsmith61591/smrt
Handle class imbalance intelligently by using variational auto-encoders to generate synthetic observations of your minority class.
autoencoder class-imbalance machine-learning neural-networks smote tensorflow vae variational-autoencoder
Last synced: 13 Apr 2025
https://github.com/bhattbhavesh91/imbalance_class_sklearn
Address imbalanced classes in machine learning projects.
class-imbalance classification-algorithm machine-learning python smote
Last synced: 17 Apr 2025
https://github.com/bcbi/classimbalance.jl
Sampling-based methods for correcting for class imbalance in two-category classification problems
Last synced: 14 Apr 2025
https://github.com/rikhuijzer/resample.jl
An implementation of SMOTE
julia julia-language smote smote-sampling upsampling
Last synced: 13 Feb 2025
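For readers new to the technique behind these entries, the core interpolation step of SMOTE can be sketched as follows. This is a minimal illustration in plain Python, not the code of any repository listed here; the names `smote_sketch` and `minority_points` and the toy coordinates are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (the point itself excluded)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position on the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class samples in two dimensions.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote_sketch(minority_points, n_new=4)
```

Each synthetic point lies on a line segment between two existing minority samples, so it stays inside the region the minority class already occupies.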
https://github.com/desininja/voice-disorder
Data Science project. ML algorithms to detect voice disorders.
accuracy algorithms classification classification-algorithm classifier classifier-model data data-extraction data-mining data-science health machine-learning smote voice-disorder
Last synced: 30 Apr 2025
https://github.com/chaitanyac22/fraud_analytics_credit_card_fraud_detection
The aim of this project is to predict fraudulent credit card transactions with the help of different machine learning models.
adasyn banking credit-card-fraud-detection data-analysis decision-tree-classifier fraud-analytics hyperparameter-optimization hyperparameter-tuning imblearn kneighborsclassifier logistic-regression machine-learning-algorithms oversampling pipelines power-transformers random-forest-classifier randomoversampler smote svm-classifier xgboost-classifier
Last synced: 13 Apr 2025
https://github.com/hmjianggatech/esmote
ESmote - An R package implementing a fast SMOTE algorithm
imbalanced-learning machine-learning r smote
Last synced: 26 Mar 2025
https://github.com/adamouization/seal-pup-aerial-imagery-neural-network-classifier
Machine Learning & Data Visualisation/Processing techniques for classifying seal pups from aerial imagery using Neural Networks.
classification classification-model jupyter jupyter-lab logistic-regression machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot numpy pandas python python3 scikit-learn scikitlearn-machine-learning scipy seaborn smote svm
Last synced: 11 Apr 2025
https://github.com/majobasgall/smote-mr
SMOTE-MR: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data that applies a MapReduce-based approach. SMOTE-MR is categorized as an `approximated/ non exact` solution. There is also an `exact` solution called SMOTE-BD, written by the same author (see: https://github.com/majobasgall/smote-bd)
big-data imbalanced-data machile-learning scala smote spark
Last synced: 26 Feb 2025
https://github.com/pavankethavath/microsoft-classifying-cybersecurity-incidents-with-ml
A machine learning pipeline for classifying cybersecurity incidents as True Positive (TP), Benign Positive (BP), or False Positive (FP) using the Microsoft GUIDE dataset. Features advanced preprocessing, XGBoost optimization, SMOTE, SHAP analysis, and deployment-ready models. Tools: Python, scikit-learn, XGBoost, LightGBM, SHAP, and imbalanced-learn.
classificationreport correlation-analysis dataanalysis decision-tree-classifier exploratory-data-analysis feature-engineering feature-selection gradientboosting hyperparameter-tuning joblib lgbmclassifier logistic-regression machine-learning modelselection pandas randomforestclassifier randomsearchcv shap smote xgboost-classifier
Last synced: 23 Apr 2025
https://github.com/georgedouzas/imbalanced-learn-extra
Implementation of novel oversampling algorithms.
clustering-base-oversampling data-science geometric-smote geometric-somo imbalanced-data imbalanced-learn imbalanced-learning kmeans-smote machine-learning oversampling python scikit-learn smote somo
Last synced: 24 Mar 2025
https://github.com/ugurcanerdogan/cross-validation-with-imbalanced-dataset
BBM467*SDSP - Small Data Science Project - Things to consider in cross validation and resampling when dealing with Imbalanced Data : What is the right way?
bbm467 cross-validation data data-science kfold-cross-validation logistic-regression machine-learning oversampling sdsp smote
Last synced: 06 Apr 2025
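One commonly recommended answer to the question raised in the entry above is to resample only the training part of each fold, so that validation rows are never copied into the training data. The sketch below illustrates that ordering in plain Python. It is not taken from the listed repository, it substitutes plain random oversampling for SMOTE to stay short, and the helper names and toy data are hypothetical.

```python
import random


def oversample_minority(rows, seed=0):
    """Randomly duplicate minority rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra


def folds_with_resampling(rows, n_folds=5):
    """Yield (resampled_train, untouched_validation) pairs, one per fold."""
    for i in range(n_folds):
        validation = rows[i::n_folds]
        train = [r for j, r in enumerate(rows) if j % n_folds != i]
        # Resampling happens after the split, on the training rows only.
        yield oversample_minority(train, seed=i), validation


# Hypothetical toy data: (row_id, label), roughly 15% positives.
data = [(i, 1 if i % 7 == 0 else 0) for i in range(100)]
splits = list(folds_with_resampling(data))
```

Resampling the whole dataset before splitting would let copies (or, with SMOTE, interpolations) of validation rows leak into training and inflate the scores.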
https://github.com/seahrh/bad-renter
Working examples of Spark ML Pipeline and SMOTE algorithm for synthetic data augmentation
Last synced: 29 Mar 2025
https://github.com/dimitriskatos/health_stroke_prediction
Predicting possible strokes using Random Forest and PCA
cleaning-data pca prediction random-forest-classifier smote
Last synced: 06 Apr 2025
https://github.com/rohitpawar001/machine_learning
This repository contains all the machine learning algorithms and the ml concepts.
classification hyperparameter-tuning linear-regression logistic-regression machine-learning naive-bayes numpy pandas python regression scikit-learn smote svm
Last synced: 09 Feb 2025
https://github.com/chaitanyac22/telecom-churn-prediction
In this project, data analytics is used to analyze customer-level data of a leading telecom firm, build predictive models to identify customers at high risk of churn, and identify the main indicators of churn. The project focuses on a four-month window, in which the first two months are the ‘good’ phase, the third month is the ‘action’ phase, and the fourth month is the ‘churn’ phase. The business objective is to predict churn in the last (i.e., fourth) month using the data from the first three months.
class-imbalance classification data-analytics data-cleaning data-manipulation evaluation-metrics feature-engineering hyperparameter-tuning logistic-regression machine-learning model-building model-evaluation over-sampling pca random-forest-classifier rfe smote statistics telecom xgboost
Last synced: 27 Mar 2025
https://github.com/yessasvini23/hugging-face-agentic-ai-cerification
Agentic AI is the future, and I'm excited to be part of it. This course gave me a strong foundation in building AI agents,
ai aiagentic deep-learning huggingface smote
Last synced: 14 Mar 2025
https://github.com/jesly-joji/spam-ham-classifier-with-handing-of-data-imbalance
Spam/Ham Classifier with SMOTE Technique
naive-bayes-classifier nlp smote streamlit
Last synced: 02 Mar 2025
https://github.com/aman5319/bank-marketing-analysis
The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
bank-marketing-analysis desiciontree imbalanced-data logistic-regression roc-auc sklearn smote
Last synced: 29 Mar 2025
https://github.com/rakshit-vasava/predictive-analytics-for-insurance-purchase
Predicting customer insurance purchases using stacking models and SMOTE for the Homesite Quote Conversion Problem on Kaggle.
k-nearest-neighbours kaggle-competition multilayer-perceptron python random-forest scikit-learn smote support-vector-machines
Last synced: 05 Apr 2025
https://github.com/santi-souza/stroke-eda-ml
Stroke: Statistical analysis of risk factors and creation of predictive models using machine learning
confusion-matrix cross-validation crossvalidation flexdashboard ggplot2 gradient-boosting logistic-regression machine-learning plotly random-forest rmarkdown rstudio smote statistics stroke stroke-prediction
Last synced: 07 Apr 2025
https://github.com/aishwaryagm1999/aircraft-network-security-using-yara-rules-and-machine-learning-for-threat-detection-and-prevention
This project addresses cybersecurity in aviation by developing a machine learning-enhanced intrusion detection and prevention system (IDPS) for aircraft networks. Combining YARA-based signature detection with behavior-based (ML) anomaly detection, the system mitigates cyber threats in real-time, protecting aircraft from sophisticated attacks.
argus automation cybersecurity feature-hashing idps joblib machine-learning matplotlib networking numpy pandas python random-forest-classifier requests seaborn shell smote smotesmote tshark yara
Last synced: 08 Apr 2025
https://github.com/mahnoorsheikh16/credit-card-default-prediction
This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.
encoding hiplot imblearn json knn-imputer logistic-regression matplotlib numpy pandas pca-analysis plotly scipy seaborn sklearn smote streamlit support-vector-machines timeseries-forecasting visualization xgboost-classifier
Last synced: 22 Mar 2025
https://github.com/tlapanco/knn-project
Project for the Intelligent Systems course using KNN oversampling.
jupyter-notebook knn pandas python scikit-learn smote
Last synced: 17 Mar 2025
https://github.com/shwetapardhi/project-bankruptcy_prediction
Using various machine learning models (Logistic Regression, Gaussian Naïve Bayes, KNN, Gradient Boosting Classifier, Decision Tree Classifier, Random Forest Classifier) to predict whether a company will go bankrupt in the following years, based on financial attributes of the company; addressed the issue of imbalanced classes, different importance
decision-tree-classifier eda gaussian-naive-bayes gradient-boosting-classifier knn logistic-regression python random-forest-classifier smote
Last synced: 27 Feb 2025
https://github.com/mahnoorsheikh16/Credit-Card-Default-Prediction
This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.
chi-square-test encoding hiplot imblearn json knn-imputer matplotlib numpy pandas pca-analysis pillow plotly robust-scalar scipy seaborn sklearn smote streamlit ttest visualization
Last synced: 01 Mar 2025
https://github.com/tanyachutani/credit-card-fraud-detection
Applied undersampling and oversampling using SMOTE.
credit-card-fraud-detection data-imbalance fraud-detection machine-learning oversampling smote undersampling
Last synced: 27 Feb 2025
https://github.com/antoniof1704/imbalanced-classification-example
An example of a model I built where the dataset contained a very imbalanced class. Due to data governance rules, I have replaced the original dataset used in the modelling with a credit card fraud dataset from Kaggle.
fraud-detection imbalanced-classification jupiter-notebook modelling smote
Last synced: 01 Mar 2025
https://github.com/das-amlan/credit-card-fraud-detection-using-ml-and-different-class-imbalance-handling-approaches
A machine learning-based system for detecting fraudulent credit card transactions, leveraging advanced techniques like SMOTE, under-sampling, and anomaly detection models. Built with Random Forest, XGBoost, and SVM for robust performance on imbalanced datasets.
isolation-forest pyhton random-forest smote svm xgboost
Last synced: 03 Apr 2025
https://github.com/seahrh/spark-util
Utility for common use cases and bug workarounds in Apache Spark 2
Last synced: 24 Mar 2025
https://github.com/jiteshshelke/instrument-reviews-sentiment-analysis
🎵🔍 "A Flask-based Sentiment Analysis web app 🎭 that classifies instrument reviews 📝 using NLP 🤖 and Machine Learning 📊."
data-science flask instruments-review logistic-regression machine-learning nlp python sentiment-analysis smote text-classification tf-idf webapplication
Last synced: 31 Mar 2025
https://github.com/havva-nur-ezginci/diabetes-eyedisease-detection-ml-dl
💻This project involves the application of machine learning and deep learning methods 📊 to detect diabetes🩸 and diabetes-related eye 👁️diseases.
adagrad adam-optimizer adasyn decision-tree deep-learning gaussian-naive-bayes google-colab-gpu kaggle kneighborsclassifier logistic-regression machine-learning mlpclassifier perceptron random-forest-classifier sgd-optimizer smote support-vector-classifier vgg16 vgg19
Last synced: 06 Apr 2025
https://github.com/mnlscn/predicting-modelling-ingbank
SMOTE meets uplift modeling
oversa r smote uplift-modeling
Last synced: 05 Apr 2025
https://github.com/sambhu431/xray-pneumonia-classification-cnn-project
This project uses a convolutional neural network (CNN) to classify X-ray images for pneumonia detection. It explores three different approaches to handling data imbalance and improving model performance. The project includes detailed evaluation metrics and uses Streamlit to provide a simple classification interface.
cnn cnn-model cv2 deep-learning keras keras-tensorflow penumonia pneumonia-classification smote streamlit tensorflow xray xray-images
Last synced: 28 Mar 2025
https://github.com/dvarshith/transaction-fraud-detection
Machine Learning pipeline for financial transaction fraud detection. Incorporates SMOTE, ensemble models, neural networks.
arizona-state-university autoencoder catboost data-mining ensemble-learning financial-security fraud-detection imbalanced-learning kaggle lightbgm machine-learning neural-networks python smote xgboost
Last synced: 28 Feb 2025
https://github.com/sarahloree/project-3--credit-card-user-churn-prediction
This is the third project I completed as part of the Advanced Machine Learning module from my post-graduate certification in AI/ Machine Learning from University of Texas' McCombs School of Business.
bagging bagging-classifier boosting boosting-classifier cross-validation datapreprocessing eda exploratory-data-analysis hyperparameter-optimization hyperparameter-tuning random-forest random-forest-classifier sampling smote
Last synced: 24 Feb 2025
https://github.com/jbalooshie/credit_risk_analysis
Testing various supervised machine learning models to predict a loan applicant's credit risk.
balanced-random-forest clustercentroids credit-risk easyensembleclassifier randomoversampler smote smoteenn
Last synced: 09 Apr 2025
https://github.com/sambhu431/nlp-resume-classification
The project uses NLP techniques to classify resume categories with high precision and recall. Includes a Streamlit interface for resume uploads and predictions. Built to handle edge cases like invalid inputs and out-of-dataset values.
logistic-regression machine-learning nlp nlp-machine-learning nlpproject nltk recomecategorization resumeclassifier smote streamlit
Last synced: 29 Mar 2025
https://github.com/mbappeenjoyer/creditcarddefault_prediction
Credit card defaulter prediction using data science techniques
catboost-classifier datascience optuna smote
Last synced: 20 Feb 2025
https://github.com/mariam-zaidi/monkey-pox_detection-thesis
M-Pox detection using deep learning models and eXplainable AI
cnn deep-learning explainable-ai layerwiserelevancepropogation medical-imaging multi-class-classification smote transfer-learning
Last synced: 28 Feb 2025
https://github.com/robinmillford/predicting-diabetes-a-machine-learning-approach-to-early-intervention
The goal of this project was to develop a predictive model for diabetes using a dataset containing various health-related features
data-analysis data-science diabetes-prediction jupyter-notebook machine-learning smote
Last synced: 11 Mar 2025
https://github.com/pjaiswalusf/stroke-prediction
This project leverages machine learning to predict stroke risk using XGBoost, Random Forest, and Logistic Regression. It incorporates advanced data preprocessing, class imbalance handling with SMOTE, and hyperparameter optimization using Optuna. Model interpretability is enhanced with SHAP to identify key risk factors.
data-science datapreprocessing logistic-regression machine-learning optuna random-forest shap smote xgboost
Last synced: 02 Mar 2025
https://github.com/deliprofesor/breast-cancer-detection-using-svm-with-smote-and-model-optimization
This project analyzes health and lifestyle factors influencing heart attack risk using statistical methods and machine learning, with Ridge Regression identified as the best predictive model.
classification data data-preprocessing data-science data-visualization gridsearchcv machine-learning python roc-curve smote svm
Last synced: 10 Apr 2025
https://github.com/abdelrahman-amen/active_learning_using_imbalanced_dataset
This project explores active learning techniques, focusing on query strategies to optimize informative data selection for model training. It aims to reduce labeled data while improving model performance, especially with imbalanced datasets where certain classes are underrepresented.
activelearning bmi entropy imbalanced-data margin python querybycommittee smote uncertainty
Last synced: 05 Apr 2025
https://github.com/swsword1234/xray-pneumonia-classification-cnn-project
This project uses a convolutional neural network (CNN) to classify X-ray images for pneumonia detection. It explores three different approaches to handling data imbalance and improving model performance. The project includes detailed evaluation metrics and uses Streamlit to provide a simple classification interface.
cnn cnn-model cv2 deep-learning keras keras-tensorflow penumonia pneumonia-classification smote streamlit tensorflow xray xray-images
Last synced: 05 Apr 2025
https://github.com/aleksdrophunter/employee-attrition-prediction-with-machine-learning
Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.
attrition data-analysis data-science data-visualization employee machine-learning matplotlib numpy pandas python randomforestclassifier scikit-learn seaborn smote
Last synced: 28 Mar 2025
https://github.com/sarahm44/credit-risk-predictor
Uses several machine learning models to predict credit risk.
balanced-random-forest cluster-centroids cluster-centroids-undersampling credit-risk easy-ensemble-classifier ensemble-learning fintech logistic-regression machine-learning naive-random-oversampler random-forest resampling smote smote-oversampler smoteenn-combination
Last synced: 04 Apr 2025
https://github.com/hetuvpatel/brain-stroke-prediction
Machine Learning project for predicting stroke risk using healthcare data. Includes EDA, preprocessing, SMOTE, feature selection (RFE), evaluation of Logistic Regression, Decision Tree, Random Forest, KNN, SVM, and Stacked Ensemble models.
data-mining ensemble-learning healthcare machine-learning predictive-modeling python rfe scikit-learn smote
Last synced: 28 Apr 2025
https://github.com/lingumd/credit_risk_analysis
Machine learning models for predicting credit risk in LendingClub dataset.
balancedrandomforestclassifier classification-report cluster-centroids-undersampling confusion-matrix easyensembleclassifier get-dummies google-colab imbalanced-learn machine-learning matplotlib-pyplot numpy pandas pathlib randomoversampler scikit-learn smote smoteenn
Last synced: 16 Mar 2025
https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning
Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.
attrition employee machine-learning matplotlib numpy pandas python randomforestclassifier scikit-learn seaborn smote
Last synced: 26 Mar 2025
https://github.com/soumyapro/wine-quality-prediction
This project is about the prediction of wine quality using machine learning algorithms
boxplot matplotlib numpy pandas random-forest smote
Last synced: 01 Mar 2025
https://github.com/gss-18/fraud-detection
🚀 Credit Card Fraud Detection using Machine Learning & XGBoost. Evaluating models on an imbalanced dataset, using SMOTE, ROC-AUC analysis, and finding the best fraud detection approach. 📊🔍
classification credit-card-fraud data-science fraud-detection imbalanced-data machine-learning python sklearn smote xgboost
Last synced: 17 Mar 2025
https://github.com/jianninapinto/bandersnatch
This project implements a machine learning model using Random Forest, XGBoost, and Support Vector Machines algorithms with oversampling and undersampling techniques to handle imbalanced classes for classification tasks in the context of predicting the rarity of monsters.
altair imbalanced-classification imblearn machine-learning mongodb oversampling pycharm-ide pymongo python random-forest-classifier scikit-learn smote support-vector-machines undersampling xgboost
Last synced: 19 Jan 2025
https://github.com/desininja/employee-attrition-analysis
Identifying the main reasons for employee attrition.
attrition classification classifier data-analytics data-science data-visualization hr hr-analytics imbalanced-data logistic-regression smote smote-sampling undersampling
Last synced: 22 Feb 2025
https://github.com/abideen-olawuwo/predictive-maintenance
A predictive maintenance model
decision-trees ipython matplotlib numpy pandas plotly python random-forest seaborn sklearn smote
Last synced: 13 Mar 2025
https://github.com/rakibhhridoy/handlingimbalanceddataset-business
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items purchased illegally by others. Some very large transactions may also be made by suspicious parties, and these need to be caught as well.
auc business-intelligence fraud-detection imbalanced-data imbalanced-learning machine-learning oversampling precision recall smote transcations
Last synced: 17 Feb 2025
https://github.com/thedatatenno/churn-prediction
A machine learning project to predict customer churn using classification models with SMOTE and hyperparameter tuning.
accuracy-score churn-prediction classification confusion-matrix data-science f1-score gridsearchcv knn-classifier logistic-regression machine-learning matplotlib portfolio-project python random-forest scikit-learn seaborn smote telecom xgboost-classifier
Last synced: 09 Apr 2025
https://github.com/khyatimahendru/balancing-data-smote
In this project, I worked on the problem of Credit Card Fraud Detection. The data is highly imbalanced, with the positive class (fraud) accounting for merely 0.172% of the data. In classification problems, balancing your data is extremely important. Here I describe why accuracy should not be the only criterion for judging model performance.
classification classification-algorithims data-science imbalanced-data machine-learning smote
Last synced: 07 Apr 2025
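The point about accuracy in the entry above can be checked with simple arithmetic. The sketch below uses the quoted 0.172% fraud rate and a hypothetical dataset size of 100,000 rows; it is an illustration, not code from the listed repository.

```python
fraud_rate = 0.00172  # 0.172% positive class, as quoted in the entry above
n_total = 100_000     # hypothetical dataset size, chosen for easy arithmetic
n_fraud = round(n_total * fraud_rate)
n_legit = n_total - n_fraud

# A trivial "model" that labels every transaction as not fraud.
true_positives = 0
true_negatives = n_legit

accuracy = (true_positives + true_negatives) / n_total
recall = true_positives / n_fraud  # share of fraud cases actually caught

print(f"accuracy={accuracy:.5f} recall={recall:.1f}")
```

A model that never flags fraud still scores about 99.8% accuracy while catching no fraud at all, which is why metrics such as recall, precision, or ROC-AUC are usually reported alongside accuracy on data like this.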
https://github.com/mohamedlotfy989/credit-card-fraud-detection
This repository focuses on credit card fraud detection using machine learning models, addressing class imbalance with SMOTE and undersampling, and optimizing performance via Grid Search and RandomizedSearchCV. It explores Logistic Regression, Random Forest, Voting Classifier, and XGBoost, balancing precision-recall trade-offs for fraud detection.
classic-machine-learning credit-card-fraud ensemble-learning fraud-detection grid-search-hyperparameters hyperparameter-tuning imbalanced-data logistic-regression precision-recall random-forest randomizedsearchcv smote threshold-tuning undersampling voting-classifier xgboost
Last synced: 30 Mar 2025
https://github.com/sunnyrao07/water-quality-analysis
A machine learning project that predicts water potability based on chemical and physical attributes, using models like Logistic Regression, Random Forest, and XGBoost.
data-cleaning label-encoding logistic-regression matplotlib model-evaluation numpy pandas pyhton random-forest sckiit-learn seaborn smote standard-scaler xgboost
Last synced: 16 Apr 2025
https://github.com/sankoktas/bhi360-fall-detection
Fall detection system using Bosch BHI360 sensor data with time-series labeling, feature extraction, and machine learning (LOSO CV + Gradient Boosting).
accelerometer bhi360 bosch-sensors data-augmentation fall-detection feature-extraction gradient-boosting gyroscope human-activity-recognition label-studio loso-cross-validation machine-learning python scikit-learn sensor-data smote time-series
Last synced: 05 Apr 2025