An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with smote

A curated list of projects in awesome lists tagged with smote.

https://github.com/analyticalmindsltd/smote_variants

A collection of 85 minority oversampling techniques (SMOTE) for imbalanced learning with multi-class oversampling and model selection features

imbalanced-data imbalanced-learning oversampling smote

Last synced: 11 Apr 2025

https://github.com/tgsmith61591/smrt

Handle class imbalance intelligently by using variational auto-encoders to generate synthetic observations of your minority class.

autoencoder class-imbalance machine-learning neural-networks smote tensorflow vae variational-autoencoder

Last synced: 13 Apr 2025

https://github.com/bhattbhavesh91/imbalance_class_sklearn

Address imbalanced classes in machine learning projects.

class-imbalance classification-algorithm machine-learning python smote

Last synced: 17 Apr 2025

https://github.com/bcbi/classimbalance.jl

Sampling-based methods for correcting for class imbalance in two-category classification problems

class imbalance rose smote

Last synced: 14 Apr 2025

https://github.com/hmjianggatech/esmote

ESmote - An R package implementing a fast SMOTE algorithm

imbalanced-learning machine-learning r smote

Last synced: 26 Mar 2025

https://github.com/majobasgall/smote-mr

SMOTE-MR: A distributed Synthetic Minority Oversampling Technique (SMOTE) for Big Data that applies a MapReduce-based approach. SMOTE-MR is categorized as an `approximated / non-exact` solution. There is also an `exact` solution called SMOTE-BD written by the author (see: https://github.com/majobasgall/smote-bd)

big-data imbalanced-data machile-learning scala smote spark

Last synced: 26 Feb 2025

https://github.com/pavankethavath/microsoft-classifying-cybersecurity-incidents-with-ml

A machine learning pipeline for classifying cybersecurity incidents as True Positive (TP), Benign Positive (BP), or False Positive (FP) using the Microsoft GUIDE dataset. Features advanced preprocessing, XGBoost optimization, SMOTE, SHAP analysis, and deployment-ready models. Tools: Python, scikit-learn, XGBoost, LightGBM, SHAP, and imbalanced-learn.

classificationreport correlation-analysis dataanalysis decision-tree-classifier exploratory-data-analysis feature-engineering feature-selection gradientboosting hyperparameter-tuning joblib lgbmclassifier logistic-regression machine-learning modelselection pandas randomforestclassifier randomsearchcv shap smote xgboost-classifier

Last synced: 23 Apr 2025

https://github.com/ugurcanerdogan/cross-validation-with-imbalanced-dataset

BBM467*SDSP - Small Data Science Project - Things to consider in cross-validation and resampling when dealing with imbalanced data: what is the right way?

bbm467 cross-validation data data-science kfold-cross-validation logistic-regression machine-learning oversampling sdsp smote

Last synced: 06 Apr 2025

https://github.com/seahrh/bad-renter

Working examples of Spark ML Pipeline and SMOTE algorithm for synthetic data augmentation

aws-emr scala smote spark-ml

Last synced: 29 Mar 2025

https://github.com/dimitriskatos/health_stroke_prediction

Predicting possible strokes using RandomForest and PCA

cleaning-data pca prediction random-forest-classifier smote

Last synced: 06 Apr 2025

https://github.com/chaitanyac22/telecom-churn-prediction

In this project, data analytics is used to analyze customer-level data of a leading telecom firm, build predictive models to identify customers at high risk of churn, and identify the main indicators of churn. The project focuses on a four-month window, in which the first two months are the ‘good’ phase, the third month is the ‘action’ phase, and the fourth month is the ‘churn’ phase. The business objective is to predict churn in the last (i.e., fourth) month using the data from the first three months.

class-imbalance classification data-analytics data-cleaning data-manipulation evaluation-metrics feature-engineering hyperparameter-tuning logistic-regression machine-learning model-building model-evaluation over-sampling pca random-forest-classifier rfe smote statistics telecom xgboost

Last synced: 27 Mar 2025

https://github.com/yessasvini23/hugging-face-agentic-ai-cerification

Agentic AI is the future, and I'm excited to be part of it. This course gave me a strong foundation in building AI agents,

ai aiagentic deep-learning huggingface smote

Last synced: 14 Mar 2025

https://github.com/aman5319/bank-marketing-analysis

The data is related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

bank-marketing-analysis desiciontree imbalanced-data logistic-regression roc-auc sklearn smote

Last synced: 29 Mar 2025

https://github.com/rakshit-vasava/predictive-analytics-for-insurance-purchase

Predicting customer insurance purchases using stacking models and SMOTE for the Homesite Quote Conversion Problem on Kaggle.

k-nearest-neighbours kaggle-competition multilayer-perceptron python random-forest scikit-learn smote support-vector-machines

Last synced: 05 Apr 2025

https://github.com/aishwaryagm1999/aircraft-network-security-using-yara-rules-and-machine-learning-for-threat-detection-and-prevention

This project addresses cybersecurity in aviation by developing a machine learning-enhanced intrusion detection and prevention system (IDPS) for aircraft networks. Combining YARA-based signature detection with behavior-based (ML) anomaly detection, the system mitigates cyber threats in real-time, protecting aircraft from sophisticated attacks.

argus automation cybersecurity feature-hashing idps joblib machine-learning matplotlib networking numpy pandas python random-forest-classifier requests seaborn shell smote smotesmote tshark yara

Last synced: 08 Apr 2025

https://github.com/mahnoorsheikh16/credit-card-default-prediction

This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.

encoding hiplot imblearn json knn-imputer logistic-regression matplotlib numpy pandas pca-analysis plotly scipy seaborn sklearn smote streamlit support-vector-machines timeseries-forecasting visualization xgboost-classifier

Last synced: 22 Mar 2025

https://github.com/tlapanco/knn-project

Project for the Intelligent Systems course, using KNN oversampling.

jupyter-notebook knn pandas python scikit-learn smote

Last synced: 17 Mar 2025

https://github.com/shwetapardhi/project-bankruptcy_prediction

Uses various machine learning models (Logistic Regression, Gaussian Naïve Bayes, KNN, Gradient Boosting Classifier, Decision Tree Classifier, Random Forest Classifier) to predict whether a company will go bankrupt in the following years, based on the company's financial attributes; addressed the issue of imbalanced classes, different importance

decision-tree-classifier eda gaussian-naive-bayes gradient-boosting-classifier knn logistic-regression python random-forest-classifier smote

Last synced: 27 Feb 2025

https://github.com/mahnoorsheikh16/Credit-Card-Default-Prediction

This project focuses on predicting whether a customer will default on their credit card payment in the upcoming month. Utilizing historical transaction data and customer demographics, the project employs various machine learning algorithms to distinguish between risky and non-risky customers for better credit risk management.

chi-square-test encoding hiplot imblearn json knn-imputer matplotlib numpy pandas pca-analysis pillow plotly robust-scalar scipy seaborn sklearn smote streamlit ttest visualization

Last synced: 01 Mar 2025

https://github.com/antoniof1704/imbalanced-classification-example

An example of a model I built where the dataset contained a very imbalanced class. Due to data governance rules, I have replaced the original dataset used in the modelling with a credit card fraud dataset from Kaggle.

fraud-detection imbalanced-classification jupiter-notebook modelling smote

Last synced: 01 Mar 2025

https://github.com/das-amlan/credit-card-fraud-detection-using-ml-and-different-class-imbalance-handling-approaches

A machine learning-based system for detecting fraudulent credit card transactions, leveraging advanced techniques like SMOTE, under-sampling, and anomaly detection models. Built with Random Forest, XGBoost, and SVM for robust performance on imbalanced datasets.

isolation-forest pyhton random-forest smote svm xgboost

Last synced: 03 Apr 2025

https://github.com/seahrh/spark-util

Utility for common use cases and bug workarounds in Apache Spark 2

apache-spark scala smote

Last synced: 24 Mar 2025

https://github.com/jiteshshelke/instrument-reviews-sentiment-analysis

🎵🔍 "A Flask-based Sentiment Analysis web app 🎭 that classifies instrument reviews 📝 using NLP 🤖 and Machine Learning 📊."

data-science flask instruments-review logistic-regression machine-learning nlp python sentiment-analysis smote text-classification tf-idf webapplication

Last synced: 31 Mar 2025

https://github.com/mnlscn/predicting-modelling-ingbank

SMOTE meets uplift modeling

oversa r smote uplift-modeling

Last synced: 05 Apr 2025

https://github.com/sambhu431/xray-pneumonia-classification-cnn-project

This project uses a CNN to classify X-ray images for pneumonia detection. It explores three different approaches to handling data imbalance and improving model performance. The project includes detailed evaluation metrics and uses Streamlit to provide a classification interface.

cnn cnn-model cv2 deep-learning keras keras-tensorflow penumonia pneumonia-classification smote streamlit tensorflow xray xray-images

Last synced: 28 Mar 2025

https://github.com/dvarshith/transaction-fraud-detection

Machine Learning pipeline for financial transaction fraud detection. Incorporates SMOTE, ensemble models, neural networks.

arizona-state-university autoencoder catboost data-mining ensemble-learning financial-security fraud-detection imbalanced-learning kaggle lightbgm machine-learning neural-networks python smote xgboost

Last synced: 28 Feb 2025

https://github.com/sarahloree/project-3--credit-card-user-churn-prediction

This is the third project I completed as part of the Advanced Machine Learning module from my post-graduate certification in AI/Machine Learning from the University of Texas' McCombs School of Business.

bagging bagging-classifier boosting boosting-classifier cross-validation datapreprocessing eda exploratory-data-analysis hyperparameter-optimization hyperparameter-tuning random-forest random-forest-classifier sampling smote

Last synced: 24 Feb 2025

https://github.com/jbalooshie/credit_risk_analysis

Testing various supervised machine learning models to predict a loan applicant's credit risk.

balanced-random-forest clustercentroids credit-risk easyensembleclassifier randomoversampler smote smoteenn

Last synced: 09 Apr 2025

https://github.com/sambhu431/nlp-resume-classification

The project uses NLP techniques to classify resume categories with high precision and recall. Includes a Streamlit interface for resume uploads and predictions. Built to handle edge cases like invalid inputs and out-of-dataset values.

logistic-regression machine-learning nlp nlp-machine-learning nlpproject nltk recomecategorization resumeclassifier smote streamlit

Last synced: 29 Mar 2025

https://github.com/mbappeenjoyer/creditcarddefault_prediction

Credit card defaulter prediction using data science techniques

catboost-classifier datascience optuna smote

Last synced: 20 Feb 2025

https://github.com/robinmillford/predicting-diabetes-a-machine-learning-approach-to-early-intervention

The goal of this project was to develop a predictive model for diabetes using a dataset containing various health-related features

data-analysis data-science diabetes-prediction jupyter-notebook machine-learning smote

Last synced: 11 Mar 2025

https://github.com/pjaiswalusf/stroke-prediction

This project leverages machine learning to predict stroke risk using XGBoost, Random Forest, and Logistic Regression. It incorporates advanced data preprocessing, class imbalance handling with SMOTE, and hyperparameter optimization using Optuna. Model interpretability is enhanced with SHAP to identify key risk factors.

data-science datapreprocessing logistic-regression machine-learning optuna random-forest shap smote xgboost

Last synced: 02 Mar 2025

https://github.com/deliprofesor/breast-cancer-detection-using-svm-with-smote-and-model-optimization

This project analyzes health and lifestyle factors influencing heart attack risk using statistical methods and machine learning, with Ridge Regression identified as the best predictive model.

classification data data-preprocessing data-science data-visualization gridsearchcv machine-learning python roc-curve smote svm

Last synced: 10 Apr 2025

https://github.com/abdelrahman-amen/active_learning_using_imbalanced_dataset

This project explores active learning techniques, focusing on query strategies to optimize informative data selection for model training. It aims to reduce labeled data while improving model performance, especially with imbalanced datasets where certain classes are underrepresented.

activelearning bmi entropy imbalanced-data margin python querybycommittee smote uncertainty

Last synced: 05 Apr 2025

https://github.com/swsword1234/xray-pneumonia-classification-cnn-project

This project uses a CNN to classify X-ray images for pneumonia detection. It explores three different approaches to handling data imbalance and improving model performance. The project includes detailed evaluation metrics and uses Streamlit to provide a classification interface.

cnn cnn-model cv2 deep-learning keras keras-tensorflow penumonia pneumonia-classification smote streamlit tensorflow xray xray-images

Last synced: 05 Apr 2025

https://github.com/aleksdrophunter/employee-attrition-prediction-with-machine-learning

Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.

attrition data-analysis data-science data-visualization employee machine-learning matplotlib numpy pandas python randomforestclassifier scikit-learn seaborn smote

Last synced: 28 Mar 2025

https://github.com/hetuvpatel/brain-stroke-prediction

Machine Learning project for predicting stroke risk using healthcare data. Includes EDA, preprocessing, SMOTE, feature selection (RFE), evaluation of Logistic Regression, Decision Tree, Random Forest, KNN, SVM, and Stacked Ensemble models.

data-mining ensemble-learning healthcare machine-learning predictive-modeling python rfe scikit-learn smote

Last synced: 28 Apr 2025

https://github.com/ayan6943/employee-attrition-prediction-with-machine-learning

Employee Attrition Prediction with Machine Learning | Analyzing HR data to predict employee turnover using Random Forest. Includes EDA, feature engineering, model training, and evaluation. Achieved 90% accuracy.

attrition employee machine-learning matplotlib numpy pandas python randomforestclassifier scikit-learn seaborn smote

Last synced: 26 Mar 2025

https://github.com/soumyapro/wine-quality-prediction

This project is about the prediction of wine quality using machine learning algorithms

boxplot matplotlib numpy pandas random-forest smote

Last synced: 01 Mar 2025

https://github.com/gss-18/fraud-detection

🚀 Credit Card Fraud Detection using Machine Learning & XGBoost. Evaluating models on an imbalanced dataset, using SMOTE, ROC-AUC analysis, and finding the best fraud detection approach. 📊🔍

classification credit-card-fraud data-science fraud-detection imbalanced-data machine-learning python sklearn smote xgboost

Last synced: 17 Mar 2025

https://github.com/jianninapinto/bandersnatch

This project implements a machine learning model using Random Forest, XGBoost, and Support Vector Machines algorithms with oversampling and undersampling techniques to handle imbalanced classes for classification tasks in the context of predicting the rarity of monsters.

altair imbalanced-classification imblearn machine-learning mongodb oversampling pycharm-ide pymongo python random-forest-classifier scikit-learn smote support-vector-machines undersampling xgboost

Last synced: 19 Jan 2025

https://github.com/rakibhhridoy/handlingimbalanceddataset-business

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase. Some very large transactions may also be made by suspicious parties, and these need to be caught as well.

auc business-intelligence fraud-detection imbalanced-data imbalanced-learning machine-learning oversampling precision recall smote transcations

Last synced: 17 Feb 2025

https://github.com/khyatimahendru/balancing-data-smote

In this project, I worked on the problem of credit card fraud detection. The data is highly imbalanced, with the positive class (fraud) accounting for merely 0.172% of the data. In classification problems, balancing your data is extremely important. Here I describe why accuracy should not be the only criterion for judging model performance.

classification classification-algorithims data-science imbalanced-data machine-learning smote

Last synced: 07 Apr 2025

https://github.com/mohamedlotfy989/credit-card-fraud-detection

This repository focuses on credit card fraud detection using machine learning models, addressing class imbalance with SMOTE and undersampling, and optimizing performance via Grid Search and RandomizedSearchCV. It explores Logistic Regression, Random Forest, Voting Classifier, and XGBoost, balancing precision-recall trade-offs for fraud detection.

classic-machine-learning credit-card-fraud ensemble-learning fraud-detection grid-search-hyperparameters hyperparameter-tuning imbalanced-data logistic-regression precision-recall random-forest randomizedsearchcv smote threshold-tuning undersampling voting-classifier xgboost

Last synced: 30 Mar 2025

https://github.com/sunnyrao07/water-quality-analysis

A machine learning project that predicts water potability based on chemical and physical attributes, using models like Logistic Regression, Random Forest, and XGBoost.

data-cleaning label-encoding logistic-regression matplotlib model-evaluation numpy pandas pyhton random-forest sckiit-learn seaborn smote standard-scaler xgboost

Last synced: 16 Apr 2025

https://github.com/sankoktas/bhi360-fall-detection

Fall detection system using Bosch BHI360 sensor data with time-series labeling, feature extraction, and machine learning (LOSO CV + Gradient Boosting).

accelerometer bhi360 bosch-sensors data-augmentation fall-detection feature-extraction gradient-boosting gyroscope human-activity-recognition label-studio loso-cross-validation machine-learning python scikit-learn sensor-data smote time-series

Last synced: 05 Apr 2025