Projects in Awesome Lists tagged with preprocessing-data
A curated list of projects in awesome lists tagged with preprocessing-data.
https://github.com/vanderschaarlab/hyperimpute
A framework for prototyping and benchmarking imputation methods
data-science imputation imputation-algorithm machine-learning machine-learning-prerequisites preprocessing-data python scikit-learn
Last synced: 07 Apr 2025
https://github.com/weiglszonja/meeg-tools
EEG/MEG data preprocessing and analyses framework
connectivity-analysis eeg-analysis eeg-preprocessing pipeline preprocessing-data time-frequency-analysis
Last synced: 16 Aug 2025
https://github.com/cecivieira/cotas-genero-eleicoes-e-proposicoes-legislativas
Analysis of data on gender quotas and their impact on elections and legislative proposals in the Federal Chamber of Deputies between 1934 and 2021. Part of the final project (TCC) for the postgraduate program in Artificial Intelligence and Machine Learning at @pucminas
dataanalysis pandas preprocessing-data python randomforestclassifier
Last synced: 20 Sep 2025
https://github.com/sabaudian/music_genre_classification_project
Audio Pattern Recognition project - Music Genres Classification
artificial-intelligence audio-analysis audio-classification audio-processing genre-classification genres-classification k-nearest-neighbours k-nn machine-learning music-genre-classification music-information-retrieval neural-network preprocessing preprocessing-data python random-forest random-forest-classification svm svm-classifier
Last synced: 12 Aug 2025
https://github.com/drleniaw/analysis_sentiment_twitter_free_sex_in_indonesian
Sentiment analysis of Twitter posts about free sex in Indonesia
collaboration crawling jupyter-notebook lda naive-bayes-classifier preprocessing-data python sentiment-analysis support-vector-machines twitter twitter-sentiment-analysis vader-lexicon word2vec wordcloud
Last synced: 26 Jul 2025
https://github.com/rafiqamar/hr-analytics-project
Cleaned and processed HR data using Python for analysis and visualization. Analyzed employee trends and performance using SQL and Python. Built an interactive Power BI dashboard connected to MySQL for dynamic insights.
exploratory-data-analysis mysql-database powerbi preprocessing-data python
Last synced: 05 Apr 2025
https://github.com/jadelhelm/autoprepad
Anomaly Detection Pipeline automates data preprocessing for unsupervised scenarios without labels.
anomalies anomalies-detection anomaly anomaly-detection anomalydetection automated automated-machine-learning data-quality machine-learning machine-learning-algorithms preprocessing preprocessing-data preprocessing-pipeline pyod python sklearn tabular
Last synced: 06 May 2025
https://github.com/luisfelipepoma/machine_learning
Learning about the algorithms used in machine learning, along with techniques for training and testing models.
backpropagation-learning-algorithm data-science feature-engineering gradient-descent html ia learning loss-functions metrics-visualization neuronal-networks nlp normalization-techniques optimizer-algorithms preprocessing-data python regression-models
Last synced: 15 Oct 2025
https://github.com/maxbubblegum47/preprocessing
Preprocessing method for an information retrieval system
algorithm algorithms preprocessing preprocessing-data python python3 unimore-informatica
Last synced: 22 Mar 2025
https://github.com/rafiqamar/customer-churn-prediction-app
Built and deployed a Streamlit-based customer churn prediction app using ML models. Preprocessed data with encoding and scaling, improving model accuracy. Designed for churn prediction and retention insights.
exploratory-data-analysis machine-learning-algorithms preprocessing-data python streamlit-webapp
Last synced: 05 Jul 2025
https://github.com/tszon/end-to-end_ds_ml_project
I built an end-to-end customer churn segregation and prediction project.
containerisation data-science docker explianable-ai exploratory-data-analysis feature-engineering hdbscan-clustering kmeans-clustering machine-learning mlflow preprocessing-data scikit-learn shap statistical-test statistical-tests streamlit supervised-learning visualisation vscode
Last synced: 03 Sep 2025
https://github.com/xndrxssx/cotton_candy_spectral_analysis
Repository with the preprocessing algorithms and models for spectral data analysis of the Cotton Candy table grape.
machine-learning-algorithms msc pca pcr plsr preprocessing-data random-forest savitzky-golay snv spectroscopy standard-normal-variate support-vector-machine
Last synced: 30 Jul 2025
https://github.com/himank-khatri/classiflow
A web app that automates tedious data preprocessing and machine learning model testing.
exploratory-data-analysis machinelearning preprocessing-data python streamlit vizualization
Last synced: 01 Aug 2025
https://github.com/ddihora1604/advanced_business_analytics_on_world_bank_global_financial_inclusion_data_2021
Bridging the Gaps in Financial Inclusion: Understanding the Cash-Credit Paradox, Divide between Cash and Digital Payments, and Financial Resilience.
advanced-excel business-analytics data-analysis data-engineering data-mining data-visualization database exploratory-data-analysis machine-learning preprocessing-data python
Last synced: 17 Oct 2025
https://github.com/vipanchip/ai-powered-recipe-recommendation
This project is a Recipe Recommendation System that suggests recipes tailored to the user's specified nutritional values and ingredients. It integrates machine learning techniques with an intuitive web application framework to provide personalized recipe suggestions.
css flask html knn-classification machine-learning preprocessing-data
Last synced: 08 Aug 2025
https://github.com/ArtZaragozaGitHub/CV--P5_Plants_Seedling_Classification
A robust image classifier using CNNs to efficiently classify different plant seedlings and weeds to improve crop yields and minimize the extensive human effort to do this manually.
cnn-classification cnn-for-visual-recognition confusion-matrix cv2-library imagedatagenerator label-binarizer learning-rate matplotlib numpy-library opencv optimizer-visualization pandas-library preprocessing-data reducelronplateau seaborn sequential-models tensorflow tensorflow-keras train-test-validation
Last synced: 06 Nov 2025
https://github.com/lucianoscarpaci/news-data-classification
Using the Reuters dataset, this example illustrates the process of data preprocessing, model definition and training, and performance evaluation.
keras model-definition model-training performance-evaluation preprocessing-data reuters scikit-learn seaborn tensorflow
Last synced: 06 Mar 2025
https://github.com/iroyalx/dataset_preprocessing_sample
UNI S6: Preprocessing in Data Mining using ucimlrepo
data-mining preprocess preprocessing preprocessing-data preprocessor
Last synced: 13 Jun 2025
https://github.com/jingvu/anime-database-preprocessing-r-project
During the data preprocessing step, I identified three tasks that I believe are crucial and require careful attention: data transformation, handling outliers, and managing missing values. This repository serves as a resource to share what I've learned on these topics for anyone interested.
anime-dataset preprocessing-data r rmarkdown
Last synced: 24 Dec 2025
https://github.com/tszon/data-science-projects
Includes all the noteworthy Data Science projects from my learning journey with DataCamp.
data-analysis data-science exploratory-data-analysis feature-engineering machine-learning modelling preprocessing-data scikit-learn supervised-learning
Last synced: 15 Mar 2025
https://github.com/gaurav-singh7092/resumatch
An AI-powered resume and job description matching application using natural language processing and machine learning techniques. This application provides intelligent analysis of resume-job compatibility with detailed scoring and recommendations.
fastapi keyword-extraction nextjs nlp preprocessing-data python similarity-score tailwind
Last synced: 02 Jul 2025
https://github.com/hayatiyrtgl/audio_processing_for_cnn_network
Spectrum creation is the most important thing while dealing with audio data
audio audio-processing librosa preprocessing preprocessing-data python stft
Last synced: 08 Apr 2025
https://github.com/shellynagar27/transportation-and-logistics-challenge
Analyzing logistics data to optimize shipment efficiency, reduce delays, and enhance supply chain visibility using Power BI. Insights include top routes, delays, supplier trends, and peak shipments.
cleaning-data critical-thinking data-analysis data-visualization exploratory-data-analysis feature-engineering powerbi preprocessing-data problem-solving python
Last synced: 08 Sep 2025
https://github.com/nazir20/chatgpt-tweets-preprocessing
Preprocessing of tweets about ChatGPT
ai chatgpt cleaning-data colab-notebook openai preprocessing-data python tweet-analysis twitter-scraping
Last synced: 24 Jun 2025
https://github.com/animesh-chourey/loan-classifier
Trained machine learning algorithms (Logistic Regression, KNN, SVM, Decision Tree) after performing visualization and preprocessing tasks on a loan dataset. Computed evaluation metrics such as F1-score, log loss, and Jaccard similarity score to assess the algorithms' performance.
decision-tree f1-score jaccard-similarity knn logistic-regression logloss matplotlib numpy pandas preprocessing-data svm
Last synced: 01 Mar 2025
https://github.com/blleshi/neural_network_binary_classification
Venture Funding with Deep Learning (Neural Network Binary Classification)
binary-classification binary-crossentropy deep-learning hdf5 keras neural-network neural-network-model pandas preprocessing-data scikit-learn standard-scaler tensorflow venture-funding
Last synced: 27 Feb 2025
https://github.com/jatin-mehra119/flight-price-prediction
This study aims to analyze flight booking data from the "Ease My Trip" website, using statistical tests and linear regression to extract insights. Understanding this data can yield information that benefits passengers using the platform.
data-analysis datacleaning datavisualization machine-learning preprocessing-data python sklearn-pipeline sklearn-regression-algorithm streamlit-webapp
Last synced: 10 Mar 2025
https://github.com/subhadipsinha722133/multiple-disease-prediction
🤖 This is an interactive Streamlit web application that predicts the likelihood of multiple diseases (Diabetes Prediction, Heart Disease Prediction, Parkinson's Disease Prediction) using Machine Learning models.
machine-learning-algorithms prediction preprocessing-data sklearn streamlit
Last synced: 07 Oct 2025
https://github.com/Jingvu/Anime-Database-Preprocessing-R-Project
During the data preprocessing step, I identified three tasks that I believe are crucial and require careful attention: data transformation, handling outliers, and managing missing values. This repository serves as a resource to share what I've learned on these topics for anyone interested.
anime-dataset preprocessing-data r rmarkdown
Last synced: 13 Oct 2025
https://github.com/sarahloree/project-2--bank-loan-marketing-model
This is the second project I completed as part of the Machine Learning Module of my post-graduate certification in AI/Machine Learning from the University of Texas' McCombs School of Business.
business-analytics data-engineering decision-tree-classifier decision-trees eda modelbuilding modelevaluation performance-analysis performance-metrics performancemonitoring preprocessing-data
Last synced: 17 Oct 2025
https://github.com/lummy-a/montgomery-county-crime-analysis
Analysis of crime patterns in Montgomery County (2018-2022) using Python data science tools to identify trends, spatial hotspots, and temporal distributions across crime types. Includes visualizations and insights to inform prevention strategies.
analysis crime-analysis crime-data geospatial-analysis jupyter-notebook preprocessing-data python statistical-analysis visualization
Last synced: 30 Apr 2025
https://github.com/thiwak/preprocess-50k-tiles-sri-lanka
Preprocessing scripts for 1:50K tiles issued by the survey department, Sri Lanka
arcpy automation gdal-python geospatial preprocessing-data
Last synced: 25 Oct 2025
https://github.com/alejandrolara11/machinelearningcourse
Machine Learning Basics: From Setup to Clustering
data-analysis data-science machine-learning numpy pandas plotly preprocessing-data python scikit-learn seaborn streamlit
Last synced: 26 Mar 2025
https://github.com/himank-khatri/classification-builder
A web app that automates tedious data preprocessing and machine learning model testing.
exploratory-data-analysis machinelearning preprocessing-data python streamlit vizualization
Last synced: 02 Mar 2025
https://github.com/saadhaniftaj/ai-essayscore-automated-essay-scoring-using-lstm
AI-EssayScore is an automated essay scoring system using LSTM neural networks. It tokenizes and pads essays, processes them through an LSTM model, and predicts scores. The project includes data preprocessing, model training, evaluation, and saving the model for future use.
automated-machine-learning evaluation-metrics intro-to-ai lstm preprocessing-data
Last synced: 20 Mar 2025
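The tokenize-and-pad step mentioned in the AI-EssayScore description above can be illustrated with a minimal plain-Python sketch. It is not taken from that repository: the function names (`build_vocab`, `encode_and_pad`), the whitespace tokenizer, and the reserved ids 0 (padding) and 1 (unknown word) are all assumptions made for illustration.

```python
from collections import Counter

PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id reserved for out-of-vocabulary words


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocab(texts, max_words=10000):
    """Map the most frequent words to integer ids starting at 2."""
    counts = Counter(word for text in texts for word in tokenize(text))
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode_and_pad(text, vocab, max_len):
    """Convert text to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(word, UNK_ID) for word in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Hypothetical example essays, used only to exercise the functions.
essays = ["The cat sat", "The dog sat on the mat today"]
vocab = build_vocab(essays)
padded = [encode_and_pad(essay, vocab, max_len=5) for essay in essays]
```

Every row of `padded` has the same length, which is what a sequence model such as an LSTM typically expects from a batch; shorter essays end in padding ids and longer ones are truncated.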
https://github.com/multiomics-analytics-group/acore
Functionality to preprocess and analyse multi-omics data
analysis omics omics-data-integration preprocessing-data
Last synced: 12 Apr 2025
https://github.com/tejaswirupa/early-prediction-of-diabetes-risk-using-machine-learning
Built a predictive model using CDC health data to identify individuals at risk of developing diabetes. Achieved 90.6% F1-score using Logistic Regression and revealed key health indicators like BMI and blood pressure as top predictors.
data-science datacleaning exploratory-data-analysis modelevaluation preprocessing-data python scikit-learn supervised-machine-learning
Last synced: 15 Jul 2025
https://github.com/hoangleminh17/ranks-prediction-for-lol
A method to predict rankings based on player performance in the game League of Legends
jupyter-notebook league-of-legends linear-regression predictive-modeling preprocessing-data python3 ridge-regression
Last synced: 16 Jul 2025
https://github.com/lucasdsbr/ai-data-preprocessing
Data preprocessing for Artificial Intelligence
data-science googlecolab preprocessing-data python
Last synced: 23 Feb 2025
https://github.com/bhavinpatel4199/machine-learning-framework
This repository showcases various projects that explore key concepts in both supervised and unsupervised learning, with a focus on real-world applications. The projects utilize a range of machine learning techniques, including data preprocessing, feature selection, exploratory data analysis (EDA), and model optimization.
classification clustering data-science data-structures data-visualization exploratory-data-analysis machine-learning machine-learning-algorithms machine-learning-models pandas-dataframe predictive-modeling preprocessing-data sklearn supervised-learning unsupervised-learning
Last synced: 07 Apr 2025
https://github.com/datafog/datafog
Python library to redact PII/business information before it enters semantic data pipelines (RAG, 'chat on your data')
ai embeddings llm ml mlops pii preprocessing preprocessing-data privacy privacy-protection privacy-tools rag semantic-analysis
Last synced: 25 Feb 2025
https://github.com/mohd-faizy/preprocess_ml
This repository hosts Python code that utilizes the Scikit-learn preprocessing API for data preprocessing. The code presents a comprehensive range of tools that handle missing data, scale data, encode categorical variables, and perform other functions.
data-science feature-engineering feature-engineering-algorithm feature-extraction feature-selection machine-learning outlier-detection preprocessing-data preprocessor scikit-learn
Last synced: 16 Sep 2025
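The three preprocessing tasks named in the preprocess_ml description above (handling missing data, scaling, and encoding categorical variables) can be sketched in plain Python. This is an illustration using only the standard library, not the scikit-learn API and not code from that repository; the function names `impute_mean`, `standardize`, and `one_hot` are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted distinct labels."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


# Hypothetical toy columns, used only to exercise the functions.
ages = [20.0, None, 40.0]
scaled = standardize(impute_mean(ages))
encoded = one_hot(["red", "blue", "red"])
```

Here the missing age is filled with 30.0 before scaling, and `encoded` is `[[0, 1], [1, 0], [0, 1]]` because the categories are ordered alphabetically as `blue`, `red`.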
https://github.com/tomaslopera/fifa_analysis
exploratory-data-analysis matplotlib numpy pandas preprocessing-data
Last synced: 29 Jul 2025