Projects in Awesome Lists tagged with data-preprocessing
A curated list of projects in awesome lists tagged with data-preprocessing.
https://github.com/zzw922cn/Automatic_Speech_Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset
Last synced: 02 Apr 2025
https://github.com/zzw922cn/automatic_speech_recognition
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
audio automatic-speech-recognition chinese-speech-recognition cnn data-preprocessing deep-learning end-to-end evaluation feature-vector layer-normalization lstm paper phonemes rnn rnn-encoder-decoder speech-recognition tensorflow timit-dataset
Last synced: 15 May 2025
https://github.com/skrub-data/skrub
Machine learning with dataframes
data data-analysis data-cleaning data-preparation data-preprocessing data-science data-wrangling dataframe dataframes dirty-data machine-learning
Last synced: 13 May 2025
https://github.com/data-prep-kit/data-prep-kit
Open source project for data preparation for GenAI applications
code-quality data data-prep data-preparation data-preprocessing data-preprocessing-pipelines datacuration datarecipes deduplication finetuning large-language-models large-scale-data-processing llm llmapps malware python ray spark
Last synced: 15 Dec 2025
https://github.com/machinelearnjs/machinelearnjs
Machine Learning library for the web and Node.
data-preprocessing easy-to-use feature-extraction machine-learning minimalistic node probabilistic-models random-forest statistical-learning structured-data svm web
Last synced: 24 Dec 2025
https://github.com/akanz1/klib
Easy to use Python library of customized functions for cleaning and analyzing data.
data-analysis data-cleaning data-preprocessing data-science data-visualization feature-selection klib python
Last synced: 21 Oct 2025
https://github.com/desbordante/desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data
Last synced: 22 Nov 2025
https://github.com/shamspias/customizable-gpt-chatbot
A dynamic, scalable AI chatbot built with Django REST framework, supporting custom training from PDFs, documents, websites, and YouTube videos. Leveraging OpenAI's GPT-3.5, Pinecone, FAISS, and Celery for seamless integration and performance.
artificial-intelligence autogpt chatbot conversational-ai data-preprocessing django django-rest-framework gpt-3 gpt-voice langchain langchain-python longchain machine-learning natural-language-processing nlp python voice-chat voice-recognition voice-to-text voice-transcription
Last synced: 05 Apr 2025
https://github.com/Desbordante/desbordante-core
Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It can also run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.
anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data
Last synced: 03 Apr 2025
https://github.com/msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch
Last synced: 07 May 2025
https://github.com/harunurrashid97/100-Days-Of-ML-Code
A day-to-day plan for this challenge. Covers both theoretical and practical aspects.
100-days-of-code 100daysofmlcode article data-preprocessing data-science datascience decision-tree eda exploratory-data-analysis implementation infographics linear-regression machine-learning machine-learning-algorithms python regression-algorithms siraj-raval-challenge textsummarization tutorials vizualization
Last synced: 19 Jul 2025
https://github.com/tirendazacademy/pandas-tutorial
Jupyter Notebooks and Data Sets for Pandas Library
data data-analysis data-preprocessing data-science machine-learning pandas pandas-dataframe pandas-datareader pandas-library pandas-python pandas-series pandas-tricks-for-data-manipulation pandas-tutorial python
Last synced: 06 Apr 2025
https://github.com/hasnainraz/semsegpipeline
A simpler way of reading and augmenting image segmentation data into TensorFlow
augmentation data-augmentation data-augmentations data-preprocessing deep-learning image-augmentation image-preprocessing input-pipeline masks pipeline python semantic-segmentation tensorflow
Last synced: 30 Jun 2025
https://github.com/thepanacealab/smmt
Social Media Mining Toolkit (SMMT) main repository
annotation data-acquisition data-annotation data-preprocessing gathering spacy tweets twitter-api
Last synced: 20 Aug 2025
https://github.com/triton-inference-server/dali_backend
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
dali data-preprocessing deep-learning fast-data-pipeline gpu image-processing nvidia-dali python
Last synced: 04 Apr 2025
https://github.com/dansuh17/segan-pytorch
SEGAN pytorch implementation https://arxiv.org/abs/1703.09452
audio data-preprocessing mir pytorch segan segan-pytorch source-separation speech-enhancement
Last synced: 16 Apr 2025
https://github.com/asavinov/prosto
Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
business-intelligence data-preparation data-preprocessing data-processing data-science data-wrangling feature-engineering map-reduce olap pandas python spark workflow
Last synced: 11 Apr 2025
https://github.com/hypox64/candock
A time series signal analysis and classification framework
classification data-augmentation data-preprocessing deep-learning eeg series-signal-analysis
Last synced: 24 Apr 2025
https://github.com/laureberti/learn2clean
Learn2Clean: Optimizing the Sequence of Tasks for Data Preparation and Cleaning
automated data-cleaning data-cleaning-pipeline data-curation data-preprocessing reinforcement-learning
Last synced: 11 Sep 2025
https://github.com/danielhanchen/sciblox
sciblox - Easier Data Science and Machine Learning
boosting data-analysis data-mining data-preprocessing data-science data-visualization imputation machine-learning python sklearn
Last synced: 30 Apr 2025
https://github.com/soumyadip007/data-science-using-python-university-course-module
“Data science” is just about as broad of a term as they come. It may be easiest to describe what it is by listing its more concrete components: Data exploration & analysis. Included here: Pandas; NumPy; SciPy; a helping hand from Python's Standard Library.
data-preparation data-preprocessing data-processing data-science data-visualization jupyter-notebook knn numpy panda plotting python
Last synced: 23 Jun 2025
https://github.com/elysian01/data-purifier
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.
data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3
Last synced: 04 Oct 2025
https://github.com/repetere/modelscript
REPO MOVED TO https://github.com/repetere/jsonstack-data - Data Science and Machine learning in JavaScript
data-mining data-preprocessing data-science javascript machine-learning
Last synced: 01 Oct 2025
https://github.com/ammsa/dtcleaner
DTCleaner: data cleaning using multi-target decision trees.
data-cleaning data-mining data-preprocessing data-quality data-science data-wrangling
Last synced: 21 Mar 2025
https://github.com/kukuster/sumstatsrehab
GWAS summary statistics files QC tool
bioinformatics bioinformatics-tool compbio computational-biology data-prep data-preparation data-preprocessing gwas gwas-pipeline gwas-summary-statistics summary-statistics sumstats
Last synced: 09 Apr 2025
https://github.com/ELToulemonde/dataPreparation
Data preparation for data science projects.
data-preparation data-preprocessing data-science date-conversion r speed variable-elimination variable-selection
Last synced: 30 Jul 2025
https://github.com/eltoulemonde/datapreparation
Data preparation for data science projects.
data-preparation data-preprocessing data-science date-conversion r speed variable-elimination variable-selection
Last synced: 19 Aug 2025
https://github.com/kwokhing/yandexcatboost-python-demo
Demo on the capability of Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing and model tuning are performed on the dataset
catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost
Last synced: 09 Apr 2025
https://github.com/nicomignoni/tab2img
A tool to convert tabular data into images so that it can be used by CNNs. Inspired by the "DeepInsight" paper.
cnn data-preprocessing deepinsight tabular-data
Last synced: 14 Jul 2025
https://github.com/twardoch/split-markdown4gpt
A Python tool for splitting large Markdown files into smaller sections based on a specified token limit. This is particularly useful for processing large Markdown files with GPT models, as it allows the models to handle the data in manageable chunks.
data-preprocessing gpt gpt-3 gpt-35-turbo gpt-35-turbo-16k gpt-4 markdown markdown-processing mistletoe natural-language-processing nlp openai openai-gpt python split-text summarization text-analysis text-processing text-summarization text-tokenization
Last synced: 08 Jul 2025
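The token-limited splitting described in the entry above can be sketched in a few lines. This is a minimal illustration and not the split-markdown4gpt API: the function names `count_tokens` and `split_markdown` are hypothetical, and whitespace-separated words stand in for real model tokens.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def split_markdown(text: str, limit: int) -> list[str]:
    """Group heading-delimited sections into chunks that stay within `limit`.

    A single section larger than `limit` is kept whole rather than cut.
    """
    # Split just before every line that starts with a Markdown heading marker.
    sections = [s for s in re.split(r"(?m)^(?=#{1,6} )", text) if s.strip()]
    chunks, current, used = [], [], 0
    for section in sections:
        size = count_tokens(section)
        # Start a new chunk when adding this section would exceed the limit.
        if current and used + size > limit:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(section)
        used += size
    if current:
        chunks.append("".join(current))
    return chunks
```

Splitting on headings keeps each section intact, so a chunk never starts mid-paragraph. A real tool would swap `count_tokens` for the tokenizer of the target model.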
https://github.com/azaz9026/medicine-recommendation-system
A Medicine Recommendation System in machine learning (ML) is a software application designed to assist healthcare professionals and patients in selecting the most appropriate medication based on various factors such as medical history, symptoms, demographics, and drug interactions
api data-preprocessing eda encoding flask machine-learning render-template sklearn-library statistics
Last synced: 10 Apr 2025
https://github.com/mahtafetrat/manatts-persian-speech-dataset
ManaTTS is the largest open Persian speech dataset with 100+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 08 Apr 2025
https://github.com/buabaj/xplore
A Python package built for data scientists, analysts, and AI/ML engineers to explore the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.
artificial-intelligence data-preprocessing data-science data-wrangling machine-learning
Last synced: 12 Apr 2025
https://github.com/gyrdym/ml_preprocessing
Implementation of popular data preprocessing algorithms for Machine learning
data-preprocessing data-science machine-learning machine-learning-algorithms onehot-encoder ordinal-encoder
Last synced: 21 Mar 2025
https://github.com/datafog/datafog-python
Open source PII detection and anonymization tool: easy-to-use, configurable, and extensible
ai data-anonymization data-preprocessing devsecaiops llm-privacy open-source pii pii-detection privacy privacy-protection python
Last synced: 26 Apr 2025
https://github.com/hemangjoshi37a/personalgoalassistant
AI-driven Personal Goal Assistant: Reinforcement learning-powered software mimics user behavior, interacts with computer inputs, and autonomously achieves goals in finance, social networking, and productivity. Open-source, Python-based RL agent.
ai automation autonomous-agents computer-vision data-preprocessing deep-learning finance goal-setting machine-learning mimic natural-language-processing open-source personal-goals productivity python reinforcement-learning social-media user-data
Last synced: 12 Aug 2025
https://github.com/MahtaFetrat/ManaTTS-Persian-Speech-Dataset
ManaTTS is the largest open Persian speech dataset with 86+ hours of transcribed audio. Includes data collection pipeline and tools. Suitable for Persian text-to-speech models.
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 01 Mar 2025
https://github.com/aayushpatel007/topicrankpy
A Python package to get useful information from documents using TopicRank Algorithm.
data-preprocessing email-parsing graph-algorithms hierarchical-clustering keyphrase-extraction keywords-extraction named-entity-recognition network-x nlp pagerank-python phone-parse spacy text-cleaning textrank topicrank
Last synced: 19 Oct 2025
https://github.com/habedi/feature-factory
A high-performance feature engineering library for Rust powered by Apache DataFusion
data-preprocessing data-science feature-engineering feature-selection machine-learning rust-lang rust-library
Last synced: 01 Aug 2025
https://github.com/parvvaresh/satellite_data
This repository provides Python code for converting satellite data into a format suitable for deep learning models. It supports various deep learning architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs).
data data-preprocessing data-reporting numpy pandas python
Last synced: 22 Apr 2025
https://github.com/orbxball/timit-preprocessor
Extract mfcc vectors and phones from TIMIT dataset
data-preprocessing deep-learning mfcc phone speech-recognition timit timit-dataset
Last synced: 18 Mar 2025
https://github.com/vidhi1290/deep-learning-for-eeg-emotion-classification
This repository contains a Python script for performing emotion classification using EEG (Electroencephalogram) data. Emotion classification from EEG signals is an important application in neuroscience and human-computer interaction. The code leverages deep learning techniques to analyze EEG data and predict emotional states.
coorelation data-exploration data-preprocessing data-science data-visualization deep-learning deep-learning-algorithms eeg-emotion-recognition egg-signals emotion-distribution emotion-prediction feature-analysis heatmap human-emotions machine-learning machine-learning-algorithms pie-chart spectral-analysis time-series-visualization
Last synced: 10 Apr 2025
https://github.com/khaledashrafh/chatgpt-sentiment-analysis
This project aims to perform sentiment analysis on tweets related to ChatGPT, a popular language model developed by OpenAI. The dataset used for training and testing consists of 219,293 tweets collected over a month. Each tweet is classified as positive ("good"), negative ("bad"), or neutral ("neutral").
bidirectional-lstm chat-gpt chat-gpt-3 chatgpt cnn cnn-classification cnn-model data-preprocessing glove glove-embeddings lstm lstm-model lstm-neural-networks natural-language-processing nlp openai sentiment-analysis sentiment-classification trials word-embading
Last synced: 29 Aug 2025
https://github.com/mikeqfu/pyhelpers
PyHelpers: An open-source toolkit for facilitating Python users' data manipulation tasks
data-manipulation data-preprocessing py-utils python python-utilities python-utility python-utils utilities
Last synced: 10 Oct 2025
https://github.com/CleverInsight/cognito
Cognito - Simplifies AutoML Data Preprocessing.
automl data-munging data-preperation data-preprocessing data-wrangling
Last synced: 20 Nov 2025
https://github.com/rbhatia46/data-preprocessing-template
This repository includes all the Data Preprocessing required before using a dataset on a Machine Learning Model. Please refer README on how to use.
data-preprocessing data-science machine-learning python
Last synced: 11 Apr 2025
https://github.com/tslu1s/atlantic
Atlantic: Automated Data Preprocessing Framework for Supervised Machine Learning
automation automl automl-pipeline data-preprocessing data-science feature-selection label-encoder machine-learning onehot-encoder predictive-maintenance predictive-modeling preprocessing-pipeline python scikit-learn
Last synced: 10 Apr 2025
https://github.com/aicorsair/dataquest-data-science-analysis-projects
A repository dedicated to storing guided projects completed while learning data science concepts with Dataquest.
classification-models cluster-analysis data-analysis data-analytics data-cleaning data-preparation data-preprocessing data-science data-visualization deep-learning excel feature-engineering machine-learning pandas-dataframe power-bi python-3 regression-models scikit-learn sql web-scraping
Last synced: 27 Oct 2025
https://github.com/bilalhameed248/urdu-to-english-machine-translation
Fine-tuned a pretrained Urdu-to-English machine translation model using the Hugging Face Trainer API on a custom dataset.
bert bert-fine-tuning bert-model data data-preprocessing data-science deep-learning deep-neural-networks machine-translation pytorch seq2seq seq2seq-model seq2seq-tensorflow tensorflow
Last synced: 13 Apr 2025
https://github.com/basiralab/Kaggle-BrainNetPrediction-Toolbox
A Python toolbox for predicting brain network (graph) evolution over time from a single observation. The code of the 20 competing Kaggle teams, along with the competition datasets, is made available.
brain-connectivity-evolution brain-network connectome-prediction data-preprocessing dimensionality-reduction kaggle-competition machine-learning predictive-learning regression-models
Last synced: 01 May 2025
https://github.com/yash22222/ibm-csrbox-internship-project
The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects
data-mining data-preprocessing data-science exploratory-data-analysis feature-engineering lemmatization machine-learning pandas pos-tagging random-forest random-forest-classifier scikit-learn sentiment-analysis web-scraping wordcloud
Last synced: 22 Apr 2025
https://github.com/datapreprocessing/datacleaning
Data Cleaning is a Python package for data preprocessing. It cleans a CSV file and returns the cleaned data frame, handling imputation, duplicate removal, special-character replacement, and more.
data data-cleaning data-cleansing data-preprocessing data-wrangling imputation python threshold
Last synced: 14 Dec 2025
https://github.com/thecoderpinar/house-price-prediction-project
This project focuses on predicting house prices using advanced regression techniques. It involves comprehensive data preprocessing, feature engineering, and model selection. The aim is to develop an accurate predictive model for real estate prices.
data-analysis data-preprocessing data-visualization deep-learning jupyter-notebook machine-learning neural-networks python regression regression-models
Last synced: 30 Apr 2025
https://github.com/tatevkaren/deep-learning-for-data-science
Deep Learning Case Studies with Tensorflow and Keras for Beginners-Advanced: ANN, CNN, RNN, Self-Organizing Maps, Boltzmann Machines, Stacked Autoencoders
ann artificial-intelligence artificial-neural-networks data-preprocessing data-science deep-learning ds keras modelling modelling-framework neural-networks numpy pandas python scikit-learn sklearn tensorflow
Last synced: 10 Apr 2025
https://github.com/ruban2205/machine_learning_fundamentals
This repository contains a collection of fundamental topics and techniques in machine learning. It aims to provide a comprehensive understanding of various aspects of machine learning through simplified notebooks. Each topic is covered in a separate notebook, allowing for easy exploration and learning.
adaboost-classifier agglomerative-clustering apriori-algorithm data-preprocessing data-science ensemble-learning fuzzy-cmeans-clustering machine-learning machine-learning-algorithms machine-learning-models multilayer-perceptron python random-forest-classifier self-organizing-map single-layer-perceptron
Last synced: 26 Oct 2025
https://github.com/khaledashrafh/linear-regression
This program implements linear regression from scratch using the gradient descent algorithm in Python. It predicts car prices based on selected features and uses a dataset of cars with their respective prices.
data-preprocessing dataset feature-selection gredient-decent learning-rate linear-regression pyhton3 regression regression-models
Last synced: 12 Sep 2025
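For readers unfamiliar with the approach named in the entry above, linear regression by gradient descent can be sketched as follows. This is a minimal single-feature illustration under assumed defaults, not the code from that repository; `fit_linear_regression`, `lr`, and `epochs` are hypothetical names.

```python
def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Usage: recover the line y = 2x + 1 from noise-free points.
w, b = fit_linear_regression([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

The learning rate has to be small enough for the feature scale in use. With unscaled features such as raw prices or mileage, the updates can diverge, which is one reason feature scaling is a common preprocessing step.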
https://github.com/khaledashrafh/logistic-regression
This program implements logistic regression from scratch using the gradient descent algorithm in Python to predict whether customers will purchase a new car based on their age and salary.
activation-function cost-function data-preprocessing logistic-regression model preprocessing regression-models sigmoid sigmoid-activation sigmoid-function
Last synced: 17 Oct 2025
https://github.com/opencodeiiita/news_scraping
beautifulsoup data-preprocessing data-scraping everyone opencode23 pandas
Last synced: 11 Sep 2025
https://github.com/samashi47/ml-toolkit-project
A general-purpose toolkit for data preprocessing, machine learning modeling, and visualization.
classification data data-preprocessing machine-learning python3 visualization
Last synced: 30 Jul 2025
https://github.com/mahtafetrat/gptinformal-persian-speech-dataset
A freely licensed Persian TTS dataset including 6+ hours of audio-text pairs with subject
data-collection data-preprocessing dataset-preparation forced-alignment mana-tts manatts persian persian-speech speech-corpus speech-data-collection speech-dataset speech-processing speech-synthesis text-to-speech text-to-speech-dataset tts tts-dataset
Last synced: 06 Apr 2025
https://github.com/sergio11/spam_email_classifier_lstm
This project uses a bidirectional LSTM model to classify emails as spam or legitimate, utilizing NLP techniques like tokenization, padding, and stopword removal. It aims to create an effective email classifier while addressing overfitting with strategies like early stopping.
bilstm confusion-matrix data-preprocessing deep-learning lstm lstm-model lstm-neural-networks machine-learning natural-language-processing sentiment-analysis spam-detection text-classification word-cloud
Last synced: 17 Apr 2025
https://github.com/armanx200/animal-detector
Training a machine learning model to recognize 15 different animal classes and classify images accordingly.
animal-classification arman-kianian artificial-intelligence classification cnns computer-vision convolutional-neural-networks data-preprocessing data-science deep-learning github image-processing image-recognition keras machine-learning model-training neural-networks open-source python tensorflow
Last synced: 29 Aug 2025
https://github.com/armanx200/fruit-detector
Fruit Detector: A machine learning model to identify fruits from images, powered by TensorFlow and Keras. Train the model, predict fruits, and explore the world of AI fruit recognition!
arman-kianian computer-vision data-preprocessing deep-learning image-recognition keras machine-learning neural-networks opencv python tensorflow
Last synced: 30 Apr 2025
https://github.com/amirali5/data-preprocessing
This repo is all about data preprocessing. Data preprocessing is a required first step before any machine learning machinery can be applied, because the algorithms learn from the data, and the learning outcome depends heavily on having the proper data, called features, for the problem at hand. Examples of data preprocessing include cleaning, instance selection, normalization, one-hot encoding, transformation, and feature extraction and selection.
Last synced: 15 May 2025
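Two of the preprocessing steps listed in the entry above, normalization and one-hot encoding, can be sketched in plain Python. This is an illustrative sketch only; the function names are hypothetical and are not taken from that repository.

```python
def min_max_normalize(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Return the sorted categories and one 0/1 vector per input label."""
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for label in labels:
        vector = [0] * len(categories)
        vector[index[label]] = 1
        vectors.append(vector)
    return categories, vectors


scaled = min_max_normalize([10, 20, 30])
categories, vectors = one_hot_encode(["red", "blue", "red"])
```

In practice, the minimum, maximum, and category list should be computed on the training data only and then reused on new data, so that both are encoded the same way.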
https://github.com/shamspias/gpt3-data-preprocessing
This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.
artificial-intelligence data-preprocessing data-preprocessing-pipelines data-science gpt-3 machine-learning
Last synced: 30 Jul 2025
https://github.com/kozodoi/dptools
Python package with utilities for data processing, aggregation, feature engineering and data versioning
aggregation data-preparation data-preprocessing data-science feature-engineering python
Last synced: 08 May 2025
https://github.com/sondosaabed/preprocessing-for-machine-learning-in-python
DataCamp intermediate course on how and when to perform data preprocessing in any machine learning project to get the data ready for modeling
data-preprocessing data-science data-scientist datacamp-course machine-learning machine-learning-pipeline python
Last synced: 09 Apr 2025
https://github.com/nafisalawalidris/predicting-credit-card-approvals
Explore credit card approval prediction through data analysis and machine learning. Preprocess data, train logistic regression models, and optimize hyperparameters. Learn data preprocessing, feature engineering, model training, and evaluation. Dive into the world of machine learning with Python and popular libraries.
approval-prediction credit-card data-analysis data-preprocessing feature-engineering hyperparameter-optimization libraries logistic-regression machine-learning model-evaluation model-training python python3
Last synced: 19 Apr 2025
https://github.com/xuefeng-xu/fedps
Federated data Preprocessing via aggregated Statistics
data-preprocessing federated-learning python scikit-learn statistics
Last synced: 26 Jun 2025
https://github.com/bharathgs/dframeutils
simple utility tools for dataframes in Python || WIP ||
csv data-cleaning data-preprocessing data-science dataframe pandas pandas-dataframe preprocessing python tidy-data tidytext utility utility-function utility-library
Last synced: 20 Mar 2025
https://github.com/mayurasandakalum/breast-cancer-detection
Code for classifying breast cancer tumors using machine learning. Includes preprocessing, visualizations, and models like Logistic Regression, Decision Tree, and Random Forest. Evaluated with accuracy, precision, recall, and F1-score. Clone, install dependencies, and run the Jupyter notebook for full analysis.
accuracy breast-cancer-classification data-preprocessing decision-tree exploratory-data-analysis f1-score jupyter-notebook logistic-regression machine-learning ml model-evaluation precision random-forest recall svm visualizations
Last synced: 02 Jul 2025
https://github.com/sergio11/online_payment_fraud
Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. Complete process from EDA and data preprocessing to model training and evaluation.
classification data-preprocessing data-science deep-neural-networks dnn exploratory-data-analysis financial-fraud fraud-detection fraud-detection-model imbalanced-data keras machine-learning neural-network python smote tensorflow
Last synced: 17 Aug 2025
https://github.com/timkong21/medical-appointment-no-show-prediction
A machine learning solution predicting patient no-shows in healthcare appointments. This project integrates EDA, data processing, feature engineering, and XGBoost modeling, with a workflow spanning from Snowflake data retrieval to AWS deployment (S3, SageMaker, Lambda, API Gateway), aiming to enhance appointment management in medical ERP systems.
api aws aws-lambda aws-s3 data-preprocessing data-science exploratory-data-analysis feature-engineering healthcare hyperopt hyperparameter-tuning hypothesis-testing machine-learning predictive-modeling python sagemaker snowflake sql statistical-analysis xgboost
Last synced: 17 Oct 2025
https://github.com/siddeshsambasivam/newscastapi
Newscast API is a simple REST API to get you all the news articles for any given query word.
data-crawling data-preprocessing rest-api software-engineering
Last synced: 08 Oct 2025
https://github.com/martinkalema/malaria-in-africa
This project is aimed at understanding, mitigating, and controlling the impact of malaria in Africa.
data-mining data-preprocessing data-visualization
Last synced: 26 Aug 2025
https://github.com/armanx200/income-predictor
A machine learning project that predicts income based on various demographic factors using Random Forest and Gradient Boosting algorithms. Includes data preprocessing, hyperparameter tuning, and model evaluation with detailed performance metrics.
arman-kianian classification data-preprocessing data-science gradient-boosting hyperparameter-tuning income-prediction machine-learning python random-forest scikit-learn
Last synced: 03 Apr 2025
https://github.com/yoctol/purewords
Create pure sentences
data-preprocessing natural-language-processing
Last synced: 17 Aug 2025
https://github.com/ehtisham-sadiq/movie-recommendation-system
The Movie Recommendation System is an all-encompassing data science project that utilizes sophisticated machine learning methods, including collaborative and content-based filtering, to provide users with personalized movie suggestions based on their preferences and viewing history.
algorithms collaborative-filtering data-preprocessing deployment exploratory-data-analysis machine-learning recommender-system user-interface
Last synced: 02 Aug 2025
https://github.com/amirreza81/applied-data-science-course
Comprehensive notes, practical exercises, and problem-solving solutions from the Applied Data Science course, covering data preprocessing, machine learning algorithms, statistical analysis, data visualization, and real-world applications.
accuracy-measure boosting classification data-cleaning data-preprocessing data-science data-visualisation deep-learning dimensionality-reduction eda feature-engineering image-classification imbalanced-data kaggle-dataset machine-learning multiclass-classification pandas regression scikit-learn stroke-prediction
Last synced: 22 Mar 2025
https://github.com/walidalsafadi/ufo-sighting
UFO sightings across the world!
data-preprocessing data-science data-visualization plotly ufo ufo-sightings
Last synced: 16 Mar 2025
https://github.com/ksatriow/weather-data-time-series
Weather Data Time Series
callback data-preprocessing lstm machine-learning mae optimizer time-series
Last synced: 31 Oct 2025
https://github.com/shervinnd/blood-donor-availability-predictor
A deep learning model to predict blood donor availability using TensorFlow and sklearn. Features data preprocessing, neural network training, and ROC curve visualization. Achieve high accuracy in predicting donor status!
binary-classification blood-donation blood-donor-prediction data-preprocessing deep-learning healthcare-ai machine-learning medical-data-analysis neural-network predictive-modeling python roc-curve scikit-learn tensorflow
Last synced: 11 Oct 2025
https://github.com/sayamalt/stellar-classification---sloan-digital-sky-survey-17
Successfully established a machine learning model which can predict an appropriate stellar class, on the basis of a distinct set of spectral characteristics, to a substantially high level of accuracy.
cross-validation data-preprocessing data-visualization exploratory-data-analysis feature-engineering feature-scaling imbalanced-learning model-deployment model-evaluation model-training multiclass-classification supervised-machine-learning
Last synced: 31 Aug 2025
https://github.com/thecoderpinar/credit-card-fraud-detection-project
This project focuses on the detection of credit card fraud using various data science and machine learning techniques. The dataset includes a record of credit card transactions over a specific period, with the goal of accurately identifying fraudulent activities.
anamoly-detection classification-algorithms credit-card-transactions data-analysis data-preprocessing data-science data-visualization fraud-detection machine-learning python
Last synced: 30 Apr 2025
https://github.com/a-poor/featureeng.jl
A feature engineering library for Julia.
data-preprocessing data-science feature-engineering feature-extraction hacktoberfest julia julia-package
Last synced: 27 Mar 2025
https://github.com/syedfaiqueali/data-science-and-machine-learning-bootcamp
Exploring Data Science and Machine Learning using Python.
data-cleaning data-preprocessing deep-learning descriptive-statistics gradient-descent linear-regression multivariate-regression naive-bayes-classifier neural-network optimization-algorithms tensorflow-models
Last synced: 09 Sep 2025
https://github.com/aksh-patel1/parallel-web-scraper-on-cloud
This project demonstrates an event-driven architecture for parallel web scraping and processing tasks using AWS services. The scraper job, running on AWS Batch, collects data from multiple web pages simultaneously and stores it in S3. The processing job, triggered by AWS EventBridge, processes the scraped data and updates a Google Sheet.
aws aws-batch aws-ecr aws-eventbridge aws-s3 data-preprocessing docker event-driven-architecture eventdrivenarchitecture python web-scraping
Last synced: 13 Jun 2025
https://github.com/ehtisham-sadiq/cirrhosis-patient-outcome-prediction
Multi-class classification model to predict outcomes of cirrhosis patients using machine learning
classification competition data-preprocessing encoding-algorithms exploratory-data-analysis feature-engineering machine-learning machine-learning-algorithms missing-data-imputation model-training-and-evaluation multiclass-classification
Last synced: 24 Nov 2025
https://github.com/simranjeet97/ipl-dataanalysis
Data Analysis performed on IPL Dataset with Data Profiling, Data Pre-Processing, Data Manipulation, and Data Visualization.
artificial-intelligence data-analysis data-manipulation data-mining data-preprocessing data-science data-visualization indian-premier-league-2008-2018 ipl ipl-dataset iplayer python
Last synced: 03 Oct 2025
https://github.com/abdelhakim-gh/exploratory-data-analysis
EDA for bank attrition problem
data-preprocessing data-visualization exploratory-data-analysis jupyter-notebook python
Last synced: 25 Mar 2025
https://github.com/walidalsafadi/indians-diabetes
Pima Indians Diabetes - ML Model Selection (83%)
data-preprocessing diabetes-prediction eda model-selection
Last synced: 16 Mar 2025
https://github.com/thyringer/cast
CLI tool for reading strings or complex data sets from CSV files to output them in other text formats.
csv-converter data data-preprocessing python python3 sql-builder
Last synced: 09 Apr 2025
https://github.com/karthikudyawar/contextlens
Unveiling Hidden Sentiments through Contextual Sentiment Analysis
api-development data-preprocessing data-visualization docker fastapi jupyter-notebook machine-learning natural-language-processing nlp python pytorch sentiment-analysis
Last synced: 30 Oct 2025
https://github.com/dragomirbozoki/kalliope-dual-llm-chatbot
Multilingual voice assistant powered by a dual-LLM architecture using retrieval-based and generative models. Built on top of Kalliope and Hugging Face Transformer
chatbot data-preprocessing hugging-face machine-learning nlp numpy pandas pipelines pytorch rag transformers
Last synced: 27 Jul 2025
https://github.com/lykmapipo/us-gas-prices
Python scripts that scrape US gas prices
aaa ci-cd data-extraction data-preprocessing data-transformation gas github-actions joblib lykmapipo metro pandas prices python selenium selenium-python state us webdriver-manager webscraper webscraping
Last synced: 08 Apr 2025
https://github.com/lennymalard/melpy-project
A NumPy-based deep learning library for building neural networks. It features an automatic differentiation engine and supports training models like LSTM, CNN, and FNN.
automatic-differentiation cnn data-preprocessing deep-learning fnn fnn-from-scratch from-scratch keras lstm machine-learning neural-network neural-networks neural-networks-from-scratch preprocessing pytorch tensorflow tokenizer
Last synced: 15 Sep 2025
https://github.com/carpentries-incubator/rna-seq-data-for-ml
RNA-Seq: Data Readiness for Machine Learning Applications
carpentries-incubator data-preparation data-preprocessing data-readiness english lesson machine-learning pre-alpha rna-seq
Last synced: 02 Sep 2025
https://github.com/shervinnd/titanic-survival-predictor
Dive into the Titanic dataset with this ML project! Using TensorFlow, predict passenger survival via neural networks. Features data preprocessing, model training, and visualization with pandas, scikit-learn, and matplotlib. Perfect for beginners in deep learning.
classification data-preprocessing data-science data-visualization deep-learning jupyter-notebook machine-learning neural-network pandas predictive-modeling python scikit-learn tensorflow titanic-dataset
Last synced: 16 Sep 2025
https://github.com/dragomirbozoki/lipreading-cv-nlp
End-to-end visual speech recognition system using deep learning. Combines computer vision and NLP to transcribe spoken words from lip movements in video sequences.
computer-vision data-preprocessing google-colab machine-learning model-training-and-evaluation nlp tensorflow
Last synced: 18 Jun 2025
https://github.com/raihan4520/ml
A collection of machine learning projects showcasing various algorithms and techniques, including a final project for the Machine Learning course at AIUB.
data-preprocessing jupyter-notebook machine-learning model-evaluation numpy pandas python scikit-learn
Last synced: 26 Oct 2025