Projects in Awesome Lists tagged with pre-processing
A curated list of projects in awesome lists tagged with pre-processing.
https://github.com/nf-core/sarek
Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing
annotation bioinformatics cancer conda containers gatk4 genomics germline next-generation-sequencing nextflow nf-core pipeline pre-processing reproducible-research somatic target-panels variant-calling whole-exome-sequencing whole-genome-sequencing workflow
Last synced: 16 Jan 2026
https://github.com/patternhelloworld/url-knife
Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity
email-extractor email-parser email-parsing pre-processing uri-template url-extractor url-normalization url-normalizer url-parser url-parsing url-validation
Last synced: 09 Jul 2025
https://github.com/fitushar/3D-Medical-Imaging-Preprocessing-All-you-need
This repo contains the preprocessing code for 3D medical imaging
3d 3d-padding 3d-preproceing abdomen-ct crop ct-preprocessing medical-image-analysis medical-image-processing medical-imaging mri-preprocessing normalization pre-processing python registration resampling simpleitk
Last synced: 09 May 2025
https://github.com/fitushar/3d-medical-imaging-preprocessing-all-you-need
This repo contains the preprocessing code for 3D medical imaging
3d 3d-padding 3d-preproceing abdomen-ct crop ct-preprocessing medical-image-analysis medical-image-processing medical-imaging mri-preprocessing normalization pre-processing python registration resampling simpleitk
Last synced: 21 Aug 2025
https://github.com/zimbatm/mdsh
`$ mdsh` # a markdown shell pre-processor
markdown pre-commit-hook pre-processing shell
Last synced: 09 Apr 2025
https://github.com/rodrigobressan/entity_embeddings_categorical
Discover relevant information about categorical data with entity embeddings using Neural Networks (powered by Keras)
categorical-data embeddings entity-embedding keras machine-learning neural-networks pre-processing utility-library
Last synced: 13 Apr 2025
https://github.com/azukds/tubular
Python package implementing transformers for pre-processing steps in machine learning.
feature-engineering pre-processing transformers
Last synced: 13 Apr 2025
https://github.com/paulross/cpip
CPIP - a C/C++ preprocessor implemented in Python.
c c-plus-plus pre-processing pre-processor preprocessing preprocessor python
Last synced: 07 Apr 2025
https://github.com/jpvantassel/swprepost
A Python package for surface-wave inversion pre- and post-processing.
dinver dispersion geopsy inversion post-processing pre-processing surface-waves swinvert
Last synced: 30 Apr 2025
https://github.com/khdlr/augmax
Efficiently Composable Data Augmentation on the GPU with Jax
Last synced: 13 Apr 2025
https://github.com/kvslab/vampy
A collection of tools for pre-processing, simulating, and post-processing vascular morphologies.
computational-fluid-dynamics post-processing pre-processing vascular
Last synced: 24 Dec 2025
https://github.com/pcastellanoescuder/pomashiny
Web-based User-friendly Workflow for Metabolomics and Proteomics Data Analysis
bioinformatics exploratory-data-analysis gui mass-spectrometry metabolomics pre-processing proteomics r shiny shiny-apps shinydashboard statistical-analysis visualization
Last synced: 28 Oct 2025
https://github.com/parvvaresh/email-spam-detection
This project focuses on detecting Persian spam emails using machine learning algorithms. The goal is to develop an effective spam detection system using various word embedding techniques and classification algorithms. The project utilizes three word embedding algorithms: TF-IDF, Frequency of Words, and Bag of Words. Additionally, six classification
chi2 decision-trees freq freq-word hazm knn logistic-regression machine-learning naive-bayes-classifier numpy pandas pre-processing python python3 random-forest svm tf-idf
Last synced: 22 Apr 2025
https://github.com/goamegah/rshiny-machine-learning-application
Application for data exploration and supervised model training with RShiny
docker exploratory-data-analysis machine-learning pre-processing rshinyapp supervised-learning
Last synced: 15 Apr 2025
https://github.com/ribin-baby/audio-processing
👉 This repository contains basic audio 🔊 processing code with feature extraction explained. 🎶 🎶 🎶
audio-analysis audio-processing audio-visualizer feature-extraction mfcc-extractor mfcc-features pre-processing python3
Last synced: 13 Aug 2025
https://github.com/japal/maldirppa
MALDI mass spectrometry data robust pre-processing and other helper functions
mass-spectrometry pre-processing r-package
Last synced: 22 Oct 2025
https://github.com/signaln/parallelio
For reading from and writing to parallel data files in Python
machine-learning natural-language-processing pre-processing preprocessing text text-data
Last synced: 14 Jan 2026
https://github.com/parvvaresh/autocorrect
Have you ever wondered how the autocorrect feature works on a smartphone keyboard? Today almost every smartphone brand, regardless of price, provides an autocorrect feature in its keyboard.
edit-distance-algorithm jaccard-similarity jupyter-notebook nlp pre-processing python wer-score
Last synced: 24 Feb 2025
https://github.com/ajkhoury/preprocessor_def_guard
Adds guards for preprocessor definitions.
definition definitions format formatter guard pre-processing preprocessor
Last synced: 24 Feb 2025
https://github.com/dantesc03/broachalign-machine-learning
Focuses on machine learning techniques for clustering and regression analysis. It explores real-world datasets to solve challenges and extract meaningful insights. Specifically, it addresses the critical task of predicting when to replace broaches used in manufacturing airplane engines.
lasso-regression learn linear-regression machine-learning pre-processing r-programming random-forest resampling tuning xgboost
Last synced: 26 Dec 2025
https://github.com/romilagarwal/diabetic_retinoplasty
A hybrid deep learning framework for automated diabetic retinopathy detection combining EfficientNetB0 with Swin Transformer attention mechanisms. Features Bayesian uncertainty quantification through Monte Carlo Dropout, explainable AI visualizations with Grad-CAM, and specialized preprocessing techniques.
bayesian-inference deep-learning efficientnet explianable-ai grad-cam-visualization monte-carlo-dropout pre-processing swim-transformer
Last synced: 20 Jun 2025
https://github.com/vineet416/chronic-kidney-disease-prediction
This repository contains the code for the Chronic Kidney Disease Prediction project. The goal of this project is to predict chronic kidney disease using parameters such as diabetes mellitus, blood urea, sugar, and hypertension. I used multiple machine learning algorithms with hyperparameter tuning, reaching a highest accuracy score of 97.5
data-visualization data-wrangling exploratory-data-analysis feature-engineering feature-selection hyperparameter-tuning machine-learning matplotlib numpy pandas plotly pre-processing python seaborn sklearn-library statsmodels
Last synced: 23 Mar 2025
https://github.com/curityio/example-dcr-request-validation
An example based on the Open Banking Brasil profile that demonstrates how to use a pre-processing procedure to validate a DCR request.
dynamic-client-registration financial-grade open-banking pre-processing software-statement use-case
Last synced: 25 Mar 2025
https://github.com/definetlynotai/peng
A transpiler that turns English-style Python into proper Python - a mini language of its own :)
beginner easy english freindly pre-processing python transpiler
Last synced: 08 Oct 2025
https://github.com/mahnoorsheikh16/loan-default-prediction
Credit risk is the borrower’s inability to repay a loan. Machine Learning models can predict risky customers and reduce lender losses. By analyzing behavior and demographics of past customers, these insights can apply to future customers for better loan decisions. This study aims to find the most suitable model for predicting loan defaults.
auc-score binary-classification-algorithms credit-card-fraud-detection data-cleaning data-science decision-tree-classifier exploratory-data-analysis loan-default-prediction logistic-regression machine-learning naive-bayes-classifier pre-processing python random-forest-classifier support-vector-machines xgboost-classifier
Last synced: 29 Nov 2025
https://github.com/lefteris-souflas/sas-programming-and-machine-learning
Applied SAS techniques for data analysis and machine learning in a milestone project. Base SAS Programming and SAS Viya tools were utilized for preprocessing, customer profiling, sales analysis, promotions, supplier evaluation, and customer segmentation. Results were visualized comprehensively.
customer-profiling data-analytics data-exploration market-basket-analysis pre-processing recency-frequency-monetary sas-machine-learning sas-oda sas-programming sas-studio sas-visual-analytics sas-viya
Last synced: 02 Mar 2025
https://git.gfz-potsdam.de/EnMAP/GFZ_Tools_EnMAP_BOX/EnPT
EnMAP Processing Tool - A Python package for pre-processing of EnMAP Level-1B data to Level-2A
EnMAP EnMAP-Box hyperspectral pre-processing processing chain remote sensing satellite
Last synced: 25 Sep 2025
https://github.com/mahnoorsheikh16/impact-of-minimum-wage-policies-on-youth-unemployment
This study will be an attempt to analyze what impact minimum wage has upon youth unemployment in Pakistan, a country having the world's ninth largest labor force, with 4 million young people reaching working age each year, and how it compares to the required living wage.
anker-methodology data-cleaning exploratory-data-analysis labour-economics linear-regression living-wage-calculator minimum-wage-data policy-analysis pre-processing r-language regression-analysis unemployment-analysis
Last synced: 26 Jan 2026
https://github.com/msamuelsons/classificao-estrela
Runs a complete process of star analysis and classification, using different machine learning algorithms and cross-validation techniques to evaluate how effective these algorithms are at classifying stars.
data-science machine-learning machine-learning-algorithms pre-processing star
Last synced: 15 Mar 2025
https://github.com/othmanekahtal/nexter-grid-project
Nexter is a simple challenge for applying a grid system in a real project and building an awesome web page
animations css-html css3 flexbox grid-layout html5 pre-processing responsive sass-framework
Last synced: 15 Mar 2025
https://github.com/gtkacz/undergrad_thesis
Code for my undergraduate thesis: Quantitative Analysis of the Impact of Image Pre-Processing on the Accuracy of Computer Vision Models Trained to Identify Dermatological Skin Diseases
cnn computer-vision machine-learning pre-processing preprocessing
Last synced: 17 Aug 2025
https://github.com/olivierlima/infinity_process
Create an ∞ sequence of output extensions utilizing single image generation or GPT-4o text outputs.
computer-vision-opencv eventtime gpts gui-application image-manipulation microservice openai orchestration pre-processing process reduce spacy window workflows
Last synced: 29 Oct 2025
https://github.com/daniel-elston/credit-card-default-prediction-algorithm
Algorithm used to predict whether a bank customer will default on given credit cards, using a bank telemarketing dataset.
algorithms banking-applications classification data-science machine-learning pca-analysis pre-processing visualization wrangling-cleaning
Last synced: 04 Apr 2025
https://github.com/kush1912/phocket---ml-internship
This repository consists of machine learning models, deep learning models, and some NLP tasks such as topic modelling, sequence generation, sentiment analysis, and a recommendation system
black-friday classification-algorithims decison-trees keywords-extraction knn model-selection n-grams natural-language-processing nlp nlp-keywords-extraction pre-processing random-forest roc-curve sentiment-analysis sequence-to-sequence svm-classifier tensorflow tfidf-vectorizer topic-modeling twitter-sentiment-analysis
Last synced: 24 Dec 2025
https://github.com/stkisengese/numpy-data-fundamentals
A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.
data data-analysis numpy pre-processing
Last synced: 18 Sep 2025
https://github.com/yukta026/book_recommender_system
I have created a book recommender system that recommends similar books to the reader based on his/her interest. This project shows results of collaborative and content-based filtering of the given dataset.
collaborative-filtering content-based-filtering feature-extraction model-building pre-processing recommender-system
Last synced: 18 Nov 2025
https://github.com/ismailsimsek/storesalestimeseriesforecasting
Testing preprocessing capabilities of different ML libraries
Last synced: 02 Mar 2025
https://github.com/pratikunterwegs/atlas-best-practices
Source code for a paper on pre-processing high frequency animal tracking data.
animal-movement movement-ecology pre-processing r-package
Last synced: 24 Mar 2025
https://github.com/emriss0/tech-tweet
TechTweet is a microblogging platform for tech enthusiasts, allowing users to share short tech messages and engage in discussions. Join the community, post your thoughts, and connect with others! 🐙💻
analytics-vidhya cnn-classification compose cross-validation django hackathon headline-generation huggingface-transformers navigation nlp pre-processing sentiment-analysis social-media spacy text-classification tweet twitter-api twitter-python
Last synced: 17 Oct 2025
https://github.com/pulkit0111/book-recommendation-system
This project walks through building a smart book recommendation system powered by Large Language Models (LLMs).
gradio langchain pre-processing python semantic-search sentiment-analysis zero-shot-text-classification
Last synced: 12 Jun 2025
https://github.com/badranalyst/e-commerce-customer-analysis-data-science-foundations-case-study
This case study explores e-commerce customer data through data exploration, pre-processing, and splitting. It includes model building and training to analyze customer behavior. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used for the analysis and model development.
data-analysis data-science dataset eda exploratory-data-analysis machine-learning matplotlib ml model-building model-training numpy pandas pre-processing python seaborn
Last synced: 11 Aug 2025
https://github.com/badranalyst/restaurant-reviews-sentiment-analysis-nlp-case-study
This project analyzes restaurant reviews using Natural Language Processing (NLP) for sentiment analysis. It covers data exploration, pre-processing (NLTK text cleaning), model building, prediction, and deployment. The goal is to predict sentiment from reviews using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.
data-analysis data-science eda exploratory-data-analysis matplotlib-pyplot model model-building numpy pandas pre-processing predictive-modeling python seaborn
Last synced: 20 Feb 2025
https://github.com/harmanveer-2546/world-best-cities
Ranking of cities on social, environmental and economic factors.
best-cities-to-live-in cities continent data-loading exploratory-data-analysis geopandas hovering india matplotlib numpy os pandas plotly pre-processing seaborn visualization world world-map
Last synced: 28 Feb 2025
https://github.com/harmanveer-2546/student-performance-in-exam
Student performance analysis and prediction using datasets has become an essential component of modern education systems. With the increasing availability of data on students, schools and universities are using advanced analytics and machine learning algorithms to gain insights into student performance and predict future outcomes.
classification-report cluster dbscan dimensionality-reduction k-means-clustering labelling linear-regression logistic-regression numpy pandas pre-processing random-forest-classifier test-train-split unsupervised-machine-learning visualization
Last synced: 28 Feb 2025
https://github.com/mushtabaa/encm509-labs
Fundamentals of Biometric Systems Design Assignments
bayesian-networks conditional-probability de-noising em-algorithm face-recognition feature-extraction fingerprint gabor-filter guassian-mixture-model hand-gesture-recognition log-likelihood long-short-term-memory pre-processing principal-component-analysis pyagrum segmentation statistical-analysis wiener-filter
Last synced: 09 Apr 2025
https://github.com/clement-muth/generic-printf
This is an example of a printf function implemented using the _Generic keyword
c cpp generic keyword pre-processing printf printf-functions
Last synced: 09 Oct 2025