An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pre-processing

A curated list of projects in awesome lists tagged with pre-processing.

https://github.com/nf-core/sarek

Analysis pipeline to detect germline or somatic variants (pre-processing, variant calling and annotation) from WGS / targeted sequencing

annotation bioinformatics cancer conda containers gatk4 genomics germline next-generation-sequencing nextflow nf-core pipeline pre-processing reproducible-research somatic target-panels variant-calling whole-exome-sequencing whole-genome-sequencing workflow

Last synced: 16 Jan 2026

https://github.com/patternhelloworld/url-knife

Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity

email-extractor email-parser email-parsing pre-processing uri-template url-extractor url-normalization url-normalizer url-parser url-parsing url-validation

Last synced: 09 Jul 2025

https://github.com/zimbatm/mdsh

`$ mdsh` # a markdown shell pre-processor

markdown pre-commit-hook pre-processing shell

Last synced: 09 Apr 2025

https://github.com/rodrigobressan/entity_embeddings_categorical

Discover relevant information about categorical data with entity embeddings using Neural Networks (powered by Keras)

categorical-data embeddings entity-embedding keras machine-learning neural-networks pre-processing utility-library

Last synced: 13 Apr 2025

https://github.com/azukds/tubular

Python package implementing transformers for pre-processing steps for machine learning.

feature-engineering pre-processing transformers

Last synced: 13 Apr 2025

https://github.com/paulross/cpip

CPIP - a C/C++ preprocessor implemented in Python.

c c-plus-plus pre-processing pre-processor preprocessing preprocessor python

Last synced: 07 Apr 2025

https://github.com/jpvantassel/swprepost

A Python package for surface-wave inversion pre- and post-processing.

dinver dispersion geopsy inversion post-processing pre-processing surface-waves swinvert

Last synced: 30 Apr 2025

https://github.com/khdlr/augmax

Efficiently Composable Data Augmentation on the GPU with Jax

gpu jax pre-processing

Last synced: 13 Apr 2025

https://github.com/kvslab/vampy

A collection of tools for pre-processing, simulating, and post-processing vascular morphologies.

computational-fluid-dynamics post-processing pre-processing vascular

Last synced: 24 Dec 2025

https://github.com/parvvaresh/email-spam-detection

This project focuses on detecting Persian spam emails using machine learning algorithms. The goal is to develop an effective spam detection system using various word embedding techniques and classification algorithms. The project utilizes three word embedding algorithms: TF-IDF, Frequency of Words, and Bag of Words. Additionally, six classification

chi2 decision-trees freq freq-word hazm knn logistic-regression machine-learning naive-bayes-classifier numpy pandas pre-processing python python3 random-forest svm tf-idf

Last synced: 22 Apr 2025

https://github.com/goamegah/rshiny-machine-learning-application

Application for data exploration and supervised model training with RShiny

docker exploratory-data-analysis machine-learning pre-processing rshinyapp supervised-learning

Last synced: 15 Apr 2025

https://github.com/ribin-baby/audio-processing

👉 This repository contains basic audio 🔊 processing code with feature extraction explained. 🎶 🎶 🎶

audio-analysis audio-processing audio-visualizer feature-extraction mfcc-extractor mfcc-features pre-processing python3

Last synced: 13 Aug 2025

https://github.com/japal/maldirppa

MALDI mass spectrometry data robust pre-processing and other helper functions

mass-spectrometry pre-processing r-package

Last synced: 22 Oct 2025

https://github.com/signaln/parallelio

For reading from and writing to parallel data files in Python

machine-learning natural-language-processing pre-processing preprocessing text text-data

Last synced: 14 Jan 2026

https://github.com/parvvaresh/autocorrect

Have you ever wondered how the autocorrect feature works on a smartphone keyboard? Almost every smartphone brand, regardless of price, now provides an autocorrect feature in its keyboard.

edit-distance-algorithm jaccard-similarity jupyter-notebook nlp pre-processing python wer-score

Last synced: 24 Feb 2025

https://github.com/dantesc03/broachalign-machine-learning

Focuses on machine learning techniques for clustering and regression analysis. It explores real-world datasets to solve challenges and extract meaningful insights. Specifically, it addresses the critical task of predicting when to replace broaches used in manufacturing airplane engines.

lasso-regression learn linear-regression machine-learning pre-processing r-programming random-forest resampling tuning xgboost

Last synced: 26 Dec 2025

https://github.com/romilagarwal/diabetic_retinoplasty

A hybrid deep learning framework for automated diabetic retinopathy detection combining EfficientNetB0 with Swin Transformer attention mechanisms. Features Bayesian uncertainty quantification through Monte Carlo Dropout, explainable AI visualizations with Grad-CAM, and specialized preprocessing techniques.

bayesian-inference deep-learning efficientnet explianable-ai grad-cam-visualization monte-carlo-dropout pre-processing swim-transformer

Last synced: 20 Jun 2025

https://github.com/vineet416/chronic-kidney-disease-prediction

This repository contains the code for the Chronic Kidney Disease Detection Prediction Project. The goal of this project is to predict chronic kidney disease using parameters such as Diabetes Mellitus, Blood Urea, Sugar, and Hypertension. I used multiple machine learning algorithms with hyperparameter tuning, with a highest accuracy score of 97.5

data-visualization data-wrangling exploratory-data-analysis feature-engineering feature-selection hyperparameter-tuning machine-learning matplotlib numpy pandas plotly pre-processing python seaborn sklearn-library statsmodels

Last synced: 23 Mar 2025

https://github.com/curityio/example-dcr-request-validation

An example based on the Open Banking Brasil profile that demonstrates how to use a pre-processing procedure to validate a DCR request.

dynamic-client-registration financial-grade open-banking pre-processing software-statement use-case

Last synced: 25 Mar 2025

https://github.com/definetlynotai/peng

A transpiler that turns English-style Python into proper Python: a mini language of its own :)

beginner easy english freindly pre-processing python transpiler

Last synced: 08 Oct 2025

https://github.com/mahnoorsheikh16/loan-default-prediction

Credit risk is the borrower’s inability to repay a loan. Machine Learning models can predict risky customers and reduce lender losses. By analyzing behavior and demographics of past customers, these insights can apply to future customers for better loan decisions. This study aims to find the most suitable model for predicting loan defaults.

auc-score binary-classification-algorithms credit-card-fraud-detection data-cleaning data-science decision-tree-classifier exploratory-data-analysis loan-default-prediction logistic-regression machine-learning naive-bayes-classifier pre-processing python random-forest-classifier support-vector-machines xgboost-classifier

Last synced: 29 Nov 2025

https://github.com/lefteris-souflas/sas-programming-and-machine-learning

Applied SAS techniques for data analysis and machine learning in a milestone project. Base SAS Programming and SAS Viya tools were utilized for preprocessing, customer profiling, sales analysis, promotions, supplier evaluation, and customer segmentation. Results were visualized comprehensively.

customer-profiling data-analytics data-exploration market-basket-analysis pre-processing recency-frequency-monetary sas-machine-learning sas-oda sas-programming sas-studio sas-visual-analytics sas-viya

Last synced: 02 Mar 2025

https://git.gfz-potsdam.de/EnMAP/GFZ_Tools_EnMAP_BOX/EnPT

EnMAP Processing Tool - A Python package for pre-processing of EnMAP Level-1B data to Level-2A

EnMAP EnMAP-Box hyperspectral pre-processing processing chain remote sensing satellite

Last synced: 25 Sep 2025

https://github.com/mahnoorsheikh16/impact-of-minimum-wage-policies-on-youth-unemployment

This study analyzes the impact of minimum wage on youth unemployment in Pakistan, a country with the world's ninth-largest labor force, where 4 million young people reach working age each year, and how the minimum wage compares to the required living wage.

anker-methodology data-cleaning exploratory-data-analysis labour-economics linear-regression living-wage-calculator minimum-wage-data policy-analysis pre-processing r-language regression-analysis unemployment-analysis

Last synced: 26 Jan 2026

https://github.com/msamuelsons/classificao-estrela

Runs a complete process of star analysis and classification using different machine learning algorithms and cross-validation techniques to evaluate how effective these algorithms are at classifying stars.

data-science machine-learning machine-learning-algorithms pre-processing star

Last synced: 15 Mar 2025

https://github.com/othmanekahtal/nexter-grid-project

Nexter is a simple challenge for applying a grid system in a real project and building an awesome web page

animations css-html css3 flexbox grid-layout html5 pre-processing responsive sass-framework

Last synced: 15 Mar 2025

https://github.com/gtkacz/undergrad_thesis

Code for my undergraduate thesis: Quantitative Analysis of the Impact of Image Pre-Processing on the Accuracy of Computer Vision Models Trained to Identify Dermatological Skin Diseases

cnn computer-vision machine-learning pre-processing preprocessing

Last synced: 17 Aug 2025

https://github.com/olivierlima/infinity_process

Create an ∞ sequence of output extensions utilizing single image generation or GPT-4o text outputs.

computer-vision-opencv eventtime gpts gui-application image-manipulation microservice openai orchestration pre-processing process reduce spacy window workflows

Last synced: 29 Oct 2025

https://github.com/daniel-elston/credit-card-default-prediction-algorithm

Algorithm used to predict whether a bank customer will default on given credit cards, using a bank telemarketing dataset.

algorithms banking-applications classification data-science machine-learning pca-analysis pre-processing visualization wrangling-cleaning

Last synced: 04 Apr 2025

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 18 Sep 2025

https://github.com/yukta026/book_recommender_system

I have created a book recommender system that recommends similar books to readers based on their interests. This project shows the results of collaborative and content-based filtering on the given dataset.

collaborative-filtering content-based-filtering feature-extraction model-building pre-processing recommender-system

Last synced: 18 Nov 2025

https://github.com/ismailsimsek/storesalestimeseriesforecasting

Testing preprocessing capabilities of different ML libraries

ml-pipelines pre-processing

Last synced: 02 Mar 2025

https://github.com/pratikunterwegs/atlas-best-practices

Source code for a paper on pre-processing high-frequency animal tracking data.

animal-movement movement-ecology pre-processing r-package

Last synced: 24 Mar 2025

https://github.com/emriss0/tech-tweet

TechTweet is a microblogging platform for tech enthusiasts, allowing users to share short tech messages and engage in discussions. Join the community, post your thoughts, and connect with others! 🐙💻

analytics-vidhya cnn-classification compose cross-validation django hackathon headline-generation huggingface-transformers navigation nlp pre-processing sentiment-analysis social-media spacy text-classification tweet twitter-api twitter-python

Last synced: 17 Oct 2025

https://github.com/pulkit0111/book-recommendation-system

This project walks through building a smart book recommendation system powered by Large Language Models (LLMs).

gradio langchain pre-processing python semantic-search sentiment-analysis zero-shot-text-classification

Last synced: 12 Jun 2025

https://github.com/badranalyst/e-commerce-customer-analysis-data-science-foundations-case-study

This case study explores e-commerce customer data through data exploration, pre-processing, and splitting. It includes model building and training to analyze customer behavior. Python libraries like Pandas, NumPy, Matplotlib, and Seaborn are used for the analysis and model development.

data-analysis data-science dataset eda exploratory-data-analysis machine-learning matplotlib ml model-building model-training numpy pandas pre-processing python seaborn

Last synced: 11 Aug 2025

https://github.com/badranalyst/restaurant-reviews-sentiment-analysis-nlp-case-study

This project analyzes restaurant reviews using Natural Language Processing (NLP) for sentiment analysis. It covers data exploration, pre-processing (NLTK text cleaning), model building, prediction, and deployment. The goal is to predict sentiment from reviews using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

data-analysis data-science eda exploratory-data-analysis matplotlib-pyplot model model-building numpy pandas pre-processing predictive-modeling python seaborn

Last synced: 20 Feb 2025

https://github.com/harmanveer-2546/student-performance-in-exam

Student performance analysis and prediction using datasets has become an essential component of modern education systems. With the increasing availability of student data, schools and universities are using advanced analytics and machine learning algorithms to gain insights into student performance and predict future outcomes.

classification-report cluster dbscan dimensionality-reduction k-means-clustering labelling linear-regression logistic-regression numpy pandas pre-processing random-forest-classifier test-train-split unsupervised-machine-learning visualization

Last synced: 28 Feb 2025

https://github.com/clement-muth/generic-printf

This is an example of a printf function using the `_Generic` keyword

c cpp generic keyword pre-processing printf printf-functions

Last synced: 09 Oct 2025