Projects in Awesome Lists tagged with preprocessing
A curated list of projects in awesome lists tagged with preprocessing.
https://github.com/unstructured-io/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 09 Sep 2025
https://github.com/Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
data-pipelines deep-learning document-image-analysis document-image-processing document-parser document-parsing docx donut information-retrieval langchain llm machine-learning ml natural-language-processing nlp ocr pdf pdf-to-json pdf-to-text preprocessing
Last synced: 26 Mar 2025
https://github.com/dongrixinyu/JioNLP
A Chinese NLP preprocessing and parsing toolkit: accurate, efficient, and easy to use. www.jionlp.com
apache2 chinese natural-language-processing ner nlp nlp-parse preprocessing python time-parse time-parsing
Last synced: 18 Mar 2025
https://github.com/nidhaloff/igel
A delightful machine learning tool that allows you to train, test, and use models without writing code
artificial-intelligence automation automl automl-experiments data-analysis data-science hacktoberfest hacktoberfest2021 machine-learning machine-learning-algorithms machine-learning-library machinelearning neural-network neural-networks preprocessing scikit-learn scikitlearn-machine-learning sklearn
Last synced: 15 May 2025
https://github.com/opengene/fastp
An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
adapter bioinformatics duplication fastq filter filtering illumina merging ngs overlap polyg preprocessing qc quality quality-control sequencing splitting trimming umi
Last synced: 29 Apr 2025
https://github.com/OpenGene/fastp
An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
adapter bioinformatics duplication fastq filter filtering illumina merging ngs overlap polyg preprocessing qc quality quality-control sequencing splitting trimming umi
Last synced: 07 May 2025
https://github.com/axelderomblay/mlbox
MLBox is a powerful Automated Machine Learning python library.
auto-ml automated-machine-learning automl classification data-science deep-learning distributed drift encoding kaggle keras lightgbm machine-learning optimization pipeline prediction preprocessing regression stacking xgboost
Last synced: 15 May 2025
https://github.com/AxeldeRomblay/MLBox
MLBox is a powerful Automated Machine Learning python library.
auto-ml automated-machine-learning automl classification data-science deep-learning distributed drift encoding kaggle keras lightgbm machine-learning optimization pipeline prediction preprocessing regression stacking xgboost
Last synced: 26 Apr 2025
https://github.com/winedarksea/autots
Automated Time Series Forecasting
automl autots deep-learning feature-engineering forecasting machine-learning preprocessing time-series
Last synced: 14 May 2025
https://github.com/winedarksea/AutoTS
Automated Time Series Forecasting
automl autots deep-learning feature-engineering forecasting machine-learning preprocessing time-series
Last synced: 26 Mar 2025
https://github.com/nvidia-merlin/nvtabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
deep-learning feature-engineering feature-selection gpu machine-learning nvidia preprocessing recommendation-system recommender-system
Last synced: 14 May 2025
https://github.com/KinWaiCheuk/nnAudio
Audio processing using PyTorch 1D convolution networks
1d-convolution audio-processing cqt-spectrogram melspectrogram neural-network preprocessing pytorch spectrogram spectrogram-conversion-toolbox stft
Last synced: 14 Jul 2025
https://github.com/NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
deep-learning feature-engineering feature-selection gpu machine-learning nvidia preprocessing recommendation-system recommender-system
Last synced: 01 May 2025
https://github.com/kinwaicheuk/nnaudio
Audio processing using PyTorch 1D convolution networks
1d-convolution audio-processing cqt-spectrogram melspectrogram neural-network preprocessing pytorch spectrogram spectrogram-conversion-toolbox stft
Last synced: 15 May 2025
https://github.com/TheAlgorithms/R
Collection of various algorithms implemented in R.
algorithm algorithms classification clustering data-mining datamanipulation education hacktoberfest learning machine-learning practice preprocessing r r-language r-programming regression
Last synced: 29 Jul 2025
https://github.com/thealgorithms/r
Collection of various algorithms implemented in R.
algorithm algorithms classification clustering data-mining datamanipulation education hacktoberfest learning machine-learning practice preprocessing r r-language r-programming regression
Last synced: 14 May 2025
https://github.com/sunlabuiuc/PyHealth
A Deep Learning Python Toolkit for Healthcare Applications.
clinical-data clinical-research data-mining deep-learning electronic-health-record electronic-medical-record healthcare medical-code preprocessing
Last synced: 10 May 2025
https://github.com/pytorch/torcharrow
High performance model preprocessing library on PyTorch
Last synced: 19 Oct 2025
https://github.com/r1j1t/contextualspellcheck
✔️ Contextual word checker for better suggestions (not actively maintained)
bert chatbot help-wanted natural-language-processing nlp oov preprocessing python python-spelling-corrector spacy spacy-extension spellcheck spellchecker spelling-correction spelling-corrections
Last synced: 13 Apr 2025
https://github.com/msamogh/nonechucks
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch
Last synced: 07 May 2025
https://github.com/MaxHalford/xam
🎯 Personal data science and machine learning toolbox
data-science machine-learning preprocessing python stacking
Last synced: 08 May 2025
https://github.com/maxhalford/xam
🎯 Personal data science and machine learning toolbox
data-science machine-learning preprocessing python stacking
Last synced: 19 Aug 2025
https://github.com/datacanvasio/hypergbm
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 15 May 2025
https://github.com/DataCanvasIO/HyperGBM
A full pipeline AutoML tool for tabular data
adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost
Last synced: 09 May 2025
https://github.com/ikegami-yukino/jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
character-converter japanese-kana japanese-language julius preprocessing pure-python text-processing transliteration
Last synced: 14 May 2025
https://github.com/advaitsave/Introduction-to-Time-Series-forecasting-Python
Introduction to time series preprocessing and forecasting in Python using AR, MA, ARMA, ARIMA, SARIMA and Prophet model with forecast evaluation.
arima arma dickey-fuller forecast-evaluation forecasting preprocessing prophet-model python sarima seasonality series-forecasting-python series-preprocessing stationarity time-series time-series-forecasting
Last synced: 26 Mar 2025
https://github.com/ikegami-yukino/neologdn
Japanese text normalizer for mecab-neologd
japanese-language mecab-ipadic-neologd nlp preprocessing text-normalization
Last synced: 12 Mar 2025
https://github.com/dunky11/voicesmith
[WIP] VoiceSmith makes training text to speech models easy.
dataset-manager delightfultts preprocessing speech-synthesis text-to-speech toolkit tts univnet voice-cloning
Last synced: 06 May 2025
https://github.com/jbusecke/xMIP
Analysis ready CMIP6 data in python the easy way with pangeo tools.
analysis-ready-data climate-analysis climate-models cmip6 cmip6-data pangeo preprocessing xgcm
Last synced: 20 Jul 2025
https://github.com/jbusecke/xmip
Analysis ready CMIP6 data in python the easy way with pangeo tools.
analysis-ready-data climate-analysis climate-models cmip6 cmip6-data pangeo preprocessing xgcm
Last synced: 12 Dec 2025
https://github.com/ropensci/modistsp
An "R" package for automatic download and preprocessing of MODIS Land Products Time Series
gdal modis modis-data modis-land-products peer-reviewed preprocessing r r-package remote-sensing rstats satellite-imagery time-series
Last synced: 05 Apr 2025
https://github.com/mlr-org/mlr3pipelines
Dataflow Programming for Machine Learning in R
bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing r r-package stacking
Last synced: 15 May 2025
https://github.com/githubharald/deslantimg
The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.
c-plus-plus gpu handwriting-recognition image-processing ocr opencl opencv preprocessing python
Last synced: 14 May 2025
https://github.com/chakki-works/chariot
Deliver the ready-to-train data to your NLP model.
keras natural-language-processing preprocessing python tensorflow
Last synced: 17 Mar 2025
https://github.com/lozuwa/impy
Impy is a Python3 library with features that help you in your computer vision tasks.
dataset exploratory-data-analysis machine-learning preprocessing raw-data statistics tidy-data
Last synced: 02 Apr 2025
https://github.com/chrise96/3D_Ground_Segmentation
A ground segmentation algorithm for 3D point clouds based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications”, D. Zermas, I. Izzat and N. Papanikolopoulos, 2017. Distinguish between road and non-road points. Road surface extraction. Plane fit ground filter
cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface
Last synced: 19 Mar 2025
https://github.com/madyankin/postcss-each
PostCSS plugin to iterate through values
css iteration postcss preprocessing
Last synced: 27 Apr 2025
https://github.com/kharchenkolab/dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data
pipeline preprocessing scrna-seq single-cell-rna-seq
Last synced: 09 Apr 2025
https://github.com/nipreps/dmriprep
dMRIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.
bids bids-apps diffusion-mri magnetic-resonance-imaging preprocessing
Last synced: 06 May 2025
https://github.com/elcorto/pwtools
pwtools is a Python package for pre- and postprocessing of atomistic calculations, mostly targeted to Quantum Espresso, CPMD, CP2K and LAMMPS. It is almost, but not quite, entirely unlike ASE, with some tools extending numpy/scipy. It has a set of powerful parsers and data types for storing calculation data.
ase cp2k cpmd kernel-regression kernel-ridge-regression lammps molecular-dynamics multivariate-regression parameter-sweep polynomial-regression postprocessing preprocessing python quantum-espresso quasi-harmonic-approximation radial-basis-function radial-distribution-function radial-pair-correlation-function sqlite
Last synced: 14 Oct 2025
https://github.com/takelab/podium
Podium: a framework agnostic Python NLP library for data loading and preprocessing
data-loading datasets natural-language-processing nlp preprocessing python
Last synced: 27 Jul 2025
https://github.com/lucasrla/wsi-preprocessing
Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) towards deep learning
fastai histopathology libvips openslide pathology preprocessing pytorch pyvips whole-slide-imaging wsi
Last synced: 12 Apr 2025
https://github.com/vincentstimper/mclahe
NumPy and Tensorflow implementation of the Multidimensional Contrast Limited Adaptive Histogram Equalization (MCLAHE) procedure
contrast-enhancement histogram-equalization multidimensional-data preprocessing
Last synced: 14 Oct 2025
https://github.com/paulross/cpip
CPIP - a C/C++ preprocessor implemented in Python.
c c-plus-plus pre-processing pre-processor preprocessing preprocessor python
Last synced: 07 Apr 2025
https://github.com/l-ramirez-lopez/prospectr
R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data
chemometrics derivatives infrared near-infrared nir pedometrics preprocessing r r-package resample sampling signal soil-spectroscopy spectroscopy
Last synced: 22 Oct 2025
https://github.com/data-science-lab-amsterdam/skippa
SciKIt-learn Pipeline in PAndas
data-science machine-learning pandas pandas-dataframe pipeline preprocessing python scikit-learn sklearn
Last synced: 08 Sep 2025
https://github.com/silentflame/named-entity-recognition
Corpus and a baseline neural network system for Named Entity Recognition in Hindi-English Code-Mixed social media text.
acl-news2018 crfsuite csv dataset dataset-generation decision-trees f1-score hindi-english lstm-neural-networks named-entity-recognition ner ner-tags neural-network nlp-machine-learning preprocessing python research-paper social-media stats tweets
Last synced: 26 Sep 2025
https://github.com/bids-apps/freesurfer
BIDS app wrapping recon-all from FreeSurfer
anatomical-mri bids bidsapp mri preprocessing
Last synced: 29 Apr 2025
https://github.com/fitushar/brain-tissue-segmentation-using-deep-learning-pipeline-neuronet
This repository holds the final project for the MISA course, on brain tissue segmentation. It adopts NeuroNet, a comprehensive brain image segmentation tool based on a novel multi-output CNN architecture, which has been trained and tuned using the IBSR18 dataset.
3d 3dfcn brain brain-tissue-segmentation cnn-architecture dice neuronet preprocessing registration segmentation
Last synced: 22 Apr 2025
https://github.com/daniellwdb/roka
🤖 Rise of Kingdoms bot to manage kingdom titles and DKP through Discord.
adb automation discord-bot ocr preprocessing rise-of-kingdoms
Last synced: 23 Jun 2025
https://github.com/fareedkhan-dev/most-powerful-nlp-library
Gemini, as capable as GPT-4, provides a free API with limited access. I tested it with the help of prompt engineering and found that it can solve almost any NLP task you want to tackle.
api gemini large-language-models llm nlp nlp-library preprocessing python
Last synced: 07 Sep 2025
https://github.com/bids-apps/HCPPipelines
A BIDS App for minimal preprocessing using the HCP Pipelines
anatomical-mri bids bidsapp functional-mri mri preprocessing
Last synced: 29 Apr 2025
https://github.com/fkie-cad/logprep
Log data preprocessing, generation, and shipping in Python
etl kafka log logdata loggenerator logshipper opensearch preprocessing python soar sre
Last synced: 20 Aug 2025
https://github.com/maruedt/chemometrics
Python library for chemometric data analysis
chemometrics data-analysis ihm mcr mvda pca pls preprocessing python scikit-learn spectroscopy statistics
Last synced: 10 Apr 2025
https://github.com/juliaml/mllabelutils.jl
Utility package for working with classification targets and label-encodings
classification julia machine-learning preprocessing
Last synced: 05 Jul 2025
https://github.com/intuition-dev/intuition
Intuition v1. CLI for Pug, CRUD and docs/blogs as staticGen, and much more.
component low-code markdown preprocessing pug seo static-site-generator web webapp
Last synced: 10 Apr 2025
https://github.com/hscspring/pnlp
NLP pre-/post-processing tools.
chinese-nlp concurrency nlp nlp-enhancer nlp-preprocess normalization preprocessing text-cleaning text-extraction text-length text-processing
Last synced: 27 Sep 2025
https://github.com/vasisouv/tweets-preprocessor
Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team
nltk preprocessing python spacy spacy-nlp twitter
Last synced: 14 Aug 2025
https://github.com/lucasrla/wsi-tile-cleanup
Image filters for digital pathology: detect pen marks, background, and artifacts. Use them for preprocessing towards deep learning
deep-learning fastai histopathology libvips otsu-threshold pathology preprocessing pytorch pyvips whole-slide-imaging wsi
Last synced: 12 Apr 2025
https://github.com/akb89/pyfn
A python module to process data for Frame Semantic Parsing
coling2018 frame-semantic-parsing framenet framenet-xml-data open-sesame pipeline preprocessing semafor
Last synced: 18 Sep 2025
https://github.com/nobodywasishere/vhdlproc
VHDLproc is a VHDL preprocessor
preprocessing python vhdl vhdl-preprocessor
Last synced: 24 Apr 2025
https://github.com/strubell/preprocess-conll05
Scripts for preprocessing the CoNLL-2005 SRL dataset.
conll-2005 dataset nlp nlp-resources preprocessing semantic-role-labeling
Last synced: 08 Sep 2025
https://github.com/justinshenk/simages
Find duplicates and similar images in a folder
autoencoder duplicate-detection images preprocessing similarity-detection
Last synced: 11 Oct 2025
https://github.com/banditml/faucetml
High speed mini-batch data reading & preprocessing from BigQuery.
bigquery feature-engineering features machine-learning ml preprocessing pytorch
Last synced: 28 Jul 2025
https://github.com/cea-list/rpcdataloader
A variant of the PyTorch Dataloader using remote workers.
data-science dataloader distributed-computing hpc machine-learning preprocessing pytorch slurm
Last synced: 21 Jun 2025
https://github.com/louisbrulenaudet/docutron
Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents.
cv2 detecron2 detection document legal legaltech legaltools llm machine-learning nlp ocr ocr-recognition preprocessing
Last synced: 14 Jul 2025
https://github.com/Neurita/pypes
Reusable neuroimaging pipelines using nipype
dti fmri ica neuroimaging nipype pet plotting preprocessing
Last synced: 01 May 2025
https://github.com/niklaswais/gesp
court-decisions preprocessing web-scraping
Last synced: 14 May 2025
https://github.com/evernext10/hand-gesture-recognition-machine-learning
Automatic method for recognizing hand gestures to categorize vowels and numbers in Colombian Sign Language, using neural networks (perceptrons), support vector machines, and k-nearest neighbors as classifiers
artificial-intelligence colombian-sign-language colombian-signal-language f1-score feature-extraction gesture hand knearest-neighbor-classifier knn-classification knn-classifier lsc machine-learning machinelearning neural-network precision preprocessing recall recognition signal-processing support-vector-machines
Last synced: 22 Apr 2025
https://github.com/bids-apps/CPAC
BIDS Application for the Configurable Pipeline for the Analysis of Connectomes (C-PAC)
bids bidsapp mri preprocessing
Last synced: 29 Apr 2025
https://github.com/saichandrareddy1/oxygenjs
A JavaScript library for numerical computing and machine learning
algebra javascript machine machine-learning machine-learning-algorithms maths matrix numerical-methods preprocessing
Last synced: 28 Oct 2025
https://github.com/lydialucchesi/smallsets
Visual documentation for data preprocessing in R and Python
data-science data-visualization documentation-tool machine-learning preprocessing python r r-package visualization-tools
Last synced: 24 Jun 2025
https://github.com/uranusx86/random-erasing-tensorflow
A complete Tensorflow implementation of cutout random erasing (without numpy)
argumentation deep-learning image-processing neural-network paper-implementations preprocessing python tensorflow tensorflow-experiments
Last synced: 24 Oct 2025
https://github.com/yeonghyeon/preprocessing-method-for-stemi-detection
Official source code of "Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG"
cnn convolutional-neural-network ecg electrocardiogram enhancement highpass-filter improvement lead notch-filter preprocessing python qrs-complex stemi-detection voting
Last synced: 26 Apr 2025
https://github.com/adobe-research/beacon-aug
Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms
albumentation augly augmentation beacon conversion cross-platform deep-learning gan imgaug mmcv preprocessing transformations
Last synced: 10 Apr 2025
https://github.com/marrow/dsl
A Pythonic DSL construction engine for import-time code translation.
cpython dsl preprocessing preprocessor pypy python python-2 python-3 text-processing
Last synced: 27 Jun 2025
https://github.com/deepraj1729/tchatbot-api
A Flask REST API to serve trained ChatBots using Tensorflow Serving and Docker Containers
api-rest chatbot deep-learning flask flask-restful framwork keras nlp preprocessing requests tensorflow tf-serving
Last synced: 01 May 2025
https://github.com/yoctol/text-normalizer
Normalize text string
natural-language-processing preprocessing pypi
Last synced: 28 Apr 2025
https://github.com/riccorl/ipa
NLP Preprocessing Pipeline Wrappers
lemmatization model natural-language-processing nlp part-of-speech-tagger pipeline preprocessing spacy stanza tagging token tokenizer wrapper
Last synced: 12 Apr 2025
https://github.com/gianlucatruda/warfit-learn
A machine learning toolkit for reproducible research in anticoagulant dose estimation.
data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn
Last synced: 24 Oct 2025
https://github.com/chrislemke/sk-transformers
A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠
data-science feature-engineering feature-selection machine-learning pandas preprocessing python scikit-learn scikit-learn-pipelines scikit-learn-transformer
Last synced: 17 Jun 2025
https://github.com/stanstrup/qc4metabolomics
QC systems for metabolomics studies
lc-ms metabolomics metabolomics-studies preprocessing qc-systems xcms
Last synced: 15 Apr 2025
https://github.com/alexchristensen/semnetcleaner
An Automated Cleaning Tool for Semantic and Linguistic Data
preprocessing r semantic-network-analysis
Last synced: 11 Apr 2025
https://github.com/nottruefalse/captcha_solving
All about creating a dataset, preprocessing images, and creating an actual model to solve captcha
ai captcha-solver keras-models nodejs preprocessing python3 svg-captcha tensorflow
Last synced: 27 Jul 2025
https://github.com/francoisschwarzentruber/abcd
A simple ASCII format to represent music scores, and a music score editor
abc abcjs ascii constraint-satisfaction-problem lilypond linear-programming markdown midi music music-composition music-notation music-notation-format music-score preprocessing simple-app
Last synced: 13 Sep 2025
https://github.com/huangzhii/tsunami
An R software for Gene Co-Expression Analysis
co-expression gene preprocessing
Last synced: 15 May 2025
https://github.com/miferreiro/bdpar
Big Data Preprocessing Architecture
custom-flow custom-pipes preprocessing r r6
Last synced: 12 Jun 2025
https://github.com/james77777778/keras-aug
A library that includes pure TF/Keras preprocessing and augmentation layers, providing support for various data types such as images, labels, bounding boxes, segmentation masks, and more.
augmentation keras keras-cv preprocessing tensorflow
Last synced: 10 Apr 2025
https://github.com/tirendazacademy/r-programming-tutorial
Topics covered in an R tutorial delivered as a single YouTube video.
data-analysis data-science data-structures data-visualization ggplot2 logistics preprocessing r-programming r-programming-projects r-projects regression rstudio
Last synced: 21 Feb 2025
https://github.com/khaledashrafh/logistic-regression
This program implements logistic regression from scratch using the gradient descent algorithm in Python to predict whether customers will purchase a new car based on their age and salary.
activation-function cost-function data-preprocessing logistic-regression model preprocessing regression-models sigmoid sigmoid-activation sigmoid-function
Last synced: 17 Oct 2025
https://github.com/bencardoen/datacurator.jl
A scalable Julia package to transparently validate and transform large biomedical datasets using human readable recipes that are translated to machine verifiable templates.
julia julia-package portable postprocessing preprocessing reproducible-research scalability
Last synced: 30 Jun 2025
https://github.com/boudinfl/semeval-2010-pre
Preprocessed SemEval-2010 benchmark dataset for keyphrase extraction
dataset information-retrieval keyphrase-extraction natural-language-processing preprocessing
Last synced: 24 Mar 2025
https://github.com/karan-malik/prepdata
Automating the process of Data Preprocessing for Data Science
classification data dataanalysis dataframe datapreprocessing datascience machine-learning numpy pandas pip preprocessing pypi-package python python3 random-forest regress sklearn
Last synced: 13 Apr 2025
https://github.com/birddevelper/scanneddocumentpreprocessing
Python code snippet for preprocessing scanned documents
classification denoising image-processing machine-learning ocr opencv preprocessing python
Last synced: 28 Apr 2025
https://github.com/bcbi/preprocessmd.jl
Medically-informed data preprocessing for machine learning
julia machine-learning omop preprocessing
Last synced: 03 Aug 2025
https://github.com/nipreps/nondefaced-detector
Identify non-defaced datasets before publication
mri neuroimaging preprocessing pretrained-models reproducibility tensorflow-models
Last synced: 29 Mar 2025
https://github.com/abtinz/machine-learning-with-python
Machine Learning with Python in Jupyter
data-mining data-science fuzzy-logic machine-learning matplotlib numpy pandas preprocessing regression
Last synced: 29 Jul 2025
https://github.com/khannatanmai/rule-based-preprocessing-mt
Rule-based pre-processing of non-compositional constructions to simplify them and improve black-box machine translation
construction-grammar machine-translation preprocessing rule-based
Last synced: 03 Aug 2025
https://github.com/m-clark/tidyext
Extensions and extras for tidy processing.
datapreprocessing dplyr group-by head missing-data onehot-encoder prediction preprocessing r rounding sparse-matrix summary summary-statistics tail tidyr tidyverse
Last synced: 30 Apr 2025