An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with preprocessing

A curated list of projects in awesome lists tagged with preprocessing.

https://github.com/dongrixinyu/JioNLP

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

apache2 chinese natural-language-processing ner nlp nlp-parse preprocessing python time-parse time-parsing

Last synced: 18 Mar 2025

https://github.com/opengene/fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)

adapter bioinformatics duplication fastq filter filtering illumina merging ngs overlap polyg preprocessing qc quality quality-control sequencing splitting trimming umi

Last synced: 29 Apr 2025
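
fastp's trimming features include sliding-window quality cutting of reads. The Python sketch below illustrates only that general idea and is not fastp's code; fastp's exact cutting rule may differ. It decodes a Phred+33 quality string and shrinks the read from its 3' end while the mean quality of the trailing window stays below a threshold. The function names, the 4-base window and the Q20 threshold are illustrative assumptions.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]


def cut_tail(seq: str, qual: str, window: int = 4,
             min_mean_q: float = 20.0) -> tuple[str, str]:
    """Hypothetical helper: drop bases from the 3' end, one at a time,
    while the mean quality of the last `window` bases is below `min_mean_q`."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0:
        win = scores[max(0, end - window):end]
        if sum(win) / len(win) >= min_mean_q:
            break  # the trailing window is good enough; stop trimming
        end -= 1
    return seq[:end], qual[:end]


# Six Q40 bases ("I") followed by four Q2 bases ("#")
print(cut_tail("ACGTACGTAC", "IIIIII####"))  # ('ACGTACGT', 'IIIIII##')
```

Because the rule looks at a window mean rather than at single bases, up to window - 1 low-quality bases can survive at the new 3' end, as the two "#" bases in the example show.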

https://github.com/nvidia-merlin/nvtabular

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

deep-learning feature-engineering feature-selection gpu machine-learning nvidia preprocessing recommendation-system recommender-system

Last synced: 14 May 2025

https://github.com/pytorch/torcharrow

High-performance model preprocessing library on PyTorch

preprocessing python pytorch

Last synced: 19 Oct 2025

https://github.com/msamogh/nonechucks

Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

data-cleaning data-pipeline data-preprocessing data-processing machine-learning preprocessing pytorch torch

Last synced: 07 May 2025

https://github.com/MaxHalford/xam

🎯 Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 08 May 2025

https://github.com/ikegami-yukino/jaconv

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

character-converter japanese-kana japanese-language julius preprocessing pure-python text-processing transliteration

Last synced: 14 May 2025
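
The Hiragana/Katakana part of such a converter can rest on a regularity in Unicode: the Hiragana letters U+3041 to U+3096 and the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points apart. The Python sketch below shows that idea only; it is not jaconv's implementation, the function names are hypothetical, and it ignores the Hankaku/Zenkaku (half-width/full-width) conversions jaconv also covers.

```python
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096  # ぁ .. ゖ
KANA_OFFSET = 0x60  # distance from each Hiragana letter to its Katakana counterpart


def hira_to_kata(text: str) -> str:
    """Convert Hiragana letters to Katakana; leave every other character unchanged."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


def kata_to_hira(text: str) -> str:
    """Inverse mapping for the Katakana letters U+30A1 to U+30F6."""
    lo, hi = HIRAGANA_FIRST + KANA_OFFSET, HIRAGANA_LAST + KANA_OFFSET
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if lo <= ord(ch) <= hi else ch
        for ch in text
    )


print(hira_to_kata("ひらがな"))  # ヒラガナ
print(kata_to_hira("カタカナ"))  # かたかな
```

Characters outside the two ranges, such as Latin letters or the prolonged sound mark ー, pass through untouched.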

https://github.com/advaitsave/Introduction-to-Time-Series-forecasting-Python

Introduction to time series preprocessing and forecasting in Python using AR, MA, ARMA, ARIMA, SARIMA and Prophet model with forecast evaluation.

arima arma dickey-fuller forecast-evaluation forecasting preprocessing prophet-model python sarima seasonality series-forecasting-python series-preprocessing stationarity time-series time-series-forecasting

Last synced: 26 Mar 2025

https://github.com/dunky11/voicesmith

[WIP] VoiceSmith makes training text-to-speech models easy.

dataset-manager delightfultts preprocessing speech-synthesis text-to-speech toolkit tts univnet voice-cloning

Last synced: 06 May 2025

https://github.com/jbusecke/xMIP

Analysis ready CMIP6 data in python the easy way with pangeo tools.

analysis-ready-data climate-analysis climate-models cmip6 cmip6-data pangeo preprocessing xgcm

Last synced: 20 Jul 2025

https://github.com/ropensci/modistsp

An R package for automatic download and preprocessing of MODIS Land Products time series

gdal modis modis-data modis-land-products peer-reviewed preprocessing r r-package remote-sensing rstats satellite-imagery time-series

Last synced: 05 Apr 2025

https://github.com/githubharald/deslantimg

The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.

c-plus-plus gpu handwriting-recognition image-processing ocr opencl opencv preprocessing python

Last synced: 14 May 2025

https://github.com/chakki-works/chariot

Deliver the ready-to-train data to your NLP model.

keras natural-language-processing preprocessing python tensorflow

Last synced: 17 Mar 2025

https://github.com/lozuwa/impy

Impy is a Python3 library with features that help you in your computer vision tasks.

dataset exploratory-data-analysis machine-learning preprocessing raw-data statistics tidy-data

Last synced: 02 Apr 2025

https://github.com/chrise96/3D_Ground_Segmentation

A ground segmentation algorithm for 3D point clouds, based on the work described in “Fast segmentation of 3D point clouds: a paradigm on LIDAR data for Autonomous Vehicle Applications” (D. Zermas, I. Izzat and N. Papanikolopoulos, 2017). It distinguishes road from non-road points for road surface extraction, using a plane-fit ground filter.

cpp extraction ground ground-segmentation lastools lidar non-ground point-cloud preprocessing road-surface

Last synced: 19 Mar 2025
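
A plane-fit ground filter can be illustrated as follows. This Python sketch is a simplified variant and not the repository's code: it picks seed points near the mean height of the lowest points, fits z = a*x + b*y + c by vertical least squares, then re-labels every point by its distance to that plane and repeats. The published algorithm includes further details (for example, how the plane is estimated and how the cloud is split into segments) that this sketch omits. All names and thresholds here are hypothetical.

```python
Point = tuple[float, float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(pts: list[Point]) -> tuple[float, float, float]:
    """Least-squares fit of z = a*x + b*y + c via the 3x3 normal equations (Cramer's rule)."""
    n = float(len(pts))
    sx = sum(x for x, _, _ in pts)
    sy = sum(y for _, y, _ in pts)
    sz = sum(z for _, _, z in pts)
    sxx = sum(x * x for x, _, _ in pts)
    syy = sum(y * y for _, y, _ in pts)
    sxy = sum(x * y for x, y, _ in pts)
    sxz = sum(x * z for x, _, z in pts)
    syz = sum(y * z for _, y, z in pts)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = _det3(A)
    if abs(d) < 1e-12:
        raise ValueError("degenerate seed set (e.g. collinear points)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]  # replace one column with the right-hand side
        coeffs.append(_det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def segment_ground(points: list[Point], n_lpr: int = 10, seed_th: float = 0.2,
                   dist_th: float = 0.1, iters: int = 3) -> tuple[list[Point], list[Point]]:
    """Split points into (ground, non_ground) with an iterative plane fit."""
    # Seeds: points close to the mean height of the n_lpr lowest points.
    lowest = sorted(points, key=lambda p: p[2])[:n_lpr]
    lpr = sum(p[2] for p in lowest) / len(lowest)
    ground = [p for p in points if p[2] < lpr + seed_th]
    non_ground = [p for p in points if p[2] >= lpr + seed_th]
    for _ in range(iters):
        a, b, c = fit_plane(ground)
        ground, non_ground = [], []
        for p in points:
            residual = abs(p[2] - (a * p[0] + b * p[1] + c))  # vertical distance to plane
            (ground if residual <= dist_th else non_ground).append(p)
    return ground, non_ground


# A gently sloped 5x5 ground grid plus two raised "obstacle" points
cloud = [(float(x), float(y), 0.01 * x) for x in range(5) for y in range(5)]
cloud += [(2.0, 2.0, 2.0), (3.0, 1.0, 1.5)]
ground, non_ground = segment_ground(cloud)
print(len(ground), non_ground)  # 25 [(2.0, 2.0, 2.0), (3.0, 1.0, 1.5)]
```

Vertical least squares keeps the sketch dependency-free; for steep terrain an orthogonal (e.g. SVD-based) plane fit would be the more robust choice.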

https://github.com/madyankin/postcss-each

PostCSS plugin to iterate through values

css iteration postcss preprocessing

Last synced: 27 Apr 2025

https://github.com/kharchenkolab/dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data

pipeline preprocessing scrna-seq single-cell-rna-seq

Last synced: 09 Apr 2025

https://github.com/nipreps/dmriprep

dMRIPrep is a robust and easy-to-use pipeline for preprocessing diverse dMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.

bids bids-apps diffusion-mri magnetic-resonance-imaging preprocessing

Last synced: 06 May 2025

https://github.com/elcorto/pwtools

pwtools is a Python package for pre- and postprocessing of atomistic calculations, mostly targeted to Quantum Espresso, CPMD, CP2K and LAMMPS. It is almost, but not quite, entirely unlike ASE, with some tools extending numpy/scipy. It has a set of powerful parsers and data types for storing calculation data.

ase cp2k cpmd kernel-regression kernel-ridge-regression lammps molecular-dynamics multivariate-regression parameter-sweep polynomial-regression postprocessing preprocessing python quantum-espresso quasi-harmonic-approximation radial-basis-function radial-distribution-function radial-pair-correlation-function sqlite

Last synced: 14 Oct 2025

https://github.com/takelab/podium

Podium: a framework-agnostic Python NLP library for data loading and preprocessing

data-loading datasets natural-language-processing nlp preprocessing python

Last synced: 27 Jul 2025

https://github.com/lucasrla/wsi-preprocessing

Simple library for preprocessing histopathological whole-slide images (WSI) into tiles (a.k.a. patches) for deep learning

fastai histopathology libvips openslide pathology preprocessing pytorch pyvips whole-slide-imaging wsi

Last synced: 12 Apr 2025

https://github.com/vincentstimper/mclahe

NumPy and Tensorflow implementation of the Multidimensional Contrast Limited Adaptive Histogram Equalization (MCLAHE) procedure

contrast-enhancement histogram-equalization multidimensional-data preprocessing

Last synced: 14 Oct 2025

https://github.com/paulross/cpip

CPIP - a C/C++ preprocessor implemented in Python.

c c-plus-plus pre-processing pre-processor preprocessing preprocessor python

Last synced: 07 Apr 2025

https://github.com/l-ramirez-lopez/prospectr

R package: Misc. Functions for Processing and Sample Selection of Spectroscopic Data

chemometrics derivatives infrared near-infrared nir pedometrics preprocessing r r-package resample sampling signal soil-spectroscopy spectroscopy

Last synced: 22 Oct 2025

https://github.com/bids-apps/freesurfer

BIDS app wrapping recon-all from FreeSurfer

anatomical-mri bids bidsapp mri preprocessing

Last synced: 29 Apr 2025

https://github.com/fitushar/brain-tissue-segmentation-using-deep-learning-pipeline-neuronet

This repository is for the MISA course final project on brain tissue segmentation. We adopt NeuroNet, a comprehensive brain image segmentation tool based on a novel multi-output CNN architecture, which has been trained and tuned using the IBSR18 dataset.

3d 3dfcn brain brain-tissue-segmentation cnn-architecture dice neuronet preprocessing registration segmentation

Last synced: 22 Apr 2025

https://github.com/daniellwdb/roka

🤖 Rise of Kingdoms bot to manage kingdom titles and DKP through Discord.

adb automation discord-bot ocr preprocessing rise-of-kingdoms

Last synced: 23 Jun 2025

https://github.com/fareedkhan-dev/most-powerful-nlp-library

Gemini, as capable as GPT-4, provides a free API with limited access. I tested it with the help of prompt engineering and found that it can solve almost any NLP task you want to tackle.

api gemini large-language-models llm nlp nlp-library preprocessing python

Last synced: 07 Sep 2025

https://github.com/bids-apps/HCPPipelines

A BIDS App for minimal preprocessing using the HCP Pipelines

anatomical-mri bids bidsapp functional-mri mri preprocessing

Last synced: 29 Apr 2025

https://github.com/fkie-cad/logprep

Log data preprocessing, generation, and shipping in Python

etl kafka log logdata loggenerator logshipper opensearch preprocessing python soar sre

Last synced: 20 Aug 2025

https://github.com/juliaml/mllabelutils.jl

Utility package for working with classification targets and label-encodings

classification julia machine-learning preprocessing

Last synced: 05 Jul 2025
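
MLLabelUtils.jl is a Julia package, and its API is not reproduced here. To show what converting between label encodings means in general, the Python sketch below maps arbitrary class labels to integer indices and then to one-hot rows. The function names are hypothetical, indices are 0-based (Julia code would typically use 1-based), and ordering classes by first appearance is an assumption; sorting them is another common convention.

```python
from typing import Hashable, Sequence


def label_encode(targets: Sequence[Hashable]) -> tuple[list[Hashable], list[int]]:
    """Map each distinct label to an integer index, in order of first appearance."""
    labels = list(dict.fromkeys(targets))  # deduplicate while keeping order
    index = {label: i for i, label in enumerate(labels)}
    return labels, [index[t] for t in targets]


def one_hot(indices: Sequence[int], num_classes: int) -> list[list[int]]:
    """Expand integer class indices into one-hot rows."""
    return [[1 if j == i else 0 for j in range(num_classes)] for i in indices]


labels, idx = label_encode(["cat", "dog", "cat", "bird"])
print(labels, idx)                # ['cat', 'dog', 'bird'] [0, 1, 0, 2]
print(one_hot(idx, len(labels)))  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```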

https://github.com/intuition-dev/intuition

Intuition v1. CLI for Pug, CRUD and docs/blogs as staticGen, and much more.

component low-code markdown preprocessing pug seo static-site-generator web webapp

Last synced: 10 Apr 2025

https://github.com/vasisouv/tweets-preprocessor

Repo containing the Twitter preprocessor module, developed by the AUTH OSWinds team

nltk preprocessing python spacy spacy-nlp twitter

Last synced: 14 Aug 2025

https://github.com/lucasrla/wsi-tile-cleanup

Image filters for digital pathology: detect pen marks, background, and artifacts. Use them to preprocess images for deep learning.

deep-learning fastai histopathology libvips otsu-threshold pathology preprocessing pytorch pyvips whole-slide-imaging wsi

Last synced: 12 Apr 2025

https://github.com/akb89/pyfn

A Python module to process data for frame semantic parsing

coling2018 frame-semantic-parsing framenet framenet-xml-data open-sesame pipeline preprocessing semafor

Last synced: 18 Sep 2025

https://github.com/nobodywasishere/vhdlproc

VHDLproc is a VHDL preprocessor

preprocessing python vhdl vhdl-preprocessor

Last synced: 24 Apr 2025

https://github.com/strubell/preprocess-conll05

Scripts for preprocessing the CoNLL-2005 SRL dataset.

conll-2005 dataset nlp nlp-resources preprocessing semantic-role-labeling

Last synced: 08 Sep 2025

https://github.com/justinshenk/simages

Find duplicates and similar images in a folder

autoencoder duplicate-detection images preprocessing similarity-detection

Last synced: 11 Oct 2025

https://github.com/banditml/faucetml

High speed mini-batch data reading & preprocessing from BigQuery.

bigquery feature-engineering features machine-learning ml preprocessing pytorch

Last synced: 28 Jul 2025

https://github.com/cea-list/rpcdataloader

A variant of the PyTorch Dataloader using remote workers.

data-science dataloader distributed-computing hpc machine-learning preprocessing pytorch slurm

Last synced: 21 Jun 2025

https://github.com/louisbrulenaudet/docutron

Docutron Toolkit: detection and segmentation analysis for legal data extraction over documents.

cv2 detecron2 detection document legal legaltech legaltools llm machine-learning nlp ocr ocr-recognition preprocessing

Last synced: 14 Jul 2025

https://github.com/Neurita/pypes

Reusable neuroimaging pipelines using nipype

dti fmri ica neuroimaging nipype pet plotting preprocessing

Last synced: 01 May 2025

https://github.com/evernext10/hand-gesture-recognition-machine-learning

Automatic method for recognizing hand gestures to categorize vowels and numbers in Colombian Sign Language, using neural network (perceptron), support vector machine, and k-nearest neighbor classifiers.

artificial-intelligence colombian-sign-language colombian-signal-language f1-score feature-extraction gesture hand knearest-neighbor-classifier knn-classification knn-classifier lsc machine-learning machinelearning neural-network precision preprocessing recall recognition signal-processing support-vector-machines

Last synced: 22 Apr 2025

https://github.com/bids-apps/CPAC

BIDS Application for the Configurable Pipeline for the Analysis of Connectomes (C-PAC)

bids bidsapp mri preprocessing

Last synced: 29 Apr 2025

https://github.com/saichandrareddy1/oxygenjs

A JavaScript library for numerical computing and machine learning.

algebra javascript machine machine-learning machine-learning-algorithms maths matrix numerical-methods preprocessing

Last synced: 28 Oct 2025

https://github.com/yeonghyeon/preprocessing-method-for-stemi-detection

Official source code of "Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG"

cnn convolutional-neural-network ecg electrocardiogram enhancement highpass-filter improvement lead notch-filter preprocessing python qrs-complex stemi-detection voting

Last synced: 26 Apr 2025

https://github.com/adobe-research/beacon-aug

Cross-library augmentation toolbox supporting 300 operators over 8 libraries + AI transforms

albumentation augly augmentation beacon conversion cross-platform deep-learning gan imgaug mmcv preprocessing transformations

Last synced: 10 Apr 2025

https://github.com/marrow/dsl

A Pythonic DSL construction engine for import-time code translation.

cpython dsl preprocessing preprocessor pypy python python-2 python-3 text-processing

Last synced: 27 Jun 2025

https://github.com/deepraj1729/tchatbot-api

A Flask REST API to serve trained ChatBots using Tensorflow Serving and Docker Containers

api-rest chatbot deep-learning flask flask-restful framwork keras nlp preprocessing requests tensorflow tf-serving

Last synced: 01 May 2025

https://github.com/gianlucatruda/warfit-learn

A machine learning toolkit for reproducible research in anticoagulant dose estimation.

data-science iwpc pandas preprocessing python reproducible-research sklearn supervised-learning warfarin warfit-learn

Last synced: 24 Oct 2025

https://github.com/chrislemke/sk-transformers

A collection of pandas & scikit-learn compatible transformers for preprocessing and feature engineering 🛠

data-science feature-engineering feature-selection machine-learning pandas preprocessing python scikit-learn scikit-learn-pipelines scikit-learn-transformer

Last synced: 17 Jun 2025

https://github.com/alexchristensen/semnetcleaner

An Automated Cleaning Tool for Semantic and Linguistic Data

preprocessing r semantic-network-analysis

Last synced: 11 Apr 2025

https://github.com/nottruefalse/captcha_solving

All about creating a dataset, preprocessing images, and creating an actual model to solve captcha

ai captcha-solver keras-models nodejs preprocessing python3 svg-captcha tensorflow

Last synced: 27 Jul 2025

https://github.com/huangzhii/tsunami

R software for gene co-expression analysis

co-expression gene preprocessing

Last synced: 15 May 2025

https://github.com/miferreiro/bdpar

Big Data Preprocessing Architecture

custom-flow custom-pipes preprocessing r r6

Last synced: 12 Jun 2025

https://github.com/james77777778/keras-aug

A library that includes pure TF/Keras preprocessing and augmentation layers, providing support for various data types such as images, labels, bounding boxes, segmentation masks, and more.

augmentation keras keras-cv preprocessing tensorflow

Last synced: 10 Apr 2025

https://github.com/khaledashrafh/logistic-regression

This program implements logistic regression from scratch using the gradient descent algorithm in Python to predict whether customers will purchase a new car based on their age and salary.

activation-function cost-function data-preprocessing logistic-regression model preprocessing regression-models sigmoid sigmoid-activation sigmoid-function

Last synced: 17 Oct 2025
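
Logistic regression trained by gradient descent can be written from scratch in a few dozen lines. The Python sketch below is an illustration under stated assumptions, not the repository's code: it standardizes the features (age and salary live on very different scales), runs batch gradient descent on the mean cross-entropy loss, and thresholds the sigmoid output at 0.5. The function names, learning rate, epoch count and the four (age, salary) rows are all made up for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_scaler(X: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and standard deviation, used to standardize inputs."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row: list[float], means: list[float], stds: list[float]) -> list[float]:
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def train(X: list[list[float]], y: list[int], lr: float = 0.1,
          epochs: int = 1000) -> tuple[list[float], float]:
    """Batch gradient descent on the mean cross-entropy (log) loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log loss with a sigmoid output, dLoss/dz = prediction - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Made-up (age, salary) -> purchased examples, for illustration only.
X = [[22, 20000], [25, 30000], [47, 90000], [52, 110000]]
y = [0, 0, 1, 1]
means, stds = fit_scaler(X)
w, b = train([scale(r, means, stds) for r in X], y)
print(predict(w, b, scale([50, 100000], means, stds)))  # 1
print(predict(w, b, scale([23, 25000], means, stds)))   # 0
```

New inputs must be scaled with the means and standard deviations learned from the training data, which is why fit_scaler returns them instead of rescaling in place.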

https://github.com/bencardoen/datacurator.jl

A scalable Julia package to transparently validate and transform large biomedical datasets using human readable recipes that are translated to machine verifiable templates.

julia julia-package portable postprocessing preprocessing reproducible-research scalability

Last synced: 30 Jun 2025

https://github.com/boudinfl/semeval-2010-pre

Preprocessed SemEval-2010 benchmark dataset for keyphrase extraction

dataset information-retrieval keyphrase-extraction natural-language-processing preprocessing

Last synced: 24 Mar 2025

https://github.com/bcbi/preprocessmd.jl

Medically-informed data preprocessing for machine learning

julia machine-learning omop preprocessing

Last synced: 03 Aug 2025

https://github.com/khannatanmai/rule-based-preprocessing-mt

Rule-based pre-processing of non-compositional constructions to simplify them and improve black-box machine translation

construction-grammar machine-translation preprocessing rule-based

Last synced: 03 Aug 2025