An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with data-augmentation

A curated list of projects in awesome lists tagged with data-augmentation.

https://github.com/snorkel-team/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 23 Feb 2026

https://hazyresearch.github.io/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 26 Feb 2025

https://github.com/nvidia/dali

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch

Last synced: 13 May 2025

https://github.com/NVIDIA/DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

audio-processing data-augmentation data-processing deep-learning fast-data-pipeline gpu gpu-tensorflow image-augmentation image-processing machine-learning mxnet neural-network paddle python pytorch

Last synced: 15 Mar 2025

https://github.com/qdata/textattack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. https://textattack.readthedocs.io/en/master/

adversarial-attacks adversarial-examples adversarial-machine-learning data-augmentation machine-learning natural-language-processing nlp security

Last synced: 14 May 2025

https://github.com/QData/TextAttack

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP. https://textattack.readthedocs.io/en/master/

adversarial-attacks adversarial-examples adversarial-machine-learning data-augmentation machine-learning natural-language-processing nlp security

Last synced: 02 Apr 2025

https://github.com/webdataset/webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

data-augmentation deep-learning pytorch webdataset webdataset-format

Last synced: 11 Dec 2025

https://github.com/iver56/audiomentations

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

audio audio-data-augmentation audio-effects augmentation data-augmentation deep-learning dsp machine-learning music python sound sound-processing

Last synced: 13 May 2025

https://github.com/425776024/nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

chinese-data-augmentation chinese-eda data-augmentation nlp nlpcda

Last synced: 15 May 2025

https://github.com/visual-layer/fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

data-augmentation data-curation dataset deep-learning image image-analysis image-classfication image-classification image-duplicate-detection image-processing image-similarity machine-learning novelty-detection object-detection outlier-detection python visual-search visualization visualization-tools

Last synced: 14 May 2025

https://github.com/yongzhuo/nlp_xiaojiang

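Natural language processing (NLP): Xiaojiang bot (a retrieval-based chit-chat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNet sentence embeddings and similarity (text xlnet embedding), text classification, entity extraction (NER, BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), paraphrase and synonym generation, sentence main-part extraction (mainpart), Chinese short-text similarity, text feature engineering, and keras-http-service calls.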

bert chatbot chinese data-augmentation distance enhance feature nlp text-augment text-classification xlnet

Last synced: 15 May 2025

https://github.com/zhanlaoban/eda_nlp_for_chinese

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes.

chinese chinese-data-augmentation data-augmentation easy-data-augmentation eda text-classification

Last synced: 16 May 2025

https://github.com/zhanlaoban/EDA_NLP_for_Chinese

An implementation of the EDA paper for Chinese corpora. An EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes.

chinese chinese-data-augmentation data-augmentation easy-data-augmentation eda text-classification

Last synced: 09 May 2025

https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.

alignment compression data-augmentation data-synthesis feedback instruction-following kd knowledge-distillation large-language-model llm multi-modal self-distillation self-training supervised-finetuning survey

Last synced: 12 Apr 2025

https://github.com/goru001/inltk

Natural Language Toolkit for Indic Languages aims to provide out-of-the-box support for various NLP tasks that an application developer might need.

data-augmentation deep-learning indic-languages nlp pytorch sentence-embeddings sentence-encoding sentence-similarity word-embeddings

Last synced: 12 May 2025

https://github.com/zhunzhong07/Random-Erasing

Random Erasing Data Augmentation. Experiments on CIFAR10, CIFAR100 and Fashion-MNIST

aaai2020 data-augmentation image-classification object-detection person-re-identification pytorch

Last synced: 02 May 2025

https://github.com/DemisEom/SpecAugment

An implementation of SpecAugment (introduced by Google Brain) with TensorFlow & PyTorch

data-augmentation python pytorch specaugment speech speech-recognition tensorflow

Last synced: 19 Jul 2025

https://github.com/demiseom/specaugment

An implementation of SpecAugment (introduced by Google Brain) with TensorFlow & PyTorch

data-augmentation python pytorch specaugment speech speech-recognition tensorflow

Last synced: 04 Apr 2025

https://github.com/vanderschaarlab/synthcity

A library for generating and evaluating synthetic tabular data for privacy, fairness and data augmentation.

data-augmentation fairness-ml generative-model machine-learning privacy pytorch synthetic-data tabular-data

Last synced: 16 May 2025

https://github.com/codebox/image_augmentor

Data augmentation tool for images

data-augmentation image-augmentor machine-learning

Last synced: 12 Jun 2025

https://github.com/sshuair/torchsat

🔥 TorchSat 🌏 is an open-source deep learning framework for satellite imagery analysis based on PyTorch.

classification data-augmentation deep-learning pytorch remote-sensing satellite satellite-imagery semantic-segmentation torchvision

Last synced: 07 Apr 2025

https://github.com/mratsim/amazon-forest-computer-vision

Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

computer-vision data-augmentation deep-learning kaggle kaggle-competition keras neural-network-example neural-networks pytorch transfer-learning

Last synced: 07 Apr 2025

https://github.com/mratsim/Amazon-Forest-Computer-Vision

Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

computer-vision data-augmentation deep-learning kaggle kaggle-competition keras neural-network-example neural-networks pytorch transfer-learning

Last synced: 08 May 2025

https://github.com/arundo/tsaug

A Python package for time series augmentation

audio data-augmentation deep-learning time-series

Last synced: 04 Apr 2025

https://github.com/vinthony/ghost-free-shadow-removal

[AAAI 2020] Towards Ghost-free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN

data-augmentation deep-learning scene-understanding shadow-removal tensorflow

Last synced: 06 Apr 2025

https://github.com/Garfield-kh/PoseTriplet

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

cvpr2022 data-augmentation motion-generation motion-imitation pose-estimation

Last synced: 03 Apr 2025

https://github.com/yu4u/mixup-generator

An implementation of "mixup: Beyond Empirical Risk Minimization"

data-augmentation deep-neural-networks generator keras mixup

Last synced: 13 Apr 2025

https://github.com/imedslab/solt

Streaming over lightweight data transformations

data-augmentation deep-learning image-recognition image-segmentation landmark-detection

Last synced: 08 May 2025

https://github.com/deftruth/torchlm

💎 A high-level pipeline for face landmark detection. It supports training, evaluation, export, inference (Python/C++), and 100+ data augmentations, and can be installed easily via pip.

albumentations data-augmentation face-landmarks heatmap mobilenet pip pipnet regression shufflenet torchvision yolov5 yolov6 yolov7 yolox

Last synced: 17 Mar 2025

https://github.com/roatienza/straug

Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

data-augmentation scene-text-recognition str

Last synced: 05 Apr 2025

https://github.com/xlite-dev/torchlm

💎 A high-level Python library for face landmark detection: training, evaluation, export, inference (Python/C++), and 100+ data augmentations.

albumentations data-augmentation face-landmarks heatmap mobilenet pip pipnet regression shufflenet torchvision yolov5 yolov6 yolov7 yolox

Last synced: 13 Dec 2025

https://github.com/bmcfee/muda

A library for augmenting annotated audio data

data-augmentation machine-learning music nyucds python

Last synced: 06 Apr 2025

https://github.com/jiachens/ModelNet40-C

Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

benchmark computer-vision corruption-robustness data-augmentation deep-learning ml-safety point-cloud-processing pytorch regularization robustness

Last synced: 20 Mar 2025

https://github.com/THUDM/GRAND

Source code and dataset of the NeurIPS 2020 paper "Graph Random Neural Network for Semi-Supervised Learning on Graphs"

data-augmentation gnn graph-neural-networks graphs neurips-2020 semi-supervised-learning

Last synced: 21 Jul 2025

https://github.com/thudm/grand

Source code and dataset of the NeurIPS 2020 paper "Graph Random Neural Network for Semi-Supervised Learning on Graphs"

data-augmentation gnn graph-neural-networks graphs neurips-2020 semi-supervised-learning

Last synced: 10 Apr 2025

https://github.com/hwalsuklee/tensorflow-mnist-cnn

MNIST classification using a Convolutional Neural Network. Various techniques such as data augmentation, dropout, and batch normalization are implemented.

batch-normalization cnn data-augmentation dropout ensemble-prediction mnist mnist-classification tensorflow

Last synced: 12 May 2025

https://github.com/hfawaz/aaltd18

Data augmentation using synthetic data for time series classification with deep residual networks

convolutional-neural-networks data-augmentation deep-learning dtw dynamic-time-warping time-series-classification

Last synced: 09 Apr 2025

https://github.com/thesouthfrog/stylealign

[ICCV 2019] Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Transition

data-augmentation face-alignment facial-landmarks representation-learning semi-supervised-learning

Last synced: 23 Apr 2025

https://github.com/beyondguo/genius

💡GENIUS – generating text using sketches! A strong text generation & data augmentation tool.

conditional-text-generation data-augmentation keywords-to-text named-entities-recognition sketch-to-text text-augmentation text-classificaiton text-generation

Last synced: 09 May 2025

https://github.com/mahmoudnafifi/WB_color_augmenter

WB color augmenter improves the accuracy of image classification and image semantic segmentation methods by emulating different WB effects (ICCV 2019) [Python & Matlab].

cnn color-augmentation color-constancy color-correction computer-vision data-augmentation deep-learning deep-neural-network deeplearning iccv19 iccv2019 image-augmentation image-classification semantic-segmentation white-balance whitebalance

Last synced: 08 May 2025

https://github.com/alexandervnikitin/tsgm

Generation and evaluation of synthetic time series datasets (also augmentations, visualizations, and a collection of popular datasets). NeurIPS'24

augmentations data-augmentation data-science datasets deep-learning generative-model keras machine-learning python synthetic-data synthetic-time-series tensorflow2 time-series vae

Last synced: 06 Apr 2025

https://github.com/snu-mllab/PuzzleMix

Official PyTorch implementation of "Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup" (ICML'20)

data-augmentation mixup

Last synced: 06 May 2025

https://github.com/squeezeailab/llm2llm

[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

data-augmentation llama llama2 llm llms natural-language-processing nlp synthetic-dataset-generation transformer

Last synced: 31 Jul 2025

https://github.com/vita-epfl/social-nce

[ICCV21] Official PyTorch implementation of "Social NCE: Contrastive Learning of Socially-aware Motion Representations".

contrastive-learning data-augmentation imitation-learning motion-forecasting multi-agent reinforcement-learning representation-learning

Last synced: 16 Jan 2026

https://github.com/clementchadebec/pyraug

Data Augmentation with Variational Autoencoders (TPAMI)

data-augmentation python variational-autoencoder

Last synced: 22 Aug 2025

https://github.com/arian-askari/ChatGPT-RetrievalQA-CIKM2023

A dataset for training/evaluating Question Answering Retrieval models on ChatGPT responses, with the option of training/evaluating on real human responses.

ai chatgpt chatgpt-information-retrieval chatgpt-ir data-augmentation dataset deep-learning gpt-3 gpt2 gpt3 information-retrieval information-retrieval-chatgpt ir ir-chatgpt machine-learning nlp openai python sequence-to-sequence text-retrieval

Last synced: 27 Mar 2025

https://github.com/Shaoli-Huang/SnapMix

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data (AAAI 2021)

aaai2021 cutmix data-augmentation fine-grained-recognition mixup

Last synced: 05 Apr 2025

https://github.com/tanyuqian/learning-data-manipulation

NeurIPS 2019 - Learning Data Manipulation for Augmentation and Weighting

bert data-augmentation data-manipulation meta-learning

Last synced: 26 Oct 2025

https://github.com/khawar-islam/diffuseMix

Official PyTorch implementation of DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models (CVPR'2024)

cutmix data-augmentation diffusion-models generative-data-augmentation image-classification mixup synthetic-data transfer-learning

Last synced: 15 Aug 2025

https://github.com/snu-mllab/Co-Mixup

Official PyTorch implementation of "Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity" (ICLR'21 Oral)

data-augmentation mixup

Last synced: 06 May 2025

https://github.com/fabioperez/skin-data-augmentation

Source code for the paper 'Data Augmentation for Skin Lesion Analysis' — 🏆 Best Paper Award at the ISIC Skin Image Analysis Workshop @ MICCAI 2018

computer-vision data-augmentation deep-learning machine-learning melanoma melanoma-classification pytorch skin-lesion-analysis

Last synced: 06 Apr 2025

https://github.com/hypox64/candock

A time series signal analysis and classification framework

classification data-augmentation data-preprocessing deep-learning eeg series-signal-analysis

Last synced: 24 Apr 2025

https://github.com/zhiqiangdon/pose-adv-aug

Code for "Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation" (CVPR 2018)

adversarial-learning data-augmentation human-pose-estimation

Last synced: 30 Apr 2025

https://github.com/ashawkey/volumentations

3D volume data augmentation package inspired by albumentations

3d data-augmentation

Last synced: 30 Apr 2025

https://github.com/sky77764/pa-aug.pytorch

Part-Aware Data Augmentation for 3D Object Detection in Point Cloud (IROS 2021)

3d-object-detection data-augmentation lidar point-cloud

Last synced: 15 Aug 2025

https://github.com/hkuds/adagcl

[KDD'2023] "AdaGCL: Adaptive Graph Contrastive Learning for Recommendation"

contrastive-learning data-augmentation graph-auto-encoder graph-neural-networks recommender-systems

Last synced: 28 Aug 2025

https://github.com/AIoT-MLSys-Lab/DeepAA

[ICLR 2022] "Deep AutoAugment" by Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang

automl data-augmentation deep-learning

Last synced: 08 May 2025

https://github.com/guchengxi1994/mask2json

A small tool for image augmentation, including converting mask files to JSON/XML files, image augmentation (flip, rotation, noise, ...), and so on

data-augmentation labelimg-tool labelme-tool

Last synced: 16 Jan 2026

https://github.com/cuge1995/pointcutmix

Code for the paper 'PointCutMix: Regularization Strategy for Point Cloud Classification', Neurocomputing, 2022

3d-deep-learning data-augmentation deep-learning point-cloud

Last synced: 05 May 2025

https://github.com/cuge1995/PointCutMix

Code for the paper 'PointCutMix: Regularization Strategy for Point Cloud Classification', Neurocomputing, 2022

3d-deep-learning data-augmentation deep-learning point-cloud

Last synced: 20 Mar 2025