Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
https://github.com/enkiwang/dataset-distillation-papers

A collection of dataset distillation papers.
Host: GitHub
URL: https://github.com/enkiwang/dataset-distillation-papers
Owner: enkiwang
Created: 2022-12-07T01:39:15.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-01-27T09:52:18.000Z (almost 2 years ago)
Last Synced: 2024-11-06T22:06:41.417Z (2 months ago)
Topics: dataset-condensation, dataset-distillation, efficient-machine-learning
Homepage:
Size: 15.6 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Dataset-distillation-papers

This repository aims to provide a comprehensive list of works on dataset distillation (DD), also known as dataset condensation (DC).

## Quick links

**Papers sorted by year:** | [2023](#Papers-in-2023-back-to-top) | [2022](#Papers-in-2022-back-to-top) | [2021](#Papers-in-2021-back-to-top) | [2020](#Papers-in-2020-back-to-top) | [2019](#Papers-in-2019-back-to-top) | [2018](#Papers-in-2018-back-to-top) | 

## 2023

### Papers in 2023 [[Back-to-top](#Dataset-distillation-papers)]

| Author | Title | Type | Task | Dataset | Venue | Supp. Material |
|---------|---------|---------|---------|---------|---------|---------|
| Ruonan Yu et al |[**Dataset Distillation: A Comprehensive Review**](https://arxiv.org/pdf/2301.07014.pdf) | Survey |  Multiple tasks |  | arXiv, Jan., 2023 |  |
| Shiye Lei et al |[**A Comprehensive Survey to Dataset Distillation**](https://arxiv.org/pdf/2301.05603.pdf) | Survey |  Multiple tasks |  | arXiv, Jan., 2023 |  |
| Noveen Sachdeva et al |[**Data Distillation: A Survey**](https://arxiv.org/pdf/2301.04272.pdf) | Survey | Multiple tasks  |  | arXiv, Jan., 2023 |  |
| Yugeng Liu et al |[**Backdoor Attacks Against Dataset Distillation**](https://arxiv.org/abs/2301.01197) |  |  Security | FMNIST, CIFAR10, STL10, SVHN | NDSS, 2023 | [Code](https://github.com/liuyugeng/baadd) |

## 2022

### Papers in 2022 [[Back-to-top](#Dataset-distillation-papers)]

| Author | Title | Type | Task | Dataset | Venue | Supp. Material |
|---------|---------|---------|---------|---------|---------|---------|
| Guang Li et al |[**Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing**](https://www.sciencedirect.com/science/article/pii/S0169260722005703) | Soft-Label Distillation |  Application: Medical Data Sharing | Gastric X-ray | Computer Methods and Programs in Biomedicine, 2022 |  |
| Zijia Wang et al |[**Gift from nature: Potential Energy Minimization for explainable dataset distillation**](https://openaccess.thecvf.com/content/ACCV2022W/MLCSA/papers/Wang_Gift_from_nature_Potential_Energy_Minimization_for_explainable_dataset_distillation_ACCVW_2022_paper.pdf) | Potential Energy Minimization |  Image Classification | miniImageNet, CUB-200 | ACCV Workshop, 2022 |  |
| Michael Arbel et al |[**Non-Convex Bilevel Games with Critical Point Selection Maps**](https://arxiv.org/pdf/2207.04888.pdf) | General Optimization |  Image Classification | CIFAR-10 | NeurIPS, 2022 |  |
| Zhiwei Deng et al |[**Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks**](https://openreview.net/pdf?id=RYZyj_wwgfa) |  |  Image Classification |  MNIST, SVHN, CIFAR10/100, TinyImageNet | NeurIPS, 2022 | [Code](https://github.com/princetonvisualai/RememberThePast-DatasetDistillation) |
| Noveen Sachdeva et al |[**Infinite Recommendation Networks: A Data-Centric Approach**](https://arxiv.org/pdf/2206.02626.pdf) | Neural Tangent Kernel | Application: Recommender System  | Amazon Magazine, ML-1M, Douban, Netflix | NeurIPS, 2022 | [Code](https://github.com/noveens/distill_cf) |
| Dingfan Chen et al | [**Private Set Generation with Discriminative Information**](https://openreview.net/pdf?id=mxnxRw8jiru) |  |  Application: Private Data Generation | MNIST, FashionMNIST | NeurIPS, 2022 | [Code](https://github.com/DingfanChen/Private-Set), [Poster](https://nips.cc/media/PosterPDFs/NeurIPS%202022/53552.png?t=1668599242.828518) |
| Justin Cui et al | [**DC-BENCH: Dataset Condensation Benchmark**](https://openreview.net/pdf?id=Bs8iFQ7AM6) | Benchmark  | Image Classification |  | NeurIPS, 2022 | [Code](https://dc-bench.github.io/), [Poster](https://nips.cc/media/PosterPDFs/NeurIPS%202022/55673.png?t=1669626268.8753998) |
| Yongchao Zhou et al |[**Dataset Distillation using Neural Feature Regression**](https://openreview.net/pdf?id=2clwrA2tfik) |  | Image Classification | CIFAR100, TinyImageNet, ImageNette, ImageWoof | NeurIPS, 2022 | [Code](https://github.com/yongchao97/FRePo), [Slide](https://docs.google.com/presentation/d/10NMtEVsW-nbEWgbTEJQYMH-rdgOklXZF/edit#slide=id.p3) |
| Songhua Liu et al |[**Dataset Distillation via Factorization**](https://openreview.net/pdf?id=luGXvawYWJ) |  | Image Classification | SVHN, CIFAR10/100 | NeurIPS, 2022 | [Code](https://github.com/Huage001/DatasetFactorization), [Poster](https://nips.cc/media/PosterPDFs/NeurIPS%202022/55231.png?t=1668961755.9041288) |
| Noel Loo et al |[**Efficient Dataset Distillation using Random Feature Approximation**](https://openreview.net/pdf?id=h8Bd7Gm3muB) |  | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR-10/100 | NeurIPS, 2022 | [Code](https://github.com/yolky/RFAD), [Poster](https://nips.cc/media/PosterPDFs/NeurIPS%202022/4be2c8f27b8a420492f2d44463933eb6.png?t=1666483874.2999172) |
| Yihan Wu et al |[**Towards Robust Dataset Learning**](https://arxiv.org/pdf/2211.10752.pdf) | Tri-level Optimization |  Robust Image Classification | MNIST, CIFAR10, TinyImageNet | arXiv, Nov., 2022 |  |
| Andrey Zhmoginov et al |[**Decentralized Learning with Multi-Headed Distillation**](https://arxiv.org/pdf/2211.15774.pdf) | Local DD |  Application: FL | CIFAR-10/100 | arXiv, Nov., 2022 |  |
| Jiawei Du et al |[**Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation**](https://arxiv.org/pdf/2211.11004.pdf) | Accumulated Trajectory Matching |  Image Classification |  | arXiv, Nov., 2022 |  |
| Justin Cui et al |[**Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory**](https://arxiv.org/abs/2211.10586) |  | Image Classification | CIFAR-10/100, ImageNet-1K  | arXiv, Nov., 2022 |  |
| Renjie Pi et al |[**DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics**](https://arxiv.org/pdf/2211.10878.pdf) |  |  Application: FL | FMNIST, CIFAR10, CINIC10 | arXiv, Nov., 2022 |  |
| Zongwei Wang et al |[**Quick Graph Conversion for Robust Recommendation**](https://arxiv.org/pdf/2210.10321.pdf) | Gradient Matching |  Application: Recommender System | Beauty, Alibaba-iFashion, Yelp2018 | arXiv, Oct., 2022 | |
| Yulan Chen et al |[**Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation**](https://ieeexplore.ieee.org/document/9897234) |  |  Application: Fashion Analysis |  | ICIP, 2022 |  |
| Yuna Jeong et al |[**Training data selection based on dataset distillation for rapid deployment in machine-learning workflows**](https://link.springer.com/article/10.1007/s11042-022-13701-6) |  |  Application: Dataset Selection |  | Multimedia Tools and Applications, 2022 | |
| Yanlin Zhou et al |[**Communication-Efficient and Attack-Resistant Federated Edge Learning with Dataset Distillation**](https://ieeexplore.ieee.org/abstract/document/9925087) |  |  Application: FL | MNIST, Landmark, IMDB, etc | IEEE TCC, 2022 | [Code]() |
| Nicholas Carlini et al |[**No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy"**](https://arxiv.org/abs/2209.14987) |  |  Application: Privacy | CIFAR-10 | arXiv, Sept., 2022 |  |
| Guang Li et al |[**Dataset Distillation for Medical Dataset Sharing**](https://arxiv.org/pdf/2209.14603.pdf) | Trajectory Matching |  Application: Medical Data Sharing | COVID-19 Chest X-ray | arXiv, Sept., 2022 |  |
| Guang Li et al |[**Dataset Distillation using Parameter Pruning**](https://arxiv.org/pdf/2209.14609.pdf) | Parameter Pruning |  Image Classification |  CIFAR-10/100 | arXiv, Sept., 2022 |  |
| Ping Liu et al |[**Meta Knowledge Condensation for Federated Learning**](https://arxiv.org/abs/2209.14851) |  |  Application: FL | MNIST | arXiv, Sept., 2022 |  |
| Dmitry Medvedev et al |[**Learning to Generate Synthetic Training Data Using Gradient Matching and Implicit Differentiation**](https://link.springer.com/chapter/10.1007/978-3-031-15168-2_12) | Gradient Matching, Implicit Differentiation |  Image Classification | MNIST | CCIS, 2022 | [Code](https://github.com/dm-medvedev/EfficientDistillation) |
| Wei Jin et al |[**Condensing Graphs via One-Step Gradient Matching**](https://dl.acm.org/doi/abs/10.1145/3534678.3539429?casa_token=hjYiq57R1jcAAAAA:EPtmMLrdCCVYn1Zg1GWq6lVPAIYLOJiv63QE9LODfOGYLvBvRhv7JWsYdxcmW4Hda6t2TwoAewlHrQ) | Gradient Matching | Graph Classification |  | KDD, 2022 | [Code](https://github.com/amazon-science/doscond) |
| Rui Song et al |[**Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments**](https://arxiv.org/abs/2208.11311) | Local DD |  Application: FL | MNIST, CIFAR10 | arXiv, Aug., 2022 |  |
| Hae Beom Lee et al |[**Dataset Condensation with Latent Space Knowledge Factorization and Sharing**](https://arxiv.org/pdf/2208.10494.pdf) | Local DD | Image Classification |  | arXiv, Aug., 2022 |  |
| Thi-Thu-Huong Le et al |[**A Review of Dataset Distillation for Deep Learning**](https://ieeexplore.ieee.org/abstract/document/9932086) | Survey |  Image Classification |  | ICPTS, 2022 |  |
| Zixuan Jiang et al |[**Delving into Effective Gradient Matching for Dataset Condensation**](https://arxiv.org/pdf/2208.00311.pdf) | Gradient Matching |  Image Classification |  MNIST/FashionMNIST, SVHN, CIFAR-10/100 | arXiv, Jul., 2022 | [Code]() |
| Yuanhao Xiong et al |[**FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning**](https://arxiv.org/pdf/2207.09653.pdf) |  |  Application: FL | MNIST, CIFAR10/100 | arXiv, Jul., 2022 |  |
| Nikolaos Tsilivis et al |[**Can we achieve robustness from data alone?**](https://arxiv.org/pdf/2207.11727.pdf) | KIP  |  Security | MNIST, CIFAR-10 | arXiv, Jul., 2022 |  |
| Nadiya Shvai et al |[**DEvS: Data Distillation Algorithm Based on Evolution Strategy**](https://dl.acm.org/doi/pdf/10.1145/3520304.3528819) | Evolution Strategy |  Image Classification |  CIFAR-10 | GECCO, 2022 |  |
| Mattia Sangermano et al |[**Sample Condensation in Online Continual Learning**](https://ieeexplore.ieee.org/abstract/document/9892299/) | Gradient Matching |  Application: Continual learning | SplitMNIST, SplitFashionMNIST, SplitCIFAR10 | IJCNN, 2022 | [Code](https://github.com/MattiaSangermano/OLCGM) |
| Brian Moser et al |[**Less is More: Proxy Datasets in NAS approaches**](https://openaccess.thecvf.com/content/CVPR2022W/NAS/papers/Moser_Less_Is_More_Proxy_Datasets_in_NAS_Approaches_CVPRW_2022_paper.pdf) |  |  Application: NAS |  | CVPRW, 2022 | |
| George Cazenavette et al |[**Wearable ImageNet: Synthesizing Tileable Textures via Dataset Distillation**](https://openaccess.thecvf.com/content/CVPR2022W/CVFAD/papers/Cazenavette_Wearable_ImageNet_Synthesizing_Tileable_Textures_via_Dataset_Distillation_CVPRW_2022_paper.pdf) |  | Image Classification |  | CVPRW, 2022 | [Code](https://github.com/GeorgeCazenavette/mtt-distillation) |
| George Cazenavette et al |[**Dataset Distillation by Matching Training Trajectories**](https://openaccess.thecvf.com/content/CVPR2022/papers/Cazenavette_Dataset_Distillation_by_Matching_Training_Trajectories_CVPR_2022_paper.pdf) | Trajectory Matching  | Image Classification |  CIFAR-100, Tiny ImageNet, ImageNet subsets | CVPR, 2022 | [Code](https://georgecazenavette.github.io/mtt-distillation/) |
| Kai Wang et al |[**CAFE: Learning to Condense Dataset by Aligning Features**](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_CAFE_Learning_To_Condense_Dataset_by_Aligning_Features_CVPR_2022_paper.pdf) | Feature Alignment |  Image Classification | MNIST, FashionMNIST, SVHN, CIFAR10/100 | CVPR, 2022 | [Code](https://github.com/kaiwang960112/CAFE) |
| Mengyang Liu et al |[**Graph Condensation via Receptive Field Distribution Matching**](https://arxiv.org/pdf/2206.13697.pdf) | Receptive Field Distribution Matching |  Graph Classification | Cora, PubMed, Citeseer, Ogbn-arxiv, Flickr | arXiv, Jun., 2022 |  |
| Saehyung Lee et al |[**Dataset Condensation with Contrastive Signals**](https://proceedings.mlr.press/v162/lee22b/lee22b.pdf) | Contrastive Learning |  Image Classification | SVHN, CIFAR-10/100; Automobile, Terrier, Fish | ICML, 2022 | [Code](https://github.com/Saehyung-Lee/DCC) |
| Jang-Hyun Kim et al |[**Dataset Condensation via Efficient Synthetic-Data Parameterization**](https://proceedings.mlr.press/v162/kim22c/kim22c.pdf) |  |  Image Classification |  CIFAR-10, ImageNet, Speech Commands | ICML, 2022 | [Code](https://github.com/snu-mllab/Efficient-Dataset-Condensation) |
| Tian Dong et al |[**Privacy for Free: How does Dataset Condensation Help Privacy?**](https://proceedings.mlr.press/v162/dong22c/dong22c.pdf) | Application: Privacy | Image Classification |  | ICML, 2022 |  |
| Paul Vicol et al |[**On Implicit Bias in Overparameterized Bilevel Optimization**](https://proceedings.mlr.press/v162/vicol22a.html) | General Optimization |  Image Classification | MNIST | ICML, 2022 |  |
| Wei Jin et al |[**Graph Condensation for Graph Neural Networks**](https://openreview.net/pdf?id=WLEx3Jo4QaB) | Gradient Matching | Graph Classification | Cora, Citeseer, Ogbn-arxiv; Reddit, Flickr | ICLR, 2022 | [Code](https://github.com/ChandlerBang/GCond) |
| Bo Zhao et al |[**Synthesizing Informative Training Samples with GAN**](https://arxiv.org/pdf/2204.07513.pdf) | GAN | Image Classification | CIFAR-10/100  | arXiv, Apr., 2022  | [Code](https://github.com/VICO-UoE/IT-GAN) |
| Shengyuan Hu et al |[**FedSynth: Gradient Compression via Synthetic Data in Federated Learning**](https://arxiv.org/pdf/2204.01273.pdf) |  | Application: FL | MNIST, FEMNIST, Reddit |  |  |
| Aminu Musa et al |[**Learning from Small Datasets: An Efficient Deep Learning Model for Covid-19 Detection from Chest X-ray Using Dataset Distillation Technique**](https://ieeexplore.ieee.org/abstract/document/9803131) |  | Application: Medical Imaging | Chest X-ray | NIGERCON, 2022 |  |
| Seong-Woong Kim et al |[**Stable Federated Learning with Dataset Condensation**](http://jcse.kiise.org/files/V16N1-05.pdf) |  |  Application: FL | CIFAR-10 | JCSE, 2022 |  |
| Robin T. Schirrmeister et al |[**When less is more: Simplifying inputs aids neural network understanding**](https://arxiv.org/pdf/2201.05610.pdf) |  |  Application: Understanding NN | MNIST, Fashion-MNIST, CIFAR10/100 | arXiv, Jan., 2022 | |
| Isha Garg et al |[**TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data**](https://arxiv.org/pdf/2201.08494.pdf) |  |  Application: FL |  | arXiv, Jan., 2022 |  |

## 2021

### Papers in 2021 [[Back-to-top](#Dataset-distillation-papers)]

| Author | Title | Type | Task | Dataset | Venue | Supp. Material |
|---------|---------|---------|---------|---------|---------|---------|
| Timothy Nguyen et al |[**Dataset Distillation with Infinitely Wide Convolutional Networks**](https://openreview.net/pdf?id=hXWPpJedrVP) | Kernel Ridge Regression |  Image Classification | MNIST, Fashion-MNIST, CIFAR-10/100, SVHN | NeurIPS, 2021 | [Code](https://github.com/google-research/google-research/tree/master/kip) |
| Bo Zhao et al |[**Dataset Condensation with Distribution Matching**](https://arxiv.org/pdf/2110.04181.pdf) | Distribution Matching | Image Classification | MNIST, CIFAR10/100, TinyImageNet | arXiv, Oct., 2021 | [Code](https://github.com/VICO-UoE/DatasetCondensation) |
| Ilia Sucholutsky et al |[**Soft-Label Dataset Distillation and Text Dataset Distillation**](https://ieeexplore.ieee.org/abstract/document/9533769) | Label Distillation |  Image/Text Classification | MNIST, IMDB | IJCNN, 2021 | [Code](https://github.com/ilia10000/dataset-distillation) |
| Felix Wiewel et al |[**Condensed Composite Memory Continual Learning**](https://ieeexplore.ieee.org/abstract/document/9533491/) | Gradient Matching |  Application: Continual Learning |  | IJCNN, 2021 | [Code](https://github.com/FelixWiewel/CCMCL) |
| Bo Zhao et al |[**Dataset Condensation with Differentiable Siamese Augmentation**](http://proceedings.mlr.press/v139/zhao21a/zhao21a.pdf) | Data Augmentation | Image Classification | MNIST, FashionMNIST, SVHN, CIFAR10/100  | ICML, 2021 | [Code](https://github.com/VICO-UoE/DatasetCondensation), [Video](https://slideslive.com/38958791/dataset-condensation-with-differentiable-siamese-augmentation?ref=recommended) |
| Timothy Nguyen et al |[**Dataset Meta-Learning from Kernel Ridge-Regression**](https://openreview.net/pdf?id=l-PrrQrK0QR) | Kernel Ridge Regression |  Image Classification | MNIST, CIFAR-10 | ICLR, 2021 | [Code](https://github.com/google-research/google-research/tree/master/kip) |
| Bo Zhao et al |[**Dataset Condensation with Gradient Matching**](https://openreview.net/pdf?id=mSAKhLYLSsl) | Gradient Matching | Image Classification | CIFAR-10, Fashion-MNIST, MNIST, SVHN, USPS | ICLR, 2021 | [Code](https://github.com/VICO-UoE/DatasetCondensation) |
| |[**New Properties of the Data Distillation Method When Working with Tabular Data**](https://link.springer.com/chapter/10.1007/978-3-030-72610-2_29) | Simulation |  Tabular Classification |  | LNISA, 2021 | [Code](https://github.com/dm-medvedev/dataset-distillation) |
| Yongqi Li et al |[**Data Distillation for Text Classification**](https://arxiv.org/abs/2104.08448) |  |  Text Classification |  | arXiv, Apr., 2021 | [Code]() |
| Ilia Sucholutsky et al |[**‘Less Than One’-Shot Learning: Learning N Classes From M