Dataset distillation has gained significant interest in recent years, yet existing approaches typically distill from the entire dataset, potentially including non-beneficial samples. We introduce a novel "Prune First, Distill After" framework that systematically prunes datasets via loss-based sampling prior to distillation. By applying pruning before classical distillation techniques and generative priors, we create a representative core-set that leads to enhanced generalization for unseen architectures, a significant challenge of current distillation methods. More specifically, our proposed framework significantly boosts distillation quality, achieving up to a 5.2 percentage point accuracy increase even with substantial dataset pruning, i.e., removing 80% of the original dataset prior to distillation. Overall, our experimental results highlight the advantages of our easy-sample prioritization and cross-architecture robustness, paving the way for more effective and higher-quality dataset distillation.
Our work introduces a "Prune First, Distill After" approach for dataset distillation with generative priors (GLaD and LD3M).
By simply sorting the images in your dataset according to a loss-value score from your classifier, you can prune non-beneficial samples before distillation and achieve improved performance.
In our paper, we show that removing at least 40% of the hardest (highest-loss) samples before distillation leads to substantial improvements in distillation quality across multiple architectures (AlexNet, VGG-11, ResNet-18, and ViT). A minimal sketch of this pruning step is given below.
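To make the loss-value pruning step concrete, here is a minimal PyTorch sketch. It is not the repository's actual code: the function names `loss_value_scores` and `prune_hardest` are illustrative, and it prunes globally rather than per class, which may differ from the paper's exact setup.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

@torch.no_grad()
def loss_value_scores(model, dataset, device="cuda", batch_size=256):
    """Return one cross-entropy loss value per sample (lower = easier)."""
    model.eval().to(device)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    scores = []
    for images, labels in loader:
        logits = model(images.to(device))
        # reduction="none" keeps a per-sample loss instead of the batch mean
        scores.append(F.cross_entropy(logits, labels.to(device), reduction="none").cpu())
    return torch.cat(scores)

def prune_hardest(dataset, scores, prune_ratio=0.4):
    """Drop the prune_ratio fraction with the highest loss; keep the easy core-set."""
    keep = int(len(dataset) * (1.0 - prune_ratio))
    easiest_first = torch.argsort(scores)  # indices sorted by ascending loss
    return Subset(dataset, easiest_first[:keep].tolist())

# Usage (assumed names): score the training set with a pretrained classifier,
# keep the easiest 60%, then run distillation (e.g., GLaD or LD3M) on the core-set.
# scores = loss_value_scores(classifier, train_set)
# core_set = prune_hardest(train_set, scores, prune_ratio=0.4)
```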
@article{moser2024distill,
  title={Distill the Best, Ignore the Rest: Improving Dataset Distillation with Loss-Value-Based Pruning},
  author={Moser, Brian B and Raue, Federico and Nauen, Tobias C and Frolov, Stanislav and Dengel, Andreas},
  journal={arXiv preprint arXiv:2411.12115},
  year={2024}
}