In image Super-Resolution (SR), relying on large datasets for training is a double-edged sword. While they offer rich training material, they also demand substantial computational and storage resources. In this work, we analyze dataset pruning as a solution to these challenges. We introduce a novel approach that reduces a dataset to a core-set of training samples, selected based on their loss values as determined by a simple pre-trained SR model. By focusing the training on just 50% of the original dataset, specifically on the samples characterized by the highest loss values, we achieve results comparable to or surpassing those obtained from training on the entire dataset. Interestingly, our analysis reveals that the top 5% of samples with the highest loss values negatively affect the training process. Excluding these samples and adjusting the selection to favor easier samples further enhances training outcomes. Our work opens new perspectives on the untapped potential of dataset pruning in image SR. It suggests that careful selection of training data based on loss-value metrics can lead to better SR models, challenging the conventional wisdom that more data inevitably leads to better performance.
Image Super-Resolution (SR) techniques are crucial in image processing, enabling the reconstruction of high-resolution (HR) images from low-resolution (LR) counterparts. These techniques have diverse applications, from enhancing consumer photography to improving satellite and medical imagery. However, training SR models requires significant computational resources and large-scale datasets to capture the diversity of textures and patterns essential for effective upscaling. Despite recent advancements in deep learning that have improved SR techniques, the resource-intensive nature of training these models remains a challenge. Models like SwinIR and HAT have set new benchmarks, but their success often depends on extensive and diverse training data. To address these challenges, our work explores dataset pruning as a strategy to enhance the efficiency of SR model training without compromising output quality.

Our contribution is twofold. First, we propose a novel loss-value-based sampling method for dataset pruning in image SR, leveraging a simple pre-trained SR model, SRCNN. Unlike traditional approaches that use the entirety of available data, our method selectively identifies the most informative samples. Second, we demonstrate that training SR models on a pruned dataset, comprising 50% of the original data selected based on their loss values, can achieve comparable or superior performance to training on the full dataset. Refining this selection by excluding the top 5% hardest samples further enhances training outcomes.

Through this work, we aim to shift how training datasets are curated for SR tasks, advocating for a loss-value-driven approach to dataset pruning. This strategy significantly reduces storage requirements and offers a scalable solution adaptable to the evolving complexities and requirements of image SR.
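The selection procedure can be summarized in a short sketch. The PyTorch snippet below is a minimal illustration rather than the released code: the function name prune_by_loss, the L1 scoring loss, and the paired (LR, HR) dataset interface are assumptions made for clarity. It scores every training pair with a frozen, pre-trained SRCNN, ranks pairs from hardest to easiest by loss, discards the hardest 5%, and keeps the next 50% as the pruned core-set on which the actual SR model is then trained as usual.

import torch
import torch.nn.functional as F

def prune_by_loss(dataset, srcnn, keep_ratio=0.50, drop_hardest=0.05, device="cpu"):
    """Rank (LR, HR) pairs by the loss of a pre-trained scorer model,
    drop the hardest fraction, and keep the next `keep_ratio` of samples."""
    srcnn = srcnn.to(device).eval()
    losses = []
    with torch.no_grad():
        for idx in range(len(dataset)):
            lr, hr = dataset[idx]  # assumed paired tensors; LR pre-upscaled to HR size for SRCNN
            lr, hr = lr.unsqueeze(0).to(device), hr.unsqueeze(0).to(device)
            sr = srcnn(lr)  # scorer only; not the model being trained
            losses.append((F.l1_loss(sr, hr).item(), idx))
    losses.sort(reverse=True)                 # hardest (highest loss) samples first
    n = len(losses)
    start = int(drop_hardest * n)             # skip the top 5% hardest samples
    end = start + int(keep_ratio * n)         # keep the following 50%
    keep_indices = [idx for _, idx in losses[start:end]]
    return torch.utils.data.Subset(dataset, keep_indices)

A target SR model (e.g., SwinIR or HAT) would then be trained on the returned Subset exactly as on the full dataset, at roughly half the data-loading and storage cost.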
BibTeX
@inproceedings{moser2024study,
title={A Study in Dataset Pruning for Image Super-Resolution},
author={Moser, Brian B and Raue, Federico and Dengel, Andreas},
booktitle={International Conference on Artificial Neural Networks},
pages={351--363},
year={2024},
organization={Springer}
}