γ-Razor: Hardness-Aware Dataset Pruning for Efficient Neural Network Training / Liu, Lei; Zhang, Peng; Liang, Yunji; Liu, Junrui; Morra, Lia; Guo, Bin; Yu, Zhiwen; Zhang, Yanyong; Zeng, Daniel D.. - In: IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS. - ISSN 2329-924X. - (2024), pp. 1-15. [10.1109/tcss.2024.3453600]
γ-Razor: Hardness-Aware Dataset Pruning for Efficient Neural Network Training
Morra, Lia;
2024
Abstract
Training deep neural networks (DNNs) on large-scale datasets is often inefficient, with high computational cost and significant energy consumption. Although great efforts have been made to optimize DNNs, few studies have focused on the inefficiency caused by data samples of little value to model training. In this article, we empirically demonstrate that sample complexity matters for model efficiency and that selecting representative samples improves it. In particular, we propose a hardness-aware dataset pruning method (γ-Razor) that selects representative samples from large-scale datasets and removes the less valuable ones for model training. γ-Razor is a two-stage framework that comprises interclass sampling and intraclass sampling. First, we introduce an inverse self-paced learning strategy to learn hard samples and adjust their weights adaptively according to the inverse frequency of effective samples in each class. For intraclass sampling, a hardness-aware cluster sampling algorithm is proposed to downsample easy samples within each class. To evaluate the performance of γ-Razor, we conducted extensive experiments on three large-scale datasets for image classification tasks. The experimental results show that models trained on the pruned datasets achieve competitive robustness and efficiency compared with their counterparts trained on the original large-scale datasets. Furthermore, models trained on the pruned datasets converge faster with lower energy consumption.
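The two-stage procedure outlined in the abstract (interclass re-weighting followed by intraclass downsampling of easy samples) can be illustrated with a minimal, hypothetical sketch. This is not the γ-Razor implementation from the paper: the use of per-sample loss as the hardness score, the effective-number weighting formula, the k-means clustering step, and all function and parameter names (`class_weights_inverse_effective`, `prune_per_class`, `hard_quantile`, `easy_keep_ratio`, `beta`) are assumptions made for illustration only.

```python
# Hypothetical illustration of hardness-aware, two-stage dataset pruning.
# NOTE: this is NOT the γ-Razor implementation; the hardness score
# (per-sample loss), the effective-number weighting formula, the k-means
# clustering step, and all parameter names are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans


def class_weights_inverse_effective(labels, beta=0.999):
    """Per-class weights proportional to the inverse 'effective number' of
    samples, (1 - beta**n_c) / (1 - beta); an assumed form of the paper's
    'inverse frequency of effective samples'."""
    classes, counts = np.unique(labels, return_counts=True)
    effective = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective
    weights = weights / weights.sum() * len(classes)  # mean weight ~ 1
    return dict(zip(classes.tolist(), weights.tolist()))


def prune_per_class(features, labels, losses,
                    hard_quantile=0.7, easy_keep_ratio=0.3, seed=0):
    """Return indices of a pruned subset: keep all per-class 'hard' samples
    (loss above the class-wise quantile) and a cluster-based downsample of
    the remaining 'easy' samples (one representative per k-means cluster)."""
    keep = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        thr = np.quantile(losses[idx], hard_quantile)
        hard, easy = idx[losses[idx] >= thr], idx[losses[idx] < thr]
        keep.extend(hard.tolist())
        n_clusters = max(1, int(round(len(easy) * easy_keep_ratio)))
        if len(easy) <= n_clusters:  # too few easy samples to downsample
            keep.extend(easy.tolist())
            continue
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        assign = km.fit_predict(features[easy])
        for k in range(n_clusters):
            members = easy[assign == k]
            if len(members) == 0:
                continue
            dist = np.linalg.norm(features[members] - km.cluster_centers_[k], axis=1)
            keep.append(int(members[np.argmin(dist)]))  # closest to centroid
    return np.array(sorted(keep))
```

Under these assumptions, the returned indices could be used to build the pruned training subset (e.g., a PyTorch `Subset`), while the per-class weights could scale the loss of the retained hard samples during training.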
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| -Razor_Hardness-Aware_Dataset_Pruning_for_Efficient_Neural_Network_Training.pdf | 2. Post-print / Author's Accepted Manuscript | Non-public - Private/restricted access | 2.36 MB | Adobe PDF | Not available |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2993544