Dataflow Restructuring for Active Memory Reduction in Deep Neural Networks / Cipolletta, A.; Calimera, A. - (2021), pp. 114-119. (Paper presented at the 2021 Design, Automation and Test in Europe Conference and Exhibition, DATE 2021, 01-05 February 2021) [10.23919/DATE51398.2021.9473965].
Dataflow Restructuring for Active Memory Reduction in Deep Neural Networks
Cipolletta A.; Calimera A.
2021
Abstract
Reducing the volume of the activation maps produced by the hidden layers of a Deep Neural Network (DNN) is a critical aspect of modern applications, as it affects the utilization of on-chip memory, the most limited and costly hardware resource. Despite the availability of many compression methods that leverage the statistical nature of deep learning to approximate and simplify the inference model, e.g., quantization and pruning, there is room for deterministic optimizations that instead tackle the problem from a computational viewpoint. This work belongs to the latter category, as it introduces a novel method for minimizing the active memory footprint. The proposed technique, which is data-, model-, compiler-, and hardware-agnostic, implements a function-preserving, automated graph restructuring in which memory peaks are suppressed and distributed over time, leading to flatter profiles with less memory pressure. Results collected on a representative class of Convolutional DNNs with different topologies, from Vgg16 and SqueezeNetV1.1 to the recent MobileNetV2, ResNet18, and InceptionV3, provide clear evidence of applicability, showing remarkable memory savings (62.9% on average) with low computational overhead (8.6% on average).
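For illustration only, the following minimal Python sketch (not the algorithm from the paper; all names and numbers are hypothetical) shows the quantity the abstract targets: the peak activation memory of a dataflow graph under a given execution schedule, obtained by tracking which tensors are live at each step. A function-preserving restructuring that splits or reorders operators so that fewer large tensors are resident at once lowers this peak without changing the network's output.

```python
# Minimal illustrative sketch (hypothetical, not the method from the paper):
# estimate the peak activation memory of a topologically ordered dataflow
# graph by tracking which tensors are live at each execution step.

def peak_activation_memory(schedule, sizes, readers, graph_inputs):
    """schedule: list of (op_name, input_tensors, output_tensor) in execution order.
    sizes: tensor name -> size in bytes.
    readers: tensor name -> number of ops that read it.
    graph_inputs: tensors resident before execution starts."""
    live = {t: sizes[t] for t in graph_inputs}   # input activations start resident
    pending = dict(readers)
    peak = sum(live.values())
    for _op, inputs, output in schedule:
        live[output] = sizes[output]             # allocate the op's output buffer
        peak = max(peak, sum(live.values()))     # memory pressure at this step
        for t in inputs:                         # free inputs with no pending readers
            pending[t] -= 1
            if pending[t] == 0:
                live.pop(t, None)
    return peak


# Toy residual block: the skip connection keeps "x" alive across both convolutions,
# so three tensors coexist at the peak (x, a/b, and the new output).
sizes = {"x": 8, "a": 8, "b": 8, "y": 8}
schedule = [("conv1", ["x"], "a"), ("conv2", ["a"], "b"), ("add", ["x", "b"], "y")]
readers = {"x": 2, "a": 1, "b": 1, "y": 0}
print(peak_activation_memory(schedule, sizes, readers, ["x"]))  # -> 24
```

In the spirit of the paper's graph restructuring, a transformation that, for example, tiles the skip path so only a fraction of "x" must stay resident at any time would flatten this profile and reduce the reported peak, while the computation itself remains unchanged.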
File | Access | Typology | License | Size | Format
---|---|---|---|---|---
DATE21_final_submitted.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 642.6 kB | Adobe PDF
Dataflow_Restructuring_for_Active_Memory_Reduction_in_Deep_Neural_Networks.pdf | Not available (request a copy) | 2a Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 409.83 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2921765