A Reconfigurable Multiplier/Dot-Product Unit for Precision-Scalable Deep Learning Applications / Urbinati, Luca; Casu, Mario R. - ELECTRONIC. - 1005:(2023), pp. 9-14. (Paper presented at the 53rd Annual Meeting of the Italian Electronics Society, held in Pizzo (VV), Italy, September 7-9, 2022) [10.1007/978-3-031-26066-7_2].

A Reconfigurable Multiplier/Dot-Product Unit for Precision-Scalable Deep Learning Applications

Luca Urbinati; Mario R. Casu
2023

Abstract

The bitwidth precision of activations and weights may vary across different Deep Learning (DL) applications, or across different phases of the same application. Moreover, the energy and latency of multiply-accumulate (MAC) units have to be minimized, especially at the edge. Hence, various precision-scalable MAC units optimized for DL have recently emerged. Our contribution is a new precision-configurable multiplier/dot-product unit based on a modified Radix-4 Booth signed multiplier with Sum-Together (ST) mode. Besides full-precision 16-bit multiplications, it can be reconfigured to perform dot products between two 8-bit or four 4-bit sub-words of the input operands without requiring an external adder, thus reducing the number of cycles of MAC operations. Synthesis results for performance, power, and area in a 28-nm technology show that our unit (1) is superior in area (≈35% less) to other state-of-the-art ST multipliers in the clock frequency range between 100 and 1000 MHz and (2) reduces latency by up to 4× when used to compute a convolutional layer, at the cost of limited overheads in area (+10%) and power (+13%) compared to a conventional 16-bit Booth multiplier. This unit can play an important role in designing variable-precision MAC units or DL accelerators for edge devices.
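
The Sum-Together behaviour described in the abstract can be illustrated with a small behavioural model. This is only a sketch of the arithmetic, not the authors' Booth-based hardware: the function names (`pack`, `split_signed`, `st_multiply`, `ideal_mac_cycles`), the LSB-first sub-word packing, and the one-operation-per-cycle idealization are all assumptions introduced for illustration.

```python
import math


def pack(values: list[int], sub_bits: int) -> int:
    """Pack signed sub-words into one word, LSB first (assumed layout)."""
    mask = (1 << sub_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * sub_bits)
    return word


def split_signed(word: int, sub_bits: int, total_bits: int = 16) -> list[int]:
    """Split a word into signed two's-complement sub-words, LSB first."""
    mask = (1 << sub_bits) - 1
    subs = []
    for i in range(total_bits // sub_bits):
        field = (word >> (i * sub_bits)) & mask
        # Reinterpret the raw field as a signed value.
        if field >= 1 << (sub_bits - 1):
            field -= 1 << sub_bits
        subs.append(field)
    return subs


def st_multiply(a: int, b: int, sub_bits: int = 16) -> int:
    """Behavioural model of a 16-bit multiplier with Sum-Together mode.

    sub_bits=16: one signed 16x16 product.
    sub_bits=8:  dot product of two signed 8-bit sub-word pairs.
    sub_bits=4:  dot product of four signed 4-bit sub-word pairs.
    """
    if sub_bits not in (4, 8, 16):
        raise ValueError("sub_bits must be 4, 8, or 16")
    xs = split_signed(a & 0xFFFF, sub_bits)
    ys = split_signed(b & 0xFFFF, sub_bits)
    # The partial products are summed inside the unit, so no external adder.
    return sum(x * y for x, y in zip(xs, ys))


def ideal_mac_cycles(n_products: int, sub_bits: int) -> int:
    """Idealized cycle count, assuming one ST operation per cycle."""
    return math.ceil(n_products / (16 // sub_bits))
```

For example, `st_multiply(pack([1, -2, 3, -4], 4), pack([5, 6, -7, -8], 4), sub_bits=4)` accumulates four 4-bit products in a single operation. Under the stated idealization, a workload of 144 products needs 144, 72, or 36 operations at 16, 8, or 4 bits, which is consistent with the "up to 4×" latency reduction quoted above.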
978-3-031-26065-0
978-3-031-26066-7
Files in this record:

File: sie22_urbinati_7699_preprint.pdf
Access: Open Access since 29/02/2024
Description: Pre-print version
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.1 MB
Format: Adobe PDF

File: Urbinati_and_Casu_2023_A_Reconfigurable_MultiplierDot-Product_Unit_for_Precision-scalable_Deep_Learning_Applications.pdf
Access: not available
Description: Post-print version
Type: 2a Post-print publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 339.29 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2977769