To achieve an optimal balance between accuracy and latency in Deep Neural Networks (DNNs), precision-scalability has become a paramount feature for hardware specialized for Machine Learning (ML) workloads. Recently, many precision-scalable (PS) multipliers and multiply-and-accumulate (MAC) units have been proposed. They are mainly divided in two categories, Sum-Apart (SA) and Sum-Together (ST), and have been always presented as alternative implementations. Instead, in this paper, we introduce for the first time a new class of PS Sum-Together/Apart Reconfigurable multipliers, which we call STAR, designed to support both SA and ST modes with a single reconfigurable architecture. STAR multipliers could be useful in MAC units of CPU or hardware accelerators, for example, enabling them to handle both 2D Convolution (in ST mode) and Depth-wise Convolution (in SA mode) with a unique PS hardware design, thus saving hardware resources. We derive four distinct STAR multiplier architectures, including two derived from the well-known Divide-and-Conquer and Sub-word Parallel SA and ST families, which support 16, 8 and 4-bit precision. We perform an extensive exploration of these architectures in terms of power, performance, and area, across a wide range of clock frequency constraints, from 0.4 to 2.0 GHz, targeting a 28-nm CMOS technology. We identify the Pareto-optimal solutions with the lowest area and power in the low-frequency, mid-frequency, and high-frequency ranges. Our findings allow designers to select the best STAR solution depending on their design target, either low-power and low-area, high performance, or balanced.

STAR: Sum-Together/Apart Reconfigurable Multipliers for Precision-Scalable ML Workloads / Manca, Edward; Urbinati, Luca; Casu, Mario R.. - ELETTRONICO. - (2024), pp. 1-6. (Intervento presentato al convegno Design, Automation & Test in Europe Conference & Exhibition (DATE) tenutosi a Valencia (Spain) nel 25-27 March 2024).

STAR: Sum-Together/Apart Reconfigurable Multipliers for Precision-Scalable ML Workloads

Edward Manca;Luca Urbinati;Mario R. Casu
2024

Abstract

To achieve an optimal balance between accuracy and latency in Deep Neural Networks (DNNs), precision-scalability has become a paramount feature for hardware specialized for Machine Learning (ML) workloads. Recently, many precision-scalable (PS) multipliers and multiply-and-accumulate (MAC) units have been proposed. They are mainly divided in two categories, Sum-Apart (SA) and Sum-Together (ST), and have been always presented as alternative implementations. Instead, in this paper, we introduce for the first time a new class of PS Sum-Together/Apart Reconfigurable multipliers, which we call STAR, designed to support both SA and ST modes with a single reconfigurable architecture. STAR multipliers could be useful in MAC units of CPU or hardware accelerators, for example, enabling them to handle both 2D Convolution (in ST mode) and Depth-wise Convolution (in SA mode) with a unique PS hardware design, thus saving hardware resources. We derive four distinct STAR multiplier architectures, including two derived from the well-known Divide-and-Conquer and Sub-word Parallel SA and ST families, which support 16, 8 and 4-bit precision. We perform an extensive exploration of these architectures in terms of power, performance, and area, across a wide range of clock frequency constraints, from 0.4 to 2.0 GHz, targeting a 28-nm CMOS technology. We identify the Pareto-optimal solutions with the lowest area and power in the low-frequency, mid-frequency, and high-frequency ranges. Our findings allow designers to select the best STAR solution depending on their design target, either low-power and low-area, high performance, or balanced.
2024
978-3-9819263-8-5
File in questo prodotto:
File Dimensione Formato  
STAR_Sum-Together_Apart_Reconfigurable_Multipliers_for_Precision-Scalable_ML_Workloads.pdf

non disponibili

Descrizione: paper post-print 2a
Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.32 MB
Formato Adobe PDF
1.32 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Manca Urbinati Casu - STAR - DATE2024 - camera ready_post_print_accepted.pdf

accesso aperto

Descrizione: paper post-print 2
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 863.35 kB
Formato Adobe PDF
863.35 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2989782