Accelerating Quantized DNN Layers on RISC-V with a STAR MAC Unit / Manca, Edward; Urbinati, Luca; Casu, Mario R. - Electronic. - 1113:(2024), pp. 43-53. (Paper presented at the 54th Annual Meeting of the Italian Electronics Society, held in Noto (SR), Italy, on September 6-8, 2023) [10.1007/978-3-031-48711-8_6].
Accelerating Quantized DNN Layers on RISC-V with a STAR MAC Unit
Manca, Edward; Urbinati, Luca; Casu, Mario R.
2024
Abstract
To support quantized neural networks in low-end CPUs, we propose STAR MAC, a reconfigurable multiply-and-accumulate unit based on a modified Baugh-Wooley architecture that operates at a variable reduced precision. We integrated it into a small RISC-V processor called Ibex, obtaining speed-ups of up to 5.8× in Fully-Connected (FC) layers, 3.7× in 2D-Convolution (2DConv) layers, and 2.8× in Depth-Wise Convolution (DWConv) layers with respect to the original Ibex core (Orig.), and of up to 4.5× in FC layers, 3.0× in 2DConv layers, and 2.3× in DWConv layers against a modified Ibex core supporting standard 32-bit MAC operations (Orig.+MAC). In a 28-nm technology at target clock frequencies of 200 and 600 MHz, the area is 0.015 and 0.017 mm² and the power is 1.5 and 4.3 mW, respectively, with a limited overhead within 10% and 3% with respect to Orig., and within 3% and 3% with respect to Orig.+MAC.

File | Description | Type | License | Access | Size | Format
---|---|---|---|---|---|---
SIE2023_proc_099_post_print.pdf | Post-print version | 2a Post-print, publisher's version / Version of Record | Not public, private/restricted access | Not available | 585.97 kB | Adobe PDF
SIE2023_proc_099_post_print_accepted.pdf | Post-print, author's accepted manuscript | 2. Post-print / Author's Accepted Manuscript | Public, all rights reserved | Open access | 1.04 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2984332