To support quantized neural networks on low-end CPUs, we propose STAR MAC, a reconfigurable multiply-and-accumulate unit based on a modified Baugh-Wooley architecture that operates at variable reduced precision. We integrated it into Ibex, a small RISC-V processor, obtaining speedups of up to 5.8× in Fully-Connected (FC) layers, 3.7× in 2D-Convolution (2DConv) layers, and 2.8× in Depth-Wise Convolution (DWConv) layers with respect to the original Ibex core (Orig.), and of up to 4.5× in FC layers, 3.0× in 2DConv layers, and 2.3× in DWConv layers against a modified Ibex core supporting standard 32-bit MAC operations (Orig.+MAC). In a 28-nm technology with target clock frequencies of 200 and 600 MHz, area is 0.015 and 0.017 mm² and power is 1.5 and 4.3 mW, respectively, with limited overheads within 10% and 3% with respect to Orig., and within 3% and 3% against Orig.+MAC.
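The abstract names two ingredients, a modified Baugh-Wooley multiplier and variable reduced precision, without giving details. The following Python sketch is a minimal behavioral model of both under stated assumptions; it is not the authors' design. It assumes that 32-bit operand words are split into equal signed lanes of 8, 4, or 2 bits (the lane widths are an assumption, not taken from the abstract), and the helper names `baugh_wooley_mul`, `unpack_lanes`, `pack_lanes`, and `packed_mac` are hypothetical. The accumulator is an unbounded Python integer, whereas real hardware would use a fixed width.

```python
def baugh_wooley_mul(a: int, b: int, n: int) -> int:
    """Signed n-bit x n-bit multiply using the modified Baugh-Wooley array.

    Partial products that involve exactly one sign bit are inverted, and
    the constants 2**n and 2**(2n-1) are added, so that every term in the
    array can be summed as a plain unsigned bit.
    """
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask            # two's-complement bit patterns
    acc = 0
    for i in range(n):
        for j in range(n):
            pp = ((ua >> i) & 1) & ((ub >> j) & 1)
            if (i == n - 1) != (j == n - 1):
                pp ^= 1                    # invert mixed sign-bit terms
            acc += pp << (i + j)
    acc += (1 << n) + (1 << (2 * n - 1))   # constant correction terms
    acc &= (1 << (2 * n)) - 1              # keep the 2n-bit product
    if acc >> (2 * n - 1):                 # reinterpret as signed
        acc -= 1 << (2 * n)
    return acc


def unpack_lanes(word: int, n: int) -> list[int]:
    """Split a 32-bit word into 32 // n signed n-bit lanes, LSB lane first."""
    lanes = []
    for k in range(32 // n):
        v = (word >> (k * n)) & ((1 << n) - 1)
        lanes.append(v - (1 << n) if v >> (n - 1) else v)
    return lanes


def pack_lanes(values: list[int], n: int) -> int:
    """Pack signed n-bit values into one word, LSB lane first."""
    word = 0
    for k, v in enumerate(values):
        word |= (v & ((1 << n) - 1)) << (k * n)
    return word


def packed_mac(acc: int, wa: int, wb: int, n: int) -> int:
    """acc += dot product of the n-bit lanes packed in wa and wb."""
    products = (baugh_wooley_mul(x, y, n)
                for x, y in zip(unpack_lanes(wa, n), unpack_lanes(wb, n)))
    return acc + sum(products)


# Four 8-bit activations and weights packed into one word each.
wa = pack_lanes([1, -2, 3, -4], 8)
wb = pack_lanes([5, 6, -7, -8], 8)
print(packed_mac(10, wa, wb, 8))  # 14
```

In this model a single `packed_mac` call on 4-bit lanes performs eight multiply-accumulates and on 2-bit lanes sixteen, which illustrates why narrower precisions can reduce the number of MAC instructions per layer. The speedups reported in the abstract are measured results and are not derived from this sketch.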

Accelerating Quantized DNN Layers on RISC-V with a STAR MAC Unit / Manca, Edward; Urbinati, Luca; Casu, Mario R. - ELECTRONIC. - 1113:(2024), pp. 43-53. (Paper presented at the 54th Annual Meeting of the Italian Electronics Society, held in Noto (SR), Italy, September 6-8, 2023) [10.1007/978-3-031-48711-8_6].

Accelerating Quantized DNN Layers on RISC-V with a STAR MAC Unit

Manca, Edward;Urbinati, Luca;Casu, Mario R.
2024

ISBN: 978-3-031-48710-1
ISBN: 978-3-031-48711-8
Files in this record:

File: SIE2023_proc_099_post_print.pdf (not available)
Description: Post-print version
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public, private/restricted access
Size: 585.97 kB
Format: Adobe PDF

File: SIE2023_proc_099_post_print_accepted.pdf (open access)
Description: Post-print, author's accepted manuscript
Type: 2. Post-print / Author's Accepted Manuscript
License: Public, all rights reserved
Size: 1.04 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2984332