Designs implemented on field-programmable gate arrays (FPGAs) via high-level synthesis (HLS) suffer from off-chip memory latency and bandwidth bottlenecks. FPGAs can access both large but slow off-chip memories (DRAM), and fast but small on-chip memories (block RAMs and registers). HLS tools allow exploiting the memory hierarchy in a scratchpad-like fashion, requring a significant manual effort. We propose an automation of the FPGA memory management in Xilinx Vitis HLS through a fully- configurable C++ source-level cache. Each DRAM-mapped array can be associated with a private level 2 (L2) cache with one or more ports, and each port can optionally provide level 1 cache. The L2 cache runs in a separate dataflow task with respect to the application accessing it. This solution isolates off-chip memory accesses and data buffering into dedicated dataflow tasks, resembling the load, compute, store design paradigm, but without the drawback of manual algorithm refactoring. Experimental results collected from FPGA board show that our cache speeds up the execution of a variety of benchmarks by up to 60 times compared to the out-of-the-box solution provided by HLS, requiring very limited optimization effort. Our caches are not meant to compete with manually optimized implementations quality of results (QoR), but rather to significantly save design effort, in exchange for some QoR, to make the HLS flow a bit more software-like, allowing the designer to focus on algorithmic optimizations, rather than on explicit memory management. Moreover, caching could be the only feasible memory optimization for algorithms with data-dependent or irregular memory access patterns, but with good data locality.

Array-specific dataflow caches for high-level synthesis of memory-intensive algorithms on FPGAs / Brignone, Giovanni; Jamal, Muhammad Usman; Lazarescu, Mihai T.; Lavagno, Luciano. - In: IEEE ACCESS. - ISSN 2169-3536. - ELETTRONICO. - 10:(2022), pp. 118858-118877. [10.1109/ACCESS.2022.3219868]

Array-specific dataflow caches for high-level synthesis of memory-intensive algorithms on FPGAs

Brignone, Giovanni;Jamal, Muhammad Usman;Lazarescu, Mihai T.;Lavagno, Luciano
2022

Abstract

Designs implemented on field-programmable gate arrays (FPGAs) via high-level synthesis (HLS) suffer from off-chip memory latency and bandwidth bottlenecks. FPGAs can access both large but slow off-chip memories (DRAM), and fast but small on-chip memories (block RAMs and registers). HLS tools allow exploiting the memory hierarchy in a scratchpad-like fashion, requring a significant manual effort. We propose an automation of the FPGA memory management in Xilinx Vitis HLS through a fully- configurable C++ source-level cache. Each DRAM-mapped array can be associated with a private level 2 (L2) cache with one or more ports, and each port can optionally provide level 1 cache. The L2 cache runs in a separate dataflow task with respect to the application accessing it. This solution isolates off-chip memory accesses and data buffering into dedicated dataflow tasks, resembling the load, compute, store design paradigm, but without the drawback of manual algorithm refactoring. Experimental results collected from FPGA board show that our cache speeds up the execution of a variety of benchmarks by up to 60 times compared to the out-of-the-box solution provided by HLS, requiring very limited optimization effort. Our caches are not meant to compete with manually optimized implementations quality of results (QoR), but rather to significantly save design effort, in exchange for some QoR, to make the HLS flow a bit more software-like, allowing the designer to focus on algorithmic optimizations, rather than on explicit memory management. Moreover, caching could be the only feasible memory optimization for algorithms with data-dependent or irregular memory access patterns, but with good data locality.
2022
File in questo prodotto:
File Dimensione Formato  
ACCESS3219868.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Creative commons
Dimensione 1.17 MB
Formato Adobe PDF
1.17 MB Adobe PDF Visualizza/Apri
Array-Specific_Dataflow_Caches_for_High-Level_Synthesis_of_Memory-Intensive_Algorithms_on_FPGAs.pdf

accesso aperto

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Creative commons
Dimensione 2.15 MB
Formato Adobe PDF
2.15 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2973073