We propose a new architecture called HTA for high throughput irregular HPC applications with little data reuse. HTA reduces the contention within the memory system with the help of a partitioned memory controller that is amenable for 2.5D implementation using Silicon Photonics. In terms of scalability, HTA supports 4 × higher number of compute units compared to the state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30%, and improves the variability in memory access latency by 10% to 60%. Our results show that the HTA improves the L1 miss penalty by 2.3 × to 5 × over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that the HTA can provide up to 2 × speedup.

HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads / Fotouhi, P.; Fariborz, M.; Proietti, R.; Lowe-Power, J.; Akella, V.; Yoo, S. J. B.. - STAMPA. - 12728:(2021), pp. 176-194. (Intervento presentato al convegno 36th International Conference on High Performance Computing, ISC High Performance 2021 tenutosi a Virtual nel 24 June - 2 July 2021) [10.1007/978-3-030-78713-4_10].

HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads

Proietti R.;
2021

Abstract

We propose a new architecture called HTA for high throughput irregular HPC applications with little data reuse. HTA reduces the contention within the memory system with the help of a partitioned memory controller that is amenable for 2.5D implementation using Silicon Photonics. In terms of scalability, HTA supports 4 × higher number of compute units compared to the state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30%, and improves the variability in memory access latency by 10% to 60%. Our results show that the HTA improves the L1 miss penalty by 2.3 × to 5 × over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that the HTA can provide up to 2 × speedup.
2021
978-3-030-78712-7
978-3-030-78713-4
File in questo prodotto:
File Dimensione Formato  
Pages from 978-3-030-78713-4.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 1.3 MB
Formato Adobe PDF
1.3 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
ISC_2021_HTA.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 668.79 kB
Formato Adobe PDF
668.79 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2973500