We propose a new architecture called HTA for high throughput irregular HPC applications with little data reuse. HTA reduces the contention within the memory system with the help of a partitioned memory controller that is amenable for 2.5D implementation using Silicon Photonics. In terms of scalability, HTA supports 4 × higher number of compute units compared to the state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30%, and improves the variability in memory access latency by 10% to 60%. Our results show that the HTA improves the L1 miss penalty by 2.3 × to 5 × over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that the HTA can provide up to 2 × speedup.
HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads / Fotouhi, P.; Fariborz, M.; Proietti, R.; Lowe-Power, J.; Akella, V.; Yoo, S. J. B.. - STAMPA. - 12728:(2021), pp. 176-194. (Intervento presentato al convegno 36th International Conference on High Performance Computing, ISC High Performance 2021 tenutosi a Virtual nel 24 June - 2 July 2021) [10.1007/978-3-030-78713-4_10].
HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads
Proietti R.;
2021
Abstract
We propose a new architecture called HTA for high throughput irregular HPC applications with little data reuse. HTA reduces the contention within the memory system with the help of a partitioned memory controller that is amenable for 2.5D implementation using Silicon Photonics. In terms of scalability, HTA supports 4 × higher number of compute units compared to the state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30%, and improves the variability in memory access latency by 10% to 60%. Our results show that the HTA improves the L1 miss penalty by 2.3 × to 5 × over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that the HTA can provide up to 2 × speedup.File | Dimensione | Formato | |
---|---|---|---|
Pages from 978-3-030-78713-4.pdf
accesso riservato
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Non Pubblico - Accesso privato/ristretto
Dimensione
1.3 MB
Formato
Adobe PDF
|
1.3 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
ISC_2021_HTA.pdf
accesso aperto
Tipologia:
2. Post-print / Author's Accepted Manuscript
Licenza:
Pubblico - Tutti i diritti riservati
Dimensione
668.79 kB
Formato
Adobe PDF
|
668.79 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/2973500