Machine-learning-aided cognitive reconfiguration for flexible-bandwidth HPC and data center networks [Invited] / Chen, Xl; Proietti, R; Fariborz, M; Liu, Cy; Yoo, Sjb. - In: JOURNAL OF OPTICAL COMMUNICATIONS AND NETWORKING. - ISSN 1943-0620. - 13:6(2021), pp. C10-C20. [10.1364/JOCN.412360]

Machine-learning-aided cognitive reconfiguration for flexible-bandwidth HPC and data center networks [Invited]

Proietti, R;
2021

Abstract

This paper proposes a machine-learning (ML)-aided cognitive approach for effective bandwidth reconfiguration in optically interconnected data center/high-performance computing (HPC) systems. The proposed approach relies on a Hyper-X-like architecture augmented with flexible-bandwidth photonic interconnections at large scales using a hierarchical intra/inter-POD photonic switching layout. We first formulate the optimization of the connectivity graph and routing scheme as a mixed-integer linear programming model. A two-phase heuristic algorithm and a joint optimization approach are devised to solve the problem with low time complexity. Then, we propose an ML-based end-to-end performance estimator to assist the network control plane with intelligent decision making for bandwidth reconfiguration. Numerical simulations using traffic distribution profiles extracted from HPC application traces, as well as random traffic matrices, verify the accuracy of the ML-based estimator (<9% error) and demonstrate up to 5× throughput gain from the proposed approach compared with the baseline Hyper-X network using fixed all-to-all intra/inter-POD interconnects. © 2021 Optical Society of America
Files in this item:

jocn-13-6-C10.pdf (not available)
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public, private/restricted access
Size: 2.63 MB
Format: Adobe PDF

ViewAcceptedManuscript.pdf (open access)
Type: 2. Post-print / Author's Accepted Manuscript
License: Public, all rights reserved
Size: 1.21 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2972080