Dynamic Job Allocation on Federated Cloud-HPC Environments / Vitali, G.; Scionti, A.; Viviani, P.; Vercellino, C.; Terzo, O. - Electronic. - 497:(2022), pp. 71-82. (Paper presented at the 16th International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2022, held online in 2022) [10.1007/978-3-031-08812-4_8].

Dynamic Job Allocation on Federated Cloud-HPC Environments

Vitali G.; Scionti A.; Vercellino C.
2022

Abstract

With the growing complexity of workflows brought by the recent integration of machine learning, deep learning and big data analytics techniques, there is an ever-increasing demand for compute, network and storage resources, which requires innovative approaches to their management, as well as to their easy access and use (including the cloud model). Although today's HPC infrastructures offer an abundance of resources, these remain shared among users, and for certain use cases (e.g., urgent computing applications) they may still be insufficient to fulfil the workflow requirements. Moreover, specific computing resources (e.g., hardware accelerators) may be accessible only within certain datacentres. To cope with these challenges, a secure interconnection among multiple HPC datacentres that allows mutual access to their resources (federation) is considered. This paper focuses on extending the SimGrid software library, a C++-based simulation framework, to evaluate the job allocation strategies that lie at the core of a federated execution platform. A greedy allocation strategy has been evaluated against random and round-robin approaches; this greedy strategy has then been integrated into the main orchestration service developed in the context of the LEXIS federated execution platform. Tests with real workflows showed that the greedy allocation strategy can dynamically select the most suitable execution cluster for different jobs.
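The abstract does not state the greedy criterion. As a hedged illustration only, the following Python sketch contrasts the three policies it names (greedy, random, round-robin) on a toy model. It assumes the greedy policy assigns each job to the cluster with the earliest estimated completion time (queued work plus job work divided by cluster speed). The `Cluster` model, `makespan` helper, speeds and job sizes are all hypothetical; this is neither the SimGrid API nor the LEXIS orchestrator's code.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Cluster:
    """Hypothetical cluster model (not SimGrid's API)."""
    name: str
    speed: float             # work units processed per second (assumed)
    busy_until: float = 0.0  # time at which the queued work completes


# A policy receives the clusters, the job's work and the job index,
# and returns the index of the chosen cluster.
Policy = Callable[[List[Cluster], float, int], int]


def greedy(clusters: List[Cluster], work: float, _job: int) -> int:
    """Assumed greedy rule: earliest estimated completion time (ties go to the lower index)."""
    finish = [c.busy_until + work / c.speed for c in clusters]
    return min(range(len(clusters)), key=finish.__getitem__)


def round_robin(clusters: List[Cluster], _work: float, job: int) -> int:
    """Cycle through clusters in order, ignoring their load."""
    return job % len(clusters)


def make_random(seed: int) -> Policy:
    """Uniform random choice, seeded for reproducibility."""
    rng = random.Random(seed)
    return lambda clusters, _work, _job: rng.randrange(len(clusters))


def makespan(speeds: List[float], jobs: List[float], policy: Policy) -> float:
    """Allocate jobs in arrival order; return the time the last cluster finishes."""
    clusters = [Cluster(f"c{i}", s) for i, s in enumerate(speeds)]
    for j, work in enumerate(jobs):
        chosen = clusters[policy(clusters, work, j)]
        chosen.busy_until += work / chosen.speed
    return max(c.busy_until for c in clusters)


if __name__ == "__main__":
    speeds = [1.0, 4.0]          # one slow and one fast cluster (illustrative)
    jobs = [4.0, 4.0, 4.0, 4.0]  # four equal jobs (illustrative)
    print("greedy     ", makespan(speeds, jobs, greedy))       # 4.0
    print("round-robin", makespan(speeds, jobs, round_robin))  # 8.0
    print("random     ", makespan(speeds, jobs, make_random(0)))
```

With one slow and one fast cluster, this greedy rule keeps the fast cluster busy and finishes at t = 4, whereas round-robin sends half the jobs to the slow cluster and finishes at t = 8; the random result depends on the seed. A real federation would also weigh factors such as data location and accelerator availability, which this toy model omits.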
Year: 2022
ISBN: 9783031088117
ISBN: 9783031088124
Files in this item:
dynamic_job.pdf (2.51 MB, Adobe PDF)
Open Access from 18/06/2023
Type: 2. Post-print / Author's Accepted Manuscript
Licence: PUBLIC - All rights reserved

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2974767