With the growing complexity of workflows brought by the recent integration of machine learning, deep learning and big data analytics techniques, there is an ever-increasing demand for compute, network and storage resources, which require innovative approaches to their management as well as easy access and use (including through the cloud model). Although resources are abundant in today’s HPC infrastructures, they remain shared among users, and for certain use cases (e.g., urgent computing applications) they may still not be enough to fulfil the workflow requirements. Also, specific computing resources (e.g., hardware accelerators) may be accessible only within certain datacentres. To cope with these challenges, a secure interconnection among multiple HPC datacentres that allows mutual access to their resources (federation) is considered. This paper focuses on the extension of the SimGrid software library, a C++-based simulation framework, for evaluating the job allocation strategies that lie at the core of a federated execution platform. A greedy allocation strategy has been evaluated against random and round-robin approaches; this greedy strategy has then been integrated into the main orchestration service developed in the context of the LEXIS federated execution platform. Tests with real workflows showed that the greedy allocation strategy can dynamically select the most suitable execution cluster for different jobs.
Dynamic Job Allocation on Federated Cloud-HPC Environments / Vitali, G.; Scionti, A.; Viviani, P.; Vercellino, C.; Terzo, O. - ELECTRONIC. - 497:(2022), pp. 71-82. (Paper presented at the 16th International Conference on Complex, Intelligent and Software Intensive Systems, CISIS 2022, held online in 2022) [10.1007/978-3-031-08812-4_8].
Dynamic Job Allocation on Federated Cloud-HPC Environments
Vitali G.;Scionti A.;Vercellino C.;
2022
File | Size | Format |
---|---|---|
dynamic_job.pdf (Open Access since 18/06/2023; Type: 2. Post-print / Author's Accepted Manuscript; License: Public - All rights reserved) | 2.51 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2974767