
Taming Multi-node Accelerated Analytics: An Experience in Porting MATLAB to Scale with Python / Viviani, Paolo; Vitali, Giacomo; Lengani, Davide; Scionti, Alberto; Vercellino, Chiara; Terzo, Olivier. - ELECTRONIC. - 497 LNNS (2022), pp. 200-210. (Presented at the 16th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS-2022), held online, June 29 – July 1, 2022) [10.1007/978-3-031-08812-4_20].

Taming Multi-node Accelerated Analytics: An Experience in Porting MATLAB to Scale with Python

Giacomo Vitali; Alberto Scionti; Chiara Vercellino
2022

Abstract

High Performance Data Analytics (HPDA) at scale is a multifaceted problem that involves distributed computing resources, their location with respect to the data placement, fast networks, optimized I/O, hardware accelerators, and a fairly complex software stack that draws from both the legacy of HPC and the cloud world. This complexity does not cope well with the needs of domain experts, who want to focus on their algorithms without having to understand all the features of the underlying infrastructure. Among these domain experts, engineers often rely on MATLAB to quickly model their complex numerical computations with a simple, textbook-like syntax and an effective Integrated Development Environment (IDE). On the other hand, MATLAB was not designed with large-scale, out-of-core computations in mind, despite the introduction of some parallel computing tools (e.g., distributed arrays and spmd blocks). In an ideal world, a domain expert would focus only on the application logic, while runtime/parallel computing experts would provide tools that act behind the scenes to efficiently distribute the computations on the available resources and deliver optimal performance. In real life, however, the domain expert often prototypes the code in MATLAB at a small scale, and HPDA/HPC experts then leverage the proper tools to deploy it at scale; this causes a significant effort overhead, as development needs to be performed twice, possibly with multiple iterations. The rise of Python as the language of choice for a huge number of data scientists, along with its open ecosystem, has led to the development of many tools that try to achieve a convergence between the goals of domain experts and the need for performance and scalability. Whether building upon existing frameworks (e.g., PySpark) or starting from scratch in pure Python (e.g., Dask), these tools promise to let engineers write their code with a reasonably textbook-like syntax while retaining the capability to scale across multiple nodes (and accelerators) with minimal changes to the application code. This paper discusses the process of porting an engineering application written in MATLAB (parallelized with existing toolboxes) to Python using Dask. An indication of the scalability results is also given, including a 20x speedup with respect to the MATLAB code using the same setup.
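As an illustration of the "textbook-like syntax" the abstract refers to, the sketch below (not taken from the paper; array shapes and chunk sizes are arbitrary) expresses a NumPy-style computation with Dask arrays. The same code runs on a laptop or, by connecting a `dask.distributed` client to a cluster scheduler, across multiple nodes without changes to the application logic.

```python
import dask.array as da

# Build a large array lazily, split into chunks; chunking is what lets
# Dask spread the work across workers (and nodes) and operate
# out-of-core instead of holding the whole array in one memory space.
x = da.random.random((8_000, 8_000), chunks=(1_000, 1_000))

# Textbook-like expression: Dask records a task graph rather than
# computing eagerly, mirroring the NumPy/MATLAB-style syntax.
y = (da.sin(x) + x ** 2).mean(axis=0)

result = y.compute()  # executes the graph in parallel
print(result.shape)   # one mean per column -> (8000,)
```

To target a multi-node cluster instead of the local machine, one would typically create a `dask.distributed.Client` pointing at the scheduler address before calling `compute()`; the array code itself stays the same.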
2022
ISBN: 9783031088117; 9783031088124
Files in this record:

taming_multinode.pdf — Preprint / submitted version (pre-review); non-public license (private/restricted access); 198.64 kB; Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2974766