Due to the current high availability of omics, data-driven biology has greatly expanded, and several papers have reviewed state-of-the-art technologies. Nowadays, two main types of investigation are available for a multi-omics dataset: extraction of relevant features for a meaningful biological interpretation and clustering of the samples. In the latter case, a few reviews refer to some outdated or no longer available methods, whereas others lack the description of relevant clustering metrics to compare the main approaches. This work provides a general overview of the major techniques in this area, divided into four groups: graph, dimensionality reduction, statistical and neural-based. Besides, eight tools have been tested both on a synthetic and a real biological dataset. An extensive performance comparison has been provided using four clustering evaluation scores: Peak Signal-to-Noise Ratio (PSNR), Davies-Bouldin(DB) index, Silhouette value and the harmonic mean of cluster purity and efficiency. The best results were obtained by using the dimensionality reduction, either explicitly or implicitly, as in the neural architecture.

A survey on data integration for multi-omics sample clustering / Lovino, M.; Randazzo, V.; Ciravegna, G.; Barbiero, P.; Ficarra, E.; Cirrincione, G.. - In: NEUROCOMPUTING. - ISSN 0925-2312. - ELETTRONICO. - 488:(2022), pp. 494-508. [10.1016/j.neucom.2021.11.094]

A survey on data integration for multi-omics sample clustering

Lovino M.;Randazzo V.;Ciravegna G.;Ficarra E.;Cirrincione G.
2022

Abstract

Due to the current high availability of omics, data-driven biology has greatly expanded, and several papers have reviewed state-of-the-art technologies. Nowadays, two main types of investigation are available for a multi-omics dataset: extraction of relevant features for a meaningful biological interpretation and clustering of the samples. In the latter case, a few reviews refer to some outdated or no longer available methods, whereas others lack the description of relevant clustering metrics to compare the main approaches. This work provides a general overview of the major techniques in this area, divided into four groups: graph, dimensionality reduction, statistical and neural-based. Besides, eight tools have been tested both on a synthetic and a real biological dataset. An extensive performance comparison has been provided using four clustering evaluation scores: Peak Signal-to-Noise Ratio (PSNR), Davies-Bouldin(DB) index, Silhouette value and the harmonic mean of cluster purity and efficiency. The best results were obtained by using the dimensionality reduction, either explicitly or implicitly, as in the neural architecture.
File in questo prodotto:
File Dimensione Formato  
Randazzo-ASurvey.pdf

accesso aperto

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Creative commons
Dimensione 2.16 MB
Formato Adobe PDF
2.16 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2948440