Owing to the current wide availability of omics data, data-driven biology has greatly expanded, and several papers have reviewed state-of-the-art technologies. Two main types of investigation are available for a multi-omics dataset: extraction of relevant features for a meaningful biological interpretation, and clustering of the samples. For the latter, a few reviews refer to methods that are outdated or no longer available, whereas others do not describe the clustering metrics needed to compare the main approaches. This work provides a general overview of the major techniques in this area, divided into four groups: graph-based, dimensionality reduction-based, statistical and neural-based. In addition, eight tools have been tested on both a synthetic and a real biological dataset. An extensive performance comparison is provided using four clustering evaluation scores: Peak Signal-to-Noise Ratio (PSNR), Davies-Bouldin (DB) index, Silhouette value and the harmonic mean of cluster purity and efficiency. The best results were obtained by using dimensionality reduction, either explicitly or implicitly, as in the neural architecture.
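As an illustration of one of the evaluation scores named above, the following minimal sketch computes the mean Silhouette value from its standard definition. It is not taken from the survey: the function name `silhouette_score`, the choice of Euclidean distance, and the toy points are assumptions made only for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean Silhouette value over all samples (Euclidean distance assumed)."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette value needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from sample i to the members of a cluster, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(silhouette_score(toy_points, [0, 0, 1, 1]))  # labels match the groups
print(silhouette_score(toy_points, [0, 1, 0, 1]))  # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that samples sit closer to a different cluster than to their own. In practice a library implementation would normally be preferred over a hand-written one.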
A survey on data integration for multi-omics sample clustering / Lovino, M.; Randazzo, V.; Ciravegna, G.; Barbiero, P.; Ficarra, E.; Cirrincione, G. - In: NEUROCOMPUTING. - ISSN 0925-2312. - ELECTRONIC. - (2021). [10.1016/j.neucom.2021.11.094]
|Title:||A survey on data integration for multi-omics sample clustering|
|Publication date:||2021|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1016/j.neucom.2021.11.094|
|Appears in categories:||1.1 Journal article|