In real-world scenarios, where datasets often lack strong annotations, unsupervised learning enables the discovery of new knowledge from unlabeled data. A challenging issue in unsupervised learning, however, is the uncertain robustness of the obtained clusters, since noisy data and different algorithms and settings can lead to different results. To address this challenge, we evaluated the robustness of unsupervised clustering techniques to both noise and norm changes. The analysis used a real-world dataset of histopathological images from colon cancer patients. First-order and texture features were extracted patch-wise for each patient, aggregated at the patient level, and used as input to two unsupervised algorithms: dendrogram and self-organizing map (SOM). Keeping the clustering algorithm and settings fixed, we compared, in terms of Rand Index (RI), the results obtained using 1) original and noisy features and 2) different norms. In addition, we assessed the correlation of the clustering results with Microsatellite Instability (MSI) status. Results showed that the L2-based SOM and both the L1- and L2-based dendrograms appear to be unaffected by noise (70%, 68%, and 63% RI, respectively). Moreover, the results were highly similar when comparing the L1 and L2 norms for both dendrograms and SOMs (78% and 75% RI, respectively). In the MSI correlation analysis, the SOM obtained a moderate correlation (0.30) that tended to zero when the norm was changed or noise was added, whereas dendrograms yielded more robust correlation values.
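As an illustration of the comparison metric named in the abstract, the following is a minimal sketch of the Rand Index: the fraction of sample pairs on which two partitions agree, meaning the pair is grouped together in both or separated in both. The function name `rand_index` and the six-patient label vectors are hypothetical and are not taken from the paper, which does not state how the authors computed the index.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of sample pairs on which two partitions agree.

    A pair "agrees" when both partitions put the two samples in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("Both label sequences must have the same length.")
    n = len(labels_a)
    if n < 2:
        raise ValueError("At least two samples are needed to form a pair.")

    agreeing_pairs = 0
    total_pairs = 0
    for i, j in combinations(range(n), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreeing_pairs += 1
        total_pairs += 1
    return agreeing_pairs / total_pairs


# Hypothetical cluster assignments for six patients (illustrative only):
# one from the original features, one after adding noise.
original_labels = [0, 0, 0, 1, 1, 1]
noisy_labels = [0, 0, 1, 1, 1, 1]

ri = rand_index(original_labels, noisy_labels)
print(f"Rand Index: {ri:.2f}")
```

Because the index depends only on whether pairs are co-clustered, it is unaffected by how cluster labels are numbered, which makes it suitable for comparing runs of the same algorithm under different norms or noise levels.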
How Do Norms and Noise Impact Clustering Results? A Robustness Analysis Applied to Digital Pathology / Nicoletti, Giulia; Marchiò, Caterina; Rosati, Samanta; Berrino, Enrico; Aquilano, Maria Costanza; Bonoldi, Emanuela; Balestra, Gabriella; Regge, Daniele; Giannini, Valentina. - (2023), pp. 333-337. (Paper presented at the 23rd IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2023, held in Dayton (USA), 4-6 December 2023) [10.1109/bibe60311.2023.00061].
How Do Norms and Noise Impact Clustering Results? A Robustness Analysis Applied to Digital Pathology
Nicoletti, Giulia;Rosati, Samanta;Balestra, Gabriella;Giannini, Valentina
2023
File | Type | License | Size | Format |
---|---|---|---|---|
2023_BIBE_How_Do_Norms_and_Noise_Impact_Clustering_Results_A_Robustness_Analysis_Applied_to_Digital_Pathology.pdf (restricted access) | 2a Post-print, publisher's version / Version of Record | Non-public - Private/restricted access | 4.4 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2999428