Bridging Structure and Spectra: A Comparative K-Clustering Analysis of Metabolites from SMILES and SERS Data

Sparavigna, Amelia Carolina

doi:10.5281/zenodo.17052624

In a series of recent works, autoencoders have been successfully applied to the clustering of Surface-Enhanced Raman Spectroscopy (SERS) spectra. Although several architectures have shown promise, others, such as variational autoencoders (VAEs) and GRU-based recurrent autoencoders, have shown limitations because their structure is not well suited to Raman spectra. To rigorously validate the chemical logic of this approach, we developed an independent benchmark: a clustering analysis of the same metabolites based on their molecular structure, represented by fingerprints derived from their SMILES strings. In this work, we present a comprehensive comparison of three autoencoder architectures (Conv1D, Dense, and Transformer) against this structural benchmark. All three models successfully replicated key chemical groupings, such as the tryptophan family, but the most significant findings emerged from the divergences between the methods. These divergences highlight the distinct, vibrationally driven logic of the SERS analysis, which can identify relationships that are not apparent from a purely structural comparison. The Conv1D autoencoder consistently provided the most chemically intuitive and robust clustering: it produced clean, high-resolution clusters for chemical families and correctly isolated unique spectral outliers, such as lipoamide, which the other models failed to separate. Our findings show that, although the Dense and Transformer models provide valuable insights, the Conv1D model is the strongest choice for this application, striking the best balance between validating established chemical knowledge and revealing new, subtle relationships in SERS data.
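The abstract does not spell out the exact pipeline, so the following is only a minimal sketch under stated assumptions. It assumes that each metabolite is represented by a numeric feature vector. On the structural side, this would be a binary fingerprint computed from its SMILES string. On the SERS side, it would be an autoencoder's latent code. It also assumes that "K-clustering" means plain k-means with Euclidean distance. The helper names `kmeans` and `rand_index` are hypothetical. The Rand index is used here only as one simple way to quantify how far two partitions agree. The source does not say which agreement measure, if any, was used.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with Euclidean distance (an assumed stand-in for
    the paper's 'K-clustering'). Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of samples")
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct samples chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Stop once no sample changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if empty)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def rand_index(a, b):
    """Fraction of sample pairs on which two partitions agree (same cluster in
    both, or different clusters in both). Invariant to label renaming."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape or a.ndim != 1 or len(a) < 2:
        raise ValueError("need two 1-D label vectors of equal length >= 2")
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)  # each unordered pair counted once
    return float((same_a[iu] == same_b[iu]).mean())


if __name__ == "__main__":
    # Synthetic toy vectors, NOT data from the paper. In a real run, rows might
    # be fingerprint bits (structural benchmark) or autoencoder latent codes
    # (SERS side); both are assumptions here.
    X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                      [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    labels, _ = kmeans(X_toy, k=2)
    print(labels, rand_index(labels, [0, 0, 0, 1, 1, 1]))
```

With real data, the structural and spectral feature matrices would each be clustered with the same `k`. Calling `rand_index` on the two label vectors would then summarise their overall agreement. The pairs that disagree are where divergences between structural and SERS-based groupings would appear. Euclidean k-means on binary fingerprints is a simplification, since Tanimoto (Jaccard) similarity is the more common choice for comparing fingerprints.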

Bridging Structure and Spectra: A Comparative K-Clustering Analysis of Metabolites from SMILES and SERS Data / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17052624]

Year of publication: 2025
Appears in types: 5.15 Publication on portal

Files in this item:

File	Size	Format
metabolites.pdf (open access; Type: 1. Preprint / submitted version [pre-review]; License: Creative Commons)	505.17 kB	Adobe PDF


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3002787

PORTO @ Archivio Istituzionale della Ricerca (institutional research archive)
