The Pseudospectra as Windows into Autoencoders Logic / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17038439]
The Pseudospectra as Windows into Autoencoders Logic
Amelia Carolina Sparavigna
2025
Abstract
This study introduces the pseudospectrum as an intuitive tool for interpreting the internal logic of autoencoders. Often treated as "black boxes", these unsupervised models can cluster data autonomously, yet their internal decision-making remains largely hidden. Using a Surface-Enhanced Raman Spectroscopy (SERS) dataset of metabolites as a test case, we show that the pseudospectrum, defined as the spectrum reconstructed by the decoder from a linear centroid in the latent space, offers a tangible window into what the autoencoder has learned. Unlike a simple statistical average of the input spectra, the pseudospectrum is the model's own noise-free interpretation of the most relevant spectral features. We show that different autoencoder architectures, such as the sequential Convolutional 1D Autoencoder (Conv-1D AE) and the holistic Transformer Autoencoder (Transformer AE), produce distinct pseudospectra that reflect their underlying design principles. The Conv-1D AE generates a smooth, continuous curve that preserves local spectral patterns, while the Transformer AE, through its attention mechanism, produces a sparse, "spiky" representation that filters out irrelevant information and focuses only on key peaks. By anchoring the abstract notion of a latent-space centroid to a well-understood physical representation (the spectrum), the pseudospectrum bridges the gap between machine-learning representations and physically meaningful data. This approach offers a direct and easily understood way to see how an AI model interprets chemical data, and a simpler alternative to more complex interpretability tools. The findings support the pseudospectrum as a valuable tool for gaining new scientific insight into complex chemical systems.
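The abstract's definition of the pseudospectrum (encode a group of spectra, take the linear centroid of their latent codes, decode that centroid) can be illustrated with a minimal sketch. The trained autoencoder is replaced here by a hypothetical toy encoder/decoder pair (`toy_encode`, `toy_decode`) that keeps only two assumed "peak" channels; the channel indices and the spectra are made-up values, not data from the SERS study, and this is not the author's implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical toy model standing in for a trained autoencoder: it keeps only
# two channels. Sizes and channel indices are illustrative assumptions.
N_CHANNELS = 8
PEAK_CHANNELS = (2, 5)


def toy_encode(spectrum: Sequence[float]) -> Vector:
    """Map a spectrum to a 2-D latent code (intensities at the kept channels)."""
    return [spectrum[i] for i in PEAK_CHANNELS]


def toy_decode(latent: Sequence[float]) -> Vector:
    """Map a latent code back to a full-length spectrum; other channels are zero."""
    out = [0.0] * N_CHANNELS
    for channel, value in zip(PEAK_CHANNELS, latent):
        out[channel] = value
    return out


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Arithmetic (linear) centroid of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pseudospectrum(
    spectra: Sequence[Sequence[float]],
    encode: Callable[[Sequence[float]], Vector],
    decode: Callable[[Sequence[float]], Vector],
) -> Vector:
    """Encode every spectrum, average the latent codes, then decode the centroid."""
    latents = [encode(s) for s in spectra]
    return decode(centroid(latents))


# Made-up spectra for one group (not data from the study).
spectra = [
    [0.1, 0.0, 1.0, 0.2, 0.0, 0.5, 0.1, 0.0],
    [0.0, 0.2, 0.8, 0.0, 0.1, 0.7, 0.0, 0.1],
]
ps = pseudospectrum(spectra, toy_encode, toy_decode)
avg = centroid(spectra)  # plain statistical average, for comparison
print("pseudospectrum:", [round(v, 3) for v in ps])
print("mean spectrum: ", [round(v, 3) for v in avg])
```

In this toy case the pseudospectrum keeps only the two assumed peaks, while the plain average also carries the small off-peak values. Because the toy decoder is linear, decoding the latent centroid gives the same result as averaging the individual reconstructions; with a nonlinear decoder, such as a Conv-1D or Transformer AE, the two generally differ, so the order of operations (average in latent space first, then decode) matters.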
File: pseudospettri.pdf (Adobe PDF, 906.67 kB), open access
Type: Preprint / submitted version [pre-review]
License: Creative Commons
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/3002735