Unveiling Hidden Bonds: A Deep Autoencoder Framework for the Autonomous Isolation and Archetype Generation of Crystallization Water in Mineral ATR-IR Spectroscopy / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17711908]
Unveiling Hidden Bonds: A Deep Autoencoder Framework for the Autonomous Isolation and Archetype Generation of Crystallization Water in Mineral ATR-IR Spectroscopy
Amelia Carolina Sparavigna
2025
Abstract
Infrared (IR) spectroscopy is essential for mineralogical analysis, but spectral classification is often complicated by high dimensionality and subtle band overlaps, particularly in the diagnostic hydration region (2800-3800 cm⁻¹). This study introduces an unsupervised machine learning framework that uses a Densely Connected Autoencoder (DAE) for feature extraction and dimensionality reduction of 150 mineral ATR-IR spectra sourced from the RRUFF database. The core methodology is a novel two-stage K-Means clustering approach. In the first stage, clustering runs across the full spectral range (400-3800 cm⁻¹) to establish classes based on fundamental structural chemistry (e.g., silicates vs. carbonates). In the second stage, the DAE input is restricted to the hydration range, so that minerals are separated by H2O/OH bonding type. The DAE learned a compact 40-dimensional latent representation. Critically, the second stage autonomously isolated a highly distinct spectral archetype (Cluster 9), dominated by gypsum (CaSO4·2H2O), which represents the pure, noise-free pseudo-spectrum of crystallization water. This archetype shows the expected two narrow, sharp H2O peaks and is clearly differentiated from the broader bands of complex/acidic hydrates (Cluster 3) and from the single, sharp signals of structural hydroxyl groups (Cluster 5). The methodology provides a robust, data-driven alternative for generating clean spectral standards, enabling reliable comparison with potentially noisy or historical ATR-IR measurements without manual denoising.

| File | Access | Type | Licence | Size | Format |
|---|---|---|---|---|---|
| densedense.pdf | Open access | 1. Preprint / submitted version [pre-review] | Creative Commons | 1.52 MB | Adobe PDF |
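
The two-stage workflow described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the spectra are synthetic (no RRUFF data is loaded), the band positions and widths are illustrative placeholders, the autoencoder is a small one-hidden-layer NumPy stand-in for the DAE, and the latent size (8), cluster count (3), and all function names are assumptions chosen to keep the example short. The per-cluster mean spectrum is used here as a hypothetical stand-in for a cluster "archetype".

```python
import numpy as np

rng = np.random.default_rng(0)
wn = np.arange(400, 3801, 20.0)  # wavenumber grid in cm^-1 (illustrative resolution)

# Illustrative (center, width) Gaussian bands per synthetic spectrum type; not measured values.
BANDS = {
    "two_narrow": [(3400, 30), (3550, 30), (1100, 60)],   # two sharp hydration peaks
    "broad":      [(3300, 250), (1050, 60)],              # one broad hydration band
    "single":     [(3650, 20), (1000, 80)],               # one sharp hydroxyl-like peak
}

def synthetic_spectrum(kind):
    """Sum of Gaussian bands plus small Gaussian noise."""
    clean = sum(np.exp(-0.5 * ((wn - c) / w) ** 2) for c, w in BANDS[kind])
    return clean + rng.normal(0, 0.02, wn.size)

def train_autoencoder(X, latent_dim, epochs=300, lr=0.1):
    """Tiny dense autoencoder (tanh encoder, linear decoder), full-batch gradient descent on MSE."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, latent_dim)); b1 = np.zeros(latent_dim)
    W2 = rng.normal(0, 0.1, (latent_dim, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W1 + b1)              # latent codes
        err = (Z @ W2 + b2) - X               # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        gR = 2 * err / err.size               # dL/d(reconstruction)
        gH = (gR @ W2.T) * (1 - Z ** 2)       # back-propagate through tanh
        W2 -= lr * (Z.T @ gR); b2 -= lr * gR.sum(0)
        W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(0)
    return (lambda A: np.tanh(A @ W1 + b1)), losses

def kmeans(Z, k, iters=50):
    """Plain Lloyd's algorithm with random initial centers drawn from the data."""
    centers = Z[rng.choice(len(Z), k, replace=False)].copy()
    for _ in range(iters):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(0)
    labels = ((Z[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return labels, centers

kinds = ["two_narrow", "broad", "single"] * 10
X = np.array([synthetic_spectrum(k) for k in kinds])

# Stage 1: encode and cluster the full range (400-3800 cm^-1).
encode_full, loss_full = train_autoencoder(X, latent_dim=8)
labels_full, _ = kmeans(encode_full(X), k=3)

# Stage 2: restrict the input to the hydration window (2800-3800 cm^-1) and repeat.
mask = (wn >= 2800) & (wn <= 3800)
Xh = X[:, mask]
encode_hyd, loss_hyd = train_autoencoder(Xh, latent_dim=8)
labels_hyd, _ = kmeans(encode_hyd(Xh), k=3)

# Hypothetical archetype: mean hydration-window spectrum of each stage-2 cluster.
archetypes = {int(j): Xh[labels_hyd == j].mean(0) for j in np.unique(labels_hyd)}
```

Averaging the members of a cluster suppresses uncorrelated noise, which is one simple way to picture how a cluster-level pseudo-spectrum can look cleaner than any single measurement. The study itself reports a 40-dimensional latent space and 150 spectra; those sizes would replace the small values used here.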
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3005427
