Synthesizing Raman Spectra of Minerals with Physically-Informed Generative Adversarial Networks: A Case Study on Albite / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17213917]

Synthesizing Raman Spectra of Minerals with Physically-Informed Generative Adversarial Networks: A Case Study on Albite

Amelia Carolina Sparavigna
2025

Abstract

The scarcity of high-quality data poses a significant challenge for machine learning applications in the geosciences. This work presents a novel and reproducible method for generating realistic Raman spectra of minerals to augment limited datasets. We use a Generative Adversarial Network (GAN), specifically a Wasserstein GAN with Gradient Penalty (WGAN-GP), to mitigate common training instabilities such as mode collapse. The core innovation of our approach is a physically informed generator architecture that builds each spectrum as a linear combination of q-Gaussian functions whose parameters are learned by the model. Applied to a small dataset of Albite spectra from the RRUFF database, the method learned the fundamental "grammar" of the mineral's spectrum and produced diverse, physically plausible outputs. We also show that successful GAN training depends on reaching a stable adversarial equilibrium, so more training is not always better. This work provides a tool for data augmentation and illustrates how combining scientific domain knowledge with advanced AI models can yield credible and useful data for computational analysis.
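
The idea of a spectrum built as a linear combination of q-Gaussian functions can be illustrated with a short NumPy sketch. This is not the paper's generator. It assumes the common Tsallis parameterization of the q-Gaussian, [1 + (q - 1) * beta * (x - x0)^2]^(-1/(q - 1)), which tends to a Gaussian as q approaches 1 and equals a Lorentzian at q = 2. The function names, the Raman-shift grid, and all peak parameters below are hypothetical placeholders, not values from the paper or from RRUFF. In a GAN setting such parameters would be produced by the generator network rather than set by hand.

```python
import numpy as np


def q_gaussian(x, center, beta, q):
    """Unit-height Tsallis q-Gaussian line shape (assumed parameterization).

    q -> 1 gives a Gaussian, q = 2 gives a Lorentzian.
    """
    u = beta * (x - center) ** 2
    if np.isclose(q, 1.0):
        # Limiting case: ordinary Gaussian.
        return np.exp(-u)
    base = 1.0 + (q - 1.0) * u
    # For q < 1 the profile has compact support: clip negative bases to zero.
    base = np.maximum(base, 0.0)
    return base ** (-1.0 / (q - 1.0))


def synth_spectrum(x, amplitudes, centers, betas, qs):
    """Sum of weighted q-Gaussian peaks evaluated on the grid x."""
    spectrum = np.zeros_like(x, dtype=float)
    for a, c, b, q in zip(amplitudes, centers, betas, qs):
        spectrum += a * q_gaussian(x, c, b, q)
    return spectrum


# Hypothetical Raman-shift grid (cm^-1) and placeholder peak parameters.
shift = np.linspace(100.0, 1200.0, 1101)
spectrum = synth_spectrum(
    shift,
    amplitudes=[1.0, 0.5],
    centers=[300.0, 500.0],
    betas=[0.01, 0.02],
    qs=[2.0, 1.5],
)
```

Because each peak is described by only four numbers (amplitude, center, width parameter beta, and shape parameter q), every output of such a model is a sum of smooth line shapes by construction, which is one plausible reading of what "physically informed" means here.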
Files in this record:
File: wgan-gp.pdf (open access)
Type: 1. Preprint / submitted version [pre-review]
License: Creative Commons
Size: 1.45 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003416