Synthesizing Raman Spectra of Minerals with Physically-Informed Generative Adversarial Networks: A Case Study on Albite / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17213917]
Synthesizing Raman Spectra of Minerals with Physically-Informed Generative Adversarial Networks: A Case Study on Albite
Amelia Carolina Sparavigna
2025
Abstract
The scarcity of high-quality data poses a significant challenge for machine learning applications in geosciences. This work presents a novel and reproducible method for generating realistic Raman spectra of minerals to augment limited datasets. We utilize a Generative Adversarial Network (GAN), specifically a Wasserstein GAN with a Gradient Penalty (WGAN-GP), to overcome common training instabilities like mode collapse. The core innovation of our approach lies in a physically-informed generator architecture that creates spectra as a linear combination of q-Gaussian functions, whose parameters are learned by the model. This method, applied to a small dataset of Albite spectra from the RRUFF database, successfully learned the fundamental "grammar" of the mineral's spectrum, producing diverse and physically plausible outputs. We also demonstrate that a key to successful GAN training is finding a stable adversarial equilibrium, highlighting that more training is not always better. This work not only provides a powerful tool for data augmentation but also illustrates how a collaboration between scientific domain knowledge and advanced AI models can create credible and useful data for computational analysis.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
wgan-gp.pdf | Open access | Preprint / submitted version [pre-review] | Creative Commons | 1.45 MB | Adobe PDF
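The abstract describes a generator that builds each spectrum as a linear combination of q-Gaussian functions. As a rough illustration of that idea only, the sketch below evaluates such a sum with NumPy, assuming the common Tsallis form of the q-Gaussian (Gaussian for q = 1, Lorentzian for q = 2). The function names, the parameterization (amplitude, center, `beta`, `q`), and the peak values are hypothetical placeholders, not taken from the preprint or fitted to RRUFF data; in the described method these parameters would be produced by the trained generator rather than set by hand.

```python
import numpy as np


def q_gaussian(x, center, beta, q):
    """Unnormalized q-Gaussian with unit height at `center` (assumed Tsallis form).

    q -> 1 gives a Gaussian, exp(-beta * (x - center)**2);
    q = 2 gives a Lorentzian, 1 / (1 + beta * (x - center)**2).
    """
    x = np.asarray(x, dtype=float)
    u = beta * (x - center) ** 2
    if abs(q - 1.0) < 1e-8:
        return np.exp(-u)
    # For q < 1 the function has compact support, so clip the base at zero.
    base = np.maximum(1.0 - (1.0 - q) * u, 0.0)
    return base ** (1.0 / (1.0 - q))


def synthesize_spectrum(x, amplitudes, centers, betas, qs):
    """Sum of weighted q-Gaussian peaks evaluated on the axis `x`."""
    x = np.asarray(x, dtype=float)
    spectrum = np.zeros_like(x)
    for a, c, b, q in zip(amplitudes, centers, betas, qs):
        spectrum += a * q_gaussian(x, c, b, q)
    return spectrum


# Hypothetical example: three peaks on a Raman-shift axis in cm^-1.
# All numbers below are placeholders chosen for illustration.
raman_shift = np.linspace(100.0, 1200.0, 1101)
example = synthesize_spectrum(
    raman_shift,
    amplitudes=[0.4, 0.7, 1.0],
    centers=[290.0, 479.0, 507.0],
    betas=[0.02, 0.03, 0.03],
    qs=[1.5, 1.8, 2.0],
)
```

Constraining the output to a sum of peak-shaped functions is what makes such a generator "physically informed" in the sense of the abstract: the network only has to learn a small set of peak parameters instead of every point of the spectrum.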
https://hdl.handle.net/11583/3003416