From Plots to Bits: Digitizing Raman Spectra for AI Analysis with Python / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17113284]
From Plots to Bits: Digitizing Raman Spectra for AI Analysis with Python
Amelia Carolina Sparavigna
2025
Abstract
Integrating artificial intelligence (AI) and machine learning into spectroscopic analysis requires large, structured datasets. However, a significant portion of the scientific literature and important databases, such as the Spectral Database for Organic Compounds (SDBS), provide spectra only as images, which is an obstacle to applying these methods. This work presents a practical methodology to bridge this gap. The proposed method has two main phases: pre-processing of the spectrum image in graphics software for noise reduction and calibration, followed by conversion of the image into numerical data using a Python script with libraries such as OpenCV and NumPy. The process transforms a Raman spectrum image into a text file (.txt) containing wavenumber and intensity data, making it usable for advanced computational analysis, such as training autoencoders for denoising or classification. This approach allows historical data to be recovered and reused in a form compatible with modern AI-based spectral analysis techniques.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
plotbit.zip | Open access | Folder containing a test image and the Python program | Other attached material | Creative Commons | 539.32 kB | Zip File
plotbit-en.pdf | Open access | | 1. Preprint / submitted version [pre-review] | Creative Commons | 606.1 kB | Adobe PDF
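The attached program is not reproduced on this page. As a rough illustration of the second phase described in the abstract (turning a pre-processed spectrum image into wavenumber and intensity pairs), the following is a minimal sketch using NumPy only. It is not the author's script: the function name `digitize_spectrum`, its parameters, the threshold value, and the synthetic test image are all hypothetical, and the sketch assumes the image has already been cropped to the plot area, that the curve is dark on a light background, and that both axes are linear.

```python
import numpy as np


def digitize_spectrum(gray, wn_left, wn_right, int_bottom, int_top, threshold=128):
    """Convert a cropped grayscale plot image into (wavenumber, intensity) pairs.

    Assumptions (not taken from the source): the image is cropped exactly to
    the plot area, the curve is dark on a light background, and both axes
    are linear.
    """
    gray = np.asarray(gray)
    height, width = gray.shape
    dark = gray < threshold            # True where the curve is drawn
    counts = dark.sum(axis=0)          # number of curve pixels in each column
    has_curve = counts > 0             # columns that contain the curve

    # Mean row index of the dark pixels in each column (sub-pixel estimate).
    rows = np.arange(height)[:, None]
    mean_row = (rows * dark).sum(axis=0) / np.maximum(counts, 1)

    cols = np.arange(width)
    # Linear calibration: column 0 -> wn_left, last column -> wn_right.
    wavenumber = wn_left + (wn_right - wn_left) * cols / (width - 1)
    # Row 0 is the top of the image, so it maps to int_top.
    intensity = int_top - (int_top - int_bottom) * mean_row / (height - 1)

    # Columns without any curve pixel are dropped rather than guessed.
    return np.column_stack([wavenumber[has_curve], intensity[has_curve]])


# Synthetic stand-in for a pre-processed image: white background with one
# dark pixel per column tracing a single peak. A real image could instead be
# loaded with cv2.imread(path, cv2.IMREAD_GRAYSCALE).
height, width = 101, 201
image = np.full((height, width), 255, dtype=np.uint8)
columns = np.arange(width)
peak_rows = np.round(100 - 80 * np.exp(-((columns - 100) / 15.0) ** 2)).astype(int)
image[peak_rows, columns] = 0

spectrum = digitize_spectrum(image, wn_left=200.0, wn_right=2000.0,
                             int_bottom=0.0, int_top=1.0)
```

The resulting two-column array can be written to a text file with `np.savetxt("spectrum.txt", spectrum, fmt="%.4f")`. Averaging the dark rows in each column is only one possible choice; thick lines, grid lines, or axis labels left in the image would need to be removed in the pre-processing phase for this sketch to give sensible values.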
https://hdl.handle.net/11583/3003017