From Plots to Bits: Digitizing Raman Spectra for AI Analysis with Python / Sparavigna, Amelia Carolina. - Electronic. - (2025). [DOI: 10.5281/zenodo.17113284]

From Plots to Bits: Digitizing Raman Spectra for AI Analysis with Python

Amelia Carolina Sparavigna
2025

Abstract

The integration of artificial intelligence (AI) and machine learning into spectroscopic analysis requires large, structured datasets. However, much of the scientific literature, as well as important databases such as the Spectral Database for Organic Compounds (SDBS), provides spectra only as images, which hinders the application of these methods. This work presents a practical methodology to bridge this gap. The method has two main phases. The first is pre-processing of the spectrum image with graphics software for noise reduction and calibration. The second is conversion of the image into numerical data with a Python script based on libraries such as OpenCV and NumPy. The process transforms a Raman spectrum image into a text file (.txt) containing wavenumber and intensity data. This makes the data usable for advanced computational analysis, such as training autoencoders for denoising or classification. The approach allows historical data to be recovered and enhanced, making it compatible with the input requirements of modern AI-based spectral analysis techniques.
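The author's script ships in plotbit.zip and is not reproduced here. As a rough illustration of the second phase, the following is a minimal NumPy sketch. It assumes that pre-processing has left only a dark trace on a light background. It also assumes the image has been cropped exactly to the plot frame, so that its edges correspond to known axis values. The function names `digitize_trace`, `save_spectrum` and `resample_uniform` are hypothetical. The threshold of 128, the column-averaging rule and the resampling step are also illustrative assumptions, not the author's implementation. The resampling step yields fixed-length vectors of the kind an autoencoder expects.

```python
import numpy as np


def digitize_trace(gray, x_left, x_right, y_bottom, y_top, threshold=128):
    """Turn a cleaned, cropped grayscale plot image into (x, y) arrays.

    Hypothetical helper, not the author's script. Assumes:
    - `gray` is a 2-D array (e.g. uint8): dark trace on a light background,
      with axes, labels and gridlines already removed in pre-processing;
    - the image is cropped to the plot frame, so column 0 maps to `x_left`,
      the last column to `x_right`, the bottom row to `y_bottom` and the
      top row to `y_top`.
    """
    gray = np.asarray(gray)
    if gray.ndim != 2:
        raise ValueError("expected a single-channel (grayscale) image")
    height, width = gray.shape
    if height < 2 or width < 2:
        raise ValueError("image must be at least 2x2 pixels")
    mask = gray < threshold  # True where a pixel belongs to the trace

    xs, ys = [], []
    for col in range(width):
        rows = np.flatnonzero(mask[:, col])
        if rows.size == 0:
            continue  # no trace pixels in this column: leave a gap
        row = rows.mean()  # centre of the line thickness in this column
        # Linear pixel-to-data mapping; also handles a reversed x axis.
        xs.append(x_left + (x_right - x_left) * col / (width - 1))
        # Image rows grow downwards, so flip before mapping to intensity.
        ys.append(y_bottom + (y_top - y_bottom) * (height - 1 - row) / (height - 1))
    return np.array(xs), np.array(ys)


def save_spectrum(path, wavenumbers, intensities):
    """Write two whitespace-separated columns: wavenumber and intensity."""
    data = np.column_stack([wavenumbers, intensities])
    np.savetxt(path, data, fmt="%.4f", header="wavenumber intensity")


def resample_uniform(x, y, start, stop, n_points):
    """Interpolate onto an evenly spaced grid to get a fixed-length vector.

    Outside the digitized range, np.interp repeats the end values.
    """
    order = np.argsort(x)  # np.interp needs increasing x values
    grid = np.linspace(start, stop, n_points)
    return grid, np.interp(grid, x[order], y[order])


if __name__ == "__main__":
    # Synthetic stand-in for a pre-processed plot: a one-pixel-wide
    # Gaussian peak drawn in black on a white 101 x 201 image.
    h, w = 101, 201
    img = np.full((h, w), 255, dtype=np.uint8)
    cols = np.arange(w)
    peak = 80 * np.exp(-((cols - 100) / 15.0) ** 2)  # peak height in pixels
    img[np.round(h - 1 - peak).astype(int), cols] = 0

    x, y = digitize_trace(img, x_left=200.0, x_right=3200.0,
                          y_bottom=0.0, y_top=1.0)
    save_spectrum("spectrum.txt", x, y)
    grid, y_grid = resample_uniform(x, y, 200.0, 3200.0, 1024)
    print(x[100], round(y[100], 2), y_grid.shape)  # prints: 1700.0 0.8 (1024,)
```

To use the sketch on a real file, the image can be loaded as a 2-D array with OpenCV, for example `gray = cv2.imread("spectrum.png", cv2.IMREAD_GRAYSCALE)`. Here the file name is a placeholder, and the axis limits are read off the calibrated plot. Averaging the dark rows in each column tracks the centre of the line's thickness. Taking the topmost dark pixel instead would follow the upper edge of the trace, which may matter for narrow peaks.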
Files in this item:

plotbit.zip (Zip file, 539.32 kB), open access. Description: folder containing a test image and the Python program. Type: other attached material. License: Creative Commons.

plotbit-en.pdf (Adobe PDF, 606.1 kB), open access. Type: preprint / submitted version (pre-review). License: Creative Commons.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003017