Beyond Convergence: An Iterative Exploration of Chemical Space and Runtime Environment / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17135978]

Beyond Convergence: An Iterative Exploration of Chemical Space and Runtime Environment

Amelia Carolina Sparavigna
2025

Abstract

The vastness of chemical space makes computational methods necessary for molecular discovery. Generative models such as the Variational Autoencoder (VAE) have proven effective, but their iterative application can lead to model saturation, in which the model converges on known compounds. This work investigates a critical, often-overlooked factor contributing to this problem: the persistent session memory of cloud-based notebooks. Using a VAE-GRU model on a minimal set of oxadiazoles, we demonstrate that a methodological intervention, systematically clearing the runtime environment, is crucial for breaking this convergence and stimulating novel generation. With this approach, the model generates novel structures, including new ring sizes and chemical classes. The findings underscore that successful AI-driven discovery depends not only on a powerful model or a large dataset, but also on the researcher's ability to strategically manage both the data and the computational environment.
Files in this item:

File: OXA1.pdf
Access: open access
Type: 1. Preprint / submitted version [pre-review]
License: Creative Commons
Size: 516.95 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003085