Beyond Convergence: An Iterative Exploration of Chemical Space and Runtime Environment / Sparavigna, Amelia Carolina. - Electronic. - (2025). [10.5281/zenodo.17135978]
Beyond Convergence: An Iterative Exploration of Chemical Space and Runtime Environment
Amelia Carolina Sparavigna
2025
Abstract
The vastness of chemical space makes computational methods necessary for molecular discovery. Generative models such as the Variational Autoencoder (VAE) have proven effective, but their iterative application can lead to model saturation, in which the model converges on known compounds. This work investigates an often-overlooked factor contributing to this problem: the persistent session memory of cloud-based notebooks. Using a VAE-GRU model on a minimal set of oxadiazoles, we demonstrate that a methodological intervention, systematically clearing the runtime environment, is crucial for breaking this convergence and stimulating novel generation. With this approach the model generates novel structures, including new ring sizes and chemical classes. The findings underscore that successful AI-driven discovery depends not only on a powerful model or a large dataset, but also on the researcher's ability to manage both the data and the computational environment strategically.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
OXA1.pdf | Open access | 1. Preprint / submitted version [pre-review] | Creative Commons | 516.95 kB | Adobe PDF |
https://hdl.handle.net/11583/3003085