Classification-Based Detection and Quantification of Cross-Domain Data Bias in Materials Discovery

Trezza, Giovanni; Chiavazzo, Eliodoro

doi:10.1021/acs.jcim.4c01766

It stands to reason that the amount and the quality of data are of key importance for setting up accurate artificial intelligence (AI)-driven models. Among others, a fundamental aspect to consider is the bias introduced during sample selection in database generation. This is particularly relevant when a model is trained on a specialized data set to predict a property of interest and then applied to forecast the same property over samples having a completely different genesis. Indeed, the resulting biased model will likely produce unreliable predictions for many of those out-of-the-box samples, i.e., samples out of the training set. Neglecting such an aspect may hinder the AI-based discovery process, even when high-quality, sufficiently large, and highly reputable data sources are available. To address this challenge, we propose a new method that detects and quantifies data bias, reducing its impact on materials discovery. Our approach, aimed at identifying and excluding those out-of-the-box materials for which the predictions of a pretrained model are likely unreliable, leverages a classification strategy and is validated by means of superconductor and thermoelectric materials as two representative case studies. This methodology, designed to be simple, flexible, and easily adaptable to any architecture, including modern graph equivariant neural networks, aims to enhance the reliability of AI models when applied to diverse and previously unseen materials, thereby contributing to more reliable AI-driven materials discovery.
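This record does not detail the classification strategy, so the following is only a minimal sketch of one generic way such a check could look, not the authors' procedure. It assumes a domain classifier: a logistic regression is fitted to tell samples of the training database (label 0) from external samples (label 1). The held-out AUC is then read as a rough measure of bias (0.5 means the two sets are indistinguishable), and external samples the classifier cannot distinguish from the training data are kept as candidates for reliable prediction. All names (`fit_domain_classifier`, `p_external`, `auc`), the synthetic two-dimensional descriptors, and the 0.5 threshold are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_domain_classifier(source, target, epochs=500, lr=0.1):
    """Logistic regression fitted by batch gradient descent.
    Label 0 = training-database sample, label 1 = external sample."""
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def p_external(x, w, b):
    # Estimated probability that x does NOT resemble the training database.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def auc(neg_scores, pos_scores):
    # Probability that a random positive outranks a random negative (ties count 0.5).
    wins = sum((p > q) + 0.5 * (p == q) for p in pos_scores for q in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def cloud(center, n):
    # Synthetic descriptors: isotropic unit-variance Gaussian around `center`.
    return [[random.gauss(c, 1.0) for c in center] for _ in range(n)]

random.seed(0)
# Assumed toy setting: half of the external samples resemble the training
# database, the other half come from a clearly different region.
source_fit, source_held = cloud((0.0, 0.0), 100), cloud((0.0, 0.0), 100)
target_fit = cloud((0.0, 0.0), 50) + cloud((4.0, 4.0), 50)
target_held = cloud((0.0, 0.0), 50) + cloud((4.0, 4.0), 50)

w, b = fit_domain_classifier(source_fit, target_fit)

# Quantify bias on held-out data.
bias_auc = auc([p_external(x, w, b) for x in source_held],
               [p_external(x, w, b) for x in target_held])

# Keep only external samples that look like the training database.
trusted = [x for x in target_held if p_external(x, w, b) < 0.5]

print(f"held-out AUC: {bias_auc:.2f}; kept {len(trusted)} of {len(target_held)} external samples")
```

A linear classifier on raw descriptors is the simplest possible choice here; the same scheme would apply unchanged to any classifier that outputs a probability, including one built on learned representations.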

Classification-Based Detection and Quantification of Cross-Domain Data Bias in Materials Discovery / Trezza, Giovanni; Chiavazzo, Eliodoro. - In: JOURNAL OF CHEMICAL INFORMATION AND MODELING. - ISSN 1549-9596. - (2024). [10.1021/acs.jcim.4c01766]

Year: 2024
DOI: https://dx.doi.org/10.1021/acs.jcim.4c01766
Journal: JOURNAL OF CHEMICAL INFORMATION AND MODELING

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2995548
