Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data

Perera-Lago, Javier; Toscano-Duran, Victor; Paluzo-Hidalgo, Eduardo; Narteni, Sara; Rucco, Matteo

doi:10.1007/978-3-031-63803-9_21

Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence (AI) architectures. In this domain, representative datasets play a central role in shaping the trajectory of AI development, because they are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model's complexity, power, and uncertainties. In this paper, we investigate, from a theoretical perspective, the reliability of the epsilon-representativeness method for assessing dataset similarity in the case of decision trees. We focus on the family of decision trees because it includes a wide variety of models known to be explainable. We provide a result guaranteeing that if two datasets are related by epsilon-representativeness, i.e., each point of one dataset has a point of the other within distance epsilon, then the predictions of the classic decision tree are similar. Experimentally, we also found that epsilon-representativeness correlates significantly with the ordering of feature importances. Moreover, we extend the results experimentally to unseen vehicle collision data using XGBoost, a machine learning component widely adopted for tabular data.
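The epsilon-representativeness idea in the abstract can be made concrete with a short sketch. It assumes Euclidean distance and a label-matching definition (every point of the original dataset must have a same-label point of the other dataset within epsilon); the paper's exact definition and metric may differ, and all function names and the tree encoding are hypothetical. It also checks a simple sufficient condition, not necessarily the authors' theorem, under which a decision tree gives the same prediction to a point and to any neighbour within epsilon: every threshold on the point's path must be more than epsilon away from the corresponding coordinate, since a single coordinate cannot differ by more than the Euclidean distance.

```python
import math
from typing import Sequence, Tuple

# A labelled point: (feature vector, class label).
Point = Tuple[Sequence[float], int]


def epsilon_representativeness(data: Sequence[Point], rep: Sequence[Point]) -> float:
    """Smallest epsilon for which `rep` is epsilon-representative of `data`.

    Assumed definition (hypothetical helper, not the authors' code): every
    point of `data` needs a same-label point of `rep` within Euclidean
    distance epsilon. Returns math.inf if a label of `data` is absent
    from `rep`.
    """
    eps = 0.0
    for x, label in data:
        # Distance from x to its nearest same-label point in rep.
        nearest = min(
            (math.dist(x, y) for y, other in rep if other == label),
            default=math.inf,
        )
        # Epsilon must cover the worst-served point of data.
        eps = max(eps, nearest)
    return eps


# Hypothetical tree encoding: ("leaf", label) or
# ("split", feature_index, threshold, left_subtree, right_subtree);
# a point goes left when x[feature_index] <= threshold.
def predict(tree, x: Sequence[float]) -> int:
    while tree[0] == "split":
        _, j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree[1]


def stable_under_eps(tree, x: Sequence[float], eps: float) -> bool:
    """Sufficient (not necessary) condition for prediction stability.

    If every split on x's path satisfies |x[j] - t| > eps, then any x_tilde
    with ||x - x_tilde|| <= eps takes the same branches, because a single
    coordinate cannot differ by more than the Euclidean distance.
    """
    while tree[0] == "split":
        _, j, t, left, right = tree
        if abs(x[j] - t) <= eps:
            return False  # a neighbour within eps could cross this threshold
        tree = left if x[j] <= t else right
    return True


# Toy example: a 2-D dataset, a smaller candidate representative dataset,
# and a one-split tree.
data = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((5.0, 5.0), 1)]
rep = [((0.5, 0.0), 0), ((5.0, 4.0), 1)]
tree = ("split", 0, 2.5, ("leaf", 0), ("leaf", 1))

eps = epsilon_representativeness(data, rep)
print(eps)  # 1.0
print(all(stable_under_eps(tree, x, eps) for x, _ in data))  # True
```

In the toy example every point of `data` lies more than 1.0 away from the single threshold 2.5, so the tree assigns the same class to each point and to its representative neighbour. Points close to a split threshold are exactly where this simple guarantee breaks down.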

Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data / Perera-Lago, J., Toscano-Duran, V., Paluzo-Hidalgo, E., Narteni, S., Rucco, M. - 2156 (2024), pp. 384-395. (The 2nd World Conference on eXplainable Artificial Intelligence (xAI 2024), Valletta (Malta), 17-19 July 2024) [10.1007/978-3-031-63803-9_21].

Year of publication: 2024
Series: Communications in Computer and Information Science
ISBN: 978-3-031-63802-2; 978-3-031-63803-9
Item type: 4.1 Contribution in conference proceedings

Files in this item:

File	Access	Type	License	Size	Format
xAI2024_published_seville.pdf	Restricted access	2a Post-print, publisher's version / Version of Record	Not public - Private/restricted access	784.35 kB	Adobe PDF
XAI24_RepDecisionTrees (2).pdf	Open Access from 11 July 2025	2. Post-print / Author's Accepted Manuscript	Public - All rights reserved	474.96 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11583/2990592

PORTO @ Institutional Research Archive