Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence (AI) architectures. In this domain, representative datasets play a central role in shaping the trajectory of AI development, because they are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model's complexity, power, and uncertainties. In this paper, we investigate, from a theoretical perspective, the reliability of the epsilon-representativeness method for assessing dataset similarity in the case of decision trees. We focus on the family of decision trees because it includes a wide variety of models known to be explainable. We provide a result guaranteeing that if two datasets are related by epsilon-representativeness, i.e., their points lie within distance epsilon of one another, then the predictions of a classic decision tree trained on them are similar. Experimentally, we have also verified that epsilon-representativeness is significantly correlated with the ordering of the feature importance. Moreover, we extend the results experimentally to unseen vehicle collision data for XGBoost, a machine learning component widely adopted for tabular data.
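To make the notion of two datasets having points "closer than epsilon" concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not give the formal definition, so the sketch assumes Euclidean distance, ignores class labels, and reads the relation symmetrically (every point of each dataset has a neighbour in the other within epsilon). The function names `directed_gap`, `smallest_epsilon`, and `is_epsilon_representative` are hypothetical.

```python
import numpy as np


def directed_gap(source: np.ndarray, target: np.ndarray) -> float:
    """Largest distance from a point of `source` to its nearest point in `target`."""
    # Pairwise Euclidean distances, shape (len(source), len(target)).
    diffs = source[:, None, :] - target[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Nearest neighbour per source point, then the worst case over all of them.
    return float(dists.min(axis=1).max())


def smallest_epsilon(a: np.ndarray, b: np.ndarray) -> float:
    """Smallest epsilon for which the assumed symmetric relation holds."""
    return max(directed_gap(a, b), directed_gap(b, a))


def is_epsilon_representative(a: np.ndarray, b: np.ndarray, eps: float) -> bool:
    """True if every point of each dataset has a neighbour in the other within eps."""
    return smallest_epsilon(a, b) <= eps


# Toy data (made up for illustration): four corners of the unit square
# and a two-point reduced version of it.
full = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
reduced = np.array([[0.1, 0.0], [0.9, 1.0]])

print(smallest_epsilon(full, reduced))
print(is_epsilon_representative(full, reduced, eps=1.0))
print(is_epsilon_representative(full, reduced, eps=0.5))
```

Under these assumptions, the smallest admissible epsilon is the symmetric Hausdorff distance between the two point sets. The pairwise distance matrix costs memory proportional to the product of the dataset sizes, so a spatial index would be preferable for large datasets.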
Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data / Perera-Lago, Javier; Toscano-Duran, Victor; Paluzo-Hidalgo, Eduardo; Narteni, Sara; Rucco, Matteo. - 2156:(2024), pp. 384-395. (Paper presented at the conference The 2nd world conference on eXplainable Artificial Intelligence (xAI 2024), held in Valletta (Malta), 17-19 July 2024) [10.1007/978-3-031-63803-9_21].
Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data
Sara Narteni
2024
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| xAI2024_published_seville.pdf | Restricted access | 2a Post-print, publisher's version / Version of Record | Not public - Private/restricted access | 784.35 kB | Adobe PDF |
| XAI24_RepDecisionTrees (2).pdf | Open Access from 11/07/2025 | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 474.96 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2990592