Heterogeneous Datasets for Federated Survival Analysis Simulation

Archetti, Alberto; Lomurno, Eugenio; Lattari, Francesco; Martin, Andre; Matteucci, Matteo

doi:10.1145/3578245.3584935

Survival analysis studies techniques for modeling the time until an event of interest occurs in a population. It has found widespread application in healthcare, engineering, and the social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can be exploited to substantially improve the quality of models trained on distributed data while preserving user privacy. Federated survival analysis is still in its early development, though, and there is no common benchmarking dataset for testing federated survival models. This work provides a novel technique for constructing realistic heterogeneous datasets from existing non-federated datasets in a reproducible way. Specifically, we propose two dataset-splitting algorithms based on the Dirichlet distribution that assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. These algorithms produce different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests, together with a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available to support reproducibility and to encourage common practices for simulating federated environments for survival analysis.
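The abstract names the two splitting schemes but does not spell out their steps, so the following is only a minimal sketch of how Dirichlet-based splits of this kind are commonly built, not the authors' implementation. The function names, the quantile binning of event times in the label-skewed case, and the reading of the "single hyperparameter" as the Dirichlet concentration `alpha` are all assumptions made for illustration.

```python
import random


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha) vector of length k.

    Uses the standard construction: normalize k independent Gamma(alpha, 1)
    draws. Assumes alpha is not so small that every draw underflows to 0.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def quantity_skewed_split(n_samples, n_clients, alpha, rng):
    """Hypothetical quantity-skewed split: client sizes follow a Dirichlet draw.

    Returns a list where entry i is the client assigned to sample i.
    """
    proportions = dirichlet(alpha, n_clients, rng)
    return rng.choices(range(n_clients), weights=proportions, k=n_samples)


def label_skewed_split(times, n_clients, alpha, rng, n_bins=10):
    """Hypothetical label-skewed split over binned event times.

    Assumption: samples are grouped into equal-count bins by time, and each
    bin gets its own Dirichlet draw over clients, so clients end up with
    different time distributions.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    assignment = [0] * len(times)
    for b in range(n_bins):
        lo = b * len(times) // n_bins
        hi = (b + 1) * len(times) // n_bins
        idx = order[lo:hi]
        proportions = dirichlet(alpha, n_clients, rng)
        clients = rng.choices(range(n_clients), weights=proportions, k=len(idx))
        for i, c in zip(idx, clients):
            assignment[i] = c
    return assignment


rng = random.Random(0)
times = [rng.expovariate(0.1) for _ in range(1000)]  # synthetic event times
by_quantity = quantity_skewed_split(len(times), 5, 0.5, rng)
by_label = label_skewed_split(times, 5, 0.5, rng)
```

In a symmetric Dirichlet, a smaller `alpha` concentrates mass on fewer components, so under these assumptions lowering `alpha` yields more uneven client sizes (first function) or more distinct per-client time distributions (second function), while a large `alpha` approaches a uniform split.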

Heterogeneous Datasets for Federated Survival Analysis Simulation / Archetti, Alberto; Lomurno, Eugenio; Lattari, Francesco; Martin, Andre; Matteucci, Matteo. - ELECTRONIC. - (2023), pp. 173-180. (Paper presented at the 14th Annual ACM/SPEC International Conference on Performance Engineering, ICPE 2023, held in Coimbra (PRT) in 2023) [10.1145/3578245.3584935].

Year of publication: 2023
ISBN: 9798400700729
Appears in types: 4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
_arXiv__Heterogheneous_Datasets_for_Federated_Survival_Analysis_Simulation.pdf (open access; type: 2. Post-print / Author's Accepted Manuscript; license: Public - All rights reserved)	700.05 kB	Adobe PDF
3578245.3584935.pdf (restricted access; type: 2a. Post-print, publisher's version / Version of Record; license: Non-public - Private/restricted access)	1.46 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2980984

PORTO @ Archivio Istituzionale della Ricerca
