
Fantino, Giacomo; Rondina, Marco; Vetro', Antonio; De Martin, Juan Carlos. Quantifying Privacy Risks in Synthetic Data: A Study on Black-Box Membership Inference. In: Fundamental Approaches to Software Engineering: 29th International Conference, FASE 2026, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2026, Turin (Italy), April 11-16, 2026. Vol. 16504 (2026), pp. 86-106. Electronic. DOI: 10.1007/978-3-032-22774-4_5.

Quantifying Privacy Risks in Synthetic Data: A Study on Black-Box Membership Inference

Fantino, Giacomo; Rondina, Marco; Vetro', Antonio; De Martin, Juan Carlos
2026

Abstract

The use of synthetic data has grown steadily in recent years, particularly to support AI research and data sharing. However, synthetic data remains vulnerable to privacy risks such as membership inference attacks (MIAs), in which an attacker determines whether a given record was part of the original dataset; recent MIA variants increasingly exploit overfitting in generative models to boost their accuracy. Privacy metrics have been proposed to assess the protection offered by synthetic datasets and the risk of information leakage; however, their ability to reflect the actual risks posed by MIAs remains unexplored. This study empirically evaluates the trade-offs between utility and privacy in the generation of synthetic tabular data, leveraging a variety of black-box MIAs to provide a novel assessment of privacy risks. Using state-of-the-art generative models, we repeatedly generated synthetic datasets, assessed their utility, measured their vulnerability to black-box MIAs, and evaluated privacy using commonly used privacy metrics. Our analysis reveals that CTGAN and CTAB-GAN+ can mitigate the risks of membership disclosure without significantly compromising the utility of the data, while the other generators showed weaker privacy-utility trade-offs. However, the analysis of the privacy metrics suggests that their reliance on proximity to training data limits their ability to fully capture an attacker's exploitation capabilities. The results observed in this study highlight the potential applicability of the aforementioned generative models to privacy-sensitive domains, demonstrating their ability to balance utility and privacy even under the challenge of diverse black-box MIAs. Our analysis of privacy metrics provides empirical evidence on the real-world privacy risks of synthetic tabular data and calls for the development of new, empirically validated privacy metrics.
Year: 2026
ISBN: 978-3-032-22774-4; 978-3-032-22773-7
Files in this item:

978-3-032-22774-4_5.pdf
Access: open access
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 489.31 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3008359