With the advent of big data and the emergence of data markets, preserving individuals’ privacy has become of utmost importance. The classical response to this need is anonymization, i.e., sanitizing the information that, directly or indirectly, can allow users’ re-identification. Among the various approaches, -anonymity provides a simple and easy-to-understand protection. However, -anonymity is challenging to achieve in a continuous stream of data and scales poorly when the number of attributes becomes high. In this paper, we study a novel anonymization property called -anonymity that we explicitly design to deal with data streams, i.e., where the decision to publish a given attribute (atomic information) is made in real time. The idea at the base of -anonymity is to release such attribute about a user only if at least other users have exposed the same attribute in a past time window. Depending on the value of , the output stream results -anonymized with a certain probability. To this end, we present a probabilistic model to map the -anonymity into the -anonymity property. The model is not only helpful in studying the -anonymity property, but also general enough to evaluate the probability of achieving -anonymity in data streams, resulting in a generic contribution.

Practical anonymization for data streams: z-anonymity and relation with k-anonymity / Jha, Nikhil; Vassio, Luca; Trevisan, Martino; Leonardi, Emilio; Mellia, Marco. - In: PERFORMANCE EVALUATION. - ISSN 0166-5316. - STAMPA. - 159:(2023), p. 102329. [10.1016/j.peva.2022.102329]

Practical anonymization for data streams: z-anonymity and relation with k-anonymity

Nikhil Jha;Luca Vassio;Martino Trevisan;Emilio Leonardi;Marco Mellia
2023

Abstract

With the advent of big data and the emergence of data markets, preserving individuals’ privacy has become of utmost importance. The classical response to this need is anonymization, i.e., sanitizing the information that, directly or indirectly, can allow users’ re-identification. Among the various approaches, -anonymity provides a simple and easy-to-understand protection. However, -anonymity is challenging to achieve in a continuous stream of data and scales poorly when the number of attributes becomes high. In this paper, we study a novel anonymization property called -anonymity that we explicitly design to deal with data streams, i.e., where the decision to publish a given attribute (atomic information) is made in real time. The idea at the base of -anonymity is to release such attribute about a user only if at least other users have exposed the same attribute in a past time window. Depending on the value of , the output stream results -anonymized with a certain probability. To this end, we present a probabilistic model to map the -anonymity into the -anonymity property. The model is not only helpful in studying the -anonymity property, but also general enough to evaluate the probability of achieving -anonymity in data streams, resulting in a generic contribution.
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0166531622000372-main.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 2.22 MB
Formato Adobe PDF
2.22 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
z_anonymity__Zero_Delay_Anonymization_for_Data_Streams__EXTENDED_.pdf

Open Access dal 31/12/2023

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Creative commons
Dimensione 740.44 kB
Formato Adobe PDF
740.44 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2974412