Browsing unicity: On the limits of anonymizing web tracking data

Publication: Contribution to book/conference report/anthology/expert report › Contribution to conference proceedings › Contributed › Peer-reviewed

Contributors

Abstract

Cross-domain tracking has become the rule, rather than the exception, and scripts that collect behavioral data from visitors across sites have become ubiquitous on the Web. The collections form comprehensive profiles of browsing patterns and contain personal, sensitive information. This data can easily be linked back to the tracked individuals, most of whom are likely unaware of this information's mere existence, let alone its perpetual storage and processing. As public pressure has increased, tracking companies like Google, Facebook, or Baidu now claim to anonymize their datasets, thus limiting or eliminating the possibility of linking them back to data subjects. In cooperation with Europe's largest audience measurement association, we use access to a comprehensive tracking dataset to assess both identifiability and the possibility of convincingly anonymizing browsing data. Our results show that anonymization through generalization does not sufficiently protect anonymity. Reducing unicity of browsing data to negligible levels would necessitate removal of all client and web domain information as well as click timings. In tangible adversary scenarios, supposedly anonymized datasets are highly vulnerable to dataset enrichment and shoulder-surfing adversaries, with almost half of all browsing sessions being identified by just two observations. We conclude that while it may be possible to store single coarsened clicks anonymously, any collection of higher complexity will contain large amounts of pseudonymous data.

Details

Original language: English
Title: Proceedings - 2020 IEEE Symposium on Security and Privacy, SP 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 777-790
Number of pages: 14
ISBN (electronic): 978-1-7281-3497-0
Publication status: Published - May 2020
Peer-reviewed: Yes

Publication series

Series: IEEE Symposium on Security and Privacy
Volume: 2020-May
ISSN: 1081-6011

Conference

Title: 41st IEEE Symposium on Security and Privacy, SP 2020
Duration: 18 - 21 May 2020
City: San Francisco
Country: United States

External IDs

Scopus: 85091601301