Browsing unicity: On the limits of anonymizing web tracking data

Research output: Contribution to book/conference proceedings/anthology/reportConference contributionContributedpeer-review

Contributors

Abstract

Cross domain tracking has become the rule, rather than the exception, and scripts that collect behavioral data from visitors across sites have become ubiquitous on the Web. The collections form comprehensive profiles of browsing patterns and contain personal, sensitive information. This data can easily be linked back to the tracked individuals, most of whom are likely unaware of this information's mere existence, let alone its perpetual storage and processing. As public pressure has increased, tracking companies like Google, Facebook, or Baidu now claim to anonymize their datasets, thus limiting or eliminating the possibility of linking it back to data subjects.In cooperation with Europe's largest audience measurement association we use access to a comprehensive tracking dataset to assess both identifiability and the possibility of convincingly anonymizing browsing data. Our results show that anonymization through generalization does not sufficiently protect anonymity. Reducing unicity of browsing data to negligible levels would necessitate removal of all client and web domain information as well as click timings. In tangible adversary scenarios, supposedly anonymized datasets are highly vulnerable to dataset enrichment and shoulder surfing adversaries, with almost half of all browsing sessions being identified by just two observations. We conclude that while it may be possible to store single coarsened clicks anonymously, any collection of higher complexity will contain large amounts of pseudonymous data.

Details

Original languageEnglish
Title of host publicationProceedings - 2020 IEEE Symposium on Security and Privacy, SP 2020
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages777-790
Number of pages14
ISBN (electronic)978-1-7281-3497-0
Publication statusPublished - May 2020
Peer-reviewedYes

Publication series

SeriesIEEE Symposium on Security and Privacy
Volume2020-May
ISSN1081-6011

Conference

Title41st IEEE Symposium on Security and Privacy, SP 2020
Duration18 - 21 May 2020
CitySan Francisco
CountryUnited States of America

External IDs

Scopus 85091601301