Protecting privacy in volunteered geographic information processing

Research output: Contribution to book/conference proceedings/anthology/report > Chapter in book/anthology/report > Contributed > peer-review

Abstract

Social media data is used for analytics, for example in science, by public authorities, and in industry. Privacy is often considered a secondary problem. However, both law and ethics demand that the privacy of social media users be protected. To prevent subsequent abuse, theft, or public exposure of collected datasets, privacy-aware data processing is crucial. In this chapter, we present a set of concepts for processing social media data with the users' privacy in mind. We present a data storage concept based on the cardinality estimator HyperLogLog. Social media data stored this way does not allow individual items to be extracted; it is only possible to estimate the cardinality of items within a given set and to run set operations over multiple sets to extend the analytical range. Applying this method requires defining the scope of the result before any data is gathered. This prevents the data from being misused for other purposes at a later point in time and thus follows the privacy-by-design principles. We further show methods to increase privacy through the implementation of abstraction layers. As an additional instrument, we introduce a method to implement filter lists on the incoming data stream. A concluding case study demonstrates that our methods are protected against adversarial actors.
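The storage idea described in the abstract can be illustrated with a minimal HyperLogLog sketch. This is not the chapter's implementation: the class name `HyperLogLog`, the precision parameter `p`, the use of SHA-1 as the hash function, and the example user IDs are all assumptions made for illustration. The sketch keeps only one small register per bucket, so the original items cannot be read back, while cardinality estimates and set unions remain possible.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog; names and parameters are hypothetical."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m   # only these small counters are stored

    def add(self, item):
        # Hash the item to 64 bits; the raw item itself is never stored.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other):
        # Set union is the element-wise maximum of the registers.
        if self.p != other.p:
            raise ValueError("precision mismatch")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        # Bias constant valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical example: distinct users seen at two locations.
park = HyperLogLog()
museum = HyperLogLog()
for i in range(3000):
    park.add(f"user-{i}")
for i in range(2000, 5000):
    museum.add(f"user-{i}")

either = park.union(museum)
print(round(either.count()))  # approximate number of distinct users overall
```

The estimate is approximate by design; with 4096 registers the typical relative error is on the order of a few percent. Because the union only needs the registers, sets collected separately can be combined later without access to any individual record.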

Details

Original language: English
Title of host publication: Volunteered Geographic Information
Editors: Dirk Burghardt, Elena Demidova, Daniel A. Keim
Publisher: Springer Nature
Pages: 277-297
Number of pages: 21
ISBN (electronic): 978-3-031-35374-1
ISBN (print): 978-3-031-35373-4, 978-3-031-35376-5
Publication status: Published - 8 Dec 2023
Peer-reviewed: Yes

Keywords

  • Data retention, HyperLogLog, Privacy, Social media