SGX-PySpark: Secure Distributed Data Analytics

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Contributors

Abstract



Data analytics is central to modern online services, particularly data-driven ones. It often entails processing large-scale datasets that may contain private, personal and sensitive information relating to individuals and organisations. Particular challenges arise where the cloud is used to store and process such sensitive data. In these settings, security and privacy concerns become paramount, because the cloud provider must be trusted to guarantee the security of the services it offers, including data confidentiality. The question this work tackles is therefore: “How to securely perform data analytics in a public cloud?”

To address this question, we design and implement SGX-PySpark, a secure distributed data analytics system that relies on a trusted execution environment (TEE) such as Intel SGX to provide strong security guarantees. To build SGX-PySpark, we integrate PySpark, a data analytics framework widely used in industry that supports a wide range of queries, with SCONE, a shielded execution framework based on Intel SGX.

Details

Original language: English
Title of host publication: The World Wide Web Conference
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery, Inc
Pages: 3564–3563
ISBN (print): 9781450366748
Publication status: Published - 2019
Peer-reviewed: Yes

External IDs

Scopus 85066907235

Keywords


  • data analytics, distributed system, confidential computing, security, data security