SGX-PySpark: Secure Distributed Data Analytics
Research output: Contribution to book/Conference proceedings/Anthology/Report › Conference contribution › Contributed › peer-review
Abstract
Data analytics is central to modern online services, particularly data-driven ones. This often entails processing large-scale datasets that may contain private, personal, and sensitive information relating to individuals and organisations. Particular challenges arise where the cloud is used to store and process sensitive data. In such settings, security and privacy concerns become paramount, as the cloud provider is trusted to guarantee the security of the services it offers, including data confidentiality. The question this work tackles is therefore: “How to securely perform data analytics in a public cloud?”
To address this question, we design and implement SGX-PySpark, a secure distributed data analytics system that relies on a trusted execution environment (TEE) such as Intel SGX to provide strong security guarantees. To build SGX-PySpark, we integrate PySpark, a framework widely used in industry for data analytics that supports a wide range of queries, with SCONE, a shielded execution framework using Intel SGX.
Details
Original language | English |
---|---|
Title of host publication | The World Wide Web Conference |
Place of Publication | New York, NY, USA |
Publisher | Association for Computing Machinery, Inc |
Pages | 3564–3563 |
ISBN (print) | 9781450366748 |
Publication status | Published - 2019 |
Peer-reviewed | Yes |
External IDs
Scopus | 85066907235 |
---|---|
Keywords
- data analytics, distributed system, confidential computing, security, data security