SGX-PySpark: Secure Distributed Data Analytics

Publication: Contribution to book/conference proceedings/anthology/report; Contribution to conference proceedings; Contributed; Peer-reviewed

Abstract



Data analytics is central to modern online services, particularly data-driven ones. It often entails processing large-scale datasets that may contain private, personal, and sensitive information about individuals and organisations. Particular challenges arise when the cloud is used to store and process such sensitive data. In these settings, security and privacy concerns become paramount, because the cloud provider must be trusted to guarantee the security of the services it offers, including data confidentiality. The question this work tackles is therefore: “How to securely perform data analytics in a public cloud?”

To address this question, we design and implement SGX-PySpark, a secure distributed data analytics system that relies on a trusted execution environment (TEE) such as Intel SGX to provide strong security guarantees. To build SGX-PySpark, we integrate PySpark, a data analytics framework widely used in industry that supports a wide range of queries, with SCONE, a shielded execution framework based on Intel SGX.

Details

Original language: English
Title: The World Wide Web Conference
Place of publication: New York, NY, USA
Publisher: Association for Computing Machinery, Inc
Pages: 3564–3563
ISBN (Print): 9781450366748
Publication status: Published - 2019
Peer-review status: Yes

External IDs

Scopus 85066907235

Keywords

  • data analytics, distributed system, Confidential computing, security, Data security