Architectural implications for exascale based on big data workflow requirements

Research output: Contribution to book/conference proceedings/anthology/report > Chapter in book/anthology/report > Contributed

Abstract

The sheer volume of data accumulated in many scientific disciplines as well as in industry is a critical issue that requires immediate attention. The handling of large data sets will become a limiting factor, even for data-intensive applications running on future Exascale systems. Today, Big Data is often more a collection of challenges for data processing at large scale than a toolbox of solutions that help applications improve, scale well, and handle constantly growing data sets. There is an urgent need for intelligent mechanisms to acquire, process, and analyze data, and these mechanisms have to run and scale efficiently on current and future computing architectures. Complex Big Data applications will profit greatly from flexible workflow systems that consider the full data life cycle, from data acquisition through long-term storage to the curation of knowledge. To maximize the applicability of HPC systems for Big Data workflows, several changes in the system architecture and its software need to be considered. First, in order to exploit all available I/O capacity, an adaptable monitoring system needs to collect information about the I/O patterns of applications and workflows and provide the information needed to model the I/O subsystem. The goal is to collect long-term performance data, to evaluate this data, and finally to show how and why resources cannot be used to their full potential. Second, as the complexity of systems is continuously increasing, the level of abstraction presented to the user needs to increase at least at the same rate to ensure that the current usability is at least maintained. This is accomplished by employing science gateways as well as workflow and metadata technologies.

Details

Original language: English
Title of host publication: Advances in Parallel Computing
Publisher: IOS Press, Amsterdam [et al.]
Pages: 101-113
Number of pages: 13
Volume: 26
ISBN (print): 978-1-61499-582-1
Publication status: Published - 2015
Peer-reviewed: No

External IDs

ORCID: /0000-0002-1686-8440/work/142240059
Scopus: 84944048972
