A Generalized Service Infrastructure for Data Analytics

Research output: Contribution to conferences › Paper › Contributed › peer-review

Abstract

Data analytics has become a catchphrase for the challenge of efficiently analyzing ever-increasing amounts of data, and it stands for a paradigm shift towards a more data-centric point of view. What matters is not only storing and retrieving large collections of data (Big Data), but also executing analysis workflows accurately and quickly. This paper introduces a service infrastructure for large-scale data analysis that can be tailored to the individual needs of data scientists. The service infrastructure covers all steps of the analysis workflow, i.e. integration, pre-processing, analysis and visualization. Data integration challenges are addressed by a universally applicable, easy-to-use web interface. To let users focus on pre-processing, analysis and visualization, a rich set of tools is available, so that each user's analysis environment can be tailored to their needs while keeping the hurdles for beginners low. Our service combines technology from the Big Data and high-performance computing (HPC) worlds. It is therefore designed to run on different infrastructures and has been tested on both a private cloud and an HPC backend, running flexible as well as massively parallel analysis workflows. The applicability of our service-centric approach is demonstrated with a use case from hardware monitoring, in which many sensor values are collected, distributed and processed. Finally, the presented concept supports collaborative work across all analysis steps.
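The four workflow stages named in the abstract (integration, pre-processing, analysis and visualization) can be illustrated with a minimal, single-process sketch applied to hardware-monitoring sensor readings. This is not the service's implementation: the comma-separated input format, the sensor names and the stage functions `integrate`, `preprocess`, `analyze` and `visualize` are hypothetical, and the described infrastructure distributes such steps across cloud or HPC backends rather than chaining them in one script.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw export: "timestamp,sensor_id,value" per line.
# An empty value field stands for a missing reading.
RAW = """\
0,node01.cpu_temp,51.0
10,node01.cpu_temp,53.5
20,node01.cpu_temp,
0,node02.cpu_temp,47.0
10,node02.cpu_temp,49.0
"""


def integrate(text):
    """Integration (hypothetical): parse raw lines into (timestamp, sensor, value) records."""
    records = []
    for line in text.strip().splitlines():
        ts, sensor, value = line.split(",")
        records.append((int(ts), sensor, float(value) if value else None))
    return records


def preprocess(records):
    """Pre-processing (hypothetical): drop missing values, group and sort samples per sensor."""
    series = defaultdict(list)
    for ts, sensor, value in records:
        if value is not None:
            series[sensor].append((ts, value))
    for samples in series.values():
        samples.sort()
    return dict(series)


def analyze(series):
    """Analysis (hypothetical): compute (min, mean, max) of the values for each sensor."""
    stats = {}
    for sensor, samples in series.items():
        values = [v for _, v in samples]
        stats[sensor] = (min(values), mean(values), max(values))
    return stats


def visualize(stats):
    """Visualization (hypothetical): render a plain-text summary table."""
    lines = [f"{'sensor':<16}{'min':>8}{'mean':>8}{'max':>8}"]
    for sensor in sorted(stats):
        lo, avg, hi = stats[sensor]
        lines.append(f"{sensor:<16}{lo:>8.1f}{avg:>8.2f}{hi:>8.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Chain the four stages end to end.
    print(visualize(analyze(preprocess(integrate(RAW)))))
```

Keeping each stage a separate function mirrors the idea that users can swap in their own tools for any one step without touching the others.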

Details

Original language: English
Publication status: Published, 2018
Peer-reviewed: Yes

External IDs

Scopus: 85050650500
ORCID: /0000-0001-8338-6372/work/142233357
ORCID: /0009-0007-5755-1427/work/142250918
ORCID: /0000-0003-2684-102X/work/142255212

Keywords

  • Big Data, Cloud, service infrastructure, time series, data analytics platform, high-performance computing