EHadoop: network I/O aware scheduler for elastic MapReduce cluster

Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review

Abstract

Over the last few years, the use of cloud computing has increased dramatically. Many data analytics platforms run in the cloud. Such systems are characterized by large data transfers among VMs. Network isolation between cloud users in modern datacenters is not as good as CPU and memory isolation. This weak isolation leads to unpredictable performance of the inter-datacenter network. Moreover, as cloud computing grows in popularity, competition between providers gets tougher, which drives prices down. Some users decide to perform data analytics in a cross-cloud fashion, which requires data transfer over WAN. WAN is known to provide lower performance than LAN. We show that a saturated network can greatly impact the completion time of a MapReduce job's tasks. This results in higher costs for the user because, under the pay-as-you-go model, the user pays for the time that resources are in use. In this work we present EHadoop, a network I/O aware scheduler for elastic MapReduce clusters, which performs online job profiling and schedules tasks based on available network bandwidth. The evaluation results show that EHadoop avoids network contention and does not increase MapReduce task completion time as network bandwidth degrades.
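
The idea of scheduling tasks based on available network bandwidth can be illustrated with a minimal sketch. This is not EHadoop's actual algorithm, which the abstract does not describe in detail. The names TaskProfile, mbps_per_task, and tasks_to_launch are hypothetical, and the sketch assumes that profiling yields an average per-task bandwidth demand and that the scheduler knows the currently available bandwidth.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical per-job profile, assumed to be gathered while the job runs."""
    job_id: str
    mbps_per_task: float  # assumed: observed average network demand of one task


def tasks_to_launch(profile: TaskProfile, available_mbps: float, free_slots: int) -> int:
    """Return how many tasks to start without oversubscribing the available bandwidth."""
    if profile.mbps_per_task <= 0:
        # A task with no measurable network demand is limited only by free slots.
        return free_slots
    # Number of tasks whose combined demand still fits into the available bandwidth.
    fits_in_bandwidth = int(available_mbps // profile.mbps_per_task)
    return max(0, min(free_slots, fits_in_bandwidth))


profile = TaskProfile(job_id="sort-1", mbps_per_task=40.0)
print(tasks_to_launch(profile, available_mbps=100.0, free_slots=8))  # 2
print(tasks_to_launch(profile, available_mbps=30.0, free_slots=8))   # 0
```

In this sketch, fewer tasks are admitted as the measured bandwidth drops, so running tasks are not slowed down by contention. A real scheduler would also have to decide when to retry deferred tasks and how to measure bandwidth, which is outside the scope of this illustration.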

Details

Original language: English
Title of host publication: 8th IEEE International Conference on Cloud Computing (CLOUD'15)
Publisher: IEEE Computer Society, Washington
Publication status: Published - 2015
Peer-reviewed: Yes

Keywords

  • mapreduce, scalability, performance, measurement