EHadoop: network I/O aware scheduler for elastic MapReduce cluster

Publication: Contribution to book/conference proceedings/anthology/expert opinion; Paper in conference proceedings; Contributed; Peer-reviewed

Contributors

Abstract

Over the last few years, the use of cloud computing has increased dramatically, and many data analytics platforms now run in the cloud. Such systems are characterized by large data transfers among VMs. In modern datacenters, network isolation between cloud users is not as good as CPU and memory isolation, and this weak isolation makes the performance of the intra-datacenter network unpredictable. Moreover, as cloud computing has grown in popularity, competition between providers has become tougher, which drives prices down. Some users therefore choose to perform data analytics in a cross-cloud fashion, which requires data transfer over a WAN, and a WAN is known to deliver lower performance than a LAN. We show that a saturated network can greatly increase the task completion time of MapReduce jobs. This results in higher costs for the user, because under the pay-as-you-go model the user pays for the time during which resources are used. In this work we present EHadoop, a network I/O-aware scheduler for elastic MapReduce clusters, which performs online job profiling and schedules tasks based on the available network bandwidth. The evaluation results show that EHadoop avoids network contention and keeps MapReduce task completion times from increasing when network bandwidth degrades.
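The abstract does not spell out EHadoop's actual scheduling policy. As a rough illustration of the general idea it describes (profiling a job's per-task network throughput online, then scheduling tasks according to the bandwidth currently available), the following Python sketch assumes a simple model in which every running task consumes its job's profiled average throughput. The names `JobProfile` and `max_concurrent_tasks`, and the rule of capping concurrency at available bandwidth divided by per-task demand, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class JobProfile:
    """Hypothetical online profile of one job's per-task network demand."""
    # Observed throughput (MB/s) of each completed task of this job.
    samples: list[float] = field(default_factory=list)

    def record(self, mb_transferred: float, duration_s: float) -> None:
        """Add one completed task's measurement to the profile."""
        if duration_s <= 0:
            raise ValueError("duration_s must be positive")
        self.samples.append(mb_transferred / duration_s)

    def demand_per_task(self) -> float:
        """Mean observed throughput per task; 0.0 if nothing profiled yet."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def max_concurrent_tasks(profile: JobProfile, available_mb_s: float, slots: int) -> int:
    """Assumed policy: launch only as many tasks as the available
    bandwidth can serve at their profiled demand, bounded by free slots."""
    demand = profile.demand_per_task()
    if demand <= 0:
        # No profile yet: fall back to the ordinary slot limit.
        return slots
    fit = int(available_mb_s // demand)
    # Always allow at least one task so the job keeps making progress.
    return min(slots, max(1, fit))


if __name__ == "__main__":
    profile = JobProfile()
    profile.record(200.0, 10.0)  # 20 MB/s
    profile.record(300.0, 10.0)  # 30 MB/s
    print(max_concurrent_tasks(profile, available_mb_s=100.0, slots=8))
    print(max_concurrent_tasks(profile, available_mb_s=40.0, slots=8))
```

In this model, a profiled demand of 25 MB/s per task lets 100 MB/s of available bandwidth admit four concurrent tasks; if bandwidth degrades to 40 MB/s, only one task runs at a time instead of eight tasks contending for the link. Always admitting at least one task is a design choice of this sketch to guarantee progress, not something stated in the source.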

Details

Original language: English
Title: 8th IEEE International Conference on Cloud Computing (CLOUD'15)
Publisher: IEEE Computer Society, Washington
Publication status: Published - 2015
Peer-review status: Yes

Keywords

  • mapreduce, scalability, performance, measurement