EHadoop: network I/O aware scheduler for elastic MapReduce cluster
Research output: Contribution to book/conference proceedings/anthology/report › Conference contribution › Contributed › peer-review
Abstract
Over the last few years, the use of cloud computing has increased dramatically, and many data analytics platforms now run in the cloud. Such systems are characterized by large data transfers among VMs. Network isolation between cloud users in modern datacenters is not as good as CPU and memory isolation, and this weak isolation leads to unpredictable performance of the inter-datacenter network. Moreover, as cloud computing rises in popularity, competition between providers gets tougher, which leads to lower prices. Some users decide to perform data analytics in a cross-cloud fashion, which requires data transfer over WAN. WAN is known to provide lower performance than LAN. We show that a saturated network can greatly impact the task completion time of a MapReduce job. This results in higher costs for the user because, under the pay-as-you-go model, the user pays for the time during which resources are in use. In this work we present EHadoop, a network I/O aware scheduler for elastic MapReduce clusters that performs online job profiling and schedules tasks based on available network bandwidth. The evaluation results show that EHadoop avoids network contention and does not increase MapReduce task completion time when network bandwidth degrades.
Details
Original language | English |
---|---|
Title of host publication | 8th IEEE International Conference on Cloud Computing (CLOUD'15) |
Publisher | IEEE Computer Society, Washington |
Publication status | Published - 2015 |
Peer-reviewed | Yes |
Keywords
- mapreduce, scalability, performance, measurement