Performance of Hadoop on OpenStack
Andrew Lazarev, Mirantis, 2014
Agenda
● Introduction
● Environment description
● Direct virtualization impact
● Real-life workload
● Data locality
● Conclusion
What Is Hadoop?
● Ambari (Management)
● ZooKeeper (Coordination)
● Oozie (Scheduling)
● HDFS (File System)
● HBase (NoSQL Store)
● MapReduce (Programming Framework)
● Pig (Data Flow)
● Hive (SQL)
● Storm (Real-time computation)
[Diagram legend: Core Apache Hadoop]
Why Virtualize Hadoop?
● Easy-to-operate cluster
● One-click self-service provisioning
● Sharing hardware between several Hadoop clusters
● Tenant isolation at the hypervisor and network layers
● Comparable performance with much more flexibility
How To Virtualize?
● Sahara - OpenStack Data Processing project
  ○ OpenStack Integrated
  ○ Supports Hadoop 1 and 2
  ○ Different vendors (Apache, Hortonworks, Intel*)
  ○ Cluster provisioning and on-demand job execution
Virtualization Impact
Direct impact
● Disk write
● Disk read
● Network
● CPU
Virtualization Impact
Indirect impact
● Lack of low-level system control
● Resources for hypervisor operation
Environment
● Mirantis OpenStack Express cluster
● 20 nodes
● CPU: 24 x 2.10 GHz (2 x Intel Xeon CPU E5-2620)
● Memory: 8 x 4.0 GB, 32.0 GB total
● Disk: 1 drive, 0.9 TB (WDC WD1003FBYX-0)
● Network: 2 x 1 GbE
Environment (continued)
● Host OS: CentOS 6.5
● VM OS: CentOS 6.5
● Mirantis OpenStack
● QEMU-KVM 1.2.0
● Network: Neutron + GRE
● Open vSwitch 1.10.2
Environment (continued)
● Hadoop: Vanilla Apache 1.2.1
● Bare metal setup:
  ○ 19 Hadoop nodes
● OpenStack setup:
  ○ 1 Controller + 19 Computes
  ○ 19 (or 57) VMs with Hadoop
Disk Write (using dd)
*greater is better
Disk Write (hadoop test)
● TestDFSIO - built-in Hadoop I/O test
  ○ write test
  ○ read test
  ○ 1000 files of 1 GB (1 TB total)
Disk Write (hadoop test)
*less is better
Disk Write (hadoop test)
*less is better
Disk Cache Mode
● “disk_cachemodes” param in nova.conf
  ○ writethrough (default) - guest disk write cache is disabled
  ○ writeback - guest disk write cache is enabled
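As a rough illustration, the snippet below renders a nova.conf fragment that switches file-backed instance disks to writeback. The `[libvirt]` section name and the `file=writeback` value format are assumptions about the Nova release in use (older releases kept this option in `[DEFAULT]`), so check the configuration reference for your version before applying it.

```python
import configparser
import io


def render_cache_mode(mode="writeback", section="libvirt"):
    """Return a nova.conf fragment setting the cache mode for file-backed disks."""
    if mode not in ("writethrough", "writeback"):
        raise ValueError("this sketch only covers the two modes on the slide")
    config = configparser.ConfigParser()
    # Assumed value format: "<disk type>=<cache mode>".
    config[section] = {"disk_cachemodes": "file=" + mode}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_cache_mode())
```

Writeback trades some durability for speed: data acknowledged to the guest may still sit in a host-side cache, so it suits workloads such as HDFS that already replicate data across nodes.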
Disk Write (dd, writeback cache)
● Writeback cache enabled
● One large VM per host, with all of the host's memory
Disk Write (dd, writeback cache)
*greater is better
Disk Write (hadoop test, writeback cache)
*less is better
Disk Write - Way To Improve
● QEMU 1.4:
  ○ high-performance virtio-blk data plane implementation
  ○ +108.0% on random write (based on a Red Hat presentation at KVM Forum)
Disk Read (using hdparm)
*greater is better
Disk Read (using hdparm)
*greater is better
Disk Read (hadoop test)
*less is better
Network (OVS+GRE)
*greater is better
CPU (hadoop test)
● PI - built-in Hadoop test
● Depends mostly on CPU
● 50 series of 10,000,000,000 probes
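To show why this test is CPU-bound, here is a single-process sketch of the same idea. It assumes the quasi-Monte Carlo approach used by Hadoop's example job (points from a Halton sequence, counted inside a circle inscribed in the unit square). The function names are illustrative, and the sample count is far smaller than the 50 x 10,000,000,000 probes on the slide.

```python
import math


def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(samples):
    """Estimate pi from the share of points that fall inside the circle."""
    inside = 0
    for i in range(1, samples + 1):
        # Point in the unit square, shifted so the circle is centred at 0.
        x = halton(i, 2) - 0.5
        y = halton(i, 3) - 0.5
        if x * x + y * y <= 0.25:  # radius 0.5, squared
            inside += 1
    # Circle area / square area = pi / 4.
    return 4.0 * inside / samples


if __name__ == "__main__":
    approx = estimate_pi(100_000)
    print(approx, abs(approx - math.pi))
```

Each probe is pure arithmetic with no disk or network access, which is why the job mostly measures CPU overhead.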
CPU (hadoop test)
*less is better
Terasort
● Built-in Hadoop test
● Represents real Hadoop workload
● Involves
  ○ IO
  ○ Networking
  ○ Computation
● Sorting 200,000,000 100-byte entries (20 GB)
● Writeback cache enabled
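The workload can be pictured with a miniature, in-memory version. This is a sketch only: it assumes the usual TeraSort record layout (100-byte records sorted by a 10-byte key), the helper names are illustrative stand-ins for the generate, sort and validate steps, and it runs on one machine rather than as MapReduce jobs.

```python
import os

RECORD_SIZE = 100  # bytes per record, as on the slide
KEY_SIZE = 10      # leading bytes used as the sort key (assumed layout)


def generate_records(count):
    """Create `count` random 100-byte records (stand-in for data generation)."""
    return [os.urandom(RECORD_SIZE) for _ in range(count)]


def sort_records(records):
    """Sort records by their key prefix (stand-in for the sort job)."""
    return sorted(records, key=lambda record: record[:KEY_SIZE])


def is_sorted(records):
    """Check that keys are in non-decreasing order (stand-in for validation)."""
    keys = [record[:KEY_SIZE] for record in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def dataset_bytes(count):
    """Total input size in bytes for `count` records."""
    return count * RECORD_SIZE


if __name__ == "__main__":
    sample = sort_records(generate_records(10_000))
    print("sorted:", is_sorted(sample))
    # 200,000,000 records x 100 bytes = 20 GB (decimal gigabytes).
    print(dataset_bytes(200_000_000) / 10**9, "GB")
```

In the real job the generated data is written to HDFS, shuffled across the network, and sorted on every node, which is what makes it a composite test of IO, networking, and computation.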
Terasort
*less is better
Data Locality
● Hadoop can consider “distance” between nodes
● Intelligent task scheduling
● Reading data from “close” data nodes
[Diagram: six cluster nodes]
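The notion of “distance” can be sketched as follows. Hadoop's rack awareness places each node at a leaf of a topology tree and measures distance as the number of hops from both nodes up to their closest common ancestor. The topology paths and function names below are hypothetical and only illustrate the idea; they are not Hadoop's API.

```python
def distance(path_a, path_b):
    """Hops from each node up to the closest common ancestor, summed."""
    a = path_a.strip("/").split("/")
    b = path_b.strip("/").split("/")
    common = 0
    for part_a, part_b in zip(a, b):
        if part_a != part_b:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def closest_replica(reader, replicas):
    """Pick the replica location with the smallest distance to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))


if __name__ == "__main__":
    reader = "/rack1/node1"  # hypothetical topology paths
    replicas = ["/rack2/node5", "/rack1/node2", "/rack2/node6"]
    for replica in replicas:
        print(replica, distance(reader, replica))
    print("read from:", closest_replica(reader, replicas))
```

With two-level paths like these, the same node is at distance 0, a node in the same rack at 2, and a node in another rack at 4. In a virtualized cluster an extra level can represent VMs that share a physical host.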
Data Locality
*greater is better
Data Locality
● Network speed within a host is comparable to disk speed
● Allows Hadoop process isolation (VM per process)
● Test:
  ○ 1 Master Node (JobTracker + NameNode)
  ○ 18 DataNodes
  ○ 18 TaskTrackers
  ○ TeraSort of 20 GB of data
Terasort (data locality)
*less is better
Conclusion
● Only 6% performance impact for the composite test
● Performance keeps improving as external libraries (QEMU, Open vSwitch) are upgraded
● Much more topology flexibility
● Isolation at low cost
  ○ between clusters
  ○ between nodes within a cluster
Q&A