#vmworld
Best Practices for Deploying Hadoop Workloads on HCI
Powered by vSAN
Chen Wei, VMware, Inc.
Paudie O'Riordan, VMware, Inc.
HCI2038BU
#HCI2038BU
VMworld 2018 Content: Not for publication or distribution
Disclaimer
©2018 VMware, Inc.
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.
Agenda
Hadoop on HCI: What it means to run Hadoop on vSAN
Hadoop Deployment Considerations: Configuration, Roles, Sizing and Scale Out
What Have We Done: Configuration Layout, Test Cases, Test Results
What Did We Learn: Performance, Availability, Operation and Tradeoffs
HCI Integrates Servers and Storage
• Lower total costs
• Greater agility and scale
• Simplified management
Traditional 3-Tiered Architecture: Complex and Separate Silos
• Servers and Blades
• External Storage
• Networking Hardware
Hyper-Converged Infrastructure: Built on Industry-Standard Servers and Switches
• Unified Management
• Virtualization: Compute | Storage | Network
• Server + Storage, Network
What is vSAN
On each host with local disks, we create one or more disk groups
Each disk group has a cache tier and a capacity tier
A host can have multiple disk groups; all of them contribute capacity to a single shared datastore
The vSAN Datastore is automatically accessible by all hosts in the cluster
[Figure: hosts running VMware vSphere & VMware vSAN, each contributing disk groups to the datastore]
vSAN Objects
• VM Home Namespace (VMX, NVRAM, etc.)
• VM Swap (virtual memory swap)
• Virtual Disk (VMDK)
• Snapshot (delta disk)
• Snapshot (memory delta)
vSAN Datastore
vSAN uses multiple storage components on commodity servers and storage to implement a distributed Datastore across the ESXi hosts in a vSphere cluster
You can specify explicit fault domains to increase availability
• Protects against rack failure
• Ensures the other copy/replica does NOT live in the same rack as the first copy
Distributed scale-out storage
[Figure: four racks mapped to fault domains FD1 to FD4; a RAID-1 object with FTT=1 places two replicas and a witness in different fault domains]
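The fault-domain rule above can be sketched in a few lines. This is a minimal illustration, assuming RAID-1 mirroring, where tolerating FTT failures takes FTT+1 replicas and at least 2*FTT+1 fault domains. The function names are hypothetical and not part of any vSAN API, and the witness count is simplified to the minimum needed to reach that fault-domain count.

```python
from typing import Dict, List


def raid1_requirements(ftt: int) -> Dict[str, int]:
    """Replica and fault-domain counts for a mirrored object (assumed RAID-1)."""
    if ftt < 0:
        raise ValueError("FTT must be zero or greater")
    return {"replicas": ftt + 1, "min_fault_domains": 2 * ftt + 1}


def place_components(ftt: int, fault_domains: List[str]) -> Dict[str, str]:
    """Put each replica or witness in its own fault domain (simplified model)."""
    req = raid1_requirements(ftt)
    if len(fault_domains) < req["min_fault_domains"]:
        raise ValueError("not enough fault domains for this FTT")
    witnesses = req["min_fault_domains"] - req["replicas"]
    components = ["replica"] * req["replicas"] + ["witness"] * witnesses
    # zip() stops at the shorter sequence, so spare fault domains stay unused.
    return dict(zip(fault_domains, components))


layout = place_components(1, ["FD1", "FD2", "FD3", "FD4"])
print(layout)  # {'FD1': 'replica', 'FD2': 'replica', 'FD3': 'witness'}
```

With four racks and FTT=1, three fault domains hold components and the fourth stays free, which matches the replica/replica/witness picture on the slide.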
Storage Policy Based Management
Define storage-related settings for protection and performance
• FTT for protection
• Striping for performance
Apply per VM, or even per VMDK, to meet business goals
Key to software-defined storage (SDS) architecture and management at scale
The Storage Policy Based Management (SPBM) framework is used by vSAN and VVol
[Figure: vSphere SPBM framework applying vSAN storage policies]
What is Hadoop?
Supports the processing of large data sets in a distributed computing environment
• Built on commodity hardware
• Scale-out architecture
Key Technologies
• MapReduce
• Hadoop Distributed File System (HDFS)
Hadoop HDFS filesystem
Implements a distributed file system across commodity storage and servers that provides high-performance access to data across highly scalable Hadoop clusters
You can create explicit rack IDs to increase availability
• Protects against rack failure
• Ensures the other copy/replica does NOT live in the same rack as the first copy
Distributed scale-out filesystem
[Figure: Racks 1 to 4 connected through switches, with Name Node / Resource Manager and Node Manager / Data Node servers spread across the racks]
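One common way to give HDFS the rack IDs mentioned above is a topology script referenced by the `net.topology.script.file.name` property: Hadoop calls the script with one or more host names or IP addresses and reads back one rack path per argument. The sketch below is only an illustration; the IP addresses and rack names in the mapping are hypothetical placeholders, not values from the tested cluster.

```python
#!/usr/bin/env python3
"""Minimal rack topology script sketch for HDFS rack awareness."""
import sys
from typing import List

# Hypothetical mapping of DataNode addresses to rack IDs.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.3.11": "/rack3",
    "10.0.4.11": "/rack4",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: List[str]) -> List[str]:
    """Return one rack path per input host, in the same order."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts as arguments and parses the printed rack paths.
    print(" ".join(resolve(sys.argv[1:])))
```

Hosts missing from the mapping fall back to a single default rack, so an incomplete mapping quietly reduces rack-level protection; keeping the mapping in sync with the physical layout is the important part.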
Hadoop VMs and Roles
Gateway VM
• Cloudera Manager
• ZooKeeper Server
• HDFS Journal Node
• HDFS Gateway
• YARN Gateway
• Hive Gateway
• Spark Gateway
Master VMs
• HDFS Name Node (Active/Standby)
• YARN Resource Manager (Standby/Active)
• ZooKeeper Server
• HDFS Journal Node
• HDFS Balancer
• HDFS Failover Controller
• HDFS HTTP FS
• HDFS NFS Gateway
Worker VMs
• HDFS Data Node
• YARN Node Manager
• Spark Executor
• Impala Daemon
Host Failure with vSAN and HA
vSAN objects stay available after a component failure
• Capacity disk, cache disk, or host failure
vSphere HA ensures VMs restart on different hosts in case of failure
Failure tolerance is independent of the HDFS replication factor
vSAN Policy FTT=1
• vSAN policy: Host Failures to Tolerate = 1
• vSphere HA enabled
[Figure: four hosts with SSD disk groups; object components are spread across hosts, and VMs restart on the surviving hosts]
Host Affinity for Next Generation Applications (RPQ only)
Policy ensures data resides on the same host where the VM runs
For next-gen apps that provide built-in app resiliency
• Hadoop
• Splunk
• Cassandra
VMs run with FTT=0 for space efficiency
Application settings determine availability
Accommodates next-gen apps optimized for shared-nothing architectures
[Figure: Hadoop Data Node VMs on vSphere and vSAN, each keeping its data on its own host within the vSAN Datastore]
Host Affinity and HDFS Replication Factor
No data loss, some performance degradation
Capacity or cache disk failure = host failure
VMs are unavailable with the failed host
Host failure tolerance depends on the HDFS replication factor
HDFS rebuilds the blocks on the other VMs
FTT=0 with Host Affinity
• vSAN policy: FTT=0
• HDFS replication factor = 3
[Figure: four hosts, each with SSDs and two VMs holding host-local objects; HDFS rebuilds blocks after a host fails]
Hadoop Deployment Considerations: Configuration, roles, sizing and scale out
Recommended vSAN “Starter” Configuration: Proof of Concept stage
All-flash with 6 or more hosts
2 or more disk groups per host
2 or more 10 GbE (or faster) ports per host
vSAN 6.7 or later
Disable deduplication/compression
Disable checksum
[Figure: Disk Group 1 and Disk Group 2, each with one NVMe cache device and six SSD capacity devices]
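The starter recommendations above lend themselves to a simple pre-flight check. The sketch below is illustrative only: `ClusterSpec` and `starter_config_issues` are hypothetical names, and the inputs are typed in by hand rather than read from vCenter or any vSAN API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterSpec:
    """Hand-entered description of a candidate proof-of-concept cluster."""
    all_flash: bool
    hosts: int
    disk_groups_per_host: int
    vsan_ports_per_host: int
    port_speed_gbe: int
    vsan_version: Tuple[int, int]
    dedup_compression_enabled: bool
    checksum_enabled: bool


def starter_config_issues(spec: ClusterSpec) -> List[str]:
    """List every deviation from the starter recommendations on the slide."""
    issues = []
    if not spec.all_flash:
        issues.append("use an all-flash configuration")
    if spec.hosts < 6:
        issues.append("use 6 or more hosts")
    if spec.disk_groups_per_host < 2:
        issues.append("use 2 or more disk groups per host")
    if spec.vsan_ports_per_host < 2 or spec.port_speed_gbe < 10:
        issues.append("use 2 or more 10 GbE (or faster) ports per host")
    if spec.vsan_version < (6, 7):
        issues.append("use vSAN 6.7 or later")
    if spec.dedup_compression_enabled:
        issues.append("disable deduplication/compression")
    if spec.checksum_enabled:
        issues.append("disable checksum")
    return issues


# Example values only; the disk group count here is an assumption.
example = ClusterSpec(True, 8, 2, 2, 10, (6, 7), False, False)
print(starter_config_issues(example))  # []
```

Returning the full list of deviations, rather than failing on the first one, makes it easier to review a candidate design in a single pass.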
Hadoop on vSAN Deployment Layout: Single Small Cluster with FTT=1, HA and DRS enabled
Separate roles into infrastructure and compute
Dedicate two or more hosts to Hadoop infrastructure virtual machines
Use HVE (Hadoop Virtualization Extensions) to prevent replicas of the same block from landing on two DataNode VMs on the same host
Set up HA/DRS rules to ensure VMs don't land on the same physical host
[Figure: Hosts 1 and 2 are infrastructure nodes running the Gateway VM (Cloudera Manager), Master 1 VM and Master 2 VM; Hosts 3 to 8 are data nodes running DataNode 1 through DataNode 12 VMs]
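The placement goal behind HVE can be expressed as a simple check: no two replicas of a block may sit on DataNode VMs that share a physical host. The sketch below is only a model of that rule. It assumes two DataNode VMs per host on Hosts 3 to 8, as the layout figure suggests, and the helper names are hypothetical; it does not call Hadoop or vSphere.

```python
from typing import Dict, Iterable


def build_vm_to_host(first_host: int = 3, datanodes: int = 12,
                     vms_per_host: int = 2) -> Dict[str, str]:
    """Map DataNode VMs to hosts, assuming consecutive VMs share a host."""
    return {
        f"DataNode {i}": f"Host {first_host + (i - 1) // vms_per_host}"
        for i in range(1, datanodes + 1)
    }


def replicas_on_distinct_hosts(replica_vms: Iterable[str],
                               vm_to_host: Dict[str, str]) -> bool:
    """True when every replica of a block lives on a different physical host."""
    hosts = [vm_to_host[vm] for vm in replica_vms]
    return len(hosts) == len(set(hosts))


vm_to_host = build_vm_to_host()
# DataNode 1 and DataNode 2 share Host 3, so this placement is rejected.
print(replicas_on_distinct_hosts(["DataNode 1", "DataNode 2", "DataNode 5"], vm_to_host))
# One replica each on Host 3, Host 4 and Host 5 is accepted.
print(replicas_on_distinct_hosts(["DataNode 1", "DataNode 3", "DataNode 5"], vm_to_host))
```

Without host awareness, HDFS sees two VMs on one server as independent nodes, so a single host failure could remove two replicas of the same block.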
Hadoop on vSAN Deployment Layout: Two Cluster Design
Separate clusters for infrastructure and workload
Dedicated “infra” cluster for Hadoop infrastructure virtual machines
Dedicated “workload” cluster for Data/Worker Node VMs
Use HVE (Hadoop Virtualization Extensions) to prevent replicas of the same block from landing on two DataNode VMs on the same host
[Figure: infrastructure cluster (Hosts 1 to 4; FTT=1, HA and DRS enabled) running the Gateway VM (Cloudera Manager), Master 1 VM and Master 2 VM; workload cluster (Host 1 through Host [up to…64]; FTT=0 with Host Affinity, DRS and HA disabled) running the DataNode VMs]
How to Design Deployment: Scale Out and Sizing
Cluster Size   Servers Dedicated to Master VMs   Servers Dedicated to Worker VMs   Number of Worker VMs   Expected Performance Scaling
8              2                                 6                                 12                     1x
16             2                                 14                                28                     2.3x
32             3                                 29                                58                     4.8x
48             3                                 45                                90                     7.5x
DataNode VM Configuration
CPU: 0.5 * number of physical cores per host
RAM: 0.4 * host RAM
OS disk: 100 GB
Number of data disks: 0.5 * number of capacity drives per host
Size of each data disk: no less than (3 * host RAM) / (2 * number of data disks) (Cloudera recommendation)
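The sizing rules above can be turned into a small calculator. This is a minimal sketch: `HostSpec` and `size_datanode_vm` are hypothetical names, the formula is applied with host RAM in GiB and disk size in GB, as the figures on the slides are given, and the two-worker-VMs-per-server ratio is read off the scaling table.

```python
from dataclasses import dataclass
from typing import Dict

WORKER_VMS_PER_SERVER = 2  # ratio taken from the scaling table (6 servers, 12 VMs)


@dataclass
class HostSpec:
    physical_cores: int
    ram_gib: float
    capacity_drives: int


def size_datanode_vm(host: HostSpec) -> Dict[str, float]:
    """Apply the DataNode VM sizing rules from the slide to one host spec."""
    data_disks = host.capacity_drives // 2
    return {
        "vcpus": host.physical_cores // 2,
        "ram_gib": 0.4 * host.ram_gib,
        "os_disk_gb": 100,
        "data_disks": data_disks,
        # Lower bound only; larger data disks are fine.
        "min_data_disk_gb": (3 * host.ram_gib) / (2 * data_disks),
    }


def worker_vm_count(worker_servers: int) -> int:
    """Number of worker VMs for a given number of worker servers."""
    return worker_servers * WORKER_VMS_PER_SERVER


# Host used in the tests: 2 sockets x 16 cores, 512 GiB RAM, 12 capacity SSDs.
print(size_datanode_vm(HostSpec(physical_cores=32, ram_gib=512, capacity_drives=12)))
print(worker_vm_count(6))
```

For the test hosts this gives 16 vCPUs, about 205 GiB of RAM, six data disks, and a floor of 128 GB per data disk. The worker VMs in the test ran with 16 vCPUs, 200 GiB, and six data disks of 350 GB or 700 GB, which is consistent with those rules.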
Joint Work Alive on StorageHub!
Search for “cloudera storagehub vsan”
https://storagehub.vmware.com/t/vmware-vsan/cloudera-distribution-including-apache-hadoop-on-vmware-vsan-tm/
What Have We Done: Test configuration, test cases and test results
Hardware Resource: vSAN Disk and Network Configuration
8x HPE ProLiant DL380 Gen9
Each server:
CPU: 2x Intel Xeon Processors E5-2683 v4 @ 2.10 GHz, 16 cores/socket
RAM: 512 GiB
NICs:
• 2x 1 GbE ports (MGMT network)
• 2x 10 GbE ports (VM network), LAG
• 2x 10 GbE ports (vSAN network), LAG
NVMe: 4x 800 GB NVMe PCIe
SSDs: 12x 800 GB 12G SAS SSD
vSAN Configuration: FTT=0 vs. FTT=1
Feature                                            FTT=0, Host Affinity   FTT=1, No Host Affinity
Data Locality                                      Host Local             None
Data VMDKs per Worker VM                           6x 700 GB              6x 350 GB
vSAN Storage Used/Capacity                         52.65 / 69.86 TB       53.88 / 69.86 TB
Deduplication/Compression                          Disabled               Disabled
Checksum                                           Disabled               Disabled
Stripe Width                                       12                     12
Number of Copies per VMDK                          1                      2
HDFS Configured Capacity                           47.73 TB               23.51 TB
Default HDFS Replication Factor                    3                      2
HDFS Max File Size at Default Replication Factor   15.91 TB               11.76 TB
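Two rows of the table follow from simple arithmetic, sketched below with hypothetical helper names. The maximum file size is taken as the HDFS configured capacity divided by the replication factor, and the per-VM footprint as data disks times disk size times vSAN copies. Filesystem and vSAN overheads are ignored, so this is an approximation rather than a capacity planner.

```python
def hdfs_max_file_size_tb(configured_capacity_tb: float, replication_factor: int) -> float:
    """Largest file HDFS can hold when every block is stored replication_factor times."""
    return configured_capacity_tb / replication_factor


def vmdk_footprint_gb(data_disks: int, disk_size_gb: int, vsan_copies: int) -> int:
    """Raw vSAN space consumed by one worker VM's data disks."""
    return data_disks * disk_size_gb * vsan_copies


# FTT=0 with Host Affinity: one vSAN copy, HDFS replication factor 3.
print(round(hdfs_max_file_size_tb(47.73, 3), 2))
# FTT=1 without Host Affinity: two vSAN copies, HDFS replication factor 2.
print(hdfs_max_file_size_tb(23.51, 2))

# Both layouts consume the same raw space per worker VM.
print(vmdk_footprint_gb(6, 700, 1), vmdk_footprint_gb(6, 350, 2))
```

The second calculation shows why the data VMDKs were halved for FTT=1: mirroring doubles the raw space per VMDK, so 6x 350 GB with two copies uses the same 4,200 GB per worker VM as 6x 700 GB with one copy.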
How We Sized the Hadoop VMs: VM configurations and roles
VM Spec        Gateway VM   Master VM   Worker VM
Quantity       1            2           12
vCPU           16 (all VMs)
Memory         200 GiB (all VMs)
OS             CentOS 7.5 x86_64 (all VMs)
OS VMDK Size   250 GB       100 GB      100 GB
Data Disks     1x 100 GB    1x 100 GB   6x 350 GB (FTT=1) / 6x 700 GB (FTT=0)
Software Component: Hadoop on vSAN Cluster overview
Component                      Version
vSphere                        6.7.0
Guest OS                       CentOS 7.5 x86_64
Cloudera Distribution Hadoop   5.14.2
Cloudera Manager               5.14.3
Hadoop                         2.6.0+cdh5.14.2+2748
HDFS                           2.6.0+cdh5.14.2+2748
YARN                           2.6.0+cdh5.14.2+2748
Spark                          1.6.0+cdh5.14.2+543
ZooKeeper                      3.4.5+cdh5.14.2+142
Testing Scenarios and Cases: Performance Testing Against CDH Cluster
Test Scenarios:
• FTT=0 with Host Affinity
• FTT=1 without Host Affinity
• Host failure
Test Cases:
• Cloudera Storage Validation Testing
• Hadoop MapReduce
  – TeraSort Suite
    • 1 TB: 10 billion records
    • 3 TB: 30 billion records
  – TestDFSIO
• Spark (500 GB / 1 TB)
  – K-Means Clustering
  – Logistic Regression Classification
  – Random Forest Decision Trees
Cloudera Storage Validation Testing
• Deploy Hadoop Cluster
• Configure Cloudera Toolkit
• Run Microbenchmark
• Run HBase or Kudu
Testing Results: TeraSort Suite performance testing
Performance advantage, FTT=0 over FTT=1
TeraGen:
• 54.4% (1 TB)
• 56.7% (3 TB)
TeraSort:
• 20.6% (1 TB)
• 23.3% (3 TB)
TeraValidate:
• 0.4% (1 TB)
• 40.3% (3 TB)
Testing Results: TestDFSIO performance testing
Performance advantage, FTT=0 over FTT=1
TestDFSIO:
• 41.8% (1 TB)
• 51.5% (3 TB)
• 71.2% (10 TB)
Testing Results: Spark performance
Performance advantage, FTT=0 over FTT=1
K-Means:
• 5.2% (500 GB)
• 5.4% (1 TB)
Logistic Regression:
• 1.8% (500 GB)
• 0.2% (1 TB)
Random Forest:
• -1.2% (500 GB)
• -3.4% (1 TB)
Failover Scenario: FTT=0 with Host Affinity
No data loss, some performance degradation
Capacity or cache disk failure = host failure
VMs are gone with the failed host
Host failure tolerance depends on the HDFS replication factor
vSAN policy: FTT=0 with Host Affinity; HDFS replication factor = 3
[Figure: four hosts, each with SSDs and two VMs holding host-local objects; HDFS rebuilds blocks after a host fails]
Failover Scenario: FTT=1 without Host Affinity
No data loss, some performance degradation
Capacity/cache disk failure has no impact on the Hadoop cluster
VMs are protected by HA/DRS
Host failure tolerance depends on the vSAN FTT value
vSAN policy: FTT=1; HDFS replication factor = 2; vSphere HA enabled
[Figure: four hosts with SSD disk groups; object components are spread across hosts, and VMs from the failed host restart on the surviving hosts]
What Did We Learn: Considerations about FTT=0 with Host Affinity and FTT=1
Take-Aways of Comparing FTT=1 vs. FTT=0: Capacity consideration
                            FTT=1 without Host Affinity   FTT=0 with Host Affinity
Number of copies per VMDK   2                             1
HDFS raw capacity           1x                            2x
HDFS replication factor     2                             3
HDFS max file size          3x                            4x
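The 3x versus 4x row can be derived from the two rows above it, assuming the maximum file size is the HDFS raw capacity divided by the replication factor. The short sketch below uses exact fractions to show the ratio; the function name is illustrative.

```python
from fractions import Fraction


def relative_max_file_size(raw_capacity: int, replication_factor: int) -> Fraction:
    """Usable HDFS space in units of the FTT=1 raw capacity."""
    return Fraction(raw_capacity, replication_factor)


ftt1 = relative_max_file_size(1, 2)  # 1x raw capacity, replication factor 2
ftt0 = relative_max_file_size(2, 3)  # 2x raw capacity, replication factor 3

# 1/2 versus 2/3 is the same proportion as 3 versus 4.
print(ftt1, ftt0, ftt1 / ftt0)  # 1/2 2/3 3/4
```

So on the same hardware, FTT=0 with Host Affinity leaves room for files about one third larger than FTT=1 does, even though it runs HDFS with a higher replication factor.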
Take-Aways of Comparing FTT=1 vs. FTT=0: I/O performance and availability
FTT=0 with Host Affinity
I/O Performance
• Better than FTT=1 in most cases
  – One copy per VMDK, which stays local
  – Little data goes through the vSAN network
Availability
• Three copies per HDFS block * one copy per VMDK => tolerates up to two host failures
FTT=1 without Host Affinity
I/O Performance
• Good, passed Cloudera Storage Validation
  – Two copies per VMDK
  – A lot of data is transferred through the network
Availability
• Two copies per HDFS block * two copies per VMDK => tolerates one host failure
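The availability arithmetic above can be written as a deliberately simplified model. It follows the slide's statements rather than any vSAN or HDFS API: with Host Affinity, each HDFS replica lives on a different host, so tolerance is the replication factor minus one; without it, the slide ties host failure tolerance to the vSAN FTT value. The function names are hypothetical.

```python
def physical_copies(hdfs_replication: int, vsan_copies_per_vmdk: int) -> int:
    """Total on-disk copies of each HDFS block across both layers."""
    return hdfs_replication * vsan_copies_per_vmdk


def host_failures_tolerated(hdfs_replication: int, vsan_ftt: int, host_affinity: bool) -> int:
    """Host failures survivable without data loss, per the slide's reasoning."""
    if host_affinity:
        # One replica per host: data survives while at least one replica remains.
        return hdfs_replication - 1
    # Without Host Affinity, the slide attributes tolerance to the vSAN policy.
    return vsan_ftt


# FTT=0 with Host Affinity, HDFS replication factor 3.
print(physical_copies(3, 1), host_failures_tolerated(3, 0, host_affinity=True))
# FTT=1 without Host Affinity, HDFS replication factor 2.
print(physical_copies(2, 2), host_failures_tolerated(2, 1, host_affinity=False))
```

The model makes the tradeoff visible: the FTT=1 layout keeps more physical copies (four against three) yet is rated for fewer host failures, because both vSAN copies of a VMDK back the same single HDFS replica.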
Take-Aways of Comparing FTT=1 vs. FTT=0: Failover performance and operational consideration
FTT=0 with Host Affinity
Single Host Failure
• Performance impact determined by data set
• Need to rebuild the lost DataNode or Master VMs
Disk or Disk Group Failure
• Same as host failure
FTT=1 without Host Affinity
Single Host Failure
• Performance impact also determined by data set
• Minor performance impact introduced by the vSAN resync/rebuild process
• VMs saved by vSphere HA
Single Disk or Disk Group Failure
• No performance impact
• Very little impact introduced by the vSAN resync/rebuild process
• All the VMs are up and running
Take-Aways of Comparing FTT=1 vs. FTT=0: Which to use when you need…
FTT=0 with Host Affinity
• Great performance
• Good capacity
• Better availability
FTT=1 without Host Affinity
• Good performance
• vMotion, DRS and HA
• Less impact from failover and easier recovery
THANK YOU!