cisco hadoop summit 2013
Cisco's presentations at Hadoop Summit 2013.
Agenda
Hadoop Considerations
• Traffic Types, Job Patterns, Network Considerations, Compute
Network Integration
• Co-exist with current Data Center infrastructure
• Open, Programmable and Application-Aware Networks
Multi-tenancy
• Remove the “Silo clusters”
Hadoop Job Patterns and Network Traffic
Job Patterns
Analyze (Reduce): Ingress vs. Egress Data Set 1:0.3
Extract Transform Load (ETL) (Reduce): Ingress vs. Egress Data Set 1:1
Explode (Reduce): Ingress vs. Egress Data Set 1:2
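As a rough illustration of the three ingress-to-egress ratios above, the sketch below estimates job output volume from input volume. The function name, the 1000 GB input and the `replication` parameter are assumptions for illustration and do not come from the presentation. It also assumes the first HDFS replica is written on the node producing the output, so only the additional replicas cross the network.

```python
# Ingress:egress ratios as listed for the three job patterns.
JOB_PATTERN_RATIO = {"analyze": 0.3, "etl": 1.0, "explode": 2.0}


def estimate_output_gb(pattern, input_gb, replication=1):
    """Return (job output in GB, approximate replication traffic in GB).

    Assumption: the first replica is local to the writer, so only
    (replication - 1) copies of the output travel over the network.
    """
    output_gb = input_gb * JOB_PATTERN_RATIO[pattern]
    replication_gb = output_gb * max(0, replication - 1)
    return output_gb, replication_gb


if __name__ == "__main__":
    # Hypothetical 1000 GB input with three-way replication of the output.
    for pattern in JOB_PATTERN_RATIO:
        out_gb, repl_gb = estimate_output_gb(pattern, 1000, replication=3)
        print(f"{pattern:8s} output={out_gb:8.1f} GB  replication={repl_gb:8.1f} GB")
```

The same input therefore produces very different egress volumes depending on the pattern, which is why the job mix matters when sizing the network.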
The time at which the reducers start depends on:
mapred.reduce.slowstart.completed.maps
It doesn’t change the amount of data sent to the reducers, but it may change when that data is sent.
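To make the effect of this setting concrete, here is a minimal sketch that converts the slow-start fraction into the number of map tasks that must finish before reducers are scheduled. The function name is hypothetical, and rounding up is an assumption about how the framework applies the fraction; 0.05 is the Hadoop 1.x default.

```python
import math


def maps_needed_before_reducers(total_maps, slowstart_fraction=0.05):
    """Number of completed maps required before reducers may be scheduled.

    slowstart_fraction mirrors mapred.reduce.slowstart.completed.maps.
    Rounding up is an assumption made for this sketch.
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0 and 1")
    return math.ceil(total_maps * slowstart_fraction)


if __name__ == "__main__":
    # Hypothetical job with 200 map tasks.
    for fraction in (0.0, 0.5, 1.0):
        needed = maps_needed_before_reducers(200, fraction)
        print(f"fraction={fraction:.2f}: reducers may start after {needed} maps")
```

A low fraction spreads the shuffle over the map phase, while a value of 1.0 delays all shuffle traffic until every map has finished, concentrating it into a shorter window.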
Traffic Types
• Small Flows/Messaging (Admin Related, Heart-beats, Keep-alive, delay sensitive application messaging)
• Small – Medium Incast (Hadoop Shuffle)
• Large Flows (HDFS Ingest)
• Large Incast (Hadoop Replication)
Map and Reduce Traffic
Many-to-Many Traffic Pattern
Map 1 Map 2 Map 3 Map N
Reducer 1 Reducer 2 Reducer 3 Reducer N
HDFS
Shuffle
Output Replication
NameNode
JobTracker
ZooKeeper
Job Patterns
Job Patterns have varying impact on network utilization
• Analyze: Simulated with Shakespeare Wordcount
• Extract Transform Load (ETL): Simulated with Yahoo TeraSort
• Extract Transform Load (ETL): Simulated with Yahoo TeraSort with output replication
Integration into the Data Center
Integration Considerations
Network Attributes
• Architecture
• Availability
• Capacity, Scale & Oversubscription
• Flexibility
• Management & Visibility
Availability, Buffering, Oversubscription, Data Node Speed, Latency
Data Node Speed Differences
Single 1GE: 100% Utilized
Dual 1GE: 75% Utilized
10GE: 40% Utilized
Generally, 1GE is used largely because of cost/performance trade-offs, though 10GE can provide benefits depending on the workload.
Availability: Single Attached vs. Dual Attached Node
• No single point of failure from the network viewpoint; no impact on job completion time
• NIC bonding configured in Linux, using the LACP bonding mode
• Effective load-sharing of traffic flows across the two NICs
• Recommended: change the hashing to src-dst-ip-port (on both the network side and the Linux NIC bonding) for optimal load-sharing
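The hashing recommendation can be illustrated with a toy model. The sketch below compares an IP-only hash with an IP-plus-port hash for several flows between the same two hosts. The formulas are simplified versions of the ones described for the Linux bonding `xmit_hash_policy` options and are not the switch's actual algorithm; the addresses and port numbers are hypothetical.

```python
import ipaddress


def layer3_hash(src_ip, dst_ip, n_links=2):
    """IP-only hash: every flow between two hosts maps to the same link."""
    ip_bits = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return (ip_bits & 0xFFFF) % n_links


def layer3_4_hash(src_ip, dst_ip, src_port, dst_port, n_links=2):
    """IP + port hash: flows between two hosts can land on different links."""
    ip_bits = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_bits & 0xFFFF)) % n_links


def links_used(flows, use_ports):
    """Return the set of link indexes chosen for the given flows."""
    if use_ports:
        return {layer3_4_hash(*flow) for flow in flows}
    return {layer3_hash(flow[0], flow[1]) for flow in flows}


# Eight hypothetical TCP flows between one pair of data nodes.
FLOWS = [("10.0.0.1", "10.0.0.2", 50000 + i, 50010) for i in range(8)]

if __name__ == "__main__":
    print("IP-only hash uses links:", sorted(links_used(FLOWS, use_ports=False)))
    print("IP+port hash uses links:", sorted(links_used(FLOWS, use_ports=True)))
```

With the IP-only hash all eight flows share one NIC, while including the ports spreads them over both, which is the motivation for the src-dst-ip-port setting.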
1GE vs. 10GE Buffer Usage
[Chart: Job Completion and Cell Usage over time (axis ticks 1 to 793). Series: 1G Buffer Used, 10G Buffer Used, 1G Map %, 1G Reduce %, 10G Map %, 10G Reduce %]
Moving from 1GE to 10GE actually lowers the buffer requirement at the switching layer.
By moving to 10GE, the data node has a wider pipe to receive data, which lessens the need for buffers in the network because the total aggregate transfer rate and the amount of data do not increase substantially. This is due, in part, to the limits of I/O and compute capabilities.
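The reasoning above can be sketched with a simple fluid model: a switch port only needs to buffer the traffic that arrives faster than the attached link can drain it. The function name and the numbers (a 3 Gb/s burst lasting 10 ms) are hypothetical and are not measurements from the presentation.

```python
def peak_buffer_bytes(offered_gbps, link_gbps, burst_seconds):
    """Bytes queued at the end of a burst in a simple fluid model.

    Only the part of the offered load above the link rate is queued.
    """
    excess_gbps = max(0.0, offered_gbps - link_gbps)
    return excess_gbps * 1e9 / 8 * burst_seconds


if __name__ == "__main__":
    # Hypothetical incast: senders offer 3 Gb/s to one data node for 10 ms.
    for link_gbps in (1.0, 10.0):
        queued = peak_buffer_bytes(3.0, link_gbps, 0.010)
        print(f"{link_gbps:4.0f}GE link: about {queued / 1024:.0f} KB queued")
```

If the offered load is bounded by disk and CPU on the senders, as the slide suggests, a 10GE receiver drains the same burst without queuing, while a 1GE receiver forces the switch to hold the excess.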
Integration Considerations
Goals
• Extensive Validation of Hadoop Workload
• Reference Architecture
  Make it easy for Enterprise
  Demystify Network for Hadoop Deployment
  Integration with Enterprise with efficient choices of network topology/devices
Findings
• 10G and/or dual-attached servers provide consistent job completion time and better buffer utilization
• 10G provides reduced burst at the access layer
• Dual-attached server is the recommended design, 1G or 10G; 10G for future proofing
• Rack failure has the biggest impact on job completion time
• Does not require a non-blocking network
• Latency does not matter much in Hadoop workloads
More Details From Hadoop Summit 2012 at:
http://www.slideshare.net/Hadoop_Summit/ref-arch-validated-and-tested-approach-to-define-a-network-design
http://youtu.be/YJODsK0T67A
Network Integration
© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 15
Which port is connected?
n3548-001# show interface brief
--------------------------------------------------------------------------------
Ethernet      VLAN   Type Mode   Status  Reason                   Speed    Port
Interface                                                                  Ch #
--------------------------------------------------------------------------------
Eth1/1        1      eth  access up      none                     10G(D)   --
Eth1/2        1      eth  access up      none                     10G(D)   --
Eth1/3        1      eth  access up      none                     10G(D)   --
Eth1/4        1      eth  access up      none                     10G(D)   --
Eth1/5        1      eth  access up      none                     10G(D)   --
..
Eth1/33       1      eth  access up      none                     10G(D)   --
Eth1/34       1      eth  access up      none                     10G(D)   --
Eth1/35       1      eth  access down    SFP not inserted         10G(D)   --
Eth1/36       1      eth  access down    SFP not inserted         10G(D)   --
Eth1/37       1      eth  access down    Administratively down    10G(D)   --
.
What is connected there?
Classic Network View
n3548-001# show mac address-table dynamic
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since first seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY    Ports
---------+-----------------+--------+---------+------+----+------------------
* 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
* 1        e8b7.484d.a20a    dynamic   60560      F    F  Eth1/31
* 1        e8b7.484d.a73e    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.a740    dynamic   60560      F    F  Eth1/34
* 1        e8b7.484d.ad15    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.ad17    dynamic   60560      F    F  Eth1/28
* 1        e8b7.484d.b3e9    dynamic   60570      F    F  Eth1/25
* 1        e8b7.484d.b3eb    dynamic   60560      F    F  Eth1/25
..
MAC Addresses of the connected devices … and the port they are on…
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 17
What is actually connected there?
n3548-001# portServerMap
=======================================
Port     Server FQDN
---------------------------------------
Eth1/1   c200-m2-10g2-001.cluster10g.com
Eth1/2   c200-m2-10g2-002.cluster10g.com
Eth1/3   c200-m2-10g2-003.cluster10g.com
Eth1/4   c200-m2-10g2-004.cluster10g.com
Eth1/5   c200-m2-10g2-005.cluster10g.com
Eth1/6   c200-m2-10g2-006.cluster10g.com
Eth1/7   c200-m2-10g2-031.cluster10g.com
Eth1/8   c200-m2-10g2-008.cluster10g.com
Eth1/9   c200-m2-10g2-009.cluster10g.com
Eth1/11  c200-m2-10g2-011.cluster10g.com
...
Which server is connected to which port on the switch …
Note: Eth1/10 is missing because there is nothing connected to it
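One way such a port-to-server view could be derived is by joining the MAC address table with a MAC-to-hostname inventory. The sketch below is an assumption about the approach, not the script used in the presentation: the function names and the hostnames in the inventory are hypothetical, and the sample rows follow the `show mac address-table dynamic` layout shown earlier.

```python
import re

# Matches rows such as:
# * 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
MAC_ROW = re.compile(
    r"^\*\s+\d+\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})"
    r"\s+dynamic\s+\d+\s+\S+\s+\S+\s+(\S+)$"
)


def ports_by_mac(mac_table_text):
    """Parse MAC table output into a {mac: port} dictionary."""
    result = {}
    for line in mac_table_text.splitlines():
        match = MAC_ROW.match(line.strip())
        if match:
            result[match.group(1)] = match.group(2)
    return result


def port_server_map(mac_table_text, fqdn_by_mac):
    """Join the MAC table with an inventory to get {port: fqdn}."""
    mapping = {}
    for mac, port in ports_by_mac(mac_table_text).items():
        if mac in fqdn_by_mac:
            mapping[port] = fqdn_by_mac[mac]
    return mapping


SAMPLE_MAC_TABLE = """\
* 1        e8b7.484d.a208    dynamic   60570      F    F  Eth1/31
* 1        e8b7.484d.a73e    dynamic   60560      F    F  Eth1/34
"""

# Hypothetical inventory; real data could come from DNS, DHCP or a CMDB.
HYPOTHETICAL_INVENTORY = {
    "e8b7.484d.a208": "node-a.example.com",
    "e8b7.484d.a73e": "node-b.example.com",
}

if __name__ == "__main__":
    for port, fqdn in sorted(port_server_map(SAMPLE_MAC_TABLE, HYPOTHETICAL_INVENTORY).items()):
        print(f"{port:8s} {fqdn}")
```

Ports with nothing attached never appear in the MAC table, which matches the note about Eth1/10 being absent from the listing.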
What is running on those servers?
Hadoop - TaskTracker List
n3548-001# trackerList
===========================================
Port     Server            Server Port
-------------------------------------------
Eth1/2   c200-m2-10g2-002  50544
Eth1/3   c200-m2-10g2-003  41909
Eth1/4   c200-m2-10g2-004  36480
Eth1/5   c200-m2-10g2-005  38179
Eth1/6   c200-m2-10g2-006  51375
Eth1/7   c200-m2-10g2-031  41915
Eth1/8   c200-m2-10g2-008  50983
Eth1/9   c200-m2-10g2-009  37056
Eth1/11  c200-m2-10g2-011  35882
Eth1/12  c200-m2-10g2-012  44551
...
Note: Eth1/1 is not on the list because it’s the namenode and is not running a tasktracker. Eth1/10 is not on the list because there is nothing connected to it.
Which node is using the buffer?
n3548-001# bufferServerMap
===================================================================
Port     Server            1sec    5sec    60sec   5min    1hr
-------------------------------------------------------------------
Eth1/1   c200-m2-10g2-001  0KB     0KB     0KB     0KB     0KB
Eth1/2   c200-m2-10g2-002  384KB   384KB   1536KB  2304KB  2304KB
Eth1/3   c200-m2-10g2-003  384KB   384KB   1152KB  1536KB  1536KB
Eth1/4   c200-m2-10g2-004  384KB   384KB   2304KB  2304KB  2304KB
Eth1/5   c200-m2-10g2-005  384KB   384KB   768KB   1536KB  1536KB
Eth1/6   c200-m2-10g2-006  384KB   2304KB  2304KB  2304KB  2304KB
Eth1/7   c200-m2-10g2-031  384KB   384KB   3456KB  3840KB  3840KB
Eth1/8   c200-m2-10g2-008  768KB   768KB   2688KB  2688KB  2688KB
Eth1/9   c200-m2-10g2-009  384KB   384KB   2304KB  2304KB  2304KB
Eth1/11  c200-m2-10g2-011  384KB   384KB   1920KB  1920KB  1920KB
...
Eth1/1 (c200-m2-10g2-001) has 0 buffer usage because it’s the name node
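Output in this layout is easy to post-process. The following sketch parses a few of the rows shown above and ranks ports by buffer usage in a chosen window. The function names are hypothetical, and the parser assumes every row has exactly the seven whitespace-separated fields seen in the listing.

```python
WINDOWS = ("1sec", "5sec", "60sec", "5min", "1hr")

# A subset of the rows from the bufferServerMap listing.
ROWS = """\
Eth1/1   c200-m2-10g2-001  0KB     0KB     0KB     0KB     0KB
Eth1/2   c200-m2-10g2-002  384KB   384KB   1536KB  2304KB  2304KB
Eth1/6   c200-m2-10g2-006  384KB   2304KB  2304KB  2304KB  2304KB
Eth1/7   c200-m2-10g2-031  384KB   384KB   3456KB  3840KB  3840KB
Eth1/8   c200-m2-10g2-008  768KB   768KB   2688KB  2688KB  2688KB
"""


def parse_buffer_rows(text):
    """Return a list of (port, server, {window: KB}) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 + len(WINDOWS):
            continue  # skip separators, headers and elision lines
        port, server, *values = fields
        if not all(value.endswith("KB") for value in values):
            continue
        usage = {w: int(v[:-2]) for w, v in zip(WINDOWS, values)}
        rows.append((port, server, usage))
    return rows


def top_ports(rows, window, count=3):
    """Ports with the highest buffer usage in the given window."""
    ranked = sorted(rows, key=lambda row: row[2][window], reverse=True)
    return [(port, server, usage[window]) for port, server, usage in ranked[:count]]


if __name__ == "__main__":
    for port, server, kb in top_ports(parse_buffer_rows(ROWS), "60sec"):
        print(f"{port:8s} {server:18s} {kb} KB")
```

Ranking by the 60sec or 5min window points at the data nodes that absorbed the largest bursts, which is the question the slide title asks.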
What’s running on this cluster + Buffer usage per server …
n3548-001# jobsBuffer
Hadoop Job Info ...
===================================================================
1 jobs currently running
JobId                  RunTime(secs)  User    Priority
job_201306131423_0009  120            hadoop  NORMAL
===================================================================
Buffer Info - Per Port
Port     Server            1sec    5sec    60sec   5min    1hr
-------------------------------------------------------------------
Eth1/1   c200-m2-10g2-001  0KB     0KB     0KB     0KB     0KB
Eth1/2   c200-m2-10g2-002  384KB   384KB   768KB   768KB   768KB
Eth1/3   c200-m2-10g2-003  384KB   384KB   1152KB  1152KB  1152KB
Eth1/4   c200-m2-10g2-004  384KB   1536KB  1536KB  1536KB  1536KB
Eth1/5   c200-m2-10g2-005  384KB   768KB   1152KB  1152KB  1152KB
..
What jobs were running during peak buffer usage … and for how long were they running
What’s running on this cluster + Buffer usage per server …
n3548-001(config)# jobsBuffer
Hadoop Job Info ...
===================================================================
0 jobs currently running
JobId                  RunTime(secs)  User    Priority
===================================================================
Buffer Info - Per Port
Port     Server            1sec    5sec    60sec   5min    1hr
-------------------------------------------------------------------
Eth1/1   c200-m2-10g2-001  0KB     0KB     0KB     0KB     0KB
Eth1/2   c200-m2-10g2-002  0KB     0KB     0KB     1920KB  1920KB
Eth1/3   c200-m2-10g2-003  0KB     0KB     0KB     2304KB  2304KB
Eth1/4   c200-m2-10g2-004  0KB     0KB     0KB     2688KB  2688KB
Eth1/5   c200-m2-10g2-005  0KB     0KB     0KB     2304KB  2304KB
Eth1/6   c200-m2-10g2-006  0KB     0KB     0KB     2304KB  2304KB
Eth1/7   c200-m2-10g2-031  0KB     0KB     0KB     1920KB  2688KB
.
Historic look at the buffer usage …
Server Resource Monitoring – CPU, Connections, etc.
Network Resource Monitoring – Buffer Counters, etc.
Server + Network
Shuffle vs Replication + Buffer Usage
Terasort on 10G
[Chart: Buffer Usage, Shuffle, Replication, Reduce and Map over time (axis ticks 0 to 780)]
Network + Application Visibility Model
(Python Socket)
Push Data Push Data Push Data
Application Logs
PTP Grandmaster (OPTIONAL)
Analyze
API to Application Info
Synchronize Time
github.com/datacenter
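The "Push Data" arrows in this model could be implemented in many ways. The sketch below shows one minimal possibility: newline-delimited JSON records sent over a Python socket to a collector. The record fields and function names are assumptions and not the format used by the scripts referenced above, and `socket.socketpair()` stands in for a real network connection so the example is self-contained.

```python
import json
import socket
import time


def encode_sample(switch, port, buffer_kb, timestamp=None):
    """Serialize one buffer reading as a newline-terminated JSON record."""
    record = {
        "switch": switch,
        "port": port,
        "buffer_kb": buffer_kb,
        # Timestamps are only comparable across devices if clocks are synchronized.
        "ts": time.time() if timestamp is None else timestamp,
    }
    return (json.dumps(record) + "\n").encode("utf-8")


def decode_samples(data):
    """Parse newline-delimited JSON records received by the collector."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]


def demo():
    """Push two readings through a local socket pair and return what arrived."""
    sender, collector = socket.socketpair()
    with sender, collector:
        sender.sendall(encode_sample("n3548-001", "Eth1/2", 384, timestamp=0.0))
        sender.sendall(encode_sample("n3548-001", "Eth1/7", 3456, timestamp=0.0))
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the collector
        chunks = []
        while True:
            chunk = collector.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return decode_samples(b"".join(chunks))


if __name__ == "__main__":
    for sample in demo():
        print(sample)
```

Carrying a timestamp in every record is what lets the collector line up switch buffer readings with application logs, which is where the optional PTP time source in the diagram becomes useful.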
Multi-tenant Environments
Various Multitenant Environments
• Hadoop + HBase: Need to understand Traffic Patterns
• Job Based: Scheduling Dependent
• Department Based: Permissions and Scheduling Dependent
Map 1 Map 2 Map 3 Map N
Reducer 1
Reducer 2
Reducer 3
Reducer N
HDFS
Shuffle
Output Replication
Region Server
Region Server
Client Client
Major Compaction
Read Read
Read
Update
Update
Read
Major Compaction
HBase During Major Compaction
Read/Update Latency
Comparison of Non-QoS vs. QoS Policy
~45% improvement for Read
Switch Buffer Usage
With Network QoS Policy to prioritize HBase Update/Read Operations
Switch Buffer Usage
With Network QoS Policy to prioritize HBase Update/Read Operations
HBase + Hadoop Map Reduce
Read/Update Latency
Comparison of Non-QoS vs. QoS Policy
~60% improvement for Read
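The presentation does not say how the QoS policy identifies HBase traffic. One common approach is to classify by TCP port, and the sketch below assumes exactly that. The port numbers are the defaults of the Hadoop 1.x and HBase 0.9x generation and may differ in a given cluster; the class names are hypothetical.

```python
# Default service ports of that software generation (assumption: unchanged defaults).
HBASE_REGIONSERVER_PORT = 60020   # HBase client reads and updates
HDFS_DATANODE_PORT = 50010        # HDFS block transfer and replication
TASKTRACKER_HTTP_PORT = 50060     # MapReduce shuffle fetches


def classify(src_port, dst_port):
    """Map a TCP flow to a hypothetical QoS class based on its ports."""
    ports = {src_port, dst_port}
    if HBASE_REGIONSERVER_PORT in ports:
        return "priority"   # latency-sensitive HBase read/update operations
    if ports & {HDFS_DATANODE_PORT, TASKTRACKER_HTTP_PORT}:
        return "bulk"       # shuffle, replication and compaction I/O
    return "default"


if __name__ == "__main__":
    # Hypothetical flows: (source port, destination port).
    for flow in [(45012, 60020), (50010, 38844), (51000, 50060), (40000, 22)]:
        print(flow, "->", classify(*flow))
```

Separating the two classes lets the switch serve HBase requests ahead of bulk transfers when buffers fill, which is consistent with the read latency improvements reported above.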
Cisco Unified Data Center
UNIFIED FABRIC
UNIFIED COMPUTING
Highly Scalable, Secure Network Fabric
Modular Stateless Computing Elements
UNIFIED MANAGEMENT
Automated Management
THANK YOU FOR LISTENING
www.cisco.com/go/ucs
www.cisco.com/go/nexus
http://www.cisco.com/go/workloadautomation
Manages Enterprise Workloads
Cisco.com Big Data
www.cisco.com/go/bigdata
Data Center Script Examples from Presentation:
github.com/datacenter