Networking for the Grid
Yee-Ting Li
eScience Summer School @ Edinburgh


What the Grid Is

• Worldwide distributed system
• Interconnected with ‘networks’
• Balancing processor, storage and network utilization
• Networking is important to make the Grid work

Networking Is Important!

• The network is the only way two Grid nodes can communicate with each other
• We need ways of determining how ‘efficiently’ they talk
• Focus on:
  – Characterising how they talk
  – The language they use to talk

Part 1

• Networking
• Network monitoring
  – Networks are also transient
  – Network performance also varies, as you’re sharing the network with n million other users
• Sometimes you can notice periodic patterns, sometimes you can’t
  – Difficult to analyse and to create trends/predictions
  – Show steps towards…

Networking 101

[Diagram: two workstations and a network cloud]

• Networking is straightforward
• Just connect to the network and it works!
• HA!

Networking

[Diagram: two workstations, two switches and three routers within the network cloud]

• Complex? It gets more complex!
• Each node has its own scheduling priorities
• Routers must serve trillions of data units per second!

Networking

[Diagram: host network stack, from the application in user space through kernel space and the driver down to the NIC]

• Data has to flow through a complex stack to get onto the network
• Each node on the network also has its own stack
• Router vendors hold the IPR on their stacks: no one knows what the Cisco stack looks like!

Example Metrics

• Connectivity
• Delay
  – One-way delay
  – Two-way delay
• Throughput / goodput
• Network path
• Loss
• Jitter
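Several of these metrics can be derived from the same raw probe data. The Python sketch below computes loss, mean one-way delay and jitter from a list of probe packets, as a rough illustration. The `Probe` record and `summarise` helper are hypothetical. One-way delay assumes the sender and receiver clocks are synchronised. Jitter is taken here as the mean absolute difference between consecutive delays, which is one simple definition; RFC 3550 uses a smoothed running estimate instead. The list of probes is assumed to be non-empty.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Probe:
    """Hypothetical probe record: sequence number and send/receive times in seconds."""
    seq: int
    sent: float
    received: Optional[float]  # None if the probe was lost

def summarise(probes):
    delivered = sorted((p for p in probes if p.received is not None), key=lambda p: p.seq)
    loss = 1 - len(delivered) / len(probes)
    # One-way delay: only meaningful if both clocks are synchronised.
    delays = [p.received - p.sent for p in delivered]
    # Simple jitter: mean absolute change between consecutive one-way delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "loss": loss,
        "mean_delay": mean(delays) if delays else None,
        "jitter": mean(diffs) if diffs else 0.0,
    }

probes = [Probe(0, 0.0, 0.010), Probe(1, 0.1, 0.112), Probe(2, 0.2, None), Probe(3, 0.3, 0.311)]
print(summarise(probes))
```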

Metrics Example

• Video conferencing
  – Needs a predictable bit rate
  – Doesn’t usually matter if the bit rate changes too much
  – Needs constant jitter
  – Low one-way delay preferable
• FTP
  – Needs reliable transport
  – Throughput depends on the urgency of the data
  – Jitter and delay don’t matter

Network Monitoring Uses

• Monitoring is measuring over long periods of time
• Gives an indication of network performance over time: a baseline
• Allows comparison of different tools for analysis
• Allows analysis of how different protocols behave in different conditions, in real life
• Allows ‘tuning’ of existing protocols to make the most out of the network
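One simple way to turn long-term monitoring into a baseline is to compare each new measurement with the median of the preceding ones and flag large drops. The Python sketch below does this with a hypothetical `baseline_alerts` helper. The window of 5 samples and the 50% threshold are arbitrary assumptions. Real data with periodic (for example daily) patterns would need a smarter baseline.

```python
from statistics import median

def baseline_alerts(history, window=5, tolerance=0.5):
    """Return (index, value, baseline) for each sample below tolerance * baseline,
    where the baseline is the median of the previous `window` samples."""
    alerts = []
    for i in range(window, len(history)):
        baseline = median(history[i - window:i])
        if history[i] < tolerance * baseline:
            alerts.append((i, history[i], baseline))
    return alerts

print(baseline_alerts([100, 102, 98, 101, 99, 40, 100]))
# [(5, 40, 100)]
```

The median is used so that the flagged drop does not immediately drag the baseline down for the samples that follow it.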

Possible Users of a Network Monitoring (NM) Web Service

• Network Managers
  – See how much bandwidth is being used
• Network Analysts
  – Make things faster and better!
• Resource Brokers
  – Determine where to send jobs, based on network cost
• Bandwidth Brokers
  – Allocate bandwidth depending on the current network state
• Replication Managers
  – Distribute data only when the network is not busy
• QoS Brokers (aka Managed Bandwidth Services)
  – Universal language for intercommunication..?
• Next Generation FTP
  – First look up historical throughputs before sending, to determine the best path
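The ‘Next Generation FTP’ idea can be sketched in a few lines of Python. Given historical throughput samples for each candidate path, it picks the path with the best typical performance before starting the transfer. The path names and numbers below are invented for illustration. Using the median, so that one anomalous transfer does not dominate, is a design choice of this sketch and is not specified on the slide. Each path is assumed to have at least one sample.

```python
from statistics import median

def best_path(history):
    """Pick the path whose past throughput samples (Mbit/s) have the highest median."""
    return max(history, key=lambda path: median(history[path]))

# Hypothetical historical throughputs per path, in Mbit/s.
history = {
    "direct": [200, 210, 190],
    "via-london": [310, 290, 305],
    "via-amsterdam": [450, 80, 460],
}
print(best_path(history))  # via-amsterdam
```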

GridNM

• Architecture for monitoring the network
• Backend: collects data for presentation
• Logs metrics in ASCII log files on a single host
• Allows mesh measurements: all nodes perform measurements to all other nodes
• Uses standard UNIX infrastructure: ssh
  – Should be easily adaptable to using Globus certificates once interactive processing is introduced in EDG (European DataGrid)
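A mesh of N hosts means N(N − 1) directed measurements, so the number of tests grows quadratically with the number of sites. The Python sketch below enumerates the mesh, builds (but does not run) an ssh command line for each pair, and formats one ASCII log record. The log format and helper names are hypothetical and are not GridNM’s own. The `iperf -c` client call assumes an iperf server (`iperf -s`) is already listening on each destination.

```python
from itertools import permutations

def mesh_pairs(hosts):
    """All ordered (source, destination) pairs: every node measures to every other node."""
    return list(permutations(hosts, 2))

def remote_command(src, dst):
    """Hypothetical: run an iperf client on `src` over ssh, measuring towards `dst`."""
    return ["ssh", src, "iperf", "-c", dst]

def log_line(timestamp, src, dst, tool, metric, value):
    """One whitespace-separated ASCII record (an illustrative format)."""
    return f"{int(timestamp)} {src} {dst} {tool} {metric} {value}"

if __name__ == "__main__":
    hosts = ["hostA", "hostB", "hostC"]
    for src, dst in mesh_pairs(hosts):
        print(" ".join(remote_command(src, dst)))  # printed only, not executed
```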

GridNM (cont…)

• Uses existing (and future) tools to collect metrics
• Modular: uses XML to describe available resources
  – Hosts
  – Tools
• Locks hosts while they are under measurement, which prevents other tests affecting the metrics
• Currently monitoring 6 sites around Europe using 5 tools
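The Python sketch below illustrates two GridNM ideas from this slide. It reads hosts and tools from an XML resource description. It then groups the mesh of tests into rounds in which no host takes part in two measurements at once, which is one simple way to get the effect of host locking. The XML element names and both functions are hypothetical; the slide does not show GridNM’s actual format or locking mechanism.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource description.
CONFIG = """
<resources>
  <host name="hostA"/>
  <host name="hostB"/>
  <host name="hostC"/>
  <host name="hostD"/>
  <tool name="ping"/>
  <tool name="iperf"/>
</resources>
"""

def load_resources(xml_text):
    """Return the (hosts, tools) listed in the resource description."""
    root = ET.fromstring(xml_text.strip())
    hosts = [h.get("name") for h in root.findall("host")]
    tools = [t.get("name") for t in root.findall("tool")]
    return hosts, tools

def schedule_rounds(pairs):
    """Greedily group (src, dst) tests into rounds in which no host appears twice,
    so each host is 'locked' by at most one measurement at a time."""
    rounds = []
    for src, dst in pairs:
        for rnd in rounds:
            busy = {host for pair in rnd for host in pair}
            if src not in busy and dst not in busy:
                rnd.append((src, dst))
                break
        else:  # no existing round had both hosts free
            rounds.append([(src, dst)])
    return rounds

if __name__ == "__main__":
    hosts, tools = load_resources(CONFIG)
    pairs = [(s, d) for s in hosts for d in hosts if s != d]
    for i, rnd in enumerate(schedule_rounds(pairs)):
        print(f"round {i}: {rnd}")
```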

GridNM ‘plot’

Web Service Network Monitoring

• GridNM is just one network monitoring program
• There are many different programs out there!
• Unify data exchange between different monitoring infrastructures

piPEs

• Internet2 e2ePI architecture for network monitoring
• Defines the information flow to diagnose network and host performance (white paper)
• Incorporates a ‘finger pointing’ mechanism to identify poor performers
• Ideal starting point!
• BUT… found out about it too late…
• Currently investigating SLAC software + a web service as a possible implementation of the piPEs software

GGF NMWG

• Defines characteristics: just the values that we are interested in
• Defines classes of metrics, e.g. bandwidth, delay etc., that these characteristics report
• Defines singleton and derived characteristics
• Defines samples of data and their inherent sampling patterns
• Timestamps
• Still in draft form…
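The NMWG distinction between singleton and derived characteristics maps naturally onto a couple of small types. The Python sketch below is purely illustrative. The class names, the `delay.roundTrip` characteristic name and the use of a fixed interval as the ‘sampling pattern’ are assumptions, not NMWG definitions. Each singleton carries its own timestamp, and the derived mean is computed from a sample rather than measured directly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean

@dataclass(frozen=True)
class Singleton:
    """One measured value of a characteristic, with the time it was taken."""
    characteristic: str   # hypothetical name, e.g. "delay.roundTrip"
    value: float
    timestamp: datetime

@dataclass
class Sample:
    """Singletons collected under one sampling pattern (here simply a nominal interval)."""
    characteristic: str
    interval_s: float
    singletons: list = field(default_factory=list)

    def add(self, value, timestamp=None):
        """Record a singleton, timestamping it now (UTC) if no time is given."""
        self.singletons.append(
            Singleton(self.characteristic, value, timestamp or datetime.now(timezone.utc)))

    def derived_mean(self):
        """A derived characteristic: computed from the sample, not measured directly."""
        return mean(s.value for s in self.singletons)

rtt = Sample("delay.roundTrip", interval_s=1.0)
for ms in (18.0, 19.0, 20.0):
    rtt.add(ms)
print(rtt.derived_mean())  # 19.0
```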

GGF NMWG cont. / Schema Design

• As it’s all in XML, we are designing an XML schema to describe the ‘objects’ to be passed around
• XML Schema Definition (XSD)
• Focusing on actually implementing what the NMWG document says… and doesn’t say…
• Note: we are also tackling this from a pure OO design; however, due to technical differences between objects in C++, Java and SOAP/XML, there may be issues to overcome…
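The Python sketch below shows what one of these XML ‘objects’ might look like on the wire. It uses the standard `xml.etree.ElementTree` module to serialise a single measurement and parse it back. The element and attribute names, the characteristic name and the example values are all hypothetical and are not taken from the NMWG drafts. ElementTree does not validate against an XSD; a schema-aware validator would be needed for that.

```python
import xml.etree.ElementTree as ET

def measurement_to_xml(characteristic, src, dst, value, units, timestamp_iso):
    """Serialise one measurement as an XML string (illustrative element names)."""
    m = ET.Element("measurement", characteristic=characteristic)
    ET.SubElement(m, "source").text = src
    ET.SubElement(m, "destination").text = dst
    ET.SubElement(m, "value", units=units).text = str(value)
    ET.SubElement(m, "timestamp").text = timestamp_iso
    return ET.tostring(m, encoding="unicode")

doc = measurement_to_xml("bandwidth.achievable", "hostA", "hostB", 493, "Mbit/s",
                         "2003-01-01T00:00:00Z")
print(doc)

# Any consumer (C++, Java, a SOAP toolkit) can parse the same document back.
parsed = ET.fromstring(doc)
print(parsed.get("characteristic"), parsed.find("value").text, parsed.find("value").get("units"))
```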

Part 2

• Network communication languages
• Known as transport protocols: they determine how applications put traffic into the network
• Sit on top of IP, the common language of the Internet

Transport Level Protocols

• TCP (HTTP, FTP, GridFTP) used for file transfer
  – Gives a guarantee on delivery
  – All data is copied precisely
  – Performance can be poor
  – Respects other Internet users
• UDP (Real, H.323) used for video conferencing
  – Gives no guarantees on delivery
  – Data may be incomplete
  – Performance good
  – Doesn’t respect other Internet users
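At the API level, the Python sketch below uses the standard `socket` module on the loopback interface to contrast the two protocols. TCP (`SOCK_STREAM`) needs a connection and delivers an ordered, reliable byte stream. UDP (`SOCK_DGRAM`) just sends a datagram with no connection, acknowledgement or retransmission. The helper names are made up for illustration. The TCP helper is single-threaded, so it is only meant for small payloads. On loopback the UDP datagram will almost always arrive, which hides the ‘no guarantee’ part.

```python
import socket

def tcp_roundtrip(payload: bytes) -> bytes:
    """Send a small payload over a loopback TCP connection and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))       # port 0: let the OS choose a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()        # connection is set up before any data flows
            with conn:
                client.sendall(payload)      # reliable, ordered byte stream
                client.shutdown(socket.SHUT_WR)  # tell the receiver we are done
                chunks = []
                while (data := conn.recv(4096)):
                    chunks.append(data)
    return b"".join(chunks)

def udp_send(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP and return it (no connection, ACKs or retransmission)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(1.0)             # a lost datagram would simply never arrive
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
    return data

print(tcp_roundtrip(b"hello grid"))
print(udp_send(b"probe"))
```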

UDP vs TCP

• UDP: min=274, max=565, ave=493, stdev=43
• TCP: min=37, max=292, ave=195, stdev=40
• Summary: TCP is rubbish! Why?

Memory and Disk transfers

[Figure (Les Cottrell, SLAC): disk-to-disk file copy versus iperf TCP throughput (Mbits/s), with Fast Ethernet and OC3 marked; file copy is disk-limited: over 60 Mbits/s, iperf >> file copy]

What does TCP do?

• TCP retransmits lost data
• It even retransmits data it ‘thinks’ has been lost!
• Needs and uses a ‘windowing’ system
  – Uses ACKnowledgements from the receiver
  – Grows a congestion window, ‘cwnd’, to determine the size of the window
• Model:
  – Tap is independent of tank size
  – Tank filled by the application
  – Valve opening (data rate) determined by feedback from the network
  – Small tanks mean a small data rate
  – Large tanks mean a larger data rate

[Diagram: tank-and-valve model labelled ‘Socket buffer size’, ‘TCP Protocol’ and ‘Network’]

TCP socket buffer sizes

• Iperf observations: 490
• Standard socket buffer graph
  – Shows a linear(ish) region followed by a plateau
• Optimal socket buffer size is just over 2 Mbytes
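In application code, the socket buffer is set with `setsockopt` using `SO_SNDBUF` and `SO_RCVBUF`; iperf exposes the same knob through its `-w` option. The minimal Python sketch below requests buffers of 2,375,000 bytes, the bandwidth-delay product for 1 Gbit/s at 19 ms, and reports what the OS actually granted. On Linux the request is capped by the `net.core.wmem_max` / `net.core.rmem_max` sysctls, and `getsockopt` reports double the value actually set (the kernel reserves the extra for bookkeeping). It is therefore worth checking the granted size rather than assuming it. `tuned_socket` is an illustrative helper name.

```python
import socket

def tuned_socket(buffer_bytes):
    """Create a TCP socket with enlarged send/receive buffers and report what the OS granted."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set the buffers before connect()/listen(): on Linux the receive buffer size
    # influences the window scale advertised when the connection is set up.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    granted = (s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
               s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
    return s, granted

sock, (snd, rcv) = tuned_socket(2_375_000)  # BDP for 1 Gbit/s at 19 ms
print(f"send buffer {snd} bytes, receive buffer {rcv} bytes")
sock.close()
```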

Retransmitted Data

• Graph shows the amount of retransmitted data against the throughput
• Retransmitted data is due to loss on the network
• In the general case, the sender only resends once it detects the loss: via duplicate ACKs or, failing that, an ACK timeout
• We get more retransmitted data for low throughputs with large windows

Measuring Performance of Transport Level Protocols

• Need to identify what we want to measure: the metrics
• These depend on how the transport protocol is used, so we need to analyse application-level usage
• For the Grid:
  – Movement of ‘transient’ data
    • File transfer and replication
    • Process jobs or ‘sandboxes’
  – Movement of real-time data
    • Video conferencing (Access Grid)
    • Real-time applications

Web100 & TCP

• OSI states that we should not know anything about the separate layers
• So how do we know something is going wrong? Your throughput decreases!
• Prevents congestion collapse!
• Need Web100! Allows in-depth TCP stack analysis per flow
• Kernel patch: 2.4.16, alpha1.2
• New version: 2.4.19, alpha2.0pre1
• Using a program to grab Web100 results: logvars

Reliability of Web100 results…

• Still alpha… but reliable
• Graphed against iperf, the throughputs correlate very well
• At least as reliable as the results offered by iperf!

Congestion Window

• Looking at the max_cwnd achieved for each measurement…
• There appear to be two regions
  – One with high correlation of throughput and max_cwnd
  – A linear region where we get a range of throughputs for the same max_cwnd
• Cwnd never grows beyond 1500 kbytes!

Bandwidth Delay Product

• Window = bandwidth * delay
• We want
  – Bandwidth = 1,000,000,000 bit/sec
• We have
  – Delay = 19 ms
• The window needs to be, on average…
  – = 1e+9 * 19e-3 / 8 bytes
  – = 2,375,000 bytes (≈ 2.26 MiB)!
• We only achieve ~1.5 Mbytes max!
• Need to implement some monitoring of the average and variation of cwnd for each TCP connection…
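The arithmetic above can be checked with a few lines of Python, taking the 19 ms as the round-trip time. The sketch also runs the calculation backwards: with at most a given window in flight per round trip, throughput cannot exceed window × 8 / RTT. It treats the observed ~1.5 Mbytes as 1,500,000 bytes, which is an assumption, since the slide does not say decimal or binary. That gives a ceiling of roughly 630 Mbit/s on this path, well short of the 1 Gbit/s target. The function names are illustrative.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_s):
    """Window (bytes) needed to keep a path of this bandwidth and delay full."""
    return bandwidth_bps * rtt_s / 8

def window_limited_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most window_bytes can be in flight per RTT."""
    return window_bytes * 8 / rtt_s

needed = bandwidth_delay_product_bytes(1e9, 19e-3)
ceiling = window_limited_throughput_bps(1_500_000, 19e-3)  # assumes 1.5 Mbytes = 1,500,000 bytes
print(f"needed window: {needed:,.0f} bytes ({needed / 2**20:.2f} MiB)")
# needed window: 2,375,000 bytes (2.26 MiB)
print(f"ceiling with 1.5 Mbytes: {ceiling / 1e6:.0f} Mbit/s")
# ceiling with 1.5 Mbytes: 632 Mbit/s
```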

TCP Optimisation

• It’s actually TCP that is limiting our transfer rates!
  – All applications use it!
• Understandable, as TCP hasn’t changed much for the last 15-20 years!
  – When the standard link was about 56 kbit/sec!
• Solution: we need new TCP implementations!

What is High Speed TCP?

• Changes the way TCP behaves at high speed (i.e. large cwnd)
• Standard TCP has two modes
  – Slow start (not very slow…)
  – Congestion avoidance
• Focuses on the congestion avoidance region, i.e. when TCP knows (thinks it knows…) how well the network behaves…
• BUT only when we are at high speeds; otherwise it does what normal Standard TCP does…
• Readily deployable 1st step towards equation-based congestion control
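The two Standard TCP modes can be illustrated with an idealised per-RTT simulation in Python. Slow start doubles cwnd every round trip until it reaches the slow-start threshold (ssthresh). Congestion avoidance then adds one segment per round trip, and a loss halves the window. Real stacks grow cwnd per ACK, handle timeouts differently and can overshoot in slow start. Treat `standard_tcp_cwnd`, a made-up helper, as a cartoon of the two regions that High Speed TCP distinguishes.

```python
def standard_tcp_cwnd(rtts, ssthresh, loss_at=frozenset()):
    """Idealised per-RTT congestion window (in segments) for Standard TCP.

    Slow start doubles cwnd every RTT until it reaches ssthresh; congestion
    avoidance then adds one segment per RTT. A loss in RTT t halves cwnd
    (modelling fast recovery; timeouts are ignored).
    """
    cwnd = 1.0
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_at:
            ssthresh = max(cwnd / 2, 2.0)
            cwnd = ssthresh                  # halve, then stay in congestion avoidance
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: exponential growth
        else:
            cwnd += 1                        # congestion avoidance: linear growth
    return trace

print(standard_tcp_cwnd(8, ssthresh=16))
# [1.0, 2.0, 4.0, 8.0, 16.0, 17.0, 18.0, 19.0]
```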

What does it do?

• Standard TCP uses two parameters
  – Increase parameter, a
  – Decrease parameter, b
• i.e. AIMD(a, b)
• Standard TCP uses
  – a = 1
  – b = 0.5
• High Speed TCP introduces
  – a -> a(cwnd)
  – b -> b(cwnd)
• i.e. the values of a and b depend on the current congestion window size
• If a grows with cwnd, we can get back up to our ‘optimal’ cwnd size for the network path more quickly
• If b is smaller at large cwnd, each loss cuts the window by less, so we don’t lose as much bandwidth to a small congestion window
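Here is a minimal Python sketch of the AIMD rule, using the convention that a loss multiplies cwnd by (1 − b), so Standard TCP’s b = 0.5 halves it. The sketch counts how many round trips a flow needs to climb back to its previous window after a single loss. With Standard TCP’s a = 1, b = 0.5 and a 1000-segment window, that is 500 round trips, about 9.5 s at a 19 ms round-trip time. The second call uses illustrative values a = 10, b = 0.1. These point in the direction High Speed TCP moves a and b at large cwnd, but they are not taken from its table.

```python
def aimd_increase(cwnd, a):
    """Additive increase: add a segments per round-trip time."""
    return cwnd + a

def aimd_decrease(cwnd, b):
    """Multiplicative decrease on loss: remove the fraction b of the window."""
    return cwnd * (1 - b)

def rtts_to_recover(target, a, b):
    """Round trips needed to regain `target` segments after one loss at `target`."""
    cwnd = aimd_decrease(target, b)
    rtts = 0
    while cwnd < target:
        cwnd = aimd_increase(cwnd, a)
        rtts += 1
    return rtts

print(rtts_to_recover(1000, a=1, b=0.5))   # 500  (Standard TCP)
print(rtts_to_recover(1000, a=10, b=0.1))  # 10   (illustrative high-speed values)
```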

What exactly does it do?

• Based on the TCP response function
  – Relates loss and throughput
• Uses the TCP response function to investigate certain parameters
  – High_Window, High_Loss: the largest cwnd needed for x throughput, and the loss rate required for that throughput
  – Low_Window, Low_Loss: the smallest cwnd at which we actually switch from Standard TCP, and the loss rate required for that cwnd size
  – High_B: the smallest decrease b, used when we are at a large cwnd
• Equations transform this information into a table of a(cwnd) and b(cwnd)
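The Python sketch below shows how these parameters could turn into a(cwnd) and b(cwnd). It follows the equations of RFC 3649 (HighSpeed TCP), which uses Low_Window = 38, High_Window = 83,000, a loss rate of 10^-7 at High_Window and a High_Decrease of 0.1. The slide’s High_Loss, Low_Loss and High_B correspond to the RFC’s High_P, Low_P and High_Decrease. Here Low_Loss is derived from the Standard TCP response function w = 1.2/sqrt(p), which gives roughly 10^-3. b(w) is interpolated linearly in log(w), the loss rate log-linearly, and a(w) = w^2 · p(w) · 2b(w)/(2 − b(w)), which is the AIMD response function solved for a. Treat this as a sketch of the technique rather than a reference implementation; the RFC’s own table rounds a(w) to integers.

```python
import math

# Values as specified in RFC 3649 (HighSpeed TCP).
LOW_WINDOW = 38          # at or below this cwnd, behave exactly like Standard TCP
HIGH_WINDOW = 83_000     # cwnd (segments) for ~10 Gbit/s at 100 ms RTT with 1500-byte packets
HIGH_LOSS = 1e-7         # loss rate at which HIGH_WINDOW should be sustained (RFC: High_P)
HIGH_B = 0.1             # decrease factor at HIGH_WINDOW (RFC: High_Decrease)
LOW_LOSS = (1.2 / LOW_WINDOW) ** 2  # Standard TCP response w = 1.2/sqrt(p), solved for p

def _position(w):
    """Where w sits between LOW_WINDOW (0.0) and HIGH_WINDOW (1.0) on a log scale."""
    return (math.log(w) - math.log(LOW_WINDOW)) / (math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))

def loss_for_window(w):
    """HighSpeed response function: log(p) is linear in log(w)."""
    return math.exp(math.log(LOW_LOSS) + _position(w) * (math.log(HIGH_LOSS) - math.log(LOW_LOSS)))

def b_of(w):
    if w <= LOW_WINDOW:
        return 0.5
    return (HIGH_B - 0.5) * _position(w) + 0.5

def a_of(w):
    if w <= LOW_WINDOW:
        return 1.0
    b = b_of(w)
    # AIMD(a, b) response function w = sqrt(a(2 - b) / (2 b p)), solved for a.
    return w * w * loss_for_window(w) * 2 * b / (2 - b)

for w in (38, 118, 1_000, 10_000, 83_000):
    print(f"w={w:>6}  a={a_of(w):6.2f}  b={b_of(w):.2f}")
```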

Transport Protocols ‘NG’

Name             Transport       Notes
UDP Blast        UDP
Tsunami          UDP/TCP         Uses TCP as ‘control’ channel
High Speed TCP   TCP             For 10Gb/sec links
PGM / CC         Modified UDP    Multicast UDP: new transport protocol
IBP              Application     ‘logistical networking’