
Page 1: Internet Performance Dynamics

Internet Performance Dynamics

Boston University Computer Science Department

http://cs-people.bu.edu/barford/

Fall, 2000

Paul Barford

Page 2: Internet Performance Dynamics

Motivation

What are the root causes of long response times in wide area services like the Web? Servers? Networks? Server/network interaction?

Page 3: Internet Performance Dynamics

A Challenge

Histograms of file transfer latency for 500KB files transferred between Denver and Boston

Day 1: HS mean = 8.3 sec, LS mean = 13.0 sec. Day 2: HS mean = 5.8 sec, LS mean = 3.4 sec.

[Figure: two latency histograms (x-axis: seconds), one per day, each with a high server load (HS) and a low server load (LS) curve.]

Precise separation of server effects from network effects is difficult

Page 4: Internet Performance Dynamics

What is needed?

A laboratory enabling detailed examination of Web transactions (a Web “microscope”): the Wide Area Web Measurement (WAWM) project testbed

A technique for analyzing transactions to separate and identify causes of delay: critical path analysis of TCP

Page 5: Internet Performance Dynamics

Web Transactions “under a microscope”

WebServer

Distributed Clients

Global Internet

Page 6: Internet Performance Dynamics

Generating Realistic Server Workloads

Approaches:

Trace-based:
Pros: exactly mimics a known workload
Cons: a “black box” approach; can’t easily change parameters of interest

Analytic: synthetically create a workload
Pros: explicit models can be inspected and parameters can be varied
Cons: difficult to identify, collect, model, and generate workload components

Page 7: Internet Performance Dynamics

SURGE: Scalable URL Reference Generator

Analytic Web workload generator
Based on 12 empirically derived distributions
Explicit, parameterized models
Captures the “heavy-tailed” (highly variable) properties of Web workloads

SURGE components:
Statistical distribution generator
Hypertext Transfer Protocol (HTTP) request generator

Currently used at over 130 academic and industrial sites worldwide
Adopted by the W3C for the HTTP-NG testbed
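To make “explicit, parameterized models” concrete, here is a minimal Python sketch of a SURGE-style file size sampler: a lognormal body with a Pareto tail, matching the model table on the next slide. The parameters are illustrative placeholders, not SURGE’s published fits.

```python
# Sketch of a lognormal-body / Pareto-tail file size sampler (illustrative
# parameters only; SURGE's fitted values are not reproduced here).
import random

BODY_MU, BODY_SIGMA = 9.0, 1.3    # lognormal body: mean/std of log(size)
TAIL_ALPHA, TAIL_K = 1.1, 130000  # Pareto tail: shape and cutoff (bytes)
TAIL_PROB = 0.07                  # fraction of files drawn from the tail

def sample_file_size():
    """Draw one file size in bytes from the body/tail mixture."""
    if random.random() < TAIL_PROB:
        # Inverse-CDF sampling: X = k * U^(-1/alpha) is Pareto(alpha, k).
        return int(TAIL_K * random.random() ** (-1.0 / TAIL_ALPHA))
    return int(random.lognormvariate(BODY_MU, BODY_SIGMA))

# Heavy tails in action: a handful of huge files dominate the total bytes.
sizes = sorted(sample_file_size() for _ in range(100_000))
top_1pct = sum(sizes[-1000:]) / sum(sizes)
print(f"top 1% of files carry {top_1pct:.0%} of the bytes")
```

Because the model is explicit, experimenting with workload variability is just a matter of changing TAIL_ALPHA or TAIL_PROB, which is exactly what the trace-based approach on the previous slide cannot offer.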

Page 8: Internet Performance Dynamics

Seven workload characteristics captured in SURGE

Characteristic        Component          Model              System Impact
File Size             Base file – body   Lognormal          File system       *
                      Base file – tail   Pareto                               *
                      Embedded file      Lognormal                            *
                      Single file 1      Lognormal                            *
                      Single file 2      Lognormal                            *
Request Size          Body               Lognormal          Network           *
                      Tail               Pareto                               *
Document Popularity                      Zipf               Caches, buffers
Temporal Locality                        Lognormal          Caches, buffers
OFF Times                                Pareto                               *
Embedded References                      Pareto             ON times          *
Session Lengths                          Inverse Gaussian   Connection times

* Model developed during the SURGE project

[Diagram: ON/OFF request time line — BF, EF1, EF2, off time, SF, off time, BF, EF1 (base file, embedded files, single file).]

Page 9: Internet Performance Dynamics

HTTP request generator

Supports both HTTP/1.0 and HTTP/1.1

An ON/OFF thread is a “user equivalent”

[Diagram: several SURGE client systems, each running multiple ON/OFF threads, connected through the network to the Web server system.]
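A minimal sketch of one such ON/OFF “user equivalent” thread. The server URL, document paths, and OFF-time parameters below are hypothetical; SURGE’s actual generator schedules requests from its full set of fitted distributions.

```python
# Sketch of an ON/OFF "user equivalent": fetch a document and its embedded
# files (ON), then sleep for a Pareto-distributed think time (OFF).
import random
import threading
import time
import urllib.request

SERVER = "http://localhost:8080"   # hypothetical server under test
OFF_ALPHA, OFF_K = 1.5, 1.0        # illustrative Pareto OFF-time parameters

def pareto(alpha, k):
    return k * random.random() ** (-1.0 / alpha)

def user_equivalent(documents):
    """documents: list of (base_path, [embedded_paths]) tuples."""
    while True:
        base, embedded = random.choice(documents)
        for path in [base] + embedded:                # ON period: requests
            urllib.request.urlopen(SERVER + path).read()
        time.sleep(pareto(OFF_ALPHA, OFF_K))          # OFF period: think time

# Load scales by running many user equivalents concurrently.
for _ in range(10):
    threading.Thread(target=user_equivalent,
                     args=([("/index.html", ["/a.gif", "/b.gif"])],),
                     daemon=True).start()
time.sleep(60)  # let the threads generate load
```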

Page 10: Internet Performance Dynamics

SURGE and SPECWeb96 exercise servers very differently

[Figure: percent CPU utilization vs. packets per second for SURGE and SPECWeb96.]

Page 11: Internet Performance Dynamics

SURGE’s flexibility allows easy experimentation

[Figure: two comparison panels, HTTP/1.0 vs. HTTP/1.1.]

Page 12: Internet Performance Dynamics

Web Transactions “under a microscope”

WebServer

Distributed Clients

Global Internet

Page 13: Internet Performance Dynamics

WAWM Infrastructure

13 clients distributed around the global Internet execute transactions of interest

One server cluster at BU; local load generators running SURGE enable the server to be placed under any load condition

Active and passive measurements from both server and clients:
Packet capture via “tcpdump”
GPS timers

Page 14: Internet Performance Dynamics

WAWM client systems

Harvard University, MA
Purdue University, IN
University of Denver, CO
ACIRI, Berkeley, CA
HP, Palo Alto, CA
University of Saskatchewan, Canada
Universidade Federal de Minas Gerais, Brazil
Universidad Simón Bolívar, Venezuela

EpicRealm – Dallas, TX
EpicRealm – Atlanta, GA
EpicRealm – London, England
EpicRealm – Tokyo, Japan

Internet2/Surveyor

Others??

Page 15: Internet Performance Dynamics

What is needed?

A laboratory enabling detailed examination of Web transactions (a Web “microscope”): the Wide Area Web Measurement (WAWM) project testbed

A technique for analyzing transactions to separate and identify causes of delay: critical path analysis of TCP

Page 16: Internet Performance Dynamics

Identifying root causes of response time

Delays can occur at many points along the end-to-end path simultaneously

Pinpointing where delays occur and which delays matter is difficult

Our goal is to precisely identify the determinants of response time in TCP transactions

[Diagram: end-to-end path — client, router 1, router 2, router 3, server.]

Page 17: Internet Performance Dynamics

Critical path analysis (CPA) for TCP transactions

CPA identifies the precise set of events that determines the execution time of a distributed application, e.g., Web transaction response time

Decreasing the duration of any event on the CP decreases response time; this is not true for events off the CP

Profiling the CP for TCP enables accurate assignment of delays to:
Server delay
Client delay
Network delay (propagation, network variance, and drops)

Applied here to HTTP/1.0; could apply to other applications (e.g., FTP)
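To make the CP idea concrete: under the simplifying assumption that a transaction is a DAG of timestamped events with elapsed-time edge weights (formalized three slides ahead), the critical path is the longest weighted path from the first event to the last. A sketch of that core computation, not tcpeval’s implementation:

```python
# Critical path as the longest weighted path in an event DAG, via a
# longest-path DP over a topological order (Kahn's algorithm).
from collections import defaultdict

def critical_path(edges, start, end):
    """edges: list of (u, v, weight); weight = elapsed time from u to v.
    Returns (total time, list of events on the critical path)."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    dist = {n: float("-inf") for n in nodes}
    dist[start] = 0.0
    pred = {}
    frontier = [n for n in nodes if indeg[n] == 0]
    while frontier:
        u = frontier.pop()
        for v, w in succ[u]:
            if dist[u] + w > dist[v]:       # keep the slowest way to reach v
                dist[v], pred[v] = dist[u] + w, u
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    path, n = [end], end
    while n != start:                       # walk predecessors back to start
        n = pred[n]
        path.append(n)
    return dist[end], path[::-1]
```

Shortening any edge on the returned path shortens dist[end]; shortening an edge off the path leaves it unchanged, which is exactly the property the slide states.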

Page 18: Internet Performance Dynamics

Window-based flow control in TCP

[Diagram: window-based flow control between client and server — one or more data packets (D) flow in one direction, an ACK packet (A) returns for each window; shown as a system view, a time line, and the corresponding graph.]
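A back-of-envelope sketch of why rounds, not individual packets, are the natural unit in these time line graphs: with one RTT per round and a window that doubles each round (slow start only, simplified — no drops, no congestion avoidance, no receiver limit), transfer time grows with the number of rounds.

```python
# Idealized rounds-per-transfer calculation under pure slow start.
def rounds_needed(total_packets, initial_window=1):
    window, sent, rounds = initial_window, 0, 0
    while sent < total_packets:
        sent += window      # one window of data packets per round
        window *= 2         # each fully ACKed window doubles the next
        rounds += 1
    return rounds

# 21 segments of 1460 bytes is roughly the 30660-byte transfer profiled two
# slides ahead: 5 rounds under this idealization (the measured trace there
# takes 7, partly because of the packet drop it contains).
print(rounds_needed(21))   # -> 5
```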

Page 19: Internet Performance Dynamics

TCP flows as a graph

Vertices are packet departures or arrivals: data, ACK, SYN, FIN

Directed edges reflect Lamport’s “happens before” relation, on the client, on the server, or over the network

Weights are elapsed time (assumes global clock synchronization)

Profile associates categories with edge types; assignment is based on logical flow
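A hedged sketch of that profiling step: assigning a delay category to an edge from its endpoints and type. The categories follow these slides; the edge representation and tcpeval’s exact assignment rules are assumptions here.

```python
# Map a critical-path edge to a delay category from its endpoints and kind.
def categorize(src_host, dst_host, edge_kind):
    if edge_kind == "timeout":
        return "drop delay (time out)"
    if edge_kind == "fast_retransmit":
        return "drop delay (fast retransmit)"
    if src_host != dst_host:
        return "network delay"        # the edge spans the wire
    return f"{src_host} delay"        # processing time at one end point

assert categorize("server", "client", "data") == "network delay"
assert categorize("server", "server", "data") == "server delay"
```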

Page 20: Internet Performance Dynamics

[Figure: left, the original data flow between client and server — data segments 1461:2921 through 27741:29201 and their ACKs, with segment 17521 dropped (three duplicate acks for 16061) — alongside a table of seven rounds and the bytes each round liberates (1:2920 up to 27741:30660). Right, the extracted critical path, with each edge profiled as network delay, server delay, client delay, or drop delay.]

Page 21: Internet Performance Dynamics

tcpeval

Inputs are “tcpdump” packet traces taken at the end points of transactions

Generates a variety of statistics for file transactions:
File and packet transfer latencies
Packet drop characteristics
Packet and byte counts per unit time

Generates both time line and sequence plots for transactions

Generates critical path profiles and statistics for transactions

Freely distributed
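For a flavor of the input side, a minimal sketch assuming the text output of `tcpdump -tt` (absolute epoch timestamps; the exact line format varies by tcpdump version). tcpeval itself consumes raw traces and rebuilds full TCP state; this only approximates one of the statistics above, file transfer latency.

```python
# Approximate file transfer latency from a tcpdump -tt text trace as the
# elapsed time between the first and last matching packet lines.
import re
import sys

LINE = re.compile(r"^(\d+\.\d+) IP (\S+) > (\S+):")

def transfer_latency(trace_file):
    """First-to-last packet time in a single-transfer trace, in seconds."""
    first = last = None
    for line in trace_file:
        m = LINE.match(line)
        if m:
            ts = float(m.group(1))
            first = ts if first is None else first
            last = ts
    return None if first is None else last - first

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        print(transfer_latency(f))
```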

Page 22: Internet Performance Dynamics

Implementation Issues

tcpeval must recreate TCP state at the end points as packets arrive; capturing packets at the end points makes timer simulation unnecessary

The “active round” must be maintained

Packet filter problems must be addressed:
Dropped packets
Added packets
Out-of-order packets

tcpeval works across platforms for RFC 2001-compliant TCP stacks
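One of the filter-cleanup passes named above could look like this sketch, assuming each captured packet is reduced to a (timestamp, IP id, sequence number) tuple; the real tool also detects filter drops by reconciling against the trace from the opposite end point.

```python
# Discard filter-added duplicates and restore timestamp order in a trace.
def clean_trace(packets):
    """packets: iterable of (timestamp, ip_id, seq) tuples."""
    seen = set()
    unique = []
    for ts, ip_id, seq in packets:
        if (ip_id, seq) in seen:   # packet duplicated by the filter
            continue
        seen.add((ip_id, seq))
        unique.append((ts, ip_id, seq))
    return sorted(unique)          # out-of-order capture: sort by timestamp
```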

Page 23: Internet Performance Dynamics

CPA results for 1KB file

[Figure: stacked delay breakdown (seconds, 0–1) under four conditions — LN,LS; LN,HS; HN,LS; HN,HS — split into net variance, propagation, server, time out, client, and fast retransmit delay.]

Latency is dominated by server load for BU to Denver path

6 packets are typically on the critical path

Page 24: Internet Performance Dynamics

CP time line diagrams for 1KB file

[Figure: CP time line diagrams under low server load and high server load.]

Page 25: Internet Performance Dynamics

CPA results for 20KB file

Both server load and network effects are significant

[Figure: stacked delay breakdown (seconds, 0–0.9) under LN,LS; LN,HS; HN,LS; HN,HS — net variance, propagation, server, time out, client, fast retransmit.]

14 packets are typically on the critical path

Page 26: Internet Performance Dynamics

The Challenge

Histograms of file transfer latency for 500KB files transferred between Denver and Boston

Day 1: HS mean = 8.3 sec, LS mean = 13.0 sec. Day 2: HS mean = 5.8 sec, LS mean = 3.4 sec.

[Figure: the same two latency histograms (x-axis: seconds) shown earlier, with HS and LS curves for each day.]

Page 27: Internet Performance Dynamics

CPA results for 500KB file

Latency is dominated by network effects

[Figure: stacked delay breakdowns (seconds, 0–4.5), one panel per day (Day 1, Day 2), under LN,LS; LN,HS; HN,LS; HN,HS — net variance, propagation, server, time out, client, fast retransmit.]

56 packets are typically on the critical path

Page 28: Internet Performance Dynamics

Active versus Passive Measurements

Understanding active (Zing) versus passive (tcpdump) network measurements

The figure shows that active measures are a poor predictor of TCP performance; the goal is to be able to predict TCP performance using active measurements

[Figure: scatter plot of % packet loss from Zing vs. % packet loss from tcpdump, both 0–10%.]

Page 29: Internet Performance Dynamics

Related work

Web performance characterization:
Client studies [Catledge95, Crovella96]
Server studies [Mogul95, Arlitt96]

Wide area measurements:
NPD [Paxson97], Internet QoS [Huitema00], Keynote Systems Inc.

TCP analysis:
TCP modeling [Mathis97, Padhye98, Cardwell00]
Graphical TCP analysis [Jacobson88, Brakmo96]
Automated TCP analysis [Paxson97]

Critical path analysis:
Parallel program execution [Yang88, Miller90]
RPC performance evaluation [Schroeder89]

Page 30: Internet Performance Dynamics

Conclusions

Using SURGE, WAWM can put realistic Web transactions “under a microscope”

Complex interactions between clients, the network and servers in the wide area can lead to surprising performance

Complex packet transactions can be effectively understood using CPA

CP profiling of BU-to-Denver transactions allowed precise assignment of delays:
Latency for small files is dominated by server load
Latency for large files is dominated by network effects

Relationship between active and passive measurement is not well understood

Future work – lots of things to do!

Page 31: Internet Performance Dynamics

Acknowledgements

Mark Crovella

Vern Paxson, Anja Feldmann, Jim Pitkow, Drue Coles, Bob Carter, Erich Nahum, John Byers, Azer Bestavros, Lars Kellogg-Stedman, David Martin

Xerox, Inc., EpicRealm Inc., Internet2

Michael Mitzenmacher, Kihong Park, Carey Williamson, Virgilio Almeida, Martin Arlitt