Multi-tier Trace Correlation

Paul Offord, CTO, Advance7

Post on 08-Mar-2021


Agenda

•  Context

•  Process-to-process communication

•  Multi-tier traffic patterns

•  Your questions

•  Practical 1 – Timeframe and time accounting

•  Your questions

•  Correlation strategies

•  Final questions

•  Closing remarks


The Enemy

Recurring Gray Problem

•  Recurring: it keeps happening

•  Gray: the causing technology is unknown

•  Problem: performance, error, or incorrect output

See Wikipedia

Recurring gray problems

[Diagram: the following teams arranged around a question mark]

•  Problem Manager

•  App Support

•  Data Networks

•  Server Support

•  Database Support

•  Solution Architects

•  Desk Support

Discovery

•  Software engineering principles

•  Standard IT diagnostic tools and techniques

RPR method

[Flowchart: the RPR method. Recoverable steps, in approximate order:]

•  Enter from PM Process

•  Gain detailed & accurate understanding of problem symptoms

•  Agreed understanding?

•  Choose one symptom to investigate

•  Share, gather, explain & sort

•  Gain accurate understanding of the symptom environment

•  Agree diagnostic objective and plan capture of definitive data

•  Execute the diagnostic capture plan

•  Review the captured diagnostics. Is quality acceptable?

•  Analyse the captured diagnostics

•  Adequate diagnostics?

•  Root Cause identified?

•  Translate diagnostic data and present to the Support Team owning the RC technology

•  Work with the owning Support Team to determine the fix

•  Implement the fix and re-activate the diagnostic capture plan

•  Fixed?

•  New Root Cause?

•  Exit to PM Process


RPR Principles

•  Achieve Root Cause Identification (RCI)

•  Focus on a single symptom

•  Capture individual instances

•  Use Definitive Diagnostic Data

•  Capture in production


Performance – What happened?

[Diagram: PC, WAN Accel, WAN, WAN Accel, and three Servers, with trace points marked A]

Times marked along the path: 1.0s, 0.2s, 0.2s, 0.3s, 0.1s, 0.4s, 12.8s

User experiences 15s response time
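The figures on this slide can be checked with a small time-accounting sketch. The values are taken from the slide; which component each value belongs to is not recoverable here, so they are kept as a plain list (an assumption of this sketch, not part of the talk).

```python
from decimal import Decimal

# Per-element times read off the slide, in seconds. Kept as strings so that
# Decimal gives exact sums; the component each belongs to is not assumed.
element_times = ["1.0", "0.2", "0.2", "0.3", "0.1", "0.4", "12.8"]


def account(times):
    """Return (total, largest, share of the total held by the largest)."""
    values = [Decimal(t) for t in times]
    total = sum(values)
    largest = max(values)
    return total, largest, largest / total


total, largest, share = account(element_times)
print(f"total={total}s largest={largest}s share={float(share):.0%}")
```

The total matches the 15s the user experiences, and one element accounts for most of it, which is the point of accounting for time element by element.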

Error – What happened?

[Diagram: PC, WAN Accel, WAN, WAN Accel, and three Servers, with trace points marked A]

User receives an error message

Incorrect interaction

Process-to-process communication

[Diagram: Client Process and Server Process communicating through TCP Ports, on an axis of increasing time. Phases shown: Connect, Data Transfer, Disconnect]

Request-response Pairs

[Diagram: Client, Network, and Server on an axis of increasing time. A Request is followed by a Response; the Service Time is marked]

Note: Messages not packets


Client-Server Chains


Slow Response – Scenario 1

[Diagram: client (C), Web Server (S), and Database (S) on an axis of increasing time. Three Req/Rsp pairs are shown; intervals of 10 seconds and 9.5 seconds are marked]

Slow Response – Scenario 2

[Diagram: client (C), Web Server (S), and Database (S) on an axis of increasing time. Three Req/Rsp pairs are shown; intervals of 10 seconds and 9.5 seconds are marked]


Response Time Elements

•  Client time

•  Service time

•  Request spread

•  Response spread


Client and Service Time

[Diagram: client (C), Web (S), and Database (S). Three Req/Rsp pairs are shown, with Service Time and Client Time marked]

Spread

[Diagram: client (C), Web (S), and Database (S) on an axis of increasing time, with Service Time and Client Time marked]

•  Request Spread: a request sent in several parts (Req – Part a, Part b, Part c, Part d)

•  Response Spread: a response returned in several parts (Rsp – Part α, Part β, Part γ, and Rsp – Part 1, Part 2)

Break for…

Questions?

Protocol Message vs. Packets

Better filter expression:

tcp.port==80 && (tcp.len>0 || tcp.flags.syn==1) && !tcp.analysis.retransmission

Or, to eliminate TCP Keep-alive packets:

tcp.port==80 && (tcp.len>1 || tcp.flags.syn==1) && !tcp.analysis.retransmission

What each clause does:

•  tcp.port==80: messages to service

•  tcp.len>0: remove ACKs

•  tcp.flags.syn==1: detect connect delays

•  !tcp.analysis.retransmission: ignore retransmissions
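The logic of the filter can be sketched as a plain predicate. This is an illustration only: the Segment class and its field names are hypothetical stand-ins for the Wireshark fields named in the filter, not a Wireshark API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical, minimal view of one captured TCP segment."""
    src_port: int
    dst_port: int
    payload_len: int               # stands in for tcp.len
    syn: bool = False              # stands in for tcp.flags.syn
    retransmission: bool = False   # stands in for tcp.analysis.retransmission


def is_message_segment(seg, service_port=80, min_len=1):
    """Mirror the slide's filter; min_len=1 is tcp.len>0, min_len=2 is tcp.len>1."""
    on_service = service_port in (seg.src_port, seg.dst_port)
    carries_data = seg.payload_len >= min_len
    return on_service and (carries_data or seg.syn) and not seg.retransmission


data = Segment(50000, 80, 512)
bare_ack = Segment(80, 50000, 0)
syn = Segment(50000, 80, 0, syn=True)
retx = Segment(50000, 80, 512, retransmission=True)
one_byte = Segment(50000, 80, 1)   # e.g. a keep-alive carrying a single byte

print([is_message_segment(s) for s in (data, bare_ack, syn, retx)])
print(is_message_segment(one_byte), is_message_segment(one_byte, min_len=2))
```

Raising min_len to 2 corresponds to the second form of the filter, which drops one-byte segments as well as empty ACKs.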

What about interleaved streams?

We’ll deal with this later


TMS Problem

•  Simple workflow system

•  Web browser, web server and database

•  List of work items called tickets

•  Click on ticket to display detail – Response time < 1 second

•  Intermittent response time of 5+ seconds

Recurring Gray Problem


TMS Slow Response Time


TMS HTTP Trace

[Diagram: PC, Network, Linux Web Server running the TMS App, and Database Server. Trace point A captures TCP Port 80]

[Trace screenshot annotations:]

•  Request to TCP Port 80

•  Response from TCP Port 80

•  Think time

•  Service Time: time delta from last request pkt to first response pkt

•  Approx. response time of 6 seconds

HTTP Response Time

•  Request to TCP Port 80: 11:42:36.622843

•  Response from TCP Port 80: 11:42:42.770757

•  Service Time of 6.148s (last request pkt to first response pkt)
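The service time quoted on this slide can be reproduced from the two timestamps. A minimal sketch, assuming both timestamps fall on the same day; the function name is illustrative.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # time-of-day format as shown in the trace


def service_time(last_request_pkt, first_response_pkt):
    """Seconds from the last request packet to the first response packet."""
    start = datetime.strptime(last_request_pkt, FMT)
    end = datetime.strptime(first_response_pkt, FMT)
    return (end - start).total_seconds()


elapsed = service_time("11:42:36.622843", "11:42:42.770757")
print(f"{elapsed:.3f}s")  # 6.148s
```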

Time Accounting

[Diagram: PC, Network, Linux Web Server running the TMS App, and Database, with trace point A]

Values shown: 6.148s and < 1s

Break for…

Questions?

TMS Database Trace – Scenario 1

[Diagram: PC, Network, Linux Web Server running the PHP App, and Database Server. Trace points (A) on TCP Port 80 and TCP Port 5432]

[Timeline: Request to TCP Port 80 at 11:42:36.622843 and Response from TCP Port 80 at 11:42:42.770757 mark the Timeframe. Within it, two Req to 5432 / Rsp from 5432 pairs are shown, with Service Time and Client Time marked]

Database Response Time

[Timeline: between Request to TCP Port 80 (11:42:36.622843) and Response from TCP Port 80 (11:42:42.770757), two Req to 5432 / Rsp from 5432 pairs. Values marked: 0.5 + 28.8 ms and 6.001s]

Break for…

Questions?


Sort by TCP Connection

Use the quadruplet:

ClientIP:ClientPort:ServiceIP:ServicePort
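Sorting by the quadruplet separates interleaved streams into one list per TCP connection. The sketch below assumes packets are available as dictionaries with hypothetical keys (src_ip, src_port, dst_ip, dst_port, time); the addresses are documentation examples, not values from the trace.

```python
from collections import defaultdict


def connection_key(pkt, service_port):
    """Return (client_ip, client_port, service_ip, service_port) for a packet."""
    if pkt["dst_port"] == service_port:   # client -> service
        return (pkt["src_ip"], pkt["src_port"], pkt["dst_ip"], pkt["dst_port"])
    # service -> client: swap so both directions share one key
    return (pkt["dst_ip"], pkt["dst_port"], pkt["src_ip"], pkt["src_port"])


def group_by_connection(packets, service_port):
    """Group packets per connection, each group in time order."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[connection_key(pkt, service_port)].append(pkt)
    for pkts in groups.values():
        pkts.sort(key=lambda p: p["time"])
    return dict(groups)


def pkt(time, src_ip, src_port, dst_ip, dst_port):
    return {"time": time, "src_ip": src_ip, "src_port": src_port,
            "dst_ip": dst_ip, "dst_port": dst_port}


# Two connections to the same service, interleaved in capture order.
packets = [
    pkt(0.10, "192.0.2.10", 50001, "192.0.2.20", 5432),
    pkt(0.11, "192.0.2.10", 50002, "192.0.2.20", 5432),
    pkt(0.15, "192.0.2.20", 5432, "192.0.2.10", 50002),
    pkt(0.12, "192.0.2.20", 5432, "192.0.2.10", 50001),
]
groups = group_by_connection(packets, service_port=5432)
for key, pkts in groups.items():
    print(key, [p["time"] for p in pkts])
```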


Determining the Client Port


Calculating Service Time


Calculating Client Time
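A sketch of both calculations for a single connection. Service time follows the definition used earlier (last request packet to first response packet). Client time is taken here as the gap from the last response packet to the next request packet on the same connection, which is an assumption of this sketch. The sample times are illustrative, loosely modelled on the 0.5 ms, 28.8 ms, and 6.001s figures shown earlier.

```python
def flip_flop_times(messages):
    """messages: time-ordered (time_seconds, direction) tuples for ONE
    connection, where direction is "req" or "rsp".

    Returns (service_times, client_times). Consecutive packets in the same
    direction (request or response spread) are skipped, so each interval runs
    from the last packet of one message to the first packet of the next.
    """
    service, client = [], []
    for (t_prev, d_prev), (t_next, d_next) in zip(messages, messages[1:]):
        if d_prev == "req" and d_next == "rsp":
            service.append(t_next - t_prev)
        elif d_prev == "rsp" and d_next == "req":
            client.append(t_next - t_prev)
    return service, client


# Illustrative times in seconds relative to the first request.
messages = [(0.0, "req"), (0.0005, "rsp"), (6.0015, "req"), (6.0303, "rsp")]
service, client = flip_flop_times(messages)
print("service (ms):", [round(s * 1000, 1) for s in service])
print("client (s):", [round(c, 3) for c in client])
```

With the per-request values in columns, a scatter plot of each makes the outliers easy to spot.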


Client Time Scatter Plot

[Scatter plot: Client Time for Database Trace. Vertical axis: Client Time (ms), 0 to 7,000. Horizontal axis: 0 to 30]

Mouse over a point to get the spreadsheet row number and hence the trace frame number

Server Time Scatter Plot

[Scatter plot: Service Time for Database Trace. Vertical axis: Service Time (ms), 0 to 30. Horizontal axis: 0 to 30]

Updated Time Account

[Diagram: PC, Network, Linux Web Server running the TMS App, and Database, with trace points marked A]

Values shown: 6.148s, 6.089s, 0.059s, and < 1s (6.089s + 0.059s = 6.148s)

Other services?

Most work but some don’t

Protocol                                       Flip-Flop
Web (HTTP and HTTPS)                           Yes
Web Services (e.g. .NET Remoting, WCF)         Yes
Other RPC (e.g. Java RMI, MSRPC)               Yes
Database (e.g. Microsoft [1], Sybase, Oracle)  Yes
File Server (SMB [2], SMB2 [3], NFS)           Yes
Many proprietary protocols                     Yes
Citrix ICA                                     No
Windows Terminal Server RDP                    No

[1] MARS may have to be considered
[2] Further sort criteria need to be considered
[3] Further sort criteria need to be considered

What about clock sync?

See the RPR book

Break for…

Questions?


Correlation Strategies


•  Don’t need to

•  Port-to-port mapping

•  Based on data content

•  Based on characterization

No Need - Scenario

Under load one transaction intermittently gave a 60+ second response time

[Diagram: Load Injector, HTTP Server, Customer Presentation Server (WAS), Siteminder Policy Server, HTTP Server, WebSphere Application Server, and Oracle Database]

No Need - Analysis

[Diagram: HTTP Server, Customer Presentation Server (WAS), Siteminder Policy Server, HTTP Server, WebSphere Application Server, and Oracle Database, linked by HTTPS]

Timings marked: 62.300s, 62.291, 61.887s, 48.613s, 2s, 11s

Timestamps marked: 10:25:35.016 and 10:26:36.904

Total for all response times (hundreds of them) during the 48.613-second timeframe is 0.5 seconds
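The elimination argument is simple arithmetic: if every downstream response time inside the timeframe adds up to a tiny fraction of it, the delay is not downstream and no correlation is needed. A minimal sketch; the 5% threshold and the individual response times are assumptions for illustration, and only the 48.613-second timeframe and the 0.5-second total come from the slide.

```python
def downstream_share(timeframe, response_times):
    """Fraction of an upstream timeframe covered by downstream response times."""
    return sum(response_times) / timeframe


def can_eliminate(timeframe, response_times, threshold=0.05):
    """True when downstream time is too small to explain the delay.

    The threshold is an illustrative choice, not a figure from the talk.
    """
    return downstream_share(timeframe, response_times) < threshold


# Hypothetical breakdown: 250 responses of 2 ms each, about 0.5 s in total.
response_times = [0.002] * 250
share = downstream_share(48.613, response_times)
print(f"{share:.1%} of the timeframe")
print(can_eliminate(48.613, response_times))
```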

No Need – Further elimination

[Diagram: HTTP Server, Customer Presentation Server (WAS), Siteminder Policy Server, HTTP Server, WebSphere Application Server, and Oracle Database, linked by HTTPS and by SP & SQL/TNS]

Timings marked: 62.300s, 62.291, 61.887s

Timestamps marked: 10:25:35.016 and 10:26:36.904

Total for all WAS response times during the 61.887-second timeframe is 1.162 secs

Total for all database response times in 61.887-second timeframe is 1.181 secs

Port-to-port Mapping

[Diagram: PC, WAN, XenApp, and File Server with User’s Home Directory. Trace points: Trace A, Trace B, Trace C. Events: ICA Session Start, ICA Traffic, SMB Tree Connect]

Addresses shown: 192.168.3.22, 192.168.9.67, 192.168.1.38, 192.168.3.8

Connections shown: 192.168.3.22:47006 to 192.168.3.8:445, and 192.168.3.9:8276 to 192.168.1.38:2598

•  User fredblogs starts the Citrix client

•  A short time later the XenApp server connects to \\mainfs\home\fredblogs
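One way to sketch this mapping: pair each ICA session start with the first SMB Tree Connect for that user's home directory that follows shortly after. The record layout, the times, the 30-second window, and the second user are hypothetical; the endpoints and the share path are the ones shown on the slide.

```python
def map_sessions(ica_starts, tree_connects, window=30.0):
    """Map each ICA connection to the SMB connection opened for its user.

    A tree connect matches when its path ends with the user's name and it
    occurs within `window` seconds after the ICA session start.
    """
    mapping = {}
    for start in ica_starts:
        suffix = "\\" + start["user"].lower()
        candidates = [
            tc for tc in tree_connects
            if tc["path"].lower().endswith(suffix)
            and 0 <= tc["time"] - start["time"] <= window
        ]
        if candidates:
            first = min(candidates, key=lambda tc: tc["time"])
            mapping[start["conn"]] = first["conn"]
    return mapping


ica_starts = [
    {"time": 100.0, "user": "fredblogs",
     "conn": ("192.168.3.9", 8276, "192.168.1.38", 2598)},
]
tree_connects = [
    {"time": 95.0, "path": r"\\mainfs\home\otheruser",      # hypothetical
     "conn": ("192.168.3.22", 46990, "192.168.3.8", 445)},
    {"time": 103.2, "path": r"\\mainfs\home\fredblogs",
     "conn": ("192.168.3.22", 47006, "192.168.3.8", 445)},
]
mapping = map_sessions(ica_starts, tree_connects)
print(mapping)
```

Once the two client ports are tied together, later traffic on each connection can be attributed to the same user session.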

Content Matching

TMS Database Trace – Scenario 2

[Diagram: PC, Network, Linux Web Server running the PHP App, and Database Server. Trace points (A) on TCP Port 80 and TCP Port 5432]

[Timeline: Request to TCP Port 80 at 11:42:36.622843 and Response from TCP Port 80 at 11:42:42.770757. Two Req to 5432 / Rsp from 5432 pairs are shown, with Service Time and Client Time marked]

Database Response Time

•  Request to TCP Port 80: 11:42:36.622843

•  Response from TCP Port 80: 11:42:42.770757

•  Database request and response: 11:42:36.733601 and 11:42:42.762467, a service time of 6.029s

Does this relate to our slow transaction?

Content Matching - Response

[Diagram: PC, Network, Linux Web Server running the TMS App, and Database, with two trace points marked A. The text "PSG Create - Communications" appears in the response at both trace points]

Data Content - Response

Content Matching - Request

[Diagram: PC, Network, Linux Web Server running the TMS App, and Database, with two trace points marked A. The web request carries TicketNo=511129 and the database request carries 511129]

Data Content - Request

Therefore: this slow database transaction relates to the web transaction
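The two checks behind this conclusion, timing containment and shared content, can be sketched as follows. The timestamps and the ticket number are from the slides; the record layout and the wording of the database request text are hypothetical.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"


def ts(text):
    return datetime.strptime(text, FMT)


def relates(web, db, token):
    """True when the db exchange sits inside the web exchange in time
    and both requests carry the same identifying token."""
    inside = ts(web["req"]) <= ts(db["req"]) and ts(db["rsp"]) <= ts(web["rsp"])
    shared = token in web["request_text"] and token in db["request_text"]
    return inside and shared


web = {"req": "11:42:36.622843", "rsp": "11:42:42.770757",
       "request_text": "TicketNo=511129"}
db = {"req": "11:42:36.733601", "rsp": "11:42:42.762467",
      "request_text": "query text containing 511129"}  # hypothetical wording

db_time = (ts(db["rsp"]) - ts(db["req"])).total_seconds()
print(f"{db_time:.3f}s", relates(web, db, "511129"))
```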

Characterization

[Diagram: client (C), App Server (S), and Database (S) on an axis of increasing time. Message labels: Req/Rsp Type A, Req/Rsp Type B, Req/Rsp Type V, Req/Rsp Type 1, Req/Rsp Type 2]

Resources

•  Book: RPR: A Problem Diagnosis Method for IT Professionals. Gift today or from Amazon or Lulu

•  White Paper: Network Trace Analysis Strategies, from www.advance7.com

•  Video: RPR NA03: Analysing SQL Server performance using Wireshark and Excel, from YouTube

More Resources

•  Forum: RPR Practitioners, from www.linkedin.com

•  Video: RPR NA02: Analysing SMB2 and fileserver performance, from YouTube

•  Video: RPR NA01: Analysing fileserver performance using Wireshark and Excel, from YouTube

Questions?

[Diagram labels: Cloud, SaaS, PaaS, BPO, Operation costs, Revenue, IT cap-ex]

Recurring Gray Problems

•  The issue will grow

•  You have the skills & techniques to make the difference

•  Only evidence-based methods will help

•  It will slow development of the industry

Lead the way

Thank you

Paul Offord
Chief Technical Officer
Advance7

e: [email protected]
p: +44 1371 876 805
t: @paulofforda7

For book or e-book contact: Rachel D’Cruze
e: [email protected]
p: +44 1371 876 805