case study: debugging multicast problems from an applications perspective

Post on 14-Jan-2016

DESCRIPTION

Case Study: Debugging Multicast Problems from an Applications Perspective. Steven Senger, Ph.D. Dept. of Computer Science University of Wisconsin - La Crosse. HAVnet Project. Parvati Dev, PI, Stanford SUMMIT National Library of Medicine, NGI & SII programs since 1999. - PowerPoint PPT Presentation

TRANSCRIPT

Case Study: Debugging Multicast Problems from an Applications Perspective

Steven Senger, Ph.D.

Dept. of Computer Science

University of Wisconsin - La Crosse

HAVnet Project

• Parvati Dev, PI, Stanford SUMMIT
• National Library of Medicine, NGI & SII programs since 1999.
• Applications of high-performance networks to anatomical and surgical education.
• http://havnet.stanford.edu
• http://visu.uwlax.edu

Immersive Segmentation

Remote Stereo Viewer

Nomadic Anatomy Viewer

Other Apps and Components

• Information Channels
– Multicast-based announcement/discovery mechanism.
– Supports other app requirements such as logging.
• Access Grid
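A multicast-based announcement/discovery mechanism of the kind listed above could look roughly like the following. This is only a minimal sketch, not the HAVnet implementation: the group address, port, JSON message layout, and all function names are assumptions made for illustration.

```python
import json
import socket
import struct

# Hypothetical group and port, chosen only for this illustration.
GROUP = "239.255.42.42"
PORT = 50000


def encode_announcement(app, host, port):
    """Serialize one announcement as UTF-8 JSON bytes (assumed message format)."""
    return json.dumps({"app": app, "host": host, "port": port}).encode("utf-8")


def decode_announcement(data):
    """Parse an announcement; return None if the payload is not a valid one."""
    try:
        msg = json.loads(data.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return None
    if not isinstance(msg, dict) or not {"app", "host", "port"} <= set(msg):
        return None
    return msg


def make_sender(ttl=16):
    """UDP socket for sending to the group; ttl bounds how far packets travel."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_receiver(group=GROUP, port=PORT):
    """UDP socket bound to the port and joined to the group on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

An announcer would periodically call `make_sender().sendto(encode_announcement(...), (GROUP, PORT))`, while listeners pass whatever `recvfrom` returns through `decode_announcement` and ignore anything that comes back as `None`.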

Testbed

Network/App Monitoring

Potholes Along the Way

• Stanford / CENIC
– Multicast setup delay
• WiscNet
– Conflict between sender and receiver
• Michigan / Merit
– Multicast setup delay
– Inbound flow stops after 209 secs

Stanford / CENIC …

• Longstanding problem (observed in ’01).
• Large delays (~15 min) in multicast setup.
• Stanford / La Crosse / NLM
– Significant delays except for La Crosse / NLM
• Originally thought to be at Stanford border and RP.
• ’04 hardware/IOS upgrades at Stanford.
• Situation improved.
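From the application side, a setup delay like the one described above can be observed by timing the interval between joining a group and receiving the first packet. The sketch below assumes a sender is already transmitting to the group; the function names and the 20-minute default timeout are illustrative choices, not something taken from the talk.

```python
import socket
import struct
import time


def setup_delay(join_time, arrival_times):
    """Seconds from join to the first packet at or after the join; None if none arrived."""
    later = [t for t in arrival_times if t >= join_time]
    return min(later) - join_time if later else None


def measure_setup_delay(group, port, timeout=1200.0):
    """Join the group, wait for the first packet or the timeout, return the delay in seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.settimeout(timeout)
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    join_time = time.monotonic()
    # The IGMP join is triggered here; the clock starts just before it.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    try:
        sock.recvfrom(65535)
        return setup_delay(join_time, [time.monotonic()])
    except socket.timeout:
        return None  # nothing arrived within the timeout
    finally:
        sock.close()
```

Running such a measurement from several sites against the same sender gives the kind of per-path comparison the slide summarizes, although it cannot by itself say which router introduces the delay.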

Stanford / CENIC …

• Only Michigan to Stanford delayed, ~6 mins.
• Oct ’04: phone calls with Stanford, CENIC, vendor support, La Crosse. Escalated through 3 layers of vendor support.
• Test/debug every couple of weeks through March ’05.
• Identified as MSDP propagation delay related to encap/unencap of data received by MSDP.

Stanford / CENIC

• Delay occurred at each CENIC router.
• At some point problem had been internally found and resolved by vendor.
• Solution: upgrade OS on CENIC routers.

La Crosse / WiscNet …

• First observed spring ’05 using AccessGrid.
• La Crosse sender and Stanford receiver OK.
• Starting a La Crosse receiver breaks the flow.
• WiscNet identified problem router.
• Vendor support engaged.
• Discovered rpd restart sufficient to fix.
• Recurs every 2 months.

La Crosse / WiscNet …

• When failing
– Upstream interface on router gets set to unreasonable value.
– Sender continues to send data in encapsulated PIM-register messages.
– Router never sends register-stop messages.
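The failure mode above can be pictured with a toy model of the PIM-SM register phase: data is encapsulated in Register messages until a Register-Stop arrives, and if none ever arrives the encapsulation never ends. The function below is a deliberately simplified illustration with hypothetical names, not a model of any particular router's behavior.

```python
def simulate_register(packets, stop_after=None):
    """Return how each data packet is forwarded in a toy register-phase model.

    stop_after: index of the packet after which a Register-Stop is received,
    or None if no Register-Stop ever arrives (the failing case on the slide).
    """
    modes = []
    registering = True
    for i in range(packets):
        # While registering, data is carried inside PIM Register messages.
        modes.append("register" if registering else "native")
        if stop_after is not None and i == stop_after:
            registering = False  # Register-Stop received: stop encapsulating
    return modes
```

With `stop_after=None` every packet stays in the `"register"` state, which matches the symptom of data being sent indefinitely in encapsulated form.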

La Crosse / WiscNet

• Problem has survived router chassis upgrade.
• No solution as yet.

U. Michigan / Merit …

• Discovered after CENIC problem solved.
• Small delay in setup for Michigan to Stanford.
• Varies between 0 and 60 sec.
• Similar behavior for Milwaukee to Stanford.
• Does not appear to be in CENIC?

U. Michigan / Merit …

• Presence of other receivers seems to change the setup delay.
• Merit engaged in isolating problem.
• No solution as yet.

U. Michigan / Merit

• Discovered Jan ’06 using AccessGrid.
• Traffic from Stanford to MCBI/Merit starts correctly but stops after 208 seconds.
• When stopped, IPLSng shows as pruned.
• Merit identified problem with a switch in Chicago not allowing streams to set up correctly.
• Problem resolved with OS upgrade.
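A flow that starts correctly and then stops, as described above, is easy to spot from the receiving application if packet arrival times are logged. The following is a minimal sketch under that assumption; the function name and the 5-second gap threshold are illustrative choices only.

```python
def find_stall(arrival_times, max_gap=5.0, now=None):
    """Return seconds of flow before the first gap longer than max_gap, or None.

    arrival_times: packet arrival timestamps in seconds.
    now: current time, so that a flow which has simply gone silent is also detected.
    """
    if not arrival_times:
        return None
    times = sorted(arrival_times)
    end = times[-1] if now is None else now
    # Compare each arrival with the next one; the last arrival is compared with `end`.
    for prev, nxt in zip(times, times[1:] + [end]):
        if nxt - prev > max_gap:
            return prev - times[0]
    return None
```

Seeing the same flow duration on repeated runs points toward a timer or state expiry somewhere along the path rather than random loss.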

Diagnostic Help

• Debugging strategies
• Tools
• Monitoring
