

hpwren.ucsd.edu/~hwb/Papers/AMP_case_studies/body.pdf

3/27/2000: This work is an Authors’ version, and has been submitted for publication. Copyright may be transferred without further notice and the accepted version may then be posted by the publisher.

Active Measurement Data Analysis Techniques

Todd Hansen 1

Jose Otero 1

Tony McGregor 2

Hans-Werner Braun 1

1 National Laboratory for Applied Network Research (NLANR) Measurement and Network Analysis Group, San Diego Supercomputer Center (SDSC)

2 National Laboratory for Applied Network Research (NLANR) Measurement and Network Analysis Group, University of Waikato, New Zealand

Contact author: Todd Hansen
National Laboratory for Applied Network Research (NLANR)
Measurement and Network Analysis Group
San Diego Supercomputer Center (SDSC)
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093-0505
[email protected]
(858) 822-0929

This work is sponsored by the National Science Foundation (NSF) under Cooperative Agreement No. ANI-9807479. The government has certain rights in this material. We thank Maureen C. Curran for her editorial assistance.


Abstract

Active measurements examine the capabilities of networks and the Internet. Through the Network Analysis Infrastructure (NAI), these measurements can provide useful data for network analysis. The National Laboratory for Applied Network Research (NLANR) team has developed data sets which allow identification of a variety of problems, ranging from High Performance Connectivity and commodity network issues to network hardware and routing issues. We present the steps necessary to secure and analyze such data.

Introduction

The National Laboratory for Applied Network Research (NLANR) [5] Measurement and Analysis Team [6] conducts both active and passive measurements in order to better understand the Internet and its capabilities. Active measurements are network probes developed to measure the capabilities of the Internet. Active and passive measurements can be compared to car maintenance. When you are trying to determine what is wrong with your car, you can either check the oil level or take the car for a test drive. A test drive is an active measurement, as it changes the state of the vehicle in question, while an oil check is a passive measurement, which generally has no effect on the state of the car. With active measurements, one can generally retrieve additional information about a network’s capabilities, at the cost of adding interference. In this paper, we discuss data analysis methods for NLANR’s Active Measurement Project (AMP) [7]. Our goal is to provide a basis for understanding and analyzing active measurement data, so that readers can use this data to better understand their network connectivity. We detail methods which help determine the overall performance of a network. This paper is designed to be an introduction to active measurements and analysis. As such, feedback [1] and suggestions on areas which are unclear would be greatly appreciated.

The AMP System

The NLANR Active Measurement Project (AMP) is a distributed network of approximately 100 active monitors which systematically perform scheduled measurements between each other (Figure 1). These systems send data to a central data collector, where it is made available for download (raw data) and public viewing (by Web browser [7]). Currently, we are working on additional methods for data presentation which may be more useful for general network analysis.


Figure 1: Map of AMP Probe constellation (Dec 1999)

Measurement Methods: what we measure

The AMP constellation measures round trip times (RTT) between each of its systems once a minute. Every ten minutes, traceroutes are performed to determine routes between each of the measurement boxes. We chose to measure RTT instead of one-way delay because this measurement is easier to perform and does not rely on external devices to synchronize the time between each of the monitors. Other measurement projects use Global Positioning System (GPS) receivers to synchronize the time between hosts. We have found that these systems are too expensive and difficult to install for the limited amount of additional information gained.

In greater detail, our RTT measurements are made once a minute (randomized by 15 seconds) using the fping program. This program sends an Internet Control Message Protocol (ICMP) echo packet to each host and waits for an ICMP reply packet. It then records the measured delay for each site. The benefit of fping is that it allows us to ping multiple sites simultaneously. (If we had to ping each site serially, it would take much longer than a minute.) A loss is recorded when four ICMP echo request packets are sent and no reply is received before any of the timeouts expire. Our route data is collected every 10 minutes (randomized by 15 seconds) using the traceroute program.

Our Web interface allows one to view this data by host pairs. For example, after selecting a primary site (Figure 2), the Web interface shows the previous day’s average RTT and loss for each site within its mesh (Figure 3). The interface also allows one to select a secondary site and view weekly graphs, which show the RTT between the primary and secondary sites over a week’s time (Figure 4). From this data, one can choose to look at daily RTT or loss graphs. One can also view a listing of route changes that may have occurred on a specified day (Figure 6).
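The probing schedule and the loss rule described above can be sketched in a few lines of Python. This is an illustration only, not the AMP implementation: the function names are hypothetical, the 15 second randomization is assumed to be a uniform offset of plus or minus 15 seconds, and a sample is assumed to keep the first reply that arrives.

```python
import random

PROBE_INTERVAL_S = 60    # RTT probes are sent once a minute
JITTER_S = 15            # assumed meaning of "randomized by 15 seconds"
ATTEMPTS_PER_SAMPLE = 4  # echo requests sent before a loss is recorded


def next_probe_times(start_s, count, rng=random):
    """Return `count` probe times in seconds, one per interval, each jittered."""
    return [
        start_s + i * PROBE_INTERVAL_S + rng.uniform(-JITTER_S, JITTER_S)
        for i in range(count)
    ]


def classify_sample(attempt_rtts_ms):
    """Classify one measurement from the outcome of its echo attempts.

    `attempt_rtts_ms` has one entry per attempt: the RTT in milliseconds,
    or None if that attempt timed out. A loss is reported only when every
    attempt timed out; otherwise the first reply is used (an assumption).
    """
    for rtt_ms in attempt_rtts_ms:
        if rtt_ms is not None:
            return ("ok", rtt_ms)
    return ("loss", None)
```

For example, `classify_sample([None, 42.5, None, None])` yields `("ok", 42.5)`, while four timeouts in a row yield `("loss", None)`.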


Meshes and Peer Networks

A full mesh is a complete interlinking of objects, in which every object is directly connected to every other object. Our AMP constellation performs a full mesh of measurements. This is a data- and resource-intensive process and, as such, is not scalable beyond 150 machines. Therefore, it is necessary to create peer networks, each with its own set of hosts measuring in a full mesh among themselves. Unfortunately, this does not allow for studying networks and routes which traverse multiple peer networks. For this reason, we configure mutual peering points: a couple of hosts in each peer’s network that perform measurements within both meshes. By strategically placing these peers we should, in theory, be able to gather useful information about the connectivity between the peer networks. We are currently in the process of experimenting with peer networks; the results of this work should be available sometime in the near future.
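The scaling argument can be made concrete by counting measurement pairs. The sketch below assumes that every monitor probes every other monitor in its mesh, so ordered (source, destination) pairs are counted; the function names and the split into two peer networks are hypothetical.

```python
def mesh_pairs(n_monitors):
    """Ordered (source, destination) pairs in a full mesh of n monitors."""
    return n_monitors * (n_monitors - 1)


def total_pairs(mesh_sizes):
    """Total ordered pairs when monitors are split into separate full meshes."""
    return sum(mesh_pairs(n) for n in mesh_sizes)
```

Under these assumptions a single mesh of 100 monitors has `mesh_pairs(100)` = 9,900 pairs and a mesh of 150 has 22,350. Splitting 150 monitors into two peer networks of 75, each also hosting two peering monitors from the other network, gives `total_pairs([77, 77])` = 11,704 pairs, roughly half the load of the single large mesh.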

Figure 2: Main AMP Web interface page, select primary monitor to view.


Figure 3: Site information specific to the primary site chosen, showing averages for the previous 24 hours. From here, the user can select a secondary site and view further measurements.


Figure 4: Primary (amp-sdsc) to secondary (amp-ncar) site data is displayed as round trip time (RTT) graphs. Yearly and weekly graphs are also presented. From this page you can choose to view a graph of a single day’s data (Figure 5) or you can choose to look at the routes for a specific day (Figure 6).


Figure 5: Graph of a single day’s measurements between the primary and secondary sites. This can be used to closely examine interesting anomalies within the data or to more closely examine events which may not show clearly in the weekly summaries. Packet loss and round trip times (RTT) for a particular day are also presented in graphical form on these pages. The graph above exhibits a step down in RTT as the result of a router switch.


Figure 6: This image shows routing information between the primary and secondary sites for the day selected. The page highlights route changes in blue. From this page one can also view all routes for a particular day in graphical form, using the Otter utility.

Data Analysis Methods: What you can determine

Our data is collected and organized in a hierarchical structure on the Web [7]. After entering the Web interface, select a primary site (Figure 2). Upon selection of the primary site, measurements are performed and data is sent to the central data collector. You will then be presented with a summary page (Figure 3), which shows the day’s measurements from the primary site to each of the secondary sites. (The secondary sites are the sites to which the measurements are made.) These are the same sites that appear in the primary site list. The summary is useful for a quick view of status and for identifying possible problem sites.

From this page, choose a secondary site. After selecting the secondary site, you will be able to view measurement data collected between the primary and secondary sites (Figure 4). The top graph is a yearly summary of the measurements, followed by weekly summary graphs giving detailed information. The weekly graphs are very useful in determining the time of a network event. From the weekly graphs you can choose to view an even more detailed graph of measurements for a particular day (Figure 5). This is done by clicking on the graph or by clicking on the date to the left of the graph. You can also choose to view a listing of route changes for the day (Figure 6). In theory, using this hierarchical structure, one can view all the data that is recorded by our systems. Unfortunately, due to the high volume of data (70 Gb), this is not very realistic.

To analyze the network connectivity for a specific site, select that site as the primary site, then go through each of the secondary sites listed, looking at each of the weekly graphs. For example, if a site shows 100% loss, it is most likely because of a filter (a router that blocks certain packets). This can be verified by looking at the problem log for the site, which is available from the primary site selection page (Figure 2). It is also useful to look at the recent weekly graphs for any network events, problems or trends.
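A first pass over a primary site's daily summary could be automated along the following lines. This is a minimal sketch under stated assumptions: the summary is taken to be a mapping from secondary site name to an (average RTT, loss percentage) pair, the site names are made up, and the 5% threshold for "high loss" is an arbitrary illustrative value rather than an AMP setting.

```python
def triage_sites(daily_summary, high_loss_pct=5.0):
    """Split secondary sites into probably-filtered and high-loss lists.

    `daily_summary` maps site name -> (avg_rtt_ms or None, loss_pct).
    Sites with 100% loss are most likely behind a filter; sites at or
    above `high_loss_pct` (hypothetical threshold) deserve a closer look.
    """
    filtered, lossy = [], []
    for site, (_avg_rtt_ms, loss_pct) in sorted(daily_summary.items()):
        if loss_pct >= 100.0:
            filtered.append(site)
        elif loss_pct >= high_loss_pct:
            lossy.append(site)
    return filtered, lossy


summary = {
    "amp-a": (None, 100.0),  # hypothetical site with no replies at all
    "amp-b": (42.0, 0.1),
    "amp-c": (80.0, 12.0),
}
filtered_sites, lossy_sites = triage_sites(summary)
```

Sites in the first list would then be checked against the problem log, and sites in the second list examined in their weekly graphs.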

In the case studies detailed below, we attempt to give you many different examples of network events. For example, you might notice that your site’s connection to UC San Diego exhibits commodity link congestion curves (Case Study #2), and upon further investigation you find that this is due to excessive use of the network or simply to the fact that your route to UCSD is via your commodity connection. In general, a little bit of investigation with the information we provide can lead to useful insights. However, we can never provide all of the answers. If you go through all of the secondary sites for your primary site, you should get a good idea of your connectivity and where any problems may lie. Keep in mind that it may be useful to investigate a few reverse path measurements from other sites to your own. This may give you a better idea of how the world connects with you.

If you want to get an idea of the overall status of the network, we recommend going through a number of different primary and secondary site pairs, almost at random. Perhaps over time you can come up with a useful list of which sites are important or significant to you. Unfortunately, with 100 sites it would take too long to look at each of the roughly 10,000 possible pairs. Therefore, you must decide on some way to limit your search in order to view only relevant information. Another method worth mentioning is to look at two different secondary sites for each primary site. This way you can get a good overall picture.
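The suggestion of looking at two secondary sites per primary site amounts to a small random sample of the mesh. A sketch, with a hypothetical function name and made-up site names:

```python
import random


def sample_pairs(sites, per_primary=2, rng=random):
    """Pick `per_primary` distinct random secondary sites for each primary."""
    pairs = []
    for primary in sites:
        others = [site for site in sites if site != primary]
        for secondary in rng.sample(others, per_primary):
            pairs.append((primary, secondary))
    return pairs


# Hypothetical site names; with 100 sites this yields 200 pairs to review.
sites = ["amp-%03d" % i for i in range(100)]
pairs_to_review = sample_pairs(sites, rng=random.Random(0))
```

Reviewing 200 pairs instead of every possible pair keeps the task manageable, at the cost of possibly missing a problem that affects only one unsampled path.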

The point of general network analysis is to get an idea of the stability of the network and the performance of the links. We are currently working on several tools to aid in this. One of these is the AMP data reports page. This allows you to get a quick summary of all the routes between each of the different sites; this can be very useful in finding problem sites. Another useful tool is the Cichlid 3-D visualization system, which can graph the RTT and loss of all of the sites over the previous 24 hours. We are also working on ideas to develop event triggers and route decomposition to pinpoint when and where network problems occur. The intent of these tools is to give you an idea of the stability of the networks, as well as to alert you when something interesting occurs.

One of the best ways to find trouble or to diagnose problems with a site is to look for bumps or steps in the round trip time (RTT). These bumps tend to show a routing change, or some configuration change that alters the RTT between two sites. See Figure 7 for an example.
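Steps of this kind can also be located programmatically. The following is a minimal sketch, not a tool provided by AMP: it compares the mean RTT in a window before each sample with the mean in a window after it, and reports the point where the difference peaks. The function name, the 30 sample window and the 5 ms threshold are assumptions chosen for illustration, and lost samples are assumed to have been removed beforehand. A median could be substituted for the mean to reduce sensitivity to isolated spikes.

```python
from statistics import mean


def find_rtt_steps(rtts_ms, window=30, min_shift_ms=5.0):
    """Return (index, shift_ms) for sustained steps in an RTT series."""

    def shift_at(i):
        # Mean RTT after sample i minus mean RTT before it.
        return mean(rtts_ms[i:i + window]) - mean(rtts_ms[i - window:i])

    n = len(rtts_ms)
    steps = []
    i = window
    while i + window <= n:
        shift = shift_at(i)
        if abs(shift) < min_shift_ms:
            i += 1
            continue
        # A step lies within the next `window` samples; keep the index
        # where the before/after difference is largest.
        best_i, best_shift = i, shift
        for j in range(i + 1, min(i + window, n - window + 1)):
            candidate = shift_at(j)
            if abs(candidate) > abs(best_shift):
                best_i, best_shift = j, candidate
        steps.append((best_i, best_shift))
        # Skip ahead until the "before" window no longer spans this step.
        i = best_i + window
    return steps
```

Applied to a series that sits at 40 ms, drops to 30 ms and later returns to 40 ms, the function reports one negative and one positive step, which is the pattern of a temporary step down.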


Figure 7: Weekly graph (Dec 19 – Dec 25) from SDSC to NCAR. The temporary step down is caused by the removal of a router by vBNS engineering. The router was replaced on Monday.

Another thing to look for is high loss. If a site has high loss it may have faulty hardware. This analysis is very basic. A more interesting understanding of your site’s performance can be acquired by comparing its connectivity with that of other sites. For example, is there the same congestion on your commodity links that other sites exhibit on theirs? What about the round trip times of your vBNS [9] or Abilene [10] connections? Are they relatively good, or is there something in your local network that is making the RTT unstable? Another thing to look at is how the congestion curves (of the RTT graph) are shaped: do they have a smooth shape? Are they bigger than those for other sites? If the graph shows congestion curves, then how does it affect your traffic?
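One way to compare congestion curves between sites is to reduce a day of RTT samples to an hour-by-hour profile and to measure how far the busiest hour rises above the quietest one. The sketch below assumes samples are given as (minute of day, RTT in ms) pairs; the function names and this choice of summary are illustrative, not part of the AMP tools.

```python
from collections import defaultdict
from statistics import median


def hourly_rtt_profile(samples):
    """Map hour of day (0-23) to the median RTT of samples in that hour."""
    by_hour = defaultdict(list)
    for minute_of_day, rtt_ms in samples:
        by_hour[minute_of_day // 60].append(rtt_ms)
    return {hour: median(values) for hour, values in sorted(by_hour.items())}


def congestion_amplitude_ms(profile):
    """Difference between the highest and lowest hourly median RTT."""
    return max(profile.values()) - min(profile.values())
```

A path with no significant congestion curve gives an amplitude near zero, while a congested commodity link shows a larger value during working hours. Computing the same figure for several sites gives a simple basis for the comparisons suggested above.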

In general, our feeling is that a high performance network should have low RTT and no significant congestion curve. One factor to examine is what TCP window sizes, within the TCP stacks, would yield maximum performance. This can show a great deal about the feasibility of high speed communication over these networks. Also, how different is a commodity connection at noon on Friday compared to a vBNS link? Hopefully the answers to these questions will help give an accurate picture of the connectivity and performance of your network connection.
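The link between RTT and TCP window size follows from the bandwidth-delay product: to keep a path full, the window must hold as many bytes as the path can carry in one round trip. A short sketch, in which the function names and the example link speed are illustrative assumptions:

```python
def required_window_bytes(bandwidth_bps, rtt_ms):
    """TCP window (bytes) needed to fill a path: bandwidth times RTT."""
    return bandwidth_bps * rtt_ms / 8000.0  # bits -> bytes, ms -> s


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput (bits/s) for a given window and RTT."""
    return window_bytes * 8000.0 / rtt_ms
```

For a hypothetical 155 Mbit/s path with a measured RTT of 40 ms, `required_window_bytes(155_000_000, 40)` is 775,000 bytes. Conversely, a stack limited to a 65,535 byte window (the maximum without TCP window scaling) can achieve at most `max_throughput_bps(65535, 40)`, about 13.1 Mbit/s, over the same path, however fast the links are.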

Case Studies

We feel that the following case studies illustrate some of the important aspects of network performance analysis. The first two case studies show the characteristics of high performance networks and commodity networks. We invite your comments and suggestions on ways to improve the tractability of these issues.


Related Projects

There are a couple of related projects which have attempted to do the same sort of active analysis that we have done. Some of them, Surveyor for example, have attempted to measure one-way delays instead of round trip times. They feel that this will give them more useful information. However, all design decisions come at a cost. It would be wise to look at the platforms on which the tests are running and the size of their mesh. Still, if the measurements are done properly, then their results should be as useful as ours. A listing of these related projects is included in the references section [8].

Acknowledgements

This work was sponsored by the National Science Foundation under Cooperative Agreement No. ANI-9807479. We thank Maureen C. Curran for her editorial assistance.

References

1. Todd Hansen, [email protected], San Diego Supercomputer Center, UCSD, 9500 Gilman Dr., La Jolla, CA, 92093-0505, USA

2. Jose Otero, [email protected], San Diego Supercomputer Center, UCSD, 9500 Gilman Dr., La Jolla, CA, 92093-0505, USA

3. Hans-Werner Braun, [email protected], San Diego Supercomputer Center, UCSD, 9500 Gilman Dr., La Jolla, CA, 92093-0505, USA

4. Tony McGregor, [email protected], The University of Waikato, Private Bag 3107, Hamilton, New Zealand

5. National Laboratory for Applied Network Research (NLANR), http://www.nlanr.net/

6. NLANR’s Measurement and Network Analysis Team, http://moat.nlanr.net/

7. NLANR’s Active Measurement Project (AMP), http://moat.nlanr.net/AMP/

8. A listing of related projects:

• DREN AMP http://www.sd.wareonearth.com/amp (AMP peer network)

• Internet End-to-End Performance Monitoring at SLAC http://www-iepm.slac.stanford.edu/

• National Internet Measurement Infrastructure http://www.ncne.nlanr.net/nimi/

• RIPE's Test Traffic Measurements http://www.ripe.net/test-traffic/index.html

• Surveyor http://www.advanced.org/surveyor/

• CAIDA’s Skitter Project http://www.caida.org/Tools/Skitter/

9. MCI’s vBNS Network, http://www.vbns.net/

10. Internet2’s Abilene Network, http://www.internet2.edu/abilene/

11. AMP Introduction, http://moat.nlanr.net/AMP/active/intro.html

12. A. J. McGregor, H-W Braun, J. A. Brown. The NLANR Network Analysis Infrastructure. Accepted for publication by IEEE Communications Magazine.