Analyzing Peer-to-Peer Traffic Across Large Networks
Jia Wang
Joint work with Subhabrata Sen
AT&T Labs - Research
P2P applications
  Distributed file sharing
    Napster, Gnutella, FastTrack, eDonkey, DirectConnect…
    Searching vs. data fetching phases
    All communications occur over default ports
    SuperNodes and Hubs
  Why is this interesting?
    Large and growing traffic volume
Outline
  Methodology
    Data collection
    Characterization metrics
  Analysis results
    Traffic volume and overlay topology
    System dynamics
    Traffic characterization
  P2P vs. Web
Methodology
  Challenges
    Decentralized system
    Transient peer membership
    Some popular protocols are closed and proprietary
  Large-scale passive measurement
    Flow-level data from routers across a large tier-1 ISP backbone
    Analyze both signaling and data fetching traffic
    3 levels of granularity: IP, prefix, AS
  P2P protocols
    FastTrack: 1214 (including Morpheus)
    Gnutella: 6346/6347
    DirectConnect: 411/412
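A minimal sketch of port-based flow classification, using only the default ports listed above. The `Flow` record and its field names are hypothetical; the slides do not describe the actual flow-record layout or tooling.

```python
from typing import NamedTuple, Optional

# Default ports as listed on the slide.
P2P_PORTS = {
    1214: "FastTrack",
    6346: "Gnutella",
    6347: "Gnutella",
    411: "DirectConnect",
    412: "DirectConnect",
}


class Flow(NamedTuple):
    """Hypothetical flow record; field names are illustrative only."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    n_bytes: int


def classify(flow: Flow) -> Optional[str]:
    """Return the P2P system name if either endpoint uses a default port."""
    return P2P_PORTS.get(flow.src_port) or P2P_PORTS.get(flow.dst_port)
```

Matching on either endpoint is a design choice of this sketch: the peer listening on the default port may be the source or the destination of a given flow record.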
Methodology Discussion
  Advantages
    Requires minimal knowledge of P2P protocols: port number
    Large-scale non-intrusive measurement
    More complete view of P2P traffic
    Allows localized analysis
  Limitations
    Flow-level data: no application-level details
    Incomplete traffic flows
  Other issues
    DHCP, NAT, proxy
      Host and IP address do not map one-to-one
    Asymmetric IP routing
Measurements
  Characterization
    Overlay network topology
    Traffic distribution
    Dynamic behavior
  Metrics
    Host distribution
    Host connectivity
    Traffic volume
    Mean bandwidth usage
    Traffic pattern over time
    Connection duration and on-time
Data cleaning
  Invalid IPs
    10.0.0.0-10.255.255.255
    172.16.0.0-172.31.255.255
    192.168.0.0-192.168.255.255
  No matched prefixes in routing tables
  Invalid AS numbers
    > 64512
  Removed 4% of flows
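The data cleaning rules can be sketched as a per-endpoint filter. This is an illustration under stated assumptions: the function names and the way prefix and AS number reach the filter (already looked up, with `None` meaning no matching prefix) are hypothetical. The address ranges and the AS threshold are taken from the slide as written.

```python
import ipaddress
from typing import Optional

# Address ranges the slide lists as invalid (CIDR form of the same ranges).
INVALID_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),      # 10.0.0.0-10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0-172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),  # 192.168.0.0-192.168.255.255
]
MAX_VALID_ASN = 64512  # the slide treats AS numbers > 64512 as invalid


def is_valid_ip(ip: str) -> bool:
    """True if the address is outside all of the invalid ranges."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in INVALID_NETS)


def keep_endpoint(ip: str, prefix: Optional[str], asn: int) -> bool:
    """Apply the three cleaning rules to one flow endpoint.

    `prefix` is the matched routing-table prefix, or None if none matched
    (hypothetical convention for this sketch).
    """
    if not is_valid_ip(ip):
        return False
    if prefix is None:
        return False
    return asn <= MAX_VALID_ASN
```

A flow would be kept only if both of its endpoints pass `keep_endpoint`.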
Overview of P2P traffic
  Total 800 million flow records
  FastTrack is the most popular one

  Date (2001)               9/10-9/15   10/9-10/13   12/10-12/16
  # flows                   111M        184M         341M
  # IPs                     3.4M        4.5M         5.9M
  # IPs / day               1M          1.5M         1.9M
  Total traffic (GB/day)    773         1153         1776
  Traffic per IP (MB/day)   1.6         1.6          1.8
Host distribution
Host connectivity
Connectivity is very small for most hosts, very high for few hosts
Distribution is less skewed at prefix and AS levels
FastTrack (9/14/2001)
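A sketch of how host connectivity could be computed, assuming it means the number of distinct entities an entity exchanges traffic with. That definition is an assumption of this example, and `connectivity` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def connectivity(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, int]:
    """Map each entity to the number of distinct entities it talks to.

    `pairs` holds (source, destination) identifiers, one per flow.
    Duplicate pairs and both directions are counted once.
    """
    peers = defaultdict(set)
    for src, dst in pairs:
        peers[src].add(dst)
        peers[dst].add(src)
    return {entity: len(others) for entity, others in peers.items()}
```

Passing prefix or AS identifiers in place of IP addresses gives the same metric at the coarser granularities.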
Traffic volume distribution
Significant skews in traffic volume across granularities
Few entities source most of the traffic
Few entities receive most of the traffic
FastTrack (9/14/2001)
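One way to quantify this skew is the share of total volume contributed by the top fraction of entities. The helper below is a hypothetical sketch; the rounding rule (at least one entity, rounded up) is an assumption.

```python
import math
from typing import Sequence


def top_share(volumes: Sequence[float], fraction: float) -> float:
    """Fraction of total volume contributed by the top `fraction` of entities."""
    ranked = sorted(volumes, reverse=True)
    if not ranked:
        return 0.0
    # Assumption: round the number of heavy hitters up, with a minimum of one.
    k = max(1, math.ceil(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0
```

For example, `top_share(per_host_bytes, 0.01)` would give the share of traffic from the top 1% of hosts, where `per_host_bytes` is a list of per-host byte counts.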
Mean bandwidth usage
Upstream usage < downstream usage. Possible causes:
  Asymmetric available bandwidth, e.g., DSL, cable
  Users/ISPs rate-limiting upstream data transfers
FastTrack (9/14/2001)
Time of day effect
Traffic volume exhibits a very strong time-of-day effect
Milder time-of-day variation for # hosts in the system
FastTrack (9/14/2001 GMT)
Host connection duration & on-time
Substantial transience: most hosts stay in the system for a short time
Distribution less skewed at the prefix and AS levels
Using per-cluster or per-AS indexing/caching nodes may help
FastTrack (9/14/2001) thd=30min
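The slide gives only the threshold (thd=30min), not the definition of on-time. The sketch below assumes one plausible reading: a host's activity intervals are merged into a single session when the idle gap between them is at most the threshold, and on-time is the total length of those sessions. The function name and the (start, end) input format in seconds are hypothetical.

```python
from typing import Iterable, Tuple


def on_time(intervals: Iterable[Tuple[float, float]], threshold: float = 30 * 60) -> float:
    """Total on-time in seconds under the assumed session-merging rule.

    Intervals separated by an idle gap <= threshold belong to one session;
    a longer gap ends the session (the host is treated as having left).
    """
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_start is None:
            cur_start, cur_end = start, end
        elif start - cur_end <= threshold:
            cur_end = max(cur_end, end)  # extend the current session
        else:
            total += cur_end - cur_start  # close the session, start a new one
            cur_start, cur_end = start, end
    if cur_start is not None:
        total += cur_end - cur_start
    return total
```

Under this reading, a larger threshold merges more activity into fewer, longer sessions, so the reported on-time distribution depends on the chosen value.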
Traffic characterization
  The power law
    May not be a suitable model for P2P traffic
  Relationship between metrics
    Traffic volume
    Number of IPs
    On-time
    Mean bandwidth usage
Traffic volume vs. on-time
1. Volume heavy hitters tend to have long on-times
2. Hosts with short on-times contribute small traffic volumes
FastTrack (9/14/2001): top 1% hosts (73% volume)
Connectivity vs. on-time
1. Hosts with high connectivity have long on-times
2. Hosts with short on-times communicate with few other hosts
FastTrack (9/14/2001): top 1% hosts (73% volume)
P2P vs. Web
  Observations
    97% of prefixes contributing P2P traffic also contribute Web traffic
    Heavy hitter prefixes for P2P traffic tend to be heavy hitters for Web traffic
  Prefix stability: the daily traffic volume (in %) from the prefix does not change over days
    Experiments: 0.01%, 0.1%, 1%, 10% heavy hitters => 10%, 30%, 50%, 90% of the traffic volume
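Prefix stability, as described above, concerns how much a prefix's daily percentage of traffic changes from day to day. The sketch below computes that percentage per day and uses the max-minus-min spread as a simple stability indicator. The spread measure and both function names are assumptions of this example, not the metric used in the study.

```python
from typing import Dict, List


def daily_shares(volumes_by_day: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """For each day, convert per-prefix byte counts to percent of that day's total."""
    shares = []
    for day in volumes_by_day:
        total = sum(day.values())
        if total:
            shares.append({p: 100.0 * v / total for p, v in day.items()})
        else:
            shares.append({})
    return shares


def share_spread(shares: List[Dict[str, float]], prefix: str) -> float:
    """Max minus min of a prefix's daily share, in percentage points.

    A prefix absent on a given day counts as 0% for that day.
    """
    values = [day.get(prefix, 0.0) for day in shares]
    return max(values) - min(values) if values else 0.0
```

A spread near zero means the prefix contributes roughly the same percentage of traffic every day.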
Traffic stability
March 2002
Panels: Top 0.01% prefixes; Top 1% prefixes
P2P traffic contributed by the top heavy hitter prefixes is more stable than either Web or total traffic
Summary
  Measure and characterize P2P traffic across a large network
    Three popular P2P systems
    Significant increase in both number of users and traffic volume
    Traffic distributions are highly skewed
    High-level system dynamics
  P2P is a significant but stable component of Internet traffic
Acknowledgement
  AT&T Labs
    Matt Grossglauser, Carsten Lund, Jennifer Rexford, Matt Roughan, Fred True
  External
    Steve Gribble