on the intrinsic locality properties of web reference streams
TRANSCRIPT
Intrinsic Temporal Locality Properties of Web Request Streams
Rodrigo Fonseca (UC Berkeley)
Virgílio Almeida (UFMG)
Mark Crovella (Boston University)
Bruno Abrahao (UFMG)
IEEE INFOCOM 2003
2
• Network measurements in highly dynamic environments–Understand the behavior and find
intrinsic properties
–Online algorithms for driving self-organization of system agents
–Algorithms for dynamic adapting environments with high degree of uncertainty
Research Questions
3
Question
• Study temporal locality at different points of the Web
–Results: Framework + metrics:
• New insights on the causes of temporal locality across different points of the Web topology
4
Web Measurements
• Why?– Understand and improve current systems
– Design new systems
• Challenges– Complexity
– Rate of change
• Approach– Look for regularities, underlying principles
– Abstractions
5
Previous Studies
• Many advancements–Caching hierarchies
–Replacement Policies for caches
– Load distribution and balancing
• However...–Focus on individual components
–No structured view of the entire system
6
Stream centric view of the Web
• Focus on request streams
– Look for intrinsic properties
–How they are altered
–How their properties change in different points
–How their intrinsic properties can influence the designs of components and of collections of components
7
Tranformations on Request Streams
• Three types of transformations
–Aggregation
• Multiple sources
–Disaggregation
• Multiple destinations
–Filtering
• Resulting stream = subset of input stream
DAF
8
Transformations Abstraction
• Components may be abstracted by combinations of transformations
D
F
Clients
A
Server
D
F
A
Proxy Cache
9
Approach
• For a given intrinsic property of streams–Study the effects of the
transformations on the property
–Combine the effects to understand• Effects of components
• Effects of collections of components
• Properties at different points of the topology
10
Temporal Locality
• Need of intrinsic metrics for the streams– We use virtual time
• Two sources:– Popularity
...XAXBXCXDXE...
• Preserved with reordering
– Correlation...AABBCCDDEE...
• Lost with reordering
11
Measuring Popularity: Entropy
– Traditionally Zipf’s Law: pi i-
– We measure the deviation from Uniform popularity distribution
– Entropy:
• pi determined empirically from frequencies
• N is the number of distinct objects
• If uniform, tends to log2(N)
• If one object only is accessed, tends to 0
– Not the Entropy rate of the source
N
i
ii ppH1
2log
14
Measuring Correlation: CV
• Look at IAT distribution per object– Number of references between 2 references
to the same object
• Behavior– Correlation tendency to shorter IATs
– No Correlation tends to geometric distribution
• We measure the deviation from the geometric distribution for IAT– Coefficient of Variation:
μ
σ
16
Metrics Summary
• Entropy
–High concentration: Low Entropy
–Uniform distribution: High Entropy
• CV
–High Correlation: High CV
–No correlation: CV ~ 1