Application of Multifractals in WWW Traffic Characterization
Marwan Krunz
Department of Elect. & Comp. Eng.
Broadband Networking Lab.
University of Arizona
http://www.ece.arizona.edu/~bnlab

Presentation Outline
WWW Traffic
Monofractals Versus Multifractals
Proposed Model
Simulation Results
Ongoing Research
Other BNL Projects

WWW Traffic
What do we mean by WWW traffic? A sequence of requests for file objects at a server.
Why do we want to model it? For capacity planning and resource dimensioning, and for the design of caching and prefetching schemes.
What traffic properties should be captured? Popularity, temporal locality, and spatial locality.
[Figure: clients (C) sending requests across the Internet to a Web server]

Temporal Locality
Closeness in time between references to the same object
Often measured using the stack distance string
Example reference string (in time order): A B A C B D A B A C B D
[Figure: LRU stack contents after each reference; the depth at which each referenced object is found (its stack distance) forms the stack distance string]
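The stack distance of a reference is the depth at which the referenced object sits in an LRU (move-to-front) stack. A minimal Python sketch, assuming 1-based depths and a stack that starts empty, so that first references produce no distance; `stack_distances` is a hypothetical helper name, not code from the talk.

```python
def stack_distances(refs):
    """Return the LRU stack distance of every re-reference in `refs`.

    The stack is a list whose index 0 is the most recently referenced
    object; a distance is the 1-based depth at which the object is found.
    First references (object not yet in the stack) are skipped.
    """
    stack = []
    distances = []
    for obj in refs:
        if obj in stack:
            depth = stack.index(obj) + 1
            distances.append(depth)
            stack.remove(obj)
        stack.insert(0, obj)  # move (or push) the object to the top
    return distances


if __name__ == "__main__":
    print(stack_distances("A B A C B D A B A C B D".split()))
```

Starting from an empty stack, the example string yields [2, 3, 4, 3, 2, 4, 3, 4].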

Temporal Locality (cont.)
Temporal locality is often represented by the marginal distribution of the stack distance string, which is approximately lognormal.
Sources of temporal locality:
“Long-term” popularity of objects
Temporal correlations between requests to the same object
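Since the marginal is approximately lognormal, its two parameters can be estimated from the logarithms of the stack distances. A minimal Python sketch of the standard maximum-likelihood fit, assuming all distances are positive (stack depths start at 1); `fit_lognormal` is a hypothetical name.

```python
import math


def fit_lognormal(samples):
    """Maximum-likelihood lognormal fit.

    Returns (mu, sigma): the mean and the population standard deviation
    of log(samples), i.e. the parameters of the underlying normal law.
    """
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


if __name__ == "__main__":
    print(fit_lognormal([2, 3, 4, 3, 2, 4, 3, 4]))
```

A quick goodness-of-fit check (e.g., a Q-Q plot of the log-distances against a normal law) would indicate how far "approximately lognormal" holds for a given trace.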

Temporal Locality (cont.)
We need to differentiate between the two sources, since:
Long-term popularity suggests the use of long-term frequency information in caching (LFU)
Temporal correlations suggest the use of short-term recency information in caching (LRU)
Solutions:
Have several popularity-based stack-distance models, or
Use a scaled version of the stack distance string [Cherkasova & Ciardo, 2000]: stack distances normalized by their mean stack distance

Example – CLARKNET Trace
[Figure: mean stack distance versus popularity (# of requests)]

Spatial Locality
Correlations between requests to different files
Can be captured through the autocorrelation function (ACF) of the (scaled) stack distance string
The empirical ACF exhibits a slowly decaying behavior, an indication of long-range dependence (LRD)
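The empirical ACF of a stack distance string can be computed directly. A minimal Python sketch using the common biased estimator (every lag normalized by the lag-0 sum), which is an assumption here since the talk does not say which estimator was used; `sample_acf` is a hypothetical name.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelation r(0), ..., r(max_lag) of the sequence x.

    r(k) = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    """
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant sequence: ACF undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]
```

For a string with LRD, r(k) stays noticeably above zero out to large lags instead of dying out geometrically.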
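
Example
Reference string (in time order): A B F G E D A H B F E D
Stack distances: 6, 7, 7, …
Files requested together earlier (B, F, E, D) are requested together again, so consecutive stack distances take similar values: a high autocorrelation value at lag 1 of the stack distance string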

ACF – Calgary Trace
[Figure: ACF versus lag (0 to 240) for the real trace and the fitted model]

How to Simultaneously Capture Temporal and Spatial Localities
Previous approach: self-similar model (Crovella et al.)
Start with an F-ARIMA process with a desired H-parameter
Transform the Gaussian marginal distribution of the F-ARIMA process into a lognormal distribution
Problems:
The H-parameter characterizes only the long-term correlation behavior
The transformation is nonlinear (hence, it does NOT preserve the overall structure of the ACF)
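Why a nonlinear marginal transformation distorts the ACF can be seen in a closed form. If X is stationary Gaussian with variance σ² and autocorrelation ρ at some lag, then Y = exp(X) has autocorrelation (e^{σ²ρ} − 1)/(e^{σ²} − 1) at that lag, a standard lognormal identity. Plain exponentiation is an assumption used for illustration; the exact transformation in Crovella et al.'s model may differ. `lognormal_acf` is a hypothetical name.

```python
import math


def lognormal_acf(rho_gauss, sigma2):
    """ACF of Y = exp(X) at one lag, where X is stationary Gaussian with
    variance sigma2 and autocorrelation rho_gauss at that lag:
        rho_Y = (exp(sigma2 * rho_gauss) - 1) / (exp(sigma2) - 1)
    The mean of X cancels out of the ratio."""
    return math.expm1(sigma2 * rho_gauss) / math.expm1(sigma2)


if __name__ == "__main__":
    for rho in (0.1, 0.3, 0.5, 0.9):
        print(rho, round(lognormal_acf(rho, sigma2=2.0), 3))
```

Because the map is convex in ρ and fixes 0 and 1, every intermediate correlation is pulled down, and by an amount that depends on ρ itself, so the shape of the ACF changes, not just its level.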

Impact of Transformation
[Figure: autocorrelations versus lag (in stack distances) for the real trace and for F-ARIMA before and after the transformation]

Monofractals (Self-Similarity)
Example from geometry: The Sierpinski gasket

Self-Similarity in Network Traffic
[Figure: self-similar traffic compared with Poisson traffic]

Self-Similarity … More Formally
Consider a random process X = {X(t)} with mean μ, variance v, and ACF R(k), k = 0, 1, …
Let $X^{(m)}$ be the aggregated process of X over non-overlapping blocks of length m
X is exactly self-similar with scaling factor 0 < H < 1 if
$X \stackrel{d}{=} m^{1-H} X^{(m)}$ and $R^{(m)}(k) = R(k)$
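The scaling relation implies $\mathrm{Var}(X^{(m)}) = m^{2H-2}\,\mathrm{Var}(X)$, which gives the classic variance-time estimate of H: the slope of log Var(X^(m)) against log m equals 2H − 2. A minimal Python sketch, assuming X^(m) is formed from block averages (as the relation requires) and using an ordinary least-squares slope; the function names are hypothetical and this is not presented as the estimator used in the talk.

```python
import math
import random


def aggregate(x, m):
    """Aggregated process X^(m): means over non-overlapping blocks of
    length m (an incomplete last block is dropped)."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def hurst_variance_time(x, levels):
    """Least-squares slope of log Var(X^(m)) versus log m, mapped to H."""
    pts = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in levels]
    n = len(pts)
    mx = sum(a for a, _ in pts) / n
    my = sum(b for _, b in pts) / n
    slope = (sum((a - mx) * (b - my) for a, b in pts)
             / sum((a - mx) ** 2 for a, _ in pts))
    return 1 + slope / 2  # slope = 2H - 2


if __name__ == "__main__":
    rng = random.Random(1)
    iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 16)]
    # Independent samples have no LRD, so the estimate should be near 0.5.
    print(hurst_variance_time(iid, [2 ** i for i in range(9)]))
```

For H > 0.5 the variance of the block mean falls more slowly than 1/m, which is the "variance of the sample mean" manifestation of LRD.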

Other Related Definitions
A process Y = {Y(t)} exhibits LRD if it is the derivative (increment) process of a self-similar process with H > 0.5
Manifestations of LRD behavior:
The ACF of Y decays hyperbolically
The spectral density obeys a power law near the origin: $F(\omega) \sim c\,\omega^{-\gamma}$ as $\omega \to 0$, with $\gamma = 2H - 1$
The variance of the sample mean decreases more slowly than the reciprocal of the sample size
[Figure: ACF versus lag]

Multifractal Processes
Generalizations of self-similarity, where the H parameter now varies with scale
Wavelet construction of multifractals (Riedi et al.): discrete wavelet transform of the sequence to be modeled,
$X(t) = \sum_k U_{J,k}\,\phi_{J,k}(t) + \sum_{j \le J} \sum_k W_{j,k}\,\psi_{j,k}(t)$
where
$U_{J,k}$: scale coefficient at the coarsest scale J and time $2^J k$
$\phi_{J,k}$: scaled and translated scale function
$W_{j,k}$: wavelet coefficient at scale j and time $2^j k$
$\psi_{j,k}$: scaled and translated wavelet function

Multifractal Processes (cont.)
Multifractals can be generated using semi-random cascades:
M
A1*M A2*M
A3*A1*M A4*A1*M
Ai is a symmetric random variable
If dependent semi-random cascade

Multifractal Wavelet Model
Trace = scale coefficients at the finest time scale
For the Haar transform, the scale and wavelet coefficients are:
$U_{j,k} = \frac{U_{j+1,2k} + U_{j+1,2k+1}}{\sqrt{2}}$ and $W_{j,k} = \frac{U_{j+1,2k} - U_{j+1,2k+1}}{\sqrt{2}}$
Example, starting from the finest-scale trace 2, 1, 4, 7:
Scale coefficients: 3/√2, 11/√2, then (3/√2 + 11/√2)/√2 = 14/2 = 7
Wavelet coefficients: 1/√2, −3/√2, then (3/√2 − 11/√2)/√2 = −8/2 = −4
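The worked example (trace 2, 1, 4, 7) can be reproduced with a few lines of Python. A minimal sketch of the Haar analysis step, assuming the trace length is a power of two and indexing scales from j = 0 (coarsest); `haar_analysis` is a hypothetical name.

```python
import math


def haar_analysis(u_finest):
    """Haar decomposition of a trace of length 2^n.

    Returns (u_coarsest, wavelets), where wavelets[j] lists W_{j,k} for
    scale j = 0 (coarsest) ... n-1 (finest), using
        U_{j,k} = (U_{j+1,2k} + U_{j+1,2k+1}) / sqrt(2)
        W_{j,k} = (U_{j+1,2k} - U_{j+1,2k+1}) / sqrt(2)
    """
    u = list(u_finest)
    if len(u) == 0 or len(u) & (len(u) - 1):
        raise ValueError("trace length must be a power of two")
    wavelets = []
    while len(u) > 1:
        pairs = list(zip(u[0::2], u[1::2]))
        wavelets.append([(a - b) / math.sqrt(2) for a, b in pairs])
        u = [(a + b) / math.sqrt(2) for a, b in pairs]
    wavelets.reverse()  # coarsest scale first
    return u[0], wavelets


if __name__ == "__main__":
    print(haar_analysis([2, 1, 4, 7]))
```

Up to floating-point rounding, this returns U = 7 at the coarsest scale, W = −4 at the coarsest scale, and wavelet coefficients 1/√2 and −3/√2 at the next scale.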

Multifractal Wavelet Model (cont.)
To generate synthetic data:
The scale coefficient at the coarsest scale is drawn as $U_{0,0} \sim N(E[U_{0,0}], \mathrm{Var}[U_{0,0}])$
Moving from scale j to the finer scale j+1:
$U_{j+1,2k} = \frac{U_{j,k} + W_{j,k}}{\sqrt{2}} = (1 + A_{j,k})\frac{U_{j,k}}{\sqrt{2}}$ and $U_{j+1,2k+1} = \frac{U_{j,k} - W_{j,k}}{\sqrt{2}} = (1 - A_{j,k})\frac{U_{j,k}}{\sqrt{2}}$,
where $W_{j,k} = A_{j,k} U_{j,k}$ and the $A_{j,k}$ are iid symmetric random variables with mean zero
Let $A_j$ be a generic r.v. having the same CDF as $A_{j,k}$
The synthetic trace is obtained from the scale coefficients at the finest scale
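The synthesis rule can be sketched in Python as follows. This is a minimal sketch, assuming a fixed value for $U_{0,0}$ instead of a Gaussian draw and, for illustration only, triangular multipliers on [−c_j, c_j]; any symmetric, zero-mean law can be plugged in through `draw_multiplier`. `mwm_synthesize` and `triangular_multiplier` are hypothetical names.

```python
import math
import random


def mwm_synthesize(u00, n_scales, draw_multiplier, rng):
    """Synthesize a trace of length 2^n_scales by repeatedly splitting
        U_{j+1,2k}   = (1 + A_{j,k}) * U_{j,k} / sqrt(2)
        U_{j+1,2k+1} = (1 - A_{j,k}) * U_{j,k} / sqrt(2)
    where A_{j,k} = draw_multiplier(j, rng) should be symmetric with mean
    zero (and lie in [-1, 1] if the trace must stay nonnegative)."""
    u = [u00]
    for j in range(n_scales):
        finer = []
        for u_jk in u:
            a = draw_multiplier(j, rng)
            finer.append((1 + a) * u_jk / math.sqrt(2))
            finer.append((1 - a) * u_jk / math.sqrt(2))
        u = finer
    return u  # scale coefficients at the finest scale = synthetic trace


def triangular_multiplier(c):
    """Hypothetical helper: A_j triangular on [-c[j], c[j]] with mode 0."""
    return lambda j, rng: rng.triangular(-c[j], c[j], 0.0)


if __name__ == "__main__":
    rng = random.Random(7)
    # U_{0,0} would normally be drawn from N(E[U00], Var[U00]); fixed here.
    print(mwm_synthesize(10.0, 3, triangular_multiplier([0.5, 0.4, 0.3]), rng))
```

The two children of each coefficient always sum to √2 times their parent, so after n scales the trace sums to $2^{n/2} U_{0,0}$ whatever multipliers are drawn. The multipliers only redistribute mass between neighbors, which is what shapes the correlations.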

Multifractal Wavelet Model (cont.)
Autocorrelations are controlled through the energy at scale j, i.e., $E[W_j^2]$
To produce a synthetic trace with a desired ACF, the parameter(s) of $A_j$ are selected based on:
$E[A_0^2] = \frac{E[W_0^2]}{E[U_0^2]}$ and $E[A_j^2] = \frac{2\,E[A_{j-1}^2]\,E[W_j^2]}{(1 + E[A_{j-1}^2])\,E[W_{j-1}^2]}$, j = 1, 2, …
Problem: $E[W_j^2]$ must be computed for all scales j, which leads to a large number of model parameters
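Solving for $E[A_j^2]$ scale by scale is straightforward if $E[U_j^2]$ is carried along. This follows from $W_{j,k} = A_{j,k} U_{j,k}$ and the splitting rule of the model, assuming $A_{j,k}$ is zero-mean and independent of $U_{j,k}$. A minimal Python sketch; `multiplier_second_moments` is a hypothetical name.

```python
def multiplier_second_moments(eu0, ew):
    """Second moments E[A_j^2] that reproduce target wavelet energies.

    eu0 : E[U_0^2], the second moment of the coarsest scale coefficient.
    ew  : target energies E[W_j^2] for j = 0 (coarsest), 1, 2, ...
    Uses
        E[A_j^2]     = E[W_j^2] / E[U_j^2]
        E[U_{j+1}^2] = (1 + E[A_j^2]) * E[U_j^2] / 2
    """
    ea = []
    eu = eu0
    for ew_j in ew:
        ea_j = ew_j / eu
        ea.append(ea_j)
        eu = (1 + ea_j) * eu / 2  # energy of the next (finer) scale coeffs
    return ea


if __name__ == "__main__":
    print(multiplier_second_moments(4.0, [0.8, 0.24, 0.066]))
```

Iterating on $E[U_j^2]$ avoids dividing by $E[A_{j-1}^2]$, which would be numerically awkward if some scale had (nearly) zero wavelet energy.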

Modified Multifractal Model
Goal: Reduce the complexity of the original model
Outline of modified model:
Take $A_j$ to be a triangular r.v. in the range $[-c_j, c_j]$ for all j
Define the aggregated sequence $\{X_n^{(m)} : n = 1, 2, \dots\}$, where $X_n^{(m)} = \sum_{i=nm-m+1}^{nm} X_i$, $n = 1, 2, \dots, N/m$, $m = 1, 2, 4, \dots, N$
Relate $E[(X_n^{(m)})^2]$ to $E[U_j^2]$ and, thus, to $E[A_j^2]$
Aggregation level 2m represents the scale j−1
Express $c_{j-1} \equiv c(2m)$ in terms of $E[(X_n^{(m)})^2]$ and $E[(X_n^{(2m)})^2]$

Modified Multifractal Model (cont.)
Relate $E[(X_n^{(m)})^2]$ to the mean ($\mu$), variance ($v$), and ACF ($\rho_k$: k = 1, 2, …) of the original trace:
$E[(X_n^{(m)})^2] = mv + 2v\sum_{k=1}^{m-1}(m-k)\rho_k + m^2\mu^2$
Thus, $c_j$, j = 1, 2, …, is expressed in terms of $\mu$, $v$, and $\rho_k$: k = 1, 2, …
For the ACF, we use a general form in which $\rho_k$, k = 0, 1, …, is an exponential function of g(k), where g(k) is taken to be k or log(k+1)
The model is then specified using 4 parameters
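One way to turn the moment formula into the triangular half-widths is sketched below. It rests on an assumption (our reading of "aggregation level 2m represents the scale j−1"): with Haar scaling, a block sum of length 2m splits into two length-m children as $(1 \pm A)X^{(2m)}/2$, so $E[A^2] = 4E[(X^{(m)})^2]/E[(X^{(2m)})^2] - 1$, and a symmetric triangular law on [−c, c] has variance c²/6. The function names and the ACF used in the demo are hypothetical, chosen for illustration only.

```python
import math


def second_moment(m, mu, v, rho):
    """E[(X_n^(m))^2] for block sums of length m of a stationary trace
    with mean mu, variance v, and ACF rho(k):
        m v + 2 v sum_{k=1}^{m-1} (m - k) rho(k) + m^2 mu^2
    """
    s = sum((m - k) * rho(k) for k in range(1, m))
    return m * v + 2 * v * s + (m * mu) ** 2


def triangular_halfwidth(m, mu, v, rho):
    """Half-width c(2m) of a triangular multiplier on [-c, c] with mode 0,
    from E[A^2] = 4 E[(X^(m))^2] / E[(X^(2m))^2] - 1 and Var(A) = c^2 / 6."""
    ea2 = 4 * second_moment(m, mu, v, rho) / second_moment(2 * m, mu, v, rho) - 1
    return math.sqrt(6 * ea2)


if __name__ == "__main__":
    # Hypothetical ACF for illustration only: rho(k) = (k + 1) ** -0.4.
    rho = lambda k: (k + 1) ** -0.4
    for m in (1, 2, 4, 8, 16):
        print(m, triangular_halfwidth(m, mu=1.0, v=1.0, rho=rho))
```

By the Cauchy–Schwarz inequality, $E[(X^{(2m)})^2] \le 4E[(X^{(m)})^2]$ for a stationary trace, so the computed $E[A^2]$ is never negative. Values of c above 1, however, would allow negative synthetic samples.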

Outline of Traffic Generation
1. Extract empirical (scaled) stack distances
   a. Start with an empty stack (to avoid the initial ordering problem)
   b. Process the trace in the reverse direction
   c. Record the stack depth only for objects already in the stack
   d. Reverse the extracted stack distance string
   e. Normalize stack distances by their empirical averages
2. Generate a synthetic stack distance string
   a. Compute the parameters of the multifractal model
   b. Generate a synthetic (scaled) stack distance string
   c. Scale back the stack distances
3. Generate URL traces while enforcing the popularity profile
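Step 1 of the outline can be sketched in Python as follows. A minimal sketch, assuming 1-based stack depths and reading "their empirical averages" as per-object averages (each distance is divided by the mean distance of the same object); both are assumptions, and `extract_scaled_stack_distances` is a hypothetical name.

```python
from collections import defaultdict


def extract_scaled_stack_distances(trace):
    """Steps 1a-1e: reverse-order stack distance extraction and scaling.

    Returns (object, scaled_distance) pairs in forward time order.
    """
    stack = []  # 1a: start with an empty stack; index 0 is the top
    raw = []
    for obj in reversed(trace):  # 1b: process the trace backwards
        if obj in stack:  # 1c: record depth only if already in the stack
            raw.append((obj, stack.index(obj) + 1))
            stack.remove(obj)
        stack.insert(0, obj)
    raw.reverse()  # 1d: restore forward time order

    # 1e: normalize each distance by the mean distance of its object
    per_object = defaultdict(list)
    for obj, d in raw:
        per_object[obj].append(d)
    means = {obj: sum(ds) / len(ds) for obj, ds in per_object.items()}
    return [(obj, d / means[obj]) for obj, d in raw]


if __name__ == "__main__":
    print(extract_scaled_stack_distances("A B A C B D A B A C B D".split()))
```

Processing backwards attaches each distance to a reference whose object is requested again later, so the references that lack a distance are the last ones in the trace rather than the first ones. With an empty initial stack, no arbitrary initial ordering has to be assumed.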

Traffic Generation Example
Trace length = 12
Popularity profile: frA = 4/12, frB = 4/12, frC = 2/12, frD = 2/12
Scaled-back synthetic stack distance string: 2 4 3 4 4 3 2 3
[Figure: stack contents (objects A, B, C, D) at each step as the synthetic traffic is generated from the stack distance string]
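The core of step 3, turning a stack distance string back into object references, can be sketched in Python. This minimal sketch does NOT enforce the popularity profile, which would require adjusting the choices on top of it. The initial stack order is a hypothetical input, and `references_from_distances` is a hypothetical name.

```python
def references_from_distances(initial_stack, distances):
    """Turn a stack distance string back into object references.

    initial_stack lists objects from the top (most recent) down. For each
    distance d, the object at 1-based depth d is referenced and moved to
    the top of the stack. Popularity enforcement is not modeled here.
    """
    stack = list(initial_stack)
    refs = []
    for d in distances:
        if not 1 <= d <= len(stack):
            raise ValueError(f"stack distance {d} out of range")
        obj = stack.pop(d - 1)
        stack.insert(0, obj)
        refs.append(obj)
    return refs


if __name__ == "__main__":
    # Hypothetical initial stack; the distance string is the example's.
    print(references_from_distances(["A", "B", "C", "D"], [2, 4, 3, 4, 4, 3, 2, 3]))
```

This is the exact inverse of the LRU stack-distance computation for a given initial stack. Any popularity-driven correction changes which object is picked, so the resulting references will generally not reproduce the distance string exactly.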

Simulation Results
[Figure: real trace versus multifractal model]

Simulation Results (cont.)
[Figure: real trace versus multifractal model]

Simulation Results (cont.)
Statistic           Real    MF      LRD     No spatial loc.
                    0.954   0.937   0.905   0.806
                    1.032   1.164   1.428   1.918
ACF at lag 1        0.13    0.118   0.06    0.0
ACF at lag 5        0.076   0.075   0.02    0.0
ACF at lag 25       0.039   0.039   0.001   0.0
90th percentile     2.233   2.21    2.16    1.67
98th percentile     3.976   4.23    5.05    4.98

Ongoing & Future Work
Online traffic forecasting using the multifractal model
Incorporation of traffic forecasting in the design of prefetching strategies

Modeling of Prefetching Systems
Goal: Provide a theoretical model to analyze the performance of generic prefetching systems
Limitations of existing works:
Many focus only on the prediction aspect
Performance is often studied via simulations under very specific setups (e.g., a given network topology)
The few existing analytical models are overly simplistic (e.g., they ignore client behavior, TCP dynamics, etc.)

Framework
[Figure: prefetching framework with client, local cache, predictor, and server]
Client: ON/OFF source
Local cache; prefetching cache: a small portion of the local cache
Server
Either one TCP connection for both demand fetching and prefetching, OR two separate TCP connections
Predictor: “I suggest you prefetch documents D1, D2, …, Dk, since I think they will be requested soon with probabilities P1, P2, …, Pk.”
Should I only use thinking time for prefetching?
An access to a prefetched document moves the document to the local cache. This helps in studying the performance of prefetching in isolation from caching.