Example: Rumor Performance Evaluation
Andy Wang
CIS 5930-03 Computer Systems Performance Analysis

Motivation
• Optimistic peer replication is popular
– Intermittent connectivity
– Availability of replicas for concurrent updates
– Convergence and correctness for updates
• Examples: Rumor, Coda, Ficus, Lotus Notes, Outlook Calendar, CVS

Background
• Replication provides high availability
• Optimistic replication allows immediate access to any replicated item, at the risk of permitting concurrent updates
• Reconciliation process makes replicas consistent (i.e., two replicas for peer-to-peer)

Background Continued
• Conflicts occur when different replicas of the same file are updated subsequent to the previous reconciliation

Optimistic Replication Example
Connected:
• Log on Desktop: 10:00 Update, 10:25 Update
• Log on Portable: 10:00 Update, 10:25 Update
Disconnected:
• Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
• Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update

Example Continued
Disconnected:
• Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update
• Log on Portable: 10:00 Update, 10:25 Update, 10:51 Update
Connected:
• Log on Desktop: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
• Log on Portable: 10:00 Update, 10:25 Update, 10:40 Update, 10:51 Update
Steps on reconnection:
• Run reconciliation
• Detect a conflict
• Propagate updates

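The reconnection steps in this example can be sketched in a few lines of Python. This is only an illustrative model, assuming each replica keeps an ordered list of update timestamps for a single file. The function name `reconcile` and the log representation are hypothetical; the slides do not describe Rumor's actual data structures or conflict-detection algorithm.

```python
def reconcile(log_a, log_b):
    """Merge two per-file update logs and report whether they conflict.

    Hypothetical model: a log is a list of "HH:MM" update timestamps.
    """
    # The common prefix is the state both replicas shared at the
    # previous reconciliation.
    common = 0
    while (common < len(log_a) and common < len(log_b)
           and log_a[common] == log_b[common]):
        common += 1

    new_a = log_a[common:]  # updates made on A since the last reconciliation
    new_b = log_b[common:]  # updates made on B since the last reconciliation

    # Both replicas changed independently: that is a conflict.
    conflict = bool(new_a) and bool(new_b)

    # Propagate updates: both replicas end up with all updates, in time order.
    merged = log_a[:common] + sorted(new_a + new_b)
    return merged, conflict


desktop = ["10:00", "10:25", "10:40"]
portable = ["10:00", "10:25", "10:51"]
merged, conflict = reconcile(desktop, portable)
print(merged, conflict)
```

With the desktop and portable logs from the example, both replicas have new updates after 10:25, so the sketch flags a conflict and both logs converge to the same four entries.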
Goal
• Understand the cost characteristics of the reconciliation process for Rumor

Services
• Reconciliation
– Exchange file system states
– Detect new and conflicting versions
  • If possible, automatically resolve conflicts
  • Else, prompt user to resolve conflicts
– Propagate updates

Outcomes
• Two reconciled replicas become consistent for all files and directories
• Some files remain inconsistent and require user to resolve conflicts

Metrics
• Time
– Elapsed time
  • From the beginning to the completion of a reconciliation request
– User time (CPU time spent in user mode)
– System time (CPU time spent in the kernel)
• Failure rate
– Number of incomplete reconciliations and infinite loops (none observed)

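A minimal sketch of how the three time metrics could be collected for one run, assuming a Unix-like system. The slides only say that a top-level Perl time command was used; the helper `time_command` below is a hypothetical Python stand-in, and the child process is a dummy workload rather than an actual Rumor reconciliation.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv as a child process; return (elapsed, user, system) in seconds."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    after = os.times()
    # children_user / children_system accumulate the CPU time of child
    # processes that have been waited for (reported as zero on Windows).
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return elapsed, user, system


# Stand-in workload; a real measurement would invoke the reconciliation command.
elapsed, user, system = time_command(
    [sys.executable, "-c", "sum(i * i for i in range(200000))"])
print(f"elapsed={elapsed:.3f}s user={user:.3f}s system={system:.3f}s")
```

Elapsed time includes waiting on disk and network, so it can be much larger than user time plus system time.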
Metrics not Measured
• Disk access time
– Requires complex instrumentation
  • E.g., buffering, logging, etc.
• Network and memory resources
– Not heavily used
• Correctness
– Difficult to evaluate

Monitor Implementation
[Figure: Reconciliation Process. Component labels: Spool-to-dump, Recon, Scanner, Rfindstored, Rrecon, Server. Implementation labels: Perl library, C++]
• Top-level Perl time command

Parameters
• System parameters
– CPU (speed of local and remote servers)
– Disk (bandwidth, fragmentation level)
– Network (type, bandwidth, reliability)
– Memory (size, caching effects, speed)
– Operating system (type, version, VM management, etc.)

Parameters (Continued)
• Workload parameters
– Number of replicas
– Number of files and directories
– Number of conflicts and updates
– Size of volumes (file size)

Workloads
• Update characteristics extracted from Geoff Kuenning’s traces
[Figure: file access classification tree. Labels: file access; read-only access; read-write access; nonshared access; shared access; 2-way sharing; 3+way sharing; read access and write access at the leaves]

Experimental Settings
• Machine model: Dell Latitude XP
• CPU: x486 100 MHz
• RAM: 36MB
• Ethernet: 10Mb
• Operating system: Linux 2.0.x
• File system: ext3

Experimental Settings
• Should have documented the following as well
– CPU: L1 and L2 cache sizes
– RAM: brand and type
– Disk: brand, model, capacity, RPM, and the size of on-disk cache
– File system version

Experimental Design
• 2^5 5 full factorial design (5 two-level factors, 5 replications each)
• Linear regression or multivariate linear regression to model major factors
• Target: 95% confidence interval

2^5 5 Full Factorial Design
• Number of replicas: 2 and 6
• Number of files: 10 and 1,000
• File size: 100 and 22,000 bytes
• Number of directories: 10 and 100
• Number of updates: 10 and 450
– Capped at 10 updates for 10 files
• Number of conflicts: 0 /* typical */

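The factor levels above define 2^5 = 32 configurations. As a rough sketch, they can be enumerated as follows; the dictionary keys and the helper `design_points` are hypothetical names chosen for illustration, and only the level values and the update cap come from the slide.

```python
from itertools import product

# Two levels per factor, as listed on the slide.
LEVELS = {
    "replicas": (2, 6),
    "files": (10, 1000),
    "file_size": (100, 22000),  # bytes
    "dirs": (10, 100),
    "updates": (10, 450),
}


def design_points(levels):
    """Return every combination of factor levels as a list of dicts."""
    names = list(levels)
    points = []
    for combo in product(*(levels[name] for name in names)):
        point = dict(zip(names, combo))
        # Cap from the slide: at most 10 updates when there are only 10 files.
        if point["files"] == 10:
            point["updates"] = min(point["updates"], 10)
        points.append(point)
    return points


points = design_points(LEVELS)
print(len(points))  # 2**5 = 32 configurations
```

Because of the cap, the configurations with 10 files run the same number of updates at both levels of the update factor.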
2^5 5 Full Factorial Analysis
• Experiment errors < 3%
[Figures: measured time and predicted time (seconds) vs. experiment number, for elapsed time, user time, and system time]

Variation of Effects
• All major effects are significant at the 95% confidence level
[Figures: top 5 effects by % variation]
– Elapsed time: #files, #dirs, file size * #files, file size, #updates
– System time: #files, #updates, #files * #updates, file size, file size * #files
– User time: #files, #replicas, #dirs, #replicas * #files, #files * #updates

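The "% variation" attributed to each effect can be computed with the sign-table method for 2^k designs. The sketch below uses a 2^2 design with made-up mean responses (not Rumor measurements) and ignores the experimental-error term that replications add. The function name `allocation_of_variation` and the two factor labels are chosen only for illustration.

```python
from itertools import combinations, product
from math import prod


def allocation_of_variation(factors, responses):
    """Percent of variation explained by each effect in a 2^k design.

    responses[i] is the mean response for the i-th row of
    product((-1, 1), repeat=k), so the last factor varies fastest.
    """
    k = len(factors)
    rows = list(product((-1, 1), repeat=k))
    n = len(rows)
    effects = {}
    for order in range(1, k + 1):
        for subset in combinations(range(k), order):
            # Sign-table column of an effect = product of its factor columns.
            column = [prod(row[i] for i in subset) for row in rows]
            q = sum(s * y for s, y in zip(column, responses)) / n
            effects[" * ".join(factors[i] for i in subset)] = q
    total = sum(q * q for q in effects.values())
    return {name: 100.0 * q * q / total for name, q in effects.items()}


# Made-up responses for (low, low), (low, high), (high, low), (high, high).
pct = allocation_of_variation(["#files", "file size"], [15, 25, 45, 75])
for name, share in pct.items():
    print(f"{name}: {share:.2f}%")
```

For these illustrative numbers, the first factor explains about 76% of the variation, the second about 19%, and their interaction about 5%.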
Residuals vs. Predicted Time
• Clusters are caused by the dominating effect of the number of files
[Figures: residuals vs. predicted time, for elapsed time, user time, and system time]

Residuals vs. Experiment Numbers
• Residuals are almost homoscedastic
[Figures: residuals vs. experiment number, for user time, system time, and elapsed time]

Quantile-Quantile Plot
• Residuals are almost normally distributed
[Figures: residual quantiles vs. normal quantiles, with linear fits]
– Elapsed time: f(x) = 5.61253143490396 x + 4.93495530436048E-16, R² = 0.97570585239607
– User time: f(x) = 0.124183670176851 x − 3.226948188583E-16, R² = 0.952366702694788
– System time: f(x) = 0.112484959649303 x − 5.06606047559798E-18, R² = 0.986338838838569

Multivariate Regression
• Number of replicas: 2
• Number of files: 4 levels, 10-600
• File size: 22,000 bytes
• Number of directories: 4 levels, 10-60
• Number of updates: 0
• Number of conflicts: 0 /* typical */
• Number of repetitions: 5 per data point

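A minimal sketch of a multivariate linear fit for a setup like this one, modelling time as b0 + b1 · #files + b2 · #dirs and solving the normal equations in plain Python. The intermediate levels (200, 400 files; 30, 40 directories) and the responses are assumptions made up for illustration, since the slides give only the ranges. `fit_linear_model` is a hypothetical helper, not the author's tooling.

```python
def fit_linear_model(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp (normal equations)."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Assumed levels; only the ranges 10-600 and 10-60 come from the slide.
file_levels = [10, 200, 400, 600]
dir_levels = [10, 30, 40, 60]
rows = [(f, d) for f in file_levels for d in dir_levels]
# Synthetic, noise-free responses: time = 5 + 0.1 * files + 0.2 * dirs.
y = [5.0 + 0.1 * f + 0.2 * d for f, d in rows]

b0, b1, b2 = fit_linear_model(rows, y)
print(f"b0={b0:.4f} b1={b1:.4f} b2={b2:.4f}")
```

Because the synthetic responses are noise-free, the fit recovers the generating coefficients. With real measurements, the residuals from such a fit are what the following slides examine.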
Multivariate Regression
• Experiment errors < 7%
• All coefficients are significant
[Figures: measured time and predicted time (seconds) vs. experiment number, for user time, elapsed time, and system time]

Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals vs. predicted time, for user time, system time, and elapsed time]

Residuals vs. Experiment Numbers
• Not so good for elapsed time and user time
[Figures: residuals vs. experiment number, for elapsed time, user time, and system time]

Quantile-Quantile Plot
• Residuals are not normally distributed for elapsed time and user time
[Figures: residual quantiles vs. normal quantiles, with linear fits]
– Elapsed time: f(x) = 5.6774814834728 x − 3.74753980933428E-14, R² = 0.84068455127645
– User time: f(x) = 0.481071580575666 x − 1.8682654604378E-15, R² = 0.924255360680913
– System time: f(x) = 0.132069999118134 x − 2.51384352224851E-15, R² = 0.978920253463901

Log Transform (User Time)
• ANOVA tests failed miserably
[Figures: user-time residuals vs. predicted time, residuals vs. experiment number, and residual quantiles vs. normal quantiles]
– Quantile-quantile fit: f(x) = 0.0222199973685429 x − 1.28549373927752E-15, R² = 0.870897001030419

Residual Analyses (User Time)
• No indications that transforms can help…
[Figures: standard deviation of residuals vs. mean user time; variance of residuals vs. mean user time; standard deviation of residuals vs. mean user time squared]

Possible Explanations
• i-node related factors
– Number of files per directory block
– Crossing block boundary may cause anomalies
• Caching effects
– Reboot needed across experiments

Linear Regression
• Number of files: 100, 150, 200, 250, 252, 253, 300, 350, 400, 450
– Test for the boundary-crossing condition as the number of files exceeds one block
– Note that Rumor has hidden files
• Number of repetitions: 5 per data point
• Flush cache (reboot) before each run

Linear Regression
• R² > 80%
• All coefficients are significant
[Figures: measured time, predicted time, and 95% confidence interval (seconds) vs. number of files, for elapsed time, system time, and user time]

Residuals vs. Predicted Time
• Elapsed time shows a bimodal trend
• User time shows an exponential trend
[Figures: residuals vs. predicted time, for elapsed time, system time, and user time]

Residuals vs. Experiment Numbers
• Elapsed time shows a rising bimodal trend
– Randomization of experiments may help
[Figures: residuals vs. experiment number, for elapsed time, system time, and user time]

Quantile-Quantile Plot
• Error residuals for elapsed time are not normal
– Perhaps piecewise normal
[Figures: residual quantiles vs. normal quantiles, with linear fits]
– Elapsed time: f(x) = 5.82178334927256 x + 2.58046606262658E-15, R² = 0.87800554257113
– System time: f(x) = 0.0976338391551245 x − 4.46690697919164E-16, R² = 0.969293820421059
– User time: f(x) = 0.213446556701086 x + 1.49533417053058E-15, R² = 0.970879846787612

Possible Explanations
• i-node related factors: No
• Caching effects: No
• Hidden factors: Maybe
• Bugs: Maybe

Conclusion
• Identified the number of files as the dominating factor for Rumor running time
• Observed the existence of an unknown factor in the Rumor performance model