TRANSCRIPT
An Industrial Case Study on the Automated Detection of Performance
Regressions in Heterogeneous Environments
King Chun Foo1, Zhen Ming (Jack) Jiang2, Bram Adams3, Ahmed E. Hassan4, Ying Zou5, Parminder Flora6
BlackBerry1,6, York University2, Polytechnique Montreal3, Queen's University4,5
Flickr outage impacted 89 million users
(05/24/13)
Field problems in large-scale systems are rarely functional; instead, they are load-related
One-hour global outage lost $7.2 million in revenue
(02/24/09)
Performance Regression Testing Needed!
Performance Regression Testing
• Mimics multiple users repeatedly performing the same tasks
• Takes hours or even days
• Produces GB/TB of data that must be analyzed
Is the system ready for release?
Ad Hoc Process
Test Analytics: supporting test pass/fail decision making using facts instead of an ad hoc process!
Performance Counters
Performance Regression Report
Initial Attempt
Past tests t1, t2, …, tN → Association Rule Mining → Performance Rules (M)
New test (tnew) checked against M → Detecting Violations → Violated Metric Set (VM) → Performance Regression Report
Heterogeneous Environments
Perf Lab A (v1.75, v5.10): Test 1 (T1)
Perf Lab B (v1.71, v5.10): Test 2 (T2)
Perf Lab C (v1.71, v5.50): Test 3 (T3)
Our Approach
Each past test t1, t2, …, tN is mined separately via Association Rule Mining into its own set of Performance Rules (M1, M2, …, MN).
The new test (tnew) is checked against every rule set, yielding one Violated Metric Set per past test (VM1, VM2, …, VMN).
Our Approach (cont.)
Ensemble Learning combines the Violated Metric Sets (VM1, VM2, …, VMN) into an Aggregated Violated Metric Set, which drives the Performance Regression Report.
Metric Discretization
[Figure: a metric over time; the original values (left) are binned into Small/Medium/Large levels (right)]
• Association rule mining can only operate on discretized values
• Equal Width (EW) interval binning algorithm
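A minimal sketch of equal-width interval binning, assuming three bins; the bin labels and sample values are illustrative, not the paper's actual configuration:

```python
def equal_width_bins(values, labels=("Low", "Medium", "High")):
    """Discretize a numeric series into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels) or 1.0  # guard against a constant series
    # Map each value to its bin index, clamped to the last bin
    return [labels[min(int((v - lo) / width), len(labels) - 1)] for v in values]

# Example: three samples spanning the range map to one bin each
print(equal_width_bins([0, 5, 10]))  # ['Low', 'Medium', 'High']
```

Clamping the index keeps the maximum value inside the last bin rather than one past it.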
Deriving Frequent Itemset from Past Test # 1
Time  | DB read/sec | Throughput | Request Queue Size
10:00 | Medium      | Medium     | Low
10:03 | Medium      | Medium     | Low
10:06 | Low         | Medium     | Medium
10:09 | Medium      | Medium     | Low
10:12 | Medium      | Medium     | Low
10:15 | Medium      | Medium     | Low
Frequent itemset: {Throughput = Medium, DB read/sec = Medium, Request Queue Size = Low}
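Frequent-itemset discovery over the discretized samples can be sketched as a simple support-counting pass (a minimal stand-in for the full association-rule miner the approach relies on; the row data mirrors the table above):

```python
from itertools import combinations
from collections import Counter

# Each row of the past test as a set of (metric = level) items
rows = [
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Low",    "Throughput=Medium", "Request Queue Size=Medium"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
]

def frequent_itemsets(rows, min_support=0.5, size=3):
    """Count every size-k itemset and keep those above min_support."""
    counts = Counter()
    for row in rows:
        for combo in combinations(sorted(row), size):
            counts[combo] += 1
    return {c: n / len(rows) for c, n in counts.items() if n / len(rows) >= min_support}

print(frequent_itemsets(rows))
```

Only the itemset that appears in 5 of the 6 samples (support ≈ 0.83) survives; the one-off row at 10:06 falls below the threshold.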
Deriving Performance Rules from Past Test # 1
From the frequent itemset {Throughput = Medium, DB read/sec = Medium, Request Queue Size = Low}, rules are derived:
• {Request Queue Size = Low, DB read/sec = Medium} ⇒ {Throughput = Medium}
• {Throughput = Medium, Request Queue Size = Low} ⇒ {DB read/sec = Medium}
• {Throughput = Medium, DB read/sec = Medium} ⇒ {Request Queue Size = Low}
Pruning Performance Rules
• Rules with low support or confidence values are pruned
Premise ⇒ Consequence  (support, confidence)
• {Throughput = Medium, DB read/sec = Medium} ⇒ {Request Queue Size = Low}  (0.5, 0.9): kept
• {Web Server CPU = Medium, DB read/sec = Medium} ⇒ {Web Server Memory = High}  (0.1, 0.7): pruned (low support)
• {Web Server CPU = Medium, Web Server Memory = Medium} ⇒ {Throughput = High}  (0.2, 0.2): pruned (low confidence)
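Support and confidence for a rule can be computed directly over the discretized samples; the pruning thresholds below are hypothetical (the slides do not state the actual cut-offs):

```python
# Discretized samples from the past test (previous slides)
rows = [
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Low",    "Throughput=Medium", "Request Queue Size=Medium"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
]

def support(itemset):
    """Fraction of samples that contain every item of the itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def confidence(premise, consequence):
    """How often the consequence holds when the premise holds."""
    return support(premise | consequence) / support(premise)

premise = {"Throughput=Medium", "DB read/sec=Medium"}
consequence = {"Request Queue Size=Low"}
s = support(premise | consequence)
c = confidence(premise, consequence)
# Hypothetical pruning thresholds for illustration only
keep = s >= 0.2 and c >= 0.5
print(s, c, keep)
```

A rule failing either threshold is dropped before violation detection, exactly as the two pruned examples above illustrate.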
Detecting Violation Metrics in the Current Test
Time  | DB read/sec | Throughput | Request Queue Size
08:00 | Medium      | Medium     | High
08:03 | Medium      | Medium     | High
08:06 | Low         | Medium     | Medium
08:09 | Medium      | Medium     | Low
08:12 | Medium      | Medium     | Low
08:15 | Medium      | Medium     | High
Rule from the past test: {Throughput = Medium, DB read/sec = Medium} ⇒ {Request Queue Size = Low}
In the current test, the premise still holds but Request Queue Size is frequently High instead of Low, so the rule's confidence drops.
• Rules with significant changes in confidence values are flagged as “anomalous”
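The check above can be sketched as re-evaluating each rule's confidence on the current test and flagging large drops; the past-test confidence (0.9, from the pruning slide) and the drop threshold are illustrative assumptions:

```python
def rule_confidence(samples, premise, consequence):
    """Confidence of premise => consequence over one test's samples."""
    fired = [row for row in samples if premise <= row]
    if not fired:
        return None  # premise never held; the rule cannot be evaluated
    return sum(consequence <= row for row in fired) / len(fired)

# Current-test samples (from the table above)
current = [
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=High"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=High"},
    {"DB read/sec=Low",    "Throughput=Medium", "Request Queue Size=Medium"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=Low"},
    {"DB read/sec=Medium", "Throughput=Medium", "Request Queue Size=High"},
]
premise = {"Throughput=Medium", "DB read/sec=Medium"}
consequence = {"Request Queue Size=Low"}

past_conf = 0.9  # the rule's confidence in the past test
new_conf = rule_confidence(current, premise, consequence)  # 2 of 5 firings: 0.4
# Hypothetical threshold for a "significant" confidence drop
violated = new_conf is not None and past_conf - new_conf > 0.3
print(new_conf, violated)
```

The premise fires in five samples but the consequence holds in only two, so the confidence collapses from 0.9 to 0.4 and the metric is flagged.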
Combining Results
Stacking: each rule set M1, M2, M3, M4 casts a weighted vote on whether a metric in the new test (e.g., Throughput at t0, t1, t2) is anomalous. The vote weights (?) are determined next.
Heterogeneous Lab Environments
Perf Lab A (v1.71, v5.50): T1, T2
Perf Lab B (v1.71, v5.10): T3
Perf Lab C (v1.71, v5.50): T4
Each lab environment is characterized by its (CPU, DISK, OS, Java, MySQL) configuration.
Measuring Similarities Between Labs
T1 and T2 ran in Perf Lab A; T3 in Perf Lab B; T4 in Perf Lab C.
Each past lab is encoded as a binary vector over (CPU, DISK, OS, Java, MySQL) marking which attributes match the new test's lab: (1, 1, 1, 1, 0), (0, 0, 0, 1, 0), (1, 1, 1, 1, 1).
The resulting similarity scores are 1 (Lab A), 2.2 (Lab B), and 2 (Lab C).
Assigning Weights to Past Tests
The similarity scores are normalized into weights: 1 / 5.2 ≈ 0.20, 2.2 / 5.2 ≈ 0.42, 2 / 5.2 ≈ 0.38
T1, T2 → weight 0.20; T3 → weight 0.42; T4 → weight 0.38
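The weighting step is a straightforward normalization of the lab-similarity scores; the lab-to-test mapping and scores are taken from the slides (note that 1/5.2 ≈ 0.19, which the slides round to 0.20):

```python
# Lab-similarity scores from the slides
lab_similarity = {"Lab A": 1.0, "Lab B": 2.2, "Lab C": 2.0}
# Which lab each past test ran in
test_lab = {"T1": "Lab A", "T2": "Lab A", "T3": "Lab B", "T4": "Lab C"}

total = sum(lab_similarity.values())  # 5.2
# Each past test inherits its lab's normalized similarity as its vote weight
weights = {t: lab_similarity[lab] / total for t, lab in test_lab.items()}
print(weights)
```

Tests sharing a lab (T1 and T2) share the same weight, which is why the four stacking votes below are (0.20, 0.20, 0.42, 0.38) rather than summing to 1.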
Combining Results
Stacking with similarity weights: M1, M2, M3, M4 vote with weights (0.20, 0.20, 0.42, 0.38) on each metric of the new test.
Metric           | anomalous vs. normal votes | Anomalous?
Throughput at t0 | 1.00 vs. 0.20              | yes
Throughput at t1 | 0.38 vs. 0.82              | no
Throughput at t2 | 0.58 vs. 0.62              | no
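The weighted majority vote can be sketched as follows; which individual model cast which vote is an assumption chosen so the weighted totals reproduce the slide's numbers:

```python
def weighted_vote(votes, weights):
    """Weighted majority: True if the 'anomalous' votes outweigh the rest."""
    anomalous = sum(w for v, w in zip(votes, weights) if v)
    normal = sum(w for v, w in zip(votes, weights) if not v)
    return anomalous > normal

weights = [0.20, 0.20, 0.42, 0.38]  # M1..M4, from the similarity weighting
# Hypothetical per-model votes reproducing the slide's weighted totals
t0 = [True, False, True, True]    # 1.00 vs. 0.20 -> anomalous
t1 = [False, False, False, True]  # 0.38 vs. 0.82 -> not anomalous
t2 = [True, False, False, True]   # 0.58 vs. 0.62 -> not anomalous
for name, votes in [("t0", t0), ("t1", t1), ("t2", t2)]:
    print(name, weighted_vote(votes, weights))
```

Models trained in labs more similar to the new test's lab thus dominate the final anomaly decision.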
Case Study
System                    | Type                                                 | Experiments
Dell DVD Store            | Open-source benchmark application                    | Bug injection
JPetStore                 | Open-source re-implementation of Oracle's Pet Store  | Bug injection
A large enterprise system | Closed-source large-scale telephony system           | Performance regression repository
Performance Evaluation Metrics
F-measure = (2 × Precision × Recall) / (Precision + Recall)
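The F-measure is the harmonic mean of precision and recall; a direct computation (the input values below are illustrative, not results from the study):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)

# Illustrative values only
print(f_measure(0.8, 0.6))
```

Because the harmonic mean is dominated by the smaller operand, an approach cannot score well by maximizing only one of precision or recall.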
A Large Enterprise System
[Figure: F-measure (0 to 1) for experiments E1, E2, and E3, comparing the Single, Bagging, and Stacking approaches]