Parallel Execution of Test Runs for Database Application Systems
Donald Kossmann: ETH Zurich, i-TV-T AG
Florian Haftmann: i-TV-T AG
Eric Lo: ETH Zurich
2
Some Facts
• Microsoft spends 50% of their development cost on testing
• SAP product cycle = 18 months
– 6 months to execute tests
Testing is the most expensive phase of the software development cycle
3
Observation
• The more test runs, the better
• However, it takes more time!
• Goal: Optimize Testing Time
4
Definition: Test Run Ti
• A sequence of requests
• Test Run “Login” (2 requests):
Req   Action             Value      Expected Result
1     Fill-in Login      Eric
      Fill-in Password   ********
2     Click Sign-in
5
Expected Result
Req   Action             Value      Expected Result
1     Fill-in ID         Eric
      Fill-in Password   ********
2     Click Sign-in
6
More Definitions
• Failed Test Run: At least one request does not return the expected result
• Test Database D: The state of an Application + Database at the beginning of each test
• Database Reset R: Bring the database back to D
7
A Test Run Fails When:
1. The application has a real bug
2. Or the test database is in the wrong state due to the execution of earlier test runs
Carry out resets to find real bugs
9
Resetting the Test Database?
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1}
TA: Insert Purchase Order P2
10
Resetting the Test Database?
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1, P2}
TA: Insert Purchase Order P2
<TA Success>
11
Resetting the Test Database?
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1}
TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 1
<TB Success>
12
Resetting the Test Database?
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1, P2}
TA: Insert Purchase Order P2
TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 2
<TB Fails>
13
Database Reset is needed!
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1, P2}
TA: Insert Purchase Order P2
TB: Get Total Purchase Order
Expected Result: 1
Reset DB (before TB runs)
14
Database Reset
• Resetting a database for a large-scale application takes about 2 minutes!
• Back-of-the-envelope calculation:
– 10,000 test runs = 10,000 resets x 2 min = about 2 weeks spent on DB resets for 1 complete test
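The back-of-the-envelope figure can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name `reset_overhead_days` is made up for illustration, and it assumes one reset per test run, executed around the clock.

```python
def reset_overhead_days(test_runs: int, minutes_per_reset: float) -> float:
    """Days spent on database resets if every test run needs its own reset."""
    total_minutes = test_runs * minutes_per_reset
    return total_minutes / (60 * 24)  # minutes per day


days = reset_overhead_days(10_000, 2)
print(f"{days:.1f} days")  # 13.9 days, i.e. roughly 2 weeks
```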
15
Reordering Test Runs
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1, P2}
TA: Insert Purchase Order P2
TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 2
<TB Fails>
16
Reordering Test Runs
- P.O. Insertion
- Count P.O.
- …
Database: PurchaseOrder = {P1}
TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 1
<TB Success>
17
Order Matters!
- P.O. Insertion
- Count P.O.
- …
Execution order: TB first, then TA (no reset needed)
TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 1
<TB Success>
TA: Insert Purchase Order P2
<TA Success>
Database: PurchaseOrder = {P1, P2}
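The purchase-order example can be replayed in a small simulation. This is a toy model, not the authors' test harness: the database is a Python set, and `run_in_order` is a hypothetical helper that executes TA and TB in the given order without any reset in between.

```python
def run_in_order(order):
    """Execute test runs TA and TB in the given order on a fresh test database."""
    db = {"P1"}                      # test database D: PurchaseOrder = {P1}
    results = {}
    for name in order:
        if name == "TA":             # TA: insert purchase order P2
            db.add("P2")
            results["TA"] = True
        elif name == "TB":           # TB: count purchase orders, expects 1
            results["TB"] = (len(db) == 1)
    return results


print(run_in_order(["TA", "TB"]))    # {'TA': True, 'TB': False}
print(run_in_order(["TB", "TA"]))    # {'TB': True, 'TA': True}
```

Running TB before TA lets both test runs pass with a single initial database state, which is the effect the reordering slides illustrate.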
18
Our Previous Work (CIDR 2005)
• A test run depends on a correct state of the database
– Control the database state
• Reduce the number of database resets
• Algorithms to optimize the order of test runs
• No parallelism in testing
20
Parallel Testing is a Two-dimensional Problem!
1. Fully utilize the available resources
• Load Balancing!
2. As on a single machine, we still have to control the database state
• Reduce the database resets!
21
More about the Problem
• Regression test
– Later stage of the development cycle
• Minor changes between versions
– Execute the same set of test runs
• Version 1.1
– Execute test: T1 T2 T3 T4
• Version 1.2 (Bug fixed and/or minor changes)
– Execute test: T1 T2 T3 T4
23
Shared-Nothing (SN)
• If I work for IBM, I can install:
– N applications
– N databases
– N machines
• One more machine:
– More admin. work!
– More license fees!
• Applications do not SHARE the database
[Diagram: Machine 1 ... Machine N, each with its own Application and Database and its own sequence of test runs (T12, T4, ... and T5, T31, ...)]
24
Shared-Database (SDB)
• If I work for PoorEric.com, I install:
– N threads (e.g., open N browsers)
– 1 database
– 1 machine
• The threads SHARE the database
• Test runs interfere with each other
– Can't scale as well as Shared-Nothing
[Diagram: Thread 1 ... Thread N, each with its own sequence of test runs (T12, T4, ... and T5, T31, ...), running against one Application and one Database]
25
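Parallel Testing Framework
[Diagram: a Scheduler takes test runs (T1, T5, T2, T6, ...) from the input queue and assigns them to Machine/Thread 1 ... Machine/Thread N, each shown with an Application and a Database. The Scheduler consults the Conflicts DB and the execution History (M1 ... MN) and decides whether a Reset is needed.]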
27
Execution Strategies
• Optimistic Execution:
– Reset the database only when it is a must
– Example: R T1 T2 T3 T4 R T4 T5
• Optimistic++ Execution:
– Avoid executing a test run twice
– Example (Wk 1): R T1 T2 T3 T4 R T4 T5
– Conflict recorded: <T1 T2 T3> T4
– Example (Wk 2): R T1 T2 T3 (Next is T4?) R T4 T5
• Slice Reordering Heuristics:
– Slice: A sequence of test runs without conflicts
– Example: R T1 T2 T3 T4 R T4 T5
– Collect <slice>s during each test
• Graph Reordering Heuristics
29
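Shared-Nothing
Test Run Input Queue: T1 T5 T2 T6
[Diagram: the Scheduler takes test runs from the input queue, consults the Conflicts DB, and assigns them to Machine 1, Machine 2, ..., each with its own Application and Database; it issues a Reset on a machine where needed.]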
38
Test 1
Test Run Input Queue: T1 T5 T2 T6 T3 T7 T8
[Animation over slides 38 to 41: the Scheduler assigns the queued test runs to the two machines; whenever a test run fails, the machine resets its database and reruns it, and the conflict is recorded in the Conflicts DB.]
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8
Conflicts DB: <T1 T2> T3, <T5> T6
42
Shared-Nothing - Slice
3 major principles:
1. The slices in the input queue are ordered by:
– Reordering the slices on each machine locally
– Merging the partial orders
2. All test runs of the same slice are executed on the same machine
3. The scheduler makes sure conflicting slices are executed on different machines as much as possible
43
Collect Slices
Test Run Input Queue: T1 T5 T2 T6 T3 T7 T8
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8
Conflicts DB: <T1 T2> T3, <T5> T6
Slices on M1: <T1 T2> <T3>
Slices on M2: <T5> <T6 T7 T8>
45
Merge Partial Order
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8
Local Order M1: <T3> <T1 T2>
Local Order M2: <T6 T7 T8> <T5>
Test Run Input Queue (merged): <T3> <T6 T7 T8> <T1 T2> <T5>
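One simple way to merge the local orders into a single input queue is to interleave them round-robin. The slides do not spell out the merge rule, so the round-robin choice below is an assumption, and `merge_local_orders` is a hypothetical name; the sketch only shows that this rule yields the queue order used in the subsequent slides.

```python
from itertools import zip_longest


def merge_local_orders(local_orders):
    """Interleave the per-machine slice orders round-robin (assumed merge rule)."""
    merged = []
    for round_ in zip_longest(*local_orders):
        # zip_longest pads shorter orders with None; skip the padding
        merged.extend(s for s in round_ if s is not None)
    return merged


order_m1 = [("T3",), ("T1", "T2")]
order_m2 = [("T6", "T7", "T8"), ("T5",)]
print(merge_local_orders([order_m1, order_m2]))
# [('T3',), ('T6', 'T7', 'T8'), ('T1', 'T2'), ('T5',)]
```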
47
Test 10
Test Run Input Queue: <T3> <T6 T7 T8> <T1 T2> <T5>
M1: R
M2: R
Conflicts DB: <T5> T6, <T1 T2> T3, <T3> T1
[Animation over slides 47 to 54: <T3> is assigned first (M1: R T3). The Scheduler then takes <T6 T7 T8>, <T1 T2>, and <T5> from the queue in turn and checks the Conflicts DB ("Conflict?") before assigning each slice to a machine.]
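The "Conflict?" check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scheduler: a conflict `<earlier> t` is assumed to apply when every test run in `earlier` has already run on a machine since its last reset; ties are broken by the number of test runs assigned so far; and the names `has_conflict` and `assign_slices` are hypothetical. The schedule it prints is the outcome of this sketch only and is not taken from the slides.

```python
def has_conflict(executed, slice_, conflicts):
    """True if some recorded conflict <earlier> t applies to this machine:
    all of `earlier` ran here since the last reset and t is in the slice."""
    done = set(executed)
    return any(set(earlier) <= done and failed in slice_
               for earlier, failed in conflicts)


def assign_slices(queue, conflicts, n_machines):
    machines = [["R"] for _ in range(n_machines)]   # schedule per machine
    since_reset = [[] for _ in range(n_machines)]   # test runs since last reset
    load = [0] * n_machines                         # test runs assigned so far
    for slice_ in queue:
        free = [m for m in range(n_machines)
                if not has_conflict(since_reset[m], slice_, conflicts)]
        if free:
            m = min(free, key=lambda i: load[i])    # least-loaded conflict-free machine
        else:
            m = min(range(n_machines), key=lambda i: load[i])
            machines[m].append("R")                 # unavoidable conflict: reset first
            since_reset[m] = []
        machines[m].extend(slice_)                  # a slice stays on one machine
        since_reset[m].extend(slice_)
        load[m] += len(slice_)
    return machines


queue = [("T3",), ("T6", "T7", "T8"), ("T1", "T2"), ("T5",)]
conflicts = [(("T5",), "T6"), (("T1", "T2"), "T3"), (("T3",), "T1")]
for i, schedule in enumerate(assign_slices(queue, conflicts, 2), start=1):
    print(f"M{i}:", " ".join(schedule))
```

In this sketch `<T1 T2>` is kept away from the machine that already ran T3, because the Conflicts DB records `<T3> T1`.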
56
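Shared-Database
Test Run Input Queue: T6 T2 T5 T1
Conflicts DB: <T1 T5 T6> T2
[Diagram: the Scheduler assigns test runs from the input queue to Thread 1, Thread 2, ..., which run the Application against one shared Database; it issues a Reset where needed.]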
57
Shared-Database, Slice
• Similar to Shared-Nothing
• Different definition of a slice
• Different scheduling decisions
58
Performance Experiments
• Simulation:
– 10,000 test runs (0 min – 3 min)
– 10,000 (low) – 5M (high) conflicts
– Uniform + Zipf distribution
– SN: 1 to 50 machines
– SDB: 1 to 10 threads
• Real data: 61 test runs
• Reporting the average running time and number of resets of the last 10 tests (total 30 tests)
59
Shared-DB (Real Data)
Time unit: minute
Approach        1 thread        5 threads       10 threads
                Time    Reset   Time    Reset   Time    Reset
Optimistic++    41      7       22      6.6     16      5.8
Graph(MWD)      37      3.5     19      4.2     13      4.2
Slice           31      3       18      3.8     12      4.2
61
Experiment Summary
• Shared-Nothing (SN)
– Linear scale-up, sometimes super-linear
• Shared-Database (SDB)
– Scales up to 10 threads
• Heuristics:
– Slice is the winner
• How about other distributions (e.g., Zipf)?
– Similar results
62
Conclusions and Future Work
• Parallel execution of test runs?
– It SCALES!
• Studied a dynamic scheduling approach for SN and SDB architectures:
– Control the database state: minimize DB resets
– and load balancing
• How to generate test runs and test data for database application programs?
• More in the paper
64
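Parallel Testing Framework
Test Run Input Queue: ... T4 T31 T5 T12
[Diagram: the Scheduler takes test runs from the input queue and assigns them (e.g., T17, T8, T7, T9, T25, T13) to Machine/Thread 1 ... Machine/Thread N, each shown with an Application and a Database. It consults the Conflicts DB and the execution History (M1 ... MN) and decides whether a Reset is needed.]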
65
Example: Shared-Nothing, Slice
Test Run Input Queue: T6 T2 T5 T1
Conflicts DB: <T1 T2> T3
[Diagram: the Scheduler assigns test runs from the input queue to Machine 1, Machine 2, ..., each with its own Application and Database, and issues a Reset where needed.]
Test 1:
M1: R T1 T4 ... R ...
M2: R T2 T3 T5 R ...
Test 1:
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8
66
Shared-Database, Slice - Test 1
Test Run Input Queue: T6 T2 T5 T1
Conflicts DB: <T1 T5 T6> T2
[Diagram: the Scheduler assigns test runs from the input queue to Thread 1, Thread 2, ..., which run the Application against one shared Database; it issues a Reset where needed.]
Test 1:
Th1: T1 T2 T2 T3
Th2: T5 T6 T7 T8 T8
R R R (resets of the shared database)
68
SDB – Subsequent Tests
Test 1:
Th1: T1 T2 T2 T3
Th2: T5 T6 T7 T8 T8
R R R (resets of the shared database)
Reordering
Test N:
Test Run Input Queue: T2 T7 T3 T8 T1 T5 T6
69
Additional Issues - SDB
• How to do a database reset when a test run fails?
• Deferred:
– The database reset is deferred and the failed test run is re-scheduled at the end
• Eager:
– Abort all concurrent test runs and reset immediately
• Lazy*:
– Do not accept new test runs, let the active test runs finish, and then reset
70
Shared-Nothing Performance
• Achieve linear scale-up?
– Yes
• The best among the three:
– Slice
• How about low conflict?
– Similar results
• How about other distributions (e.g., Zipf)?
– Similar results
71
Shared-Database Performance
• Scale-up when increasing the number of threads?
– Yes, up to 10 threads
• If the number of conflicts is high, more than 10 test threads might hurt performance
• The best among the three:
– Slice
72
SN Simulation (High Conflict)
Time unit: hour
Approach        1 machine       5 machines      10 machines     50 machines
                Time    Reset   Time    Reset   Time    Reset   Time    Reset
Optimistic++    358     1788    72      1787    36      1775    6.8     1753
Slice           306     867     64      1098    32      1038    6.4     1048
Graph(MWD)      359     1792    71      1784    36      1780    7.6     1767
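The scale-up behind these numbers can be computed directly from the Time columns. The snippet below is only a worked calculation on the table values; the dictionary layout and the `speedup` helper are illustrative choices, not part of the original experiments.

```python
# Running time in hours, keyed by number of machines (values from the SN table)
sn_time_hours = {
    "Optimistic++": {1: 358, 5: 72, 10: 36, 50: 6.8},
    "Slice":        {1: 306, 5: 64, 10: 32, 50: 6.4},
    "Graph(MWD)":   {1: 359, 5: 71, 10: 36, 50: 7.6},
}


def speedup(times, machines):
    """Speed-up relative to the single-machine running time."""
    return times[1] / times[machines]


for approach, times in sn_time_hours.items():
    print(approach, round(speedup(times, 50), 1))
# Optimistic++ 52.6   (above 50, i.e. super-linear)
# Slice 47.8
# Graph(MWD) 47.2
```

Slice has the lowest absolute running time at every machine count, even though its speed-up factor on 50 machines is slightly below 50.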
74
SDB Simulation
Time unit: hour
Approach        1 thread        5 threads       10 threads      50 threads
                Time    Reset   Time    Reset   Time    Reset   Time    Reset
Optimistic++    358     1788    160     1385    157     1231    258     1425
Slice           306     867     120     793     112     796     259     1422
MWD             359     1792    164     1396    156     1251    204     1067
75
Optimistic
• Let the test runs execute until a DB reset is really needed!
– Optimistic: R T1 T2 T3 T4
– If a test run T reports failure:
– Reset the database and then rerun T
– Then, if T still reports failure => a real bug!
– Example: <T4 failure>
• Optimistic: R T1 T2 T3 T4 R T4
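The Optimistic strategy can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: `optimistic`, `execute`, and `reset` are hypothetical names, and the toy application (a set of purchase orders where T3 inserts P2 and T4 expects exactly one order) is invented for the example.

```python
def optimistic(tests, execute, reset):
    """Run tests in order; reset and rerun only when a test run fails."""
    reset()
    schedule = ["R"]
    real_bugs = []
    ran_since_reset = False
    for t in tests:
        schedule.append(t)
        ok = execute(t)
        if not ok and ran_since_reset:
            # The failure may be caused by a dirty database: reset and rerun.
            reset()
            schedule += ["R", t]
            ok = execute(t)
        if not ok:
            real_bugs.append(t)      # fails even on a fresh database
        ran_since_reset = True
    return schedule, real_bugs


# Toy application (hypothetical): the database is a set of purchase orders.
db = set()

def reset():
    db.clear()
    db.add("P1")

def execute(name):
    if name == "T3":                 # T3 inserts purchase order P2
        db.add("P2")
        return True
    if name == "T4":                 # T4 expects exactly one purchase order
        return len(db) == 1
    return True                      # T1 and T2 do not touch the database

schedule, bugs = optimistic(["T1", "T2", "T3", "T4"], execute, reset)
print(" ".join(schedule))            # R T1 T2 T3 T4 R T4
print(bugs)                          # []
```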
76
Optimistic++
• Optimistic++: Record all failures (conflicts) to avoid executing a test run twice
– Test on Monday: R T1 T2 T3 R T3 … Tn
– Conflict recorded: <T1 T2> T3
– Test on Tuesday: R T1 T2 (Next? T3?)
– Test on Tuesday: R T1 T2 R T3 … Tn
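The Monday/Tuesday example can be reproduced with a small extension of the optimistic loop. This is a sketch under assumptions: a recorded conflict `<earlier> t` is taken to apply when all test runs in `earlier` have run since the last reset, the names `optimistic_pp`, `execute`, and `reset` are hypothetical, and the toy application (T2 inserts P2, T3 expects exactly one order) is invented for illustration.

```python
def optimistic_pp(tests, execute, reset, conflicts):
    """Like Optimistic, but reset proactively when a recorded conflict applies."""
    reset()
    schedule = ["R"]
    since_reset = []                          # test runs since the last reset
    for t in tests:
        known = any(later == t and earlier <= set(since_reset)
                    for earlier, later in conflicts)
        if known:                             # avoid a run that is known to fail
            reset()
            schedule.append("R")
            since_reset = []
        schedule.append(t)
        if not execute(t) and since_reset:
            conflicts.add((frozenset(since_reset), t))   # record <since_reset> t
            reset()
            schedule += ["R", t]
            since_reset = []
            execute(t)                        # a second failure would be a real bug
        since_reset.append(t)
    return schedule


db = set()                                    # toy database of purchase orders

def reset():
    db.clear()
    db.add("P1")

def execute(name):
    if name == "T2":                          # T2 inserts purchase order P2
        db.add("P2")
        return True
    if name == "T3":                          # T3 expects exactly one purchase order
        return len(db) == 1
    return True

conflicts = set()
monday = optimistic_pp(["T1", "T2", "T3"], execute, reset, conflicts)
tuesday = optimistic_pp(["T1", "T2", "T3"], execute, reset, conflicts)
print(" ".join(monday))                       # R T1 T2 T3 R T3
print(" ".join(tuesday))                      # R T1 T2 R T3
```

On Tuesday the recorded conflict triggers the reset before T3, so T3 is executed only once.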
77
Reordering Heuristics - Slice
• Slice: sequence of test runs without conflicts
• Collect <slice>s during each test
– Test Monday = R T1 T2 T3 R T3 T4 T5 R T5
– Slices = <T1 T2> <T3 T4> <T5>
• Run test again?
– Reorder slices according to the conflicts collected
– <T5> <T3 T4> <T1 T2>
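Collecting slices from an executed schedule and reordering them can be sketched as below. The reordering rule used here (move the slice containing the failed test run in front of the slice it conflicted with) is an assumption chosen because it reproduces the order on the slide; `collect_slices` and `reorder` are hypothetical names, and the schedule is assumed to start with a reset `R`.

```python
def collect_slices(schedule):
    """Split an executed schedule at resets into slices and recorded conflicts."""
    segments = []
    for item in schedule:
        if item == "R":
            segments.append([])
        else:
            segments[-1].append(item)
    slices, conflicts = [], []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if nxt and seg and seg[-1] == nxt[0]:
            # The last test run failed and was rerun after the reset.
            body = tuple(seg[:-1])
            if body:
                conflicts.append((body, seg[-1]))
        else:
            body = tuple(seg)
        if body:
            slices.append(body)
    return slices, conflicts


def reorder(slices, conflicts):
    """Move each slice holding a failed test run in front of the conflicting slice."""
    order = list(slices)
    for earlier, failed in conflicts:
        target = next(s for s in order if failed in s)
        i, j = order.index(earlier), order.index(target)
        if j > i:
            order.insert(i, order.pop(j))
    return order


monday = "R T1 T2 T3 R T3 T4 T5 R T5".split()
slices, conflicts = collect_slices(monday)
print(slices)                      # [('T1', 'T2'), ('T3', 'T4'), ('T5',)]
print(reorder(slices, conflicts))  # [('T5',), ('T3', 'T4'), ('T1', 'T2')]
```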
78
Yesterday's Test and Today's Test
Yesterday:
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8
Reordering:
O1: <T3> <T1 T2>
O2: <T6 T7 T8> <T5>
Merge:
Test Run Input Queue: <T3> <T6 T7 T8> <T1 T2> <T5>
Today:
M1: R T3 T1 R T1 T2
M2: R T6 T7 T8 T5 R T5
Reordering and Merge are then repeated for the next test.