parallel execution of test runs for database application systems

Parallel Execution of Test Runs for Database Application Systems

Donald Kossmann: ETH Zurich, i-TV-T AG
Florian Haftmann: i-TV-T AG
Eric Lo: ETH Zurich

2

Some Facts

• Microsoft spends 50% of their development cost on testing

• SAP product cycle = 18 months
  – 6 months to execute tests

Testing is the most expensive phase of the software development cycle

3

Observation

• The more test runs, the better
• However, it takes more time!
• Goal: Optimize Testing Time

4

Definition: Test Run Ti

• A sequence of requests
• Test Run “Login” (2 requests):

Req  Action            Value      Expected Result
1    Fill-in Login     Eric
     Fill-in Password  ********
2    Click Sign-in

5

Expected Result

Req  Action            Value      Expected Result
1    Fill-in ID        Eric
     Fill-in Password  ********
2    Click Sign-in

6

More Definitions

• Failed Test Run: At least one request does not return the expected result

• Test Database D: The state of an Application + Database at the beginning of each test

• Database Reset R: Bring the database back to D

7

A Test Run Fails When:

1. The application has a real bug
2. Or the test database is in a wrong state due to the execution of earlier test runs

Carry out resets to find real bugs

8

Resetting the Test Database?

Database: PurchaseOrder = {P1}

- P.O. Insertion
- Count P.O.
- …

9




TA: Insert Purchase Order P2

10


Database: PurchaseOrder = {P1, P2}

<TA Success>

- P.O. Insertion
- Count P.O.
- …

11



TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 1
<TB Success>


12




TB: Get Total Purchase Order
Expected Result: 1
Actual Result: 2
<TB Fails>

[Figure: the database still contains P2]

13


Database Reset is needed!

[Figure: Reset DB removes P2 from the database]
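The purchase-order scenario on the preceding slides can be replayed in a few lines. This is only an illustrative sketch: the in-memory set standing in for the database and the function names are assumptions, not part of the original system.

```python
def fresh_database():
    """Database reset R: return the initial test database D = {P1}."""
    return {"P1"}

def insert_purchase_order(db, order_id):
    """Test run TA: insert a purchase order; succeeds if it is stored."""
    db.add(order_id)
    return order_id in db

def count_purchase_orders(db, expected):
    """Test run TB: compare the number of purchase orders with `expected`."""
    return len(db) == expected

# Without a reset between TA and TB, TB sees P2 and fails.
db = fresh_database()
ta_ok = insert_purchase_order(db, "P2")
tb_without_reset = count_purchase_orders(db, expected=1)

# With a reset before TB, TB sees only P1 and succeeds.
db = fresh_database()
tb_with_reset = count_purchase_orders(db, expected=1)

print(ta_ok, tb_without_reset, tb_with_reset)  # True False True
```

TB fails here although the application has no bug; only the database state left behind by TA is to blame.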

14

Database Reset

• Resetting a database for a large-scale application takes about 2 minutes!

• Back-of-the-envelope calculation:
  – 10,000 test runs = 10,000 resets x 2 min ≈ 2 weeks spent on DB resets for 1 complete test
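The estimate can be checked with a short calculation. The sketch assumes one reset per test run and resets running around the clock; both are simplifications.

```python
test_runs = 10_000
reset_minutes = 2  # approximate duration of one database reset

total_minutes = test_runs * reset_minutes
total_hours = total_minutes / 60
total_days = total_hours / 24

print(f"{total_minutes} min = {total_hours:.1f} h = {total_days:.1f} days")
# 20000 min = 333.3 h = 13.9 days, i.e. roughly 2 weeks
```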

15


Reordering Test Runs

Database: PurchaseOrder = {P1, P2}

TB: Actual Result: 2
<TB Fails>

16


Reordering Test Runs

Database: PurchaseOrder = {P1}



17


Order Matters!

Database: PurchaseOrder = {P1, P2}

TA: Insert Purchase Order P2
<TA Success>
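To see why the order matters, one can try every order of the two test runs with a single initial reset and no reset in between. The sketch below uses a toy in-memory database; `run_ta`, `run_tb`, and `failures_without_reset` are hypothetical names chosen for illustration.

```python
from itertools import permutations

def run_ta(db):
    """TA: insert purchase order P2."""
    db.add("P2")
    return "P2" in db

def run_tb(db):
    """TB: expect exactly one purchase order."""
    return len(db) == 1

TESTS = {"TA": run_ta, "TB": run_tb}

def failures_without_reset(order):
    """Run the tests in `order` after one initial reset; return failed runs."""
    db = {"P1"}  # initial reset to D = {P1}
    return [name for name in order if not TESTS[name](db)]

results = {order: failures_without_reset(order) for order in permutations(TESTS)}
for order, failed in results.items():
    status = "no failures" if not failed else f"fails: {failed}"
    print(" ".join(order), "->", status)
```

Running TB before TA needs no extra reset, while running TA before TB makes TB fail and forces a reset.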



18

Our Previous Work (CIDR 2005)

• A test run depends on a correct state of a database

  – Control the database state
• Reduce the number of database resets
• Algorithms to optimize order of test runs

• No parallelism in testing

19

Can we do better if we have > 1 machine?

20

Parallel Testing is a Two-dimensional Problem!

1. Fully utilize the available resources
   • Load Balancing!
2. Same as on a single machine, we still have to control the database state
   • Reduce the database resets!

21

More about the Problem

• Regression test
  – Later stage of the development cycle
• Minor changes between versions
  – Execute the same set of test runs
• Version 1.1
  – Execute test: T1 T2 T3 T4
• Version 1.2 (Bug fixed and/or minor changes)
  – Execute test: T1 T2 T3 T4

22

Parallel Testing: Shared-Nothing vs. Shared-Database

23

Shared-Nothing (SN)

• If I work for IBM, I can install:
  – N applications
  – N databases
  – N machines
• One more machine:
  – More admin. work!
  – More license fees!
• Applications do not SHARE the database

[Figure: Machine 1 ... Machine N, each with its own Application and Database and its own sequence of test runs (e.g., T12, T4, ... and T5, T31, ...)]

24

Shared-Database (SDB)

• If I work for PoorEric.com, I install:
  – N threads (e.g., open N browsers)
  – 1 database
  – 1 machine
• The threads SHARE the database
• Test runs interfere with each other
  – Cannot scale as well as Shared-Nothing

[Figure: one Application and one Database on a single machine; Thread 1 ... Thread N each execute their own sequence of test runs (e.g., T12, T4, ... and T5, T31, ...)]

25

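Parallel Testing Framework

[Figure: a Scheduler takes test runs from an input queue (... T1 T5 T2 T6), consults the Conflicts DB and the execution histories (History M1 ... MN), decides whether a reset is needed ("Reset?"), and dispatches test runs (e.g., T1, T5, T2) to Machine/Thread 1 ... Machine/Thread N, each running an Application on a Database]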

26

Parallel Testing is a Two-dimensional Problem!

1. Fully utilize the available resources
   • Load Balancing!
2. Same as on a single machine, we still have to control the database state
   • Reduce the database resets!

27

Execution Strategies

• Optimistic Execution:
  – Reset the database only when it is a must
  – Example: R T1 T2 T3 T4 R T4 T5
• Optimistic++ Execution:
  – Avoid executing a test run twice
  – Example (Wk 1): R T1 T2 T3 T4 R T4 T5 (conflict recorded: <T1 T2 T3> T4)
  – Example (Wk 2): R T1 T2 T3 (Next is T4?) R T4 T5
• Slice Reordering Heuristics:
  – Slice: A sequence of test runs without conflicts
  – Example: R T1 T2 T3 T4 R T4 T5
  – Collect <slice>s during each test
• Graph Reordering Heuristics

28

Parallel Testing: Shared-Nothing (SN)

29

Shared-Nothing

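[Figure: the Scheduler takes test runs from the Test Run Input Queue (T1 T5 T2 T6 ...), consults the Conflicts DB, and dispatches them to Machine 1, Machine 2, ..., each with its own Application and Database that the Scheduler can reset]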

30

Test 1

[Animation, slides 30-41, condensed into one table. The Conflicts DB is empty at the start.]

Slide  Event                                         M1                M2                    Input queue
30     Initial state                                 R                 R                     T1 T5 T2 T6 T3 T7 T8
31     T1 dispatched                                 R T1              R                     T5 T2 T6 T3 T7 T8
32     T5 dispatched                                 R T1              R T5                  T2 T6 T3 T7 T8
33     T2 dispatched                                 R T1 T2           R T5                  T6 T3 T7 T8
34     T6 dispatched                                 R T1 T2           R T5 T6               T3 T7 T8
35     T3 dispatched                                 R T1 T2 T3        R T5 T6               T7 T8
36     T6 fails; M2 is reset                         R T1 T2 T3        R T5 T6 R             T7 T8
37     T6 rerun; conflict <T5> T6 recorded           R T1 T2 T3        R T5 T6 R T6          T7 T8
38     T3 fails; M1 is reset; <T1 T2> T3 recorded    R T1 T2 T3 R      R T5 T6 R T6          T7 T8
39-40  T7 and T8 dispatched                          R T1 T2 T3 R      R T5 T6 R T6 T7 T8    (empty)
41     T3 rerun                                      R T1 T2 T3 R T3   R T5 T6 R T6 T7 T8    (empty)

Conflicts DB after Test 1: <T5> T6, <T1 T2> T3

42

Shared-Nothing - Slice

3 major principles:

1. The slices in the input queue are ordered by:
   – Reordering the slices on each machine locally
   – Merging the partial orders
2. All test runs of the same slice are executed on the same machine
3. The scheduler makes sure conflicting slices are executed on different machines as much as possible
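Principles 2 and 3 can be illustrated with a small greedy placement routine. This is a hypothetical sketch, not the scheduler from the paper: it treats each recorded conflict as a simple (earlier, later) pair of test runs, measures load by the number of entries in a machine's history, and inserts a reset only when every machine would conflict.

```python
def assign_slices(slices, conflicts, num_machines):
    """Greedy placement of slices on machines.

    slices: list of tuples of test-run names, in queue order.
    conflicts: set of (earlier, later) pairs; `later` is assumed to fail if
        it runs after `earlier` on the same database without a reset.
    Returns one history list per machine; "R" marks a database reset.
    """
    machines = [["R"] for _ in range(num_machines)]      # each starts reset
    since_reset = [set() for _ in range(num_machines)]   # runs since last R

    def conflicts_with(m, slice_):
        return any((done, t) in conflicts
                   for done in since_reset[m] for t in slice_)

    for slice_ in slices:
        # Prefer the least-loaded machine that has no conflicting history.
        by_load = sorted(range(num_machines), key=lambda m: len(machines[m]))
        free = [m for m in by_load if not conflicts_with(m, slice_)]
        if free:
            target = free[0]
        else:
            # Every machine conflicts: reset the least-loaded one first.
            target = by_load[0]
            machines[target].append("R")
            since_reset[target].clear()
        machines[target].extend(slice_)    # a slice never spans machines
        since_reset[target].update(slice_)
    return machines


placement = assign_slices(
    slices=[("T3",), ("T6", "T7", "T8"), ("T1", "T2"), ("T5",)],
    conflicts={("T5", "T6"), ("T1", "T3"), ("T2", "T3"), ("T3", "T1")},
    num_machines=2,
)
for i, history in enumerate(placement, start=1):
    print(f"M{i}:", " ".join(history))
```

With these inputs the slice <T1 T2> is kept away from the machine that already ran T3, so no additional reset is needed.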

43

Collect Slices


Conflicts DB: <T5> T6, <T1 T2> T3

M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8

Slices collected on M1: <T1 T2> <T3>
Slices collected on M2: <T5> <T6 T7 T8>

44

Reordering Slices

M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8

Local Order M1: <T3> <T1 T2>
Local Order M2: <T6 T7 T8> <T5>

45

Merge Partial Order

M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8

Local Order M1: <T3> <T1 T2>
Local Order M2: <T6 T7 T8> <T5>


46

Shared-Nothing - Slice

3 major principles:

1. The slices in the input queue are ordered by:
   – Reordering the slices on each machine locally
   – Merging the partial orders
2. All test runs of the same slice are executed on the same machine
3. The scheduler makes sure conflicting slices are executed on different machines as much as possible

47

Test 10

[Animation, slides 47-54, condensed.]

Conflicts DB: <T5> T6, <T1 T2> T3, <T3> T1
Test Run Input Queue (slices, in dispatch order): <T3> <T6 T7 T8> <T1 T2> <T5>

1. M1: R, M2: R; all slices are in the input queue.
2. The scheduler dispatches <T3> to M1 (M1: R T3).
3. The scheduler dispatches <T6 T7 T8>.
4. Before dispatching <T1 T2>, the scheduler checks the Conflicts DB ("Conflict?"); <T5> is still in the input queue.
5. <T1 T2> and <T5> are dispatched.

55

Parallel Testing: Shared-Database (SDB)

56

Shared-Database

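[Figure: the Scheduler takes test runs from the input queue (T6 T2 T5 T1), consults the Conflicts DB (e.g., <T1 T5 T6> T2), and dispatches them to Thread 1, Thread 2, ..., each running an Application; all threads share one Database, which the Scheduler can reset]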


57

Shared-Database, Slice

• Similar to Shared-Nothing
• Different definition of a slice
• Different scheduling decisions

58

Performance Experiments

• Simulation:
  – 10,000 test runs (0 min – 3 min)
  – 10,000 (low) – 5M (high) conflicts
  – Uniform + Zipf distribution
  – SN: 1 to 50 machines
  – SDB: 1 to 10 threads
• Real data: 61 test runs
• Reporting average running time/reset of the last 10 tests (total 30 tests)

59

Shared-DB (Real Data)

Time unit: minute

               1 thread       5 threads      10 threads
Approach       Time  Reset    Time  Reset    Time  Reset
Optimistic++   41    7        22    6.6      16    5.8
Graph(MWD)     37    3.5      19    4.2      13    4.2
Slice          31    3        18    3.8      12    4.2

61

Experiment Summary

• Shared-Nothing (SN)
  – Linear scale-up, sometimes super-linear
• Shared-Database (SDB)
  – Scales up to 10 threads
• Heuristics:
  – Slice is the winner
• How about other distributions (e.g., Zipf)?
  – Similar results

62

Conclusions and Future Work

• Parallel execution of test runs?
  – It SCALES!
• Studied a dynamic scheduling approach for SN and SDB architectures:
  – Control the database state (minimize DB resets)
  – and load balancing
• How to generate test runs and test data for database application programs?
• More in the paper

63

Thank You

Main contact: [email protected]

64

Parallel Testing Framework

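[Figure: a Scheduler takes test runs from an input queue (... T4 T31 T5 T12), consults the Conflicts DB and the execution histories (History M1 ... MN), decides whether a reset is needed ("Reset?"), and dispatches test runs (e.g., T17, T8, T7, T9, T25, T13) to Machine/Thread 1 ... Machine/Thread N, each running an Application on a Database]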


65

Example: Shared-Nothing, Slice

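[Figure: the Scheduler takes test runs from the input queue (T6 T2 T5 T1), consults the Conflicts DB (e.g., <T1 T2> T3), and dispatches them to Machine 1, Machine 2, ..., each with its own Application and Database that can be reset independently]

Test 1:
M1: R T1 T4 ... R ...
M2: R T2 T3 T5 R ...

Test 1:
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8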

66

Shared-Database, Slice

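[Figure: the Scheduler takes test runs from the input queue (T6 T2 T5 T1) and dispatches them to Thread 1, Thread 2, ..., each running an Application against one shared Database, which the Scheduler can reset]

Test 1:
Th1: T1 T2 T2 T3
Th2: T5 T6 T7 T8 T8
Resets: R R R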

67


Shared-Database, Slice - Test 1

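[Figure: the Scheduler takes test runs from the input queue (T6 T2 T5 T1) and dispatches them to Thread 1, Thread 2, ..., each running an Application against one shared Database, which the Scheduler can reset]

Test 1:
Th1: T1 T2 T2 T3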

68

SDB – Subsequent Tests


Test 1:
Th1: T1 T2 T2 T3

Reordering: T2 T7 T3 T8

Test N:

69

Additional Issues - SDB

• How to do a database reset when a test run fails?
• Deferred:
  – The database reset is deferred and the failed test run is re-scheduled at the end
• Eager:
  – Abort all concurrent test runs and reset immediately
• Lazy*:
  – Do not accept new test runs, let the active test runs finish, and then reset

Test 1:
Th1: T1 T2 T2 T3
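The three reset policies can be contrasted with a small decision function. This is a sketch under stated assumptions: the names `ResetPolicy` and `plan_reset` are hypothetical, and the slide does not say where failed or aborted test runs re-enter the queue under Eager and Lazy, so placing them at the front is an assumption made here.

```python
from enum import Enum

class ResetPolicy(Enum):
    DEFERRED = "deferred"
    EAGER = "eager"
    LAZY = "lazy"

def plan_reset(policy, failed, active, pending):
    """Describe what happens when `failed` fails on the shared database.

    active:  test runs currently executing on other threads.
    pending: test runs still waiting in the input queue.
    """
    if policy is ResetPolicy.DEFERRED:
        # No reset now; the failed run is retried at the end of the test.
        return {"abort": [], "reset": "deferred", "queue": pending + [failed]}
    if policy is ResetPolicy.EAGER:
        # Abort everything in flight and reset at once.
        # Assumption: aborted runs are re-queued right after the failed run.
        return {"abort": list(active), "reset": "now",
                "queue": [failed] + list(active) + pending}
    if policy is ResetPolicy.LAZY:
        # Stop dispatching, let active runs finish, then reset.
        return {"abort": [], "reset": "after active runs finish",
                "queue": [failed] + pending}
    raise ValueError(f"unknown policy: {policy}")


plans = {p: plan_reset(p, "T2", ["T7"], ["T3", "T8"]) for p in ResetPolicy}
for policy, plan in plans.items():
    print(policy.value, plan)
```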

70

Shared-Nothing Performance

• Achieve linear scale-up?
  – Yes
• The best among the three:
  – Slice
• How about low conflict?
  – Similar results
• How about other distributions (e.g., Zipf)?
  – Similar results

71

Shared-Database Performance

• Scale-up if increasing the number of threads?
  – Yes, up to 10 threads
• If the number of conflicts is high, > 10 test threads might hurt performance
• The best among the three:
  – Slice

72

SN Simulation (High Conflict)

               1 machine      5 machines     10 machines    50 machines
Approach       Time  Reset    Time  Reset    Time  Reset    Time  Reset
Optimistic++   358   1788     72    1787     36    1775     6.8   1753
Slice          306   867      64    1098     32    1038     6.4   1048
Graph(MWD)     359   1792     71    1784     36    1780     7.6   1767

Time unit: hour
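The scale-up behind these numbers can be computed directly. The sketch below copies the running times from the SN simulation table and divides the 1-machine time by the N-machine time; the dictionary layout is just an illustrative choice.

```python
# Running time in hours, copied from the SN simulation table (High Conflict).
hours = {
    "Optimistic++": {1: 358, 5: 72, 10: 36, 50: 6.8},
    "Slice":        {1: 306, 5: 64, 10: 32, 50: 6.4},
    "Graph(MWD)":   {1: 359, 5: 71, 10: 36, 50: 7.6},
}

# Speedup relative to the single-machine run of the same approach.
speedup = {
    approach: {n: times[1] / t for n, t in times.items()}
    for approach, times in hours.items()
}

for approach, by_machines in speedup.items():
    row = ", ".join(f"{n} machines: {s:.1f}x" for n, s in by_machines.items())
    print(f"{approach}: {row}")
```

For Optimistic++ the speedup on 50 machines is about 52.6x, slightly more than 50, while Slice reaches about 47.8x but has the lowest absolute running time in every column.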

74

SDB Simulation

               1 thread       5 threads      10 threads     50 threads
Approach       Time  Reset    Time  Reset    Time  Reset    Time  Reset
Optimistic++   358   1788     160   1385     157   1231     258   1425
Slice          306   867      120   793      112   796      259   1422
MWD            359   1792     164   1396     156   1251     204   1067

Time unit: hour

75

Optimistic

• Let the test runs execute until a DB reset is really needed!
  – Optimistic: R T1 T2 T3 T4
  – If a test run T reports a failure:
    – Reset the database and then rerun T
    – Then, if T still reports a failure → a real bug!
  – Example:
    • Optimistic: R T1 T2 T3 T4 <T4 failure> R T4
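The optimistic strategy can be sketched as a loop that resets only after a failure and reruns the failed test once. The function and the toy database below are assumptions for illustration: `run_test` and `reset` are caller-supplied callables, and T3 and T4 are given made-up behavior so that T4 conflicts with T3.

```python
def run_optimistic(order, run_test, reset):
    """Execute test runs optimistically; return (trace, real_bugs)."""
    trace, real_bugs = ["R"], []
    reset()                       # every test starts with one reset
    for name in order:
        trace.append(name)
        if run_test(name):
            continue
        # Failure: reset the database, then rerun the test on a clean state.
        reset()
        trace += ["R", name]
        if not run_test(name):
            real_bugs.append(name)   # still failing on a fresh database
    return trace, real_bugs


# Toy setup (hypothetical): T3 inserts P2, T4 expects exactly one order.
db = set()

def reset():
    db.clear()
    db.add("P1")

def run_test(name):
    if name == "T3":
        db.add("P2")
        return True
    if name == "T4":
        return len(db) == 1
    return True

trace, bugs = run_optimistic(["T1", "T2", "T3", "T4"], run_test, reset)
print(" ".join(trace), "| real bugs:", bugs)
# R T1 T2 T3 T4 R T4 | real bugs: []
```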

76

Optimistic++

• Optimistic++: Record all failures (conflicts) to avoid executing a test run twice
  – Test on Monday: R T1 T2 T3 R T3 … Tn
    Conflict recorded: <T1 T2> T3
  – Test on Tuesday: R T1 T2 (Next? T3?)
  – Test on Tuesday: R T1 T2 R T3 … Tn
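The Monday/Tuesday example can be reproduced with a variant of optimistic execution that remembers conflicts between tests. This is a simplified sketch, not the paper's implementation: a conflict <T1 T2> T3 is stored as the set of test runs executed since the last reset, and a reset is issued up front whenever that whole set has been executed again.

```python
def run_optimistic_pp(order, run_test, reset, conflicts):
    """Optimistic++ sketch; `conflicts` maps a test run to a list of
    frozensets of test runs after which it failed. Updated in place."""
    trace, real_bugs, since_reset = ["R"], [], []
    reset()
    for name in order:
        executed = set(since_reset)
        if any(prefix <= executed for prefix in conflicts.get(name, [])):
            # Known conflict: reset first instead of running the test twice.
            reset()
            trace.append("R")
            since_reset = []
        trace.append(name)
        if not run_test(name):
            if since_reset:
                # New conflict: record it, reset, and rerun once.
                conflicts.setdefault(name, []).append(frozenset(since_reset))
                reset()
                trace += ["R", name]
                since_reset = []
                if not run_test(name):
                    real_bugs.append(name)
            else:
                real_bugs.append(name)   # failed on a freshly reset database
        since_reset.append(name)
    return trace, real_bugs


# Toy setup (hypothetical): T1 inserts P2, T3 expects exactly one order.
db = set()

def reset():
    db.clear()
    db.add("P1")

def run_test(name):
    if name == "T1":
        db.add("P2")
        return True
    if name == "T3":
        return len(db) == 1
    return True

conflict_db = {}
monday, _ = run_optimistic_pp(["T1", "T2", "T3"], run_test, reset, conflict_db)
tuesday, _ = run_optimistic_pp(["T1", "T2", "T3"], run_test, reset, conflict_db)
print("Monday: ", " ".join(monday))    # R T1 T2 T3 R T3
print("Tuesday:", " ".join(tuesday))   # R T1 T2 R T3
```

On Tuesday the number of resets is the same, but T3 is executed only once.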

77

Reordering Heuristics - Slice

• Slice: sequence of test runs without conflicts
• Collect <slice>s during each test
  – Test Monday = R T1 T2 T3 R T3 T4 T5 R T5
  – Slices = <T1 T2> <T3 T4> <T5>
• Run test again?
  – Reorder slices according to the conflicts collected: <T5> <T3 T4> <T1 T2>
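Collecting slices from an execution trace can be sketched as follows. The helper is hypothetical and assumes the trace starts with a reset and that a failed test run shows up as the last entry before a reset and again as the first entry after it; the reordering heuristic itself is not shown.

```python
def slices_and_conflicts(trace):
    """Split a trace such as "R T1 T2 T3 R T3" into slices and conflicts."""
    # Cut the trace into segments at every reset.
    segments = []
    for token in trace.split():
        if token == "R":
            segments.append([])
        else:
            segments[-1].append(token)

    slices, conflicts = [], []
    for i, segment in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if nxt and segment and nxt[0] == segment[-1]:
            # The last run failed and was rerun after the reset:
            # it belongs to the next slice, and a conflict is recorded.
            prefix, failed = segment[:-1], segment[-1]
            conflicts.append((tuple(prefix), failed))
            segment = prefix
        if segment:
            slices.append(tuple(segment))
    return slices, conflicts


slices, conflicts = slices_and_conflicts("R T1 T2 T3 R T3 T4 T5 R T5")
print(slices)     # [('T1', 'T2'), ('T3', 'T4'), ('T5',)]
print(conflicts)  # [(('T1', 'T2'), 'T3'), (('T3', 'T4'), 'T5')]
```

The result matches the slices <T1 T2> <T3 T4> <T5> from the Monday example, together with the conflicts <T1 T2> T3 and <T3 T4> T5.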

78

Test on Yesterday and Test on Today

Yesterday:
M1: R T1 T2 T3 R T3
M2: R T5 T6 R T6 T7 T8

Reordering:
O1: <T3> <T1 T2>
O2: <T6 T7 T8> <T5>

Merge: T6 T7 T8 T3

Today:
M1: R T3 T1 R T1 T2
M2: R T6 T7 T8 T5 R T5

(followed again by Reordering and Merge)

79

False Positive

• Case 1: Buggy Application + Consistent DB State → Tx “Fails”
• Case 2: Buggy Application + Inconsistent DB State → Tx “Success”
  – The inconsistent DB “helps” the test run by coincidence!
• This is a tradeoff between speed and nitpicking accuracy
