TRANSCRIPT
Nov 6, 2008. Presented by Amy Siu and EJ Park.
[Diagram: Application Release 1 with its R1 test cases; Application Release 2 with its R2 test cases, reusing the R1 test cases]
Regression testing is expensive!
Validate modified software
◦ Often with existing test cases from previous release(s)
◦ Ensure existing features are still working
A strategy to
◦ Minimize the test suite
◦ Maximize fault detection ability
Considerations and trade-offs
◦ Cost to select test cases
◦ Time to execute the test suite
◦ Fault detection effectiveness
Regression test case selection techniques affect the cost-effectiveness of regression testing
Empirical evaluation of 5 selection techniques
No new technique proposed
[Diagram: program P (Application Release 1) with test suite T evolves into P' (Application Release 2), for which T', T'', and T''' are derived]
Programs: P (original), P' (modified)
Test suite: T for P
Selected test cases: T' ⊆ T
New test cases: T'' for P'
New test suite: T''' for P', including the selection from T'
Regression test selection problem
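To make these definitions concrete, here is a minimal Python sketch of the selection interface; the function name and types are illustrative assumptions, not from the paper:

```python
from typing import Callable, Set

TestCase = str  # stand-in type for a test case identifier

def regression_test_suite(
    T: Set[TestCase],
    select: Callable[[Set[TestCase]], Set[TestCase]],  # an RTS technique: picks T' ⊆ T
    new_tests: Set[TestCase],                          # T'': new tests written for P'
) -> Set[TestCase]:
    """Build T''' for P': the selected subset T' of T, plus the new tests T''."""
    T_prime = select(T)
    assert T_prime <= T  # the technique may only choose from the existing suite T
    return T_prime | new_tests
```

The five techniques below are, in this framing, five different implementations of `select`.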
5 test case selection techniques
◦ Minimization
◦ Dataflow
◦ Safe
◦ Ad Hoc / Random
◦ Retest-All
Minimization
• Select minimal sets of test cases T'
• Cover only modified or affected portions of P
– '81 Fischer et al.
– '90 Hartmann and Robson
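The idea can be sketched as a greedy set cover; the data structures below are illustrative assumptions, not the tools from the cited papers:

```python
from typing import Dict, Set

def minimize(coverage: Dict[str, Set[str]], modified: Set[str]) -> Set[str]:
    """Greedily pick a small T': few tests that together cover all modified units.

    coverage maps a test case id to the program units it exercises;
    modified is the set of units changed or affected in P'.
    """
    selected: Set[str] = set()
    uncovered = set(modified)
    while uncovered:
        # The test that covers the most still-uncovered modified units.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered), default=None)
        if best is None or not (coverage[best] & uncovered):
            break  # no test covers the remaining modified units
        selected.add(best)
        uncovered -= coverage[best]
    return selected
```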
Dataflow
• Select test cases T' that exercise data interactions affected by the modifications in P'
– '88 Harrold and Soffa
– '88 Ostrand and Weyuker
– '89 Taha et al.
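A hypothetical sketch of this selection style: derive the def-use pairs touched by the modifications, then keep every test that exercises at least one of them (the pair encoding is an assumption):

```python
from typing import Dict, Set, Tuple

DefUsePair = Tuple[int, int]  # (location defining a variable, location using it)

def affected_pairs(all_pairs: Set[DefUsePair], modified: Set[int]) -> Set[DefUsePair]:
    """A def-use pair is affected if its definition or its use was modified."""
    return {(d, u) for (d, u) in all_pairs if d in modified or u in modified}

def dataflow_select(exercised: Dict[str, Set[DefUsePair]],
                    affected: Set[DefUsePair]) -> Set[str]:
    """Select every test that exercises at least one affected def-use pair."""
    return {t for t, pairs in exercised.items() if pairs & affected}
```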
Safe
• Guarantee that T' contains all test cases in T that can reveal faults in P'
– '92 Laski and Szermer
– '94 Chen et al.
– '97 Rothermel and Harrold
– '97 Vokolos and Frankl
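A much-simplified sketch of the "dangerous edge" idea behind DejaVu (the real algorithm walks the control flow graphs of P and P' in parallel; the trace representation here is an assumption):

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # a control-flow edge: (source node, target node)

def safe_select(traces: Dict[str, Set[Edge]], dangerous: Set[Edge]) -> Set[str]:
    """Select every test whose execution trace crosses a dangerous edge.

    Any test that could behave differently on P' must cross at least one
    dangerous edge, so no fault-revealing test in T is omitted (the 'safe'
    guarantee), at the cost of possibly rerunning tests that did not need it.
    """
    return {t for t, edges in traces.items() if edges & dangerous}
```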
Ad Hoc / Random
• Select T' based on hunches, or loose associations of test cases with functionality
Retest-All
• “Select” all the test cases in T to test P'
How do the techniques differ?
◦ The ability to reduce regression testing cost
◦ The ability to detect faults
◦ Trade-offs between test suite size reduction and fault detection
◦ Cost-effectiveness comparison
◦ Factors that affect the efficiency and effectiveness of test selection techniques
Calculating the cost of RTS (regression test selection) techniques
They measure
◦ Reduction of E(T') by calculating the size reduction
◦ Average of A by simulating on several machines
cost = A + E(T')
A: the cost of the analysis required to select test cases
E(T'): the cost of executing and validating the selected test cases
Reduction = |T'| / |T|
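In code form, the two measures might look like this (a sketch; in the study, A was obtained by simulation on several machines rather than computed):

```python
def rts_cost(analysis_cost: float, exec_cost_selected: float) -> float:
    """cost = A + E(T'): analysis cost plus the cost of executing
    and validating the selected test cases."""
    return analysis_cost + exec_cost_selected

def size_reduction(T: set, T_prime: set) -> float:
    """Reduction = |T'| / |T|: the fraction of the original suite selected."""
    return len(T_prime) / len(T)
```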
On a per-test-case basis
◦ Effectiveness = the number of test cases that reveal a fault in P' and are in T but not in T' (fault-revealing tests the selection missed)
On a per-test-suite basis
◦ Classify the result of each test selection:
(1) no test case in T is fault-revealing, so none in T' is either; or
(2) some test cases in both T and T' reveal the fault; or
(3) some test cases in T reveal the fault, but none in T' does.
◦ Effectiveness = 1 − (percentage of outcomes in which T' reveals no fault that T reveals)
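A sketch of both effectiveness measures under the definitions above; the outcome encoding is an assumption:

```python
from typing import List, Set

def per_test_case_misses(fault_revealing_in_T: Set[str], T_prime: Set[str]) -> int:
    """Per-test-case basis: fault-revealing test cases in T that T' omitted."""
    return len(fault_revealing_in_T - T_prime)

def per_suite_effectiveness(outcomes: List[int]) -> float:
    """Per-test-suite basis. Each trial outcome is coded:
    1 = no test in T reveals the fault (so neither does T'),
    2 = both T and T' contain a fault-revealing test,
    3 = T reveals the fault but T' does not (a miss).
    Effectiveness = 1 - (proportion of misses)."""
    return 1 - outcomes.count(3) / len(outcomes)
```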
Their choice
Programs: All C programs
◦ The Siemens Programs: 7 C programs
◦ Space: Interpreter for an array definition language
◦ Player: Subsystem of Empire (Internet game)
[Table: subject programs and their faulty versions]
How do the authors create the test pools and test suites?
Siemens programs
◦ Constructed a test pool of black-box test cases from Hutchins et al.
◦ Added additional white-box test cases
Space
◦ 10,000 randomly generated test cases from Vokolos and Frankl
◦ Added new test cases obtained by exercising the CFG
Player
◦ Five different versions of player, with a designated "base" version
◦ Created their own test cases from Empire information files
Test Pool Design
[Diagram: For the Siemens programs and Space, a random number generator draws test cases (TC1, TC2, TC3, ...) from each program's test pool (P1 ... P8) to build per-program test suites; for Player, test cases are grouped by command (command1, command2, ...) and drawn by random selection]
Test Suite Design
Test suite sizes: Siemens 0.06%~19.77%; Space 0.04%~94.35%; Player 0.77%~4.55%
Minimization
◦ Created a simulator tool
Dataflow
◦ Simulated a dataflow testing tool
◦ Def-use pairs affected by the modifications
Safe
◦ DejaVu: Rothermel and Harrold's RTS algorithm, which detects "dangerous edges"
◦ Aristotle: program analysis system
Random: select n% of the test cases in T at random (sketched below)
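The random(n) technique is simple enough to sketch directly with Python's standard library:

```python
import random
from typing import List, Optional

def random_select(T: List[str], n: float, seed: Optional[int] = None) -> List[str]:
    """random(n): select n% of the test cases in T uniformly at random."""
    rng = random.Random(seed)
    k = round(len(T) * n / 100)  # n = 25, 50, or 75 in this study
    return rng.sample(T, k)
```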
(Dataflow: only for the Siemens programs)
Variables
◦ Independent
 9 programs (the 7 Siemens programs, Space, and Player)
 RTS technique (safe, dataflow, minimization, random(25, 50, 75), retest-all)
 Test suite creation criterion
◦ Dependent
 The average reduction in test suite size
 Fault detection effectiveness
Design
◦ Test suites per program: 100 coverage-based + 100 random (a plausible construction is sketched below)
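A sketch of how a coverage-based suite could be drawn from a test pool; the study's actual procedure follows Hutchins et al., so this greedy random loop is an assumption:

```python
import random
from typing import Dict, List, Set

def coverage_based_suite(
    pool: Dict[str, Set[str]],  # test id -> coverage units (e.g., CFG edges) it hits
    required: Set[str],         # units the finished suite must cover
    seed: int = 0,
) -> List[str]:
    """Draw tests from the pool in random order, keeping only those that add
    new coverage, until every required unit is covered."""
    rng = random.Random(seed)
    order = list(pool)
    rng.shuffle(order)
    suite: List[str] = []
    covered: Set[str] = set()
    for t in order:
        if pool[t] - covered:    # keep the test only if it adds coverage
            suite.append(t)
            covered |= pool[t]
        if required <= covered:  # stop once the criterion is met
            break
    return suite
```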
Internal
◦ Instrumentation effects can bias results
 They ran each test selection algorithm on each test suite and each subject program
External
◦ Limited ability to generalize results to industrial practice
 Small size and simple fault patterns of the subject programs
 Only the corrective maintenance process is covered
Construct
◦ Adequacy of measurement
 The cost and effectiveness measurements are too coarse!
Comparison 1
◦ Test size reduction
◦ Fault detection effectiveness
Comparison 2
◦ Program-analysis-based techniques: minimization, safe, and dataflow
◦ Random technique
Random techniques: constant percentage of test cases
Minimization: always chooses one test case
Safe and Dataflow: similar behavior on the Siemens programs
Safe: best on Space and Player
Random techniques: effectiveness increases with test suite size
Random techniques: the rate of increase diminishes as size grows
Minimization: the lowest overall effectiveness
Safe and Dataflow: similar median performance on the Siemens programs
Random techniques
◦ Effective in general
◦ Selection ratio ↑: effectiveness ↑, rate of increase ↓
Minimization
◦ Very high reduction
◦ Widely varying effectiveness
Safe
◦ 100% effectiveness
◦ Widely varying test suite size
Dataflow
◦ 100% effectiveness too, but not safe
Minimization vs. Random
◦ Assumption: the k value = analysis time
◦ Comparison method (sketched below)
 Start from a trial value of k
 Choose a test suite via minimization
 Choose |test suite| + k test cases at random
 Adjust k until the effectiveness of the two is equal
◦ Comparison result
 For coverage-based test suites: k = 2.7
 For random test suites: k = 4.65
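The comparison method amounts to a simple search for the break-even k; a hypothetical sketch, with the effectiveness functions standing in for the study's measurements:

```python
from typing import Callable

def break_even_k(
    min_suite_size: int,
    eff_min: float,                      # measured effectiveness of minimization
    eff_random: Callable[[int], float],  # effectiveness of a random suite of a given size
    step: float = 0.1,
    k_max: float = 50.0,
) -> float:
    """Grow k until a random suite of |minimized suite| + k test cases matches
    minimization's effectiveness; the study reports k = 2.7 on average for
    coverage-based suites and k = 4.65 for random suites."""
    k = 0.0
    while k <= k_max:
        if eff_random(round(min_suite_size + k)) >= eff_min:
            return k
        k += step
    return k_max
```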
Safe vs. Random
◦ Same assumption about k
◦ Find the k at which the random technique reaches a fixed 100(1−p)% fault detection rate
◦ Comparison results
 Coverage-based suites: k = 0 gives 96.7%; k = 0.1 gives 99%
 Random suites: k = 0 gives 89%; k = 10 gives 95%; k = 25 gives 99%
Safe vs. Retest-All
◦ When is safe desirable?
 When the analysis cost is less than the cost of running the unselected test cases
 The test suite reduction achieved depends on the program
Minimization
◦ Smallest suite size but least effective
◦ "On the average" applies only to long-run behavior
◦ The number of test cases to choose depends on run time
Safe and Dataflow
◦ Nearly equivalent average cost-effectiveness
◦ Why is Safe better than Dataflow?
◦ When is Dataflow useful?
◦ Better (cheaper) analysis is required for Safe
Random
◦ Constant percentage of size reduction
◦ Size ↑, fault detection effectiveness ↑
Retest-All
◦ No size reduction, 100% fault detection effectiveness
(1) Improve the cost model with other factors
(2) Extend the analysis to multiple types of faults
(3) Develop time-series-based models
(4) Scalability with more complex fault distributions
Follow-up work after the current paper:
2001: Regression test selection for Java software [1]
2002: Test prioritization [2]; cost-benefit models with more factors [3],[4]
2003: Using field data [5],[6]
2004: Larger software [7]
2005: Larger and more complex software [8]
2006: Improved cost model [9]; multiple types of faults [10]
2007: 2 papers; 2008: 4 papers
[1] Mary Jean Harrold, James A. Jones, Tongyu Li, Donglin Liang, Alessandro Orso, Maikel Pennings, Saurabh Sinha, Steven Spoon, “Regression Test Selection for Java Software”, OOPSLA 2001, October 2001.
[2] Jung-Min Kim, Adam Porter, “A history-based test prioritization technique for regression testing in resource constrained environments”, 24th International Conference on Software Engineering, May 2002.
[3] A. G. Malishevsky, G. Rothermel, and S. Elbaum, “Modeling the Cost-Benefits Tradeoffs for Regression Testing Techniques”, Proceedings of the International Conference on Software Maintenance, October 2002.
[4] S. Elbaum, P. Kallakuri, A. Malishevsky, G. Rothermel, and S. Kanduri, “Understanding the Effects of Changes on the Cost-Effectiveness of Regression Testing Techniques”, Technical Report 020701, Department of Computer Science and Engineering, University of Nebraska-Lincoln, July 2002.
[5] Alessandro Orso, Taweesup Apiwattanapong, Mary Jean Harrold, “Improving Impact Analysis and Regression Testing Using Field Data”. RAMSS 2003, May 2003.
[6] Taweesup Apiwattanapong, Alessandro Orso, Mary Jean Harrold, “Leveraging Field Data for Impact Analysis and Regression Testing”, ESEC/FSE 2003, September 2003.
[7] Alessandro Orso, Nanjuan Shi, Mary Jean Harrold, “Scaling Regression Testing to Large Software Systems”, FSE 2004, November 2004.
[8] J. M. Kim, A. Porter, and G. Rothermel, “An Empirical Study of Regression Test Application Frequency”, Journal of Software Testing, Verification, and Reliability, V. 15, no. 4, December 2005, pages 257-279.
[9] H. Do and G. Rothermel, “An Empirical Study of Regression Testing Techniques Incorporating Context and Lifecycle Factors and Improved Cost-Benefit Models”, FSE 2006, November 2006.
[10] H. Do and G. Rothermel, “On the Use of Mutation Faults in Empirical Assessments of Test Case Prioritization Techniques”, IEEE Transactions on Software Engineering, V. 32, No. 9, September 2006, pages 733-752.