TRANSCRIPT
Nov 6, 2008. Presented by Amy Siu and EJ Park.
[Diagram: Application Release 1 with its R1 test cases; Application Release 2 with its R2 test cases, reusing the R1 test cases]
Regression testing is expensive!
Validate modified software
◦ Often with existing test cases from previous release(s)
◦ Ensure existing features are still working
A strategy to
◦ Minimize the test suite
◦ Maximize fault detection ability
Considerations and trade-offs
◦ Cost to select test cases
◦ Time to execute the test suite
◦ Fault detection effectiveness
Regression test case selection techniques affect the cost-effectiveness of regression testing
Empirical evaluation of 5 selection techniques
No new technique proposed
[Diagram: program P (Application Release 1) with test suite T evolves into P' (Application Release 2), for which T', T'', and T''' are derived]
Programs: P (original), P' (modified)
Test suite: T for P
Selected test cases: T' ⊆ T
New test cases: T'' for P'
New test suite: T''' for P', including the selection from T'
Regression test selection problem
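To make these definitions concrete, here is a minimal Python sketch of the selection interface; the function name and types are illustrative assumptions, not from the paper:

```python
from typing import Callable, Set

TestCase = str  # stand-in type for a test case identifier

def regression_test_suite(
    T: Set[TestCase],
    select: Callable[[Set[TestCase]], Set[TestCase]],  # an RTS technique: picks T' ⊆ T
    new_tests: Set[TestCase],                          # T'': new tests written for P'
) -> Set[TestCase]:
    """Build T''' for P': the selected subset T' of T, plus the new tests T''."""
    T_prime = select(T)
    assert T_prime <= T  # the technique may only choose from the existing suite T
    return T_prime | new_tests
```

The five techniques below are, in this framing, five different implementations of `select`.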
5 test case selection techniques
◦ Minimization
◦ Dataflow
◦ Safe
◦ Ad Hoc / Random
◦ Retest-All
Minimization
• Select minimal sets of test cases T'
• Cover only modified or affected portions of P
– '81 Fischer et al.
– '90 Hartmann and Robson
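The idea can be sketched as a greedy set cover; the data structures below are illustrative assumptions, not the tools from the cited papers:

```python
from typing import Dict, Set

def minimize(coverage: Dict[str, Set[str]], modified: Set[str]) -> Set[str]:
    """Greedily pick a small T': few tests that together cover all modified units.

    coverage maps a test case id to the program units it exercises;
    modified is the set of units changed or affected in P'.
    """
    selected: Set[str] = set()
    uncovered = set(modified)
    while uncovered:
        # The test that covers the most still-uncovered modified units.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered), default=None)
        if best is None or not (coverage[best] & uncovered):
            break  # no test covers the remaining modified units
        selected.add(best)
        uncovered -= coverage[best]
    return selected
```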
Dataflow
• Select test cases T' that exercise data interactions affected by the modifications in P'
– '88 Harrold and Soffa
– '88 Ostrand and Weyuker
– '89 Taha et al.
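A hypothetical sketch of this selection style: derive the def-use pairs touched by the modifications, then keep every test that exercises at least one of them (the pair encoding is an assumption):

```python
from typing import Dict, Set, Tuple

DefUsePair = Tuple[int, int]  # (location defining a variable, location using it)

def affected_pairs(all_pairs: Set[DefUsePair], modified: Set[int]) -> Set[DefUsePair]:
    """A def-use pair is affected if its definition or its use was modified."""
    return {(d, u) for (d, u) in all_pairs if d in modified or u in modified}

def dataflow_select(exercised: Dict[str, Set[DefUsePair]],
                    affected: Set[DefUsePair]) -> Set[str]:
    """Select every test that exercises at least one affected def-use pair."""
    return {t for t, pairs in exercised.items() if pairs & affected}
```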
Safe
• Guarantee that T' contains all test cases in T that can reveal faults in P'
– '92 Laski and Szermer
– '94 Chen et al.
– '97 Rothermel and Harrold
– '97 Vokolos and Frankl
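A much-simplified sketch of the "dangerous edge" idea behind DejaVu (the real algorithm walks the control flow graphs of P and P' in parallel; the trace representation here is an assumption):

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # a control-flow edge: (source node, target node)

def safe_select(traces: Dict[str, Set[Edge]], dangerous: Set[Edge]) -> Set[str]:
    """Select every test whose execution trace crosses a dangerous edge.

    Any test that could behave differently on P' must cross at least one
    dangerous edge, so no fault-revealing test in T is omitted (the 'safe'
    guarantee), at the cost of possibly rerunning tests that did not need it.
    """
    return {t for t, edges in traces.items() if edges & dangerous}
```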
Ad Hoc / Random
• Select T' based on hunches, or loose associations of test cases with functionality
Retest-All
• “Select” all the test cases in T to test P'
How do the techniques differ?
◦ The ability to reduce regression testing cost
◦ The ability to detect faults
◦ Trade-offs between test suite size reduction and fault detection
◦ Cost-effectiveness comparison
◦ Factors that affect the efficiency and effectiveness of test selection techniques
Calculating the cost of RTS (regression test selection) techniques
They measure
◦ Reduction of E(T') by calculating the size reduction
◦ Average of A by simulating on several machines
cost = A + E(T')
A: the cost of the analysis required to select test cases
E(T'): the cost of executing and validating the selected test cases
Reduction = |T'| / |T|
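In code form, the two measures might look like this (a sketch; in the study, A was obtained by simulation on several machines rather than computed):

```python
def rts_cost(analysis_cost: float, exec_cost_selected: float) -> float:
    """cost = A + E(T'): analysis cost plus the cost of executing
    and validating the selected test cases."""
    return analysis_cost + exec_cost_selected

def size_reduction(T: set, T_prime: set) -> float:
    """Reduction = |T'| / |T|: the fraction of the original suite selected."""
    return len(T_prime) / len(T)
```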
On a per-test-case basis
◦ Effectiveness = the number of test cases that reveal a fault in P' and are in T but not in T' (fault-revealing tests the selection missed)
On a per-test-suite basis
◦ Classify the result of each test selection:
(1) no test case in T is fault-revealing, so none in T' is either; or
(2) some test cases in both T and T' reveal the fault; or
(3) some test cases in T reveal the fault, but none in T' does.
◦ Effectiveness = 1 − (percentage of outcomes in which T' reveals no fault that T reveals)
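A sketch of both effectiveness measures under the definitions above; the outcome encoding is an assumption:

```python
from typing import List, Set

def per_test_case_misses(fault_revealing_in_T: Set[str], T_prime: Set[str]) -> int:
    """Per-test-case basis: fault-revealing test cases in T that T' omitted."""
    return len(fault_revealing_in_T - T_prime)

def per_suite_effectiveness(outcomes: List[int]) -> float:
    """Per-test-suite basis. Each trial outcome is coded:
    1 = no test in T reveals the fault (so neither does T'),
    2 = both T and T' contain a fault-revealing test,
    3 = T reveals the fault but T' does not (a miss).
    Effectiveness = 1 - (proportion of misses)."""
    return 1 - outcomes.count(3) / len(outcomes)
```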
Their choice
Programs: All C programs
◦ The Siemens Programs: 7 C programs
◦ Space: Interpreter for an array definition language
◦ Player: Subsystem of Empire (Internet game)
[Table: subject programs and their faulty versions]
How do the authors create the test pools and test suites?
Siemens programs
◦ Constructed a test pool of black-box test cases from Hutchins et al.
◦ Added additional white-box test cases
Space
◦ 10,000 randomly generated test cases from Vokolos and Frankl
◦ Added new test cases obtained by exercising the CFG
Player
◦ Five different versions of player, with a designated "base" version
◦ Created their own test cases from Empire information files
Test Pool Design
[Diagram: For the Siemens programs and Space, a random number generator draws test cases (TC1, TC2, TC3, ...) from each program's test pool (P1 ... P8) to build per-program test suites; for Player, test cases are grouped by command (command1, command2, ...) and drawn by random selection]
Test Suite Design
Test suite sizes: Siemens 0.06%~19.77%; Space 0.04%~94.35%; Player 0.77%~4.55%
Minimization
◦ Created a simulator tool
Dataflow
◦ Simulated a dataflow testing tool
◦ Def-use pairs affected by the modifications
Safe
◦ DejaVu: Rothermel and Harrold's RTS algorithm, which detects "dangerous edges"
◦ Aristotle: program analysis system
Random: select n% of the test cases in T at random (sketched below)
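The random(n) technique is simple enough to sketch directly with Python's standard library:

```python
import random
from typing import List, Optional

def random_select(T: List[str], n: float, seed: Optional[int] = None) -> List[str]:
    """random(n): select n% of the test cases in T uniformly at random."""
    rng = random.Random(seed)
    k = round(len(T) * n / 100)  # n = 25, 50, or 75 in this study
    return rng.sample(T, k)
```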
(Dataflow: only for the Siemens programs)
Variables
◦ Independent
 9 programs (the 7 Siemens programs, Space, and Player)
 RTS technique (safe, dataflow, minimization, random(25, 50, 75), retest-all)
 Test suite creation criterion
◦ Dependent
 The average reduction in test suite size
 Fault detection effectiveness
Design
◦ Test suites per program: 100 coverage-based + 100 random (a plausible construction is sketched below)
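A sketch of how a coverage-based suite could be drawn from a test pool; the study's actual procedure follows Hutchins et al., so this greedy random loop is an assumption:

```python
import random
from typing import Dict, List, Set

def coverage_based_suite(
    pool: Dict[str, Set[str]],  # test id -> coverage units (e.g., CFG edges) it hits
    required: Set[str],         # units the finished suite must cover
    seed: int = 0,
) -> List[str]:
    """Draw tests from the pool in random order, keeping only those that add
    new coverage, until every required unit is covered."""
    rng = random.Random(seed)
    order = list(pool)
    rng.shuffle(order)
    suite: List[str] = []
    covered: Set[str] = set()
    for t in order:
        if pool[t] - covered:    # keep the test only if it adds coverage
            suite.append(t)
            covered |= pool[t]
        if required <= covered:  # stop once the criterion is met
            break
    return suite
```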
Internal
◦ Instrumentation effects can bias results
 They ran each test selection algorithm on each test suite and each subject program
External
◦ Limited ability to generalize results to industrial practice
 Small size and simple fault patterns of the subject programs
 Only the corrective maintenance process is covered
Construct
◦ Adequacy of measurement
 The cost and effectiveness measurements are too coarse!
Comparison 1
◦ Test size reduction
◦ Fault detection effectiveness
Comparison 2
◦ Program-analysis-based techniques: minimization, safe, and dataflow
◦ Random technique
Random techniques: constant percentage of test cases
Minimization: always chooses one test case
Safe and Dataflow: similar behavior on the Siemens programs
Safe: best on Space and Player
Random techniques: effectiveness increases with test suite size
Random techniques: the rate of increase diminishes as size grows
Minimization: the lowest overall effectiveness
Safe and Dataflow: similar median performance on the Siemens programs
Random techniques
◦ Effective in general
◦ Selection ratio ↑: effectiveness ↑, rate of increase ↓
Minimization
◦ Very high reduction
◦ Widely varying effectiveness
Safe
◦ 100% effectiveness
◦ Widely varying test suite size
Dataflow
◦ 100% effectiveness too, but not safe
Minimization vs. Random
◦ Assumption: the k value = analysis time
◦ Comparison method (sketched below)
 Start from a trial value of k
 Choose a test suite via minimization
 Choose |test suite| + k test cases at random
 Adjust k until the effectiveness of the two is equal
◦ Comparison result
 For coverage-based test suites: k = 2.7
 For random test suites: k = 4.65
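The comparison method amounts to a simple search for the break-even k; a hypothetical sketch, with the effectiveness functions standing in for the study's measurements:

```python
from typing import Callable

def break_even_k(
    min_suite_size: int,
    eff_min: float,                      # measured effectiveness of minimization
    eff_random: Callable[[int], float],  # effectiveness of a random suite of a given size
    step: float = 0.1,
    k_max: float = 50.0,
) -> float:
    """Grow k until a random suite of |minimized suite| + k test cases matches
    minimization's effectiveness; the study reports k = 2.7 on average for
    coverage-based suites and k = 4.65 for random suites."""
    k = 0.0
    while k <= k_max:
        if eff_random(round(min_suite_size + k)) >= eff_min:
            return k
        k += step
    return k_max
```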
Safe vs. Random
◦ Same assumption about k
◦ Find the k at which the random technique reaches a fixed 100(1−p)% fault detection rate
◦ Comparison results
 Coverage-based suites: k = 0 gives 96.7%; k = 0.1 gives 99%
 Random suites: k = 0 gives 89%; k = 10 gives 95%; k = 25 gives 99%
Safe vs. Retest-All
◦ When is safe desirable?
 When the analysis cost is less than the cost of running the unselected test cases
 The test suite reduction achieved depends on the program
Minimization
◦ Smallest suite size but least effective
◦ "On the average" applies only to long-run behavior
◦ The number of test cases to choose depends on run time
Safe and Dataflow
◦ Nearly equivalent average cost-effectiveness
◦ Why is Safe better than Dataflow?
◦ When is Dataflow useful?
◦ Better (cheaper) analysis is required for Safe
Random
◦ Constant percentage of size reduction
◦ Size ↑, fault detection effectiveness ↑
Retest-All
◦ No size reduction, 100% fault detection effectiveness
(1) Improve the cost model with other factors
(2) Extend the analysis to multiple types of faults
(3) Develop time-series-based models
(4) Scalability with more complex fault distributions
Follow-up work after the current paper:
2001: Regression test selection for Java software [1]
2002: Test prioritization [2]; cost-benefit models with more factors [3],[4]
2003: Using field data [5],[6]
2004: Larger software [7]
2005: Larger and more complex software [8]
2006: Improved cost model [9]; multiple types of faults [10]
2007: 2 papers; 2008: 4 papers
[1] Mary Jean Harrold, James A. Jones, Tongyu Li, Donglin Liang, Alessandro Orso, Maikel Pennings, Saurabh Sinha, Steven Spoon, “Regression Test Selection for Java Software”, OOPSLA 2001, October 2001.
[2] Jung-Min Kim, Adam Porter, “A history-based test prioritization technique for regression testing in resource constrained environments”, 24th International Conference on Software Engineering, May 2002.
[3] A. G. Malishevsky, G. Rothermel, and S. Elbaum, “Modeling the Cost-Benefits Tradeoffs for Regression Testing Techniques”, Proceedings of the International Conference on Software Maintenance, October 2002.
[4] S. Elbaum, P. Kallakuri, A. Malishevsky, G. Rothermel, and S. Kanduri, “Understanding the Effects of Changes on the Cost-Effectiveness of Regression Testing Techniques”, Technical Report 020701, Department of Computer Science and Engineering, University of Nebraska-Lincoln, July 2002.
[5] Alessandro Orso, Taweesup Apiwattanapong, Mary Jean Harrold, “Improving Impact Analysis and Regression Testing Using Field Data”. RAMSS 2003, May 2003.
[6] Taweesup Apiwattanapong, Alessandro Orso, Mary Jean Harrold, “Leveraging Field Data for Impact Analysis and Regression Testing”, ESEC/FSE 2003, September 2003.
[7] Alessandro Orso, Nanjuan Shi, Mary Jean Harrold, “Scaling Regression Testing to Large Software Systems”, FSE 2004, November 2004.
[8] J. M. Kim, A. Porter, and G. Rothermel, “An Empirical Study of Regression Test Application Frequency”, Journal of Software Testing, Verification, and Reliability, V. 15, no. 4, December 2005, pages 257-279.
[9] H. Do and G. Rothermel, “An Empirical Study of Regression Testing Techniques Incorporating Context and Lifecycle Factors and Improved Cost-Benefit Models”, FSE 2006, November 2006.
[10] H. Do and G. Rothermel, “On the Use of Mutation Faults in Empirical Assessments of Test Case Prioritization Techniques”, IEEE Transactions on Software Engineering, V. 32, No. 9, September 2006, pages 733-752.