regression testing: theory and practice · 2015-06-16
TRANSCRIPT
REGRESSION TESTING: THEORY AND PRACTICE
Software Bugs Lead to Financial Losses or Loss of Life
• Boeing’s avionics software
• Medical record system (https://issues.openmrs.org/browse/TRUNK-4475)
• Knight’s bug ($440 million)
Software Testing Lifecycle
[Figure: cycle of Add Tests → Run Tests → Assess Tests]
Automated Test Generation
Test Generation [ICST09,ICSE10*,ISSTA11,ECOOP13] [CSTVA10,FASE11] · Run Tests · Assess Tests
*Paper won an award or was invited for a journal publication: ICSE10, ICST10, ICST12, ISSTA13
Bugs Found in Widely Used Projects
Metrics for Assessing Test Quality
Test Quality Assessment [Mutation10,ISSTA13] [FSE11,ISSTA13*,ASE13,TOSEM15]
Significantly Faster Regression Testing
Regression Testing [ICST10*,STVR13] [ASE11,CAV14,ASE14,OOPSLA14,FSE14]
Concurrent Code Analysis
Concurrent Code Analysis [ICSE08,IWMSE10,Scala11,FSE11,ICST12*,TACAS13,Onward!13]
Impact Outside of Academia
Today’s Focus: Regression Testing
Regression Testing
• Executes tests for each new code revision
• Checks if changes broke something
• Widely used in industry
[Figure: original revision, with changes, leading to modified revision; the available tests t1, t2, t3, …, tn run against each revision]
Regression Testing – Costly (1)
[Figure: number of tests and test execution time per revision for several projects (361 to 866,312 tests; ~5min to ~17h); these test suites are run many times each day]
Regression Testing – Costly (2)
• Linear increase in the number of revisions per day
• Linear increase in the number of tests per revision
• => Quadratic increase in test execution time
• 20+ revisions per minute, 75+ million tests run per day*
• Personal experience
*http://google-engtools.blogspot.com/2011/06/testing-at-speed-and-scale-of-google.html
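The quadratic claim above is simple arithmetic; the sketch below uses made-up growth rates (the 10 and 100 are illustrative, not Google’s numbers):

```java
// Toy illustration of the claim above: if both revisions per day and tests
// per revision grow linearly with time t, daily test executions grow with t^2.
// The growth rates (10, 100) are hypothetical, chosen only for illustration.
public class QuadraticGrowth {
    static int executionsPerDay(int t) {
        int revisionsPerDay = 10 * t;   // hypothetical linear growth
        int testsPerRevision = 100 * t; // hypothetical linear growth
        return revisionsPerDay * testsPerRevision;
    }

    public static void main(String[] args) {
        for (int t = 1; t <= 3; t++) {
            System.out.println(executionsPerDay(t)); // 1000, 4000, 9000
        }
    }
}
```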
Regression Test Selection (RTS)
• Speeds up regression testing
  – Without requiring more computers or energy
• Analyzes changes to a codebase
• Runs only tests whose behavior may be affected
all affected tests => safe test selection
[Figure: original revision, with changes, leading to modified revision; rts selects, from the available tests t1, t2, t3, …, tn, only those affected by the changes]
RTS – Example
changes to C2, C3
[Figure: dependency matrices between tests t1–t4 and C1, C2, C3, f for the original and modified revisions; rts(original, modified) selects the tests that depend on the changed C2 or C3]
t1() {
  C1 obj = new C1();
  assert(obj.m() == 1);
}

class C1 {
  int m() { return 1; }
}
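The example above can be sketched as class-level selection; the dependency matrix below is hypothetical, chosen to mirror the slide (a test is selected iff it uses a changed class):

```java
import java.util.*;

// Sketch of class-level test selection for the slide's example. The
// dependency matrix is invented for illustration, not taken from a real tool.
public class RtsExample {
    static Set<String> select(Map<String, Set<String>> deps, Set<String> changed) {
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            // select the test if it depends on any changed class
            if (!Collections.disjoint(e.getValue(), changed)) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("t1", Set.of("C1"));
        deps.put("t2", Set.of("C1", "C2"));
        deps.put("t3", Set.of("C3", "f"));
        deps.put("t4", Set.of("C2", "C3"));
        // the changes touched C2 and C3, as in the slide
        System.out.println(select(deps, Set.of("C2", "C3"))); // [t2, t3, t4]
    }
}
```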
Outline
• Theory: Regression test selection for distributed software history
• Technique: Safe and efficient regression test selection for object-oriented languages
• System: Ekstazi tool for Java
Distributed Software Histories
• Distributed version control systems (e.g., Git)
• Complex DAGs due to branches, merges, etc.
• ~35% of revisions are merges
[Figure: a history DAG with a branch point and a merge point]
Distributed Software History: Explained
• Commit – extends the graph with a new edge
• Merge – joins two or more revisions
• Revert – undoes a prior commit
• Cherry-pick – applies a change from one branch to another
[Figure: example history with revisions 0–6 and a merge revision h; edges are labeled with the changes C, D, and E]
How to do regression test selection (RTS) for all commands in a distributed software history?
RTS for Commit Command
• Based on test selection between two revisions
S_commit(h) = rts(pred(h), h)
[Figure: the example history with tests t1–t4; each edge is annotated with the tests selected for its changes (e.g., t1,t4 for C; t2,t4 for D; t3 for E), and revision 0 runs all of t1,t2,t3,t4]
Merge Command: Option S1 (1/3)
S_merge1(h) = rts(imd(h), h)    (imd – immediate dominator)
[Figure: for merge revision h, selection runs from the immediate dominator 0 over the changes C,D,E and selects t1,t2,t3,t4]
Pro: Runs test selection only once (i.e., relatively fast)
Con: There may be many changes between imd(h) and h => many tests selected to run (i.e., slow)
Merge Command: Option Sk (2/3)
S_mergek(h) = ∪_{n ∈ pred(h)} rts(n, h)    (pred – predecessor nodes)
[Figure: for merge revision h, selection runs once per parent; one parent sees changes C,D => t1,t2,t4 and the other sees C,D,E => t1,t2,t3,t4]
If a test is not affected between a parent and the merge revision, take the result from the parent
Pro: Selects fewer tests than S1
Con: Runs test selection k times (i.e., once for each parent)
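Option Sk can be sketched on a toy model; the history, class versions, and dependency matrix below are invented for illustration (revisions are modeled as class → content-version maps):

```java
import java.util.*;

// Toy sketch of option Sk: run pairwise selection once per merge parent and
// take the union. All data below (history, versions, dependencies) is
// hypothetical, chosen only to illustrate the formula.
public class MergeSk {
    static Set<String> changed(Map<String, Integer> from, Map<String, Integer> to) {
        Set<String> diff = new HashSet<>();
        for (String c : to.keySet()) {
            if (!Objects.equals(from.get(c), to.get(c))) diff.add(c);
        }
        return diff;
    }

    // rts(from, to): select tests depending on any class changed between the two
    static Set<String> rts(Map<String, Integer> from, Map<String, Integer> to,
                           Map<String, Set<String>> deps) {
        Set<String> ch = changed(from, to);
        Set<String> selected = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            if (!Collections.disjoint(e.getValue(), ch)) selected.add(e.getKey());
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = Map.of(
            "t1", Set.of("C"), "t2", Set.of("D"), "t3", Set.of("E"));
        Map<String, Integer> p1 = Map.of("C", 1, "D", 0, "E", 0); // branch changed C
        Map<String, Integer> p2 = Map.of("C", 0, "D", 0, "E", 1); // branch changed E
        Map<String, Integer> h  = Map.of("C", 1, "D", 0, "E", 1); // merge of both
        Set<String> selected = new TreeSet<>();
        for (Map<String, Integer> p : List.of(p1, p2)) {
            selected.addAll(rts(p, h, deps)); // union over the parents
        }
        System.out.println(selected); // [t1, t3] -- t2 is never selected
    }
}
```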
Merge Command: Option S0 (3/3)
S_merge0(h) = S_aff(h) ∪ (A(h) \ ∪_{p ∈ pred(h)} A(p))
S_aff(h) = ∪_{p,p' ∈ pred(h), p ≠ p', d = dom(p,p')} [ (∪_{n ∈ d≤*p \ {d}} S_sel(n)) ∩ (∪_{n ∈ d≤*p' \ {d}} S_sel(n)) ]
[Figure: for merge revision h, the results recorded on the two branches (e.g., t2,t4 and t1,t2,t4) are combined without running selection at the merge]
If a test is affected on multiple branches, the changes from different branches taken together may lead to a different result
Pro: Does not run test selection, but reuses results recorded in the history
Con: Selects more tests than Sk (e.g., new tests in one of the branches)
Merge Command: Comparison

                    Analysis time   Number of selected tests
S_merge1 (Naïve)    Medium          Large
S_mergek            Slow            Medium
S_merge0            Fast            Small

• The following relations hold:
  – S_mergek ⊆ S_merge0
  – S_mergek and S_merge1 are incomparable
• If there are no new tests and no reverts: S_mergek = S_merge0
• S0 is applicable for automerges (90%) and requires results for all revisions
Safety
• All our test selection algorithms are safe
Theorem 1: S_mergek(h) and S_merge1(h) are safe for every merge revision h
Theorem 2: S_merge0(h) is safe for every automerge revision h
• Proof: S_mergek(h) ⊆ S_merge0(h) follows from
  – rts distributes over changes
  – rts is monotonic with respect to the set of changes
  – properties of automerge
Revert Command
• Undoes a prior commit
S_revert_aff(h) = S_sel(p', n_re) ∩ [ (∪_{n ∈ d≤*p \ {d}} S_sel(n)) ∪ (∪_{n ∈ d≤*p' \ {d}} S_sel(n)) ]
S_revert0(h) = S_revert_aff(h) ∪ (A(p') \ A(n_re)) ∪ (A(p) \ A(d))
[Figure: the example history where revision h reverts an earlier change C (shown as -C)]
Cherry-pick Command
• Applies a change from one branch to another
S_cherry_aff(h) = S_sel(n'_cp, n_cp) ∩ [ (∪_{n ∈ d≤*p \ {d}} S_sel(n)) ∪ (∪_{n ∈ d≤*n'_cp \ {d}} S_sel(n)) ]
S_cherry0(h) = S_cherry_aff(h) ∪ (A(n_cp) \ A(n'_cp)) ∪ (A(p) \ A(d))
[Figure: the example history where revision h cherry-picks change C from another branch]
From Theoretical to Applied RTS
All of the selection formulas above reduce to the pairwise selection rts(original, modified). The remaining question: a SAFE + EFFICIENT rts?
Outline
• Theory: Regression test selection for distributed software history
• Technique: Safe and efficient regression test selection for object-oriented languages
• System: Ekstazi tool for Java
My RTS Technique
• Insight: test -> dynamically used files
• Safe by design for any code change
• Efficient due to these properties
  – A small number of files is modified at each revision
  – A small number of tests depends on each file
  – Changes are localized
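The insight above can be sketched as checksum-based selection: record, for each test, the checksums of the files it dynamically used, and re-run the test only if a checksum no longer matches. The file names, contents, and use of `hashCode` as a stand-in checksum are all assumptions for illustration, not Ekstazi's actual implementation.

```java
import java.util.*;

// Minimal sketch of the insight "test -> dynamically used files": after a run,
// record the checksum of each file the test used; at the next revision, re-run
// the test iff any recorded checksum no longer matches. String hashCode stands
// in for a real checksum; the data is made up for illustration.
public class ChecksumRts {
    static boolean affected(Map<String, Integer> recorded, Map<String, String> files) {
        for (Map.Entry<String, Integer> e : recorded.entrySet()) {
            String content = files.get(e.getKey());
            // a missing or modified file changes the checksum -> re-run the test
            if (content == null || content.hashCode() != e.getValue()) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> oldFiles = Map.of("A.class", "v1", "B.class", "v1");
        // the test dynamically used only A.class at the original revision
        Map<String, Integer> testDeps = Map.of("A.class", oldFiles.get("A.class").hashCode());
        // modified revision: only B.class changed
        Map<String, String> newFiles = Map.of("A.class", "v1", "B.class", "v2");
        System.out.println(affected(testDeps, newFiles)); // false: safely skip the test
    }
}
```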
Fine-grained (Method) Dependencies
changes to p, r
[Figure: dependency matrices between tests t1–t4 and methods m, p, q, r (in classes C, D, E) for the original and modified revisions; rts(original, modified) selects the tests that depend on the changed p or r]
NOT SAFE
Safety Example (1)
revision 0:
class A {
  A() {}
  int m() { return 1; }
}
class B extends A {
  B() {} // calls A()
}

revision 1:
class A {
  A() {}
  int m() { return 1; }
}
class B extends A {
  B() {}
  @Override
  int m() { return 2; }
}

test() {
  B b = new B();
  assert(b.m() == 1);
}

[Figure: at revision 0, the method-level dependencies of test are A(), B(), and A.m(); the file-level dependencies are A and B]
Safety Example (2)
test() {
  Method[] methods = A.class.getDeclaredMethods();
  assert(methods.length == 1);
}

revision 0:
class A {
  A() {}
  public void m() { … }
}

revision 1:
class A {
  A() {}
  public void m() { … }
  public void n() { … }
}

[Figure: at revision 0, the method-level dependencies of test are A() and A.m(); the file-level dependency is A]
Outline
• Theory: Regression test selection for distributed software history
• Technique: Safe and efficient regression test selection for object-oriented languages
• System: Ekstazi tool for Java
From Applied to Practical RTS
• Implemented for JVM languages
ekstazi.org
• Technical challenges
– Monitoring used classes
– Handling jar files
– Parallel execution
– No explicit comparison of two revisions
– Smart hashing
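"Smart hashing" means hashing file content after stripping information that cannot affect test outcomes. Ekstazi does this on classfiles (e.g., debug attributes); as a simplified stand-in, the sketch below strips comment lines from source-like text before hashing, so a cosmetic-only change produces the same hash.

```java
import java.util.*;
import java.util.stream.*;

// Simplified sketch of smart hashing: hash content after stripping parts that
// cannot change behavior. Stripping "//" comment lines here is a stand-in for
// stripping debug attributes from classfiles; it is not Ekstazi's actual code.
public class SmartHash {
    static int smartHash(String content) {
        return Arrays.stream(content.split("\n"))
                .map(String::strip)
                .filter(l -> !l.startsWith("//")) // drop behavior-irrelevant lines
                .collect(Collectors.joining("\n"))
                .hashCode();
    }

    public static void main(String[] args) {
        String rev0 = "class A {\nint m() { return 1; }\n}";
        String rev1 = "// cosmetic edit, no behavioral change\nclass A {\nint m() { return 1; }\n}";
        // identical smart hashes => no test needs to be re-run for this change
        System.out.println(smartHash(rev0) == smartHash(rev1)); // true
    }
}
```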
Evaluation – Summary
• More than 30 projects
• 773,565 tests
• ~5M LOC
• >500 revisions
Evaluation – Apache CXF
• Reduces number of tests: ~15x
• Reduces test execution time: ~8x
• Reduces build+test time: ~3x
My recent work on faster building [OOPSLA’14]
Ekstazi Users
Example from Apache Camel: commit ff94895c
Date: Thu Nov 13 09:17:06 2014 -0600
“Including Ekstazi (www.ekstazi.org) profile to optimize execution of the tests”
• Zed – actuator services platform
• JBoss Fuse examples
• JBoss Operations Network
• Proprietary banking software
Overview of My Research
• Test Generation [ICST09,ICSE10,ISSTA11,ECOOP13,OOPSLA14] [CSTVA10,FASE11]
• Regression Testing [ICST10,STVR13] [ASE11,CAV14,ASE14,FSE14]
• Test Quality Assessment [Mutation10,ISSTA13] [FSE11,ISSTA13,ASE13,TOSEM15]
• Concurrent Code Analysis [ICSE08,IWMSE10,Scala11,FSE11,ICST12,TACAS13,Onward!13]
Test Generation (1/2) [ICST09,CSTVA10,ICSE10,ISSTA11,FASE11,ECOOP13]
• Goal: Automatically generate test inputs
  – Data structures
  – Compilers
  – IDEs
  – DOM parsers
• Challenges
  – How to obtain a large set of complex test inputs
  – How to describe the set of test inputs
  – How to efficiently generate test inputs from the description

class A {
  int f;
}
class B extends A {
  void m() {
    super.f = 0;
  }
}
Test Generation (2/2) [ICST09,CSTVA10,ICSE10,ISSTA11,FASE11,ECOOP13]
• Solution
  – Java-based language with non-deterministic constructs
  – Lightweight symbolic execution engine
• Results: short descriptions; detected many bugs in widely used projects
• Comparison with prior work: 50% shorter descriptions and an order of magnitude faster generation
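The non-deterministic constructs mentioned above can be illustrated with a small sketch of bounded-exhaustive generation driven by a choice operator. This is not the actual tool's language or API; the `enumerate` helper and the array domain are assumptions chosen to show the idea.

```java
import java.util.*;

// Sketch of bounded-exhaustive test-input generation in the spirit of a
// nondeterministic "choose" construct (unrolled here as an explicit loop over
// every choice). Enumerates all int arrays of a given length over 0..max.
// This is an illustrative stand-in, not the actual tool's API.
public class Gen {
    static List<int[]> generate(int length, int max) {
        List<int[]> results = new ArrayList<>();
        enumerate(new int[length], 0, max, results);
        return results;
    }

    // explore every choice at each position (the "choose" operator, unrolled)
    static void enumerate(int[] a, int pos, int max, List<int[]> out) {
        if (pos == a.length) { out.add(a.clone()); return; }
        for (int v = 0; v <= max; v++) {
            a[pos] = v;
            enumerate(a, pos + 1, max, out);
        }
    }

    public static void main(String[] args) {
        System.out.println(generate(2, 2).size()); // 9 inputs: 3 choices x 3 choices
    }
}
```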
Model Checking Database Applications (1/2)
• Goal: Detect concurrency bugs in database (DB) applications (e.g., web servers)
• Challenges
– Explore state space of DB applications
– Avoid state-space explosion
[ICSE08,ICST10,Mutation10,IWMSE10,Scala11,FSE11,ICST12,TACAS13,STVR13,Onward!13,ISSTA13]
Model Checking Database Applications (2/2)
• Solution
– Software model checker for DB applications
– Partial-order reduction at various levels of granularity
• e.g., insert and insert with and without constraints
• Results: Scalable model checking, detected problems in large systems
[ICSE08,ICST10,Mutation10,IWMSE10,Scala11,FSE11,ICST12,TACAS13,STVR13,Onward!13,ISSTA13]
Future Work (1)
• Test input generation for evolving software
• Incremental algorithms in DVCS
• Cross-language regression testing
Future Work (2)
• Remain in software engineering and formal methods
• Testing and verification of emerging platforms
– Scalable model checking
– Performance and resilience testing
– Testing protocols and mocking
• Leverage cloud to speed up testing and verification
– Parallelizing analysis and execution phase
– Prediction models for regression runs
Gul Agha
Amin Alipour
Elton Alves
Andrea Arcuri
Sandro Badame
Farnaz Behrang
Marcelo d'Amorim
Lamyaa Eloussi
Gordon Fraser
Alex Groce
Tihomir Gvero
Alex Gyori
Munawar Hafiz
Daniel Jackson
Vilas Jagannath
Dongyun Jin
Ralph Johnson
Owolabi Legunsen
Sam Kamin
Sarfraz Khurshid
Viktor Kuncak
Steven Lauterburg
Yilong Li
Benjamin Livshits
Qingzhou Luo
Rupak Majumdar
Darko Marinov
Aleksandar Milicevic
Peter C. Mehlitz
Iman Narasamdya
Stas Negara
Jeffrey Overbey
Cristiano Pereira
Gilles Pokam
Chandra Prasad
Grigore Rosu
Wolfram Schulte
Rohan Sharma
Samira Tasharofi
Danny van Velzen
Andrey Zaytsev
Chaoqiang Zhang
Conclusions
• Improving software quality
  – Designed scalable algorithms and techniques with theoretical foundations
    • Improved efficiency for regression testing, test generation, and concurrent code analysis
  – Developed practical tools for the proposed techniques
    • Discovered many previously unknown (concurrency) bugs
    • Adopted outside of academia: Apache, Google, Microsoft
• Today’s talk: regression testing