Testing survey by_directions


Upload: elfinhe

Post on 18-Dec-2014



Questions Unsolved

1. What is the current plan?
a) Collect and summarize the ideas in my notebook.
b) Xinming Wang's classification of code omission!!! (This has the same goal as Prof. Zhou's small program. Wang's method will not work well for every subcategory.)
c) Read the papers that may be cited.
d) What does their implementation framework look like?
e) Clarify the design and list all possible experimental methods.
f) Complete the experiments one by one and provide supporting evidence.
2. What is the goal?
a) Thesis proposal: Our Approach

1. Evaluation methods and data sets
a) Siemens
b) Unix
c) Are there any other data sets?
d) Which Java program suites are commonly used?
i. NanoXML
2. Get familiar with faults and classify them
a) Coincidental correctness
b) Code omission
c) Multi-Fault
d)
3. Get familiar with the runs of test cases
a) Classification of run information
i. sum{covered statement}
ii. Coverage
iii. Execution counts
iv. Trace
v. Semantics
vi. Slice
vii. State
viii. Predicates
ix. Symbolic execution
x. PDG
xi. AST
xii. CFG
b) Draw statistical charts: distribution, variance, mean.
4. Get familiar with test cases
a) How to measure the distance or similarity between test cases?
i. Inputs
ii. Coverage
iii. Number of covered statements
iv. Similarity of coverage counts
b) How to judge whether a test case is more likely to reveal a fault?
i. Evaluation of test cases.
5. Learn to use tools
a) gcov
b) weka
c) eclipse plugins
6. Directions to pursue
a) Fault localization for socket communication
b) Fault localization in loops (e.g., <=3 written as <3)
c) Fault localization in recursion
d) Test-suite adequacy (the paper showing that only about 20 correct test cases are needed)
e) Improve results overall by proposing a new formula
f) Label different kinds of faults
g) Propose a formula targeted at a particular kind of fault
h) Remove some of the similar test cases
i) Build a logical model after clustering (cluster using run.covered_statements.length)
j) Logical-combination coverage of predicate clauses? The essence of program coverage information lies in the branch coverage of conditional statements.
k) The fault lies in the upward (backward) slice of a suspicious predicate clause.
l) Statements not executed by a failed run are all correct statements.
7. Summary of current directions
a) Run
i. Assign a weight to each run
ii. Cluster the runs
iii. Remove some extremely similar runs
iv. Remove runs that are very likely coincidental correctness (i.e., remove the passed runs closest to a failed run)
v. Apply set operations to the statements covered by runs: intersection (high weight), union (low weight), complement (negative weight):
Passed-Covered
Passed-Uncovered
Failed-Covered
Failed-Uncovered
b) Logical-combination coverage of predicates, followed by slicing (a combination of CBFL and slicing). This is because the essence of coverage lies in the conditional statements (though a condition's outcome is affected by earlier assignment statements).
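The set operations in item 7.a.v can be sketched concretely. This is a minimal illustration, and the runs and statement ids below are invented for the example.

```python
# Partition statements by (passed/failed) x (covered/uncovered), as in 7.a.v.
# Each run is represented as the set of statement ids it covers (invented data).
passed_runs = [{1, 2, 3, 5}, {1, 2, 4, 5}]
failed_runs = [{1, 2, 3, 6}]
all_stmts = set(range(1, 7))

passed_covered = set().union(*passed_runs)
failed_covered = set().union(*failed_runs)
passed_uncovered = all_stmts - passed_covered
failed_uncovered = all_stmts - failed_covered

# Intersection over the failed runs (high weight): statements every failure executes.
in_every_failure = set.intersection(*failed_runs)
# Covered by failures but by no passing run: the prime suspects.
suspects = failed_covered - passed_covered
print(suspects)  # {6}
```

A weighting scheme along the lines of 7.a.v would then score statements in `suspects` highest and down-weight those also in `passed_covered`.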


Questions Solved

1.

Questions

1. What is the background?

2. On what assumptions are these approaches based?

3. Can you tell me what the best approach is in this area? Who proposed it?

4. Can you list some motivation examples for the approach?

5. The ideas are trivial. What is the biggest challenge in these approaches?

6. What is the approach’s IPO (Input-Process-Output)? Can you give me an example?

7. What are the paper’s contributions?

8. The results are better: can you explain why? What is different from related work?

9. How to evaluate in this area, including methods, benchmarks and convincing reasons?

10. Can you find the design space of this area?

11. What can we learn from the author’s survey?

12. Can we make some breakthroughs? What’s our future work?

Test

[]

Annotation

Keyword


Abstract

Background

Motivation

Solution

Contribution

Evaluation

Total Pages Value Understanding Last Read

Question Result Validation

Question: Method/Means | Evaluation | Characterization
Result: Technique | Analytic Model
Validation: Analysis | Persuasion | Experience

Test Case Generation

[McM04] Search-based software test data generation: a

survey

McMinn, P. (2004), Search-based software test data generation: a survey. Software Testing, Verification and Reliability, 14: 105–156.

Annotation

The paper gives a fairly comprehensive overview of search-based test generation. The author first introduces the motivation for automated testing, as well as the problems researchers have to face. In the second chapter, several general search techniques are presented. Through the next four chapters, the author classifies the different types of search-based test generation. The classification is based on the type of testing: structural testing, functional testing, grey-box testing, and non-functional testing. The author also subdivides each testing type in more detail, with a number of comprehensive examples. The classification is impressive and helps a lot in understanding each research effort's position.

Keyword

search-based software engineering; automated software test data generation;

Abstract

Background

The use of metaheuristic search techniques for the automatic generation of test data has been a

burgeoning interest for many researchers in recent years.

Motivation

Previous attempts to automate the test generation process have been limited, having been

constrained by the size and complexity of software, and the basic fact that, in general, test data

generation is an undecidable problem.

Solution

Metaheuristic search techniques offer much promise in regard to these problems. Metaheuristic

search techniques are high-level frameworks, which utilize heuristics to seek solutions for

combinatorial problems at a reasonable computational cost. To date, metaheuristic search

techniques have been applied to automate test data generation for structural and functional testing;

the testing of grey-box properties, for example safety constraints; and also non-functional

properties, such as worst-case execution time.


Contribution

This paper surveys some of the work undertaken in this field, discussing possible new future

directions of research for each of its different individual areas.

Total Pages Value Understanding Last Read

52 High Normal 2010.09.24

Question Result Validation

Characterization Analytic Model Persuasion

[Edv99] A survey on automatic test data generation

Jon Edvardsson. A survey on automatic test data generation. In Proceedings of the Second

Conference on Computer Science and Engineering in Linköping (October 1999), pp. 21-28.

Annotation

A program-based test data generator is one component in automating software testing. The paper begins by showing the architecture of a typical test data generator system and some basic concepts, such as control flow graph, basic block, and branch predicate. In the next chapter, the author classifies test data generators into four kinds: Static and Dynamic Test Data Generation, Random Test Data Generation, Goal-Oriented Test Data Generation, and Path-Oriented Test Data Generation. The author also discusses some problems of test data generation, which involve Arrays and Pointers, Objects, Loops, Modules, Infeasible Paths, Constraint Satisfaction, and the Oracle.

Keyword

Program-based Test Generation

Abstract

Outline

1. Introduction

2. Basic Concepts

3. An Automatic Test Data Generator System

a) The Test Data Generator

i. Static and Dynamic Test Data Generation


ii. Random Test Data Generation

iii. Goal-Oriented Test Data Generation

iv. Path-Oriented Test Data Generation

b) The Path Selector’s path criteria

i. Statement coverage

ii. Branch coverage

iii. Condition coverage

iv. Multiple-condition coverage

v. Path coverage

4. Problems of Test Data Generation

a) Arrays and Pointers

b) Objects

c) Loops

d) Modules

e) Infeasible Paths

f) Constraint Satisfaction

g) Oracle
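The gap between the first two path criteria above (statement vs. branch coverage) already shows up in a two-line function; the example below is invented for illustration.

```python
def clamp_negative_to_zero(x):
    if x < 0:
        x = 0   # only executed on the true branch
    return x

# A single test with x = -5 executes every statement (full statement
# coverage) but never exercises the false outcome of the branch;
# branch coverage additionally requires a test such as x = 7.
assert clamp_negative_to_zero(-5) == 0
assert clamp_negative_to_zero(7) == 7
```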

Background

In order to reduce the high cost of manual software testing and at the same time to increase the

reliability of the testing processes researchers and practitioners have tried to automate it. One of

the most important components in a testing environment is an automatic test data generator - a

system that automatically generates test data for a given program.

Motivation

The focus of this article is program-based generation, where the generation starts from the actual

programs.


Solution

In this article I present a survey on automatic test data generation techniques that can be found in

current literature.

Contribution

Basic concepts and notions of test data generation as well as how a test data generator system

works are described. Problems of automatic generation are identified and explained. Finally,

important and challenging future research topics are presented.

Total Pages Value Understanding Last Read

8 Normal Normal 2010.09.24

Question Result Validation

Characterization Analytic Model Persuasion


[GGJ+10] Test generation through programming in UDITA

Milos Gligoric, Tihomir Gvero, Vilas Jagannath, Sarfraz Khurshid, Viktor Kuncak, Darko

Marinov. Test generation through programming in UDITA. Proceedings of the 32nd ACM/IEEE

International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South

Africa, 1-8 May 2010.

Annotation

Generating test inputs for complex data structures is time-consuming and results in test suites that have poor quality and are difficult to reuse. The authors present a new language for describing tests, UDITA, a Java-based language with non-deterministic choice operators and an interface for generating linked structures. We can learn the tradeoffs in this area: how easy the specification is to write, how fast tests are generated (efficiency), how good the tests are (effectiveness), and how complex the tests are.

Keyword

test input generation; specification-based;

Abstract

Background

The consequences of software bugs become more severe, while widely adopted testing tools offer

little support for test generation.

Motivation

Practical application of these techniques was largely limited to testing units of code much smaller than a hundred thousand lines, or to generating input values much simpler than representations of Java programs. This means these techniques cannot generate inputs with complex data structures.


Solution

The author presents an approach for describing tests using non-deterministic test generation programs. The author introduces UDITA, a Java-based language with non-deterministic choice operators and an interface for generating linked structures. Furthermore, the author describes new algorithms to generate tests and implements the approach on top of Java PathFinder (JPF).
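A toy Python analogue can show the idea of a non-deterministic choice operator; UDITA itself is Java-based and integrated with JPF, so the `choose` mechanism below is purely illustrative.

```python
from itertools import product

def run_all(test_program, choice_domains):
    """Enumerate every combination of non-deterministic choices,
    mimicking the effect of UDITA-style choose() operators."""
    results = []
    for choices in product(*choice_domains):
        it = iter(choices)
        results.append(test_program(lambda: next(it)))
    return results

# A "test description" with two choice points.
def make_pair(choose):
    a = choose()          # first non-deterministic choice
    b = choose()          # second non-deterministic choice
    return (a, b)

pairs = run_all(make_pair, [[0, 1], ["x", "y"]])
print(pairs)  # [(0, 'x'), (0, 'y'), (1, 'x'), (1, 'y')]
```

Each enumerated combination corresponds to one deterministic execution of the test description, which is the effect UDITA achieves with its choice operators.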

Contribution

1. New language for describing tests

2. New test generation algorithms

3. Implementation

4. Evaluation

Evaluation

The authors evaluated UDITA with four sets of experiments, three for black-box testing and one for white-box testing. The first set of experiments, on six data structures (DAG, HeapArray, NQueens, RBTree, SearchTree, and SortedList), compares UDITA with base JPF test generation. The second set, on testing refactoring engines, compares UDITA with ASTGen. The third set uses UDITA to test parts of the UDITA implementation itself. For white-box testing, the fourth set compares UDITA with symbolic execution in Pex. The experiments show that test generation using UDITA is faster and leads to test descriptions that are easier to write than in previous frameworks.

Total Pages Value Understanding Last Read

10 Normal Normal 2010-10-06

Question Result Validation

Method/Means Technique Analysis, Experience

[GGJ+09] On test generation through programming in UDITA

M. Gligoric, T. Gvero, V. Jagannath, S. Khurshid, V. Kuncak, and D. Marinov. On test generation

through programming in UDITA. Technical Report LARA-REPORT-2009-05, EPFL, Sep. 2009.

Total Pages Value Understanding Last Read

14 Normal Normal 2010-10-06


Question Result Validation

Method/Means Technique Analysis, Experience

Annotation

This is the Technical Report version of [GGJ+10], which offers more references, links and graphs

without the page limit.

[BKM02] Korat: Automated testing based on Java predicates

Boyapati, C., Khurshid, S., and Marinov, D. 2002. Korat: automated testing based on Java

predicates. In Proceedings of the 2002 ACM SIGSOFT international Symposium on Software

Testing and Analysis (Roma, Italy, July 22 - 24, 2002). ISSTA '02. ACM, New York, NY, 123-133.

Annotation

A novel framework for test generation is proposed in this paper. Korat uses a method precondition written in JML to automatically generate nonisomorphic test cases. The key techniques in Korat are monitoring the predicate's executions, pruning the search space using structural invariants, and generating only nonisomorphic inputs. Evaluation in this area usually involves the generation time and the correctness and effectiveness of the generated tests.

Keyword

specification-based testing

Abstract

Background

Manual software testing and test data generation are labor-intensive processes. Korat uses

Specification-based testing.


Motivation

Can we use precondition to generate test cases and postcondition to check the correctness of

output?

Solution

Korat exhaustively explores the bounded input space of the predicate. However, Korat also monitors the predicate's executions and prunes portions of the search space. Korat uses the Java Modeling Language (JML) for specifications.

Contribution

1. A technique for automatic test case generation: given a predicate, and a bound on the size of

its inputs, Korat generates all nonisomorphic inputs for which the predicate returns true.

2. Korat uses backtracking to systematically explore the bounded input space of the predicate.

3. Korat monitors accesses that the predicate makes to all the fields of the candidate input to

prune large portions of the search space.
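The generate-and-check core can be sketched in a few lines for sorted lists. This deliberately omits Korat's field-access monitoring, search-space pruning, and isomorphism breaking, so it is only the naive baseline that Korat improves on; the predicate and domain are invented.

```python
from itertools import product

def rep_ok(lst):
    """Class invariant for a sorted list (strictly increasing)."""
    return all(a < b for a, b in zip(lst, lst[1:]))

def generate(bound, domain):
    """Enumerate every candidate up to the bound and keep those for
    which the predicate returns true (no pruning here)."""
    valid = []
    for length in range(bound + 1):
        for cand in product(domain, repeat=length):
            if rep_ok(cand):
                valid.append(list(cand))
    return valid

inputs = generate(2, [0, 1, 2])
print(inputs)  # [[], [0], [1], [2], [0, 1], [0, 2], [1, 2]]
```

Korat's contribution is to reach the same set of valid inputs while skipping most invalid candidates, by watching which fields `rep_ok` actually reads.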

Evaluation

This paper presents Korat's performance and then compares Korat with the Alloy Analyzer for test case generation. The benchmarks are BinaryTree, HeapArray, LinkedList, TreeMap, HashSet, and AVTree; some of them come from the standard Java libraries. The comparison with the Alloy Analyzer includes the number of structures generated and the time to generate them.

Total Pages Value Understanding Last Read

11 High Well 2010.10.06

Question Result Validation

Method/Means Technique Analysis

[KM04] TestEra: Specification-Based Testing of Java

Programs Using SAT

Sarfraz Khurshid, Darko Marinov. TestEra: Specification-Based Testing of Java Programs Using SAT. Automated Software Engineering 11(4): 403-434, 2004.


Annotation

This paper proposes a framework for automated specification-based testing of Java programs. Instead of JML [BKM02], the authors use Alloy to express the specification of a method's pre- and post-conditions. Since Alloy is a first-order declarative language, the authors use a SAT solver to generate the test cases. The key idea behind TestEra is to automate testing of Java programs, requiring only that the structural invariants of inputs and the correctness criteria for the methods be formally specified.

Keyword

test generation

Abstract

Background

TestEra is a framework for automated specification-based testing of Java programs.

Motivation

The search space is huge and nonisomorphism is hard. In addition, enumeration of structurally

complex data is not efficient.

Solution

TestEra requires as input a Java method (in source code or bytecode), a formal specification of the pre- and post-conditions of that method, and a bound that limits the size of the test cases to be generated; the specifications are expressed in Alloy, a first-order declarative language based on sets and relations. Using the method's pre-condition, TestEra automatically generates all nonisomorphic test inputs up to the given bound. It executes the method on each test input and uses the method's post-condition as an oracle to check the correctness of each output. Because the specifications are first-order, the authors use SAT solvers to help solve the problem. The key idea behind TestEra is to automate testing of Java programs, requiring only that the structural invariants of inputs and the correctness criteria for the methods be formally specified.
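TestEra's two phases (generate inputs satisfying the pre-condition, then check each output against the post-condition) can be sketched in miniature. The method and predicates below are invented stand-ins; real TestEra expresses them in Alloy and enumerates solutions with a SAT solver.

```python
from itertools import product

def method_under_test(lst):
    return sorted(lst)        # stand-in for the Java method being checked

def precondition(lst):
    return all(isinstance(x, int) for x in lst)   # structural invariant

def postcondition(inp, out):
    return out == sorted(inp) and sorted(out) == out

# Phase 1: generate all inputs up to a small bound satisfying the precondition.
bound, domain = 2, [0, 1]
inputs = [list(c) for n in range(bound + 1)
          for c in product(domain, repeat=n) if precondition(list(c))]

# Phase 2: run the method on each input; the postcondition is the oracle.
failures = [i for i in inputs if not postcondition(i, method_under_test(i))]
print(failures)  # []
```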


Evaluation

For each case study, the authors report the method under test, a representative input size, and TestEra's phase 1 (input generation) and phase 2 (correctness checking) statistics for that size. The case studies include singly linked lists, red-black trees, INS (Information Network System), and the Alloy-alpha Analyzer.

Total Pages Value Understanding Last Read

32 Normal Normal 2010.10.06

Question Result Validation

Method/Means Technique Experience

Symbolic execution

[Kin76] Symbolic execution and program testing

King, J. C. 1976. Symbolic execution and program testing. Commun. ACM 19, 7 (Jul. 1976), 385-394.

Annotation

This paper is the most cited paper on symbolic execution. The author introduces some basic notions of this program analysis technique. The main difficulty in symbolic execution is conditional branch statements. The paper uses a simple PL/I-style programming language to analyze the difficulty in detail. Using two typical examples, the author introduces a symbolic execution system based on the symbolic execution tree and a strategy to solve the conditional branch problem. Furthermore, the paper discusses program proving based on symbolic execution. Symbolic execution accepts symbolic inputs and produces symbolic formulas as output. The execution semantics is changed for symbolic execution, but neither the language syntax nor the individual programs written in the language are changed.

Keyword

symbolic execution; program testing

Abstract

Background

Instead of supplying the normal inputs to a program (e.g. numbers) symbolic execution supplies

symbols representing arbitrary values. The execution proceeds as in a normal execution except

that values may be symbolic formulas over the input symbols.

Motivation

The difficult, interesting issues arise during the symbolic execution of conditional branch type

statements.
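A toy executor makes the branching problem visible: every conditional forks the execution, and each leaf of the resulting symbolic execution tree carries a path condition. The three-path program and its representation below are invented for illustration.

```python
# Program as a decision tree: ("cond", true_subtree, false_subtree) or a leaf.
PROGRAM = ("x > 0",
           ("x > 10", "return 'big'", "return 'small'"),
           "return 'neg'")

def sym_exec(node, path_cond=()):
    """Fork at each conditional, collecting (path condition, result)
    pairs; this mirrors building the symbolic execution tree."""
    if isinstance(node, str):               # leaf: a result
        return [(path_cond, node)]
    cond, true_br, false_br = node
    return (sym_exec(true_br, path_cond + (cond,)) +
            sym_exec(false_br, path_cond + (f"not ({cond})",)))

for conds, result in sym_exec(PROGRAM):
    print(" and ".join(conds), "->", result)
```

A real system would additionally ask a decision procedure whether each accumulated path condition is satisfiable, pruning infeasible paths.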

Solution

A particular system called EFFIGY, which provides symbolic execution for program testing and debugging, is also described; it interpretively executes programs written in a simple PL/I-style programming language. It includes many standard debugging features, the ability to manage and to prove things about symbolic expressions, a simple program testing manager, and a program verifier.

Evaluation

A brief discussion of the relationship between symbolic execution and program proving is also

included.

Total Pages Value Understanding Last Read

10 High Normal 2010.10.06

Question Result Validation

Method/Means Technique Persuasion


[DJDM09] ReAssert: Suggesting Repairs for Broken Unit Tests

Brett Daniel, Vilas Jagannath, Danny Dig, Darko Marinov, "ReAssert: Suggesting Repairs for

Broken Unit Tests," ase, pp.433-444, 2009 IEEE/ACM International Conference on Automated

Software Engineering, 2009

Annotation

Software changes cause tests to fail. This is the first published paper to suggest repairs to failing tests' code. The key challenge in repairing tests is to retain as much of the original test logic as possible. The authors propose several repair strategies: Replace Assertion Method, Invert Relational Operator, Replace Literal in Assertion, Replace with Related Method, Trace Declaration-Use Path, Accessor Expansion, Surround with Try-Catch, and Custom Repair Strategies. Notice that the repair only changes the test code (e.g. code based on JUnit), not the code under test.
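The Replace Literal in Assertion strategy, for instance, can be mimicked with a string rewrite: when an equality assertion fails, suggest substituting the observed value for the stale expected literal. ReAssert itself rewrites Java/JUnit code via program transformations; this Python sketch is only an invented analogue.

```python
import re

def suggest_repair(assertion_src, observed):
    """If an 'assertEqual(<literal>, expr)' assertion fails, suggest
    replacing the expected literal with the observed value, keeping
    the rest of the test logic intact (toy Replace Literal in Assertion)."""
    return re.sub(r"assertEqual\(\s*[^,]+,", f"assertEqual({observed!r},",
                  assertion_src, count=1)

# A test asserted 4, but the changed code now returns 5.
broken = "self.assertEqual(4, counter.value())"
print(suggest_repair(broken, 5))  # self.assertEqual(5, counter.value())
```

As in ReAssert, the suggestion is only applied if the developer confirms that the new behavior is the intended one.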

Keyword

Software testing; Software maintenance

Abstract

Background

Developers often change software in ways that cause tests to fail. When this occurs, developers

must determine whether failures are caused by errors in the code under test or in the test code

itself. In the latter case, developers must repair failing tests or remove them from the test suite.

Motivation

Repairing tests is time consuming but beneficial, since removing tests reduces a test suite's ability

to detect regressions. Fortunately, simple program transformations can repair many failing tests

automatically.

Solution

We present ReAssert, a novel technique and tool that suggests repairs to failing tests' code which cause the tests to pass. Examples include replacing literal values in tests, changing assertion

methods, or replacing one assertion with several. If the developer chooses to apply the repairs,

ReAssert modifies the code automatically.

Contribution

This paper makes contributions in Idea, Technique, Tool and Evaluation.

Evaluation

First, we describe two case studies in which researchers used ReAssert to repair failures in their evolving software.

Second, we perform a controlled user study to evaluate whether ReAssert’s suggested repairs

match developers’ expectations.

Third, we assess ReAssert’s ability to suggest repairs for failures in open-source projects,

considering both manually written and automatically generated test suites.

Total Pages Value Understanding Last Read

12 Normal Normal 2010.10.06

Question Result Validation

Method/Means Technique Persuasion

[PV09] A survey of new trends in symbolic execution for

software testing and analysis

Corina S. Păsăreanu, Willem Visser. A survey of new trends in symbolic execution for software testing and analysis. International Journal on Software Tools for Technology Transfer (STTT) 11(4): 339-353, 2009.

Annotation

Symbolic execution is an analysis technique that takes a program as input and outputs its symbolic execution tree. A comprehensive overview of symbolic execution is given. Using some simple and classical examples, the author first introduces the basic notions and challenges of symbolic execution. Secondly, the trend of combining concrete and symbolic execution is discussed. Thirdly, the author introduces how researchers have tried to solve scalability issues when facing large programs, which is still the main obstacle to widespread application of symbolic execution. Furthermore, the author gives an overview of the applications of symbolic execution techniques, such as test case generation, proving program properties, and static detection of runtime errors. In the "future directions" part, the author discusses the main obstacles and possible solutions in this area, e.g. new heuristic searches, extending the abstraction of programs, and powerful decision procedures for combinations of theories.

Keyword

symbolic execution; survey


Abstract

Background

Symbolic execution is a well-known program analysis technique which represents program inputs

with symbolic values instead of concrete, initialized, data and executes the program by

manipulating program expressions involving the symbolic values.

Motivation

Symbolic execution has been proposed over three decades ago but recently it has found renewed

interest in the research community, due in part to the progress in decision procedures, availability

of powerful computers and new algorithmic developments.

Solution

We provide here a survey of some of the new research trends in symbolic execution, with

particular emphasis on applications to test generation and program analysis.

Contribution

We first describe an approach that handles complex programming constructs such as input

recursive data structures, arrays, as well as multithreading. Furthermore, we describe recent hybrid

techniques that combine concrete and symbolic execution to overcome some of the inherent

limitations of symbolic execution, such as handling native code or availability of decision

procedures for the application domain.

We follow with a discussion of techniques that can be used to limit the (possibly infinite) number

of symbolic configurations that need to be analyzed for the symbolic execution of looping

programs. Finally, we give a short survey of interesting new applications, such as predictive

testing, invariant inference, program repair, analysis of parallel numerical programs and

differential symbolic execution.

Evaluation

Total Pages Value Understanding Last Read

15 High Normal 2010.10.06

Question Result Validation

Characterization Analytic Model Persuasion


[KPV03] Generalized symbolic execution for model checking

and testing

Khurshid, S., Păsăreanu, C. S., and Visser, W. 2003. Generalized symbolic execution for model

checking and testing. In Proceedings of the 9th international Conference on Tools and Algorithms

For the Construction and Analysis of Systems (Warsaw, Poland, April 07 - 11, 2003). H. Garavel

and J. Hatcliff, Eds. Lecture Notes In Computer Science. Springer-Verlag, Berlin, Heidelberg,

553-568.

Annotation

This paper proposes one of the early approaches focusing on symbolic execution of concurrent programs and complex data structures. It presents a novel framework based on a two-fold generalization of symbolic execution. First, the paper defines a source-to-source translation to instrument a program, which enables standard model checkers to perform symbolic execution. Second, to handle dynamically allocated structures, method preconditions, data, and concurrency, the paper gives a novel symbolic execution algorithm.

Keyword

symbolic execution

Abstract

Background

Modern software systems, which often are concurrent and manipulate complex data structures,

must be extremely reliable.

Motivation

We need to automate checking of such systems, which are concurrent and manipulate complex

data structures.


Solution

We provide a two-fold generalization of traditional symbolic execution based approaches. First,

we define a source to source translation to instrument a program, which enables standard model

checkers to perform symbolic execution of the program. Second, we give a novel symbolic

execution algorithm that handles dynamically allocated structures (e.g., lists and trees), method

preconditions (e.g., acyclicity), data (e.g., integers and strings) and concurrency.

Contribution

1. To address the state space explosion problem.

2. To achieve modularity.

3. To check strong correctness properties of concurrent programs.

4. To exploit the model checker’s built-in capabilities

Evaluation

By introducing the implementation and illustrating two applications of the framework, the author

persuades the availability of this approach.

Total Pages Value Understanding Last Read

16 High Well 2010.10.09

Question Result Validation

Method/Means Technique Persuasion

[PV04] Verification of Java programs using symbolic

execution and invariant generation

C. S. Păsăreanu, W. Visser. Verification of Java Programs Using Symbolic Execution and Invariant

Generation. Lecture Notes in Computer Science, Vol. 2989, pp. 164-181, 2004.

Annotation

Software verification is recognized as an important and difficult problem. However, model checking suffers from the state-explosion problem and can only deal with closed systems. This paper proposes a framework that uses method specifications and loop invariants to address the problem. The paper also illustrates some non-trivial examples, which can benefit from the more powerful approximation techniques.

Keyword

symbolic execution; method specifications; loop invariants

Abstract

Background

Software verification is recognized as an important and difficult problem.

Motivation

Model checking typically can only deal with closed systems and it suffers from the state-explosion

problem.

Solution

In order to solve the state-explosion problem, we present a novel framework, based on symbolic

execution, for the automated verification of software. The framework uses annotations in the form

of method specifications and loop invariants. We present a novel iterative technique that uses

invariant strengthening and approximation for discovering these loop invariants automatically.

Contribution

1. A novel verification framework that combines symbolic execution and model checking.

2. A new method for iterative invariant generation.

3. A series of (small) non-trivial Java examples showing the merits of our method.

Evaluation

By showing some non-trivial Java examples, we compare our work with the invariant generation

method presented in another paper [C. Flanagan and S. Qadeer. Predicate abstraction for software

verification. In Proc. POPL, 2002.].

Total Pages Value Understanding Last Read

18 Normal Normal 2010.10.09


Question Result Validation

Method/Means Technique Persuasion

Fault Localization

[WD10] Software Fault Localization

W. Eric Wong, Vidroha Debroy. "Software Fault Localization," IEEE Reliability Society 2009

Annual Technology Report, January 2010

Annotation

This article gives a fairly comprehensive overview of software fault localization. After introducing basic notions and classical ways of fault localization, the article classifies the advanced fault localization techniques as follows: static, dynamic, and execution slice-based techniques; program spectrum-based techniques; statistics-based techniques; program state-based techniques; machine learning-based techniques; etc. Furthermore, important aspects of fault localization are given, namely effectiveness, efficiency, and robustness; the impact of test cases; faults introduced by missing code; and programs with multiple bugs. These could be regarded as the design space for future work.

Keyword

Fault Localization

Abstract

Background

Regardless of the effort spent on developing a computer program, it may still contain bugs. In fact,

the larger, more complex a program, the higher the likelihood of it containing bugs.

Motivation

It is always challenging for programmers to effectively and efficiently remove bugs, while not

inadvertently introducing new ones at the same time.


Solution

Automatic fault localization techniques can guide programmers to the locations of faults with

minimal human intervention.

Total Pages Value Understanding Last Read

6 High Well 2010.10.10

Question Result Validation

Characterization Analytic Model Experience

Web

[ADT+10] Practical fault localization for dynamic web

applications

Artzi, S., Dolby, J., Tip, F., and Pistoia, M. 2010. Practical fault localization for dynamic web

applications. In Proceedings of the 32nd ACM/IEEE international Conference on Software

Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010). ICSE '10. ACM, New

York, NY, 265-274.

Annotation

In this paper, an automatic fault localization technique is proposed that is the first to fully automatically find and localize malformed-HTML errors in Web applications that execute PHP code on the server side. The technique is based on the authors' previous work [3, 4] on combined concrete and symbolic execution for Web applications written in PHP. The technique does not need an upfront test suite. Furthermore, the paper defines a statement's suspiciousness rating in Web applications using an output mapping from statements. However, the suspiciousness definition that follows is a bit magical: the proposed suspiciousness rating and the Tarantula suspiciousness rating are 1.1 and 0.5, respectively.
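For comparison, the standard Tarantula suspiciousness score is easy to state in code; the coverage counts below are invented, and the paper's modified rating (the 1.1 above) is not reproduced here.

```python
def tarantula(failed_cov, total_failed, passed_cov, total_passed):
    """Standard Tarantula suspiciousness of a statement:
    %failing-runs-covering-it / (%failing + %passing runs covering it)."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    if fail_ratio + pass_ratio == 0:
        return 0.0
    return fail_ratio / (fail_ratio + pass_ratio)

# A statement covered by all failing and all passing runs scores 0.5;
# one covered only by failing runs scores 1.0.
print(tarantula(3, 3, 4, 4))  # 0.5
print(tarantula(3, 3, 0, 4))  # 1.0
```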


Keyword

Fault Localization

Abstract

Background

Web applications are typically written in a combination of several programming languages. As

with any program, programmers make mistakes and introduce faults, resulting in Web-application

crashes and malformed dynamically generated HTML pages. While malformed HTML errors may

seem trivial, and indeed many of them are, at worst, minor annoyances.

Motivation

Previous fault-localization techniques need an upfront test suite, and there is no fully automatic

tool that finds and localizes malformed HTML errors in Web applications that execute PHP code

on the server side.

Solution

We leverage combined concrete and symbolic execution and several fault-localization techniques

to create a uniquely powerful tool for localizing faults in PHP applications. The tool automatically

generates tests that expose failures, and then automatically localizes the faults responsible for

those failures.

Contribution

1. We present an approach for fault localization that uses combined concrete and symbolic

execution to generate a suite of passing and failing tests.

2. We demonstrate that automated techniques for fault localization are effective at localizing

real faults in open-source PHP applications.

3. We present 6 fault localization techniques that combine variations on the Tarantula algorithm.

4. We implemented these 6 techniques in Apollo.

Evaluation

This evaluation aims to answer two questions:


1. How effective is the Tarantula fault localization technique in the domain of PHP web

applications?

2. How effective is Tarantula when combined with the use of an output mapping and/or with

modeling the outcome of conditional expressions, as described in Section 4?

The benchmarks are faqforge, webchess, schoolmate and timeclock. Six techniques that combine

variations on the Tarantula algorithm are used in the experiment.

Note: The authors did not know the locations of the faults and needed to localize them manually.

Since manually localizing and fixing faults is a very time-consuming task, they limited themselves

to 20 faults in each of the subject programs.

Total Pages Value Understanding Last Read

10 High Well 2010.10.09

Question Result Validation

Method/Means Technique Analysis

[AKD+10] Finding Bugs in Web Applications Using Dynamic

Test Generation and Explicit-State Model Checking

Shay Artzi, Adam Kieżun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael

D. Ernst, "Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State

Model Checking," IEEE Transactions on Software Engineering, vol. 99, no. RapidPosts, pp. 474-

494, 2010.

Annotation

This paper enhances the tools and methods in the authors’ previous work [AKD+08]. By

implementing a form of explicit-state software model checking, this paper tries to handle user input

options that are created dynamically by a web application, which includes keeping track of

parameters that are transferred from one script to the next.

Keyword

test generation; symbolic execution; explicit-state model checking


Abstract

Background

Web script crashes and malformed dynamically-generated web pages are common errors, and they

seriously impact the usability of web applications.

Motivation

Current tools for web-page validation cannot handle the dynamically generated pages that are

ubiquitous on today’s Internet.

In the previous work, we did not yet supply a solution for handling user input options that are

created dynamically by a web application, which includes keeping track of parameters that are

transferred from one script to the next—either by persisting them in the environment, or by

sending them as part of the call.

Solution

We present a dynamic test generation technique for the domain of dynamic web applications. The

technique utilizes both combined concrete and symbolic execution and explicit-state model

checking. The technique generates tests automatically, runs the tests capturing logical constraints

on inputs, and minimizes the conditions on the inputs to failing tests, so that the resulting bug

reports are small and useful.
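The generate-run-refine loop described above can be sketched in miniature. This is not Apollo's implementation: the toy subject program, the brute-force stand-in for a constraint solver, and all names here are invented for illustration:

```python
# Miniature sketch of the dynamic (concolic) test-generation loop: run the
# program on a concrete input, record the branch conditions taken, negate a
# condition, and search for a new input satisfying the altered path. The
# subject function and the brute-force "solver" are hypothetical stand-ins.

def subject(x):
    """Toy program under test; records which branches it took."""
    path = []
    if x > 10:
        path.append(("x > 10", True))
        if x % 2 == 0:
            path.append(("x % 2 == 0", True))
            raise RuntimeError("crash")  # the failure we want to expose
        path.append(("x % 2 == 0", False))
    else:
        path.append(("x > 10", False))
    return path

def run(x):
    try:
        return subject(x), None
    except RuntimeError as e:
        return None, str(e)

def solve(constraints):
    """Brute-force stand-in for a constraint solver over small integers."""
    preds = {"x > 10": lambda x: x > 10, "x % 2 == 0": lambda x: x % 2 == 0}
    for x in range(-100, 100):
        if all(preds[c](x) == want for c, want in constraints):
            return x
    return None

# Concolic loop: start from one concrete input, then flip branch conditions.
inputs, failures, explored = [0], [], set()
while inputs:
    x = inputs.pop()
    path, err = run(x)
    if err:
        failures.append((x, err))  # minimal bug report: input + failure
        continue
    for i in range(len(path)):
        prefix = path[:i] + [(path[i][0], not path[i][1])]
        key = tuple(prefix)
        if key not in explored:
            explored.add(key)
            nxt = solve(prefix)
            if nxt is not None:
                inputs.append(nxt)
```

Starting from x = 0, the loop flips its way to an even input greater than 10 and exposes the crash automatically.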

Contribution

1. The technique utilizes both combined concrete and symbolic execution and explicit-state

model checking.

2. We adapt the established technique of dynamic test generation, based on combined concrete

and symbolic execution.

3. We created a tool, Apollo.

4. We evaluated our tool by applying it to 6 real web applications.

5. We present a detailed classification of the faults found by Apollo

Evaluation

The evaluation methods are almost the same as in the previous work [AKD+08].

Total Pages Value Understanding Last Read

17 Normal Normal 2010.10.10


Question Result Validation

Method/Means Technique Analysis

[AKD+08] Finding bugs in dynamic web applications

Artzi, S., Kiezun, A., Dolby, J., Tip, F., Dig, D., Paradkar, A., and Ernst, M. D. 2008. Finding bugs

in dynamic web applications. In Proceedings of the 2008 international Symposium on Software

Testing and Analysis (Seattle, WA, USA, July 20 - 24, 2008). ISSTA '08. ACM, New York, NY,

261-272.

Annotation

A framework for test generation for web applications is proposed in this paper. The technique is

based on combined concrete and symbolic execution. The authors also present the failure

detection algorithm and the path constraint minimization algorithm.

Keyword

symbolic execution; dynamic analysis; test generation

Abstract

Background

Web script crashes and malformed dynamically-generated Web pages are common errors, and they

seriously impact usability of Web applications.

Motivation

Current tools for Web-page validation cannot handle the dynamically-generated pages that are

ubiquitous on today’s Internet.

Solution

In this work, we apply a dynamic test generation technique, based on combined concrete and

symbolic execution, to the domain of dynamic Web applications. The technique generates tests

automatically, uses the tests to detect failures, and minimizes the conditions on the inputs exposing

each failure, so that the resulting bug reports are small and useful in finding and fixing the

underlying faults. Our tool Apollo implements the technique for PHP. Apollo generates test inputs


for the Web application, monitors the application for crashes, and validates that the output

conforms to the HTML specification.

Contribution

1. We adapt the established technique of dynamic test generation, based on combined concrete

and symbolic execution, to the domain of Web applications.

2. We created a tool, Apollo.

3. We evaluated our tool by applying it to real Web applications and comparing the results with

random testing.

Evaluation

The author designed the experiments to answer the following research questions:

1. How many faults can Apollo find, and of what varieties?

2. How effective is the fault localization technique of Apollo compared to alternative

approaches such as randomized testing, in terms of the number and severity of discovered

faults and the line coverage achieved?

3. How effective is our minimization in reducing the size of input parameter constraints and

failure-inducing inputs?

For the evaluation, the author selected the following four open-source PHP programs: faqforge,

webchess, schoolmate, phpsysinfo.

Total Pages Value Understanding Last Read

11 High Normal 2010.10.10

Question Result Validation

Method/Means Technique Analysis


Test Execution

Test Optimization

[DGM10] On test repair using symbolic execution

Daniel, B., Gvero, T., and Marinov, D. 2010. On test repair using symbolic execution. In

Proceedings of the 19th international Symposium on Software Testing and Analysis (Trento, Italy,

July 12 - 16, 2010). ISSTA '10. ACM, New York, NY, 207-218.

Annotation

When the program is changed, the test code can be out of date, which may cause regression tests to fail.

The paper proposes a technique based on symbolic execution to repair the tests. The authors

analyze symbolic execution of .NET code using a tool named Pex. This paper enhances the solution

of [DJDM09]: it fixes several failures that ReAssert could not repair, or that it could have

repaired in a better way. The authors describe modifications of expected values, expected-object

comparisons, and conditional expected values as examples.
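The simplest form of the repair idea, replacing a stale expected literal with a value that makes the assertion pass, can be sketched as below. This is a drastic simplification: the dict-based test representation and the discount() example are hypothetical, and the paper derives such values via symbolic execution over real JUnit/.NET test code rather than a single concrete re-run:

```python
# Sketch of "literal replacement" test repair: when a regression test fails
# only because an expected literal is out of date, substitute a value that
# makes the assertion pass. All names and data here are illustrative.

def discount(price):
    return round(price * 0.85, 2)  # behaviour changed: the rate was 0.90

# A "test" as data: the call, its argument, and the stale expected literal.
broken_test = {"func": discount, "arg": 200, "expected": 180.0}

def repair_expected_literal(test):
    """Re-run the code under test and replace the out-of-date literal."""
    actual = test["func"](test["arg"])
    if actual != test["expected"]:
        return dict(test, expected=actual)  # repaired copy of the test
    return test

repaired = repair_expected_literal(broken_test)
```

A concrete re-run suffices here because the test has no control flow; the paper's point is handling tests where the expected value feeds into branches or computations, which needs symbolic execution.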

Keyword

test repair; symbolic execution

Abstract

Background

When developers change a program, regression tests can fail not only due to faults in the program

but also due to out of date test code that does not reflect the desired behavior of the program.


Motivation

Repairing tests manually is difficult and time consuming.

Solution

We recently developed ReAssert, a tool that can automatically repair broken unit tests, but only if

they lack complex control flow or operations on expected values.

Contribution

This paper introduces symbolic test repair, a technique based on symbolic execution, which can

overcome some of ReAssert’s limitations.

Evaluation

We reproduce experiments from earlier work and find that symbolic test repair improves upon

previously reported results both quantitatively and qualitatively. We also perform new experiments

which confirm the benefits of symbolic test repair and also show surprising similarities in test

failures for open-source Java and .NET programs. Our experiments use Pex, a powerful symbolic

execution engine for .NET, and we find that Pex provides over half of the repairs possible from the

theoretically ideal symbolic test repair.

Q1: How many failures can be repaired by replacing literals in test code? That is, if we had an

ideal way to discover literals, how many broken tests could we repair?

Q2: How do literal replacement and ReAssert compare? How would an ideal literal replacement

strategy affect ReAssert’s ability to repair broken tests?

Q3: How well can existing symbolic execution discover appropriate literals? Can symbolic

execution produce literals that would cause a test to pass?

Java: Checkstyle, JDepend, JFreeChart, Lucene, PMD, XStream

.NET: AdblockIE, CSHgCmd, Fudge-CSharp, GCalExchangeSync, Json.NET, MarkdownSharp,

NerdDinner, NGChart, NHaml, ProjectPilot and SharpMap.

Total Pages Value Understanding Last Read

11 Normal Normal 2010.09.25

Question Result Validation

Method/Means Technique Analysis


[HO09] MINTS: A general framework and tool for supporting

test-suite minimization

Hwa-You Hsu; Orso, A.; , "MINTS: A general framework and tool for supporting test-suite

minimization," Software Engineering, 2009. ICSE 2009. IEEE 31st International Conference on ,

vol., no., pp.419-429, 16-24 May 2009

Annotation

This is the first published paper that attempts to handle multi-criteria test-suite minimization

problems. The approach models multi-criteria minimization as a binary ILP problem and then

leverages ILP solvers to compute an optimal solution to such problems.

Note the difference and the relation between minimization criteria and minimization policies.

Keyword

test-suite minimization

Abstract

Background

Test-suite minimization techniques aim to eliminate redundant test cases from a test-suite based on

some criteria, such as coverage or fault-detection capability.

Motivation

Most existing test-suite minimization techniques have two main limitations: they perform

minimization based on a single criterion and produce suboptimal solutions.

Solution

In this paper, we propose a test-suite minimization framework that overcomes these limitations by

allowing testers to (1) easily encode a wide spectrum of test-suite minimization problems, (2)

handle problems that involve any number of criteria, and (3) compute optimal solutions by

leveraging modern integer linear programming solvers.
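The binary ILP formulation can be illustrated on a toy instance: one 0/1 variable per test case and a coverage constraint per statement, minimizing the number of selected tests. Brute-force enumeration over assignments stands in for the ILP solvers MINTS actually delegates to; the coverage data is hypothetical:

```python
# Toy instance of single-criterion test-suite minimization as a binary ILP:
# choose a 0/1 value per test so every statement stays covered, minimizing
# the number of selected tests. Brute force replaces a real ILP solver.
from itertools import product

coverage = {          # hypothetical: test -> statements it covers
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {1, 4, 5},
    "t4": {2, 5},
}
statements = set().union(*coverage.values())
tests = sorted(coverage)

best = None
for bits in product([0, 1], repeat=len(tests)):      # all 0/1 assignments
    chosen = [t for t, b in zip(tests, bits) if b]
    covered = set().union(*(coverage[t] for t in chosen))
    if covered == statements and (best is None or len(chosen) < len(best)):
        best = chosen                                 # feasible and smaller
```

Multi-criteria versions add further constraint or objective families (e.g. keeping fault-detection capability above a bound) over the same 0/1 variables, which is what MINTS encodes and hands to a solver.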


Contribution

1. A general test-suite minimization framework that handles minimization problems involving

any number of criteria and can produce optimal solutions to such problems.

2. A prototype tool that implements the framework, can interface seamlessly with a number of

different ILP solvers, and is freely available.

3. An empirical study in which we evaluate the approach using a wide range of programs, test

cases, minimization problems, and solvers.

Evaluation

In the evaluation, the authors investigated the following research questions:

1. How often can MINTS find an optimal solution for a test-suite minimization problem in a

reasonable time?

2. How does the performance of MINTS compare with the performance of a heuristic

approach?

3. To what extent does the use of a specific solver affect the performance of the approach?

Note that the authors consider one absolute minimization criterion and three relative minimization

criteria. The authors also consider eight different minimization policies: seven weighted and one

prioritized.

The benchmark is the Siemens suite and three additional programs with real faults: flex,

LogicBlox, and Eclipse.


Total Pages Value Understanding Last Read

11 Normal Normal 2010.10.10

Question Result Validation

Method/Means Technique Analysis

[WHLM95] Effect of test set minimization on fault detection

effectiveness

Wong, W. E., Horgan, J. R., London, S., and Mathur, A. P. 1995. Effect of test set minimization on

fault detection effectiveness. In Proceedings of the 17th international Conference on Software

Engineering (Seattle, Washington, United States, April 24 - 28, 1995). ICSE '95. ACM, New York,

NY, 41-50.

Annotation

Keyword

Abstract

Background

Size and code coverage are important attributes of a set of tests.

Motivation

A program P is executed on elements of the test set T. Can we observe the fault detecting

capability of T for P? Which T induces code coverage on P according to some coverage criterion?

Is it the size of T or the coverage of T on P that determines the fault detection

effectiveness of T for P?

While keeping coverage constant, what is the effect on fault detection of reducing the size of a test

set?


Solution

We report results from an empirical study using the block and all-uses criteria as the coverage

measures.

Contribution

Evaluation

Total Pages Value Understanding Last Read

Question Result Validation

Method/Means, Evaluation, Characterization | Technique, Analytic Model | Analysis, Persuasion, Experience

Test Adequacy Criterion

Mutant Testing

[LJT+10] Is operator-based mutant selection superior to

random mutant selection?

Zhang, L., Hou, S., Hu, J., Xie, T., and Mei, H. 2010. Is operator-based mutant selection superior

to random mutant selection?. In Proceedings of the 32nd ACM/IEEE international Conference on

Software Engineering - Volume 1 (Cape Town, South Africa, May 01 - 08, 2010). ICSE '10. ACM,

New York, NY, 435-444.


Annotation

Mutant selection is used to reduce the expense of compiling and executing too many mutants. Much

research on mutant selection is operator-based. This paper addresses the issue of whether

operator-based mutant selection is really superior to selection using random methods. Through an

empirical study of three operator-based mutant selection techniques (i.e., Offutt et al.'s 5

mutation operators [31], Barbosa et al.'s 10 mutation operators [4], and Siami Namin et al.'s 28

mutation operators [37]) and two random ones, the research indicates that operator-based mutant

selection is not superior.

Keyword

Abstract

Background

Due to the expensiveness of compiling and executing a large number of mutants, it is usually

necessary to select a subset of mutants to substitute the whole set of generated mutants in mutation

testing and analysis. Most existing research on mutant selection focused on operator-based mutant

selection, i.e., determining a set of sufficient mutation operators and selecting mutants generated

with only this set of mutation operators. Recently, researchers began to leverage statistical analysis

to determine sufficient mutation operators using execution information of mutants.

Motivation

However, whether mutants selected with these sophisticated techniques are superior to randomly

selected mutants remains an open question.

Solution

In this paper, we empirically investigate this open question by comparing three representative

operator-based mutant-selection techniques with two random techniques. Our empirical results

show that operator-based mutant selection is not superior to random mutant selection. These

results also indicate that random mutant selection can be a better choice and mutant selection on

the basis of individual mutants is worthy of further investigation.

Contribution

Our study empirically evaluates three recent operator-based mutant-selection techniques (i.e.,


Offutt et al. [31], Barbosa et al. [4], and Siami Namin et al. [37]) against random mutant selection

for mutation testing.

Our study produces the first empirical results concerning stability of operator-based mutant

selection and random mutant selection for mutation testing.

Beside the random technique studied previously (referred to as the one-round random technique in

this paper), our study also investigates another random technique involving two steps to select

each mutant (referred to as the two-round random technique in this paper).
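The two random strategies can be sketched as follows. The mutant pool is hypothetical, and the two-round definition here (pick an operator uniformly, then one of its mutants) is our reading of the paper:

```python
# Sketch of the two random mutant-selection strategies compared in the
# paper: one-round samples uniformly from the whole mutant pool; two-round
# first picks a mutation operator, then a mutant of that operator, so
# mutants of rare operators are more likely to be chosen. Pool is invented.
import random

pool = {  # operator -> mutants it generated (hypothetical)
    "AOR": [f"aor_{i}" for i in range(50)],  # arithmetic operator replacement
    "ROR": [f"ror_{i}" for i in range(10)],  # relational operator replacement
    "ABS": ["abs_0"],                        # absolute value insertion
}

def one_round(pool, n, rng):
    all_mutants = [m for ms in pool.values() for m in ms]
    return rng.sample(all_mutants, n)

def two_round(pool, n, rng):
    selected = set()
    while len(selected) < n:
        op = rng.choice(sorted(pool))       # round 1: pick an operator
        selected.add(rng.choice(pool[op]))  # round 2: pick one of its mutants
    return sorted(selected)

rng = random.Random(0)
sample_a = one_round(pool, 5, rng)
sample_b = two_round(pool, 5, rng)
```

Under one-round selection, "abs_0" is picked with probability 5/61 in a 5-mutant sample; under two-round selection it is picked far more often, since its operator is chosen one time in three.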

The subjects used in our study are larger than those used in previous studies of random mutant

selection. To the best of our knowledge, due to the extreme expense of experimenting with

mutant-selection techniques, the Siemens programs are by far the largest subjects used in studies

of mutant selection [37].

Evaluation

Total Pages Value Understanding Last Read

10 High Normal 2010.09.26

Question Result Validation

Characterization Analytic Model Analysis

[ST10] From behaviour preservation to behaviour

modification: constraint-based mutant generation

Annotation

This paper presents a mutant generation approach that generates mutants which are both

syntactically and semantically correct. The authors build this approach from several

constraint-based methods: using accessibility constraints, the introduction or deletion of

entities, and type constraints, the approach not only generates mutants but also rejects mutants.

The authors also applied this technique to several open-source programs, such as JUnit, JHotDraw,

Draw2D, Jaxen and HTMLParser.

Keyword

Mutation Analysis


Abstract

Background

This paper is about mutation generation. The authors’ approach builds on their prior work on

constraint-based refactoring tools, and works by negating behaviour-preserving constraints.

Motivation

The efficacy of mutation analysis depends heavily on its capability to mutate programs in such a

way that they remain executable and exhibit deviating behaviour. Whereas the former requires

knowledge about the syntax and static semantics of the programming language, the latter requires

at least some understanding of its dynamic semantics, i.e., how expressions are evaluated.

Solution

We present an approach that is knowledgeable enough to generate only mutants that are both

syntactically and semantically correct and likely exhibit non-equivalent behaviour.

Evaluation

As a proof of concept we present an enhanced implementation of the Access Modifier Change

operator for Java programs whose naive implementations create huge numbers of mutants that do

not compile or leave behaviour unaltered. While we cannot guarantee that our generated mutants

are non-equivalent, we can demonstrate a considerable reduction in the number of vain mutant

generations, leading to substantial temporal savings.

Total Pages Value Understanding Last Read

10 High Normal 2010.09.26

Question Result Validation

Method/Means Technique Analysis, Persuasion

[JH09] An analysis and survey of the development of mutation

testing

Yue Jia, Mark Harman (September 2009). "An Analysis and Survey of the Development of


Mutation Testing" (PDF). CREST Centre, King's College London, Technical Report TR-09-06.

Annotation

An overview of Mutation Testing is given in the paper. Beyond the basic notions, the authors also

introduce the history and the applications of Mutation Testing. In the second part, the

fundamental hypotheses, the process, and the problems in theoretical research are discussed. In the

third part, techniques in Mutation Testing are classified into two types: reduction of the

generated mutants (corresponding to "do fewer") and reduction of the execution cost (corresponding

to "do faster"). Detecting whether a program and one of its mutant programs are equivalent is a

known undecidable problem; this problem is discussed in Part 4. In the fifth part, the authors

classify the applications of mutation testing into program mutation and specification mutation,

and more detailed statistics are shown. In Parts 6 and 7, empirical evaluations and tools using

mutation testing are gathered and listed. In the last part, the authors identify five important

avenues for research: a need for high quality higher order mutants, a need to reduce the

equivalent mutant problem, a preference for semantics over syntax, an interest in achieving a

better balance between cost and value, and a pressing need to generate test cases to kill mutants.
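A minimal "program mutation" example in the survey's sense: a mutation operator that replaces `<=` with `<`, and a boundary test case that kills the resulting mutant. The operator and the subject function are invented for illustration:

```python
# Toy mutation-testing example: a relational-operator-replacement mutation
# operator implemented as an AST transform, plus a test input that kills
# the mutant (its outcome differs between original and mutant).
import ast

SRC = "def at_most_three(x):\n    return x <= 3\n"

class FlipLE(ast.NodeTransformer):
    """Mutation operator: replace every `<=` with `<`."""
    def visit_Compare(self, node):
        node.ops = [ast.Lt() if isinstance(op, ast.LtE) else op
                    for op in node.ops]
        return node

def compile_fn(src, transform=None):
    tree = ast.parse(src)
    if transform:
        tree = ast.fix_missing_locations(transform.visit(tree))
    ns = {}
    exec(compile(tree, "<mutant>", "exec"), ns)
    return ns["at_most_three"]

original = compile_fn(SRC)
mutant = compile_fn(SRC, FlipLE())

# The boundary input x == 3 kills this mutant; x == 2 does not.
killed = original(3) != mutant(3)
```

A test suite that never exercises the boundary leaves this mutant alive, which is exactly the adequacy signal mutation testing provides.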

Keyword

mutation testing

Abstract

Outline

1. Introduction

2. The theory of mutation testing

3. Cost reduction techniques

4. Equivalent mutant detection techniques

5. The application of mutation testing

6. Empirical evaluation

7. Tools supporting mutation testing

8. Future Trend

9. Conclusion

Background

Mutation Testing is a fault-based software testing technique that has been widely studied for over

three decades.


Motivation

The literature on Mutation Testing has contributed a set of approaches, tools, developments and

empirical results which have not been surveyed in detail until now.

Solution

This paper provides a comprehensive analysis and survey of Mutation Testing. The paper also

presents the results of several development trend analyses.

Evaluation

These analyses provide evidence that Mutation Testing techniques and tools are reaching a state of

maturity and applicability, while the topic of Mutation Testing itself is the subject of increasing

interest.

Total Pages Value Understanding Last Read

32 High Well 2010.09.27

Question Result Validation

Characterization Analytic Model Analysis, Experience

High-Dimensional Clustering

[HK99] Optimal Grid-Clustering: Towards Breaking the Curse

of Dimensionality in High-Dimensional Clustering

Alexander Hinneburg , Daniel A. Keim, Optimal Grid-Clustering: Towards Breaking the Curse of

Dimensionality in High-Dimensional Clustering, Proceedings of the 25th International Conference

on Very Large Data Bases, p.506-517, September 07-10, 1999


Annotation

Keyword

High-Dimensional Clustering

Abstract

Background

Many applications require the clustering of large amounts of high-dimensional data. In addition,

the high-dimensional data often contains a significant amount of noise which causes additional

effectiveness problems.

Motivation

The comparison reveals that condensation-based approaches (such as BIRCH or STING) are the

most promising candidates for achieving the necessary efficiency, but it also shows that basically

all condensation-based approaches have severe weaknesses with respect to their effectiveness in

high-dimensional space.

Solution

To overcome these problems, we develop a new clustering technique called OptiGrid which is

based on constructing an optimal grid-partitioning of the data. The optimal grid-partitioning is

determined by calculating the best partitioning hyperplanes for each dimension (if such a

partitioning exists) using certain projections of the data.
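The core idea can be sketched in one dimension: project the data, estimate density with a histogram, and cut at a low-density boundary between two density maxima. This is a heavy simplification of OptiGrid, which uses kernel density estimates and general projections; the data and bin count here are hypothetical:

```python
# Much-simplified sketch of OptiGrid's cutting-plane selection: histogram a
# 1-D projection of the data and cut at the lowest-density bin boundary,
# so the cut separates density maxima without splitting a cluster.

def best_cut(values, bins=10):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    # choose the interior bin boundary with the lowest local density
    best_i = min(range(1, bins), key=lambda i: counts[i - 1] + counts[i])
    return lo + best_i * width, counts

# two well-separated clusters on one projected axis (hypothetical data)
data = [1.0, 1.2, 1.1, 0.9, 1.05, 5.0, 5.2, 4.9, 5.1, 5.05]
cut, counts = best_cut(data)
```

In the full algorithm such cuts are computed per dimension, combined into a grid, and only then are the densely populated grid cells kept as clusters, which is what sidesteps the noise problem in high-dimensional space.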

Evaluation

We perform a series of experiments on a number of different data sets from CAD and molecular

biology. A comparison with one of the best known algorithms (BIRCH) shows the superiority of

our new approach.

Total Pages Value Understanding Last Read

12 High Bad 2010.10.23

Question Result Validation


Method/Means, Evaluation Technique Analysis

[KKZ09] Clustering high-dimensional data: A survey on

subspace clustering, pattern-based clustering, and

correlation clustering.

Kriegel, H., Kröger, P., and Zimek, A. 2009. Clustering high-dimensional data: A survey on

subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. Knowl.

Discov. Data 3, 1 (Mar. 2009), 1-58.

Annotation

Keyword

Abstract

Outline

INTRODUCTION

a) Sample Applications of Clustering High-Dimensional Data

i. Gene Expression Analysis.

ii. Metabolic Screening.

iii. Customer Recommendation Systems.

iv. Text Documents.

b) Finding Clusters in High-Dimensional Data

i. The main challenge for clustering here is that different subsets of features are

relevant for different clusters, that is, the objects cluster in subspaces of the data

space but the subspaces of the clusters may vary.

ii. A common way to overcome problems of high-dimensional data spaces where

several features are correlated or only some features are relevant is to perform

feature selection before performing any other data mining task.


iii. Unfortunately, such feature selection or dimensionality reduction techniques cannot

be applied to clustering problems.

iv. Instead of a global approach to feature selection, a local approach accounting for

the local feature relevance and/or local feature correlation problems is required.

Background

As a prolific research area in data mining, subspace clustering and related problems induced a vast

quantity of proposed solutions.

Motivation

However, many publications compare a new proposition—if at all—with one or two competitors,

or even with a so-called “naïve” ad hoc solution, but fail to clarify the exact problem definition.

As a consequence, even if two solutions are thoroughly compared experimentally, it will often

remain unclear whether both solutions tackle the same problem or, if they do, whether they agree

in certain tacit assumptions and how such assumptions may influence the outcome of an

algorithm.

Solution

In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering

in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying

assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how

several prominent solutions tackle different problems.

Evaluation

Total Pages Value Understanding Last Read

58 High Normal 2010.10.24

Question Result Validation

Characterization Analytic Model Persuasion