Cooperative Testing and Analysis:

Human-Tool, Tool-Tool, and Human-Human Cooperations to Get the Job Done

Tao Xie

North Carolina State University, Raleigh, NC, USA

Turing Test: Tell Machine and Human Apart

Human vs. Machine: Machine Better Than Human?

IBM's Deep Blue defeated chess champion Garry Kasparov in 1997

IBM Watson defeated top human Jeopardy! players in 2011

CAPTCHA: Human is Better

"Completely Automated Public Turing test to tell Computers and Humans Apart"

Human Computer Interaction

Movie: Minority Report

CNN News

iPad

Human-Centric Software Engineering

…

Automation in Software Testing

2010 Dagstuhl Seminar 10111

Practical Software Testing: Tool Automation and Human Factors

http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011







Automated Test Generation

Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing

Instrument code to explore feasible paths

Example tool: Pex from Microsoft Research (for .NET programs)

Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In Proc. PLDI 2005

Koushik Sen, Darko Marinov, and Gul Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005

Nikolai Tillmann and Jonathan de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008

Dynamic Symbolic Execution

Code to generate inputs for:

    void CoverMe(int[] a)
    {
        if (a == null) return;
        if (a.Length > 0)
            if (a[0] == 1234567890)
                throw new Exception("bug");
    }

Constraints to solve                           Data      Observed constraints
(none)                                         null      a==null
a!=null                                        {}        a!=null && !(a.Length>0)
a!=null && a.Length>0                          {0}       a!=null && a.Length>0 && a[0]!=1234567890
a!=null && a.Length>0 && a[0]==1234567890      {123…}    a!=null && a.Length>0 && a[0]==1234567890

Branch conditions along the paths (each with T/F outcomes): a==null, a.Length>0, a[0]==123…

Loop: Choose next path, Solve, Execute&Monitor

Negated condition: each constraint set to solve is the previously observed path with its last condition negated

Done: There is no path left.
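To make the table concrete, the following is a minimal sketch that replays the four inputs from the Data column against CoverMe and prints what each run does. The inputs are written by hand here; the sketch does not perform constraint solving, and the class name CoverMeDemo and the helper Run are hypothetical names introduced only for illustration.

```csharp
using System;

public static class CoverMeDemo
{
    // Same logic as CoverMe on the slide, made static so it can be called directly.
    public static void CoverMe(int[] a)
    {
        if (a == null) return;
        if (a.Length > 0)
            if (a[0] == 1234567890)
                throw new Exception("bug");
    }

    // Hypothetical helper: runs one input and reports the observed outcome.
    public static string Run(int[] a)
    {
        try
        {
            CoverMe(a);
            return "returned";
        }
        catch (Exception e)
        {
            return "threw: " + e.Message;
        }
    }

    public static void Main()
    {
        // One input per feasible path, in the order of the table.
        int[][] inputs =
        {
            null,
            new int[] { },
            new int[] { 0 },
            new int[] { 1234567890 },
        };

        foreach (int[] input in inputs)
            Console.WriteLine(Run(input));
    }
}
```

Only the last input reaches the throw statement, which is why a tool has to solve the full conjunction of the three conditions to find it.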

Automating Test Generation

Method sequences: MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]

Environments (e.g., db, file systems, network, …): DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]; CloudApp Testing [Zhang et al. IEEE Soft 12]

Loops: Fitnex [Xie et al. DSN 09]

Code evolution: eXpress [Taneja et al. ISSTA 11]

@NCSU ASE

Pex on MSDN DevLabs: Incubation Project for Visual Studio

Download counts (20 months, Feb. 2008 - Oct. 2009)

Academic: 17,366 Devlabs: 13,022 Total: 30,388

http://research.microsoft.com/projects/pex/

Open Source Pex extensions: http://pexase.codeplex.com/

Publications: http://research.microsoft.com/en-us/projects/pex/community.aspx#publications

State-of-the-Art/Practice Testing Tools

Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…

Challenges Faced by Test Generation Tools

object-creation problems (OCP) - 65%

external-method call problems (EMCP) - 27%

Total block coverage achieved is 50%, lowest coverage 16%.

Example: Dynamic Symbolic Execution/Concolic Testing

Instrument code to explore feasible paths

Challenge: path explosion

Example Object-Creation Problem

A graph example from QuickGraph library

Includes two classes: Graph and DFSAlgorithm

Graph provides AddVertex and AddEdge; AddEdge requires both vertices to be in graph

    00: class Graph : IVEListGraph { …
    03:   public void AddVertex (IVertex v) {
    04:     vertices.Add(v); // B1
          }
    06:   public Edge AddEdge (IVertex v1, IVertex v2) {
    07:     if (!vertices.Contains(v1))
    08:       throw new VNotFoundException("");
    09:     // B2
    10:     if (!vertices.Contains(v2))
    11:       throw new VNotFoundException("");
    12:     // B3
    14:     Edge e = new Edge(v1, v2);
    15:     edges.Add(e);
          }
        }

    //DFS:DepthFirstSearch
    18: class DFSAlgorithm { …
    23:   public void Compute (IVertex s) { ...
    24:     if (graph.GetEdges().Size() > 0) { // B4
    25:       isComputed = true;
    26:       foreach (Edge e in graph.GetEdges()) {
    27:         ... // B5
    28:       }
    29:     }
          }
        }

[Thummalapenta et al. OOPSLA 11]

Test target: Cover true branch (B4) of Line 24

Desired object state: graph should include at least one edge

Target sequence:

    Graph ag = new Graph();
    Vertex v1 = new Vertex(0);
    Vertex v2 = new Vertex(1);
    ag.AddVertex(v1);
    ag.AddVertex(v2);
    ag.AddEdge(v1, v2);
    DFSAlgorithm algo = new DFSAlgorithm(ag);
    algo.Compute(v1);



Example External-Method Call Problems (EMCP)

Example 1: File.Exists has data dependencies on program input. The subsequent branch at Line 1 uses the return value of File.Exists.

Example 2: Path.GetFullPath has data dependencies on program input. Path.GetFullPath throws exceptions.

Example 3: String.Format does not cause any problem.

Human Can Help! Object-Creation Problems (OCP)

Tackle object-creation problems with Factory Methods
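A factory method written by a developer can hand the tool an object that is already in the desired state. The sketch below builds a graph with one edge, following the target sequence from the QuickGraph example earlier in the deck. Vertex, Graph, and GraphFactory are simplified, hypothetical stand-ins written for this illustration; they are not the QuickGraph classes, and the sketch does not use any Pex API.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-ins for the graph classes (hypothetical, not QuickGraph).
public sealed class Vertex
{
    public int Id { get; }
    public Vertex(int id) { Id = id; }
}

public sealed class Graph
{
    private readonly List<Vertex> vertices = new List<Vertex>();
    private readonly List<(Vertex, Vertex)> edges = new List<(Vertex, Vertex)>();

    public int EdgeCount => edges.Count;

    public void AddVertex(Vertex v) => vertices.Add(v);

    public void AddEdge(Vertex v1, Vertex v2)
    {
        // AddEdge requires both vertices to be in the graph already.
        if (!vertices.Contains(v1)) throw new InvalidOperationException("v1 not in graph");
        if (!vertices.Contains(v2)) throw new InvalidOperationException("v2 not in graph");
        edges.Add((v1, v2));
    }
}

public static class GraphFactory
{
    // Hand-written factory: encodes the method sequence the tool could not find.
    public static Graph CreateGraphWithEdge(int id1, int id2)
    {
        var graph = new Graph();
        var v1 = new Vertex(id1);
        var v2 = new Vertex(id2);
        graph.AddVertex(v1);
        graph.AddVertex(v2);
        graph.AddEdge(v1, v2);
        return graph;
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        Graph g = GraphFactory.CreateGraphWithEdge(0, 1);
        Console.WriteLine(g.EdgeCount); // number of edges in the prepared graph
    }
}
```

The factory takes only primitive arguments, so a test generator that is good at choosing integers but poor at composing method sequences can still reach the non-empty-graph state.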

Human Can Help! External-Method Call Problems (EMCP)

Tackle external-method call problems with Mock Methods or Method Instrumentation

Mocking System.IO.File.ReadAllText
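One plain way to mock a file read, shown here as a sketch rather than as the mechanism used on the slide, is to pass the read operation in as a delegate so that a test can substitute a method that never touches the file system. ConfigReader, IsEnabled, and the file name settings.txt are hypothetical names chosen for this example.

```csharp
using System;

public static class ConfigReader
{
    // The file read is injected; production code would pass System.IO.File.ReadAllText.
    public static bool IsEnabled(string path, Func<string, string> readAllText)
    {
        string content = readAllText(path);
        return content.Trim() == "enabled";
    }
}

public static class MockDemo
{
    public static void Main()
    {
        // Mock method: returns a fixed value instead of reading from disk.
        Func<string, string> mockRead = path => "enabled\n";
        Console.WriteLine(ConfigReader.IsEnabled("settings.txt", mockRead));
    }
}
```

With the external call replaced, the branch on the file content depends only on a value the test controls, so both outcomes can be covered.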


State-of-the-Art/Practice Testing Tools


Tools Typically Don’t Communicate Challenges Faced by Them to Enable Cooperation between Tools and Users


Bigger Picture

Machine is better at task set A: mechanical, tedious, repetitive tasks, … Ex. solving constraints along a long path

Human is better at task set B: intelligence, human intent, abstraction, domain knowledge, … Ex. local reasoning after a loop, recognizing naming semantics

= A U B

Cooperation Between Human and Machine

Human-Assisted Computing: Driver: tool; Helper: human; Ex. Covana [Xiao et al. ICSE 2011]

Human-Centric Computing: Driver: human; Helper: tool; Ex. Coding duels @Pex for Fun

Interfaces are important. Contents are important too!

Human-Assisted Computing: Motivation

Tools are often not powerful enough

Human is good at some aspects that tools are not

What difficulties does the tool face?

How to communicate info to the user to get help?

How does the user help the tool based on the info?

Iterations to form Feedback Loop


Difficulties Faced by Automated-Structural-Test-Generation Tools

external-method call problems (EMCP)

object-creation problems (OCP)


Existing Solution of Problem Identification

Existing solution: identify all executed external-method calls; report all object types of program inputs and fields

Limitations: the number is often high; some identified problems are irrelevant for achieving higher structural coverage

DSE Challenges - Preliminary Study

Reported EMCPs: 44 vs. Real EMCPs: 0

Reported OCPs: 18 vs. Real OCPs: 5

Proposed Approach: Covana

Goal: Precisely identify problems faced by tools when achieving structural coverage

Insight: Partially-Covered Statements have data dependency on real problem candidates


[Xiao et al. ICSE 11]

Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011

Overview of Covana

[Figure: Covana workflow, with labels Program, Generated Test Inputs, Forward Symbolic Execution, Runtime Events, Problem Candidate Identification, Problem Candidates, Runtime Information, Coverage, Data Dependence Analysis, Identified Problems]

Problem Candidate Identification

Data Dependencies


External-method calls whose arguments have data dependencies on program inputs

Data Dependence Analysis

Symbolic Expression: return(File.Exists) == true

Element of EMCP Candidate: return(File.Exists)

Branch Statement Line 1 has data dependency on File.Exists at Line 1

Partially-covered branch statements have data dependencies on EMCP candidates for return values

Evaluation – Subjects and Setup

Subjects:

xUnit: unit testing framework for .NET (223 classes and interfaces with 11.4 KLOC)

QuickGraph: C# graph library (165 classes and interfaces with 8.3 KLOC)

Evaluation setup:

Apply Pex to generate tests for program under test

Feed the program and generated tests to Covana

Compare existing solution and Covana

Evaluation – Research Questions

RQ1: How effective is Covana in identifying the two main types of problems, EMCPs and OCPs?

RQ2: How effective is Covana in pruning irrelevant problem candidates of EMCPs and OCPs?


Evaluation – RQ1: Problem Identification

Covana identifies

• 43 EMCPs with only 1 false positive and 2 false negatives

• 155 OCPs with 20 false positives and 30 false negatives

Evaluation – RQ2: Irrelevant-Problem-Candidate Pruning

Covana prunes

• 97% (1567 in 1610) EMCP candidates with 1 false positive and 2 false negatives

• 66% (296 in 451) OCP candidates with 20 false positives and 30 false negatives


Microsoft Research Pex for Fun: Teaching and Learning CS via Social Gaming

1,126,136 clicked 'Ask Pex!'

www.pexforfun.com

Contributed the concept of Coding Duel games, a major game type of Pex for Fun since Summer 2010

N. Tillmann, J. De Halleux, T. Xie, S. Gulwani and J. Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013, Software Engineering Education (SEE), 2013.

Behind the Scene of Pex for Fun

Secret Implementation

    class Secret {
      public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
      }
    }

Player Implementation

    class Player {
      public static int Puzzle(int x) {
        return x;
      }
    }

    class Test {
      public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
          throw new Exception("Mismatch");
      }
    }

behavior: Secret Impl == Player Impl
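The comparison between the two implementations can be tried without Pex. The sketch below swaps in a brute-force scan over a small input range in place of dynamic symbolic execution, so it only illustrates the win criterion (no input on which the implementations disagree). The Duel class and FindMismatch are hypothetical names introduced for this example.

```csharp
using System;

public static class Secret
{
    public static int Puzzle(int x)
    {
        if (x <= 0) return 1;
        return x * Puzzle(x - 1);
    }
}

public static class Player
{
    public static int Puzzle(int x)
    {
        return x;
    }
}

public static class Duel
{
    // Returns the first input in [lo, hi] where the implementations disagree, or null if none.
    public static int? FindMismatch(int lo, int hi)
    {
        for (int x = lo; x <= hi; x++)
            if (Secret.Puzzle(x) != Player.Puzzle(x))
                return x;
        return null;
    }

    public static void Main()
    {
        int? witness = FindMismatch(-3, 10);
        Console.WriteLine(witness.HasValue
            ? "Mismatch at x = " + witness.Value
            : "No mismatch found in range");
    }
}
```

A reported mismatch plays the role of the feedback a player receives: it is a concrete input on which the player's guess differs from the secret, which the player can use to revise the implementation.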


Human-Centric Computing

Coding duels at http://www.pexforfun.com/

Brain exercising/learning while having fun

Fun: iterative, adaptive/personalized, w/ win criterion

Abstraction/generalization, debugging, problem solving

Coding Duel Competition @ICSE 2011

http://pexforfun.com/icse2011



Coding Duels for Automatic Grading @NCSU CSC 510

Especially valuable in Massive Open Online Courses (MOOC)

http://pexforfun.com/gradsofteng



Human-Human Cooperation: Pex for Fun (Crowdsourcing)

    class Secret {
      public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
      }
    }

Everyone can contribute over the Internet:

Coding duels

Duel solutions

Access Control Policy (ACP)

ACP includes rules to control which principals have access to which resources

A policy rule includes four elements:

subject – HCP

action – edit

resource – patient's account

effect – deny

ex. “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”
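The four elements of a policy rule map naturally onto a small data type. The following is a minimal sketch of that idea; PolicyRule, Matches, and the Permit value of the Effect enum are assumptions made for illustration (the slide only shows a deny rule), and this is not the representation used by any particular tool.

```csharp
using System;

// Permit is an assumed counterpart to the deny effect shown on the slide.
public enum Effect { Permit, Deny }

public sealed class PolicyRule
{
    public string Subject { get; }
    public string Action { get; }
    public string Resource { get; }
    public Effect Effect { get; }

    public PolicyRule(string subject, string action, string resource, Effect effect)
    {
        Subject = subject;
        Action = action;
        Resource = resource;
        Effect = effect;
    }

    // True when this rule applies to the given access request.
    public bool Matches(string subject, string action, string resource) =>
        Subject == subject && Action == action && Resource == resource;
}

public static class AcpDemo
{
    public static void Main()
    {
        // "The HCP does not have the ability to edit the patient's account."
        var rule = new PolicyRule("HCP", "edit", "patient.account", Effect.Deny);
        bool applies = rule.Matches("HCP", "edit", "patient.account");
        Console.WriteLine(applies ? rule.Effect.ToString() : "not applicable");
    }
}
```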

Objectives

How to ensure correct specification of ACPs?

ACPs may be complex/error-prone to specify

ACPs are often written in natural language (NL)

How to ensure correct enforcement of ACPs?

Gap between ACPs (domain concepts) and system implementation (programming concepts)

Functional requirements bridge the gap but are often written in NL

[Figure: conformance among NL ACPs, NL Functional Requirement, and System Implementation]

NCSU/NIST Access Control Policy Test Tool (ACPT)

http://csrc.nist.gov/groups/SNS/acpt/index.html

Model Construction: specify and combine access control (AC) models (e.g., Multi-Level, RBAC)

Model Verification: verify AC models against given properties

Implementation Testing: test AC implementation with NIST ACTS

XACML Synthesis

~130 organizations/users: DISA, DOE Fermi Lab, SAIC, NOAA, Rosssampson Corporation, Johns Hopkins U, Inventure Enterprises, …



ACP in NL Documents

In practice, ACPs are often written in natural language (NL), especially in legacy systems

Supposed to be written in non-functional requirements (e.g., security requirement)

But often buried inside functional requirements

ex. “…… Patient MID should be the number assigned when the patient is added to the system and cannot be edited. The HCP does not have the ability to edit the patient's security question and password. ……” (UC1 of iTrust use cases)

http://agile.csc.ncsu.edu/iTrust/wiki/doku.php



Example Extraction of ACPs

“The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”

ACP Extraction

Access Control Policy

Subject    Action    Resource           Effect
HCP        edit      patient.account    deny

Functional Requirements – Use Cases

Scenario-based functional requirements: a use case is a sequence of action steps, describing how principals access different resources to achieve some functionalities

ex. The patient views access log.

Resource access information: subject – patient; action – view; resource – access log

Inconsistencies in Functional Requirements

Validate to detect inconsistencies of action steps with formalized/extracted ACPs, in terms of inconsistent names used for referring to the same entity (e.g., user) across different use cases

ex. enterer/editor used in UC 4 of iTrust use cases actually refers to admin and LHCP users.

“An admin creates a LHCP, an ER, a Laboratory Technician (LT), or a public health agent (PHA) [S1]. A LHCP creates [S2] UAPs. Once entered, the enterer/editor is presented a screen of the input to approve [E2].”

Technical Challenges (TC) in Policy Extraction

TC1: Semantic Structure Variance: different ways to specify the same rule

TC2: Negative Meaning Implicitness: verb could have negative meaning

ACP 1: An HCP cannot change patient’s account.

ACP 2: An HCP is disallowed to change patient’s account.
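The two ACP sentences express the same deny rule, once through an explicit negation ("cannot") and once through a verb that carries the negative meaning ("disallowed"). The sketch below is a deliberately naive keyword heuristic, shown only to illustrate why both forms must map to the same effect; it is not the technique used by Text2Policy, and the cue-word list and the names NegationHeuristic and InferEffect are assumptions made for this example.

```csharp
using System;
using System.Linq;

public static class NegationHeuristic
{
    // Illustrative cue words only; not a list taken from any tool.
    private static readonly string[] NegativeCues = { "cannot", "not", "disallowed", "prohibited" };

    // Returns "deny" if any word of the sentence is a negative cue, otherwise "permit".
    public static string InferEffect(string sentence)
    {
        string[] words = sentence.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);
        return words.Any(w => NegativeCues.Contains(w)) ? "deny" : "permit";
    }

    public static void Main()
    {
        Console.WriteLine(InferEffect("An HCP cannot change patient's account."));
        Console.WriteLine(InferEffect("An HCP is disallowed to change patient's account."));
    }
}
```

A word list like this breaks down quickly on real requirements text, which is one reason the challenge calls for semantic analysis rather than keyword matching.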

Technical Challenges in Action-step Extraction

TC3: Anaphora

TC4: Transitive Subject

TC5: Perspective Variance

These challenges apply when extracting ACPs from Functional Requirements

Step 1: An HCP creates an account.

Step 2: He edits the account. (He refers to HCP)

Step 3: The system updates the account. (transitive subject: HCP)

Step 4: The system displays the updated account. (from the user's perspective: HCP views the updated account.)

Text Analytics for Security – Text2Policy

Ensure correct specification: automatically extract ACPs from NL documents

Ensure correct enforcement: automatically extract action steps from NL use cases

New Natural Language Processing (NLP) techniques

syntactic analysis: extract syntactic structure (noun group, verb group)

semantic analysis: extract semantic meaning of elements (e.g., subject, action, resource, and effect)

[FSE 2012]

http://people.engr.ncsu.edu/txie/publications.htm#fse12-nlp

Summary: Cooperative Testing and Analysis

Human-Assisted Computing: Covana

Human-Centric Computing: Pex for Fun

Security Policy: NCSU/NIST ACPT, Text2Policy

Thank you!

Questions?

https://sites.google.com/site/asergrp/

Human-Human/Tool Cooperation: Performance Debugging in the Large

StackMine [Han et al. ICSE 12]

[Figure: StackMine workflow, with labels Internet, Trace collection, Trace Storage, Trace analysis, Pattern Matching, Problematic Pattern Repository, Bug filing, Bug update, Bug Database]

Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012

StackMine: Industry Impact

“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”

- from Development Manager in Windows

Highly effective new issue discovery on Windows mini-hang

Continuous impact on future Windows versions

Tool-Tool Cooperation

Static analysis + dynamic analysis: static checking + test generation, …

Dynamic analysis + static analysis: fix generation + fix validation, …

Static analysis + static analysis: …

Dynamic analysis + dynamic analysis: …

Example: Xiaoyin Wang, Lu Zhang, Tao Xie, Yingfei Xiong, and Hong Mei. Automating Presentation Changes in Dynamic Web Applications via Collaborative Hybrid Analysis. In Proc. FSE 2012
