10/06/2015 Dr Andy Brooks: MSc Software Maintenance, Lectures 27 & 28, Debugging with the Whyline tool

Post on 19-Dec-2015


18/04/23 Dr Andy Brooks

MSc Software Maintenance

Lectures 27 & 28: Debugging with the Whyline tool

http://www.cs.cmu.edu/~NatProg/whyline-java.html

Case Study

Reference: Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior, Andrew J. Ko and Brad A. Myers, ICSE '08, pp. 301-310, 2008. ©ACM

1. Introduction

• When program behaviour is incorrect, software engineers must think of questions to ask about the code. Often they simply guess.
– “Is this double increment caused by a typo somewhere, a ‘2’ perhaps instead of a ‘1’?”
• Studies have reported that initial guesses are wrong almost 90% of the time.
– The double increment was actually caused by faulty program logic which resulted in the incrementing method being called twice.
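The double-increment example can be made concrete with a small sketch. The Python below is purely illustrative and is not taken from the case study: the Counter class, the handle_input function and its two flags are hypothetical names, and the overlapping conditions are only an assumed stand-in for the "faulty program logic". It shows why guessing at a typo in the step size would lead nowhere: the step is 1, but the method runs twice.

```python
class Counter:
    """Hypothetical counter used only to illustrate the bug."""

    def __init__(self):
        self.value = 0
        self.calls = 0  # how many times increment() has run

    def increment(self):
        self.calls += 1
        self.value += 1  # the step really is 1: there is no '2' typo


def handle_input(counter, clicked, key_pressed):
    """Faulty logic (assumed for illustration): the two conditions overlap,
    so a single click satisfies both and increment() is called twice."""
    if clicked:
        counter.increment()
    if clicked or key_pressed:  # intended: elif key_pressed
        counter.increment()
```

A single click leaves the counter at 2 even though every individual increment adds 1, which is the situation where an initial guess about a typo is wrong.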

1. Introduction

• Breakpoint debuggers require the software engineer to choose a line of code.
– to examine program state at a particular time
• Slicing tools also require the software engineer to choose a seed variable or statement.
– to display all the code that has an influence
• If the wrong variable or wrong line of code is chosen then tool output can be irrelevant to solving the problem.
– garbage-in garbage-out
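To see why the choice of seed matters, here is a minimal sketch of a backward slice over a hand-written dependency map. It is an assumption-laden illustration, not how any particular slicing tool works: the deps dictionary and every name in it (line_color, status_text and so on) are hypothetical.

```python
def backward_slice(deps, seed):
    """Return the seed plus everything it transitively depends on.

    deps maps a statement/variable name to the names it depends on
    (a hypothetical, hand-written dependency map).
    """
    seen = set()
    stack = [seed]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(deps.get(name, ()))
    return seen


# Hypothetical dependencies in a small painting program.
deps = {
    "line_color": ["slider_r", "slider_g", "slider_b"],
    "line_width": ["width_field"],
    "status_text": ["clock"],
}
```

Slicing from line_color reaches the three sliders, whereas slicing from status_text returns only status_text and clock. If the bug concerns colour, the second seed gives output that is irrelevant to the problem, which is the garbage-in garbage-out point above.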

1. Introduction

Whyline

• A new kind of program understanding and debugging tool.
• Whyline allows the user to choose a why did or why didn't question about program output.
• Whyline then generates an answer to the question using various program analyses.
– static and dynamic slicing, precise call graphs, “new algorithms”
– chains of events as explanations
• Whyline works with Java programs that use standard Java I/O and that do not run “too long”.

2. An Example

Simple painting application

• The user demonstrates the program behaviour they want to inquire about (1).
• When the program halts, Whyline loads the trace. Using a time controller (2), the user finds the point in time they want to ask about.
• The user clicks on something of interest and questions pop up about it (3). The user selects a question.
• Whyline determines the responsible execution sequence and the user can select from a list of pop-up questions (4).
• Whyline determines the instantiation event (5) and the corresponding source code is shown (6).
• The call stack and locals at the time of the selected event are also shown (7).

[Figure 1. ©ACM. Slide annotations: “green not blue slider used”; “interactive debugging”]

2. An Example

• Using Whyline, time spent debugging was halved because:
– people do not have to guess search terms or understand the resulting matches
– people do not have to set breakpoints
• Using Whyline, people “simply pointed to something that they knew was relevant and wrong, and let the Whyline determine the related evidence”.

Watch the Whyline videos:
http://www.cs.cmu.edu/~NatProg/movies/whyline-java-demo-web.mov
http://www.cs.cmu.edu/~NatProg/movies/whyline-java-tutorial-web.mov
http://staff.unak.is/not/andy/MScMaintenance0809/WhyLine/whyline-java-demo-web.mov
http://staff.unak.is/not/andy/MScMaintenance0809/WhyLine/whyline-java-tutorial-web.mov

3.1 Recording an Execution Trace

• The Whyline takes a postmortem approach to debugging by capturing a trace.
• A trace stores Java source files, instrumented class files, sequences of events in each thread, and other types of metadata.
• Each thread has a separate trace file for its events.
• Currently 55 types of events are defined in the Whyline.
• Events include values after their header to help developers interpret program state.
– for an assignment event, the value assigned is included
– for an invocation event (a method invocation), values passed as arguments are included
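The shape of such a trace can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Event fields, the TraceRecorder class and the idea of one in-memory list per thread are hypothetical simplifications, not the Whyline's actual trace format (which is written to files and defines 55 event types).

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Event:
    event_id: int   # assumed: a single counter shared by all threads
    thread: str
    kind: str       # e.g. "assignment" or "invocation"
    target: str     # field or method name
    values: tuple   # values stored after the header


class TraceRecorder:
    """Hypothetical recorder keeping one event list per thread."""

    def __init__(self):
        self._ids = count()
        self.per_thread = {}

    def record(self, thread, kind, target, *values):
        event = Event(next(self._ids), thread, kind, target, values)
        self.per_thread.setdefault(thread, []).append(event)
        return event
```

For example, record("main", "assignment", "width", 251) keeps the assigned value, and record("main", "invocation", "drawLine", 0, 0, 10, 10) keeps the argument values, mirroring the two sub-bullets above.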

3.1 Recording an Execution Trace

• Unexecuted classes referenced by a dynamically loaded class are also saved as part of the trace to help answer why didn't questions.
– This is not applied recursively as this would “likely include all known classes”.

3.2 Loading a trace

• All source files and class files are loaded.
– used for almost every aspect of question derivation and answering
• Whyline constructs lists of output instructions which are used as a basis to generate questions.
• Whyline generates a call graph from the invocations found in the class files.
• Then events are loaded in order of their event IDs.
– Whyline has a “complete ordering of the events in the execution.”
• “To improve the performance of question derivation and answering, the Whyline constructs lists of invocations, assignments to fields, and other types of events.”
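The loading steps for events can be sketched as follows. This is a simplified illustration, not the Whyline's loader: events are assumed to be (event_id, kind, target) tuples, each thread's list is assumed to be already sorted by event ID, and load_trace is a hypothetical name.

```python
import heapq
from collections import defaultdict


def load_trace(per_thread_events):
    """Merge per-thread event lists into one complete ordering by event ID,
    then build per-kind lists (invocations, assignments, ...) for fast lookup.

    Assumes each thread's list is already sorted by event ID.
    """
    ordered = list(heapq.merge(*per_thread_events.values(), key=lambda e: e[0]))
    by_kind = defaultdict(list)
    for event in ordered:
        by_kind[event[1]].append(event)
    return ordered, by_kind


# Hypothetical two-thread trace.
threads = {
    "main": [(0, "invocation", "paint"), (3, "assignment", "color")],
    "awt": [(1, "assignment", "width"), (2, "invocation", "drawLine")],
}
```

Calling load_trace(threads) yields the events in ID order 0, 1, 2, 3 across both threads, plus separate lists of assignments and invocations, so a later question about a field never has to scan the invocation events.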

3.3 Creating an I/O History

• From the low-level event information recorded in traces, Whyline constructs a user interface for navigating the output history.

A user can move backwards and forwards in time. The selected input time T determines what events are visible on the screen.

[Figure: snapshots from QuickTime video]
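Navigating by time can be approximated with a sorted list and a binary search. The sketch below assumes a deliberately simple rule, that every output event recorded at or before T is visible; the real tool's rule for what is on screen is likely more involved. The visible_events function and the sample history are hypothetical.

```python
from bisect import bisect_right


def visible_events(output_events, t):
    """Return the output events recorded at or before time t.

    output_events is a list of (time, description) pairs sorted by time.
    """
    times = [time for time, _ in output_events]
    return output_events[:bisect_right(times, t)]


# Hypothetical output history of a painting program.
history = [(1, "fill background"), (4, "draw line"), (9, "draw rectangle")]
```

Moving the time controller from T = 4 to T = 9 simply extends the visible prefix of the history by one event, and moving it backwards shortens the prefix again.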

3.3 Creating an I/O History

• Whyline finds fields and invocations that could have influenced output (tracking dependencies).
– “For example, the color of a rectangle might be affected by some field in an object, or by the return value of a call to some method.”
• If an output instruction directly invokes output rather than simply influencing it (e.g. it draws a rectangle rather than setting the rectangle's colour), Whyline marks all the potential indirect callers as output invoking.
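Marking potential indirect callers amounts to walking a call graph backwards from the output methods. The following is a minimal sketch under assumptions: the call graph is a plain dictionary from caller to callees, and the method names are hypothetical rather than taken from the painting application.

```python
from collections import deque


def output_invoking_methods(call_graph, output_methods):
    """Mark every method that can directly or indirectly reach an output method.

    call_graph maps a caller name to the list of methods it may call.
    """
    # Invert the graph so callers can be looked up from a callee.
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    marked = set(output_methods)
    queue = deque(output_methods)
    while queue:
        method = queue.popleft()
        for caller in callers.get(method, ()):
            if caller not in marked:
                marked.add(caller)
                queue.append(caller)
    return marked


# Hypothetical call graph.
call_graph = {
    "main": ["paintCanvas", "loadPrefs"],
    "paintCanvas": ["drawRect"],
    "loadPrefs": ["readFile"],
}
```

Starting from drawRect, the traversal marks paintCanvas and main as output invoking, while loadPrefs and readFile stay unmarked because they never lead to drawing.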

3.4 Deriving questions (see Figure 3)

(1) Why did property = value? (refers to value passed to output call)

3.4 Deriving questions (see Figure 3)

(6) Why didn't an instance of class C appear? (refers to instantiations of C)

• Why didn't questions “support questions about output that has no representative output to click on”.
• Whyline has a why didn't question for each familiar class that has output invoking methods (not output influencing), inherited or declared.
– “A class is familiar if user owned code either defines or references the specific class.”
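The familiarity rule for question (6) can be sketched as a filter over class metadata. Everything here is an assumption made for illustration: the dictionary layout, the output_invoking flag (taken as precomputed) and the class names are hypothetical, and the function is not the Whyline's own derivation code.

```python
def why_didnt_instance_questions(classes, user_owned):
    """Derive 'why didn't an instance appear' questions for familiar classes.

    classes maps a class name to {"references": set of class names,
    "output_invoking": bool}. A class counts as familiar if user-owned
    code defines it or references it.
    """
    questions = []
    for name, info in classes.items():
        familiar = name in user_owned or any(
            name in classes[owner]["references"] for owner in user_owned
        )
        if familiar and info["output_invoking"]:
            questions.append(f"Why didn't an instance of {name} appear?")
    return questions


# Hypothetical classes: only PaintCanvas is user-owned.
classes = {
    "PaintCanvas": {"references": {"Stroke", "Palette"}, "output_invoking": True},
    "Stroke": {"references": set(), "output_invoking": True},
    "Palette": {"references": set(), "output_invoking": False},
    "Chart": {"references": set(), "output_invoking": True},
}
```

With PaintCanvas as the only user-owned class, questions are produced for PaintCanvas and Stroke. Palette is familiar but only influences output, and Chart invokes output but is never defined or referenced by user code, so neither gets a question.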

3.4 Deriving questions (see Figure 3)

(4) Why did object get created? (refers to instantiation of object)
(5) Why didn't method execute after time T? (refers to potential invocation instructions)
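Question (5) combines dynamic information (what ran after T) with static information (where the method could have been called from). The sketch below is a rough illustration only: invocations are assumed to be (event_id, method) pairs, call_sites is a hypothetical map from a method to the source locations that could invoke it, and the function merely lists the candidates rather than explaining why each was not reached.

```python
def potential_invocations_not_taken(method, t, invocations, call_sites):
    """Return the candidate call sites for 'Why didn't method execute after T?'.

    Returns None when the method did execute after t, in which case the
    question does not apply.
    """
    ran_after_t = any(
        name == method and event_id > t for event_id, name in invocations
    )
    if ran_after_t:
        return None
    return sorted(call_sites.get(method, ()))


# Hypothetical trace data and call sites.
invocations = [(2, "repaint"), (5, "setColor"), (7, "repaint")]
call_sites = {"setColor": ["SliderListener.stateChanged", "Toolbar.reset"]}
```

Asking about setColor after T = 6 returns both candidate call sites, since setColor last ran at event 5; asking about repaint after T = 6 returns None because repaint ran again at event 7.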

3.4 Deriving questions (see Figure 3)

(2) Why did field = value? (refers to assignment before T)
(3) Why didn't field's value change after time T? (refers to potential assignment instructions)
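Question (2) refers to the most recent assignment to a field before the selected time. A minimal sketch, assuming assignments are stored as (event_id, field, value) tuples in event order and treating "before T" as "at or before T" for simplicity; both function names are hypothetical.

```python
def last_assignment_before(assignments, field, t):
    """Return the latest (event_id, field, value) for field at or before t."""
    for event in reversed(assignments):
        event_id, name, _ = event
        if name == field and event_id <= t:
            return event
    return None


def why_did_question(assignments, field, t):
    """Phrase the question from the field name and the value it held at t."""
    event = last_assignment_before(assignments, field, t)
    if event is None:
        return None
    return f"Why did {field} = {event[2]}?"


# Hypothetical assignment events.
assignments = [(2, "width", 100), (5, "color", "blue"), (8, "width", 251)]
```

Because the question text is built directly from the identifier in the code, a field named wd instead of width would produce a correspondingly cryptic question.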

5. EVALUATION

5.1 Performance Feasibility

• Performance tests were run on a 2GHz Intel Core Duo MacBook Pro with 2GB of RAM.
– standard OS X JVM, given a 1 GB heap
• The Unix time command was used to measure time to a tenth of a second.
• The case study article text says performance tests were run five times and the results averaged.
– Table 1 says tests were run 10 times and results averaged.
• Execution times were measured for normal operation, for profiling (using the YourKit profiler), and for tracing with the Whyline.

Table 1 ©ACM

• LOC calculated omitting whitespace lines.
• Whyline's tracing is slower than profiling “because it instruments more code”.
• Whyline's tracing time should improve once Whyline has been optimised.

Table 1 ©ACM

• Compressed trace sizes compare favourably with those reported in dynamic slicing work.
• Loading time is an issue. The single biggest limiting factor is memory. The larger traces resulted in garbage collection and virtual memory use.
– improvements in Whyline's memory management are needed

5. EVALUATION

Does Whyline scale?

• A minute of user interaction with ArgoUML was tested.
– 35,597 I/O events
• The output history is navigable at interactive speeds.
• Clicking on an event produced a menu of questions at interactive speeds.

5.2 Question Coverage

• Does the Whyline provide questions that a user actually wants to ask?
• 9 bug reports for the applications listed in Table 1 were chosen at random.
• All but one bug report had a possible corresponding Whyline question.
– one bug report was a feature request
• This evaluation did not test actual Whyline usage.
– Would the user actually locate the question and would Whyline's answer make any sense?
• “In future work, we will assess this issue in greater detail”

5.3 User Study

• A pilot evaluation was conducted with 9 participants having a variety of backgrounds:
– psychology, design, computer science, linguistics, food science, engineering
• One participant had never seen a line of code. Another had programmed for more than 10 years.
• The evaluation task was to resolve the slider bug using the Whyline.
• Task performance was compared with 18 self-described Java experts who used Eclipse 2.1 to resolve the slider bug in a previous study [10].

5.3 User Study

• Participants received a short tutorial (1-2 minutes) on how to use the Whyline.
• The blue slider's incorrect behaviour was demonstrated to participants.
– Participants were asked to find the cause of this incorrect behaviour.
• Participants were allowed to ask about the user interface but not about the task or the code.
• The experimenter offered clarification if a user expressed confusion about the user interface.

5.3 User Study

Times (minutes)   Minimum   Maximum   Median
Whyline           1         12        4
control group     3         38        10

• Whyline participants were more than twice as fast as the Java experts (the control group).
– statistically significant difference: p < 0.05 (Wilcoxon rank sums test)
• The pilot evaluation has limited external validity.
– single task
– small sample size (n=9)

5.3 User Study

• Novices in the pilot evaluation tended to outperform the experts in the pilot evaluation.
– Often they asked aloud “Why is the line blue?” and used Whyline directly to have the question answered.
• Experts in the pilot evaluation asked the same question but they first speculated about the reason rather than using the Whyline directly.
– e.g. “Why didn't this slider's event get handled?”
• One expert didn't expect that Whyline could make the connection between the slider and the color.

7. Limitations (of Whyline)

• The Whyline tracing approach is practical only for executions lasting a few minutes.
• Some bugs can only be reproduced without interference from instrumentation.
• Loading traces feels “heavier” in comparison to breakpoint debugger use that has virtually no setup time.
• Cryptic names used for method and field names will result in cryptic Whyline questions.
– ‘Why did wd = 251?’ rather than ‘Why did width = 251?’
• Whyline helps find code related to a behaviour but does not explain how to change that behaviour.

8. Discussion

• Whyline has no special knowledge about user interface toolkits or other APIs.
• A user thinking “Why didn't this window change?” must choose a question like “Why didn't this JFrame's repaint() method get called?”
• “It might be helpful if one could write plug-ins for the Whyline to add special knowledge and heuristics for certain APIs, to improve the specificity of questions and answers.”

8. Discussion

• Modern applications can run across multiple platforms and can be written using multiple languages. How can traces be captured in such an environment?
• Does Whyline need to provide support for people collaborating on bug fixing?

Critical commentary from Andy

• Whyline technology could revolutionise approaches to debugging.
• The evaluation, however, was focussed on one defect in a small, stand-alone application.
– The result regarding time saved is not generalisable.
• Would maintainers prefer a DORA approach to identify relevant code to reason about rather than explore the question set posed by Whyline?
• Much more evaluation work needs to be done.