
Course Notes for

COMP30332

SOFTWARE EVOLUTION

Part II

2008/2009 Academic Session

Suzanne M. Embury Room KB 2.105

Department of Computer Science University of Manchester

Oxford Road Manchester M13 9PL

U.K.

Telephone: (0161) 275 6128 Fax: (0161) 275 6236

E-Mail: [email protected]

- 3 -

Part II: Program Comprehension

One of the features of working in software evolution is the constant need to modify software systems that are unfamiliar, or (at best) only partially understood. However, evidence suggests (unsurprisingly) that programmers who have a better understanding of the system as a whole are able to introduce changes into the system with fewer defects than those with a weaker understanding. There has been a steady flow of work into techniques and methods for improving our ability to understand a system, and this next group of 4 lectures will introduce some of these, and give you the opportunity to practise them.

Lecture 6. Code Reading

In this lecture, we consider the importance of code reading skills for software evolution, and look at several studies of the way humans approach the task of understanding an unfamiliar system. Several techniques for approaching code reading tasks have been proposed, based on these studies.

CS333 Software Evolution © University of Manchester

Evolution Activities

What do maintenance programmers spend most of their time doing?

The answer is

Notes

- 4 -

CS333 Software Evolution © University of Manchester 2004

Understanding a System

Sources of information?

Notes


Program Understanding: How?

Tools
    Software visualisation
    Program analysis and transformation
    Reverse engineering
But, the most fundamental tool is:

The Human Brain!


Aim of Code Reading

Aim to understand both the …………… (i.e. specification) and the ……… of the program

i.e., both “what” and “how”
    cf. bubble sort and insertion sort
    cf. sort used to find student with the top marks, who wins the module prize, and sort used to print out marks obtained by all students in order.

Hardest of all is to understand “why?”

- 5 -

Notes for revision:

Notes

Ability in code reading seems to correlate with skill/experience in programming (unsurprisingly). But everyone can get better with practice. By studying how experienced programmers work, we can learn winning techniques for code reading. Such a study also allows us to:

• Provide better training for software engineers
• Develop better software tools to support code reading activities
• Provide better documentation for long term maintenance of software.

As a revision exercise, think about how you should write comments if they are to be of maximum help to some programmer in the future, trying to figure out what your code does.


The Effects of Code Reading

[Diagram: Existing Knowledge (application domain, general programming, mental model of s/w) is transformed by Code Reading into Acquired Knowledge (application domain, general programming, mental model of s/w)]


How are the Models Constructed?

By formulating and verifying hypotheses
Letovsky (1986) proposed

    Why conjectures – what is the purpose of this piece of code?
    How conjectures – how does this code accomplish its goal?
    What conjectures – what role does this code element play?

Each conjecture has an associated “degree of certainty”

- 6 -


How? Why? What? Whether?

Example questions recorded by Letovsky

“So let’s see how it searches the database”

“It’s setting IPTR to ZERO. I’d like to know why.”

“I want to find out what field 7 is.”

“Is this subroutine actually deleting a record or is it just putting a delete mark there, and the record is still there?”

Notes


Try it Yourself

static void move_last_runqueue(struct task_struct * p)
{
    struct task_struct *next = p->next_run;
    struct task_struct *prev = p->prev_run;

    next->prev_run = prev;       /* remove from list */
    prev->next_run = next;

    p->next_run = &init_task;    /* add back to list */
    prev = init_task.prev_run;
    init_task.prev_run = p;
    p->prev_run = prev;
    prev->next_run = p;
}

This fragment of code is taken from an early version of the Linux kernel source code. (Linux V2.0, copyright 1992, Linus Torvalds).

List some questions that you might ask if trying to understand what this code does:

…………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

…………………………………………………………………………………………………

- 7 -


Implications

How does all this help us to learn how to read code?

Importance of background knowledge
Ability to work with unconfirmed hypotheses

“The essence of an effective and efficient strategy is to keep the number of open hypotheses manageable while increasing understanding incrementally”

von Mayrhauser and Vans (1995)

Code reading strategy

Notes


Where to Begin?

Start at the top and read each statement in turn?


Bottom-Up Strategy

Typically used when unfamiliar with the code/application

Look for recognisable idioms within the code
E.g. the “swap” idiom

t = x; x = y; y = t;

Combine recognised units to understand ever larger sections of the code
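As an illustration of combining recognised units, the sketch below nests the swap idiom inside two loops. It is our own example, not one of the programs used in the lecture, and the class and method names are hypothetical. Reading bottom-up, we would first recognise the swap, then notice that it is guarded by a comparison of neighbouring elements inside a loop over the array, and so arrive at the hypothesis that the routine is an exchange sort.

```java
import java.util.Arrays;

final class IdiomCombination {

    // Hypothetical routine used only to illustrate bottom-up reading.
    static void mystery(int[] a) {
        // Unit 3: the outer loop repeats the pass below
        for (int pass = 0; pass < a.length - 1; pass++) {
            // Unit 2: a loop that visits every adjacent pair not yet in place
            for (int i = 0; i < a.length - 1 - pass; i++) {
                if (a[i] > a[i + 1]) {
                    // Unit 1: the swap idiom
                    int t = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = t;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] marks = {62, 48, 75, 51};
        mystery(marks);
        System.out.println(Arrays.toString(marks));
    }
}
```

Each recognised unit raises the level at which we can describe the code: "swap", then "push the largest remaining item to the end", then "sort the array in ascending order".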

- 8 -


Example Idioms (1)

Process every item in a collection

curr := first item
while not at end of collection do
    process curr
    curr := next item after curr
end while

Example: increment array items

int i = 0;
while (i < a.length) {
    a[i] = a[i] + 1;
    i = i + 1;
}

Notes

As a revision exercise, try writing code based on this idiom to increment all the items stored in a vector, or a linked list structure.


Example Idioms (2)

Search a collection for first item that has some property

curr := first item
found := false
while not at end and not found do
    if curr has property then
        found := true; item := curr
    end if
    curr := next item after curr
end while


Example Idioms (2)

Example: find first item containing “fred” in linked list

c = head();
s = null;
while (c != null && s == null) {
    if (c.value().matches(".*fred.*"))
        s = c;
    c = c.next();
}
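The fragment above leaves the list type undeclared. A self-contained version of the same search idiom might look like the sketch below. The Node class and the method name findFirstContaining are assumptions introduced for illustration, and String.contains is used in place of a pattern match.

```java
final class SearchIdiom {

    // Minimal singly linked list node (hypothetical, for illustration only).
    static final class Node {
        final String value;
        final Node next;

        Node(String value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    // Search idiom: stop at the end of the list or at the first match.
    static Node findFirstContaining(Node head, String text) {
        Node curr = head;      // curr := first item
        Node found = null;     // "not found yet"
        while (curr != null && found == null) {
            if (curr.value.contains(text)) {
                found = curr;  // remember the item that has the property
            }
            curr = curr.next;  // curr := next item after curr
        }
        return found;
    }

    public static void main(String[] args) {
        Node list = new Node("alice", new Node("alfred", new Node("fred", null)));
        Node hit = findFirstContaining(list, "fred");
        System.out.println(hit == null ? "no match" : hit.value);
    }
}
```

Note the two-part loop condition. When reading unfamiliar code, a loop that tests both "not at end" and "not found" is a strong hint that the search idiom is in use.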

- 9 -


Example Idioms (3)

Which idiom would you use to sum the elements in an array?

Use the idiom to write the routine

Notes


Bottom-Up Code Reading Example

public static String rvs(final String s) {
    int n = s.length();
    char[] datum = new char[n];
    s.getChars(0, n, datum, 0);
    n--;
    for (int i = (n-1)/2 ; i >= 0 ; i--) {
        char temp = datum[i];
        datum[i] = datum[n-i];
        datum[n-i] = temp;
    }
    return new String (datum);
}


Another Example

public String mystr(String original, char strchar) {
    String returnstring = new String();
    char p = ' ';
    int a = 0;
    int fl = 0;
    while (original != null && a < original.length() && fl != 1) {
        p = original.charAt(a);
        if (p != strchar) {
            returnstring = returnstring + p;
        } else {
            fl = 1;
        }
        a++;
    }
    return returnstring;
}

- 10 -

Which idioms can you spot in this program? Make a note of what questions/hypotheses you formulate after discovering these idioms.

Revision Exercise: try using bottom-up reading to figure out what the fragment of Linux code given on page 5 is doing. Check out footnote 1 if you need a hint, but you ought to be able to spot an example of the idioms we’ve looked at in this lecture, plus an additional very common idiom for working with lists.


Take Home Message

Acknowledgements

The example programs used in this lecture are based on programs written by:

• R. Winder and G. Roberts (from Developing Java Software, 2000 – see references)
• Zurk (from the SourceForge Snippet Library at: http://sourceforge.net/snippet/detail.php?type=snippet&id=100613)
• Linus Torvalds (Linux Kernel V2.0)

1 This fragment comes from the Linux process scheduler. Think about what kinds of thing a process scheduler has to do, in order to support a multitasking or multiprocessing operating system.

- 11 -

Lecture 7. The Top-Down Approach to Code Reading

After examining the bottom-up approach to code reading in the previous lecture, we will now look at the opposite approach of beginning to comprehend the program from the top down. This approach requires a quite different way of thinking from the bottom-up, and has a different set of advantages and disadvantages.


Top-Down Approach

Suitable when programmer is familiar with the type of program to be comprehended

E.g. may have worked on several invoicing systems
Knows what components must be present
Knows what these components consist of

Use hypotheses to refine view of system incrementally

Notes


Top-Down Example

Consider an OS expert trying to understand the code of a new OS

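[Figure: hypothesis tree. O/S at the root; below it Process Manager, File Manager and Memory Manager; below those Interprocess Comms and Process Scheduling; at the lowest level Round Robin, Shortest First and Priority Sched.]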

- 12 -


Top-Down Code Reading

At each stage, the code is examined just far enough to confirm the hypothesis that such a component exists

I.e. don’t look at every line in detail

Beacons are very important for this process
Cues that suggest interpretations of the code
Examples

    Procedure name “process_mgr”
    Swap inside loop could indicate a sort
    Var names including ‘num’ and ‘ttl’

Notes

COMP3033 Software Evolution © University of Manchester 2008

Diagrams

Often use diagrams as part of top-down reading
Give an overview of the program
Hide unimportant and distracting details
Examples

    Architecture diagrams for systems built from several components
    Class diagrams for OO systems
    Functional calling hierarchy

It is common to draw diagrams similar to those drawn when designing a system

We try to recreate the original design documentation
E.g. DFDs, JSP diagrams, ER models

Program 7.1

You should have a separate handout containing printouts of the files that make up program 7.1. We’re now going to step through the process of coming to understand their contents using the top-down approach. Some space for you to make notes on the process is given on the next page, but you should wait until the exercise is complete before completing them.

- 13 -

Use this space to summarise the main points of the top-down reading process we have just gone through.

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………


Which strategy is best?

Bottom-up reading
    Gives detailed, concrete understanding
    Can be difficult to understand full meaning of code without context
    Difficult to know where to start, in large programs

Top-down reading
    Gives a good overview of whole system
    Indicates which parts of code are worth investigation
    Reader must keep a lot of information in short term memory

Notes

- 14 -


Opportunistic Approach

The best approach is a hybrid of the two
Begin with top-down

Gain an overview of the functions of the program

Then selectively apply bottom-up strategies when nearing “code level”

Use to verify hypotheses resulting from top-down reading

Presence of beacons can indicate opportunity for change of strategy

E.g. by suggesting a hypothesis that is best verified by non-current strategy

Notes


Now try it yourself!

Look at program 7.2, and try to apply the techniques we have been learning, to understand its function

Your aim is to write a short paragraph of text, that could be used as a manual page for the program

What inputs are required
What outputs are produced

Program 7.2

The second Java example gives you a chance to try applying the techniques of top-down and bottom-up code reading for yourself. Again, you’ll need to have some paper handy, to make notes on, as we work through the programs. Don’t forget that you can draw diagrams to help you understand what is going on, and that you should look out for beacons that can indicate the presence of particular functionality. You should also be careful to make notes of any hypotheses that present themselves, and to keep track of what evidence you have for or against each one.

- 15 -


Practise, Practise, Practise

As with debugging, you can improve your ability to read code by practising as often as possible
Build up your personal library of general software knowledge

    Collect idioms, components, algorithms, …

A good way to improve your programming ability
Learn from the work of top programmers
WWW is a good source of examples

    GNU archive at www.doc.ic.ac.uk
    Open Source Developers Network

Notes


Take Home Message

Code Reading in the Examination

You should be prepared to read unfamiliar code in the examination, and to describe what you have learnt from the code and how you went about discovering it. This may involve recognising familiar idioms that have not been presented in these lectures. We won’t ask you to recognise anything very obscure, but you should make sure that you are familiar with the major sorting and searching algorithms, for example, and that you can recognise examples of the major data structures (lists, queues, etc.) and the operations typically performed on them.

Note: in previous years, this course unit has included a lecture on how to read code written in a programming language you don’t know, and some past exam questions involved code reading in unfamiliar legacy languages. We won’t be covering these techniques in the course this year, so you can assume that any code reading questions asked will be in a language you definitely should know. You can still try your hand at these past questions, but will find them more challenging than the past papers that set code reading exercises based on Java code.

- 16 -

- 17 -

Lectures 8 & 9. Software Tools for Program Understanding

Many of the program understanding techniques we have introduced so far can be difficult to apply to the large systems that are commonly encountered in real applications. Many of the tasks involved require careful and repetitive analysis of the source code – something that is much more easily done by software than by humans. Over the years, a whole host of different kinds of software tool for supporting the processes of program understanding and modification have been proposed and implemented. Some of these facilities are now well established, and are provided by many modern software development environments, while others remain the preserve of research prototypes or advanced “niche” CASE tools. In these two lectures, we will briefly survey the range of tools available, and will look at some of the techniques that underlie them.


Comprehension Tasks

During program comprehension we:
Explore the program code non-linearly
Derive a variety of views of the program code

    To focus on particular aspects
    To remove irrelevant details

Formulate hypotheses and search for evidence
Link program constructs to real world concepts

E.g. var SAL refers to the salary of an employee

Many of these involve repetitive tasks that are more quickly and more reliably performed by a software tool - which ones?

Notes


Types of Tool

Software visualisation tools
    Support browsing and exploration of the software

Static analysis tools
    Extract information from program code

Dynamic analysis tools
    Extract information from individual executions of the code

Knowledge-based repositories
    Store knowledge about the domain, and links between SLCOs
    Document the process of understanding

- 18 -

Notes


Visualising Program Structure

For top-down reading, we need to understand the overall organisation of the program

We need to be able to navigate freely around the code at a high level of abstraction

Many modern development tools provide this kind of overview and navigation feature

E.g. Eclipse, Microsoft Visual Studio

An Example Program Understanding Tool

CodeSurfer (GrammaTech Ltd)

• Adapts several techniques from the research community as well as facilities found in most modern development environments

• Call graphs
• Use/definition links
• Slicing

- 19 -


Static Code Analysis

Most advanced visualisation/filtering techniques are based around static code analysis techniques

I.e. analysis of source code to extract properties true of every possible execution
Examples

    Data dependency graph
    Control flow graph
    Call graph

Notes


Control Flow Graphs

We encountered these earlier when talking about software change impact analysis
Control flow graphs

What are they?

How are they created?

[Figure: example control flow graph with nodes 1, 2, 3, 5 and 6, including a branch with T and F edges]

• 1 stmt =
• Edges indicate


CFGs - Applications

CFGs are not very useful in themselves, but they are the foundation for many other forms of analysis

However, they can reveal the presence of spaghetti code!
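To make the idea of a CFG concrete, the following sketch stores one as a map from each node to its successor nodes. The class name, the node labels and the tiny if/else program in the comments are all assumptions introduced for illustration. Real tools derive the graph from the parsed source code rather than from hand-written edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ControlFlowGraph {

    // Each node maps to the nodes that may execute immediately after it.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        successors.computeIfAbsent(to, k -> new ArrayList<>());
    }

    List<String> successorsOf(String node) {
        return successors.getOrDefault(node, Collections.emptyList());
    }

    // Nodes with more than one exit, i.e. the decision points of the program.
    List<String> branchNodes() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : successors.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Hypothetical program:
    //   1: if (x < 0)
    //   2:     sign = -1;   (T branch)
    //   3: else sign = 1;   (F branch)
    //   4: print(sign);
    static ControlFlowGraph example() {
        ControlFlowGraph g = new ControlFlowGraph();
        g.addEdge("START", "1");
        g.addEdge("1", "2");
        g.addEdge("1", "3");
        g.addEdge("2", "4");
        g.addEdge("3", "4");
        g.addEdge("4", "STOP");
        return g;
    }

    public static void main(String[] args) {
        ControlFlowGraph g = example();
        System.out.println("Successors of 1: " + g.successorsOf("1"));
        System.out.println("Branch nodes: " + g.branchNodes());
    }
}
```

Once the graph is held in this form, the analyses described in the rest of this lecture become graph algorithms over the successor map.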

- 20 -


Control Dependencies

Software tools can perform more complex analyses than feasible by hand
Example: control dependencies

CFG edges give some idea of control dependencies
But – too much detail, too difficult to interpret

Instead, a control dependency captures a more abstract view of the way the execution of some statements is controlled by the execution of others

Notes


Control Dependencies: Example

[Figure: control flow graph with nodes 1 to 8 and two branch nodes with T and F edges]

Which “control” nodes determine whether node 3 is executed?

Which “control” nodes determine whether node 6 is executed?


Computing Control Deps.

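Step 1: produce an augmented CFG
Add a single START node
Add a single STOP node
Convert all nodes so that each has at most two exits

    E.g. case statements

[Figure: a three-way case node 1 (exits X<0, X=0, X>0) over nodes 2, 3, 4 and 5 becomes a pair of two-way nodes, 1 (X<0) and 1b (X=0), each with T and F exits]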

- 21 -



Step 2: produce a post-dominator tree

Definition:

A node Y post-dominates a node X iff for every path X→n1→n2→…→STOP the sub-path n1→…→STOP contains Y

That is, if Y post-dominates X then whenever X is executed Y will necessarily be executed too
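The definition can be turned into a small fixed-point computation. The sketch below is one standard way of doing this and is not necessarily how any particular tool works. It assumes a hypothetical augmented CFG for a simple if/else (node 1 branches to node 2 or node 3, which rejoin at node 4), and the class and method names are our own. The rule used is that the post-dominators of a node are the nodes common to "successor plus successor's post-dominators" over all of its successors.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PostDominators {

    // Hypothetical augmented CFG: START -> 1, 1 -> 2 (T), 1 -> 3 (F), 2 -> 4, 3 -> 4, 4 -> STOP.
    static Map<String, List<String>> exampleCfg() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("START", Arrays.asList("1"));
        succ.put("1", Arrays.asList("2", "3"));
        succ.put("2", Arrays.asList("4"));
        succ.put("3", Arrays.asList("4"));
        succ.put("4", Arrays.asList("STOP"));
        succ.put("STOP", Collections.<String>emptyList());
        return succ;
    }

    // Returns, for every node, the set of nodes that post-dominate it.
    static Map<String, Set<String>> compute(Map<String, List<String>> succ, String stop) {
        Map<String, Set<String>> pdom = new LinkedHashMap<>();
        for (String n : succ.keySet()) {
            // Start from "everything" and shrink; STOP has no post-dominators.
            pdom.put(n, n.equals(stop) ? new TreeSet<String>() : new TreeSet<String>(succ.keySet()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : succ.keySet()) {
                if (n.equals(stop)) {
                    continue;
                }
                Set<String> result = null;
                for (String s : succ.get(n)) {
                    Set<String> viaS = new TreeSet<String>(pdom.get(s));
                    viaS.add(s);                   // every path through s contains s itself
                    if (result == null) {
                        result = viaS;
                    } else {
                        result.retainAll(viaS);    // keep only nodes found on all paths
                    }
                }
                if (result == null) {
                    result = new TreeSet<String>();
                }
                if (!result.equals(pdom.get(n))) {
                    pdom.put(n, result);
                    changed = true;
                }
            }
        }
        return pdom;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Set<String>> e : compute(exampleCfg(), "STOP").entrySet()) {
            System.out.println(e.getKey() + " is post-dominated by " + e.getValue());
        }
    }
}
```

In this hypothetical graph, node 1 is post-dominated by 4 and STOP but not by 2 or 3, because each of those lies on only one of the two branches.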

Notes


Post-Dominates Example

[Figure: augmented control flow graph with nodes START, 1 to 7 and STOP, and two branch nodes with T and F edges]

Node 1 is post-dominated by node 4

Node 1 is not post-dominated by node 3

Complete the table showing which nodes post-dominate each of the nodes in the graph.

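Node     Is post-dominated by

START    ……………………………………
1        ……………………………………
2        ……………………………………
3        ……………………………………
4        ……………………………………
5        ……………………………………
6        ……………………………………
7        ……………………………………
STOP     ……………………………………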

- 22 -


Post-Dominator Tree

To construct the post-dominator tree

The root node is (the only) node which is not post-dominated by any other node

Remove this node from the PD lists
While still more nodes to add to tree
    For each node which is only post-dominated by nodes present in the PD tree
        Add node as child of its least post-dominator
        Remove node from PD lists
    End for
End While
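The construction above can be sketched in Java as follows. The PD lists used here belong to a small hypothetical CFG (an if/else in which node 1 branches to node 2 or node 3, which rejoin at node 4), and all class and method names are assumptions. We interpret the "least post-dominator" of a node as the one of its post-dominators that lies deepest in the tree built so far, i.e. the closest one. Instead of physically removing entries from the PD lists, the sketch checks them against the set of nodes already placed in the tree.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PostDominatorTree {

    private static Set<String> setOf(String... names) {
        return new TreeSet<String>(Arrays.asList(names));
    }

    // PD lists for a hypothetical CFG: START -> 1, 1 -> 2 | 3, 2 -> 4, 3 -> 4, 4 -> STOP.
    static Map<String, Set<String>> examplePdLists() {
        Map<String, Set<String>> pd = new LinkedHashMap<>();
        pd.put("START", setOf("1", "4", "STOP"));
        pd.put("1", setOf("4", "STOP"));
        pd.put("2", setOf("4", "STOP"));
        pd.put("3", setOf("4", "STOP"));
        pd.put("4", setOf("STOP"));
        pd.put("STOP", setOf());
        return pd;
    }

    // Returns a map from each node to its parent in the tree (the root maps to null).
    static Map<String, String> build(Map<String, Set<String>> pdLists) {
        Map<String, String> parent = new LinkedHashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        // The root is the node that is not post-dominated by any other node.
        for (Map.Entry<String, Set<String>> e : pdLists.entrySet()) {
            if (e.getValue().isEmpty()) {
                parent.put(e.getKey(), null);
                depth.put(e.getKey(), 0);
            }
        }
        while (parent.size() < pdLists.size()) {
            boolean progress = false;
            for (Map.Entry<String, Set<String>> e : pdLists.entrySet()) {
                String node = e.getKey();
                // Skip nodes already in the tree, or with post-dominators not yet in it.
                if (parent.containsKey(node) || !parent.keySet().containsAll(e.getValue())) {
                    continue;
                }
                String least = null;   // the post-dominator deepest in the tree so far
                for (String p : e.getValue()) {
                    if (least == null || depth.get(p) > depth.get(least)) {
                        least = p;
                    }
                }
                parent.put(node, least);
                depth.put(node, depth.get(least) + 1);
                progress = true;
            }
            if (!progress) {
                throw new IllegalArgumentException("PD lists are inconsistent");
            }
        }
        return parent;
    }

    public static void main(String[] args) {
        System.out.println(build(examplePdLists()));
    }
}
```

For the hypothetical lists above, STOP is the root, 4 is its child, nodes 1, 2 and 3 are children of 4, and START is a child of 1.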

Notes


Defining Control Dependence

A node Y is control dependent on node X iff
i. there exists a path X→n1→n2→…→nk→Y where every n1, …, nk is post-dominated by Y
ii. X is not post-dominated by Y

That is: after executing X, one possibility is to execute a path that necessarily requires execution of Y (Condition i)

But, it is possible to choose a different path from X, which may or may not include Y (i.e. X has exactly 2 exits) (Condition ii)


An Example

[Figure: augmented control flow graph with nodes START, 1 to 7 and STOP, and two branch nodes with T and F edges]

Is node 3 control dependent on node 1?



- 23 -



Step 3

Find the set S of all CFG edges (A, B) where B does not post-dominate A

For each pair (A, B) in S, find the least common ancestor L of A and B in the PD tree

If L = parent of A then all nodes in PD tree between L and B exclusive are control dependent on A, and B is also control dependent on A

If L = A then all nodes in PD tree between A and B inclusive are control dependent on A
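Both cases of Step 3 can be handled by a single walk up the PD tree: starting at B, mark each node as control dependent on A, and stop on reaching the parent of A. If L is the parent of A, the walk marks B and the nodes between B and L. If L = A, the walk passes through A itself before stopping, so A is marked too. The sketch below applies this to a small hypothetical CFG (an if/else in which node 1 branches to node 2 or node 3, which rejoin at node 4). The edges, the PD tree and all names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ControlDependence {

    // True if y is a proper ancestor of x in the PD tree, i.e. y post-dominates x.
    static boolean postDominates(String y, String x, Map<String, String> pdParent) {
        for (String n = pdParent.get(x); n != null; n = pdParent.get(n)) {
            if (n.equals(y)) {
                return true;
            }
        }
        return false;
    }

    // Returns a map from each controlling node A to the nodes control dependent on A.
    static Map<String, Set<String>> compute(List<String[]> edges, Map<String, String> pdParent) {
        Map<String, Set<String>> dependents = new TreeMap<>();
        for (String[] edge : edges) {
            String a = edge[0];
            String b = edge[1];
            if (postDominates(b, a, pdParent)) {
                continue;                          // edge (A, B) is not in the set S
            }
            String stopAt = pdParent.get(a);       // walk from B up to, but excluding, parent(A)
            for (String n = b; n != null && !n.equals(stopAt); n = pdParent.get(n)) {
                dependents.computeIfAbsent(a, k -> new TreeSet<String>()).add(n);
            }
        }
        return dependents;
    }

    // Hypothetical CFG and its PD tree (STOP is the root of the tree).
    static Map<String, Set<String>> example() {
        List<String[]> edges = Arrays.asList(
                new String[] {"START", "1"},
                new String[] {"1", "2"},
                new String[] {"1", "3"},
                new String[] {"2", "4"},
                new String[] {"3", "4"},
                new String[] {"4", "STOP"});
        Map<String, String> pdParent = new HashMap<>();
        pdParent.put("START", "1");
        pdParent.put("1", "4");
        pdParent.put("2", "4");
        pdParent.put("3", "4");
        pdParent.put("4", "STOP");
        pdParent.put("STOP", null);
        return compute(edges, pdParent);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

For this graph only the edges (1, 2) and (1, 3) belong to S, so nodes 2 and 3 are control dependent on node 1 and nothing else is control dependent on anything.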

Notes


The Complete Example

[Figure: augmented control flow graph with nodes START, 1 to 7 and STOP]

is control dependent on


Control Deps. - Applications

Control dependencies have many applications in compiling

E.g. for optimisation of generated code

But they also have a very important application in program understanding

They can be used to generate program slices (Ottenstein & Ottenstein 1984)

- 24 -


Program Slicing

Idea:
When attempting to understand a program, often need to know how variables got their values at specific points
But, the statements which produced these values are buried amongst many others that did not affect the variables of interest

So, can we find some way to remove these irrelevant statements automatically?

The remaining code is the program slice!

Notes


Slicing - Example

Original program:

read(n);
i := 1;
sum := 0;
prod := 1;
while i <= n do begin
    sum := sum + i;
    prod := prod * i;
    i := i + 1
end;
write(sum);
write(prod)

Slice of program w.r.t. variable prod on the last line:

read(n);
i := 1;
prod := 1;
while i <= n do begin
    prod := prod * i;
    i := i + 1
end;
write(prod)
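The defining property of the slice is that it computes the same value of prod as the original program. The sketch below is a Java translation of the two listings (our own, with hypothetical class and method names) that lets this be checked for particular inputs. The parameter n stands in for read(n) and the return values stand in for the write statements.

```java
final class SliceDemo {

    // The original program: returns {sum, prod} in place of the two write statements.
    static long[] original(int n) {
        int i = 1;
        long sum = 0;
        long prod = 1;
        while (i <= n) {
            sum = sum + i;
            prod = prod * i;
            i = i + 1;
        }
        return new long[] {sum, prod};
    }

    // The slice with respect to prod: every statement mentioning sum has been removed.
    static long sliceOnProd(int n) {
        int i = 1;
        long prod = 1;
        while (i <= n) {
            prod = prod * i;
            i = i + 1;
        }
        return prod;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            long expected = original(n)[1];
            long actual = sliceOnProd(n);
            System.out.println("n=" + n + " prod=" + expected + " slice=" + actual);
        }
    }
}
```

Testing on sample inputs does not prove that a slice is correct, but it is a quick sanity check when a slice has been produced by hand.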


Slicing Defined

A program slice is commonly defined to be:
A subset of the statements in a program P that has the same effect on the variables of interest at the given point as P

Note that a slice is computed according to a user-supplied slicing criterion

    the statement of interest in P
    the variables of interest in P
    E.g. (line 10, {prod})

- 25 -


Computing Program Slices

Several methods for computing slices have been proposed
We adopt the algorithm of Ottenstein and Ottenstein (1984)

Based on the notion of a program dependence graph (PDG)
Nodes = statements/expressions in program
Edges = all control dependencies in program + all data dependencies
A slice w.r.t. a statement S in P is given by

All nodes in PDG from which node(S) is reachable
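Slicing over a PDG is therefore a reachability problem. The sketch below applies it to the sum/prod example. The dependence edges were derived by hand for this illustration (they are our own reading of the program, not taken from the lecture's diagram), the START node and self-dependences are omitted because they do not affect reachability, and all names are hypothetical. Each statement is stored with the list of statements it depends on, so "all nodes from which node(S) is reachable" becomes "everything found by following depends-on links backwards from S".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PdgSlicer {

    private static List<String> deps(String... names) {
        return Arrays.asList(names);
    }

    // Statement -> statements it is control or data dependent on (hand-derived).
    static Map<String, List<String>> examplePdg() {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("read(n)", deps());
        dependsOn.put("i := 1", deps());
        dependsOn.put("sum := 0", deps());
        dependsOn.put("prod := 1", deps());
        dependsOn.put("while i <= n", deps("read(n)", "i := 1", "i := i + 1"));
        dependsOn.put("sum := sum + i", deps("while i <= n", "sum := 0", "i := 1", "i := i + 1"));
        dependsOn.put("prod := prod * i", deps("while i <= n", "prod := 1", "i := 1", "i := i + 1"));
        dependsOn.put("i := i + 1", deps("while i <= n", "i := 1"));
        dependsOn.put("write(sum)", deps("sum := 0", "sum := sum + i"));
        dependsOn.put("write(prod)", deps("prod := 1", "prod := prod * i"));
        return dependsOn;
    }

    // Returns the statements of the slice, in program order.
    static List<String> slice(Map<String, List<String>> dependsOn, String criterion) {
        Set<String> inSlice = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            String stmt = work.pop();
            if (inSlice.add(stmt)) {              // first visit: follow its dependences
                for (String d : dependsOn.get(stmt)) {
                    work.push(d);
                }
            }
        }
        List<String> ordered = new ArrayList<>();
        for (String stmt : dependsOn.keySet()) {  // map keys are in program order
            if (inSlice.contains(stmt)) {
                ordered.add(stmt);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        for (String stmt : slice(examplePdg(), "write(prod)")) {
            System.out.println(stmt);
        }
    }
}
```

Slicing on write(prod) drops the three statements that mention sum, which agrees with the slice shown in the earlier example.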

Notes


Slicing Example cont.

[Figure: program dependence graph for the example, rooted at START, with nodes read(n), i := 1, s := 0, p := 1, while i <= n, s := s + i, p := p * i, i := i + 1, write(s) and write(p)]


Varieties of Slicing

Slicing has been a topic of much research since its invention (Weiser 1979)

Many different algorithms designed
    Inter-procedural slicing
    Slicing over complex data structures
    Handling concurrency/inter-process comms

And many variations proposed
    Backwards slicing vs forward slicing
    Static slicing vs dynamic slicing

- 26 -


Dynamic Slicing

Problem with static slicing
    Slices can be large
    Retains too much irrelevant detail

We need some way to filter the program further
One possibility is to produce a slice of the behaviour of the program for a given test set/set of input values
This is called dynamic slicing
The slicing criterion is now:

(input values, statement occurrence, vars)

Notes


Dynamic Slicing Example

Program:

1  read(n);
2  i := 1;
3  while (i <= n) do
4  begin
5      if (i mod 2 = 0) then
6          x := 0
7      else
8          x := 1;
9      i := i + 1;
10 end;
11 write(x)

Dynamic slice:

1  read(n);
2  i := 1;
3  while (i <= n) do
4  begin
5      if (i mod 2 = 0) then
6          x := 0
7      else
8          ;
9      i := i + 1;
10 end;
11 write(x)
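The notes do not state which input value the dynamic slice above was computed for. The sketch below assumes n = 2, for which the last assignment to x comes from line 6, and translates both listings into Java so that they can be compared. The class and method names are hypothetical, and the sentinel value -1 is our own device for "x was never assigned".

```java
final class DynamicSliceDemo {

    // The original program; returns the value that write(x) would print.
    static int original(int n) {
        int x = -1;               // sentinel: x not yet assigned
        int i = 1;
        while (i <= n) {
            if (i % 2 == 0) {
                x = 0;
            } else {
                x = 1;
            }
            i = i + 1;
        }
        return x;
    }

    // The dynamic slice: the assignment x := 1 on line 8 has been removed.
    static int slice(int n) {
        int x = -1;               // sentinel: x not yet assigned
        int i = 1;
        while (i <= n) {
            if (i % 2 == 0) {
                x = 0;
            }
            i = i + 1;
        }
        return x;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println("n=" + n + " original=" + original(n) + " slice=" + slice(n));
        }
    }
}
```

Under this assumption the two versions agree for n = 2 (and for any even n of at least 2), but they disagree for odd n. This is the price of dynamic slicing: the slice is smaller than a static slice, but it is only guaranteed to be faithful for the inputs named in the slicing criterion.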


Take Home Message

- 27 -

Self Test 2: Program Comprehension

2.1 Which of the following hypotheses is not typical of the sorts of hypothesis formed during the early stages of bottom-up code reading?

a. "This code fragment implements the swap idiom."
b. "The variable EmpSal contains the salary of the employee selected by the user."
c. "All error handling routines can be found in the file errhand.c."
d. "The syntax `DO <var> = <lb> TO <ub>' has the same meaning as a basic FOR loop."

2.2 Which code reading strategy is most appropriate when reading code written in a language with which you are not familiar?

a. Top-down reading.
b. Bottom-up reading.
c. Hybrid (opportunistic) strategy.
d. No strategy is more appropriate than any other.

2.3 The following is a list of the modules present in a system, along with information about which other modules are invoked from them. Module m1 calls modules m3 and m5. Module m2 calls modules m3, m4 and m5. Module m3 calls modules m6, m7 and m10. Module m4 calls modules m6 and m8. Module m5 calls modules m7, m9 and m11. Modules m6, m7, m8, m9, m10 and m11 do not call any other modules. Which architectural pattern is embodied by these modules?

a. The pipe-and-filter pattern.
b. The repository pattern.
c. The layered pattern.
d. The object-oriented pattern.
e. The main-program-and-subroutine pattern.

(We do not teach architectural patterns in this course unit, so if you are not familiar with them from your other courses, you will not be able to answer the question. You could make an intelligent guess, though!)

2.4 Which of the following Java fragments contain a beacon indicating that the method they appear in implements a search algorithm?

a. while (!found) {
b. temp = code4;
c. salary = hoursWorked * hourlyRate;
d. public void srchCusts(int custID) {
e. boolean matched = false;
f. for(int i = 0; i < cust.length; i++) {

- 28 -

2.5 Consider the following Java code fragment:

001 skipArr = initialiseSkips;
002 i = j = mchar;
003
004 while (j >= 1 && i <= maxChars) {
005     if (a[i] == p[j]) {
006         i--;
007         j--;
008     } else {
009         if (mchar - j + 1 > skipArr[index(a[i])])
010             i = i + mchar - j + 1;
011         else
012             i = i + skip[index(a[i])];
013         j = mchar;
014     }
015 }

Which of the lines of code indicated below are transitively control dependent on line 4 of this fragment?

a. Line 2.
b. Line 5.
c. Line 4.
d. Line 15.
e. Line 9.

- 29 -

Sample Exam Question 2

2 a) Letovsky has suggested that the process of code understanding involves the creation of three different kinds of conjectures: how conjectures, why conjectures and what conjectures. Give one example of each of these three types of conjecture that you might form while reading the following fragment of code.

while true do
    if door.maxErrorTries then break; fi;
    code = door.acceptSecurityCode;
    if not door.authorisedCode(code) then
        door.unauthorisedEntrySignal;
        door.incrementErrorTries;
    else
        door.resetErrorTries;
        if door.fireStatus == Door.allClear then
            door.open;
        else
            door.unsafeEntrySignal;
        fi;
    fi;
elihw;

(6 marks)

b) When reading code top-down, we try to use our expectations about the application domain to predict what the major functional components of the code will be. Imagine that you have been asked to fix some errors in the software that controls a security system at a chemical plant. Describe any three major functional components that you would expect to be present in such a system, and give some examples of the kinds of functionality that would be provided by each component.

(9 marks)

c) While reading through the code, you begin to suspect that a particular routine is using a binary search algorithm to locate information about the security status of a particular entrance to the plant. What beacons would you look for to confirm this hypothesis? Give an example of a beacon that would lead you to suspect that your hypothesis is incorrect.

(5 marks)

- 30 -

References Used in Part II

1983 R. Brooks
Towards a Theory of the Comprehension of Computer Programs, in International Journal of Man-Machine Studies, Vol. 18(6), pp. 543-554.

1994 G. Canfora, A. De Lucia, G.A. Di Lucca and A.R. Fasolino
Recovering the architectural design for software comprehension, in Proceedings of the Third IEEE Workshop on Program Comprehension, IEEE Computer Society Press, pp. 30-38.

1984 H. Sneed
Software Renewal: a Case Study, in IEEE Software, Vol. 1(3), pp. 56-63, July.

1984 T.A. Standish
An Essay on Software Reuse, in IEEE Transactions on Software Engineering, Vol. 10(5), pp. 494-497.

1986 S. Letovsky
Cognitive Processes in Program Comprehension, in Empirical Studies of Programmers, E. Soloway and Iyengar (eds.), Ablex Publishing Corporation, pp. 58-79.

1995 A. von Mayrhauser and A.M. Vans
Program Understanding: Models and Experiments, in Advances in Computers, Vol. 40, pp. 1-38.

2000 R. Winder and G. Roberts
Developing Java Software, 2nd edition, John Wiley and Sons.

2001 C. Britton
IT Architectures and Middleware, Addison-Wesley Publishers.

2001 T. Zimmermann and A. Zeller
Visualizing Memory Graphs, in Proceedings of the Dagstuhl Seminar on Software Visualization, Lecture Notes in Computer Science, Springer Verlag.
