
Copyright Les Hatton, 2012-. Copying freely permitted with acknowledgement

Title Slide

“Software’s Hidden Clockwork: a general theory of software defects”

Les Hatton

www.leshatton.org

Version 1.1: 25/June/2012

Presentation at IEEE SES12, 25 June, 2012.

“Mastering uncertainty in the Software Industry: Risks, Rewards and Reality”


Descriptions of software failure

Enduring misconceptions about software:

“How could it fail, it’s been tested?” CTO of a major car company after a million+ recall of embedded software, 1999.

“The software works but is not well documented”, article in Science, April 2012.

Words which should be banned in software systems:

Works, Routine, Tested, Simple, Obvious, Upgraded, Enhanced …


Overview

A little about software defect

A hidden clockwork

Conclusions


So what can we say about defect?

On quantification:

Computer scientists have researched the average density of defect in code extensively.

Where we have been much less successful is in quantifying the effects of such defect on numerical results.


On defect density …

[Figure: defect density in defects/KXLOC on a logarithmic axis from 0.1 to 10.0, with an annotation marking approximately the best 5% of systems.]

NASA Shuttle software, HAL (0.1)

Linux kernel, C (0.14)

Several commercial C systems (0.15-0.4)

Commercial Tcl-Tk (0.9)

NAG Fortran (2.1)

Medical app, C++ (5.1)

Ada comms (7)

NASA Fortran (8)

Sources: Fiedler (1989), Compton (1990), Keller (1993), Basili (1996), Hatton (2005, 2007, 2008)

In general, there is no obvious significant relationship with programming language.
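As a rough illustration of that remark, the figures listed on this slide can be grouped by language. This is only a sketch over the handful of values shown here: the 0.15-0.4 range for commercial C systems is entered as its two endpoints, which is an assumption made for the example, and the function name `ranges_by_language` is hypothetical.

```python
from collections import defaultdict

# Defect densities in defects/KXLOC, copied from the slide.
# The commercial C range 0.15-0.4 is represented by its endpoints (an assumption).
densities = [
    ("NASA Shuttle software", "HAL", 0.1),
    ("Linux kernel", "C", 0.14),
    ("Commercial C systems (low end)", "C", 0.15),
    ("Commercial C systems (high end)", "C", 0.4),
    ("Commercial Tcl-Tk", "Tcl-Tk", 0.9),
    ("NAG Fortran", "Fortran", 2.1),
    ("Medical app", "C++", 5.1),
    ("Ada comms", "Ada", 7.0),
    ("NASA Fortran", "Fortran", 8.0),
]


def ranges_by_language(rows):
    """Return {language: (lowest density, highest density)}."""
    by_lang = defaultdict(list)
    for _name, lang, density in rows:
        by_lang[lang].append(density)
    return {lang: (min(vals), max(vals)) for lang, vals in by_lang.items()}


ranges = ranges_by_language(densities)
all_values = [density for _name, _lang, density in densities]
overall_ratio = max(all_values) / min(all_values)

for lang, (low, high) in sorted(ranges.items()):
    print(f"{lang:8s} {low:5.2f} .. {high:5.2f} defects/KXLOC")
print(f"highest / lowest overall: about {overall_ratio:.0f}x")
```

On these few values, Fortran alone runs from 2.1 to 8 and C from 0.14 to 0.4, so the variation between systems is at least as visible as any variation between languages.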


On the effects of defect …

How big must a defect be before we notice it?

[Figure: comparison of 9 commercial packages using the same algorithms on the same data in the same programming language (Hatton and Roberts (1994)).]


On software growth …

The IEEE Software Impact column has taught us:

Multi-million line systems are now quite common.

They double with astonishing consistency about every 4-5 years (Michiel’s observation, now backed up with quite a few systems).
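To put the doubling time into more familiar terms, the short sketch below converts it into an implied compound annual growth rate. The function names and the one-million-line starting size are hypothetical, chosen only for illustration.

```python
def annual_growth_rate(doubling_years):
    """Compound annual growth rate implied by a given doubling time in years."""
    return 2.0 ** (1.0 / doubling_years) - 1.0


def size_after(initial_lines, years, doubling_years):
    """Projected size after `years`, assuming steady doubling every `doubling_years`."""
    return initial_lines * 2.0 ** (years / doubling_years)


for doubling in (4, 5):
    rate = annual_growth_rate(doubling)
    print(f"doubling every {doubling} years = {rate:.1%} growth per year")

# Hypothetical 1,000,000-line system projected ten years ahead.
for doubling in (4, 5):
    print(f"after 10 years (doubling every {doubling}): "
          f"{size_after(1_000_000, 10, doubling):,.0f} lines")
```

Doubling every 4 to 5 years corresponds to roughly 15% to 19% growth per year.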


Yes, systems really fail …on planes

26th February 2007

6 F-22 Raptors were left without major systems when the systems crashed after crossing the International Date Line en route from Okinawa to Hawaii. “It was a software glitch – somebody made an error in a couple of lines of code out of millions.”


Yes, systems really fail …on buses

27-Feb-2012: The launch of the new London Routemaster bus, (cost GBP 1.4 million each) was delayed by a week after a software problem meant it had to run with the rear platform shut.


Yes, systems really fail …and in banks

22-25(?) June 2012

Routine software update basically stops two of Britain’s biggest banks (NatWest and RBS) processing payments.

The low estimates of what this will cost are in the tens of millions of pounds.


A little more about defect

Some empirical observations:

There appears no significant correlation with static properties like decision counts.

There doesn’t seem to be anything special about (apparently) zero-defect components.

Defects cluster: where you find one, you will usually find more (strong evidence for this).

There appears no significant correlation with programming language.


Overview

A little about software systems

A hidden clockwork

Conclusions


Software size distributions appear power-law in LOC

[Figure: smoothed (cdf) data for 21 systems, C, Tcl/Tk and Fortran, combining 603,559 lines of code distributed across 6,803 components (Hatton 2009, IEEE TSE).]

Systems appear astonishingly similar in their component size distributions …

Power-law is linear on log-log plot
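The linearity on a log-log plot follows because y = c·x^(-b) gives log y = log c - b·log x. As a minimal sketch, a least-squares fit on the logged values recovers the exponent. The data below are synthetic (generated from an assumed exponent of 1.5), not taken from the 21 systems, and `loglog_fit` is a hypothetical helper name.

```python
import math


def loglog_fit(xs, ys):
    """Ordinary least-squares line through (log x, log y); returns (slope, intercept)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic power law y = 1000 * x**-1.5 (assumed values, for illustration only).
sizes = [1, 2, 4, 8, 16, 32, 64]
counts = [1000.0 * s ** -1.5 for s in sizes]

slope, intercept = loglog_fit(sizes, counts)
print(f"fitted exponent: {-slope:.3f}, fitted constant: {math.exp(intercept):.1f}")
```

With real component-size data the points will scatter around the line, so the quality of the fit matters as much as the slope.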


How do we build systems from tiny pieces (tokens)?

However, lines of code are too crude, so let’s try tokens. Take an example from C:

Fixed (18): void int ( ) [ ] { , ; for = >= -- <= ++ if > -

+

Variable (8): bubble a N i j t 1 2

Total (94):

    /* Sorts a[1..N] in place; a[0] is not touched (1-based indexing). */
    void bubble( int a[], int N)
    {
        int i, j, t;
        for( i = N; i >= 1; i--)
        {
            for( j = 2; j <= i; j++)
            {
                if ( a[j-1] > a[j] )
                {
                    t = a[j-1]; a[j-1] = a[j]; a[j] = t;
                }
            }
        }
    }
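The counts on this slide can be checked mechanically. The sketch below uses a small regular-expression tokenizer that is an assumption of this example: it covers only the tokens this one function uses and is not a general C lexer. Keywords and punctuation are treated as fixed tokens; identifiers and constants as variable tokens.

```python
import re

SOURCE = """
void bubble( int a[], int N)
{
    int i, j, t;
    for( i = N; i >= 1; i--)
    {
        for( j = 2; j <= i; j++)
        {
            if ( a[j-1] > a[j] )
            {
                t = a[j-1]; a[j-1] = a[j]; a[j] = t;
            }
        }
    }
}
"""

# Only the C keywords that appear in this example.
KEYWORDS = {"void", "int", "for", "if"}

# Identifiers/keywords, integer constants, two-character operators, then single characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|>=|<=|--|\+\+|[()\[\]{},;=<>\-]")


def classify(source):
    """Return (all tokens in order, set of fixed tokens, set of variable tokens)."""
    tokens = TOKEN_RE.findall(source)
    fixed, variable = set(), set()
    for tok in tokens:
        is_word = tok[0].isalnum() or tok[0] == "_"
        if tok in KEYWORDS or not is_word:
            fixed.add(tok)      # language keywords and punctuation
        else:
            variable.add(tok)   # programmer-chosen identifiers and constants
    return tokens, fixed, variable


tokens, fixed, variable = classify(SOURCE)
print("total tokens:   ", len(tokens))
print("variable tokens:", len(variable), sorted(variable))
print("fixed tokens:   ", len(fixed), sorted(fixed))
```

This reproduces the total of 94 and the 8 variable tokens. It reports 19 distinct fixed tokens rather than 18 because it counts `}` separately from `{`; the slide's list of fixed tokens does not show `}`, although the four closing braces are needed to reach the total of 94.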


How do we build systems from tiny pieces (tokens)?

So, consider a general software system of $T$ tokens divided into $M$ pieces, each with $t_i$ tokens, in which size and choice (Hartley-Shannon information $I$) are conserved.

[Diagram: components 1, 2, 3, …, M; the ith component is labelled $t_i, I'_i$.]

$$T = \sum_{i=1}^{M} t_i$$

$$I = \sum_{i=1}^{M} t_i I'_i$$


Which leads to the following clockwork theorem

$$p_i \sim a_i^{-\beta}$$

This states that in any software system, conservation of size and information (i.e. choice) is overwhelmingly likely to produce a power-law component size distribution in $a_i$, the size of the alphabet of unique keywords and identifiers used to build the ith component.

Hatton L (2011) IFIP, Boulder Colorado August 2011.
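The slide does not show the intermediate steps. The following is a sketch of the standard constrained-maximisation argument that leads to a result of this form; it is not necessarily the route taken in the cited paper. It assumes $p_i = t_i/T$, that the per-token information is $I'_i = \log a_i$ (Hartley information for a choice among $a_i$ symbols), and it introduces $W$, $\alpha$ and $\beta$ as the number of arrangements and two Lagrange multipliers.

```latex
% Assumptions (not stated on the slide): p_i = t_i / T and I'_i = \log a_i.
% W counts the ways of distributing T tokens among M components.
\begin{align*}
\log W &= \log \frac{T!}{\prod_{i=1}^{M} t_i!}
        \approx T \log T - \sum_{i=1}^{M} t_i \log t_i
        && \text{(Stirling's approximation)} \\
0 &= \frac{\partial}{\partial t_i}
     \Big[ \log W - \alpha \sum_{j=1}^{M} t_j - \beta \sum_{j=1}^{M} t_j I'_j \Big]
   = -\log t_i - 1 - \alpha - \beta I'_i \\
t_i &= e^{-1-\alpha} \, e^{-\beta I'_i} \\
p_i &= \frac{t_i}{T}
     = \frac{e^{-\beta I'_i}}{\sum_{j=1}^{M} e^{-\beta I'_j}} \\
p_i &\propto e^{-\beta \log a_i} = a_i^{-\beta}
        && \text{(using } I'_i = \log a_i \text{)}
\end{align*}
```

The two multipliers correspond to the two conserved quantities: $\alpha$ to the total size $T$ and $\beta$ to the total information $I$.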


Application to software systems

Can we test this? We are looking for the following signature:

[Figure: log $p_i$ plotted against log $a_i$, showing the power law $p_i \sim a_i^{-\beta}$; $a_f$ marks the fixed symbols of the language used.]


Equilibration to the clockwork theorem in ~400,000 line chunks

42 million lines of Ada, C, C++, Fortran, Java, Tcl-Tk from 80+ systems


A biological aside: Genetic example of clockwork theorem

The genetic sequence has a simple constant alphabet, $a_i = 4$, so $p_i$ is constant.

This implies that the random variable $L$ representing gene length is uniformly distributed, giving

$$E(L) = k \cdot \frac{T}{M}$$

$T$ is the total length of the genetic sequence and $M$ the number of genes. $k$ is a constant.


A biological aside: Genetic example of clockwork theorem

This is indeed observed:

[Figure: two panels, Prokaryotes and Eukaryotes.]

Xu et al. (2006) “Average gene length is highly conserved in Prokaryotes and Eukaryotes and diverges only between the two kingdoms”, Mol Biol Evol (June 2006) 23 (6), p. 1107-1108


So, what can we say about software defects?

However, systems evolve such that the total number of defects $D$ is conserved (since they are finite) (Hatton (2009) IEEE TSE, 35(4), p. 566-572), giving

$$D = \sum_{i=1}^{M} d_i, \qquad p_i = \frac{1}{Q(\beta)} e^{-\beta d_i / t_i}$$

Combining this with the clockwork theorem

$$p_i \sim a_i^{-\beta}$$


The tloga theorem

Predicts the following asymptotic defect distribution

$$d_i \sim t_i \log a_i$$

Can we test this ?
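Before looking at real data, it may help to see what the quantity looks like for a single component. The sketch below evaluates t·log a for the bubble sort example from the earlier slide, taking the alphabet to be the 18 fixed plus 8 variable unique tokens (an assumption about how the alphabet is counted). It then splits a hypothetical defect total across three made-up components in proportion to t·log a; the proportional split and the component figures are illustrative assumptions, not results from the talk.

```python
import math


def tloga(t, a):
    """t * log(a): total tokens times the natural log of the alphabet size."""
    return t * math.log(a)


def predicted_shares(components, total_defects):
    """Split total_defects across (t, a) components in proportion to t * log(a)."""
    weights = [tloga(t, a) for t, a in components]
    total_weight = sum(weights)
    return [total_defects * w / total_weight for w in weights]


# Bubble sort example: 94 tokens in total, 18 + 8 unique tokens (assumed alphabet size).
bubble = tloga(94, 18 + 8)
print(f"t log a for the bubble sort example: {bubble:.1f}")

# Hypothetical components (tokens, alphabet size) sharing 100 hypothetical defects.
components = [(94, 26), (500, 60), (2000, 150)]
shares = predicted_shares(components, 100.0)
for (t, a), share in zip(components, shares):
    print(f"t={t:5d} a={a:4d} -> predicted share {share:5.1f}")
```

The base of the logarithm only changes the constant of proportionality, so it does not affect the predicted shares.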


The tloga theorem

[Figure: two panels, NAG Fortran Library and Eclipse IDE (Java), annotated with 95%.]


Equilibration to tloga in the Eclipse IDE*

[Figure labels: regression line; 95% data; data sparse.]

*With grateful thanks to Andreas Zeller et al. (2007) for extracting the defect data and making it openly available. http://www.st.cs.uni-sb.de/softevo. The data come from releases 2.0, 2.1 and 3.0. There are 10,613 components in release 3.0.


The tloga theorem

This should give us a handle on how well software has been tested simply from the tloga linearity of its current defect distribution.

In other words, we should be able to tell the difference between good software and bad testing.

(It also predicts the observed phenomenon of defect clustering)
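One way to put a number on "tloga linearity" is the squared correlation between each component's defect count and its t·log a value. The sketch below is only one possible measure, chosen for this example; the talk does not specify a statistic or a threshold. Both data sets are hypothetical, and `tloga_linearity` is a hypothetical helper name.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def tloga_linearity(components):
    """components: list of (tokens t, alphabet size a, defects d) per component."""
    xs = [t * math.log(a) for t, a, _d in components]
    ys = [d for _t, _a, d in components]
    return r_squared(xs, ys)


# Hypothetical system whose defects follow t log a exactly.
sizes = [(100, 20), (200, 30), (400, 40), (800, 50)]
mature = [(t, a, 0.01 * t * math.log(a)) for t, a in sizes]

# Hypothetical system whose recorded defects bear little relation to t log a.
immature = [(100, 20, 5), (200, 30, 1), (400, 40, 9), (800, 50, 2)]

print(f"mature:   R^2 = {tloga_linearity(mature):.3f}")
print(f"immature: R^2 = {tloga_linearity(immature):.3f}")
```

A value close to 1 would be consistent with a defect distribution that has settled towards the asymptotic form; a low value may reflect immature testing, sparse data, or both, so it needs to be read alongside the number of components and defects involved.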


Overview

A little about software systems

A hidden clockwork

Conclusions


Conclusions

Software component distributions evolve the same way however we build them and whatever they do. They are guided by the unseen hand of information conservation. (Equivalent to energy conservation in physical systems.)

Software defect distribution is fundamentally statistical. This implies clustering which we can exploit.

tloga defect growth will help us with assessing the risk of insufficient maturity in systems.

For truly trustworthy systems, we need to understand the effects of software defect far better than we do.

$$p_i \sim a_i^{-\beta}, \qquad d_i \sim t_i \log a_i$$


References

My writing site:-

http://www.leshatton.org/

Specifically,

http://www.leshatton.org/variations_2010.html

For comments on reproducibility in computational science,

http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html

Thanks for your attention.