Measuring Software Product Quality


Page 1: Measuring Software Product Quality

T +31 20 314 0950 [email protected] www.sig.eu

Measuring Software Product Quality Eric Bouwers

June 17, 2014

Page 2: Measuring Software Product Quality


Software Improvement Group

Who are we?
•  Highly specialized advisory company for cost, quality and risks of software
•  Independent and therefore able to give objective advice

What do we do?
•  Fact-based advice supported by our automated toolset for source code analysis
•  Analysis across technologies by use of technology-independent methods

Our mission: We give you control over your software.

Page 3: Measuring Software Product Quality


Services

Software Risk Assessment
•  In-depth investigation of software quality and risks
•  Answers specific research questions

Software Monitoring
•  Continuous measurement, feedback, and decision support
•  Guard quality from start to finish

Software Product Certification
•  Five levels of technical quality
•  Evaluation by SIG, certification by TÜV Informationstechnik

Application Portfolio Analyses
•  Inventory of structure and quality of application landscape
•  Identification of opportunities for portfolio optimization

Page 4: Measuring Software Product Quality


Measuring Software Product Quality

Today’s topic

Page 5: Measuring Software Product Quality


Some definitions

Projects
•  Activity to create or modify software products
•  Design, Build, Test, Deploy

Product
•  Any software artifact produced to support business processes
•  Either built from scratch, from reusable components, or by customization of “standard” packages

Portfolio
•  Collection of software products in various phases of the lifecycle
•  Development, acceptance, operation, selected for decommissioning

[Diagram: a portfolio of several products, with a project acting on one of them]

Page 6: Measuring Software Product Quality


Our claim today

“Measuring the quality of software products is key to successful software projects and healthy software portfolios”


Page 7: Measuring Software Product Quality


Measuring Software Product Quality

Today’s topic

Page 8: Measuring Software Product Quality


The ISO 25010 standard for software quality

ISO 25010 distinguishes eight product quality characteristics:
•  Functional Suitability
•  Performance Efficiency
•  Compatibility
•  Usability
•  Reliability
•  Security
•  Maintainability
•  Portability

Page 9: Measuring Software Product Quality


Why focus on maintainability?

Initiation, Build, Acceptance: 20% of the costs
Maintenance: 80% of the costs

Page 10: Measuring Software Product Quality


The sub-characteristics of maintainability in ISO 25010

Maintainability breaks down into:
•  Modularity
•  Reusability
•  Analysability
•  Modifiability
•  Testability

Page 11: Measuring Software Product Quality


Measuring maintainability Some requirements

Simple to understand

Allow root-cause analyses

Technology independent

Easy to compute

Heitlager et al., A Practical Model for Measuring Maintainability, QUATIC 2007

Suggestions?

Page 12: Measuring Software Product Quality


Measuring ISO 25010 maintainability using the SIG model

The model measures eight source-code properties:
•  Volume
•  Duplication
•  Unit size
•  Unit complexity
•  Unit interfacing
•  Module coupling
•  Component balance
•  Component independence

Each maintainability sub-characteristic is rated from a subset of these properties (the X marks in the slide’s cross-table):
•  Analysability — four of the properties
•  Modifiability — three of the properties
•  Testability — three of the properties
•  Modularity — three of the properties
•  Reusability — two of the properties

Page 13: Measuring Software Product Quality


Measuring maintainability Different levels of measurement

System

Component

Module

Unit

Page 14: Measuring Software Product Quality


Source code measurement Volume

Lines of code
•  Not comparable between technologies

Function Point Analysis (FPA)
•  A.J. Albrecht - IBM - 1979
•  Counted manually
•  Slow, costly, fairly accurate

Backfiring
•  Capers Jones - 1995
•  Convert LOC to FPs
•  Based on statistics per technology
•  Fast, but limited accuracy
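The idea behind backfiring can be illustrated in a few lines of code. This is a minimal sketch with made-up LOC-per-function-point ratios (the real Capers Jones tables differ per technology and are revised over time); the function name backfire is ours, not an existing tool.

# Illustrative sketch of "backfiring": converting lines of code to function
# points using per-technology ratios. The ratios below are placeholders, not
# the official conversion tables.
LOC_PER_FP = {      # assumed example values
    "java": 53,
    "cobol": 107,
    "sql": 13,
}

def backfire(loc_per_technology: dict) -> float:
    """Estimate function points from LOC counts per technology."""
    return sum(loc / LOC_PER_FP[tech] for tech, loc in loc_per_technology.items())

print(backfire({"java": 50_000, "cobol": 100_000}))   # rough FP estimate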

Page 15: Measuring Software Product Quality


Source code measurement Duplication

[Example: two code fragments. Lines 1–7 (def, ghi, jkl, mno, pqr, stu, vwx) reappear verbatim as lines 35–41 elsewhere in the code; such a repeated block counts as duplication.]
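A minimal sketch of how such duplication can be detected: flag every line that is part of a block of at least six consecutive (trimmed) lines that occurs more than once. Real clone detection also handles whitespace, cross-file clones and near-misses; this illustration does not.

# Minimal sketch of block-based duplication detection.
from collections import defaultdict

def duplicated_lines(lines: list, block_size: int = 6) -> set:
    blocks = defaultdict(list)                      # block text -> start indices
    for i in range(len(lines) - block_size + 1):
        key = "\n".join(l.strip() for l in lines[i:i + block_size])
        blocks[key].append(i)
    flagged = set()
    for starts in blocks.values():
        if len(starts) > 1:                         # block occurs more than once
            for s in starts:
                flagged.update(range(s, s + block_size))
    return flagged

code = ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz",
        "xxxxx", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "xxxxxx"]
dup = duplicated_lines(code)
print(f"duplication: {len(dup) / len(code):.0%}")   # 14 of 18 lines -> 78%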

Page 16: Measuring Software Product Quality


Source code measurement Component balance

Measure for the number and relative size of architectural elements
•  CB = SBO × CSU
•  SBO = system breakdown optimality, computed as distance from an ideal
•  CSU = component size uniformity, computed with the Gini coefficient

E. Bouwers, J.P. Correia, A. van Deursen, and J. Visser, A Metric for Assessing Component Balance of Software Architectures, in proceedings of the 9th Working IEEE/IFIP Conference on Software Architecture (WICSA 2011)

[Figure: four example breakdowns into components A–D, with component balance values 0.0, 0.1, 0.2 and 0.6]
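A simplified illustration of the CB = SBO × CSU idea. The exact definitions are in the WICSA 2011 paper cited above; the “ideal” component count and the way the Gini coefficient is turned into a uniformity score below are simplifying assumptions for demonstration only.

# Simplified sketch of Component Balance (CB = SBO * CSU).
def gini(sizes: list) -> float:
    """Gini coefficient of component sizes (0 = perfectly uniform)."""
    n, total = len(sizes), sum(sizes)
    if n == 0 or total == 0:
        return 0.0
    sizes = sorted(sizes)
    cum = sum((i + 1) * s for i, s in enumerate(sizes))
    return (2 * cum) / (n * total) - (n + 1) / n

def component_balance(sizes: list, ideal_count: int = 4) -> float:
    csu = 1.0 - gini(sizes)                                    # component size uniformity
    sbo = max(0.0, 1.0 - abs(len(sizes) - ideal_count) / ideal_count)  # assumed SBO proxy
    return sbo * csu

print(component_balance([100, 100, 100, 100]))   # balanced: 1.0
print(component_balance([970, 10, 10, 10]))      # one dominant component: ~0.28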

Page 17: Measuring Software Product Quality


From measurement to rating A benchmark based approach

Measurements for all systems in the benchmark (e.g. 0.09, 0.14, 0.84, 0.96, …) are sorted, and thresholds are placed on the sorted distribution. A new system’s measurement (e.g. 0.34) is then compared against these thresholds to obtain a rating, here HHIII (ratings are shown as five symbols: H for a filled star, I for an empty one).

Note: example thresholds

Threshold   Score
0.9         HHHHH
0.8         HHHHI
0.5         HHHII
0.3         HHIII
0.1         HIIII
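Mapping a measured value onto a rating is then a simple lookup against the benchmark-derived thresholds. The sketch below uses the example thresholds from this slide (where a higher value is better) and reproduces the 0.34 → HHIII example.

# Sketch: map a measured value onto a star rating via example thresholds.
THRESHOLDS = [(0.9, "HHHHH"), (0.8, "HHHHI"), (0.5, "HHHII"),
              (0.3, "HHIII"), (0.1, "HIIII")]

def rate(value: float) -> str:
    for threshold, stars in THRESHOLDS:
        if value >= threshold:
            return stars
    return "HIIII"          # below the lowest threshold: minimum rating

print(rate(0.34))           # HHIII, the example from the slide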

Page 18: Measuring Software Product Quality


But what about the measurements on lower levels?

(The cross-table of the eight source-code properties against the five maintainability sub-characteristics from Page 12 is shown again here.)

Page 19: Measuring Software Product Quality


Source code measurement Logical complexity

•  T. McCabe, IEEE Transactions on Software Engineering, 1976

•  Academic: number of independent paths per method
•  Intuitive: number of decisions made in a method
•  Reality: the number of if statements (and while, for, ...)

[Example: control-flow graph of a single method with a McCabe complexity of 4]
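A rough sketch of the “practical” definition: one plus the number of decision points in a unit. The example counts Python constructs; analyzers for other languages differ in exactly which constructs they count.

# Rough sketch of cyclomatic complexity: 1 + number of decision points.
import ast

def cyclomatic_complexity(source: str) -> int:
    decisions = (ast.If, ast.While, ast.For, ast.ExceptHandler,
                 ast.BoolOp, ast.IfExp)          # boolean operators also add paths
    return 1 + sum(isinstance(node, decisions) for node in ast.walk(ast.parse(source)))

example = """
def classify(x):
    if x < 0:
        return "negative"
    for i in range(x):
        if i % 2 == 0 and i > 2:
            print(i)
    return "done"
"""
print(cyclomatic_complexity(example))   # 1 + if + for + if + and = 5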

Page 20: Measuring Software Product Quality


How can we aggregate this?

My question …

Page 21: Measuring Software Product Quality


Option 1: Summing

                Crawljax     GOAL   Checkstyle   Springframework
Total McCabe        1814     6560         4611             22937
Total LOC           6972    25312        15994             79474
Ratio              0.260    0.259        0.288             0.288

Page 22: Measuring Software Product Quality


Option 2: Average

                Crawljax     GOAL   Checkstyle   Springframework
Average McCabe      1.87     2.45         2.46              1.99

[Histogram: number of methods per cyclomatic-complexity value (1–32); the distribution is heavily skewed towards low values]

Page 23: Measuring Software Product Quality

Option 3: quality profile

Cyclomatic complexity   Risk category
1 - 5                   Low
6 - 10                  Moderate
11 - 25                 High
> 25                    Very high

Sum the lines of code per risk category.

Lines of code per risk category (example):
Low 70 %   Moderate 12 %   High 13 %   Very high 5 %

[Stacked bar chart: quality profiles of Crawljax, GOAL, Checkstyle and Springframework, each bar divided over the four risk categories from 0% to 100%]
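A quality profile is straightforward to compute once every unit has a complexity and a size. The sketch below uses the risk categories from this slide; the unit data is made up so that the result matches the 70/12/13/5 example.

# Sketch of "option 3": aggregate unit measurements into a quality profile
# by summing lines of code per risk category.
def risk_category(mccabe: int) -> str:
    if mccabe <= 5:   return "low"
    if mccabe <= 10:  return "moderate"
    if mccabe <= 25:  return "high"
    return "very high"

def quality_profile(units: list) -> dict:
    """units: list of (cyclomatic complexity, lines of code) per method."""
    totals = {"low": 0, "moderate": 0, "high": 0, "very high": 0}
    for mccabe, loc in units:
        totals[risk_category(mccabe)] += loc
    all_loc = sum(totals.values())
    return {cat: loc / all_loc for cat, loc in totals.items()}

units = [(3, 120), (4, 230), (7, 60), (12, 65), (30, 25)]   # hypothetical methods
print(quality_profile(units))
# {'low': 0.7, 'moderate': 0.12, 'high': 0.13, 'very high': 0.05}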

Page 24: Measuring Software Product Quality


First Level Calibration

Page 25: Measuring Software Product Quality


The formal six-step process

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010

Page 26: Measuring Software Product Quality


Visualizing the calculated metrics

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010

Page 27: Measuring Software Product Quality


Choosing a weight metric

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010

Page 28: Measuring Software Product Quality


Calculate for a benchmark of systems

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010 (quantile lines at 70%, 80% and 90%)

Page 29: Measuring Software Product Quality


SIG Maintainability Model Derivation of metric thresholds

1.  Measure the systems in the benchmark
2.  Summarize all measurements
3.  Derive thresholds that bring out the metric’s variability
4.  Round the thresholds

Cyclomatic complexity

Risk category

1 - 5 Low

6 - 10 Moderate

11 - 25 High

> 25 Very high

Excerpt from Alves et al., as shown on the slide:

[Fig. 10: Unit size (method size in LOC) — (a) metric distribution, (b) box plot per risk category]
[Fig. 11: Unit interfacing (number of parameters) — (a) metric distribution, (b) box plot per risk category]
[Fig. 12: Module inward coupling (file fan-in) — (a) metric distribution, (b) box plot per risk category]
[Fig. 13: Module interface size (number of methods per file) — (a) metric distribution, (b) box plot per risk category]

…computed thresholds. For instance, the existence of unit test code, which contains very little complexity, will result in lower threshold values. On the other hand, the existence of generated code, which normally has very high complexity, will result in higher threshold values. Hence, it is extremely important to know which data is used for calibration. As previously stated, for deriving thresholds we removed both generated code and test code from our analysis.

VIII. THRESHOLDS FOR SIG’S QUALITY MODEL METRICS

Throughout the paper, the McCabe metric was used as a case study. To investigate the applicability of our methodology to other metrics, we repeated the analysis for the SIG quality model metrics. We found that our methodology can be successfully applied to derive thresholds for all these metrics.

Figures 10, 11, 12, and 13 depict the distribution and the box plot per risk category for unit size (method size in LOC), unit interfacing (number of parameters per method), module inward coupling (file fan-in), and module interface size (number of methods per file), respectively.

From the distribution plots, we can observe, as for McCabe, that for all metrics both the highest values and the variability between systems are concentrated in the last quantiles.

Table IV summarizes the quantiles used and the derived thresholds for all the metrics from the SIG quality model. As for the McCabe metric, we derived quality profiles for each metric using our benchmark in order to verify that the thresholds are representative of the chosen quantiles. The results are again similar. Except for the unit interfacing metric, the low risk category is centered around 70% of the code and all others are centered around 10%. For the unit interfacing metric, since the variability is relatively small until the 80% quantile, we decided to use the 80%, 90% and 95% quantiles to derive thresholds. For this metric, the low risk category is around 80%, the moderate risk is near 10% and the other two around 5%. Hence, from the box plots we can observe that the thresholds are indeed recognizing code around the defined quantiles.

IX. CONCLUSION

A. Contributions
We proposed a novel methodology for deriving software metric thresholds and a calibration of previously introduced metrics. Our methodology improves over others by fulfilling three fundamental requirements: i) it respects the statistical properties of the metric, such as metric scale and distribution; ii) it is based on data analysis from a representative set of systems (benchmark); iii) it is repeatable, transparent and straightforward to carry out. These requirements were achieved by aggregating measurements from different systems using relative size weighting. Our methodology was applied to a large set of systems and thresholds were derived by choosing specific percentages of overall code of the benchmark.

B. Discussion
Using a benchmark of 100 object-oriented systems (C# and Java), both proprietary and open-source, we explained …

Derive & Round

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010
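A simplified sketch of the threshold-derivation step described above: weight every unit’s metric value by its size in LOC and read off the values at the 70%, 80% and 90% points of the resulting distribution. The per-system normalization of Alves et al. is omitted here, and the benchmark data is made up.

# Simplified sketch of benchmark-based threshold derivation.
def weighted_quantiles(values_with_weights, quantiles=(0.70, 0.80, 0.90)):
    """values_with_weights: iterable of (metric value, LOC weight) pairs."""
    data = sorted(values_with_weights)
    total = sum(w for _, w in data)
    thresholds, cumulative, qs = [], 0.0, list(quantiles)
    for value, weight in data:
        cumulative += weight
        while qs and cumulative / total >= qs[0]:
            thresholds.append(value)
            qs.pop(0)
    return thresholds

# hypothetical benchmark: (cyclomatic complexity, LOC) per unit
benchmark = [(1, 200), (2, 300), (4, 200), (6, 120), (9, 80), (15, 70), (40, 30)]
print(weighted_quantiles(benchmark))   # -> [4, 6, 9]; in practice rounded, e.g. to 5 / 10 / 25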

Page 30: Measuring Software Product Quality

The quality profile

Cyclomatic complexity   Risk category
1 - 5                   Low
6 - 10                  Moderate
11 - 25                 High
> 25                    Very high

Sum the lines of code per risk category.

Lines of code per risk category (example):
Low 70 %   Moderate 12 %   High 13 %   Very high 5 %

[Stacked bar chart: quality profiles of Crawljax, GOAL, Checkstyle and Springframework]

Page 31: Measuring Software Product Quality


Second Level Calibration

Page 32: Measuring Software Product Quality


How to rank quality profiles? Unit Complexity profiles for 20 random systems

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011

[Chart: unit complexity profiles of 20 random systems, alongside the rating scale HHHHH, HHHHI, HHHII, HHIII, HIIII — which rating should each profile get?]

Page 33: Measuring Software Product Quality

33 I 54

Ordering by highest-category is not enough!

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011

[Chart: the same profiles ordered by their highest risk category only, next to the rating scale HHHHH, HHHHI, HHHII, HHIII, HIIII]

Page 34: Measuring Software Product Quality

34 I 54

A better ranking algorithm

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011

Order categories
Define thresholds for the given systems
Find the smallest possible thresholds
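One way to implement such an ordering is to compare systems on the cumulative fraction of code in the riskiest categories first. This is an illustrative sketch in the spirit of the slide, not the paper’s exact algorithm.

# Sketch: order quality profiles by cumulative risk, worst category first.
def ordering_key(profile: dict) -> tuple:
    vh = profile["very high"]
    return (vh, vh + profile["high"], vh + profile["high"] + profile["moderate"])

systems = {
    "A": {"low": 0.70, "moderate": 0.12, "high": 0.13, "very high": 0.05},
    "B": {"low": 0.85, "moderate": 0.10, "high": 0.05, "very high": 0.00},
    "C": {"low": 0.70, "moderate": 0.25, "high": 0.00, "very high": 0.05},
}
for name in sorted(systems, key=lambda n: ordering_key(systems[n])):
    print(name)        # best to worst: B, C, A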

Page 35: Measuring Software Product Quality


Which results in a more natural ordering

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011

[Chart: the 20 profiles ordered by the algorithm, mapped onto the rating scale HHHHH, HHHHI, HHHII, HHIII, HIIII]

Page 36: Measuring Software Product Quality


Second level thresholds Unit size example

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011

Page 37: Measuring Software Product Quality


SIG Maintainability Model Mapping quality profiles to ratings

1.  Calculate quality profiles for the systems in the benchmark
2.  Sort the quality profiles
3.  Select thresholds based on a 5% / 30% / 30% / 30% / 5% distribution over the ratings HHHHH, HHHHI, HHHII, HHIII, HIIII

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM/Mensura 2011
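The resulting second-level thresholds can be applied as a simple lookup: a profile earns a rating if it stays within that rating’s maximum allowed percentages of moderate, high and very-high risk code. The threshold values in the sketch are invented for illustration; real ones come from the benchmark calibration described above.

# Sketch of the second-level mapping from a quality profile to a 1-5 star rating.
PROFILE_THRESHOLDS = {        # rating -> max (moderate, high, very high) fractions (assumed)
    5: (0.25, 0.00, 0.00),
    4: (0.30, 0.05, 0.00),
    3: (0.40, 0.10, 0.00),
    2: (0.50, 0.15, 0.05),
}

def rating(profile: dict) -> int:
    measured = (profile["moderate"], profile["high"], profile["very high"])
    for stars in (5, 4, 3, 2):
        if all(m <= t for m, t in zip(measured, PROFILE_THRESHOLDS[stars])):
            return stars
    return 1

print(rating({"low": 0.70, "moderate": 0.12, "high": 0.13, "very high": 0.05}))  # 2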

Page 38: Measuring Software Product Quality


SIG measurement model Putting it all together

[Diagram: per technology (Java, JSP) the source-code properties are measured; the measurements are turned into property ratings per technology, combined into property ratings for the whole system, and finally aggregated into sub-characteristic ratings and one maintainability rating. Properties that do not apply to a technology are marked N/A.]

Example values from the slide (per-technology and system-level property ratings): Volume HHHII (21 man-years), Unit size HHIII / HIIII, Duplication HHIII / HHHII, Unit complexity HHHHI / HHHII / HHIII, Unit interfacing HHHHI, Module coupling HHHII, Component balance HHHHH, Component independence HHHHI.

Sub-characteristics: Analysability HHIII, Modifiability HHHII, Testability HHHHI, Modularity HHHHI, Reusability HHHII

Maintainability rating: HHHII
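The final aggregation can be sketched as two rounds of averaging: property ratings into sub-characteristic ratings, and sub-characteristic ratings into one maintainability rating. The property-to-sub-characteristic mapping in the sketch is an assumption for illustration (the authoritative mapping is the cross-table shown earlier), and the ratings are example values.

# Sketch of the final aggregation step of the SIG model.
PROPERTY_RATINGS = {          # example ratings on a 1-5 star scale
    "volume": 3, "duplication": 3, "unit size": 2, "unit complexity": 3,
    "unit interfacing": 4, "module coupling": 3, "component balance": 5,
    "component independence": 4,
}

SUBCHARACTERISTICS = {        # assumed property -> sub-characteristic mapping
    "analysability": ["volume", "duplication", "unit size", "component balance"],
    "modifiability": ["duplication", "unit complexity", "module coupling"],
    "testability":   ["unit complexity", "unit size", "module coupling"],
    "modularity":    ["module coupling", "component balance", "component independence"],
    "reusability":   ["unit size", "unit interfacing"],
}

def average(xs): return sum(xs) / len(xs)

sub_ratings = {name: average([PROPERTY_RATINGS[p] for p in props])
               for name, props in SUBCHARACTERISTICS.items()}
maintainability = average(list(sub_ratings.values()))
print(sub_ratings)
print(f"maintainability: {maintainability:.1f} stars")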

Page 39: Measuring Software Product Quality


Does this work?

Your question …

Page 40: Measuring Software Product Quality


SIG Maintainability Model Empirical validation

•  The Influence of Software Maintainability on Issue Handling, MSc thesis, Technical University Delft

•  Indicators of Issue Handling Efficiency and their Relation to Software Maintainability, MSc thesis, University of Amsterdam

•  Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

16 projects (2.5 MLOC)

150 versions

50K issues

Internal: maintainability

External: issue handling

Page 41: Measuring Software Product Quality


Empirical validation The life-cycle of an issue

Page 42: Measuring Software Product Quality


Empirical validation Defect resolution time

Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

Page 43: Measuring Software Product Quality


Empirical validation Quantification

Luijten et al., Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

Page 44: Measuring Software Product Quality


SIG Quality Model Quantification

Resolution time for defects and enhancements

•  Faster issue resolution with higher quality
•  Between 2 stars and 4 stars, resolution speed increases by factors 3.5 and 4.0

Page 45: Measuring Software Product Quality


SIG Quality Model Quantification

Productivity (resolved issues per developer per month)

•  Higher productivity with higher quality
•  Between 2 stars and 4 stars, productivity increases by a factor of 10

Page 46: Measuring Software Product Quality


Does this work? Yes

Theoretically

Your question …

Page 47: Measuring Software Product Quality


But is it useful?

Your question …

HHHHH

Page 48: Measuring Software Product Quality


Software Risk Assessment

Kick-off session, strategy session, technical session, technical validation session, risk validation session, and final presentation — supported throughout by analysis in the SIG Laboratory — resulting in a Final Report.

Page 49: Measuring Software Product Quality


Example Which system to use?

Three candidate systems, rated HHHII, HHIII and HHHHI.

Page 50: Measuring Software Product Quality


Should we accept delay and cost overrun, or cancel the project?

[Diagram: the system’s User Interface, Business Layer and Data Layer, partly based on a vendor framework and partly custom-built]

Page 51: Measuring Software Product Quality


Software Monitoring

Weekly cycle, repeated throughout development:
Source code → Automated analysis → Updated website → Interpret, Discuss, Manage → Act!
(each iteration takes one week)

Page 52: Measuring Software Product Quality


Example Very specific questions

HHHHH

Volume HHHHH

Duplication HHHHH

Unit size HHHHH

Unit complexity HHHHH

Unit interfacing HHHHI

Module coupling HHHHI

Component balance HHHHI

Component independence HHHII

Page 53: Measuring Software Product Quality


Software Product Certification

Page 54: Measuring Software Product Quality


Summary

The SIG model rates eight source-code properties — Volume, Duplication, Unit size, Unit complexity, Unit interfacing, Module coupling, Component balance, Component independence — and combines subsets of them (the X marks in the cross-table) into the five ISO 25010 maintainability sub-characteristics: Analysability, Modifiability, Testability, Modularity and Reusability.

Thank you! Eric Bouwers

[email protected]
