Prioritization of Code Anomalies Based on Architecture Sensitiveness


Uploaded by roberta-arcoverde on 05-Dec-2014


DESCRIPTION

Brief summary of the results of my master's research.

TRANSCRIPT

Page 1: Prioritization of Code Anomalies Based on Architecture Sensitiveness

OPUS Group, LES | DI | PUC-Rio - Brazil

Prioritization of Code Anomalies Based on Architecture Sensitiveness

Roberta Lopes Arcoverde

Page 2: Analyzing the Impact of Code Anomalies I

Code anomalies and architecture problems were related in 77.5% of the 40 analyzed versions of 8 different systems.

Isela Macia, Roberta Arcoverde et al. – CSMR 2012: On the Relevance of Code Anomalies for Identifying Architecture Degradation Symptoms

Roberta @ OPUS Group 2

Page 3: Analyzing the Impact of Code Anomalies II

[Chart: relevant vs. irrelevant code anomalies (scale 0-100) for HealthWatcher, MobileMedia, PDP and MIDAS]

Page 4: Analyzing the Impact of Code Anomalies II

[Chart: relevant vs. irrelevant code anomalies (scale 0-100) for HealthWatcher, MobileMedia, PDP and MIDAS]

40% of the analyzed code anomalies were NOT the cause of architecture problems.

Page 5: Refactoring of Relevant Anomalies

658 refactorings:
- 33% high-level: Move member (16%), Extract class or superclass (12%)
- 67% low-level: Rename (32%), Extract local variable (16%)

Only 37% of all architecturally relevant anomalies were refactored.

Page 6: Problem

Developers are not refactoring architecturally relevant anomalies.

Identifying architecturally relevant anomalies is not enough:
1. Unmanageable lists of anomalies
2. High number of false positives
3. Order of removal is important (incomplete refactorings)

Page 7: Problem

How can the effectiveness of code anomaly prioritization be maximized in terms of the anomalies' impact on the software architecture?

Page 8: Ranking Heuristics

Which characteristics could be explored for detecting and ranking architecturally relevant code anomalies?
- Change-proneness
- Error-proneness
- Anomaly density (number of anomalies found)
- Architecture role

Roberta Arcoverde et al. – RSSE/ICSE 2012: Automatically Detecting Architecturally-Relevant Code Anomalies

Page 9: Change-proneness

Elements that changed more often have higher priority.
- Multiple changes might indicate that the element has too many responsibilities.
- Version control repositories are mined to find which files changed the most, so multiple versions are required.
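The change-proneness heuristic can be sketched as follows, assuming the version history is already available as one set of changed file paths per commit (for example, collected from the file lists printed by `git log --name-only`). The function name, its parameters and the file names are hypothetical illustrations, not the tool used in this research.

```python
from collections import Counter


def rank_by_change_proneness(commits, anomalous, k=10):
    """Hypothetical sketch: rank anomalous files by how many commits touched them.

    commits:   iterable of sets of file paths changed in each commit (assumed input).
    anomalous: set of files containing at least one code anomaly.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    changes = Counter()
    for changed_files in commits:
        for path in set(changed_files):  # count a file at most once per commit
            if path in anomalous:
                changes[path] += 1
    ranked = sorted(changes.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:k]]


history = [
    {"Facade.java", "Login.java"},
    {"Facade.java", "Report.java"},
    {"Facade.java", "Login.java", "Util.java"},
]
print(rank_by_change_proneness(history, {"Facade.java", "Login.java", "Report.java"}, k=2))
# ['Facade.java', 'Login.java']
```

Files with no recorded change never enter the ranking, which is why the heuristic depends on having multiple versions to mine.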

Page 10: Error-proneness

Architecture problems and code-level defects seem to be related (Macia et al., SBCARS 2011; Kim et al., ICSE 2011; Couto et al., CSMR 2012).

Mining issue tracking systems and test reports
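A minimal sketch of the error-proneness heuristic, assuming errors have already been mined and traced to code elements as `(report_id, element)` pairs. That input format, the function name and the example data are hypothetical; the slides do not specify how reports were linked to elements.

```python
from collections import Counter


def rank_by_error_proneness(error_reports, anomalous, k=10):
    """Hypothetical sketch: rank anomalous elements by the errors traced to them.

    error_reports: iterable of (report_id, element) pairs, an assumed format
                   built, e.g., from issue-tracker entries or failing tests.
    anomalous:     set of code elements containing at least one anomaly.
    Duplicate (report_id, element) pairs are counted once.
    """
    errors = Counter(element for _, element in set(error_reports) if element in anomalous)
    ranked = sorted(errors.items(), key=lambda item: (-item[1], item[0]))
    return [element for element, _ in ranked[:k]]


reports = [("BUG-1", "OrderDao"), ("BUG-2", "OrderDao"), ("BUG-2", "OrderDao"), ("BUG-3", "OrderView")]
print(rank_by_error_proneness(reports, {"OrderDao", "OrderView", "StringUtil"}))
# ['OrderDao', 'OrderView']
```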

Page 11: Anomaly Density

Classes containing several code anomalies are considered high-priority targets for refactoring.
- “Broken windows” theory
- Empirical evidence:
  - Co-occurrences of Long Method and Feature Envy anomalies in classes that presented architecture problems
  - Architecture problems concentrated (~40%) in a small subset of classes (4)

Isela Macia, Roberta Arcoverde et al. – CSMR 2012: On the Relevance of Code Anomalies for Identifying Architecture Degradation Symptoms
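The anomaly density heuristic, together with a check for the Long Method and Feature Envy co-occurrence mentioned above, might look like the sketch below. It assumes a detector has already produced `(class_name, anomaly_type)` pairs; that format, the function names and the class names are hypothetical.

```python
from collections import Counter, defaultdict


def rank_by_anomaly_density(detections, k=10):
    """Hypothetical sketch: rank classes by how many code anomalies were detected in them.

    detections: iterable of (class_name, anomaly_type) pairs, an assumed
                detector output format.
    """
    density = Counter(cls for cls, _ in detections)
    ranked = sorted(density.items(), key=lambda item: (-item[1], item[0]))
    return [cls for cls, _ in ranked[:k]]


def classes_with_co_occurrence(detections, types=("Long Method", "Feature Envy")):
    """Classes in which every anomaly type in `types` was detected."""
    found = defaultdict(set)
    for cls, anomaly in detections:
        found[cls].add(anomaly)
    return sorted(cls for cls, kinds in found.items() if set(types) <= kinds)


detections = [
    ("OrderController", "Long Method"),
    ("OrderController", "Feature Envy"),
    ("OrderController", "God Class"),
    ("StringUtil", "Long Method"),
    ("Invoice", "Feature Envy"),
]
print(rank_by_anomaly_density(detections))     # ['OrderController', 'Invoice', 'StringUtil']
print(classes_with_co_occurrence(detections))  # ['OrderController']
```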

Page 12: Architecture Role

The role played by a particular module might affect its priority:
- Boundary modules vs. internal modules
- Model vs. View vs. Controller

Challenging implementation: it cannot be completely automated.

Three main inputs:
1. The existing architecture roles of a software project
2. Their order of relevance
3. Mappings between the roles and code elements
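The three inputs above can be combined into a ranking as in this sketch. It assumes the role list and the element-to-role mapping are supplied by hand (the slide notes the heuristic cannot be fully automated); the function name, role names and element names are hypothetical.

```python
def rank_by_architecture_role(role_relevance, element_roles, anomalous, k=10):
    """Hypothetical sketch: rank anomalous elements by the relevance of their role.

    role_relevance: architecture roles ordered from most to least relevant
                    (inputs 1 and 2 on the slide).
    element_roles:  mapping from code element to role (input 3), assumed to be
                    provided manually by someone who knows the architecture.
    Anomalous elements without a mapping are left out; elements mapped to a role
    missing from role_relevance are ranked after all listed roles.
    """
    priority = {role: i for i, role in enumerate(role_relevance)}
    unlisted = len(role_relevance)
    candidates = [e for e in anomalous if e in element_roles]
    ranked = sorted(candidates, key=lambda e: (priority.get(element_roles[e], unlisted), e))
    return ranked[:k]


roles = ["Facade", "Controller", "Model", "View"]
mapping = {
    "OrderFacade": "Facade",
    "OrderServlet": "Controller",
    "Order": "Model",
    "DateUtil": "Utility",
}
print(rank_by_architecture_role(roles, mapping, {"Order", "DateUtil", "OrderFacade", "OrderServlet"}))
# ['OrderFacade', 'OrderServlet', 'Order', 'DateUtil']
```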

Page 13: Heuristics Evaluation: Procedures

Two different approaches:
1. To what extent were the ranked elements related to (previously detected) architecture problems?
2. How accurate are the rankings when compared to those provided by architects (ground truth)?

- Data collection: collect the needed information; apply the heuristic
- Architecture problems: identify ranked elements related to architecture problems
- Ground truth: collect the ground truth ranking; compare rankings

Page 14: Heuristics Evaluation: Comparing Rankings

We analyzed the top-10 rankings produced by each heuristic.

Rankings were compared using 3 metrics:
1. Overlaps
2. Distance between rankings, eliminating non-overlaps (NSF)
3. Distance between rankings, considering non-overlaps as k+1 (NF)
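The slides do not give formulas for these metrics. The sketch below assumes NSF and NF are normalized variants of the Spearman footrule distance: one computed only over the elements both lists share, and one that places missing elements at position k+1 (the footrule variant with location parameter k+1 described by Fagin, Kumar and Sivakumar for comparing top-k lists). The slides also do not state whether the reported percentages are distances or similarities; these hypothetical functions return distances in [0, 1] and assume lists without duplicates.

```python
def overlap(a, b):
    """Number of elements that appear in both top-k lists."""
    return len(set(a) & set(b))


def footrule_k_plus_1(a, b):
    """Assumed NF: normalized footrule where elements missing from a list sit at k+1.

    Returns 0.0 for identical lists and 1.0 for disjoint ones.
    """
    ell = max(len(a), len(b)) + 1
    pos_a = {x: i + 1 for i, x in enumerate(a)}
    pos_b = {x: i + 1 for i, x in enumerate(b)}
    distance = sum(abs(pos_a.get(x, ell) - pos_b.get(x, ell)) for x in set(a) | set(b))
    # Largest possible distance: the two lists share no element.
    worst = sum(ell - p for p in pos_a.values()) + sum(ell - p for p in pos_b.values())
    return distance / worst if worst else 0.0


def footrule_overlap_only(a, b):
    """Assumed NSF: normalized footrule computed only over the shared elements."""
    shared = set(a) & set(b)
    a_shared = [x for x in a if x in shared]
    b_shared = [x for x in b if x in shared]
    b_pos = {x: i for i, x in enumerate(b_shared)}
    n = len(a_shared)
    if n < 2:
        return 0.0
    distance = sum(abs(i - b_pos[x]) for i, x in enumerate(a_shared))
    return distance / (n * n // 2)  # maximum footrule over n items is floor(n^2 / 2)


heuristic = ["A", "B", "C"]
architects = ["C", "B", "A"]
print(overlap(heuristic, architects))                         # 3
print(round(footrule_k_plus_1(heuristic, architects), 3))     # 0.333
print(footrule_overlap_only(heuristic, architects))           # 1.0
```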

Page 15: Heuristics Evaluation: Target Applications

System               Language       KLOC   # of code anomalies   # of arch. problems   Analyzed revisions   # of errors
MIDAS*               C++            72     178                   29                    1                    -
HealthWatcher (HW)   Java/AspectJ   46     273                   112                   10                   63
MobileMedia (MM)     Java/AspectJ   51     176                   90                    8                    39
PDP*                 C#             22     175                   28                    409                  116

*Industry systems

Page 16: Change-proneness Results 1/3

Rankings and actual architecture problems:

System   # of ranked code elements   Architecturally relevant   %
HW       14                          10                         71%
MM       10                          7                          70%
PDP      10                          10                         100%

Comparing rankings to ground truth:

System   Overlaps   NSF   NF
HW       8/14       38%   13%
MM       5/10       0%    11%
PDP      5/10       56%   46%
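The "%" column is the share of ranked code elements that turned out to be architecturally relevant (for HW, 10 of 14). A minimal sketch of that computation follows; the element names are hypothetical placeholders shaped like the HW row, not data from the study.

```python
def relevant_share(ranked, architecturally_relevant):
    """Fraction of ranked code elements that are related to architecture problems."""
    if not ranked:
        return 0.0
    hits = sum(1 for element in ranked if element in architecturally_relevant)
    return hits / len(ranked)


# Hypothetical data shaped like the HW row: 14 ranked elements, 10 of them relevant.
ranked = [f"Class{i}" for i in range(14)]
relevant = set(ranked[:10])
print(f"{relevant_share(ranked, relevant):.0%}")  # 71%
```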

Page 17: Change-proneness Results 2/3

[Chart: ranked elements related vs. not related to architecture problems across ranking positions 1-10 (y-axis 0-80)]

Page 18: Change-proneness Results 3/3

- Target applications had different evolution patterns:
  - HealthWatcher had mostly perfective changes.
  - PDP had 409 versions, against ~10 for HealthWatcher and MobileMedia.
- Elements related to architecture problems changed together:
  - Façades and classes with multiple clients (PDP)
  - Hierarchies (HW)
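Groups of elements that change together, such as a façade and its clients or a class hierarchy, can be surfaced by counting how often each pair of files appears in the same commit. This is a hypothetical co-change sketch, not the procedure used in the study; it assumes the same per-commit sets of changed files as the change-proneness heuristic, and the file names and `min_support` threshold are illustrative.

```python
from collections import Counter
from itertools import combinations


def co_change_pairs(commits, min_support=2):
    """Hypothetical sketch: count how often each pair of files changed in the same commit.

    commits: iterable of sets of file paths changed per commit (assumed input).
    Returns pairs that changed together at least min_support times,
    most frequent first, ties ordered by file names.
    """
    pairs = Counter()
    for changed in commits:
        for pair in combinations(sorted(set(changed)), 2):
            pairs[pair] += 1
    frequent = [(pair, n) for pair, n in pairs.items() if n >= min_support]
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


history = [
    {"Facade", "ClientA", "ClientB"},
    {"Facade", "ClientA"},
    {"Base", "SubA", "SubB"},
    {"Base", "SubA"},
]
print(co_change_pairs(history))
# [(('Base', 'SubA'), 2), (('ClientA', 'Facade'), 2)]
```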

Page 19: Error-proneness Results 1/2

Rankings and actual architecture problems:

System   # of ranked code elements   Architecturally relevant   %
HW       14                          12                         85%
MM       10                          8                          80%
PDP      10                          8                          80%

Comparing rankings to ground truth:

System   Overlaps   NSF    NF
HW       10/14      100%   26%
MM       3/10       100%   24%
PDP      5/10       17%    26%

Page 20: Anomaly Density Results 1/2

Rankings and actual architecture problems:

System   # of ranked code elements   Architecturally relevant   %
HW       10                          5                          50%
MM       10                          9                          90%
PDP      10                          8                          80%
MIDAS    10                          6*                         60%*

Comparing rankings to ground truth:

System   Overlaps   NSF   NF
HW       5/10       34%   46%
MM       7/10       59%   30%
PDP      8/10       63%   64%
MIDAS    9/10       60%   80%

Page 21: Anomaly Density Results 2/2

- Code elements infected with multiple anomalies are often perceived as high priority.
- The high number of anomalies in HW hindered the identification of the architecturally relevant ones.
  - However, elements related to architecture problems presented the same number and type of code anomalies.
- Many false positives arise from utility classes.

Page 22: Architecture Role Results 1/2

Rankings and actual architecture problems:

System   # of ranked code elements   Architecturally relevant   %
HW       10                          4                          40%
MM       10                          9                          90%
PDP      10                          10                         100%

Comparing rankings to ground truth:

System   Overlaps   NSF   NF
HW       4/10       50%   28%
MM       6/10       78%   59%
PDP      6/10       67%   59%

Page 23: Architecture Role Results 2/2

- Results are highly dependent on the architecture roles specified.
  - PDP had 4 roles identified, with different relevance levels.
  - We observed a significant improvement (~80%) in the HW results when specifying 1 additional role.
- The heuristic is mainly important for discarding false positives when combined with other heuristics.
  - In PDP, 21 out of 30 false positives could be distinguished by analyzing their architecture roles.
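Using architecture roles to discard false positives from another heuristic's ranking can be sketched as a simple filter. This assumes some roles (for instance, a hypothetical "Utility" role, given that many false positives came from utility classes) are marked as irrelevant; the function name, roles and element names are illustrative, not the combination evaluated in the study.

```python
def filter_by_role(ranking, element_roles, discarded_roles, k=10):
    """Hypothetical sketch: drop elements whose role is deemed irrelevant, keep the top k.

    ranking:         elements ordered by another heuristic (e.g. anomaly density).
    element_roles:   mapping from element to architecture role.
    discarded_roles: roles considered irrelevant (assumed, e.g. {"Utility"}).
    Elements without a mapped role are kept.
    """
    kept = [e for e in ranking if element_roles.get(e) not in discarded_roles]
    return kept[:k]


density_ranking = ["StringUtil", "OrderFacade", "DateUtil", "OrderController"]
roles = {
    "StringUtil": "Utility",
    "DateUtil": "Utility",
    "OrderFacade": "Facade",
    "OrderController": "Controller",
}
print(filter_by_role(density_ranking, roles, {"Utility"}))
# ['OrderFacade', 'OrderController']
```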

Page 24: Concluding Remarks

- The proposed heuristics were able to correctly highlight architecturally relevant anomalies.
  - On average, 75%-85% of the ranked elements were architecturally relevant.
  - Groups of classes that change together
  - Groups of classes infected with the same anomalies
  - Groups of classes with the same architecture roles
- Unexpectedly, the presence of errors was a good indicator of architecture relevance.
  - Error propagation

Page 25: Future Work

- Evaluate the accuracy of different combinations of the prioritization heuristics (ongoing)
- Evaluate the efficacy of the heuristics for improving refactoring effectiveness in a development environment
  - Controlled experiment
- Improve the implementation
  - Integration with issue tracking systems
  - Support for ranking code anomalies at a finer level of granularity (methods)

Page 26: Thank you

Questions?

Page 27: Error-proneness Results 2/2

- Two approaches for error mining:
  - HealthWatcher: 63 errors (FindBugs)
  - MobileMedia: 39 errors (FindBugs)
  - PDP: 116 errors (issue tracking reports and broken tests)
- Multiple errors in utility classes affected the accuracy of this heuristic.
  - Combining it with the architecture role heuristic could be an improvement.
- Inherited errors were responsible for the good results of HW and MM.


Page 29: Using Metrics-Based Strategies and Tools

Page 30: Architecture-Sensitive Detection of Code Anomalies

Embedded DSL for defining detection strategies:

    codeanomaly<class> God Class : (LOC > 100) or (NOM > 15) or (NCC > 3)

LOC: Lines of Code
NOM: Number of Methods
NCC: Number of Architectural Concerns
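The God Class rule above is a boolean predicate over per-class metrics. The Python sketch below only transcribes that predicate; it assumes the LOC, NOM and NCC values have already been computed (NCC in particular requires a mapping of architectural concerns to code) and does not reproduce the embedded DSL itself. Class names and metric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    name: str
    loc: int  # LOC: lines of code
    nom: int  # NOM: number of methods
    ncc: int  # NCC: number of architectural concerns


def is_god_class(m: ClassMetrics) -> bool:
    # Python transcription of: codeanomaly<class> God Class : (LOC > 100) or (NOM > 15) or (NCC > 3)
    return m.loc > 100 or m.nom > 15 or m.ncc > 3


classes = [
    ClassMetrics("OrderFacade", loc=80, nom=10, ncc=4),     # flagged by NCC
    ClassMetrics("Order", loc=60, nom=8, ncc=1),
    ClassMetrics("ReportBuilder", loc=250, nom=12, ncc=2),  # flagged by LOC
]
print([c.name for c in classes if is_god_class(c)])  # ['OrderFacade', 'ReportBuilder']
```

Because the clauses are joined with `or`, a class small in size but spread over many architectural concerns is still flagged, which is what makes the strategy architecture-sensitive.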

Page 31: Detection of Code Anomaly Patterns

- More than 80% of architectural problems were related to code smells...
- ...but code smell patterns seemed to be stronger indicators of architectural problems than single code smells.
- Accuracy of current automated strategies:
  - More than 60% of the automatically detected code smells were not correlated with architectural problems.
  - More than 45% of the false negatives were found to be correlated with architectural modularity problems.

Isela Macia et al. – AOSD 2012: Are Automatically-detected Code Anomalies relevant to Architectural Modularity?