Aiding Comprehension of Cloning Through Categorization

Cory Kapser and Michael W. Godfrey, Software Architecture Group, School of Computer Science, University of Waterloo


Post on 12-Jan-2016

Page 1: Aiding Comprehension of Cloning Through Categorization

Aiding Comprehension of Cloning Through Categorization

Cory Kapser and Michael W. Godfrey

Software Architecture Group

School of Computer Science, University of Waterloo

Page 2: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Results
• Discussion
• Summary

Page 3: Aiding Comprehension of Cloning Through Categorization

Motivation

• Code duplication (“cloning”) is common in large, long-lived industrial software systems.
  – Negatively affects successful system evolution!
• Thus, clone management or removal is desirable.

Page 4: Aiding Comprehension of Cloning Through Categorization

Problems with clone detection technologies

• Comprehension
  – Result sets often provide little information beyond “it’s a clone”
• Scalability
  – VERY large result sets typical
• Accuracy
  – Esp. false positives

Page 5: Aiding Comprehension of Cloning Through Categorization

Proposed solution

• Classification of clones
  – Improve comprehension through informative grouping and statistical analysis
  – Improve scalability through easier navigation
  – Improve accuracy through region-specific filtering

Page 6: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Results
• Discussion
• Summary

Page 7: Aiding Comprehension of Cloning Through Categorization

Code cloning

• A serious problem in industrial software.
  – Typically, 15% of a system is duplicated code.
  – As high as 50% in some cases [Ducasse]

Page 8: Aiding Comprehension of Cloning Through Categorization

Reasons for code cloning

• Perceived cost
• Time constraints
• Insufficient understanding of the underlying problem
• Architectural clarity

Page 9: Aiding Comprehension of Cloning Through Categorization

Problems with clones

• Maintenance
• Size
• Comprehension
• Bugs (copied and new)
• Indication of poor design

Page 10: Aiding Comprehension of Cloning Through Categorization

Managing clones

• Removal
• Documentation

Page 11: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Results
• Discussion
• Summary

Page 12: Aiding Comprehension of Cloning Through Categorization

Our approach

1. Perform clone detection
2. Extract/define “regions” from source code
3. Map clone pairs to regions
4. Classify clones
5. Filter clones
6. Display results
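
Steps 2 and 3 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Region` and `Segment` types, the line-range representation, and the rule of choosing the region with the largest overlap (ties going to the smaller, more specific region).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical source region, e.g. a function body or a loop."""
    path: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive
    kind: str   # e.g. "function", "loop", "conditional"


@dataclass(frozen=True)
class Segment:
    """Hypothetical half of a clone pair: a line range in one file."""
    path: str
    start: int
    end: int


def overlap(seg: Segment, reg: Region) -> int:
    """Number of lines shared by the segment and the region."""
    if seg.path != reg.path:
        return 0
    return max(0, min(seg.end, reg.end) - max(seg.start, reg.start) + 1)


def map_to_region(seg: Segment, regions: Iterable[Region]) -> Optional[Region]:
    """Assumed rule: largest overlap wins; ties go to the smaller region."""
    best, best_key = None, None
    for reg in regions:
        shared = overlap(seg, reg)
        if shared == 0:
            continue
        key = (shared, -(reg.end - reg.start))
        if best_key is None or key > best_key:
            best, best_key = reg, key
    return best
```

A segment that lies entirely inside a loop nested in a function maps to the loop under this rule, since both overlaps are equal and the loop is the smaller region.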

Page 13: Aiding Comprehension of Cloning Through Categorization

The taxonomy

• Classifies clones according to attributes such as location and region type of a clone
• Hierarchical
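
A hierarchical label of this kind can be sketched as a two-level key: location first, then region type. The level names "same file" and "different directory", and the choice to leave pairs with mismatched region types unclassified, are assumptions of this sketch; only the "Same Directory Clones" category and the region types are named in these slides.

```python
from pathlib import PurePosixPath
from typing import Tuple


def location_category(path_a: str, path_b: str) -> str:
    """First taxonomy level: where the two halves of a clone pair live."""
    a, b = PurePosixPath(path_a), PurePosixPath(path_b)
    if a == b:
        return "same file"            # assumed category name
    if a.parent == b.parent:
        return "same directory"
    return "different directory"      # assumed category name


def classify(path_a: str, kind_a: str, path_b: str, kind_b: str) -> Tuple[str, str]:
    """Return a (location, region type) label for one clone pair."""
    location = location_category(path_a, path_b)
    # Assumption: a pair only gets a region-type label when both sides agree.
    region_type = kind_a if kind_a == kind_b else "unclassified"
    return (location, region_type)


# Illustrative paths only.
label = classify("src/fs/a.c", "function", "src/fs/b.c", "function")
# label == ("same directory", "function")
```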

Page 14: Aiding Comprehension of Cloning Through Categorization
Page 15: Aiding Comprehension of Cloning Through Categorization

ADD A SLIDE HERE

• To discuss what you hoped your taxonomy would help you with
  – Why did you pick that design?
• Give an example of how using this taxonomy could be helpful in a (simple, made up) example case

Page 16: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Results
• Discussion
• Summary

Page 17: Aiding Comprehension of Cloning Through Categorization

Case studies

• PostgreSQL
  – 543,387 LOC
  – 1097 source files
• Linux kernel file-system subsystem
  – 280,177 LOC
  – 537 source files

Page 18: Aiding Comprehension of Cloning Through Categorization

Filtering and classification results

• 85–87% of clones could be classified using the taxonomy
• Fewer unclassified clones in Same Directory Clones category
• Large percentage of false positives were removed via filtering structural and prototype regions.
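
One way region-specific filtering could work is to apply a stricter acceptance threshold to region types that tend to produce false positives. The slides do not say how the filter is defined, so the similarity score, the threshold values, and the dictionary layout below are all hypothetical.

```python
from typing import Dict, List

# Hypothetical thresholds: stricter for structural and prototype regions.
REGION_THRESHOLDS: Dict[str, float] = {"structural": 0.9, "prototype": 0.9}
DEFAULT_THRESHOLD = 0.6


def keep_pair(pair: dict) -> bool:
    """Keep a clone pair only if it meets the threshold for its region type."""
    threshold = REGION_THRESHOLDS.get(pair["kind"], DEFAULT_THRESHOLD)
    return pair["similarity"] >= threshold


def filter_pairs(pairs: List[dict]) -> List[dict]:
    """Drop likely false positives, preserving input order."""
    return [pair for pair in pairs if keep_pair(pair)]


# Made-up sample data for illustration.
sample = [
    {"kind": "prototype", "similarity": 0.7},
    {"kind": "function", "similarity": 0.7},
    {"kind": "structural", "similarity": 0.95},
]
kept = filter_pairs(sample)  # the prototype pair is dropped
```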

Page 19: Aiding Comprehension of Cloning Through Categorization

Overall cloning in the systems

• Function Clones dominate the Same Directory Clones.
• Most cloning occurs within the same directory.

Page 20: Aiding Comprehension of Cloning Through Categorization

Frequency of clone types

• Very few loop clones
• Relatively many conditional clones
• Function clones made up 38% of the clone pairs in the Linux fs and 53% of the clone pairs in PostgreSQL
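
Shares like these follow from a simple tally once every clone pair carries a category label. The sketch below assumes labels are plain strings; the function name and the sample labels are made up and are not the study's data.

```python
from collections import Counter
from typing import Dict, Iterable


def category_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of clone pairs falling into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up labels for four clone pairs.
shares = category_shares(["function", "function", "loop", "unclassified"])
# shares["function"] == 0.5
```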

Page 21: Aiding Comprehension of Cloning Through Categorization

• Is it possible to insert a table here with the results, even if it is partial (to show that the work is there and that there are numbers)?
• Or maybe a graph? Nice to have this to imply: here’s all the hard work we did, boy did we sweat, and there are so many results that the observations are probably meaningful

Page 22: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Discussion
• Summary

Page 23: Aiding Comprehension of Cloning Through Categorization

Cloning comprehension

• Classification of clones can improve comprehension
  – User will have a working understanding of what a clone of a certain type means
  – We believe navigation of the “clone space” will be greatly improved
  – We now know more about cloning as it occurs in a software system
  – Simple metrics are now available

Page 24: Aiding Comprehension of Cloning Through Categorization

Tool support

• Clone Interpretation and Classification System (CICS)
  – Provides GUI to navigate classified clones
  – Will provide benchmarking support for clone detection tools
  – Many features can be added to complement the sorting of clones in the taxonomy

Page 25: Aiding Comprehension of Cloning Through Categorization

CICS

Page 26: Aiding Comprehension of Cloning Through Categorization

Overview

• Motivation
• Background
• Methods
• Case Studies
• Discussion
• Summary

Page 27: Aiding Comprehension of Cloning Through Categorization

Summary

• Management of clones is important for the healthy evolution of a software system
• We can make the process of managing clones more comprehensible, scalable, and accurate

Page 28: Aiding Comprehension of Cloning Through Categorization

Future work

• Deeper classification
• Benchmark suite
• IDE plugins
• Evolution of clones