Aiding Comprehension of Cloning Through Categorization

Post on 12-Jan-2016

DESCRIPTION

Aiding Comprehension of Cloning Through Categorization. Cory Kapser and Michael W. Godfrey, Software Architecture Group, School of Computer Science, University of Waterloo.

TRANSCRIPT

Aiding Comprehension of Cloning Through Categorization

Cory Kapser and Michael W. Godfrey

Software Architecture Group

School of Computer Science, University of Waterloo

Overview

• Motivation
• Background
• Methods
• Case Studies
• Results
• Discussion
• Summary

Motivation

• Code duplication (“cloning”) is common in large, long-lived industrial software systems.
– Negatively affects successful system evolution!

• Thus, clone management or removal is desirable.

Problems with clone detection technologies

• Comprehension
– Result sets often provide little information beyond “it’s a clone”

• Scalability
– VERY large result sets are typical

• Accuracy
– Especially false positives

Proposed solution

• Classification of clones
– Improve comprehension through informative grouping and statistical analysis
– Improve scalability through easier navigation
– Improve accuracy through region-specific filtering

Code cloning

• A serious problem in industrial software.
– Typically, 15% of a system is duplicated code.
– As high as 50% in some cases [Ducasse]

Reasons for code cloning

• Perceived cost
• Time constraints
• Insufficient understanding of the underlying problem
• Architectural clarity

Problems with clones

• Maintenance
• Size
• Comprehension
• Bugs (copied and new)
• Indication of poor design

Managing clones

• Removal
• Documentation

Our approach

1. Perform clone detection
2. Extract/define “regions” from source code
3. Map clone pairs to regions
4. Classify clones
5. Filter clones
6. Display results
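
The region-mapping step of the approach above can be illustrated with a small sketch. It is not taken from the presentation: the `Region` and `Segment` types, the file paths, and the rule "pick the smallest region that fully encloses the cloned segment" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A syntactic region of a source file (hypothetical representation)."""
    path: str
    kind: str   # e.g. "function", "loop", "conditional"
    start: int  # first line, inclusive
    end: int    # last line, inclusive


@dataclass(frozen=True)
class Segment:
    """One side of a clone pair as reported by a clone detector."""
    path: str
    start: int
    end: int


def enclosing_region(seg: Segment, regions: List[Region]) -> Optional[Region]:
    """Return the smallest region in the same file that fully contains seg.

    Assumption: the tightest enclosing region is the one of interest.
    Returns None when no region contains the segment.
    """
    candidates = [
        r for r in regions
        if r.path == seg.path and r.start <= seg.start and seg.end <= r.end
    ]
    return min(candidates, key=lambda r: r.end - r.start, default=None)


# Toy input: regions extracted from two made-up files.
regions = [
    Region("fs/a.c", "function", 10, 60),
    Region("fs/a.c", "loop", 20, 30),
    Region("fs/b.c", "function", 5, 40),
]

# One clone pair: each side is mapped to its enclosing region.
pair = (Segment("fs/a.c", 22, 28), Segment("fs/b.c", 10, 16))
mapped = [enclosing_region(s, regions) for s in pair]
```

In this toy input the first segment sits inside both a function and a loop, so the sketch reports the loop as the tighter fit; a real tool might instead keep the whole chain of enclosing regions.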

The taxonomy

• Classifies clones according to attributes such as location and region type of a clone
• Hierarchical
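
A minimal sketch of how a two-level, hierarchical classification by location and then by region type might look. Only "Same Directory Clones" and "Function Clones" are category names that appear in these slides; "Same File Clones", "Different Directory Clones", "Unclassified", the function names, and the paths are placeholders assumed here for illustration.

```python
import posixpath
from typing import Tuple


def location_category(path_a: str, path_b: str) -> str:
    """First level of the hierarchy: where the two segments live."""
    if path_a == path_b:
        return "Same File Clones"            # placeholder label
    if posixpath.dirname(path_a) == posixpath.dirname(path_b):
        return "Same Directory Clones"
    return "Different Directory Clones"      # placeholder label


def region_category(kind_a: str, kind_b: str) -> str:
    """Second level: the region type shared by both segments, if any."""
    if kind_a == kind_b:
        return f"{kind_a.capitalize()} Clones"
    return "Unclassified"                    # placeholder label


def classify(path_a: str, kind_a: str, path_b: str, kind_b: str) -> Tuple[str, str]:
    """Return (location category, region category) for one clone pair."""
    return location_category(path_a, path_b), region_category(kind_a, kind_b)


# Two functions in different files of the same (made-up) directory.
example = classify("fs/ext2/a.c", "function", "fs/ext2/b.c", "function")
```

Returning the two levels as a tuple keeps the hierarchy explicit, so results can be grouped first by location and then by region type when navigating them.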

ADD A SLIDE HERE

• To discuss what you hoped your taxonomy would help you with
– Why did you pick that design?

• Give an example of how using this taxonomy could be helpful in a (simple, made-up) example case

Case studies

• PostgreSQL
– 543,387 LOC
– 1097 source files

• Linux kernel file-system subsystem
– 280,177 LOC
– 537 source files

Filtering and classification results

• 85–87% of clones could be classified using the taxonomy
• Fewer unclassified clones in the Same Directory Clones category
• A large percentage of false positives were removed by filtering structural and prototype regions.
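
One way region-specific filtering could work is to apply a stricter acceptance threshold to clone pairs found in regions that tend to produce false positives. The sketch below assumes this reading; the similarity score, the threshold values, and all names are invented for illustration and are not the criteria used in the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClassifiedPair:
    region_kind: str   # e.g. "function", "structural", "prototype"
    similarity: float  # hypothetical detector score between 0.0 and 1.0


# Invented thresholds: stricter for regions prone to false positives.
DEFAULT_THRESHOLD = 0.6
STRICT_THRESHOLDS: Dict[str, float] = {"structural": 0.9, "prototype": 0.9}


def keep(pair: ClassifiedPair) -> bool:
    """Accept a pair only if it meets the threshold for its region kind."""
    threshold = STRICT_THRESHOLDS.get(pair.region_kind, DEFAULT_THRESHOLD)
    return pair.similarity >= threshold


def filter_pairs(pairs: List[ClassifiedPair]) -> List[ClassifiedPair]:
    """Drop pairs that fail the region-specific threshold."""
    return [p for p in pairs if keep(p)]


pairs = [
    ClassifiedPair("function", 0.7),     # kept: passes the default threshold
    ClassifiedPair("prototype", 0.7),    # dropped: below the strict threshold
    ClassifiedPair("structural", 0.95),  # kept: passes the strict threshold
]
kept = filter_pairs(pairs)
```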

Overall cloning in the systems

• Function Clones dominate the Same Directory Clones.
• Most cloning occurs within the same directory.

Frequency of clone types

• Very few loop clones
• Relatively many conditional clones
• Function clones made up 38% of the clone pairs in the Linux fs and 53% of the clone pairs in PostgreSQL
• Is it possible to insert a table here with the results, even if it is partial (to show that the work is there and that there are numbers)?
• Or maybe a graph? Nice to have this to imply: here’s all the hard work we did, boy did we sweat, and there are so many results that the observations are probably meaningful
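
A per-category frequency table of the kind discussed here could be produced by tallying the region type assigned to each clone pair. The sketch below is only an illustration: the labels are invented toy data, not results from the case studies, and `category_shares` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def category_shares(labels: List[str]) -> Dict[str, float]:
    """Return the percentage of clone pairs that fall in each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Invented toy labels, one per clone pair.
toy_labels = ["function", "function", "conditional", "loop"]
shares = category_shares(toy_labels)

# Print a small table, most frequent category first.
for kind, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{kind:<12} {pct:5.1f}%")
```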

Cloning comprehension

• Classification of clones can improve comprehension
– User will have a working understanding of what a clone of a certain type means
– We believe navigation of the “clone space” will be greatly improved
– We now know more about cloning as it occurs in a software system
– Simple metrics are now available

Tool support

• Clone Interpretation and Classification System (CICS)
– Provides a GUI to navigate classified clones
– Will provide benchmarking support for clone detection tools
– Many features can be added to complement the sorting of clones in the taxonomy

CICS

Summary

• Management of clones is important for the healthy evolution of a software system
• We can make the process of managing clones more comprehensible, scalable, and accurate

Future work

• Deeper classification
• Benchmark suite
• IDE plugins
• Evolution of clones