Scalable Clone Detection and Elimination for Erlang Programs
Huiqing Li, Simon Thompson
University of Kent, Canterbury, UK
Overview
Erlang
Wrangler
Clone detection
Clone elimination
Case studies
Conclusions and future work
Erlang
• Dynamically typed functional programming language.
• Built-in support for concurrency, distribution and fault-tolerance.
• Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...
%% Factorial in Erlang.
-module(fac).
-export([fac/1]).
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
Wrangler
Basic refactorings: structural, macro, process and test-framework related
Clone detection + removal
Improve module structure
Clone Detection
• The Wrangler clone detector
– Report clone classes whose members are
identical or similar
– No false positives
– High recall rate
– Scalable.
What is ‘identical’ code?
X+4 and Y+5 are both instances of variable+number.
Identical if the values of literals and variables are ignored, but the binding structure is respected.
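The notion of ‘identical’ above can be sketched in Erlang. This is a minimal illustration and not Wrangler's implementation: the module name `clone_identical` and its functions are hypothetical, and it assumes each snippet is written on a single line so that the location annotations coincide. Literals lose their values, and variables are renamed by order of first occurrence, which is one simple way of respecting binding structure.

```erlang
-module(clone_identical).
-export([identical/2, normalise/1]).

%% Parse a string of Erlang expressions (ending in a full stop) into the
%% abstract format produced by the standard erl_scan/erl_parse modules.
parse(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% Two snippets count as identical here if their normal forms are equal.
identical(StrA, StrB) ->
    normalise(StrA) =:= normalise(StrB).

%% Normal form: literal values are dropped and variables are numbered by
%% first occurrence, so X+X and X+Y get different normal forms.
normalise(Str) ->
    {Norm, _Env} = norm(parse(Str), #{}),
    Norm.

norm({var, _, Name}, Env) ->
    case maps:find(Name, Env) of
        {ok, Idx} ->
            {{var, Idx}, Env};
        error ->
            Idx = maps:size(Env) + 1,
            {{var, Idx}, Env#{Name => Idx}}
    end;
norm({Lit, _, _}, Env) when Lit =:= integer; Lit =:= float;
                            Lit =:= string; Lit =:= char ->
    {{literal, Lit}, Env};
norm(T, Env) when is_tuple(T) ->
    {Elems, Env1} = norm(tuple_to_list(T), Env),
    {list_to_tuple(Elems), Env1};
norm([H | T], Env0) ->
    {H1, Env1} = norm(H, Env0),
    {T1, Env2} = norm(T, Env1),
    {[H1 | T1], Env2};
norm(Other, Env) ->
    {Other, Env}.
```

With this sketch, `clone_identical:identical("X+4.", "Y+5.")` holds, while `"X+X."` and `"X+Y."` are told apart because their binding structure differs.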
What is ‘similar’ code?
(X+3)+4 and 4+(5-(3*X)) have the anti-unifier X+Y.
The anti-unification gives the (most specific) common generalisation.
Similarity = min( ||X+Y|| / ||(X+3)+4||, ||X+Y|| / ||4+(5-(3*X))|| ), where ||E|| denotes the size of expression E.
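Anti-unification and the similarity score can be sketched as follows. This is an illustration under stated assumptions rather than Wrangler's code: the module `anti_unify` is hypothetical, only binary operators are descended into, the generated variable names (`NewVar_1`, ...) are invented, and the size of an expression is assumed to be its number of AST nodes.

```erlang
-module(anti_unify).
-export([parse/1, au/2, node_count/1, similarity/2]).

%% Parse a single expression such as "(X+3)+4." into abstract format.
parse(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Anti-unifier (common generalisation) of two expressions.
au(E1, E2) ->
    {AU, _Subst} = au(E1, E2, #{}),
    AU.

%% Equal subterms are kept as they are.
au(E, E, Subst) ->
    {E, Subst};
%% Same binary operator on both sides: generalise the operands.
au({op, Anno, Op, L1, R1}, {op, _, Op, L2, R2}, Subst0) ->
    {L, Subst1} = au(L1, L2, Subst0),
    {R, Subst2} = au(R1, R2, Subst1),
    {{op, Anno, Op, L, R}, Subst2};
%% Otherwise replace the differing pair by a variable, reusing the same
%% variable for a repeated pair so the result stays as specific as possible.
au(E1, E2, Subst) ->
    case maps:find({E1, E2}, Subst) of
        {ok, Var} ->
            {Var, Subst};
        error ->
            N = maps:size(Subst) + 1,
            Var = {var, 0, list_to_atom("NewVar_" ++ integer_to_list(N))},
            {Var, Subst#{ {E1, E2} => Var }}
    end.

%% Assumed size measure: number of nodes in the expression tree.
node_count({op, _, _, L, R}) -> 1 + node_count(L) + node_count(R);
node_count(_) -> 1.

%% Similarity = min(|AU| / |E1|, |AU| / |E2|).
similarity(E1, E2) ->
    N = node_count(au(E1, E2)),
    min(N / node_count(E1), N / node_count(E2)).
```

For the two expressions on the slide, this sketch returns a sum of two fresh variables, the counterpart of X+Y. Under the node-count assumption the sizes are 3, 5 and 7, so the score is min(3/5, 3/7) = 3/7.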
Clone Detection
• All clones in a project meeting the threshold
parameters.
• Thresholds:
– minimum number of expressions,
– minimum number of tokens,
– minimum number of duplications,
– maximum number of new parameters, and
– minimum similarity score.
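The five thresholds can be pictured as a filter over candidate clone classes. The sketch below is hypothetical: the module `clone_thresholds`, the map keys, and the reading of "duplications" as the number of class members are all assumptions, not Wrangler's API. The example values are the first set quoted in these slides (1, 40, 2, 4, 0.8), assumed to follow the order of the list above.

```erlang
-module(clone_thresholds).
-export([example/0, meets/2]).

%% Example threshold set (assumed ordering: expressions, tokens,
%% duplications, new parameters, similarity).
example() ->
    #{min_exprs => 1, min_tokens => 40, min_dups => 2,
      max_new_params => 4, min_similarity => 0.8}.

%% True if a clone class, described by a map of measurements,
%% satisfies every threshold.
meets(#{exprs := E, tokens := T, dups := D,
        new_params := P, similarity := S},
      #{min_exprs := ME, min_tokens := MT, min_dups := MD,
        max_new_params := MP, min_similarity := MS}) ->
    E >= ME andalso T >= MT andalso D >= MD
        andalso P =< MP andalso S >= MS.
```

A class with 3 expressions, 45 tokens, 2 members, 1 new parameter and similarity 0.9 passes `meets/2` with these values. The same class with similarity 0.7 does not.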
Clone result with threshold values: 1, 40, 2, 4, 0.8:
Clone result with threshold values: 3, 20, 2, 2, 0.8:
Implementation
• Clone detection in an incremental way.
– Initial clone detection.
– Incremental clone detection.
• AST-based two-phase clone detection.
The Initial Detection Algorithm
Source Erlang programs → Serialised AAST → Hashed expression sequences → Initial clone candidates → Final clones
• Parse program, annotate and serialise AST
– Bypasses the Erlang pre-processor;
– Location information included in AST;
– Static semantic information added to AST;
– AAST traversed, and expression sequences collected.
• Generalise and hash expression
– Capture structural similarity between expressions while keeping a structural skeleton of the original;
– Replace certain subtrees with a placeholder, but only if sensible to do so;
– Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
• Clone detection using generalised suffix tree
• Examination of clone candidates using anti-unification
– Check each candidate clone class using anti-unification; this returns zero, one or more clone classes;
– Generation of anti_unifier function;
– Generation of application instances.
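The generalise-and-hash step, followed by a search for repeated runs of hashes, can be sketched as below. This is a simplified stand-in and not the tool's algorithm: the module `hash_seq` is hypothetical, only variables and simple literals are replaced by a placeholder, and a naive index of fixed-length windows is used in place of the generalised suffix tree named on the slide.

```erlang
-module(hash_seq).
-export([hash_sequence/1, generalise/1, repeated_runs/2]).

%% Map a comma-separated expression sequence (ending in a full stop)
%% to a sequence of integers, one per expression.
hash_sequence(Str) ->
    {ok, Tokens, _End} = erl_scan:string(Str),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    [erlang:phash2(generalise(strip(E))) || E <- Exprs].

%% Drop location annotations so that position does not affect the hash.
strip(Expr) ->
    erl_parse:map_anno(fun(_Anno) -> erl_anno:new(0) end, Expr).

%% Keep the structural skeleton; replace variables and literals.
generalise({var, _, _}) -> '$placeholder';
generalise({Lit, _, _}) when Lit =:= integer; Lit =:= float;
                             Lit =:= string; Lit =:= char ->
    '$placeholder';
generalise(T) when is_tuple(T) ->
    list_to_tuple([generalise(E) || E <- tuple_to_list(T)]);
generalise(L) when is_list(L) ->
    [generalise(E) || E <- L];
generalise(Other) ->
    Other.

%% Naive candidate search: every window of N hashes seen at two or more
%% locations. HashSeqs is a list of {Id, [Hash]} pairs.
repeated_runs(HashSeqs, N) ->
    Windows = [{lists:sublist(Hs, Pos, N), {Id, Pos}}
               || {Id, Hs} <- HashSeqs,
                  length(Hs) >= N,
                  Pos <- lists:seq(1, length(Hs) - N + 1)],
    Index = lists:foldl(
              fun({W, Loc}, Acc) ->
                      maps:update_with(W, fun(Ls) -> [Loc | Ls] end, [Loc], Acc)
              end, #{}, Windows),
    [{W, lists:reverse(Locs)}
     || {W, Locs} <- maps:to_list(Index), length(Locs) >= 2].
```

For example, the sequences `"A = f(X), B = A + 1, g(B)."` and `"C = f(Y), D = C + 2, h(D)."` share their first two hashes, so a window of length 2 is reported as a candidate. Such candidates would still need the anti-unification check described above.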
• Designed with incremental clone detection in
mind.
– Use relative locations, every function starts from
location {1, 1};
– Intermediate information cached: AAST, Static
semantic information, hash information, clone
table.
The Incremental Detection Algorithm
• Follow the same steps as the initial detection
algorithm, but reuse and incrementally update
the information cached from the previous run
of the clone detection.
• Take a function, instead of a file, as a unit to
track changes.
• Track the change of clones, mark each clone
class as new, unchanged, change+, change-,
or change+-.
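Tracking changes per function rather than per file can be sketched as a cache of per-function hashes. This is a hypothetical illustration, not the tool's cache format: the module `incr_cache` is invented, and location annotations are simply stripped, a simplification of the relative locations mentioned earlier. Only functions whose hash differs from the cached one would need to be reprocessed.

```erlang
-module(incr_cache).
-export([function_hashes/1, changed_functions/2]).

%% Map each function {Name, Arity} in a source string to a hash of its
%% location-free abstract form. Non-function forms are skipped.
function_hashes(Source) ->
    {ok, Tokens, _End} = erl_scan:string(Source),
    maps:from_list(
      [{{Name, Arity}, erlang:phash2(strip(Form))}
       || Ts <- split_forms(Tokens, [], []),
          {ok, {function, _, Name, Arity, _} = Form}
              <- [erl_parse:parse_form(Ts)]]).

%% Remove location information so that moving a function does not
%% count as a change.
strip(Form) ->
    erl_parse:map_anno(fun(_Anno) -> erl_anno:new(0) end, Form).

%% Split a token list into one token list per form, at each full stop.
split_forms([], _Incomplete, Acc) ->
    lists:reverse(Acc);
split_forms([{dot, _} = Dot | Rest], Cur, Acc) ->
    split_forms(Rest, [], [lists:reverse([Dot | Cur]) | Acc]);
split_forms([Tok | Rest], Cur, Acc) ->
    split_forms(Rest, [Tok | Cur], Acc).

%% Compare cached hashes (Old) with freshly computed ones (New).
changed_functions(Old, New) ->
    Changed = [K || {K, H} <- maps:to_list(New),
                    maps:get(K, Old, undefined) =/= H],
    Removed = [K || K <- maps:keys(Old), not maps:is_key(K, New)],
    #{changed => lists:sort(Changed), removed => lists:sort(Removed)}.
```

With this sketch, a function that only moves to a different line keeps its hash, while an edited or newly added function appears under `changed`.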
Clone Elimination
• Fully automatic clone elimination is not desirable in practice.
– The clones to remove need to be chosen.
– The functionality of the clone needs to be examined.
– The anti-unification function of a clone class, and its parameters, need to be renamed.
– A host module for the anti-unification function needs to be selected.
Clone Elimination with Wrangler
• Copy and paste the anti_unification function into an appropriate Erlang module.
• Modify the anti_unification function if necessary.
• Rename the function.
• Rename variables.
• Re-order function parameters.
• Apply ‘fold expressions against a function definition’ to the new function.
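The effect of these steps can be illustrated on a small, invented example. The module `clone_demo` and all function names below are hypothetical. `report/2` plays the role of the renamed anti-unification function, and the `_folded` variants show what the clone instances might look like after folding against it.

```erlang
-module(clone_demo).
-export([area_report/2, volume_report/3,
         area_report_folded/2, volume_report_folded/3]).

%% Before: two clone instances that differ in one sub-expression and one atom.
area_report(W, H) ->
    Value = W * H,
    Rounded = round(Value * 100) / 100,
    {area, Rounded}.

volume_report(W, H, D) ->
    Value = W * H * D,
    Rounded = round(Value * 100) / 100,
    {volume, Rounded}.

%% The generalised function, after renaming it and its parameters.
report(Tag, Value) ->
    Rounded = round(Value * 100) / 100,
    {Tag, Rounded}.

%% After: each clone instance is folded into a call to report/2.
area_report_folded(W, H) ->
    report(area, W * H).

volume_report_folded(W, H, D) ->
    report(volume, W * H * D).
```

The folded versions return the same results as the originals, which is the property a user would want to check before accepting the refactoring.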
Case Study 1
Incremental vs. Standalone Clone Detection
Case Study 2
SIP case study
Session Initiation Protocol
SIP message processing allows rewriting rules to transform messages.
SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.
Clone detection
Reducing the case study
Step  LOC     Step  LOC     Step  LOC
1     2658    6     2218    11    2131
2     2342    7     2203    12    2097
3     2231    8     2201    13    2042
4     2217    9     2183    …     …
5     2216    10    2149
Case Study 3
Conclusions
• Efficient clone detection on medium-sized projects.
• Possible to improve code using these techniques, but only with expert involvement.
• A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.
Future Work
• To extend the tool to detect expression sequences which are similar up to the insertion or deletion of some expressions.
• To check client code against libraries.
http://www.cs.kent.ac.uk/projects/wrangler/
Thank you!