Techniques for Detecting and Preventing Copy-and-Paste Errors during Software Development
A Dissertation Proposal
By Patricia Jablonski
Engineering Science, Clarkson University
September 5, 2007
Outline
Copying and pasting code
Modifying copy-and-pasted code
Our proposed solution (CnP)
Our proof of concept (CReN)
Demo of CReN
Related Eclipse features
Evaluation plan
Proposed plan
Copying and Pasting Code
A common form of software reuse
  Reuse copied code as a template
Why copy and paste code?
  Duplicate code exactly
  Defer creating an abstraction
  Experiment and test
Results in code clones
  Multiple similar code fragments
What happens when code needs modification?
Modifying Copy-and-Pasted Code (1 of 2)
Expensive software maintenance
  Original copied code could be erroneous
  Changes need to be made to each instance
Solutions: clone detection and removal, clone tracking tools
  Linked editing and simultaneous editing
    Clones are selected and linked together so that a modification made in one clone is applied simultaneously to every clone linked to it
Modifying Copy-and-Pasted Code (2 of 2)
Manual modifications can result in undetected errors and unintended inconsistencies
Solution: error detection tools
  CP-Miner tool
    Uses identifier mapping, “forget-to-change” vs. “change”, and unchanged ratio
  DECKARD-based tool
    Uses a count of unique identifiers
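As a rough illustration of the unchanged-ratio idea, the sketch below pairs identifiers position by position between an original fragment and its edited copy, then computes, for each original identifier, the fraction of its occurrences that kept the same name. A ratio strictly between 0 and 1 suggests a rename that was started but not finished. The function names, the regex tokenization, and the small keyword list are assumptions made for illustration; this is not CP-Miner's actual implementation.

```python
import re
from collections import defaultdict

# Assumption: a tiny keyword list is enough for this illustration.
KEYWORDS = {"for", "if", "else", "while", "return", "int"}
IDENT = re.compile(r"[A-Za-z_]\w*")

def identifiers(code):
    """Return identifiers in order of appearance, skipping keywords."""
    return [t for t in IDENT.findall(code) if t not in KEYWORDS]

def unchanged_ratios(original, pasted):
    """Map each original identifier to the fraction of its occurrences
    that still carry the same name in the pasted (edited) fragment."""
    orig_ids, new_ids = identifiers(original), identifiers(pasted)
    if len(orig_ids) != len(new_ids):
        raise ValueError("fragments no longer align position by position")
    total, unchanged = defaultdict(int), defaultdict(int)
    for old, new in zip(orig_ids, new_ids):
        total[old] += 1
        if old == new:
            unchanged[old] += 1
    return {name: unchanged[name] / total[name] for name in total}

def suspicious(ratios):
    """Identifiers that were renamed in some places but not in others."""
    return sorted(name for name, r in ratios.items() if 0 < r < 1)

original = "for (i = 0; i < n; i++) total += a[i];"
pasted = "for (j = 0; j < n; j++) total += b[i];"  # the last i was forgotten
ratios = unchanged_ratios(original, pasted)
print(suspicious(ratios))
```

Here `i` was renamed to `j` in three of its four occurrences, so it is the only identifier reported; `n` and `total` (never renamed) and `a` (always renamed) are treated as consistent.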
What about proactive error prevention?
Our Proposed Solution (CnP)
Provide automated tool support in the IDE
  Eclipse, Java
Improve software quality during development
What are the main features of the CnP tool?
  Tracks & highlights copy-pasted statements
  Detects inconsistencies based on inferences of the programmer’s intention
  Inconsistencies are based on inferred rules
What is the current status of CnP?
Our Proof of Concept (CReN): Design and Implementation (1 of 5)
Consistent renaming usage pattern
  Identifier (for example, variable name) renaming within a copy-and-paste clone
  Manual renaming can result in inconsistencies
What are the main features of the CReN tool?
  Tracks & highlights copy-pasted statements
  Automatically renames all instances of an identifier in a group when any one instance in the group is modified
  The inferred rules can be refined by the programmer
Our Proof of Concept (CReN): Design and Implementation (2 of 5)
Tracking copy-and-paste clones
  No clone detection tool or manual selection needed
  Clone region: Java file name + clone’s range
Obtaining ASTs from clone locations
  Abstract syntax tree (AST) API in Eclipse
  AST captures the source code characters & their absolute position in the source code
  Each ASTNode has starting/ending positions denoting character positions within the node
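A minimal sketch of how a clone region (file name plus character range) could be kept current while the file is being edited. The class `CloneRegion` and its `apply_edit` method are hypothetical and written in Python for brevity; CReN itself is an Eclipse plug-in for Java, and its internal data structures are not shown in these slides. The start-plus-length form mirrors what Eclipse's `ASTNode` exposes through `getStartPosition()` and `getLength()`.

```python
from dataclasses import dataclass

@dataclass
class CloneRegion:
    """A tracked clone: the file it lives in plus its character range."""
    file_name: str
    start: int    # offset of the first character of the clone
    length: int   # number of characters in the clone

    @property
    def end(self):
        return self.start + self.length  # exclusive end offset

    def apply_edit(self, offset, removed, inserted):
        """Update the range after `removed` characters at `offset`
        were replaced by `inserted` characters."""
        delta = inserted - removed
        edit_end = offset + removed
        if edit_end <= self.start:
            self.start += delta      # edit lies before the clone: shift it
        elif offset >= self.end:
            pass                     # edit lies after the clone: no effect
        elif offset >= self.start and edit_end <= self.end:
            self.length += delta     # edit lies inside the clone: resize it
        else:
            raise ValueError("edit straddles the clone boundary")

region = CloneRegion("Foo.java", start=100, length=40)
region.apply_edit(offset=10, removed=0, inserted=5)   # typing above the clone
region.apply_edit(offset=110, removed=1, inserted=3)  # renaming inside it
region.apply_edit(offset=500, removed=2, inserted=0)  # deleting below it
print(region.start, region.length)
```

Edits that cross a clone boundary are rejected here; a real tool would need a policy for that case, such as dropping the clone from tracking.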
Our Proof of Concept (CReN): Design and Implementation (3 of 5)
Matching identifiers between clones
  Determine relationships of identifiers between copy-and-pasted code fragments
  Identifiers in the copied code are matched with their corresponding identifiers in the pasted code
  When the code has just been pasted, its contents are identical to the copied fragment, only at a different location
  Rules are inferred across all clones
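Because a freshly pasted fragment is identical to the copied one, the matching step can be sketched as pairing the i-th identifier of the copied range with the i-th identifier of the pasted range. The function names and the regex tokenizer below are assumptions for illustration (the regex also matches keywords, which a real implementation working on an AST would not treat as identifiers).

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def identifier_spans(source, start, length):
    """Return (name, absolute offset) for each identifier in the range."""
    fragment = source[start:start + length]
    return [(m.group(), start + m.start()) for m in IDENT.finditer(fragment)]

def match_identifiers(source, copied, pasted):
    """Pair identifier offsets in the copied range with the offsets of
    their counterparts in the pasted range. Ranges are (start, length)."""
    a = identifier_spans(source, *copied)
    b = identifier_spans(source, *pasted)
    if [name for name, _ in a] != [name for name, _ in b]:
        raise ValueError("pasted range is no longer identical to the copy")
    return [(off_a, off_b) for (_, off_a), (_, off_b) in zip(a, b)]

stmt = "x = a + b;"
source = "x = a + b;\ny = 0;\nx = a + b;\n"
copied = (source.index(stmt), len(stmt))   # first occurrence
pasted = (source.rindex(stmt), len(stmt))  # the pasted copy
pairs = match_identifiers(source, copied, pasted)
print(pairs)
```

Each pair links one identifier occurrence in the copied code to the corresponding occurrence in the pasted code, which is the correspondence the later rules rely on.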
Our Proof of Concept (CReN): Design and Implementation (4 of 5)
Partitioning identifiers into groups
  Determine relationships of identifiers within copy-and-pasted code fragments
  Identifiers in the copied and pasted code are partitioned into groups and mapped to each other
  Defines the group of identifiers that are to be renamed together
  Want a group of identifiers that resolve to the same variable; use binding, if available
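The partitioning step can be sketched as grouping identifier occurrences by the program entity they resolve to, falling back to the plain name when no binding is available. The `partition` function and the binding-key strings below are hypothetical stand-ins for whatever a binding resolver would return; they are not CReN's actual representation.

```python
from collections import defaultdict

def partition(occurrences):
    """Group identifier occurrences that should be renamed together.

    `occurrences` is a list of (name, offset, binding_key) tuples, where
    binding_key is None when the binding could not be resolved. Resolved
    occurrences are grouped by binding; the rest are grouped by name."""
    groups = defaultdict(list)
    for name, offset, binding_key in occurrences:
        key = binding_key if binding_key is not None else ("name", name)
        groups[key].append(offset)
    return dict(groups)

occurrences = [
    ("i", 5, "local:i"),
    ("i", 12, "local:i"),
    ("count", 20, "field:count"),   # a field named count
    ("count", 31, "local:count"),   # a local variable that shadows it
    ("tmp", 40, None),              # binding unavailable
    ("tmp", 47, None),
]
groups = partition(occurrences)
print(groups)
```

With bindings, the two `count` occurrences land in different groups even though they share a name; without bindings, the two `tmp` occurrences can only be grouped by their spelling.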
Our Proof of Concept (CReN): Design and Implementation (5 of 5)
Refining the inferred rules
  When the code is initially pasted, the inferred rule assumes that all identifiers that would resolve to the same program entity should be renamed consistently
  Programmer can choose to exclude the currently renamed identifier from the group (this instance is deleted from the vector)
  The updated rule is inferred across all clones
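A minimal sketch of consistent renaming with rule refinement: a group holds the offsets of the identifier occurrences to be renamed together, and excluding an occurrence removes its offset from the group so that later renames leave it alone. The `RenameGroup` class is hypothetical and operates on a plain string rather than on an editor buffer.

```python
import re

class RenameGroup:
    """Offsets of identifier occurrences that are renamed together."""

    def __init__(self, name, offsets):
        self.name = name
        self.offsets = sorted(offsets)

    def exclude(self, offset):
        """Refine the rule: drop one occurrence from the group."""
        self.offsets.remove(offset)

    def rename(self, source, new_name):
        """Return `source` with every group member renamed, and update
        the stored offsets to their positions in the new text."""
        delta = len(new_name) - len(self.name)
        parts, last, new_offsets = [], 0, []
        for k, off in enumerate(self.offsets):
            parts.append(source[last:off])
            parts.append(new_name)
            new_offsets.append(off + k * delta)  # earlier renames shift later ones
            last = off + len(self.name)
        parts.append(source[last:])
        self.name, self.offsets = new_name, new_offsets
        return "".join(parts)

source = "sum = sum + a[i]; i = i + 1;"
group = RenameGroup("i", [m.start() for m in re.finditer(r"\bi\b", source)])
renamed = group.rename(source, "idx")
print(renamed)

group.exclude(group.offsets[-1])       # leave the last occurrence alone
refined = group.rename(renamed, "j")
print(refined)
```

The first rename changes all three occurrences of `i`; after the exclusion, the second rename changes only the two occurrences that remain in the group.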
Let’s see if CReN can detect/prevent errors...
Our Proof of Concept (CReN): Usage and Demonstration
Three examples from the literature show inconsistent renaming of identifiers within a copy-and-pasted clone in production code:
Z. Li, S. Lu, S. Myagmar, and Y. Zhou, “CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code”, USENIX-ACM SIGOPS Symposium on Operating Systems Design and Implementation (OSDI), 2004.
B. Liblit, A. Aiken, A.X. Zheng, and M.I. Jordan, “Bug Isolation via Remote Program Sampling”, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2003.
L. Jiang, Z. Su, and E. Chiu, “Context-Based Detection of Clone-Related Bugs”, European Software Engineering Conference (ESEC) and ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), 2007.
Demo of CReN
Demonstrate how CReN would catch each identifier renaming error in the examples as if they were currently being written
(Some) CReN future work
  Consistent renaming of any kind of identifier
  Allow “undo” of taking identifier out of group
  Consistent renaming in a user-defined scope
  Apply renaming across all related clones
How are other Eclipse features related to CReN?
Related Eclipse Features
Find & Replace
  Text-based search, manually started
  Not limited to within a code fragment
Rename Refactoring
  Automatically applies to the whole project
  Binding is important for it to work
Linked Renaming
  Like Rename Refactoring, but applies to a single file
What are the next steps in our research?
Evaluation Plan
We tested CReN with the three examples
We plan to perform controlled experiments
  Give a homework assignment to students
  Require them to use Eclipse & CnP plug-in
  Have them write a suitable application
We plan to evaluate in terms of:
  Usefulness, usability (user error), user experience, accuracy (false negatives & false positives), performance
What is our plan after CReN is fully evaluated?
Proposed Plan
Determine usage patterns by using clone detection tools
What other kinds of errors could CnP handle?
  Lexical/naming pattern inconsistencies
    Substring is the same on both sides of =
    Naming pairs like left/right, top/bottom
  Type inconsistencies
    Inferences can be made about types at the same positions across clones
Improve the management and visualization of clones
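One way the naming-pattern check might look, as a sketch only: if the copied assignment uses the same naming-pair word on both sides of `=`, the pasted assignment is flagged when its two sides no longer agree. The pair list, the function names, and the substring matching are all assumptions for illustration; the proposal does not specify how such rules would be inferred.

```python
# Assumption: a fixed list of naming pairs; a real tool might learn these.
PAIRS = [("left", "right"), ("top", "bottom")]

def pair_words(text):
    """Return the naming-pair words that occur in `text` (case-insensitive)."""
    text = text.lower()
    return {word for pair in PAIRS for word in pair if word in text}

def split_assignment(stmt):
    """Split a simple `lhs = rhs;` statement at its first `=`."""
    lhs, rhs = stmt.rstrip("; ").split("=", 1)
    return lhs, rhs

def inconsistent_paste(original, pasted):
    """Flag `pasted` if `original` used the same pair word on both sides
    of `=` but the two sides of `pasted` no longer agree."""
    o_lhs, o_rhs = map(pair_words, split_assignment(original))
    p_lhs, p_rhs = map(pair_words, split_assignment(pasted))
    rule_holds_in_original = bool(o_lhs) and o_lhs == o_rhs
    return rule_holds_in_original and p_lhs != p_rhs

print(inconsistent_paste("leftEdge = box.left;", "rightEdge = box.left;"))
print(inconsistent_paste("leftEdge = box.left;", "rightEdge = box.right;"))
print(inconsistent_paste("right = left + width;", "bottom = top + height;"))
```

Tying the check to the copied statement keeps legitimate code such as `right = left + width;` from being reported, since no same-word rule holds in the original there.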
Conclusion
Copy-and-paste will remain a common programming practice, which can result in undetected errors
Error detection and prevention should happen during software development, not only “after-the-fact”
So far, we have implemented one of three parts of the proposed CnP tool, called CReN
  Automatic tracking of copy-and-paste clones
  Consistent renaming of identifiers within copy-and-paste clones
Questions / Comments
Extra Slides (CReN Demo Screen Shots)