revision control system using delta script of syntax tree
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Revision Control System Using Delta Script of Syntax Tree
Yasuhiro Hayase
Makoto Matsushita
Katsuro Inoue
Graduate School of Information Science and Technology,
Osaka University, Japan
Contents
Revision Control System
Problem on Merging the Source Codes
Research Goal
Merging the Trees
  Step 1. Converting the Source Code into a Tree
  Step 2. Computing Delta of the Trees
  Step 3. Merging
Implementation of the System
Experiments
Conclusion and Future Work
Open Source Software Development
Open-source development is attracting increasing attention.
Developers use the following tools:
Revision Control System
  Stores the history of the source code and the documents throughout the development process.
  Examples: CVS, Subversion, …
Mailing List
  Developers and users hold discussions on mailing lists.
Bug-Tracking System
Merging on Parallel Development
Figure: Developers A and B each check out file X from the repository. Developer A edits it into X1 and checks it in. Developer B edits the same original into X2.
The modification of Developer A will be lost if X2 is checked in as it is.
Developer B therefore checks out the newest version (= X1), and the revision control system merges X1 and X2 into X3, which is then checked in.
Problems
The existing revision control systems used in open-source development merge files line by line.
Line-by-line merging sometimes generates inaccurate output when applied to source code:
1. It detects false conflicts when the same line is changed by both developers.
2. It overlooks real conflicts when the changes occur in different lines.
If the system fails to merge the two files, the developers have to fix the result themselves.
Problem 1. False Conflict
Developers A and B are editing working copies of the same file concurrently.
If both developers change the same line, the revision control system detects a conflict.
However, changes to the same line do not always conflict; they can be compatible.
Original: int refs;
Developer A: int refs; /* reference count */
Developer B: int refs=0;
Desired result: int refs=0; /* reference count */
Line-by-line merging fails on this example.
Problem 2. Overlooking Conflict
Developers A and B are editing working copies of the same file concurrently.
If the developers do not change the same line, the revision control system does not detect a conflict.
However, changes to different lines may still conflict.
Original:
  int num, sum, avg;
Developer A:
  int num, sum;
Developer B:
  int num, sum, avg;
  :
  avg = sum/num;
Merging output (illegal):
  int num, sum;
  :
  avg = sum/num;
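The two problems can be reproduced with a deliberately naive line-by-line three-way merge. The sketch below is an illustration only: the function name `merge_lines` and the restriction to files with the same number of lines are assumptions made here, and it is not the merge algorithm of CVS or Subversion.

```python
def merge_lines(base, a, b):
    """Naive three-way merge of files with the same number of lines.

    Returns (merged_lines, conflicts), where conflicts lists the indices of
    lines that both developers changed in different ways.
    """
    if not (len(base) == len(a) == len(b)):
        raise ValueError("this sketch only handles files of equal length")
    merged, conflicts = [], []
    for i, (o, x, y) in enumerate(zip(base, a, b)):
        if x == y:        # both sides agree (or neither changed the line)
            merged.append(x)
        elif x == o:      # only Developer B changed the line
            merged.append(y)
        elif y == o:      # only Developer A changed the line
            merged.append(x)
        else:             # both changed the same line differently
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


# Problem 1: compatible edits to the same line are reported as a conflict.
false_conflict = merge_lines(
    ["int refs;"],
    ["int refs; /* reference count */"],
    ["int refs=0;"],
)

# Problem 2: edits to different lines merge silently, although the result
# uses the variable avg after its declaration was removed.
overlooked = merge_lines(
    ["int num, sum, avg;", ""],
    ["int num, sum;", ""],
    ["int num, sum, avg;", "avg = sum/num;"],
)
```

In the first call the merge reports line 0 as a conflict; in the second it reports none and returns the declaration without avg followed by the assignment to avg.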
Our Research Goal
Build an intelligent merging system and reduce the load on the developers.
Avoid false conflicts on merging.
  Finer-grained merging.
Reduce problems caused by merging.
  Check that each use of a variable corresponds to its declaration.
Allow the developers to keep their working habits.
  The developers can use an arbitrary editor to edit source code.
  The usability of the new system should be similar to that of the existing systems.
Merging Source Codes Recognizing Tree Structure
Difference Computation and Merging of Tree Structure
Step 1. Analyze the source code and convert it into trees.
Step 2. Compute the delta of the trees.
Step 3. Apply the delta to the target tree.
Figure: a delta is computed from the source code at the origin of the delta computation to the source code at its destination, and is then applied to the target source code.
Step 1. Source Code Conversion
The source code is parsed and an augmented parse tree is built.
  The tree includes white-space and comment nodes.
  Each node has a string value.
A unique ID is assigned to each node: the current tree is compared with the previous version of the tree stored in the repository.
  If a corresponding node exists, the same ID is assigned.
  Otherwise, a new unique ID is assigned.
Each node corresponding to the use of a variable is linked to the node corresponding to the declaration of that variable.
Figure: augmented parse tree of the source code { int i; i;}
1 Block
  2 {
  3 <WS>
  4 Declare
    5 int
    6 <WS>
    7 i
    8 ;
  9 <WS>
  10 Statement
    11 i
    12 ;
  13 <WS>
  14 }
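The tree for { int i; i;} can be sketched as a small data structure. This is a minimal illustration, not the system's actual representation: the class `Node`, its field names, and the concrete white-space strings are assumptions, and a node without children is simply treated as a leaf token.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of the augmented parse tree (field names are illustrative)."""
    node_id: int
    label: str                             # string value of the node
    children: List["Node"] = field(default_factory=list)
    declaration: Optional["Node"] = None   # link from a variable use to its declaration

    def text(self) -> str:
        """Rebuild the source text by concatenating the leaf strings."""
        if not self.children:
            return self.label
        return "".join(child.text() for child in self.children)


def ws(node_id: int, value: str) -> Node:
    """White space is kept as an ordinary leaf so the layout survives."""
    return Node(node_id, value)


decl_i = Node(7, "i")
use_i = Node(11, "i", declaration=decl_i)  # use of i linked to its declaration

block = Node(1, "Block", [
    Node(2, "{"),
    ws(3, " "),
    Node(4, "Declare", [Node(5, "int"), ws(6, " "), decl_i, Node(8, ";")]),
    ws(9, " "),
    Node(10, "Statement", [use_i, Node(12, ";")]),
    ws(13, ""),                            # assumed empty in this example
    Node(14, "}"),
])

source = block.text()
```

Because white space and punctuation are ordinary leaves, the original text can be recovered from the tree, which is what lets developers keep editing plain source files.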
Step 2. Delta Computation
The delta of two trees is computed.
Editing Operation
  Insertion of a node: insert(NewID, String, ParentID, Index)
  Deletion of a leaf node: delete(ID)
  Updating of a node's string: update(ID, NewString)
  Moving a sub-tree: move(ID, ParentID, Index)
Editing Script
  A sequence of editing operations.
  Represents all the operations needed to transform a tree A into a tree B.
Figure: one example of each operation on a tree whose root is 1 Block, with children 2 Declare (3 int, 4 i, 5 ;) and 6 Statement (7 i, 9 ;): insert(10, Declare, 1, 0), delete(8) of a <WS> node, update(3, long), and move(2, 1, 0).
Editing Script
The differences between tree A and tree B are expressed by the editing script.
When determining the editing script, we must take care not to include unnecessary operations.
  Assign a cost to each editing operation.
  Define the cost of the editing script as the sum of the costs of its editing operations.
  Minimize the cost of the editing script.
An extended version of the existing approximate algorithm FMES* is used to compute the delta between the trees.
* S. S. Chawathe, A. Rajaraman, H. Garcia-Molina, and J. Widom. Change detection in hierarchically structured information. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 493–504, 1996.
Delta Computation Algorithm
Cost of the editing operations
  insert = delete = move = 1
  update = from 0 to 2, depending on the node's string before and after the update operation:
    2 * (1 - 2 * length(LCS(before, after)) / (length(before) + length(after)))
Algorithm
  Determine the pairs of matching nodes.
    Leaf nodes: string similarity.
    Inner nodes except for identifier nodes: match ratio of leaf nodes.
    Identifier nodes: exactly the same string, or matching of the descendant nodes.
  Build the editing script.
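The update cost can be written out directly from the formula above. The sketch assumes that LCS means the longest common subsequence of characters and that two empty strings cost 0; the function names are chosen here for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[len(b)]


def update_cost(before: str, after: str) -> float:
    """Cost of update(ID, NewString): 0 for identical strings, 2 for unrelated ones."""
    total = len(before) + len(after)
    if total == 0:
        return 0.0  # assumption: two empty strings are identical
    return 2 * (1 - 2 * lcs_length(before, after) / total)


same = update_cost("int", "int")        # nothing changes
partial = update_cost("int", "long")    # only the character n is shared
unrelated = update_cost("int", "xyz")   # nothing is shared
```

With these costs, an update that preserves nothing costs 2, the same as a delete followed by an insert, so the minimization has no reason to prefer an update between unrelated strings.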
Example: Delta Computation
Figure: tree A is 0 Block with children 1 doA (child 2 x) and 3 doB (child 4 y). Tree B is Block containing if, then, doA, and x nested in that order; its nodes have no IDs yet (shown as ?).
The computed editing script is:
  delete(4)
  delete(3)
  insert(5, if, 0, 1)
  insert(6, then, 5, 0)
  move(1, 6, 0)
Applying it to tree A step by step yields 0 Block containing 5 if, 6 then, 1 doA, and 2 x nested in that order.
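The editing script of this example can be replayed with a small interpreter for the four operations. This is a sketch under assumptions: the class `Tree`, its dictionary-based layout, and the reading of Index as the position among the new parent's children after the node has been detached are choices made here, not details given in the slides.

```python
class Tree:
    """Ordered tree addressed by node ID (a simplified stand-in for the parse tree)."""

    def __init__(self, root_id, root_label):
        self.label = {root_id: root_label}
        self.parent = {root_id: None}
        self.children = {root_id: []}

    def insert(self, new_id, string, parent_id, index):
        self.label[new_id] = string
        self.parent[new_id] = parent_id
        self.children[new_id] = []
        self.children[parent_id].insert(index, new_id)

    def delete(self, node_id):
        # The root cannot be deleted in this sketch.
        if self.children[node_id]:
            raise ValueError("delete is defined for leaf nodes only")
        self.children[self.parent[node_id]].remove(node_id)
        del self.label[node_id], self.parent[node_id], self.children[node_id]

    def update(self, node_id, new_string):
        self.label[node_id] = new_string

    def move(self, node_id, parent_id, index):
        self.children[self.parent[node_id]].remove(node_id)
        self.children[parent_id].insert(index, node_id)
        self.parent[node_id] = parent_id

    def apply(self, script):
        """Run a list of (operation name, arguments...) tuples in order."""
        for op, *args in script:
            getattr(self, op)(*args)

    def nested(self, node_id):
        """Return the sub-tree as (label, [children...]) for easy inspection."""
        return (self.label[node_id], [self.nested(c) for c in self.children[node_id]])


# Tree A of the example: Block(doA(x), doB(y)).
tree = Tree(0, "Block")
tree.insert(1, "doA", 0, 0)
tree.insert(2, "x", 1, 0)
tree.insert(3, "doB", 0, 1)
tree.insert(4, "y", 3, 0)

script = [
    ("delete", 4),
    ("delete", 3),
    ("insert", 5, "if", 0, 1),
    ("insert", 6, "then", 5, 0),
    ("move", 1, 6, 0),
]
tree.apply(script)
result = tree.nested(0)
```

Note that delete(4) must come before delete(3), because deletion is defined for leaf nodes only and node 3 is a leaf only after its child is gone.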
Step 3. Merging
The editing script for converting tree A to tree B is applied to tree C.
Problem: for some operations in the editing script, there may be no corresponding node in tree C.
If no node with a matching ID is present in tree C, a similar node is searched for. Similarity is based on:
  Matching of the parent node or sibling nodes
  Similar string
If a suitable node is found, the original ID in the editing script is replaced with the ID of the node found.
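The substitute search can be sketched as follows. Everything specific here is an assumption made for illustration: the scoring (half string similarity from Python's `difflib.SequenceMatcher`, half a bonus for having the same parent), the threshold, the function names, and the small trees, which are one reading of the merging example. Sibling matching is left out, and the sketch only rewrites IDs; it does not build alternative candidate trees.

```python
from difflib import SequenceMatcher


def find_substitute(node_id, tree_a, tree_c, threshold=0.5):
    """Return the ID of the node in tree_c most similar to node_id of tree_a.

    Trees are dicts mapping node ID -> (string, parent ID). Returns None
    when no candidate reaches the threshold.
    """
    if node_id in tree_c:
        return node_id                    # same ID exists: nothing to substitute
    string_a, parent_a = tree_a[node_id]
    best_id, best_score = None, threshold
    for cand_id, (string_c, parent_c) in tree_c.items():
        if cand_id in tree_a:
            continue                      # already corresponds to a node of tree A
        score = 0.5 * SequenceMatcher(None, string_a, string_c).ratio()
        if parent_c == parent_a:
            score += 0.5                  # same parent node
        if score >= best_score:
            best_id, best_score = cand_id, score
    return best_id


def retarget(script, tree_a, tree_c):
    """Rewrite target IDs missing from tree_c; return (applicable, unresolved)."""
    applicable, unresolved = [], []
    for op, node_id, *rest in script:
        if op == "insert":                # the first argument is a new ID
            applicable.append((op, node_id, *rest))
            continue
        target = find_substitute(node_id, tree_a, tree_c)
        if target is None:
            unresolved.append((op, node_id, *rest))
        else:
            applicable.append((op, target, *rest))
    return applicable, unresolved


# Hypothetical trees: A has doB(y) under else, C has doC(z) there instead.
tree_a = {0: ("if", None), 1: ("then", 0), 4: ("else", 0),
          2: ("doA", 1), 3: ("x", 2), 5: ("doB", 4), 6: ("y", 5)}
tree_c = {0: ("if", None), 1: ("then", 0), 4: ("else", 0),
          2: ("doA", 1), 3: ("x", 2), 8: ("doC", 4), 9: ("z", 8)}

script = [("update", 6, "i"), ("move", 5, 1, 0), ("delete", 4)]
applicable, unresolved = retarget(script, tree_a, tree_c)
```

Under these assumptions node 8 (doC) replaces node 5 (doB) in the move operation, while no substitute is found for node 6, so the update is left unresolved.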
Example of Merging
Figure: the editing script from tree A to tree B is update(6, i), move(5, 1, 0), delete(4). It is applied to tree C, which contains 8 doC (with child 9 z) where tree A contains 5 doB (with child 6 y).
update(6, i): no node can be substituted.
move(5, 1, 0): node 8 is somewhat similar to node 5, so two trees are built (C1 and C2): one with the operation applied to node 8, that is move(8, 1, 0), and one without the operation applied.
delete(4): node 4 has a child node in tree C2, so two trees are built: one to which the operation is not applied, and one in which the sub-tree whose root is node 4 is deleted.
The developer selects one of the resulting candidate trees (D1, D2, D3).
System Implementation
The implementation of our system is based on the existing revision control system Subversion.
  Client-server system.
  The delta computation and the merge operations are performed on the client side.
The target programming language is Java.
The repository stores the augmented parse trees instead of the raw source files.
  The tree is stored in XML format.
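Storing the tree as XML and converting it back to source code might look like the sketch below, written with Python's standard `xml.etree.ElementTree`. The element name `node` and the attributes `id` and `kind` are invented for this illustration; the slides do not describe the actual XML schema.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert (node_id, string, children) into an XML element.

    Inner nodes keep their string in a 'kind' attribute; leaves keep it as
    element text, so that concatenating all text yields the source code.
    """
    node_id, string, children = node
    elem = ET.Element("node", {"id": str(node_id)})
    if children:
        elem.set("kind", string)
        for child in children:
            elem.append(to_xml(child))
    else:
        elem.text = string
    return elem


def to_source(elem):
    """Concatenate the leaf texts in document order to recover the source."""
    return "".join(elem.itertext())


tree = (1, "Block", [
    (2, "{", []),
    (3, " ", []),
    (4, "Declare", [(5, "int", []), (6, " ", []), (7, "i", []), (8, ";", [])]),
    (9, " ", []),
    (10, "Statement", [(11, "i", []), (12, ";", [])]),
    (14, "}", []),
])

xml_text = ET.tostring(to_xml(tree), encoding="unicode")
restored = to_source(ET.fromstring(xml_text))
```

Keeping the node IDs in the stored XML is what allows a later check-in to be matched against the previous version of the tree.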
System Overview
Figure: the developer works with a Subversion client, which communicates with a Subversion server holding the repository. The components shown are Delta Computation, Delta Application, Node Matching, XML Merging, and Mutual Conversion between XML and source code.
Check-in and Check-out of Source Code
Figure: dataflow on check-out and on check-in between the repository on the Subversion server, the Subversion client, and the developer. Data items: Original XML File, XML File with Node ID, XML File without Node ID, Source code, and Edited source code. Processing steps: Delta Application, Delta Computation, Node Matching, Mutual Conversion between XML and source code, and Edit by the developer.
Merging
Figure: dataflow on merging between the repository on the Subversion server, the Subversion client, and the developer. Data items: Original XML File, Edited source code, XML File without Node ID, XML File with Node ID, The Newest Version of XML File, Delta, sorted XML files as merging result, and sorted source codes as merging result. Processing steps: Node Matching, Delta Computation, Delta Application, and Mutual Conversion between XML and source code. The sorted merging results are offered to the developer.
Experiment 1
Checking the proper functionality of the system with a trivial test case.
A small source file was written (Original).
From Original, three variants were derived:
  Variant 1: the variable avg was deleted.
  Variant 2: a method accessing the variable avg was added.
  Variant 3: the variable avg was renamed to average.
The deltas between Original and each of the three variants were computed (Delta 1…3).
Each delta was applied to each variant.
Original: class C { double num, sum, avg; … }
Variant 1: class C { double num, sum; … }
Variant 2: class C { double num, sum, avg; … m() { … avg … } … }
Variant 3: class C { double num, sum, average; … }
Result of Experiment 1
Line-by-line Merging
            Delta 1           Delta 2           Delta 3
Variant 1   -                 Illegal output    Detect conflict
Variant 2   Illegal output    -                 Illegal output
Variant 3   Detect conflict   Illegal output    -
Our Algorithm
            Delta 1           Delta 2                                      Delta 3
Variant 1   -                 Failed (too many candidates are generated)   Success
Variant 2   Detect conflict   -                                            Success
Variant 3   Success           Success                                      -
Our algorithm gave a correct result in 5 out of 6 cases. In just one case, our algorithm failed to find a valid substitute node and generated too many candidates.
Experiment 2
Evaluating the efficiency of the algorithm in actual software development.
Two open-source projects were selected as test cases:
  Jakarta Project (22,606 files, 162,683 revisions)
  Eclipse Project (19,420 files, 103,358 revisions)
84 pairs of check-ins where a merge occurred were identified.
Line-by-line merging and our algorithm were compared.
Result of Experiment 2
Line-by-line merging   Count   Our algorithm   Count
Success                71      Success         71
Failure                13      Success         9
                               Failure         4
Our algorithm succeeded in all the cases in which line-by-line merging succeeded.
Our algorithm also succeeded in 9 of the 13 cases in which line-by-line merging failed.
Result of Experiment 2: Detail when line-by-line merging failed
Cause of failure of line-by-line merging           Count   Our algorithm   Count   Comment
Addition or deletion of white space to same line   4       Success         4
Semantic change and reform                         1       Success         1
EOL code change                                    1       Success         1
Overlapped semantic change                         2       Success         2       Many candidates are generated in one case.
Overwriting prior change                           2       Success         1
                                                           Failure         1       Too many candidates are generated.
Semantic conflict                                  2       Failure         2
Broken source code                                 1       Failure         1       Can't parse and make tree.
Three of the 4 cases in which our algorithm failed are real conflicts. In the remaining case, our algorithm failed to find substitute nodes and positions, and generated too many candidates. In one of the cases in which our algorithm succeeded, many candidates were also generated.
Conclusion and Future Work
Summary of this presentation
  Problems with the existing revision control systems used in open-source development.
  Syntactic merging of source code as a solution.
  Implementation of the system.
  Two evaluations.
Future work
  Improving the precision of the search algorithm.
  Improving the user interface for selecting the merging result.
    Highlighting the differences between the candidates.
  Making inter-file links.