Performance evaluation of plagiarism detection method based on the intermediate language
Vedran Juričić, Tereza Jurić, Marija Tkalec
![Page 1: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/1.jpg)
![Page 2: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/2.jpg)
Plagiarism detection method
Method for detecting plagiarism in source code for .NET languages: C#, Visual Basic .NET, C++, …
Identify similar code fragments
Determine similarity between source files
Based on intermediate language
![Page 3: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/3.jpg)
Plagiarism detection
First:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. foreach (double temp in Input) {
7. if (temp>result)
8. result = temp; }
9. return result; } } }
Second:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. for (int i=0;i<Input.Length;i++) {
7. if (Input[i]>result)
8. result = Input[i]; }
9. return result; } } }
Similarity = Number of overlapping lines / Total number of lines = 6 / 9 = 66.67%
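The line-overlap measure on this slide can be sketched in a few lines of Python. This is only an illustration of the naive metric, not the presented system: the function name `line_overlap_similarity` is hypothetical, and it assumes that lines are compared position by position after trimming whitespace.

```python
def line_overlap_similarity(first: str, second: str) -> float:
    """Fraction of line positions whose text is identical in both files.

    Assumption: lines are compared position by position, ignoring
    leading and trailing whitespace.
    """
    a = [line.strip() for line in first.strip().splitlines()]
    b = [line.strip() for line in second.strip().splitlines()]
    total = max(len(a), len(b))
    if total == 0:
        return 0.0
    overlapping = sum(1 for x, y in zip(a, b) if x == y)
    return overlapping / total


# The two nine-line listings from the slide.
FIRST = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
foreach (double temp in Input) {
if (temp>result)
result = temp; }
return result; } } }
"""

SECOND = """\
using System.Text;
namespace Test {
class Math {
public double GetMaximum(double[] Input) {
double result = Input[0];
for (int i=0;i<Input.Length;i++) {
if (Input[i]>result)
result = Input[i]; }
return result; } } }
"""

print(f"{line_overlap_similarity(FIRST, SECOND):.2%}")  # 66.67%
```

Only lines 6 to 8 differ, so six of the nine positions match. Renaming every identifier would change every line and drive this measure to zero.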
![Page 4: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/4.jpg)
But…
First:
1. using System.Text;
2. namespace Test {
3. class Math {
4. public double GetMaximum(double[] Input) {
5. double result = Input[0];
6. foreach (double temp in Input) {
7. if (temp>result)
8. result = temp; }
9. return result; } } }
Second:
1. using System;
2. namespace OtherTest {
3. class MyClass {
4. public double ReturnMaximum(double[] Array) {
5. double current = Array[0];
6. for (int j=0;j<Array.Length;j++) {
7. if (Array[j]>current)
8. current = Array[j]; }
9. return current; } } }
Similarity = Number of overlapping lines / Total number of lines = 0 / 9 = 0.00%
![Page 5: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/5.jpg)
Problems
Modification of variable names, types, constants
Modification of class member definitions
Line and command reordering
…
Solution: detailed analysis and complex preprocessing for each supported language
![Page 6: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/6.jpg)
Our solution
Convert from source language to low-level language (Common Intermediate Language)
By using existing tools: compiler and disassembler
Tools exist for all .Net languages
![Page 7: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/7.jpg)
Our solution
C# language:
using System.Text;
namespace Test
{
  class Math
  {
    public double GetMaximum(double[] Input)
    {
      double result = Input[0];
      foreach (double temp in Input)
      {
        if (temp>result)
          result = temp;
      }
      return result;
    }
  }
}
Common Intermediate Language (produced by the C# compiler):
.method public hidebysig instance float64 GetMaximum(float64[] Input) cil managed
{
  // Code size 61 (0x3d)
  .maxstack 2
  .locals init (float64 V_0, float64 V_1, float64 V_2, float64[] V_3, int32 V_4, bool V_5)
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldc.i4.0
  IL_0003: ldelem.r8
  IL_0004: stloc.0
  IL_0005: nop
  IL_0006: ldarg.1
  IL_0007: stloc.3
  …
  IL_0037: ldloc.0
  IL_0038: stloc.2
  IL_0039: br.s IL_003b
  IL_003b: ldloc.2
  IL_003c: ret
} // end of method C::GetMaximum
Instruction sequence (labels and operands removed):
nop ldarg.1 ldc.i4.0 ldelem.r8 stloc.0 nop ldarg.1 stloc.3 … ldloc.0 stloc.2 br.s ldloc.2 ret
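The reduction from a disassembled method body to a bare opcode sequence could look roughly like the Python sketch below. It assumes the disassembler prints each instruction as an `IL_xxxx:` label followed by the opcode and an optional operand, as in the listing on this slide. The name `opcode_sequence` is hypothetical, and the slides do not say how ILMatch actually performs this step.

```python
import re

# Matches the start of an instruction such as "IL_0039: br.s IL_003b":
# an "IL_xxxx:" label followed by the opcode. Any operand after the
# opcode is deliberately not captured.
_INSTRUCTION = re.compile(r"IL_[0-9a-fA-F]{4}:\s+(\S+)")


def opcode_sequence(cil_listing: str) -> list[str]:
    """Return the opcodes of a CIL listing in order, without labels or operands."""
    return _INSTRUCTION.findall(cil_listing)


# A few instructions taken from the slide; the part the slide elides is left out.
SAMPLE = """\
// Code size 61 (0x3d)
.maxstack 2
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldc.i4.0
IL_0003: ldelem.r8
IL_0004: stloc.0
IL_0039: br.s IL_003b
IL_003b: ldloc.2
IL_003c: ret
"""

print(opcode_sequence(SAMPLE))
# ['nop', 'ldarg.1', 'ldc.i4.0', 'ldelem.r8', 'stloc.0', 'br.s', 'ldloc.2', 'ret']
```

Because source-level identifiers such as variable and class names no longer appear in this sequence, renaming them does not change it.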
![Page 8: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/8.jpg)
Plagiarism detection system
Evaluate the performance
Analyze and compare its behavior with the most commonly used plagiarism detection systems: MOSS, JPlag, CodeMatch
![Page 9: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/9.jpg)
Tested systems
MOSS: developed in 1994; commonly used in computer science faculties; supports 26 programming languages
JPlag: developed in 1996; commonly used in education; supports C, C++, C# and Java
![Page 10: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/10.jpg)
Tested systems
CodeMatch: developed in 2003; commercial software; supports 26 languages
ILMatch (our system): developed in 2010; supports all .NET languages (currently 59 languages)
![Page 11: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/11.jpg)
Testing
6 test categories
50 test cases covering common code modification techniques
Evaluation methods: precision, recall, F-measure
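For reference, the three evaluation measures can be computed as in the Python sketch below. The slides do not state which F-measure variant was used, so the balanced F1 (`beta = 1`) is an assumption here. The function name and the sample file pairs are hypothetical and are not data from the evaluation.

```python
def precision_recall_f(reported: set, actual: set, beta: float = 1.0):
    """Return (precision, recall, F-measure) for a set of reported plagiarism pairs.

    reported: pairs flagged by the detection system
    actual:   pairs that really are plagiarized
    beta:     weight of recall relative to precision (1.0 gives F1; an assumption)
    """
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_measure = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 3 pairs reported, 4 pairs actually plagiarized, 2 in common.
reported = {("a", "b"), ("a", "c"), ("b", "d")}
actual = {("a", "b"), ("b", "d"), ("c", "d"), ("a", "d")}

p, r, f = precision_recall_f(reported, actual)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 0.67 0.50 0.57
```

Precision penalizes false alarms, recall penalizes missed cases, and the F-measure combines the two into a single score.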
![Page 12: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/12.jpg)
Results
[Figure: highest F-measures for MOSS, JPlag, CodeMatch and ILMatch]
![Page 13: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/13.jpg)
Positive
No impact: user comments, code formatting, modification of variable and class names, modification of class members, changing data types
Some impact: replacing expressions and loops, rewriting code in a different language
![Page 14: Performance evaluation of plagiarism detection method based on the intermediate language](https://reader030.vdocuments.site/reader030/viewer/2022032804/56812a52550346895d8da056/html5/thumbnails/14.jpg)
Further work
Significant impact: reordering operands, reordering class members, adding redundant statements and variables
Improvements in comparison algorithm