Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
A Preliminary Study on Impact of Software Licenses on
Copy-and-Paste Reuse
Yu Kashima†, Yasuhiro Hayase††, Norihiro Yoshida†††,
Yuki Manabe†, Katsuro Inoue†
†: Osaka University ††: Toyo University †††: Nara Institute of Science and Technology
Software Reuse
• Purpose of software reuse
  – Development of reliable software
  – Increasing software productivity
• We focus on Copy-and-Paste (CnP)
  – A basic method of software reuse
Open Source Software and Licenses
• Open Source Software (OSS)
  – Derivative works of OSS products may be distributed
  – The amount of reusable source code is growing as the number of OSS products grows
• OSS licenses
  – Many kinds of licenses have been designed to satisfy the intents of various developers
  – Each OSS license has different conditions
  – Reuse is also restricted by the licenses
Representative OSS Licenses
• 3-clause BSD License (BSD3)
  – A derivative work must retain the copyright notices, the list of conditions, and the disclaimer of warranties
• Apache License Version 2 (Apachev2)
  – A derivative work must retain copyright, patent, trademark, and attribution notices
• GNU General Public License Version 2 (GPLv2)
  – A derivative work must be distributed under GPLv2
• LicenseName code ≡ source code distributed under LicenseName
  Ex. BSD3 code ≡ source code distributed under BSD3
CnP between Files under Different Licenses
• If a developer reuses source code:
  – The license of the reused code and the license of the code under development must both be satisfied simultaneously
  – Distribution of the code under development is prohibited in case the two licenses cannot both be satisfied
[Figure: CnP between BSD3 code and GPLv2 code, and CnP between Apachev2 code and GPLv2 code]
Impact of License on CnP
• Hypothesis
  – The characteristics of source code reuse depend on the license
    • Frequency of CnP
    • Kinds of licenses used by source code developed by CnP
• To our knowledge, there are no quantitative studies of CnP reuse from the viewpoint of software licenses
• We investigate actual OSS to confirm this hypothesis
Experiment
• A quantitative experiment was performed on a small data set
• Purpose
  – Confirming our hypothesis
  – Investigating the scalability of our method
• Overview
  – Investigating the number of CnP for each license
  – Code clone detection is used to detect CnP
    • A code clone is a code fragment similar to another code fragment
    • Code clones are typically generated by CnP
Method of Experiment
[Figure: overview of the method. Step 1, license detection, labels the source files of Application X and Application Y as License A, License B, or Unknown. Step 2, code clone detection, finds code clones among the labeled files. Step 3, counting code clones, groups the code fragments by their license.]
License #Code Fragments
License A 10
License B 3
… …
Step1. License Detection
• Ninka [1] is used to detect the licenses of source files
  – It analyzes the license description in each source file
  – It detects licenses with high precision
• Files whose licenses Ninka fails to detect are excluded
  – Files that contain no license description or an unknown license description
[1] D. M. German, Y. Manabe and K. Inoue: “A sentence-matching method for automatic license identification of source code files”, ASE 2010, pp. 437–446 (2010)
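The exclusion rule in Step 1 can be sketched as follows. This is only an illustration and not Ninka's actual interface: the `detected` mapping, the `NONE` and `UNKNOWN` labels, the file names, and the function name `files_with_known_license` are all assumptions made for the example.

```python
# Hypothetical labels for files without a usable license result (assumed for this sketch).
EXCLUDED_LABELS = {"NONE", "UNKNOWN"}


def files_with_known_license(detected):
    """Keep only the files whose detected license is a known license name.

    `detected` maps a file path to the license label reported for that file.
    """
    return {
        path: license_name
        for path, license_name in detected.items()
        if license_name not in EXCLUDED_LABELS
    }


# Example input with made-up file names.
detected = {
    "x/Foo.java": "BSD3",
    "x/Bar.java": "NONE",     # no license description in the file
    "y/Baz.java": "UNKNOWN",  # license description not recognized
    "y/Qux.java": "GPLv2+",
}
known = files_with_known_license(detected)
print(known)
```

Only the files that survive this filter would take part in the later steps.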
Step2. Code Clone Detection
• CCFinder [2] is used to extract code clones across different applications
  – We assume that CnP within an application does not cause license problems
• Filtering
  – Code clones generated by anything other than CnP are excluded
    Ex. getters/setters, variable declarations
• The direction of each CnP is not determined
[Figure: Applications X, Y, and Z under Licenses A, B, and C, with CnP relations between applications; clones caused by getters/setters and variable declarations are filtered out]
[2] T. Kamiya, S. Kusumoto and K. Inoue: “CCFinder: A multilinguistic token-based code clone detection system for large scale source code”, IEEE Transactions on Software Engineering, 28, pp. 654–670 (2002)
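The restriction to clones across different applications might be expressed as below. This sketch does not reproduce CCFinder or the getter/setter filtering; the `Fragment` record, its fields, and both function names are hypothetical and assume that clone detection has already produced clone sets.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    application: str   # application (package) the fragment comes from
    path: str          # source file containing the fragment
    license_name: str  # license detected for that file


def crosses_applications(clone_set):
    """Return True if the clone set has fragments from two or more applications."""
    return len({fragment.application for fragment in clone_set}) >= 2


def keep_cross_application(clone_sets):
    """Drop clone sets whose fragments all belong to a single application."""
    return [clone_set for clone_set in clone_sets if crosses_applications(clone_set)]


# Made-up example: the second clone set stays inside Application X and is dropped.
clone_sets = [
    [Fragment("X", "x/A.java", "License A"), Fragment("Y", "y/B.java", "License B")],
    [Fragment("X", "x/A.java", "License A"), Fragment("X", "x/C.java", "License A")],
]
kept = keep_cross_application(clone_sets)
print(len(kept))
```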
Step3. Counting Code Clones(1/2)
• The following steps are repeated for each target license
  1. Select a license as the analysis target
  2. Extract the clone sets that include code under that license
    • A clone set is a set of code clones similar to each other
  3. Count the code fragments in the extracted clone sets, grouped by their license
[Figure: Applications X, Y, and Z under Licenses A, B, and C, showing the fragments having CnP relations to License A code]
License #Code Fragments
License A 2
License B 1
License C 2
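The three counting steps can be sketched as follows. This is a minimal illustration under a simplifying assumption: each clone set is represented as a list of license labels, one per code fragment. The function name and the example clone sets are made up, chosen so that the counts match the example table.

```python
from collections import Counter


def count_fragments_by_license(clone_sets, target_license):
    """Count fragments per license over the clone sets that contain the target license."""
    counts = Counter()
    for clone_set in clone_sets:
        # Step 2: keep only clone sets that include code under the target license.
        if target_license in clone_set:
            # Step 3: count every fragment in the set, grouped by its license.
            counts.update(clone_set)
    return counts


# Each inner list holds the license of every fragment in one clone set.
clone_sets = [
    ["License A", "License B", "License C"],
    ["License A", "License C"],
    ["License B", "License B"],  # no License A fragment, so it is skipped
]
# Step 1: select License A as the analysis target.
counts = count_fragments_by_license(clone_sets, "License A")
for license_name, number in sorted(counts.items()):
    print(license_name, number)
```

The same function would be called once per target license to produce one table per license.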
Step3. Counting Code Clones(2/2)
• A clone set includes both the original code fragments and the code fragments generated by CnP
  → Counting the code fragments in clone sets approximates counting the number of CnP
• We count the number of CnP to and from code fragments under the target license
• Although the count table also includes CnP in the opposite direction, it is sufficient for grasping the overall picture
Analyzed Code
• Java files (.java) in the main section of Debian GNU/Linux 5.0.2
• Reasons for selecting this target
  – It consists of code under various licenses
  – It can be analyzed by both Ninka and CCFinder
  – It is of a feasible scale for this experiment
#Packages 452
#Files 77,452
LOC 8,530,896
License Distribution in Analyzed Code
[Figure: bar chart of #Files per license. Categories: Apachev2; GPLv2+; LesserGPLv2.1+; "GPLnoVersion,GPLv2+,LinkException"; GPLv2; BSD3; "GPLv2,ClassPathException"; other; No Notification; Unknown license. Vertical axis: #Files, from 0 to 20,000.]
Result (BSD3)
License #Fragments Percentage
BSD3 613 92%
GPLv2+ 20 3.0%
Apachev2 16 2.4%
LesserGPL2+ 14 2.1%
GPLv2,ClassPathException 1 0.15%
LesserGPL2.1+ 1 0.15%
• The table counts the code fragments in clone sets that include BSD3 fragments, grouped by their license
  • It shows how often each license is used by code fragments having a CnP relationship to BSD3 fragments
• BSD3 code is mostly reused by BSD3 code
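The Percentage column can be recomputed from the #Fragments column as a consistency check. The sketch below uses the BSD3 counts from the table; the function name `percentages` is an assumption for this example.

```python
# Fragment counts for clone sets that include BSD3 fragments (from the table).
bsd3_counts = {
    "BSD3": 613,
    "GPLv2+": 20,
    "Apachev2": 16,
    "LesserGPL2+": 14,
    "GPLv2,ClassPathException": 1,
    "LesserGPL2.1+": 1,
}


def percentages(counts):
    """Return each license's share of all counted fragments, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * number / total for name, number in counts.items()}


shares = percentages(bsd3_counts)
for name, share in shares.items():
    # Two significant digits, matching the precision used in the table.
    print(f"{name}: {share:.2g}%")
```

The counts sum to 665, which is the #Fragments value reported for BSD3 in the per-license summary.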
Result (Apachev2)
License #Fragments Percentage
Apachev2 1533 77%
Apachev1.1 316 16%
LesserGPL2.1+ 42 2.1%
MPLv1.1 33 1.6%
BSD3 29 1.5%
MX4JLicensev1 16 0.80%
GPLv2+ 4 0.20%
LibraryGPL2+ 3 0.15%
MPLv1.0 2 0.10%
MITX11noNotice 2 0.10%
Public Domain 1 0.050%
Subversion+ 1 0.050%
EPLv1 1 0.050%
• A large percentage of CnP occurs between Apachev2 code fragments
• Apachev1.1 code has been relicensed under Apachev2
Result (GPLv2+)
License #Fragments Percentage
GPLv2+ 268 49%
GPLnoVersion,GPLv2+,LinkException 225 41%
BSD3 28 5.1%
LibraryGPLv2+ 20 3.6%
Apachev2 4 0.73%
LesserGPLv2.1+ 4 0.73%
• CnP within GPLv2+ code accounts for the highest percentage
• “GPLnoVersion, GPLv2+, LinkException” also has a high percentage
• “GPLnoVersion, GPLv2+, LinkException” code is reused by GPLv2+ code
[Figure: CnP between “GPLnoVersion, GPLv2+, LinkException” code and GPLv2+ code]
#Files and #Fragments under Each License
License #Fragments #Files #Fragments / #Files
BSD3 665 2181 0.305
Apachev2 1983 16350 0.121
GPLv2+ 549 8160 0.0673
• The frequency of CnP per file: BSD3 > Apachev2 > GPLv2+
• Code under a license is copy-and-pasted frequently if “#Fragments / #Files” of that license is large
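The ratio and the resulting ordering follow directly from the two count columns. A short sketch using the values from the table (the variable names are chosen for this example):

```python
# (#Fragments, #Files) per license, taken from the table.
per_license = {
    "BSD3": (665, 2181),
    "Apachev2": (1983, 16350),
    "GPLv2+": (549, 8160),
}

# Fragments per file for each license.
ratios = {name: fragments / files for name, (fragments, files) in per_license.items()}

# Licenses ordered from the highest to the lowest frequency of CnP per file.
ranking = sorted(ratios, key=ratios.get, reverse=True)

for name in ranking:
    # Three significant digits, matching the precision used in the table.
    print(f"{name}: {ratios[name]:.3g}")
```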
Summary of the Results
• Common characteristic of all licenses
  – CnP within code distributed under the same license, or under licenses designed by the same organization, accounts for the majority
    • CnP might happen mostly within an organization
• Apachev2 has CnP relations to various licenses
  – Apachev2 has the largest number of files
  – The conditions of Apachev2 are more relaxed than those of GPLv2+
• The frequency of CnP per file: BSD3 > Apachev2 > GPLv2+
Threats to Validity
• The results are insufficient to generalize to OSS as a whole
  – The analysis target is small
    → We plan a large-scale analysis
  – Only Java files were analyzed
    • Java has a short history, so Java files have been copy-and-pasted less than files in other languages
    → We plan an analysis of C/C++ files
• Overlapping code fragments may be counted separately
  – The number of overlapping code fragments might be small
[Figure: two overlapping code fragments, Fragment A and Fragment B]
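To illustrate what an overlap between two code fragments means, the sketch below checks whether fragments share lines of the same file. It is not part of the presented method; the tuple layout `(path, start_line, end_line)`, the function names, and the example fragments are assumptions.

```python
def overlaps(a, b):
    """Return True if two fragments share at least one line of the same file.

    Each fragment is (path, start_line, end_line) with inclusive line numbers.
    """
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_overlapping_pairs(fragments):
    """Count the pairs of fragments that overlap each other."""
    return sum(
        1
        for i in range(len(fragments))
        for j in range(i + 1, len(fragments))
        if overlaps(fragments[i], fragments[j])
    )


# Made-up example: only the first two fragments overlap (lines 25 to 30).
fragments = [
    ("A.java", 10, 30),
    ("A.java", 25, 50),
    ("A.java", 60, 80),
    ("B.java", 10, 30),
]
overlapping = count_overlapping_pairs(fragments)
print(overlapping)
```

A count like this could indicate how much the separate counting of overlapping fragments affects the totals.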
Scalability of the Investigation Method
• This method can be applied to a large target, because each step can scale
  – License detection
    • Ninka can analyze files in linear time
  – Code clone detection
    • There are tools more scalable than CCFinder, such as CCFinderX and D-CCFinder
  – Counting code clones
    • This process did not take a long time
Conclusion
• A preliminary study of the impact of licenses on CnP was performed
  – Java files in the main section of Debian GNU/Linux 5.0.2 were analyzed
• CnP happens mostly within code distributed under the same license or under licenses designed by the same organization
• The frequency of CnP per file
  – BSD3 > Apachev2 > GPLv2+
• Our method can be applied to a large target
Future Work
• Large-scale experiment
• Investigating whether code fragments are copy-and-pasted mostly within an organization
• Detecting the direction of CnP