overview and evaluation of java component search system spars-j

97
1 Overview and Evaluatio n of Java Component Search S ystem SPARS-J Reishi Yokomori **, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto **Katsuro Inoue** *Japan Science and Technology Agency **Osaka University

Upload: chadwick-ewing

Post on 31-Dec-2015

18 views

Category:

Documents


2 download

DESCRIPTION

Overview and Evaluation of Java Component Search System SPARS-J. Reishi Yokomori **, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto **Katsuro Inoue** *Japan Science and Technology Agency **Osaka University. Outline. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Overview and Evaluation of Java Component Search System  SPARS-J

1

Overview and Evaluation ofJava Component Search Syste

m SPARS-J

Reishi Yokomori **, Hideo Nishi**, Fumiaki Umemori**, Tetsuo Yamamoto*, Makoto Matsushita**, Shinji Kusumoto **Katsuro Inoue**

*Japan Science and Technology Agency**Osaka University

Page 2: Overview and Evaluation of Java Component Search System  SPARS-J

2Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

SPARS-J (Outline)Ranking methodSystem architecture

Experimental evaluation for SPARS-JConclusion and Future work

Page 3: Overview and Evaluation of Java Component Search System  SPARS-J

3Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

MotivationA library of software is a fount of wisdom.

Reuse of software components improves productivity and quality.Example of components: source code, document …..

Maintenance activity is more easier with the library.However, a collection of software is not utilized effectively.

A developer doesn’t know an existence of desirable components.Although there are a lot of components, these components are not organized.

We need a system to manage components and to search suitable component.

Page 4: Overview and Evaluation of Java Component Search System  SPARS-J

4Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Research aim

We build a system which have functions as follows

searches component, which is suitable for user’s requestmanages the component information

TargetsIntranet

Closed software development environment inside a company

InternetSource code from a lot of open-source-software community

– Source Forge, Jakarta Project. etc.

Page 5: Overview and Evaluation of Java Component Search System  SPARS-J

5Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

SPARS-J (Outline)Ranking methodSystem architecture

Experimental evaluation for SPARS-JConclusion and Future work

Page 6: Overview and Evaluation of Java Component Search System  SPARS-J

6Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

SPARS-J(Software Product Archive , analysis and Retrieval System for Java)

SPARS-J is Java Source Code Search System

analyzes and extracts components automatically.

Component: a source code of class or interface

builds a database based on the analysis.Use-Relation, Similar Components, Metrics, .....

provides keyword-search.Three ranking methods: KR, CR, KR+CRAnalysis information

– Components using (used by) the component– Package hierarchy

Page 7: Overview and Evaluation of Java Component Search System  SPARS-J

7Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking method1. Component used repeatedly (by important

component)– Ranking based on use relation between components

2. Component suited to a user request– Frequency of word appearance (arranged TF-IDF)– A class-name, a method-name, ..., have special

importance

3. Integrated Ranking– Components prized both in KR and CR are very important – Integration by Borda Count method

Ranking search results

Keyword Rank (KR)

Component Rank (CR)

KR+CR Rank ( KR+CR)

Page 8: Overview and Evaluation of Java Component Search System  SPARS-J

8Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

System architecture of SPARS-J (Building a Database)

Component analysis • extracts components• indexes each appeared word• extracts use-relation• clustering similar components • calculates Component rank

Database

Component retrieval

User interface

Library(Java source files)

store

provide

• Component Information• Indexes • Use-Relation• Clustered Component Graph • Component Rank

Page 9: Overview and Evaluation of Java Component Search System  SPARS-J

9Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

System architecture of SPARS-J (Searching Components)

Component analysisComponent retrieval

• searches components from Indexes• sorts components by CR, KR, KR+CR

User

User interfaceQuery

Result

•analyzes query•Analysis condition •Keywords•displays search results• Additional Information Source Code Use Relation Similar Components Metrics etc.........

RequestComponents List

Database• Component Information• Indexes • Use-Relation• Clustered Component Graph • Component Rank

Query

Information

Page 10: Overview and Evaluation of Java Component Search System  SPARS-J

10Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Top page)

Page 11: Overview and Evaluation of Java Component Search System  SPARS-J

11Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Search results)

Page 12: Overview and Evaluation of Java Component Search System  SPARS-J

12Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Source code)

Page 13: Overview and Evaluation of Java Component Search System  SPARS-J

13Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Similar components)

Page 14: Overview and Evaluation of Java Component Search System  SPARS-J

14Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Using the component)

Page 15: Overview and Evaluation of Java Component Search System  SPARS-J

15Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Used by the component)

Page 16: Overview and Evaluation of Java Component Search System  SPARS-J

16Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Screenshot (Package browsing)

Page 17: Overview and Evaluation of Java Component Search System  SPARS-J

17Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

SPARS-J (Outline)Ranking methodSystem architecture

Experimental evaluation for SPARS-J Conclusion and Future work

Page 18: Overview and Evaluation of Java Component Search System  SPARS-J

18Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experimental Evaluation

1. Comparison of each ranking method in SPARS-JWe investigate the best ranking methodCR vs. KR vs. CR+KR

2. Comparison with other search enginesWe verify SPARS-J’s effectiveness as a software component search engine.vs. Google, Namazu

3. Application of SPARS-J in actual development environment

We confirm that SPARS-J is useful to management and understanding of software.

Page 19: Overview and Evaluation of Java Component Search System  SPARS-J

19Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment 1: Comparison of ranking method in SPARS-J

Purpose of ExperimentWe investigate the best method among 3 ranking method in SPARS-J.

1. CR (Based on Use-relation)2. KR (Based on TF-IDF)3. CR+KR ( Integrating 1 & 2)

Preparation Database from Java source codes publicly available

About 140,000 files from JDK, SourceForge, etc.....

Keywords10 queries assumed development of simple system

Page 20: Overview and Evaluation of Java Component Search System  SPARS-J

20Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment 1: Comparison of ranking method in SPARS-J

Criterion of EvaluationPrecision of components in the top 10 Result :

The percentage of suitable components– User tends to look at only a higher ranked results.– High precision means that there are many useful components in ran

ge of user’s visibility.

Ndpm :The percentage of the component pair which differs rank order between two ranking methods.– We define user‘s ideal ranking in advance, and calculate ndpm.

» The quantitative indicator which shows a distance from ideal– Ndpm considers all the components in a search result.

» Its distance becomes large when required components are ranked low.

Page 21: Overview and Evaluation of Java Component Search System  SPARS-J

21Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Result (Experiment 1)

Keyword CR KR

CR+KR

CR KRCR+K

R

A 1 1 1 0.036 0.048 0.037

B 1 1 1 0.194 0.261 0.221

C 0.5 0.5 0.5 0.133 0.117 0.092

D 0.4 0.9 0.8 0.123 0.200 0.189

E 0.4 0.4 0.4 0.208 0.192 0.194

F 0.2 0.2 0.2 0.184 0.184 0.160

G 0.9 1 1 0.081 0.103 0.080

H 1 0.8 1 0.047 0.109 0.052

I 0.6 0.7 0.7 0.210 0.324 0.267

J 0.5 0.7 0.7 0.219 0.243 0.114

Ave. 0.65 0.72 0.73 0.1430.17

80.141

Precision Ndpm

Page 22: Overview and Evaluation of Java Component Search System  SPARS-J

22Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Consideration (Experiment 1)

By Paired-Difference T-Test, we have confirmed that following difference are significant at the 5% level.

Precision: KR,CR+KR ≫ CRNdpm: CR,CR+KR ≫ KR

Characteristic of each method CR

CR generally ranks components in desirable order. Higher ranked components are important but often have no relevance to keyword.

KRKR generally appreciates components which have strong relevance. In required component, keyword doesn’t always appear with high frequency.

CR+KRCR+KR has good result at both precision and ndpm.CR+KR has the best of both ranking

We use CR+KR as a default ranking method.

Page 23: Overview and Evaluation of Java Component Search System  SPARS-J

23Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment 2:Comparison with other search engines

Purpose of ExperimentWe verify SPARS-J’s effectiveness as a software component search engine.

1. SPARS-JDatabase from 140,000 files (Same as Experiment 1)We use CR+KR as ranking method.

2. GoogleFamous web search Engine Input queries to www.google.co.jp

3. NamazuFull-text search system for documents.Namazu uses TF-IDF to rank documents.Database from 140,000 files (Same files as SPARS-J)

Preparation Keywords: 10 queries (Same as Experiment 1)Criterion of Evaluation: Precision of the top 10 Result

Page 24: Overview and Evaluation of Java Component Search System  SPARS-J

24Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Result (Experiment 2)

keyword SPARS-J Google Namazu

A 1 0.7 0.9

B 1 0.4 0.6

C 0.5 0.3 0.4

D 0.8 0.3 0.6

E 0.4 0.1 0.3

F 0.2 0 0.1

G 1 0.3 0.4

H 1 0.1 0.2

I 0.7 0.4 0.4

J 0.7 0.4 0.7

Ave. 0.73 0.3 0.46

Precision of the top 10 result

Page 25: Overview and Evaluation of Java Component Search System  SPARS-J

25Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Consideration (Experiment 2)

By Paired-Difference T-Test, we have confirmed that following difference are significant at the 5% level.

PrecisionSPARS-J≫ Namazu ≫ Google(*) SPARS-J (CR, KR, CR+KR) ≫ Namazu

Consideration of ResultsGoogle

In the result, there are many pages other than an explanation of Java source code.Performance depends on how much description there are.

NamazuSince the datasets consists of only source codes, the result is better than Google.Without characteristics of Java programs, we cannot get good results.

For searching software components, SPARS-J is more useful than other search engines.

Page 26: Overview and Evaluation of Java Component Search System  SPARS-J

26Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment 3: Application of SPARS-J in actual development environment

Purpose of ExperimentWe confirm that SPARS-J is useful to management and understanding of software resource.

Criterion of EvaluationQualitative evaluation about SPARS-J

Preparation We set up SPARS-J to a company.

7 employees use SPARS-J for two weeks.They are all engaged in the software development and the maintenance activity.

We carry out a questionnaire survey about SPARS-J

Page 27: Overview and Evaluation of Java Component Search System  SPARS-J

27Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Result (Experiment 3)

Questionnaire Item \ examinee

A B C D E F G Mode

Package Browser 4 5 5 5 4 3 3 5

Similar components 4 5 5 2 4 3 5,4

Components used by the class 5 5 5 5 5 5 5

Components using the class 5 1 5 5 5 5 5

Metrics of the class 1 4 1 2 4 5 4,1

Download of the class 1 3 5 5 2 5 5

Contribution to reduction of time cost 3 5 5 3 4 1 5,3

Improvement for software quality 5 3 3 3 4 1 3

Understanding of software resource 3 1 5 3 5 2 1 5,3,1

View-ability of the component-list view 4 4 5 5 3 3 5 5

View-ability of the highlighted source code

3 5 5 5 5 5 5 5

( [Useful or Used repeatedly] 5 4 3 2 1 [Useless or seldom Used] )

Page 28: Overview and Evaluation of Java Component Search System  SPARS-J

28Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Consideration (Experiment 3)

Highly rated questionnaire itemsReference by package browserReference by similar componentsReference by components using (used by) the classView-ability of the component list view and source code

Activities realized by using SPARS-JListing of applications which uses certain componentImpact analysis at reediting components

Page 29: Overview and Evaluation of Java Component Search System  SPARS-J

29Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Consideration (Experiment 3)

Other commentResponse speed is very quick, and we have felt no stress. Since it is not necessary to install in a client, sharing of software components is easy.

SPARS-J can support maintenance work effectively.

Easier grasp of software components

Page 30: Overview and Evaluation of Java Component Search System  SPARS-J

30Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Conclusion and Future worksConclusion

We construct software component search system SPARS-J.

Search engine for Java source codeRanking components with consideration of characteristics. Provision of useful relevant information.

We verified the validity of SPARS-J based on experimental evaluation.

SPARS-J is useful to search software components. SPARS-J is very helpful to grasp and manage components.

Future worksThe quantitative evaluation other than ranking performanceSupport for other software component

Page 31: Overview and Evaluation of Java Component Search System  SPARS-J

31Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Page 32: Overview and Evaluation of Java Component Search System  SPARS-J

32Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Page 33: Overview and Evaluation of Java Component Search System  SPARS-J

33Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Page 34: Overview and Evaluation of Java Component Search System  SPARS-J

34Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

Page 35: Overview and Evaluation of Java Component Search System  SPARS-J

35Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component analysis part

Extract component and its information from a Java source fileThe process

Extract a componentIndex the componentExtract use relationsClustering similar componentsRank components based on use relations (CR method)

Page 36: Overview and Evaluation of Java Component Search System  SPARS-J

36Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extract and index a component

Extracting componentFind class or interface block in a java source file

Location information in the file (start line number, end line number)

IndexingExtract index key from the component

Index key : a word and the kind of itNo reserved words are extracted

Count frequency in use of the word

word kind

Sort Class name

quicksort Comment

quicksort Method name

pivot Variable name

quicksort Method call

: :Index key

public final class Sort { /* quicksort */ private static void quicksort(…) { int pivot; : quicksort(…); quicksort(…); }}

1

1

1

1

2

:frequency

Page 37: Overview and Evaluation of Java Component Search System  SPARS-J

37Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extract use relationsExtract use relations among components using semantic analysisMake component graph from use relations

Node: componentEdge: use relation

Inheritance

Interfaceimplementati

on

Variable type

Instance creation

Field access

Method callThe kind of use relation

public class Test extend Data{ : public static void main(…) { : Sort.quicksort(super.array); : }}

Sort

Data

Test

Component graph

InheritanceField access

Method call

Page 38: Overview and Evaluation of Java Component Search System  SPARS-J

38Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Similar componentSimilar component is copied component or minor modified componentWe merge similar components into single componentMerged component have use relations that all component before merging have

C

B F

A D

G

E

Component graph

BF

AD E

C G

Clustered component graph

C

B F

A D

G

E

Page 39: Overview and Evaluation of Java Component Search System  SPARS-J

39Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clustering components

We measure characteristics metrics to merge componentsThe difference ratio of each component metrics

Metricscomplexity

– The number of methods, cyclomatic, etc. – represent a structural characteristic

Token-composition– The number of appearances of each token– represent a surface characteristic

Page 40: Overview and Evaluation of Java Component Search System  SPARS-J

40Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking based on use relation

Component Rank (CR)Reusable component have many use relation

The example of use is muchGeneral purpose componentSophisticated component

We measure use relation quantitatively, and rank components

The component used by many components is importantThe component used by important component is also importantKatsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank:

Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

Page 41: Overview and Evaluation of Java Component Search System  SPARS-J

41Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.34 0.33

0.33

0.17

0.17

0.330.33

Ad-hoc weights are assigned to each node

Page 42: Overview and Evaluation of Java Component Search System  SPARS-J

42Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.33 0.17

0.5

0.175

0.175

0.170.5

The node weights are re-defined by the incoming edge weights

Page 43: Overview and Evaluation of Java Component Search System  SPARS-J

43Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.5 0.175

0.345

0.25

0.25

0.1750.345

We get new node weights

Page 44: Overview and Evaluation of Java Component Search System  SPARS-J

44Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Propagating weights

A B

C

0.4 0.2

0.4

0.2

0.2

0.20.4

• We get stable weight assignment next-step weights are the same as previous ones

• Component Rank : order of nodes sorted by the weight

Page 45: Overview and Evaluation of Java Component Search System  SPARS-J

45Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

Page 46: Overview and Evaluation of Java Component Search System  SPARS-J

46Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component retrieval part

Search components from database, rank components The process

Search componentsRanking suited to a user requestAggregate two ranks (CR and KR)

Page 47: Overview and Evaluation of Java Component Search System  SPARS-J

47Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Search components

Search queryWords a user inputThe kind of an index word, package name

Components contain given query are searched from Database

Page 48: Overview and Evaluation of Java Component Search System  SPARS-J

48Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Ranking suited to a user request

Keyword Rank (KR)Components which contain words given by a user are searchedRank components using the value calculated from index word weight Index word weight

– Many frequency in use of a component– A word contained particular components– A word represent the component function such as Class

name

Sort the sum of all given word weightTF-IDF weighting using full-text search engine

Page 49: Overview and Evaluation of Java Component Search System  SPARS-J

49Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Calculation of KR value

Calculate weight Wct with component c word t

TFi : The frequency with which a kind i of word t occurs in component c IDF : the total number of components / the number of components containing word tkwi : Weight of a kind i

KR value is the sum of all word Wct

kindall

iict IDFTFkww ) (

the kind of a word

weight

Class name 200

Interface name 50

Method name 200

Package name 50

Import 30

Method call 10

Field access 10

Variable type 10

Instance creation

10

Local var access 1

Comment 30

Doc comment 50

Line comment 10

String 1

Page 50: Overview and Evaluation of Java Component Search System  SPARS-J

50Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Aggregate two ranks

Aggregate two ranks KR and CRAggregation method

Borda Count method known a voting systemUse for single or multiple-seat electionsThis form of voting is extremely popular in determining awards

SPARS-JRank components both KR and CRUsing KR and CR, the component that be suitable user’s request, reusable and sophisticated

Page 51: Overview and Evaluation of Java Component Search System  SPARS-J

51Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Borda Count methodThere are 10 voters and 5 candidates (from A to E) Each voter rank candidates1 point for last place, 2 points for second from last place …, and N points for first place1st=5points , 2nd=4points ,…

A : 15+3+6+4=28pointsB : 38pointsC : 38pointsD : 22pointsE : 26points

1st

2nd

3rd

4th

5th

3 A B C D E

3 E B C D A

2 C B A E D

2 C D B A E

1st

1st

3rd

4th

5th

B C A D E

Aggregation

Page 52: Overview and Evaluation of Java Component Search System  SPARS-J

52Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

Page 53: Overview and Evaluation of Java Component Search System  SPARS-J

53Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

User interface

Receive a user’s query and provide the search results through Web browser

Microsoft Internet Explore, Mozilla, etc.

The processParse query word and the search conditionShow rank ordered resultsShow analyzed information of the component

Used by/Using the componentMetrics

Page 54: Overview and Evaluation of Java Component Search System  SPARS-J

54Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Analyzed informationA component information are as follows

MetricsThe number of method, variableLOC, cyclomaticEtc. (measurable metrics in the component itself)

Components used by/using the componentShow lists of nodes followed use relation

Components that are similar to the component

Show lists of similar components

Page 55: Overview and Evaluation of Java Component Search System  SPARS-J

55Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Package browsing

The naming structure for Java packages is hierarchical

A user can search lists of components in same package of a component easily

Page 56: Overview and Evaluation of Java Component Search System  SPARS-J

56Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline

Motivation and research aimSPARS-J

OutlineSystem architectureRanking methodEach part

Analysis partRetrieval partUser Interface

ExperimentConclusion and Future work

Page 57: Overview and Evaluation of Java Component Search System  SPARS-J

57Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment(1/2)

Comparison with GoogleRegister about 130,000 components get from InternetQuery words ‘calculator applet’ and ‘chat server client’

Calculate relevance ratio of 10 rank higherRelevance: The component is reusable source code

Google is a web search engine…Add ‘java source’ term to the query wordsFollow one link from the result web page

Page 58: Overview and Evaluation of Java Component Search System  SPARS-J

58Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Experiment(2/2)Example 1 :

”calculator applet”SPARS-J

9 hits7 suited components

Example 2 :”chat server client”SPARS-J

69 hits57 suited components

Using SPARS-J, suited component is high order

orderrank

componentrelevant ofnumber Theratio relevance

SAPRS-J Google SPARS-J Google

order

Relevance

Ratio Relevance

Ratio Relevance

Ratio Relevance

ratio

1 ○ 1 ○ 1 ○ 1 × 0

2 ○ 1 × 0.5 ○ 1 × 0

3 ○ 1 ○ 0.67

○ 1 × 0

4 ○ 1 × 0.5 ○ 1 × 0

5 ○ 1 ○ 0.6 ○ 1 × 0

6 × 0.83

○ 0.67

○ 1 × 0

7 ○ 0.86

× 0.57

○ 1 ○ 0.14

8 × 0.75

○ 0.63

○ 1 × 0.13

9 ○ 0.78

× 0.56

○ 1 ○ 0.22

10 - - × 0.5 ○ 1 ○ 0.3

Example1 Example2

Page 59: Overview and Evaluation of Java Component Search System  SPARS-J

59Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Conclusion and Future work

We developed component search engine SPARS-JUsing SPARS-J, retrieval of components used well is enabled easily.

Future workMorphological analysis of Index keywordCollaborative filteringInvestigate best ranking method

The value of weightAggregation ranks

Evaluation of SPARS-JUsability

Page 60: Overview and Evaluation of Java Component Search System  SPARS-J

60Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

End

Page 61: Overview and Evaluation of Java Component Search System  SPARS-J

61Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Component graph

A B

C

ED

F

G

IH

System X System Y

componentuse relation

Page 62: Overview and Evaluation of Java Component Search System  SPARS-J

62Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Weight of nodes

A B

C

ED

F

G

IH

System X System Y

sum of all node weights = 1 ... (1)weight of node represents significance of node

0.10.1

0.2

0.1 0.1

0.1

0.2

0.050.05

1 w(x) 0

Page 63: Overview and Evaluation of Java Component Search System  SPARS-J

63Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Weights of edges

A0.2

0.05

0.05

0.05

0.05

B

0.2

0.05

0.15

0.4

d=1/4

d=1/4

d=1/4

d=1/4

d: distribution ratio

• Node weight is distributed to each outgoing edge• Edge weights are collected at the destination node

sum of all outgoing edge weights = origin node weight ... (2)sum of all incoming edge weights = destination node weight ... (3)

Page 64: Overview and Evaluation of Java Component Search System  SPARS-J

64Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Definition of weights

Under constraints (1)~(3), we have a simultaneous equation

)(

)(

)(

2

1

nvw

vw

vw

)(

)(

)(

2

1

nvw

vw

vw

t

ddd

ddd

ddd

nnnn

n

n

21

22221

11211

= .

Dt: transposed matrix of distribution ratios

W: node weight vector

This simultaneous equation can be solved by propagating node weight through edges in the graph

Page 65: Overview and Evaluation of Java Component Search System  SPARS-J

65Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Pseudo use relation

A B C

• Weight computation does not always converge

• Add a pseudo edge from a node to another, if there is no 'real' edge

• Distribution ratios: pseudo edges << real edges

Page 66: Overview and Evaluation of Java Component Search System  SPARS-J

66Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Markov model

• Component rank model can be considered as a Markov Chain of user's focus

• User's focus moves from one component to another along a use relation at a fixed time duration

• Node weight represents the existence probability of the user's focus at infinite future

0.01

0.02 0.01

0.030.05

0.001 0.1

Page 67: Overview and Evaluation of Java Component Search System  SPARS-J

67Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Related WorksMarkov models of documentation traversal

Influence Weight: impact factor of journal publication thought incoming referencesPage Rank: weight of HTML in the Internet through incoming web links

Explicit use relationsNo clustering (important for software products)

Measurement reusability of components or interfaces

Use various characteristic metrics Indirect indicator of reusability Our approach directly reflects usage of components

Page 68: Overview and Evaluation of Java Component Search System  SPARS-J

68Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

1. 利用関係に基づく順位付け

Component Rank ( CR )法利用関係から部品の利用実績を評価し,順位付けする

多くの部品から利用されている部品は重要重要な部品から利用されている部品もまた重要

多く利用される部品や,重要な箇所で利用される部品に大きな評価値が与えられる

使用例が多く,汎用的な部品が上位に来る

Page 69: Overview and Evaluation of Java Component Search System  SPARS-J

69Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

2. キーワード出現頻度に基づく順位付け

Keyword Rank ( KR )法文書検索システムで用いられる TF-IDF 法を改変部品を特徴付ける索引キーに適当な重みを与える

部品に繰り返しあらわれる索引キー希少で,特定の部品に偏ってあらわれる索引キークラス定義名など部品を象徴するトークン種類の索引キー

重みの総和を評価値として順位付け

クエリと適合する部品が上位に来る

Page 70: Overview and Evaluation of Java Component Search System  SPARS-J

70Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

3.CR と KR を統合した順位付け

各順位付けの観点1. 部品の使用例の多さ,汎用さ

– CR 法2. クエリと部品内容の適合度

– KR 法

順位を統合して,両面で優れている部品を検索結果の上位に表示する

Borda の手法部品検索部で行うため高速な方法が望ましい

Page 71: Overview and Evaluation of Java Component Search System  SPARS-J

71Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

部品群グラフをもとにした繰り返し計算

計算手順1. 各頂点に適当な重みを与える

– 重みの総和は 1

2. 各有向辺の重みを求める– 頂点の重みを,出ていく辺で分配する

3. 各頂点の重みを再計算– 頂点に入ってくる辺の重みの総和を,その頂点の重みとして再定

義する4. 頂点の重みが収束するまで, 2.3. を繰り返し計算する5. 収束した頂点の重みを,その頂点に対応する部品群の CR 値と

する– 部品の評価値は属する部品群の CR 値とする

C10.334

C20.333

C30.333

C10.334

C20.333

C30.333

v1×50%

v1×50%

v2×100%v3×100%

C1 C2

C3

0.167

0.167

0.3330.333

C10.333

C20.167

C30.500

C1 C2

C3

0.1665

0.1665

0.1670.500

C10.500

C20.1665

C30.3335

C10.400

C20.200

C30.400

0.200

0.200

0.2000.400

CR 値の計算

Page 72: Overview and Evaluation of Java Component Search System  SPARS-J

72Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

研究の目的ソフトウェア部品の効率的な検索により,

既存部品を他のシステム開発で用いる再利用支援部品の関連の掌握によるプログラム理解や保守支援

SPARS-J に対して実験的評価を行い,システムの有効性を検証する

Page 73: Overview and Evaluation of Java Component Search System  SPARS-J

73Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

評価にあたっての問題点検索システムの順位付け性能評価

評価用のテストコレクションを用いれば容易ソフトウェア部品検索システムを対象としたテストコレクションは存在していない

言語,部品の単位等の違いにより,テストコレクションの構築が非常に困難

独自の評価実験を適用する

Page 74: Overview and Evaluation of Java Component Search System  SPARS-J

74Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験の内容1. 他の検索システムとの比較

– 一般的な検索システムとの比較により, SPARS-Jのソフトウェア部品検索システムとしての有効性を検証する

2. SPARS-J の各順位付け手法の比較– 各順位付け手法を比較し,最も妥当な順位付け手

法を調査する3. 実際の開発環境における SPARS-J の適用実験

– 企業において実際に利用してもらうことで,ソフトウェアの管理や理解に有用であることを確認する

Page 75: Overview and Evaluation of Java Component Search System  SPARS-J

75Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験1 .  他の検索システムとの比較

実験の目的一般的な検索システムとの比較により, SPARS-J のソフトウェア部品検索システムとしての有効性を検証する

比較対象Google : Web ページ検索システムで,様々な目的での検索に利用されるNamazu :信頼性の高い日本語全文検索システム

評価尺度適合率:検索結果のうちの適合部品数の割合

割合が高いほど,検索結果に利用できる部品が多く含まれる

Page 76: Overview and Evaluation of Java Component Search System  SPARS-J

76Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

適合率の対象検索結果全てを対象として適合率を求めても意味がない

適合部品数が同じであれば,上位にあっても下位にあっても同じ評価となってしまう

検索結果上位の部品について適合率を求めるべきである

Web ページ検索では,検索結果の最初の1ページ( 10 件)目に該当文書が見つからない場合,2ページ目を検索するよりは検索キーワードを変更する傾向がある†

検索結果の上位 10 件の部品に対する適合率を求める

†Amanda Spink, B. J. Jansen, D. Wolfram, T. Saracevic:”From E-Sex to E-Commerce: Web Search Changes” IEEE Computer,Vol.35,No.3,pp.107-109,Mar(2002).

Page 77: Overview and Evaluation of Java Component Search System  SPARS-J

77Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験1 .  他の検索システムとの比較

準備データベース

SPARS-J と Namazu に関しては, JDK およびWeb 上で公開されているソースコード(約 14万個のソースファイル)で構築Google は公開されている検索エンジンをそのまま用いる

検索キーワード簡単なシステムの開発を想定したクエリを 10個用意する

手順各検索システムの検索結果上位 10 件の部品を対象として,適合率を求める

Page 78: Overview and Evaluation of Java Component Search System  SPARS-J

78Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験1の結果

keyword SPARS-J Google Namazu

A 1 0.7 0.9

B 1 0.4 0.6

C 0.5 0.3 0.4

D 0.8 0.3 0.6

E 0.4 0.1 0.3

F 0.2 0 0.1

G 1 0.3 0.4

H 1 0.1 0.2

I 0.7 0.4 0.4

J 0.7 0.4 0.7

Ave. 0.73 0.3 0.46

各検索システムの適合率の比較

Page 79: Overview and Evaluation of Java Component Search System  SPARS-J

79Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験1の評価対応のある平均値の差の検定有意水準5%で以下の有意差が見られた

適合率SPARS-J ≫ Namazu ≫ Google

Page 80: Overview and Evaluation of Java Component Search System  SPARS-J

80Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験1の考察Google

Web ページ検索システムであり Java ソフトウェア部品に関係するページ以外のものも多く含まれているので,検索結果を絞りきれなかった

NamazuSPARS-J とデータベースは同じ

Google の結果と比較して,検索結果を絞ることができたと考えられる日本語全文検索システムでありソースコードの構文解析を行っておらず, Java ソースコードの特性を考慮していない

SPARS-J は他の検索システムより,ソフトウェア部品を検索するときに有用である

データベースによる差だと考えられる

順位付け性能による差だと考えられる

Page 81: Overview and Evaluation of Java Component Search System  SPARS-J

81Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2 .  SPARS-J の各順位付け手法の比較

実験目的各順位付け手法を比較し,最も妥当な順位付け手法を調査す

るSPARS-J では3種類の評価手法による順位付け機能を実現

1.利用関係に基づく順位付け手法2.キーワードの出現頻度に基づく順位付け手法3.1.2. の手法を統合した順位付け手法

評価尺度

 

Page 82: Overview and Evaluation of Java Component Search System  SPARS-J

82Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

1. 利用関係に基づく順位付け

Component Rank ( CR )法利用関係から部品重要度を評価し,順位付けする

多くの部品から利用されている部品は重要重要な部品から利用されている部品もまた重要

多く利用される部品や,重要な箇所で利用される部品に大きな評価値が与えられる

使用例が多く,汎用的な部品が上位に来る

Page 83: Overview and Evaluation of Java Component Search System  SPARS-J

83Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

2. キーワード出現頻度に基づく順位付け

Keyword Rank ( KR )法文書検索システムで用いられる TF-IDF 法を改変部品を特徴付ける索引キーに適当な重みを与える

部品に繰り返しあらわれる索引キー希少で,特定の部品に偏ってあらわれる索引キークラス定義名など部品を象徴するトークン種類の索引キー

重みの総和を評価値として順位付け

クエリと適合する部品が上位に来る

Page 84: Overview and Evaluation of Java Component Search System  SPARS-J

84Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

3.CR と KR を統合した順位付け

各順位付けの観点1. 部品の使用例の多さ,汎用さ

– CR 法2. クエリと部品内容の適合度

– KR 法

順位を統合して,両面で優れている部品を検索結果の上位に表示する部品検索部で行うため高速な方法が望ましい

Page 85: Overview and Evaluation of Java Component Search System  SPARS-J

85Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

3.CR と KR を統合した順位付け

Borda の手法順位に対して評価点を割り当て,その合計点をもとに昇順にソート例 ) 部品群 = { A, B, C, D, E }

以降, CR と KR を統合した順位付けを CR+KR と呼ぶことにする

CR

KR

1位 A D

2位 E C

3位 C A

4位 B B

5位 D E

CR

KR

合計点

A 1 3 4

B 4 4 8

C 3 2 5

D 5 1 6

E 2 5 7

統合順位1位 A

2位 C

3位 D

4位 E

5位 B

CR と KR の順位 各部品群の評価点 統合された順位

Page 86: Overview and Evaluation of Java Component Search System  SPARS-J

86Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2 .  SPARS-J の各順位付け手法の比較

実験目的各順位付け手法を比較し,最も妥当な順位付け手法を調査す

るSPARS-J では3種類の評価手法による順位付け機能を実現

1.利用関係に基づく順位付け手法( CR )2.キーワードの出現頻度に基づく順位付け手法( KR )3.1.2. の手法を統合した順位付け手法( CR+KR )

評価尺度適合率:検索結果のうちの適合部品数の割合

割合が高いほど,検索結果に利用できる部品が多く含まれる

ndpm 値:ユーザの順位付けとシステムの順位付けの違い値が小さいほど,理想的な順位付けと言える

Page 87: Overview and Evaluation of Java Component Search System  SPARS-J

87Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

ndpm 値の計算方法検索文書集合の全てのペア( d,d’ )における,システムとユーザで順位付けが異なるペア数( m )の割合

文書集合の要素数を n とすると

321

,, dddD 321ddd

ss

333.02/)13(3

1

ndpm

231ddd

uu

2C

mndpm

n

計算例

Page 88: Overview and Evaluation of Java Component Search System  SPARS-J

88Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2 .  SPARS-J の各順位付け手法の比較

準備データベース

(実験1と同じ) 14 万個のソースコード群から構築検索キーワード

(実験1と同じ) 10 個のクエリを用いる

手順各順位付け手法の検索結果上位 10 件の部品を対象として適合率を求める各順位付け手法による検索結果の全ての部品を対象として,ユーザの想定する理想的な順位付けとの ndpm 値求める

Page 89: Overview and Evaluation of Java Component Search System  SPARS-J

89Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2の結果

keyword CR KR

CR+KR

CR KRCR+K

R

A 1 1 1 0.036 0.048 0.037

B 1 1 1 0.194 0.261 0.221

C 0.5 0.5 0.5 0.133 0.117 0.092

D 0.4 0.9 0.8 0.123 0.200 0.189

E 0.4 0.4 0.4 0.208 0.192 0.194

F 0.2 0.2 0.2 0.184 0.184 0.160

G 0.9 1 1 0.081 0.103 0.080

H 1 0.8 1 0.047 0.109 0.052

I 0.6 0.7 0.7 0.210 0.324 0.267

J 0.5 0.7 0.7 0.219 0.243 0.114

Ave. 0.65 0.72 0.730.14

30.17

80.141

適合率 ndpm値

Page 90: Overview and Evaluation of Java Component Search System  SPARS-J

90Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2の評価対応のある平均値の差の検定有意水準5%で以下の有意差が見られた

適合率KR,CR+KR ≫ CR

ndpm 値CR,CR+KR ≫ KR

Page 91: Overview and Evaluation of Java Component Search System  SPARS-J

91Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験2の考察CR 法

全体的にユーザの想定した順位で並んでいる傾向がある

KR 法上位には適合部品が多く存在しやすい

CR+KR 法適合率と ndpm 値の両方で優れた順位付けを行うことができたCR 法と KR 法の両方で高順位になったものが上位に順位付けされていると考えられる

Page 92: Overview and Evaluation of Java Component Search System  SPARS-J

92Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験3 .  企業内のソフトウェアに対する適用実験実験目的

企業において実際に利用してもらうことで,ソフトウェアの管理や理解に有用であることを確認する

実験内容SPARS-J ついての定性的な評価

企業内のソフトウェア開発・保守を行っている従業員7名に対して, SPARS-J についてのアンケートを実施

Page 93: Overview and Evaluation of Java Component Search System  SPARS-J

93Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

アンケートの結果

A B C D E F G 最頻値パッケージブラウザの利用 4 5 5 5 4 3 3 5

同グループのクラスの参照 4 5 5 2 4 3 5,4

検出されたクラスを利用しているクラスの参照 5 5 5 5 5 5 5

検出されたクラスが利用しているクラスの参照 5 1 5 5 5 5 5

クラスのメトリクス値 1 4 1 2 4 5 4,1

ソースコードがダウンロード機能 1 3 5 5 2 5 5

時間的コストの削減 3 5 5 3 4 1 5,3

ソフトウェア品質の向上 5 3 3 3 4 1 3

企業内のソフトウェア把握 3 1 5 3 5 2 1 5,3,1

検索結果一覧表示の見やすさ 4 4 5 5 3 3 5 5

ハイライト表示の見やすさ 3 5 5 5 5 5 5 5

(良 5 4 3 2 1 悪)

Page 94: Overview and Evaluation of Java Component Search System  SPARS-J

94Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

アンケートの評価特に評価の高かった点パッケージブラウザの利用同グループの参照利用・被利用クラスの参照検索結果一覧・ハイライト表示の見やすさ

SPARS-J を用いることで可能になること部品を利用しているアプリの把握部品改訂時の影響範囲の調査

Page 95: Overview and Evaluation of Java Component Search System  SPARS-J

95Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

実験3の考察ソフトウェア部品の把握が容易となるということを意味しており,保守作業の支援に繋がっていると言えるその他の感想

検索速度が速く,ストレスを感じない前もって索引語を抽出しているため

クライアントにインストールする必要がなく,ソフトウェア部品を共有できる共有のデータベースを構築するためSPARS-J はソフトウェア部品の検索に有用であり,

また部品の把握や管理にも非常に役に立つシステムである

Page 96: Overview and Evaluation of Java Component Search System  SPARS-J

96Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

まとめと今後の課題ソフトウェア部品検索システム SPARS-J の実験的評価

他の検索システムとの比較Google と Namazu より優れている

SPARS-J の各順位付け手法の比較CR・ KR それぞれ特徴があり,それらを統合することで性能が向上した

企業内のソフトウェアに SPARS-J を適用した保守を行う時に非常に有効であった

今後の課題再現率の調査順位付け性能以外の観点からの定量的評価他のソフトウェア部品への対応

Page 97: Overview and Evaluation of Java Component Search System  SPARS-J

97Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

END

The End