Component Search and Retrieval
Advanced Reuse Seminars
Eduardo Cruz
Information Retrieval - 1948
Structured Documents
Unstructured Documents
No software documentation standard
Semi-Structured Documents
Calvin Northrup Mooers
Mooers' Law: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it,” 1959
Mass Produced Software Components
“software industry is weakly founded, and that one aspect of this weakness is the absence of a software components subindustry”
[McIlroy, 1968]
“The storage and retrieval of software assets is nothing but a specialized form of information storage and retrieval”
[Mili, 1998]
Software Library
Browsing – Inspecting without a predefined criterion
Retrieval – Satisfying a predefined matching criterion
Classification Scheme
Facet-based
Better than hierarchical classification
Manual classification along different facets
Automatic classification
Controlled Vocabulary – semantic information
Uncontrolled Vocabulary – big software libraries, little or no descriptors
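A faceted scheme with a controlled vocabulary can be sketched as follows. The facet names (function, object, language), their allowed terms and the asset names are hypothetical; the sketch only shows that classification rejects terms outside the vocabulary and that retrieval matches facet by facet.

```python
# Hypothetical facets and controlled vocabulary, for illustration only.
VOCABULARY = {
    "function": {"sort", "search", "parse"},
    "object": {"array", "list", "file"},
    "language": {"java", "c"},
}


def classify(**facets):
    """Return a facet descriptor, rejecting terms outside the controlled vocabulary."""
    for facet, term in facets.items():
        if term not in VOCABULARY.get(facet, set()):
            raise ValueError(f"{term!r} is not a controlled term for facet {facet!r}")
    return dict(facets)


def retrieve(library, **query):
    """Names of assets whose descriptors agree with every facet given in the query."""
    return sorted(name for name, desc in library.items()
                  if all(desc.get(f) == t for f, t in query.items()))


library = {
    "QuickSort": classify(function="sort", object="array", language="java"),
    "ListSorter": classify(function="sort", object="list", language="java"),
    "CsvReader": classify(function="parse", object="file", language="c"),
}
```

Querying only the facets one cares about (for example `retrieve(library, function="sort")`) leaves the other facets unconstrained, which is one reason faceted schemes are more flexible than a single hierarchy.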
Recall and Precision
High Precision – Most retrieved elements are relevant
High Recall – Few relevant elements are left behind
Spreading Activation (Relaxed Search) – Related matches are also retrieved
Coverage – The average number of assets visited, over the total size of the library
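The three measures can be computed as below. This is a minimal sketch assuming assets are identified by name and that the number of assets visited per query has been logged; the sets and visit counts in the usage are invented.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved assets that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant assets that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def coverage(visited_per_query, library_size):
    """Average number of assets visited per query, divided by the library size."""
    return sum(visited_per_query) / len(visited_per_query) / library_size


retrieved = {"a", "b", "c", "d"}
relevant = {"a", "b", "e"}
```

Here half of the retrieved assets are relevant (precision 0.5) and two of the three relevant assets were found (recall about 0.67). Relaxed search typically trades some precision for recall.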
Asset Representation
Library representation is made in full knowledge of the artifact. User representation is made in ignorance of the artifact
Asset representation is purposefully abstract, to capture important features while overlooking minor or irrelevant details
The retrieval literature calls this representation the asset's surrogate
Asset Retrieval Goals
Exact retrieval – Black box reuse
Approximate retrieval – White box reuse
Generative modification – Reusing the design
Compositional modification – Using building blocks of the retrieved asset
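To contrast exact and approximate retrieval, here is a sketch that assumes each asset is described by a set of keywords (hypothetical data). Exact retrieval is approximated as "the descriptor contains every query term" and approximate retrieval as ranking by Jaccard similarity above a threshold; both choices are illustrative and are not taken from any of the surveyed tools.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def exact_retrieve(library, query):
    """Assets whose descriptor contains every query term (black-box reuse candidates)."""
    return sorted(name for name, terms in library.items() if query <= terms)


def approximate_retrieve(library, query, threshold=0.3):
    """Assets ranked by descriptor similarity (candidates to adapt, white-box reuse)."""
    scored = [(jaccard(terms, query), name) for name, terms in library.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Hypothetical asset descriptors.
library = {
    "quicksort": {"sort", "array", "int"},
    "mergesort": {"sort", "list"},
    "bsearch": {"search", "array", "int"},
}
```

For the query `{"sort", "array"}` only `quicksort` is an exact match, while approximate retrieval also returns `mergesort` (similarity 1/3) because it shares the term `sort`.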
Information usually not included
Interface description
Non-functional requirements
Interoperability
Situational Model vs. System Model
Component retrieval model [Lucrédio et al., 2004]
“Repository representation is made in full knowledge of the artifact at hand”
“User representation is made in ignorance of the artifact”
[Mili, 1998]
Scott Henninger
Tools
Component Search Tools
Web: Delphi Search Engine, Ispey, CSourceSearch.net (2004), Gonzui, SourceBank, Koders (2004), Codase (2005)
Applications: Agora (1998), CodeBroker (2002), Koders Enterprise (2004), Maracatu (2005)
Delphi Search Engine
Ispey.com
SPARS-J – (2003)
Filter
SourceBank
Filter
CSourceSearch.Net – (2004)
Koders.com – (2004)
CODASE – Launched Sep 9, 2005
Example Searches
Browsing
Multiple Search Options
“…based on the number of people in your company, starting from $5,000 USD”
CODASE - Browsing
Other Tools
AGORA - Location and Indexing (1998)
[Figure: Agora location and indexing architecture: JavaBeans on the Internet, JavaBeans Agents, JavaBeans Introspectors, AltaVista Search Index Server with Filter, INDEX, AltaVista Query Server, Web Server]
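Agora reads component interfaces through introspection before indexing them [Seacord, 1998]. The sketch below mimics that idea in Python, using the standard `inspect` module as a stand-in for the JavaBeans Introspector and an in-memory inverted index instead of AltaVista; the component classes are hypothetical.

```python
import inspect
from collections import defaultdict


def interface_terms(cls):
    """Public method names found by introspection (stand-in for the JavaBeans Introspector)."""
    return {name.lower() for name, _ in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")}


def build_index(components):
    """Inverted index: interface term -> names of the components that expose it."""
    index = defaultdict(set)
    for cls in components:
        for term in interface_terms(cls):
            index[term].add(cls.__name__)
    return index


def search(index, *terms):
    """Components whose interfaces contain every query term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*hits) if hits else set()


# Hypothetical components standing in for harvested JavaBeans.
class Stack:
    def push(self, item): ...
    def pop(self): ...


class Logger:
    def log(self, message): ...
    def close(self): ...


class FileWriter:
    def write(self, data): ...
    def close(self): ...


index = build_index([Stack, Logger, FileWriter])
```

Indexing the interface rather than free text means a query such as `search(index, "close")` finds every component exposing that operation.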
Component Rank (2003)
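[Figure: example component graph G with nodes v, edges e, weights w and distribution ratios d]
$$d_{12} = 0.5,\quad d_{13} = 0.5,\quad d_{23} = 1,\quad d_{31} = 1$$
$$w(v_1) = 0.4,\quad w(v_2) = 0.2,\quad w(v_3) = 0.4$$
$$w(e_{12}) = w(e_{13}) = w(e_{23}) = 0.2,\quad w(e_{31}) = 0.4$$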
“Classes defining data structures and their containers are highly ranked”
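The example weights can be reproduced by repeated propagation. This sketch assumes the update rule described for Component Rank [Inoue, 2003]: an edge carries w(e_ij) = d_ij · w(v_i), and a node's new weight is the sum of its incoming edge weights, repeated until the values settle. It also assumes each node's outgoing ratios sum to 1, as in the example graph.

```python
def component_rank(ratios, n_nodes, max_iters=1000, tol=1e-12):
    """Propagate node weights along use edges until they stabilise.

    ratios: {(i, j): d_ij}, the share of node i's weight sent to node j.
    Returns (node_weights, edge_weights).
    """
    w = [1.0 / n_nodes] * n_nodes  # start from a uniform distribution
    for _ in range(max_iters):
        new_w = [0.0] * n_nodes
        for (i, j), d in ratios.items():
            new_w[j] += d * w[i]  # w(v_j) = sum over incoming edges of d_ij * w(v_i)
        delta = max(abs(a - b) for a, b in zip(w, new_w))
        w = new_w
        if delta < tol:
            break
    edge_w = {(i, j): d * w[i] for (i, j), d in ratios.items()}
    return w, edge_w


# The example graph, with v1, v2, v3 numbered 0, 1, 2.
ratios = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}
weights, edge_weights = component_rank(ratios, 3)
```

The fixed point satisfies w(v_1) = w(v_3), w(v_2) = 0.5 · w(v_1), and the weights sum to 1, which gives 0.4, 0.2 and 0.4.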
Clustered Component Graph
[Figure: component graph with nodes V1 to V7, where V1 ≡ V4 and V2 ≡ V6; equivalent nodes are merged into clusters V’14 and V’26, alongside V’3, V’5 and V7]
No more multiple disconnected components
Component Rank System Architecture
Input: .java files (.java file ≡ component)
(1) Similarity Measurement
(2) Clustering
(3) Use Relation Extraction
(4) Component Graph Construction
(5) Component Rank Computation by Repetition
(6) De-Clustering to Original Component Graph
Output: order of weights ≡ Component Rank of .java files
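Steps (4) to (6) can be sketched as below, assuming steps (1) to (3) have already produced the use relations and a cluster id per file. The equal split of a cluster's weight over its outgoing edges, the file names and the cluster ids are assumptions for illustration, and the cluster graph is assumed to be strongly connected (a cluster with no outgoing edge would leak weight).

```python
from collections import defaultdict


def rank_with_clustering(uses, cluster_of, max_iters=1000, tol=1e-12):
    """uses: iterable of (user_file, used_file); cluster_of: file -> cluster id."""
    # (4) Graph over clusters: duplicate relations collapse, intra-cluster uses are dropped.
    cluster_edges = {(cluster_of[a], cluster_of[b]) for a, b in uses
                     if cluster_of[a] != cluster_of[b]}
    clusters = set(cluster_of.values())
    out = defaultdict(list)
    for i, j in cluster_edges:
        out[i].append(j)
    # (5) Repetition: each cluster splits its weight equally over its outgoing edges.
    w = {c: 1.0 / len(clusters) for c in clusters}
    for _ in range(max_iters):
        new_w = {c: 0.0 for c in clusters}
        for i, targets in out.items():
            for j in targets:
                new_w[j] += w[i] / len(targets)
        delta = max(abs(new_w[c] - w[c]) for c in clusters)
        w = new_w
        if delta < tol:
            break
    # (6) De-clustering: every file inherits the weight of its cluster.
    return {f: w[c] for f, c in cluster_of.items()}


# Hypothetical files: A_copy is a copy of A, so both fall in cluster K1.
uses = [("A", "B"), ("A", "C"), ("A_copy", "B"), ("A_copy", "C"),
        ("B", "C"), ("C", "A"), ("C", "A_copy")]
cluster_of = {"A": "K1", "A_copy": "K1", "B": "K2", "C": "K3"}
weights = rank_with_clustering(uses, cluster_of)
```

Because A and A_copy are merged before the weights are computed, the copy does not inflate their rank: both simply inherit the cluster weight of 0.4, while B gets 0.2 and C gets 0.4.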
Simple Copied Components
[Figure: copied components A and B appear twice next to other components X and Y in the non-clustered component graph. Clustering before weight computation gives A’, B’, X’ and Y’ a weight of 1/4 each; clustering after weight computation gives 1/3, 1/3, 1/6 and 1/6]
Do not count simply duplicated components
Copied and Modified Components
[Figure: copied and modified components (A, B, A, C) next to other components X and Y in the non-clustered component graph, and the original components in the clustered graphs (A’, B’, C’, X’, Y’); weights of 1/5 and 2/5 in one case and 1/6 and 1/3 in the other compare when clustering is applied relative to weight computation]
Beyond Searching and Browsing
Searching and browsing require users to initiate the information-seeking process
Information Access and Information Delivery
CodeBroker – (2001)
Component repositories are often so large that software developers cannot learn about all of the components
Component repositories are not static: new components are added and old components are updated
Context-Aware Browsing
May not have sufficient knowledge about the reuse repository
May perceive that reuse costs more than developing from scratch
May not be able to use the repository by formulating a proper query
May not be able to understand the found components
[Figure: levels of a developer's knowledge of the repository (L1: Well Known, L2: Vaguely Known, L3: Belief, L4: Entire Information Space), showing unknown components, information islands and CodeBroker]
Information Use:
L1 – Use by Memory
L2 – Use by Recall
L3 – Use by Anticipation
L4 – Use by Delivery
Already Known Components
Irrelevant Components
Task Relevant Information
Program Aspects
Concept and Code, each with Formal and Informal information
Informal: indentation, comments, identifier names (semantic)
Formal: executability, signature, constraint environment
Information Delivery
Feedback – after execution of the action
Feedforward – affects the execution of the action
Interruptive vs. non-interruptive
Latent Semantic Analysis (LSA)
Synonymy, Polysemy
“Text documents and queries are represented as vectors in the semantic space, based on the words contained and the similarity between a query and a document is determined by the distance of their respective vectors”
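LSA can be sketched in pure Python on a toy corpus. This assumes raw term counts (no tf-idf weighting), whitespace tokenization, and a truncated SVD obtained by power iteration on AᵀA; a real system would use a numerical library. Documents sit at the rows of V_k, the query is folded in as Σ_k⁻¹ U_kᵀ q, and similarity is the cosine between the two vectors. The corpus is invented.

```python
import math
from collections import Counter


def term_doc_matrix(docs):
    """Raw term-frequency matrix A (rows = terms, columns = documents)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    counts = [Counter(toks) for toks in tokenized]
    return vocab, [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(g, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(g)
    g = [row[:] for row in g]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start vector
        lam = 0.0
        for _step in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v, lam = [x / norm for x in w], norm
        if lam < 1e-9:
            break
        pairs.append((lam, v))
        for i in range(n):  # deflate so the next pass finds the next eigenvector
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(a, b):
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def lsa_rank(docs, query, k=2):
    """Rank documents against a query in a k-dimensional latent semantic space."""
    vocab, a = term_doc_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # A^T A (documents x documents): its eigenvectors are the right singular vectors V.
    gram = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    sigmas = [math.sqrt(lam) for lam, _ in pairs]
    vs = [v for _, v in pairs]
    # Left singular vectors u_i = A v_i / sigma_i.
    us = [[sum(a[t][j] * v[j] for j in range(n_docs)) / s for t in range(n_terms)]
          for v, s in zip(vs, sigmas)]
    # Document j maps to row j of V_k; the query is folded in as Sigma_k^-1 U_k^T q.
    doc_vecs = [[v[j] for v in vs] for j in range(n_docs)]
    q = Counter(query.lower().split())
    q_vec = [sum(u[t] * q[term] for t, term in enumerate(vocab)) / s
             for u, s in zip(us, sigmas)]
    scores = [cosine(q_vec, d) for d in doc_vecs]
    return sorted(range(n_docs), key=lambda j: -scores[j]), scores


docs = ["sort array elements", "order list items sort", "open network socket connection"]
ranking, scores = lsa_rank(docs, "sort")
```

With k = 2 on this corpus, both sorting documents score close to 1 against the query `sort`, and the networking document scores about 0.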
[Figure: CodeBroker architecture with comments, signature, discourse model and user model]
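The filtering role of the two models can be sketched as below. It assumes, following the "Already Known Components" and "Irrelevant Components" labels above, that the discourse model records components the developer has dismissed in the current session and that the user model records components the developer already knows. The class, its methods and the candidate ranking are hypothetical.

```python
class DeliveryFilter:
    """Hypothetical filter in the spirit of CodeBroker's discourse and user models."""

    def __init__(self, known=()):
        self.known = set(known)  # user model: components the developer already knows
        self.dismissed = set()   # discourse model: components marked irrelevant this session

    def dismiss(self, component):
        self.dismissed.add(component)

    def mark_known(self, component):
        self.known.add(component)

    def deliver(self, ranked_candidates, limit=3):
        """Keep the similarity order, drop known and dismissed components, cap the list."""
        return [c for c in ranked_candidates
                if c not in self.known and c not in self.dismissed][:limit]


# Hypothetical ranking produced from the comments and signature being written.
candidates = ["java.util.Vector.add", "java.util.Collections.sort",
              "java.util.Arrays.sort", "java.util.Random.nextInt"]
f = DeliveryFilter(known={"java.util.Vector.add"})
f.dismiss("java.util.Random.nextInt")
```

Keeping delivery short and free of known components matters because delivered information is pushed to the developer rather than requested.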
Koders Enterprise – (2004)
M.A.R.A.C.A.T.U. – Modern Architecture for Retrieving All Components At The Universe (2005)
Using Structural Context to Recommend Source Code Examples
Reid Holmes and Gail C. Murphy
University of British Columbia
Software Practices Lab
The Problem: A Concrete Example
Frameworks can improve developer productivity. But developers can become stuck trying to use the APIs
Imagine trying to use the Eclipse APIs to place text in the status line of the Eclipse IDE
Eclipse has 38,000 public methods
Structural Context
[Figure: Project, Repository, Development Environment, Examples]
Strathcona: Extract Structural Context
[Figure: extracted structural context: ViewPart, SampleView, setMessage(String), IStatusLineManager.setMessage(String)]
Visual representation highlights key relationships between example and query
Multiple examples can be quickly viewed
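A simplified version of structural matching is sketched below. It assumes a context is three sets of facts (supertypes, calls, referenced types) and ranks examples first by how many kinds of relation they share with the query, then by total overlap. This is loosely in the spirit of Strathcona's heuristics [Holmes, 2005], not a reproduction of them; all class and method strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StructuralContext:
    """Structural facts about a class: supertypes, methods it calls, types it references."""
    inherits: set = field(default_factory=set)
    calls: set = field(default_factory=set)
    uses: set = field(default_factory=set)


def score(query, example):
    """(number of relation kinds with any overlap, total number of shared facts)."""
    overlaps = [len(query.inherits & example.inherits),
                len(query.calls & example.calls),
                len(query.uses & example.uses)]
    return sum(1 for o in overlaps if o), sum(overlaps)


def recommend(query, examples, limit=3):
    """Names of examples sharing any structure with the query, best matches first."""
    scored = [(score(query, ex), name) for name, ex in examples.items()]
    scored = [(s, name) for s, name in scored if s[1] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:limit]]


query = StructuralContext(inherits={"ViewPart"},
                          calls={"IStatusLineManager.setMessage(String)"})
examples = {
    "StatusView": StructuralContext(inherits={"ViewPart"},
                                    calls={"IViewSite.getActionBars()",
                                           "IStatusLineManager.setMessage(String)"}),
    "TreeView": StructuralContext(inherits={"ViewPart"}, uses={"TreeViewer"}),
    "HttpClient": StructuralContext(calls={"Socket.connect()"}),
}
```

An example that matches on both inheritance and calls outranks one that only shares a supertype, and examples with no shared structure are not shown at all.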
Strathcona: Example Navigation
Strathcona: Viewing Example Source
Code view
Example shows how to get a status line manager
Example is not a perfect match, but good enough to help
Conclusion
Information Delivery
Similarity Analyser
Ranking – Metrics
Context
Automatic Facet Classification
Uncontrolled vocabulary + additional terms
References
[McIlroy, 1968] M. D. McIlroy, Mass Produced Software Components, NATO Software Engineering Conference Report, Garmisch, Germany, October 1968, pp. 79-85.
[Mili, 1998] A. Mili, R. Mili, R. T. Mittermeir, A survey of software reuse libraries, Annals of Software Engineering, Vol. 5, 1998, pp. 349-414
[Seacord, 1998] Robert C. Seacord, Scott A. Hissam, Kurt C. Wallnau. "Agora: A Search Engine for Software Components," IEEE Internet Computing, vol. 02, no. 6, pp. 62-70, November/December, 1998
[Szyperski, 1999] Szyperski C., “Component Software: Beyond Object-Oriented Programming”. Addison Wesley, 1999
[Dey, 2001] Dey, A. Understanding and Using Context. Personal Ubiquitous Comput. 5, 1 (Jan. 2001)
[Greengrass, 2001] Greengrass, Ed. Information retrieval: A survey. DOD Technical Report TR-R52-008-001, 2001
[Ye, 2001] Ye, Y. and Fischer, G. Context-Aware Browsing of Large Component Repositories. In Proceedings of the 16th IEEE international Conference on Automated Software Engineering (November 26 - 29, 2001). ASE. IEEE Computer Society, Washington, DC, 99.
[Ye, 2002a] Ye, Y. and Fischer, G. Information delivery in support of learning reusable software components on demand. In Proceedings of the 7th International Conference on Intelligent User Interfaces, California, USA, 2002
[Ye, 2002b] Ye, Y. and Fischer, G. Supporting Reuse by Delivering Task-Relevant and Personalized Information. In Proceedings of the 24th International Conference on Software Engineering, pp. 513-523, Orlando, Florida, May 2002
Bibliography
[Inoue, 2003] K. Inoue et al.: "Component Rank: Relative Significance Rank for Software Component Search", Proceedings of ICSE 2003
[Maxville, 2003] Valerie Maxville, Chiou Peng Lam, Jocelyn Armarego. "Selecting Components: a Process for Context-Driven Evaluation," 10th Asia-Pacific Software Engineering Conference (APSEC'03), p. 456, 2003
[Maxville, 2004] Valerie Maxville, Jocelyn Armarego, Chiou Peng Lam. "Intelligent Component Selection," 28th Annual International Computer Software and Applications Conference (COMPSAC'04), pp. 244-249, 2004
[Lucrédio, 2004] Lucrédio, D.; Almeida, E. S.; Prado, A. F. A Survey on Software Components Search and Retrieval. In the 30th IEEE EUROMICRO Conference, Component-Based Software Engineering Track, Rennes, France, 2004. IEEE Press, 2004
[Holmes, 2005] Holmes, R. and Murphy, G. C. 2005. Using structural context to recommend source code examples. In Proceedings of the 27th international Conference on Software Engineering (St. Louis, MO, USA, May 15 - 21, 2005). ICSE '05
“Imperfect technology in a working market is sustainable; perfect technology without any market will vanish”
[Szyperski, 1999]