A Framework for Examining Topical Locality in Object-Oriented Software
2012 IEEE International Conference on Computer Software and Applications
p76004546 江怡岑, P76004685 王于庭

OUTLINE
- Introduction
- Background and Related Work
- Framework
- Dataset and Experimental Procedure
- Static Analysis Results
- Conclusions

INTRODUCTION
- Program comprehension is a key developer activity during software maintenance.
- Topic models rely on lexical information to identify topics that are semantically related to high-level domain concepts.
  - LSI (latent semantic indexing)
  - LDA (latent Dirichlet allocation)

INTRODUCTION
- While topics reflect semantic relatedness, humans are believed to have evolved spatial cognition strategies, which developers use to navigate the code base.
- For object-oriented (OO) systems built on the principle of encapsulation, the entities should be spatially organized in a way that reflects the topics of the software.

INTRODUCTION
- The tenet of "topical locality": spatial relatedness entails semantic relatedness.
  - It is so basic that in many cases it is not mentioned.
  - When the tenet is mentioned, its validity is not measured explicitly.
- Our goal is to measure the extent to which this key tenet holds for OO systems.
- We propose a framework to examine to what extent three relationships of topical locality hold in large-scale open-source projects.

BACKGROUND and Related Work
- A. Way-finding in Code Base
- B. Relating Spatial and Semantic Cues
- C. Topical Locality Applied in Software Engineering Tools

BACKGROUND and Related Work
A. Way-finding in Code Base
- A developer comprehending a code base can be thought of as continually trying to answer way-finding questions.
- Moonen has examined way-finding in software and extended the concept of legibility to software.

BACKGROUND and Related Work
B. Relating Spatial and Semantic Cues
- We are interested in the interplay of different cues so that they can be effectively synthesized.
- We focus on the relationship between two types of cues: spatial and semantic.
- Spatial + Semantic = "topical locality"
  - Software entities should be neither randomly named nor randomly placed.
  - Source code entities should be spatially organized to reflect the semantics of the software.

BACKGROUND and Related Work
C. Topical Locality Applied in Software Engineering Tools
- The idea of topical locality plays an important role in building a number of software engineering tools.
- We survey three kinds of tools:
  - Code indexers
  - Code visualizers
  - Code summarizers

BACKGROUND and Related Work
Code Indexers
- An indexer takes source code and generates profiles of the code for later searching.
- Should header comments be indexed?
- We want to address how well the name and header comments represent the target code entity's topic.

BACKGROUND and Related Work
Code Visualizers
- Once a relevant code line is located, its surroundings provide valuable contextual information for the developer.
- Examining the topical locality of a contiguous fragment allows us to assess to what extent the code line indicates the topic of its surroundings.

BACKGROUND and Related Work
Code Summarizers
- A summarizer generates a snapshot of the source code to reduce the cost for developers of reading and understanding the staggering amount of software repository information.
- Our contribution is to measure the degree of topical locality of the snapshot.

FRAMEWORK: overview
[Figure: framework overview]

FRAMEWORK: research questions
- RQ1: Which better conveys the class body's topic: the class name, the header comments, or a combination of both?
- RQ2: Can a code line indicate the topic of its surroundings?
- RQ3: Can a contiguous code fragment serve as a snapshot of the entire class?

FRAMEWORK: method
- The independent variables are concerned with identifying spatial relationships.
- The dependent variable is the semantic relatedness, captured by three measures:
  - TFIDF cosine similarity
  - Query term probability
  - Document overlap
- We treat source code as documents; each measure outputs a score in the range [0, 1].

FRAMEWORK: three measures (1/3)
TFIDF scheme (text mining model)
- $q_i = tf_i(Q) \times idf_i$
- $w_i = tf_i(W) \times idf_i$
- $tf_i$ refers to the term frequency of $term_i$.
- $idf_i$ is the inverse document frequency, $idf_i = \log_2(t+1/df_i)$, where $t$ is the total number of documents in the corpus and $df_i$ is the number of documents in which $term_i$ occurs.
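
A minimal Python sketch of the TFIDF cosine measure, under stated assumptions: documents are already tokenized into term lists, and the idf is read as log2((t + 1) / df_i). That grouping is an assumption, since the slide's linear notation leaves it open. The function names `idf_table`, `tfidf_vector`, and `cosine` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Map each term to its idf, assuming idf_i = log2((t + 1) / df_i)."""
    t = len(corpus)                      # total number of documents
    df = Counter()
    for doc in corpus:
        df.update(set(doc))              # count each term once per document
    return {term: math.log2((t + 1) / n) for term, n in df.items()}


def tfidf_vector(doc, idf):
    """Weight each term of a document by tf_i * idf_i."""
    tf = Counter(doc)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(q, w):
    """Cosine similarity of two sparse weight vectors; 0.0 if either is empty."""
    dot = sum(weight * w.get(term, 0.0) for term, weight in q.items())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_w = math.sqrt(sum(v * v for v in w.values()))
    if norm_q == 0 or norm_w == 0:
        return 0.0
    return dot / (norm_q * norm_w)


# Toy corpus (made up for illustration only).
corpus = [
    ["parse", "token", "stream"],
    ["render", "widget", "paint"],
    ["parse", "token", "error"],
]
idf = idf_table(corpus)
q = tfidf_vector(corpus[0], idf)
w = tfidf_vector(corpus[2], idf)
r = tfidf_vector(corpus[1], idf)
```

Because every weight is non-negative under this idf, the cosine stays in [0, 1], which matches the score range the framework expects.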

FRAMEWORK: three measures (2/3)
Query term probability
- Measures the likelihood of a term in the query/source being present in the target document.
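
The slide gives no formula for this measure. One simple reading, offered only as an assumption, is the fraction of query term occurrences that also appear in the target document. The function name `query_term_probability` is hypothetical.

```python
def query_term_probability(query_terms, target_terms):
    """Fraction of query term occurrences found in the target document.

    Assumed reading of the measure; the slides do not give the exact formula.
    """
    if not query_terms:
        return 0.0
    target = set(target_terms)
    hits = sum(1 for term in query_terms if term in target)
    return hits / len(query_terms)


# Toy example: two of the three query terms occur in the target.
score = query_term_probability(
    ["user", "account", "id"],
    ["account", "balance", "user"],
)
```

Under this reading the score is asymmetric: swapping query and target can change the result, which fits the query/target wording on the slide.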

FRAMEWORK: three measures (3/3)
Document overlap
- A set-based measure that quantifies the amount of overlap between two documents Q and W.
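
The slide does not say which set-based formula is used. A common choice that yields a score in [0, 1] is the Jaccard coefficient, |Q ∩ W| / |Q ∪ W|; the sketch below assumes it, and the function name `document_overlap` is illustrative.

```python
def document_overlap(q_terms, w_terms):
    """Jaccard coefficient of the two documents' term sets (assumed formula)."""
    q, w = set(q_terms), set(w_terms)
    union = q | w
    if not union:
        return 0.0                       # two empty documents share nothing
    return len(q & w) / len(union)


# Toy example: two shared terms out of four distinct terms overall.
score = document_overlap(["a", "b", "c"], ["b", "c", "d"])
```

Unlike the query term probability, this measure ignores term frequency and is symmetric in Q and W.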

Dataset and Experimental Procedure
- LOC: lines of code
- COM: lines of comments
- CCs: number of classes

Dataset and Experimental Procedure
- Use a source code indexer to process the code base of the selected projects.
- The indexing process results in profiles that store partial and important information from the source code.
- We calculate the three semantic relatedness measures (TFIDF-Cos, Prob, and Overlap) based on the profiles.

RQ1
- Can the class name (N) and/or header comment (H) convey the topic of the class body (B)?
- Calculate the lexical similarity for (N, B), (H, B), and (NH, B).
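
The three comparisons can be sketched as follows. This is an illustration under assumptions, not the study's indexer: identifiers are split on camel case, no stemming or stop-word removal is applied, and a Jaccard overlap stands in for the similarity measure. The names `split_identifier`, `tokenize`, `jaccard`, and `rq1_scores` are hypothetical.

```python
import re


def split_identifier(word):
    """Split a camel-case identifier into its parts, e.g. TextParser -> Text, Parser."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)


def tokenize(text):
    """Lower-cased terms from all alphabetic words in the text."""
    return [
        part.lower()
        for word in re.findall(r"[A-Za-z]+", text)
        for part in split_identifier(word)
    ]


def jaccard(q_terms, w_terms):
    """Stand-in similarity: shared distinct terms over all distinct terms."""
    q, w = set(q_terms), set(w_terms)
    return len(q & w) / len(q | w) if (q | w) else 0.0


def rq1_scores(class_name, header_comment, body, similarity=jaccard):
    """Similarity of name (N), header (H), and both (NH) to the class body (B)."""
    n, h, b = tokenize(class_name), tokenize(header_comment), tokenize(body)
    return {
        "N-B": similarity(n, b),
        "H-B": similarity(h, b),
        "NH-B": similarity(n + h, b),
    }


# Toy class (made up for illustration only).
scores = rq1_scores(
    "TextParser",
    "Parses order lines",
    "Order parse(String text) { return new Order(text); }",
)
```

Any of the three measures can be passed as `similarity`; NH is formed here by simply concatenating the name terms and the header terms.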

RQ2
- Can a code line indicate the topic of its surroundings?
- For a randomly selected code line (L), we take a contiguous code fragment of 30 lines as its surroundings (S) and select from the same file another 30-line contiguous code fragment (R).
- Compare the lexical similarity of (L, S) with that of (L, R).
- Only classes with at least 70 LOC are considered.
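
A sketch of the sampling step, with several assumptions the slide does not settle: S is the 30-line window roughly centered on L (clamped to the file bounds), R must not overlap S, and L is drawn only from lines for which such an R exists. The input is assumed to have empty lines already removed. The name `sample_fragments` is hypothetical.

```python
import random

WINDOW = 30    # fragment length from the slide
MIN_LOC = 70   # minimum class size from the slide


def sample_fragments(lines, rng):
    """Return (L, S, R) for one class, or None if the class is too small."""
    n = len(lines)
    if n < MIN_LOC:
        return None

    def s_start(i):
        # Window roughly centered on line i, clamped to the file bounds.
        return min(max(i - WINDOW // 2, 0), n - WINDOW)

    def r_starts(s):
        # Start positions of windows that do not overlap [s, s + WINDOW).
        return [r for r in range(n - WINDOW + 1)
                if r + WINDOW <= s or r >= s + WINDOW]

    # Assumption: only pick lines whose surroundings leave room for R.
    eligible = [i for i in range(n) if r_starts(s_start(i))]
    i = rng.choice(eligible)
    s = s_start(i)
    r = rng.choice(r_starts(s))
    return lines[i], lines[s:s + WINDOW], lines[r:r + WINDOW]


# Toy file of 80 distinct lines (made up for illustration only).
toy_lines = [f"line {k}" for k in range(80)]
L, S, R = sample_fragments(toy_lines, random.Random(0))
```

The similarity of (L, S) and of (L, R) can then be computed with any of the three measures and compared across many sampled classes.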

RQ3
- Can a contiguous code fragment serve as a snapshot of the entire class?
- From a code search perspective, the lexical similarity of the snapshot should indicate the topical closeness of the classes.
- Randomly select a term w ('data' in Fig. 4) to act as the query keyword. The snapshot is extracted as a 30-line contiguous code fragment.
- Only classes with at least 60 LOC are considered.

Static Analysis Results
- RQ1: Name vs. Header
- RQ2: Code Line and Surroundings
- RQ3: Contiguous Fragment as a Snapshot
- Threats to Validity

RQ1: Name vs. Header
- NH is the closest to B in most cases, except for MegaMek when measured by TFIDF, where N-B is larger than H-B and NH-B.
- => MegaMek classes do not have useful header comments.

RQ1: Name vs. Header
- Least Significant Difference (LSD) multiple comparison test: the test places combinations that differ significantly from the others in separate groups and allocates the best combination to 'group A'.
- The result classifies NH-B into 'group A', indicating that the similarity score of NH-B is significantly higher than those of N-B and H-B.
- We conclude that if the class contains useful header comments, then it is important to combine the header comments with the class name in order to convey the topic of the class body.

RQ2: Code Line and Surroundings
- A code line indicates the topic of its surroundings more than it indicates the topic of a random code fragment.

RQ3: Contiguous Fragment as a Snapshot
- We calculate the Pearson correlation coefficient, which is a parametric statistic that shows the correlation between two variables.
- From the viewpoint of distinguishing the topics of different classes, a contiguous code fragment can serve as a snapshot of the entire class.
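
For reference, the Pearson coefficient can be computed as below. The slides do not spell out which two variables are paired; the sketch assumes one score per class computed from the snapshot and one from the whole class. The numbers are illustrative only, not results from the study, and the function name `pearson` is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


# Illustrative scores only (hypothetical pairing, not data from the study).
snapshot_scores = [0.10, 0.25, 0.40, 0.55]
class_scores = [0.15, 0.30, 0.35, 0.60]
r = pearson(snapshot_scores, class_scores)
```

A coefficient near 1 would mean that classes ranked as topically close by their snapshots are also ranked close by their full text.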

Threats to Validity
- Construct validity: the selection of a 30-line contiguous, non-empty, comment-inclusive code fragment for addressing RQ2 and RQ3.
  - Empty lines contribute little spatial or semantic information.
  - Including all comments is a choice influenced by RQ1.
- Internal validity: using three measures derived from different mathematical models diminishes the measurement bias.
- External validity: this analysis may not generalize to other software projects.

Conclusions
- In this paper, we contributed a novel experimental framework for testing the tenet of "topical locality" and applied the framework to provide empirical evidence of topical locality in large-scale OO systems.
- Our future work includes carrying out more empirical studies to examine other instances of topical locality.
- It is important to integrate the theoretical understanding and empirical findings to enhance practical tool support for software developers.