combining formal concept analysis with information ...denys/pubs/talks/icpc07.fca.lsi.pdf ·...
TRANSCRIPT
![Page 1: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/1.jpg)
Combining Formal Concept Analysis Combining Formal Concept Analysis with Information Retrieval for Concept with Information Retrieval for Concept
Location in Source CodeLocation in Source Code
Denys Poshyvanyk and Andrian Marcus
SEVERE Group @
ICPC 2007, Banff, Alberta
![Page 2: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/2.jpg)
ICPC 2007, Banff, Alberta
Incremental Change of Software
![Page 3: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/3.jpg)
ICPC 2007, Banff, Alberta
State of the Art in Concept Location
• Static– Dependency based search [Rajlich’00]– Information Retrieval based methods [Marcus’04]
• Dynamic– Execution traces - Reconnaissance [Wilde’92]– Scenario based probabilistic ranking [Antoniol’06]
• Combined – FCA + Execution Traces [Eisenbarth’03][Tonella’04]– IR + Execution Traces [Poshyvanyk’07]– …
![Page 4: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/4.jpg)
ICPC 2007, Banff, Alberta
IRiSS
JIRiSS GES
![Page 5: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/5.jpg)
ICPC 2007, Banff, Alberta
![Page 6: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/6.jpg)
ICPC 2007, Banff, Alberta
![Page 7: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/7.jpg)
ICPC 2007, Banff, Alberta
Concept Location with Concept Lattices
1. Creating a corpus of a software system2. Indexing with Latent Semantic Indexing3. Formulating a query4. Ranking methods5. Selecting words6. Clustering with Formal Concept Analysis7. Examining results
![Page 8: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/8.jpg)
Creating a corpus of a software system
• Parsing source code and extracting documents– corpus – collection of documents (e.g., methods)
• Removing non-literals and stop words– common words in English, standard function library
names, programming language keywords
• Preprocessing: split_identifiers and SplitIdentifiers
![Page 9: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/9.jpg)
ICPC 2007, Banff, Alberta
Concept Location with Concept Lattices
1. Creating a corpus of a software system2. Indexing with Latent Semantic Indexing3. Formulating a query4. Ranking methods5. Selecting words6. Clustering with Formal Concept Analysis7. Examining results
![Page 10: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/10.jpg)
ICPC 2007, Banff, Alberta
Concept Location with Concept Lattices
1. Creating a corpus of a software system2. Indexing with Latent Semantic Indexing3. Formulating a query4. Ranking methods5. Selecting words6. Clustering with
Formal Concept Analysis
7. Examining results
![Page 11: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/11.jpg)
Selecting Descriptive Words
• Ranking criteria – words which are mostly similar to search results than to the rest of the system
• Words in a subset of search results are ranked with LSI
Page Paper Rendering Device PrinterstartPage 2 1 1 0 0 …endPage 2 1 1 0 0 …
getBounds 0 1 2 1 1 ……
otherMethod1 3 0 0 2 3otherMethod2 2 0 0 3 4
![Page 12: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/12.jpg)
ICPC 2007, Banff, Alberta
Concept Location with Concept Lattices
1. Creating a corpus of a software system2. Indexing with Latent Semantic Indexing3. Formulating a query4. Ranking methods5. Selecting words6. Clustering with Formal Concept Analysis7. Examining results
![Page 13: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/13.jpg)
FCA Example - classification of geometrical shapes
square
scalene-triangle
equilateral-triangle
rectangle
isoscele-triangle
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
![Page 14: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/14.jpg)
square
scalene-triangle
equilateral-triangle
rectangle
isoscele-triangle
4-sides
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
FCA Example - classification of geometrical shapes
![Page 15: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/15.jpg)
square
scalene-triangle
equilateral-triangle
rectangle
isoscele-triangle
3-sides
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
FCA Example - classification of geometrical shapes
![Page 16: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/16.jpg)
square
scalene-triangle
equilateral-triangle
rectangle
isoscele-triangle
regular
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
FCA Example - classification of geometrical shapes
![Page 17: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/17.jpg)
square
scalene-triangle
equilateral-triangle
rectangle
isoscele-triangle
isoscele
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
FCA Example - classification of geometrical shapes
![Page 18: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/18.jpg)
4-sides 3-sides regular isoscele
square × ×rectangle ×scalene-triangle ×isoscele-triangle × ×equilateral-triangle × × ×
Obj
ects
Attributes
CONTEXT
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
FCA Example - classification of geometrical shapes
![Page 19: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/19.jpg)
Concept lattice
0c 4c
5c1c 3c
2c
bot
top
equilateral-trianglesquare
rectangle
4-sides
isoscele-triangle
isoscele
3-sides
scalene-triangleregular
gene
raliz
atio
n(i
mpl
icat
ion)
spec
ializ
atio
n
Example from Paolo Tonella’s tutorial on Formal Concept Analysis in Software Engineering
![Page 20: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/20.jpg)
ICPC 2007, Banff, Alberta
Clustering Search Results with Formal Concept Analysis
• Subset of methods in search results• Subset of words selected from search results• Example: searching for print page feature
printer print page job device paper rendering
startJob × ×endJob × ×
cancelJob × ×startPage × × ×endPage × × ×
getBounds × × ×
![Page 21: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/21.jpg)
ICPC 2007, Banff, Alberta
Concept Lattice of Search Results for Print Page
Attributes (words) – grey boxesObjects (methods) – white boxes
![Page 22: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/22.jpg)
ICPC 2007, Banff, Alberta
Case Study
• Locating concepts associated with bug descriptions– Bug fixes contain a set of changed methods
• Source code of Eclipse 3.1– vocabulary of unique terms – 56,8631
– number of extracted methods – 86,208
1Oxford English Dictionary has 171,476 words in current use
![Page 23: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/23.jpg)
ICPC 2007, Banff, Alberta
Evaluation of Novel Approach
• Lattices compared with ranked lists:– Number of methods: 80, 90, 100– Number of words: 10, 15, 20, 25
• Studied properties of lattices:– grouping of relevant information– browsing overhead of lattice structure
• Redefined measures from [Cigarran’04]:– Lattice distillation factor– Lattice browsing complexity
![Page 24: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/24.jpg)
ICPC 2007, Banff, Alberta
Measures
Ranked list1.startPage2.endPage3.getBounds4.startJob5.endJob6.cancelJob…
Lattice Distillation Factor - how many methods are visited
Lattice Browsing Complexity- how many concept nodes (categories) are visited while browsing
![Page 25: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/25.jpg)
ICPC 2007, Banff, Alberta
Locating Features in Eclipse -Example
• Table Headers Feature
• “The task list, which uses the native table widget, cannot be sorted by clicking on the table headers”
• Query: “table headers sorted”
• Associated with bug report# 34160
![Page 26: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/26.jpg)
ICPC 2007, Banff, Alberta
Locating Table Headers Feature
TablecreateTable
- Widget.setData- FilteredList.TableUpdater- …- Table.createWidget
tableViewergetTabletableValue, keyTable
HeadersetHeaderVisiblesetLineVisible…
1. WidgetTable.put2. TableTree.getTable3. EditorsView.getTable4. SimpleLookupTable.rehash5. WidgetTable.shells• …39.TableTreeEditor.resize
• …71. Widgets.Table.createWidget
Clustered results into labeled categories
Ranked List
![Page 27: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/27.jpg)
ICPC 2007, Banff, Alberta
Results for Table Headers
Docs Terms LMBA C CVIEW100 10 51 17 8100 15 40 25 11100 20 38 33 11100 25 36 39 1090 10 43 17 890 15 36 24 1190 20 34 31 1190 25 26 36 1180 10 38 15 780 15 30 22 1080 20 29 28 1080 25 23 33 10
• LMBA-rank of the first relevant method in the lattice
• C – number of concepts in the lattice
• CVIEW – number of expanded concepts in the lattice
![Page 28: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/28.jpg)
ICPC 2007, Banff, Alberta
Conclusions
• Novel representation for the search results in the source code– results are ranked and structured– labels for topics are automatically
extracted
• Scalable to software of any size – fixed number of top search results– fixed number of attributes
![Page 29: Combining Formal Concept Analysis with Information ...denys/pubs/talks/ICPC07.FCA.LSI.pdf · MBA-rank of the first relevant method in the lattice •C – o munrfeb concepts in the](https://reader034.vdocuments.site/reader034/viewer/2022052103/603dcfe60cca6f544c2cb9f2/html5/thumbnails/29.jpg)
ICPC 2007, Banff, Alberta
Current and Future Work
• Different strategies for ranking and selecting words
• Lattice size: best ratio of docs and terms
• Adding IR ranks and similarities to the lattices
• Use FCA and LSI during impact analysis• More user studies• Search problem inside the search
problem