Analyzing Networks of Issue Reports
Markus Borg, Dietmar Pfahl, Per Runeson
Analyzing networks of issue reports| Borg, Pfahl, and Runeson
Markus Borg
• Third year PhD student
• MSc CS and engineering
• Software developer (2007-2010)
• Empirical research group
Dietmar Pfahl, University of Tartu, Estonia
Per Runeson, Lund University, Sweden
Agenda
• Background and Context
– Information management
– Safety-critical development
– Impact analysis
• Goal and method of this study
• Results
• Future work
Background and Context
Information management
• Large projects, much information
Challenges
• A state of information overload
– Engineers cannot process all information
– Causes stress
– Obstructs decision making
• Poor findability
– More effort to navigate information landscape
Intensified in safety development
• Safety standards mandate documentation
[Figure labels: Railroad, Nuclear, Process Industry, Machinery, Automotive]
Mandated documents in ISO 26262
Work task: Impact Analysis (IA)
• Required by IEC 61508 before changes to production code
• Studied an industrial case
– Documented
– Reviewed during safety audits
[Figure labels: Requirements, Tests]
Work task: Impact Analysis (2)
• Formal template
• Impact on code and non-code specified as traceability links
• Manual work
[Figure label: IMS]
Supporting the impact analysis?
[Figure labels: Work task, Reqs. DB, Code Repo, Test DB]
Reuse knowledge from previous IAs
Information in networks, so what?
• Search in hyperlinked structures well researched
– Also applied in software engineering (Karabatis et al. (2009))
– PageRank: Page et al. (1999)
– HITS algorithm: Kleinberg (1999)
– Social network analysis
Networks of issue reports
• What type of networks can we find in issue databases?
Method
Issue databases under study
• Safety IMS (2000-2012)
– Industrial control system
– Mandated by strict processes
– Issues submitted by engineers
• Android IMS (2007-2012)
– OS for handheld devices
– Open source software
– Issues submitted to public database
Link mining in the issue databases
• Safety IMS
– "Related cases" field in database
• Android IMS
– No separate field for linking issues
– Communication using comments (100,000+)
– Developers refer to other issues, stored as HTML hyperlinks
• Extracted using regular expressions
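The regular-expression extraction could look roughly like the sketch below. The slides do not show the actual patterns, so the URL format (`issues/detail?id=...`), the function name `extract_links`, and the sample comment are all assumptions made for illustration.

```python
import re

# ASSUMPTION: issue hyperlinks contain "issues/detail?id=<number>" in the href.
# The real tracker's link format may differ; adjust the pattern accordingly.
ISSUE_LINK = re.compile(
    r'<a\s[^>]*href="[^"]*issues/detail\?id=(\d+)[^"]*"', re.IGNORECASE
)

def extract_links(source_id, comment_html):
    """Return sorted (source, target) pairs for issue hyperlinks in one comment."""
    targets = {int(m) for m in ISSUE_LINK.findall(comment_html)}
    targets.discard(source_id)  # ignore links from an issue to itself
    return sorted((source_id, t) for t in targets)

# Hypothetical comment on issue 42 that refers to two other issues.
comment = (
    'Looks like a duplicate of '
    '<a href="/p/android/issues/detail?id=1234">issue 1234</a>, see also '
    '<a href="/p/android/issues/detail?id=5678">issue 5678</a>.'
)
links = extract_links(42, comment)
print(links)
```

Collecting the pairs from every comment gives the edge list of the issue network.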
Results
Extracted network - Overview
• Safety IMS
– 26,120 issue reports
– 18,000 links
– 15,000 components
– 13,000 isolated issue reports
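Counts of components and isolated issue reports like those above can be derived from an edge list with a plain graph traversal. The sketch below is a minimal illustration on a made-up six-node network, not the tooling used in the study; the function name `components` and the toy data are assumptions.

```python
from collections import defaultdict

def components(nodes, links):
    """Return the connected components of an undirected network as sets."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        seen.add(n)
        while stack:  # depth-first traversal of one component
            cur = stack.pop()
            comp.add(cur)
            for nb in adj[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

# Toy network: issues 1-3 form a chain, 4-5 a pair, 6 has no links.
nodes = [1, 2, 3, 4, 5, 6]
links = [(1, 2), (2, 3), (4, 5)]
comps = components(nodes, links)
isolated = [next(iter(c)) for c in comps if len(c) == 1]
print(len(comps), "components,", len(isolated), "isolated")
```

An isolated issue report counts as a component of size one, which is why the component count is close to the number of isolated reports.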
Extracted network – Close-up
Extracted network - Overview
• Android IMS
– 20,176 issue reports
– 3,500 links
– 18,000 components
– 17,000 isolated issue reports
Example of sub-network
Bug star: one central issue report pointing at several others
Caused by duplicates
Example of sub-network
Dense ring: most issue reports are connected
Caused by copy-paste comments
Extracted networks
What do developers signal by creating HTML hyperlinks?
Link semantics
• Indicate relationships with different certainty
– Related issue report (possibly, probably, definitely)
– Duplicate issue report (possibly, probably, definitely)
– Cloned issue report
• Misc. links
– Raising awareness of issue reports
– Release planning
– Links with the wrong target
• Links appear to carry meaning
More recent results
Contents of IA reports in the Safety IMS
[Figure labels: Code, HW description, Misc. documents, Test case, User manual]
Mining IA reports in the Safety IMS
• ~ 5,000 impact analysis reports
Node types
• Issue reports
• Requirements
• Test specifications
• Hardware descriptions
Link types
• Related issue
• Specified by
• Verified by
• Needs update
• Impacted HW
Extracted semantic network
• 27,958 nodes
– ~26,000 issue reports
– ~3,000 other artifacts
• 28,230 links
– ~18,000 related issue
– ~4,000 specified by
– ~2,300 verified by
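One way to hold such a semantic network is a table of typed nodes plus a list of typed links, from which per-type counts like those above fall out directly. The sketch below is an assumed representation, not the one used in the study; the artifact IDs are invented, and only the node and link type names come from the slides.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str  # e.g. "related issue", "specified by", "verified by"

# Hypothetical artifacts; the IDs are made up for illustration.
node_types = {
    "ISSUE-1": "issue report",
    "ISSUE-2": "issue report",
    "REQ-7": "requirement",
    "TEST-3": "test specification",
}
links = [
    Link("ISSUE-1", "ISSUE-2", "related issue"),
    Link("ISSUE-1", "REQ-7", "specified by"),
    Link("ISSUE-1", "TEST-3", "verified by"),
    Link("ISSUE-2", "REQ-7", "specified by"),
]

# Per-type summaries of the network.
node_counts = Counter(node_types.values())
link_counts = Counter(link.kind for link in links)
print(dict(node_counts))
print(dict(link_counts))
```

Keeping the link type on each edge lets later analyses filter, for example, to "verified by" links only.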
Extracted semantic network – Circle layout
Future work
How can the networks be exploited?
Neighbourhood search
Application 1: Search for connected artifacts
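A neighbourhood search can be sketched as a breadth-first traversal that stops after a fixed number of hops. The hop limit, the function name `neighbourhood`, and the toy adjacency map are assumptions; the slides do not specify how the search would be bounded.

```python
from collections import deque

def neighbourhood(adj, start, max_hops):
    """Return {artifact: hop distance} for everything within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in adj.get(cur, ()):
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    del dist[start]  # the start artifact is not its own neighbour
    return dist

# Toy chain A - B - C - D, stored as an undirected adjacency map.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
near = neighbourhood(adj, "A", max_hops=2)
print(near)
```

The hop distance can double as a simple relevance signal: artifacts further away are weaker candidates.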
Centrality measures
Application 2: Identification of key artifacts (ranking)
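The slides do not say which centrality measure is intended. As one possible choice, the sketch below ranks artifacts with a basic PageRank power iteration in the spirit of Page et al. (1999); the damping factor, iteration count, and toy network are assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a {node: [targets]} map."""
    nodes = set(out_links) | {t for ts in out_links.values() for t in ts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v, targets in out_links.items():
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank

# Toy network: three issue reports all point at artifact C.
out_links = {"A": ["C"], "B": ["C"], "D": ["C"], "C": []}
rank = pagerank(out_links)
top = max(rank, key=rank.get)
print(top)
```

Simpler measures such as degree centrality would fit the same ranking role.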
Goal: Impact Recommender
1. Identify similar issues
2. Identify neighbours
3. Rank candidates
[Figure labels: Textual sim., Far away, High cent.]
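The three steps can be sketched end to end as below. This is an illustration under stated assumptions, not the planned recommender: Jaccard token overlap stands in for textual similarity, direct neighbours stand in for the neighbourhood, node degree stands in for centrality, and the scoring formula and toy data are invented.

```python
def jaccard(a, b):
    """Token-overlap similarity between two texts (stand-in for textual sim.)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def recommend(new_text, issue_texts, adj, top_similar=2):
    # Step 1: identify the most similar existing issue reports.
    sims = sorted(
        ((jaccard(new_text, text), issue) for issue, text in issue_texts.items()),
        reverse=True,
    )[:top_similar]
    # Step 2: collect their direct neighbours as candidate impacted artifacts.
    scores = {}
    for sim, issue in sims:
        for nb in adj.get(issue, ()):
            scores[nb] = scores.get(nb, 0.0) + sim
    # Step 3: rank candidates; degree is used as a simple centrality weight.
    ranked = [(nb, s * len(adj.get(nb, ()))) for nb, s in scores.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical issue texts and links to requirements and tests.
issue_texts = {
    "I1": "crash when saving file",
    "I2": "crash on saving large file",
    "I3": "wrong colour in menu",
}
adj = {
    "I1": {"REQ-1", "TEST-1"}, "I2": {"REQ-1"}, "I3": {"REQ-2"},
    "REQ-1": {"I1", "I2"}, "TEST-1": {"I1"}, "REQ-2": {"I3"},
}
ranked = recommend("crash when saving", issue_texts, adj)
print([name for name, _ in ranked])
```

In this toy run the requirement linked from both similar issues outranks the test case linked from only one.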
Summary
• Link mining in IMSs can discover complex issue networks
– The process-heavy IMS contains more links
– Links among issue reports, created in comments by Android developers, typically signal relations
• Networks of issue reports can be extended by other artifacts
• Networked information enables better navigation
– Broaden search (following links)
– Sharpen search (better ranking)
Thanks!
?