Language interaction and quality issues: an exploratory study
Languages interaction and possible effects: an exploratory study
Antonio Vetrò - Federico Tomassetti - Marco Torchiano - Maurizio Morisio
"No one writes in a single language anymore. Even trivial applications have a general-purpose language, SQL, JavaScript, CSS, and dozens of frameworks, each of which includes an external DSL" (Wampler 2010)
How do those languages interact?
Is that interaction problematic?
Research questions
RQ1 How much interaction is there between the languages used in a project?
RQ2 Which language pairs interact more?
RQ3 Are Cross Language Modules more defect-prone than Intra Language Modules?
Plan
• Define a measure for the level of interaction among languages
• Investigate interaction vs. defect proneness
• Perform a case study
The Case Study
Apache Hadoop, a software framework that supports distributed data storage and processing.
Used in many real-world applications (e.g., at Yahoo and Facebook).
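Commit types (example with Language A, .extA, and Language B, .extB)
Cross-Language Commit (CLC): a commit that touches files of more than one language
Intra-Language Commit (ILC): a commit that touches files of a single language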
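A minimal sketch of the two commit types, assuming a commit is represented only by the paths of the files it touches and that, as in the slides, a file's language is approximated by its extension. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def extension(path: str) -> str:
    """Language proxy: the file extension without the dot, lowercased.
    Files without an extension map to "" and count as their own 'language'."""
    return PurePosixPath(path).suffix.lstrip(".").lower()


def is_cross_language(files: list[str]) -> bool:
    """A commit is Cross-Language (CLC) if its files span more than one extension."""
    return len({extension(f) for f in files}) > 1


def commit_type(files: list[str]) -> str:
    return "CLC" if is_cross_language(files) else "ILC"


print(commit_type(["src/Job.java", "conf/core-site.xml"]))  # CLC
print(commit_type(["src/Job.java", "src/Task.java"]))       # ILC
```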
RQ1 How much interaction is there between the languages present in a project?
All (RQ1.1)   Bug    Improvement   New Feature   Sub-task   Task   Test
0.53          0.12   0.26          0.30          0.45       0.26   0.05
Metric: Percentage of Cross-Language Commits (shown as fractions, e.g. 0.53 = 53%)
• All types of commits (RQ1.1)
• Commits divided by activity type (e.g., improvement, bug fixing, new feature) (RQ1.2)
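The metric can be sketched as a simple count, assuming each commit is already labelled with an activity type (for instance the type of the issue it refers to); the slides do not say how that label is obtained, and the names below are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def is_cross_language(files):
    """True when the commit's files span more than one extension (a CLC)."""
    return len({PurePosixPath(f).suffix.lower() for f in files}) > 1


def clc_share(commits):
    """commits: iterable of (activity_type, files).
    Returns the share of CLCs over all commits ("All") and within each activity type."""
    totals, cross = defaultdict(int), defaultdict(int)
    for activity, files in commits:
        clc = is_cross_language(files)
        for key in ("All", activity):
            totals[key] += 1
            cross[key] += clc  # True counts as 1, False as 0
    return {key: cross[key] / totals[key] for key in totals}


commits = [
    ("Bug", ["Job.java"]),
    ("Bug", ["Job.java", "core-site.xml"]),
    ("Test", ["TestJob.java"]),
    ("New Feature", ["start.sh", "Job.java"]),
]
print(clc_share(commits))
```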
Cross Language Ratio (CLR)
Example with Language A (.extA), Language B (.extB), and Language C (.extC): 3 out of 4 commits involving module m are Cross-Language
Cross Language Ratio of module m: CLR_m = 3/4 = 0.75
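A minimal sketch of CLR_m, treating each file as a module and each commit as the list of files it touches (both assumptions; the helper names are hypothetical). The sample data mirrors the slide example, where 3 of the 4 commits touching m are cross-language.

```python
from pathlib import PurePosixPath


def is_cross_language(files):
    return len({PurePosixPath(f).suffix.lower() for f in files}) > 1


def cross_language_ratio(module, commits):
    """CLR_m: share of the commits touching `module` that are Cross-Language."""
    touching = [files for files in commits if module in files]
    if not touching:
        return None  # undefined for a module that never appears in a commit
    return sum(is_cross_language(files) for files in touching) / len(touching)


commits = [
    ["m.a", "x.b"],         # CLC
    ["m.a", "y.c"],         # CLC
    ["m.a", "x.b", "y.c"],  # CLC
    ["m.a", "z.a"],         # ILC: same extension as m
    ["x.b"],                # does not touch m, so it is ignored
]
print(cross_language_ratio("m.a", commits))
```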
Interaction level of a language
• Cross language ratio of an extension (language)
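The slides do not spell out how module ratios are combined into CLR_ext. As an assumption only, this sketch averages CLR_m over all modules (files) that share an extension; other aggregations, such as pooling all commits that touch the extension, are equally plausible. Names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from statistics import mean


def ext(path):
    return PurePosixPath(path).suffix.lstrip(".").lower()


def module_clr(commits):
    """CLR_m for every module (file) that appears in at least one commit."""
    total, cross = defaultdict(int), defaultdict(int)
    for files in commits:
        is_clc = len({ext(f) for f in files}) > 1
        for f in set(files):
            total[f] += 1
            cross[f] += is_clc
    return {f: cross[f] / total[f] for f in total}


def extension_clr(commits):
    """CLR_ext, ASSUMED here to be the mean CLR_m over the modules of each extension."""
    by_ext = defaultdict(list)
    for module, clr in module_clr(commits).items():
        by_ext[ext(module)].append(clr)
    return {e: mean(values) for e, values in by_ext.items()}


commits = [["A.java", "b.xml"], ["A.java"], ["c.sh", "b.xml"], ["D.java"]]
print(extension_clr(commits))
```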
RQ2 Which extensions interact more?
CLRext Nr files Extension
0.96 49 c
0.87 114 sh
0.72 75 properties
0.71 320 xml
0.59 4328 java
Metric: CLRext
Considering one extension versus all the other extensions (RQ2.1)
Focusing on extension pairs
Example with Language A (.extA), Language B (.extB), and Language C (.extC): 2 out of 3 commits involving module m together with extA are Cross-Language
Cross Language Ratio of module m w.r.t. extA: CLR_m,extA = 2/3 ≈ 0.67
Interaction level of a pair
• Cross language ratio of an extension w.r.t. another extension
– Asymmetrical measure!
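A minimal sketch of the pairwise ratio, reading CLR_m,ext as the share of commits touching m that also touch at least one file with extension ext (our interpretation of the slide example). It also shows where the asymmetry comes from: each ratio is normalised by the commits of its own module, so the two directions generally differ. Names are hypothetical.

```python
from pathlib import PurePosixPath


def ext(path):
    return PurePosixPath(path).suffix.lstrip(".").lower()


def clr_wrt(module, other_ext, commits):
    """CLR_{m,ext}: share of commits touching `module` that also touch a file
    with extension `other_ext` (assumed to differ from the module's own)."""
    if ext(module) == other_ext:
        raise ValueError("other_ext must differ from the module's extension")
    touching = [files for files in commits if module in files]
    if not touching:
        return None
    with_other = [files for files in touching if any(ext(f) == other_ext for f in files)]
    return len(with_other) / len(touching)


commits = [
    ["m.b", "p.a"],
    ["m.b", "q.a", "r.c"],
    ["m.b", "r.c"],
]
print(clr_wrt("m.b", "a", commits))  # 2 of the 3 commits touching m.b include a .a file
print(clr_wrt("p.a", "b", commits))  # the only commit touching p.a includes m.b
```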
RQ2 Which extensions do interact more?
extA/extB    C      Java   Properties   Sh
C            -      0.51   0.10         0.50
Java         0.01   -      0.28         0.04
Properties   0      0.54   -            0.36
Sh           0.09   0.22   0.24         -
Xml          0.04   0.52   0.43         0.24
Considering the most interacting ordered pairs of extensions (RQ2.2).
Metric: CLRextA,extB
Cross vs. Intra Language Modules
Cross Language Module (CLM): a module whose CLR is ≥ t
Intra Language Module (ILM): a module whose CLR is < t
t = 50%
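A minimal sketch of the split with t = 50%, which also builds the four counts used in the RQ3 table. It assumes each module already carries its CLR and a defect flag; the slides do not say how defects are linked to modules, and the names are hypothetical.

```python
from collections import Counter

T = 0.5  # threshold t = 50%, as in the slides


def module_class(clr, t=T):
    """CLM when CLR >= t, ILM otherwise."""
    return "CLM" if clr >= t else "ILM"


def contingency(modules, t=T):
    """modules: mapping name -> (clr, has_defect).
    Returns counts in the RQ3 column order: ILM no def., ILM def., CLM no def., CLM def."""
    counts = Counter((module_class(clr, t), bool(defect)) for clr, defect in modules.values())
    return (counts[("ILM", False)], counts[("ILM", True)],
            counts[("CLM", False)], counts[("CLM", True)])


modules = {
    "A.java": (0.5, False),   # exactly at the threshold, so CLM
    "B.java": (0.2, True),
    "c.xml": (1.0, True),
    "D.java": (0.0, False),
    "E.java": (0.49, False),
}
print(contingency(modules))
```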
RQ3 Are Cross Language Modules more defect-prone?
Extension    ILM no def.   ILM def.   CLM no def.   CLM def.   p-value   OR
all          1891          225        2875          89         <0.001    0.26
c            2             0          46            1          1.000     Inf
java         1692          201        2239          25         <0.001    0.09
properties   19            1          45            7          0.429     2.92
sh           10            5          64            13         0.162     0.41
xml          96            11         184           24         0.851     1.14
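Metric: odds ratio (OR) of defects in CLMs vs. ILMs, i.e. (CLMs with defects / CLMs without defects) / (ILMs with defects / ILMs without defects); OR > 1 means CLMs are more defect-prone
- all modules regardless of extension (RQ3.1)
- by extension (RQ3.2)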
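The table can be rechecked from its four counts. A minimal sketch, assuming the plain sample odds ratio and a two-sided Fisher's exact test: the slides do not name the test, but this one reproduces the reported p-values for the c (1.000) and properties (0.429) rows. The sample ratio matches most ORs to two decimals; for properties it gives 2.96 rather than 2.92, so the slides may use another estimator (for example the conditional maximum-likelihood estimate that R's `fisher.test` reports).

```python
from math import comb, inf


def odds_ratio(ilm_nodef, ilm_def, clm_nodef, clm_def):
    """Sample odds ratio of having a defect, CLMs vs. ILMs (> 1: CLMs more defect-prone)."""
    num = clm_def * ilm_nodef
    den = clm_nodef * ilm_def
    if den == 0:
        return inf if num > 0 else float("nan")
    return num / den


def fisher_exact_two_sided(ilm_nodef, ilm_def, clm_nodef, clm_def):
    """Two-sided Fisher's exact test on [[ilm_nodef, ilm_def], [clm_nodef, clm_def]]."""
    n_ilm = ilm_nodef + ilm_def
    n_clm = clm_nodef + clm_def
    n_def = ilm_def + clm_def
    total = comb(n_ilm + n_clm, n_def)

    def prob(x):
        # Hypergeometric probability that exactly x of the defective modules are ILMs
        return comb(n_ilm, x) * comb(n_clm, n_def - x) / total

    p_obs = prob(ilm_def)
    support = range(max(0, n_def - n_clm), min(n_ilm, n_def) + 1)
    # Sum the probabilities of every table no more likely than the observed one
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Counts from the table above: ILM no def., ILM def., CLM no def., CLM def.
table = {
    "all": (1891, 225, 2875, 89),
    "c": (2, 0, 46, 1),
    "java": (1692, 201, 2239, 25),
    "properties": (19, 1, 45, 7),
    "sh": (10, 5, 64, 13),
    "xml": (96, 11, 184, 24),
}
for name, counts in table.items():
    print(f"{name:<11} OR={odds_ratio(*counts):.2f}  p={fisher_exact_two_sided(*counts):.3f}")
```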
RQ3 Are Cross Language Modules more defect-prone?
             C      Java   Properties   Sh      Xml
C            -      Inf    0            0       Inf
Java         2.79   -      0.32         0.43    0.96
Properties   Inf    1      -            12.08   0.94
Sh           3.55   4.45   17.17        -       7.44
Xml          3.83   0.95   3.22         4.73    -
Considering interaction between specific ordered pairs of extensions (RQ3.3).
Significant values in bold
Metric: odds ratio (OR) of defects in CLMs vs. ILMs, i.e. (CLMs with defects / CLMs without defects) / (ILMs with defects / ILMs without defects)
Threats
• Confounding factors: age and size of modules
• Use of a proxy for the interaction between artifacts
• Representativeness of Apache Hadoop
• Renaming of modules
Conclusions
Language interaction depends on the type of activity
Frequent interactions are generally not symmetric
Many of them involve XML
In general, language interaction is not related to higher defect proneness (see Java)
However, for several language pairs CLMs are significantly more defect-prone than ILMs (see C)
Questions?