1
Exposing Behavioral Differences in Cross-Language
API Mapping Relations
Hao Zhong (Institute of Software, CAS, China)
Suresh Thummalapenta (IBM Research, India)
Tao Xie (NC State University, USA)
2
Many programming languages have been introduced over the decades
Motivation
Business requirements force companies to release applications in multiple languages
E.g., Lucene and WordNet have both Java and C# variants
Three major reasons for developing variants in multiple languages
For API libraries, to attract a large number of programmers
For stand-alone applications, to acquire specific features of underlying languages
For mobile applications, to support multiple platforms
3
Develop in one language and translate to other languages
Example applications: Lucene.Net and Db4o
Advantage: significant reduction of effort
Many translation tools already exist
E.g., Java2CSharp, Net2Java
Key idea: replace APIs of one language with their corresponding APIs in another language via API mapping relations
Trends in Developing Variants
4
Associate APIs of one language with APIs of the other language
What Are API Mapping Relations?
Help translate code from one language to the other language
5
Mapped APIs can have behavioral differences
Differences among outputs or exceptions being thrown
Such differences lead to defects in translated code
Problem
An Example from Lucene project
Substring API
Java: 2nd parameter represents the end index
C#: 2nd parameter represents the number of characters
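The Substring difference can be reproduced in Java alone. The sketch below is illustrative only: `csharpStyleSubstring` is a hypothetical helper that emulates the C# (start index, length) convention, and `translatedCorrectly` shows the index adjustment a translation would need.

```java
class SubstringMapping {
    // Hypothetical helper: emulates the C# convention Substring(startIndex, length).
    static String csharpStyleSubstring(String s, int startIndex, int length) {
        return s.substring(startIndex, startIndex + length);
    }

    // A behavior-preserving translation of Java's s.substring(begin, end):
    // the end index must be converted into a length.
    static String translatedCorrectly(String s, int begin, int end) {
        return csharpStyleSubstring(s, begin, end - begin);
    }

    public static void main(String[] args) {
        String s = "Lucene";
        System.out.println(s.substring(1, 3));              // Java semantics: uc
        System.out.println(csharpStyleSubstring(s, 1, 3));  // arguments copied unchanged: uce
        System.out.println(translatedCorrectly(s, 1, 3));   // adjusted arguments: uc
    }
}
```

Copying the arguments unchanged compiles and runs without error, which is why this kind of difference tends to surface only when outputs are compared.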
6
Are such behavioral differences pervasive?
What types of behavioral differences are there?
What types of differences are more common than others?
Are these differences easy to resolve?
Goals of Our Study
7
Mapping relations are not explicitly available and take a long time to write manually
Extraction from tools: translation tools use different formats for specifying API mapping relations
Extraction from translated code: applications under translation may not cover APIs of interest
Extraction from translated code: translated code typically has compilation errors, so it cannot be used for testing
Challenges
8
A tool chain, called TeMAPI, that detects behavioral differences among API mapping relations
Empirical results showing that behavioral differences are pervasive
8 findings on exposed behavioral differences and their implications for API-library implementers and users
Behavioral differences indicating defects in translation tools, and 4 defects were confirmed by developers
Major Contributions
9
Motivation
Study Setup
Empirical Results
Conclusion
Outline
10
Subject libraries
Study Setup
Includes two major steps
11
Create a wrapper for each API method in one language
Apply translation tools on the wrapper
Extract the mapping relation from the original and translated wrappers
Ignore a mapping relation if the translated wrapper does not compile
Step 1: Extract Mapping Relations
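The wrapper-creation step can be sketched with Java reflection. This is not TeMAPI's actual generator: the class name `WrapperGenerator`, the `name_index` naming scheme, and the blanket `throws Exception` clause are assumptions made for illustration, and only static methods are handled.

```java
import java.lang.reflect.Method;

class WrapperGenerator {
    // Emits the source of one wrapper method that forwards to a public static API method.
    // Naming scheme and "throws Exception" are illustrative assumptions.
    static String wrapperFor(Method m, int index) {
        Class<?>[] params = m.getParameterTypes();
        StringBuilder decl = new StringBuilder();
        StringBuilder args = new StringBuilder();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) { decl.append(", "); args.append(", "); }
            decl.append(params[i].getCanonicalName()).append(" p").append(i);
            args.append("p").append(i);
        }
        String ret = m.getReturnType().getCanonicalName();
        String call = m.getDeclaringClass().getCanonicalName() + "." + m.getName() + "(" + args + ")";
        String body = "void".equals(ret) ? call + ";" : "return " + call + ";";
        return "public static " + ret + " " + m.getName() + "_" + index
                + "(" + decl + ") throws Exception { " + body + " }";
    }

    // Convenience lookup that avoids a checked exception at the call site.
    static String wrapperFor(Class<?> owner, String name, Class<?>... paramTypes) {
        try {
            return wrapperFor(owner.getMethod(name, paramTypes), 0);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapperFor(Integer.class, "parseInt", String.class, int.class));
        System.out.println(wrapperFor(Boolean.class, "parseBoolean", String.class));
    }
}
```

Each emitted wrapper contains exactly one API call, so after a translation tool processes it, the call found in the translated wrapper identifies the mapped API.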
12
Step 2: Generate Test Cases
Original wrapper -> apply translation tool -> Translated wrapper
Original test case -> apply translation tool -> Translated test case
Generate test on original wrapper
Execute test on translated wrapper
Two existing state-of-the-art test generation tools
Pex: a dynamic-symbolic-execution-based test generation tool
Randoop: a feedback-guided random test generation tool
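The comparison at the heart of this step can be sketched as a small differential check. The code below is a simplified illustration, not the Pex or Randoop workflow: `translatedParseInt` is a hypothetical Java stand-in for a translated wrapper, written to mimic the C# result for null input that the deck reports under Finding 1.

```java
import java.util.concurrent.Callable;

class DifferentialCheck {
    // Runs a call and records either its value or the simple name of the thrown exception.
    static String outcome(Callable<?> call) {
        try {
            return "value:" + call.call();
        } catch (Exception e) {
            return "exception:" + e.getClass().getSimpleName();
        }
    }

    // Two wrappers behave differently on an input if their recorded outcomes differ.
    static boolean behaviorsDiffer(Callable<?> original, Callable<?> translated) {
        return !outcome(original).equals(outcome(translated));
    }

    // Hypothetical stand-in for the translated wrapper: returns 0 for null,
    // as the deck reports for System.Convert.ToInt32(null, 10).
    static int translatedParseInt(String s, int radix) {
        return s == null ? 0 : Integer.parseInt(s, radix);
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> Integer.parseInt(null, 10)));   // exception:NumberFormatException
        System.out.println(outcome(() -> translatedParseInt(null, 10))); // value:0
        System.out.println(behaviorsDiffer(
                () -> Integer.parseInt(null, 10),
                () -> translatedParseInt(null, 10)));                    // true
    }
}
```

Recording exceptions by type name keeps value differences and exception differences in one comparison.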
13
Motivation
Study Setup
Empirical Results
Conclusion
Outline
14
We address the following research questions:
Are behavioral differences pervasive in cross-language API mapping relations?
What are the characteristics of behavioral differences concerning inputs and outputs?
What are the characteristics of behavioral differences concerning method sequences?
Research Questions
15
Column E-Tests: number of exception-causing test cases
Column A-Tests: number of assertion-failing test cases
RQ1: Pervasiveness
About 50% of the generated test cases fail: behavioral differences are pervasive in API mapping relations between Java and C#
16
Finding 1. 36.8% - handling of null inputs.
java.lang.Integer.parseInt(null, 10) -> NumberFormatException
System.Convert.ToInt32(null, 10) -> 0
Implication
API-library implementers should clearly define the behavior of null inputs.
Programmers should handle null inputs carefully.
RQ2: Findings and Implications
17
Finding 2. 22.3% - returned string values.
ToString vs toString
GetName vs getName
Implication
A method in Java and a method in C# typically return different string values even if they have the same functionality.
Programmers should be cautious while using these values.
RQ2: Findings and Implications
18
Finding 3. 11.5% - input domains.
java.lang.Boolean.parseBoolean("test") -> false
System.Boolean.Parse("test") -> FormatException
Implication
Programmers should be cautious while dealing with methods with odd input values.
RQ2: Findings and Implications
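One way to act on Finding 3 is to make the accepted input domain explicit before a value reaches the lenient Java method. The sketch below is an assumption-laden illustration: `strictParse` is a hypothetical helper that only approximates the stricter C# behavior, and it signals bad input with `IllegalArgumentException` rather than any .NET exception type.

```java
class StrictBooleanParse {
    // Hypothetical helper: accepts only "true"/"false" (case-insensitive, trimmed)
    // and rejects everything else instead of silently returning false.
    static boolean strictParse(String s) {
        if (s != null) {
            String t = s.trim();
            if (t.equalsIgnoreCase("true")) return true;
            if (t.equalsIgnoreCase("false")) return false;
        }
        throw new IllegalArgumentException("not a boolean literal: " + s);
    }

    public static void main(String[] args) {
        System.out.println(Boolean.parseBoolean("test")); // false: the lenient Java behavior
        try {
            strictParse("test");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard like this, code that relies on the lenient behavior fails visibly in the original program instead of diverging silently after translation.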
19
Finding 4. 10.7% - implementations.
java.lang.Character.isJavaIdentifierPart("\0") -> true
ILOG.J2CsMapping.Util.Character.IsCSharpIdentifierPart("\0") -> false
Implication
Some differences reflect the different natures of the languages, and others indicate defects in translation tools.
Programmers should learn the natures of different programming languages to figure out such differences, e.g., different definitions of paths and files.
RQ2: Findings and Implications
20
Finding 5. 7.9% - handling of exceptions.
Implication
API-library implementers may design different exception-handling mechanisms.
If programmers do not notice these differences, they may introduce dead or defective code.
java.lang.StringBuffer.insert(int, char) -> ArrayIndexOutOfBoundsException
System.Text.StringBuilder.Insert(int, char) -> ArgumentOutOfRangeException
IndexOutOfRangeException
RQ2: Findings and Implications
21
Finding 6. 2.9% - constants.
java.lang.Double.MAX_VALUE -> 1.7976931348623157E+308
System.Double.MaxValue -> 1.79769313486232E+308
Implication
API-library implementers may store different values in constants, even if two constants have the same name.
Programmers should be careful when using constants.
RQ2: Findings and Implications
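The two figures in Finding 6 differ only after the 14th decimal place, yet the difference matters. The sketch below checks this in Java, under the assumption that the shorter figure on the slide is the text a test would observe on the C# side; `SHORT_FORM` is simply that figure copied as a string.

```java
class MaxValueRoundTrip {
    // The shorter figure quoted on the slide for System.Double.MaxValue (assumed to be printed text).
    static final String SHORT_FORM = "1.79769313486232E+308";

    public static void main(String[] args) {
        // Java prints the shortest text that uniquely identifies the double.
        System.out.println(Double.toString(Double.MAX_VALUE)); // 1.7976931348623157E308

        // The shorter text is numerically larger than MAX_VALUE by more than half
        // the spacing between adjacent doubles, so parsing it overflows.
        double parsed = Double.parseDouble(SHORT_FORM);
        System.out.println(parsed);                            // Infinity
        System.out.println(parsed == Double.MAX_VALUE);        // false
    }
}
```

A test that compares the constants as text, or feeds one side's printed value back into a parser, therefore reports a difference.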
22
Finding 7. Different inheritance hierarchies that can lead to compilation errors.
Implication
When programmers translate code (e.g., cast statements), they should be aware of such differences.
Java:
StringBufferInputStream var4 = ...;
InputStreamReader var10 = new InputStreamReader((InputStream)var4, var8);
C#:
StringReader var4 = ...;
StreamReader var10 = new StreamReader((Stream)var4, var8);
StringBufferInputStream is a subclass of InputStream
StringReader is NOT a subclass of Stream
RQ3: Findings and Implications
23
Finding 8. 3.4% - method sequences.
Implication
Legal method sequences can become illegal after translation, due to various factors such as constraints in the target programming language and field accessibility.
Java:
DateFormatSymbols var0 = new DateFormatSymbols();
String[] var16 = new String[]...;
var0.setShortMonths(var16);
C#:
DateTimeFormatInfo var0 = System.Globalization.DateTimeFormatInfo.CurrentInfo;
String[] var16 = new String[]...;
var0.AbbreviatedMonthNames = var16;
InvalidOperationException
RQ3: Findings and Implications
24
Tool chain + empirical study of exposing behavioral differences of API mapping relations
Behavioral differences are pervasive and dangerous
8 findings with valuable implications for API-library implementers and users + 4 defects confirmed
Conclusion
25
Thank You
Acknowledgment: NSF of China No. 61100071, NSF of China No. 61228203, NSF grants CCF-0845272, CCF-0915400, CNF-0958235, CNS-1160603, and an NSA Science of Security Lablet Grant