IBM Haifa Research Lab: Software Asset Management Group
IBM PLE Seminar 2005 © 2005 IBM Corporation
Safety Checks and Semantic Understanding via Program Analysis Techniques
Nurit Dor
Joint work: Eran Yahav, Inbal Ronen, Sara Porat
Goal
▪ Find properties of a program
  • Anti-patterns that indicate potential bugs
  • Semantic patterns that have a meaning of interest
▪ Technology
  • Lightweight specifications
  • Conservative (sound) static analysis
  • Combining static and dynamic analyses
▪ Challenges
  • Scale to real programs
  • Produce a reasonable number of false positives
  • Utilize dynamic information as much as possible
public class SimpleExample1 {
    public static void main(String[] args) {
        FileComponent f1 = new FileComponent();
        foo(f1);
        ...
        bar(f1);
    }

    public static void foo(FileComponent f) {
        ...
        f.close();
        ...
    }

    public static void bar(FileComponent f) {
        ...
        f.read();
        ...
    }
}
“Finding Bugs is Easy”
FileComponent f = new FileComponent();
f.close();
f.read();
Not really…
private Connection getConnection() {
    if (…)
        return DriverManager.getConnection(DBUrl);
    else {
        Context initial = new InitialContext();
        DataSource dataSource = (DataSource) initial.lookup(DSName);
        return dataSource.getConnection();
    }
}

public void execute(String query) {
    Connection conn = getConnection();
    Statement stmt = conn.createStatement();
    stmt.execute(query); // which DB and table is accessed?
}
Understanding program dependency is easy?
Not really…
Finding Properties is Hard
▪ Handling “non-local” properties
  • Interprocedural analysis
▪ Producing a reasonable number of false positives
  • “Not finding non-bugs” is hard
▪ Correlating statements, e.g. which SQL statement relates to which database connection
▪ Inferring values and not just control and data flow
  • Determine values that can occur at runtime
▪ Scaling to real programs
Agenda
▪ Motivation
▪ IBM Research Projects
  • CARDS
  • SAFE
▪ Pattern Language: Specifying properties
▪ Typestate Algorithms: Identifying property instances
  • Inferring pointer aliasing
  • Handling multiple objects
▪ Combining Static and Dynamic Analyses
CAPA – Common Architecture for Program Analysis
▪ IBM Research cross-lab project
▪ Goal: a program analysis infrastructure effort to help Research
  • quickly create software lifecycle applications that exploit various flavors of program analysis
  • foster sharing and collaboration between groups
  • speed technology transfer to product groups
CARDS (Combining Analyses: Runtime, Dynamic and Static)
HRL
Goal: End-to-end impact analysis
What happens if I change a database table?
SAFE
▪ Scalable and flexible error detection (“bug finding”) and verification
  • Detecting violations of simple correctness properties
  • Verifying the absence of such violations
▪ Wide range of techniques
  • Detect common bug patterns based on an XML representation of a program
  • Integrated pointer analysis and interprocedural typestate checking
▪ More precise than existing tools (= fewer false alarms)
▪ Experimental version deployed to early adopters within IBM SWG
Watson Research project
Agenda
▪ Motivation
▪ IBM Research Projects
  • CARDS
  • SAFE
▪ Pattern Language: Specifying properties
▪ Typestate Algorithms: Identifying property instances
  • Inferring pointer aliasing
  • Handling multiple objects
▪ Combining Static and Dynamic Analyses
Specifying patterns
▪ Used for modeling runtime properties of interest
▪ Sequences of method invocations that have a specific semantic meaning
  • Data flow relationships
    – Result of one invocation is the target/parameter of a second invocation
  • Control flow relationships
    – Order may or may not be meaningful
  • Some method invocations are semantically equivalent
    – Usage of abstract patterns and inheritance
▪ List of values (parameters, return values, ...) to resolve
▪ Patterns are written in XML and converted into an automaton
▪ A pattern instance is a set of specific method calls that are detected in the code
▪ The same method call can be part of several pattern instances (of the same or of different patterns)
public class SimpleExample1 {
    public static void main(String[] args) {
        FileComponent f1 = new FileComponent();
        foo(f1);
        ...
        bar(f1);
    }

    public static void foo(FileComponent f) {
        ...
        f.close();
        ...
    }

    public static void bar(FileComponent f) {
        ...
        f.read();
        ...
    }
}
Finite State Automata
FileComponent may not be read after being closed
[Figure: finite state automaton with states open, closed, err; transition labels: close, read, read, *, close]
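The property on this slide can be sketched as a small transition function. This is a minimal illustration, not the tool's implementation: the class name `FileTypestate` is hypothetical, and the exact edges (self-loops on open and closed, every method keeping err in err) are an assumption about the figure, which only survives as a list of labels.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical encoding of "FileComponent may not be read after being closed". */
final class FileTypestate {
    enum State { OPEN, CLOSED, ERR }

    /** One automaton step; the self-loops are an assumption about the slide's figure. */
    static State step(State s, String method) {
        if (s == State.OPEN) {
            return method.equals("close") ? State.CLOSED : State.OPEN;
        }
        if (s == State.CLOSED) {
            return method.equals("read") ? State.ERR : State.CLOSED;
        }
        return State.ERR; // err is absorbing: any method ("*") stays in err
    }

    /** Runs the automaton over the methods invoked on a single object, starting in OPEN. */
    static State run(List<String> calls) {
        State s = State.OPEN;
        for (String m : calls) {
            s = step(s, m);
        }
        return s;
    }

    public static void main(String[] args) {
        // SimpleExample1 calls close (in foo) and then read (in bar) on the same object.
        System.out.println(run(Arrays.asList("close", "read")));
    }
}
```

Running the automaton is the easy part; the hard part, addressed by the typestate algorithms later in the talk, is deciding statically which calls reach the same object.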
Finite State Stack Automata
Identify database and table access statements

…
private Connection getConnection() {
    return DriverManager.getConnection(DBUrl);
}

public void execute(String query) {
    Connection conn = getConnection();
    Statement stmt = conn.createStatement();
    stmt.execute(query);
}
…

[Figure: automaton with states init, connected, gotStat, executed]
  init → connected on res = DriverManager.getConnection(s); update: DBName := s, conn := res
  connected → gotStat on res = Connection.createStatement(); precondition: conn == target; update: stat := res
  gotStat → executed on Statement.execute(s); precondition: stat == target; update: SQLstmt := s
Agenda
▪ Motivation
▪ IBM Research Projects
  • CARDS
  • SAFE
▪ Pattern Language: Specifying properties
▪ Typestate Algorithms: Identifying property instances
  • Inferring pointer aliasing
  • Handling multiple objects
▪ Combining Static and Dynamic Analyses
TypeState Base algorithm – Single Objects
▪ Based on flow-insensitive global pointer analysis
▪ Concrete objects are represented by a finite set of abstract objects, e.g. one for each allocation site
▪ Iterative algorithm that tracks <o, state>
▪ Each object is handled separately
▪ Handles pointer aliasing conservatively, i.e. weak updates

Example (tracked facts after each statement):
  initially   <o, open>
  f1.close    <o, open>, <o, close>
  …
  f2.read     <o, open>, <o, err>

[Figure: finite state automaton with states open, closed, err; transition labels: close, read, read, *, close]
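A weak update can be sketched as follows. This is a simplified illustration, not the SAFE implementation: the names `WeakUpdateSketch` and `weakUpdate` are hypothetical, the automaton edges are an assumption about the figure, and the sketch tracks a single abstract object o.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of a weak update on the state set of one abstract object. */
final class WeakUpdateSketch {
    enum State { OPEN, CLOSED, ERR }

    /** Automaton step; the edges are an assumption about the slide's figure. */
    static State step(State s, String method) {
        if (s == State.OPEN) {
            return method.equals("close") ? State.CLOSED : State.OPEN;
        }
        if (s == State.CLOSED) {
            return method.equals("read") ? State.ERR : State.CLOSED;
        }
        return State.ERR;
    }

    /**
     * Weak update: the receiver may or may not refer to the tracked object,
     * so every old state is kept and every successor state is added.
     */
    static Set<State> weakUpdate(Set<State> states, String method) {
        Set<State> out = EnumSet.noneOf(State.class);
        out.addAll(states);
        for (State s : states) {
            out.add(step(s, method));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<State> states = EnumSet.of(State.OPEN);
        states = weakUpdate(states, "close"); // f1.close
        states = weakUpdate(states, "read");  // f2.read
        System.out.println(states.contains(State.ERR)); // an alarm is raised if ERR is possible
    }
}
```

Because this sketch keeps every prior state at each step, it can report a superset of the states listed in the slide's trace. In both cases err is considered reachable, which is the source of false alarms that the more precise algorithms try to remove.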
TypeState Uniqueness algorithm
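▪ Compute which abstract objects may represent at most one runtime object
▪ If a pointer may only point to a single unique abstract object, perform a strong update

Example (tracked facts after each statement):
  initially   <o, open>
  f1.close    <o, close>
  …
  f2.read     <o, close>, <o, err>

[Figure: points-to graph over the pointers f1, f2 and the abstract objects o, o’]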
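The choice between a strong and a weak update can be sketched as below. This is an illustration under stated assumptions rather than the actual algorithm: the class and method names are hypothetical, and the points-to sets used in `main` (f1 points only to o, f2 may point to o or o’) are an assumption chosen to be consistent with the trace on this slide.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: strong update only for a unique, unambiguous receiver. */
final class UniquenessSketch {
    enum State { OPEN, CLOSED, ERR }

    static State step(State s, String method) {
        if (s == State.OPEN) {
            return method.equals("close") ? State.CLOSED : State.OPEN;
        }
        if (s == State.CLOSED) {
            return method.equals("read") ? State.ERR : State.CLOSED;
        }
        return State.ERR;
    }

    /**
     * Updates the states of the tracked abstract object for a call on a receiver
     * whose points-to set is pointsTo. uniqueObjects holds the abstract objects
     * known to represent at most one runtime object.
     */
    static Set<State> update(Set<State> states, String method, Set<String> pointsTo,
                             String tracked, Set<String> uniqueObjects) {
        if (!pointsTo.contains(tracked)) {
            return states; // the call cannot touch the tracked object
        }
        boolean strong = pointsTo.size() == 1 && uniqueObjects.contains(tracked);
        Set<State> out = EnumSet.noneOf(State.class);
        if (!strong) {
            out.addAll(states); // weak update keeps the old states
        }
        for (State s : states) {
            out.add(step(s, method));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> unique = Collections.singleton("o");
        Set<String> f1 = Collections.singleton("o");                 // assumed points-to set
        Set<String> f2 = new HashSet<>(Arrays.asList("o", "o'"));    // assumed points-to set
        Set<State> states = EnumSet.of(State.OPEN);
        states = update(states, "close", f1, "o", unique); // strong update
        states = update(states, "read", f2, "o", unique);  // weak update
        System.out.println(states);
    }
}
```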
TypeState with Access Path
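▪ Track which access paths definitely point to the tracked abstract object
  • Perform a strong update

Example (tracked facts after each statement):
  initially   <o, open, {f1}>
  f2 = f1     <o, open, {f1, f2}>
  f1.close    <o, close, {f1, f2}>
  …
  f2.read     <o, err, {f1, f2}>

[Figure: points-to graph over the pointers f1, f2 and the abstract objects o, o’]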
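The trace can be reproduced with a small sketch that carries a must-alias set alongside the state. The names `AccessPathSketch`, `Fact`, `assign`, and `call` are hypothetical, and the handling of receivers outside the must set is deliberately simplified: a real analysis would also consult may-alias information before deciding that a call can affect the object at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of typestate facts of the form <o, state, must-alias paths>. */
final class AccessPathSketch {
    enum State { OPEN, CLOSED, ERR }

    static final class Fact {
        final State state;
        final Set<String> must; // access paths that definitely point to o
        Fact(State state, Set<String> must) {
            this.state = state;
            this.must = must;
        }
    }

    static State step(State s, String method) {
        if (s == State.OPEN) {
            return method.equals("close") ? State.CLOSED : State.OPEN;
        }
        if (s == State.CLOSED) {
            return method.equals("read") ? State.ERR : State.CLOSED;
        }
        return State.ERR;
    }

    /** lhs = rhs: lhs joins the must set if rhs is in it, otherwise lhs leaves it. */
    static Fact assign(Fact f, String lhs, String rhs) {
        Set<String> must = new TreeSet<>(f.must);
        if (must.contains(rhs)) {
            must.add(lhs);
        } else {
            must.remove(lhs);
        }
        return new Fact(f.state, must);
    }

    /** receiver.method(): strong update if receiver must point to o, weak otherwise. */
    static List<Fact> call(Fact f, String receiver, String method) {
        List<Fact> out = new ArrayList<>();
        if (!f.must.contains(receiver)) {
            out.add(f); // weak update: the old fact survives
        }
        out.add(new Fact(step(f.state, method), f.must));
        return out;
    }
}
```

With the initial fact <o, open, {f1}>, applying `assign(f, "f2", "f1")`, then `call(..., "f1", "close")`, then `call(..., "f2", "read")` yields a single fact in state err, so the violation is reported without the spurious open or closed alternatives.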
TypeState for Multiple Objects
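▪ Track memory < {conn = o, stat = o’}, typestate>
▪ On statements
  • Check precondition
  • Update memory

[Figure: automaton with states init, connected, gotStat, executed]
  init → connected on res = DriverManager.getConnection(s); update: DBName := s, conn := res
  connected → gotStat on res = Connection.createStatement(); precondition: conn == target; update: stat := res
  gotStat → executed on Statement.execute(s); precondition: stat == target; update: SQLstmt := s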
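The precondition-then-update scheme for this automaton can be sketched as follows. The class `DbAccessPattern`, the `onCall` method, and the use of plain strings as object identifiers are all hypothetical simplifications; the sketch processes one linear sequence of call events rather than performing a static analysis.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one pattern instance with its memory of bound objects and values. */
final class DbAccessPattern {
    enum State { INIT, CONNECTED, GOT_STAT, EXECUTED }

    State state = State.INIT;
    final Map<String, String> memory = new HashMap<>();

    /**
     * Feeds one call event: method name, receiver object id (null for static calls),
     * string argument (null if none), and result object id (null if none).
     */
    void onCall(String method, String target, String arg, String res) {
        if (state == State.INIT && method.equals("DriverManager.getConnection")) {
            memory.put("DBName", arg);  // DBName := s
            memory.put("conn", res);    // conn := res
            state = State.CONNECTED;
        } else if (state == State.CONNECTED && method.equals("Connection.createStatement")
                && memory.get("conn").equals(target)) {   // precondition: conn == target
            memory.put("stat", res);    // stat := res
            state = State.GOT_STAT;
        } else if (state == State.GOT_STAT && method.equals("Statement.execute")
                && memory.get("stat").equals(target)) {   // precondition: stat == target
            memory.put("SQLstmt", arg); // SQLstmt := s
            state = State.EXECUTED;
        }
    }
}
```

A call on an unrelated Connection or Statement fails the precondition and leaves the instance unchanged, which is how the SQL statement ends up correlated with the connection it was actually executed on.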
Agenda
▪ Motivation
▪ IBM Research Projects
  • CARDS
  • SAFE
▪ Pattern Language: Specifying properties
▪ Typestate Algorithms: Identifying property instances
  • Inferring pointer aliasing
  • Handling multiple objects
▪ Combining Static and Dynamic Analyses
Inferring values by utilizing dynamic information
▪ For some properties data values are of interest
▪ Sparsely log execution (data and control) of a set of predefined method invocations
  • Methods indicated by the properties
  • Common external input methods
▪ Correlate runtime method invocation to the source code according to level of existing monitoring precision
  • Caller-callee
  • Line number
  • Byte code offset
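One way to picture the correlation step is a filter over statically known call sites whose strictness depends on what the log recorded. The sketch below is only an illustration: the types `CallSite` and `LogEntry`, the convention of -1 for a missing line number, and the omission of bytecode offsets are all assumptions, not the format used by CARDS.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: match a logged invocation to candidate call sites in the source. */
final class LogCorrelationSketch {
    static final class CallSite {
        final String caller;
        final String callee;
        final int line;
        CallSite(String caller, String callee, int line) {
            this.caller = caller;
            this.callee = callee;
            this.line = line;
        }
    }

    static final class LogEntry {
        final String caller;
        final String callee;
        final int line;      // -1 when the monitor did not record a line number
        final String value;  // logged data value, e.g. a parameter
        LogEntry(String caller, String callee, int line, String value) {
            this.caller = caller;
            this.callee = callee;
            this.line = line;
            this.value = value;
        }
    }

    /** Returns every call site consistent with the entry at its available precision. */
    static List<CallSite> candidates(LogEntry e, List<CallSite> sites) {
        List<CallSite> out = new ArrayList<>();
        for (CallSite s : sites) {
            boolean sameCall = s.caller.equals(e.caller) && s.callee.equals(e.callee);
            boolean lineOk = e.line < 0 || e.line == s.line;
            if (sameCall && lineOk) {
                out.add(s);
            }
        }
        return out;
    }
}
```

With caller-callee precision only, a logged value is attributed to every matching call site in the caller; a recorded line number narrows the candidates, which is why lower monitoring precision tends to produce more reported values per program point.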
Static and Dynamic Combination
▪ Execute the program and obtain log files of method invocations
▪ Statically perform the typestate algorithm
  • Report pattern instances
▪ Statically perform data value flow of static and dynamic values
  • Report all possible values that may reach program points of interest
▪ Report pattern instances with values
▪ Limitations
  • May report a value at a program point that the value can never actually reach
  • Is not (and can never be) sound
  • May lose precision due to the two-phase approach: typestate and value resolution
Empirical results
▪ CARDS dependency analysis
  • Detects database accesses in J2EE and Java applications
  • Infers call graph from dynamic logging
▪ SAFE error detection
  • Verifies usages of Socket, Vector, Iterator, ...
  • Scaling is good: ~10 min for 100,000 LOC
  • Best typestate checking algorithm verifies 95.6% of candidate statements (i.e. statements that may reach an error state)
  • False alarms are due to
    – Imprecision in pointer aliasing
    – Program logic that implies safety, e.g. a flag indicating whether a vector is empty or not