IBM Haifa Research Lab: Software Asset Management Group. IBM PLE Seminar 2005. © 2005 IBM Corporation. Safety Checks and Semantic Understanding via Program Analysis Techniques. Nurit Dor. Joint Work: Eran Yahav, Inbal Ronen, Sara Porat.


Page 1:

IBM Haifa Research Lab: Software Asset Management Group

IBM PLE Seminar 2005 © 2005 IBM Corporation

Safety Checks and Semantic Understanding via Program Analysis Techniques

Nurit Dor

Joint Work: Eran Yahav, Inbal Ronen, Sara Porat

Page 2:

Goal

• Find properties of a program
  – Anti-patterns that indicate potential bugs
  – Semantic patterns that have a meaning of interest

• Technology
  – Lightweight specifications
  – Conservative (sound) static analysis
  – Combining static and dynamic analyses

• Challenges
  – Scale to real programs
  – Produce a reasonable number of false positives
  – Utilize dynamic information as much as possible

Page 3:

“Finding Bugs is Easy”

FileComponent f = new FileComponent();
f.close();
f.read();

Not really…

public class SimpleExample1 {
    public static void main(String[] args) {
        FileComponent f1 = new FileComponent();
        foo(f1);
        ...
        bar(f1);
    }

    public static void foo(FileComponent f) {
        ...
        f.close();
        ...
    }

    public static void bar(FileComponent f) {
        ...
        f.read();
        ...
    }
}

Page 4:

Understanding program dependency is easy?

private Connection getConnection() {
    if (…)
        return DriverManager.getConnection(DBUrl);
    else {
        Context initial = new InitialContext();
        DataSource dataSource = (DataSource) initial.lookup(DSName);
        return dataSource.getConnection();
    }
}

public void execute(String query) {
    Connection conn = getConnection();
    Statement stmt = conn.createStatement();
    stmt.execute(query); // which DB and table is accessed?
}

Not really…

Page 5:

Finding Properties is Hard

• Handling “non-local” properties
  – Interprocedural analysis
• Producing a reasonable number of false positives
  – “Not finding non-bugs” is hard
• Correlating statements, e.g. which SQL statement relates to which database connection
• Inferring values and not just control and data flow
  – Determining values that can occur at runtime
• Scaling to real programs

Page 6:

Agenda

• Motivation
• IBM Research Projects
  – CARDS
  – SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  – Inferring pointer aliasing
  – Handling multiple objects
• Combining Static and Dynamic Analyses

Page 7:

CAPA – Common Architecture for Program Analysis

• IBM Research cross-lab project
• Goal: a program analysis infrastructure effort to help Research
  – quickly create software lifecycle applications that exploit various flavors of program analysis
  – foster sharing and collaboration between groups
  – speed technology transfer to product groups

Page 8:

CARDS (Combining Analyses: Runtime, Dynamic and Static)

HRL

Goal: End-to-end impact analysis

What happens if I change a database table?

Page 9:

• Scalable and flexible error detection (“bug finding”) and verification
  – Detecting violations of simple correctness properties
  – Verifying the absence of such violations
• Wide range of techniques
  – Detecting common bug patterns based on an XML representation of a program
  – Integrated pointer analysis and interprocedural typestate checking
• More precise than existing tools (i.e. fewer false alarms)
• Experimental version deployed to early adopters within IBM SWG

Watson Research project

Page 10:

Agenda

• Motivation
• IBM Research Projects
  – CARDS
  – SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  – Inferring pointer aliasing
  – Handling multiple objects
• Combining Static and Dynamic Analyses

Page 11:

Specifying pattern

• Used for modeling runtime properties of interest
  – Sequences of method invocations that have a specific semantic meaning
    • Data flow relationships: the result of one invocation is the target or a parameter of a second invocation
    • Control flow relationships: order may or may not be meaningful
    • Some method invocations are semantically equivalent: usage of abstract patterns and inheritance
  – List of values (parameters, return values, ...) to resolve
• Patterns are written in XML and converted into an automaton
• A pattern instance is a set of specific method calls that are detected in the code
  – The same method call can be part of several pattern instances (of the same or of different patterns)

Page 12:

Finite State Automaton

FileComponent may not be read after being closed

[Figure: automaton with states open, closed and err. close moves open to closed; read moves closed to err; the remaining labels (read, close, *) are self-loops.]

public class SimpleExample1 {
    public static void main(String[] args) {
        FileComponent f1 = new FileComponent();
        foo(f1);
        ...
        bar(f1);
    }

    public static void foo(FileComponent f) {
        ...
        f.close();
        ...
    }

    public static void bar(FileComponent f) {
        ...
        f.read();
        ...
    }
}
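The FileComponent property can be written down as a small transition function. The following is only a sketch: the class and method names are made up for illustration, and it assumes the self-loops are read on open, close on closed, and any call on err.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative encoding of the FileComponent typestate property; all names are hypothetical.
final class FileComponentAutomaton {
    enum State { OPEN, CLOSED, ERR }

    // Transition function: the state reached after observing one method call.
    static State step(State s, String method) {
        switch (s) {
            case OPEN:
                // close moves to CLOSED; read (and any untracked call) stays in OPEN
                return method.equals("close") ? State.CLOSED : State.OPEN;
            case CLOSED:
                // read after close is the violation; a second close is a self-loop
                return method.equals("read") ? State.ERR : State.CLOSED;
            default:
                // ERR is absorbing: the "*" self-loop
                return State.ERR;
        }
    }

    // Runs the automaton over a call sequence, starting in OPEN.
    static State run(List<String> calls) {
        State s = State.OPEN;
        for (String m : calls) {
            s = step(s, m);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("read", "close", "read")));
    }
}
```

A checker built on such an automaton reports a potential bug whenever some tracked object can reach ERR.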

Page 13:

Finite State Stack Automata

Identify database and table access statements

[Figure: automaton with states init, connected, gotStat, executed]
• init → connected on res = DriverManager.getConnection(s); update: DBName := s, conn := res
• connected → gotStat on res = Connection.createStatement(); precondition: conn == target; update: stat := res
• gotStat → executed on Statement.execute(s); precondition: stat == target; update: SQLstmt := s

private Connection getConnection() {
    return DriverManager.getConnection(DBUrl);
}

public void execute(String query) {
    Connection conn = getConnection();
    Statement stmt = conn.createStatement();
    stmt.execute(query);
}

Page 14:

Agenda

• Motivation
• IBM Research Projects
  – CARDS
  – SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  – Inferring pointer aliasing
  – Handling multiple objects
• Combining Static and Dynamic Analyses

Page 15:

TypeState Base algorithm – Single Objects

• Based on flow-insensitive global pointer analysis
• Concrete objects are represented by a finite set of abstract objects, e.g. one for each allocation site
• Iterative algorithm that tracks <o, state>
• Each object is handled separately
• Handles pointer aliasing conservatively, i.e. with weak updates

Example (tracked facts before and after each statement):

<o, open>
f1.close
…
<o, open>, <o, closed>
f2.read
<o, open>, <o, err>

[Figure: the FileComponent automaton with states open, closed, err and transitions close, read, *]
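A weak update can be sketched as follows. This is a simplified illustration, not the SAFE implementation: the names are hypothetical, a single abstract object o is tracked, and every receiver is assumed to possibly point to it. Because the call may or may not hit the concrete object in question, each old state is kept next to its successor.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the base algorithm for one abstract object.
final class WeakUpdateSketch {
    enum State { OPEN, CLOSED, ERR }

    // Same FileComponent automaton as on the slide (self-loops assumed).
    static State step(State s, String method) {
        if (s == State.OPEN) return method.equals("close") ? State.CLOSED : State.OPEN;
        if (s == State.CLOSED) return method.equals("read") ? State.ERR : State.CLOSED;
        return State.ERR;
    }

    // Weak update: keep every old state and add the state each one steps to.
    static Set<State> weakUpdate(Set<State> states, String method) {
        Set<State> out = EnumSet.noneOf(State.class);
        out.addAll(states);
        for (State s : states) {
            out.add(step(s, method));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<State> facts = EnumSet.of(State.OPEN);
        facts = weakUpdate(facts, "close");   // f1.close
        System.out.println("after f1.close: " + facts);
        facts = weakUpdate(facts, "read");    // f2.read
        System.out.println("after f2.read:  " + facts);
    }
}
```

ERR becomes reachable here even if f1 and f2 never refer to the same runtime object, which is one way such a conservative analysis produces false alarms.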

Page 16:

TypeState Uniqueness algorithm

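• Compute which abstract objects may represent at most one runtime object
• If a pointer may only point to a single unique abstract object, perform a strong update

Example (tracked facts before and after each statement):

<o, open>
f1.close
<o, closed>
f2.read
<o, closed>, <o, err>

[Figure: points-to graph over the pointers f1, f2 and the abstract objects o, o’]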
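The choice between a strong and a weak update can be sketched like this. The points-to sets are an assumption made for the example (f1 points only to o, f2 may point to o or o'), as are all names; the slide's figure does not survive in the transcript.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: strong update only when the receiver has exactly one
// possible target and that abstract object stands for at most one runtime object.
final class UniquenessSketch {
    enum State { OPEN, CLOSED, ERR }

    static State step(State s, String method) {
        if (s == State.OPEN) return method.equals("close") ? State.CLOSED : State.OPEN;
        if (s == State.CLOSED) return method.equals("read") ? State.ERR : State.CLOSED;
        return State.ERR;
    }

    static Set<State> update(Set<State> states, String method,
                             Set<String> receiverPointsTo, String tracked, boolean trackedIsUnique) {
        if (!receiverPointsTo.contains(tracked)) {
            return states; // the call cannot affect the tracked object
        }
        boolean strong = trackedIsUnique && receiverPointsTo.size() == 1;
        Set<State> out = EnumSet.noneOf(State.class);
        if (!strong) {
            out.addAll(states); // weak update keeps the old states
        }
        for (State s : states) {
            out.add(step(s, method));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> f1 = Collections.singleton("o");              // assumed points-to set
        Set<String> f2 = new HashSet<>(Arrays.asList("o", "o'")); // assumed points-to set
        Set<State> facts = EnumSet.of(State.OPEN);
        facts = update(facts, "close", f1, "o", true);
        System.out.println("after f1.close: " + facts);
        facts = update(facts, "read", f2, "o", true);
        System.out.println("after f2.read:  " + facts);
    }
}
```

Under these assumptions the close is a strong update, so open is dropped, while the read through f2 remains a weak update.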

Page 17:

TypeState with Access Path

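• Track which access paths definitely point to the tracked abstract object
  – Perform a strong update

Example (tracked facts before and after each statement):

<o, open, {f1}>
f2 = f1
<o, open, {f1, f2}>
f1.close
<o, closed, {f1, f2}>
f2.read
<o, err, {f1, f2}>

[Figure: points-to graph over the pointers f1, f2 and the abstract objects o, o’]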
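The access-path example can be replayed with a small sketch that keeps a set of must-alias variables next to the state set. All names are hypothetical, only simple variables (no field paths) are modeled, and a receiver outside the must set is conservatively treated as a possible alias.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of typestate tracking with a must-alias set for one abstract object.
final class AccessPathSketch {
    enum State { OPEN, CLOSED, ERR }

    Set<State> states = EnumSet.of(State.OPEN);
    final Set<String> must = new HashSet<>(); // variables that definitely point to the object

    AccessPathSketch(String allocVar) {
        must.add(allocVar); // e.g. f1 = new FileComponent()
    }

    static State step(State s, String method) {
        if (s == State.OPEN) return method.equals("close") ? State.CLOSED : State.OPEN;
        if (s == State.CLOSED) return method.equals("read") ? State.ERR : State.CLOSED;
        return State.ERR;
    }

    // lhs = rhs: lhs joins the must set if rhs is in it, otherwise lhs leaves it.
    void assign(String lhs, String rhs) {
        if (must.contains(rhs)) {
            must.add(lhs);
        } else {
            must.remove(lhs);
        }
    }

    // receiver.method(): strong update if the receiver must point to the object.
    void call(String receiver, String method) {
        Set<State> out = EnumSet.noneOf(State.class);
        if (!must.contains(receiver)) {
            out.addAll(states); // weak update (simplification: may-alias assumed)
        }
        for (State s : states) {
            out.add(step(s, method));
        }
        states = out;
    }

    public static void main(String[] args) {
        AccessPathSketch o = new AccessPathSketch("f1");
        o.assign("f2", "f1");
        o.call("f1", "close");
        o.call("f2", "read");
        System.out.println(o.states + " " + o.must);
    }
}
```

With f2 known to alias f1, both calls are strong updates and the error is reported as the only possible outcome rather than as one of several.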

Page 18:

TypeState for Multiple Objects

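• Track memory <{conn = o, stat = o’}, typestate>
• On statements
  – Check precondition
  – Update memory

[Figure: automaton with states init, connected, gotStat, executed]
• init → connected on res = DriverManager.getConnection(s); update: DBName := s, conn := res
• connected → gotStat on res = Connection.createStatement(); precondition: conn == target; update: stat := res
• gotStat → executed on Statement.execute(s); precondition: stat == target; update: SQLstmt := s

……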
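The precondition check and memory update can be sketched for the database-access pattern as follows. This is an illustration only: abstract objects are represented by plain strings, the class and method names are invented, and the URL and SQL text in main are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one pattern instance that spans a Connection and a Statement.
final class DbAccessPattern {
    enum Typestate { INIT, CONNECTED, GOT_STAT, EXECUTED }

    Typestate state = Typestate.INIT;
    final Map<String, String> memory = new HashMap<>(); // conn, stat, DBName, SQLstmt

    // res = DriverManager.getConnection(s)
    void getConnection(String s, String res) {
        if (state != Typestate.INIT) return;
        memory.put("DBName", s);
        memory.put("conn", res);
        state = Typestate.CONNECTED;
    }

    // res = target.createStatement(); precondition: conn == target
    void createStatement(String target, String res) {
        if (state != Typestate.CONNECTED || !target.equals(memory.get("conn"))) return;
        memory.put("stat", res);
        state = Typestate.GOT_STAT;
    }

    // target.execute(s); precondition: stat == target
    void execute(String target, String s) {
        if (state != Typestate.GOT_STAT || !target.equals(memory.get("stat"))) return;
        memory.put("SQLstmt", s);
        state = Typestate.EXECUTED;
    }

    public static void main(String[] args) {
        DbAccessPattern p = new DbAccessPattern();
        p.getConnection("jdbc:example:db", "o");  // placeholder URL
        p.createStatement("other", "x");          // ignored: not the tracked connection
        p.createStatement("o", "o'");
        p.execute("o'", "SELECT 1");              // placeholder SQL
        System.out.println(p.state + " " + p.memory);
    }
}
```

A statement whose precondition fails leaves this instance untouched, which is how a call on an unrelated connection is kept out of the pattern instance.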

Page 19:

Agenda

• Motivation
• IBM Research Projects
  – CARDS
  – SAFE
• Pattern Language: Specifying properties
• Typestate Algorithms: Identifying property instances
  – Inferring pointer aliasing
  – Handling multiple objects
• Combining Static and Dynamic Analyses

Page 20:

Inferring values by utilizing dynamic information

• For some properties, data values are of interest
• Sparsely log execution (data and control) of a set of predefined method invocations
  – Methods indicated by the properties
  – Common external input methods
• Correlate runtime method invocations to the source code according to the precision of the existing monitoring
  – Caller-callee
  – Line number
  – Byte code offset

Page 21:

Static and Dynamic Combination

• Execute the program and obtain log files of method invocations
• Statically perform the typestate algorithm
  – Report pattern instances
• Statically perform data value flow of static and dynamic values
  – Report all possible values that may reach program points of interest
• Report pattern instances with values
• Limitations
  – May report values on a program point that can never reach this point
  – Is not (and can never be) sound
  – May lose precision due to the two-phase approach: typestate and value resolution
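The value-flow step can be pictured as a worklist propagation over a flow graph in which some nodes are seeded with constants found statically and others with values taken from the log. The sketch below is a loose illustration under that assumption: the node names, seed values and graph shape are invented, and the propagation ignores control flow entirely.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: propagate static and logged values to program points of interest.
final class ValueFlowSketch {
    // seeds: node -> values known there; flowsTo: node -> nodes its values flow into.
    static Map<String, Set<String>> propagate(Map<String, Set<String>> seeds,
                                              Map<String, List<String>> flowsTo) {
        Map<String, Set<String>> values = new HashMap<>();
        Deque<String> worklist = new ArrayDeque<>();
        for (Map.Entry<String, Set<String>> e : seeds.entrySet()) {
            values.put(e.getKey(), new TreeSet<>(e.getValue()));
            worklist.add(e.getKey());
        }
        while (!worklist.isEmpty()) {
            String node = worklist.poll();
            List<String> succs = flowsTo.getOrDefault(node, Collections.<String>emptyList());
            for (String succ : succs) {
                Set<String> target = values.computeIfAbsent(succ, k -> new TreeSet<>());
                if (target.addAll(values.get(node))) {
                    worklist.add(succ); // new values arrived, so revisit the successor
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> seeds = new HashMap<>();
        seeds.put("constant@Config", new TreeSet<>(Arrays.asList("jdbc:example:static")));
        seeds.put("logged@readInput", new TreeSet<>(Arrays.asList("jdbc:example:dynamic")));
        Map<String, List<String>> flowsTo = new HashMap<>();
        flowsTo.put("constant@Config", Arrays.asList("DBUrl"));
        flowsTo.put("logged@readInput", Arrays.asList("DBUrl"));
        flowsTo.put("DBUrl", Arrays.asList("getConnection.arg"));
        System.out.println(propagate(seeds, flowsTo).get("getConnection.arg"));
    }
}
```

Because every value that reaches a node is merged into one set, the sketch also exhibits the first limitation listed above: a value can be reported at a point that no real execution brings it to.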

Page 22:

Empirical results

• CARDS dependency analysis
  – Detects database accesses in J2EE and Java applications
  – Infers call graph from dynamic logging
• SAFE error detection
  – Verifies usages of Socket, Vector, Iterator, ...
  – Scaling is good: ~10 min for 100,000 LOC
  – Best typestate checking algorithm verifies 95.6% of candidate statements (i.e. statements that may reach an error state)
  – False alarms are due to
    • imprecision in pointer aliasing
    • program logic that implies safety, e.g. a flag indicating whether a vector is empty or not