
CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs

George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer

University of California, Berkeley

Proc. of Conference on Compiler Construction, 2002

INDEX

Author

Questions

Overview

Introduction

Evaluation

1st AUTHOR

George C. Necula

Scott McPeak

S. P. Rahul

Westley Weimer

George C. Necula

George C. Necula, Philip Wadler: Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008. ACM 2008

Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)

François Pottier, George C. Necula: Proceedings of TLDI'07: 2007 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation, Nice, France, January 16, 2007 ACM 2007

Jeremy Condit, Matthew Harren, Zachary R. Anderson, David Gay, George C. Necula: Dependent Types for Low-Level Programming. ESOP 2007: 520-535

Bor-Yuh Evan Chang, Xavier Rival, George C. Necula: Shape Analysis with Structural Invariant Checkers. SAS 2007: 384-401

Ajay Chander, David Espinosa, Nayeem Islam, Peter Lee, George C. Necula: Enforcing resource bounds via static verification of dynamic checks. ACM Trans. Program. Lang. Syst. 29(5): (2007)

CONTINUED

Jens Knoop, George C. Necula, Wolf Zimmermann: Preface. Electr. Notes Theor. Comput. Sci. 176(3): 1-2 (2007)

Sumit Gulwani, George C. Necula: A polynomial-time algorithm for global value numbering. Sci. Comput. Program. 64(1): 97-114 (2007)

George C. Necula: Using Dependent Types to Port Type Systems to Low-Level Languages. CC 2006: 1

Feng Zhou, Jeremy Condit, Zachary R. Anderson, Ilya Bagrak, Robert Ennals, Matthew Harren, George C. Necula, Eric A. Brewer: SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. OSDI 2006: 45-60

Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, George C. Necula: XFI: Software Guards for System Address Spaces. OSDI 2006: 75-88

Bor-Yuh Evan Chang, Matthew Harren, George C. Necula: Analysis of Low-Level Code Using Cooperating Decompilers. SAS 2006: 318-335

Bor-Yuh Evan Chang, Adam J. Chlipala, George C. Necula: A Framework for Certified Program Analysis and Its Applications to Mobile-Code Safety. VMCAI 2006: 174-189

Scott McPeak

Scott McPeak, George C. Necula: Data Structure Specifications via Local Equality Axioms. CAV 2005: 476-490

George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst. 27(3): 477-526 (2005)

Scott McPeak, George C. Necula: Elkhound: A Fast, Practical GLR Parser Generator. CC 2004: 73-88

Jeremy Condit, Matthew Harren, Scott McPeak, George C. Necula, Westley Weimer: CCured in the real world. PLDI 2003: 232-244

George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228

George C. Necula, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy code. POPL 2002: 128-139

Dan Bonachea, Eugene Ingerman, Joshua Levy, Scott McPeak: An Improved Adaptive Multi-Start Approach to Finding Near-Optimal Solutions to the Euclidean TSP. GECCO 2000: 143-150

S. P. Rahul

George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228

George C. Necula, Shree Prakash Rahul: Oracle-based checking of untrusted software. POPL 2001: 142-154

Westley Weimer

Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, Claire Le Goues: A genetic programming approach to automated software repair. GECCO 2009: 947-954

Raymond P. L. Buse, Westley Weimer: The road not taken: Estimating path execution frequency statically. ICSE 2009: 144-154

Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest: Automatically finding patches using genetic programming. ICSE 2009: 364-374

Pieter Hooimeijer, Westley Weimer: A decision procedure for subset constraints over regular languages. PLDI 2009: 188-198

Tamim I. Sookoor, Timothy W. Hnat, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: Macrodebugging: global views of distributed program execution. SenSys 2009: 141-154

Claire Le Goues, Westley Weimer: Specification Mining with Few False Positives. TACAS 2009: 292-306

CONTINUED

Nicholas Jalbert, Westley Weimer: Automated duplicate detection for bug tracking systems. DSN 2008: 52-61

Kinga Dobolyi, Westley Weimer: Changing Java's Semantics for Handling Null Pointer Exceptions. ISSRE 2008: 47-56

Raymond P. L. Buse, Westley Weimer: A metric for software readability. ISSTA 2008: 121-130

Raymond P. L. Buse, Westley Weimer: Automatic documentation inference for exceptions. ISSTA 2008: 273-282

Xiang Yin, John C. Knight, Elisabeth A. Nguyen, Westley Weimer: Formal Verification by Reverse Synthesis. SAFECOMP 2008: 305-319

Timothy W. Hnat, Tamim I. Sookoor, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: MacroLab: a vector-based macroprogramming framework for cyber-physical systems. SenSys 2008: 225-238

Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)

2nd QUESTIONS

Q1: What does the recursive structure transformation look like in CIL?

Q2: How is the integration of a CFG into the intermediate language implemented?

Q3: How do they make code immune to stack-smashing attacks?

Q4: What are the difficulties in designing the whole-program merger and what about implementation?

Q5: How does the merger deal with .lib and .dll?

Drawbacks of C

Phenomenon: the same syntax can have different meanings.

What about a low-level representation?

It has no ambiguities, but at the cost of losing structural information about types, loops, and other high-level constructs.
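One common instance of "same syntax, different meanings" is the indexing expression. The sketch below is an illustration chosen for this transcript, not an example taken from the slides; the function names are hypothetical.

```c
/* The expression x[1] is spelled the same way in both functions,
   but it denotes different operations depending on the type of x. */
static int index_array(void) {
    int x[2] = {10, 20};
    return x[1];   /* x is an array: the element lives at (address of x) + offset */
}

static int index_pointer(void) {
    int storage[2] = {10, 20};
    int *x = storage;
    return x[1];   /* x is a pointer: first load x, then read at (value of x) + offset */
}
```

Both functions return the same value, yet an analysis working on the raw syntax has to consult the type of x to know which memory accesses take place.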

3rd OVERVIEW

CIL

CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and also higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program.

Feature

The main advantage of CIL is that it compiles all valid C programs into a few core constructs with very clean semantics.

Translating from CIL to C is fairly easy.

Q1: What does the recursive structure transformation look like in CIL?

Q2: How is the integration of a CFG into the intermediate language implemented? After the transformation, call Cil.computeCFGInfo, which computes the CFG information for all statements (the successors and predecessors of each statement) and returns a list of the statements.
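Cil.computeCFGInfo itself is part of CIL's OCaml API. The C sketch below only illustrates the idea of storing successor and predecessor links directly on each statement; struct stmt, add_edge and build_example_cfg are hypothetical names, and the four-statement program is made up for the example.

```c
#define MAX_EDGES 4

/* Hypothetical statement node that carries its own CFG edges. */
struct stmt {
    int sid;                 /* statement number */
    int nsuccs, npreds;      /* number of successors / predecessors */
    int succs[MAX_EDGES];    /* ids of successor statements */
    int preds[MAX_EDGES];    /* ids of predecessor statements */
};

/* Record an edge in both directions. */
static void add_edge(struct stmt *stmts, int from, int to) {
    stmts[from].succs[stmts[from].nsuccs++] = to;
    stmts[to].preds[stmts[to].npreds++] = from;
}

/* CFG for:  s0: if (c)   s1: x = 1;   else   s2: x = 2;   s3: return x; */
static void build_example_cfg(struct stmt stmts[4]) {
    for (int i = 0; i < 4; i++) {
        stmts[i].sid = i;
        stmts[i].nsuccs = 0;
        stmts[i].npreds = 0;
    }
    add_edge(stmts, 0, 1);   /* then-branch */
    add_edge(stmts, 0, 2);   /* else-branch */
    add_edge(stmts, 1, 3);   /* both branches join at the return */
    add_edge(stmts, 2, 3);
}
```

Keeping the edges on the statements means an analysis can walk the flow graph without building a separate graph structure next to the syntax tree.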

4th INTRODUCTION

Basic components

Compilation( C ---> CIL)

A whole-program merger

Representative application

BASIC COMPONENTS

Lvalue

An lvalue is expressed as a pair of a base plus an offset. The base address can be either the starting address for the storage for a variable (local or global) or any pointer expression.
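The base-plus-offset view can be checked in plain C. The sketch below is an illustration under assumed names (struct point, the variable g, and both functions are hypothetical): it recomputes the address of g.coords[2] and of p->coords[2] from a base and an offset.

```c
#include <stddef.h>

struct point { int x; int coords[3]; };

static struct point g;

/* g.coords[2]: the base is the start of the storage of variable g,
   the offset is "field coords, then index 2". */
static int *lval_var_base(void) {
    char *base = (char *)&g;
    size_t offset = offsetof(struct point, coords) + 2 * sizeof(int);
    return (int *)(base + offset);
}

/* p->coords[2]: the base is the pointer expression p, the offset is the same. */
static int *lval_ptr_base(struct point *p) {
    char *base = (char *)p;
    size_t offset = offsetof(struct point, coords) + 2 * sizeof(int);
    return (int *)(base + offset);
}
```

The two cases differ only in where the base comes from, which is the distinction the slide draws between a variable base and a pointer-expression base.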

BASIC COMPONENTS

Expression & Instruction

Note:

Casts are inserted explicitly to make the program conform to our type system.
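As a rough illustration of explicit casts, the two functions below compute the same value; the second spells out conversions that C performs implicitly in the first. The exact casts CIL emits may differ, and the function names are hypothetical.

```c
/* Source form: the conversions are implicit. */
static double implicit_casts(int i, char c) {
    double d = i / 2 + c;
    return d;
}

/* Same computation with every conversion written out. */
static double explicit_casts(int i, char c) {
    double d = (double)(i / 2 + (int)c);
    return d;
}
```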

BASIC COMPONENTS

Statement

BASIC COMPONENTS

Types

CIL moves all type declarations to the beginning of the program and gives them global scope.

All anonymous composite types are given unique names in CIL, and every composite type has its own declaration at the top level.
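A sketch of what this hoisting could look like follows. The original declaration is shown in a comment; the generated name anon_struct_origin_1 is a hypothetical placeholder, not the name CIL actually produces.

```c
/* Original declaration, with one anonymous and one nested composite type:

   struct shape {
       struct { int x, y; } origin;
       struct size { int w, h; } dims;
   };
*/

/* After hoisting: every composite type is declared at the top level,
   and the anonymous struct has received a (hypothetical) unique name. */
struct anon_struct_origin_1 { int x; int y; };
struct size { int w; int h; };
struct shape {
    struct anon_struct_origin_1 origin;
    struct size dims;
};

static int shape_sum(const struct shape *s) {
    return s->origin.x + s->origin.y + s->dims.w + s->dims.h;
}
```

With this layout, a single scan of the top-level declarations finds every composite type in the program.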

BASIC COMPONENTS

Attributes

It is often useful to have a mechanism for the programmer to communicate additional information to the program analysis.

The type attributes for a base type must be specified immediately following the type.

The type attributes for a pointer type must be specified immediately after the * symbol.

The attributes for a function type or for an array type can be specified using parenthesized declarators.

COMPILATION

One of the most significant transformations is that expressions that contain side effects are separated into statements.
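The effect of separating side effects can be sketched by hand. In the example below, the second function is a manual rewrite of the first with one side effect per statement; the temporary names tmp_idx and tmp_call are hypothetical, and CIL's actual output may order or name things differently.

```c
static int next(int *counter) { return (*counter)++; }

/* Source form: one expression with two side effects (i++ and the call). */
static int original(int *a, int i) {
    int n = 0;
    int x = a[i++] + next(&n);
    return x + i + n;
}

/* Separated form: each side effect is its own statement,
   and the remaining expression is side-effect free. */
static int separated(int *a, int i) {
    int n = 0;
    int tmp_idx;
    int tmp_call;
    int x;
    tmp_idx = i;
    i = i + 1;
    tmp_call = next(&n);
    x = a[tmp_idx] + tmp_call;
    return x + i + n;
}
```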

Type specifiers are interpreted and normalized.

Nested structure tag definitions are pulled apart. This means that all structure tag definitions can be found by a simple scan of the globals.

Prototypes are added for those functions that are called before being defined. Furthermore, if a prototype exists but does not specify the types of the parameters, CIL fixes that.

Initializers are normalized to include specific initialization for the missing elements.
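Normalizing an initializer does not change the program, because C already zero-fills the elements that a partial initializer leaves out. A minimal check, with a hypothetical function name:

```c
#include <string.h>

/* Returns 1 when the partial and the fully spelled-out initializer
   produce identical arrays. */
static int same_initializers(void) {
    int partial[4] = {1, 2};          /* source form: last two elements omitted */
    int full[4]    = {1, 2, 0, 0};    /* normalized form: every element explicit */
    return memcmp(partial, full, sizeof full) == 0;
}
```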

CIL will remove from the source file those type declarations, local variables and inline functions that are not used in the file. This means that your analysis does not have to see all the ugly stuff that comes from the header files.

Local variables in inner scopes are pulled to function scope (with appropriate renaming). Local scopes thus disappear. This makes it easy to find and operate on all local variables in a function.
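The scope flattening can be pictured with a shadowed variable. The second function below is a hand-written equivalent of the first with all locals at function scope; the renamed variable x_0 is a hypothetical name, not necessarily the one CIL generates.

```c
/* Source form: the inner block declares a second x that shadows the outer one. */
static int nested_scopes(int n) {
    int x = n;
    {
        int x = n * 2;
        n += x;
    }
    return x + n;
}

/* Flattened form: both variables live at function scope,
   and the inner one has been renamed to avoid the clash. */
static int flattened(int n) {
    int x;
    int x_0;
    x = n;
    x_0 = n * 2;
    n += x_0;
    return x + n;
}
```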

A WHOLE-PROGRAM MERGER

A tool that merges all of a program's compilation units into a single compilation unit, with proper renaming to preserve semantics. The motivation is that many analyses are most effective when applied to the whole program.

Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?

File-scope identifiers must be renamed properly to avoid clashes with globals and with similar identifiers in different files.

Solution: Structural Equivalence VS Name Equivalence

For each file there are two merging phases. In the first phase we merge the types and tags. Then, in the second phase, we rewrite the variable declarations and function bodies.
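The renaming problem can be sketched with two files that each define a file-scope counter. The merged unit below is written by hand; the file names a.c and b.c and the renamed identifier counter_0 are hypothetical.

```c
/* a.c:  static int counter;  int bump_a(void) { return ++counter; }
   b.c:  static int counter;  int bump_b(void) { counter += 10; return counter; }

   Merged into one compilation unit, the two file-scope variables would
   clash, so the one from b.c is renamed. */
static int counter;      /* from a.c */
static int counter_0;    /* from b.c, renamed (hypothetical name) */

int bump_a(void) { return ++counter; }
int bump_b(void) { counter_0 += 10; return counter_0; }
```

After renaming, each function still updates only its own file's counter, so the merged program behaves like the separately compiled one.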

REPRESENTATIVE APP

Q3: How do they make code immune to stack-smashing attacks?

CIL modifies the program to maintain a separate stack for return addresses. Even if a buffer overrun attack occurs, the correct return address will be taken from the special stack.
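Portable C cannot touch real return addresses, so the sketch below only simulates the idea: a struct stands in for a stack frame, an integer stands in for the return address, and every name (struct frame, shadow_stack, simulated_call) is hypothetical. It is not the code that the CIL transformation generates.

```c
#include <stddef.h>
#include <string.h>

#define SHADOW_DEPTH 16

/* Simulated stack frame: a buffer followed by the saved return address. */
struct frame { char buf[8]; int ret_addr; };

static int shadow_stack[SHADOW_DEPTH];   /* separate stack for return addresses */
static int shadow_top = 0;

/* Simulates a call that copies unchecked input into its frame.
   Stores the (possibly clobbered) in-frame return address in *frame_ret and
   returns the address taken from the shadow stack, or -1 if that stack is full. */
int simulated_call(const void *input, size_t len, int ret_addr, int *frame_ret) {
    struct frame f;
    if (shadow_top >= SHADOW_DEPTH) return -1;
    f.ret_addr = ret_addr;
    shadow_stack[shadow_top++] = ret_addr;   /* prologue: save a second copy */
    if (len > sizeof f) len = sizeof f;      /* keep the simulation itself in bounds */
    memcpy(&f, input, len);                  /* "overrun": may overwrite f.ret_addr */
    *frame_ret = f.ret_addr;
    return shadow_stack[--shadow_top];       /* epilogue: trust the shadow copy */
}
```

When the input is long enough to overwrite the in-frame return address, the value from the shadow stack is still the one that was saved on entry.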

5th EVALUATION

CIL has been tested very extensively. It is able to process the SPECINT95 benchmarks, the Linux kernel, GIMP and other open-source projects.

CIL was tested against GCC’s c-torture testsuite and (except for the tests involving complex numbers and inner functions, which CIL does not currently implement) CIL passes most of the tests. Specifically CIL fails 23 tests out of the 904 c-torture tests that it should pass. GCC itself fails 19 tests.

Thank you!

More information at http://hal.cs.berkeley.edu/cil/