STRINGS AND AUTOMATA MODULO THEORIES
Margus Veanes
July 18, 2015
SMT'15, San Francisco


MOTIVATION

• Symbolic execution
  – Path feasibility analysis involving string constraints
  – Regular expression matching
• Security vulnerabilities
  – SQL injection attacks
  – XSS attacks
  – DoS attacks
    • e.g. regex injection
  – Directory traversal attacks
  – …
• Data processing
  – Parallelization
  – Deforestation
• Malware detection

Annotation: [OWASP] top 1, 3 culprits

Example URL: http://foo.bar.system/scripts/..%c1%1c../winnt/system32/cmd.exe?/c+dir+c:\

“EARLY” WORK RELATED TO STRING ANALYSIS

• Tools
  – Mona: Henriksen-Jensen-Jørgensen-Klarlund-Paige-Rauhe-Sandholm, TACAS’95
    • Built on BRICS automata library
  – JSA: Christensen-Møller-Schwartzbach, SAS’03 (uses BRICS)
  – Haderach: Shannon-Hajra-Lee-Zhan-Khurshid, MUTATION’07 (uses BRICS)
• Theory
  – Bjørner, PhD Thesis’98, Decision procedure for queues
  – Blumensath-Grädel, LICS’00 (automatic structures)
  – Benedikt-Libkin-Schwentick-Segoufin, LICS’01 (regular string relations)
  – Khoussainov-Nies-Rubin-Stephan, LICS’04 (automatic Boolean algebras)
  – Bala, STACS’04 (regular term matching)
  – Kunc, DLT’2007 (complexity of language equations)

THE RISE OF THE STRING ANALYZERS

• String theory encodings in SMT:
  – Pex-LL: Bjørner-Tillmann-Voronkov, TACAS’09 (strings + SMT)
  – Reggae: Li-Xie-Tillmann-deHalleux-Schulte, ASE’09 (symbolic exploration of regex code)
  – Z3-str: Zheng-Zhang-Ganesh, ESEC/FSE 2013 (plugin to Z3)
  – CVC4-str: Liang-Reynolds-Tinelli-Barrett-Deters, CAV’14 (DPLL(TSLRp))
  – S3: Trinh-Chu-Jaffar, CCS’14 (uses Z3-str-star)
• Automata related:
  – Stranger: Yu-Alkhalaf-Bultan-Ibarra-Cova, SPIN’08, TACAS’09, TACAS’10 (automata based)
  – DPRLE: Hooimeijer-Weimer, PLDI’09 (subset checking)
  – Hampi: Kiezun-Ganesh-Guo-Hooimeijer-Ernst, ISSTA’09 (best paper award) (reduction to BV)
  – Kaluza (in Kudzu): Saxena-Akhawe-Hanna-Mao-McCamant-Song, Oakland’10 (Hampi + mult. var.)
  – Rex: Veanes-deHalleux-Tillmann-Bjørner-deMoura, ICST’10, LPAR’2010 (language acceptors)
  – Bek: Hooimeijer-Livshits-Molnar-Saxena-Veanes-Bjørner, USENIX Security'11, POPL’12 (transducers)
  – Bex: D’Antoni-Veanes, VMCAI’13, CAV’13 (lookahead)
  – PASS: Li-Ghosh, HVC 2013 (best paper award) (array based)
  – SMC: Luu-Shinde-Saxena-Demsky, PLDI’14 (model counting)
• CAV’15:
  – ABC: Aydin-Bang-Bultan (automata based counting, using Stranger and BRICS)
  – NORN: Abdulla-Atig-Chen-Holik-Rezine-Rümmer-Stenman, also CAV’14 (Horn clauses, BRICS)
  – Z3-str+: Zheng-Ganesh-Subramanian-Tripp-Dolby-Zhang (string + regex + length)

TWO QUESTIONS

• What are characters?
• What are strings?

smileycipher(“hello world”) = “ 😧😤😫😫😮😶😮😱😫😣”

Is this a string function?

WHAT ARE CHARACTERS?

1. Elements of a finite alphabet Σ?
  – Only primitive operation is =: Σ × Σ → Bool
  – What about Unicode, e.g., 😀 😁 http://unicode.org/charts/PDF/U1F600.pdf
    • |Σ| = 1,112,064
  – For succinctness allow a total order ≺: Σ × Σ → Bool and ranges [a-b] (denotes {x | a ≼ x ≼ b})
    • This affects the notion of automaton over Σ!
    • Why not other operations as well?

2. Bit-vectors, say char (BV16)?
  – With primitive operations like &: char × char → char
  – “😀” = “\uD83D\uDE00” (UTF-16 surrogate pair)
    • Σ has its own theory, namely bv theory!

3. Integers (code points)?
  – 😀 = 0x1F600 = 128512
  – e.g. 😀 + 1 = 😁 = 0x1F601
    • Σ has its own theory, namely int theory!
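The three views of a character can be checked concretely. The following Python sketch is not from the talk; the variable names are illustrative, and it only restates the numbers quoted above for U+1F600.

```python
# Views of the character U+1F600 (grinning face).
ch = "\U0001F600"

# Integer (code point) view: character arithmetic is integer arithmetic.
code_point = ord(ch)            # 0x1F600
next_ch = chr(code_point + 1)   # U+1F601

# Bit-vector view: in UTF-16 the character is a pair of 16-bit code units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]

# Finite-alphabet view: number of Unicode scalar values, i.e. all code
# points up to 0x10FFFF minus the 2048 surrogate code points.
alphabet_size = 0x110000 - (0xDFFF - 0xD800 + 1)

print(hex(code_point), [hex(u) for u in units], alphabet_size)
```

Which view is convenient depends on the operations a solver has to reason about: equality only, bitwise operations, or arithmetic.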

WHAT ARE STRINGS?

• Finite sequences of characters (char)
  – CVC4-str: singleton string = char
• Restricted arrays of int to char
  – Pex-LL, PASS: array<int,char> ≠ char, singleton string ≠ char
• Finite lists of characters
  – Pex-Rex: list<char> ≠ char, singleton string ≠ char
• Finite queues
  – transducers

The answer depends on the context and the required operations.
  – First, Last, Rest, Append, Substring, Length, …

ANALYSIS TASKS

• Consider character type C, string type S<C>, and regular expression type R<C>.
  – When is DPLL(T_C, T_S<C>, T_R<C>) possible/feasible?
• What about (finite state) transducers?
  – Regular transformations of type S<Tin> → S<Tout>
  – Typically Tin = Tout = bit-vectors
  – Many string transformations are such:
    • sanitizers, encoders

HTML ENCODER

[Figure: HTML encoder source code; annotation: arithmetic operations on characters]

FOR EACH DOMAIN SPECIFIC TASK

Design a language that
• only has the features required by the task
• is simple to use
• enables automatic reasoning about what the programs do
• compiles into efficient code

THE REST OF THE TALK

• Symbolic Automata and Transducers
• BEK and string sanitizers
• BEX and string encoders
• Data parallel BEK/BEX for string processing

SYMBOLIC FINITE AUTOMATA

SYMBOLIC FINITE AUTOMATON (SFA)

• Labels are predicates

One symbolic transition between states p and q:
    λx. 'a' ≤ x ≤ 'd'

denotes many concrete transitions between p and q, one for each x ∈ 〚'a' ≤ x ≤ 'd'〛:
    'a', 'b', 'c', 'd'

SFA EXECUTION EXAMPLE

[Figure: SFA with states p and q; each state has one outgoing transition labeled λx. x mod 2 = 0 and one labeled λx. x mod 2 = 1]

Input:  1 2 5 3
Run:    p p q p p

p is final ⇒ accept the input
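The run above can be replayed with a small simulator. This is a minimal Python sketch, not code from the talk: predicates are modeled as plain Python functions, and the target of q's even-guarded transition is not visible in the run, so a self-loop on q is assumed.

```python
from typing import Callable, Dict, List, Tuple

Predicate = Callable[[int], bool]

# state -> list of (guard, target).
# Assumption: q's even-guarded move loops on q (the slide's run never uses it).
TRANSITIONS: Dict[str, List[Tuple[Predicate, str]]] = {
    "p": [(lambda x: x % 2 == 1, "p"), (lambda x: x % 2 == 0, "q")],
    "q": [(lambda x: x % 2 == 1, "p"), (lambda x: x % 2 == 0, "q")],
}
INITIAL, FINAL = "p", {"p"}

def run(word: List[int]) -> List[str]:
    """Return the sequence of states visited by this deterministic SFA."""
    state, trace = INITIAL, [INITIAL]
    for symbol in word:
        # Follow the transition whose guard the symbol satisfies.
        state = next(t for guard, t in TRANSITIONS[state] if guard(symbol))
        trace.append(state)
    return trace

def accepts(word: List[int]) -> bool:
    return run(word)[-1] in FINAL

trace = run([1, 2, 5, 3])
print(trace, accepts([1, 2, 5, 3]))
```

Note that the alphabet here is all of the integers: the automaton has four transitions regardless of how many concrete symbols exist.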

SYMBOLIC FINITE AUTOMATA
What is the alphabet?

ALPHABET IS AN EFFECTIVE BOOLEAN ALGEBRA

(D, P, 〚_〛, ⊥, ⊤, ∨, ∧, ¬)

• D: domain
• P: predicates
• 〚_〛: P → 2^D

ALPHABET EXAMPLE

2^{a,b} = (D, P, 〚_〛, ⊥, ⊤, ∨, ∧, ¬) where
  D = {a,b}
  P = {∅, {a}, {b}, {a,b}}
  〚_〛 = id
  ⊥ = ∅, ⊤ = {a,b}
  ∨ = ∪, ∧ = ∩, ¬ = complement

regex: a*b(a|b)*

SFA over 2^{a,b}: states p and q; p loops on {a}, p moves to q on {b}, q loops on {a,b}

ALPHABET EXAMPLE: 2^BVk

• D = {n | 0 ≤ n < 2^k}
• P = BDDs of depth k
• Boolean operations are BDD operations

Below, 〚β_i〛 = {n ∈ D | i'th bit of n is 1}

[Figure: BDD β_i] β_i has fixed size independent of i

ALPHABET EXAMPLE: SMT^INT

• D = Integers
• P = integer linear arithmetic formulas (with one fixed free variable)
• 〚φ ∧ ψ〛 = 〚φ〛 ∩ 〚ψ〛
• 〚⊥〛 = ∅, 〚¬φ〛 = D \ 〚φ〛
• Satisfiability: 〚φ〛 ≠ ∅

BOOLEAN ALGEBRA INTERFACE IN C#

    public interface IBoolAlg<P>
    {
        P Top { get; }
        P Bot { get; }
        P Not(P pred);
        P Or(P pred1, P pred2);
        P And(P pred1, P pred2);
        bool IsSat(P predicate);
    }

    public interface IBoolAlgExt<P, D> : IBoolAlg<P>
    {
        IEnumerable<D> Den(P pred);  // elements denoted by the predicate
        P One(D elem);               // predicate denoting a single element
    }

UNIT ALPHABET EXAMPLE IN C#

    // One-letter alphabet
    class A1 : IBoolAlg<bool>
    {
        public bool Top { get { return true; } }
        public bool Bot { get { return false; } }
        public bool Not(bool pred) { return !pred; }
        public bool Or(bool pred1, bool pred2) { return pred1 || pred2; }
        public bool And(bool pred1, bool pred2) { return pred1 && pred2; }
        public bool IsSat(bool pred) { return pred; }
    }

ANOTHER ALPHABET EXAMPLE IN C#

    // 16-letter alphabet: bit i of a predicate is set iff letter i satisfies it.
    // The casts are needed because C# bitwise operators on UInt16 return int.
    class A16 : IBoolAlg<UInt16>
    {
        public UInt16 Top { get { return 0xFFFF; } }
        public UInt16 Bot { get { return 0; } }
        public UInt16 Not(UInt16 pred) { return unchecked((UInt16)~pred); }
        public UInt16 Or(UInt16 pred1, UInt16 pred2) { return (UInt16)(pred1 | pred2); }
        public UInt16 And(UInt16 pred1, UInt16 pred2) { return (UInt16)(pred1 & pred2); }
        public bool IsSat(UInt16 pred) { return pred != 0; }
    }

16-letter alphabet
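For readers who want to experiment outside C#, the same bitmask algebra can be sketched in Python. The class and method names below are illustrative, not part of Automata-.NET; `one` and `den` are one plausible reading of `One` and `Den` in `IBoolAlgExt` (singleton predicate and enumeration of the denotation).

```python
from typing import Iterator

class BitmaskAlgebra:
    """Predicates are n-bit masks; bit i set means letter i satisfies it."""

    def __init__(self, letters: int = 16) -> None:
        self.letters = letters
        self.top = (1 << letters) - 1   # every letter (0xFFFF for 16 letters)
        self.bot = 0                    # no letter

    def not_(self, pred: int) -> int:
        # Python integers are unbounded, so mask the complement to n bits.
        return ~pred & self.top

    def or_(self, p1: int, p2: int) -> int:
        return p1 | p2

    def and_(self, p1: int, p2: int) -> int:
        return p1 & p2

    def is_sat(self, pred: int) -> bool:
        return pred != 0

    def one(self, letter: int) -> int:
        """Predicate satisfied by exactly one letter."""
        return 1 << letter

    def den(self, pred: int) -> Iterator[int]:
        """Enumerate the letters denoted by a predicate."""
        return (i for i in range(self.letters) if (pred >> i) & 1)

alg = BitmaskAlgebra(16)
some = alg.or_(alg.one(0), alg.one(4))          # letters 0 and 4
print(sorted(alg.den(alg.not_(some))))
```

Every Boolean operation and the satisfiability check are constant-time here, which is what makes this algebra "effective" in the sense used above.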

ALPHABET TRANSFORMATIONS

• Effective Boolean algebras can be extended
  – e.g. disjoint union
• Effective Boolean algebras can be restricted
  – e.g. restriction wrt. a given predicate

DISJOINT UNION OF ALPHABETS IN C#

    public class PairAlg<S, T> : IBoolAlg<Pair<S, T>>
    {
        IBoolAlg<S> A;
        IBoolAlg<T> B;
        public Pair<S, T> Bot { get { return new Pair<S, T>(A.Bot, B.Bot); } }
        …
        // a[0] and a[1] denote the first and second component of a pair
        public Pair<S, T> Or(Pair<S, T> a, Pair<S, T> b)
        {
            return new Pair<S, T>(A.Or(a[0], b[0]), B.Or(a[1], b[1]));
        }
        public bool IsSat(Pair<S, T> p)
        {
            return A.IsSat(p[0]) || B.IsSat(p[1]);
        }
    }
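The members elided in the C# class follow the same component-wise pattern. A minimal Python sketch of the idea is shown below; `SetAlgebra` and `PairAlgebra` are hypothetical names introduced only for this illustration, with finite sets standing in for the component algebras.

```python
class SetAlgebra:
    """Power-set algebra: predicates are frozensets over a finite domain."""
    def __init__(self, domain):
        self.top, self.bot = frozenset(domain), frozenset()
    def not_(self, p): return self.top - p
    def or_(self, p, q): return p | q
    def and_(self, p, q): return p & q
    def is_sat(self, p): return bool(p)

class PairAlgebra:
    """Disjoint union of algebras a and b: a predicate is a pair (pa, pb)."""
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.top, self.bot = (a.top, b.top), (a.bot, b.bot)
    def not_(self, p):
        return (self.a.not_(p[0]), self.b.not_(p[1]))
    def or_(self, p, q):
        return (self.a.or_(p[0], q[0]), self.b.or_(p[1], q[1]))
    def and_(self, p, q):
        return (self.a.and_(p[0], q[0]), self.b.and_(p[1], q[1]))
    def is_sat(self, p):
        # An element comes from either side, so one satisfiable half suffices.
        return self.a.is_sat(p[0]) or self.b.is_sat(p[1])

letters, digits = SetAlgebra("ab"), SetAlgebra("01")
union = PairAlgebra(letters, digits)
only_a = (frozenset("a"), digits.bot)
print(union.is_sat(only_a),
      union.is_sat(union.and_(only_a, union.not_(only_a))))
```

The disjunction in `is_sat` is what distinguishes a disjoint union from a product of alphabets, where both components would have to be satisfiable.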

SFA VS. CLASSICAL AUTOMATA?

• SFAs can support infinite alphabets
• For some cases SFAs are exponentially more succinct than NFAs
  Example (recall the BDDs β_i from before):
  [Figure: example SFA]
  Equivalent NFA requires 2^k transitions.

SYMBOLIC FINITE AUTOMATA
Algorithms over SFAs.

ALGORITHMS OVER SFAS

• Language intersection
  – Uses product of automata
• Language complementation
  – Requires determinization
• Minimization
  – Extensions of Moore/Hopcroft [POPL’14]
• Regex → SFA construction
  – Uses BDDs to represent Unicode character sets
  – Requires BDD ↔ interval-set conversions
    • May cause exponential blowup: recall the BDDs β_i

LANGUAGE INTERSECTION

• Uses DFS and product of transitions

[Figure: a transition p1 → q1 of A and a transition p2 → q2 of B combine into a transition (p1,p2) → (q1,q2) of A×B; the product transition is deleted (X) when its guard is unsat]
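The product construction can be sketched as follows. This is an illustrative Python version under simplifying assumptions, not the Automata-.NET implementation: guards are frozensets of characters, so conjunction is set intersection and satisfiability is non-emptiness, and the function and variable names are made up for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pred = FrozenSet[str]                       # guard = set of accepted characters
Moves = Dict[str, List[Tuple[Pred, str]]]   # state -> [(guard, target)]

def intersect(moves_a: Moves, init_a: str, final_a: Set[str],
              moves_b: Moves, init_b: str, final_b: Set[str]):
    """Product of two SFAs, explored depth-first from the initial pair."""
    init = (init_a, init_b)
    moves, seen, stack = {}, {init}, [init]
    while stack:
        pa, pb = state = stack.pop()
        moves[state] = []
        for guard_a, qa in moves_a.get(pa, []):
            for guard_b, qb in moves_b.get(pb, []):
                guard = guard_a & guard_b    # conjunction of the two guards
                if not guard:                # unsatisfiable: drop the move
                    continue
                moves[state].append((guard, (qa, qb)))
                if (qa, qb) not in seen:
                    seen.add((qa, qb))
                    stack.append((qa, qb))
    final = {s for s in seen if s[0] in final_a and s[1] in final_b}
    return moves, init, final

# A accepts a*b(a|b)*; B accepts all words over {b, c}.
a_moves = {"p": [(frozenset("a"), "p"), (frozenset("b"), "q")],
           "q": [(frozenset("ab"), "q")]}
b_moves = {"s": [(frozenset("bc"), "s")]}
moves, init, final = intersect(a_moves, "p", {"q"}, b_moves, "s", {"s"})
print(moves, final)
```

Only pairs reachable through satisfiable guards are ever created, which is why the satisfiability check matters for the size of the product.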

INTERSECTION EXAMPLE

let φ_k(x) ≝ ((x mod k) = 0)

[Figure: SFA A with states a1, a2 and SFA B with states b1, b2, with guards built from φ_2, φ_3 and φ_6, and their product A×B over the state pairs a1 b1, a2 b2 and a1 b2; one product transition has an unsatisfiable guard and is deleted (X)]

LANGUAGE COMPLEMENTATION
First determinize, then swap final and nonfinal states

[Figure: an SFA with states p, q, r is determinized into an SFA with states {p}, {q}, {q,r}, {r}; unsat guards are deleted]

MINIMIZATION (SYMBOLIC MOORE)

    D := (F × (Q\F)) ∪ ((Q\F) × F)
    foreach (p’,q’) ∈ D, (p,q) ∉ D
        if (IsSat(guard(p,p’) ∧ guard(q,q’)))
            add (p,q) to D

[Figure: transitions p → p’ with guard φ and q → q’ with guard ψ; if p’ and q’ are distinguishable and IsSat(φ ∧ ψ), then p and q are distinguishable]
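The marking loop above can be written out as a fixpoint computation. The sketch below is a simplified Python rendering, assuming a deterministic and total SFA whose guards are frozensets (so IsSat(φ ∧ ψ) is non-empty intersection); the function name and the example automaton are hypothetical.

```python
from itertools import product

def distinguishable_pairs(states, final, moves):
    """moves: state -> [(guard, target)]; the SFA is deterministic and total."""
    # Start with all pairs that differ on finality.
    D = {(p, q) for p, q in product(states, states)
         if (p in final) != (q in final)}
    changed = True
    while changed:
        changed = False
        for p, q in product(states, states):
            if (p, q) in D:
                continue
            # p and q are distinguishable if some common symbol leads
            # to an already distinguishable pair of successors.
            if any((g1 & g2) and (p2, q2) in D
                   for g1, p2 in moves[p] for g2, q2 in moves[q]):
                D.add((p, q))
                changed = True
    return D

# States 0 and 1 behave identically: both reach the final state 2 on 'b'.
moves = {0: [(frozenset("a"), 1), (frozenset("b"), 2)],
         1: [(frozenset("a"), 0), (frozenset("b"), 2)],
         2: [(frozenset("ab"), 2)]}
D = distinguishable_pairs([0, 1, 2], {2}, moves)
print(sorted(D))
```

Pairs that never enter D are equivalent and can be merged; in the example, states 0 and 1 collapse into one.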

REGEX → SFA

• Classical algorithm extended to work with predicates
  – First produces an εSFA (SFA with ε-moves)
  – Then ε-moves are eliminated using the standard ε-elimination algorithm
  – Requires an interval-set → BDD algorithm for converting character classes
    Example: [\0x0-\0xFF] = BDD whose bits in pos. > 7 are 0

ONLINE SFA ALGORITHM EXAMPLES

• http://www.rise4fun.com/Bex/zE

SYMBOLIC FINITE TRANSDUCERS

SYMBOLIC FINITE TRANSDUCER (SFT)

• Labels are guarded transformation functions

Symbolic transition from p to q:
    λx. 80₁₆ ≤ x ≤ 7FF₁₆ / [C0₁₆ | x₁₀,₆, 80₁₆ | x₅,₀]
    (guard / bitvector operations)

Concrete transitions from p to q (1920 transitions):
    ‘\x80’/“\xC2\x80”
    …
    ‘\x7FF’/“\xDF\xBF”
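The symbolic transition above is the two-byte case of UTF-8 encoding, which makes it easy to check against a standard encoder. In this Python sketch the function names are illustrative; x₁₀,₆ and x₅,₀ are read as the bit slices 10..6 and 5..0 of x.

```python
def guard(x: int) -> bool:
    # 0x80 <= x <= 0x7FF
    return 0x80 <= x <= 0x7FF

def output(x: int) -> bytes:
    # [C0 | x[10:6], 80 | x[5:0]]
    return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])

# Enumerate the concrete transitions over 16-bit characters.
concrete = [x for x in range(0x10000) if guard(x)]
print(len(concrete), output(0x80), output(0x7FF))
```

A single guarded label therefore stands in for every concrete input/output pair in that range, which is the succinctness argument made on the slide.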

SFT EXECUTION EXAMPLE

[Figure: SFT with states p and q; transitions labeled x mod 2 = 0/[x, x], x mod 2 = 1/[x-1], x mod 2 = 0/[] and x mod 2 = 1/[x-1]]

Input tape:   1 2 5 3
Run:          p p q p p
Output tape:  0 2 2 4 2
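The input and output tapes above can be reproduced with a few lines of Python. This sketch is not from the talk: guards and output functions are plain lambdas, and the target of q's even-guarded transition (the one with empty output) is not determined by the run shown, so a self-loop is assumed.

```python
# state -> [(guard, output function, target)]
SFT = {
    "p": [(lambda x: x % 2 == 1, lambda x: [x - 1], "p"),
          (lambda x: x % 2 == 0, lambda x: [x, x], "q")],
    "q": [(lambda x: x % 2 == 1, lambda x: [x - 1], "p"),
          (lambda x: x % 2 == 0, lambda x: [], "q")],   # target assumed
}

def transduce(word, state="p"):
    """Return (visited states, output tape) for a deterministic SFT."""
    states, out = [state], []
    for x in word:
        # Pick the transition whose guard accepts x and apply its output.
        guard, emit, target = next(t for t in SFT[state] if t[0](x))
        out.extend(emit(x))
        state = target
        states.append(state)
    return states, out

states, out = transduce([1, 2, 5, 3])
print(states, out)
```

Each step may emit zero, one, or several output symbols, so the output tape need not have the same length as the input tape.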

SYMBOLIC FINITE TRANSDUCERS
Properties and algorithms

WHY SFTS?

• They have good algebraic properties (POPL'12)
  – SFTs are closed under composition
  – Equivalence is decidable in the single-valued case
  – The domain of an SFT is an SFA
    • SFAs are closed under Boolean operations
• Useful for various analysis tasks

SFT COMPOSITION

A∘B = λx. B(A(x))

A:    a1 --( x>0 / [x+1, x+2] )--> a2
B:    b1 --( x<5 / [] )--> b2 --( x<4 / [x, x] )--> b3
A∘B:  a1b1 --( x>0 ∧ x+1<5 ∧ x+2<4 / [x+2, x+2] )--> a2b3
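The composed label can be checked against running A and then B. In the Python sketch below each transducer is hand-coded as a function for this one example (hypothetical helper names); `None` stands for "no transition applies".

```python
def run_a(x):
    # a1 --( x>0 / [x+1, x+2] )--> a2
    return [x + 1, x + 2] if x > 0 else None

def run_b(ys):
    # b1 --( y<5 / [] )--> b2 --( y<4 / [y, y] )--> b3
    y1, y2 = ys
    return [y2, y2] if y1 < 5 and y2 < 4 else None

def composed(x):
    # a1b1 --( x>0 ∧ x+1<5 ∧ x+2<4 / [x+2, x+2] )--> a2b3
    return [x + 2, x + 2] if x > 0 and x + 1 < 5 and x + 2 < 4 else None

def sequential(x):
    mid = run_a(x)
    return None if mid is None else run_b(mid)

accepted = [x for x in range(-10, 10) if composed(x) is not None]
print(accepted, composed(1))
```

The composed guard is obtained by substituting A's output terms into B's guards; an SMT solver is what decides whether the resulting conjunction is satisfiable (here only x = 1 qualifies).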

SFT ALGORITHMS

• Composition:
  [Figure: SFT A followed by SFT B (the output of A is the input of B) is replaced by a single SFT A∘B]
• Equiv. checking for single-valued SFTs (undecidable in general):
  [Figure: SFT A and SFT B are compared; the result is an “input string” showing that A and B are not equivalent]

Algorithms use SMT for satisfiability checking of character formulas

PROPERTY ANALYSIS (USENIX SEC'11)

• Does it matter if a sanitizer is applied twice? Idempotence:
  [Figure: A∘A is compared with A; the result is an “input string” showing that A is not idempotent]
• Does order of sanitizers matter? Commutativity:
  [Figure: A∘B is compared with B∘A; the result is an “input string” showing that A and B are not commutative]

APPLICATIONS

APPLICATIONS OF SFAS/SFTS

• SFAs:
  – Regex support in parameterized unit testing
  – Fuzz testing of regexes
  – Password generation
• SFTs:
  – Analysis of string encoders/decoders
  – Security analysis of sanitizers

APPLICATION 1: REGEXES IN PARAMETERIZED UNIT TESTING

• Rex component in Pex
• Generate values for s that reach the return branches
  – s is a string of Unicode characters (16-bit bit-vectors)

    bool IsValidEmail(string s)
    {
        string r1 = @"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$";
        string r2 = @"^\d.*$";
        if (System.Text.RegularExpressions.Regex.IsMatch(s, r1))
            if (System.Text.RegularExpressions.Regex.IsMatch(s, r2))
                return false; //branch 1
            else
                return true;  //branch 2
        else
            return false;     //branch 3
    }

Branch 1, solve: s ∈ L(r1) ∩ L(r2) [e.g. s = “[email protected]”]

Branch 2, solve: s ∈ L(r1) \ L(r2) [e.g. s = “[email protected]”]

Branch 3, solve: s ∉ L(r1) [e.g. s = “[email protected]”]
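To see which inputs land in which branch, the two regexes can be tried directly. The Python sketch below only classifies given strings with the standard `re` module; it does not generate inputs the way Rex does, and the three sample strings are hypothetical values chosen for this illustration.

```python
import re

R1 = re.compile(r"^[A-Za-z0-9]+@(([A-Za-z0-9\-])+\.)+([A-Za-z\-])+$")
R2 = re.compile(r"^\d.*$")

def branch(s: str) -> int:
    """Return the number of the return branch that IsValidEmail(s) reaches."""
    if R1.search(s):
        return 1 if R2.search(s) else 2
    return 3

# Hypothetical sample inputs, one per branch.
samples = ["1a@b.c", "a@b.c", "a.b"]
print([branch(s) for s in samples])
```

Finding such inputs automatically amounts to solving the three membership constraints listed above, which is where intersection and complement of SFAs come in.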

APPLICATION 2: PASSWORD GENERATION

Given constraints:
• Length is k: "^[\x21-\x7E]{k}$"
• Contains 2 capital letters: "[A-Z].*[A-Z]"
• Contains a digit: "\d"
• Contains a non-word character: "\W"

Generate random instances with uniform distribution that match all the above conditions.

k=4: http://www.rise4fun.com/Rex/4nE

http://www.rise4fun.com/Bek/c3j
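As a point of comparison, the same constraints can be met by plain rejection sampling. This Python sketch is explicitly not the automata-based generation used by Rex: it draws uniformly from all strings of length k over \x21-\x7E and keeps the first one matching every regex, which is also uniform over the matching strings but can be slow when matches are rare. The function names are made up for the example.

```python
import random
import re

def constraints(k: int):
    """Compile the four constraints from the slide for a given length k."""
    patterns = (r"^[\x21-\x7E]{%d}$" % k, r"[A-Z].*[A-Z]", r"\d", r"\W")
    return [re.compile(p) for p in patterns]

PRINTABLE = [chr(c) for c in range(0x21, 0x7F)]

def random_password(k: int, rng: random.Random) -> str:
    if k < 4:
        # Two capitals, a digit and a non-word character need 4 positions.
        raise ValueError("no password of this length satisfies the constraints")
    regexes = constraints(k)
    while True:
        candidate = "".join(rng.choice(PRINTABLE) for _ in range(k))
        if all(r.search(candidate) for r in regexes):
            return candidate

rng = random.Random(0)
password = random_password(4, rng)
print(password)
```

An automaton for the intersection of the four languages avoids the wasted draws, since every path through it already satisfies all constraints.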

APPLICATION 3: SAFETY ANALYSIS

Example: suppose good output = “NoEars”
    NoEars = [^\uDE38-\uDE40]*
    bad output: WithEars = Complement(NoEars)

Does there exist an input x that causes “ears” in the output?
    ∃x (smileycipher(x) ∈ WithEars)?
    {x | smileycipher(x) ∈ WithEars}

http://www.rise4fun.com/Bek/5sHO

EXTENSIONS

EXTENSIONS OF SFAS AND SFTS

• ESFT
  – SFA/SFTs with look-ahead [CAV'13]
  – BEX language
• STT
  – Symbolic automata/transducer over trees
  – FAST language [PLDI’14]
• k-SFT
  – SFT with lookback [POPL’15]

ESFAS AND ESFTS

• Unlike in the classical case, look-ahead breaks many properties
  – e.g. equivalence of ESFAs is undecidable

Example ESFT (base64 encoder) with a single state q and the transition:
    x1≤FF ∧ x2≤FF ∧ x3≤FF / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F]

The above ESFT reads 3 and writes 4 symbols:
    M a n ↦ T W F u

http://www.rise4fun.com/Bex/tutorial/guide
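The transition's output list can be evaluated directly. In this Python sketch (function names are illustrative) the four outputs are treated as 6-bit digits and mapped to the standard Base64 alphabet in a separate step; padding of inputs whose length is not a multiple of 3 is left out.

```python
import base64
import string

B64_ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "+/")

def esft_step(x1: int, x2: int, x3: int):
    """One ESFT transition: read three bytes, write four 6-bit digits."""
    if not (x1 <= 0xFF and x2 <= 0xFF and x3 <= 0xFF):
        raise ValueError("guard violated")
    return [x1 >> 2,
            ((x1 & 3) << 4) | (x2 >> 4),
            ((x2 & 0xF) << 2) | (x3 >> 6),
            x3 & 0x3F]

def encode(data: bytes) -> str:
    if len(data) % 3 != 0:
        raise ValueError("padding is not handled in this sketch")
    digits = []
    for i in range(0, len(data), 3):
        digits.extend(esft_step(*data[i:i + 3]))
    return "".join(B64_ALPHABET[d] for d in digits)

print(esft_step(*b"Man"), encode(b"Man"))
```

Reading three symbols at once is exactly the look-ahead that a plain SFT lacks; without it the encoder would need extra states to remember partially consumed bytes.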

FAST (TREE TRANSDUCERS)

• Trees are common input/output data structures
  – XML query, type-checking, etc…
  – Natural Language translators (from parse tree to parse tree)
  – Compilers/optimizers (from parse tree to parse tree)
  – Tree manipulating programs: data structures algorithms, ontologies, etc…
  – Augmented Reality
  – http://www.rise4fun.com/Fast/tutorial/guide

OUR RECIPE FOR EACH TASK

[Figure: pipeline diagram with the labels DSL, Transducer Model, Analysis question, Automata-.NET, Z3, Transformation, Analysis (Does it do the right thing?), Code Gen, C#, JavaScript, C]

DSL example:

    s := iter(c in t)[b := false;] {
        case (!b && c in "[\"\\]"):
            b := false;
            yield('\\', c);
        case (c == '\\'):
            b := !b;
            yield(c);
        case (true):
            b := false;
            yield(c);
    };

Automata-.NET will be open source on GitHub under MIT license

Some references:

BEK
• Fast and precise sanitizer analysis with BEK
  Hooimeijer, Livshits, Molnar, Saxena, Veanes, USENIX11
• Symbolic finite state transducers: algorithms and applications
  Veanes, Hooimeijer, Livshits, Molnar, Bjorner, POPL12

BEX
• Static analysis of string encoders and decoders
  D’Antoni, Veanes, VMCAI13
• Equivalence of extended symbolic finite transducers
  D’Antoni, Veanes, CAV13
• Data parallel string manipulating programs
  Veanes, Mytkowicz, Molnar, Livshits, POPL15

QUESTIONS?

Links to related online tutorials:
– Bek: http://rise4fun.com/Bek/tutorial
– Bex: http://rise4fun.com/Bex/tutorial
– Rex: http://rise4fun.com/rex/
– Fast: http://rise4fun.com/Fast/tutorial