Transcript slides
Equivalence of Extended Symbolic Finite Transducers Presented By: Loris D’Antoni Joint work with: Margus Veanes Outline 1. Symbolic Automata and Transducers 2. Extended Symbolic Automata and Transducers – Some negative results – Some positive results 3. A friendlier restriction with decidable equivalence 2 Motivations Automata and Transducers are great!! Used in many applications (NLP, XML, program analysis, regex matching…) Can only handle finite alphabets Do not scale when the alphabet is very big (UTF16 has 216 elements) 3 Symbolic Finite Automata (SFA) [POPL12] λx. x mod 2=1 λx. x mod 2=0 λx. x mod 2=0 p Initial state Final states q λx. x mod 2=1 Set of states Symbolic transition function: labeled with a predicate 4 Symbolic Finite Automata (SFA) [POPL12] Execution Example λx. x mod 2=1 λx. x mod 2 =0 λx. x mod 2=0 p q λx. x mod 2=1 1 p 2 p 5 q 3 p p p is final accept the input 5 Symbolic Finite Transducers (SFT) [POPL12] p λx.x mod 2 = 0 / [λx.x+1, λx.x+2] Input guard = predicate (here int bool) q Output = sequence of functions from input theory to output theory (here int int) 6 Symbolic Finite Transducers (SFT) [POPL12] x mod 2 =1/[x-1] x mod 2 =0/[] x mod 2 =0/[x, x] p q x mod 2 =1/[x-1] Input tape 1 p Output tape 2 p 0 5 q 2 3 p 2 p 4 2 7 Closure and Decidability Properties All closure properties and decidability results from classical automata theory still hold • Alphabet theory is required to be – A Boolean algebra (closed under Boolean operations) – Decidable (we can check for satisfiability) • Example: SFA intersection x>5 q1 q2 x<10 p1 q1 p1 x>5 ∧ x<10 q2 p2 p2 8 Applications • Analysis of .NET regular expressions (use the theory of bit-vectors for input alphabet) • Automatic password generation • Analysis of string sanitizers (BEK) A limitation of Symbolic Transducers BASE64 encoder Text content M a n Bytes 77 97 110 Bit Pattern 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1 0 Index 19 22 5 46 Base64 Encoded T W F u 3 Bytes 4 Base64 characters Reading one input at a time will cause a blowup in the number of states! 10 Outline 1. Symbolic Automata and Transducers 2. Extended Symbolic Automata and Transducers – Some negative results – Some positive results 3. A friendlier restriction with decidable equivalence 11 Extended Symbolic Finite Automaton x1>0 ∧ (x2<x3) x1 x2 x3 1 7 8 3 p p … p 3 Reads sequences of 3 consecutive symbols [x1,x2,x3] Extended Symbolic Finite Transducers x1≤FF ∧ x2≤FF ∧ x3≤FF / [x1>>2, ((x1&3)<<4)|(x2>>4), ((x2&0xF)<<2)|(x3>>6), x3&0x3F] M p 3 a n p Each output symbol can be a function of all the 3 symbols … p T W F u … 12 A common misconception All the results in classical automata theory trivially extend to the symbolic setting… A common misconception While for the previous models (SFAs, SFTs) most results extend to the symbolic setting… In the finite case they do not add expressiveness In finite alphabet setting reading multiple input symbols at a time does not matter ab/[cde] 0 0 1 a/[] 1 b/[cde] 2 15 ESFAs are more expressive than SFAs This is not true for the symbolic case x1>x2 0 1 ? 16 Emptiness of ESFAs Intersection: UNDECIDABLE • Given two ESFAs A and B, is there an input accepted by both A and B? • The problem is undecidable: – Given a two counter machine M we construct two ESFAs A and B such that A ∩ B is empty iff M does not halt on any input 17 Proof that Emptiness of ESFA Intersection is undecidable (1/2) Machine M 1. Inc(a) 2. Dec(a) 3. Inc(b) 4. if(a=0) goto 3 else goto 5 5. Dec(b) 6. Halt Encode M’s run as following sequence a 0 1 0 0 0 0 … b 0 0 0 1 1 2 … PC 1 2 3 4 3 4 … 1. Inc(a) 18 Proof that Emptiness of ESFA Intersection is undecidable (1/2) Machine M 1. Inc(a) 2. Dec(a) 3. Inc(b) 4. if(a=0) goto 3, goto 5 5. Dec(b) 6. Halt Intersection is empty if the two counter machine doesn’t halt a 0 1 0 0 0 0 … b 0 0 0 1 1 2 … PC 1 2 3 4 3 4 … We are only checking half of the configurations x1.pc=1 ∧ x2.pc=2 ∧ x2.a=x1.a+1 ∧ x1.b=x2.b V ……… V x1.pc=4 ∧ x2.pc=3 ∧ x1.a=x2.a ∧ x1.a=0 ∧ x1.b=x2.b V x1.pc=4 ∧ x2.pc=5 ∧ x1.a=x2.a ∧ ¬x1.a=0 ∧ x1.b=x2.b V …… x1.pc=6 2 0 1 x1.pc=6 1 1 19 Other Negative Results Universality of ESFA is undecidable ESFA equivalence is undecidable ESFAs are not closed under intersection ESFAs are not closed under complement Nondeterministic ESFAs are strictly more expressive than deterministic ESFAs ESFTs equivalence is undecidable ESFTs are not closed under composition Symbolic automata are not so trivial after all 20 Some Positive Results Emptiness (reachability) is decidable for both ESFAs and ESFTs Nondeterministic ESFAs are closed under union • Not quite satisfactory, and very limited… – Can we do better? 21 Outline 1. Symbolic Automata and Transducers 2. Extended Symbolic Automata and Transducers – Some negative results – Some positive results 3. A friendlier restriction with decidable equivalence 22 A Simpler Model: Cartesian ESFAs and ESFTs Most negative results use binary guards in predicate guards p x1=x2+1 q We can restrict the model to avoid this issue: Cartesian ESFAs and Cartesian ESFTs only allow guards to be conjunctions of unary predicates p x1>5 ; x2=1 / [x1+x2, x2, x1] q It can be decided if an ESFT (ESFA) is Cartesian 23 Cartesian ESFA = SFA Cartesian ESFAs are now equivalent (but more succinct) to SFAs x1>5 ∧ x2=1 0 0 1 x>5 1 x=1 2 24 Cartesian ESFTs > SFTs Cartesian ESFTs are strictly more expressive than SFTs!! 0 x1>5 ∧ x2=1 / [x1+x2, x2, x1] 1 ? 25 Equivalence of Cartesian ESFTs • Given two Cartesian ESFTs A and B, A is equivalent to B if – A and B have the same domain • The domain of a Cartesian ESFT is a Cartesian ESFA (just drop outputs) • Cartesian ESFAs are equivalent to SFAs • Equivalence of SFAs is decidable [POPL12] – For every input in the intersection of the domains, A and B produce the same output (one-equality) • …. 26 One-Equality of Cartesian ESFTs x1<5, x2>2 / [x1+x2] q0 x1<10, x2>0, x3=1 / [x1, x2, x3] q1 2 p0 3 p1 Align inputs x1<5 ∧ x1<10 / [x1+x2], [x1,x2,x3] q0 p0 ?? ∧ x3=1 / ??, [ ] x2>2 ∧ x2>0 / [ ], [ ] qt1 pt1 q1 pt1 ? p1 Align outputs ?? ∧ x3=1 / ??, [x3] x1<5 ∧ x1<10 / [x1+x2], [x1, x2] q0 p0 q1 pt1 ? p1 27 Result Summary A theoretical analysis of ESFAs and ESFTs A new model: Cartesian ESFAs and ESFTs (can model BASE 64) Clear line for decidability of equivalence: ESFTs vs Cartesian ESFTs This and other algorithms at http://rise4fun.com/Bex/ (still in Beta) 28 Applications • Analysis of string encoders: • Proved correctness of BASE64, UTF8, etc. • Succinct representation of regex pattern matching • Fast code generation Future Work • Analysis of composition of ESFTs – Partially discussed in [VMCAI13] • Use ESFAs to compute range of symbolic transducers – Range of SFT is not SFA but maybe is an ESFA? – Use range for synthesizing program inversion 30 Thank you Loris D’Antoni [email protected] Questions? 31 Symbolic Finite Automaton (SFA) [POPL12] • Classical acceptor modulo a rich alphabet – Alphabet is an effective Boolean Algebra • Core Idea: represent labels with predicates – Separation of concerns: finite graph / algebra of labels Concrete transitions: p a b q Symbolic transition: p … z bitvector predicate x. 6116 ≤ x ≤ 7A16 q 32 Symbolic Finite Transducers Example • Utf8 encoder – Input: valid utf16 encoded string – Output: equivalent utf8 encoded string For example utf8encode(“\uFF28\uFF29”) = “\xEF\xBC\xA8\xEF\xBC\xA9” Equiv. classical transducer has 216 transitions 5 states & 11 transitions Dagstuhl Seminar 13021 33 Complete Rutf8 34 One-Equality of Cartesian ESFTs 1. We incrementally build a product ESFT using a depth-first search x1<5, x2>2 / [x1+1, x2] q0 x1<10, x2>0, x3=1 / [x1, x2, x3] q1 2 p0 2 p1 Build early product x1<5 ∧ x1<10 / [x1+1, x2], [x1,x2,x3] q0 p0 Found inequivalence qt1 pt1 q1 pt1 Try aligning x1<5 ∧ x1<10 / [x1+1], [x1] q0 p0 ?? ∧ x3=1 / _, _ x2>2 ∧ x2>0 / _,_ ?? ∧ x3=1 / ??, [x3] x2>2 ∧ x2>0 / [x2], [x2] qt1 pt1 Continue with every possible state ? p1 q1 pt1 ? p1 35 One-Equality of Cartesian ESFTs Case with predicates that can’t be completely shifted x1<5, x2>2 / [x1+x2] q0 q1 2 x1<5 ∧ x1<10 / [x1+x2], [x1] q0 p0 x1<10, x2>0, x3=1 / [x1, x2, x3] p0 2 x2>2 ∧ x2>0 / [ ], [x2] qt1 pt1 ?? ∧ x3=1 / ??, [x3] q1 pt1 ? p1 ?? ∧ x3=1 / ??, [x3] x1<5 ∧ x1<10 / [x1+x2], [x1, x2] q0 p0 p1 q1 pt1 ? p1 36 One-Equality of Cartesian ESFTs Case with predicates that can’t be shifted at all x1<5, x2>2 / [x1+x2] q0 q1 2 x1<5 ∧ x1<10 / [x1+x2], [x1] q0 p0 x1<10, x2>0, x3=1 / [x1, x2+x3] p0 2 ?? ∧ x3=1 / ??, [] x2>2 ∧ x2>0 / [ ], [x2+x3] qt1 pt1 p1 q1 pt1 ? p1 Alignment not possible! Easy to generate witness for inequivalence in this case 37