Transcript slides

Minimization of Symbolic Automata
Presented by: Loris D’Antoni
Joint work with: Margus Veanes
01/24/14, POPL14
What is automata minimization?
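Deterministic Finite Automaton
A = (Q, q0, F, δ, Σ)
where Q is the set of states, q0 ∈ Q the initial state, F ⊆ Q the final states, δ: Q × Σ → Q the transition function, and Σ the alphabet.
[Figure: example DFA with states q0 and q over the alphabet {a, b}.]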
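A DFA can be written down directly as this tuple. The Python sketch below is illustrative only. The class name `DFA`, the method `accepts`, and the example automaton `even_b` (which accepts words over {a, b} with an even number of b's) are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A = (Q, q0, F, δ, Σ); delta maps (state, symbol) -> state and is assumed total."""
    states: set
    initial: str
    finals: set
    delta: dict
    alphabet: set

    def accepts(self, word):
        q = self.initial
        for a in word:
            q = self.delta[(q, a)]  # follow the unique transition on symbol a
        return q in self.finals


# Hypothetical example: accept the words with an even number of b's.
even_b = DFA(
    states={"q0", "q"},
    initial="q0",
    finals={"q0"},
    delta={("q0", "a"): "q0", ("q0", "b"): "q",
           ("q", "a"): "q", ("q", "b"): "q0"},
    alphabet={"a", "b"},
)

print(even_b.accepts("abba"), even_b.accepts("ab"))  # prints: True False
```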
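Automata Minimization
Minimization = find and collapse equivalent states
Two states p and q are distinguishable if some string s leads one of them to a final state and the other to a non-final state. States that are not distinguishable are equivalent.
[Figure: example DFA with states 0 to 6 over {a, b}. After minimization, states 1 and 3, 2 and 4, and 5 and 6 are merged, giving the states 0, {1,3}, {2,4}, {5,6}.]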
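Once the equivalence classes are known, collapsing them is mechanical. Every class becomes one state, and its transitions are those of any member. This is well defined because equivalent states move to equivalent states. The Python sketch below assumes the DFA is given as a total transition dict and the classes as a list of sets. The function name `collapse` and the 4-state example are hypothetical.

```python
def collapse(delta, initial, finals, classes):
    """Quotient DFA: one state per equivalence class (classes indexed 0..k-1)."""
    cls = {q: i for i, block in enumerate(classes) for q in block}
    # Equivalent states move to equivalent states, so any member's transition works.
    new_delta = {(cls[q], a): cls[t] for (q, a), t in delta.items()}
    new_finals = {cls[q] for q in finals}
    return new_delta, cls[initial], new_finals


# Hypothetical DFA over {a, b} in which states 1 and 2 are equivalent.
delta = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 0,
         (2, "a"): 3, (2, "b"): 0,
         (3, "a"): 3, (3, "b"): 3}
min_delta, min_initial, min_finals = collapse(delta, 0, {3}, [{0}, {1, 2}, {3}])
print(len(min_delta), min_initial, min_finals)  # 6 0 {2}
```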
A simple Application: Random Password generation
Given constraints:
• Length is between 5 and 20: "^.{5,20}$"
• Contains at least 2 capital letters: "[A-Z].*[A-Z]"
• Contains a digit: "\d"
Generate random instances, with uniform distribution, that match all the above conditions.
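Uniform sampling from an automaton is usually done by counting. Let count(q, k) be the number of accepted words of length k from state q. Choosing each next symbol with probability proportional to the count of the successor state makes every accepted word of a given length equally likely. Choosing the length proportional to count(q0, n) makes the whole distribution uniform.

The sketch below makes a hypothetical simplification: a three-symbol alphabet in which a, B, and 1 stand for "any lowercase letter", "any capital", and "any digit". It hand-codes the product automaton of the last two constraints, tracking capitals seen (capped at 2) and whether a digit was seen. The sampler enforces the length bounds. With a real character alphabet, each symbol stands for a whole class, so the counts would also need to be weighted by class sizes.

```python
import random
from functools import lru_cache

# Hypothetical abstraction: "a" = any lowercase letter, "B" = any capital, "1" = any digit.
ALPHABET = ("a", "B", "1")
INITIAL = (0, False)          # (capitals seen, capped at 2; digit seen?)


def step(state, ch):
    caps, digit = state
    if ch == "B":
        caps = min(caps + 1, 2)
    elif ch == "1":
        digit = True
    return (caps, digit)


def is_final(state):
    return state == (2, True)  # at least two capitals and a digit


@lru_cache(maxsize=None)
def count(state, k):
    """Number of accepted words of length exactly k starting from state."""
    if k == 0:
        return 1 if is_final(state) else 0
    return sum(count(step(state, ch), k - 1) for ch in ALPHABET)


def sample(min_len, max_len, rng=random):
    """Uniformly sample an accepted word with min_len <= length <= max_len."""
    lengths = list(range(min_len, max_len + 1))
    n = rng.choices(lengths, weights=[count(INITIAL, m) for m in lengths])[0]
    state, word = INITIAL, []
    for k in range(n, 0, -1):
        # Weight each symbol by the number of accepted completions it leaves.
        weights = [count(step(state, ch), k - 1) for ch in ALPHABET]
        ch = rng.choices(ALPHABET, weights=weights)[0]
        word.append(ch)
        state = step(state, ch)
    return "".join(word)


print(count(INITIAL, 3))   # 3: BB1, B1B, 1BB
print(sample(5, 20))
```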
Key idea
Intersect the automata of the three regexes and sample from the result:
^.{5,20}$ ∩ [A-Z].*[A-Z] ∩ \d
Problems
• Big automaton → Minimization
• Big alphabet ($2^{16}$ characters in UTF16) → Symbolic Automata
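Symbolic Finite Automaton (SFA)
A = (Q, q0, F, δ, σ)
Input sort σ: in this case int
Transitions are labeled with predicates (guards) from a separate theory for the input alphabet, decided by an SMT SOLVER.
[Figure: SFA with states q0 and q whose transitions are guarded by λx. x mod 2=1 and λx. x mod 2=0.]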
Symbolic Finite Automata (SFA): Execution Example
[Figure: SFA with states p and q whose transitions are guarded by λx. x mod 2=1 and λx. x mod 2=0.]
On the input 1, 2, 5, 3 the run passes through the states p, p, q, p.
p is final → accept the input.
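Running an SFA only requires evaluating guards on concrete inputs. The sketch below encodes guards as Python lambdas. The automaton itself is a hypothetical reconstruction: initial state q, final state p, odd inputs switch state, and even inputs keep it. It was chosen because it reproduces the trace p, p, q, p on the input 1, 2, 5, 3.

```python
# Hypothetical SFA over int; each guard is a predicate on the input element.
sfa = {
    "initial": "q",
    "finals": {"p"},
    "transitions": [
        ("q", lambda x: x % 2 == 1, "p"),
        ("q", lambda x: x % 2 == 0, "q"),
        ("p", lambda x: x % 2 == 1, "q"),
        ("p", lambda x: x % 2 == 0, "p"),
    ],
}


def run(sfa, inputs):
    """Return (states visited after each input, accepted?)."""
    state, trace = sfa["initial"], []
    for x in inputs:
        # Deterministic SFA: at most one outgoing guard is true for x.
        nxt = [dst for (src, guard, dst) in sfa["transitions"]
               if src == state and guard(x)]
        if not nxt:
            return trace, False  # no applicable transition: reject
        state = nxt[0]
        trace.append(state)
    return trace, state in sfa["finals"]


print(run(sfa, [1, 2, 5, 3]))  # (['p', 'p', 'q', 'p'], True)
```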
Advantages of Symbolic Automata
• Alphabet is represented symbolically
– UTF16 abstracted using BDDs
– Integers using predicates over integers
• Succinctness
– at most $n^2$ transitions (one per ordered pair of states)
– one transition captures many symbols
• BUT: do DFA algorithms generalize to SFAs?
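The $n^2$ bound holds because all transitions between the same pair of states can be merged into one transition whose guard is the disjunction of their guards. The sketch below represents guards as bitsets over character codes. This is an explicit stand-in for the BDDs mentioned above, not how they work. `char_class` and `merge_parallel` are hypothetical helpers.

```python
def char_class(lo, hi):
    """Bitset guard containing the characters lo..hi (inclusive)."""
    mask = 0
    for c in range(ord(lo), ord(hi) + 1):
        mask |= 1 << c
    return mask


def merge_parallel(transitions):
    """Merge transitions with the same (source, target) by OR-ing their guards."""
    merged = {}
    for src, guard, dst in transitions:
        merged[(src, dst)] = merged.get((src, dst), 0) | guard
    return [(src, guard, dst) for (src, dst), guard in merged.items()]


ts = [("p", char_class("a", "z"), "q"),
      ("p", char_class("A", "Z"), "q"),
      ("p", char_class("0", "9"), "r")]
merged = merge_parallel(ts)
print(len(merged))  # 2: one transition p -> q and one p -> r
```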
An example: SFA intersection
A1: p1 --φ1--> p2
A2: q1 --φ2--> q2
A1×A2: (p1,q1) --φ1∧φ2--> (p2,q2); delete the transition when φ1∧φ2 is unsatisfiable
REQUIREMENTS: the input theory must be a Boolean algebra, and decidable
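The product construction works with any guard representation that supports conjunction and a satisfiability test. In this sketch, guards are bitsets over the finite domain 0..15: conjunction is `&`, and a guard is satisfiable when it is nonzero. This stands in for BDDs or an SMT solver. `pred`, `intersect`, and the two one-transition automata are hypothetical.

```python
def pred(f, size=16):
    """Bitset of the domain elements 0..size-1 satisfying f."""
    return sum(1 << x for x in range(size) if f(x))


def intersect(a1, a2):
    """Product SFA; only reachable product states are built."""
    init = (a1["initial"], a2["initial"])
    seen, stack, transitions = {init}, [init], []
    while stack:
        p, q = stack.pop()
        for s1, g1, t1 in a1["transitions"]:
            for s2, g2, t2 in a2["transitions"]:
                if s1 != p or s2 != q:
                    continue
                g = g1 & g2              # φ1 ∧ φ2
                if g == 0:               # unsatisfiable: delete the transition
                    continue
                transitions.append(((p, q), g, (t1, t2)))
                if (t1, t2) not in seen:
                    seen.add((t1, t2))
                    stack.append((t1, t2))
    finals = {(p, q) for (p, q) in seen
              if p in a1["finals"] and q in a2["finals"]}
    return {"initial": init, "finals": finals, "transitions": transitions}


A1 = {"initial": "p1", "finals": {"p2"},
      "transitions": [("p1", pred(lambda x: x % 2 == 0), "p2")]}
A2 = {"initial": "q1", "finals": {"q2"},
      "transitions": [("q1", pred(lambda x: x < 4), "q2")]}
prod = intersect(A1, A2)
print(prod["transitions"])  # [(('p1', 'q1'), 5, ('p2', 'q2'))]: guard {0, 2}
```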
Moore’s algorithm
If p --a--> p' and q --a--> q', and p' and q' are distinguishable (by some string s), then p and q are distinguishable (by a·s).
$n^2$ iterations over $k$ symbols: $O(kn^2)$
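One common way to implement Moore's algorithm is round-based partition refinement, rather than the pair table drawn on the slide. In each round, a state's new block is determined by its old block together with the blocks it reaches on every symbol. The loop stops when no block is split, after at most n rounds of $O(kn)$ work each. The sketch assumes a total transition dict. The function name and the example DFA are hypothetical.

```python
def moore(states, alphabet, delta, finals):
    """Return a dict state -> block id of the coarsest stable partition."""
    alphabet = sorted(alphabet)
    block = {q: int(q in finals) for q in states}   # initial split F vs Q\F
    while True:
        # Signature: own block plus the blocks reached on every symbol.
        sig = {q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
               for q in states}
        ids = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):     # no block was split
            return new_block
        block = new_block


# Hypothetical DFA over {a, b} in which states 1 and 2 are equivalent.
delta = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 0,
         (2, "a"): 3, (2, "b"): 0,
         (3, "a"): 3, (3, "b"): 3}
blocks = moore([0, 1, 2, 3], {"a", "b"}, delta, {3})
print(blocks)  # {0: 0, 1: 1, 2: 1, 3: 2}
```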
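Symbolic Moore’s algorithm
If p --φ--> p' and q --ψ--> q', p' and q' are distinguishable, and φ ∧ ψ is satisfiable, then p and q are distinguishable.
Initially D = F×(Q\F) ∪ (Q\F)×F
for each (p',q') in D (including pairs added later), (p,q) not in D
  let φ, ψ be the guards of δ(p,p'), δ(q,q')
  if (isSat(φ ∧ ψ))
    add (p,q) to D
m transitions: $O(m^2 f(k))$, where k = size of the biggest predicate in the SFA and f(k) is the cost of a satisfiability check on predicates of that size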
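The loop above can be rendered as a worklist in Python. The sketch makes several assumptions: guards are bitsets over the finite domain 0..15, so isSat(φ ∧ ψ) is just `phi & psi != 0`; the SFA is deterministic and complete; and `pred`, `symbolic_moore`, and the example are hypothetical. Each pair added to D is processed once against all pairs of transitions entering it, which is where the $m^2$ factor comes from.

```python
from collections import deque


def pred(f, size=16):
    """Bitset of the domain elements 0..size-1 satisfying f."""
    return sum(1 << x for x in range(size) if f(x))


def symbolic_moore(states, finals, transitions):
    """Return the set D of ordered distinguishable pairs."""
    into = {s: [] for s in states}            # target -> [(source, guard)]
    for src, guard, dst in transitions:
        into[dst].append((src, guard))
    D = {(p, q) for p in states for q in states
         if (p in finals) != (q in finals)}
    work = deque(D)
    while work:
        p2, q2 = work.popleft()               # (p', q') in D
        for p, phi in into[p2]:
            for q, psi in into[q2]:
                if (p, q) not in D and phi & psi:   # isSat(φ ∧ ψ)
                    D.add((p, q))
                    work.append((p, q))
    return D


# Hypothetical SFA over 0..15: states 1 and 2 use different but equivalent guards.
low, high = pred(lambda x: x < 8), pred(lambda x: x >= 8)
transitions = [(0, low, 1), (0, high, 2),
               (1, low, 3), (1, high, 0),
               (2, pred(lambda x: x < 4), 3), (2, pred(lambda x: 4 <= x < 8), 3),
               (2, high, 0),
               (3, pred(lambda x: True), 3)]
D = symbolic_moore([0, 1, 2, 3], {3}, transitions)
print((1, 2) in D, (0, 1) in D)  # False True
```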
Sometimes Moore is Less
From: Rani Abdellatif
Sent: Tuesday, November 13, 2012 12:55 PM
To: Margus Veanes
Cc: Patrick McFalls
Subject: RE: Password generation help
Margus,
I tested the perf of the sample you sent me with password lengths from 8 to 15 chars and here are the results:
Chars      8    9    10    11    12    13    14     15
Time (ms)  171  406  1061  2044  3698  6271  11591  18362
This time is the time it takes to run sfa.Determinize(rex.Solver).Minimize(rex.Solver). The time required to create the SFA or generate samples once it’s created is quite small in comparison.
We are expecting 15 characters to be on the shorter end of password we’ll generate, going up to 128 characters.
[Slide annotations: "18 sec for 15 characters!", "the culprit", "should scale up to 128 characters!"]
Hopcroft’s algorithm: intuition
Start from the partition {F, Q\F}.
[Figure: a splitter block R and a symbol a; the states S that have an a-transition into R split a block A into A∩S and A\S.]
Keep partitioning with respect to the splitters in W, for every input symbol.
[Figure: blocks P1 to P4 obtained by splitting with respect to R on symbol b.]
Let’s assume I already split according to R, and R is later split into P1 and P2.
Do I need to consider both P1 and P2 for future splitting?
NO, I ONLY NEED ONE! For any block Q and symbol a, a state of Q goes into P2 on a exactly when it goes into R but not into P1. So after splitting with respect to R and P1, the split with respect to P2 comes for free. Hopcroft picks the smaller half.
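Hopcroft’s algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ { }
  R := pickFrom(W)
  foreach a in Σ
    S := δ-1(R, a)
    while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
      P,W := split(P, T∩S, T\S)
return partitioned DFA
pickFrom removes R from W. split replaces T by T∩S and T\S in P. In W, split replaces T by both halves if T ∈ W, and otherwise adds only the smaller half.
Cost: $O(kn \log n)$, since each state belongs to at most $O(\log n)$ picked splitters.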
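The following is a direct, unoptimized Python transcription of the pseudocode, assuming a total transition dict. The inverse-transition index and the list-based P and W are implementation choices of this sketch. A single pass over P replaces the inner while, because the halves of a split block cannot be split again by the same S. The sketch favors clarity and does not achieve the $O(kn \log n)$ bound, since it rescans every block for each splitter. The example DFA is hypothetical.

```python
def hopcroft(states, alphabet, delta, finals):
    """Return the coarsest partition of states (a list of sets)."""
    states, finals = set(states), set(finals)
    P = [b for b in (finals, states - finals) if b]
    W = [min(P, key=len)] if len(P) == 2 else []
    inv = {}                                   # (target, symbol) -> sources
    for (q, a), t in delta.items():
        inv.setdefault((t, a), set()).add(q)
    while W:
        R = W.pop()                            # pickFrom(W) removes R
        for a in alphabet:
            S = set().union(*(inv.get((r, a), set()) for r in R))  # δ-1(R, a)
            new_P = []
            for T in P:
                T1, T2 = T & S, T - S
                if T1 and T2:                  # S splits T
                    new_P += [T1, T2]
                    if T in W:                 # T still pending: keep both halves
                        W.remove(T)
                        W += [T1, T2]
                    else:                      # otherwise the smaller half suffices
                        W.append(min(T1, T2, key=len))
                else:
                    new_P.append(T)
            P = new_P
    return P


delta = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 0,
         (2, "a"): 3, (2, "b"): 0,
         (3, "a"): 3, (3, "b"): 3}
blocks = hopcroft({0, 1, 2, 3}, {"a", "b"}, delta, {3})
print(sorted(sorted(b) for b in blocks))  # [[0], [1, 2], [3]]
```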
Hopcroft’s algorithm example
[Figure: the example DFA with states 0 to 6 over {a, b}; R marks the current splitter.]
Step 1: PARTITION: {P1, P2}, TO ANALYZE: {P2}
Step 2: PARTITION: {P11, P12, P2}, TO ANALYZE: {P2, P12}
Step 3: PARTITION: {P11, P12, P2}, TO ANALYZE: {P12}
Result: [Figure: the minimized DFA with states 0, {1,3}, {2,4} and {5,6}.]
Symbolic Hopcroft’s algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ { }
  R := pickFrom(W)
  foreach a in Σ            (the alphabet might not be finite!)
    S := δ-1(R, a)
    while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
      P,W := split(P, T∩S, T\S)
return partitioned SFA
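Finitize the alphabet
[Figure: the guards φ1, φ2, φ3 of an SFA are refined into disjoint predicates φ'1, ..., φ'8.]
Predicates: {x>5, x<10, x=3}
Minterms (the satisfiable conjunctions of each predicate or its negation): {x=3, x≤5 ∧ x≠3, 5<x<10, x≥10}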
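Minterms can be computed by refining one predicate at a time. Each current minterm m is split into m ∧ φ and m ∧ ¬φ, and unsatisfiable parts are dropped. In this sketch, the integers are approximated by the window -8..23, purely as an assumption, with predicates as bitsets over it. `pred`, `FULL`, and `minterms` are hypothetical names. On the predicates x>5, x<10, and x=3, the sketch yields four minterms: x=3, x≤5 ∧ x≠3, 5<x<10, and x≥10.

```python
LO, HI = -8, 24                      # assumed finite window standing in for Z
FULL = (1 << (HI - LO)) - 1          # the predicate "true"


def pred(f):
    """Bitset of the integers in [LO, HI) satisfying f."""
    return sum(1 << (x - LO) for x in range(LO, HI) if f(x))


def minterms(predicates):
    terms = [FULL]
    for phi in predicates:
        refined = []
        for m in terms:
            for part in (m & phi, m & ~phi):   # m ∧ φ and m ∧ ¬φ
                if part:                       # keep only satisfiable parts
                    refined.append(part)
        terms = refined
    return terms


ms = minterms([pred(lambda x: x > 5), pred(lambda x: x < 10), pred(lambda x: x == 3)])
print(len(ms))  # 4
```

With m predicates, the number of minterms can grow up to $2^m$, which is the blow-up the following slides try to avoid.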
Symbolic Hopcroft’s algorithm (with minterms)
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ {}
  R := pickFrom(W)
  foreach φ in Minterms(A)
    S := δ-1(R, φ)
    while ∃ T ∈ P. T∩S ≠ {} ∧ T\S ≠ {}
      P,W := split(P, T∩S, T\S)
return partitioned SFA
Cost: $O(2^m n \log n + 2^m f(mk))$. The log n factor comes from Hopcroft's choice of splitters, and an SFA with m transitions can have up to $2^m$ minterms. We need something better.
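New Algorithm: Intuition
[Figure: block A is split into P1 (containing p) and P2 (containing q); p enters the splitter R under guard φ, q enters R under guard ψ, and the difference φ\ψ is highlighted.]
What if φ ≠ ψ? Then a symbol in φ\ψ (or ψ\φ) takes one of the two states into R and the other outside R. It therefore witnesses the split of A without enumerating minterms.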
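Example 1/2
[Figure: SFA over integers with states 0 to 6, blocks F and Q\F, splitter R, and guards x<0, x≥0, -2<x<5, -5<x<3 and true. Annotation "false ≠ -5<x<3": one state has no transition into R (guard false) while another enters R under -5<x<3, so the two must be separated.]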
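Example 2/2
[Figure: states p and q; p moves to r under x≥2 (and elsewhere under x<2), q moves to r under x≥5 (and elsewhere under x<5); r lies in the splitter R.]
Both p and q go to r, but is x≥2 equivalent to x≥5? NO.
Then p is distinguishable from q: on input 3, for example, p moves to r while q does not.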
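New Algorithm
P := {F, Q\F}
W := {if |F| < |Q\F| then F else Q\F}
while W ≠ { }
  R := pickFrom(W)
  S := δ-1(R, true)
  while ∃ A ∈ P. A∩S ≠ {} ∧ ∃ p1,p2 ∈ A. ψp1 ≠ ψp2
    P,W := split(P, A, witness(ψp1 ≠ ψp2))
return partitioned SFA
Here ψp is the disjunction of the guards of the transitions from p into R. witness(ψp1 ≠ ψp2) is a symbol satisfying exactly one of the two guards, and A is split into the states that enter R on that symbol and those that do not.
Cost: $O(n^2 \log n \cdot f(nk))$, with the log n factor coming from Hopcroft's choice of splitters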
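The minterm-free loop can be sketched in Python under these assumptions: guards are bitsets over the finite domain 0..15, and the SFA is deterministic and complete. `guards_into`, `split_block`, and `minimize_sfa` are hypothetical names. The witness is simply the lowest symbol in the symmetric difference of ψp1 and ψp2. W is updated with Hopcroft's rule: both halves if the block was pending, otherwise only the smaller one. The code is written for clarity and does not try to meet the stated bound.

```python
def pred(f, size=16):
    """Bitset of the domain elements 0..size-1 satisfying f."""
    return sum(1 << x for x in range(size) if f(x))


def guards_into(states, transitions, R):
    """ψp: disjunction of the guards of all transitions from p into block R."""
    psi = {p: 0 for p in states}
    for src, guard, dst in transitions:
        if dst in R:
            psi[src] |= guard
    return psi


def split_block(A, psi):
    """Split A by a witness symbol, or return None if all ψp in A agree."""
    p1, *rest = A
    for p2 in rest:
        diff = psi[p1] ^ psi[p2]       # symbols satisfying exactly one guard
        if diff:
            a = diff & -diff           # witness: lowest such symbol (one bit)
            A1 = {p for p in A if psi[p] & a}   # states entering R on a
            return A1, A - A1
    return None


def minimize_sfa(states, finals, transitions):
    states, finals = set(states), set(finals)
    P = [b for b in (finals, states - finals) if b]
    W = [min(P, key=len)] if len(P) == 2 else []
    while W:
        R = W.pop()
        psi = guards_into(states, transitions, R)
        changed = True
        while changed:                 # split until every block agrees on ψ
            changed = False
            for A in P:
                halves = split_block(A, psi)
                if halves:
                    A1, A2 = halves
                    P.remove(A)
                    P += [A1, A2]
                    if A in W:         # pending splitter: keep both halves
                        W.remove(A)
                        W += [A1, A2]
                    else:              # Hopcroft: the smaller half suffices
                        W.append(min(A1, A2, key=len))
                    changed = True
                    break
    return P


# Hypothetical SFA over 0..15: states 1 and 2 use different but equivalent guards.
low, high = pred(lambda x: x < 8), pred(lambda x: x >= 8)
transitions = [(0, low, 1), (0, high, 2),
               (1, low, 3), (1, high, 0),
               (2, pred(lambda x: x < 4), 3), (2, pred(lambda x: 4 <= x < 8), 3),
               (2, high, 0),
               (3, pred(lambda x: True), 3)]
print(sorted(sorted(b) for b in minimize_sfa([0, 1, 2, 3], {3}, transitions)))
# [[0], [1, 2], [3]]
```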
Experiments
1. Randomly generated DFAs: SFAs using BDDs (sort = bitvec 7 bits)
2. SFAs generated from regexes: SFAs using BDDs (sort = bitvec 16 bits)
3. A corner case of minterm generation: SFAs using BDDs (sort = bitvec 20 bits)
4. Randomly generated SFAs over string x int: SFAs using Z3 (sort = string x int)
5. Monadic second-order logic to DFA transformation: SFAs using BDDs (sort = bitvec 40 bits)
1) Randomly generated DFAs
5 billion DFAs: 10 to 100 states, 2 to 50 symbols
From [Almeida, Moreira, Reis, TR05]
2) SFAs generated from regexes (regexplib.com)
3000 regexes over the UTF16 alphabet ($2^{16}$ elements)
From [regexplib.com]
[Plot: both axes in log scale; the more states, the worse Moore performs.]
3) A corner case of minterm generation
This SFA has $2^k$ minterms!
[Plot: log scale; the comparison includes brics.automata.dk, which uses intervals instead of BDDs.]
4) Randomly generated SFAs over string x int
Randomly generated 10 SFAs over string x int and minimized all the intersections, complements, differences, and unions of such SFAs.
Random generation causes many predicate overlaps → many minterms.
5) MSO logic to DFA transformation [IJFCS05]
[Plot annotation: state of the art for MSO]
Conclusion
Results
• Adapted classical minimization algorithm to the
symbolic setting
• New minimization algorithm for symbolic
automata (faster than previous ones)
Future work
• Extend to tree automata
• Extend classical automata problems to SFAs
– Edit distance?
– Regex for symbolic automata?