

The Space Efficiency of OSHL
Swaha Miller
David A. Plaisted
UNC Chapel Hill
How do humans prove theorems?
Semantics
Case analysis
Sequential search through space of possible structures
Focus on the theorem
“Systematic methods can now routinely solve verification problems with thousands or tens of thousands of variables, while local search methods can solve hard random 3SAT problems with millions of variables.” (from a conference announcement)
DPLL Example
Clauses: {p, r}, {¬p, q, r}, {p, ¬r}
p = T: {T, r}, {¬T, q, r}, {T, ¬r} → SIMPLIFY → {q, r}
p = F: {F, r}, {¬F, q, r}, {F, ¬r} → SIMPLIFY → {r}, {¬r} → SIMPLIFY → {}
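The case split above can be reproduced with a small DPLL procedure. The sketch below is a minimal Python illustration, not code from the talk: literals are strings with a leading "-" for negation (our convention), it applies unit propagation and otherwise splits on the alphabetically first atom, and it reads the example's clauses as {p, r}, {¬p, q, r}, {p, ¬r}, an assumption consistent with both SIMPLIFY results.

```python
from typing import Dict, FrozenSet, List, Optional

Clause = FrozenSet[str]  # a literal is an atom name such as "p", or "-p" for its negation


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def atom(lit: str) -> str:
    return lit.lstrip("-")


def simplify(clauses: List[Clause], lit: str) -> List[Clause]:
    """Make `lit` true: drop clauses containing it, delete its complement from the rest."""
    return [c - {negate(lit)} for c in clauses if lit not in c]


def dpll(clauses: List[Clause],
         assignment: Optional[Dict[str, bool]] = None) -> Optional[Dict[str, bool]]:
    """Return a (partial) satisfying assignment, or None if the clauses are unsatisfiable."""
    assignment = dict(assignment or {})
    if not clauses:
        return assignment                      # every clause is satisfied
    if any(not c for c in clauses):
        return None                            # empty clause {}: this branch fails
    for c in clauses:                          # unit propagation
        if len(c) == 1:
            (lit,) = c
            return dpll(simplify(clauses, lit),
                        {**assignment, atom(lit): not lit.startswith("-")})
    p = min(atom(l) for c in clauses for l in c)   # split on the first atom
    for lit in (p, negate(p)):                 # try p = T, then p = F
        result = dpll(simplify(clauses, lit), {**assignment, p: lit == p})
        if result is not None:
            return result
    return None
```

With p = T, `simplify` leaves only {q, r}, which the next split satisfies; with p = F it leaves {r}, {¬r}, and unit propagation then produces the empty clause, closing that branch.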
Hyper Linking
Problem    Input clauses    OTTER (sec)    Hyper Linking
Ph5        45               38606.76       1.8
Ph9        297              >24 hrs        2266.6
Latinsq    16               >24 hrs        56.4
Salt       44               1523.82        28.0
Zebra      128              >24 hrs        866.2
Eliminating Duplication with the HyperLinking Strategy, Shie-Jue Lee and David A. Plaisted, Journal of Automated Reasoning 9 (1992) 25-42.
Later propositional strategies
Billon’s disconnection calculus, derived from hyper-linking
Disconnection calculus theorem prover (DCTP), derived from Billon’s work
FDPLL
Performance of DCTP on TPTP, 2003
DCTP 1.3 first in EPS and EPR (largely propositional)
DCTP 10.2p third in FNE (first-order, no equality), solving the same number of problems as the best provers
DCTP 10.2p fourth in FOF and FEQ (all first-order formulae, and formulae with equality)
DCTP 1.3 is a single strategy prover.
Strategy Selection in E
Schulz, Stephan. E - A Brainiac Theorem Prover. Journal of AI Communications 15(2/3):111-126, 2002.
Strategy Selection
The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:
Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.
A variety of optional simplifications.
Parameterised reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
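To make the age-weight ratio concrete, here is a minimal Python sketch of a given-clause queue that interleaves oldest-first and lightest-first selection in a fixed age:weight pattern. The class and its methods are hypothetical illustrations of the general idea, not Vampire's data structures; clauses are plain strings whose weight is supplied by the caller.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class GivenClauseQueue:
    """Out of every `age + weight` selections, `age` take the oldest clause
    and `weight` take the lightest one."""

    def __init__(self, age: int, weight: int) -> None:
        self.pattern = itertools.cycle(["age"] * age + ["weight"] * weight)
        self.by_age: List[Tuple[int, str]] = []            # (insertion number, clause)
        self.by_weight: List[Tuple[int, int, str]] = []    # (weight, insertion number, clause)
        self.counter = itertools.count()
        self.selected: set = set()                         # insertion numbers already handed out

    def add(self, clause: str, weight: int) -> None:
        n = next(self.counter)
        heapq.heappush(self.by_age, (n, clause))
        heapq.heappush(self.by_weight, (weight, n, clause))

    def pop(self) -> Optional[str]:
        heap = self.by_age if next(self.pattern) == "age" else self.by_weight
        while heap:
            entry = heapq.heappop(heap)
            n, clause = entry[-2], entry[-1]
            if n not in self.selected:     # lazily skip clauses taken via the other heap
                self.selected.add(n)
                return clause
        return None
```

A larger `weight` share makes the prover prefer light clauses more strongly; the `age` picks guarantee that every clause is eventually selected.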
Strategy Selection
The automatic mode of Vampire 7.0 is derived from extensive experimental data obtained on problems from TPTP v2.6.0. Input problems are classified taking into account simple syntactic properties, such as being Horn or non-Horn, presence of equality, etc. Additionally, we take into account the presence of some important kinds of axioms, such as set theory axioms, associativity and commutativity. Every class of problems is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.
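The classify-then-schedule scheme just described can be sketched as follows. This is a hypothetical Python illustration, not Vampire's code: `Features`, `problem_class`, the schedule table and the strategy names are assumptions, and only two syntactic properties (Horn, equality) are used for classification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A kernel strategy takes a problem and a time limit in seconds; it returns a proof or None.
Strategy = Callable[[str, float], Optional[str]]


@dataclass(frozen=True)
class Features:
    """Simple syntactic properties used to classify a problem (illustrative subset)."""
    horn: bool
    equality: bool


def problem_class(f: Features) -> str:
    return ("horn" if f.horn else "nonhorn") + ("_eq" if f.equality else "")


def run_schedule(problem: str, features: Features,
                 schedules: Dict[str, List[Tuple[str, float]]],
                 strategies: Dict[str, Strategy]) -> Optional[Tuple[str, str]]:
    """Call the strategies of the problem's class one by one, each with its own time limit."""
    for name, limit in schedules[problem_class(features)]:
        proof = strategies[name](problem, limit)
        if proof is not None:
            return name, proof
    return None
```

The first strategy in the class's schedule that returns a proof ends the run; if the whole schedule fails, the problem is reported unsolved.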
DCTP Strategy Selection
DCTP 1.31 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language.
DCTP 1.31 is a single strategy prover. Individual strategies are started by DCTP 10.21p using the schedule-based resource allocation scheme known from the E-SETHEO system. Of course, different schedules have been precomputed for the syntactic problem classes. The problem classes are more or less identical with the sub-classes of the competition organisers.
In CASC-J2, DCTP 10.21p performed substantially better.
Goal of OSHL
First-order logic
Clause form
Propositional efficiency
Semantics
Requires ground decidability
Structure of OSHL
Goal sensitivity if semantics chosen properly
Choose initial semantics to satisfy axioms
Use of natural semantics
For group theory problems, can specify a group
Sequential search through possible interpretations
Thus similar to Davis and Putnam’s method
Propositional Efficiency
Constructs a semantic tree
Ordered Semantic Hyperlinking (OSHL)
Reduces a first-order logic problem to a propositional problem
Imports propositional efficiency into first-order logic
The algorithm
Imposes an ordering on clauses
Progresses by generating ground instances Di of input clauses and refining interpretations
[Figure: interpretations I0, I1, I2, I3, … and ground instances D0, D1, D2, …; labels “unsatisfiable” and “T”]
Semantics
Trivial semantics:
Positive: Choose I0 to falsify all atoms; the first D is all positive. Forward chaining.
Negative: Choose I0 to satisfy all atoms; the first D is all negative. Backward chaining.
Natural semantics: I0 chosen by user
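A small Python sketch of which ground clauses contradict I0 under each choice of semantics. The literal representation, the example clauses and the natural semantics `natural_le` (reading le(x, y) as x ≤ y on integers) are hypothetical illustrations. Under the positive trivial semantics only the all-positive clause is false, and under the negative one only the all-negative clause is false, matching the description above.

```python
from typing import Callable, List, Tuple

Atom = Tuple[str, Tuple[int, ...]]    # ("le", (1, 2)) stands for le(1,2)
Literal = Tuple[bool, Atom]           # (positive?, atom)
Semantics = Callable[[Atom], bool]    # an interpretation I0 of ground atoms


def clause_false(clause: List[Literal], i0: Semantics) -> bool:
    """A ground clause contradicts I0 when every one of its literals is false in I0."""
    return all(i0(atom) != positive for positive, atom in clause)


def positive_trivial(atom: Atom) -> bool:
    return False                      # falsify all atoms


def negative_trivial(atom: Atom) -> bool:
    return True                       # satisfy all atoms


def natural_le(atom: Atom) -> bool:
    pred, args = atom                 # hypothetical natural semantics: le(x, y) means x <= y
    return pred == "le" and args[0] <= args[1]


def le(x: int, y: int) -> Atom:
    return ("le", (x, y))


EXAMPLE_CLAUSES: List[List[Literal]] = [
    [(True, le(1, 2))],                                         # axiom le(1,2)
    [(False, le(1, 2)), (False, le(2, 3)), (True, le(1, 3))],   # transitivity instance
    [(False, le(1, 3))],                                        # negated goal
]


def contradicted(clauses: List[List[Literal]], i0: Semantics) -> List[int]:
    return [k for k, c in enumerate(clauses) if clause_false(c, i0)]
```

Under `natural_le` the axiom and the transitivity instance are true, so the only contradicting clause is the negated goal; this is the goal sensitivity that a well-chosen semantics is meant to provide.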
Semantics Ordering
<t is a well-founded ordering on atoms, extended to literals
Extend <t to interpretations as follows:
I and J agree on L if they interpret L the same way
Suppose I0 is given
I <t J if I and J are not identical, A is the minimal atom on which they disagree, and I agrees with I0 on A
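The ordering on interpretations can be checked mechanically when the atom set is finite and listed in increasing <t order, an assumption made only for this Python sketch (OSHL itself orders the infinite set of ground atoms). Interpretations are dictionaries from atom names to truth values; the function name `less_t` is ours.

```python
from typing import Dict, List

Interp = Dict[str, bool]  # truth value of each atom (finite atom set assumed)


def less_t(i: Interp, j: Interp, i0: Interp, atoms: List[str]) -> bool:
    """I <t J: I and J differ, and on the minimal atom A where they disagree, I agrees with I0."""
    for a in atoms:               # atoms listed in increasing <t order
        if i[a] != j[a]:
            return i[a] == i0[a]
    return False                  # I and J are identical, so neither is smaller
```

In particular I0 is the <t-least interpretation: any J that differs from I0 disagrees with I0 at the first atom where the two differ, so I0 <t J.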
Rules of OSHL
Start with the empty sequence ().
(C1, C2, …, Cn) ⇒ (C1, C2, …, Cn, D), where I is the minimal model of the sequence (C1, C2, …, Cn) and D is a minimal ground instance of an input clause that contradicts I
(C1, C2, …, Cn, D) ⇒ (C1, C2, …, Cn-1, D), if Cn is “out of order”
(C1, C2, …, Cn, D) ⇒ (C1, C2, …, Cn-1, res(Cn, D, L)), if max resolution is possible
Proof if the empty clause is derived
Propositional Example (p I0
p)
()
({-p1, -p2, -p3}) I0[-p3]
({-p1, -p2, -p3}, {-p4, -p5, -p6}) I0 [-p3,-p6]
({…}, {…}, {-p7}) I0 [-p3,-p6,-p7]
({…}, {…}, {-p7}, {p3, p7})
({…}, {-p4, -p5, -p6}, {p3})
({-p1, -p2, -p3},{p3})
({-p1, -p2 }) I0 [-p2]
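The three rules can be run directly on a finite set of ground (propositional) clauses. The Python sketch below is a simplified illustration under stated assumptions, not the authors' OCaml implementation: atoms are integers ordered p1 < p2 < …, the minimal model of a sequence is I0 with the maximal literal of each clause made true (as in the I0[-p3,-p6,-p7] notation above), and clauses are compared by their atoms in decreasing order, a clause ordering we chose because the slides do not specify one.

```python
from typing import Callable, FrozenSet, Iterable, List

Clause = FrozenSet[int]        # literal k > 0 is atom p_k, -k is its negation
Interp = Callable[[int], bool]


def max_atom(c: Clause) -> int:
    return max(abs(l) for l in c)


def max_literal(c: Clause) -> int:
    return max(c, key=abs)     # the literal on the largest atom


def model_of(i0: Interp, seq: List[Clause]) -> Interp:
    """I0 changed so that the maximal literal of every clause in the sequence is true."""
    forced = {abs(max_literal(c)): max_literal(c) > 0 for c in seq}
    return lambda a: forced[a] if a in forced else i0(a)


def is_false(c: Clause, interp: Interp) -> bool:
    return all(interp(abs(l)) != (l > 0) for l in c)


def clause_key(c: Clause) -> List[int]:
    return sorted((abs(l) for l in c), reverse=True)   # assumed clause ordering


def oshl(clauses: Iterable[Iterable[int]], i0: Interp, max_steps: int = 10_000):
    """Return (status, final sequence, trace of sequences after each step)."""
    ground = [frozenset(c) for c in clauses]
    seq: List[Clause] = []
    trace: List[List[Clause]] = []
    for _ in range(max_steps):
        interp = model_of(i0, seq)
        false_clauses = [c for c in ground if is_false(c, interp)]
        if not false_clauses:
            return "satisfiable", seq, trace           # interp is a model
        seq.append(min(false_clauses, key=clause_key)) # rule 1: minimal contradicting D
        while seq[-1] and len(seq) >= 2:
            cn, d = seq[-2], seq[-1]
            if max_atom(d) < max_atom(cn):
                del seq[-2]                            # rule 2: Cn is out of order
            elif max_literal(d) == -max_literal(cn):
                l = max_literal(cn)                    # rule 3: resolve on the max literal
                seq[-2:] = [(cn - {l}) | (d - {-l})]
            else:
                break
        trace.append(list(seq))
        if not seq[-1]:
            return "unsatisfiable", seq, trace         # empty clause derived
    raise RuntimeError("step limit reached")
```

As a usage example, take the hypothetical input {-p1,-p2,-p3}, {-p4,-p5,-p6}, {-p7}, {p3,p7} with every atom true in I0 (`i0 = lambda a: True`); the slide's full clause set is elided, and this one was picked so that the first steps match the trace above. The run adds the three negative clauses, then {p3, p7}, and resolves back down to ({-p1, -p2}) as on the slide. This particular input is satisfiable, so the run then ends with a model instead of the empty clause.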
U Rules
Choose clause instances to match existing literals. Look for a contradiction.
Basic clauses and U clauses
Basic clauses are used in the three rules given above
The sequence can also have U clauses at the end
U clauses have a selected literal
In basic clauses the maximal literal is selected
In U clauses other literals can be selected.
Significant performance enhancement.
UR Resolution Example
Given the sequence ({s(a), p(b)}, {t(a), q(b)})
and the clause {¬p(X), ¬q(X), r(X)}, with X ← b,
create the sequence
({s(a), p(b)}, {t(a), q(b)}, {¬p(b), ¬q(b), r(b)})
Filtering Example
Given the sequence ({s(a), p(b)}, {t(a), q(b)})
and the clause {¬p(X), ¬q(X)}, with X ← b,
create the sequence
({s(a), p(b)}, {t(a), q(b)}, {¬p(b), ¬q(b)})
Case Analysis Example
Given the sequence ({s(a), p(b)}, {t(a), q(b)})
and the clause {¬q(X), r(X), s(X)}, with X ← b,
create the sequence
({s(a), p(b)}, {t(a), q(b)}, {¬q(b), r(b), s(b)})
Example Proof Using U Rules
All positive semantics
Clauses:
A1. X ⊈ Y, Y ⊈ X, X = Y
A2. Z ∉ X, X ⊈ Y, Z ∈ Y
A3. g(X,Y) ∈ X, X ⊆ Y
A4. g(X,Y) ∉ Y, X ⊆ Y
A5. Z ∉ X, Z ∈ X ∪ Y
A6. Z ∉ Y, Z ∈ X ∪ Y
A7. Z ∉ X ∪ Y, Z ∈ X, Z ∈ Y
T. A ∪ B ≠ B ∪ A
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A} (T)
2. {A ∪ B ⊈ B ∪ A, B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Case Analysis, A1)
3. {g(A ∪ B, B ∪ A) ∉ B ∪ A, A ∪ B ⊆ B ∪ A} (UR resolution, A4)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∉ B} (UR resolution, A5)
5. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∉ A} (UR resolution, A6)
6. {g(…) ∈ B, g(…) ∈ A, g(…) ∉ A ∪ B} (UR resolution, A7)
7. {A ∪ B ⊆ B ∪ A, g(…) ∈ A ∪ B} (Filtering, A3)
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A}
2. {A ∪ B ⊈ B ∪ A, B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Case Analysis)
3. {g(A ∪ B, B ∪ A) ∉ B ∪ A, A ∪ B ⊆ B ∪ A} (UR resolution)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∉ B} (UR resolution)
5. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∉ A} (UR resolution)
8. {g(…) ∈ B, g(…) ∈ A, A ∪ B ⊆ B ∪ A} (Resolution of 6. and 7.)
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A}
2. {A ∪ B ⊈ B ∪ A, B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Case Analysis)
3. {g(A ∪ B, B ∪ A) ∉ B ∪ A, A ∪ B ⊆ B ∪ A} (UR resolution)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∉ B} (UR resolution)
9. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∈ B, A ∪ B ⊆ B ∪ A} (Resolution of 8. and 5.)
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A}
2. {A ∪ B ⊈ B ∪ A, B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Case Analysis)
3. {g(A ∪ B, B ∪ A) ∉ B ∪ A, A ∪ B ⊆ B ∪ A} (UR resolution)
10. {g(A ∪ B, B ∪ A) ∈ B ∪ A} (Resolution of 9. and 4.)
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A}
2. {A ∪ B ⊈ B ∪ A, B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Case Analysis)
11. {A ∪ B ⊆ B ∪ A} (Resolution of 10. and 3.)
Example Proof Using U Rules
1. {A ∪ B ≠ B ∪ A}
12. {B ∪ A ⊈ A ∪ B, A ∪ B = B ∪ A} (Resolution of 11 and 2)
Now the other half of the proof will be done. Note that there is only one ascending sequence of clauses constructed by OSHL, and we are only indicating part of it.
Implementation Results
Slower implementation speed of OSHL
Uniform strategy versus strategy selection
The choice of Otter
Influence of U rules on an earlier version:
Without U rules: 233 proofs in 30 seconds on TPTP problems
With U rules: 900 proofs in 30 seconds
All results for trivial semantics
Implementation Results
OSHL has no special data structures.
Implemented in OCaml
No special equality methods
Semantics was implemented, but frequently only trivial semantics was used.
Thus significant performance improvements are possible.
Various Provers
PTTP solved 999 of 2200 tested problems.
Otter proved 1595.
leanCoP proved 745.
Source: Jens Otten and Wolfgang Bibel. leanCoP: Lean Connection-Based Theorem Proving. Journal of Symbolic Computation, Volume 36, pages 139-161. Elsevier Science, 2003.
Vampire 6.0: 3286 refutations of 7267 problems, more solved
Total Number of Proofs
                         # Otter Proofs                       # OSHL-U Proofs
                                 Non-Horn                             Non-Horn
Domain  # Probs   All    Horn   All   R=0   R>0      All    Horn   All   R=0   R>0
All     4417      1697   764    933   636   297      1027   311    716   451   265
FLD     143       28     0      28    17    11       68     0      68    21    47
SET     604       168    2      166   126   40       211    2      209   114   97
R denotes the TPTP difficulty rating
30 second time limit on each problem with each prover
Implementation Results
Shows that a prover working entirely at the ground level can come into the range of performance of a respectable resolution theorem prover.
DCTP and FDPLL probably perform better than OSHL.
DCTP and FDPLL do not work entirely at the ground level and do not use natural semantics.
Search space
          All    Horn   Non-Horn   R=0    R>0    Non-Horn, R>0
Otter     708    90     618        357    351    348
OSHL-U    104    39     65         78     26     26
Number of clauses generated (in 1,000s) computed on 827 problems that were proved by both provers
Ratio of number of clauses generated (OSHL-U / Otter)
All     Horn    Non-Horn   R=0     R>0     Non-Horn, R>0
0.147   0.433   0.105      0.218   0.075   0.075
Storage space
          All    Horn   Non-Horn   R=0    R>0    Non-Horn, R>0
Otter     423    81     342        230    193    192
OSHL-U    91     37     55         67     25     25
Max. number of clauses stored (in 1,000s) computed on 827 problems that were proved by both provers
Ratio of number of clauses stored (OSHL-U / Otter)
All     Horn    Non-Horn   R=0     R>0     Non-Horn, R>0
0.215   0.457   0.161      0.291   0.130   0.130
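The ratio tables follow from the count tables by dividing each OSHL-U entry by the corresponding Otter entry. A short Python check, using the counts in thousands exactly as printed in the search-space and storage tables (so it can only approximate ratios computed from exact counts):

```python
from typing import Dict, List

COLUMNS = ["All", "Horn", "Non-Horn", "R=0", "R>0", "Non-Horn, R>0"]

# Counts in thousands, copied from the tables above.
GENERATED: Dict[str, List[int]] = {"Otter": [708, 90, 618, 357, 351, 348],
                                   "OSHL-U": [104, 39, 65, 78, 26, 26]}
STORED: Dict[str, List[int]] = {"Otter": [423, 81, 342, 230, 193, 192],
                                "OSHL-U": [91, 37, 55, 67, 25, 25]}


def ratios(table: Dict[str, List[int]]) -> Dict[str, float]:
    """OSHL-U / Otter for each column, rounded to three decimals."""
    return {col: round(o / t, 3)
            for col, o, t in zip(COLUMNS, table["OSHL-U"], table["Otter"])}
```

Recomputed this way, every entry matches the reported ratio to three decimals except R>0 for generated clauses, which comes out as 0.074 instead of 0.075, presumably because the printed counts are rounded to thousands.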
Implementation Results
In a given number of inferences, OSHL finds more proofs than Otter for non-Horn problems