Background for Rule Learning and Inductive Logic Programming


Predicate Calculus Syntax
• Countable set of predicate symbols, each with a specified arity ≥ 0.
For example: clinical data with multiple tables of patient information.
Each such table can be a predicate and each record (tuple) is an
atomic formula.
Table – Patient

  ID   DOB      Gender
  1    3-3-03   f
  --   --       --

patient(1,3-3-03,f) – an atomic formula; the predicate patient has arity 3.
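
As a small illustration of this table-to-predicate view, the sketch below renders each record of the Patient table as a ground atomic formula (an illustration only; the row values are the single record shown above):

    # A minimal sketch of the table-to-predicate view described above: each
    # record (tuple) of the Patient table becomes one ground atomic formula
    # for the predicate patient/3. The rows shown are illustrative.
    patient_rows = [
        ("1", "3-3-03", "f"),      # ID, DOB, Gender
        # ... further records ...
    ]

    def row_to_atom(relation, row):
        """Render one record as an atomic formula, e.g. patient(1,3-3-03,f)."""
        return f"{relation}({','.join(row)})"

    for row in patient_rows:
        print(row_to_atom("patient", row))   # -> patient(1,3-3-03,f)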
Further Notation…
• Variables – start with an upper-case letter
• Predicate symbols (for atomic formulas) – start with a lower-case letter
• Constants – start with a lower-case letter
• Logical connectives: ¬, ∧, ∨, →, ↔
• We will also use negation by failure (please see the Skolemization slides)
• We limit ourselves to atomic formulas and if-then rules (clauses)
• We learn non-recursive rules – the predicate in the consequent cannot
appear in the antecedent
• Recursive rules can appear elsewhere (in the background knowledge)
Examples…
1. p(a,b,c) ∧ p(d,e,f) is true if both atomic formulas are true.
2. p(a,b,c) → p(d,e,f) is true if the first atomic formula is false
and/or the second atomic formula is true.
3. ∀X ∀Y ∀Z (q(X,Y) ∧ q(Y,Z) → p(X,Z)) is a non-recursive rule.
4. If a and b are formulas, then so are ¬a, a ∧ b, a ∨ b, a → b, and a ↔ b.
5. If X is a variable and a is a formula, then ∀X a and ∃X a are formulas.
We say that X is quantified in the formulas ∀X a and ∃X a.
Some Notes
• Predicates of arity 0 are also called
propositions, the only atomic formulas allowed
in propositional logic
• An expression is an atomic formula
• A sentence is any formula in which all
variables are quantified
Skolemization
• The process is applied to one sentence at a time and only to the
entire sentence (so the outermost quantifier is handled first). Each
sentence initially has an empty vector of free variables.
• Replace ∀X A(X) with A(X), and add X to the vector of free variables.
• Replace ∃X A(X) with A(x(V)), where x is a new function symbol and V
is the current vector of free variables.
• If there are no function symbols, the second bullet is not needed,
and x is just a constant (a nullary function) in the third bullet.
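
A minimal sketch of these two replacement rules, assuming the sentence is already in prenex form (a list of quantifier/variable pairs followed by a quantifier-free body written as a string); the function names and the naive string substitution are illustrative only:

    from itertools import count

    _fresh = count(1)

    def skolemize(prefix, body):
        """Apply the rules above, outermost quantifier first."""
        free_vars = []                       # vector of free variables, initially empty
        for quantifier, var in prefix:
            if quantifier == "forall":
                # Second bullet: drop the universal quantifier, record X as free.
                free_vars.append(var)
            elif quantifier == "exists":
                # Third bullet: replace X by a new function of the current free
                # variables (just a constant when the vector is empty).
                fn = f"sk{next(_fresh)}"
                term = f"{fn}({','.join(free_vars)})" if free_vars else fn
                body = body.replace(var, term)   # naive textual substitution
            else:
                raise ValueError(f"unknown quantifier: {quantifier}")
        return body

    print(skolemize([("forall", "X"), ("exists", "Y")], "parent(X,Y)"))
    # -> parent(X,sk1(X)) : Y becomes a Skolem function of the free variable X
    print(skolemize([("exists", "Z")], "female(Z)"))
    # -> female(sk2)      : no free variables, so Z becomes a Skolem constant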
Learning …
One approach to learning would be to summarize all the information
(e.g., the clinical data) into 1,000–10,000 features; however, we lose
information in this process. Another option is to join the tables into
a single one, but then the frequency with which each subject appears in
the joined database distorts the algorithm's results.
Inductive logic programming (ILP) can handle these
issues by using predicate calculus to represent the
database, and by learning rules.
Inductive Logic Programming: The
Problem Specification
• Given:
– Examples: first-order atomic formulas (atoms),
each labeled positive or negative.
– Background knowledge: a theory of definite clauses (if-then
rules).
– Language bias: constraints on the form of
interesting new rules (clauses).
ILP Specification (Continued)
• Find:
– A hypothesis h that meets the language constraints
and that, when conjoined with B, implies (lets us
prove) all of the positive examples but none of the
negative examples.
• To handle real-world issues such as noise, we
often relax the requirements, so that h need
only entail significantly more positive
examples than negative examples.
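
A minimal sketch of this acceptance test, assuming an entails(B, h, e) procedure (e.g., a Prolog engine or theorem prover) is available to decide whether B conjoined with h proves example e; only the counting logic and the strict/relaxed distinction are shown, and the margin parameter is an assumption of the sketch:

    def acceptable(h, background, positives, negatives, entails,
                   strict=True, margin=1):
        covered_pos = sum(entails(background, h, e) for e in positives)
        covered_neg = sum(entails(background, h, e) for e in negatives)
        if strict:
            # B and h together must prove every positive and no negative example.
            return covered_pos == len(positives) and covered_neg == 0
        # Relaxed, noise-tolerant version: h need only entail significantly
        # more positives than negatives.
        return covered_pos - covered_neg >= margin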
A Common Approach
• Use a greedy covering algorithm.
– Repeat while some positive examples remain
uncovered (not entailed):
• Find a good clause (one that covers as many positive
examples as possible but no/few negatives).
• Add that clause to the current theory, and remove the
positive examples that it covers.
• ILP algorithms use this approach but vary in
their method for finding a good clause.
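
A minimal sketch of this covering loop; find_good_clause and covers stand in for whatever clause search and coverage test a particular ILP system provides:

    def greedy_cover(positives, negatives, background, find_good_clause, covers):
        theory = []
        uncovered = set(positives)
        while uncovered:
            # Find a clause covering many remaining positives and no/few negatives.
            clause = find_good_clause(uncovered, negatives, background)
            if clause is None:       # no acceptable clause left; stop early
                break
            theory.append(clause)
            # Remove the positive examples that the new clause covers.
            uncovered = {e for e in uncovered if not covers(clause, e, background)}
        return theory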
Scoring function
MDL – minimum description length:
  Score(R) = (positives covered by R) - (negatives covered by R) - (number of atomic formulas in R)
FOIL – uses information gain as its scoring function.
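
The two scores can be written out as below. The MDL-style score follows the slide directly; the FOIL gain shown is the standard information-gain formula from Quinlan's FOIL (the slide only names information gain, so the exact form is taken from FOIL itself), where t is the number of positive examples still covered after the clause is extended:

    from math import log2

    def score_mdl(p, n, a):
        """Score(R) = positives covered - negatives covered - # atomic formulas."""
        return p - n - a

    def foil_gain(p0, n0, p1, n1, t):
        """Gain of extending a clause: t * (information after - information before)."""
        return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

    print(score_mdl(3, 1, 2))                  # mother(X,Y) <- parent(X,Y) below: S = 0
    print(round(foil_gain(3, 4, 3, 1, 3), 2))  # adding parent(X,Y): gain ~ 2.42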
Example

Label   predicate
+       mother(john,sue)
+       mother(jane,sue)
+       mother(joel,jane)
-       mother(sue,jane)
-       mother(joel,john)
-       mother(fred,jane)
-       mother(jane,fred)
Background knowledge
parent(john, fred)
parent(john,sue)
parent(jane,fred)
parent(joel,jane)
parent(joel,jack)
parent(jane,sue)
male(john)
male(jack)
male(fred)
male(joel)
female(sue)
female(jane)
We want to learn the predicate 'mother' with the scoring function
S_MDL = S(P,N,A) = P - N - A,
where P = positives covered, N = negatives covered, and A = number of
atomic formulas in the clause.

mother(X,Y)
  P = 3, N = 4, A = 1, S = -2   (i.e., S(3,4,1) = -2)

Now see whether we can cover all positive examples with a combination of clauses:

mother(X,Y) ← parent(X,Y)
  P = 3, N = 1, A = 2, S = 0

mother(X,Y) ← female(X)
  P = 1, N = 2, A = 2, S = -3

mother(X,Y) ← female(Y)
  P = 3, N = 2, A = 2, S = -1

The lattice can grow further by substituting constants for variables,
but generalization is limited in this case and can be misleading. For example:

mother(X,sue)
  P = 2, N = 0, A = 1, S = 1
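
The scores in this lattice can be recomputed directly from the example and background facts. The sketch below encodes each candidate clause body as a simple Python test over the example's two arguments (an illustration only, not a general clause evaluator); the name joel is used throughout, matching the background knowledge:

    positives = {("john", "sue"), ("jane", "sue"), ("joel", "jane")}
    negatives = {("sue", "jane"), ("joel", "john"), ("fred", "jane"), ("jane", "fred")}
    parent = {("john", "fred"), ("john", "sue"), ("jane", "fred"),
              ("joel", "jane"), ("joel", "jack"), ("jane", "sue")}
    female = {"sue", "jane"}

    candidates = [            # (clause, body test, A = number of atomic formulas)
        ("mother(X,Y)",                 lambda x, y: True,              1),
        ("mother(X,Y) <- parent(X,Y)",  lambda x, y: (x, y) in parent,  2),
        ("mother(X,Y) <- female(X)",    lambda x, y: x in female,       2),
        ("mother(X,Y) <- female(Y)",    lambda x, y: y in female,       2),
        ("mother(X,sue)",               lambda x, y: y == "sue",        1),
    ]

    for clause, body, a in candidates:
        p = sum(body(x, y) for x, y in positives)
        n = sum(body(x, y) for x, y in negatives)
        print(f"{clause}: P={p} N={n} A={a} S={p - n - a}")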
Building the Bottom Clause (refinement)
Start by substituting constants for the variables of the target
predicate so that it matches a positive example, and write down the
atomic formulas from the background knowledge that describe that example.
For example, for mother(X,Y) with X = john and Y = sue, the background
knowledge gives
  parent(X,Y) ∧ male(X) ∧ female(Y),
  parent(X,Z) with Z = fred, etc.
Generate a lattice to get such a bottom clause for each positive
example. Each example provides a sub-lattice. Search only the
sub-lattice above this bottom clause (i.e., use only atoms that appear
in the bottom clause).
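
A minimal sketch of this bottom-clause construction for the positive example mother(john,sue), restricted to depth one (only background atoms that mention one of the example's constants); real systems such as Progol chain deeper and use mode declarations, so the variable names and the depth bound here are illustrative assumptions:

    background = [
        ("parent", ("john", "fred")), ("parent", ("john", "sue")),
        ("parent", ("jane", "fred")), ("parent", ("joel", "jane")),
        ("parent", ("joel", "jack")), ("parent", ("jane", "sue")),
        ("male", ("john",)), ("male", ("jack",)), ("male", ("fred",)), ("male", ("joel",)),
        ("female", ("sue",)), ("female", ("jane",)),
    ]

    def bottom_clause(example_args, facts):
        """Keep background atoms mentioning a constant of the example, then replace
        constants with variables (example constants get the head variables X, Y;
        newly seen constants get fresh variables)."""
        var_of = {c: v for c, v in zip(example_args, ("X", "Y"))}
        fresh = iter("ZUVW")
        body = []
        for pred, args in facts:
            if not any(a in example_args for a in args):
                continue                      # unrelated to this example at depth 1
            terms = []
            for a in args:
                if a not in var_of:
                    var_of[a] = next(fresh)   # e.g. fred -> Z
                terms.append(var_of[a])
            body.append(f"{pred}({','.join(terms)})")
        return "mother(X,Y) <- " + " ^ ".join(body)

    print(bottom_clause(("john", "sue"), background))
    # -> mother(X,Y) <- parent(X,Z) ^ parent(X,Y) ^ parent(U,Y) ^ male(X) ^ female(Y)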