A Perspective on Inductive Logic Programming
Comparing machine learning methods
for a remote monitoring system
Ronit Zrahia
Final Project
Tel-Aviv University
Overview
The remote monitoring system
The project database
Machine learning methods:
Discovery of Association Rules
Inductive Logic Programming
Decision Tree
Applying the methods to the project
database and comparing the results
Remote Monitoring System Description
Support Center has ongoing information on
customers' equipment
Support Center can, in some situations, know
that a customer is about to run into trouble
Support Center initiates a call to the customer
A specialist connects to the site remotely and tries
to eliminate the problem before it has an impact
Remote Monitoring System Description
[Architecture diagram: Products (AIX/NT) on the customer site connect over
TCP/IP (FTP) to a Gateway (AIX/NT/95), which connects via modem
(TCP/IP, Mail/FTP) to the Customer Support Server]
Remote Monitoring System Technique
One of the machines on site, the Gateway, is able to
initiate a PPP connection to the support server or to an ISP
All the Products on site have a TCP/IP connection to the
Gateway
Background tasks on each Product collect relevant
information
The data collected from all Products is transferred to the
Gateway via FTP
The Gateway automatically dials the support server or
ISP, and sends the data to the subsidiary
The received data is then imported into a database
Project Database
12 columns, 300 records
Each record includes failure information of
one product at a specific customer site
The columns are: record no., date, IP
address, operating system, customer ID,
product, release, product ID, category of
application, application, severity, type of
service contract
Project Goals
Discover valuable information from the
database
Improve the products marketing and the
customer support of the company
Learn different learning methods, and use
them for the project database
Compare the different methods, based on
the results
The Learning Methods
Discovery of Association Rules
Inductive Logic Programming
Decision Tree
Discovery of Association Rules - Goals
Finding relations between products which
are bought by the customers
Impacts on product marketing
Finding relations between failures in a
specific product
Impacts on customer support (failures can be
predicted and handled before they have an impact)
Discovery of Association Rules - Definition
A technique developed specifically for
data mining
Given
A dataset of customer transactions
A transaction is a collection of items
Find
Correlations between items as rules
Example
Supermarket baskets
Determining Interesting
Association Rules
Rules have confidence and support
IF x and y THEN z with confidence c
If x and y are in the basket, then so is z in c% of
cases
IF x and y THEN z with support s
The rule holds in s% of all transactions
Discovery of Association Rules - Example
Transaction  Items
12345        A, B, C
12346        A, C
12347        A, D
12348        B, E, F

Input parameters: confidence = 50%; support = 50%
If A then C: c = 66.6%, s = 50%
If C then A: c = 100%, s = 50%
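The numbers above can be reproduced with a short Python sketch; the helper names `support` and `confidence` are illustrative, not from the original:

```python
# Reproducing the example: baskets as sets of items.
transactions = {
    12345: {"A", "B", "C"},
    12346: {"A", "C"},
    12347: {"A", "D"},
    12348: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs) / support(lhs)
```

Here `confidence({"A"}, {"C"})` evaluates to 2/3 (the slides truncate this to 66.6%) and `support({"A", "C"})` to 0.5, matching the rule "If A then C".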
Itemsets are the Basis of the Algorithm

Transaction  Items
12345        A, B, C
12346        A, C
12347        A, D
12348        B, E, F

Itemset  Support
A        75%
B        50%
C        50%
A, C     50%

Rule A => C: s = s(A, C) = 50%; c = s(A, C) / s(A) = 66.6%
Algorithm Outline
Find all large itemsets
Sets of items with at least minimum support
Apriori algorithm
Generate rules from large itemsets
For ABCD and AB in the set of large itemsets, the rule
AB => CD holds if the ratio s(ABCD)/s(AB) is large
enough
This ratio is the confidence of the rule
Pseudo Algorithm
(1) L1 := {frequent 1-itemsets}
(2) for (k := 2; Lk-1 ≠ ∅; k++) do begin
(3)   Ck := apriori_gen(Lk-1)
(4)   for all transactions t ∈ T do
(5)     for all candidates c ∈ subset(Ck, t) do c.count++
(6)   Lk := {c ∈ Ck | c.count ≥ minsup}
(7) end
(8) Answer := ∪k Lk
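As a minimal runnable sketch of the outline above (candidate generation plus rule generation), assuming the function names mirror the pseudocode; the implementation details are illustrative, not the original code:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup."""
    n = len(transactions)
    # L1: candidate 1-itemsets
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    freq, k = {}, 1
    while current:
        # one pass over the transactions counts each candidate's support
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        freq.update(level)
        # apriori_gen: join frequent k-itemsets into (k+1)-candidates whose
        # k-element subsets are all themselves frequent
        k += 1
        current = {a | b for a in level for b in level if len(a | b) == k
                   and all(frozenset(s) in level for s in combinations(a | b, k - 1))}
    return freq

def rules(freq, minconf):
    """Generate rules lhs => rhs with confidence s(lhs ∪ rhs) / s(lhs) >= minconf."""
    out = []
    for itemset, s in freq.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                if s / freq[lhs] >= minconf:
                    out.append((set(lhs), set(itemset - lhs), s / freq[lhs]))
    return out

baskets = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
large = apriori(baskets, minsup=0.5)   # {A}: .75, {B}: .5, {C}: .5, {A, C}: .5
found = rules(large, minconf=0.5)      # A => C and C => A, as in the example
```

On the four example baskets this recovers exactly the large itemsets and rules shown earlier.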
Relations Between Products
Item Set (L)  Association Rule  Confidence (CF)
1-3           1 => 3            18 / 24 = 0.75
              3 => 1            18 / 24 = 0.75
1-9           1 => 9            21 / 24 = 0.875
              9 => 1            21 / 21 = 1
2-3           2 => 3            19 / 19 = 1
              3 => 2            19 / 24 = 0.79
2-6           2 => 6            17 / 19 = 0.89
              6 => 2            17 / 20 = 0.85
3-6           3 => 6            20 / 24 = 0.83
              6 => 3            20 / 20 = 1
2-3-6         2 => 3, 6         17 / 19 = 0.89
              3, 6 => 2         17 / 20 = 0.85
              3 => 2, 6         17 / 24 = 0.71
              2, 6 => 3         17 / 17 = 1
              6 => 2, 3         17 / 20 = 0.85
              2, 3 => 6         17 / 19 = 0.89
Relations Between Failures
Item Set (L)  Association Rule  Confidence (CF)
4-6           4 => 6            14 / 16 = 0.875
              6 => 4            14 / 15 = 0.93
5-10          5 => 10           15 / 18 = 0.83
              10 => 5           15 / 15 = 1
Inductive Logic Programming - Goals
Finding the preferred customers, based
on:
The number of products bought by the
customer
The types of failures (i.e. severity levels) that
occurred in the products
Inductive Logic Programming - Definition
Inductive construction of first-order clausal
theories from examples and background
knowledge
The aim is to discover, from a given set of pre-classified
examples, a set of classification rules with high predictive power
Examples:
IF Outlook=Sunny AND Humidity=High THEN
PlayTennis=No
Horn clause induction
Given:
P: ground facts to be entailed (positive examples)
N: ground facts not to be entailed (negative examples)
B: a set of predicate definitions (background theory)
L: the hypothesis language
Find a predicate definition H ∈ L (hypothesis) such that
1. for every p ∈ P: B ∧ H ⊨ p (completeness)
2. for every n ∈ N: B ∧ H ⊭ n (consistency)
Inductive Logic Programming - Example
Learning about the relationships between people in a
family circle
B:  grandfather(X, Y) ← father(X, Z), parent(Z, Y)
    father(henry, jane)
    mother(jane, john)
    mother(jane, alice)
E+: grandfather(henry, john)
    grandfather(henry, alice)
E−: grandfather(john, henry)
    grandfather(alice, john)
H:  parent(X, Y) ← mother(X, Y)
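The completeness and consistency conditions can be checked mechanically on this example. The sketch below is an assumed encoding (facts as tuples, a naive forward-chaining loop), not a real ILP system:

```python
# Encoding the family example: facts as (predicate, arg1, arg2) tuples.
B = {("father", "henry", "jane"),
     ("mother", "jane", "john"),
     ("mother", "jane", "alice")}
E_pos = {("grandfather", "henry", "john"), ("grandfather", "henry", "alice")}
E_neg = {("grandfather", "john", "henry"), ("grandfather", "alice", "john")}

def derive(facts):
    """Naive forward chaining with B's rule and the hypothesis H."""
    facts = set(facts)
    while True:
        new = set()
        # H: parent(X, Y) <- mother(X, Y)
        new |= {("parent", x, y) for (p, x, y) in facts if p == "mother"}
        # background rule: grandfather(X, Y) <- father(X, Z), parent(Z, Y)
        new |= {("grandfather", x, y)
                for (p, x, z) in facts if p == "father"
                for (q, z2, y) in facts if q == "parent" and z2 == z}
        if new <= facts:
            return facts
        facts |= new

closure = derive(B)
complete = E_pos <= closure           # condition 1: every positive example entailed
consistent = not (E_neg & closure)    # condition 2: no negative example entailed
```

With H added, both positives become derivable and neither negative does, so H satisfies both conditions.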
Algorithm Outline
A space of candidate solutions and an acceptance criterion
characterizing solutions to an ILP problem
The search space is typically structured by means of the dual
notions of generalization (induction) and specialization
(deduction)
A deductive inference rule maps a conjunction of clauses G onto
a conjunction of clauses S such that G is more general than S
An inductive inference rule maps a conjunction of clauses S onto
a conjunction of clauses G such that G is more general than S.
Pruning principles:
When B ∧ H does not entail a positive example, specializations of
H can be pruned from the search
When B ∧ H entails a negative example, generalizations of
H can be pruned from the search
Pseudo Algorithm
QH := Initialize
repeat
  Delete H from QH
  Choose the inference rules r1, ..., rk ∈ R to be applied to H
  Apply the rules r1, ..., rk to H to yield H1, H2, ..., Hn
  Add H1, ..., Hn to QH
  Prune QH
until stop-criterion(QH) satisfied
The preferred customers
If (Total_Products_Types(Customer) > 5)
and (All_Severity(Customer) < 3) then
Preferred_Customer

[Pie chart: Preferred Customers 17%, Others 83%]
Decision Trees - Goals
Finding the preferred customers
Finding relations between products which
are bought by the customers
Finding relations between failures in a
specific product
Compare the Decision Tree results to the
results of the previous algorithms
Decision Trees - Definition
Decision tree representation:
Each internal node tests an attribute
Each branch corresponds to attribute value
Each leaf node assigns a classification
Occam’s razor: prefer the shortest hypothesis
that fits the data
Examples:
Equipment or medical diagnosis
Credit risk analysis
Algorithm outline
A ← the "best" decision attribute for the next node
Assign A as the decision attribute for the node
For each value of A, create a new descendant of
the node
Sort the training examples to the leaf nodes
If the training examples are perfectly classified, then
STOP; else iterate over the new leaf nodes
Pseudo algorithm
ID3(Examples, Target_attribute, Attributes)
  Create a Root node for the tree
  If all Examples are in the same class c,
    Return the single-node tree Root, with label c
  If Attributes is empty,
    Return the single-node tree Root, with label = the most common
    value of Target_attribute in Examples
  Otherwise begin
    A ← the attribute from Attributes that best classifies Examples
      (i.e. the attribute with the highest information gain)
    The decision attribute for Root ← A
    For each possible value vi of A:
      Add a new tree branch below Root, corresponding to the test A = vi
      Let Examples_vi be the subset of Examples that have value vi for A
      If Examples_vi is empty
        Then below this new branch add a leaf node with label = the most
        common value of Target_attribute in Examples
        Else below this new branch add the subtree
        ID3(Examples_vi, Target_attribute, Attributes − {A})
  End
  Return Root
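The pseudocode above can be rendered as a short runnable Python sketch; the tuple-based tree representation and helper names are illustrative choices, not the original implementation:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(examples, target, att):
    """Entropy(S) minus the weighted entropy of the partition induced by `att`."""
    n = len(examples)
    split = sum(
        len(sub) / n * entropy([e[target] for e in sub])
        for v in {e[att] for e in examples}
        for sub in [[e for e in examples if e[att] == v]]
    )
    return entropy([e[target] for e in examples]) - split

def id3(examples, target, attributes):
    """Internal nodes are (attribute, {value: subtree}); leaves are class labels."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # all examples in one class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # most common class value
    a = max(attributes, key=lambda att: gain(examples, target, att))
    return (a, {v: id3([e for e in examples if e[a] == v], target,
                       [x for x in attributes if x != a])
                for v in {e[a] for e in examples}})

# The 14-day PlayTennis data from the worked example below.
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = [r.split() for r in """sunny hot high weak No
sunny hot high strong No
overcast hot high weak Yes
rain mild high weak Yes
rain cool normal weak Yes
rain cool normal strong No
overcast cool normal strong Yes
sunny mild high weak No
sunny cool normal weak Yes
rain mild normal weak Yes
sunny mild normal strong Yes
overcast mild high strong Yes
overcast hot normal weak Yes
rain mild high strong No""".splitlines()]
data = [dict(zip(attrs + ["PlayTennis"], r)) for r in rows]
tree = id3(data, "PlayTennis", attrs)   # root splits on Outlook, as in the slides
```

Running this on the PlayTennis table yields the same tree the worked example derives by hand: Outlook at the root, Humidity under sunny, Wind under rain.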
Information Measure
Entropy measures the impurity of the sample of
training examples S:
Entropy(S) = − Σ_{i=1}^{c} p_i · log2(p_i)
p_i is the probability of making a particular decision
There are c possible decisions
The entropy is the amount of information
needed to identify the class of an object in S
Maximized when all p_i are equal
Minimized (0) when all but one p_i are 0 (the remaining
p_i is 1)
Information Measure
Estimate the gain in information from a
particular partitioning of the dataset
Gain(S, A) = expected reduction in entropy due
to sorting on A
The information that is gained by partitioning S
is then:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
The gain criterion can then be used to select the
partition which maximizes information gain
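As a sanity check of the formula, here is a small sketch computing the entropies and gains used in the worked example that follows (the helper `H` is illustrative; exact arithmetic gives ≈0.152 for Humidity, while the slides' .151 reflects rounded intermediate entropies):

```python
from math import log2

def H(pos, neg):
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return -sum(p * log2(p) for p in (pos / total, neg / total) if p)

# S = [9+, 5-]; Humidity splits it into high = [3+, 4-] and normal = [6+, 1-]
gain_humidity = H(9, 5) - (7 / 14) * H(3, 4) - (7 / 14) * H(6, 1)
# Wind splits S into weak = [6+, 2-] and strong = [3+, 3-]
gain_wind = H(9, 5) - (8 / 14) * H(6, 2) - (6 / 14) * H(3, 3)
```

`H(9, 5)` comes out as 0.940 and `gain_wind` as 0.048, matching the slides.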
Decision Tree - Example
Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   sunny     hot          high      weak    No
D2   sunny     hot          high      strong  No
D3   overcast  hot          high      weak    Yes
D4   rain      mild         high      weak    Yes
D5   rain      cool         normal    weak    Yes
D6   rain      cool         normal    strong  No
D7   overcast  cool         normal    strong  Yes
D8   sunny     mild         high      weak    No
D9   sunny     cool         normal    weak    Yes
D10  rain      mild         normal    weak    Yes
D11  sunny     mild         normal    strong  Yes
D12  overcast  mild         high      strong  Yes
D13  overcast  hot          normal    weak    Yes
D14  rain      mild         high      strong  No
Decision Tree - Example (Continued)
Which attribute is the best classifier?
S: [9+, 5−], E = 0.940
Humidity: high → [3+, 4−] (E = 0.985), normal → [6+, 1−] (E = 0.592)
Gain(S, Humidity) = .940 − (7/14)·.985 − (7/14)·.592 = .151
Wind: weak → [6+, 2−] (E = 0.811), strong → [3+, 3−] (E = 1.00)
Gain(S, Wind) = .940 − (8/14)·.811 − (6/14)·1.0 = .048
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Decision Tree - Example (Continued)
Root: {D1, D2, ..., D14}, [9+, 5−]; split on outlook:
  sunny → {D1, D2, D8, D9, D11}, [2+, 3−] → ?
  overcast → {D3, D7, D12, D13}, [4+, 0−] → Yes
  rain → {D4, D5, D6, D10, D14}, [3+, 2−] → ?
Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = .970 − (3/5)·0.0 − (2/5)·0.0 = .970
Gain(Ssunny, Temperature) = .970 − (2/5)·0.0 − (2/5)·1.0 − (1/5)·0.0 = .570
Gain(Ssunny, Wind) = .970 − (2/5)·1.0 − (3/5)·.918 = .019
Decision Tree - Example (Continued)
outlook
  sunny → humidity
    high → No
    normal → Yes
  overcast → Yes
  rain → wind
    strong → No
    weak → Yes
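The final tree can be encoded and queried directly; the nested-dict encoding and `classify` helper below are illustrative choices, not part of the original material:

```python
# The final PlayTennis tree as nested dicts: {attribute: {value: subtree-or-label}}.
tree = {"outlook": {
    "sunny": {"humidity": {"high": "No", "normal": "Yes"}},
    "overcast": "Yes",
    "rain": {"wind": {"strong": "No", "weak": "Yes"}},
}}

def classify(node, example):
    """Walk from the root, following the branch for each tested attribute."""
    while isinstance(node, dict):
        att, branches = next(iter(node.items()))
        node = branches[example[att]]
    return node
```

For instance, `classify(tree, {"outlook": "sunny", "humidity": "normal"})` follows sunny → normal and returns "Yes".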
Overfitting
A tree that fits the training data too closely may not
be generally applicable; this is called overfitting
How can we avoid overfitting?
Stop growing when a data split is not statistically
significant
Grow the full tree, then post-prune
The post-pruning approach is more common
How to select the "best" tree:
Measure performance over the training data
Measure performance over a separate validation data
set
Reduced-Error Pruning
Split the data into a training set and a validation set
Do until further pruning is harmful:
1. Evaluate the impact on the validation set of pruning
each possible node (plus those below it)
2. Greedily remove the one that most improves
validation set accuracy
Produces the smallest version of the most
accurate subtree
The Preferred Customer
Target attribute is TypeOfServiceContract
MaxSev
  < 2.5 → NO: 7, YES: 0
  >= 2.5 → NoOfProducts
    < 4.5 → NO: 0, YES: 3
    >= 4.5 → NO: 3, YES: 8
Relations Between Products
Target attribute is Product3
Product6
  1 → NO: 0, YES: 15
  0 → Product2
    1 → NO: 0, YES: 1
    0 → Product9
      0 → NO: 0, YES: 1
      1 → NO: 4, YES: 0
Relations Between Failures
Target attribute is Application5
Application10
  1 → NO: 0, YES: 11
  0 → Application8
    1 → NO: 2, YES: 2
    0 → Application2
      0 → NO: 5, YES: 1
      1 → NO: 1, YES: 0