A Perspective on Inductive Logic Programming

Comparing machine learning methods for a remote monitoring system
Ronit Zrahia
Final Project
Tel-Aviv University
Overview
The remote monitoring system
The project database
Machine learning methods:
Discovery of Association Rules
Inductive Logic Programming
Decision Tree
Applying the methods to the project
database and comparing the results
Remote Monitoring System Description
The Support Center has ongoing information on the customer's equipment
The Support Center can, in some situations, know that a customer is going to be in trouble
The Support Center initiates a call to the customer
A specialist connects to the site remotely and tries to eliminate the problem before it has an impact
Remote Monitoring System Description
[Diagram: on the customer side, Products (AIX/NT/95) connect over TCP/IP [FTP] to a Gateway (AIX/NT); the Gateway dials via modem, over TCP/IP [Mail/FTP], to the Support Server.]
Remote Monitoring System Technique
One of the machines on site, the Gateway, is able to initiate a PPP connection to the support server or to an ISP
All the Products on site have a TCP/IP connection to the Gateway
Background tasks on each Product collect relevant information
The data collected from all Products is transferred to the Gateway via FTP
The Gateway automatically dials the support server or ISP, and sends the data to the subsidiary
The received data is then imported into a database
Project Database
12 columns, 300 records
Each record includes failure information of
one product at a specific customer site
The columns are: record no., date, IP
address, operating system, customer ID,
product, release, product ID, category of
application, application, severity, type of
service contract
Project Goals
Discover valuable information from the database
Improve the company's product marketing and customer support
Learn different learning methods, and apply them to the project database
Compare the different methods, based on their results
The Learning Methods
Discovery of Association Rules
Inductive Logic Programming
Decision Tree
Discovery of Association
Rules - Goals
Finding relations between products which
are bought by the customers
Impacts on product marketing
Finding relations between failures in a
specific product
Impacts on customer support (failures can be
predicted and handled before they have an impact)
Discovery of Association
Rules - Definition
A technique developed specifically for
data mining
Given
A dataset of customer transactions
A transaction is a collection of items
Find
Correlations between items as rules
Example
Supermarket baskets
Determining Interesting
Association Rules
Rules have confidence and support
IF x and y THEN z with confidence c
If x and y are in the basket, then so is z in c% of
cases
IF x and y THEN z with support s
The rule holds in s% of all transactions
Discovery of Association
Rules - Example
Transaction   Items
12345         A, B, C
12346         A, C
12347         A, D
12348         B, E, F

Input parameters: confidence = 50%; support = 50%
If A then C: c = 66.6%, s = 50%
If C then A: c = 100%, s = 50%
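These numbers can be reproduced directly. Below is a minimal Python sketch (illustrative only, not part of the project); the transactions mirror the example above, and the support/confidence helpers are names chosen here:

```python
# Transactions from the example above
transactions = [set("ABC"), set("AC"), set("AD"), set("BEF")]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """support(lhs | rhs) / support(lhs): how often rhs appears when lhs does."""
    return support(lhs | rhs) / support(lhs)

print(support({"A", "C"}))        # 0.5      -> s = 50%
print(confidence({"A"}, {"C"}))   # 0.666... -> c = 66.6%
print(confidence({"C"}, {"A"}))   # 1.0      -> c = 100%
```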
Itemsets are the Basis of the Algorithm
Transaction   Items
12345         A, B, C
12346         A, C
12347         A, D
12348         B, E, F

Rule A ⇒ C:
s = s(A, C) = 50%
c = s(A, C) / s(A) = 66.6%

Itemset   Support
A         75%
B         50%
C         50%
A, C      50%
Algorithm Outline
Find all large itemsets
Sets of items with at least minimum support
Apriori algorithm
Generate rules from large itemsets
For ABCD and AB in the large itemsets, the rule AB ⇒ CD holds if the ratio s(ABCD)/s(AB) is large enough
This ratio is the confidence of the rule
Pseudo Algorithm
(1) L1 ← { frequent 1-itemsets }
(2) for ( k ← 2; L(k−1) ≠ ∅; k++ ) do begin
(3)   Ck ← apriori_gen( L(k−1) )
(4)   for all transactions t ∈ T
(5)     for all candidates c ∈ subset( Ck, t ) do c.count++
(6)   Lk ← { c ∈ Ck | c.count ≥ minsup }
(7) end
(8) Answer ← ∪k Lk
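As a rough sketch of this algorithm, here is a small Python implementation of Apriori-style frequent-itemset mining (an illustrative reimplementation written for this transcript, not the project's code; the function names and the absolute minsup count are choices made here). On the four-transaction example with minsup = 2 (50%), it returns the large itemsets {A}, {B}, {C} and {A, C}:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset (frozenset): support count} for all itemsets whose
    support count is at least `minsup` (an absolute count, not a percentage)."""
    # L1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    all_frequent = dict(frequent)
    k = 2
    while frequent:
        # apriori_gen, join step: candidate k-itemsets from frequent (k-1)-itemsets
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # apriori_gen, prune step: every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # count each candidate's occurrences in the transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= minsup}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

# The example above: minsup = 50% of 4 transactions = 2
transactions = [frozenset("ABC"), frozenset("AC"), frozenset("AD"), frozenset("BEF")]
print({tuple(sorted(s)): c for s, c in apriori(transactions, minsup=2).items()})
# Expected: {('A',): 3, ('B',): 2, ('C',): 2, ('A', 'C'): 2}
```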
Relations Between Products
Item Set (L)   Association Rule   Confidence (CF)
1-3            1 ⇒ 3              18 / 24 = 0.75
1-3            3 ⇒ 1              18 / 24 = 0.75
1-9            1 ⇒ 9              21 / 24 = 0.875
1-9            9 ⇒ 1              21 / 21 = 1
2-3            2 ⇒ 3              19 / 19 = 1
2-3            3 ⇒ 2              19 / 24 = 0.79
2-6            2 ⇒ 6              17 / 19 = 0.89
2-6            6 ⇒ 2              17 / 20 = 0.85
3-6            3 ⇒ 6              20 / 24 = 0.83
3-6            6 ⇒ 3              20 / 20 = 1
2-3-6          2 ⇒ 3, 6           17 / 19 = 0.89
2-3-6          3, 6 ⇒ 2           17 / 20 = 0.85
2-3-6          3 ⇒ 2, 6           17 / 24 = 0.71
2-3-6          2, 6 ⇒ 3           17 / 17 = 1
2-3-6          6 ⇒ 2, 3           17 / 20 = 0.85
2-3-6          2, 3 ⇒ 6           17 / 19 = 0.89
Relations Between Failures
Item Set (L)   Association Rule   Confidence (CF)
4-6            4 ⇒ 6              14 / 16 = 0.875
4-6            6 ⇒ 4              14 / 15 = 0.93
5-10           5 ⇒ 10             15 / 18 = 0.83
5-10           10 ⇒ 5             15 / 15 = 1
Inductive Logic Programming
- Goals
Finding the preferred customers, based
on:
The number of products bought by the
customer
The types of failures (i.e. severity levels) that occurred in the products
Inductive Logic Programming
- Definition
Inductive construction of first-order clausal
theories from examples and background
knowledge
The aim is to discover, from a given set of pre-classified examples, a set of classification rules with high predictive power
Examples:
IF Outlook=Sunny AND Humidity=High THEN
PlayTennis=No
Horn clause induction
Given:
P: ground facts to be entailed (positive examples)
N: ground facts not to be entailed (negative examples)
B: a set of predicate definitions (background theory)
L: the hypothesis language
Find a predicate definition H ∈ L (hypothesis) such that
1. for every p ∈ P: B ∪ H ⊨ p (completeness)
2. for every n ∈ N: B ∪ H ⊭ n (consistency)
Inductive Logic Programming
- Example
Learning about the relationships between people in a family circle

B = { grandfather(X, Y) ← father(X, Z), parent(Z, Y)
      father(henry, jane) ←
      mother(jane, john) ←
      mother(jane, alice) ← }

E+ = { grandfather(henry, john) ←
       grandfather(henry, alice) ← }

E- = { ← grandfather(john, henry)
       ← grandfather(alice, john) }

H = { parent(X, Y) ← mother(X, Y) }
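The completeness and consistency requirements can be checked by hand on this small example; the following minimal Python sketch (illustrative only, not part of the project) does the same check with plain sets of tuples instead of a logic engine:

```python
# Background knowledge B: facts plus the grandfather rule
father = {("henry", "jane")}
mother = {("jane", "john"), ("jane", "alice")}

# Hypothesis H: parent(X, Y) <- mother(X, Y)
parent = set(mother)

# Rule in B: grandfather(X, Y) <- father(X, Z), parent(Z, Y)
grandfather = {(x, y) for (x, z) in father for (z2, y) in parent if z == z2}

positives = {("henry", "john"), ("henry", "alice")}   # E+
negatives = {("john", "henry"), ("alice", "john")}    # E-

print(positives <= grandfather)            # completeness: True
print(grandfather.isdisjoint(negatives))   # consistency: True
```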
Algorithm Outline
A space of candidate solutions and an acceptance criterion characterizing solutions to an ILP problem
The search space is typically structured by means of the dual notions of generalization (induction) and specialization (deduction)
A deductive inference rule maps a conjunction of clauses G onto a conjunction of clauses S such that G is more general than S
An inductive inference rule maps a conjunction of clauses S onto a conjunction of clauses G such that G is more general than S
Pruning principles:
When B ∪ H does not entail a positive example, specializations of H can be pruned from the search
When B ∪ H entails a negative example, generalizations of H can be pruned from the search
Pseudo Algorithm
QH : Initialize
repeat
Delete H from QH
Choose the inference rules r1 ,..., rk  R to be applied to H
Apply the rules r1 ,..., rk to H to yield H 1 , H 2 ,..., H n
Add H 1 ,..., H n to QH
Prune QH
until stop - criterion( QH ) satisfied
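A minimal Python skeleton of this generic loop might look as follows (an illustrative sketch, not an actual ILP system; refine, prune and is_solution are hypothetical callbacks standing in for the inference rules, the pruning principles and the stop criterion):

```python
from collections import deque

def ilp_search(initial_hypotheses, refine, prune, is_solution, max_iterations=1000):
    """Generic search skeleton mirroring the pseudo algorithm above.

    refine(h)      -> new hypotheses H1..Hn (chosen inference rules applied to h)
    prune(queue)   -> the hypotheses worth keeping (e.g. via the pruning principles)
    is_solution(h) -> True when h satisfies the acceptance criterion
    """
    queue = deque(initial_hypotheses)            # QH := Initialize
    for _ in range(max_iterations):              # repeat ... until stop-criterion
        if not queue:
            return None
        hypothesis = queue.popleft()             # Delete H from QH
        if is_solution(hypothesis):
            return hypothesis
        queue.extend(refine(hypothesis))         # Apply r1..rk; add H1..Hn to QH
        queue = deque(prune(queue))              # Prune QH
    return None
```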
The Preferred Customers
If ( Total_Products_Types( Customer ) > 5 )
and ( All_Severity(Customer) < 3 ) then
Preferred_Customer
[Pie chart: Preferred Customers 17%, Others 83%]
Decision Trees - Goals
Finding the preferred customers
Finding relations between products which
are bought by the customers
Finding relations between failures in a
specific product
Compare the Decision Tree results to the previous algorithms' results
Decision Trees - Definition
Decision tree representation:
Each internal node tests an attribute
Each branch corresponds to attribute value
Each leaf node assigns a classification
Occam’s razor: prefer the shortest hypothesis
that fits the data
Examples:
Equipment or medical diagnosis
Credit risk analysis
Algorithm outline
A ← the "best" decision attribute for the next node
Assign A as the decision attribute for the node
For each value of A, create a new descendant of the node
Sort training examples to the leaf nodes
If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes
Pseudo algorithm
ID3(Examples, Target_attribute, Attributes)
  Create a Root node for the tree
  If all Examples are in the same class c,
    Return the single-node tree Root, with label = c
  If Attributes is empty,
    Return the single-node tree Root, with label = most common value of Target_attribute in Examples
  Otherwise Begin
    A ← the attribute from Attributes that best classifies Examples
        (i.e. the attribute with the highest information gain)
    The decision attribute for Root ← A
    For each possible value vi of A:
      Add a new tree branch below Root, corresponding to the test A = vi
      Let Examples_vi be the subset of Examples that have value vi for A
      If Examples_vi is empty,
        Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
        Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes − {A})
  End
  Return Root
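For illustration, here is a compact Python sketch of ID3 over examples represented as dicts (written for this transcript, not the project's implementation; one simplification is that branches are only grown for attribute values that occur in Examples, so the empty-subset case of the pseudo code does not arise):

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target attribute over a list of example dicts."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(examples, attribute, target):
    """Expected reduction in entropy from partitioning `examples` on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes):
    """Build a nested-dict tree: {attribute: {value: subtree_or_label}}."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                  # all examples in the same class
        return labels[0]
    if not attributes:                         # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree
```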
Information Measure
Entropy measures the impurity of the sample of training examples S:
  Entropy(S) = − Σ (i = 1 to c) p_i · log2(p_i)
p_i is the probability of making a particular decision
There are c possible decisions
The entropy is the amount of information needed to identify the class of an object in S
Maximized when all p_i are equal
Minimized (0) when all but one p_i are 0 (the remaining p_i is 1)
Information Measure
Estimate the gain in information from a particular partitioning of the dataset
Gain(S, A) = expected reduction in entropy due to sorting on A
The information that is gained by partitioning S is then:
  Gain(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|S_v| / |S|) · Entropy(S_v)
The gain criterion can then be used to select the partition which maximizes information gain
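As a quick numeric check of these formulas (using the PlayTennis sample shown on the next slide, which has 9 positive and 5 negative examples):

```python
import math

# Entropy of S = [9+, 5-]; this is the E = 0.940 used on the following slides
p_pos, p_neg = 9 / 14, 5 / 14
print(round(-(p_pos * math.log2(p_pos) + p_neg * math.log2(p_neg)), 3))  # 0.94
```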
Decision Tree - Example
Day   Outlook    Temperature   Humidity   Wind     PlayTennis
D1    sunny      hot           high       weak     No
D2    sunny      hot           high       strong   No
D3    overcast   hot           high       weak     Yes
D4    rain       mild          high       weak     Yes
D5    rain       cool          normal     weak     Yes
D6    rain       cool          normal     strong   No
D7    overcast   cool          normal     strong   Yes
D8    sunny      mild          high       weak     No
D9    sunny      cool          normal     weak     Yes
D10   rain       mild          normal     weak     Yes
D11   sunny      mild          normal     strong   Yes
D12   overcast   mild          high       strong   Yes
D13   overcast   hot           normal     weak     Yes
D14   rain       mild          high       strong   No
Decision Tree - Example (Continued)
Which attribute is the best classifier?

S: [9+, 5-], E = 0.940

Humidity:
  high → [3+, 4-], E = 0.985
  normal → [6+, 1-], E = 0.592
  Gain(S, Humidity) = .940 - (7/14).985 - (7/14).592 = .151

Wind:
  weak → [6+, 2-], E = 0.811
  strong → [3+, 3-], E = 1.00
  Gain(S, Wind) = .940 - (8/14).811 - (6/14)1.0 = .048

Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
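These gains can be reproduced with a short script. Below is an illustrative Python sketch (not the project's code; the tuple layout and the gain helper are choices made here), using the PlayTennis table from the earlier slide:

```python
import math
from collections import Counter

# (outlook, temperature, humidity, wind, playtennis) rows from the PlayTennis table
data = [
    ("sunny", "hot", "high", "weak", "No"),          ("sunny", "hot", "high", "strong", "No"),
    ("overcast", "hot", "high", "weak", "Yes"),      ("rain", "mild", "high", "weak", "Yes"),
    ("rain", "cool", "normal", "weak", "Yes"),       ("rain", "cool", "normal", "strong", "No"),
    ("overcast", "cool", "normal", "strong", "Yes"), ("sunny", "mild", "high", "weak", "No"),
    ("sunny", "cool", "normal", "weak", "Yes"),      ("rain", "mild", "normal", "weak", "Yes"),
    ("sunny", "mild", "normal", "strong", "Yes"),    ("overcast", "mild", "high", "strong", "Yes"),
    ("overcast", "hot", "normal", "weak", "Yes"),    ("rain", "mild", "high", "strong", "No"),
]
ATTRS = {"outlook": 0, "temperature": 1, "humidity": 2, "wind": 3}

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    i, n = ATTRS[attr], len(rows)
    by_value = Counter(r[i] for r in rows)
    return entropy(rows) - sum(
        cnt / n * entropy([r for r in rows if r[i] == v]) for v, cnt in by_value.items()
    )

for a in ATTRS:
    print(a, round(gain(data, a), 3))
# Prints approximately: outlook 0.247, temperature 0.029, humidity 0.152, wind 0.048
# (the slide rounds the Outlook and Humidity gains to 0.246 and 0.151)
```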
Decision Tree Example (Continued)

{D1, D2, ..., D14}  [9+, 5-]
Split on Outlook:
  sunny → {D1, D2, D8, D9, D11}  [2+, 3-]  ?
  overcast → {D3, D7, D12, D13}  [4+, 0-]  Yes
  rain → {D4, D5, D6, D10, D14}  [3+, 2-]  ?

Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, Humidity) = .970 - (3/5)0.0 - (2/5)0.0 = .970
Gain(Ssunny, Temperature) = .970 - (2/5)0.0 - (2/5)1.0 - (1/5)0.0 = .570
Gain(Ssunny, Wind) = .970 - (2/5)1.0 - (3/5).918 = .019
Decision Tree Example (Continued)

Final tree:
  Outlook = sunny → Humidity:
    high → No
    normal → Yes
  Outlook = overcast → Yes
  Outlook = rain → Wind:
    strong → No
    weak → Yes
Overfitting
A tree that fits the training data too closely may not be generally applicable; this is called overfitting
How can we avoid overfitting?
  Stop growing when the data split is not statistically significant
  Grow the full tree, then post-prune
The post-pruning approach is more common
How to select the "best" tree:
  Measure performance over the training data
  Measure performance over a separate validation data set
Reduced-Error Pruning
Split the data into a training set and a validation set
Do until further pruning is harmful:
  1. Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  2. Greedily remove the one that most improves validation set accuracy
Produces the smallest version of the most accurate subtree
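A minimal Python sketch of this procedure over nested-dict trees (the same representation as the ID3 sketch earlier) is given below; it is illustrative only, and it simplifies the method by replacing every pruned node with one caller-supplied label rather than the majority class of the training examples reaching that node:

```python
import copy

def classify(tree, example, default="No"):
    """Walk a nested-dict tree ({attr: {value: subtree_or_label}}) down to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example.get(attr), default)
    return tree

def accuracy(tree, validation, target):
    return sum(classify(tree, e) == e[target] for e in validation) / len(validation)

def internal_paths(tree, path=()):
    """Yield the (attribute, value) path to every internal (non-leaf) node."""
    if isinstance(tree, dict):
        yield path
        attr = next(iter(tree))
        for value, sub in tree[attr].items():
            yield from internal_paths(sub, path + ((attr, value),))

def prune_at(tree, path, label):
    """Return a copy of the tree with the node at `path` replaced by the leaf `label`."""
    if not path:
        return label
    new_tree = copy.deepcopy(tree)
    node = new_tree
    for attr, value in path[:-1]:
        node = node[attr][value]
    last_attr, last_value = path[-1]
    node[last_attr][last_value] = label
    return new_tree

def reduced_error_prune(tree, validation, target, leaf_label):
    """Repeatedly apply the prune that yields the best validation accuracy;
    stop once every remaining prune would lower accuracy (i.e. would be harmful)."""
    while isinstance(tree, dict):
        best_acc = accuracy(tree, validation, target)
        best_candidate = None
        for path in internal_paths(tree):
            candidate = prune_at(tree, path, leaf_label)
            acc = accuracy(candidate, validation, target)
            if acc >= best_acc:
                best_acc, best_candidate = acc, candidate
        if best_candidate is None:
            return tree
        tree = best_candidate
    return tree
```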
The Preferred Customer
Target attribute is TypeOfServiceContract
[Decision tree:]
MaxSev
  < 2.5 → NoOfProducts
    < 4.5 → NO: 0, YES: 3
    >= 4.5 → NO: 3, YES: 8
  >= 2.5 → NO: 7, YES: 0
Relations Between Products
Target attribute is Product3
[Decision tree:]
Product6
  1 → NO: 0, YES: 15
  0 → Product2
    1 → NO: 0, YES: 1
    0 → Product9
      0 → NO: 0, YES: 1
      1 → NO: 4, YES: 0
Relations Between Failures
Target attribute is Application5
[Decision tree:]
Application10
  1 → NO: 0, YES: 11
  0 → Application8
    1 → NO: 2, YES: 2
    0 → Application2
      0 → NO: 5, YES: 1
      1 → NO: 1, YES: 0