Dynamic Integration of Virtual Predictors

Vagan Terziyan
University of Jyvaskyla, Finland
e-mail: [email protected]
http://www.cs.jyu.fi/ai/vagan/index.html
Discovering Knowledge from Data - one of the basic abilities of an intelligent agent
(Diagram: Data → Knowledge)
2
Basic Reference
Terziyan V., Dynamic Integration of Virtual Predictors, In: L. I. Kuncheva, F. Steimann, C. Haefke, M. Aladjem, V. Novak (Eds.), Proceedings of the International ICSC Congress on Computational Intelligence: Methods and Applications - CIMA'2001, Bangor, Wales, UK, June 19-22, 2001, ICSC Academic Press, Canada/The Netherlands, pp. 463-469.
3
Acknowledgements
Information Technology Research Institute (University of Jyvaskyla): customer-oriented research and development in Information Technology, http://www.titu.jyu.fi/eindex.html
Academy of Finland Project (1999): Dynamic Integration of Classification Algorithms
Multimeetmobile (MMM) Project (2000-2001): Location-Based Service System and Transaction Management in Mobile Electronic Commerce, http://www.cs.jyu.fi/~mmm
4
Contents
The problem
Virtual Predictor
Classification Team
Team Direction
Dynamic Selection of Classification Team
Implementation for Mobile e-Commerce
Conclusion
5
The Problem: Knowledge Discovery
- Knowledge discovery in databases (KDD) is a combination of data warehousing, decision support, and data mining, and it is an innovative approach to information management.
- KDD is an emerging area that considers the process of finding previously unknown and potentially interesting patterns and relations in large databases*.

* Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
6
Classification Problem
Given: n training pairs (x_i, y_i), with x_i ∈ R^p and y_i ∈ {1, …, m} denoting class membership (m classes, n training observations, p object features).
Goal: given a new instance x_0, select a classifier for x_0 and predict its class y_0.
(Diagram: the training set feeds the classifiers; classification assigns class membership to the vector being classified)
7
The Research Problem
During the past several years, in a variety of
application domains, researchers in machine
learning, computational learning theory, pattern
recognition and statistics have tried to combine
efforts to learn how to create and combine an
ensemble of classifiers.
The primary goal of combining several classifiers is to
obtain a more accurate prediction than can be
obtained from any single classifier alone.
8
Approaches to Integrate Multiple Classifiers
Integrating multiple classifiers:
- Combination: global (voting-type) or local
- Selection: global (static) or local (dynamic)
- Decontextualization ("virtual" classifier)
9
Inductive Learning with Integration of Predictors
(Diagram: the learning environment supplies sample instances ⟨x_r1, x_r2, …, x_rm⟩ → y_r to the predictors/classifiers P_1, P_2, …, P_n, which together predict the class y_t of a new instance ⟨x_t1, x_t2, …, x_tm⟩)
10
Virtual Classifier
A Virtual Classifier is a group of seven cooperative agents:
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩
Constant team members - the Team Instructors:
TC - Team Collector
TM - Training Manager
TP - Team Predictor
TI - Team Integrator
Elective team members - the Classification Team:
FS - Feature Selector
DE - Distance Evaluator
CL - Classification Processor
11
Classification Team: Feature Selector
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the elective team member highlighted here is FS, the Feature Selector.
12
Feature Selection
Feature selection methods try to pick a subset of features that are relevant to the target concept.
Each of these methods has its strengths and weaknesses based on data types and domain characteristics.
The choice of a feature selection method depends on various data set characteristics: (i) data types, (ii) data size, and (iii) noise.
13
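As a minimal sketch of the heuristic (forward-selection) branch of the taxonomy that follows, the snippet below greedily adds one feature at a time while a score improves. The leave-one-out 1-nearest-neighbour scorer is an illustrative stand-in, not a method prescribed by the paper.

```python
# Greedy forward feature selection (a heuristic method).
# Scoring is a hypothetical choice: leave-one-out accuracy of 1-NN
# restricted to the candidate feature subset.

def loo_accuracy(data, labels, features):
    """Leave-one-out accuracy of 1-NN using only the given feature subset."""
    correct = 0
    for i, x in enumerate(data):
        best_d, best_label = None, None
        for j, y in enumerate(data):
            if i == j:
                continue
            d = sum((x[f] - y[f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_d, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(data)

def forward_selection(data, labels, n_features):
    """Add one feature at a time while the score strictly improves."""
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        score, f = max((loo_accuracy(data, labels, selected + [f]), f)
                       for f in candidates)
        if score <= best_score:
            break
        selected.append(f)
        best_score = score
    return selected
```

On a toy set where feature 0 determines the class and feature 1 is noise, the procedure keeps only feature 0.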
Classification of feature selection methods [Dash and Liu, 1997]
Feature selection methods, grouped by search strategy:
- Complete
  - Exhaustive (All): Focus, MIFES_1, Sch193, MDLM
  - Non-Exhaustive (Breadth-First, Branch & Bound, Best-First, Beam Search): B&B, AMB&B, BFF, BS, Bobr88, Ichi-Skla84a, Ichi-Skla84b
- Heuristic
  - Forward Selection: SFS, DTM, PRESET, POE1ACC, Sege84, Quei-Gels84, Koll-Saha96
  - Backward Selection: SBS, SBS-Slash, RC
  - Combined F/B: PQSS, BDS, Moor94
  - Instance-Based: Relief, Relief-F
- Random
  - Type I: LVF, LVW
  - Type II: RGSS, GA, SA, RMHC-PF1
14
Feature Selector: to find the minimally sized feature subset that is sufficient for correct classification of the instance.
(Diagram: sample instances ⟨X_r⟩ → y_r are reduced to ⟨X'_r⟩ → y_r, where X'_r ⊂ X_r)
15
Classification Team: Distance Evaluator
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the elective team member highlighted here is DE, the Distance Evaluator.
16
Use of Distance Evaluation
Distance between instances is useful for recognizing the nearest neighborhood of the instance being classified.
Distance between classes is useful for defining the misclassification error.
Distance between classifiers is useful for evaluating the weight of each classifier for their further integration.
17
Well-known distance functions [Wilson & Martinez, 1997]

Minkowsky: $D(x,y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$

Euclidean: $D(x,y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$

Manhattan / city block: $D(x,y) = \sum_{i=1}^{m} |x_i - y_i|$

Camberra: $D(x,y) = \sum_{i=1}^{m} \frac{|x_i - y_i|}{|x_i + y_i|}$

Chebychev: $D(x,y) = \max_{i=1}^{m} |x_i - y_i|$

Quadratic: $D(x,y) = (x - y)^T Q (x - y) = \sum_{j=1}^{m} \left( \sum_{i=1}^{m} (x_i - y_i)\, q_{ji} \right) (x_j - y_j)$, where $Q$ is a problem-specific positive definite $m \times m$ weight matrix.

Mahalanobis: $D(x,y) = \left[ \det V \right]^{1/m} (x - y)^T V^{-1} (x - y)$, where $V$ is the covariance matrix of $A_1, \ldots, A_m$, and $A_j$ is the vector of values for attribute $j$ occurring in the training set instances $1, \ldots, n$.

Correlation: $D(x,y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x}_i)(y_i - \bar{y}_i)}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x}_i)^2 \sum_{i=1}^{m} (y_i - \bar{y}_i)^2}}$, where $\bar{x}_i = \bar{y}_i$ is the average value for attribute $i$ occurring in the training set.

Chi-square: $D(x,y) = \sum_{i=1}^{m} \frac{1}{sum_i} \left( \frac{x_i}{size_x} - \frac{y_i}{size_y} \right)^2$, where $sum_i$ is the sum of all values for attribute $i$ occurring in the training set, and $size_x$ is the sum of all values in the vector $x$.

Kendall's Rank Correlation: $D(x,y) = 1 - \frac{2}{m(m-1)} \sum_{i=1}^{m} \sum_{j=1}^{i-1} \mathrm{sign}(x_i - x_j) \cdot \mathrm{sign}(y_i - y_j)$, where $\mathrm{sign}(x) = -1$ if $x < 0$, $0$ if $x = 0$, and $1$ if $x > 0$.
18
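Several of the simpler functions above can be written out directly. This is an illustrative sketch assuming two numeric vectors of equal length m (Camberra is undefined when some $x_i + y_i = 0$):

```python
import math

def minkowsky(x, y, r):
    # Generalises Manhattan (r = 1) and Euclidean (r = 2).
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1 / r)

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def camberra(x, y):
    # Assumes no attribute pair sums to zero.
    return sum(abs(a - b) / abs(a + b) for a, b in zip(x, y))
```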
Distance between Two Instances with Heterogeneous Attributes (e.g. Profiles)

$D(X,Y) = \sqrt{\sum_{i,\; x_i \in X,\; y_i \in Y} d(x_i, y_i)^2}$

where:

$d(x_i, y_i) = \begin{cases} 0, & \text{if the } i\text{-th attribute is nominal and } x_i = y_i \\ 1, & \text{if the } i\text{-th attribute is nominal and } x_i \neq y_i \\ \dfrac{|x_i - y_i|}{range_i}, & \text{otherwise} \end{cases}$

Examples: d("red", "yellow") = 1; d(15°, 25°) = 10° / ((+50°) − (−50°)) = 0.1
19
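A sketch of this heterogeneous distance in Python; `ranges` is an assumed convention listing range_i for each numeric attribute and marking nominal attributes with None:

```python
import math

def d_attr(xi, yi, rng):
    """Per-attribute distance: overlap metric for nominal, range-normalised
    absolute difference for numeric attributes."""
    if rng is None:                    # nominal attribute
        return 0.0 if xi == yi else 1.0
    return abs(xi - yi) / rng          # numeric attribute

def hetero_distance(X, Y, ranges):
    """Euclidean combination of the per-attribute distances."""
    return math.sqrt(sum(d_attr(xi, yi, r) ** 2
                         for xi, yi, r in zip(X, Y, ranges)))
```

With the slide's examples: d("red", "yellow") = 1 and d(15, 25) over a range of 100 degrees = 0.1.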
Distance Evaluator: to measure the distance between instances based on their numerical or nominal attribute values.
(Diagram: ⟨x_i1, x_i2, …, x_im⟩ and ⟨x_j1, x_j2, …, x_jm⟩ → Distance Evaluator → d_ij)
20
Classification Team: Classification Processor
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the elective team member highlighted here is CL, the Classification Processor.
21
Classification Processor: to predict the class for a new instance based on its selected features and its location relative to the sample instances.
(Diagram: the new instance ⟨x_i1, x_i2, …, x_im⟩ and the sample instances pass through the Feature Selector and the Distance Evaluator to the Classification Processor, which outputs the class y_i)
22
Team Instructors: Team Collector
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the Team Instructor highlighted here is TC, the Team Collector, which completes Classification Teams for training.
23
Team Collector - completes classification teams for future training
(Diagram: the Team Collector draws on pools of feature selection methods, distance evaluation functions, and classification rules to assemble each team ⟨FS_i, DE_j, CL_k⟩)
24
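The Team Collector's job can be sketched as a cross-product over the three method pools; the method names below are placeholders, not a fixed API:

```python
from itertools import product

def collect_teams(fs_methods, de_functions, cl_rules):
    """Assemble every (FS, DE, CL) combination into a candidate team."""
    return list(product(fs_methods, de_functions, cl_rules))

# Hypothetical pools: 2 feature selectors x 2 distance functions x 1 rule.
teams = collect_teams(["SFS", "Relief"], ["euclidean", "manhattan"], ["kNN"])
```

Every assembled team is then handed to the Training Manager for evaluation on the sample instances.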
Team Instructors: Training Manager
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the Team Instructor highlighted here is TM, the Training Manager, which trains all completed teams on sample instances.
25
Training Manager - trains all completed teams on sample instances
(Diagram: the Training Manager runs every classification team ⟨FS_i1, DE_j1, CL_k1⟩, ⟨FS_i2, DE_j2, CL_k2⟩, …, ⟨FS_in, DE_jn, CL_kn⟩ on the sample instances ⟨x_r1, x_r2, …, x_rm⟩ → y_r, producing the sample metadata ⟨x_r1, x_r2, …, x_rm⟩ → ⟨w_r1, w_r2, …, w_rn⟩)
26
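A minimal sketch of the Training Manager's bookkeeping, under the simplifying assumption that each trained team is modelled as a plain predict-function (a real team would combine its FS, DE, and CL components):

```python
def train_teams(samples, teams):
    """Run every team on every sample instance and record per-instance
    correctness as the team weights of the sample metadata
    <x_r, (w_r1, ..., w_rn)>."""
    metadata = []
    for x, y in samples:
        weights = [1.0 if team(x) == y else 0.0 for team in teams]
        metadata.append((x, weights))
    return metadata
```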
Team Instructors: Team Predictor
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the Team Instructor highlighted here is TP, the Team Predictor, which predicts weights for every classification team in a certain location.
27
Team Predictor - predicts weights for every classification team in a certain location
(Diagram: given a location ⟨x_i1, x_i2, …, x_im⟩ and the sample metadata ⟨x_r1, x_r2, …, x_rm⟩ → ⟨w_r1, w_r2, …, w_rn⟩, the Team Predictor - e.g. a WNN algorithm - outputs the predicted weights of the classification teams ⟨w_i1, w_i2, …, w_in⟩)
28
Team Predictor - predicts weights for every classification team in a certain location
(Diagram: in the sample metadata, the nearest neighbours NN_1 ⟨w_11, w_12, …, w_1n⟩, NN_2 ⟨w_21, w_22, …, w_2n⟩ and NN_3 ⟨w_31, w_32, …, w_3n⟩ of the point P_i lie at distances d_1, d_2, d_3 within d_max; NN_4 lies beyond d_max and is ignored)
w_ij = F(w_1j, d_1, w_2j, d_2, w_3j, d_3, d_max)
29
Weighting Neighbors Example
(Diagram: sample points in the (X_1, X_2) plane carry weight vectors such as (0;0;0), (1;0;0), (0;1;0), (0;0;1) and (1;1;0); the nearest neighbours NN_1, NN_2, NN_3 of the point P_i lie at distances d_1, d_2, d_3 within d_max)
The values of the distance measure are used to derive the weight w_k for each of the selected neighbours k = 1, …, l using, for example, a cubic function:
$w_k = \left( 1 - (d_k / d_{max})^3 \right)^3$
30
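The cubic (tricube-style) weighting above can be sketched directly, together with one plausible choice for the combining function F of the preceding slide - a distance-weighted average; the slide itself leaves F unspecified, so this is an assumption:

```python
def neighbour_weights(distances, d_max):
    """w_k = (1 - (d_k / d_max)^3)^3: near 1 for close neighbours,
    falling smoothly to 0 at distance d_max."""
    return [(1 - (d / d_max) ** 3) ** 3 for d in distances]

def predicted_team_weight(neighbour_team_weights, distances, d_max):
    """One candidate F: distance-weighted average of a team's weights
    recorded at the selected nearest neighbours."""
    w = neighbour_weights(distances, d_max)
    return sum(wi * ti for wi, ti in zip(w, neighbour_team_weights)) / sum(w)
```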
Team Prediction: Locality Assumption
Each team has certain subdomains in the space of instance attributes where it is more reliable than the others.
This assumption is supported by experience: classifiers usually work well not only at certain points of the domain space, but in certain subareas of it [Quinlan, 1993].
If a team does not work well with the instances near a new instance, then it is quite probable that it will not work well with this new instance either.
31
Team Instructors: Team Integrator
⟨ {TC, TM, TP, TI}, {FS, DE, CL} ⟩ - the Team Instructor highlighted here is TI, the Team Integrator, which produces the classification result for a new instance by integrating appropriate outcomes of the learned teams.
32
Team Integrator - produces the classification result for a new instance by integrating appropriate outcomes of learned teams
(Diagram: the classification teams ⟨FS_i1, DE_j1, CL_k1⟩, ⟨FS_i2, DE_j2, CL_k2⟩, …, ⟨FS_in, DE_jn, CL_kn⟩ classify the new instance ⟨x_t1, x_t2, …, x_tm⟩ as y_t1, y_t2, …, y_tn; the Team Integrator combines these outcomes, using the teams' weights ⟨w_t1, w_t2, …, w_tn⟩ in the location of the new instance, into the final class y_t)
33
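A weighted vote is one natural way to realise the Team Integrator; this sketch assumes the integrator simply sums the team weights per predicted class and returns the class with the largest total (ties broken arbitrarily):

```python
def integrate(predictions, weights):
    """Weighted vote over the teams' class predictions y_t1..y_tn,
    using the predicted team weights w_t1..w_tn at the new instance."""
    votes = {}
    for y, w in zip(predictions, weights):
        votes[y] = votes.get(y, 0.0) + w
    return max(votes, key=votes.get)
```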
Dynamic Selection of the Team: Penalty Kick Example
(Diagram: of the candidate players x_1, x_2, x_3, x_4, the team selected for the kick is the one best suited to this particular situation)
34
Simple Case: static or dynamic selection of a classification team from two
Assume that we have two different classification teams and that they have been learned on the same sample set with n instances.
Let the first team classify correctly m1 sample instances and the second one m2.
We consider two possible cases for selecting the best team for further classification: a static selection case and a dynamic selection case.
35
Static Selection
Static selection means that we try all teams on a sample set and, for further classification, select the one that achieved the best classification accuracy over the whole sample set. Thus we select a team only once and then use it to classify all new domain instances.
36
Dynamic Selection
Dynamic selection means that a team is selected for every new instance separately, depending on where this instance is located. If it has been predicted that a certain team can classify this new instance better than the other teams, then this team is used to classify it. In such a case we say that the new instance belongs to the "competence area" of that classification team.
37
Theorem
The average classification accuracy in the case of (dynamic) selection of a classification team for every instance is expected to be not worse than in the case of (static) selection for the whole domain.
The accuracy of the two cases is equal if and only if:
min(m1, m2) = k,
where k is the number of instances correctly classified by both teams.
38
"Competence Areas" of Classification Teams in Dynamic Selection
(Diagram: of the n sample instances, the first team classifies m1 instances correctly and the second team m2; k instances are correctly classified by both)
39
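The theorem can be checked numerically by representing each team by the set of sample instances it classifies correctly: static selection achieves max(m1, m2)/n, while an ideal dynamic selection achieves |C1 ∪ C2|/n, which is never smaller, and the two coincide exactly when min(m1, m2) = k = |C1 ∩ C2|:

```python
def accuracies(c1, c2, n):
    """Static vs. ideal dynamic selection accuracy for correct-sets c1, c2
    over n sample instances."""
    static = max(len(c1), len(c2)) / n
    dynamic = len(c1 | c2) / n
    return static, dynamic

# m1 = 4, m2 = 3, k = |{2, 3}| = 2: min(m1, m2) != k, so dynamic wins.
c1, c2 = {0, 1, 2, 3}, {2, 3, 4}
static, dynamic = accuracies(c1, c2, 6)
```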
M-Commerce LBS System
http://www.cs.jyu.fi/~mmm
In the framework of the Multi Meet Mobile (MMM) project at the University of Jyväskylä, an LBS pilot system, the MMM Location-based Service system (MLS), has been developed. MLS is a general LBS system for mobile users, offering a map and navigation across multiple geographically distributed services, together with access to location-based information through the map on the terminal's screen. MLS is based on Java and XML and uses dynamic selection of services for customers based on their profile and location.
40
Architecture of LBS System
(Diagram: a Personal Trusted Device connects over the mobile network to the Location-Based Service, which draws on a Positioning Service, geographical/spatial data, and location-based data: (1) a services database with access history, and (2) a customers database with profiles)
41
Positioning Systems
(Diagram: satellite-based positioning - a GPS receiver at antenna position (Ux, Uy, Uz) measures ranges R_i to satellites at positions (Sx_i, Sy_i, Sz_i); cellular-network-based positioning - base stations locate the terminal within a measured error margin)
42
Opening a Connection to Location Service
(Screenshots: from the terminal's Internet menu - Mail, WWW, MMM Map Service - the user selects the MMM Map Service and logs in with a login and password)
43
Request and Receive Map from the Location Service
(Screenshots: the user requests a map from the MMM Map Service; when the map is delivered, it is displayed with Close, Zoom and Update controls)
44
Selecting Point of Interest on the Map
(Screenshot: the user selects a point of interest on the displayed map)
45
Receiving Information Content Related to Point of Interest
(Screenshots: the MMM Map Service delivers the content for the selected point of interest - e.g. Hotel Alba ***, Address: Mattillaniemi A1, Tel. GSM: 0504563872, rooms available: single 380 FIM, double 450 FIM - with options to return to the map or place a call)
46
Mobile Customer Description
(Diagram: a mobile customer is described by the vector ⟨x_i1, x_i2, …, x_im−1, x_im⟩, whose components are profile features, location features, and location-based services' access history; the class y_i is the ordered service)
47
Adaptive Interface for MLS Client
Only the predicted services for a customer with a known profile and location will be delivered from MLS and displayed on the mobile terminal screen as clickable "points of interest".
48
Route-Based Personalization [Katasonov, 2001]
(Diagram: a static perspective contrasted with a dynamic perspective)
49
Conclusion
- Knowledge discovery with an ensemble of classifiers is known to be more accurate than with any single classifier alone [e.g. Dietterich, 1997].
- If a classifier in effect consists of a certain feature selection algorithm, distance evaluation function, and classification rule, then why not consider these parts as ensembles as well, making the classifier itself more flexible?
- We expect that classification teams assembled from different feature selection, distance evaluation, and classification methods will be more accurate than any ensemble of known classifiers alone, and we focus our research and implementation on this assumption.
50