
Data Mining – Algorithms:
Instance-Based Learning
Chapter 4, Section 4.7
Instance Based Representation
• Concept not really represented (except via
examples)
• Training examples are merely stored (kind of
like “rote learning”)
• Answers are given by finding the training example(s) most similar to the test instance at testing time
• Has been called “lazy learning” – no work until
an answer is needed
Instance Based – Finding Most
Similar Example
• Nearest Neighbor – each new instance is compared
to all other instances, with a “distance” calculated for
each attribute for each instance
• Class of nearest neighbor instance is used as the
prediction <see next slide and come back>
• Combining the per-attribute distances – city block or Euclidean ("as the crow flies")
– Higher powers increase the influence of large differences
Nearest Neighbor
[Slide figure: a scatter plot of training instances from three classes (x, y, and z) and a test instance T; the prediction for T is the class of its nearest neighbor(s).]
Example Distance Metrics

Attributes      A    B    C    Sum
Test            5    5    5
Train 1         6    4    9
Train 2         7    3    7
Train 3         5    5   10
City Block 1    1    1    4     6
City Block 2    2    2    2     6
City Block 3    0    0    5     5
Euclidean 1     1    1   16    18
Euclidean 2     4    4    4    12
Euclidean 3     0    0   25    25
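As a quick check on the table, here is a minimal Python sketch that reproduces it from the Test and Train vectors shown above; note that the "Euclidean" rows are sums of squared per-attribute differences, with no square root taken.

# Sketch: reproduce the distance table above from the Test and Train vectors.
test = [5, 5, 5]
trains = {"Train 1": [6, 4, 9], "Train 2": [7, 3, 7], "Train 3": [5, 5, 10]}

def city_block(a, b):
    # Sum of absolute per-attribute differences (Manhattan / "city block")
    return sum(abs(x - y) for x, y in zip(a, b))

def squared_euclidean(a, b):
    # Sum of squared per-attribute differences; the square root is omitted,
    # matching the "Euclidean" rows of the table
    return sum((x - y) ** 2 for x, y in zip(a, b))

for name, train in trains.items():
    print(name, city_block(test, train), squared_euclidean(test, train))
# Train 1 -> 6 and 18, Train 2 -> 6 and 12, Train 3 -> 5 and 25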
More Similarity/Distance
• Normalization is necessary – as discussed in Chapter 2
• Nominal attributes are frequently treated as all or nothing – a complete match or no match at all
– Match → similarity = highest possible value, or distance = 0
– No match → similarity = 0, or distance = highest possible value
• Nominals that are actually ordered ought to be treated differently (e.g. partial matches)
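A minimal sketch of a per-attribute distance that treats nominals as all-or-nothing and normalizes numerics to a 0–1 range; the range arguments and the string test for "nominal" are assumptions for illustration.

def attribute_distance(a, b, lo=None, hi=None):
    # Nominal attributes: all or nothing -- distance 0 on a match,
    # maximum distance (1) otherwise.
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    # Numeric attributes: normalize the difference by the attribute's
    # range so every attribute contributes on a 0-1 scale.
    if lo is None or hi is None or hi == lo:
        return abs(a - b)
    return abs(a - b) / (hi - lo)

# A nominal mismatch counts as the full distance of 1, while a numeric
# difference of 2 on a 0-10 attribute counts as 0.2.
print(attribute_distance("red", "blue"))       # 1.0
print(attribute_distance(7, 5, lo=0, hi=10))   # 0.2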
Missing Values
• Frequently treated as maximum distance to ANY other
value
• For numerics, the maximum distance depends on what value the missing value is being compared to
– E.g. if values range from 0-1 and comparing a missing value
to .9, maximal possible distance is .9
– If comparing a missing value to .3, maximal possible
distance is .7
– If comparing missing value to .5, maximal possible distance
is .5
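A sketch of the "maximum possible distance" rule for missing values described above, assuming numeric attributes already normalized to the 0–1 range (None stands in for a missing value).

def distance_with_missing(a, b):
    # If both values are missing, assume the worst case: maximum distance.
    if a is None and b is None:
        return 1.0
    # If one value is missing, the distance is the largest it could
    # possibly be given the known value: comparing to .9 -> .9,
    # comparing to .3 -> .7, comparing to .5 -> .5.
    if a is None:
        return max(b, 1.0 - b)
    if b is None:
        return max(a, 1.0 - a)
    return abs(a - b)

print(distance_with_missing(None, 0.9))  # 0.9
print(distance_with_missing(None, 0.3))  # 0.7
print(distance_with_missing(None, 0.5))  # 0.5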
Dealing with Noise
• Noise is anything that makes a task harder (e.g. real noise makes listening/hearing harder; noise in data transmission makes communication more difficult; noise in learning means incorrect values for attributes, including the class, or an unrepresentative instance)
• In instance-based learning, one approach to dealing with noise is to use a greater number of neighbors, so the prediction is not led astray by a single incorrect or unusual example
K-nearest neighbor
• Can combine “opinions” by having the K
nearest neighbors vote for the prediction to
make
• Or, more sophisticated weighted k-vote
– An instance’s vote is weighted by how close it is to the test instance – the closest neighbor is weighted more heavily than farther ones
– WEKA allows you to choose weight (distance
weighting) as 1 – dist or 1 / dist
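A sketch of a k-nearest-neighbor vote with the two weighting options mentioned (1 – dist and 1 / dist); the data layout and helper names are assumptions, and 1 – dist assumes distances already scaled to 0–1.

from collections import defaultdict

def knn_vote(test, training, k=3, weighting=None):
    # training: list of (attribute_vector, class_label) pairs
    # weighting: None (plain vote), "1-d" (1 - dist), or "1/d" (1 / dist)
    dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    neighbors = sorted(training, key=lambda inst: dist(test, inst[0]))[:k]
    votes = defaultdict(float)
    for attrs, label in neighbors:
        d = dist(test, attrs)
        if weighting == "1-d":
            votes[label] += 1.0 - d              # closer neighbors count somewhat more
        elif weighting == "1/d":
            votes[label] += 1.0 / max(d, 1e-9)   # very close neighbors dominate
        else:
            votes[label] += 1.0                  # unweighted: one vote each
    return max(votes, key=votes.get)

training = [([0.1, 0.2], "x"), ([0.15, 0.25], "x"), ([0.9, 0.8], "y"), ([0.2, 0.1], "y")]
print(knn_vote([0.12, 0.22], training, k=3, weighting="1/d"))   # -> "x"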
Effect of Distance Weighting Scheme

Dist            .1   .2   .3   .4   .5   .6   .7   .8   .9
Vote 1 – dist   .9   .8   .7   .6   .5   .4   .3   .2   .1
Vote 1 / dist   10   5    3.3  2.5  2    1.7  1.4  1.2  1.1

• 1 – dist is smoother
• 1 / dist gives a lot more credit to instances that are very close
Let’s try WEKA
• Experiment with K and weighting on the basketball dataset (discretized)
K-nearest, Numeric Prediction
• Average prediction of k-nearest
• OR weighted average of k-nearest based on
distance
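A sketch of the two numeric-prediction options above – a plain average of the k nearest targets, or a 1 / dist weighted average; the names and data layout are assumptions.

def knn_predict_numeric(test, training, k=3, weighted=False):
    # training: list of (attribute_vector, numeric_target) pairs
    dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    neighbors = sorted(training, key=lambda inst: dist(test, inst[0]))[:k]
    if not weighted:
        # Plain average of the k nearest targets
        return sum(target for _, target in neighbors) / len(neighbors)
    # Weighted average: nearer neighbors contribute more (1 / dist weights)
    weights = [1.0 / max(dist(test, attrs), 1e-9) for attrs, _ in neighbors]
    total = sum(w * target for w, (_, target) in zip(weights, neighbors))
    return total / sum(weights)

training = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0), ([10.0], 100.0)]
print(knn_predict_numeric([1.5], training, k=3))                # plain average: 20.0
print(knn_predict_numeric([1.5], training, k=3, weighted=True)) # pulled toward 10 and 20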
Weighted Similarity / Distance
• Distance/Similarity function should weight
different attributes differently – key task is
determining those weights
• The next slide sketches a general wrapper approach (see Chapter 6, pp. 195-196)
Learning weights
• Divide training data into training and validation (a
sort of pre-test) data
• Until time to stop
– Loop through validation data
• Predict, and see whether the prediction is correct or not
• Compare validation instance to training instances used to
predict
• Attributes that lead to correct prediction have weights increased
• Attributes that lead to incorrect prediction have weights
decreased
• Re-normalize weights to avoid chance of overflow
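A minimal sketch of the wrapper loop just described; the update size, the "attribute contributed" test, and the stopping rule are assumptions, not the book's exact procedure.

def learn_attribute_weights(train, validation, n_attrs, epochs=10, step=0.05):
    # Start with equal weights for every attribute.
    weights = [1.0 / n_attrs] * n_attrs

    def weighted_dist(a, b):
        return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

    for _ in range(epochs):                      # "until time to stop"
        for v_attrs, v_class in validation:      # loop through validation data
            # Predict with the single nearest training instance.
            n_vec, n_class = min(train, key=lambda t: weighted_dist(v_attrs, t[0]))
            correct = (n_class == v_class)
            for i in range(n_attrs):
                # Treat an attribute as having "led to" the prediction when the
                # validation and neighbor values are close on it (assumed threshold).
                helped = abs(v_attrs[i] - n_vec[i]) < 0.1
                if correct and helped:
                    weights[i] += step                       # reward: helped a correct prediction
                elif not correct and helped:
                    weights[i] = max(weights[i] - step, 0.0) # penalize: helped a wrong prediction
            # Re-normalize so the weights stay bounded.
            total = sum(weights) or 1.0
            weights = [w / total for w in weights]
    return weights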
Learning re: Instances
• May not need to save all instances
– Very normal instances may not all need to be saved
– One strategy – classify during training, and only keep instances that are misclassified
• Problem – will accumulate noisy or idiosyncratic examples
– More sophisticated – keep records of how often examples lead to correct and incorrect predictions, and discard those with poor performance (details of Aha’s method, pp. 194-195)
– An in-between strategy – weight instances based on their previous success or failure (an approach I’m experimenting with)
– Some approaches actually do some generalization
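A sketch of the "only keep misclassified instances" strategy from the second bullet above (roughly the idea behind Aha's IB2); the names and distance function are assumptions.

def build_instance_store(stream):
    # stream: iterable of (attribute_vector, class_label) training instances
    dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    kept = []
    for attrs, label in stream:
        if kept:
            # Classify with the instances kept so far before deciding to store.
            _, predicted = min(kept, key=lambda inst: dist(attrs, inst[0]))
            if predicted == label:
                continue              # correctly classified: no need to store it
        kept.append((attrs, label))   # misclassified (or store empty): keep it
    return kept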
Class Exercise
• Let’s run WEKA IBk on japanbank
• K=3
End Section 4.7