Memory-Based Reasoning

이재현
PASTA Lab.
POSTECH
PASTA IE POSTECH
1. Introduction
Memory-Based Reasoning (MBR) is

Identifying similar cases from experience

Applying the information from these cases to the problem at hand.

MBR finds the neighbors most similar to a new record and uses those neighbors for classification and prediction.
It relies on two operations:

Distance function: assigns a distance between any two records.

Combination function: combines the results from the neighbors to arrive at an answer.
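Taken together, the two operations amount to a small k-nearest-neighbors classifier. A minimal sketch (illustrative only; the function and field names here are ours, not from the text):

```python
from collections import Counter

def mbr_classify(new_record, training_set, distance, k=3):
    """Memory-Based Reasoning: find the k nearest neighbors of
    new_record in the training set, then combine their answers."""
    # Distance function: rank every historical record by distance.
    scored = sorted(training_set,
                    key=lambda rec: distance(new_record, rec["fields"]))
    # Combination function: majority vote among the k nearest.
    votes = Counter(rec["label"] for rec in scored[:k])
    return votes.most_common(1)[0][0]

# Toy training set: one numeric field, binary label.
history = [{"fields": 1.0, "label": "A"}, {"fields": 1.2, "label": "A"},
           {"fields": 5.0, "label": "B"}, {"fields": 5.5, "label": "B"}]
print(mbr_classify(1.1, history, lambda a, b: abs(a - b)))  # prints: A
```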
Applications of MBR span many areas:

Fraud detection

Customer response prediction

Medical treatments

Classifying responses
2. How does MBR work?
What is the most likely movie last seen by a respondent based on the source of the record
and the age of the individual?
MBR has two distinct phases:

The learning phase generates the historical database

The prediction phase applies MBR to new cases
2.1. The three main issues in solving a problem with MBR
Choosing the appropriate set of historical records

The historical records, also known as the training set, are a subset of the available records.

The training set needs to provide good coverage of the records so that the nearest
neighbors to an unknown record are useful for predictive purposes.
Representing the historical records

The performance of MBR in making predictions depends on how the training set is
represented in the computer.
Determining the distance function, combination function, and number of neighbors

The distance function, combination function, and number of neighbors are the key
ingredients in determining how good MBR is at producing results.
3. Case study: Classifying News Stories
What are the codes?

The news provider assigns codes to news stories to describe their content. These codes help users search for stories of interest.
Applying MBR

Choosing the training set
The training set consisted of 49,652 news stories

Choosing the distance function
In this case, a distance function already existed, based on a notion called relevance
feedback that measures the similarity of two documents based on the words they contain.
Relevance feedback function:
dclassification(A,B) = 1 - score(A,B) / score(A,A)
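As a rough illustration of this formula, score(A,B) might simply count the words two documents share; the study's actual relevance-feedback score weights terms more carefully, so treat this as a sketch under that simplifying assumption:

```python
def score(a, b):
    # Hypothetical similarity score: number of distinct words the
    # two documents share (the real function weights each term).
    return len(set(a.split()) & set(b.split()))

def d_classification(a, b):
    # 0 when B contains every word of A, 1 when they share nothing.
    return 1 - score(a, b) / score(a, a)

a = "stocks rally on earnings"
b = "stocks fall on earnings fears"
print(round(d_classification(a, b), 2))  # 0.25: shares 3 of A's 4 words
```

Note that d_classification(A,A) is always 0, so each story is at distance zero from itself, as the formula requires.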
Choosing the combination function
The combination function used a weighted summation technique.

Choosing the number of neighbors
The investigation varied the number of nearest neighbors between 1 and 11 inclusive.

The result

Recall and precision are two useful measurements of how well a set of codes is assigned.
Recall: "How many of the correct codes did MBR assign to the story?"
Precision: "How many of the codes assigned by MBR were correct?"
Codes by MBR     Correct codes  Recall  Precision
A,B              A,B,C,D        50%     100%
A,B,C,D,E,F,G,H  A,B,C,D        100%    50%
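The recall and precision arithmetic in the example rows can be checked with a few lines (a sketch; the function name is ours):

```python
def recall_precision(assigned, correct):
    assigned, correct = set(assigned), set(correct)
    hits = assigned & correct                # codes MBR got right
    recall = len(hits) / len(correct)        # share of correct codes found
    precision = len(hits) / len(assigned)    # share of assigned codes that are right
    return recall, precision

print(recall_precision({"A", "B"}, {"A", "B", "C", "D"}))       # (0.5, 1.0)
print(recall_precision(set("ABCDEFGH"), {"A", "B", "C", "D"}))  # (1.0, 0.5)
```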
Category       Recall  Precision
Government     85%     87%
Industry       91%     85%
Market Sector  93%     91%
Product        69%     89%
Region         86%     64%
Subject        72%     53%
4. Measuring Distance
Three most common distance functions:

Absolute value of the difference: |A-B|

Square of the difference: (A-B)^2

Normalized absolute value: |A-B| / (maximum difference)
Example
Recnum  Gender  Age  Salary
1       Female  27   $19,000
2       Male    51   $64,000
3       Male    52   $105,000
4       Female  33   $55,000
5       Male    45   $48,000
Gender
dgender(female,female) = 0, dgender(male,female) = 1
dgender(female,male) = 1, dgender(male,male) = 0


Normalized age distances, |A-B|/25:

Age     27     51     52     33     45
27    0.00   0.96   1.00   0.24   0.72
51    0.96   0.00   0.04   0.72   0.24
52    1.00   0.04   0.00   0.76   0.28
33    0.24   0.72   0.76   0.00   0.48
45    0.72   0.24   0.28   0.48   0.00
Merge the field distances into a single record distance function:
Summation: dsum(A,B) = dgender(A,B) + dage(A,B) + dsalary(A,B)
Normalized summation: dnorm(A,B) = dsum(A,B) / max(dsum)
Euclidean distance: deuclid(A,B) = sqrt(dgender(A,B)^2 + dage(A,B)^2 + dsalary(A,B)^2)
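These three record-level functions can be written directly against the example's fields. A sketch, assuming (as the tables suggest) age is scaled by its maximum difference of 25 years, salary by $86,000, and max(dsum) is the largest possible value, 3:

```python
from math import sqrt

# Per-field distances for the example data set.
def d_gender(a, b): return 0 if a == b else 1
def d_age(a, b):    return abs(a - b) / 25       # max age difference: 25
def d_salary(a, b): return abs(a - b) / 86_000   # max salary difference: $86,000

def d_sum(a, b):      # records are (gender, age, salary) tuples
    return d_gender(a[0], b[0]) + d_age(a[1], b[1]) + d_salary(a[2], b[2])

def d_norm(a, b):
    return d_sum(a, b) / 3     # three fields, each contributing at most 1

def d_euclid(a, b):
    return sqrt(d_gender(a[0], b[0])**2 + d_age(a[1], b[1])**2
                + d_salary(a[2], b[2])**2)

r1 = ("Female", 27, 19_000)
r2 = ("Male", 51, 64_000)
print(round(d_sum(r1, r2), 3))   # 2.483 = gender 1 + age 0.96 + salary 0.523
```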

Set of nearest neighbors for the three distance functions:

Recnum  dsum       dnorm      deuclid
1       1,4,5,2,3  1,4,5,2,3  1,4,5,2,3
2       2,5,3,4,1  2,5,3,4,1  2,5,3,4,1
3       3,2,5,4,1  3,2,5,4,1  3,2,5,4,1
4       4,1,5,2,3  4,1,5,2,3  4,1,5,2,3
5       5,2,3,4,1  5,2,3,4,1  5,2,3,4,1
Insert a new customer:

Gender: Female, Age: 45, Salary: $100,000

Set of nearest neighbors for the new customer (distances to records 1-5):

         1      2      3      4      5      Neighbors
dsum     1.662  1.659  1.338  1.003  1.640  4,3,5,2,1
dnorm    0.554  0.553  0.446  0.334  0.547  4,3,5,2,1
deuclid  0.781  1.052  1.251  0.494  1.000  4,1,5,2,3
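Ranking the historical records by dsum reproduces the 4,3,5,2,1 neighbor ordering. A sketch, assuming the same scaling as before (our computed distance for record 5 differs from the slide's in the second decimal, but the ordering is unchanged):

```python
records = {1: ("Female", 27, 19_000), 2: ("Male", 51, 64_000),
           3: ("Male", 52, 105_000), 4: ("Female", 33, 55_000),
           5: ("Male", 45, 48_000)}
new = ("Female", 45, 100_000)

def d_sum(a, b):
    return ((0 if a[0] == b[0] else 1)         # gender: match or not
            + abs(a[1] - b[1]) / 25            # age, max difference 25
            + abs(a[2] - b[2]) / 86_000)       # salary, max difference $86,000

neighbors = sorted(records, key=lambda r: d_sum(new, records[r]))
print(neighbors)   # [4, 3, 5, 2, 1]
```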
5. The combination function: Asking the neighbors for the answer
The basic approach: Democracy

The basic combination function used for MBR is to have the K nearest neighbors vote on the answer: "democracy" in data mining.

Customers with Attrition History
Recnum  Gender  Age  Salary    Attriter
1       Female  27   $19,000   No
2       Male    51   $64,000   Yes
3       Male    52   $105,000  Yes
4       Female  33   $55,000   Yes
5       Male    45   $48,000   No
new     Female  45   $100,000  ?
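A sketch of the democratic vote over this table (function and variable names are ours; ties, which the results mark "?", are not handled specially here):

```python
from collections import Counter

# Attrition labels of the five historical customers.
attrition = {1: "No", 2: "Yes", 3: "Yes", 4: "Yes", 5: "No"}

def vote(neighbors, k):
    """Majority vote among the k nearest neighbors, with confidence."""
    tally = Counter(attrition[n] for n in neighbors[:k])
    answer, count = tally.most_common(1)[0]
    return answer, count / k

# Neighbor ordering under dsum for the new customer.
print(vote([4, 3, 5, 2, 1], 3))   # prediction 'Yes' with confidence 2/3
```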


Using MBR to determine whether the new customer will attrite:

         Neighbors  Attrition  K=1  K=2  K=3  K=4  K=5
dsum     4,3,5,2,1  Y,Y,N,Y,N  yes  yes  yes  yes  yes
deuclid  4,1,5,2,3  Y,N,N,Y,Y  yes  ?    no   ?    yes
Attrition prediction with confidence:

         K=1        K=2        K=3       K=4       K=5
dsum     Yes, 100%  Yes, 100%  Yes, 67%  Yes, 75%  Yes, 60%
deuclid  Yes, 100%  Yes, 50%   No, 67%   Yes, 50%  Yes, 60%
Weighted voting

Weighted voting is similar to voting, except that the neighbors are not all created equal.

Closer neighbors have stronger votes than neighbors farther away.

The size of the vote is inversely proportional to the distance from the new record.

To prevent problems when the distance might be 0, it is common to add 1 to the distance before taking the inverse.
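A sketch of weighted voting with weights 1/(1+d), using the dnorm distances from the earlier table (names are ours):

```python
attrition = {1: "No", 2: "Yes", 3: "Yes", 4: "Yes", 5: "No"}
# dnorm distances of each record from the new customer (table values).
dist = {1: 0.554, 2: 0.553, 3: 0.446, 4: 0.334, 5: 0.547}
order = [4, 3, 5, 2, 1]                 # nearest first under dnorm

def weighted_vote(k):
    """Tally 1/(1+d) votes for each answer among the k nearest."""
    tally = {"Yes": 0.0, "No": 0.0}
    for n in order[:k]:
        # Adding 1 to the distance avoids dividing by zero when d = 0.
        tally[attrition[n]] += 1 / (1 + dist[n])
    return tally

t = weighted_vote(3)
print(round(t["Yes"], 3), "to", round(t["No"], 3))  # 1.441 to 0.646
```

The yes tally matches the table's 1.441; the no tally comes out 0.646 rather than 0.647 only because the table's distances are rounded to three digits.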

Attrition prediction with weighted voting (yes votes to no votes):

         K=1         K=2             K=3             K=4             K=5
dnorm    0.749 to 0  1.441 to 0      1.441 to 0.647  2.085 to 0.647  2.085 to 1.290
deuclid  0.669 to 0  0.669 to 0.562  0.669 to 1.062  1.157 to 1.062  1.601 to 1.062

Confidence with weighted voting:

         K=1        K=2        K=3       K=4       K=5
dnorm    Yes, 100%  Yes, 100%  Yes, 69%  Yes, 76%  Yes, 62%
deuclid  Yes, 100%  Yes, 54%   No, 61%   Yes, 52%  Yes, 60%
6. Conclusion
Strengths of Memory-Based Reasoning

It produces results that are readily understandable.

It is applicable to arbitrary data types, even non-relational data.

It works efficiently on almost any number of fields.

Maintaining the training set requires a minimal amount of effort.
Weaknesses of Memory-Based Reasoning

It is computationally expensive when doing classification and prediction.

It requires a large amount of storage for the training set.

Results can depend on the choice of distance function, combination function, and number of neighbors.