Transcript Document

Bayesian Networks
October 9, 2008
Sung-Bae Cho
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
Probabilities
• Probability distribution P(X | ξ)
– X is a random variable
– ξ is the background state of information
• Discrete random variable: finite set of possible outcomes
X ∈ {x1, x2, x3, ..., xn},   P(xi) ≥ 0,   Σ_{i=1}^{n} P(xi) = 1
X binary: P(x) + P(¬x) = 1
[Figure: bar chart of an example discrete distribution over outcomes X1–X4]
• Continuous random variable: probability distribution (density function) over continuous values
X ∈ [0, 10],   P(x) ≥ 0,   ∫_0^10 P(x) dx = 1
P(5 ≤ x ≤ 7) = ∫_5^7 P(x) dx
[Figure: density P(x), with the area between x = 5 and x = 7 shaded]
Rules of Probability
• Product Rule
P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
• Marginalization
P(Y) = Σ_{i=1}^{n} P(Y, xi)
X binary: P(Y) = P(Y, x) + P(Y, ¬x)
• Bayes Rule
P(H, E) = P(H | E) P(E) = P(E | H) P(H)
P(H | E) = P(E | H) P(H) / P(E)
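As a concrete illustration of these three rules, the short Python sketch below (toy numbers for two binary variables, not values from the lecture) applies the product rule, marginalization, and Bayes rule in turn:

# Illustrative sketch (not from the slides): product rule, marginalization,
# and Bayes rule on a toy joint distribution over two binary variables.
P_Y = {True: 0.3, False: 0.7}                      # P(Y)
P_X_given_Y = {True: {True: 0.9, False: 0.1},      # P(X | Y=True)
               False: {True: 0.2, False: 0.8}}     # P(X | Y=False)

# Product rule: P(X, Y) = P(X | Y) P(Y)
P_XY = {(x, y): P_X_given_Y[y][x] * P_Y[y]
        for x in (True, False) for y in (True, False)}

# Marginalization: P(X) = sum_y P(X, y)
P_X = {x: sum(P_XY[(x, y)] for y in (True, False)) for x in (True, False)}

# Bayes rule: P(Y | X) = P(X | Y) P(Y) / P(X)
P_Y_given_X_true = {y: P_X_given_Y[y][True] * P_Y[y] / P_X[True]
                    for y in (True, False)}
print(P_X[True], P_Y_given_X_true[True])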
Rules of Probability
• Chain rule of probability
p( x)  p( x1 ,..., xn )
 p( x1 ) p( x2 | x1 ) p( x3 | x1 , x2 )...

n
 p( x | x ,..., x
i
1
i 1 )
i 1
4
The Joint Distribution
• Recipe for making a joint distribution of M
variables
– Make a truth table listing all combinations of
values of your variables
– For each combination of values, say how
probable it is
Using the Joint
• Once you have the joint distribution, you can ask for the probability of any
logical expression involving your attributes
P(E) = Σ_{rows matching E} P(row)
Using the Joint
P(Poor ∧ Male) = 0.4654
P(Poor) = 0.7604
P(E) = Σ_{rows matching E} P(row)
Inference with Joint
P(E1 | E2) = P(E1, E2) / P(E2) = [Σ_{rows matching E1 and E2} P(row)] / [Σ_{rows matching E2} P(row)]
Inference with the Joint
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
P(E1 | E2) = P(E1, E2) / P(E2) = [Σ_{rows matching E1 and E2} P(row)] / [Σ_{rows matching E2} P(row)]
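A minimal sketch of this style of query, assuming the joint is stored as a dictionary from value tuples to probabilities; the variables and numbers below are illustrative and not the census table summarized on the slide:

# Illustrative sketch: inference by enumerating a stored joint distribution.
joint = {
    # (gender, wealth): probability  (made-up example values)
    ("male", "poor"): 0.46, ("male", "rich"): 0.14,
    ("female", "poor"): 0.30, ("female", "rich"): 0.10,
}

def prob(match):
    """P(E) = sum of the rows whose values satisfy the predicate `match`."""
    return sum(p for row, p in joint.items() if match(row))

p_poor = prob(lambda r: r[1] == "poor")
p_male_and_poor = prob(lambda r: r[0] == "male" and r[1] == "poor")
print(p_male_and_poor / p_poor)   # P(Male | Poor)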
Joint Distributions
• Good news
– Once you have a joint
distribution, you can ask
important questions about stuff
that involves a lot of uncertainty
• Bad news
– Impossible to create for more
than about ten attributes
because there are so many
numbers needed when you build
them
Bayesian Networks
• In general
– P(X1, …, Xn) needs at least 2^n − 1 numbers to specify the joint probability
– Exponential storage and inference
• Overcome the problem of exponential size by exploiting conditional independence
[Diagram: the real joint probability distribution (2^n − 1 numbers), combined with conditional independence assertions (from domain knowledge or derived from data), yields a graphical representation of the joint probability distribution (a Bayesian network)]
Bayesian Networks?
BN = (G, Θ)
• Nodes: Visit to Asia (A), Smoking (S), Tuberculosis (T), Lung Cancer (L), Bronchitis (B), Chest X-ray (C), Dyspnoea (D)
• Conditional probability distributions (CPDs): P(A), P(S), P(T|A), P(L|S), P(B|S), P(C|T,L), P(D|T,L,B)
• P(A, S, T, L, B, C, D) = P(A) P(S) P(T|A) P(L|S) P(B|S) P(C|T,L) P(D|T,L,B)
• Example CPD, P(D | T, L, B):
  T  L  B | D=0  D=1
  0  0  0 | 0.1  0.9
  0  0  1 | 0.7  0.3
  0  1  0 | 0.8  0.2
  0  1  1 | 0.9  0.1
  ...
• Conditional independencies enable an efficient representation
[Lauritzen & Spiegelhalter, 95]
Bayesian Networks?
• Structured, graphical representation of probabilistic relationships between
several random variables
• Explicit representation of conditional independencies
• Missing arcs encode conditional independence
• Efficient representation of the joint probability distribution function (pdf)
• Allows arbitrary queries to be answered, e.g.
  P(lung cancer = yes | smoking = no, dyspnoea = yes) = ?
Bayesian Networks?
• Also called belief networks, and (directed acyclic) graphical models
• Bayesian network
– Directed acyclic graph
• Nodes are variables (discrete or continuous)
• Arcs indicate dependence between variables
– Conditional Probabilities (local distributions)
Bayesian Networks
Smoking: S ∈ {no, light, heavy}
Cancer: C ∈ {none, benign, malignant}

Smoking (prior):
  P(S=no) = 0.80,  P(S=light) = 0.15,  P(S=heavy) = 0.05

Cancer given Smoking, P(C | S):
  Smoking=   P(C=none)  P(C=benign)  P(C=malig)
  no           0.96        0.03         0.01
  light        0.88        0.08         0.04
  heavy        0.60        0.25         0.15
Product Rule
• P(C,S) = P(C|S) P(S)
S C
no
light
heavy
none
0.768
0.132
0.035
benign
0.024
0.012
0.010
malignant
0.008
0.006
0.005
S C none benign malig total
0.768
0.024 0.008
.80
no
0.132
0.012 0.006
.15
light
0.035
0.010 0.005
.05
heavy
total 0.935 0.046 0.019
P(Smoke)
P(Cancer)
16
Bayes Rule
P(S | C) = P(C | S) P(S) / P(C) = P(C, S) / P(C)

  S \ C     none          benign        malignant
  no        0.768/0.935   0.024/0.046   0.008/0.019
  light     0.132/0.935   0.012/0.046   0.006/0.019
  heavy     0.035/0.935   0.010/0.046   0.005/0.019

Posterior P(S | C):

  Cancer=      none    benign   malignant
  P(S=no)      0.821   0.522    0.421
  P(S=light)   0.141   0.261    0.316
  P(S=heavy)   0.037   0.217    0.263
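To make the arithmetic concrete, the following sketch starts from the joint table above (as printed on the product-rule slide) and recovers P(Cancer) by marginalization and P(S | C) by Bayes rule:

# Sketch: marginalization and Bayes rule applied to the joint table above.
joint = {  # P(C, S), values taken from the product-rule slide
    ("no", "none"): 0.768, ("no", "benign"): 0.024, ("no", "malignant"): 0.008,
    ("light", "none"): 0.132, ("light", "benign"): 0.012, ("light", "malignant"): 0.006,
    ("heavy", "none"): 0.035, ("heavy", "benign"): 0.010, ("heavy", "malignant"): 0.005,
}
smoking = ("no", "light", "heavy")
cancer = ("none", "benign", "malignant")

# Marginal: P(C) = sum_S P(C, S)
p_cancer = {c: sum(joint[(s, c)] for s in smoking) for c in cancer}

# Bayes rule: P(S | C) = P(C, S) / P(C)
p_s_given_c = {(s, c): joint[(s, c)] / p_cancer[c] for s in smoking for c in cancer}
print(round(p_s_given_c[("heavy", "malignant")], 3))   # 0.263, matching the slide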
Missing Arcs Represent Conditional Independence
[Network: Battery → Engine Turns Over → Start]
• Start and Battery are independent, given Engine Turns Over:
  p(s | b, t) = p(s | t)
  p(b, t, s) = p(b) p(t | b) p(s | t)
• General product (chain) rule for Bayesian networks:
  p(x1, x2, ..., xn) = Π_{i=1}^{n} p(xi | parents(xi))
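A small sketch of this factorization for the Battery, Engine Turns Over, Start chain; the CPT numbers are invented for illustration, only the structure follows the slide:

# Illustrative sketch: joint probability from CPTs via the BN chain rule
# for the chain Battery -> Engine Turns Over -> Start (made-up numbers).
p_b = {True: 0.95, False: 0.05}                       # P(Battery)
p_t_given_b = {True: {True: 0.97, False: 0.03},       # P(TurnsOver | Battery)
               False: {True: 0.02, False: 0.98}}
p_s_given_t = {True: {True: 0.99, False: 0.01},       # P(Start | TurnsOver)
               False: {True: 0.0, False: 1.0}}

def joint(b, t, s):
    # p(b, t, s) = p(b) p(t | b) p(s | t)
    return p_b[b] * p_t_given_b[b][t] * p_s_given_t[t][s]

# Sanity check: the eight joint entries sum to 1.
print(sum(joint(b, t, s) for b in (True, False)
          for t in (True, False) for s in (True, False)))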
Bayesian Network
P(A, G, E, S, C, L, SC) = P(A) P(G) P(E | A) P(S | A, G) P(C | E, S) P(SC | C) P(L | C)

[Network: Age (A) and Gender (G) are root nodes; Exposure to Toxics (E) depends on Age; Smoking (S) depends on Age and Gender; Cancer (C) depends on Exposure to Toxics and Smoking; Serum Calcium (SC) and Lung Tumor (L) depend on Cancer]
Bayesian Network Knowledge Engineering
• Objective: Construct a model to perform a defined task
• Participants: Collaboration between domain expert and BN modeling expert
• Process: iterate until “done”
– Define task objective
– Construct model
– Evaluate model
The KE Process
• What are the variables?
• What are their values/states?
• What is the graph structure?
• What is the local model structure?
• What are the parameters (Probabilities)?
• What are the preferences (utilities)?
The Knowledge Acquisition Task
• Variables:
– collectively exhaustive, mutually exclusive values
– clarity test: value should be knowable in principle
• Structure
– if data available, can be learned
– constructed by hand (using “expert” knowledge)
– variable ordering matters: causal knowledge usually simplifies
• Probabilities
– can be learned from data
– second decimal usually does not matter; relative probabilities
– sensitivity analysis
What are the Variables?
• “Focus” or “query” variables
– Variables of interest
• “Evidence” or “Observation” variables
– What sources of evidence are available?
• “Context” variables
– Sensing conditions, background causal conditions
• Start with query variables and spread out to related variables
What are the Values/States?
• Variables/values must be exclusive and exhaustive
– Naive modelers sometimes create separate (often Boolean) variables for
different states of the same variable
• Types of variables
– Binary (2-valued, including Boolean)
– Qualitative
– Numeric discrete
– Numeric continuous
• Dealing with infinite and continuous domains
– Some BN software requires that continuous variables be discretized
– Discretization should be based on differences in effect on related
variables (i.e., not just be even sized chunks)
What is a variable?
• Collectively exhaustive, mutually exclusive values
  x1 ∨ x2 ∨ x3 ∨ x4,   ¬(xi ∧ xj) for i ≠ j
  – Example: "Error Occurred" vs. "No Error"
• Values versus probabilities
  – Example: "Risk of Smoking" vs. "Smoking"
What is the Graph Structure?
• Goals in specifying graph structure
– Minimize probability elicitation: fewer nodes, fewer arcs, smaller state
spaces
– Maximize fidelity of model
• Sometimes requires more nodes, arcs, and states
• Tradeoff between more accurate model and cost of additional
modeling
• Too much detail can decrease accuracy
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
Inference
• We now have compact representations of probability distributions: Bayesian
Networks
• Network describes a unique probability distribution P
• How do we answer queries about P?
• We use inference as a name for the process of computing answers to such
queries
Inference - The Good & Bad News
• We can do inference
• We can compute any conditional probability
• P( Some variables | Some other variable values )
P(E1 | E2) = P(E1, E2) / P(E2) = [Σ_{joint entries matching E1 and E2} P(row)] / [Σ_{joint entries matching E2} P(row)]
• The sad, bad news
  – Computing conditional probabilities by enumerating all matching entries in the joint is expensive: exponential in the number of variables
• Sadder and worse news
  – General querying of Bayesian networks is NP-complete
  – Hardness does not mean we cannot solve inference; it implies that we cannot find a general procedure that works efficiently for all networks
  – For particular families of networks, we can have provably efficient procedures
Example
[Network: H and M are root nodes; G has parents H and M; J and S each have parent G]

P(H=T) = 0.7,  P(H=F) = 0.3
P(M=T) = 0.3,  P(M=F) = 0.7

P(G | H, M):
         (H=T,M=T)  (H=T,M=F)  (H=F,M=T)  (H=F,M=F)
  G=T       0.8        0.6        0.7        0.3
  G=F       0.2        0.4        0.3        0.7

P(J | G):
         G=T   G=F
  J=T    0.8   0.6
  J=F    0.2   0.4

P(S | G):
         G=T   G=F
  S=T    0.6   0.7
  S=F    0.4   0.3

Query:
  P(H=T | S=T) = P(H=T, S=T) / P(S=T)
               = [Σ_{G,J,M} P(G, H=T, J, M, S=T)] / [Σ_{G,H,J,M} P(G, H, J, M, S=T)]
  Numerator: 2^3 joint-probability computations; denominator: 2^4 joint-probability computations.
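The enumeration can be written directly in code. The sketch below assumes the CPT layout reconstructed above (G conditioned on H and M; J and S each conditioned on G) and computes P(H=T | S=T) by summing joint entries:

# Sketch of the enumeration above (CPT layout as reconstructed here).
from itertools import product

P_H = {True: 0.7, False: 0.3}
P_M = {True: 0.3, False: 0.7}
P_G = {(True, True): 0.8, (True, False): 0.6,    # P(G=T | H, M)
       (False, True): 0.7, (False, False): 0.3}
P_J = {True: 0.8, False: 0.6}                    # P(J=T | G)
P_S = {True: 0.6, False: 0.7}                    # P(S=T | G)

def bern(p_true, val):
    return p_true if val else 1.0 - p_true

def joint(h, m, g, j, s):
    return (bern(P_H[True], h) * bern(P_M[True], m) *
            bern(P_G[(h, m)], g) * bern(P_J[g], j) * bern(P_S[g], s))

num = sum(joint(True, m, g, j, True)             # 2^3 terms
          for m, g, j in product((True, False), repeat=3))
den = sum(joint(h, m, g, j, True)                # 2^4 terms
          for h, m, g, j in product((True, False), repeat=4))
print(num / den)                                 # P(H=T | S=T)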
Queries: Likelihood
• There are many types of queries we might ask.
• Most of these involve evidence
– Evidence e is an assignment of values to a set E of variables in the domain
– Without loss of generality E = { Xk+1, …, Xn }
• Simplest query: compute probability of evidence
P(e) = Σ_{x1, ..., xk} P(x1, ..., xk, e)
• This is often referred to as computing the likelihood of the evidence
Queries
• Often we are interested in the conditional probability of a variable given the
evidence
P(X | e) = P(X, e) / P(e)
• This is the a posteriori belief in X, given evidence e
• A related task is computing the term P(X, e)
– i.e., the likelihood of e and X = x for values of X
– we can recover the a posteriori belief by
P (X  x | e ) 
P (X  x , e )
 P (X  x , e )
x
32
A Posteriori Belief
This query is useful in many cases:
• Prediction: what is the probability of an outcome given the starting condition
– Target is a descendant of the evidence
• Diagnosis: what is the probability of disease/fault given symptoms
– Target is an ancestor of the evidence
• As we shall see, the direction of the arcs between variables does not restrict the direction of the queries
  – Probabilistic inference can combine evidence from all parts of the network
Approaches to Inference
• Exact inference
– Inference in Simple Chains
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Mean field theory
Inference in Simple Chains
X1 → X2
• How do we compute P(X2)?
  P(x2) = Σ_{x1} P(x1, x2) = Σ_{x1} P(x1) P(x2 | x1)
X1 → X2 → X3
• How do we compute P(X3)?
  P(x3) = Σ_{x2} P(x2, x3) = Σ_{x2} P(x2) P(x3 | x2)
• We already know how to compute P(X2)...
  P(x2) = Σ_{x1} P(x1, x2) = Σ_{x1} P(x1) P(x2 | x1)
Inference in Simple Chains
X1 → X2 → X3 → ... → Xn
How do we compute P(Xn)?
• Compute P(X1), P(X2), P(X3), ...
• We compute each term by using the previous one:
  P(x_{i+1}) = Σ_{x_i} P(x_i) P(x_{i+1} | x_i)
• Complexity:
  – Each step costs O(|Val(X_i)| × |Val(X_{i+1})|) operations
• Compare to naïve evaluation, which requires summing over the joint values of n−1 variables:
  P(Xn) = Σ_{X1, X2, ..., X_{n-1}} P(X1, X2, ..., Xn)
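The recursion is easy to implement with one matrix-vector product per step. The sketch below uses made-up CPTs for a chain of five binary variables:

# Sketch of the forward recursion P(x_{i+1}) = sum_{x_i} P(x_i) P(x_{i+1} | x_i)
# on a chain of binary variables; the numbers are invented for illustration.
import numpy as np

prior = np.array([0.6, 0.4])                          # P(X1)
# transition[i] = P(X_{i+1} | X_i), rows indexed by x_i, columns by x_{i+1}
transition = [np.array([[0.9, 0.1],
                        [0.2, 0.8]]) for _ in range(4)]  # chain of length 5

belief = prior
for cpt in transition:
    belief = belief @ cpt          # sum out the previous variable
print(belief)                      # P(X5)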
Elimination in Chains
A → B → C → D → E

P(e) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, e)
     = Σ_d Σ_c Σ_b Σ_a P(a) P(b | a) P(c | b) P(d | c) P(e | d)
     = Σ_d Σ_c Σ_b P(c | b) P(d | c) P(e | d) Σ_a P(a) P(b | a)
Elimination in Chains
Eliminating A:
  P(e) = Σ_d Σ_c Σ_b P(c | b) P(d | c) P(e | d) Σ_a P(a) P(b | a)
       = Σ_d Σ_c Σ_b P(c | b) P(d | c) P(e | d) p(b)
Eliminating B:
  P(e) = Σ_d Σ_c Σ_b P(c | b) P(d | c) P(e | d) p(b)
       = Σ_d Σ_c P(d | c) P(e | d) Σ_b P(c | b) p(b)
       = Σ_d Σ_c P(d | c) P(e | d) p(c)
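The same elimination order can be coded as successive vector-matrix products over the chain A, B, C, D, E; the CPT values below are invented for illustration:

# Sketch: eliminating variables one at a time on the chain A->B->C->D->E,
# mirroring the derivation above (made-up CPT numbers).
import numpy as np

p_a = np.array([0.3, 0.7])                       # P(A)
# Each CPT is a matrix with rows = parent value, columns = child value.
p_b_a = np.array([[0.8, 0.2], [0.4, 0.6]])       # P(B | A)
p_c_b = np.array([[0.7, 0.3], [0.1, 0.9]])       # P(C | B)
p_d_c = np.array([[0.6, 0.4], [0.5, 0.5]])       # P(D | C)
p_e_d = np.array([[0.9, 0.1], [0.2, 0.8]])       # P(E | D)

p_b = p_a @ p_b_a        # eliminate A: p(b) = sum_a P(a) P(b|a)
p_c = p_b @ p_c_b        # eliminate B
p_d = p_c @ p_d_c        # eliminate C
p_e = p_d @ p_e_d        # eliminate D
print(p_e)               # P(E)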
Variable Elimination
General idea:
• Write query in the form
P(Xn, e) = Σ_{xk} ... Σ_{x3} Σ_{x2} Π_i P(xi | pai)
• Iteratively
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product
Stochastic Simulation
• Suppose you are given values for some subset of the variables, G, and want
to infer values for unknown variables, U
• Randomly generate a very large number of instantiations from the BN
– Generate instantiations for all variables – start at root variables and work
your way “forward”
• Only keep those instantiations that are consistent with the values for G
• Use the frequency of values for U to get estimated probabilities
• Accuracy of the results depends on the size of the sample (asymptotically
approaches exact results)
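A sketch of this rejection-sampling procedure, reusing the toy Battery / Engine Turns Over / Start numbers introduced earlier (they are illustrative, not from the slides):

# Sketch: forward sampling followed by rejection of inconsistent samples.
import random

def sample_bn():
    """Forward-sample every variable, roots first."""
    b = random.random() < 0.95
    t = random.random() < (0.97 if b else 0.02)
    s = random.random() < (0.99 if t else 0.0)
    return {"battery": b, "turns_over": t, "start": s}

def estimate(query_var, evidence, n=100_000):
    # Keep only instantiations consistent with the given evidence values.
    kept = [smp for smp in (sample_bn() for _ in range(n))
            if all(smp[k] == v for k, v in evidence.items())]
    return sum(smp[query_var] for smp in kept) / len(kept)

# Estimated P(Battery=T | Start=F); accuracy improves with the sample size.
print(estimate("battery", {"start": False}))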
Markov Chain Monte Carlo Methods
• So called because
– Markov chain – each instance generated in the sample is dependent on
the previous instance
– Monte Carlo – statistical sampling method
• Perform a random walk through variable assignment space, collecting
statistics as you go
– Start with a random instantiation, consistent with evidence variables
– At each step, for some non-evidence variable, randomly sample its value,
consistent with the other current assignments
• Given enough samples, MCMC gives an accurate estimate of the true
distribution of values
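For comparison, a Gibbs-style MCMC sketch on the same toy network: each step resamples one non-evidence variable conditioned on its Markov blanket, keeping the evidence Start = False fixed:

# Sketch: Gibbs-style random walk over the non-evidence variables
# Battery and TurnsOver (toy numbers, not from the slides).
import random

P_B = 0.95
P_T = {True: 0.97, False: 0.02}    # P(TurnsOver=T | Battery)
P_S = {True: 0.99, False: 0.0}     # P(Start=T | TurnsOver)

def bern(p_true, val):
    return p_true if val else 1.0 - p_true

def sample_given(w_true, w_false):
    """Sample True/False proportionally to two unnormalized weights."""
    return random.random() < w_true / (w_true + w_false)

b, t, s = True, True, False        # initial state consistent with Start=False
count_b, n = 0, 50_000
for _ in range(n):
    # Resample Battery given its Markov blanket: P(b) P(t | b)
    b = sample_given(P_B * bern(P_T[True], t), (1 - P_B) * bern(P_T[False], t))
    # Resample TurnsOver given its Markov blanket: P(t | b) P(s | t)
    t = sample_given(P_T[b] * bern(P_S[True], s),
                     (1 - P_T[b]) * bern(P_S[False], s))
    count_b += b
print(count_b / n)                 # estimate of P(Battery=T | Start=F)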
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
Why Learning
[Diagram: a spectrum from knowledge-based (expert systems) to data-based approaches, with example applications:
  – Answer Wizard, Office 95, 97, & 2000
  – Troubleshooters, Windows 98 & 2000
  – Causal discovery
  – Data visualization
  – Concise model of data
  – Prediction]
Why Learning?
• Knowledge acquisition is a bottleneck
– Knowledge acquisition is an expensive process
– Often we don’t have an expert
• Data is cheap
– Amount of available information growing rapidly
– Learning allows us to construct models from raw data
• Conditional independencies & graphical language capture structure of many
real-world distributions
• Graph structure provides much insight into domain
– Allows “knowledge discovery”
• Learned model can be used for many tasks
• Supports all the features of probabilistic learning
– Model selection criteria
– Dealing with missing data & hidden variables
Why Struggle for Accurate Structure?
[True network: Earthquake and Burglary are parents of Alarm Set, which is the parent of Sound]
• Adding an arc
  – Increases the number of parameters to be fitted
  – Introduces wrong assumptions about causality and domain structure
• Missing an arc
  – Cannot be compensated for by accurate fitting of parameters
  – Also misses causality and domain structure
Learning Bayesian Networks from Data
[Diagram: a table of training data over variables X1, X2, X3, ... (mixed discrete and continuous values, some missing), together with prior/expert information, is fed into a Bayesian network learner, which outputs a Bayesian network over the variables]
Learning Bayesian Networks
• Known Structure, Complete Data
– Network structure is specified
• Inducer needs to estimate
parameters
– Data does not contain missing
values
• Known Structure, Incomplete Data
– Network structure is specified
– Data contains missing values
• Need to consider
assignments to missing
values
• Unknown Structure, Complete Data
– Network structure is not
specified
• Inducer needs to select
arcs & estimate parameters
– Data does not contain missing
values
• Unknown Structure, Incomplete Data
– Network structure is not
specified
– Data contains missing values
• Need to consider
assignments to missing
values
Two Types of Methods for Learning BNs
• Constraint based
– Finds a Bayesian network structure whose implied independence
constraints “match” those found in the data
• Scoring methods (Bayesian, MDL, MML)
– Find the Bayesian network structure that can represent distributions that
“match” the data (i.e. could have generated the data)
• Practical considerations
– The number of possible BN structures is super exponential in the number
of variables.
– How do we find the best graph(s)?
Approaches to Learning Structure
• Constraint based
– Perform tests of conditional independence
– Search for a network that is consistent with the observed dependencies
and independencies
• Pros & Cons
– Intuitive, follows closely the construction of BNs
– Separates structure learning from the form of the independence tests
– Sensitive to errors in individual tests
– Computationally hard
Approaches to Learning Structure
• Score based
– Define a score that evaluates how well the (in)dependencies in a structure
match the observations
– Search for a structure that maximizes the score
• Pros & Cons
– Statistically motivated
– Can make compromises
– Takes the structure of conditional probabilities into account
– Computationally hard
Score-based Learning
• Define scoring function that evaluates how well a structure matches the data
• Search for a structure that maximizes the score
[Diagram: training data over E, B, A (e.g. <Y,Y,Y>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>) is scored against candidate network structures over E, B, and A, and the highest-scoring structure is selected]
Structure Search as Optimization
• Input:
– Training data
– Scoring function
– Set of possible structures
• Output
– A network that maximizes the score
• Key computational property: Decomposability
score(G) = Σ_{X in G} score(X | Pa(X))
Tree Structured Networks
• Trees
– At most one parent per variable
• Why trees?
– Elegant math
• We can solve the optimization problem
– Sparse parameterization
• Avoid overfitting
Beyond Trees
• When we consider more complex networks, the problem is not as easy
– Suppose we allow at most two parents per node
– A greedy algorithm is no longer guaranteed to find the optimal network
– In fact, no efficient algorithm exists
Model Search
• Finding the BN structure with the highest score among those structures with
at most k parents is NP hard for k>1 (Chickering, 1995)
• Heuristic methods
– Greedy
– Greedy with restarts
– MCMC methods
[Flowchart of greedy search: initialize the structure; score all possible single changes; if any change improves the score, perform the best change and repeat; otherwise return the saved structure]
Algorithm B
Scoring Functions: MDL
• Minimum Description Length (MDL)
• Learning ⇔ data compression
  Example data (with missing values):
    <9.7   0.6   8   14   18>
    <0.2   1.3   5   ??   ??>
    <1.3   2.8   ??   0    1>
    <??    5.6   0   10   ??>
    ...
  MDL(BN | D) = −log P(D | Θ, G) + (log N / 2) · |Θ|
  where −log P(D | Θ, G) is DL(Data | model) and (log N / 2) · |Θ| is DL(Model)
• Other: MDL = −BIC (Bayesian Information Criterion)
• Bayesian score (BDe): asymptotically equivalent to MDL
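One possible reading of this score in code, assuming complete binary data and maximum-likelihood parameters (the helper below is illustrative, not the lecture's implementation):

# Sketch: MDL score of a candidate structure on complete binary data.
import math
from collections import Counter

def mdl_score(data, parents):
    """data: list of dicts {var: 0/1}; parents: dict var -> list of parent vars."""
    n = len(data)
    log_lik, n_params = 0.0, 0
    for var, pa in parents.items():
        counts = Counter()
        for row in data:
            key = tuple(row[p] for p in pa)
            counts[(key, row[var])] += 1
        # one free parameter per parent configuration for a binary variable
        n_params += 2 ** len(pa)
        for (key, val), c in counts.items():
            total = counts[(key, 0)] + counts[(key, 1)]
            log_lik += c * math.log(c / total)   # ML log-likelihood term
    # MDL(BN | D) = -log P(D | theta, G) + (log N / 2) * |theta|
    return -log_lik + (math.log(n) / 2) * n_params

Lower MDL is better; because the score decomposes over families, a structure search only needs to re-score the families affected by a local change.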
Typical Operations
[Diagram: local changes to a network over S, C, E, D; each single-arc operation (add, delete, or reverse an arc) produces a neighboring structure]
Exploiting Decomposability in Local Search
[Diagram: neighboring structures over S, C, E, D produced by single-arc changes]
• Caching: to update the score after a local change, we only need to re-score the families that were changed in the last move
Greedy Hill-Climbing
• Simplest heuristic local search
– Start with a given network
• empty network
• best tree
• a random network
– At each iteration
• Evaluate all possible changes
• Apply change that leads to best improvement in score
• Reiterate
– Stop when no modification improves score
• Each step requires evaluating approximately n new changes
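A skeleton of this loop, assuming hypothetical helpers score(structure, data) and neighbors(structure) that enumerate single-arc additions, deletions, and reversals:

# Skeleton of greedy hill-climbing over structures; score() and neighbors()
# are assumed helpers, not functions defined in the lecture.
def hill_climb(initial_structure, data, score, neighbors):
    current = initial_structure
    current_score = score(current, data)
    while True:
        # Evaluate all possible single changes
        candidates = [(score(nb, data), nb) for nb in neighbors(current)]
        if not candidates:
            return current
        best_score, best = max(candidates, key=lambda t: t[0])
        # Stop when no modification improves the score
        if best_score <= current_score:
            return current
        current, current_score = best, best_score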
Greedy Hill-Climbing: Possible Pitfalls
• Greedy hill-climbing can get stuck in:
  – Local maxima:
    • All one-edge changes reduce the score
  – Plateaus:
    • Some one-edge changes leave the score unchanged
    • Happens because equivalent networks receive the same score and are neighbors in the search space
• Both occur during structure search
• Standard heuristics can escape both
  – Random restarts
  – TABU search
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
What are BNs useful for?
• Prediction: P(symptom | cause) = ?   (predictive inference, from cause to effect)
• Diagnosis: P(cause | symptom) = ?   (diagnostic reasoning, from effect to cause)
• Classification: max_class P(class | data)
• Decision-making (given a cost function)
• Data mining: induce the best model from data
[Application domains: medicine, bioinformatics, speech recognition, stock market, text classification, computer troubleshooting]
Why use BNs?
• Explicit management of uncertainty
• Modularity (modular specification of a joint distribution) implies maintainability
• Better, flexible and robust decision making – MEU (Maximization of Expected
Utility), VOI (Value of Information)
• Can be used to answer arbitrary queries - multiple fault problems (General
purpose “inference” algorithm)
• Easy to incorporate prior knowledge
• Easy to understand
Example from Medical Diagnostics
[Network: patient-information nodes Visit to Asia and Smoking; medical-difficulty nodes Tuberculosis, Lung Cancer, and Bronchitis; an intermediate node Tuberculosis or Cancer; diagnostic-test nodes XRay Result and Dyspnea]
• The network represents a knowledge structure that models the relationships between medical difficulties, their causes and effects, patient information, and diagnostic tests
Example from Medical Diagnostics
Tuberculosis or Cancer (deterministic OR of Tuberculosis and Lung Cancer):

  Tuberculosis  Lung Cancer  | Tub or Can
  Present       Present      | True
  Present       Absent       | True
  Absent        Present      | True
  Absent        Absent       | False

Dyspnea given Tub or Can and Bronchitis:

  Tub or Can  Bronchitis | Dyspnea=Present  Dyspnea=Absent
  True        Present    | 0.90             0.10
  True        Absent     | 0.70             0.30
  False       Present    | 0.80             0.20
  False       Absent     | 0.10             0.90

• Relationship knowledge is modeled by deterministic functions, logic, and conditional probability distributions
Example from Medical Diagnostics
[Initial (marginal) beliefs, in percent:
  Visit To Asia: Visit 1.00, No Visit 99.0
  Tuberculosis: Present 1.04, Absent 99.0
  Smoking: Smoker 50.0, NonSmoker 50.0
  Lung Cancer: Present 5.50, Absent 94.5
  Bronchitis: Present 45.0, Absent 55.0
  Tuberculosis or Cancer: True 6.48, False 93.5
  XRay Result: Abnormal 11.0, Normal 89.0
  Dyspnea: Present 43.6, Absent 56.4]
• The propagation algorithm processes the relationship information to provide an unconditional or marginal probability distribution for each node
• The unconditional or marginal probability distribution is frequently called the belief function of that node
Example from Medical Diagnostics
[Beliefs after entering the finding Visit to Asia = Visit, in percent:
  Visit To Asia: Visit 100, No Visit 0
  Tuberculosis: Present 5.00, Absent 95.0
  Smoking: Smoker 50.0, NonSmoker 50.0
  Lung Cancer: Present 5.50, Absent 94.5
  Bronchitis: Present 45.0, Absent 55.0
  Tuberculosis or Cancer: True 10.2, False 89.8
  XRay Result: Abnormal 14.5, Normal 85.5
  Dyspnea: Present 45.0, Absent 55.0]
• As a finding is entered, the propagation algorithm updates the beliefs attached to each relevant node in the network
• Interviewing the patient produces the information that "Visit to Asia" is "Visit"
• This finding propagates through the network and the belief functions of several nodes are updated
Example from Medical Diagnostics
[Beliefs after additionally entering Smoking = Smoker, in percent:
  Visit To Asia: Visit 100, No Visit 0
  Tuberculosis: Present 5.00, Absent 95.0
  Smoking: Smoker 100, NonSmoker 0
  Lung Cancer: Present 10.0, Absent 90.0
  Bronchitis: Present 60.0, Absent 40.0
  Tuberculosis or Cancer: True 14.5, False 85.5
  XRay Result: Abnormal 18.5, Normal 81.5
  Dyspnea: Present 56.4, Absent 43.6]
• Further interviewing of the patient produces the finding "Smoking" is "Smoker"
• This information propagates through the network
Example from Medical Diagnostics
[Beliefs after additionally entering XRay Result = Normal, in percent:
  Visit To Asia: Visit 100, No Visit 0
  Tuberculosis: Present 0.12, Absent 99.9
  Smoking: Smoker 100, NonSmoker 0
  Lung Cancer: Present 0.25, Absent 99.8
  Bronchitis: Present 60.0, Absent 40.0
  Tuberculosis or Cancer: True 0.36, False 99.6
  XRay Result: Abnormal 0, Normal 100
  Dyspnea: Present 52.1, Absent 47.9]
• Finished with interviewing the patient, the physician begins the examination
• The physician now moves to specific diagnostic tests such as an X-Ray, which results in a "Normal" finding that propagates through the network
• Note that the information from this finding propagates backward and forward through the arcs
Example from Medical Diagnostics
[Beliefs after additionally entering Dyspnea = Present, in percent:
  Visit To Asia: Visit 100, No Visit 0
  Tuberculosis: Present 0.19, Absent 99.8
  Smoking: Smoker 100, NonSmoker 0
  Lung Cancer: Present 0.39, Absent 99.6
  Bronchitis: Present 92.2, Absent 7.84
  Tuberculosis or Cancer: True 0.56, False 99.4
  XRay Result: Abnormal 0, Normal 100
  Dyspnea: Present 100, Absent 0]
• The physician also determines that the patient is having difficulty breathing; the finding "Present" is entered for "Dyspnea" and is propagated through the network
• The doctor might now conclude that the patient has bronchitis and does not have tuberculosis or lung cancer
Applications
• Industrial
  – Processor Fault Diagnosis - by Intel
  – Auxiliary Turbine Diagnosis - GEMS by GE
  – Diagnosis of space shuttle propulsion systems - VISTA by NASA/Rockwell
  – Situation assessment for nuclear power plant - NRC
• Military
  – Automatic Target Recognition - MITRE
  – Autonomous control of unmanned underwater vehicle - Lockheed Martin
  – Assessment of Intent
• Medical Diagnosis
  – Internal Medicine
  – Pathology diagnosis - Intellipath by Chapman & Hall
  – Breast Cancer Manager with Intellipath
• Commercial
  – Financial Market Analysis
  – Information Retrieval
  – Software troubleshooting and advice - Windows 95 & Office 97
  – Pregnancy and Child Care - Microsoft
  – Software debugging - American Airlines' SABRE online reservation system
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
Modeling Users in Location-Tracking
• Modeling Users in a Location-Tracking Application [Abdelsalam 2004]
  – A taxi-calling service that tracks roving users and connects each customer to a suitable taxi
• The user's position is sent at a specified time interval
  – The position within the interval is uncertain
• General input: direction, last position, velocity
• Proposed solution: consider additional evidence with a Bayesian network
  – User habits, user behavior, environment
• The proposed method
  – Considers the user's unique goals, personality, and task for location reasoning
  – Goal: build location-tracking applications that take into account the individual characteristics, habits, and preferences of the users

W. Abdelsalam, Y. Ebrahim, "Managing uncertainty: Modeling users in location-tracking applications," IEEE Pervasive Computing, 2004.
Modeling Users
• Temporal variables
– Time information
– Ex) Events occurred, time of year, day of the week, time of day
• Spatial variables
– Information about locations of user
– Ex) Building, town, certain part of town, certain road or highway
• Environmental variables
– Environmental information
– Ex) Weather conditions, road conditions, special events
• Behavioral variables
– Behavioral feature & information
– Ex) Typical speeds, resting patterns, preferred work areas, common
reactions in certain situations
Collecting User Data
• User behavior log
– User-specific data
– Environment-specific data
– User location data
• Collection frequency is driven by the data feature
  – Location: periodic
  – Event & occurrence: non-periodic
Building the User Model
• Bayesian Networks
– Need flexible and automatic techniques → Bayesian networks!
  • cf. logic-based methods
• Two major purposes of using a BN
  – Reasoning about causes from observations
  – Incorporating changed evidence with reduced computation
Using Bayesian Networks
• Taxicab location-tracking application variables
  – Event
  – Time of day
    • Morning rush hour
    • Lunch hour
    • Evening rush hour
    • Late night
    • Etc.
  – Source: start location
    • Airport, downtown, ...
  – Destination: end location
  – Weather conditions
    • Sunny, rainy, snowing, ...
  – Route
    • All available routes are considered
    • Considers whether a route is local or highway
  – Speed
    • Speed variation range
[Network nodes include Last location, Route, Speed, and Future location]
Experimental Results (1)
• Comparison with the LSR (Last Speed Reported) method
  – length = velocity × time
• The simulation
  – Data: artificially generated data
  – Simulating roving users
    • Simplified speed values for routes and weather:
      45% (less than 10 km/h) + 25% (20 km/h) + 20% (50 km/h) + 10% (100 km/h)
  – 200 routes are generated
    • Local or highway is randomly selected
  – A set of Trip Segments (TSs)
    • Speed + duration
  – Weather
    • Good or bad
Experimental Results (2)
• Standard deviation of the distance error for each reporting interval
  [Figure not reproduced in the transcript]
Agenda
• Bayesian Network
– Introduction
– Inference of Bayesian Network
– Modeling of Bayesian Network
• Bayesian Network Application
– Bayesian Network Application Example
– Life Log Application Example
• Summary & Review
Summary & Review
• Bayesian Network
– Inference & Modeling Methods
– Applications
• A Life Log Application Example
– Using a BN for inference through user modeling and observation
– Ongoing research
  • Detailed modeling: using SOM
  • Route reduction: using key and secondary route segments
  • Application to LBS: utilization of personalized context for location-based services