Transcript Document

Multi-Target Classification in Wireless Sensor Networks – A Maximum Likelihood Approach
Vinod Kumar Ramachandran
ID: 902 234 6077
Wireless Sensor Networks – Introduction
• A network of thousands of tiny sensors deployed in a region of interest to perform a
specific task, for example, to detect and classify vehicles moving in a given region
• Applications: (1) Battlefield Surveillance (2) Disaster Relief (3) Environmental Monitoring
• Key Design Issues
  - Low battery power of sensors → efficient Collaborative Signal Processing (CSP) algorithms must be developed
  - Design of proper communication schemes to transmit information to a manager node
Philosophy
• Each Sensor has Limited Sensing and Processing Capabilities
• But thousands of such sensors can coordinate to achieve a very complex task
• Main Advantage of Sensor Networks – Ability to accomplish a very complex task with tiny,
inexpensive sensors
• In detection and classification of targets, factors affecting individual decisions:
  - Measurement noise
  - Statistical variability of target signals
Multi-Target Classification
• Objective: To detect and classify the targets present in a region of interest
• Suppose M distinct targets are possible. Then this becomes an N-ary hypothesis testing problem where N = 2^M
• For example, when M = 2, four hypotheses are possible:
  - H0: No target present
  - H1: Target 1 alone is present
  - H2: Target 2 alone is present
  - H3: Both target 1 and target 2 are present
Signal Model
• Signals from all sources are assumed Gaussian
• Region of interest can be divided into spatial coherence regions (SCRs)
• Signals obtained from distinct SCRs are independent, while signals from sensors within a
single SCR are highly correlated
• Correlated signals within individual SCRs can overcome measurement noise
• Independent signals across SCRs can overcome statistical variability of target signals
Problem Formulation
• For M distinct targets, N = 2^M hypotheses are possible: $H_j$, $j = 0, \dots, N-1$
• Probability of the m-th target being present = $1 - q_m$
• $b_m(j)$ denotes the presence ($b_m(j) = 1$) or absence ($b_m(j) = 0$) of the m-th target under $H_j$
• Prior probabilities under the different hypotheses are given by

  $$\pi_j = P(H_j) = \prod_{m=1}^{M} \big[\, b_m(j)\,(1 - q_m) + (1 - b_m(j))\, q_m \,\big], \qquad j = 0, \dots, N-1$$

  where $b_M(j)\, b_{M-1}(j) \cdots b_1(j)$ is the binary representation of the integer $j$.
• Signals from different targets are assumed Gaussian, with the m-th target having zero mean and covariance matrix $\Sigma_m$
• Assume there are G SCRs (independent measurements). The N-ary hypothesis testing problem is

  $$H_j: \; \mathbf{z}_k = \mathbf{s}_k + \mathbf{n}_k, \qquad k = 1, \dots, G; \;\; j = 0, \dots, N-1$$
  $$\mathbf{s}_k \sim \mathcal{CN}(\mathbf{0}, \Sigma_j), \qquad \Sigma_j = \sum_{m=1}^{M} b_m(j)\,\Sigma_m, \qquad \mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$$

  where the measurement noise $\mathbf{n}_k$ is added to the signal vector $\mathbf{s}_k$.
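As an illustration of this signal model (not the authors' code), here is a minimal NumPy sketch that enumerates the N = 2^M hypotheses via the binary representation, computes the priors π_j, and draws G i.i.d. complex Gaussian measurements under a chosen hypothesis. All dimensions, the values of q_m and σ², and the randomly generated covariances Σ_m are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: M targets, feature dimension N0, G SCRs
M, N0, G = 2, 4, 50
q = np.array([0.6, 0.7])        # q_m = P(m-th target absent)
sigma2 = 0.1                    # measurement noise variance sigma^2

def rand_cov(d):
    """Random Hermitian positive-definite matrix, standing in for a target covariance Sigma_m."""
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return A @ A.conj().T / d

Sigma = [rand_cov(N0) for _ in range(M)]

N = 2 ** M
# b[j, m] = m-th bit of the binary representation of hypothesis index j (1 = target m present)
b = np.array([[(j >> m) & 1 for m in range(M)] for j in range(N)])

# Priors: pi_j = prod_m [ b_m(j)(1 - q_m) + (1 - b_m(j)) q_m ]
pi = np.array([np.prod(b[j] * (1 - q) + (1 - b[j]) * q) for j in range(N)])

# Total covariance under H_j: Sigma_bar_j = sum_m b_m(j) Sigma_m + sigma^2 I
Sigma_bar = [sum(b[j, m] * Sigma[m] for m in range(M)) + sigma2 * np.eye(N0) for j in range(N)]

def draw_measurements(j, G):
    """Draw G i.i.d. samples z_k ~ CN(0, Sigma_bar_j)."""
    L = np.linalg.cholesky(Sigma_bar[j])
    w = (rng.standard_normal((G, N0)) + 1j * rng.standard_normal((G, N0))) / np.sqrt(2)
    return w @ L.T                      # rows are z_k^T, with covariance L L^H = Sigma_bar_j

Z = draw_measurements(3, G)             # e.g. H_3: both targets present (when M = 2)
print(pi.round(3), Z.shape)
```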
Problem Formulation (contd…)
• Under $H_j$, the probability density function of the measurement at the k-th SCR is

  $$p_j(\mathbf{z}_k) = p(\mathbf{z}_k \mid H_j) = \frac{1}{\pi^{N_0}\,|\bar{\Sigma}_j|}\, e^{-\mathbf{z}_k^H \bar{\Sigma}_j^{-1} \mathbf{z}_k}, \qquad \bar{\Sigma}_j = \Sigma_j + \sigma^2\mathbf{I}$$

  where $N_0$ is the dimension of the feature vector, and the G measurements are i.i.d.
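For reference, a small sketch (hypothetical helper name) of evaluating the logarithm of this density for one measurement, using the log-determinant form to avoid numerical underflow:

```python
import numpy as np

def complex_gaussian_logpdf(z, Sigma_bar):
    """log p(z | H_j) for z ~ CN(0, Sigma_bar):
    -N0 log(pi) - log|Sigma_bar| - z^H Sigma_bar^{-1} z."""
    N0 = Sigma_bar.shape[0]
    _, logdet = np.linalg.slogdet(Sigma_bar)
    quad = np.real(z.conj() @ np.linalg.solve(Sigma_bar, z))
    return -N0 * np.log(np.pi) - logdet - quad
```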
Basic Architecture for Multi-target Classification
Classification Strategies
Centralized Fusion: Data from all SCRs are directly combined in an optimal fashion.
Advantage: Serves as a benchmark as it is optimal.
Disadvantage: Heavy communication burden because all data have to be transmitted to manager node.
Decision Fusion: Individual SCRs make local decisions, and the decisions themselves are
combined in an optimal manner at the manager node
Advantage: Very low communication burden because only decisions are transmitted
Disadvantage: May not be as reliable as centralized fusion
Optimal Centralized Classifier: Decides the correct hypothesis according to
  $$\hat{C}(\mathbf{z}_1, \dots, \mathbf{z}_G) = \arg\max_{j=0,\dots,N-1} p_j(\mathbf{z}_1, \dots, \mathbf{z}_G)\,\pi_j$$

where

  $$p_j(\mathbf{z}_1, \dots, \mathbf{z}_G) = p(\mathbf{z}_1, \dots, \mathbf{z}_G \mid H_j) = \prod_{k=1}^{G} p_j(\mathbf{z}_k)$$

This can also be written as

  $$\hat{C}(\mathbf{z}_1, \dots, \mathbf{z}_G) = \arg\min_{j=0,\dots,N-1} l_j(\mathbf{z}_1, \dots, \mathbf{z}_G)$$

where

  $$l_j(\mathbf{z}_1, \dots, \mathbf{z}_G) = -\frac{1}{G}\log\!\big[p_j(\mathbf{z}_1, \dots, \mathbf{z}_G)\,\pi_j\big] = -\frac{1}{G}\sum_{k=1}^{G}\log p_j(\mathbf{z}_k) - \frac{1}{G}\log \pi_j$$

Ignoring constants that do not depend on the class,

  $$l_j(\mathbf{z}_1, \dots, \mathbf{z}_G) = \log|\bar{\Sigma}_j| + \frac{1}{G}\sum_{k=1}^{G}\mathbf{z}_k^H \bar{\Sigma}_j^{-1} \mathbf{z}_k - \frac{1}{G}\log \pi_j$$
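A minimal sketch of this centralized rule (hypothetical names; Sigma_bar is the list of per-hypothesis covariances and pi the priors, as in the earlier sketches):

```python
import numpy as np

def centralized_classify(Z, Sigma_bar, pi):
    """Optimal centralized classifier.
    Z: (G, N0) complex measurements, one row per SCR.
    Returns arg min_j of  log|Sigma_bar_j| + (1/G) sum_k z_k^H Sigma_bar_j^{-1} z_k - (1/G) log pi_j."""
    G = Z.shape[0]
    costs = []
    for S, p in zip(Sigma_bar, pi):
        _, logdet = np.linalg.slogdet(S)
        X = np.linalg.solve(S, Z.T)                           # X[:, k] = Sigma_bar_j^{-1} z_k
        quad = np.real(np.einsum('ki,ik->k', Z.conj(), X)).mean()
        costs.append(logdet + quad - np.log(p) / G)
    return int(np.argmin(costs))
```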
Performance of Optimal Classifier
• Note that each SCR has to transmit N likelihoods to the manager node.
• The probability of error of the optimal centralized classifier can be bounded as [1]
N 1
N 1
Pe (G )  

m  0 j  0, j  m
Ajm (G )e
 D*jm ( G ) G
where
Ajm (G )   j
*
 m1
(G )
*
(G )
D*jm (G )    jm ( * (G ))
j
 (G )  arg min log 
0 1 G
 m
*


   jm ( )

 jm ( )   log  m  j 1  log (1   )I   m  j 1
• Thus the probability of error goes to zero as the number of independent measurements G goes to infinity,
provided the Kullback-Leibler distances $D(p_m \| p_j)$ between all pairs of hypothesis pdfs are strictly
positive
[1] J. Kotecha, V. Ramachandran, and A. Sayeed, "Distributed Multitarget Classification in Wireless
Sensor Networks," IEEE Journal on Selected Areas in Communications, December 2003.
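As a numerical illustration of the bound reconstructed above (a sketch only, with hypothetical helper names and a simple grid search over λ), the exponent μ_jm(λ) and one term of the bound could be evaluated as follows:

```python
import numpy as np

def mu_jm(lam, Sigma_bar_m, Sigma_bar_j):
    """mu_jm(lambda) = lambda log|S_m S_j^{-1}| - log|(1-lambda) I + lambda S_m S_j^{-1}|."""
    R = np.linalg.solve(Sigma_bar_j.T, Sigma_bar_m.T).T       # R = S_m @ inv(S_j)
    _, logdet_R = np.linalg.slogdet(R)
    _, logdet_mix = np.linalg.slogdet((1 - lam) * np.eye(R.shape[0]) + lam * R)
    return lam * logdet_R - logdet_mix

def pairwise_bound_term(pi_j, pi_m, Sigma_bar_j, Sigma_bar_m, G):
    """Grid-search lambda* and return A_jm(G) * exp(-D*_jm(G) * G) for one (j, m) pair."""
    grid = np.linspace(1e-3, 1 - 1e-3, 999)
    obj = [(lam * np.log(pi_j) + (1 - lam) * np.log(pi_m)) / G + mu_jm(lam, Sigma_bar_m, Sigma_bar_j)
           for lam in grid]
    lam_star = grid[int(np.argmin(obj))]
    D_star = -mu_jm(lam_star, Sigma_bar_m, Sigma_bar_j)
    A_jm = pi_j ** lam_star * pi_m ** (1 - lam_star)
    return A_jm * np.exp(-D_star * G)
```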
Local Classifiers
• Local decisions are made at the individual SCRs based on the single measurement in that SCR.
• The local decisions u_k are communicated to the manager node, which makes the final optimal decision. These classifiers are also called distributed classifiers.
• Since only the decisions are transmitted, the communication burden on the network is greatly reduced.
Optimal Local Classifier
• The optimal local classifier in the k-th SCR makes the decision

  $$u_k = \arg\max_{j=0,\dots,N-1} p_j(\mathbf{z}_k)\,\pi_j, \qquad k = 1, \dots, G$$

• The pmfs of the local decisions under the different hypotheses are characterized by:

  $$p_m[j] = P\big(p_j(\mathbf{z}_k)\,\pi_j > p_l(\mathbf{z}_k)\,\pi_l \;\text{for all}\; l \ne j \mid H_m\big), \qquad j, m = 0, \dots, N-1$$

• The optimal local classifier needs to calculate N likelihoods for each decision, and N grows exponentially with the number of targets M.
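A hedged sketch (hypothetical names, reusing the conventions of the earlier sketches) of the optimal local decision rule, together with a Monte Carlo estimate of the induced pmfs p_m[j]:

```python
import numpy as np

def local_decide(z, Sigma_bar, pi):
    """Optimal local classifier: u = arg max_j [ log p_j(z) + log pi_j ]."""
    scores = []
    for S, p in zip(Sigma_bar, pi):
        _, logdet = np.linalg.slogdet(S)
        quad = np.real(z.conj() @ np.linalg.solve(S, z))
        scores.append(-logdet - quad + np.log(p))
    return int(np.argmax(scores))

def estimate_pmfs(Sigma_bar, pi, n_trials=2000, seed=1):
    """Monte Carlo estimate of p_m[j] = P(u = j | H_m)."""
    rng = np.random.default_rng(seed)
    N, N0 = len(pi), Sigma_bar[0].shape[0]
    pmf = np.zeros((N, N))
    for m in range(N):
        L = np.linalg.cholesky(Sigma_bar[m])
        for _ in range(n_trials):
            w = (rng.standard_normal(N0) + 1j * rng.standard_normal(N0)) / np.sqrt(2)
            pmf[m, local_decide(L @ w, Sigma_bar, pi)] += 1
    return pmf / n_trials
```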
Sub-optimal Local Classifiers
• To circumvent the exponential complexity, the suboptimal local classifiers conduct M binary tests, one per target, for the presence or absence of that target
• Partition the set of hypotheses into two sets, $\mathcal{H}_m$ and $\mathcal{H}_m^c$, where $\mathcal{H}_m$ contains those hypotheses in which the m-th target is present and $\mathcal{H}_m^c$ contains those hypotheses in which it is absent
• Define

  $$S_m = \{\, j \in \{0, \dots, N-1\} : b_m(j) = 1 \,\}$$
  $$S_m^c = \{\, j \in \{0, \dots, N-1\} : b_m(j) = 0 \,\} = \{0, \dots, N-1\} \setminus S_m$$

  Then

  $$\mathcal{H}_m = \bigcup_{j \in S_m} H_j, \qquad \mathcal{H}_m^c = \left(\bigcup_{j=0}^{N-1} H_j\right) \setminus \mathcal{H}_m = \bigcup_{j \in S_m^c} H_j$$
• Under $\mathcal{H}_m$ and $\mathcal{H}_m^c$, the feature vector is distributed as a weighted sum (mixture) of Gaussians:

  $$p(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{1-q_m}\sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{1-q_m}\sum_{i \in S_m} \pi_i\, \frac{1}{\pi^{N_0}|\bar{\Sigma}_i|}\, e^{-\mathbf{z}_k^H \bar{\Sigma}_i^{-1}\mathbf{z}_k}$$

  $$p(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{q_m}\sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k) = \frac{1}{q_m}\sum_{i \in S_m^c} \pi_i\, \frac{1}{\pi^{N_0}|\bar{\Sigma}_i|}\, e^{-\mathbf{z}_k^H \bar{\Sigma}_i^{-1}\mathbf{z}_k}$$

• It has been shown in [1] that the suboptimal classifiers give perfect classification in the limit of large G, the number of independent measurements
Sub-optimal Local Classifiers (Contd…)
Mixture Gaussian Classifier (MGC)
• For m = 1, …, M, let $\hat{b}_m(\mathbf{z}_k)$ denote the value of the m-th binary hypothesis test between $\mathcal{H}_m$ and $\mathcal{H}_m^c$ in the k-th SCR
• $\hat{b}_m = 1$ if

  $$(1-q_m)\, p(\mathbf{z}_k \mid \mathcal{H}_m) \;\ge\; q_m\, p(\mathbf{z}_k \mid \mathcal{H}_m^c)
  \;\Longleftrightarrow\;
  \sum_{i \in S_m} \pi_i\, p_i(\mathbf{z}_k) \;\ge\; \sum_{i \in S_m^c} \pi_i\, p_i(\mathbf{z}_k)$$

  and $\hat{b}_m = 0$ otherwise
• For the simple case of M = 2, $\hat{b}_1 = 1$ if

  $$q_2\big[(1-q_1)\,\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_1) - q_1\,\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})\big]
  + (1-q_2)\big[(1-q_1)\,\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_1 + \Sigma_2) - q_1\,\mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I} + \Sigma_2)\big] \;\ge\; 0$$

  where each $\mathcal{CN}(\mathbf{0}, \cdot)$ denotes the corresponding pdf evaluated at $\mathbf{z}_k$. It can be observed that the above test is a weighted sum of two tests.
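A minimal sketch of this mixture Gaussian test (hypothetical names; b[j, m] is the m-th bit of hypothesis j and Sigma_bar the per-hypothesis covariances, as in the earlier sketches):

```python
import numpy as np

def gauss_pdf(z, Sigma_bar):
    """Circular complex Gaussian density CN(0, Sigma_bar) evaluated at z."""
    N0 = Sigma_bar.shape[0]
    _, logdet = np.linalg.slogdet(Sigma_bar)
    quad = np.real(z.conj() @ np.linalg.solve(Sigma_bar, z))
    return np.exp(-N0 * np.log(np.pi) - logdet - quad)

def mgc_test(z, m, b, pi, Sigma_bar):
    """Mixture Gaussian Classifier test for target m:
    b_hat_m = 1 iff sum_{i in S_m} pi_i p_i(z) >= sum_{i in S_m^c} pi_i p_i(z)."""
    present = sum(pi[j] * gauss_pdf(z, Sigma_bar[j]) for j in range(len(pi)) if b[j, m] == 1)
    absent = sum(pi[j] * gauss_pdf(z, Sigma_bar[j]) for j in range(len(pi)) if b[j, m] == 0)
    return int(present >= absent)
```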
Single Gaussian Classifier (SGC)
• This classifier approximates the pdfs as
  $$\hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) = \frac{1}{\pi^{N_0}|\hat{\Sigma}_m|}\, e^{-\mathbf{z}_k^H \hat{\Sigma}_m^{-1}\mathbf{z}_k}, \qquad \hat{\Sigma}_m = \sum_{i \in S_m} \pi_i\, \bar{\Sigma}_i$$

  $$\hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c) = \frac{1}{\pi^{N_0}|\hat{\Sigma}_m^c|}\, e^{-\mathbf{z}_k^H (\hat{\Sigma}_m^c)^{-1}\mathbf{z}_k}, \qquad \hat{\Sigma}_m^c = \sum_{i \in S_m^c} \pi_i\, \bar{\Sigma}_i$$

• The value of the m-th test in the k-th SCR is one if

  $$(1-q_m)\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m) \;\ge\; q_m\, \hat{p}(\mathbf{z}_k \mid \mathcal{H}_m^c)$$
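A corresponding sketch of the single Gaussian approximation and its test (hypothetical names; the prior-weighted covariance sums follow the reconstruction above):

```python
import numpy as np

def sgc_test(z, m, b, pi, q, Sigma_bar):
    """Single Gaussian Classifier test for target m: replace each mixture by one zero-mean
    Gaussian whose covariance is the prior-weighted sum of the member covariances."""
    N = len(pi)
    S_hat = sum(pi[j] * Sigma_bar[j] for j in range(N) if b[j, m] == 1)     # Sigma_hat_m
    S_hat_c = sum(pi[j] * Sigma_bar[j] for j in range(N) if b[j, m] == 0)   # Sigma_hat_m^c

    def logpdf(z, S):
        _, logdet = np.linalg.slogdet(S)
        return -S.shape[0] * np.log(np.pi) - logdet - np.real(z.conj() @ np.linalg.solve(S, z))

    # b_hat_m = 1 iff (1 - q_m) p_hat(z | H_m) >= q_m p_hat(z | H_m^c)
    return int(np.log(1 - q[m]) + logpdf(z, S_hat) >= np.log(q[m]) + logpdf(z, S_hat_c))
```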
Fusion of Local Decisions at the Manager Node
• Ideal Communication Links
  - The classifier at the final manager node makes the decision as:

    $$\hat{C}_{ideal}(u_1, \dots, u_G) = \arg\max_{j=0,\dots,N-1} p_j[u_1, \dots, u_G]\,\pi_j = \arg\max_{j=0,\dots,N-1} \prod_{k=1}^{G} p_j[u_k]\,\pi_j$$

    which can also be written as

    $$\hat{C}_{ideal}(u_1, \dots, u_G) = \arg\min_{j=0,\dots,N-1} l_{j,ideal}[u_1, \dots, u_G]$$
    $$l_{j,ideal}[u_1, \dots, u_G] = -\frac{1}{G}\log\!\big(p_j[u_1, \dots, u_G]\,\pi_j\big) = -\frac{1}{G}\sum_{k=1}^{G}\log p_j[u_k] - \frac{1}{G}\log \pi_j$$

  - The above expressions apply to all three local classifiers; the only difference is that the different local classifiers induce different pmfs $p_j[\cdot]$ (see the fusion sketch after the noisy-link case below).
• Noisy Communication Links
  - In noisy channels, each SCR sends an amplified version of its local hard decision:

    $$y_k = u_k + w_k, \qquad k = 1, \dots, G$$

    where the hard decisions are corrupted by additive white Gaussian noise (AWGN), $w_k \sim \mathcal{N}(0, \sigma_w^2)$.

  - The manager node makes the final decision as:

    $$\hat{C}_{noisy}(\mathbf{y}) = \arg\min_{j} l_{j,noisy}(\mathbf{y})$$
    $$l_{j,noisy}(\mathbf{y}) = -\frac{1}{G}\log p_{j,noisy}(\mathbf{y}) = -\frac{1}{G}\sum_{k=1}^{G}\log p_{j,noisy}(y_k)$$
    $$p_{j,noisy}(y_k) = \frac{1}{\sqrt{2\pi\sigma_w^2}}\sum_{i=0}^{N-1} e^{-(y_k - i)^2/(2\sigma_w^2)}\, p_j[i]$$
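A hedged sketch of both fusion rules (hypothetical names; pmf[j, i] stands for the induced pmf p_j[i] of the local decisions, e.g. as estimated by the Monte Carlo sketch earlier, and sigma_w is the channel noise standard deviation):

```python
import numpy as np

def fuse_ideal(u, pmf, pi):
    """Ideal links: u is the integer vector of G local decisions, pmf[j, i] = p_j[i].
    Minimizing l_{j,ideal} is equivalent to maximizing sum_k log p_j[u_k] + log pi_j."""
    ll = np.log(pmf[:, u]).sum(axis=1)               # sum_k log p_j[u_k], for each hypothesis j
    return int(np.argmax(ll + np.log(pi)))

def fuse_noisy(y, pmf, sigma_w):
    """Noisy links: y_k = u_k + w_k, w_k ~ N(0, sigma_w^2).
    p_{j,noisy}(y_k) = (2 pi sigma_w^2)^(-1/2) sum_i exp(-(y_k - i)^2 / (2 sigma_w^2)) p_j[i]."""
    N = pmf.shape[1]
    i = np.arange(N)
    kernel = np.exp(-(y[:, None] - i[None, :]) ** 2 / (2 * sigma_w ** 2)) / np.sqrt(2 * np.pi * sigma_w ** 2)
    ll = np.log(kernel @ pmf.T).sum(axis=0)          # sum_k log p_{j,noisy}(y_k), for each j
    return int(np.argmin(-ll))
```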
Simulations
• Data Source: DARPA SensIT program
• Simulation Tool: MATLAB
Results for Two Targets (M = 2) – Probability of Error versus G (independent measurements)
  - Measurement SNR = -4 dB; Communication SNR = 0 dB, 10 dB
  - Measurement SNR = 10 dB; Communication SNR = 0 dB, 10 dB
Chernoff Bounds for M = 2 Targets with an Ideal Communication Channel
Results for M = 3 Targets – Probability of Error versus G (independent measurements)
  - Measurement SNR = -4 dB; Communication SNR = 0 dB, 10 dB
  - Measurement SNR = 10 dB; Communication SNR = 0 dB, 10 dB
Observations
• In all cases, the error probability of the optimal centralized classifier serves as the lower bound, and hence it is the best classifier possible.
• The key observation is that, for all the classifiers, the probability of error decreases exponentially with the number of independent measurements G (note that the plots are on a log scale). This indicates that reliable classification can be attained by combining a relatively moderate number of much less reliable independent local decisions.
• The performance improves with increasing measurement SNRs for all classifiers
• For any distributed classifier, the probability of error is higher with noisy communication links than with ideal communication links.
• The performance of the mixture Gaussian classifier (MGC) is close to that of the optimal distributed classifier in all cases, thereby making it an attractive choice in practice.
• The performance of the single Gaussian classifier (SGC) is worse than that of the MGC, and the difference is large at higher measurement and communication SNRs.
• In the simulations performed, the KL distances for all pairwise hypotheses were found to be positive and
hence perfect classification is possible in the limit of large G.
• The Chernoff bounds match the error exponent fairly well but exhibit an offset.
• The performance of all the classifiers is worse for the M = 3 case (eight hypotheses) than for the M = 2 case (four hypotheses).
• The difference between the MGC and the optimal distributed classifier seems to increase for the three-target case.
Conclusions
• The performance of the optimal centralized and distributed classifiers, which have exponential complexity, has been compared with that of the sub-optimal distributed classifiers with linear complexity
• Several issues warrant further investigation, such as:
• The signal model does not assume path loss in the sensing measurements. When path loss is considered, the G measurements will be independent but not identically distributed, and the path loss will limit the number of independent measurements.
• The relationship between the number of targets and the dimensionality of the feature vector: increasing the feature vector dimension may improve performance for a higher number of targets.
• The sub-optimal CSP algorithms can be combined with other sub-optimal approaches, such as tree-based classifiers, to develop classifiers with even lower complexity.