The Neighbor Joining
Tree-Reconstruction Technique
Lecture 13
©Shlomo Moran & Ilan Gronau
Recall: Distance-Based Reconstruction:
• Input: distances between all taxon-pairs
• Output: a tree (edge-weighted) best describing the distances

Example: a 6×6 distance matrix D and a weighted tree realizing it:

D:      1    2    3    4    5    6
   1    0    9   21   19   15   16
   2         0   22   20   16   17
   3              0   18   14   15
   4                   0    8    9
   5                        0    3
   6                             0

[Figure: the weighted tree realizing D; its edge weights are 4, 5, 10, 7, 6, 1, 2, 1, 2, 2]
Definition: Tree metrics (or additive distances) are distances which can be realized by a weighted tree.
Requirements from Distance-Based Tree-Reconstruction Algorithms

1. Consistency: if the input metric is a tree metric, the returned tree should be the (unique) tree which fits this metric.
2. Efficiency: polynomial time, preferably no more than O(n³), where n is the number of leaves (i.e., the distance matrix is n×n).
3. Robustness: if the input matrix is "close" to a tree metric, the algorithm should return the corresponding tree.

A natural family of algorithms which satisfies requirements 1 and 2 is called "Neighbor Joining", presented next. Then we present one such algorithm which is known to be robust in practice.
The Neighbor Joining Tree-Reconstruction Scheme

Start with an n×n distance matrix D over a set S of n taxa (or vertices, or leaves).

[Figure: the matrix D with a selected pair of rows i,j, and the reduced matrix D' in which a new row v replaces them]

1. Use D to select a pair of neighboring leaves (cherries) i,j.
2. Define a new vertex v as the parent of the cherries i,j.
3. Compute a reduced (n-1)×(n-1) distance matrix D', over S' = S \ {i,j} ∪ {v}.
   Important: we need to compute the distances from v to the other vertices in S', s.t. D' is a distance matrix of the reduced tree T', obtained by pruning i,j from T.
The Neighbor Joining Tree-Reconstruction Scheme (cont.)

[Figure: the reduced matrix D' and the reduced tree T', in which v replaces the cherries i,j]

4. Apply the method recursively on the reduced matrix D', to get the reduced tree T'.
5. In T', add i,j as children of v (and possibly update edge lengths).

Recursion base: when there are only two objects, return a tree with 2 leaves.
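The scheme is generic: concrete algorithms differ only in how step 1 selects neighbors and how step 3 updates distances. A minimal Python sketch of the recursive scheme, with the two policies passed in as functions (the names and the child-to-parent tree representation are ours, for illustration):

```python
def neighbor_joining(D, taxa, select_neighbors, reduce_matrix):
    """Generic NJ scheme over a dict-of-dicts distance matrix D.

    select_neighbors(D, taxa) -> (i, j)     # step 1: pick a cherry
    reduce_matrix(D, taxa, i, j, v) -> D'   # step 3: distances from the new v
    Returns the tree as a child -> parent map.
    """
    taxa = list(taxa)
    if len(taxa) == 2:                      # recursion base: a single edge
        i, j = taxa
        return {i: j}
    i, j = select_neighbors(D, taxa)        # step 1: choose the cherries i, j
    v = ("v", i, j)                         # step 2: new parent vertex for i, j
    D2 = reduce_matrix(D, taxa, i, j, v)    # step 3: reduced (n-1)x(n-1) matrix
    rest = [t for t in taxa if t not in (i, j)] + [v]
    tree = neighbor_joining(D2, rest, select_neighbors, reduce_matrix)  # step 4
    tree[i] = v                             # step 5: attach the cherries under v
    tree[j] = v
    return tree
```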
Consistency of Neighbor Joining

Theorem: Assume that the following holds for each input tree metric D defined by some weighted tree T:
1. Correct neighbor selection: the vertices chosen at step 1 are cherries in T.
2. Correct updating: the reduced matrix D' is a distance matrix of some weighted tree T', which is obtained by replacing in T the cherries i,j by their parent v (T' is the reduced tree).
Then the neighbor joining scheme is consistent: for each D which defines a tree metric, it returns the corresponding tree T.
Consistency proof

By the correct neighbor selection and the correct updating assumptions, the algorithm:
1. Selects i and j which are cherries in T.
2. Computes a distance matrix D' for the reduced subtree T'.
By induction (on the number of taxa), the reduced tree T' is correctly reconstructed.
Hence T is correctly reconstructed by adding i,j as children of v.
Consistent Neighbor Joining for Ultrametric Trees

First we show a NJ algorithm which is correct only for ultrametric trees.

Ultrametric matrix:

        A   B   C   D   E
   A    0   8   8   8   4
   B        0   4   6   8
   C            0   6   8
   D                0   8
   E                    0

[Figure: the ultrametric tree reconstructed by neighbor joining, drawn against a height scale with the root at height 4 and internal nodes at heights 3 and 2]

By the consistency theorem, we need to define correct neighbor selection and correct distance updates for ultrametric input matrices.

Solution:
Neighbor selection: select the closest leaves.
Distance updates: use the distances of (one of the) selected cherries.
A Consistent Neighbor Joining Algorithm for Ultrametric Matrices:

Neighbor selection: the two closest leaves i,j.
Updating distances: for each k, d'(v,k) = d(i,k) = d(j,k).

Example: the closest leaves i,j (d(i,j)=4) are replaced by their parent v:

        A   i   j   D   E                 A   v   D   E
   A    0   8   8   8   4            A    0   8   8   4
   i    8   0   4   6   8            v    8   0   6   8
   j    8   4   0   6   8            D    8   6   0   8
   D    8   6   6   0   8            E    4   8   8   0
   E    4   8   8   8   0

[Figure: the corresponding tree, in which v replaces the cherries i,j]
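Plugged into the generic neighbor_joining sketch above, the two ultrametric rules become (a sketch; the function names are ours):

```python
def select_closest(D, taxa):
    """Ultrametric neighbor selection: a globally closest pair is a cherry."""
    return min(((i, j) for i in taxa for j in taxa if i != j),
               key=lambda p: D[p[0]][p[1]])

def reduce_copy_row(D, taxa, i, j, v):
    """Ultrametric update: d'(v,k) = d(i,k) (= d(j,k) in an exact ultrametric)."""
    rest = [t for t in taxa if t not in (i, j)]
    D2 = {a: {b: D[a][b] for b in rest} for a in rest}
    D2[v] = {k: D[i][k] for k in rest}
    for k in rest:
        D2[k][v] = D[i][k]
    D2[v][v] = 0.0
    return D2
```

For instance, with the matrix above keyed by 'A','i','j','D','E', calling neighbor_joining(D, D.keys(), select_closest, reduce_copy_row) joins the pair i,j (distance 4) first.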
Robustness Requirement

In practice, it is important that the reconstruction algorithm be robust: if the input matrix is not ultrametric, the algorithm should return a "close" ultrametric tree.

Such a robust neighbor joining technique for ultrametric trees is UPGMA. It achieves its robustness by the way it updates the distances in the reduced matrix.

UPGMA is used in many other applications, such as data mining.
UPGMA Clustering
Unweighted Pair Group Method using Averages

UPGMA follows the "ultrametric neighbor joining scheme". The only difference is in the distance updating procedure.
Each vertex i is identified with a cluster Ci, consisting of its descendant leaves:
• Initially, for each leaf i there is a cluster Ci = {i}.
• Neighbor joining: the two closest vertices i and j are selected as neighbors, and replaced by a new vertex v. Set Cv = Ci ∪ Cj.
• Updating distances: for each k, the distance from k to v is the average of the distances from the objects in Ck to the objects in Cv.
One iteration of UPGMA

i and j are closest neighbors:
• Replace i and j by v.
• Update the distances from v to every other vertex k:

   D(v,k) = αD(i,k) + (1-α)D(j,k),   where   α = |Ci| / (|Ci| + |Cj|)

HW5 question: Show that this reduction formula guarantees the UPGMA invariant: the distance between any two vertices i,j is the average of the distances between the taxa in the corresponding clusters:

   d(i,j) = (1 / (|Ci|·|Cj|)) · Σ_{p∈Ci} Σ_{q∈Cj} d(p,q)
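A sketch of this reduction step in Python, assuming the matrix is a dict-of-dicts and the cluster sizes are tracked in a companion dict (the names are illustrative):

```python
def upgma_update(D, sizes, i, j, v):
    """UPGMA reduction: D(v,k) is the size-weighted average of D(i,k), D(j,k).

    D     : dict-of-dicts distance matrix (mutated in place)
    sizes : dict mapping each current vertex to its cluster size |C|
    """
    alpha = sizes[i] / (sizes[i] + sizes[j])      # α = |Ci| / (|Ci| + |Cj|)
    rest = [k for k in D if k not in (i, j)]
    D[v] = {k: alpha * D[i][k] + (1 - alpha) * D[j][k] for k in rest}
    for k in rest:
        D[k][v] = D[v][k]
    D[v][v] = 0.0
    sizes[v] = sizes[i] + sizes[j]                # Cv = Ci ∪ Cj
    for x in (i, j):                              # remove the joined rows/columns
        del D[x], sizes[x]
        for k in rest:
            D[k].pop(x, None)
    return D, sizes
```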
Complexity of UPGMA

• Naïve implementation: n iterations, O(n²) time for each iteration (to find a closest pair) ⇒ O(n³) total.
• Constructing heaps for each row and updating them each iteration: O(n² log n) total.
• Optimal implementation: O(n²) total time. One such implementation, using "mutually nearest neighbors", is presented next.
The "Nearest Neighbor Chain" Algorithm

Definition: j is a nearest neighbor (NN) of i if

   [j ≠ i] and [d(i,j) = min{d(i,k) : k ≠ i}]

(i,j) are mutual nearest neighbors if i is a NN of j and j is a NN of i; in other words, if:

   d(i,j) = min{d(i,k), d(j,k) : k ≠ i,j}

Basic observation: i,j are cherries in an ultrametric tree iff they are mutual nearest neighbors.
Implementing UPGMA by Mutual Nearest Neighbors

While (#vertices > 1) do:
• Choose i,j which are mutual nearest neighbors.
• Replace i,j by a new vertex v; set Cv = Ci ∪ Cj.
• Reduce the distance matrix D to D':
  for k ≠ v: D'(v,k) = αD(i,k) + (1-α)D(j,k),   where   α = |Ci| / (|Ci| + |Cj|)
Θ(n²) implementation of Mutual Nearest Neighbor selection:
use nearest neighbor chains.

C = (i0, i1, ..., il) is a Nearest Neighbor Chain if D(ir, ir+1) is minimal in row ir, i.e. ir+1 is a nearest neighbor of ir.

C is a complete NN chain if il-1, il are mutual nearest neighbors.

Constructing a complete NN chain:
• Extend: ir+1 is a nearest neighbor of ir.
• Stop: the final pair (il-1, il) are mutual nearest neighbors.

[Figure: rows of D with their row-minima marked, defining the chain i0 → i1 → i2, whose final pair are mutual NN]
A Θ(n²) implementation using Nearest Neighbor Chains:
- Extend a chain until it is complete.
- Select the final pair (i,j) for joining. Remove i,j from the chain, join them to a new vertex v, and compute the distances from v to all other vertices.
Note: in the reduced matrix, the remaining chain is still a NN chain, i.e. ir+1 is still the nearest neighbor of ir, since v did not become a nearest neighbor of any vertex in the chain (as i and j were not).
Complexity Analysis:
Count the number of "row-minimum" calculations (each taking O(n) time):
- n-1 terminations throughout the execution.
- 2(n-1) edge deletions ⇒ 2(n-1) extensions.
- Total for NN chain operations: O(n²).
- Updates: O(n) each iteration, total O(n²).
- Altogether O(n²).
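A sketch of the chain-based main loop, following the bookkeeping counted above and reusing the upgma_update sketch from before (an assumed helper, not a library routine):

```python
def upgma_nn_chain(D, sizes):
    """UPGMA via nearest-neighbor chains: amortized O(n^2) row-minimum work.

    D, sizes as in upgma_update. Returns the joins performed,
    as (i, j, new_vertex) triples.
    """
    joins, chain, count = [], [], 0
    while len(D) > 1:
        if not chain:
            chain.append(next(iter(D)))          # start a new chain anywhere
        i = chain[-1]
        j = min((k for k in D if k != i), key=lambda k: D[i][k])  # one row scan
        if len(chain) >= 2 and D[i][chain[-2]] <= D[i][j]:
            j = chain[-2]                        # prefer the predecessor on ties
        if len(chain) >= 2 and j == chain[-2]:   # mutual nearest neighbors found
            chain.pop(); chain.pop()
            v = ("v", count); count += 1
            upgma_update(D, sizes, i, j, v)      # join i,j into the new vertex v
            joins.append((i, j, v))
        else:
            chain.append(j)                      # extend the chain
    return joins
```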
Consistent Neighbor Joining for General Tree Metrics

Neighbor Selection in General Weighted Trees

Unlike in ultrametric trees, the closest vertices aren't necessarily cherries in general weighted trees. Hence we need a different way to select cherries.

Example: A and B are the closest pair (d(A,B)=3), yet in the realizing tree the cherries are {A,C} and {B,D}:

        A   B   C   D
   A    0   3   6   8
   B        0   7   7
   C            0  12
   D                0

[Figure: the realizing tree; leaf edges of weights 1 (A), 1 (B), 5 (C), 6 (D), and an internal edge of weight 1]

Idea: instead of using distances, use "LCA depths".
LCA Depth

Let i,j be leaves in T, and let r ∉ {i,j} be a vertex in T.
LCAr(i,j) is the Least Common Ancestor of i and j when r is viewed as a root.
If r is fixed we just write LCA(i,j).
dT(r,LCA(i,j)) is the "depth of LCAr(i,j)".

[Figure: a tree with root r and leaves i,j; the segment from r down to LCAr(i,j) has length dT(r,LCA(i,j))]
Matrix of LCA Depths

A weighted tree T with a designated root r defines a matrix of LCA depths. For example, dT(r,LCA(A,D)) = 3:

[Figure: a tree rooted at r over the leaves A,...,E, with edge weights 3, 5, 2, 3, 4, 2, 4, 3]

        A   B   C   D   E
   A    8   0   0   3   5
   B        9   5   0   0
   C            8   0   0
   D                7   3
   E                    7
Finding Cherries via LCA Depths

Let T be a weighted tree with a root r. For leaves i,j ≠ r, let L(i,j) = dT(r,LCA(i,j)).
Then i,j are cherries with parent v iff:

   ∀k ≠ i,j : L(i,j) > L(i,k), L(j,k)

In other words, i and j are cherries iff they have the same deepest ancestor. In this case we say that i and j are mutual deepest neighbors.

Matrices of LCA depths are called LCA matrices. Next we characterize such matrices.
LCA Matrices

Definition: A symmetric nonnegative matrix L is an LCA matrix iff:
1. For all i, L(i,i) = maxj L(i,j).
2. It satisfies the "3 points condition": any subset of 3 indices can be labeled i,j,k s.t. L(i,j) = L(i,k) ≤ L(j,k) (i.e., the minimal value appears twice).

[Figure: the LCA matrix of the previous slide, with a triplet i,j,k marked]
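A direct checker for this definition (a sketch, assuming L is a symmetric dict-of-dicts over the taxa):

```python
from itertools import combinations

def is_lca_matrix(L):
    """Check the two conditions defining an LCA matrix."""
    taxa = list(L)
    # condition 1: the diagonal entry dominates its row
    if any(L[i][i] != max(L[i][j] for j in taxa) for i in taxa):
        return False
    # condition 2 (3 points condition): the minimal value appears twice,
    # i.e. the two smallest of the three pairwise values are equal
    for i, j, k in combinations(taxa, 3):
        a, b, _ = sorted([L[i][j], L[i][k], L[j][k]])
        if a != b:
            return False
    return True
```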
LCA Matrices ⇔ Weighted Rooted Trees

Theorem: The following conditions are equivalent for a symmetric matrix L over a set S:
1. L is an LCA matrix.
2. There is a weighted tree T with a root r and leaf set S, s.t. for each i,j in S:
   L(i,j) = dT(r,LCA(i,j))
Weighted Tree T rooted at r ⇒ LCA Matrix:

[Figure: a weighted tree rooted at r over the leaves A,B,C,D]

L:      A   B   C   D
   A    7   4   3   1
   B        9   3   1
   C            6   1
   D                7

L(A,A) = 7 = dT(r,A)
L(A,B) = 4 = dT(r,LCA(A,B))
LCA Matrices weighted tree T rooted at r
Proof of this direction is identical to the proof that an ultrametric
matrix corresponds to distances in an ultrametric tree (that we saw
last week, also in HW5).
Alternative proof is by an algorithm that constructs a tree from LCA
matrix.
28
DLCA: a Neighbor Joining algorithm for LCA matrices

Input: an LCA matrix L over a set S.
Output: a tree T with leaves in S∪{r}, such that ∀i,j ∈ S: L(i,j) = dT(r,LCA(i,j)).

• Stopping condition: if S = {i} and L = [w], return the tree with the single edge (r,i) of weight w.
• Neighbor selection: choose mutual deepest neighbors i,j.
• Reduction: in the matrix L, delete rows i,j and add a new row v, with values:
  L(v,v) ← L(i,j);
  for k ≠ v: L(v,k) ← L(i,k)   //Note: L(i,k) = L(j,k)//
  Recursively call DLCA on the reduced matrix.
• Neighbor connection: in the returned tree, connect i and j to v, with edge weights:
  w(v,i) ← L(i,i) - L(i,j)
  w(v,j) ← L(j,j) - L(i,j)
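A sketch of DLCA under these rules (dict-of-dicts matrix; we select a globally deepest pair, which is in particular a pair of mutual deepest neighbors; the root label "r" is ours):

```python
def dlca(L):
    """DLCA sketch: build a tree realizing the LCA depths in L.

    L is a dict-of-dicts LCA matrix. Returns edges as (parent, child, weight).
    """
    taxa = list(L)
    if len(taxa) == 1:                       # stopping condition: edge (r, i)
        i = taxa[0]
        return [("r", i, L[i][i])]
    # neighbor selection: a pair maximizing L(i,j) is mutual deepest neighbors
    i, j = max(((a, b) for a in taxa for b in taxa if a != b),
               key=lambda p: L[p[0]][p[1]])
    v = ("v", i, j)
    rest = [t for t in taxa if t not in (i, j)]
    # reduction: L(v,k) = L(i,k) (= L(j,k)), and L(v,v) = L(i,j)
    L2 = {a: {b: L[a][b] for b in rest} for a in rest}
    L2[v] = {k: L[i][k] for k in rest}
    for k in rest:
        L2[k][v] = L[i][k]
    L2[v][v] = L[i][j]
    edges = dlca(L2)                         # recursive call on the reduced matrix
    edges.append((v, i, L[i][i] - L[i][j]))  # neighbor connection
    edges.append((v, j, L[j][j] - L[i][j]))
    return edges
```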
One Iteration of DLCA:

The mutual deepest neighbors A,E (L(A,E)=5) are replaced by a new vertex V:

        A   B   C   D   E                 V   B   C   D
   A    8   0   0   3   5            V    5   0   0   3
   B    0   9   5   0   0            B    0   9   5   0
   C    0   5   8   0   0            C    0   5   8   0
   D    3   0   0   7   3            D    3   0   0   7
   E    5   0   0   3   7

Neighbor connection (at the end): A and E are connected to V with edge weights
w(V,A) = L(A,A) - L(A,E) = 8 - 5 = 3 and w(V,E) = L(E,E) - L(A,E) = 7 - 5 = 2.
Θ(n²) implementation of DLCA by Deepest Neighbor Chains

The algorithm has n-1 iterations. In each iteration:
1. Mutual deepest neighbors are selected.
2. Two rows are deleted, and one row is added to the matrix.
Step 2 requires O(n) operations per iteration, total O(n²).
Step 1 (finding mutual deepest neighbors) can also be done in total O(n²) time, as done by the NN chains in the UPGMA algorithm.
Running DLCA from an (Additive) Distance Matrix D:

When the input is an (additive) distance matrix D, we apply to D the following LCA reduction to obtain an (LCA) matrix L:
• Choose any leaf as a root r.
• Set for all i,j: L(i,j) = ½(D(r,i) + D(r,j) - D(i,j)).
• Run DLCA on L.
Important observation: if D is an additive distance matrix corresponding to a tree T, then L is an LCA matrix in which L(i,j) = dT(r,LCA(i,j)).
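The reduction is one arithmetic line per entry; a sketch (r must index a row of D, and the diagonal of D is assumed 0, so L(i,i) = D(r,i)):

```python
def lca_reduction(D, r):
    """Turn an additive distance matrix into LCA depths relative to root r.

    L(i,j) = (D(r,i) + D(r,j) - D(i,j)) / 2 for all i,j != r.
    """
    taxa = [t for t in D if t != r]
    return {i: {j: (D[r][i] + D[r][j] - D[i][j]) / 2 for j in taxa}
            for i in taxa}
```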
Example

A tree with the corresponding additive distance matrix:

[Figure: the weighted tree, rooted at the leaf r, over the leaves A,B,C,D]

D:      A   B   C   D   r
   A    -   8   7  12   7
   B        -   9  14   9
   C            -  11   6
   D                -   7
   r                    -
Use D to compute an LCA matrix L:

   L(i,j) = ½(D(r,i) + D(r,j) - D(i,j))

For example: L(A,B) = ½(7 + 9 - 8) = 4

D:      A   B   C   D   r          L:      A   B   C   D
   A    -   8   7  12   7             A    7   4   3   1
   B        -   9  14   9             B        9   3   1
   C            -  11   6             C            6   1
   D                -   7             D                7
   r                    -
The relation of L to the original tree:

[Figure: the tree rooted at r, annotated with the LCA depths below]

L:      A   B   C   D
   A    7   4   3   1
   B        9   3   1
   C            6   1
   D                7

L(A,A) = 7 = dT(r,A)
L(A,B) = 4 = dT(r,LCA(A,B))
Discussion of DLCA

• Consistency: if the input matrix L is an LCA matrix, then the output is guaranteed to be the unique weighted tree which realizes the LCA distances in L.
• Complexity: it can be implemented in optimal O(n²) time.
• Robustness to noise:
  • Theoretical: it has optimal robustness when 0 ≤ α ≤ 1.
  • Practical: it is inferior to other NJ algorithms, possibly because its neighbor-selection criterion is biased by the selected root r.

Next we present a neighbor selection criterion which uses the original distance matrix. This criterion is known to be the most robust to noise in practice.
Saitou & Nei's Neighbor Joining Algorithm (1987)

• ~13,000 citations (Science Citation Index)
• Implemented in numerous phylogenetic packages
• Fastest implementation: Θ(n³)
• Usually referred to as "the NJ algorithm"
• Identified by its neighbor selection criterion

Saitou & Nei's neighbor-selection criterion:

   Q(i,j) = Σr D(r,i) + Σr D(r,j) - (n-2)·D(i,j)

Select i,j which maximize this sum.
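Precomputing row sums makes one selection round cost O(n²); a sketch of the criterion (assuming D is a dict-of-dicts with zero diagonal):

```python
def saitou_nei_pair(D):
    """Return the pair (i, j) maximizing Q(i,j) = R_i + R_j - (n-2) D(i,j),

    where R_i = sum over r of D(r,i) is the row sum of i.
    """
    taxa = list(D)
    n = len(taxa)
    R = {i: sum(D[i][k] for k in taxa) for i in taxa}   # row sums, O(n^2)
    return max(((i, j) for i in taxa for j in taxa if i != j),
               key=lambda p: R[p[0]] + R[p[1]] - (n - 2) * D[p[0]][p[1]])
```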
Consistency of the Saitou & Nei method

   Q(i,j) = Σr D(r,i) + Σr D(r,j) - (n-2)·D(i,j)

Theorem (Saitou & Nei): Assume all edge weights of T are positive.
If Q(i,j) = max_{i',j'} Q(i',j'), then i and j are cherries in the tree.
Proof: in the following slides.
1st step in the proof:
Express the Saitou & Nei selection criterion in terms of LCA depths.

Saitou & Nei's selection criterion: select i,j which maximize

   Q(i,j) = Σr D(r,i) + Σr D(r,j) - (n-2)·D(i,j)
          = 2·( Σ_{r≠i,j} d(r, LCAr(i,j)) + D(i,j) )

Intuition: NJ "tries" to select a taxon-pair whose average LCA is deepest.
The addition of D(i,j) is needed to make the formula consistent.
Next we prove the above equality.
Proof of the equality in the previous slide:

   Q(i,j) = ( D(i,j) + Σ_{r≠i,j} D(i,r) ) + ( D(j,i) + Σ_{r≠i,j} D(j,r) ) - (n-2)·D(i,j)

          = Σ_{r≠i,j} [ D(i,r) + D(j,r) - D(i,j) ] + 2·D(i,j)

          = 2·( D(i,j) + Σ_{r≠i,j} d(r, LCAr(i,j)) )

where the last step uses the fact that for each r ≠ i,j:
D(i,r) + D(j,r) - D(i,j) = 2·d(r, LCAr(i,j)).
2nd step in the proof:
Consistency of Saitou & Nei neighbor selection

We need to show that a pair of leaves i,j which maximize

   Q'(i,j) = Q(i,j)/2 = D(i,j) + Σ_{r≠i,j} d(r, LCAr(i,j))

must be cherries. First we express Q' as a sum of edge weights.

For a vertex i and an edge e, let Ni(e) = |{r∈S : e is on path(i,r)}|. Then:

   Q'(i,j) = D(i,j) + Σ_{r≠i,j} d(r, LCAr(i,j))
           = Σ_{e∈path(i,j)} w(e) + Σ_{e∉path(i,j)} Ni(e)·w(e)

[Figure: the rest of T hanging off path(i,j); each r≠i,j contributes the length of its path down to path(i,j)]

Note: if e' is a "leaf edge", then w(e') is added exactly once to Q'(i,j).
Consistency of Saitou & Nei (cont.)

Assume for contradiction that Q'(i,j) is maximized for i,j which are not cherries.
Let (see the figure below):
• path(i,j) = (i,...,k,j).
• T1 = the subtree rooted at k; WLOG T1 has at most n/2 leaves.
• T2 = T \ T1.
Let i',j' be any two cherries in T1. We will show that Q'(i',j') > Q'(i,j).

[Figure: path(i,j) enters T1 at k; T1 contains j and the cherries i',j'; T2 is the rest of T]
Consistency of Saitou & Nei (cont.)

Proof that Q'(i',j') > Q'(i,j):

   Q'(i,j)   = Σ_{e∈path(i,j)} w(e)   + Σ_{e∉path(i,j)} Ni(e)·w(e)

   Q'(i',j') = Σ_{e∈path(i',j')} w(e) + Σ_{e∉path(i',j')} Ni'(e)·w(e)

Each leaf edge e adds w(e) both to Q'(i,j) and to Q'(i',j'), so we can ignore the contribution of leaf edges to both Q'(i,j) and Q'(i',j').
Consistency of Saitou & Nei (end)

Contribution of internal edges to Q'(i,j) and to Q'(i',j'):

   Location of internal edge e    # w(e) added to Q'(i,j)    # w(e) added to Q'(i',j')
   e ∈ path(i,j)                  1                          Ni'(e) ≥ 2
   e ∈ path(i',j)                 Ni(e) < n/2                Ni'(e) ≥ n/2
   e ∈ T \ path(i,i')             Ni(e)                      Ni'(e) = Ni(e)

Since there is at least one internal edge e in path(i,j), Q'(i',j') > Q'(i,j). QED
Complexity of the Saitou & Nei NJ Algorithm

Initialization: Θ(n²) to compute Q(i,j) for all i,j.
Each iteration: O(n²) to find the maximal Q(i,j), and to update the values of Q(x,y).
Total: O(n³)
A characterization of additive metrics: the 4 points condition

Ultrametric distances and LCA distances were shown to satisfy "3 points conditions". Tree metrics (aka "additive distances") have a characterization known as the "4 points condition", which we present next.

Distances on 3 objects are always realizable by a (unique) tree with one internal node m:

        i     j     k
   i    0    a+b   a+c
   j          0    b+c
   k                0

d(i,j) = a + b,  d(i,k) = a + c,  d(j,k) = b + c,

where a = d(i,m), b = d(j,m), c = d(k,m). For instance,

   c = d(k,m) = ½[d(i,k) + d(j,k) - d(i,j)] ≥ 0
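Solving these three equations gives each edge weight directly; a small sketch:

```python
def three_leaf_tree(d_ij, d_ik, d_jk):
    """Edge weights a, b, c of the unique star tree on leaves i, j, k.

    a = d(i,m), b = d(j,m), c = d(k,m) for the internal node m.
    """
    a = (d_ij + d_ik - d_jk) / 2
    b = (d_ij + d_jk - d_ik) / 2
    c = (d_ik + d_jk - d_ij) / 2
    return a, b, c

# e.g. three_leaf_tree(9, 12, 17) returns (2.0, 7.0, 10.0)
```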
How about four objects?

Not all distance metrics on 4 objects are additive:
e.g., there is no tree which realizes the distances below.

        i   j   k   l
   i    0   2   2   2
   j        0   2   2
   k            0   3
   l                0
The Four Points Condition

A necessary condition for distances on four objects to be additive: the objects can be labeled i,j,k,l so that:

   d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)

Here {{i,j},{k,l}} is a "split" of {i,j,k,l}.
Proof: By the figure...

[Figure: the quartet tree with split {{i,j},{k,l}}; the two larger sums both traverse the internal edge]
The Four Points Condition

Definition: A distance metric satisfies the four points condition iff any subset of four objects can be labeled i,j,k,l so that:

   d(i,k) + d(j,l) = d(i,l) + d(k,j) ≥ d(i,j) + d(k,l)
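Equivalently, in every quartet the two largest of the three pairing sums must be equal; a sketch of a checker (with a tolerance for noisy input):

```python
from itertools import combinations

def satisfies_4pc(D, tol=1e-9):
    """4PC: in every quartet, the two largest pairing sums are equal."""
    for i, j, k, l in combinations(D, 4):
        sums = sorted([D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True
```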
Equivalence of the Four Points Condition and the Three Points Condition

Label the quartet {r,j,k,l} so that the 4PC reads:

   d(r,k) + d(j,l) = d(r,l) + d(j,k) ≥ d(r,j) + d(k,l)

The equality gives d(r,k) - d(j,k) = d(r,l) - d(j,l), hence

   2d(r,LCA(j,k)) = d(r,j) + d(r,k) - d(j,k) = d(r,j) + d(r,l) - d(j,l) = 2d(r,LCA(j,l))

The inequality gives d(r,l) - d(k,l) ≥ d(r,j) - d(j,k), hence

   2d(r,LCA(k,l)) = d(r,k) + d(r,l) - d(k,l) ≥ d(r,k) + d(r,j) - d(j,k) = 2d(r,LCA(j,k))

So the LCA depths from r satisfy L(j,k) = L(j,l) ≤ L(k,l), which is the 3PC. I.e., a matrix D satisfies the 4PC on all quartets that include r iff the LCA reduction applied to D and r outputs a matrix L which satisfies the 3PC for LCA distances.
The Four Points Condition

Theorem: The following 3 conditions are equivalent for a distance matrix D on a set S of n objects:
1. D is additive.
2. D satisfies the four points condition for all quartets in S.
3. There is a vertex r in S, s.t. D satisfies the 4 points condition for all quartets that include r.
The Four Points Condition

Proof: we'll show that 1 ⇒ 2 ⇒ 3 ⇒ 1.

1 ⇒ 2 (additivity ⇒ the 4P condition is satisfied by all quartets): by the figure...

[Figure: a quartet i,j,k,l embedded in a tree, with split {{i,j},{k,l}}]

2 ⇒ 3: trivial.
Proof that 3 ⇒ 1
(4PC on all quartets which include r ⇒ additivity)

The proof is as follows:
• All quartets in D which include r satisfy the 4PC ⇒
• the matrix L obtained by applying the LCA reduction on D and r is an LCA matrix ⇒
• the tree T output by running DLCA on L realizes the LCA depths in L ⇒
• T realizes the distances in D.