Chapter 7 Genome Rearrangements


Phylogenetic Tree
2015/7/7
Evolution
Evolution of organisms is driven by:
- Diversity: different individuals carry different variants of the same basic blueprint.
- Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Basic Assumptions
- More closely related organisms have more similar genomes.
- Highly similar genes are homologous (have the same ancestor).
- A universal ancestor exists for all life forms.
- Phylogenetic relationships can be expressed by a dendrogram (a “tree”).
Phylogenetic tree
A phylogenetic tree is a tree that describes the sequence of speciation events that led to the formation of a set of present-day species.
Common Phylogenetic Tree Terminology
[Figure: a rooted tree with five taxa, A to E. Labels: terminal nodes (the taxa), branches or lineages, internal nodes, and the ancestral node or ROOT of the tree.]
Phylogenetic trees diagram the evolutionary relationships between the taxa
[Figure: rooted tree with leaves, top to bottom: Taxon B, Taxon C, Taxon A, Taxon D, Taxon E.]
- There is no meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
- The dimension along the branches either can have no scale, can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive trees’), or can be proportional to time.
- ((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses.
This says that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
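The nested-parentheses notation can be read mechanically. The following is a minimal sketch, assuming a topology-only string with no branch lengths and no trailing semicolon; the function names `parse_newick` and `clades` are illustrative and not taken from the slides.

```python
def parse_newick(text):
    """Parse a topology-only nested-parentheses string into nested tuples."""
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]            # a leaf label

    return node()


def clades(tree):
    """Return (leaves under this node, list of every multi-leaf clade below it)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found += child_found
    return leaves, found + [leaves]


tree = parse_newick("((A,(B,C)),(D,E))")
_, all_clades = clades(tree)
clade_names = sorted("".join(sorted(c)) for c in all_clades)
print(clade_names)
```

The listed clades include BC, ABC, and DE, which is the grouping described in the text: B with C, then A with that pair, and D with E as the sister group.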
Historical Note
- Until the mid-1950s, phylogenies were constructed by experts based on their opinion (subjective criteria).
- Since then, the focus has been on objective criteria for constructing phylogenetic trees.
  - Thousands of articles in the last decades
- Important for many aspects of biology
  - Classification
  - Understanding biological mechanisms
Morphological vs. Molecular
- Classical phylogenetic analysis: morphological features (number of legs, lengths of legs, etc.)
- Modern biological methods allow the use of molecular features
  - Gene sequences
  - Protein sequences
- Analysis is based on homologous sequences in different species
Morphological topology
(Based on McKenna and Bell, 1997)
[Figure: mammalian phylogeny based on morphology. Leaf taxa, top to bottom:]
Bonobo
Chimpanzee
Man
Gorilla
Sumatran orangutan
Bornean orangutan
Common gibbon
Barbary ape
Baboon
White-fronted capuchin
Slow loris
Tree shrew
Japanese pipistrelle
Long-tailed bat
Jamaican fruit-eating bat
Horseshoe bat
Little red flying fox
Ryukyu flying fox
Mouse
Rat
Vole
Cane-rat
Guinea pig
Squirrel
Dormouse
Rabbit
Pika
Pig
Hippopotamus
Sheep
Cow
Alpaca
Blue whale
Fin whale
Sperm whale
Donkey
Horse
Indian rhino
White rhino
Elephant
Aardvark
Grey seal
Harbor seal
Dog
Cat
Asiatic shrew
Long-clawed shrew
Small Madagascar hedgehog
Hedgehog
Gymnure
Mole
Armadillo
Bandicoot
Wallaroo
Opossum
Platypus
Clade labels: Archonta, Glires, Ungulata, Carnivora, Insectivora, Xenarthra
From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use.
Mitochondrial topology
(Based on Pupko et al.)
[Figure: mammalian phylogeny based on mitochondrial sequences. Leaf taxa, top to bottom:]
Donkey
Horse
Indian rhino
White rhino
Grey seal
Harbor seal
Dog
Cat
Blue whale
Fin whale
Sperm whale
Hippopotamus
Sheep
Cow
Alpaca
Pig
Little red flying fox
Ryukyu flying fox
Horseshoe bat
Japanese pipistrelle
Long-tailed bat
Jamaican fruit-eating bat
Asiatic shrew
Long-clawed shrew
Mole
Small Madagascar hedgehog
Aardvark
Elephant
Armadillo
Rabbit
Pika
Tree shrew
Bonobo
Chimpanzee
Man
Gorilla
Sumatran orangutan
Bornean orangutan
Common gibbon
Barbary ape
Baboon
White-fronted capuchin
Slow loris
Squirrel
Dormouse
Cane-rat
Guinea pig
Mouse
Rat
Vole
Hedgehog
Gymnure
Bandicoot
Wallaroo
Opossum
Platypus
Clade labels: Perissodactyla, Carnivora, Cetartiodactyla, Chiroptera, Moles+Shrews, Afrotheria, Xenarthra, Lagomorpha + Scandentia, Primates, Rodentia 1, Rodentia 2, Hedgehogs
What can we get from phylogenetic trees?
A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data:
- Which species are the closest living relatives of modern humans?
- Did the infamous Florida Dentist infect his patients with HIV?
Which species are the closest living relatives of modern humans?
[Figure: time-scaled tree of Humans, Chimpanzees, Bonobos, Gorillas, and Orangutans; time axis from 14 to 0 MYA.]
Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization all show that bonobos and chimpanzees are more closely related to humans than either is to gorillas.
Did the Florida Dentist infect his patients with HIV?
[Figure: phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people (local controls).]
- Yes: The HIV sequences from these patients (Patients A, B, C, E, and G in the figure) fall within the clade of HIV sequences found in the dentist.
- No: Patient D and Patient F, whose sequences appear among the local controls.
Types of trees
An unrooted tree represents the same phylogeny without the root node.
Rooted versus unrooted trees
[Figure: three rooted trees (Tree A, Tree B, Tree C) and one unrooted tree labeled a, b, c.]
The unrooted tree represents the three rooted trees.
Inferring evolutionary relationships between the taxa requires rooting the tree:
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
[Figure: an unrooted tree on taxa A, B, C, D with the root position marked, and the resulting rooted tree.]
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
Now, try it again with the root at another position:
[Figure: the same unrooted tree on taxa A, B, C, D with the root marked on a different branch, and the resulting rooted tree.]
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
An unrooted, four-taxon tree theoretically can be rooted in five different places to produce five different rooted trees
[Figure: the unrooted tree 1 on taxa A, B, C, D with its five branches numbered 1 to 5, and the five resulting rooted trees 1a, 1b, 1c, 1d, and 1e.]
These trees show five different evolutionary relationships among the taxa!
Each unrooted tree theoretically can be rooted anywhere along any of its branches
[Figure: example unrooted trees with four, five, and six taxa.]

# Taxa   # Unrooted trees   x # Roots   = # Rooted trees
3        1                  3           3
4        3                  5           15
5        15                 7           105
6        105                9           945
7        945                11          10,395
8        10,395             13          135,135
9        135,135            15          2,027,025
...
30       ~8.69 x 10^36      57          ~4.95 x 10^38

For N taxa:
# Unrooted trees: $\frac{(2N-5)!}{2^{N-3}\,(N-3)!}$
# Roots: $2N-3$
# Rooted trees: $\frac{(2N-3)!}{2^{N-2}\,(N-2)!}$
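The counts can be checked numerically. The sketch below uses the fact that (2N - 5)! / (2^(N - 3) (N - 3)!) equals the product of the odd numbers 1 * 3 * ... * (2N - 5); the function names are illustrative only.

```python
from math import factorial


def odd_product(k):
    """Product of the odd numbers 1 * 3 * ... * k (1 if k < 3)."""
    result = 1
    for odd in range(3, k + 1, 2):
        result *= odd
    return result


def unrooted_trees(n):
    """Number of unrooted bifurcating trees on n >= 3 labeled taxa."""
    return odd_product(2 * n - 5)


def roots(n):
    """Number of branches of an unrooted tree, i.e. possible root positions."""
    return 2 * n - 3


def rooted_trees(n):
    """Each unrooted tree can be rooted on any of its 2n - 3 branches."""
    return unrooted_trees(n) * roots(n)


for n in range(3, 10):
    print(n, unrooted_trees(n), roots(n), rooted_trees(n))
print(30, f"{unrooted_trees(30):.2e}", roots(30), f"{rooted_trees(30):.2e}")
```

Because Python integers have arbitrary precision, the same functions also work for 30 taxa, where the counts have dozens of digits.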
There are two major ways to root trees:
By outgroup: Uses taxa (the “outgroup”) that are known to fall outside of the group of interest (the “ingroup”). Requires some prior knowledge about the relationships among the taxa.
By midpoint or distance: Roots the tree at the midway point between the two most distant taxa in the tree, as determined by branch lengths. This assumption is built into some of the distance-based tree building methods.
[Figure: unrooted tree on taxa A, B, C, D with branch lengths 10, 3, 2, 2, 5.]
d(A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
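The midpoint arithmetic can be turned into a small helper that reports which branch on the A-to-D path contains the root. This is a sketch only: it assumes the branch lengths are listed in the order they are walked from A to D (10, then 3, then 5), and `midpoint_on_path` is a hypothetical name.

```python
def midpoint_on_path(branch_lengths):
    """Locate the midpoint of a path given its branch lengths in walking order.

    Returns (index of the branch containing the midpoint,
             distance of the midpoint from the start of the path).
    """
    half = sum(branch_lengths) / 2
    walked = 0
    for index, length in enumerate(branch_lengths):
        if walked + length >= half:
            return index, half
        walked += length
    raise ValueError("path has no branches")


branch, distance_from_a = midpoint_on_path([10, 3, 5])
print(branch, distance_from_a)
```

Under that assumption the root falls on the first branch (the one leading to A), 9 units from A and 1 unit from the internal node at the other end of that branch.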
Two Methods of Tree Construction
- Distance: a tree built by recursively combining the two nodes with the smallest distance.
- Parsimony: a tree with the minimum total number of character changes between nodes.
Types of data used in phylogenetic inference:
Character-based methods: Use the aligned characters, such as DNA or protein sequences, directly during tree inference.

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Distance-based methods: Transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.

            A      B      C      D      E
Species A   ---    0.23   0.87   0.73   0.59
Species B   0.20   ---    0.59   1.12   0.89
Species C   0.50   0.40   ---    0.17   0.61
Species D   0.45   0.55   0.15   ---    0.31
Species E   0.40   0.50   0.40   0.25   ---
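As an illustration of the transformation from characters to distances, the sketch below computes the fraction of differing sites for every pair of the five example sequences. The values it produces (0.20 for A and B, 0.50 for A and C, 0.15 for C and D, and so on) are the ones that sit in the rows below the diagonal of the matrix; the other half of the matrix holds larger values, whose derivation the slide does not state. The names `SEQS` and `p_distance` are illustrative.

```python
from itertools import combinations

# The aligned example sequences from the slide.
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def p_distance(s, t):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


p_matrix = {
    (x, y): p_distance(SEQS[x], SEQS[y]) for x, y in combinations(SEQS, 2)
}
for (x, y), d in p_matrix.items():
    print(x, y, f"{d:.2f}")
```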
Distance-Based Method
Input: distance matrix between species
For two sequences s_i and s_j, perform a pairwise (global) alignment. Let f = the fraction of sites with different residues. Then

$$d_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}f\right) \quad \text{(Jukes-Cantor model)}$$

Outline:
- Cluster species together
- Initially clusters are singletons
- At each iteration combine the two “closest” clusters to get a new one
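A minimal sketch of the Jukes-Cantor correction, assuming f has already been measured from an alignment. The function name `jukes_cantor` is illustrative, and the range check reflects the fact that the logarithm is undefined once f reaches 3/4.

```python
import math


def jukes_cantor(f):
    """Jukes-Cantor distance for a fraction f of differing sites (0 <= f < 0.75)."""
    if not 0 <= f < 0.75:
        raise ValueError("f must satisfy 0 <= f < 0.75")
    return -0.75 * math.log(1 - 4 * f / 3)


for f in (0.05, 0.20, 0.50):
    print(f, round(jukes_cantor(f), 3))
```

The corrected distance is always at least as large as f, and the gap widens as f grows, because multiple substitutions at the same site become more likely and go unobserved in the raw count.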
Unweighted Pair Group Method using Arithmetic Averages (UPGMA)
UPGMA is a type of distance-based algorithm.
UPGMA steps:
1. Cluster the two species with the smallest distance, putting them into a single group.
2. Recalculate the distance matrix with the new group against the other groups.
3. With the new distance matrix, repeat step 1 until all species have been grouped.
Algorithm
UPGMA Step 1
Species   A      B      C      D
B         9      –      –      –
C         8      11     –      –
D         12     15     10     –
E         15     18     13     5

Merge D & E (smallest distance, 5):
d(DE)A = 0.5 * (dDA + dEA) = 0.5 * (12 + 15) = 13.5
d(DE)B = 0.5 * (dDB + dEB) = 0.5 * (15 + 18) = 16.5
d(DE)C = 0.5 * (dDC + dEC) = 0.5 * (10 + 13) = 11.5

Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5
[Figure: partial tree joining D and E.]
UPGMA Step 2
Species   A      B      C
B         9      –      –
C         8      11     –
DE        13.5   16.5   11.5

Merge A & C (smallest distance, 8):

Species   B      AC
AC        10     –
DE        16.5   12.5
[Figure: partial trees joining A with C and D with E.]
UPGMA Steps 3 & 4
Species   B      AC
AC        10     –
DE        16.5   12.5

Step 3: Merge B & AC (smallest distance, 10).
Step 4: Merge ABC & DE.
Resulting tree: (((A,C),B),(D,E))
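The worked example can be reproduced with a short program. This is a sketch, not the algorithm slide's own pseudocode: it averages over all leaf pairs by weighting each cluster by its size, which gives the same numbers as the 0.5 * (...) formulas above whenever the two merged clusters have equal size. The function name `upgma` and the data layout are assumptions made for illustration.

```python
from itertools import combinations


def upgma(names, leaf_dist):
    """Cluster leaves by UPGMA.

    names: list of leaf labels.
    leaf_dist: dict mapping frozenset({x, y}) to the distance between leaves.
    Returns (nested-parentheses string, list of (cluster_a, cluster_b, distance)).
    """
    size = {name: 1 for name in names}   # cluster label -> number of leaves
    dist = dict(leaf_dist)               # working copy, extended as we merge
    merges = []
    while len(size) > 1:
        # Step 1: find the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        merges.append((a, b, dist[frozenset((a, b))]))
        # Step 2: distance from the new cluster to every other cluster.
        for other in size:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                size[a] * dist[frozenset((a, other))]
                + size[b] * dist[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size)), merges


# Distance matrix from the UPGMA Step 1 slide.
pairs = {("A", "B"): 9, ("A", "C"): 8, ("B", "C"): 11, ("A", "D"): 12,
         ("B", "D"): 15, ("C", "D"): 10, ("A", "E"): 15, ("B", "E"): 18,
         ("C", "E"): 13, ("D", "E"): 5}
leaf_dist = {frozenset(p): d for p, d in pairs.items()}
newick, merges = upgma(list("ABCDE"), leaf_dist)
for a, b, d in merges:
    print(f"merge {a} and {b} at distance {d:.2f}")
print(newick)
```

The merge order is D with E, then A with C, then B with AC, then the two remaining clusters, which is the same topology as the slides. The printed string lists the children in a different order from the slide, but it describes the same tree.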
Most Parsimonious Tree (MP Tree)
Optimality criterion: The ‘most-parsimonious’ tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
Parsimony score: Number of character changes (mutations) along the evolutionary tree.
Example:
[Figure: two candidate trees whose nodes are labeled with sequences such as AAA, AAG, AGA, and GGA, with the number of changes marked on each branch. One tree has score = 4, the other has score = 3.]
Most parsimonious tree: the tree with the minimal parsimony score.
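The parsimony score of a fixed topology can be computed without guessing the internal sequences by hand. The sketch below uses Fitch's small-parsimony algorithm, which is a standard technique but is not named on the slide. The leaf sequences AAG, AAA, AGA, and GGA and the two topologies are a reconstruction from the figure labels and should be treated as an assumption.

```python
def fitch_score(tree):
    """Minimum number of changes for a rooted binary tree.

    tree: nested 2-tuples with equal-length sequence strings at the leaves.
    """
    def walk(node):
        if isinstance(node, str):
            return [{ch} for ch in node], 0           # one state set per site
        (left_sets, left_cost), (right_sets, right_cost) = walk(node[0]), walk(node[1])
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)                   # children agree: no change
            else:
                sets.append(a | b)                    # children disagree: one change
                cost += 1
        return sets, cost

    return walk(tree)[1]


tree_1 = (("AAG", "AGA"), ("AAA", "GGA"))
tree_2 = (("AAG", "AAA"), ("AGA", "GGA"))
print(fitch_score(tree_1), fitch_score(tree_2))
```

With these assumed leaves the two topologies score 4 and 3, matching the two scores shown in the example, so the second grouping is the more parsimonious one.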
There are many trees...
We cannot go over all the trees, so we need a better way to find the best tree.
There are approximate solutions. But what if we want to be sure we find the global optimum (the minimum parsimony score)?
There is a way that is more efficient than going over all possible trees. It is called BRANCH AND BOUND; it is a general technique in computer science that can be applied to phylogeny.
BRANCH AND BOUND
To illustrate the BRANCH AND BOUND (BNB) method, we will use an example not connected to evolution: the traveling salesperson problem (TSP). Later, once the general BNB method is understood, we will see how to apply it to finding the MP tree.
Branch and Bound for TSP
- Find a minimum-cost round-trip path that visits each intermediate city exactly once.
- Greedy approach: A,G,E,F,B,D,C,A = 251
[Figure: weighted graph on cities A to G. Edge weights shown: 93, 20, 17, 46, 82, 59, 31, 57, 12, 82, 35, 68, 15.]
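Search all possible paths
[Figure: the same weighted graph on cities A to G, next to a search tree of partial paths starting at A.]
All paths:
- AG (20)
  - AGF (88): AGFB, AGFE, AGFC
  - AGE (55)
- AB (46)
- AC (93)
  - ACB (175)
    - ACBE (257)
  - ACD
  - ACF
Best estimate: 251
The partial path ACBE already costs 257, more than the best estimate of 251, so no path that starts with ACBE needs to be explored.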
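The pruning idea can be sketched in a few lines. The edge weights of the seven-city graph cannot be fully recovered from the transcript, so the example below uses a small hypothetical four-city distance matrix (`DIST`) instead; the function name is also illustrative. The bound relies on all weights being non-negative, so a partial path can never get cheaper by adding cities.

```python
import math

# Hypothetical symmetric distances between four cities, indexed 0..3.
DIST = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]


def tsp_branch_and_bound(dist):
    """Return (cost, tour) of a cheapest round trip that starts and ends at city 0."""
    n = len(dist)
    best_cost, best_tour = math.inf, None

    def extend(path, cost, remaining):
        nonlocal best_cost, best_tour
        if cost >= best_cost:
            return                                    # bound: cannot beat the record
        if not remaining:
            total = cost + dist[path[-1]][path[0]]    # close the round trip
            if total < best_cost:
                best_cost, best_tour = total, path + [path[0]]
            return
        # Branch: try nearer cities first so a good record is found early.
        for city in sorted(remaining, key=lambda c: dist[path[-1]][c]):
            extend(path + [city], cost + dist[path[-1]][city], remaining - {city})

    extend([0], 0, frozenset(range(1, n)))
    return best_cost, best_tour


best_cost, best_tour = tsp_branch_and_bound(DIST)
print(best_cost, best_tour)
```

Trying the nearest city first mirrors the greedy tour on the slide: it supplies an early record, and every later partial path is compared against it.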
Back to finding the MP tree
BNB helps, though it is still exponential…
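The MP search tree
[Figure: search tree of partial phylogenies. The root is the three-taxon tree on taxa 1, 2, and 3. At the next level, taxon 4 is added to branch 1, 2, or 3. Each four-taxon tree has 5 branches, so at the following level taxon 5 is added to one of them (e.g., "5 is added to branch 2").]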
The MP search tree
[Figure: the search tree annotated with parsimony scores. Root (three-taxon tree): 30. Adding taxon 4 gives three trees with scores 43, 55, and 39. Adding taxon 5 gives complete trees with scores 52, 54, 52, 53, 58 (under 43); 61, 56, 59, 61, 69 (under 55); and 53, 51, 42, 47, 47 (under 39).]
MP-BNB
- Search the subtree under 43 first. Best (minimum) value = 52, so the best record = 52.
- The partial tree with score 55 is already worse than the record of 52, so its subtree (61, 56, 59, 61, 69) is pruned without being evaluated.
- Search the subtree under 39. The best record improves: 52, then 51, then 42.
- Total # trees visited: 14
- Best tree: MP score = 42
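Order of Evaluation Matters
Evaluate all 3 four-taxon trees first (43, 55, 39), then search the subtree under the lowest score, 39, whose complete trees score 53, 51, 42, 47, 47. The bound after searching this subtree will be 42, so the subtrees under 43 and 55 are pruned.
Total trees visited: 9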
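Both visiting orders can be simulated on the scores from the MP-BNB slides. This is a sketch: the assignment of complete-tree scores to their parents is read off the slide sequence and is an assumption, and "visited" here means "had its parsimony score evaluated". The pruning is valid because adding a taxon can never lower the parsimony score, so a partial tree's score is a lower bound for everything beneath it.

```python
# (score, children) pairs; complete trees have no children.
SEARCH_TREE = (30, [
    (43, [(52, []), (54, []), (52, []), (53, []), (58, [])]),
    (55, [(61, []), (56, []), (59, []), (61, []), (69, [])]),
    (39, [(53, []), (51, []), (42, []), (47, []), (47, [])]),
])


def branch_and_bound(root, best_first=False):
    """Return (best complete-tree score, number of trees evaluated)."""
    best = float("inf")
    visited = 1                                  # the root is always evaluated

    def expand(node):
        nonlocal best, visited
        score, children = node
        if not children:                         # complete tree: update the record
            best = min(best, score)
            return
        if best_first:
            visited += len(children)             # evaluate all children first
            for child in sorted(children, key=lambda c: c[0]):
                if child[0] < best:              # otherwise prune
                    expand(child)
        else:
            for child in children:
                visited += 1                     # evaluate, then descend at once
                if child[0] < best:              # otherwise prune
                    expand(child)

    expand(root)
    return best, visited


print(branch_and_bound(SEARCH_TREE))
print(branch_and_bound(SEARCH_TREE, best_first=True))
```

Both orders find the same best score; only the number of evaluated trees differs, which is the point of the slide.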