Transcript Document

Chapter

Evolutionary Changes in Nucleotide Sequences

Chau-Ti Ting [email protected]

Unless noted, the course materials are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan (CC BY-NC-SA 3.0)

Introduction

Calculating the distance between two sequences is the simplest phylogenetic analysis. It is important because:

1) It is the first step in distance methods for phylogeny reconstruction.

2) Markov-process models of nucleotide substitution used in distance calculation form the basis of likelihood and Bayesian analysis.

The distance between two nucleotide sequences is defined as the expected number of nucleotide substitutions per site.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.

The simplest distance measure is the proportion of different sites, sometimes called the p-distance. If 10 sites differ between two sequences, each 100 bp long, then p = 10% = 0.1.

However, a variable site may result from more than one substitution, and even a constant site may harbor back or parallel substitutions.

Multiple hits: multiple substitutions at the same site (i.e., some changes are hidden). Note: p is usable only for highly similar sequences, with p < 5%.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.

Figure: the ancestral sequence ACTGAACGTAACGC and two descendant sequences, ACTGGAGGAATCGC and AATGAAAGAATCGC.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.

Jukes and Cantor's one-parameter model

This simple model assumes that substitutions occur with equal probability among the four nucleotide types. The rate of substitution for each nucleotide is $3\alpha$ per unit time, and the rate of substitution in each of the three possible directions of change is $\alpha$. Because the model involves a single parameter, $\alpha$, it is called the one-parameter model.

Figure: the one-parameter model of nucleotide substitution among A, G, C, and T, in which the rate of substitution in each direction is $\alpha$.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

National Taiwan University Chau-Ti Ting

Since we start with A, the probability that this site is occupied by A at time 0 is $P_{A(0)} = 1$. At time 1, the probability of still having A at this site is given by

$$P_{A(1)} = 1 - 3\alpha$$

in which $3\alpha$ is the probability of A changing to T, C, or G, and $1 - 3\alpha$ is the probability that A has remained unchanged.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

The probability of having A at time 2 is

$$P_{A(2)} = (1 - 3\alpha) P_{A(1)} + \alpha \left[1 - P_{A(1)}\right]$$

To derive this equation, we consider two possible scenarios: 1) the nucleotide has remained unchanged from time 0 to time 2, and 2) the nucleotide has changed to T, C, or G at time 1, but has subsequently reverted to A at time 2.

Scenario I: A at t=0; A at t=1, with probability $P_{A(1)}$; A at t=2, with probability $1 - 3\alpha$.

Scenario II: A at t=0; not A at t=1, with probability $1 - P_{A(1)}$; A at t=2, with probability $\alpha$.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

Using the above formulation, we can show that the following recurrence equation applies to any $t$:

$$P_{A(t+1)} = (1 - 3\alpha) P_{A(t)} + \alpha \left[1 - P_{A(t)}\right]$$

We can rewrite this equation in terms of the amount of change in $P_{A(t)}$ per unit time as

$$\Delta P_{A(t)} = P_{A(t+1)} - P_{A(t)} = -3\alpha P_{A(t)} + \alpha \left[1 - P_{A(t)}\right] = -4\alpha P_{A(t)} + \alpha$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 69. Sinauer Associates, Inc. Sunderland, MA, USA.

$$\frac{dP_{A(t)}}{dt} = -4\alpha P_{A(t)} + \alpha$$

$$P_{A(t)} = \frac{1}{4} + \left[P_{A(0)} - \frac{1}{4}\right] e^{-4\alpha t}$$

When $P_{A(0)} = 1$,

$$P_{A(t)} = P_{AA(t)} = P_{ii(t)} = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$$

When $P_{A(0)} = 0$,

$$P_{A(t)} = P_{GA(t)} = P_{ij(t)} = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$$

where $i \neq j$.

Figure: P plotted against Time, with 0.25 marked on the P axis.
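The link between the one-step recurrence and the exponential solution can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names and the value of alpha are illustrative assumptions. It iterates P(t+1) = (1 - 3α)P(t) + α[1 - P(t)] and compares the result with 1/4 + [P(0) - 1/4]e^(-4αt).

```python
import math

def jc69_recurrence(alpha, steps, p0=1.0):
    """Iterate P(t+1) = (1 - 3*alpha) * P(t) + alpha * (1 - P(t))."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p

def jc69_closed_form(alpha, t, p0=1.0):
    """Continuous-time solution P(t) = 1/4 + (P(0) - 1/4) * exp(-4*alpha*t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)

# alpha is a hypothetical rate chosen only for illustration.
alpha = 0.001
for t in (10, 100, 1000):
    print(t, jc69_recurrence(alpha, t), jc69_closed_form(alpha, t))
```

For small α the two columns agree closely, and both approach 0.25 as t grows, whatever the starting nucleotide.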

Kimura's two-parameter model

In this model, the rate of transitional substitution at each nucleotide site is $\alpha$ per unit time, whereas the rate of each transversional substitution is $\beta$ per unit time.

Figure: the two-parameter model of nucleotide substitution among A, G, C, and T, with transitions at rate $\alpha$ and transversions at rate $\beta$.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 71. Sinauer Associates, Inc. Sunderland, MA, USA.

Let us consider the probability that a site that has A at time 0 will have A at time $t$. After one time unit, the probability of A changing to G is $\alpha$, and the probability of A changing to either C or T is $2\beta$. Thus the probability of A remaining unchanged after one time unit is

$$P_{A(1)} = 1 - \alpha - 2\beta$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 72. Sinauer Associates, Inc. Sunderland, MA, USA.

Scenario   t=0   t=1   t=2   Type of change at each step
I          A     A     A     none
II         A     G     A     transition
III        A     C     A     transversion
IV         A     T     A     transversion

$$P_{A(2)} = (1 - \alpha - 2\beta) P_{A(1)} + \beta P_{T(1)} + \beta P_{C(1)} + \alpha P_{G(1)}$$

By extension,

$$P_{A(t+1)} = (1 - \alpha - 2\beta) P_{A(t)} + \beta P_{T(t)} + \beta P_{C(t)} + \alpha P_{G(t)}$$

Similarly, we can obtain

$$P_{T(t+1)} = \beta P_{A(t)} + (1 - \alpha - 2\beta) P_{T(t)} + \alpha P_{C(t)} + \beta P_{G(t)}$$

$$P_{C(t+1)} = \beta P_{A(t)} + \alpha P_{T(t)} + (1 - \alpha - 2\beta) P_{C(t)} + \beta P_{G(t)}$$

$$P_{G(t+1)} = \alpha P_{A(t)} + \beta P_{T(t)} + \beta P_{C(t)} + (1 - \alpha - 2\beta) P_{G(t)}$$

$$P_{AA(t)} = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta)t}$$

$$P_{AA(t)} = P_{GG(t)} = P_{CC(t)} = P_{TT(t)} = X(t)$$

$$X(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta)t}$$

Let $Y(t)$ be the probability that the initial nucleotide and the nucleotide at time $t$ differ from each other by a transition.

$$Y(t) = P_{AG(t)} = P_{GA(t)} = P_{TC(t)} = P_{CT(t)}$$

$$Y(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} - \frac{1}{2} e^{-2(\alpha+\beta)t}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.

The probability, $Z(t)$, that the initial nucleotide and the nucleotide at time $t$ differ by a specific type of transversion is given by

$$Z(t) = \frac{1}{4} - \frac{1}{4} e^{-4\beta t}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.

Note that each nucleotide is subject to two types of transversion, but only one type of transition. Also,

$$X(t) + Y(t) + 2Z(t) = 1$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.
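The three two-parameter probabilities can be evaluated directly to confirm that they sum to one. This is an illustrative sketch only: the function name `k2p_probabilities` and the rate values are assumptions introduced here, not part of the original material.

```python
import math

def k2p_probabilities(alpha, beta, t):
    """Return (X, Y, Z) under the two-parameter model.

    X: probability of the same nucleotide at time t
    Y: probability of a transitional difference
    Z: probability of one specific transversional difference
    """
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e1 + 0.5 * e2
    y = 0.25 + 0.25 * e1 - 0.5 * e2
    z = 0.25 - 0.25 * e1
    return x, y, z

# Hypothetical rates: transitions four times as frequent as each transversion.
x, y, z = k2p_probabilities(alpha=0.004, beta=0.001, t=100)
print(x, y, z, x + y + 2 * z)
```

At t = 0 the function returns (1, 0, 0), and for very large t all three values tend to 1/4.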

Number of nucleotide substitutions between two DNA sequences

If two sequences of length $N$ differ from each other at $n$ sites, then the proportion of differences, $n/N$, is referred to as the degree of divergence or Hamming distance.

If the degree of divergence is substantial, then the observed number of differences is likely to be smaller than the actual number of substitutions because of multiple substitutions (multiple hits) at the same site.
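The proportion of differences n/N is straightforward to compute for two aligned sequences of equal length. The sketch below is illustrative; the function name `p_distance` and the example sequences are assumptions, and gaps or ambiguous bases are not handled.

```python
def p_distance(seq1, seq2):
    """Proportion of sites at which two equal-length sequences differ (n/N)."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    n = sum(a != b for a, b in zip(seq1, seq2))
    return n / len(seq1)

print(p_distance("ACGT", "ACGA"))  # one difference out of four sites
```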

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.

Figure: types of nucleotide substitution in sequences descended from the ancestral sequence ACTGAACGTAACGC:

1) single substitution

2) sequential substitution

3) coincidental substitution

4) parallel substitution

5) convergent substitution

6) back substitution

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.

Number of nucleotide substitutions between two noncoding sequences

Let us start with the one-parameter model. In this model, it is sufficient to consider only $I(t)$, which is the probability that the nucleotide at a given site at time $t$ is the same in both sequences. Suppose that the nucleotide at a given site was A at time 0. At time $t$, the probability that a descendant sequence will have A at this site is $P_{AA(t)}$, and consequently the probability that two descendant sequences have A at this site is $P^2_{AA(t)}$. Similarly, the probabilities that both sequences have T, C, or G at this site are $P^2_{AT(t)}$, $P^2_{AC(t)}$, and $P^2_{AG(t)}$, respectively. Therefore,

$$I(t) = P^2_{AA(t)} + P^2_{AT(t)} + P^2_{AC(t)} + P^2_{AG(t)}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

$$I(t) = P^2_{AA(t)} + P^2_{AT(t)} + P^2_{AC(t)} + P^2_{AG(t)}$$

$$I(t) = \frac{1}{4} + \frac{3}{4} e^{-8\alpha t}$$

Note that the probability that the two sequences are different at a site at time $t$ is $p = 1 - I(t)$. Thus,

$$p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$$

or

$$8\alpha t = -\ln\left[1 - \frac{4}{3} p\right]$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

Figure: an ancestral sequence giving rise to sequence 1 and sequence 2, with $3\alpha t$ substitutions per site along each lineage.

The time of divergence between two sequences is usually not known, and thus we cannot estimate $\alpha$. Instead, we compute $K$, which is the number of substitutions per site since the time of divergence between the two sequences. In the case of the one-parameter model,

$$K = 2(3\alpha t),$$

where $3\alpha t$ is the number of substitutions per site in a single lineage.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

$$8\alpha t = -\ln\left[1 - \frac{4}{3} p\right]$$

$$K = 2(3\alpha t)$$

We can calculate $K$ as

$$K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$$

where $p$ is the observed proportion of different nucleotides between the two sequences.
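The one-parameter correction is a one-line computation. The following sketch is illustrative (the name `jc69_distance` is an assumption); it rejects values of p at or above 3/4, where the logarithm is undefined.

```python
import math

def jc69_distance(p):
    """One-parameter estimate K = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# With p = 0.1 the corrected distance is only slightly larger than p.
print(round(jc69_distance(0.1), 4))
```

The gap between K and p widens as p increases, which reflects the growing number of hidden substitutions.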

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

In the case of the two-parameter model, the differences between two sequences are classified into transitions and transversions. Let $P$ and $Q$ be the proportions of transitional and transversional differences between the two sequences, respectively. Then the number of nucleotide substitutions per site between the two sequences, $K$, is estimated by

$$K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
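The two-parameter estimate can be sketched in the same way. The function name `k2p_distance` is an assumption made for illustration; the inputs are the observed proportions of transitional (P) and transversional (Q) differences.

```python
import math

def k2p_distance(P, Q):
    """Two-parameter estimate K = (1/2) ln[1/(1-2P-Q)] + (1/4) ln[1/(1-2Q)]."""
    a = 1 - 2 * P - Q
    b = 1 - 2 * Q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the correction to be defined")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)

# 20 transitions and 4 transversions in 200 sites.
print(round(k2p_distance(20 / 200, 4 / 200), 2))
```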

One-parameter: $K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$

Two-parameter: $K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$

Ex.1: 2 sequences with 200 nucleotides that differ by 20 transitions and 4 transversions

One-parameter: L = 200, p = 24/200 = 0.12, K ≈ 0.13

Two-parameter: L = 200, P = 20/200 = 0.10, Q = 4/200 = 0.02, K ≈ 0.13

In this example, the two models give essentially the same estimate because the degree of divergence is small enough that the corrected degree of divergence (i.e., the number of nucleotide substitutions, K) is only slightly larger than the uncorrected value (i.e., the number of nucleotide differences, p).

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.

One-parameter: $K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$

Two-parameter: $K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$

Ex.2: 2 sequences with 200 nucleotides that differ by 50 transitions and 16 transversions

One-parameter: L = 200, p = 66/200 = 0.33, K ≈ 0.43

Two-parameter: L = 200, P = 50/200 = 0.25, Q = 16/200 = 0.08, K ≈ 0.48

When the degree of divergence between two sequences is large, and especially in cases where there are prior reasons to believe that the rate of transition differs from the rate of transversion, the two-parameter model tends to be more accurate than the one-parameter model.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.

Violation of assumptions

Several assumptions have been made that are not necessarily met by the sequences under study.

1) The rate of substitution was assumed to be the same at all sites. This assumption might not hold, as the rate may vary greatly from site to site.

2) Substitutions were assumed to occur in an independent manner.

3) The substitution matrix was assumed not to change in time, so that the nucleotide frequencies are maintained at a constant equilibrium value throughout their evolution.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 79. Sinauer Associates, Inc. Sunderland, MA, USA.

Substitution mutations

Transition: changes between A and G, or between T and C

Transversion: changes between a purine and a pyrimidine

Source:

Marjorie A. Hoy 2003. Insect molecular genetics: an introduction to principles and applications, 2nd edition, p. 23. Academic Press. USA.
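The two definitions translate into a simple test of whether both bases belong to the same chemical class. This is an illustrative sketch; the function name `classify_substitution` and the returned labels are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(a, b):
    """Label a pair of DNA bases as identical, a transition, or a transversion."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if a == b:
        return "identical"
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"

for pair in ("AG", "TC", "AC", "GT"):
    print(pair, classify_substitution(*pair))
```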

Synonymous (silent mutations): nucleotide changes that do not affect the amino acid sequence.

Source:

A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.

2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.

Nonsynonymous (replacement mutations): a change in a single nucleotide in a codon can result in an amino acid replacement.

Source:

A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.

2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.

DNA          CCG       CTG       CTC
mRNA         CCG       CUG       CUC
Amino acid   Proline   Leucine   Leucine

The transition/transversion rate ratio

Three definitions of the 'transition/transversion rate ratio' are in use:

1. The ratio of the numbers of transitional and transversional differences between the two sequences, without correcting for multiple hits, E(S)/E(V).

2. κ = α/β, where κ = 1 means no rate difference between transitions and transversions.

3. The average transition/transversion ratio (R).

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 17. Oxford University Press Inc,, New York, USA.

Overall, R is convenient to use for comparing estimates under different models, while κ is more suitable for formulating the null hypothesis of no transition/transversion rate difference.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 18. Oxford University Press Inc,, New York, USA.

Models of amino acid and codon substitution

Introduction

With protein coding genes, we have the advantage of being able to distinguish synonymous or silent substitutions from the nonsynonymous or replacement substitutions.

Synonymous and nonsynonymous mutations are under very different selection pressures and are fixed at very different rates. Thus, comparison between synonymous and nonsynonymous substitution rates provides a means to understand the effect of natural selection on the protein. This comparison does not require estimation of absolute substitution rates or knowledge of the divergence time.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 40. Oxford University Press Inc,, New York, USA.

Models of amino acid replacement

Empirical models attempt to describe the relative rates of substitution between two amino acids without explicitly considering the factors that influence the evolutionary process. They are often constructed by analyzing large quantities of sequence data, as compiled from databases.

Mechanistic models consider the biological processes involved in amino acid substitution, such as mutation biases in the DNA and the translation of codons into amino acids after filtering by natural selection. Mechanistic models have more interpretative power and are particularly useful for studying the forces and mechanisms of gene sequence evolution.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 40-41. Oxford University Press Inc,, New York, USA.

The first empirical amino acid substitution matrix was constructed by Dayhoff and colleagues. They compiled and analyzed protein sequences available at the time, using a parsimony argument to reconstruct ancestral protein sequences and tabulating amino acid changes along branches on the phylogeny. Dayhoff et al. approximated the transition-probability matrix for an expected distance of 0.01 changes per site, called 1 PAM (for point-accepted mutations).

Other PAM matrices are derived by repeated multiplication of the PAM1 matrix.
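Deriving a matrix for a larger distance by multiplying a one-step matrix by itself can be illustrated with a toy example. The 3-state matrix below is entirely hypothetical and is not the Dayhoff PAM1 matrix (which has 20 states); the helper names are assumptions as well.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

# Hypothetical one-step transition-probability matrix; each row sums to 1.
toy_pam1 = [
    [0.990, 0.007, 0.003],
    [0.004, 0.990, 0.006],
    [0.002, 0.008, 0.990],
]

toy_pam250 = mat_pow(toy_pam1, 250)
for row in toy_pam250:
    print([round(v, 3) for v in row])
```

Each row of the powered matrix still sums to 1, while the diagonal entries shrink as more changes accumulate.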

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 41. Oxford University Press Inc,, New York, USA.

PAM matrix

    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   B   Z   X   *
A   2  -2   0   0  -2   0   0   1  -1  -1  -2  -1  -1  -3   1   1   1  -6  -3   0   0   0   0  -8
R  -2   6   0  -1  -4   1  -1  -3   2  -2  -3   3   0  -4   0   0  -1   2  -4  -2  -1   0  -1  -8
N   0   0   2   2  -4   1   1   0   2  -2  -3   1  -2  -3   0   1   0  -4  -2  -2   2   1   0  -8
D   0  -1   2   4  -5   2   3   1   1  -2  -4   0  -3  -6  -1   0   0  -7  -4  -2   3   3  -1  -8
C  -2  -4  -4  -5  12  -5  -5  -3  -3  -2  -6  -5  -5  -4  -3   0  -2  -8   0  -2  -4  -5  -3  -8
Q   0   1   1   2  -5   4   2  -1   3  -2  -2   1  -1  -5   0  -1  -1  -5  -4  -2   1   3  -1  -8
E   0  -1   1   3  -5   2   4   0   1  -2  -3   0  -2  -5  -1   0   0  -7  -4  -2   3   3  -1  -8
G   1  -3   0   1  -3  -1   0   5  -2  -3  -4  -2  -3  -5   0   1   0  -7  -5  -1   0   0  -1  -8
H  -1   2   2   1  -3   3   1  -2   6  -2  -2   0  -2  -2   0  -1  -1  -3   0  -2   1   2  -1  -8
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5   2  -2   2   1  -2  -1   0  -5  -1   4  -2  -2  -1  -8
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6  -3   4   2  -3  -3  -2  -2  -1   2  -3  -3  -1  -8
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5   0  -5  -1   0   0  -3  -4  -2   1   0  -1  -8
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6   0  -2  -2  -1  -4  -2   2  -2  -2  -1  -8
F  -3  -4  -3  -6  -4  -5  -5  -5  -2   1   2  -5   0   9  -5  -3  -3   0   7  -1  -4  -5  -2  -8
P   1   0   0  -1  -3   0  -1   0   0  -2  -3  -1  -2  -5   6   1   0  -6  -5  -1  -1   0  -1  -8
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   2   1  -2  -3  -1   0   0   0  -8
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -3   0   1   3  -5  -3   0   0  -1   0  -8
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17   0  -6  -5  -6  -4  -8
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10  -2  -3  -4  -2  -8
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4  -2  -2  -1  -8
B   0  -1   2   3  -4   1   3   0   1  -2  -3   1  -2  -4  -1   0   0  -5  -3  -2   3   2  -1  -8
Z   0   0   1   3  -5   3   3   0   2  -2  -3   0  -2  -5   0   0  -1  -6  -4  -2   2   3  -1  -8
X   0  -1   0  -1  -3  -1  -1  -1  -1  -1  -1  -1  -1  -2  -1   0   0  -4  -2  -1  -1  -1  -1  -8
*  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8  -8   1

BLOSUM (BLOcks of Amino Acid SUbstitution Matrix) http://en.wikipedia.org/wiki/BLOSUM

Features of these matrices:

1. Amino acids with similar physico-chemical properties tend to interchange with each other at higher rates than dissimilar amino acids (D ↔ E or I ↔ V).

2. The "mutational distance" between amino acids is determined by the structure of the genetic code. Amino acids separated by differences of two or three codon positions have lower rates than amino acids separated by a difference of one codon position (R ↔ K for nuclear proteins or for mitochondrial proteins).

Both factors may be operating at the same time.

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 42. Oxford University Press Inc,, New York, USA.

Estimating synonymous and nonsynonymous substitution rates

Two distances are usually calculated between protein-coding DNA sequences, for synonymous and nonsynonymous substitutions, respectively.

$d_S$ or $K_S$: the number of synonymous changes per synonymous site

$d_N$ or $K_N$: the number of nonsynonymous changes per nonsynonymous site

Two classes of methods: heuristic counting methods and the ML method

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 49. Oxford University Press Inc,, New York, USA.

Counting Methods

Three steps:

1. Count synonymous and nonsynonymous sites

2. Count synonymous and nonsynonymous differences

3. Calculate the proportion of differences and correct for multiple hits

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,, New York, USA.


Nei and Gojobori (1986)

1. Count synonymous and nonsynonymous sites: $S$ and $N$

2. Count synonymous and nonsynonymous differences: $S_d$ and $N_d$

3. Calculate the proportions of differences ($p_S$ and $p_N$) as

$$p_S = S_d / S, \qquad p_N = N_d / N$$

and apply the JC69 correction for multiple hits:

$$d_S = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_S\right), \qquad d_N = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_N\right)$$

Source:

Ziheng Yang 2006. Computational Molecular Evolution., p. 50,. Oxford University Press Inc,, New York, USA.

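Counting sites, example codon TTT (Phe):

First position: CTT (Leu), ATT (Ile), GTT (Val)

Second position: TCT (Ser), TAT (Tyr), TGT (Cys)

Third position: TTC (Phe), TTA (Leu), TTG (Leu); 1/3 of the changes are synonymous (Phe) and 2/3 are nonsynonymous (Leu)

Counting differences, example sequences:

Amino acid   Ser Thr Glu Met Cys Leu
Sequence 1   TCA ACT GAG ATG TGT TTA
Sequence 2   TCG ACA GAG ATA TGT CTA
Amino acid   Ser Thr Glu Ile Cys Leu

Codons that differ at two positions, CCC (Pro) and CAA (Gln):

Path I: CCC (Pro) → CCA (Pro) → CAA (Gln), one synonymous (S) and one replacement (R) change

Path II: CCC (Pro) → CAC (His) → CAA (Gln), two replacement (R) changes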
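The site-counting step for a single codon can be sketched as follows. This is an illustration rather than a full implementation of the Nei and Gojobori method: the names `CODON_TABLE` and `synonymous_sites` are assumptions, the standard genetic code is assumed, and changes to stop codons are simply counted as nonsynonymous here, which is one of several conventions.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def synonymous_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3).

    Each position contributes the fraction of its three possible
    single-nucleotide changes that leave the amino acid unchanged.
    """
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    total = 0.0
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino_acid:
                synonymous += 1
        total += synonymous / 3
    return total

s = synonymous_sites("TTT")
print("TTT: S =", round(s, 3), "N =", round(3 - s, 3))
```

For TTT only the third-position change to TTC is synonymous, so the codon contributes 1/3 synonymous and 8/3 nonsynonymous sites.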


Number of substitutions between two protein-coding genes

Nondegenerate ($L_0$): all the possible changes at this site are nonsynonymous

Twofold degenerate ($L_2$): one of the three possible changes is synonymous

Fourfold degenerate ($L_4$): all possible changes at the site are synonymous

The nucleotide differences in each class are further classified into transitional ($S_i$) and transversional ($V_i$) differences, where $i$ = 0, 2, and 4 denote nondegeneracy, twofold degeneracy, and fourfold degeneracy, respectively.

All the substitutions at nondegenerate sites are nonsynonymous. All the substitutions at fourfold degenerate sites are synonymous. At twofold degenerate sites, transitional changes are synonymous, whereas transversional changes are nonsynonymous.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 83. Sinauer Associates, Inc. Sunderland, MA, USA.

Figure: the codons UUU (Phe), UUC (Phe), UUA (Leu), UUG (Leu) and UCU (Ser), UCC (Ser), UCA (Ser), UCG (Ser), with sites labeled as twofold degenerate (transition), fourfold degenerate, and nondegenerate, and with a transversion marked.

The proportion of transitional differences at $i$-fold degenerate sites between two sequences is calculated as

$$P_i = \frac{S_i}{L_i}$$

Similarly, the proportion of transversional differences at $i$-fold degenerate sites between two sequences is calculated as

$$Q_i = \frac{V_i}{L_i}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

Kimura's two-parameter method is used to estimate the number of transitional ($A_i$) and transversional ($B_i$) substitutions per $i$th type site.

$$A_i = \frac{1}{2} \ln(a_i) - \frac{1}{4} \ln(b_i)$$

$$B_i = \frac{1}{2} \ln(b_i)$$

$$K_i = A_i + B_i$$

where

$$a_i = \frac{1}{1 - 2P_i - Q_i}, \qquad b_i = \frac{1}{1 - 2Q_i}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

The total number of substitutions per $i$th type of degenerate site, $K_i$, is given by

$$K_i = A_i + B_i$$

$A_2$ and $B_2$ denote the numbers of synonymous and nonsynonymous substitutions per twofold degenerate site, respectively.

$K_4 = A_4 + B_4$ denotes the number of synonymous substitutions per fourfold degenerate site.

$K_0 = A_0 + B_0$ denotes the number of nonsynonymous substitutions per nondegenerate site.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

Then, the number of synonymous substitutions per synonymous site ($K_S$) and the number of nonsynonymous substitutions per nonsynonymous site ($K_A$) can be obtained by

$$K_S = \frac{L_2 A_2 + L_4 K_4}{(L_2 / 3) + L_4}$$

$$K_A = \frac{L_2 B_2 + L_0 K_0}{(2 L_2 / 3) + L_0}$$

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.

Li (1993) and Pamilo and Bianchi (1993) proposed to calculate the number of synonymous substitutions by taking $(L_2 A_2 + L_4 A_4) / (L_2 + L_4)$ as an estimate of the transitional component of nucleotide substitution at twofold and fourfold degenerate sites:

$$K_S = \frac{L_2 A_2 + L_4 A_4}{L_2 + L_4} + B_4$$

$$K_A = \frac{L_2 B_2 + L_0 B_0}{L_2 + L_0} + A_0$$
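The chain from counts to $K_S$ and $K_A$ can be sketched as below, under the assumption that the formulas are $K_S = (L_2 A_2 + L_4 A_4)/(L_2 + L_4) + B_4$ and $K_A = A_0 + (L_0 B_0 + L_2 B_2)/(L_0 + L_2)$, with $A_i$ and $B_i$ obtained from the two-parameter method. The function names and the example counts are hypothetical and serve only as an illustration.

```python
import math

def k2p_components(S, V, L):
    """Transitional (A) and transversional (B) substitutions per site.

    S and V are the numbers of transitional and transversional
    differences observed at L sites of one degeneracy class.
    """
    P, Q = S / L, V / L
    a = 1 / (1 - 2 * P - Q)
    b = 1 / (1 - 2 * Q)
    A = 0.5 * math.log(a) - 0.25 * math.log(b)
    B = 0.5 * math.log(b)
    return A, B

def ks_ka(L, S, V):
    """K_S and K_A from counts keyed by degeneracy class (0, 2, 4)."""
    A, B = {}, {}
    for i in (0, 2, 4):
        A[i], B[i] = k2p_components(S[i], V[i], L[i])
    ks = (L[2] * A[2] + L[4] * A[4]) / (L[2] + L[4]) + B[4]
    ka = A[0] + (L[0] * B[0] + L[2] * B[2]) / (L[0] + L[2])
    return ks, ka

# Hypothetical site and difference counts, not taken from real data.
L = {0: 600, 2: 200, 4: 100}
S = {0: 6, 2: 20, 4: 10}
V = {0: 6, 2: 4, 4: 10}
ks, ka = ks_ka(L, S, V)
print(round(ks, 3), round(ka, 3))
```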

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.

Indirect estimations of the number of nucleotide substitutions

Indirect estimates of K are subject to much larger sampling errors than those based on direct comparisons of nucleotide sequences.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.


Number of Amino acid replacements between two proteins

From the comparison of two amino acid sequences, we can calculate the observed proportion of different amino acids between the two sequences as

$$p = n / L$$

where $n$ is the number of amino acid differences between the two sequences and $L$ is the length of the aligned sequences.

A simple model that can be used to convert $p$ into the number of amino acid replacements between two sequences is the Poisson process. The number of amino acid replacements per site, $d$, is estimated as

$$d = -\ln(1 - p)$$
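The Poisson correction can be written as a short function. The name `poisson_distance` and the example numbers are assumptions made for illustration.

```python
import math

def poisson_distance(n, L):
    """Amino acid replacements per site, d = -ln(1 - p) with p = n / L."""
    if L <= 0 or not 0 <= n < L:
        raise ValueError("need L > 0 and 0 <= n < L")
    p = n / L
    return -math.log(1 - p)

# Hypothetical example: 10 differences in an alignment of 100 residues.
print(round(poisson_distance(10, 100), 4))
```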

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.

Comparison of two homologous sequences involves the identification of the locations of deletions and insertions that might have occurred in either of the two lineages since their divergence from a common ancestor. This process is referred to as sequence alignment.

There are three types of aligned pairs: A matched pair is one in which the same nucleotide appears in both sequences. A mismatched pair is a pair in which different nucleotides are found in the two sequences. A gap is a pair consisting of a base from one sequence and a null base from the other. Null bases are denoted by -. A gap indicates that a deletion has occurred in one sequence or an insertion has occurred in the other.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.

Figure: the range of alignment between sequences (ATTGTCAAAGGCTTGAGCTGATGCAT, GGCAGGCTTTA, CTTACAAGGGTATCG), with a mismatch and a gap marked. Score: S = (identities, mismatches) - (gap penalties); the alignment sought is the one with Max(S).

In evolutionary terms, each pair in an alignment represents an inference concerning positional homology, i.e., a claim to the effect that the two members of the pair descended from a common ancestral nucleotide.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.

ATCGCATGGTTAACGACTG
ATCACATGGTTAA--ACTCACC

Manual alignment by visual inspection

Advantages: 1) it uses the most powerful and trainable of all tools – the brain, 2) it allows the direct integration of additional data.

The main disadvantage of this method is that it is subjective and unscalable, i.e., its results cannot be compared to those derived from other methods.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.

The dot matrix

In a dot matrix, the two sequences to be aligned are written out as column and row headings of a two-dimensional matrix. A dot is put in the dot matrix plot at a position where the nucleotides in the two sequences are identical. The alignment is defined by a path through the matrix starting with the upper-left element and ending with the lower-right element. There are four possible types of steps in this path: 1) a diagonal step through a dot indicates a match, 2) a diagonal step through an empty element of the matrix indicates a mismatch, 3) a horizontal step indicates a null nucleotide in the sequence on the top of the matrix, 4) a vertical step indicates a null nucleotide in the sequence on the left of the matrix.
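A dot matrix is simple to build by comparing every position of one sequence with every position of the other. The sketch below is illustrative; the function names and the short example sequences are assumptions.

```python
def dot_matrix(top, left):
    """Boolean matrix: entry [i][j] is True where left[i] == top[j]."""
    return [[a == b for b in top] for a in left]

def render(top, left):
    """Text rendering with '*' for a dot and '.' for an empty element."""
    lines = ["  " + " ".join(top)]
    for a, row in zip(left, dot_matrix(top, left)):
        lines.append(a + " " + " ".join("*" if dot else "." for dot in row))
    return "\n".join(lines)

print(render("TCAGACG", "TCGGAGC"))
```

Runs of dots along a diagonal correspond to stretches of matches; shifting from one diagonal to another requires a horizontal or vertical step, that is, a gap.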

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.


Distance and similarity methods

The best possible alignment between two sequences, or the optimal alignment, is the one in which the numbers of mismatches and gaps are minimized according to certain criteria. Unfortunately, reducing the number of mismatches usually results in an increase in the number of gaps, and vice versa.

A: TCAGACGATTG ($L_A$ = 11)
B: TCGGAGCTG ($L_B$ = 9)

(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0, # of gaps = 6

(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5, # of gaps = 1

(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2, # of gaps = 4

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.

As a consequence, we must find a common denominator with which to compare gaps and mismatches. The common denominator is called the gap penalty or gap cost. The gap penalty is a factor by which gap values are multiplied to make the gaps equivalent in value to the mismatches.

For any given alignment, we can calculate a distance or dissimilarity index ($D$) between the two sequences in the alignment as

$$D = \sum m_i y_i + \sum w_k z_k$$

where $y_i$ is the number of mismatches of type $i$, $m_i$ is the penalty for an $i$-type of mismatch, $z_k$ is the number of gaps of length $k$, and $w_k$ is a positive number representing the penalty for gaps of length $k$.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.

Alternatively, the similarity between two sequences in an alignment may be measured by a similarity index ($S$). For any given alignment, the similarity between two sequences is

$$S = x - \sum w_k z_k$$

where $x$ is the number of matches, $z_k$ is the number of gaps of length $k$, and $w_k$ is a positive number representing the penalty for gaps of length $k$.

In the most frequently used gap penalty systems, it is assumed that the gap penalty has two components, a gap-opening penalty and a gap-extension penalty.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.

Using a linear gap penalty system in which the mismatch penalty is 1, the gap-opening penalty is 2, and the gap-extension penalty is 6, the three alignments are scored as follows.

(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0, # of gaps = 6
D = (0 x 1) + (6 x 2) + 6(1–1) = 12

(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5, # of gaps = 1
D = (5 x 1) + (1 x 2) + 6(2–1) = 13

(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2, # of gaps = 4
D = (2 x 1) + (4 x 2) + 6(1–1) = 10

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.

Using a different penalty system in which the mismatch penalty is 1, the gap-opening penalty is 3, and the gap-extension penalty is 0, the same alignments are scored as follows.

(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0, # of gaps = 6
D = (0 x 1) + (6 x 3) = 18

(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5, # of gaps = 1
D = (5 x 1) + (1 x 3) = 8

(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2, # of gaps = 4
D = (2 x 1) + (4 x 3) = 14

With this system alignment (II) has the smallest D, whereas alignment (III) had the smallest D under the previous system.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.
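The two worked examples can be reproduced with a short script. This is only a sketch under stated assumptions: the penalty for a gap of length k is taken as opening + extension x (k - 1), which is the form the arithmetic above implies; a gap is counted as a maximal run of `-` in either row; rows are padded with trailing `-` where needed so that both have the same length; and the function and variable names are illustrative.

```python
import re


def dissimilarity(aligned_a, aligned_b, mismatch, gap_open, gap_extend):
    """Return D = mismatches * penalty + sum of gap penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Columns where both residues are present but differ
    n_mismatch = sum(
        1
        for a, b in zip(aligned_a, aligned_b)
        if a != "-" and b != "-" and a != b
    )

    # Length of every gap (maximal run of '-') in either row
    gap_lengths = [
        len(run)
        for seq in (aligned_a, aligned_b)
        for run in re.findall(r"-+", seq)
    ]

    # Assumed gap cost: opening penalty plus extension for each extra position
    gap_cost = sum(gap_open + gap_extend * (k - 1) for k in gap_lengths)
    return mismatch * n_mismatch + gap_cost


ALIGNMENTS = {
    "I": ("TCAG-ACG-ATTG", "TC-GGA-GC-T-G"),
    "II": ("TCAGACGATTG", "TCGGAGCTG--"),
    "III": ("TCAG-ACGATTG", "TC-GGA-GCTG-"),
}

for label, (row_a, row_b) in ALIGNMENTS.items():
    d_first = dissimilarity(row_a, row_b, mismatch=1, gap_open=2, gap_extend=6)
    d_second = dissimilarity(row_a, row_b, mismatch=1, gap_open=3, gap_extend=0)
    print(label, d_first, d_second)
# I 12 18
# II 13 8
# III 10 14
```

The output shows that the preferred alignment changes with the penalty system, so the choice of penalties is part of the alignment criterion.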

Alignment algorithms

The Needleman-Wunsch algorithm uses dynamic programming, which is a general computational technique used in many fields of study.

Dynamic programming can be applied to alignment problems because similarity indices obey the following rule:

$S_{1 \to x,\, 1 \to y} = \max S_{1 \to x-1,\, 1 \to y-1} + S_{x,y}$

in which $S_{1 \to x,\, 1 \to y}$ is the similarity index for the two sequences up to residue $x$ in the first sequence and residue $y$ in the second sequence, $\max S_{1 \to x-1,\, 1 \to y-1}$ is the similarity index for the best alignment up to residue $x-1$ in the first sequence and $y-1$ in the second sequence, and $S_{x,y}$ is the similarity score for aligning residues $x$ and $y$.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 96. Sinauer Associates, Inc. Sunderland, MA, USA.
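To make the dynamic programming idea concrete, here is a minimal sketch of global alignment in the common textbook form of the Needleman-Wunsch algorithm. It is not taken from the source: the scores (match +1, mismatch -1, gap -2 per gapped position) are hypothetical, a simple per-position gap penalty is used instead of separate opening and extension penalties, and only one of possibly several equally good alignments is returned.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)

    # score[i][j] = best similarity for a[:i] aligned with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Trace back from the bottom-right cell to recover one best alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("TCAGACGATTG", "TCGGAGCTG")
print(best)
print(top)
print(bottom)
```

Each cell is filled from its three neighbours, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments.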

Multiple sequence alignment

Multiple sequence alignment can be viewed as an extension of pairwise sequence alignment, but the complexity of the computation grows exponentially with the number of sequences being considered and, therefore, it is not feasible to search exhaustively for the optimal alignment.

Most of the programs use some sort of incremental or progressive algorithm, in which a new sequence is added to a group of already aligned sequences in order of decreasing similarity.

It is usually advisable to take a look at the final multiple alignment, as such alignments can be frequently improved by visual inspection.

Source:

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 97. Sinauer Associates, Inc. Sunderland, MA, USA.
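The exponential growth can be illustrated with a small calculation. The sketch assumes the straightforward extension of pairwise dynamic programming to N sequences, in which the table has one axis per sequence and each cell is computed from every combination of "residue or gap" across the sequences except the all-gap one. The function names are illustrative.

```python
def dp_lattice_cells(lengths):
    """Number of cells in a full dynamic programming table,
    one axis of size (length + 1) per sequence."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells


def predecessors_per_cell(n_sequences):
    """Neighbouring cells consulted per cell: each sequence contributes
    a residue or a gap, excluding the all-gap combination."""
    return 2 ** n_sequences - 1


for n in (2, 3, 5, 10):
    print(n, dp_lattice_cells([100] * n), predecessors_per_cell(n))
```

For sequences of 100 residues, two sequences need 10,201 cells while three already need 1,030,301, which is why progressive heuristics are used in practice.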

Copyright Declaration

Work “Calculate of the distance between two sequences is the simplest phylogenetic analysis … The distance between two nucleotide sequences is defined as the expected number of nucleotide substitutions per site.” “A simplest distance measure is the proportion of different sites, … site (i.e., some changes are hidden) Note: p is usable only for high similar sequences, with p < 5%.” Licensing “This simple model assumes that substitutions occur with … is α. Because the model involves a single parameter, α, it is called the one-parameter model” Author/Source Page Ziheng Yang 2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.

Unless noted otherwise, each work quoted from a published source in this declaration is used subject to the fair use doctrine of: • Taiwan Copyright Act Articles 52 & 65 • The "Code of Best Practices in Fair Use for OpenCourseWare 2009" (http://www.centerforsocialmedia.org/sites/default/files/10-305-OCW Oct29.pdf) by A Committee of Practitioners of OpenCourseWare in the U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act

P2 Ziheng Yang 2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

P3 P4 P5

Work “Since we start with A, the probability that this site is occupied by A at time … probability that A has remained unchanged.” “The probability of having A at time 2 is … 1) the nucleotide has remained unchanged from time 0 to time 2, and 2) the nucleotide has changed to T, C, or G at time 1, but has subsequently reverted to A at time 2.” “Using the above formulation, we can show that the following recurrence equation applies to any … We can rewrite this equation in terms of the amount of change in $P_{A(t)}$ as” per unit time Licensing Author/Source Page P5, P6 National Taiwan University Chau-Ti Ting Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 69. Sinauer Associates, Inc. Sunderland, MA, USA.

P6 P7 P8 P13 National Taiwan University Chau-Ti Ting

Work “In this model, the rate of transitional substitution at each nucleotide site is α per unit time, whereas the rate of each transversional substitution is β per unit time.” Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 71. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P14 P14, P15 National Taiwan University Chau-Ti Ting “Let us consider the probability that a site that has A at time 0 will have A at time t. After one time unit, … of A changing to either C or T is 2β. Thus the probability of A remaining unchanged after one time unit is” “Let Y(t) = the probability that the initial nucleotide and the nucleotide at time t differ from each other by a transition.” “The probability, Z(t), that the initial nucleotide and the nucleotide at time t differ by a specific type of transversion is given by” Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 72. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.

P15 P19 P20

Work “Note that each nucleotide subject to two types of transversion, but only one type of transition. Also” Licensing “If two sequences of length N differ from each other at n site, then the proportion of … the actual number of substitutions due to multiple substitution or multiple hit at the same site.” Let us start with one parameter model. In this model, it is sufficient to consider only I(t), which is the probability that the nucleotide … , C, G at this site are $P^2_{AT(t)}$, $P^2_{AC(t)}$, $P^2_{AG(t)}$, respectively. Therefore, Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P21 P22 P23 P24

Work “Note that the probability that the two sequences are different at a site at time t is p = 1 − I(t). Thus,” Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P25 P26 National Taiwan University Chau-Ti Ting “The time of divergence between two sequences is usually not known, and thus we can not estimate α. Instead, we compute … K = 2(3αt), where 3αt is the number of substitutions per site in a single lineage.” “Where p is observed proportion of different nucleotides between two sequences.” “In the case of two-parameter model, the differences between two sequences are classified into transitions and … substitutions per site between two sequences, K, is estimated by” Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.

P26 P27 P28

Work “In this example, the two models give essentially the same estimate because the … substitutions, K) is only slightly larger than the uncorrected value (i.e., the number of nucleotide differences, p).” Licensing “When the degree of divergence between two sequences is large, and especially in cases where … of transversion, the two parameter model tends to be more accurate than the one parameter model.” “Several assumptions have been made that are not necessary … change in time, so that the nucleotide frequencies are maintained at a constant equilibrium value throughout their evolution.”

“Transition: changes between A and G, or between T and C. Transversion: changes between a purine and a pyrimidine” Author/Source Page Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.

P30 Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 79. Sinauer Associates, Inc. Sunderland, MA, USA.

Marjorie A. Hoy 2003. Insect molecular genetics: an introduction to principles and applications, 2nd edition, p. 23. Academic Press. USA.

P32 P33 P34

Work “Synonymous (silent mutations) Nucleotide changes do not affect amino acid sequence.” Licensing “Nonsynonymous (replacement mutations) A change in single nucleotide in a codon can result in an amino acid replacement.” “Three definitions of the ‘transition/transversion rate ratio’ are in use … ratio (R): same as the first one but with correction” “Overall, R is convenient to use for comparing estimates under different models, while κ is more suitable for formulating the null hypothesis of no transition/transversion rate difference.” Author/Source Page A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.

2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.

http://www.ncbi.nlm.nih.gov/books/NBK21878/

P34 A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.

2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.

http://www.ncbi.nlm.nih.gov/books/NBK21878/

Ziheng Yang 2006. Computational Molecular Evolution., p. 17. Oxford University Press Inc,, New York, USA.

Ziheng Yang 2006. Computational Molecular Evolution., p. 18. Oxford University Press Inc,, New York, USA.

P34 P35 P35

Work “With protein coding genes, we have the advantage of being able to distinguish … comparison does not require estimation of absolute substitution rates or knowledge of the divergence time.” Licensing “Empirical models attempts to describe the relative rates of substitution between two amino acids … interpretative power and are particular useful for study the forces and mechanisms of gene sequence evolution” “The first empirical amino acid substitution matrix was constructed by Dayhoff and colleagues … multiplication of the PAM1 matrix.” Author/Source Page Ziheng Yang 2006. Computational Molecular Evolution., p. 40. Oxford University Press Inc,, New York, USA.

P36 Ziheng Yang 2006. Computational Molecular Evolution., p. 41 Oxford University Press Inc,, New York, USA.

Ziheng Yang 2006. Computational Molecular Evolution., p. 41 Oxford University Press Inc,, New York, USA.

P37 P38 P39 National Taiwan University Chau-Ti Ting Features of these matrices: … Both factors may be operating at the same time.

Ziheng Yang 2006. Computational Molecular Evolution., p. 42. Oxford University Press Inc,, New York, USA.

P41

Work “Two distances are usually calculated between protein coding DNA sequences, … methods: heuristic counting methods and the ML method” Licensing “Three steps: 1. Count synonymous and nonsynonymous sites 2. Count synonymous and nonsynonymous differences 3. Calculate the proportion of differences and correct for multiple hits” Author/Source Page Ziheng Yang 2006. Computational Molecular Evolution., p. 49. Oxford University Press Inc,, New York, USA.

Ziheng Yang 2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,, New York, USA.

P42 P43 P44 Wikipedia http://en.wikipedia.org/wiki/Codon

It is used subject to the fair use doctrine of: • Taiwan Copyright Act Articles 52 & 65 • Wikipedia Foundation Terms of Use

P44 National Taiwan University Chau-Ti Ting

“Nei and Gojobori (1986) 1. Count synonymous and nonsynonymous sites: S and N … ($p_S$ and $p_N$) as apply the JC69 correction for multiple hits” Ziheng Yang 2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,, New York, USA.

P45, P49

Work “nondegenerate ($L_0$): all the possible changes at this site … At twofold degenerate site, transitional changes are synonymous, whereas transversional changes are nonsynonymous.” Licensing Author/Source Page P46 National Taiwan University Chau-Ti Ting P47 National Taiwan University Chau-Ti Ting P48 National Taiwan University Chau-Ti Ting Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 83. Sinauer Associates, Inc. Sunderland, MA, USA.

P50, P52 P51 National Taiwan University Chau-Ti Ting “The proportion of transitional differences at i-fold degenerate … degenerate sites between two sequences is calculated as” Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

P53

Work “Kimura’s two-parameter method is used to estimate the number of transitional ($A_i$) and transversional ($B_i$) substitutions per $i$th type site … Where $a_i = 1/(1 - 2P_i - Q_i)$, $b_i = 1/(1 - 2Q_i)$” Licensing “The total number of substitutions per ith type … denote the numbers of nonsynonymous substitutions per nondegenerate site.” “then, the number of synonymous substitutions per synonymous site ($K_S$) and the number of nonsynonymous substitutions per nonsynonymous site ($K_A$) can be obtained by” “Li (1993) and Pamilo and Bianchi (1993) proposed to calculate the number of synonymous substitution by … of the transition component of nucleotide substitution at twofold and fourfold degenerate site.” Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P54 P55 P56 P57

Work “Indirect estimate of K values are subject to much larger sampling errors than those based on direct comparisons of nucleotide sequence” Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P58 P58 National Taiwan University Chau-Ti Ting “From the comparison of two amino acid sequences, … The number of amino acid replacements per site, d, is estimated as” “Comparison of two homologous sequences involves the identification of the location of deletions and insertions that might have … A gap indicates that a deletion has occurred in one sequence or an insertion has occurred in” Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.

P59 P60 P61 National Taiwan University Chau-Ti Ting

Work “In evolutionary terms, each pair in an alignment represents an inference concerning positional homology, i.e., a claim to the effect that the two members of the pair descended from a common ancestral nucleotide.” Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P62 P62 National Taiwan University Chau-Ti Ting

“Advantages: 1) it uses the most powerful and trainable of all tools – the brain, 2) it allows the direct … its results cannot be compared to those derived from other methods.”

“In a dot matrix, the two sequences to be aligned are written out as column and row headings of … a vertical step indicates a null nucleotide in the sequence on the left of the matrix.”

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 88. Sinauer Associates, Inc. Sunderland, MA, USA.

P63 P64 P65

Work Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 91. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 91. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 92. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P66 P67 P67 P68

Work “The best possible alignment between two sequences, or the optimal alignment, is the one in which the numbers of mismatches and gaps are minimized according to certain criteria. … TC-GGA GCTG- # of gaps = 4”

“As a consequence, we must find a common denominator with which to compare gaps and mismatches. … z_k is the number of gaps of length k, and w_k is a positive number representing the penalty of gaps of length k”

“Alternatively, the similarity between two sequences in an alignment may be measured by … gap penalty systems, it is assumed that the gap penalty has two components, a gap-opening penalty and a gap-extension penalty.”

“Using a linear gap penalty system in which the mismatch penalty is 1, the gap-open penalty … D = (2 x 1) + (4 x 2) + 6(1 – 1) = 10”

Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P69 P70 P71 P72

Work “Using a different penalty system in which the mismatch penalty is 1, the gap-open penalty is 3 and the … D = (2 x 1) + (4 x 3) = 14”

“The Needleman-Wunsch algorithm used dynamic programming, which is a general computational technique used in many fields of study. … is the similarity score for aligning residues x and y.”

“Multiple sequence alignment can be viewed as an extension of pairwise sequences … as such alignments can be frequently improved by visual inspection.”

Licensing Author/Source Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 96. Sinauer Associates, Inc. Sunderland, MA, USA.

Dan Graur and Wen-Hsiung Li 2000. Fundamentals of Molecular Evolution., p. 97. Sinauer Associates, Inc. Sunderland, MA, USA.

Page P73 P74 P75 P76

• Synonymous sites = 0 + 0 + 1/3
• Non-synonymous sites = 1 + 1 + 2/3
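
The per-position tallies above can be reproduced with a short sketch. The source does not say which codon these counts refer to; the example below assumes TTT (phenylalanine) under the standard genetic code, which gives the same tallies. The function name `synonymous_fractions` is illustrative only, and changes to a stop codon are simply counted as non-synonymous here, which is a simplification.

```python
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def synonymous_fractions(codon):
    """Return, for each codon position, the fraction of the three possible
    single-nucleotide changes that leave the encoded amino acid unchanged."""
    result = []
    for pos in range(3):
        synonymous = 0
        for base in BASES:
            if base == codon[pos]:
                continue  # not a change
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                synonymous += 1
        result.append(Fraction(synonymous, 3))
    return result


# Assumed example codon (not named in the source).
syn = synonymous_fractions("TTT")
total_syn = sum(syn)          # synonymous sites in the codon
total_nonsyn = 3 - total_syn  # the remaining sites are non-synonymous
print([str(f) for f in syn], total_syn, total_nonsyn)
```

For TTT the per-position fractions are 0, 0 and 1/3, so the codon contributes 1/3 synonymous site and 8/3 = 1 + 1 + 2/3 non-synonymous sites.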