Correlating traits with phylogenies

Using BaTS

Phylogeny and trait values

- A phylogeny describes a hypothesis about the evolutionary relationship between individuals sampled from a population.
- Discrete character traits of interest can be mapped onto the phylogeny.
- A significant association between a particular trait value and its distribution on a phylogeny indicates a potential causative relationship.

Phylogeny and trait values

- Often, the phylogeny-trait relationship does not appear unequivocal by eye: an analytical framework may be needed.

[Figure: example trees labelled "(clear association)", "(no association)" and "????"]

The null hypothesis

The null hypothesis under test is one of random phylogeny-trait association; that is, that

“No single tip bearing a given character trait is any more likely to share that trait with adjoining taxa than we would expect due to chance”

An example

- Salemi et al. (2005)*: dataset of HIV sequences sampled post mortem from CNS tissues.
- Analysis by Slatkin-Maddison (1989) method, reanalyzed in BaTS**.
- Compartmentalization by tissue type: circulating viral populations defined by location in the body.

Statistic        p-value (BaTS)
AI               <0.01
PS               <0.01
Frontal lobe     <0.01
Occipital lobe   <0.01
Meninges         <0.01
Lymph nodes      <0.01
Temporal lobe    <0.01
Spinal cord      <0.01

* Salemi et al. (2005) J. Virol. 79 (17): 11343-11352.
** Parker, Rambaut & Pybus (2008) MEEGID 8 (3): 239-246.

Available methods

- Non-phylogenetic: ANOVA
  - Ignores shared ancestry
- Phylogenetic:
  - Single tree mapping
  - Slatkin-Maddison & AI
  - BaTS

Methods: Single-tree mapping

- Method:
  - Map traits onto a tree
  - Look for correlation
- Pros:
  - Fast
  - Simple
- Cons:
  - No indication of significance
  - Statistically weak (high Type II error)
  - Conditional on a single topology

Methods: Slatkin-Maddison & AI

- Method:
  - Map traits onto a tree by parsimony & count migration events (Slatkin-Maddison) or measure ‘association index’ within clades recursively (AI)
  - Compare observed value with a null (expected) value obtained by bootstrapping
- Pros:
  - Still reasonably fast
  - Indication of significance
- Cons:
  - Still conditional on a single topology

Methods: BaTS

- Method:
  - See below(!)
- Pros:
  - Indication of significance
  - Statistically powerful and Type I error is correct
  - Accounts for phylogenetic uncertainty
- Cons:
  - Requires Bayesian MCMC sequence analysis
  - Slower

BaTS: under the bonnet

- Uses a posterior distribution of phylogenies from Bayesian MCMC analysis
- Calculates migrations, AI and a variety of other measures of association
- Posterior distributions of both observed and expected (null) values are sampled
- Significance obtained by comparing observed vs. expected
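The idea of comparing observed with expected (null) values across a set of trees can be sketched in a few lines of Python. This is an illustration only, not the BaTS implementation: it assumes binary trees written as nested tuples of tip names, uses Fitch parsimony to count state changes (the "migrations"), builds the null by shuffling states among tips, and takes the p-value as the proportion of null replicates at least as extreme as the observed mean. The tree representation, function names and tissue labels are all hypothetical.

```python
import random

def parsimony_changes(tree, states):
    """Minimum number of state changes on a binary tree (Fitch parsimony)."""
    def walk(node):
        if isinstance(node, str):              # a tip: its state is known
            return {states[node]}, 0
        left, right = node                     # binary internal node
        left_set, left_cost = walk(left)
        right_set, right_cost = walk(right)
        shared = left_set & right_set
        if shared:                             # children agree: no extra change
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1
    return walk(tree)[1]

def mean_changes(trees, states):
    """Average parsimony count over a set of trees (e.g. posterior samples)."""
    return sum(parsimony_changes(t, states) for t in trees) / len(trees)

def null_means(trees, states, reps, rng):
    """Null distribution: shuffle states among tips, then recompute the mean."""
    names = sorted(states)
    values = [states[n] for n in names]
    out = []
    for _ in range(reps):
        shuffled = values[:]
        rng.shuffle(shuffled)
        out.append(mean_changes(trees, dict(zip(names, shuffled))))
    return out

# Two toy "posterior" trees over the same eight tips (hypothetical data).
trees = [
    ((("a1", "a2"), ("a3", "a4")), (("b1", "b2"), ("b3", "b4"))),
    ((("a1", "a3"), ("a2", "a4")), (("b1", "b3"), ("b2", "b4"))),
]
states = {name: ("brain" if name.startswith("a") else "lymph")
          for name in ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]}

observed_mean = mean_changes(trees, states)
null = null_means(trees, states, reps=100, rng=random.Random(1))
# Fewer changes means stronger association, so count null values <= observed.
p_value = sum(v <= observed_mean for v in null) / len(null)
print(observed_mean, p_value)
```

Because fewer changes imply tighter clustering of a trait, the comparison here is one-sided towards small values; a statistic where larger means stronger association would flip the inequality.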

BaTS: analysis workflow

- Preparation:
  - Sequence alignment
  - Bayesian MCMC phylogeny reconstruction (BEAST, MrBayes) to obtain a posterior distribution of trees (PST)
  - Taxa in PST marked up with discrete traits
- BaTS analysis
- Interpretation

Workflow: Preparation (i)

- Sequence alignment: CLUSTAL, BioEdit, Se-Al
- Bayesian MCMC analysis: MrBayes, BEAST
- Taxa marked-up with traits

Workflow: Preparation (ii)

- Taxa marked-up with traits: typical NEXUS format:

Workflow: Preparation (iii)

- Taxa marked-up with traits:
  a) Declare the ‘states’ block (begin states;).
  b) Assign a trait to each taxon, in the order that they appear in the original #NEXUS file.
  c) Close the ‘states’ block.
  d) Omit ‘translate’ and ‘taxa’ blocks.

Workflow: BaTS analysis

To use BaTS from the command-line, type:

java -jar BaTS_beta_build2.jar [single|batch] <treefile_name> <reps> <states>

Where:
- single or batch asks BaTS to analyse either a single input file, or a whole directory (batch analysis),
- <treefile_name> is the name and full location of the treefile or directory to be analysed,
- <reps> is the number (an integer > 1, typically 100 at least) of state randomizations to perform to yield a null distribution, and
- <states> is the number of different states seen.
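When many runs are scripted, it can help to assemble the command in code and check the arguments first. The Python sketch below only builds the argument list in the order used in this talk (mode, treefile, reps, states); the function name is hypothetical and nothing is executed. Passing the resulting list to `subprocess.run` would launch the analysis, assuming Java and the jar file are available in the working directory.

```python
def build_bats_command(mode, treefile, reps, states, jar="BaTS_beta_build2.jar"):
    """Return the argument list for a BaTS run, after basic sanity checks."""
    if mode not in ("single", "batch"):
        raise ValueError("mode must be 'single' or 'batch'")
    if int(reps) <= 1:
        raise ValueError("reps must be an integer greater than 1")
    if int(states) < 1:
        raise ValueError("states must be a positive integer")
    return ["java", "-jar", jar, mode, str(treefile), str(int(reps)), str(int(states))]

# Same arguments as the example run shown in this talk.
cmd = build_bats_command("single", "example.trees", 100, 7)
print(" ".join(cmd))
```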

The analysis

C:\joeWork\apps\BaTS\BaTS_beta_build2\BaTS_beta_build2>java -jar BaTS_beta_build2.jar single example.trees 100 7
Performing single analysis.
File: example.trees
Null replicates: 100
Maximum number of discrete character states: 7
30 trees were detected in the input file
analysing... 30 trees, with 7 states
analysing observed (using obs state data)
30 29 30 29 30 29 30 29
Statistic  observed mean  lower 95% CI  upper 95% CI  null mean  lower 95% CI  upper 95% CI  significance
AI 1.5555052757263184 1.1128820180892944 2.160351037979126 12.03488540649414 11.475320040039 12.6391201928711 0.0
PS 18.5 17.0 20.0 80.7713394165039 77.86666870117188 83.56666564941406 0.0
MC (state 0) 12.633333206176758 9.0 16.0 1.7496669292449951 1.399999976158142 2.1666667461395264 0.009999990463256836
MC (state 1) 19.0 19.0 19.0 1.7480005025863647 1.3333333730697632 2.0999999046325684 0.009999990463256836
MC (state 2) 12.666666984558105 12.0 13.0 1.77991247559 1.33333697632 2.200000047683716 0.009999990463256836
MC (state 3) 8.566666603088379 3.0 11.0 1.66733866943 1.2333333492279053 2.133333444595337 0.009999990463256836
MC (state 4) 11.0 11.0 11.0 1.5526663064956665 1.1666666269302368 2.0999999046325684 0.009999990463256836
MC (state 5) 3.433333396911621 2.0 6.0 1.4840000867843628 1.100000023841858 2.0333333015441895 0.009999990463256836
MC (state 6) 5.066666603088379 5.0 6.0 1.2973339557647705 1.0333333015441895 1.600000023841858 0.009999990463256836
done
Done.

Notes on the output:
- The lines before the table are housekeeping and debugging messages.
- Output: statistics, one per line, tabulated.

The ‘MC…’ statistics are reported in the order in which they occur in the input file.
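Since the statistics are printed one per line, they are easy to load into a script for further work. The Python sketch below assumes whitespace-separated columns exactly as in the run above (a statistic name followed by seven numbers); the function and key names are hypothetical, and the two sample lines are copied from that run.

```python
COLUMNS = ("observed_mean", "observed_lower", "observed_upper",
           "null_mean", "null_lower", "null_upper", "significance")

def parse_stat_line(line):
    """Split one output line into the statistic name and its seven numbers."""
    tokens = line.split()
    values = [float(t) for t in tokens[-7:]]   # the last seven fields are numeric
    name = " ".join(tokens[:-7])               # e.g. "PS" or "MC (state 0)"
    row = {"statistic": name}
    row.update(zip(COLUMNS, values))
    return row

sample = [
    "PS 18.5 17.0 20.0 80.7713394165039 77.86666870117188 83.56666564941406 0.0",
    "MC (state 0) 12.633333206176758 9.0 16.0 1.7496669292449951 "
    "1.399999976158142 2.1666667461395264 0.009999990463256836",
]
rows = [parse_stat_line(line) for line in sample]
for row in rows:
    print(row["statistic"], row["observed_mean"], row["null_mean"], row["significance"])
```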

Workflow: Interpretation

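The statistics:

- For MC, larger values → increased phylogeny-trait association; for AI and PS, smaller values → increased association (in the example run, the observed AI and PS lie far below their null means, while the observed MC values lie above theirs)
- Significance indicated by p-value
- In addition, observed posterior values are informative for some statistics:
  - PS: indicates migration events between trait values
  - MC(trait value): indicates number of taxa in largest clade monophyletic for that trait value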
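To make the MC definition concrete, here is a small Python sketch that finds the size of the largest clade in which every tip carries a given trait value. It is an illustration under stated assumptions, not BaTS code: the tree is a nested tuple of tip names, and the function name, tip names and trait values are hypothetical.

```python
def largest_monophyletic_clade(tree, states, trait):
    """Size of the largest clade whose tips all carry `trait` (0 if none)."""
    best = 0

    def walk(node):
        nonlocal best
        if isinstance(node, str):
            # A single tip counts as a clade of size 1 if it has the trait.
            size = 1 if states[node] == trait else None
        else:
            sizes = [walk(child) for child in node]   # visit every child
            # The clade qualifies only if every child subtree qualifies.
            size = sum(sizes) if all(s is not None for s in sizes) else None
        if size is not None:
            best = max(best, size)
        return size

    walk(tree)
    return best

tree = ((("a", "b"), "c"), (("d", "e"), ("f", "g")))
states = {"a": "X", "b": "X", "c": "X", "d": "X", "e": "Y", "f": "Y", "g": "Y"}
mc_x = largest_monophyletic_clade(tree, states, "X")
mc_y = largest_monophyletic_clade(tree, states, "Y")
print(mc_x, mc_y)
```

In this toy tree the clade ((a, b), c) is entirely X, while d sits next to a Y tip, so the X tips do not all fall into one clade.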

FAQs / common pitfalls

- Java 1.5 or higher is required. See java.sun.com for more.
- Large datasets can be slow, so down-sample input tree files (uniformly, not randomly) where necessary, or when checking that BaTS input files are marked-up correctly.
- A RAM (memory) shortage can slow the analysis; use the -Xmx switch to allocate more RAM to the Java virtual machine.*
- Check input file mark-up carefully if in doubt.

*See more: http://edocs.bea.com/wls/docs70/perform/JVMTuning.html
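Uniform down-sampling of a tree file, as suggested above, amounts to keeping every k-th tree and leaving all other lines alone. The Python sketch below assumes each tree sits on its own line beginning with the word "tree", as in typical BEAST or MrBayes output; the function name and the toy file contents are hypothetical.

```python
def thin_trees(lines, step):
    """Keep every `step`-th tree line; pass all non-tree lines through."""
    if step < 1:
        raise ValueError("step must be at least 1")
    kept, seen = [], 0
    for line in lines:
        if line.lstrip().lower().startswith("tree "):
            if seen % step == 0:        # uniform thinning, not random sampling
                kept.append(line)
            seen += 1
        else:
            kept.append(line)           # headers, 'begin'/'end' lines and so on
    return kept

example_lines = ["#NEXUS", "begin trees;"]
example_lines += ["tree STATE_%d = (a,b);" % i for i in range(6)]
example_lines += ["end;"]

thinned = thin_trees(example_lines, step=3)
print("\n".join(thinned))
```

With six trees and a step of 3, the first and fourth trees are kept, so the thinned sample still spans the whole chain.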

Author contact:

Joe Parker
Department of Zoology
Oxford University, UK
OX1 3PS
[email protected]

http://evolve.zoo.ox.ac.uk