Bumgarner_Array-Tutorial-2001.ppt


DNA Arrays - Technology and Uses : A tutorial

Roger Bumgarner 1/10/01

The University of Washington School of Medicine Department of Microbiology

Outline

• Types of arrays

– Choices to be made

• Applications of arrays

– Focus on Expression analysis

• Data Analysis


DNA Arrays

• Spots of DNA arranged in a particular spatial arrangement on a solid support

• Supports - Filters (nylon, nitrocellulose), glass, silicon

• Types

– Spotted - cDNAs, genomic clones, oligos

– Synthesized - Light-directed synthesis, spatially directed fluidics (ink-jet)

The Original DNA Array

• Petri dish with bacterial colonies

• Apply membrane and lift to make a filter containing DNA from each clone.

• Probe and image to identify clones homologous to the probe.

Robotic Spotters for Filters


Major Suppliers of Filter Based Arrays

• Research Genetics (www.resgen.com)

– Human (35-40k genes, some specific sets)

– Rat (3 filters, 15k genes total)

– Mouse (5.5k genes)

– Yeast (6.2k genes)

• Incyte (Genome Systems - www.incyte.com)

– A variety of genomic filters and microarrays

• Clontech (www.clontech.com)

– Human, mouse, rat - filters and glass - many custom sets

Oligo Arrays

• Synthesized or spotted arrays of short (typically <20 base pairs) oligos of chosen sequence.

• Synthesis methods - light directed, ink jet.

• Spotting using reactive coupling.

• Used for re-sequencing, genotyping, diagnostics and expression arrays.


Affymetrix Array Technology (www.affymetrix.com)


The “Gene Chip” from Affymetrix


Ink Jet Synthesis

[Figure: ink-jet synthesis cycle on a hydroxyl (OH) surface, building oligos one base at a time: 1) Deposit phosphoramidite, 2) After coupling, 3) After deprotection, 4) Repeat]

Ink Jet Synthesis Companies

• Rosetta InPharmatics (www.rii.com)

– working system, “FlexJet™ Arrays”

• Protogene (www.protogene.com)

– working system, “FlexChip™ Arrays”

• InCyte (www.InCyte.com)

– licensed a patent from Caltech

Spotted Oligo Arrays

[Figure: drop containing modified oligo (typically 5’-NH2), e.g. CAGTTTGA, deposited on a reactive surface]

Reactive Slides

• Surmodics (www.surmodics.com) (availability???)

• Telechem (www.arrayit.com)

• Noab Diagnostics (http://www.noabdiagnostics.com)

• homebrew

Relative merits of different methods of making oligo arrays

• Affy:

– available now, small feature size possible

• Inkjet:

– much more flexible to design

• Spotted:

– less practical for large numbers (more than a few hundred) of oligos, but can be made with standard spotting equipment.

Arrays of longer DNAs

• Typically PCR products

– ORFs with gene-specific primers

– cDNA inserts

• Spotted onto derivatized slides

– Vendors (Amersham, Corning, Telechem, Surmodics, etc.)

– Homebrew (polylysine: cmgm.stanford.edu/pbrown/mguide/index.html)

Arrayers

• Amersham/Molecular Dynamics (www.amersham.com)

• Genomic Solutions (www.genomicsolutions.com)

• Genetix (www.genetix.co.uk)

• GeneMachines (www.genemachines.com)

• Genetic Microsystems (www.geneticmicro.com)

• Intelligent Automation Systems (www.ias.com/bio.html)

• many, many others

– See www.ncbi.nlm.nih.gov/ncigap “Expression Technologies”

MD GenIII Arrayer

• Plate hotel holds thirteen 384-well plates

• Gridding head, 12 pins

• Slide holder, 36 slides

Features:

• 36 slides in 5 hours

• 4608 genes spotted in duplicate

• Built-in humidity control

Choices to be made

• Type of substrate:

– Filters, glass, silicon (Affymetrix)?

• Type of target

– Oligo or longer (PCR product, clone)?

• Where to obtain the arrays

– In-house production or purchase/collaborate

Decision Parameters

• Application

– Genotyping - requires oligo arrays

– Expression analysis can be done with oligo or cDNA arrays but ...

– Is separation of homologous genes or splice variants important? If so, use oligos

• Organism

– Human, mouse, rat, yeast, E. coli arrays are commercially available.

– Other - you must make your own.

Decision Parameters - cont.

• Amount of sample

– Glass or Affy arrays - typically 1-2 µg of mRNA or 10-50 µg of total RNA

– Filters - 10-20 ng of mRNA

• Number of genes

• $’s - Commercial arrays average $1000 for 5000-gene arrays - $500 for Affy (single color)

Practical Advice

• For genotyping, oligos

– small number of loci (a few hundred): make your own

– large number of loci: purchase

• Replicate measurements are essential, so cost is a very important factor.

• For expression analysis, the cost of in-house cDNA arrays is 2-5x less than commercial arrays - our cost is $260/array.

• A lot can be done with cDNA arrays.

The UW Center for Expression Arrays

• Arrays

– Human: 15k sequence-verified set from Research Genetics

– Mouse: 15k sequence-verified set from NIA

– Yeast: Full genome set from Fields lab

– Pseudomonas: Full genome set from Steve Lory

• Each array contains between 4600 and 7600 genes spotted in duplicate (9200-15,200 spots) - $260/ea.

The UW Center for Expression Services

• RNA QC - run on the Agilent 2100 bioanalyzer ($10/sample)

• Scanning (included in the cost of a slide)

• Analysis facilities

– Computers in RPRC, Rosen

– Home brew software + Rosetta’s Resolver package (1Q, 2001)

• Protocols

• Contact Kimberly Smith - kimeyeam@u…. 732-6049

How are we doing?

Typical Yeast Array Data

Typical Human Array from Training Session (2000101528 from 12/1/00: HeLa WT vs HepG2)

Where are arrays likely to go?

• Commercial arrays for common organisms will come down in price - must reach a few hundred dollars or less.

• Oligo arrays are superior for most applications

• In the future we will focus on hybs and data analysis, also “odd-ball” organisms.

Applications for DNA Arrays

• Sequence checking/re-sequencing

• Genotyping

• Translation State Analysis

• Gene expression analysis

Sequencing By Hybridization (SBH)

TGTCATGCATATGCGGAATCACTTAGCATCGACTACGCATC...

ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG...

ACAGTACGTA CAGTACGTAT AGTACGTATA GTACGTATAC TACGTATACG ACGTATACGC CGTATACGCC GTATACGCCT TATACGCCTT ATACGCCTTA ETC...

A sequence N bases long contains (N-9) overlapping 10-base sequences, each of which overlaps the next by 9 bases
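The window count can be checked numerically. The following is a minimal sketch and not part of the original slides: the function name `kmers` is hypothetical, and the input is only the visible portion of the lower strand shown above (the slide truncates it with "..."). It relies on the fact that a sequence of N bases yields N - k + 1 overlapping windows of length k, each sharing k - 1 bases with its neighbor.

```python
def kmers(seq, k=10):
    """Return every overlapping k-base window of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Visible portion of the lower strand from the slide (truncated there with "...").
seq = "ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG"
probes = kmers(seq)

print(len(seq), "bases ->", len(probes), "overlapping 10-mers")
print(probes[:3])

# Each window overlaps the next one by k - 1 = 9 bases.
overlaps_ok = all(a[1:] == b[:-1] for a, b in zip(probes, probes[1:]))
print("neighbors overlap by 9 bases:", overlaps_ok)
```

The first three windows printed correspond to the first three oligos listed on the slide.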


Problem with SBH

Suppose I have the following 43 bp sequence:

[Figure: the 43 bp sequence]

With a repetitive sequence, there are fewer unique oligos (in the above case, instead of 34 unique 10 bp oligos, there are only 26, because eight 10 bp oligos occur twice). With repetitive sequence, it is not possible to construct a unique sequence of the proper length by SBH.
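The effect of a repeat on the number of distinct oligos can be illustrated with a made-up sequence. In the sketch below, `unit`, `spacer`, and the 43-base sequence built from them are hypothetical and are not the sequence from the slide. A 17-base unit that occurs twice contributes 17 - 10 + 1 = 8 ten-base windows that each appear at two positions, so the set of distinct windows is smaller than the total window count.

```python
from collections import Counter

K = 10
unit = "GATTACAGGCTTCAAGT"   # hypothetical 17-base repeat unit
spacer = "CCATGGTCA"         # hypothetical 9-base spacer
seq = unit + spacer + unit   # 43 bases in total

# Every overlapping 10-base window, then how often each distinct window occurs.
windows = [seq[i:i + K] for i in range(len(seq) - K + 1)]
counts = Counter(windows)
repeated = [w for w, c in counts.items() if c > 1]

print("total windows:              ", len(windows))
print("distinct windows:           ", len(counts))
print("windows seen more than once:", len(repeated))
```

Since an array reports only which oligos hybridize, windows that occur twice look the same as windows that occur once, which is the source of the ambiguity.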


Re-sequencing format

Known sequence:

....ACGTCG T ATCGTAGTAGCAGCTGATCGTACGTACG.....

Chip of oligos distributed along the known sequence w/middle base varying:

ACGTCG A ATCGTAGT
ACGTCG C ATCGTAGT
ACGTCG G ATCGTAGT
ACGTCG T ATCGTAGT

CGTCGT A TCGTAGTA
CGTCGT C TCGTAGTA
CGTCGT G TCGTAGTA
CGTCGT T TCGTAGTA

GTCGTA A CGTAGTAG
GTCGTA C CGTAGTAG
GTCGTA G CGTAGTAG
GTCGTA T CGTAGTAG

etc.....

[Figure: readout grid with the known sequence (TATCGTAGTAG...) along one axis and the four bases A, C, G, T along the other]
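The probe sets in this format can be generated mechanically from the known sequence. The sketch below is illustrative only: `tiling_probes` is a hypothetical name, and the flank lengths (6 bases before and 8 bases after the varied position) are inferred from the oligos printed on the slide rather than stated by it.

```python
def tiling_probes(ref, left=6, right=8):
    """For each interrogated position, build four probes that differ only
    at that position (A, C, G, T) and share the same flanking bases."""
    sets = []
    for pos in range(left, len(ref) - right):
        upstream = ref[pos - left:pos]
        downstream = ref[pos + 1:pos + 1 + right]
        probes = {base: upstream + base + downstream for base in "ACGT"}
        sets.append((pos, ref[pos], probes))
    return sets


ref = "ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG"  # known sequence from the slide

# Print the first three probe sets; slicing at 6 and 7 matches the default flanks.
for pos, ref_base, probes in tiling_probes(ref)[:3]:
    print(f"position {pos} (reference base {ref_base}):")
    for base, probe in probes.items():
        mark = "  <- matches reference" if base == ref_base else ""
        print(f"  {probe[:6]} {base} {probe[7:]}{mark}")
```

In each set of four, exactly one probe is a perfect match to the reference, so a base change in the sample shows up as a different probe in that set giving the strongest signal.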


Re-sequencing format applied to genotyping

....ACGTCG T ATCGTAGTAGCAGCTGATCGTACGTACG.....

Locus 1:

...ACGTCG G ATCGTAGT.

...ACGTCG T ATCGTAGT.

Some other sequence, locus 2, etc., locus 3, 4, .....

Individual A: heterozygote G/T

Individual B: homozygote G

Individual C: homozygote T

etc......
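A genotype call of this kind could be sketched as follows. Everything here is an assumption made for illustration: the function name `call_genotype`, the rule of calling any base that carries at least 30% of the total signal at a locus, and the intensity values, which are made up and not taken from the slides.

```python
def call_genotype(signals, het_fraction=0.3):
    """signals maps each base to the background-subtracted intensity of the
    probe carrying that base at one locus. A base is called when its share
    of the total signal is at least het_fraction (a hypothetical cutoff)."""
    total = sum(signals.values())
    if total <= 0:
        return None
    called = sorted(b for b, s in signals.items() if s / total >= het_fraction)
    if len(called) == 1:
        return f"homozygote {called[0]}"
    if len(called) == 2:
        return f"heterozygote {called[0]}/{called[1]}"
    return None  # ambiguous pattern, no call


# Made-up intensities for locus 1 in three individuals.
individuals = {
    "A": {"A": 50, "C": 40, "G": 900, "T": 1000},
    "B": {"A": 60, "C": 45, "G": 1800, "T": 70},
    "C": {"A": 55, "C": 50, "G": 65, "T": 1700},
}
for name, signals in individuals.items():
    print(f"Individual {name}: {call_genotype(signals)}")
```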

Arrayed Primer Extension

[Figure: schematic of arrayed primer extension]

Translation State Array Analysis (TSAA)

[Figure: Cy3/Cy5 labeling scheme]

Analyze for changes in translation state

• Cell Population #1: Extract mRNA ---> Make cDNA, label w/ green fluor

• Cell Population #2: Extract mRNA ---> Make cDNA, label w/ red fluor

• Co-hybridize to a slide with DNA from different genes

• Scan

Towards Pathway Modeling

DNA ---> RNA ---> Protein

• [mRNA’s]: Expression Arrays

• Rates: TSAA

• [Protein’s]: 2-D gels, other proteomic technologies

Other Applications

• Genome-genome comparisons

– Species-to-species

– Individual-to-individual

• Environmental surveys for presence/absence of given bacteria

• Identification of protein-DNA binding sites.

• Measurement of DNA replication rates.

• Many others...

Data Analysis (short)

• Normalization

• Statistics

What do we actually measure?

• Answer: We measure signal (radioactivity, Cy3 signal, or Cy5 signal) of cDNA target(s) which hybridize(s) to our probe (and backgrounds, ratios, standard deviations, dust, etc.)

• What do we wish to know (an abstraction)?

$[\mathrm{mRNA}]_{1,a},\ [\mathrm{mRNA}]_{1,b},\ \ldots,\ [\mathrm{mRNA}]_{N,a},\ [\mathrm{mRNA}]_{N,b}$

where N = number of genes, and a and b = different experimental conditions.

Some observations

• Ratios we measure by 2-color expression arrays often underestimate the ratio as measured by other technologies (e.g. Northerns or real-time PCR)

• The above effect is worse for more highly expressed genes - i.e. ratios are more “compressed” at high expression.

• Everything that can go wrong generally conspires to compress the ratio.

• The measured ratio is dependent on the concentration of the probe (e.g. the amount of DNA on the spot).

• Hence, I don’t refer to our measurements using “fold change” terminology.


Types of normalization

• To total signal

• To “house keeping genes”

• To genomic DNA spots (Research Genetics) or mixed cDNA’s

• To internal spikes

• Other ways ...

Often we assume:

$[\mathrm{mRNA}]_{n,a} \propto \mathrm{signal}_{n,a}$

$[\mathrm{mRNA}]_{n,a} = k \cdot \mathrm{signal}_{n,a}$, where $k$ is the “normalization constant”
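Under this linear assumption, normalizing "to total signal" amounts to choosing one multiplicative factor so that two channels have the same summed signal. The sketch below illustrates that idea only: `total_signal_factor` is a hypothetical name and the spot intensities are made up.

```python
def total_signal_factor(reference, other):
    """Linear factor that rescales `other` so that its summed signal
    equals the summed signal of `reference`."""
    return sum(reference) / sum(other)


cy3 = [1200.0, 300.0, 4500.0, 80.0, 950.0]   # made-up spot signals, condition a
cy5 = [610.0, 140.0, 2300.0, 45.0, 420.0]    # made-up spot signals, condition b

k = total_signal_factor(cy3, cy5)
cy5_scaled = [k * s for s in cy5]
ratios = [b / a for a, b in zip(cy3, cy5_scaled)]

print("normalization constant k =", k)
print("normalized ratios:", [round(r, 2) for r in ratios])
```

A single constant like this is only appropriate if signal really is proportional to concentration across the whole intensity range.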

Data normalization - It’s more complicated than you might think...

• Experiment

– Take RNA from a single sample and make aliquots

– label one in Cy3 and one in Cy5

– hyb to the same array

• Expect

– same ratio for all detectable signals (+/- error)

– can normalize to some controls to get a linear scaling factor
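The expectation for this same-sample experiment can be checked with a few lines of code. The sketch below is an assumption-laden illustration: it takes the median log ratio as the linear scaling factor (one possible choice, not necessarily the author's), the name `self_vs_self_check` is hypothetical, and the intensities are made up so that the ratio drifts with intensity.

```python
import math
from statistics import median


def self_vs_self_check(cy3, cy5):
    """Return the linear scaling factor (from the median log2 ratio) and the
    residual log2 ratio of every spot after removing that factor."""
    log_ratios = [math.log2(r / g) for g, r in zip(cy3, cy5)]
    offset = median(log_ratios)
    residuals = [lr - offset for lr in log_ratios]
    return 2 ** offset, residuals


# Made-up signals, sorted from least to most expressed.
cy3 = [100.0, 400.0, 1600.0, 6400.0, 25600.0]
cy5 = [60.0, 220.0, 800.0, 2900.0, 10500.0]

factor, residuals = self_vs_self_check(cy3, cy5)
print("linear scaling factor:", factor)
for g, res in zip(cy3, residuals):
    print(f"Cy3 signal {g:>8.0f}: residual log2 ratio {res:+.2f}")
```

If one linear factor were sufficient, the residuals would scatter around zero at every intensity; a trend with intensity, as in this made-up data, indicates that it is not.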


Ratio sorted from most to least expressed


How reproducible are array experiments?


Std. Logic for Array Experiments

• Arrays are very expensive …..

• I can’t afford replicates….

• I’ll just do one experiment …..

• I can still get this published….

• So it must be OK!

• Is it?


What does a typical error profile look like? (Mammalian clone arrays)

What is a statistically significant level of differential expression?

[Figure: histogram of log ratio. Is this point significant?]

A few comments about histograms of ratios

• They are narrower at high gene expression due to decreased scatter in the signal.

– The threshold for calling a gene differentially expressed should be a function of intensity, F(I).

• They are not necessarily log normal

• They are almost always close to log normal if all data is included, since error is log normal.
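An intensity-dependent threshold F(I) could be estimated from a control hybridization by measuring the scatter of the log ratio in separate intensity bins. The sketch below is one simple way to do that and is not the author's method: the function names, the bin edges, the cutoff of 2 standard deviations, and the control values are all assumptions.

```python
from statistics import stdev


def threshold_by_intensity(spots, edges, z=2.0):
    """spots: (intensity, log_ratio) pairs from a control hybridization.
    edges: ascending upper bounds of the intensity bins.
    Returns (upper_edge, z * SD of the log ratio in that bin) per bin."""
    bins = [[] for _ in edges]
    for intensity, log_ratio in spots:
        for i, upper in enumerate(edges):
            if intensity <= upper:
                bins[i].append(log_ratio)
                break
    return [(u, z * stdev(v)) for u, v in zip(edges, bins) if len(v) > 1]


def is_significant(intensity, log_ratio, thresholds):
    """Compare a spot's log ratio with the cutoff of its intensity bin."""
    for upper, cutoff in thresholds:
        if intensity <= upper:
            return abs(log_ratio) > cutoff
    return False  # brighter than the last bin: no estimate available


# Made-up control data: scatter shrinks as intensity grows.
control = [(150, 0.60), (200, -0.50), (300, 0.40), (450, -0.55),
           (2000, 0.20), (3000, -0.25), (4500, 0.15),
           (20000, 0.05), (30000, -0.08), (45000, 0.06)]
thresholds = threshold_by_intensity(control, edges=[500, 5000, 50000])

for upper, cutoff in thresholds:
    print(f"intensity <= {upper}: |log ratio| must exceed {cutoff:.2f}")
print(is_significant(200, 0.8, thresholds), is_significant(30000, 0.8, thresholds))
```

With these made-up numbers the same log ratio of 0.8 passes the cutoff for a bright spot but not for a dim one.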


Selection of differentially expressed genes

[Figure: histograms of number of genes vs. log ratio and number of genes vs. intensity]

Error (Precision) Estimation Methods

Listed from highest cost and value to lowest:

• Large number of replicate measurements to calculate standard error (n >= 5).

• Small number of replicates to calculate standard error (n = 3-5).

• Duplicates with a common error model.

• Duplicates with Δ to estimate error.

• Single measurement with error model borrowed from similar experiments.

• Single measurement (current standard in many publications).

Our “Typical” Experiment

• Replicates within the array

• Replicate arrays, each with Sample 1 and Sample 2

• Net result - 4 data points/“gene”
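With four data points per gene, a mean and a standard error can be attached to every measurement. The sketch below assumes the four values are log2 ratios (two spots on each of two arrays); the function name `summarize` and the numbers are made up for illustration.

```python
import math
from statistics import mean, stdev


def summarize(log_ratios):
    """Mean and standard error of the mean for one gene's replicate log ratios."""
    n = len(log_ratios)
    return mean(log_ratios), stdev(log_ratios) / math.sqrt(n)


# Made-up log2 ratios: (array 1 spot 1, array 1 spot 2, array 2 spot 1, array 2 spot 2).
gene_x = [1.0, 1.2, 0.9, 1.1]
m, se = summarize(gene_x)
print(f"mean log2 ratio = {m:.2f}, standard error = {se:.3f}")
```

Spots on the same array are not fully independent of each other, so a standard error computed this way is likely to be optimistic.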


Reasons for doing a “flipped color” experiment

• Aids in data normalization

– Fit a normalization function so that both color schemes agree with each other

• Some data points do not invert ratio in a flipped color experiment! (~0.1% in human)

– These will appear as differentially expressed in a single experiment but are false positives.

– Sequence-specific incorporation effects?
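The logic of a flipped color pair can be written down directly: a real expression difference changes sign when the dyes are swapped, while a dye-specific artifact does not. The sketch below is illustrative only; `combine_dye_swap`, the tolerance `tol`, and the log2 ratios are hypothetical.

```python
def combine_dye_swap(forward, flipped, tol=0.5):
    """forward and flipped map gene -> log2(Cy5/Cy3) for the two labelings.
    Half the difference estimates the real change; half the sum estimates
    the part that did not invert (dye bias)."""
    result = {}
    for gene in forward:
        f, r = forward[gene], flipped[gene]
        bias = (f + r) / 2
        result[gene] = {
            "log_ratio": (f - r) / 2,
            "dye_bias": bias,
            "suspect": abs(bias) > tol,  # ratio failed to invert
        }
    return result


# Made-up log2 ratios for two genes.
forward = {"geneA": 1.0, "geneB": 1.2}
flipped = {"geneA": -0.9, "geneB": 1.1}

for gene, r in combine_dye_swap(forward, flipped).items():
    print(gene, {k: round(v, 2) if k != "suspect" else v for k, v in r.items()})
```

In this made-up example geneB looks strongly induced in either single experiment, but its ratio does not invert, so it is flagged rather than reported as differentially expressed.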


“Spot-on” Data Processing

• “Spot-on Image”: Raw data ---> Spot locations, intensities, backgrounds, ratios, error estimates

• “Spot-on Unite”: ---> Mean ratios, error estimates, links to external dB

• “Spot-on Select”: ---> Select genes which are differentially expressed by a statistically significant amount

Recommendations/comments

• Normalization algorithms/methods should be looked at more carefully. Don’t ass-u-me linearity.

• All measured numbers from arrays should have associated error estimates of some kind.

• Error estimates are best obtained by replicate measurements on replicas which represent the true variability of the biology (sample and/or culture heterogeneity can be a major issue).

Recommendations/comments

• Subsequent biology done by yourself or others on false positives/negatives is often more costly than the array analysis.

• Biologists often worry too much about false negatives and not enough about false positives.

• You can’t publish a false negative but you can publish a false positive.


A few more comments

• The more experimental planning you do up front, the more you can extract from an array experiment.

– Can you control sample heterogeneity?

– What are the best controls?

– Do you have enough sample to do sufficient replicates to get meaningful results?

The Center for Expression Arrays

Current team:

• Kim Smith, manager (206)732-6049

• Jada Quinn, Darren May, Suzanne Oakley (research technicians)

• Erick Hammersmark, Chuck Benson (software engineers)

• Aaron Valla, Alice Tanada, Min Hui (undergraduates)

Lots of help (past and present): Lee Hood lab, Stan Fields lab, Michael Katze lab (Gary Geiss), Jim Mullins lab (Angelique van’t Wout), Stephen Lory lab.

When would one do custom arrays?

• Not very often

– cost/array for large “standard” arrays is similar to cost/array for small custom arrays.

– there is a lot we don’t know.

• If RNA is very limited, then it makes sense.

– procedure: 1) Using an analogous system, identify differentially expressed genes. 2) Make small arrays of these genes to use with tissue of interest.