Transcript Slide 1
Sequence Analysis & Gene Expression
Organism selection: genome size – why – what is the benefit – politics
Decisions: mapping first, “shotgun sequencing”, BAC alignment/sequencing [BAC – bacterial artificial chromosome; also YAC (yeast)]
Genome sequence: raw sequence – confirmed sequence – gene models – verification
Verification: is the gene model transcribed? Yes/no/perhaps – “ubiquitous” gene, family specific, homolog – ortholog – paralog
Transcript profiles: when – how much [abundant] – where – transcript “variants” – inducible by condition X?
MUPGRET workshop, Columbia, MO, June 2005 (HJ Bohnert, UIUC) [email protected]

Genomics … not just genes
[Diagram labels: Databases, Integration & Intuition; genome & transcriptome sequences; markers & QTLs; protein interaction maps; biochemical genetics; expression profiles; knock-out, sRNA & RNAi; protein localization; dynamic metabolite catalogs; structure analysis]
information mining, hypotheses, experiment – insight, application, virtual life
How (much) will ‘encyclopedic’ approaches lead to better understanding?

Arabidopsis – model plant
small, fast, prolific, mutants, lines, ecotypes, genome sequence
Field on a dish!
[Photo labels: O3, control, CO2 – Columbia grown in Soy-FACE]

AQP are distributed over all Chromosomes – a few clusters, many duplications
[Figure: map of chromosomes Ch-1 to Ch-5 (scale in Mb) with the positions of the aquaporin (AQP) genes PIP1;1 to PIP1;5, PIP2;1 to PIP2;8, TIP1;1 to TIP1;3, TIP2;1 to TIP2;3, TIP3;1, TIP3;2, TIP4;1, TIP5;1, NIP1;1, NIP1;2, NIP2;1, NIP3;1, NIP4;1, NIP4;2, NIP5;1, NIP6;1, NIP7;1, SIP1;1, SIP1;2 and SIP2;1, the pseudogenes TIP2;x, NIP2;1 and NIP3;1, and rDNA]
(3), (4), (12), (14), (15) – duplicated regions that include AQPs.
Arabidopsis thaliana (AGI, 2000)

The Plant Genome
Ecosystem – population – species – ecotype (– breeding line)
Organism – organ – tissue – cell – compartment
Nucleus – envelope & pore – nucleoplasm, nucleolus & chromosomes
Euchromatin & heterochromatin – gene islands – gene
Promoters – 5’-regulatory (untranslated = UTR) – introns & exons – mature coding region – 3’-regulatory (UTR) regions
Plants in silico? Sure! And then: Plant Design from Scratch

The Plant Genome
Controls for Gene Expression – many Switchboards
Levels of regulation that affect what we call “gene expression”:
• Chromatin condensation state
• Local chromatin environment
• Transcription initiation
• Transcript elongation
• mRNA splicing
• mRNA export
• mRNA place in the cell
• RNA half-life
• Killer microRNAs
• Ribosome loading
• Protein transport/targeting
• Protein modifications
• Protein turnover

The Plant Transcriptome
Killer RNAs (microRNAs)
5 years ago, we did not know that such a control system existed! (there are micro-genes)
Result: no protein, i.e., gene is essentially “silenced”

The Plant Transcriptome
How to sample the transcriptome?
• Morphological dissection (root, leaf, flower – epidermis, guard cell, etc.)
• Cell sorting: make single cells, send through cell sorter (size, color, reporter gene)
• Laser ablation: micromanipulation of laser to cut individual cells
• Biochemical dissection (compartment isolation): chloroplasts, mitochondria, ribosomes, other membranes

Painting cells with a reporter gene – here it is GFP (Green Fluorescent Protein)

The Plant Transcriptome
Painting tissues, then isolating desired cells
Enzymatic staining
The endodermis of the root tip is highlighted in transgenic plants using pSCR::mGFP5.
[requires plant transformation]
Emerging lateral roots

The Plant Transcriptome
cDNA – complementary DNA: converts messenger RNA into double-stranded DNA
> cDNA libraries
• “neat”
• normalized
• subtracted
> SAGE libraries
“Normalization” removes mRNAs for which there are many copies in a cell – thus enriching for “rare mRNAs” (not so much sequencing to do)
Subtraction removes cDNAs which you already know (less sequencing)

The Plant Transcriptome
cDNA Libraries
Primary cDNA Library: Total RNA – Poly(A)+ RNA – 1st strand cDNA – ds-cDNA – size-selected double-stranded cDNA (>500 bp) – ligate to EcoRI adapters/digest NotI – clone (EcoRI/NotI) digested pBSII/SK+ & adaptored cDNA
Primary (neat) library may be used for “normalization”
Library Normalization: primary cDNA library – make ss-DNA out of primary library (ss-DNA “tracer”) – PCR inserts by T7 and T3 standard primers (DNA “driver”) – tracer/driver hybridization – column chromatography (double strands stick) – non-hybridized DNA from flowthrough = normalized clones
Cloning of root RNAs from segments S1 – S4 of the root tip (Sharp lab): sequenced ~18,000 clones, found ~8,000 unique and ~130 novel genes
How many genes make a root?

Serial Analysis of Gene Expression (SAGE)
Velculescu et al. 1995
http://www.sagenet.org/

coding region (known or expected): forward primer, reverse primer
Amplicon (sequence or clone + sequence)
[Gel image: lanes 1–10 and M; results]

Real-time (quantitative) PCR
RNA (DNA-free) to cDNA; use product in dilutions for amplification
Assumption: each cycle increases amount by factor 2 (or 1.8)
Check by using known amount of cloned control cDNA
Serial dilution: 1x – 1/5x – 1/25x – 1/125x
[Plot axis: cycle number]

Melting curves [single products]
Single genes have been amplified here
Two amplicons are shown; each shows a single melting curve

Melting curves [multiple products]
More than one gene has been amplified here
Homologous genes [identity – similarity – divergence]
orthologous – paralogous relationships

The Plant Transcriptome
Quantitative PCR in 384-well plates (96 primer pairs, 3 repeats each)
Taking SAGE & cDNA sequences together, corn roots “express” 20-23,000 genes (i.e., mRNA is made)
The entire corn genome is expected to include ~50,000 genes

Substrates for High Throughput Arrays
Nylon membrane: single label (33P)
GeneChip: single label (biotin, streptavidin)
Glass slides: dual label (Cy3, Cy5)

TeleChem ChipMaker2 Pins
Pin pick-up volume: 100–250 nl
Spot diameter: 75–200 µm
Spot volume: 0.2–1.0 nl

Creating cDNA Arrays
cDNA cloned into vector and transformed to create cDNA library
Q-Pix
Clones sequenced and unique set chosen and reracked
Unique set of clones
384 well microtiter plate
PCR on Tecan workstation
Slides printed on Cartesian Arrayer
Final product

Printing Arrays on 50 slides
NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab

Slide Chemistry
[Diagram: glass surface (Si–O lattice) with coatings]
Coatings: Amine, Poly-L-lysine, Silanated, Aldehyde, Silylated
We use SuperAmine and SuperAldehyde from TeleChem (arrayit.com)

GSI Lumonics

GenePix Image Analysis Software

Placenta vs.
Brain – 3800 Cattle Placenta Array
cy3 cy5

Troubleshooting
The Good – The Bad – The Ugly

Post-Print Processing
Printed slide – rehydrate spots (hot water) – snap dry – UV light (fix DNA to coating) – chemically block background – denature to single strands – hybridize & scan

Ratio of expression of genes from two sources
Cells from condition A – mRNA – cDNA – label Dye 1
Cells from condition B – mRNA – cDNA – label Dye 2
Mix – equal, over, under
ScanArray 3000 Fluorescent Scanner
Overlay Images

Reverse Labeling
Slide 1: Cy3 over-expressed
Slide 2: Cy5 over-expressed

Universal vs. Universal (control v. control)
Problem area at low intensity readings

Lung vs Control

Clustered display of data from time course of serum stimulation of primary human fibroblasts.
Clusters: Cholesterol Biosynthesis; Cell Cycle; Immediate Early Response; Signaling and Angiogenesis; Wound Healing and Tissue Remodeling
Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998) pg 14865

Hierarchical Clustering: 14 Tissues, 7653 Genes

Differences in Technology: Affymetrix
• One sample, one chip
• Single Color Scans
• Labeling by incorporating Biotin into cRNA, not Cy3 or Cy5 dyes
• Oligonucleotides instead of full-length cDNAs
• Higher Density Arrays
  – Feature sizes down to 18 µm instead of ~100 µm
  – Non-contact Creation of Arrays

GeneChips
Affy Technology Overview
• Photolithography and combinatorial chemistry
  – Technology from microchip industry: “GeneChip”
  – Coat slides
  – “Mask” to apply light to only desired features, de-protects feature

Technology Overview (cont.)
• Apply required nucleotide base to array
• Apply new mask to de-protect different features
• Stack nucleotides on top of one another
• Repeat with bases and masks until 25mer oligonucleotides are built directly onto array

Technology Final Steps
• Silicon “wafers” of 90 arrays are cut
• Glass substrate is then added to plastic cartridge for:
  – Safe handling
  – Easy storage
  – Easy hybridization
  – Easy scanning

Easy, convenient
Expensive (very much so)
No confirmation of quality
Erroneous data when low intensity
Problems with SNPs*
*not with 70-mer oligo glass slides

Questions?
Give me a call or send a message
217-265-5475
[email protected]
http://www.life.uiuc.edu/bohnert/

Remember: YOU CAN ALWAYS FIND EVERYTHING ON GOOGLE! (though not these slides)
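
Sketch: the mask-and-couple cycle from the Technology Overview slides (apply a mask, de-protect the chosen features, add one base, repeat) can be written out in a few lines of Python. This is only a naive layer-by-layer schedule under stated assumptions, not Affymetrix's actual mask design. The function name synthesize, the toy 4-mer probes and the feature numbering are hypothetical, and the chemistry itself (light, protecting groups, coupling yield) is not modeled.

```python
BASES = "ACGT"

def synthesize(probes):
    """Build all probes in parallel, one base per masked coupling step.

    Hypothetical helper for illustration only. Each step pairs one mask
    (the features to de-protect) with one base to couple onto them.
    """
    if not probes:
        return [], []
    length = len(probes[0])
    if any(len(p) != length for p in probes):
        raise ValueError("all probes must have the same length")

    grown = [""] * len(probes)  # sequence built so far on each feature
    steps = []                  # (position, base, features in the mask)
    for pos in range(length):
        for base in BASES:
            # The mask exposes only features that need this base at this layer.
            mask = [i for i, p in enumerate(probes) if p[pos] == base]
            if not mask:
                continue  # no feature needs this base now, so skip the step
            for i in mask:
                grown[i] += base  # stack the base on the de-protected features
            steps.append((pos, base, mask))
    return grown, steps

# Toy array with three features; a real array would carry 25-mers.
probes = ["ACGT", "AGGT", "TCGA"]
grown, steps = synthesize(probes)
for pos, base, mask in steps:
    print(f"layer {pos}: couple {base} on features {mask}")
```

With this naive schedule, an array of 25-mers needs at most 4 x 25 = 100 masked steps, no matter how many features it carries, because every feature is extended in parallel.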