Transcript Document
English okay?

Master's studies offer tracks. This course is part of a track:
• Lecture (VL): Microarray data analysis, Tuesday, 8:30-10:00
• Exercises (Ü): Thursday, 10:15-11:45 (start: Oct. 23)
• Next semester: practical course (Praktikum) + seminar
• Thereafter: possibility of a Master's thesis
Attendance is mandatory in lecture and exercises (attendance list!).
Literature: see course web page.

No.  Date     Topic                                         Lecturer
1    21 Oct   Microarray technologies                       Martin Vingron
2    28 Oct   Fundamentals of data analysis                 Christine Steinhoff
3    4 Nov    Analysis of variance I                        Christine Steinhoff
4    11 Nov   Analysis of variance II                       Christine Steinhoff
5    18 Nov   LOWESS, variance stabilization                Anja von Heydebreck
6    25 Nov   Statistical testing                           Anja von Heydebreck
7    2 Dec    Clustering methods                            Anja von Heydebreck
8    9 Dec    Classification, linear discriminant analysis  Rainer Spang
9    16 Dec   Applications in cancer research               Rainer Spang
10   6 Jan    Principal component analysis                  Martin Vingron
11   13 Jan   Statistical learning theory                   Rainer Spang
12   20 Jan   Sequence annotation                           Rainer Spang
13   27 Jan   Bayesian networks                             Rainer Spang
14   3 Feb    Regulation                                    Martin Vingron
15   10 Feb   Summary, review, outlook

Genome Sequencing:
• Determination of DNA sequence
• Derivation of amino acid sequences
• Analysis, comparison, classification

Functional Genomics:
• Study of gene function
• Gene expression studies
• Proteomics
• Metabolic networks

DNA (gene) -> transcription -> messenger RNA (mRNA) -> translation -> protein (sequence, structure)

A cell and its population of genes: What is the problem?
Determine the amount of mRNA for each gene that is present in a cell/tissue.

DNA forms double strands by a process called hybridization:
[Figure: Labeling, Hybridization]

Expression Arrays
• cDNA arrays
• Glass arrays
• Oligonucleotide arrays
• Membrane-based arrays

Glass Slide Microarrays
… were first produced at Stanford University (Schena et al., 1995).
Whole cDNA: 500-1500 bp

Filter "Macro"arrays
… were first published by Lennon and Lehrach, 1991
[Figure size labels: 7.5 x 2.5 cm; ca. 21 cm]

Oligonucleotide Arrays
… were first published by Lockhart et al., 1996 ...
Target sequence (excerpt): TGTGATGGTGGGAATGGGTCAGAAGGACTCCTATGTGGGTGACGAGGCC
PM (perfect match) probe, ca. 25 bp: TTACCCAGTCTTCCTGAGGATACAC
MM (mismatch) probe:                 TTACCCAGTCTTGCTGAGGATACAC
[Figure labels: probe cell; probe pair (PM, MM); probe set (probe pairs 1, 2, 3, 4, ..., 17, 18, 19, 20)]

[Figure: Probe - Reference]

There are other technologies, too, to estimate expression levels:
• EST sequencing: "electronic northern"
• SAGE: tags of mRNAs are concatenated and sequenced
• Reliability of results depends on depth of probing (number of ESTs, number of tags)

Why do we want to know?
• "Tissue profiling": which genes are expressed in a tissue
• Comparing healthy and diseased (e.g., tumor) tissue
• Studying dynamic processes, e.g., cell cycle (time series)

Example: Renal clear cell carcinoma
Comparison of kidney cancer cells to normal tissue. Which genes are altered in their expression?
[Figure labels: N98-8880, T98-8880; Molecular Genome, Dr. Judith Boer]

Example: Cell cycle time course
[Figure: cell cycle phases G1, S, G2, M]
Spellman et al. took several samples per time point and hybridized the RNA to glass chips containing all yeast genes.

Data processing
• Image collection
• Image analysis, intensity determination
• Within-slide normalization
(Hess et al., Trends in Biotech 19(11), 2001)

OUTPUT: Scanner + scanner software

Clone Index  Finger #  Row  Column  S#1 Mean  S#1 S.Dev  S#1 Area  S#1 BkMean  S#1 BkS.Dev
1            1         1    1       1964.028  682.7736   113       208.7262    152.5326
2            3         1    1       2149.386  769.6178   91        206.2414    162.2653
3            5         1    1       906.1724  420.9323   74        186.1003    148.7088
4            7         1    1       3588.557  1168.349   89        174.1211    140.9021
5            9         1    1       60317.82  11562      153       195.0722    157.5111
6            11        1    1       54301.75  20957.93   135       206.8428    178.7954
7            1         1    2       771.2751  409.6172   73        236.801     173.5277
8            3         1    2       662.4827  309.9964   73        203.6449    156.0123
9            5         1    2       1245.646  923.4761   52        190.193     150.2779
10           7         1    2       488.5027  297.9345   31        196.4706    163.1341
11           9         1    2       5783.04   1924.275   125       184.9516    152.659
12           11        1    2       1961.644  1296.955   76        191.4465    159.6247
13           2         1    1       2838.966  964.7534   82        230.2468    158.729
14           4         1    1       55542.37  20307.24   131       212.3941    148.894
15           6         1    1       3338.375  2077.73    65        173.0246    131.4185
16           8         1    1       2955.312  984.0138   117       196.8208    164.9264
17           10        1    1       61398.14  11946.8    152       172.9454    151.5878
18           12        1    1       58695.57  15767.88   147       135.3904    139.0316
19           2         1    2       5746.229  2064.34    83        222.6505    156.1099
20           4         1    2       2045.466  682.7502   102       229.2686    168.0402
21           6         1    2       1097.905  435.5942   67        167.7662    130.3175
22           8         1    2       858.4306  387.9267   101       200.3975    166.7363
23           10        1    2       6882.719  3266.915   129       170.8719    152.7547
24           12        1    2       2503.947  770.0308   128       136.53      139.233
25           1         1    3       740.4891  400.7088   71        226.4033    163.7236
26           3         1    3       8222.649  3462.559   118       185.4035    134.3752
27           5         1    3       2244.806  734.1207   92        195.8065    153.7403
28           7         1    3       1655.791  711.0076   81        186.3808    150.0175
...
(Hess et al., Trends in Biotech 19(11), 2001)

Different technologies
• Support: membrane or glass slide
• Spotted material: PCR product or oligo (short/long)
• Labeling:
  – 1-channel: radioactive, Affy (absolute values)
  – 2-channel: 2-color fluorescent labeling (relative values)

Quality issues
[Figure: subpopulations: PCR (a73-u02400vene.txt). x-axis: log(fg.green/fg.red); y-axis: F^; one curve each for Kidney1-Kidney6 and Onco1-Onco5.]
Remedies: improve PCR protocols; model "random effect" through plate-wise calibration

[Figure: subpopulations: pin (a42-u07639vene.txt), by spotting pin. x-axis: log(fg.green/fg.red); y-axis: F^; one curve each for pins 1:1 to 4:4.]
Remedies: handling of pins; pin-wise calibration

Distribution of intensities: log-normal?
[Figure: histogram and QQ plot of the intensities and of the log intensities]

Chip design
• Type of chip:
  – Global "whole genome" (yeast, Drosophila, mouse, human)
  – Domain-specific, e.g., cancer, infection
• Spots:
  – PCR products: e.g., 3' UTR (avoid cross-hybridization)
  – Oligos: uniqueness, stability

Databases
• Stanford
• TIGR
• Gene expression atlas
• GEO
• ArrayExpress
• MIAME standard: Minimum Information About a Microarray Experiment

Software
• R + Bioconductor
• J-Express
• GeneSpring
• Rosetta Resolver

Affymetrix technology
• Per gene, spot 20 perfectly matching oligos and 20 oligos with 1 mismatch
• Intensity: weighted average of pixel intensities in perfect and mismatch oligos
(More on this next week)
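
To make the perfect match / mismatch idea concrete, here is a minimal Python sketch. It rests on two assumptions that the slides do not state as rules. First, the mismatch oligo is taken to be the perfect-match oligo with its central base complemented, which is what the 25-mer probe pair shown earlier looks like (the two oligos differ only at position 13). Second, the probe-set summary is an unweighted mean of the PM minus MM differences, which is only a simplified stand-in for the weighted average mentioned above. The function names and the intensity values are made up for illustration.

```python
# Sketch only: function names and example intensities are illustrative assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_probe(pm: str) -> str:
    """Return an MM probe: the PM probe with its central base complemented.

    Assumption: the single mismatch sits at the central position
    (position 13 of a 25-mer), as in the probe pair on the slide.
    """
    mid = len(pm) // 2  # 0-based index of the central base (12 for a 25-mer)
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def average_difference(pm_values, mm_values):
    """Unweighted mean of PM - MM over the probe pairs of one probe set.

    This is a simplification, not the weighted scheme named on the slide.
    """
    if len(pm_values) != len(mm_values) or not pm_values:
        raise ValueError("need the same, non-zero number of PM and MM values")
    diffs = [p - m for p, m in zip(pm_values, mm_values)]
    return sum(diffs) / len(diffs)


# PM oligo taken from the slide; the MM oligo is derived from it.
pm_oligo = "TTACCCAGTCTTCCTGAGGATACAC"
mm_oligo = mismatch_probe(pm_oligo)
print(mm_oligo)

# Made-up intensities for four probe pairs (a real probe set has 20).
pm_intensities = [1200.0, 950.0, 1100.0, 1020.0]
mm_intensities = [300.0, 250.0, 400.0, 270.0]
print(average_difference(pm_intensities, mm_intensities))
```

Subtracting the MM signal is meant to remove the part of the PM signal that comes from unspecific hybridization. How the probe pairs are actually weighted is the topic announced for next week.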