Lecture PowerPoint to accompany Molecular Biology Fifth Edition Robert F. Weaver Chapter 25 Genomics II: Functional Genomics, Proteomics, and Bioinformatics Copyright © The McGraw-Hill Companies, Inc.


Lecture PowerPoint to accompany
Molecular Biology
Fifth Edition
Robert F. Weaver
Chapter 25
Genomics II: Functional
Genomics, Proteomics,
and Bioinformatics
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
25.1 Functional Genomics: Gene
Expression on a Genomic Scale
• Functional genomics refers to those areas
that deal with the function or expression of
genomes
• The set of all transcripts an organism makes at
any given time is that organism’s transcriptome
• Use of genomic information to block
expression systematically is called
genomic functional profiling
• Study of structures and functions of the
protein products of genomes is proteomics
24-2
Transcriptomics
• This area is the study of all transcripts an
organism makes at any given time
• Create DNA microarrays and microchips that
hold 1000s of cDNAs or oligos
– Hybridize labeled RNAs from cells to these arrays or
chips
– Intensity of hybridization at each spot reveals the
extent of expression of the corresponding gene
• Microarray permits canvassing expression
patterns of many genes at once
• Clustering of expression of genes in time and
space suggests that the products of these genes
collaborate in some process
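The clustering idea on this slide can be sketched in a few lines. This is only a minimal illustration, assuming hybridization intensities have already been converted to log2 expression ratios at several time points. The gene names, the numbers, and the 0.9 correlation threshold are all made up for the example, and real studies typically use dedicated clustering methods rather than a simple pairwise cutoff.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) >= threshold:
                pairs.append((g1, g2))
    return pairs

# Hypothetical log2 expression ratios at four time points
profiles = {
    "geneA": [0.1, 1.0, 2.0, 3.1],
    "geneB": [0.0, 0.9, 2.1, 2.9],
    "geneC": [3.0, 2.0, 1.1, 0.2],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

Here geneA and geneB rise together over time while geneC falls, so only the first two are reported as a co-expressed pair.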
24-3
Oligonucleotides on a Glass Substrate
24-4
Serial Analysis of Gene Expression
• Serial Analysis of Gene Expression
(SAGE) allows us to determine:
– Which genes are expressed in a given tissue
– The extent of that expression
• Short tags, characteristic of particular
genes, are generated from cDNAs and
ligated together between linkers
• These ligated tags are then sequenced to
determine which genes are expressed and
how abundantly
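The final counting step of SAGE can be sketched as below. This is a simplified illustration, assuming the tags have already been extracted from the sequenced concatemers as a list of short strings. The tag sequences and the tag-to-gene table are hypothetical, not data from a real experiment.

```python
from collections import Counter

def count_tags(tags, tag_to_gene):
    """Tally how often each gene's tag appears among the sequenced tags."""
    counts = Counter()
    for tag in tags:
        # Tags that match no known gene are pooled under "unassigned"
        gene = tag_to_gene.get(tag, "unassigned")
        counts[gene] += 1
    return counts

# Hypothetical lookup table and tag list
tag_to_gene = {"GTGACCACGG": "gene1", "TTCATACACC": "gene2"}
tags = ["GTGACCACGG", "TTCATACACC", "GTGACCACGG", "GTGACCACGG", "AAAAAAAAAA"]

counts = count_tags(tags, tag_to_gene)
print(counts.most_common())
```

The tag count for each gene serves as the estimate of how abundantly that gene is expressed in the tissue.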
24-5
SAGE
24-6
Cap Analysis of Gene Expression (CAGE)
• CAGE gives the same information as
SAGE about which genes are expressed
and how abundantly, in a given tissue
• It focuses on the 5’-ends of mRNAs, which
allows for the identification of transcription
start sites and may help in locating
promoters
24-7
Whole Chromosome Transcription Mapping
• High density whole chromosome transcriptional
mapping studies have shown a majority of
sequences in cytoplasmic poly(A) RNAs derive
from non-exon regions of human chromosomes
• Almost half of the transcription from these same
chromosomes is nonpolyadenylated
• Results indicate that the great majority of stable
nuclear and cytoplasmic transcripts in these
chromosomes come from regions outside exons
• Helps to explain the great differences between
species whose exons are almost identical
24-8
Transcription maps of 10 Human Chromosomes
24-9
Genomic Functional Profiling
• Genomic functional profiling can be
performed in several ways
– One is a type of mutation analysis called deletion
analysis: mutants are created by replacing genes one
at a time with an antibiotic resistance gene flanked
by oligomers serving as a barcode for that mutant
– A functional profile can be obtained by growing
the whole group of mutants together under
various conditions to see which mutants
disappear most rapidly
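The pooled-growth comparison can be sketched as follows. This is a minimal illustration, assuming barcode abundances have been measured as counts before and after growth under one condition. The mutant names, the counts, and the pseudocount of 1 are assumptions made for the example.

```python
from math import log2

def barcode_scores(before, after, pseudocount=1):
    """Log2 change in each barcode's share of the pool after growth."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for barcode, n_before in before.items():
        n_after = after.get(barcode, 0)
        # The pseudocount avoids log2(0) for mutants that vanish entirely
        frac_before = (n_before + pseudocount) / total_before
        frac_after = (n_after + pseudocount) / total_after
        scores[barcode] = log2(frac_after / frac_before)
    return scores

# Hypothetical barcode counts for three deletion mutants
before = {"del_geneX": 1000, "del_geneY": 1000, "del_geneZ": 1000}
after = {"del_geneX": 1500, "del_geneY": 1400, "del_geneZ": 100}

scores = barcode_scores(before, after)
# Most negative score first: the mutant that disappears most rapidly
ranked = sorted(scores, key=scores.get)
print(ranked)
```

A strongly negative score marks a deleted gene that appears to matter for growth under the tested condition.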
24-10
RNAi Analysis
• Genomic functional analysis of complex
organisms can also be done by inactivating
genes via RNAi
• An application of this approach targeting
the genes involved in early embryogenesis
in C. elegans has identified:
– 661 important genes
– 326 of these are involved in embryogenesis
24-11
Tissue-Specific Functional Profiling
• Tissue-specific expression profiling can be done by
examining a spectrum of mRNAs whose levels are
decreased by an exogenous miRNA
• Then compare to the spectrum of expression of
genes at the mRNA level in various tissues
• If the exogenous miRNA decreases the levels of mRNAs
that are naturally low in cells expressing that miRNA
– This suggests that the miRNA is at least a partial
cause of those natural low levels
• This type of analysis has implicated
– miR-124 in destabilizing mRNAs in brain tissue
– miR-1 in destabilizing mRNAs in muscle tissue
24-12
Locating Target Sites for Transcription Factors
• ChIP-chip analysis can be used to identify DNA-binding
sites for activators and other proteins
• For organisms with small genomes, all of the intergenic
regions can be included in the microarray
• If the genome is large, that is not practical
• To narrow the areas of interest, one can use CpG islands
– These are associated with gene control regions
– If timing/conditions of activator’s activity are known,
control regions of genes known to be activated at
those times, or under those conditions, can be used
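One way to pick out candidate CpG islands computationally is sketched below. This is not taken from the slides: it assumes the commonly cited working thresholds of a length of at least 200 bp, a GC content above 50%, and an observed/expected CpG ratio above 0.6. The function names and test sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # number of CpG dinucleotides
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, GC content, and CpG ratio thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp

# Artificial sequences chosen only to exercise the function
cpg_rich = "CG" * 100
at_rich = "AT" * 100
print(looks_like_cpg_island(cpg_rich), looks_like_cpg_island(at_rich))
```

Regions that pass such a filter could then be the ones spotted on the microarray instead of every intergenic region.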
24-13
Locating Target Sites for Transcription Factors
• Tag sequencing, or ChIP-seq, in which chromatin
pieces precipitated by ChIP are repeatedly
sequenced, can also be used to identify
transcription factor-binding sites
• Knowledge of the sequence of multiple mammalian
genomes allows one to narrow the search for
human transcription factor binding sites by
beginning with conserved regions of the genome
• In addition, it is easier to search for cis-regulatory
modules (CRMs), which contain several
transcription factor binding sites
24-14
Locating enhancers that bind
unknown proteins
• There are still many enhancers whose
protein partners are unknown
• In 2006, Pennacchio and colleagues started
the search for vertebrate enhancers by
looking for highly conserved non-coding
DNA regions
• The strategy had a remarkably high
success rate but has a drawback in that it
only detects highly conserved sequences
and not all important control regions are
conserved
24-15
Locating promoters that bind
unknown proteins
• Ren and colleagues performed a genome-wide
search for human promoters and were surprised
to find that many genes have alternative
promoters located hundreds of bases away from
their primary promoters
• Class II promoters can be identified using ChIP-chip
analysis with an anti-TAF1 antibody
• In one study using human fibroblasts, over 9,000
promoters were identified and over 1,600 genes
had multiple promoters
24-16
In Situ Expression Analysis
• The mouse can be used as a human
surrogate in large-scale expression
studies that would be ethically impossible
to perform on humans
• Scientists have studied the expression of
almost all the mouse orthologs of the
genes on human chromosome 21
– Expression followed through various stages of
embryonic development
– Catalogued the embryonic tissues in which
these genes are expressed
24-17
Single-Nucleotide Polymorphisms (SNPs)
• Single-nucleotide polymorphisms can
probably account for many genetic
conditions caused by single genes and
even some by multiple genes
• Might be able to predict response to a
drug
• Haplotype map with over 1 million SNPs
makes it easier to sort out important SNPs
from those with no effect
24-18
Structural Variation
• Structural variation is a prominent source of
variation in human genomes
– Insertions
– Deletions
– Inversions
– Rearrangements of DNA chunks
• Some structural variation can, in principle,
predispose certain people to contract diseases
– Some variation is presumably benign
– Some also is demonstrably beneficial
24-19
25.2 Proteomics
• The sum of all proteins produced by an
organism is its proteome
• Study of these proteins, or even of smaller
subsets of them, is called proteomics
• Such studies give a more accurate picture
of gene expression than transcriptomics
studies do
24-20
Protein Separations and Analysis
• Current research in proteomics requires first that
proteins be resolved, sometimes on a massive
scale
– Best tool for separation of many proteins at once is 2D gel electrophoresis
• After separation, proteins must be identified
– Best method of identification involves digestion of
proteins one by one with proteases
– Then identify the peptides by mass spectrometry
• In the future, microchips with antibodies
attached may allow analysis of proteins in
complex mixtures without separation
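The digestion step can be imitated in software, which is how predicted peptides are obtained for comparison with mass spectrometry data. The sketch below assumes trypsin as the protease (the slide says only "proteases") and uses the usual simplified rule that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P). The protein sequence is made up.

```python
def trypsin_digest(protein):
    """Split a protein into peptides using a simplified trypsin rule."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        # Cut after K or R, except when the following residue is P
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

# Hypothetical protein sequence in one-letter code
peptides = trypsin_digest("MKWVTFRPLLKAA")
print(peptides)
```

Matching the measured masses of real peptides against the masses of such predicted peptides is what allows a protein to be identified.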
24-21
Quantitative Proteomics
• To determine the changes in protein levels upon
perturbation of a cell culture, one can label the
cells under the first condition with a light isotopic
tag, and under the second condition with a heavy
isotopic tag
• If the proteins are labeled in vivo, the cell cultures
can be mixed, the proteins can be extracted and
fragmented by proteolysis and upon further
separation can be subjected to mass spectrometry
• The ratio of heavy to light peak areas will reflect
the change in protein concentration as the growth
conditions change
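The ratio calculation can be sketched as follows. This is a minimal illustration, assuming each protein has one or more peptides with a measured (light, heavy) pair of peak areas. Summing the areas per protein before dividing is one simple choice among several; the protein names and numbers are hypothetical.

```python
def heavy_to_light_ratios(peak_areas):
    """Return the heavy/light peak-area ratio for each protein."""
    ratios = {}
    for protein, peptides in peak_areas.items():
        light = sum(l for l, h in peptides)
        heavy = sum(h for l, h in peptides)
        if light > 0:  # skip proteins with no light signal
            ratios[protein] = heavy / light
    return ratios

# Hypothetical (light, heavy) peak areas for the peptides of two proteins
peak_areas = {
    "proteinA": [(100.0, 200.0), (50.0, 100.0)],
    "proteinB": [(80.0, 40.0)],
}
ratios = heavy_to_light_ratios(peak_areas)
print(ratios)
```

With the first condition labeled light and the second heavy, a ratio above 1 indicates that the protein became more abundant under the second condition, and a ratio below 1 that it became less abundant.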
24-22
Comparative Proteomics
• What makes a worm a worm and a fly a fly?
• Mass spectrometry data can be used to
compare protein concentrations in two different
organisms
• This type of analysis was applied to C. elegans
and Drosophila to reveal that the concentrations
of orthologous proteins are correlated much
better than the orthologous mRNAs in the two
organisms
24-23
Protein Interactions
• Most proteins work with other proteins to
perform their functions
• Several techniques are available to probe
these interactions
• Yeast two-hybrid analysis has been used
for some time; other methods are now
available
– Protein microarrays
– Immunoaffinity chromatography with mass
spectrometry
– Other combinations
24-24
Detecting Protein-Protein Interactions
24-25
25.3 Bioinformatics
• Bioinformatics involves the building and
use of biological databases
– Some of these databases contain the DNA
sequences of genomes
– Essential for mining the massive amounts of
biological data for meaningful knowledge
about gene structure and expression
24-26
Finding Regulatory Motifs in
Mammalian Genomes
• Using computational biology techniques,
Lander and Kellis have discovered highly
conserved sequence motifs in 4 mammalian
species, including humans
– In the promoter regions, these motifs probably
represent binding sites for transcription
factors
– In the 3’-UTRs, these motifs probably represent
binding sites for miRNAs
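The idea of searching for motifs conserved across species can be illustrated with a toy example. This is not the method used in the study described above: it simply reports the short words (k-mers) present in the corresponding region of every species. The species labels, the sequences, and the choice of k = 8 are assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def conserved_kmers(sequences, k):
    """Return the k-mers found in every sequence of the collection."""
    shared = None
    for seq in sequences.values():
        found = kmers(seq.upper(), k)
        shared = found if shared is None else shared & found
    return shared or set()

# Hypothetical promoter fragments from four species
promoters = {
    "human": "AATTTGACGTCAGGCT",
    "species_2": "CCTGACGTCATTAGGA",
    "species_3": "GGATGACGTCACCTTA",
    "species_4": "TGACGTCAAACCGGTT",
}
conserved = conserved_kmers(promoters, 8)
print(conserved)
```

Only the 8-mer shared by all four fragments survives the intersection, while the diverged flanking sequence drops out.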
24-27
Using the Databases
• The National Center for Biotechnology Information
(NCBI) website contains a vast store of
biological information, including genomic and
proteomic data
• Start with a sequence and discover the gene to
which it belongs, then compare that sequence
with that of similar genes
• Query the database with a topic for information
• View protein structures in 3D by rotating the
structure on your computer screen
24-28