DNA/Protein structure-function analysis and Prediction


CENTRE FOR INTEGRATIVE BIOINFORMATICS VU
Bioinformatics Master Course:
DNA/Protein Structure-Function Analysis
and Prediction
Lecture 13: Protein Function
Sequence-Structure-Function
• Sequence to structure: threading; ab initio prediction and folding (impossible but for the smallest structures)
• Sequence to function: homology searching (BLAST)
• Structure to function: function prediction from structure (very difficult)
Functional Genomics – Systems Biology
Genome
Expressome
Proteome (tertiary structure, fold)
Metabolome (metabolomics, fluxomics)
Systems Biology
is the study of the interactions between the components of
a biological system, and how these interactions give rise to
the function and behaviour of that system (for example, the
enzymes and metabolites in a metabolic pathway). The aim
is to understand the system quantitatively and to be able to
predict its behaviour over time.
• The interactions are nonlinear.
• The interactions give rise to emergent properties, i.e.
properties that cannot be explained by the individual components of
the system alone.
• Biological processes span many time-scales, many
compartments and many interconnected network levels
(e.g. regulation, signalling, expression, ...).
Systems Biology
Understanding in systems biology is often achieved through modeling
and simulation of the system's components and
interactions.
Often, the 'four Ms' cycle is adopted:
• Measuring
• Mining
• Modeling
• Manipulating
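Modeling and simulation can be made concrete with a small sketch. The code below integrates a hypothetical two-step pathway S → I → P with the explicit Euler method: the first step follows first-order mass-action kinetics (rate constant k1) and the second is a saturable Michaelis-Menten step (vmax, km), which makes the system nonlinear. The pathway, the function name and all parameter values are illustrative assumptions, not a model from this course.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (time, S, I, P)


def simulate_pathway(s0: float, k1: float, vmax: float, km: float,
                     dt: float = 0.01, steps: int = 1000) -> List[State]:
    """Explicit Euler integration of a hypothetical pathway S -> I -> P.

    Step 1 (S -> I) uses mass-action kinetics: v1 = k1 * S.
    Step 2 (I -> P) is saturable (Michaelis-Menten): v2 = vmax * I / (km + I).
    """
    s, i, p = s0, 0.0, 0.0
    trajectory: List[State] = [(0.0, s, i, p)]
    for n in range(1, steps + 1):
        v1 = k1 * s                  # flux S -> I
        v2 = vmax * i / (km + i)     # flux I -> P (nonlinear in I)
        s -= v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
        trajectory.append((n * dt, s, i, p))
    return trajectory


if __name__ == "__main__":
    # Print every 200th time point of a hypothetical run.
    for t, s, i, p in simulate_pathway(10.0, k1=0.5, vmax=1.0, km=2.0)[::200]:
        print(f"t={t:5.2f}  S={s:6.3f}  I={i:6.3f}  P={p:6.3f}")
```

Euler integration is used here only for brevity; for larger or stiff models a dedicated ODE solver (for example SciPy's `solve_ivp`) would normally be preferred.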
'The silicon cell'
(some people call it the 'silly-con' cell)
A system response
Apoptosis: programmed cell death
Necrosis: accidental cell death
Human vs. Yeast: 'Comparative metabolomics'
We need to be able to do automatic pathway comparison (pathway alignment).
This pathway diagram shows a comparison of pathways in (left) Homo
sapiens (human) and (right) Saccharomyces cerevisiae (baker's yeast).
Changes have occurred both in the controlling enzymes (red square boxes) and in
the pathway itself (yeast has one altered ('overtaking') path in the graph).
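True pathway alignment also has to match reactions whose enzymes or topology differ, but a minimal baseline comparison can be sketched by treating each pathway as a set of directed (substrate, product) edges and reporting the shared and organism-specific edges together with a Jaccard similarity. The function name and the toy edge lists are hypothetical, and this is an edge-overlap comparison, not a full alignment algorithm.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (substrate, product)


def compare_pathways(edges_a: Iterable[Edge], edges_b: Iterable[Edge]) -> Dict[str, object]:
    """Compare two pathways represented as sets of directed edges."""
    a: Set[Edge] = set(edges_a)
    b: Set[Edge] = set(edges_b)
    shared = a & b
    union = a | b
    return {
        "shared": shared,
        "only_a": a - b,   # edges present only in the first organism
        "only_b": b - a,   # edges present only in the second organism
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Toy pathways: organism B skips C with an alternative ("overtaking") edge B -> D.
    org_a = [("A", "B"), ("B", "C"), ("C", "D")]
    org_b = [("A", "B"), ("B", "D"), ("C", "D")]
    result = compare_pathways(org_a, org_b)
    print(sorted(result["only_a"]), sorted(result["only_b"]), result["jaccard"])
```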
The citric-acid cycle
http://en.wikipedia.org/wiki/Krebs_cycle
The citric-acid cycle
M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol, 7, 281-29 (1999)
Fig. 1. (a) A graphical representation of the reactions of the
citric-acid cycle (CAC), including the connections with
pyruvate and phosphoenolpyruvate, and the glyoxylate shunt.
When there are two enzymes that are not homologous to
each other but that catalyse the same reaction (nonhomologous gene displacement), one is marked with a solid
line and the other with a dashed line. The oxidative direction
is clockwise. The enzymes with their EC numbers are as
follows: 1, citrate synthase (4.1.3.7); 2, aconitase (4.2.1.3); 3,
isocitrate dehydrogenase (1.1.1.42); 4, 2-ketoglutarate
dehydrogenase (solid line; 1.2.4.2 and 2.3.1.61) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line;
1.2.7.3); 5, succinyl-CoA synthetase (solid line; 6.2.1.5) or
succinyl-CoA–acetoacetate-CoA transferase (dashed line;
2.8.3.5); 6, succinate dehydrogenase or fumarate reductase
(1.3.99.1); 7, fumarase (4.2.1.2) class I (dashed line) and
class II (solid line); 8, bacterial-type malate dehydrogenase
(solid line) or archaeal-type malate dehydrogenase (dashed
line) (1.1.1.37); 9, isocitrate lyase (4.1.3.1); 10, malate
synthase (4.1.3.2); 11, phosphoenolpyruvate carboxykinase
(4.1.1.49) or phosphoenolpyruvate carboxylase (4.1.1.32);
12, malic enzyme (1.1.1.40 or 1.1.1.38); 13, pyruvate
carboxylase or oxaloacetate decarboxylase (6.4.1.1); 14,
pyruvate dehydrogenase (solid line; 1.2.4.1 and 2.3.1.12) and
pyruvate ferredoxin oxidoreductase (dashed line; 1.2.7.1).
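The caption's enzyme list can be turned into a simple completeness check, similar in spirit to what panel (b) of the figure shows per species. In this sketch each step of the oxidative cycle (steps 1-8) is a list of alternatives, and each alternative is the set of EC numbers it requires, so a multi-component enzyme such as 2-ketoglutarate dehydrogenase (1.2.4.2 and 2.3.1.61) only counts when all parts are present, while non-homologous gene displacements count as interchangeable alternatives. The EC numbers come from the caption; the data layout, the function name and the assumption that a genome can be summarized as a set of EC numbers are illustrative.

```python
from typing import Dict, FrozenSet, Iterable, List

# Oxidative CAC steps 1-8 with EC numbers from the Fig. 1 caption.
# Each step lists alternative enzymes; each alternative is the set of EC numbers it requires.
CAC_STEPS: Dict[int, List[FrozenSet[str]]] = {
    1: [frozenset({"4.1.3.7"})],               # citrate synthase
    2: [frozenset({"4.2.1.3"})],               # aconitase
    3: [frozenset({"1.1.1.42"})],              # isocitrate dehydrogenase
    4: [frozenset({"1.2.4.2", "2.3.1.61"}),    # 2-ketoglutarate dehydrogenase
        frozenset({"1.2.7.3"})],               # 2-ketoglutarate ferredoxin oxidoreductase
    5: [frozenset({"6.2.1.5"}),                # succinyl-CoA synthetase
        frozenset({"2.8.3.5"})],               # succinyl-CoA-acetoacetate-CoA transferase
    6: [frozenset({"1.3.99.1"})],              # succinate dehydrogenase / fumarate reductase
    7: [frozenset({"4.2.1.2"})],               # fumarase (class I or class II)
    8: [frozenset({"1.1.1.37"})],              # malate dehydrogenase
}


def missing_cac_steps(genome_ec: Iterable[str]) -> List[int]:
    """Return the CAC steps for which no complete alternative is encoded in the genome."""
    present = set(genome_ec)
    return [step for step, alternatives in CAC_STEPS.items()
            if not any(alt <= present for alt in alternatives)]


if __name__ == "__main__":
    # Hypothetical genome: ferredoxin-dependent enzyme for step 4, no citrate synthase.
    genome = {"4.2.1.3", "1.1.1.42", "1.2.7.3", "6.2.1.5", "1.3.99.1", "4.2.1.2", "1.1.1.37"}
    print(missing_cac_steps(genome))
```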
The citric-acid cycle
(b) Individual species might not
have a complete CAC. This
diagram shows the genes for the
CAC for each unicellular species
for which a genome sequence has
been published, together with the
phylogeny of the species. The
distance-based phylogeny was
constructed using the fraction of
genes shared between genomes
as a similarity criterion (ref. 29). The
major kingdoms of life are
indicated in red (Archaea), blue
(Bacteria) and yellow (Eukarya).
Question marks represent
reactions for which there is
biochemical evidence in the
species itself or in a related
species but for which no genes
could be found. Genes that lie in a
single operon are shown in the
same color. Genes were assumed
to be located in a single operon
when they were transcribed in the
same direction and the stretches
of non-coding DNA separating
them were less than 50
nucleotides in length.
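The operon rule stated at the end of the caption is simple enough to write down directly. This sketch assumes genes are given as (name, start, end, strand) tuples with 1-based inclusive coordinates on a single replicon; consecutive genes are grouped into one operon when they lie on the same strand and the non-coding stretch between them is shorter than 50 nucleotides. The tuple layout, the function name and the example genes are hypothetical.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int, str]  # (name, start, end, strand), 1-based inclusive coordinates


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group neighbouring genes into putative operons.

    Two consecutive genes join the same operon when they are on the same strand
    and the intergenic (non-coding) stretch between them is < max_gap nucleotides.
    Overlapping genes give a negative gap and are therefore joined as well.
    """
    operons: List[List[str]] = []
    previous = None
    for gene in sorted(genes, key=lambda g: g[1]):
        name, start, _end, strand = gene
        same_unit = (previous is not None
                     and strand == previous[3]
                     and start - previous[2] - 1 < max_gap)  # non-coding nucleotides in between
        if same_unit:
            operons[-1].append(name)
        else:
            operons.append([name])
        previous = gene
    return operons


if __name__ == "__main__":
    toy_genes = [("geneA", 100, 500, "+"), ("geneB", 520, 900, "+"),
                 ("geneC", 960, 2000, "+"), ("geneD", 2010, 3000, "-")]
    print(predict_operons(toy_genes))
```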
Experimental
• Structural genomics
• Functional genomics
• Protein-protein interaction
• Metabolic pathways
• Expression data
Communicability: Functional Genomics
• Interpretation of genome-scale gene expression data
DNA-chip data → external program → cluster of coregulated genes (gene 1, gene 2, ..., gene n) → PFMP query → pathways affected (pathway 1, pathway 2, ...)
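How the PFMP query maps a gene cluster to affected pathways is not specified here. As a stand-in, one common way to ask which pathways a cluster of coregulated genes touches is an over-representation test: count how many cluster genes fall in each pathway and compute a hypergeometric tail probability. The sketch below, with hypothetical function names and toy gene identifiers, illustrates that technique; it is not the PFMP method.

```python
from math import comb
from typing import Dict, Iterable, List, Tuple


def hypergeom_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = sum(comb(successes, i) * comb(population - successes, draws - i)
                for i in range(k, upper + 1))
    return total / comb(population, draws)


def affected_pathways(cluster: Iterable[str], pathways: Dict[str, Iterable[str]],
                      genome_size: int) -> List[Tuple[str, int, float]]:
    """Rank pathways by over-representation of cluster genes (smallest p-value first)."""
    genes = set(cluster)
    results = []
    for name, members in pathways.items():
        member_set = set(members)
        hits = len(genes & member_set)
        if hits:
            p = hypergeom_tail(hits, genome_size, len(member_set), len(genes))
            results.append((name, hits, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    toy_pathways = {"pathway 1": ["g1", "g2", "g3", "g4", "g5"], "pathway 2": ["g6", "g50"]}
    for name, hits, p in affected_pathways(["g1", "g2", "g3", "g4", "g6"], toy_pathways, 100):
        print(f"{name}: {hits} genes, p = {p:.2e}")
```

When many pathways are tested, the p-values would also need a multiple-testing correction (for example Bonferroni), which this sketch omits.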
Communicability: Functional Genomics
• Interpretation of genome-scale gene expression data
DNA-chip data → external programs → cluster of coregulated genes (gene 1, gene 2, ..., gene n) → pattern discovery → putative regulatory sites (for gene 1, gene 2, ...) → PFMP query → similarities with known regulatory sites (site 1: Factor 1; site 2: Factor 2; ...)
Other Issues
• Partial information (indirect interactions) and
subsequent filling of the missing steps
• Negative results (elements that have been shown not
to interact, enzymes missing in an organism)
• Putative interactions resulting from computational
analyses
Protein function categories
• Catalysis (enzymes)
• Binding – transport (active/passive)
• Protein-DNA/RNA binding (e.g. histones, transcription factors)
• Protein-protein interactions (e.g. antibody-lysozyme)
(experimentally determined by yeast two-hybrid (Y2H) or bacterial
two-hybrid (B2H) screening)
• Protein-fatty acid binding (e.g. apolipoproteins)
• Protein – small molecules (drug interaction, structure decoding)
• Structural component (e.g. -crystallin)
• Regulation
• Signalling
• Transcription regulation
• Immune system
• Motor proteins (actin/myosin)
Catalytic properties of enzymes
Michaelis-Menten equation:
$V = \frac{V_{max}\,[S]}{K_m + [S]}$
Reaction scheme: E + S ⇌ ES → E + P (the ES → E + P step has rate constant kcat)
(Figure: reaction rate V (moles/s) plotted against [S]; V approaches Vmax at high [S], and V = Vmax/2 at [S] = Km)
E = enzyme
S = substrate
ES = enzyme-substrate complex
P = product
Km = Michaelis constant
kcat = catalytic rate constant (turnover number)
kcat/Km = specificity constant (useful for comparison)
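The Michaelis-Menten relation and the derived constants can be checked numerically. The helper functions below are hypothetical and the parameter values are made up; they show that V = Vmax/2 when [S] = Km, that V approaches Vmax at saturating [S], and how kcat/Km is used to compare enzymes. The relation Vmax = kcat × [E]total assumes the simple single-substrate mechanism E + S ⇌ ES → E + P.

```python
def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate V = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def vmax_from_kcat(kcat: float, enzyme_total: float) -> float:
    """Vmax = kcat * [E]total for the simple mechanism E + S <-> ES -> E + P."""
    return kcat * enzyme_total


def specificity_constant(kcat: float, km: float) -> float:
    """kcat / Km, often used to compare enzymes or substrates."""
    return kcat / km


if __name__ == "__main__":
    vmax = vmax_from_kcat(kcat=100.0, enzyme_total=1e-6)   # hypothetical values
    km = 2e-3
    print(mm_rate(km, vmax, km) / vmax)    # fraction of Vmax at [S] = Km
    print(mm_rate(1.0, vmax, km) / vmax)   # fraction of Vmax at saturating [S]
    # Two hypothetical enzymes: the higher kcat/Km is more efficient at low [S].
    print(specificity_constant(100.0, 2e-3), specificity_constant(50.0, 1e-4))
```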
Protein interaction domains
http://pawsonlab.mshri.on.ca/html/domains.html
Energy difference upon binding
Examples of protein interactions (and functional
importance) include:
• Protein – protein
(pathway analysis);
• Protein – small molecules
(drug interaction, structure decoding);
• Protein – peptides, DNA/RNA
(function analysis)
The change in Gibbs free energy (ΔG) of the protein-ligand
binding interaction can be measured and expressed as:
$\Delta G = \Delta H - T\,\Delta S$
(ΔH = change in enthalpy, ΔS = change in entropy, T = absolute temperature)
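The Gibbs free energy relation ΔG = ΔH - TΔS can be evaluated directly, and the standard binding free energy can be converted into a dissociation constant through ΔG° = RT ln Kd (1 M standard state). The function names and the example enthalpy and entropy values below are hypothetical; units are kJ/mol for ΔH and ΔG, kJ/(mol·K) for ΔS and K for T.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def delta_g(delta_h: float, delta_s: float, temperature: float) -> float:
    """Gibbs free energy change dG = dH - T*dS (kJ/mol)."""
    return delta_h - temperature * delta_s


def dissociation_constant(dg_standard: float, temperature: float) -> float:
    """Kd (in M) from the standard binding free energy via dG0 = RT ln Kd."""
    return math.exp(dg_standard / (R * temperature))


if __name__ == "__main__":
    # Hypothetical binding event: favourable enthalpy, unfavourable entropy.
    dg = delta_g(delta_h=-50.0, delta_s=-0.1, temperature=298.15)
    print(f"dG = {dg:.2f} kJ/mol, Kd = {dissociation_constant(dg, 298.15):.2e} M")
```

A negative ΔG corresponds to favourable binding; the same ΔG can arise from a large favourable ΔH offset by an entropy loss, or from a favourable ΔS, which is why both terms are usually reported.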
Protein function
• Many proteins combine functions
• Some immunoglobulin structures are thought to have
more than 100 different functions (and active/binding
sites)
• Alternative splicing can generate (partially) alternative
structures
Protein function & Interaction
Active site /
binding cleft
Shape complementarity
Protein function evolution
Chymotrypsin
How to infer function
• Experiment
• Deduction from sequence
• Multiple sequence alignment – conservation patterns
• Homology searching
• Deduction from structure
• Threading
• Structure-structure comparison
• Homology modelling
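Conservation patterns in a multiple sequence alignment are often quantified per column. One standard measure, sketched below, is the Shannon entropy of the residue distribution in each column: 0 bits means the column is fully conserved, and higher values mean more variability. Ignoring gaps, the function name and the toy alignment are assumptions of this sketch; many published conservation scores also weight sequences or use substitution matrices.

```python
import math
from collections import Counter
from typing import List


def column_entropies(msa: List[str]) -> List[float]:
    """Shannon entropy (bits) of each alignment column; gaps ('-') are ignored.

    0.0 means fully conserved; all-gap columns return NaN.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    scores: List[float] = []
    for j in range(length):
        residues = [seq[j] for seq in msa if seq[j] != "-"]
        n = len(residues)
        if n == 0:
            scores.append(float("nan"))
            continue
        counts = Counter(residues)
        scores.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return scores


if __name__ == "__main__":
    toy_msa = ["ACDE", "ACDF", "ACEG"]   # hypothetical aligned sequences
    print([round(h, 3) for h in column_entropies(toy_msa)])
```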
Cholesterol Biosynthesis:
Cholesterol biosynthesis occurs primarily in
eukaryotic cells. Cholesterol is necessary for membrane
synthesis and is a precursor for steroid hormone
production as well as for vitamin D. While the
pathway was previously assumed to be
localized in the cytosol and ER, more recent
evidence suggests that many of the
enzymes in the pathway exist largely, if not
exclusively, in the peroxisome (the enzymes
listed in blue in the pathway to the left are
thought to be at least partly peroxisomal).
Patients with peroxisome biogenesis disorders
(PBDs) have a variable deficiency in cholesterol
biosynthesis.
Cholesterol Biosynthesis:
from acetyl-CoA to mevalonate
Mevalonate plays a role in epithelial cancers:
it can inhibit EGFR
Epidermal Growth Factor as a Clinical
Target in Cancer
A malignant tumour is the product of uncontrolled cell proliferation. Cell
growth is controlled by a delicate balance between growth-promoting and
growth-inhibiting factors. In normal tissue the production and activity of
these factors result in differentiated cells growing in a controlled and
regulated manner that maintains the normal integrity and functioning of the
organ. The malignant cell has evaded this control; the natural balance is
disturbed (via a variety of mechanisms) and unregulated, aberrant cell
growth occurs. A key driver for growth is the epidermal growth factor
(EGF) and the receptor for EGF (the EGFR) has been implicated in the
development and progression of a number of human solid tumours
including those of the lung, breast, prostate, colon, ovary, head and neck.
Energy housekeeping:
Adenosine diphosphate (ADP) – Adenosine triphosphate (ATP)
Chemical Reaction
Enzymatic Catalysis
Gene Expression
Inhibition
Metabolic Pathway: Proline Biosynthesis
Transcriptional Regulation
Methionine Biosynthesis in E. coli
Shortcut Representation
High-level Interaction
Levels of Resolution
Cholesterol Biosynthesis
SREBP Pathway
Signal Transduction
Important signalling pathways:
MAP kinase (MAPK) signalling
pathway or TGF-β pathway
Transport
Phosphate Utilization in Yeast
Multiple Levels of Regulation
• Gene expression
• Protein activity
• Protein intracellular location
• Protein degradation
• Substrate transport
Graphical Representation –
Gene Expression
Experimental Data –
Gene Expression
Experimental Data –
Transcriptional Regulation
Experimental Data –
Transcriptional Regulation
Transcriptional Regulation
Integrated View
Pathways and Pathway Diagrams
• Pathways
• Set of nodes (entities)
and edges (associations)
• Pathway Diagrams
• XY coordinates
• Node splitting allowed
• Multiple views of the
same pathway
• Different abstraction
levels
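The distinction between a pathway (entities and associations) and a pathway diagram (one drawing of it) can be expressed as two data structures. In this hypothetical sketch, a `Pathway` holds nodes and edges, while a `PathwayDiagram` places glyphs at XY coordinates; several glyphs may refer to the same entity (node splitting, e.g. drawing a common cofactor more than once), and several diagrams can share one pathway to provide multiple views or abstraction levels. All class and method names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Pathway:
    """A pathway as a set of entities (nodes) and associations (edges)."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str]] = field(default_factory=set)

    def add_edge(self, source: str, target: str) -> None:
        self.nodes.update((source, target))
        self.edges.add((source, target))


@dataclass
class Glyph:
    """One drawn occurrence of an entity at a position in a diagram."""
    entity: str
    x: float
    y: float


@dataclass
class PathwayDiagram:
    """A view of a pathway; the same entity may be drawn as several glyphs."""
    pathway: Pathway
    glyphs: Dict[str, Glyph] = field(default_factory=dict)

    def place(self, glyph_id: str, entity: str, x: float, y: float) -> None:
        if entity not in self.pathway.nodes:
            raise ValueError(f"{entity!r} is not a node of the pathway")
        self.glyphs[glyph_id] = Glyph(entity, x, y)

    def glyphs_for(self, entity: str) -> List[str]:
        """IDs of all glyphs that draw the given entity (more than one if split)."""
        return [gid for gid, g in self.glyphs.items() if g.entity == entity]
```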
Metabolic networks
Glycolysis and Gluconeogenesis
KEGG database (Japan)
Gene Ontology (GO)
• Not a genome sequence database
• Developing three structured, controlled vocabularies
(ontologies) to describe gene products in terms of:
• biological process
• cellular component
• molecular function
in a species-independent manner
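GO terms are organized as a directed acyclic graph in which a term can have several parents, and an annotation to a specific term implies annotation to all of its ancestors (GO's "true path rule"). The sketch below propagates gene annotations up a toy, hand-made hierarchy; the term names, the parent map and the function names are illustrative and do not reproduce the actual GO graph or its identifiers.

```python
from typing import Dict, Iterable, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent links upwards (depth-first)."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations: Dict[str, Iterable[str]],
              parents: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Extend each gene's direct annotations with all ancestor terms."""
    result: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        closed: Set[str] = set()
        for term in terms:
            closed.add(term)
            closed |= ancestors(term, parents)
        result[gene] = closed
    return result


if __name__ == "__main__":
    # Toy hierarchy, not the real GO structure.
    toy_parents = {
        "protein kinase activity": ["kinase activity"],
        "kinase activity": ["catalytic activity"],
        "catalytic activity": ["molecular_function"],
    }
    print(propagate({"geneX": ["protein kinase activity"]}, toy_parents))
```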
The GO ontology
Gene Ontology Members
• FlyBase - database for the fruitfly Drosophila melanogaster
• Berkeley Drosophila Genome Project (BDGP) - Drosophila informatics; GO database & software,
Sequence Ontology development
• Saccharomyces Genome Database (SGD) - database for the budding yeast Saccharomyces cerevisiae
• Mouse Genome Database (MGD) & Gene Expression Database (GXD) - databases for the mouse Mus
musculus
• The Arabidopsis Information Resource (TAIR) - database for the brassica family plant Arabidopsis
thaliana
• WormBase - database for the nematode Caenorhabditis elegans
• EBI GOA project : annotation of UniProt (Swiss-Prot/TrEMBL/PIR) and InterPro databases
• Rat Genome Database (RGD) - database for the rat Rattus norvegicus
• DictyBase - informatics resource for the slime mold Dictyostelium discoideum
• GeneDB S. pombe - database for the fission yeast Schizosaccharomyces pombe (part of the Pathogen
Sequencing Unit at the Wellcome Trust Sanger Institute)
• GeneDB for protozoa - databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei,
and several other protozoan parasites (part of the Pathogen Sequencing Unit at the Wellcome Trust
Sanger Institute)
• Genome Knowledge Base (GK) - a collaboration between Cold Spring Harbor Laboratory and EBI
• TIGR - The Institute for Genomic Research
• Gramene - A Comparative Mapping Resource for Monocots
• Compugen (with its Internet Research Engine)
• The Zebrafish Information Network (ZFIN) - reference datasets and information on Danio rerio