http://creativecommons.org/licenses/by-sa/2.0/
Integrating the Data
Prof: Rui Alves
[email protected]
973702406
Dept Ciencies Mediques Basiques,
1st Floor, Room 1.08
Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/
Course: http://10.100.14.36/Student_Server/
Outline
• Methods for reconstruction of functional
protein networks
– Why is it important?
• Methods for reconstruction of physical
protein interactions
Proteins do not work alone!
Finding the social environment of a
protein
• Finding out what a protein does is not
enough
– Reductase, ok, but of what? (super-mouse)
• There is an incredible amount of
information available regarding the biology
of many organisms
– Sequences, omics, pathways, etc…
Integrating the information is important
for network reconstruction
• If we can integrate all the information
available for a given protein/gene, then we
are likely to be able to predict its social
network
• From here to reconstructing the causal set of
interactions in the network, it is only one
step
– Who does what to whom
Methods for network
reconstruction
• Mapping genes onto known pathways
– If a gene is orthologous to genes in other
organisms for which we know the pathways
and circuits, then we can assume that it works
in the same circuit in the new organism
Find a gene in a new genome
[Figure: the sequence of ste20 is used to find the orthologue gene in a sequenced genome.]
Reconstruct same pathway in new
organism
[Figure: Ste20 in the new organism.]
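The mapping step can be sketched in a few lines, assuming the orthologues have already been identified (for example by a sequence search). This is only an illustration: the function name, the data layout and the gene names of the new organism are hypothetical, and the reference edges are example data.

```python
def transfer_pathway(reference_edges, orthologues):
    """Copy pathway edges from a reference organism to a new organism.

    reference_edges: list of (upstream, downstream) gene pairs in the reference.
    orthologues: dict mapping a reference gene to its orthologue in the new organism.
    Returns the transferred edges and the reference genes with no orthologue.
    """
    transferred = []
    missing = set()
    for upstream, downstream in reference_edges:
        new_up = orthologues.get(upstream)
        new_down = orthologues.get(downstream)
        if new_up is None:
            missing.add(upstream)
        if new_down is None:
            missing.add(downstream)
        if new_up is not None and new_down is not None:
            transferred.append((new_up, new_down))
    return transferred, missing


# Illustrative data: the gene names of the new organism are made up.
reference_edges = [("Ste20", "Ste11"), ("Ste11", "Ste7"), ("Ste7", "Fus3")]
orthologues = {"Ste20": "new_ste20", "Ste11": "new_ste11", "Ste7": "new_ste7"}

edges, missing = transfer_pathway(reference_edges, orthologues)
print(edges)    # edges whose two genes both have an orthologue
print(missing)  # reference genes that could not be mapped
```

Genes reported as missing mark the parts of the circuit that cannot be assumed to exist in the new organism.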
Methods for network
reconstruction
• Mapping genes onto known pathways
• Using text analysis
– Scientific literature has accumulated over centuries now.
– No one can know everything and read everything.
– However, information is buried in there
– Mining that information can assist in network
reconstruction
Publication databases are a source
of information
Meta text databases create network
models from publication analysis
iHOP is a sophisticated context
analysis engine
How does meta-text analysis create
networks?
[Figure: your genes are submitted to a server/program whose scripts query a literature database and return the list of entries mentioning your gene. A gene names database (e.g. Ste20) provides the gene list, and a language rules database (e.g. activate, inhibit, rescue) provides the rule list.]
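A rough sketch of how a gene list and a rule list could be combined to pull candidate interactions out of sentences. This is an assumption about one possible implementation, not how iHOP or any particular server works; the function name and example sentences are made up, and word order is taken at face value, so passive sentences are not interpreted correctly.

```python
import re


def mine_interactions(sentences, gene_names, action_words):
    """Return (gene, action, gene) triples in the order they appear in a sentence."""
    # Longest names first so that a short name never shadows a longer one.
    genes = "|".join(re.escape(g) for g in sorted(gene_names, key=len, reverse=True))
    actions = "|".join(re.escape(a) for a in action_words)
    # gene ... action word (plus any suffix such as "s" or "d") ... gene,
    # without crossing a full stop.
    pattern = re.compile(
        rf"\b({genes})\b[^.]*?\b({actions})\w*\b[^.]*?\b({genes})\b",
        re.IGNORECASE,
    )
    triples = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            triples.append((match.group(1), match.group(2).lower(), match.group(3)))
    return triples


sentences = [
    "Ste20 activates Ste11 in response to pheromone.",
    "Ste11 was cloned some time ago.",
    "Fus3 is inhibited by Msg5.",
]
print(mine_interactions(sentences,
                        ["Ste20", "Ste11", "Fus3", "Msg5"],
                        ["activate", "inhibit", "rescue"]))
```

The second sentence mentions only one gene and is skipped; the third is reported with the genes in textual order, which shows why language rules beyond simple keyword matching are needed.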
Problems with this set up
• Delay with respect to available information
• Disregards a lot of information available
over the web
Text Miner will address this
Text Miner
Things to do
• Statistical Significance
– Internal controls
– Overall controls
• Sentence Mining
– Definition of an action-word ontology to help
automated function mining
• Graphical Drawing
– Allowing for mouse drag and drop
• Selector for interactions that are to be trusted
and included in the model
Problems with this set up
• Slow: analysis and document retrieval are
done live
– In the future there will be an option so that, if a
search has been done by someone before, the
user will be able to reuse it instead of doing a
live search
• There is more “junk info”
– However you can control that by selecting the
sources of information you want to use
Methods for network
reconstruction
• Mapping genes onto known pathways
• Meta text analysis
• Evolutionary based protein interaction
prediction
– Proteins that work together (i.e. belong to the
same close social network) evolve together
– Ergo, proteins that show co-evolution are
likely to work together
Proteins that have coevolved share
a function
• If protein A has co-evolved with protein B, they
are likely to be involved in the same process
• Looking for proteins that coevolved will help
predict the social networks of proteins
• There are many methods to look for co-evolution
of proteins
– Phylogenetic profiling, gene neighbourhoods,
gene fusion events, phylogenetic trees…
Using phylogenetic profiles to predict
protein interactions
[Figure: your sequence (A) is submitted to a server/program that searches a database of proteins in fully sequenced genomes for a homologue of the target in Genome 1, Genome 2, …, and stores the result in a database of profiles for each protein in each organism.]

Protein id   Genome 1   Genome 2   …
A            Y          N          …
B            N          Y          …
C            Y          N          …
…            …          …          …

Calculate the coincidence index of each profile with that of A (in the figure: A 1, C 0.9, B 0.11).

Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore interact.

Similarly, if protein A is absent in all genomes in which protein B is present, there is a likelihood that they perform the same function!
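A minimal sketch of a coincidence index between two phylogenetic profiles. The definition used here (fraction of genomes in which both proteins are present or both are absent) is an assumption made for illustration; the slides do not give the exact formula, and the profiles below are made up.

```python
def coincidence_index(profile_a, profile_b):
    """Fraction of genomes in which two proteins are both present or both absent."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(profile_a, profile_b) if a == b)
    return agreements / len(profile_a)


def rank_partners(query, profiles):
    """Sort all other proteins by decreasing coincidence with the query."""
    scores = {
        name: coincidence_index(profiles[query], profile)
        for name, profile in profiles.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up presence (1) / absence (0) profiles over five genomes.
profiles = {
    "A": [1, 0, 1, 1, 0],
    "B": [0, 1, 0, 0, 1],
    "C": [1, 0, 1, 0, 0],
}
print(rank_partners("A", profiles))
```

With this definition a value near 1 means the two proteins appear and disappear together, while a value near 0 corresponds to the complementary pattern, where one protein is present exactly where the other is absent.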
Phylogenetic coincidence server
• We have one that will be up in a few
months for yeast, coli, man, chimp,
candida and xanthus.
Synteny/Conservation of gene
neighborhoods
[Figure: the order of the genes for Proteins A, B, C and D along Genome 1, Genome 2, Genome 3, …; in Genome 3 the order is B, A, C, D.]
Which of these proteins interact?
Proteins A and B are in a conserved relative position in most genomes, which is an indication that they are likely to interact
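One simple way to quantify neighborhood conservation is to count in how many genomes two genes are adjacent. The sketch below assumes each genome is given as an ordered list of gene names; the function name and the gene orders are hypothetical.

```python
def neighbour_conservation(genomes):
    """Fraction of genomes in which each pair of genes is adjacent.

    genomes: dict mapping a genome name to its ordered list of gene names.
    """
    counts = {}
    for order in genomes.values():
        # Unordered adjacent pairs, counted once per genome.
        adjacent = {tuple(sorted(pair)) for pair in zip(order, order[1:])}
        for pair in adjacent:
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(genomes) for pair, n in counts.items()}


# Made-up gene orders for three genomes.
genomes = {
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["C", "A", "B", "D"],
    "genome3": ["B", "A", "C", "D"],
}
for pair, fraction in sorted(neighbour_conservation(genomes).items()):
    print(pair, round(fraction, 2))
```

In this toy data only the pair (A, B) is adjacent in every genome, regardless of orientation, which is the pattern the slide takes as a hint of interaction.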
Gene fusion events
[Figure: Proteins A, B, C and D are encoded by separate genes in Genome 1, Genome 2, Genome 3, …; in some genomes A and B appear fused into a single gene.]
Which of these proteins interact?
Proteins A and B have undergone gene fusion events in at least some genomes, which is an indication that they are likely to interact
Building phylogenetic trees of
proteins
[Figure: Proteins A, B, C and D and their homologues in Genome 1, Genome 2, Genome 3, …]
Phylogenetic trees represent the evolutionary history of homologous genes/proteins based on their sequence
Get the sequences of all homologues, align them and build a phylogenetic tree
Similarity of phylogenetic trees
indicates interaction between proteins
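[Figure: phylogenetic trees of the homologues of Protein A (A1, A2, A3, …), Protein B (B1, B2, B3, …), Protein C (C1, C2, C3, …) and Protein D (D1, D2, D3, …).]
Proteins A and B have similar evolutionary trees and thus are likely to interact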
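The slides do not say how tree similarity is measured. One common choice, used here as an assumption, is to correlate the pairwise evolutionary distances between the same species in the two protein families (often called the mirror-tree approach). The distances below are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / sqrt(var_x * var_y)


def tree_similarity(distances_a, distances_b, species):
    """Correlate the species-to-species distances of two protein families."""
    pairs = [frozenset(pair) for pair in combinations(species, 2)]
    return pearson([distances_a[p] for p in pairs], [distances_b[p] for p in pairs])


species = ["sp1", "sp2", "sp3"]
# Made-up distances between homologues; family B mirrors family A, family C does not.
family_a = {frozenset(("sp1", "sp2")): 0.1, frozenset(("sp1", "sp3")): 0.4,
            frozenset(("sp2", "sp3")): 0.5}
family_b = {frozenset(("sp1", "sp2")): 0.2, frozenset(("sp1", "sp3")): 0.8,
            frozenset(("sp2", "sp3")): 1.0}
family_c = {frozenset(("sp1", "sp2")): 0.5, frozenset(("sp1", "sp3")): 0.4,
            frozenset(("sp2", "sp3")): 0.1}

print(tree_similarity(family_a, family_b, species))  # close to 1
print(tree_similarity(family_a, family_c, species))  # negative
```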
Protein/Gene interactions
• Often, people use these methods to say
that genes or proteins interact.
• The methods previously described cannot
be used accurately to describe
PHYSICAL interaction
• When people say "interact" in this context,
one is forced to assume FUNCTIONAL
(not necessarily physical) interaction,
unless more info is available
Methods for network
reconstruction
• Mapping genes onto known pathways
• Using meta text analysis
• Using phylogenetic profiling
• Using omics data
– If two proteins/genes have evolved to perform a function
in the same process, it is likely that their activity and
gene expression are co-regulated
– Conversely, if proteins/genes are co-regulated, then
they are likely to participate in the same process
Predicting gene functional
interactions using micro array
data
[Figure: cDNA is purified from cells exposed to a stimulus and from unstimulated cells, and the cDNA levels of corresponding genes in the different populations are compared. Genes overexpressed or underexpressed as a result of the stimulus form the group of genes/proteins involved in the response to the stimulus; the remaining genes have expression independent of the stimulus.]
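The comparison of expression levels can be sketched as a log2 fold-change classification. The threshold of a two-fold change and the expression values are assumptions chosen for illustration; real microarray analysis also needs normalisation and statistical testing, which are not shown.

```python
from math import log2


def classify_expression(control, stimulated, threshold=1.0):
    """Group genes by log2 fold change between stimulated and control cells.

    control, stimulated: dicts mapping gene name to a positive expression level.
    threshold: absolute log2 fold change required to call a gene responsive.
    """
    groups = {"overexpressed": [], "underexpressed": [], "independent": []}
    for gene in sorted(control):
        change = log2(stimulated[gene] / control[gene])
        if change >= threshold:
            groups["overexpressed"].append(gene)
        elif change <= -threshold:
            groups["underexpressed"].append(gene)
        else:
            groups["independent"].append(gene)
    return groups


# Made-up expression levels.
control = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
stimulated = {"geneA": 400.0, "geneB": 25.0, "geneC": 110.0}
print(classify_expression(control, stimulated))
```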
Gene network reconstruction
• Reconstruction of gene networks based
on micro-array data is a very difficult
endeavor
• It is an inverse problem, meaning that
there is usually more than one solution
that fits the data
• Pioneer groups used either Petri nets (e.g.
Somogyi, Finland) or mathematical models
(Okamoto, Japan)
Predicting protein functional
interactions using mass spec data
[Figure: proteins are purified from cells exposed to a stimulus and from unstimulated cells; the proteins are identified and the protein profiles/levels in the different populations are compared. Proteins present or absent as a result of the stimulus form the group of proteins involved in the response to the stimulus; other proteins are present in both conditions.]
Protein network reconstruction
• Reconstruction of protein networks based
on mass spec proteomics data is still very
immature.
• To my knowledge no paradigmatic, large
scale example of it has yet been done
Regulation of gene expression
• Predicting which transcription factors (TFs)
regulate gene expression is an important part of
reconstructing biological circuits of
interest
• Omics data and bioinformatics can also
be used to do this
Predicting regulatory modules
with ChIP-chip experiments
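[Figure: in cells, protein and DNA are crosslinked and the DNA is broken into pieces. In one sample the transcription factor is affinity purified, the crosslink is reversed and the DNA pieces bound to the TF are purified. In the other sample the crosslink is reversed and the DNA pieces are purified directly. The two samples are compared in a microarray, consensus sequences for TF binding sites are derived, and new genomes are scanned for TF regulatory modules.]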
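The last two steps of the workflow (deriving a consensus and scanning a genome) can be illustrated as follows. This is a deliberately simple sketch: it assumes the bound sites are already aligned and of equal length, uses a plain majority consensus rather than a position weight matrix, and all sequences are invented.

```python
from collections import Counter


def consensus(sites):
    """Most frequent base at each position of equally long, aligned binding sites."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def scan(sequence, motif, max_mismatches=1):
    """Return (position, mismatches) for every window close enough to the motif."""
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up bound fragments and a made-up stretch of a new genome.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TTACTCA"]
motif = consensus(sites)
print(motif)
print(scan("AATGACTCAGGTGAGTCATT", motif, max_mismatches=1))
```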
Predicting protein activity
modulation with NMR/IR/MS
Metabolomics
[Figure: metabolites are measured in cells with and without a stimulus. Compare changes in metabolic levels to infer changes in protein activity.]
Incorporating metabolomics
information
• These changes can be incorporated into
mathematical models and these models
can then be used predictively
Methods for network
reconstruction
• Mapping genes onto known pathways
• Using meta text analysis
• Using phylogenetic profiling
• Using omics data
• Using protein interaction data
– Large scale protein interaction data sets are available
– If proteins physically interact, it is likely that they work
together in the same network
Predicting protein networks using
protein interaction data
[Figure: your sequence (A) is submitted to a server/program that queries a database of protein interactions; the network grows from A to include proteins B, C, D, E, F, …]
Continue until you are satisfied or have completed the network
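The iterative expansion of the network can be written as a breadth-first search over a list of pairwise interactions. The sketch assumes the interaction database is available as a list of protein pairs; the function name, the depth limit used as the stopping criterion and the data are hypothetical.

```python
from collections import deque


def expand_network(seed, interactions, max_depth=2):
    """Collect proteins within max_depth interaction steps of the seed.

    Returns a dict of protein -> distance from the seed and the set of
    interactions touching proteins that were expanded.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    depth = {seed: 0}
    edges = set()
    queue = deque([seed])
    while queue:
        protein = queue.popleft()
        if depth[protein] == max_depth:
            continue  # do not expand beyond the chosen depth
        for partner in neighbours.get(protein, ()):
            edges.add(tuple(sorted((protein, partner))))
            if partner not in depth:
                depth[partner] = depth[protein] + 1
                queue.append(partner)
    return depth, edges


# Made-up interaction list.
interactions = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
depth, edges = expand_network("A", interactions, max_depth=2)
print(depth)
print(sorted(edges))
```

Raising `max_depth` corresponds to continuing the search for another round of partners.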
Outline
• Methods for reconstruction of functional
protein networks
• Methods for reconstruction of physical
protein interactions
How do proteins work within the
network?
• Assume we now have the network our
protein is involved in.
• How do we further analyze the role of the
protein?
Proteins work by binding
[Figure: a protein binds DNA and produces an effect.]
Proteins work by binding!
So what?
So, if we can predict how proteins DOCK to their
ligands, then we will be able to understand how the
binding allows them to work systemically
Design drugs to overcome mutations in binding
sites
Design proteins to prevent/enhance other
interactions
What is in silico protein docking?
• Given two molecules find their correct
association using a computer:
What types of in silico docking exist?
• Sequence Based Docking:
In silico two hybrid docking
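Protein A                 Protein B
E. coli     AGGMEYW….     E. coli     VCHPRIIE….
S. typhi    AA – CDWY…    S. typhi    VCH -KIIE…
…           …             …           …
Y. pestis   AGG –DYW      Y. pestis   VCH –KIIE…

D/K or E/R may be involved in a salt bridge
[Figure: Pearson correlation between the columns of the Protein A alignment (A, G, G, …, D, …) and the columns of the Protein B alignment (V, C, H, P, K, I, I, E, …).]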
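A simplified sketch of the correlated-mutation idea behind in silico two hybrid. For each alignment column it builds a vector over all pairs of species (1 if the residues are identical, 0 otherwise) and then computes the Pearson correlation between a column of protein A and a column of protein B. Using identity instead of a substitution matrix is a simplification assumed here, and the alignments are invented; rows must list the same species in the same order.

```python
from itertools import combinations
from math import sqrt


def column_profile(alignment, column):
    """1.0 for every pair of sequences with the same residue in this column, else 0.0."""
    residues = [sequence[column] for sequence in alignment]
    return [1.0 if x == y else 0.0 for x, y in combinations(residues, 2)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when a column shows no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_columns(alignment_a, alignment_b):
    """Correlation for every (column of A, column of B) pair."""
    scores = {}
    for i in range(len(alignment_a[0])):
        profile_a = column_profile(alignment_a, i)
        for j in range(len(alignment_b[0])):
            scores[(i, j)] = pearson(profile_a, column_profile(alignment_b, j))
    return scores


# Made-up two-column alignments for four species: D pairs with K, E pairs with R.
alignment_a = ["AD", "AD", "AE", "AE"]
alignment_b = ["KV", "KV", "RV", "RV"]
scores = correlated_columns(alignment_a, alignment_b)
print(max(scores, key=scores.get), scores)
```

Column pairs with a high correlation, such as the D/E column of A and the K/R column of B here, are candidates for residues in contact.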
What types of in silico docking
exist?
• Sequence Based
Docking
• In silico structural
protein docking
Structure based docking
• Protein-Protein docking
– Rigid (usually)
• Protein-Ligand docking
– Rigid protein, flexible ligand
Very demanding on computational resources
Structural docking in a nutshell
• Scan molecular surfaces of protein for best
surface fit
– First steric, then energetics
– Can (and should) include biologically relevant
information (e.g. residue X is known from mutation
experiments to be involved in the docking → discard
any docking not involving this residue)
Atom based docking
• First, a surface representation is needed
[Figure: Van der Waals surface, solvent accessible surface and accessible (Connolly) surface.]
Calculating the best docking
• Scan molecular surfaces of protein for best
surface fit
– Calculate the position where the largest number of
atoms fit together, factor in energy + biology and
rank solutions according to that
Grid-based techniques
• Grid-based Techniques
– Alternative to calculating protein atom / ligand
atom interactions; more efficient (number of
grid points < number of atoms)
Grid based docking
• Place grid over protein
• Calculate intermolecular forces for
each grid point
[Figure: a grid placed over the protein, with candidate positions labelled Score 1 to Score 4.]
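A toy sketch of grid-based scoring: the protein contribution is precomputed once at every grid point, and each ligand pose is then scored by looking up the grid points nearest to its atoms. The Coulomb-like term, the minimum distance that avoids division by zero, and all coordinates and charges are assumptions for illustration, not a real force field.

```python
from math import dist  # available from Python 3.8


def build_grid(protein_atoms, n_points, spacing, min_distance=1.0):
    """Precompute a Coulomb-like potential of the protein at every grid point.

    protein_atoms: list of ((x, y, z), charge) tuples.
    """
    grid = {}
    for i in range(n_points):
        for j in range(n_points):
            for k in range(n_points):
                point = (i * spacing, j * spacing, k * spacing)
                grid[(i, j, k)] = sum(
                    charge / max(dist(point, xyz), min_distance)
                    for xyz, charge in protein_atoms
                )
    return grid


def score_pose(grid, ligand_atoms, spacing):
    """Sum ligand charge times grid potential at the nearest grid point of each atom."""
    total = 0.0
    for xyz, charge in ligand_atoms:
        index = tuple(round(coordinate / spacing) for coordinate in xyz)
        total += charge * grid[index]  # raises KeyError if the atom is off the grid
    return total


# One negatively charged protein atom and two candidate positions of a positive ligand atom.
grid = build_grid([((2.0, 2.0, 2.0), -1.0)], n_points=5, spacing=1.0)
near = score_pose(grid, [((2.0, 2.0, 3.0), 1.0)], spacing=1.0)
far = score_pose(grid, [((2.0, 2.0, 4.0), 1.0)], spacing=1.0)
print(near, far)  # the closer pose has the lower (more favourable) score
```

The grid is built once per protein, so scoring many poses costs only one lookup per ligand atom.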
The docking function
• There are many and none is the best for all
cases
• Scores will depend on the exact docking
function you use
A docking function for surface
matching
• Molecules a, b placed on an N × N × N grid indexed by (l, m, n)

$$a_{l,m,n},\; b_{l,m,n} = \begin{cases} 1 & \text{on the surface of the molecule} \\ \rho & \text{inside the molecule} \\ 0 & \text{outside the molecule} \end{cases}$$

• Match surfaces

$$C_{\mathrm{step}_1,\mathrm{step}_2,\mathrm{step}_3} = \sum_{l'=1}^{N} \sum_{m'=1}^{N} \sum_{n'=1}^{N} a_{l',m',n'} \cdot b_{l'+\mathrm{step}_1,\; m'+\mathrm{step}_2,\; n'+\mathrm{step}_3}$$

• Fourier transform makes calculation faster
• Tabulate and rank all possible conformations
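The surface-matching score can be illustrated by computing the correlation sum directly for a handful of translations. This sketch uses plain summation on a tiny sparse grid instead of a Fourier transform, which gives the same scores but more slowly. The interior value of -15 for molecule a and the voxel positions are assumptions chosen so that overlap with the interior is penalised.

```python
from itertools import product


def correlation_scores(grid_a, grid_b, shifts):
    """Score every translation of b relative to a.

    grid_a, grid_b: dicts mapping (l, m, n) to a grid value; points that are
    not listed are outside the molecule and count as 0.
    shifts: iterable of integer steps tried along each axis.
    """
    scores = {}
    for step in product(shifts, repeat=3):
        total = 0.0
        for (l, m, n), value_a in grid_a.items():
            shifted = (l + step[0], m + step[1], n + step[2])
            total += value_a * grid_b.get(shifted, 0.0)
        scores[step] = total
    return scores


# Molecule a: one interior voxel (penalised) next to one surface voxel.
grid_a = {(0, 0, 0): -15.0, (1, 0, 0): 1.0}
# Molecule b: a single surface voxel.
grid_b = {(0, 0, 0): 1.0}

scores = correlation_scores(grid_a, grid_b, shifts=(-1, 0, 1))
best = max(scores, key=scores.get)
print(best, scores[best])  # the translation that puts surface on surface
print(scores[(0, 0, 0)])   # the translation that penetrates the interior
```

Ranking the dictionary by score gives the table of candidate conformations; a Fourier-transform implementation would fill the same table for all translations at once.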
A docking function for
electrostatics
• There are many
• They use different force field
approximations to calculate the energy of
electrostatic interactions.
• The basics:

$$E_{\text{electrostatic}} = \int \left[\rho_a(r_a) + \rho_b(r_b)\right] \left[\phi_a(r_a) + \phi_b(r_b)\right] dV$$

where ρ denotes the charge distributions for the proteins and φ the potential for the proteins
The full docking function
• Calculates a relative binding energy that
integrates electrostatic and shape matching
factors. For example:

$$E_{\text{tot}} = c_{\text{electrostatic}} \cdot E_{\text{electrostatic}} + c_{\text{shape matching}} \cdot E_{\text{shape matching}}$$
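A small sketch of how such a weighted sum could be used to rank candidate complexes. The weights, the pose names and the energy values are invented, and the convention that lower energy is better is an assumption.

```python
def total_energy(e_electrostatic, e_shape, c_electrostatic=1.0, c_shape=1.0):
    """Weighted sum of the electrostatic and shape matching terms."""
    return c_electrostatic * e_electrostatic + c_shape * e_shape


def rank_complexes(complexes, c_electrostatic=1.0, c_shape=1.0):
    """Return complex names ordered from lowest to highest total energy."""
    ordered = sorted(
        complexes,
        key=lambda c: total_energy(c["electrostatic"], c["shape"],
                                   c_electrostatic, c_shape),
    )
    return [c["name"] for c in ordered]


# Made-up energies for three candidate poses.
complexes = [
    {"name": "pose1", "electrostatic": -2.0, "shape": -5.0},
    {"name": "pose2", "electrostatic": -4.0, "shape": -1.0},
    {"name": "pose3", "electrostatic": 1.0, "shape": -3.0},
]
print(rank_complexes(complexes, c_electrostatic=0.5, c_shape=1.0))
print(rank_complexes(complexes, c_electrostatic=2.0, c_shape=0.5))
```

The two calls give different rankings, which illustrates why scores depend on the exact docking function and weights used.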
Overall process of docking
[Figure: docking workflow]
• Mol 1 + Mol 2
• Rigid body energy calculation: sum over i, j of Energy(Form Matching, Electrostatics) for positions p1,i and p2,j
• List of complexes
• Re-rank using statistics of residue contact, H-bond, biological information, etc.
• Re-rank using rotamers, flexibility in protein backbone angles, molecular dynamics, etc.
• Final list of solutions
Summary
• Methods for reconstruction of functional protein
networks
– Bibliomics
– Genomics
– Phenomics, etc
• Methods for reconstruction of protein interactions
– Sequence based
– Structure based
The overall picture
Grid-based techniques
• Grid-based Techniques
– Notes:
• Grids spaced <1 Å
– Results show very little change in error for grid
spacings between 0.25 and 1 Å
Problem Importance
• Computer aided drug design – a new drug should
fit the active site of a specific receptor.
• Many reactions in the cell occur through
interactions between the molecules.
• No efficient techniques for crystallizing large
complexes and finding their structure.