Overview of the Structure

Cancer Genes lists
Alfonso Valencia
Structural and Computational Biology Programme
Spanish National Cancer Research Centre
CNIO, Madrid
BioSapiens Workshop
From Genome to Proteome and Biological Function
Brussels
April 2008
Cancer genes
Transcriptome classification of B-cell non-Hodgkin lymphomas
Mohit Aggarwal et al. Cancer Cell 2007
CGH and microarray data in Ewing
sarcomas
Ferreira et al., Oncogene. 2007 Oct 22
Epigenetics
The DNA Methylomes of
Double-Stranded DNA Viruses
Associated with Human Cancer
Agustin Fernandez-Fernandez1,
….. Osvaldo Graña2, Gonzalo
Gomez-Lopez2, David G.
Pisano2, Alfonso Valencia2, ……
Manel Esteller1
The BioSapiens-sponsored project
concentrated on the protein coding loci and in
particular on the alternatively spliced products.
This work is part of the BioSapiens efforts for
the annotation of the human genome
(www.biosapiens.info).
BioSapiens
Network of Excellence
€12 Million between
26 partners in
14 different countries
“
The objective of the
BIOSAPIENS Network of
Excellence is to provide a
large-scale, concerted effort
to annotate genome data by
laboratories distributed
around Europe, using both
informatics tools and input
from experimentalists.
”
_Line of action 1_: Making information about cancer genes
accessible to experimental biologists.
The idea here is to take the lists of genes provided by experimental
groups, starting with the one published by Sjoblom et al. (ref: Science.
2006 Oct 13;314(5797):268-274), and add the information/annotations
provided by the different groups.
Other gene lists will be added as they are published, which makes it
important to have the methods working as automatically as possible.
We need proposals from groups on what they can provide. We have to
avoid duplication.
Represent information for biologists. We can use the protein DAS or
the CARGO system (see http://cargo.bioinfo.cnio.es).
The aim in this chapter is to publish a rich resource of annotated
cancer gene lists in a format useful for biologists. The goal is to
do it by summer this year.
DO IT !
A web portal to integrate customized
biological information.
• CARGO is a configurable biological web portal designed
as a tool to facilitate, integrate and visualize results from
Internet resources, independently of their native format
or access method through the use of small agents,
called widgets (or BioWidgets).
• CARGO provides pieces of minimal, relevant and
descriptive biological information.
• The tool is designed to be used by experimental
biologists with no training in bioinformatics.
• Available at http://cargo2.bioinfo.cnio.es
Cases I, Pisano DG, Andres E, Carro A, Fernández JM, Gómez-López G, Rodriguez JM, Vera JF, Valencia A, Rojas AM.
CARGO: a web portal to integrate customized biological information. PubMed 17483515.
CARGO has an iGoogle Gadget version.
iGoogle Gadgets are simple HTML and JavaScript mini-applications served in
iFrames that can be embedded in webpages and other apps.
A widget for CARGO is described by an XML Document that contains
several fields providing information and documentation.
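As a rough illustration of how a portal could read such a descriptor, here is a minimal Python sketch. The element names (`widget`, `name`, `author`, `description`, `url`), the example values, and both helper functions are hypothetical; the actual CARGO descriptor schema is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical widget descriptor: element names and values are
# illustrative only, NOT the real CARGO schema.
DESCRIPTOR = """
<widget>
  <name>ExampleWidget</name>
  <author>Jane Doe</author>
  <description>Shows example annotations for a gene.</description>
  <url>http://example.org/widget?gene={gene}</url>
</widget>
"""


def parse_descriptor(xml_text):
    """Return the descriptor fields as a dict mapping tag -> text."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


def build_request(fields, gene):
    """Fill a gene name into the widget's (hypothetical) URL template."""
    return fields["url"].format(gene=gene)


fields = parse_descriptor(DESCRIPTOR)
print(fields["name"], "->", build_request(fields, "TP53"))
```

Keeping the descriptor as plain data means a new widget could be registered without changing the portal code, which is presumably the point of describing widgets in XML.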
• How do widgets work?
[Diagram: data sources and access methods used by widgets: PDB/seq alignments, Distributed Annotation System (DAS), FTP, SNPs, 3D files, Ensembl request, Asynchronous JavaScript and XML (AJAX).]
DAS Infrastructure
By Henning Hermjakob
By Andreas Prlic
• Search for a term (like "regulation") or gene name ("p53").
• See some gene lists related to cancer (Sjoblom et al.
Science, 2006, Matsuoka et al. Science, 2007, etc.) and
some protein lists.
Cancer
Spindle
Register new widgets, login and manage
accounts. New “Widget Manager” web form.
• Open any classified widget by clicking on its name in the
menu bar at the top.
• See the global information related to the query in
the “Input description panel”.
BioSapiens Ontology
• Aim: Standardise DAS feature types
• Developed protein feature ontology in close
collaboration with UniProt and HUPO PSI
• Three main branches:
– Positional features: “Donated” terms to the existing
Sequence Ontology from GO consortium
– Protein Modifications: Adopted the existing PSI MI MOD
ontology
– Non-positional features: BioSapiens
• Delivered as De107.8
By Gabby Reeves
and Henning Hermjakob
By Ildefonso Cases
Biosapiens Widgets
MIPS
Philip Wong
Corum: http://mips.gsf.de/genre/proj/corum/
the Comprehensive Resource of Mammalian protein complexes
PDB Cb-Cb 8 Å: Pawel Smialowski
(Data are calculated directly from structures of biological units.)
Univ Roma
Alejandro Giorgetti, Tiziana Castrignano, Ildefonso Cases (CNIO)
PMDB: http://mi.caspur.it/PMDB/
Protein Models database
CNIO
iHop (Jose Manuel Rodríguez)
Text Mining
OMIM (Jose Maria Fernández)
Disease
FunCut (Jose Manuel Rodríguez)
Function
AllDomains (Ildefonso Cases)
Domains
Enviro (Jaime Fernández)
Interactions
SNP 3D (Ildefonso Cases)
Structure and SNPs
Mutation Viewer (Jaime Fernández)
Cancer Mutations
General Framework (Angel Carro, Eduardo Andrés León)
CIPF
Joaquin Tarraga
FatiGO: GO Classification Assignments
IDConverter: IDs Translator
PCB
Adam Hospital
MoDel : Molecular Dynamics Extended Library
Pmut: Prediction of pathological mutations
BSC
Dmitry Repchevsky
3D-Annotation: Domains annotation over 3D structures
MPI Inf.
Fidel Rodriguez
Annotation Similarity.
EBI-Thornton
David Talavera
CSA and PDBsum: http://www.ebi.ac.uk/thornton-srv/databases/CSA/
EBI-Brazma
Misha Kapushesky
ArrayExpress Top 5 experiments: http://www.ebi.ac.uk/microarray-as/aew/
Uni Bologna
Piero Fariselli, Ildefonso Cases (CNIO)
PhD-SNP: Predictor of human Deleterious Single Nucleotide Polymorphisms
http://gpcr2.biocomp.unibo.it/cgi/predictors/PhD-SNP/PhD-SNP.cgi
CBS
Peter Wad Sackett (service), Ildefonso Cases (CNIO)
ProtMod: Protein Modification and Transmembrane Predictions:
http://www.cbs.dtu.dk/services/
UCL
Corin Yeats, Jonathan Lees
Gene3D and CATH
ENSEMBL Andreas Prlic
CNB
Natalia Jimenez
Visual Genomics: Gene Expression on Anatomical Atlases
Teresa Paramo
Gene2SNPs
SNPs in HapMap
Gene2tagSNPs
Tag SNPs
Gene3GADStudies Association Studies
UPF
Nuria Bigas
CGPROP Cancer gene properties
By Ildefonso Cases
Combining SNP3D and OMIM facilitates the study of the structural consequences of each
variant (SNPs and/or mutations). In this case the mutation “0001, R248” is clearly
part of the DNA interaction site.
Comparative study with OMIM: R249S, associated with hepatocellular carcinoma, is not
related to DNA binding. Related to phenotypic differences?
The “Functional Residues” widget reports that S249 is involved in ligand binding.
The SNP-3D widget with the 1GZH structure shows that it is part of the interaction interface between P53 and
P53-BP, and part of the interaction with the SV40 oncoprotein (2H1L structure).
The “Enviro” widget provides additional information on other interactions.
By Ildefonso Cases
_Line of action 2_: Annotating with detailed manual interpretation
of genes potentially associated with cancer and the mutations
already detected.
The plan here is to collaborate with the Sanger Cancer Genome Project
in the analysis of their list of genes, in particular in the analysis of
human protein kinases in a large collection of cancers (Greenman ...
Futreal and Stratton. Patterns of somatic mutation in human cancer
genomes. Nature. 2007 Mar 8;446(7132):153-8).
Analyse the possible functional consequences of the mutations, knowing
that fewer than 1/3 of them are truly related to cancer. We will need here a
combination of structural bioinformatics and genomics (e.g. splicing
analysis, comparative genomics).
The automatic results of modelling and analysis tools will not be
sufficient, and we have to think about how to develop an analysis
framework robust enough to be valid for other families.
Interested people will be cancer groups in search of targets and
groups interested in the relation between cancer/genes/SNPs/mutations.
For Discussion
Driver Vs Passenger mutations
There are two different kinds of mutations that arise as the cancer cell
population expands:
– Driver mutations: mutations that confer a growth advantage on the cell in
which they occur, are causally implicated in cancer development and have
therefore been positively selected. They are by definition found in cancer
cells.
– Passenger mutations: mutations not subject to positive selection.
They were present in the cell that was the progenitor of the final clonal expansion of the
cancer, are biologically neutral and do not confer a growth advantage.
[Figure: progression from normal tissue to cancer, showing mutation events labelled as passenger and driver.]
(Greenman et al, Nature 2007)
(Wood et al, Science 2007)
Single Nucleotide Polymorphisms
A SNP is a DNA sequence variation occurring when a single nucleotide in the genome differs between
members of a species (or between paired chromosomes in an individual).
Almost all common SNPs have only two alleles, so we say they are dimorphic.
Within a population, SNPs can be assigned a minor allele frequency (the fraction of chromosomes in the
population that carry the less common variant). Only variants with
a minor allele frequency of ≥ 1% (or 0.5%, depending on the dataset) are given the title "SNP". It is
important to note that there are variations between human populations, so a SNP allele that is common in
one geographical or ethnic group may be much rarer in another.
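A minimal sketch of the frequency calculation, taking the minor allele frequency as the fraction of sampled chromosomes that carry the less common of two alleles. The function names, the toy sample, and the default 1% threshold argument are illustrative assumptions, not part of any dataset mentioned in the slides.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Fraction of sampled chromosomes carrying the less common allele.

    `alleles` holds one allele per sampled chromosome at a single site.
    """
    counts = Counter(alleles)
    if len(counts) == 1:
        return 0.0  # monomorphic site: no minor allele observed
    if len(counts) != 2:
        raise ValueError("expected a site with one or two alleles")
    return min(counts.values()) / len(alleles)


def is_snp(alleles, threshold=0.01):
    """Apply the frequency cut-off (1% by default) to one site."""
    return minor_allele_frequency(alleles) >= threshold


# Toy sample: 200 chromosomes, 3 of them carrying the rarer allele "T".
sample = ["C"] * 197 + ["T"] * 3
print(minor_allele_frequency(sample))
print(is_snp(sample))
```

Because allele frequencies differ between populations, the same site could pass the threshold in one sample and fail it in another.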
SNPs can occur anywhere in the genome:
- within coding sequences of genes,
- non-coding regions of genes,
- intergenic regions between genes.
A SNP within a coding sequence in which both forms lead to the same polypeptide sequence
(degeneracy of the genetic code) is termed synonymous (sometimes called a silent mutation); if a
different polypeptide sequence is produced, it is non-synonymous.
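The distinction can be sketched in a few lines of Python using the standard genetic code. The function name `classify_snp` and the example codons are illustrative choices; the sketch handles a single base change inside one codon and ignores splicing or frame effects.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snp(codon, pos, alt_base):
    """Classify a single-base change at index `pos` (0-2) of `codon`."""
    mutated = codon[:pos] + alt_base + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


print(classify_snp("CGG", 2, "A"))  # CGG (Arg) -> CGA (Arg)
print(classify_snp("CGG", 0, "T"))  # CGG (Arg) -> TGG (Trp)
```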
SNPs that are not in protein coding regions may still have consequences for gene splicing, transcription
factor binding, or the sequence of non-coding RNA.
By Jose M. G.-Izarzugaza
Maximal distance between changes
$D_{max} = \frac{x_n - x_1}{L}$
– Cancer-related mutations from the paper by Sjöblom et al. (2006).
– Ten randomly generated sets of positions
– SNPs downloaded from Ensembl
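A minimal sketch of how this measure could be computed and compared against random position sets. It assumes that x_1 and x_n are the first and last changed positions in a protein and that L is the protein length; the slide does not define the symbols, so this reading, the function names, and the toy positions are all assumptions.

```python
import random


def d_max(positions, length):
    """(x_n - x_1) / L for the sorted changed positions x_1..x_n."""
    xs = sorted(positions)
    return (xs[-1] - xs[0]) / length


def random_d_max(n_changes, length, n_sets=10, seed=0):
    """D_max for `n_sets` random sets of distinct positions in 1..length."""
    rng = random.Random(seed)
    return [
        d_max(rng.sample(range(1, length + 1), n_changes), length)
        for _ in range(n_sets)
    ]


# Toy example: four changes clustered in a 400-residue protein.
observed = d_max([120, 135, 142, 150], length=400)
print(observed)
print(random_d_max(4, 400))
```

A clustered set of changes gives a small value, while positions scattered along the sequence give values approaching 1, so comparing the observed value with the random sets indicates whether changes are concentrated in one region.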
By David Talavera
Effect of mutations: effect on functional sites
                        Cancer-related mutations    Random positions
Ligand-binding          17%                         21%
Metal-binding           7%                          7%
Nucleic acid-binding    10%                         11%
Catalytic               0%                          0%
By David Talavera
Effect of mutations: kind of substitution
                            Cancer-related mutations    SNPs
Conservative changes        55.1%                       55.3%
Non-conservative changes    44.9%                       44.7%
• Cancer mutations are not randomly distributed along the sequence;
however, there is no relation with functional sites.
• Cancer-related mutations don’t occur at extremely conserved positions.
• Cancer-related mutations don’t seem to be more drastic than SNPs.
By David Talavera
Protein Kinases
Protein kinases are enzymes that modify other proteins by chemically adding phosphate groups to them (phosphorylation).
Phosphorylation usually results in a functional change of the target protein (substrate) by changing enzyme activity, cellular
location, or association with other proteins.
The chemical activity of a kinase involves removing a phosphate group from ATP and covalently attaching it to one of three amino
acids that have a free hydroxyl group. Most kinases act on both serine and threonine, others act on tyrosine, and a number (dual
specificity kinases) act on all three.
The human genome contains about 520 protein kinase genes [Manning et al, 2001]
Dysregulated kinase activity is a frequent cause of disease, particularly cancer, since kinases regulate cell growth, movement and
cell death.
Protein Kinase is the most commonly found domain in known cancer genes
[Futreal et al, 2004]
Since protein kinases have key effects on the cell, their activity is highly regulated:
- by phosphorylation (sometimes auto-phosphorylation)
- by binding of activator proteins or inhibitor proteins.
- by binding of activator/inhibitor small molecules.
- by controlling their location in the cell relative to their substrates.
Drugs which inhibit specific kinases are being developed to treat
several diseases, and some are currently in clinical use,
including Gleevec (imatinib, leukaemia) and Iressa
(gefitinib, lung cancer).
By Jose M. G.-Izarzugaza
Many Structures
Kinases undergo a large
articulated motion when they
turn “on” and “off”
Active
Inactive
Source: Src tyrosine kinase from Protein DataBank
By Jose M. G.-Izarzugaza
Mutation analysis workflow (for SNPs, very similar for Mutations)
- Query Family (Kinases)
- Family Members (From Kinbase)
- Get SNPs
- Map SNPs onto PDBs
- Family Representatives (From PDB)
- Multiple Structure Alignment
- Feature Distribution Analysis
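The mapping step can be sketched as a position translation through a pairwise alignment between the full protein sequence and the sequence of a solved structure. This is a generic illustration, not the pipeline used in the talk: the function name, the toy alignment, and the SNP positions are hypothetical, and the target index is the position within the structure's sequence, which may differ from the residue numbering stored in a PDB file.

```python
def map_positions(aligned_query, aligned_target):
    """Map 1-based query positions to 1-based target positions.

    Both strings come from one pairwise alignment and use '-' for gaps.
    Query residues aligned to a gap are left out of the mapping.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    mapping, q, t = {}, 0, 0
    for a, b in zip(aligned_query, aligned_target):
        if a != "-":
            q += 1
        if b != "-":
            t += 1
        if a != "-" and b != "-":
            mapping[q] = t
    return mapping


# Toy alignment: protein sequence vs. sequence of a solved structure.
mapping = map_positions("MKT-LLV", "-KTALIV")
snp_positions = [1, 3, 5]
mapped = {p: mapping[p] for p in snp_positions if p in mapping}
print(mapped)
```

Position 1 has no counterpart in the structure and is dropped, which is the usual situation for residues missing from a crystal structure.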
By Jose M. G.-Izarzugaza
Statistics on the PK PDB retrieval
Total Human Sequences in Kinbase: 620
Sequences in Kinbase not Pseudogenes: 516
Sequences with known Swissprot ID (assigned by BLAST): 488
Sequences with known Swissprot ID, BLAST identity >95%: 474
Kinases with at least one solved protein structure (PDB): 145
Human Kinase Sequences in the Multiple Seq. Alignment: 266
Total Number of SNPs (Kinase Domain): 569
Synonymous SNPs: 263
Non-Synonymous SNPs: 306
Total Number of Mutations (Kinase Domain): 140
Driver Mutations: 73
Passenger Mutations: 63
By Jose M. G.-Izarzugaza
TreeDet vs firedb
TreeDet vs firedb vs conserv
By David de Juan
By David de Juan
Mean: 3.61; Median: 4.68; St.Dev: 3.12; Xd: 1.72
Mean: 6.50; Median: 6.35; St.Dev: 4.16; Xd: 1.24
Mean: 11.07; Median: 10.26; St.Dev: 7.06; Xd: -0.07
Mean: 6.71; Median: 5.57; St.Dev: 3.69; Xd: -0.30
Mean: 10.26; Median: 9.94; St.Dev: 6.32; Xd: -0.78
Driver
Mean: 4.34; Median: 4.94; St.Dev: 2.58; Xd: -0.89
Passenger
By Jose M. G.-Izarzugaza
Next
- “CARGO cancer gene list” paper to
be presented tomorrow with action
items (scope: Cancer Research)
- Mutation analysis is still a key
challenge. Creation of analysis
pipelines for all proteins and for protein
families (SNPs versus mutations,
driver versus passenger mutations)