Using MATLAB -SQL Server build local Swiss


Swiss-Prot Database
Hongbo Xie
7/21/2015 8:05:25 AM
Swiss-Prot Database --- Xie, H
Presentation Outline
Background introduction
Swiss-Prot database features
Building a local Swiss-Prot database
Experiments and results
An Example of protein data
1. Sequence information
2. Structure information
3. Function information
4. Gene information
5. Names, used for searching
6. Links to other databases
7. Literature references
8. Others
Swiss-Prot: a curated database

One entry (1 of ~100,000 entries):

ID   11SB_CUCMA     STANDARD;      PRT;   480 AA.
AC   P13744;
DT   01-JAN-1990 (REL. 13, CREATED)
DT   01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE)
DT   01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE)
DE   11S GLOBULIN BETA SUBUNIT PRECURSOR.
OS   CUCURBITA MAXIMA (PUMPKIN) (WINTER SQUASH).
OC   EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; DICOTYLEDONEAE;
OC   VIOLALES; CUCURBITACEAE.
RN   [1]
RP   SEQUENCE FROM N.A.
RC   STRAIN=CV. KUROKAWA AMAKURI NANKIN;
RX   MEDLINE; 88166744.
RA   HAYASHI M., MORI H., NISHIMURA M., AKAZAWA T., HARA-NISHIMURA I.;
RL   EUR. J. BIOCHEM. 172:627-632(1988).
RN   [2]
RP   SEQUENCE OF 22-30 AND 297-302.
RA   OHMIYA M., HARA I., MASTUBARA H.;
RL   PLANT CELL PHYSIOL. 21:157-167(1980).
CC   -!- FUNCTION: THIS IS A SEED STORAGE PROTEIN.
CC   -!- SUBUNIT: HEXAMER; EACH SUBUNIT IS COMPOSED OF AN ACIDIC AND A
CC       BASIC CHAIN DERIVED FROM A SINGLE PRECURSOR AND LINKED BY A
CC       DISULFIDE BOND.
CC   -!- SIMILARITY: TO OTHER 11S SEED STORAGE PROTEINS (GLOBULINS).
DR   EMBL; M36407; G167492; -.
DR   PIR; S00366; FWPU1B.
DR   PROSITE; PS00305; 11S_SEED_STORAGE; 1.
KW   SEED STORAGE PROTEIN; SIGNAL.
FT   SIGNAL        1     21
FT   CHAIN        22    480       11S GLOBULIN BETA SUBUNIT.
FT   CHAIN        22    296       GAMMA CHAIN (ACIDIC).
FT   CHAIN       297    480       DELTA CHAIN (BASIC).
FT   MOD_RES      22     22       PYRROLIDONE CARBOXYLIC ACID.
FT   DISULFID    124    303       INTERCHAIN (GAMMA-DELTA) (POTENTIAL).
FT   CONFLICT     27     27       S -> E (IN REF. 2).
FT   CONFLICT     30     30       E -> S (IN REF. 2).
SQ   SEQUENCE   480 AA;  54625 MW;  D515DD6E CRC32;
     MARSSLFTFL CLAVFINGCL SQIEQQSPWE FQGSEVWQQH RYQSPRACRL ENLRAQDPVR
     RAEAEAIFTE VWDQDNDEFQ CAGVNMIRHT IRPKGLLLPG FSNAPKLIFV AQGFGIRGIA
     IPGCAETYQT DLRRSQSAGS AFKDQHQKIR PFREGDLLVV PAGVSHWMYN RGQSDLVLIV
     FADTRNVANQ IDPYLRKFYL AGRPEQVERG VEEWERSSRK GSSGEKSGNI FSGFADEFLE
     EAFQIDGGLV RKLKGEDDER DRIVQVDEDF EVLLPEKDEE ERSRGRYIES ESESENGLEE
     TICTLRLKQN IGRSVRADVF NPRGGRISTA NYHTLPILRQ VRLSAERGVL YSNAMVAPHY
     TVNSHSVMYA TRGNARVQVV DNFGQSVFDG EVREGQVLMI PQNFVVIKRA SDRGFEWIAF
     KTNDNAITNL LAGRVSQMRM LPLGVLSNMY RISREEAQRL KYGQQEMRVL SPGRSQGRRE
//

Callouts on the slide:
Record of history: the three DT lines
Metadata: the OS, OC, RN, RP, RC, RX, RA and RL lines
Sequence data: the rows after the SQ line, up to the closing //
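The line-code layout of an entry lends itself to simple parsing. The sketch below is one possible way to split an entry by its two-letter codes; it is written in Python rather than MATLAB so it can be run stand-alone, the function name parse_entry is hypothetical, and the embedded entry is shortened to a few lines and two sequence rows for illustration.

```python
from collections import defaultdict

# Shortened stand-in for one flat-file entry (assumption: each line is a
# two-letter code, three spaces, then content; sequence rows follow the
# SQ line; "//" ends the entry). Only two of the sequence rows are kept.
ENTRY = """\
ID   11SB_CUCMA     STANDARD;      PRT;   480 AA.
AC   P13744;
DT   01-JAN-1990 (REL. 13, CREATED)
DE   11S GLOBULIN BETA SUBUNIT PRECURSOR.
OS   CUCURBITA MAXIMA (PUMPKIN) (WINTER SQUASH).
KW   SEED STORAGE PROTEIN; SIGNAL.
SQ   SEQUENCE   480 AA;  54625 MW;  D515DD6E CRC32;
     MARSSLFTFL CLAVFINGCL SQIEQQSPWE FQGSEVWQQH RYQSPRACRL ENLRAQDPVR
     RAEAEAIFTE VWDQDNDEFQ CAGVNMIRHT IRPKGLLLPG FSNAPKLIFV AQGFGIRGIA
//
"""


def parse_entry(text):
    """Group the lines of one entry by line code and join the sequence rows."""
    fields = defaultdict(list)
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-entry marker
            break
        if in_sequence:                  # rows after SQ hold the residues
            seq_parts.append(line.replace(" ", ""))
            continue
        code, content = line[:2], line[5:].strip()
        fields[code].append(content)
        if code == "SQ":
            in_sequence = True
    record = dict(fields)
    record["sequence"] = "".join(seq_parts)
    return record


record = parse_entry(ENTRY)
entry_id = record["ID"][0].split()[0]
keywords = [k.strip() for k in record["KW"][0].rstrip(".").split(";")]
```

Each parsed field could then be written to its own table (for example, one row per keyword), which is what makes family-wide or keyword-wide queries possible later.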
Swiss-Prot web search
Search by ID, AC, etc.
Example entry: 143B_HUMAN
The web search finds a single entry.
But what if we want to find all proteins with the same ProtoMap index?
Web search cannot do everything; we definitely need something better.
Our Research



Goal: Study the whole Swiss-Prot database to find the relationships between protein sequence, protein structure, and function.
Tool: MATLAB
Methodology: Build a local Swiss-Prot database in MS SQL Server to accelerate the research.
Connection between MATLAB and the Database

Data can be moved in both directions between MATLAB and the database:
Import: from the database into MATLAB
Export: from MATLAB into the database
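The two directions of data movement can be illustrated with a small round trip. This is only a sketch under stated assumptions: Python's built-in sqlite3 module stands in for the MATLAB-to-SQL Server connection, the table layout is a guess, and the two rows are hypothetical entries, not real Swiss-Prot data.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id TEXT PRIMARY KEY, length INTEGER, seq TEXT)")

# Export: rows held in a program variable are written to the database.
rows_to_export = [
    ("PROT_A", 5, "MARSS"),  # hypothetical entry
    ("PROT_B", 3, "MKV"),    # hypothetical entry
]
conn.executemany("INSERT INTO protein VALUES (?, ?, ?)", rows_to_export)
conn.commit()

# Import: rows are read back from the database into a program variable.
imported = conn.execute(
    "SELECT id, length, seq FROM protein ORDER BY id"
).fetchall()
conn.close()
```

After the round trip, the imported rows match what was exported, so analysis results computed in the workspace can be stored and later retrieved without re-parsing the flat file.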
Local Swiss-Prot Database
Schema
KW table: ID, KW
ProtoMap table: ID, ProtM_ID
DR table: ID, DR
Protein table: ID, LENGTH, SEQ
Pfam table: ID, Pfam_ID
Prediction table: NO, ID, C, V, S, VL2, VL3, VL4
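Because every table carries an ID column, annotation tables can be joined back to the protein table. The sketch below shows that idea for three of the tables. It rests on assumptions: the column types, the use of ID as a foreign key, and the sample rows are all illustrative, and SQLite (via Python's sqlite3) stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server database

# Assumed layout: ID is the shared key linking each table to the protein table.
conn.executescript("""
CREATE TABLE protein  (id TEXT PRIMARY KEY, length INTEGER, seq TEXT);
CREATE TABLE kw       (id TEXT REFERENCES protein(id), kw TEXT);
CREATE TABLE protomap (id TEXT REFERENCES protein(id), protm_id TEXT);
""")

# Hypothetical rows, not real Swiss-Prot data.
conn.execute("INSERT INTO protein VALUES ('PROT_A', 5, 'MARSS')")
conn.executemany(
    "INSERT INTO kw VALUES (?, ?)",
    [("PROT_A", "SIGNAL"), ("PROT_A", "SEED STORAGE PROTEIN")],
)
conn.execute("INSERT INTO protomap VALUES ('PROT_A', '840')")

# Keywords of every protein in one ProtoMap family, via the shared ID.
joined = conn.execute(
    """
    SELECT p.id, p.length, k.kw
    FROM protein AS p
    JOIN kw AS k ON k.id = p.id
    JOIN protomap AS m ON m.id = p.id
    WHERE m.protm_id = ?
    ORDER BY k.kw
    """,
    ("840",),
).fetchall()
conn.close()
```

Keeping one row per keyword or cross-reference, rather than packing them into a single text column, is what lets a join like this answer family-wide questions.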
Search within our database
Back to the earlier question: instead of looking up a single entry, we can run queries that produce a summary report.
Q: "List the IDs of the proteins whose ProtoMap family index = '840'."
Type “querybuilder” in the MATLAB Command Window
Select the DB name
Select the table name
Select the name of the variable (field) to collect
SQL statement: select distinct id from protomap where protfam='840'
MATLAB variable that holds the result
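The same query can also be issued programmatically. The sketch below reuses the SQL statement from the slide against a small stand-in table. The assumptions: Python's sqlite3 replaces the MATLAB and SQL Server pairing, the helper name ids_in_family is hypothetical, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the local Swiss-Prot database
conn.execute("CREATE TABLE protomap (id TEXT, protfam TEXT)")
conn.executemany(
    "INSERT INTO protomap VALUES (?, ?)",
    [
        ("PROT_A", "840"),  # hypothetical rows
        ("PROT_B", "840"),
        ("PROT_A", "840"),  # duplicate, removed by DISTINCT
        ("PROT_C", "17"),
    ],
)


def ids_in_family(connection, family):
    """Return the distinct protein IDs whose ProtoMap family equals `family`."""
    cursor = connection.execute(
        "SELECT DISTINCT id FROM protomap WHERE protfam = ? ORDER BY id",
        (family,),  # bound parameter instead of a quoted literal
    )
    return [row[0] for row in cursor]


result = ids_in_family(conn, "840")
```

Passing the family index as a bound parameter keeps the statement reusable for any family and avoids quoting problems such as typographic quotes ending up in the SQL text.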
Result
Feedback/Questions?
Thank you!