2010 Lecture Series on Modern Research Techniques in the Life Sciences

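Bioinformatics Course
-- Databases and Web Services
Zhou Du (杜舟)
Bioinformatics, class of 2007
Su Zhen's lab
Second-year PhD student, already an old hand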
Concepts
• Bioinformatics
• Computational Biology
(Many who draw a distinction between bioinformatics and computational biology portray the former as a toolkit and the latter as a science.)
• Database
• Web server
• Web service
Nucleic Acids Research: annual Database issue and Web Server issue
• Database: http://www.oxfordjournals.org/nar/database/c/
• Web server: http://bioinformatics.ca/links_directory/
• Google!!!
Main bioinformatics journals
Specialized journals (mostly computational papers):
Bioinformatics, PLoS Computational Biology, BMC Bioinformatics, Journal of Computational Biology, BMC Genomics, BMC Systems Biology, Molecular Biology and Evolution, ...
Semi-specialized journals (nearly every issue carries some computational papers):
Genome Biology, Nucleic Acids Research, Genome Research, Molecular Systems Biology, American Journal of Human Genetics, ...
General journals: Nature, Science, PNAS, PLoS ONE, ...
Others (occasionally publish computational papers):
Nature Biotechnology, Nature Genetics, Nature Methods, Cell, Trends in Genetics, PLoS Genetics, ...
Part I Overview of bioinformatics databases and web servers
Part II Introduction to the bioinformatics web services created in Su Zhen's lab
Part III Construction of databases and web services
Three major public DNA databases: EMBL, GenBank, DDBJ
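In 1988, these three formed the International Nucleotide Sequence Database Collaboration (INSDC), which stipulates:
1. Data exchange and sharing (every 24 hours).
2. Submitted data are processed in a unified record format, so that corresponding records in the three databases stay consistent in content.
3. Data maintenance and updates: each database updates only the records that were directly submitted to it.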
What is an accession number?
An accession number is a label that identifies a record in a database.
Examples (all for retinol-binding protein, RBP4):
DNA
  X02775      GenBank genomic DNA sequence (format: 1 letter + 5 digits, or 2 letters + 6 digits)
  NT_030059   Genomic contig in RefSeq
  rs7079946   dbSNP (single nucleotide polymorphism)
RNA
  N91759.1    An expressed sequence tag (1 of 170)
  NM_006744   RefSeq DNA sequence (from a transcript)
Protein
  NP_007635   RefSeq protein
  AAC02945    GenBank protein
  Q28369      SwissProt protein
  1KT7        Protein Data Bank structure record
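Much of the time the source database can be guessed from the shape of the accession alone. The Python sketch below assumes only the formats named on this slide; the pattern table and the helper `classify_accession` are hypothetical and far from exhaustive. It returns every format a string is consistent with. Q28369 matches both the 1+5 GenBank nucleotide pattern and the UniProtKB pattern, so an accession is best read together with the database it came from.

```python
import re

# Hypothetical pattern table covering only the formats named on this slide.
# "1+5" means 1 letter followed by 5 digits, and so on.
ACCESSION_PATTERNS = [
    ("dbSNP", re.compile(r"rs\d+")),
    ("RefSeq", re.compile(r"[A-Z]{2}_\d+")),
    ("GenBank protein (3+5)", re.compile(r"[A-Z]{3}\d{5}")),
    ("GenBank nucleotide (1+5 or 2+6)", re.compile(r"[A-Z]\d{5}|[A-Z]{2}\d{6}")),
    # Accession pattern published by UniProt for UniProtKB entries.
    ("UniProtKB",
     re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}")),
    ("PDB", re.compile(r"[1-9][A-Z0-9]{3}")),
]


def classify_accession(accession: str) -> list[str]:
    """Return every listed format that the accession is consistent with."""
    base = accession.split(".")[0]  # ignore a version suffix such as ".1"
    return [label for label, pattern in ACCESSION_PATTERNS if pattern.fullmatch(base)]


for acc in ["X02775", "NT_030059", "rs7079946", "N91759.1", "NM_006744",
            "NP_007635", "AAC02945", "Q28369", "1KT7"]:
    print(acc, classify_accession(acc))
```

Returning a list rather than a single label keeps genuinely ambiguous cases visible instead of silently picking one.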
Accession number series in RefSeq
• Experimentally determined sequences
  NT_123456   Genomic contigs (DNA)
  NM_123456   mRNA
  NP_123456   Proteins
• Sequences derived through genome annotation efforts
  XM_123456   Model mRNAs
  XP_123456   Model proteins
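The two-letter RefSeq prefix alone tells you whether a record is experimentally determined or a model from genome annotation, so a lookup table is enough to decode it. This is a sketch that assumes only the five prefixes listed above. RefSeq defines others, such as NC_ and NG_, which are omitted here. `describe_refseq` is a hypothetical helper.

```python
# Prefixes from the slide; RefSeq defines more (e.g. NC_, NG_) not listed here.
REFSEQ_PREFIXES = {
    "NT": ("Genomic contig (DNA)", "experimentally determined"),
    "NM": ("mRNA", "experimentally determined"),
    "NP": ("Protein", "experimentally determined"),
    "XM": ("Model mRNA", "genome annotation"),
    "XP": ("Model protein", "genome annotation"),
}


def describe_refseq(accession: str) -> tuple[str, str]:
    """Return (molecule type, evidence class) for a RefSeq accession such as NM_006744."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not one of the RefSeq prefixes listed: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(describe_refseq("NM_006744"))
print(describe_refseq("XP_123456"))
```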
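Introduction to NCBI
• NCBI (National Center for Biotechnology Information), established in 1988
• Main tasks
  – Developing databases
  – Conducting research in computational biology
  – Developing tools for analyzing genome data
  – Disseminating biomedical information, etc.
• Databases
  – Maintains databases, including:
    • GenBank
    • UniGene
    • RefSeq
    • dbSNP
    • dbEST
    • OMIM
  – Provides database retrieval through Entrez
  – Provides BLAST searches and alignments against its sequence databases, etc.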
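Entrez searches can also be scripted through NCBI's E-utilities web service. ESearch returns the IDs that match a query, and EFetch downloads the corresponding records. The sketch below only builds the request URLs, so it runs offline. The helper names are hypothetical. The parameters (`db`, `term`, `retmax`, `id`, `rettype`, `retmode`) are the standard E-utilities ones. Check NCBI's usage guidelines before sending many requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch query; the response lists the matching record IDs."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax}
    )


def efetch_url(db: str, ids: list[str], rettype: str = "fasta") -> str:
    """URL for an EFetch request that downloads the given records as plain text."""
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(
        {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    )


# Example: restrict a nucleotide search to maize with the [Organism] field tag.
print(esearch_url("nucleotide", '"Zea mays"[Organism]'))
```

In practice you would read the IDs from the ESearch response and pass them to `efetch_url` to get FASTA. The manual web steps on the next slide do the same thing interactively.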
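Using NCBI to obtain all full-length maize cDNAs
1. Search with the keyword FLI-CDNA
2. Select the Nucleotide database
3. Select the organism: maize
4. Choose a display format (optional)
5. Choose a download option; the FASTA file can be downloaded directly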
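After you download the FASTA file in step 5, a short script can load it for further analysis. This is a minimal sketch. It assumes a standard FASTA file in which each record starts with a `>` header line whose first word is the sequence ID. `read_fasta` is a hypothetical helper; Biopython's `Bio.SeqIO.parse(handle, "fasta")` is the usual library alternative.

```python
from io import StringIO
from typing import TextIO


def read_fasta(handle: TextIO) -> dict[str, str]:
    """Read FASTA records into a dict mapping sequence ID -> sequence."""
    records: dict[str, str] = {}
    seq_id = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            seq_id = words[0] if words else ""  # ID = first word of the header
            chunks = []
        else:
            chunks.append(line)  # sequence lines may be wrapped
    if seq_id is not None:
        records[seq_id] = "".join(chunks)  # close the last record
    return records


# Small in-memory demo instead of a real download.
demo = StringIO(">cdna1 demo record\nATGGCG\nTTA\n>cdna2\nATGTGA\n")
seqs = read_fasta(demo)
for seq_id, seq in seqs.items():
    print(seq_id, len(seq))
```

For a real file, open it and pass the handle, e.g. `with open("maize_fli_cdna.fasta") as fh: seqs = read_fasta(fh)` (the file name is hypothetical). If two records share an ID, the later one overwrites the earlier one.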
Pfam
http://pfam.janelia.org/
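Genome Browser
• Browses genome information: raw sequence data, gene structures, EST support, transcription factors, sequence conservation, SNPs, and more.
• Drawback: suited only to manual browsing, not to large-scale processing.
JBrowse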
UCSC Introduction
• University of California, Santa Cruz (UCSC)
• Genome Browser Database
• URL: http://genome.ucsc.edu/
• Data content:
– Genome data
– Alignments between genomes
– Reference sequences (mRNA, EST)
– Gene annotation (ENCODE project)
UCSC home page
Genome Browser
Customized UCSC Browser
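The browser is built for manual viewing, so large-scale work usually means processing annotation files directly. Custom tracks for the UCSC Browser are commonly supplied in BED format. Its first three tab-separated columns are chromosome, start and end, with 0-based starts and exclusive ends. The minimal sketch below loads BED lines and finds the features that overlap a region. The helper names and demo features are hypothetical.

```python
from typing import Iterable, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def parse_bed(lines: Iterable[str]) -> list[BedFeature]:
    """Parse BED lines, skipping blank, comment, 'track' and 'browser' lines."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""  # optional 4th column
        features.append(BedFeature(fields[0], int(fields[1]), int(fields[2]), name))
    return features


def overlapping(features: Iterable[BedFeature], chrom: str,
                start: int, end: int) -> list[BedFeature]:
    """Features on `chrom` overlapping the half-open window [start, end)."""
    return [f for f in features if f.chrom == chrom and f.start < end and f.end > start]


demo_lines = [
    'track name=demo description="hypothetical features"',
    "chr1\t100\t200\tgeneA",
    "chr1\t150\t300\tgeneB",
    "chr1\t300\t400\tgeneC",
    "chr2\t100\t200\tgeneD",
]
hits = overlapping(parse_bed(demo_lines), "chr1", 180, 300)
print([f.name for f in hits])
```

With half-open intervals, geneC (starting at 300) does not overlap the window [180, 300). Off-by-one errors of this kind are the usual pitfall when converting to the browser's 1-based display coordinates.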
Databases and web services from Su Zhen's lab
Plant mRNA database
Zhenhai Zhang, Jingyin Yu, Daofeng Li, Zuyong Zhang, Fengxia Liu, Xin Zhou, Tao Wang, Yi Ling, and Zhen Su. Nucleic Acids Research, 2010, Vol. 38, Database issue, D806-D813
Soybean functional database
Medicago database
Li D, Su Z, Dong J, Wang T. An expression database for roots of the model legume Medicago truncatula under salt stress. BMC Genomics. 2009 Nov 11;10(1):517.
Plant secreted protein database
Plant ubiquitination system database
Zhou Du, Xin Zhou, Li Li, Zhen Su. plantsUPS: a database of plants' Ubiquitin Proteasome System. BMC Genomics, 2009, 10:227
Maize signal transduction database
BMC Genomics, 2010
EasyGO: a GO enrichment analysis platform
Xin Zhou, Zhen Su. EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species. BMC Genomics 2007, 8:246
agriGO: a GO enrichment analysis platform for agricultural species
Zhou Du, Xin Zhou, Yi Ling, Zhenhai Zhang and Zhen Su. Nucleic Acids Research, 2010
Faculty of 1000 Biology: “Recommended”
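GO enrichment tools such as EasyGO and agriGO ask whether a GO term is over-represented in a gene list compared with a background. One common statistic for this is the hypergeometric test. The sketch below computes its upper-tail p-value exactly with `math.comb`. It is a generic illustration, not the code of either tool, and the function name is hypothetical. Real analyses test many GO terms at once, so they also apply a multiple-testing correction.

```python
from math import comb


def go_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all annotated genes of a species)
    K: background genes annotated with the GO term
    n: genes in the query list
    k: query genes annotated with the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("need 0 <= k <= n <= N and 0 <= K <= N")
    # Sum P(X = i) for i = k .. min(K, n); comb() returns 0 for impossible terms.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


# Hypothetical counts: 10 of 100 query genes carry a term found on 200 of 20,000 genes.
print(go_enrichment_p(k=10, n=100, K=200, N=20000))
```

Python integers are arbitrary-precision, so the binomial coefficients stay exact and only the final division rounds to a float.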
Technologies you may need to build a database or web service
• Biological meaning
• Literature mining
• Database
• Computer techniques: LAMP (Linux, Apache, MySQL, PHP/Python/Perl) + HTML (CSS) + JavaScript
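The sketch below shows how these layers fit together: a database, a server-side script, and an HTTP interface. It swaps MySQL for Python's built-in `sqlite3` and Apache for a WSGI function, so it runs without any servers. The table layout, gene IDs and function names are hypothetical. In a real LAMP deployment the same parameterized SELECT would run against MySQL from PHP, Python or Perl behind Apache.

```python
import json
import sqlite3
from urllib.parse import parse_qs


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a MySQL annotation database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene ("
        " gene_id TEXT PRIMARY KEY,"
        " species TEXT NOT NULL,"
        " description TEXT)"
    )
    # Hypothetical rows for illustration only, not real annotations.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?)",
        [
            ("demo_zm_001", "Zea mays", "hypothetical kinase"),
            ("demo_zm_002", "Zea mays", "hypothetical transporter"),
            ("demo_gm_001", "Glycine max", "hypothetical kinase"),
        ],
    )
    conn.commit()
    return conn


def genes_for_species(conn: sqlite3.Connection, species: str) -> list[tuple[str, str]]:
    # Parameterized query: the driver handles quoting, which avoids SQL injection.
    cur = conn.execute(
        "SELECT gene_id, description FROM gene WHERE species = ? ORDER BY gene_id",
        (species,),
    )
    return cur.fetchall()


def make_app(conn: sqlite3.Connection):
    """WSGI app: GET /?species=Zea+mays returns the matching genes as JSON."""
    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        species = query.get("species", [""])[0]
        rows = genes_for_species(conn, species)
        body = json.dumps(
            [{"gene_id": g, "description": d} for g, d in rows]
        ).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_app(build_demo_db())).serve_forever()` serves it at http://localhost:8000/?species=Zea+mays. The `wsgiref` server is meant for testing, not production.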
Thank you!