Bioinformatics (生物資訊) 林育慶

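Bioinformatics
Information (資訊)

Information takes many forms, such as news, literature, video, reports, numbers, and statistics. It is an important record of human activity.
Large volumes of mixed, unordered information must be properly processed and analyzed before they become knowledge that every field can use.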
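What is Bioinformatics?
Biology: molecular biology, biochemistry, genetics, physical chemistry, structural biology, evolutionary biology, nuclear magnetic resonance (NMR), genomics, proteomics, computational physics
Computer Science, Mathematics and Statistics: algorithms, image and signal processing, computer architecture and database management, programming languages, programming, artificial intelligence and information theory, design and simulation, numerical analysis, statistics, software engineering and automation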
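The Emergence of Bioinformatics
1970: recombinant DNA technology appears
1980: the biotechnology industry develops
1990: the Human Genome Project; bioinformatics
GenBank is a database supported by the U.S. National Institutes of Health (NIH). As of March 2002 it held 14,976,310 DNA sequences.

The large volume of data on DNA sequences and protein structures has to be processed and analyzed with computer software. This gave rise to bioinformatics, also called computational biology or information biology.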
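Main Tasks of Bioinformatics

Building biological databases and websites
  e.g. NCBI (National Center for Biotechnology Information), GenBank
Searching biological information
Analyzing biological information
  e.g. EMBOSS includes about 100 sequence analysis programs, provides a core software library, and integrates other shared software resources
Data acquisition and processing
Genome mapping and comparison
Molecular model building and simulation
  e.g. information about a protein includes its 3D structure as well as its amino acid sequence, so three-dimensional information is needed, not only flat information
Comparison of DNA and protein sequences and structures
  Need sequence alignment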
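Sequence alignment

Suppose we want to know how similar the amino acid sequences of the same protein are in different species, and how those proteins are related to one another.
A multiple sequence alignment reveals the relationships among these proteins.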
Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT
CGGATCA--T
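Reading the alignment column by column:
Match: both rows hold the same letter (e.g. C over C).
Mismatch: the rows hold different letters (e.g. T over C).
Insertion gap: "-" in the Sequence A row (the "---" opposite GGA).
Deletion gap: "-" in the Sequence B row (the "--" opposite AC).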
Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: alignment graph with the letters of Sequence A and Sequence B along its two axes]
C---TTAACT
CGGATCA--T
How to Judge Similarity

Match: +8 (w(x, y) = 8, if x = y)
Mismatch: -5 (w(x, y) = -5, if x ≠ y)
Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

 C  -  -  -  T  T  A  A  C  T
 C  G  G  A  T  C  A  -  -  T
+8 -3 -3 -3 +8 -5 +8 -3 -3 +8

Alignment score = +12
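The scoring rule above can be checked mechanically. The following is a minimal sketch, assuming the two aligned rows are given as equal-length strings with "-" as the gap symbol; the function names are illustrative and not from the slides.

```python
MATCH, MISMATCH, GAP = 8, -5, -3  # values of w(x, y) given on the slide


def column_score(x: str, y: str) -> int:
    """Score a single alignment column under w(x, y)."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a: str, row_b: str) -> int:
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

Four matches (+32), one mismatch (-5) and five gap symbols (-15) give the +12 shown on the slide.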
k best local alignments

Smith-Waterman
(Smith and Waterman, 1981; Waterman and Eggert, 1987)

FASTA
(Wilbur and Lipman, 1983; Lipman and Pearson, 1985)

BLAST
(Altschul et al., 1990; Altschul et al., 1997)
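As a rough illustration of the first method in this list, here is a minimal sketch of the Smith-Waterman dynamic-programming recurrence. It assumes the match/mismatch/gap values used earlier in these slides and a simple linear gap penalty, and it returns only the single best local score, not the k best alignments. The function name is illustrative.

```python
def smith_waterman_score(a: str, b: str, match: int = 8,
                         mismatch: int = -5, gap: int = -3) -> int:
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may start anywhere, so scores never drop below 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # 24: the shared segment ACG
```

The cost is proportional to the product of the two sequence lengths, which is the motivation for faster heuristics such as FASTA and BLAST.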
FASTA
1) Find runs of identities, and identify regions with the highest density of identities.
2) Re-score using PAM matrix, and keep top scoring segments.
3) Eliminate segments that are unlikely to be part of the alignment.
4) Optimize the alignment in a band.
FASTA
Step 1: Find runs of identities, and identify regions
with the highest density of identities.
[Figure: Sequence A plotted against Sequence B]
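One way to picture Step 1 is to count, for each diagonal of the A-versus-B plot, how many short identical words fall on it. The sketch below is an illustration under assumptions, not FASTA's actual implementation: the word length k=2, the diagonal convention d = i - j (0-based positions in A and B), and the function name are all chosen here for the example.

```python
from collections import Counter, defaultdict


def diagonal_hits(a: str, b: str, k: int = 2) -> Counter:
    """Count identical k-letter words on each diagonal d = i - j."""
    positions = defaultdict(list)  # word -> start positions in a
    for i in range(len(a) - k + 1):
        positions[a[i:i + k]].append(i)
    counts = Counter()
    for j in range(len(b) - k + 1):
        for i in positions.get(b[j:j + k], ()):
            counts[i - j] += 1
    return counts


hits = diagonal_hits("GATTACA", "TTGATTACA")
print(hits.most_common(1))  # [(-2, 6)]: A appears in B shifted by two letters
```

The diagonal with the most hits marks the region with the highest density of identities.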
FASTA
Step 2: Re-score using PAM matrix, and
keep top scoring segments.
FASTA
Step 3: Eliminate segments that are unlikely to be part of
the alignment.
FASTA
Step 4: Optimize the alignment in a band.
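Step 4 can be sketched as a local alignment restricted to a band around one diagonal. This is a simplified illustration, not FASTA's own code: it assumes the +8/-5/-3 scores used earlier in these slides, a linear gap penalty, and the convention that a cell pairing A's i-th letter with B's j-th letter lies on diagonal i - j. The function and parameter names are hypothetical.

```python
def banded_local_score(a: str, b: str, diagonal: int = 0, width: int = 2,
                       match: int = 8, mismatch: int = -5, gap: int = -3) -> int:
    """Best local score using only cells with |(i - j) - diagonal| <= width."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # cells outside the band simply stay 0
        lo = max(1, i - diagonal - width)
        hi = min(len(b), i - diagonal + width)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(banded_local_score("GATTACA", "TTGATTACA", diagonal=-2, width=0))  # 56
```

Only about 2 * width + 1 cells per row are evaluated, so a narrow band is much cheaper than filling the whole matrix. For brevity this sketch still allocates full rows.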
BLAST

Basic Local Alignment Search Tool
(by Altschul, Gish, Miller, Myers and Lipman)

The central idea of the BLAST algorithm is
that a statistically significant alignment is
likely to contain a high-scoring pair of
aligned words.
BLAST
Step 1: Build the hash table for Sequence A. (3-tuple example)
For DNA sequences:
Seq. A   = AGATCGAT
position   12345678

AAA
AAC
..
AGA   1
..
ATC   3
..
CGA   5
..
GAT   2, 6
..
TCG   4
..
TTT
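The hash table in Step 1 can be built in a few lines. This is a minimal sketch, assuming a Python dictionary stands in for the hash table and that positions are 1-based as on the slide; the function name is illustrative.

```python
from collections import defaultdict


def build_word_index(seq: str, k: int = 3) -> dict:
    """Map every k-letter word in seq to its 1-based start positions."""
    index = defaultdict(list)
    for start in range(len(seq) - k + 1):
        index[seq[start:start + k]].append(start + 1)
    return dict(index)


index = build_word_index("AGATCGAT")
print(index["GAT"])  # [2, 6]
```

Words that never occur in Sequence A, such as AAA or TTT, are simply absent from the dictionary.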
BLAST
Step 2: Scan Sequence B for hits.
Step 3: Extend hits.
[Figure: a hit]
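Steps 2 and 3 can be sketched as follows. This is a deliberately simplified illustration: it extends a hit only while the letters are identical, whereas BLAST extends by alignment score, so treat the stopping rule here as an assumption. Positions are 0-based, and the function names and the example Sequence B are hypothetical.

```python
from collections import defaultdict


def find_hits(a: str, b: str, k: int = 3) -> list:
    """Return (i, j) start positions where a k-letter word of a equals one of b."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    return [(i, j)
            for j in range(len(b) - k + 1)
            for i in index.get(b[j:j + k], ())]


def extend_hit(a: str, b: str, i: int, j: int, k: int = 3) -> tuple:
    """Grow a word hit along its diagonal while the letters agree.

    Returns (start in a, start in b, length of the extended segment).
    """
    start_a, start_b = i, j
    while start_a > 0 and start_b > 0 and a[start_a - 1] == b[start_b - 1]:
        start_a -= 1
        start_b -= 1
    end_a, end_b = i + k, j + k
    while end_a < len(a) and end_b < len(b) and a[end_a] == b[end_b]:
        end_a += 1
        end_b += 1
    return start_a, start_b, end_a - start_a


seq_a, seq_b = "AGATCGAT", "TTGATCGG"
for i, j in find_hits(seq_a, seq_b):
    print((i, j), extend_hit(seq_a, seq_b, i, j))
```

Several hits on the same diagonal extend to the same segment, so a fuller implementation would merge them before reporting.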
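Current Industrial Applications of Bioinformatics

Main applications: genomics, cheminformatics, proteomics, pharmacogenomics…

For example, in new drug development it can save 30% of the development time and 33% of the development cost.

Long-term goals: pharmacogenomics, high-throughput protein structure determination, analysis of protein-protein interactions…
Other Related Topics
(Supplementary)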
Bioinformatics and Computational Biology-Related Journals:

Bioinformatics (previously called CABIOS)
Bulletin of Mathematical Biology
Computers and Biomedical Research
Genome Research
Genomics
Journal of Bioinformatics and Computational Biology
Journal of Computational Biology
Journal of Molecular Biology
Nature
Nucleic Acids Research
Science
Bioinformatics and Computational Biology-Related Conferences:

Intelligent Systems for Molecular Biology (ISMB)
Pacific Symposium on Biocomputing (PSB)
The Annual International Conference on Research in Computational Molecular Biology (RECOMB)
The IEEE Computer Society Bioinformatics Conference (CSB)
...
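Finally: Where to Find Resources
Major Bioinformatics Websites

The major bioinformatics websites have gradually combined databases, search engines, and analysis software in one place.
NCBI (National Center for Biotechnology Information)
ExPASy (Expert Protein Analysis System)
EMBnet (European Molecular Biology network)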
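NCBI (National Center for Biotechnology Information)

NCBI was established under Public Law 100-607. It is part of the U.S. National Library of Medicine (NLM) and is located on the campus of the National Institutes of Health (NIH).
Main missions:
(1) Build automated systems for storing and analyzing biological information.
(2) Improve methods for searching and analyzing biological information.
(3) Promote the use of biological information and software by biomedical researchers.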
Reference

EBI (European Bioinformatics Institute): http://www.ebi.ac.uk/
ExPASy (Expert Protein Analysis System): http://www.expasy.org/
GenomeNet (Japanese Bioinformatics Center): http://www.genome.ad.jp/
NCBI (National Center for Biotechnology Information): http://www.ncbi.nlm.nih.gov/
NIH (National Institutes of Health): http://www.mh.nih.gov/
National Health Research Institutes (國家衛生研究院): http://www.nhri.org.tw/
Slides by Prof. 趙坤茂: http://www.csie.ntu.edu.tw/~kmchao
Slides from previous presenters…
