MATLAB Applications in Bioinformatics

Kristen Amuzzini
Biotech, Pharmaceutical, & Medical Industry
The MathWorks, Inc.
© 2003 The MathWorks, Inc.
Developing and Deploying Bioinformatics Applications with MATLAB
MATLAB Applications in Bioinformatics
Presentation Layout
• MATLAB applications in Bioinformatics
– Customer success stories
• MATLAB & The Bioinformatics Toolbox
– Sequence analysis
– Microarray analysis
• Integrating MATLAB with other tools
– MATLAB as computational engine for Excel
• Questions/Answers & Wrap-up
Bioinformatics Applications
• Sequence analysis
– Base calling algorithm design, sequence alignment, sequence building algorithms
• Microarray analysis
– Image processing, QA/QC, data normalization, data analysis
• Proteomics
– Mass spectrometry signal processing, protein marker identification and classification, peptide sequence identification, 2D-gel image analysis
• Systems Biology
– Interaction network identification, simulation of metabolic pathways, flux analysis
Bioinformatics teams supporting multiple constituencies with multiple tools.
Bioinformatics Team
• Algorithm development
• Custom one-off analyses
• Programs for biologists
Research Biologists
• Prefer UI/Web-based tools
• Want custom analyses
Software Engineers
• C++, Java
• Work off MATLAB prototypes
Tools
• C/C++, Java, Perl
• VB, Excel Macros
• SQL
• GUI-based tools
• Freeware
• SPLUS, R, SAS, Mathematica
• Web-based tools
Using MATLAB, bioinformatics teams can support multiple constituencies.
Research Biologists
• Prefer UI/Web-based tools
• Want custom analyses
Software Engineers
• C++, Java
• Work off MATLAB prototypes
Bioinformatics Team
• Algorithm development
• Custom one-off analyses
• Programs for biologists
MATLAB GUIs, analyses
MATLAB prototypes/applications
User example: Genetic Sequence Base Calling
Complete draft of the human genome,
accelerated by Applied Biosystems —
using MATLAB algorithms.
“Having one integrated package
is a big advantage. Using MATLAB and the
MATLAB Compiler reduced my development time
by a factor of 4 or 5.”
“MATLAB has always been ideal as an algorithm
prototyping tool,” Labrenz concludes, “but the
MATLAB Compiler and C/C++ Math and Graphics
Libraries add a whole new dimension, allowing
rapid delivery of sophisticated solutions.”
Jim Labrenz, Applied Biosystems
User example: Breast Cancer Prognosis
Rosetta Inpharmatics recently developed a tool
that enables clinicians to determine a breast
cancer patient’s prognosis based on the gene
expression profile of the primary tumor.
“Since MATLAB and the Image Processing Toolbox are
fully integrated and the MATLAB platform is very good for
matrix calculation, we did not have to spend time writing
the low level image processing and the basic data
analysis routines like vector and matrix calculations.”
“Our research scientists are happy with the quick
feedback,” Dr. Dai says. “Using MathWorks tools, we can
respond to their requests very fast, and it’s easy for the
scientists to use these tools. Using the GUIs that we
develop in MATLAB, they can access functions without
having to remember the underlying code.”
Dr. Hongyue Dai,
Rosetta Inpharmatics/Merck & Company
Academic users
• Bioinformatics Teaching
– MIT, Stanford, Cornell, Carnegie Mellon, …
• Research
– Sequencing: base calling algorithm design
– Sequence analysis: computational biolinguistics
– Microarray analysis: statistical modeling of microarrays
– Proteomics: statistical modeling of protein-protein interaction
– Systems Biology: flux analysis
Thousands of universities teach students using
MathWorks products.
More than 600 textbooks for education and professional use, in 19 languages
– Biosciences
– Controls
– Signal Processing
– Image Processing
– Mechanical Engineering
– Mathematics
– Natural Sciences
– Environmental Sciences
Industry Issues & Solutions
• Issue: Integrating tools from various programming languages is difficult; closed-source tools are not customizable; and freeware is often not supported.
– Solution: MATLAB is a supported, open-architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.
• Issue: There is no standard biological data format.
– Solution: MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.).
• Issue: Applications must be easily deployable within organizations.
– Solution: MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications.
Robert Henson
The MathWorks, Inc.
MATLAB & The Bioinformatics Toolbox
The MathWorks Product Family
Integrated for:
• Technical computing, data analysis and visualization
• System modeling and simulation
• Implementation of real-time embedded software
Blocksets
Toolboxes
DAQ cards
Instruments
Databases and files
Financial Datafeeds
Stateflow
Code Generation
PC-based real-time systems
Desktop Applications
Automated Reports
Bioinformatics Toolbox 1.0
• File I/O
– FASTA, PDB, SCF, GPR, GAL
• Web Connectivity
– GenBank, EMBL, PIR, PDB
• Sequence Analysis & Alignment
– Needleman-Wunsch, Smith-Waterman
– DNA/RNA/AA conversions, pattern searching
– Example alignment:
212 PYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRGIR
321 PYISRYYPELAVHGAYSE-SETYSEQDVREVAEFAKIYGVQ
• Microarray Normalization & Visualization
– Lowess, global mean, MAD (median absolute deviation)
• Protein Visualization
– Atomic composition, molecular weight, hydrophobicity profile
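To make two of the normalization methods listed above concrete, here is a minimal sketch in plain Python (used instead of MATLAB so it runs without the toolbox). It assumes one common convention for each method: global mean normalization divides by the mean, and MAD normalization centers on the median and divides by the median absolute deviation. The function names and the `channel` values are hypothetical, and the toolbox's own routines may use different conventions.

```python
from statistics import mean, median


def global_mean_normalize(values):
    """Scale values so that their mean becomes 1 (assumed convention)."""
    m = mean(values)
    return [v / m for v in values]


def mad(values):
    """Median absolute deviation from the median."""
    med = median(values)
    return median(abs(v - med) for v in values)


def mad_normalize(values):
    """Center on the median and scale by the MAD (assumed convention)."""
    med = median(values)
    scale = mad(values)
    return [(v - med) / scale for v in values]


# Made-up intensities for one array channel; the last value is an outlier.
channel = [2.0, 4.0, 6.0, 8.0, 100.0]

print(global_mean_normalize(channel))
print(mad_normalize(channel))
```

Because the median and the MAD are barely affected by the single outlier, the MAD-scaled values for the other four spots stay on a small, comparable scale, whereas the mean-based scaling is pulled strongly by the outlier.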
MATLAB Desktop Tools
• Launchpad: start other tools and demos
• Command Window
• Workspace Browser: see your data
• Command History
Sequence Alignment Tutorial Example
• Get human and mouse genes from GenBank
• Look for open reading frames (ORFs)
• Convert DNA sequences to amino acid sequences
• Create a dotplot of the two sequences
• Perform global alignment
• Perform local alignment
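The ORF search, translation, and alignment steps above can be sketched in plain Python (chosen so the example runs without MATLAB). This is an illustration only, not the toolbox's implementation: the sequences are made-up toy strings rather than GenBank records, the function names are hypothetical, only the forward strand is searched for ORFs, and the scoring (match +1, mismatch -1, linear gap -2) is an assumption in place of a substitution matrix.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate codon by codon, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_orfs(dna):
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand."""
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                orfs.append((start, i + 3))
                start = None
    return orfs


def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score by dynamic programming.

    local=False gives a Needleman-Wunsch (global) score;
    local=True gives a Smith-Waterman (local) score.
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


# Toy sequences standing in for the human and mouse genes.
human = "CCATGGCTTGGTAAGG"
mouse = "ATGGCTTGGTGA"

h_start, h_end = find_orfs(human)[0]
human_protein = translate(human[h_start:h_end])
mouse_protein = translate(mouse)

print(human_protein, mouse_protein)
print("global score:", alignment_score(human_protein, mouse_protein))
print("local score:", alignment_score(human, mouse, local=True))
```

The only difference between the two alignment modes in this sketch is the boundary handling: the global score pays for gaps at the sequence ends, while the local score clamps every cell at zero and reports the best cell anywhere in the table.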
Microarray Data Analysis Tutorial Example
• Plot expression profiles for genes
• Filter genes based on information content of profile
• Perform hierarchical clustering
• Perform K-means clustering
• Perform Principal Component Analysis
Reference:
DeRisi, JL, Iyer, VR, Brown, PO. "Exploring the metabolic and genetic control of gene expression on a genomic scale." Science. 1997 Oct 24;278(5338):680-6.
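Two of the tutorial steps above, filtering and K-means clustering, can be sketched in plain Python (used so the example runs without MATLAB). The expression values are made up and are not taken from the DeRisi data set; the function names are hypothetical; variance stands in for "information content" as the filter criterion; and the K-means routine is a bare version of Lloyd's algorithm seeded with the first k profiles, which is an assumption rather than the toolbox's behavior.

```python
from statistics import mean, pvariance


def filter_by_variance(profiles, min_variance):
    """Keep genes whose expression profile has at least min_variance."""
    return {g: p for g, p in profiles.items() if pvariance(p) >= min_variance}


def squared_distance(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


# Made-up log-ratio profiles over four time steps.
profiles = {
    "geneA": [0.1, 0.9, 2.1, 3.0],  # rising
    "geneB": [0.0, 1.1, 1.9, 3.2],  # rising
    "geneC": [3.1, 2.0, 1.0, 0.1],  # falling
    "geneD": [2.9, 2.1, 0.8, 0.0],  # falling
    "geneE": [1.0, 1.1, 0.9, 1.0],  # flat, removed by the filter
}

kept = filter_by_variance(profiles, min_variance=0.5)
labels, centers = kmeans(list(kept.values()), k=2)
for gene, label in zip(kept, labels):
    print(gene, "-> cluster", label)
```

Filtering first removes flat profiles that would otherwise add noise to the distance calculations; the remaining rising and falling genes then separate into two clusters.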
Integrating Bioinformatics Tools with MATLAB
Connecting to MATLAB
C/C++
Java
Perl
Excel / COM
File I/O
Deploying with MATLAB
COM
Excel
Push Data into MATLAB
Data I/O
• Import Excel ranges into MATLAB
• Export MATLAB data into Excel ranges
• Evaluate MATLAB statements in Excel
Computational Engine for Excel
Spreadsheet Applications
• MATLAB Excel Link can be the computational engine behind your Excel applications
• Fast, scalable solution
MLPutMatrix("data",B2:H43)
MLPutMatrix("Genes",A2:A43)
MLPutMatrix("TimeSteps",B1:H1)
MLEvalString("clustergram(data,'RowLabels',Genes,'ColLabels',TimeSteps)")
What else could you do?
Bioinformatics
Statistics
Signal Processing
Neural Networks
Image Processing
Optimization
Summary
Industry Issues & Solutions
• Issue: Integrating tools from various programming languages is difficult; closed-source tools are not customizable; and freeware is often not supported.
– Solution: MATLAB is a supported, open-architecture, user-friendly environment for data analysis across applications, algorithm development, and deployment.
• Issue: There is no standard biological data format.
– Solution: MATLAB and the Bioinformatics Toolbox provide file format support for common data sources (web-based, sequences, microarray, etc.).
• Issue: Applications must be easily deployable within organizations.
– Solution: MATLAB’s deployment tools and user-interface design environment allow easy deployment of MATLAB-based applications.
Further Information
• Bioinformatics Toolbox product page
– Demos, technical literature, trial information
– www.mathworks.com/products/bioinfo
• MATLAB Central
– File exchange and newsgroup access for the MATLAB & Simulink user community
– www.mathworks.com/matlabcentral
– Access to comp.soft-sys.matlab