Bioinformatics: a Data Centric Perspective


Bioinformatics: a Multidisciplinary Challenge

Ron Y. Pinter, Dept. of Computer Science, Technion, March 12, 2003

What is Bioinformatics?

• The application of information technology to life sciences research
  – modeling (abstraction)
  – analysis and collection
  – data integration and information retrieval
• Enables the discovery and analysis of biomolecules and their properties (structure, function, interactions) for, e.g.,
  – pharmaceutical research
  – medical diagnosis
  – agriculture
• AKA computational, dry, or in silico biology

Computational Sciences: Analytic and Predictive

• Physics
  – Universal: mechanics, electricity, particle physics
  – Started in the 17th century
• Chemistry
  – Specific: materials
  – 19th century
• Biology
  – The study of living organisms
  – Metamorphosis coincides with the huge increase in data acquisition capabilities and computational power

Biological Revolution Necessitates Bioinformatics

• New biotechnologies (automatic sequencing, DNA chips, protein identification, mass spectrometry, etc.) produce large quantities of biological data.

• It is impossible to analyze this volume of data by manual inspection.

• Bioinformatics: Development of algorithms that enable the analysis of the data (from experiments or from databases).

Data produced by biologists and stored in databases → Bioinformatics algorithms and tools → New information for biological and medical use

Central Dogma of Molecular Biology

Gene (DNA) → Transcription → mRNA → Translation → Protein

Cells express different subsets of the genes in different tissues and under different conditions.

The Genetic Code
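
The two steps of the central dogma, together with the genetic code, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `transcribe` and `translate` are hypothetical, the table is the standard genetic code (NCBI translation table 1), and transcription is simplified to copying the coding strand with U in place of T.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Simplified transcription: the mRNA matches the coding strand, with U for T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up example sequence: ATG (Met), GCC (Ala), TAA (stop).
mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))  # AUGGCCUAA MA
```

Real genes involve strand selection, splicing, and reading-frame detection, none of which this sketch attempts.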

Central Paradigm of Bioinformatics

Genetic Information → Molecular Structure → Biochemical Function → Symptoms

• Exponential growth of biological information.

• Efficient storage and management tools are most important.

Activities

• Development of new models, algorithms and statistical methods to assess and predict the relationships among members of very large data sets
• Development and implementation of tools to efficiently access and manage different types of information
• Application of these methods and tools to real problems in biology by conducting bioinformatic experiments

Primary Areas

• Genomics • Proteomics • Metabolomics • “Systems biology”

Genomics

• Sequence analysis
  – Homology searches
  – Assembly of ESTs
  – Domain and profile identification
• Gene hunting
  – Promoter identification
  – Genomic maps
• Comparative genomics
  – SNP detection (point mutations): among individuals
  – Genomic rearrangement: among species
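
To make the SNP detection item concrete, here is a minimal sketch that lists the positions at which two sequences differ by a single base. It assumes the two sequences are already aligned and of equal length; the function name `point_differences` and the example sequences are made up for illustration. Real SNP detection also involves alignment, sequencing-error handling, and population statistics.

```python
def point_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two
    pre-aligned sequences of equal length (0-based positions)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Hypothetical aligned fragments from two individuals.
individual_1 = "ACGTTGCA"
individual_2 = "ACGTAGCA"
print(point_differences(individual_1, individual_2))  # [(4, 'T', 'A')]
```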

Towards large scale genomic comparisons…

Human vs. Mouse

Proteomics

• Functional prediction • Localization • Expression analysis • Structure prediction • Docking information for biomolecules • …

Metabolomics and Systems Biology

• Metabolic and regulatory pathways
• Cell simulation
• Toxicological and pharmacological parameters

Data Types

• Strings (over nucleotides and amino acids) • 2D and 3D geometric structures • Images • Numeric data (expression data, mass spec, …) • Graphs (pathways, networks, …) • Text articles • …

Some Queries

• What genes are connected to a disease?

• What proteins are encoded by them?

• Under what conditions are they expressed?

• What pathways do they participate in?

• Which are targets for new therapeutics?

• What will happen if we introduce a virus into a certain environment?

• …

Data Sources

• Mostly public
  – NCBI, EMBL, KEGG, Swissprot, …
• Also some commercial
  – Celera, Compugen, …
• Ever changing …

Disciplines

• Life sciences
  – Biology
  – Biochemistry
  – Medicine
  – …
• Computing
  – Mathematics
  – Computer science
  – Information management
  – Information theory

The Gap

• Life sciences
  – Descriptive
  – Based on observations, lots of exceptions
  – Constant evolution and change of paradigms based on new discoveries
• Computer science
  – Analytic
  – Exact and predictive
  – “Linear”, synthetic evolution

Bridging the Gap

• Study both disciplines
  – Start as early as possible
• Work in joint teams
  – At all levels
• Learn from each other's methods
  – Increase [web] sophistication of life scientists
  – Teach computer scientists to model the real world

Example: Intro to Bioinformatics

• Grand tour of tools and methods
  – Extensive web presence
  – Many highly specialized tools
  – Diversity in each category
  – Require high skill in specific usage
  – Loose integration
• Initial encounter with topic
  – Prereqs: Biology 1 and Intro to CS
• Must bridge gaps among disciplines

Method

• All work in pairs of LS and CS students
  – Strict enforcement
  – Develop dialogue
  – Complementary skills
• “Dry” labs, homework (reports) and final project (including presentation)
• Topical presentations coupled with labs
  – Delivered by Esti Yeger-Lotem, a CS/Biology expert (speaks both languages)
  – Labs run by TAs