341: Introduction to Bioinformatics
Dr. Nataša Pržulj
Department of Computing
Imperial College London
[email protected]
Course overview
Motivation:
• Flood of available biological data:
  • Sequences and microarrays (Dr. Rice)
  • Protein 3D structure (Dr. Malod-Dognin)
  • Networks: e.g., of protein interactions; expected to be as useful as the sequence data in uncovering new biology (Dr. Pržulj)
• The goal of systems biology:
  • Systems-level understanding of biological systems, e.g., the cell
  • Analyze not only individual components, but also their interactions and the functioning of the system as a whole
  • E.g.: Learn new biology from the topology (wiring patterns) of such interaction networks
• However, biological data analysis research faces considerable challenges:
  • Incomplete and noisy data
  • Computational intractability of many (e.g., graph-theoretic) problems
Course overview
We will cover:
1. Sequence analysis (Dr. Peter Rice)
2. Microarray analysis (Dr. Peter Rice)
3. Graph theoretic aspects:
   • Fundamental topics in graph theory (e.g., basic graph notation, graph representation, and special graph types)
   • Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis)
   • Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness), which pose challenges for analyzing biological networks
4. Protein 3D structure (Dr. Malod-Dognin)
5. Biological networks aspects:
   • Basic biological concepts (e.g., DNA, genes, proteins, gene expression, …)
   • Different types of biological networks
   • Experimental techniques for acquiring the data and their biases
   • Public databases and other sources of biological network data
6. Existing approaches for analyzing and modeling biological networks:
   • Structural properties of large networks
   • Network models
   • Network clustering
   • Network alignment
   • Integration of various heterogeneous networks
   • Software tools for network analysis
7. Applications – data analysis: interplay of topology and biology
   • Learn how the above methods have been applied
   • Discuss valuable insights that have been learned: into biological function, evolution, complex diseases (e.g., cancer) and drug discovery
Course overview
• Grading scheme:
  • One coursework assignment
    • Given out on Feb 13 by email and posted on the class website
    • Due on Thursday, March 5, by 2pm
  • Written exam
• The standard DoC grading scheme will be used, as described by the Degree Regulations at https://www.doc.ic.ac.uk/internal/teachingsupport/regulations/index.htm
• Other departments: we provide coursework and exam marks, and each department decides on the weighting for the final grade
Course overview
Course organization:
1. Lectures
   • Relevant theoretical concepts and examples
2. Tutorials
   • Exercises on concepts covered in class
3. One coursework assignment
   • Opportunity to solve problems using the methods learned in class
4. Written exam
   • Testing students’ understanding of the concepts learned in lectures
• Tutorial helpers:
  • Anida Sarajlic ([email protected])
  • Dr. Noel Malod-Dognin ([email protected])
  • Vladimir Gligorijevic ([email protected])
  • Vuk Janjic ([email protected])
Course overview
Textbooks and readings

Recommended textbooks:
• Pevzner and Shamir, “Bioinformatics for Biologists,” Cambridge University Press, 2011
• Junker and Schreiber, “Analysis of Biological Networks,” Wiley, 2008
• West, “Introduction to Graph Theory,” 2nd edition, Prentice Hall, 2001, or T. Cormen et al., “Introduction to Algorithms,” 3rd edition, MIT Press, 2009
• A list of up-to-date research papers selected by the instructor: see http://www.doc.ic.ac.uk/~natasha/course2012/class_material.html

Recommended readings:
• F. Kepes (Author, Editor), “Biological Networks (Complex Systems and Interdisciplinary Science),” World Scientific Publishing Company, 1st edition, 2007
• Bornholdt and Schuster (Editors), “Handbook of Graphs and Networks: From the Genome to the Internet,” Wiley, 2003, or Dorogovtsev and Mendes (Authors), “Evolution of Networks: From Biological Nets to the Internet and WWW (Physics),” Oxford University Press, 2003
• Chapter 17 from: Chen and Lonardi (Editors), “Biological Data Mining,” Chapman and Hall/CRC Press, 2009
• Chapter 4 from: Jurisica and Wigle (Editors), “Knowledge Discovery in Proteomics,” CRC Press, 2005
• Kurt Mehlhorn and Stefan Näher, “LEDA: A Platform for Combinatorial and Geometric Computing,” Cambridge University Press, 1999
Course overview
• When and where:
  Thursdays 11-13h (LT 144) and Fridays 16-18h (LT 311), Huxley Building
• Contact:
  E-mail: [email protected]
  Subject: “341 Bioinformatics”
• Office hours:
  Fridays after class
  Office: 407 C Huxley
Course overview
• Prerequisites: no formal ones, but
  • General computational/mathematical maturity
  • Basic programming skills are desirable
  • An introduction to biological concepts will be provided
• Course website (curriculum, class material, etc.):
  http://www.doc.ic.ac.uk/~natasha/course2012/index.html (also linked from CATE)
• Academic code of honor
Topics
• Introduction: biology (Dr. Pržulj, 1 lecture)
• Sequence analysis (Dr. Rice, 2 lectures)
• Microarray analysis (Dr. Rice, 3 lectures)
• Introduction to graph theory (Dr. Pržulj, 2 lectures)
• Protein 3D structure (Dr. Malod-Dognin, 2 lectures)
• Network biology (Dr. Pržulj, 8 lectures):
  • Network properties
  • Network/node centralities
  • Network motifs
  • Network models
  • Network/node clustering
  • Network comparison/alignment
  • Network data integration
  • Software tools for network analysis
  • Interplay between topology and biology
Course overview
• Any questions so far?
Course overview
• About you…
Introduction: biology
Introduction: biology
• Cell: the building block of life
  • Cytoplasm and organelles separated by membranes: mitochondria, nucleus, etc.
Introduction: biology
• Distinguish between:
  • Prokaryotes
    • Single-celled, with no cell nucleus or any other membrane-bound organelles
    • The genetic material in prokaryotes is not membrane-bound
    • The bacteria and the archaea
    • Model organism: E. coli
  • Eukaryotes
    • Have “true” nuclei containing their DNA
    • May be unicellular, as in amoebae
    • May be multicellular, as in plants and animals
    • Model organism: S. cerevisiae (baker’s yeast)
Introduction: biology
• Nucleus contains DNA (deoxyribonucleic acid)
• DNA nucleotides: A, T, C, and G; A pairs with T, and C pairs with G
• DNA structure: double helix
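The base-pairing rule (A with T, C with G) means that one strand of the double helix determines the other. A minimal Python sketch of that idea follows; the function names and the example sequence are illustrative assumptions, not part of the course material.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    The two strands of the helix run in opposite directions, so the
    partner strand is the complement read backwards.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATGCCG"  # hypothetical sequence, for illustration only
    print(complement(example))
    print(reverse_complement(example))
```

Because the pairing is symmetric, applying `reverse_complement` twice gives back the original strand.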
Introduction: biology
• DNA is organized into chromosomes
• RNA: similar to DNA, except that T is replaced by U and RNA is single-stranded
Introduction: biology
• Main role of DNA: long-term storage of genetic information
• Genes: DNA segments that carry this information
• Intron: part of a gene that is not translated into protein; it is spliced out of the mRNA (messenger RNA, which conveys genetic information from DNA to the ribosome, where proteins are made)
• Exon: part of a gene whose mRNA is translated into protein; a protein consists only of exon-derived sequences
• Genome: total set of all genes in an organism
  • Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism
  • So how can we have different cells (neuron, liver…)?
Introduction: biology
• Codons: sets of three nucleotides
  • 4 nucleotides → 4^3 = 64 possible codons
  • Each codon codes for an amino acid (or signals the end of translation)
  • 64 codons encode 20 different amino acids
  • More than one codon can stand for the same amino acid. Why?
• Polypeptide:
  • String of amino acids, composed from a 20-character alphabet
• Proteins:
  • Composed of one or more polypeptide chains (70-3000 amino acids)
  • Sequence of amino acids is defined by a gene
• Gene expression: information transmission from DNA to proteins
• Proteome: total set of proteins in an organism
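The codon count can be checked by enumeration. The sketch below assumes the standard genetic code, written in the compact form commonly used for genetic code tables (bases in the order T, C, A, G; one-letter amino acid codes; `*` for a stop signal). The variable names are illustrative. It lists all 4^3 = 64 codons and counts how many codons map to each amino acid; by counting alone, 64 codons for 20 amino acids means that some amino acids must be encoded by several codons.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)

# All ordered triples of bases: 4 * 4 * 4 = 64 codons.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# How many codons encode each amino acid ('*' marks a stop signal).
codons_per_residue = Counter(codon_table.values())
distinct_amino_acids = set(codon_table.values()) - {"*"}

if __name__ == "__main__":
    print(len(codons), "codons")
    print(len(distinct_amino_acids), "amino acids")
    for residue, count in sorted(codons_per_residue.items()):
        print(residue, count)
```

In this table, leucine (`L`) has six codons while methionine (`M`) and tryptophan (`W`) have one each, and three codons are stop signals.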
Introduction: biology
• The 20 amino acids
Introduction: biology
• Levels of protein structure:
Introduction: biology
• Genes vs. proteins
  • Genes: passive; proteins: active
• Protein synthesis: from genes to proteins
  • Transcription (in nucleus)
  • Splicing (eukaryotes)
  • Translation (in cytoplasm)
Introduction: biology
• Transcription (in nucleus)
  • The RNA polymerase enzyme builds an RNA strand from a gene (DNA is “unzipped”)
  • The gene is transcribed to messenger RNA (mRNA)
  • Transcription is regulated by proteins called transcription factors
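As a rough illustration of transcription at the sequence level, the Python sketch below builds an mRNA string from one strand of a gene. It is a simplification under stated assumptions: regulation by transcription factors is ignored, the template strand is written 3'->5' so the mRNA comes out 5'->3', and the function names and example sequences are hypothetical.

```python
# Pairing used when RNA is built against a DNA template:
# RNA uses U where DNA would use T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_template(template_3_to_5: str) -> str:
    """Build the mRNA complementary to a template strand (given 3'->5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def transcribe_coding(coding_5_to_3: str) -> str:
    """Shortcut: the mRNA equals the coding strand with T replaced by U."""
    return coding_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    coding = "ATGGCC"    # hypothetical coding strand, 5'->3'
    template = "TACCGG"  # its partner strand, written 3'->5'
    print(transcribe_template(template))
    print(transcribe_coding(coding))
```

Both functions give the same mRNA for a matching pair of strands, which is why sequence databases usually list only the coding strand.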
Introduction: biology
• Splicing (eukaryotes): in the nucleus, after or concurrently with transcription
  • Regions that do not code for proteins (introns) are removed from the pre-mRNA sequence
  • Mature mRNA is produced
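Splicing can be pictured as keeping the exon intervals of the pre-mRNA and joining them. The Python sketch below assumes the exon coordinates are already known (0-based, end-exclusive); the sequence and coordinates are made up for illustration, and the sketch says nothing about how the cell recognizes intron boundaries.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exon intervals of a pre-mRNA, dropping everything else.

    Each exon is a (start, end) pair, 0-based and end-exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


if __name__ == "__main__":
    # Hypothetical pre-mRNA: exon 1, one intron, exon 2.
    exon1, intron, exon2 = "AUGGCC", "GUAAGUUUAG", "UUUUAA"
    pre_mrna = exon1 + intron + exon2
    exon_coords = [(0, 6), (16, 22)]  # assumed to be known in advance
    print(splice(pre_mrna, exon_coords))
```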
Introduction: biology
• Translation (in cytoplasm)
  • Ribosomes synthesize proteins from mRNA
  • mRNA is decoded and used as a template to guide the synthesis of a chain of amino acids that forms a protein
  • Translation: the process of converting the mRNA codon sequence into an amino acid polypeptide chain
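Converting codons into an amino acid chain can be sketched as a table lookup. The Python example below assumes the standard genetic code in its usual compact form (bases ordered U, C, A, G; `*` for stop) and makes two simplifying assumptions that are not from the slides: translation starts at the first AUG in the string and ends at the first in-frame stop codon. The function name and example sequences are illustrative.

```python
from itertools import product

# Assumption: standard genetic code, RNA bases ordered U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip(("".join(triple) for triple in product(BASES, repeat=3)), AMINO_ACIDS)
)


def translate(mrna: str) -> str:
    """Translate an mRNA into one-letter amino acid codes.

    Simplification: start at the first AUG, read non-overlapping
    codons, and stop at the first stop codon or at the end of the string.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":  # stop codon: end of the polypeptide
            break
        residues.append(residue)
    return "".join(residues)


if __name__ == "__main__":
    print(translate("AUGGCCUUUUAA"))  # hypothetical mature mRNA
```

Here AUG, GCC and UUU are read as methionine (M), alanine (A) and phenylalanine (F), and UAA ends the chain.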
Introduction: biology
• Microarrays:
  • Measure mRNA abundance for each gene
  • The amount of transcribed mRNA correlates with gene expression: the rate at which a gene produces the corresponding protein
  • It is hard to measure protein levels directly!
Introduction: biology
• Every cell* contains the complete genome of an organism
• How is the variety of different tissues encoded and expressed?
Introduction: biology
22,000?
Introduction: biology
• -ome and -omics: studying collectively all genes, proteins, …
  • Genome and genomics
  • Proteome and proteomics
  • …