Transcript Document
341: Introduction to Bioinformatics
Dr. Nataša Pržulj
Department of Computing
Imperial College London
[email protected]
Course overview
Motivation:
Flood of the available biological data:
Sequences and microarrays (Dr. Rice)
Protein 3D structure (Dr. Malod-Dognin)
Networks: e.g., of protein interactions; expected to be as useful
as the sequence data in uncovering new biology (Dr. Pržulj)
The goal of systems biology:
Systems-level understanding of biological systems, e.g. the cell
Analyze not only individual components but also their interactions,
and the system's functioning as a whole
E.g.: Learn new biology from the topology (wiring patterns) of
such interaction networks
However, biological data analysis research faces
considerable challenges
Incomplete and noisy data
Intractability of many computational (e.g., graph-theoretic)
problems
We will cover:
1. Sequence analysis (Dr. Peter Rice)
2. Microarray analysis (Dr. Peter Rice)
3. Graph theoretic aspects:
• Fundamental topics in graph theory (e.g., basic graph notation, graph representation, and special graph types)
• Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis)
• Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness), which pose challenges for analyzing biological networks
4. Protein 3D structure (Dr. Malod-Dognin)
5. Biological network aspects:
• Basic biological concepts (e.g., DNA, genes, proteins, gene expression, …)
• Different types of biological networks
• Experimental techniques for acquiring the data and their biases
• Public databases and other sources of biological network data
6. Existing approaches for analyzing and modeling biological networks:
• Structural properties of large networks
• Network models
• Network clustering
• Network alignment
• Integration of various heterogeneous networks
• Software tools for network analysis
7. Applications to data analysis: interplay of topology and biology
• Learn how the above methods have been applied
• Discuss valuable insights that have been learned into biological function, evolution, complex diseases (e.g., cancer) and drug discovery
Grading scheme:
One coursework assignment
Given out on Feb 13 by email and posted on class website
Due on Thursday, March 5, by 2pm
Written exam
The standard DoC grading scheme will be used, as
described in the Degree Regulations at
https://www.doc.ic.ac.uk/internal/teachingsupport/regulations/index.htm
Other departments: we provide the coursework and
exam marks, and each department decides on
the weighting for the final grade
Course organization:
1. Lectures
2. Tutorials
Relevant theoretical concepts and examples
Opportunity to solve problems using the methods learned in class
3. One coursework assignment
Exercises on the concepts covered in class
4. Written exam
Testing students’ understanding of the concepts learned in lectures
Tutorial helpers:
Anida Sarajlic ([email protected])
Dr. Noel Malod-Dognin ([email protected])
Vladimir Gligorijevic ([email protected])
Vuk Janjic ([email protected])
Textbooks and readings
Recommended textbooks:
Pevzner and Shamir, “Bioinformatics for Biologists,” Cambridge University
Press, 2011
Junker and Schreiber, “Analysis of Biological Networks,” Wiley, 2008
West, “Introduction to Graph Theory,” 2nd edition, Prentice Hall, 2001
or T. Cormen et al., “Introduction to Algorithms,” 3rd edition, MIT Press, 2009
A list of up-to-date research papers selected by the instructor: see
http://www.doc.ic.ac.uk/~natasha/course2012/class_material.html
Recommended readings:
F. Kepes (Author, Editor), “Biological Networks (Complex Systems and
Interdisciplinary Science),” World Scientific Publishing Company; 1st edition,
2007
Bornholdt and Schuster (Editors), “Handbook of Graphs and Networks: From
the Genome to the Internet,” Wiley, 2003
or
Dorogovtsev and Mendes (Authors), “Evolution of Networks: From Biological
Nets to the Internet and WWW (Physics),” Oxford University Press, 2003.
Chapter 17 from: Chen and Lonardi (Editors), “Biological Data Mining,”
Chapman and Hall/CRC Press, 2009
Chapter 4 from: Jurisica and Wigle (Editors), “Knowledge Discovery in
Proteomics,” CRC Press, 2005
“LEDA: A Platform for Combinatorial and Geometric Computing,” by Kurt
Mehlhorn and Stefan Näher, Cambridge University Press, 1999
When and where:
Thursdays 11-13h (LT 144) and Fridays 16-18h (LT 311)
Huxley Building
Contact:
E-mail: [email protected]
Subject: “341 Bioinformatics”
Office hours:
Fridays after class
Office: 407 C Huxley
Prerequisites: no formal ones, but
General computational/mathematical maturity
Basic programming skills are desirable
An introduction to the relevant biological concepts will be provided
Course website (curriculum, class material, etc.):
http://www.doc.ic.ac.uk/~natasha/course2012/index.html
also linked from CATE
Academic code of honor
Topics
Introduction: biology (Dr. Pržulj, 1 lecture)
Sequence analysis (Dr. Rice, 2 lectures)
Microarray analysis (Dr. Rice, 3 lectures)
Introduction to graph theory (Dr. Pržulj, 2 lectures)
Protein 3D structure (Dr. Malod-Dognin, 2 lectures)
Network biology (Dr. Pržulj, 8 lectures):
Network properties
Network/node centralities
Network motifs
Network models
Network/node clustering
Network comparison/alignment
Network data integration
Software tools for network analysis
Interplay between topology and biology
Any questions so far?
About you…
Introduction: biology
Cell: the building block of life
Cytoplasm and organelles (mitochondria, nucleus, etc.), separated by membranes
Distinguish between:
Prokaryotes
Single-celled, no cell nucleus or any other
membrane-bound organelles
• The genetic material in prokaryotes is not membrane-bound
Include the bacteria and the archaea
Model organism: E. coli
Eukaryotes
Have “true” nuclei containing their DNA
May be unicellular, as in amoebae
May be multicellular, as in plants and animals
Model organism: S. cerevisiae (baker’s yeast)
Nucleus contains DNA
Deoxyribonucleic acid
DNA nucleotides: A, T, C and G; A pairs with T, and C pairs with G
DNA structure: double helix of two complementary strands
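Because A always pairs with T and C with G, one strand fully determines the other. A minimal Python sketch of this rule, assuming sequences are plain uppercase strings over A, C, G, T; the helper names `complement` and `reverse_complement` are illustrative, not from the course material. It also assumes the standard fact (not stated on the slide) that the two strands run in opposite directions, which is why the partner strand is read reversed.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same direction)."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    The two strands of the double helix are antiparallel, so the partner
    read 5'->3' is the complement reversed.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    s = "ATGCGT"                  # made-up example sequence
    print(complement(s))          # TACGCA
    print(reverse_complement(s))  # ACGCAT
```

Applying `reverse_complement` twice returns the original strand, mirroring the fact that each strand of the helix is the template for the other.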
DNA is organized into chromosomes
RNA: similar to DNA, except that T (thymine) is replaced by U (uracil), and RNA is single-stranded
Main role of DNA: long-term storage of genetic information
Genes: DNA segments that carry this information
Intron: part of a gene that is not translated into protein; it is spliced out of the mRNA
(messenger RNA, which conveys genetic information from DNA to the ribosomes, where proteins are made)
Exon: part of a gene that is kept in the mature mRNA; the protein is encoded only by exon-derived sequence
Genome: the complete genetic material of an organism, including all of its genes
Every cell (except sex cells and
mature red blood cells) contains
the complete genome of an organism
• So how can we have different cells (neuron, liver…)?
Codons: sets of three nucleotides
4 nucleotides, so 4^3 = 64 possible codons
Each codon codes for an amino acid or signals “stop”
61 codons encode the 20 different amino acids; the remaining 3 are stop codons
More than one codon can stand for the same amino acid. Why?
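The codon arithmetic and the redundancy of the code can be checked directly. This Python sketch enumerates all 4^3 codons and counts how many map to each amino acid. The 64-letter string `AMINO` is the standard genetic code written in UCAG order (one-letter amino-acid symbols, `*` for stop); it is supplied here as background knowledge, not taken from the slides, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed with the 1st, 2nd and 3rd codon positions
# each running through U, C, A, G; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4^3 = 64 codons, generated in the same order as AMINO.
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))

counts = Counter(CODE.values())  # codons per amino acid
stops = counts.pop("*")          # remove and count the stop codons

print(len(CODONS))               # 64
print(stops)                     # 3
print(sum(counts.values()))      # 61
print(len(counts))               # 20
print(counts["L"], counts["M"])  # 6 1
```

Leucine (L) has six codons while methionine (M) has only one, so the redundancy is uneven across amino acids.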
Polypeptide:
String of amino acids, composed from a 20-character alphabet
Proteins:
Composed of one or more polypeptide chains (70-3000 amino acids)
Sequence of amino acids is defined by a gene
Gene expression: information transmission from DNA to proteins
Proteome: total set of proteins in an organism
The 20 amino acids (figure)
Levels of protein structure (figure)
Genes vs. proteins
Genes are passive; proteins are active
Protein synthesis: from genes to proteins
Transcription (in nucleus)
Splicing (eukaryotes)
Translation (in cytoplasm)
Transcription (in nucleus)
The enzyme RNA polymerase builds an RNA strand
from a gene (the DNA double helix is “unzipped”)
The gene is transcribed into messenger RNA
(mRNA)
Transcription is regulated by proteins called
transcription factors
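At the sequence level, transcription can be sketched as base pairing against the template strand, with U used in place of T. This Python sketch assumes the template strand is given 3'->5' and ignores promoters, transcription factors and where transcription starts and stops; the function names and example sequences are hypothetical.

```python
def transcribe(template: str) -> str:
    """Build the mRNA that RNA polymerase would make from a DNA template strand.

    Simplified model: the template is written 3'->5', so the mRNA (5'->3')
    is its base-by-base complement, with U (uracil) in place of T.
    """
    pair = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pair[base] for base in template.upper())


def coding_strand_to_mrna(coding: str) -> str:
    """Equivalent shortcut: the mRNA equals the coding (non-template) strand with T -> U."""
    return coding.upper().replace("T", "U")


template = "TACGCATTT"  # hypothetical template strand, written 3'->5'
coding = "ATGCGTAAA"    # its partner (coding) strand, written 5'->3'
print(transcribe(template))           # AUGCGUAAA
print(coding_strand_to_mrna(coding))  # AUGCGUAAA
```

Both routes give the same mRNA, which is why sequence databases usually store the coding strand and read the mRNA off it by swapping T for U.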
Splicing (eukaryotes): in the nucleus, during and after
transcription
Regions that do not code for proteins
(introns) are removed from the pre-mRNA
sequence
• Mature mRNA is produced
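Computationally, splicing amounts to keeping the exon intervals of the pre-mRNA, in order, and discarding the introns between them. A minimal Python sketch, assuming the exon boundaries are already known and given as hypothetical 0-based, half-open `(start, end)` coordinates; the `splice` helper and the toy sequence are illustrative only (real splice sites must themselves be recognized from the sequence).

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Remove introns by concatenating the exon intervals in order.

    `exons` holds 0-based, half-open (start, end) coordinates into pre_mrna.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical pre-mRNA: exon 1 (0-6), intron (6-12), exon 2 (12-18).
pre_mrna = "AUGGCC" + "GUAAAG" + "CUGUAA"
mature = splice(pre_mrna, [(0, 6), (12, 18)])
print(mature)  # AUGGCCCUGUAA
```

Sorting the intervals keeps the exons in their genomic order even if they are listed out of order.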
Translation (in cytoplasm)
Ribosomes synthesize proteins from mRNA
mRNA is decoded and used as a template to guide the
synthesis of a chain of amino acids that form a protein
Translation: the process of converting the mRNA codon
sequences into an amino acid polypeptide chain
Microarrays:
Measure mRNA abundance for each gene
The amount of transcribed mRNA correlates with
gene expression, i.e., the rate at which a gene produces the corresponding protein
mRNA is measured instead because it is hard to measure
protein levels directly!
Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism
How is the variety of different tissues encoded and
expressed?
22,000?
-ome and -omics: the collective study of all genes,
proteins, …
Genome and genomics
Proteome and proteomics
…