Transcript Slide 1

Current Subject
Viral Identification Using Microarrays
Introduction to Bioinformatics
Dudu Burstein
Short Biology Introduction
DNA Microarrays
Viruses
The SARS Case
Round 1: Viral Identification Using DNA Microarrays
Identification using microarray
Previous Identification Techniques

Similar gene amplification (degenerate PCR)

Antibody recognition (immunoscreening of cDNA libraries)
Drawbacks:

Limited candidates

Biased

Time consuming
The DeRisi Lab Viral Microarray

Approx. 1,000 viruses

Probes 70 nucleotides long

10 most conserved sequences of each virus

Amplification and hybridization
Objective: “create a microarray with the
capability of detecting the widest possible
range of both known and unknown viruses”
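The probe-selection idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and its arguments are hypothetical, exact substring matching stands in for a real homology search, and overlapping windows are not filtered out. Only the defaults (70-nucleotide probes, 10 per virus) come from the slide.

```python
def top_conserved_probes(reference, relatives, probe_len=70, n_probes=10):
    """Rank every probe_len window of `reference` by how many related
    genomes contain it exactly, and return the best n_probes windows.

    Exact matching is a simplification (an assumption of this sketch);
    a real pipeline would score conservation with a homology search.
    """
    scored = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        hits = sum(window in genome for genome in relatives)
        scored.append((hits, start, window))
    # Most widely shared windows first; ties broken by position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(start, window, hits) for hits, start, window in scored[:n_probes]]


# Toy sequences (made up for illustration), with short probes for readability.
reference = "AAACCCGGGTTT"
relatives = ["TTCCCGGGAA", "GGCCCGGGTT"]
best = top_conserved_probes(reference, relatives, probe_len=6, n_probes=1)
print(best)
```

Probes chosen this way favor regions shared across related viruses, which is what lets an array pick up a virus that is not itself represented on it.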
The SARS Epidemic

SARS – Severe acute respiratory syndrome

Flu-like symptoms

Nov. 2002: first case in Guangdong, China

15 Feb. 2003: spreads to Hong Kong

21 Feb.: 12 infections that will spread to
Hong Kong, Vietnam, Singapore, Ireland,
Germany and Canada
The SARS Epidemic

Cases in:
China, Hong Kong, Canada, Taiwan, Singapore,
Vietnam, USA, Philippines, Germany, Mongolia, Thailand, France,
Malaysia, Sweden, Italy, UK, India, Korea, Indonesia, South Africa,
Kuwait, Ireland, Romania, Russia, Spain, Switzerland.

Total 8,096 known cases

774 deaths

Mortality rate of 9.6%

April 2004 –
last reported case
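The mortality rate follows directly from the two counts on this slide. A quick check, with variable names chosen here for illustration:

```python
known_cases = 8096  # total known cases, from the slide
deaths = 774        # deaths, from the slide

# Case fatality rate as a percentage of known cases.
mortality_rate = deaths / known_cases * 100
print(f"{mortality_rate:.1f}%")  # prints "9.6%"
```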
The SARS Identification

March 15th – WHO issues global alert

March 22nd – samples obtained

Amplified and hybridized with microarray
(1,000 viruses, 10 probes of 70 nucleotides)

Results follow in less than 24 hours
SARS Identification
Family    Virus
Corona    IBV
Corona    IBV
Corona    Bovine corona
Corona    Human 229E
Astro     Turkey astro
Astro     Ovine astro
Astro     Avian nephritis
Astro     Human astro
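Grouping the listed probes by viral family makes the pattern easier to read. The snippet below is a minimal sketch that tallies the Family column of the table above. It assumes the rows are the probes that hybridized with the sample, which the transcript does not state outright, and the variable names are illustrative.

```python
from collections import Counter

# (family, virus) rows copied from the table on this slide.
rows = [
    ("Corona", "IBV"),
    ("Corona", "IBV"),
    ("Corona", "Bovine corona"),
    ("Corona", "Human 229E"),
    ("Astro", "Turkey astro"),
    ("Astro", "Ovine astro"),
    ("Astro", "Avian nephritis"),
    ("Astro", "Human astro"),
]

# Count how many listed probes belong to each family.
family_counts = Counter(family for family, _ in rows)
for family, count in family_counts.most_common():
    print(family, count)
```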
Summary (round 1)

Microarray of conserved sequences from approx. 1,000
viruses

Hybridization enables identification

Rapid procedure

Limited homology suffices


Sequencing based on DNA recovered from
microarray
The SARS proof
The E-Predict Algorithm
Round 2: The E-Predict Algorithm
E-Predict Algorithm Challenges

Complex hybridization pattern, still time
consuming

Human interpretation might be biased

Separate closely related species

Unanticipated cross hybridization

Statistical significance

Signal from dozens or hundreds of species when
pure samples are impossible to obtain (metagenomics)
E-Predict Algorithm Outline
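The outline figure did not survive in this transcript. Going by the later slides, which mention energy profiles, similarity ranking and a best-matching profile, the core step can be sketched as comparing the observed intensity vector with one predicted profile per virus and ranking the results. The similarity metric (cosine similarity), the function names and the toy numbers below are all assumptions of this sketch, not details taken from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_profiles(observed, energy_profiles):
    """Score the observed intensities against every profile, best match first."""
    scores = {
        name: cosine_similarity(observed, profile)
        for name, profile in energy_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profiles over four probes; the names and values are made up.
energy_profiles = {
    "virus_A": [1.0, 0.0, 0.0, 1.0],
    "virus_B": [0.0, 1.0, 1.0, 0.0],
}
observed = [0.9, 0.1, 0.0, 0.8]

ranking = rank_profiles(observed, energy_profiles)
print(ranking[0][0])  # best-matching profile
```

A ranking like this says which profile fits best, not whether the fit is meaningful, which is why a separate significance step is needed.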
Significance Estimation

Similarity ranking ≠ Probability that best
profile corresponds to virus in sample

1,009 independent, diverse microarray data sets

For every virus, most data sets do not contain it: any match is a false positive

Used as the null (H0) distribution
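One simple way to turn such a null distribution into a significance estimate is an empirical p-value: the fraction of null scores at least as high as the observed one. This is a sketch of that general technique, not necessarily the estimator E-Predict uses. The function name is hypothetical and the scores are toy values.

```python
def empirical_p_value(score, null_scores):
    """Fraction of null scores >= score, with a +1 correction so the
    estimate is never exactly zero for a finite null sample."""
    exceed = sum(1 for s in null_scores if s >= score)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy null scores standing in for one virus's similarity scores on
# arrays that do not contain that virus.
null_scores = [0.1, 0.2, 0.3, 0.4]

p_weak = empirical_p_value(0.35, null_scores)    # one null score is higher
p_strong = empirical_p_value(0.90, null_scores)  # no null score is higher
print(p_weak, p_strong)
```

With 1,009 null arrays instead of four, the smallest attainable value under this estimator would be 1/1010.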
Significance Estimation
E-Predict Results – HPV18
E-Predict Results – FluA
Serotype Discrimination



HRV (human rhinovirus) – species of the Rhinovirus genus, part
of the picornavirus family
HRV can be divided into:

HRV group A

HRV group B

HRV87 (closely related to enteroviruses)
Energy profiles of HRV89 (group A) and
HRV14 (group B)
Serotype Discrimination
Summary

Results achieved very rapidly

Minimal human interpretation: no bias

Not sensitive to noise

Handles complex hybridization pattern

Valid interfamily and intrafamily separation

Serotype separation
Possible Applications

Pathogen detection:

clinical specimens

field isolates

Monitoring food/water contamination

Characterization of microbial communities
from soil/water
Thank You