Transcript Document

Detection Algorithms for Biosurveillance: A tutorial
RODS: http://www.health.pitt.edu/rods
Auton Lab: http://www.autonlab.org
Copyright © 2002, 2003, 2004 Andrew Moore
Biosurveillance Detection Algorithms: Slide 1
The Basic Task: Analyze a time series data stream to find outbreaks without sounding too many false alarms
[Plot: Signal vs. Time]
Copyright © 2002, 2003, Andrew Moore
Biosurveillance Detection Algorithms: Slide 2
Many Methods!
Method
Time-weighted averaging
Serfling
ARIMA
SARIMA + External Factors
Univariate HMM
Kalman Filter
Recursive Least Squares
Support Vector Machine
Neural Nets
Randomization
Spatial Scan Statistics
Bayesian Networks
Contingency Tables
Scalar Outlier (SQC)
Multivariate Anomalies
Change-point statistics
FDR Tests
WSARE (Recent patterns)
PANDA (Causal Model)
FLUMOD (space/Time HMM)
Table columns: Has Pitt/CMU tried it? | Multivariate signal tracking? | Spatial?
Table entries: Yes; Tried but little used; Tried and used; Under development; Yes (w/ Howard Burkom)
[Per-method entries are not recoverable from this transcript.]
Details of these methods and bibliography available from “Summary of Biosurveillance-relevant
statistical and data mining technologies” by Moore, Cooper, Tsui and Wagner. Downloadable
(PDF format) from www.cs.cmu.edu/~awm/biosurv-methods.pdf
Biosurveillance Detection Algorithms: Slide 3
What you’ll learn about
• Noticing events in bioevent time series
• Tracking many series at once
• Detecting geographic hotspots
• Finding emerging new patterns
Biosurveillance Detection Algorithms: Slide 4
Biosurveillance Detection Algorithms: Slide 5
Biosurveillance Detection Algorithms: Slide 6
What you’ll learn about
These are all powerful statistical methods, which means they all have to have one thing in common… Boring Names.
• Noticing events in bioevent time series: Univariate Anomaly Detection
• Tracking many series at once: Multivariate Anomaly Detection
• Detecting geographic hotspots: Spatial Scan Statistics
• Finding emerging new patterns: WSARE
Biosurveillance Detection Algorithms: Slide 7
What you’ll learn about
• Noticing events in bioevent time series: Univariate Anomaly Detection
• Tracking many series at once: Multivariate Anomaly Detection
• Detecting geographic hotspots: Spatial Scan Statistics
• Finding emerging new patterns: WSARE
Biosurveillance Detection Algorithms: Slide 8
Univariate Time Series
[Plot: Signal vs. Time]
Example Signals:
• Number of ED visits today
• Number of ED visits this hour
• Number of Respiratory Cases Today
• School absenteeism today
• Nyquil Sales today
Biosurveillance Detection Algorithms: Slide 9
(When) is there an anomaly?
Biosurveillance Detection Algorithms: Slide 10
(When) is there an anomaly?
This is a time series of counts
of primary-physician visits in
data from Norfolk in December
2001. I added a fake outbreak,
starting at a certain date. Can
you guess the start date?
Biosurveillance Detection Algorithms: Slide 11
(When) is there an anomaly?
This is a time series of counts
of primary-physician visits in
data from Norfolk in December
2001. I added a fake outbreak,
starting at a certain date. Can
you guess when?
Here (much too high for a Friday)
(injected outbreak)
Biosurveillance Detection Algorithms: Slide 12
An easy case
[Plot: Signal vs. Time]
Dealt with by Statistical Quality Control
Record the mean and standard deviation up to the current time.
Signal an alarm if we go outside 3 sigmas
Biosurveillance Detection Algorithms: Slide 13
An easy case: Control Charts
[Plot: Signal vs. Time, with Mean and Upper Safe Range marked]
Dealt with by Statistical Quality Control
Record the mean and standard deviation up to the current time.
Signal an alarm if we go outside 3 sigmas
Biosurveillance Detection Algorithms: Slide 14
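The control-chart rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `min_history` warm-up period, and the sample counts are hypothetical, and the mean and standard deviation for each day are taken from the days strictly before it.

```python
from statistics import mean, stdev

def control_chart_alarms(counts, n_sigmas=3.0, min_history=7):
    """Return the indices of days that fall outside mean +/- n_sigmas * sd.

    The mean and standard deviation for day t use only days 0..t-1.
    min_history (an assumption) skips the first few days, where the
    estimates would be too noisy to trust.
    """
    alarms = []
    for t in range(min_history, len(counts)):
        history = counts[:t]
        mu, sd = mean(history), stdev(history)
        if sd > 0 and abs(counts[t] - mu) > n_sigmas * sd:
            alarms.append(t)
    return alarms

# Hypothetical daily counts; the last day is an obvious spike.
daily_counts = [20, 22, 19, 21, 23, 20, 22, 21, 19, 22, 40]
print(control_chart_alarms(daily_counts))
```

Because the baseline uses the whole history, a slow drift in the signal moves the mean only gradually, which is why this detector reacts slowly to recent changes.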
Control Charts on the Norfolk Data
Alarm Level
(injected outbreak)
Biosurveillance Detection Algorithms: Slide 15
Control Charts on the Norfolk Data
Alarm Level
(injected outbreak)
Biosurveillance Detection Algorithms: Slide 16
Control Charts on the Norfolk Data
Alarm Level
Biosurveillance Detection Algorithms: Slide 17
Looking at changes from yesterday
Biosurveillance Detection Algorithms: Slide 18
Looking at changes from yesterday
Alarm Level
Biosurveillance Detection Algorithms: Slide 19
Looking at changes from yesterday
Alarm Level
Biosurveillance Detection Algorithms: Slide 20
We need a happy medium:
Control Chart: Too insensitive to recent changes
Change from yesterday: Too sensitive to recent changes
Biosurveillance Detection Algorithms: Slide 21
Moving Average
Biosurveillance Detection Algorithms: Slide 22
Moving Average
Biosurveillance Detection Algorithms: Slide 23
Moving Average
Biosurveillance Detection Algorithms: Slide 24
Moving Average
Biosurveillance Detection Algorithms: Slide 25
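A moving-average detector sits between the two extremes: it predicts today from the mean of the last few days. The sketch below is a hypothetical illustration, not the tutorial's implementation. The function name, the one-sided upward test, and the way the spread is estimated from earlier prediction errors are all assumptions. Setting `window=1` gives the "change from yesterday" detector.

```python
from statistics import mean, stdev

def moving_average_alarms(counts, window=7, n_sigmas=3.0):
    """Flag days that exceed the mean of the previous `window` days.

    The alarm threshold is n_sigmas times the standard deviation of the
    prediction errors seen so far (an assumption made for this sketch).
    """
    residuals, alarms = [], []
    for t in range(window, len(counts)):
        predicted = mean(counts[t - window:t])
        error = counts[t] - predicted
        if len(residuals) >= 2:
            sd = stdev(residuals)
            if sd > 0 and error > n_sigmas * sd:
                alarms.append(t)
        residuals.append(error)
    return alarms

# Hypothetical daily counts; the last day is an obvious spike.
daily_counts = [20, 22, 19, 21, 23, 20, 22, 21, 19, 22, 40]
print(moving_average_alarms(daily_counts, window=3))
print(moving_average_alarms(daily_counts, window=1))  # change from yesterday
```

A longer window smooths out day-to-day noise but takes longer to adapt to a genuine shift in the baseline.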
Algorithm Performance
                                      Allowing one False Alarm   Allowing one False Alarm
Algorithm                             per TWO weeks…             per SIX weeks…
standard control chart                0.39    3.47               0.22    4.13
using yesterday                       0.14    3.83               0.1     4.7
Moving Average 3                      0.36    3.45               0.33    3.79
Moving Average 7                      0.58    2.79               0.51    3.31
Moving Average 56                     0.54    2.72               0.44    3.54
hours_of_daylight                     0.58    2.73               0.43    3.9
hours_of_daylight is_mon              0.7     2.25               0.57    3.12
hours_of_daylight is_mon ... is_tue   0.72    1.83               0.57    3.16
hours_of_daylight is_mon ... is_sat   0.77    2.11               0.59    3.26
CUSUM                                 0.45    2.03               0.15    3.55
sa-mav-1                              0.86    1.88               0.74    2.73
sa-mav-7                              0.87    1.28               0.83    1.87
sa-mav-14                             0.86    1.27               0.82    1.62
sa-regress                            0.73    1.76               0.67    2.21
Cough with denominator                0.78    2.15               0.59    2.41
Cough with MA                         0.65    2.78               0.57    3.24
Biosurveillance Detection Algorithms: Slide 26
Biosurveillance Detection Algorithms: Slide 27
Biosurveillance Detection Algorithms: Slide 28
Seasonal Effects
[Plot: Signal vs. Time]
Fit a periodic function (e.g. sine wave) to previous data. Predict today’s signal and 3-sigma confidence intervals. Signal an alarm if we’re off.
Reduces False alarms from Natural outbreaks.
Different times of year deserve different thresholds.
Biosurveillance Detection Algorithms: Slide 29
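One way to fit a periodic function is ordinary least squares on a constant plus a sine and a cosine term. The sketch below assumes a yearly period of 365 days and daily data; the helper names and the synthetic history are hypothetical, and the small linear solver is included only so that the example needs nothing beyond the standard library.

```python
import math
from statistics import stdev

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def features(t, period):
    """Basis for day t: constant, sine and cosine at the given period."""
    angle = 2 * math.pi * t / period
    return [1.0, math.sin(angle), math.cos(angle)]

def fit_seasonal(history, period=365.0):
    """Least-squares fit of a + b*sin + c*cos; also returns the residual sd."""
    rows = [features(t, period) for t in range(len(history))]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, history)) for i in range(3)]
    coef = solve_linear(ata, aty)
    fitted = [sum(c * f for c, f in zip(coef, r)) for r in rows]
    sigma = stdev([y - f for y, f in zip(history, fitted)])
    return coef, sigma

def seasonal_alarm(history, today_value, period=365.0, n_sigmas=3.0):
    """Predict the next day from the fit and test it against n_sigmas."""
    coef, sigma = fit_seasonal(history, period)
    predicted = sum(c * f for c, f in zip(coef, features(len(history), period)))
    return abs(today_value - predicted) > n_sigmas * sigma, predicted

# Synthetic two-year history: yearly sine wave plus small deterministic noise.
history = [100 + 30 * math.sin(2 * math.pi * t / 365) + ((t * 7) % 5 - 2)
           for t in range(730)]
print(seasonal_alarm(history, 100.0))
print(seasonal_alarm(history, 160.0))
```

The threshold here is the same width all year. Giving different times of year different thresholds would need a residual spread that also varies with the season, which this sketch does not attempt.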
Biosurveillance Detection Algorithms: Slide 30
Day-of-week effects
Fit a day-of-week component
E[Signal] = a + delta_day
E.g.: delta_mon = +5.42, delta_tue = +2.20, delta_wed = +3.33, delta_thu = +3.10, delta_fri = +4.02, delta_sat = -12.2, delta_sun = -23.42
A simple form of ANOVA
Biosurveillance Detection Algorithms: Slide 31
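A minimal sketch of fitting E[Signal] = a + delta_day, assuming `a` is the overall mean and each delta is that weekday's mean minus `a`. This is one simple parametrization chosen for illustration; the example deltas quoted on the slide may come from a different one. The function names and sample counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_day_of_week(counts, weekdays):
    """Return (a, deltas) with a = overall mean, deltas[day] = day mean - a."""
    a = mean(counts)
    by_day = defaultdict(list)
    for count, day in zip(counts, weekdays):
        by_day[day].append(count)
    deltas = {day: mean(values) - a for day, values in by_day.items()}
    return a, deltas

def expected_signal(a, deltas, weekday):
    """Model prediction for a given weekday."""
    return a + deltas[weekday]

# Two hypothetical weeks with a weekend dip.
week = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
weekdays = week * 2
counts = [105, 102, 103, 103, 104, 88, 77] * 2
a, deltas = fit_day_of_week(counts, weekdays)
print(round(a, 2), {day: round(d, 2) for day, d in deltas.items()})
```

An alarm rule can then compare each day's count with `expected_signal(a, deltas, weekday)` instead of a single global mean.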
Regression using Hours-in-day & IsMonday
Biosurveillance Detection Algorithms: Slide 32
Regression using Hours-in-day & IsMonday
Biosurveillance Detection Algorithms: Slide 33
Biosurveillance Detection Algorithms: Slide 34
Regression using Mon-Tue
Biosurveillance Detection Algorithms: Slide 35
Biosurveillance Detection Algorithms: Slide 36
CUSUM
• CUmulative SUM Statistics
• Keep a running sum of “surprises”: a sum of excesses each day over the prediction
• When this sum exceeds threshold, signal alarm and reset sum
Biosurveillance Detection Algorithms: Slide 37
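The running-sum rule can be sketched as below. Two details are assumptions borrowed from the common one-sided CUSUM formulation rather than from the slide: the sum is clipped at zero so that quiet days cannot build up a negative balance, and an optional `slack` is subtracted from each day's excess. The function name and sample data are hypothetical.

```python
def cusum_alarms(counts, predictions, threshold, slack=0.0):
    """Return the days on which the running sum of excesses crosses threshold.

    excess = observed - predicted - slack; the sum is clipped at zero and
    reset to zero after each alarm.
    """
    total, alarms = 0.0, []
    for t, (observed, predicted) in enumerate(zip(counts, predictions)):
        total = max(0.0, total + (observed - predicted - slack))
        if total > threshold:
            alarms.append(t)
            total = 0.0
    return alarms

# Hypothetical data: a flat prediction of 20 and a slow three-day ramp.
counts = [20, 21, 20, 23, 24, 25, 20]
predictions = [20] * len(counts)
print(cusum_alarms(counts, predictions, threshold=8, slack=1))
```

No single day in the ramp is far above the prediction, but the accumulated excess crosses the threshold on the third ramp day, which is the kind of gradual outbreak CUSUM is meant to catch.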
CUSUM
Biosurveillance Detection Algorithms: Slide 38
CUSUM
Biosurveillance Detection Algorithms: Slide 39
Biosurveillance Detection Algorithms: Slide 40
The Sickness/Availability Model
Counts = sickness * availability
Sickness = counts / availability (plot this)
Sick people may seek care more often on certain days due to availability of medical services or time in their schedules, so adjust for that phenomenon
Biosurveillance Detection Algorithms: Slide 41
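A hedged sketch of the decomposition follows. The transcript does not say how availability is estimated, so this example assumes a per-weekday factor equal to that weekday's mean count divided by the overall mean. The function names and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def estimate_availability(counts, weekdays):
    """Assumed estimator: weekday mean count divided by the overall mean."""
    overall = mean(counts)
    by_day = defaultdict(list)
    for count, day in zip(counts, weekdays):
        by_day[day].append(count)
    return {day: mean(values) / overall for day, values in by_day.items()}

def sickness_series(counts, weekdays, availability):
    """Sickness = counts / availability, day by day."""
    return [count / availability[day] for count, day in zip(counts, weekdays)]

# Three hypothetical weeks with identical weekday structure and no outbreak.
week = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]
weekdays = week * 3
counts = [120, 100, 100, 100, 100, 60, 40] * 3
availability = estimate_availability(counts, weekdays)
sickness = sickness_series(counts, weekdays, availability)
print([round(s, 2) for s in sickness[:7]])
```

With the weekday pattern divided out, the sickness series is flat in this example, so a detector such as a moving average can be run on it without the weekend dips triggering or masking alarms.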
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 42
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 43
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 44
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 45
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 46
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 47
The Sickness/Availability Model
Biosurveillance Detection Algorithms: Slide 48
Biosurveillance Detection Algorithms: Slide 49
Biosurveillance Detection Algorithms: Slide 50
Exploiting Denominator Data
Normalize (divide) by total visits
Biosurveillance Detection Algorithms: Slide 51
Exploiting Denominator Data
Biosurveillance Detection Algorithms: Slide 52
Exploiting Denominator Data
Biosurveillance Detection Algorithms: Slide 53
Exploiting Denominator Data and Smoothing
Biosurveillance Detection Algorithms: Slide 54
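Normalizing by the denominator and then smoothing can be sketched as follows. The function names, the trailing-window form of the smoothing, and the sample numbers are assumptions made for illustration.

```python
def normalized_rate(cases, total_visits):
    """Divide each day's case count by that day's total visits."""
    return [c / n if n > 0 else 0.0 for c, n in zip(cases, total_visits)]

def trailing_mean(values, window=7):
    """Mean of the current value and up to window-1 preceding values."""
    smoothed = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1): t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical cough cases and total visits for five days.
cough_cases = [5, 10, 6, 4, 12]
total_visits = [100, 200, 120, 80, 150]
rates = normalized_rate(cough_cases, total_visits)
print(rates)
print(trailing_mean(rates, window=3))
```

In this made-up data the raw count doubles on the second day, but the rate stays at 5% because total visits doubled too. The rise on the last day survives normalization.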
Biosurveillance Detection Algorithms: Slide 55
Other state-of-the-art methods
• Wavelets
• Change-point detection
• Kalman filters
• Hidden Markov Models
Biosurveillance Detection Algorithms: Slide 56
What you’ll learn about
• Noticing events in bioevent time series: Univariate Anomaly Detection
• Tracking many series at once: Multivariate Anomaly Detection
• Detecting geographic hotspots: Spatial Scan Statistics
• Finding emerging new patterns: WSARE
Biosurveillance Detection Algorithms: Slide 57
Multiple Signals
Biosurveillance Detection Algorithms: Slide 58
Multivariate Signals
(relevant to inhalational diseases)
[Plot: daily sales (0 to 2000) vs. date (7/1/99 to 1/1/01); series: cough.syr.liq.dec, tabs.caps, throat.cough, nasal]
Biosurveillance Detection Algorithms: Slide 59
Multi Source Signals
Footprint of Influenza in Routinely Collected Data
[Plot: series Lab, Flu, WebMD, School, Cough & Cold, Cough Syrup, Throat, Resp, Viral, Death; x-axis: weeks]
Biosurveillance Detection Algorithms: Slide 60
What if you’ve got multiple signals?
[Plot: Signal vs. Time. Red: Cough Sales; Blue: ED Respiratory Visits]
Idea One: Simply treat it as two separate alarm-from-signal problems.
…Question: why might that not be the best we can do?
Biosurveillance Detection Algorithms: Slide 61
Another View
[Plots: Signal vs. Time (Red: Cough Sales; Blue: ED Respiratory Visits), and Cough Sales vs. ED Respiratory Visits]
Question: why might that not be the best we can do?
Biosurveillance Detection Algorithms: Slide 62
Another View
[Plots: Signal vs. Time (Red: Cough Sales; Blue: ED Respiratory Visits), and Cough Sales vs. ED Respiratory Visits]
This should be an anomaly
Question: why might that not be the best we can do?
Biosurveillance Detection Algorithms: Slide 63
N-dimensional Gaussian
[Plots: Signal vs. Time (Red: Cough Sales; Blue: ED Respiratory Visits), and Cough Sales vs. ED Respiratory Visits with One Sigma and 2 Sigma contours]
Good Practical Idea: Model the joint with a Gaussian
This is a sensible N-dimensional SQC
…But you can also do N-dimensional modeling of dynamics (leads to the idea of Kalman Filter model)
Biosurveillance Detection Algorithms: Slide 64
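Modeling the joint with a Gaussian amounts to measuring how many "sigmas" a point lies from the mean along the fitted ellipse, i.e. its Mahalanobis distance. The two-signal sketch below uses hypothetical function names and made-up data, and writes the 2x2 matrix inverse out by hand to stay within the standard library.

```python
from statistics import mean

def fit_gaussian_2d(xs, ys):
    """Return the mean vector and the (sxx, sxy, syy) sample covariance."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, centre, cov):
    """Distance of point from centre in units of the fitted Gaussian's sigma."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

# Hypothetical history: ED respiratory visits and cough sales move together.
ed_visits = [10, 12, 14, 16, 18, 20, 22, 24]
cough_sales = [105, 118, 142, 158, 183, 197, 224, 238]
centre, cov = fit_gaussian_2d(ed_visits, cough_sales)

print(mahalanobis((15, 150), centre, cov))  # on the usual trend
print(mahalanobis((22, 110), centre, cov))  # each value is ordinary, the pair is not
```

The second point is the case that two separate univariate detectors would miss: 22 visits and 110 sales are each within the historical range, but they have never occurred together.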
What you’ll learn about
• Noticing events in bioevent time series: Univariate Anomaly Detection
• Tracking many series at once: Multivariate Anomaly Detection
• Detecting geographic hotspots: Spatial Scan Statistics
• Finding emerging new patterns: WSARE
Biosurveillance Detection Algorithms: Slide 65
One Step of Spatial Scan
Entire area being scanned
(Philadelphia Metro)
Biosurveillance Detection Algorithms: Slide 66
One Step of Spatial Scan
Entire area being scanned
Current region being considered
Biosurveillance Detection Algorithms: Slide 67
One Step of Spatial Scan
Entire area being scanned
Current region being considered
I have a population of 5300 of whom 53 are sick (1%)
Everywhere else has a population of 2,200,000 of whom 20,000 are sick (0.9%)
Biosurveillance Detection Algorithms: Slide 68
One Step of Spatial Scan
Entire area being scanned
Current region being considered
I have a population of 5300 of whom 53 are sick (1%)
Everywhere else has a population of 2,200,000 of whom 20,000 are sick (0.9%)
So... is that a big deal? Evaluated with Score function (e.g. Kulldorff’s score)
Biosurveillance Detection Algorithms: Slide 69
One Step of Spatial Scan
Entire area being scanned
Current region being considered [Score = 1.4]
I have a population of 5300 of whom 53 are sick (1%)
Everywhere else has a population of 2,200,000 of whom 20,000 are sick (0.9%)
So... is that a big deal? Evaluated with Score function (e.g. Kulldorff’s score)
Biosurveillance Detection Algorithms: Slide 70
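As an illustration of a score function, the sketch below computes one common form of Kulldorff's statistic: the Poisson log-likelihood ratio for an elevated rate inside the region. Whether this is the exact score behind the value shown on the slide is not stated in the transcript, so the number it prints should not be expected to match. The function name and argument layout are hypothetical.

```python
import math

def region_score(cases_in, pop_in, cases_out, pop_out):
    """Poisson log-likelihood ratio for a higher rate inside than outside.

    Returns 0.0 when the region's count is not above what the overall
    rate would predict.
    """
    cases_total = cases_in + cases_out
    pop_total = pop_in + pop_out
    expected_in = cases_total * pop_in / pop_total
    expected_out = cases_total - expected_in
    if cases_in <= expected_in:
        return 0.0
    score = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        score += cases_out * math.log(cases_out / expected_out)
    return score

# Figures from the slide: 53 of 5300 sick inside, 20,000 of 2,200,000 outside.
score = region_score(53, 5300, 20000, 2200000)
print(score)
# A hypothetical denser region of the same size scores higher.
print(region_score(106, 5300, 20000, 2200000))
```

The score grows with both the size of the excess and the amount of data supporting it, so a 1% versus 0.9% difference in a region of 5300 people earns only a small score.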
Many Steps of Spatial Scan
Entire area being scanned
Highest scoring region in search so far [Score = 9.3]
Current region being considered [Score = 1.4]
I have a population of 5300 of whom 53 are sick (1%)
Everywhere else has a population of 2,200,000 of whom 20,000 are sick (0.9%)
So... is that a big deal? Evaluated with Score function (e.g. Kulldorff’s score)
Biosurveillance Detection Algorithms: Slide 71
Scan Statistics
Standard scan statistic question:
Given the geographical locations of occurrences of a phenomenon, is there a region with an unusually high (low) rate of these occurrences?
Standard approach:
1. Compute the likelihood of the data given the hypothesis that the rate of occurrence is uniform everywhere, L0
2. For some geographical region, W, compute the likelihood that the rate of occurrence is uniform at one level inside the region and uniform at another level outside the region, L(W).
3. Compute the likelihood ratio, L(W)/L0
4. Repeat for all regions, and find the largest likelihood ratio. This is the scan statistic, S*W
5. Report the region, W, which yielded the max, S*W
See [Glaz and Balakrishnan, 99] for details
Biosurveillance Detection Algorithms: Slide 72
Significance testing
Given that region W is the most likely to be abnormal, is it significantly abnormal?
Standard approach:
1. Generate many randomized versions of the data set by shuffling the labels (positive instance of the phenomenon or not).
2. Compute S*W for each randomized data set. This forms a baseline distribution for S*W if the null hypothesis holds.
3. Compare the observed value of S*W against the baseline distribution to determine a p-value.
Biosurveillance Detection Algorithms: Slide 73
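The scan and its randomization test can be sketched together on a small grid. Everything here is a simplified assumption for illustration: regions are axis-aligned rectangles of grid cells searched naively, the score is a Poisson log-likelihood ratio for an elevated rate, and shuffling the labels is implemented by drawing the same number of cases at random from the population. The function names and the 4x4 data are hypothetical.

```python
import math
import random

def llr(c_in, n_in, c_all, n_all):
    """Log-likelihood ratio for a higher rate inside the region."""
    expected_in = c_all * n_in / n_all
    if c_in <= expected_in:
        return 0.0
    score = c_in * math.log(c_in / expected_in)
    c_out = c_all - c_in
    if c_out > 0:
        score += c_out * math.log(c_out / (c_all - expected_in))
    return score

def best_rectangle(cases, pop):
    """Naive scan: score every rectangle (r0, r1, c0, c1), keep the best."""
    rows, cols = len(cases), len(cases[0])
    c_all, n_all = sum(map(sum, cases)), sum(map(sum, pop))
    best_score, best_region = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(r, c) for r in range(r0, r1 + 1)
                             for c in range(c0, c1 + 1)]
                    n_in = sum(pop[r][c] for r, c in cells)
                    if n_in == n_all:
                        continue  # the whole map is not a candidate region
                    c_in = sum(cases[r][c] for r, c in cells)
                    score = llr(c_in, n_in, c_all, n_all)
                    if score > best_score:
                        best_score, best_region = score, (r0, r1, c0, c1)
    return best_score, best_region

def scan_p_value(cases, pop, n_replicas=199, seed=0):
    """Compare the observed maximum score with maxima from shuffled labels."""
    rng = random.Random(seed)
    observed, region = best_rectangle(cases, pop)
    rows, cols = len(pop), len(pop[0])
    people = [(r, c) for r in range(rows) for c in range(cols)
              for _ in range(pop[r][c])]
    n_cases = sum(map(sum, cases))
    at_least_as_high = 0
    for _ in range(n_replicas):
        fake = [[0] * cols for _ in range(rows)]
        for r, c in rng.sample(people, n_cases):
            fake[r][c] += 1
        if best_rectangle(fake, pop)[0] >= observed:
            at_least_as_high += 1
    return region, observed, (at_least_as_high + 1) / (n_replicas + 1)

# Hypothetical 4x4 map: 100 people per cell, one case per cell, one hotspot.
pop = [[100] * 4 for _ in range(4)]
cases = [[1] * 4 for _ in range(4)]
cases[1][2] = 12
region, observed, p_value = scan_p_value(cases, pop)
print(region, round(observed, 2), p_value)
```

Each replica repeats the full search over regions, so the cost of the test is the cost of one scan multiplied by the number of replicas.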
Fast squares speedup
[Figure: N × N grid]
• Theoretical complexity of fast squares: O(N^2) (as opposed to naïve N^3), if maximum density region sufficiently dense.
  If not, we can use several other speedup tricks.
• In practice: 10-200x speedups on real and artificially generated datasets.
  Emergency Dept. dataset (600K records): 20 minutes, versus 66 hours with naïve approach.
Biosurveillance Detection Algorithms: Slide 74
Fast rectangles speedup
[Figure: N × N grid]
• Theoretical complexity of fast rectangles: O(N^2 log N) (as opposed to naïve N^4)
Biosurveillance Detection Algorithms: Slide 75
Fast oriented rectangles speedup
[Figure: N × N grid]
• Theoretical complexity of fast rectangles: 18 N^2 log N (as opposed to naïve 18 N^4)
  (Angles discretized to 5 degree buckets)
Biosurveillance Detection Algorithms: Slide 76
Why the Scan Statistic speed obsession?
• Traditional Scan Statistics very expensive, especially with Randomization tests
Biosurveillance Detection Algorithms: Slide 77
Rectangular SS on Electrolyte Sales
Biosurveillance Detection Algorithms: Slide 78
Rectangular SS on Cough/cold Sales
Biosurveillance Detection Algorithms: Slide 79
Proposed new WSARE/Scan Statistic hybrid
This is the strangest region because the age distribution of respiratory cases has changed dramatically for no reason that can be explained by known background changes
Biosurveillance Detection Algorithms: Slide 80
What you’ll learn about
• Noticing events in bioevent time series: Univariate Anomaly Detection
• Tracking many series at once: Multivariate Anomaly Detection
• Detecting geographic hotspots: Spatial Scan Statistics
• Finding emerging new patterns: WSARE
Biosurveillance Detection Algorithms: Slide 81
A Limitation of Univariate Analysis
REPRESENTATIVE SURVEILLANCE DATA:
Date    Time  Hospital  ICD9  Prodrome  Gender  Age  Home Location  Many more…
6/1/03  9:12  1         781   Fever     M       20s  NE             …
6/1/03  9:45  1         787   Diarrhea  F       40s  SE             …
:       :     :         :     :         :       :    :              :
Standard Approach
• Select in advance which subpopulations to monitor (e.g., each county, zip)
• Do not pay close attention to effect of multiple testing
WSARE Approach
• Monitor hundreds of thousands of subpopulations
• Pay close attention to effect of multiple testing
Biosurveillance Detection Algorithms: Slide 82
WSARE v2.0
• What’s Strange About Recent Events?
• Designed to be easily applicable to any date/time-indexed biosurveillance-relevant data stream.
Biosurveillance Detection Algorithms: Slide 83
WSARE v2.0
• Inputs:
1. Date/time-indexed biosurveillance-relevant data stream
2. Time Window Length
3. Which attributes to use?
Biosurveillance Detection Algorithms: Slide 84
WSARE v2.0
• Inputs:
1. Date/time-indexed biosurveillance-relevant data stream
2. Time Window Length (example: “last 24 hours”)
3. Which attributes to use? (example: “ignore key and weather”)
                                                                       Home                                   Work
Primary Key  Date   Time   Hospital    ICD9  Prodrome     Gender  Age  Large Scale  Medium Scale  Fine Scale  Large Scale  Medium Scale  Fine Scale  Recent Flu Levels  Recent Weather  (Many more…)
h6r32        6/2/2  14:12  Downtown    781   Fever        M       20s  NE           15217         A5          NW           15213         B8          2%                 70R             …
t3q15        6/2/2  14:15  Riverside   717   Respiratory  M       60s  NE           15222         J3          NE           15222         J3          2%                 70R             …
t5hh5        6/2/2  14:15  Smithfield  622   Respiratory  F       80s  SE           15210         K9          SE           15210         K9          2%                 70R             …
:            :      :      :           :     :            :       :    :            :             :           :            :             :           :                  :               :
Biosurveillance Detection Algorithms: Slide 85
WSARE v2.0
• Inputs:
1. Date/time-indexed biosurveillance-relevant data stream
2. Time Window Length
3. Which attributes to use?
• Outputs:
1. Here are the records that most surprise me
2. Here’s why
3. And here’s how seriously you should take it
Biosurveillance Detection Algorithms: Slide 86
WSARE v2.0
• Given 500 days’ worth of ER cases at 15 hospitals…
Date            Cases
Thu 5/22/2000   C1, C2, C3, C4 …
Fri 5/23/2000   C1, C2, C3, C4 …
:               :
Sat 12/9/2000   C1, C2, C3, C4 …
Sun 12/10/2000  C1, C2, C3, C4 …
:               :
Sat 12/16/2000  C1, C2, C3, C4 …
:               :
Sat 12/23/2000  C1, C2, C3, C4 …
:               :
Fri 9/14/2001   C1, C2, C3, C4 …
Biosurveillance Detection Algorithms: Slide 87
WSARE v2.0
• Given 500 days’ worth of ER cases at 15 hospitals…
• For each day…
• Take today’s cases
Biosurveillance Detection Algorithms: Slide 88
WSARE v2.0
• Given 500 days’ worth of ER cases at 15 hospitals…
• For each day…
• Take today’s cases
• The cases one week ago
• The cases two weeks ago
Biosurveillance Detection Algorithms: Slide 89
WSARE v2.0
• Given 500 days’ worth of ER cases at 15 hospitals…
• For each day…
• Take today’s cases
• The cases one week ago
• The cases two weeks ago
• Ask: “What’s different about today?”
DATE_ADMITTED  ICD9    PRODROME  GENDER  place2  …
12/9/00        786.05  3         F       s-e     …
12/9/00        789     1         F       s-e     …
12/9/00        789     1         M       n-w     …
12/9/00        786.05  3         M       s-e     …
:              :       :         :       :       …
12/16/00       787.02  2         M       n-e     …
12/16/00       782.1   4         F       s-w     …
12/16/00       789     1         M       s-e     …
12/16/00       786.09  3         M       n-w     …
12/23/00       789.09  1         M       s-w     …
12/23/00       789.09  1         F       s-w     …
12/23/00       782.1   4         M       n-w     …
:              :       :         :       :       …
12/23/00       786.09  3         M       s-e     …
12/23/00       786.09  3         M       s-e     …
12/23/00       780.9   2         F       n-w     …
12/23/00       V40.9   7         M       s-w     …
Biosurveillance Detection Algorithms: Slide 90
WSARE v2.0
• Given 500 days’ worth of ER cases at 15 hospitals…
• For each day…
• Take today’s cases
• The cases one week ago
• The cases two weeks ago
• Ask: “What’s different about today?”
Fields we use: Date, Time of Day, Prodrome, ICD9, Symptoms, Age, Gender, Coarse Location, Fine Location, ICD9 Derived Features, Census Block Derived Features, Work Details, Colocation Details
Biosurveillance Detection Algorithms: Slide 91
Example of Output
Sat 12-23-2001 (daynum 36882, dayindex 239)
35.8% ( 48/134) of today's cases have 30 <= age < 40
17.0% ( 45/265) of other cases have 30 <= age < 40
Biosurveillance Detection Algorithms: Slide 92
Example of Output
Sat 12-23-2001 (daynum 36882, dayindex 239)
FISHER_PVALUE = 0.000051
35.8% ( 48/134) of today's cases have 30 <= age < 40
17.0% ( 45/265) of other cases have 30 <= age < 40
Biosurveillance Detection Algorithms: Slide 93
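The FISHER_PVALUE in the output comes from a Fisher exact test on a 2x2 table: today's cases versus other cases, inside versus outside the age band. The sketch below is a plain standard-library version written for illustration. It assumes the common two-sided definition (sum the probabilities of all tables no more likely than the observed one), and the transcript does not say which variant WSARE uses, so the printed value may not reproduce the slide's figure exactly.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k matching records in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Counts from the example output: 48 of 134 today, 45 of 265 other cases.
p_value = fisher_exact_two_sided(48, 134 - 48, 45, 265 - 45)
print(p_value)
```

A p-value this small looks decisive for a single, pre-chosen rule, but it does not account for how many rules were tried before this one was picked.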
Searching for the best score…
• Try ICD9 = x for each value of x
• Try Gender=M, Gender=F
• Try CoarseRegion=NE, =NW, SE, SW..
• Try FineRegion=AA,AB,AC, … DD (4x4 Grid)
• Try Hospital=x, TimeofDay=x, Prodrome=X, …
• [In future… features of census blocks]
Biosurveillance Detection Algorithms: Slide 94
Corrected P value
Sat 12-23-2001 (daynum 36882, dayindex 239)
FISHER_PVALUE = 0.000051 RANDOMIZATION_PVALUE = 0.031
35.8% ( 48/134) of today's cases have 30 <= age < 40
17.0% ( 45/265) of other cases have 30 <= age < 40
Biosurveillance Detection Algorithms: Slide 95
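The gap between the two p-values comes from searching over many rules. A hedged sketch of the idea follows: score every single-attribute rule with a Fisher exact test, keep the best, then repeat the whole search on data whose today/other labels have been shuffled and see how often chance alone does as well. The function names, the record layout (one dict per case), and the data are hypothetical, and this is a simplification rather than the WSARE implementation.

```python
import random
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    probs = [comb(row1, k) * comb(row2, col1 - k) / denom
             for k in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_observed = comb(row1, a) * comb(row2, c) / denom
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))

def best_rule(today, baseline, attributes):
    """Try every attribute=value rule; return (lowest p-value, rule)."""
    best_p, best = 1.0, None
    for attr in attributes:
        for value in sorted({r[attr] for r in today + baseline}):
            a = sum(r[attr] == value for r in today)
            c = sum(r[attr] == value for r in baseline)
            p = fisher_p(a, len(today) - a, c, len(baseline) - c)
            if p < best_p:
                best_p, best = p, (attr, value)
    return best_p, best

def randomization_p_value(today, baseline, attributes, n_replicas=99, seed=0):
    """Fraction of label-shuffled data sets whose best rule scores as well."""
    rng = random.Random(seed)
    observed_p, rule = best_rule(today, baseline, attributes)
    pooled = today + baseline
    at_least_as_good = 0
    for _ in range(n_replicas):
        rng.shuffle(pooled)
        fake_today, fake_baseline = pooled[:len(today)], pooled[len(today):]
        if best_rule(fake_today, fake_baseline, attributes)[0] <= observed_p:
            at_least_as_good += 1
    return rule, observed_p, (at_least_as_good + 1) / (n_replicas + 1)

def make_records(ages):
    """Build hypothetical case records with alternating gender."""
    return [{"age": age, "gender": "MF"[i % 2]} for i, age in enumerate(ages)]

# Hypothetical data: today has an unusual share of patients in their 30s.
today = make_records(["30s"] * 20 + ["50s"] * 10 + ["70s"] * 10)
baseline = make_records(["30s"] * 10 + ["50s"] * 35 + ["70s"] * 35)
rule, raw_p, corrected_p = randomization_p_value(today, baseline, ["age", "gender"])
print(rule, raw_p, corrected_p)
```

The corrected value can never fall below 1 / (n_replicas + 1), so the number of replicas sets the resolution of the test.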
Biosurveillance Detection Algorithms: Slide 96
WSARE v2.0
• Inputs:
1. Date/time-indexed biosurveillance-relevant data stream
2. Time Window Length
3. Which attributes to use?
• Outputs:
1. Here are the records that most surprise me
2. Here’s why
3. And here’s how seriously you should take it
[Record table as on Slide 96]
Callout: Normally, 8% of cases in the East are over-50s with respiratory problems. But today it’s been 15%.
Callout: Don’t be too impressed! Taking into account all the patterns I’ve been searching over, there’s a 20% chance I’d have found a rule this dramatic just by chance.
Biosurveillance Detection Algorithms: Slide 97
WSARE on recent Utah Data
Saturday June 1st in Utah:
The most surprising thing about recent records is:
Normally:
0.8% of records (50/6205) have time before 2pm and prodrome = Hemorrhagic
But recently:
2.1% of records (19/907) have time before 2pm and prodrome = Hemorrhagic
Pvalue = 0.0484042
Which means that in a world where nothing changes we'd expect to have a result this significant about once every 20 times we ran the program
Biosurveillance Detection Algorithms: Slide 98
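The "once every 20 times" reading can be checked with a little arithmetic, assuming each run of the program is an independent test in a world where nothing changes. The two rates and the reciprocal of the reported p-value work out as:

```latex
\[
\frac{50}{6205} \approx 0.8\%, \qquad
\frac{19}{907} \approx 2.1\%, \qquad
\frac{1}{p} = \frac{1}{0.0484042} \approx 20.7
\]
```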
WSARE 3.0
• “Taking into account recent flu levels…”
• “Taking into account that today is a public holiday…”
• “Taking into account that this is Spring…”
• “Taking into account recent heatwave…”
• “Taking into account that there’s a known natural food-borne outbreak in progress…”
Bonus: More efficient use of historical data
Biosurveillance Detection Algorithms: Slide 99
Idea: Bayesian Networks
“Patients from West Park Hospital are less likely to be young”
“On Cold Tuesday Mornings the folks coming in from the North part of the city are more likely to have respiratory problems”
“The Viral prodrome is more likely to co-occur with a Rash prodrome than Botulinic”
“On the day after a major holiday, expect a boost in the morning followed by a lull in the afternoon”
Biosurveillance Detection Algorithms: Slide 100
WSARE 3.0
All historical data
Biosurveillance Detection Algorithms: Slide 101
WSARE 3.0
All historical data
Today’s Environment
What should be happening today?
Biosurveillance Detection Algorithms: Slide 103
WSARE 3.0
All historical data
Today’s Environment
What should be happening today?
Today’s Cases
What’s strange about today, considering its environment?
Biosurveillance Detection Algorithms: Slide 104
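To make "What should be happening today?" concrete, here is a heavily simplified Python sketch. It stands in for the Bayesian network with the smallest possible one: a single environment variable (season) pointing at a single case attribute (prodrome), with the conditional probability table estimated by smoothed counting over historical data. The variable names, the smoothing constant, and the toy data are all hypothetical; the real system learns a much richer network.

```python
from collections import Counter, defaultdict


def learn_cpt(history, env_key, attr_key, alpha=1.0):
    """Estimate P(attr | env) from history with add-alpha smoothing."""
    counts = defaultdict(Counter)
    attr_values = set()
    for rec in history:
        counts[rec[env_key]][rec[attr_key]] += 1
        attr_values.add(rec[attr_key])
    cpt = {}
    for env, ctr in counts.items():
        total = sum(ctr.values()) + alpha * len(attr_values)
        cpt[env] = {v: (ctr[v] + alpha) / total for v in attr_values}
    return cpt


def expected_vs_observed(cpt, todays_env, todays_cases, attr_key):
    """Map each attribute value to (expected count, observed count) for today."""
    observed = Counter(rec[attr_key] for rec in todays_cases)
    n = len(todays_cases)
    return {v: (p * n, observed[v]) for v, p in cpt[todays_env].items()}


# Hypothetical history: respiratory cases are common in winter, rare in summer.
history = (
    [{"season": "winter", "prodrome": "respiratory"}] * 60
    + [{"season": "winter", "prodrome": "gi"}] * 40
    + [{"season": "summer", "prodrome": "respiratory"}] * 20
    + [{"season": "summer", "prodrome": "gi"}] * 80
)
# Hypothetical today: a winter day with 6 respiratory cases out of 10.
todays_cases = [{"prodrome": "respiratory"}] * 6 + [{"prodrome": "gi"}] * 4

cpt = learn_cpt(history, "season", "prodrome")
report = expected_vs_observed(cpt, "winter", todays_cases, "prodrome")
for value, (expected, seen) in sorted(report.items()):
    print(f"{value}: expected {expected:.1f}, observed {seen}")
```

In this toy case, 6 respiratory cases out of 10 would look high against the all-year rate (80 of 200), but it is close to what the model expects for a winter day, so nothing is flagged. That is the sense of "considering its environment".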
WSARE 3.0
All historical data
Today’s Environment
What should be happening today?
Today’s Cases
What’s strange about today, considering its environment?
And how big a deal is this, considering how much search I’ve done?
Biosurveillance Detection Algorithms: Slide 105
WSARE 3.0
All historical data
Today’s Environment
Today’s Cases
What should be happening today?
What’s strange about today, considering its environment?
And how big a deal is this, considering how much search I’ve done?
[Diagram labels: Cheap, Expensive]
Biosurveillance Detection Algorithms: Slide 106
WSARE 3.0
All historical data
Today’s Environment
Today’s Cases
What should be happening today?
What’s strange about today, considering its environment?
And how big a deal is this, considering how much search I’ve done?
[Diagram labels: Cheap, Expensive]
• All-dimensions Trees
• Racing Randomization
• Differential Randomization
• RADSEARCH
Biosurveillance Detection Algorithms: Slide 107
Results on Simulation
[Figure: simulation results; legend: Standard, WSARE2.0, WSARE2.5, WSARE3.0]
Biosurveillance Detection Algorithms: Slide 108
BARD (Bayesian Aerosol Release Detector)
Key Points
• Goal: detect aerosol release of B. anthracis spores
• Automates the analysis done by Meselson et al.
• Alarms when an increase in disease activity is spatially and temporally consistent with aerosol anthrax
• Makes use of inverted atmospheric dispersion model and meteorological data
• In preliminary evaluation, no false positives in 6.5 months
- By simply analyzing existing surveillance data more thoroughly (without additional data collection), BARD has the potential to improve the earliness and specificity of detection
[Figure: Meselson et al., 1994, Science]
More info: BARD Tech report
Biosurveillance Detection Algorithms: Slide 109
For further info
• Papers on these and other anti-terror applications: www.cs.cmu.edu/~awm/antiterror
• Papers on scaling up many of these analysis methods: www.cs.cmu.edu/~awm/papers.html
• Software implementing the above: www.autonlab.org
• Copies of 18 lectures on 25 statistical data mining topics: www.cs.cmu.edu/~awm/781
• CD-ROM, powerpoint-synchronized video/audio recordings of the above lectures: [email protected]
Lecture topics:
Information Gain, Decision Trees
Probabilistic Reasoning, Bayes Classifiers, Density Estimation
Probability Densities in Data Mining
Gaussians in Data Mining
Maximum Likelihood Estimation
Gaussian Bayes Classifiers
Regression, Neural Nets
Overfitting: detection and avoidance
The many approaches to cross-validation
Locally Weighted Learning
Bayes Net, Bayes Net Structure Learning, Anomaly Detection
Andrew's Top 8 Favorite Regression Algorithms (Regression Trees, Cascade Correlation, Group Method Data Handling (GMDH), Multivariate Adaptive Regression Splines (MARS), Multilinear Interpolation, Radial Basis Functions, Robust Regression, Cascade Correlation + Projection Pursuit)
Clustering, Mixture Models, Model Selection
K-means clustering and hierarchical clustering
Vapnik-Chervonenkis (VC) Dimensionality and Structural Risk Minimization
PAC Learning
Support Vector Machines
Time Series Analysis with Hidden Markov Models
Biosurveillance Detection Algorithms: Slide 110
References
1. WSARE 3.0: Bayesian Network based Anomaly Pattern Detection. Wong, Moore, Cooper and Wagner [ICML/KDD 2003]
2. Fast Grid Based Computation of Spatial Scan Statistics. Neill and Moore [NIPS 2003]
These and other Biosurveillance algorithms papers and free software available from http://www.autonlab.org/
See also: http://www.health.pitt.edu/rods
Biosurveillance Detection Algorithms: Slide 111