Fast Time Series Classification Using Numerosity Reduction


Fast Time Series Classification
Using Numerosity Reduction
DME Paper Presentation
Jonathan Millin & Jonathan Sedar
Fri 12th Feb 2010
Fast Time Series Classification Using
Numerosity Reduction
• Appearing in Proceedings of 23rd International
Conference on Machine Learning 2006.
• Authors:
– Xiaopeng Xi, Eamonn Keogh, Christian Shelton, Li Wei,
• Computer Science & Engineering Dept, UC Riverside, CA
– Chotirat ‘Ann’ Ratanamahatana.
• Dept of Computer Engineering, Chulalongkorn University, Bangkok
• Cited by 34 papers (Google Scholar)
Fast Time Series Classification Using Numerosity Reduction
Overview
• High classification accuracy on time-series
data is achieved using Dynamic Time Warping
and a novel application of numerosity
reduction to efficiently reduce computational
complexity.
Fast Time Series Classification Using Numerosity Reduction
Agenda
• Introduction
• Methods
– Dynamic Time Warping
– Numerosity Reduction
– Adaptive Warping Window (AWARD)
– Fast AWARD
• Results
• Discussion
Introduction
Time Series Classification
Time-Series Data Classification
• Classify a query series by pattern matching: assign it the class of its most similar labelled example (1NN)
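
A minimal sketch of the 1-nearest-neighbour (1NN) scheme this pattern matching rests on; the function name is ours, and `distance` stands in for Euclidean distance or DTW:

import numpy as np

def classify_1nn(query, train_series, train_labels, distance):
    # Label the query with the class of the closest training series.
    dists = [distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]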
Methods
Dynamic Time Warping
What is Dynamic Time Warping?
• Compare similar time series allowing for
temporal skew:
Methods
Dynamic Time Warping
How does DTW Work?
• Align the two series
• Construct a distance matrix
• Find the optimal warping path
• Introduce a warping window to reduce complexity
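
A minimal sketch of DTW with a Sakoe-Chiba warping window of width r; this is our own illustrative code, not the authors' implementation, and it assumes two roughly equal-length series:

import numpy as np

def dtw_distance(a, b, r):
    # Dynamic programming over the pairwise distance matrix, restricted to a
    # band of width r around the diagonal (the warping window).
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])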
Methods
Dynamic Time Warping
DTW Performance
• Fig. 3: reported comparisons
• Figs. 4, 5, 7: test sets (shown later)
Methods
Dynamic Time Warping
DTW Vs Literature
ControlChart
• Xi et al. (2006) use 1NN-DTW:
error rate 0.33%
• Rodriguez & Alonso et al. (2000)
use 1st order logic rules with
boosting: error rate 3.6%
• Nanopoulos & Alcock et al. (2001)
use multi-layer perceptron NN:
error rate 1.9%
• Wu & Chang (2004) use ‘super
kernel fusion’: error rate 0.79%
• Chen & Kamel (2005) use ‘Static
Minimization-Maximization
approach’: best error rate 7.2%
ECG
• Xi et al. (2006) use 1NN-DTW and
Euclidean distance: ‘perfect
accuracy’
• Kim & Smyth et al. (2004) use
HMM: 98% accuracy
Lightning (FORTE-2)
• Xi et al. (2006) use 1NN-DTW:
error rate 9.09%
• Eads & Glocer et al. (2005) use
grammar guided feature
extraction: error rate 13.22%
Methods
Dynamic Time Warping
Dynamic Time Warping
• DTW is ‘at least as accurate’ as Euclidean distance
Methods
Dynamic Time Warping
DTW gives great results, but
• Naive implementation is computationally
expensive
• The LB_Keogh lower bound reduces the amortised cost to O(n) (sketched below)
• At the limits of DTW algorithm optimisation
• Look elsewhere for classification speed gains...
...Numerosity reduction
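
A sketch of the LB_Keogh lower bound mentioned above: an envelope is built around the query within the warping window, and a candidate whose bound already exceeds the best DTW distance found so far is discarded without running the full DTW. Illustrative code, assuming NumPy arrays:

import numpy as np

def lb_keogh(query, candidate, r):
    # Compare the candidate against the upper/lower envelope of the query
    # taken over a warping window of width r.
    total = 0.0
    for i, c in enumerate(candidate):
        lo, hi = max(0, i - r), min(len(query), i + r + 1)
        upper, lower = query[lo:hi].max(), query[lo:hi].min()
        if c > upper:            # only points outside the envelope contribute
            total += (c - upper) ** 2
        elif c < lower:
            total += (lower - c) ** 2
    return np.sqrt(total)        # lb_keogh(q, c, r) <= dtw_distance(q, c, r)

In a 1NN search, a candidate is skipped whenever this bound is already larger than the smallest DTW distance seen so far, which is how most full DTW computations are avoided.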
Methods
Numerosity Reduction Techniques
Numerosity Reduction Techniques
• Naive Rank Reduction
• Adaptive Warping Window (AWARD)
• Fast Numerosity Reduction (FastAWARD)
Methods
Numerosity Reduction: Naive Rank Reduction
Naive Rank Reduction
• Principle: remove instances in an order which minimises misclassifications (see the sketch below)
1. Ranking (iterative, O(n))
– Remove duplicates
– Apply 1NN classification
– Rank each x according to the class of its 1st NN
– Break ties by proximity of the nearest class
2. Thresholding
– User defined (keep the n highest-ranked, or the best n%)
[Diagram: instances x1…x5 with nearest-neighbour distances d1…d4, ranked d3 > d4 > d2 > d1]
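
A hedged reconstruction of the ranking step described on this slide; the paper's exact scoring may differ, and `distance` is any series distance such as DTW:

import numpy as np

def rank_instances(series, labels, distance):
    # Score each instance by whether its own nearest neighbour shares its
    # class ('rank each x according to class of 1st NN'), breaking ties by
    # the distance to that neighbour.  Returns indices, best-ranked first;
    # thresholding then keeps the top n (or best n%) of this ordering.
    n = len(series)
    scores = []
    for i in range(n):
        dists = [distance(series[i], series[j]) if j != i else np.inf
                 for j in range(n)]
        nn = int(np.argmin(dists))
        scores.append((int(labels[nn] == labels[i]), -dists[nn]))
    return sorted(range(n), key=lambda i: scores[i], reverse=True)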
Methods
Numerosity Reduction: Naive Rank Reduction
Naive Rank Reduction
• Classification accuracy declines when the size of the dataset decreases
• A larger r gives better accuracy on smaller datasets
• Motivates an adaptive warping window
Methods
Numerosity Reduction: AWARD
Adaptive Warping Window (AWARD)
• What
– Dynamically adjust the warping window size r during numerosity reduction
• Why
– Larger windows give better accuracy on smaller datasets
• How (see the sketch below)
– Initialise r to the best warping window size (exhaustive search over r = 1:100)
– Begin Naive Rank Reduction (shown earlier)
– Test the accuracy of the reduced set with r and r+1
– If accuracy(r+1) > accuracy(r) then r++
• Problems
– Gives better accuracy during numerosity reduction, but the additional checks increase complexity from O(n) to O(n³)
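
A sketch of the AWARD loop following the 'How' steps above, reusing rank_instances from the earlier sketch; evaluate(), distance_with_r() and the other names are our own stand-ins, not the authors' code:

def award(series, labels, keep, best_r, distance_with_r, evaluate):
    # distance_with_r(a, b, r): DTW with warping window r.
    # evaluate(kept_indices, r): leave-one-out 1NN accuracy on that subset.
    # best_r: window found by exhaustive search (r = 1:100) on the full set.
    r = best_r
    kept = list(range(len(series)))
    while len(kept) > keep:
        order = rank_instances([series[i] for i in kept],
                               [labels[i] for i in kept],
                               lambda a, b: distance_with_r(a, b, r))
        kept.pop(order[-1])                      # discard the lowest-ranked instance
        if evaluate(kept, r + 1) > evaluate(kept, r):
            r += 1                               # widen the window when it helps
    return kept, r

Re-evaluating accuracy for r and r+1 after every removal is what pushes the complexity up, which FastAWARD addresses next.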
Methods
Numerosity Reduction: FastAWARD
FastAWARD
• What
– Essentially AWARD, but uses the calculations from
previous iterations to reduce complexity
• Why
– Reduce complexity to reduce execution time
• How
– performs incremental updates after each step to
reduce complexity of future steps
Methods
Numerosity Reduction: FastAWARD
How - Storing information
• Done by storing (for each i=r:100):
– Nearest neighbour matrix (A)
– Distance matrix (B)
– Accuracy array (ACC)
Methods
Numerosity Reduction: FastAWARD
How – Incremental Updates
• After each item is discarded (see the sketch below):
– Update A (neighbours)
– Update B (distances)
– Update ACC (accuracy)
– Check if ACC[r+1] > ACC[r]
[Diagram: discarding an instance changes the nearest-neighbour ranking, e.g. from d3 > d4 > d1 > d2 to dnew > d1 > d3]
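
A simplified sketch of the incremental-update idea for a single removal: only instances whose stored nearest neighbour was the discarded one are rescanned, everything else is reused from the previous iteration. This is our own simplification of the bookkeeping, assuming a precomputed distance matrix with np.inf on its diagonal and nn as an integer array of nearest-neighbour indices:

import numpy as np

def update_after_removal(removed, nn, dist_matrix, labels):
    # Make the removed instance unreachable as a neighbour.
    dist_matrix[removed, :] = np.inf
    dist_matrix[:, removed] = np.inf
    # Only instances that pointed at the removed one need a new neighbour.
    for i in np.where(nn == removed)[0]:
        nn[i] = int(np.argmin(dist_matrix[i]))
    # Recompute the leave-one-out 1NN accuracy entry (ACC) for this window.
    alive = [i for i in range(len(labels)) if i != removed]
    correct = sum(labels[i] == labels[nn[i]] for i in alive)
    return nn, correct / len(alive)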
Methods
Recap
Interim Recap
• Dynamic Time Warping accounts for skew
• Using AWARD numerosity reduction
• FastAWARD vs AWARD
...Does it work?
Results
Experimental Work
Experiments (Accuracy)
Results
Experimental Work
Experiments
Results
Experimental Work
Experiments (Accuracy)
Results
Experimental Work
Experiments (Efficiency)
• Massive improvements in the efficiency of the numerosity reduction process
Results
Experimental Work
Experiments (Anytime Classification)
Discussion
Summary
Summary
• 1NN-DTW is an excellent time series classifier
• DTW is computationally expensive because of the
number of pattern matches
• DTW algorithm is at limits of optimisation
• Improve speeds by reducing number of required
matches
• (Fast)AWARD adjusts the warping window as the dataset shrinks, which increases accuracy
• FastAWARD is several orders of magnitude faster
than AWARD
Discussion
Our Critique
Our Critique
• Two Patterns dataset seems cherry-picked
• DTW model may necessitate bespoke pre-processing
• RandomFix vs RankFix – very similar results
• AWARD efficiency comparisons ignore initialisation effort, and speed wasn’t compared to other methods (RT1, 2, 3)
• Comparisons of r incomplete
• Anytime classification experiments seem rigged in favour of AWARD
Discussion
Our Critique
Two Patterns dataset seems cherry-picked
• Fig. 3: reported comparisons
• Figs. 4, 5, 7: test sets (shown later)
Discussion
Our Critique
DTW model may necessitate bespoke
pre-processing
Discussion
Our Critique
RandomFix vs RankFix - similar results
Discussion
Our Critique
AWARD efficiency comparisons ignore
initialisation effort and speed wasn’t
compared to other methods (RT1, 2, 3)
Discussion
Our Critique
Comparisons of r incomplete
Discussion
Our Critique
Anytime classification is rigged?
Q&A
Thank You.