Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully.
• You may freely use these slides for teaching, if
  • You send me an email telling me the class number/university in advance.
  • My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
• You may freely use these slides for a conference presentation, if
  • You send me an email telling me the conference name in advance.
  • My name appears on each slide you use.
• You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc). If you wish to do this, email me first; it is highly likely I will grant you permission.
(c) Eamonn Keogh, [email protected]
VLDB - Aug 2004 Themis Palpanas
Indexing Large Human-Motion Databases
Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos
University of California, Riverside
Marc Cardle
University of Cambridge
Motion Capture
• records motion data from live actors
• used for data-driven animation
Motion Capture in Games Industry
• NBA Street
• Madden
Motion Capture in Movie Industry
• Troy
• Lord of the Rings
Motivation
• motion capture data is segmented into short sequences and stored in motion libraries
• short sequences are composed to create long, realistic motion sequences
• important to find similar sequences
  • from the pool of similar sequences, choose the most promising one to continue the motion
Motivation
• Dynamic Time Warping (DTW)
  • considers only local adjustments in time, to match two time series
  • however, sometimes global adjustments are required
• DTW is being extensively used
• uniform scaling is complementary
• combination of both techniques offers a rich, high-quality result set
[Figure panels: DTW, Uniform Scaling]
Uniform Scaling
• time series query Q, length n
• candidate C, length m (m > n)
• stretch Q to length p (n ≤ p ≤ m): $Q^p_j = Q_{\lceil j \cdot n/p \rceil}$, 1 ≤ j ≤ p
• scaling factor, sf = p/n
• max scaling factor, $sf_{max}$ = m/n
[Figure: Q, C, and the stretched query $Q^p$]
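The stretching rule above can be sketched in a few lines of Python. This is only an illustration: the function name `stretch` is hypothetical, and the slide's 1-based indices are mapped onto Python's 0-based lists.

```python
def stretch(q, p):
    """Uniformly stretch q (length n) to length p >= n.

    Follows Q^p_j = Q_ceil(j*n/p) for 1 <= j <= p, using 1-based j as on the slide.
    """
    n = len(q)
    if p < n:
        raise ValueError("p must be at least len(q)")
    # (j*n + p - 1) // p is an integer ceiling of j*n/p, so no float rounding occurs.
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


q = [1, 2, 3, 4]
print(stretch(q, 6))  # scaling factor sf = 6/4 = 1.5
print(stretch(q, 8))  # scaling factor sf = 8/4 = 2.0
```

With p = n the function returns the query unchanged; larger p repeats points so that the overall shape is preserved.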
Problem Statement
• given
  • time series, Q
  • database of candidate time series, {D}
• find $\arg\min_p \{ dist(Q^p, \{D\}) \}$
  • $dist(Q^p, \{D\})$ = Euclidean distance between time series
• challenges
  • quickly solve the problem for two time series
  • extend solution to scale up to large time series databases
Outline
• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions
Best Uniform Scaling Match
• brute force algorithm:
  • for each time series in {D}
    • for each sf, 1 ≤ sf ≤ $sf_{max}$
      • compute distance between the two time series
  • find the best overall match
• time complexity: O(|D|(m-n))
• extremely expensive!
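A minimal sketch of the brute force search follows. It assumes that the stretched query $Q^p$ is compared against the first p points of each candidate, and that the distance is the plain (unnormalized) Euclidean distance; neither detail is spelled out on the slide. The names `stretch`, `euclidean` and `best_scaling_match` are hypothetical.

```python
import math


def stretch(q, p):
    """Stretch q to length p using Q^p_j = Q_ceil(j*n/p), 1-based j."""
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_scaling_match(q, database):
    """Try every candidate and every stretched length p; keep the smallest distance.

    Returns (distance, candidate index, p). Assumes every candidate is at
    least as long as q and that Q^p is matched against the candidate's prefix.
    """
    n = len(q)
    best = (math.inf, None, None)
    for idx, c in enumerate(database):
        for p in range(n, len(c) + 1):  # one pass per scaling factor p/n
            d = euclidean(stretch(q, p), c[:p])
            if d < best[0]:
                best = (d, idx, p)
    return best


database = [[5, 5, 5, 5, 5, 5], [0, 0, 1, 1, 2, 2, 3, 3]]
print(best_scaling_match([0, 1, 2, 3], database))
```

The nested loops make the cost visible: every candidate is visited once per admissible length p, which is what the lower bound on the following slides is meant to avoid.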
Lower Bounding Uniform Scaling
• lower bound distance between two time series, for any sf, 1 ≤ sf ≤ $sf_{max}$
• desiderata:
  • fast to compute
  • tight bound
• results in fast pruning of candidates that are guaranteed not to belong to the solution
• compute distance only for time series not pruned by lower bound
Lower Bounding Uniform Scaling
• assume:
  • candidate C, length 100 (m = 100)
  • query Q, length 80 (n = 80)
• wish to find best match for any scaling of Q between 80-100
• build envelopes, length 80:
  • $U_i = \max(C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n})$
  • $L_i = \min(C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n})$
[Figure: candidate C (m = 100), its upper envelope U and lower envelope L (n = 80), and query Q plotted against the envelopes; horizontal axis 0 to 100]
Lower Bounding Uniform Scaling
• assume:
  • candidate C, length 100
  • query Q, length 80
• wish to find best match for any scaling of Q between 80-100
• compute lower bound:

$$LB\_Keogh(Q, C) = \sqrt{\sum_{i=1}^{n} \begin{cases} (Q_i - U_i)^2 & \text{if } Q_i > U_i \\ (Q_i - L_i)^2 & \text{if } Q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$
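The envelope and the lower bound can be sketched together as below. One loud assumption: the window used for each envelope point, $C_i \ldots C_{\lfloor i \cdot m/n \rfloor}$, is this sketch's own derivation from the stretching rule $Q^p_j = Q_{\lceil j \cdot n/p \rceil}$ (it covers every candidate position that $Q_i$ can be aligned with for some p between n and m); it is not a transcription of the index range printed on the slide. The names `envelope` and `lb_keogh` are hypothetical.

```python
import math


def envelope(c, n):
    """Upper and lower envelope of candidate c (length m) for a query of length n.

    Assumption of this sketch: query point i (1-based) can be aligned with
    c_i .. c_floor(i*m/n) as p ranges over n..m, so each envelope point is the
    max/min over exactly that window.
    """
    m = len(c)
    upper, lower = [], []
    for i in range(1, n + 1):
        window = c[i - 1:(i * m) // n]  # 0-based slice of 1-based c_i .. c_floor(i*m/n)
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


def lb_keogh(q, upper, lower):
    """Sum squared distances from q to the envelope where q falls outside it."""
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        # points inside the envelope contribute 0
    return math.sqrt(total)


q = [0.0, 3.0, 1.0, 4.0]
c = [1.0, 2.0, 2.0, 0.0, 3.0, 5.0]
upper, lower = envelope(c, len(q))
print(upper, lower, lb_keogh(q, upper, lower))
```

Under the prefix-matching assumption, every stretched version of this query is at least $\sqrt{6}$ away from the candidate, while the bound evaluates to $\sqrt{2}$, so the bound never overestimates in this example.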
Envelope Indexing
• dimensionality of envelopes is high (80 points)
• reduce dimensionality by approximating them
  • Piecewise Constant Approximation: $\hat{U}$, $\hat{L}$ (8 points)
• assume query Q, length 80, approximated with 8 points ($\bar{Q}$)
• compute approximation of lower bound (n = 80, N = 8):

$$MINDIST(\bar{Q}, (\hat{U}, \hat{L})) = \sqrt{\sum_{i=1}^{N} \frac{n}{N} \begin{cases} (\bar{Q}_i - \hat{U}_i)^2 & \text{if } \bar{Q}_i > \hat{U}_i \\ (\bar{Q}_i - \hat{L}_i)^2 & \text{if } \bar{Q}_i < \hat{L}_i \\ 0 & \text{otherwise} \end{cases}}$$

[Figure: envelopes U and L with their 8-point approximations $\hat{U}$ and $\hat{L}$, and query Q with its 8-point approximation; horizontal axis 0 to 100]
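A small sketch of the approximation step follows. The slide does not say how the piecewise constant values are chosen, so this sketch assumes the query is approximated by segment means, the upper envelope by segment maxima, and the lower envelope by segment minima (so the approximate envelope still encloses the original one). It also assumes the series length is a multiple of the number of segments. The names `paa`, `approximate_envelope` and `mindist` are hypothetical.

```python
import math


def paa(series, num_segments):
    """Mean of each equal-length segment (length must divide evenly)."""
    seg = len(series) // num_segments
    return [sum(series[k * seg:(k + 1) * seg]) / seg for k in range(num_segments)]


def approximate_envelope(upper, lower, num_segments):
    """Assumed reduction: segment max of U and segment min of L."""
    seg = len(upper) // num_segments
    u_hat = [max(upper[k * seg:(k + 1) * seg]) for k in range(num_segments)]
    l_hat = [min(lower[k * seg:(k + 1) * seg]) for k in range(num_segments)]
    return u_hat, l_hat


def mindist(q_bar, u_hat, l_hat, n):
    """Lower bound in the reduced space; each segment is weighted by n/N."""
    big_n = len(q_bar)
    total = 0.0
    for qi, ui, li in zip(q_bar, u_hat, l_hat):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
    return math.sqrt(n / big_n * total)


q = [0, 2, 4, 6, 1, 1, 9, 9]
upper = [1, 1, 4, 4, 3, 3, 5, 5]
lower = [0, 0, 2, 2, 1, 1, 4, 4]
u_hat, l_hat = approximate_envelope(upper, lower, 4)
print(mindist(paa(q, 4), u_hat, l_hat, len(q)))
```

In this toy example the reduced bound is $\sqrt{34}$, slightly looser than the full-resolution bound of $\sqrt{37}$ computed on all 8 points, which is the expected trade-off for indexing 4 values instead of 8.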
Algorithms for Secondary Storage
• use a multidimensional index
  • VA-file -> FastScan algorithm
  • R-tree -> RtreeProbe algorithm
• 2-pass algorithms:
  1. scan approximated envelopes, prune search space
  2. find exact answer using original series
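The two-pass idea can be illustrated with a generic in-memory filter-and-refine loop. This is not FastScan or RtreeProbe: there is no VA-file or R-tree here, the first pass uses a full-resolution envelope bound rather than approximated envelopes, and the envelope window is derived from the stretching rule as an assumption of this sketch. All function names are hypothetical.

```python
import math


def exact_distance(q, c):
    """Smallest Euclidean distance between any stretching of q and the matching prefix of c."""
    n = len(q)
    best = math.inf
    for p in range(n, len(c) + 1):
        stretched = [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]
        # zip stops after p points, so only the length-p prefix of c is compared
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(stretched, c)))
        best = min(best, d)
    return best


def lower_bound(q, c):
    """Envelope-based bound; window c_i .. c_floor(i*m/n) is an assumption of this sketch."""
    n, m = len(q), len(c)
    total = 0.0
    for i in range(1, n + 1):
        window = c[i - 1:(i * m) // n]
        hi, lo = max(window), min(window)
        if q[i - 1] > hi:
            total += (q[i - 1] - hi) ** 2
        elif q[i - 1] < lo:
            total += (q[i - 1] - lo) ** 2
    return math.sqrt(total)


def two_pass_search(q, database):
    """Return (best distance, candidate index, number of exact distance calls)."""
    # Pass 1: cheap bound for every candidate, most promising first.
    bounds = sorted((lower_bound(q, c), idx) for idx, c in enumerate(database))
    best_dist, best_idx, exact_calls = math.inf, None, 0
    # Pass 2: refine with the exact distance until the bounds rule out the rest.
    for lb, idx in bounds:
        if lb >= best_dist:
            break  # no remaining candidate can beat the current best
        exact_calls += 1
        d = exact_distance(q, database[idx])
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_dist, best_idx, exact_calls


database = [[10] * 6, [0, 0, 1, 1, 2, 2, 3, 3], [9] * 5]
print(two_pass_search([0, 1, 2, 3], database))
```

The third value returned counts how often the expensive distance function runs, which relates to the pruning-power measure used in the experiments.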
Datasets Used
• motion capture data
  • from 124 sensors placed on human actors
• mixed bag
  • time series coming from: medicine, manufacturing, environmental monitoring, economics, sensor data
• experimented with time series databases of:
  • size 5,000 – 80,000 time series
  • length 64 – 1,024 points
Main Memory Experiments
• assume database fits in memory
• measure pruning power: fraction of times each approach calls distance function
• our technique:
  • 1 order of magnitude faster than CD-criterion
  • 3 orders of magnitude faster than brute force
[Figure: pruning power of brute force, CD-criterion, and LB_Keogh; surviving tick labels: 64, 128, 256; 1.05, 1.10, 1.20; vertical axis 0 to 0.25]
Disk-Based Experiments
• comparison of: brute force, FastScan, RtreeProbe
[Figures: three comparison plots, the last two showing FastScan and RtreeProbe only; surviving tick labels: 1.05, 1.10, 1.20; 20000, 40000]
Case Study
video
Related Work
Dynamic Time Warping (DTW)
[Yi & Faloutsos’00][Keogh’02][Zhu & Shasha’03][Fung & Wong’03]
Longest Common SubSequence (LCSS)
[Das et al.’97][Vlachos et al.’03]
uniform scaling
[Argyros & Ermopoulos’03]
Conclusions
• studied utility of uniform scaling similarity matching
  • applications in: motion capture libraries, music retrieval, historical handwritten archives
• introduced first lower bounding technique
• proposed indexing method for bounding envelopes
  • suitable for very large time series databases
• experimentally evaluated efficiency of technique
• demonstrated quality of results with real motion capture data