Fair Use Agreement

This agreement covers the use of all slides on this CD-ROM; please read carefully.

• You may freely use these slides for teaching, if:
  • you send me an email telling me the class number/university in advance, and
  • my name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).

• You may freely use these slides for a conference presentation, if:
  • you send me an email telling me the conference name in advance, and
  • my name appears on each slide you use.

• You may not use these slides for tutorials or in a published work (tech report/conference paper/thesis/journal, etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.

(c) Eamonn Keogh, [email protected]

VLDB - Aug 2004 Themis Palpanas


Indexing Large Human-Motion Databases

Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos

University of California, Riverside

Marc Cardle

University of Cambridge

Motion Capture

• records motion data from live actors
• used for data-driven animation

Motion Capture in Games Industry

[Screenshots: NBA Street, Madden]

Motion Capture in Movie Industry

[Stills: Troy, Lord of the Rings]

Motivation

• motion capture data
  • segmented into short sequences, stored in motion libraries
  • composed to create long, realistic motion sequences
• important to find similar sequences
  • form a pool of similar sequences
  • choose the most promising one to continue the motion

Motivation

• Dynamic Time Warping (DTW)
  • considers only local adjustments in time to match two time series
• however, sometimes global adjustments are required

[Figure: DTW alignment vs. Uniform Scaling alignment]

• DTW is used extensively; uniform scaling is complementary
• the combination of both techniques offers a rich, high-quality result set

Uniform Scaling

• time series
  • query, Q, length n
  • candidate, C, length m (m > n)
• stretch Q to length p (n ≤ p ≤ m):
  $$Q^p_j = Q_{\lceil j \cdot n / p \rceil}, \quad 1 \le j \le p$$
• scaling factor, $sf = p/n$
• max scaling factor, $sf_{max} = m/n$

[Figure: query Q, candidate C, and the stretched query Q^p]
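The stretching rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code; the function name `stretch` is hypothetical, and integer ceiling division is used so that $\lceil j \cdot n/p \rceil$ is computed exactly rather than through floating point.

```python
def stretch(q, p):
    """Uniformly scale query q (length n) up to length p, with n <= p.

    Implements Q^p_j = Q_{ceil(j*n/p)}, 1 <= j <= p. The formula is 1-based,
    so 1 is subtracted when indexing the Python list.
    """
    n = len(q)
    if p < n:
        raise ValueError("p must be at least len(q)")
    # (j*n + p - 1) // p is an exact integer ceiling of j*n/p
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


print(stretch([1, 2, 3, 4], 6))  # [1, 2, 2, 3, 4, 4]
```

Stretching a length-80 query to p = 100 corresponds to the maximum scaling factor $sf_{max} = 100/80 = 1.25$ for a length-100 candidate.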

Problem Statement

• given
  • time series, Q
  • database of candidate time series, {D}
• find
  $$\arg\min_{p} \{\, \mathrm{dist}(Q^p, \{D\}) \,\}$$
  where dist(Q^p, {D}) is the Euclidean distance between time series
• challenges
  • quickly solve the problem for two time series
  • extend the solution to scale up to large time series databases

Outline

• Speeding Up Search
• Scaling Up To Large Databases
• Experimental Evaluation
• Related Work
• Conclusions

Best Uniform Scaling Match

• brute-force algorithm:
  • for each time series in {D}
    • for each sf, $1 \le sf \le sf_{max}$
      • compute the distance between the two time series
  • find the best overall match
• time complexity: $O(|D|(m-n))$ distance computations
• extremely expensive!
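A direct transcription of the brute-force loop, as a sketch. The slides only say that dist is the Euclidean distance; this code assumes each stretched query $Q^p$ is compared with the first p points of the candidate, so every candidate needs length m ≥ n. All names (`stretch`, `euclidean`, `brute_force_search`) are hypothetical.

```python
import math


def stretch(q, p):
    # Q^p_j = Q_{ceil(j*n/p)}, using exact integer ceiling division
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(q, database):
    """Return (best_index, best_p, best_dist) over all candidates and all p.

    Assumption: Q^p is compared with the first p points of each candidate.
    """
    n = len(q)
    best = (None, None, math.inf)
    for idx, c in enumerate(database):
        m = len(c)
        # every scaling factor sf = p/n with 1 <= sf <= m/n
        for p in range(n, m + 1):
            d = euclidean(stretch(q, p), c[:p])
            if d < best[2]:
                best = (idx, p, d)
    return best
```

Each candidate of length m costs m − n + 1 distance computations over up to m points each, which is where the $O(|D|(m-n))$ count of distance computations comes from.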

Lower Bounding Uniform Scaling

• lower bound the distance between two time series, for any sf, $1 \le sf \le sf_{max}$
• desiderata:
  • fast to compute
  • tight bound
• a tight bound results in fast pruning of candidates that are guaranteed not to belong to the solution
• compute the distance only for time series not pruned by the lower bound

Lower Bounding Uniform Scaling

• assume:
  • candidate C, length 100
  • query Q, length 80
  • wish to find the best match for any scaling of Q between lengths 80 and 100
• build envelopes, length 80:
  $$U_i = \max\left(C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n}\right)$$
  $$L_i = \min\left(C_{(i-1) \cdot m/n + 1}, \ldots, C_{i \cdot m/n}\right)$$

[Figure: candidate C (m = 100) and its envelopes U and L (n = 80)]
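The envelope construction can be sketched as below. The slide leaves the rounding of the fractional indices $(i-1) \cdot m/n + 1$ and $i \cdot m/n$ unspecified; this sketch assumes floor, so the n blocks partition C exactly. The function name `build_envelope` is hypothetical.

```python
def build_envelope(c, n):
    """Envelopes U and L of length n over candidate c (length m >= n).

    Block i covers C_{(i-1)*m/n + 1} .. C_{i*m/n} (1-based), with both
    fractional indices rounded down (an assumption of this sketch).
    """
    m = len(c)
    if m < n:
        raise ValueError("candidate must be at least as long as the query")
    upper, lower = [], []
    for i in range(1, n + 1):
        # 0-based slice covering the 1-based positions of block i
        block = c[(i - 1) * m // n : i * m // n]
        upper.append(max(block))
        lower.append(min(block))
    return upper, lower
```

When m = n each block holds a single point, so U and L both collapse to C itself.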

• compute lower bound:
  $$\mathrm{LB\_Keogh}(Q, C) \equiv \sqrt{\sum_{i=1}^{n} \begin{cases} (Q_i - U_i)^2 & \text{if } Q_i > U_i \\ (Q_i - L_i)^2 & \text{if } Q_i < L_i \\ 0 & \text{otherwise} \end{cases}}$$

[Figure: query Q plotted against the envelopes U and L]
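The lower bound formula translates directly into a loop over the envelope. A minimal sketch (the name `lb_keogh` is hypothetical):

```python
import math


def lb_keogh(q, upper, lower):
    """LB_Keogh(Q, C): Euclidean distance from Q to the band [L, U]."""
    if not (len(q) == len(upper) == len(lower)):
        raise ValueError("query and envelopes must have the same length")
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
        # otherwise Q_i lies inside the envelope and contributes 0
    return math.sqrt(total)
```

If the envelope is a single series (U = L = C), every term becomes a squared pointwise difference and the bound equals the Euclidean distance between Q and C.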

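Envelope Indexing

• dimensionality of envelopes is high (80 points)
• reduce dimensionality by approximating them
  • Piecewise Constant Approximation: $\hat{U}$ and $\hat{L}$, 8 points
• assume query Q, length 80
  • approximated with 8 points, $\hat{Q}$
• compute approximation of lower bound:
  $$\mathrm{MINDIST}(\hat{Q}, \langle \hat{U}, \hat{L} \rangle) \equiv \sqrt{\frac{n}{N}} \sqrt{\sum_{i=1}^{N} \begin{cases} (\hat{Q}_i - \hat{U}_i)^2 & \text{if } \hat{Q}_i > \hat{U}_i \\ (\hat{Q}_i - \hat{L}_i)^2 & \text{if } \hat{Q}_i < \hat{L}_i \\ 0 & \text{otherwise} \end{cases}}$$
  where N is the number of approximating points (here 8)

[Figure: approximated query Q̂ against the approximated envelopes Û and L̂]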
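A sketch of the dimensionality reduction and MINDIST. The slide only names a piecewise constant approximation; this code assumes the query is reduced by frame means (Piecewise Aggregate Approximation) and the envelopes by the frame maximum of U and frame minimum of L, a choice under which MINDIST never exceeds LB_Keogh. It also assumes N divides n. All function names are hypothetical.

```python
import math


def paa(x, N):
    """Piecewise Aggregate Approximation: the mean of each of N equal frames."""
    n = len(x)
    if n % N:
        raise ValueError("this sketch assumes N divides len(x)")
    k = n // N  # points per frame
    return [sum(x[i * k:(i + 1) * k]) / k for i in range(N)]


def approximate_envelope(upper, lower, N):
    """Reduce U and L to N points: frame maximum of U, frame minimum of L."""
    k = len(upper) // N
    u_hat = [max(upper[i * k:(i + 1) * k]) for i in range(N)]
    l_hat = [min(lower[i * k:(i + 1) * k]) for i in range(N)]
    return u_hat, l_hat


def mindist(q_hat, u_hat, l_hat, n):
    """MINDIST between the reduced query and the reduced envelope.

    n is the original length; sqrt(n/N) accounts for each reduced point
    standing in for n/N original points.
    """
    N = len(q_hat)
    total = 0.0
    for qi, ui, li in zip(q_hat, u_hat, l_hat):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (qi - li) ** 2
    return math.sqrt(n / N) * math.sqrt(total)
```

Because each reduced envelope point is at least as loose as every original point in its frame, MINDIST can be evaluated on 8 numbers per series while still never overestimating LB_Keogh.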

Algorithms for Secondary Storage

• use a multidimensional index
  • VA-file -> FastScan algorithm
  • R-tree -> RtreeProbe algorithm
• 2-pass algorithms:
  1. scan approximated envelopes, prune search space
  2. find exact answer using original series
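The two-pass idea can be illustrated with an in-memory sketch. It is not FastScan or RtreeProbe: there is no VA-file or R-tree here, just a list of precomputed lower bounds standing in for the scan of the approximated envelopes (pass 1), followed by exact distances on the original series in lower-bound order (pass 2). The name `two_pass_search` and its arguments are hypothetical; the only requirement is that each lower bound never exceeds the corresponding exact distance.

```python
import math


def two_pass_search(lower_bounds, exact_distance):
    """Nearest-neighbour search with a filter pass and a refine pass.

    lower_bounds[k] must never exceed exact_distance(k).
    Returns (best_index, best_distance, exact_calls).
    """
    # Pass 1: order candidates by their cheap lower bound
    order = sorted(range(len(lower_bounds)), key=lambda k: lower_bounds[k])
    best_k, best_d, calls = None, math.inf, 0
    # Pass 2: exact distances, stopping once no lower bound can win
    for k in order:
        if lower_bounds[k] >= best_d:
            break  # this and every remaining candidate is pruned
        d = exact_distance(k)
        calls += 1
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d, calls
```

The returned call count divided by the number of candidates is the fraction of distance-function calls, the pruning-power measure used in the main-memory experiments.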

Datasets Used

• motion capture
  • data from 124 sensors placed on human actors
• mixed bag
  • time series coming from medicine, manufacturing, environmental monitoring, economics, and sensor data
• experimented with time series databases of:
  • size 5,000 – 80,000
  • time series length 64 – 1,024 points

Main Memory Experiments

• assume the database fits in memory
• measure pruning power: the fraction of times each approach calls the distance function
• our technique:
  • 1 order of magnitude faster than CD-criterion
  • 3 orders of magnitude faster than brute force

[Figure: pruning power of brute force, CD-criterion, and LB_Keogh]

Disk-Based Experiments

• comparison of:
  • brute force
  • FastScan
  • RtreeProbe
• further comparison of FastScan and RtreeProbe

[Figures: disk-based comparisons of brute force, FastScan, and RtreeProbe]

Case Study

[Video]

Related Work

• Dynamic Time Warping (DTW): [Yi & Faloutsos’00][Keogh’02][Zhu & Shasha’03][Fung & Wong’03]
• Longest Common Subsequence (LCSS): [Das et al.’97][Vlachos et al.’03]
• uniform scaling: [Argyros & Ermopoulos’03]

Conclusions

• studied the utility of uniform scaling similarity matching
  • applications in motion capture libraries, music retrieval, and historical handwritten archives
• introduced the first lower bounding technique for uniform scaling
• proposed an indexing method for bounding envelopes
  • suitable for very large time series databases
• experimentally evaluated the efficiency of the technique
• demonstrated the quality of results with real motion capture data
