Pitch Tracking

Jyh-Shing Roger Jang (張智星), MIR Lab (Multimedia Information Retrieval Lab), CS Dept., NTHU (National Tsing Hua University), http://mirlab.org/jang

Pitch
- Definition of pitch
  - Fundamental frequency (FF, in Hz): reciprocal of the fundamental period of a quasi-periodic waveform
  - Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (to be detailed later)
- Characteristics of pitch
  - Noise and unvoiced sounds do not have pitch.

Pitch Tracking
- Pitch tracking: to compute the pitch vector of a given waveform, that is, to find the pitch over the entire recording
- Applications
  - Query by singing/humming
  - Tone recognition for Mandarin
  - Intonation scoring for English
  - Prosody analysis for speech synthesis
  - Pitch scaling and duration modification

Pitch Tracking Algorithms
- Two categories of pitch tracking algorithms
  - Time domain
    - ACF (autocorrelation function)
    - AMDF (average magnitude difference function)
    - SIFT (simplified inverse filter tracking)
  - Frequency domain
    - Harmonic product spectrum method
    - Cepstrum method

Typical Steps for Pitch Tracking
- Chop the signal into frames (aka frame blocking)
- Compute a pitch function (ACF, AMDF, etc.) for each frame
- Determine the pitch of each frame
  - Max/min picking of the pitch function
- Remove unreliable pitch
  - Via volume/clarity thresholding
- Smooth the whole pitch vector
  - Via median filter, etc.
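The last step, smoothing by a median filter, can be sketched as follows. This is only an illustration: the function name `median_smooth`, the default order of 5, and the shortened windows at both ends are assumptions, not taken from the slides.

```python
from statistics import median

def median_smooth(pitch, order=5):
    """Median-filter a pitch vector (hypothetical helper, odd order assumed).

    Windows are shortened at both ends instead of zero-padded, so edge
    values may be the average of two neighbors when the window is even.
    """
    half = order // 2
    return [median(pitch[max(0, i - half):i + half + 1])
            for i in range(len(pitch))]

# An isolated octave jump (72) inside a flat run of 60s is removed.
smoothed = median_smooth([60, 60, 72, 60, 60], order=3)
```

A median filter is a common choice here because it removes isolated outliers, such as octave errors, without blurring genuine note transitions as much as a moving average would.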

Frame Blocking
[Figure: a waveform (top) and a zoomed-in view (bottom) showing adjacent frames and their overlap]
- Frame size = 256 points
- Overlap = 84 points
- Frame rate = fs/(frameSize-overlap) = 11025/(256-84) ≈ 64 pitch/sec
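Frame blocking with the numbers above can be sketched like this. The function name `frame_blocking` and the choice to drop trailing samples that do not fill a whole frame are assumptions made for illustration.

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split a sequence of samples into overlapping frames.

    Trailing samples that do not fill a complete frame are dropped
    (an assumption; zero-padding the last frame is another option).
    """
    hop = frame_size - overlap  # samples between the starts of adjacent frames
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    frames = []
    start = 0
    while start + frame_size <= len(signal):
        frames.append(signal[start:start + frame_size])
        start += hop
    return frames

fs = 11025
signal = [0.0] * fs                 # one second of placeholder samples
frames = frame_blocking(signal)     # 256-point frames, 84-point overlap
frame_rate = fs / (256 - 84)        # about 64 pitch values per second
```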

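ACF: Auto-correlation Function
$$\mathrm{acf}(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$
[Figure: a 128-point frame s(i) and its shifted copy s(i+τ) with τ = 30; acf(30) is the inner product of the overlapping part. The ACF curve is shown with τ = 30 and the pitch period marked.]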

ACF Example 1
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Max of ACF occurs at index 131 (1-based, i.e. a lag of 130 samples)
  - FF = 16000/(131-1) = 123.077 Hz

ACF Example 2
- If the range of human FF is [40, 1000] Hz, the search for the pitch point is restricted as follows:
  - Min FF = 40 Hz: acf(fs/40:end) is not considered, since longer lags correspond to FF below 40 Hz.
  - Max FF = 1000 Hz: acf(1:fs/1000) is not considered, since shorter lags correspond to FF above 1000 Hz.
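A minimal sketch of ACF-based pitch picking with this restriction is given below. It uses 0-based lags rather than the 1-based indices of the slides, and the function names and exact rounding of the lag limits are assumptions.

```python
import math

def acf(frame):
    """Plain (tapering) autocorrelation for lags 0 .. n-1."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]

def pitch_by_acf(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Return the FF (Hz) at the ACF maximum within the allowed lag range."""
    values = acf(frame)
    min_lag = math.ceil(fs / max_ff)                   # shorter lags: FF too high
    max_lag = min(int(fs // min_ff), len(frame) - 1)   # longer lags: FF too low
    best_lag = max(range(min_lag, max_lag + 1), key=lambda tau: values[tau])
    return fs / best_lag

# Synthetic test frame: a 250 Hz sine sampled at 16 kHz (period = 64 samples).
fs = 16000
frame = [math.sin(2 * math.pi * 250 * i / fs) for i in range(512)]
ff = pitch_by_acf(frame, fs)   # the ACF peak falls at lag 64 for this frame
```

Without the lower lag limit the maximum would always be at lag 0, where the frame is correlated with itself.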

Pitch Tracking via ACF
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 points ≈ 32 ms
  - Overlap = 0
  - Frame rate ≈ 31.25 frames/s
- Playback
  - soo.wav
  - sooPitch.wav

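Variations of ACF to Avoid Tapering
- Normalized version (divide by the number of overlapping points):
$$\mathrm{acf}(\tau) = \frac{1}{n-\tau} \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$
- Half-frame shifting (only the first half of the frame is shifted, so every lag sums the same number of terms):
$$\mathrm{acf}(\tau) = \sum_{i=0}^{n/2} s(i)\, s(i+\tau)$$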

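Variations of ACF to Normalize Range
- To normalize ACF to the range [-1, 1]:
$$\mathrm{nsdf}(\tau) = \frac{2 \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)}{\sum_{i=0}^{n-1-\tau} s^2(i) + \sum_{i=0}^{n-1-\tau} s^2(i+\tau)}$$
- This is based on the inequality:
$$-(x^2 + y^2) \le 2xy \le x^2 + y^2$$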
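The range-normalized function can be sketched as below. The function name `nsdf`, the sum limits 0 to n-1-τ, and the convention of returning 0 when the denominator is zero are assumptions.

```python
import math

def nsdf(frame):
    """Normalized ACF: 2*sum(s(i)s(i+tau)) / (sum(s(i)^2) + sum(s(i+tau)^2))."""
    n = len(frame)
    out = []
    for tau in range(n):
        cross = sum(frame[i] * frame[i + tau] for i in range(n - tau))
        energy = sum(frame[i] ** 2 + frame[i + tau] ** 2 for i in range(n - tau))
        # Silent overlap: define the value as 0 (an assumption of this sketch).
        out.append(2.0 * cross / energy if energy > 0 else 0.0)
    return out

# Sine with a period of 32 samples: values stay in [-1, 1] and do not taper.
frame = [math.sin(2 * math.pi * i / 32) for i in range(128)]
values = nsdf(frame)
```

Because both numerator and denominator shrink together as the overlap gets shorter, the peak at one pitch period stays close to 1 instead of decaying with the lag.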

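AMDF: Average Magnitude Difference Function
$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|$$
[Figure: a 128-point frame s(i) and its shifted copy s(i+τ) with τ = 30; amdf(30) is the sum of absolute differences over the overlapping part. The AMDF curve is shown with τ = 30 and the pitch period marked.]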

AMDF Example
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from point 9000)
- Fundamental frequency
  - Min of AMDF occurs at index 131 (1-based, i.e. a lag of 130 samples)
  - FF = 16000/(131-1) = 123.077 Hz
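A minimal sketch of AMDF-based pitch picking follows. It uses 0-based lags and an assumed search range of [40, 1000] Hz; the function names are illustrative. When several lags tie (for example at exact multiples of the period), Python's `min` returns the first one, which here is the shortest lag.

```python
import math

def amdf(frame):
    """Plain (tapering) AMDF for lags 0 .. n-1."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(n)]

def pitch_by_amdf(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Return the FF (Hz) at the AMDF minimum within the allowed lag range."""
    values = amdf(frame)
    min_lag = math.ceil(fs / max_ff)
    max_lag = min(int(fs // min_ff), len(frame) - 1)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda tau: values[tau])
    return fs / best_lag

# Synthetic test frame: an integer triangle wave with a period of 64 samples.
fs = 16000
frame = [abs((i % 64) - 32) for i in range(512)]
ff = pitch_by_amdf(frame, fs)   # minimum at lag 64, i.e. 250 Hz
```

Since the plain AMDF sums fewer terms at larger lags, it tapers toward zero there; on real recordings that can pull the minimum toward long lags, which is what the normalized and half-frame variants address.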

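Variations of AMDF to Avoid Tapering
- Normalized version (divide by the number of overlapping points):
$$\mathrm{amdf}(\tau) = \frac{1}{n-\tau} \sum_{i=0}^{n-1-\tau} \left| s(i) - s(i+\tau) \right|$$
- Half-frame shifting (only the first half of the frame is shifted, so every lag sums the same number of terms):
$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n/2} \left| s(i) - s(i+\tau) \right|$$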

Combining ACF and AMDF
[Figure: four panels showing a frame, its ACF, its AMDF, and the ratio ACF/AMDF]

Example of Pitch Tracking
- soo.wav
[Figure: waveform (top) and computed pitch "pitch1" (bottom). PT using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22]

UPDUDP (1/4)
- UPDUDP: Unbroken Pitch Determination Using DP
- Goal: to take pitch smoothness into consideration
$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \left| p_i - p_{i+1} \right|^m$$
  - p = [p_1, ..., p_i, ..., p_n]: a given path in the AMDF matrix
  - n: number of frames
  - θ: transition penalty
  - m: exponent of the transition difference

UPDUDP (2/4)
- Optimum-value function D(i, j): the minimum cost starting from frame 1 to position (i, j)
- Recurrent formula (with m = 2):
$$D(i, j) = \mathrm{amdf}_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta\, |k - j|^2 \right\}, \quad j \in [8, 160]$$
- Initial conditions:
$$D(1, j) = \mathrm{amdf}_1(j), \quad j \in [8, 160]$$
- Optimum cost:
$$\min_{j \in [8, 160]} D(n, j)$$
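The recurrence can be sketched as a small dynamic-programming routine with back-tracking. This is an illustrative sketch, not the original implementation: the function name `updudp_path` is hypothetical, and column index j of the matrix stands for a candidate lag (in the setting above, column j would correspond to lag 8 + j).

```python
def updudp_path(amdf_matrix, theta, m=2):
    """Find the path minimizing sum(amdf_i(p_i)) + theta*sum(|p_i - p_{i+1}|^m).

    amdf_matrix[i][j] is the AMDF of frame i at candidate j.
    Returns (path of column indices, total cost).
    """
    n = len(amdf_matrix)
    num = len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]          # initial condition: D(1, j) = amdf_1(j)
    back = []                           # back[i-1][j]: best predecessor of (i, j)
    for i in range(1, n):
        row, ptr = [], []
        for j in range(num):
            best_k = min(range(num),
                         key=lambda k: D[i - 1][k] + theta * abs(k - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][best_k]
                       + theta * abs(best_k - j) ** m)
            ptr.append(best_k)
        D.append(row)
        back.append(ptr)
    j = min(range(num), key=lambda k: D[-1][k])   # optimum cost: min_j D(n, j)
    total = D[-1][j]
    path = [j]
    for ptr in reversed(back):          # trace the optimal path backwards
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, total

toy = [[5, 1, 9],
       [1, 5, 9],
       [9, 1, 5]]
jumpy_path, jumpy_cost = updudp_path(toy, theta=0)      # per-frame minima
smooth_path, smooth_cost = updudp_path(toy, theta=100)  # transitions penalized
```

With θ = 0 the result reduces to independent min picking in each frame; a large θ forces a flat path, trading a higher AMDF sum for smoothness.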

UPDUDP (3/4)
- A typical example

UPDUDP (4/4)
- Insensitivity in θ
[Figure: waveform (top) and computed pitch curves (bottom) versus time in seconds, for an utterance labeled with the syllables "xi lu chan sheng chang", with θ ranging from 0 to 20000]

Harmonic Product Spectrum
- hps.m

Frequency to Semitone Conversion
- Semitone: a music scale based on A440
$$\mathrm{semitone} = 12 \log_2\!\left(\frac{\mathrm{freq}}{440}\right) + 69$$
- Reasonable pitch range: E2 to C6, i.e. 82 Hz to 1047 Hz
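The conversion and its inverse can be written directly from the formula. The function names are illustrative; the constant 69 places A440 at semitone 69, which matches MIDI note numbering.

```python
import math

def freq_to_semitone(freq_hz):
    """Semitone value of a frequency in Hz, with A440 mapped to 69."""
    return 12.0 * math.log2(freq_hz / 440.0) + 69.0

def semitone_to_freq(semitone):
    """Inverse mapping: frequency in Hz of a semitone value."""
    return 440.0 * 2.0 ** ((semitone - 69.0) / 12.0)

a4 = freq_to_semitone(440.0)       # 69.0
low = freq_to_semitone(82.41)      # close to 40 (E2)
high = freq_to_semitone(1046.5)    # close to 84 (C6)
```

On this scale, doubling the frequency always adds 12 semitones, so the reasonable range above spans a little under four octaves.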

Unreliable Pitch Removal
- Pitch removal via volume thresholding
[Figure: three panels versus time (sec) for 小毛驢.wav: waveform, volume, and pitch]

Unreliable Pitch Removal
- Pitch removal via volume/clarity thresholding
[Figure: four panels versus time (sec) for 小毛驢.wav: waveform, volume, clarity, and pitch]
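Thresholding can be sketched as below. Several details are assumptions made for illustration: volume is taken as the sum of absolute sample values after removing the frame mean, clarity values are supplied by the caller (the slides do not define them here), and a removed pitch is marked with 0.

```python
def frame_volume(frame):
    """Volume of a frame: sum of absolute values after mean removal (assumed)."""
    mean = sum(frame) / len(frame)
    return sum(abs(x - mean) for x in frame)

def remove_unreliable_pitch(pitch, volume, volume_th,
                            clarity=None, clarity_th=None):
    """Set pitch to 0 in frames whose volume or clarity is below threshold."""
    cleaned = []
    for i, p in enumerate(pitch):
        too_quiet = volume[i] < volume_th
        too_unclear = (clarity is not None and clarity_th is not None
                       and clarity[i] < clarity_th)
        cleaned.append(0 if (too_quiet or too_unclear) else p)
    return cleaned

pitch = [60, 61, 62, 63]
volume = [100, 5, 200, 300]
clarity = [0.9, 0.9, 0.2, 0.8]
by_volume = remove_unreliable_pitch(pitch, volume, 50)
by_both = remove_unreliable_pitch(pitch, volume, 50, clarity, 0.5)
```

Suitable thresholds depend on the recording level and on how clarity is computed, so the values 50 and 0.5 are placeholders only.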

Rest Handling
[Figure: pitch vectors with rests and without rests]

Rest Handling
- Original PV: original pitch vector with rests.
- useRest=1: rests are replaced by the previous nonzero pitch. Good for LS (linear scaling).
- useRest=0: rests are removed. Good for DTW (dynamic time warping).
[Figure: three pitch-vector plots versus frame index: the original, useRest=1, and useRest=0]
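Both options can be sketched in a few lines, assuming a rest is stored as a pitch of 0. The function names are illustrative, and leaving leading rests at 0 (since they have no previous pitch) is an assumption.

```python
def fill_rests(pitch):
    """useRest=1 style: replace each rest (0) by the previous nonzero pitch.

    Leading rests have no previous pitch and are left as 0.
    """
    out = []
    last = 0
    for p in pitch:
        if p != 0:
            last = p
        out.append(last)
    return out

def remove_rests(pitch):
    """useRest=0 style: drop all rests, which shortens the pitch vector."""
    return [p for p in pitch if p != 0]

pv = [0, 60, 0, 0, 62, 0]
filled = fill_rests(pv)
compact = remove_rests(pv)
```

Filling keeps the original length and timing, while removal shortens the vector, which is tolerable when the matching method can absorb local timing changes.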

Typical Result of Pitch Tracking
- Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower)

Comparison of Pitch Vectors

- Yellow line: target pitch vector

Demo of Pitch Tracking
- Real-time display of ACF for pitch tracking
  - toolbox/sap/goPtByAcf.mdl
- Real-time pitch tracking of microphone input
  - toolbox/sap/goPtByAcf2.mdl
- Pitch scaling
  - pitchShiftDemo/project1.exe
  - pitchShift-multirate/multirate.m
- Intonation assessment
  - ap170/matlab/goDemo.m