Automatic transcription of polyphonic
piano music using a note masking
technique
Mr Ronan Kelly and Dr Jacqueline Walker
Department of Electronic & Computer Engineering
University of Limerick
[email protected], [email protected]
Overview
• Music transcription
• Our approach
• Onset detection
• Algorithm
• Results
• Conclusions
Music Transcription
• Complex cognitive task
Example: Top of the Pops!
• A challenging task for a computer, but
one which pushes the boundaries of signal
processing, pattern recognition,
machine learning, …
Monophonic Music Transcription
• A solved problem
– Sliding-window analysis of the melody line
– Steps:
– Decimation (to reduce the data)
– Onset detection
– FFT or constant-Q transform
– Note detection
Polyphonic Music Transcription
• Multiple simultaneous notes
• In Western Tonal Music (WTM), notes
played together almost inevitably share
harmonics
• Impact of rhythms, held notes
• Possibility of multiple instruments
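The shared-harmonics point can be illustrated with a small sketch. It assumes ideal integer-multiple harmonics and equal-tempered pitches (A4 = 440 Hz); the function names and the 3 Hz tolerance are hypothetical choices for illustration, not values from the slides.

```python
def harmonics(f0, count):
    """Return the first `count` ideal harmonic frequencies of f0 (Hz)."""
    return [f0 * k for k in range(1, count + 1)]


def shared_harmonics(f0_a, f0_b, count=8, tolerance_hz=3.0):
    """List (k, j, fa, fb) where harmonic k of note A lies within
    tolerance_hz of harmonic j of note B."""
    pairs = []
    for k, fa in enumerate(harmonics(f0_a, count), start=1):
        for j, fb in enumerate(harmonics(f0_b, count), start=1):
            if abs(fa - fb) <= tolerance_hz:
                pairs.append((k, j, fa, fb))
    return pairs


# Equal-tempered fundamentals in Hz (assumed tuning, A4 = 440 Hz).
C4, G4 = 261.63, 392.00
overlaps = shared_harmonics(C4, G4)
for k, j, fa, fb in overlaps:
    print(f"C4 harmonic {k} ({fa:.1f} Hz) ~ G4 harmonic {j} ({fb:.1f} Hz)")
```

For C4 and G4 (a perfect fifth), the third harmonic of C4 lands within about 1 Hz of the second harmonic of G4, so a single spectral peak near 784 Hz carries energy from both notes.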
Approaches to Polyphonic Transcription
• Human audition based
– Martin Cooke’s “Modelling Auditory Processing and
Organisation”, 1993
– Brown & Cooke, “Computational Auditory Scene Analysis”,
1994
• Signal processing based
– Tanguiane “Artificial Perception and Music
Recognition”, 1993
– Klapuri et al, since 1998
Our Approach
• Onset Detection
• Note Window & FFT
• Masking Scheme Iteration
Onset Detection
• NAE (Note Average Energy) onset detection [1]

NAE(t) = \frac{1}{t - t_n} \int_{t_n}^{t} p(\tau)\, d\tau, \quad t_n < t < t_{n+1}

where p(t) is the power envelope.
In practice, we search for local minima…

Figure 3: Power envelope p(t) (a), energy e(t) (b), average energy a(t) (c) and note average energy NAE(t) (d).

[1] Liu, R., Griffith, J., Walker, J. & Murphy, P., "Time Domain Note Average Energy Based Music Onset Detection", Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, August 6-9, 2003.
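As a rough discrete-time illustration of the NAE definition, the sketch below computes a running mean of the power envelope that restarts at each onset. It assumes the onset sample indices are already known and includes the onset sample itself in the average; both are simplifications made here, and the function name is hypothetical. It does not implement the local-minima search mentioned on the slide.

```python
def note_average_energy(power, onsets):
    """Running mean of `power` since the most recent onset index.

    power  : list of power-envelope samples p[i]
    onsets : iterable of sample indices where a note starts (assumed known)
    """
    onset_set = set(onsets)
    nae = []
    total, count = 0.0, 0
    for i, p in enumerate(power):
        if i in onset_set:
            # A new note begins: restart the average.
            total, count = 0.0, 0
        total += p
        count += 1
        nae.append(total / count)
    return nae


nae = note_average_energy([4.0, 2.0, 0.0, 6.0, 2.0], onsets=[0, 3])
print(nae)
```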
Note Window
• FFT performed on the whole note
• Avoids start-of-note and end-of-note effects
• Gives greater robustness against noise
Algorithm for Masking Scheme - 1
• FFT on note window
• Find max peak in window
• Remove peak from window; add to list
• Continue until no peaks are above threshold
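The peak-collection loop can be sketched as follows. This is a minimal illustration that assumes the FFT magnitudes have already been reduced to one amplitude per candidate frequency; a real spectrum spreads each peak over several bins, which this sketch ignores. The function name, the 700 Hz entry and the threshold of 5.0 are hypothetical.

```python
def pick_peaks(spectrum, threshold):
    """Repeatedly remove the largest peak and add it to a list,
    stopping when no remaining peak exceeds `threshold`.

    spectrum : dict mapping frequency (Hz) -> amplitude
    """
    remaining = dict(spectrum)  # work on a copy
    peaks = []
    while remaining:
        freq = max(remaining, key=remaining.get)
        if remaining[freq] <= threshold:
            break
        peaks.append((freq, remaining.pop(freq)))
    return peaks


# Amplitudes from the C4, E4, G4 example, plus one made-up weak bin.
spectrum = {262: 11.2, 330: 21.4, 392: 29.9, 523: 7.1, 700: 1.0}
peaks = pick_peaks(spectrum, threshold=5.0)
print(peaks)
```

The peaks come out in order of decreasing amplitude, so the list would need sorting by frequency before the masking stage, which starts from the lowest frequency.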
Algorithm for Masking Scheme - 2
• Apply mask to first (lowest) frequency in list
• Adjust amplitudes of all affected frequencies by mask
• Add frequency to note list; move to next frequency
• Continue until list is empty
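A minimal sketch of the masking iteration is given below. The slides do not state how amplitudes are adjusted, so this version assumes the mask fraction times the fundamental's amplitude is subtracted from any peak near the corresponding harmonic, and that peaks falling below a threshold are dropped. The function name, tolerance, threshold and subtraction rule are all assumptions made for illustration.

```python
def transcribe_window(peaks, masks, threshold=1.0, tolerance_hz=3.0):
    """Return the list of note fundamentals found in one note window.

    peaks : dict frequency (Hz) -> amplitude (output of peak picking)
    masks : dict fundamental (Hz) -> {harmonic number: fraction of fundamental}
    """
    remaining = dict(peaks)
    notes = []
    while remaining:
        f0 = min(remaining)        # lowest frequency still in the list
        a0 = remaining.pop(f0)     # the fundamental is consumed entirely
        notes.append(f0)
        for harmonic, fraction in masks.get(f0, {}).items():
            target = harmonic * f0
            for freq in list(remaining):
                if abs(freq - target) <= tolerance_hz:
                    # Assumed rule: subtract the masked share of energy.
                    remaining[freq] -= fraction * a0
                    if remaining[freq] < threshold:
                        del remaining[freq]
    return notes


# C4 mask fractions taken from the note-mask slide (72% and 41%).
masks = {262: {2: 0.72, 3: 0.41}}
peaks = {262: 11.2, 330: 21.4, 392: 29.9, 523: 7.1}
notes = transcribe_window(peaks, masks)
print(notes)
```

With these inputs the 523 Hz peak is attributed to C4 and removed, leaving C4, E4 and G4 as detected notes. The simple subtraction used here does not reproduce the residual amplitude shown on the slides, so the authors' adjustment rule is evidently different.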
Masking Scheme - 1
C4, E4, G4: 262 Hz, 330 Hz, 392 Hz
Max. peak amplitude = 29.9 @ 392 Hz (G4)
Next peak amplitude = 21.4 @ 330 Hz
Masking Scheme - 2
Detected frequency peaks

Frequency (Hz) | Amplitude
262 | 11.2
330 | 21.4
392 | 29.9
523 | 7.1

[Figure: detected frequency peaks; Amplitude vs. Frequency (Hz)]

Note mask

Frequency (Hz) | Amplitude
260, 261, 262 | 100%
523, 524 | 72%
784, 785 | 41%

[Figure: note mask; Amplitude vs. Frequency (Hz)]
Masking Scheme - 3
Masking action
Note played: C4

[Figure: C4 mask shown against the detected values; Amplitude vs. Frequency (Hz)]

After masking: remaining detected values

Frequency (Hz) | Amplitude
330 | 21.4
392 | 29.9
523 | 3.1

[Figure: remaining detected values; Amplitude vs. Frequency (Hz)]
Building a Note Mask - 1
A note is played with other notes and the significant
frequency peaks and amplitudes recorded:
[Figure: D4 played with another note; harmonics in common in blue, harmonics of D4 in red]
Building a Note Mask - 2
[Figure: two bar charts of Amplitude vs. Frequency (Hz). "D4 and C4" shows D4 values and C4 values; "D4 and A4" shows D4 values, A4 values and D4 + A4 values.]
Building a Note Mask - 3
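Extract values unique to D4 and normalise to amplitude of highest peak:

Frequency (Hz) | D4, C4 | D4, E4 | D4, F4 | D4, G4 | D4, A4 | D4, B4
294 | 1 | 1 | 1 | 1 | 1 | 1
587 | 0.70 | 0.67 | 0.76 | 0.75 | 0.84 | 0.65

Rows with empty cells (values in source order; column positions not preserved):
881: 0.38, 0.37, 0.44, 0.44
1175: 0.11
1468: 0.17, 0.16, 0.15
1762: 0.12, 0.11, 0.12
2056 and values displaced from the rows above: 0.27, 0.25, 0.40, 0.12, 0.28, 0.28, 0.17, 0.14, 0.30, 0.18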
Building a Note Mask - 3
Average across samples:

Frequency (Hz) | Amplitude
294 | 100%
587 | 72.69%
881 | 40.63%
1175 | 11.49%
1468 | 15.93%
1762 | 11.61%
2056 | 26.03%

[Figure: D4 mask; Amplitude (% of fundamental frequency) vs. Frequency (Hz)]
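The mask-building step (normalise each sample to its highest peak, then average across samples) can be sketched as below. It assumes each harmonic is averaged only over the samples in which it was recorded; the function name and data layout are hypothetical. The example uses the 294 Hz and 587 Hz rows of the D4 table.

```python
def build_mask(samples):
    """Average normalised harmonic amplitudes across samples.

    samples : list of dicts, each mapping frequency (Hz) -> amplitude
              for the peaks unique to the target note in one recording
    Returns a dict frequency -> mean fraction of the highest peak.
    """
    sums, counts = {}, {}
    for sample in samples:
        highest = max(sample.values())
        for freq, amp in sample.items():
            sums[freq] = sums.get(freq, 0.0) + amp / highest
            counts[freq] = counts.get(freq, 0) + 1
    # Assumption: average over the samples where the harmonic appears.
    return {freq: sums[freq] / counts[freq] for freq in sorted(sums)}


# Second-harmonic values for D4 paired with C4, E4, F4, G4, A4, B4.
second = [0.70, 0.67, 0.76, 0.75, 0.84, 0.65]
samples = [{294: 1.0, 587: value} for value in second]
mask = build_mask(samples)
print({freq: round(100 * frac, 1) for freq, frac in mask.items()})
```

The rounded table values give about 72.8% for 587 Hz, close to the 72.69% reported, which was presumably computed from unrounded amplitudes.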
Experimental Set-up
• Keyboard used: Technics KN800 PCM Keyboard
• Note range: C2 to B6
• Recording: direct, using line-in
• Isolated chords and polyphonic music samples
Results
How to define error?
Need to account for both missed notes (m) and spurious notes (x):

\%E = \left( \frac{m + x}{n} \right) \times 100\%

where n is the number of notes detected, not the number of notes played.
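The error measure is simple enough to check by hand; the short sketch below does so for a few rows of the results tables. The function name is hypothetical.

```python
def total_error_percent(missed, spurious, detected):
    """Total error: (missed + spurious) / detected, as a percentage."""
    if detected <= 0:
        raise ValueError("number of detected notes must be positive")
    return 100.0 * (missed + spurious) / detected


# Chords with 5-8 notes: 18 missed, 0 spurious, 225 detected.
print(round(total_error_percent(18, 0, 225), 1))
# Chords with 3-4 notes: 15 missed, 5 spurious, 638 detected.
print(round(total_error_percent(15, 5, 638), 1))
# Danny Boy (slow): 7 missed, 14 spurious, 94 detected.
print(round(total_error_percent(7, 14, 94)))
```

Because the denominator is the number of notes detected, a transcription with many spurious notes is penalised less per spurious note than it would be with the number of notes played as the denominator.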
Results – Isolated Chords
Chords | Notes played | Notes detected | Missed notes | Spurious notes | Total error (%)
Chords, 5-8 notes | 243 | 225 | 18 | 0 | 8.0
Chords, 3-4 notes | 648 | 638 | 15 | 5 | 3.1
Chords | 1906 | 1898 | 69 | 77 | 7.7
Results – Polyphonic Music
Piece | Notes played | Notes detected | Missed notes | Spurious notes | Total error (%)
Danny Boy (slow) | 87 | 94 | 7 | 14 | 22
Danny Boy (moderate) | 91 | 98 | 8 | 15 | 23.5
Danny Boy (fast) | 90 | 99 | 8 | 17 | 25
Effect of Onset Detection
• Effective onset detection is crucial
• Two types of errors:
– Extra onset: less likely to cause a problem, but the note is divided up too finely
– Missing onset: note windows are not placed ‘correctly’
Results with Onset Detection
Piece | Notes played | Notes detected | Missed notes | Spurious notes | Total error (%)
Danny Boy (slow) | 87 | 120 | 10 | 43 | 44
Danny Boy (moderate) | 91 | 120 | 17 | 28 | 44
Danny Boy (fast) | 90 | 120 | 23 | 37 | 58
Future Work
• Develop model for note combinations (polyphonic note masks)
• Use wider range of note combinations
• Develop an efficient approach to applying polyphonic note masks
• Improve note onset detection