TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Electrical Engineering Department
Software Systems Lab
Multi-Threading LAME MP3 Encoder
Performed by: Gilad Raichshtain
Copyright © 2004 Gilad Raichshtain.
Talk Layout
What is the L.A.M.E. Project?
Project Goal
MP3 Encoding & Hyper-Threading Overview
Multi-Threading Strategies
Results & Remarks
Future Work
What is the L.A.M.E. Project?
An open-source project
An educational tool for learning about MP3 encoding
Its goals are to improve
– Psychoacoustic quality
– The speed of MP3 encoding
LAME is the most popular state-of-the-art MP3 encoder/decoder, used by today’s leading products.
FOR MORE INFO...
http://lame.sourceforge.net
Project Goal
Speeding up the encoding of an audio stream
Turning LAME into a Multi-Threaded (MT) engine
Staying 1:1 bit-compatible with the original version (identical output)
Optimizing specifically for SMT (Simultaneous Multi-Threading) platforms
(implemented on Intel’s Pentium 4 with Hyper-Threading Technology)
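The 1:1 bit-compatibility goal suggests a simple regression check: encode the same input with the original build and with the MT build, then compare the two outputs byte for byte. The Python sketch below shows one way to do that; the helper names `first_difference` and `files_bit_exact` are hypothetical and not part of LAME.

```python
import filecmp


def first_difference(reference: bytes, candidate: bytes):
    """Return the offset of the first differing byte, or None if identical.

    Hypothetical helper: reporting *where* two encodes diverge is more useful
    for debugging than a plain yes/no answer.
    """
    for offset, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return offset
    if len(reference) != len(candidate):
        # One stream is a strict prefix of the other.
        return min(len(reference), len(candidate))
    return None


def files_bit_exact(reference_path: str, candidate_path: str) -> bool:
    """True if two encoded files have identical contents.

    shallow=False makes filecmp compare the actual bytes instead of
    trusting os.stat() signatures (size and modification time).
    """
    return filecmp.cmp(reference_path, candidate_path, shallow=False)
```

In practice the reference file would come from the original single-threaded build and the candidate from the MT build, both encoding the same WAV input with the same options.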
Thread-Level Parallelism
Hyper-Threading provides thread-level parallelism on each processor
Resulting in
– Increased use of processor execution resources
– Higher processing throughput
Achieved by duplicating the architectural state on each processor while sharing one set of processor execution resources
MP3 Encoding Overview
Break the audio stream into frames (uniform chunks of ~1K samples; an MPEG-1 Layer III frame holds 1152 samples per channel)
[Figure: the audio stream divided into Frame 1, Frame 2, Frame 3, Frame 4]
[Figure: encoding flow: Read Frame → PsychoAcoustic Analysis (Perceptual Model) → Filterbank → MDCT → Quantization → Huffman Encoding → Bitstream Encode; part of the flow is labeled "Specifically in LAME"]
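The single-threaded flow above (split the stream into frames, then push each frame through the stages in order) can be sketched in Python as follows. The 1152-sample frame size is the MPEG-1 Layer III value; the `stages` argument is a hypothetical stand-in for LAME's C routines (psychoacoustic analysis, filterbank, MDCT, quantization, Huffman encoding), which are not reproduced here.

```python
FRAME_SIZE = 1152  # samples per MPEG-1 Layer III frame (per channel)


def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Split a PCM sample sequence into uniform frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))  # pad a short final frame
        frames.append(frame)
    return frames


def encode_stream(samples, stages):
    """Run every frame through all stages, one frame at a time.

    `stages` is a list of callables (hypothetical placeholders for the real
    encoder stages); each receives the previous stage's output.
    """
    encoded = []
    for frame in split_into_frames(samples):
        data = frame
        for stage in stages:
            data = stage(data)
        encoded.append(data)
    return encoded


if __name__ == "__main__":
    # Toy stages: "scale" the samples, then "encode" the frame as its sum.
    toy_stages = [lambda f: [2 * s for s in f], sum]
    print(encode_stream([1] * 1153, toy_stages))
```

Because every frame passes through every stage before the next frame starts, this is the baseline that both multi-threading strategies below try to beat.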
LAME MT – Intuitive Approach
The intuitive approach: each thread encodes whole frames
[Figure: Frame 1 to Frame 6 divided between Thread 1 and Thread 2]
This is actually Data Decomposition
An unbreakable dependence between frames, due to Huffman Encoding, rules it out
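Why the dependence rules out data decomposition can be shown with a toy model. Below, `encode_frame` is a hypothetical stand-in whose output depends on the frame *and* on a `carry` value left by the previous frame; it models the inter-frame dependence abstractly and is not LAME's Huffman or bitstream logic. Giving each thread every second frame means each thread sees the wrong carry, so the output no longer matches the sequential encode, which would break the 1:1 bit-compatibility goal.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_frame(frame, carry):
    """Toy encoder: output depends on the frame and on the previous frame's carry."""
    bits = sum(frame) + carry
    return bits, bits % 7  # (encoded value, carry handed to the next frame)


def encode_sequential(frames):
    """Reference: frames encoded strictly in order, carry passed along."""
    carry, out = 0, []
    for frame in frames:
        bits, carry = encode_frame(frame, carry)
        out.append(bits)
    return out


def encode_data_decomposed(frames, n_threads=2):
    """Naive data decomposition: thread k encodes frames k, k+n, k+2n, ...

    Each thread only sees the carry from *its own* previous frame, not from
    the frame immediately before, so the carried dependence is broken.
    """
    def worker(k):
        carry, part = 0, {}
        for i in range(k, len(frames), n_threads):
            part[i], carry = encode_frame(frames[i], carry)
        return part

    merged = {}
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for part in pool.map(worker, range(n_threads)):
            merged.update(part)
    return [merged[i] for i in range(len(frames))]


if __name__ == "__main__":
    frames = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(encode_sequential(frames))       # [3, 10, 14, 15]
    print(encode_data_decomposed(frames))  # [3, 7, 14, 15]
```

Making the threads wait for each other's carry would restore correct output, but it would also serialize the work and remove the benefit of splitting frames.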
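LAME MT – Functional Decomposition
[Figure: Frame 1 to Frame 6 flowing through two pipelined threads, T1 and T2]
T1 (floating-point intensive):
– Read Frame
– PsychoAcoustic Analysis
– Filterbank
– MDCT
– Quantization
T2 (integer intensive):
– Huffman Encoding
Because T1 is floating-point intensive and T2 is integer intensive, the two threads tend to compete less for the same execution units when they share one SMT processor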
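The functional decomposition amounts to a two-stage producer/consumer pipeline: T1 runs the floating-point stages on frame n+1 while T2 runs Huffman encoding on frame n. Since T2 alone consumes frames, in order, any state carried from frame to frame stays correct and the output remains identical to the sequential encode. A minimal Python sketch, where `analyze_and_quantize` and `huffman_stage` are hypothetical placeholders for the real stages:

```python
import queue
import threading


def analyze_and_quantize(frame):
    """Placeholder for T1's floating-point stages (psychoacoustics ... quantization)."""
    return [round(x * 0.5) for x in frame]


def huffman_stage(quantized, carry):
    """Placeholder for T2's integer stage; carries state from frame to frame."""
    bits = sum(quantized) + carry
    return bits, bits % 7


def encode_sequential(frames):
    """Reference single-threaded encode."""
    carry, out = 0, []
    for frame in frames:
        bits, carry = huffman_stage(analyze_and_quantize(frame), carry)
        out.append(bits)
    return out


def encode_pipelined(frames, depth=4):
    """Two threads connected by a bounded FIFO queue (T1 -> T2)."""
    q = queue.Queue(maxsize=depth)  # bounds how far T1 may run ahead of T2
    done = object()                 # sentinel marking end of stream
    out = []

    def t1():
        for frame in frames:
            q.put(analyze_and_quantize(frame))
        q.put(done)

    def t2():
        carry = 0
        while True:
            item = q.get()
            if item is done:
                break
            bits, carry = huffman_stage(item, carry)  # strictly in frame order
            out.append(bits)

    threads = [threading.Thread(target=t1), threading.Thread(target=t2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

In CPython the global interpreter lock keeps these threads from running the numeric work truly in parallel, so the sketch illustrates the structure and ordering guarantee, not the speedup measured on the native C encoder.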
Results
Results due to Multi-Threading
                              SMT Platform    SMP Platform
                              CBR / VBR       CBR / VBR
Using Microsoft’s Compiler    22% / 32%       38% / 62%
Using Intel’s Compiler 8.1    20% / 29%       44% / 59%
Results using Intel’s Compiler 8.1
                              SMT Platform    SMP Platform
                              CBR / VBR       CBR / VBR
LAME Original Code 3.97a      21% / 19%       22% / 17%
LAME MT Code                  19% / 17%       28% / 15%
Overall Performance Results
                                      SMT Platform    SMP Platform
                                      CBR / VBR       CBR / VBR
LAME MT Code + Intel’s Compiler 8.1   52% / 70%       78% / 109%
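The slides do not state exactly how the percentages are defined. Assuming they are throughput gains, i.e. speedup % = (T_before / T_after − 1) × 100, the small Python sketch below converts between run times and these figures. Both function names are hypothetical.

```python
def speedup_percent(time_before, time_after):
    """Throughput gain in percent, ASSUMING % = (T_before / T_after - 1) * 100."""
    return (time_before / time_after - 1.0) * 100.0


def remaining_time_fraction(speedup_pct):
    """Fraction of the baseline run time left after a given % speedup (same assumption)."""
    return 1.0 / (1.0 + speedup_pct / 100.0)


if __name__ == "__main__":
    print(speedup_percent(10.0, 5.0))        # twice as fast
    print(remaining_time_fraction(109.0))    # SMP VBR figure from the table
```

Under that assumption, the 109% VBR figure on the SMP platform would mean the encode finished in about 48% of the baseline time (1 / 2.09).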
Remarks
Architectural Issues
– Pitfall found in version 3.93:
• Memory accesses to two different pages at the same offset
• ~11% speedup achieved by fixing it
• No longer relevant in later versions
– No major architectural issues found in versions 3.94–3.97a
Implemented a PNI (Prescott New Instructions, i.e. SSE3) version of the FFT
– No significant gain achieved
Overall, ~40 blocks of code were changed; they are guarded by #ifdef
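The 3.93 pitfall listed under Remarks (accesses to two different pages at the same offset) can be detected with plain address arithmetic. The Python sketch below works on integer addresses, as one would compare pointer values in C. It assumes 4 KB pages, and the remedy shown (shifting one buffer by a 64-byte cache line) is a common technique offered here as an assumption, not a description of the actual LAME change.

```python
PAGE_SIZE = 4096   # assumed x86 page size (4 KB)
CACHE_LINE = 64    # assumed padding step: one Pentium 4 L1 cache line


def same_page_offset(addr_a, addr_b, page_size=PAGE_SIZE):
    """True if the addresses lie in different pages but at the same in-page offset."""
    different_pages = addr_a // page_size != addr_b // page_size
    same_offset = addr_a % page_size == addr_b % page_size
    return different_pages and same_offset


def padded_address(addr_b, addr_a, pad=CACHE_LINE, page_size=PAGE_SIZE):
    """Hypothetical remedy: shift addr_b by `pad` bytes if it aliases addr_a."""
    if same_page_offset(addr_a, addr_b, page_size):
        return addr_b + pad
    return addr_b


if __name__ == "__main__":
    a = 0x10000 + 0x120  # two buffers 64 KB apart, same offset 0x120
    b = 0x20000 + 0x120
    print(same_page_offset(a, b))                     # aliasing detected
    print(same_page_offset(a, padded_address(b, a)))  # resolved by padding
```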
Future Work
Splitting the encoding process into more than two steps
Reading frames in parallel
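The first future-work item generalizes the two-thread split to a chain of N stages, one thread per stage, linked by bounded FIFO queues. Each stage handles frames in arrival order, so stateful stages (such as Huffman encoding) still see frames sequentially and the output order matches the input. The Python sketch below is a hypothetical illustration; how LAME's stages would actually be split is not specified in the talk.

```python
import queue
import threading

_DONE = object()  # sentinel propagated down the chain to shut it down


def run_pipeline(frames, stages, depth=4):
    """Run frames through `stages` (callables), one thread per stage."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    results = []

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is _DONE:
                q_out.put(_DONE)  # tell the next stage to stop too
                return
            q_out.put(stage(item))

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # A separate feeder thread avoids deadlock: with bounded queues, feeding
    # every frame before draining the output could block forever.
    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()

    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    feeder.join()
    for t in threads:
        t.join()
    return results
```

The queue depth bounds memory use and how far an upstream stage can run ahead; on real hardware the achievable speedup is limited by the slowest stage, so the split would need to balance the work across stages.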