
TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Electrical Engineering Department
Software Systems Lab
Multi-Threading LAME MP3 Encoder
Performed by: Gilad Raichshtain
Copyright, 2004 © Gilad Raichshtain.
Talk Layout

• What is the L.A.M.E. Project?
• Project Goal
• MP3 Encoding & Hyper-Threading Overview
• Multi-Threading Strategies
• Results & Remarks
• Future Work
What is the L.A.M.E. Project?

• An open-source project
• An educational tool for learning about MP3 encoding
• Its goal is to improve
  – Psychoacoustic quality
  – The speed of MP3 encoding
• LAME is the most popular state-of-the-art MP3 encoder/decoder, used by today’s leading products.
FOR MORE INFO...
http://lame.sourceforge.net
Project Goal

• Speed up the encoding of an audio stream
• Turn LAME into a multi-threaded (MT) engine
• Stay 1:1 bit-compatible with the original version
• Optimize specifically for SMT platforms (implemented on Intel’s P4 with Hyper-Threading Technology)
Thread Level Parallelism

• Hyper-Threading provides thread-level parallelism on each processor
• Resulting in
  – Increased use of processor execution resources
  – Higher processing throughput
• Achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources
MP3 Encoding Overview
Break up the audio stream into frames (uniform chunks, typically ~1K):

Audio Stream: | Frame 1 | Frame 2 | Frame 3 | Frame 4 |

Each frame passes through the following stages:

Stage               Specifically in LAME
Read Frame
Perceptual Model    PsychoAcoustic Analysis
Filterbank          MDCT
Quantization
Bitstream Encode    Huffman Encoding
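The framing step can be sketched in C as follows. This is a minimal illustration, not LAME code: it assumes 16-bit PCM input and a frame length of 1152 samples (the MPEG-1 Layer III frame size; the slide only says "~1K"), and the names `FRAME_SAMPLES`, `frame_count` and `frame_view` are hypothetical.

```c
#include <stddef.h>

/* Assumption: 1152 samples per frame (MPEG-1 Layer III); the slide says "~1K". */
#define FRAME_SAMPLES 1152

/* Number of frames needed to cover n_samples; the last frame may be partial. */
size_t frame_count(size_t n_samples)
{
    return (n_samples + FRAME_SAMPLES - 1) / FRAME_SAMPLES;
}

/* Returns a pointer to the first sample of frame `idx` and stores the
 * frame's length in *len. Returns NULL (and *len = 0) if idx is out of range. */
const short *frame_view(const short *pcm, size_t n_samples,
                        size_t idx, size_t *len)
{
    size_t start = idx * FRAME_SAMPLES;
    if (start >= n_samples) {
        *len = 0;
        return NULL;
    }
    size_t remaining = n_samples - start;
    *len = remaining < FRAME_SAMPLES ? remaining : FRAME_SAMPLES;
    return pcm + start;
}
```

A caller would loop `idx` from 0 to `frame_count(n) - 1` and hand each view to the encoding stages; a real encoder would also pad the final partial frame.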
LAME MT – Intuitive approach
The intuitive approach:
[Diagram: Frames 1-6 divided between Thread 1 and Thread 2]

• This is actually Data Decomposition
• An unbreakable dependence due to Huffman Encoding
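One way to picture this kind of dependence: with variable-length (Huffman) coding, the position at which a frame's bits start in the output depends on how many bits every earlier frame produced. The sketch below is illustrative only; the slide does not spell out the exact mechanism in LAME, and `frame_bit_offsets` is a hypothetical helper.

```c
#include <stddef.h>

/* offsets[k] = bit position at which frame k starts in the output stream.
 * Each entry needs the coded size of every earlier frame, which forms a
 * serial chain across frames. */
void frame_bit_offsets(const unsigned *coded_bits, size_t n_frames,
                       unsigned long *offsets)
{
    unsigned long pos = 0;
    for (size_t k = 0; k < n_frames; k++) {
        offsets[k] = pos;       /* known only after frames 0..k-1 are coded */
        pos += coded_bits[k];
    }
}
```

Under this picture, a thread handed frame k cannot place its output until frames 0..k-1 have been fully coded, so assigning whole frames to different threads does not remove the serial step.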
LAME MT – Functional Decomposition
Frame 1 | Frame 2 | Frame 3 | Frame 4 | Frame 5 | Frame 6

T1 (Floating Point Intensive): Read Frame → PsychoAcoustic Analysis → Filterbank / MDCT → Quantization
T2 (Integer Intensive): Huffman Encoding
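The two-thread split above can be sketched as a producer/consumer pipeline. This is a minimal sketch using POSIX threads, not the project's actual code: the queue depth, the `frame_queue` type, `encode_pipelined`, and the arithmetic standing in for the real stages are all assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 4  /* hypothetical depth of the hand-off queue */

typedef struct {
    int items[QUEUE_CAP];
    size_t head, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} frame_queue;

static void queue_init(frame_queue *q) {
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_push(frame_queue *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void queue_close(frame_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 and stores an item, or 0 once the queue is closed and drained. */
static int queue_pop(frame_queue *q, int *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

typedef struct { frame_queue *q; const int *in; size_t n; } stage1_args;
typedef struct { frame_queue *q; int *out; size_t n_out; } stage2_args;

/* T1 stand-in: the floating-point stages (analysis, MDCT, quantization). */
static void *stage1(void *p) {
    stage1_args *a = p;
    for (size_t i = 0; i < a->n; i++)
        queue_push(a->q, (int)(a->in[i] * 0.5)); /* placeholder arithmetic */
    queue_close(a->q);
    return NULL;
}

/* T2 stand-in: the integer stage (Huffman coding), consuming in frame order. */
static void *stage2(void *p) {
    stage2_args *a = p;
    int v;
    while (queue_pop(a->q, &v))
        a->out[a->n_out++] = v + 1;              /* placeholder arithmetic */
    return NULL;
}

/* Runs both stages concurrently; returns the number of frames written. */
size_t encode_pipelined(const int *in, size_t n, int *out) {
    frame_queue q;
    stage1_args a1 = { &q, in, n };
    stage2_args a2 = { &q, out, 0 };
    pthread_t t1, t2;
    queue_init(&q);
    pthread_create(&t1, NULL, stage1, &a1);
    pthread_create(&t2, NULL, stage2, &a2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return a2.n_out;
}
```

Because a single producer feeds a single consumer through a FIFO queue, frames reach the second stage in their original order, which is the property a bit-compatible encoder would need from such a split.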
Results
Results due to Multi-Threading

                              SMT Platform    SMP Platform
                              (CBR / VBR)     (CBR / VBR)
Using Microsoft’s Compiler    22% / 32%       38% / 62%
Using Intel’s Compiler 8.1    20% / 29%       44% / 59%
Results using Intel’s Compiler 8.1

                              SMT Platform    SMP Platform
                              (CBR / VBR)     (CBR / VBR)
LAME Original Code 3.97a      21% / 19%       22% / 17%
LAME MT Code                  19% / 17%       28% / 15%
Overall Performance Results

                                       SMT Platform    SMP Platform
                                       (CBR / VBR)     (CBR / VBR)
LAME MT code + Intel’s Compiler 8.1    52% / 70%       78% / 109%
Remarks

• Architectural issues
  – Pitfall found in version 3.93:
    • Memory access to two different pages with the same offset
    • ~11% speedup achieved by fixing it
    • No longer relevant in later versions
  – No major architectural issues found in versions 3.94-3.97a
• Implemented a PNI (Prescott New Instructions, SSE3) version of the FFT
  – No significant gain achieved
• Overall, ~40 blocks of code were changed and are guarded by #ifdef
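Guarding each changed block with a preprocessor symbol keeps the original single-threaded build intact when the symbol is not defined. A minimal sketch of the pattern; the macro name `LAME_MT` and the function `encoder_thread_count` are hypothetical, since the slide does not name the symbol that was used.

```c
/* Hypothetical symbol: define it at build time (e.g. -DLAME_MT) to select
 * the multi-threaded code path; leave it undefined for the original path. */
#ifdef LAME_MT
#define ENCODER_THREADS 2
#else
#define ENCODER_THREADS 1
#endif

/* Reports how many encoder threads this build was compiled for. */
int encoder_thread_count(void)
{
    return ENCODER_THREADS;
}
```

With this pattern, both variants live in one source tree, and comparing their output for the same input is a direct way to check bit compatibility.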
Future Work

• Splitting the encoding process into more than two steps
• Reading frames in parallel