TECHNION - ISRAEL INSTITUTE OF TECHNOLOGY
Electrical Engineering Department
Software Systems Lab

Multi-Threading LAME MP3 Encoder
Performed by: Gilad Riachshtian
Copyright, 2004 © Gilad Raichshtain.

Talk Layout
  What is the L.A.M.E. Project?
  Project Goal
  MP3 Encoding & Hyper-Threading Overview
  Multi-Threading strategies
  Results & Remarks
  Future Work

What is the L.A.M.E. Project?
  An Open Source project
  An educational tool used for learning about MP3 encoding
  Its goal is to improve:
    – Psychoacoustic quality
    – The speed of MP3 encoding
  LAME is the most popular state-of-the-art MP3 encoder/decoder, used by today’s leading products.
  FOR MORE INFO... http://lame.sourceforge.net

Project Goal
  Speed up the encoding of an audio stream
  Turn LAME into a Multi-Threaded (MT) engine
  Be 1:1 bit compatible with the original version
  Optimize specifically for SMT platforms (implementation on Intel’s P4 with Hyper-Threading Technology)

Thread Level Parallelism
  Hyper-Threading provides thread level parallelism on each processor
  Resulting in:
    – Increased use of processor execution resources
    – Higher processing throughput
  Achieved by duplicating the architectural state on each processor, while sharing one set of processor execution resources

MP3 Encoding Overview
  Break up the audio stream into frames (uniform chunks, typically ~1K)
  [Diagram] Audio Stream: Frame 1 | Frame 2 | Frame 3 | Frame 4
  [Diagram] Generic pipeline: Read Frame -> Perceptual Model -> Bitstream Encode
  [Diagram] Specifically in LAME: Read Frame -> PsychoAcoustic Analysis -> Filterbank MDCT -> Quantization -> Huffman Encoding

LAME MT – Intuitive approach
  The intuitive approach: divide the frames between the threads
  [Diagram] Frame 1 | Frame 2 | Frame 3 | Frame 4 | Frame 5 | Frame 6, split between Thread 1 and Thread 2
  This is actually Data Decomposition
  An unbreakable dependence between frames due to Huffman Encoding

LAME MT – Functional Decomposition
  [Diagram] Frame 1 | Frame 2 | Frame 3 | Frame 4 | Frame 5 | Frame 6
  T1 (Floating Point Intensive): Read Frame, PsychoAcoustic Analysis, Filterbank MDCT, Quantization
  T2 (Integer Intensive): Huffman Encoding

Results

Results due to Multi-Threading
                                   SMT Platform    SMP Platform
                                   CBR / VBR       CBR / VBR
  Using Microsoft’s Compiler       22% / 32%       38% / 62%
  Using Intel’s Compiler 8.1       20% / 29%       44% / 59%

Results using Intel’s Compiler 8.1
                                   SMT Platform    SMP Platform
                                   CBR / VBR       CBR / VBR
  LAME Original Code 3.97a         21% / 19%       22% / 17%
  LAME MT Code                     19% / 17%       28% / 15%

Overall Performance Results
                                          SMT Platform    SMP Platform
                                          CBR / VBR       CBR / VBR
  LAME MT code + Intel’s Compiler 8.1     52% / 70%       78% / 109%

Remarks
  Architectural issues
    – Pitfall found in version 3.93:
        • Memory accesses to two different pages with the same offset
        • ~11% speedup achieved by fixing it
        • No longer relevant in later versions
    – No major architectural issues found in versions 3.94-3.97a
  Implemented a PNI version of the FFT
    – No significant gain achieved
  Overall, ~40 blocks of code were changed, all guarded by #ifdef

Future Work
  Splitting the encoding process into more than two steps
  Reading frames in parallel
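
Sketch: functional decomposition with a two-stage pipeline

The functional decomposition described in the LAME MT slides (T1 runs the floating-point stages, T2 runs Huffman encoding) can be illustrated with a minimal two-thread pipeline. This is only a sketch under stated assumptions, not LAME's code: `analyze_and_quantize`, `BitstreamEncoder`, and the toy arithmetic inside them are hypothetical stand-ins for the real stages. The second stage deliberately carries state from frame to frame to mimic the inter-frame dependence, and a FIFO queue keeps the frames in order so the pipelined output can match the sequential output bit for bit. Python threads will not reproduce the speedups reported in the Results slides; the example shows the structure only.

```python
import queue
import threading


def analyze_and_quantize(frame):
    """Stage 1 stand-in (hypothetical): in LAME this would be the
    floating-point work (psychoacoustics, filterbank/MDCT, quantization)."""
    return [int(round(sample * 0.5)) for sample in frame]


class BitstreamEncoder:
    """Stage 2 stand-in (hypothetical) for Huffman encoding.

    It keeps a running state across frames, so frames must be
    encoded strictly in their original order.
    """

    def __init__(self):
        self.carry = 0

    def encode(self, quantized):
        out = []
        for value in quantized:
            # Toy state update: each output byte depends on all earlier input.
            self.carry = (self.carry * 31 + value) & 0xFF
            out.append(self.carry)
        return bytes(out)


def encode_sequential(frames):
    """Reference single-threaded encoder."""
    encoder = BitstreamEncoder()
    return b"".join(encoder.encode(analyze_and_quantize(f)) for f in frames)


def encode_pipelined(frames, depth=4):
    """Two-thread functional decomposition: T1 produces, T2 consumes."""
    handoff = queue.Queue(maxsize=depth)  # bounded FIFO between the stages
    chunks = []

    def stage1():
        for frame in frames:
            handoff.put(analyze_and_quantize(frame))
        handoff.put(None)  # sentinel: no more frames

    def stage2():
        encoder = BitstreamEncoder()
        while True:
            item = handoff.get()
            if item is None:
                break
            chunks.append(encoder.encode(item))

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return b"".join(chunks)


if __name__ == "__main__":
    demo_frames = [[(i * 7 + j) % 50 - 25 for j in range(8)] for i in range(6)]
    same = encode_pipelined(demo_frames) == encode_sequential(demo_frames)
    print("bit-compatible with sequential encoder:", same)
```

Because only one thread ever touches the stateful stage, the dependence between frames is preserved while the two stages still overlap in time. Splitting the frames themselves between threads would instead require each thread to know the other's encoder state in advance.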