A Real-Time Program Trace Compressor Utilizing Double Move-to-Front Method
Presenter: Shao-Jay Hou
This paper introduces a new unobtrusive
and cost-effective method for the capture
and compression of program execution
traces in real-time, which is based on a
double move-to-front transformation. We
explore its effectiveness and describe a
cost-effective hardware implementation.
The proposed trace compressor requires
only 0.12 bits per instruction of trace port
bandwidth, at the cost of 25K gates.
Continual growth in the complexity of SoCs makes
traditional approaches infeasible or impractical.
In-Circuit Emulator (ICE)
。Invasive: the processor has to be stopped
Software approach
。Uses breakpoints
Step-by-step software debugging wastes too much
time.
Other tracing systems exist, but their cost or
area is too high.
C.F. Kao's LZ-based program trace.
My thesis (ideal)
The application
of the tracer.
An Embedded Infrastructure of
Debug and Trace Interface for the
DSP Platform
The improvement
of the tracer.
A Real-Time Program Trace
Compressor Utilizing Double
Move-to-Front Method
This paper
The program trace system for each
embedded system processor
Another program trace system
and compression techniques
ARM ETM[1]
Altera Nios II[2]
Xilinx Microblaze[3]
Lauterbach[4]
For the experimental benchmarks
MiBench[8]
[5-7]
Hardware architecture
The basic MTF method and bzip2 compression
MTF[10]
bzip2[11]
To collect the symbols in the system
SimpleScalar[9]
CAM[12]
Each instruction stream has two characteristics:
SA (starting address)
SL (stream length)
Stream boundaries are detected from the PC (program counter):
If the current PC differs from the previous PC
by anything other than the instruction length, the current instruction
is the beginning of a new stream.
An unconditional direct branch does not
terminate the current stream, because the
address of the next instruction in sequence can
be inferred directly from the binary.
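The stream rule above can be sketched in Python. This is only an illustration, not the paper's hardware. The function name `detect_streams`, the fixed 4-byte instruction length, counting SL in instructions, and the `direct_jumps` map (standing in for what a decoder could read from the binary) are all assumptions.

```python
def detect_streams(pcs, insn_len=4, direct_jumps=None):
    """Split a PC sequence into (SA, SL) stream descriptors.

    direct_jumps is a hypothetical map {branch PC: target PC} for
    unconditional direct branches; such branches do not end a stream.
    SL is counted in instructions (an assumption).
    """
    direct_jumps = direct_jumps or {}
    streams = []
    sa, sl, prev = None, 0, None
    for pc in pcs:
        if prev is None:
            sa, sl = pc, 1                      # first instruction opens a stream
        else:
            sequential = pc == prev + insn_len
            inferable = direct_jumps.get(prev) == pc
            if sequential or inferable:
                sl += 1                         # still in the same stream
            else:
                streams.append((sa, sl))        # close the old stream
                sa, sl = pc, 1                  # current PC starts a new one
        prev = pc
    if sa is not None:
        streams.append((sa, sl))
    return streams
```

For example, the PC sequence 0x100, 0x104, 0x108, 0x200, 0x204, 0x100 splits into three streams, or into two if 0x108 is known to hold an unconditional direct branch to 0x200.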
Some parameters in this method:
ht (history table)
Input
Output
An easy example to explain it:
If the history table is ht = [C, B, A]
and the input is {A, A, B, C},
the output should be 2022 (the indices 2, 0, 2, 2).
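The example above can be reproduced with a short Python sketch of the basic move-to-front transform. The function name `mtf_encode` is chosen here for illustration, and the sketch assumes every input symbol is already in the history table.

```python
def mtf_encode(symbols, ht):
    """Basic move-to-front: emit each symbol's index, then move it to the front."""
    table = list(ht)                 # work on a copy of the history table
    out = []
    for s in symbols:
        i = table.index(s)           # position of the symbol in the table
        out.append(i)
        table.insert(0, table.pop(i))  # move the symbol to the front
    return out


print(mtf_encode("AABC", ["C", "B", "A"]))  # [2, 0, 2, 2]
```

Recently seen symbols sit near the front of the table, so repeated symbols produce small indices, which are cheap to encode.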
Use a two-level history table:
mtf1
mtf2
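One plausible reading of the two-level structure is sketched below. The slides do not spell out the update rules, so everything here is an assumption: mtf1 holds stream descriptors, mtf2 holds recently used mtf1 indices, a miss inserts at the front and evicts the oldest entry, and the default sizes 192 and 4 are borrowed from the DMTF(192,4) configuration mentioned later. The function name `dmtf_encode` and the event labels are hypothetical.

```python
def dmtf_encode(streams, n1=192, n2=4):
    """Two-level move-to-front sketch (assumed behavior, not the paper's exact design)."""
    mtf1, mtf2, out = [], [], []
    for s in streams:
        if s in mtf1:
            i1 = mtf1.index(s)
            mtf1.insert(0, mtf1.pop(i1))          # move descriptor to front of mtf1
            if i1 in mtf2:
                i2 = mtf2.index(i1)
                mtf2.insert(0, mtf2.pop(i2))      # move index to front of mtf2
                out.append(("mtf2_hit", i2))      # cheapest case: small mtf2 index
            else:
                mtf2.insert(0, i1)
                del mtf2[n2:]                     # keep mtf2 at most n2 entries
                out.append(("mtf1_hit", i1))      # emit the mtf1 index
        else:
            mtf1.insert(0, s)
            del mtf1[n1:]                         # keep mtf1 at most n1 entries
            out.append(("miss", s))               # emit the full descriptor
    return out
```

Under these assumptions, two streams that alternate (as in a loop) keep producing the same mtf1 index, so after warm-up every event becomes a hit on entry 0 of mtf2.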
The flow of DMTF:
The output bits of DMTF:
mtf2.zhr => mtf2 zero-entry hit rate
mtf2.ohr => mtf2 non-zero-entry hit rate
mtf1.hr => mtf1 hit rate
mtf1.size => mtf1 size
mtf2.size => mtf2 size
An example of DMTF
The result of DMTF
Last-Value Predictor for Upper Address Bits:
The upper bits of a stream's SA rarely change during
program execution
Use SA[31:20] as the HLV
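The last-value idea can be sketched as follows. Only the SA[31:20] split comes from the slide. The class name `UpperBitsPredictor`, the initial HLV of 0, and the way a hit or miss is reported are assumptions.

```python
class UpperBitsPredictor:
    """Last-value predictor for SA[31:20] (illustrative sketch)."""

    def __init__(self):
        self.hlv = 0  # last seen upper 12 bits; initial value is an assumption

    def encode(self, sa):
        upper = (sa >> 20) & 0xFFF     # SA[31:20]
        lower = sa & 0xFFFFF           # SA[19:0]
        if upper == self.hlv:
            return (True, lower)       # hit: only the lower 20 bits are needed
        self.hlv = upper               # miss: remember the new upper bits
        return (False, sa)             # and send the full address
```

As long as execution stays within the same 1 MB region, every address after the first is sent as 20 bits instead of 32.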
Zero Hit Trace Counters:
mtf2 zero-entry hits happen often
Use a counter to count them and dynamically adjust the size
of the ZLC
Use a head bit “0”
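The counting idea can be illustrated with a simple run-length sketch over a sequence of mtf2 indices. This is a simplified stand-in: it uses a fixed saturation limit `max_count` rather than the dynamically sized counter the slide mentions, and it does not model the head bit. The function name `collapse_zero_hits` and the event labels are hypothetical.

```python
def collapse_zero_hits(indices, max_count=15):
    """Replace runs of mtf2 zero-entry hits with a single count (fixed-size counter)."""
    out = []
    run = 0
    for idx in indices:
        if idx == 0:
            run += 1
            if run == max_count:          # counter saturated: flush it
                out.append(("zeros", run))
                run = 0
        else:
            if run:                       # a non-zero index ends the run
                out.append(("zeros", run))
                run = 0
            out.append(("index", idx))
    if run:                               # flush a trailing run
        out.append(("zeros", run))
    return out
```

A long run of zero-entry hits then costs one counter value instead of one record per stream.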
The block diagram of the enhanced DMTF trace
format:
bDMTF = basic DMTF
hDMTF = DMTF with HLV
eDMTF = enhanced DMTF
content addressable memory (CAM)
most-recently used (MRU) stack
The gate count of DMTF(192,4) is less than 24600
The paper presents a double move-to-front
method. The compression ratio ranges from
82.7:1 to 29389:1 (268:1 on average), and the
bandwidth ranges from 0.001 to 0.39 bits/instruction
(0.12 on average).
The hardware is also area-efficient:
compared to C.F. Kao[5], the area is half.
The paper gives a good compression method
and uses many examples to show
how the method works.
However, the paper says little about how the
experiments were conducted or how the data were obtained.