A Real-Time Program Trace Compressor Utilizing Double Move


Presenter: Shao-Jay Hou
This paper introduces a new unobtrusive and cost-effective method for the capture and compression of program execution traces in real time, based on a double move-to-front transformation. We explore its effectiveness and describe a cost-effective hardware implementation. The proposed trace compressor requires only 0.12 bits per instruction of trace port bandwidth, at the cost of 25K gates.

Continual growth in the complexity of SoC makes
traditional approaches infeasible or impractical.

In-Circuit-Emulator (ICE)
。Invasive: the processor has to be stopped

Software approach
。Uses breakpoints

Software step-by-step debugging wastes too much time.

There are other tracing systems, but their cost or
area is too high.

C.F.Kao’s LZ-based program trace.
My thesis(ideal)
The application of the tracer.
An Embedded Infrastructure of
Debug and Trace Interface for the
DSP Platform
The improvement of the tracer.
A Real-Time Program Trace
Compressor Utilizing Double
Move-to-Front Method
This paper
The program trace system for each
embedded system processor
Another program trace system
and compression techniques
ARM ETM[1]
Altera Nios II[2]
Xilinx Microblaze[3]
Lauterbach[4]
For the experiment benchmark
Mibench[8]
[5-7]
Hardware architecture
The basic MTF method and bzip compression
MTF[10]
Bzip2[11]
To collect the symbols in system
SimpleScalar[9]
CAM[12]

Each instruction stream is identified by two characteristics, SA and SL, which are derived from the PC:

SA (starting address)
SL (stream length)
PC (program counter)

If the current PC differs from the previous PC by
anything other than the instruction length, the current
instruction is the beginning of a new stream.

An unconditional direct branch does not
terminate the current stream, because the
address of the next instruction in sequence can
be inferred directly from the binary.
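
The stream rule above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's hardware: the name `split_streams`, the fixed 4-byte instruction length, and measuring SL in instructions are all assumptions. The sketch also does not merge streams across unconditional direct branches, since that would require inspecting the binary.

```python
INSTR_LEN = 4  # assumed fixed instruction size in bytes (hypothetical)

def split_streams(pcs, instr_len=INSTR_LEN):
    """Group a sequence of PC values into (SA, SL) pairs.

    A new stream starts whenever the current PC is not exactly one
    instruction length past the previous PC.
    """
    streams = []
    if not pcs:
        return streams
    start, length = pcs[0], 1
    for prev, cur in zip(pcs, pcs[1:]):
        if cur - prev == instr_len:
            length += 1                       # still sequential: extend the stream
        else:
            streams.append((start, length))   # close the current stream
            start, length = cur, 1            # the current PC begins a new one
    streams.append((start, length))
    return streams

trace = [0x100, 0x104, 0x108, 0x200, 0x204, 0x100]
print(split_streams(trace))
```

A program that loops over the same code produces the same (SA, SL) pairs again and again, which is the redundancy a history-table compressor exploits.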

Some parameters in this method:




Ht(history table)
Input
Output
A simple example:

If the history table is ht=[C,B,A]
and the input is {AABC},
the output is 2022.
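
The example can be reproduced with a short Python sketch of the basic move-to-front transform. The function name `mtf_encode` is hypothetical, and the sketch assumes every input symbol is already in the history table, as in the example.

```python
def mtf_encode(history, symbols):
    """Return the move-to-front index of each symbol.

    `history` is the initial history table (index 0 is the front).
    Each symbol is replaced by its current index, then moved to the front.
    """
    table = list(history)       # work on a copy of the history table
    output = []
    for sym in symbols:
        idx = table.index(sym)  # position of the symbol in the table
        output.append(idx)
        table.insert(0, table.pop(idx))  # move the hit entry to the front
    return output

print(mtf_encode(["C", "B", "A"], "AABC"))  # [2, 0, 2, 2]
```

Recently seen symbols end up near the front of the table, so repeated inputs are encoded as small indices.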

Use a two-level history table



Mtf1
Mtf2
The flow of DMTF:

The output bits of DMTF:





mtf2.zhr => mtf2 zero-entry hit rate
mtf2.ohr => mtf2 non-zero-entry hit rate
mtf1.hr => mtf1 hit rate
mtf1.size => mtf1 size
mtf2.size => mtf2 size
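
The four cases implied by these hit rates can be sketched as follows. This is only an illustration under stated assumptions, since the transcript does not spell out the flow: it assumes mtf1 holds (SA, SL) stream descriptors, mtf2 holds mtf1 hit indices, a miss inserts the new entry at the front and evicts the last one, and the default sizes follow the DMTF(192,4) configuration read as (mtf1 size, mtf2 size). The names `mtf_lookup` and `dmtf_encode` and the symbolic output tuples are hypothetical; no bit-level encoding is modeled.

```python
def mtf_lookup(table, key, size):
    """Return the hit index of `key` (None on a miss), then put `key` at the front.

    On a miss in a full table the last (least recently used) entry is evicted.
    """
    if key in table:
        idx = table.index(key)
        table.pop(idx)
    else:
        idx = None
        if len(table) == size:
            table.pop()
    table.insert(0, key)
    return idx

def dmtf_encode(streams, mtf1_size=192, mtf2_size=4):
    """Classify each (SA, SL) stream into one of four symbolic output cases."""
    mtf1, mtf2, out = [], [], []
    for stream in streams:
        i1 = mtf_lookup(mtf1, stream, mtf1_size)
        if i1 is None:
            out.append(("mtf1_miss", stream))   # emit the raw descriptor
            continue
        i2 = mtf_lookup(mtf2, i1, mtf2_size)
        if i2 is None:
            out.append(("mtf1_hit", i1))        # emit the mtf1 index
        elif i2 == 0:
            out.append(("mtf2_zero_hit",))      # cheapest case
        else:
            out.append(("mtf2_hit", i2))        # emit the short mtf2 index
    return out

a, b = (0x100, 3), (0x200, 2)
for record in dmtf_encode([a, b, a, b, a, b]):
    print(record)
```

In this sketch, two alternating streams always hit the same mtf1 index, so after a short warm-up every stream becomes an mtf2 zero-entry hit.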

An example of DMTF

The result of DMTF

Last-Value Predictor for Upper Address Bits:



The upper bits of a stream's SA rarely change during
program execution
Use SA[31:20] as HLV
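
A last-value predictor on SA[31:20] can be sketched as below. This is an assumption-laden illustration: the name `hlv_filter` and the symbolic "hit"/"miss" records are hypothetical, and the sketch only shows when the upper 12 bits would need to be emitted, not the actual trace format.

```python
UPPER_SHIFT = 20                       # SA[31:20] are the predicted upper bits
LOWER_MASK = (1 << UPPER_SHIFT) - 1    # SA[19:0]

def hlv_filter(addresses):
    """Split each 32-bit SA into upper and lower bits.

    The upper bits are emitted only when they differ from the last value seen.
    """
    last_upper = None
    out = []
    for sa in addresses:
        upper = (sa >> UPPER_SHIFT) & 0xFFF
        lower = sa & LOWER_MASK
        if upper == last_upper:
            out.append(("hit", lower))           # upper bits predicted correctly
        else:
            out.append(("miss", upper, lower))   # upper bits must be sent
            last_upper = upper
    return out

print(hlv_filter([0x00400100, 0x00400200, 0x10000000]))
```

Because the upper bits rarely change, most records in this sketch carry only the lower 20 bits.
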
Zero Hit Trace Counters:



mtf2 zero-entry hits happen often
Use a counter to count them and dynamically adjust the size
of ZLC
Use a head bit “0”

The block diagram of the enhanced DMTF trace
format:



bDMTF = basic DMTF
hDMTF = DMTF with HLV
eDMTF = enhanced DMTF



content addressable memory (CAM)
most-recently used (MRU) stack
the gate count of DMTF(192,4) is less than 24600

The paper presents a double move-to-front
method. The compression ratio ranges from
82.7:1 to 29389:1 (average 268:1), and the
bandwidth ranges from 0.001 to 0.39 bits/instruction
(average 0.12).
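
The two sets of figures are consistent with each other if the uncompressed trace is taken to be 32 bits per instruction. That baseline is an assumption inferred from the numbers, not stated in the slides; the helper name `bits_per_instruction` is hypothetical.

```python
RAW_BITS_PER_INSTRUCTION = 32  # assumed uncompressed trace size per instruction

def bits_per_instruction(compression_ratio, raw_bits=RAW_BITS_PER_INSTRUCTION):
    """Convert a compression ratio (N:1) into trace bandwidth in bits/instruction."""
    return raw_bits / compression_ratio

for ratio in (82.7, 268, 29389):
    print(f"{ratio}:1 -> {bits_per_instruction(ratio):.4f} bits/instruction")
```

Under that assumption the worst, average, and best ratios round to about 0.39, 0.12, and 0.001 bits/instruction.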

The hardware is also area-efficient.

Compared to C.F.Kao[5], the area is half.

The paper gives a good compression method.

The paper also uses many examples to show
how the method runs.

However, the paper does not say much about how the
experiments were run or how the data were obtained.