VLSI SoC Hardware/Software Co-Design: Homework 5

2015/4/13
RGB_YUV Hardware Design
林鼎原
Department of Electrical Engineering
National Cheng Kung University
Tainan, Taiwan, R.O.C.
NCKU EE CAD
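Program code
#define N 64   /* pixels per frame */
void RGB_2_Y(unsigned char I_Frame[][3], unsigned char O_Frame[]);

int main(void)
{ unsigned char I_Frame[N][3] = {{0}}, O_Frame[N];
/* ……. */
RGB_2_Y(I_Frame, O_Frame);
/* ……. */
return 0;
}

/* Y = 0.257*R + 0.504*G + 0.098*B + 16 for each of the N pixels */
void RGB_2_Y(unsigned char I_Frame[][3], unsigned char O_Frame[])
{ int i, a, b, c, y;
for (i = 0; i < N; i++) {
a = I_Frame[i][0]; b = I_Frame[i][1]; c = I_Frame[i][2];   /* R, G, B */
y = (int)(0.257*a + 0.504*b + 0.098*c + 16);
O_Frame[i] = (unsigned char)y; }   /* write y to O_Frame */
}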
Ding Yuan Lin
Pipelining Scheduling for 6 Pipeline Latency
Resource constraint: each cycle 1 adder, 2 multipliers
[Figure: scheduled data-flow graph of the loop body over states s1 to s18. Operations: c1 (a * 0.257 → V1), c2 (b * 0.504 → V2), c3 (V1 + V2 → V3), c4 (c * 0.098 → V5), c5 (V3 + 16 → V4), c6 (V4 + V5 → y), c7 (+ → V7), c8 (V7 >= 64 → status).]
Lifetimes of Values
Left-edge algorithm to allocate values into registers
[Figure: lifetime chart of values V1 to V7 over control steps 1 to 6, shown before and after left-edge allocation into registers R1, R2, R3.]
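The left-edge allocation named on this slide can be sketched in C roughly as follows. The chart itself is not recoverable here, so the lifetimes in the usage note are hypothetical; `Lifetime`, `sort_by_start` and `left_edge` are illustrative names, and registers are numbered from 0 in the code.

```c
#include <limits.h>
#include <stddef.h>

/* Lifetime of one value: live in control steps [start, end). */
typedef struct {
    const char *name;
    int start;
    int end;
    int reg;   /* register index assigned by left_edge(), -1 if none yet */
} Lifetime;

/* Stable insertion sort by ascending start (the "left edge"). */
static void sort_by_start(Lifetime *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        Lifetime key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1].start > key.start) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}

/* Left-edge allocation: sorts v in place, fills v[i].reg, and returns
   the number of registers used. */
int left_edge(Lifetime *v, size_t n)
{
    size_t assigned = 0;
    int reg = 0;

    sort_by_start(v, n);
    for (size_t i = 0; i < n; i++)
        v[i].reg = -1;

    while (assigned < n) {
        int last_end = INT_MIN;   /* right edge of the last value packed */
        for (size_t i = 0; i < n; i++) {
            if (v[i].reg == -1 && v[i].start >= last_end) {
                v[i].reg = reg;   /* fits after the previous value */
                last_end = v[i].end;
                assigned++;
            }
        }
        reg++;                    /* open the next register */
    }
    return reg;
}
```

With hypothetical lifetimes V1 [3,4), V2 [3,4), V3 [4,5), V4 [5,6), V5 [5,6) and V7 [1,7), the routine uses three registers, the same count as R1 to R3 on the slide. Which values end up sharing a register depends on tie-breaking and on the real lifetimes, so the grouping need not match the slide's.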
Lifetimes of Operations
[Figure: lifetime chart of operations C1 to C8 over control steps 1 to 6, grouped by functional unit (multiplier, adder), shown before and after sharing.]
IP Data Path
R1 = {V1, V5}, R2 = {V2, V3, V4}
C1, C4 → multiplier 1
C2 → multiplier 2
[Figure: data path. Inputs a, b, c and the constants 0.257, 0.098, 0.504, 16, 64 and 1 feed multiplexers M1 to M4 (selects s1 to s4), multiplier 1, multiplier 2 and one +/- unit (AlU_op), with registers R1, R2, R3 (enables R1.ena, R2.ena, R3.ena). The controller takes clk, rst and status and drives s1 to s4, AlU_op, the register enables, busy and valid; the data output is out.]
IP Controller
State   Next_state          valid  busy  Enable        M1  M2  M3  M4  adder
        status=1  status=0               R1  R2  R3    s1  s2  s3  s4  Alu_op
State0  State1    State1    0      0     0   0   1     0   0   10  01  0
State1  State2    State2    0      1     1   1   1     0   0   10  10  1
State2  State3    State3    0      1     1   0   1     0   0   00  00  0
State3  State4    State4    0      1     1   0   0     0   0   00  00  0
State4  State5    State5    0      1     1   1   0     1   1   00  00  0
State5  State6    State6    0      1     1   1   0     1   1   01  00  0
State6  State0    State1    1      1     1   1   0     1   1   00  00  0
[Figure: state diagram. rst enters S0, the states advance S0 → S1 → ... → S6, and the transition out of S6 depends on status (labels: status = 0, status = 1).]
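One way to read the state table is as a lookup from the current state and the status bit to the next state and a control word. The C sketch below encodes the rows as they appear on the slide. The type and function names are illustrative, and the mapping of values to columns (next state for status = 1 first, then status = 0, valid, busy, the three enables, s1 to s4, Alu_op) is an assumption based on the header order.

```c
/* One row of the controller table. */
typedef struct {
    int next_if_status1;   /* next state when status = 1 */
    int next_if_status0;   /* next state when status = 0 */
    int valid, busy;
    int r1_ena, r2_ena, r3_ena;
    int s1, s2;            /* selects for M1 and M2 */
    int s3, s4;            /* selects for M3 and M4 (binary 00, 01, 10) */
    int alu_op;            /* 1 = subtract, 0 = add */
} CtrlRow;

static const CtrlRow ctrl_table[7] = {
    /* State0 */ {1, 1, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0},
    /* State1 */ {2, 2, 0, 1, 1, 1, 1, 0, 0, 2, 2, 1},
    /* State2 */ {3, 3, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0},
    /* State3 */ {4, 4, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0},
    /* State4 */ {5, 5, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0},
    /* State5 */ {6, 6, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0},
    /* State6 */ {0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0},
};

/* Next state for the given current state (0..6) and status bit. */
int next_state(int state, int status)
{
    const CtrlRow *row = &ctrl_table[state];
    return status ? row->next_if_status1 : row->next_if_status0;
}
```

Stepping from State0 with status = 0 walks through State1 to State6; only the row for State6 distinguishes the two status values.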
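Pre-allocation: Design Method
The loop body is mapped directly to hardware, using 7 registers in total (R1 to R6 and counter), 4 adders, and 3 multipliers.
- Multiplication: each fractional coefficient is first multiplied by 256 (2 to the 8th power), i.e. shifted left by 8 bits, and then multiplied by the 8-bit input data. The product is 16 bits wide; discarding the lower 8 bits leaves the integer part.
- The control unit has 7 states (s0 to s6):
  - S0: reset.
  - S1: receive input data R, G, B and check whether counter is greater than or equal to 0; if so, continue, otherwise exit.
  - S2: read R and G and start computing RGB_R*0.257 and RGB_G*0.504; decrement counter by 1.
  - S3: store RGB_R*0.257 in V1 and RGB_G*0.504 in V2.
  - S4: read input data c and start computing RGB_B*0.098; V3 = V1 + V2.
  - S5: compute V3 + 16; store RGB_B*0.098 in V5.
  - S6: Y = V4 + V5.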
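As a rough illustration of the fixed-point multiplication described above, the following C sketch scales each coefficient by 256, multiplies by the 8-bit input, and keeps the upper 8 bits of the 16-bit product. The constants 66, 129 and 25 are 0.257, 0.504 and 0.098 times 256, rounded, and match the ones used in the Verilog code; the function names are illustrative, and the V1 to V5 comments follow the state list.

```c
#include <stdint.h>

/* Coefficients scaled by 256 (2^8) and rounded to the nearest integer. */
enum { K_R = 66, K_G = 129, K_B = 25 };   /* 0.257, 0.504, 0.098 */

/* 8-bit x 8-bit multiply; dropping the lower 8 bits of the 16-bit
   product leaves the integer part. */
static uint8_t mul_q8(uint8_t x, uint8_t k)
{
    uint16_t product = (uint16_t)(x * k);
    return (uint8_t)(product >> 8);
}

uint8_t rgb_to_y_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    uint8_t v1 = mul_q8(r, K_R);        /* S2-S3: V1 = R * 0.257 */
    uint8_t v2 = mul_q8(g, K_G);        /* S2-S3: V2 = G * 0.504 */
    uint8_t v3 = (uint8_t)(v1 + v2);    /* S4:    V3 = V1 + V2   */
    uint8_t v5 = mul_q8(b, K_B);        /* S4-S5: V5 = B * 0.098 */
    uint8_t v4 = (uint8_t)(v3 + 16);    /* S5:    V4 = V3 + 16   */
    return (uint8_t)(v4 + v5);          /* S6:    Y  = V4 + V5   */
}
```

For example, rgb_to_y_fixed(255, 255, 255) evaluates to 65 + 128 + 24 + 16 = 233, so the largest possible result still fits in 8 bits.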
Verilog Code for Pre-allocation Design(1/5)
`timescale 1ns / 1ps
module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
// Input and output port declarations
input         clk, reset;
input  [23:0] rgb_in;
output [7:0]  Y;
output        busy;   // while busy is high, rgb_in input pauses until busy goes low
output        valid;  // the output value is valid only while valid is high
reg           busy;
reg           valid;
reg [6:0] counter;    // counts completed iterations to decide when to stop
reg [7:0] RGB_R, RGB_G, RGB_B;
reg [2:0] present_state, next_state;
reg [7:0] R3_tmp, R4_tmp, R6_tmp;
wire [7:0] R1_tmp, R2_tmp, R5_tmp;
reg [15:0] m1, m2, m3; // for 3 multipliers
reg [7:0] R1, R2, R3, R4, R5, R6;
// state parameters
parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2,
                s3=3'd3, s4=3'd4, s5=3'd5,
                s6=3'd6;
Verilog Code for Pre-allocation Design(2/5)
//counter: counts completed iterations to decide when to stop
always @(posedge clk)
begin
if(reset) counter<=7'd0;
else if (present_state==s6) counter<=counter+7'd1;
else counter<=counter;
end
//data or state registers
always @ (posedge clk or posedge reset)
begin
if(reset) begin //initialization
present_state <=s0;
RGB_R<=8'd0;
RGB_G<=8'd0;
RGB_B<=8'd0;
R1<=8'd0;
R2<=8'd0;
R3<=8'd0;
R4<=8'd0;
R5<=8'd0;
R6<=8'd0;
end
else begin
present_state <=next_state;
if(present_state==s1) //state 1: read the input values
begin
RGB_R<=rgb_in[23:16];
RGB_G<=rgb_in[15:8];
RGB_B<=rgb_in[7:0];
end
R1<=R1_tmp;
R2<=R2_tmp;
R3<=R3_tmp;
R4<=R4_tmp;
R5<=R5_tmp;
R6<=R6_tmp;
end
end
Verilog Code for Pre-allocation Design(3/5)
//next state logic
always @ (present_state)
begin
case(present_state)
s0: next_state=s1;
s1: next_state=s2;
s2: next_state=s3;
s3: next_state=s4;
s4: next_state=s5;
s5: next_state=s6;
default: next_state=s1;
endcase
end
//control signal
always @ (present_state or counter)
begin
case(present_state)
s0: begin valid=1'b0; busy=1'b0; end
s1: begin valid=1'b0; busy=1'b0; end
s2: begin valid=1'b0; busy=1'b1; end
s3: begin valid=1'b0; busy=1'b1; end
s4: begin valid=1'b0; busy=1'b1; end
s5: begin valid=1'b0; busy=1'b1; end
s6: begin valid=1'b1; busy=1'b1; end
default: if(counter==7'd0)
begin valid=1'b0; busy=1'bx; end
else begin valid=1'b1; busy=1'b0; end
endcase
end
assign R1_tmp=m1[15:8]; // discard the lower 8 bits
assign R2_tmp=m2[15:8];
assign R5_tmp=m3[15:8];
assign Y = (present_state==s6)? R6 : 8'd0; // output Y in state S6
Verilog Code for Pre-allocation Design(4/5)
//rgb to y execution
always @(* )
begin case(present_state)
s0: begin
m1=16'd0;
m2=16'd0;
m3=16'd0;
R3_tmp=8'd0;
R4_tmp=8'd0;
R6_tmp=8'd0;
end
s1: begin
m1={R1,8'd0}; //read data
m2={R2,8'd0}; //read data
m3={R5,8'd0}; //read data
R3_tmp=R3;
R4_tmp=R4;
R6_tmp=R6;
end
s2: begin
m1=RGB_R * 8'd66; //action 0.257
m2=RGB_G * 8'd129; //action 0.504
m3={R5,8'd0};
R3_tmp=R3;
R4_tmp=R4;
R6_tmp=R6;
end
s3: begin
m1={R1,8'd0};
m2={R2,8'd0};
m3=RGB_B * 8'd25;//action 0.098
R3_tmp=R1+R2; //action
R4_tmp=R4;
R6_tmp=R6;
end
Verilog Code for Pre-allocation Design (5/5)
s4: begin
m1={R1,8'd0};
m2={R2,8'd0};
m3={R5,8'd0};
R3_tmp=R3;
R4_tmp=R3+8'd16;//action
R6_tmp=R6;
end
s5: begin
m1={R1,8'd0};
m2={R2,8'd0};
m3={R5,8'd0};
R3_tmp=R3;
R4_tmp=R4;
R6_tmp=R4+R5;
end
s6: begin
m1={R1,8'd0};
m2={R2,8'd0};
m3={R5,8'd0};
R3_tmp=R3;
R4_tmp=R4;
R6_tmp=R6;
end
default: begin
m1=16'd0;
m2=16'd0;
m3=16'd0;
R3_tmp=8'd0;
R4_tmp=8'd0;
R6_tmp=8'd0;
end
endcase
end
endmodule
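Post-allocation: Design Method
Lifetime analysis identifies the following sharing opportunities:
- Multipliers: only 2 are needed after sharing.
- Adders: only 1 is needed after sharing.
- Registers: R1 and R5 can be shared and are renamed R1; R2, R3 and R4 can be shared and are renamed R2; counter is renamed R3.
The control circuit comprises the control signals for the four multiplexers, the adder's add/subtract control signal, and the register write-enable signals (reg_ena).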
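To make the sharing concrete, here is a small C model of the single adder/subtractor together with its two input multiplexers. The select encodings and the constants 16, 64 and 1 are taken from the Mux3, Mux4 and ALU blocks of the post-allocation Verilog code; the C function names are illustrative, and the sketch covers only the combinational path, not the register enables.

```c
#include <stdint.h>

/* Mux3: first adder operand (00 = R1, 01 = constant 16, 10 = R3). */
uint8_t mux3(unsigned sel_3, uint8_t r1, uint8_t r3)
{
    switch (sel_3) {
    case 0:  return r1;
    case 1:  return 16;
    case 2:  return r3;
    default: return 0;
    }
}

/* Mux4: second adder operand (00 = R2, 01 = constant 64, 10 = constant 1). */
uint8_t mux4(unsigned sel_4, uint8_t r2)
{
    switch (sel_4) {
    case 0:  return r2;
    case 1:  return 64;
    case 2:  return 1;
    default: return 0;
    }
}

/* Shared 8-bit adder/subtractor: alu_op = 1 subtracts, alu_op = 0 adds. */
uint8_t alu(unsigned alu_op, uint8_t m3, uint8_t m4)
{
    return alu_op ? (uint8_t)(m3 - m4) : (uint8_t)(m3 + m4);
}
```

The same unit then serves R1 + R2 (sel_3 = 00, sel_4 = 00), the + 16 offset (sel_3 = 01, sel_4 = 00), and the counter update R3 - 1 (sel_3 = 10, sel_4 = 10, alu_op = 1).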
Verilog Code for Post-allocation Design(1/6)
`timescale 1ns / 1ps
module rgb_to_yuv(clk, reset, rgb_in, Y, busy, valid);
// Input and output port declarations
input         clk, reset;
input  [23:0] rgb_in;
output [7:0]  Y;
output        busy;   // while busy is high, rgb_in input pauses until busy goes low
output        valid;  // the output value is valid only while valid is high
reg           busy;
reg           valid;
reg [7:0] RGB_R, RGB_G, RGB_B;
reg [2:0] present_state, next_state;
reg [7:0] R1, R2, R3; // shared registers
reg [15:0] mux1, mux2;
reg [7:0] mux3, mux4;
reg [7:0] mul_reg1, mul_reg2;
reg [15:0] mul1, mul2; // two multipliers
reg [7:0] add; // one adder
wire status;
// select lines
reg R1_ena, sel_12, R2_ena, R3_ena, alu_op;
reg [1:0] sel_3, sel_4;
// state parameters
parameter [2:0] s0=3'd0, s1=3'd1, s2=3'd2,
                s3=3'd3, s4=3'd4, s5=3'd5,
                s6=3'd6;
Verilog Code for Post-allocation Design(2/6)
//data or state registers
always @ (posedge clk or posedge reset)
begin
if(reset) begin //initialization
present_state <=s0;
RGB_R<=8'd0;
RGB_G<=8'd0;
RGB_B<=8'd0;
mul_reg1<=8'd0;
mul_reg2<=8'd0;
R1<=8'd0;
R2<=8'd0;
R3<=8'd0;
end
else begin
present_state <=next_state;
if(present_state==s1 && status==1'd0) //state 1: read the input values
begin
RGB_R<=rgb_in[23:16];
RGB_G<=rgb_in[15:8];
RGB_B<=rgb_in[7:0];
end
mul_reg1 <= mul1[15:8];
mul_reg2 <= mul2[15:8];
R1 <= mul_reg1;
if (R2_ena==1'b0) R2<=mul_reg2;
else if(R3_ena==1'b1 && alu_op==1'b1)
R3<=mux3-mux4;
else R2<=add;
end
end
// note: R3 is an unsigned reg, so R3>=0 always holds and status stays 0
assign status=(R3>=0)?1'b0:1'b1;
assign Y = (present_state==s6)? add : 8'd0; // output Y in state S6
Verilog Code for Post-allocation Design(3/6)
//next state logic
always @ (present_state)
begin
case(present_state)
s0: next_state=s1;
s1: next_state=s2;
s2: next_state=s3;
s3: next_state=s4;
s4: next_state=s5;
s5: next_state=s6;
default: next_state=s1;
endcase
end
//control signal
always @ (present_state)
begin
case(present_state)
s0: begin valid =1'b0;
busy =1'b0;
R2_ena=1'b0;
R3_ena=1'b1;
sel_12=1'b0;
sel_3 =2'b10;//R3
sel_4 =2'b01;//64
alu_op=1'b0; //add
end
s1: begin valid =1'b0;
busy =1'b0;
R2_ena=1'b1;
R3_ena=1'b1;
sel_12=1'b0;
sel_3 =2'b10;//R3
sel_4 =2'b10;//1
alu_op=1'b1;//sub
end
Verilog Code for Post-allocation Design(4/6)
s2: begin valid=1'b0;
busy =1'b1;
R2_ena=1'b0;
R3_ena=1'b1;
sel_12=1'b0;
sel_3 =2'b0;
sel_4 =2'b0;
alu_op=1'b0;//add
end
s3: begin valid =1'b0;
busy =1'b1;
R2_ena=1'b0;
R3_ena=1'b0;
sel_12=1'b0;
sel_3 =2'b00;
sel_4 =2'b00;
alu_op=1'b0;
end
s4: begin valid=1'b0;
busy=1'b1;
sel_12=1'b1;
R2_ena=1'b1;
R3_ena=1'b0;
sel_3 =2'b00;
sel_4 =2'b00;
alu_op=1'b0;
end
s5: begin valid=1'b0;
busy=1'b1;
R2_ena=1'b1;
R3_ena=1'b0;
sel_12=1'b1;
sel_3 =2'b01;
sel_4 =2'b00;
alu_op=1'b0;
end
Verilog Code for Post-allocation Design(5/6)
s6: begin valid=1'b1;
busy=1'b1;
sel_12=1'b0;
sel_3 =2'b00;
sel_4 =2'b00;
R2_ena=1'b1;
R3_ena=1'b0;
alu_op=1'b0;
end
default: begin valid=1'b0;
busy=1'b0;
sel_12=1'b0;
sel_3 =2'b00;
sel_4 =2'b00;
R2_ena=1'b0;
R3_ena=1'b0;
alu_op=1'b0;
end
endcase
end
//Mux1 and Mux2
always@(sel_12 or RGB_R or RGB_B)
begin case(sel_12)
1'b0: begin
mux1 = RGB_R;
mux2 = 16'd66;// 0.257
end
default: begin
mux1 = RGB_B ;
mux2 = 16'd25; //0.098
end
endcase
end
Verilog Code for Post-allocation Design(6/6)
//Mux3
always@(sel_3 or R1 or R3 )
begin
case(sel_3)
2'b00: mux3 = R1;
2'b01: mux3 = 8'd16;
2'b10: mux3 = R3;
default: mux3 =8'd0;
endcase
end
//Mux4
always@(sel_4 or R2 )
begin
case(sel_4)
2'b00: mux4 = R2;
2'b01: mux4 = 8'd64;
2'b10: mux4 = 8'd1;
default: mux4 = 8'd0;
endcase
end
//ALU
always@(mux1 or mux2 or mux3 or mux4 or
RGB_R or RGB_G or alu_op or R1 or R2 or R3 )
begin
mul1 = mux1 * mux2;
mul2 = RGB_G* 16'd129; //0.504
if(alu_op==1'b1)
add = mux3 - mux4;
else add = mux4+mux3;
end
endmodule
Waveforms
RGB input (hex)
Control signal
When busy is high, data input pauses.
When valid is high, the output is valid.
When alu_op is high, the adder performs subtraction.
When status is high, no more data is accepted.
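Pattern Verification Results
The computed results are compared against the expected results to check correctness.
There are 64 data items in total (0 to 63).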
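A check of this kind can be sketched in C as below: a floating-point reference of the conversion formula is compared with the truncating fixed-point computation over a set of test pixels. The pattern generator and the error metric are assumptions for illustration only; the slides do not show the actual 64 patterns or the pass criterion.

```c
/* Floating-point reference: Y = 0.257 R + 0.504 G + 0.098 B + 16. */
double y_reference(int r, int g, int b)
{
    return 0.257 * r + 0.504 * g + 0.098 * b + 16.0;
}

/* Fixed-point model: coefficients scaled by 256, lower 8 bits dropped. */
int y_hardware(int r, int g, int b)
{
    return ((r * 66) >> 8) + ((g * 129) >> 8) + ((b * 25) >> 8) + 16;
}

/* Largest |reference - hardware| over `count` hypothetical test pixels. */
double max_abs_error(int count)
{
    double worst = 0.0;
    for (int i = 0; i < count; i++) {
        /* Hypothetical pattern: spread the index over the 8-bit range. */
        int r = (i * 4) & 0xFF;
        int g = (i * 7 + 3) & 0xFF;
        int b = (i * 13 + 5) & 0xFF;
        double diff = y_reference(r, g, b) - y_hardware(r, g, b);
        if (diff < 0)
            diff = -diff;
        if (diff > worst)
            worst = diff;
    }
    return worst;
}
```

Because each of the three products is truncated separately, the hardware result can fall roughly three counts below the floating-point value, so an exact-match comparison needs expected values generated with the same truncation.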
Quartus Parameter Settings
Data Analysis
Pre_allocation
Post_allocation
The results show that register sharing reduces total logic elements from 125 to 91, a reduction of about 27%.
Pre_allocation Synthesis Analysis

The Xilinx synthesis result uses 3 multipliers and 4 adder/subtractors.
Multiplier
Adder
State Machine
Post_allocation Synthesis Analysis

The Xilinx synthesis result uses 2 multipliers and 1 adder/subtractor.
Mux1
Mux2
Multiplier
Mux4
Mux3
Adder/subtractor
State Machine
Post sim

The post-simulation results also match expectations.