Chapter 2 - TMS320C6000 Architectural Overview

Learning Objectives
• Describe C6000 CPU architecture.
• Introduce some basic instructions.
• Describe the C6000 memory map.
• Provide an overview of the peripherals.
Dr. Naim Dahnoun, Bristol University, (c) Texas Instruments 2004
General DSP System Block Diagram
[Figure: a Central Processing Unit connected through the internal buses to the internal memory, the external memory and the peripherals.]
Implementation of Sum of Products (SOP)
It has been shown in Chapter 1 that the SOP is the key element of most DSP algorithms. So let's write the code for this algorithm and at the same time discover the C6000 architecture.

$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$

Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
So let's implement the SOP algorithm! The implementation in this module will be done in assembly.
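As a point of reference before the assembly version, the SOP above can be sketched in a few lines of Python. The function name `sum_of_products` and the sample values are illustrative assumptions and are not part of the original module.

```python
def sum_of_products(a, x):
    """Return Y = a[0]*x[0] + a[1]*x[1] + ... for two equal-length sequences."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same number of taps")
    y = 0
    for an, xn in zip(a, x):
        prod = an * xn   # (1) multiplication
        y = y + prod     # (2) addition
    return y


# Hypothetical 3-tap example: 1*4 + 2*5 + 3*6
print(sum_of_products([1, 2, 3], [4, 5, 6]))
```

The two statements inside the loop correspond to the two basic instructions that the rest of the chapter maps onto functional units.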
Multiply (MPY)
The multiplication of a1 by x1 is done in assembly by the following instruction:
        MPY         a1, x1, Y
This instruction is performed by a multiplier unit that is called ".M".
Multiply (.M unit)
$Y = \sum_{n=1}^{40} a_n x_n$
The .M unit performs multiplications in hardware:
        MPY   .M    a1, x1, Y
Note: a 16-bit by 16-bit multiplier provides a 32-bit result; a 32-bit by 32-bit multiplier provides a 64-bit result.
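The width note above can be checked with a small model. The sketch below assumes that MPY treats the 16 least significant bits of each source register as signed values, which is how the C6000 instruction set defines it; `to_signed` and `mpy` are hypothetical helper names used only for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mpy(src1, src2):
    """Model of MPY: signed 16 LSBs x signed 16 LSBs -> 32-bit register value."""
    product = to_signed(src1, 16) * to_signed(src2, 16)
    return product & 0xFFFFFFFF


# The largest-magnitude 16-bit product, (-32768) * (-32768), still fits in 32 bits.
assert to_signed(mpy(0x8000, 0x8000), 32) == 2 ** 30
print(hex(mpy(3, 0xFFFF)))  # 3 * (-1) shown as a 32-bit pattern
```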
Addition (.?)
        MPY   .M    a1, x1, prod
        ADD   .?    Y, prod, Y
Add (.L unit)
        MPY   .M    a1, x1, prod
        ADD   .L    Y, prod, Y
RISC processors such as the C6000 use registers to hold the operands, so let's change this code.
Register File - A
[Figure: Register File A (A0 to A15, each 32 bits wide) feeding the .M and .L units; a1 is held in A0, x1 in A1, prod in A3 and Y in A4.]
        MPY   .M    a1, x1, prod
        ADD   .L    Y, prod, Y
Let us correct this by replacing a1, x1, prod and Y by the registers as shown above.
Specifying Register Names
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
The registers A0, A1, A3 and A4 contain the values to be used by the instructions.
Register File A contains 16 registers (A0 - A15), which are 32 bits wide.
Data loading
Q: How do we load the operands into the registers?
Load Unit ".D"
A: The operands are loaded into the registers from the memory using the .D unit.
[Figure: the .D unit sits between Register File A and the data memory.]
It is worth noting at this stage that the only way to access memory is through the .D unit.
Load Instruction
Q: Which instruction(s) can be used for loading operands from the memory to the registers?
Load Instructions (LDB, LDH, LDW, LDDW)
A: The load instructions.
Using the Load Instructions
Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address. Also, the addresses are 32 bits wide.
[Figure: data memory drawn 16 bits wide, with row addresses 00000000, 00000002, 00000004, 00000006, 00000008, ..., FFFFFFFF; a1, x1, prod and Y occupy consecutive rows.]
The syntax for the load instruction is:
        LD    *Rn, Rm
where Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.
The question now is how many bytes are going to be loaded into the destination register? The answer is that it depends on the instruction you choose:
• LDB: loads one byte (8-bit)
• LDH: loads a half word (16-bit)
• LDW: loads a word (32-bit)
• LDDW: loads a double word (64-bit)
Note: LD on its own does not exist.
Using the Load Instructions
Example: the data memory (16 bits wide) holds the following bytes.

Address     Byte 1    Byte 0
00000000    0xA       0xB
00000002    0xC       0xD
00000004    0x2       0x1
00000006    0x4       0x3
00000008    0x6       0x5
0000000A    0x8       0x7

If we assume that A5 = 0x4 then:
(1) LDB  *A5, A7     ; gives A7 = 0x00000001
(2) LDH  *A5, A7     ; gives A7 = 0x00000201
(3) LDW  *A5, A7     ; gives A7 = 0x04030201
(4) LDDW *A5, A7:A6  ; gives A7:A6 = 0x0807060504030201
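The four results above follow from reading 1, 2, 4 or 8 consecutive bytes with the lowest address as the least significant byte. The sketch below reproduces them; it assumes the little-endian layout implied by the example and deliberately ignores the sign extension performed by LDB and LDH (the values here are positive) and any alignment requirement of the wider loads. `MEMORY` and `load` are names chosen for this illustration.

```python
# Data memory from the example, one byte per address (little-endian layout).
MEMORY = bytes([0x0B, 0x0A, 0x0D, 0x0C,      # addresses 0x0 .. 0x3
                0x01, 0x02, 0x03, 0x04,      # addresses 0x4 .. 0x7
                0x05, 0x06, 0x07, 0x08])     # addresses 0x8 .. 0xB


def load(address, size):
    """Read `size` bytes starting at `address`; the lowest address is the LSB."""
    return int.from_bytes(MEMORY[address:address + size], "little")


A5 = 0x4
for name, size in (("LDB", 1), ("LDH", 2), ("LDW", 4), ("LDDW", 8)):
    print(f"{name}: 0x{load(A5, size):0{2 * size}X}")
```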
Using the Load Instructions
Question: If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?
Loading the Pointer Rn
• The instruction MVKL allows a 16-bit constant to be moved into a register, as shown below:
        MVKL  .?    a, A5       ; 'a' is a constant or label
• How many bits represent a full address? 32 bits.
• So why does the instruction not allow a 32-bit move? All instructions are 32 bits wide (see instruction opcode), so the opcode has no room for a full 32-bit constant.
• To solve this problem another instruction is available, MVKH, e.g.:
        MVKH  .?    a, A5       ; 'a' is a constant or label
  MVKH writes the upper half of a (ah) into the upper 16 bits of A5 and leaves the lower 16 bits of A5 unchanged.
• Finally, to move the 32-bit address to a register we can use:
        MVKL        a, A5
        MVKH        a, A5
Loading the Pointer Rn
• Always use MVKL then MVKH. Look at the following examples, which both start with A5 = 0x87654321:
Example 1
        MVKL        0x1234FABC, A5      ; A5 = 0xFFFFFABC (sign extension)
        MVKH        0x1234FABC, A5      ; A5 = 0x1234FABC ; OK
Example 2
        MVKH        0x1234FABC, A5      ; A5 = 0x12344321
        MVKL        0x1234FABC, A5      ; A5 = 0xFFFFFABC ; Wrong
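The two examples can be replayed with a small model of the two instructions. This is a sketch under the behaviour described above (MVKL sign-extends its 16-bit constant over the whole register, MVKH replaces only the upper half); the function names `mvkl` and `mvkh` simply mirror the mnemonics.

```python
def mvkl(constant):
    """MVKL: low 16 bits of the constant, sign-extended to 32 bits."""
    low = constant & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def mvkh(constant, register):
    """MVKH: replace the upper 16 bits of the register, keep its lower 16 bits."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)


ADDRESS = 0x1234FABC
A5 = 0x87654321                      # initial register content from the example

right = mvkh(ADDRESS, mvkl(ADDRESS))  # Example 1: MVKL, then MVKH
after_mvkh = mvkh(ADDRESS, A5)        # Example 2: MVKH first ...
wrong = mvkl(ADDRESS)                 # ... then MVKL overwrites the upper half

print(hex(right), hex(after_mvkh), hex(wrong))
```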
LDH, MVKL and MVKH
        MVKL        pt1, A5
        MVKH        pt1, A5
        MVKL        pt2, A6
        MVKH        pt2, A6
        LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
pt1 and pt2 point to some locations in the data memory.
Creating a loop
So far we have implemented the SOP for one tap only, i.e. Y = a1 * x1. So let's create a loop so that we can implement the SOP for N taps.
With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.
What are the steps for creating a loop?
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.
1. Create a label to branch to
        MVKL        pt1, A5
        MVKH        pt1, A5
        MVKL        pt2, A6
        MVKH        pt2, A6
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
2. Add a branch instruction, B.
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        B     .?    loop
Which unit is used by the B instruction?
[Figure: Register File A now feeds four units: .S, .M, .L and .D.]
The B instruction is executed by the .S unit, as are MVKL and MVKH:
        MVKL  .S    pt1, A5
        MVKH  .S    pt1, A5
        MVKL  .S    pt2, A6
        MVKH  .S    pt2, A6
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        B     .S    loop
3. Create a loop counter.
        MVKL  .S    count, B0
(B registers will be introduced later.)
4. Decrement the loop counter
        MVKL  .S    pt1, A5
        MVKH  .S    pt1, A5
        MVKL  .S    pt2, A6
        MVKH  .S    pt2, A6
        MVKL  .S    count, B0
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .S    B0, 1, B0
        B     .S    loop
5. Make the branch conditional based on the value in the loop counter
• What is the syntax for making an instruction conditional?
        [condition]   Instruction   Label
  e.g.
        [B1]          B             loop
  (1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
  (2) Any instruction can be conditional.
• The condition can be inverted by adding the exclamation symbol "!" as follows:
        [!condition]  Instruction   Label
  e.g.
        [!B0]         B             loop    ; branch if B0 = 0
        [B0]          B             loop    ; branch if B0 != 0
5. Make the branch conditional
        MVKL  .S1   pt1, A5
        MVKH  .S1   pt1, A5
        MVKL  .S1   pt2, A6
        MVKH  .S1   pt2, A6
        MVKL  .S2   count, B0
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .S    B0, 1, B0
 [B0]   B     .S    loop
More on the Branch Instruction (1)
• With this processor all the instructions are encoded in 32 bits.
• Therefore the label must have a dynamic range of less than 32 bits, as the opcode of the instruction B also has to be encoded.
• Case 1:
        B     .S1   label
  The 32-bit instruction holds the B opcode and a 21-bit relative address.
  - Relative branch.
  - Label limited to an offset of +/- 2^20.
More on the Branch Instruction (2)
• By specifying a register as an operand instead of a label, it is possible to have an absolute branch.
• This allows a dynamic range of 2^32.
• Case 2:
        B     .S2   register
  The 32-bit instruction holds the B opcode and a 5-bit register code.
  - Absolute branch.
  - Operates on .S2 ONLY!
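The +/- 2^20 limit in Case 1 is simply the range of a 21-bit two's-complement field. A minimal sketch of the range check follows; `fits_signed` is a hypothetical helper, and the unit in which the offset is counted is left open here, as it is on the slide.

```python
def fits_signed(offset, bits):
    """True if offset can be stored in a two's-complement field of `bits` bits."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= offset <= highest


# A 21-bit relative address reaches from -2**20 up to 2**20 - 1.
print(fits_signed(2 ** 20 - 1, 21), fits_signed(2 ** 20, 21))
```

When a label lies outside this range, Case 2 applies: the target address is placed in a register and the branch becomes absolute.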
Testing the code
        MVKL  .S1   pt1, A5
        MVKH  .S1   pt1, A5
        MVKL  .S1   pt2, A6
        MVKH  .S1   pt2, A6
        MVKL  .S2   count, B0
loop    LDH   .D    *A5, A0
        LDH   .D    *A6, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .S    B0, 1, B0
 [B0]   B     .S    loop
This code performs the following operations:
        a0*x0 + a0*x0 + a0*x0 + ... + a0*x0
However, we would like to perform:
        a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
Modifying the pointers
The solution is to modify the pointers A5 and A6 on every pass through the loop.
Indexing Pointers

Syntax         Description       Pointer modified
*R             Pointer           No
*+R[disp]      + Pre-offset      No
*-R[disp]      - Pre-offset      No
*++R[disp]     Pre-increment     Yes
*--R[disp]     Pre-decrement     Yes
*R++[disp]     Post-increment    Yes
*R--[disp]     Post-decrement    Yes

• *R: the pointer is used but not modified.
• Pre-offset: the pointer is modified BEFORE being used and RESTORED to its previous value.
• Pre-increment and pre-decrement: the pointer is modified BEFORE being used and NOT RESTORED to its previous value.
• Post-increment and post-decrement: the pointer is modified AFTER being used and NOT RESTORED to its previous value.
• [disp] specifies a number of elements; the element size is DW (64-bit), W (32-bit), H (16-bit) or B (8-bit).
• disp = R or 5-bit constant.
• R can be any register.
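The table can be summarised by two values per access: the address actually used and the pointer value left behind. The sketch below models this; `effective_address` and `ELEMENT_SIZE` are illustrative names, the displacement is taken as a plain integer, and 32-bit address wrap-around is ignored.

```python
ELEMENT_SIZE = {"B": 1, "H": 2, "W": 4, "DW": 8}   # bytes per element


def effective_address(mode, pointer, disp=1, element="H"):
    """Return (address used for the access, pointer value afterwards)."""
    step = disp * ELEMENT_SIZE[element]            # [disp] counts elements
    if mode == "*R":
        return pointer, pointer
    if mode == "*+R[disp]":
        return pointer + step, pointer             # pointer restored
    if mode == "*-R[disp]":
        return pointer - step, pointer             # pointer restored
    if mode == "*++R[disp]":
        return pointer + step, pointer + step      # modified before use
    if mode == "*--R[disp]":
        return pointer - step, pointer - step      # modified before use
    if mode == "*R++[disp]":
        return pointer, pointer + step             # modified after use
    if mode == "*R--[disp]":
        return pointer, pointer - step             # modified after use
    raise ValueError(f"unknown addressing mode: {mode}")


# A halfword load with post-increment uses 0x100 and leaves the pointer at 0x102.
print(effective_address("*R++[disp]", 0x100, 1, "H"))
```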
Modifying and testing the code
        MVKL  .S1   pt1, A5
        MVKH  .S1   pt1, A5
        MVKL  .S1   pt2, A6
        MVKH  .S1   pt2, A6
        MVKL  .S2   count, B0
loop    LDH   .D    *A5++, A0
        LDH   .D    *A6++, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .S    B0, 1, B0
 [B0]   B     .S    loop
This code now performs the following operations:
        a0*x0 + a1*x1 + a2*x2 + ... + aN*xN
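The effect of the post-increment can be seen by emulating the loop with and without it. The sketch below is a functional model only: pointers are list indices rather than byte addresses, instruction timing is ignored, and the accumulator is assumed to start at zero. The function name `sop_loop` and the sample data are assumptions made for this illustration.

```python
def sop_loop(a, x, count, post_increment=True):
    """Emulate the loop: A5/A6 index into a and x, B0 is the loop counter."""
    A5 = A6 = 0          # pointers, counted in elements rather than bytes
    A4 = 0               # accumulator Y, assumed cleared beforehand
    B0 = count           # loop counter, must be at least 1
    while True:
        A0 = a[A5]       # LDH *A5, A0
        A1 = x[A6]       # LDH *A6, A1
        if post_increment:
            A5 += 1      # the "++" of *A5++
            A6 += 1      # the "++" of *A6++
        A3 = A0 * A1     # MPY A0, A1, A3
        A4 = A4 + A3     # ADD A4, A3, A4
        B0 = B0 - 1      # SUB B0, 1, B0
        if B0 == 0:      # [B0] B loop: branch back only while B0 != 0
            break
    return A4


a, x = [1, 2, 3], [4, 5, 6]
print(sop_loop(a, x, 3, post_increment=False))  # repeats a0*x0 three times
print(sop_loop(a, x, 3))                        # a0*x0 + a1*x1 + a2*x2
```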
Store the final result
The result held in A4 is stored after the loop:
        STH   .D    A4, *A7
The pointer A7 has not been initialised, so two more instructions are added:
        MVKL  .S1   pt3, A7
        MVKH  .S1   pt3, A7
The pointer A7 is now initialised.
What is the initial value of A4?
A4 is used as an accumulator, so it needs to be reset to zero.
        MVKL  .S1   pt1, A5
        MVKH  .S1   pt1, A5
        MVKL  .S1   pt2, A6
        MVKH  .S1   pt2, A6
        MVKL  .S1   pt3, A7
        MVKH  .S1   pt3, A7
        MVKL  .S2   count, B0
        ZERO  .L    A4
loop    LDH   .D    *A5++, A0
        LDH   .D    *A6++, A1
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .S    B0, 1, B0
 [B0]   B     .S    loop
        STH   .D    A4, *A7
Increasing the processing power!
How can we add more processing power to this processor?
(1) Increase the clock frequency.
(2) Increase the number of processing units.
To increase the processing power, this processor has two sides (A and B, or 1 and 2).
[Figure: Register File A (A0 to A15, 32 bits) with units .S1, .M1, .L1 and .D1, and Register File B (B0 to B15, 32 bits) with units .S2, .M2, .L2 and .D2, both connected to the data memory.]
Can the two sides exchange operands in order to increase performance?
The answer is YES but there are limitations.
• To exchange operands between the two sides, some cross paths or links are required.
What is a cross path?
• A cross path links one side of the CPU to the other.
• There are two types of cross paths:
  - Data cross paths.
  - Address cross paths.
Data Cross Paths
• Data cross paths can also be referred to as register file cross paths.
• These cross paths allow operands from one side to be used by the other side.
• There are only two cross paths:
  - one path which conveys data from side B to side A, 1X.
  - one path which conveys data from side A to side B, 2X.
TMS320C67x Data-Path
[Figure: TMS320C67x data path.]
Data Cross Paths
• Data cross paths only apply to the .L, .S and .M units.
• The data cross paths are very useful; however, there are some limitations in their use.
Data Cross Path Limitations
[Figure: the .L1, .M1 and .S1 units write <dst> to side A and take their <src> operands from side A or, through the 1x cross path, from side B; the 2x path runs in the opposite direction.]
(1) The destination register must be on the same side as the unit.
(2) Source registers: up to one cross path per execute packet per side.
Execute packet: group of instructions that execute simultaneously.
e.g.:
        ADD   .L1x  A0,A1,B2
        MPY   .M1x  A0,B6,A9
        SUB   .S1x  A8,B2,A8
     || ADD   .L1x  A0,B0,A2
|| means that the SUB and ADD belong to the same execute packet, and therefore execute simultaneously.
NOT VALID! The first ADD writes to B2, which is not on the same side as .L1 (rule 1), and the SUB and ADD in the same execute packet both need the single 1x cross path (rule 2).
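The two rules lend themselves to a mechanical check. The following sketch assumes a simplified representation in which every instruction of an execute packet is a tuple of unit name, list of source registers and destination register; constants are left out, and `check_execute_packet` is a hypothetical helper, not an assembler feature.

```python
def side_of_unit(unit):
    """'.L1' -> 'A', '.S2' -> 'B' (side 1 is A, side 2 is B)."""
    return "A" if unit.endswith("1") else "B"


def check_execute_packet(packet):
    """packet: list of (unit, source registers, destination); return error list."""
    errors = []
    cross_users = {"A": 0, "B": 0}
    for unit, sources, dst in packet:
        side = side_of_unit(unit)
        if dst[0] != side:                          # rule (1)
            errors.append(f"{unit}: destination {dst} is not on side {side}")
        if any(src[0] != side for src in sources):  # uses the cross path
            cross_users[side] += 1
    for side, count in cross_users.items():
        if count > 1:                               # rule (2)
            errors.append(f"side {side}: {count} instructions need the cross path")
    return errors


print(check_execute_packet([(".L1", ["A0", "A1"], "B2")]))
print(check_execute_packet([(".M1", ["A0", "B6"], "A9")]))
print(check_execute_packet([(".S1", ["A8", "B2"], "A8"),
                            (".L1", ["A0", "B0"], "A2")]))
```

Only the MPY passes; the other two cases are the ones rejected in the example.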
Data Cross Paths for both sides
[Figure: .L1, .M1 and .S1 write <dst> to side A, and .L2, .M2 and .S2 write <dst> to side B; side A units can take a <src> from side B through 1x, and side B units can take a <src> from side A through 2x.]
Address cross paths
(1) The pointer must be on the same side as the unit.
        LDW.D1T1    *A0,A5
        STW.D1T1    A5,*A0
Load or store to either side
[Figure: .D1 takes its address from *A0; the data can go to A5 on side A (DA1 = T1) or to B5 on side B (DA2 = T2).]
        LDW.D1T1    *A0,A5
        LDW.D1T2    *A0,B5
Standard Parallel Loads
[Figure: .D1 addresses memory through *A0 and .D2 through *B0; DA1 = T1 serves side A and DA2 = T2 serves side B.]
        LDW.D1T1    *A0,A5
     || LDW.D2T2    *B0,B5
Parallel Load/Store using address cross paths
        LDW.D1T2    *A0,B5
     || STW.D2T1    A5,*B0
Fill the blanks ... Does this work?
        LDW.D1__    *A0,B5
     || STW.D2__    B6,*B0
Not Allowed!
Parallel accesses: both cross or neither cross. Here B5 and B6 are both on side B, so both instructions would need T2:
        LDW.D1T2    *A0,B5
     || STW.D2T2    B6,*B0
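The "both cross or neither cross" rule can be expressed as a short check. The sketch assumes that one access uses .D1 with a pointer on side A and the other uses .D2 with a pointer on side B, and that the T1 or T2 path is selected by the side of the data register, as in the examples above. The helper names are illustrative.

```python
def data_path(data_register):
    """T1 carries data for side A registers, T2 for side B registers."""
    return "T1" if data_register.startswith("A") else "T2"


def crosses(pointer, data_register):
    """An access crosses when pointer and data register are on different sides."""
    return pointer[0] != data_register[0]


def parallel_access_ok(first, second):
    """Each access is a (pointer register, data register) pair."""
    return crosses(*first) == crosses(*second)


standard = (("A0", "A5"), ("B0", "B5"))   # LDW.D1T1 || LDW.D2T2
crossed = (("A0", "B5"), ("B0", "A5"))    # LDW.D1T2 || STW.D2T1
invalid = (("A0", "B5"), ("B0", "B6"))    # both would need T2

for pair in (standard, crossed, invalid):
    paths = [data_path(data) for _, data in pair]
    print(paths, parallel_access_ok(*pair))
```

In the rejected case both accesses resolve to T2, which is another way of stating the conflict.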
Conditions Don't Use Cross Paths
• If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.
• Examples:
 [B2]   ADD   .L1   A2,A0,A4
 [A1]   LDW   .D2   *B0,B5
'C62x Data-Path Summary
[Figure: full 'C62x CPU data path; see the CPU Reference Guide, page 2-2.]
'C67x Data-Path Summary
[Figure: full 'C67x CPU data path.]
Cross Paths - Summary
• Data
  - Destination register on same side as unit.
  - Source registers: up to one cross path per execute packet per side.
  - Use "x" to indicate cross-path.
• Address
  - Pointer must be on same side as unit.
  - Data can be transferred to/from either side.
  - Parallel accesses: both cross or neither cross.
• Conditionals don't use cross paths.
Code Review (using side A only)
$Y = \sum_{n=1}^{40} a_n x_n$
        MVK   .S1   40, A2          ; A2 = 40, loop count
loop:   LDH   .D1   *A5++, A0       ; A0 = a(n)
        LDH   .D1   *A6++, A1       ; A1 = x(n)
        MPY   .M1   A0, A1, A3      ; A3 = a(n) * x(n)
        ADD   .L1   A3, A4, A4      ; Y = Y + A3
        SUB   .L1   A2, 1, A2       ; decrement loop count
 [A2]   B     .S1   loop            ; if A2 != 0, branch
        STH   .D1   A4, *A7         ; *A7 = Y
Note: Assume that A4 was previously cleared and the pointers are initialised.
Let us have a look at the final details concerning the functional units. Consider first the case of the .L and .S units.
Operands - 32/40-bit Register, 5-bit Constant
• Operands can be:
  - 5-bit constants (or 16-bit for MVKL and MVKH).
  - 32-bit registers.
  - 40-bit registers.
• However, we have seen that registers are only 32 bits wide. So where do the 40-bit registers come from?
• A 40-bit register can be obtained by concatenating two registers.
• However, there are 3 conditions that need to be respected:
  - The registers must be from the same side.
  - The first register must be even and the second odd.
  - The registers must be consecutive.
Operands - 32/40-bit Register, 5-bit Constant
• All combinations of 40-bit registers are shown below. A pair is written odd:even; the odd register supplies the upper 8 bits and the even register the lower 32 bits.

Side A       Side B
A1:A0        B1:B0
A3:A2        B3:B2
A5:A4        B5:B4
A7:A6        B7:B6
A9:A8        B9:B8
A11:A10      B11:B10
A13:A12      B13:B12
A15:A14      B15:B14
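The pairing rule (same side, even register followed by the next odd one) and the 8 + 32 bit split can be sketched as follows. `register_pairs` and `read_40bit` are illustrative helper names, and register contents are modelled as plain integers.

```python
def register_pairs(side, count=16):
    """List the valid odd:even pairs of one register file, e.g. 'A1:A0'."""
    return [f"{side}{even + 1}:{side}{even}" for even in range(0, count, 2)]


def read_40bit(odd_value, even_value):
    """Combine the low 8 bits of the odd register with the 32-bit even register."""
    return ((odd_value & 0xFF) << 32) | (even_value & 0xFFFFFFFF)


print(register_pairs("A"))
print(hex(read_40bit(0x12, 0x89ABCDEF)))
```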
Operands - 32/40-bit Register, 5-bit Constant
        instr .unit <src>, <src>, <dst>
[Figure: an .L or .S unit takes one <src> that is a 32-bit register or a 5-bit constant and one <src> that is a 32-bit or 40-bit register, and writes <dst> to a 32-bit or 40-bit register.]
Examples:
        OR.L1       A0, A1, A2
        ADD.L2      -5, B3, B4
        ADD.L1      A2, A3, A5:A4
        SUB.L1      A2, A5:A4, A5:A4
        ADD.L2      3, B9:B8, B9:B8
Register to register data transfer
• To move the content of a register (A or B) to another register (B or A) use the move "MV" instruction, e.g.:
        MV          A0, B0
        MV          B6, B7
• To move the content of a control register to another register (A or B) or vice-versa use the MVC instruction, e.g.:
        MVC         IFR, A0
        MVC         A0, IRP
TMS320C6000 Instruction Set
'C62x Instruction Set (by category)
Arithmetic:    ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO
Logical:       AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR
Bit Mgmt:      CLR, EXT, LMBD, NORM, SET
Data Mgmt:     LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W
Program Ctrl:  B, IDLE, NOP
Note: Refer to the 'C6000 CPU Reference Guide for more details.
'C62x Instruction Set (by unit)
.S Unit:  ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.L Unit:  ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.M Unit:  MPY, MPYH, SMPY, SMPYH
.D Unit:  ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO
Other:    NOP, IDLE
Note: Refer to the 'C6000 CPU Reference Guide for more details.
'C6700: Superset of Fixed-Point (by unit)
.S Unit
  Fixed-point:     ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
  Floating-point:  ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP
.L Unit
  Fixed-point:     ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
  Floating-point:  ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP
.M Unit
  Fixed-point:     MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
  Floating-point:  MPYSP, MPYDP, MPYI, MPYID
.D Unit:           ADD, ADDAB (B/H/W), ADDAD, LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO
No Unit Used:      NOP, IDLE
Note: Refer to the 'C6000 CPU Reference Guide for more details.
Superset of Fixed-Point
[Figure: CPU block diagram with instruction fetch, instruction dispatch (advanced instruction packing), instruction decode, control registers, interrupt control and emulation (advanced emulation) blocks; register files A0 - A15 and B0 - B15, extended by A16 - A31 and B16 - B31; functional units L1, S1, M1, D1 and D2, M2, S2, L2.]
'C62x: Dual 32-Bit Load/Store
'C64x: Dual 64-Bit Load/Store
'C67x: Dual 64-Bit Load/32-Bit Store
'C64x: Superset of 'C62x
.S Unit
  Dual/Quad Arith:  SADD2, SADDUS2, SADD4
  Bitwise Logical:  ANDN
  Shifts & Merge:   SHR2, SHRU2, SHLMB, SHRMB
  Compares:         CMPEQ2, CMPEQ4, CMPGT2, CMPGT4
  Branches/PC:      BDEC, BPOS, BNOP, ADDKPC
  Data Pack/Un:     PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4
.D Unit
  Dual Arithmetic:  ADD2, SUB2
  Bitwise Logical:  AND, ANDN, OR, XOR
  Load Constant:    MVK (5-bit)
  Address Calc.:    ADDAD
  Mem Access:       LDDW, LDNW, LDNDW, STDW, STNW, STNDW
.L Unit
  Dual/Quad Arith:  ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
  Bitwise Logical:  ANDN
  Shift & Merge:    SHLMB, SHRMB
  Load Constant:    MVK (5-bit)
  Data Pack/Un:     PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4
.M Unit
  Average:          AVG2, AVG4
  Shifts:           ROTL, SSHVL, SSHVR
  Multiplies:       MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4
  Bit Operations:   BITC4, BITR, DEAL, SHFL
  Move:             MVD
TMS320C6000 Memory
Memory size per device

Devices                       Internal                                  EMIFA                        EMIFB
C6201, C6204, C6701, C6205    P = 64 kB, D = 64 kB                      52M Bytes (32-bits wide)     N/A
C6202                         P = 256 kB, D = 128 kB                    52M Bytes (32-bits wide)     N/A
C6203                         P = 384 kB, D = 512 kB                    52M Bytes (32-bits wide)     N/A
C6211, C6711                  L1P = 4 kB, L1D = 4 kB, L2 = 64 kB        128M Bytes (32-bits wide)    N/A
C6712                         L1P = 4 kB, L1D = 4 kB, L2 = 64 kB        64M Bytes (16-bits wide)     N/A
C6713                         L1P = 4 kB, L1D = 4 kB, L2 = 256 kB       128M Bytes (32-bits wide)    N/A
C6411, DM642                  L1P = 16 kB, L1D = 16 kB, L2 = 256 kB     128M Bytes (32-bits wide)    N/A
C6414, C6415, C6416           L1P = 16 kB, L1D = 16 kB, L2 = 1 MB       256M Bytes (64-bits wide)    64M Bytes (16-bits wide)
Internal Memory Summary
Devices                Internal (L2)   External
C6211, C6711           64 kB           512M (32-bit wide)
C6712                  64 kB           512M (16-bit wide)
C6713                  256 kB          512M (32-bit wide)

Devices                Internal (L2)   External
C6414, C6415, C6416    1 MB            A: 1GB (64-bit), B: 256MB (16-bit)
DM642                  256 kB          1GB (64-bit)
C6411                  256 kB          256MB (32-bit)
LINK: TMS320C6000 DSP Generation
TMS320C6000 Peripherals
'C6x System Block Diagram
[Figure: the CPU (functional units .D1 .D2, .M1 .M2, .L1 .L2, .S1 .S2; register files A0-A15 and B0-B15; control registers) connected through the internal buses to Memory, External Memory and the Peripherals.]
'C6x Internal Buses
[Figure: the CPU (PC, A regs, B regs) and the DMA connect to Internal Memory, the External Interface and the Peripherals. Each block has an address (A) and a data (D) port; the External Interface and Peripherals ports are marked x32.]
Bus                 Width
Program Addr        x32
Program Data        x256
Data Addr - T1      x32
Data Data - T1      x32/64
Data Addr - T2      x32
Data Data - T2      x32/64
DMA Addr - Read     x32
DMA Data - Read     x32
DMA Addr - Write    x32
DMA Data - Write    x32
'C67x can perform 64-bit data loads.
'C6x System Block Diagram
[Figure: the CPU (functional units, register files A0-A15 and B0-B15, control registers) connected through the internal buses to Memory, the Peripherals, and external memory through the EMIF.]
'C6x System Block Diagram
[Figure: the CPU connected through the internal buses (Addr, D (32)) to Program RAM, Data RAM, the Peripherals, and the EMIF, which serves async and sync external memory.]
'C6000 Peripherals
[Figure: the CPU (functional units .D1 .D2, .M1 .M2, .L1 .L2, .S1 .S2; Register Set A; Register Set B) and Internal Memory sit on the internal buses together with the peripherals: EMIF (External Memory), Parallel Comm, GPIO, Serial, Ethernet, Video Ports, VCP / TCP, Timers, DMA / EDMA (Boot) and PLL.]
EMIF
External Memory Interface (EMIF)
- Glueless access to async/sync memory
- Works with PC100/133 SDRAM (cheap, fast, and easy!)
- Byte-wide data access
- 16, 32, or 64-bit bus widths
[Figure: the EMIF connects the internal buses to Async, SDRAM and SBSRAM external memory.]
HPI / XBUS / PCI
Parallel Communication Interfaces
HPI:
- Dedicated, slave-only, async 16/32-bit bus allows host-µP access to C6000 memory
XBUS: Similar to HPI but provides ...
- Master/slave and sync modes
- Glueless i/f to FIFOs (up to single-cycle CPU xfer rate)
PCI:
- Standard 32-bit, 33MHz/66MHz PCI interface
These interfaces provide a means to bootstrap the C6000.
GPIO
General Purpose Input/Output (GPIO)
- C64x and C6713 provide 8-16 bits of general purpose bit I/O
- Use to observe or control the signal of a single pin
McBSP and Utopia
Multi-Channel Buffered Serial Port (McBSP)
- 2 (or 3) full-duplex, synchronous serial ports
- Up to 100 Mb/sec performance
- Supports multi-channel operation (T1, E1, MVIP, ...)
Multi-Channel Audio Serial Port (McASP)
- McBSP features plus more ...
- Up to 8 stereo lines (16 channels)
- IIC support
- On DM642, C6713
Utopia (C64x)
- ATM connection
- 50 MHz wide area network connectivity
DMA / EDMA
Direct Memory Access (DMA / EDMA)
- Transfers any set of memory locations to another
- 4 / 16 / 64 channels (transfer parameter sets)
- Transfers can be triggered by any interrupt (sync)
- Operates independent of CPU
- On reset, provides bootstrap from memory
Timer/Counter
Timer / Counter
- Two (or three) 32-bit timer/counters
- Can generate interrupts
- Both input and output pins
Ethernet MAC
Ethernet MAC (DM642 only)
- 10/100 Ethernet
- Pins are muxed with PCI
- TCP/IP stack available from TI
Video Ports
Video Ports (DM642 only)
- Each configurable for Capture or Display
- Dual 8/10-bit BT656 or raw modes
- 16/20-bit raw modes and 20-bit Y/C for high definition
- Horz Scaling and Chroma Resampling Support for 8-bit modes
- Supports transport interface mode
VCP / TCP -- 3G Wireless
Turbo Coprocessor (TCP) (C6416 only)
- Supports 35 data channels at 384 kbps
- 3GPP / IS2000 Turbo coder
- Programmable parameters include mode, rate and frame length
Viterbi Coprocessor (VCP) (C6416 only)
- Supports > 500 voice channels at 8 kbps
- Programmable decoder parameters include constraint length, code rate, and frame length
Phase Locked Loop (PLL)
- Clock multiplier
- Reduces EMI and cost
- Rate is pin selectable
Input
- CLKIN
Output
- CLKOUT1
- CLKOUT2 (reduced rate clkout)
Clock Cycle
What is a clock cycle?
The time between successive instructions.
[Figure: C6000: CLKIN -> PLL -> CLKOUT1 (C6000 clock cycle) and CLKOUT2 (½, ¼, or 1/6 CLKOUT1).]
When we talk about cycles ...
CLKIN (MHz)   PLL Rate   CPU Clock Frequency   CPU Clock Cycle Time   MIPs (max)
60            x12        720 MHz               1.39 ns                5760
30            x10        300 MHz               3.33 ns                2400
50            x4         200 MHz               5 ns                   1600
25            x4         100 MHz               10 ns                  800
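The figures in the clock table can be reproduced from three relations: CPU clock = CLKIN x PLL rate, cycle time = 1 / CPU clock, and peak MIPs = 8 x CPU clock in MHz. The factor 8 is taken from the exam answer in this chapter ("1200 MIPs = 8 instructions (units) x 150 MHz"). The following is a minimal sketch only; the function name `clock_figures` is hypothetical and not part of any TI tool.

```python
def clock_figures(clkin_mhz, pll_rate, units=8):
    """Return (cpu_mhz, cycle_ns, peak_mips) for a CLKIN / PLL setting.

    Assumes one instruction per functional unit per cycle at peak,
    i.e. 8 instructions per cycle.
    """
    cpu_mhz = clkin_mhz * pll_rate      # CLKOUT1 frequency in MHz
    cycle_ns = 1000.0 / cpu_mhz         # one clock cycle in nanoseconds
    peak_mips = cpu_mhz * units         # peak instruction rate
    return cpu_mhz, cycle_ns, peak_mips


# The four (CLKIN, PLL rate) pairs from the table
for clkin, rate in [(60, 12), (30, 10), (50, 4), (25, 4)]:
    mhz, ns, mips = clock_figures(clkin, rate)
    print(f"CLKIN {clkin} MHz x{rate}: {mhz} MHz, {ns:.2f} ns, {mips} MIPs")
```

The MIPs figure is a peak value: it is reached only when all eight units are busy on every cycle.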
'C6000 Peripherals Summary
[Figure: the CPU (functional units .D1 .D2, .M1 .M2, .L1 .L2, .S1 .S2; Register Set A; Register Set B) and Internal Memory on the internal buses, surrounded by EMIF (External Memory), Parallel Comm, GPIO, Serial, Ethernet, Video Ports, VCP / TCP, Timers, DMA / EDMA (Boot) and PLL.]
'C6x Family Part Numbering
Example = TMS320LC6201PKGA200
TMS320  = TI DSP
L       = Place holder for voltage levels
C6      = C6x family
2       = Fixed-point core
01      = Memory/peripheral configuration
PKG     = Pkg designator (actual letters TBD)
A       = -40 to 85C (blank for 0 to 70C)
200     = Core CPU speed in MHz
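The field breakdown of the example part number can be sketched as a fixed-width split. This is an illustration for the example string only: the field names and widths below are assumptions read off the slide, and real part numbers may omit fields (the slide notes that the temperature letter is blank for 0 to 70C), which this sketch does not handle.

```python
# (field name, width in characters); names and widths are assumptions
# taken from the slide's example, not a TI specification.
FIELDS = [
    ("prefix", 6),    # TMS320 = TI DSP
    ("voltage", 1),   # L      = place holder for voltage levels
    ("family", 2),    # C6     = C6x family
    ("core", 1),      # 2      = fixed-point core
    ("config", 2),    # 01     = memory/peripheral configuration
    ("package", 3),   # PKG    = package designator
    ("temp", 1),      # A      = -40 to 85C
]


def split_part_number(part):
    """Split a part number laid out exactly like the slide's example."""
    fields = {}
    pos = 0
    for name, width in FIELDS:
        fields[name] = part[pos:pos + width]
        pos += width
    fields["speed_mhz"] = int(part[pos:])   # remaining digits: CPU speed
    return fields


example = split_part_number("TMS320LC6201PKGA200")
print(example)
```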
Module 1 Exam
1. Functional Units
a. How many can perform an ADD? Name them.
b. Which support memory loads/stores? (.M, .S, .D, .L)
2. Memory Map
a. How many external ranges exist on 'C6201?
3. Conditional Code
a. Which registers can be used as cond'l registers?
b. Which instructions can be conditional?
4. Performance
a. What is the 'C6711 instruction cycle time?
b. How can the 'C6711 execute 1200 MIPs?
5. Coding Problems
a. Move contents of A0-->A1
b. Move contents of CSR-->A1
c. Clear register A5
d. A2 = A0^2 + A1
e. If (B1 ≠ 0) then B2 = B5 * B6
f. A2 = A0 * A1 + 10
g. Load an unsigned constant (19ABCh) into register A6.
h. Load A7 with contents of mem1 and postincrement the selected pointer.
Module 1 Exam (solution)
1. Functional Units
a. How many can perform an ADD? Name them.
   six; .L1, .L2, .D1, .D2, .S1, .S2
b. Which support memory loads/stores? (.M, .S, .D, .L)
   .D
2. Memory Map
a. How many external ranges exist on 'C6201?
   Four
3. Conditional Code
a. Which registers can be used as cond'l registers?
   A1, A2, B0, B1, B2
b. Which instructions can be conditional?
   All of them
4. Performance
a. What is the 'C6711 instruction cycle time?
   CLKOUT1
b. How can the 'C6711 execute 1200 MIPs?
   1200 MIPs = 8 instructions (units) x 150 MHz
5. Coding Problems
a. Move contents of A0-->A1
        MV    .L1   A0, A1
   or   ADD   .S1   A0, 0, A1
   or   MPY   .M1   A0, 1, A1      (what's the problem with this? A0 can only be a 16-bit value)
b. Move contents of CSR-->A1
        MVC         CSR, A1
c. Clear register A5
        ZERO  .S1   A5
   or   SUB   .L1   A5, A5, A5
   or   MPY   .M1   A5, 0, A5
   or   CLR   .S1   A5, 0, 31, A5
   or   MVK   .S1   0, A5
   or   XOR   .L1   A5, A5, A5
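The remark on solution 5a ("A0 can only be a 16-bit value") follows from MPY being a 16-bit by 16-bit multiply: only the 16 least significant bits of each operand take part. A small Python model, offered as a sketch, makes the loss visible. The helper names `to_signed16` and `mpy` are hypothetical, and the model ignores pipeline timing.

```python
def to_signed16(x):
    """Interpret the 16 least significant bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mpy(src1, src2):
    """Model of MPY: signed 16 LSBs x signed 16 LSBs -> 32-bit result."""
    return (to_signed16(src1) * to_signed16(src2)) & 0xFFFFFFFF


a0 = 0x00012345                 # needs more than 16 bits
a1 = mpy(a0, 1)                 # "move" via multiply by 1
print(hex(a0), "->", hex(a1))   # the upper half of A0 is not copied
```

MV and ADD with 0 operate on the full 32-bit register, so they do not have this restriction.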
5. Coding Problems (cont'd)
d. A2 = A0^2 + A1
        MPY   .M1   A0, A0, A2
        ADD   .L1   A2, A1, A2
e. If (B1 ≠ 0) then B2 = B5 * B6
   [B1] MPY   .M2   B5, B6, B2
f. A2 = A0 * A1 + 10
        MPY         A0, A1, A2
        ADD         10, A2, A2
g. Load an unsigned constant (19ABCh) into register A6.
        mvkl  .s1   0x00019abc, a6
        mvkh  .s1   0x00019abc, a6
   or, using a symbol:
value   .equ        0x00019abc
        mvkl  .s1   value, a6
        mvkh  .s1   value, a6
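Solution 5g needs two instructions because each one carries only 16 bits of the constant. MVKL writes the lower 16 bits and sign-extends them into the upper half; MVKH then overwrites the upper 16 bits and leaves the lower half alone. The Python model below is a sketch of that behaviour; the function names simply mirror the mnemonics and are not a TI API.

```python
def mvkl(cst):
    """Model of MVKL: lower 16 bits of cst, sign-extended to 32 bits."""
    low = cst & 0xFFFF
    if low & 0x8000:
        low |= 0xFFFF0000
    return low


def mvkh(cst, reg):
    """Model of MVKH: upper 16 bits of cst replace the upper half of reg."""
    return (cst & 0xFFFF0000) | (reg & 0x0000FFFF)


value = 0x00019ABC
a6 = mvkl(value)          # upper half is now sign extension, not 0x0001
after_mvkl = a6
a6 = mvkh(value, a6)      # upper half corrected
print(hex(after_mvkl), hex(a6))
```

The order matters: running MVKH first and MVKL second would let the sign extension of MVKL destroy the upper half again.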
5. Coding Problems (cont'd)
h. Load A7 with contents of mem1 and postincrement the selected pointer.
[Figure: x16 memory; location mem1 holds 10h, which is loaded into A7.]
load_mem1:
        MVKL  .S1   mem1, A6
        MVKH  .S1   mem1, A6
        LDH   .D1   *A6++, A7
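The effect of `LDH .D1 *A6++, A7` in solution 5h can be modelled in a few lines of Python: read a signed 16-bit halfword at the address in the pointer, then advance the pointer by one halfword (2 bytes). This is a sketch under stated assumptions: memory is a little-endian `bytearray`, mem1 is placed at address 0 for illustration, and pipeline timing is ignored. The name `ldh_postinc` is hypothetical.

```python
import struct


def ldh_postinc(memory, ptr):
    """Model of LDH *ptr++, dst: return (loaded value, updated pointer)."""
    (value,) = struct.unpack_from("<h", memory, ptr)  # signed 16-bit load
    return value, ptr + 2                             # post-increment by 2 bytes


memory = bytearray(8)                      # tiny x16 memory, all zeros
struct.pack_into("<h", memory, 0, 0x10)    # mem1 (address 0 here) holds 10h

a6 = 0                                     # MVKL/MVKH mem1, A6
a7, a6 = ldh_postinc(memory, a6)           # LDH .D1 *A6++, A7
print(hex(a7), a6)
```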
Architecture
Links:
- C6711 data sheet: tms320c6711.pdf
- C6713 data sheet: tms320c6713.pdf
- C6416 data sheet: tms320c6416.pdf
- User guide: spru189f.pdf
- Errata: sprz173c.pdf
Chapter 2
TMS320C6000 Architectural
Overview
- End -
