Data Manipulation, Department of Computer Science, National Tsing Hua University, CS1356, 2016/5/23

CS1356 Introduction to Computer Science
Data Manipulation
Department of Computer Science, National Tsing Hua University
2016/5/23
What is a Computer?
• Monitor, case, keyboard, mouse, speaker,
scanner, webcam, printer, …
What’s inside?
2
Inside the Case
• CPU, motherboard, adapters, hard disk,
memory, CD-ROM, …
We are going to talk about those.
3
Central Processing Unit (CPU)
• An electronic circuit that can execute
computer programs
– Intel i7
– AMD K10
– IBM Cell
– ARM Acorn
– Sun SPARC
• To understand the CPU, we first need to know
what computer programs are.
4
Outline
• Stored-program concept
• Machine language
• Program execution
• Peripheral devices
• Parallel architectures
5
Stored Program Concept
(p. 102)
"The final major step in the development of
the general purpose electronic computer
was the idea of a stored program..."
Brian Randell
6
What Are the Differences?
TV: you can watch different channels.
Bread machine: you can make different foods.
Swiss Army knife: you can use different tools.
Computer: you can …
7
Magic Box
• You can add more functions to it. How?
– A program is like data that is input to the computer.
• It can perform multiple functions at a time.
– We will talk about this in the OS lesson.
8
A Generic Recipe
• Ingredients: a, b, c, … (different for each dish)
• Tools: pot, stove, knife, … (the same for every dish)
• Basic operations: cut, wash, stir-fry, boil, … (the same for every dish)
• Procedure (different for each dish):
– A sequence of instructions using basic operations on ingredients or intermediate products.
• Output: dish x
Can we treat the procedure as an input, just like the ingredients?
9
First Try
• E.g., a bread machine:
fixed procedures to choose from
10
What About a Programmable Machine?
• How do we tell the machine to carry out a
procedure that we invent?
11
Ideal: A Universal Cooking
Machine
• Input:
– Ingredient a, b, c, …
– Instruction 1, 2, 3, …
• Tool: universal cooking machine
– Can read instructions and execute them step
by step → programmable.
– Has all the tools and the ability to perform the basic
operations.
• Output: dish x
12
Analogy in Computers
• A bread-machine-like computer
[Figure: fixed functions, "Play MP3" and "Play movie", each working on its own data in memory]
13
Analogy in Computers
• A universal-cooking-machine-like computer
[Figure: instructions for playing MP3, instructions for playing movies, and data, all stored in the same memory]
14
For Computers
• Data are stored as 0s and 1s
• Instructions are also expressed and stored
as 0s and 1s
• Why not put them together?
→ in memory
15
Stored-Program Concept
• Program: a sequence of instructions
• Stored-program concept:
– A program can be encoded as bit patterns
and stored in main memory, just like data.
– From there, the CPU can fetch the
instructions and execute them.
• Advantage: programmable
– We can use a single machine to perform
different functions by loading different programs.
16
A Stored-Program
Universal Cooking Machine?
Problems
• How do we convert instructions into operations?
– This is like a spell in Harry Potter.
• There should be a control unit.
– To control which function to perform.
– To control which data to operate on.
– How can the control unit understand the
instructions?
• What function units should be included?
– CD players, game consoles, calculators, …?
18
Outline of the Magic Box
[Figure: a processing unit and a control unit, connected by a belt to a storage unit for instructions and data]
19
von Neumann Architecture
• General-purpose electronic computer
[Fig. 2.1: processing tools and a small but fast temporary storage (registers) inside the CPU, connected to a large temporary storage for data and instructions (main memory)]
20
Machine Language
(Sec. 2.2)
What to do + specified information
21
Computer Programs
High-level language program (you are learning it in CS1355):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  ↓ Compiler
Assembly language program (you will learn it in CS2410):
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  ↓ Assembler
Machine language program (we are going to talk about this):
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine interpretation
Control signal specification (this will be taught in CS4100):
    ALUOP[0:3] <= InstReg[9:11] & MASK
22
Example: a = b + c
[Fig. 2.2: adding values stored in memory; with b = 1 and c = 2, the result a = 1 + 2 = 3 is stored back in memory]
23
Represented by Instructions
(Fig. 2.7)
24
Instruction Format
• Store the data in register 5 in the memory
cell at address A7
Op-code: Specifies which
operation to execute
Operand: Gives more
detailed information about
the operation
(Fig. 2.5, 2.6)
25
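For illustration, the store on this slide would be encoded in the textbook's machine language (the op-code table at the end of these notes) as 35A7: op-code 3 (STORE) and operand 5A7 (register 5, address A7). The C sketch below splits a 16-bit instruction into these hex-digit fields with shifts and masks. It assumes that machine's layout of one 4-bit op-code followed by three 4-bit operand digits; the names `Fields` and `split_instruction` are hypothetical.

```c
#include <stdint.h>

/* Fields of a 16-bit instruction in the textbook machine's format:
   a 4-bit op-code followed by a 12-bit operand (three hex digits). */
typedef struct {
    uint8_t opcode; /* 1st hex digit: which operation to execute            */
    uint8_t r;      /* 2nd hex digit: often a register number               */
    uint8_t x, y;   /* 3rd and 4th hex digits                               */
    uint8_t xy;     /* 3rd and 4th digits read as one byte, e.g. an address */
} Fields;

Fields split_instruction(uint16_t inst) {
    Fields f;
    f.opcode = (inst >> 12) & 0xF;  /* keep the top four bits       */
    f.r      = (inst >> 8) & 0xF;   /* next four bits               */
    f.x      = (inst >> 4) & 0xF;   /* next four bits               */
    f.y      = inst & 0xF;          /* lowest four bits             */
    f.xy     = inst & 0xFF;         /* lowest eight bits            */
    return f;
}
```

For example, `split_instruction(0x35A7)` yields op-code 3, register 5, and address A7; the jump instruction B258 on the next slide splits into op-code B, register 2, and address 58.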
Another Example
• JUMP to instruction at address 58H if the
content of register 2 is the same as that of
register 0
(Fig. 2.9)
26
Instruction Repertoire
• Which instructions should be included?
• For example, swapping v[k] and v[k+1]:
– Complex Instruction Set Computing (CISC): create a new instruction, called swp, which swaps the data at two memory addresses.
    swp 0($2), 4($2)
– Reduced Instruction Set Computing (RISC): use load and store instructions.
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
27
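The trade-off can be made concrete with a toy model in C. This is a minimal sketch under stated assumptions: memory is an array of 32-bit words addressed by byte offsets (as in `0($2)` and `4($2)`), registers are a plain array, and `swp` is the hypothetical CISC instruction from the slide, not a real MIPS instruction. The names `Machine`, `lw`, `sw`, `swap_risc`, and `swp` are illustrative.

```c
#include <stdint.h>

/* Toy machine: 64 words of memory and 32 registers ($0..$31). */
typedef struct {
    int32_t mem[64];
    int32_t reg[32];
} Machine;

/* Translate a byte address into the word it names. */
static int32_t *word(Machine *m, int32_t byte_addr) {
    return &m->mem[byte_addr / 4];
}

/* RISC style: only loads and stores touch memory. */
void lw(Machine *m, int rt, int offset, int base) {
    m->reg[rt] = *word(m, m->reg[base] + offset);   /* register <- memory */
}
void sw(Machine *m, int rt, int offset, int base) {
    *word(m, m->reg[base] + offset) = m->reg[rt];   /* memory <- register */
}

void swap_risc(Machine *m) {
    lw(m, 15, 0, 2);   /* lw $15, 0($2) */
    lw(m, 16, 4, 2);   /* lw $16, 4($2) */
    sw(m, 16, 0, 2);   /* sw $16, 0($2) */
    sw(m, 15, 4, 2);   /* sw $15, 4($2) */
}

/* CISC style: one (hypothetical) instruction does the whole swap. */
void swp(Machine *m, int off1, int off2, int base) {
    int32_t *a = word(m, m->reg[base] + off1);
    int32_t *b = word(m, m->reg[base] + off2);
    int32_t t = *a;
    *a = *b;
    *b = t;
}
```

Both versions leave memory in the same state; the RISC version needs four simple instructions and two extra registers, while the CISC version hides the same work inside one more complex instruction.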
Instruction Types
• Data transfer
– Copy data between CPU and main memory
– E.g., LOAD, STORE, device I/O, etc.
• Control
– Direct the execution of the program
– E.g., JUMP, BRANCH, JNE (conditional jump), etc.
• Arithmetic/logic
– Use existing data values to compute a new value
– E.g., AND, OR, XOR, SHIFT, ROTATE, etc.
28
Instruction Types
[Figure: the instructions of a program labeled by type, in order: Data transfer, Data transfer, Arithmetic/Logic, Data transfer, Control]
29
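As a small sketch, the three categories can be read directly off the op-code of the textbook machine (the table at the end of these notes): LOAD, STORE, and MOVE transfer data; ADD, OR, AND, XOR, and ROTATE are arithmetic/logic; JUMP and HALT are control. The enum and function names below are hypothetical.

```c
#include <stdint.h>

typedef enum { DATA_TRANSFER, ARITHMETIC_LOGIC, CONTROL, UNKNOWN } InstType;

/* Classify a 16-bit instruction by its op-code (the first hex digit). */
InstType instruction_type(uint16_t inst) {
    switch ((inst >> 12) & 0xF) {
    case 0x1: case 0x2: case 0x3: case 0x4:
        return DATA_TRANSFER;      /* LOAD, LOAD, STORE, MOVE            */
    case 0x5: case 0x6: case 0x7: case 0x8: case 0x9: case 0xA:
        return ARITHMETIC_LOGIC;   /* ADD, ADD (float), OR, AND, XOR, ROTATE */
    case 0xB: case 0xC:
        return CONTROL;            /* JUMP, HALT                         */
    default:
        return UNKNOWN;            /* op-codes 0, D, E, F are not defined */
    }
}
```

Applied to 156C, 166D, 5056, 306E, C000 (the a = b + c program traced on the following slides), it returns data transfer, data transfer, arithmetic/logic, data transfer, control.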
Program Execution
(Sec. 2.3)
30
Program Execution Cycle
31
How to Make a Program
“Run”?
(Fig. 2.10)
32
Instruction Fetch
(Fig. 2.11)
33
Processor Architecture
[Figure: a processor (function unit; registers 0 to F; program counter; instruction register; controller) connected to memory by an address bus and a data bus. Memory cells A0 to A9 hold the program 15 6C 16 6D 50 56 30 6E C0 00; cells 6C and 6D hold the data 1 and 2; cell 6E will receive the result.]
34
Fetch Instruction 1
[Figure: the program counter (A0) is sent over the address bus; instruction 156C is loaded into the instruction register and the program counter advances to A2.]
35
Decode Instruction 1
[Figure: the controller decodes 156C in the instruction register and sends control signals.]
36
Execute Instruction 1
[Figure: the value 1 is loaded from memory cell 6C into register 5.]
37
Fetch Instruction 2
[Figure: instruction 166D is fetched from address A2; the program counter advances to A4.]
38
Decode Instruction 2
[Figure: the controller decodes 166D.]
39
Execute Instruction 2
[Figure: the value 2 is loaded from memory cell 6D into register 6.]
40
Fetch Instruction 3
[Figure: instruction 5056 is fetched from address A4; the program counter advances to A6.]
41
Decode Instruction 3
[Figure: the controller decodes 5056.]
42
Execute Instruction 3
[Figure: the adder adds registers 5 and 6 (1 + 2) and places 3 in register 0.]
43
Fetch Instruction 4
[Figure: instruction 306E is fetched from address A6; the program counter advances to A8.]
44
Decode Instruction 4
[Figure: the controller decodes 306E.]
45
Execute Instruction 4
[Figure: the contents of register 0 (3) are stored in memory cell 6E.]
46
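The walk-through above can be replayed in software. Below is a minimal sketch of a simulator for the textbook's machine language (the op-code table at the end of these notes). It assumes 16 one-byte registers, 256 one-byte memory cells, and 16-bit instructions stored high byte first in two consecutive cells. The names `Cpu` and `run` are hypothetical, and op-code 6 (floating-point ADD) is left out for brevity.

```c
#include <stdint.h>

/* Hypothetical model of the textbook machine. */
typedef struct {
    uint8_t reg[16];
    uint8_t mem[256];
    uint8_t pc;    /* program counter: address of the next instruction */
    uint16_t ir;   /* instruction register                              */
} Cpu;

/* Repeat the machine cycle until HALT. Returns the number of instructions
   executed (including HALT), -1 on an unsupported op-code, or -2 if the
   program has not halted after max_steps instructions. */
int run(Cpu *c, int max_steps) {
    for (int steps = 0; steps < max_steps; steps++) {
        /* Fetch: two consecutive cells form one instruction; PC += 2. */
        c->ir = (uint16_t)((c->mem[c->pc] << 8) | c->mem[(uint8_t)(c->pc + 1)]);
        c->pc = (uint8_t)(c->pc + 2);

        /* Decode: split the instruction into its hex digits. */
        uint8_t op = (uint8_t)(c->ir >> 12);
        uint8_t r  = (c->ir >> 8) & 0xF;
        uint8_t x  = (c->ir >> 4) & 0xF;
        uint8_t y  = c->ir & 0xF;
        uint8_t xy = c->ir & 0xFF;

        /* Execute. */
        switch (op) {
        case 0x1: c->reg[r] = c->mem[xy]; break;             /* LOAD from cell XY  */
        case 0x2: c->reg[r] = xy; break;                     /* LOAD pattern XY    */
        case 0x3: c->mem[xy] = c->reg[r]; break;             /* STORE into cell XY */
        case 0x4: c->reg[y] = c->reg[x]; break;              /* MOVE (operand 0RS) */
        case 0x5: c->reg[r] = (uint8_t)(c->reg[x] + c->reg[y]); break; /* ADD (two's complement) */
        case 0x7: c->reg[r] = c->reg[x] | c->reg[y]; break;  /* OR  */
        case 0x8: c->reg[r] = c->reg[x] & c->reg[y]; break;  /* AND */
        case 0x9: c->reg[r] = c->reg[x] ^ c->reg[y]; break;  /* XOR */
        case 0xA: {                                          /* ROTATE right y times */
            unsigned n = y % 8u;
            c->reg[r] = (uint8_t)((c->reg[r] >> n) | (c->reg[r] << ((8u - n) % 8u)));
            break;
        }
        case 0xB: if (c->reg[r] == c->reg[0]) c->pc = xy; break;  /* JUMP if equal to R0 */
        case 0xC: return steps + 1;                               /* HALT */
        default:  return -1;  /* op-code 6 and undefined op-codes */
        }
    }
    return -2;
}
```

Loading 15 6C 16 6D 50 56 30 6E C0 00 at A0, storing 1 and 2 in cells 6C and 6D, setting `pc` to A0, and calling `run` reproduces the slides: five instructions execute, and memory cell 6E ends up holding 3.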
Instruction Decode
• How to map opcodes to desired circuits on
a CPU?
• For example:
– 00b: add
– 01b: or
– 10b: jump
– 11b: and
47
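In software, the same mapping can be sketched as a table indexed by the op-code. This C fragment only mirrors the slide's toy 2-bit encoding; the table and function names are hypothetical, and a real CPU does this job with a decoder circuit rather than a table of strings.

```c
/* Toy 2-bit op-codes from the slide, indexed by op-code value. */
static const char *const OPCODE_NAMES[4] = {
    "add",   /* 00b */
    "or",    /* 01b */
    "jump",  /* 10b */
    "and",   /* 11b */
};

const char *decode_opcode(unsigned opcode) {
    return OPCODE_NAMES[opcode & 0x3u];  /* keep only the two op-code bits */
}
```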
Interpretation of Operand
• The interpretation of operands depends on
the op-code
Opcode  Operand  Description
1       4A3      Load the content at address A3 into register 4
2       4A3      Load the value A3 into register 4
4       0A3      Move the content of register A to register 3
48
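The sketch below shows how the same operand bits are interpreted differently depending on the op-code. It assumes the textbook machine's 16 registers and 256 memory cells; the `State` type and the `execute_transfer` function are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t reg[16];
    uint8_t mem[256];
} State;

/* Execute op-code 1, 2, or 4; the operand's meaning depends on the op-code. */
void execute_transfer(State *st, uint16_t inst) {
    uint8_t op = (uint8_t)(inst >> 12);
    uint8_t d2 = (inst >> 8) & 0xF;  /* 2nd hex digit                  */
    uint8_t d3 = (inst >> 4) & 0xF;  /* 3rd hex digit                  */
    uint8_t d4 = inst & 0xF;         /* 4th hex digit                  */
    uint8_t xy = inst & 0xFF;        /* 3rd and 4th digits as one byte */
    switch (op) {
    case 0x1: st->reg[d2] = st->mem[xy]; break;  /* XY is an address: load that cell */
    case 0x2: st->reg[d2] = xy;          break;  /* XY is a value: load it directly  */
    case 0x4: st->reg[d4] = st->reg[d3]; break;  /* digits 3 and 4 name registers    */
    default:  break;                             /* other op-codes: not covered here */
    }
}
```

With memory cell A3 holding 42 and register A holding 7E, executing 14A3, 24A3, and 40A3 in turn puts 42 and then A3 into register 4, and copies 7E into register 3.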
Instruction Execution
• Uses logic circuits
• Data transfer: load, store, …
– Logic circuits for registers (e.g., flip-flops)
• Control: jump, jump-equal, …
– Change the value of the program counter (PC)
– Comparison logic circuit
• Arithmetic/Logic: add, and, shift, …
– Again, logic circuits (e.g., the adder we have seen)
49
Flip-flops
• A logic circuit that can store one bit.
– Upper input is used to set its stored value to 1
– Lower input is used to set its stored value to 0
– While both input lines are 0,
the most recently stored value
is preserved
– Initially, both inputs and output
are 0
50
Flip-flops: Set Value 1
[Figure: the upper input receives a 1 while the lower input stays 0, and the stored value becomes 1]
51
Flip-flops: Set Value 0
Input (1,1) is undefined
52
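The flip-flop's input/output behavior can be summarized as a next-state function. This is a behavioral sketch only (it does not model the individual gates), and returning -1 for the undefined input (1, 1) is our own convention.

```c
/* Next stored bit, given the upper input, the lower input, and the
   currently stored bit. Returns -1 for the undefined input (1, 1). */
int flip_flop_next(int upper, int lower, int stored) {
    if (upper && lower) return -1;  /* undefined input                     */
    if (upper)          return 1;   /* upper input sets the value to 1     */
    if (lower)          return 0;   /* lower input sets the value to 0     */
    return stored;                  /* both 0: keep the most recent value  */
}
```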
Example of Jump-equal
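• B258: JUMP to the instruction at address 58H if the content of register 2 is the same as that of register 0
[Figure: each bit of register 0 is XORed with the matching bit of register 2; the XOR outputs feed an OR gate whose output is inverted by a NOT gate. If the registers are equal, the NOT output is 1 and sets the program counter to 58H = 01011000b.]
In case you forgot what XOR is:
Register 0 bit  Register 2 bit  XOR
0               0               0
0               1               1
1               0               1
1               1               0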
53
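The comparison circuit above can be mimicked bit by bit in C. This sketch assumes 8-bit registers; the function name and its parameters are hypothetical, and the loop stands in for an OR gate that hardware evaluates on all bits at once.

```c
#include <stdint.h>

/* Next PC for "jump to target if regR equals reg0", built from XOR, OR, NOT. */
uint8_t jump_equal_next_pc(uint8_t reg0, uint8_t regR, uint8_t pc, uint8_t target) {
    uint8_t diff = reg0 ^ regR;        /* bit i is 1 where the registers differ */
    int any_diff = 0;
    for (int i = 0; i < 8; i++)        /* 8-input OR gate                       */
        any_diff |= (diff >> i) & 1;
    int equal = !any_diff;             /* NOT gate                              */
    return equal ? target : pc;        /* output 1 loads the target into the PC */
}
```

For B258, call it with the contents of registers 0 and 2, the current PC, and 58: it returns 58 when the registers match and the unchanged PC otherwise.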
Exercises
Suppose PC = B0.
1. What is in register 3 after the first instruction?
2. What is in memory cell B8 when the program halts?
Address   B0 B1 B2 B3 B4 B5 B6 B7 B8
Contents  13 B8 A3 02 33 B8 C0 00 0F
54
Arithmetic/Logic
Operations (Sec. 2.4)
55
Arithmetic/Logic Operations
• Arithmetic: add, subtract, multiply, divide
– Precise action depends on how the values are
encoded (two’s complement vs. floating-point)
• Shift
– circular shift (rotate), logical shift, arithmetic
shift
• Logic: AND, OR, XOR, NOT
– Masking
56
One-bit Full Adder
57
4-bit Parallel Adder
58
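The slides show these adders as figures. A sketch of the standard design follows, assuming the 4-bit parallel adder is the common ripple-carry arrangement of four full adders. The full adder uses sum = a XOR b XOR carry-in and carry-out = (a AND b) OR (carry-in AND (a XOR b)); the function names are hypothetical. The loop visits the bits one after another, whereas in hardware the four adders are wired side by side and only the carries ripple through.

```c
/* One-bit full adder built from logic operations. */
void full_adder(int a, int b, int carry_in, int *sum, int *carry_out) {
    int half = a ^ b;
    *sum = half ^ carry_in;
    *carry_out = (a & b) | (carry_in & half);
}

/* 4-bit adder: each full adder's carry-out feeds the next carry-in.
   Returns the 4-bit sum and stores the final carry in *final_carry. */
unsigned add4(unsigned x, unsigned y, int *final_carry) {
    unsigned result = 0;
    int carry = 0;
    for (int i = 0; i < 4; i++) {
        int s;
        full_adder((x >> i) & 1, (y >> i) & 1, carry, &s, &carry);
        result |= (unsigned)s << i;
    }
    *final_carry = carry;
    return result;
}
```

For example, 0101b + 0011b gives 1000b with no carry out, while 1111b + 0001b gives 0000b with a final carry of 1.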
Rotate Operation
Rotating bit pattern 65H
one bit to the right
(Fig. 2.12)
59
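A rotation can be written with two shifts and an OR. This sketch assumes 8-bit patterns, as in the textbook machine; `rotate_right8` is a hypothetical name.

```c
#include <stdint.h>

/* Rotate an 8-bit pattern right by n positions: bits leaving the low-order
   end re-enter at the high-order end. */
uint8_t rotate_right8(uint8_t bits, unsigned n) {
    n %= 8;                 /* rotating by 8 positions changes nothing */
    if (n == 0) return bits;
    return (uint8_t)((bits >> n) | (bits << (8 - n)));
}
```

`rotate_right8(0x65, 1)` returns 0xB2: 65H (01100101b) rotated one bit to the right is B2H (10110010b).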
Shift Operation
• Circular shift (rotation)
• Logical shift
– Fills the vacated positions with 0
– Original: 00000101b → 5d
– After shifting left by 1: 00001010b → 10d
– After shifting left by 2: 00010100b → 20d
• Arithmetic shift
– A shift that leaves the sign bit unchanged
60
Arithmetic Shift
• The two’s complement of 00001010b (10d)
is 11110110b (-10d)
• We want to use a right shift to compute -10/2 = -5:
– With a logical shift, 11110110b >> 1 = 01111011b = 123d, not -5d.
– We want the first bit to be 1 (11111011b = -5d).
• Arithmetic shift
– Copy the first (sign) bit into the vacated position:
11110110b → 11111011b
61
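The difference between logical and arithmetic shifts on 8-bit patterns can be sketched in C as follows. The sign-copying right shift is written with explicit bit operations because, in C, applying >> to a negative signed value gives an implementation-defined result. Function names are hypothetical, and shift counts are assumed to be smaller than 8.

```c
#include <stdint.h>

/* Logical shifts: vacated positions are filled with 0. */
uint8_t shift_left_logical(uint8_t bits, unsigned n)  { return (uint8_t)(bits << n); }
uint8_t shift_right_logical(uint8_t bits, unsigned n) { return (uint8_t)(bits >> n); }

/* Arithmetic right shift on an 8-bit two's complement pattern:
   the sign bit is copied into each vacated position. */
uint8_t shift_right_arithmetic(uint8_t bits, unsigned n) {
    uint8_t result = bits;
    for (unsigned i = 0; i < n; i++)
        result = (uint8_t)((result >> 1) | (result & 0x80));  /* keep the sign bit */
    return result;
}
```

Shifting 00000101b left by 1 and by 2 gives 10d and 20d, and shifting 11110110b right by one bit gives 01111011b logically but 11111011b (-5d) arithmetically.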
Masking
• AND, OR, and XOR can be used for masking
• Example: bit operations on 10101010b (bits counted from the right, starting at 1)
– Set the 4th bit to 0 with AND:
10101010 AND 11110111 (mask) = 10100010
– Set the 3rd bit to 1 with OR:
10101010 OR 00000100 (mask) = 10101110
– Invert the 3rd and the 4th bits with XOR:
10101010 XOR 00001100 (mask) = 10100110
62
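The three masking idioms on this slide translate directly into C bitwise operators. This sketch counts bits from the right starting at 1, as the slide does; the helper names are hypothetical.

```c
#include <stdint.h>

/* AND with a 0 in position k clears that bit. */
uint8_t clear_bit(uint8_t v, unsigned k) { return v & (uint8_t)~(1u << (k - 1)); }

/* OR with a 1 in position k sets that bit. */
uint8_t set_bit(uint8_t v, unsigned k) { return v | (uint8_t)(1u << (k - 1)); }

/* XOR with 1s flips exactly the masked bits. */
uint8_t toggle_bits(uint8_t v, uint8_t mask) { return v ^ mask; }
```

Applied to 10101010b (AAH), `clear_bit(0xAA, 4)`, `set_bit(0xAA, 3)`, and `toggle_bits(0xAA, 0x0C)` give A2H, AEH, and A6H, matching the results on the slide.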
Examples of Using Masks
• Ex1: For the floating-point format described in Chapter 1,
– Design masks to retrieve the sign, exponent, and mantissa.
– Design a mask to set the sign.
• Ex2: For the ASCII code described in Chapter 1,
– Design a mask to convert capital letters to small letters, or vice versa
A 1000001   a 1100001
B 1000010   b 1100010
C 1000011   c 1100011
D 1000100   d 1100100
E 1000101   e 1100101
63
Put Everything Together
[Figure: the controller, driven by the clock, sends control signals to the datapath (PC, IR, registers, ALU with N and Z flags) and to memory]
64
Exercises
• Design a mask to isolate the middle four bits of a byte (set the other bits to 0).
• Encode each of the following commands:
– ROTATE the contents of register 7 to the right 5 bit positions
– ADD the contents of registers 5 and 6 as though they were values in floating-point notation and leave the result in register 4
– AND the contents of registers 5 and 6, leaving the result in register 4.
65
Peripheral Devices
(Sec. 2.5)
66
Connecting to Other Devices
• Outside the case
– Port: The point at which a device connects to
a computer
67
Inside the Case
(Fig. 2.13)
68
Device Controller
• An intermediary apparatus that handles
communication between the computer
(CPU/memory) and a device.
• Two types of controllers
– Specialized controllers
• Network card, graphics card, …
– General purpose controllers
• USB, FireWire, …
69
Device Addressing
• Memory-mapped I/O:
– CPU communicates with peripheral devices
as though they were memory cells
– Use load and store to access device data
(Fig. 2.14)
• Dedicated I/O instructions for devices
70
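To illustrate memory-mapped I/O, the toy bus below routes an ordinary store to a device when the address matches a reserved value, and to memory otherwise. It is a sketch only: the `Bus` type, the `bus_store` function, and the device address FF are hypothetical choices, not part of the textbook machine.

```c
#include <stdint.h>

#define DEVICE_ADDR 0xFF  /* hypothetical address the device is mapped to */

typedef struct {
    uint8_t mem[256];
    char printed[64];     /* what the toy "printer" has received */
    int n_printed;
} Bus;

/* The CPU issues the same store for memory and for the device;
   only the address decides where the byte goes. */
void bus_store(Bus *b, uint8_t addr, uint8_t value) {
    if (addr == DEVICE_ADDR) {
        if (b->n_printed < (int)sizeof b->printed - 1)
            b->printed[b->n_printed++] = (char)value;  /* delivered to the device */
    } else {
        b->mem[addr] = value;                          /* ordinary memory cell    */
    }
}
```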
Direct Memory Access (DMA)
• DMA is a mechanism for devices to access
memory without occupying the CPU.
• Meanwhile, the CPU can execute other
processes until the I/O is finished.
– Better system throughput
71
Communication Type
• Parallel communication:
– Several communication paths transfer bits
simultaneously.
– Printer, computer bus
• Serial communication:
– Bits are transferred one after the other over a
single communication path.
– USB, FireWire, RS232
72
Exercises
• Suppose the machine uses memory-mapped I/O and memory address B5 is
the location within the printer port to which
data to be printed are sent. If register 7 contains
the ASCII code for the letter A, what
instruction will cause the letter A to be printed?
• If a printer can print only 128 characters
per second and has a local buffer of 256 KB,
how fast can the data rate (bps) be?
73
Parallel Architectures
(Sec. 2.6)
74
Pipeline
• Execution of an instruction (an instruction
cycle) is divided into three stages: fetch,
decode, execute
– Suppose each stage takes 3 clock cycles
– How many clock
cycles are needed
to execute 1
instruction?
– 50 instructions?
75
Pipeline
• Since each stage uses separate hardware, the CPU can overlap the stages
        Clk 1  Clk 2  Clk 3  Clk 4  Clk 5  Clk 6  Clk 7  Clk 8  Clk 9
Fetch   Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 Inst 6 Inst 7 Inst 8 Inst 9 …
Decode         Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 Inst 6 Inst 7 Inst 8 …
Execute               Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 Inst 6 Inst 7 …
• The more stages, the better the throughput?
– Throughput = # executed instructions / time
– Pentium 4 (Prescott) had a 31-stage pipeline.
76
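The clock-cycle questions above can be checked with two formulas, sketched in C below under the slides' idealized assumptions: every stage takes the same number of clock cycles, and the pipeline never stalls. Without a pipeline, n instructions take n × stages × cycles per stage; with full overlap, the first instruction takes stages × cycles per stage and each later one finishes one stage time after the previous one, giving (stages + n − 1) × cycles per stage. The function names are hypothetical.

```c
/* Clock cycles for n instructions when they run one after another. */
long cycles_sequential(long n, long stages, long cycles_per_stage) {
    return n * stages * cycles_per_stage;
}

/* Clock cycles for n instructions with perfect pipeline overlap. */
long cycles_pipelined(long n, long stages, long cycles_per_stage) {
    if (n <= 0) return 0;
    /* fill the pipeline once, then one instruction completes per stage time */
    return (stages + n - 1) * cycles_per_stage;
}
```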
Pamphlet Assembling Example
• Suppose there are 100 pamphlets to be
assembled, each of which has 6 pages.
– The printouts of each page are put into a pile.
– Assembling one page takes 1 second.
• Pages 1 to 6 must be assembled in order.
• Assembling one pamphlet takes 6 seconds.
• How fast can it be done by one person?
[Figure: six piles, Page 1 to Page 6]
77
• How fast can it be done by two persons?
[Figure: the six piles, Page 1 to Page 6, split between two persons]
• How fast can it be done by three persons?
[Figure: the six piles split among three persons]
• Analogy
– Number of persons → number of stages
– Number of seconds → number of clock cycles
• How fast can it be done by 7 persons?
78
Clock Cycle/Clock Rate
• The basic time unit of a CPU
– For example, a 2 GHz CPU has a clock cycle of
$1/(2 \times 10^{9}) = 5 \times 10^{-10}$ second.
• 2 GHz is the "clock rate" of the CPU.
– Every operation in the CPU takes a multiple
of the clock cycle.
79
Parallel Architectures
• Bit-level parallelism:
– 1 bit adder vs. 4 bit adder
• Instruction-level parallelism
– Pipeline: overlap instruction execution stages
• I/O-computation parallelism
– DMA: overlap communication/computation
• Multiprocessor parallelization
– Cluster, multi-core processors, GPU
80
Flynn's Taxonomy
• Based on the number of concurrent
instruction and data streams available in
the architecture (Michael J. Flynn, 1966)
– SISD (single instruction, single data stream)
• No parallel processing
– MIMD (multiple instruction, multiple data stream)
• Different programs, different data
– SIMD (single instruction, multiple data stream)
• Same program, different data
81
SIMD Example
• SISD for-loop
– for(i=0;i<5;i++) A[i]=B[i]+C[i];
• SIMD expansion
– CPU 1: A[0]=B[0]+C[0];
– CPU 2: A[1]=B[1]+C[1];
– CPU 3: A[2]=B[2]+C[2];
– CPU 4: A[3]=B[3]+C[3];
– CPU 5: A[4]=B[4]+C[4];
82
By Memory Location
• Distributed memory system
– Multiple processors that
communicate through a
computer network.
• Shared memory system
– Multiple processors that
communicate through a
shared memory space.
• Hybrid system
83
Speedup
• Amdahl’s law
– Suppose a fraction f (0 < f ≤ 1) of the tasks cannot
be parallelized. The best speedup with n processors is
$$S(n) = \frac{1}{f + \frac{1-f}{n}}$$
84
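Amdahl's law can be evaluated directly; the C sketch below is a minimal version with a hypothetical function name. As n grows, the term (1 − f)/n vanishes, so the speedup can never exceed 1/f no matter how many processors are added.

```c
/* Best speedup on n processors when a fraction f (0 < f <= 1)
   of the work cannot be parallelized (Amdahl's law). */
double amdahl_speedup(double f, int n) {
    return 1.0 / (f + (1.0 - f) / n);
}
```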
Supercomputers
• Hundreds of thousands of processors
interconnected via a specially designed
network
– Top1 (TOP500 lists from June 2008 to June 2009): Roadrunner
– http://www.top500.org/
85
Multi-core Processor
• A processor composed of two or more
independent cores (or CPUs).
• Advantages
– Performance improvement
– Low power consumption
• Disadvantages
– Operating system support
– Software support
We will talk about
those problems later
86
Graphics Processing Unit
(GPU)
• A specialized processor designed for 3D
graphics rendering
• Modern GPUs have over a thousand cores,
which can be used for general-purpose
computation
[Figure: CPU vs. GPU]
87
Exercises
• Suppose instructions can be fully
overlapped in a 3-stage pipeline CPU,
and each stage takes 3 clock cycles.
How many clock cycles are needed to
execute 500 instructions? What if there
are 5 stages?
• What is the best speedup for 10
processors if 20% of the tasks can
be parallelized? What about 60%?
88
Related Courses
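• Stored-program concept, peripheral devices
– Computer Architecture, Hardware Laboratory, Microcomputer Systems,
Digital Logic Design
• Machine language, program execution
– Computer Architecture, Software Engineering, Introduction to Embedded Systems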
89
References
• http://www.top500.org/ (supercomputer)
• https://computing.llnl.gov/tutorials/parallel_comp/
• www.cs.nthu.edu.tw/~ychung/slides/para_programming/slides1.pdf
• http://www.computer50.org/mark1/stored.html
• Textbook chapter 2
90
Opcode  Operand  Description
1       RXY      LOAD the register R with the bit pattern found in the memory cell whose address is XY.
                 Example: 14A3 would cause the contents of the memory cell located at address A3 to be placed in register 4.
2       RXY      LOAD the register R with the bit pattern XY.
                 Example: 20A3 would cause the value A3 to be placed in register 0.
3       RXY      STORE the bit pattern found in register R in the memory cell whose address is XY.
                 Example: 35B1 would cause the contents of register 5 to be placed in the memory cell whose address is B1.
4       0RS      MOVE the bit pattern found in register R to register S.
                 Example: 40A4 would cause the contents of register A to be copied into register 4.
5       RST      ADD the bit patterns in registers S and T as though they were two's complement representations and leave the result in register R.
                 Example: 5726 would cause the binary values in registers 2 and 6 to be added and the sum placed in register 7.
91
Opcode  Operand  Description
6       RST      ADD the bit patterns in registers S and T as though they represented values in floating-point notation and leave the floating-point result in register R.
                 Example: 634E would cause the values in registers 4 and E to be added as floating-point values and the result to be placed in register 3.
7       RST      OR the bit patterns in registers S and T and place the result in register R.
                 Example: 7CB4 would cause the result of ORing the contents of registers B and 4 to be placed in register C.
8       RST      AND the bit patterns in registers S and T and place the result in register R.
                 Example: 8045 would cause the result of ANDing the contents of registers 4 and 5 to be placed in register 0.
9       RST      EXCLUSIVE OR the bit patterns in registers S and T and place the result in register R.
                 Example: 95F3 would cause the result of EXCLUSIVE ORing the contents of registers F and 3 to be placed in register 5.
92
Opcode  Operand  Description
A       R0X      ROTATE the bit pattern in register R one bit to the right X times. Each time, place the bit that started at the low-order end at the high-order end.
                 Example: A403 would cause the contents of register 4 to be rotated 3 bits to the right in a circular fashion.
B       RXY      JUMP to the instruction located in the memory cell at address XY if the bit pattern in register R is equal to the bit pattern in register number 0. Otherwise, continue with the normal sequence of execution. (The jump is implemented by copying XY into the PC during the execute phase.)
                 Example: B43C would first compare the contents of register 4 with the contents of register 0. If the two were equal, the pattern 3C would be placed in the program counter so that the next instruction executed would be the one located at that memory address. Otherwise, nothing would be done and program execution would continue in its normal sequence.
C       000      HALT execution.
                 Example: C000 would cause program execution to stop.
93