TI1400 Computer Organization at TU Delft

Transcript TI1400 Computer Organization at TU Delft

Input/Output Organization
(Chapter 4)
http://www.pds.ewi.tudelft.nl/~iosup/Courses/2011_ti1400_8.ppt
The “Data Deluge”: Trivia
The Petabyte Age: Because More Isn't Just More — More Is Different, Wired,
23 June 2008.
http://www.wired.com/science/discoveries/magazine/16-07/pb_intro#
TI1400/11-PDS
TU-Delft
The “Data Deluge”:
Facts and Predictions
"Everywhere you look, the quantity
of information in the world is
soaring. According to one
estimate, mankind created 150
exabytes (billion gigabytes) of
data in 2005. This year, it will
create 1,200 exabytes. Merely
keeping up with this flood, and
storing the bits that might be
useful, is difficult enough.
Analysing it, to spot patterns and
extract useful information, is
harder still."
The Data Deluge, The Economist,
25 February 2010.
“Data Deluge”: The Mobile Example
exabyte
Battling a Wireless Deluge: AT&T, Other Carriers Use Wi-Fi 'Hotzones' to Siphon Off
Smartphone Traffic, Tech Journal, 2 February 2011.
Read more: http://online.wsj.com/article/SB10001424052748704124504576118353354099780.html#ixzz1LEpF4TuA
“Data Deluge”: The Personal Memex Example
• Vannevar Bush in the 1940s: record your life
• MIT Media Laboratory: The Human Speechome
Project/TotalRecall, data mining/analysis/visio
- Deb Roy and Rupal Patel “record practically every
waking moment of their son’s first three years”
(20% privacy time…Is this even legal?! Should it be?!)
- 11x1MP/14fps cameras, 14x16b-48KHz mics, 4.4TB
RAID + tapes, 10 computers; 200k hours audio-video
- Data size: 200GB/day, 1.5PB total
“Data Deluge”: The Gaming Analytics Example
• EQ II: 20TB/year all logs
• Halo3: 1.4PB served
statistics on player logs
“Data Deluge”: Datasets in Comp.Sci.
[Figure: dataset size (axis ticks 1GB, 10GB, 100GB, 1TB, 1TB/yr) versus year ('06, '09, '10, '11) for several trace archives]
• http://gwa.ewi.tudelft.nl
• The Failure Trace Archive: http://fta.inria.fr
• Peer-to-Peer Trace Archive (P2PTA)
• GamTA
• … PWA, ITA, CRAWDAD, …
• 1,000s of scientists: From theory to practice
The Simplest(?) Problem:
How to Access Data by the CPU/Cores?
• Computers must be able to
communicate with the outside world
• Large variety of devices
- size
- speed
- distance
• Timing and electrical properties
not the same as within CPU
Single-bus structure
[Figure: Processor, Memory, and I/O devices #1 ... #n are all attached to a single bus]
Multiple buses
[Figure: the Processor reaches Memory over a dedicated memory bus and I/O devices #1 ... #n over a separate I/O bus]
Buses and interfaces
A bus generally contains three groups of lines:
• Data lines to transport data
• Address lines to identify devices
• Control lines that take care of the correct transfer of data
Interfaces
Devices are coupled to the bus through an interface:
• Address decoder
- detects whether the data is meant for this device
• Data registers
- store incoming and outgoing data
• Status and control registers
- report the status of the device
- control the transfer
Interface organization
[Figure: the address lines, data lines, and control lines of the bus enter the I/O interface, which contains an address decoder, control circuits, and data and status registers; the interface connects to the device]
Video terminal
[Figure: the CPU is connected to a video terminal; the keyboard side has data register DATAIN and flag SIN, the display side has data register DATAOUT and flag SOUT]
Operation (1)
Busy waiting:
READWAIT   Branch to READWAIT if SIN=0
           Input from DATAIN to R1
WRITEWAIT  Branch to WRITEWAIT if SOUT=0
           Output from R1 to DATAOUT
I/O instructions:
           Move DATAIN, R1
           Move R1, DATAOUT
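The busy-waiting loops above can be sketched in Python. This is a toy model, not the course's code: the class `VideoTerminal`, its methods, and the choice that the display is always ready are assumptions made for illustration.

```python
from collections import deque

class VideoTerminal:
    """Toy terminal: SIN/SOUT flags guard the DATAIN/DATAOUT registers."""
    def __init__(self, typed):
        self._keys = deque(typed)   # characters the "user" will type
        self.screen = []            # characters shown on the display

    @property
    def sin(self):                  # 1 when a character is waiting in DATAIN
        return 1 if self._keys else 0

    @property
    def sout(self):                 # 1 when the display can accept a character
        return 1                    # assumption: the display is always ready

    def read_datain(self):
        return self._keys.popleft()

    def write_dataout(self, ch):
        self.screen.append(ch)

def echo_one(term):
    while term.sin == 0:            # READWAIT: branch back while SIN=0
        pass
    r1 = term.read_datain()         # Input from DATAIN to R1
    while term.sout == 0:           # WRITEWAIT: branch back while SOUT=0
        pass
    term.write_dataout(r1)          # Output from R1 to DATAOUT
    return r1
```

Note that the CPU does no useful work while it spins in either loop, which is the cost that interrupts are meant to remove.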
Operation (2)
[Figure: IOSTATUS register with SIN in bit 1 and SOUT in bit 0, next to the data registers DATAIN and DATAOUT]
READWAIT   Testbit #1, IOSTATUS
           Branch=0 READWAIT
           Move DATAIN, R1
I/O Instructions
• Memory-mapped I/O
- the registers of the devices have addresses in
the same space as main memory locations
- normal instructions can be used
• move DATAIN, R1
• I/O instructions
- special instructions for I/O
• IN device, data
• OUT data, device
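The difference between the two schemes can be sketched as two toy bus models. The address `DATAIN_ADDR`, the port number 0, and both class names are hypothetical and chosen only for illustration.

```python
DATAIN_ADDR = 0x4000           # hypothetical address of the device register

class MemoryMappedBus:
    """Device register lives in the same address space as memory."""
    def __init__(self, size):
        self.ram = [0] * size
        self.datain = 0        # device register, visible at DATAIN_ADDR

    def load(self, addr):
        # The address decoder routes an ordinary load to the device or to RAM.
        if addr == DATAIN_ADDR:
            return self.datain
        return self.ram[addr]

class SeparateIOBus:
    """Memory and I/O have their own address spaces."""
    def __init__(self, size):
        self.ram = [0] * size
        self.ports = {0: 0}    # port 0 plays the role of DATAIN

    def load(self, addr):      # ordinary load: memory space only
        return self.ram[addr]

    def io_in(self, port):     # special IN instruction: I/O space only
        return self.ports[port]
```

In the second model, address 0 exists twice (once in memory, once in I/O space), and only the instruction used tells them apart.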
Memory and register structure
[Figure: Memory alongside the register sets of the CPU and of the I/O processors IOPROC1 and IOPROC2]
Address spaces
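[Figure: address numbering of the CPU registers, the IOPROC1 and IOPROC2 registers, and memory (Mem) under two schemes. Memory mapped: the device registers and memory share one address range (0 to 5 for the registers, memory from 6). Separate address spaces: each range starts at 0.]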
I/O and Programming
There are two basic mechanisms for I/O:
1. Programmed I/O
2. Non-programmed I/O
Programmed I/O
• Performed by executing a special program on the CPU
• Unconditional I/O
- no synchronization with the I/O device
• Passive signaling
- synchronization between CPU and device by programmed interrogation (polling) by the CPU
• Active signaling
- synchronization between CPU and device by an interrupt raised by the device
Non-programmed I/O
I/O is done by a separate active entity
• Direct Memory Access (DMA)
- some intelligence in device takes care of data
transport
• Special I/O processors
Interrupts
[Figure: a compute routine with instructions 1 ... M is interrupted after instruction i; control jumps to the print routine, which ends with a return to instruction i+1]
Service Routines
• The I/O device alerts the CPU by a hardware signal called the interrupt signal
• Usually a special line in the control group of the I/O bus is used for this: the interrupt request line
• The CPU stops the running program and starts executing the service routine
• Much like executing a subroutine
• Except: the interrupted program and the service routine have nothing in common!
Handling interrupts
1. Device raises interrupt request
2. Processor interrupts program in execution
3. Interrupts are disabled
4. Device is informed of acceptance and, as a consequence, lowers interrupt
5. Interrupt is handled by service routine
6. Interrupts are enabled
7. Execution of interrupted program is resumed
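The sequence above can be traced with a small simulation. The classes `Cpu` and `Device`, and the choice to check for requests after each instruction, are assumptions made for this sketch.

```python
class Device:
    def __init__(self):
        self.request = False            # step 1 sets this to True

class Cpu:
    def __init__(self):
        self.interrupts_enabled = True
        self.log = []

    def run(self, program, device, service_routine):
        for instruction in program:
            self.log.append(f"exec {instruction}")
            # Assumption: requests are sampled between instructions (step 2).
            if device.request and self.interrupts_enabled:
                self.interrupts_enabled = False   # step 3
                device.request = False            # step 4: device lowers its request
                service_routine(self)             # step 5
                self.interrupts_enabled = True    # step 6
            # step 7: the loop resumes with the next instruction
```

For example, with a pending request and the program `["i1", "i2"]`, the log shows the service routine running between the two instructions.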
Multiple devices
• How can the processor distinguish devices?
• How can the processor obtain the starting address of the appropriate service routine?
• Should we allow a new interrupt while another is being served?
• How do we handle simultaneous interrupts?
Interrupt line
INTR = INT1 + INT2 + .... + INTn
(+ denotes logical OR)
[Figure: devices INT1, INT2, ..., INTn drive one shared interrupt request line to the CPU]
Finding the device by POLLING:
- search for the device with the IRQ bit set in its status register
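A minimal sketch of the shared request line and the polling search follows. The bit position `IRQ_BIT` and both function names are hypothetical; the slides do not say where the IRQ bit sits in the status register.

```python
IRQ_BIT = 0x80   # hypothetical position of the IRQ bit in each status register

def interrupt_request(status_registers):
    """INTR = INT1 + INT2 + ... + INTn (logical OR of all IRQ bits)."""
    return any(status & IRQ_BIT for status in status_registers)

def find_interrupting_device(status_registers):
    """Poll devices in order; return the index of the first one with IRQ set."""
    for index, status in enumerate(status_registers):
        if status & IRQ_BIT:
            return index
    return None
```

The polling order implicitly fixes the priority: when two devices request at once, the one polled first is served first.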
Vectored Interrupt
• Device sends an identification code on the bus
• This code is called the interrupt vector
• Issued after the GRANT signal from the CPU
[Figure: devices INT1, INT2, ..., INTn share an interrupt request line to the CPU; the CPU answers with a grant signal]
Interrupt priority
[Figure: a priority circuit sits between the CPU and the devices INT1, INT2, ..., INTn and issues a separate grant signal per device (grant1, grant2, grant3)]
Bus arbitration (1)
[Figure: each device i has a bus release line (rel_i) and an interrupt request line (req_i) to the CPU, which returns a grant signal]
bus is free iff: (rel_1 • rel_2 • ..... • rel_n) = 1
Bus arbitration (2)
• Request: set req_i to 1
• Acquire: if grant=1, then
set req_i to 0 (interrupt once) and
set rel_i to 0 (prevent others from interrupting)
• Release: set rel_i to 1
grant = (req_1 + req_2 + ..... + req_n) • (rel_1 • rel_2 • ..... • rel_n)
- first factor: at least one request
- second factor: bus released by all
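The grant equation and the request/acquire/release protocol can be checked with a few lines of Python. The function name `grant` and the list encoding of the `req_i` and `rel_i` lines are illustrative choices.

```python
def grant(req, rel):
    """grant = (req_1 + ... + req_n) . (rel_1 . ... . rel_n)"""
    return int(any(req) and all(rel))

# Three devices; device 1 requests the bus while it is free.
req = [0, 1, 0]
rel = [1, 1, 1]
granted_first = grant(req, rel)

# Acquire: device 1 clears its request and holds the bus.
req[1] = 0
rel[1] = 0
req[2] = 1                        # device 2 now requests, but must wait
granted_while_held = grant(req, rel)

# Release: device 1 frees the bus, so device 2's request is granted.
rel[1] = 1
granted_after_release = grant(req, rel)
```

The three results are 1, 0, and 1: a request is only granted once every device has released the bus.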
PowerPC interrupt structure (1)
MSR = Machine State Register (bits 0 to 31)
• EE (bit 16) = External interrupt enable
• PR (bit 17) = Privilege level
• SE (bit 21) = Single step trace exception enable
• EP (bit 25) = Exception prefix
- EP=0: interrupt service starts at address 000001F4
- EP=1: interrupt service starts at address FFF001F4
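The bit positions can be turned into masks with a short sketch. It assumes the PowerPC convention that bit 0 is the most significant bit of the 32-bit register, which is consistent with the value $8000 used for NEWMSR in the initialization code of this lecture. The helper names `msr_mask` and `service_address` are hypothetical.

```python
def msr_mask(bit):
    """Mask for a 32-bit register whose bit 0 is the most significant bit."""
    return 1 << (31 - bit)

EE = msr_mask(16)   # external interrupt enable
PR = msr_mask(17)   # privilege level
SE = msr_mask(21)   # single step trace exception enable
EP = msr_mask(25)   # exception prefix

def service_address(msr, offset=0x1F4):
    """Prefix FFF is selected by EP=1 and 000 by EP=0 (offset 1F4 as above)."""
    prefix = 0xFFF00000 if msr & EP else 0x00000000
    return prefix | offset
```

With this numbering, `EE` evaluates to 0x8000, so writing $8000 to the MSR enables external interrupts and leaves the other three bits cleared.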
PowerPC interrupt structure (2)
• PowerPC has two special Save/Restore registers: SRR0 and SRR1
• After interrupt:
- PC is saved in SRR0
- MSR is saved in SRR1
- Interrupt enable bit in MSR is cleared
IA-32 interrupt structure (1)
Processor status register (EFLAGS), bits 31 to 0:
• IOPL (bits 13-12): I/O Privilege Level (4 levels)
• OF (bit 11), SF (bit 7), ZF (bit 6), CF (bit 0): condition code flags
• IF (bit 9): Interrupt Enable Flag
• TF (bit 8): trap flag
IA-32 has two interrupt request lines
IA-32 interrupt structure (2)
• Steps when an interrupt occurs:
1. push processor status register, current segment register (CS), and instruction pointer (EIP) onto the stack
2. clear interrupt-enable flag if needed
3. fetch starting address of interrupt-service routine from Interrupt Descriptor Table and load it into EIP
• At end of routine, execute IRET
Example
[Figure: keyboard interface with data register DATAIN and a STATUS register (bit positions 6, 2, 1, and 0 marked) holding IE, SIN, and SOUT; the interface drives the interrupt line]
Memory Layout
[Figure: memory layout with a 32 K I/O space containing STATUS and DATAIN and a 32 K program space containing the buffer area LINE and the routine READ; location 1F4 holds the address of READ]
PowerPC: Initialization
INTVEC   EQU   $1F4     Interrupt vector address (location where start address of interrupt routine is stored)
INTEN    EQU   $40      Keyboard interrupt enable mask (will be stored in status register of device)
INTDIS   EQU   0        Keyboard interrupt disable mask (will be stored in status register of device)
NEWMSR   EQU   $8000    Desired contents of MSR (external interrupt enable)
RTRN     EQU   $0D      Code Carriage Return (for checking end-of-line)
PowerPC: Interrupt Processing (1)
START  ADDI  R2,0,READ       Get address of service routine
       STW   R2,INTVEC(0)    and store at interrupt vector location
       ADDI  R2,0,LINE       Get address of LINE
       STW   R2,PNTR(0)      and store at PNTR
       ADDI  R2,0,INTEN      Store interrupt enable
       STW   R2,STATUS(0)    in STATUS register
PowerPC: Interrupt Processing (3)
       ADDI    R2,0,NEWMSR   Store new MSR
       MTSRR1  R2            in SRR1
       ADDI    R2,0,MAIN     Store new PC
       MTSRR0  R2            in SRR0
       RFI                   Return From Interrupt (use new MSR and PC)
PowerPC: Program (1)
MAIN   <main program>
       .....
       .....
READ   Save registers
       LBZ   R30,DATAIN(0)   Get input character
       LWZ   R31,PNTR(0)     Load value at PNTR
       STBU  R30,1(R31)      Store character in buffer
       STW   R31,PNTR(0)     Update PNTR for next character
PNTR   (storage word for the buffer pointer)
PowerPC: Program (2)
EOL    CMPWI  CR1,R30,RTRN   Check for CR (end of line)
       BNE    CR1,DONE
       ADDI   R2,0,INTDIS    Store interrupt disable
       STW    R2,STATUS(0)   in STATUS register
       BL     TEXT           Call subroutine for dealing with line
DONE   ....                  Restore saved registers
       RFI                   Return from interrupt
IA-32: Program (1)
MAIN:  MOV   EOL,0            not yet end of line
       MOV   BL,4             set keyboard
       OR    CONTROL,BL       interrupt enable
       STI                    set interrupt flag in processor register
READ:  PUSH  EAX              save registers
       PUSH  EBX
       MOV   EAX,PNTR         load address pntr
       MOV   BL,DATAIN        get input,
       MOV   [EAX],BL         store it,
       INC   DWORD PTR PNTR   and increment pntr
IA-32: Program (2)
       CMP   BL,0DH           char=end of line?
       JNE   RTRN             no
       MOV   BL,4             yes
       XOR   CONTROL,BL       so disable interrupts
       MOV   EOL,1            and set EOL flag
RTRN:  POP   EBX              restore registers
       POP   EAX
       IRET                   return from interrupt
Other interrupts
• Not only I/O devices can cause interrupts
• Recovery from errors, e.g.:
- illegal OP code used
- division by 0
• Debugging
• Privilege exception
Operating Systems (1)
• In general, interrupts are controlled by the operating system
• CPU can be in user mode or supervisor mode
• Privileged instructions are only allowed in supervisor mode
- starting of I/O operations
- setting of priorities
- setting of clock values
Operating Systems (2)
• Process: program in execution
- Program
- Data
- Status: PC, Registers, etc
• State of a process:
- Running
- Runnable (waiting for CPU)
- Blocked (waiting for something else)
• Multi-tasking
- Multiple tasks in execution
• Time-slicing
- Divide time across processes
Operating Systems (3)
• Context switch: change of processes
• After clock interrupt: dispatcher chooses suitable
process
• Device drivers: service routines for devices
• System Call: call to OS service routine
- printf("%d\n", a)
- fscanf(file, "%d\n", &a)
OS init, services, scheduler
OSINIT
Set interrupt vectors (set addresses for dealing with these interrupts):
Time slice clock <- SCHEDULER
Trap <- OSSERVICES
VDT interrupts <- IODATA
...
OSSERVICES
Examine stack to determine request
Call appropriate routine
SCHEDULER
Save current context
Select runnable process
Context switch
Restore saved context of new process
Return from interrupt
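The SCHEDULER steps can be sketched as a function that runs on each time-slice interrupt. The round-robin selection policy, the class `Process`, and the dictionary used as the CPU context are assumptions for this sketch; the slides do not prescribe how the runnable process is selected.

```python
from collections import deque

class Process:
    def __init__(self, name):
        self.name = name
        self.state = "runnable"
        self.context = {"pc": 0}             # saved registers, here only a PC

def scheduler(current, ready_queue, cpu_context):
    """Run on a time-slice clock interrupt; return the process to run next."""
    current.context = dict(cpu_context)      # save current context
    current.state = "runnable"
    ready_queue.append(current)
    nxt = ready_queue.popleft()              # select runnable process (round robin)
    nxt.state = "running"
    cpu_context.clear()
    cpu_context.update(nxt.context)          # restore saved context of new process
    return nxt                               # "return from interrupt" resumes nxt
```

Calling `scheduler` twice with two processes alternates between them, and each one resumes at the program counter it was interrupted at.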
I/O routines
IOINIT
Set process status to blocked
Initialize memory buffers
Call device driver to initialize device (e.g., VDT)
Return from subroutine
IODATA
Poll devices to determine source of interrupt (e.g., VDT)
Call appropriate driver
if END=1 then set process to runnable
Return from interrupt
VDT driver (e.g., Keyboard)
VDTINIT
Initialize device interface (e.g., baud rate)
Enable interrupts
Return from subroutine
VDTDATA
Check device status
If ready then transfer character
If character = CR (check end-of-line) then
set END=1; Disable interrupts
else set END=0
Return from subroutine
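The two driver routines map naturally onto a small class. The class name `VdtDriver`, the list used as line buffer, and passing the device status as a `ready` argument are illustrative assumptions.

```python
CR = "\r"                              # carriage return marks end-of-line

class VdtDriver:
    def __init__(self):
        self.interrupts_enabled = False
        self.line = []                 # buffer for the characters read so far
        self.end = 0

    def vdtinit(self):
        self.line.clear()              # initialize device interface state
        self.end = 0
        self.interrupts_enabled = True

    def vdtdata(self, ready, character):
        if not ready:                  # check device status
            return
        self.line.append(character)    # transfer character
        if character == CR:            # end-of-line reached
            self.end = 1
            self.interrupts_enabled = False
        else:
            self.end = 0
```

After `vdtinit()`, feeding the characters `"ok\r"` one by one leaves `end` at 1 and interrupts disabled, which is the condition IODATA uses to make the blocked process runnable again.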
Direct Memory Access
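more "intelligent" device interface
[Figure: DMA controller registers: Start address, Wordcount, and a Status & Control register with the fields IRQ (interrupt request) and IE in the high-order bits (31, 30) and R/W and Done in the low-order bits]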
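A block transfer driven by these registers might look as follows. This is a toy model: the class `DmaController`, the use of booleans instead of bit positions, and the Python list standing in for memory are assumptions.

```python
class DmaController:
    def __init__(self, memory):
        self.memory = memory
        self.start_address = 0
        self.word_count = 0
        self.write_to_memory = True    # R/W: True means device -> memory
        self.ie = False                # interrupt enable
        self.done = False
        self.irq = False

    def transfer(self, device_words):
        """Move word_count words without CPU involvement, then flag completion."""
        for i in range(self.word_count):
            if self.write_to_memory:
                self.memory[self.start_address + i] = device_words[i]
            else:
                device_words[i] = self.memory[self.start_address + i]
        self.done = True
        self.irq = self.ie             # raise an interrupt only if enabled
```

The CPU only writes the start address, the word count, and the control flags; it is interrupted once, at the end of the whole block, instead of once per word.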
Direct Memory Access to Physical Devices
[Figure: Processor, Memory, a DMA/Disk controller (Disk1, Disk2), and a DMA controller with buffer (BUFR) for the Network Interface, all attached to the system bus; two DMA channels are marked (DMA Ch. 1, DMA Channel 2)]
Bus priority modes
Cycle stealing: DMA > CPU
Burst: DMA exclusive
Cell/B.E.: A Modern DMA Use
• 1 x PPE 64-bit PowerPC
- L1: 32 KB I$+32 KB D$
- L2: 512 KB
• 8 x SPE cores:
- Local mem (LS): 256 KB
- 128 x 128 bit vector registers
• Peak performance
- ~200 GFLOPS for all SPEs
- ~240 GFLOPS total
• Main memory access:
- PPE: Rd/Wr
- SPEs: Async DMA
Bus structures
Specification of bus
• Number of data lines
• Size of address space
• Multiplexing discipline
• Control structure
• Synchronous versus asynchronous
• Physical properties: connectors, pinning, electrical properties
NVIDIA G80/GT200/Fermi:
I/O as Performance Bottleneck
[Figure: block diagrams of the G80 and GT200 chips]
• SM = streaming multiprocessor
• 1 SM = 8 SP (streaming processors/CUDA cores)
• 1 TPC (thread processing cluster) = 2 x SM / 3 x SM
Per chip 1+ TFLOPS
I/O: 2.5GB/s (1:400)
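The 1:400 ratio can be reproduced with one division, assuming it compares bytes of I/O per second with floating-point operations per second and that "1+ TFLOPS" is rounded down to exactly 1 TFLOPS. The variable names are illustrative.

```python
peak_flops = 1e12             # "1+ TFLOPS" per chip, taken as 1 TFLOPS
io_bytes_per_s = 2.5e9        # 2.5 GB/s of I/O bandwidth

# Floating-point operations the chip can perform per byte it can transfer.
flops_per_byte = peak_flops / io_bytes_per_s
```

Under these assumptions the chip must perform about 400 operations on every byte moved over the I/O link to stay busy, which is why I/O becomes the bottleneck for data-intensive workloads.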
Synchronous Bus
[Figure: timing diagram with the signals Bus clock, Address, and Data]
Clock "slow enough" for all connected devices
Asynchronous Bus (1)
[Figure: timing diagram with the signals Address, Ready (set by CPU), Accept (set by device), and Data (from device); Ready follows Address after a delay to allow for skew]
Explicit handshaking: Input Cycle
Asynchronous Bus (2)
[Figure: timing diagram with the signals Address, Ready, Accept, and Data (from CPU)]
Explicit handshaking: Output Cycle
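The order of events in a handshake can be written down as a sequential trace. The sketch below models the input cycle only; the function name `input_cycle`, the dictionary standing in for the bus, and the exact ordering of the falling edges are assumptions based on a conventional full handshake.

```python
def input_cycle(device_memory, address):
    """Return (data, trace) for one full-handshake input cycle."""
    bus = {"address": None, "ready": 0, "accept": 0, "data": None}
    trace = []

    bus["address"] = address                      # CPU puts the address on the bus
    trace.append("address")
    bus["ready"] = 1                              # CPU raises Ready after the skew delay
    trace.append("ready=1")
    bus["data"] = device_memory[bus["address"]]   # device decodes address, drives Data
    bus["accept"] = 1                             # and raises Accept
    trace.append("accept=1")
    data = bus["data"]                            # CPU latches Data
    bus["ready"] = 0                              # then drops Ready
    trace.append("ready=0")
    bus["data"] = None                            # device removes Data
    bus["accept"] = 0                             # and drops Accept
    trace.append("accept=0")
    return data, trace
```

No clock appears anywhere: each side only proceeds after it has seen the other side's signal change, so slow and fast devices can share the same bus.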
SCSI bus
• Small Computer System Interface (SCSI)
• ANSI standard X3.131
• Up to 25 meters
• 50-wire cable
• Up to 8 (16) devices connected to bus
• A connection has an initiator and a target
• Target controls data transfer
SCSI-based Computer System
[Figure: Processor, Memory, a parallel interface (Printer), a serial interface (Terminal), and a SCSI controller on the processor bus; the SCSI controller connects over the SCSI bus to a disk controller (Disk1, Disk2) and a CD-ROM controller (CD-ROM drive)]
SCSI bus signals
• Data: DB(0),..., DB(7)
• Parity: DB(P)
• Phase: BSY, SEL
• Information type: C/D, MSG (control/message)
• Handshake: REQ, ACK
• Direction: I/O
• Other: ATN, RST
• Data lines used for identifying bus controllers
• Signals are active in the low-voltage state
Typical sequence
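[Figure: timing diagram of the signals -DB2, -DB5, -DB6, -BSY, and -SEL across the phases bus free, arbitration, and selection; initiators 2 and 6 compete, initiator 2 retreats, initiator 6 wins and selects the target on -DB5]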
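The arbitration phase can be sketched as follows, assuming the usual SCSI rule that each contender asserts the data line matching its ID and the highest ID wins. The function name `arbitrate` and its return format are illustrative.

```python
def arbitrate(contenders):
    """Each contender asserts DB(id); the highest asserted ID wins the bus."""
    lines = [0] * 8                        # DB(0) .. DB(7)
    for device_id in contenders:
        lines[device_id] = 1
    winner = max(i for i, asserted in enumerate(lines) if asserted)
    retreating = sorted(d for d in contenders if d != winner)
    return winner, retreating
```

For the sequence in the figure, `arbitrate([2, 6])` gives initiator 6 as the winner and initiator 2 as the one that retreats.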