Chap. 9 Pipeline and Vector Processing
Chap. 13 Multiprocessors
13-1 Characteristics of Multiprocessors
Multiprocessor System = MIMD
An interconnection of two or more CPUs with memory and I/O equipment
» A single CPU with one or more IOPs is usually not counted as a multiprocessor system,
unless the IOP has computational facilities comparable to a CPU
Computation can proceed in parallel in one of two ways
1) Multiple independent jobs can be made to operate in parallel
2) A single job can be partitioned into multiple parallel tasks
Classified by memory organization
1) Shared-memory or tightly coupled system
» Local memory + shared memory
higher degree of interaction between tasks
2) Distributed-memory or loosely coupled system
» Local memory + message-passing scheme (packet or message transfer)
most efficient when the interaction between tasks is minimal
13-2 Interconnection Structure
Components that make up a multiprocessor system
1) Time-shared common bus
2) Multi-port memory
3) Crossbar switch
4) Multistage switching network
5) Hypercube system
Computer System Architecture
Components that interconnect the CPUs, IOPs, and memory units
Dept. of Info. Of Computer.
Time-shared Common Bus
Time-shared single common bus system : Fig. 13-1
» Only one processor can communicate with the memory or another processor at any given time
when one processor is communicating with the memory, all other processors are either busy with internal operations or must be idle waiting for the bus
Dual common bus system : Fig. 13-2
Tightly coupled system
» System bus + Local bus
» Shared memory
the memory connected to the common system bus is shared by all processors
» System bus controller
Links each local bus to a common system bus
[Fig. 13-1: CPU 1, CPU 2, CPU 3, IOP 1, and IOP 2 share a single common bus with the memory unit]
[Fig. 13-2: each local bus connects a CPU, local memory, and (in some groups) an IOP; a system bus controller links each local bus to the system bus and the common shared memory]
Multi-port memory : Fig. 13-3
multiple paths between processors and memory
» Advantage : high transfer rate can be achieved
» Disadvantage : expensive memory control logic / large number of cables & connectors
Crossbar Switch : Fig. 13-4
A crossbar switch must be used when each memory module has only one I/O port
Block diagram of crossbar switch : Fig. 13-5
[Fig. 13-3: multiport memory organization: CPU 1 to CPU 4 and memory modules MM 1 to MM 4]
[Fig. 13-4: crossbar switch: CPU 1 to CPU 4 and memory modules MM 1 to MM 4]
[Fig. 13-5: block diagram of a crossbar switch: data, address, and control from CPU 1 to CPU 4 enter multiplexers and arbitration logic, which drive the memory module's data, address, read/write, and memory enable lines]
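The arbitration step inside a crossbar switch can be sketched as follows. This is a minimal illustration, not the circuit of Fig. 13-5: the function name `crossbar_cycle` and the fixed priority rule (lowest-numbered CPU wins) are assumptions made for the example. It shows that requests to different memory modules proceed in parallel, while two requests to the same module are serialized.

```python
def crossbar_cycle(requests, num_modules):
    """Resolve one memory cycle of a crossbar switch.

    requests: dict mapping CPU number -> requested memory module index.
    Returns a dict mapping memory module index -> CPU granted this cycle.
    Assumption: each module's arbitration logic favors the lowest CPU number.
    """
    grants = {}
    for cpu in sorted(requests):          # lowest-numbered CPU considered first
        module = requests[cpu]
        if not 0 <= module < num_modules:
            raise ValueError(f"CPU {cpu} requested unknown module {module}")
        if module not in grants:          # module still free in this cycle
            grants[module] = cpu
    return grants


# CPU 1 and CPU 2 both want module 2; CPU 3 and CPU 4 use other modules.
print(crossbar_cycle({1: 2, 2: 2, 3: 0, 4: 3}, num_modules=4))
```

In this run CPU 2 is not granted anything and would retry in the next cycle, while the other three transfers happen simultaneously.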
Example of crossbar switch use : crossbar hierarchies
[Figure: crossbar hierarchies: clusters interconnected at the top level; each cluster contains nodes joined by a crossbar; labels inside a node: PU, CU, Local Memory, Network Interface, I/O]
Multistage Switching Network
Control the communication between a number of sources and destinations
» Tightly coupled system : PU ↔ MM
» Loosely coupled system : PU ↔ PU
Basic components of a multistage switching network :
two-input, two-output interchange switch : Fig. 13-6
Example : 2 processors (P1 and P2) are connected through switches to 8 memory modules (000 - 111) : Fig. 13-7
Omega Network : Fig. 13-8
» Builds an N-input x N-output network topology from 2 x 2 interchange switches
[Fig. 13-6: operation of a 2 x 2 interchange switch with inputs A and B and outputs 0 and 1: A connected to 0, A connected to 1, B connected to 0, B connected to 1]
[Fig. 13-7: binary tree of 2 x 2 switches connecting the two processors to memory modules 000 through 111]
[Fig. 13-8: 8 x 8 omega switching network with inputs 0 to 7 and outputs 000 to 111]
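Routing through an omega network can be sketched as follows. The slides do not spell out the routing rule, so this example assumes the usual destination-tag scheme: every stage is preceded by a perfect shuffle of the lines (a left rotation of the line number), and each 2 x 2 switch sends the request to output 0 or 1 according to the next bit of the destination address, most significant bit first. The function name `omega_route` is hypothetical.

```python
def omega_route(src, dst, n_bits):
    """Return the line numbers visited from input src to output dst.

    The network has N = 2**n_bits inputs and n_bits stages of 2 x 2 switches.
    """
    mask = (1 << n_bits) - 1
    pos = src
    path = [pos]
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line number left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # The switch output (0 = upper, 1 = lower) is the next destination bit.
        bit = (dst >> (n_bits - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


# Input 2 (010) to output 6 (110) in an 8 x 8 network.
print([format(p, "03b") for p in omega_route(2, 6, 3)])
```

After log2 N stages every source bit has been shifted out and replaced by a destination bit, which is why the request always arrives at the requested output regardless of where it entered.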
Hypercube Interconnection : Fig. 13-9
Used in loosely coupled systems
Example hypercube architecture : Intel iPSC ( n = 7, 128 nodes )
[Fig. 13-9: hypercube structures for n = 1 (nodes 0, 1), n = 2 (nodes 00 to 11), and n = 3 (nodes 000 to 111)]
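The hypercube addressing can be sketched in a few lines. In an n-cube, two nodes are directly linked when their binary addresses differ in exactly one bit, so a message can be routed by correcting the differing bits one at a time. The function names and the choice to fix the lowest differing bit first are assumptions made for this example; other bit orders give equally short paths.

```python
def neighbors(node, n):
    """Nodes directly linked to `node` in an n-dimensional hypercube."""
    return [node ^ (1 << k) for k in range(n)]


def route(src, dst, n):
    """One shortest path from src to dst, fixing differing bits low to high."""
    path = [src]
    current = src
    for k in range(n):
        if (current ^ dst) & (1 << k):   # bit k still differs from the target
            current ^= 1 << k            # cross the link along dimension k
            path.append(current)
    return path


print(len(neighbors(0, 7)))                                # links per node when n = 7
print([format(p, "03b") for p in route(0b010, 0b001, 3)])  # 010 -> 011 -> 001
```

The number of hops equals the number of 1 bits in the exclusive-OR of the source and destination addresses, so no path in a 128-node cube needs more than 7 links.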
13-3 Interprocessor Arbitration : Bus Control
Single Bus System : Address bus, Data bus, Control bus
Multiple Bus System : Memory bus, I/O bus, System bus
System bus : the bus that connects CPUs, IOPs, and memory in a multiprocessor system
Data transfer method over the system bus
Synchronous bus : transfers are timed by driving both units from a common clock source
Asynchronous bus : each transfer is accompanied by handshaking control signals
System bus example : IEEE Standard 796 Multibus
86 signal lines : Tab. 13-1
» Bus arbitration signal lines : BREQ, BUSY, …
* Uses a bus busy line
If this line is inactive, no other processor is using the bus
Bus Arbitration Algorithm : Static / Dynamic
Static : priority fixed
» Serial arbitration : Fig. 13-10
[Fig. 13-10: serial (daisy-chain) arbitration: bus arbiters are chained from PO to the PI of the next arbiter, ordered from highest priority to lowest priority, and all share the bus busy line]
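The serial (daisy-chain) scheme can be sketched as follows. This is a simplified model under stated assumptions: the PI input of the first arbiter is permanently 1, each arbiter passes priority on (PO = 1) only when it has priority and is not requesting the bus, and the arbiter that sees PI = 1 and PO = 0 takes the bus once the bus busy line is inactive. The function name is hypothetical.

```python
def daisy_chain_grant(requests, bus_busy=False):
    """Return the index of the arbiter granted the bus, or None.

    requests[0] belongs to the highest-priority arbiter.
    """
    if bus_busy:
        return None                 # another processor still owns the bus
    pi = True                       # PI of the first arbiter is tied to 1
    for index, requesting in enumerate(requests):
        po = pi and not requesting  # pass priority on only if not requesting
        if pi and not po:
            return index            # PI = 1 and PO = 0: this arbiter wins
        pi = po                     # PO feeds the PI of the next arbiter
    return None


print(daisy_chain_grant([False, True, True, False]))  # arbiter at index 1 wins
```

Because priority is set purely by position in the chain, an arbiter far from the head can be starved if the arbiters ahead of it request the bus continuously.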
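» Parallel arbitration : Fig. 13-11
[Fig. 13-11: parallel arbitration: bus arbiters 1 to 4 each send a Req line to a 4×2 priority encoder and receive an Ack line from a 2×4 decoder; all arbiters share the bus busy line]
Dynamic : priority flexible
» Time slice (fixed length time)
» Polling
» LRU
» FIFO
» Rotating daisy-chain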
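The difference between fixed and rotating priority can be sketched as follows. The first two functions model the parallel scheme with a 4×2 priority encoder feeding a 2×4 decoder; the third models a rotating scheme in which the arbiter that last owned the bus drops to the lowest priority. All names are hypothetical, and the rotating rule is a simplified assumption rather than a description of any particular bus standard.

```python
def priority_encoder_4x2(req):
    """Return the 2-bit code of the highest-priority active request, or None."""
    for code, active in enumerate(req):   # req[0] has the highest priority
        if active:
            return code
    return None


def parallel_arbitrate(req):
    """Static parallel arbitration: encoder output drives a 2x4 decoder."""
    code = priority_encoder_4x2(req)
    return [code is not None and line == code for line in range(4)]  # Ack lines


def rotating_grant(req, last_owner):
    """Dynamic arbitration: search starts just after the previous bus owner."""
    n = len(req)
    for offset in range(1, n + 1):
        candidate = (last_owner + offset) % n
        if req[candidate]:
            return candidate
    return None


print(parallel_arbitrate([False, True, False, True]))        # Ack goes to arbiter 2 only
print(rotating_grant([True, True, False, True], last_owner=1))
```

With the static version the first requester in the list always wins; with the rotating version every requester is eventually served, because the starting point of the search moves after each grant.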
13-4 Interprocessor Communication & Synchronization
Interprocessor Communication
shared memory : tightly coupled system
» Accessible to all processors : common memory
» Act as a message center similar to a mailbox
no shared memory : loosely coupled system
» message passing through I/O channel communication
Interprocessor Synchronization
Enforce the correct sequence of processes and ensure mutually exclusive access to shared writable data
Mutual Exclusion
» Protects data from being changed simultaneously by two or more processors
Mutual Exclusion with Semaphore
» Critical Section
Once begun, must complete execution before another processor accesses the same shared resource
» Semaphore
Indicates whether or not a processor is executing a critical section
» Hardware Lock
Processor-generated signal that prevents other processors from using the system bus
Using shared memory with a semaphore
1) Execute the TSL SEM instruction (Test and Set while Locked)
» Tests the SEM bit while asserting the hardware lock signal
» Requires 2 memory cycles
R ← M [ SEM ] : Test semaphore (read the semaphore into register R)
M [ SEM ] ← 1 : Set semaphore (prevents other processors from using the shared memory)
2) If R = 0 : shared memory is available
If R = 1 : processor cannot access shared memory (semaphore originally set)
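The TSL sequence above can be imitated in software as a rough sketch. Here a `threading.Lock` stands in for the hardware lock signal that keeps the two memory cycles indivisible, and Python threads stand in for processors; the class and function names are hypothetical and the busy-wait loop is only for illustration.

```python
import threading
import time


class SharedMemory:
    def __init__(self):
        self.sem = 0                              # M[SEM]: 0 = free, 1 = in use
        self.counter = 0                          # shared writable data
        self._hardware_lock = threading.Lock()    # stand-in for the bus lock

    def tsl(self):
        """Test and set while locked: R <- M[SEM], then M[SEM] <- 1."""
        with self._hardware_lock:
            r = self.sem
            self.sem = 1
        return r

    def release(self):
        self.sem = 0                              # clear the semaphore on exit


def worker(mem, increments):
    for _ in range(increments):
        while mem.tsl() == 1:                     # R = 1: semaphore already set
            time.sleep(0)                         # yield and try again
        mem.counter += 1                          # critical section (R was 0)
        mem.release()


def run_demo(num_threads=4, increments=500):
    mem = SharedMemory()
    threads = [threading.Thread(target=worker, args=(mem, increments))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.counter


print(run_demo())
```

Every increment happens while exactly one thread holds the semaphore, so no update is lost even though all threads modify the same counter.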
13-5 Cache Coherence
Conditions for Incoherence : Fig. 13-12, 13
Multiprocessor system with private caches
» Write through : P2, P3 Incoherence
» Write back : P2, P3, Main memory Incoherence
When P1 writes 120 to X (main memory and all three caches initially hold X = 52) :
(a) With write-through cache policy : P1's cache and main memory hold X = 120; the caches of P2 and P3 still hold X = 52
(b) With write-back cache policy : only P1's cache holds X = 120; main memory and the caches of P2 and P3 still hold X = 52
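The two incoherence cases can be reproduced with a small sketch. It models only the single store from the example (P1 writes 120 to X, with every copy starting at 52) and reports which copies are left stale; the function name `write_x` is hypothetical.

```python
def write_x(policy, new_value=120, old_value=52):
    """Model P1 storing new_value to X under the given cache write policy.

    Returns (caches, main_memory, stale), where stale lists every copy of X
    that no longer matches the value P1 just wrote.
    """
    caches = {"P1": old_value, "P2": old_value, "P3": old_value}
    main_memory = old_value

    caches["P1"] = new_value              # P1 always updates its own cache
    if policy == "write-through":
        main_memory = new_value           # memory is updated on every write
    elif policy != "write-back":          # write-back defers the memory update
        raise ValueError(f"unknown policy: {policy}")

    stale = [name for name, value in caches.items() if value != new_value]
    if main_memory != new_value:
        stale.append("main memory")
    return caches, main_memory, stale


for policy in ("write-through", "write-back"):
    print(policy, write_x(policy)[2])
```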
Solution to the Cache Coherence Problem
Software approaches
» 1) Shared writable data are non-cacheable
» 2) Writable data exist in only one cache : centralized global table
Hardware approach
» 1) Monitor possible write operations : snoopy cache controller
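The snoopy idea can be sketched as follows. This is one possible variant chosen for illustration, not necessarily the scheme the slides have in mind: it assumes write-through caches and an invalidate-on-write rule, where every cache controller watches the bus and discards its own copy when another processor writes the same address. The class names `Bus` and `SnoopyCache` are hypothetical.

```python
class Bus:
    def __init__(self, memory):
        self.memory = memory              # main memory: address -> value
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value         # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)   # other controllers see the write


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                   # cached copies: address -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:        # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)        # invalidate the stale local copy


bus = Bus({"X": 52})
p1, p2, p3 = SnoopyCache(bus), SnoopyCache(bus), SnoopyCache(bus)
print(p1.read("X"), p2.read("X"), p3.read("X"))   # every cache loads X
p1.write("X", 120)
print(p2.read("X"), p3.read("X"))                 # misses refetch the new value
```

After the write, P2 and P3 no longer hold X, so their next read misses and picks up the updated value from main memory instead of the stale 52.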