
Using the High Performance Power System Efficiently
High Performance Power System
Date: 15/10/2009
DongJoon Cho ([email protected])
MTS, GTS, IBM Korea
© 2009 IBM Corporation
Agenda
• Concerns about Power System
• Summary of the solutions
• Architectures for effective computing
– H/W Architecture
– System Architecture
– S/W Architecture
Concerns about Power System
• Why, after buying a high-performance server, can't we use 100% of it?
• The CPU clock got faster, so why doesn't application performance follow?
• The clock is twice as fast, so why isn't performance twice as good?
• We doubled the memory, so why didn't utilization drop to half?
• Why does the IBM Power System post a higher tpmC than other systems?
• The IBM Power System has good response times, so why is its utilization high?
Will performance improve just by changing the system, with no change to the S/W?
What do we know about the system beyond its CPU clock?
Summary of the solutions
• Indirect methods
  – Firmware update
  – AIX update
  – Software update
• Direct methods
  – AIX configuration
  – Hardware planning/selection
  – System architecture
  – Software architecture
Most software problems are hard to solve by the direct methods because of development time and cost.
Hardware Architecture - CPU
• CISC
  – Complex Instruction Set Computer Architecture
  – Designed to provide every instruction an application might need
  – VAX, x86
• EPIC
  – Explicitly Parallel Instruction Computing Architecture
  – Co-designed by HP and Intel; provides explicit instruction-level parallelism
  – IA64
• RISC
  – Reduced Instruction Set Computer Architecture
  – Trims the instruction set down to the most frequently used instructions, shortening execution time across most real workloads
  – SPARC, POWER, PA-RISC
Hardware Architecture - CPU Instructions
• Computation Instructions
  – Arithmetic operations
    • ADD : Add
    • SUB : Subtract
    • MUL : Multiply
    • DIV : Divide
    • INC : Increment
    • DEC : Decrement
    • CMP : Compare
  – Logical operations
    • AND : True if A and B are true
    • OR : True if A or B is true
    • NOT : True if A is false
    • XOR : True if only one of A and B is true
    • SHL : Shift bits left
    • SHR : Shift bits right
    • BSWAP : Reverse byte order
• Operand Types (each sequence computes C = A + B)
  – Stack       : Push A ; Push B ; Add ; Pop C
  – Accumulator : Ld A ; Add B ; St C
  – Register    : Ld R1, A ; Ld R2, B ; Add R3, R2, R1 ; St C, R3
  – Memory      : Add C, B, A
Hardware Architecture - CPU Instructions
• Data Transfer Instructions
  – LD : Load value from memory to a register
  – ST : Store value from a register to memory
  – MOV : Move value from register to register
  – CMOV : Conditionally move value from register to register if a condition is met
  – PUSH : Push value onto top of stack
  – POP : Pop value from top of stack
Hardware Architecture - CPU Instructions
• Control Flow Instructions
  – JMP : Unconditional jump to another instruction
  – BR : Branch to instruction if condition is met
  – CALL : Call a procedure
  – RET : Return from procedure
  – INT : Software interrupt
• Control Flow Relative Frequency
  Instruction     Integer programs   Floating-point programs
  Branch          75%                82%
  Jump            6%                 10%
  Call & return   19%                8%
Hardware Architecture - CPU Instructions
• Common Instructions
  Instruction   Instruction type   Percent of instructions executed
  Load          Data transfer      22%
  Branch        Control flow       20%
  Compare       Computation        16%
  Store         Data transfer      12%
  Add           Computation        8%
  And           Computation        6%
  Sub           Computation        5%
  Move          Data transfer      4%
  Call          Control flow       1%
  Return        Control flow       1%
  Total                            95%
  Overall percentage by type: Data transfer 38%, Computation 35%, Control flow 22%
Hardware Architecture - CPU and I/O
• CPU Speed versus I/O Speeds
  – I/O is slower than the CPU, so a range of techniques is needed to reduce I/O-induced waits
• Several options to overcome I/O limitations
  – Incorporate more I/O buses (parallelism)
  – Extend current I/O technology (increase bandwidth, enhance operating modes)
  – Develop new I/O technology
Hardware Architecture - CPU and I/O
• CPU Efficiency and CPU Access Costs
  – Performance degradation caused by I/O
Hardware Architecture - I/O
• The elements of an I/O system
Hardware Architecture - I/O : InfiniBand
• Comparing InfiniBand to Existing Technology
– Differences and Benefits
  Change (From → To)                                        Benefit
  Memory mapped → Channel based                             CPU efficiency, scalability, isolation, recovery
  Parallel bus → Switched fabric                            Scalability, isolation, redundancy, reduced pin-out, modularity, higher cross-sectional bandwidth
  Shared bus access → Point to point                        Greater distance, higher speeds
  Load/store → DMA scheduling                               Improved CPU efficiency
  Single open address space → Independent address domains   Protection, isolation, recovery, reliability
Hardware Architecture - I/O : InfiniBand
[Figure: the traditional Shared Bus Architecture (shared bus topology) versus the InfiniBand Switched Architecture (switched fabric topology)]
Hardware Architecture - I/O : InfiniBand
Accessing InfiniBand Services - The Channel Interface : Work / Completion Queue Architecture
Hardware Architecture - I/O : InfiniBand
Queues minimize waits and allow asynchronous processing.
InfiniBand Queue Operations – Operations on the send queue fall into three subclasses
Hardware Architecture - I/O : InfiniBand
• VIA (Virtual Interface Architecture)
  – Message model
  – Direct, protected access by user-level software to the communications hardware; the protection is effected by means of the virtual memory system:
    • Send and receive packet descriptors that specify scatter-gather operations (where data must be distributed to and collected up from) when sending and receiving
    • A send message queue and a receive message queue, comprising linked lists of packet descriptors
    • A means of notifying the network interface that packets have been placed on a queue
    • An asynchronous notification process for the status of the operations requested (completion of a send or receive operation is signaled by writing state information into a packet descriptor)
    • Registration of memory areas used for communications: before communications start, the memory areas for each hardware unit are identified and noted, allowing expensive operations, such as locking the pages and translating virtual to real addresses, to be done once, outside performance-critical data transfers
[Figure: Comparison of VIA and traditional communications]
Hardware Architecture - I/O : InfiniBand
Logical processing steps in TCP/IP
Checksum computation and memory management also generate overhead.
  • White indicates per-message processing: the load imposed by the system call on the sockets interface, independent of the size of the message
  • Light gray indicates per-fragment processing (a long message is broken up into several fragments): TCP, IP, media access, and interrupt handling
  • Dark gray indicates per-byte processing (strictly, per fragment plus per byte in fragment): the data-copying overhead along with computation of the checksum
Hardware Architecture - I/O : InfiniBand
• Mechanisms to reduce the number of interrupts
  – Send, simple DMA:
    • set up the DMA registers (with buffer address and size)
    • lock the page containing the buffers and purge corresponding addresses in the data cache
    • activate the send command and wait until the end of the operation
    • interrupt upon completion of the operation, and free (unlock) the page
  – Send, improved DMA:
    • refill the free buffers with data to be sent
    • lock the buffer page(s) and purge corresponding addresses in the data cache
    • refill a descriptor with the addresses and sizes of the buffers just set up
    • change the descriptor status indicator to "DMA"; if the DMA was inactive, wake it up
  – Receive, simple DMA:
    • DMA interrupts processor
    • allocate a page and purge the cache of its addresses
    • set up the DMA registers (with buffer address and size)
    • when the operation completes, the DMA raises an interrupt
  – Receive, improved DMA:
    • refill descriptor(s) for receiving and purge corresponding addresses in the data cache
    • when a receive operation completes, the DMA sets the descriptor indicator to System; the OS can test the status of different descriptors
    • if there are no free buffers, the DMA raises an interrupt
The improved DMA scheme reduces overhead by cutting the number of interrupts.
System Architecture (Hardware)
• LPAR / DLPAR
System Architecture (Hardware)
• LPAR / DLPAR
– Hypervisor
System Architecture (Hardware)
• Micro-Partitioning
  – Create up to 10 partitions per processor
  – Share resources across multiple partitions
System Architecture (Hardware)
• Micro-Partitioning
System Architecture (Hardware)
• VIO
  – Part of the Advanced POWER Virtualization feature
  – Allows for sharing of physical devices, including storage and network
  – Implemented as a customized AIX-based appliance
  – Requires careful planning to maintain the VIO Server with minimal impact to VIO Clients
  – Provides command line tools for maintenance, or can be maintained with NIM
System Architecture (System Software)
• SMT (Simultaneous Multi-Threading)
  – A hardware design enhancement in POWER5 that lets a processor execute two independent instructions simultaneously
  – Through hardware and software thread prioritization, it raises the utilization of hardware resources without hurting application performance
• WLM (Workload Manager)
  – Dynamically allocates system resources among running workloads without partitioning the system
  – Divides CPU time rather than whole processors, allowing finer-grained control of CPU resources
  – Manages applications with different characteristics on one server through individual control of CPU time, memory, and I/O volume
System Architecture (System Software)
• WPARs (Workload Partitions)
– A workload partition (WPAR), new with the IBM® AIX® 6.1 operating
system, expands on the traditional IBM AIX logical partitioning (LPAR)
technology by further allowing AIX to be virtualized within a single
operating-system image.
– A simple definition of a WPAR is that it is a virtualized AIX instance that
runs within a single AIX operating-system image.
Software Architecture - OS
• The OS and network programs
  – Components of a network program
    • Socket API
    • I/O
    • Processes or threads to handle multiple connections
    • IPC (Inter Process Communication) to synchronize those processes or threads
[Diagram: Socket API, process/thread, I/O, and IPC sit on the OS (file system, memory) above the H/W (disk, NIC, ...): the relationship between the OS and a network program]
Software Architecture – File on the Unix
• What is a file on Unix?
• Checking which files a process has open
  – Files every process opens by default when it is created:
    • 0 : standard input
    • 1 : standard output
    • 2 : standard error

office2@root/proc/9804/fd>ls -al
total 120
dr-x------    1 root system     0 Sep 27 03:22 .
dr-xr-xr-x    1 root system     0 Sep 27 03:22 ..
lr-xr-xr-x   24 root system  1024 Sep 22 18:48 0 -> /
lr-xr-xr-x   24 root system  1024 Sep 22 18:48 1 -> /
lr-xr-xr-x   24 root system  1024 Sep 22 18:48 2 -> /
--w--w----    1 root system 12506 Sep 15 18:13 7
--w--w----    1 root system 12506 Sep 15 18:13 8
--w--w----    1 root system 12506 Sep 15 18:13 9
Software Architecture
• Application Programs and OS
– Type of Software (Conceptual Model)
Software Architecture
• Application Programs and OS
– Application Programs
Software Architecture
• Application Programs and OS
– Operating Systems
Software Architecture
• Application Programs and OS
– Device Drivers
Software Architecture
• Application Programs and OS
– AIX 5L Structure
Application Architecture
IPC driven by multi-processing increases kernel overhead.
• Multi-Process Model
  – Process : the flow of control that represents a program while it runs, together with its system resources (memory, files, IPC, ...)
  – Creating and controlling processes (see the sketch below)
    • fork()
      – Creates a copy of the process
      – The child process shares its code with the parent
    • exec()
      – Replaces the current process image with a program's executable image
      – Loads and runs the new program
[Diagram: the Init Process fork()s Process' and Process''; each then exec()s to become Process A and Process B]
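A minimal C sketch of the fork()/exec() pattern in the diagram (running /bin/date in the child is an illustrative choice, not from the slides):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                 /* create a copy of this process */
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {                     /* child: shares code with the parent until exec */
        execl("/bin/date", "date", (char *)0);  /* replace the image with a new program */
        perror("execl");                /* reached only if exec fails */
        _exit(127);
    }
    waitpid(pid, NULL, 0);              /* parent: wait for the child to finish */
    return 0;
}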
Application Architecture
• Multi-Processing Model
  – Socket program (the sequence is sketched in code below)
    Server: socket → bind → listen → accept → fork(); the child read()s the request, write()s the reply, then close()s
    Client: socket → connect (connection request) → write (send the request) → read (receive the data)
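A minimal sketch of that multi-process server, assuming a TCP echo service on port 9000 (both the port and the echo behavior are illustrative choices):

#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);           /* socket */
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));   /* bind */
    listen(lfd, 128);                                    /* listen */
    signal(SIGCHLD, SIG_IGN);       /* let the kernel reap exited children */

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);               /* accept */
        if (cfd < 0)
            continue;
        if (fork() == 0) {                               /* fork() per client */
            char buf[512];
            ssize_t n;
            close(lfd);
            while ((n = read(cfd, buf, sizeof(buf))) > 0)
                write(cfd, buf, n);   /* read the request, write the reply */
            close(cfd);
            _exit(0);
        }
        close(cfd);                   /* parent goes back to accept() */
    }
}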
Application Architecture
• Multi-Processing Model
  – fork() per request: ① a client connects to the server, ② the server fork()s a child; a fork() occurs for every request.
  – Process pool: ① the server fork()s child processes into a pool in advance, ② clients connect to the server; because fork() takes a long time, the children are created ahead of time.
Application Architecture
• IPC (Inter Process Communication)
  – What is IPC?
    • The mechanisms processes use to share data with one another and to synchronize
  – IPC kinds (a sketch of these calls follows this list)
    • Semaphore
      – Semaphores synchronize and protect data shared between processes
    • Shared Memory
      – Multiple processes share virtual memory; the fastest means of sharing memory
    • Message Queues
      – A queue is a data structure in which the data that arrives first leaves first
      – As IPC, message queues are far more intuitive and simple to use than the other sharing schemes
      – They are, however, quite tricky to control
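A sketch of those System V calls, assuming Linux (where the caller defines union semun): a shared-memory segment protected by one semaphore.

#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    struct sembuf lock   = { 0, -1, SEM_UNDO };   /* P: wait   */
    struct sembuf unlock = { 0,  1, SEM_UNDO };   /* V: signal */
    union semun arg = { .val = 1 };

    semctl(semid, 0, SETVAL, arg);       /* semaphore starts at 1 */

    char *mem = shmat(shmid, NULL, 0);   /* attach the segment */
    semop(semid, &lock, 1);              /* enter the critical section */
    strcpy(mem, "hello via shared memory");
    semop(semid, &unlock, 1);            /* leave the critical section */
    printf("%s\n", mem);

    shmdt(mem);                          /* detach, then remove both IDs */
    shmctl(shmid, IPC_RMID, NULL);
    semctl(semid, 0, IPC_RMID);
    return 0;
}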
Application Architecture
• IPC
  – IPC kinds (a pipe sketch follows this list)
    • Pipe
      – Used to pass one process's data to another. Data can flow in only one direction (read only or write only; you cannot read and write at the same time), and a pipe can only be used between processes with the same parent (same PPID)
    • FIFO (Named Pipe)
      – A first-in, first-out I/O stream, similar to a pipe, but it is given a name, so it can be used between unrelated processes; that is the difference from a pipe
      – Create a FIFO with mknod
    • UDS (Unix Domain Socket)
      – Usable through the socket API without modification; unlike the port-based Internet domain socket, it uses the local file system for communication between processes on the same machine
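A minimal sketch of the pipe mechanism described above: a one-way channel between a parent and the child it forks (the same-ancestry restriction noted in the list).

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    int fd[2];
    pipe(fd);                       /* fd[0] = read end, fd[1] = write end */

    if (fork() == 0) {              /* child: reads what the parent writes */
        char buf[64];
        close(fd[1]);
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        _exit(0);
    }
    close(fd[0]);                   /* parent: keeps only the write end */
    write(fd[1], "ping", 4);
    close(fd[1]);
    wait(NULL);
    return 0;
}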
Application Architecture
• IPC
  – IPC commands
    Function                                 Message queue    Semaphore   Shared memory
    1. Allocate IPC                          msgget           semget      shmget
    2. Control IPC (status change, release)  msgctl           semctl      shmctl
    3. Operate IPC (send/receive)            msgsnd, msgrcv   semop       shmat, shmdt
  – ipcs command
    • ipcs -m (shared memory)
    • ipcs -q (message queues)
    • ipcs -s (semaphores)
  – ipcrm command
    • Removes semaphores, message queues, and shared memory from the system
Application Architecture
• IPC
– IPC Limits
  Semaphores                                        4.3.0   4.3.1   4.3.2    5.1      5.2      5.3
  Maximum number of semaphore IDs (32-bit kernel)   4096    4096    131072   131072   131072   131072
  Maximum number of semaphore IDs (64-bit kernel)   4096    4096    131072   131072   131072   1048576
  Maximum semaphores per semaphore ID               65535   65535   65535    65535    65535    65535
  Maximum operations per semop call                 1024    1024    1024     1024     1024     1024
  Maximum undo entries per process                  1024    1024    1024     1024     1024     1024
  Size in bytes of undo structure                   8208    8208    8208     8208     8208     8208
  Semaphore maximum value                           32767   32767   32767    32767    32767    32767
  Adjust on exit maximum value                      16384   16384   16384    16384    16384    16384
Application Architecture
• IPC
– IPC Limits
  Message queue                                         4.3.0    4.3.1    4.3.2    5.1      5.2      5.3
  Maximum message size                                  4 MB     4 MB     4 MB     4 MB     4 MB     4 MB
  Maximum bytes on queue                                4 MB     4 MB     4 MB     4 MB     4 MB     4 MB
  Maximum number of message queue IDs (32-bit kernel)   4096     4096     131072   131072   131072   131072
  Maximum number of message queue IDs (64-bit kernel)   4096     4096     131072   131072   131072   1048576
  Maximum messages per queue ID                         524288   524288   524288   524288   524288   524288
Application Architecture
• IPC
– IPC Limits
  Shared memory                                          4.3.0       4.3.1       4.3.2       5.1         5.2         5.3
  Maximum segment size (32-bit process)                  256 MB      2 GB        2 GB        2 GB        2 GB        2 GB
  Maximum segment size (64-bit process, 32-bit kernel)   256 MB      2 GB        2 GB        64 GB       1 TB        1 TB
  Maximum segment size (64-bit process, 64-bit kernel)   256 MB      2 GB        2 GB        64 GB       1 TB        32 TB
  Minimum segment size                                   1           1           1           1           1           1
  Maximum number of shared memory IDs (32-bit kernel)    4096        4096        131072      131072      131072      131072
  Maximum number of shared memory IDs (64-bit kernel)    4096        4096        131072      131072      131072      1048576
  Maximum segments per process (32-bit process)          11          11          11          11          11          11
  Maximum segments per process (64-bit process)          268435456   268435456   268435456   268435456   268435456   268435456
Application Architecture
• IPC
– IPC tunable parameters
  • msgmax : maximum message size. Dynamic, with a maximum value of 4 MB. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • msgmnb : maximum number of bytes on a queue. Dynamic, with a maximum value of 4 MB. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • msgmni : maximum number of message queue IDs. Dynamic, with a maximum value of 131072. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • msgmnm : maximum number of messages per queue. Dynamic, with a maximum value of 524288. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • semaem : maximum value for adjustment on exit. Dynamic, with a maximum value of 16384. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • semmni : maximum number of semaphore IDs. Dynamic, with a maximum value of 131072. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • semmsl : maximum number of semaphores per ID. Dynamic, with a maximum value of 65535. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • semopm : maximum number of operations per semop() call. Dynamic, with a maximum value of 1024. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • semume : maximum number of undo entries per process. Dynamic, with a maximum value of 1024. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • semvmx : maximum value of a semaphore. Dynamic, with a maximum value of 32767. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • shmmax : maximum shared memory segment size. Dynamic, with a maximum value of 256 MB for 32-bit processes and 0x80000000u for 64-bit. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
  • shmmin : minimum shared-memory-segment size. Dynamic, with a minimum value of 1. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• IPC
– IPC tunable parameters
  • shmmni : maximum number of shared memory IDs. Dynamic, with a maximum value of 131072. Display/Change/Diagnosis: N/A. Tuning: not required; the kernel adjusts it dynamically as needed.
Application Architecture
• Multi-Thread Model
  – Thread : a flow of control that exists within a process
  – Socket program (the sequence is sketched in code below)
    Server: socket → bind → listen → accept → pthread_create(); the new thread read()s the request, write()s the reply, then close()s
    Client: socket → connect (connection request) → write (send the request) → read (receive the data)
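A minimal sketch of that multi-thread server, assuming the same illustrative TCP echo service on port 9000 as the fork() version:

#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *serve(void *arg)
{
    int cfd = (int)(long)arg;       /* connection descriptor from accept() */
    char buf[512];
    ssize_t n;
    while ((n = read(cfd, buf, sizeof(buf))) > 0)
        write(cfd, buf, n);         /* echo the request back */
    close(cfd);
    return NULL;
}

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);
        pthread_t tid;
        if (cfd >= 0 &&
            pthread_create(&tid, NULL, serve, (void *)(long)cfd) == 0)
            pthread_detach(tid);    /* far lighter than fork(), as noted above */
    }
}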
Application Architecture
• Multi-Thread Model
  – Thread per request: ① a client connects to the server, ② the server calls pthread_create(); a pthread_create() occurs for every request, but it is far lighter than fork().
  – Thread pool: ① the server pre-creates threads with pthread_create(), ② clients connect to the server; although threads are lighter than fork(), a pool is used to eliminate even the thread-creation time.
Application Architecture
• N:N DB Connection (Multi-Process Model)
  – DB connections are made n:n, but Oracle's fork() per connection wastes system resources.
[Diagram: each application child process holds its own connection to an Oracle child process (Connection n:n)]
  – The heaviest loads in a DB query:
    1. DB Connect (from the network)
    2. Parsing the DB query
Application Architecture
• 1:1 DB Connection (Multi-Process Model)
  – Connections are 1:1, so Oracle's fork() is limited to a single call and little system resource is wasted, but client connections may not flow smoothly.
[Diagram: the application's child processes funnel through one 1:1 connection to a single Oracle child process]
  – The heaviest loads in a DB query:
    1. DB Connect (from the network)
    2. Parsing the DB query
Application Architecture
• DB Connection Pool (Multi-Process Model)
  – Thread pool or process pool
  – Requests are handled over connections established in advance inside the pool; the pool lends out its resources and can allocate them flexibly when they run short (a pool-checkout sketch follows).
[Diagram: application child processes borrow n:n connections from a thread-based connection pool in front of Oracle's child processes]
  – The heaviest loads in a DB query:
    1. DB Connect (from the network)
    2. Parsing the DB query
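A minimal sketch of the lend-out behavior, under assumptions not in the slides: a fixed-size slot table guarded by a pthread mutex, with db_conn standing in for a real connection handle.

#include <pthread.h>

#define POOL_SIZE 8

typedef struct { int in_use; /* db_conn *conn;  hypothetical handle */ } slot_t;

static slot_t pool[POOL_SIZE];
static pthread_mutex_t pool_mx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cv = PTHREAD_COND_INITIALIZER;

int pool_checkout(void)             /* returns the index of a free slot */
{
    pthread_mutex_lock(&pool_mx);
    for (;;) {
        for (int i = 0; i < POOL_SIZE; i++)
            if (!pool[i].in_use) {
                pool[i].in_use = 1;
                pthread_mutex_unlock(&pool_mx);
                return i;           /* connection already established: no connect cost */
            }
        pthread_cond_wait(&pool_cv, &pool_mx);  /* all busy: wait for a check-in */
    }
}

void pool_checkin(int i)
{
    pthread_mutex_lock(&pool_mx);
    pool[i].in_use = 0;
    pthread_cond_signal(&pool_cv);  /* wake one waiting request thread */
    pthread_mutex_unlock(&pool_mx);
}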
Application Architecture
• DB Connection Pool (Multi-Thread Model)
  – Pre-Process Model (process pool) and Pre-Thread Model (thread pool)
  – Multi-threading model (①, ⑤)
  – Thread pool model for DB connection (①, ②, ③, ⑥)
[Diagram: client apps connect to server-app threads, which borrow pre-established connections from a thread pool in front of Oracle's child processes (steps ① through ⑥)]
Application Architecture
• I/O Multiplexing Model
  – Rather than each socket communicating over its own socket I/O, a single socket I/O path serves them all: sockets are registered in the file descriptor table, and that table's I/O is watched to handle multiple connections (see the select() sketch below)
  – select / poll
[Diagram: register the file descriptors on connection request, watch the descriptors while sending and receiving data, release them on connection close]
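A select()-based sketch of the model: register descriptors, watch them, then scan all of them to find the ones that fired (the very scan the next slide criticizes).

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void serve_with_select(int lfd)      /* lfd: a listening TCP socket */
{
    fd_set all, ready;
    int maxfd = lfd;

    FD_ZERO(&all);
    FD_SET(lfd, &all);               /* register the listener */

    for (;;) {
        ready = all;                 /* select() modifies its argument */
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) <= 0)
            continue;
        for (int fd = 0; fd <= maxfd; fd++) {   /* must test every descriptor */
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == lfd) {                    /* new connection */
                int cfd = accept(lfd, NULL, NULL);
                if (cfd >= 0) {
                    FD_SET(cfd, &all);
                    if (cfd > maxfd)
                        maxfd = cfd;
                }
            } else {                            /* data arrived, or peer closed */
                char buf[512];
                ssize_t n = read(fd, buf, sizeof(buf));
                if (n <= 0) { close(fd); FD_CLR(fd, &all); }
                else        { write(fd, buf, n); }
            }
        }
    }
}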
Application Architecture
• I/O Multiplexing Model
  – Drawback
    • select/poll is used for I/O multiplexing, but to learn which file descriptor in the wide descriptor array raised an event, every descriptor must be checked one by one in a loop
[Diagram: only some descriptors in the file descriptor table are of interest, yet all of them must be examined: the weakness of the I/O multiplexing model]
Application Architecture
• Event based I/O Model through Real-time Signal
  – Event-based socket handling schemes
    • UNIX/Linux : POSIX Real-time Signal, epoll
    • Windows, AIX, iSeries OS : IOCP
    • FreeBSD : kqueue (kernel queue)
Application Architecture
• Event based I/O Model through Real-time Signal
  – Real-time Signal
    • Compensates for two weaknesses of ordinary signals: they have no queue, and as a result no information is delivered with them
    • Real-time signals are queued; as many events as the queue holds can be stored, so signal loss can be avoided
    • They can also deliver information such as the descriptor of the socket that raised the signal, carrying extra context
    • Unlike select/poll, there is no need to scan the descriptor array of the file descriptor table (a Linux-flavored sketch follows the diagram)
[Diagram: Client1/2/3 → Socket1/2/3 → SIGRTMIN+1/+2/+3 → Thread1/2/3: real-time signals dispatched to a thread pool]
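A Linux-specific sketch of this model (F_SETSIG and the si_fd field are Linux extensions; socket setup is abbreviated):

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void watch_socket(int sockfd)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN + 1);
    sigprocmask(SIG_BLOCK, &set, NULL);      /* collect the signals synchronously */

    fcntl(sockfd, F_SETOWN, getpid());       /* deliver signals to this process */
    fcntl(sockfd, F_SETSIG, SIGRTMIN + 1);   /* use a queued real-time signal */
    fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL) | O_ASYNC | O_NONBLOCK);

    for (;;) {
        siginfo_t si;
        if (sigwaitinfo(&set, &si) < 0)      /* one queued entry per event */
            continue;
        printf("event on fd %d\n", si.si_fd);/* the descriptor rides in siginfo */
    }
}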
Application Architecture
• epoll
  – epoll : event poll
  – Roughly 10% to 20% better performance than Real-time Signals
  – Supported on HP-UX and Red Hat; not supported on AIX
  – Descriptors are registered and managed in the event poll, so when a read/write event occurs the related information is returned. Because the returned information carries the descriptor, there is no need to scan through a loop the way poll does (see the sketch below).
[Diagram: Socket1, Socket2, Socket3 registered by file descriptor in the event poll]
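A Linux epoll sketch of that model: descriptors are registered once, and epoll_wait() returns only the ones that fired.

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void serve_with_epoll(int lfd)      /* lfd: a listening TCP socket */
{
    int epfd = epoll_create(256);   /* size hint, ignored by modern kernels */
    struct epoll_event ev, events[64];

    ev.events = EPOLLIN;
    ev.data.fd = lfd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, lfd, &ev);   /* register the listener once */

    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < n; i++) {           /* only ready fds come back */
            int fd = events[i].data.fd;
            if (fd == lfd) {
                int cfd = accept(lfd, NULL, NULL);
                if (cfd >= 0) {
                    ev.events = EPOLLIN;
                    ev.data.fd = cfd;
                    epoll_ctl(epfd, EPOLL_CTL_ADD, cfd, &ev);
                }
            } else {
                char buf[512];
                ssize_t nr = read(fd, buf, sizeof(buf));
                if (nr <= 0) close(fd);         /* a closed fd leaves the set */
                else         write(fd, buf, nr);
            }
        }
    }
}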
Application Architecture
• epoll
– httpd test result
dphttpd symmetric multiprocessor result
dphttpd uniprocessor result
Application Architecture
• epoll
– Pipetest
Pipetest symmetric multiprocessor result
Pipetest uniprocessor result
Application Architecture
• epoll
  – Dead connection test
    128-byte context, dead connections test result
    1024-byte context, dead connections test result
Application Architecture
• IOCP (I/O Completion Ports)
  – IOCP on iSeries
    • Supported since AS/400; the i platform began in 1988 as the AS/400 and evolved through AS/400, OS/400, i5/OS, and i6/OS
    • AS/400 QMU 5.0.1.02 introduces asynchronous I/O completion ports (IOCP)
  – IOCP on Windows NT
    • Supported since Windows NT Winsock2
– IOCP on AIX
• I/O completion port support was first introduced in AIX 4.3 by APAR IY06351.
An I/O completion port was originally a Windows NT scheduling construct
that has since been implemented in other OS's. Domino uses these
constructs to improve the scalability of the server. It allows one thread to
handle multiple session requests, so that a Notes client session is no longer
bound to a single thread for its duration. The completion port is tied directly
to a device handle and any network I/O requests that are made to that
handle.
Application Architecture
• Parallel Programming
  – Fundamentals of Parallel Programming
    • Multi-Process/Multi-Thread
    • Asynchronous Procedure Calls
    • Signal, Event
    • Queuing Asynchronous Procedure Calls
    • IOCP
    Ex) File Finder Agent
Application Architecture
• Parallel Programming – OpenMP(Open Multi-Processing)
– An Application Program Interface (API) that may be used to explicitly
direct multi-threaded, shared memory parallelism
– Comprised of three primary API components
• Compiler Directives
• Runtime Library Routines
• Environment Variables
– Portable
– Standardized
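A small sketch of the three components just listed: the pragma is the compiler directive, omp_get_max_threads() is a runtime library routine, and OMP_NUM_THREADS is the environment-variable piece (the loop itself is an arbitrary example; with IBM XL C compile with -qsmp=omp, with gcc -fopenmp).

#include <omp.h>
#include <stdio.h>

int main(void)
{
    double sum = 0.0;
    int i;

    /* compiler directive: parallelize the loop with a reduction */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000000; i++)
        sum += 1.0 / (i + 1);

    /* runtime library routine; OMP_NUM_THREADS sets the count */
    printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
    return 0;
}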
Application Architecture
• Parallel Programming - MPI
  – MPI (Message Passing Interface)
    • A standard data communication library for message-passing parallel programming
    • References
      – http://www.mcs.anl.gov/mpi/index.html
      – http://www.mpi-forum.org/docs/docs.html
  – MPI goals
    • Portability
    • Efficiency
    • Functionality
Application Architecture
• Parallel Programming
  – MPI basic concepts (a sketch follows this list)
    • Work is assigned per process
    • Processor : Process = 1:1 or 1:N
    • Message = data + envelope, which answers:
      – Which process is sending?
      – Where does the data being sent live?
      – What data is being sent?
      – How much is being sent?
      – Which process receives it?
      – Where should it be stored?
      – How much should the receiver be prepared to accept?
    • Tag
      – Used for message matching and discrimination
      – Lets message arrivals be processed in order
      – Wildcards may be used
    • Communicator
      – The set of processes permitted to communicate with one another
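A minimal sketch of those concepts: rank, the default communicator MPI_COMM_WORLD, a tag, and one point-to-point send/receive pair (build with mpicc and run with at least two processes).

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank in the communicator */

    if (rank == 0) {
        value = 42;
        /* envelope: destination rank 1, tag 7, communicator MPI_COMM_WORLD */
        MPI_Send(&value, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status st;
        MPI_Recv(&value, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, &st);
        printf("rank 1 received %d (tag %d)\n", value, st.MPI_TAG);
    }

    MPI_Finalize();
    return 0;
}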
Application Architecture
• Parallel Programming
  – MPI basic concepts
    • Process Rank
      – An identifier that distinguishes the processes within the same communicator
    • Point to Point Communication
      – Communication between two processes
      – One sending process matched with one receiving process
    • Collective communication
      – Several processes participate at once
      – 1:N, N:1, and N:N patterns are possible
      – Replaces repeated point-to-point communications with a single collective communication
        » Less chance of error, and faster thanks to optimization
Application Architecture
• Java
– Development and execution of Java applications
Application Architecture
• Java
  – How to use the system efficiently from Java applications
    • NIO (New I/O)
    • NIO pollset
    • Leave the garbage collector to collect on its own
    • Update the JRE to the latest level unless there is a particular reason not to
    • Keep source code current during development (migrate APIs marked Deprecated to other APIs where possible)
    • If you use a framework, keep the framework up to date
  (Some of these are certain improvements; for others, the JRE or framework may prevent any improvement.)
Application Architecture
• Java
  – pollset
    • Java source code
      DatagramChannel channel = DatagramChannel.open();
      channel.configureBlocking(false);
      Selector selector = Selector.open();
      channel.register(selector, SelectionKey.OP_READ);
    • Underlying poll interface
      int poll(struct pollfd fds[], nfds_t nfds, int timeout);
    • Native pollset interface C source code
      pollset_t ps = pollset_create(int maxfd);
      int rc = pollset_destroy(pollset_t ps);
      int rc = pollset_ctl(pollset_t ps, struct poll_ctl *pollctl_array, int array_length);
      int nfound = pollset_poll(pollset_t ps, struct pollfd *polldata_array, int array_length, int timeout);
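A hedged sketch of how those native pollset calls fit together on AIX; the struct poll_ctl field names and PS_ADD command follow the AIX documentation, but treat the details as assumptions rather than verified code.

#include <sys/poll.h>
#include <sys/pollset.h>

int wait_with_pollset(int sockfd)
{
    pollset_t ps = pollset_create(-1);       /* -1: no explicit fd limit */
    struct poll_ctl ctl;
    struct pollfd results[64];

    ctl.cmd = PS_ADD;                        /* register once, not per call */
    ctl.events = POLLIN;
    ctl.fd = sockfd;
    pollset_ctl(ps, &ctl, 1);

    /* blocks until something is ready; only ready descriptors are returned */
    int nfound = pollset_poll(ps, results, 64, -1);

    pollset_destroy(ps);
    return nfound;
}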
Application Architecture
• Java
– pollset
• Traditional poll method
Application Architecture
• Java
– pollset
• pollset method
Application Architecture
• Java
– pollset
• pollcache internal
– pollcache control block
Application Architecture
• Java
– pollset
• pollset() – bulky update
Application Architecture
• Java
– pollset
    • Throughput performance of the two drivers (with poll() and with pollset())
      – The pollset driver shows a 13.3% performance gain over the poll driver
Application Architecture
• Java
– pollset
• Time spent on CPU
AIX I/O Model
• select / poll
• pollset
• event
• Real-time Signal
• AIO
• IOCP
AIX IOCP
• IOCP
– I/O completion port support was first introduced in AIX 4.3 by APAR
IY06351. An I/O completion port was originally a Windows NT scheduling
construct that has since been implemented in other OS's.
AIX IOCP
• IOCP
– Synchronous I/O versus asynchronous I/O
AIX IOCP
• IOCP
– IOCP Operation
AIX IOCP
• IOCP
  – CreateIoCompletionPort Function

    < IOCP on AIX >
    #include <iocp.h>
    int CreateIoCompletionPort (FileDescriptor, CompletionPort, CompletionKey, ConcurrentThreads)
    HANDLE FileDescriptor, CompletionPort;
    DWORD CompletionKey, ConcurrentThreads;

    < IOCP on Windows >
    HANDLE CreateIoCompletionPort (
        HANDLE    FileHandle,               // handle to file (socket)
        HANDLE    ExistingCompletionPort,   // handle to I/O completion port
        ULONG_PTR CompletionKey,            // completion key
        DWORD     NumberOfConcurrentThreads // number of threads to execute concurrently
    );
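To show how the port is consumed, here is a minimal Windows-flavored worker-thread sketch around GetQueuedCompletionStatus() (posting a NULL OVERLAPPED as a shutdown signal is an illustrative convention, not part of the API):

#include <windows.h>
#include <stdio.h>

DWORD WINAPI worker(LPVOID arg)
{
    HANDLE port = (HANDLE)arg;
    DWORD nbytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;

    for (;;) {
        /* block until some overlapped I/O on an associated handle completes */
        if (GetQueuedCompletionStatus(port, &nbytes, &key, &ov, INFINITE)) {
            if (ov == NULL)     /* our PostQueuedCompletionStatus shutdown signal */
                break;
            printf("key %lu: %lu bytes completed\n",
                   (unsigned long)key, (unsigned long)nbytes);
            /* issue the next ReadFile/WriteFile on this handle here */
        }
    }
    return 0;
}

/* creation:    HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
 * association: CreateIoCompletionPort((HANDLE)socket, port, key, 0); */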
AIX IOCP
• IOCP
– How to configure IOCP on AIX
• fileset : bos.iocp.rte
$ lslpp -l bos.iocp.rte
The output from the lslpp command should be similar to the following:
  Fileset               Level     State      Description
  ----------------------------------------------------------------------------
  Path: /usr/lib/objrepos
    bos.iocp.rte        5.3.9.0   APPLIED    I/O Completion Ports API
  Path: /etc/objrepos
    bos.iocp.rte        5.3.0.50  COMMITTED  I/O Completion Ports API

office2@root/>lsdev -Cc iocp
iocp0 Available I/O Completion Ports
office2@root/>lsattr -El iocp0
autoconfig available STATE to be configured at system restart True
AIX IOCP
• IOCP
  – How to configure IOCP on AIX
    [Screenshots: five slides stepping through the IOCP configuration dialogs on AIX]
AIX IOCP
• IOCP API
  – CreateIoCompletionPort
  – GetMultipleCompletionStatus
  – GetQueuedCompletionStatus
  – PostQueuedCompletionStatus
  – ReadFile
  – WriteFile
iSeries IOCP
• IOCP API
  – QsoStartAccept
  – QsoCreateIOCompletionPort
  – QsoDestroyIOCompletionPort
  – QsoPostIOCompletion
  – QsoStartRecv
  – QsoStartSend
  – QsoCancelOperation
  – QsoWaitForIOCompletion
Windows IOCP
• IOCP API
  – CreateIoCompletionPort
  – GetQueuedCompletionStatus
  – GetQueuedCompletionStatusEx
  – PostQueuedCompletionStatus
  – ReadFileEx
  – WriteFileEx
• Kernel Functions
  – NtCreateIoCompletion, NtRemoveIoCompletion
  – KeInitializeQueue, KeRemoveQueue
  – KeInsertQueue
  – KeWaitForSingleObject
  – KeDelayExecutionThread
  – KiActivateWaiterQueue
  – KiUnwaitThread
  – NtSetIoCompletion
Q&A