High Performance Power System
Using the High Performance Power System Efficiently
Date: 15/10/2009
DongJoon Cho ([email protected]), MTS, GTS, IBM Korea
© 2009 IBM Corporation

Agenda
• Concerns about Power System
• Summary of the solutions
• Architectures for effective computing
  – H/W Architecture
  – System Architecture
  – S/W Architecture

Concerns about Power System
• Why can't we use 100% of a high-performance server after buying it?
• The CPU clock went up, so why doesn't application performance follow?
• The clock is twice as fast, so why isn't performance twice as good?
• We doubled the memory, so why didn't utilization drop by half?
• Why does the IBM Power System score a higher tpmC than other systems?
• The IBM Power System has good response times, so why is its utilization high?
Will performance improve just by replacing the system, without changing the S/W?
What do we know about the system beyond its CPU clock?

Summary of the solutions
• Indirect methods
  – Firmware update
  – AIX update
  – Software update
• Direct methods
  – AIX configuration
  – Hardware plan/selection
  – System architecture
  – Software architecture
Most software problems are hard to solve by the direct methods because of development time and cost.

Hardware Architecture - CPU
• CISC
  – Complex Instruction Set Computer architecture
  – Designed to provide every instruction the machine might need
  – VAX, x86
• EPIC
  – Explicitly Parallel Instruction Computing architecture
  – Co-designed by HP and Intel; exposes parallelism explicitly
  – IA64
• RISC
  – Reduced Instruction Set Computer architecture
  – Trims the instruction set down to the most frequently used instructions, shortening execution time for most workloads
  – SPARC, POWER, PA-RISC

Hardware Architecture - CPU Instructions
• Computation Instructions
  Arithmetic operations          Logical operations
  ADD   Add                      AND    True if A and B are true
  SUB   Subtract                 OR     True if A or B is true
  MUL   Multiply                 NOT    True if A is false
  DIV   Divide                   XOR    True if exactly one of A and B is true
  INC   Increment                SHL    Shift bits left
  DEC   Decrement                SHR    Shift bits right
  CMP   Compare                  BSWAP  Reverse byte order
• Operand types (the same C = A + B in each style)
  Stack     Accumulator   Register (load-store)   Memory
  Push A    Ld A          Ld R1, A                Add C, B, A
  Push B    Add B         Ld R2, B
  Add       St C          Add R3, R2, R1
  Pop C                   St C, R3

Hardware Architecture - CPU Instructions
• Data Transfer Instructions
  LD     Load value from memory to a register
  ST     Store value from a register to memory
  MOV    Move value from register to register
  CMOV   Conditionally move value from register to register if a condition is met
  PUSH   Push value onto top of stack
  POP    Pop value from top of stack

Hardware Architecture - CPU Instructions
• Control Flow Instructions
  JMP    Unconditional jump to another instruction
  BR     Branch to instruction if condition is met
  CALL   Call a procedure
  RET    Return from procedure
  INT    Software interrupt
• Control Flow Relative Frequency
  Instruction     Integer programs   Floating-point programs
  Branch          75%                82%
  Jump            6%                 10%
  Call & return   19%                8%

Hardware Architecture - CPU Instructions
• Common Instructions
  Instruction   Instruction type   Percent of instructions executed
  Load          Data transfer      22%
  Branch        Control flow       20%
  Compare       Computation        16%
  Store         Data transfer      12%
  Add           Computation        8%
  And           Computation        6%
  Sub           Computation        5%
  Move          Data transfer      4%
  Call          Control flow       1%
  Return        Control flow       1%
  Total                            95%
  By instruction type overall: data transfer 38%, computation 35%, control flow 22%.

Hardware Architecture - CPU and I/O
• CPU Speed versus I/O Speeds
  – I/O is slower than the CPU, so several techniques are needed to reduce waits caused by I/O.
• Several options to overcome I/O limitations
  – Incorporate more I/O buses (parallelism)
  – Extend current I/O technology (increase bandwidth, enhance operating modes)
  – Develop new I/O technology

Hardware Architecture - CPU and I/O
• CPU Efficiency and CPU Access Costs
  – Performance degradation caused by I/O (a rough timing probe follows)
  [figure: CPU efficiency versus CPU access costs]
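The deck carries no code at this point; as a rough, machine-dependent illustration of the speed gap the slide describes, the following C probe times one small synchronous disk write against a burst of register arithmetic. The file name io_probe.tmp and the iteration count are arbitrary choices for the sketch, and absolute numbers will vary widely by machine.

```c
/* Rough sketch of the CPU/I-O speed gap: one 4 KB write-plus-fsync
 * versus ten million register additions. Illustrative only. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    char block[4096] = {0};
    struct timespec t0, t1;

    /* One small synchronous disk write. */
    int fd = open("io_probe.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0600);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    write(fd, block, sizeof block);
    fsync(fd);                      /* force the data out to the device */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double io_ms = elapsed_ms(t0, t1);

    /* Ten million additions on the CPU. */
    volatile uint64_t sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < 10000000ULL; i++)
        sum += i;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double cpu_ms = elapsed_ms(t0, t1);

    printf("4 KB write+fsync: %.3f ms, 10M adds: %.3f ms\n", io_ms, cpu_ms);
    close(fd);
    unlink("io_probe.tmp");
    return 0;
}
```

On most systems the single synchronous write costs as much as millions of arithmetic operations, which is exactly why the techniques on the following slides exist.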
Hardware Architecture - I/O
• The elements of an I/O system
  [figure: block diagram of an I/O system]

Hardware Architecture - I/O : InfiniBand
• Comparing InfiniBand to Existing Technology – Differences and Benefits
  From                        To                            Benefit
  Memory mapped               Channel based                 CPU efficiency, scalability, isolation, recovery
  Parallel bus                Switched fabric               Scalability, isolation, redundancy, reduced pin-out, modularity, higher cross-sectional bandwidth
  Shared bus access           Point to point                Greater distance, higher speeds
  Load/store                  DMA scheduling                Improved CPU efficiency
  Single open address space   Independent address domains   Protection, isolation, recovery, reliability

Hardware Architecture - I/O : InfiniBand
• Shared Bus Topology (traditional) versus Switched Fabric Topology (InfiniBand)
  [figure: shared bus architecture versus InfiniBand switched architecture]

Hardware Architecture - I/O : InfiniBand
• Accessing InfiniBand Services - The Channel Interface: Work / Completion Queue Architecture
  – Queues minimize waits and allow asynchronous processing.
  [figure: work and completion queue architecture]

Hardware Architecture - I/O : InfiniBand
• InfiniBand Queue Operations
  – Operations on the send queue fall into three subclasses.
  [figure: send and receive queue operations]

Hardware Architecture - I/O : InfiniBand
• VIA (Virtual Interface Architecture) – Messages Model
  – Direct, protected access by user-level software to the communications hardware; the protection is effected by means of the virtual memory system.
  – Send and receive packet descriptors that specify scatter-gather operations—specifying where data must be distributed to and collected up from—when sending and receiving.
  – A send message queue and a receive message queue, comprising linked lists of packet descriptors.
  – A means of notifying the network interface that packets have been placed on a queue.
  – An asynchronous notification process for the status of the operations requested (completion of a send or receive operation is signaled by writing state information into a packet descriptor).
  – Registration of memory areas used for communications: before communication starts, the memory areas for each hardware unit are identified and noted, allowing expensive operations, such as locking the pages and translating from virtual to real addresses, to be done once, outside performance-critical data transfers.
  [figure: comparison of VIA and traditional communications]

Hardware Architecture - I/O : InfiniBand
• Logical processing steps in TCP/IP
  – Checksum computation and memory management also add overhead.
  – White indicates per-message processing: the load imposed by the system call on the sockets interface, independent of the size of the message.
  – Light gray indicates per-fragment processing (a long message is broken up into several fragments): TCP, IP, media access, and interrupt handling.
  – Dark gray indicates per-byte processing (actually, per fragment plus per byte in fragment): the data-copying overhead along with computation of the checksum.
  [figure: TCP/IP processing steps shaded by cost class]

Hardware Architecture - I/O : InfiniBand
• Mechanisms to reduce the number of interrupts (see the sketch after this list)
  – Simple DMA, send: set up the DMA registers (with buffer address and size); lock the page containing the buffers and purge the corresponding addresses in the data cache; activate the send command; wait until the end of the operation; interrupt upon completion of the operation, and free (unlock) the page.
  – Improved DMA, send: refill the free buffers with data to be sent; lock the buffer page(s) and purge the corresponding addresses in the data cache; refill a descriptor with the addresses and sizes of the buffers just set up; change the descriptor status indicator to "DMA"; if the DMA was inactive, wake it up.
  – Simple DMA, receive: the DMA interrupts the processor; allocate a page and purge the cache of its addresses; set up the DMA registers (with buffer address and size); when the operation completes, the DMA raises an interrupt.
  – Improved DMA, receive: refill descriptor(s) for receiving; purge the corresponding addresses in the data cache; when a receive operation completes, the DMA sets the descriptor indicator to "System", and the OS can test the status of the different descriptors; if there are no free buffers, the DMA raises an interrupt.
  – The improved DMA scheme reduces overhead by cutting the number of interrupts.
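The slide describes the descriptor-based scheme only in prose; the following schematic C sketch shows the core idea of the "improved DMA" column: buffers are handed over in batches through a descriptor ring, so no interrupt is taken per buffer. All type and field names here are invented for illustration; real NIC descriptor layouts are device-specific.

```c
/* Schematic descriptor ring for the "improved DMA" scheme above. */
#include <stdio.h>
#include <stddef.h>

enum owner { OWNED_BY_SYSTEM, OWNED_BY_DMA };

struct descriptor {
    void      *buf;     /* buffer address (physical, in a real driver) */
    size_t     len;     /* buffer size in bytes                        */
    enum owner status;  /* who may touch this slot right now           */
};

#define RING_SIZE 64
static struct descriptor tx_ring[RING_SIZE];
static int head;

/* Queue a batch of buffers; no interrupt is taken per buffer. */
static void post_send(void *bufs[], size_t lens[], int n)
{
    for (int i = 0; i < n; i++) {
        struct descriptor *d = &tx_ring[head];
        d->buf = bufs[i];
        d->len = lens[i];
        d->status = OWNED_BY_DMA;     /* hand the slot to the device */
        head = (head + 1) % RING_SIZE;
    }
    /* A single doorbell write would wake an idle DMA engine here;
     * completion is later observed as status flipping back to SYSTEM,
     * with an interrupt raised only when the ring runs out of buffers. */
}

int main(void)
{
    char a[] = "pkt1", b[] = "pkt2";
    void *bufs[] = { a, b };
    size_t lens[] = { sizeof a, sizeof b };
    post_send(bufs, lens, 2);
    printf("%d descriptors handed to the (imaginary) DMA engine\n", head);
    return 0;
}
```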
System Architecture (Hardware)
• LPAR / DLPAR
  [figure]

System Architecture (Hardware)
• LPAR / DLPAR – Hypervisor
  [figure]

System Architecture (Hardware)
• Micro-Partitioning
  – Create up to 10 partitions per processor
  – Share resources among multiple partitions
  [figure]

System Architecture (Hardware)
• VIO
  – Part of the Advanced POWER Virtualization feature
  – Allows for sharing of physical devices, including storage and network
  – Implemented as a customized AIX-based appliance
  – Requires careful planning to maintain the VIO Server with minimal impact to VIO Clients
  – Provides command-line tools for maintenance, or can be maintained with NIM

System Architecture (System Software)
• SMT (Simultaneous Multi-Threading)
  – A capability, enabled by the enhanced hardware design of POWER5, that lets a processor execute two separate instruction streams at the same time
  – Through hardware and software thread prioritization, it raises the utilization of hardware resources without disturbing application performance
• WLM (Workload Manager)
  – Dynamically allocates system resources among running workloads without partitioning the system
  – Controls CPU resources at a finer grain by dividing CPU time rather than whole processors
  – Manages several kinds of applications with different characteristics on one server through individual control of CPU time, memory, and I/O

System Architecture (System Software)
• WPARs (Workload Partitions)
  – A workload partition (WPAR), new with the IBM AIX 6.1 operating system, expands on the traditional IBM AIX logical partitioning (LPAR) technology by further allowing AIX to be virtualized within a single operating-system image.
  – A simple definition of a WPAR is a virtualized AIX instance that runs within a single AIX operating-system image.

Software Architecture - OS
• The relationship between the OS and a network program
  – Components of a network program:
    • Socket API
    • I/O
    • A process or thread to handle multiple connections
    • IPC (Inter-Process Communication) to synchronize those processes or threads
  [figure: Socket API, process/thread, I/O, and IPC layered over the file system, memory, OS, and H/W (disk, NIC, ...)]

Software Architecture – File on Unix
• What is a file on Unix?
• Checking which files a process has open (see the example after this listing)
  – Files every process opens by default:
    • 0 : standard input
    • 1 : standard output
    • 2 : standard error
  office2@root/proc/9804/fd>ls -al
  total 120
  dr-x------    1 root system     0 Sep 27 03:22 .
  dr-xr-xr-x    1 root system     0 Sep 27 03:22 ..
  lr-xr-xr-x   24 root system  1024 Sep 22 18:48 0 -> /
  lr-xr-xr-x   24 root system  1024 Sep 22 18:48 1 -> /
  lr-xr-xr-x   24 root system  1024 Sep 22 18:48 2 -> /
  --w--w----    1 root system 12506 Sep 15 18:13 7
  --w--w----    1 root system 12506 Sep 15 18:13 8
  --w--w----    1 root system 12506 Sep 15 18:13 9
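A minimal C demonstration of the three default descriptors shown in the listing, using raw write()/read() calls instead of stdio. Not from the deck; run it as, for example, `echo hi | ./a.out`.

```c
/* Descriptors 0, 1 and 2 are already open when main() starts. */
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *out = "to standard output (fd 1)\n";
    const char *err = "to standard error  (fd 2)\n";

    write(STDOUT_FILENO, out, strlen(out));   /* fd 1 */
    write(STDERR_FILENO, err, strlen(err));   /* fd 2 */

    char buf[64];
    ssize_t n = read(STDIN_FILENO, buf, sizeof buf);  /* fd 0 */
    return n < 0 ? 1 : 0;
}
```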
Software Architecture
• Application Programs and OS – Type of Software (Conceptual Model)
  [figure]
• Application Programs and OS – Application Programs
  [figure]
• Application Programs and OS – Operating Systems
  [figure]
• Application Programs and OS – Device Drivers
  [figure]
• Application Programs and OS – AIX 5L Structure
  [figure]

Application Architecture
• Multi-Process Model
  – IPC caused by multi-processing increases kernel overhead.
  – Process: the flow of control that represents a running program, plus the system resources (memory, files, IPC, ...) created when the program is executed
  – Process creation and control:
    • fork() – creates a copy of the process: a child process that shares the parent's code
    • exec() – replaces the current process's executable image, loading and running a new program
  [figure: the init process fork()s children, which exec() new programs]

Application Architecture
• Multi-Processing Model – Socket Program
  [figure: the server does socket/bind/listen/accept and then fork(); the client does socket/connect; data moves via write/read until close]

Application Architecture
• Multi-Processing Model
  – fork() per request: a fork() occurs every time a client connects to the server.
  – Process pool: because fork() takes a long time, child processes are fork()ed in advance into a pool.
  [figure: per-request fork() versus a pre-forked process pool]

Application Architecture
• IPC (Inter Process Communication)
  – What is IPC?
    • The mechanisms processes use to share data and synchronize with each other
  – IPC types:
    • Semaphore
      – Synchronizes and protects data shared between processes
    • Shared Memory
      – Multiple processes share virtual memory; the fastest means of sharing memory
    • Message Queues
      – A queue is a first-in, first-out data structure
      – As IPC, message queues are more intuitive and simpler to use than the other sharing mechanisms
      – They are, however, fairly tricky to control

Application Architecture
• IPC
  – IPC types, continued (see the pipe example after this list):
    • Pipe
      – Used to pass one process's data to another. Data flows in only one direction (each end is read-only or write-only; you cannot read and write on the same end), and a pipe can only be used between processes that share the same parent (the same PPID).
    • FIFO (Named Pipe)
      – A first-in, first-out I/O stream like a pipe, but with a name, so it can be used between unrelated processes; created with mknod.
    • UDS (Unix Domain Socket)
      – Usable through the socket API without modification; unlike the port-based Internet domain socket, it uses the local file system for communication between processes on the same machine.
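A minimal pipe example matching the description above: data flows one way, between processes that share an ancestor (here, parent to child). Not from the deck; standard POSIX calls only.

```c
/* One-directional parent -> child pipe. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    if (pipe(fds) < 0) return 1;   /* fds[0] = read end, fds[1] = write end */

    pid_t pid = fork();
    if (pid == 0) {                /* child: read side */
        close(fds[1]);
        char buf[64];
        ssize_t n = read(fds[0], buf, sizeof buf - 1);
        if (n > 0) { buf[n] = '\0'; printf("child got: %s", buf); }
        _exit(0);
    }
    close(fds[0]);                 /* parent: write side */
    const char *msg = "hello through the pipe\n";
    write(fds[1], msg, strlen(msg));
    close(fds[1]);
    wait(NULL);
    return 0;
}
```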
Application Architecture
• IPC
  – IPC commands
    Function     Message queue    Semaphore   Shared memory
    Allocation   msgget           semget      shmget
    Control      msgctl           semctl      shmctl
    Operation    msgsnd, msgrcv   semop       shmat, shmdt
                 (send/receive)   (state change)   (attach/detach)
  – ipcs command
    • ipcs -m (shared memory)
    • ipcs -q (message queues)
    • ipcs -s (semaphores)
  – ipcrm command
    • Removes semaphores, message queues, and shared memory segments from the system

Application Architecture
• IPC – IPC Limits: Semaphores
  Limit                                             4.3.0   4.3.1   4.3.2    5.1      5.2      5.3
  Maximum number of semaphore IDs (32-bit kernel)   4096    4096    131072   131072   131072   131072
  Maximum number of semaphore IDs (64-bit kernel)   4096    4096    131072   131072   131072   1048576
  Maximum semaphores per semaphore ID               65535   65535   65535    65535    65535    65535
  Maximum operations per semop call                 1024    1024    1024     1024     1024     1024
  Maximum undo entries per process                  1024    1024    1024     1024     1024     1024
  Size in bytes of undo structure                   8208    8208    8208     8208     8208     8208
  Semaphore maximum value                           32767   32767   32767    32767    32767    32767
  Adjust on exit maximum value                      16384   16384   16384    16384    16384    16384

Application Architecture
• IPC – IPC Limits: Message Queues
  Limit                                                  4.3.0    4.3.1    4.3.2    5.1      5.2      5.3
  Maximum message size                                   4 MB     4 MB     4 MB     4 MB     4 MB     4 MB
  Maximum bytes on queue                                 4 MB     4 MB     4 MB     4 MB     4 MB     4 MB
  Maximum number of message queue IDs (32-bit kernel)    4096     4096     131072   131072   131072   131072
  Maximum number of message queue IDs (64-bit kernel)    4096     4096     131072   131072   131072   1048576
  Maximum messages per queue ID                          524288   524288   524288   524288   524288   524288

Application Architecture
• IPC – IPC Limits: Shared Memory
  Limit                                                   4.3.0       4.3.1       4.3.2       5.1         5.2         5.3
  Maximum segment size (32-bit process)                   256 MB      2 GB        2 GB        2 GB        2 GB        2 GB
  Maximum segment size (64-bit process, 32-bit kernel)    256 MB      2 GB        2 GB        64 GB       1 TB        1 TB
  Maximum segment size (64-bit process, 64-bit kernel)    256 MB      2 GB        2 GB        64 GB       1 TB        32 TB
  Minimum segment size                                    1           1           1           1           1           1
  Maximum number of shared memory IDs (32-bit kernel)     4096        4096        131072      131072      131072      131072
  Maximum number of shared memory IDs (64-bit kernel)     4096        4096        131072      131072      131072      1048576
  Maximum segments per process (32-bit process)           11          11          11          11          11          11
  Maximum segments per process (64-bit process)           268435456   268435456   268435456   268435456   268435456   268435456

Application Architecture
• IPC – IPC tunable parameters: message queues (a usage example follows this list)
  For every tunable below, Display, Change, and Diagnosis are N/A, and no tuning is required because the kernel adjusts each value dynamically as needed.
  – msgmax: maximum message size. Dynamic, with a maximum value of 4 MB.
  – msgmnb: maximum number of bytes on a queue. Dynamic, with a maximum value of 4 MB.
  – msgmni: maximum number of message queue IDs. Dynamic, with a maximum value of 131072.
  – msgmnm: maximum number of messages per queue. Dynamic, with a maximum value of 524288.
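A small System V message queue round trip using the calls from the command table above (msgget / msgsnd / msgrcv / msgctl). Single process and a private queue, purely for illustration; not from the deck.

```c
/* Allocate, operate on, and remove a System V message queue. */
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct msg { long mtype; char mtext[64]; };

int main(void)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);  /* allocation */
    if (qid < 0) return 1;

    struct msg m = { .mtype = 1 };
    strcpy(m.mtext, "queued hello");
    msgsnd(qid, &m, sizeof m.mtext, 0);               /* operation: send */

    struct msg r;
    msgrcv(qid, &r, sizeof r.mtext, 1, 0);            /* operation: receive */
    printf("received: %s\n", r.mtext);

    msgctl(qid, IPC_RMID, NULL);                      /* control: remove */
    return 0;
}
```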
Application Architecture
• IPC – IPC tunable parameters: semaphores and shared memory
  As above, these are dynamically adjusted by the kernel and require no tuning; Display, Change, and Diagnosis are N/A.
  – semaem: maximum value for adjustment on exit. Dynamic, with a maximum value of 16384.
  – semmni: maximum number of semaphore IDs. Dynamic, with a maximum value of 131072.
  – semmsl: maximum number of semaphores per ID. Dynamic, with a maximum value of 65535.
  – semopm: maximum number of operations per semop() call. Dynamic, with a maximum value of 1024.
  – semume: maximum number of undo entries per process. Dynamic, with a maximum value of 1024.
  – semvmx: maximum value of a semaphore. Dynamic, with a maximum value of 32767.
  – shmmax: maximum shared memory segment size. Dynamic, with a maximum value of 256 MB for 32-bit processes and 0x80000000u for 64-bit.
  – shmmin: minimum shared memory segment size. Dynamic, with a minimum value of 1.
  – shmmni: maximum number of shared memory IDs. Dynamic, with a maximum value of 131072.

Application Architecture
• Multi-Thread Model
  – Thread: a flow of control that exists within a process
  – Socket Program
  [figure: the server does socket/bind/listen/accept and then pthread_create(); the client does socket/connect; data moves via write/read until close]

Application Architecture
• Multi-Thread Model (a thread-per-connection sketch follows)
  – pthread_create() per request: a pthread_create() occurs on every connection, but it is much lighter than fork().
  – Thread pool: although threads are lighter than fork(), a pool is used to eliminate even the thread-creation time.
  [figure: per-request pthread_create() versus a pre-created thread pool]
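A sketch of the thread-per-connection flow in the diagram above: accept() in the main thread, then pthread_create() per client. Not from the deck; port 9000 is an arbitrary choice and error handling is trimmed for brevity. Build with -lpthread.

```c
/* Thread-per-connection echo server skeleton. */
#include <netinet/in.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

static void *serve(void *arg)
{
    int c = (int)(long)arg;
    char buf[256];
    ssize_t n;
    while ((n = read(c, buf, sizeof buf)) > 0)   /* echo back */
        write(c, buf, n);
    close(c);
    return NULL;
}

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in a = { .sin_family = AF_INET,
                             .sin_port = htons(9000),   /* arbitrary port */
                             .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(s, (struct sockaddr *)&a, sizeof a);
    listen(s, 16);
    for (;;) {
        int c = accept(s, NULL, NULL);
        pthread_t t;
        pthread_create(&t, NULL, serve, (void *)(long)c);
        pthread_detach(t);   /* lighter than fork(); no child to reap */
    }
}
```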
Application Architecture
• N:N DB Connection (Multi-Process Model)
  – DB connections are made n:n, but Oracle's fork() per connection wastes system resources.
  – The biggest loads in a DB query:
    1. The DB connect (from the network)
    2. Parsing the DB query
  [figure: n:n connections between application child processes and Oracle child processes]

Application Architecture
• 1:1 DB Connection (Multi-Process Model)
  – With a 1:1 DB connection, Oracle's fork() is limited to a single call, so little system resource is wasted, but client connections may not flow smoothly.
  [figure: application child processes funneled through one 1:1 connection to a single Oracle child process]

Application Architecture
• DB Connection Pool (Multi-Process Model)
  – Thread pool or process pool: requests are handled over connections established in advance inside the pool; the pool lends out its resources, and when they run short, the pool's resources can be allocated flexibly.
  [figure: application child processes share a thread-based connection pool with n:n connections to Oracle child processes]

Application Architecture
• DB Connection Pool (Multi-Thread Model)
  – Pre-process model (process pool) and pre-thread model (thread pool)
  – Multi-threading model (steps ①, ⑤)
  – Thread pool model for DB connections (steps ①, ②, ③, ⑥)
  [figure: client apps, server threads, connection pool, and Oracle child processes]

Application Architecture
• I/O Multiplexing Model
  – Instead of each socket communicating through its own socket I/O, many sockets communicate through a single I/O path: sockets are registered in the file descriptor table, and the I/O on that table is monitored to handle multiple connections.
  – select / poll
  [figure: descriptors are designated on connect, monitored while data is exchanged, and released on close]

Application Architecture
• I/O Multiplexing Model
  – Drawback
    • With select / poll, you must loop over the whole file descriptor array to find out which descriptor actually raised an event.
  [figure: every registered descriptor in the table must be examined]

Application Architecture
• Event-based I/O Model through Real-time Signals
  – Event-based socket handling:
    • UNIX/Linux: POSIX real-time signals, epoll
    • Windows, AIX, iSeries OS: IOCP
    • FreeBSD: kqueue (kernel queue)

Application Architecture
• Event-based I/O Model through Real-time Signals
  – Real-time signals
    • Make up for the weaknesses of ordinary signals: they have no queue, and therefore deliver no information.
    • Real-time signals are queued, and as many events as the queue can hold are stored, so signal loss can be avoided.
    • A real-time signal can also carry information such as the descriptor of the socket that raised it, so additional context can be delivered.
    • There is no need to walk the descriptor array of the file descriptor table as with select / poll.
  [figure: sockets raise SIGRTMIN+1..3, handled by a thread pool combining real-time signals with threads]

Application Architecture
• epoll (a minimal epoll loop follows)
  – epoll: event poll
  – Roughly 10% to 20% faster than real-time signals
  – Supported on HP-UX and Red Hat; not supported on AIX
  – Because sockets are managed inside the event poll, the relevant information (such as the descriptor) is returned when a read/write event occurs; there is no need to scan in a loop as with poll.
  [figure: sockets registered with the event poll by file descriptor]
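A minimal Linux epoll loop corresponding to the description above: the kernel returns only the descriptors that have events, so there is no scan over the whole descriptor array as with select()/poll(). Not from the deck, and Linux-only; as the slide notes, AIX offers pollset and IOCP instead.

```c
/* Wait for readiness on stdin via epoll; only ready fds come back. */
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void)
{
    int ep = epoll_create(64);   /* size argument is a historical hint */

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = STDIN_FILENO };
    epoll_ctl(ep, EPOLL_CTL_ADD, STDIN_FILENO, &ev);

    struct epoll_event ready[16];
    int n = epoll_wait(ep, ready, 16, 5000);   /* block up to 5 seconds */
    for (int i = 0; i < n; i++)                /* no full-table scan needed */
        printf("fd %d is readable\n", ready[i].data.fd);

    close(ep);
    return 0;
}
```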
Application Architecture
• epoll – httpd test results
  [figures: dphttpd symmetric multiprocessor result; dphttpd uniprocessor result]

Application Architecture
• epoll – Pipetest
  [figures: pipetest symmetric multiprocessor result; pipetest uniprocessor result]

Application Architecture
• epoll – Dead connection test
  [figures: dead connection test results with 128-byte and 1024-byte contexts]

Application Architecture
• IOCP (I/O Completion Ports)
  – IOCP on iSeries
    • Supported since the AS/400; the i line began in 1988 as the AS/400 and evolved through AS/400, OS/400, i5/OS, and i6/OS.
    • AS/400 QMU 5.0.1.02 introduces asynchronous I/O completion ports (IOCP).
  – IOCP on Windows NT
    • Supported since Windows NT Winsock2.
  – IOCP on AIX
    • I/O completion port support was first introduced in AIX 4.3 by APAR IY06351. An I/O completion port was originally a Windows NT scheduling construct that has since been implemented in other OSes. Domino uses these constructs to improve the scalability of the server: one thread can handle multiple session requests, so a Notes client session is no longer bound to a single thread for its duration. The completion port is tied directly to a device handle and any network I/O requests that are made to that handle.

Application Architecture
• Parallel Programming
  – Fundamentals of parallel programming:
    • Multi-process / multi-thread
    • Asynchronous procedure calls
    • Signals, event queuing
    • IOCP
  – Example: a file-finder agent

Application Architecture
• Parallel Programming – OpenMP (Open Multi-Processing)
  – An Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism
  – Comprised of three primary API components:
    • Compiler directives
    • Runtime library routines
    • Environment variables
  – Portable
  – Standardized

Application Architecture
• Parallel Programming - MPI
  – MPI (Message Passing Interface)
    • A standard data communication library for message-passing parallel programming
    • References
      – http://www.mcs.anl.gov/mpi/index.html
      – http://www.mpi-forum.org/docs/docs.html
  – MPI goals (a minimal MPI example follows)
    • Portability
    • Efficiency
    • Functionality
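A minimal MPI point-to-point exchange, not from the deck: rank 0 sends, rank 1 receives. The envelope pieces discussed on the next slides (source, tag, communicator) all appear explicitly in the call arguments. Typically built with mpicc and run with something like `mpirun -np 2 ./a.out`.

```c
/* Rank 0 sends one int to rank 1 inside MPI_COMM_WORLD. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* my rank in this communicator */

    int tag = 7;                           /* tag used for message matching */
    if (rank == 0) {
        int payload = 42;                  /* the data part of the message */
        MPI_Send(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```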
Application Architecture
• Parallel Programming – MPI basic concepts
  – Work is allocated per process
  – Processor : Process = 1:1 or 1:N
  – Message = data + envelope
    • Which process is sending?
    • Where is the data that is being sent, and what data is it?
    • How much is being sent?
    • Which process receives it, where is it stored, and how much should the receiver be prepared for?
  – Tag
    • Used for message matching and distinction
    • Lets message arrival be handled in order
    • Wildcards can be used
  – Communicator
    • The set of processes allowed to communicate with one another

Application Architecture
• Parallel Programming – MPI basic concepts
  – Process rank
    • An identifier that distinguishes the processes within the same communicator
  – Point-to-point communication
    • Communication between two processes: one sending process matched with one receiving process
  – Collective communication
    • Several processes participate at once; 1:N, N:1, and N:N patterns are possible
    • Replaces several point-to-point communications with a single collective communication, which is less error-prone and faster thanks to optimization

Application Architecture
• Java
  – Development and execution of Java applications
  [figure]

Application Architecture
• Java
  – Ways to use the system efficiently from a Java application (these reliably help):
    • NIO (New I/O)
    • NIO pollset
    • Let the garbage collector do its collecting automatically
    • Keep the JRE updated to the latest release unless there is a specific reason not to
    • Keep the source code current during development (where possible, replace APIs marked Deprecated with other APIs)
    • If you use a framework, keep the framework up to date
  – Note: the JRE or framework itself may still prevent an improvement.

Application Architecture
• Java – pollset
  – Java source code
    DatagramChannel channel = DatagramChannel.open();
    channel.configureBlocking(false);
    Selector selector = Selector.open();
    channel.register(selector, SelectionKey.OP_READ);
  – Underlying poll system call
    int poll(struct pollfd fds[], nfds_t nfds, int timeout);
  – Native pollset interface (C source code)
    pollset_t ps = pollset_create(int maxfd);
    int rc = pollset_destroy(pollset_t ps);
    int rc = pollset_ctl(pollset_t ps, struct poll_ctl *pollctl_array, int array_length);
    int nfound = pollset_poll(pollset_t ps, struct pollfd *polldata_array, int array_length, int timeout);

Application Architecture
• Java – pollset
  – Traditional poll method
  [figure]
• Java – pollset
  – pollset method
  [figure]
• Java – pollset
  – pollcache internals: the pollcache control block
  [figure]
• Java – pollset
  – pollset(): bulk update
  [figure]

Application Architecture
• Java – pollset
  – Throughput of the two drivers (one with poll(), one with pollset())
    • The pollset driver performs 13.3% better than the poll driver.
  [figure]
• Java – pollset
  – Time spent on CPU
  [figure]

AIX I/O Model (a pollset sketch follows)
• select / poll
• pollset
• event
• Real-time Signal
• AIO
• IOCP
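A sketch built from the native pollset calls listed earlier (pollset_create / pollset_ctl / pollset_poll / pollset_destroy). AIX-specific; the header name, the PS_ADD command, and the poll_ctl field usage follow my reading of the AIX pollset interface and should be checked against the local documentation.

```c
/* Register stdin with a pollset once, then wait; like epoll, only the
 * ready descriptors are returned. AIX only. */
#include <stdio.h>
#include <sys/poll.h>
#include <sys/pollset.h>
#include <unistd.h>

int main(void)
{
    pollset_t ps = pollset_create(64);   /* cap of 64 fds, an arbitrary choice */

    struct poll_ctl ctl = { .cmd = PS_ADD, .events = POLLIN, .fd = 0 };
    pollset_ctl(ps, &ctl, 1);            /* register stdin a single time */

    struct pollfd ready[8];
    int nfound = pollset_poll(ps, ready, 8, 5000);   /* block up to 5 s */
    for (int i = 0; i < nfound; i++)     /* only ready fds are reported */
        printf("fd %d events 0x%x\n", ready[i].fd, ready[i].revents);

    pollset_destroy(ps);
    return 0;
}
```

Unlike poll(), the registration cost is paid once rather than on every call, which is where the throughput gain on the previous slide comes from.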
AIX IOCP
• IOCP
  – I/O completion port support was first introduced in AIX 4.3 by APAR IY06351. An I/O completion port was originally a Windows NT scheduling construct that has since been implemented in other OSes.

AIX IOCP
• IOCP – Synchronous I/O versus asynchronous I/O
  [figure]

AIX IOCP
• IOCP – IOCP operation
  [figure]

AIX IOCP
• IOCP – The CreateIoCompletionPort function

  < IOCP on AIX >
  #include <iocp.h>
  int CreateIoCompletionPort (FileDescriptor, CompletionPort, CompletionKey, ConcurrentThreads)
  HANDLE FileDescriptor, CompletionPort;
  DWORD CompletionKey, ConcurrentThreads;

  < IOCP on Windows >
  HANDLE CreateIoCompletionPort (
      HANDLE FileHandle,                // handle to file (socket)
      HANDLE ExistingCompletionPort,    // handle to I/O completion port
      ULONG_PTR CompletionKey,          // completion key
      DWORD NumberOfConcurrentThreads   // number of threads to execute concurrently
  );

AIX IOCP
• IOCP – How to configure IOCP on AIX
  – Fileset: bos.iocp.rte

  $ lslpp -l bos.iocp.rte

  The output from the lslpp command should be similar to the following:

    Fileset            Level      State      Description
    ----------------------------------------------------------------
    Path: /usr/lib/objrepos
      bos.iocp.rte     5.3.9.0    APPLIED    I/O Completion Ports API
    Path: /etc/objrepos
      bos.iocp.rte     5.3.0.50   COMMITTED  I/O Completion Ports API

  office2@root/>lsdev -Cc iocp
  iocp0 Available  I/O Completion Ports
  office2@root/>lsattr -El iocp0
  autoconfig available STATE to be configured at system restart True

AIX IOCP
• IOCP – How to configure IOCP on AIX
  [figures: the remaining configuration steps, shown as screenshots in the original deck]

AIX IOCP
• IOCP API
  – CreateCompletionPort
  – GetMultipleCompletionStatus
  – GetQueuedCompletionStatus
  – PostQueuedCompletionStatus
  – ReadFile
  – WriteFile

iSeries IOCP
• IOCP API
  – QsoStartAccept
  – QsoCreateIOCompletionPort
  – QsoDestroyIOCompletionPort
  – QsoPostIOCompletion
  – QsoStartRecv
  – QsoStartSend
  – QsoCancelOperation
  – QsoWaitForIOCompletion

Windows IOCP
• IOCP API
  – CreateIoCompletionPort
  – GetQueuedCompletionStatus
  – GetQueuedCompletionStatusEx
  – PostQueuedCompletionStatus
  – ReadFileEx
  – WriteFileEx
• Kernel Functions
  – NtCreateIoCompletion, NtRemoveIoCompletion
  – KeInitializeQueue, KeRemoveQueue
  – KeInsertQueue
  – KeWaitForSingleObject
  – KeDelayExecutionThread
  – KiActivateWaiterQueue
  – KiUnwaitThread
  – NtSetIoCompletion

Q&A

© 2009 IBM Corporation