Strassen's Matrix Multiplication - Computer Science and Engineering


Course Project On :
Strassen's Matrix Multiplication
Under The Guidance Of:
Prof. Subodh Kumar
Presented By:
Gaurav Jain
Lalchand
Basic Matrix Multiplication
Suppose we want to multiply two matrices of size N x N, for example A x B = C.
For N = 2, the entries of C are:
C11 = a11b11 + a12b21
C12 = a11b12 + a12b22
C21 = a21b11 + a22b21
C22 = a21b12 + a22b22
A 2 x 2 matrix multiplication can be accomplished in 8 multiplications (2^(log2 8) = 2^3).
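The count of 8 multiplications can be checked with a minimal sketch of the basic algorithm. The function name `naive_matmul` and the sample matrices are illustrative assumptions, not part of the project code.

```python
def naive_matmul(a, b):
    """Multiply two square matrices (lists of lists).

    Returns the product and the number of scalar multiplications used.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


# Hypothetical sample input, chosen only for illustration.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, mults = naive_matmul(a, b)
print(c, mults)  # [[19, 22], [43, 50]] 8
```

For N x N inputs the triple loop performs N^3 multiplications, which is the 2^3 = 8 seen here for N = 2.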
Strassen's Matrix Multiplication
P1 = (A11 + A22) * (B11 + B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
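The seven products and four combinations above can be applied recursively to the blocks of larger matrices. The following is a minimal pure-Python sketch, assuming square inputs whose size is a power of two. The helper names (`add`, `sub`, `split`, `strassen`) are hypothetical, and it is not the authors' CUDA implementation.

```python
def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    """Element-wise difference of two equally sized matrices."""
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split a matrix into its four quadrants: M11, M12, M21, M22."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b):
    """Multiply square matrices whose size is a power of two (assumption)."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven recursive products instead of eight.
    p1 = strassen(add(a11, a22), add(b11, b22))
    p2 = strassen(add(a21, a22), b11)
    p3 = strassen(a11, sub(b12, b22))
    p4 = strassen(a22, sub(b21, b11))
    p5 = strassen(add(a11, a12), b22)
    p6 = strassen(sub(a21, a11), add(b11, b12))
    p7 = strassen(sub(a12, a22), add(b21, b22))
    # Combine the products into the four blocks of C.
    c11 = add(sub(add(p1, p4), p5), p7)   # P1 + P4 - P5 + P7
    c12 = add(p3, p5)                     # P3 + P5
    c21 = add(p2, p4)                     # P2 + P4
    c22 = add(add(sub(p1, p2), p3), p6)   # P1 + P3 - P2 + P6
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Practical implementations usually stop the recursion at some block size and switch to the basic algorithm; recursing down to 1 x 1 here keeps the sketch short.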
Strassen's Matrix Multiplication
Ref: Accelerating High Performance Applications with CUDA and MPI
Why MPI + CUDA?
➢ The equations are naturally suited to the CUDA environment
➢ Limitation of CUDA: no inter-GPU communication
➢ MPI: data distribution mechanism
➢ CUDA: main execution engine
MPI + CUDA
Steps Performed
➢ Divide each input matrix into four equal parts
➢ Send the appropriate parts to the corresponding process
➢ Each process computes its corresponding equation
    ➢ Each node contains a GPU
    ➢ Each process uses kernels on its own GPU to compute the result
➢ Each process sends its result to the head process of the equation
➢ All heads collect the data
➢ Each head computes its C equation
➢ All heads send their partial results to the master node
➢ The master combines and displays the result
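The steps above can be sketched as a single-process simulation. This sketch assumes nothing about the authors' MPI or CUDA code: the "processes" are plain loop iterations, `matmul` is a stand-in for the per-node GPU kernel, the grouping of products per process follows the Step 2 slide, and all names are hypothetical.

```python
def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def matmul(x, y):
    """Stand-in for the GPU kernel each node would run (assumption)."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m):
    """Divide a matrix into four equal parts, keyed 11, 12, 21, 22."""
    h = len(m) // 2
    return {11: [r[:h] for r in m[:h]], 12: [r[h:] for r in m[:h]],
            21: [r[:h] for r in m[h:]], 22: [r[h:] for r in m[h:]]}


# Each product as a function of the blocks of A and B.
PRODUCTS = {
    "P1": lambda A, B: matmul(add(A[11], A[22]), add(B[11], B[22])),
    "P2": lambda A, B: matmul(add(A[21], A[22]), B[11]),
    "P3": lambda A, B: matmul(A[11], sub(B[12], B[22])),
    "P4": lambda A, B: matmul(A[22], sub(B[21], B[11])),
    "P5": lambda A, B: matmul(add(A[11], A[12]), B[22]),
    "P6": lambda A, B: matmul(sub(A[21], A[11]), add(B[11], B[12])),
    "P7": lambda A, B: matmul(sub(A[12], A[22]), add(B[21], B[22])),
}

# Products assigned to each simulated process (grouping from the Step 2 slide).
WORKERS = [("P1", "P5"), ("P2", "P6"), ("P3", "P7"), ("P4",)]


def simulated_multiply(a, b):
    A, B = split(a), split(b)              # divide the inputs
    P = {}
    for assigned in WORKERS:               # each process computes its equations
        for name in assigned:
            P[name] = PRODUCTS[name](A, B)
    # Heads compute the C equations from the collected products.
    c11 = add(sub(add(P["P1"], P["P4"]), P["P5"]), P["P7"])
    c12 = add(P["P3"], P["P5"])
    c21 = add(P["P2"], P["P4"])
    c22 = add(add(sub(P["P1"], P["P2"]), P["P3"]), P["P6"])
    # The master combines the four blocks into the final result.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(simulated_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In a real MPI + CUDA setup, the loop over `WORKERS` would be replaced by separate MPI ranks, and `matmul`, `add`, and `sub` by GPU kernels. The control flow of dividing, computing, collecting, and combining stays the same.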
Detailed Description – Step 1
P1 = (A11 + A22) * (B11 + B22)    P5 = (A11 + A12) * B22
P2 = (A21 + A22) * B11            P6 = (A21 - A11) * (B11 + B12)
P3 = A11 * (B12 - B22)            P7 = (A12 - A22) * (B21 + B22)
P4 = A22 * (B21 - B11)
Detailed Description – Step 2
P1, P5
P2, P6
P3, P7
P4
Detailed Description – Step 3
P1, P5
P2, P6
P3, P7
P4
Declare Result
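The grouping in Steps 2 and 3, together with the C equations, determines which process must supply products for each block of C. The sketch below derives that mapping. The process numbers 0 to 3 are hypothetical labels, not ranks taken from the project.

```python
# Products held by each process, following the grouping shown in Step 2.
# The process numbers 0..3 are illustrative labels (assumption).
WORKER_PRODUCTS = {
    0: {"P1", "P5"},
    1: {"P2", "P6"},
    2: {"P3", "P7"},
    3: {"P4"},
}

# Products needed by each block of C, taken from the C equations.
C_NEEDS = {
    "C11": {"P1", "P4", "P5", "P7"},
    "C12": {"P3", "P5"},
    "C21": {"P2", "P4"},
    "C22": {"P1", "P2", "P3", "P6"},
}


def senders(block):
    """Map each process to the products it must contribute to `block`."""
    needed = C_NEEDS[block]
    return {w: sorted(held & needed)
            for w, held in WORKER_PRODUCTS.items() if held & needed}


for block in sorted(C_NEEDS):
    print(block, senders(block))
# C11 {0: ['P1', 'P5'], 2: ['P7'], 3: ['P4']}
# C12 {0: ['P5'], 2: ['P3']}
# C21 {1: ['P2'], 3: ['P4']}
# C22 {0: ['P1'], 1: ['P2', 'P6'], 2: ['P3']}
```

Under this grouping, every block of C needs products from at least two processes, which is why a collection step is required before the C equations can be evaluated.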
Experimental Result - 1
Experimental Result - 2
Experimental Result - 3
References:
Accelerating High Performance Applications with CUDA and MPI: N. P. Karunadasa & D. N. Ranasinghe
Strassen's Matrix Multiplication on GPUs: Junjie Li, Sanjay Ranka
Thanks