Strassen's Matrix Multiplication - Computer Science and Engineering
Course Project On :
Strassen's Matrix Multiplication
Under The Guidance Of:
Prof. Subodh Kumar
Presented By:
Gaurav Jain
Lalchand
Basic Matrix Multiplication
Suppose we want to multiply two matrices of size N x N, for example A x B = C.
For N = 2:
C11 = a11b11 + a12b21
C12 = a11b12 + a12b22
C21 = a21b11 + a22b21
C22 = a21b12 + a22b22
A 2x2 matrix multiplication can be accomplished in 8 multiplications (2^(log2 8) = 2^3 = 8).
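As a minimal sketch of the basic method (plain Python, not the CUDA code used in the project; the function name `matmul_naive` is an assumption made for illustration), the triple loop below also counts scalar multiplications, so the figure of 8 for the 2x2 case can be checked directly:

```python
def matmul_naive(a, b):
    """Multiply two N x N matrices given as lists of rows.

    Returns the product and the number of scalar multiplications used.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, mults = matmul_naive(a, b)
print(c)      # [[19, 22], [43, 50]]
print(mults)  # 8
```

For general N the loop performs N^3 multiplications, which is the cost Strassen's method reduces.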
Strassen's Matrix Multiplication
P1 = (A11+ A22)(B11+B22)
P2 = (A21 + A22) * B11
P3 = A11 * (B12 - B22)
P4 = A22 * (B21 - B11)
P5 = (A11 + A12) * B22
P6 = (A21 - A11) * (B11 + B12)
P7 = (A12 - A22) * (B21 + B22)
C11 = P1 + P4 - P5 + P7
C12 = P3 + P5
C21 = P2 + P4
C22 = P1 + P3 - P2 + P6
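The seven products and four combinations above can be sketched as a recursive routine. This is an illustrative single-process version in plain Python, assuming square matrices whose size is a power of two; the helper names (`add`, `sub`, `split`, `strassen`) are assumptions, and it is not the MPI + CUDA implementation described in this presentation.

```python
def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    """Element-wise difference of two equally sized matrices."""
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split a matrix into its four quadrants: M11, M12, M21, M22."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b):
    """Multiply two N x N matrices (N a power of two) with 7 recursive products."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # The seven products P1..P7, exactly as listed on the slide.
    p1 = strassen(add(a11, a22), add(b11, b22))
    p2 = strassen(add(a21, a22), b11)
    p3 = strassen(a11, sub(b12, b22))
    p4 = strassen(a22, sub(b21, b11))
    p5 = strassen(add(a11, a12), b22)
    p6 = strassen(sub(a21, a11), add(b11, b12))
    p7 = strassen(sub(a12, a22), add(b21, b22))
    # Combine the products into the four quadrants of C.
    c11 = add(sub(add(p1, p4), p5), p7)
    c12 = add(p3, p5)
    c21 = add(p2, p4)
    c22 = add(add(sub(p1, p2), p3), p6)
    # Stitch the quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each level of recursion performs 7 half-size multiplications instead of 8, at the price of extra additions and subtractions.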
Strassen's Matrix Multiplication
Ref: Accelerating High Performance Applications with CUDA and MPI
Why MPI + CUDA?
➢ The equations are naturally suited to the CUDA environment
➢ Limitation of CUDA: no inter-GPU communication
➢ MPI: data distribution mechanism
➢ CUDA: main execution engine
MPI + CUDA
Steps Performed
➢ Divide the input matrix into four equal parts
➢ Send the appropriate parts to the corresponding process
➢ Each process computes its assigned equation; each node contains a GPU and uses kernels on its own GPU to compute the result
➢ Each process sends its result to the head process of the corresponding equation
➢ All heads collect the data
➢ Each head computes its C equation
➢ All heads send their partial results to the master node
➢ The master combines and displays the result
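The steps above can be sketched as a single-process simulation. This is only an illustration under stated assumptions: there is no real MPI or CUDA here, the "processes" are dictionary keys, `mul` stands in for the GPU kernel, and the names `ASSIGNMENT`, `product`, and `run` are hypothetical. The grouping of products per process (P1 with P5, P2 with P6, P3 with P7, P4 alone) follows the Detailed Description slides; the input size is assumed to be even.

```python
def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def mul(x, y):
    """Plain block product; a stand-in for the per-node GPU kernel."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m):
    """Divide a matrix into four equal parts: M11, M12, M21, M22."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


# Assumed mapping of simulated process rank to the products it computes.
ASSIGNMENT = {0: ["P1", "P5"], 1: ["P2", "P6"], 2: ["P3", "P7"], 3: ["P4"]}


def product(name, a_parts, b_parts):
    """Compute one of the seven products from the quadrants it needs."""
    a11, a12, a21, a22 = a_parts
    b11, b12, b21, b22 = b_parts
    formulas = {
        "P1": lambda: mul(add(a11, a22), add(b11, b22)),
        "P2": lambda: mul(add(a21, a22), b11),
        "P3": lambda: mul(a11, sub(b12, b22)),
        "P4": lambda: mul(a22, sub(b21, b11)),
        "P5": lambda: mul(add(a11, a12), b22),
        "P6": lambda: mul(sub(a21, a11), add(b11, b12)),
        "P7": lambda: mul(sub(a12, a22), add(b21, b22)),
    }
    return formulas[name]()


def run(a, b):
    a_parts, b_parts = split(a), split(b)      # divide the input matrices
    p = {}
    for rank, names in ASSIGNMENT.items():     # each "process" computes its products
        for name in names:
            p[name] = product(name, a_parts, b_parts)
    # "Heads" compute the C equations from the collected products.
    c11 = add(sub(add(p["P1"], p["P4"]), p["P5"]), p["P7"])
    c12 = add(p["P3"], p["P5"])
    c21 = add(p["P2"], p["P4"])
    c22 = add(add(sub(p["P1"], p["P2"]), p["P3"]), p["P6"])
    # "Master" combines the four partial results into C.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


print(run([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In a real deployment the loop over `ASSIGNMENT` would be replaced by message passing between ranks, and `mul` by a kernel launch on each node's GPU.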
Detailed Description – Step 1
P1 = (A11+ A22)(B11+B22)
P5 = (A11 + A12) * B22
P2 = (A21 + A22) * B11
P6 = (A21 - A11) * (B11 + B12)
P3 = A11 * (B12 - B22)
P7 = (A12 - A22) * (B21 + B22)
P4 = A22 * (B21 - B11)
Detailed Description – Step 2
P1, P5
P2, P6
P3, P7
P4
Detailed Description – Step 3
P1, P5
P2, P6
P3, P7
P4
Declare Result
Experimental Result - 1
Experimental Result - 2
Experimental Result - 3
References:
N. P. Karunadasa and D. N. Ranasinghe, Accelerating High Performance Applications with CUDA and MPI
Junjie Li and Sanjay Ranka, Strassen's Matrix Multiplication on GPUs
Thanks