Data Structures in Java for Matrix Computations



Geir Gundersen, Department of Informatics, University of Bergen, Norway

Joint work with Trond Steihaug

Overview

We will show how to utilize Java’s native arrays for matrix computations.

- How to use Java arrays as a 2D array for efficient dense matrix computation.

- How to create an efficient sparse matrix data structure using Java arrays.

- Object-oriented programming has been favored in the last decade(s):
  - Easy-to-understand paradigm.
  - Straightforward to build large-scale applications.

- Java will be used for (limited) numerical computations.

- Java is already introduced as the programming language in introductory courses in scientific computation.

- Java's impact on computing will push new fields to use it.

A “mathematical” 2D array

A 2D Java Array

- Array elements that refer to other arrays create a multidimensional array.

A true 2D Java Array

Java Arrays

- Java arrays are true objects.

- Thus creating an array is object creation.

- The objects of an array of objects are not necessarily stored contiguously.

- An array of objects stores references to the actual objects.

- The primitive elements of an array are most likely stored contiguously.

- An array of primitive elements holds the actual values of those elements.
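A small sketch (class and variable names are ours, not from the talk) of what this layout implies in practice: the rows of a 2D Java array are independent objects that can be aliased, replaced, or given different lengths:

```java
// A "2D" Java array is an array of references to independent row objects.
class TwoDArrayDemo {
    static double[][] build() {
        double[][] a = new double[3][];  // array of 3 row references
        a[0] = new double[]{1.0, 2.0, 3.0};
        a[1] = new double[]{4.0};        // rows may differ in length (jagged)
        a[2] = a[0];                     // two slots can share one row object
        return a;
    }
}
```

Because each row is a separate object, nothing forces consecutive rows to be adjacent in memory, which is exactly the row-versus-column issue discussed below.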

Frobenius Norm Example

The mathematical definition of the operation is:

s = ( \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} A_{ij}^2 )^{1/2}

The following code shows the implementation of the mathematical definition in Java. Loop-order (i,j):

double s = 0;
double[][] array = new double[m][n];
for(int i = 0; i < m; i++){
    for(int j = 0; j < n; j++){
        s += array[i][j]*array[i][j];
    }
}
s = Math.sqrt(s);

Loop-order (j,i) interchanges the two loops, traversing the array column by column.

Frobenius Norm Example

- Basic observation: accessing consecutive elements in a row will be faster than accessing consecutive elements in a column.
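The observation can be illustrated with a minimal sketch (names are illustrative): both methods compute the same sum, but the first walks along each row array while the second hops between row objects on every step:

```java
// Row-wise versus column-wise traversal of a Java 2D array.
class Traversal {
    static double sumRowWise(double[][] a) {
        double s = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                s += a[i][j];          // consecutive elements of one row array
        return s;
    }

    static double sumColumnWise(double[][] a) {
        double s = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                s += a[i][j];          // a different row object on every access
        return s;
    }
}
```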

Matrix Multiplication Algorithms

- The efficiency of the matrix multiplication operation depends on the details of the underlying data structure, both hardware and software.

- We discuss several different implementations using Java arrays as the data structure:
  - A straightforward matrix multiplication algorithm.
  - A package implementation that is highly optimized.
  - An algorithm that takes the row-wise layout fully into consideration and uses the same optimizing techniques as the package implementation.

Matrix Multiplication Algorithms

C_{ij} = \sum_{k=0}^{n-1} A_{ik} B_{kj},    i = 0, 1, 2, ..., m-1;  j = 0, 1, 2, ..., p-1

A straightforward matrix multiplication operation:

for(int i = 0; i < m; i++){
    for(int j = 0; j < p; j++){
        double s = 0;
        for(int k = 0; k < n; k++){
            s += A[i][k]*B[k][j];
        }
        C[i][j] = s;
    }
}

Matrix Multiplication Algorithms

m=n=p               |   80    115    138     240      468
--------------------+-------------------------------------
Pure Row    (k,i,j) |   66    178    298    1630    13690
            (i,k,j) |   63    174    257    1538    13175
Partial     (i,j,k) |   66    208    331    2491    27655
            (j,i,k) |   72    233    341    2617    28804
Pure Column (j,k,i) |  100    295    468    4458    56805
            (k,j,i) |   99    299    474    4457    58351

The loop orders tell us how the matrices involved get traversed in the course of the matrix multiplication operation.

We see the same time differences with pure row versus pure column as we did with the Frobenius norm example. This is the same effect.

Matrix Multiplication Algorithms

- The time differences are due to accessing different object arrays when traversing columns, as opposed to accessing the same object array several times when traversing a row.

- For a rectangular array of primitive elements, the elements of a row will be stored contiguously, but the rows may be scattered in memory.

- The difference between row and column traversal is also an issue in FORTRAN, C and C++, but there the differences are not as significant.

JAMA

- A basic linear algebra package implemented in Java.
- It provides user-level classes for constructing and manipulating real dense matrices.
- It is intended to serve as the standard matrix class for Java.
- JAMA is comprised of six Java classes:
  - Matrix (matrix multiplication: A.times(B))
  - CholeskyDecomposition
  - LUDecomposition
  - QRDecomposition
  - SingularValueDecomposition
  - EigenvalueDecomposition

Matrix Multiplication Operations

JAMA versus Pure-Row

- A comparison on input AB is shown for square matrices.

- The pure row-oriented algorithm has on average 30% better performance than JAMA's algorithm.
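A sketch of what a pure row-oriented multiplication can look like (loop order (i,k,j); this is our illustration, not JAMA's or the authors' exact code): the row references are hoisted out of the inner loop, so every inner-loop access walks along a row rather than down a column:

```java
// Pure row-oriented matrix multiplication C = A*B, loop order (i,k,j).
// A is m x n, B is n x p, C is m x p.
class RowMatMul {
    static double[][] multiply(double[][] A, double[][] B) {
        int m = A.length, n = B.length, p = B[0].length;
        double[][] C = new double[m][p];
        for (int i = 0; i < m; i++) {
            double[] Ai = A[i];          // hoisted row of A
            double[] Ci = C[i];          // hoisted row of C
            for (int k = 0; k < n; k++) {
                double aik = Ai[k];      // constant in the inner loop
                double[] Bk = B[k];      // hoisted row of B
                for (int j = 0; j < p; j++) {
                    Ci[j] += aik * Bk[j];   // all accesses are row-wise
                }
            }
        }
        return C;
    }
}
```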

JAMA versus Pure-Row

- JAMA's algorithm is more efficient than the pure row-oriented algorithm on input Ab, by an average factor of two.

JAMA versus Pure-Row

- There is a significant difference between JAMA's algorithm and the pure row-oriented algorithm on input b^T A, with an average factor of 7. In this case JAMA is less efficient.

- The break-even results.

Sparse Matrices

- A sparse matrix is usually defined as a matrix where "many" of its elements are equal to zero.

- We benefit both in time and space by working only on the nonzero data structure.

- Currently there are no packages implemented in Java for matrix computation on sparse matrices that are as complete as JAMA is for dense matrices.

Sparse Matrix Concept

- The Sparse Matrix Concept (SMC) is a general object-oriented structure.

- The Rows objects store the arrays for the nonzero values and indexes.

Java Sparse Array

- The Java Sparse Array (JSA) format is a new concept for storing sparse matrices made possible with Java.

- One array stores the references to the value arrays and one stores the references to the index arrays.

- Java's native arrays can store object references, so the extra Rows object layer in SMC is unnecessary in Java.
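A minimal JSA sketch (class and field names are our assumptions, not an API from the talk): one array of value arrays and one parallel array of index arrays, one pair per matrix row. A row with no nonzeros is simply an empty pair of arrays.

```java
// Java Sparse Array: two parallel arrays of per-row arrays.
class JavaSparseArray {
    double[][] value;  // value[i] holds the nonzero values of row i
    int[][] index;     // index[i] holds their column indices

    JavaSparseArray(double[][] value, int[][] index) {
        this.value = value;
        this.index = index;
    }

    // Return A[i][j], or 0 if that entry is not stored.
    double get(int i, int j) {
        for (int k = 0; k < index[i].length; k++) {
            if (index[i][k] == j) return value[i][k];
        }
        return 0.0;
    }
}
```

Replacing one row means assigning a new pair of arrays to `value[i]` and `index[i]`; no other row is touched.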

Compressed Row Storage

- Compressed Row/Column Storage are the most commonly used storage schemes for large sparse matrices.

- These storage schemes have enjoyed several decades of research.

- The compressed storage schemes have minimal memory requirements.
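For comparison, a minimal CRS sketch (field names are our assumptions): all nonzeros live in one flat array, with a pointer array marking where each row starts, so changing the size of one row shifts the tail of every array.

```java
// Compressed Row Storage: one flat value array, one flat column-index
// array, and a row-pointer array of length m+1.
class CompressedRowStorage {
    double[] value;     // nonzero values, stored row by row
    int[] columnIndex;  // column index of each stored value
    int[] rowPointer;   // entries of row i: rowPointer[i] .. rowPointer[i+1]-1

    CompressedRowStorage(double[] value, int[] columnIndex, int[] rowPointer) {
        this.value = value;
        this.columnIndex = columnIndex;
        this.rowPointer = rowPointer;
    }

    // Return A[i][j], or 0 if that entry is not stored.
    double get(int i, int j) {
        for (int k = rowPointer[i]; k < rowPointer[i + 1]; k++) {
            if (columnIndex[k] == j) return value[k];
        }
        return 0.0;
    }
}
```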

Numerical Results

Sparse Matrix Multiplication

m=n=p     nnz(A)     nnz(C)     CRS    JSA    SMC
  115        421       1027       1      2      2
  468       2820       8920      19     17     17
 2205      14133      46199      21     38     36
 4884     147631     473734     185    169    165
10974     219512     620957     207    228    278
17282     553956    2525937     829    642    628

These numerical results show that CRS, SMC and JSA have approximately the same performance.

Sparse Matrix Update

- Consider the outer product ab^T of the two vectors a, b, where many of the elements are 0.

- The outer product will be a sparse matrix with some rows where all elements are 0, and the corresponding sparse data structure will have rows without any elements.

- A typical operation is a rank-one update of an n x n matrix A:

A_{ij} \leftarrow A_{ij} + a_i b_j,    i = 1, ..., n;  j = 1, ..., n

- where a_i is element i in a and b_j is element j in b. Thus only those rows of A where a_i is different from 0 need to be updated.
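A dense-array sketch of the skip-empty-rows idea (illustrative only; the talk applies it to the sparse structures, where a row-wise format like JSA can replace just the affected row arrays):

```java
// Rank-one update A += a b^T, skipping rows where a[i] == 0 --
// those rows of A are unchanged by the update.
class RankOneUpdate {
    static void update(double[][] A, double[] a, double[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 0.0) continue;   // row i is untouched: skip it
            double[] Ai = A[i];          // hoist the row reference
            for (int j = 0; j < b.length; j++) {
                Ai[j] += a[i] * b[j];
            }
        }
    }
}
```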

Numerical Results

Sparse Matrix Update

m=n=p     nnz(A)    nnz(B)    nnz(new A)     CRS    JSA
  115        421         7           426      11      0
  468       2820       148          2963      13      1
 2205      14133       449         14557      44      8
 4884     147631      2365        149942     183      8
10974     219512      1350        220104     753      8
17282     553956       324        554138    1806     11

These numerical results show that JSA is more efficient than CRS by an average factor of 78, which is significant.

Concluding Remarks

- Using Java arrays as a 2D array for dense matrices, we need to consider that the rows are independent objects.

- Other suggestions to eliminate the row-versus-column "problem":
  - Cluster row objects together in memory.
  - Create a Java array class, avoiding the array of arrays.

- Java Sparse Array:
  - Allows manipulating only the rows of the structure, without updating or traversing the rest of the structure, unlike Compressed Row Storage.
  - More efficient, has smaller memory requirements and a more natural notation than SMC.

- People will use Java for numerical computations; therefore it may be worthwhile to invest time and resources in finding out how to use Java for numerical computation.

- This work has given ideas of how some constructions in Java restrict natural development (rows versus columns).

- Java has flexibility that is not yet fully explored (Java Sparse Array).