CIS6930: Java Remote Method Invocation


Java for High Performance Computing
java.nio: High Performance I/O for Java
http://www.hpjava.org/courses/arl
Instructor: Bryan Carpenter
Pervasive Technology Labs
Indiana University
[email protected]
NIO: New I/O

Prior to the J2SE 1.4 release of Java, I/O had become a bottleneck.
– JIT performance was reaching the point where one could start to think of Java as
a platform for High Performance computation, but the old java.io stream classes
had too many software layers to be fast:
» The specification implied much copying of small chunks of data.
» There was no way to multiplex data from multiple sources without incurring
thread context switches.
» There was no way to exploit modern OS tricks for high performance I/O, like
memory mapped files.

New I/O changes that by providing:
– A hierarchy of dedicated buffer classes that allow data to be moved from the
JVM to the OS with minimal memory-to-memory copying, and without
expensive overheads like switching byte order; effectively buffer classes give
Java a “window” on system memory.
– A unified family of channel classes that allow data to be fed directly from
buffers to files and sockets, without going through the intermediaries of the old
stream classes.
– A family of classes to directly implement selection (AKA readiness testing,
AKA multiplexing) over a set of channels.
– NIO also provides file locking for the first time in Java.
References

The Java NIO software is part of J2SE 1.4 and later, from
http://java.sun.com/j2se/1.4

Online documentation is at:
http://java.sun.com/j2se/1.4/nio

There is an authoritative book from O’Reilly:
“Java NIO”, Ron Hitchens, 2002
Buffers


A Buffer object is a container for a fixed amount of data.
It behaves something like a byte [] array, but is encapsulated
in such a way that the internal storage can be a block of
system memory.
– Thus adding data to, or extracting it from, a buffer can be a very direct
way of getting information between a Java program and the underlying
operating system.
– All modern OS’s provide virtual memory systems that allow memory
space to be mapped to files, so this also enables a very direct and
high-performance route to the file system.
– The data in a buffer can also be efficiently read from, or written to, a
socket or pipe, enabling high performance communication.

The buffer APIs allow you to read or write from a specific
location in the buffer directly; they also allow relative reads
and writes, similar to sequential file access.
The java.nio.Buffer Hierarchy
Buffer
– CharBuffer
– IntBuffer
– DoubleBuffer
– ShortBuffer
– LongBuffer
– FloatBuffer
– ByteBuffer
» MappedByteBuffer
The ByteBuffer Class


The most important buffer class in practice is probably the
ByteBuffer class. This represents a fixed-size vector of
primitive bytes.
Important methods on this class include:
byte get()
byte get(int index)
ByteBuffer get(byte [] dst)
ByteBuffer get(byte [] dst, int offset, int length)
ByteBuffer put(byte b)
ByteBuffer put(int index, byte b)
ByteBuffer put(byte [] src)
ByteBuffer put(byte [] src, int offset, int length)
ByteBuffer put(ByteBuffer src)
File Position and Limit


Apart from forms with an index parameter, these are all relative operations: they
get data from, or insert data into, the buffer starting at the current position in the
buffer; they also update the position to point to the position after the read or written
data. The position property is like the file pointer in sequential file access.
The superclass Buffer has methods for explicitly manipulating the position and
related properties of buffers, e.g.:
int position()
Buffer position(int newPosition)
int limit()
Buffer limit(int newLimit)
– The ByteBuffer or Buffer references returned by these various methods are simply
references to this buffer object, not new buffers. They are provided to support cryptic
invocation chaining. Feel free to ignore them.

The limit property defines either the last space available for writing, or how much
data has been written to the buffer.
– After finishing writing, the flip() method can be called to set limit to the current value of
position, and reset position to zero, ready for reading.

Various operations implicitly work on the data between position and limit.
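The interplay of position, limit and flip() can be sketched as follows. This is a minimal illustration only; the class name FlipSketch and the sample byte values are assumptions made for the example, and hasRemaining() simply tests position < limit.

```java
import java.nio.ByteBuffer;

class FlipSketch {
    // Fills a small buffer, flips it, then drains it, logging position and limit.
    static String run() {
        ByteBuffer buf = ByteBuffer.allocate(8);  // position = 0, limit = capacity = 8
        buf.put((byte) 10);                       // relative put: position becomes 1
        buf.put((byte) 20);                       // position becomes 2
        buf.put((byte) 30);                       // position becomes 3
        buf.put(0, (byte) 11);                    // absolute put: position is unchanged

        StringBuilder log = new StringBuilder();
        log.append("before flip: position=").append(buf.position())
           .append(" limit=").append(buf.limit()).append('\n');

        buf.flip();                               // limit = old position, position = 0
        log.append("after flip: position=").append(buf.position())
           .append(" limit=").append(buf.limit()).append('\n');

        while (buf.hasRemaining()) {              // relative gets run from position up to limit
            log.append(buf.get()).append(' ');
        }
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the three bytes that were written are read back, because flip() moved limit down to the old position.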
Creating Buffers

Four interesting factory methods can be used to create a new ByteBuffer:
ByteBuffer allocate(int capacity)
ByteBuffer allocateDirect(int capacity)
ByteBuffer wrap(byte [] array)
ByteBuffer wrap(byte [] array, int offset, int length)
These are all static methods of the ByteBuffer class.
– allocate() creates a ByteBuffer with an ordinary Java backing array of size
capacity.
– allocateDirect()—perhaps the most interesting case—creates a direct
ByteBuffer, backed by capacity bytes of system memory.
– The wrap() methods create ByteBuffers backed by all or part of an array
allocated by the user.

The other typed buffer classes (CharBuffer, etc) have similar factory
methods, except they don’t support the important allocateDirect() method.
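A short sketch of the four factory methods, printing the properties each new buffer starts with. The class name BufferFactorySketch, the helper describe() and the sizes are assumptions for illustration.

```java
import java.nio.ByteBuffer;

class BufferFactorySketch {
    // Summarises the properties of a buffer that depend on how it was created.
    static String describe(ByteBuffer b) {
        return "direct=" + b.isDirect() + " hasArray=" + b.hasArray()
             + " position=" + b.position() + " limit=" + b.limit()
             + " capacity=" + b.capacity();
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        System.out.println(describe(ByteBuffer.allocate(16)));        // Java backing array
        System.out.println(describe(ByteBuffer.allocateDirect(16)));  // system memory
        System.out.println(describe(ByteBuffer.wrap(data)));          // whole user array
        System.out.println(describe(ByteBuffer.wrap(data, 4, 8)));    // part of user array
    }
}
```

Note that wrap(array, offset, length) still has the whole array as its capacity; offset and length only set the initial position and limit.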
Other Primitive Types in ByteBuffers

It is possible to write other primitive types (char, int, double, etc) to a
ByteBuffer by methods like:
ByteBuffer putChar(char value)
ByteBuffer putChar(int index, char value)
ByteBuffer putInt(int value)
ByteBuffer putInt(int index, int value)
…
The putChar() methods do absolute or relative writes of the two bytes in a
Java char, the putInt() methods write 4 bytes, and so on.
– Of course there are corresponding getChar(), getInt(), … methods.



These give you fun, unsafe ways of coercing bytes of one primitive type to
another type, by writing data as one type and reading them as another.
But actually this isn’t the interesting bit: this was always possible with
the old java.io DataInputStream and DataOutputStream classes.
The interesting bit is that the new ByteBuffer class has a method that
allows you to set the byte order…
Endian-ness

When identifying a numeric type like int or double with a sequence of bytes
in memory, one can either put the most significant byte first (big-endian), or
the least significant byte first (little-endian).
– Big Endian: Sun Sparc, PowerPC CPU, numeric fields in IP headers,…
– Little Endian: Intel processors

In java.io, numeric types were always rendered to stream in big-endian order.
– Creates a serious bottleneck when writing or reading numeric types.
– Implementations typically must apply byte manipulation code to each item, to
ensure bytes are written in the correct order.

In java.nio, the programmer specifies the byte order as a property of a
ByteBuffer, by calling one of:
myBuffer.order(ByteOrder.BIG_ENDIAN)
myBuffer.order(ByteOrder.LITTLE_ENDIAN)
myBuffer.order(ByteOrder.nativeOrder())

Provided the programmer ensures the byte order set for the buffer agrees with
the native representation for the local processor, numeric data can be copied
between JVM (which will use the native order) and buffer by a straight block
memory copy, which can be extremely fast—a big win for NIO.
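The effect of the order property can be seen by writing the same int under each byte order and dumping the bytes. This is a small sketch; the class name ByteOrderSketch, the helper bytesOf() and the test value are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderSketch {
    // Writes one int in the given byte order and returns its four bytes as hex.
    static String bytesOf(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.order(order);          // a new ByteBuffer is big-endian until told otherwise
        buf.putInt(value);
        buf.flip();
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            sb.append(String.format("%02x ", buf.get()));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("big-endian:    " + bytesOf(0x01020304, ByteOrder.BIG_ENDIAN));
        System.out.println("little-endian: " + bytesOf(0x01020304, ByteOrder.LITTLE_ENDIAN));
        System.out.println("native order:  " + ByteOrder.nativeOrder());
    }
}
```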
View Buffers


ByteBuffer has no methods for bulk transfer of arrays other than type byte[].
Instead, create a view of (a portion of) a ByteBuffer as any other kind of typed
buffer, then use the bulk transfer methods on that view. The following methods of
ByteBuffer create views:
CharBuffer asCharBuffer()
IntBuffer asIntBuffer()
…
– To create a view of just a portion of a ByteBuffer, set position and limit
appropriately beforehand—the created view only covers the region between these.
– You cannot create views of typed buffers other than ByteBuffer.
– You can create another buffer that represents a subsection of any buffer (without
changing element type) by using the slice() method.

For example, writing an array of floats to a byte buffer, starting at the current
position:
float [] array ;
…
FloatBuffer floatBuf = byteBuf.asFloatBuffer() ;
floatBuf.put(array) ;
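A runnable version of the fragment above might look like the sketch below. One detail worth showing is that a view keeps its own position, so the position of the underlying ByteBuffer does not move when data is written through the view. The class name ViewBufferSketch, the helper putFloats() and the buffer size are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

class ViewBufferSketch {
    // Copies a float array into byteBuf at its current position via a view,
    // then advances byteBuf's own position past the written data.
    static void putFloats(ByteBuffer byteBuf, float[] array) {
        FloatBuffer floatBuf = byteBuf.asFloatBuffer();  // view starts at byteBuf's position
        floatBuf.put(array);                             // bulk transfer through the view
        // The view has an independent position, so move the byte buffer by hand
        // (a Java float occupies 4 bytes).
        byteBuf.position(byteBuf.position() + array.length * 4);
    }

    public static void main(String[] args) {
        ByteBuffer byteBuf = ByteBuffer.allocateDirect(64);
        byteBuf.order(ByteOrder.nativeOrder());          // set the order before creating views
        putFloats(byteBuf, new float[] {1.0f, 2.5f, -3.0f});
        System.out.println("position after write: " + byteBuf.position());
        byteBuf.flip();
        while (byteBuf.hasRemaining()) {
            System.out.println(byteBuf.getFloat());
        }
    }
}
```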
Channels

A channel is a new abstraction in java.nio.
– In the package java.nio.channels.

Channels are a high-level version of the file-descriptors
familiar from POSIX-compliant operating systems.
– So a channel is a handle for performing I/O operations and various
control operations on an open file or socket.

For those familiar with conventional Java I/O, java.nio
associates a channel with any RandomAccessFile,
FileInputStream, FileOutputStream, Socket, ServerSocket
or DatagramSocket object.
– The channel becomes a peer to the conventional Java handle objects;
the conventional objects still exist, and in general retain their role—the
channel just provides extra NIO-specific functionality.

NIO buffer objects can be written to or read from channels
directly. Channels also play an essential role in readiness
selection, discussed in the next section.
Simplified Channel Hierarchy
Channel <<interface>>
– ByteChannel <<interface>>
» FileChannel
» DatagramChannel
» SocketChannel
– SelectableChannel
» DatagramChannel
» SocketChannel
» ServerSocketChannel
Some of the “inheritance” arcs here are indirect: we missed
out some interesting intervening classes and interfaces.
Opening Channels

Socket channel classes have static factory methods called
open(), e.g.:
SocketChannel sc = SocketChannel.open() ;
sc.connect(new InetSocketAddress(hostname, portnumber)) ;

File channels cannot be created directly; first use conventional
Java I/O mechanisms to create a FileInputStream,
FileOutputStream, or RandomAccessFile, then apply the new
getChannel() method to get an associated NIO channel, e.g.:
RandomAccessFile raf = new RandomAccessFile(filename, "r") ;
FileChannel fc = raf.getChannel() ;
Using Channels

Any channel that implements the ByteChannel interface (i.e.
all channels except ServerSocketChannel) provides a read()
and a write() instance method:
int read(ByteBuffer dst)
int write(ByteBuffer src)
– These may look reminiscent of the read() and write() system calls in
UNIX:
int read(int fd, void* buf, int count)
int write(int fd, void* buf, int count)
– The Java read() attempts to read from the channel as many bytes as
there are remaining to be written in the dst buffer. Returns number of
bytes actually read, or -1 if end-of-stream. Also updates dst buffer
position.
– Similarly write() attempts to write to the channel as many bytes as
there are remaining in the src buffer. Returns number of bytes actually
written, and updates src buffer position.
Example: Copying one Channel to Another

This example assumes a source channel src and a destination channel dest:
ByteBuffer buffer = ByteBuffer.allocateDirect(BUF_SIZE) ;
while(src.read(buffer) != -1) {
buffer.flip() ; // Prepare read buffer for “draining”
while(buffer.hasRemaining())
dest.write(buffer) ;
buffer.clear() ; // Empty buffer, ready to read next chunk.
}
– Note a write() call (or a read() call) may or may not succeed in transferring
whole buffer in a single call. Hence need for inner while loop.
– Example introduces two new methods on Buffer: hasRemaining() returns
true if position < limit; clear() sets position to 0 and limit to buffer’s
capacity.
– Because copying is a common operation on files, FileChannel provides a
couple of special methods to do just this:
long transferTo(long position, long count, WritableByteChannel target)
long transferFrom(ReadableByteChannel src, long position, long count)
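For completeness, here is a self-contained sketch of the copy loop applied to two file channels. The class name ChannelCopySketch, the helper copyTempFile(), the temporary file names and the buffer size are all assumptions made so the example can run on its own.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class ChannelCopySketch {
    static final int BUF_SIZE = 8192;

    // The copy loop from the slide, returning the number of bytes transferred.
    static long copy(ReadableByteChannel src, WritableByteChannel dest) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(BUF_SIZE);
        long total = 0;
        while (src.read(buffer) != -1) {
            buffer.flip();                    // prepare the filled buffer for draining
            while (buffer.hasRemaining()) {
                total += dest.write(buffer);  // a single write() may be partial
            }
            buffer.clear();                   // empty buffer, ready for the next chunk
        }
        return total;
    }

    // Writes payload to a temporary file, copies it to a second one, returns bytes copied.
    static long copyTempFile(byte[] payload) {
        try {
            File in = File.createTempFile("nio-src", ".bin");
            File out = File.createTempFile("nio-dst", ".bin");
            in.deleteOnExit();
            out.deleteOnExit();
            try (FileOutputStream fos = new FileOutputStream(in)) {
                fos.write(payload);
            }
            try (FileChannel src = new FileInputStream(in).getChannel();
                 FileChannel dest = new FileOutputStream(out).getChannel()) {
                // For file channels, src.transferTo(0, src.size(), dest) is an alternative.
                return copy(src, dest);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("copied " + copyTempFile(new byte[20000]) + " bytes");
    }
}
```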
Memory-Mapped Files

In modern operating systems one can exploit the virtual memory
system to map a physical file into a region of program memory.
– Once the file is mapped, accesses to the file can be extremely fast: one
doesn’t have to go through read() and write() system calls.
– One application might be a Web Server, where you want to read a whole file
quickly and send it to a socket.
– Problems arise if the file structure is changed while it is mapped—use this
technique only for fixed-size files.

This low-level optimization is now available in Java.
FileChannel has a method:
MappedByteBuffer map(MapMode mode, long position, long size)
– mode should be one of MapMode.READ_ONLY,
MapMode.READ_WRITE, MapMode.PRIVATE.
– The returned MappedByteBuffer can be used wherever an ordinary
ByteBuffer can.
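A minimal sketch of mapping a file, assuming a scratch temporary file and a 1024-byte region (both chosen only for illustration, as are the class name MappedFileSketch and the helper roundTrip()). It writes an int through a READ_WRITE mapping and reads it back through a second, READ_ONLY mapping of the same file. Many details of mapped files depend on the operating system, so treat this as a sketch rather than a guarantee of behaviour everywhere.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedFileSketch {
    static final int REGION = 1024;  // size of the mapped region, chosen arbitrarily

    // Stores value at offset 0 via one mapping and reads it back via another.
    static int roundTrip(int value) {
        try {
            File f = File.createTempFile("nio-map", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
                 FileChannel fc = raf.getChannel()) {
                raf.setLength(REGION);  // fix the file size before mapping it
                MappedByteBuffer writer = fc.map(FileChannel.MapMode.READ_WRITE, 0, REGION);
                writer.putInt(0, value);  // absolute put, no read()/write() system call
                writer.force();           // ask the OS to push changes to the storage device
                MappedByteBuffer reader = fc.map(FileChannel.MapMode.READ_ONLY, 0, REGION);
                return reader.getInt(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(12345));
    }
}
```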
Scatter/Gather

Often called vectored I/O, this just means you can pass an
array of buffers to a read or write operation; the overloaded
channel instance methods have signatures:
long read(ByteBuffer [] dsts)
long read(ByteBuffer [] dsts, int offset, int length)
long write(ByteBuffer [] srcs)
long write(ByteBuffer [] srcs, int offset, int length)


The first form of read() attempts to read enough data to fill all
buffers in the array, and divides it between them, in order.
The first form of write() attempts to concatenate the
remaining data in all buffers and write it.
– The arguments offset and length select a subset of buffers from the
arrays (not, say, an interval within buffers).
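A common use of vectored I/O is a fixed-size header followed by a variable-size body. The sketch below gathers the two into one file and scatters them back out. The class name ScatterGatherSketch, the helper roundTrip() and the 4-byte length header are assumptions for illustration; the reader is handed the body length in advance purely to keep the example short.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

class ScatterGatherSketch {
    // Writes [length header][body] with a gathering write, reads both back with a scattering read.
    static String roundTrip(String message) {
        try {
            File f = File.createTempFile("nio-vec", ".bin");
            f.deleteOnExit();
            byte[] body = message.getBytes(StandardCharsets.UTF_8);
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw");
                 FileChannel fc = raf.getChannel()) {
                ByteBuffer header = ByteBuffer.allocate(4);
                header.putInt(body.length);
                header.flip();
                ByteBuffer[] srcs = { header, ByteBuffer.wrap(body) };
                while (srcs[0].hasRemaining() || srcs[1].hasRemaining()) {
                    fc.write(srcs);                 // gather: buffers are drained in order
                }

                fc.position(0);                     // rewind the channel for reading
                ByteBuffer headerIn = ByteBuffer.allocate(4);
                ByteBuffer bodyIn = ByteBuffer.allocate(body.length);
                ByteBuffer[] dsts = { headerIn, bodyIn };
                while ((headerIn.hasRemaining() || bodyIn.hasRemaining())
                        && fc.read(dsts) != -1) {
                    // scatter: buffers are filled in order; loop in case a read is partial
                }
                headerIn.flip();
                bodyIn.flip();
                int length = headerIn.getInt();
                return length + ":" + new String(bodyIn.array(), 0, bodyIn.limit(),
                                                 StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```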
SocketChannels

As mentioned at the beginning of this section, socket channels
are created directly with their own factory methods.
– If you want to manage a socket connection as an NIO channel this is
the only option. Creating an NIO socket channel implicitly creates a peer
java.net socket object, but (contrary to the situation with file handles)
the converse is not true.


As with file channels, socket channels can be more
complicated to work with than the traditional java.net socket
classes, but provide much of the hard-boiled flexibility you
get programming sockets in C.
The most notable new facilities are that now socket
communications can be non-blocking, they can be interrupted,
and there is a selection mechanism that allows a single thread
to do multiplex servicing of any number of channels.
Basic Socket Channel Operations

Typical use of a server socket channel follows a pattern like:
ServerSocketChannel ssc = ServerSocketChannel.open() ;
ssc.socket().bind( new InetSocketAddress(port) ) ;
while(true) {
SocketChannel sc = ssc.accept() ;
… process a transaction with client through sc …
}

The client does something like:
SocketChannel sc = SocketChannel.open() ;
sc.connect( new InetSocketAddress(serverName, port) ) ;
… initiate a transaction with server through sc …

The elided code above will typically be using read() and write() calls on the
SocketChannel to exchange data between client and server.
– So there are four important operations: accept(), connect(), write(), read() .
Nonblocking Operations

By calling the method
sc.configureBlocking(false) ;

on a socket channel sc, you put it into nonblocking mode (calling again with
argument true restores blocking mode, and so on).
In non-blocking mode:
– A read() operation only transfers data that is immediately available. If no
data is immediately available it returns 0.
– Similarly, if data cannot be immediately written to a socket, a write()
operation will immediately return 0.
– For a server socket, if no client is currently trying to connect, the accept()
method immediately returns null.
– The connect() method is more complicated—generally connections
would always block for some interval waiting for the server to respond.
» In non-blocking mode connect() generally returns false. But the negotiation
with the server is nevertheless started. The finishConnect() method on the
same socket should be called later. It also returns immediately. Repeat until
it returns true.
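The non-blocking behaviour of accept(), connect() and finishConnect() can be tried out on the loopback interface, as in the following sketch. The class name NonBlockingSketch, the use of an ephemeral port (port 0) and the simple polling loop are assumptions made for illustration; a real program would normally use a selector instead of polling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

class NonBlockingSketch {
    static String run() {
        try (ServerSocketChannel ssc = ServerSocketChannel.open()) {
            InetAddress loopback = InetAddress.getLoopbackAddress();
            ssc.socket().bind(new InetSocketAddress(loopback, 0));  // port 0: any free port
            ssc.configureBlocking(false);

            // No client has tried to connect yet, so accept() returns null at once.
            boolean idle = (ssc.accept() == null);

            int port = ssc.socket().getLocalPort();
            try (SocketChannel client = SocketChannel.open()) {
                client.configureBlocking(false);
                // May return false, but the connection attempt has been started.
                boolean connected = client.connect(new InetSocketAddress(loopback, port));
                SocketChannel peer = null;
                while (!connected || peer == null) {   // crude polling loop
                    if (peer == null) {
                        peer = ssc.accept();           // null until the client shows up
                    }
                    if (!connected) {
                        connected = client.finishConnect();
                    }
                    Thread.yield();
                }
                peer.close();
                return "idle accept returned null: " + idle
                     + ", connected: " + client.isConnected();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```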
Interruptible Operations


The standard channels in NIO are all interruptible.
If a thread is blocked waiting on a channel, and the thread’s
interrupt() method is called, the channel will be closed, and
the thread will be woken and sent a
ClosedByInterruptException.
– To avoid race conditions, the same will happen if an operation on a
channel is attempted by a thread whose interrupt status is already true.
– See the lecture on threads for a discussion of interrupts.

This represents progress over traditional Java I/O, where
interruption of blocking operations was not guaranteed.
Other Features of Channels



File channels provide a quite general file locking facility. This
is presumably important to many applications (database
applications), but less obviously so to HPC operations, so we
don’t discuss it here.
There is a DatagramChannel for sending UDP–style
messages. This may well be important for high performance
communications, but we don’t have time to discuss it.
There is a special channel implementation representing a kind
of pipe, which can be used for inter-thread communication.
Selectors
Readiness Selection

Prior to New I/O, Java provided no standard way of selecting—from a set of
possible socket operations—just the ones that are currently ready to proceed,
so the ready operations can be immediately serviced.
– One application would be in implementing an MPI-like message passing system:
in general incoming messages from multiple peers must be consumed as they
arrive and fed into a message queue, until the user program is ready to handle
them.
– Previously one could achieve equivalent effects in Java by doing blocking I/O
operations in separate threads, then merging the results through Java thread
synchronization. But this can be inefficient because thread context switching
and synchronization is quite slow.


One way of achieving the desired effect in New I/O would be to set all the
channels involved to non-blocking mode, and use a polling loop to wait until
some are ready to proceed.
A more structured—and potentially more efficient—approach is to use
Selectors.
– In many flavors of UNIX this is achieved by using the select() system call.
Classes Involved in Selection



Selection can be done on any channel extending
SelectableChannel—amongst the standard channels this
means the three kinds of socket channel.
The class that supports the select() operation itself is Selector.
This is a sort of container class for the set of channels in
which we are interested.
The last class involved is SelectionKey, which is said to
represent the binding between a channel and a selector.
– In some sense it is part of the internal representation of the Selector,
but the NIO designers decided to make it an explicit part of the API.
Setting Up Selectors


A selector is created by the open() factory method. This is naturally a
static method of the Selector class.
A channel is added to a selector by calling the method:
SelectionKey register(Selector sel, int ops)
– This, slightly oddly, is an instance method of the SelectableChannel class—
you might have expected the register() method to be a member of Selector.
– Here ops is a bit-set representing the interest set for this channel: composed by
oring together one or more of:
SelectionKey.OP_READ
SelectionKey.OP_WRITE
SelectionKey.OP_CONNECT
SelectionKey.OP_ACCEPT
– A channel added to a selector must be in nonblocking mode!

The register() method returns the SelectionKey created.
– Since this automatically gets stored in the Selector, in most cases you
probably don’t need to save the result yourself.
Example

Here we create a selector, and register three pre-existing
channels to the selector:
Selector selector = Selector.open() ;
channel1.register (selector, SelectionKey.OP_READ) ;
channel2.register (selector, SelectionKey.OP_WRITE) ;
channel3.register (selector, SelectionKey.OP_READ |
SelectionKey.OP_WRITE) ;
– For channel1 the interest set is reads only, for channel2 it is writes
only, for channel3 it is reads and writes.

Note channel1, channel2, channel3 must all be in nonblocking mode at this time, and must remain in that mode as
long as they are registered in any selector.
– You remove a channel from a selector by calling the cancel() method
of the associated SelectionKey.
select() and the Selected Key Set

To inspect the set of channels, to see what operations are newly ready to
proceed, you call the select() method on the selector.
– The return value is an integer, which will be zero if no status changes
occurred.
– More interesting than the return value is the side effect this method has on the
set of selected keys embedded in the selector.

To use selectors, you must understand that a selector maintains a Set
object representing this selected keys set.
– Because each key is associated with a channel, this is equivalent to a set of
selected channels.
– The set of selected keys is different from (presumably a subset of) the
registered key set.
– Each time the select() method is called it may add new keys to the selected
key set, as operations become ready to proceed.
– You, as the programmer, are responsible for explicitly removing keys from the
selected key set belonging to the selector, as you deal with operations that
have become ready.
Ready Sets




This is quite complicated already, but there is one more complication.
We saw that each key in the registered key set has an associated interest set,
which is a subset of the 4 possible operations on sockets.
Similarly each key in the selected key set has an associated ready set, which
is a subset of the interest set—representing the actual operations that have
been found ready to proceed.
Besides adding new keys to the selected key set, a select() operation may add
new operations to the ready set of a key already in the selected key set.
– Assuming the selected key set was not cleared after a preceding select().

You can extract the ready set from a SelectionKey as a bit-set, by using the
method readyOps(). Or you can use the convenience methods:
isReadable()
isWritable()
isConnectable()
isAcceptable()
which effectively return the bits of the ready set individually.
A Pattern for Using select()
… register some channels with selector …
while(true) {
    selector.select() ;
    Iterator it = selector.selectedKeys().iterator() ;
    while( it.hasNext() ) {
        SelectionKey key = (SelectionKey) it.next() ;  // cast needed with a raw Iterator
        if( key.isReadable() )
            … perform read() operation on key.channel() …
        if( key.isWritable() )
            … perform write() operation on key.channel() …
        if( key.isConnectable() )
            … complete the connection with finishConnect() on key.channel() …
        if( key.isAcceptable() )
            … perform accept() operation on key.channel() …
        it.remove() ;  // take the processed key out of the selected key set
    }
}
Remarks

This general pattern will probably serve for most uses of
select():
1. Perform select() and extract the new selected key set
2. For each selected key, handle the actions in its ready set
3. Remove the processed key from the selected key set
» Note the remove() operation on an Iterator removes the current item
from the underlying container.
More generally, the code that handles a ready operation may
also alter the set of channels registered with the selector
– e.g. after doing an accept() you may want to register the returned
SocketChannel with the selector, to wait for read() or write()
operations.

In many cases only a subset of the possible operations read,
write, accept, connect are ever in interest sets of keys
registered with the selector, so you won’t need all 4 tests.
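The pattern can be exercised end to end without a network by selecting on the readable end of a java.nio.channels.Pipe, which is also a SelectableChannel. This substitution of a pipe for the socket channels, the class name SelectLoopSketch, and the short message are assumptions chosen to keep the sketch self-contained; only the OP_READ test is needed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectLoopSketch {
    // Sends a short message through a pipe and receives it using select().
    static String run(String message) {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);   // registered channels must be nonblocking
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Keep the message short: nothing drains the pipe while we are writing.
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            ByteBuffer out = ByteBuffer.wrap(bytes);
            while (out.hasRemaining()) {
                pipe.sink().write(out);
            }

            ByteBuffer in = ByteBuffer.allocate(bytes.length);
            while (in.hasRemaining()) {
                selector.select();                    // blocks until some key is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    if (key.isReadable()) {
                        ((ReadableByteChannel) key.channel()).read(in);
                    }
                    it.remove();                      // we must clear the selected key ourselves
                }
            }
            pipe.sink().close();
            pipe.source().close();
            in.flip();
            return StandardCharsets.UTF_8.decode(in).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("ping"));
    }
}
```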
Key Attachments

One problem with the pattern above is that when it.next()
returns a key, there is no convenient way of getting
information about the context in which the associated channel
was registered with the selector.
– For example channel1 and channel3 are both registered for
OP_READ. But the action that should be taken when the read
becomes ready may be quite different for the two channels.
– You need a convenient way to determine which channel the returned
key is bound to.

You can specify an arbitrary object as an attachment to the
key when you create it; later when you get the key from the
selected set, you can extract the attachment, and use its
content to decide what to do.
– At its most basic the attachment might just be an index identifying the
channel.
Simplistic Use of Key Attachments
channel1.register (selector, SelectionKey.OP_READ,
                   Integer.valueOf(1) ) ; // attachment
…
channel3.register (selector, SelectionKey.OP_READ |
                             SelectionKey.OP_WRITE,
                   Integer.valueOf(3) ) ; // attachment
…
while(true) {
    …
    Iterator it = selector.selectedKeys().iterator() ;
    …
    SelectionKey key = (SelectionKey) it.next() ;
    if( key.isReadable() )
        // The attachment belongs to the key, not to the channel.
        switch( ((Integer) key.attachment() ).intValue() ) {
            case 1 :
                … action appropriate to channel1 …
                break ;
            case 3 :
                … action appropriate to channel3 …
                break ;
        }
    …
}
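A runnable sketch of the attachment idea follows, again using pipes in place of sockets so that it needs no network. The class name AttachmentSketch and the String labels "first" and "second" are assumptions for illustration; any object can serve as an attachment. Two channels are registered for OP_READ, data is sent to only one of them, and the attachment tells the loop which one became ready.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

class AttachmentSketch {
    // Returns the label attached to whichever registered channel became readable.
    static String run() {
        try (Selector selector = Selector.open()) {
            Pipe first = Pipe.open();
            Pipe second = Pipe.open();
            first.source().configureBlocking(false);
            second.source().configureBlocking(false);
            // The third argument to register() is the attachment.
            first.source().register(selector, SelectionKey.OP_READ, "first");
            second.source().register(selector, SelectionKey.OP_READ, "second");

            // Make only the second pipe readable.
            second.sink().write(ByteBuffer.wrap(new byte[] {42}));

            StringBuilder seen = new StringBuilder();
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                if (key.isReadable()) {
                    seen.append((String) key.attachment());  // recover the context
                }
                it.remove();
            }
            first.sink().close();
            first.source().close();
            second.sink().close();
            second.source().close();
            return seen.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```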
Conclusion

We briefly visited several topics in New I/O that are likely to
be interesting for HPC with Java.
– Some topics that are less obviously relevant we skipped, like file
locking, and regular expressions.
– Also we didn’t cover datagram channels, which may well be relevant.


New I/O has been widely hailed as an important step forward
in getting serious performance out of the Java platform.
See the paper:
“MPJava: High-Performance Message Passing in Java using java.nio”
William Pugh and Jaime Spacco
for a good example of how New I/O may affect the “Java for
HPC” landscape.