Programming for
Geographical Information Analysis:
Core Skills
Lecture 7: Core Packages:
File Input/Output
This lecture
Files
Text files
Binary files
Files
File types
Dealing with files starts with encapsulating the idea of a file in
an object
File locations
Captured in two classes:
java.io.File
Encapsulates a file on a drive.
java.net.URL
Encapsulates a Uniform Resource Locator (URL), which
could include internet addresses.
java.io.File
Before we can read or write files we need to capture them. The
File class represents an external file.
File(String pathname);
File f = new File("e:/myFile.txt");
However, we must remember that different OSs have different
file systems.
Note the use of a forward slash.
Java copes with most of this, but “e:” wouldn’t work in *NIX /
Mac / mobiles etc.
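One way to sidestep drive letters is to build the path from a location the JVM reports for the current machine. The following is a minimal sketch rather than the lecture's own code: the class name PortablePath and the method inHome are made up for illustration, and it assumes the user's home directory is an acceptable place for the file.

```java
import java.io.File;

public class PortablePath {

    // Hypothetical helper: builds a File inside the current user's home
    // directory, so no drive letter such as "e:" is hardwired.
    public static File inHome(String fileName) {
        String home = System.getProperty("user.home");
        // The two-argument constructor joins parent and child using the
        // separator that is correct for the current operating system.
        return new File(home, fileName);
    }

    public static void main(String[] args) {
        File f = inHome("myFile.txt");
        System.out.println(f.getPath());
        System.out.println("Separator on this machine: " + File.separator);
    }
}
```

The printed path will differ between Windows, *NIX and Mac machines, which is the point: the code itself does not change.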
Getting file locations
java.awt.FileDialog
Opens an “Open file” box with a directory tree in it. This
stays open until the user chooses a file or cancels.
Once chosen use FileDialog’s getDirectory() and
getFile() methods to get the directory and
filename.
Getting file locations
import java.awt.*;
import java.io.*;
FileDialog fd = new FileDialog(new Frame());
fd.setVisible(true);
File f = null;
if ((fd.getDirectory() != null) && (fd.getFile() != null)) {
f = new File(fd.getDirectory() + fd.getFile());
}
The application directory
Each object has a java.lang.Class object associated with
it. This represents the class loaded into the JVM.
One use is to get resources local to the class, i.e. in the same
directory as the .class file. We use a java.net.URL object
to do this.
Class thisClass = getClass();
URL url =
thisClass.getResource("myFile.txt");
We can then use URL’s getPath() to return the file path as a
String for the File constructor.
Useful File methods
exists(), canRead() and canWrite()
Test whether the file exists and can be read or written to.
createNewFile() and createTempFile()
Create a new file, and create a new file in “temp” or “tmp”.
delete() and deleteOnExit()
Delete the file (if permissions are correct). Delete when the JVM
shuts down.
isDirectory() and listFiles()
Checks whether the File is a directory, and returns an array of Files
representing the files in the directory. Can use a FilenameFilter
object to limit the returned Files.
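As a rough sketch of how these methods fit together, the code below lists the ".txt" files in a directory using a FilenameFilter. The class name FileChecks and the methods textFileNames and demo are invented for this illustration; the filter is written as a lambda, which assumes Java 8 or later.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class FileChecks {

    // Returns the names of the ".txt" files in a directory, or an empty
    // array if the File does not exist or is not a directory.
    public static String[] textFileNames(File dir) {
        if (!dir.exists() || !dir.isDirectory()) {
            return new String[0];
        }
        FilenameFilter onlyTxt =
            (parent, name) -> name.toLowerCase().endsWith(".txt");
        File[] found = dir.listFiles(onlyTxt); // null if the listing fails
        if (found == null) {
            return new String[0];
        }
        String[] names = new String[found.length];
        for (int i = 0; i < found.length; i++) {
            names[i] = found[i].getName();
        }
        return names;
    }

    // Creates a temporary ".txt" file and checks that it exists, can be
    // read, and appears in the filtered listing of its directory.
    public static boolean demo() {
        try {
            File temp = File.createTempFile("lecture7", ".txt");
            temp.deleteOnExit();
            String[] names = textFileNames(temp.getParentFile());
            boolean listed = Arrays.asList(names).contains(temp.getName());
            return temp.exists() && temp.canRead() && listed;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println("Temp file found in listing: " + demo());
    }
}
```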
Files
File types
As we’ll see, the type of the file has a big effect on how we
handle it.
Binary vs. Text files
All files are really just binary 0 and 1 bits.
In ‘binary’ files, data is stored in binary representations of the
primitive types:
8 bits = 1 byte
00000000 00000000 00000000 00000000 = int 0
00000000 00000000 00000000 00000001 = int 1
00000000 00000000 00000000 00000010 = int 2
00000000 00000000 00000000 00000100 = int 4
00000000 00000000 00000000 00110001 = int 49
00000000 00000000 00000000 01000001 = int 65
00000000 00000000 00000000 11111111 = int 255
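The bit patterns above can be reproduced from within Java. This is a small sketch, with the class name BitPatterns and the method bits chosen just for this example; it relies on Integer.toBinaryString, which drops leading zeros, so the code pads the result back to 32 bits.

```java
public class BitPatterns {

    // Returns the 32 bits of an int as four space-separated bytes.
    public static String bits(int value) {
        String raw = Integer.toBinaryString(value); // no leading zeros
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < 32; i++) {
            padded.append('0');
        }
        padded.append(raw);

        // Insert a space after every 8 bits (1 byte).
        StringBuilder grouped = new StringBuilder();
        for (int i = 0; i < 32; i++) {
            if (i > 0 && i % 8 == 0) {
                grouped.append(' ');
            }
            grouped.append(padded.charAt(i));
        }
        return grouped.toString();
    }

    public static void main(String[] args) {
        int[] examples = {0, 1, 2, 4, 49, 65, 255};
        for (int value : examples) {
            System.out.println(bits(value) + " = int " + value);
        }
    }
}
```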
Binary vs. Text files
In text files, which can be read in Notepad++ etc., characters
are stored in smaller 2-byte areas by code number:
00000000 01000001 = code 65 = char “A”
00000000 01100001 = code 97 = char “a”
Characters
All chars are part of a set of 16 bit international characters
called Unicode.
These extend the American Standard Code for Information
Interchange (ASCII), whose characters are represented by the ints 0 to 127,
and its superset, the 8 bit ISO-Latin 1 character set (0 to 255).
There are some invisible characters used for things like the end
of lines.
char back = 8; // Try 7, as well!
System.out.println("hello" + back + "world");
The easiest way to use stuff like newline characters is to use
escape characters.
System.out.println("hello\nworld");
Binary vs. Text files
Note that:
00000000 00110001 = code 49 = char “1”
This seems much smaller: it only uses 2 bytes to store the character “1”, whereas
storing the int 1 takes 4 bytes.
However, each character takes this much space, so:
00000000 00110001
= code 49 = char “1”
00000000 00110001 00000000 00110010
= code 49, 50 = char “1” “2”
00000000 00110001 00000000 00110010
00000000 00110111
= code 49, 50, 55 = char “1” “2” “7”
Whereas:
00000000 00000000 00000000 01111111
= int 127
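The comparison can be checked with a few lines of code. This is only a sketch: the class name StorageSize and its methods are invented here, and the text size assumes 2 bytes per character (UTF-16 without a byte-order mark) to match the slides. Other encodings will give different text sizes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class StorageSize {

    // Bytes used when the int is stored in its binary form.
    public static int binarySize(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array().length;
    }

    // Bytes used when the int is written out as text, assuming
    // 2 bytes per character as on the slides.
    public static int textSize(int value) {
        String asText = String.valueOf(value);
        return asText.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        int[] examples = {1, 12, 127, 1000000};
        for (int value : examples) {
            System.out.println(value + ": binary " + binarySize(value)
                + " bytes, text " + textSize(value) + " bytes");
        }
    }
}
```

With a one-byte-per-character encoding such as UTF-8 the text sizes for digits would halve, but the pattern is the same: the binary int stays at 4 bytes while the text grows with every extra digit.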
Binary vs. Text files
In short, it is much more efficient to store anything with a
lot of numbers as binary (not text).
However, as disk space is cheap, networks are fast, and it is
useful to be able to read data in Notepad etc., people are
increasingly using text formats like XML.
As we’ll see, the filetype determines how we deal with files.
Review
File f = new File("e:/myFile.txt");
Three methods of getting file locations:
Hardwiring
FileDialog
Class getResource()
Need to decide the kind of file we want to deal with.
This lecture
Files
Text files
Binary files
Input and Output (I/O)
So, how do we deal with files (and other types of I/O)?
In Java we use address-encapsulating objects, and input and
output “Streams”.
Streams are objects which represent the external resources
that we can read from or write to. We don’t need to worry
about “how”.
Input Streams are used to get stuff into the program. Output
streams are used to output from the program.
Streams
Streams are based on four abstract classes…
java.io.Reader and Writer
Work on character streams – that is, treat everything like
it’s going to be a character.
java.io.InputStream and OutputStream
Work on byte streams – that is, treat everything like it’s
binary data.
Character based streams
Two abstract superclasses – Reader and Writer.
These are used for a variety of character streams.
Most important are:
FileReader: for reading files.
FileWriter: for writing files.
Example
File f = new File("myFile.txt");
FileReader fr = null;
try {
    fr = new FileReader(f);
} catch (FileNotFoundException fnfe) {
    fnfe.printStackTrace();
}
try {
    // Read one character out of the file. read() returns an int
    // (-1 at the end of the file), so cast it to a char.
    char char1 = (char) fr.read();
    // Close the connection to the file so others can use it.
    fr.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
Example
File f = new File("myFile.txt");
FileWriter fw = null;
try {
    // Note this boolean is optional and sets whether to append to
    // the file (true) or overwrite it (false). Default is overwrite.
    fw = new FileWriter(f, true);
} catch (IOException ioe) {
    ioe.printStackTrace();
}
try {
    fw.write("A");
    // Make sure everything in the stream is written out.
    fw.flush();
    fw.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
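The reading and writing examples can be combined into one round trip. The sketch below is not from the slides: the class name TextRoundTrip is invented, a temporary file stands in for myFile.txt, and it uses try-with-resources (Java 7 and later) in place of the explicit close() calls so that the streams are closed even if an exception is thrown.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TextRoundTrip {

    // Writes text to a temporary file, then reads it back one character
    // at a time until read() reports the end of the file with -1.
    public static String writeThenRead(String text) {
        try {
            File f = File.createTempFile("lecture7", ".txt");
            f.deleteOnExit();

            // false = overwrite rather than append.
            try (FileWriter fw = new FileWriter(f, false)) {
                fw.write(text);
            } // the writer is flushed and closed here

            StringBuilder result = new StringBuilder();
            try (FileReader fr = new FileReader(f)) {
                int code;
                while ((code = fr.read()) != -1) {
                    result.append((char) code);
                }
            }
            return result.toString();
        } catch (IOException ioe) {
            // Rethrown unchecked only to keep the sketch short.
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead("hello world"));
    }
}
```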
Buffers
Plainly it is a pain to read a character at a time.
It is also possible that the filesystem may be slow or
intermittent, which causes issues.
It is common to wrap streams in buffer streams to cope with
these two issues.
BufferedReader br = new BufferedReader(fr);
BufferedWriter bw = new BufferedWriter(fw);
Example
BufferedReader br = new BufferedReader(fr);
// Remember fr is a FileReader not a File.
int lines = -1;
String textIn = " ";
String[] file = null;
try {
    // Run through the file once to count the lines and make a
    // String array the right size.
    while (textIn != null) {
        textIn = br.readLine();
        lines++;
    }
    file = new String[lines];
    // Close the buffer here and remake both FileReader and
    // buffer to set it back to the file start (f is the File
    // the original FileReader was built from).
    br.close();
    br = new BufferedReader(new FileReader(f));
    // Go back to the start of the file and read it into the array.
    for (int i = 0; i < lines; i++) {
        file[i] = br.readLine();
    }
    br.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
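If reading the file twice is unwelcome, one alternative is to collect the lines in a list that grows as needed and convert it to an array at the end. This is a sketch of that swapped-in approach, not the method shown on the slide; the class name ReadLines is invented, and a StringReader stands in for the FileReader so the example runs without a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {

    // Reads every line from the source in a single pass.
    public static String[] readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() returns null once the end is reached.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return lines.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // For a real file, pass new FileReader(f) instead.
        String[] file = readLines(new StringReader("first\nsecond\nthird"));
        System.out.println(file.length + " lines read");
    }
}
```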
Example
String[][] strData = getStringArray();
BufferedWriter bw = new BufferedWriter (fw);
// Remember fw is a FileWriter not a File.
try{
for (int i = 0; i < strData.length; i++) {
for (int j = 0; j < strData[i].length; j++) {
bw.write(strData[i][j] + ", ");
}
bw.newLine();
}
bw.close();
} catch (IOException ioe) {}
Processing data
This is fine for text, but what if we want values and we have
text representations of the values?
There is a difference between 0.5 and “0.5”.
The computer understands the first as a number, but not the
second.
First, parse (split and process) the file to get each individual
String representing the numbers.
Second, turn the text in the file into real numbers.
java.util.StringTokenizer
String line = "Call me Dave";
StringTokenizer st = new StringTokenizer(line);
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
Call
me
Dave
Default separators: space, tab, newline, carriage-return
character, and form-feed.
Processing data
There are wrapper classes for each primitive that will do the
conversion:
double d = Double.parseDouble("0.5");
int i = Integer.parseInt("1");
boolean b = Boolean.parseBoolean("true");
On the other hand, for writing, String can convert most things
to itself:
String str = String.valueOf(0.5);
String str = String.valueOf(data[i][j]);
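Real data files often contain entries that are not numbers, and Double.parseDouble throws a NumberFormatException when it meets one. The sketch below shows one possible way of coping; the class name SafeParse, the method parseOrDefault and the idea of a fallback value are assumptions made for this example.

```java
public class SafeParse {

    // Returns the number held in the text, or the fallback value if the
    // text is missing or is not a valid double (for example "n/a").
    public static double parseOrDefault(String text, double fallback) {
        if (text == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(text.trim());
        } catch (NumberFormatException nfe) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("0.5", -1.0));
        System.out.println(parseOrDefault("n/a", -1.0));
    }
}
```

Whether a fallback value is appropriate depends on the data; for some analyses it would be better to report the bad entry and stop.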
Example
for (int i = 0; i < lines; i++) {
    file[i] = br.readLine();
}
br.close();
double[][] data = new double[lines][];
for (int i = 0; i < lines; i++) {
    // Comma or space separated data.
    StringTokenizer st = new StringTokenizer(file[i], ", ");
    data[i] = new double[st.countTokens()];
    int j = 0;
    while (st.hasMoreTokens()) {
        data[i][j] = Double.parseDouble(st.nextToken());
        j++;
    }
}
Example
double[][] dataIn = getdata();
BufferedWriter bw = new BufferedWriter(fw);
String tempStr = "";
try {
    for (int i = 0; i < dataIn.length; i++) {
        for (int j = 0; j < dataIn[i].length; j++) {
            // Converts the double to a String.
            tempStr = String.valueOf(dataIn[i][j]);
            bw.write(tempStr + ", ");
        }
        bw.newLine();
    }
    bw.close();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
java.util.Scanner
Wraps around all this to make reading easy:
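Scanner s = null;
try {
    s = new Scanner(
        new BufferedReader(
            new FileReader("myText.txt")));
    while (s.hasNext()) {
        System.out.println(s.next());
    }
} catch (FileNotFoundException fnfe) {
    fnfe.printStackTrace();
} finally {
    // Close the Scanner whether or not reading succeeded.
    if (s != null) {
        s.close();
    }
}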
However, no token counter, so not great for reading into arrays.
Scanners
By default looks for spaces to tokenise on.
Can set up a regular expression to look for.
Comma followed by optional space:
s.useDelimiter(",\\s*");
Data conversion
s.next() / s.hasNext()           : String
nextBoolean() / hasNextBoolean() : boolean
nextDouble() / hasNextDouble()   : double
nextInt() / hasNextInt()         : int
nextLine() / hasNextLine()       : String
If the type doesn’t match, throws
InputMismatchException.
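The delimiter and the hasNext checks can be combined so that values of the wrong type are skipped rather than causing an exception. This is a minimal sketch with an invented class name, ScanNumbers. It scans a single line held in a String, and it sets the Scanner's locale to Locale.US as an assumption that the data uses "." as the decimal point, because Scanner otherwise follows the machine's default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScanNumbers {

    // Pulls the doubles out of one line of comma separated text,
    // skipping any token that is not a number.
    public static List<Double> scan(String line) {
        List<Double> values = new ArrayList<>();
        Scanner s = new Scanner(line);
        s.useDelimiter(",\\s*");   // comma followed by optional space
        s.useLocale(Locale.US);    // assume "." is the decimal point
        while (s.hasNext()) {
            if (s.hasNextDouble()) {
                values.add(s.nextDouble());
            } else {
                s.next();          // not a number: read past it
            }
        }
        s.close();
        return values;
    }

    public static void main(String[] args) {
        System.out.println(scan("1.5, 2, oops, 3.25"));
    }
}
```

Collecting into a list also gets around the lack of a token counter, since the list grows as values arrive.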
Reading from keyboard
Scanner s = new Scanner(System.in);
int i = s.nextInt();
String str = s.nextLine();
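One thing to watch when mixing these calls: nextInt() reads the number but leaves the end of that line unread, so a nextLine() straight afterwards returns whatever is left on the number's line (usually an empty String). The sketch below shows one common way round this. The class name KeyboardInput is invented, and a String stands in for the keyboard so the example runs unattended.

```java
import java.util.Scanner;

public class KeyboardInput {

    // Reads a whole number, then the next full line of text.
    public static String readNumberThenLine(Scanner s) {
        int number = s.nextInt();
        s.nextLine();               // discard the rest of the number's line
        String text = s.nextLine(); // now read the line we actually want
        return number + ":" + text;
    }

    public static void main(String[] args) {
        // Swap in new Scanner(System.in) to read from the real keyboard.
        Scanner s = new Scanner("42\nhello world\n");
        System.out.println(readNumberThenLine(s));
        s.close();
    }
}
```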
Parsing Strings
Usually with text we want to extract useful information.
Search and replace.
String searches
startsWith(String prefix), endsWith(String suffix)
Returns a boolean.
indexOf(int ch), indexOf(int ch, int fromIndex)
Returns an int representing the position of the first instance of
a given Unicode character (passed as its integer code), or -1 if it is not found.
indexOf(String str),
indexOf(String str, int fromIndex)
Returns an int representing the position of the first instance of a
given String to find.
lastIndexOf
Same as indexOf, but last rather than first.
String manipulation
replace(char oldChar, char newChar)
Replaces one character with another.
substring(int beginIndex, int endIndex)
substring(int beginIndex)
Pulls out part of the String and returns it.
toLowerCase(), toUpperCase()
Changes the case of the String.
trim()
Cuts white space off the front and back of a String.
Example
String str = "old pond; frog leaping; splash";
int start = str.indexOf("leaping");
int end = str.indexOf(";", start);
String startStr = str.substring(0, start);
String endStr = str.substring(end);
str = startStr + "jumping" + endStr;
str is now “old pond; frog jumping; splash”
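The same search-and-replace steps can be wrapped in a reusable method that also copes with the search text being absent, in which case indexOf returns -1. This is a sketch only; the class name WordSwap and the method swapFirst are invented for the illustration.

```java
public class WordSwap {

    // Replaces the first occurrence of oldWord with newWord, or returns
    // the text unchanged if oldWord does not appear in it.
    public static String swapFirst(String text, String oldWord, String newWord) {
        int start = text.indexOf(oldWord);
        if (start < 0) {
            return text; // indexOf returns -1 when nothing is found
        }
        int end = start + oldWord.length();
        return text.substring(0, start) + newWord + text.substring(end);
    }

    public static void main(String[] args) {
        String str = "old pond; frog leaping; splash";
        System.out.println(swapFirst(str, "leaping", "jumping"));
    }
}
```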
Review
Use a java.util.Scanner where possible.
Otherwise use a FileWriter/Reader.
But remember to buffer both.
This lecture
Files
Text files
Binary files
Byte streams
InputStream
Read methods return -1 at the end of the resource.
FileInputStream(File fileObject)
Allows us to read bytes from a file.
OutputStream
Used to write to resources.
FileOutputStream(File fileObject)
Used to write to a file if the user has permission.
Overwrites old material in file.
FileOutputStream(File fileObject, boolean append)
Only overwrites if append is false.
Example
FileInputStream ourStream = null;
File f = new File("e:/myFile.bin");
try {
ourStream = new FileInputStream(f);
} catch (FileNotFoundException fnfe) {
// Do something.
}
The Stream is then usually used in the following fashion:
int c = 0;
while( (c = ourStream.read()) >= 0 ) {
// Add c to a byte array (more on this shortly).
}
ourStream.close();
Byte streams II
There are cases where we want to write to and from arrays
using streams.
These are usually used as a convenient way of reading and
writing a byte array from other streams and over the network.
ByteArrayInputStream
ByteArrayOutputStream
Example
FileInputStream fin = new FileInputStream(file);
ByteArrayOutputStream baos = new
ByteArrayOutputStream();
int c;
while((c = fin.read()) >= 0) {
baos.write(c);
}
byte[] b = baos.toByteArray();
Saves us having to find out size of byte array as
ByteArrayOutputStream has a toByteArray() method.
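Reading one byte per call works, but InputStream also has a read(byte[]) method that fills an array and returns how many bytes it managed to read. The following sketch uses it to copy a stream in chunks. The class name ReadBytes and the chunk size of 4096 are arbitrary choices for this example, and a ByteArrayInputStream stands in for the FileInputStream so that it runs without a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadBytes {

    // Reads everything from the stream into a byte array, a chunk at a time.
    public static byte[] readAll(InputStream in) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try (InputStream source = in) {
            int count;
            // read(byte[]) returns the number of bytes read, or -1 at the end.
            while ((count = source.read(chunk)) != -1) {
                baos.write(chunk, 0, count);
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = {65, 66, 67};
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length + " bytes read");
    }
}
```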
Buffering streams
As with the FileReader/FileWriter:
BufferedInputStream
BufferedOutputStream
You wrap the classes using the buffer’s constructors.
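As an illustration of wrapping, the sketch below writes ints to a binary file through a buffered stream and reads them back. It goes slightly beyond the slides: DataOutputStream and DataInputStream are standard java.io classes that convert primitives to and from bytes, but they are not covered in this lecture. The class name BinaryRoundTrip is invented, and a temporary file stands in for a real data file.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class BinaryRoundTrip {

    // Writes the ints to a temporary binary file and reads them back.
    public static int[] writeThenRead(int[] values) {
        try {
            File f = File.createTempFile("lecture7", ".bin");
            f.deleteOnExit();

            // Each wrapper is built from the stream inside it.
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f)))) {
                for (int value : values) {
                    out.writeInt(value); // always 4 bytes per int
                }
            }

            int[] result = new int[(int) (f.length() / 4)];
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(f)))) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = in.readInt();
                }
            }
            return result;
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        int[] back = writeThenRead(new int[] {0, 1, 255, -7});
        System.out.println(back.length + " ints read back");
    }
}
```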
Other byte streams
RandomAccessFile
Used for reading and writing to files when you need to write
into the middle of files as opposed to the end.
PrintStream
Was used in Java 1.0 to write characters, but didn’t do a very
good job of it.
Now largely superseded by PrintWriter for writing characters, with the
exception of System.out, which is a static final object of this type.
Object Streams
Serialization
Given that we can read and write bytes to streams, there’s
nothing to stop us writing objects themselves from the
memory to a stream.
This lets us transmit objects across the network and save the
state of objects in files.
This is known as object serialization. More details at:
http://www.tutorialspoint.com/java/java_serialization.htm
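A very small sketch of the idea: an object is written to a byte array with an ObjectOutputStream and rebuilt from those bytes with an ObjectInputStream. The class name SerializeDemo is invented, and the byte array stands in for a file or network connection. Only objects whose classes implement java.io.Serializable (such as Strings and arrays of primitives) can be written this way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {

    // Writes the object out as bytes, then rebuilds a copy from them.
    public static Object roundTrip(Serializable original) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
                out.writeObject(original);
            }
            byte[] bytes = baos.toByteArray();
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        double[] data = {0.5, 1.0, 1.5};
        double[] copy = (double[]) roundTrip(data);
        System.out.println("Copy has " + copy.length + " values");
    }
}
```

To save to a file instead, the ByteArrayOutputStream would be replaced with a FileOutputStream.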
Summary
We can represent and explore the files on a machine with
the File class.
To save us having to understand how external info is
produced, Java uses streams.
We can read and write bytes to files or arrays.
We can store or send objects using streams.
We can read and write characters to files or arrays.
We should always try to use buffers around our streams
to ensure efficient and reliable access.