Characters and Strings

24-Jul-16
Characters


In Java, a char is a primitive type that can hold one
single character
A character can be:




A letter or digit
A punctuation mark
A space, tab, newline, or other whitespace
A control character

Control characters are holdovers from the days of teletypes—they are
things like backspace, bell, end of transmission, etc.
char literals

A char literal is written between single quotes (also
known as apostrophes):
'a' 'A' '5' '?' ' '
Some characters cannot be typed directly and must be
written as an “escape sequence”:
Tab is '\t'
Newline is '\n'
Some characters must be escaped to prevent ambiguity:


Single quote is '\'' (quote-backslash-quote-quote)
Backslash is '\\'
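The literals above can be tried out in a short program. This is a minimal sketch: the class name CharLiteralDemo and the helper codeOf are made up for illustration, and the printed numbers are the characters' numeric codes.

```java
public class CharLiteralDemo {
    // Hypothetical helper: returns the numeric code stored in a char.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char letter = 'a';
        char digit = '5';
        char space = ' ';
        char tab = '\t';        // escape sequence for tab
        char newline = '\n';    // escape sequence for newline
        char quote = '\'';      // single quote must be escaped
        char backslash = '\\';  // backslash must be escaped

        System.out.println(codeOf(letter));     // 97
        System.out.println(codeOf(digit));      // 53
        System.out.println(codeOf(space));      // 32
        System.out.println(codeOf(tab));        // 9
        System.out.println(codeOf(newline));    // 10
        System.out.println(codeOf(quote));      // 39
        System.out.println(codeOf(backslash));  // 92
    }
}
```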
Additional character literals
\n   newline
\t   tab
\b   backspace
\r   return
\f   form feed
\\   backslash
\'   single quote
\"   double quote
Character encodings





A character is represented as a pattern of bits
The number of characters that can be represented
depends on the number of bits used
For a long time, ASCII (American Standard Code for
Information Interchange) has been used
ASCII is a seven-bit code (allows 128 characters)
ASCII is barely enough for English

Omits many useful characters:
¢½ç“”
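As a rough illustration of the bit-counting argument above, the sketch below computes how many patterns a given number of bits allows and checks whether a character falls in the seven-bit ASCII range. The class AsciiDemo and its two helpers are hypothetical names chosen for this example.

```java
public class AsciiDemo {
    // Number of distinct bit patterns available with the given number of bits
    // (valid for bit counts below 31).
    static int patterns(int bits) {
        return 1 << bits;
    }

    // True if the character fits in seven-bit ASCII (codes 0 through 127).
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        System.out.println(patterns(7));   // 128
        System.out.println((int) 'A');     // 65

        char cedilla = (char) 231;         // the letter c with a cedilla
        System.out.println(isAscii('A'));      // true
        System.out.println(isAscii(cedilla));  // false
    }
}
```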
Unicode

Unicode is a new standard for character encoding that is
designed to replace ASCII
“Unicode provides a unique number for every character,
no matter what the platform, no matter what the
program, no matter what the language.”

Java uses a subset of Unicode to represent characters


The Java subset uses two bytes for every character


Java 1.5 expands this by supporting supplementary characters, each stored as a pair of two-byte char values
Except for having these extra characters available, it seldom
makes any difference to how you program
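One place where the extra characters do show up is in length calculations. The sketch below assumes the character U+1F600 as an example of a character outside the two-byte range; the class name SupplementaryDemo is made up. Such a character occupies two char values, so length() and codePointCount() disagree.

```java
public class SupplementaryDemo {
    // U+1F600 written as a pair of char values (a surrogate pair).
    static final String GRINNING = "\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts char values, not characters.
        System.out.println(GRINNING.length());                              // 2
        // codePointCount() counts whole Unicode characters.
        System.out.println(GRINNING.codePointCount(0, GRINNING.length()));  // 1
        // The single code point, printed in hexadecimal.
        System.out.println(Integer.toHexString(GRINNING.codePointAt(0)));   // 1f600
        // For ordinary two-byte characters the two counts agree.
        System.out.println("A".length());                                   // 1
    }
}
```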
Unicode character literals




Characters with codes from 0 to 255 (ASCII and beyond) can
be written as octal numbers from \0 to \377
Any Unicode character (in the Java subset) can be
written as a hexadecimal number between \u0000
and \uFFFF
Since there are over 64000 possible Unicode
characters, the list occupies an entire book
 This makes it hard to look up characters
Unicode “letters” in any alphabet can be used
in identifiers
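A small sketch of both notations follows. The class and constant names are invented for the example; the cent sign (code 162) is used because it is one of the characters ASCII omits.

```java
public class UnicodeEscapeDemo {
    static final char OCTAL_A = '\101';     // octal 101 is decimal 65
    static final char HEX_A = '\u0041';     // hexadecimal 41 is decimal 65
    static final char OCTAL_CENT = '\242';  // octal 242 is decimal 162
    static final char HEX_CENT = '\u00A2';  // hexadecimal A2 is decimal 162

    public static void main(String[] args) {
        System.out.println(OCTAL_A == 'A');          // true
        System.out.println(HEX_A == 'A');            // true
        System.out.println(OCTAL_CENT == HEX_CENT);  // true
        System.out.println((int) HEX_CENT);          // 162
    }
}
```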
Glyphs and fonts


A glyph is the printed representation of a character
For example, the letter ‘A’ can be represented by any of
the glyphs
A A A A A


A font is a collection of glyphs
Unicode describes characters, not glyphs
Strings




A String is a kind of object, and obeys all the rules for
objects
In addition, there is extra syntax for string literals and
string concatenation
A string is made up of zero or more characters
The string containing zero characters is called the
empty string
String literals



A string literal consists of zero or more characters
enclosed in double quotes
"" "Hello" "This is a String literal."
To put a double quote character inside a string, it
must be backslashed:
"\"Wait,\" he said, \"Don't go!\""
Inside a string, a single quote character does not
need to be backslashed (but it can be)
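The quoting rules above can be checked with a short program. This is only a sketch; the class name StringLiteralDemo and its constants are made up for illustration.

```java
public class StringLiteralDemo {
    static final String EMPTY = "";
    static final String QUOTED = "\"Wait,\" he said, \"Don't go!\"";
    static final String ESCAPED_APOSTROPHE = "Don\'t";  // the backslash is optional here

    public static void main(String[] args) {
        System.out.println(EMPTY.length());  // 0
        System.out.println(QUOTED);          // "Wait," he said, "Don't go!"
        // Escaping the single quote inside a string changes nothing.
        System.out.println(ESCAPED_APOSTROPHE.equals("Don't"));  // true
    }
}
```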
String concatenation



Strings can be concatenated (put together) with
the + operator
"Hello, " + name + "!"
Anything “added” to a String is converted to a string
and concatenated
Concatenation is done left to right:
"abc" + 3 + 5 gives "abc35"
3 + 5 + "abc" gives "8abc"
3 + (5 + "abc") gives "35abc"
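The three cases above can be run directly. In this sketch the class ConcatDemo and the helper greet are hypothetical names; the expressions themselves are the ones from the slide.

```java
public class ConcatDemo {
    // Hypothetical helper wrapping the greeting example.
    static String greet(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(greet("World"));      // Hello, World!
        // Left to right: "abc" + 3 is "abc3", then + 5 is "abc35".
        System.out.println("abc" + 3 + 5);       // abc35
        // Left to right: 3 + 5 is the int 8, then + "abc" is "8abc".
        System.out.println(3 + 5 + "abc");       // 8abc
        // Parentheses force 5 + "abc" first, giving "5abc", then "35abc".
        System.out.println(3 + (5 + "abc"));     // 35abc
    }
}
```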
Newlines




The character '\n' represents a “newline” (actually, it’s an
LF, the linefeed character)
When “printing” to the screen, you can go to a new line by
printing a newline character
You can also go to a new line by using System.out.println
with no argument or with one argument
When writing to the internet (for example, to a protocol such as HTTP that
expects CR-LF line endings), you should use "\r\n" instead
of println because println is platform-specific




On UNIX, println uses LF for a newline
On older Macintosh systems (before OS X), println uses CR instead of LF for a newline; current macOS uses LF
On Windows, println uses CR-LF for a newline
When you use the character constants, you are in control of what is
actually output
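A minimal sketch of taking control of line endings: the helper withCrLf (a made-up name) always ends each line with CR-LF, whatever platform the program runs on, while System.lineSeparator() reports what println would use on the current platform.

```java
public class NewlineDemo {
    // Hypothetical helper: ends every line with an explicit CR-LF,
    // independent of the platform's own line separator.
    static String withCrLf(String... lines) {
        return String.join("\r\n", lines) + "\r\n";
    }

    public static void main(String[] args) {
        // The separator println uses here; its length depends on the platform.
        String platformSeparator = System.lineSeparator();
        System.out.println("Separator length: " + platformSeparator.length());

        // print (not println) so that no platform-specific separator is added.
        System.out.print(withCrLf("first line", "second line"));
    }
}
```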
System.out.print and println





System.out.println can be called with no arguments
(parameters), or with one argument
System.out.print is called with one argument
The argument may be any of the 8 primitive types
The argument may be any object
Java can print any object, but it doesn’t always do a
good job


Java does a good job printing Strings
Java typically does a poor job printing types you define
Printing your objects

In any class, you can define the following instance method:
public String toString() { ... }



This method can return any string you choose
If you have an instance x, you can get its string
representation by calling x.toString()
If you define your toString() method exactly as above, it
will be used whenever your object is converted to a String


This happens during concatenation:
"My object is " + myObject
toString() is also used by System.out.print and
System.out.println
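As an illustration, here is a sketch of a small class that defines toString(). The class Point and its fields are assumptions made for this example, not part of the original slides.

```java
public class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Declared exactly as public String toString(), so Java uses it
    // whenever a Point is converted to a String.
    @Override
    public String toString() {
        return "(" + x + ", " + y + ")";
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println(p);                    // (3, 4)
        System.out.println("My object is " + p);  // My object is (3, 4)
    }
}
```

Without the toString() method, the same println would fall back on the version inherited from Object, which prints the class name followed by a hash code.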
Constructing a String

You can construct a string by writing it as a literal:
"This is special syntax to construct a String."

Since a string is an object, you could construct it with
new:
new String("This also constructs a String.")

But using new for constructing a string is foolish,
because you have to write the string as a literal to pass it
in to the constructor

You’re doing the same work twice!
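The two ways of constructing a string can be compared side by side. In this sketch (ConstructDemo and its helpers are made-up names), both strings have the same contents, but new always creates a separate object.

```java
public class ConstructDemo {
    static String fromLiteral() {
        return "hello";
    }

    static String fromConstructor() {
        // The literal is written anyway, then copied into a second object.
        return new String("hello");
    }

    public static void main(String[] args) {
        String literal = fromLiteral();
        String constructed = fromConstructor();

        System.out.println(literal.equals(constructed));  // true: same characters
        System.out.println(literal == constructed);       // false: different objects
    }
}
```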
String methods


This is only a sampling of string methods
All are called as: myString.method(params)



length() -- the number of characters in the String
charAt(index) -- the character at (integer) position index,
where index is between 0 and length-1
equals(anotherString) -- equality test (because == compares
object references rather than string contents)


Hint: Use "expected".equals(actual) rather than
actual.equals("expected") to avoid NullPointerExceptions
Don’t learn all 48 String methods unless you use
them a lot—instead, learn to use the API!
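The three methods listed above, and the hint about the order of equals, can be sketched as follows. The class StringMethodsDemo and the helper isExpected are hypothetical names.

```java
public class StringMethodsDemo {
    // Calling equals on the literal is safe even when actual is null.
    static boolean isExpected(String actual) {
        return "expected".equals(actual);
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(s.length());                // 5
        System.out.println(s.charAt(0));               // H
        System.out.println(s.charAt(s.length() - 1));  // o

        System.out.println(isExpected("expected"));    // true
        System.out.println(isExpected(null));          // false, and no exception

        // The reverse order would fail for a null reference:
        // String actual = null;
        // actual.equals("expected");  // throws NullPointerException
    }
}
```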
Vocabulary







escape sequence -- a code sequence for a character,
beginning with a backslash
ASCII -- a 7-bit standard for encoding characters
Unicode -- a standard that gives every character a unique number (a Java char holds 16 bits of it)
glyph -- the printed representation of a character
font -- a collection of glyphs
empty string -- a string containing no characters
concatenate -- to join strings together
The End