Data Types and Data Structures

Ranga Rodrigo
Does the computer
know about data types?
No.
Data Types
Computer programs manipulate data of various
types, such as:
numbers, both integral and floating point,
characters, based on the ASCII code,
boolean values, and
compound structures such as arrays and records.
In memory, however,
all data is held as bit patterns
which must be interpreted
before the data can be processed.
This is clearly a source of insecurity:
if a bit pattern is interpreted wrongly,
the program may crash
or produce erroneous results.
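As a small illustration of the point above, the following Java sketch reads one 32-bit pattern in two ways. The class name BitPatternDemo and the particular bit pattern are chosen here purely for illustration; Float.intBitsToFloat is the standard library call that reinterprets an int's bits as an IEEE 754 float.

```java
public class BitPatternDemo {
    // One fixed 32-bit pattern, chosen only for illustration.
    static final int BITS = 0x42280000;

    // Interpret the pattern as a two's-complement integer.
    static int asInt() {
        return BITS;
    }

    // Interpret the very same bits as an IEEE 754 single-precision float.
    static float asFloat() {
        return Float.intBitsToFloat(BITS);
    }

    public static void main(String[] args) {
        System.out.println(asInt());   // prints 1109917696
        System.out.println(asFloat()); // prints 42.0
    }
}
```

The bits never change; only the interpretation does, which is why a wrong interpretation silently yields a wrong value.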
Type Errors
Type errors arise when an operation defined for
one type of data is applied to another.
E.g., if you try to add an array to a string.
In general, these errors are detected at run-time
if and when the run-time system tries to execute
the erroneous operation.
Untyped Languages
E.g., Perl.
It is the programmer's responsibility to avoid
run-time type errors.
Any variable can store data of any type, and it
is up to the programmer to make sure that
operations are only applied to data of the
correct type.
Interpreted languages are often untyped.
Untyped Languages
Variables do not have a type.
Programmers have to keep track of what is
stored where.
Errors may only be caught at run-time, when it
may be too late.
Worse, data corruption may take place
unnoticed.
Typed Languages
Typed languages try to use the compiler to
detect type errors.
This is to ensure that programs will not crash at
run-time.
This is widely seen as a crucial aspect of
language security.
Typed Languages
Variables and similar entities have defined
types.
Each type has a range of permissible
operations defined for it.
The compiler can therefore ensure that
operations are only applied to data of the
correct sort.
What is a type-secure
language?
A type-secure language is one which would in no
circumstances give rise to a run-time error related to
types.
It is hard to guarantee this property.
Languages
Untyped: Perl
Typed
    Strongly typed: Pascal, Eiffel
    Weakly typed: C
Weak and Strong Typing
The distinction relates to the extent to which the
compiler will silently convert data of one type to
another related type.
E.g., converting numeric types, or integers to
addresses in C.
Weak typing has the potential to let through
more errors than strong typing.
Type Conversion and Casting
Sometimes it is convenient or necessary to
convert data at run time from one type to
another.
A common example is given by calculations
which need to mix integer and floating point
data.
Here, some languages, including C++, Java
and Eiffel, will carry out some data conversion
automatically, e.g., changing an integer data
item into the corresponding floating point value.
Conversions in Eiffel
Some conversions might involve a loss of
information: for example, converting a floating
point value like 3.14159 into an integer.
C++ allows such conversions, whereas Java
and Eiffel compilers report an error in this case.
If the programmer wants the conversion to go
ahead, an explicit function call can be inserted
in Eiffel to specify exactly what conversion is
required.
In C++ and Java
Casting can be used.
int main()
{
    float f = 3.14159f ;
    int i ;
    i = (int) f ;   // explicit cast: the fraction is discarded, i is 3
    return 0 ;
}
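The same idea can be sketched in Java, where the narrowing conversion must be written as a cast while the widening conversion happens automatically. The class and method names below are hypothetical, chosen only for this sketch.

```java
public class CastDemo {
    // Narrowing conversion: requires an explicit cast and
    // discards the fractional part (truncation toward zero).
    static int truncate(double value) {
        return (int) value;
    }

    // Widening conversion: no cast is needed, the compiler
    // converts the int to the corresponding double.
    static double widen(int value) {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(truncate(3.14159)); // prints 3
        System.out.println(truncate(-2.7));    // prints -2
        System.out.println(widen(3));          // prints 3.0
    }
}
```

Removing the (int) cast in truncate makes the Java compiler reject the method, which is the compile-time error mentioned earlier.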
Dangers of Casting
There are dangers associated with the
unrestricted use of casts.
It provides a means for a programmer to
override the type checks implemented by the
compiler.
Casting is a common source of programming
errors.
For this reason, languages intended to be
secure, like Eiffel, do not support casting.
Types
Value types: The actual data of interest (an integer
or a boolean value, say) will be stored in the
allocated memory.
Reference types: A memory address will be stored.
The actual data will be placed at the memory
location pointed to by the reference.
Reference Types
Reference types allow the same data to be
referred to at different points in a program, and
complex data structures to be constructed.
Reference types allow data structures to be
passed as parameters efficiently: instead of
copying the whole structure, a reference is
passed.
Value Types
With value types, there is no danger of
accidental corruption of data through sharing.
Value types avoid the overhead of having to dereference an address before getting hold of the
data to be processed.
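The difference between copying a value and sharing a reference can be sketched in Java, using an int as the value type and an array as the reference type. The class and method names are hypothetical.

```java
public class SharingDemo {
    // Value type: assignment copies the value itself.
    static int valueCopy() {
        int a = 1;
        int b = a;  // b holds its own copy
        b = 99;     // changing b leaves a untouched
        return a;
    }

    // Reference type: assignment copies only the reference.
    static int sharedReference() {
        int[] a = {1, 2, 3};
        int[] b = a; // a and b now refer to the same array
        b[0] = 99;   // the change is visible through a as well
        return a[0];
    }

    public static void main(String[] args) {
        System.out.println(valueCopy());       // prints 1
        System.out.println(sharedReference()); // prints 99
    }
}
```

The second method shows the accidental corruption through sharing that value types rule out.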
Types in Java
All classes in Java define reference types: this
means that if Person is a class, then following
a variable declaration such as
Person p ;
the variable p will only be able to hold
references.
Java also defines a range of primitive value
types (e.g., int, char, boolean etc.) which allow
simple data to be manipulated efficiently.
Primitive Types in Java
The values of the primitive types lie outside the
Java class hierarchy.
So in one sense Java is not a pure object-oriented language.
To get round this, Java defines classes that
correspond to the built-in types, such as Integer
and Boolean.
Autoboxing and Unboxing
It can sometimes be rather clumsy and
confusing to convert data between value and
the corresponding reference types.
To deal with this problem, later versions of Java
have introduced autoboxing and unboxing: in
effect, built-in conversions between the primitive
types and the corresponding reference types.
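A minimal Java sketch of autoboxing and unboxing follows. The class name BoxingDemo and its methods are hypothetical; Integer, List and ArrayList are standard library types.

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    // Collections hold references, so each int is boxed into an Integer on the way in.
    static int sum() {
        List<Integer> values = new ArrayList<Integer>();
        values.add(3); // autoboxing: int -> Integer
        values.add(4);
        int total = 0;
        for (int v : values) { // unboxing: Integer -> int
            total += v;
        }
        return total;
    }

    // The explicit conversion that autoboxing performs on the programmer's behalf.
    static Integer explicitBox(int x) {
        return Integer.valueOf(x);
    }

    public static void main(String[] args) {
        System.out.println(sum());                      // prints 7
        System.out.println(explicitBox(7).equals(sum())); // prints true
    }
}
```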
In C++, the distinction between value and
reference types on the one hand, and simple
and class data on the other is orthogonal.
In other words, the two concepts are quite
independent of each other.
It is possible to have references to ints in C++,
and equally for an instance of a class to be
stored as a simple value.
Reference Types in C++
In C++, reference types are defined explicitly.
Person p;         // a value
Person& pr = p;   // a reference (must be bound when declared)
Person* pp = &p;  // a pointer, also requiring dereferencing
This gives great flexibility in the way that
memory is managed, but is also a common
source of programming errors.
By contrast, in Java it is impossible to take the
address of or obtain a reference to an int and
there is no equivalent of the pointer
manipulations possible in C++.
Types in Eiffel
In Eiffel, every type is a class, including basic
types like INTEGER.
There are no special primitive types as in Java,
so in this sense Eiffel is more object-oriented
than Java.
This makes the language very consistent and
conceptually simple.
Types in Eiffel
However, to avoid the inefficiency involved if it
was necessary to dereference addresses to
calculate something like 3 + 4, Eiffel provides a
mechanism of expanded types to enable data to
be stored by value rather than by reference.
In a way this is the opposite of C++:
in C++, all data is stored by value by default, and
operators are provided to define reference types;
in Eiffel, data is stored by reference by default, and
an operator is defined enabling some data to be
stored by value.
Expanded Types
If a class is defined as expanded, variables of
that class hold data values, not references.
The classes defining the basic types are all
defined to be expanded:
expanded class INTEGER
...
Expanded Types
Expanded classes cannot be unexpanded.
So it is not possible to define a variable which
holds a reference to an INTEGER, but for each
expanded class there is a corresponding
reference class defined in the Base Library,
e.g., INTEGER_REF.
So Eiffel can provide a consistent type system,
without a performance handicap on basic types.
For practical purposes, the language works
much as expected; it is rarely necessary to deal
explicitly with expanded types.
Specifying EXPANDED Variables
It is also possible to specify that individual
variables are expanded, in which case they will
hold data values instead of references:
x : expanded COUNTER
In a case like this, the class must provide a
creation procedure with no arguments, so that
the variable can be correctly initialized.
User-Defined Types
Classic languages, like Pascal, defined a range
of basic types and a number of user-defined
types which enabled programmers to define
more complex data structures based on the
basic types.
User-defined types included sets, subtypes,
enumerated types, records and arrays.
In OO languages, the class is the main vehicle
for the definition of user-defined types,
essentially replacing and extending record
types.
Enumerations
This is a user-defined type consisting of a fixed
number of values, normally given names and
thought of as uninterpreted symbols. They are
commonly implemented by assigning a unique
integer value to each symbol. E.g., in C++ and
Java 5.0:
enum Colour {red, yellow, green};
In C++, this defines Colour to be a value type.
In Java 5.0, this is a form of class definition:
attributes and methods can be added, and
enum types include the functionality inherited
from Object.
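As a sketch of what "a form of class definition" means in Java, the enum below adds a method to the Colour type. The method next and the wrapper class EnumDemo are hypothetical additions for illustration; ordinal, values and toString are provided by Java for every enum.

```java
public class EnumDemo {
    enum Colour {
        red, yellow, green;

        // Hypothetical added method: the next colour in
        // declaration order, wrapping round at the end.
        Colour next() {
            Colour[] all = values();
            return all[(ordinal() + 1) % all.length];
        }
    }

    public static void main(String[] args) {
        System.out.println(Colour.red.ordinal());     // prints 0
        System.out.println(Colour.green.next());      // prints red
        System.out.println(Colour.yellow.toString()); // prints yellow
    }
}
```

The ordinal values correspond to the unique integers mentioned above, but the compiler will not let a Colour be used where an int is expected.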
Enumerations in Eiffel
Eiffel does not support the declaration of
enumeration types. A set of constant attributes
to act as an enumeration can be defined:
red : INTEGER is unique ;
yellow : INTEGER is unique ;
green : INTEGER is unique ;
Unique attributes are guaranteed to have
values that are different from that of any other
unique attribute defined in the same class. They
are typically used in inspect statements to
discriminate the various possible cases.
Arrays
Since the beginning of programming, languages
have included arrays to facilitate the handling of
repeated data.
Arrays are characterized by two basic
properties:
The type of data contained in the array.
The size of the array, specified either as a number of
elements, or by giving array bounds, i.e., the lowest
and highest permissible indices.
Arrays in C++
In C/C++, arrays are treated in most contexts as
pointers, i.e., the address of the area in memory
holding the array.
There is no notion of an array type, though the
component type of arrays is given.
Arrays are created by specifying the required
length, but this length cannot then be checked
at run-time: it is the programmer's responsibility
to keep track of the end of an array, e.g., with null-terminated strings.
This is very insecure.
Arrays in Pascal
In Pascal, an array type is defined by the
component type and the bounds, from which
the length can be deduced.
This was found to be too strict: for example, a
sort routine written for arrays of one length won't
type-check for arrays of another length, even
though the algorithm would work unchanged.
Pascal got round this problem by defining a
looser array type for parameters, that specified
only the component type of the array, and
allowing the size of these arrays to be obtained
at run-time.
Arrays in Java
In Java, arrays are quasi-objects, though there
is no array class defined.
In particular, you can find out the length of an
array at run-time by reading its length field, which
looks very like a member of a class.
Arrays in General
Languages have converged on defining array
types simply in terms of the component type,
and letting the size of an array object be
determined at run-time.
In fact, no extra type security is obtained by
including array size in the type, as the compiler
cannot check that array bounds will not be
exceeded at run-time, so run-time errors cannot
be eliminated.
Arrays in Eiffel
In Eiffel, arrays are defined by a class, like all
types.
The syntax is the same as other classes, with
the addition of special notation for array literals:
x : ARRAY[INTEGER]
create x.make(1, 10)
x := << 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 >>  -- assign constant array
x.put(42, 1)                -- put 42 at position 1 in x
io.put_integer(x.item(1))   -- prints 42
io.put_integer(x @ 1)       -- synonym for "item"
C++ Notation in Eiffel Arrays
In later versions of Eiffel, the same notation as
C++ or Java can be used to store data in an
array and to retrieve it:
x[1] := 42
io.put_integer(x[1])
Run-Time Violations of Bounds
Languages respond in different ways to a run-time
violation of an array's bounds.
C and C++ perform no check, so the access goes
ahead with unpredictable results; more secure
languages such as Java and Eiffel will raise a
run-time exception if an out-of-bounds array
access is attempted.
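The Java behaviour can be sketched as follows. The class BoundsDemo and the helper outOfBounds are hypothetical; ArrayIndexOutOfBoundsException is the exception Java raises for an invalid index.

```java
public class BoundsDemo {
    // Returns true if reading data[index] raises Java's bounds exception.
    static boolean outOfBounds(int[] data, int index) {
        try {
            int unused = data[index]; // checked against the bounds at run-time
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] data = new int[10];                 // valid indices are 0..9
        System.out.println(data.length);          // prints 10
        System.out.println(outOfBounds(data, 9));  // prints false
        System.out.println(outOfBounds(data, 10)); // prints true
    }
}
```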
Arrays and Design by Contract
Consider the following partial definition of a
class intended to record details of a football
team's performances during a season.
class
    RESULTS
create
    make
feature
    points : ARRAY[INTEGER]
    played : INTEGER
    total : INTEGER
    make( games : INTEGER ) is
        do
            create points.make(1, games)
        end
    add_result( pts : INTEGER ) is
        do
            played := played + 1
            points.put( pts, played )
            total := total + pts
        end
end
Invariants
The invariant for this class should state, among
other things, that all the values in the array
points should be equal to 0, 1 or 3, the possible
points that a team can be awarded for each
game.
Using conventional boolean expressions, the
only way of specifying this would be to write
something like
(points[1] = 0 or points[1] = 1 or
points[1] = 3) and (points[2] = 0 ...)
and ...
valid_results : BOOLEAN is
    local
        i : INTEGER
    do
        Result := true
        from
            i := 1
        until
            not Result or else i > points.upper
        loop
            Result := Result and (points[i] = 0 or
                points[i] = 1 or points[i] = 3)
            i := i + 1
        end
    end
invariant
    valid_results
for_all and there_exists
A better approach would be to find a way to
mimic the quantifiers of logic, or in other words
to extend the boolean expressions used in
assertions so that it is possible to say things like
"every element of the array is ..." or "at least
one element of the array is ...".
Eiffel provides this kind of facility by defining the
features for_all and there_exists in the ARRAY
class.
for_all and there_exists
Both for_all and there_exists apply a given
Boolean-valued function to every element of an
array.
for_all returns true if every element of the
array satisfies the given function, and
there_exists returns true if at least one does.
The way in which the function is supplied to
for_all and there_exists varies between
different versions of EiffelStudio.
With EiffelStudio 5.6 or Later
valid_result( i : INTEGER ) : BOOLEAN is
    do
        Result := i = 0 or else i = 1 or else i = 3
    end
invariant
    valid_results: points.for_all( agent valid_result(?) )
Here the helper function valid_result tests a
single value, and the loop code is provided by
the for_all feature. The keyword agent
creates a 'function reference', and the '?'
indicates which parameter should be replaced
by each array element.
With EiffelStudio 5.7 or Later
With EiffelStudio 5.7 or later it is possible to use
the agent keyword to define an anonymous
function. This means that the invariant can be
written without defining a separate feature
whose job is simply to check the value of an
array element.
invariant
    valid_results: points.for_all(
        agent (i : INTEGER) : BOOLEAN
            do
                Result := i = 0 or else i = 1 or else i = 3
            end
    )
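For comparison, the same quantifier idea can be sketched in Java, where allMatch and anyMatch on a stream play roughly the roles of for_all and there_exists. This is an analogy rather than a translation of the Eiffel code: it assumes Java 8 or later, and the class QuantifierDemo and its methods are hypothetical names.

```java
import java.util.Arrays;

public class QuantifierDemo {
    // Tests a single value, like the helper valid_result.
    static boolean validResult(int pts) {
        return pts == 0 || pts == 1 || pts == 3;
    }

    // "Every element of the array is a valid result."
    static boolean allValid(int[] points) {
        return Arrays.stream(points).allMatch(QuantifierDemo::validResult);
    }

    // "At least one element of the array is a win (3 points)."
    static boolean anyWin(int[] points) {
        return Arrays.stream(points).anyMatch(p -> p == 3);
    }

    public static void main(String[] args) {
        System.out.println(allValid(new int[] {3, 1, 0, 3})); // prints true
        System.out.println(allValid(new int[] {3, 2}));       // prints false
        System.out.println(anyWin(new int[] {0, 1}));         // prints false
    }
}
```

The method reference corresponds to the named agent, and the lambda in anyWin corresponds to the anonymous agent.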