
Types
Overview
Categorizing what a statement is intended
to accomplish is difficult.
Categorizing what a data structure is
intended to mean (at least in part) is
much easier.
Types permit this categorization, and
enforce rules ensuring that the data structures
are used in a meaningful way.
Limited Objectives
• To allow overloading. E.g. “x+y” can be
– IntAdd(x,y) if int x,y.
– FloatAdd(IntToFloat(x),y) if int x, float y.
– StringCat(x,IntToString(y)) if string x, int y
etc.
Grand Objective:
Eliminating nonsense
• Using the same bit string with two different
interpretations: as an integer and a float etc.
• Mixing apples and oranges. E.g. assigning a
month (represented as an integer) to be a
font (ditto). Multiplying a font by 7.
• Dereferencing. E.g. dereferencing a pointer to
a “customer” record and reading it as if it
were an “item for sale” record.
• Calling a function with inappropriate
arguments.
Further constraints
• Composite types (records, arrays, lists etc.)
• User-defined types with constraints
• Types have properties.
• Functions have types.
• Polymorphic functions (e.g. list operations)
• Do not overburden the user with type theory.
• If possible, find errors at compile time.
• Low runtime burden.
• Allow the user to do transgressive things if that’s really what they intend to do.
Tension between nanny PL and permissive PL.
Dynamic vs. Static Typing
• Dynamic typing: The type of a variable is not
known until run time. Variables must be
allocated on the heap and tagged with their type.
(LISP, other interpreted PLs)
• Static typing: The type of a variable can be
determined at compile time.
– Declarations (most PLs): user declares type.
– Inferred (ML): compiler infers type.
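To make the dynamic-typing bullet concrete, here is a minimal C sketch of how an interpreter might represent values: each value is heap-allocated and carries a type tag, and `+` inspects the tags at run time to pick an operation. The names `Value`, `make_int`, `make_float` and `dyn_add` are hypothetical, not taken from any real implementation.

```c
#include <stdlib.h>

/* Hypothetical run-time type tags for a dynamically typed language. */
typedef enum { TAG_INT, TAG_FLOAT } Tag;

/* Every value lives on the heap and records its own type. */
typedef struct {
    Tag tag;
    union {
        long i;
        double f;
    } as;
} Value;

Value *make_int(long i) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->tag = TAG_INT;
    v->as.i = i;
    return v;
}

Value *make_float(double f) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->tag = TAG_FLOAT;
    v->as.f = f;
    return v;
}

/* Read a numeric value as a double, whatever its tag. */
static double as_double(const Value *v) {
    return v->tag == TAG_INT ? (double)v->as.i : v->as.f;
}

/* "x + y": the operation is chosen at run time by looking at the tags. */
Value *dyn_add(const Value *x, const Value *y) {
    if (x->tag == TAG_INT && y->tag == TAG_INT)
        return make_int(x->as.i + y->as.i);            /* IntAdd */
    return make_float(as_double(x) + as_double(y));    /* FloatAdd after conversion */
}
```

Under static typing the compiler would make the same IntAdd/FloatAdd choice once, at compile time, and no tag would need to be stored.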
Static typing allows efficient
compilation
• Allocate memory
• Disambiguate overloading
Translating between types
Type conversion (also called type
casting): explicitly translating one type
to another.
int I; float X; char C;
I = int(X);
X = float(I);
C = char(I);
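The snippet above uses C++ function-style casts. In C the same explicit conversions are written `(type)expr`. A small sketch, with the conversions wrapped in illustrative helper functions (the names `float_to_int`, `int_to_float` and `int_to_char` are made up for this example):

```c
/* Explicit type conversions in C, written with cast syntax. */
int float_to_int(float x) {
    /* Truncates toward zero: 3.7f -> 3, -3.7f -> -3.
       Undefined behavior if the value does not fit in an int. */
    return (int)x;
}

float int_to_float(int i) {
    return (float)i;    /* 7 -> 7.0f */
}

char int_to_char(int i) {
    return (char)i;     /* 65 -> 'A' on an ASCII system */
}
```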
Type coercion
Compiler automatically translates one
type to another.
i = x;   → i = fix(x)
x = i;   → x = float(i)
x = 2    → x = 2.0
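For comparison, C inserts these coercions silently on assignment and in mixed arithmetic. A sketch with hypothetical helper names; C's float-to-integer coercion truncates toward zero, like `fix`.

```c
/* Implicit coercions that a C compiler inserts automatically. */
int coerce_to_int(double x) {
    int i = x;      /* compiler inserts the conversion: truncation, like fix(x) */
    return i;
}

double coerce_to_double(int i) {
    double x = i;   /* compiler inserts x = (double)i */
    return x;
}

double mixed_sum(int i, double y) {
    return i + y;   /* i is converted to double before a floating-point add */
}
```

Many compilers will only warn about the lossy first case when asked (for example with a conversion-warning flag), which is exactly the permissive end of the nanny/permissive tension.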
Non-converting type coercion
Reading a bit string of one type as if it
were another type.
E.g. using the bit string for floating point
as if it were an integer.
Rarely what you actually want. Highly
implementation dependent.
Therefore, the PL should make it
impossible to do this accidentally.
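A sketch of non-converting coercion in C, assuming a 32-bit IEEE 754 `float` (true on mainstream platforms, but not guaranteed by the C standard). `memcpy` copies the bit string unchanged, so on such platforms `float_bits(1.0f)` yields `0x3F800000`, whereas the converting cast `(uint32_t)1.0f` yields 1. The helper name `float_bits` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Non-converting coercion: reuse the bits of a float as an unsigned integer.
   memcpy is the portable way to do this in C; reading the float through an
   integer pointer would violate the strict-aliasing rules. */
uint32_t float_bits(float f) {
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "assumes a 32-bit float");
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Requiring an explicit, visibly named operation like this is one way a language can make the reinterpretation impossible to do by accident.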
Overloading
An operator or function is overloaded if it
has multiple definitions, depending on
the types of the arguments.
E.g. int i; float x; string s;
i+i → intAdd(i,i)
x+x → floatAdd(x,x)
s+s → stringAppend(s,s)
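C has no user-defined overloading, but C11's `_Generic` selection lets a macro choose a function from the static type of an argument. A minimal sketch; `add`, `int_add` and `float_add` are hypothetical names, and this version dispatches only on the type of the first argument.

```c
/* The two "definitions" of +. */
static int int_add(int a, int b) { return a + b; }
static double float_add(double a, double b) { return a + b; }

/* add(a, b) picks a definition from the static type of a.
   The choice is made at compile time; nothing is tagged at run time. */
#define add(a, b) _Generic((a), \
    int: int_add,               \
    double: float_add)((a), (b))
```

Here `add(2, 3)` resolves to `int_add` and `add(1.5, 2.25)` to `float_add`; an argument of any other type is a compile-time error because no association matches.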
Overloading functions
function exponent (N, P : Integer) return Integer;
function exponent (N : Float; P : Integer) return Float;
function exponent (N, P : Float) return Float;
(Ada)
Terminology
Type checking: Ensure that a program
obeys the language’s type rules.
Type clash: Violation of the rules.
Strongly typed: The implementation
catches all type clashes.
The rules depend on the
language
An operation can be a type clash in one language but
not in another. E.g.
int i, a[32];
i := 100;
a[i] := 5;
In PASCAL this is a type clash because “array of integer
of size 32” is a type. In Ada it’s just a semantic error,
because the type is “array of integer”.
If you want your language to be statically typed, you
have to define types so that this isn’t a type clash.
Not that it makes any difference to what the compiler
or the run-time executor does.
Viewpoints on types
Denotational: A type is a set of values.
Constructive: Complex types are built up
from primitive types
Abstract: A type consists of a set of
operations that combine according to
specified rules.
Basic types
• Numeric: Integer, floating point,
long/short integer, double precision …
• Character: short (ASCII), long (Unicode)
• Boolean
Enumerated types
PASCAL
type weekday = (sun, mon, tue, wed, thu, fri, sat);
for today := mon to fri do begin … end;
var sales : array[weekday] of real;
Implemented as integers, but incompatible.
var D : weekday;
D := 5; { Type error }
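For contrast, C enums are also implemented as integers but are not incompatible with them: an enum is just a set of named `int` constants, so the assignment that Pascal rejects compiles. A sketch mirroring the Pascal declarations; the function names are hypothetical.

```c
/* In C an enum is only a set of named int constants. */
enum weekday { SUN, MON, TUE, WED, THU, FRI, SAT };

/* An array indexed by weekday, summed over mon..fri as in the Pascal loop. */
double total_weekday_sales(const double sales[SAT + 1]) {
    double total = 0.0;
    for (enum weekday today = MON; today <= FRI; today++)
        total += sales[today];
    return total;
}

enum weekday from_int(int n) {
    enum weekday d = n;   /* compiles without complaint: not a type error in C */
    return d;
}
```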
Subrange types
Ada:
type testScore is new Integer range 0 .. 100;   -- Derived type
subtype workday is weekday range mon .. fri;    -- Constrained subtype
Compatibility: Derived type
S : testScore;
Count : Integer;
S := 95;                -- OK. "95" has type
                        -- "Universal_Integer"
S := Count;             -- Not OK. Type clash.
S := testScore(Count);  -- OK
                        -- Converting from Integer to testScore
Compatibility: Subtype
D : weekday;
E : workday;
E := D;    -- OK. No cast needed.
D := sat;
E := D;    -- Range error (Constraint_Error at run time), but not a type error.
Arrays
A one-dimensional array M[L..U] of type T is a
mapping from the integers between L and U
to values of type T.
Assume all values of type T are the same size.
The standard implementation of M places all
U+1-L values in a consecutive block of
memory of size (U+1-L)*|T|, starting at
address A0. The starting address of M[I] is
A0+(I-L)*|T|
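C fixes L = 0, but the general formula is easy to write out. A sketch, with a hypothetical helper `element_address` that computes A0 + (I-L)*|T| using byte arithmetic:

```c
#include <stddef.h>

/* Address of M[i] for an array M[lower..upper] stored in one contiguous
   block starting at a0, with elements of size elem_size bytes:
   A0 + (I - L) * |T|. */
char *element_address(char *a0, long lower, long i, size_t elem_size) {
    return a0 + (i - lower) * (long)elem_size;
}
```

For example, if a `double block[11]` stores M[10..20], then M[13] is `block[3]`. With L = 0 the formula reduces to ordinary C indexing, `a0 + i * |T|`.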
Dope vector
Features of the array that are not known
at compile time must be saved in a data
structure known as a dope vector.
The location of the dope vector and the
significance of its fields is known at
compile time.
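A minimal sketch of a dope vector for a one-dimensional array of doubles whose bounds are known only at run time. The struct layout and function names are assumptions for illustration; the point is that the compiler knows where each field is, while the values are filled in at run time.

```c
#include <stddef.h>

/* Hypothetical dope vector: the layout is fixed at compile time,
   the field values are set at run time. */
typedef struct {
    double *base;   /* address A0 of the first element */
    long lower;     /* L */
    long upper;     /* U */
} DopeVector;

/* Number of elements, U + 1 - L. */
long dope_length(const DopeVector *dv) {
    return dv->upper + 1 - dv->lower;
}

/* Address of M[i], or NULL if i lies outside [L, U]. */
double *dope_element(const DopeVector *dv, long i) {
    if (i < dv->lower || i > dv->upper)
        return NULL;
    return dv->base + (i - dv->lower);
}
```

Because the bounds are stored alongside the base address, the same structure also gives the run-time system what it needs for bounds checking.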
Allocation of arrays
Global lifetime, static shape: Allocated at
fixed address.
Local lifetime, static shape: Allocated in
activation record
Local lifetime, shape bound at elaboration
time: Allocated in activation record using
indirection and dope vector
Allocation of arrays: cntd
• Lifetime survives function that creates it:
allocated on heap
• Dynamic size: Allocated on heap.
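The allocation cases above map onto C roughly as follows. A sketch with hypothetical names; the "shape bound at elaboration time" case uses a C99 variable-length array, which C11 made optional, so some compilers do not support it.

```c
#include <stdlib.h>

static int global_table[100];     /* global lifetime, static shape: fixed address */

int allocation_demo(int n) {
    if (n <= 0) return -1;        /* a VLA must have positive size */
    int local_fixed[10];          /* local lifetime, static shape: in the activation record */
    int local_vla[n];             /* local lifetime, shape bound when the declaration runs */
    int *heap = malloc((size_t)n * sizeof *heap);   /* dynamic size: on the heap */
    if (heap == NULL) return -1;

    local_fixed[0] = 1;
    local_vla[n - 1] = 2;
    heap[n - 1] = 3;
    global_table[0] = 4;
    int sum = local_fixed[0] + local_vla[n - 1] + heap[n - 1] + global_table[0];
    free(heap);
    return sum;
}

/* Lifetime survives the function that creates it: must be on the heap. */
int *make_counting_array(int n) {
    int *a = malloc((size_t)n * sizeof *a);
    if (a != NULL)
        for (int i = 0; i < n; i++) a[i] = i;
    return a;   /* caller must free() */
}
```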
Array bounds checking
In principle, at every reference to A[I], you have
to check that I is within the bounds of A.
In implementations that don’t do this (C,
FORTRAN) this is by far the most common
source of unsafeness and bizarre bugs.
An optimizing compiler can often eliminate
many of these checks as redundant.
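The idea behind eliminating redundant checks can be sketched by hand. Below, the first function checks every access; the second does what an optimizer might do internally: since the index runs over 0..n-1, a single test `n <= len` before the loop covers every access. The function names are hypothetical, and real compilers perform this transformation on their own intermediate code, not on source like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Checked access: what a bounds-checking implementation does at every A[i]. */
static bool checked_get(const int *a, size_t len, size_t i, int *out) {
    if (i >= len) return false;   /* out of bounds: refuse instead of reading garbage */
    *out = a[i];
    return true;
}

/* Naive: one check per iteration. */
bool sum_first_checked(const int *a, size_t len, size_t n, long *sum) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        int v;
        if (!checked_get(a, len, i, &v)) return false;
        s += v;
    }
    *sum = s;
    return true;
}

/* Check hoisted out of the loop: one test covers all n accesses. */
bool sum_first_hoisted(const int *a, size_t len, size_t n, long *sum) {
    if (n > len) return false;
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    *sum = s;
    return true;
}
```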
Bizarre bugs
Bugs whose manifestation is not
explainable in terms of the language
model.
Chief sources: out-of-bounds array accesses, pointer
arithmetic, dangling pointers, erroneous
deallocation.
Bizarre bugs (cntd)
May appear or disappear when you
• add or delete an unused variable declaration
• add or delete print statements
• reorder variable declarations
• upgrade the compiler
May appear in executing routines far from the
actual bug.
May be intermittent.
Magic success
Code that shouldn’t really work, but does
because of some unsafe operation.
E.g. a variable happens to be properly initialized
because some array is written past its
bounds.
Multidimensional arrays
An array M of dimension A x B of type T
is allocated as a block of size AxBx|T|.
Row-major order (almost universal):
M[0,0], M[0,1], … M[0,B-1],
M[1,0], M[1,1] … M[1,B-1],
…
M[A-1,0] … M[A-1,B-1].
M[I,J] is at address of M[0,0] + (B*I+J)*|T|
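The row-major formula can be checked directly, since C stores `M[A][B]` exactly this way. A sketch with a hypothetical helper, also showing how a matrix kept in a flat one-dimensional block is indexed with the same formula:

```c
#include <stddef.h>

/* Offset, in elements, of M[i][j] from M[0][0] in row-major order. */
size_t row_major_offset(size_t i, size_t j, size_t ncols) {
    return ncols * i + j;   /* B*I + J */
}

/* Read M[i][j] from a matrix stored as one flat block of doubles. */
double flat_get(const double *flat, size_t ncols, size_t i, size_t j) {
    return flat[row_major_offset(i, j, ncols)];
}
```

For a 3 x 4 array, M[2][1] sits 4*2 + 1 = 9 elements past M[0][0].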
Efficient array access
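for (I = 0; I < N; I++)
   for (J = 0; J < N; J++)
      M[I][J] = I+J;
may work much faster than
for (J = 0; J < N; J++)
   for (I = 0; I < N; I++)
      M[I][J] = I+J;
because the first visits consecutive addresses (row-major order). The difference shows up in:
• Cache hit rate
• (In large arrays) Page fault rate.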
Ragged arrays
Multidimensional array allocated as an array of
pointers to rows. Either record the length of
each row or terminate each row with a flag value (e.g. the
null character).
Advantages:
• Sometimes faster (trade indirection for
multiplication).
• Sometimes space efficient (trade pointer for
null values).
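Both ragged-array variants in a small C sketch (all names and data are hypothetical): rows of strings terminated by the null character, and integer rows whose lengths are recorded in a separate array. Accessing `rows[r][k]` follows a pointer instead of multiplying by a row length, and the short row "Cy" takes 3 bytes rather than the 12 a rectangular array sized for "Bartholomew" would need.

```c
#include <stddef.h>
#include <string.h>

/* Variant 1: each row terminated by a flag value (the null character). */
static const char *const names[] = { "Ann", "Bartholomew", "Cy" };

/* Variant 2: each row's length recorded separately. */
static const int row0[] = { 1 };
static const int row1[] = { 2, 3, 4 };
static const int row2[] = { 5, 6 };
static const int *const rows[] = { row0, row1, row2 };
static const size_t row_len[] = { 1, 3, 2 };

/* Sum of row r, using the recorded length. */
int row_sum(size_t r) {
    int s = 0;
    for (size_t k = 0; k < row_len[r]; k++)
        s += rows[r][k];    /* one indirection, no multiplication */
    return s;
}

/* Length of name r, found by scanning for the terminating '\0'. */
size_t name_length(size_t r) {
    return strlen(names[r]);
}
```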
Records
As compared with arrays:
• Heterogeneous type, while arrays are
homogeneous.
• Named fields rather than integer index.
Offset of field from record is known at
compile time.
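In C the compile-time offset of a field is available through `offsetof` from `<stddef.h>`. A sketch with a hypothetical `struct customer`; the actual offset of `balance` is implementation-defined (it depends on padding), but it is a constant, so `c->balance` compiles to a fixed displacement from the record's address.

```c
#include <stddef.h>

struct customer {
    int id;
    double balance;
    char name[32];
};

/* What the compiler does for c->balance: base address plus a constant offset. */
double *balance_field(struct customer *c) {
    return (double *)((char *)c + offsetof(struct customer, balance));
}
```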
Pointers
• Really only two operations: Creating a
pointer to an object, and following a
pointer.
• C allows arithmetic on pointers within
the same array to achieve optimization. Bad
idea.
What are pointers allowed to
point to?
• Only heap objects (Pascal, Ada-83,
Java)
• Stack objects (C, C++, Ada-95)
Dangling pointer
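#include <stdio.h>

double *p;

void a(void) {
   double x;
   p = &x;            /* p points into a's activation record */
}                     /* x no longer exists: p now dangles */

void b(void) {
   int i = 1;
   *p = 2.0;          /* undefined behavior: may overwrite i */
   printf("%d\n", i);
}

int main(void) { a(); b(); return 0; }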
Ada 95 has restrictions on pointers to
stack objects that prevent this, but that’s
complicated.
Advantages to allowing pointers
to stack objects
If you want to create a data structure with
pointers whose lifetime is equal to that of the
function that creates it, that’s easy to do by
allocating it on the stack, and a pain to
do by allocating it on the heap.
Note: during the lifetime of the data
structure, the same code will work
whether it’s on the stack or on the heap.
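A sketch of both points, with hypothetical names: a short linked list built from local variables needs no `malloc` or `free`, while the heap version needs both, and the same `list_sum` works on either.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Works on any list, wherever its nodes live. */
int list_sum(const struct node *p) {
    int s = 0;
    for (; p != NULL; p = p->next)
        s += p->value;
    return s;
}

/* The list only needs to live as long as this call, so the nodes are locals. */
int sum_of_stack_list(void) {
    struct node c = { 3, NULL };
    struct node b = { 2, &c };
    struct node a = { 1, &b };
    return list_sum(&a);
}

/* The same list on the heap: every node must be allocated and freed. */
int sum_of_heap_list(void) {
    struct node *head = NULL;
    for (int v = 3; v >= 1; v--) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) abort();
        n->value = v;
        n->next = head;
        head = n;
    }
    int s = list_sum(head);
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return s;
}
```

The stack version is safe only as long as no pointer to its nodes escapes the call, which is exactly the dangling-pointer hazard above.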
Other advantage to allowing
pointers to stack objects
function A lexically contains functions B
and C.
A has a collection of large structures (e.g.
arrays).
Declare a pointer local to B and C and
use the pointer to pass references to
the structure.
Unions/Variants
Example: a bibliography that records:
• Books. Fields: Author, title, date, publisher.
• Articles. Fields: Author, title, journal title, date,
volume, pages.
• Manuscripts. Fields: Author, title, library, ms
number.
You want all these to be the same type,
because you want to create arrays of such
records. You don’t want 9 different fields, of
which at least 3 will always be null.
Union/variants
These are generally more trouble than
they’re worth, so unless you’re actually
tight for space, don’t bother.
Unions in C: The wrong way
union datum {
   int i;
   double d;
};   /* u.d and u.i share the same storage */
union datum u;
int j;
u.d = 2.0;   /* storage is set as floating point */
j = u.i;     /* and accessed as integer */
Ada variant parts: the right way
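type ItemCat is (Book, Article, MS);
type BibRecord (Category : ItemCat) is
   record
      Author, Title : Unbounded_String;  -- assumes: use Ada.Strings.Unbounded;
      case Category is
         when Book =>
            pubYear   : Integer;   -- component names must differ across variants
            publisher : Unbounded_String;
         when Article =>
            year, vol, pStart, pEnd : Integer;
            jnl : Unbounded_String;
         when MS =>
            library  : Unbounded_String;
            msNumber : Integer;
      end case;
   end record;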
Ada variant parts
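Can only access the pStart field if Category is
“Article”. The check is made at run time; a violation
raises Constraint_Error.
Category cannot be changed on its own. Changing the
discriminant of an object (possible only if the discriminant
has a default) requires assigning a complete new record
value, so fields of the old variant can never be read as
fields of the new one.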
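C has no built-in equivalent of this check. The closest C idiom is a tag field stored next to the union plus accessors that test it, which mirrors the Ada record by hand. A sketch with hypothetical names; unlike Ada, nothing stops other code from reading `r->u.book.year` on an article, so the safety is only a convention.

```c
#include <stdbool.h>

typedef enum { BOOK, ARTICLE, MS } ItemCat;

typedef struct {
    ItemCat category;              /* the discriminant, set once at creation */
    const char *author, *title;
    union {
        struct { int year; const char *publisher; } book;
        struct { int year, vol, p_start, p_end; const char *jnl; } article;
        struct { const char *library; int ms_number; } ms;
    } u;
} BibRecord;

/* Hand-written version of Ada's run-time check:
   p_start is read only if the record really is an article. */
bool article_p_start(const BibRecord *r, int *out) {
    if (r->category != ARTICLE)
        return false;
    *out = r->u.article.p_start;
    return true;
}
```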