
Types
Antonio Cisternino
Programmazione Avanzata
Types




Computer hardware is capable of interpreting bits in memory in several different ways
A type limits the set of operations that may be performed on a value belonging to it
The hardware usually doesn’t enforce the notion of type, though it provides operations for numbers and pointers
Programming languages tend to associate types with values in order to enable error checking
Type system

A type system consists of:
- A mechanism for defining types and associating them with certain language constructs
- A set of rules for:
  - type equivalence: when the types of two values are the same
  - type compatibility: when a value of a given type can be used in a given context
  - type inference: the type of an expression given the types of its constituents

Type checking



Type checking is the process of ensuring that a
program obeys the language’s type compatibility
rules
A language is strongly typed if it prohibits, in a
way that the language implementation can
enforce, the application of any operation to any
object that is not intended to support that
operation
A language is statically typed if it is strongly
typed and type checking can be performed at
compile time
Programming Languages and type checking
Assembly: no type checking
C: static type checking; not entirely strongly typed (union, interoperability of pointers and arrays)
Pascal: static type checking; not entirely strongly typed (untagged variant records)
C++: static type checking; not entirely strongly typed (as C); dynamic type checking (virtual methods)
Java: static type checking; dynamic type checking (virtual methods, upcasting); strongly typed
Lisp: dynamic type checking; strongly typed
Prolog: dynamic type checking; strongly typed
ML: static type checking; strongly typed
Different views for types

Denotational:
- Types are sets of values (domains)
- Application: semantics

Constructive:
- Built-in types
- Composite types (application of type constructors)

Abstraction-based:
- A type is an interface consisting of a set of operations
Language types






Boolean
Int, long, float, double (signed/unsigned)
Characters (1 byte, 2 bytes)
Enumeration
Subrange (n1..n2)
Composite types:





Struct
Union
Arrays
Pointers
List
Type Conversions and Casts

Consider the following definitions:
int add(int i, int j);
int add2(int i, double j);
And the following calls:
add(2, 3); // Exact match
add(2, (int)3.0); // Explicit cast
add2(2, 3); // Implicit cast (the int 3 is converted to double)
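The three kinds of call can be tried out directly. The following is a minimal sketch that transliterates the declarations above to Java; the method bodies and the class name ConversionDemo are assumptions made for illustration, since the slide only gives the signatures.

```java
// Sketch only: bodies are invented, the slide declares just the signatures.
class ConversionDemo {
    static int add(int i, int j) { return i + j; }

    // Mixing int and double yields a double, so the result is cast back to int.
    static int add2(int i, double j) { return (int) (i + j); }

    public static void main(String[] args) {
        System.out.println(add(2, 3));          // exact match: both arguments are int
        System.out.println(add(2, (int) 3.7));  // explicit cast: 3.7 is truncated to 3
        System.out.println(add2(2, 3));         // implicit conversion: int 3 widens to double 3.0
    }
}
```

The explicit cast may lose information (3.7 becomes 3), which is why the language asks the programmer to write it, while the widening from int to double is applied silently.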
Memory Layout
- Hardware types on 32-bit architectures typically require from 1 to 8 bytes
- Composite types are represented by laying out their constituent values one after another
- For performance reasons compilers often insert padding to align fields to 4-byte addresses

Memory layout example
struct element {
  char name[2];
  int atomic_number;
  double atomic_weight;
  char metallic;
};
[Figure: layout of struct element in rows of 4 bytes (32 bits), in order: name, atomic_number, atomic_weight, metallic]
Optimizing Memory Layout



C requires the fields of a struct to be laid out in the order of their declaration (essential for working with pointers!)
Not all languages behave like this: for instance ML doesn’t specify any order
If the compiler is free to reorder fields, holes can be minimized (in the example, packing metallic together with name saves 4 bytes)
Union


Union types allow sharing the same memory area among different types
The size of the value is the maximum of the sizes of its constituents
union u {
  struct element e;
  int number;
};
[Figure: rows of 4 bytes (32 bits); number overlaps the beginning of e (name, atomic_number, atomic_weight, metallic)]
Abstract Data Types




According to the abstraction-based view of types, a type is an interface
An ADT defines a set of values and the operations allowed on them
In their evolution programming languages have included mechanisms to define ADTs
Defining an ADT requires the ability to encapsulate values and operations
Example: a C list
#include <stdlib.h>

struct node {
  int val;
  struct node *next; /* link to the following node, NULL at the end */
};
struct node* next(struct node* l) { return l->next; }
struct node* initNode(struct node* l, int v) {
  l->val = v; l->next = NULL; return l;
}
void append(struct node* l, int v) {
  struct node *p = l;
  while (p->next) p = p->next; /* walk to the last node */
  /* the result of malloc is not checked, for brevity */
  p->next =
    initNode((struct node *)malloc(sizeof(struct node)), v);
}
ADT, modules and classes




C doesn’t provide any mechanism to hide the
structure of data types
A program can access the next pointer without
using the next function.
The notion of module has been introduced to
define data types and restrict the access to their
definition
An evolution of module is the class: values and
operations are tied together (with the addition of
inheritance)
Class type





Class is a type constructor like struct and array
A class combines other types like structs
Class definition contains also methods which are
the operations allowed on the data
The inheritance relation is introduced
Two special operations provide control over
initialization and finalization of objects
The Node Type in Java
class Node {
int val;
Node m_next;
Node(int v) { val = v; }
Node next() { return m_next; }
void append(int v) {
Node n = this;
while (n.m_next != null) n = n.m_next;
n.m_next = new Node(v);
}
}
Inheritance
- If class A inherits from class B (A<:B), an object of class A can be used wherever an object of class B is expected
- Inheritance expresses the idea of adding features (both methods and attributes) to an existing type
- Inheritance can be single or multiple

Example
class A {
  int i;
  int j;
  int foo() { return i + j; }
}
class B : A {
  int k;
  int foo() { return k + super.foo(); }
}
Questions

Consider the following:
A a = new A();
A b = new B();
Console.WriteLine(a.foo());
Console.WriteLine(b.foo());


Which version of foo is invoked in the second
print?
What is the layout of class B?
Upcasting

Late binding becomes necessary because a reference to an object of class B can be converted into a reference of its super-class A (upcasting):
B b = new B();
A a = b;
The runtime does not convert the object: through a, only the part inherited from A is accessible
This is different from the following implicit cast, where the representation of the data changes in the assignment:
int i = 10;
long l = i;
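The difference between the two assignments can be observed in Java. The sketch below is illustrative only: the classes UpA and UpB and the helper names are invented for the example. It shows that an upcast leaves the object untouched (both references denote the same object), whereas widening an int to a long produces a value with a different representation.

```java
// Hypothetical classes standing in for A and B of the slides.
class UpA { int i = 1; }
class UpB extends UpA { int k = 2; }

class UpcastDemo {
    // Upcast: only the static type of the reference changes.
    static boolean sameObjectAfterUpcast() {
        UpB b = new UpB();
        UpA a = b;        // implicit upcast, nothing is copied or converted
        return a == b;    // reference equality: same object
    }

    // Numeric widening: a new 64-bit value is computed from the 32-bit one.
    static long widen(int i) {
        long l = i;
        return l;
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAfterUpcast());          // true
        System.out.println(Integer.toHexString(-1));          // ffffffff (32 bits)
        System.out.println(Long.toHexString(widen(-1)));      // ffffffffffffffff (64 bits)
    }
}
```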
Downcasting

Once we have a reference of the super-class we
may want to convert it back:
A a = new B();
B b = (B)a;


During downcast it is necessary to explicitly
indicate which class is the target: a class may be
the ancestor of many sub-classes
Again this transformation informs the compiler
that the referenced object is of type B without
changing the object in any way
Upcasting, downcasting



We have shown upcasting and downcasting as expressed in languages such as C++, C# and Java, though the problem is common to OO languages
Note that an upcast can be verified at compile time whereas a downcast cannot
Whether a downcast is checked at runtime depends on the language:
- in Java casts are checked at runtime
- C++ (with a C-style cast or static_cast) simply changes the interpretation of an expression at compile time without any attempt to check it at runtime
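The runtime check performed by Java can be seen in a small experiment. This is a sketch with made-up classes (Shape, Circle, Square): the compiler accepts both downcasts because either could be valid, and the JVM rejects the wrong one with a ClassCastException.

```java
// Hypothetical hierarchy: one ancestor with two unrelated subclasses.
class Shape {}
class Circle extends Shape {}
class Square extends Shape {}

class DowncastDemo {
    // Returns true when the object referenced by s really is a Circle.
    static boolean castSucceeds(Shape s) {
        try {
            Circle c = (Circle) s;   // accepted by the compiler, checked by the JVM
            return true;
        } catch (ClassCastException e) {
            return false;            // the referenced object was not a Circle
        }
    }

    public static void main(String[] args) {
        System.out.println(castSucceeds(new Circle()));  // true
        System.out.println(castSucceeds(new Square()));  // false
    }
}
```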
Late Binding




The output of the example depends on the language: the second output may be the result of invoking A::foo() or B::foo()
In Java the call results in the invocation of B::foo
In C++ A::foo would be invoked, unless foo is declared virtual
The mechanism which associates the method B::foo() to b.foo() is called late binding
Late Binding





In the example the compiler cannot determine statically
the exact type of the object referenced by b because of
upcasting
To allow the invocation of the method of the exact type
rather than the one known at compile time it is necessary
to pay an overhead at runtime
Programming languages allow the programmer to
specify whether to apply late binding in a method
invocation
In Java the keyword final is used to indicate that a
method cannot be overridden in subclasses: thus the
JVM may avoid late binding
In C++ only methods declared as virtual are considered
for late binding
Late Binding




With inheritance it is possible to treat objects in a generic
way
The benefit is evident: it is possible to write generic
operations manipulating objects of types inheriting from
a common ancestor
OOP languages usually support late binding of methods:
which method should be invoked is determined at
runtime
This mechanism involves a small runtime overhead: at
runtime the type of an object should be determined in
order to invoke its methods
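A Java version of the earlier A/B example makes the effect concrete. This is a minimal sketch: the class names LbA and LbB and the field values are assumptions chosen so the results are easy to follow. The helper sumOfFoo is a generic operation written only against the common ancestor.

```java
// Java rendering of the A/B example; the initial field values are invented.
class LbA {
    int i = 1;
    int j = 2;
    int foo() { return i + j; }
}

class LbB extends LbA {
    int k = 10;
    @Override
    int foo() { return k + super.foo(); }
}

class LateBindingDemo {
    // Knows only the ancestor type, yet runs the method of each object's own class.
    static int sumOfFoo(LbA[] items) {
        int total = 0;
        for (LbA item : items) {
            total += item.foo();   // resolved at runtime from the object's class
        }
        return total;
    }

    public static void main(String[] args) {
        LbA a = new LbA();
        LbA b = new LbB();                  // static type LbA, runtime type LbB
        System.out.println(a.foo());        // 3
        System.out.println(b.foo());        // 13
        System.out.println(sumOfFoo(new LbA[] { a, b }));  // 16
    }
}
```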
Example (Java)
class A {
  final void foo() {…}
  void baz() {…}
  void bar() {…}
}
class B extends A {
  // Suppose it were possible (Java rejects redefining a final method)
  final void foo() {…}
  void bar() {…}
}
A a = new A();
B b = new B();
A c = b;
a.foo(); // A::foo()
a.baz(); // A::baz()
a.bar(); // A::bar()
b.foo(); // B::foo()
b.bar(); // B::bar()
c.foo(); // A::foo()
c.bar(); // B::bar()
Abstract classes

Sometimes it is necessary to model a set S of objects which can be partitioned into subsets (A0, …, An) such that their union covers S:
∀x ∈ S ∃Ai ⊆ S, x ∈ Ai
If we use classes to model each set it is natural that:
∀A ⊆ S, A <: S
Each object is an instance of a subclass of S and no object is an instance of S itself
S is useful because it abstracts the commonalities among its subclasses, making it possible to express generic properties of its objects
Example






We want to manipulate documents with different formats
The set of documents can be partitioned by type: doc, pdf, txt, and so on
For each document type we introduce a class that inherits from a class Doc that represents the document
In the class Doc we may store properties common to all documents (title, location, …)
Each class is responsible for reading the document content
It doesn’t make sense to have an instance of Doc, though Doc is useful as the common type when scanning a list of documents to read
Abstract methods




Often when a class is abstract some of its methods cannot be given a definition
Consider the method read() in the previous example
In class Doc there is no reasonable implementation for it
We leave it abstract so that through late binding the appropriate implementation will be called
Syntax

Abstract classes can be declared using the abstract keyword in Java or C#:
abstract class Doc { … }
C++ assumes a class is abstract if it contains an abstract method
- it is impossible to instantiate an abstract class, since it would lack that method
A virtual method is abstract (pure virtual) in C++ if it is declared with = 0 in place of a body:
virtual string Read() = 0;
In Java and C# abstract methods are annotated with abstract and no body is provided:
abstract String Read();
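Putting the document example and the syntax together, a possible Java sketch looks as follows. The subclasses TxtDoc and PdfDoc, the helper readAll and the strings returned by read() are all invented for illustration; only the abstract class Doc with a title and a read method comes from the slides.

```java
import java.util.Arrays;
import java.util.List;

// Common ancestor: holds what all documents share, cannot be instantiated.
abstract class Doc {
    final String title;
    Doc(String title) { this.title = title; }
    abstract String read();   // no reasonable implementation at this level
}

// Hypothetical concrete formats, each supplying its own read().
class TxtDoc extends Doc {
    TxtDoc(String title) { super(title); }
    @Override String read() { return "text of " + title; }
}

class PdfDoc extends Doc {
    PdfDoc(String title) { super(title); }
    @Override String read() { return "pages of " + title; }
}

class DocDemo {
    // Scans a list of documents through the abstract type only.
    static String readAll(List<Doc> docs) {
        StringBuilder sb = new StringBuilder();
        for (Doc d : docs) {
            sb.append(d.read()).append('\n');   // late binding picks the right read()
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.<Doc>asList(new TxtDoc("notes"), new PdfDoc("paper"));
        System.out.print(readAll(docs));
        // new Doc("x") would be rejected by the compiler: Doc is abstract
    }
}
```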
Inheritance




Inheritance is a relation among classes
Systems often impose some restrictions on the inheritance relation for convenience
We say that a class A is an interface if all its members are abstract, it has no fields, and it may inherit only from one or more interfaces
Inheritance can be:
- Single (A <: B ⇒ (∀C. A <: C ⇒ C = B))
- Mix-in (S = {B | A <: B}, ∃1 B ∈ S : ¬interface(B))
- Multiple (no restriction)
Multiple inheritance



Why should systems impose restrictions on inheritance?
Multiple inheritance introduces both conceptual and implementation issues
The crucial problem, in its simplest form, is the following:
- B <: A ∧ C <: A
- D <: B ∧ D <: C
In the presence of a common ancestor, either:
- the instance part from A is shared between B and C, or
- the instance part from A is duplicated
This situation is not infrequent: in C++ ios:>istream, ios:>ostream and iostream<:istream, iostream<:ostream
The problem in sharing the ancestor A is that B and C may change the inherited state in a way that may lead to conflicts
Java and Mix-in inheritance






Both single and mix-in inheritance fix the common ancestor problem
Single inheritance, though, can be somewhat restrictive
Mix-in inheritance has become popular with Java and represents an intermediate solution
Classes are partitioned into two sets: interfaces and normal classes
Interfaces restrict their members to abstract methods only: no instance variables are allowed
A class inherits instance variables from only one of its ancestors, avoiding the diamond problem of multiple inheritance
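In Java this partition shows up as extends for the single normal superclass and implements for any number of interfaces. The sketch below uses invented names (Loadable, Printable, Document, Report) and is only meant to show that state comes from one ancestor while behaviour contracts come from several.

```java
// Interfaces: abstract methods only, no instance variables.
interface Loadable { String load(); }
interface Printable { String print(); }

// The only ancestor that contributes instance state.
class Document {
    String title = "untitled";
}

// One normal superclass plus two interfaces.
class Report extends Document implements Loadable, Printable {
    @Override public String load() { return "loading " + title; }
    @Override public String print() { return "printing " + title; }
}

class MixinDemo {
    static String loadIt(Loadable l) { return l.load(); }
    static String printIt(Printable p) { return p.print(); }

    public static void main(String[] args) {
        Report r = new Report();
        System.out.println(loadIt(r));    // a Report can be used as a Loadable
        System.out.println(printIt(r));   // ... and as a Printable
        Document d = r;                   // ... and as a Document
        System.out.println(d.title);
    }
}
```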
Implementing Single and Mix-in
inheritance

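The implementation consists only in combining the state of a class and its super-classes
Note that upcasting and downcasting come for free: the pointer to the base of the instance can be seen as a pointer to an instance of either A or B
[Figure: instance layouts for A, B<:A, C<:B<:A and D<:C<:B<:A; each layout begins with the A part, followed by the parts added by each subclass in turn]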
Implementing multiple inheritance

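With multiple inheritance casting becomes more complex than reinterpreting a pointer!
[Figure: instance layouts for A, B<:A, C<:A and D<:B, D<:C; the layout of D is shown twice, once with the A part duplicated (A (B), B, A (C), C, D) and once with a single A part (A, B, C, D)]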
Late binding







How to identify which method to invoke?
Solution: use a v-table for each class that has
polymorphic methods
Each virtual method is assigned a slot in the table
pointing to the method code
Invoking the method involves looking up in the table at a
specific offset to retrieve the address to use in the call
instruction
Each instance holds a pointer to the v-table
Thus late binding incurs an overhead both in time (2
indirections) and space (one pointer per object)
The overhead is small and often worth the benefits
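The lookup can be imitated by hand to see the two indirections. The following Java sketch is a simulation, not how a real compiler or JVM lays out objects: the names Obj, Method, A_VTABLE, B_VTABLE and the slot constants are all made up. Each simulated object stores a reference to its class's table, and a call indexes that table at a fixed slot.

```java
class VTableDemo {
    // A table entry: code that receives the object it is invoked on.
    interface Method { String call(Obj self); }

    // A simulated instance: just the "v-pointer" (fields omitted for brevity).
    static final class Obj {
        final Method[] vtable;
        Obj(Method[] vtable) { this.vtable = vtable; }
    }

    // Slot numbers are fixed once; a subclass keeps the slots of its superclass.
    static final int SLOT_FOO = 0;
    static final int SLOT_F = 1;
    static final int SLOT_G = 2;   // new slot added by B

    static final Method[] A_VTABLE = {
        self -> "A::foo",
        self -> "A::f"
    };

    static final Method[] B_VTABLE = {
        self -> "B::foo",      // overrides slot 0
        A_VTABLE[SLOT_F],      // inherited entry, same slot as in A
        self -> "B::g"
    };

    // Indirection 1: object -> table. Indirection 2: table slot -> code.
    static String invoke(Obj receiver, int slot) {
        return receiver.vtable[slot].call(receiver);
    }

    public static void main(String[] args) {
        Obj a = new Obj(A_VTABLE);
        Obj b = new Obj(B_VTABLE);
        System.out.println(invoke(a, SLOT_FOO));  // A::foo
        System.out.println(invoke(b, SLOT_FOO));  // B::foo
        System.out.println(invoke(b, SLOT_F));    // A::f
    }
}
```

The caller never tests the type of the receiver: the same slot number works for both objects, which is what keeps the overhead down to two memory reads.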
Late binding: an example (Java)
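class A {
  void foo() {…}
  void f() {…}
  int ai;
}
class B extends A {
  void foo() {…}
  void g() {…}
  int bi;
}
A a = new A();
a.foo();
a.f();
B b = new B();
b.foo();
b.g();
b.f();
A c = b;
c.foo();
c.f();
[Figure: A’s v-table has slots foo and f; the object referenced by a holds a V-pointer to it, followed by ai. B’s v-table has slots foo, f and g; the object referenced by both b and c holds a V-pointer to it, followed by ai and bi]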
JVM invokevirtual

A call like:
x.equals("test")
is translated into:
aload_1 ; push local variable 1 (x) onto the operand stack
ldc "test" ; push string "test" onto the operand stack
invokevirtual java/lang/Object/equals(Ljava/lang/Object;)Z
where java/lang/Object/equals(Ljava/lang/Object;)Z is a method specification
When invokevirtual is executed, the JVM looks at the method specification and determines its number of arguments
From the object reference it retrieves the class and searches its list of methods for one matching the method descriptor
If none is found, it searches the superclass
Invokevirtual optimization


The Java compiler can arrange every subclass
method table (mtable) in the same way as its
superclass, ensuring that each method is
located at the same offset
The bytecode can be modified after first
execution, by replacing with:
invokevirtual_quick mtable-offset

Even when called on objects of different types,
the method offset will be the same
Virtual Method in Interface


The optimization does not work for interfaces
interface Incrementable { public void incr(); }
class Counter implements Incrementable {
  public void incr() {…}
}
class Timer implements Incrementable {
  public void decr() {…}
  public void incr() {…}
}
Incrementable i;
i.incr();
The compiler cannot guarantee that method incr() is at the same offset in the method table of every implementing class
Runtime type information




Execution environments may use the v-table pointer as a means of knowing the exact type of an object at runtime
This is what happens in C++ with RTTI, in the .NET CLR and in the JVM
Thus the cost of having exact runtime type information is allocating the v-pointer in all objects
C++ leaves the choice to the programmer: without RTTI no v-pointer is allocated in classes without virtual methods
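On the JVM the exact type is always available, and it can be queried through getClass() and instanceof. A minimal sketch follows; the classes RtBase and RtDerived and the helper exactType are invented for the example.

```java
// Hypothetical two-level hierarchy.
class RtBase {}
class RtDerived extends RtBase {}

class RttiDemo {
    // Reports the class of the object itself, not the type of the reference.
    static String exactType(Object o) {
        return o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        RtBase x = new RtDerived();                        // static type RtBase
        System.out.println(exactType(x));                  // RtDerived
        System.out.println(x instanceof RtDerived);        // true
        System.out.println(x.getClass() == RtBase.class);  // false
    }
}
```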
Overloading


Overloading is the mechanism that a language
may provide to bind more than one object to a
name
Consider the following class:
class A {
void foo() {…}
void foo(int i) {…}
}

The name foo is overloaded and it identifies two
methods
Method overloading






Overloading is mostly used for methods because the compiler may
infer which version of the method should be invoked by looking at
argument types
Behind the scenes the compiler generates a name for the method
which includes the type of the signature (not the return type!)
This process is known as name mangling
In the previous example the name foo_v may be associated to the
first method and foo_i to the second
When the method is invoked the compiler looks at the types of the
arguments used in the call and chooses the appropriate version of
the method
Sometimes implicit conversions may be involved and the resolution
process may lead to more than one method: in this case the call is
considered ambiguous and a compilation error is raised
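The resolution rules can be checked with a few overloads in Java. This is an illustrative sketch with made-up method names; each method returns a label so that the version chosen by the compiler is visible. The ambiguous call is left in a comment because it does not compile.

```java
class OverloadDemo {
    static String foo() { return "foo()"; }
    static String foo(int i) { return "foo(int)"; }
    static String foo(double d) { return "foo(double)"; }

    // Two overloads that differ only in the order of the parameter types.
    static String bar(int i, double d) { return "bar(int,double)"; }
    static String bar(double d, int i) { return "bar(double,int)"; }

    public static void main(String[] args) {
        System.out.println(foo());        // foo()
        System.out.println(foo(3));       // foo(int): exact match
        System.out.println(foo(3.0));     // foo(double): exact match
        System.out.println(foo('c'));     // foo(int): char widens to int, the most specific choice
        System.out.println(bar(1, 2.0));  // bar(int,double): exact match
        // bar(1, 2);  // rejected at compile time: both versions need one conversion, so the call is ambiguous
    }
}
```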
Operator overloading




Though operators such as + and - have a syntax different from function invocation, they identify functions
C++ and other languages (e.g. C#) allow overloading these operators in the same way as ordinary functions and methods
Conceptually each use of + is rewritten into the functional form and the standard overloading process is applied
Example (C++):
c = a + b; // operator=(c, operator+(a, b))
Late binding: an example (Java)
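class A {
  void foo() {…}
  void f() {…}
  int ai;
}
class B extends A {
  void foo(int i) {…}
  void g() {…}
  int bi;
}
A a = new A();
a.foo();
a.f();
B b = new B();
b.foo();
b.g();
b.f();
A c = b;
c.foo(3); // rejected by the compiler: the static type A has no foo(int)
c.f();
[Figure: A’s v-table has slots foo() and f; the object referenced by a holds a V-pointer to it, followed by ai. B’s v-table has slots foo(), f, foo(int) and g; the object referenced by both b and c holds a V-pointer to it, followed by ai and bi]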
Late binding: only on first argument
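class A {
  void foo(A a) {…}
  void f() {…}
  int ai;
}
class B extends A {
  void foo(B b) {…}
  void g() {…}
  int bi;
}
A a = new A();
a.foo(a); // A::foo(A)
a.f();
B b = new B();
b.foo(b); // B::foo(B)
b.g();
b.f();
A c = b;
c.foo(c); // A::foo(A): the overload is chosen from the static type of the argument
c.f();
[Figure: A’s v-table has slots foo(A) and f; the object referenced by a holds a V-pointer to it, followed by ai. B’s v-table has slots foo(A), f, foo(B) and g; the object referenced by both b and c holds a V-pointer to it, followed by ai and bi]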
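The behaviour can be verified with a runnable Java sketch. The class names SdA and SdB are invented stand-ins for A and B, and foo returns a label instead of being void so that the selected version is visible. Since foo(SdB) has a different parameter type it overloads foo(SdA) rather than overriding it, so late binding on the receiver has nothing to redirect to.

```java
class SdA {
    String foo(SdA a) { return "A::foo(A)"; }
}

class SdB extends SdA {
    // Different parameter type: an overload, not an override of foo(SdA).
    String foo(SdB b) { return "B::foo(B)"; }
}

class SingleDispatchDemo {
    public static void main(String[] args) {
        SdB b = new SdB();
        SdA c = b;   // same object, static type SdA

        System.out.println(b.foo(b));  // B::foo(B): both static types are SdB
        System.out.println(b.foo(c));  // A::foo(A): the argument's static type is SdA
        System.out.println(c.foo(c));  // A::foo(A): SdB provides no override of foo(SdA)
    }
}
```

The runtime type of the argument is never consulted: only the receiver takes part in late binding, while the overload is fixed at compile time from the static types.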