Transcript Types (PPT)

Type Systems
Haskell & ML: Interesting Features
• Type inferencing
• Freedom from side effects
• Pattern matching
• Polymorphism
• Support for higher order functions
• Lazy patterns / lazy evaluation
• Support for object-oriented programming
Type Inferencing
• Def: ability of the language to infer types without having
programmer provide type signatures.
SML e.g.:
fun min (a: real, b) = if a > b then b else a
– type of a has to be given, but then that’s sufficient to figure out
• type of b
• type of min
– What if type of a is not specified?
- could be ints
- could be reals
Type Inferencing (cont)
• Haskell (as with ML) guarantees type safety
Haskell example:
(a == b)
– a polymorphic function that has a return type of Bool,
• assumes only that its two arguments are of the same type and can
have the equality operator applied to them.
– ML has similar assumption, for what it calls equality types.
• Overuse of type inferencing in both languages is discouraged
– declarations are a design aid
– declarations are a documentation aid
– declarations are a debugging aid
ML:
fun factorial (0) = 1
  | factorial (n) = n * factorial (n - 1)
– ML infers factorial is an integer function: int -> int
Haskell:
factorial (0) = 1
factorial (n) = n * factorial (n - 1)
– Haskell infers factorial is a (numerical) function: (Eq a, Num a) => a -> a in current GHC (Num a => a -> a under Haskell 98, where Eq was a superclass of Num)
Polymorphism (cont)
ML:
fun mymax(x,y) = if x > y then x else y
– SML cannot resolve the overloaded > from the body alone (SML '97 defaults it to int, giving int * int -> int)
fun mymax(x: real, y) = if x > y then x else y
– SML infers mymax is real * real -> real
Haskell:
mymax(x,y) = if x > y then x else y
– Haskell infers mymax is an Ord function: Ord a => (a, a) -> a
Polymorphism (Cardelli & Wegner)
• Universe, V, of all values
• A Type is a set of values selected from V (subset of V)
• Sometimes only way to enumerate is through constants and functions
• An Ideal is a type that satisfies certain "technical" properties
• (one would not identify a type containing integers and Int-> Int functions)
• All types found in programming languages are ideals
• (Value) Having a type::= membership in a set.
• Because ideals can overlap, a value can have many types
• A type system (in a language) is a collection of ideals of V
• Languages provide support for defining which types are
mappable onto ideals
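A small sketch of the set view above, in Python. The finite universe V and the three type names are invented for illustration; real ideals are infinite sets with closure properties that this toy model ignores.

```python
# Toy model of "a type is a subset of the universe V" (illustrative only).
V = {-1, 0, 1, 2, 2.5, "a"}

# Each type is just a subset of V, picked out by a membership test.
Int = {v for v in V if isinstance(v, int)}
Nat = {v for v in Int if v >= 0}
Number = {v for v in V if isinstance(v, (int, float))}

types = {"Int": Int, "Nat": Nat, "Number": Number}


def types_of(value, types):
    """Return the names of all types (sets) that contain the value."""
    return sorted(name for name, members in types.items() if value in members)


print(types_of(1, types))    # ['Int', 'Nat', 'Number']
print(types_of(2.5, types))  # ['Number']
```

Because the sets overlap (Nat is inside Int, which is inside Number), the value 1 has three types at once, which is the sense in which a value can have many types.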
More Terms
• Monomorphic Type System: a value belongs to at
most one type
• Polymorphic Type System: a value may belong to
many types
• Mostly Monomorphic . . . Mostly Polymorphic
– One or the other characterizes individual languages
• Polymorphism, as it relates to:
– values and variables: may have more than one type
– functions: arguments can be of more than one type
– types: operations are applicable to operands of more than one type
Polymorphism: A Taxonomy
• Universal: infinite number of types with common structure
– Parametric: uniformity of type structure is achieved by type parameters
– Inclusion: object can belong to many different classes that need not be disjoint (subtypes & inheritance)
• Ad Hoc: finite set of potentially unrelated types
– Overloading: same name used to denote different functions; use determined from context
– Coercion: a semantic operation required to convert an argument to a type expected by a function
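The four kinds in the taxonomy can be illustrated in Python. This is only a sketch: the names first, Shape, Square, describe and show are made up here, and Python does not check the annotations at run time, so the parametric case is uniform by convention rather than by enforcement.

```python
from functools import singledispatch
from typing import List, TypeVar

T = TypeVar("T")


# Parametric (universal): one definition, uniform over every element type T.
def first(items: List[T]) -> T:
    return items[0]


# Inclusion (universal): a Square is also a Shape, so it is accepted
# wherever a Shape is expected.
class Shape:
    def area(self) -> float:
        return 0.0


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side * self.side


def describe(shape: Shape) -> str:
    return f"area={shape.area()}"


# Overloading (ad hoc): one name, separate implementations chosen by the
# type of the argument.
@singledispatch
def show(value) -> str:
    return "other"


@show.register
def _(value: int) -> str:
    return "int"


@show.register
def _(value: str) -> str:
    return "str"


# Coercion (ad hoc): the int operand is converted to float for the addition.
coerced = 1 + 2.5

print(first(["x", "y"]), describe(Square(2.0)), show(3), show("x"), coerced)
```

first and describe each have a single body that serves many types (universal), while show has one body per type and the addition relies on a conversion (ad hoc).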
Exploring Terminology…
• Is inclusion polymorphism a kind of parametric polymorphism?
– Consider invocation of a method (behavior) in C++ (Smalltalk): selection
is based (parametrically) on type…
– Why is inclusion polymorphism not a form of parametric polymorphism?
• Are generics (templates) a form of universal polymorphism?
– Cardelli & Wegner: no
– Day et al.: yes (parametric)
• Is there a difference between/among subtypes, subclasses and inheritance?
– Subtypes: derived type’s methods/data subsume parent type’s
– Subclasses: structuring
– Inheritance: subtypes + subclasses -> specialization
Cardelli on Type Systems
• Type system
– purpose is to prevent occurrence of execution errors during runtime
• Type Sound Language
– absence of execution errors holds for all program runs that can be
described in a programming language
• Typechecker
– method for determining if type errors occur
– ambiguities in language specifications often lead to different type checker
implementations, hampering language soundness.
• Type
– “Upper bound” (maximal set) on range of values a variable can take on
• Typed Language
– one in which variables can be given (nontrivial) types
How about “can assume”?
More Cardelli on Types
• Explicit / implicit typing
– as names suggest…
• Trapped errors
– execution error when computation stops “immediately”
• Untrapped errors
– execution errors that go unnoticed and cause arbitrary behavior
• Safe program fragment
– one that does not (cannot?) cause untrapped errors to occur
• Safe language
– one in which all program fragments are safe
Safety and Typed Languages
• “Untyped languages may enforce safety by performing run
time checks.”
• “Typed languages may also use a mixture of run time and
static checks.”
-- Is an untyped language that enforces safety
comprehensively at run time equivalent to a
typed language that uses run time checks?
Off on Good Behavior
• Forbidden errors
– all untrapped errors plus some trapped errors
– (what trapped errors might be included?)
• Good Behavior (well behaved)
– no forbidden errors occur
– a well behaved program fragment is safe
• Strongly checked language
– One in which all (legal) program fragments have good behavior
• no untrapped errors occur
• none of the specified trapped errors occur
• other trapped errors may occur - programmer must avoid them
– (notice avoidance of “strongly typed”)
Safety and Typed
-- Cardelli argues languages should be safe and typed
(Should type system be implicit, or explicit, or both?)
ML Type Inferencing
• Key concepts:
– Type variables
– Substitution
– Unification
– Most general unifiers
– Inferencing
Type Variables/Instances
• Type variables:
tyvar ::= 'identifier
e.g.: 'a 'b 'm
– provide for polymorphism
• Type instances:
int <: 'a              -- int is an instance of 'a
int list <: 'a         -- list of ints is an instance of 'a
int list <: 'a list    -- int list is an instance of list of 'a
int <: 'a list         -- int is NOT an instance of list of 'a
• A precise definition requires substitution for type variables
• Substitution:
– replacement of type variable by another type variable or a concrete type
– e.g. Replacing 'a by int, or 'a by 'b list
• Unification:
– t1 and t2 are unified by substitution s if
s t1 = s t2
– e.g. unification of 'a * int and int * 'b is:
'a --> int,
'b --> int
• (yielding int * int)
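Substitution and the "unified by s" test can be sketched in a few lines of Python. The encoding is an assumption made for this example: a type variable is a string such as "'a", and every other type is a tuple headed by its constructor name. The helper names apply_subst and unifies are likewise hypothetical.

```python
# Types: a type variable is a string such as "'a"; any other type is a tuple
# (constructor, arg1, arg2, ...), e.g. ("int",) or ("*", "'a", ("int",)).


def apply_subst(s, t):
    """Apply substitution s (dict: type variable -> type) to type t."""
    if isinstance(t, str):                      # type variable
        return s.get(t, t)
    return (t[0],) + tuple(apply_subst(s, arg) for arg in t[1:])


def unifies(s, t1, t2):
    """True when substitution s makes t1 and t2 identical: s t1 = s t2."""
    return apply_subst(s, t1) == apply_subst(s, t2)


INT = ("int",)
t1 = ("*", "'a", INT)        # 'a * int
t2 = ("*", INT, "'b")        # int * 'b
s = {"'a": INT, "'b": INT}   # 'a --> int, 'b --> int

print(unifies(s, t1, t2))    # True
print(apply_subst(s, t1))    # ('*', ('int',), ('int',))  i.e. int * int
```

This only checks that a given substitution is a unifier; it does not search for one.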
Most General Unifiers
• Make no unnecessary assumptions:
– 'a list and 'b are unified by 'a --> int list, 'b --> int list list
– 'a list and 'b are unified by 'b --> 'a list
• Which unifier is more general?
s1 is an instance of s2 iff
there exists s such that s1 = s s2
• For example above:
s = 'a --> int list
• The most general unifier of types t1 and t2 is a substitution s
such that:
– t1 and t2 are unified by s
– and there is no more general s’ that also unifies t1 and t2
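The "more general" relation can be checked mechanically for the two unifiers of 'a list and 'b given above. A minimal sketch, assuming a hypothetical encoding where a type variable is a string and any other type is a tuple headed by its constructor; compose is an illustrative helper, not a library function.

```python
def apply_subst(s, t):
    """Apply substitution s (dict: type variable -> type) to type t."""
    if isinstance(t, str):                      # type variable
        return s.get(t, t)
    return (t[0],) + tuple(apply_subst(s, arg) for arg in t[1:])


def compose(s, s2):
    """Return the substitution 's s2': apply s2 first, then s."""
    result = {v: apply_subst(s, t) for v, t in s2.items()}
    for v, t in s.items():
        result.setdefault(v, t)     # bindings of s not overridden by s2
    return result


INT = ("int",)


def lst(t):
    return ("list", t)


s1 = {"'a": lst(INT), "'b": lst(lst(INT))}   # 'a --> int list, 'b --> int list list
s2 = {"'b": lst("'a")}                       # 'b --> 'a list
s = {"'a": lst(INT)}                         # 'a --> int list

print(compose(s, s2) == s1)   # True: s1 is an instance of s2
```

Since s1 can be obtained from s2 by a further substitution but not the other way round, s2 is the more general of the two.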
ML Type Inferencing Example
fun find (p, []) = false
  | find (p, x::S) = if p x then true
                     else find (p, S)
• Initial type environment:
bool *
‘e *
• No assumptions about find:
• lambda: new environment with fresh type variables for parameters:
ML Type Inferencing Example (2)
fun find (p, []) = false
  | find (p, x::S) = if p x then true
                     else find (p, S)
• Analysis:
x::S                       implies 'j --> 'c list
p x                        implies 'i --> 'c -> 'l
if p x                     implies 'l --> bool
if ... true else find ...  implies 'k --> 'm -> bool
find p ...                 implies 'm --> ('c -> bool) * 'n
find ... S                 implies 'n --> 'c list
• Composing all substitutions yields:
p    : 'c -> bool
S    : 'c list
find : ('c -> bool) * 'c list -> bool
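The composition step can be reproduced with a small unifier. This is a sketch under stated assumptions, not the ML compiler's algorithm: types are encoded as strings (variables) and tuples (constructors), the six equations are copied from the analysis, and the reading that p has type 'i, S has type 'j and find has type 'k is inferred from those equations rather than stated on the slide.

```python
def resolve(t, s):
    """Fully apply substitution s to type t, following chains of bindings."""
    if isinstance(t, str):
        return resolve(s[t], s) if t in s else t
    return (t[0],) + tuple(resolve(arg, s) for arg in t[1:])


def occurs(v, t, s):
    """True if type variable v appears inside type t (under s)."""
    t = resolve(t, s)
    if isinstance(t, str):
        return v == t
    return any(occurs(v, arg, s) for arg in t[1:])


def unify(t1, t2, s):
    """Extend substitution s so that t1 and t2 become equal, or raise TypeError."""
    t1, t2 = resolve(t1, s), resolve(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str):
        if occurs(t1, t2, s):
            raise TypeError("circular type")
        return {**s, t1: t2}
    if isinstance(t2, str):
        return unify(t2, t1, s)
    if t1[0] != t2[0] or len(t1) != len(t2):
        raise TypeError(f"cannot unify {t1} with {t2}")
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
    return s


def show(t):
    """Render a type in ML-like notation (fully parenthesised)."""
    if isinstance(t, str):
        return t
    if t[0] == "list":
        return f"{show(t[1])} list"
    if t[0] in ("->", "*"):
        return f"({show(t[1])} {t[0]} {show(t[2])})"
    return t[0]


BOOL = ("bool",)
equations = [
    ("'j", ("list", "'c")),                      # x::S
    ("'i", ("->", "'c", "'l")),                  # p x
    ("'l", BOOL),                                # if p x
    ("'k", ("->", "'m", BOOL)),                  # if ... true else find ...
    ("'m", ("*", ("->", "'c", BOOL), "'n")),     # find p ...
    ("'n", ("list", "'c")),                      # find ... S
]

s = {}
for left, right in equations:
    s = unify(left, right, s)

print("p    :", show(resolve("'i", s)))
print("S    :", show(resolve("'j", s)))
print("find :", show(resolve("'k", s)))
```

Up to the extra parentheses printed by show, the three results match the composed types: 'c -> bool, 'c list, and ('c -> bool) * 'c list -> bool.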
Constraint-Based Type Inference and
Parametric Polymorphism
Ole Agesen (Stanford) 1994
• Constraint-based analysis: technique for inferring
implementation types
• Using flow analysis to build network of type variables
connected by constraints
• Program can be viewed as collection of slots and expressions
// slot declaration
x := y + 1
// Expression (y+1) and slots (x,y)
Three Step Process: Steps 1 & 2
• Allocating "type variables" to every slot and expression
– Initially empty
– Process of type inference binds types to type variables
– On termination of inferencing, type variables hold "sound" types
– Inference exhibits monotonicity: type variables have types
added, only.
• Seeding type variables
– Find obvious cases where type is known and assign to type variable
x <- 0
// x’s type is the type of the literal 0.
Three Step Process: Step 3
• Establishing constraints and propagating (repeat until no change)
– Connect type variables into a network by adding directed edges
• Nodes are type variables; edges are constraints
– Whenever constraint is added, object types are propagated
– One constraint generated for each data flow in the program
• assignment generates data flow from expr to assigned variable
• variable access gens data flow from variable to accessing expression
• message send (or func call) generates flows from actuals to formals
and result generates flow back to message send (invocation point)
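The three steps can be sketched as a worklist propagation over a graph of type variables. This is a simplified illustration, not Agesen's implementation: the node names are strings chosen here, the seeding assignment of the literal 0 to y is hypothetical, and the + send is modelled as a primitive whose result has the type of its receiver instead of being connected to a method template.

```python
from collections import defaultdict, deque


def infer(seeds, flows):
    """Propagate object types along directed constraints until nothing changes.

    seeds: dict node -> set of types known up front (step 2)
    flows: list of (source, target) data-flow edges, i.e. constraints (step 3)
    """
    types = defaultdict(set)          # step 1: every type variable starts empty
    edges = defaultdict(list)
    for src, dst in flows:
        edges[src].append(dst)

    work = deque()
    for node, known in seeds.items():
        types[node] |= known
        work.append(node)

    while work:
        node = work.popleft()
        for dst in edges[node]:
            before = len(types[dst])
            types[dst] |= types[node]      # monotonic: types are only added
            if len(types[dst]) > before:
                work.append(dst)           # dst changed, so propagate further
    return dict(types)


# Program (hypothetical seeding):  y := 0.  x := y + 1
seeds = {"literal 0": {"integer"}}
flows = [
    ("literal 0", "y"),   # assignment: expression -> assigned variable
    ("y", "y + 1"),       # variable access, with + assumed to preserve the type
    ("y + 1", "x"),       # assignment: expression -> assigned variable
]

result = infer(seeds, flows)
print(result["x"])        # {'integer'}
```

The loop stops because each set can only grow and the number of possible types is finite.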
IF Expression Data
IF Expression
(IF test-expr THEN-expr ELSE-expr)
Type of whole expression is union of types of THEN-expr and ELSE-expr
max: a = ( self > a ifTrue: [self] False: [a] ).
Template Examples
Left: 3 max: 4 and 2 max: 1 share one max: template
Right: 3 max: 4 and 2.5 max: 1.3 share one max: template, mixing types to [integer, float]
Inference Algorithms
• Basic - just saw it.
– works when all uses of a method are "similar"
– fails when two or more uses of method supply different types of
arguments, whether or not uses are individually polymorphic
• only one max template is created; may need more than one
• 1-level expansion: retype each method for each send invoking it.
– (would separate template shown on right on previous slide into two)
– Works when polymorphic call chain is only one level deep
– Fails when polymorphic call chain is > 1 level deep (the usual case)
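The difference between the basic algorithm and 1-level expansion comes down to which template a send is connected to. The sketch below is a deliberately reduced model: sends carry literal receiver and argument types, the result of max: is taken to be the union of the types of self and a, and the function name analyze and its key parameter are invented for this example.

```python
def analyze(sends, key):
    """Infer result types of max: sends.

    sends: dict send -> (receiver type, argument type)
    key:   function choosing which template a send is connected to
    """
    templates = {}
    for send, (rcvr, arg) in sends.items():
        template = templates.setdefault(key(send), {"self": set(), "a": set()})
        template["self"].add(rcvr)     # actual receiver flows to formal self
        template["a"].add(arg)         # actual argument flows to formal a
    # The result of max: is the union of its two branches, self and a,
    # and flows back to every send connected to the template.
    return {send: templates[key(send)]["self"] | templates[key(send)]["a"]
            for send in sends}


sends = {
    "3 max: 4": ("integer", "integer"),
    "2.5 max: 1.3": ("float", "float"),
}

basic = analyze(sends, key=lambda send: "max:")    # one shared template
one_level = analyze(sends, key=lambda send: send)  # one template per send

print(basic["3 max: 4"])       # integer and float mixed
print(one_level["3 max: 4"])   # {'integer'}
```

With one shared template both sends are reported as [integer, float]; giving each send its own template keeps them apart, at the cost of analysing max: twice.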
1-Level Expansion Algorithm: Failure
P-Level Expansion Algorithm
• Generalization of 1-Level algorithm
– expand to depth of p, then apply basic algorithm
• Size of expanded program is exponential in p !!!
• For any p, it’s possible to find a code sequence that
requires p+1 expansions.
Adaptive Inference Algorithms
• Precise inference algorithms must not mix types
– Types mix if two incompatible activation records are represented by the
same template
– Create lots of templates so they represent fewer activation records
• Efficient type inference requires processing as few templates as possible
– Template creation carries a computational cost
• 1-level algorithm does poorly in both cases
• Desire algorithm that is precise and efficient
• Adaptive algorithms attempt to create templates only when needed
– Lump similar activation records in one template when possible
• Success is when algorithm operates at a cost proportional to the
amount of polymorphism in the program.
Inference Algorithms
A critical condition in adaptive inference algorithms:
share a template <=> type(rcvrExp1, N1) = type(rcvrExp2, N2)
                 and type(argExp1, N1) = type(argExp2, N2)
for the sends (in contexts N1 and N2 resp.):
rcvrExp1 max: argExp1
rcvrExp2 max: argExp2
i.e. two uses of max: can share same template if they have same receiver and
same argument types resp.
Problem is: we’re trying to infer their types!
Adaptive algorithms use partial type info (or anything else) at decision point
Hash Function Algorithm
Improves precision and efficiency significantly
Hash function is computed when use of a method is processed:
hash: Send x Template -> HashValue
maps a use (send in context of a template) to a hash value
template can be shared by two uses iff they have same hash value
Tradeoffs between precision and efficiency determined by hash function
Assume send S is being analyzed in context of a template N (for method
containing S); analysis done for each possible receiver of the send:
hash_family(S, N) = {(r1, S), (r2, S), ..., (rk, S)}
where type(S.receiver, N) = {r1, r2, ..., rk}
Hash Function Algorithm (2)
For hash value (ri, S), the first component is a possible receiver:
two uses of method can share template only if they have same receiver object
Precision improved in two ways:
Inherited methods are reanalyzed in context of every object that inherits them
Sends to self (common) can be analyzed more precisely because a single target
can be found
Second component of hash value is send itself
=> different sends connect to different templates
essentially, an implementation of 1-level algorithm
Sends with no arguments hashed to same template
Note: receiver type may not be fully known when hash is done
OK. Just go back and rehash when receiver type grows (monotonically)
Hash Function Algorithm (3)
• Algorithm controls polymorphism at receiver well
444 value: nil With: nil With: nil.
3.5 value: 100 With: 100 With: 100
will map to receiver types of [integer] and [float] resp. i.e. the two
sends do not interfere. Two distinct sets of templates will be created.
Hash algorithm clearly fails on polymorphic arguments since
argument types are not considered
• One work-around is to always force templates to not be shared for
methods such as ifTrue:False:
– Has obvious computational costs
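A rough sketch of the hash function and of why it separates receivers but not arguments. The send labels S1, S2, S3 and the connect helper are hypothetical; S1 and S2 stand for the two value:With:With: sends above, and S3 is an invented send whose receiver and argument expressions are both polymorphic.

```python
def hash_family(send, receiver_types):
    """hash_family(S, N): one hash value (r, S) per possible receiver r of S."""
    return {(r, send) for r in receiver_types}


def connect(sends):
    """Map each hash value to a template, recording the argument types it sees.

    sends: list of (send label, receiver types, argument types)
    """
    templates = {}
    for send, rcvr_types, arg_types in sends:
        for h in hash_family(send, rcvr_types):
            templates.setdefault(h, set()).update(arg_types)
    return templates


templates = connect([
    ("S1", {"integer"}, {"nil"}),                        # 444 value: nil With: ...
    ("S2", {"float"}, {"integer"}),                      # 3.5 value: 100 With: ...
    ("S3", {"integer", "float"}, {"integer", "nil"}),    # hypothetical
])

for h in sorted(templates):
    print(h, sorted(templates[h]))
```

S1 and S2 land in different templates, so they do not interfere. S3 is split by receiver into two templates, but each of them still sees both argument types, because the hash value never looks at arguments.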
Iterative Algorithm
• A significant improvement over hash function algorithm
• First iteration is to apply the basic algorithm
– creates one shared template for each method
• In subsequent iterations, less is shared
• Key idea: use type information on previous iteration to decide
whether or not to share templates in current iteration.
Share a template <=> type_{i-1}(rcvrExp1, N1) = type_{i-1}(rcvrExp2, N2)
                 and type_{i-1}(argExp1, N1) = type_{i-1}(argExp2, N2)
rcvrExp1 max: argExp1 (in context N1)
rcvrExp2 max: argExp2 (in context N2)
Iterative Algorithm (2)
• Advantage: complete type information is available from previous iteration
• Iterative algorithm has more information available when making
critical decision
• Iterative algorithm uses types of both receiver and arguments (but
from the previous iteration)
• Termination is a key consideration
– after fixed number of steps (e.g. 5 - 7)
– when a fix point is reached
• may never be reached in face of recursion
• With this algorithm (using fix-point termination), analysis time is
proportional to amount of polymorphism in the program.
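The iteration scheme can be sketched on a two-level chain of max: sends. This is a heavily simplified model, not the published algorithm: every send is a max: send, an operand is either a literal type name or the label of another send, a template is reduced to a single set holding the types of self and a together, and all names (run_iteration, iterative, in1, out1, ...) are invented for the example.

```python
def run_iteration(sends, key_of):
    """Basic algorithm with templates chosen by key_of(send label)."""
    result = {name: set() for name in sends}       # result type of each send

    def expr_type(e):
        return result[e] if e in sends else {e}    # nested send or literal

    changed = True
    while changed:                                 # propagate to a fix point
        changed = False
        templates = {}
        for name, (rcvr, arg) in sends.items():
            template = templates.setdefault(key_of(name), set())
            template |= expr_type(rcvr) | expr_type(arg)
        for name in sends:
            new = templates[key_of(name)]
            if new != result[name]:
                result[name], changed = set(new), True
    return result


def iterative(sends, max_iterations=5):
    prev = run_iteration(sends, key_of=lambda name: "max:")   # iteration 1: basic
    for _ in range(max_iterations - 1):
        def expr_type(e, prev=prev):
            return frozenset(prev[e]) if e in sends else frozenset({e})

        def key_of(name, expr_type=expr_type):
            rcvr, arg = sends[name]
            # share a template iff receiver and argument types agreed
            # in the previous iteration
            return (expr_type(rcvr), expr_type(arg))

        current = run_iteration(sends, key_of)
        if current == prev:                        # fix point reached
            break
        prev = current
    return prev


sends = {
    "in1": ("integer", "integer"),    # 3 max: 4
    "in2": ("float", "float"),        # 2.5 max: 1.3
    "out1": ("in1", "integer"),       # (3 max: 4) max: 5
    "out2": ("in2", "float"),         # (2.5 max: 1.3) max: 0.5
}

print(run_iteration(sends, key_of=lambda name: "max:")["out1"])  # types mixed
print(iterative(sends)["out1"])                                  # {'integer'}
```

In this example the first iteration mixes integer and float everywhere, the second separates all four sends, and the third regroups them into two templates without changing any result, so the loop stops.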
