Formalization of Generics for the .NET Common Language Runtime Dachuan Yu (Yale University) Andrew Kennedy, Don Syme (Microsoft Research Cambridge)


Introduction

- Upcoming revision of Microsoft .NET platform includes support for parametric polymorphism (“generics”) in:
  - Programming languages: C#, Visual Basic, Managed C++
  - Common Language Runtime (the “virtual machine”)
  - Visual Studio (Integrated Development Environment)
  - Libraries
- Previous work (PLDI’01) described implementation techniques used in the CLR
- Now we formalize the polymorphic intermediate language and aspects of the implementation
2
CLR: The big picture

[Diagram: C#, Visual Basic and SML.NET programs each pass through their own compiler to produce IL. The Common Language Runtime comprises the loader & JIT front-end, JIT IL, JIT code-gen (producing machine code), native interop (with native binaries), remoting, threads, garbage collector, exception handling and security.]
4
High-level design of generics

- Type parameterization for all declarations:
  - classes, e.g. class Set<T>
  - interfaces, e.g. interface IComparable<T>
  - structs, e.g. struct HashBucket<K,D>
  - methods, e.g. static void Reverse<T>(T[] arr)
  - delegates (“first-class methods”), e.g. delegate void Action<T>(T arg)
5
Good design => Tricky Implementation

- Unrestricted instantiation

    List<string> ls = new List<string>();    // reference types
    List<double> ld = …                      // primitive types
    List<Pair<string,double>> lsd = …        // struct types

- Full support for run-time types

    if (x is Set<string>) { ... }            // type-test
    y = (List<T>) z;                         // checked cast

- Recursion in instantiations

    class List<T> : ICloneable<List<T>>      // finite
    class C<T> { C<C<T>> fld; }              // infinite
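
The run-time type support listed above can be observed directly in C#. The following is a minimal sketch, not taken from the talk: the class and method names are hypothetical, and `HashSet<T>` from the standard library stands in for the slide's `Set<T>`.

```csharp
using System;
using System.Collections.Generic;

public static class RuntimeTypesDemo
{
    // Type-test: succeeds only for the exact instantiation HashSet<string>.
    public static bool IsStringSet(object x)
    {
        return x is HashSet<string>;
    }

    // Checked cast inside generic code: throws InvalidCastException
    // unless z really is a List<T> for the caller's choice of T.
    public static List<T> AsList<T>(object z)
    {
        return (List<T>)z;
    }

    public static void Main()
    {
        Console.WriteLine(IsStringSet(new HashSet<string>())); // True
        Console.WriteLine(IsStringSet(new HashSet<int>()));    // False

        object doubles = new List<double>();
        Console.WriteLine(AsList<double>(doubles).Count);      // 0
        try
        {
            AsList<string>(doubles);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("cast rejected");
        }
    }
}
```

Both operations need the exact instantiation at run time, which is why the type arguments cannot simply be erased.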
6
Why formalize?

- In previous work (POPL’01, Gordon & Syme) the aim was a type soundness proof for a subset of IL (Baby IL)
- Our aims are different:
  - Implementation techniques used in the CLR product are subtle and difficult to get right (=> bugs, perhaps security holes)
  - Current JIT- and pre-compilers are not type-preserving
  - Our formalization provides a basis for typed compiler intermediate languages for more capable and robust compilers
  - It’s also difficult to express and apply optimizations
  - We’d like to validate those techniques
  - Formalization makes this easier
- By-product is a generic variant on Baby IL
7
Formalization: the big picture

- Source: BILG classes and methods
  - BILG = “Baby IL with Generics”
  - A tiny subset of MS-IL
- Translation steps:
  - Specialize generic classes and methods
  - Share instantiations w.r.t. data representation
  - Introduce types-as-values
  - Optimize use of types-as-values
- Target: BILC classes and methods
  - BILC = “Baby IL with Constrained generics”
  - A typed intermediate language more suitable for code-generation
8
Illustrative example, in C#

- Want to share generated code for ArrayToList over different instantiations of T

    class ArrayUtils {
      static List<T> ArrayToList<T>(T[] arr)
      {
        …new List<T>()…        // Pass type parameters at runtime?
      }
    }

- Want to share generated code for List over different instantiations of T

    class List<T> {
      virtual List<T> Append(object obj) {
        …(List<T>) obj…        // Look up type representations at runtime?
        …new ListCell<T>…      // Look up type representations at runtime?
      }
    }

- How do we know what T is?
9
Source Language: BILG

- “Baby IL with Generics”
- Purely functional, à la Featherweight Java (Igarashi, Pierce, Wadler)
  - Primitive types & generic classes
  - Inheritance-based subtyping
  - Generic methods (static and virtual)
  - Type-case operation (isinst) inspects run-time type of object
  - No overloading, no interfaces, no abstract methods, no structs (“value classes”), no delegates, no boxing, no null values, no heap, no bounded polymorphism
- Just enough to demonstrate most of the implementation techniques!
- Typing rules & big-step semantics in paper
  - Easier to work with big-step
  - ¬∃v. e ⇓ v taken as definition of divergence
10
Source language: BILG

(type)        T,U ::= X | int32 | int64 | I
(inst type)   I   ::= C<T1,…,Tn>
(class def)   cd  ::= class C<X1,…,Xn> : I {T1 f1;…; Tm fm; md1 … mdk}
(method def)  md  ::= static T m<X1,…,Xn>(T1,…,Tm) { e; }
                   |  virtual T m<X1,…,Xn>(T1,…,Tm) { e; }
(method ref)  M   ::= I::m<T1,…,Tn>
(expr)        e   ::= ldc.i4 i4 | ldc.i8 i8 | ldarg x
                   |  e1 … en newobj I
                   |  e ldfld I::f
                   |  e1 … en call M
                   |  e e1 … en callvirt M
                   |  e isinst I or e
11
BILG typing and evaluation for isinst

Typing:

\[
\frac{E \vdash e : I \qquad E \vdash e' : I'}
     {E \vdash e\ \mathtt{isinst}\ I'\ \mathtt{or}\ e' : I'}
\]

Evaluation, when the test succeeds:

\[
\frac{\mathit{fr} \vdash e \Downarrow I'(f_1 = v_1,\dots,f_n = v_n) \qquad \vdash I' <: I}
     {\mathit{fr} \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow I'(f_1 = v_1,\dots,f_n = v_n)}
\]

Evaluation, when the test fails:

\[
\frac{\mathit{fr} \vdash e \Downarrow I'(f_1 = v_1,\dots,f_n = v_n) \qquad \vdash \neg(I' <: I) \qquad \mathit{fr} \vdash e' \Downarrow v'}
     {\mathit{fr} \vdash e\ \mathtt{isinst}\ I\ \mathtt{or}\ e' \Downarrow v'}
\]

Observe:
- Types affect evaluation
- They cannot be erased
- They serve static and dynamic purposes
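
The behaviour of `e isinst I or e'` can be approximated in C#. This is only a sketch under assumptions: the helper name `IsInstOr` is hypothetical, and C# objects stand in for BILG values.

```csharp
using System;
using System.Collections.Generic;

public static class IsInstDemo
{
    // Models "e isinst I or e'": yields e when its run-time type is a
    // subtype of I, and the fallback e' otherwise.
    public static I IsInstOr<I>(object e, I fallback)
    {
        if (e is I) return (I)e;
        return fallback;
    }

    public static void Main()
    {
        List<string> fallback = new List<string>();
        object strings = new List<string> { "a" };
        object ints = new List<int> { 1 };

        // Test succeeds: the original object is returned.
        Console.WriteLine(object.ReferenceEquals(IsInstOr(strings, fallback), strings));
        // Test fails: the fallback is returned.
        Console.WriteLine(object.ReferenceEquals(IsInstOr(ints, fallback), fallback));
    }
}
```

Which branch is taken depends on the instantiation `I` chosen by the caller, so the result of evaluation depends on a type.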
13
Target Language: BILC

- Similar to BILG, but adds
  - Representation constraints on type parameters
    - ref: “must be a reference type”
    - i4: “must be a 32-bit integer”
    - i8: “must be a 64-bit integer”
  - Types-as-values
    - R_T is a value representing closed type T
    - The value R_T has singleton type Rep(T), interpreted as “is a value representing the type T”
    - Construct reps for open types: mkrep_{C<T1,…,Tn>}(e1,…,en) creates a type-rep for C<T1,…,Tn> given type-reps for T1,…,Tn
- Semantics given by small-step reduction relation
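
Types-as-values can be modelled in C# as follows. This is a rough sketch, not the CLR's internal representation: `Rep<T>`, `Reps.Of` and `Reps.MkRepList` are hypothetical names, and `System.Type` plays the role of the run-time value R_T.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the singleton type Rep(T): the type parameter
// records statically which type the run-time value stands for.
public sealed class Rep<T>
{
    public readonly Type RuntimeType;
    internal Rep(Type runtimeType) { RuntimeType = runtimeType; }
}

public static class Reps
{
    // R_T: the representation of a closed type T.
    public static Rep<T> Of<T>()
    {
        return new Rep<T>(typeof(T));
    }

    // mkrep for List<T>: builds the rep of List<T> from a rep of T,
    // constructing the instantiated type at run time.
    public static Rep<List<T>> MkRepList<T>(Rep<T> r)
    {
        Type built = typeof(List<>).MakeGenericType(r.RuntimeType);
        return new Rep<List<T>>(built);
    }

    public static void Main()
    {
        Rep<string> rs = Of<string>();
        Rep<List<string>> rl = MkRepList(rs);
        Console.WriteLine(rl.RuntimeType == typeof(List<string>)); // True
    }
}
```

The static type `Rep<List<T>>` of the result mirrors the typing of mkrep, while the `MakeGenericType` call stands for the run-time work that mkrep performs.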
14
Target language: BILC (subset)

(type)            T,U ::= X | int32 | int64 | I
(inst type)       I   ::= C<T1,…,Tn>
(extended types)  τ   ::= T | Rep(T)
(constraint)      s   ::= ref | i4 | i8
(class def)       cd  ::= class C<X1:s1,…,Xn:sn> : I {T1 f1;…; Tm fm; md1 … mdk}
(method def)      md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
                       |  virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
(method ref)      M   ::= I::m<T1,…,Tn>
(expr)            e   ::= i4 | i8 | x
                       |  I(e,e1,…,en)
                       |  e ldfld I::f
                       |  e1 … en call M
                       |  e e1 … en callvirt M
                       |  e isinst_I e or e
                       |  R_T
                       |  mkrep_{C<T1,…,Tn>}(e1,…,en)
15
Some typing and reduction rules

Typing:

\[
\frac{E \vdash C<T_1,\dots,T_n>\ \mathrm{ok} \qquad E \vdash e_1 : \mathrm{Rep}(T_1) \quad \dots \quad E \vdash e_n : \mathrm{Rep}(T_n)}
     {E \vdash \mathtt{mkrep}_{C<T_1,\dots,T_n>}(e_1,\dots,e_n) : \mathrm{Rep}(C<T_1,\dots,T_n>)}
\]

\[
\frac{E \vdash e : I' \qquad E \vdash e' : \mathrm{Rep}(I) \qquad E \vdash e'' : I}
     {E \vdash e\ \mathtt{isinst}_I\ e'\ \mathtt{or}\ e'' : I}
\]

“Reflected subtyping”: $R_I \prec R_{I'}$ iff $I <: I'$

Reduction:

\[
\frac{v = I(w,v_1,\dots,v_n) \qquad w \prec w'}
     {\vdash (v\ \mathtt{isinst}_T\ w'\ \mathtt{or}\ v') \to v}
\]

\[
\frac{v = I(w,v_1,\dots,v_n) \qquad w \nprec w'}
     {\vdash (v\ \mathtt{isinst}_T\ w'\ \mathtt{or}\ v') \to v'}
\]

Observe:
- Types do not affect evaluation
- They can be erased
- They serve only static purposes
17
Example

- Static generic method in BILG:

    static List<T> Conv<T>(object a) {
      …a isinst List<T>…
    }

- Translated to BILC:

    // Specialized code for T = int32
    static List_i Conv_i(object a) {
      …a isinst_{List_i} R_{List_i}…
    }

    // Specialized code for T = int64
    static List_l Conv_l(object a) {
      …a isinst_{List_l} R_{List_l}…
    }

    // Code shared for reference types; extra parameter r representing T
    static List_r<T> Conv_r<T:ref>(Rep(T) r, object a) {
      …a isinst_{List_r<T>} (mkrep_{List_r<T>}(r))…    // lookup/create type rep at runtime
    }
18
We need more…

- So far:
  - specialization, sharing, and separation of run-time types from static types
  - but mkrep is a costly operation, requiring type-rep creation at runtime
- Idea: instead of passing representations for type parameters, pass representations of types that we actually need:

    // Extra parameter r representing List<T>
    static List_r<T> Conv_r<T:ref>(Rep(List_r<T>) r, object a) {
      …a isinst_{List_r<T>}(r)…
    }
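
The difference between the two versions of Conv can be sketched in C#. Everything here is an illustrative assumption rather than the CLR's mechanism: `Rep<T>`, `MkRepList`, `IsInstOr`, the counter `MkRepCalls`, and the empty list used as the `or` branch (which the slides elide).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of Rep(T): a run-time type value tagged with the
// static type it represents. Callers are trusted to pair them correctly.
public sealed class Rep<T>
{
    public readonly Type RuntimeType;
    public Rep(Type runtimeType) { RuntimeType = runtimeType; }
}

public static class ConvDemo
{
    public static int MkRepCalls = 0; // counts run-time rep constructions

    // mkrep for List<T>: builds the rep of List<T> from the rep of T.
    static Rep<List<T>> MkRepList<T>(Rep<T> r)
    {
        MkRepCalls++;
        return new Rep<List<T>>(typeof(List<>).MakeGenericType(r.RuntimeType));
    }

    // Models "a isinst rep or fallback": the test consults only the rep value.
    static List<T> IsInstOr<T>(object a, Rep<List<T>> rep, List<T> fallback)
    {
        return rep.RuntimeType.IsInstanceOfType(a) ? (List<T>)a : fallback;
    }

    // First translation: receives Rep(T) and calls mkrep on every invocation.
    public static List<T> ConvFromParamRep<T>(Rep<T> r, object a) where T : class
    {
        return IsInstOr(a, MkRepList(r), new List<T>());
    }

    // Refined translation: receives the rep it actually needs, Rep(List<T>).
    public static List<T> ConvFromNeededRep<T>(Rep<List<T>> r, object a) where T : class
    {
        return IsInstOr(a, r, new List<T>());
    }

    public static void Main()
    {
        object a = new List<string> { "x" };
        ConvFromParamRep(new Rep<string>(typeof(string)), a);
        ConvFromNeededRep(new Rep<List<string>>(typeof(List<string>)), a);
        Console.WriteLine(MkRepCalls); // 1: only the first version built a rep
    }
}
```

In the refined version the cost of building the rep moves to whoever supplies the parameter, who can build it once and reuse it.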
19
We need more…

- In general, we need many type-reps in a single method body
  - So we pass around dictionaries of type-reps
- What type does a dictionary of type-reps have?
  - At its simplest, it is just a tuple, e.g. Rep(List<X>) × Rep(Vec<Vec<X>>) is type of a two-slot dictionary containing type-reps for List<X> and Vec<Vec<X>>
  - In general, dictionaries may contain cycles (e.g. for mutually recursive methods), so we need recursive values and their types
  - Worse still, polymorphic recursion requires “infinite” dictionaries
- Simpler: use name-based types for dictionaries
  - reps for methods: Rep(M), R_M, mkrep_M(e1,…,en)
  - statically: each Rep-type determines a particular tuple of other Rep-types
  - dynamically: each type-rep R_T or method-rep R_M determines a tuple of type-rep/method-rep values
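
The simplest case, a dictionary that is just a tuple, might look like this in C#. It is a sketch only: `Rep<T>`, `Vec<T>`, `MakeDict` and `IsListOrNestedVec` are hypothetical names, and `System.Tuple` stands in for the product type.

```csharp
using System;
using System.Collections.Generic;

public class Vec<T> { } // hypothetical stand-in for the slides' Vec<X>

// Hypothetical model of Rep(T), as a tagged run-time type value.
public sealed class Rep<T>
{
    public readonly Type RuntimeType;
    public Rep(Type runtimeType) { RuntimeType = runtimeType; }
}

public static class DictionaryDemo
{
    // Builds a two-slot dictionary of type Rep(List<X>) x Rep(Vec<Vec<X>>).
    public static Tuple<Rep<List<X>>, Rep<Vec<Vec<X>>>> MakeDict<X>()
    {
        return Tuple.Create(
            new Rep<List<X>>(typeof(List<X>)),
            new Rep<Vec<Vec<X>>>(typeof(Vec<Vec<X>>)));
    }

    // Shared code takes the dictionary and reads each rep from its slot.
    public static bool IsListOrNestedVec<X>(
        Tuple<Rep<List<X>>, Rep<Vec<Vec<X>>>> dict, object o)
    {
        return dict.Item1.RuntimeType.IsInstanceOfType(o)
            || dict.Item2.RuntimeType.IsInstanceOfType(o);
    }

    public static void Main()
    {
        var dict = MakeDict<string>();
        Console.WriteLine(IsListOrNestedVec(dict, new List<string>()));     // True
        Console.WriteLine(IsListOrNestedVec(dict, new Vec<Vec<string>>())); // True
        Console.WriteLine(IsListOrNestedVec(dict, new Vec<string>()));      // False
    }
}
```

A flat tuple like this cannot describe cyclic or unbounded dictionaries, which is the motivation the slide gives for name-based dictionary types.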
20
Target language: BILC (full)

(type)        T,U ::= X | int32 | int64 | I
(inst type)   I   ::= C<T1,…,Tn>
(ext type)    τ   ::= T | Rep(T) | Rep(M)
(constraint)  s   ::= ref | i4 | i8
(class def)   cd  ::= class C<X1:s1,…,Xn:sn> : I {T1 f1;…; Tm fm; md1 … mdk} with τ1,…,τp
(method def)  md  ::= static T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; } with τ1,…,τp
                   |  virtual T m<X1:s1,…,Xn:sn>(τ1,…,τk) { e; }
(method ref)  M   ::= I::m<T1,…,Tn>
(expr)        e   ::= i4 | i8 | x
                   |  I(e,e1,…,en)
                   |  e ldfld I::f
                   |  e1 … en call M
                   |  e e1 … en callvirt M
                   |  e isinst_I e or e
                   |  R_T | R_M
                   |  mkrep_{C<T1,…,Tn>}(e1,…,en)
                   |  mkrep_{C<T1,…,Tn>::m<U1,…,Uk>}(e1,…,en,e1',…,ek')
                   |  objdict_i e
                   |  mdict_i e
21
Translation scheme

- Static generic methods:
  - Extra dictionary parameter associated with method
  - Accessed using mdict_i(e)
- Virtual methods in generic classes:
  - Obtain dictionary through type of object
  - Accessed using objdict_i(e)
- Generic virtual methods:
  - Dictionary type not known statically (body could be overridden)
  - So pass reps for type parameters and construct type-reps at runtime using mkrep
22
In the paper…

- Complete formalization of BILG, BILC, and a translation
- Theorems:
  - Translation preserves types
  - Translation preserves behaviour
- And in forthcoming technical report:
  - Full proofs
  - Type erasure theorem: types in BILC do not affect evaluation
23
Future work

- Extend BILG and the translation to cover more features
  - Value classes (structs)
    - Would satisfy representation constraint of form [s1,…,sn] where s1,…,sn are constraints on the fields’ representations
    - Now have unbounded number of specializations
    - All methods on generic structs whose code is shared take a dictionary parameter
    - Need treatment of boxing
  - Flexible specialization policies
    - Less sharing: e.g. full specialization of selected types
    - More sharing: e.g. share all instantiations of C<T> by boxing and unboxing appropriately (cf. ML)
24
Future work: structural typing

- Flexible specialization interacts badly with run-time types based on name-equivalence
- Instead, describe dictionaries using structural typing:
  - Products: Rep(List<X>) × Rep(X) is two-slot dictionary with type-reps for List<X> and X
  - Circular dictionaries => Recursive types, e.g. μD. Rep(Vec<X>) × (Rep(Set<X>) × D)
  - Polymorphic recursion in code => Higher-kinded recursive types, e.g. (μD. λX. Rep(Vec<X>) × D(Set<X>)) string
25
Related work

- Rep(T)
  - Crary, Weirich, Morrisett: “Intensional polymorphism in type-erasure semantics”
- Dictionary-passing for polymorphism implementation
  - Saha and Shao (ML)
  - Viroli and Natali (Java)
26
Questions?
27