Virtual Machines


Antonio Cisternino
Vincenzo Gervasi
Introduction
Goal: explore the world of virtual machines, using both practical and formal methods
• Overview:
  ○ Review of programming language notions
  ○ Anatomy of a VM and its implementation
  ○ Formal approaches (ASM, operational semantics)
  ○ Bytecode analysis
  ○ Program generation
Architecture of a compiler
[Figure: a compiler split into front end and back end, with the VM and its execution model]
Building blocks (CLR and JVM)
• Memory management
• Type system
• Security
• Dynamic loading
• JIT
• Reflection
• Multi-threading
Note that the essential traits of the two execution environments are similar, though there are relevant differences in their design. CLI has been standardized (ECMA and ISO) and is a proper superset of Java. In the rest of the course we will refer mainly to the CLR, and we will point out when a feature is not included in the JVM.
How CLR works
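[Figure: how the CLR works: programs in C#, C++, VB, ML, … are compiled to CIL; the CLR (loader, JIT, GC, security, BCL) executes them as managed x86 code, next to unmanaged x86 code]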
A new layer to the onion
• The runtime mediates access between the application and the OS
• The runtime exposes a superset of OS services through the BCL
• Applications are groups of types interacting together
• Different runtimes implement programming-language abstractions, such as types, in different ways: interoperability is complex
[Figure: applications App 1 … App n, made of types T1 … Tn, run on top of different runtimes (CRT, MLRT, CLR, …), which in turn run on the OS and the hardware]
Review
Type system

A type system consists of:
• A mechanism for defining types and associating them with certain language constructs
• A set of rules for:
  ○ type equivalence: when two types are the same
  ○ type compatibility: when a value of a given type can be used in a given context
  ○ type inference: the type of an expression given the types of its constituents
Type checking
Type checking is the process of ensuring that a
program obeys the language’s type compatibility
rules
• A language is strongly typed if it prohibits, in a way that the language implementation can enforce, the application of any operation to any object that is not intended to support that operation
• A language is statically typed if it is strongly typed and type checking can be performed at compile time

Programming Languages and type checking
Assembly: no type checking
C: static type checking; not entirely strongly typed (unions, interoperability of pointers and arrays)
Pascal: static type checking; not entirely strongly typed (untagged variant records)
C++: static type checking, dynamic type checking (virtual methods); not entirely strongly typed (as C)
Java: static type checking, dynamic type checking (virtual methods, downcasting); strongly typed
Lisp: dynamic type checking; strongly typed
Prolog: dynamic type checking; strongly typed
ML: static type checking; strongly typed
Different views for types

Denotational:
• Types are sets of values (domains)
• Application: semantics

Constructive:
• Built-in types
• Composite types (application of type constructors)

Abstraction-based:
• A type is an interface consisting of a set of operations
Language types







Basic types:
• Boolean
• Int, long, float, double (signed/unsigned)
• Characters (1 byte, 2 bytes)
• Enumerations
• Subranges (n1..n2)
• Functions
Composite types:
• Struct
• Union
• Arrays
• Pointers
• Lists
Type Conversions and Casts

Consider the following definitions:
int add(int i, int j);
int add2(int i, double j);
• And the following calls:
add(2, 3);          // Exact
add(2, (int)3.0);   // Explicit cast
add2(2, 3);         // Implicit conversion of 3 from int to double
Memory Layout
On 32-bit architectures, hardware types typically require from 1 to 8 bytes
• Composite types are represented by chaining the constituent values together
• For performance reasons, compilers often employ padding to align fields to addresses that are multiples of 4 bytes

Memory layout example
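struct element {
  char name[2];
  int atomic_number;
  double atomic_weight;
  char metallic;
};
[Figure: layout in 4-byte (32-bit) words: name (2 bytes + 2 bytes of padding), atomic_number, atomic_weight (two words), metallic (1 byte + 3 bytes of padding)]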
Optimizing Memory Layout
C requires that the fields of a struct be laid out in the same order as their declaration (essential for working with pointers!)
• Not all languages behave like this: for instance, ML doesn't specify any order
• If the compiler is free to reorder fields, holes can be minimized (in the example, packing metallic together with name saves 4 bytes)

Union
Union types allow different types to share the same memory area
• The size of the value is the maximum of the sizes of its constituents
union u {
  struct element e;
  int number;
};
[Figure: in 4-byte (32-bit) words, number overlaps the first word of e, which holds name; the rest of e (atomic_number, atomic_weight, metallic) follows]
Memory management
Different lifetimes: static, automatic, dynamic
• Problem: how to deal with dynamic memory?
• Memory managers for the heap!
• Different strategies:
  ○ Free list
  ○ Reference counting
  ○ Garbage collection
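A minimal sketch of reference counting, written in Java only for illustration (the JVM itself relies on a garbage collector, so here the counts are maintained by hand). The names RcObject, retain, release, link and RcDemo are hypothetical. The sketch also shows the classic weakness of this strategy: objects in a cycle keep each other's counts above zero, so the cycle is never reclaimed.

```java
import java.util.ArrayList;
import java.util.List;

// A toy heap object whose lifetime is managed by an explicit reference count.
class RcObject {
    int refCount = 0;
    boolean freed = false;
    final List<RcObject> fields = new ArrayList<>(); // outgoing references

    void retain() { refCount++; }

    // Decrement the count; at zero the object is "freed" and releases
    // every object it points to.
    void release() {
        if (--refCount == 0) {
            freed = true;
            for (RcObject child : fields) child.release();
            fields.clear();
        }
    }

    // Store a reference to target in one of this object's fields.
    void link(RcObject target) {
        target.retain();
        fields.add(target);
    }
}

class RcDemo {
    // Returns {chainFreed, cycleFreed}.
    static boolean[] run() {
        // Acyclic case: root -> child. Dropping the only reference to root frees both.
        RcObject root = new RcObject();
        root.retain();            // reference held by the program
        RcObject child = new RcObject();
        root.link(child);         // child is counted once, from root
        root.release();
        boolean chainFreed = root.freed && child.freed;

        // Cyclic case: a <-> b. Dropping the external reference is not enough.
        RcObject a = new RcObject();
        RcObject b = new RcObject();
        a.retain();               // external reference
        a.link(b);                // b.refCount == 1
        b.link(a);                // a.refCount == 2
        a.release();              // a.refCount == 1: neither object is freed
        boolean cycleFreed = a.freed || b.freed;

        return new boolean[] { chainFreed, cycleFreed };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("chain freed: " + r[0] + ", cycle freed: " + r[1]);
    }
}
```

The leaked cycle is one of the reasons why VMs tend to rely on tracing garbage collection rather than on reference counting alone.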
Memory management and VM
Use of a garbage collector
• Typically generational collection
• Generation 0: copy collection
• Use of mark and sweep
• Use of write barrier
• Asynchronous garbage collection
• Heap for large objects
• Code collection (recently)
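To make the mark-and-sweep idea concrete, here is a toy collector over an explicit object graph, with hypothetical names (Obj, ToyHeap, collect). It is only a sketch of the two phases, mark everything reachable from the roots and then sweep the rest; a real VM collector works on raw memory and adds generations, copying and write barriers. Unlike reference counting, unreachable cycles are reclaimed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// A toy heap object: a name, outgoing references and a mark bit.
class Obj {
    final String name;
    final List<Obj> refs = new ArrayList<>();
    boolean marked = false;
    Obj(String name) { this.name = name; }
}

class ToyHeap {
    final List<Obj> objects = new ArrayList<>(); // every allocated object
    final List<Obj> roots = new ArrayList<>();   // e.g. stack slots and statics

    Obj alloc(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    // Mark: traverse from the roots with an explicit work list.
    // Sweep: drop unmarked objects, then clear marks for the next collection.
    // Returns the number of objects reclaimed.
    int collect() {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            work.addAll(o.refs);
        }
        int before = objects.size();
        objects.removeIf(o -> !o.marked);
        for (Obj o : objects) o.marked = false;
        return before - objects.size();
    }
}
```

For instance, with a root r pointing to x and an unreachable cycle p <-> q, collect() reclaims p and q and keeps r and x.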

Class type





• Class is a type constructor, like struct and array
• A class combines other types, like a struct
• A class definition also contains methods, which are the operations allowed on the data
• The inheritance relation is introduced
• Two special operations (constructors and finalizers/destructors) provide control over the initialization and finalization of objects
Inheritance
If class A inherits from class B (A<:B), then wherever an object of class B is expected, an object of class A can be used instead
• Inheritance expresses the idea of adding features (both methods and attributes) to an existing type
• Inheritance can be single or multiple

Upcasting

Late binding comes into play because we can convert a reference to an object of class B into a reference of its super-class A (upcasting):
B b = new B();
A a = b;
• The runtime does not convert the object: it only uses the part inherited from A
• This is different from the following implicit conversion, where the data is modified in the assignment (a 32-bit int is widened to a 64-bit long):
int i = 10;
long l = i;
Downcasting

Once we have a reference of the superclass type, we may want to convert it back:
A a = new B();
B b = (B)a;
• During a downcast it is necessary to indicate the target class explicitly: a class may be the ancestor of many sub-classes
• Again, this transformation informs the compiler that the referenced object is of type B, without changing the object in any way

Upcasting, downcasting



• We have shown upcasting and downcasting as expressed in languages such as C++, C# and Java, though the problem is common to all OO languages
• Note that the upcast can be verified at compile time, whereas the downcast cannot
• Languages differ on whether downcasts are checked at runtime:
  ○ in Java, casts are checked at runtime
  ○ C++ (with static_cast or C-style casts) simply changes the interpretation of an expression at compile time, without any attempt to check it at runtime (dynamic_cast does perform a runtime check)
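In Java these rules are easy to observe: the upcast needs no runtime work, while a downcast compiles to a checked cast that throws ClassCastException when the object has the wrong class. A minimal sketch, with hypothetical classes Shape, Circle and Square standing in for A, B and a sibling of B:

```java
class Shape { }
class Circle extends Shape { }
class Square extends Shape { } // a sibling of Circle

class CastDemo {
    // Returns true if the downcast (Circle) s succeeds at runtime.
    static boolean downcastToCircle(Shape s) {
        try {
            Circle c = (Circle) s; // compiled to a checkcast: verified at runtime
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Shape up = new Circle(); // upcast: checked by the compiler, no runtime work
        System.out.println(downcastToCircle(up));           // true: the object is a Circle
        System.out.println(downcastToCircle(new Square())); // false: a Square is not a Circle
    }
}
```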
Late Binding
The output of the example depends on the language: the second output may be the result of invoking A::foo or B::foo
• In Java, B::foo would be invoked
• In C++, A::foo would be invoked (unless foo is declared virtual in A)
• The mechanism that associates the method B::foo with b is called late binding
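A minimal Java sketch of late binding, reusing the names A, B, foo and c from the slides (the method bodies are hypothetical): the static type of c is A, but the call is dispatched on the class of the object it refers to.

```java
class A {
    String foo() { return "A::foo"; }
}

class B extends A {
    @Override
    String foo() { return "B::foo"; }
}

class LateBindingDemo {
    public static void main(String[] args) {
        B b = new B();
        A c = b;                     // upcast: same object, static type A
        System.out.println(b.foo()); // B::foo
        System.out.println(c.foo()); // B::foo, selected at runtime (late binding)
        System.out.println(c.getClass().getSimpleName()); // B: the exact runtime type
    }
}
```

In C++ the same code, with foo not declared virtual in A, would make c.foo() call A::foo.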

Abstract classes




• Sometimes it is necessary to model a set S of objects that can be grouped into subsets whose union covers S
• In this case $\forall x \in S\ \exists A \subseteq S : x \in A$
• If we use classes to model each set, it is natural that $\forall A \subseteq S,\ A <: S$
• Each object is an instance of a subclass of S, and no object is an instance of S itself. Thus S is useful because it abstracts the commonalities among its subclasses, allowing us to express generic properties about its objects.
Abstract methods
Often, when a class is abstract, some of its methods cannot be defined
• In the previous example, consider the method read
• In the class Doc there is no reasonable implementation for it
• We leave it abstract so that, through late binding, the appropriate implementation will be called
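The Doc example can be sketched in Java as follows. Only Doc and read come from the slides; TextDoc, PdfDoc, length, totalLength and the method bodies are hypothetical. The compiler rejects new Doc(), while code such as totalLength works on any subclass.

```java
import java.util.List;

abstract class Doc {
    // No reasonable implementation at this level: left abstract.
    abstract String read();

    // A generic property expressed once for every kind of document.
    int length() { return read().length(); }
}

class TextDoc extends Doc {
    private final String text;
    TextDoc(String text) { this.text = text; }
    @Override String read() { return text; }
}

class PdfDoc extends Doc {
    @Override String read() { return "<pdf contents>"; }
}

class DocDemo {
    static int totalLength(List<Doc> docs) {
        int total = 0;
        for (Doc d : docs) total += d.length(); // read() is late-bound
        return total;
    }
}
```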

Inheritance
Inheritance is a relation among classes
• Often systems impose some restrictions on the inheritance relation for convenience
• We say that class A is an interface if all its members are abstract, it has no fields, and it may inherit only from one or more interfaces
• Inheritance can be:
  ○ Single: $A <: B \land (\forall C.\ A <: C \Rightarrow C = B)$
  ○ Mix-in: $S = \{B \mid A <: B\},\ \exists!\, B \in S.\ \neg\mathit{interface}(B)$
  ○ Multiple (no restriction)
Multiple inheritance



• Why should systems impose restrictions on inheritance?
• Multiple inheritance introduces both conceptual and implementation issues
• The crucial problem, in its simplest form, is the following:
  ○ $A <: B \land A <: C$
  ○ $B <: D,\ C <: D$
• In the presence of a common ancestor, either:
  ○ the instance part from D is shared between B and C, or
  ○ the instance part from D is duplicated
• This situation isn't infrequent: in C++, istream<:ios, ostream<:ios and iostream<:istream, iostream<:ostream
• The problem in sharing the common ancestor D is that B and C may change the inherited state in ways that lead to conflicts
Java and Mix-in inheritance
Both single and mix-in inheritance fix the common ancestor problem
• Single inheritance, though, can be somewhat restrictive
• Mix-in inheritance has become popular with Java and represents an intermediate solution
• Classes are partitioned into two sets: interfaces and normal classes
• Interfaces constrain the elements of the class to be only abstract methods: no instance variables are allowed (since Java 8, interfaces may also contain default and static methods, but still no instance variables)
• A class inherits instance variables from only one of its ancestors, avoiding the diamond problem of multiple inheritance
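A minimal Java sketch of mix-in inheritance, with hypothetical names: the class inherits state only from its single superclass, while the two interfaces contribute only method signatures. The interfaces form a diamond (both extend Named), yet there is nothing to share or duplicate: one field x and one implementation of name().

```java
// Interfaces: method signatures only, no instance state.
interface Named { String name(); }
interface Drawable extends Named { String draw(); }
interface Movable extends Named { void move(int dx); }

// The single superclass: the only source of instance variables.
class Base {
    protected int x;
}

// Diamond through the interfaces, but a single copy of x
// and a single implementation of name().
class Sprite extends Base implements Drawable, Movable {
    public String name() { return "sprite"; }
    public String draw() { return name() + "@" + x; }
    public void move(int dx) { x += dx; }
}
```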

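Implementing Single and Mix-in inheritance
• Consists only in combining the state of a class and its super-classes
• Note that upcasting and downcasting come for free: the pointer at the base of the instance can be seen both as a pointer to an instance of A and as a pointer to an instance of B
[Figure: instance layouts for A, B<:A, C<:B<:A and D<:C<:B<:A; each layout starts with the fields of A, followed by those of B, C and D in inheritance order]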
Implementing multiple inheritance

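With multiple inheritance, this becomes more complex than reinterpreting a pointer!
[Figure: A, B<:A, C<:A, D<:B and D<:C; the instance layout of D either duplicates the A part (A (B), B, A (C), C, D) or shares it (A, B, C, D); in both cases a pointer to D cannot serve unchanged as a pointer to both B and C]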
Late binding
How do we identify the method to be invoked?
• We introduce a v-table for each class that uses late binding
• Each virtual method is associated with a slot in the table pointing to the method's body
• When the method is invoked, a lookup in the table is done to retrieve the target address of the call instruction
• Each instance holds a pointer to the v-table
• Thus late binding has a cost in both time and space, though the overhead is small and the benefits often justify the use of this mechanism
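The mechanism can be simulated in Java to make the data structures explicit. This is a hypothetical model (VTable, Instance, VTableDemo and the slot constants are invented names), not the way a real JVM or CLR lays out objects: each class owns an array of method slots, each instance stores a pointer to its class's table, and a virtual call is an indexed load followed by an indirect call.

```java
import java.util.function.Function;

// A v-table: one slot per virtual method; a body takes the receiver.
final class VTable {
    final Function<Instance, String>[] slots;

    @SuppressWarnings("unchecked")
    VTable(int size) { slots = (Function<Instance, String>[]) new Function[size]; }

    // A subclass table starts as a copy of the parent's (inherited slots).
    VTable(VTable parent, int size) {
        this(size);
        System.arraycopy(parent.slots, 0, slots, 0, parent.slots.length);
    }
}

final class Instance {
    final VTable vptr; // the "v-pointer" stored in every object
    Instance(VTable vptr) { this.vptr = vptr; }

    // Virtual call: one indexed lookup, then an indirect call.
    String invoke(int slot) { return vptr.slots[slot].apply(this); }
}

final class VTableDemo {
    static final int FOO = 0, F = 1, G = 2; // slot numbers fixed at compile time

    static final VTable A_TABLE = new VTable(2);
    static final VTable B_TABLE;
    static {
        A_TABLE.slots[FOO] = self -> "A::foo";
        A_TABLE.slots[F]   = self -> "A::f";
        B_TABLE = new VTable(A_TABLE, 3);      // inherits f
        B_TABLE.slots[FOO] = self -> "B::foo"; // override
        B_TABLE.slots[G]   = self -> "B::g";   // new method, new slot
    }
}
```

Because B's table starts as a copy of A's, an inherited method keeps the same slot number in both tables, so a caller holding a reference of static type A can use the slot index without knowing the exact class.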

Late binding: an example (Java)
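class A {
  void foo() {…}
  void f() {…}
  int ai;
}
class B extends A {
  void foo() {…}
  void g() {…}
  int bi;
}
A a = new A();
a.foo();
a.f();
B b = new B();
b.foo();
b.g();
b.f();
A c = b;
c.foo();
c.f();
[Figure: a points to an instance (V-pointer, ai) whose V-pointer refers to A's v-table (foo, f); b and c point to the same instance (V-pointer, ai, bi) whose V-pointer refers to B's v-table (foo, f, g), where foo is B's override and f is inherited from A]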
Runtime type information
Execution environments may use the v-table pointer as a means of knowing the exact type of an object at runtime
• This is what happens in C++ with RTTI, and in the .NET CLR and the JVM
• Thus the cost of having exact runtime type information is allocating the v-pointer in all objects
• C++ leaves the choice to the programmer: classes without virtual methods have no v-pointer, and therefore no dynamic runtime type information

Reflection
VMs require information about types to load them properly
• The executable contains the program (data) and its description in terms of types (metadata)
• Metadata can be accessed at runtime through the Reflection API (more later)
• It is possible to associate custom metadata with types and members and to access it at runtime
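In Java, custom metadata takes the form of annotations: with RUNTIME retention they are stored in the class file and can be read back through the reflection API. A minimal sketch; the annotation Author and the classes Report and MetadataDemo are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Custom metadata, kept in the class file and visible at runtime.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Author {
    String value();
}

@Author("alice")
class Report {
    @Author("bob")
    public String render() { return "report"; }
}

class MetadataDemo {
    // Metadata attached to a type, or null if absent.
    static String classAuthor(Class<?> type) {
        Author a = type.getAnnotation(Author.class);
        return a == null ? null : a.value();
    }

    // Metadata attached to a public no-argument method, found by name via reflection.
    static String methodAuthor(Class<?> type, String methodName) {
        try {
            Method m = type.getMethod(methodName);
            Author a = m.getAnnotation(Author.class);
            return a == null ? null : a.value();
        } catch (NoSuchMethodException e) {
            return null;
        }
    }
}
```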

Polymorphism
Polymorphism associates different bindings with a name
• We have already seen two forms of polymorphism:
  ○ Subtype/inclusion (inheritance)
  ○ Overloading
• Polymorphism is the fundamental mechanism for generic programming
• There are other forms of polymorphism that we will examine

Terminology

Overloading: methods of one class share the same name but have different signatures

Overriding: a method of a subclass has the same name and the same signature as a method of its superclass

Binding refers to the association of a method invocation with the code to be executed on behalf of the invocation.
• In static binding (early binding), all the associations are determined at compile time.
• In dynamic binding (late binding), the code to be executed in response to a method invocation (i.e., a message) is not determined until runtime.
Polymorphic Methods



• Polymorphic = “of many forms”
• A polymorphic method is one that has the same name in different classes of the same family but has a different implementation in each class
• Polymorphism is possible because of:
  ○ inheritance: subclasses inherit the attributes and methods of the superclass
public class Circle extends Shape {
  … …
}
  ○ method overriding: subclasses can redefine methods that are inherited from the superclass
public class Shape {
  public float calculateArea() { return 0.0f; }
  …
}
public class Circle extends Shape {
  public float calculateArea() { return 3.14f * radius * radius; }
  …
}
Classification of Polymorphism
Polymorphism
• Universal: parametric, subtype
• Ad hoc: overloading, coercion
Universal vs. ad hoc polymorphism
With overloading we are required to provide an implementation for each signature
• We provide ad hoc solutions for different objects
• With inheritance, instead, we define algorithms that operate on objects that inherit from a given class
• In this case there is a single (universal) solution for different objects

Overloading
Overloading is the mechanism that a language may provide to bind more than one object to a name
• Consider the following class:

class A {
void foo() {…}
void foo(int i) {…}
}

The name foo is overloaded and it
identifies two methods
Methods’ overloading






• Overloading is mostly used for methods because the compiler can infer which version of the method should be invoked by looking at the argument types
• Behind the scenes, the compiler generates a name for the method that includes the types of the signature (not the return type!)
• This process is known as name mangling
• In the previous example, the name foo_v may be associated with the first method and foo_i with the second
• When the method is invoked, the compiler looks at the types of the arguments used in the call and chooses the appropriate version of the method
• Sometimes implicit conversions may be involved and the resolution process may lead to more than one method: in this case the call is considered ambiguous and a compilation error is raised
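A small Java sketch of overload resolution (class and method names are hypothetical). The version is chosen at compile time from the static types of the arguments, not from the runtime class of the object, which is what distinguishes overloading from late binding; the commented-out call is an ambiguous case that the compiler rejects.

```java
class Printer {
    static String describe(Object o) { return "Object"; }
    static String describe(String s) { return "String"; }

    static String pair(int i, double d) { return "int,double"; }
    static String pair(double d, int i) { return "double,int"; }

    public static void main(String[] args) {
        Object o = "hello";                    // the runtime class is String...
        System.out.println(describe("hello")); // String
        System.out.println(describe(o));       // Object: chosen from the static type
        System.out.println(pair(1, 2.0));      // int,double: only this version applies
        // pair(1, 2) does not compile: both versions apply after widening
        // int to double, and neither is more specific than the other.
    }
}
```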
Operator overloading
Though operators such as + and - have a syntax different from function invocation, they identify functions
• C++ and other languages (e.g. C#) allow overloading these operators in the same way as ordinary functions and methods
• Conceptually, each use of + is rewritten into the functional version and the standard overloading process is applied
• Example (C++):

c = a + b; // c.operator=(operator+(a, b))
Subtype Polymorphism

Example: Java Vector
Vector v = new Vector();
v.addElement(new Integer(2));
v.addElement("Pippo");

Signature of addElement:
void addElement(Object x);

The input argument is of type Object because the container can hold any type of object
Problem with subtype polymorphism
When we add an object to the vector we lose compile-time information: the static type of the element is no longer known to the compiler
• In the example we implicitly upcast from Integer to Object:

v.addElement(new Integer(2));

This downcast compiles, but produces a runtime error (ClassCastException), since element 1 is the String "Pippo":
Integer i = (Integer)v.elementAt(1);
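The problem can be reproduced with java.util.Vector used as a raw (non-generic) type, as in the slide; the compiler only warns about unchecked calls, and the bad downcast is caught when it executes. A minimal sketch (VectorDemo is a hypothetical name; Integer.valueOf replaces the deprecated Integer constructor):

```java
import java.util.Vector;

class VectorDemo {
    // Returns the simple name of the exception raised by the downcast, or "none".
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String castSecondElement() {
        Vector v = new Vector();          // raw type: elements are just Objects
        v.addElement(Integer.valueOf(2)); // implicit upcast Integer -> Object
        v.addElement("Pippo");            // implicit upcast String -> Object
        try {
            Integer i = (Integer) v.elementAt(1); // element 1 is a String
            return "none";
        } catch (ClassCastException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```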
Type system
Execution environments such as the CLR and the JVM are data oriented
• A type is the unit of code managed by the runtime: loading, code, state and permissions are defined in terms of types
• Applications are sets of types that interact together
• One type exposes a static method (Main) that is considered the loader of the application: it loads the needed types and creates the appropriate instances

Java type system
[Figure: Java type system: Object is the root of class T, T[], String and Class; interface T and the base types (e.g. int) are shown separately]
Java type system


• There are base types: numbers, Object, String and Class (which is the entry point for reflection)
• Type constructors are:
  ○ Array
  ○ Class
• The number types are unrelated to Object with respect to the inheritance relation
• This applies to interfaces too, but objects that implement interfaces always inherit from Object
• Java's type system is far simpler than the CLR's
CLR type system
[Figure: CLR type system rooted in Object: Type, String, interface T, class T, Array (T[]), Delegate (delegate T) and ValueType, which includes the base types (e.g. int), struct T and Enum (enum T)]
CLR Type System


• Common rooted: even numbers inherit from Object
• There are more type constructors:
  ○ Enum: constants
  ○ Struct: like a class, but without inheritance and with stack allocation
  ○ Delegate: a type that describes a set of methods with a common signature
• Value types (numbers and structs) inherit from Object, yet they are not references and aren't stored on the heap
• The trick is that when a value type has to be upcast to Object, it is boxed in a wrapper on the heap
• The opposite operation is called unboxing
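Java offers a useful contrast: its primitive types are not subtypes of Object, yet since Java 5 the compiler inserts an equivalent boxing conversion automatically (autoboxing), wrapping an int into a heap-allocated Integer when it is used as an Object. A minimal sketch with a hypothetical class BoxingDemo:

```java
class BoxingDemo {
    static Object box(int i) {
        return i;           // compiled as Integer.valueOf(i): a wrapper object
    }

    static int unbox(Object o) {
        return (Integer) o; // downcast to the wrapper, then intValue()
    }

    public static void main(String[] args) {
        Object o = box(42);
        System.out.println(o.getClass().getName()); // java.lang.Integer
        System.out.println(unbox(o) + 1);           // 43
    }
}
```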
Delegate types


• A delegate is a type that describes a class of methods
• Example:
using System;
class Foo {
  delegate int MyFun(int i, int j);
  static int Add(int i, int j) { return i + j; }
  static void Main(string[] args) {
    MyFun f = new MyFun(Foo.Add);
    Console.WriteLine(f(2, 3));
  }
}
CLR delegates
[Figure: a delegate object holds a reference to a target object and to a method, which points to the method code]
Delegates as types
A delegate type allows building delegate objects on methods with a specified signature
• At the CLR level, the type exposes an Invoke method with the appropriate signature
• C# exposes delegates with a special syntax in the declaration (not class-like)
• The (object, method) pair is built using the new operator, and is specified with an invocation-like syntax, e.g. new MyFun(Foo.Add)

Delegates like closures?
In functional programming it is possible to define a function that refers to external variables
• The behavior of the function depends on those external values and may change
• Closures are used in functional programming to close open terms in functions
• Delegates are not equivalent to closures, although they are a pair (env, func): the environment must be an object of the type to which the method belongs

Example: Aggregate function

The following method maps a function over an array:
delegate int MyFun(int x);
int[] ApplyInt(MyFun f, int[] a) {
  int[] r = new int[a.Length];
  for (int i = 0; i < a.Length; i++)
    r[i] = f(a[i]);
  return r;
}
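Java has no delegate types; the closest equivalent is a functional interface such as java.util.function.IntUnaryOperator, which a method reference or a lambda can implement. A sketch of ApplyInt in that style (Aggregate and twice are hypothetical names). Note that the lambda captures the local variable offset, which makes it closer to a closure than the delegates discussed above; Java only allows capturing effectively final variables.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

class Aggregate {
    // Map f over a, returning a new array (same loop as ApplyInt).
    static int[] applyInt(IntUnaryOperator f, int[] a) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++)
            r[i] = f.applyAsInt(a[i]);
        return r;
    }

    static int twice(int x) { return 2 * x; }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        int[] doubled = applyInt(Aggregate::twice, data); // like new MyFun(...)
        int offset = 10;                                  // captured by the lambda
        int[] shifted = applyInt(x -> x + offset, data);
        System.out.println(Arrays.toString(doubled)); // [2, 4, 6]
        System.out.println(Arrays.toString(shifted)); // [11, 12, 13]
    }
}
```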
Parametric Polymorphism





• C++ templates implement a form of parametric polymorphism
• Parametric polymorphism is implemented in many flavors and many languages: Eiffel, Mercury, Haskell, Ada, ML, C++…
• It improves the expressivity of a language
• It may improve the performance of programs
• It is a form of universal polymorphism
C++ templates and macros



• Macros are implemented in the scanner (preprocessor)
• C++ templates are implemented on the syntax tree
• The following class compiles as long as the method foo is not used:
template <class T> class Foo {
  T x;
  int foo() { return x + 2; }
};
• The instantiation strategy is lazy: we can use Foo<char*> as long as we don't use the method foo
A more semantic approach





• Parametric polymorphism has been proposed for languages such as Java and C#
• In both cases the compiler is able to check parametric classes just by looking at their definition
• In this case parametric types are more than macros on the AST
• Examples of generics are GJ/Pizza for Java and Generic C# for .NET
• We introduce the syntax, which is almost identical in GJ and GC#
Generics in a Nutshell
• Type parameterization for classes, interfaces, structs and methods, e.g.
class Set<T> { ... }                          // parameterized class
class Dict<K,D> { ... }                       // two-parameter class
interface IComparable<T> { ... }              // parameterized interface
struct Pair<A,B> { ... }                      // parameterized struct ("value class")
T[] Slice<T>(T[] arr, int start, int count)   // generic method (in GJ: <T> T[] Slice(…))
• Very few restrictions on usage:
  ○ Type instantiations can be primitive or class types, e.g. Set<int>, Dict<string,List<float>>, Pair<DateTime,MyClass>
  ○ Generic methods of all kinds (static, instance, virtual); virtual generic methods only in GC#!
  ○ Inheritance through instantiated types, e.g.
class Set<T> : IEnumerable<T>
class FastIntSet : Set<int>
More on generic methods




• Generic methods are similar to template functions in C++
• As in C++, GJ tries to infer the type parameters from the method call
• GC# requires that type arguments are specified like all other arguments
• Example:
// C++
template <class T> T sqr(T x) { return x*x; }
std::cout << sqr(2.0) << std::endl;
// GJ
class F { static <T> void sort(T[] a) {…} }
String[] s; F.sort(s);
// GC#
class F { static void sort<T>(T[] a) {…} }
string[] s; F.sort<string>(s);
Generic Stack
using System;
class Stack<T> {
  private T[] items;
  private int nitems;
  public Stack() { nitems = 0; items = new T[50]; }
  public T Pop() {
    if (nitems == 0) throw new InvalidOperationException("Stack is empty");
    return items[--nitems];
  }
  public bool IsEmpty() { return nitems == 0; }
  public void Push(T item) {
    if (items.Length == nitems) {
      T[] temp = items;
      items = new T[nitems * 2];
      Array.Copy(temp, items, nitems);
    }
    items[nitems++] = item;
  }
}
How does the compiler check the definition?
Tip


• Before C++11, C++ required a space in nested parameter types, vector<vector<int> >, to avoid ambiguity with operator >>
• GJ (and GC#) fixed the problem with the following grammar:
ReferenceType ::= ClassOrInterfaceType | ArrayType | TypeVariable
ClassOrInterfaceType ::= Name | Name < ReferenceTypeList1
ReferenceTypeList1 ::= ReferenceType1 | ReferenceTypeList , ReferenceType1
ReferenceType1 ::= ReferenceType > | Name < ReferenceTypeList2
ReferenceTypeList2 ::= ReferenceType2 | ReferenceTypeList , ReferenceType2
ReferenceType2 ::= ReferenceType >> | Name < ReferenceTypeList3
ReferenceTypeList3 ::= ReferenceType3 | ReferenceTypeList , ReferenceType3
ReferenceType3 ::= ReferenceType >>>
TypeParameters ::= < TypeParameterList1
TypeParameterList1 ::= TypeParameter1 | TypeParameterList , TypeParameter1
TypeParameter1 ::= TypeParameter > | TypeVariable extends ReferenceType2 |
TypeVariable implements ReferenceType2
The semantic problem
The C++ compiler cannot make any assumption about the nature of parameter types
• The only way to type-check a C++ class is to wait for the arguments' specification (instantiation): only then is it possible to check the operations used (e.g. the comparison method in sorting)
• From the standpoint of the C++ compiler's semantic module, no type is parametric

Checking class definition





• To be able to type-check a parametric class just by looking at its definition, we introduce the notion of bound
• As method arguments have a type, type arguments are bound to other types
• The compiler will allow values of such types to be used as if upcast to the bound
• Example: class Vector<T : Sortable>
• Elements of the vector must implement (or inherit from) Sortable
Example
interface Sortable<T> {
int compareTo(T a);
}
class Vector<T : Sortable<T>> {
T[] v;
int sz;
Vector() { sz = 0; v = new T[15]; }
void addElement(T e) {…}
void sort() {
…
if (v[i].compareTo(v[j]) > 0)
…
}
}
The compiler can type-check this because v contains values that implement Sortable<T>
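The same idea is part of standard Java generics, where the bound is written with extends: the body may call compareTo on elements because every T is known to implement Comparable<T>. A minimal sketch using insertion sort; BoundedSort is a hypothetical name.

```java
import java.util.Arrays;

class BoundedSort {
    // T is bounded by Comparable<T>: the compiler checks this body once,
    // without knowing which T will be used.
    static <T extends Comparable<T>> void sort(T[] v) {
        for (int i = 1; i < v.length; i++) {
            T key = v[i];
            int j = i - 1;
            while (j >= 0 && v[j].compareTo(key) > 0) {
                v[j + 1] = v[j];
                j--;
            }
            v[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        String[] s = {"pear", "apple", "fig"};
        sort(s);                                // String implements Comparable<String>
        System.out.println(Arrays.toString(s)); // [apple, fig, pear]
        // sort(new Object[0]) is rejected: Object does not implement Comparable<Object>.
    }
}
```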
Pros and Cons




• A parameterized type is checked even if no instantiation is present
• Assumptions on type parameters are always explicit (if no bound is specified, Object is assumed)
• Is it possible to make assumptions beyond the bound?
• Yes, you can always cheat by upcasting to Object and then downcasting to whatever you want:
class Foo<T : Button> {
  void foo(T b) {
    String s = (String)(Object)b;
  }
}
• Still, the assumption made by the programmer is explicit
Implementation
There are several possibilities to implement parametric polymorphism
• C++ generates the AST for methods and classes as needed
• GJ implements generic types at compile time: the JVM is not aware of parametric types
• GC# assumes that the CLR is aware of parametric types: the IL has been extended with generic instructions to handle type parameters

GJ strategy
GJ is a language superset of Java
• The compiler verifies that generic types are used correctly
• Type parameters are dropped and the bound is used instead; downcasts are inserted in the right places
• The output is a normal class file unaware of parametric polymorphism

Example
// GJ source
class Vector<T> {
  T[] v; int sz;
  Vector() {
    v = new T[15];
    sz = 0;
  }
  <U implements Comparer<T>>
  void sort(U c) {
    …
    c.compare(v[i], v[j]);
    …
  }
}
…
Vector<Button> v;
v.addElement(new Button());
Button b = v.elementAt(0);

// Translated code
class Vector {
  Object[] v; int sz;
  Vector() {
    v = new Object[15];
    sz = 0;
  }
  void sort(Comparer c) {
    …
    c.compare(v[i], v[j]);
    …
  }
}
…
Vector v;
v.addElement(new Button());
Button b = (Button)v.elementAt(0);
Expressivity vs. efficiency






• GJ doesn't improve execution speed, though it helps to express genericity better than inheritance does
• There is a main limit to GJ's expressivity: at runtime there isn't exact type information
• All instantiations of a generic type collapse to the same class
• Consequences are no virtual generic methods and pathological situations
• Benefit: Java classes could be seen as generic types! Reuse of the large existing codebase
• GJ isn't the only implementation of generics for Java
Problem with GJ
Stack<String> s = new Stack<String>();
s.push("Hello");
Object o = s;
Stack<Button> b = (Stack<Button>)o;
Button mb = b.pop(); // ClassCastException
The cast to Stack<Button> is authorized: both Stack<String> and Stack<Button> map to class Stack
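The same pathology can still be reproduced with standard Java generics, which kept the erasure strategy of GJ. A minimal sketch using java.util.Stack, with Integer standing in for Button so that no GUI classes are needed (ErasureDemo is a hypothetical name): the unchecked cast succeeds, and the failure appears later, at the cast the compiler inserted for pop.

```java
import java.util.Stack;

class ErasureDemo {
    // Returns true if popping through the mis-typed reference throws ClassCastException.
    @SuppressWarnings("unchecked")
    static boolean popThroughWrongType() {
        Stack<String> s = new Stack<>();
        s.push("Hello");
        Object o = s;
        // Unchecked cast: only a check against the raw class Stack is performed,
        // since Stack<String> and Stack<Integer> both erase to Stack.
        Stack<Integer> b = (Stack<Integer>) o;
        try {
            Integer n = b.pop(); // compiler-inserted cast to Integer fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // All instantiations collapse to the same runtime class.
    static boolean sameRuntimeClass() {
        return new Stack<String>().getClass() == new Stack<Integer>().getClass();
    }
}
```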
Generic C# Strategy: GCLR





• Kennedy and Syme have extended the CLR to support parametric types (similar proposals with runtime support for type parameters have been made for Java, e.g. PolyJ, and NextGen by Cartwright and Steele)
• In IL, placeholders are used to indicate type arguments (!0, !1, …)
• The verifier, JIT and loader have been changed
• When the program needs an instantiation of a generic type, the loader generates the appropriate type
• The JIT can share the implementation of reference instantiations (Stack<String> has essentially the same code as Stack<Object>)
Generic C# compiler
The GC# compiler implements a GJ-like notation for parametric types
• Bounds are the same as in GJ
• NO type inference on generic methods: you must specify the type arguments in the call
• The compiler relies on GCLR to generate the code
• Exact runtime types are guaranteed by the CLR, so virtual generic methods are allowed
• All type constructors can be parameterized: structs, classes, interfaces and delegates.

Example
using System;
namespace n {
  public class Foo<T> {
    T[] v;
    Foo() { v = new T[15]; }
    public static void Main(string[] args) {
      Foo<string> f = new Foo<string>();
      f.v[0] = "Hello";
      string h = f.v[0];
      Console.Write(h);
    }
  }
}

IL generated for the field v and the constructor:
.field private !0[] v
.method private hidebysig specialname rtspecialname
        instance void .ctor() cil managed {
  .maxstack 2
  ldarg.0
  call instance void [mscorlib]System.Object::.ctor()
  ldarg.0
  ldc.i4.s 15
  newarr !0
  stfld !0[] class n.Foo<!0>::v
  ret
} // end of method Foo::.ctor
Performance
The idea of extending the CLR with generic types seems good, but what about performance?
• Although the instantiation is performed at load time, the overhead is minimal
• Moreover, code sharing reduces the number of instantiations, improving execution speed
• A technique based on dictionaries is employed to keep track of already instantiated types

Expressive power of generics




• System F is a well-known typed λ-calculus with polymorphic types
• While Turing equivalence is a trivial property of programming languages, equivalence to System F is not trivial for a type system
• Polymorphic languages such as ML and Haskell cannot fully express System F (both languages have been extended to fill the gap)
• System F can be transposed into GC#
http://www.cs.kun.nl/~erikpoll/ftfjp/2002/KennedySyme.pdf
Power of combining generics and inheritance
We should extend the inheritance relation with a new subtyping rule:
• Given class C<T1,...,Tn> extends ty, we make $C\langle ty_1,\dots,ty_n\rangle$ a subtype of $ty[ty_1/T_1,\dots,ty_n/T_n]$
• Now we can cast up and down to Object safely
• Note: we propagate the types because the super-class can be parametric

Manipulating types
Grouping values into types has helped us to build better compilers
• Could we do the same with types?
• Types can be grouped by means of inheritance, which represents the union of type sets
• Parametric types combined with inheritance allow expressing functions on types:

class Stack<T:object> : Container
(Stack is the function name, T:object the function argument, Container the result type)
Example: generic containers
class Row<T : Control> : Control
{ /* row of graphic controls */ }
class Column<T : Control> : Control
{ /* column of graphic controls */ }
class Table<T : Control> : Row<Column<T>>
{ /* table of graphic controls */ }
…
// It generates the keypad of a calculator
Table<Button> t = new Table<Button>(3, 3);
for (int i = 0; i < 3; i++)
  for (int j = 0; j < 3; j++)
    t[i, j].Text = (i * 3 + j + 1).ToString();
Rotor
Extending SSCLI
Extending the runtime is a good way to understand how it works
• There are several means of extending the runtime:
  ○ Adding new instructions
  ○ Adding new internal calls
  ○ Extending the execution model (e.g. changing the type system)
• We focus on extending the execution engine by adding a new instruction
Why?
SSCLI sources are huge (2 million lines of code)
• There is no way to browse the code without a solid understanding of the models underlying it
• A full understanding of how programming languages work is needed in order to master the complexity
• A runtime such as SSCLI is a neat example of a repository of the knowledge built over the past 30 years of computer science

When?
Modifications to the runtime should be few and small: many problems can be solved on top of it
• The P/Invoke mechanism supports invocation of functions in DLLs and can be used to add new functionality in a very efficient way
• Nonetheless, sometimes it is necessary to change the runtime
• Generics have been implemented by changing the runtime and introducing the notion of parametric type inside it
• New instructions have been added and many components have been changed

The ldhw instruction
We want to add an essential instruction to the runtime: ldhw
• The aim of this instruction is to simplify the compilation of the most widespread program: the hello world program
• When the instruction is executed, the string "Hello World" is loaded onto the operand stack

The C# program
public class Foo {
  public static void Main(string[] args) {
    System.Console.WriteLine("Hello world");
  }
}
Compiler output
.class public auto ansi beforefieldinit Foo
       extends [mscorlib]System.Object {
  .method public hidebysig static void Main(string[] args) cil managed {
    .entrypoint
    .maxstack 1
    IL_0000: ldstr "Hello world"
    IL_0005: call void [mscorlib]System.Console::WriteLine(string)
    IL_000a: ret
  } // end of method Foo::Main
  .method public hidebysig specialname rtspecialname
          instance void .ctor() cil managed {…}
} // end of class Foo
Version with ldhw
.class public auto ansi beforefieldinit Foo
       extends [mscorlib]System.Object {
  .method public hidebysig static void Main(string[] args) cil managed {
    .entrypoint
    .maxstack 1
    IL_0000: ldhw
    IL_0001: call void [mscorlib]System.Console::WriteLine(string)
    IL_0006: ret
  } // end of method Foo::Main
  .method public hidebysig specialname rtspecialname
          instance void .ctor() cil managed {…}
} // end of class Foo
What should be modified?
We should specify that there is a new opcode
• The verifier must be aware of it
• The JIT compiler should generate the appropriate code for it
• At least ilasm and ildasm must be able to read and write the new instruction

Adding an opcode
After a quick search it is easy to find that opcode.def contains the opcode definitions
• Each opcode is described by an OPDEF macro:

OPDEF(CEE_UNUSED5, "unused", Pop0, Push0, InlineNone, IPrimitive, 1, 0xFF, 0xA6, NEXT)

We replace the definition with the following:

OPDEF(CEE_LDHW, "ldhw", Pop0, PushRef, InlineNone, IObjModel, 1, 0xFF, 0xA6, NEXT)

The arguments are, in order:
• CEE_LDHW: constant name inside the runtime
• "ldhw": opcode name
• Pop0: don't pop any operand
• PushRef: push a reference on the stack
• InlineNone: no argument
• IObjModel: opcode kind
• 1: one-byte opcode
• 0xFF: high byte (not used)
• 0xA6: low byte
• NEXT: control flow behavior
Are there other changes to do?
In principle there are many aspects in the runtime that may depend on opcodes:
- JIT
- Verifier
- Reflection.Emit
- Ilasm
- Ildasm
Yet the declaration in opcode.def is enough to introduce an opcode
- Opcode definitions are propagated through the runtime by means of #include and programs such as opcodegen.pl
- Ilasm and ildasm are already capable of dealing with ldhw
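A single OPDEF line is enough because of the "X-macro" technique: the same list is expanded several times, each time with a different definition of the macro. To keep the sketch self-contained, instead of re-#defining OPDEF and #including opcode.def it passes the expansion macro as a parameter to a hypothetical `MY_OPCODES` list with trimmed arguments; the effect is the same.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for opcode.def, trimmed to
// (constant, name, length, high byte, low byte).
#define MY_OPCODES(OPDEF)                  \
  OPDEF(CEE_NOP,  "nop",  1, 0xFF, 0x00)   \
  OPDEF(CEE_RET,  "ret",  1, 0xFF, 0x2A)   \
  OPDEF(CEE_LDHW, "ldhw", 1, 0xFF, 0xA6)

// Expansion 1: opcode constants, as a JIT or verifier would use them.
#define AS_ENUM(c, s, len, hi, lo) c,
enum OpcodeId { MY_OPCODES(AS_ENUM) CEE_COUNT };
#undef AS_ENUM

// Expansion 2: mnemonic table, as a disassembler would use it.
#define AS_NAME(c, s, len, hi, lo) s,
const char* const kOpcodeNames[] = { MY_OPCODES(AS_NAME) };
#undef AS_NAME

// Expansion 3: encoding bytes, as an assembler would use them.
#define AS_BYTE(c, s, len, hi, lo) lo,
const unsigned char kOpcodeBytes[] = { MY_OPCODES(AS_BYTE) };
#undef AS_BYTE

// Maps a mnemonic to its constant; returns CEE_COUNT for unknown names.
OpcodeId lookup(const char* name) {
  for (int i = 0; i < CEE_COUNT; ++i)
    if (std::strcmp(kOpcodeNames[i], name) == 0) return static_cast<OpcodeId>(i);
  return CEE_COUNT;
}
```

Adding the ldhw line once updates all three tables, which is why tools generated from the same definitions pick up the new opcode without further changes.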
Giving semantics to ldhw
Now the runtime is aware of the new instruction
- Yet it doesn't know how to handle it
- The semantics of an opcode is defined in the JIT; the verifier should be informed too
- We should also discover how to obtain a String object
The verifier
- The verifier handles opcodes in a semi-automatic fashion
- In vertable.h the opcodes are listed as in opcode.def, but with a different macro: VEROPCODE
- We substitute the macro for UNUSED5:
VEROPCODE(CEE_LDHW, "!")
- The string "!" means that the instruction is verified manually
- CEE_STELEM_R4 is described as "r4[r:", which means that the stack should contain a float, a four-byte integer and an array of floats, and that after the instruction all these operands have been popped from the stack
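A sketch of how such descriptor strings could be decoded, deliberately limited to the symbols described above: `r` for float, `4` for a four-byte integer, `[` for "array of", `:` separating what is popped from what is pushed, and `!` for manual verification. The real vertable.h alphabet is richer; `decode` and `StackEffect` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decoded form of a simplified verifier descriptor.
struct StackEffect {
  bool manual = false;              // "!": the opcode is verified by hand
  std::vector<std::string> pops;    // operands consumed, in descriptor order
  std::vector<std::string> pushes;  // results produced
};

// Only the two type letters that appear in "r4[r:" are supported here.
static std::string typeName(char c) {
  switch (c) {
    case 'r': return "float";
    case '4': return "int32";
    default: throw std::invalid_argument("unsupported type letter");
  }
}

StackEffect decode(const std::string& desc) {
  StackEffect eff;
  if (desc == "!") { eff.manual = true; return eff; }
  bool afterColon = false;
  for (std::size_t i = 0; i < desc.size(); ++i) {
    char c = desc[i];
    if (c == ':') { afterColon = true; continue; }
    std::string t;
    if (c == '[') {  // '[' means "array of" the following letter
      if (++i >= desc.size()) throw std::invalid_argument("dangling '['");
      t = typeName(desc[i]) + "[]";
    } else {
      t = typeName(c);
    }
    (afterColon ? eff.pushes : eff.pops).push_back(t);
  }
  return eff;
}
```

Under these assumptions `decode("r4[r:")` yields three popped operands (float, int32, float[]) and nothing pushed, while `decode("!")` only sets the manual flag.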
The verifier
- We couldn't have declared our opcode without custom handling: the type letters available in the descriptor strings include no string type
- We add a new case to verifier.cpp:

case CEE_LDHW: {
  // ldhw takes no operand: just push a reference to System.String
  Item StrItem;
  StrItem.SetKnownClass(g_pStringClass);
  if (!Push(&StrItem)) {
    // the abstract stack is full (.maxstack exceeded)
    FAILMSG_PC_STACK_OVERFLOW();
    goto exit;
  }
  break;
}

- The code is similar to the CEE_LDSTR case, but in our case there is no input argument (string token) to verify
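The verifier reasons about types, not values: the push only records that a System.String reference is on the stack, and it fails when the method's .maxstack would be exceeded. A toy model of that check; `TypeStack` and `verifyLdhwOnly` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Abstract evaluation stack: it holds type names, never actual values.
class TypeStack {
 public:
  explicit TypeStack(std::size_t maxStack) : max_(maxStack) {}
  // Returns false on overflow, like the Push call in the verifier case.
  bool push(const std::string& type) {
    if (items_.size() >= max_) return false;
    items_.push_back(type);
    return true;
  }

 private:
  std::size_t max_;
  std::vector<std::string> items_;
};

// Verifies a method body made only of ldhw (0xA6) against a .maxstack value.
bool verifyLdhwOnly(const std::vector<unsigned char>& code, std::size_t maxStack) {
  TypeStack stack(maxStack);
  for (unsigned char op : code) {
    switch (op) {
      case 0xA6:  // ldhw: pops nothing, pushes a System.String reference
        if (!stack.push("System.String")) return false;  // stack overflow
        break;
      default:
        return false;  // any other opcode is outside this sketch
    }
  }
  return true;
}
```

With .maxstack 1, as in the listings of Foo::Main, one ldhw verifies while two consecutive ones overflow.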
Extending the JIT
The extension of the JIT is similar to the one we made to the verifier
- In fjit.cpp there is a huge switch with a case for every instruction of the runtime
- We add our instruction to the switch:

case CEE_LDHW:
  JitResult = compileCEE_LDHW();
  break;

- The compileCEE_LDHW method will output the appropriate x86 code
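The shape of that switch can be sketched with a toy JIT. The names are hypothetical, it emits readable pseudo-assembly strings instead of x86 bytes, and it only knows one-byte opcodes, so the program counter always advances by one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class JitResult { Ok, BadCode };

// One big switch over the IL stream, one compile function per opcode.
class ToyJit {
 public:
  JitResult compile(const std::vector<unsigned char>& il) {
    JitResult result = JitResult::Ok;
    for (std::size_t pc = 0; pc < il.size() && result == JitResult::Ok; ++pc) {
      switch (il[pc]) {
        case 0xA6: result = compileLdhw(); break;  // the new ldhw case
        case 0x2A: result = compileRet(); break;   // ret
        default: result = JitResult::BadCode; break;  // no case for this opcode
      }
    }
    return result;
  }

  std::vector<std::string> code;  // emitted pseudo-instructions

 private:
  JitResult compileLdhw() { code.push_back("push <string object>"); return JitResult::Ok; }
  JitResult compileRet() { code.push_back("ret"); return JitResult::Ok; }
};
```

In this toy an opcode without a case, such as ldstr (0x72), makes compilation fail; adding the case is what lets the new byte through.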
compileCEE_LDHW
FJitResult FJit::compileCEE_LDHW() {
  // Where do we get the handle to "Hello World"?
  void* literalHnd = ???;
  // Push the handle as a pointer-sized constant (32- or 64-bit build)
  // and dereference it to obtain the string object reference
  emit_WIN32(emit_LDC_I4(literalHnd))
  emit_WIN64(emit_LDC_I8(literalHnd)) ;
  emit_LDIND_PTR();
  // Get the type handle for strings
  CORINFO_CLASS_HANDLE s_StringClass =
      jitInfo->getBuiltinClass(CLASSID_STRING);
  VALIDITY_CHECK( s_StringClass != NULL );
  pushOp(OpType(typeRef, s_StringClass ));
  return FJIT_OK;
}
JIT structure
To improve performance, the JIT is defined using macros
- There are several levels of macros to simplify its management
- In our case we use only emit_XXX macros, because we simply need to load the string handle on top of the stack
- The pushOp call is not related to code generation but rather to the verification of the code being jitted: it records the type of the value now on top of the stack
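The split between emitted code and type bookkeeping can be illustrated with a toy. The hypothetical `emit_` macros below append pseudo-instructions to a buffer, while the push onto a separate type stack, playing the role of pushOp, emits nothing. The two instructions mirror the load-the-handle-then-dereference pattern of compileCEE_LDHW; the register names are illustrative, not what FJIT actually generates.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Two pieces of state kept side by side: generated code and operand types.
struct ToyJitState {
  std::vector<std::string> code;       // emitted pseudo-instructions
  std::vector<std::string> typeStack;  // types only, used for checks
};

// Hypothetical emit macros: each expands inline into a single append.
#define emit_LDC_PTR(st, v) ((st).code.push_back("mov eax, " + std::to_string(v)))
#define emit_LDIND_PTR(st)  ((st).code.push_back("mov eax, [eax]"))

void compileLdhw(ToyJitState& st, std::uintptr_t handleAddress) {
  emit_LDC_PTR(st, handleAddress);          // load the handle (a slot address)
  emit_LDIND_PTR(st);                       // read the object reference from the slot
  st.typeStack.push_back("System.String");  // like pushOp: no code emitted
}
```

After one call the buffer holds two pseudo-instructions and the type stack holds one entry, showing that the type tracking does not contribute to the generated code.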
Where do we get the string?
- The JIT compiler is separated from the execution engine (EE)
- This is because it should be possible to use different compilers
- Thus the simple solution of sharing a static variable is not viable
- The EE and the JIT cooperate through interfaces contained in the inc directory
- We extend the ICorStaticInfo interface (corinfo.h) with a method called constructStringConstant, which returns a handle to a constant string
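A reduced sketch of why an interface solves the problem: the JIT is compiled only against an abstract class, and the EE supplies the implementation at run time. `IStaticInfo`, `ToyEE` and `jitLoadHelloWorld` are hypothetical stand-ins, not the corinfo.h declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily reduced analogue of a JIT/EE interface.
class IStaticInfo {
 public:
  virtual ~IStaticInfo() = default;
  // Returns an opaque handle to a constant string owned by the EE.
  virtual void* constructStringConstant(int id) = 0;
};

// JIT side: it never touches EE globals, it only asks through the interface.
void* jitLoadHelloWorld(IStaticInfo& ee) {
  return ee.constructStringConstant(0);
}

// One possible EE; another EE could allocate the string differently.
class ToyEE : public IStaticInfo {
 public:
  void* constructStringConstant(int) override { return &hello_; }
  std::string hello_ = "Hello World";
};
```

Swapping the JIT or the EE only requires both sides to agree on the interface, which is the point of keeping the two components separate.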
And its implementation?
The implementation of the ICorDynamicInfo and ICorStaticInfo interfaces is in jitinterface.cpp (class CEEInfo):

LPVOID __stdcall
CEEInfo::constructStringConstant(
    CORINFO_MODULE_HANDLE scopeHnd, int s) {
  REQUIRES_4K_STACK;
  LPVOID result;
  // Switch the thread to cooperative GC mode while touching managed objects
  COOPERATIVE_TRANSITION_BEGIN();
  result = (LPVOID)ConstructStringConstant(scopeHnd, s);
  COOPERATIVE_TRANSITION_END();
  return result;
}
ConstructStringConstant
static OBJECTHANDLE _hndHW = NULL;  // lazily created handle to the string

static OBJECTHANDLE __stdcall ConstructStringConstant(
    CORINFO_MODULE_HANDLE scopeHnd, int s) {
  THROWSCOMPLUSEXCEPTION();
  OBJECTHANDLE string = NULL;
  BEGIN_ENSURE_COOPERATIVE_GC();
  Module* module = GetModule(scopeHnd);
  if (_hndHW == NULL) {
    // Allocate a handle from the AppDomain, so the GC can track the string
    OBJECTHANDLE tmpHandle =
        module->GetAssembly()->Parent()->CreateHandle(NULL);
    // Publish it atomically; if another thread won the race, free ours
    if (FastInterlockCompareExchangePointer((LPVOID *)&_hndHW,
                                            tmpHandle, NULL) != NULL)
      DestroyHandle(tmpHandle);
  }
  // Allocate the string the first time the handle is found empty
  if (HndFetchHandle(_hndHW) == NULL)
    StoreObjectInHandle(_hndHW, COMString::NewString(L"Hello World"));
  string = _hndHW;
  END_ENSURE_COOPERATIVE_GC();
  return string;
}
How does it work?
- It calls the NewString factory of the COMString class
- A static variable (_hndHW) is used to refer to the string
- The object handle is obtained from the AppDomain; otherwise the GC wouldn't be able to track the allocated string
- The string is created only if needed
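The three ideas above (a static variable, a handle the GC can update, lazy creation) can be modelled with std::atomic. In this toy a `Handle` is a slot pointing at the object; `simulateMove` plays the part of a compacting GC that relocates the string and updates only the slot, and `compare_exchange_strong` stands in for FastInterlockCompareExchangePointer. All names are hypothetical, and the objects are deliberately never freed at exit.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// A slot holding the object pointer; code keeps the slot, not the object.
struct Handle {
  std::atomic<std::string*> object{nullptr};
};

static std::atomic<Handle*> g_hwHandle{nullptr};  // plays the role of _hndHW
static std::atomic<int> g_handlesCreated{0};      // instrumentation only

Handle* getHelloWorldHandle() {
  if (g_hwHandle.load() == nullptr) {
    Handle* tmp = new Handle();  // like CreateHandle(NULL)
    ++g_handlesCreated;
    Handle* expected = nullptr;
    // Publish tmp only if no other thread did it first; the loser frees its copy.
    if (!g_hwHandle.compare_exchange_strong(expected, tmp)) delete tmp;
  }
  Handle* h = g_hwHandle.load();
  if (h->object.load() == nullptr) {  // create the string only if needed
    std::string* s = new std::string("Hello World");
    std::string* expectedObj = nullptr;
    if (!h->object.compare_exchange_strong(expectedObj, s)) delete s;
  }
  return h;
}

// Simulated GC compaction: the object moves, only the slot is updated.
// Call only while no other thread uses the string (a real GC stops the world).
void simulateMove(Handle* h) {
  std::string* old = h->object.load();
  h->object.store(new std::string(*old));
  delete old;
}
```

Whoever captured the handle, like the address baked into the jitted code, still reaches the string after the move, because it always reads through the slot.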
compileCEE_LDHW
FJitResult FJit::compileCEE_LDHW() {
  // Ask the EE, through the JIT interface, for the handle to "Hello World"
  void* literalHnd =
      jitInfo->constructStringConstant(methodInfo->scope, 0);
  // Push the handle and dereference it to get the string reference
  emit_WIN32(emit_LDC_I4(literalHnd))
  emit_WIN64(emit_LDC_I8(literalHnd)) ;
  emit_LDIND_PTR();
  // Get the type handle for strings
  CORINFO_CLASS_HANDLE s_StringClass =
      jitInfo->getBuiltinClass(CLASSID_STRING);
  VALIDITY_CHECK( s_StringClass != NULL );
  pushOp(OpType(typeRef, s_StringClass ));
  return FJIT_OK;
}
Make it run
- Change the source files (we distribute the files of SSCLI with generics)
- Run buildall
- Compile tst.cs
- Run it: clix tst.exe
- Disassemble it: ildasm tst.exe > tst.il
- Change ldstr "Hello world" into ldhw
- Assemble it again: ilasm tst.il
- Run the program: clix tst.exe
Conclusions
- We have extended the runtime with a new instruction
- This has been an opportunity to take a short walkthrough of the code
- Still, we have seen only one aspect of it; there are many others (memory management, code generation strategy, …)
- Nevertheless, knowing the JIT helps in digging into the other aspects
- Now we need a special construct in C# for the hello world program! ;-)