History, Architecture, and Implementation of the CLR Serialization and Formatter Classes Peter de Jong April 24, 2003

History







J++ DCOM 1997
J++ SOAP 1998
CLR .Net Remoting 1999 Spring
CLR Serialization Classes 1999 Spring
CLR SoapFormatter 1999 Spring
CLR BinaryFormatter 1999 December
CLR V1 2002 January
J++ Soap

Original Soap Spec (Bob Atkinson) 1997
Protocol
HTTP Bi-Directional
Give me a call - server callback using the response from a hanging HTTP request.
XML
No namespaces, no xsd
Soap Header root for Soap Headers and parameter graph
No Envelope
RPC
J++ Proxy/Stub for serialization/deserialization of interface parameters
[Diagram labels: Client, Http, Server; Soap Root, Parameters, Soap Headers]
CLR Soap

Soap .9 spec



Section 5 specifies how to map objects
Namespaces, no xsd
Soap Envelope



Rpc - rooted Headers and Parameters
Serialization – root of object graph
Most annoying part



Headers are really an array of objects
For XML beauty, specified as XML field elements.
Led to specification of the root attribute
Soap Moving Target



Original Soap
Soap .9
Soap as a cottage industry

Easy to produce a subset of soap



Microsoft had 5 or so implementations
Individuals and companies set up Soap Web sites
Soap Interop Meeting (IBM 2000-2001)


Soap Application Benchmarks
Led to Web sites which implemented the Applications


Soap 1.0

Standards effort which included many of the Soap producers.



~15 sites to test interoperability
Envelope, body - no header or parameter root
Moved Section 5 to an appendix
Soap 1.1

Nest top level object
Serialization Classes Architecture
[Diagram: three stacks. BinaryFormatter: Binary Stream, Serializer/Parser, Object Writer/Object Reader. SoapFormatter: Soap XML Stream, Serializer/Parser, Object Writer/Object Reader. Serialization Classes: Object Writer/Object Reader.]
Serialization Classes

Designed to make it easy to produce Formatters.
True for a subset of CLR
False for the complete CLR object model

SoapFormatter and BinaryFormatter are the only Serialization/Deserialization engines which support the complete CLR model.
Serialization Classes Services





System controlled serialization (Serializable, NonSerialized)
User controlled serialization (ISerializable)
Type substitution (ISerializationSurrogate, ISurrogateSelector)
Object substitution (IObjectReference)
Object sharing fixups
System Controlled Serialization

Serialization




Serializable custom attribute
NonSerialized custom attribute
public, internal, private fields serialized
Deserialization



Creates Uninitialized object
Populates the fields
Constructor is not called
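
The deserialization steps above (create an uninitialized object, populate the fields, never run the constructor) can be sketched in Python as an analogy. This is not the CLR implementation: the `Point` class, the `non_serialized` marker, and the `serialize`/`deserialize` helpers are all hypothetical names chosen for illustration.

```python
class Point:
    constructed = 0                 # counts how many times __init__ runs
    non_serialized = {"cache"}      # hypothetical stand-in for a NonSerialized marker

    def __init__(self, x, y):
        Point.constructed += 1
        self.x = x
        self._y = y                 # "private" by convention; still serialized
        self.cache = {}             # skipped because it is marked non-serialized


def serialize(obj):
    """Capture every instance field except those marked non-serialized."""
    skip = getattr(type(obj), "non_serialized", set())
    return {name: value for name, value in vars(obj).items() if name not in skip}


def deserialize(cls, fields):
    """Create an uninitialized instance and populate its fields directly."""
    obj = object.__new__(cls)       # allocates without running __init__
    obj.__dict__.update(fields)
    return obj


original = Point(3, 4)
copy = deserialize(Point, serialize(original))
```

Because the constructor is bypassed, any invariant it normally establishes (here, the `cache` field) is absent on the copy, which is one reason a type may need user controlled serialization instead.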
User Controlled Serialization



Implements ISerializable
Serialization – GetObjectData gives name/value pairs to serializer
Deserialization – Constructor used to retrieve name/value pairs and populate object.

Constructor is not in the interface, so the compiler can’t check whether it is present
Constructor isn’t inherited, so each subclass needs its own constructor
Earlier version used SetObjectData instead of constructor
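
A rough Python analogy of the name/value-pair pattern follows. The names `get_object_data` and `from_info`, and the `Account` classes, are hypothetical; a plain dictionary stands in for the bag of name/value pairs. In this sketch the subclass overrides `from_info` because it carries extra state; in the CLR the deserialization constructor is not inherited at all, which is the point made on the slide.

```python
class Account:
    def __init__(self, owner, cents):
        self.owner = owner
        self.cents = cents

    def get_object_data(self, info):
        """Serialization side: hand name/value pairs to the serializer."""
        info["owner"] = self.owner
        info["balance"] = self.cents / 100   # stored form may differ from the field

    @classmethod
    def from_info(cls, info):
        """Deserialization side: plays the role of the special constructor."""
        return cls(info["owner"], round(info["balance"] * 100))


class SavingsAccount(Account):
    def __init__(self, owner, cents, rate):
        super().__init__(owner, cents)
        self.rate = rate

    def get_object_data(self, info):
        super().get_object_data(info)
        info["rate"] = self.rate

    @classmethod
    def from_info(cls, info):
        # The subclass supplies its own entry point for its extra field.
        return cls(info["owner"], round(info["balance"] * 100), info["rate"])


info = {}
SavingsAccount("ann", 1250, 0.02).get_object_data(info)
restored = SavingsAccount.from_info(info)
```

Nothing forces a subclass to provide `from_info`, just as nothing in the interface forces the constructor to exist; a missing one only shows up at deserialization time.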
Surrogates

Type substitution

Objects of specified type replaced by a new object of a different type.
MarshalByRefObject -> ObjRef -> Proxy
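
Type substitution can be sketched as a lookup from a type to a replacement function. The Python below is an analogy only: `RemoteService`, `ServiceRef`, and `ServiceProxy` are hypothetical stand-ins for the MarshalByRefObject, ObjRef, and Proxy roles, and the `surrogates` dictionary is a simplified stand-in for a surrogate selector.

```python
class RemoteService:
    """Stands in for an object that should not be copied by value."""
    def __init__(self, uri):
        self.uri = uri


class ServiceRef:
    """Stands in for the small description that travels instead."""
    def __init__(self, uri):
        self.uri = uri


class ServiceProxy:
    """Stands in for the proxy created on the receiving side."""
    def __init__(self, uri):
        self.uri = uri


# Hypothetical surrogate selector: exact type -> replacement function.
surrogates = {RemoteService: lambda obj: ServiceRef(obj.uri)}


def substitute_on_write(obj):
    surrogate = surrogates.get(type(obj))
    return surrogate(obj) if surrogate else obj


def substitute_on_read(obj):
    return ServiceProxy(obj.uri) if isinstance(obj, ServiceRef) else obj


wire = [substitute_on_write(o) for o in [RemoteService("tcp://host/svc"), 42]]
received = [substitute_on_read(o) for o in wire]
```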
Object Substitution

IObjectReference

GetRealObject method returns deserialized object

When object is returned, it and its descendants are completely deserialized
Used extensively for returning singleton system objects

Types, Delegates
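
The singleton case can be illustrated with a placeholder object that resolves to the real instance once it has been read. This is a Python analogy with hypothetical names (`Registry`, `RegistryRef`, `get_real_object`, `finish`), not the CLR mechanism itself.

```python
class Registry:
    """A singleton that must never be duplicated by deserialization."""
    instance = None


Registry.instance = Registry()


class RegistryRef:
    """Placeholder written to the stream in place of the singleton."""
    def get_real_object(self):
        return Registry.instance


def finish(obj):
    """Replace a placeholder with the object it refers to; pass others through."""
    resolve = getattr(obj, "get_real_object", None)
    return resolve() if resolve else obj


restored = finish(RegistryRef())
```

The caller ends up holding the existing singleton rather than a second copy, so identity comparisons keep working after a round trip.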
Object Fixup

Reference before object

Serialization swizzles objref to integer
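
A minimal sketch of the idea, in Python and under simplifying assumptions: references are written as integer ids, and a reference that is read before its target object is recorded as a fixup and patched once the target exists. The `Node` class and the record layout are hypothetical and much simpler than what the formatters handle.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []   # references to other Node objects


def serialize(root):
    """Flatten a graph into records; each reference becomes an integer id."""
    ids, records, pending = {id(root): 1}, [], [root]
    while pending:
        node = pending.pop(0)
        ref_ids = []
        for target in node.refs:
            if id(target) not in ids:
                ids[id(target)] = len(ids) + 1
                pending.append(target)
            ref_ids.append(ids[id(target)])
        records.append((ids[id(node)], node.name, ref_ids))
    return records


def deserialize(records):
    """Rebuild the graph, recording a fixup for every id not yet seen."""
    objects, fixups = {}, []          # fixups: (holder, slot, wanted id)
    for obj_id, name, ref_ids in records:
        node = Node(name)
        objects[obj_id] = node
        for slot, wanted in enumerate(ref_ids):
            node.refs.append(objects.get(wanted))   # None if not read yet
            if wanted not in objects:
                fixups.append((node, slot, wanted))
    for holder, slot, wanted in fixups:              # patch the holes
        holder.refs[slot] = objects[wanted]
    return objects[1]


a, b, c = Node("a"), Node("b"), Node("c")
a.refs = [b, c]
b.refs = [c, a]      # c is shared, and b points back to a (a cycle)
copy = deserialize(serialize(a))
```

Sharing and cycles survive the round trip because both references to `c` carry the same id and resolve to the same rebuilt object.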
Object Fixup Complications



Value classes must be fixed up before boxed
ISerializable directly referenced object graphs must be deserialized one level
IObjectReference object graph must be completely deserialized
IDeserializationCallback

Used to signal that deserialization is complete

E.g. Hashtable can’t create hashes until all the objects are deserialized.
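
The Hashtable example can be sketched as follows. This is a Python analogy with hypothetical names (`Key`, `Table`, `on_deserialization`): the keys exist as uninitialized objects before their fields are filled in, so the table defers building its hash index until it is told the whole graph is ready.

```python
class Key:
    def __init__(self, text):
        self.text = text

    def __hash__(self):
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, Key) and self.text == other.text


class Table:
    """Holds key/value pairs; the lookup index is built only on callback."""
    def __init__(self):
        self.pairs = []
        self.index = None

    def on_deserialization(self):
        self.index = dict(self.pairs)   # hashing is safe now


def deserialize(raw_pairs):
    table = object.__new__(Table)
    table.index = None
    keys = [object.__new__(Key) for _ in raw_pairs]   # still uninitialized
    table.pairs = [(key, value) for key, (_, value) in zip(keys, raw_pairs)]
    # Hashing a key at this point would fail: it has no 'text' field yet.
    for key, (text, _) in zip(keys, raw_pairs):
        key.text = text
    table.on_deserialization()     # signalled once the whole graph is ready
    return table


table = deserialize([("one", 1), ("two", 2)])
```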
Formatter Classes
IFormatter Object Graph



Serialize(Stream s, Object graph)
Object Deserialize(Stream s)
Properties



ISurrogateSelector
SerializationBinder (Type substitution when deserializing)
StreamingContext









CrossProcess
CrossMachine
File
Persistence
Remoting
Other
Clone
CrossAppDomain
All
IRemotingFormatter - RPC

Serialize(Stream s, Object graph, Header[] headers)

Two Serializations



Graph (parameter array)
Headers (Header array)
Object Deserialize(Stream s, HeaderHandler handler)


Delegate Object HeaderHandler(Header[] headers)
Headers handed to delegate; delegate returns object into which parameters are deserialized.
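
The two-step flow (headers first, then parameters into the object the handler returns) can be sketched in Python. Everything here is hypothetical: a pair of dictionaries stands in for the stream, and `deserialize_call` and `handler` are illustrative names, not the remoting API.

```python
def deserialize_call(stream, header_handler):
    """stream is a (headers, parameters) pair standing in for the wire data."""
    headers, params = stream
    target = header_handler(headers)   # the handler sees the headers first
    target.update(params)              # parameters are deserialized into its result
    return target


def handler(headers):
    # Hypothetical handler: uses a header to decide what object to return.
    return {"method": headers["method"]}


call = deserialize_call(({"method": "Add"}, {"a": 1, "b": 2}), handler)
```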
Formatter Property Enums

FormatterTypeStyle
TypesWhenNeeded – types outputted for
Arrays of Objects
Object fields, inheritable fields
ISerializable
TypesAlways
version compatibility
MemberInfo -> ISerializable
FormatterAssemblyStyle
Simple – No version information

Full – Full assembly name
Defaults
Remoting – Serialization Full, Deserialization Simple
Non-Remoting – Serialization Full, Deserialization Full

SoapFormatter additional
Properties

ISoapMessage – Alternate way of specifying Parameter/Header serialization.






ParamNames
ParamValues
ParamTypes
MethodName
XmlNameSpace
Header[] headers
BinaryFormatter

Binary Stream Format Design

Primitive types are written directly


Array of primitives - bytes are copied directly from the CLR (100x faster than using reflection)
All other types are written as records

Basic record types

SerializedStreamHeader, Object, ObjectWithMap, ObjectWithMapAssemId, ObjectWithMapTyped, ObjectWithMapTypedAssemId, ObjectString, Array, MemberPrimitiveTyped, MemberReference, ObjectNull, MessageEnd, Assembly
Record types added later for performance

ObjectNullMultiple256, ObjectNullMultiple, ArraySinglePrimitive, ArraySingleObject, ArraySingleString, CrossAppDomainMap, CrossAppDomainString, CrossAppDomainAssembly, MethodCall, MethodReturn
Serialization
[Figure: object graph of ten numbered objects (1-10) and their positions in the serialized stream]
Serialization Complications








MethodCall/MethodReturn
CrossAppDomain
Determine when Type information is needed
Value classes are nested/Non-Value classes are top level
Arrays – mix of jagged and multi-dimensional [][,,][]
Array of primitives copied to stream as a collection of bytes
Surrogates
ISerializable
Deserialization
[Figure: deserialization of the ten-object graph (objects 1-10)]
Fixups
Process 1, fixups 2, 3, 4
Process 2, fixups 5, 6
Process 3, fixups 7
Process 4, fixups 8, 9
Deserialization Binary

Parsing


Record headers specify what is coming next in stream
Primitives do not have headers, so previously encountered record headers are used as a map for reading primitives
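
The parsing rule above can be illustrated with a toy binary format in Python. This is not the BinaryFormatter wire layout: the record codes, field encoding, and helper names are assumptions made for the sketch. The first object of a type is written with a map (type name plus field codes); later objects of that type refer back to the map by id; primitive values follow either record with no header of their own.

```python
import io
import struct

OBJECT_WITH_MAP, OBJECT, END = 1, 2, 3      # toy record headers


def write(objects):
    """objects: list of (type name, struct field codes, values)."""
    out, known = io.BytesIO(), {}
    for type_name, fields, values in objects:
        if type_name not in known:
            known[type_name] = len(known)
            name = type_name.encode()
            out.write(struct.pack("<BiB", OBJECT_WITH_MAP, known[type_name], len(name)))
            out.write(name)
            out.write(struct.pack("<B", len(fields)) + fields.encode())
        else:
            out.write(struct.pack("<Bi", OBJECT, known[type_name]))
        # Primitive values follow with no per-value header.
        out.write(struct.pack("<" + fields, *values))
    out.write(struct.pack("<B", END))
    return out.getvalue()


def read(data):
    src, maps, result = io.BytesIO(data), {}, []
    while True:
        (header,) = struct.unpack("<B", src.read(1))
        if header == END:
            return result
        (map_id,) = struct.unpack("<i", src.read(4))
        if header == OBJECT_WITH_MAP:
            (n,) = struct.unpack("<B", src.read(1))
            name = src.read(n).decode()
            (count,) = struct.unpack("<B", src.read(1))
            maps[map_id] = (name, src.read(count).decode())
        # The remembered map says how many bytes to read and how to decode them.
        name, fields = maps[map_id]
        size = struct.calcsize("<" + fields)
        result.append((name, struct.unpack("<" + fields, src.read(size))))


data = write([("Point", "ii", (1, 2)), ("Point", "ii", (3, 4)), ("Sample", "id", (7, 0.5))])
decoded = read(data)
```

The second `Point` costs only a short header plus its raw values, which is the size benefit of reusing a map already seen in the stream.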
Deserialization Complications

Remoting





MethodCall/MethodReturn optimization
CrossAppDomain
Value Type
ISerializable
Surrogate
Retrospective
What Went Wrong - 1

Beta1 gave GC a workout


Object oriented style is dangerous for plumbing.
Lots of objects created.
Solution

Use object singletons (or fixed number)
Object pools
Start with larger storage for growing objects such as ArrayLists
Special cases – Primitive parameters - serialization classes aren’t used so aren’t initialized.
What Went Wrong - 2


Performance is never good enough
Reflection is slow



Boxes value types
Interpretive
Serialization classes are slow


Boxes value types
Keeps lots of state around in resizable arrays
What Went Wrong - 3

Formatters are slow


Object type and field information inflates size of stream (reflection and versioning requirement)
Lots of irregular cases

CLR – value types, singletons, transformations
Serialization – ISerializable, resolving graph rules
Code more general than it has to be

Now we know, but during development the underlying system kept changing

CLR object model (variants, reflection, security, BCL, etc.)
Serialization model (ISerializable underwent many changes)
Soap spec kept changing
Binary format changed for perf reasons
Fixups used too much – strings and value classes are put in stream when encountered; object references are put in stream, with the object coming later

Soap 1.2 nests reference objects
BinaryFormatter should be changed to nest objects
What Went Wrong - 4

Why didn’t we use Reflection.Emit?

BinaryFormatter primitive arrays use array copy rather than reflection

Primitives and strings bypass the BinaryFormatter, resulting in faster times than COM cross process
BinaryFormatter prototyped option to omit type information in stream

100x faster when switch was made
Cross AppDomain smuggling

1200 serializations to make up cost
Couldn’t serialize private and internal fields
4 byte point class serialized in 10 bytes instead of 125 bytes.
Future version of the Formatters will be much faster

Improvements to Reflection.Emit
Cross AppDomain serialization prototype implemented in the EE.
What Went Wrong - 5

Web Services

The BinaryFormatter and SoapFormatter existed before Web Service classes

Serialization, Formatter, and Remoting classes are based on object oriented programming, RPC and COM models
Web Services started to gain importance late in the development of the .NET Framework
Future releases will combine the two models, use same custom attributes and underlying messaging model
SoapFormatter

Specify shape of stream to some extent
Object WSDL, added additional schema information to WSDL to allow generation of the CLR object model in client proxies
Object WSDL is the only way in .NET Framework V1 to copy CLR metadata without copying the dll which includes code
The Formatters are Great (at least useful)






Only way to make a deep copy of an object graph with complete fidelity
Integrated with .Net Remoting
Combines the CLR Object Model with the Web Services Model
Version resilient (at least the attempt is made)
Secure
Perf isn’t all that bad