Introduction to distributed systems

Remote procedure call (RPC)
Outline
> Remote procedure calls
> Middleware
> Distributed objects
> CORBA
> .Net Remoting
Online shop example
> Implementing an online shop client and server with
just sockets is tedious
> Smart approach:
> public Receipt order(Book[] books)
> public Book[] search(String keyword)
> Client calls procedures on the server as if they were
local procedures -> RPC
RPC – Overall Goal 1/2
> Provide distribution transparency
> Programming as if there is no distribution
> “Hunt for the holy grail”
> We will get quite close, but we will never get hold of the grail
> “We argue that objects that interact in a distributed system need to
be dealt with in ways that are intrinsically different from objects that
interact in a single address space. These differences are required
because distributed systems require that the programmer be aware
of latency, have a different model of memory access, and take into
account issues of concurrency and partial failure.” [1]
> Degenerate case: programming as if there is only distribution
[1] S. C. Kendall, J. Waldo, A. Wollrath and G. Wyant: A Note on
Distributed Computing
RPC – Overall Goal 2/2
> Hide expert knowledge in the tool chain
> “Sockets”
> Session management
> Data representation
> Data transformation
> Service discovery
> … you name it …
Remote procedure call
> RPC sits on top of the transport layer
> Hides network communication from the application programmer, i.e. builds an abstraction
> Sockets etc. are not visible to the application programmer
> Usually a request-reply protocol is specified
> Procedure invocation message (Request)
> Procedure result message (Reply)
> RPC is responsible for
> Marshalling & unmarshalling data (parameters and results)
> External data representation
> Addressing
Problems
> Heterogeneity
> Different data representations (little-endian vs. big-endian, ASCII vs.
EBCDIC)
> Addressing
> How to identify a remote process?
> Partial failure
> What happens if the server crashes during execution?
> Can we guarantee “at least once” semantics?
> What happens if the client crashes?
> Can we detect and remove orphans?
> What happens if a crashed machine is rebooted?
> Can addresses survive a reboot?
How is it implemented?
> Client-side proxy
> Implements the interface of the target procedure on the client side
> Client calls this interface locally (-> transparency)
> Procedure is invoked on a remote machine
> Server-side proxy
> Implements the interface of the target procedure on the server side
> Incoming requests are dispatched to this interface locally
> Server does not realize that the call is a remote call
> The result is: almost distribution transparency …
> … if there is no failure
Proxy Generation
> We need a tool that generates the proxies
> … to be continued …
Flow of action 1/3
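[Figure: sequence diagram, time running downward. The client calls the remote procedure, sends a request and waits for the result. The server, waiting for a request, calls the local procedure and returns the result in a reply, then waits for the next request. The client returns from the call.]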
Flow of action 2/3
1. Client calls procedure on client-side proxy locally
2. Client proxy marshals parameters and sends
message to server proxy
3. Server proxy unmarshals parameters and calls
procedure locally
4. Procedure does work and returns result to the proxy
5. Server proxy marshals result and sends message to
client proxy
6. Client proxy unmarshals result and returns to client
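The six steps above can be sketched in Java. This is a minimal in-process illustration, not how any particular RPC system works: the "network" is just a function from request bytes to reply bytes, the interface `BookSearch` and the wire format (procedure name, then arguments as `DataOutputStream` UTF strings) are hypothetical, and the proxies are hand-written where a real tool chain would generate them.

```java
import java.io.*;
import java.util.*;
import java.util.function.Function;

class RpcDemo {
    public static void main(String[] args) {
        ServerProxy server = new ServerProxy(new BookSearchImpl());
        BookSearch shop = new ClientProxy(server::handle); // step 1: looks like a local object
        System.out.println(shop.search("Java"));           // prints [Java Remoting]
    }
}

// Hypothetical service interface, shared by client and server.
interface BookSearch {
    List<String> search(String keyword);
}

// The real procedure, running in the "server process".
class BookSearchImpl implements BookSearch {
    private final List<String> catalog = List.of("Distributed Systems", "Java Remoting", "Cooking");

    public List<String> search(String keyword) {
        List<String> hits = new ArrayList<>();
        for (String title : catalog) if (title.contains(keyword)) hits.add(title);
        return hits;
    }
}

// Client-side proxy: same interface as the procedure, but it talks to the "network".
class ClientProxy implements BookSearch {
    private final Function<byte[], byte[]> network; // stands in for a socket round trip

    ClientProxy(Function<byte[], byte[]> network) { this.network = network; }

    public List<String> search(String keyword) {
        try {
            // Step 2: marshal procedure name and parameter into a request message.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF("search");
            out.writeUTF(keyword);
            byte[] reply = network.apply(buf.toByteArray());
            // Step 6: unmarshal the reply and return the result to the caller.
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(reply));
            int n = in.readInt();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < n; i++) result.add(in.readUTF());
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Server-side proxy: dispatches incoming requests to the local procedure.
class ServerProxy {
    private final BookSearch target;

    ServerProxy(BookSearch target) { this.target = target; }

    byte[] handle(byte[] request) {
        try {
            // Step 3: unmarshal parameters and call the procedure locally.
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(request));
            String procedure = in.readUTF();
            if (!procedure.equals("search")) throw new IllegalArgumentException("unknown procedure " + procedure);
            List<String> result = target.search(in.readUTF()); // step 4: procedure does the work
            // Step 5: marshal the result into the reply message.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(result.size());
            for (String s : result) out.writeUTF(s);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Swapping the function for a real socket round trip would leave both `BookSearchImpl` and the calling code unchanged, which is the transparency the proxies are meant to provide.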
Bookstore example
[Figure: the client process calls receipt order(Book[] books) on the client proxy; the client proxy sends (order, books) to the server proxy, which calls receipt order(Book[] books) in the server process; the receipt is returned along the same path.]
Proxy Generation
> We need a tool that generates the proxies
> The tool has to know about
> The interface
> Procedure names
> Parameter types
> Return types
> Exceptions
> … to be continued …
Calling semantics
> Local case
> Call-by-value
> Call-by-reference
> Call-by-reference remotely is difficult
> Simulate by call-by-copy/restore
> Transmit copy of the data and transmit changed copy back
> Slightly different semantics!
> Increased overhead for collections (graphs, lists,…)
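The difference between call-by-reference and call-by-copy/restore shows up when the same variable is passed twice (aliasing). A small Java sketch; Java has no call-by-reference, so a hypothetical `Cell` object stands in for the caller's variable:

```java
class CopyRestoreDemo {
    // Mutable cell standing in for a caller's variable (Java has no call-by-reference).
    static final class Cell {
        int value;
        Cell(int value) { this.value = value; }
    }

    // The "remote" procedure: increments both parameters.
    static void incrementBoth(Cell a, Cell b) {
        a.value++;
        b.value++;
    }

    // Call-by-reference: both parameters alias the same caller variable.
    static int byReference(int initial) {
        Cell x = new Cell(initial);
        incrementBoth(x, x);
        return x.value;
    }

    // Call-by-copy/restore: each argument is copied to the "server",
    // and the changed copies are written back to the caller afterwards.
    static int byCopyRestore(int initial) {
        Cell x = new Cell(initial);
        Cell copyA = new Cell(x.value); // marshal argument a
        Cell copyB = new Cell(x.value); // marshal argument b
        incrementBoth(copyA, copyB);    // executes on the copies
        x.value = copyA.value;          // restore a
        x.value = copyB.value;          // restore b overwrites a: same variable
        return x.value;
    }

    public static void main(String[] args) {
        System.out.println(byReference(0));   // 2
        System.out.println(byCopyRestore(0)); // 1
    }
}
```

Under call-by-reference `x` is incremented twice; under copy/restore the second restore overwrites the first, so the caller sees only one increment.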
Marshalling / Unmarshalling
> Assembling to external format: Marshalling
> Disassembling from external format: Unmarshalling
> Complex data types must be serialized
> Lists, Structs, Graphs, …
> When data formats of machines differ
> Use agreed common external format
> Sender transforms to external
> Receiver transforms the external format into its own
> Use sender’s format and “receiver makes it right”
> Sender must send indication of format
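"Receiver makes it right" can be sketched with `java.nio.ByteBuffer`, which reads and writes multi-byte values in a chosen byte order. The one-byte format tag in front of the value is an assumption made for this example, not a standard wire format:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ReceiverMakesItRight {
    // Hypothetical one-byte format tag: 0 = big-endian, 1 = little-endian.
    static final byte BIG = 0, LITTLE = 1;

    // Sender: no conversion, writes the value in its own byte order plus the format tag.
    static byte[] send(int value, ByteOrder senderOrder) {
        ByteBuffer buf = ByteBuffer.allocate(5).order(senderOrder);
        buf.put(senderOrder == ByteOrder.BIG_ENDIAN ? BIG : LITTLE);
        buf.putInt(value);
        return buf.array();
    }

    // Receiver "makes it right": reads the tag and decodes the bytes in the sender's order.
    static int receive(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        ByteOrder senderOrder = buf.get() == BIG ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        return buf.order(senderOrder).getInt();
    }

    public static void main(String[] args) {
        byte[] fromLittleEndianHost = send(1, ByteOrder.LITTLE_ENDIAN);
        System.out.println(receive(fromLittleEndianHost)); // 1
        // Ignoring the tag and assuming big-endian misreads the same four bytes:
        System.out.println(ByteBuffer.wrap(fromLittleEndianHost, 1, 4).getInt()); // 16777216
    }
}
```

The common-external-format alternative would instead fix one order (for example big-endian) for every message and make both sides convert.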
Proxy Generation
> We need a tool that generates the proxies
> The tool has to know about
> The interface
> Procedure names
> Parameter types
> Return types
> Exceptions
> Data types
Interface definition
> Interfaces are defined by an Interface Definition
Language (IDL)
> Language-neutral
> Usually C-style syntax
> Proxies can be generated from IDL
> Different proxies for different programming languages
> E.g. client in Java -> client proxy in Java
> E.g. server in C -> server proxy in C
IDL example
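module shops {
  interface bookshop {
    struct Book {
      string name;
      string isbn;                  // an ISBN does not fit a 32-bit IDL long and may end in "X"
    };
    typedef sequence<Book> BookList; // IDL has no T[] syntax for parameters
    struct Receipt {
      string bank;
      long accountnumber;
      long amount;                  // IDL has no int type; long is 32 bits
    };
    Receipt order(in BookList books);
    BookList search(in string keyword);
  };
};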
IDL in the Tool Chain
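shop.idl -> IDL compiler -> shop_cproxy.c, shop_sproxy.c, shop.h
shop_cproxy.c -> C compiler -> shop_cproxy.o
shop_sproxy.c -> C compiler -> shop_sproxy.o
shop_client.c (#include shop.h) -> C compiler -> shop_client.o
shop_server.c (#include shop.h) -> C compiler -> shop_server.o
shop_client.o + shop_cproxy.o -> linker -> shop_client.exe
shop_server.o + shop_sproxy.o -> linker -> shop_server.exe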
IDL Pros & Cons
> Pros
> Language neutral
> Cons
> Generated interface can be ugly
> Example: CORBA and C++
> Developers have to master two languages
> Requires top-down approach
> First IDL, then implementation
> Cannot simply use existing code & data structures
> Solutions? Yes!
> .NET Remoting does not require any IDL
RPC failure I
> Local call failure
> When call fails the whole program fails
> What can go wrong with RPC?
1. Client cannot find the server
2. Client crashes after sending request
3. Request gets lost
4. Server crashes after receiving request, before sending response
5. Response gets lost
6. Client crashes before receiving response
RPC failure II
> Fault detection
> Wait for expected response
> After timeout -> failure
> What is a good timeout?
> Maybe the network is too slow
> Maybe the other computer is too slow
> Usually no real-time guarantees -> calculating an optimal timeout is impossible
> No way to find out what went wrong
> Remote machine does not respond. Why?
> Machine crashed?
> Message loss?
Client crash
> Processes on servers for non-existing clients
(Orphans)
> Block resources
> Solution
> Client sends “heartbeat”
> Server pings client
> Pings & heartbeats are expensive and subject to failures, too
> Client restart
> Do not mix new with old messages
> E.g. counter for every restart
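A minimal sketch of orphan detection via heartbeats, plus a restart counter (epoch) that keeps old and new messages apart. Class and method names are hypothetical, and the current time is passed in explicitly so the example is deterministic. Note that a lost heartbeat makes a live client look like an orphan, which is exactly the weakness mentioned above.

```java
import java.util.*;

class OrphanDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Integer> epoch = new HashMap<>(); // restart counter per client

    OrphanDetector(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Client heartbeat; a client increments its epoch on every restart.
    void heartbeat(String client, int clientEpoch, long now) {
        Integer known = epoch.get(client);
        if (known != null && clientEpoch < known) return; // old message from before a restart
        epoch.put(client, clientEpoch);
        lastHeartbeat.put(client, now);
    }

    // Requests are only served for live clients using their current epoch.
    boolean acceptRequest(String client, int clientEpoch) {
        Integer known = epoch.get(client);
        return lastHeartbeat.containsKey(client) && known != null && known == clientEpoch;
    }

    // Clients whose heartbeats stopped; the server can release their resources.
    List<String> reapOrphans(long now) {
        List<String> orphans = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() > timeoutMillis) {
                orphans.add(e.getKey());
                it.remove();
            }
        }
        Collections.sort(orphans);
        return orphans;
    }

    public static void main(String[] args) {
        OrphanDetector d = new OrphanDetector(1000);
        d.heartbeat("c1", 0, 0);
        d.heartbeat("c2", 0, 900);
        System.out.println(d.reapOrphans(1500));      // [c1]
        d.heartbeat("c2", 1, 1600);                   // c2 restarted, epoch increased
        System.out.println(d.acceptRequest("c2", 0)); // false: request from the old incarnation
    }
}
```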
Server crash
> Client gets a timeout waiting for the response
> Did the server process the request?
> Possible semantics:
> Maybe
> Nothing to be done
> At-least-once
> Repeat until response received
> At-most-once
> Serial numbers for requests
> Exactly-once
> Transactions
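The serial-number idea can be sketched as follows: the client resends the same request until it gets a reply (at-least-once delivery), and the server keeps a reply cache keyed by serial number so a duplicate is not executed twice (at-most-once execution). The lossy network is simulated and all names are hypothetical. Exactly-once semantics would additionally need transactions, which this sketch does not attempt, and a real reply cache would have to be bounded.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class InvocationSemanticsDemo {
    // Server that remembers replies by request serial number, so a repeated request
    // is answered from the cache instead of being executed again (at-most-once).
    static final class Server {
        int executions = 0;
        private int balance = 0;
        private final Map<Long, Integer> replies = new HashMap<>();

        int deposit(long serial, int amount) {
            Integer cached = replies.get(serial);
            if (cached != null) return cached; // duplicate: resend the old reply
            executions++;
            balance += amount;
            replies.put(serial, balance);
            return balance;
        }
    }

    // Client: resend the same request (same serial) until a reply arrives (at-least-once).
    static int callWithRetry(Function<Long, Optional<Integer>> send, long serial, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<Integer> reply = send.apply(serial); // empty = timeout expired
            if (reply.isPresent()) return reply.get();
        }
        throw new IllegalStateException("no reply after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Server server = new Server();
        int[] delivered = {0};
        // Simulated network: every request arrives, but the first reply is lost.
        Function<Long, Optional<Integer>> network = serial -> {
            int reply = server.deposit(serial, 100);
            return delivered[0]++ == 0 ? Optional.<Integer>empty() : Optional.of(reply);
        };
        System.out.println(callWithRetry(network, 42L, 3)); // 100, not 200
        System.out.println(server.executions);              // 1
    }
}
```

Without the reply cache the retry would execute the deposit twice, which is plain at-least-once behavior.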
RPC call variants
> Synchronous (blocking) call
> Parallelism in the distributed system is not exploited
> No parallel invocations to multiple servers
> Asynchronous RPC
> “Fire-and-forget”
> Deferred synchronous RPC
> Do something while server is executing
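The three call variants map naturally onto Java's `ExecutorService` and `Future`. In this sketch a local thread pool plays the role of the remote servers and `slowSquare` is a hypothetical stand-in for a remote procedure:

```java
import java.util.concurrent.*;

class CallVariantsDemo {
    // Stand-in for a remote procedure that takes about 100 ms.
    static int slowSquare(int x) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return x * x;
    }

    // Deferred synchronous: start both calls, keep working, collect the results later.
    static int deferredSum(ExecutorService pool, int x, int y) {
        Future<Integer> fx = pool.submit(() -> slowSquare(x));
        Future<Integer> fy = pool.submit(() -> slowSquare(y));
        // ... the caller could do other work here while both "servers" execute ...
        try {
            return fx.get(1, TimeUnit.SECONDS) + fy.get(1, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException("remote call failed", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        int a = slowSquare(3); // synchronous: the caller blocks for the whole call

        // Asynchronous fire-and-forget: no result, the caller continues immediately.
        pool.execute(() -> System.out.println("order logged"));

        int b = deferredSum(pool, 4, 5); // two calls run in parallel
        System.out.println(a + " " + b); // 9 41
        pool.shutdown();
    }
}
```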
Reentrance 1/2
> Client1 calls proc(x) on Server1
> While Server1 is still working on the request of Client1, Client2 invokes proc(y)
> Solution: serialize requests
Reentrance 2/2
> Client calls proc(x) on Server1
> proc calls proc2(x) on Server2
> Server2 invokes proc(y) on Server1 while Server1 is still working on the request of the Client
> Serialization is NO solution: it would cause a deadlock
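The deadlock in the second scenario can be reproduced with Java executors. In this sketch a fixed thread pool stands in for Server1 (one thread means requests are serialized), a single-thread executor for Server2, and a 500 ms timeout detects that the calls never complete; the thread counts and timeout are illustrative assumptions:

```java
import java.util.concurrent.*;

class ReentranceDemo {
    // Returns true if the nested call pattern deadlocks when Server1 has the given number of threads.
    static boolean deadlocks(int server1Threads) {
        ExecutorService server1 = Executors.newFixedThreadPool(server1Threads); // 1 thread = serialized
        ExecutorService server2 = Executors.newSingleThreadExecutor();
        // Client -> proc(x) on Server1
        Future<String> procX = server1.submit(() -> {
            // proc(x) calls proc2(x) on Server2 and blocks until it returns
            Future<String> proc2 = server2.submit(() ->
                    // proc2(x) calls proc(y) back on Server1 and blocks until it returns
                    server1.submit(() -> "proc(y) done").get());
            return proc2.get();
        });
        try {
            procX.get(500, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // proc(y) waits behind proc(x), which waits for proc(y)
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            server1.shutdownNow(); // interrupts the blocked worker threads
            server2.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(deadlocks(1)); // true: serializing requests deadlocks
        System.out.println(deadlocks(2)); // false: a second thread lets proc(y) run
    }
}
```

With two threads, proc(y) can run while proc(x) waits, so the call completes; the price is that proc may now run concurrently with itself and must protect its state accordingly.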
Example RPC systems
> Sun RPC
> Used by the Network File System (NFS)
> Distributed Computing Environment (DCE) RPC
> Basis for Microsoft's DCOM (Distributed Component Object Model)
> The RPC interface of DCOM has been the target of several exploits
Outline
> Remote procedure calls
> Middleware
> Distributed objects
> CORBA
> .Net Remoting
Motivation
> Implementing a distributed application on top of
sockets is tedious
> Dealing with challenges of distributed systems on your own:
> Distribution transparency
> Interoperability (heterogeneity e.g. data representation)
> Security
> Common services: Naming, Persistency, Events, Transactions, …
> Higher level of abstraction: middleware
> Deals with challenges of distributed systems (to a certain degree)
> Provide additional services (e.g. naming, persistency,…)
Definition
> “There is no good definition for middleware.”
> “The slash between client/server.”
> “Middleware is the intersection of the stuff that
network engineers don’t want to do with the stuff that
application developers don’t want to do.”
“Classical” approach
> Computers A, B and C each run their part of the distributed application directly on top of their own network services
Middleware
> A middleware layer spans computers A, B and C
> The distributed application runs on top of the middleware, which uses the network services of each computer
Models
> Message passing: client send() -> server receive()
> Virtual shared memory: client write() and server read() on shared data
> Remote procedure call: client calls proc() on the server
> Distributed object systems: client calls o.operate() on an object o that lives on the server
Services
> Persistency
> Security
> Naming
> Events
> Transaction
> Trader
> Accounting
Examples
> Distributed object system
> OMG CORBA
> Remote procedure call
> DCE RPC
> Message passing
> IBM MQSeries
> Virtual shared memory
> Linda
Outline
> Remote procedure calls
> Middleware
> Distributed object systems
> CORBA
> .Net Remoting
Distributed objects
> Today’s commonly used programming languages are
object-oriented
> “Remote objects”
> Objects that can receive method invocations from objects in other
processes
> Including processes on a different machine
> Objects get a remote interface
> Usually defined in IDL
> Client needs a remote object reference to perform a
remote method invocation
Why (distributed) object systems?
> Middleware provides a programming abstraction
> Programming languages changed
> C, Pascal, Basic -> C++, Java, C#, Delphi, VisualBasic.NET
> Object-orientation is the most prominent paradigm
-> Extend it to the remote case
> Objects are well suited for proxies
> Objects provide a public interface
> The implementation is not visible to the outside
> A proxy is just an object with a special (remote) implementation
Example
[Figure: objects O1 to O5 in process A1 on Machine A and in processes on Machine B; invocations between objects in the same process are local, invocations across process or machine boundaries are remote.]
How does it work?
> Similar to RPC
1. Client calls method on client proxy object
2. Client proxy object & ORB do the marshalling
3. Client-side ORB sends message
4. Server-side ORB receives message
5. ORB dispatches message
6. Server proxy receives message and unmarshals parameters
7. Server proxy calls method on remote object locally
8. Server proxy & ORB marshal the result
9. Server-side ORB sends message
10. Client-side ORB receives message
11. Proxy receives message, unmarshals result and returns it to client
Flow schematics
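[Figure: on the client, the client object calls receipt order(Book[] books) on its proxy object; the proxy sends (order, books) to the proxy object on the server, which calls receipt order(Book[] books) on the remote object; the receipt is returned along the same path.]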
Difference to RPC
> class A { void foo(int x); }
versus
void foo( A* object, int x )
> In the local case there is no difference
> In the remote case there is a difference
> A* object is a pointer that cannot be marshaled
> Distributed object systems introduce
object references
> An object reference is the remote equivalent to a pointer
Remote Object References
> Similar to local object references
> Uniquely identifies an object in the distributed system
> Can be passed between processes on different machines
> E.g. host, port, object key
> Client must bind to an object using the reference
> Binding builds a proxy on the client side
> Remote methods can be invoked on proxy
> Reference must contain enough information to allow binding
(E.g. endpoint)
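A remote object reference can be sketched as a small value class holding host, port and object key. The textual form `host:port/objectKey` is a hypothetical format for this example; real systems such as CORBA use their own encodings (e.g. IORs):

```java
import java.util.Objects;

final class RemoteRef {
    final String host;
    final int port;
    final String objectKey;

    RemoteRef(String host, int port, String objectKey) {
        this.host = host;
        this.port = port;
        this.objectKey = objectKey;
    }

    // Parses the hypothetical form "host:port/objectKey"; the key itself may contain ':'.
    static RemoteRef parse(String text) {
        int slash = text.indexOf('/');
        int colon = text.lastIndexOf(':', slash);
        if (slash < 0 || colon < 0) throw new IllegalArgumentException("expected host:port/key: " + text);
        return new RemoteRef(text.substring(0, colon),
                Integer.parseInt(text.substring(colon + 1, slash)),
                text.substring(slash + 1));
    }

    @Override public String toString() { return host + ":" + port + "/" + objectKey; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof RemoteRef)) return false;
        RemoteRef r = (RemoteRef) o;
        return host.equals(r.host) && port == r.port && objectKey.equals(r.objectKey);
    }

    @Override public int hashCode() { return Objects.hash(host, port, objectKey); }

    public static void main(String[] args) {
        RemoteRef ref = RemoteRef.parse("www.mybookstore.com:80/ISBN:1-12345-434-5");
        // Binding would connect to ref.host:ref.port and build a proxy that sends
        // ref.objectKey with every request so the server can locate the object.
        System.out.println(ref.host + " " + ref.port + " " + ref.objectKey);
    }
}
```

Because the reference is plain data, it can be passed to another process, which can then bind to the same object.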
Remote Object Activation
> Bookstore example:
> We treat every book as an object
> Every remotely accessible object has a remote object reference
> However, books are stored in a database, and we cannot hold all book objects in memory
> Solution
> Create object references for virtual objects, for example
(www.mybookstore.com, 80, ISBN:1-12345-434-5)
> Virtual objects are incarnated (i.e. created from the database) upon
invocation
> They are garbage collected afterwards
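Incarnation on demand can be sketched as an adapter that maps an object key (here an ISBN) to a servant loaded from the database only when a call arrives. The `loadFromDatabase` function and the other names are hypothetical stand-ins, not a specific ORB's API:

```java
import java.util.*;
import java.util.function.Function;

class IncarnationDemo {
    static final class Book {
        final String isbn, title;
        Book(String isbn, String title) { this.isbn = isbn; this.title = title; }
        String getTitle() { return title; }
    }

    // Plays the object adapter's role: references name *virtual* objects by key, and a
    // servant is only incarnated from the database when an invocation arrives.
    static final class BookAdapter {
        private final Function<String, Book> loadFromDatabase; // hypothetical database query
        int incarnations = 0;

        BookAdapter(Function<String, Book> loadFromDatabase) { this.loadFromDatabase = loadFromDatabase; }

        String invokeGetTitle(String objectKey) {
            Book servant = loadFromDatabase.apply(objectKey); // incarnate
            if (servant == null) throw new NoSuchElementException("no object " + objectKey);
            incarnations++;
            // After the call nothing refers to the servant, so it can be garbage collected.
            return servant.getTitle();
        }
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of("ISBN:1-12345-434-5", "Distributed Systems");
        BookAdapter adapter = new BookAdapter(
                key -> table.containsKey(key) ? new Book(key, table.get(key)) : null);
        System.out.println(adapter.invokeGetTitle("ISBN:1-12345-434-5")); // Distributed Systems
        System.out.println(adapter.incarnations);                          // 1
    }
}
```

Handing out a reference costs nothing here; memory is only used while a call on that book is in progress.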
Distributed Objects Realization
> Language integrated
> Definition of remote objects at language level
> Easy to use
> Language dependent
> E.g. Java RMI
> Language independent
> IDL to specify interface
> Objects can be implemented in any language
> Even in a procedural language using procedures and data
structures as object state
> More programming overhead
> E.g. CORBA
Static vs. dynamic invocation
> Static invocation
> Interface of the remote objects is known while client is being
developed
> Client must be recompiled when interface changes
> Example: C++, Java
> Dynamic invocation
> Compose method invocation at runtime
> Inspect the target object or its interface at runtime instead of relying on knowledge built into the client implementation
> Available methods, parameters,…
> any invoke(object, method, parameters[])
> Typically used for interpreted languages & scripting languages
> Example: Tcl
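Java reflection gives a compact sketch of dynamic invocation: the client composes the call from a method name and an argument array at runtime, in the spirit of `any invoke(object, method, parameters[])`. Looking methods up by name and parameter count is a simplification made here; a real dynamic invocation interface would consult full type information, e.g. from an interface repository:

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class DynamicInvocationDemo {
    // The call is composed at runtime by inspecting the target's methods,
    // not through an interface compiled into the client.
    static Object invoke(Object target, String methodName, Object... params) {
        for (Method m : target.getClass().getMethods()) {
            // Simplification: match by name and parameter count only, not parameter types.
            if (m.getName().equals(methodName) && m.getParameterCount() == params.length) {
                try {
                    return m.invoke(target, params);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException("invocation failed", e);
                }
            }
        }
        throw new IllegalArgumentException("no method " + methodName + "/" + params.length);
    }

    public static class Shop {
        public String search(String keyword) { return "results for " + keyword; }
        public int count() { return 3; }
    }

    public static void main(String[] args) {
        // Method name and arguments could just as well come from a script or user input.
        System.out.println(invoke(new Shop(), "search", "Remoting")); // results for Remoting
        System.out.println(invoke(new Shop(), "count"));              // 3
    }
}
```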
Distributed object system examples
> CORBA
> .Net Remoting
> Java RMI
Outline
> Remote procedure calls
> Middleware
> Distributed object systems
> CORBA
> .Net Remoting
CORBA
> Common Object Request Broker Architecture
> Standard of the Object Management Group (OMG)
> www.omg.org
> Not a specific system
> Describes a whole software architecture (OMA, Object Management Architecture)
> Programming language-independent
> CORBA IDL to define object interfaces
> Platform-independent
> Object Request Brokers (ORBs) interoperate via specified protocols
> GIOP (General Inter-ORB Protocol) & IIOP (Internet Inter-ORB Protocol)
> Complete distributed object infrastructure
> Lots of additional services
CORBA
> CORBA's goal: “IDL-ize” client/server middleware in two steps:
> Turn everything into nails: CORBA IDLs
> Give everyone a hammer: CORBA compliant ORB
> Extreme “IDL-ing”
> Even the local interfaces of the ORB are specified in IDL
> The XML DOM API has been defined using IDL
http://www.w3.org/TR/DOM-Level-2-Core/idl-definitions.html
> Drawback
> Everything is nice on the IDL level, but …
> … generated interfaces are often ugly
ORB
> Core of any CORBA system
> ORBs build a “distributed object bus”
[Figure: applications and services connected through one or more Object Request Brokers]
General CORBA system architecture
> Client machine: client application; static IDL proxy; dynamic invocation interface; ORB interface; client ORB; local OS
> Server machine: object implementation; skeleton; dynamic skeleton interface; object adapter; ORB interface; server ORB; local OS
Example invocation
1. Client uses the naming service to obtain an initial remote object reference
2. Client binds to the remote object reference -> a proxy is generated for the target object
3. Client calls the remote method on the proxy
4. Proxy marshals the parameters and passes the request to the ORB
5. Client ORB uses OS functions to send a GIOP request message to the server ORB; the request message is sent as an IIOP message
6. Server ORB receives the request message and passes it to the object adapter
7. Object adapter looks for the matching target object, possibly first activating it, and passes the request to the skeleton
8. Skeleton unmarshals the parameters and calls the object implementation
9. Object implementation executes the remote method and returns the result
10. Skeleton marshals the result and passes it to the ORB
11. Server ORB sends a GIOP response message to the client ORB; the response is sent as an IIOP message
12. Client ORB receives the response message and passes the result to the proxy
13. Proxy unmarshals the result and passes it to the client application
CORBA features
> Static and dynamic invocation
> Interface repository holds all interface specifications
> Synchronous, asynchronous and deferred
synchronous invocations
> Interoperability
> Programming language
> IDL to specify object interfaces
> ORB implementations
> ORBs communicate via GIOP (General Inter-ORB Protocol)
> CORBA services
CORBA services I
> Collection service
> Grouping objects
> Query service
> Querying objects in a declarative manner
> Transaction service
> Transactions on method calls over multiple objects
> Naming
> Naming of objects
> Persistence service
> Storing objects
CORBA services II
> Security service
> Secure channels,…
> Trading service
> Advertisements of object capabilities
> Life cycle
> Moving, creating, deleting objects
> Event service
> Asynchronous event broadcast
CORBA summary
> Interoperability
> CORBA services
> Powerful but not easy-to-use
Outline
> Remote procedure calls
> Middleware
> Distributed object systems
> CORBA
> .Net Remoting
.Net Remoting
> Distributed objects in .Net
> Language independent
> As long as it’s a .Net language
> No IDL
> Highly configurable
> Can be integrated with other systems
Application domains
> Isolated execution space for applications
> One process can host multiple application domains
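[Figure: process 1 hosts application domains 1 and 2, process 2 hosts application domain 3; calls between objects in the same application domain are local invocations, calls across application domains are remote invocations.]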
Remote versus local objects
> Objects in the same application domain
> Local objects
> Immediate method call
> Objects by reference
> All other objects
> Remote objects
> Call via proxy object
> Objects are marshalled
> Also context-bound objects
> Not covered here
Marshalling
> Objects must be serializable
> [Serializable] and [NonSerialized] attribute
> Implement ISerializable
> Marshal-by-value
> Copy of object is transferred
> No link between copy and original
> Marshal-by-reference
> Object reference is transferred
> Can be used to build a proxy of the original
Object activation
> Client activated objects (CAO)
> Client requests activation of remote object
> Through Activator or using new
> Server activated objects (SAO)
> Single call
> Object created per call
> Stateless
> Singleton
> Object created at registration time
> Shared instance for all clients
> Can be stateful
Relevant namespaces and classes
> Remote object
> Namespace System.Runtime.Remoting
> Inherit from System.MarshalByRefObject
> Server side
> Configuration, deployment,…
> System.Runtime.Remoting.RemotingServices
> Client side
> Object activation
> System.Activator
Shop example – server implementation
using System;

public class Shop : MarshalByRefObject {
    public Shop() {
        … (initialize shop)
    }

    public Book[] search(String keyword) {
        // Databasetools: the shop's own (hypothetical) data-access helper, not part of .NET
        Book[] result = Databasetools.search(keyword);
        return result;
    }
}

[Serializable]
public class Book {
    …
}
Shop example – server main
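using System;
using System.Runtime.Remoting;

public class Server {
    public static void Main(string[] args) {
        // Registers the well-known Shop object and the HTTP channel listed in server.xml
        RemotingConfiguration.Configure("server.xml", false);
        // Keep the process (and thus the remote object) alive until Enter is pressed
        Console.ReadLine();
    }
}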
Shop example – server configuration
<configuration>
  <system.runtime.remoting>
    <application>
      <service>
        <wellknown mode="Singleton" type="Shop, ShopAssembly" objectUri="myShop" />
      </service>
      <channels>
        <channel ref="http" port="2000" />
      </channels>
    </application>
  </system.runtime.remoting>
</configuration>
Shop example – client main
using System;

public class ShopClient {
    public static void Main(string[] args) {
        // Creates a proxy for the well-known object; the first remote call happens in search()
        Shop shop = (Shop)Activator.GetObject(typeof(Shop),
            "http://targetHost:2000/myShop");
        Book[] result = shop.search("Remoting");
        …
    }
}
Deployment
> Standalone application
> Object lives as long as creating process lives
> IIS
> Only SOAP over HTTP (essentially a Web service)
> IIS authentication and HTTPS support
> Windows service
> Can be controlled via service console
> System.ServiceProcess.ServiceBase
How does it work?
> Client side proxy is created automatically
> No IDL or IDL compiler needed
> Uses reflection and reflection-emit
> Generic server side proxy
> Stackbuilder dispatches method invocations
> Uses reflection
Messages
> Proxies communicate via messages
> Method invocation, method response
> Constructor invocation, constructor response
> Represented by objects implementing IMessage
> Dictionary with (name, value) pairs, e.g.
> "__Uri"
> "__MethodName"
> "__Args"
Sink chain / A conceptual view
[Figure: Client -> proxy -> (IMessage) -> sinks -> formatter -> sinks -> transport channel -> sinks -> formatter -> sinks -> proxy -> server object]
Channels
> Messages are sent through channels
> TcpChannel, HttpChannel
> Custom channels (e.g. IPX)
> Endpoint-to-endpoint communication between
proxies
> Can implement channel sinks
> Logging
> Interception
> Message transformations (e.g. to CORBA)
Formatters
> Serialize objects
> Used by channels
> Implemented as channel sinks
> Association between channels and formatters is configurable
> Default:
> TcpChannel -> BinaryFormatter
> HttpChannel -> SoapFormatter
> Custom:
> IIOP, RMI,…
TcpChannel
> System.Runtime.Remoting.Channels.Tcp
> Uses TCP sockets
> Permanent connection
> Use in LANs
> Transfer of compact binary wire format
> Default: BinaryFormatter
> .Net native format
> Fast
> Could use other formatters as well
HttpChannel
> System.Runtime.Remoting.Channels.Http
> Uses HTTP 1.1
> Stateless, scales well
> Use in Internet environments
> Transfers SOAP XML format
> Default: SoapFormatter
> Interoperability
> Slower
> Could use other formatters as well
Sink chain / A detailed view
> Client side, from client to channel: transparent proxy, real proxy, envoy sinks*, client context sinks*, dynamic sinks*, context terminator sink, channel sinks*, formatter, channel sinks*
> Transport channel
> Server side, from channel to object: channel sinks*, formatter, channel sinks*, dispatch sink, dynamic sinks*, cross-context sink, object sinks*, lease sink, stackbuilder sink, server object
> Several of these modules can be exchanged or customized
> Sinks marked with * can have multiple instances
Sinks
> Implement IMessageSink
> Drop-off for messages
> Also implemented by Channel
> Chain sink
> Simple linked list via NextSink property
> Message interception, modification, …
> Custom sinks
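A sink chain is essentially a chain of responsibility. The following Java sketch uses hypothetical types (`MessageSink`, `nextSink()`) that only mirror the idea of .NET's message sinks; messages are (name, value) dictionaries with the keys `__MethodName` and `__Args`, plus a `__Return` key chosen for this example:

```java
import java.util.*;

class SinkChainDemo {
    // Hypothetical Java analogue of a message sink: messages are (name, value) dictionaries.
    interface MessageSink {
        Map<String, Object> processMessage(Map<String, Object> msg);
        MessageSink nextSink();
    }

    // Links the chain: each sink knows the next one (a simple linked list).
    abstract static class ForwardingSink implements MessageSink {
        private final MessageSink next;
        ForwardingSink(MessageSink next) { this.next = next; }
        public MessageSink nextSink() { return next; }
    }

    // Interception: records every call and reply, passes messages on unchanged.
    static final class LoggingSink extends ForwardingSink {
        final List<String> log = new ArrayList<>();
        LoggingSink(MessageSink next) { super(next); }

        public Map<String, Object> processMessage(Map<String, Object> msg) {
            log.add("call " + msg.get("__MethodName"));
            Map<String, Object> reply = nextSink().processMessage(msg);
            log.add("reply " + reply.get("__Return"));
            return reply;
        }
    }

    // Modification: rewrites string arguments before forwarding the message.
    static final class UpperCaseArgsSink extends ForwardingSink {
        UpperCaseArgsSink(MessageSink next) { super(next); }

        public Map<String, Object> processMessage(Map<String, Object> msg) {
            Object[] args = ((Object[]) msg.get("__Args")).clone();
            for (int i = 0; i < args.length; i++)
                if (args[i] instanceof String) args[i] = ((String) args[i]).toUpperCase();
            Map<String, Object> changed = new HashMap<>(msg);
            changed.put("__Args", args);
            return nextSink().processMessage(changed);
        }
    }

    // End of the chain: stands in for formatter, channel and server object together.
    static final class EchoSink implements MessageSink {
        public Map<String, Object> processMessage(Map<String, Object> msg) {
            Object[] args = (Object[]) msg.get("__Args");
            return Map.of("__Return", msg.get("__MethodName") + "(" + args[0] + ")");
        }
        public MessageSink nextSink() { return null; }
    }

    public static void main(String[] args) {
        LoggingSink chain = new LoggingSink(new UpperCaseArgsSink(new EchoSink()));
        Map<String, Object> msg = Map.of("__MethodName", "search", "__Args", new Object[] {"remoting"});
        System.out.println(chain.processMessage(msg).get("__Return")); // search(REMOTING)
        System.out.println(chain.log); // [call search, reply search(REMOTING)]
    }
}
```

Adding a custom sink only means inserting another link; neither the proxy nor the server object changes.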
Message path
> The message path is implemented as a sink chain:
> Client -> transparent proxy -> real proxy -> envoy sinks* -> client context sinks* -> dynamic sinks* -> context terminator sink -> channel sinks* -> formatter -> channel sinks* -> transport channel
> Transport channel -> channel sinks* -> formatter -> channel sinks* -> dispatch sink -> dynamic sinks* -> cross-context sink -> object sinks* -> lease sink -> stackbuilder sink -> server object
Proxies
> Transparent proxy
> Looks like remote object
> Builds messages
> Calls Invoke(IMessage) on real proxy
> Real proxy
> Communication layer for transparent proxies
> Could be used to implement load-balancing for example
(first look for remote object with low workload)
Stackbuilder sink
> Located at channel endpoint
> Receives messages
> Builds stack frame from message
> Invokes method on actual object
> Collects result and builds response message
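The proxy pair and the stackbuilder sink can be mirrored in Java: `java.lang.reflect.Proxy` plays the transparent proxy, an `InvocationHandler` plays the real proxy that builds a (name, value) message, and a reflective dispatcher plays the stackbuilder sink. This is an analogy in a different runtime, not .NET's implementation; the sinks, formatter and channel between the two ends are left out:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class ProxyStackBuilderDemo {
    interface Shop { String search(String keyword); }

    static final class ShopImpl implements Shop {
        public String search(String keyword) { return "1 hit for " + keyword; }
    }

    // Server endpoint: rebuilds the call from the message and invokes the actual object.
    static final class StackBuilderSink {
        private final Map<String, Object> objects = new HashMap<>();

        void register(String uri, Object obj) { objects.put(uri, obj); }

        Object process(Map<String, Object> msg) throws ReflectiveOperationException {
            Object target = objects.get(msg.get("__Uri"));
            Object[] args = (Object[]) msg.get("__Args");
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(msg.get("__MethodName")) && m.getParameterCount() == args.length) {
                    return m.invoke(target, args); // the args array plays the "stack frame"
                }
            }
            throw new NoSuchMethodException(String.valueOf(msg.get("__MethodName")));
        }
    }

    // The real proxy turns every call on the transparent proxy into a message.
    static <T> T connect(Class<T> iface, String uri, StackBuilderSink server) {
        InvocationHandler realProxy = (proxy, method, args) -> {
            Map<String, Object> msg = new HashMap<>();
            msg.put("__Uri", uri);
            msg.put("__MethodName", method.getName());
            msg.put("__Args", args == null ? new Object[0] : args);
            // A real system would pass msg through sinks, a formatter and a channel here.
            return server.process(msg);
        };
        // The transparent proxy implements the remote interface and delegates to realProxy.
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, realProxy));
    }

    public static void main(String[] args) {
        StackBuilderSink server = new StackBuilderSink();
        server.register("myShop", new ShopImpl());
        Shop shop = connect(Shop.class, "myShop", server);
        System.out.println(shop.search("Remoting")); // 1 hit for Remoting
    }
}
```

Because the real proxy sees every message, it is also the natural place for tricks such as choosing the least-loaded server before forwarding.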
Object lifetime
> How to garbage-collect distributed objects?
> DCOM: reference counting + pinging
> RMI: reference lists
> CORBA: lifetime service (rarely used)
> .Net remoting: Leasing
> For CAOs
> Time to live (lease) associated with each remote object
> Increases with each invocation
> Lease sponsors can prolong lease
> Configurable
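Leasing can be sketched with explicit time values. The class and method names below are hypothetical and only loosely modeled on .NET Remoting's lease settings (an initial lease time, a renew-on-call time, and sponsors that may extend an expired lease):

```java
import java.util.ArrayList;
import java.util.List;

class LeaseDemo {
    // Hypothetical sponsor: returns extra lifetime, or 0 to let the object go.
    interface Sponsor { long renewal(Lease lease); }

    static final class Lease {
        private long expiresAt;
        private final long renewOnCallTime;
        private final List<Sponsor> sponsors = new ArrayList<>();

        Lease(long now, long initialLeaseTime, long renewOnCallTime) {
            this.expiresAt = now + initialLeaseTime;
            this.renewOnCallTime = renewOnCallTime;
        }

        void register(Sponsor s) { sponsors.add(s); }

        // Every invocation guarantees at least renewOnCallTime of remaining lifetime.
        void onCall(long now) { expiresAt = Math.max(expiresAt, now + renewOnCallTime); }

        // Checked periodically by a lease manager; an expired lease asks its sponsors first.
        boolean isAlive(long now) {
            if (now < expiresAt) return true;
            for (Sponsor s : sponsors) {
                long extra = s.renewal(this);
                if (extra > 0) {
                    expiresAt = now + extra;
                    return true;
                }
            }
            return false; // object can be disconnected and garbage collected
        }
    }

    public static void main(String[] args) {
        Lease lease = new Lease(0, 300, 120);   // time units chosen for illustration
        lease.onCall(250);                      // extends the lease to 370
        System.out.println(lease.isAlive(369)); // true
        System.out.println(lease.isAlive(371)); // false: no sponsor renewed it
    }
}
```

Unlike reference counting with pinging, the server never has to contact a client to decide; an unreachable client simply stops renewing.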
Advanced features
> Asynchronous invocations
> Same as for local invocations
> BeginInvoke, EndInvoke, IAsyncResult
> Oneway (fire-and-forget) invocations
> Methods with OneWay attribute
> Callbacks
> Via delegates
Discussion
> Powerful middleware
> Application domains
> Sink chain can be adapted
> Different kinds of object activation
> Standalone / IIS
> Doesn’t enforce separation of interface and
implementation
> Conceptual drawback
> However, very convenient in practice
> Only hard-wired dispatcher