Systems Architecture for Statistical Applications: Introduction and Overview
Andrew Westlake
Survey & Statistical Computing
Wednesday 25th January 2006
Introduction
• Systems Architecture for Statistical Applications
 Not Features or Usability
• Long-term issues that affect Statistical Systems
 Ease of maintenance and enhancement
 Responsiveness to developments in operating environments
 Portability between computing environments
 Interoperability with other related systems
 Extensions by Users
• Programme
 Papers from developers of statistical systems
 Describing different approaches
 Discussing problems and solutions
25/1/2006
RSS/ASC Systems Architecture: Introduction and Overview
Some Issues
• Statistical software has a small market
 Limited development budgets
 Early design and implementation decisions can be critical
 Re-engineering is a major step
• Statistical Software is different
 Provides functionality for solving (a class of) problems
 Not automation of tasks
 More generalised than traditional application design
• Need to exploit ideas and developments
 Objects, components, standards, services, … *
 Open source, Windows, Linux, Internet
 Data warehouses & OLAP, Data mining, …
• Levels of Abstraction/Generalisation
 Different levels needed at different times in design and discussion
 Confusion often due to discussion at the wrong (or different) levels
Object-Oriented Design
• Alternative way of thinking about software structure
 An abstract model of programming
 Developed in the 1960s and 1970s
• Greater Reliability, Ease of Maintenance
 Objects have behaviour and own data
 Avoidance of ‘side-effects’
• Compiler and Run-time system support
 C++, Java, VB(?) …
• Big influence on design of S
• Academic and Commercial input
 Ideas and concepts from abstract work by academics
 Developed, extended and realised by commercial developers
The Object Paradigm
• Objects are Instances of Classes
 Classes define shared structure (attributes) and behaviour (methods)
 Objects have Identity, Information and State (attribute values)
 Created and destroyed dynamically at run time, can be persistent
• Encapsulation
 Objects receive Messages invoking Behaviour
> Includes changing and returning attribute values
 Can only access the attributes of an object through its public methods
• Inheritance
 New classes can be defined as specialisations of others
 Inherit structure and methods, but can alter and extend
• Polymorphic Methods
 Methods behave differently for different classes, so the response depends on the type of object receiving the message
> E.g. an object knows how to Display itself
 Object sending message does not need to worry (much)
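The four ideas on this slide can be sketched in a few lines of Python. The class names (`Estimate`, `IntervalEstimate`), the `display` method and the numbers are hypothetical, chosen only for illustration. Note also that Python marks private attributes by convention rather than enforcing encapsulation.

```python
class Estimate:
    """A class: defines the shared structure and behaviour of its instances."""

    def __init__(self, name, value):
        # Attributes are private by convention (leading underscore);
        # callers are expected to go through the public methods below.
        self._name = name
        self._value = value

    def value(self):
        """Public method that returns an attribute value."""
        return self._value

    def display(self):
        """Each object knows how to display itself."""
        return f"{self._name} = {self._value:.2f}"


class IntervalEstimate(Estimate):
    """A specialisation: inherits structure and methods, then extends them."""

    def __init__(self, name, value, lower, upper):
        super().__init__(name, value)
        self._lower = lower
        self._upper = upper

    def display(self):
        # Overrides the inherited method, reusing it for the common part.
        return f"{super().display()} ({self._lower:.2f}, {self._upper:.2f})"


def report(objects):
    # The sender only sends the 'display' message; the response depends on
    # the class of the object that receives it (polymorphism).
    return [obj.display() for obj in objects]


lines = report([Estimate("mean", 4.2), IntervalEstimate("rate", 0.5, 0.25, 0.75)])
```

Here `report` never inspects the type of the objects it is given, which is the sense in which the sender "does not need to worry (much)".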
System Modelling Methodologies: UML
• Need recognised for systematic design and development methods
 Management of complexity
 Identification and control of requirements
 Ease of maintenance
 Feedback and validation from Users
• Various conflicting systems proposed
• Task force of Object Management Group: OMG
• Produced the Unified Modelling Language: UML
 Rumbaugh, Jacobson and Booch
 Supports design from User Requirements to Code Production
• Development Methodologies built around UML
 Agile-, Extreme-, Feature-Driven-, Iterative-, Unified-, … Development
UML Features
• Formal specification of Language and Semantics for design of systems (now version 2.0)
• Includes formalised diagram types and elements
 Activity, Class, Component, Deployment, Sequence, State, Use Case, … Diagrams
 Aggregation, Generalisation, Cardinality, Classification, Concurrency, Constraints, Dependency, Interfaces, Synchronicity, Visibility, … elements, attributes, facets
• Various packages support complete development from design to code generation
 Poseidon, Rational (IBM), Together (Borland), Visual Studio, …
 Essentially independent of implementation language*
• Can be used informally for early design stages (e.g. Visio)
• Difficult to learn thoroughly
 Good overview in UML Distilled, Martin Fowler (A-W, 2004)
 Not perfect – some areas under-developed, some omissions
 No alternative is as well established or supported
A UML Class Diagram
Interfaces and Components
• Formal definition of Interfaces is an aspect of Encapsulation
 Straightforward within a single system
 Improves robustness of the system
• Idea extended to distributed components and systems
 Independent components on the same system, e.g. COM objects, ActiveX
 Servers and Clients on same or different systems, e.g. database servers (ODBC), web servers (HTML), distributed data archives (RDF)
 Distributed processing on specialised servers, e.g. DCOM, Web Services, Grid
• Difficult issues for management of communication channels
 Message language, message structure and protocol, service discovery
 All being resolved through industry collaboration building on academic ideas
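A minimal sketch of a formally defined interface within a single system, using Python's `abc` module. The interface `DataSource`, both implementing components and the data values are hypothetical. The point is only that the client function is written against the interface, so components can be exchanged without touching it.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Interface: the only thing a client is allowed to rely on."""

    @abstractmethod
    def fetch(self, variable):
        """Return the observed values for a named variable."""


class InMemorySource(DataSource):
    """One component implementing the interface with local data."""

    def __init__(self, table):
        self._table = dict(table)

    def fetch(self, variable):
        return list(self._table[variable])


class ScaledSource(DataSource):
    """A second, independent component that wraps another source."""

    def __init__(self, inner, factor):
        self._inner = inner
        self._factor = factor

    def fetch(self, variable):
        return [x * self._factor for x in self._inner.fetch(variable)]


def mean_of(source, variable):
    # Written against DataSource only, so either component above
    # (or a remote one offering the same interface) can be swapped in.
    values = source.fetch(variable)
    return sum(values) / len(values)


local = InMemorySource({"flow": [2.0, 4.0, 6.0]})
local_mean = mean_of(local, "flow")
scaled_mean = mean_of(ScaledSource(local, 10), "flow")
```

Extending the same idea to distributed components adds the communication issues listed above (message language, protocol, service discovery), which this single-process sketch does not attempt to cover.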
Distributed Architecture
• Construct system from components that communicate through messages
 May be remote – message security and transport handled by Internet (for example)
• Use the best components for the job, only develop the bits no one else does
 For example, use SQL Server for data store, with access control, Apache to deliver displays to users, R for statistical calculations and charts, …
 Can distribute almost anything: processing power, algorithms, data, knowledge, metadata, …
• Benefits
 Cheaper – you only have to build your bits
 Better – get the best products for the other bits
• Problems
 Overheads in communication – can be avoided with clever design
 Have to agree on message mechanisms – or follow a standard
 Cost of other components – but many are effectively free
• The future of Computing Systems
XML – eXtensible Markup Language
• Markup Language
 Text with Tags (<Field> field contents </Field>)
> Identifies an Element of type Field with content field contents
 Content of an element can be simple or complex
> Numbers, strings, etc., or combinations of other elements
 Nested Tags (elements) => multiple hierarchies
• Generic syntax for languages
 Tags not defined, only the language structure
 XML instance document contains complex structure of information as linear text – ideal for messages and other interchange
• XML is a Standard from W3C (based on SGML)
 Generic tools to read and write XML in programs
 Schema (XSD) for defining rules about Tag names and structure
 Style sheets (XSL/T) for transforming XML to some other text form
> For example, HTML for display, text script to drive a program, a different (equivalent) XML structure for another context
• Can use UML to design the logical structure and specify the semantics
 Can generate XML schema (XSD)
 For example, hyperModel workbench, by David Carlson, www.xmlmodeling.com
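As one illustration of the "generic tools to read and write XML in programs", the sketch below uses Python's standard `xml.etree.ElementTree` module. The instance document, its tag names (`Record`, `Count`, `Address`, `Town`, `Message`) and its values are hypothetical; only `<Field>` echoes the slide.

```python
import xml.etree.ElementTree as ET

# A hypothetical instance document: the tag names are our own choice,
# since XML itself only fixes the syntax.
text = """
<Record>
  <Field>field contents</Field>
  <Count>3</Count>
  <Address>
    <Town>London</Town>
  </Address>
</Record>
"""

root = ET.fromstring(text)

# Simple content: an element holding a string or a number (stored as text).
field = root.find("Field").text
count = int(root.find("Count").text)

# Complex content: an element made up of other elements.
town = root.find("Address/Town").text

# Writing XML: build elements in memory, then serialise to linear text,
# the form that makes XML convenient for messages and interchange.
message = ET.Element("Message")
ET.SubElement(message, "Field").text = field
serialised = ET.tostring(message, encoding="unicode")
```

Nothing in this sketch checks the document against a schema. Validating tag names and structure against an XSD would need a separate tool, which is not shown here.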
XML Fragment – Metadata for a model
    <Parameter Name="FlowWithin" ElementType="Matrix" Terminal="true">
      <Tag TagName="Description">Factors associated with flow within Zones (so Destination is the same as Origin).</Tag>
      <Dimension ClassificationName="OriginZones"/>
    </Parameter>
  </Parameters>
  <Relationships>
    <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
      <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
      <RelInput>
        <ParRef Name="Flow"/>
      </RelInput>
      <RelOutput>
        <VarRef Name="FlowEstimate1"/>
      </RelOutput>
      <RelStochastic>
        <DistPoisson>
          <Rate>
            <ParRef Name="Flow"/>
          </Rate>
        </DistPoisson>
      </RelStochastic>
    </Relationship>
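A sketch of how one Relationship element from this fragment might be turned into an HTML table row. The talk names XSL/T style sheets for this job. The Python standard library has no XSLT processor, so the transformation is hand-written here with `xml.etree.ElementTree` instead. The helper names (`names`, `relationship_row`) and the row layout are assumptions, not the author's style sheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Copy of the Relationship element from the fragment, cut down to a
# single well-formed element so that it parses on its own.
fragment = """
<Relationship RelType="Stochastic" Name="Estimate 1 distribution">
  <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
  <RelInput><ParRef Name="Flow"/></RelInput>
  <RelOutput><VarRef Name="FlowEstimate1"/></RelOutput>
  <RelStochastic>
    <DistPoisson><Rate><ParRef Name="Flow"/></Rate></DistPoisson>
  </RelStochastic>
</Relationship>
"""


def names(parent):
    """Join the Name attribute of every reference directly under an element."""
    if parent is None:
        return ""
    return ", ".join(child.get("Name") for child in parent)


def relationship_row(rel):
    """Render one Relationship element as an HTML table row."""
    rate = rel.find("RelStochastic/DistPoisson/Rate")
    form = f"Distribution: Poisson, Rate = {names(rate)}" if rate is not None else ""
    cells = [
        f"{rel.get('Name')} ({rel.get('RelType')})",
        names(rel.find("RelOutput")),
        names(rel.find("RelInput")),
        form,
    ]
    return "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>"


row = relationship_row(ET.fromstring(fragment))
```

The sketch handles only the Poisson case present in the fragment. Other relationship types would need their own rendering rules.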
XML processed to HTML
Relationships (table columns: Name & Type, Output, Input, Form; each row ends with its description)
• Estimate 1 distribution (Stochastic)
 Output: FlowEstimate1
 Input: Flow
 Form: Distribution: Poisson, Rate = Flow
 Poisson distribution for observations in first estimate set, based on common rates.
• Estimate 2 distribution (Stochastic)
 Output: FlowEstimate2
 Input: Flow
 Form: Distribution: Poisson, Rate = Flow
 Poisson distribution for observations in second estimate set, based on common rates.
• Derived
 Output: Flow
 Input: FlowLog
 Form: Derivation: exp(FlowLog)
 Poisson rates are derived as exponential of (linear) flow function.
• Derived
 Output: FlowLog
 Input: FlowAverage, OriginFlowFactor, DestFlowFactor, FlowWithin
 Form: Derivation: For i ∈ OriginZones, j ∈ DestZones
> If i ≠ j: FlowAverage + OriginFlowFactor[i] + DestFlowFactor[j]
> If i = j: FlowWithin[i]
 Linear function for log of flow rates. Inter-zone flow is modelled as an average flow adjusted by origin and destination factors (with no interaction). Intra-zone flows are modelled separately.
• Constraint: OriginFlowFactor
 Form: Derivation: ∑ OriginZoneFactor = 0
 Origin factors sum to zero, so product of rate components is one.
• Constraint: DestFlowFactor
 Form: Derivation: ∑ DestZoneFactor = 0
 Destination factors sum to zero, so product of rate components is one.
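The derivations in this table can be checked numerically. The sketch below evaluates the log-linear flow function and its exponential for three zones. All parameter values are hypothetical, chosen only so that the two sum-to-zero constraints hold.

```python
import math

# Hypothetical values for three zones; none of these numbers come from the talk.
flow_average = 1.0
origin_flow_factor = [0.5, -0.25, -0.25]  # constraint: sums to zero
dest_flow_factor = [0.5, 0.0, -0.5]       # constraint: sums to zero
flow_within = [2.0, 1.5, 1.0]             # intra-zone terms, one per zone


def flow_log(i, j):
    """Linear function for the log of the flow rate from zone i to zone j."""
    if i == j:
        return flow_within[i]
    return flow_average + origin_flow_factor[i] + dest_flow_factor[j]


zones = range(len(flow_within))

# Derived relationship: Flow = exp(FlowLog), giving the Poisson rates.
flow = [[math.exp(flow_log(i, j)) for j in zones] for i in zones]

# Because the factors sum to zero on the log scale, the corresponding
# multiplicative rate components have product one.
origin_product = math.prod(math.exp(f) for f in origin_flow_factor)
dest_product = math.prod(math.exp(f) for f in dest_flow_factor)
```

With these values the rate from zone 0 to zone 1 is exp(1.0 + 0.5 + 0.0) = exp(1.5), while the diagonal entries come from `flow_within` alone.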
Programming Languages
• Is Fortran dead?
• Not according to Microsoft
 Have rediscovered the idea of language-independent intermediate code (runtime – LIR)
 Ideal for UML modelling approach
• System functionality provided at runtime level, so the same for all languages
 New compilers only have to do language translation
• Requires a common programming model
 Or at least a subset of the runtime model
• Allows closely coupled components to be written in different languages
 May be the answer for legacy systems