Systems Architecture for Statistical
Applications:
Introduction and Overview
Andrew Westlake
Survey & Statistical Computing
Wednesday 25th January 2006
Introduction
• Systems Architecture for Statistical Applications
Not Features or Usability
• Long-term issues that affect Statistical Systems
Ease of maintenance and enhancement
Responsiveness to developments in operating environments
Portability between computing environments
Interoperability with other related systems
Extensions by Users
• Programme
Papers from developers of statistical systems
Describing different approaches
Discussing problems and solutions
25/1/2006
RSS/ASC Systems Architecture: Introduction and Overview
Some Issues
• Statistical software has a small market
Limited development budgets
Early design and implementation decisions can be critical
Re-engineering is a major step
• Statistical Software is different
Provides functionality for solving (a class of) problems
Not automation of tasks
More generalised than traditional application design
• Need to exploit ideas and developments
Objects, components, standards, services, … *
Open source, Windows, Linux, Internet
Data warehouses & OLAP, Data mining, …
• Levels of Abstraction/Generalisation
Different levels needed at different times in design and discussion
Confusion often due to discussion at the wrong (or different) levels
Object-Oriented Design
• Alternative way of thinking about software structure
An abstract model of programming
Developed in the 1960s and 1970s
• Greater Reliability, Ease of Maintenance
Objects have behaviour and own data
Avoidance of ‘side-effects’
• Compiler and Run-time system support
C++, Java, VB(?) …
• Big influence on design of S
• Academic and Commercial input
Ideas and concepts from abstract work by academics
Developed, extended and realised by commercial developers
The Object Paradigm
• Objects are Instances of Classes
Classes define shared structure (attributes) and behaviour (methods)
Objects have Identity, Information and State (attribute values)
Created and destroyed dynamically at run time, can be persistent
• Encapsulation
Objects receive Messages invoking Behaviour
> Includes changing and returning attribute values
Can only access the attributes of an object through its public methods
• Inheritance
New classes can be defined as specialisations of others
Inherit structure and methods, but can alter and extend
• Polymorphic Methods
Methods behave differently for different classes, so the response depends on the type of object receiving the message
> E.g. an object knows how to Display itself
Object sending message does not need to worry (much)
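The four ideas on this slide can be sketched in a few lines of Python. This is an illustration only: the class names (`Variable`, `NumericVariable`, `CategoricalVariable`) and the `display` method are invented for the example, and Python enforces encapsulation by convention (the leading underscore) rather than through the compiler.

```python
class Variable:
    """A class defines shared structure (attributes) and behaviour (methods)."""

    def __init__(self, name, values):
        self._name = name            # leading underscore: private by convention
        self._values = list(values)  # each object owns its own copy of the data

    def add(self, value):
        """State is changed only through a public method (encapsulation)."""
        self._values.append(value)

    def display(self):
        """Default behaviour, inherited unless a subclass overrides it."""
        return f"{self._name}: {len(self._values)} values"


class NumericVariable(Variable):
    """Specialisation: inherits structure and add(), alters display()."""

    def display(self):
        mean = sum(self._values) / len(self._values)
        return f"{self._name}: n={len(self._values)}, mean={mean:.2f}"


class CategoricalVariable(Variable):
    """Another specialisation with its own display()."""

    def display(self):
        counts = {}
        for value in self._values:
            counts[value] = counts.get(value, 0) + 1
        summary = ", ".join(f"{key}={counts[key]}" for key in sorted(counts))
        return f"{self._name}: {summary}"


def report(variables):
    """The sender just sends 'display'; each object decides how to respond."""
    return [variable.display() for variable in variables]


age = NumericVariable("age", [30, 40])
age.add(50)
region = CategoricalVariable("region", ["N", "S", "N"])
lines = report([age, region])
```

Here `report` never checks which class it has been given, which is the sense in which the sending object "does not need to worry".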
System Modelling Methodologies: UML
• Need recognised for systematic design and development methods
Management of complexity
Identification and control of requirements
Ease of maintenance
Feedback and validation from Users
• Various conflicting systems proposed
• Task force of Object Management Group: OMG
• Produced the Unified Modelling Language: UML
Rumbaugh, Jacobson and Booch
Supports design from User Requirements to Code Production
• Development Methodologies built around UML
Agile-, Extreme-, Feature-Driven-, Iterative-, Unified-, … Development
UML Features
• Formal specification of Language and Semantics for design of systems (now version 2.0)
• Includes formalised diagram types and elements
Activity, Class, Component, Deployment, Sequence, State, Use Case, … Diagrams
Aggregation, Generalisation, Cardinality, Classification, Concurrency, Constraints, Dependency, Interfaces, Synchronicity, Visibility, … elements, attributes, facets
• Various packages support complete development from design to code generation
Poseidon, Rational (IBM), Together (Borland), Visual Studio, …
Essentially independent of implementation language*
• Can be used informally for early design stages (e.g. Visio)
• Difficult to learn thoroughly
Good overview in UML Distilled, Martin Fowler (Addison-Wesley, 2004)
Not perfect – some areas under-developed, some omissions
No alternative is as well established or supported
A UML Class Diagram
Interfaces and Components
• Formal definition of Interfaces is an aspect of Encapsulation
Straightforward within a single system
Improves robustness of the system
• Idea extended to distributed components and systems
Independent components on the same system, e.g. COM objects, ActiveX
Servers and Clients on same or different systems, e.g. database servers (ODBC), web servers (HTML), distributed data archives (RDF)
Distributed processing on specialised servers, e.g. DCOM, Web Services, Grid
• Difficult issues for management of communication channels
Message language, message structure and protocol, service discovery
All being resolved through industry collaboration building on academic ideas
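Within a single system, a formal interface might look like the Python sketch below. The names (`DataSource`, `InMemorySource`, `CsvTextSource`, `mean`) are hypothetical and not taken from any of the systems mentioned here. The point is only that client code depends on the declared interface, so one component can be swapped for another.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Formal interface: the only thing client code may rely on."""

    @abstractmethod
    def column(self, name):
        """Return the values of one named column as a list of floats."""


class InMemorySource(DataSource):
    """Component backed by a dictionary of columns."""

    def __init__(self, table):
        self._table = table

    def column(self, name):
        return [float(x) for x in self._table[name]]


class CsvTextSource(DataSource):
    """Component backed by comma-separated text with a header row."""

    def __init__(self, text):
        header, *rows = [line.split(",") for line in text.strip().splitlines()]
        self._header = header
        self._rows = rows

    def column(self, name):
        index = self._header.index(name)
        return [float(row[index]) for row in self._rows]


def mean(source: DataSource, name: str) -> float:
    """Client code: works with any component that honours the interface."""
    values = source.column(name)
    return sum(values) / len(values)


a = InMemorySource({"income": [10, 20, 30]})
b = CsvTextSource("id,income\n1,10\n2,20\n3,30\n")
```

Both `mean(a, "income")` and `mean(b, "income")` go through the same `column` method, and `DataSource` itself cannot be instantiated, so an incomplete component fails early rather than at the point of use.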
Distributed Architecture
• Construct system from components that communicate through messages
May be remote – message security and transport handled by Internet (for example)
• Use the best components for the job, only develop the bits no one else does
For example, use SQL Server for data store, with access control, Apache to deliver displays to users, R for statistical calculations and charts, …
Can distribute almost anything: processing power, algorithms, data, knowledge, metadata, …
• Benefits
Cheaper – you only have to build your bits
Better – get the best products for the other bits
• Problems
Overheads in communication – can be avoided with clever design
Have to agree on message mechanisms – or follow a standard
Cost of other components – but many are effectively free
• The future of Computing Systems
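A minimal sketch of components that communicate only through messages, assuming JSON text as the agreed message format. The message fields (`operation`, `values`, `status`, `result`, `reason`) and both function names are invented for the illustration. In a real deployment the `send` step would be a network call; here it is an ordinary function call, so the example runs on its own.

```python
import json
import statistics


def stats_service(request_text):
    """Hypothetical statistics component: JSON request in, JSON reply out."""
    request = json.loads(request_text)
    operations = {"mean": statistics.mean, "median": statistics.median}
    operation = operations.get(request.get("operation"))
    if operation is None:
        reply = {"status": "error", "reason": "unknown operation"}
    else:
        reply = {"status": "ok", "result": operation(request["values"])}
    return json.dumps(reply)


def client_mean(values, send=stats_service):
    """Client component: knows the message format, not the service internals."""
    request_text = json.dumps({"operation": "mean", "values": values})
    reply = json.loads(send(request_text))
    if reply["status"] != "ok":
        raise RuntimeError(reply["reason"])
    return reply["result"]


result = client_mean([1.0, 2.0, 6.0])
```

Because the client only ever sees text going out and text coming back, the service could be replaced by a remote one (or by a different product) without changing `client_mean`, provided the message format is kept.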
XML – eXtensible Markup Language
• Markup Language
Text with Tags (<Field> field contents </Field>)
> Identifies an Element of type Field with content field contents
Content of an element can be simple or complex
> Numbers, strings, etc., or combinations of other elements
Nested Tags (elements) => multiple hierarchies
• Generic syntax for languages
Tags not defined, only the language structure
XML instance document contains complex structure of information as linear text – ideal for messages and other interchange
• XML is a Standard from W3C (based on SGML)
Generic tools to read and write XML in programs
Schema (XSD) for defining rules about Tag names and structure
Style sheets (XSL/T) for transforming XML to some other text form
> For example, HTML for display, text script to drive a program, a different (equivalent) XML structure for another context
• Can use UML to design the logical structure and specify the semantics
Can generate XML schema (XSD)
For example, hyperModel workbench, by David Carlson, www.xmlmodeling.com
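As an illustration of the "generic tools to read and write XML in programs", the sketch below uses Python's standard `xml.etree.ElementTree` module. The instance document and its tag names (`Survey`, `Respondent`, `Trips`, `Trip`) are made up for the example. Only `Field` echoes the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: the tag names are invented for illustration.
document = """
<Survey Name="Travel">
  <Field>field contents</Field>
  <Respondent Id="1">
    <Age>34</Age>
    <Trips>
      <Trip Mode="bus"/>
      <Trip Mode="rail"/>
    </Trips>
  </Respondent>
</Survey>
"""

root = ET.fromstring(document)

# Simple content: an element whose content is just text.
simple = root.find("Field").text

# Complex content: elements nested inside other elements.
age = int(root.find("Respondent/Age").text)
modes = [trip.get("Mode") for trip in root.iter("Trip")]

# Writing: add an element, then serialise the tree back to linear text.
ET.SubElement(root.find("Respondent/Trips"), "Trip", Mode="car")
text = ET.tostring(root, encoding="unicode")
```

The parser knows nothing about what `Trip` or `Age` mean. It only relies on the generic XML syntax, which is why the same tool works for any XML-based language.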
XML Fragment – Metadata for a model
  <Parameter Name="FlowWithin" ElementType="Matrix" Terminal="true">
    <Tag TagName="Description">Factors associated with flow within Zones (so Destination is the same as Origin).</Tag>
    <Dimension ClassificationName="OriginZones"/>
  </Parameter>
</Parameters>
<Relationships>
  <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
    <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
    <RelInput>
      <ParRef Name="Flow"/>
    </RelInput>
    <RelOutput>
      <VarRef Name="FlowEstimate1"/>
    </RelOutput>
    <RelStochastic>
      <DistPoisson>
        <Rate>
          <ParRef Name="Flow"/>
        </Rate>
      </DistPoisson>
    </RelStochastic>
  </Relationship>
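A fragment like this can be turned into an HTML table by a short program. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for an XSL style sheet, so it is not the transformation used for these slides. It wraps the one complete relationship from the fragment in a closed `<Relationships>` element so that the input is well-formed, and the helper names (`names`, `form`, `to_html`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# The complete Relationship from the fragment, wrapped so the text is well-formed.
source = """
<Relationships>
  <Relationship RelType="Stochastic" Name="Estimate 1 distribution">
    <Tag TagName="Description">Poisson distribution for observations in first estimate set, based on common rates.</Tag>
    <RelInput><ParRef Name="Flow"/></RelInput>
    <RelOutput><VarRef Name="FlowEstimate1"/></RelOutput>
    <RelStochastic>
      <DistPoisson><Rate><ParRef Name="Flow"/></Rate></DistPoisson>
    </RelStochastic>
  </Relationship>
</Relationships>
"""


def names(parent):
    """Comma-separated Name attributes of the references inside an element."""
    if parent is None:
        return ""
    return ", ".join(child.get("Name") for child in parent)


def form(relationship):
    """Describe the form; only the Poisson case from the fragment is handled."""
    rate = relationship.find("RelStochastic/DistPoisson/Rate")
    if rate is not None:
        return "Distribution: Poisson, Rate = " + names(rate)
    return ""


def to_html(xml_text):
    """Render each Relationship as one row of an HTML table."""
    rows = ["<tr><th>Name &amp; Type</th><th>Output</th><th>Input</th><th>Form</th></tr>"]
    for rel in ET.fromstring(xml_text).iter("Relationship"):
        cells = [
            f"{rel.get('Name', '')} ({rel.get('RelType', '')})",
            names(rel.find("RelOutput")),
            names(rel.find("RelInput")),
            form(rel),
        ]
        rows.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


html_table = to_html(source)
```

The same metadata could equally drive a different output, such as a script for an estimation program, which is the attraction of keeping the model description in XML.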
XML processed to HTML
Relationships (Name & Type; Output; Input; Form; description):
• Estimate 1 distribution (Stochastic)
Output: FlowEstimate1
Input: Flow
Form: Distribution: Poisson, Rate = Flow
Poisson distribution for observations in first estimate set, based on common rates.
• Estimate 2 distribution (Stochastic)
Output: FlowEstimate2
Input: Flow
Form: Distribution: Poisson, Rate = Flow
Poisson distribution for observations in second estimate set, based on common rates.
• Derived
Output: Flow
Input: FlowLog
Form: Derivation: exp(FlowLog)
Poisson rates are derived as exponential of (linear) flow function.
• Derived
Output: FlowLog
Input: FlowAverage, OriginFlowFactor, DestFlowFactor, FlowWithin
Form: Derivation: For : i ∈ OriginZones , j ∈ DestZones
> If : i ≠ j | FlowAverage + OriginFlowFactor[i] + DestFlowFactor[j]
> If : i = j | FlowWithin[i]
Linear function for log of flow rates. Inter-zone flow is modelled as an average flow adjusted by origin and destination factors (with no interaction). Intra-zone flows are modelled separately.
• Constraint
OriginFlowFactor
Form: Derivation: ∑ OriginZoneFactor = 0
Origin factors sum to zero, so product of rate components is one.
• Constraint
DestFlowFactor
Form: Derivation: ∑ DestZoneFactor = 0
Destination factors sum to zero, so product of rate components is one.
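The derivations in this table can be worked through numerically. The sketch below is an illustration under two stated assumptions: the origin and destination zones share one index set (so that i = j is meaningful), and all the numbers are invented. The function names `flow_log` and `flow_rates` are hypothetical.

```python
import math


def flow_log(i, j, flow_average, origin_factor, dest_factor, flow_within):
    """Log of the flow rate from zone i to zone j, following the table."""
    if i == j:
        return flow_within[i]  # intra-zone flows are modelled separately
    return flow_average + origin_factor[i] + dest_factor[j]


def flow_rates(flow_average, origin_factor, dest_factor, flow_within):
    """Poisson rates: Flow = exp(FlowLog) for every origin/destination pair."""
    zones = range(len(origin_factor))
    return [
        [
            math.exp(flow_log(i, j, flow_average, origin_factor,
                              dest_factor, flow_within))
            for j in zones
        ]
        for i in zones
    ]


# Invented values for two zones; each set of factors sums to zero.
flow_average = 3.0
origin_factor = [0.5, -0.5]
dest_factor = [-0.25, 0.25]
flow_within = [1.0, 2.0]

rates = flow_rates(flow_average, origin_factor, dest_factor, flow_within)

# Factors that sum to zero on the log scale multiply to one on the rate scale.
origin_product = math.prod(math.exp(f) for f in origin_factor)
```

With these values the rate from zone 0 to zone 1 is exp(3.0 + 0.5 + 0.25), while the diagonal entries come directly from `flow_within`.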
Programming Languages
• Is Fortran dead?
• Not according to Microsoft
Have rediscovered the idea of language-independent intermediate code (runtime – LIR)
Ideal for UML modelling approach
• System functionality provided at runtime level, so the same for all languages
New compilers only have to do language translation
• Requires a common programming model
Or at least a subset of the runtime model
• Allows closely coupled components to be written in different languages
May be the answer for legacy systems