Application of Service Oriented Architecture in Statistics New Zealand UNSC Modernisation of the Statistical Process Seminar New York, February 24, 2010 Geoff Bascand & Matjaz.

Application of Service Oriented
Architecture in Statistics New Zealand
UNSC Modernisation of the
Statistical Process Seminar
New York, February 24, 2010
Geoff Bascand & Matjaz Jug
Drivers for IT Architecture
• Agility: transformational changes such as the
shift towards increased use of administrative
data and more automated data processing.
• Cost & Reuse: standardisation and reducing
high costs of development and maintenance of
statistical production systems.
• Integration: the need to integrate outsourced
statistical tools and legacy application assets.
• Configuration: response to frequent changes in
data sources, questionnaires, methodology
and classifications.
SOA Definition
• The Open Group describes Service Oriented
Architecture (SOA) as a:
– “style of IT architecture that delivers agility and
Boundaryless Information Flow™. It is deployed
on an increasing scale in enterprises today.”
• SOA is a message-based, independent
component architecture where:
– communication between components is managed
by a “service (or process) manager” that mediates
communication, coordination and cooperation
among components through messages. The
message carries data and process data.
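The mediation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used at Statistics New Zealand: the names Message, ServiceManager, validate and store, and the "turnover" field, are all assumptions made for the example. The point it shows is that components never call each other directly; they return messages and the manager routes them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A message carrying both data and process data (illustrative names)."""
    target: str                                        # service that should handle it
    data: dict                                         # the statistical payload
    process_data: dict = field(default_factory=dict)   # e.g. run id, last step


Handler = Callable[[Message], List[Message]]


class ServiceManager:
    """Mediates all communication between registered components."""

    def __init__(self) -> None:
        self._services: Dict[str, Handler] = {}
        self.log: List[str] = []

    def register(self, name: str, handler: Handler) -> None:
        self._services[name] = handler

    def send(self, message: Message) -> None:
        # Route messages in FIFO order until no follow-up messages remain.
        pending = [message]
        while pending:
            current = pending.pop(0)
            self.log.append(current.target)
            pending.extend(self._services[current.target](current))


stored: List[dict] = []


def validate(msg: Message) -> List[Message]:
    # Hypothetical component: flags the record, then asks for it to be stored.
    checked = {**msg.data, "valid": msg.data.get("turnover", 0) >= 0}
    return [Message("store", checked, {**msg.process_data, "step": "validate"})]


def store(msg: Message) -> List[Message]:
    # Hypothetical component: keeps the data together with its process data.
    stored.append({"record": msg.data, "process": msg.process_data})
    return []


manager = ServiceManager()
manager.register("validate", validate)
manager.register("store", store)
manager.send(Message("validate", {"unit": "A1", "turnover": 120}, {"run": 1}))
```

Because every exchange passes through the manager, a component can be replaced or reordered by changing its registration rather than the code of its neighbours.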
SOA Benefits
• Increased agility: organisations should be
able to more quickly respond to changes in
business process and external environment.
• Reduction of cost through reuse: new IT
systems should be able to leverage the most
readily available code and services from
across the organization and externally.
• Better possibilities for integration using a
loosely coupled framework and orchestration.
• Configuration rather than programming.
Situation in Statistical Organizations
• Many lessons learnt from early adopters
• Even now there are not a lot of statistical
organisations implementing SOA on a large
scale
• We are “behind” compared with some other
sectors like the Airline Industry
• WHY? Are we really so different?
In Some Areas We are Different!
• Many semantically diverse data structures
• Frequent change in data structure, sources,
questionnaires
• Specific requirements like data confidentiality
• Many stove-piped legacy application assets
• Mainly non-transactional processing
• End-user processing environments
Learning from Data Warehousing
and Metadata-Driven Projects
1. A high degree of organisational change is required,
which is usually a slow process.
2. It is difficult to establish new governance.
3. A new architecture usually requires complete
replacement of the legacy application portfolio.
4. Software development capability is difficult to
upgrade and maintain in-house.
5. A common challenge organisations face is
managing metadata effectively.
6. Lack of standardisation – it appears every new
paradigm requires more of it.
Additional lessons from early SOA
attempts
• Standardisation of services and data
structures is vital
• If the business or service scope is too broad,
the costs of generality and development are high.
• If a service or business request is too specific,
the benefits of reusability are limited.
• Performance degrades with volume.
Architecture in Stats NZ now – Platform
approach and Shared Services (SOA)
[Diagram: platform approach. Blocks shown: Statistical Infrastructure; Collect; Micro Economic, Macroeconomic, Social and Census platforms; Dissemination; IT Infrastructure.]
[Diagram: platforms and shared services. Labels recoverable from the slide:]
• Statistical Infrastructure: Frames and Registers; Classification Management; Metadata Management
• Also shown: Administrative Data; Methodologies
• Collection: Imaging; CATI; CAPI; Future (Web); Respondents & Collection Management
• Processing - Micro Economic Statistics: Platform for Micro economic statistics (BESt); Other systems (mostly legacy)
• Processing - Macro Economic Statistics: Platform for National Accounts (DNA); Other systems (mostly legacy)
• Processing - Social/Household Statistics: Platform for HH statistics (POSS); Other systems (mostly legacy)
• Census Platform
• Dissemination: Content Management (www.stats.govt.nz); Table Builder; Infoshare; Business Toolbox; Future; Data Dissemination Management
• IT Infrastructure: Hardware; Server Software (OS, email, SQL DB, OLAP, CRM, CMS); Applications & Tools; Desktop Software (MS Office, Lotus Notes)
SOA in Data Collection
• Description: data collected through CATI, CAPI and
Imaging are loaded (pushed) to production databases
using messaging infrastructure. The grain is the
individual questionnaire response. A load service was
built to deliver data to Legolution and the POSS Input
Data Environment (now Social Input Store).
• Challenges: we dropped this approach in the
Process phase because of difficulties in moving large
amounts of data as messages. The requirement to pass
process metadata was overlooked, so an additional
metadata transfer had to be used.
• Benefits: provides the infrastructure required for
transactional data collection, where every response can
be pushed to production systems. This approach is
anticipated as a result of the Standard Business
Reporting project.
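A push-style load at the grain of one questionnaire response might look like the sketch below. It is an illustration under stated assumptions, not the load service that was built: the class names, the routing of channels to stores and the "instrument" metadata key are all hypothetical. It also shows process metadata travelling inside the message, which is the requirement the slide says was originally overlooked.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ResponseMessage:
    """One questionnaire response plus the process metadata sent with it."""
    response_id: str
    channel: str             # "CATI", "CAPI" or "Imaging"
    answers: dict
    process_metadata: dict   # e.g. instrument version (illustrative)


class LoadService:
    """Hypothetical load service: drains a message queue into target stores."""

    # Which channel feeds which store is an assumption made for this sketch.
    ROUTES = {
        "CATI": "social_input_store",
        "CAPI": "social_input_store",
        "Imaging": "legolution",
    }

    def __init__(self) -> None:
        self.inbox = queue.Queue()
        self.stores: Dict[str, List[ResponseMessage]] = {
            "social_input_store": [],
            "legolution": [],
        }

    def push(self, message: ResponseMessage) -> None:
        # Collection systems push each response as soon as it is captured.
        self.inbox.put(message)

    def deliver_all(self) -> int:
        # Deliver every queued response to its store; return how many moved.
        delivered = 0
        while not self.inbox.empty():
            message = self.inbox.get()
            self.stores[self.ROUTES[message.channel]].append(message)
            delivered += 1
        return delivered


service = LoadService()
service.push(ResponseMessage("r-001", "CATI", {"q1": "yes"}, {"instrument": "v2"}))
service.push(ResponseMessage("r-002", "Imaging", {"q1": "no"}, {"instrument": "v2"}))
count = service.deliver_all()
```

One message per response keeps the interface simple for transactional collection, but it is also why moving large batches this way becomes expensive.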
SOA in Data Processing
• Description: data is now transferred using ETL
packages (pull). A service is used to initiate the ETL
packages. The configuration store is a central place
where the process is configured (metadata) and is
currently used by two systems: the BESt platform and
the SOFIE processing system.
• Challenges: reuse of ETL packages is limited to a
single platform (BESt), but some components
(the configuration store) can be used by other systems
as well, as part of the statistical infrastructure.
• Benefits: a highly configurable process workflow
that enables WHAT-IF scenarios.
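The idea of a service that initiates ETL packages from a central configuration store can be sketched as follows. Everything here is hypothetical: the package functions, the CONFIG_STORE layout and the parameter names are assumptions chosen to show how a what-if scenario becomes a second configuration entry rather than new code.

```python
from typing import Callable, Dict, List

Rows = List[dict]


def load_package(rows: Rows, params: dict) -> Rows:
    # Hypothetical ETL package: pull only the rows for the requested period.
    return [r for r in rows if r["period"] == params["period"]]


def scale_package(rows: Rows, params: dict) -> Rows:
    # Hypothetical ETL package: apply an adjustment factor to each value.
    return [{**r, "value": r["value"] * params["factor"]} for r in rows]


PACKAGES: Dict[str, Callable[[Rows, dict], Rows]] = {
    "load": load_package,
    "scale": scale_package,
}

# Central configuration store: each named entry is process metadata.
CONFIG_STORE: Dict[str, List[dict]] = {
    "baseline": [
        {"package": "load", "params": {"period": "2009Q4"}},
        {"package": "scale", "params": {"factor": 1.0}},
    ],
    "what_if": [
        {"package": "load", "params": {"period": "2009Q4"}},
        {"package": "scale", "params": {"factor": 1.25}},
    ],
}


def initiate_etl(config_name: str, source_rows: Rows) -> Rows:
    """Service entry point: look up a configuration and run its packages in order."""
    rows = source_rows
    for step in CONFIG_STORE[config_name]:
        rows = PACKAGES[step["package"]](rows, step["params"])
    return rows


source = [
    {"period": "2009Q4", "value": 100.0},
    {"period": "2009Q3", "value": 90.0},
]
baseline = initiate_etl("baseline", source)
scenario = initiate_etl("what_if", source)
```

Other systems could share the same store by adding their own entries, which matches the reuse of the configuration store (rather than of the packages) noted above.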
[Screenshot: a user-created pipeline of SAS
operations. Note the small scale of each step.]
Users can change the steps and their order in
the pipeline. The system records changes and
provides the ability to rerun using an older
version. Multiple configuration sets can be
made and tested at one time.
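Recording each change to the pipeline and rerunning an older version can be sketched like this. It is a simplified illustration, not the system shown in the screenshot: the step names stand in for small SAS operations, and the VersionedPipeline class is an assumption made for the example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical small steps standing in for SAS operations.
STEPS: Dict[str, Callable[[float], float]] = {
    "add_ten": lambda x: x + 10,
    "double": lambda x: x * 2,
}


class VersionedPipeline:
    """Keeps every saved ordering of steps so that any version can be rerun."""

    def __init__(self, steps: List[str]) -> None:
        self._versions: List[List[str]] = [list(steps)]   # version 1

    def change(self, steps: List[str]) -> int:
        """Record a new selection or ordering of steps; return its version number."""
        self._versions.append(list(steps))
        return len(self._versions)

    def run(self, value: float, version: Optional[int] = None) -> float:
        """Run the latest version, or an older one when a version number is given."""
        steps = self._versions[-1 if version is None else version - 1]
        for name in steps:
            value = STEPS[name](value)
        return value


pipeline = VersionedPipeline(["add_ten", "double"])
v2 = pipeline.change(["double", "add_ten"])   # user reorders the steps
latest = pipeline.run(5)                      # runs version 2
rerun_v1 = pipeline.run(5, version=1)         # reruns the original ordering
```

Keeping old versions immutable is what makes it safe to test several configuration sets side by side.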
v1 18/02/2010
[Screenshot: settings for a BANFF SAS macro
(Historic Imputation for Bad Debt variable)]
SOA in Data Dissemination
• Description: the dissemination tool Business Toolbox
uses an SDMX query service to get aggregated data
from the dissemination data warehouse OECD.stat and
present it in a customised, user-friendly way.
• Challenges: integration of the data warehouse with
output production (legacy) systems.
• Benefits: presentation of information does not
depend on the physical structure of the data
warehouse, and new SDMX-based web components
and new data can be added easily.
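To show why a query service decouples presentation from physical storage, the sketch below builds a data query from dimension names only. It assumes the SDMX 2.1 RESTful conventions (a series key with "." between dimensions, "+" between alternative values, an empty position for "all values", and startPeriod/endPeriod parameters), which may differ from the query service actually used by Business Toolbox. The base URL, dataflow id and dimension names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence
from urllib.parse import urlencode


def sdmx_key(dimension_order: Sequence[str], selection: Dict[str, List[str]]) -> str:
    """Build a series key from selected codes, leaving unselected dimensions empty."""
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)


def sdmx_data_url(
    base_url: str,
    dataflow: str,
    key: str,
    start_period: Optional[str] = None,
    end_period: Optional[str] = None,
) -> str:
    """Compose a data query URL; only the parameters that were given are added."""
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    url = f"{base_url.rstrip('/')}/data/{dataflow}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical dataflow with three dimensions.
key = sdmx_key(
    ["REGION", "INDUSTRY", "MEASURE"],
    {"REGION": ["NZ"], "MEASURE": ["SALES", "PROFIT"]},
)
url = sdmx_data_url("https://example.org/sdmx/", "BUSINESS_STATS", key, start_period="2008")
```

The client names dimensions and codes, never tables or columns, so the warehouse can be restructured without changing the presentation layer.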
SOA in Statistical Infrastructure
• Description: coding is the first example of
statistical infrastructure to be offered through
a service interface (internally and
externally). The CCS coder will offer an automated
coding service based on classification
metadata in CARS.
• Challenges: metadata management and
standardisation.
• Benefits: statistical infrastructure (metadata
management systems, registers) can provide
services to internal and external platforms
and individual systems.
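An automated coding service driven by classification metadata can be sketched as a lookup over normalised descriptions. This is a deliberately simple illustration and not how the CCS coder works: the codes, descriptions and function names are hypothetical, and real coders typically use much richer matching.

```python
import re
from typing import Dict, List, Optional

# Hypothetical classification metadata: code -> known descriptions.
CLASSIFICATION: Dict[str, List[str]] = {
    "2211": ["general practitioner", "family doctor"],
    "2341": ["primary school teacher"],
}


def normalise(text: str) -> str:
    # Lower-case, replace anything that is not a letter with a space,
    # then collapse runs of whitespace.
    return " ".join(re.sub(r"[^a-z]", " ", text.lower()).split())


def build_index(classification: Dict[str, List[str]]) -> Dict[str, str]:
    # Invert the metadata so each normalised description points at its code.
    return {
        normalise(description): code
        for code, descriptions in classification.items()
        for description in descriptions
    }


def code_response(text: str, index: Dict[str, str]) -> Optional[str]:
    """Return the matching code, or None when manual coding is needed."""
    return index.get(normalise(text))


index = build_index(CLASSIFICATION)
doctor = code_response("  Family Doctor! ", index)
teacher = code_response("Primary-school teacher", index)
unknown = code_response("astronaut", index)
```

Because the index is built from the classification metadata, updating the classification changes the service's behaviour without any change to the calling systems.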
How to Start? Areas Where SOA
Can Deliver Significant Value
• Metadata services: a good candidate for
reuse in many stovepipe and corporate
applications.
• Statistical tools/components: making them
more interoperable through a service interface
would significantly improve the possibilities for
integrating them into different IT environments
and therefore increase their shared usage
and collaboration.
Summary
• Iterative development (low-hanging fruit first)
and proofs of concept
• No emphasis on any particular approach:
SOA, DW and metadata-driven architecture
are used together in a way which maximizes
benefits and minimizes risk
• Strong focus on use of standards (SDMX)
• Common IT Infrastructure is enabling
additional consolidation (MS SQL Server &
Analysis Services, SAS Server, Blaise, .NET)
Annex: Systems Architecture and
SOA Use – Detailed Version
• The following slide is a detailed picture of our
systems architecture
– Box 1 highlights SOA in the collections area
– Box 2 highlights SOA in the processing area
– Box 3 highlights SOA in the dissemination area
– Box 4 highlights SOA in statistical infrastructure
Architecture and SOA Use – Detailed Version
[Diagram: detailed systems architecture. Labels recoverable from the slide:]
• Areas: Statistical Infrastructure; Administrative data; Collection (Contact platform); Micro-economic platform (BESt); Social-household platform (POSS); Other statistical production systems (including legacy stovepipes); Dissemination; IT Infrastructure
• Components marked (SOA): Load Service; Initiate ETL package; SDMX Data Query; CCS Coder
• Components marked (ETL): Load; Write back; Run E&I Scenario; Translate & Configure
• Stores: Main Store; Process Store; Configuration Store; Load area; Social Input Store
• Other systems and tools: BF; LBF; LEED; Tax data; CARS; Web Coder; SAS & Banff; Sprocet; Legolution; Sofie process; Imaging; CAPI; CATI; Collection Management; Respondent management; Content Management; OECD.stat; Business Toolbox; Infoshare; Table Builder; New Table Builder; PC-Axis; Beyond 20/20; Statistical Outputs; External System; Internal System; Other systems; Future components