Statistics New Zealand’s Case Study: “Creating a New Business Model for a National Statistical Office of the 21st Century” (Craig Mitchell, Gary Dunnet, Matjaz Jug)

Statistics New Zealand’s Case Study
“Creating a New Business Model for a National Statistical Office of the 21st Century”
Craig Mitchell, Gary Dunnet, Matjaz Jug
Overview
• Introduction: organization, programme, strategy
• The Statistical Metadata Systems and the Statistical Cycle: description of the metainformation systems, overview of the process model, description of the different metadata groups
• Statistical Metadata in each phase of the Statistical Cycle: metadata produced & used
• Systems and Design issues: IT architecture, tools, standards
• Organizational and cultural issues: user groups
• Lessons learned
[Organisation chart, last updated 20/06/07: Statistics New Zealand’s structure under Government Statistician Geoff Bascand, with Deputy Government Statisticians for Macro-Economic, Environment, Regional & Geography Statistics; Industry & Labour Statistics; and Social & Population Statistics, and General Managers for Business & Dissemination Services, Statistical & Methodological Services, Statistical Education & Research, Strategy & Communication, Corporate Services, the Maori Statistics Unit, Census 2011, the Auckland and Christchurch offices, and the Chief Information Officer. The chart also shows the Business Transformation Strategy team (Gary Dunnet, Matjaz Jug).]
Business Model Transformation Strategy
1. A number of standard, generic end-to-end processes for the collection, analysis and dissemination of statistical data and information
– Includes statistical methods
– Covers the business process life-cycle
– Enables statisticians to focus on data quality and implement best-practice methods, with greater coordination and effective resource utilisation
2. A disciplined approach to data and metadata management, using a standard information lifecycle
3. An agreed enterprise-wide technical architecture
BmTS & Metadata
The Business Model Transformation Strategy (BmTS) is designing a metadata management strategy that ensures metadata:
– fits into a metadata framework that can adequately describe all of Statistics New Zealand's data, and under the Official Statistics Strategy (OSS) the data of other agencies
– documents all the stages of the statistical life cycle, from conception to archiving and destruction
– is centrally accessible
– is automatically populated during the business process, wherever possible
– is used to drive the business process
– is easily accessible by all potential users
– is populated and maintained by data creators
– is managed centrally
A - Existing Metadata Issues
• metadata is not kept up to date
• metadata maintenance is considered a low priority
• metadata is not held in a consistent way
• relevant information is unavailable
• there is confusion about what metadata needs to be stored
• the existing metadata infrastructure is under-utilised
• there is a failure to meet the metadata needs of advanced data users
• it is difficult to find information unless you have some expertise or know it exists
• there is inconsistent use of classifications/terminology
• in some instances there is little information about the data: where it came from, the processes it has been through, or even the question to which it relates
B - Target Metadata Principles
• metadata is centrally accessible
• metadata structure is strongly linked to data
• metadata is shared between data sets
• content structure conforms to standards
• metadata is managed end-to-end across the data life cycle
• there is a registration process (workflow) associated with each metadata element
• metadata is captured at source, automatically
• the cost to producers is justified by the benefit to users
• metadata is considered active
• metadata is managed at as high a level as possible
• metadata is readily available and useable in the context of the client's information needs (internal or external)
• the use of some types of metadata (e.g. classifications) is tracked
How to get from A to B?
1. Identified the key (10) components of our information model.
2. Service Oriented Architecture.
3. Developed a Generic Business Process Model.
4. Development approach moved from ‘stove-pipes’ to ‘components’ and ‘core’ teams.
5. Governance: Architectural Reviews & Staged Funding Model.
6. Re-use of components.
10 Components within BmTS
[Component diagram: multi-modal collection (e-form, CAI, imaging, administrative data) feeds raw and clean data into the Input Data Store; ‘UR’, aggregate and summary data sit in the Output Data Store, which serves the output channels (web, RADL, INFOS, CURFs) and the Official Statistics System & Data Archive. The ten components are: 1. Input Data Store; 2. Output Data Store; 3. Metadata Store (with the Statistical Process Knowledge Base); 4. Analytical Environment; 5. Information Portal; 6. Transformations; 7. Respondent Management; 8. Customer Management; 9. Reference Data Stores; 10. Dashboard / Workflow.]
Statistics New Zealand Current Information Framework
[Diagram: the generic business process (Need, Design/Build, Collect, Process, Analyse, Disseminate) runs across the top. Beneath it sits a range of information stores organised by subject area (silos), e.g. QMS, Ag, HES; a Time Series Store (& INFOS); an ICS Store; and a Web Store. Shared layers: Metadata Store (statistical, e.g. SIM), Reference Data Store (e.g. BF, CARS), Software Register, Document Register, and Management Information (HR & Finance data stores).]
Statistics New Zealand Future Information Framework
[Diagram: the same generic business process (Need, Design/Build, Collect, Process, Analyse, Disseminate) now runs over a single Input Data Store holding raw, clean and summary data, with an Output Data Store (a confidentialised copy of the IDS, physically separated) feeding time series, ICS and the web. Shared layers: Metadata Store (statistical/process/knowledge), Reference Data Store, Software Register, Document Register, and Management Information (HR & Finance data stores).]
CMF – gBPM Mapping

CMF Lifecycle Model → Statistics NZ gBPM (sub-process level)
1 - Survey planning and design → Need (sub-processes 1.1-1.5) + Develop & Design (sub-processes 2.1-2.6)
2 - Survey preparation → Build (sub-processes 3.1-3.7) + Collect (sub-process 4.1)
3 - Data collection → Collect (sub-processes 4.2-4.4)
4 - Input processing → Collect (sub-process 4.5) + Process (sub-processes 5.1-5.3)
5 - Derivation, estimation, aggregation → Process (sub-processes 5.4-5.7)
6 - Analysis → Analyse (sub-processes 6.1-6.6)
7 - Dissemination → Disseminate (sub-processes 7.1-7.5)
8 - Post-survey evaluation → Not an explicit process, but seen as a vital feedback loop
Metadata: End-to-End
Need
– capture requirements, e.g. usage of data, quality requirements
– access existing data element concept definitions to clarify requirements
Design
– capture constraints and basic dissemination plans, e.g. products
– capture design parameters that could be used to drive automated processes, e.g. stratification
– capture descriptive metadata about the collection, such as the methodologies used
– reuse or create required data definitions, questions and classifications
Build
– capture operational metadata about the selection process, e.g. the number in each stratum
– access design metadata to drive the selection process (a sketch follows below)
Collect
– capture metadata about the process
– access procedural metadata about the rules used to drive processes
– capture metadata, e.g. quality metrics
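A minimal sketch of the Build-phase idea above, where design metadata drives the selection process rather than being hard-coded. The table and column names here (design_parameter, frame_unit, 'HES-2007') are hypothetical illustrations, not the actual Statistics NZ schema:

    -- Design phase: stratification parameters are captured as metadata.
    CREATE TABLE design_parameter (
        collection_instance_code varchar(25) NOT NULL,  -- hypothetical code, e.g. 'HES-2007'
        stratum_code             varchar(10) NOT NULL,
        target_sample_size       int         NOT NULL,  -- the parameter that drives selection
        PRIMARY KEY (collection_instance_code, stratum_code)
    );

    -- Build phase: the selection process reads the design metadata, and the
    -- achieved counts per stratum can be captured back as operational metadata.
    SELECT dp.stratum_code,
           dp.target_sample_size,
           COUNT(*) AS units_in_stratum
    FROM design_parameter dp
    JOIN frame_unit fu ON fu.stratum_code = dp.stratum_code   -- frame_unit is hypothetical
    WHERE dp.collection_instance_code = 'HES-2007'
    GROUP BY dp.stratum_code, dp.target_sample_size;

The point of the pattern is that changing the design means updating rows, not code.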
Metadata: End-to-End (2)
Process
– capture metadata about the operation of processes
– access procedural metadata, e.g. edit parameters
– create and/or reuse derivation definitions and imputation parameters
Analyse
– capture metadata, e.g. quality measures
– access design parameters to drive estimation processes
– capture information about quality assurance and sign-off of products
– access definitional metadata to be used in the creation of products
Disseminate
– capture operational metadata
– access procedural metadata about customers
– metadata is needed to support Search, Acquire, Analyse (incl. integrate) and Report
– capture re-use requirements, including the importance of the data (fitness for purpose)
– Archive or Destruction: detail on the length of the data life cycle
Metadata: End-to-End - Worked Example
Question Text: “Are you employed?”
Need
– Concept discussed with users
– Check international standards
– Assess existing collections & questions
Design
– Design question text, answers & methodologies
– Align with output variables (e.g. ILO classifications)
– Data model, supported through the meta-model
– Develop the Business Process Model: process & data/metadata flows
Build
– Concept Library: questions, answers & methods
– ‘Plug & Play’ methods, with parameters (metadata) the key
– System of linkages (no hard-coding)
Metadata: End-to-End - Worked Example (continued)
Question Text: “Do you live in Wellington?”
Collect
– Question, answers & methods rendered to the questionnaire
– Deliver the question to respondents
– Confirm the quality of the concept
Process
– Draw questions, answers & methods from the meta-store
– Business logic drawn from the ‘rules engine’
Analyse
– Deliver question text, answers & methods to the analyst
– Search & discover data through metadata
– Access the knowledge-base (metadata)
Disseminate
– Deliver question text, answers & methods to the user
– Archive question text, answers & methods
Conceptual View of Metadata
• Anything related to data, but not dependent on data, is metadata.
• There are four types of metadata in the model: Conceptual (including contextual), Operational, Quality and Physical, as defined by MetaNet.
Implementation: Dimensional Model
[Diagram: a central FACT table surrounded by metadata dimensions: standard classifications, standard variables, standard questions, survey/instruments, survey mode, and standard data definitions.]
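A minimal SQL sketch of this dimensional pattern, assuming heavily simplified, illustrative table and column names (the actual IDE/MetaStore schema, shown later in this deck, is far richer):

    -- Metadata dimensions: each table describes one aspect of a statistical fact.
    CREATE TABLE question       (q_key  int PRIMARY KEY, question_text varchar(1000));
    CREATE TABLE classification (cu_key int PRIMARY KEY, classfn_code varchar(30));
    CREATE TABLE variable       (v_key  int PRIMARY KEY, var_name varchar(50));
    CREATE TABLE instrument     (i_key  int PRIMARY KEY, name_text varchar(255));

    -- The fact table holds the value plus foreign keys into the dimensions, so
    -- every stored value stays directly connected to the metadata that gives it
    -- meaning: which question was asked, which variable it measures, which
    -- instrument collected it, and how it is classified.
    CREATE TABLE fact (
        f_key      int PRIMARY KEY,
        q_key      int REFERENCES question(q_key),
        cu_key     int REFERENCES classification(cu_key),
        v_key      int REFERENCES variable(v_key),
        i_key      int REFERENCES instrument(i_key),
        fact_value varchar(2000)
    );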
Architecture
[Diagram: user access through the Information Portal at the top; a service layer connects reference data, classifications and metadata to the FACT stores beneath.]
Input Data Environment (IDE/MetaStore), Version 2.0.06
[Entity-relationship diagram of the IDE/MetaStore data model. Major table groups: Questions & Variables (question, answer_part, question_answer_part, variable_library, instrument_question_map, instrument_variable_map); Fact Definitions (fact_definition, classification_used, fact_definition_classification, fact_classification, fact_defn_dimension); Collections & Instruments (collection, collection_instance, instrument, instrument_instance, instrument_mode, instrument_attribute, instrument_attribute_type); Respondents (response, response_attribute, response_attribute_type, mode); Units of Interest (unit_of_interest, supplying_unit, strata, strata_attribute, weight); Generic, Time, Question, Collection, Respondent and Versioning dimensions (dim_member, dim_level, additional_dimension, period, reason_for_change, fact_life_cycle); static reference tables (domain_value); and the IDE Operational Area and Exceptions Area, centred on the fact and exception_fact tables. Each fact row carries its fact_value, create_date and create_user plus foreign keys to the question answer part (qap_key), fact definition (fd_key), variable (v_key), instrument (i_key), collection instance (ci_key), unit of interest (uoi_key), supplying unit (su_key), response (r_key), life-cycle status (flc_key), reason for change (rfc_key), versioning keys (fact_group_key, fact_ver_nbr) and the actual period start/end keys. Note from the diagram: exception_fact table relationships have not been depicted; relationships are implied between parent table primary keys and child table foreign keys that exist in exception_fact.]
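A sketch of the kind of query this design enables, using table and column names taken from the diagram above; the join path is simplified and most of the fact table's keys are omitted:

    -- Reconstruct labelled microdata for one collection instance by joining each
    -- fact back to the question it answers and the fact definition it instantiates.
    SELECT q.question_text,
           fd.desc_text    AS fact_definition,
           f.fact_value,
           f.fact_ver_nbr
    FROM fact f
    JOIN question_answer_part qap ON qap.qap_key = f.qap_key
    JOIN question q               ON q.q_key     = qap.q_key
    JOIN fact_definition fd       ON fd.fd_key   = f.fd_key
    WHERE f.ci_key = 42             -- illustrative collection_instance key
    ORDER BY q.question_text;

Because the metadata dimensions carry the meaning, the same query works for any collection stored in the IDE.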
Goal: Overall Metadata Environment
[Diagram: search and discovery, and metadata and data access, sit over the data and the passive metadata store/s. Supporting components: business logic, classification management, a question library, data definition management, frames/reference stores, and schema.]
Metadata: Recent Practical Experiences
• Generic data model – federated cluster design
– Metadata is the key
– Corporately agreed dimensions
– Data is integrateable, rather than integrated
• Blaise to Input Data Environment
– Exporting Blaise metadata
• ‘Rules Engine’
– Currently based around a spreadsheet
– Working with a workflow engine (BPM-based) to improve this (sketched below)
• IDE metadata tool
– Currently spreadsheet-based
• Audience Model
– Public, professional, technical, plus an added ‘system’ audience
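The deck does not show what the rules engine holds, but the direction it describes (business logic drawn from a rules engine rather than hard-coded per survey) can be sketched as rules stored as data. Everything below (edit_rule, its columns, the sample rules) is a hypothetical illustration, not the actual Statistics NZ tool:

    -- Hypothetical edit-rule store: processing logic kept as metadata, so a
    -- generic editing service reads its rules at run time instead of having
    -- them hard-coded for each survey.
    CREATE TABLE edit_rule (
        rule_key        int          PRIMARY KEY,
        collection_code varchar(25)  NOT NULL,   -- which collection the rule applies to
        variable_name   varchar(50)  NOT NULL,   -- the variable being edited
        rule_expr       varchar(255) NOT NULL,   -- the check to evaluate
        failure_action  varchar(30)  NOT NULL    -- e.g. 'flag', 'impute', 'reject'
    );

    INSERT INTO edit_rule VALUES (1, 'HES', 'age_years', 'value >= 0 AND value <= 120', 'flag');
    INSERT INTO edit_rule VALUES (2, 'HES', 'weekly_income', 'value >= 0', 'impute');

A workflow engine, as mentioned above, would then sequence which rule sets run at which step of the generic business process.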
SOA
[Diagram of the service-oriented architecture. Channel interfaces (Internet, Intranet, Extranet, web services) sit over the data warehouse, databases and BI cubes, SAS etc. Application services include security, application admin, system monitoring, analytics, an execution engine, transaction management, process management and workflow, directory services, resource management, queuing, load management, scheduling, business rules (rules engine) and transformations, all connected through a service layer (message and data bus). Adapters link the bus to respondent management CRM, customer management CRM, the call centre, SAS, ETL tools, SQL Server, Blaise and other services, with support functions beneath.]
Standards & Models - The MetaNet Reference Model™
• A two-level model based on:
– Concepts: the basic ideas, the core of the model
– Characteristics: elements and attributes that make concepts unique
• Terms and descriptions can be adapted
• Concepts must stay the same
• Concepts should be distinct and consistent
• Concepts have hierarchy and relationships
Defining Metadata Concepts: Example
[Diagram: a Collection (e.g. Census, frequency = 5-yearly) has Collection Instances (e.g. Census 2006), each carrying questionnaires (Questionnaire A, Questionnaire B). Questionnaires hold questions (“Do you live in Wellington?”, “What is your age?”, “How old are you?”), and each question maps to a fact definition with its classifications, e.g. Fact definition 1 “Person lives in Wellington” (Classification: CITY, Category: 2 WGTN; Classification: NZ Island, Category: NTH ISL) and Fact definition 2 “Age of person”.]
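As a rough sketch of how this example could land in the IDE/MetaStore tables shown earlier; the keys are illustrative (the real tables use identity columns) and the column lists are abridged:

    -- The Collection and one of its Collection Instances.
    INSERT INTO collection (c_key, name_text, freq_code) VALUES (1, 'Census', '5');
    INSERT INTO collection_instance (ci_key, c_key, name_text) VALUES (10, 1, 'Census 2006');

    -- The question and its answer part, reusable across questionnaires.
    INSERT INTO question (q_key, question_text) VALUES (100, 'Do you live in Wellington?');
    INSERT INTO answer_part (ap_key, answer_part_text) VALUES (1, 'Yes / No');

    -- The fact definition the question informs, linked via question_answer_part.
    INSERT INTO fact_definition (fd_key, desc_text) VALUES (200, 'Person lives in Wellington');
    INSERT INTO question_answer_part (qap_key, q_key, ap_key, fd_key)
    VALUES (300, 100, 1, 200);

Facts collected in Census 2006 would then reference qap_key 300, tying every stored value back to the concept “Person lives in Wellington”.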
How will we use MetaNet?
1. To guide the development of a Stats NZ model
2. Another model (SDMX) will be used for additional support in gaps
3. It provides the base for consistency across systems and frameworks
4. It will allow for better use and understanding of data
5. It will highlight duplications and gaps in current storage
Metainformation systems
[Diagram: a concept-based model spanning the existing metainformation systems: SIM (data collections, variables), CARS (classifications, categories, concordance), statistical units and sample design, and the IDE (domain value, fact, classification, response, collection). Other metadata is stored in the Business Frame, survey systems, BmTS components, etc.]
Metadata Users - External
• Government
• Public
• External statisticians (incl. international organisations)
Metadata Users - Internal
– Statistical Analysts
– IT Personnel (business analysts, IT designers & technical leads, developers, testers etc.)
– Management
– Data Managers / Custodians / Archivists
– Statistical Methodologists
– External Statisticians (researchers etc.)
– Architects: data, process & application
– Respondent Liaison
– Survey Developers
– Metadata and Interoperability Experts
– Project Managers & Teams
– IT Management
– Product Development and Publishing
– Information Customer Services
Lessons Learnt – Metadata Concepts
• Apart from the ‘basic’ principles, metadata principles are quite difficult to get a good understanding of, and this makes communicating them even harder.
• Everyone has a view on what metadata they need; the list of metadata requirements/elements can be endless. Given the breadth of metadata, an incremental approach to the delivery of storage facilities is fundamental.
• Establish a metadata framework, one that best fits your organisation, upon which discussions can be based; we have agreed on MetaNet, supplemented with SDMX.
Lessons Learnt – BPM
• To make data re-use a reality there is a need to go back to first principles, i.e. to identify the concept behind the data item. Surprisingly, it can be difficult for some subject-matter areas to identify these first principles, particularly if the collection has been in existence for some time.
• Be prepared for survey-specific requirements: the BPM exercise is absolutely needed to define the common processes and to identify potentially required survey-specific features.
Lessons Learnt – Implementation
• Without significant governance it is very easy to start with a generic service concept and yet still deliver a silo solution. Ongoing upgrading of all generic services is needed to avoid this.
• Expecting delivery of generic services from input/output-specific projects leads to significant tensions, particularly in relation to added scope elements within fixed resource schedules. Delivering business services at the same time as developing and delivering the underlying architecture services adds significant complexity to implementation.
Lessons Learnt – Implementation (continued)
• A well-defined relationship between data and metadata is very important. The approach of directly connecting each data element, defined as a statistical fact, to metadata dimensions proved successful because we were able to test and utilise the concept before the (costly) development of metadata management systems.
Lessons Learnt – SOA
• The adoption and implementation of SOA as a Statistical Information Architecture requires a significant mind shift, from data processing to enabling enterprise business processes through the delivery of enterprise services.
• Skilled resources, familiar with SOA concepts and their application, are very difficult to recruit, and equally difficult to grow.
Lessons Learnt – Governance
• The move from ‘silo systems’ to a BmTS-type model is a major challenge that should not be under-estimated.
• An active Standards Governance Committee, made up of senior representatives from across the organisation (ours has the three DGSs on it), is a very useful thing to have in place. This forum provides an environment in which standards can be discussed & agreed, and the Committee can take on the role of the ‘authority to answer to’ if need be.
Lessons Learnt – Other
• There is a need to consider the audience of the metadata.
• Some metadata is better than no metadata, as long as it is of good quality.
• Do not expect to get it 100% right the very first time.
Questions?