
End-to-End Management of the
Statistical Process
An Initiative by ABS
Bryan Fitzpatrick
Rapanea Consulting Limited
and
Australian Bureau of Statistics
Work Session on Statistical Metadata (METIS)
March 2010, Geneva
The Objectives
• Business transformation aimed at
• reducing cost
• improving effectiveness and ability to respond
– A holistic approach to managing and improving the entire statistical life-cycle
• International collaboration
– ABS does not want to go it alone
– aim is for a shared approach
• sharing of ideas, interfaces, tools
• but with acceptance of national differences
• Build on recent progress in the international statistical community
– standards (SDMX, DDI), GSBPM
– aim is to make them work in practice
• A new program – IMTP
– Information Management Transformation Program
End-to-End Management of the
Statistical Process
• Metadata is always the key to better approaches and process improvements
– it has been in all previous ABS improvement programs
– ABS has a long history of trying to manage metadata (with modest successes)
• Metadata means all the information we use in and around the processes and the data
– to improve things we need to understand it, rationalise it, share it, and use it to automate and drive processes and make the outputs more integrated and usable
• Previous improvement programs have generally been much more limited
– Focused on a few areas in a few projects
– Narrow metadata focus
SDMX and DDI
• They are useful standards
– they are not the focus of ABS interest in the exercise
• the focus is optimising the statistical processes and improving the results from the processes
• but we need to describe and manage all aspects of the statistical process, and that is their target domain
– they are international standards
• sponsored and used by the community ABS is part of, for purposes that are relevant to IMTP
• to discuss the issues internally and with other organisations we need models
– SDMX and DDI are in use, relevant, and fit for purpose
– IMTP aims to apply these standards (along with some others – ISO 11179, ISO 19115) and make them work
• build on recent work in the international statistical community
IMTP and Metadata
Management
• Metadata Management will be a major part of IMTP
– storing it, rationalising it, making it available for sharing and easy use, presenting it in different ways
• and integrating with existing stores such as Input Data Warehouse, Data Element Repository, ABS Information Warehouse
– we talk of a “Metadata Bus” and “Metadata Services”
• some technical jargon
– it means the metadata is easily available to all systems running in the ABS environment
– we are still figuring out precisely what we mean and how it should look
• we need to get “use cases” – examples of what business areas and their systems need to do with the metadata
• but the services will deliver various sorts of metadata in XML formats
– conforming to schemas from DDI and SDMX
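The following is a minimal, illustrative sketch of what a “metadata service” on such a bus might hand back. It is not an ABS design: the element and attribute names are invented for the example, not taken from the published SDMX-ML or DDI schemas, which a real service would conform to.

```python
# Illustrative only: a toy service that serialises a code list as XML.
# Element and attribute names are invented; a real service would return
# documents conforming to the published DDI or SDMX-ML schemas.
import xml.etree.ElementTree as ET

def codelist_as_xml(codelist_id: str, codes: dict) -> str:
    """Serialise a {code: label} mapping as a small XML fragment."""
    root = ET.Element("CodeList", attrib={"id": codelist_id})
    for value, label in codes.items():
        code = ET.SubElement(root, "Code", attrib={"value": value})
        ET.SubElement(code, "Description").text = label
    return ET.tostring(root, encoding="unicode")

# Any system on the "metadata bus" could request this fragment and parse it back.
print(codelist_as_xml("CL_SEX", {"M": "Male", "F": "Female"}))
```

The point is not the particular format but that the same fragment can be requested, versioned and reused by any process that needs the classification.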
IMTP and Metadata
Management
• IMTP focus will be on metadata that is “actionable”
– it means we want it in a form that both people and systems can use
• that can be easily stored and passed around
• that can be used easily to generate whatever format is required in any particular case
– including web pages, PDFs, manuals, other human-readable forms (see the sketch after this slide)
• SDMX and DDI both represent the metadata in XML
• Major focus on metadata management
– versioned and maintained as in SDMX and DDI
– “confrontation” across collections and processes
• aim is consistent, standard metadata across the organisation
– and consistent with international use wherever sensible
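To make “actionable” concrete, here is a purely illustrative sketch of generating one human-readable form (an HTML table) from an XML code-list fragment like the one shown earlier. The element names and the HTML layout are assumptions for the example, not any ABS or standard rendering.

```python
# Illustrative only: render an XML code-list fragment as a simple HTML table.
# The element names and the HTML layout are invented for this example.
import xml.etree.ElementTree as ET

SAMPLE = """<CodeList id="CL_SEX">
  <Code value="M"><Description>Male</Description></Code>
  <Code value="F"><Description>Female</Description></Code>
</CodeList>"""

def codelist_xml_to_html(xml_text: str) -> str:
    """Turn a code-list fragment into a human-readable table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><td>{c.get('value')}</td><td>{c.findtext('Description')}</td></tr>"
        for c in root.findall("Code")
    )
    return (f"<h2>Code list {root.get('id')}</h2>"
            f"<table><tr><th>Code</th><th>Label</th></tr>{rows}</table>")

print(codelist_xml_to_html(SAMPLE))
```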
What sorts of metadata?
• Current ABS metadata management has many shortcomings
– much metadata in corporate stores
• in too many stores, and often documentary rather than actionable
• often not used to drive systems even where it is available and actionable
– the systems predated the stores
– but much metadata is still embedded in individual systems
– there are cases of good managed shared approaches
• but often narrowly focused
– eg around dissemination
• End-to-end management of the process requires a comprehensive, consistent approach (a sketch of how items like these could be held in actionable form follows this list)
– questions, question controls, interviewer instructions
– coding, editing and derivation metadata
– data relationship metadata
– table structures
– classification evolution and history
– alternative hierarchies in geography and other classifications
– …
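As a purely illustrative sketch (class and field names are invented here, not drawn from DDI or any ABS system), items such as a question with its interviewer instruction, or a derivation rule, could be held as structured records that systems can act on rather than as free text in documents:

```python
# Illustrative only: hypothetical structured records for a question and a
# derivation rule; the class and field names are invented for this sketch.
from dataclasses import dataclass

@dataclass
class Question:
    question_id: str
    text: str
    interviewer_instruction: str = ""
    response_codelist: str = ""        # reference to a code list, eg "CL_AGE"

@dataclass
class Derivation:
    output_variable: str
    input_variables: list
    rule: str                          # an expression a processing system can evaluate

q1 = Question("Q01", "What is your age?", "Record age in completed years.", "CL_AGE")
d1 = Derivation("AGE_GROUP", ["AGE"], "'0-14' if AGE < 15 else '15+'")
```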
SDMX and DDI
• SDMX comes from the international agencies (OECD, IMF, Eurostat, UNSD, World Bank, ECB, BIS)
– they get aggregate statistical tables from many countries regularly over time
– they want to automate and manage the process
• they need standard agreed definitions and classifications, standard agreed table structures, standard agreed formats for both data and metadata
– They commissioned SDMX in 2002
• started a project, gathered use cases, employed consultants
• produced a standard and presented it to large numbers of international statistical forums
• started to use it and to pressure NSOs to use it
– SDMX is pretty good
• excellent for managing dissemination of statistical data
– very good tools for very impressive web sites based on data organised in the SDMX model
• also some good frameworks for managing evolution of classifications
• a framework for discussing agreements on concepts and classifications
– Metadata Common Vocabulary, Cross-Domain Concepts, Domain-specific Concepts
SDMX and DDI
• DDI (Data Documentation Initiative) comes from the data archive organisations across many countries
– trying to capture and store survey data for future use
• and to document it so future users can understand it and make sense of it
• mostly social science collections from researchers
• funding organisations are requiring such data to be preserved for further use
– mostly they had to grab data and try to salvage metadata after the event
• but DDI now aims to capture all metadata “at source”
– early versions were narrowly focused on an individual data set
• grew out of their documentation processes
– latest version (DDI V3) is much more extensive, better organised
• common analysis/designer support with SDMX
• an end-to-end model compatible with the Generic Statistical Business Process Model (GSBPM)
DDI Metadata
• DDI has
– Survey-level metadata
• Citation, Abstract, Purpose, Coverage, Analysis Unit, Embargo, …
– Data Collection Metadata
• Methodology, Sampling, Collection strategy
• Questions, Control constructs, and Interviewer Instructions organised into schemes
– Processing metadata
• Coding, Editing, Derivation, Weighting
– Conceptual metadata
• Concepts organised into schemes
– Including ISO 11179 links
• Universes organised into schemes
• Geography structures and locations organised into schemes
DDI Metadata
• DDI has (cont)
– Logical metadata
• Categories organised into schemes
– (categories are labels and descriptions for question responses, eg, Male, Unemployed, Plumber, Australia, …)
• Codes organised into schemes and linked to Categories
– (Codes are representations for Categories, eg “M” for Male, “Aus” for Australia)
• Variables organised into schemes
– Variables are the places where we hold the codes that correspond to a response to a question (see the sketch after this slide)
• Data relationship metadata
– eg, how Persons are linked to Households and Dwellings
• NCube schemes
– descriptions for tables
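A rough Python sketch of the chain just described (class and field names are ours for illustration, not DDI element names): Categories carry the labels, Codes are the stored representations linked back to Categories, and a Variable records which codes its values may take.

```python
# Illustrative only: the Category -> Code -> Variable chain described above.
from dataclasses import dataclass

@dataclass
class Category:
    category_id: str
    label: str                  # eg "Male", "Plumber", "Australia"

@dataclass
class Code:
    value: str                  # eg "M", "Aus"
    category_id: str            # the Category this code represents

@dataclass
class Variable:
    name: str                   # eg "SEX"
    codes: list                 # the codes a stored response may take

sex_categories = [Category("C1", "Male"), Category("C2", "Female")]
sex_codes = [Code("M", "C1"), Code("F", "C2")]
sex = Variable("SEX", sex_codes)
```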
DDI Metadata
• DDI has (cont)
– Physical metadata
• record structures and layouts
– File instance metadata
• specific data files linked to their record structures
– Archive metadata
• archival formats, locations, retention times, etc
– Places for other stuff not elsewhere described
• Notes, Other Material
– References to “Agencies” which own artefacts but no explicit structure to describe them
– Inheritance and links embedded in most schemes
• but need to be ferreted out, not necessarily easily usable
SDMX Metadata
• SDMX has
– Organisations organised into schemes
• Organisations own and manage artefacts, and provide or receive things
– Concepts organised into schemes
– Codelists, including classifications
• a Codelist combines DDI Categories and Codes
– Data Structure Definitions (Key Families)
• a DSD describes a conceptual multi-dimensional cube used in a Data Flow and referenced in Datasets
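As a very small, hypothetical sketch of what a DSD amounts to (names invented for the example, not the SDMX information model itself): a set of dimensions, each tied to a concept and usually a codelist, plus the measure being reported.

```python
# Illustrative only: a toy Data Structure Definition with coded dimensions.
from dataclasses import dataclass

@dataclass
class Dimension:
    concept: str                # eg "SEX"
    codelist: str               # eg "CL_SEX"

@dataclass
class DataStructureDefinition:
    dsd_id: str
    dimensions: list
    primary_measure: str        # the concept being measured, eg "OBS_VALUE"

example_dsd = DataStructureDefinition(
    "DSD_EXAMPLE",
    [Dimension("FREQ", "CL_FREQ"),
     Dimension("REF_AREA", "CL_AREA"),
     Dimension("SEX", "CL_SEX")],
    "OBS_VALUE",
)
```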
SDMX Metadata
• SDMX has
– Data Flows
• described by a DSD, linked to registered data sets, and categorised
– Categories organised into schemes
• not the same as a DDI Category
• provide a basis for indexing and searching data
– Hierarchical Codelists
• a misnomer – maps relationships amongst inter-related classifications
• explicit, actionable representations of relationships (see the sketch after this slide)
– Process metadata
• a Process has steps with descriptions, transition rules, computation information, inputs, outputs
• all actionable, linked to other SDMX artefacts or to external sources
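The idea behind a Hierarchical Codelist can be sketched as an explicit, machine-usable mapping between two related classifications; all of the codes below are invented for illustration.

```python
# Illustrative only: an explicit mapping from codes in a detailed classification
# to codes in a broader, related one (all codes invented for this sketch).
DETAILED_TO_BROAD = {
    "3341": "33",
    "3342": "33",
    "2544": "25",
}

def roll_up(detailed_code: str) -> str:
    """Return the broader code a detailed code maps to, if a mapping exists."""
    return DETAILED_TO_BROAD.get(detailed_code, "unmapped")

assert roll_up("3341") == "33"
```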
SDMX Metadata
• SDMX has
– Structure Sets
• additional linking of related DSD and Flows
– Reporting Taxonomies
• information about assembling reports or publications
– Reference Metadata, Metadata Structure Definitions, and Metadata Flows
• additional, probably useful, options for attaching metadata to data
– Annotations almost everywhere
• good options for managed, actionable extensions
What sorts of metadata?
• What are we interested in?
– Concepts
• probably organised into schemes
• what are the use cases?
– Classifications
• broken up into Categories and Codes DDI-style?
• with links to related classifications SDMX Hierarchical Codelist-style?
• what are the use cases?
– Questions and related metadata
• just how should it look?
– a DDI package but precisely what is useful
– what are the use cases?
What sorts of metadata?
• What are we interested in?
– Survey-level metadata?
• what are the use cases?
– Structure Definitions
• almost certainly, but we need use cases
– Variable, Relationship, and Record Structure metadata
• maybe, but we need use cases
– Processing metadata
• almost certainly, but we need use cases
• SDMX Process and/or DDI artefacts
What are the next steps?
• Basically we need use cases
– How do we see our metadata being used?
– What are we trying to support?
– What can we get from our pilot programs?
• we need to do our own abstraction from that
• We can then start to define a provisional set of services
– with parameters and schemas (see the sketch after this list)
• We can then think about existing sources and demonstration systems
• We can then think about repositories and stores
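Purely as a placeholder for discussion (the signature and parameter names are assumptions, not a design), a provisional service might start life as little more than an agreed interface whose behaviour is pinned down once the use cases are in:

```python
# Hypothetical placeholder interface only; names and parameters are assumptions.
def get_metadata(artefact_type: str, agency: str, artefact_id: str,
                 version: str = "latest") -> str:
    """Return the requested artefact as XML conforming to an agreed schema."""
    raise NotImplementedError("to be defined once the use cases are gathered")

# A use case would then be expressed as a concrete call, eg:
#   get_metadata("codelist", "ABS", "CL_SEX", "1.0")
```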
Timeframe and Process
• We are at the start of the process
– a project team that is still forming
– several “satellite” projects
• small, sometimes significant projects attempting to apply ideas
– and provide use cases for design
• Have had substantial training and discussion around application of DDI and SDMX
– international experts providing training
– significant numbers of ABS staff involved
– more to come later this month
• Not a “big bang” new implementation
– rather a framework and environment for all new developments
• with some retro-fitting to existing systems
– some direct development of key components
International Collaboration
• A definite part of the project
– most national agencies are feeling financial pressures and struggling to build everything themselves
• Need to discuss how collaboration might proceed
– some discussions have been held amongst heads of NSOs
• more planned
– agreed standards are an important enabler
• need participation of NSOs in evolution of standards
– what are the barriers to collaboration and how might we manage them
– probably do not want too large a group of collaborators at the start
• ABS (and others) will continue to report to international forums and meetings
– managerial and technical
– important part of fostering the collaboration
• and finding out what others are doing
• and getting feedback on our ideas
Questions?
• [email protected]