Metadata models to support the statistical cycle: IMDB Alice Born Statistics Canada UNECE Workshop on Statistical Metadata July 4 to 6, 2007


Metadata models to support the statistical cycle:
IMDB
Alice Born
Statistics Canada
UNECE Workshop on Statistical Metadata
July 4 to 6, 2007
Outline
• Survey life cycle and the IMDB
• IMDB model
– Data dimension model
– Business dimension model
– Questionnaire model
• Registration
• Classification of administered items
• Use of metadata in the statistical system
Role of the IMDB
• Information management – interpretability of
Statistics Canada’s 590+ current surveys
• Assist in coherence of the data
• Promote knowledge sharing across STC and
with external users
• Preserve corporate memory
• Promote reuse of our metadata assets
IMDB in the survey life cycle
[Diagram: the IMDB as the metadata source across the survey life cycle]
Phases: Design, Collect, Edit, Estimate, Tabulate, Publish, Archive
Functions: Operations Management, Quality Assurance, Analysis, Dissemination
Data: Operational Data Stores (Operational Data, Survey Data, Administrative Data, Registers), Data Warehouses
Metadata: IMDB
IMDB metadata model
• Corporate Metadata Repository (CMR), which is an
extension of ISO/IEC 11179 Metadata Registries
– Statistical surveys
– Sample
– Questionnaire
– Data sets
– Products
– Systems
• IMDB – data dimension, business dimension,
questionnaire model, administration and documents
model
Data dimension model – ISO/IEC 11179
Data Element
Data Element Concept
Object Class
Property
Conceptual Domain
Value Domain
Survey variable
Data dimension model
Currently in the IMDB:
85 object classes (statistical units)
290 properties
506 data element concepts (object class + property)
202 conceptual domains (representation class + property)
1509 value domains (classifications)
1034 data elements (= representation class + property + object class; variables)
Example: Type of revenues of establishments
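The composition rule on this slide (object class + property gives a data element concept; adding a representation class gives a data element) can be sketched as below. This is a minimal illustration, not the IMDB schema: the class and field names are hypothetical, and splitting the example into "establishments", "revenues" and "Type" is an assumed reading of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Object class + property (hypothetical class, for illustration only)."""
    object_class: str
    prop: str

    @property
    def name(self) -> str:
        return f"{self.prop} of {self.object_class}"


@dataclass(frozen=True)
class DataElement:
    """A data element concept plus a representation class, i.e. a variable."""
    concept: DataElementConcept
    representation_class: str

    @property
    def name(self) -> str:
        return f"{self.representation_class} of {self.concept.name}"


# Assumed decomposition of the slide's example data element.
concept = DataElementConcept(object_class="establishments", prop="revenues")
element = DataElement(concept=concept, representation_class="Type")
print(element.name)
```

Keeping the concept separate from the representation lets several variables (for example a type and an amount) share one data element concept.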
Business dimension model in the IMDB
[Diagram: entities of the business dimension model]
Survey; Survey instance; Frame and sample; Survey design; Questionnaire; Data elements; Value domains; Datasets; Products (COR); Applications/Software
[Diagram: administered items and the administrative layer]
Administrative layer and administered items: Statistical Activity, Organization, Survey, Stewardship, Contact, Universe, Documentation, Frame, Identification, Survey instance, Time Frame, Instrument, Keyword, Question, Classification, Theme, Data file, Methodology
Methodology: Instrument design, Sampling, Data source, Error detection, Imputation, Estimation, Quality evaluation, Disclosure control, Revisions and seasonal adjustment
Data accuracy
Data dimension: Data Element, Data Element Concept, Object Class, Property, Formula, Conceptual Domain, Value Domain
Information management - Administered items
• Any item that is managed, tracked, organized
and registered in a registry
• Administered items have
– their own set of characteristics specific to the
administered item
– and shared administrative characteristics
which are common to all administered items –
administrative layer
Information management - Administrative Layer
• Shared administrative characteristics
– Terminological Designation (Names)
– Terminological Description
– Time Frame
– Organization/Contact
– Reference Document (1)
– Version Management
– Stewardship/Registration
– Classification
(1) A reference document is itself an administered item with all the administrative layer characteristics.
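One way to picture the shared administrative layer is as a single record type that every administered item carries alongside its own specific fields. The sketch below is illustrative only: the class names, field names and example values are hypothetical and are not taken from the IMDB.

```python
from dataclasses import dataclass, field


@dataclass
class AdministrativeLayer:
    """Characteristics shared by all administered items (names are assumed)."""
    name: str                       # terminological designation
    description: str = ""           # terminological description
    time_frame: str = ""
    organization_contact: str = ""
    reference_documents: list = field(default_factory=list)
    version: int = 1
    stewardship: str = ""
    classification: list = field(default_factory=list)


@dataclass
class AdministeredItem:
    admin: AdministrativeLayer


@dataclass
class ReferenceDocument(AdministeredItem):
    location: str  # item-specific characteristic (hypothetical)


@dataclass
class Survey(AdministeredItem):
    survey_type: str  # item-specific characteristic: direct, administrative, derived


# A reference document carries the same administrative layer as any other item.
doc = ReferenceDocument(
    admin=AdministrativeLayer(name="Survey guide"),
    location="(hypothetical location)",
)
survey = Survey(
    admin=AdministrativeLayer(name="Example survey", reference_documents=[doc]),
    survey_type="direct",
)
print(survey.admin.reference_documents[0].admin.name)
```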
IMDB Administrative Layer - Version Management
• A snapshot of the information recorded for the
administered item.
• Rules for creation of a version are established for each
type of administered item.
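The idea of a version as a snapshot, created under rules that depend on the type of item, might be sketched as follows. The rule table, item types and field names here are hypothetical; the slides do not say what the actual IMDB rules are.

```python
import copy

# Hypothetical rules: fields whose change creates a new version, per item type.
VERSION_TRIGGERS = {
    "survey": {"name", "target_population"},
    "question": {"text"},
}


class VersionedItem:
    def __init__(self, item_type, **values):
        self.item_type = item_type
        self.values = dict(values)
        # Version 1 is a snapshot of the information first recorded.
        self.versions = [copy.deepcopy(self.values)]

    def update(self, **changes):
        """Apply changes; take a snapshot only if a trigger field changed."""
        triggers = VERSION_TRIGGERS.get(self.item_type, set())
        changed = {k for k, v in changes.items() if self.values.get(k) != v}
        self.values.update(changes)
        if changed & triggers:
            self.versions.append(copy.deepcopy(self.values))
        return len(self.versions)


q = VersionedItem("question", text="What is your age?", note="draft")
q.update(note="reviewed")                      # not a trigger: no new version
q.update(text="What is your date of birth?")   # trigger: new snapshot
print(len(q.versions))
```

Storing deep copies keeps earlier snapshots unchanged when the current record is edited later.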
Information Management - IMDB Administrative Layer
• The administrative layer is used to manage
administrative information for all IMDB
administered items.
• Administered items are managed in a consistent
manner.
Surveys
• Metadata in the IMDB is organized around the survey
administered item
• A survey refers to the collection, compilation and publication of data
measuring characteristics of a population
• Three types of surveys are recognized:
– Direct
– Administrative
– Derived
Statistical Activities
• A group of surveys that share common features and
common explanatory text
• E.g., System of National Accounts, Unified
Enterprise Statistics, Health Statistics
Common metadata set
Statistical activity
Survey (direct, administrative, derived)
Target population (population, statistical unit)
Survey instance (each survey process)
Collection instrument (questionnaire)
Methodology
Data accuracy
Documentation
Data file
(Data elements, value domains)
Common metadata set for survey life cycle
Methodology
Instrument design
Sampling
Collection method
Error detection
Imputation
Estimation
Quality evaluation
Disclosure control
Revisions and seasonal adjustment
Questionnaire model
Questionnaire: Item_ID, etc…
Question block: Item_ID, Block_type, etc…
Question: Item_ID, DE_item_ID, etc…
Response choice: Question_item_ID, Response choice, etc…
Data element: Item_ID, Representation_class, etc…
Value domain: Item_ID, VD_type, etc…
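The identifiers in the questionnaire model suggest a relational layout in which a question points to its data element and each response choice points to its question. The sketch below assumes that reading (DE_item_ID refers to a data element's Item_ID, Question_item_ID to a question's Item_ID). The table names, the Question_text column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_element (
    Item_ID INTEGER PRIMARY KEY,
    Representation_class TEXT
);
CREATE TABLE question (
    Item_ID INTEGER PRIMARY KEY,
    DE_item_ID INTEGER REFERENCES data_element(Item_ID),
    Question_text TEXT  -- hypothetical column
);
CREATE TABLE response_choice (
    Question_item_ID INTEGER REFERENCES question(Item_ID),
    Response_choice TEXT
);
""")

# Illustrative rows only; not actual IMDB content.
conn.execute("INSERT INTO data_element VALUES (1, 'Code')")
conn.execute("INSERT INTO question VALUES (10, 1, 'What is your marital status?')")
conn.executemany(
    "INSERT INTO response_choice VALUES (?, ?)",
    [(10, "Married"), (10, "Single")],
)

# Follow the keys from each response choice back to the data element.
rows = conn.execute("""
    SELECT q.Question_text, d.Representation_class, r.Response_choice
    FROM question q
    JOIN data_element d ON d.Item_ID = q.DE_item_ID
    JOIN response_choice r ON r.Question_item_ID = q.Item_ID
    ORDER BY r.Response_choice
""").fetchall()
for row in rows:
    print(row)
```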
Questionnaire model in the IMDB
• Metadata for survey planning and design phase
– Does the concept or question already exist?
• Metadata discovery - STCWiki
– Align with output variables - definitions
• Harmonized Content Modules Project
– Content development of key socio-demographic data
elements (e.g., marital status, age, ethnic origin) in IMDB
for registration as an STC standard
– Leading to development of standard question blocks and
questions – stored in the IMDB
– Specifications (i.e., skip patterns, modes) / BLAISE and
other code stored in Survey Specification Manager
Registration/Stewardship
• Registration and stewardship information is managed for
each administered item
– Who is the owner of the item?
– Who is responsible for the item’s information?
– Who is responsible for registration?
– Verification for editorial, accuracy, bilingual conformance?
– State – new, candidate, recorded, qualified, standard,
preferred/prescribed standard, retired?
– Degree of sharing/harmonization – divisional, branch,
agency, provincial, national, international?
– Dissemination – Internal, public?
– Versioning note
Registration Attributes in the IMDB
Three registration attributes:
1. Registration status – identifies the quality or
progression of quality
2. Registration level – level of conformance or
harmonization
3. Administrative status – stage in the registration
process
1. Registration status
[Diagram: registration statuses and the roles involved]
Statuses: Incomplete, Candidate, Recorded, Qualified, Standard, Preferred standard; Retired, Superseded, Historical, Application
Roles: Submitter, (Content) Steward, Responsible Owner, Regular Registrar, Standards Division Registrar, Registration Authority (completeness, accuracy, adherence to quality and terminological description standards)
2. Registration level
Level of conformance or harmonization
Departmental
International
Recommended
U.S.
Program-specific
Canadian
Survey
Provincial
3. Administrative status
Stages in registration process
Registered
De-registered
Reserved for edit
New
Not registered
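The three registration attributes can be modelled as three independent enumerations attached to each administered item. The sketch below takes the value labels from the slides, but the numeric ordering of the statuses, the `promote` helper and the class names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class RegistrationStatus(IntEnum):
    # Assumed ordering of the quality progression.
    INCOMPLETE = 0
    CANDIDATE = 1
    RECORDED = 2
    QUALIFIED = 3
    STANDARD = 4
    PREFERRED_STANDARD = 5


class RegistrationLevel(Enum):
    SURVEY = "Survey"
    PROGRAM_SPECIFIC = "Program-specific"
    DEPARTMENTAL = "Departmental"
    PROVINCIAL = "Provincial"
    CANADIAN = "Canadian"
    US = "U.S."
    INTERNATIONAL = "International"
    RECOMMENDED = "Recommended"


class AdministrativeStatus(Enum):
    NEW = "New"
    NOT_REGISTERED = "Not registered"
    RESERVED_FOR_EDIT = "Reserved for edit"
    REGISTERED = "Registered"
    DE_REGISTERED = "De-registered"


@dataclass
class Registration:
    status: RegistrationStatus
    level: RegistrationLevel
    admin_status: AdministrativeStatus

    def promote(self):
        """Move one step up the assumed quality progression, if possible."""
        if self.status < RegistrationStatus.PREFERRED_STANDARD:
            self.status = RegistrationStatus(self.status + 1)
        return self.status


reg = Registration(
    RegistrationStatus.CANDIDATE,
    RegistrationLevel.DEPARTMENTAL,
    AdministrativeStatus.REGISTERED,
)
print(reg.promote().name)
```

Keeping the three attributes separate means an item's quality status can change without touching its level of harmonization or its stage in the registration process.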
Classification of administered items
• Organization and classification of the administered item
– Keyword
– STC taxonomy (28 themes, 200+ sub-themes)
– UNECE Classification of International Statistical Activities
– data elements
– Program Activity Architecture for reporting to Treasury
Board Secretariat and to Parliament
– …
• Organization of the item’s administrative and item-specific information for different purposes
– HTML, Wiki, SDMX, CWM, DDI, XBRL, …
Survey design and dissemination phases
[Diagram: IMDB content used in the survey design and dissemination phases]
Phases: Design, Collect, Edit, Estimate, Tabulate, Publish
IMDB content: Concepts (Object Class, Property, Data Element Concept); Survey; Universe; Frame; Instance; Collection Instrument; Question Blocks; Questions; Data Elements; Classifications (Conceptual Domain, Value Domain); Methodology; Data Files
Enterprise Architecture
Reuse of Information Assets in Applications Development
[Diagram: IMDB linked to application areas]
Classification coding
Collection instrument development: Survey Specification Manager; Integrated Questionnaire and Metadata System
Publishing
Other applications
Software Register
Reuse of Information Assets
Integration with Data
Data Warehouses
IMDB
CANSIM
Reuse of Information Assets in Dissemination and information discovery
[Diagram: IMDB feeding Wiki, HTML, SDMX and DDI outputs]
One metadata source, many uses for the information, many output formats
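The "one source, many output formats" idea can be illustrated by rendering a single metadata record through interchangeable renderers. This is only a sketch: the record content is hypothetical, and the two renderers produce plain HTML and JSON, not SDMX or DDI, whose schemas are not reproduced here.

```python
import html
import json

# Hypothetical metadata record held once in the source repository.
record = {
    "name": "Marital status",
    "definition": "(hypothetical definition text)",
}


def to_html(rec):
    """Render the record as an HTML definition list."""
    rows = "".join(
        f"<dt>{html.escape(k)}</dt><dd>{html.escape(v)}</dd>"
        for k, v in rec.items()
    )
    return f"<dl>{rows}</dl>"


def to_json(rec):
    """Render the record as JSON for machine consumers."""
    return json.dumps(rec, sort_keys=True)


# Adding an output format means adding one renderer, not a second copy of the metadata.
RENDERERS = {"html": to_html, "json": to_json}
outputs = {fmt: render(record) for fmt, render in RENDERERS.items()}
for fmt, text in outputs.items():
    print(fmt, text)
```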
Corporate Memory: Data Files
Dissemination and archive phases
[Diagram: data files in the dissemination and archive phases]
Operational Data Stores: Operational Data, Survey Data, Registers, Administrative Data
IMDB
Public Use
Master File
Clean Master File
Archival information
Archived Data