Case Studies: Statistics Canada (WP 11) Alice Born [email protected] Statistics Canada UNECE Workshop on Statistical Metadata July 4 to 6, 2007


Case Studies:
Statistics Canada (WP 11)
Alice Born
[email protected]
Statistics Canada
UNECE Workshop on Statistical Metadata
July 4 to 6, 2007
Outline
1. Overview
2. Statistical metadata systems and the statistical cycle
3. Statistical metadata in each phase of the statistical cycle
4. Systems and design issues
5. Organizational and cultural issues
Overview of Integrated Metadatabase (IMDB)
• To support interpretation of the data –
dissemination phase
• Responsibility of Standards Division (metadata,
classifications and standard definitions)
• Adherence to Policy on Informing Users on Data
Quality and Methodology, Policy on Standards
and Quality Assurance Framework
• In general, metadata goes back to November 2001
Overview of Integrated Metadatabase (IMDB)
• Contains metadata on 350 active and 250 inactive surveys and
statistical programs
– Purpose
– Methodology used to produce the data
– Measures of data accuracy
– Variables, classifications for the data
– Location of clean master datafile
– Contacts
• Survey managers cannot release data without the prescribed
metadata – mandatory
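The mandatory rule above amounts to a completeness check before release. A minimal Python sketch, assuming each survey's metadata is a flat record of text fields; the field names are hypothetical stand-ins for the six items listed on this slide, not the actual IMDB schema:

```python
from __future__ import annotations

# Hypothetical field names mirroring the six kinds of metadata listed above;
# the actual IMDB schema is not described in the slides.
REQUIRED_FIELDS = (
    "purpose",
    "methodology",
    "data_accuracy",
    "variables_and_classifications",
    "master_datafile_location",
    "contacts",
)


def missing_metadata(record: dict) -> list[str]:
    """Return the required fields that are absent or blank (values are text)."""
    return [f for f in REQUIRED_FIELDS if not (record.get(f) or "").strip()]


def can_release(record: dict) -> bool:
    """A survey may be released only when no required metadata is missing."""
    return not missing_metadata(record)
```

A survey manager's release step could call `can_release(record)` and report `missing_metadata(record)` when it returns `False`.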
Overview of Integrated Metadatabase (IMDB)
Next priorities:
• Complete documentation of variables
• Complete questionnaire model
• Determine metadata for archived datafiles – may require
additional metadata
Lessons learned:
• Opportunities in collecting metadata in the first phase of
the statistical cycle – not at the time of dissemination
Statistical metadata systems and the statistical
cycle
Relationship with survey planning and design phase
• IMDB expanded its role as part of the Household Survey Content
Harmonization
• Standardize concepts, questions, question blocks across household
surveys
• Variables follow ISO/IEC 11179
• Questions and question blocks, associated response choices linked
to variables and classifications are stored in the IMDB at the
beginning
• Survey Specification Manager pulls metadata from the IMDB but
contains specifications and code
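In ISO/IEC 11179, a data element concept pairs an object class with a property, and a data element pairs that concept with a value domain. The sketch below illustrates this structure in Python; the class names, the `label` and `is_valid` helpers, and the codes are illustrative assumptions, not the IMDB's implementation or an official classification:

```python
from dataclasses import dataclass, field


@dataclass
class DataElementConcept:
    """ISO/IEC 11179: an object class combined with a property."""
    object_class: str    # what is described, e.g. "Person"
    property_name: str   # the characteristic, e.g. "Marital status"


@dataclass
class ValueDomain:
    """The permitted values (code -> meaning) used to represent a concept."""
    name: str
    permitted_values: dict = field(default_factory=dict)


@dataclass
class DataElement:
    """ISO/IEC 11179: a data element concept plus a value domain."""
    concept: DataElementConcept
    value_domain: ValueDomain

    def label(self) -> str:
        c = self.concept
        return f"{c.property_name} of {c.object_class.lower()}"

    def is_valid(self, code: str) -> bool:
        return code in self.value_domain.permitted_values


# Illustrative codes only; not an official Statistics Canada classification.
marital_status = DataElement(
    DataElementConcept("Person", "Marital status"),
    ValueDomain("Marital status (illustrative)", {"1": "Married", "2": "Single"}),
)
```

Because questions and response choices are linked to variables and classifications, a structure like this lets several household surveys share one definition instead of redefining it per questionnaire.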
Statistical metadata systems and the statistical
cycle
Relationship to dissemination systems
• Metadata for information modules on the STC website
– mandatory
• Information for survey respondents – requires
metadata prior to release of data
• Data Liberation Initiative – public-use microdata files
documented in DDI
• Metadata to support data exchange – SDMX, DDI,
XBRL, Wiki, HTML, etc.
Statistical metadata systems and the statistical
cycle
Relationship to aggregation - analysis phase
• Analytical datawarehouses use IMDB to organize their
tables (variables and classifications)
Relationship to archive phase
• IMDB contains location of master datafile, record layout,
contact information
• Currently developing business rules for archived
datafiles
Statistical metadata systems and the statistical
cycle
Relationship with management systems
• Software Register – registry of Agency’s software and
applications organized by survey and statistical program
– IMDB is the inventory
• Quality management assessment and questionnaire –
based on inventory of surveys in the IMDB; reuse of
existing metadata
IMDB in the survey life cycle
[Diagram. Labels: IMDB (metadata); survey life cycle steps Design, Collect, Edit, Estimate, Tabulate, Analysis, Publish, Dissemination, Archive; Operations; Management; Quality Assurance; Data Warehouses; Registers; Operational Data Stores; Operational Data; Survey Data; Administrative Data]
Statistical metadata for phases in the statistical
cycle
Metadata describing statistical business
processes
– Data dissemination for interpretation of data
– IMDB serves as the corporate inventory of all
surveys and statistical programs,
questionnaires, master datafiles
– Metadata or paradata reside in other
metainformation systems – SSM, IQMS
Statistical metadata for phases in the statistical
cycle
Metadata for data elements
– Supports: Survey planning and design; Analysis;
Dissemination; Archiving
– Metadata objects tracked over time for changes
(versioning) and validity (registration)
– Output to online data tables and STC products
– For discovery – inventory of DE on STC website and
STCWiki (internal review before going public)
– Links to questions, question blocks, datafiles
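Tracking metadata objects over time can be pictured as keeping every version of an item together with its validity period. A minimal sketch under that assumption; `ItemVersion`, `version_on` and the sample history are invented for illustration and do not reflect the IMDB's actual registration model:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ItemVersion:
    version: int
    definition: str
    valid_from: date
    valid_to: Optional[date] = None  # None: still valid


def version_on(versions: List[ItemVersion], when: date) -> Optional[ItemVersion]:
    """Return the version whose validity period covers `when`, if any."""
    for v in versions:
        if v.valid_from <= when and (v.valid_to is None or when <= v.valid_to):
            return v
    return None


# Invented history for illustration only.
history = [
    ItemVersion(1, "First definition", date(2001, 1, 1), date(2005, 12, 31)),
    ItemVersion(2, "Revised definition", date(2006, 1, 1)),
]
```

Looking up `version_on(history, some_date)` returns the definition that applied to data released on that date, which is what a user interpreting an older table would need.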
STCWiki – Type of marital status of person
Statistical metadata for phases in the statistical
cycle
Metadata for survey planning and design
– Questions, standard question blocks and
standard response choices in IMDB
– Mapped to value domains, data elements and
surveys in the IMDB
– These metadata assembled into collection
instruments in other metainformation systems
outside the IMDB
Systems and design issues
• IMDB started in 1998
– Phase 1 Consolidation of existing metadata stores
– Phase 2 Metadata describing statistical business
processes
– Phase 3 Metadata for data elements, etc.
• MetaStat system – Statistical activity, survey, instance,
frame, universe, instrument, datafiles, survey
methodology, documentation, data accuracy
• MetaWeb system – object class, property, data element,
value domain, question, response choices, question
block, value meaning manager
Phase 2 Input Screens
[Diagram: "Text strings related to data components", showing the IMDB database and two resource bundles of key/value pairs]
Directives Resource Bundle
  Key          Value
  SurveySDDS   Statistical Data Doc…
  …            ...
Labels Resource Bundle
  Key          Value
  SurveySDDS   SDDS
  …            ...
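Reading the diagram as two key/value bundles that share keys, an input screen could look up both the short label and the longer directive text for a component. The sketch below assumes that reading; `screen_text` is a hypothetical helper, and the bundle contents are copied from the slide with the truncated value left as it appears:

```python
# Contents copied from the slide; "…" marks text truncated in the source.
DIRECTIVES = {"SurveySDDS": "Statistical Data Doc…"}
LABELS = {"SurveySDDS": "SDDS"}


def screen_text(key: str) -> dict:
    """Collect the label and directive registered for one screen component."""
    return {
        "label": LABELS.get(key, key),         # fall back to the key itself
        "directive": DIRECTIVES.get(key, ""),  # empty if none is registered
    }
```

Keeping the strings outside the screen code means wording can change without touching the application.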
Phase 2 Input Screen
Administered Item
Phase 2 - Identification Tab
Systems and design issues
Dissemination and information discovery
systems
• Web publication from IMDB is through HTML,
dynamically generated with Perl scripts
• Conforms to government standards – CLF
• Survey-centric view and developing DE-centric view
• Discovery from Wiki solution – non-linear view of Phase
2 and 3 metadata
• Allows users to view links among administered items in
the IMDB
Organizational and cultural issues
• Information management
• Assist in harmonization / usage of standards
• Knowledge sharing
• Corporate memory
• Reuse of our metainformation assets
Knowledge Sharing/Corporate Memory
[Diagram. Survey life cycle: Design, Collect, Edit, Estimate, Tabulate, Publish. IMDB: Survey, Universe, Frame, Instance, Collection Instrument, Methodology, Data Files; Concepts (Object Class, Property, Data Element Concept); Data Elements; Questions; Question Blocks; Classifications (Conceptual Domain, Value Domain)]
Enterprise Architecture
Corporate Memory
[Diagram. Labels: IMDB; Data Files; Operational Data Stores; Operational Data; Survey Data; Administrative Data; Registers; Public Use; Master File; Clean Master File; Archival information; Archived Data]
Reuse of Information Assets
Information Discovery/Dissemination
[Diagram: one metadata source (IMDB); many output formats (HTML, Wiki, SDMX, DDI, ?); many uses for the information]
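The "one source, many output formats" idea can be sketched as a single metadata record passed to interchangeable renderers. The example below is an illustration only: the record layout, the function names and the MediaWiki-style definition-list markup are assumptions, and real SDMX or DDI output would have to follow those standards' own schemas, which are not attempted here:

```python
from html import escape


def to_html(record: dict) -> str:
    """Render a record as an HTML definition list, escaping special characters."""
    rows = "".join(
        f"<dt>{escape(k)}</dt><dd>{escape(v)}</dd>" for k, v in record.items()
    )
    return f"<dl>{rows}</dl>"


def to_wiki(record: dict) -> str:
    """Render a record as MediaWiki-style definition-list markup (assumed)."""
    return "\n".join(f"; {k} : {v}" for k, v in record.items())


# One registry of output formats over the same source record.
RENDERERS = {"html": to_html, "wiki": to_wiki}


def render(record: dict, fmt: str) -> str:
    return RENDERERS[fmt](record)
```

Adding another format then means registering one more function in `RENDERERS`, while the source record stays unchanged.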
Reuse of Information Assets
Applications Development
[Diagram. Labels: IMDB; Classification coding; Collection instrument development; Publishing; Other applications]
Reuse of Information Assets
Integration with Data
[Diagram. Labels: IMDB; Data Warehouses; CANSIM]
Organizational and cultural issues
• STC is one of the most integrated statistical systems in
the world
• As part of its Enterprise Architecture strategy – moving
towards centralized and generalized systems, including
the IMDB
• IMDB was built initially to support interpretation of
disseminated data
• The pressure now is to provide metadata up (and down) the
statistical value chain and into management systems
• Opportunities at the survey planning and design phase –
reuse of existing metadata (variables, classifications,
questions, etc.) registered in the IMDB – coherence