Metadata models to support the statistical cycle: IMDB
Alice Born, Statistics Canada
UNECE Workshop on Statistical Metadata, July 4 to 6, 2007

Outline
• Survey life cycle and the IMDB
• IMDB model
– Data dimension model
– Business dimension model
– Questionnaire model
• Registration
• Classification of administered items
• Use of metadata in the statistical system

Role of the IMDB
• Information management – interpretability of Statistics Canada’s 590+ current surveys
• Assist in coherence of the data
• Promote knowledge sharing across STC and with external users
• Preserve corporate memory
• Promote reuse of our metadata assets

IMDB in the survey life cycle
[Diagram. Labels: Metadata (IMDB); phases Design, Collect, Edit, Estimate, Tabulate, Publish, Analysis, Dissemination, Archive; Operations Management; Quality Assurance; Data Warehouses; Operational Data Stores; Operational Data; Survey Data; Administrative Data; Registers]

IMDB metadata model
• Corporate Metadata Repository (CMR), which is an extension of ISO/IEC 11179 Metadata Registries
– Statistical surveys
– Sample
– Questionnaire
– Data sets
– Products
– Systems
• IMDB – data dimension, business dimension, questionnaire model, administration and documents model

Data dimension model – ISO/IEC 11179
[Diagram. Labels: Data Element; Data Element Concept; Object Class; Property; Conceptual Domain; Value Domain; Survey variable]

Data dimension model
Currently in the IMDB:
• 85 object classes (statistical units)
• 290 properties
• 506 data element concepts (object class + property)
• 202 conceptual domains (representation class + property)
• 1509 value domains (classifications)
• 1034 data elements (= representation class + property + object class; variables)
Example: Type of revenues of establishments

Business dimension model in the IMDB
[Diagram. Labels: Applications/Software; Survey; Frame and sample; Survey instance; Questionnaire; Datasets; Products (COR); Survey design; Data elements; Value domains; Administrative layer]

[Diagram. Labels: Statistical Activity; Organization; Survey; Stewardship; Contact; Universe; Documentation; Frame; Identification; Survey instance; Time Frame; Instrument; Keyword; Question; Identification; Classification; Theme; Data file; Methodology; Administered items; Instrument design; Sampling; Data source; Error detection; Imputation; Estimation; Quality evaluation; Disclosure control; Revisions and seasonal adjustment; Data accuracy; Data Element; Data Element Concept; Object Class; Property; Formula; Conceptual Domain; Value Domain]

Information management - Administered items
• Any item that is managed, tracked, organized and registered in a registry
• Administered items have
– their own set of characteristics specific to the administered item
– and shared administrative characteristics which are common to all administered items – administrative layer

Information management - Administrative Layer
• Shared administrative characteristics
– Terminological Designation (Names)
– Terminological Description
– Time Frame
– Organization/Contact
– Reference Document (1)
– Version Management
– Stewardship/Registration
– Classification
(1) Reference document is an administered item with all the administrative layer characteristics.

IMDB Administrative Layer - Version Management
• A snapshot of the information recorded for the administered item.
• Rules for creation of a version are established for each type of administered item.

Information Management - IMDB Administrative Layer
• The administrative layer is used to manage administrative information for all IMDB administered items.
• Administered items are managed in a consistent manner.

Surveys
• Metadata in the IMDB is organized around the survey administered item
• Refers to collection, compilation and publication of data measuring characteristics of a population
• Three types of surveys are recognized:
– Direct
– Administrative
– Derived

Statistical Activities
• Group of surveys that share a common feature and common explanatory text
• E.g., System of National Accounts, Unified Enterprise Statistics, Health Statistics

Common metadata set
• Statistical activity
• Survey (direct, administrative, derived)
• Target population (population, statistical unit)
• Survey instance (each survey process)
• Collection instrument (questionnaire)
• Methodology
• Data accuracy
• Documentation
• Data file (data elements, value domains)

Common metadata set for survey life cycle
Methodology
• Instrument design
• Sampling
• Collection method
• Error detection
• Imputation
• Estimation
• Quality evaluation
• Disclosure control
• Revisions and seasonal adjustment

Questionnaire model
[Diagram. Entities and their listed attributes:]
• Question block: Item_ID, Block_type, etc…
• Response choice: Question_item_ID, Response choice, etc…
• Questionnaire: Item_ID, etc…
• Question: Item_ID, DE_item_ID, etc…
• Data element: Item_ID, Representation_class, etc…
• Value domain: Item_ID, VD_type, etc…

Questionnaire model in the IMDB
• Metadata for survey planning and design phase
– Does the concept or question already exist?
• Metadata discovery - STCWiki
– Align with output variables - definitions
• Harmonized Content Modules Project
– Content development of key socio-demographic data elements (e.g., marital status, age, ethnic origin) in IMDB for registration as a STC standard
– Leading to development of standard question blocks and questions – stored in the IMDB
– Specifications (i.e., skip patterns, modes) / BLAISE and other code stored in Survey Specification Manager

Registration/Stewardship
• Registration and stewardship information is managed for each administered item
– Who is the owner of the item?
– Who is responsible for the item’s information?
– Who is responsible for registration?
– Verification for editorial, accuracy, bilingual conformance?
– State – new, candidate, recorded, qualified, standard, preferred/prescribed standard, retired?
– Degree of sharing/harmonization – divisional, branch, agency, provincial, national, international?
– Dissemination – internal, public?
– Versioning note

Registration Attributes in the IMDB
Three registration attributes:
1. Registration status – identifies the quality or progression of quality
2. Registration level – level of conformance or harmonization
3. Administrative status – stage in the registration process

1. Registration status
(Completeness, accuracy, adherence to quality and terminological description standards)
[Diagram. Status labels: Preferred standard; Standard; Qualified; Recorded; Candidate; Incomplete; Retired; Superseded; Historical; Application. Role labels: Registration Authority; Standards Division Registrar; Regular Registrar; Responsible Owner; (Content) Steward; Submitter]

2. Registration level
Level of conformance or harmonization
[Diagram. Labels: Departmental; International; Recommended; U.S.; Program-specific; Canadian; Survey; Provincial]

3. Administrative status
Stages in registration process
[Diagram. Labels: Registered; De-registered; Reserved for edit; New; Not registered]

Classification of administered items
• Organization and classification of the administered item
– Keyword
– STC taxonomy (28 themes, 200+ sub-themes)
– UNECE Classification of International Statistical Activities – data elements
– Program Activity Architecture for reporting to Treasury Board Secretariat and to Parliament
– …
• Organization of the item’s administrative and item-specific information for different purposes
– HTML, Wiki, SDMX, CWM, DDI, XBRL, …

Survey design and dissemination phases
[Diagram. Phases: Design; Collect; Edit; Estimate; Tabulate; Publish. IMDB content: Concepts (Object Class, Property, Data Element Concept); Survey; Universe; Frame; Instance; Data Elements; Questions; Question Blocks; Collection Instrument; Classifications (Conceptual Domain, Value Domain); Methodology; Data Files]

Enterprise Architecture
Reuse of Information Assets in Applications Development
[Diagram. Labels: IMDB; Classification coding; Collection instrument development; Survey Specification Manager; Integrated Questionnaire and Metadata System; Publishing; Other applications; Software Register]

Reuse of Information Assets
Integration with Data
[Diagram. Labels: Data Warehouses; IMDB; CANSIM]

Reuse of Information Assets in Dissemination and information discovery
[Diagram. Labels: IMDB; Wiki; HTML; SDMX; DDI]
One metadata source, many uses for the information, many output formats

Corporate Memory: Data Files
Dissemination and archive phases
[Diagram. Labels: IMDB; Operational Data; Survey Data; Registers; Administrative Data; Operational Data Stores; Public Use Master File; Clean Master File; Archival information; Archived Data]
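
Illustrative sketch (not part of the original slides)
The data dimension model slides describe a data element concept as object class + property, and a data element as representation class + property + object class, each carrying the shared administrative layer. The Python below is a minimal sketch of that composition under stated assumptions. It is not the IMDB schema: all class, field and method names are hypothetical, the administrative layer is reduced to a status and a version number, and the permitted values in the example value domain are invented for illustration. Only the registration states and the label "Type of revenues of establishments" come from the slides.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RegistrationStatus(Enum):
    # States listed on the Registration/Stewardship slide.
    NEW = "new"
    CANDIDATE = "candidate"
    RECORDED = "recorded"
    QUALIFIED = "qualified"
    STANDARD = "standard"
    PREFERRED_STANDARD = "preferred/prescribed standard"
    RETIRED = "retired"


@dataclass
class AdminLayer:
    """Hypothetical, heavily reduced administrative layer."""
    status: RegistrationStatus = RegistrationStatus.NEW
    version: int = 1


@dataclass
class DataElementConcept:
    """Object class + property, as on the data dimension slide."""
    object_class: str  # statistical unit, e.g. "establishments"
    prop: str          # e.g. "revenues"
    admin: AdminLayer = field(default_factory=AdminLayer)

    def label(self) -> str:
        return f"{self.prop} of {self.object_class}"


@dataclass
class ValueDomain:
    """A classification: the values a data element may take."""
    name: str
    permitted_values: List[str]
    admin: AdminLayer = field(default_factory=AdminLayer)


@dataclass
class DataElement:
    """Representation class + property + object class (a variable)."""
    representation_class: str
    concept: DataElementConcept
    value_domain: ValueDomain
    admin: AdminLayer = field(default_factory=AdminLayer)

    def label(self) -> str:
        return f"{self.representation_class} of {self.concept.label()}"

    def accepts(self, value: str) -> bool:
        # A value is valid only if the value domain lists it.
        return value in self.value_domain.permitted_values


# Example built around the data element named on the slide.
concept = DataElementConcept(object_class="establishments", prop="revenues")
# The permitted values below are invented for illustration only.
domain = ValueDomain("Revenue types (hypothetical)", ["sales", "rental", "other"])
element = DataElement("Type", concept, domain)

print(element.label())
print(element.accepts("sales"), element.accepts("wages"))
```

Keeping the administrative layer as a separate object attached to every item mirrors the idea that administered items share one set of administrative characteristics while keeping their own item-specific ones.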