StatLine 4 metadata implementation
Edwin de Jonge
Statistics Netherlands

What is StatLine?
StatLine is the online output database of Statistics Netherlands.
– Primary output channel
– Contains all published data
– Current size: 1500 data cubes, 2 billion data cells, over 150 million facts
– Contains much functionality, including a very good search engine

StatLine in Business Architecture

StatLine in statistical process

What is StatLine 4?
Redesign of the current StatLine 3 dissemination software.
Reasons for the redesign:
– Improve coherence
– Changing publication policy
– Handle time dependence
– Archiving
– Many new features

StatLine coherence
Ideally: StatLine is coherent and consistent
Currently (StatLine 3):
– 1500 independent data cubes
StatLine 4:
– Data cubes share metadata: centrally moderated, quality improvement
– Data cubes share data: each fact stored once

StatLine 4 metadata management
Metadata management centralized:
– What? Conceptual metadata:
  – Classifications
  – Variables
– By whom? Two organization units:
  1. Coordination: maintaining structure and meaning of classifications
  2. Dissemination: textual editing and translations
– Data producers own data, but not metadata.
– Result: every fact in StatLine 4 uses central classifications.

StatLine in Business Architecture

StatLine in statistical process

Classification status
In StatLine 4 each classification has a status:
– (Inter)national standard
– Coordinated
  – Within Statistics Netherlands
– Shared
  – Shared but not coordinated
– Private
  – Can only be used by 1 data cube
  – Only during conversion
This status is used for coordination purposes.

Cristal model
StatLine 4 uses the Cristal model:
– Model for classifications and variables (Van Bracht et al.)
– Focus on Conceptual and Value domain (ISO 11179)
Model elements:
– Category (value):
  – Value of a variable; creates a subpopulation, e.g. male (gender: male)
  – Can be part of another category (partial order)
– Level:
  – Set of disjoint categories
  – Equals a “flat” classification

Cristal model (2)
– Hierarchy:
  – Sequence of levels (total order) with contained categories
  – Every category in a hierarchy has 1 parent in the higher level
  – Equals a “hierarchical” classification
– Classification:
  – Set of hierarchies with contained levels and categories
  – Equals a family of hierarchical classifications

Cristal model (3)
Classification versioning
– Each metadata object has a lifetime (begin and end date)
– Each metadata object can have a predecessor and successor
– Models versions of categories, levels and hierarchies

Cristal model (4)
Multilingual
– All textual properties are multilingual
  – E.g. Mannelijk (Dutch) -> Male
– All metadata and tables can be shown in each defined language
– All textual properties have popular versions
  – E.g. Consumer Price Index -> Inflation
– All metadata and tables can be shown in “popular” or “expert” mode
Object class: is stored, but not coordinated (yet)

StatLine 4 conversion
All content of the current StatLine must be converted:
– From 1500 independent cubes
– To 1500 coordinated cubes
Conversion means coordination!
– Total coordination -> very long conversion
– No coordination -> no added value
Ergo: partial classification coordination

Conversion strategy (1)
Strategy:
– Coordinate standardized metadata
– Allow non-standard metadata for a 2-year period
– Phased conversion
  – Preparation, conversion, coordination

Conversion strategy (2)
Preparation phase: until June 2006
– Collect and store standard classifications
  – E.g. Time, Region (50 versions), Age, Marital status, Sex, NACE
  – Including variations (disclosure control)
– For each data cube:
  – Check usage of standard classifications
  – Non-standard is marked “private”
  – Define StatLine 4 structure

Conversion strategy (3)
Conversion phase (June 2006):
– Convert data cube
– Add missing metadata to metadata server
– Check conversion
Coordination phase (November 2006):
– After conversion: StatLine 4 contains coordinated and private metadata
– Within two years all private metadata must be replaced with coordinated metadata

Benefits metadata StatLine 4
– Coordinated classifications and variables
– Uniform naming and description
– Standard/coordinated metadata can be downloaded
– Better comparability of data
– Better search results

Future improvements
StatLine 4.1
– Centralize population (object class) management:
  – E.g.: person, enterprise
  – Model populations and subpopulations
Statistical process
– Centralize:
  – process metadata
  – quality metadata
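
Illustration (not part of the original slides): the Cristal model elements described above (category, level, hierarchy, classification, status, lifetime, multilingual labels) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the StatLine 4 implementation: all class names, field names, codes and the `is_valid` check are hypothetical and chosen only to mirror the wording of the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    """Classification status as listed on the 'Classification status' slide."""
    STANDARD = "(inter)national standard"
    COORDINATED = "coordinated"
    SHARED = "shared"
    PRIVATE = "private"


@dataclass
class Category:
    code: str                               # hypothetical identifier
    labels: dict[str, str]                  # language code -> label (multilingual)
    parent: Optional["Category"] = None     # category this one is part of
    begin: Optional[date] = None            # lifetime: begin date
    end: Optional[date] = None              # lifetime: end date

    def label(self, lang: str) -> str:
        return self.labels[lang]


@dataclass
class Level:
    """A set of disjoint categories, i.e. a 'flat' classification."""
    name: str
    categories: list[Category] = field(default_factory=list)


@dataclass
class Hierarchy:
    """A sequence of levels, ordered from highest to lowest."""
    name: str
    levels: list[Level]

    def is_valid(self) -> bool:
        # Every category below the top level needs a parent in the level above.
        for upper, lower in zip(self.levels, self.levels[1:]):
            upper_codes = {c.code for c in upper.categories}
            for cat in lower.categories:
                if cat.parent is None or cat.parent.code not in upper_codes:
                    return False
        return True


@dataclass
class Classification:
    """A set of hierarchies, i.e. a family of hierarchical classifications."""
    name: str
    status: Status
    hierarchies: list[Hierarchy] = field(default_factory=list)


# Hypothetical example data: a two-level Region hierarchy.
netherlands = Category("NL", {"nl": "Nederland", "en": "Netherlands"})
utrecht = Category("UT", {"nl": "Utrecht", "en": "Utrecht"}, parent=netherlands)
groningen = Category("GR", {"nl": "Groningen", "en": "Groningen"}, parent=netherlands)

region_hierarchy = Hierarchy(
    "Country > Province",
    [Level("Country", [netherlands]), Level("Province", [utrecht, groningen])],
)
region = Classification("Region", Status.COORDINATED, [region_hierarchy])

# A category without a parent in the higher level makes the hierarchy invalid.
orphan = Category("XX", {"en": "Unknown"})
broken = Hierarchy("Broken", [Level("Country", [netherlands]), Level("Province", [orphan])])

# Multilingual label lookup, using the example from the slides.
male = Category("M", {"nl": "Mannelijk", "en": "Male"})
```

In this sketch the partial order between categories is held by the `parent` reference, and the total order of levels by the list order in `Hierarchy.levels`. Predecessor and successor links for versioning are left out for brevity.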