StatLine 4 metadata
implementation
Edwin de Jonge
Statistics Netherlands
What is StatLine?
StatLine is the online output database of Statistics
Netherlands.
– Primary output channel
– Contains all published data
– Current size: 1500 data cubes, 2 billion data
cells, over 150 million facts
– Offers extensive functionality, including a very good
search engine
StatLine in Business Architecture
StatLine in statistical
process
What is StatLine 4?
Redesign of the current StatLine 3 dissemination
software.
Reasons for the redesign:
– Improve coherence
– Changing publication policy
– Handle time dependence
– Archiving
– Many new features
StatLine coherence
Ideally: StatLine is coherent and consistent
Currently (StatLine 3):
– 1500 independent data cubes
StatLine 4:
– Data cubes share metadata:
– centrally moderated, quality improvement
– Data cubes share data:
– Each fact stored once.
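The idea of cubes sharing data can be sketched as follows. This is a minimal illustration only: the names FactKey, FactStore and DataCube and the sample value are hypothetical, and the slides do not describe the actual StatLine 4 storage design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FactKey:
    """Identifies one fact: a variable plus the categories that delimit it."""
    variable: str
    categories: frozenset


@dataclass
class FactStore:
    """Central store; keying by FactKey means each fact exists at most once."""
    facts: dict = field(default_factory=dict)

    def put(self, key, value):
        self.facts[key] = value

    def get(self, key):
        return self.facts[key]


@dataclass
class DataCube:
    """A cube holds references (keys) to shared facts, not copies of them."""
    name: str
    store: FactStore
    keys: list = field(default_factory=list)

    def values(self):
        return [self.store.get(k) for k in self.keys]


store = FactStore()
key = FactKey("population", frozenset({"sex:male", "time:2005"}))
store.put(key, 100)                      # made-up value, for illustration
cube_a = DataCube("cube A", store, [key])
cube_b = DataCube("cube B", store, [key])
store.put(key, 101)                      # one correction, seen by both cubes
```

Because both cubes reference the same key, a correction made once in the store is visible in every cube that uses the fact.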
StatLine 4 metadata
management
Metadata management centralized:
– What? Conceptual metadata:
– Classifications
– Variables
– By whom? Two organization units:
1. Coordination: Maintaining structure and
meaning of classifications
2. Dissemination: Textual editing and translations
– Data producers own the data, but not the metadata.
– Result: Every fact in StatLine 4 uses central
classifications.
StatLine in Business Architecture
StatLine in statistical
process
Classification status
In StatLine 4 each classification has status:
– (Inter)national standard
– Coordinated
– within Statistics Netherlands
– Shared
– Shared but not coordinated
– Private
– Can only be used by 1 data cube
– Only during conversion
This status is used for coordination purposes.
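A small sketch of how the four statuses and the "one data cube" rule for private classifications could be represented. The enum and the may_use function are hypothetical names, not part of StatLine 4 as presented.

```python
from enum import Enum


class ClassificationStatus(Enum):
    STANDARD = "(inter)national standard"
    COORDINATED = "coordinated within Statistics Netherlands"
    SHARED = "shared but not coordinated"
    PRIVATE = "private"


def may_use(status, cubes_already_using):
    """Return True if one more data cube may use a classification.

    Only the rule stated on the slide is encoded: a private
    classification can be used by a single data cube.
    """
    if status is ClassificationStatus.PRIVATE:
        return cubes_already_using == 0
    return True
```

For example, may_use(ClassificationStatus.PRIVATE, 1) is False, while a shared classification can be used by any number of cubes.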
Cristal model:
StatLine 4 uses Cristal model
– Model for classifications and variables (Van Bracht et
al.)
– Focus on the conceptual domain and value domain (ISO/IEC
11179)
Model elements:
– Category (value):
– Value of a variable; defines a subpopulation,
e.g. male (gender: male)
– Can be part of other category (partial order)
– Level:
– set of disjoint categories
– Equals “flat” classification
Cristal model (2):
– Hierarchy:
– Sequence of levels (total order) with contained
categories
– Every category in hierarchy has 1 parent in
higher level
– Equals “hierarchical” classification
– Classification:
– set of hierarchies with contained levels and
categories
– Equals: Family of hierarchical classifications.
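The four model elements can be sketched as plain data structures. This is an illustration under assumptions: the class and field names are hypothetical, and the check reads "1 parent in higher level" as "a parent in any earlier level of the sequence".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    code: str
    label: str
    parent: Optional["Category"] = None   # "part of" relation


@dataclass
class Level:
    name: str
    categories: list                      # disjoint categories ("flat" classification)


@dataclass
class Hierarchy:
    name: str
    levels: list                          # ordered from highest to lowest level

    def is_valid(self):
        """Every category below the top level needs a parent in a higher level."""
        higher_codes = set()
        for depth, level in enumerate(self.levels):
            for cat in level.categories:
                if depth > 0 and (cat.parent is None
                                  or cat.parent.code not in higher_codes):
                    return False
            higher_codes |= {c.code for c in level.categories}
        return True


@dataclass
class Classification:
    name: str
    hierarchies: list = field(default_factory=list)


# Illustrative example: a two-level regional hierarchy.
country = Category("NL", "Netherlands")
provinces = [Category("UT", "Utrecht", parent=country),
             Category("ZE", "Zeeland", parent=country)]
region = Hierarchy("Country > Province",
                   [Level("Country", [country]), Level("Province", provinces)])
classification = Classification("Region", [region])
```

A hierarchy whose lower-level categories lack a parent fails is_valid(), which mirrors the rule that every category in a hierarchy has one parent in a higher level.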
Cristal model (3)
– Classification versioning
– Each metadata object has lifetime (begin and end
date)
– Each metadata object can have a predecessor
and successor
– Models versions of categories, levels and
hierarchies.
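Versioning with lifetimes and predecessor/successor links might look like the sketch below. The names are hypothetical, the codes and dates are made up, and treating the end date as inclusive is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(eq=False)   # identity comparison; avoids recursing through the links
class VersionedObject:
    code: str
    begin: date
    end: Optional[date] = None            # None means "still valid"
    predecessor: Optional["VersionedObject"] = field(default=None, repr=False)
    successor: Optional["VersionedObject"] = field(default=None, repr=False)

    def valid_on(self, day):
        return self.begin <= day and (self.end is None or day <= self.end)


def link(old, new):
    """Record that `new` succeeds `old`."""
    old.successor = new
    new.predecessor = old


def version_on(obj, day):
    """Find the version in obj's chain that is valid on `day`, if any."""
    while obj.predecessor is not None:    # rewind to the first version
        obj = obj.predecessor
    while obj is not None:                # then walk forward in time
        if obj.valid_on(day):
            return obj
        obj = obj.successor
    return None


v1 = VersionedObject("REGION-v1", date(2003, 1, 1), date(2003, 12, 31))
v2 = VersionedObject("REGION-v2", date(2004, 1, 1))
link(v1, v2)
```

Starting from any version, version_on(v2, date(2003, 6, 1)) returns v1, so a table for 2003 can be shown with the classification that applied at the time.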
Cristal model (4)
Multilingual
– All textual properties are multilingual
– E.g. Mannelijk (Dutch) -> Male
– All metadata and tables can be shown in each
defined language
– All textual properties have popular versions
– E.g. Consumer Price Index -> Inflation
– All metadata and tables can be shown in
“popular” or “expert” mode
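A textual property with language and mode variants could be modelled as below. The Text class and its lookup are a hypothetical sketch; falling back to the expert wording when no popular version is stored is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    """One textual property, keyed by (language, mode)."""
    variants: dict = field(default_factory=dict)

    def get(self, language, mode="expert"):
        # Assumed fallback: use the expert wording if no popular one exists.
        fallback = self.variants.get((language, "expert"))
        return self.variants.get((language, mode), fallback)


sex_male = Text({("nl", "expert"): "Mannelijk",
                 ("en", "expert"): "Male"})
cpi = Text({("en", "expert"): "Consumer Price Index",
            ("en", "popular"): "Inflation"})
```

Rendering a table in a given language and mode then amounts to calling get(language, mode) on every label, for example cpi.get("en", "popular") for the popular English title.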
Object class:
Is stored, but not coordinated (yet)
StatLine 4 conversion
All content of the current StatLine must be converted
– From 1500 independent cubes
– To 1500 coordinated cubes
Conversion means coordination!
– Total coordination -> very long conversion
– No coordination -> no added value
Ergo: Partial classification coordination
Conversion strategy (1)
Strategy:
– Coordinate standardized metadata
– Allow non-standard metadata for a 2-year period
– Phased conversion
– Preparation, conversion, coordination
Conversion strategy (2)
Preparation phase: until June 2006
– Collect and store standard classifications
– E.g. Time, Region (50 versions), Age,
Marital status, Sex, NACE
– Including variations (disclosure control)
– For each data cube
– Check usage of standard classifications
– Non-standard classifications are marked “private”
– Define StatLine 4 structure
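The per-cube check in the preparation phase can be sketched as a simple lookup. The function name and the example cube are hypothetical; the set of standard classifications is taken from the examples listed above and is not a complete list.

```python
# Standard classifications named as examples on the slide (not exhaustive).
STANDARD_CLASSIFICATIONS = {"Time", "Region", "Age",
                            "Marital status", "Sex", "NACE"}


def check_cube(classifications, standard=STANDARD_CLASSIFICATIONS):
    """Mark each classification used by a cube as 'standard' or 'private'."""
    return {name: ("standard" if name in standard else "private")
            for name in classifications}


# Hypothetical cube using two standard and one home-grown classification.
result = check_cube(["Time", "Region", "Type of dwelling"])
```

Here "Type of dwelling" would be marked private and become a candidate for coordination after conversion.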
Conversion strategy (3)
Conversion phase: (June 2006)
– Convert data cube
– Add missing metadata to the metadata server
– Check conversion
Coordination phase (November 2006)
– After conversion: StatLine 4 contains
coordinated and private metadata
– Within two years all private metadata must
be replaced with coordinated metadata
Benefits of StatLine 4 metadata
– Coordinated classifications and variables
– Uniform naming and description
– Standard/coordinated metadata can be
downloaded
– Better comparability of data
– Better search results
Future improvements
StatLine 4.1
– Centralize population (object class)
management:
– E.g.: person, enterprise
– Model populations and subpopulations
Statistical process
– Centralize:
– process metadata
– quality metadata.