Dissemination Metadata: Metadata on website and output database

Metadata
What is it, and what could it be?
EU Twinning Project
Activity E.2
26 May 2013
The role of the NSIs and metadata needs to change
200 years ago: statistics as state secrets
30 years ago: statistics as a planning instrument for politicians / economists
Today: “handling complexity”
• A lighthouse in the turbulent sea of information
• Focus on metadata to support knowledge processes
• Metadata must give users exact knowledge on products: “Information at your fingertips”
Two types of metadata
• Structural metadata: data about the containers of data
  - design and specification of data structures
  - DDI (Data Documentation Initiative), i.e. a technical standard, implemented by tools such as Colectica, that works together with tools like Blaise, SAS, SPSS, Stata

• Descriptive metadata (meta-content): data about data content
  - individual instances of application
  - standards like SDMX

Generic Statistical Business Process Model (GSBPM)
GSBPM combined with DDI and SDMX
[Diagram: GSBPM process phases labelled DDI, SDMX, DDI]
DEMO: Statistics and DDI in 60 seconds
• A Study, using Survey Instruments made up of Questions, measures Concepts about Universes
• Questions collect Responses, resulting in Data Files made up of Variables with values of Categories/Codes, Numbers
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010
Published under Creative Commons Attribution-ShareAlike 3.0 Unported
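The chain from study to variables can be sketched as a handful of linked records. This is only an illustration of the relationships named on the slide, not the DDI schema itself: all class names, field names and the example survey content are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    name: str


@dataclass
class Universe:
    description: str


@dataclass
class Question:
    text: str
    concept: Concept      # what the question measures
    universe: Universe    # who the question is about


@dataclass
class Variable:
    name: str
    question: Question
    # Code -> category label, e.g. "1" -> "Yes" (hypothetical codes).
    categories: Dict[str, str] = field(default_factory=dict)


@dataclass
class Study:
    title: str
    instrument: List[Question] = field(default_factory=list)
    variables: List[Variable] = field(default_factory=list)


def decode(variable: Variable, response: str) -> str:
    """Translate a stored response code into its category label."""
    return variable.categories.get(response, "unknown code")


# Hypothetical example content.
labour = Concept("Labour market status")
adults = Universe("Residents aged 15-74")
q1 = Question("Did you work last week?", labour, adults)
v1 = Variable("WORKED", q1, {"1": "Yes", "2": "No"})
study = Study("Example survey", instrument=[q1], variables=[v1])

# Responses as they might appear in a data file.
decoded = [decode(v1, r) for r in ["1", "2", "9"]]
```

Because each variable points back to its question, and each question to its concept and universe, a user looking at a data file can trace any column back to what was asked and of whom.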
Work processes in DST
Elements of the process diagram:
• Subject matter division
• Dissemination, IT-Centre
• Cleaned micro data
• Statistical registers
• Anonymized micro data for researchers
• Charged, i.e. tailor-made, statistics and analysis
• Aggregation to macro data
• StatBankDenmark
• SumDatabase
• pdf, Publication, Print, Binding
• International organisations
• The Public
• www.dst.dk
Some quotations
… Once collected, the DDI metadata could be used to generate SAS set-up files for processing. A tool such as Space-Time Research’s SuperCross could then be used to tabulate the microdata, and render it into SDMX. Once in SDMX format, it could be directly disseminated using a tool such as OECD.stat, or further manipulated in a data warehouse to meet the required data structures for reporting in SDMX format.
… Another case of interest in the use of the two standards is more focused on enriching data dissemination. Because DDI metadata are very rich, and describe the process of collection and tabulation, they could potentially be linked to disseminated aggregates, being exposed alongside the aggregate data products on a website, as embedded metadata, or actually presented in native DDI XML format, or mapped into an SDMX metadata report. A related case “mines” DDI metadata for the automatic population of SDMX-based quality reporting.
From: The Data Documentation Initiative (DDI): An Introduction for National Statistical Institutes (Open Data Foundation, July 2011)
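As a rough illustration of the first quotation, the idea of generating SAS set-up code from variable metadata can be sketched as below. This does not parse real DDI XML: a plain dictionary of codes and labels stands in for the DDI category scheme, and the function name `sas_format_block` and the example format name are hypothetical. The output is an ordinary `proc format` step.

```python
from typing import Dict


def sas_format_block(format_name: str, categories: Dict[str, str]) -> str:
    """Build a SAS PROC FORMAT step from a code -> label mapping.

    Assumes numeric codes and a format name that is valid in SAS
    (for instance, not ending in a digit).
    """
    lines = ["proc format;", f"  value {format_name}"]
    for code, label in categories.items():
        # SAS escapes a single quote inside a quoted string by doubling it.
        escaped = label.replace("'", "''")
        lines.append(f"    {code} = '{escaped}'")
    lines.append("  ;")
    lines.append("run;")
    return "\n".join(lines)


# Hypothetical category scheme, standing in for metadata read from DDI.
block = sas_format_block("sexfmt", {"1": "Men", "2": "Women"})
print(block)
```

The point of the sketch is the direction of the dependency: the code list is maintained once as metadata, and the processing script is derived from it rather than typed by hand.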
Challenges on metadata in Denmark and elsewhere!
• Metadata are not reused and often only linked to final data
• No link to GSBPM processes
• Too many separate, parallel systems
• Presentation of metadata on internet is fragmented and incomplete
• Concepts databases incomplete with no hierarchy (super-, sub- and synonym-concepts)
• Classifications/codelists in too many places (versions)
• No clear awareness of populations and units
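To make the point about concept hierarchies concrete, here is a minimal sketch of a concepts database that records super-, sub- and synonym-concepts. It is an assumption-laden illustration, not a description of any DST system: the `Thesaurus` class, its methods and the example concepts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class ConceptEntry:
    name: str
    broader: Optional[str] = None          # super-concept, if any
    synonyms: Set[str] = field(default_factory=set)


class Thesaurus:
    def __init__(self) -> None:
        self._entries: Dict[str, ConceptEntry] = {}
        self._alias: Dict[str, str] = {}   # lower-cased term -> preferred name

    def add(self, name: str, broader: Optional[str] = None,
            synonyms: Iterable[str] = ()) -> None:
        syn = set(synonyms)
        self._entries[name] = ConceptEntry(name, broader, syn)
        self._alias[name.lower()] = name
        for s in syn:
            self._alias[s.lower()] = name

    def resolve(self, term: str) -> Optional[str]:
        """Map a term or synonym to its preferred concept name."""
        return self._alias.get(term.lower())

    def ancestors(self, term: str) -> List[str]:
        """Super-concepts from nearest to most general."""
        chain: List[str] = []
        current = self.resolve(term)
        while current is not None:
            entry = self._entries.get(current)
            if entry is None or entry.broader is None or entry.broader in chain:
                break                      # stop at a root, a gap or a cycle
            chain.append(entry.broader)
            current = entry.broader
        return chain

    def narrower(self, term: str) -> List[str]:
        """Direct sub-concepts of a term, sorted by name."""
        name = self.resolve(term)
        if name is None:
            return []
        return sorted(e.name for e in self._entries.values() if e.broader == name)


# Hypothetical example concepts.
t = Thesaurus()
t.add("Population")
t.add("Labour force", broader="Population", synonyms=["Workforce"])
t.add("Employed", broader="Labour force")
t.add("Unemployed", broader="Labour force")
```

With such a structure, a search for "workforce" resolves to "Labour force", and a user can move up to "Population" or down to "Employed" and "Unemployed" from a single entry point.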
Metadata: strategic goal in DST
Purpose
• to fulfil needs of external users for metadata related to their desired use of statistics in their processes
• to achieve internal efficiency via integrated use/reuse and production of metadata guided by GSBPM-processes
Needs
• Fulfilment of requirement from Eurostat (SIMS, QAF etc.)
• Detailed requirements based on additional user-consultations
• “Information at your fingertips”
• “We want integrated metadata”
Current level of detail
Groups: The statistics in short; Definitions; On the statistics; Use; Time; Coherency; Other documentation
1. Contact: Responsible in DST, tables in StatBank
2. Introduction: Short description of the statistics and its history; extract of the most recently published figures
3. Metadata update: Latest update of documentation
4. Statistical concepts: Classifications, groupings, variables etc.
5. Units of measure: Definitions
6. Reference time: Flow or stock
7. Law: EU or Danish Law
8. Dissemination policy: Release calendar
9. Frequency: How frequent, preliminary vs. final data
10. Documentation: Separate documentation notes etc.
11. Quality: Overall evaluation, specific problems
12. Users: User types
13. Uncertainty: Overall, different types of certainty
14. Time: Timeliness
15. Availability: Types of dissemination sources
16. Comparability: Geographically; over time
17. Consistency: With other statistics
18. Response burden: Most recent measurement
19. Discretionary policy: Anonymization rules
20. Data collection: Data sources, validation rules, calculations
21. Comments: Other issues
How #1: Integrated metadata
• StatBank
• Methods: Methodological papers, Quality Declarations
• Concepts: Concept database (“Hvad betyder”, i.e. “What does it mean”)
• Variables/data sets: Variable database
• Classifications: Classification database
On-going project in DST
Pilot study using DDI and GSBPM
DDI as common model with reuse of concepts, variables, categories and codes
• Fulfilment of Code of Practice (CoP) and Quality Assurance Framework (QAF) using Single Integrated Metadata Structure (SIMS)
• Thesaurus with concepts that link the micro and macro levels (selected areas)
• Common categories and codes
• “Information at your fingertips” via metadata on Internet
• GSBPM-processes and external-user processes established