IZA Data Service Center DDI/SDMX Workshop Wiesbaden


IZA Data Service Center
DDI/SDMX Workshop
Wiesbaden, Germany, June 18th 2008
The Data Documentation Initiative (DDI)
Arofan Gregory / Pascal Heus
[email protected] / [email protected]
Open Data Foundation
Content
• Background on metadata and XML
• Metadata and Microdata
• XML and Microdata: the DDI
• DDI 2.0
• DDI 3.0
• DDI 2.0 vs 3.0
• Major stakeholders / initiatives
http://www.opendatafoundation.org
[email protected]
Metadata / XML
What is metadata?
• Common definition: Data about Data
[Figure: the "bean" example — labeled stuff vs. unlabeled stuff. Taken from "A Manager's Introduction to Adobe eXtensible Metadata Platform", http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf]
http://www.opendatafoundation.org
[email protected]
What is XML?
• Today's universal language on the web
• Its purpose is to facilitate sharing of structured information across information systems
• XML stands for eXtensible Markup Language
– eXtensible → can be customized
– Markup → tags, marks, attach attributes to things
– Language → syntax (grammatical rules)
• HTML (HyperText Markup Language) is a markup language but not extensible! It is also concerned with presentation, not content.
• XML is a text format (not a binary black box)
• XML is also a collection of technologies (built on the XML language)
• It is platform independent and is understood by modern programming languages (C++, Java, .NET, PHP, Perl, etc.)
• It is both machine and human readable
http://www.opendatafoundation.org
[email protected]
Simple XML example
<catalog>
  <book isbn="0385504209">                      <!-- attributes on the book element -->
    <title>Da Vinci Code</title>                <!-- opening and closing tags -->
    <author>Dan Brown</author>                  <!-- elements -->
  </book>
  <book isbn="0553294385" pages="352">
    <title>I, Robot</title>
    <author>Isaac Asimov</author>
    <language>English</language>                <!-- text content -->
  </book>
</catalog>
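As a minimal sketch of the "machine readable" point, the catalog above can be parsed and queried with nothing more than Python's standard library (ElementTree supports only a limited XPath subset; this is an illustration, not tied to DDI or SDMX):

import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <book isbn="0385504209">
    <title>Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book isbn="0553294385" pages="352">
    <title>I, Robot</title>
    <author>Isaac Asimov</author>
    <language>English</language>
  </book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Iterate over elements, reading attributes and text content.
for book in root.findall("book"):
    print(book.get("isbn"), "-", book.findtext("title"), "by", book.findtext("author"))

# A simple XPath-style query: the titles of all books in the catalog.
titles = [t.text for t in root.findall("./book/title")]
print(titles)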
http://www.opendatafoundation.org
[email protected]
XML Technology overview
• Structure (DTD, XML Schema): Document Type Definitions (DTDs) and XML Schemas are used to validate an XML document by defining namespaces, elements, and rules.
• Capture (software, XForms): specialized software and database systems can be used to create and edit XML documents. In the future the XForms standard will be used.
• Manage / Search (databases, XPath, XQuery): very much like a database system, XML documents can be searched and queried through the use of XPath or XQuery. There is no need to create tables, indexes, or define relationships.
• Transform (XSL, XSLT, XSL-FO): XML separates metadata storage from its presentation. XML documents can be transformed into something else (HTML, PDF, other XML, etc.) through the use of the eXtensible Stylesheet Language (XSL), XSL Transformations (XSLT), and XSL Formatting Objects (XSL-FO).
• Discover (registries): XML metadata or data can be published in "smart" catalogs, often referred to as registries, that can be used for discovery of information.
• Exchange (web services, SOAP, REST): XML documents can be sent like regular files but are typically exchanged between applications through web services using SOAP and other protocols.
What is an XML Schema?
• Exchange / sharing / harmonization implies agreement on structure
– We need a specification that describes the structure and rules → a schema
• A schema is a set of rules to which an XML document must conform in order to be considered 'valid' (a validation sketch follows below)
– XML Schema was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types
– Similar to a relational database's structural definition
• Many schemas exist for different purposes
• Examples
– DDI, SDMX, Dublin Core, RSS, XHTML, etc.
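A minimal validation sketch, assuming the third-party lxml library; the XSD below is hand-written for the book catalog example above and is not one of the DDI or SDMX schemas:

from lxml import etree

BOOK_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string"/>
              <xs:element name="author" type="xs:string"/>
              <xs:element name="language" type="xs:string" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute name="isbn" type="xs:string" use="required"/>
            <xs:attribute name="pages" type="xs:integer"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = etree.XMLSchema(etree.fromstring(BOOK_XSD))
doc = etree.fromstring("<catalog><book isbn='0385504209'>"
                       "<title>Da Vinci Code</title>"
                       "<author>Dan Brown</author></book></catalog>")
print(schema.validate(doc))   # True if the document conforms to the schema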
http://www.opendatafoundation.org
[email protected]
Metadata, XML and Microdata
What is a survey?
• More than just data…
– A complex process to produce data for the purpose of statistical analysis
– Beyond this, a tool to support evidence-based policy making and results monitoring
• The data is surrounded by a large body of documentation
• Survey data often come with limited documentation
• Note that microdata is intended for experts
– Statisticians / researchers
– Represents a single point in time and space
– Needs to be aggregated to produce meaningful results
– It is the beginning of the story
http://www.opendatafoundation.org
[email protected]
What is survey metadata?
• Survey documentation can be broken down into
structured metadata and documents
– Structured metadata can be captured using XML
– Documents can be described in structured metadata
• Example of metadata:
– Survey level: Title, country, year, abstract, sampling,
agencies, access policy, etc.
– Variable level: filename, label, code, questions,
instructions, derivation, etc.
– Related materials: report, questionnaire, papers,
manuals, scripts/programs, photos
– Cross-surveys: catalogs, longitudinal, concepts,
comparability, etc.
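To make "structured metadata" concrete, here is a minimal sketch of the survey-level and variable-level fields listed above captured as records; all field names and values are illustrative, not DDI element names:

# Illustrative only: ad hoc field names and a hypothetical study.
study_metadata = {
    "title": "Household Income and Expenditure Survey",
    "country": "ZA",
    "year": 2007,
    "abstract": "Nationally representative household survey ...",
    "sampling": "Two-stage stratified cluster sample",
    "agencies": ["National Statistics Office"],
    "access_policy": "Licensed use only",
}

variable_metadata = {
    "filename": "hh_roster.dat",
    "name": "Q1_AGE",
    "label": "Age of household member in completed years",
    "question": "How old is [NAME]?",
    "codes": {"98": "Don't know", "99": "Refused"},
    "derivation": None,   # not a derived variable
}

related_materials = ["report.pdf", "questionnaire.pdf", "recode_script.sps"]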
http://www.opendatafoundation.org
[email protected]
Importance of survey metadata
• Data Quality:
– Usefulness = accessibility + coherence + completeness + relevance + timeliness + …
– Undocumented data is useless
– Partially documented data is risky (misuse)
• Data discovery and access
• Preservation
• Replication standard (Gary King)
• Information exchange
• Reduce need to access sensitive data
• Maintain coherence / linkages across the complete life cycle (from respondent to policy maker)
• Reuse
http://www.opendatafoundation.org
[email protected]
The Data Documentation Initiative
• The Data Documentation Initiative is an
XML specification to capture structured
metadata about “microdata” (broad sense)
• First generation DDI 1.0…2.1 (2000-2008)
– focus on single archived instance
• Second generation DDI 3.0 (2008)
– focus on life cycle
– go beyond the single survey concept
– multi-purpose
http://www.opendatafoundation.org
[email protected]
DDI Timeline / Status
• Pre-DDI 1.0
– 70's / 80's: OSIRIS Codebook
– 1993: IASSIST Codebook Action Group
– 1996: SGML DTD
– 1997: DDI XML
– 1999: Draft DDI DTD
• 2000 – DDI 1.0
– Simple survey
– Archival data formats
– Microdata only
• 2003 – DDI 2.0
– Aggregate data (based on matrix structure)
– Added geographic material to aid geographic search systems and GIS users
• 2003 – Establishment of the DDI Alliance
• 2004 – Acceptance of a new DDI paradigm
– Lifecycle model
– Shift from the codebook-centric / variable-centric model to capturing the lifecycle of data
– Agreement on expanded areas of coverage
• 2005–2006
– Presentation of first complete 3.0 model
– Presentation of schema structure
– Focus on points of metadata creation and reuse
• 2007
– Internal and public review
– Vote to move to Candidate Release (CR)
– Establishment of a set of use cases to test application and implementation
– October: 3.0 CR2
• 2008
– February: 3.0 CR3
– March: 3.0 CR3 update
– April: 3.0 CR3 final
– April 28th: 3.0 approved by the DDI Alliance
– May 21st: DDI 3.0 officially announced
– Initial presentations at IASSIST 2008
• 2009
– DDI 3.1 and beyond
DDI 1/2.x
The archive perspective
• Focus on preservation of a survey
• Often sees the survey as a collection of data files accompanied by documentation
– Codebook centric
– Report, questionnaire, methodologies, scripts, etc.
• Results in a static event: the archive
• Maintained by a single agency
• Is typically documentation after the fact
• This is the initial DDI perspective (DDI 2.0)
http://www.opendatafoundation.org
[email protected]
DDI 2.0 Technical Overview
• Based on a single structure (DTD)
• 1 codeBook, 5 sections
– docDscr: describes the DDI document
• The preparation of the metadata
– stdyDscr: describes the study
• Title, abstract, methodologies, agencies, access policy
– fileDscr: describes each file in the dataset
– dataDscr: describes the data in the files
• Variables (name, code, …)
• Variable groups
• Cubes
– othMat: other related materials
• Basic document citation
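A minimal sketch of what reading such a codebook looks like in practice, using Python's standard library; the embedded fragment is hand-written and non-namespaced, intended only to illustrate the stdyDscr/dataDscr sections, not to be a complete DDI 2.0 instance:

import xml.etree.ElementTree as ET

# A minimal, hand-written DDI 2.x-style codebook fragment (illustrative only).
CODEBOOK = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Household Survey 2007</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="Q1_AGE">
      <labl>Age in completed years</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>
"""

root = ET.fromstring(CODEBOOK)
# Study-level citation title (stdyDscr section).
print("Study:", root.findtext("./stdyDscr/citation/titlStmt/titl"))
# Variable-level metadata (dataDscr section): name, label, question text.
for var in root.findall("./dataDscr/var"):
    print(var.get("name"), "|", var.findtext("labl"), "|", var.findtext("qstn/qstnLit"))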
http://www.opendatafoundation.org
[email protected]
Characteristics of DDI 1.0/2.0
• Focuses on the static object of a codebook
• Designed for limited uses
– End user data discovery via the variable or high level
study identification (bibliographic)
– Only heavily structured content relates to information
used to drive statistical analysis
• Coverage is focused on single study, single data
file, simple survey and aggregate data files
• Variable contains majority of information
(question, categories, data typing, physical
storage information, statistics)
http://www.opendatafoundation.org
[email protected]
Impact of these limitations
• Treated as an “add on” to the data collection
process
• Focus is on the data end product and end users
(static)
• Limited tools for creation or exploitation
• The Variable must exist before metadata can be
created
• Producers hesitant to take up DDI creation
because it is a cost and does not support their
development or collection process
http://www.opendatafoundation.org
[email protected]
DDI 1/2.x Tools
• Nesstar
– Nesstar Publisher, Nesstar Server
• IHSN
– Microdata Management Toolkit
– NADA (online catalog for national data archive)
– Archivist / Reviewer Guidelines
• Other tools
– SDA, Harvard/MIT Virtual Data Center (Dataverse)
– UKDA DExT, ODaF DeXtris
– http://tools.ddialliance.org
http://www.opendatafoundation.org
[email protected]
DDI 2.0 perspective
[Diagram: in the DDI 2 perspective, each survey is documented by its own separate DDI 2 instance, serving stakeholders such as media/press, the general public, academic users, producers, policy makers, government, archivists, sponsors, and business.]
DDI 3.0
The life cycle
When to capture metadata?
• Metadata must be captured at the time the event occurs!
• Documenting after the fact leads to considerable loss of information
• Multiple contributors are typically involved in this process
(not only the archivist)
• This is true for producers and researchers
http://www.opendatafoundation.org
[email protected]
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process: it dynamically evolves across time and involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 spans the entire "life cycle"
• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and more
• 3.0 is extensible
http://www.opendatafoundation.org
[email protected]
Requirements for 3.0
• Improve and expand the machine-actionable aspects of
the DDI to support programming and software systems
• Support CAI instruments through expanded
description of the questionnaire (content and question
flow)
• Support the description of data series (longitudinal
surveys, panel studies, recurring waves, etc.)
• Support comparison, in particular comparison by design but also comparison after the fact (harmonization)
• Improve support for describing complex data files
(record and file linkages)
• Provide improved support for geographic content to
facilitate linking to geographic files (shape files,
boundary files, etc.)
http://www.opendatafoundation.org
[email protected]
Approach
• Shift from the codebook centric model of early versions
of DDI to a lifecycle model, providing metadata support
from data study conception through analysis and
repurposing of data
• Shift from an XML Document Type Definition (DTD) to an XML Schema model to support the lifecycle model, reuse of content, and increased controls to support programming needs
• Redefine a “single DDI instance” to include a “simple
instance” similar to DDI 1/2 which covered a single
study and “complex instances” covering groups of
related studies. Allow a single study description to
contain multiple data products (for example, a
microdata file and aggregate products created from the
same data collection).
• Incorporate the requested functionality in the first
published edition
http://www.opendatafoundation.org
[email protected]
Designing to support registries
• Resource package
– structure to publish non-study-specific materials for
reuse
• Extracting specified types of information into schemes
– Universe, Concept, Category, Code, Question,
Instrument, Variable, etc.
• Allowing for either internal or external references
– Can include other schemes by reference and select
only desired items
• Providing Comparison Mapping
– Target can be external harmonized structure
http://www.opendatafoundation.org
[email protected]
Technical Overview
• DDI 3 is composed of several schemas
– Use only what you need!
– Schemas represent modules, sub-modules
(substitutions), reusable, external schemas
• archive
• comparative
• conceptualcomponent
• datacollection
• dataset
• dcelements
• DDIprofile
• ddi-xhtml11, ddi-xhtml11-model-1, ddi-xhtml11-modules-1 (a set of XML schemas to support XHTML)
• group
• inline_ncube_recordlayout
• instance
• logicalproduct
• ncube_recordlayout
• physicaldataproduct
• physicalinstance
• proprietary_record_layout (beta)
• reusable
• simpledc20021212
• studyunit
• tabular_ncube_recordlayout
• xml
Technical Overview
• Any element that can be referenced is globally uniquely
identified
– Maintainable (by an agency)
– Versionable (can change across time)
– Identifiable (within a maintainable scheme)
• Modules
– Reflect closely related sets of information similar to the sections
of DDI 1/2.* DTD
– Modules can be held as separate XML instances and be
included in a large instance by either inclusion or reference
– All modules are maintainable (but not all maintainables are
modules)
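A minimal sketch of the agency / ID / version idea described above; the dataclass and the URN-like rendering are illustrative only, not the official DDI 3.0 URN syntax or any API:

from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    """Agency-qualified, versioned identifier for a maintainable/versionable item."""
    agency: str    # maintenance agency, e.g. "int.example" (hypothetical)
    item_id: str   # identifier within the agency's scope
    version: str   # items can change across time

    def urn(self) -> str:
        # Illustrative rendering only -- not the official DDI 3.0 URN syntax.
        return f"urn:ddi:{self.agency}:{self.item_id}:{self.version}"

# A reference can point at an item held in another module or instance.
variable_scheme = Identifier("int.example", "VariableScheme.Income", "1.2.0")
print(variable_scheme.urn())   # urn:ddi:int.example:VariableScheme.Income:1.2.0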
http://www.opendatafoundation.org
[email protected]
Technical Overview: Maintainable Schemes
(that’s with an ‘e’ not an ‘a’)
• Category Scheme
• Code Scheme
• Concept Scheme
• Control Construct Scheme
• Geographic Structure Scheme
• Geographic Location Scheme
• Interviewer Instruction Scheme
• Question Scheme
• NCube Scheme
• Organization Scheme
• Physical Structure Scheme
• Record Layout Scheme
• Universe Scheme
• Variable Scheme

Schemes are packages of reusable metadata maintained by a single agency.
DDI 3.0 Use Cases
• Study design/survey instrumentation
• Questionnaire generation/data collection and processing
• Data recoding, aggregation and other processing
• Data dissemination/discovery
• Archival ingestion/metadata value-add
• Question/concept/variable banks
• DDI for use within a research project
• Capture of metadata regarding data use
• Metadata mining for comparison, etc.
• Generating instruction packages/presentations
http://www.opendatafoundation.org
[email protected]
Study Design/Survey Instrumentation
• This use case concerns how DDI 3.0 can
support the design of studies and survey
instrumentation
– Without benefit of a question or concept bank
http://www.opendatafoundation.org
[email protected]
Types of Metadata:
• Concepts (conceptual module)
• Universe (conceptual module)
• Questions (datacollection module)
• Flow Logic (datacollection module)
[Diagram: a draft DDI 3.0 instance with Concepts and Universes goes through drafting, review, and revision to a final version; Questions and Flow Logic go through their own drafting, testing, and revision; the final instrument combines Concepts, Universes, Questions, and Flow Logic in a single DDI 3.0 instance. As the survey instrument is tested, all revisions and history can be tracked and preserved, including question translation and internationalization.]
Questionnaire Generation, Data Collection, and
Processing
• This use case concerns how DDI 3.0 can
support the creation of various types of
questionnaires/CAI, and the collection and
processing of raw data into microdata.
http://www.opendatafoundation.org
[email protected]
Types of Metadata:
• Concepts (conceptual module)
• Universe (conceptual module)
• Questions (datacollection module)
• Flow Logic (datacollection module)
• Variables (logicalproduct module)
• Categories/Codes (logicalproduct module)
• Coding (datacollection module)

[Diagram: the final DDI 3.0 instance (Concepts, Universes, Questions, Flow Logic) drives generation of a paper questionnaire, an online survey instrument, or a CAI instrument. DDI captures the content; XML allows each application to do its own presentation. Raw data collected with the instrument is processed into microdata, described by additional DDI 3.0 metadata: Variables and Coding, Categories and Codes, a Physical Data Product, and a Physical Data Instance.]
Data Recoding, Aggregation, etc.
• This use case concerns how DDI 3.0 can
describe recodes, aggregation, and similar
types of data processing.
http://www.opendatafoundation.org
[email protected]
[Diagram: microdata described by DDI 3.0 (conceptual, datacollection, variables, categories, codes) is processed — a recode, an aggregation, or another process — into new microdata or aggregates described by additional DDI 3.0 metadata (codings, new variables, new categories, new codes, NCubes).]

Initial microdata has:
• Concepts (conceptual module)
• Universes (conceptual module)
• Questions (datacollection module)
• Flow Logic (datacollection module)
• Variables (logicalproduct module)
• Coding (datacollection module)
• Categories (logicalproduct module)
• Codes (logicalproduct module)
• Physical Data Product
• Physical Data Instance

Recode adds:
• More codings (datacollection module)
• New variables
• New categories
• New codes
• NCubes (for aggregation)
Data Dissemination/Data Discovery
• This use case concerns how DDI 3.0 can
support the discovery and dissemination of
data.
http://www.opendatafoundation.org
[email protected]
[Diagram: a DDI 3.0 instance containing the full metadata set, together with the microdata/aggregates, feeds codebooks, websites, databases and repositories, research data centers, data-specific information access systems, and registries/catalogues such as question/concept/variable banks. Rich metadata supports auto-generation of websites and other delivery formats; archival events metadata can be added.]
Archival Ingestion and
Metadata Value-Add
• This use case concerns how DDI 3.0 can
support the ingest and migration functions
of data archives and data libraries.
http://www.opendatafoundation.org
[email protected]
[Diagram: a DDI 3.0 instance (full metadata set) plus the microdata/aggregates goes through ingest processing at a data archive or data library, producing DDI 3.0 with full or additional metadata and archival events. Good DDI metadata captured upstream supports automation of processing; DDI provides a neutral format for data migration as analysis packages are versioned, and a good format and foundation for value-added metadata by the archive.]
Question/Concept/Variable Banks
• This use case describes how DDI 3.0 can support question,
concept, and variable banks. These are often termed “registries” or
“metadata repositories” because they contain only metadata – links
to the data are optional, but provide implied comparability. The focus
is metadata reuse.
http://www.opendatafoundation.org
[email protected]
Because DDI has links, each type of bank functions in a modular, complementary way.

[Diagram: a Question Bank (DDI 3.0 Questions, Flow Logic, Codings), a Variable Bank (DDI 3.0 Variables, Categories, Codes), and a Concept Bank (DDI 3.0 Concepts) each serve users and applications. This supports, but does not require, ISO 11179.]
DDI For Use within a Research Project
• This use case concerns how DDI 3.0 can
support various functions within a research
project, from the conception of the study through
collection and publication of the resulting data.
http://www.opendatafoundation.org
[email protected]
[Diagram: the principal investigator, research staff, and collaborators draft a proposal described in DDI 3.0 (Concepts, Universe, Methods, Purpose, People/Orgs). The submitted proposal is extended with Funding and Revisions metadata, then with Questions and the Instrument, then with Data Collection and Data Processing, and finally with Variables and Physical Stores — supporting presentations, publication, and deposit in a data archive/repository.]
Capture of Metadata Regarding
Data Use
• This use case concerns how DDI 3.0 can capture
information about how researchers use data, which can
then be added to the overall metadata set about the data
sources they have accessed.
http://www.opendatafoundation.org
[email protected]
Types of Metadata:
• Recodes (datacollection module)
• Record subsets (physicalinstance module)
• Variable subsets (logicalproduct module)
• Comparison (comparative module)

[Diagram: data sets described by DDI 3.0 (StudyUnit, DataCollection, LogicalProduct, PhysicalDataProduct, PhysicalInstance) are used in data analysis; the analysis produces new data together with DDI 3.0 metadata capturing recodes, case selection, variable selection, comparison to the original study, and the resulting physical file descriptions.]
Metadata Mining for Comparison, etc.
• This use case concerns how collections of DDI 3.0
metadata can act as a resource to be explored, providing
further insight into the comparability and other features
of a collection of data.
http://www.opendatafoundation.org
[email protected]
Types of Metadata:
• Universe (comparative module)
• Concept (comparative module)
• Question (datacollection module)
• Variable (logicalproduct module)

[Diagram: DDI 3.0 instances held in metadata repositories/registries, together with the data sets, can be mined across questions, variables, concepts, and universes to produce DDI 3.0 comparison metadata (questions, categories, codes, variables, universe, concepts) as well as recodes and harmonizations.]
Generating Instruction Packages/Presentations
• This use case concerns how DDI 3.0 can
support automation around the instruction
of students and others.
http://www.opendatafoundation.org
[email protected]
Types of Metadata:
• Individual studies (studyunit module)
• Grouping purpose (group module)
• Linking information (comparative module)
• Processing assistance (group module)

• Topically related studies are selected
• A group is made with a description of the intended use for the group
• Comparative information is added indicating matching fields for linking and mapping between similar variables
• Other materials such as SAS/SPSS recode commands are referenced from the group

[Diagram: individual DDI 3.0 study units (StudyUnit 1–4) are combined into a DDI 3.0 group, which is then enriched with Comparative and OtherMaterials metadata to form an instructional package.]
DDI 3.0 Tools
• Under development
• DDI Foundation Tools Program
– Road Map
– XML Beans, validation,
– DDI DExT, DDI2StatsProgs
• Other tools
– R SPSS Export, Algenta SurveyViz, others presented
at IASSIST
• DDI Editing Suite
– Proposed as extension of DDI-FTP
– Plan for generic editor in 6-9 months
• DDI 3.0 related projects / initiatives
– RDC Canada, Germany RDC / EURASI, DANS
MIXED, NORC
http://www.opendatafoundation.org
[email protected]
DDI 3 Relationship to Other Standards
• SDMX (from microdata to indicators / time series)
– Complete mapping to and from DDI NCubes
• Dublin Core (surveys and documents get cited)
– Mapping of citation elements
– Option for DC namespace basic entry
• ISO 19115 – Geography (microdata gets mapped)
– Search requirements
– Support for GIS users
• METS
– Designed to support profile development
• OAIS (alignment of archiving standards)
– Reference model for the archival lifecycle
• ISO/IEC 11179 (metadata mining through concepts)
– Variable linking representation to concept and universe
– Optional data element construct in ConceptualComponent that
allows for complete ISO/IEC 11179 structure as a maintained
item
http://www.opendatafoundation.org
[email protected]
DDI 3.0 perspective
[Diagram: the DDI 3.0 perspective — media/press, general public, academic users, producers, policy makers, government, archivists, sponsors, and business — all served across the survey life cycle.]
DDI 2.0 and DDI 3.0
DDI 2 / DDI 3:
• Single survey / Multiple surveys
• Focus on the archive / Focus on the life cycle
• Non-reusable metadata / Highly reusable metadata
• Maintained by a single agency / Maintained by many agencies
• Loose validation (DTD based, sparse documentation) / Tight validation (schema based, extensive guide)
• Designed by archivists / Designed by expert groups
• Some tools are available / Tools are beginning to emerge
What 3.0 can do for you
• Manage multi-surveys
• Support multiple contributors
• Support many different perspectives
• Support many different use cases
• Maintain metadata integrity across the life cycle
• Connect to other metadata spaces
• Metadata reuse
• Publication in registries
• Backward compatibility with 2.0
http://www.opendatafoundation.org
[email protected]
DDI Community
DDI Organizations/ Agencies
• DDI Alliance (http://www.ddialliance.org)
• Interuniversity Consortium for Political and Social
Research (ICPSR) (http://icpsr.umich.edu)
• International Association for Social Science Information Service & Technology (IASSIST)
(http://www.iassistdata.org)
• International Household Survey Network (IHSN)
(http://www.surveynetwork.org)
• Open Data Foundation (ODaF)
(http://www.opendatafoundation.org)
• National Opinion Research Center Data Enclave
(NORC) (http://dataenclave.norc.org)
• Metadata Technology
(http://www.metadatatechnology.com)
http://www.opendatafoundation.org
[email protected]
IZA Data Service Center
DDI/SDMX Workshop
Wiesbaden, Germany, June 18th 2008
The Statistical Data and Metadata
Exchange Standard (SDMX):
An Introduction
Arofan Gregory / Pascal Heus
[email protected] / [email protected]
Open Data Foundation
Overview of the Session
• SDMX Background and Goals
• SDMX and Data
• SDMX and Metadata
• SDMX and Best Practices: The Content-Oriented Guidelines
• The SDMX Information Model
• SDMX and Web Services
– The SDMX Registry
– SDMX Data Services
• Tools and Resources
http://www.opendatafoundation.org
[email protected]
SDMX Background and Goals
What is SDMX?
• The problem space:
– Statistical collection, processing, and exchange is time-consuming and resource-intensive
– Focus on aggregate data (esp. time series)
– Various international and national
organisations have individual approaches for
their constituencies
– Uncertainties about how to proceed with new
technologies (XML, web services …)
http://www.opendatafoundation.org
[email protected]
What is SDMX?
The Statistical Data and Metadata
Exchange (SDMX) initiative is taking steps
to address these challenges and
opportunities that have just been
mentioned:
– By focusing on business practices in the field
of statistical information
– By identifying more efficient processes for
exchange and sharing of data and metadata
using modern technology and open standards
http://www.opendatafoundation.org
[email protected]
Who is SDMX?
• SDMX is an initiative made up of seven
international organizations:
– Bank for International Settlements
– European Central Bank
– Eurostat
– International Monetary Fund
– Organisation for Economic Co-operation and Development
– United Nations
– World Bank
• The initiative was launched in 2002
http://www.opendatafoundation.org
[email protected]
[Diagram: individual households, banks and corporates, and national statistical organisations report transactions, micro-data, accounts, and statistics; regional and international organisations covering 180+ countries compile accounts and statistics and publish them on their websites (www.x.org, www.y.org, www.z.org), discoverable over the Internet through a common hub (www.hub.org).]
SDMX Products
• Technical standards for the formatting and
exchange of aggregate statistics:
– SDMX Technical Specifications version 1.0
(now ISO/TS 17369 SDMX – TC 154 WG2)
– SDMX Technical Specifications version 2.0
(soon to be submitted to ISO – TC 154 WG2)
• Content-Oriented Guidelines (in draft)
– Common Metadata Vocabulary
– Cross-Domain Statistical Concepts
– Statistical Subject-Matter Domains
http://www.opendatafoundation.org
[email protected]
Major Features of SDMX
• Structure and formats (XML, EDIFACT) for
aggregate data
• Structure and formats (XML) for metadata
• Formal information model (UML) for
managing statistical exchange and
sourcing
• Web-services guidelines and registry
services specification for use of modern
technologies
• Content-oriented guidelines to recommend
best practices
http://www.opendatafoundation.org
[email protected]
Recent Events
• Jan 2007 – Launch meeting at the World
Bank for SDMX 2.0 Technical
Specifications
• February 2007 – Endorsement of SDMX
by EU’s Statistical Programme Committee
• March 2008 – SDMX becomes the
preferred standard for data and metadata
of the UN Statistical Commission
– Other standards were mentioned – DDI and
XBRL specifically
http://www.opendatafoundation.org
[email protected]
Adopters/Interest
•
The following are known adopters (or planning to adopt):
– US Federal Reserve Board and Bank of New York
– European Central Bank
– Joint External Debt Hub (WB, IMF, OECD, BIS)
– UN/TRADECOM at UN Statistical Division
– NAAWE (National Accounts from OECD/Eurostat)
– SODI (Eurostat and European governments)
– Mexican Federal System
– Vietnamese Ministry of Planning and Investment
– Qatar Information Exchange
– IMF (BOP, SNA, SDDS/GDDS)
– Food and Agriculture Organization
– Millennium Development Goals (UN System, others)
– International Labour Organization
– Bank for International Settlements
– OECD
– World Bank
– Marchioness Islands (Spanish/Portuguese statistical region)
– UNESCO (Education)
– Australian Bureau of Statistics
– Statistics Canada
– There are many others not listed or of which we are not aware
http://www.opendatafoundation.org
[email protected]
Rate of Adoption
• Between January 2007 and January 2008,
adoption has doubled
• We anticipate a similar rate of growth for
the coming year
– Tools are becoming available
– UNSC recommendation makes it a safe
course to follow for risk-averse institutions
– Training courses are in increasing demand
(Eurostat, Metadata Technology)
– Standard data and metadata structures for
many domains are being developed
http://www.opendatafoundation.org
[email protected]
SDMX and Data
SDMX and Data Formats
• SDMX provides a format for describing the structure of
data (“structural metadata”)
– EDIFACT (was GESMES/TS, now SDMX-EDI)
– XML (SDMX-ML)
• SDMX provides formats for transmission and processing
of data
– EDIFACT (1 message)
– XML (4 different equivalent flavors for different functions)
• Data is tabulated, aggregate data (e.g., multi-dimensional/OLAP cubes)
– Can be any aggregate data!
• Most data formats are derived from the structural metadata (e.g., XML schemas are generated for each type of structure according to the business rules)
http://www.opendatafoundation.org
[email protected]
Data Set: Structure
http://www.opendatafoundation.org
[email protected]
First: Identify the Concepts
• A statistical concept is a characteristic of a time
series or an observation (MCV)
• A concept is a unit of knowledge created by a
unique combination of characteristics (SDMX
Information Model)
• Whatever the definition, statistical concepts are
the DNA of the key family
– Their usage (type, structure, sequence) defines the structure of the data
http://www.opendatafoundation.org
[email protected]
Data Set Structure:Concepts
Concepts in the example: Topic, Country, Stock/Flow, Unit, Unit Multiplier, Time/Frequency.

Computers need the structure of the data:
• Concepts
• Code lists
• Data values
• How these fit together
http://www.opendatafoundation.org
[email protected]
Data Set Structure: Code Lists
Code lists:
• TOPIC: A = Brady Bonds, B = Bank Loans, C = Debt Securities
• COUNTRY: AR = Argentina, MX = Mexico, ZA = South Africa
• STOCK/FLOW: 1 = Stock, 2 = Flow

Concepts: Topic, Country, Stock/Flow
http://www.opendatafoundation.org
[email protected]
Data Makes Sense
Q,ZA,B,1,1999-06-30=16547

Quarterly, South Africa, Bank Loans, Stocks, for 30 June 1999: 16547
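A minimal sketch of what "data makes sense" means in practice: decoding the key above against the code lists from the previous slide. The frequency code list (Q/A/M) is an assumption added for the example:

# Code lists taken from the example slides; FREQUENCY codes are assumed.
TOPIC = {"A": "Brady Bonds", "B": "Bank Loans", "C": "Debt Securities"}
COUNTRY = {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"}
STOCK_FLOW = {"1": "Stock", "2": "Flow"}
FREQUENCY = {"Q": "Quarterly", "A": "Annual", "M": "Monthly"}

def decode(observation):
    """Turn a compact key=value line into a human-readable sentence."""
    key, value = observation.split("=")
    freq, country, topic, stock_flow, period = key.split(",")
    return (f"{FREQUENCY[freq]}, {COUNTRY[country]}, {TOPIC[topic]}, "
            f"{STOCK_FLOW[stock_flow]}s, for {period}: {value}")

print(decode("Q,ZA,B,1,1999-06-30=16547"))
# Quarterly, South Africa, Bank Loans, Stocks, for 1999-06-30: 16547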
http://www.opendatafoundation.org
[email protected]
Data Set Structure: Defining MultiDimensional Structures
• Comprises
– Dimensions: concepts that identify the observation value
– Attributes: concepts that add additional metadata about the observation value
– Measure: the concept that is the observation value
• Any of these may have a representation that is coded, text, date/time, number, etc.
Data Set Structure: Concept Usage
• Topic (Dimension)
• Country (Dimension)
• Stock/Flow (Dimension)
• Time/Frequency (Dimension)
• Unit (Attribute)
• Unit Multiplier (Attribute)
• Observation (Measure)
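A minimal sketch of the structure just described: the concepts above arranged as dimensions, attributes, and a measure, with a check that a full key is supplied for each observation. This models the idea of a data structure definition, not the actual SDMX-ML format; the UNIT/UNIT_MULT values are assumptions:

# Concept roles taken from the slide above; OBS_VALUE names the measure here.
DIMENSIONS = ["TOPIC", "COUNTRY", "STOCK_FLOW", "TIME"]   # identify the observation
ATTRIBUTES = ["UNIT", "UNIT_MULT"]                        # extra metadata about it
MEASURE = "OBS_VALUE"                                     # the observation itself

def make_observation(dims, value, attrs=None):
    """Build one observation, checking that every dimension is supplied."""
    missing = [d for d in DIMENSIONS if d not in dims]
    if missing:
        raise ValueError("missing dimension(s): %s" % missing)
    return {"key": tuple(dims[d] for d in DIMENSIONS),
            MEASURE: value,
            "attributes": attrs or {}}

obs = make_observation(
    {"TOPIC": "B", "COUNTRY": "ZA", "STOCK_FLOW": "1", "TIME": "1999-06-30"},
    16547,
    {"UNIT": "USD", "UNIT_MULT": "6"},   # attribute values here are assumptions
)
print(obs["key"], obs[MEASURE])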
http://www.opendatafoundation.org
[email protected]
SDMX and Metadata
SDMX and Metadata
• SDMX provides for several types of
metadata
– Structural (describes structures of data sets
and metadata sets and related items)
– Provisioning (describes the sourcing of data
between departments and organizations)
– “Reference” metadata – all other types of
metadata (footnotes, methodology, quality,
etc. Can be specified by the user!)
• Reference metadata is the most important type – it is what we typically think of as metadata
http://www.opendatafoundation.org
[email protected]
SDMX Metadata Sets
• Version 2.0 of the SDMX Technical
Specifications provides XML formats for
metadata sets (SDMX-ML)
– To describe their structure
– To exchange metadata in XML
• This is based on concepts (similar to the data
formats)
– SDMX supports any metadata concepts the user wishes to report/exchange/process
– May be flat lists or hierarchical
– Definitions provided by users, but recommendations
exist for many common concepts
• Metadata sets are attached to a formal object in
the information model (an organization, a data
set, a codelist, etc.)
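A minimal sketch of the idea of a metadata set: user-chosen concepts with values, attached to an identifiable object in the model. The object and concept names (JEDH_DEBT, METHODOLOGY, ...) are hypothetical, and the dictionary merely stands in for the SDMX-ML metadata set format:

# Illustrative only: models a metadata set (concept -> value) attached to an
# identifiable object; not the SDMX-ML representation itself.
metadata_report = {
    "target": {"object_type": "DataFlow", "id": "JEDH_DEBT", "agency": "EXAMPLE"},
    "report": {
        "METHODOLOGY": "Creditor-reported external debt statistics ...",
        "DATA_QUALITY": "Quarterly revisions; see source notes.",
        "CONTACT": "Statistics office, external debt unit",
    },
}

for concept, value in metadata_report["report"].items():
    print(f"{concept}: {value}")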
http://www.opendatafoundation.org
[email protected]
SDMX and Metadata
• This is a very powerful feature of SDMX
– It can be used to integrate/mimic other
metadata standards!
• Provides very good support for standard
exchange of metadata which cannot be
anticipated by the designers of
systems/standards
– Must be based on common agreements about
the meaning of metadata concepts
– Often, concepts are taken from other
metadata models/standards such as DDI,
Dublin Core, etc.
http://www.opendatafoundation.org
[email protected]
The SDMX Information Model
The SDMX Information Model
• A formal, documented conceptual model of
statistical exchange, management, and sourcing
• Expressed as a UML model
• Used as the basis of all SDMX implementation
– XML
– EDIFACT
– Any other programming language/platform
• Provides consistency between implementations
• Based on analysis of many statistical processing
systems
– Describes existing business practices in a generic
way
http://www.opendatafoundation.org
[email protected]
Information Model: High-Level Schematic
[Diagram: high-level schematic of the SDMX Information Model.]
• A Data or Metadata Set conforms to the business rules of a Data or Metadata Flow; structure and code list maps relate structures to one another.
• A Data or Metadata Flow uses a specific Data or Metadata Structure Definition and can be linked to categories in multiple Category Schemes.
• A Category Scheme comprises subject or reporting Categories; a Category can have child categories.
• A Data Provider publishes/reports data/metadata sets and can provide data/metadata for many data/metadata flows using the agreed data/metadata structure.
• A Provision Agreement ties a data provider to a flow; a flow can get data/metadata from multiple data/metadata providers.
• A Registration of a Data or Metadata Set registers the existence of the data and metadata (URL, registration date, etc.).
SDMX and Best Practices: The
Content-Oriented Guidelines
SDMX Content-Oriented Guidelines
• There is a long history of discussion about what
is best practice in the collection of statistics
• SDMX decided to define the technical basis for
statistical exchange, and then engage in this
debate
– It makes reaching agreements between organizations
easier!
• These documents build on many years of work
defining statistical concepts, terms, and
classifications
• Although described as “statistical”, much of what
is here also applies to social science (and other)
research
http://www.opendatafoundation.org
[email protected]
SDMX Content-Oriented Guidelines
• Four main documents:
– Overview
– Metadata Common Vocabulary (annex)
– Cross-Domain Concepts (2 annexes)
– Statistical Subject-Matter Domains (annex)
• These will not become ISO specifications,
but will evolve as publications of the
SDMX Initiative
• They are now available in their first official
release at www.sdmx.org
http://www.opendatafoundation.org
[email protected]
Common Metadata Vocabulary
• A set of terms and definitions for the
different parts of the SDMX technical
standards, and many common concepts
used in data and metadata structures
• Does not replace other major vocabularies
in this space (such as the OECD glossary)
but references these other works
http://www.opendatafoundation.org
[email protected]
Cross-Domain Concepts
• Includes concepts which are common
across many statistical domains
– Names & Definitions
– Representations
• Approximately 130 concepts, some with
recommended representations (codelists)
• These are concepts which support both
data and metadata structures
– Emphasis on quality frameworks for reference
metadata concepts
http://www.opendatafoundation.org
[email protected]
Statistical Subject-Matter Domains
• Based on the UN/ECE classification of
statistical activities
• Provides a classification system for use in
exchanging statistics across domain
boundaries
• Provides a breakdown of the various
domains within official statistics
http://www.opendatafoundation.org
[email protected]
SDMX and Web Services
Web-Services Components of SDMX
• Web-Services Guidelines
– Part of the Technical Specifications package
• SDMX Query message
– Part of SDMX-ML
• SDMX Registry Services
– Part of version 2.0 Technical Specifications
– Interfaces are in SDMX-ML
– Document describes implementation rules
http://www.opendatafoundation.org
[email protected]
Web Services Guidelines
• Recommends use of WSS 1.1 for web
services which use SOAP, WSDL
• Provides standard function names for
many typical web-services functions
– Querying for data
– Querying for metadata
– Querying for structural information
http://www.opendatafoundation.org
[email protected]
SDMX Query Message
• An XML query message to support two-way web-services calls using XML messages
• Designed to support:
– Queries for structural information from online
databases/repositories
– Queries for data from online databases
– Queries for metadata from online databases
• Part of SDMX-ML
– Very similar to the SQL query language
supported by all database packages
– Specific to SDMX objects
http://www.opendatafoundation.org
[email protected]
SDMX Registry Services
• A “registry” is a common type of technology
– Every Windows machine has a “Windows registry” to
let applications know what other applications are on
that machine, and where they are located
– Web services registries do the same thing on a
network
– Functions like a card catalogue in a print library – you
can look up resources and find out how to obtain
them
• A registry provides a single place on the Internet
where everyone can discover the data,
metadata, and structures that other
organizations use/publish
– They do not contain the data and metadata – it just
indexes it and links to it
http://www.opendatafoundation.org
[email protected]
SDMX Registry Services (cont.)
• SDMX Registry Services are based on
generic, standard web-services registry
technology
– ISO 15000 ebXML Registry/Repository
– OASIS UDDI Registry (part of .NET, etc.)
• SDMX Registry Services are not generic
– They are specific to SDMX exchanges of data
and metadata, etc.
• There is not one central “SDMX Registry”
– Each domain will have its own registry for its
members
– The registries can be linked (“federated”)
http://www.opendatafoundation.org
[email protected]
SDMX Registry/Repository
[Diagram: the SDMX Registry/Repository behind the SDMX Registry Interfaces. The registry indexes data and metadata: data sets and metadata sets are registered and can be queried. A provisioning-metadata repository describes data and metadata sources and reporting processes, and a structural-metadata repository describes data and metadata structures; both support submit and query operations.]
SDMX Registry/Repository
[Diagram: the same registry/repository architecture as above, with the addition of subscription/notification — applications can subscribe to be notified of new or changed objects.]
The Old JEDH Site
[Diagram: BIS, the IMF, the OECD, and the World Bank each supply data in various formats to the JEDH website, on a 3-month production cycle.]
JEDH with SDMX
[Diagram: an SDMX "Agent" discovers data and URLs through an SDMX Registry and retrieves SDMX-ML data from the BIS, IMF, OECD, and World Bank sites; the data is provided in real time to the JEDH site and loaded into the JEDH (debtor) database.]
Recent and On-Going Developments
• Many organizations using SDMX have
been implementing web services
• There is growing interest in forming a
working group to further extend the
specification for use with web-services
technology
– Standard error messages
– Expanded function calls
– Standard WSDLs
• If you are interested in this, please tell me!
http://www.opendatafoundation.org
[email protected]
Tools and Resources
SDMX Tools
• There are now several sources for SDMX tools
– All are free or open-source
• Eurostat – complete package of tools for data, metadata,
and registry services
• Metadata Technology Ltd – similar package of tools
• Data editors are usually based on Excel
• Some other tools
– Open Data Foundation “SDMX Browser” for data visualization
– OECD, ECB, and UN/Statistical Division provide some other
tools for specific applications
– Integration with PC-Axis has been prototyped, to be available
this summer
– DevInfo has SDMX support
– FAME is developing SDMX support
• Commercial vendors provide good support through web-services functionality
– E.g., Oracle 11, .NET, etc.
http://www.opendatafoundation.org
[email protected]
Resources
• The SDMX Initiative Site:
http://www.sdmx.org
• The SDMX Toolkit and Forums:
http://www.metadatatechnology.com
• Various papers and (soon) open-source
tools:
http://www.opendatafoundation.org
IZA Data Service Center
DDI/SDMX Workshop
Wiesbaden, Germany, June 18th 2008
SDMX, DDI, and Other Standards
Arofan Gregory / Pascal Heus
[email protected] / [email protected]
Open Data Foundation
Overview of the Session
• DDI/SDMX: Philosophy and Timing of
Standards Development
• DDI/SDMX: Points of Functional Overlap
• DDI/SDMX: Direct Mappings
• DDI/SDMX: Integration Approaches
• Other Related Standards and On-Going
Work
http://www.opendatafoundation.org
[email protected]
DDI/SDMX: Philosophy and Timing
of Development
Development Philosophies/Timing
• Unlike many standards bodies, both the SDMX
Initiative and the DDI Alliance have attempted to
create standards which do not duplicate existing
efforts
– There is an awareness that users need to deal with
several different standards
– DDI (3.0) and SDMX were both intentionally aligned
with other, related standards
• DDI 1.*/2.* existed before SDMX
– It was largely self-contained
• SDMX was created before DDI 3.0 existed
– Created with an awareness of DDI 1.*/2.*
• DDI 3.0 benefited from having SDMX as a
published specification
– Actively aligned with SDMX and many other standards
http://www.opendatafoundation.org
[email protected]
SDMX Design
• SDMX was intentionally designed to
accommodate integration of standards
which are used with the inputs to
aggregate data
– This included DDI and XBRL
– Mechanism for integration is generic
• The key point for this integration is the
SDMX Registry
– It provides links between aggregate (SDMX)
data sets, and also to source data and
metadata
http://www.opendatafoundation.org
[email protected]
DDI/SDMX: Points of Functional
Overlap
SDMX and DDI as Complementary
• DDI is designed to document micro-data
– 1.*/2.* versions were archival, after-the-fact
documentation
– The 3.0 version covers the entire life cycle, but still has an after-the-fact function
• SDMX is designed as a standard for processing
and automation
– It is not documentary, but is aimed at automation of
statistical systems and exchanges
• These purposes are related, but not duplicative
– SDMX and DDI can both do useful things within a
single system
http://www.opendatafoundation.org
[email protected]
Examples
• DDI could be used to document SDMX-based
aggregates more completely for archival
purposes
• DDI could be used to document the micro-data
on which aggregates are based
– As soon as tabulation occurs, SDMX can be used to
describe and format the data
– SDMX can describe micro-data, but it is not very
useful
– DDI can be used to automate processing of multidimensional data cubes, but it is more difficult than
with SDMX
• SDMX can be used to link DDI instances with
other types of standard data and metadata
(including both SDMX and DDI)
http://www.opendatafoundation.org
[email protected]
DDI and SDMX
SDMX / DDI:
• Aggregated data / Microdata
• Indicators, time series / Low-level observations
• Across time / Single time period
• Across geography / Single geography
• Open access / Controlled access
• Easy to use / Expert audience

• Microdata is an important source of aggregated data
• Crucial overlap and mappings exist between both worlds (but they are commonly undocumented)
• Interoperability provides users with a full picture of the production process
http://www.opendatafoundation.org
[email protected]
Generic Process Example
[Diagram: a raw data set is anonymized, cleaned, and recoded into a micro-data set / public use files — the DDI side; aggregation and harmonization then produce lower-level and higher-level aggregate data sets and indicators — the SDMX side.]
DDI + SDMX?
• When you have data which has been
tabulated/aggregated, it may be useful to
have both SDMX and DDI
– SDMX for processing and exchanging the
data
– DDI for documenting these processes, in case
they are of interest to researchers
• DDI has a much richer descriptive
capability for addressing the exact
processes used in statistical packages
• SDMX is easier to process
http://www.opendatafoundation.org
[email protected]
DDI/SDMX: Direct Mappings
Direct Mappings: DDI & SDMX
• IDs and referencing use the same approach
(identifiable – versionable - maintainable;
structured URN syntax)
• Both are organized around schemes
– Reusable packages of data, similar to relational
tables in databases
• Both describe multi-dimensional data
– A “clean” cube in DDI maps directly to/from SDMX
• Both have concepts and codelists
– DDI has much less emphasis on concepts
– SDMX emphasizes concepts because they are
needed for comparison
• Both contain mappings (“comparison”) for codes
and concepts
http://www.opendatafoundation.org
[email protected]
Formal Mapping
• There is on-going work to describe a
formal mapping between SDMX and DDI
– It will cover these direct correspondences
– They are quite obvious: a code maps to a
code; a concept to a concept; etc.
– There are currently no tools, because generic
tools such as XSLT will work for this
transformation
– Drafts of this work are expected this summer,
as part of the SDMX submission to ISO for the
version 2.0 Technical Specifications
• The direct mappings are the easy part!
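A minimal sketch of the "a code maps to a code" correspondence: a DDI-style category scheme re-expressed as an SDMX-style codelist. The dictionaries stand in for the real XML documents, which in practice would be transformed with generic tools such as XSLT; the agency ID and naming convention are assumptions:

# Stand-ins for the XML formats: a DDI-style category scheme (value -> label)
# mapped one-for-one onto an SDMX-style codelist.
ddi_category_scheme = {
    "id": "CS_COUNTRY",
    "categories": {"AR": "Argentina", "MX": "Mexico", "ZA": "South Africa"},
}

def to_sdmx_codelist(category_scheme, agency="EXAMPLE_AGENCY", version="1.0"):
    """Map DDI categories directly onto SDMX-style codes."""
    return {
        "agencyID": agency,                                        # hypothetical agency
        "id": category_scheme["id"].replace("CS_", "CL_", 1),      # naming convention assumed
        "version": version,
        "codes": [{"value": v, "description": d}
                  for v, d in category_scheme["categories"].items()],
    }

print(to_sdmx_codelist(ddi_category_scheme))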
http://www.opendatafoundation.org
[email protected]
Issues with Direct Mapping
• It is possible to describe everything in the DDI as an SDMX
Metadata Set
– This is probably not the best way to use SDMX with DDI!
– It is usually better to select the important fields, and keep the rest in
native DDI format
• When you map from DDI to SDMX, you typically will not
carry much of the descriptive metadata, question text, etc.
– Mostly structural (codelists, dimensions, attributes, concepts)
– You must have concepts for SDMX which are not always present in
DDI
• Going from SDMX to DDI, it is not always possible to map
all the data
– Especially for SDMX Metadata Sets, which may have user-configured concepts that don't always exist in DDI
• Note that SDMX-DDI mappings refer to all versions of DDI
http://www.opendatafoundation.org
[email protected]
DDI/SDMX: Integration Approaches
Integration Use Cases
• The most important aspect of DDI – SDMX
integration is understanding what the use cases
are
– This defines what mapping/transformation is needed
– It also defines what links need to be stored between
data and metadata files
• There are some common use cases
– DDI used to describe and link microdata inputs to
SDMX aggregates
– DDI used to more fully document SDMX aggregates
for dissemination to users
– Using the SDMX Registry as a lifecycle management
tool for DDI, SDMX, etc.
http://www.opendatafoundation.org
[email protected]
Linking Source Data and Aggregates
• DDI provides a wealth of information about the micro-data
which serves as an input to SDMX aggregates
– It is possible to capture these links in SDMX, at the cell level or
higher, to provide automated access to source data
– An SDMX registry can be used to provide easy access to these links
– The user/collector of aggregate data can access the rich DDI
metadata, and possibly the data (if they have access rights)
• It is possible to automatically generate SDMX output from
the DDI metadata describing tabulation of micro-data
– This may not be useful if the desired SDMX target is a standard cube
structure described by another organization
– It may make transformation to the standard cube easier, however
• The SDMX Registry provides a good tool for managing links
– Links between SDMX and DDI files are stored as Metadata Reports
http://www.opendatafoundation.org
[email protected]
Demo: SDMX – DDI Source Links
http://www.opendatafoundation.org
[email protected]
DDI + SDMX for Dissemination
• Typically, the full DDI documentation is not
provided on web-sites which publish
aggregates/indicators
• SDMX is becoming a popular dissemination
format for these data
– It has been shown to increase the use of data on the
Web
• If the DDI documentation is available, this could
also be delivered as additional documentation
– Especially useful at study level
– Links could be directly embedded in SDMX data files
as attributes or stored in an SDMX Registry, or both
http://www.opendatafoundation.org
[email protected]
The SDMX Registry for Lifecycle Management
• The SDMX Registry provides a tool for tracking
the sources of data for aggregates
• It can also track the transformation of versions of
DDI as the data moves through the lifecycle
• There is an SDMX model for processes
– This can be used to describe the DDI lifecycle model
– SDMX Metadata Reports can be used to link DDI
metadata to specific stages of the DDI lifecycle, and
to each other
• Applications could query the SDMX Registry to
discover all of the DDI metadata produced
upstream, as micro-data is collected and
processed
http://www.opendatafoundation.org
[email protected]
Demos
• SDMX Metadata Report used to express
DDI metadata
• SDMX Metadata Report used to link DDI
instances
http://www.opendatafoundation.org
[email protected]
Other Related Standards and On-Going Work
Many Related Standards
• DDI
• SDMX
• ISO/IEC 11179 – concept management and
semantic modelling
• ISO 19115 – Geographical metadata
• METS – packaging/archiving of digital objects
• PREMIS – Archival lifecycle metadata
• XBRL – business reporting
• Dublin Core – citation metadata
• Standard mappings are being defined by people
from many different organizations (see
presentation from METIS 2008 in Luxembourg)
http://www.opendatafoundation.org
[email protected]
ISO/IEC 11179
• ISO/IEC 11179 is used to describe the meanings
and representations of terms and concepts
• Both SDMX and DDI are aligned with ISO/IEC
11179
– SDMX and DDI concepts can be defined using the
ISO/IEC 11179 attributes
– Codelists and categories can be directly mapped (and
other representations)
– ISO/IEC 11179 can be implemented with DDI (directly,
for concepts) and/or with SDMX (as a Metadata
Report)
– ISO/IEC 11179 has no standard expression in XML –
it is just a model
http://www.opendatafoundation.org
[email protected]
ISO 19115 Geographical Metadata
• ISO 19115 describes geographies
(bounding boxes for countries, etc.)
• DDI uses the ISO 19115 model in its own
XML
– It does not use the standard ISO 19115 XML
format, but there is a 1-to-1 mapping
• SDMX could model ISO 19115 if desired
– Linking to DDI or ISO 19115 XML is probably
more useful, using the standard SDMX
mechanism
– Most geographies in SDMX aggregate data
sets are coded, not directly described
http://www.opendatafoundation.org
[email protected]
METS
• METS is used to package a set of files
which work together as a digital object
• Both DDI and SDMX metadata could be
placed inside a METS wrapper
– They would be “metadata sections”
– The primary use case would be for archiving
of a set of related data and metadata files,
possibly with other related materials such as
research publications
http://www.opendatafoundation.org
[email protected]
PREMIS
• PREMIS allows for the capture of
administrative metadata as a collection is
placed and managed within the archive
• DDI and SDMX files would be treated like
any other files forming part of the
collection
– Both may contain metadata which can be
extracted and used to populate PREMIS
instances (access levels, confidentiality, etc.)
http://www.opendatafoundation.org
[email protected]
XBRL
• XBRL is used by business to report
required information to national
supervisory bodies
– This includes banking supervision and other
economic data
– XBRL is a source format for some aggregate
statistics
• XBRL International and the SDMX
Sponsors are working together to define a
cross-walk between the two standards
http://www.opendatafoundation.org
[email protected]
Dublin Core
• Dublin Core is used to capture citation-type metadata for resources on the Internet and elsewhere
– It is widely used in digital repositories for
research papers
• DDI has the basic Dublin Core XML format
as an integral part of the DDI 3.0
specification
• Dublin Core can be easily mimicked as an
SDMX Metadata Report [Demo]
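A minimal sketch of citation-level mapping into Dublin Core: a citation record re-keyed to DC element names. The DDI-side field names are informal shorthand, not exact element paths, and the sample citation is hypothetical:

# Shorthand citation fields mapped to Dublin Core elements.
DDI_TO_DC = {
    "title":     "dc:title",
    "author":    "dc:creator",
    "producer":  "dc:publisher",
    "prod_date": "dc:date",
    "abstract":  "dc:description",
}

def to_dublin_core(citation):
    """Re-key a citation record using Dublin Core element names."""
    return {DDI_TO_DC[k]: v for k, v in citation.items() if k in DDI_TO_DC}

citation = {"title": "Household Survey 2007",        # hypothetical citation
            "author": "National Statistics Office",
            "prod_date": "2008-05-01"}
print(to_dublin_core(citation))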
http://www.opendatafoundation.org
[email protected]
High-Level Vision – Standards Mappings
[Diagram: federated registries (based on SDMX, ebXML, and web services) organize registered aggregated data/metadata (SDMX), with references to the source data: DDI microdata sets and XBRL business reports, packaged and preserved with METS/PREMIS. ISO 11179 semantic definitions and standard classifications are used across the standards; Dublin Core provides citations and ISO 19115 provides geographies.]