SDMX Standards Relationships to ISO/IEC 11179/CMR Arofan Gregory Chris Nelson Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva 3-5 April 2006

SDMX Standards
Relationships to ISO/IEC 11179/CMR
Arofan Gregory
Chris Nelson
Joint UNECE/Eurostat/OECD workshop on statistical
metadata (METIS): Geneva 3-5 April 2006
Major Points
• Activity and Results
• Scope
• Areas of Commonality
• Applying ISO 11179 to SDMX
• CMR and SDMX Registries
• Use in Applications: A Case Study
Activity
• We were asked to describe the
relationship between ISO 11179/CMR and
SDMX
• The two do not overlap completely
– We focused on the points of contact between
them
• This was an interesting project, which we
will continue working on
Results
• ISO 11179/CMR and SDMX:
– ISO 11179 provides a useful pivotal model for
mapping between CMR (or others) and SDMX
constructs
– ISO 11179 does not model the semantics of a
metamodel – it is useful for the semantics of a
model (a specific metadata structure) or even of
instances of the model (a specific data set)
– The utility of this mapping depends on the
application
• The SDMX and ISO 11179 registries are
complementary in function (semantics + access)
Characterization
• ISO 11179 models the semantics of data
elements
• CMR extends this to cover survey lifecycle
metadata – adds structure
• SDMX models the structure of metadata
and data for aggregate statistical data –
you supply your own semantics (concepts)
• This is not a comparison of like things: it is
a connecting of complementary ones
Scope – ISO 11179
• ISO 11179 is a metadata content standard focused
primarily on the semantics of data. Secondarily, it
provides the rules and structures for registering
descriptions of data.
• The standard has six parts, and the 2nd edition is now
available:
– Part 1: Framework for the specification and standardization of data elements
– Part 2: Classification for data elements
– Part 3: Registry metamodel and basic attributes
– Part 4: Rules and guidelines for the formulation of data definitions
– Part 5: Naming and identification principles for data elements
– Part 6: Registration of data elements
• This presentation is concerned mainly with Parts 3 and 5
SCOPE: ISO 11179 Part 3
High Level View - Data Elements and Concepts
[Figure: example labels Country Identifier, Country Name, Country Code, Countries of the World, ISO 3166: Country Code List]
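The figure's labels fit the usual ISO 11179 pattern: one data element concept (Country Identifier) whose conceptual domain is Countries of the World, and several data elements (Country Name, Country Code) that pair that concept with different value domains such as the ISO 3166 code list. The Python sketch below assumes that reading of the figure. The class names are simplified, hypothetical stand-ins for the Part 3 metamodel classes, and the "Country name list" value domain is invented for illustration.

```python
from dataclasses import dataclass


# Simplified, hypothetical classes; ISO 11179-3 defines many more attributes.
@dataclass(frozen=True)
class ConceptualDomain:
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    name: str
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain  # the conceptual domain this value domain represents


@dataclass(frozen=True)
class DataElement:
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def is_consistent(self) -> bool:
        # The value domain should represent the same conceptual domain as the concept.
        return self.value_domain.conceptual_domain == self.concept.conceptual_domain


countries = ConceptualDomain("Countries of the World")
country_identifier = DataElementConcept("Country Identifier", countries)

iso_3166 = ValueDomain("ISO 3166: Country Code List", countries)
country_names = ValueDomain("Country name list", countries)  # hypothetical value domain

# Two data elements share one concept but use different value domains.
country_code = DataElement("Country Code", country_identifier, iso_3166)
country_name = DataElement("Country Name", country_identifier, country_names)
```

The point of the sketch is the split of responsibilities: the concept carries the meaning, and the value domain carries the representation, so the same meaning can be recorded as a name or as a code.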
SCOPE – CMR
• The CMR model is designed to support
– metadata necessary to describe the survey
life cycle
– linkages between similar designs and
processes used across surveys
– use of metadata to drive systems in support of
the survey life cycle
• The CMR model extends the ISO 11179
(Part 3) metamodel to support CMR-specific
artefacts
SCOPE - CMR
[Figure: ISO 11179 and CMR. CMR artefacts link to the appropriate ISO 11179 artefact to supply semantics and management.]
SCOPE - SDMX
• SDMX standards comprise
– Information Model
– Syntax implementations based on the
Information model (XML schemas and
UN/EDIFACT)
– Content-oriented guidelines
• Statistical subject matter domain scheme
• Cross Domain metadata concepts
• Metadata vocabulary (MCV – Metadata Common
Vocabulary)
Information Meta Model
[Figure: SDMX Information Meta Model for aggregated data, metadata and structural metadata. Design meta schema for aggregated data and metadata, with the packages SDMX-Base, Code-List, Concept-Scheme, Category-Scheme, Cube-Structure, Key-Family, Metadata-Structure-Definition, Data-Set, Metadata-Set and Provisioning. Steps shown: use cross domain metadata concepts and code lists; create a data/metadata structure instance (data/metadata structure definition); register in the SDMX Registry; generate a structure specific schema; report/publish data/metadata.]
Areas of Commonality
ISO 11179 and SDMX
• Both standards support the definition of
– Concepts
– Use of concepts in structures
– Representations/Value domains
• Both standards support the concept of a registry
– ISO 11179 specifies a registry metamodel
– SDMX does not specify a registry metamodel
– ISO 11179 does not specify registry interfaces based on the
registry model
– SDMX specifies registry interfaces based on the SDMX model
• Implementation of a registry, and its mapping to a registry model, is left
to the implementor
• An implementation can use an ISO 11179 registry, an ebXML registry,
or a bespoke registry
Concepts – ISO 11179
Concepts – SDMX
[Figure: class diagrams comparing concepts in ISO 11179 and SDMX. ISO 11179: Data Element Concept and Conceptual Domain. SDMX: a ConceptScheme contains Concepts (/items); a Concept can have child Concepts (+child), a coreType (Type, with type : DataType) and a coreRepresentation. The SDMX Representation is the ISO 11179 equivalent of the Conceptual Domain but is not given an explicit name in SDMX; for SDMX it defines the core/default representation, and what is implicit here is whether it is enumerated or non-enumerated. A Representation carries 0..* Facets (facetType : FacetType, facetValue : String).]
<<enumeration>> DataType: string, bigInteger, identifiableObjectType, integer, long, short, decimal, float, double, boolean, dateTime, time, date, year, month, day, monthDay, yearMonth, duration, timeSpan, uri, count, observationalTimePeriod, base64Binary
<<enumeration>> FacetType: isSequence : Boolean, isInclusive : Boolean, minLength : Integer, maxLength : Integer, minValue : String, maxValue : String, startValue : String, endValue : String, increment : Double, timeInterval : Duration, decimals : Integer, pattern : String, enumeration : ItemScheme
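To make the enumerated versus non-enumerated distinction concrete, here is a minimal Python sketch of a Representation: a DataType literal plus facets stored as strings (mirroring facetValue : String), with an enumeration standing in for the "enumeration : ItemScheme" facet. The class, the `accepts` check and the example code lists are hypothetical; only a few facets are handled, and minValue/maxValue are treated as inclusive.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class Representation:
    data_type: str  # a DataType literal such as "string" or "integer"
    # FacetType name -> facetValue; values are strings, as in the model
    facets: Dict[str, str] = field(default_factory=dict)
    # Stand-in for the "enumeration : ItemScheme" facet: the permitted codes
    enumeration: Optional[FrozenSet[str]] = None

    def is_enumerated(self) -> bool:
        return self.enumeration is not None

    def accepts(self, value: str) -> bool:
        """Check a value against a small subset of the facets."""
        if self.enumeration is not None:
            return value in self.enumeration
        f = self.facets
        if "minLength" in f and len(value) < int(f["minLength"]):
            return False
        if "maxLength" in f and len(value) > int(f["maxLength"]):
            return False
        if "pattern" in f and re.fullmatch(f["pattern"], value) is None:
            return False
        if self.data_type == "integer":
            try:
                number = int(value)
            except ValueError:
                return False
            # Bounds treated as inclusive in this sketch
            if "minValue" in f and number < int(f["minValue"]):
                return False
            if "maxValue" in f and number > int(f["maxValue"]):
                return False
        return True


# Enumerated (coded) representation, e.g. for a FREQUENCY concept; codes are illustrative
frequency = Representation("string", enumeration=frozenset({"A", "Q", "M"}))
# Non-enumerated representations constrained only by facets
country_alpha3 = Representation("string", {"minLength": "3", "maxLength": "3", "pattern": "[A-Z]+"})
year = Representation("integer", {"minValue": "1900", "maxValue": "2100"})
```

With these definitions, `frequency.accepts("A")` is true and `year.accepts("1800")` is false.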
Data Element: ISO 11179
Concept Usage – SDMX
[Figure: in SDMX, a Concept (from Concept-Scheme) is used by Dimension, Measure and DataAttribute (from SDMX-Base) components through /conceptIdentity; Concept Usage = Data Element. Each component can have a /localRepresentation in place of the Concept's coreRepresentation; Representation = Value Domain. A concept usage can be given an explicit name in SDMX (e.g. the ISO 11179 equivalent name), but this is not required and is not used for identification purposes.]
Concept Usage - SDMX
• There is no explicit “Data Element” name for the usage of a Concept (e.g. the
FREQUENCY Concept used for a Dimension)
– Concept names are used within the context of the role of the Concept
(Dimension, Attribute, Measure); these are structure components
– The unique identifier for a structure component comprises:
<data structure definition.structure component list.concept>
e.g. BOP.KEYDESCRIPTOR.FREQUENCY
[actual identifier includes maintenance agencies]
• This is useful if one wishes to reference the component in, say, a registry
scenario, but it is not normally retained in processing systems
– For instance, in the data structure definition the identifier of the Concept is used,
but the full identifier can be constructed from its context in the structure
– However, the SDMX-ML specification does allow the full identifier [URN] to be
specified as well
• The ISO 11179 equivalent name could be BOP_KEYDESCRIPTOR.FREQUENCY.CODE
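The identifier scheme described above can be sketched in a few lines of Python: the full component identifier is rebuilt from context, and an ISO 11179-style name is derived from the same parts. This is an illustration only. The maintenance agencies and the SDMX-ML URN format are deliberately left out, and the naming convention (object class = structure + component list, property = concept) is an assumption read off the BOP example.

```python
from typing import Dict, List


def component_id(dsd_id: str, component_list_id: str, concept_id: str) -> str:
    """<data structure definition.structure component list.concept>, without agencies."""
    return f"{dsd_id}.{component_list_id}.{concept_id}"


def full_component_ids(dsd_id: str, component_lists: Dict[str, List[str]]) -> List[str]:
    """Rebuild full identifiers from the concept IDs stored in a structure definition."""
    return [
        component_id(dsd_id, list_id, concept_id)
        for list_id, concept_ids in component_lists.items()
        for concept_id in concept_ids
    ]


def iso11179_name(dsd_id: str, component_list_id: str, concept_id: str, representation_term: str) -> str:
    # Assumed convention: object class = structure + component list, property = concept
    object_class = f"{dsd_id}_{component_list_id}"
    return f"{object_class}.{concept_id}.{representation_term.upper()}"
```

For example, `full_component_ids("BOP", {"KEYDESCRIPTOR": ["FREQUENCY"]})` yields `["BOP.KEYDESCRIPTOR.FREQUENCY"]`, and `iso11179_name("BOP", "KEYDESCRIPTOR", "FREQUENCY", "code")` yields `"BOP_KEYDESCRIPTOR.FREQUENCY.CODE"`.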
SDMX: Part of Key Family
[Figure: a DataSet (from Data-Set) relates to a DataflowDefinition, which can be linked to Categories (from Category-Scheme). A KeyFamily has a KeyDescriptor, an {ordered, full-key} list of Dimensions; GroupKeyDescriptors, each an {ordered, partial-key} subset of the Dimensions; a MeasureDescriptor holding Measures (CodedMeasure or UncodedMeasure); and an AttributeDescriptor holding DataAttributes. Dimensions, Measures and DataAttributes each take their identity from a Concept (from Concept-Scheme) and can have a /localRepresentation in place of the Concept's coreRepresentation.]
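A minimal Python sketch of the key family structure follows. It shows how a /localRepresentation overrides a Concept's coreRepresentation, and it checks that each group key is a partial key over the full key. The classes, the dimension IDs and the code list names are hypothetical. The rule that group dimensions must keep the full key's relative order is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    concept_id: str                             # /conceptIdentity
    core_representation: str                    # the Concept's coreRepresentation
    local_representation: Optional[str] = None  # /localRepresentation, if any

    def representation(self) -> str:
        # A local representation, when given, takes the place of the core one.
        return self.local_representation or self.core_representation


@dataclass
class KeyFamily:
    dimensions: List[Component]                                 # KeyDescriptor: {ordered, full-key}
    groups: Dict[str, List[str]] = field(default_factory=dict)  # GroupKeyDescriptors: {ordered, partial-key}
    measures: List[Component] = field(default_factory=list)     # MeasureDescriptor
    attributes: List[Component] = field(default_factory=list)   # AttributeDescriptor

    def validate_groups(self) -> None:
        key_order = [d.concept_id for d in self.dimensions]
        for group_id, group_dims in self.groups.items():
            if not set(group_dims) <= set(key_order):
                raise ValueError(f"group {group_id} uses dimensions outside the key")
            if len(group_dims) >= len(key_order):
                raise ValueError(f"group {group_id} is not a partial key")
            # Assumption: group dimensions keep the relative order of the full key.
            if [d for d in key_order if d in group_dims] != list(group_dims):
                raise ValueError(f"group {group_id} is not in key order")


# Hypothetical dimensions and code lists
freq = Component("FREQ", "CL_FREQ")
subject = Component("SUBJECT", "CL_SUBJECT")
area = Component("REF_AREA", "CL_AREA", local_representation="CL_AREA_AFRICA")

kf = KeyFamily([freq, subject, area], groups={"ALL_COUNTRIES": ["FREQ", "SUBJECT"]})
kf.validate_groups()  # passes: FREQ, SUBJECT is an ordered partial key
```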
CMR and ISO 11179
• CMR is a model that supports the survey lifecycle
• Element and Concept semantics are provided by an association to ISO 11179, e.g.:
[Figure: CMR questionnaire model. Classes shown: Questionnaire (Administration, OMB Number), Question (Administration, Question Text), Question Map (Map Identifier), Response Choice (RC Identifier, Choice Text), Data Element, Data Element Concept (DEC Administration, Object Class, Property) and Value Domain (VD Administration, Permissible Values, Description, Data Type), with parent/child, precedes/follows, contains, links and corresponds-to associations.]
• A Question is associated with a Data Element Concept
• A Response Domain is associated with a Value Domain
• A Data Set component (element) is associated with a Data Element
• A data set contains the output from a survey
• All data elements are explicitly named
• ISO 11179 is the pivot between CMR and SDMX
• We need to apply ISO 11179 to SDMX
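The associations listed above can be sketched in Python: a survey question points to a data element concept (object class and property), its response domain is a value domain, and the data set component that stores the answers is a named data element. This is a hypothetical illustration, not the CMR model itself. The question text, the object class and property terms, and the response codes are invented, and the representation term is passed in explicitly rather than derived.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str
    property: str


@dataclass(frozen=True)
class ValueDomain:
    data_type: str
    permissible_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Question:
    text: str
    concept: DataElementConcept   # Question -> Data Element Concept
    response_domain: ValueDomain  # Response Domain -> Value Domain


def data_element_name(question: Question, representation_term: str) -> str:
    """Name the data set component (data element) that stores answers to this question."""
    c = question.concept
    return f"{c.object_class}.{c.property}.{representation_term}"


# Hypothetical survey question with a coded response domain
employment = Question(
    "Did you work for pay last week?",
    DataElementConcept("Person", "Employment_Status"),
    ValueDomain("string", ("1", "2")),
)
```

Here `data_element_name(employment, "Code")` gives `"Person.Employment_Status.Code"`, an explicit name of the kind the bullets require for every data element.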
Applying ISO 11179 to SDMX
• ISO 11179 gives you:
[Object_Class].[Property].[Representation]
• An ISO 11179 instance would have:
[Object_ID].[Property_Term].[Representation_Term]
• We have these constructs in SDMX data
and metadata sets
SDMX Objects
• For data, we have a limited set of object classes:
– Data Sets
– Groups
– Series
– Observations
• For reference metadata, we have a larger set of object
classes:
– Data Providers
– Category Schemes
– Data and Metadata Flows
– Etc.
• Note that SDMX instance IDs are compound
– For data, a “key”
– For metadata, a “target identifier”
SDMX Properties and
Representations
• In SDMX, properties are taken from
Concepts
• The names and definitions are supplied by
the user for most properties
– Some are required by the model
• Representations are equivalent to those in
ISO 11179 (code, text, etc.)
[Object ID].[Property].[Representation Term], where
– the Object comes from the SDMX model
– the Property comes from the Concept that is the Data Attribute or Measure in the Data Structure Definition
– the Representation Term is derived from the Representation of the Data Attribute or Measure in the Data Structure Definition
Examples:
– [Observation_ID].Confidentiality.code
– [Series_ID].Availability.code
– [Group_ID].Title.text
– [Observation_ID].Value.number
– [Data Set_ID].Title.text
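The representation term in the examples above is derived from the component's representation. One way to sketch that derivation in Python: enumerated representations become "code", numeric DataType literals become "number", and everything else falls back to "text". The numeric set and the fallback are assumptions. Dates, times and URIs would need their own terms in a fuller mapping. The data types passed in the test are also assumptions, chosen so that the output reproduces the listed examples.

```python
NUMERIC_TYPES = {"bigInteger", "integer", "long", "short", "decimal", "float", "double", "count"}


def representation_term(data_type: str, enumerated: bool) -> str:
    """Derive a representation term from a component's Representation."""
    if enumerated:
        return "code"  # coded via a code list
    if data_type in NUMERIC_TYPES:
        return "number"
    # Simplification: dates, times, URIs etc. would need their own terms.
    return "text"


def iso11179_name(object_id: str, property_term: str, data_type: str, enumerated: bool) -> str:
    """[Object ID].[Property].[Representation Term]"""
    return f"[{object_id}].{property_term}.{representation_term(data_type, enumerated)}"
```

For instance, a coded Confidentiality attribute on an observation gives `iso11179_name("Observation_ID", "Confidentiality", "string", True)`, which returns `"[Observation_ID].Confidentiality.code"`.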
SDMX Object IDs
• Compound object IDs in SDMX are either:
– Data Keys (for Group, Series, or Observation), which
depend on the data structure definition
– Metadata Target Identifiers (for data provider,
metadata flow, categorization scheme, etc.), which
depend on the metadata structure definition
• Data Examples
– Observation Key: “Annual – Total Population – Kenya – 1994”
– Series Key: “Annual – Total Population – Kenya”
– Group Key: “Annual – Total Population” (this is the
group of all countries)
Examples, cont.
• For the concept of “Availability” for a series, in
ISO 11179 we would have:
Annual – Total Population – Kenya. Availability. Code
• Note that the SDMX distinction between
“dimensions” and “attributes” is not important to
ISO 11179 – the semantic models use the same
basic approach for both
– You only model the semantics of discrete, value-containing constructs: attributes and observation
values
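The compound keys in the Kenya example can be sketched as follows. A series key takes one value per dimension in key order, a group key takes only the group's dimensions, and an observation key appends the time period. The ISO 11179 name for a series-level attribute then uses the series key as the object part. The dimension IDs (FREQ, SUBJECT, REF_AREA) are hypothetical. The human-readable values and the separators copy the slide rather than a real SDMX key syntax.

```python
from typing import Dict, Sequence

SEP = " – "  # separator used in the slide's human-readable keys


def series_key(values: Dict[str, str], key_order: Sequence[str]) -> str:
    """Full key: one value per dimension, in key order."""
    return SEP.join(values[dim] for dim in key_order)


def group_key(values: Dict[str, str], group_dims: Sequence[str]) -> str:
    """Partial key: only the group's dimensions, so it covers many series."""
    return SEP.join(values[dim] for dim in group_dims)


def observation_key(values: Dict[str, str], key_order: Sequence[str], time_period: str) -> str:
    """Series key plus the time period of one observation."""
    return series_key(values, key_order) + SEP + time_period


def attribute_name(object_key: str, property_term: str, representation_term: str) -> str:
    """ISO 11179-style name with a compound SDMX key as the object part."""
    return f"{object_key}. {property_term}. {representation_term}"


KEY_ORDER = ["FREQ", "SUBJECT", "REF_AREA"]  # hypothetical dimension IDs
kenya = {"FREQ": "Annual", "SUBJECT": "Total Population", "REF_AREA": "Kenya"}
```

With these values, `attribute_name(series_key(kenya, KEY_ORDER), "Availability", "Code")` reproduces the slide's "Annual – Total Population – Kenya. Availability. Code". Dimensions and attributes are handled the same way here, as the note on ISO 11179 suggests.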
Reference Metadata Object IDs in
SDMX
• All of the objects in the SDMX Information
Model can be Object Classes for ISO
11179
• They can be combined to identify new
object classes
• The IDs for instances of objects are
composite “target identifiers”
Example of Some SDMX Object Classes
[Figure: relationships between SDMX object classes.
– A Data Provider publishes/reports data sets or metadata sets, and can provide data/metadata for many data/metadata flows using the agreed data/metadata structure
– A Data Set or Metadata Set uses a specific data/metadata structure (Structure Definition) and conforms to the business rules of the data/metadata flow
– A Data or Metadata Flow can be linked to categories in multiple category schemes, and can get data/metadata from multiple data/metadata providers
– A Provision Agreement links a Data Provider to a Data or Metadata Flow
– A Category Scheme comprises subject or reporting categories; a Category can have child categories]
Reference Metadata Example
• A data provider is identified with an ID
which references the organization scheme
(and agency) the data provider ID code
comes from:
[OrgSchemeAgencyID]_[OrgSchemeID]_[Data_ProviderID]
• Example:
sdmx_sdmxProviderScheme1_IMF is the ID of a data
provider from the organisation scheme sdmxProviderScheme1,
maintained by the SDMX agency, identifying the IMF as a data provider
Reference Metadata Example,
cont.
• To model a “Contact_Name” concept in an
SDMX metadata report for a specific data
provider, we have in ISO 11179
sdmx_sdmxProviderScheme1_IMF.Contact_Name.Text
A more useful approach is possible:
Data_Provider.Contact_Name.Text
This is not possible for data objects in SDMX.
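A short Python sketch of the data provider example: it builds the composite target identifier from its three parts, splits it back, and contrasts the instance-level ISO 11179 name with the more useful class-level one. The function names are hypothetical. The split is only unambiguous if the agency and scheme IDs contain no underscores, which is an assumption of this sketch.

```python
from typing import Tuple


def provider_target_id(agency_id: str, org_scheme_id: str, provider_id: str) -> str:
    """[OrgSchemeAgencyID]_[OrgSchemeID]_[Data_ProviderID]"""
    return "_".join([agency_id, org_scheme_id, provider_id])


def split_target_id(target_id: str) -> Tuple[str, str, str]:
    # Only unambiguous if the agency and scheme IDs contain no underscores.
    agency_id, org_scheme_id, provider_id = target_id.split("_", 2)
    return agency_id, org_scheme_id, provider_id


def iso11179_name(object_part: str, property_term: str, representation_term: str) -> str:
    return f"{object_part}.{property_term}.{representation_term}"


imf = provider_target_id("sdmx", "sdmxProviderScheme1", "IMF")
instance_level = iso11179_name(imf, "Contact_Name", "Text")               # one provider
class_level = iso11179_name("Data_Provider", "Contact_Name", "Text")     # every provider
```

The class-level name covers the contact name of any data provider, while the instance-level name has to be repeated for each provider ID.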
Registry Model: ISO 11179
• ISO 11179 specifies a registry metamodel, but no API
specification for implementing the model
Registry Metamodel: SDMX
• SDMX does not have an abstract registry metamodel
• SDMX specifies registry interfaces based on the SDMX
Information Model
• Compliance with SDMX registry standards is based on
implementation of the SDMX registry APIs
• The actual registry implementation can be whatever the
registry service provider chooses
– The JEDH pilot project is implemented using an ebXML registry
(this has a model which is very similar to the ISO 11179 model)
– An SDMX compliant registry could use an ISO 11179 registry (if
such an implementation exists)
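The separation described above, with compliance defined by the interfaces and storage left to the service provider, can be sketched in Python as a facade over a pluggable backend. All class and method names here are hypothetical and are not the SDMX registry API. The in-memory store stands in for an ebXML, ISO 11179 or bespoke implementation, and the agency, structure and flow IDs used in the test are invented.

```python
from typing import Dict, List, Optional, Protocol


class RegistryStore(Protocol):
    """Storage behind the registry: ebXML, ISO 11179 or bespoke."""

    def put(self, key: str, value: dict) -> None: ...

    def get(self, key: str) -> Optional[dict]: ...


class InMemoryStore:
    """Bespoke stand-in backend used for the example."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[dict]:
        return self._items.get(key)


class Registry:
    """Hypothetical interface layer; method names are illustrative only."""

    def __init__(self, store: RegistryStore) -> None:
        self._store = store

    def submit_structure(self, agency: str, structure_id: str, definition: dict) -> None:
        self._store.put(f"structure:{agency}:{structure_id}", definition)

    def query_structure(self, agency: str, structure_id: str) -> Optional[dict]:
        return self._store.get(f"structure:{agency}:{structure_id}")

    def register_provision(self, provider_id: str, flow_id: str) -> None:
        # Record that a provider supplies data for a flow (no duplicates).
        key = f"provision:{flow_id}"
        entry = self._store.get(key) or {"providers": []}
        if provider_id not in entry["providers"]:
            entry["providers"].append(provider_id)
        self._store.put(key, entry)

    def providers_for(self, flow_id: str) -> List[str]:
        entry = self._store.get(f"provision:{flow_id}")
        return list(entry["providers"]) if entry else []
```

Swapping `InMemoryStore` for another class with the same `put`/`get` methods leaves the `Registry` interface unchanged, which is the point the slide makes about compliance.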
Functionality
• An ISO 11179 registry is used to provide a
semantic registry of specific data elements
– The focus is on understanding the meaning of
data and metadata
• An SDMX registry provides mechanistic
services to facilitate the exchange of data
and metadata sets
– The focus is on access to data and metadata
• These are complementary functions
Using the Mapping
• CMR is one example of a model which could be
mapped against SDMX
– SDMX is about aggregate data exchange
– CMR is about survey lifecycle metadata
– These are potentially related, but are not the same
thing
• To be useful in an application, we care about the
semantic equivalencies of data and metadata
– These equivalencies allow us to re-use and map data
between models and systems
– ISO 11179 acts as a pivot
[Figure: ISO 11179 semantics as the pivot between organisational structures and SDMX.
– Organisations: data and metadata structures whose structures and rules are proprietary to individual organisations. CMR defines a conceptual model for data and metadata structures for the survey lifecycle, using the ISO 11179 model for structuring semantics.
– Pivot: describe SDMX semantics based on SDMX structures; organisations map between their data and metadata structures and SDMX data and metadata structures using ISO 11179 semantics.
– SDMX: data and metadata structures whose structures and rules are prescribed by SDMX standards. SDMX defines a conceptual model for data and metadata structures for aggregated data and reference metadata, and provides a canonical syntax representation.]
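The pivot idea can be sketched in Python: each side annotates its own elements with ISO 11179 names, and elements are matched when their names agree. All element names and ISO 11179 names below are illustrative assumptions, and the function only establishes semantic equivalence. Converting code lists or restructuring the data would still need a separate structural mapping.

```python
from typing import Dict, List, Tuple


def map_via_pivot(
    source_to_iso: Dict[str, str], target_to_iso: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map source element names to target names through shared ISO 11179 names.

    Returns the mapping and the source names with no semantic match.
    """
    iso_to_target: Dict[str, str] = {}
    for target, iso_name in target_to_iso.items():
        if iso_name in iso_to_target:
            raise ValueError(f"{iso_name} is ambiguous in the target model")
        iso_to_target[iso_name] = target

    mapping: Dict[str, str] = {}
    unmatched: List[str] = []
    for source, iso_name in source_to_iso.items():
        if iso_name in iso_to_target:
            mapping[source] = iso_to_target[iso_name]
        else:
            unmatched.append(source)
    return mapping, unmatched


# Hypothetical element names on both sides, each annotated with an ISO 11179 name
organisation = {
    "VALUE": "Observation.Value.number",
    "CONF_FLAG": "Observation.Confidentiality.code",
    "INTERNAL_NOTE": "Observation.Internal_Note.text",
}
sdmx = {
    "OBS_VALUE": "Observation.Value.number",
    "OBS_CONF": "Observation.Confidentiality.code",
}
mapping, unmatched = map_via_pivot(organisation, sdmx)
```

Neither side needs to know the other's element names, only the shared ISO 11179 names. This is why the same annotations can be reused against CMR, CCTS or another model.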
Another Case Study
• One use case – CMR – is not enough to validate
our work
• At the ISO TC 154 meeting in Vancouver we met
with the designers of ISO 15000-5, the
“Core Components Technical Specification”
(CCTS)
– CCTS is a modelling methodology for e-commerce,
standardized in ISO TC 154
– It is based on ISO 11179
• We were asked: “Could SDMX data be modelled
according to CCTS?”
Why Bother?
• Statistics and e-commerce do have connection
points:
– Customs and international trade
– Payment transactions and business reporting
• Some systems today map between business
transactions and statistical reports
• Thus, mapping between CCTS (e-commerce
transactions) and SDMX (statistics) is potentially
a real-world use case.
The Answer is “Yes”
• Based on our SDMX-to-ISO 11179
mapping:
– We could express our semantics
– Compare them to the e-commerce semantics
– Determine the relationship between specific
data elements
• This was a quick, straightforward
validation of our work
Conclusions
• ISO 11179 is very useful as a pivotal semantic model for
working with other models (CMR, CCTS, etc.)
• SDMX is mapped at the model level, not the metamodel
level
– Semantics are introduced at this level
• The SDMX and ISO 11179 registries are complementary
in function (semantics + access)
• The structural mapping of any model depends on the
ISO 11179 implementation
– ISO 11179 only does semantics, not structure
– Applications need both structural and semantic mapping