INEGI: Introduction to SDMX

EDDI: Introduction to SDMX
Arofan Gregory
Open Data Foundation
What is SDMX?
• The problem space:
– Statistical collection, processing, and
exchange are time-consuming and resource-intensive
– Various international and national
organisations have individual approaches for
their constituencies
– Uncertainties about how to proceed with new
technologies (XML, web services …)
[Diagram: data flows among national statistical organisations; banks, corporates, and individual households; and international and regional organisations (180+ countries), with websites (www.x.org, www.y.org, www.z.org, www.hub.org) reached via Internet search and navigation. Flow labels: accounts, statistics, transactions.]
What is SDMX?
The Statistical Data and Metadata
Exchange (SDMX) initiative is taking steps
to address the challenges and
opportunities just mentioned:
– By focusing on business practices in the field
of statistical information
– By identifying more efficient processes for
exchange and sharing of data and metadata
using modern technology
Historical Note
• SDMX uses an approach based on the 10-year-long success of an earlier standard –
GESMES/TS
• GESMES/TS is used today in many countries
for collecting and exchanging statistical data
and for updating statistical databases
– GESMES/TS is now SDMX-EDI
• Focus is on time-series, and is mostly used by
central banks
Who is SDMX?
• SDMX is an initiative made up of seven
international organizations:
– Bank for International Settlements
– European Central Bank
– Eurostat
– International Monetary Fund
– Organisation for Economic Co-operation and Development
– United Nations
– World Bank
• The initiative was launched in 2002
SDMX Products
• Technical standards for the formatting and
exchange of aggregate statistics:
– SDMX Technical Specifications version 1.0 (now
ISO/TS 17369 SDMX)
– SDMX Technical Specifications version 2.0
(submitted to ISO)
– SDMX Technical Specifications version 2.1 under
review (will be forwarded to ISO)
• Content-Oriented Guidelines
– Common Metadata Vocabulary
– Cross-Domain Statistical Concepts
– Statistical Subject-Matter Domains
Detailed SDMX Goals
• Reducing the national reporting burden to international institutions
• Fostering consistency, accuracy, and timeliness between
data and metadata disseminated by national and
international institutions, relying on what is decentrally
released via national websites
• Enhancing national statistical processing efficiency,
especially through internationally-recognised standard
formats for exchanges between statistical silos within
institutions and with other national statistical agencies
• Providing standards for web-based dissemination formats
that are computer readable and facilitate updating of
databases
• Enhancing comparison of data and metadata analysis
through standard formats and content-oriented guidelines
Official Recommendations
• SDMX has been officially recommended:
– February 2007: SDMX endorsed by the
European Union’s Statistical Programme
Committee
– March 2008: UN Statistical Commission
declares SDMX to be the preferred standard
for data and metadata
Exchange Patterns
• Bilateral: Institutions exchange data
according to bilateral agreements
regarding format, timing, protocols, etc.
• Gateway: Institutions share the data they
collect with their peers, in agreed formats
among counterparty communities
• Data-sharing: standard exchange of data
using standard formats and protocols
[Diagrams: Bilateral Exchange; Gateway Exchange; Data-Sharing Exchange]
Notes About Data-Sharing
• Data-sharing only works if there are standard
formats
• Data-sharing works only if the data themselves
are decentralized
– One big database doesn’t work!
• Like the Web itself, a data-sharing model relies
on pull exchanges, not push exchanges
– Data consumers discover the data they need, and its
location, and then go and get it
– Data producers don’t have to send data
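The pull model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the catalogue, the URL paths, and the values are hypothetical, and a dictionary lookup stands in for a real HTTP request. The point it shows is that the catalogue records where data live, while the data stay on each producer's own site.

```python
# Hypothetical stand-in for producer websites: each producer hosts its own data.
PRODUCER_SITES = {
    "http://www.x.org/data/gdp.xml": {"2009": 101.2, "2010": 103.5},
    "http://www.y.org/data/debt.xml": {"2009": 55.0, "2010": 57.1},
}

# Hypothetical catalogue: it stores locations only, never the data themselves.
CATALOGUE = {
    "gdp": "http://www.x.org/data/gdp.xml",
    "debt": "http://www.y.org/data/debt.xml",
}


def discover(topic):
    """Return the URL where data for a topic are published."""
    return CATALOGUE[topic]


def pull(url):
    """Stand-in for an HTTP GET against the producer's own site."""
    return PRODUCER_SITES[url]


def get_data(topic):
    """Consumer side: discover the location first, then go and get the data."""
    return pull(discover(topic))


if __name__ == "__main__":
    print(get_data("debt"))
```

Note that no producer ever calls the consumer: adding a new consumer requires no change on the producer side.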
Adopters/Interest
• The following are known adopters (or planning to adopt):
– US Federal Reserve Board and Bank of New York
– European Central Bank
– Joint External Debt Hub (WB, IMF, OECD, BIS)
– UN/TRADECOM at UN Statistical Division
– NAAWE (National Accounts from OECD/Eurostat)
– European Statistical System (Eurostat and National Statistical Institutes)
– Mexican Federal System
– Vietnamese Ministry of Planning and Investment
– Qatar Information Exchange
– IMF (BOP, SNA, SDDS/GDDS)
– Food and Agriculture Organization
– Millennium Development Goals (UN System, others)
– International Labor Organization
– Bank for International Settlements
– OECD
– World Bank World Development Indicators (WDI)
– Macaronesian Islands (Spanish/Portuguese Statistical Region)
– UNESCO (Education)
– Australian Bureau of Statistics
– WHO (SDMX-HD)
– Statistics Canada
There are many others!
OECD
• Data structures are specified using SDMX
standards
• Data sets are held in SDMX-ML format and
navigated “on the fly”
– OECD.Stat
• http://stats.oecd.org/WBOS/index.aspx
• Experimenting with graphical presentation of
data
• Serves all OECD data as SDMX through the
OECD.Stat web service
Eurostat
• Builds on long experience of using GESMES for data
transmission (GESMES is the main format for transmission
of data in several important domains, e.g. national
accounts, balance of payments, short-term statistics)
• More than 50 Data Structure Definitions for GESMES
developed and maintained (in partnership with ECB)
• Software components developed and made available as
open-source software (see Tools page of SDMX
website)
• Now creating a portal for all European Census data,
collected as SDMX
SDMX Information Model: High-level Schematic
[Diagram. The boxes and the relationships it labels:]
• Category Scheme: comprises subject or reporting categories
• Category: can have child categories
• Data or Metadata Flow: can be linked to categories in multiple category schemes; uses specific data/metadata structure; can get data/metadata from multiple data/metadata providers
• Data or Metadata Structure Definition
• Data Provider: can provide data/metadata for many data/metadata flows using agreed data/metadata structure; publishes/reports data/metadata sets
• Provision Agreement
• Data or Metadata Set: conforms to business rules of the data/metadata flow
• Registered Data or Metadata Set: registers existence of data and metadata; is registered for a provision agreement
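One way to read the schematic is as a small set of linked record types. The sketch below is a simplification, not the normative SDMX model: the class and field names are chosen for illustration, and the identifiers and URL in the example are hypothetical. It shows how a provision agreement ties one provider to one flow, so that a flow can have many providers and a provider can serve many flows.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    children: list = field(default_factory=list)  # child Category objects


@dataclass
class StructureDefinition:
    id: str
    dimensions: tuple


@dataclass
class Flow:
    id: str
    structure: StructureDefinition                   # uses one specific structure
    categories: list = field(default_factory=list)  # may span several schemes


@dataclass
class DataProvider:
    id: str


@dataclass
class ProvisionAgreement:
    provider: DataProvider
    flow: Flow


@dataclass
class RegisteredDataSet:
    agreement: ProvisionAgreement
    url: str  # where the data set can be retrieved


def providers_for(flow, agreements):
    """All providers that have agreed to supply data for a flow."""
    return [a.provider.id for a in agreements if a.flow is flow]


# Hypothetical example: one flow supplied by two providers.
dsd = StructureDefinition("DEBT_DSD", ("FREQ", "REF_AREA", "INDICATOR"))
debt = Flow("EXTERNAL_DEBT", dsd, [Category("Economy", [Category("Debt")])])
agreements = [
    ProvisionAgreement(DataProvider("BIS"), debt),
    ProvisionAgreement(DataProvider("IMF"), debt),
]
registered = RegisteredDataSet(agreements[0], "http://www.x.org/debt.xml")
```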
SDMX Technical Specs v 1.0
• Information Model (data structure
definitions and data formats)
• SDMX-ML: XML formats for data structure
definitions and data
• SDMX-EDI: EDI formats for data structure
definitions and data
• Web-Services Guidelines
• User Guide
Technical Notes on Version 1.0
• Only numeric observations were supported
• Only coded key values were supported
• Intended to provide an XML version of the
existing GESMES/TS data model
– GESMES/TS became SDMX-EDI
– XML extended the data model to provide for
more types of groups and cross-sectional data
• Hierarchical codelists not supported
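The two version 1.0 restrictions (coded key values only, numeric observations only) can be illustrated with a small validator. This is a sketch under assumptions: the dimension names, codelists, and sample values are illustrative and are not taken from any published data structure definition.

```python
# Illustrative codelists: every dimension of the series key must take a coded value.
CODELISTS = {
    "FREQ": {"A", "Q", "M"},
    "REF_AREA": {"MX", "VN", "QA"},
    "INDICATOR": {"GDP", "CPI"},
}
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")


def validate_series(key, observations):
    """Return a list of problems; an empty list means the series is acceptable."""
    errors = []
    if len(key) != len(DIMENSIONS):
        errors.append(f"key has {len(key)} values, expected {len(DIMENSIONS)}")
    for dim, value in zip(DIMENSIONS, key):
        if value not in CODELISTS[dim]:
            errors.append(f"{dim}: {value!r} is not in the codelist")
    for period, value in observations.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{period}: observation {value!r} is not numeric")
    return errors


if __name__ == "__main__":
    print(validate_series(("A", "MX", "GDP"), {"2009": 1.5, "2010": 1.7}))
    print(validate_series(("A", "XX", "GDP"), {"2009": "n/a"}))
```

Under version 2.0, as the next slides describe, uncoded dimensions and non-numeric observations are allowed, so a validator like this one would need to relax both checks.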
SDMX Technical Spec v. 2.0
• Expanded data model includes
– Registry interfaces
– Metadata structures and formats
– Data and metadata provisioning
– Other advanced features (process flow,
reporting taxonomy, structure mapping, etc.)
• Data formats now include uncoded
dimensions, hierarchical codelists, and
non-numeric observations
Technical Notes on Version 2.0
• A very large expansion of scope
– Model covers the process of statistical
exchange, not just the data formats
– Many cases which version 1.0 could not
support were included in version 2.0 as a
result of implementations
• Full support for the “data sharing” pattern
of exchange
– Resulting from the inclusion of the registry
Changes for Version 2.1
• Expanded Web Services Guidelines
– Standard WSDL Functions
– Standard RESTful syntax (URL-based API)
– Standard Error Codes
– Will allow for interoperable web services for SDMX, so generic clients can use multiple sources
• Simplified Data Formats
– All data formats will be more consistent
– Cross-sectional and time-series formats are more similar
• SDMX Query has been improved
• Note: SDMX 2.1 is available for public review now!
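To make "URL-based API" concrete, here is a minimal sketch of how a generic client might build a data query URL. It assumes the general path shape used by SDMX 2.1 REST data queries (data / flow / key / provider, with dimension values separated by "." and alternatives joined by "+"). The entry point, flow identifier, and codes are hypothetical, and the helper name build_data_url is an invention for this example.

```python
from urllib.parse import quote, urlencode


def build_data_url(entry_point, flow, key_parts, provider="all", **params):
    """Build a data query URL.

    key_parts holds one list of codes per dimension; an empty list
    leaves that dimension unconstrained.
    """
    key = ".".join("+".join(values) for values in key_parts)
    segments = ("data", flow, key, provider)
    path = "/".join(quote(s, safe=".+,") for s in segments)
    url = f"{entry_point.rstrip('/')}/{path}"
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    # Hypothetical service and flow: monthly USD or GBP rates against EUR.
    print(build_data_url(
        "https://stats.example.org/rest",
        "EXR",
        [["M"], ["USD", "GBP"], ["EUR"]],
        startPeriod="2010-01",
    ))
```

Because only the entry point changes between services, the same client code can address multiple sources, which is the interoperability goal the slide describes.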
The Old JEDH (Joint External Debt Hub) Site
[Diagram: BIS, IMF, OECD, and World Bank each supply data in various formats to the JEDH website, on a 3-month production cycle.]
JEDH with SDMX
[Diagram: an SDMX "agent" uses the SDMX Registry to discover data and URLs, then retrieves SDMX-ML data from the BIS, IMF, OECD, and World Bank sites. The diagram labels two outcomes: data provided in real time to the JEDH site, and data loaded into the JEDH DB (debtor database).]
FOOD AND AGRICULTURE ORGANIZATION
OF THE UNITED NATIONS
SDMX in Action: Prototype System
[Diagram: Flow of FAO CountrySTAT/RegionSTAT Implementation. Components: CountrySTAT, National Publication Server(s), FAO SDMX Registry, Regional Publication Server, RegionSTAT. Steps are numbered 1, 2, 3a, 3b, and 4.]
Slide courtesy of the FAO
Prototype System: Explanation
1. CountryStat National Publication Server
• The web site is published from the files in CountryStat
2. SDMX Publication
• The new CountryStat files are converted to SDMX-ML data sets
and made web accessible on the CountryStat web site
• These files are registered in the FAO SDMX Registry
3. RegionStat Regional Publication Server
3a.
• Queries the registry for new registrations; the registry responds with
registration details, including the URL of the new data sets
3b.
• Retrieves the new data sets from the CountryStat web site
• Converts the SDMX-ML files to an internal format and integrates
the new data sets with existing RegionStat data sets
4.
• Re-publishes the RegionStat web site
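Steps 2 to 4 can be sketched as a small simulation. Everything here is an assumption made for illustration: the class names, the sequence-number cursor used to detect new registrations, the URLs, and the data values. Dictionaries stand in for the web and for SDMX-ML files, and this is not the FAO implementation.

```python
WEB = {}  # hypothetical stand-in for country web sites: URL -> published data set


class Registry:
    """Records that a data set exists and where it is; holds no data."""

    def __init__(self):
        self._registrations = []  # (sequence number, URL) pairs

    def register(self, url):
        self._registrations.append((len(self._registrations) + 1, url))

    def new_since(self, last_seen):
        return [(seq, url) for seq, url in self._registrations if seq > last_seen]


def publish(registry, url, dataset):
    """Step 2: make the data set web accessible, then register it."""
    WEB[url] = dataset
    registry.register(url)


class RegionalServer:
    def __init__(self, registry):
        self.registry = registry
        self.last_seen = 0  # highest registration already processed
        self.store = {}     # internal format: (country, year) -> value

    def refresh(self):
        for seq, url in self.registry.new_since(self.last_seen):  # step 3a
            dataset = WEB[url]                                     # step 3b: retrieve
            self.store.update(dataset)                             # step 3b: integrate
            self.last_seen = seq
        return dict(self.store)                                    # step 4: re-publish


registry = Registry()
region = RegionalServer(registry)
publish(registry, "http://country-a.example/maize.xml", {("A", 2009): 10.0})
publish(registry, "http://country-b.example/maize.xml", {("B", 2009): 7.5})
first = region.refresh()
publish(registry, "http://country-a.example/maize-2010.xml", {("A", 2010): 11.0})
second = region.refresh()
```

The second call to refresh retrieves only the one newly registered data set, because the server remembers the last registration it processed.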
Slide courtesy of the FAO
Questions?