Statistical Data and Metadata Exchange
SDMX Metadata Common Vocabulary
Status of project and issues (2004-2005)
Marco Pellegrino, Eurostat
Denis Ward, OECD
Overview
SDMX Metadata Common Vocabulary
- Background
- Objectives and benefits of MCV
- Status of the project
- Expected benefits and use

Terminology problems...

Starting point
No universally accepted metadata framework.
Semantics is important to interoperation:
- registries contain related and sometimes overlapping information
- data must be kept updated and synchronized with minimum effort
Common understanding on the meaning of:
- general concepts (metadata, quality, …)
- basic “atomic” metadata components

Components of International Standards
- BUSINESS MODELS
- SEMANTICS (content)
- SYNTAX (e.g. XML)

Metadata Common Vocabulary
Ultimate goal of project: to develop a common understanding of standard metadata items, focusing on descriptions of statistical concepts and methodologies used by statisticians in the collection, processing and dissemination of statistical data.
Immediate objective: to develop a Metadata glossary of those standard components, consistent with existing international standards and with terminology being used within SDMX organizations, other international / national agencies and related projects.

Main references for standardisation
- ISO/IEC 11179, part 4 (Formulation of data definitions): recommendations for constructing definitions for data and metadata
- ISO/IEC 11179, parts 1 and 3 (Metadata registry): definition of main metadata items
- Quality glossaries
- UN and UN/ECE-CES methodological documents and glossaries (on metadata modelling, classifications, data editing, …)
- SDMX documents (Gesmes/TS users guide, ISO framework): definitions of main items for data-metadata exchange

Tentative classification of MCV terms (draft 3rd public release, April 2005)

Specification                        Number of terms     %
Total                                            346
  of which: Synonyms                              10
Definitions                                      339   100
General statistical terminology                  215    64
  of which: Quality (assessment)                  85    25
Metamodelling                                     97    29
  of which: ISO                                   83    24
Data exchange                                     24     7
  of which: GESMES/TS                             17     5

Metadata Standard Components
- Administrative, Sources
- Concepts, coverage, definitions
- Standards
- Methodology (collection, compilation, ...)
- Quality assessment
Metadata elements describing different elements of the statistical production cycle.
Unambiguous, accepted definition of metadata elements, located in a glossary comprising the Metadata Common Vocabulary.

Fields of MCV glossary
- Title (mandatory)
- Definition (mandatory)
- Context for the definition (optional, but widely used)
- Definition source (mandatory)
- Links to related terms within the glossary (optional)
- URL to more detailed information (optional)

Reference Metadata
Definition: Reference metadata describe statistical concepts, methodologies for the generation of data and information on data quality.
Source: Statistical Data and Metadata Exchange (SDMX) - BIS, ECB, Eurostat, IBRD, IMF and OECD, “Framework for SDMX standards”, Version 1.0, first revision, December 2004
Hyperlinks: www.sdmx.org, www.sdmx.info
Context: Reference metadata, sometimes generated, collected or disseminated separately from the data to which they refer, can be relevant to all instances of data described: entire collections of data, data sets from a given country, or a data item concerning one country and one year.
Preferably, reference metadata should include all of the following:
a) "conceptual" metadata, describing the concepts used and their practical implementation, allowing users to understand what the statistics are measuring and, thus, their fitness for use;
b) "methodological" metadata, describing methods used for the generation of the data (e.g. sampling, collection methods, editing processes);
c) "quality" metadata, describing the different quality dimensions of the resulting statistics (e.g. timeliness, accuracy).
Related term: Metadata, statistical

Accuracy
Definition: Accuracy in the general statistical sense denotes the closeness of computations or estimates to the exact or true values, as contrasted with precision, which refers to reproducibility.
Source: The International Statistical Institute, "The Oxford Dictionary of Statistical Terms", edited by Yadolah Dodge, Oxford University Press, 2003.
Context: Accuracy refers to the closeness between the estimated value and the (unknown) true value that the statistics were intended to measure (International Monetary Fund, "Data Quality Assessment Framework - DQAF Glossary"). Accuracy of data or statistical information is the degree to which those data correctly estimate or describe the quantities or characteristics that the statistical activity was designed to measure. Accuracy has many attributes, and in practical terms there is no single aggregate or overall measure of it.
Of necessity, these attributes are typically measured or described in terms of error, or the potential significance of error, introduced through individual major sources of error, e.g. coverage, sampling, non-response, response, processing and dissemination (Statistics Canada, "Statistics Canada Quality Guidelines", 3rd edition, October 1998, page 4, available at http://www.statcan.ca/english/freepub/12-539-XIE/12-539-XIE.pdf). Accuracy is the second quality component in the Eurostat definition. The third element of the IMF definition of quality is "accuracy and reliability".
Related terms: Quality (Eurostat); Quality (IMF); Error, statistical; Reliability (quality); Error of estimation; Precision

Expected benefits and use
The use of MCV terminology would:
• support standardisation and consistency of metadata compiled within each institute, when associated with SDMX standards and “key family” descriptions
• facilitate comparisons across geographical entities
• facilitate mapping of different metadata systems, as it can be used independently of any specific metadata model

Metadata Common Vocabulary
For more info:
SDMX: http://www.sdmx.org
OECD: http://cs3-hq.oecd.org/scripts/stats/glossary/index.htm
CODED: http://forum.europa.eu.int/irc/dsis/coded/info/data/coded/en.htm
CIRCA: http://forum.europa.eu.int/Public/irc/dsis/metadata/library

Thanks for your attention
Marco Pellegrino, Eurostat
Denis Ward, OECD
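
Illustration (not part of the original slides): the glossary fields listed under "Fields of MCV glossary" could be held in a simple record type. The sketch below is only one possible reading. The class name GlossaryEntry, its attribute names and the helper dangling_links are hypothetical and do not come from the MCV or from any SDMX specification. It assumes that "mandatory" means a non-blank string and that related terms are matched by exact title.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GlossaryEntry:
    """One glossary term, with the fields named on the slide (names are hypothetical)."""
    title: str                      # mandatory
    definition: str                 # mandatory
    source: str                     # mandatory (definition source)
    context: Optional[str] = None   # optional, but widely used
    related_terms: List[str] = field(default_factory=list)  # optional links
    url: Optional[str] = None       # optional link to more detail

    def __post_init__(self) -> None:
        # Assumption: a mandatory field must be a non-blank string.
        for name in ("title", "definition", "source"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"mandatory field '{name}' is empty")


def dangling_links(entries: List[GlossaryEntry]) -> List[Tuple[str, str]]:
    """Return (title, related_term) pairs whose target is not in the glossary."""
    titles = {e.title for e in entries}
    return [(e.title, r) for e in entries for r in e.related_terms
            if r not in titles]


# The "Reference metadata" entry from the slides, abbreviated.
entry = GlossaryEntry(
    title="Reference metadata",
    definition=("Reference metadata describe statistical concepts, "
                "methodologies for the generation of data and "
                "information on data quality."),
    source="SDMX, Framework for SDMX standards, Version 1.0",
    related_terms=["Metadata, statistical"],
    url="www.sdmx.org",
)

# "Metadata, statistical" is not loaded here, so it is reported as dangling.
print(dangling_links([entry]))
```

Checking related-term links against the loaded titles is one way to keep cross-references consistent as the glossary grows; a real registry would likely also need synonym handling, which this sketch leaves out.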