Session 2 - The Open Data Foundation

Workshop on Metadata Standards and Best Practices
November 19-20th, 2007
Session 2
Metadata specifications for socio-economic
science and supporting initiatives
Pascal Heus
Open Data Foundation
http://www.opendatafoundation.org
Outline
• Metadata specifications
• Key players
• Ongoing initiatives
• Conclusions / Q&A
http://www.opendatafoundation.org
Open Data Foundation – IZA 2007/11
What is Metadata?
• Common definition: Data about Data
[Figure: labeled stuff versus unlabeled stuff]
The bean example is taken from: A Manager’s
Introduction to Adobe eXtensible Metadata Platform,
http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
What are XML specifications? (1)
• XML is a language that facilitates the capture
of descriptive elements and attributes
• Different objects carry different
characteristics (book, car, weather)
• We need to agree on a common set of
descriptive elements (semantics)
• Just as we design a database, we
have to describe the structure
• This modeling process creates a Document
Type Definition (DTD) or an XML Schema
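As a minimal sketch of the points above, the following Python snippet captures one object (a book) as XML using descriptive elements and an attribute. The element names (`book`, `title`, `author`, `year`), the `isbn` attribute and all values are hypothetical and chosen only for illustration. In practice, an agreed specification (DTD or XML Schema) would fix these names and their structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive elements for a "book" object; a shared
# specification (DTD or XML Schema) would define the agreed names.
book = ET.Element("book", attrib={"isbn": "0-000-00000-0"})  # placeholder value
ET.SubElement(book, "title").text = "An Example Title"
ET.SubElement(book, "author").text = "A. Author"
ET.SubElement(book, "year").text = "2007"

# Serialize the description to XML text.
xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)

# A consumer that knows the agreed structure can read the values back.
parsed = ET.fromstring(xml_text)
title = parsed.findtext("title")
isbn = parsed.get("isbn")
```

A car or a weather report would carry different elements, which is why each kind of object needs its own agreed structure.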
What are XML specifications? (2)
• Specifications are made available to the
general public on the web
– Usually a URL
• Can be turned into a “standard” (ISO)
• Typically maintained by a consortium of
agencies
– Independent model
– OASIS, W3C
– ISO
A suggested set for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
– Macrodata, time series, indicators, registries
– http://www.sdmx.org
• Data Documentation Initiative (DDI)
– Microdata (surveys, studies)
– http://www.ddialliance.org
• ISO 11179
– Semantic modeling, concepts, registries
– http://metadata-standards.org/11179/
• ISO 19115
– Geography
– http://www.isotc211.org/
• Dublin Core
– Resources (documentation, images, multimedia)
– http://www.dublincore.org
Statistical Data and Metadata Exchange (SDMX)
• Purpose: Exchange of statistical information (time
series/indicators).
– Covers the metadata capture as well as implementation of
registries.
– Currently version 2.0 and also an ISO standard
(17369:2005)
• Sponsors: Bank for International Settlements (BIS),
European Central Bank (ECB), EUROSTAT,
International Monetary Fund (IMF), Organization for
Economic Cooperation and Development (OECD),
United Nations (UN), World Bank
• Can actually be used for many other purposes. It is a
meta-metadata model.
• http://www.sdmx.org
Data Documentation Initiative 1/2.x
• Purpose: Archive and document survey
microdata
– Effort to establish an international XML-based
standard for the content, presentation, transport,
and preservation of documentation for datasets
in the social and behavioral sciences
– Sections: document, survey, files, variables, other
material
– Used by data archives (producers) and librarians
• Sponsors: DDI Alliance
• http://www.ddialliance.org
Data Documentation Initiative 3.0
• Purpose: Document the survey life cycle
– Major shift from DDI 1/2.x
– Currently in candidate recommendation, release
in 2008
• Sponsors: DDI Alliance
• http://www.ddialliance.org/ddi3
DDI & SDMX
• Are complementary specifications
• DDI 3.0 and SDMX 2.0 have been designed
to work with each other
– SDMX registries can wrap DDI documents
– Microdata: single point in time / geography, high
level of detail (for statisticians, researchers)
– Macrodata: high-level indicators across time and
geography (for economists, policy makers)
– Using DDI+SDMX allows linkages and drilling
down from indicator to its source
• See "DDI and SDMX: Complementary, Not Competing,
Standards", A. Gregory, P. Heus, July 2007, available at
http://www.opendatafoundation.org/?lvl1=resources&lvl2=papers
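The drill-down idea can be sketched as a simple data structure. This is an illustration only, not the DDI or SDMX information model: the class names, fields, identifiers and values below are all hypothetical. They only show how a macro-level indicator could carry a link back to the documented survey it was derived from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Survey:
    """Stand-in for a microdata study that would be documented in DDI."""
    study_id: str
    title: str


@dataclass(frozen=True)
class Indicator:
    """Stand-in for a macrodata observation that would be exchanged in SDMX."""
    name: str
    area: str
    year: int
    value: float
    source_study_id: str  # link back to the documented survey


# Hypothetical catalog of documented surveys, keyed by study identifier.
surveys = {
    "SURVEY-2005": Survey("SURVEY-2005", "Example Household Survey 2005"),
}

# Hypothetical indicator value derived from that survey.
observation = Indicator("Example rate", "Example area", 2005, 12.5, "SURVEY-2005")


def drill_down(obs: Indicator, catalog: dict) -> Survey:
    """Follow the link from an indicator to the survey it came from."""
    return catalog[obs.source_study_id]


source = drill_down(observation, surveys)
print(source.title)
```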
ISO 11179
• Purpose: Manage registries / concepts
– international standard for representing metadata
for an organization in a Metadata Registry (a
central location in an organization where
metadata definitions are stored and maintained in
a controlled method)
– Compliance with this standard is important and
both DDI 3.0 and SDMX have mapping
mechanisms
• Sponsors: ISO/IEC Joint Technical
Committee on Metadata Standards
• http://metadata-standards.org/
ISO 19115
• Purpose: Capture geography
– It is a component of the series of ISO 191xx
standards for Geospatial metadata.
– ISO 19115 defines how to describe geographical
information and associated services, including
contents, spatial-temporal extent, data
quality, access and rights to use.
– Compliance in DDI 3.0
• Sponsors: ISO/TC 211 Geographic
information/Geomatics
• http://www.isotc211.org/
Dublin Core
• Purpose: describe resources
– standard for cross-domain information resource
description
– widely used to describe digital materials such as
video, sound, image, text, and composite media
– Small core set of elements
– Used for survey documentation
• Sponsors: Dublin Core Metadata Initiative
• http://dublincore.org/
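To give a feel for how small the core element set is, here is a hedged sketch that describes a survey document with a handful of Dublin Core elements. The namespace URI is the Dublin Core element set namespace. The `record` wrapper element and all values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# "record" is a hypothetical wrapper element; the values are placeholders.
record = ET.Element("record")
for name, value in [
    ("title", "Example Survey Questionnaire"),
    ("creator", "Example Statistical Office"),
    ("date", "2007"),
    ("type", "Text"),
    ("format", "application/pdf"),
]:
    # ElementTree writes namespaced tags as "{namespace}localname".
    ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value

dc_xml = ET.tostring(record, encoding="unicode")
print(dc_xml)
```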
Advantages of XML metadata
• Metadata is easy to transform
– From one standard to another or into a different
format
• DDI to SDMX, Dublin Core, MARC
– To other formats for presentation
• HTML, PDF
• Metadata is easy to exchange
– Web services (SOAP, REST, etc.)
• Metadata is searchable
– XPath, XQuery
• All these are native features of XML
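The search and transformation points can be illustrated with a short Python sketch. The metadata document below is hypothetical and is not valid DDI or SDMX. The search uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. The transformation to HTML is done with plain string formatting as a stand-in for a dedicated transformation language such as XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical metadata document (element names are illustrative only).
metadata = ET.fromstring("""
<study>
  <title>Example Household Survey</title>
  <variable name="age"><label>Age in years</label></variable>
  <variable name="sex"><label>Sex of respondent</label></variable>
</study>
""")

# Search: find the label of one variable with an XPath-style expression.
age_label = metadata.findtext("variable[@name='age']/label")
names = [v.get("name") for v in metadata.findall("variable")]

# Transform: render the same metadata as HTML for presentation.
rows = "".join(
    f"<li>{escape(v.get('name'))}: {escape(v.findtext('label'))}</li>"
    for v in metadata.findall("variable")
)
html_page = f"<h1>{escape(metadata.findtext('title'))}</h1><ul>{rows}</ul>"
print(html_page)
```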
PART 2
Active agencies and ongoing initiatives
DDI Alliance
• Membership-based organization
– Agencies: ICPSR, World Bank, Open Data
Foundation
– National data archives: Danish, Finnish, Dutch,
Norwegian, Swiss, UK
– Germany: Centre for Survey Research and
Methodology (ZUMA), German Socio-Economic
Panel Study (SOEP), Zentralarchiv fuer
Empirische Sozialforschung (University of Koeln)
– Universities: Alberta, Berkeley, Guelph,
Harvard/MIT, Minnesota, etc.
• Steering and Expert Committee
• Meets annually at IASSIST
• http://www.ddialliance.org
ICPSR
• The Inter-university Consortium for Political
and Social Research
• The world's largest archive of digital social
science data
– Acquire and preserve social science data
– Provide open and equitable access to these data
– Promote effective data use
• Home of the DDI Alliance
• http://www.icpsr.umich.edu
International Household Survey Network
• Partnership of international organizations seeking to
improve the availability, quality and use of survey
data in developing countries
• United Kingdom Department for International
Development (DfID), International Labour
Organization (ILO), Partnership for Statistics in the
21st Century (PARIS21), United Nations Children's
Fund (UNICEF), United Nations Statistics Division
(UNSD), World Health Organization and the Health
Metrics Network (WHO/HMN), World Bank
• Plays a major role in the adoption of DDI around the
globe, active in many developing countries
• Developer of the Microdata Management Toolkit
• http://www.surveynetwork.org
Open Data Foundation
• US-based non-profit organization
• Adoption of global metadata standards and
the development of open-source solutions
promoting the use of statistical data
• Coordination of development efforts
• Board of directors, advisors and
management group
• Open to individual membership, institutional
association is through projects
• http://www.opendatafoundation.org
Metadata Technology
• UK-based private company
• Consulting services and development of
tools based on open standards and open
source
• Training services, registry services,
metadata repositories, hosting
• Focus on SDMX, DDI and related standards
• http://www.metadatechnology.com
IASSIST
• International Association for Social Science
Information Service & Technology
• IASSIST is an international organization of
professionals working in and with information
technology and data services to support
research and teaching in the social sciences.
• Individual-based membership
• Primary platform for DDI community
• Annual conference
– 2008: Stanford, CA, 2009: Tampere, Finland
– DDI Alliance annual meeting
• http://www.iassistdata.org/
DDI Foundation Tools Program
• Initiative aiming at the development of a
Foundation Framework and a Toolkit to
support the implementation of DDI
applications and utilities (open source)
• MOU established September 2007, 2-year
program (renewable on an annual basis
afterwards)
• Canada Research Data Centre Network,
Danish Data Archive, DDI Alliance, GESIS-ZUMA,
National Opinion Research Center
(NORC), Open Data Foundation (ODaF),
and the UK Data Archive (UKDA)
• Web site coming soon
UKDA Data Exchange Tools (DExT)
• Aims to develop, refine and test models for
data exchange, for both survey data and
qualitative research data, based on
XML/RDF schemas, and to develop tools for
data import and export
• Researches the feasibility of developing
automated conversion procedures for legacy
formats
• ODaF currently involved in data conversion
tool and qualitative metadata (QuDExT)
• http://www.data-archive.ac.uk/dext/
NORC Data Enclave
• National Opinion Research Center
• Provides a secure environment within which
authorized researchers can access sensitive
microdata remotely from their offices or
onsite
• Data from the National Institute of Standards
and Technology's (NIST) Technology
Innovation Program (TIP), the Ewing Marion
Kauffman Foundation, and the Economic
Research Service at the US Department of
Agriculture
• Possibly the first virtual data enclave
• http://dataenclave.norc.org
Canada RDC Project
• Consists of 14 Research Data Centres,
6 branch RDCs and the Federal
Research Data Centre in Ottawa
• Data provided by Statistics Canada
• RDCs are now connected through a high-speed
secure network
• Project to adopt a DDI 3.0 based metadata
framework for survey documentation and
research work and sponsor development of
tools
• ODaF providing technical assistance
• http://www.statcan.ca/english/rdc/index.htm
EU 7th Research Framework Program
• Under Socio-economic Sciences and Humanities –
related specific 2007 objectives: to bring together
existing research infrastructures to support the
efficient provision of essential research services
• INFRA-2008-1.1.2.27: promoting European wide
access to microdata sets of official statistics for
research and leading to a European statistical
system open to researchers.
– INFRA-2008-1.1.2.28 (through the development,
harmonisation and optimal use of indicators and data for
economic and innovation research)
– INFRA-2008-1.1.2.29 (developing improved access to
historical archives and cultural collections for research
purposes).
• Call coming out this month (due mid-Feb)
• Proposal will be made for RDC networking/remote
access, data disclosure and metadata (German
contact is Stefan Bender at IAB Nürnberg RDC)
Conclusions
• Metadata specifications available but need
tools
• Lots of complementary ongoing initiatives
and potential synergies
• Need coordination and partnerships (ODaF)