ODaF Europe 2008 Colchester, UK, April 14-15, 2008 DDI Landscape Pascal Heus Open Data Foundation [email protected] http://www.opendatafoundation.org.


ODaF Europe 2008
Colchester, UK, April 14-15, 2008
DDI Landscape
Pascal Heus
Open Data Foundation
[email protected]
http://www.opendatafoundation.org
Background
• Concept of DDI and definition of needs grew out of
the data archival community
• Established in 1995 as a grant funded project
initiated and organized by ICPSR
• Members:
– Social Science Data Archives (US, Canada, Europe)
– Statistical data producers (including US Bureau of the
Census, the US Bureau of Labor Statistics, Statistics
Canada and Health Canada)
• February 2003 – Formation of DDI Alliance
– Membership based alliance
– Formalized development procedures
DDI Timeline / Status
• 2000 – DDI 1.0
– Simple survey
– Archival data formats
– Microdata only
• 2003 – DDI 2.0
– Aggregate data (based on matrix structure)
– Added geographic material to aid geographic search systems
and GIS users
• 2004 – Acceptance of a new DDI paradigm
– Lifecycle model
– Shift from the codebook centric / variable centric model to
capturing the lifecycle of data
– Agreement on expanded areas of coverage
• 2005
– Presentation of schema structure
– Focus on points of metadata creation and reuse
• 2006
– Presentation of first complete 3.0 model
– Internal and public review
• 2007
– Vote to move to Candidate Version (CR)
– Establishment of a set of use cases to test application and
implementation
– October 3.0 CR2
• 2008
– February 3.0 CR3
– March 3.0 CR3 update
– April 3.0 CR3 final
– May: anticipated vote to publish DDI 3.0 at DDI Meeting
(after IASSIST)
• 2009
– DDI 2.2?
– DDI 3.1?
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process: it evolves dynamically over
time and involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 spans the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes
redundancies/discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and others
• 3.0 is extensible
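The reuse idea above can be illustrated with a minimal sketch: a question is documented once and variables in different survey waves point at it by identifier instead of copying its text. The class and field names below (Question, Variable, Repository, question_ref) are hypothetical and are not taken from the DDI 3.0 schema; the sample question is made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Question:
    """A question documented once in a shared store (hypothetical model)."""
    id: str
    text: str


@dataclass
class Variable:
    """A variable in one survey wave; it references a question by id."""
    name: str
    question_ref: str  # identifier of a Question, not a copy of its text


@dataclass
class Repository:
    """Holds the single authoritative copy of each question."""
    questions: Dict[str, Question] = field(default_factory=dict)

    def add(self, question: Question) -> None:
        self.questions[question.id] = question

    def resolve(self, variable: Variable) -> Question:
        """Return the shared definition that a variable refers to."""
        return self.questions[variable.question_ref]


repo = Repository()
repo.add(Question(id="q_age", text="How old were you at your last birthday?"))

# Two waves of the same survey reuse one definition by reference.
wave_2006 = Variable(name="AGE06", question_ref="q_age")
wave_2008 = Variable(name="AGE08", question_ref="q_age")

# Both variables resolve to the very same object, so the wording cannot
# drift between waves and the two variables are comparable by construction.
same_definition = repo.resolve(wave_2006) is repo.resolve(wave_2008)
print(same_definition)
```

Because the question text lives in one place, correcting it once updates every variable that refers to it, which is the redundancy/discrepancy point made in the slide.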
DDI User Base (1)
• National Statistical Offices
• Line Ministries and other governmental
agencies
• Data archives and libraries world-wide
• Research data centers
• Health Canada, Statistics Canada, HRSDC
Canada
• Transport for London, Gallup-Europe
• Etc.
DDI User Base (2)
• International Household Survey Network (IHSN)
– Major international organizations involved
– Coordination of activities
– Adopted DDI 1/2.x as standard
– Developed the Microdata Management Toolkit and related tools /
guidelines
– http://www.surveynetwork.org
• Accelerated Data Program (ADP)
– World Bank / PARIS21
– Implement IHSN activities in developing countries
• Task 1. Documentation and dissemination of existing survey microdata.
– Has introduced DDI in national statistical agencies in over 50 countries
– http://www.surveynetwork.org/adp
DDI Alliance
• Membership based organization
– Agencies: ICPSR, World Bank, Open Data Foundation
– National data archives: Danish, Finnish, Dutch, Norwegian,
Swiss, UK
– Germany: Centre for Survey Research and Methodology
(ZUMA), German Socio-Economic Panel Study (SOEP),
Institute for the Study of Labor (IZA), Zentralarchiv fuer
Empirische Sozialforschung (University of Koeln)
– Universities: Alberta, Berkeley, Guelph, Harvard/MIT,
Minnesota, etc.
• Steering and Expert Committee
• Meets annually at IASSIST
• http://www.ddialliance.org
ICPSR
• The Inter-university Consortium for Political
and Social Research
• One of the world's largest archives of digital
social science data
– Acquire and preserve social science data
– Provide open and equitable access to these data
– Promote effective data use
• Home of the DDI Alliance
• http://www.icpsr.umich.edu
International Household Survey Network
• Partnership of international organizations seeking to
improve the availability, quality and use of survey
data in developing countries
• Steering Committee:
– United Kingdom Department for International Development
(DfID), International Labour Organization (ILO), Partnership
in Statistics for Development in the 21st Century (PARIS21),
United Nations Children's Fund (UNICEF), United Nations
Statistics Division (UNSD), World Health Organization and the
Health Metrics Network (WHO/HMN), World Bank
• Plays a major role in the adoption of DDI around the
globe, active in many developing countries
• Developer of the Microdata Management Toolkit
• http://www.surveynetwork.org
Open Data Foundation
• US based non-profit organization
• Adoption of global metadata standards and
the development of open-source solutions
promoting the use of statistical data
• Coordination of development efforts
• Board of directors, advisors and
management group
• Open to individual membership, institutional
association is through projects
• http://www.opendatafoundation.org
Metadata Technology
• UK based private company
• Consulting services and development of
tools based on open standards and open
source
• Training services, registry services,
metadata repositories, hosting
• Focus on SDMX, DDI and related standards
• http://www.metadatechnology.com
IASSIST
• International Association for Social Science
Information Services & Technology
• IASSIST is an international organization of
professionals working in and with information
technology and data services to support research
and teaching in the social sciences.
• Individual based membership
• Primary platform for DDI community
• Annual conference
– 2008: Stanford, CA, 2009: Tampere, Finland
– DDI Alliance annual meeting
• http://www.iassistdata.org/
DDI Foundation Tools Program
• Initiative aiming at the development of a Foundation
Framework and a Toolkit to support the
implementation of DDI applications and utilities
(open source)
• MOU established September 2007, 2-year program
(renewable on an annual basis afterwards)
• Canada Research Data Centre Network, Danish
Data Archive, DDI Alliance, GESIS-ZUMA, National
Opinion Research Center (NORC), Open Data
Foundation (ODaF), and the UK Data Archive
(UKDA)
• Web site coming soon
UKDA Data Exchange Tools (DExT)
• Aims to develop, refine and test models for data
exchange for both survey data and qualitative
research data, based on XML/RDF schemas, and to
develop tools for data import and export
• Researches the feasibility of developing automated
conversion procedures for legacy formats
• Collaborative efforts underway (w/ODaF) on a data
conversion tool (DExT) and qualitative metadata
(QuDExT)
• http://www.data-archive.ac.uk/dext/
NORC Data Enclave
• National Opinion Research Center
• Provides a secure environment within which
authorized researchers can access sensitive
microdata remotely from their offices or onsite
• Data from the National Institute of Standards and
Technology’s (NIST) Technology Innovation
Program (TIP), the Ewing Marion Kauffman
Foundation, and the Economic Research Service at
the US Department of Agriculture
• Virtual data enclave
• Using DDI and exploring innovative methods to link
producer and researcher knowledge (collaborative
spaces, source code analysis, researcher provided
metadata)
• Technical support by ODaF
• http://dataenclave.norc.org
Canada RDC Project
• Consists of 14 Research Data Centres, 6
branch RDCs and the Federal Research Data
Centre in Ottawa
• Data provided by Statistics Canada
• RDCs are now connected through a high speed
secure network
• Project to adopt a DDI 3.0 based metadata
framework for survey documentation and research
work and sponsor development of tools
• ODaF providing technical assistance
• http://www.statcan.ca/english/rdc/index.htm
EU 7th Research Framework Program
• Under Socio-economic Sciences and Humanities, related
specific 2007 objectives: to bring together existing research
infrastructures to support the efficient provision of essential
research services
• INFRA-2008-1.1.2.27: promoting Europe-wide access to
microdata sets of official statistics for research, leading to a
European statistical system open to researchers
– INFRA-2008-1.1.2.28 (through the development, harmonization
and optimal use of indicators and data for economic and
innovation research)
– INFRA-2008-1.1.2.29 (developing improved access to historical
archives and cultural collections for research purposes)
• European Access to Statistical Information (EURASI)
– Proposal completed at the end of February for European RDC
networking/remote access, data disclosure and metadata
– Netherlands, Italy, Germany, Spain, Slovenia, Sweden, Hungary,
Austria, Switzerland, Denmark, UK, Bulgaria
Other DDI Projects
• GESIS-ZUMA
– Tools for mapping from SPSS and SAS save files to DDI 3.0
metadata
– Requires a copy of SPSS for those transforms
• CASES
– Has said they will support DDI 3.0
• Algenta
– New company producing survey design tools using the DDI 3.0
model as their internal data structure
• Blaise
– Currently looking at DDI 3.0
– Was involved in its creation
• CSPro
– Has support for an earlier version of DDI 3.0 (Public Comment)
– Integration performed by a third party
• Nesstar
– Currently evaluating DDI 3.0
• ???
DDI Editor for archivists?
• Currently gathering requirements / wish list
• Likely to start with light, entry level editor
(based on Flex) then move towards robust
products (but likely on a case by case /
project basis)
• Focus on archiving (DDI 1/2/3); different editors
will need to be developed for other purposes
(but could use the same codebase)
Editor for Archivist Wish List
• User friendly: platform independent, multiple languages,
no knowledge of XML required
• Metadata import: read from data files, instrument design
tools, DDI
• Data import: read common formats & Nesstar, save to ASCII
(preservation)
• Metadata template: DDI Profile
• Metadata editing: survey groups (catalogs), survey description,
file description and relationships, variable-level metadata,
variable groups, cubes?, ability to highlight text in an
untagged document and tag it (external utility?)
• Metadata validation: internal validation based on DDI 3.0
parser, template-based validation, plugin-based validation
(external transforms), ability to check spelling in various
languages for all free text fields
• Repositories: concepts, classifications, universes,
variables / questions (with definition, question, interviewer
instructions)
• Change tracking: versioning, reviewer's comments
• Metadata export: DDI 3.0, DDI 1/2.x (1.2.2 for backward
compatibility with Nesstar), mappings to Dublin Core, MARC,
SDMX, etc.
• Metadata reporting: generic facility to produce XSLT-based
reports (fully customizable), ability to schedule reporting,
option to disseminate report output through email, PDF, FTP, etc.
• Data export: write ASCII data files + setup files for various
software packages
• Extensions: ability to add extension plugins (Nesstar?) or call
external tools for processing
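As an illustration of the "mappings to Dublin Core" export item, here is a minimal sketch that maps a few study-level fields to Dublin Core elements using only the Python standard library. The source field names (title, producer, abstract, production_date, country), the mapping table, the wrapper element name and the sample study are all assumptions made up for this example; they are not DDI element names and not an official DDI-to-Dublin-Core crosswalk. Only the Dublin Core namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical study-level fields -> Dublin Core element names.
# Illustrative only; a real crosswalk would be agreed by the community.
FIELD_TO_DC = {
    "title": "title",
    "producer": "creator",
    "abstract": "description",
    "production_date": "date",
    "country": "coverage",
}


def to_dublin_core(study: dict) -> ET.Element:
    """Build a small XML record holding one dc:* element per mapped field."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for source_field, dc_name in FIELD_TO_DC.items():
        value = study.get(source_field)
        if value:  # skip fields that are missing or empty
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = value
    return root


# Made-up sample study; two mapped fields are deliberately absent.
study = {
    "title": "Household Budget Survey",
    "producer": "National Statistical Office",
    "production_date": "2007",
}

record = to_dublin_core(study)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the mapping in a plain table rather than in code makes it easy to swap in a different target (MARC, SDMX) or to let a plugin supply its own table, in line with the extension item on the wish list.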