3rd International Digital Curation Conference
Washington, DC, Dec 2007
Paper Presentations: Interoperability, Metadata & Standards
Data Documentation Initiative:
Toward a Standard for the Social Sciences
Mary Vardigan, Pascal Heus, Wendy Thomas
ICPSR/University of Michigan / Open Data Foundation / Minnesota Population Center
What is Metadata?
• Common definition: Data about Data
Unlabeled stuff
Labeled stuff
The bean example is taken from: A Manager’s
Introduction to Adobe eXtensible Metadata Platform,
http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
DDI Alliance – http://www.ddialliance.org
Managing data and metadata is challenging!
• “We are in charge of the data. We support our users but also need to protect our respondents!”
• “We want easy access to high quality and well documented data!”
• “We need to collect the information from the producers, preserve it, and provide access to our users!”
• “We have an information management problem”
[Figure labels: Academic, Producers, Users, Government, Sponsors, Librarians, Business, Policy Makers, General Public, Media/Press]
Metadata issues
• Without producer / archive metadata
– Researchers can’t discover data or perform efficient
analysis
• Without researcher metadata
– The research process is not documented and cannot be
reproduced (cf. Gary King’s replication standard!)
– Other researchers are not aware of what has been done
(duplication / lack of visibility)
– Producers don’t know about data usage and quality issues
• Without standards
– Such information can’t be properly managed and
exchanged between actors or with the public
• Without tools:
– We can’t capture, preserve or share knowledge
XML to the rescue!
• XML stands for eXtensible Markup Language
• Technology that is driving today’s web service
oriented architecture of the Internet and Intranets
• Using XML, we can capture, structure, transform,
discover, exchange, query, edit and secure
metadata and data
• XML is platform & language independent and can
be used by everyone
• XML is both machine and human readable
• XML is non-proprietary, public domain and many
open tools exist
• Domain specific standards are available!
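As a minimal sketch of "capture, structure, exchange, query" in practice, the Python snippet below builds a small XML record for one survey variable, serializes it to text, and reads it back. The element and attribute names (variable, label, category, code) and the sample variable are hypothetical and chosen only for illustration; they do not come from any particular standard.

```python
import xml.etree.ElementTree as ET


def build_record(name, label, categories):
    """Build a small XML metadata record for one survey variable.

    Element and attribute names here are illustrative assumptions,
    not taken from a published specification.
    """
    var = ET.Element("variable", {"name": name})
    ET.SubElement(var, "label").text = label
    for code, text in categories.items():
        cat = ET.SubElement(var, "category", {"code": str(code)})
        cat.text = text
    return var


# Capture and structure: create the record in memory.
record = build_record("SEX", "Sex of respondent", {1: "Male", 2: "Female"})

# Exchange: serialize to plain text that any platform can read.
xml_text = ET.tostring(record, encoding="unicode")

# Query: parse the text back and look up the category labels.
parsed = ET.fromstring(xml_text)
labels = {c.get("code"): c.text for c in parsed.findall("category")}

print(xml_text)
print(labels)
```

The same text can be opened in an editor by a person or parsed by a program, which is the "machine and human readable" point made above.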
Suggested XML metadata specifications for
socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
– Macrodata, time series, indicators, registries
– http://www.sdmx.org
• Data Documentation Initiative (DDI)
– Microdata (surveys, studies)
– http://www.ddialliance.org
• ISO 11179
– Semantic modeling, concepts, registries
– http://metadata-standards.org/11179/
• ISO 19115
– Geography
– http://www.isotc211.org/
• Dublin Core
– Resources (documentation, images, multimedia)
– http://www.dublincore.org
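To illustrate how one of these specifications describes a resource, here is a rough Python sketch that writes a Dublin Core style record for this presentation itself. The dc: namespace URI is the Dublin Core element set namespace; the wrapping "record" element and the helper name dc_record are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(title, creators, date):
    """Return an XML element holding a few Dublin Core elements.

    The <record> wrapper is a hypothetical container for this sketch.
    """
    root = ET.Element("record")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    for creator in creators:
        ET.SubElement(root, f"{{{DC_NS}}}creator").text = creator
    ET.SubElement(root, f"{{{DC_NS}}}date").text = date
    return root


rec = dc_record(
    "Data Documentation Initiative: Toward a Standard for the Social Sciences",
    ["Mary Vardigan", "Pascal Heus", "Wendy Thomas"],
    "2007-12",
)
dc_xml = ET.tostring(rec, encoding="unicode")
creators = [e.text for e in rec.findall(f"{{{DC_NS}}}creator")]
print(dc_xml)
```

Dublin Core covers the resource-level description (title, creator, date); variable-level documentation is the role of DDI.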
The Data Documentation Initiative (DDI)
• International XML based specification for the
documentation of social and behavioral data
– Started in 1995, now driven by DDI Alliance (30+ members)
– Became XML specification in 2000 (v1.0)
– Current version is 2.1 with focus on archiving
(survey/codebook)
• New Version 3.0 (2008)
– Focus on entire survey “Life Cycle”
– Provide comprehensive metadata on the entire survey
process and usage
– Aligned with other metadata standards (DC, MARC, ISO
11179, SDMX, …)
– Includes machine actionable elements to facilitate
processing, discovery and analysis
• DDI is being adopted by producers/archives but
needs to extend to the researchers (who are using
the data!)
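A rough sketch of what "machine actionable" can mean for a codebook: the Python snippet below reads a simplified, codebook-like XML fragment and extracts variable labels, question text, and category labels. The element names are loosely modeled on the DDI 2.x codebook layout and should be checked against the published schema; the fragment omits namespaces, is not schema-validated, and its two variables are made-up sample data.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment (assumption: not a valid DDI instance).
FRAGMENT = """
<codeBook>
  <dataDscr>
    <var name="AGE">
      <labl>Age in completed years</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
    <var name="SEX">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def variable_summary(xml_text):
    """Map each variable name to its label, question text and categories."""
    root = ET.fromstring(xml_text)
    summary = {}
    for var in root.iter("var"):
        summary[var.get("name")] = {
            "label": var.findtext("labl"),
            # None when the variable has no question text recorded.
            "question": var.findtext("qstn/qstnLit"),
            "categories": {
                c.findtext("catValu"): c.findtext("labl")
                for c in var.findall("catgry")
            },
        }
    return summary


summary = variable_summary(FRAGMENT)
for name, info in summary.items():
    print(name, info)
```

Once documentation is structured this way, a tool can generate value labels for a statistical package or build a searchable variable index without anyone retyping the codebook.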
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process: it evolves dynamically over time and
involves many agencies/individuals
• DDI 2.x is about archiving; DDI 3.0 spans the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes redundancies/discrepancies,
supports comparison)
• Also supports multilingual, grouping, geography, and other features
• 3.0 is extensible
Metadata Components
• Producer metadata:
– Codebook, questionnaires, reports,
methodologies, processing, scripts, quality,
admin, etc.
• Research metadata
– Recodes, analysis, table, scripts, papers, logs,
data quality, usage
– Citations, references
– Activities, discussions, knowledge base
• Outputs
– Papers, presentations, tables, reports
When to capture metadata?
• Metadata must be captured at the time the event
occurs! (not after the fact)
• Documenting after the fact leads to considerable
loss of information
• This is true for producers and researchers
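One possible way to capture metadata at the moment an event occurs is to make the documentation a side effect of the operation itself. The Python sketch below records a variable recode (source, target, mapping, note, timestamp) in the same call that performs it. The class name RecodeLog, the variable names EDUC and EDUC2, and the record layout are all hypothetical.

```python
import json
from datetime import datetime, timezone


class RecodeLog:
    """Apply recodes and document each one as it happens (illustrative)."""

    def __init__(self):
        self.entries = []

    def recode(self, values, mapping, source, target, note):
        # Perform the recode; unmapped values become None.
        recoded = [mapping.get(v) for v in values]
        # Document it immediately, in the same step.
        self.entries.append({
            "event": "recode",
            "source": source,
            "target": target,
            "mapping": {str(k): v for k, v in mapping.items()},
            "note": note,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return recoded


log = RecodeLog()
educ2 = log.recode(
    [1, 3, 2],
    {1: "low", 2: "low", 3: "high"},
    source="EDUC",
    target="EDUC2",
    note="Collapsed three education levels into two",
)
print(educ2)
print(json.dumps(log.entries, indent=2))
```

Because the log entry is written by the same call that changes the data, the documentation cannot drift from what was actually done.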
Solutions?
• Simple solutions: use good practices
– File and variable naming conventions, sound
statistical methods (metadata in names!)
– Comment source code
– Document your work
• Adopt DDI & other standards-based metadata
solutions:
– DDI tools, citation database, source code level
metadata capture, variable recodes, table
disclosure, data quality feedback, comparability
• Take advantage of web based collaborative
tools
– Wiki, blogs, discussion groups, lists
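As a small illustration of "metadata in names", the Python sketch below parses a file name that follows a naming convention and recovers the metadata it carries. The convention used here (country_year_survey_vN.ext) and the sample file names are invented for this example; any consistent convention would work the same way.

```python
import re

# Hypothetical convention: <country>_<year>_<survey>_v<version>.<ext>
PATTERN = re.compile(
    r"^(?P<country>[a-z]{3})_(?P<year>\d{4})_(?P<survey>[a-z0-9]+)"
    r"_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def parse_name(filename):
    """Return the metadata encoded in a file name, or None if it
    does not follow the convention."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["year"] = int(info["year"])
    info["version"] = int(info["version"])
    return info


good = parse_name("usa_2007_lfs_v2.csv")
bad = parse_name("final data (2).csv")
print(good)
print(bad)
```

A name that fails to parse is itself a useful signal: the file was saved outside the agreed convention and probably lacks documentation.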
Benefits
• Comprehensive data documentation
– Through good metadata practices, comprehensive
documentation captured by producers, librarians and users
is available to ALL researchers
• Preservation, integration and sharing of knowledge
– Research process is captured and preserved in standard
formats
– Research knowledge becomes an integral part of the survey
and is available to all
– Reduces duplication of effort and facilitates reuse
– Producers get feedback from the data users (usage, quality
issues), which leads to better and more relevant data
• Research outputs and dissemination
– Facilitates production of research outputs
– Facilitates dissemination and fosters broader visibility of
research results
Conclusions
• Metadata is a crucial component of social and
behavioral science
• The Data Documentation Initiative (DDI) is a globally
accepted specification for capturing microdata
documentation and knowledge
• The latest version, 3.0, extends across the entire survey life
cycle
• Producers and data archives are rapidly adopting
metadata standards.
• This adoption process should extend into the
research community
• Best practices in data and metadata management
benefit all users and have the potential to change
the way we conduct research
• http://www.ddialliance.org