Transcript Slide 1

Managing the Metadata Lifecycle
The Future of DDI at GESIS and ICPSR
Peter Granda, ICPSR
Meinhard Moschner, GESIS
Mary Vardigan, ICPSR
Joachim Wackerow, GESIS
Wolfgang Zenk-Möltgen, GESIS
Research Data Life Cycle
Archiving
Concept
Collection
Processing
Distribution
Discovery
Repurposing
Analysis
Current Uses of DDI
• DDI 2 used for many different purposes by many
different archival institutions, e.g., metadata records for
data catalogs, export to Web-based information systems
such as Nesstar, long-term preservation, and PDF
codebooks
• GESIS and ICPSR are developing procedures and
systems to extend use of DDI in their institutions
DDI 3 Expands in Scope
• To date use mainly limited to Distribution and
Archiving stages of data life cycle
• DDI 3 enables use of new elements and structures to
extend markup to other stages of the life cycle - both
earlier and later
• Emphasis is on projects and tasks already in process at
each institution
DDI 3 Use at GESIS
• Structured Comments – Processing
• Translation of EVS Questionnaire – Collection
• Supporting Enhanced Publications – Analysis
• Continuity Guides: Trends by Concepts – Concept, Discovery, Repurposing
Extracting structured information in current workflow
• Example: building derived variables in SPSS
• SPSS setups contain commands and comments
• Necessary steps for using SPSS setups as an information source for DDI
– Improving comments for automated extraction
• formalize layout
• add keywords from a list
– Extraction of structured comments and related commands by a custom tool
– Transformation of this information into DDI 3 fragments
Extracting structured information in current workflow
***v* Variables/DerivedVariables
* DESCRIPTION
*   This section is on derived variables;
***.
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
*   It has the original value from w101
*   when w102 is equal 1
*   otherwise it has the value 5;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
**
* VERSION
*   2009-04-18
* AUTHOR
*   Achim Wackerow
* EMAIL
*   [email protected]
***.
[Diagram labels: SPSS; Extractor; Report (HTML); DDI 3 fragments (GenerationInstruction: Description, Command); Result]
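The extraction step can be sketched in Python. This is a minimal illustration, not the custom tool mentioned on the slide: it assumes every content line of a comment block carries a leading `*`, that blocks run from `***v*` to `***.`, and that SPSS commands sit between `**.` and `**`. The output element names (GenerationInstruction, Description, Command) are borrowed from the diagram labels; the `language` attribute is an assumption, and the result is not schema-valid DDI 3.

```python
import xml.etree.ElementTree as ET

# Keyword list taken from the example comment block on the slide.
KEYWORDS = {"NAME", "DESCRIPTION", "USED VARIABLES", "SOURCE",
            "VERSION", "AUTHOR", "EMAIL"}


def extract_blocks(setup_text):
    """Return one dict per '***v* ... ***.' comment block in an SPSS setup."""
    blocks, block, section, in_source = [], None, None, False
    for raw in setup_text.splitlines():
        line = raw.strip()
        if line.startswith("***v*"):            # block header
            block = {"header": line[5:].strip(), "sections": {}, "commands": []}
            section, in_source = None, False
        elif block is None:
            continue                             # text outside any block
        elif line == "***.":                     # block end
            blocks.append(block)
            block = None
        elif line == "**.":                      # SPSS commands start
            in_source = True
        elif line == "**":                       # SPSS commands end
            in_source = False
        elif in_source:
            if line:
                block["commands"].append(line)
        elif line.startswith("*"):
            text = line.lstrip("*").strip()
            if text in KEYWORDS:                 # a keyword opens a new section
                section = text
                block["sections"].setdefault(section, [])
            elif text and section is not None:   # content of the current section
                block["sections"][section].append(text)
    return blocks


def to_fragment(block):
    """Build a simplified, illustrative XML fragment (assumed element names)."""
    gi = ET.Element("GenerationInstruction")
    description = " ".join(block["sections"].get("DESCRIPTION", []))
    ET.SubElement(gi, "Description").text = description
    ET.SubElement(gi, "Command", {"language": "SPSS"}).text = "\n".join(block["commands"])
    return ET.tostring(gi, encoding="unicode")


SAMPLE = """\
***v* DerivedVariables/w101_new
* NAME
*   w101_new
* DESCRIPTION
*   w101_new is a derived variable from w101;
* USED VARIABLES
*   w101, w102
* SOURCE
**.
compute w101_new = 5 .
if ( w102 = 1 ) w101_new = w101 .
**
* VERSION
*   2009-04-18
***.
"""

blocks = extract_blocks(SAMPLE)
print(blocks[0]["sections"]["NAME"])
print(to_fragment(blocks[0]))
```

Restricting section names to a fixed keyword list mirrors the "add keywords from a list" step: anything not in the list is treated as content, so a typo in a keyword shows up as misplaced text rather than a silently created section.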
Translation of EVS Questionnaire
DSDM
http://zacat.gesis.org
Supporting Enhanced Publications
DDI Alliance
Publications with References to Data:
DDI 3.1 URN contains: Agency, Object, Version
Publication with References (URNs)
http://resolve.gesis.org: find object, return URL
http://www.gesis.org/doc/docxyz: URL of Documentation and/or Data
<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>
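How a resolver might take such a URN apart can be sketched as follows. The pattern is inferred only from the single example URN on the slide (DDI version, class path, agency, then identifiers with versions in parentheses) and is not taken from the DDI 3.1 specification. The lookup table is hypothetical and simply pairs the example URN with the example URL shown on the slide.

```python
import re

# Pattern inferred from the slide's example URN; an assumption, not the spec.
URN_RE = re.compile(
    r"^urn:ddi:(?P<ddi_version>[^:]+):"
    r"(?P<classes>[^=]+)="
    r"(?P<agency>[^:]+):"
    r"(?P<objects>.+)$"
)
OBJECT_RE = re.compile(r"(?P<id>[^.()]+)\((?P<version>[^()]+)\)")


def parse_ddi_urn(urn):
    """Split a DDI URN into DDI version, agency, and (class, id, version) triples."""
    match = URN_RE.match(urn.strip().strip("<>"))
    if match is None:
        raise ValueError(f"not a recognised DDI URN: {urn!r}")
    classes = match["classes"].split(".")
    objects = [(m["id"], m["version"]) for m in OBJECT_RE.finditer(match["objects"])]
    if len(classes) != len(objects):
        raise ValueError("class path and object path differ in length")
    return {
        "ddi_version": match["ddi_version"],
        "agency": match["agency"],
        "objects": [(cls, obj_id, ver) for cls, (obj_id, ver) in zip(classes, objects)],
    }


# Hypothetical resolver table: (agency, scheme id, object id) -> URL.
RESOLVER = {
    ("gesis.de.ddi", "ZA3811_VarSch", "V8"): "http://www.gesis.org/doc/docxyz",
}


def resolve(urn):
    """Find the object for a URN and return its URL, or None if unknown."""
    parsed = parse_ddi_urn(urn)
    key = (parsed["agency"],) + tuple(obj_id for _, obj_id, _ in parsed["objects"])
    return RESOLVER.get(key)


EXAMPLE = "<urn:ddi:3_1:VariableScheme.Variable=gesis.de.ddi:ZA3811_VarSch(1_0).V8(1_0)>"
parsed = parse_ddi_urn(EXAMPLE)
print(parsed["agency"], parsed["objects"])
print(resolve(EXAMPLE))
```

The sketch ignores the version when resolving; a real resolver would presumably need a rule for which version of the documentation to return.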
Supporting Enhanced Publications
DSDM DDI 3 EPE Simple Export Wizard 1.2.0
Grouping Trends
• Continuity guides in different contexts
– Synoptical question / variable lists
– Documentation of changes in question wording / answer scales
• Systematic organization by conceptual categories
– CodebookExplorer tool (relational DB)
– Publication as HTML links on variable level in ZACAT
• Taking advantage of DDI3 in the future
– Defining the standard and comparison
– Qualifying relations (e.g. q-text modified, scale modified,…)
Continuity guides
Literal question text over time
Conceptual categories
Deviations in answer categories
Trends by concepts
Trend variables by study
Conceptual categories
Country 1
Country 2
DDI3 RESOURCE
„Ex-post Standard“
Universe
Concept
Comparison map
• Equivalency
• Relationship
• Description
Data Collection
STUDY UNIT 1 … n
DataCollection
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Q">
    <dc:QuestionText>
      <dc:LiteralText>
        <dc:Text>Do you …?</dc:Text>
      </dc:LiteralText> …
    <dc:CodeDomain>
      <r:CodeSchemeReference>
        <r:ID>CODS1</r:ID>
      </r:CodeSchemeReference>
Logical Product
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
    …
<l:CodeScheme id="CODS1">
  <l:CategorySchemeReference>
    <r:ID>CATS1</r:ID>
  </l:CategorySchemeReference>
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>1</l:Value>
  </l:Code> …
<dc:QuestionScheme id="QS">
  <dc:QuestionItem id="Qn">
    …
    <dc:Text>Have you …?</dc:Text>
    …
LogicalProduct
Label
<>identical<>
Values
<>different<>
<>generation instruction<>
<>scale reversed<>
<l:CategoryScheme id="CATS1">
  <l:Category id="Cat1">
    <r:Label>often</r:Label>
    …
<l:CodeScheme id="CODS1">
  …
  <l:Code isDiscrete="true">
    <l:CategoryReference>
      <r:ID>Cat1</r:ID>
    </l:CategoryReference>
    <l:Value>4</l:Value>
  </l:Code> …
GROUP
STUDY UNIT 8-14
DataCollection
…
LogicalProduct
…
GROUP
STUDY UNIT 15-x
DataCollection
…
LogicalProduct
…
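The comparison annotations on this slide (labels identical, values different, scale reversed, generation instruction) suggest a simple check between the code scheme of the ex-post standard and that of a study unit. The sketch below is only an illustration of that idea. Only the label "often" with the values 1 and 4 comes from the slide; the other labels, the function names, and the relation strings are assumptions.

```python
def compare_code_schemes(standard, other):
    """Classify how `other` relates to `standard`.

    Both arguments map a category label to its code value.
    """
    if set(standard) != set(other):
        return "labels differ"
    if standard == other:
        return "identical"
    low, high = min(standard.values()), max(standard.values())
    # A reversed scale mirrors every value around the middle of the range.
    if all(other[label] == low + high - value for label, value in standard.items()):
        return "scale reversed"
    return "values differ"


def recode_map(standard, other):
    """Map each value used in `other` to the standard value of the same label."""
    return {other[label]: standard[label] for label in standard}


# Hypothetical four-point scale; only "often" (1 vs. 4) is taken from the slide.
standard_scheme = {"often": 1, "sometimes": 2, "rarely": 3, "never": 4}
study_scheme = {"often": 4, "sometimes": 3, "rarely": 2, "never": 1}

relation = compare_code_schemes(standard_scheme, study_scheme)
recode = recode_map(standard_scheme, study_scheme)
print(relation)
print(recode)
```

The recode mapping is the kind of information a generation instruction would carry: it states how the study's values are transformed into the standard's values.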
DDI 3 Use at ICPSR
• Information collected from data producers in pre-collection phase – Concept
• Metadata output from CAI applications – Data Collection
• Processor's dashboard – Data Processing
• Metadata mining: New faceted search tool to facilitate
discovery through more precise searching – Data
Discovery
• Relational database for comparison and harmonization
across studies – Repurposing
SMDS Metadata Modules
OAIS: SIP, AIP
Life cycle stages: Concept, Collection, Processing, Repurposing
Tools: Custom Tools (e.g. Forms-based); CAI Tools (MQDS etc.); Information extracted from SPSS etc.
DDI as backbone for structured metadata
SIP
• A combination of this information forms a traditional SIP.
• Information from each life cycle stage, sent to the archive, can be understood as a dynamic SIP.
• Self-archiving by web forms can be offered for the different stages.
AIP
• An AIP must be specially built, because the metadata can include just references to other reused metadata.
• An AIP should include everything of one study. DDI can also be the main structure of the AIP. Data can be inline in DDI. An AIP would exist beside the core structure in the archive.
• An easy roundtrip should be possible between the core structure and the AIP.
• The purpose of the AIP is comparable to PDF/A where all fonts are included.
Archive
• The structured metadata combined with data forms the core of the archive.
• It would be organised in a way where metadata can be reused and information can be ingested and distributed in a dynamic way.
• The core structure is geared toward efficient processing and reuse of metadata.
Data / Documents outside of DDI
DIP
Distribution Packages
Web information system
Search engines.
Distribution
Statistical packages
Online Analysis.
Discovery
Analysis
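The relation between the core structure (metadata held by reference) and a self-contained AIP can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the repository, the dictionary layout, and the function names are all hypothetical, and real DDI packaging would work on XML instances. The identifiers Q and CODS1 are borrowed from the earlier XML example.

```python
import copy

# Hypothetical repository of reusable metadata items, keyed by identifier.
REPOSITORY = {
    "Q": {"type": "question", "text": "example question text"},
    "CODS1": {"type": "code_scheme", "codes": {"often": 1}},
}


def build_aip(core_study, repository):
    """Inline every referenced item so the package is self-contained."""
    aip = copy.deepcopy(core_study)
    aip["items"] = {ref: copy.deepcopy(repository[ref])
                    for ref in core_study["references"]}
    return aip


def to_core(aip):
    """Drop the inlined copies again and keep only the references."""
    core = copy.deepcopy(aip)
    core.pop("items", None)
    return core


core = {"study": "Study 1", "references": ["Q", "CODS1"]}
aip = build_aip(core, REPOSITORY)

print(sorted(aip["items"]))
print(to_core(aip) == core)
```

The roundtrip holds here because the AIP keeps the reference list alongside the inlined copies, much as a PDF/A file keeps its fonts embedded while still naming them.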
DDI-based archive as collection of reusable components
• Metadata in DDI is structured in small items which can be identified and maintained by one or more institutions
• These parts can be
– the basis for comparison and metadata mining (discovery of new relationships)
– a candidate for reuse in other studies or new studies (like standard questions or variables)
Study 1
Study-specific information
Items for reuse
Study 1
Study-specific information
Items for reuse
New study
Repository of
reusable components
• Standard concepts
• Standard questions
• Standard variables
• Harmonized information
• Controlled vocabularies
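One way to find candidates for such a repository is to look for items that recur across studies. The sketch below does this for question texts only, using a deliberately naive normalisation (lower-casing and collapsing whitespace). The study names and question texts are invented for illustration, and real metadata mining would need more than exact matching.

```python
from collections import defaultdict


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def reuse_candidates(studies):
    """Return question texts that occur in more than one study.

    `studies` maps a study name to a list of question texts.
    """
    seen = defaultdict(set)
    for study, questions in studies.items():
        for text in questions:
            seen[normalise(text)].add(study)
    return {text: sorted(names) for text, names in seen.items() if len(names) > 1}


# Invented example data.
studies = {
    "Study 1": ["How often do you read a newspaper?", "What is your age?"],
    "Study 2": ["how often do you  read a newspaper?", "In which year were you born?"],
}

candidates = reuse_candidates(studies)
print(candidates)
```

Items found this way would still need review before being promoted to standard questions, since identical wording does not guarantee identical answer scales.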
Issues for Discussion
• Advantages and disadvantages of seeking to capture
additional metadata throughout the data life cycle
• How much information to make available to funding
agencies, data producers, and secondary users?
• Rules for structured documentation and delivery of items
to archives for preservation
• An overall DDI tool to capture and curate all metadata
and data – the Holy Grail???