Workshop on Metadata Standards and Best Practices November 19-20th, 2007 Session 1 Leveraging Metadata Standards in RDC Pascal Heus Open Data Foundation [email protected] http://www.opendatafoundation.org.


Workshop on Metadata Standards and Best Practices
November 19-20th, 2007
Session 1
Leveraging Metadata Standards in RDC
Pascal Heus
Open Data Foundation
[email protected]
http://www.opendatafoundation.org
Outline
• PART 1: General issues & ODaF
– Needs and challenges in statistical data and
metadata management
– Metadata and XML solutions
– Selecting specifications
– Need for tools
– Open Data Foundation
• PART 2: RDC Specific issues
– Metadata in RDCs
– Solutions and benefits
– Tools and ongoing initiatives
• Conclusions / Q&A
Open Data Foundation – IZA 2007/11
What is Metadata?
• Common definition: Data about Data
Labeled stuff
Unlabeled stuff
The bean example is taken from: A Manager’s
Introduction to Adobe eXtensible Metadata Platform,
http://www.adobe.com/products/xmp/pdfs/whitepaper.pdf
Managing data and metadata is challenging!
• “We are in charge of the data. We support our users but also need to protect our respondents!”
• “We want easy access to high quality and well documented data!”
• “We need to collect the information from the producers, preserve it, and provide access to our users!”
• “We have an information management problem”
(Diagram labels: Producers, Users, Librarians, Academic, Government, Sponsors, Business, Policy Makers, General Public, Media/Press)
XML to the rescue!
• XML is driving today’s web service oriented
architecture of the Internet and Intranets
• Using XML, we can capture, structure,
transform, discover, exchange, query, edit
and secure metadata and data
• XML is platform & language independent
and can be used by everyone
• XML is both machine and human readable
• XML is non-proprietary, public domain and
many open tools exist
• Domain specific standards are available!
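As a rough illustration of the "capture, structure, query, edit, exchange" points above, the sketch below builds a tiny metadata record in XML and works with it using only the Python standard library. The element names (survey, variable, label) are made up for this example and do not follow any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified survey metadata; element names are illustrative only.
xml_text = """
<survey id="hh2007">
  <title>Household Survey 2007</title>
  <variable name="age"><label>Age in years</label></variable>
  <variable name="sex"><label>Sex of respondent</label></variable>
</survey>
"""

# Capture / structure: parse the text into an element tree.
root = ET.fromstring(xml_text)

# Query: collect variable names and labels (ElementTree supports a subset of XPath).
labels = {v.get("name"): v.findtext("label") for v in root.findall("./variable")}
print(labels)

# Edit: add a new variable to the record.
new_var = ET.SubElement(root, "variable", name="income")
ET.SubElement(new_var, "label").text = "Monthly income"

# Exchange: serialize back to text so another system can read it.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The same record can be read by any XML-aware tool on any platform, which is the point the slide makes about platform and language independence.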
XML Technical Overview
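(Diagram: XML at the center, surrounded by functions and related technologies)
• Structure: DTD, XSchema
• Capture: XForms
• Manage: Software
• Transform: XSL, XSLT, XSL-FO
• Search: XPath, XQuery
• Discover: Registries, Databases
• Exchange: Web Services, SOAP, REST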
XML Solutions
• “Well documented data, here we come!”
• “Great, I can provide public metadata!”
• “Use our specifications and you will be happy! It will harmonize everything.”
• “Now we can talk to each other!”
(Diagram labels: XML Specs, Producers, Users, Librarians, Academic, Government, Sponsors, Business, Policy Makers, General Public, Media/Press)
Let’s use XML, but….
• “Which specifications should we adopt?”
• “How do we do this? Where are the tools and guidelines?”
(Diagram labels: XML Specs, Producers, Librarians, Users, Open Data Foundation)
Open Data Foundation (ODaF)
• US Based non-profit organization,
established 2006
• Directors, advisors and managers from
statistical and ICT communities
• Project oriented
• Mission
– Focus on socio-economic data
– Adoption of global metadata standards
– Coordinated development of open-source tools
– Capacity building
– Improving data and metadata accessibility and
overall quality
– Operate at the global level
Selecting XML specifications
• A single specification is not enough!
– XML specifications commonly focus on a specific
area of knowledge and/or set of functionalities
– Cannot answer the needs of all actors
• XML mappings between specifications are
possible
– Information can be converted from one domain to
another and be carried across communities
• Which ones should we use?
– Fit for purpose
– Widely accepted and supported
– Can be mapped to a cross-domain family
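To make the idea of mapping between specifications more concrete, here is a minimal sketch that converts a study-level record into Dublin Core elements. The input field names and the mapping table are assumptions made for this example; a real DDI to Dublin Core mapping would be defined against the actual schemas.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

# Hypothetical study-level record; the keys are illustrative,
# not actual DDI element names.
study = {
    "study_title": "Household Survey 2007",
    "producer": "National Statistics Office",
    "abstract": "Nationally representative household survey.",
    "year": "2007",
}

# Assumed mapping from the study fields to Dublin Core elements.
FIELD_TO_DC = {
    "study_title": "title",
    "producer": "creator",
    "abstract": "description",
    "year": "date",
}

def to_dublin_core(record):
    """Convert a study record into an XML element holding Dublin Core fields."""
    root = ET.Element("metadata")
    for field, dc_name in FIELD_TO_DC.items():
        if field in record:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = record[field]
    return root

dc_record = to_dublin_core(study)
print(ET.tostring(dc_record, encoding="unicode"))
```

Keeping the mapping in a plain table, separate from the conversion logic, makes it easy to review with domain experts and to extend to other target specifications.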
A suggested set for socio-economic data
• Statistical Data and Metadata Exchange (SDMX)
– Macrodata, time series, indicators, registries
– http://www.sdmx.org
• Data Documentation Initiative (DDI)
– Microdata (surveys, studies)
– http://www.ddialliance.org
• ISO 11179
– Semantic modeling, concepts, registries
– http://metadata-standards.org/11179/
• ISO 19115
– Geography
– http://www.isotc211.org/
• Dublin Core
– Resources (documentation, images, multimedia)
– http://www.dublincore.org
The need for Tools
• “We produce data not tools! We don’t have the expertise.”
• “We preserve and disseminate data not software! We don’t have the expertise.”
• “We set specifications and standards. Tools are not our mandate.”
• “We use data and software but we don’t build tools! We don’t have the expertise.”
(Diagram labels: Producers, Librarians, XML Specs, Users, Open Data Foundation)
The need for Tools
Open Data Foundation
• Mandated to develop tools
• Provide cross-domain expertise in ICT and statistics
• Provide umbrella for coordinated development
• Ensure inter-operability
• Outline harmonized architecture and environment
• Promote open source / maximize reusability
• Foster global registries
• Resources/Fund raising
• …
ODaF Vision
• Promote and facilitate the production and use of “open data”
– Public metadata, high quality, fully documented, respondent
protected, easy to find, accessible in accordance with statistical
principles and legislation
• Foster a global harmonized framework
– Facilitate the flow of data and metadata
– Promote dialog between all stakeholders
Unlock the Data!
Some Projects & Ideas
• Guidelines for a harmonized architecture
and development environment
• Roadmap for tools development
• XML mappings
• Facility to host development of open source
projects (GForge)
• Provide hosting services for agencies
• Implement registries / catalogs
• Produce training and reference material
• Technical support & capacity building
• Advocacy
ODaF partners / clients
• Statistical agencies / producers
• Data Archives
• Academic & Research communities
• Standard settings agencies & consortiums
• Governmental organizations
• International organizations
• Open source community
• Software developers
• IT Vendors
Growing solutions in a complex environment
(Diagram: areas of concern surrounded by related technologies, tools and standards)
• Areas: TECHNOLOGY, METADATA, PRODUCTION, USE, ANALYSIS, DISSEMINATION, DISCOVERY, PRESERVATION, SECURITY, QUALITY
• Related terms: XML-DB, Programming, XSLT, XPath, SOAP, Databases, Warehouse, Web, GIS, Infrastructure, SAS, Stata, SPSS, Excel, CSPro, Blaise, Toolkit, XML DDI, SDMX, ISO 19115, ISO 11179, DCMI, Registries, Accessibility, Privacy, Legal, Disclosure, Access, SDDS, GDDS, DQAF
What are we concerned with?
Growing solutions in a complex environment
(Diagram repeated from the previous slide)
CHALLENGE
We need a set of tools that work
together in a harmonized
framework. This requires
coordinated efforts and expertise
from the various communities
OPEN DATA FOUNDATION
• Provide cross-domain & IT expertise
• Coordinate and support development
• Knowledge sharing
• Capacity Building
• Provide global vision and guidance
Challenges
• The technology is available today
• The right people are available today
• The need and the will are there
• The real challenges are:
– Tools availability
– Awareness / Understanding of technology
– Change management
– Coordination & Guidance
– Focused resources and funding
– Institutional commitment
• Learn from the past for a better future
• It’s not about data, it’s about people
Summary
• Managing data and metadata is challenging
– Solutions exist to make it easier and provide
better information to unlock the data
• Adopt a set of specifications that answer
your requirements and can connect across
domains
– DDI, SDMX, ISO 11179, Dublin Core, ISO 19115
• Promote the use and development of open
tools, do not work in isolation, get the
appropriate expertise
– Open Data Foundation
PART 2: Metadata & RDCs
PART 2
• RDC metadata perspective
• List of stakeholders / initiatives
• Benefits of adopting metadata
• Challenges
• Tools demo (IHSN Toolkit)
RDC Objectives
• Provide a secure environment for the
researcher to perform in-depth analysis
of sensitive/confidential data in a
cost-effective way
• Facilitate the capture, sharing and
dissemination of research knowledge
• Provide feedback to the producer on data
usage and quality
• Exchange information with other RDCs /
agencies / public
• Overall: benefit all stakeholders: producers,
librarians, researchers, general public, etc.
RDC metadata
• Simple access to a data file and codebook is
insufficient. Researchers need high quality
comprehensive metadata and a
collaborative environment to promote
dynamic research
• Traditionally, survey metadata has focused
on archiving/preservation (current DDI 1/2.x)
• This, however, is insufficient: metadata should
be extended into both the survey production
process and the secondary use of the data
• The new DDI 3.0 meets such requirements
• An RDC is an ideal environment for the capture of
researcher metadata
DDI 3.0 and the Survey Life Cycle
• A survey is not a static process
• It evolves dynamically over time and involves many
agencies/individuals
• DDI 2.x is about archiving, DDI 3.0 extends to the life cycle
• DDI 3.0 is a modular framework available for multiple purposes (use cases)
• Metadata is key to comprehensive capture of knowledge
RDC issues
• Without producer metadata
– Researchers can’t discover data or work
efficiently
• Without researcher metadata
– Producers don’t know about data usage and quality issues
– Other researchers are not aware of what has been done
• Without standards
– Information can’t be properly managed and exchanged
between agencies or with the public
• Without tools
– Can’t capture and preserve/share knowledge
When to capture metadata?
• Metadata must be captured at the time the event occurs!
• Documenting after the fact leads to considerable loss of
information
• This is true for producers and researchers
RDC Metadata Framework
1. Producers provide data & basic docs
2. Need to enhance existing metadata
3. Start capturing researcher metadata
4. Knowledge grows and gets reused
5. Provides usage and quality
feedback to producer / RDC
6. Repeat across surveys/topics
7. Metadata facilitates output
8. Public metadata facilitates data
discovery / fosters global knowledge
9. Metadata exchange between agencies
(Diagram labels: Producers, Data, Producer/Archive Metadata, RDC, Researcher, Research Metadata, Research Output, Public Use metadata, External users)
Metadata Components
• Producer metadata:
– Codebook, questionnaires, reports, methodologies,
processing, scripts, quality, admin, etc.
• Research metadata
– Recodes, analysis, table, scripts, papers, references, logs,
quality, usage
– Activities, discussions, knowledge base
• Outputs
– Papers, presentations, tables
• Public metadata
– Metadata stripped of sensitive information (summary
statistics, sensitive variables, etc.)
• Metadata capture can be manual, semi-automated,
automated
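The derivation of public metadata from the full producer metadata can be sketched as follows. This is only an illustration under stated assumptions: the codebook fragment is loosely modeled on DDI 2.x variable descriptions (var, labl, sumStat), and the sensitive attribute is a hypothetical flag invented for this example, not part of any standard.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified codebook fragment, loosely modeled on DDI 2.x;
# the "sensitive" attribute is a hypothetical flag for this example.
codebook = ET.fromstring("""
<codeBook>
  <var name="region"><labl>Region</labl><sumStat type="vald">1200</sumStat></var>
  <var name="income" sensitive="true"><labl>Household income</labl></var>
  <var name="age"><labl>Age</labl><sumStat type="mean">34.2</sumStat></var>
</codeBook>
""")

def make_public(full):
    """Return a copy without sensitive variables and without summary statistics."""
    public = copy.deepcopy(full)  # never modify the restricted original
    for var in public.findall("var"):
        if var.get("sensitive") == "true":
            public.remove(var)
            continue
        for stat in var.findall("sumStat"):
            var.remove(stat)
    return public

public_codebook = make_public(codebook)
print([v.get("name") for v in public_codebook.findall("var")])
```

Working on a deep copy keeps the full metadata intact inside the RDC while only the reduced version is released.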
RDC Solutions
• Metadata management
– Adopt standards and provide researchers with
comprehensive metadata
– Use related tools to capture research process
– Metadata mining and reporting utilities
• Collaborative environment
– Use web technologies to foster a dynamic research
environment
• Connected and Remote enclaves
– Connect RDCs through secure networks
– Consider virtual data enclave or batch analysis
• Data disclosure
– Protect respondents through sound data disclosure
techniques (using metadata as well)
• Train producers/researchers (methods and data)
Solution Examples
• Simple solutions: use good practices
– File and variable naming conventions, sound
statistical methods
– Comment source code
– Document the work
• Metadata solutions:
– DDI tools, citation database, source code level
metadata capture, variable recodes, table
disclosure, data quality feedback, comparability
• Web based collaboration environment
– Wiki, blogs, discussion groups, events/todo
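One way to picture "source code level metadata capture" for variable recodes is a helper that records what was done at the moment the recode runs. The sketch below is an assumption about how this might look, not a description of any existing tool; the function name, log structure and sample data are all hypothetical.

```python
import json
from datetime import datetime, timezone

recode_log = []  # hypothetical in-memory log; a real tool would persist this

def recode(data, source, target, mapping, note=""):
    """Recode data[source] into data[target] and log the operation as metadata."""
    data[target] = [mapping.get(value) for value in data[source]]
    recode_log.append({
        "source": source,
        "target": target,
        "mapping": mapping,
        "note": note,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return data

# Sample data and recode, both made up for illustration.
data = {"age": [12, 35, 70]}
age_groups = {12: "child", 35: "adult", 70: "senior"}
recode(data, "age", "age_group", age_groups, note="Three broad age bands")

print(data["age_group"])
print(json.dumps(recode_log, indent=2))
```

Because the log entry is written by the same call that performs the recode, the documentation cannot drift from what the code actually did.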
Benefits (1)
• Comprehensive data documentation
– Through good metadata practices,
comprehensive documentation is available to the
researchers
• Preservation, integration and sharing of
knowledge
– Research process is captured and preserved in
harmonized formats
– Research knowledge becomes an integral part of
the survey and available to all
– Reduces duplication of effort and facilitates reuse
– Producer gets feedback from the data users
(usage, quality issues)
Benefits (2)
• Research outputs and dissemination
– Facilitates production of research outputs
– Facilitates dissemination and fosters broader
visibility of research outputs
• Exchange of information
– Metadata exchange between RDC, producers,
librarians
– Importance of public metadata for sensitive
datasets
– Facilitate data discovery (inside and outside
RDC)
• Advanced metadata mining / comparability
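As a small illustration of what metadata-driven comparability checking could mean, the sketch below compares variable-level metadata from two surveys and reports which shared variables use identical category codes. The data structures and survey contents are hypothetical; real comparisons would draw on standardized metadata such as DDI variable descriptions.

```python
# Hypothetical variable-level metadata for two surveys: name -> category codes.
survey_a = {
    "sex": {1: "Male", 2: "Female"},
    "region": {1: "North", 2: "South"},
}
survey_b = {
    "sex": {1: "Male", 2: "Female"},
    "region": {1: "Urban", 2: "Rural"},
    "age": {},
}

def compare(a, b):
    """For each variable present in both surveys, report whether categories match."""
    shared = sorted(set(a) & set(b))
    return {name: a[name] == b[name] for name in shared}

report = compare(survey_a, survey_b)
print(report)
```

Here the variable region exists in both surveys but is coded differently, which is exactly the kind of issue that is hard to spot without machine-readable metadata.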
Answering the tools challenge
• Metadata standards are available but there
is a lack of tools for metadata management
• Several efforts are ongoing
– DDI Alliance, International Household Survey
Network, UK Data Archive, NORC Data Enclave,
Canada RDC, Open Data Foundation
– DDI Foundation Tools Program, UK DExT,
Canada RDC, EU Framework 7
• Joint efforts will minimize costs, maximize
reusability and foster tool harmonization /
interoperability
• Open source model: availability &
sustainability
RDC challenges
• Adopting a good metadata management
framework takes effort
– Survey metadata must first be compiled
– ICT capacity building and tools development
– Producers and researchers need to be trained
• Not only a technological challenge
– change management, training
• Leads to better research, shared knowledge,
better user/producer dialog, improved data
quality
• Meets the mandate of RDC
IHSN Toolkit Quick Demo
1. Import data and compile metadata
2. Import metadata and prepare CD-ROM
3. Generate HTML based CD-ROM