Documenting and disseminating census and survey data sets Ilpo Survo, United Nations ESCAP, Bangkok, [email protected] for UNECE Training Workshop on Census Technology for SPECA member countries,

Documenting and disseminating
census and survey data sets
Ilpo Survo, United Nations ESCAP, Bangkok,
[email protected]
for
UNECE Training Workshop on Census Technology for
SPECA member countries, Astana, 7-8 June 2007
Content
A. Systematic documenting of census data sets
B. Why disseminate microdata?
C. Microdata Management Toolkit
A. Systematic documenting of census
data sets
A good census dataset...
• Is documented clearly
• Contains no surprises
• Allows users to
– Start working effectively quickly
– Find the data they are interested in
– Understand what the data are measuring and how the data
have been created
– Assess the quality of the data
Evolving documentation technology
• Own documentation standards => International
metadata standards
• National practices => International good practices.
• Ad hoc tools => Structuring tools, databases
• Text-based codebooks => XML-based codebooks
Maintain metadata in a centralised database
• Manage definitions, methodology information,
variable information, data collection information in
one place
• Ensures consistency across data holdings
• Approach useful for planning, data collection,
processing, analysis and dissemination
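A minimal sketch of the centralised approach, assuming a small relational store: each definition is recorded once and every data holding refers to it, which is what keeps definitions consistent. The table names, column names, dataset names and the definition text are hypothetical and only illustrate the idea.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY,
        name       TEXT UNIQUE NOT NULL,
        definition TEXT NOT NULL
    );
    CREATE TABLE dataset_variable (
        dataset    TEXT NOT NULL,
        variable   TEXT NOT NULL,
        concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
        PRIMARY KEY (dataset, variable)
    );
""")

# The definition is stored once (illustrative wording, not an official definition)...
conn.execute(
    "INSERT INTO concept (concept_id, name, definition) VALUES (?, ?, ?)",
    (1, "household", "Persons who share living quarters and meals"),
)
# ...and is referenced by variables in different data holdings.
conn.executemany(
    "INSERT INTO dataset_variable (dataset, variable, concept_id) VALUES (?, ?, ?)",
    [("Census 2005", "HH_ID", 1), ("MICS 2005", "HHNUM", 1)],
)


def definitions_for(concept_name):
    """Return (dataset, variable, definition) rows for one concept."""
    return conn.execute(
        """SELECT v.dataset, v.variable, c.definition
           FROM dataset_variable v JOIN concept c USING (concept_id)
           WHERE c.name = ?
           ORDER BY v.dataset""",
        (concept_name,),
    ).fetchall()


rows = definitions_for("household")
for dataset, variable, definition in rows:
    print(dataset, variable, definition)
```

Because both variables point at the same concept row, correcting the definition in one place updates it for every dataset that uses it.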
Good practices in data documentation
• Explanatory material
– Minimum material required to ensure the long-term viability and
functionality of a dataset
• Contextual information
– Material about the context in which the data were collected, and how
they were put to use
– Enables the secondary user to fully understand the background and
processes behind the data collection exercise.
• Cataloguing material
– Bibliographic record of the dataset, for proper acknowledgement
and citation
– Basic instrument used for resource discovery
• http://www.esds.ac.uk/news/goodPractice.pdf
B. Why disseminate microdata?
Untapped potential of microdata for national
development
• Even the best planned tabulations cannot exhaustively
bring out all valuable information from census data
• Diversity, disparities and related causalities are best
analysed from microdata, e.g.
– Tracking the effects of policy interventions on target
groups
– Determining dimensions of within-country disparities
• The quality of research would improve
=> Return on data collection would increase
=> National policies could be targeted better
=> More efficient use of public resources
Factors that might hinder microdata dissemination
- Discussion
• Concerns about data confidentiality
• Ambiguous or missing national legislation
• Narrow mandate of statistical agency
• Concerns about data quality
• Low demand from data users
International initiatives
• Marrakech Action Plan on Statistics,
http://www.surveynetwork.org/home/docs/Marrakech_Action_Plan_for_Statistics.pdf
• International Household Survey Network,
http://www.surveynetwork.org/
• IHSN Microdata Management Toolkit
• ESCAP-World Bank-PARIS21 project on improving access to
survey microdata in Asia and the Pacific
ESCAP project on improving access to survey
microdata in Asia and the Pacific, 2007-2008
• Household surveys and population and housing
censuses, not establishment surveys
• Assessment of status of microdata dissemination
• Regional inventory and data archive of household
surveys
• Regional advocacy and training workshops
• On-site training and technical advice on
documentation and anonymization
C. Microdata Management Toolkit
Microdata Management Toolkit – Summary
A set of software tools for the documentation, archiving,
dissemination and preservation of microdata
1. Metadata Editor
– Document survey data in accordance with international
standards
2. CD-ROM Builder
– Generates user-friendly outputs, such as CDs, websites,
for dissemination and archiving
3. The Explorer
– For viewing metadata
– For re-exporting data to various formats
Download and use
• The Toolkit can be downloaded from
http://www.surveynetwork.org/home/?lvl1=tools&lvl2=documentation&lvl3=toolkit
• Except for the Metadata Editor, all Toolkit components are
available for free
• Nesstar Editor: One free license for NSOs of the
World Bank IDA countries (e.g. Afghanistan, Georgia,
Kyrgyz Republic, Moldova, Tajikistan)
Metadata Editor
• Documents survey data in accordance with
international standards
• Data Documentation Initiative (DDI)
• Dublin Core Metadata Initiative (DCMI)
• Data & metadata in one single file
• Data can be imported from various formats, incl.
statistical packages
• Produces survey documentation in PDF format
Extensible Markup Language (XML)
• Language to describe data using tags
• Tags conceptually the same as fields in
databases
• XML files are regular text files
• Can be edited with text editors
• XML files, like databases, can be:
• Searched and queried
• Edited
• Tutorial: http://w3schools.com/xml
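As a rough illustration of "searched and queried" and "edited", the sketch below uses Python's standard xml.etree.ElementTree module. The catalogue structure, element names, survey titles and the country "Otherland" are hypothetical and are not part of any metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: element names and values are illustrative only.
CATALOGUE = """
<catalogue>
  <survey year="2005"><title>Household Survey A</title><country>Popstan</country></survey>
  <survey year="2006"><title>Household Survey B</title><country>Popstan</country></survey>
  <survey year="2006"><title>Labour Survey C</title><country>Otherland</country></survey>
</catalogue>
"""

root = ET.fromstring(CATALOGUE)

# Query on an attribute: tags and attributes act like fields in a database.
titles_2006 = [s.findtext("title") for s in root.findall("survey[@year='2006']")]

# Query on the content of a child tag.
popstan = [
    s.findtext("title")
    for s in root.findall("survey")
    if s.findtext("country") == "Popstan"
]

# Edit one value, then serialise the tree back to plain text.
root.find("survey[@year='2005']/title").text = "Household Survey A (revised)"
updated = ET.tostring(root, encoding="unicode")

print(titles_2006)
print(popstan)
print(updated)
```

The same file could equally be opened and changed in a text editor, since it is plain text.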
XML example
<titl> Multiple Indicator Cluster Survey 2005 </titl>
<altTitl> MICS </altTitl>
<AuthEnty> National Statistics Office (NSO) </AuthEnty>
<fundAg abbr= "UNICEF">United Nations Children's Fund </fundAg>
<collDate date= "2005-01" event="start"/>
<collDate date= "2005-03" event="end"/>
<nation> Popstan </nation>
<geogCover> National </geogCover>
<sampProc> 5,000 households, stratified two stages </sampProc>
<respRate> 98 percent </respRate>
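A short sketch of how a program might read a few of the elements shown above, using Python's standard library. The <record> wrapper is an assumption added only because a well-formed XML document needs a single root element; it is not a DDI element, and a real DDI file nests these tags inside its own sections.

```python
import xml.etree.ElementTree as ET

# A subset of the elements from the example above.
FRAGMENT = """
<titl> Multiple Indicator Cluster Survey 2005 </titl>
<altTitl> MICS </altTitl>
<collDate date="2005-01" event="start"/>
<collDate date="2005-03" event="end"/>
<nation> Popstan </nation>
<respRate> 98 percent </respRate>
"""

# <record> is a hypothetical wrapper, used here only to give the fragment one root.
root = ET.fromstring(f"<record>{FRAGMENT}</record>")

# Element text is read by tag name; surrounding spaces are stripped.
title = root.findtext("titl").strip()
nation = root.findtext("nation").strip()

# Attributes carry structured values, here the start and end of data collection.
collection = {el.get("event"): el.get("date") for el in root.findall("collDate")}

print(title)
print(nation)
print(collection)
```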
XML advantages
• Creation of a comprehensive checklist of useful
metadata elements
• Potential to assess the content of a file by
determining whether particular tags are, or are
not, within that file
• Creation of a dataset catalogue which can be queried for
key metadata elements
• Potential to transform the file into more user-friendly formats, such as HTML, PDF
• XML files can be exchanged across networks or over the
Internet using web services or SOAP
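Two of these advantages can be sketched in a few lines of Python: checking a file against a checklist of tags, and transforming it into HTML. The checklist, the <record> wrapper and the function names are hypothetical. A real checklist would follow the chosen metadata standard, and a production transformation would more likely use a stylesheet.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical checklist of metadata elements that every record should contain.
CHECKLIST = ["titl", "AuthEnty", "nation", "sampProc", "respRate"]

# Illustrative record; <record> is a wrapper assumed for this sketch.
RECORD = """
<record>
  <titl>Multiple Indicator Cluster Survey 2005</titl>
  <AuthEnty>National Statistics Office (NSO)</AuthEnty>
  <nation>Popstan</nation>
</record>
"""


def missing_tags(xml_text, checklist):
    """Return the checklist tags that do not occur anywhere in the file."""
    root = ET.fromstring(xml_text)
    present = {el.tag for el in root.iter()}
    return [tag for tag in checklist if tag not in present]


def to_html(xml_text):
    """Render the top-level elements as a two-column HTML table."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{escape(el.tag)}</th><td>{escape((el.text or '').strip())}</td></tr>"
        for el in root
    )
    return f"<table>{rows}</table>"


missing = missing_tags(RECORD, CHECKLIST)
page = to_html(RECORD)
print(missing)
print(page)
```

Running the same check over every file in an archive gives a simple picture of how complete the documentation is.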
CD-ROM Builder
• Integrates with Metadata Editor
• Generates user-friendly outputs (CD-ROM, website)
for dissemination and archiving (HTML format)
• Allows customization
– Branding: look and feel of CD or website
– Content: single or multiple surveys
CD-ROM Builder process
1. Create new CD-ROM Project
2. Add a survey to the project and select its type and branding
– Select an existing survey by opening the DDI-XML or Nesstar file
– The survey branding determines the overall look and feel of the CD
– The survey type determines the default metadata content
3. Click the Save button to generate the HTML interface
4. After a few minutes, your CD Project is ready for publishing!
CD-ROM Builder sample outputs
Demonstration of Metadata Editor
A live demonstration with Popstan dataset, on-screen
in English and Russian
Thank you!
Discussion, questions, answers