Documenting and disseminating census and survey data sets
Ilpo Survo, United Nations ESCAP, Bangkok, [email protected]
for UNECE Training Workshop on Census Technology for SPECA member countries, Astana, 7-8 June 2007

Content
A. Systematic documenting of census data sets
B. Why disseminate microdata?
C. Microdata Management Toolkit

A. Systematic documenting of census data sets

A good census dataset
• Is documented clearly
• Contains no surprises
• Allows users to
  – Start working effectively quickly
  – Find the data they are interested in
  – Understand what the data are measuring and how the data have been created
  – Assess the quality of the data

Evolving documentation technology
• Own documentation standards => International metadata standards
• National practices => International good practices
• Ad hoc tools => Structuring tools, databases
• Text-based codebooks => XML-based codebooks

Maintain metadata in a centralised database
• Manage definitions, methodology information, variable information and data collection information in one place
• Ensures consistency across data holdings
• Approach useful for planning, data collection, processing, analysis and dissemination

Good practices in data documentation
• Explanatory material
  – Minimum material required to ensure the long-term viability and functionality of a dataset
• Contextual information
  – Material about the context in which the data were collected, and how they were put to use
  – Enables the secondary user to fully understand the background and processes behind the data collection exercise
• Cataloguing material
  – Bibliographic record of the dataset, for proper acknowledgement and citation
  – Basic instrument used for resource discovery
• http://www.esds.ac.uk/news/goodPractice.pdf

B. Why disseminate microdata?

Untapped potential of microdata for national development
• Even the best planned tabulations cannot exhaustively bring out all valuable information from census data
• Diversity, disparities and related causalities are best analysed from microdata, e.g.
  – Tracking the effects of policy interventions on target groups
  – Determining dimensions of within-country disparities
• The quality of research would improve => Return on data collection would increase => National policies could be targeted better => More efficient use of public resources

Factors that might hinder microdata dissemination - Discussion
• Concerns about data confidentiality
• Ambiguous or missing national legislation
• Narrow mandate of statistical agency
• Concerns about data quality
• Low demand from data users

International initiatives
• Marrakech Action Plan for Statistics, http://www.surveynetwork.org/home/docs/Marrakech_Action_Plan_for_Statistics.pdf
• International Household Survey Network, http://www.surveynetwork.org/
• IHSN Microdata Management Toolkit
• ESCAP-World Bank-PARIS21 project on improving access to survey microdata in Asia and the Pacific

ESCAP project on improving access to survey microdata in Asia and the Pacific, 2007-2008
• Household surveys and population and housing censuses, not establishment surveys
• Assessment of status of microdata dissemination
• Regional inventory and data archive of household surveys
• Regional advocacy and training workshops
• On-site training and technical advice on documentation and anonymization

C. Microdata Management Toolkit

Microdata Management Toolkit – Summary
A set of software tools for the documentation, archiving, dissemination and preservation of microdata
1. Metadata Editor
   – Documents survey data in accordance with international standards
2. CD-ROM Builder
   – Generates user-friendly outputs, such as CDs and websites, for dissemination and archiving
3. The Explorer
   – For viewing metadata
   – For re-exporting data to various formats

Download and use
• The Toolkit can be downloaded from http://www.surveynetwork.org/home/?lvl1=tools&lvl2=documentation&lvl3=toolkit
• Except for the Metadata Editor, all Toolkit components are available for free
• Nesstar Editor: one free license for NSOs of the World Bank IDA countries (e.g. Afghanistan, Georgia, Kyrgyz Republic, Moldova, Tajikistan)

Metadata Editor
• Documents survey data in accordance with international standards
  – Data Documentation Initiative (DDI)
  – Dublin Core Metadata Initiative (DCMI)
• Data & metadata in one single file
• Data can be imported from various formats, incl. statistical packages
• Produces survey documentation in PDF format

Extensible Mark-up Language (XML)
• Language to describe data using tags
• Tags conceptually the same as fields in databases
• XML files are regular text files
  – Can be edited with text editors
• XML files, like databases, can be:
  – Searched and queried
  – Edited
• Tutorial: http://w3schools.com/xml

XML example
  <titl> Multiple Indicator Cluster Survey 2005 </titl>
  <altTitl> MICS </altTitl>
  <AuthEnty> National Statistics Office (NSO) </AuthEnty>
  <fundAg abbr="UNICEF"> United Nations Children's Fund </fundAg>
  <collDate date="2005-01" event="start"/>
  <collDate date="2005-03" event="end"/>
  <nation> Popstan </nation>
  <geogCover> National </geogCover>
  <sampProc> 5,000 households, stratified two stages </sampProc>
  <respRate> 98 percent </respRate>

XML advantages
• Creation of a comprehensive checklist of useful metadata elements
• Potential to assess the content of a file by determining whether particular tags are, or are not, within that file
• Creation of a dataset catalogue which can be queried for key metadata elements
• Potential to transform the file into more user-friendly formats, such as HTML, PDF
• XML files can be exchanged across networks or over the Internet using web services or SOAP

CD-ROM Builder
• Integrates with Metadata Editor
• Generates user-friendly outputs (CD-ROM, website) for dissemination and archiving (HTML format)
• Allows customization
  – Branding: look and feel of CD or website
  – Content: single or multiple surveys

CD-ROM Builder process
1 Create new CD-ROM Project
2 Add a survey to the project and select its type and branding
  • Select an existing survey by opening the DDI-XML or Nesstar file
  • The survey branding determines the overall look and feel of the CD
  • The survey type determines the default metadata content
3 Click the Save button to generate the HTML interface
4 After a few minutes, your CD Project is ready for publishing!

CD-ROM Builder sample outputs

Demonstration of Metadata Editor
A live demonstration with Popstan dataset, on-screen in English and Russian

Thank you!
Discussion, questions, answers
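
Supplementary note on the "XML advantages" slide: two of the listed points (checking whether particular tags are present in a file, and querying for key metadata elements) can be illustrated with a few lines of Python. This is only a minimal sketch, not part of the Toolkit. It uses Python's standard xml.etree.ElementTree module on the fragment from the "XML example" slide. The enclosing <record> element, the CHECKLIST contents and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the "XML example" slide, wrapped in a hypothetical
# <record> root element so that it parses as a single XML document.
FRAGMENT = """
<record>
  <titl> Multiple Indicator Cluster Survey 2005 </titl>
  <altTitl> MICS </altTitl>
  <AuthEnty> National Statistics Office (NSO) </AuthEnty>
  <fundAg abbr="UNICEF"> United Nations Children's Fund </fundAg>
  <collDate date="2005-01" event="start"/>
  <collDate date="2005-03" event="end"/>
  <nation> Popstan </nation>
  <geogCover> National </geogCover>
  <sampProc> 5,000 households, stratified two stages </sampProc>
  <respRate> 98 percent </respRate>
</record>
"""

# Hypothetical checklist of tags a catalogue might require; "abstract" is
# included only to show how a missing element is reported.
CHECKLIST = ["titl", "AuthEnty", "nation", "respRate", "abstract"]


def missing_tags(root, checklist):
    """Return the checklist tags that do not occur anywhere under root."""
    return [tag for tag in checklist if root.find(f".//{tag}") is None]


def collection_period(root):
    """Return (start, end) dates taken from the <collDate> elements."""
    dates = {e.get("event"): e.get("date") for e in root.iter("collDate")}
    return dates.get("start"), dates.get("end")


root = ET.fromstring(FRAGMENT)
title = root.findtext("titl").strip()
missing = missing_tags(root, CHECKLIST)
period = collection_period(root)

print(title)
print("Missing:", missing)
print("Collected:", period)
```

A real DDI file nests these elements more deeply and may declare an XML namespace, so the tag lookups would need to be adapted to the actual file.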