CountryData SDMX for Development Indicators
MDG DSD vs. the Di Database: Using the Mapping Tool

MDG Data Structure Definition (DSD) background
• Developed by SDMX Task Team of IAEG on MDGs
• Supports exchange of MDG Indicator data between international agencies (UN, UNICEF, UNESCO, …)
• Implemented in SDMX 2.0
• Latest version (2.4) finalised in Feb 2013

DevInfo (Di) background
• Data dissemination software supported and promoted by UNICEF
• DevInfo7 (Di7) launched in Nov 2012
• SDMX 2.1 & 2.0 compliant
• Web-based software
• 9 out of 11 project countries using DevInfo
• Stable version compared to previous releases

Simple relation between Di & DSD
Di Database                                 MDG DSD
                                            Frequency (Default = “Annual”)
Area                                        Reference Area
Indicator                                   Series
Unit                                        Units of measurement
                                            Unit multiplier (Default = 0)
Subgroup (i.e. Sex, Age, Location etc.)     Location (Default = “Total”)
                                            Age group (Default = “All Ages”)
                                            Sex (Default = “Both Sexes”)
                                            Source Type (Default = “NA”)
Source                                      Source details
Time Period                                 Time Period
                                            Time period details
                                            Nature of data points (Default = “C”)
Footnotes                                   Footnotes

Mapping to the DSD
• DSD dimensional structure means values are mandatory for LOCATION, SEX & AGE GROUP.
• Due to the nature of this domain (i.e. MDGs), it is not obvious which values should be used in these dimensions.
• For example, what is SEX for “Births attended by skilled personnel”: Not Applicable? Total? Female?

Mapping to the DSD
• Inconsistent mappings lead to duplications and other anomalies.
• In CountryData, mappings for indicators/time series are agreed before data exchange (see mapping for MDGs from 1st workshop).
• However, this is just one side of the story…

Mapping to the DSD
• Understanding the structure and contents of the origin database is fundamental to the mapping process.
• Mapping to the DSD subjects the data to certain ‘restrictions’ that it is not bound by in the database (and vice versa).
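As a rough illustration of the Di-to-DSD relation and its defaults, the sketch below fills a DSD-style key from a Di record and falls back to the slide's default values wherever a subgroup part is missing. The dictionary keys (`FREQ`, `REF_AREA`, and so on), the `DiRecord` class and the sample record are hypothetical names chosen for the example, not the official DSD concept identifiers or any DevInfo API.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Defaults as listed on the "Simple relation between Di & DSD" slide.
# The key names are hypothetical, chosen only for this illustration.
DSD_DEFAULTS: Dict[str, str] = {
    "FREQ": "Annual",
    "UNIT_MULT": "0",
    "LOCATION": "Total",
    "AGE_GROUP": "All Ages",
    "SEX": "Both Sexes",
    "SOURCE_TYPE": "NA",
    "NATURE": "C",
}


@dataclass
class DiRecord:
    """A simplified Di data value: Area + IUS + Time Period (hypothetical shape)."""
    area: str
    indicator: str
    unit: str
    time_period: str
    location: Optional[str] = None  # subgroup parts may be absent in Di
    age: Optional[str] = None
    sex: Optional[str] = None


def to_dsd_key(rec: DiRecord) -> Dict[str, str]:
    """Build a DSD-style key, applying defaults for missing subgroup parts."""
    key = dict(DSD_DEFAULTS)
    key.update({
        "REF_AREA": rec.area,
        "SERIES": rec.indicator,
        "UNITS": rec.unit,
        "TIME_PERIOD": rec.time_period,
    })
    for dim, value in (("LOCATION", rec.location),
                       ("AGE_GROUP", rec.age),
                       ("SEX", rec.sex)):
        if value is not None:
            key[dim] = value
    return key


# Sample record (invented for the example): only the Age part of the subgroup is set.
sample = DiRecord(area="Country X", indicator="Literacy rate",
                  unit="Percent", time_period="2010", age="15-24 yr")
dsd_key = to_dsd_key(sample)
print(dsd_key)
```

Because LOCATION, SEX and AGE GROUP are mandatory in the DSD, every record ends up with some value in those dimensions; whether the default is the right value is exactly the question the slides raise.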
Mapping to the DSD
The mapping tool in the Di software is designed to work with the Di database as simply as possible…
• the tool is based on mapping between the codelists of the DSD and the origin database;
• certain situations require some further manual effort to map a time series; and
• sometimes a “fix” is required to the database where the data simply isn’t valid or is duplicated.
Therefore it’s good to review the Di structure to understand where these issues usually occur.

Di Data Architecture
• Area: hierarchical dimension
• IUS = Indicator, Unit and Subgroup: time series data are stored with the combination of the 3 dimensions
• Subgroup: combination of one or more sub-dimensions
• Source & Time Period: together with IUS “uniquely” defines each data value
• Footnote: “free text” field stored with data value

Di INDICATOR
IUS: Indicator Unit Subgroup
• Indicator, for example: Infant Mortality, AIDS Death, Malaria Death
• Similar to SERIES in the DSD
• Contains only Indicator specific values

Di UNIT
IUS: Indicator Unit Subgroup
• Unit, for example: Percentage, Number, USD, Square KM
• Similar to UNIT of Measurement in DSD
• Contains only Unit specific values

Di SUBGROUP
IUS: Indicator Unit Subgroup
• SubGroup Dimension: combination of one or more sub-dimensions
• “Age”, “Sex”, “Location” and “Other” sub-dimensions are set initially in database
• Specific values can be created under each sub-dimension
• Relate to SEX, AGE GROUP and LOCATION in DSD.

Di SUBGROUP
IUS: Indicator Unit Subgroup
Formation Logic:
Sub-Dimension     Sub-Dimension values
Age               < 1 Year, < 5 Year, 5 – 10 Year
Sex               Male, Female
Location          Urban, Rural, Total
Other             Rice, Wheat
SUBGROUP (Combination): <1 Year Male <5 Year Female Rural Urban

Di SUBGROUP
Di SOURCE
Di TIME PERIOD

Di Mapping Tool: Introduction
Once data exists in the Di7 web-based software, it can be mapped and published in a form which conforms with the MDG DSD.
This is all done online in the Di7 web-based repository, through the administration profile, so let’s begin…

Getting Started…
Scroll down to ‘Registry’ menu
Log onto administrative profile
Full access to ‘Registry’ features
Prepare the database for mapping
Prepares the SDMX artefacts
Ready to ‘Upload’ the DSD
Choose a DSD from your folders
DSD Upload is a success…
Now you are ready to map…

1st Step: Codelist mapping

1st Step: DSD Codelists

Indicator CodeList
SH_HIV_INCD    HIV incidence rate
SH_MLR_MORT    Notified cases of malaria
SE_ADT_1524    Literacy rate
SE_PRM_CMPL    Primary completion rate

UNIT CodeList
NA                        Not applicable
CUR_LCU                   Local currency
USD                       USD
NUMBER                    Number
RATIO                     Ratio
PERCENT                   Percent
KM2                       Square kilometers
T                         Metric Tons
PER_100_LIVE_BIRTHS       Per 100 live births
PER_100_POP               Per 100 population
PER_1000_LIVE_BIRTHS      Per 1,000 live births
PER_1000_POP              Per 1,000 population
PER_100000_LIVE_BIRTHS    Per 100,000 live births
PER_100000_POP            Per 100,000 population

Location CodeList
T     Total (national level)
U     Urban
R     Rural

SEX CodeList
NA    Not applicable
F     Female
M     Male
T     Both sexes

AGE CodeList
NA           Not applicable
000_099_Y    All age ranges
000_006_M    under 6 month olds
000_005_Y    under 5 year olds
000_001_Y    under 1 year olds
000_018_Y    under 18 year olds
000_006_Y    under 6 year olds
010_005_Y    10-14 year olds
015_005_Y    15-19 year olds
015_010_Y    15-24 year olds
015_035_Y    15-49 year olds
006_054_M    6-59 months old
006_009_Y    6-14 year olds
005_013_Y    5-17 year olds
015_050_Y    15-64 year olds

1st Step: (A) Map Indicator codes
1st Step: (B) Map Unit codes
1st Step: (C) Map Subgroup codes
1st Step: (C) Choose Subgroup list
1st Step: (C) Map Age subgroup
1st Step: (C) Map Sex & Location
1st Step: (D) Map Area
1st Step: Save codelist mappings
1st Step: Ignore warning
1st Step: Confirm mapping saved

Exercise 1: Codelist mapping
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Map the codelists (where possible) for
• Unit
• Age
• Sex
• Location
• Area
And just one indicator, “Antenatal care coverage for at least one visit”

1st Step: Complete
2nd Step: Confirm IUS mapping
2nd Step: Save IUS Mappings

Exercise 2: Mapping time series
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Map the time series for
1. “Antenatal care coverage for at least four visits”
2. “Employment to population ratio”
3. “Literacy rate of 15-24 year-olds”
4. “Death rate associated with malaria”
5. “Proportion of population using solid fuels”

2nd Step: Complete
Final Step: Register the mappings
Final Step: Select mappings
Final Step: Generate SDMX-ML
Final Step: Complete

Exercise 3: Publish time series
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Publish/register the time series for
1. “Antenatal care coverage for at least four visits”
2. “Employment to population ratio”
3. “Literacy rate of 15-24 year-olds”
4. “Death rate associated with malaria”
5. “Proportion of population using solid fuels”

Why the 2nd step?
• The default values for SEX, LOCATION or AGE GROUP mapping may not be applicable to all mappings.
• The codelist mapping may only provide a partial mapping of the time series (i.e. more information is required).
• These changes are made in the 2nd step.

Where are the default values?
Admin panel: Application settings
Insert screenshot/details of admin panel and default value storage…
Application settings has all mapping default values

Manual mapping of SUBGROUP
• Where a subgroup value is missing the default values will apply, for example…
Indicator - Unit: Antenatal care coverage for at least one visit - Percent
Sex            Female      Female      Female
Location       ?           Rural       Urban
Age            15-49 yr    15-49 yr    15-49 yr
Time Period
2000           95.6        92.2        96.5
2004           96.5        95.9        96.3
2006           98.8        98.7        98.6
2010           98          97.4        98.1
Default Values
• Location = T
• …
SERIES LABEL                                     UNITS      LOCATION   AGE_GROUP   SEX
Antenatal care coverage, at least four visits    PERCENT    T          015_035_Y   F
Antenatal care coverage, at least one visit      PERCENT    T          015_035_Y   F

Manual mapping of SUBGROUP
• So subgroup coverage affects the number of manual changes which have to be made…
Indicator - Unit: Antenatal care coverage rate - Percent
Subgroup for Age and Sex?
Location       Total       Rural       Urban
Time Period
2000           44.8        41.2        67.3
2005           71.8        70.4        80.7
2010           89.1        87.6        97
Default Values
• …
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL                                     UNITS      LOCATION   AGE_GROUP   SEX
Antenatal care coverage, at least four visits    PERCENT    T          015_035_Y   F
Antenatal care coverage, at least one visit      PERCENT    T          015_035_Y   F

Manual mapping of SUBGROUP
• Common example of where default subgroup mappings do not apply
Indicator - Unit: Land under forest cover - Percent
Subgroups?
Time Period    Data Value
1993           59.82
1997           58.6
2002           61.15
2005           60
2006           59.09
2008           57.99
2009           57.6
2010           57.56
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = Both sexes
SERIES LABEL                   UNITS      LOCATION   AGE_GROUP   SEX
Land area covered by forest    PERCENT    T          NA          NA

Manual mapping of SUBGROUP
• Common example of where default subgroup mappings do not apply
Indicator - Unit: Adolescent birth rate - Births per woman
Subgroup for Sex?
Location       Rural       Total       Urban
Age            15-19 yr    15-19 yr    15-19 yr
Time Period
1995                       114
1999                       77
2003                       69.4
2005           54.6        59.8        63.2
2008-2009      81          67          62
Default Values
• …
• …
• Sex = T
SERIES LABEL             UNITS           LOCATION   AGE_GROUP   SEX
Adolescent birth rate    PER_1000_POP    T          015_005_Y   F

Manual mapping of SUBGROUP
• Common example of where default subgroup mappings do not apply
Indicator - Unit: Children under-five sleeping under insecticide-treated net (ITN) - Percent, 2011
Subgroup for Location, Age and Sex?
Other ?        Data Value
Rural          45.9
Total          39
Urban          30.1
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL                                            UNITS      LOCATION   AGE_GROUP   SEX
Children sleeping under insecticide-treated bed nets    PERCENT    T          000_005_Y   T

Manual mapping of SUBGROUP
• If the subgroups are sorted more simply, this also helps with the mapping:
Indicator - Unit: Condom use at last high-risk sex - Percent
Subgroup columns: Location (Total, Rural, Urban), Sex (Female, Male), Age (15-24 yr), Other (Total 15-24 yr) ?
Time Period: 2000-2001, 2002-2003, 2004-2005, 2005-2006, 2006, 2012
Total 15-24 yr: 2000-2001 49.8; 2002-2003 55.1; 2005-2006 52.9
Female and Male 15-24 yr columns, data values (in slide order): 39 61 48 27.1 38.3 70 53 65.3 54.5 73 47.7 51.3 66.9 70.5
Default Values
• Location = T
• …
SERIES LABEL                        UNITS      LOCATION   AGE_GROUP   SEX
Condom use at last high-risk sex    PERCENT    T          015_010_Y   M
Condom use at last high-risk sex    PERCENT    T          015_010_Y   F

Manual mapping of SUBGROUP
• Common example of where default subgroup mappings do not apply
Indicator - Unit: Condom use at last high-risk sex - Percent
Subgroup for Location, Age and Sex?
Other ? columns: Female 15-24 yr, Male 15-24 yr, Rural female 15-24 yr, Rural male 15-24 yr, Urban female 15-24 yr, Urban male 15-24 yr
Time Period: 2003, 2008
Data values (in slide order): 32.7 29.2 51.7 46.3 21.2 21.6 38.9 40.6 32.5 33.5 50.3 52
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL                        UNITS      LOCATION   AGE_GROUP   SEX
Condom use at last high-risk sex    PERCENT    T          015_010_Y   M
Condom use at last high-risk sex    PERCENT    T          015_010_Y   F

Back to mapping…
2nd Step: Amend Indicator
When using the check box to tick the mapping, you are “fixing” the mapped DSD values. If the box is unchecked again and the mappings saved, then the DSD values revert to those mapped at codelist/default values (i.e. any manual changes are undone).
Final Step: Register new mappings

Exercise 4: Amend time series
Use unstats.un.org/unsd/demodiweb[1-6]
Map/amend/publish the time series for:
1. “Antenatal coverage rate”
2. “Children orphaned by AIDS”
3. “Children under-five sleeping under insecticide-treated net (ITN)”
4. “Proportion of births attended by skilled health personnel”
5. “Share of women in wage employment in the non-agricultural sector”
6. “Proportion of urban population living in slums”

More complex mappings under the 1st and 2nd mapping step?
• The most common changes made to mappings are between subgroups and the Sex, Age Group and Location dimensions.
• But sometimes manual changes are required between Di and DSD indicator and unit, either…
• More than one Di code relates to a single DSD code, OR
• More than one DSD code relates to a single Di code

Many-to-one mapping for Indicator codelist (Example 1)
Indicator - Unit                                                    Time Period   Data Value
Proportion of seats headed by women in national parliament - Percent   1999          11.1
Seats held by men in national parliament - Number                   2008          80
Seats held by women in national parliament - Number                 2008          14
Seats in national parliament - Number                               2008          94
SERIES LABEL                    UNITS      LOCATION   AGE_GROUP   SEX
Seats in national parliament    NUMBER     T          000_099_Y   M
Seats in national parliament    NUMBER     T          000_099_Y   F
Seats in national parliament    PERCENT    T          000_099_Y   F

Many-to-one mapping for Indicator codelist (Example 2)
Other columns: Female 15-24 yr, Male 15-24 yr, Total
Indicator - Unit                                                                               Time Period   Female 15-24 yr   Male 15-24 yr   Total
Population 15-24 year-olds who have comprehensive correct knowledge of HIV/AIDS - Percent      2003          72.2              75.1
Population 15-24 year-olds who have comprehensive correct knowledge of HIV/AIDS - Percent      2008          65.9              76.1
Men 15-24 years with comprehensive knowledge of AIDS - Percent                                 2008                                            34.2
Women 15-24 years with comprehensive knowledge of AIDS - Percent                               2008                                            28.3
SERIES LABEL                                                  UNITS      LOCATION   AGE_GROUP   SEX
Population with comprehensive correct knowledge of HIV/AIDS   PERCENT    T          015_010_Y   M
Population with comprehensive correct knowledge of HIV/AIDS   PERCENT    T          015_010_Y   F

Many-to-one mapping for Indicator codelist (Example 3)
Indicators:
• Population below national poverty line - Percent
• Share of population below poverty line - Percent
Location columns: Total, Rural, Urban
Time Period (in slide order): 1993, 2000, 2005-2006, 2010, 2001, 2005, 2006, 2008
Data values (in slide order): Total Rural 51.2 60.3 56.7 44.9 60.4 56.9 56.9 56.9 Urban 65.7 61.9 48.7 14.3 28.5 22.1
SERIES LABEL                              UNITS      LOCATION   AGE_GROUP   SEX
Population below national poverty line    PERCENT    R          000_099_Y   T
Population below national poverty line    PERCENT    T          000_099_Y   T
Population below national poverty line    PERCENT    U          000_099_Y   T

Many-to-one mapping for Unit codelist (Example 1)
Indicators:
• Gender parity index in primary education - Ratio
• Gender parity index at primary education - Index
Other columns: Total, Rural, Urban
Time Period (in slide order): 2002, 2003, 2004, 2005, 2006, 2007, 2007-2008, 2008, 2009
Data values (in slide order): Total Rural 0.92 0.92 0.93 0.93 0.96 0.96 0.96 0.99 0.96 Urban 0.99 0.99
SERIES LABEL                                     UNITS    LOCATION   AGE_GROUP   SEX
Gender Parity Index in primary level enrolment   RATIO    T          000_099_Y   T

Manual mapping of INDICATOR
Indicator ?
Indicator - Unit: Unmet need for family planning - Percent
Other columns: Limiting, Spacing; Location columns: Total, Rural, Urban
Time Period    Limiting    Spacing     Total
1992           19.4        21          40.4
2000           11.6        24          35.6
2005           13.4        24.5        37.9
2010                                   18.9
Rural and Urban data values (in slide order): 38.4 19.5 34.4 15.5
Manual change
SERIES LABEL                                UNITS      LOCATION   AGE_GROUP   SEX
Unmet need for family planning, limiting    PERCENT    T          015_035_Y   F
Unmet need for family planning, spacing     PERCENT    T          015_035_Y   F
Unmet need for family planning              PERCENT    T          015_035_Y   F

Manual mapping of INDICATOR
Indicator ?
Indicator - Unit: Telephone lines - Number
Location: Total
Time Period    Cellular       Fixed lines    Telephone lines
2000           72,602         58,261
2001           276,034        56,147
2002           505,627        59,472
2003           893,035        65,793
2004           1,165,035      82,495
2005           1,525,125      100,777
2006           2,697,616      129,863        2,827,479
2007           5,163,414      165,788        5,329,202
2008           8,554,864      168,481        8,723,345
2009           9,383,734      233,533        9,617,267
2010           12,828,264     327,114        13,155,378
Manual change
SERIES LABEL                               UNITS     LOCATION   AGE_GROUP   SEX
Mobile cellular telephone subscriptions    NUMBER    T          NA          NA
Telephone lines                            NUMBER    T          NA          NA

Manual mapping of UNIT
Indicator - Unit: Gender Parity Index in primary level enrolment - Percent
Time Period    Data Value
2000           0.93
2003           0.97
2006           0.99
2009           1
Manual change
• Unit = “Ratio”
SERIES LABEL                                     UNITS    LOCATION   AGE_GROUP   SEX
Gender Parity Index in primary level enrolment   RATIO    T          000_099_Y   T

Back to mapping…
1st Step: many Di to 1 DSD code
2nd Step: 1 Di to many DSD codes
Final Step: Register new mappings

Exercise 5: Complex time series
Use unstats.un.org/unsd/demodiweb[1-6]
Map/amend/publish the time series for:
1. “Contraceptive prevalence rate”
2. “Primary completion rate”
3. “Gender parity index in primary education”
4. “Seats held by men in national parliament”
5. “Seats held by women in national parliament”
6. “Telephone lines”

Other issues encountered with generating SDMX from DevInfo
• The MDG DSD requires any data point to be uniquely described by the following dimensions:
Type         Name
Dimension    Series
Dimension    Units of measurement
Dimension    Location
Dimension    Age group
Dimension    Sex
Dimension    Reference Area
Dimension    Time Period
• However, DevInfo allows data to be stored in overlapping time intervals and with multiple sources. These issues need to be resolved to conform to the “uniqueness” required by the MDG DSD.
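The uniqueness requirement can be checked before any SDMX is generated. The sketch below is a minimal, assumed approach rather than the mapping tool's own logic: it groups observations by the seven dimensions listed above and reports any key that carries more than one value. The field names, the `find_conflicts` helper and the sample rows are hypothetical. Reducing a period such as "2005-2009" to its first year follows the rule described on the "Overlapping time" slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The seven dimensions that must uniquely identify a data point in the MDG DSD.
# The identifier spellings are hypothetical, chosen for this example.
KEY_DIMENSIONS = ("SERIES", "UNITS", "LOCATION", "AGE_GROUP",
                  "SEX", "REF_AREA", "TIME_PERIOD")


def first_year(period: str) -> str:
    """Reduce a period such as '2005-2009' to '2005'; single years pass through."""
    return period.split("-")[0].strip()


def find_conflicts(observations: List[Dict[str, str]]) -> Dict[Tuple[str, ...], List[Dict[str, str]]]:
    """Return every dimension key that is shared by more than one observation."""
    groups: Dict[Tuple[str, ...], List[Dict[str, str]]] = defaultdict(list)
    for obs in observations:
        key = tuple(
            first_year(obs[dim]) if dim == "TIME_PERIOD" else obs[dim]
            for dim in KEY_DIMENSIONS
        )
        groups[key].append(obs)
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


def make_obs(period: str, value: str) -> Dict[str, str]:
    """Build an illustrative observation; 'XX' is a placeholder reference area."""
    return {"SERIES": "Infant mortality rate", "UNITS": "PER_1000_LIVE_BIRTHS",
            "LOCATION": "T", "AGE_GROUP": "000_001_Y", "SEX": "T",
            "REF_AREA": "XX", "TIME_PERIOD": period, "VALUE": value}


# Illustrative rows: two overlapping periods start in the same year (2005).
rows = [make_obs("1995-1999", "25.5"), make_obs("1999-2003", "24.2"),
        make_obs("2005-2006", "25.3"), make_obs("2005-2009", "18.9")]

conflicts = find_conflicts(rows)
for key, clashing in conflicts.items():
    print(key[-1], [r["VALUE"] for r in clashing])
```

Note that 1995-1999 and 1999-2003 overlap but do not clash, because their first years differ; only the two periods starting in 2005 collapse onto the same key.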
Multiple sources
Allowable in DevInfo but not in the DSD
Proportion of population with access to improved sanitation - Percent
Location: Total
Source: CPC Census 1995 NCEHWS_2007 NCEHWS 2003 NCEHWS 2004
Time Period    Data values
1990           11
1995           29
2000           37
2001           40
2002           42      41.6
2003           42      42.2
2004           44      44.3
2005           46
2006           47
2007           49
SERIES LABEL                                       UNITS      LOCATION   AGE_GROUP   SEX
Population using improved sanitation facilities    PERCENT    R          000_099_Y   T
Population using improved sanitation facilities    PERCENT    T          000_099_Y   T
Population using improved sanitation facilities    PERCENT    U          000_099_Y   T

Overlapping time
• This issue is only a problem where overlapping periods begin in the same year, as the mapping tool takes the first year in the period as the value for the “Time Period” dimension.
Infant mortality rate - Deaths per 1000 live births
Location: Total
Time Period    Data Value
1990-1994      27.3
1995-1999      25.5
1999-2003      24.2
2005-2006      25.3
2005-2009      18.9
SERIES LABEL             UNITS                   LOCATION   AGE_GROUP   SEX
Infant mortality rate    PER_1000_LIVE_BIRTHS    T          000_001_Y   T

Targets in the database
Targets are also an issue when found in the database, since they should not be exchanged as observed values.

Target in database (Example 1)
Sometimes stored as a subgroup, which can be ignored at the 2nd stage…
Proportion of people living below the national poverty line - Percent
Time Period    Total     Rural     Urban     MDG target (Other)
1990           48
1992           46        51.8      26.5
1997           39.1      42.5      22.1
2002           33.5      37.6      19.7
2015                                         24
SERIES LABEL                              UNITS      LOCATION   AGE_GROUP   SEX
Population below national poverty line    PERCENT    R          000_099_Y   T
Population below national poverty line    PERCENT    T          000_099_Y   T
Population below national poverty line    PERCENT    U          000_099_Y   T

Target in database (Example 2)
But other times it can be found as a time period among observed values…
Maternal mortality ratio - Deaths per 100,000 live births
Location: Total, Rural, Urban
Source: CPC CPI Census Governme SPC RHS SPC RHS SPC RHS Census 2005 nt 2007 2000 2000 2000
Time period    Data values
1995           650
2000           530     580     170
2005           405
2015           260
SERIES LABEL                UNITS                     LOCATION   AGE_GROUP   SEX
Maternal mortality ratio    PER_100000_LIVE_BIRTHS    T          000_099_Y   F

Use of filters at registration
To deal with the issues of:
• multiple sources for a given time period;
• overlapping time periods beginning in the same year; and
• targets presented alongside observed values,
the mapping tool provides a feature to filter out of a generated SDMX message the data associated with specific time periods and source references.

Back to mapping…
Final Step: Filter by time/source
Final Step: Select source filter
Final Step: Select time filter
Final Step: Register new mappings
Final Step: Complete

Exercise 6: Filter time series
Use unstats.un.org/unsd/demodiweb[2-6]
Map/amend/publish the time series for:
1. “Under-five mortality rate”
2. “Maternal mortality ratio (MMR)”
3. “Net enrolment ratio in primary education (NER)”
4. “Orphans primary school enrolment”
5. “Tuberculosis prevalence rate”
6. “Proportion of the population using improved sanitation facilities”

DSD Maintenance
• The mapping and registry tool allows users to edit and delete the DSD as well as upload it.
• When the DSD is updated, it is recommended to edit the DSD rather than delete it.
• DSD deletion has the effect of removing all the mappings and subscriptions used for that DSD.

DSD Maintenance…
DSD Header…
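The effect of the filters described under "Use of filters at registration" can be sketched as a simple exclusion by source reference and time period. This is an assumed illustration of the idea, not the tool's implementation: `filter_observations`, the field names and the source labels "Source A" and "Source B" are all hypothetical, and the rows are illustrative only.

```python
from typing import Dict, Iterable, List


def filter_observations(observations: List[Dict[str, str]],
                        excluded_sources: Iterable[str] = (),
                        excluded_periods: Iterable[str] = ()) -> List[Dict[str, str]]:
    """Drop observations whose source or time period has been filtered out."""
    sources = set(excluded_sources)
    periods = set(excluded_periods)
    return [obs for obs in observations
            if obs["source"] not in sources
            and obs["time_period"] not in periods]


# Illustrative rows: two sources report the same year, and 2015 holds a target.
observations = [
    {"time_period": "2002", "source": "Source A", "value": "42"},
    {"time_period": "2002", "source": "Source B", "value": "41.6"},
    {"time_period": "2005", "source": "Source A", "value": "46"},
    {"time_period": "2015", "source": "Source A", "value": "24"},
]

# Keep one source per period and leave the 2015 target out of the message.
published = filter_observations(observations,
                                excluded_sources=["Source B"],
                                excluded_periods=["2015"])
print([(o["time_period"], o["value"]) for o in published])
```

Filtering at registration leaves the database itself untouched; only the generated message is restricted to one value per time period.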