Transcript PPTX
Complex SDMX mapping issues for development indicators UNSD – DFID Project on Improving the collation, availability and dissemination of development indicators (including the MDGs) 22 April 2014, Phnom Penh, Cambodia Agenda for Today CountryData Data Structure Definition (DSD) Mapping best practices and guidelines Review of mapping process Complex SDMX cases Overview of new features in DevInfo 7 Mapping Tool Mapping best practices and guidelines MDG / CountryData DSD Concepts Type Dimension Dimension Dimension Dimension Dimension Dimension Dimension Dimension Dimension Measure Attribute Attribute Attribute Attribute Attribute Name Frequency Series Units of measurement Location Age group Sex Reference Area Source Type Time Period Observation Value Unit multiplier Time period details Nature of data points Source details Footnotes Type of code used i.e. Annual, Quarterly, etc. Indicator title i.e. Percent, number i.e. Total, Urban, Rural i.e. 15–49 yr olds, under 5 years old i.e. Total, male, female Country name i.e. Survey, census, admin. i.e. 1990, 1991, etc. i.e. per 10,000, per 1,000 etc. i.e. 2001 – 2003, Q1 2010 – Q3 2011 i.e. Estimated, Modelled, Adjusted etc. Source name & date Details of methodology & other notes etc. CountryData DSD Single DSD used for all MDG (and other development indicators) - superset of MDG DSD Using cross-domain concepts and codelists: Frequency (CL_FREQ) Sex (CL_SEX) Age (CL_AGE) (modified) Unit multiplier (CL_UNIT_MULT) CountryData DSD Support for diverse indicators means not all dimensions are applicable in all cases e.g. Age Group is not applicable to indicator “Telephone lines” Value NA is used when a dimension or attribute is not applicable. CountryData DSD: Mappings Not always obvious which values should be used in some dimensions What should be SEX in indicator “Births attended by skilled health personnel”: Female? Total? Not Applicable? INDICATOR SERIES UNIT Births attended by skilled health personnel SH_STA_BRTC Percent SEX ?? NA What about the AGE dimension? AGE ?? NA LOCATION Total CountryData DSD: Mappings (2) Inconsistent mappings lead to duplications and other anomalies In CountryData, mappings for time series are agreed before data exchange For the MDG dataset, there has been a spreadsheet developed with the recommended mappings for each time series CountryData DSD: Maintenance CountryData codelists are maintained by UNSD Periodically, the DSD needs to be updated A new version of the CountryData DSD (1.4), with modified codelists, was released in January 2014 DevInfo 7 DevInfo 7 (Di7) launched in Nov 2012 SDMX 2.1 & 2.0 compliant Web-based software Most countries have migrated to version 7 from version 6 DevInfo Support Group has been adding enhancements and bugfixes; newest version is 7.1.0.1 Mapping to the DSD (in DevInfo) The DevInfo mapping tool is designed to facilitate the mapping of the database to the DSD based on mapping between the codelists of the DSD and origin database certain situations require further manual effort to map a time series sometimes a “fix” is required to the database where the data simply isn’t valid or where there are duplicates DevInfo Data Architecture Area, hierarchical dimension IUS = Indicator, Unit and Subgroup Time series data are stored with the combination of the 3 dimensions Source & Time Period Indicator Unit Subgroup: Combination of one or more sub-dimensions Together with IUS “uniquely” defines each data value Footnote “Free text” field stored with data value DevInfo Database CountryData DSD • Frequency (Default = “Annual”) • Area • Reference Area • Indicator • Series • Unit • Units of measurement • Unit multiplier (Default = 0) • Subgroup (i.e. Sex, Age, Location etc.) • Location (Default = “Total”) • Age group (Default = “All Ages”) • Sex (Default = “Both Sexes”) • Source Type (Default = “NA”) • Source • Source details • Time Period • Time Period • Time period details • Nature of data points (Default = “C”) • Footnotes • Footnotes DevInfo Mapping Process Review DevInfo Mapping Process Review Go to: unstats.un.org/demoginfo1 ( /demoginfo2 /demoginfo3 /demoginfo5 /demoginfo6 ) /demoginfo4 Log onto administrative profile Log onto administrative profile Username: [email protected] Password: 1234 Scroll down to ‘Registry’ menu Step 1: Codelist mapping Step 1: (A) Map Indicator codes Step 1: (B) Map Unit codes Step 1: (C) Map Subgroup codes Step 1: (C) Choose Subgroup list Step 1: (C) Map Age subgroup Step 1: (C) Map Sex & Location Step 1: (D) Map Area Step 1: Save codelist mappings Step 1: Ignore warning Step 1: Confirm mapping saved Step 1: Complete Exercise 1: Codelist mapping Use unstats.un.org/demoginfo[1-6] Map the codelists (where possible) for: Just one indicator, “Antenatal care coverage for at least one visit” Unit Age Sex Location Area Step 2: Confirm IUS mapping Step 2: Save IUS Mappings Why is Step 2 necessary? The default values for SEX, LOCATION or AGE GROUP mapping may not be applicable to all mappings The codelist mapping may only provide a partial mapping of the time series (i.e. more information is required) Any necessary mapping changes are made in Step 2 Where are the default values? Admin panel: Application settings Insert screens shot/details of admin panel and default value storage… Scroll down Application settings: mapping default values Mapping of SUBGROUP (Defaults) • Where a subgroup value is missing the default values will apply: Indicator Unit Default Values • Location = T Antenatal care coverage for at least one visit - Percent Sex Female Female Location Rural Age 15-49 yr 15-49 yr Time Period 2000 95.6 92.2 2004 96.5 95.9 2006 98.8 98.7 2010 98 97.4 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Antenatal care coverage, at least one visit PERCENT T 015_035_Y F Antenatal care coverage, at least one visit PERCENT R 015_035_Y F Overriding defaults in mapping of SUBGROUP • Common example of where default subgroup mappings do not apply Indicator Subgroups? Unit Land under forest cover - Percent Time Period Data Value 1993 1997 2002 2005 2006 2008 2009 2010 59.82 58.6 61.15 60 59.09 57.99 57.6 57.56 Default Values • Location = T • Age Group = 000_099_Y • Sex = Both sexes SERIES LABEL UNITS LOCATION AGE_GROUP SEX Land area covered by forest PERCENT T NA NA Overriding defaults in mapping of SUBGROUP • So subgroups coverage affects the number of manual changes which have to be made… Unit Indicator Antenatal care coverage, at least one visit - Percent Location Total Rural Urban Time Period 2000 44.8 41.2 67.3 2005 71.8 70.4 80.7 2010 89.1 87.6 97 Subgroup for Age and Sex? Default Values • Age Group = 000_099_Y • Sex = T SERIES LABEL UNITS LOCATION AGE_GROUP SEX Antenatal care coverage, at least one visit PERCENT T 015_035_Y F Antenatal care coverage, at least one visit PERCENT R 015_035_Y F Antenatal care coverage, at least one visit PERCENT U 015_035_Y F Overriding defaults in mapping of SUBGROUP • Using the Other subgroup to clump age, sex and location information results in more manual mapping Unit Indicator Condom use at last high-risk sex - Percent Other ? Time Period 2003 2008 Female Male 1515-24 yr 24 yr 32.7 29.2 51.7 46.3 Rural Rural Urban Urban female male 15- female male 1515-24 yr 24 yr 15-24 yr 24 yr 21.2 21.6 38.9 40.6 32.5 33.5 50.3 52 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Condom use at last high-risk sex PERCENT T 015_010_Y M Condom use at last high-risk sex PERCENT T 015_010_Y F Subgroup for Location, Age and Sex? Default Values • Location = T • Age Group = 000_099_Y • Sex = T Step 2: Amend Indicator When using the check box to tick the mapping, you are “fixing” the mapped DSD values. If the box is unchecked again and the mappings saved, then DSD values revert to those mapped at codelist / default values (i.e. any manual changes are undone.) Exercise 2: Mapping time series Use unstats.un.org/demoginfo[1-6] Username = [email protected] Password = 1234 Map the time series for 1. 2. 3. “Literacy rate of 15-24 year-olds” “Condom use at last high risk sex (Version 2)” “Land under forest cover” Step 2: Complete Final Step: Register mappings Final Step: Generate SDMX-ML Final Step: Complete Complex SDMX Cases Complex mappings under the 1st and 2nd mapping steps Most commonly mappings need to be overridden for dimensions Sex, Age Group and Location But sometimes manual changes are required between DevInfo and DSD indicator and unit, such as when… More than one DevInfo code relates to a single DSD code OR More than one DSD code relates to a single DevInfo code Many-to-one mapping for Indicator codelist (Example 1) Indicator Proportion of seats headed by women in national parliament - Percent Seats held by men in national parliament Number Seats held by women in national parliament - Number Seats in national parliament - Number Time Period 1999 Data Value Indicator 11.1 2008 Indicator 80 2008 2008 Indicator 14 94 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Seats in national parliament Seats in national parliament Seats in national parliament Seats in national parliament PERCENT NUMBER NUMBER NUMBER T T T T 000_099_Y 000_099_Y 000_099_Y 000_099_Y F M F T Many-to-one mapping for Indicator codelist (Example 2) Female Male 1515-24 yr 24 yr Indicator Time Period Population 15-24 year-olds who have comprehensive 2003 72.2 75.1 correct knowledge of HIV/AIDS - Percent 2008 65.9 76.1 Men 15-24 years with comprehensive knowledge of AIDS 2008 - Percent Women 15-24 years with comprehensive knowledge of 2008 AIDS - Percent Other Total Indicator Indicator 34.2 Indicator 28.3 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Population with comprehensive correct knowledge of HIV/AIDS PERCENT T 015_010_Y M Population with comprehensive correct knowledge of HIV/AIDS PERCENT T 015_010_Y F Many-to-one mapping for Indicator codelist (Example 3) Indicator Indicator Indicator Population below national poverty line - Percent Share of population below poverty line - Percent Location Time Period 1993 2000 2005-2006 2010 2001 2005 2006 2008 Total Rural 51.2 60.3 56.7 44.9 60.4 56.9 56.9 56.9 Urban 65.7 61.9 48.7 14.3 28.5 22.1 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Population below national poverty line PERCENT R 000_099_Y T Population below national poverty line PERCENT T 000_099_Y T Population below national poverty line PERCENT U 000_099_Y T Many-to-one mapping for Unit codelist (Example 1) Unit Indicator Gender parity index in primary education - Ratio Gender parity index at primary education - Index SERIES LABEL Other Time Period 2002 2003 2004 2005 2006 2007 2007-2008 2008 2009 UNITS Gender Parity Index in primary level enrolment RATIO Unit Total Rural 0.92 0.92 0.93 0.93 0.96 0.96 0.96 0.99 0.96 Urban 0.99 0.99 LOCATION AGE_GROUP SEX T 000_099_Y T One-to-many mapping of Indicator Indicator ? Telephone lines - Number Location Other Cellular lines Fixed lines Time Period 2000 72,602 58,261 2001 276,034 56,147 2002 505,627 59,472 2003 893,035 65,793 2004 1,165,035 82,495 2005 1,525,125 100,777 2006 2,697,616 129,863 2007 5,163,414 165,788 SERIES LABEL Total Subgroup Manual change 2,827,479 5,329,202 UNITS LOCATION AGE_GROUP SEX Mobile cellular telephone subscriptions NUMBER T Telephone lines NUMBER T NA NA NA NA One-to-many mapping of Indicator Indicator ? Unit Unmet need for family planning - Percent Location Other Limiting Spacing Time Period 1992 19.4 21 2000 11.6 24 2005 13.4 24.5 2010 Total Rural 40.4 35.6 37.9 18.9 Manual change Urban 38.4 19.5 34.4 15.5 Subgroup SERIES LABEL UNITS LOCATION AGE_GROUP SEX Unmet need for family planning, limiting PERCENT T 015_035_Y F Unmet need for family planning, spacing PERCENT T 015_035_Y F Unmet need for family planning PERCENT T 015_035_Y F “Manual” mapping of UNIT Indicator Unit Gender Parity Index in primary level enrolment - Percent Time Data Value Period 2000 0.93 2003 0.97 2006 0.99 2009 1 SERIES LABEL UNITS Gender Parity Index in primary level enrolment RATIO Manual change • Unit = “Ratio” LOCATION AGE_GROUP SEX T 000_099_Y T Attribute: Unit Multiplier (UNIT_MULT) “Exponent in base 10 that multiplied by the observation numeric value gives the result expressed in the unit of measure.” If the observation value is in millions, unit multiplier is 6; if in billions, 9, and so on. Where the number is simple units, use 0. Mandatory attribute Attribute: Unit Multiplier (UNIT_MULT) Population undernourished, millions Time Period Data Value 1993 1997 2002 2005 2006 2008 2009 2010 5.82 5.6 6.15 6 5.09 4.99 4.6 4.56 SERIES LABEL UNITS LOCATION AGE_GROUP SEX UNIT_MULT Population undernourished NUMBER T 000_099_Y T 6 Back to mapping… Example 1: Many-to-one mapping Example 2: One-to-many mapping Final Step: Register new mappings Exercise 3: Complex mappings Use unstats.un.org/demoginfo[1-6] Map/ amend/ publish the time series for: 1. 2. 3. 4. “Gender parity index in primary education” “Seats held by men in national parliament” “Seats held by women in national parliament” “Telephone lines” Other issues encountered with generating SDMX from DevInfo • The CountryData DSD requires any data point to be uniquely described by the following dimensions: Type Dimension Dimension Dimension Dimension Dimension Dimension Dimension Name Series Units of measurement Location Age group Sex Reference Area Time Period • However, DevInfo allows data to be stored in overlapping time intervals and with multiple sources. These issues need to be resolved to conform to the “uniqueness” required by the CountryData DSD. Multiple sources Allowable in DevInfo but not in the DSD Proportion of population with access to improved sanitation - Percent Location Total Source CPC Census 1995 NCEHWS_2007 NCEHWS 2003 NCEHWS 2004 Time Period 1990 11 1995 29 2000 37 2001 40 2002 42 41.6 2003 42 42.2 2004 44 44.3 2005 46 2006 47 2007 49 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Population using improved sanitation facilities PERCENT T 000_099_Y T Overlapping time • This issue is only a problem where overlapping periods begin from the same year, as the mapping tool takes the first year in the period as the value for the “Time Period” dimension. Infant mortality rate - Deaths per 1000 live births Location Total Time Period 1990-1994 1995-1999 1999-2003 2005-2006 2005-2009 SERIES LABEL UNITS Infant mortality rate PER_1000_LIVE_BIRTHS 27.3 25.5 24.2 25.3 18.9 LOCATION AGE_GROUP SEX T 000_001_Y T Targets in the database Targets are also an issue when found in the database since they should not be exchanged as observed values Target in database (Example 1) Sometimes stored as subgroup which can be ignored at the 2nd stage… Proportion of people living below the national poverty line - Percent Location Total Rural Urban Other MDG target Time Period 1990 48 1992 46 51.8 26.5 1997 39.1 42.5 22.1 2002 33.5 37.6 19.7 2015 24 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Population below national poverty line PERCENT R 000_099_Y T Population below national poverty line PERCENT T 000_099_Y T Population below national poverty line PERCENT U 000_099_Y T Target in database (Example 2) But other times can be found as a time period among observed values… Maternal mortality ratio - Deaths per 100,000 live births Location Total Rural Urban CPC CPI Census Governme SPC RHS SPC RHS SPC RHS Source Census 2005 nt 2007 2000 2000 2000 Time period 1995 650 2000 530 580 170 2005 405 2015 260 SERIES LABEL UNITS LOCATION AGE_GROUP SEX Maternal mortality ratio PER_100000_LIVE_BIRTHS T 000_099_Y F Use of filters at registration To deal with the issues of multiple sources for a given time period overlapping time period beginning with the same year targets presented alongside observed values The registration page provides a feature to filter out data from a generated SDMX message associated with specific time periods and source references Filter by time/ source Final Step: Select source filter Filter by time Final Step: Select time filter Final Step: Register new mappings Final Step: Complete Exercise 4: Filter time series Use unstats.un.org/demoginfo[1-6] Map/ amend/ publish the time series for; 1. 2. 3. 4. “Under-five mortality rate” “Maternal mortality ratio (MMR)” “Tuberculosis prevalence rate” “Proportion of the population using improved sanitation facilities” New features in DevInfo 7.1 Improved Quick Data Search Database updates and publishing After updating your database with new data, you can optimize and publish the updates in SDMX at the same time in Admin Panel Database updates and publishing Database updates and publishing Database updates and publishing DevInfo installation and upgrading The latest version of DevInfo is 7.1.0.1 Starting with this version, you no longer need to uninstall and reinstall in order to upgrade Instead you will use patches available on the DevInfo Downloads page Exporting and importing mappings Exporting mappings Editing exported mappings Editing exported mappings Exporting and importing mappings Importing edited mappings Importing edited mappings Importing edited mappings Thank you for your attention