Transcript PPTX

Complex SDMX mapping
issues for development
indicators
UNSD – DFID Project on Improving the collation,
availability and dissemination of development
indicators (including the MDGs)
22 April 2014, Phnom Penh, Cambodia
Agenda for Today

CountryData Data Structure Definition (DSD)

Mapping best practices and guidelines

Review of mapping process

Complex SDMX cases

Overview of new features in DevInfo 7
Mapping Tool
Mapping best practices and
guidelines
MDG / CountryData DSD Concepts
Type
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Measure
Attribute
Attribute
Attribute
Attribute
Attribute
Name
Frequency
Series
Units of measurement
Location
Age group
Sex
Reference Area
Source Type
Time Period
Observation Value
Unit multiplier
Time period details
Nature of data points
Source details
Footnotes
Type of code used
i.e. Annual, Quarterly, etc.
Indicator title
i.e. Percent, number
i.e. Total, Urban, Rural
i.e. 15–49 yr olds, under 5 years old
i.e. Total, male, female
Country name
i.e. Survey, census, admin.
i.e. 1990, 1991, etc.
i.e. per 10,000, per 1,000 etc.
i.e. 2001 – 2003, Q1 2010 – Q3 2011
i.e. Estimated, Modelled, Adjusted etc.
Source name & date
Details of methodology & other notes etc.
CountryData DSD

Single DSD used for all MDG (and other
development indicators) - superset of MDG
DSD

Using cross-domain concepts and codelists:




Frequency (CL_FREQ)
Sex (CL_SEX)
Age (CL_AGE) (modified)
Unit multiplier (CL_UNIT_MULT)
CountryData DSD

Support for diverse indicators means not all
dimensions are applicable in all cases

e.g. Age Group is not applicable to indicator
“Telephone lines”

Value NA is used when a dimension or attribute is not
applicable.
CountryData DSD: Mappings

Not always obvious which values should be
used in some dimensions

What should be SEX in indicator “Births
attended by skilled health personnel”:

Female? Total? Not Applicable?
INDICATOR
SERIES
UNIT
Births attended by skilled health personnel
SH_STA_BRTC
Percent

SEX
??
NA
What about the AGE dimension?
AGE
??
NA
LOCATION
Total
CountryData DSD: Mappings (2)

Inconsistent mappings lead to duplications
and other anomalies

In CountryData, mappings for time series are
agreed before data exchange

For the MDG dataset, there has been a
spreadsheet developed with the
recommended mappings for each time series
CountryData DSD: Maintenance

CountryData codelists are maintained by
UNSD

Periodically, the DSD needs to be updated

A new version of the CountryData DSD (1.4),
with modified codelists, was released in
January 2014
DevInfo 7

DevInfo 7 (Di7) launched in Nov 2012




SDMX 2.1 & 2.0 compliant
Web-based software
Most countries have migrated to version 7 from
version 6
DevInfo Support Group has been adding
enhancements and bugfixes; newest version is
7.1.0.1
Mapping to the DSD (in DevInfo)

The DevInfo mapping tool is designed to
facilitate the mapping of the database to the
DSD



based on mapping between the codelists of the
DSD and origin database
certain situations require further manual effort to
map a time series
sometimes a “fix” is required to the database where
the data simply isn’t valid or where there are
duplicates
DevInfo Data Architecture


Area, hierarchical dimension
IUS = Indicator, Unit and Subgroup

Time series data are stored with the combination of
the 3 dimensions




Source & Time Period


Indicator
Unit
Subgroup: Combination of one or more sub-dimensions
Together with IUS “uniquely” defines each data value
Footnote

“Free text” field stored with data value
DevInfo Database
CountryData DSD
• Frequency (Default = “Annual”)
• Area
• Reference Area
• Indicator
• Series
• Unit
• Units of measurement
• Unit multiplier (Default = 0)
• Subgroup (i.e. Sex, Age,
Location etc.)
• Location (Default = “Total”)
• Age group (Default = “All Ages”)
• Sex (Default = “Both Sexes”)
• Source Type (Default = “NA”)
• Source
• Source details
• Time Period
• Time Period
• Time period details
• Nature of data points (Default = “C”)
• Footnotes
• Footnotes
DevInfo Mapping Process Review
DevInfo Mapping Process Review
Go to: unstats.un.org/demoginfo1
( /demoginfo2 /demoginfo3
/demoginfo5 /demoginfo6 )
/demoginfo4
Log onto administrative profile
Log onto administrative profile
Username: [email protected]
Password: 1234
Scroll down to ‘Registry’ menu
Step 1: Codelist mapping
Step 1: (A) Map Indicator codes
Step 1: (B) Map Unit codes
Step 1: (C) Map Subgroup codes
Step 1: (C) Choose Subgroup list
Step 1: (C) Map Age subgroup
Step 1: (C) Map Sex & Location
Step 1: (D) Map Area
Step 1: Save codelist mappings
Step 1: Ignore warning
Step 1: Confirm mapping saved
Step 1: Complete
Exercise 1: Codelist mapping

Use unstats.un.org/demoginfo[1-6]

Map the codelists (where possible) for:






Just one indicator, “Antenatal care coverage for at least
one visit”
Unit
Age
Sex
Location
Area
Step 2: Confirm IUS mapping
Step 2: Save IUS Mappings
Why is Step 2 necessary?

The default values for SEX, LOCATION or
AGE GROUP mapping may not be
applicable to all mappings

The codelist mapping may only provide a
partial mapping of the time series (i.e. more
information is required)

Any necessary mapping changes are made
in Step 2
Where are the default values?
Admin panel: Application settings
Insert screens shot/details of admin panel and default value
storage…
Scroll down
Application settings: mapping default
values
Mapping of SUBGROUP (Defaults)
• Where a subgroup value is missing the default values
will apply: Indicator
Unit
Default Values
• Location = T
Antenatal care coverage for at least one visit - Percent
Sex
Female
Female
Location
Rural
Age
15-49 yr
15-49 yr
Time Period
2000
95.6
92.2
2004
96.5
95.9
2006
98.8
98.7
2010
98
97.4
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Antenatal care coverage, at least one visit
PERCENT T
015_035_Y
F
Antenatal care coverage, at least one visit
PERCENT R
015_035_Y
F
Overriding defaults in mapping of SUBGROUP
• Common example of where default subgroup
mappings do not apply
Indicator
Subgroups?
Unit
Land under forest cover - Percent
Time Period
Data Value
1993
1997
2002
2005
2006
2008
2009
2010
59.82
58.6
61.15
60
59.09
57.99
57.6
57.56
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = Both sexes
SERIES LABEL
UNITS
LOCATION AGE_GROUP
SEX
Land area covered by forest
PERCENT
T
NA
NA
Overriding defaults in mapping of SUBGROUP
• So subgroups coverage affects the number of
manual changes which have to be made…
Unit
Indicator
Antenatal care coverage, at least one visit - Percent
Location
Total
Rural
Urban
Time Period
2000
44.8
41.2
67.3
2005
71.8
70.4
80.7
2010
89.1
87.6
97
Subgroup for
Age and Sex?
Default Values
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Antenatal care coverage, at least one visit
PERCENT T
015_035_Y
F
Antenatal care coverage, at least one visit
PERCENT R
015_035_Y
F
Antenatal care coverage, at least one visit
PERCENT U
015_035_Y
F
Overriding defaults in mapping of SUBGROUP
• Using the Other subgroup to clump age, sex and
location information results in more manual mapping
Unit
Indicator
Condom use at last high-risk sex - Percent
Other
?
Time Period
2003
2008
Female Male 1515-24 yr
24 yr
32.7
29.2
51.7
46.3
Rural
Rural
Urban
Urban
female male 15- female male 1515-24 yr
24 yr
15-24 yr
24 yr
21.2
21.6
38.9
40.6
32.5
33.5
50.3
52
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Condom use at last high-risk sex
PERCENT
T
015_010_Y
M
Condom use at last high-risk sex
PERCENT
T
015_010_Y
F
Subgroup for
Location, Age
and Sex?
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = T
Step 2: Amend Indicator
When using the check box to tick the
mapping, you are “fixing” the mapped
DSD values. If the box is unchecked
again and the mappings saved, then
DSD values revert to those mapped at
codelist / default values (i.e. any
manual changes are undone.)
Exercise 2: Mapping time series

Use unstats.un.org/demoginfo[1-6]
Username = [email protected]
Password = 1234

Map the time series for


1.
2.
3.
“Literacy rate of 15-24 year-olds”
“Condom use at last high risk sex (Version 2)”
“Land under forest cover”
Step 2: Complete
Final Step: Register mappings
Final Step: Generate SDMX-ML
Final Step: Complete
Complex SDMX Cases
Complex mappings under the 1st and
2nd mapping steps


Most commonly mappings need to be overridden
for dimensions Sex, Age Group and Location
But sometimes manual changes are required
between DevInfo and DSD indicator and unit,
such as when…


More than one DevInfo code relates to a single DSD
code
OR
More than one DSD code relates to a single DevInfo
code
Many-to-one mapping for Indicator codelist
(Example 1)
Indicator
Proportion of seats headed by women in
national parliament - Percent
Seats held by men in national parliament Number
Seats held by women in national
parliament - Number
Seats in national parliament - Number
Time
Period
1999
Data
Value
Indicator
11.1
2008
Indicator
80
2008
2008
Indicator
14
94
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Seats in national parliament
Seats in national parliament
Seats in national parliament
Seats in national parliament
PERCENT
NUMBER
NUMBER
NUMBER
T
T
T
T
000_099_Y
000_099_Y
000_099_Y
000_099_Y
F
M
F
T
Many-to-one mapping for Indicator codelist
(Example 2)
Female Male 1515-24 yr
24 yr
Indicator
Time Period
Population 15-24 year-olds who have comprehensive
2003
72.2
75.1
correct knowledge of HIV/AIDS - Percent
2008
65.9
76.1
Men 15-24 years with comprehensive knowledge of AIDS 2008
- Percent
Women 15-24 years with comprehensive knowledge of
2008
AIDS - Percent
Other
Total
Indicator
Indicator
34.2
Indicator
28.3
SERIES LABEL
UNITS
LOCATION
AGE_GROUP
SEX
Population with comprehensive correct knowledge of HIV/AIDS
PERCENT T
015_010_Y M
Population with comprehensive correct knowledge of HIV/AIDS
PERCENT T
015_010_Y F
Many-to-one mapping for Indicator codelist
(Example 3)
Indicator
Indicator
Indicator
Population below national poverty
line - Percent
Share of population below poverty
line - Percent
Location
Time Period
1993
2000
2005-2006
2010
2001
2005
2006
2008
Total
Rural
51.2
60.3
56.7
44.9
60.4
56.9
56.9
56.9
Urban
65.7
61.9
48.7
14.3
28.5
22.1
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Population below national poverty line
PERCENT
R
000_099_Y
T
Population below national poverty line
PERCENT
T
000_099_Y
T
Population below national poverty line
PERCENT
U
000_099_Y
T
Many-to-one mapping for Unit codelist
(Example 1)
Unit
Indicator
Gender parity index in primary education - Ratio
Gender parity index at primary education - Index
SERIES LABEL
Other
Time Period
2002
2003
2004
2005
2006
2007
2007-2008
2008
2009
UNITS
Gender Parity Index in primary level enrolment RATIO
Unit
Total
Rural
0.92
0.92
0.93
0.93
0.96
0.96
0.96
0.99
0.96
Urban
0.99
0.99
LOCATION AGE_GROUP SEX
T
000_099_Y
T
One-to-many mapping of Indicator
Indicator ?
Telephone lines - Number
Location
Other
Cellular lines Fixed lines
Time Period
2000
72,602
58,261
2001
276,034
56,147
2002
505,627
59,472
2003
893,035
65,793
2004
1,165,035
82,495
2005
1,525,125
100,777
2006
2,697,616
129,863
2007
5,163,414
165,788
SERIES LABEL
Total
Subgroup
Manual change
2,827,479
5,329,202
UNITS
LOCATION AGE_GROUP SEX
Mobile cellular telephone subscriptions NUMBER T
Telephone lines
NUMBER T
NA
NA
NA
NA
One-to-many mapping of Indicator
Indicator ?
Unit
Unmet need for family planning - Percent
Location
Other
Limiting
Spacing
Time Period
1992
19.4
21
2000
11.6
24
2005
13.4
24.5
2010
Total
Rural
40.4
35.6
37.9
18.9
Manual change
Urban
38.4
19.5
34.4
15.5
Subgroup
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Unmet need for family planning, limiting
PERCENT T
015_035_Y
F
Unmet need for family planning, spacing
PERCENT T
015_035_Y
F
Unmet need for family planning
PERCENT T
015_035_Y
F
“Manual” mapping of UNIT
Indicator
Unit
Gender Parity Index in primary level enrolment - Percent
Time
Data Value
Period
2000
0.93
2003
0.97
2006
0.99
2009
1
SERIES LABEL
UNITS
Gender Parity Index in primary level enrolment RATIO
Manual change
• Unit = “Ratio”
LOCATION AGE_GROUP SEX
T
000_099_Y
T
Attribute: Unit Multiplier (UNIT_MULT)

“Exponent in base 10 that multiplied by the
observation numeric value gives the result
expressed in the unit of measure.”

If the observation value is in millions, unit
multiplier is 6; if in billions, 9, and so on.
Where the number is simple units, use 0.

Mandatory attribute
Attribute: Unit Multiplier (UNIT_MULT)
Population undernourished, millions
Time Period
Data Value
1993
1997
2002
2005
2006
2008
2009
2010
5.82
5.6
6.15
6
5.09
4.99
4.6
4.56
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX UNIT_MULT
Population undernourished
NUMBER T
000_099_Y
T
6
Back to mapping…
Example 1: Many-to-one mapping
Example 2: One-to-many mapping
Final Step: Register new mappings
Exercise 3: Complex mappings

Use unstats.un.org/demoginfo[1-6]

Map/ amend/ publish the time series for:
1.
2.
3.
4.
“Gender parity index in primary education”
“Seats held by men in national parliament”
“Seats held by women in national parliament”
“Telephone lines”
Other issues encountered with generating
SDMX from DevInfo
• The CountryData DSD requires any data point to be
uniquely described by the following dimensions:
Type
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Dimension
Name
Series
Units of measurement
Location
Age group
Sex
Reference Area
Time Period
• However, DevInfo allows data to be stored in overlapping
time intervals and with multiple sources. These issues need
to be resolved to conform to the “uniqueness” required by
the CountryData DSD.
Multiple sources
Allowable in DevInfo but not in the DSD
Proportion of population with access to improved sanitation - Percent
Location
Total
Source
CPC Census 1995 NCEHWS_2007 NCEHWS 2003
NCEHWS 2004
Time Period
1990
11
1995
29
2000
37
2001
40
2002
42
41.6
2003
42
42.2
2004
44
44.3
2005
46
2006
47
2007
49
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Population using improved sanitation facilities
PERCENT
T
000_099_Y
T
Overlapping time
• This issue is only a problem where overlapping periods
begin from the same year, as the mapping tool takes the
first year in the period as the value for the “Time Period”
dimension.
Infant mortality rate - Deaths per 1000 live births
Location
Total
Time Period
1990-1994
1995-1999
1999-2003
2005-2006
2005-2009
SERIES LABEL
UNITS
Infant mortality rate PER_1000_LIVE_BIRTHS
27.3
25.5
24.2
25.3
18.9
LOCATION AGE_GROUP SEX
T
000_001_Y
T
Targets in the database
Targets are also an issue when found in the database
since they should not be exchanged as observed values
Target in database (Example 1)
Sometimes stored as subgroup which can be ignored
at the 2nd stage…
Proportion of people living below the national poverty line - Percent
Location
Total
Rural
Urban
Other
MDG target
Time Period
1990
48
1992
46
51.8
26.5
1997
39.1
42.5
22.1
2002
33.5
37.6
19.7
2015
24
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Population below national poverty line
PERCENT
R
000_099_Y
T
Population below national poverty line
PERCENT
T
000_099_Y
T
Population below national poverty line
PERCENT
U
000_099_Y
T
Target in database (Example 2)
But other times can be found as a time period
among observed values…
Maternal mortality ratio - Deaths per 100,000 live births
Location
Total
Rural
Urban
CPC
CPI Census Governme SPC RHS
SPC RHS
SPC RHS
Source
Census
2005
nt 2007
2000
2000
2000
Time period
1995
650
2000
530
580
170
2005
405
2015
260
SERIES LABEL
UNITS
LOCATION AGE_GROUP SEX
Maternal mortality ratio
PER_100000_LIVE_BIRTHS T
000_099_Y
F
Use of filters at registration

To deal with the issues of




multiple sources for a given time period
overlapping time period beginning with the same
year
targets presented alongside observed values
The registration page provides a feature to
filter out data from a generated SDMX
message associated with specific time
periods and source references
Filter by time/ source
Final Step: Select source filter
Filter by time
Final Step: Select time filter
Final Step: Register new mappings
Final Step: Complete
Exercise 4: Filter time series

Use unstats.un.org/demoginfo[1-6]

Map/ amend/ publish the time series for;
1.
2.
3.
4.
“Under-five mortality rate”
“Maternal mortality ratio (MMR)”
“Tuberculosis prevalence rate”
“Proportion of the population using improved
sanitation facilities”
New features in DevInfo 7.1
Improved Quick Data Search
Database updates and publishing
After updating your database with new data, you can
optimize and publish the updates in SDMX at the
same time in Admin Panel
Database updates and publishing
Database updates and publishing
Database updates and publishing
DevInfo installation and upgrading

The latest version of DevInfo is 7.1.0.1


Starting with this version, you no longer need to
uninstall and reinstall in order to upgrade
Instead you will use patches available on the DevInfo
Downloads page
Exporting and importing mappings
Exporting mappings
Editing exported mappings
Editing exported mappings
Exporting and importing mappings
Importing edited mappings
Importing edited mappings
Importing edited mappings
Thank you for your attention