Transcript PPT
CountryData
SDMX for
Development Indicators
MDG DSD vs. the Di Database:
Using the Mapping Tool
MDG Data Structure Definition
(DSD) background
Developed by SDMX Task Team of IAEG on
MDGs
Supports exchange of MDG Indicator data
between international agencies (UN,
UNICEF, UNESCO, …)
Implemented in SDMX 2.0
Latest version (2.4) finalised in Feb 2013
DevInfo (Di) background
Data dissemination software supported
and promoted by UNICEF
DevInfo7 (Di7) launched in Nov 2012
SDMX 2.1 & 2.0 compliant
Web-based software
9 out of 11 project countries using
DevInfo
Stable version compared to previous
releases
Simple relation between Di & DSD
Di Database | MDG DSD
• (none) | Frequency (Default = “Annual”)
• Area | Reference Area
• Indicator | Series
• Unit | Units of measurement; Unit multiplier (Default = 0)
• Subgroup (i.e. Sex, Age, Location etc.) | Location (Default = “Total”); Age group (Default = “All Ages”); Sex (Default = “Both Sexes”)
• Source | Source Type (Default = “NA”); Source details
• Time Period | Time Period; Time period details
• (none) | Nature of data points (Default = “C”)
• Footnotes | Footnotes
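The defaults on this slide can be read as a fallback table: any DSD field that the Di record does not supply takes its default. A minimal sketch of that idea follows. The dictionary keys and the `fill_defaults` helper are hypothetical names for illustration and are not part of DevInfo; only the default values come from the slide.

```python
# Defaults copied from the slide; the key names are illustrative only.
DSD_DEFAULTS = {
    "Frequency": "Annual",
    "Unit multiplier": 0,
    "Location": "Total",
    "Age group": "All Ages",
    "Sex": "Both Sexes",
    "Source Type": "NA",
    "Nature of data points": "C",
}


def fill_defaults(mapped_values: dict) -> dict:
    """Return a DSD record, using the default for every field left unmapped."""
    record = dict(DSD_DEFAULTS)
    # Only values that were actually mapped (not None) override a default.
    record.update({k: v for k, v in mapped_values.items() if v is not None})
    return record


example = fill_defaults(
    {"Series": "Infant Mortality", "Sex": "Female", "Age group": None}
)
print(example["Sex"], "/", example["Age group"], "/", example["Location"])
```

Here the subgroup supplied a value for Sex only, so Age group and Location fall back to their defaults.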
Mapping to the DSD
DSD dimensional structure means
values are mandatory for LOCATION, SEX &
AGE GROUP.
Due to the nature of this domain (i.e. MDGs), it is not
always obvious which values should be used in these
dimensions
For example, what is SEX for “Births attended
by skilled personnel”:
Not Applicable? Total? Female?
Mapping to the DSD
Inconsistent mappings lead to duplications
and other anomalies
In CountryData, mappings for indicators/ time
series are agreed before data exchange (see
mapping for MDGs from 1st workshop)
However, this is just one side of the story…
Mapping to the DSD
Understanding the structure and contents of
the origin database is fundamental to the
mapping process
Mapping to the DSD subjects the data to
certain ‘restrictions’ that it is not bound by
in the database (and vice versa).
Mapping to the DSD
The mapping tool in the Di software is
designed to work with the Di database as simply
as possible…
the tool is based on mapping between the codelists
of the DSD and the origin database;
certain situations require some further manual effort
to map a time series;
and sometimes a “fix” is required to the database
where the data simply isn’t valid or is duplicated.
Therefore it’s good to review the Di structure to
understand where these issues usually occur.
Di Data Architecture
Area: hierarchical dimension
IUS = Indicator, Unit and Subgroup
Time series data are stored with the combination of
the 3 dimensions:
Indicator
Unit
Subgroup: Combination of one or more sub-dimensions
Source & Time Period
Together with IUS, “uniquely” define each data
value
Footnote
“Free text” field stored with data value
Di INDICATOR
IUS: Indicator Unit Subgroup
Indicator, for example:
Infant Mortality
AIDS Death
Malaria Death
Similar to SERIES in the DSD
Contains only Indicator specific values
Di UNIT
IUS: Indicator Unit Subgroup
Unit:
Percentage
Number
USD
Square KM
Similar to UNIT of Measurement in DSD
Contains only Unit specific values
Di SUBGROUP
IUS: Indicator Unit Subgroup
SubGroup Dimension:
Combination of one or more subdimensions
“Age”, “Sex”, “Location” and “Other” subdimensions are set initially in database
Specific values can be created under
each sub-dimension
Relate to SEX, AGE GROUP and
LOCATION in DSD.
Di SUBGROUP
IUS: Indicator Unit Subgroup
Formation Logic:
Sub-Dimension | Sub-Dimension values
Age | < 1 Year, < 5 Year, 5 – 10 Year
Sex | Male, Female
Location | Urban, Rural, Total
Other | Rice, Wheat
SUBGROUP (Combination):
<1 Year Male
<5 Year Female Rural
Urban
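The formation logic can be sketched as joining one value per sub-dimension into a single subgroup label. This is an illustration only: the ordering Age, Sex, Location, Other is inferred from the example combinations on the slide, and `subgroup_label` is a hypothetical helper, not a DevInfo function.

```python
# Order inferred from the slide's examples ("<5 Year Female Rural"); an assumption.
SUB_DIMENSION_ORDER = ("Age", "Sex", "Location", "Other")


def subgroup_label(values: dict) -> str:
    """Join the supplied sub-dimension values into one subgroup label."""
    unknown = set(values) - set(SUB_DIMENSION_ORDER)
    if unknown:
        raise ValueError(f"unknown sub-dimension(s): {sorted(unknown)}")
    return " ".join(values[d] for d in SUB_DIMENSION_ORDER if d in values)


label = subgroup_label({"Sex": "Female", "Location": "Rural", "Age": "<5 Year"})
print(label)
```

A subgroup may use any subset of the sub-dimensions, which is why a label such as "Urban" on its own is also valid.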
Di SUBGROUP
Di SUBGROUP
Di SOURCE
Di TIME PERIOD
Di Mapping Tool: Introduction
Once data exists in the Di7 web-based software,
it can be mapped and published in a form
which conforms with the MDG DSD.
This is all done online in the Di7 web-based repository through the administration
profile, so let’s begin…
Getting Started…
Scroll down to ‘Registry’ menu
Log onto administrative profile
Full access to ‘Registry’ features
Prepare the database for mapping
Prepares the SDMX artefacts
Ready to ‘Upload’ the DSD
Choose a DSD from your folders
DSD Upload is a success…
Now you are ready to map…
1st Step: Codelist mapping
1st Step: DSD Codelists
Indicator CodeList
SH_HIV_INCD
HIV incidence rate
SH_MLR_MORT
Notified cases of malaria
SE_ADT_1524
Literacy rate
SE_PRM_CMPL
Primary completion rate
UNIT CodeList
NA
Not applicable
CUR_LCU
Local currency
USD
USD
NUMBER
Number
RATIO
Ratio
PERCENT
Percent
KM2
Square kilometers
T
Metric Tons
PER_100_LIVE_BIRTHS
Per 100 live births
PER_100_POP
Per 100 population
PER_1000_LIVE_BIRTHS
Per 1,000 live births
PER_1000_POP
Per 1,000 population
PER_100000_LIVE_BIRTHS
Per 100,000 live births
PER_100000_POP
Per 100,000 population
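Codelist mapping amounts to a lookup from each Di value to a DSD code. The sketch below pairs the Di units listed earlier in this deck with codes from the UNIT codelist above. The pairing itself is an assumption made for illustration, and `map_unit` is a hypothetical helper rather than part of the mapping tool.

```python
# Di unit name -> DSD UNIT code. The pairing is assumed for illustration.
DI_UNIT_TO_DSD = {
    "Percentage": "PERCENT",
    "Number": "NUMBER",
    "USD": "USD",
    "Square KM": "KM2",
}


def map_unit(di_unit: str):
    """Return the DSD UNIT code, or None when no codelist mapping exists yet."""
    return DI_UNIT_TO_DSD.get(di_unit)


di_units = ["Percentage", "Births per woman", "Square KM"]
unmapped = [u for u in di_units if map_unit(u) is None]
print(unmapped)
```

Units that come back unmapped are the ones that need attention before the time series can be published.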
Location CodeList
T Total (national level)
U Urban
R Rural
SEX CodeList
NA Not applicable
F Female
M Male
T Both sexes
AGE CodeList
NA
Not applicable
000_099_Y
All age ranges
000_006_M
under 6 month olds
000_005_Y
under 5 year olds
000_001_Y
under 1 year olds
000_018_Y
under 18 year olds
000_006_Y
under 6 year olds
010_005_Y
10-14 year olds
015_005_Y
15-19 year olds
015_010_Y
15-24 year olds
015_035_Y
15-49 year olds
006_054_M
6-59 months old
006_009_Y
6-14 year olds
005_013_Y
5-17 year olds
015_050_Y
15-64 year olds
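The age codes listed above appear to follow a pattern of starting age, span and unit (Y for years, M for months). For example, 015_035_Y starts at 15 and spans 35 years, giving 15-49. This reading is inferred from the listed codes and is not taken from the DSD documentation, so treat the decoder below as a hypothetical sketch.

```python
import re


def decode_age_code(code: str):
    """Return (first, last, unit) for a code like '015_035_Y', or None for 'NA'."""
    if code == "NA":
        return None
    match = re.fullmatch(r"(\d{3})_(\d{3})_([YM])", code)
    if match is None:
        raise ValueError(f"unrecognised age code: {code!r}")
    start, span = int(match.group(1)), int(match.group(2))
    unit = "years" if match.group(3) == "Y" else "months"
    # The span counts whole units, so the last included age is start + span - 1.
    return start, start + span - 1, unit


print(decode_age_code("015_035_Y"))
print(decode_age_code("006_054_M"))
```

Under this reading 000_099_Y decodes to ages 0 to 98, which the codelist labels simply as "All age ranges".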
1st Step: (A) Map Indicator codes
1st Step: (B) Map Unit codes
1st Step: (C) Map Subgroup codes
1st Step: (C) Choose Subgroup list
1st Step: (C) Map Age subgroup
1st Step: (C) Map Sex & Location
1st Step: (D) Map Area
1st Step: Save codelist mappings
1st Step: Ignore warning
1st Step: Confirm mapping saved
Exercise 1: Codelist mapping
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Map the codelists (where possible) for
Unit
Age
Sex
Location
Area
And just one indicator, “Antenatal care coverage for
at least one visit”
1st Step: Complete
2nd Step: Confirm IUS mapping
2nd Step: Save IUS Mappings
Exercise 2: mapping time series
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Map the time series for
1. “Antenatal care coverage for at least four visits”
2. “Employment to population ratio”
3. “Literacy rate of 15-24 year-olds”
4. “Death rate associated with malaria”
5. “Proportion of population using solid fuels”
2nd Step: Complete
Final Step: Register the mappings
Final Step: Select mappings
Final Step: Generate SDMX-ML
Final Step: Complete
Exercise 3: Publish time series
Use unstats.un.org/unsd/demodiweb[1-6]
Username = [email protected]
Password = support@2012
Publish/ register the time series for
1. “Antenatal care coverage for at least four visits”
2. “Employment to population ratio”
3. “Literacy rate of 15-24 year-olds”
4. “Death rate associated with malaria”
5. “Proportion of population using solid fuels”
Why the 2nd step?
The default values for SEX, LOCATION or
AGE GROUP mapping may not be
applicable to all mappings
The codelist mapping may only provide a
partial mapping of the time series (i.e. more
information is required)
These changes are made in the 2nd step.
Where are the default values?
Admin panel: Application settings
Insert screenshot/details of admin panel and default value
storage…
Application settings holds all
mapping default values
Manual mapping of SUBGROUP
• Where a subgroup value is missing the default
values will apply, for example…
Indicator
Unit
Antenatal care coverage for at least one visit - Percent
Default Values
Sex
Female
• Location = T
?
Location
Rural
• … Urban
Age
15-49 yr
15-49 yr
15-49 yr
Time Period
2000
95.6
92.2
96.5
2004
96.5
95.9
96.3
2006
98.8
98.7
98.6
2010
98
97.4
98.1
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Antenatal care coverage, at least four visits | PERCENT | T | 015_035_Y | F
Antenatal care coverage, at least one visit | PERCENT | T | 015_035_Y | F
Manual mapping of SUBGROUP
• So subgroup coverage affects the number of
manual changes which have to be made…
Indicator - Unit: Antenatal care coverage rate - Percent
Subgroup for Age and Sex?
Time Period | Location = Total | Location = Rural | Location = Urban
2000 | 44.8 | 41.2 | 67.3
2005 | 71.8 | 70.4 | 80.7
2010 | 89.1 | 87.6 | 97
Default Values
• …
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Antenatal care coverage, at least four visits | PERCENT | T | 015_035_Y | F
Antenatal care coverage, at least one visit | PERCENT | T | 015_035_Y | F
Manual mapping of SUBGROUP
• Common example of where default subgroup
mappings do not apply
Indicator - Unit: Land under forest cover - Percent
Subgroups?
Time Period | Data Value
1993 | 59.82
1997 | 58.6
2002 | 61.15
2005 | 60
2006 | 59.09
2008 | 57.99
2009 | 57.6
2010 | 57.56
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = Both sexes
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Land area covered by forest | PERCENT | T | NA | NA
Manual mapping of SUBGROUP
• Common example of where default subgroup
mappings do not apply
Subgroup
for Sex?
Indicator
Unit
Adolescent birth rate - Births per woman
Location
Rural
Age
15-19 yr
Time Period
1995
1999
2003
2005
54.6
2008-2009
81
Total
15-19 yr
114
77
69.4
59.8
67
Default
Urban Values
•15-19
… yr
•…
• Sex = T
63.2
62
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Adolescent birth rate | PER_1000_POP | T | 015_005_Y | F
Manual mapping of SUBGROUP
• Common example of where default subgroup
mappings do not apply
Indicator - Unit: Children under-five sleeping under insecticide-treated net (ITN) - Percent, 2011
Subgroup for Location, Age and Sex?
Other ?
Subgroup | Data Value
Rural | 45.9
Total | 39
Urban | 30.1
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = T
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Children sleeping under insecticide-treated bed nets | PERCENT | T | 000_005_Y | T
Manual mapping of SUBGROUP
• If the subgroups are sorted more simply, this also
helps with the mapping:
Indicator
Unit
Condom use at last high-risk sex - Percent
Location
Total
Sex
Age
15-24 yr
Other
Total 15-24 yr
Time Period
2000-2001
49.8
2002-2003
55.1
2004-2005
2005-2006
52.9
2006
2012
?
Rural
Female
15-24 yr
Male
15-24 yr
Female
15-24 yr
39
61
48
27.1
38.3
70
53
65.3
54.5
73
Male
15-24 yr
47.7
Default Values
• Location = T
Urban
• …Female
Male
15-24 yr
51.3
15-24 yr
66.9
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Condom use at last high-risk sex | PERCENT | T | 015_010_Y | M
Condom use at last high-risk sex | PERCENT | T | 015_010_Y | F
70.5
Manual mapping of SUBGROUP
• Common example of where default subgroup
mappings do not apply
Unit
Indicator
Condom use at last high-risk sex - Percent
Other
?
Time Period
2003
2008
Female 15-24 yr
Male 15-24 yr
32.7
29.2
51.7
46.3
Rural female 15-24 yr
Rural male 15-24 yr
Urban female 15-24 yr
Urban male 15-24 yr
21.2
21.6
38.9
40.6
32.5
33.5
50.3
52
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Condom use at last high-risk sex | PERCENT | T | 015_010_Y | M
Condom use at last high-risk sex | PERCENT | T | 015_010_Y | F
Subgroup for
Location, Age
and Sex?
Default Values
• Location = T
• Age Group = 000_099_Y
• Sex = T
Back to mapping…
2nd Step: Amend Indicator
When using the check box to tick the
mapping, you are “fixing” the mapped DSD
values. If the box is unchecked again and
the mappings saved, then DSD values
revert to those mapped at codelist/ default
values (i.e. any manual changes are
undone.)
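The check box behaviour can be pictured as a flag that decides whether manual edits survive a save. The sketch below is a hypothetical model, not the tool's internals: the class and field names are invented, and the values reuse the forest-cover example from this deck, where Age Group and Sex are changed manually to NA.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSeriesMapping:
    base: dict                                   # codelist mapping plus defaults
    manual: dict = field(default_factory=dict)   # edits made in the 2nd step
    fixed: bool = False                          # stands in for the check box

    def saved_values(self) -> dict:
        """Values stored on save: manual edits count only while the box is ticked."""
        if self.fixed:
            return {**self.base, **self.manual}
        return dict(self.base)


forest = TimeSeriesMapping(
    base={"LOCATION": "T", "AGE_GROUP": "000_099_Y", "SEX": "T"},
    manual={"AGE_GROUP": "NA", "SEX": "NA"},
    fixed=True,
)
ticked = forest.saved_values()
forest.fixed = False            # un-tick the box and save again
unticked = forest.saved_values()
print(ticked, unticked)
```

With the box ticked the manual values are kept; after un-ticking and saving, the mapping falls back to the codelist and default values.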
Final Step: Register new mappings
Exercise 4: Amend time series
Use unstats.un.org/unsd/demodiweb[1-6]
Map/ amend/ publish the time series for;
1. “Antenatal coverage rate”
2. “Children orphaned by AIDS”
3. “Children under-five sleeping under insecticide-treated net (ITN)”
4. “Proportion of births attended by skilled health personnel”
5. “Share of women in wage employment in the non-agricultural sector”
6. “Proportion of urban population living in slums”
More complex mappings under
the 1st and 2nd mapping step?
The most common changes made to mappings
are between subgroups and the Sex, Age
Group and Location dimensions
But sometimes manual changes are required
between di and DSD indicator and unit,
either…
More than one di code relates to a single DSD code
OR
More than one DSD code relates to a single di code
Many-to-one mapping for
Indicator codelist (Example 1)
Indicator - Unit | Time Period | Data Value
Proportion of seats headed by women in national parliament - Percent | 1999 | 11.1
Seats held by men in national parliament - Number | 2008 | 80
Seats held by women in national parliament - Number | 2008 | 14
Seats in national parliament - Number | 2008 | 94
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Seats in national parliament | NUMBER | T | 000_099_Y | M
Seats in national parliament | NUMBER | T | 000_099_Y | F
Seats in national parliament | PERCENT | T | 000_099_Y | F
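In this example several Di indicators collapse onto one DSD series, and the UNITS and SEX dimensions keep the resulting keys distinct. A minimal sketch of that many-to-one table, using the rows shown on the slide, follows. The dictionary layout is an assumption for illustration and is not how the tool stores mappings.

```python
from collections import defaultdict

DSD_SERIES = "Seats in national parliament"

# (Di indicator, Di unit) -> (DSD series label, UNITS, SEX), as on the slide.
MANY_TO_ONE = {
    ("Proportion of seats headed by women in national parliament", "Percent"):
        (DSD_SERIES, "PERCENT", "F"),
    ("Seats held by men in national parliament", "Number"):
        (DSD_SERIES, "NUMBER", "M"),
    ("Seats held by women in national parliament", "Number"):
        (DSD_SERIES, "NUMBER", "F"),
}

# Group the Di indicators by the DSD series they map to.
by_series = defaultdict(list)
for (di_indicator, _di_unit), (series, units, sex) in MANY_TO_ONE.items():
    by_series[series].append((di_indicator, units, sex))

# The DSD keys must stay unique even though the series code is shared.
dsd_keys = set(MANY_TO_ONE.values())
print(len(by_series[DSD_SERIES]), "Di indicators ->", len(dsd_keys), "distinct DSD keys")
```

If two Di indicators ended up with the same series, units and sex, the set of DSD keys would be smaller than the number of mappings, which signals a duplicate.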
Many-to-one mapping for
Indicator codelist (Example 2)
Indicator
Female 15-24 yr
Male 15-24 yr
Indicator
Time Period
Population 15-24 year-olds who have comprehensive
2003
72.2
75.1
correct knowledge of HIV/AIDS - Percent
2008
65.9
76.1
Men 15-24 years with comprehensive knowledge of AIDS 2008
- Percent
Women 15-24 years with comprehensive knowledge of
2008
AIDS - Percent
Other
Total
Indicator
34.2
28.3
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Population with comprehensive correct knowledge of HIV/AIDS | PERCENT | T | 015_010_Y | M
Population with comprehensive correct knowledge of HIV/AIDS | PERCENT | T | 015_010_Y | F
Many-to-one mapping for
Indicator codelist (Example 3)
Indicator
Indicator
Population below national poverty
line - Percent
Share of population below poverty
line - Percent
Location
Time Period
1993
2000
2005-2006
2010
2001
2005
2006
2008
Indicator
Total
Rural
51.2
60.3
56.7
44.9
60.4
56.9
56.9
56.9
Urban
65.7
61.9
48.7
14.3
28.5
22.1
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Population below national poverty line | PERCENT | R | 000_099_Y | T
Population below national poverty line | PERCENT | T | 000_099_Y | T
Population below national poverty line | PERCENT | U | 000_099_Y | T
Many-to-one mapping for Unit
codelist (Example 1)
Unit
Indicator
Gender parity index in primary education - Ratio
Gender parity index at primary education - Index
SERIES LABEL
Other
Time Period
2002
2003
2004
2005
2006
2007
2007-2008
2008
2009
UNITS
Gender Parity Index in primary level enrolment RATIO
Unit
Total
Rural
0.92
0.92
0.93
0.93
0.96
0.96
0.96
0.99
0.96
Urban
0.99
0.99
LOCATION AGE_GROUP SEX
T
000_099_Y
T
Manual mapping of INDICATOR
Indicator ?
Unit
Unmet need for family planning - Percent
Location
Other
Limiting
Spacing
Time Period
1992
19.4
21
2000
11.6
24
2005
13.4
24.5
2010
Total
Rural
40.4
35.6
37.9
18.9
Urban
38.4
19.5
34.4
15.5
Manual change
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Unmet need for family planning, limiting | PERCENT | T | 015_035_Y | F
Unmet need for family planning, spacing | PERCENT | T | 015_035_Y | F
Unmet need for family planning | PERCENT | T | 015_035_Y | F
Manual mapping of INDICATOR
Indicator - Unit: Telephone lines - Number
Indicator ?
Location: Total
Time Period | Cellular lines | Fixed lines | Telephone
2000 | 72,602 | 58,261 |
2001 | 276,034 | 56,147 |
2002 | 505,627 | 59,472 |
2003 | 893,035 | 65,793 |
2004 | 1,165,035 | 82,495 |
2005 | 1,525,125 | 100,777 |
2006 | 2,697,616 | 129,863 | 2,827,479
2007 | 5,163,414 | 165,788 | 5,329,202
2008 | 8,554,864 | 168,481 | 8,723,345
2009 | 9,383,734 | 233,533 | 9,617,267
2010 | 12,828,264 | 327,114 | 13,155,378
Manual change
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Mobile cellular telephone subscriptions | NUMBER | T | NA | NA
Telephone lines | NUMBER | T | NA | NA
Manual mapping of UNIT
Indicator - Unit: Gender Parity Index in primary level enrolment - Percent
Time Period | Data Value
2000 | 0.93
2003 | 0.97
2006 | 0.99
2009 | 1
Manual change
• Unit = “Ratio”
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Gender Parity Index in primary level enrolment | RATIO | T | 000_099_Y | T
Back to mapping…
1st Step: many di to 1 DSD code
2nd Step: 1 di to many DSD codes
Final Step: Register new mappings
Exercise 5: Complex time series
Use unstats.un.org/unsd/demodiweb[1-6]
Map/ amend/ publish the time series for;
1. “Contraceptive prevalence rate”
2. “Primary completion rate”
3. “Gender parity index in primary education”
4. “Seats held by men in national parliament”
5. “Seats held by women in national parliament”
6. “Telephone lines”
Other issues encountered with
generating SDMX from DevInfo
• The MDG DSD requires any data point to be uniquely
described by the following dimensions;
Type | Name
Dimension | Series
Dimension | Units of measurement
Dimension | Location
Dimension | Age group
Dimension | Sex
Dimension | Reference Area
Dimension | Time Period
• However, DevInfo allows data to be stored in overlapping
time intervals and with multiple sources. These issues need
to be resolved to conform to the “uniqueness” required by
the MDG DSD.
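The uniqueness rule can be checked by counting how many observations share the same combination of the seven dimensions. The sketch below is illustrative only. The field names, the reference area "XX" and the source labels are placeholders, while the 2001 and 2002 values are taken from the sanitation example in this deck.

```python
from collections import Counter

KEY_DIMENSIONS = (
    "SERIES", "UNITS", "LOCATION", "AGE_GROUP", "SEX", "REF_AREA", "TIME_PERIOD",
)


def duplicate_keys(observations):
    """Return every dimension key that is shared by more than one observation."""
    counts = Counter(tuple(obs[d] for d in KEY_DIMENSIONS) for obs in observations)
    return [key for key, n in counts.items() if n > 1]


base = {
    "SERIES": "Population using improved sanitation facilities",
    "UNITS": "PERCENT", "LOCATION": "T", "AGE_GROUP": "000_099_Y",
    "SEX": "T", "REF_AREA": "XX",          # placeholder area code
}
observations = [
    {**base, "TIME_PERIOD": "2001", "SOURCE": "source A", "VALUE": 40},
    {**base, "TIME_PERIOD": "2002", "SOURCE": "source A", "VALUE": 42},
    {**base, "TIME_PERIOD": "2002", "SOURCE": "source B", "VALUE": 41.6},
]
clashes = duplicate_keys(observations)
print(clashes)
```

Source is not one of the key dimensions, so two sources reporting the same year collide on the same key.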
Multiple sources
Allowable in DevInfo but not in the DSD
Proportion of population with access to improved sanitation - Percent
Location
Total
Source
CPC Census 1995 NCEHWS_2007 NCEHWS 2003
NCEHWS 2004
Time Period
1990
11
1995
29
2000
37
2001
40
2002
42
41.6
2003
42
42.2
2004
44
44.3
2005
46
2006
47
2007
49
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Population using improved sanitation facilities | PERCENT | R | 000_099_Y | T
Population using improved sanitation facilities | PERCENT | T | 000_099_Y | T
Population using improved sanitation facilities | PERCENT | U | 000_099_Y | T
Overlapping time
• This issue is only a problem where overlapping periods
begin from the same year, as the mapping tool takes the
first year in the period as the value for the “Time Period”
dimension.
Indicator - Unit: Infant mortality rate - Deaths per 1000 live births
Location: Total
Time Period | Data Value
1990-1994 | 27.3
1995-1999 | 25.5
1999-2003 | 24.2
2005-2006 | 25.3
2005-2009 | 18.9
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Infant mortality rate | PER_1000_LIVE_BIRTHS | T | 000_001_Y | T
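Because only the first year of a period is used for the “Time Period” dimension, periods that start in the same year collide. A small sketch of that rule applied to the periods on this slide follows. The function names are hypothetical and the parsing assumes periods are written as "YYYY-YYYY".

```python
from collections import defaultdict


def time_period_value(di_period: str) -> str:
    """First year of a Di period such as '2005-2009' (rule as described on the slide)."""
    return di_period.split("-")[0].strip()


def clashing_periods(di_periods):
    """Group Di periods by their first year and keep only the groups that collide."""
    groups = defaultdict(list)
    for period in di_periods:
        groups[time_period_value(period)].append(period)
    return {year: ps for year, ps in groups.items() if len(ps) > 1}


periods = ["1990-1994", "1995-1999", "1999-2003", "2005-2006", "2005-2009"]
clashes = clashing_periods(periods)
print(clashes)
```

The periods 1995-1999 and 1999-2003 overlap but start in different years, so they do not collide; only the two periods starting in 2005 do.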
Targets in the database
Targets are also an issue when found in the database
since they should not be exchanged as observed values
Target in database (Example 1)
Sometimes stored as subgroup which can be ignored
at the 2nd stage…
Indicator - Unit: Proportion of people living below the national poverty line - Percent
Time Period | Location: Total | Location: Rural | Location: Urban | Other: MDG target
1990 | 48 | | |
1992 | 46 | 51.8 | 26.5 |
1997 | 39.1 | 42.5 | 22.1 |
2002 | 33.5 | 37.6 | 19.7 |
2015 | | | | 24
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Population below national poverty line | PERCENT | R | 000_099_Y | T
Population below national poverty line | PERCENT | T | 000_099_Y | T
Population below national poverty line | PERCENT | U | 000_099_Y | T
Target in database (Example 2)
But other times can be found as a time period
among observed values…
Maternal mortality ratio - Deaths per 100,000 live births
Location
Total
Rural
Urban
CPC
CPI Census Governme SPC RHS
SPC RHS
SPC RHS
Source
Census
2005
nt 2007
2000
2000
2000
Time period
1995
650
2000
530
580
170
2005
405
2015
260
SERIES LABEL | UNITS | LOCATION | AGE_GROUP | SEX
Maternal mortality ratio | PER_100000_LIVE_BIRTHS | T | 000_099_Y | F
Use of filters at registration
To deal with the issues of:
multiple sources for a given time period;
overlapping time periods beginning in the same
year;
and targets presented alongside observed values,
the mapping tool provides a feature to filter
out data associated with specific time periods and
source references from a generated SDMX message.
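The effect of the filter can be sketched as dropping any observation whose time period or source has been excluded. This is an illustration, not the tool's implementation: `apply_filters`, the field names and the source labels are placeholders. The values follow the maternal mortality example in this deck, where 2015 holds a target rather than an observation.

```python
def apply_filters(observations, excluded_periods=(), excluded_sources=()):
    """Keep only observations whose period and source are not filtered out."""
    return [
        obs for obs in observations
        if obs["TIME_PERIOD"] not in excluded_periods
        and obs["SOURCE"] not in excluded_sources
    ]


mmr = [
    {"TIME_PERIOD": "1995", "SOURCE": "source A", "VALUE": 650},
    {"TIME_PERIOD": "2005", "SOURCE": "source B", "VALUE": 405},
    {"TIME_PERIOD": "2015", "SOURCE": "source C", "VALUE": 260},  # target, not observed
]
observed = apply_filters(mmr, excluded_periods={"2015"})
print([obs["VALUE"] for obs in observed])
```

The same call with `excluded_sources` covers the multiple-sources case by keeping a single source per time period.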
Back to mapping…
Final Step: Filter by time/ source
Final Step: Select source filter
Final Step: Select time filter
Final Step: Register new mappings
Final Step: Complete
Exercise 6: Filter time series
Use unstats.un.org/unsd/demodiweb[2-6]
Map/ amend/ publish the time series for;
1. “Under-five mortality rate”
2. “Maternal mortality ratio (MMR)”
3. “Net enrolment ratio in primary education (NER)”
4. “Orphans primary school enrolment”
5. “Tuberculosis prevalence rate”
6. “Proportion of the population using improved sanitation facilities”
DSD Maintenance
• The mapping and registry tool allows users to edit
and delete the DSD as well as upload it.
• When the DSD is updated, it is recommended to
edit the DSD rather than delete it.
• Deleting a DSD removes all the
mappings and subscriptions used for that DSD.
DSD Maintenance…
DSD Header…