Preparing and Importing Data into StatPlanet

Download Report

Transcript Preparing and Importing Data into StatPlanet

Nov 2013
Training Session B:
Preparing and Importing Data
www.statsilk.com
Creative Commons BY-SA 3.0
Preparing & Importing Data Into
StatPlanet
• Anatomy
of Data Editor Worksheets
• StatPlanet Data Structure
• Importing Data
• Identifying & Fixing Import Errors
Anatomy of the StatPlanet Data
Editor
Worksheet descriptions
Key StatPlanet Data Editor Worksheets
Worksheet Purpose
Import
Set up your Shapefile map
Import, organise and customise data
Alter display names for territories
Import
Names
Stores the names, aliases & variant names of places in the map
Indicates import failures
Data
Identical to ‘Import’ but uses map IDs rather than names
Generated by the ‘Save Data’ macro in ‘Import’ sheet
The data.csv file generated is identical to the contents of this
sheet
Map regions
Specify custom groupings of regions
Specify custom zooms to regions
Settings
Customise appearance and functionality of StatPlanet
4
Other StatPlanet Data Editor Worksheets
Worksheet
Purpose
TextTranslations
Contains translations of interface elements
Ability to add new languages, modify interface text
Tools
 A range of tools relating to data management – use with
caution
Includes ability to upgrade from older versions of
StatPlanet
Help
Some help on common Data Editor tasks
License
Description of license terms and conditions
5
Importing into the StatPlanet Data
Structure
How the Data Editor stores imported data
6
The StatPlanet Data Editor
Open the StatPlanet Data Editor in the Training_Materials
\StatPlanet_Vietnam folder, and go to the worksheet ‘Import’
Examine how column A contains the categories, column B the
time period, and column C the indicators. Columns D to K are
for various optional customizations and can be empty. Map
regions are in the columns from L onwards.
Run StatPlanet.exe in the same folder, and see how the data is
displayed.





Categories are displayed in the top menu – try clicking on them to
switch to a different map.
Explore the interface a bit, e.g. select an indicator in the top left
panel, move the mouse over the map regions and over the graph
‘bars’.
Close StatPlanet

7
StatPlanet Training 2013
21/07/2015
Importing Prepared Data
Go back to the StatPlanet Data Editor, and:
Click on ‘Clear data’, select ‘Yes’ to remove all data.


Click on the ‘Import data’ button

Navigate to the Training_Materials\Example_Data folder and select
the MDG_one_category_example.xlsx file
The data will then automatically be imported. It shouldn’t report any
errors. You may wish to change ‘New category’ towards the end of
the Import data to ‘Income’



Click on the Save Data button.
This saves the StatPlanet Data Editor, and updates the data.csv &
settings.csv files (or data.txt & settings.txt files)

Run StatPlanet.exe from the StatPlanet_Vietnam folder to view the
imported data within StatPlanet.
8
A closer look at how Import works
Examine the data structure in the Data Editor:




Place names are listed horizontally across the top of the worksheet
Data is organized first by Category (column A), then by Time Period
(column B) and then by Indicator (column C)
Indicator names are listed vertically
Open the MDG_one_category_example.xlsx spreadsheet:



Indicator names are listed horizontally, place names are listed
vertically
There are multiple columns repeating indicators for 4 years
Fortunately, the Import macro is clever enough to know all
this:



9
It will identify the data structure and transpose vertical / horizontal if
needed
It can also be ‘taught’ to identify multiple naming variations for
regions (the next exercise)
The StatPlanet Data Structure
CATEGORY
Category A
TIME
INDICATOR
2012


Indicator A(1)
Indicator A(2)
Indicator A(3)

2011
Indicator A(1)
Indicator A(2)
Indicator A(3)

2010

Indicator A(1)
Indicator A(2)
Indicator A(3)
Category B
2007

Indicator B(1)
Indicator B(2)
2006
Indicator B(1)
Indicator B(2)
2005
Indicator B(1)
Indicator B(2)
Category C
2001
Indicator C(1)
Indicator C(2)
10
Basic rules for the data structure:

Only one instance of each region
name
Data is structured first by category,
then by time, and then by indicator.
Time must be in order from most
recent to oldest.
Within one category, the list of
indicators must be exactly the same
for each time period
Spacing between the different
categories, times and indicators needs
to be maintained as shown in the
example left.
The Import “macro” automatically
structures your data this way
Common Data Import Failure
Fail – multiple rows or columns for a map area / region
Country
Afghanistan
Albania
Algeria
Andorra
Dimension
A
B
A
B
A
B
A
B
High Medium Low
6
3
2
7
5
2
9
6
3
12
5
2
11
5
1
21
11
9
15
14
12
14
11
10
Success
Country
A - High A - Medium A - Low B - High B - Medium B - Low
Afghanistan
6
3
2
7
5
2
Albania
9
6
3
12
5
2
Algeria
11
5
1
21
11
9
Andorra
15
14
12
14
11
10
11
Importing Unprepared Data




Open MDG_one_category_example_errors.xlsx in Excel
from the Training_Materials\Example_Data folder
Restructure the data as shown in the previous slide (see
MDG_one_category_example_errors_fixed.xlsx for an
example)
Return to the StatPlanet Data Editor (sheet “Import”)
Import the file MDG_one_category_example_errors.xlsx


Data imports but on completion, you are routed to the
‘Import names’ worksheet to see a list of failed imports
Re-open MDG_one_category_example_errors.xlsx in Excel

12
Note that not all the names in this file match the names in the
StatPlanet Data Editor
Two ways to resolve naming issues
1. Fix Source Data


Do this where name
variant is unlikely to be
used very often
Adjust the names that
failed to import in the
source data so they will
be recognised on import
13
2. Add names to Data Editor



Do this where name
version is likely to be used
often
Also consider doing this if
the edits to the source
data are very extensive
Add the region names in
the ‘Import names’
worksheet from row 3
onwards, in the
corresponding column
Fix the names in the Data Editor

Go to the ‘Import names’ worksheet and locate the list of regions
that failed to import (in column B).



Find the corresponding name that failed to import in the first row of
the worksheet



This is a list of names in the source data file which was not recognized –
because it is not listed in the sheet ‘Import names’. (The names in the
sheet ‘Import names’ were taken from the shapefile map DBF file).
Please note that punctuation & non-standard characters will have been
removed during the import
For example, ‘TPHaNoi’ in the source data is ‘HaNoiCityHanoi’ in the
sheet ‘Import names’.
A useful approach to find the corresponding names is to press CTRL+F
to search for part of the name in the document. For example, to find
the name corresponding to “TPHaNoi”, search for ‘Hanoi’. To find the
name corresponding to “DakNong”, search for “Nong”.
Add TPHaNoi to the list of ‘recognized name variants’ of Hanoi city,
by copying the name TPHaNoi and inserting it in column GI, row 3.
Repeat this for the other unrecognized names.
14
Fix the names in the Data Editor


“HaTay” cannot be found anywhere, because it does not exist
in the map since it became part of “Hanoi”. You need to go to
the sheet “Import” to add it (if required). Scroll all the way to
the right, to column PI, and enter it in row 2.
Click on “Save data”.


In the same sheet (“Import”) remove the imported rows


This adds “HaTay” to the sheets “Import names” and “Data”, and it
will now be displayed as a “map point” in StatPlanet.
They need to be removed as data for some map regions was not
imported the first time you did it.
Save, close and re-open the StatPlanet Data Editor. Then
import the source file again

15
There should be no more ‘Import names’ errors. and this time data
for all map regions should be imported.
Importing Bulk Data





Return to the StatPlanet Data Editor (Import
spreadsheet)
Import “MDG_multiple_categories_example.xlsx” from the
Training_Materials\Example_Data folder
Note the large number of rows added by this import
Open the file MDG_multiple_categories_example.xlsx and
note that the sheet names correspond to the categories
imported in the StatPlanet Data Editor.
Click on the Save Data button & re-open StatPlanet.exe to
see new data for multiple categories.
16
Data Editor – Preventing errors

Sorting is strongly discouraged within the Data Editor – many
features will only continue to work if they are left in the right
order:



Where data fails to import properly, make necessary fixes and
re-import the data.



The relationships and column ordering between the map columns in
the worksheets Import, Import names and Data are crucial
Sorting on the Import worksheet could destroy the data structure
Importing data is preferable over inserting or re-arranging data
manually, as importing automatically sets the right data structure.
Please note that the ‘Clear data’ button clears everything and
it is not possible to undo this.
Making frequent backups is a good idea to be able to go back
to a previous working version in case things go wrong.
17
Successfully Preparing Data for Import Summary

Preparing the Source data file
There should be only one row (or column) of data for each
region in the source data, per Excel worksheet.
 Make sure that each row and column in your source data has
clear headers, such as the indicator name, or the map region
name. For example, indicator names in row 1, and region
names in column A.
 Clearly identify dates in the source data (if data is time-series)
 If you wish to import data as separate categories, organize it so
that there is one category in each Excel worksheet.


Correlate names in source data and in the Data Editor

18
Either fix the source data, or add new names (spelling variants)
into the StatPlanet Data Editor, sheet ‘Import names’