How to prepare data for data integration in SeaDataNet V1?

SeaDataNet annual meeting, Madrid, 25-27 March 2009
OBSERVATIONS
& PRÉVISIONS CÔTIÈRES
How to prepare data for
integration in SeaDataNet V1?
M. Fichaut, R. Lowry, R. Schlitzer
www.seadatanet.org

Overview
• 2 parts
  • The first part gives an overview of the SeaDataNet system and of the tools available to SeaDataNet partners, and details some practical use cases
  • The second part presents ODV version 4

SDN V1 – Data centres
• SeaDataNet version 1 has 2 types of data centres
• Pilot data centres (11 TTT members + volunteers)
  • Data are downloaded automatically from their system to the SeaDataNet portal
  • Requires a minimal technical infrastructure (an application server such as Tomcat or IIS) and a software installation including the Download Manager and the coupling table
• Other data centres
  • Data are prepared manually for download through the SeaDataNet Web portal

[Diagram: partner system, pilot data centre. Elements shown: SeaDataNet vocabulary; metadata input (metadata in database, metadata in Excel files); MIKADO; XML Validator; XML metadata files (CSR, EDMED, EDMERP, CDI); data input (data in database with coupling table; collection of ASCII files in format X, converted by NEMO or Med2MedSDN to a collection of ASCII files in SDN format, ODV); local copy of data to download; data request and data download exchanged with the SeaDataNet European portal.]

[Diagram: partner system, other data centre. Elements shown: SeaDataNet vocabulary; metadata input (metadata in database, metadata in Excel files); MIKADO; XML Validator; XML metadata files (CSR, EDMED, EDMERP, CDI); data input (data in database; collection of ASCII files in format X, converted by NEMO or Med2MedSDN to a collection of ASCII files in SDN format, ODV); data request by email; manual preparation of data; local copy of data to download; data download through the SeaDataNet European portal.]

Summary
• SeaDataNet Vocabulary
• SeaDataNet formats
• SeaDataNet reformatting tools : NEMO and Med2MedSDN
• MIKADO tool and XML validator
• Interaction of these tools with the download manager
• Some use cases
• ODV version 4

SeaDataNet vocabulary
• SeaDataNet vocabularies populate many metadata fields and the
parameter descriptions in data
• They are delivered through a Vocabulary Server
• May be viewed through a client on the SeaDataNet web site
(http://seadatanet.maris2.nl/v_bodc_vocab/welcome.aspx)
• May be accessed programmatically as described in Athens
(IMDIS 2008 conference)
• Master copy of vocabularies always accessible from a well-known location (BODC)
• Vocabularies developed through the group governance of the
SeaDataNet TTT or wider international bodies (SeaVoX, ICES
platforms)

SeaDataNet vocabulary
• Vocabularies in metadata
• Most partners will encounter vocabularies in metadata
through Mikado and NEMO tools
• Most common problem will be that an entry required in a
vocabulary isn’t there
• For example a ship required for a CSR record isn’t present
in the C174 list.
• If this happens, contact the SeaDataNet help desk
• They will advise what you should do and contact Roy if
necessary
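
A missing vocabulary entry is easier to handle when it is detected before metadata entry starts. The following is a minimal sketch of such a pre-check, assuming you keep a local copy of the relevant list as plain strings; the function name and the ship names are hypothetical and do not come from any SeaDataNet tool.

```python
def missing_entries(required, vocabulary_entries):
    """Return the required names that are absent from the vocabulary list.

    Comparison ignores case and surrounding spaces; input order is kept.
    """
    known = {entry.strip().casefold() for entry in vocabulary_entries}
    return [name for name in required if name.strip().casefold() not in known]


# Hypothetical data: a local copy of a ship list and the ships needed for CSR records.
local_ship_list = ["Ship Alpha", "Ship Beta"]
ships_needed = ["ship alpha", "Ship Gamma"]

to_request = missing_entries(ships_needed, local_ship_list)
```

Anything left in `to_request` is a candidate for a request to the SeaDataNet help desk.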

SeaDataNet vocabulary
• Vocabularies in metadata
• Adding new entries involves:
  • Discussion of the proposed change on the appropriate e-mail list
  • Editing the master vocabulary database
  • Publication of the changes
• This takes time, so please send requests as soon as possible and be patient.

SeaDataNet vocabulary
• Vocabularies in metadata
• Ship codes
• If the ship isn’t present in the full ICES list as published
on the ICES web site (the SeaDataNet Ship and
Platform Codes at
http://www.ices.dk/datacentre/reco/reco.asp) a new
code must be obtained from ICES
• This has caused delays
• New on-line application system now available that will
streamline the process (next April)

SeaDataNet vocabulary
• Vocabularies in data
• Parameters are labelled using terms from the P011 vocabulary
• This is comprehensive, but very large (21,000 terms)
• Thesaurus navigation tool on the SeaDataNet web site
(http://seadatanet.maris2.nl/v_bodc_vocab/vocabrelations.aspx?list=P081) helps a lot
• Mapping for MEDATLAS parameter codes under construction
and accessible through NEMO and Med2MedSDN tools
• Report other mapping problems to the SeaDataNet help desk
• Roy will provide whatever help he can

SeaDataNet formats
• ASCII formats
• defined for vertical profiles, time-series and trajectories
• ODV mandatory
• MEDATLAS optional
• NetCDF format
• CF (Climate and Forecast) compatible
• For gridded data (model output, satellite data and data syntheses)
• Also for other types of data difficult to handle in ASCII formats,
due to their large volume or structural complexity
• Still being defined
http://www.seadatanet.org/standards_software/data_transport_formats

SeaDataNet extensions to ODV and MEDATLAS (1)
• SeaDataNet format extensions fulfil two functions
• Provide a linkage between data and metadata
• ODV : 2 additional columns :
LOCAL_CDI_ID and EDMO_CODE of the data centre providing the CDI
• MEDATLAS : 2 additional comment lines with key-words :
* LOCAL_CDI_ID =
* EDMO_CODE =
• Provide a linkage to standardised SeaDataNet semantic
information such as detailed parameter descriptions
• ODV and MEDATLAS : additional comment lines
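
The two MEDATLAS key-word lines can be read back with a few lines of code. This is a minimal sketch, assuming the key-words appear on comment lines starting with `*` and are followed by `=` and a value without spaces; the exact spacing rules, the function name and the sample values are assumptions, not taken from the format specification.

```python
import re

# Matches "* LOCAL_CDI_ID = value" or "* EDMO_CODE = value" (spacing is tolerated).
LINK_RE = re.compile(r"^\*\s*(LOCAL_CDI_ID|EDMO_CODE)\s*=\s*(\S+)\s*$")


def read_linkage(comment_lines):
    """Collect LOCAL_CDI_ID and EDMO_CODE from MEDATLAS comment lines."""
    found = {}
    for line in comment_lines:
        match = LINK_RE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    missing = {"LOCAL_CDI_ID", "EDMO_CODE"} - set(found)
    if missing:
        raise ValueError(f"missing linkage keys: {sorted(missing)}")
    return found


# Hypothetical values; only the two key-words come from the slide.
sample_comments = [
    "* LOCAL_CDI_ID = CTD_0001",
    "* EDMO_CODE = 1234",
    "* some other comment",
]
linkage = read_linkage(sample_comments)
```

Raising an error when either key is absent reflects the purpose of the extension: without both values the data cannot be linked to its CDI record.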

SeaDataNet extensions to ODV and MEDATLAS (2)
• Additional Comment lines for parameter mapping
• ODV
//SDN_parameter_mapping
//<subject>SDN:LOCAL:DEPH</subject><object>SDN:P011::ADEPZZ01</object><unit>SDN:P061::ULAA</unit>
//<subject>SDN:LOCAL:TEMP</subject><object>SDN:P011::TEMPPR01</object><unit>SDN:P061::UPAA</unit>
• MEDATLAS
*SDN_parameter_mapping
*<subject>SDN:LOCAL:PRES</subject><object>SDN:P011::PRESPR01</object><unit>SDN:P061::UPDB</unit>
*<subject>SDN:LOCAL:TEMP</subject><object>SDN:P011::TEMPPR01</object><unit>SDN:P061::UPAA</unit>
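
The mapping lines above have a regular structure, so they can be read mechanically. The sketch below parses them into a dictionary from local parameter name to its P011 and P061 codes. It assumes one mapping per line, written exactly as in the examples on this slide; the function and variable names are illustrative only.

```python
import re

# One SDN_parameter_mapping comment line, as shown on the slide.
# The leading comment marker is "//" in ODV files and "*" in MEDATLAS files.
MAPPING_RE = re.compile(
    r"^(?://|\*)\s*<subject>(?P<subject>[^<]+)</subject>"
    r"<object>(?P<object>[^<]+)</object>"
    r"<unit>(?P<unit>[^<]+)</unit>\s*$"
)


def parse_mapping_lines(lines):
    """Return {local name: (P011 code, P061 code)} from mapping comment lines."""
    mapping = {}
    for line in lines:
        match = MAPPING_RE.match(line.strip())
        if match is None:
            continue  # title line or unrelated comment
        # Keep only the last colon-separated part, e.g. "TEMPPR01".
        local_name = match.group("subject").split(":")[-1]
        p011_code = match.group("object").split(":")[-1]
        p061_code = match.group("unit").split(":")[-1]
        mapping[local_name] = (p011_code, p061_code)
    return mapping


odv_header = [
    "//SDN_parameter_mapping",
    "//<subject>SDN:LOCAL:DEPH</subject><object>SDN:P011::ADEPZZ01</object><unit>SDN:P061::ULAA</unit>",
    "//<subject>SDN:LOCAL:TEMP</subject><object>SDN:P011::TEMPPR01</object><unit>SDN:P061::UPAA</unit>",
]
parameter_mapping = parse_mapping_lines(odv_header)
```

The same function accepts the MEDATLAS form, since only the comment marker differs.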

Tools to generate SeaDataNet ASCII formats
NEMO
• JAVA tool to reformat ASCII files to SeaDataNet ODV and MEDATLAS
formats - available under Windows
• Version 1.2.0 and user manual available at :
• http://www.seadatanet.org/standards_software/software/nemo
Med2MedSDN
• Java tool to translate MEDATLAS files to SeaDataNet MEDATLAS files
- available under Windows
• Version 1.0 and user manual available at :
• http://www.seadatanet.org/standards_software/software/Med2MedSDN

NEMO main features
• Reformat any ASCII file of vertical profiles, time-series or
trajectories to a SeaDataNet ASCII format (ODV, MEDATLAS).
• The input ASCII files can be :
• one file per station for vertical profiles or time series
• one file for one cruise for vertical profiles, time series or
trajectories
• Related to cruises or not
• If not related to cruise, only ODV re-formatting is
available

NEMO main principles
• Users of NEMO describe the format of the input files so that NEMO can find the information needed for the SeaDataNet formats.
• One prerequisite is that all input files processed at the same time by NEMO must be in the same format. The information about the stations must:
  • be located at the same position: same line in the file, same position on the line, or same column if CSV format
  • be in the same format
  For example, for all the stations the latitude is:
  • on line 3 of the station header,
  • from character 21 to character 27, or in the 3rd column in CSV
  • in the format +DD.ddd
• The other prerequisite is that data must be provided in columns in the data files.
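
The latitude example can be made concrete with a short sketch of position-based extraction. It illustrates the principle only and is not NEMO code: the helper names and the sample header are hypothetical, and only the position (line 3, characters 21 to 27) and the format +DD.ddd come from the slide.

```python
def read_fixed_field(header_lines, line_number, first_char, last_char):
    """Extract a field given 1-based line and character positions (inclusive)."""
    line = header_lines[line_number - 1]
    return line[first_char - 1:last_char]


def parse_latitude(text):
    """Parse a latitude written as +DD.ddd (signed decimal degrees)."""
    value = float(text)
    if not -90.0 <= value <= 90.0:
        raise ValueError(f"latitude out of range: {text!r}")
    return value


# Hypothetical station header; the latitude occupies characters 21 to 27 of line 3.
station_header = [
    "STATION 0001",
    "DATE 2009-03-25",
    "LATITUDE".ljust(20) + "+43.500",
]
latitude_text = read_fixed_field(station_header, 3, 21, 27)
latitude = parse_latitude(latitude_text)
```

Because the position is described once and applied to every file, a single file with a shifted header would yield a wrong or unparsable value, which is why all files processed together must share the same layout.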

NEMO – 5 steps for file conversion
• Description of the file
• Description of the cruise: input manually or import of CSR XML V1
• Description of the station header
• Description of the measured parameters
• File conversion
• Model can be saved and reused

NEMO – Description of the input files (1)

NEMO – Description of the input files (2)

NEMO new functionalities
• Trajectories taken into account
• SeaDataNet extensions to ODV and MEDATLAS formats
• Possibility to keep quality flags, if they exist in the input files, and to map them to the SeaDataNet QC flag scale
• Generation of a CDI summary file directly usable by MIKADO to generate XML CDI exports
• Generation of the coupling file, which maps each LOCAL_CDI_ID (one profile, one time-series or one trajectory) to the name of the file containing this LOCAL_CDI_ID. This coupling file is used by the download manager.
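
The coupling idea (one LOCAL_CDI_ID, one file name) can be sketched as follows. The semicolon-separated layout, the function names and the sample identifiers are assumptions made for illustration; the actual layout of the coupling file is defined by the download manager documentation, not by this example.

```python
import csv
import io


def build_coupling_rows(entries):
    """Return sorted (LOCAL_CDI_ID, file name) pairs, rejecting duplicate IDs."""
    seen = {}
    for local_cdi_id, file_name in entries:
        if local_cdi_id in seen:
            raise ValueError(f"duplicate LOCAL_CDI_ID: {local_cdi_id}")
        seen[local_cdi_id] = file_name
    return sorted(seen.items())


def write_coupling(rows):
    """Serialise the pairs as text, one pair per line (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical identifiers and file names.
entries = [
    ("CTD_0002", "cruise01_sta0002.txt"),
    ("CTD_0001", "cruise01_sta0001.txt"),
]
coupling_text = write_coupling(build_coupling_rows(entries))
```

Rejecting duplicate identifiers matters because each LOCAL_CDI_ID must resolve to exactly one profile, time-series or trajectory.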

NEMO next version will
• Correct the known bugs, and any new ones detected by users
• Take into account the latest release of the ODV format, with ISO 8601 dates and data type ‘*’
• Improve response time for the conversion of large files and for vocabulary updates
• Accept ODV multi-station files as input
• Be tested under Unix and Linux
• Be released in June 2009
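
For readers unfamiliar with ISO 8601, the sketch below converts a date and time written in a local convention to that notation using the Python standard library. The input layout (day/month/year) and the function name are assumptions chosen for the example; they do not describe what NEMO accepts.

```python
from datetime import datetime


def to_iso8601(date_text, time_text,
               date_format="%d/%m/%Y", time_format="%H:%M:%S"):
    """Combine a date and a time string and return them in ISO 8601 form."""
    parsed = datetime.strptime(
        f"{date_text} {time_text}", f"{date_format} {time_format}"
    )
    return parsed.isoformat()


iso_stamp = to_iso8601("25/03/2009", "14:30:00")
```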

Med2MedSDN main features
• Reformats MEDATLAS files to MEDATLAS SeaDataNet
format
• Java tool, bilingual (French, English)
• Adds the additional SeaDataNet information : mapping for
parameters and LOCAL_CDI_ID and EDMO_CODE
• Able to reformat one file or a large number of files (in one
directory)
• Linked to SeaDataNet vocabularies through Web services
for parameters mapping and for list of EDMO codes
• An internet connection is needed while updating the lists

Med2MedSDN main screen

Med2MedSDN log files
• Errors are recorded in a log file, which can be opened from the Med2MedSDN main screen by clicking on “see log” in the error window
• Each line in the log file is composed as follows:
  • Date, Name of the Software, Error severity level, Error message
• Severity levels:
  • INFO: informative messages for the start of a conversion or a successful conversion
  • ERROR: conversion errors; conversion is cancelled for the current file but continues with the other files
  • FATAL: conversion errors that stop the processing of the files
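
A log with this four-field structure is easy to summarise after a batch conversion. The sketch below assumes the fields are separated by commas, which the slide suggests but does not state; the sample lines, the class and the function names are hypothetical.

```python
from dataclasses import dataclass

SEVERITIES = ("INFO", "ERROR", "FATAL")


@dataclass
class LogEntry:
    date: str
    software: str
    severity: str
    message: str


def parse_log_line(line, separator=","):
    """Split one log line into its four fields (the separator is an assumption)."""
    # maxsplit=3 keeps any separator inside the message itself.
    parts = [part.strip() for part in line.split(separator, 3)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    date, software, severity, message = parts
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return LogEntry(date, software, severity, message)


def summarise(entries):
    """Count ERROR entries and report whether a FATAL entry stopped processing."""
    failed = sum(1 for entry in entries if entry.severity == "ERROR")
    stopped = any(entry.severity == "FATAL" for entry in entries)
    return failed, stopped


# Hypothetical log content.
sample_log = [
    "2009-03-25 10:00:00, Med2MedSDN, INFO, conversion started",
    "2009-03-25 10:00:02, Med2MedSDN, ERROR, file_a.med: conversion cancelled",
    "2009-03-25 10:00:05, Med2MedSDN, INFO, file_b.med: converted, no warnings",
]
log_entries = [parse_log_line(line) for line in sample_log]
failed_files, stopped = summarise(log_entries)
```

Here one ERROR entry is counted as a failed conversion and the run is reported as not stopped, matching the distinction between ERROR and FATAL described above.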

Med2MedSDN log file

Med2MedSDN next version will
• Take into account the creation of the coupling file for
SeaDataNet download manager
• Be released in June 2009

Tool to generate XML meta-data files
MIKADO
• JAVA tool to generate XML descriptions of SeaDataNet
catalogues
• EDMED : catalogue of Marine Environmental Datasets
• EDMERP : Marine Environmental Research Projects
• CSR : Cruise Summary Reports
• CDI : Common Data Index
• Version 1.5 and user manual available at :
http://www.seadatanet.org/standards_software/software/mikado

MIKADO version 1.5
• New functionalities
  • Download of EDMED files directly from the central BODC catalogue through Web services: an interim solution, awaiting the new EDMED V1 user interface developments
  • World map to manage Marsden squares for CSR
  • Data centre type options for CDI (SeaDataNet, ECOOP): to allow data web sites other than SeaDataNet
  • Mapping download from BODC: to get existing mappings from the BODC web site
  • Sybase driver for JDBC
  • Vocabulary update without restarting MIKADO

Next versions of MIKADO
• Version 1.6
• Being tested now
• Available next May
• Able to generate the coupling.txt file used by the download manager, for data stored in ASCII files or in a relational database
• Version 1.7
• EDIOS
• Released by the end of 2009

NEMO to MIKADO to SeaDataNet CDI
[Diagram: collection of ASCII files; ASCII SDN files; CDI summary CSV file; MIKADO with Summary_CDI_NEMO.xml; XML CDI files; SeaDataNet CDI.]
Explanation in NEMO user manual

Links with SDN Download manager
[Diagram: Coupling.txt files produced by the tools (tool labels shown: Med2MedSDN, MIKADO; file labels shown: "Modus 1,3", "Modus 1,3" and "Modus 1,2,3") feed the coupling table used by the download manager, which connects to the SeaDataNet portal.]
• Modus 1 : data in mono-station file
• Modus 2 : data in database
• Modus 3 : data in multi-station file

[Diagram: partner system, pilot data centre (repeat of the earlier overview diagram).]

Use cases
• Prerequisites for all use cases:
  • Prepare the mapping between your metadata and:
    • SeaDataNet vocabularies: sea areas, BODC parameters (PDV), platform classes, SDN device categories, ...
      • some mappings are already available on the BODC Web site: MEDATLAS to PDV, MEDATLAS units to BODC storage units
    • EDMO: marine organisations
    • EDMERP: marine environmental projects
    (incremental mapping managed by MIKADO)
  • Quality-check the data using ODV or other software before sending metadata to the CDI

Use case 1 – collection of XBTs or CTDs or Time-series files – no relational database
1. Verify that all files of the collection have a homogeneous format
2. Run NEMO
   • to convert the files to SeaDataNet ODV
   • to generate a CDI summary file
   • [to generate the coupling.txt file for these data]
3. Run MIKADO to generate the XML CDI files, using the configuration file delivered with NEMO for the CDI summary file
4. Use the XML validator to validate your XML files
5. [Implement the coupling file]
6. Send the XML CDI files to the central catalogue
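
Step 1 can be partly automated before NEMO is run. The following is a rough sketch that only compares the number of whitespace-separated columns per line across files; it does not check header positions or field formats, which NEMO also requires to be identical. File names, contents and function names are hypothetical.

```python
def column_counts(text, separator=None):
    """Return the set of column counts found on the non-empty lines of one file."""
    return {
        len(line.split(separator))
        for line in text.splitlines()
        if line.strip()
    }


def find_inhomogeneous(files):
    """Return names of files whose column layout differs from the first file."""
    names = sorted(files)
    if not names:
        return []
    reference = column_counts(files[names[0]])
    return [name for name in names[1:] if column_counts(files[name]) != reference]


# Hypothetical collection held in memory: file name -> file content.
collection = {
    "sta001.txt": "0 12.1 35.0\n10 12.0 35.1\n",
    "sta002.txt": "0 13.4 35.2\n10 13.1 35.2\n",
    "sta003.txt": "0 13.4\n10 13.1\n",
}
suspect_files = find_inhomogeneous(collection)
```

Files reported here would need a separate NEMO model, or correction, before batch conversion.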

Use case 2 – collection of MEDATLAS files and metadata in relational database
1. Run Med2MedSDN
   • to convert MEDATLAS files to MEDATLAS SDN files
   • [to generate the coupling.txt file for these MEDATLAS SDN files]
2. Run MIKADO on the metadata database
   • to generate the XML CDI descriptions of the stations of these MEDATLAS files
   • [to generate the coupling.txt file for these MEDATLAS SDN files]
3. Use the XML validator to validate your XML files
4. [Implement the coupling file]
5. Send the XML CDI files to the central catalogue
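
Before running the SeaDataNet XML validator, a quick local check that each export is at least well-formed XML can catch truncated files early. The sketch below uses the Python standard library and does not replace the validator: it checks syntax only, not conformance to the CDI schema. The element names in the sample are hypothetical and are not the real CDI structure.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML (syntax only, no schema check)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical documents; element names are placeholders, not the CDI schema.
good_export = "<record><local_cdi_id>CTD_0001</local_cdi_id></record>"
broken_export = "<record><local_cdi_id>CTD_0001</record>"

good_ok = is_well_formed(good_export)
broken_ok = is_well_formed(broken_export)
```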

Use case 3 – collection of ASCII files and metadata in relational database
1. Run NEMO
   • to convert ASCII files to ODV [and MEDATLAS] SDN files
   • [to generate the coupling.txt file for these SDN files]
2. Run MIKADO on the metadata database
   • to generate the XML CDI descriptions of the stations of these files
   • [to generate the coupling.txt file for these SDN files]
3. Use the XML validator to validate your XML files
4. [Implement the coupling file]
5. Send the XML CDI files to the central catalogue

Use case 4 – XBTs, CTDs, Time-series measurements – data and metadata in a relational database
1. Run MIKADO
   • to create the configuration to retrieve metadata on these data in the database
   • to export the corresponding XML CDI files
2. Run MIKADO to create the coupling table with the appropriate select statement to retrieve these measurements in the database
3. Use the XML validator to validate your XML files
4. Implement the coupling file
5. Send the XML CDI files to the central catalogue
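
The "appropriate select statement" in step 2 is a query that returns the measurements belonging to one LOCAL_CDI_ID. The sketch below shows the idea with an in-memory SQLite database; the table, the column names and the sample values are hypothetical, and the way such a statement is declared in the coupling table is defined by the MIKADO and download manager documentation, not by this example.

```python
import sqlite3

# Hypothetical schema: one row per measured level, keyed by LOCAL_CDI_ID.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE measurement (
        local_cdi_id TEXT NOT NULL,
        depth REAL NOT NULL,
        temperature REAL
    );
    INSERT INTO measurement VALUES ('CTD_0001', 10.0, 12.0);
    INSERT INTO measurement VALUES ('CTD_0001', 0.0, 12.1);
    INSERT INTO measurement VALUES ('CTD_0002', 0.0, 13.4);
    """
)

# Parameterised query: the identifier is bound at run time.
SELECT_PROFILE = (
    "SELECT depth, temperature FROM measurement "
    "WHERE local_cdi_id = ? ORDER BY depth"
)


def fetch_profile(db, local_cdi_id):
    """Return the (depth, temperature) rows of one profile, ordered by depth."""
    return db.execute(SELECT_PROFILE, (local_cdi_id,)).fetchall()


profile = fetch_profile(connection, "CTD_0001")
```

Each LOCAL_CDI_ID listed in the CDI export should return a non-empty result from the corresponding query; an empty result indicates a mismatch between metadata and data.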

Demonstrations of NEMO and Med2MedSDN are available on request; just ask!
Questions or problems concerning MIKADO are welcome too.

And now: all about ODV version 4, by Reiner.