Metadata projects and tasks at
Statistics Finland
METIS 2010
Saija Ylönen
[email protected]
Organizational chart
Saija Ylönen
11/03/2010
Co-operating parties of the metadata tasks: organizational units
• IT Management
  • situated in the Secretariat of the Director General
  • co-ordinates the general information architecture, of which metadata tasks form one element
• Classification and Metadata Services
  • situated in the IT and Statistical Methods department
  • operational unit
  • active role in developing metadata
• Dissemination Services
  • situated in the IT and Statistical Methods department
  • develops the metadata connected with dissemination

Metadata Co-ordination Group
• Originally a co-operation group for persons working with metadata issues in the support function departments of Statistics Finland (SF)
• The present objective is to intensify co-operation between the statistics departments and the parties responsible for general metadata work
• Comprises members working on metadata and permanent members from all statistics departments
• The goal is to widen knowledge about metadata and metadata systems and to give the statistics departments an opportunity to discuss their metadata needs with metadata specialists

CoSSI Steering Group and CoSSI model
• Foundation for the metadata system
• Modular, XML-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.
• Expandable
• The CoSSI Steering Group is in charge of managing and developing the model according to user needs in a manner that does not put its main structure at risk

Definition of metadata
1) Statistical metadata
  • variable and data descriptions
  • classifications, concepts
2) Statistical data quality
  • quality reports
  • statistical method descriptions
3) Metadata of statistical documents or products
  • producers
  • publication information
  • field or subject area

Definition of metadata II

4) Process metadata
  a) technical metadata
    • technical metadata guide the workflow of data production, make it possible to follow data production, and document the working process
  b) conceptual process metadata
    • technical information on the data and variables used in producing data, e.g. minimum or maximum values, various calculation rules, or the use of certain classification values
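
To make the idea of conceptual process metadata concrete, the following is a minimal sketch of how minimum and maximum values or permitted classification values might be attached to a variable and used to check data. It is an illustration only: the class `VariableRule`, the function `check_value` and the example variables are hypothetical and are not taken from the CoSSI model or from any Statistics Finland system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    minimum: Optional[float] = None             # smallest accepted value, if any
    maximum: Optional[float] = None             # largest accepted value, if any
    allowed_codes: Optional[frozenset] = None   # permitted classification codes, if any


def check_value(rule: VariableRule, value) -> list:
    """Return a list of rule violations for one value (empty if none).

    If the rule lists classification codes, only membership is checked;
    otherwise the value is compared against the minimum and maximum.
    """
    problems = []
    if rule.allowed_codes is not None:
        if value not in rule.allowed_codes:
            problems.append(f"{rule.name}: code {value!r} is not in the classification")
        return problems
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{rule.name}: {value} is below minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{rule.name}: {value} is above maximum {rule.maximum}")
    return problems


# Invented example rules: a numeric range and a small set of codes.
age = VariableRule(name="age", minimum=0, maximum=120)
region = VariableRule(name="region", allowed_codes=frozenset({"01", "02", "04"}))
```

Calling `check_value(age, 130)` would report one violation, while `check_value(age, 35)` would report none. Keeping such rules as data rather than as code inside each production program is what lets the same metadata serve several processes.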
Metadata systems at Statistics Finland
Metadata systems: present situation
• We are in a transitional phase from relational databases to an XML-based environment
• Relational databases: classifications, concepts and definitions, archiving database
• XML database eXist: publications, classifications, concepts, data descriptions

Relational databases
• Built in the 1990s
• Used in statistics production, but not in all statistical processes or for all statistics
• Classifications in the relational databases are used in SAS and Superstar
• The archiving database is used in the archiving process
• Classifications and concepts are generated for the web pages from the relational databases

XML database
• At the moment, the XML database is used mostly for creating publications with the Arbortext word processor
• Classifications and concepts are copied to the XML database from the relational databases and are ready for use
• Tools for utilising metadata objects from the XML database are being constructed
• The first metadata tool linked to the XML database is the variable editor
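
As an illustration of copying a classification from a relational database into an XML document, the sketch below reads rows with SQL and builds an XML tree. Everything in it is an assumption made for the example: an in-memory SQLite database stands in for the relational database, and the table name `classification_item`, the element names and the sample rows are invented rather than taken from the CoSSI model. Loading the result into eXist is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in relational database with one hypothetical classification table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classification_item (classification_id TEXT, code TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO classification_item VALUES (?, ?, ?)",
    [("gender", "1", "Male"), ("gender", "2", "Female")],
)


def classification_to_xml(connection, classification_id):
    """Build an XML element holding every item of one classification."""
    root = ET.Element("classification", id=classification_id)
    rows = connection.execute(
        "SELECT code, label FROM classification_item "
        "WHERE classification_id = ? ORDER BY code",
        (classification_id,),
    )
    for code, label in rows:
        item = ET.SubElement(root, "item", code=code)
        item.text = label
    return root


gender_xml = classification_to_xml(conn, "gender")
print(ET.tostring(gender_xml, encoding="unicode"))
```

The query uses a bound parameter rather than string formatting, so the classification identifier cannot alter the SQL statement.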

Variable editor
• For creating and maintaining the descriptions of statistical data and variables
• Currently in the testing phase
• Implementation begins in 2010
• Descriptions are saved in the eXist XML database as XML documents conforming to the CoSSI model

Content and functions of the variable editor
• A data description comprises a general description of the data, a list of variables and information about each individual variable
• The general data description includes descriptive information on the entire data document
• The variable list interleaf allows management of the list of variables in the data description and selection of the variable whose description needs editing

Variable list interleaf
Variable metadata
Field name               Description
short name               Short identifying name of variable
long name                Name of variable in natural language
concept definition       Basic conceptual description of variable
operational definition   Verbal description of the formation of the variable
deduction rule           E.g. programming instructions, mathematical formula, etc.
classification ID        Identifier of classification. Refers to a classification in the classification database.
unit of measure          Measurement unit of variable
variable modified        Date of creation or modification of variable (yyyy-mm-dd)
start of validity        Start date of validity of variable (yyyy-mm-dd)
end of validity          End date of validity of variable (yyyy-mm-dd)
status                   Stage of editing of variable: draft, ready, validated
variable group           Name of group to which variable belongs. Makes working with long variable lists easier.
work comment             Free text field. Contains information only for the use of the maintainer of a description.
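
The fields listed above can be pictured as a single record per variable. The sketch below models such a record and writes it out as XML. It is an assumption-laden illustration: the class `VariableDescription`, its validation rules and the XML element names are hypothetical and do not reproduce the actual CoSSI schema or the variable editor's implementation. Only the field names, the three status values and the yyyy-mm-dd date form come from the table.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional

STATUSES = ("draft", "ready", "validated")  # editing stages named in the table


@dataclass
class VariableDescription:
    short_name: str
    long_name: str
    concept_definition: str = ""
    operational_definition: str = ""
    deduction_rule: str = ""
    classification_id: Optional[str] = None
    unit_of_measure: Optional[str] = None
    variable_modified: Optional[date] = None
    start_of_validity: Optional[date] = None
    end_of_validity: Optional[date] = None
    status: str = "draft"
    variable_group: Optional[str] = None
    work_comment: str = ""

    def __post_init__(self):
        # Assumed consistency checks; the source does not describe validation.
        if self.status not in STATUSES:
            raise ValueError(f"status must be one of {STATUSES}")
        if (self.start_of_validity and self.end_of_validity
                and self.end_of_validity < self.start_of_validity):
            raise ValueError("end of validity precedes start of validity")

    def to_xml(self) -> ET.Element:
        """Serialise non-empty fields as child elements (hypothetical tag names)."""
        root = ET.Element("variable")
        for f in fields(self):
            value = getattr(self, f.name)
            if value in (None, ""):
                continue
            child = ET.SubElement(root, f.name)
            # date.isoformat() gives the yyyy-mm-dd form used in the table
            child.text = value.isoformat() if isinstance(value, date) else str(value)
        return root


# Invented example variable.
income = VariableDescription(
    short_name="inc",
    long_name="Disposable income",
    unit_of_measure="EUR",
    start_of_validity=date(2010, 1, 1),
    status="ready",
)
print(ET.tostring(income.to_xml(), encoding="unicode"))
```

Empty fields are simply omitted from the output here; a real schema might instead require some of them or represent them as empty elements.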
Results from the variable editor project
In addition to the actual variable editor application, the project also created preconditions for:
• the development of a consistent information architecture
• the construction of production applications in which metadata need not be separately produced or manually added to data when publishing or archiving statistics
• an information service where excessive time need not be spent on searching for metadata or on reproducing metadata for special compilation assignments
• a system from which tabulation applications can retrieve table column and row headings in multiple languages, using the same methods for all statistics
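
The last point, retrieving table headings in several languages through one shared method, might look roughly like the sketch below. The data and names are invented for illustration: the `LABELS` dictionary stands in for a metadata store, and the codes, labels and fallback behaviour are assumptions rather than a description of the actual system.

```python
from typing import Dict, List

# Hypothetical store: classification code -> {language code -> label}.
LABELS: Dict[str, Dict[str, str]] = {
    "M": {"fi": "Miehet", "sv": "Män", "en": "Males"},
    "F": {"fi": "Naiset", "sv": "Kvinnor", "en": "Females"},
}


def heading(code: str, lang: str, fallback: str = "en") -> str:
    """Return the label for a code in the requested language.

    Falls back to the `fallback` language, and finally to the bare code,
    when no label is available.
    """
    names = LABELS.get(code, {})
    return names.get(lang) or names.get(fallback) or code


def column_headings(codes: List[str], lang: str) -> List[str]:
    """Resolve a list of codes to headings in one language."""
    return [heading(code, lang) for code in codes]


print(column_headings(["M", "F"], "sv"))
```

Because every tabulation application would call the same lookup, a corrected or newly translated label would appear in all statistics without per-table edits.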
Experiences gained during the variable editor project
• Various questions concerning standardisation had to be addressed in the project although they were not originally within its scope; they had to be resolved, and they took a lot of time
• Because the variable editor project was the first step in the revision of the metadata system, it was subject to a wide range of expectations
• The project was a good test run for the CoSSI model: the data content of the model proved to be exhaustive

The planning and building of a classification editor
• Reasons for renewing the classification system:
  • the statistics departments regard the present way of maintaining classifications as inflexible
  • the Sybase relational databases are being phased out
  • ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model
• Classification editor project 2010
  1) definition stage
  2) construction stage

Goals of the classification editor project
• Analyse the services required from a centralised classification system
• Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store, so that the basic classification maintenance needs of individual statistics are met in a user-oriented manner that also allows further development of the classification system
• Produce solutions that ensure the interoperability of the Sybase classification database and the eXist metadatabase
• Compile user instructions for the editor
• Pilot test the editor

Benefits of the new classification system
• A classification system that serves its users well will encourage centralised and structured maintenance of classifications
• The documentation of classifications will improve, making them easy to find both for in-house use and for the provision of information services
• The new classification system will support smooth movement between data descriptions, variable descriptions and the maintenance of classifications, and thus improve the efficiency of maintaining and using classifications in statistics

General benefits of the common classification system
• A centralised classification system reduces the workload of maintaining classifications because they are maintained in only one place
• Reduces the possibility of errors because classifications are documented consistently in the system, so that they are accessible to everybody and easy to find
• Saves working time because hours need not be spent on looking for classifications and their background information
• Makes the classifications used in different statistics visible to everybody and thus creates possibilities for their harmonisation

In conclusion: Why do some statistics departments still have their own metadata systems instead of using the centralised system?
• Centralised metadata work progresses too slowly from the perspective of individual statistics. We should rethink our construction and implementation strategy.
• A common attitude still regards the process of an individual set of statistics as unique, and therefore incapable of exploiting systems that are meant for all statistics. We have to get quick results to prove the benefits of the system.
• Commitment from management and their support for the work are crucial. We have to convince them.

THANK YOU FOR YOUR
ATTENTION!