Metadata projects and tasks at
Statistics Finland
METIS 2010
Saija Ylönen
[email protected]
Organizational chart
Saija Ylönen
11/03/2010
Co-operating parties of the metadata tasks:
organizational units
IT Management
situated in the Secretariat of the Director General
co-ordinates the general information architecture, of which
metadata tasks form one element
Classification and Metadata Services
situated in the IT and Statistical Methods department
operational unit
active role in developing metadata
Dissemination Services
situated in the IT and Statistical Methods department
develops the metadata connected with dissemination
Metadata Co-ordination Group
Originally a co-operation group for persons working with
metadata issues in the support function departments of
Statistics Finland
The present objective is to intensify co-operation
between the statistics departments and the parties
responsible for general metadata work
Composed of members working on metadata and
permanent members from every statistics department
The goal is to broaden knowledge of metadata and
metadata systems and to give the statistics departments
an opportunity to discuss their metadata needs
with metadata specialists
CoSSI Steering Group and CoSSI model
Foundation for the metadata system
Modular, XML-based model for describing statistical tables,
classifications, concepts, variables, general information on
statistical documents, quality, etc.
Expandable
The CoSSI Steering Group is in charge of managing and
developing the model according to user needs without
putting its main structure at risk
Definition of metadata
1) Statistical metadata
variable and data descriptions
classifications, concepts
2) Statistical data quality
quality reports
statistical method descriptions
3) Metadata of statistical documents or products
producers
publication information
field or subject area
Definition of metadata II
4) Process metadata
a) technical metadata
technical metadata guide the workflow of data
production, make it possible to follow data
production and document the working process
b) conceptual process metadata
technical information on the data and variables
used in producing data, e.g. minimum or maximum
values, various calculation rules or the use of certain
classification values
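Conceptual process metadata of this kind can drive automatic checks during data production. The Python sketch below is illustrative only and is not taken from any Statistics Finland system: the ProcessRule structure and the check_value function are hypothetical names, and the sketch covers only minimum/maximum bounds and permitted classification values, leaving calculation rules out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessRule:
    """Conceptual process metadata for one variable (hypothetical structure)."""
    variable: str
    minimum: Optional[float] = None             # smallest permitted value
    maximum: Optional[float] = None             # largest permitted value
    allowed_codes: Optional[frozenset] = None   # permitted classification values


def check_value(rule: ProcessRule, value) -> list:
    """Return the rule violations for one value; an empty list means it passes."""
    problems = []
    if rule.allowed_codes is not None:
        # Coded variable: the value must be one of the classification values.
        if str(value) not in rule.allowed_codes:
            problems.append(f"{rule.variable}: {value!r} is not a permitted code")
        return problems
    # Numeric variable: apply whichever bounds are defined.
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{rule.variable}: {value} is below the minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{rule.variable}: {value} is above the maximum {rule.maximum}")
    return problems


if __name__ == "__main__":
    age = ProcessRule("age", minimum=0, maximum=120)
    sex = ProcessRule("sex", allowed_codes=frozenset({"1", "2"}))
    print(check_value(age, 130))
    print(check_value(sex, "3"))
```

Keeping the rules as data rather than hard-coding them in each production program is what allows the same metadata to be reused across statistics.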
Metadata systems at Statistics Finland
Metadata systems: present situation
We are in a transitional phase from relational databases to
an XML-based environment
Relational databases: classifications, concepts and
definitions, archiving database
XML database eXist: publications, classifications, concepts,
data descriptions
Relational databases
Built in the 1990s
Used in statistics production but not in all statistical
processes or all statistics
Classifications in the relational databases are used in SAS
and Superstar
The archiving database is used in the archiving process
Classifications and concepts are published on the web pages
from the relational databases
XML database
At present, the XML database is used mostly for
creating publications with the Arbortext editor
Classifications and concepts have been copied to the XML
database from the relational databases and are ready for
use
Tools for utilising metadata objects from the XML database
are being built
The first metadata tool linked to the XML database is the
variable editor
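Copying classifications from the relational databases into the XML database essentially means reading code and label rows and writing them out as XML elements. The Python sketch below illustrates that step under stated assumptions: the standard-library sqlite3 module stands in for the Sybase database, and the table name classification_item and the XML element names are hypothetical, not the actual Statistics Finland or CoSSI schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_classification(conn: sqlite3.Connection, class_id: str) -> ET.Element:
    """Build an XML element for one classification read from a relational table.

    The table classification_item(class_id, code, label) and the element
    names used here are hypothetical stand-ins, not the CoSSI schema.
    """
    root = ET.Element("classification", {"id": class_id})
    # Codes are ordered as text, i.e. lexically ("10" sorts before "2").
    rows = conn.execute(
        "SELECT code, label FROM classification_item WHERE class_id = ? ORDER BY code",
        (class_id,),
    )
    for code, label in rows:
        item = ET.SubElement(root, "item", {"code": code})
        item.text = label
    return root


if __name__ == "__main__":
    # In-memory SQLite database standing in for the Sybase classification database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE classification_item (class_id TEXT, code TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO classification_item VALUES (?, ?, ?)",
        [("sex", "1", "Males"), ("sex", "2", "Females")],
    )
    print(ET.tostring(export_classification(conn, "sex"), encoding="unicode"))
```

The resulting string could then be stored as a document in an XML database such as eXist; how that upload is done is outside the scope of this sketch.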
Variable editor
For creating and maintaining the descriptions of statistical
data and variables
In the testing phase
Implementation begins in 2010
Descriptions are saved in the eXist XML database as XML
documents conforming to the CoSSI model
Content and functions of the variable editor
Data descriptions consist of a general description of
the data, a list of variables and information about each
individual variable
The general data description contains descriptive information
on the entire data document
The variable list interleaf is used to manage the list of
variables in the data description and to select the
variable whose description is to be edited
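The three-part structure of a data description (general description, variable list, individual variables) can be sketched as a small in-memory model with XML serialisation. This Python sketch is an assumption-laden illustration: the class names, the select method and the element names (dataDescription, variableList, variable, longName) are hypothetical and do not reproduce the real CoSSI schema or the variable editor's code.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Variable:
    short_name: str
    long_name: str


@dataclass
class DataDescription:
    """General description of a data set plus its ordered variable list (hypothetical)."""
    title: str
    general: str
    variables: list = field(default_factory=list)

    def add(self, var: Variable) -> None:
        # Short names identify variables, so duplicates are rejected.
        if any(v.short_name == var.short_name for v in self.variables):
            raise ValueError(f"duplicate variable {var.short_name!r}")
        self.variables.append(var)

    def select(self, short_name: str) -> Variable:
        """Pick the variable whose description is to be edited."""
        for v in self.variables:
            if v.short_name == short_name:
                return v
        raise KeyError(short_name)

    def to_xml(self) -> str:
        """Serialise the description; element names are illustrative only."""
        root = ET.Element("dataDescription")
        ET.SubElement(root, "title").text = self.title
        ET.SubElement(root, "general").text = self.general
        var_list = ET.SubElement(root, "variableList")
        for v in self.variables:
            el = ET.SubElement(var_list, "variable", {"shortName": v.short_name})
            ET.SubElement(el, "longName").text = v.long_name
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    d = DataDescription("Example data", "Illustrative data description")
    d.add(Variable("age", "Age in years"))
    d.add(Variable("sex", "Sex"))
    d.select("sex").long_name = "Sex of respondent"
    print(d.to_xml())
```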
Variable list interleaf
Variable metadata
Field name: Description
short name: Short identifying name of variable
long name: Name of variable in natural language
concept definition: Basic conceptual description of variable
operational definition: Verbal description of the formation of the variable
deduction rule: E.g. programming instructions, mathematical formula, etc.
classification ID: Identifier of classification. Refers to a classification in the classification database.
unit of measure: Measurement unit of variable
variable modified: Date of creation or modification of variable (yyyy-mm-dd)
start of validity: Start date of validity of variable (yyyy-mm-dd)
end of validity: End date of validity of variable (yyyy-mm-dd)
status: Stage of editing of variable: draft, ready, validated
variable group: Name of group to which variable belongs. Makes working with long variable lists easier.
work comment: Free text field. Contains information only for the use of the maintainer of a description.
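The variable metadata fields map naturally onto a typed record. The Python sketch below mirrors those fields; the attribute names, the validity-period check and the published method (which drops the maintainer-only work comment) are assumptions made for illustration, not the variable editor's actual data model.

```python
from dataclasses import asdict, dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    """Stage of editing of a variable description."""
    DRAFT = "draft"
    READY = "ready"
    VALIDATED = "validated"


@dataclass
class VariableMetadata:
    """One variable description; attribute names are hypothetical."""
    short_name: str
    long_name: str
    concept_definition: str = ""
    operational_definition: str = ""
    deduction_rule: str = ""
    classification_id: Optional[str] = None
    unit_of_measure: Optional[str] = None
    variable_modified: Optional[date] = None
    start_of_validity: Optional[date] = None
    end_of_validity: Optional[date] = None
    status: Status = Status.DRAFT
    variable_group: Optional[str] = None
    work_comment: str = ""  # internal note for the maintainer only

    def __post_init__(self) -> None:
        # Reject a validity period that ends before it starts.
        if (self.start_of_validity is not None and self.end_of_validity is not None
                and self.end_of_validity < self.start_of_validity):
            raise ValueError("end of validity precedes start of validity")

    def is_valid_on(self, day: date) -> bool:
        """True if the variable is valid on the given day; missing bounds are open."""
        if self.start_of_validity is not None and day < self.start_of_validity:
            return False
        if self.end_of_validity is not None and day > self.end_of_validity:
            return False
        return True

    def published(self) -> dict:
        """All fields as a dict, without the maintainer-only work comment."""
        fields = asdict(self)
        fields.pop("work_comment")
        return fields


def parse_date(text: str) -> date:
    """Parse a yyyy-mm-dd string as used by the date fields."""
    return date.fromisoformat(text)
```

Usage might look like VariableMetadata("age", "Age in years", start_of_validity=parse_date("2010-01-01")); storing dates as date objects rather than strings lets the validity check compare them directly.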
Results from the variable editor project
In addition to the actual variable editor application, the
project also created the preconditions for:
the development of a consistent information architecture
the construction of production applications in which
metadata need not be separately produced or manually
added to data when publishing or archiving statistics
an information service that need not spend excessive time
searching for metadata or reproducing metadata
for special compilation assignments
a system from which tabulation applications can retrieve
table column and row headings in multiple languages
for all statistics using the same methods
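Retrieving table headings in several languages from one metadata source can be pictured as a lookup keyed by classification, code and language. The Python sketch below is hypothetical: the LABELS store, the headings function and the choice of Finnish as the fallback language are assumptions for illustration, not a description of Statistics Finland's tabulation applications.

```python
from typing import Mapping

# Hypothetical label store: classification ID -> code -> language -> label.
LABELS = {
    "sex": {
        "1": {"fi": "Miehet", "sv": "Män", "en": "Males"},
        "2": {"fi": "Naiset", "sv": "Kvinnor", "en": "Females"},
    },
}


def headings(class_id: str, codes: list, lang: str,
             fallback: str = "fi", labels: Mapping = LABELS) -> list:
    """Return row or column headings for the given codes in the requested language.

    Falls back to the fallback language (Finnish here, an assumption) and
    finally to the bare code when no label is available.
    """
    items = labels.get(class_id, {})
    result = []
    for code in codes:
        translations = items.get(code, {})
        result.append(translations.get(lang) or translations.get(fallback) or code)
    return result


if __name__ == "__main__":
    print(headings("sex", ["1", "2"], "en"))
    print(headings("sex", ["1", "2"], "sv"))
```

Because every tabulation application would call the same lookup, all statistics get their headings by the same method, which is the point made in the slide.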
Experiences gained during the variable editor
project
Various standardisation questions had to be addressed
in the project although they were not originally within
its scope; they could not be avoided and took a lot
of time
Because the variable editor project was the first stage in the
revision of the metadata system, it faced a wide
range of expectations
The project was a good test run for the CoSSI model: the data
content of the model proved to be comprehensive
The planning and building of a classification editor
Reasons for renewing the classification system:
the present way of maintaining classifications has been
viewed as inflexible by statistics producers
phasing out of the Sybase relational databases
ICT strategy: in the next few years the agency will
introduce a common statistical metadata system based
on the CoSSI model
Classification editor project 2010
1) definition stage
2) construction stage
Goals of the classification editor project
Analyse the service needs for a centralised
classification system
Create maintenance tools for classifications in connection
with the CoSSI/eXist metadata store so that the basic
maintenance needs of classifications of individual statistics
are met in a user-oriented manner that also allows further
development of the classification system
Produce solutions that ensure interoperability between the
Sybase classification database and the eXist
metadatabase
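One practical part of keeping the Sybase classification database and the eXist metadatabase interoperable is checking that both hold the same content. The Python sketch below compares one classification as two code-to-label mappings; the SyncReport structure and the compare function are hypothetical, and how the mappings are read from each store is left out as an assumption.

```python
from dataclasses import dataclass


@dataclass
class SyncReport:
    """Differences found between two copies of one classification (hypothetical)."""
    only_in_relational: set
    only_in_xml: set
    label_mismatch: set

    @property
    def in_sync(self) -> bool:
        return not (self.only_in_relational or self.only_in_xml or self.label_mismatch)


def compare(relational: dict, xml_store: dict) -> SyncReport:
    """Compare code -> label mappings of one classification held in two stores."""
    rel_codes, xml_codes = set(relational), set(xml_store)
    return SyncReport(
        only_in_relational=rel_codes - xml_codes,
        only_in_xml=xml_codes - rel_codes,
        # Codes present in both stores but labelled differently.
        label_mismatch={c for c in rel_codes & xml_codes if relational[c] != xml_store[c]},
    )


if __name__ == "__main__":
    report = compare({"1": "Males", "2": "Females"}, {"1": "Males", "2": "Women"})
    print(report.in_sync, report.label_mismatch)
```

A check like this could run after each synchronisation run during the transition, until the relational database is retired.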
Compile user instructions for the editor
Pilot test the editor
Benefits of the new classification system
A well-functioning classification system will encourage
centralised and structured maintenance of classifications
The documentation of classifications will improve, making
them easy to find for in-house use and for the provision of
information services
The new classification system will support smooth
movement between data descriptions, variable descriptions
and maintenance of classifications and thus improve the
efficiency of the maintenance and use of classifications in
statistics
General benefits of the common classification
system
A centralised classification system reduces the workload
of maintaining classifications because each classification
is maintained in only one place
Reduces the risk of errors because classifications
are documented consistently in the system, where they are
accessible to everybody and easy to find
Saves working time that would otherwise be spent
looking for classifications and trying
to find their background information
Makes the classifications used in different statistics visible
to everybody and thus creates possibilities for their
harmonisation
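When all statistics refer to classifications held in one central system, it becomes straightforward to see where the same concept is classified in different ways. The Python sketch below shows one way to surface such harmonisation candidates; the (statistic, concept, classification ID) usage records are a hypothetical input format, not a description of the actual system.

```python
from collections import defaultdict


def harmonisation_candidates(usage) -> dict:
    """Find concepts that different statistics classify in different ways.

    usage: iterable of (statistic, concept, classification_id) tuples.
    Returns {concept: {classification_id: set of statistics}} for every
    concept associated with more than one classification.
    """
    by_concept = defaultdict(lambda: defaultdict(set))
    for statistic, concept, class_id in usage:
        by_concept[concept][class_id].add(statistic)
    return {
        concept: {cid: set(stats) for cid, stats in by_class.items()}
        for concept, by_class in by_concept.items()
        if len(by_class) > 1
    }


if __name__ == "__main__":
    rows = [
        ("Statistic A", "industry", "CLASS-1"),
        ("Statistic B", "industry", "CLASS-2"),
        ("Statistic C", "region", "CLASS-3"),
    ]
    print(harmonisation_candidates(rows))
```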
In conclusion: Why do some statistics departments
still have their own metadata systems instead of
using the centralized system?
Centralised metadata work progresses too slowly from the
perspective of individual statistics: we should rethink our
construction and implementation strategy
A common attitude still regards the production process of each
set of statistics as unique and therefore unable to
exploit systems that are meant for all statistics: we
have to get quick results to prove the benefits of the system
The commitment and support of management are
crucial: we have to convince them
THANK YOU FOR YOUR
ATTENTION!