Metadata projects and tasks at Statistics Finland
METIS 2010
Saija Ylönen, 11/03/2010

Organizational chart
Co-operating parties of the metadata tasks: organizational units
- IT Management
  - situated in the Secretariat of the Director General
  - co-ordinates the general information architecture, of which metadata tasks form one element
- Classification and Metadata Services
  - situated in the IT and Statistical Methods department
  - operational unit
  - active role in developing metadata
- Dissemination Services
  - situated in the IT and Statistical Methods department
  - develops the metadata connected with dissemination

Metadata Co-ordination Group
- Originally a co-operation group for persons working with metadata issues in the support function departments of SF
- The objective at present is to intensify the co-operation between the statistics departments and the parties responsible for general metadata work
- Comprised of members working on metadata and permanent members from all statistics departments
- Goal is to widen knowledge about metadata and metadata systems and to give the statistics departments an opportunity to discuss their metadata needs with metadata specialists

CoSSI Steering Group and CoSSI model
- Foundation for the metadata system
- Modular, xml-based model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality, etc.
- Expandable
- CoSSI Steering Group is in charge of mastering and developing the model according to user needs in a manner that will not expose its main structure to risk

Definition of metadata
1) Statistical metadata
- variable and data descriptions
- classifications, concepts
2) Statistical data quality
- quality reports
- statistical method descriptions
3) Metadata of statistical documents or products
- producers
- publication information
- field or subject area

Definition of metadata II
4) Process metadata
a) technical metadata
- technical metadata guide the workflow of data production, make it possible to follow data production and document the working process
b) conceptual process metadata
- technical information on the data and variables which are used in producing data, e.g. minimum or maximum values, various calculation rules or use of certain classification values

Metadata systems at Statistics Finland

Metadata systems: present situation
- We are in a transitional phase from relational databases to an xml-based environment
- Relational databases: classifications, concepts and definitions, archiving database
- Xml database eXist: publications, classifications, concepts, data descriptions

Relational databases
- Built in the 1990s
- Used in statistics production but not in all statistical processes or all statistics
- Classifications in the relational databases are used in SAS and Superstar
- Archiving database is in use in the archiving process
- Classifications and concepts are generated from the relational databases to the web pages

XML database
- At the moment, the xml database is used mostly in the creation of publications with an Arbortext word processor
- Classifications and concepts are copied to the xml database from the relational databases and are ready to use
- Tools for utilising metadata objects from the xml database are being constructed
- The first metadata tool linked to the xml database is the variable editor

Variable editor
- For creating and maintaining the descriptions of statistical data and variables
- At the testing phase
- Implementation begins in 2010
- Descriptions are saved as xml documents conforming to the CoSSI model in the eXist/xml database

Content and functions of the variable editor
- Data descriptions are comprised of a general description of the data, a list of variables and information about an individual variable
- General data description includes descriptive information on the entire data document
- Variable list interleaf allows management of the list of variables in the data description and selection of the variable whose description needs editing

Variable list interleaf

Variable metadata
Field name              Description
short name              Short identifying name of variable
long name               Name of variable in natural language
concept definition      Basic conceptual description of variable
operational definition  Verbal description of the formation of the variable
deduction rule          E.g. programming instructions, mathematical formula, etc.
classification ID       Identifier of classification. Refers to a classification in the classification database.
unit of measure         Measurement unit of variable
variable modified       Date of creation or modification of variable (yyyy-mm-dd)
start of validity       Start date of validity of variable (yyyy-mm-dd)
end of validity         End date of validity of variable (yyyy-mm-dd)
status                  Stage of editing of variable: draft, ready, validated
variable group          Name of group to which variable belongs. Makes working with long variable lists easier.
work comment            Free text field. Contains information only for the use of the maintainer of a description.
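To make the variable metadata fields more concrete, the following is a minimal sketch of how one variable description could be held in memory and written out as xml. It is an illustration only, not the variable editor's implementation: the class name, the attribute names and the flat xml element names are hypothetical and do not reproduce the actual CoSSI schema, and the example values ("age", "years") are invented. The only rules taken from the field list are the three status values and the yyyy-mm-dd date format; the check that the end of validity does not precede the start is an added assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
import xml.etree.ElementTree as ET

# Stages of editing named in the variable metadata field list.
ALLOWED_STATUSES = ("draft", "ready", "validated")


@dataclass
class VariableDescription:
    """One variable description; attribute names mirror the field list (hypothetical)."""
    short_name: str
    long_name: str
    concept_definition: str = ""
    operational_definition: str = ""
    deduction_rule: str = ""
    classification_id: Optional[str] = None
    unit_of_measure: str = ""
    variable_modified: Optional[date] = None
    start_of_validity: Optional[date] = None
    end_of_validity: Optional[date] = None
    status: str = "draft"
    variable_group: str = ""
    work_comment: str = ""

    def __post_init__(self):
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"status must be one of {ALLOWED_STATUSES}")
        # Assumption: a validity period may not end before it starts.
        if (self.start_of_validity and self.end_of_validity
                and self.end_of_validity < self.start_of_validity):
            raise ValueError("end of validity precedes start of validity")

    def to_xml(self) -> ET.Element:
        """Serialise to a flat xml element (hypothetical tag names, not CoSSI)."""
        root = ET.Element("variable")
        for name, value in vars(self).items():
            if value is None or value == "":
                continue  # leave out fields that have not been filled in
            child = ET.SubElement(root, name)
            # date.isoformat() yields the yyyy-mm-dd form used in the field list
            child.text = value.isoformat() if isinstance(value, date) else str(value)
        return root


# Invented example values, for illustration only.
example = VariableDescription(
    short_name="age",
    long_name="Age of person at end of year",
    unit_of_measure="years",
    start_of_validity=date(2010, 1, 1),
)
print(ET.tostring(example.to_xml(), encoding="unicode"))
```

Empty fields are left out of the output so that a draft description can be saved before every field has been filled in; a real CoSSI document would follow the model's own element structure instead.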
Results from the variable editor project
In addition to the actual variable editor application, the project also created preconditions for:
- the development of a consistent information architecture
- the construction of production applications in which metadata need not be separately produced or manually added to data when publishing or archiving statistics
- an information service where excessive time need not be spent on searching for metadata, or on actual reproduction of metadata for special compilation assignments
- a system from which table column and row headings can be retrieved in tabulation applications in multiple languages for all statistics using the same methods

Experiences gained during the variable editor project
- Various questions concerning standardisation had to be addressed in the project although they were not originally in the project's scope; they had to be resolved and they took a lot of time
- Because the variable editor project was the first leg in the revision of the metadata system, it was subjected to a diversity of expectations
- Project was a good test run for the CoSSI model – the data content of the model proved to be exhaustive

The planning and building of a classification editor
Reasons for the renewing of the classification system:
- the present way of maintaining classifications has been viewed as inflexible by statistics
- discontinuation of the Sybase relational databases
- ICT strategy: in the next few years the agency will introduce a common statistical metadata system based on the CoSSI model
Classification editor project 2010
1) definition stage
2) construction stage

Goals of the classification editor project
- Analyse the service needs required from a centralised classification system
- Create maintenance tools for classifications in connection with the CoSSI/eXist metadata store so that the basic maintenance needs of classifications of individual statistics are met in a user-oriented manner which also allows further development of the classification system
- Produce the solutions with which the interoperability of the Sybase classification database and the eXist metadatabase can be ensured
- Compile user instructions for the editor
- Pilot test the editor

Benefits of the new classification system
- A classification system which serves well will encourage centralised and structured maintenance of classifications
- The documentation of classifications will improve, making them easy to find for use in-house and for the provision of information service
- The new classification system will support smooth movement between data descriptions, variable descriptions and maintenance of classifications and thus improve the efficiency of the maintenance and use of classifications in statistics

General benefits of the common classification system
- A centralised classification system eases the workload needed to maintain classifications because classifications are only maintained in one place
- Reduces the possibility of errors because classifications are documented in the system consistently so that they are accessible to everybody and easy to find
- Improves the efficiency of time use because working hours need not be spent on looking for classifications and trying to find their background information
- Makes the classifications used in different statistics visible to everybody and thus creates possibilities for their harmonisation

In conclusion: Why do some statistics departments still have their own metadata systems instead of using the centralised system?
- Centralised metadata work progresses too slowly from the perspective of individual statistics – We should rethink our construction and implementation strategy
- Common attitude still regards the process of an individual set of statistics as unique, and therefore incapable of exploiting systems that are meant for all statistics – We have to get quick results to prove the benefits of the system
- Commitment by the Management and their support to the work is crucial – We have to convince them

THANK YOU FOR YOUR ATTENTION!