Metadata Driven Integrated Statistical Data Management System CSB of Latvia By Karlis Zeila Vice President CSB of Latvia MSIS 2004, Geneva May 17 - 19
METADATA DRIVEN ... ?
Any action within the system is governed by metadata.
Metadata is the key element of the system.
All software modules of the system are connected to the Core Metadata module (metadata base).
Any change within the system starts with a change to the metadata.
The full data processing cycle is possible only after the corresponding descriptions in the metadata base are complete.

INTEGRATED ... ?
Most of the system's software modules are connected to the Registers module.
The Registers module is an integral part of the system.
All surveys are supported by adequate classifications stored in the metadata base.
In all surveys, respondent data fields are linked to register data.
All data are stored in a corporate data warehouse.
Statistical data processing is split into unified steps shared by different surveys.
Export / import procedures make it possible to work with the system's data files using different standard software packages.

Advantages and Restrictions

Advantages
1. Data entry, processing and storage procedures for the main business statistics are standardized as far as possible, which provides the basis for the transfer from a stove-pipe data processing approach to a process-oriented approach.
2. Statistical data, including metadata, are processed and stored centrally, using data warehouse technologies and OLAP tools.
3. All data processing procedures are driven from a common metadata system and are described in the metadata base. Executing a standardized procedure for a given survey therefore requires no individual programming.
4. The system is linked with the Business Register, which provides direct retrieval and updating of respondent data.
5. A special import and export procedure has been created for data exchange with other systems.
6. A link with PC-Axis has been created for electronic data dissemination.

Restrictions
1. The system is oriented towards processing business statistics surveys of different periodicity.
2. The metadata base does not provide for the description of confidentiality rules; they are hard-coded in the system.
3. Hardware and standard software requirements: PCs >= Pentium II, RAM >= 128 MB, running Windows 95 to Windows 2000 and MS Office 2000.
4. The metadata base does not provide for the description of an algorithm for automatic creation of respondent lists for sample surveys from the Business Register frame.
5. Diagnostic tools for metadata descriptions are not powerful enough, so the experts who prepare metadata descriptions need to be highly experienced.

ISDMS architecture
[Figure: Integrated statistical data management system. Components shown: corporate data warehouse, Registers base, OLAP database, Macrodata base, Microdata base, Dissemination database, Metadata base, User administration database, Raw database, FIREWALL, CSB Web Site. Platform: Windows 2000 Advanced Server, MS Internet Information Server, SQL Server 2000, PC-Axis.]

ISDMS business application software modules
[Figure: each software module is shown with the databases it is related to.
Modules: Core metadata base module; Registers module; Data entry and validation module; Data aggregation module; Data analysis module; Data dissemination module; Data Web entry module; Data mass entry module; Missing data imputation module; User administration module.
Databases and tools: METADATA, MICRODATA, MACRODATA, REGISTERS, RAW DATABASE, OLAP, USER ADMINISTRATION, DATA IMPUTATION SOFTWARE.]

Structure of
Surveys (questionnaires)
A new survey must be registered in the system.
For each survey a questionnaire version is created, which is valid for at least one year. If the questionnaire content and layout do not change, the current version and its description in the metadata base can be reused for the next year.
Each survey contains one or more data entry tables, or chapters (data matrices). A chapter is either a constant table with a fixed number of rows and columns, or a table with a variable number of rows or columns.
For each chapter, the rows and columns must be described with their codes and names in the metadata base. This information is needed for automatic generation of the data entry application, for data validation, etc.
The last step in describing the questionnaire content and layout is cell formation. A cell is the smallest data unit in survey data processing. It combines a row and a column from the survey version side with a variable from the indicators and attributes side.

Structure of trade statistics questionnaire (data matrix - fixed table)
Questionnaire header: name of questionnaire, index, code, corroboration date, No.; respondent's (object's) code, name and address; period (year, quarter, month); name of chapter.
Columns: A = goods and commodity groups; B = row code; 1 = total turnover (2, 3, 4); columns 2 to 4 cover retail trade turnover, wholesale trade and public catering turnover.

A: Goods and commodity groups                                   B      1       2      3      4
Goods, in total (2010, 2020, 2030-2190)                         2000   15000   9000   5000   1000
Food products (except alcoholic beverages and tobacco goods)    2010   12000   5600   6000   400
Alcoholic beverages, in total                                   2020   3000    2000   400    600

Rows 2021 and 2022 break down alcoholic beverages ("of which: wines" and "spirits and liqueurs, whisky, long drinks") with the values 500, 300, 100, 100 and 1000, 500, 200, 300.
Annotations in the figure: INDICATOR + ATTRIBUTE; Indicators; Attributes; CELL [2010,1] (value 12000); VARIABLE.
Metadata repository: common table of statistical indicators, table of attributes (classifications) and table of created variables.

1. Data matrix - fixed number of rows (3) and variable number of columns (n)
(Example) Main economic indicators of economic activity

A: Row heading        B: Row's code   Total (9999)   Name 1 (NACE 1 code)   Name 2 (NACE 2 code)   ...   Name n-1 (NACE n-1 code)   Name n (NACE n code)
Number of employees   1110            ...
Net turnover          1120            ...
Other income          1130

2. Data matrix - fixed number of columns (3) and variable number of rows (n)
(Example) Production of industry products

A: Name of production   B: Production code (PRODCOM or CN code)   1: Produced, in natural measurement   2: Sold, in natural measurement   3: Income in lats (LVL)
Product 1               1234567                                   ...                                   ...                               ...
Product 2               2345678
...                     ...
Product n-1             4567890
Product n               5678901

Creating of variables
INDICATOR + ATTRIBUTES (CLASSIFICATORS) = VARIABLES
Example:
Number of employees + no attribute = Number of employees, total
Number of employees + Local kind of activity (NACE) = Number of employees in breakdown by kind of activity (~300 variables)
Number of employees + Regional code (ATVK or NUTS) = Number of employees in breakdown by regions (~26 variables)

Dimensions of objects and indicators (example)
Main dimensions (vectors) of respondents (objects O(t)): NACE; REGIONS (Territory); OWNERSHIP AND ENTREPRENEURSHIP; EMPLOYEES GROUP; TURNOVER GROUP.
Dimensions (vectors) of indicators:
Number of employees, total: 100
Number of employees in breakdown by regions: Region 1 = 60, Region 2 = 25, Region 3 = 15
Number of employees in breakdown by kind of activity: NACE 1 = 55, NACE 2 = 35, NACE 3 = 5, NACE 4 = 5

Integrated Metadata Driven Quasi Process Oriented Technology
[Figure: SURVEY 1, SURVEY 2, ... SURVEY N pass through common processing steps (PROCESS ORIENTED APPROACH IN RECTANGLES): metadata entry, standardized data entry interface, data validation procedure, data aggregation procedure, standardized output data dissemination interface, data output and dissemination. Data stores shown: META database, respondent list, Business Register, MICRO database, MACRO database. IMPORT-EXPORT and EXPORT FOR PROCEDURES OUTSIDE ISDMS.]

Metadata base link with Microdata and Macrodata bases
[Figure: META DATABASE (REPOSITORY) steps: general description of survey; description of survey version; description of chapters (data matrix); description of rows and columns; selecting indicators; selecting attributes; creating of variables; linking variables to cells; defining of data aggregation rules. Automatic steps: generation of the form for data entry; data aggregation function. Linked stores: MICRO DATABASE, MACRO DATABASE.]

[Figure: data entry and validation flows. Elements shown: BUSINESS REGISTER (creating list of respondents); META DATABASE (description of data entry forms, description of validation rules); standard data entry and validation; mass data entry; data import from files; full data validation; RAW DATABASE; data transfer to Microdata base; MICRO DATABASE; firewall; Web data entry and validation; Web RAW DATABASE; IMPORT / EXPORT.]

LESSONS LEARNED
The design of a new information system should be based on the results of an in-depth analysis of the statistical processes and data flows.
Clear objectives have to be set, discussed and approved by all parties involved: statisticians, IT personnel and administration.

LESSONS LEARNED
Both parties, statisticians and IT specialists, should be involved in the design and implementation of a metadata driven integrated statistical information system from the very beginning.
Both parties need a clear understanding of all statistical processes that the system will cover, as well as of the meaning and role of metadata within the system, from both the production side and the user side.

LESSONS LEARNED
The initiative to move from the classical stove-pipe production approach to a process-oriented one has to come from the statisticians' side, not from IT personnel or administration.
Motivating the statisticians to move from the existing to the new data processing environment is essential.
Improving knowledge about metadata is one of
the most important tasks throughout the design and implementation phases of the project.

LESSONS LEARNED
A clear division of tasks and responsibilities between statisticians and IT personnel is the key to successful implementation.
To achieve the best performance of the entire system, it is important to organize the execution of the statistical processes in the right sequence.
The design of new surveys, and of questionnaires in particular, as well as changes to existing ones, should follow the system requirements.

LESSONS LEARNED
The feasibility study made it clear that some steps of statistical data processing defy standardization across surveys: some surveys may require complementary functionality (non-standard procedures) that is needed only for processing that particular survey.
To handle such non-standard procedures, interfaces for data export from and import to the system have been developed, so that standard statistical data processing packages and other generalized software available on the market can be used.

LESSONS LEARNED
It is necessary to establish and train a special group of statisticians who will maintain the metadata base and be responsible for the accuracy of the metadata.
Administration and maintenance of the system require well-trained IT staff familiar with MS SQL Server 2000 administration, MS Analysis Services, other MS tools, the PC-Axis family of products, the system data model and the system applications.
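
The variable-creation rule from the slides (INDICATOR + ATTRIBUTES = VARIABLES) and the linking of variables to cells can be illustrated with a small sketch. This is only an illustration in Python under stated assumptions, not the ISDMS implementation: the names Variable, Cell, create_variables and aggregate are hypothetical, and the per-respondent values are invented so that the totals match the NACE 1 = 55 and NACE 2 = 35 figures of the dimensions example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A variable: one indicator plus zero or more attribute values."""
    indicator: str
    attributes: tuple = ()  # e.g. (("NACE", "NACE 1"),); empty means "total"


@dataclass(frozen=True)
class Cell:
    """Smallest data unit: a questionnaire row and column linked to a variable."""
    row_code: str
    column_code: str
    variable: Variable


def create_variables(indicator, classification=None, codes=()):
    """INDICATOR + ATTRIBUTES = VARIABLES; no attribute yields the total."""
    if classification is None:
        return [Variable(indicator)]
    return [Variable(indicator, ((classification, code),)) for code in codes]


def aggregate(micro_records, cells):
    """Sum respondent values per variable, using the cell-to-variable links."""
    by_position = {(c.row_code, c.column_code): c.variable for c in cells}
    totals = defaultdict(int)
    for record in micro_records:
        variable = by_position[(record["row"], record["column"])]
        totals[variable] += record["value"]
    return dict(totals)


# Metadata: "Number of employees" broken down by kind of activity (NACE).
variables = create_variables("Number of employees", "NACE", ["NACE 1", "NACE 2"])

# Row 1110 is "Number of employees" in the variable-column matrix example;
# each NACE column of that row is linked to one variable.
cells = [Cell("1110", v.attributes[0][1], v) for v in variables]

# Hypothetical microdata from two respondents.
micro = [
    {"respondent": "R1", "row": "1110", "column": "NACE 1", "value": 30},
    {"respondent": "R2", "row": "1110", "column": "NACE 1", "value": 25},
    {"respondent": "R1", "row": "1110", "column": "NACE 2", "value": 35},
]

macro = aggregate(micro, cells)
for variable, total in macro.items():
    print(variable.indicator, dict(variable.attributes), total)
```

The point of the sketch is that the aggregation function contains no survey-specific logic: adding a survey or a breakdown means adding Variable and Cell descriptions, which mirrors the claim that standardized procedures need no individual programming.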