Metadata Driven Integrated Statistical Data Management System CSB of Latvia By Karlis Zeila Vice President CSB of Latvia MSIS 2004, Geneva May 17 - 19

Metadata Driven Integrated
Statistical Data
Management System
CSB of Latvia
By Karlis Zeila Vice President CSB of Latvia
MSIS 2004, Geneva May 17 - 19
METADATA DRIVEN ... ?
Any action within the system is governed by metadata.
Metadata is the key element of the system.
All software modules of the entire system are connected with the Core Metadata module (Metadata base).
Any change within the system starts with a change to the metadata.
The full data processing cycle becomes possible only once the corresponding descriptions in the Metadata base have been completed.
INTEGRATED ... ?
Most of the system software modules are connected with the Registers module.
The Registers module is an integral part of the system.
All surveys are supported by adequate classifications stored in the Metadata base.
In all surveys, respondent data fields are connected with register data.
All data is stored in the corporate data warehouse.
Statistical data processing is split into unified steps that are the same for different surveys.
Export / import procedures make it possible to work with the system data files using different standard software packages.
Advantages and Restrictions
Advantages
1. Data entry, processing and storage procedures for the main business statistics are standardized as far as possible, which provides the basis for moving from a stove-pipe data processing approach to a process-oriented one.
2. Statistical data, including metadata, is processed and stored centrally, using data warehouse technologies and OLAP tools.
3. All data processing procedures are driven from a common metadata system and are described in the Metadata base. Therefore, executing a standardized procedure for a survey requires no survey-specific programming.
4. The system is linked to the Business Register, which allows respondent data to be retrieved and updated directly.
5. Special import and export procedures have been created for data exchange with other systems.
6. A link with PC-Axis has been created for electronic data dissemination.
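Advantage 3 can be illustrated with a minimal sketch, which is not the ISDMS implementation. It assumes that validation rules are stored as plain descriptions and executed by one generic routine, so a new survey needs new descriptions but no new code. The rule format, the survey name `trade_survey` and the field names are hypothetical.

```python
# Hypothetical rule descriptions, standing in for entries in the Metadata base.
RULES = {
    "trade_survey": [
        {"type": "required", "field": "respondent_code"},
        {"type": "non_negative", "field": "turnover"},
    ],
}


def _required(record, field):
    return record.get(field) not in (None, "")


def _non_negative(record, field):
    value = record.get(field)
    return isinstance(value, (int, float)) and value >= 0


# Generic checks shared by all surveys; none of them is survey-specific.
CHECKS = {"required": _required, "non_negative": _non_negative}


def validate(survey, record):
    """Return the rules described for the survey that the record violates."""
    errors = []
    for rule in RULES[survey]:
        check = CHECKS[rule["type"]]
        if not check(record, rule["field"]):
            errors.append(f'{rule["type"]}:{rule["field"]}')
    return errors
```

Adding a survey in this sketch means adding an entry to `RULES`; the `validate` routine stays unchanged.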
Restrictions
1. The system is oriented towards processing business statistics surveys of different periodicity.
2. The Metadata base does not provide for the description of confidentiality rules; they are hard-coded in the system.
3. Hardware and standard software requirements: PCs with a Pentium II or better and at least 128 MB RAM, running Windows 95 to Windows 2000 and MS Office 2000.
4. The Metadata base does not provide for the description of an algorithm for automatically creating respondent lists for sample surveys from the Business Register frame.
5. Diagnostic tools for the metadata descriptions are not powerful enough, so the experts preparing metadata descriptions need to be highly experienced.
ISDMS architecture
[Diagram: Integrated statistical data management system. Corporate data warehouse with the following data bases: Metadata base, Registers base, Raw data base, Microdata base, Macrodata base, OLAP data base, User administration data base, Dissemination data base. Also shown: FIREWALL and CSB Web Site. Platform: Windows 2000 Advanced Server, MS Internet Information Server, SQL Server 2000, PC-Axis.]
ISDMS Business application Software Modules
[Diagram: ten software modules, each annotated with the data bases it is related to.]
Modules:
Core metadata base module
Registers module
Data entry and validation module
Data WEB entry module
Data mass entry module
Missed data imputation module
Data aggregation module
Data analysis module
Data dissemination module
User administration module
Related data bases named in the diagram: METADATA, MICRODATA, MACRODATA, REGISTERS, RAW DATABASE, OLAP, USER ADMINISTRATION, and DATA IMPUTATION SOFTWARE.
Structure of Surveys (questionnaires)
A new survey has to be registered in the system. For each survey a
questionnaire version is created, which is valid for at least one year. If
the questionnaire content and layout do not change, the current version
and its description in the Metadata base can be reused for the next year.
Each survey contains one or more data entry tables, or chapters
(data matrices). A chapter can be a constant table with a fixed number of
rows and columns, or a table with a variable number of rows or columns.
For each chapter, the rows and columns have to be described in the
Metadata base with their codes and names. This information is needed
for automatic generation of the data entry application, data validation, etc.
The last step in describing the questionnaire content and layout is cell
formation. Cells are the smallest data units in survey data processing. A cell
is the combination of a row and a column from the survey version side
with a variable from the indicators and attributes side.
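The survey structure described above (survey version, chapters with described rows and columns, cells linked to variables) can be sketched as a small data model. This is a sketch under stated assumptions, not the actual ISDMS schema: the class names, the variable name `total_turnover_food` and the year are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    """One data entry table (data matrix) of a questionnaire version."""
    name: str
    row_codes: list      # row codes described in the Metadata base
    column_codes: list   # column codes described in the Metadata base
    cells: dict = field(default_factory=dict)  # (row, column) -> variable name

    def link(self, row, column, variable):
        """Form a cell by linking a described row/column pair to a variable."""
        if row not in self.row_codes or column not in self.column_codes:
            raise KeyError(f"undescribed row or column: {(row, column)}")
        self.cells[(row, column)] = variable


@dataclass
class SurveyVersion:
    """A questionnaire version, valid for at least one year."""
    survey: str
    year: int
    chapters: list = field(default_factory=list)


# Hypothetical example: one chapter with two rows and two columns.
chapter = Chapter("Turnover", row_codes=[2000, 2010], column_codes=[1, 2])
chapter.link(2010, 1, "total_turnover_food")
version = SurveyVersion("Trade statistics", 2004, [chapter])
```

Rejecting a link to an undescribed row or column mirrors the requirement that rows and columns are described before cells are formed.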
Structure of trade statistics questionnaire (data matrix - fixed table)
Questionnaire header: name of questionnaire, index, code, corroboration date, No.; respondent's (object) code, name and address; period (year, quarter, month)
Name of chapter
Metadata repository: common table of statistical indicators, table of attributes (classifications) and table of created variables
Columns (indicators): A = Goods and commodity groups; B = Row code; 1 = Total turnover (Σ 2, 3, 4); 2 = Retail trade turnover; 3 = Public catering turnover; 4 = Wholesale trade
Rows (attributes):
A                                                               B         1      2      3      4
Goods, in total (Σ 2010, 2020, 2030-2190)                       2000  15000   9000   5000   1000
Food products (except alcoholic beverages and tobacco goods)    2010  12000   5600   6000    400
Alcoholic beverages, in total                                   2020   3000   2000    400    600
  of which: spirits and liqueurs, whisky, long drinks           2021    500    300    100    100
  of which: wines                                               2022   1000    500    200    300
INDICATOR 1 + ATTRIBUTE = VARIABLE 1, which is linked to the CELL [2010,1] (value 12000).
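The column rule of this questionnaire, total turnover = Σ of columns 2, 3 and 4, can be checked mechanically. The sketch below uses the values of rows 2000, 2010 and 2020 as read from the example; the function names are hypothetical and the check is an illustration, not the system's validation module.

```python
# Values of rows 2000, 2010 and 2020 as read from the example questionnaire.
# Columns: 1 = total turnover, 2 = retail trade, 3 = public catering,
# 4 = wholesale trade.
ROWS = {
    2000: {1: 15000, 2: 9000, 3: 5000, 4: 1000},
    2010: {1: 12000, 2: 5600, 3: 6000, 4: 400},
    2020: {1: 3000, 2: 2000, 3: 400, 4: 600},
}


def total_matches(row):
    """Column 1 must equal the sum of columns 2, 3 and 4."""
    return row[1] == row[2] + row[3] + row[4]


def failing_rows(rows):
    """Return the codes of the rows that violate the column rule."""
    return [code for code, row in rows.items() if not total_matches(row)]
```

For the example values `failing_rows(ROWS)` returns an empty list, since every row total equals the sum of its three components.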
1. Data matrix - fixed number of rows (3) and variable number of columns (n)
(Example) Main economic indicators of economic activity
Columns: A = Row heading; B = Row code; Total (9999); then one column per kind of activity: Name 1 (NACE 1 code), Name 2 (NACE 2 code), ..., Name n-1 (NACE n-1 code), Name n (NACE n code)
Rows:
A                      B
Number of employees    1110
Net turnover           1120
Other income           1130
2. Data matrix - fixed number of columns (3) and variable number of rows (n)
(Example) Production of industrial products
Columns: A = Name of production; B = Production code (PRODCOM or CN code); 1 = Produced, in natural units; 2 = Sold, in natural units; 3 = Income in lats (LVL)
Rows:
A              B          1     2     3
Product 1      1234567    ...   ...   ...
Product 2      2345678    ...   ...   ...
...            ...
Product n-1    4567890    ...   ...   ...
Product n      5678901    ...   ...   ...
Creating of variables
INDICATOR + ATTRIBUTES (CLASSIFICATIONS) = VARIABLES
Dimensions (vectors) of indicators
Example:
Number of employees + no attribute = Number of employees, total
Number of employees + Local kind of activity (NACE) = Number of employees in breakdown by kind of activity (~300 variables)
Number of employees + Regional code (ATVK or NUTS) = Number of employees in breakdown by regions (~26 variables)
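The rule INDICATOR + ATTRIBUTE = VARIABLES amounts to combining one indicator with every code of a classification. A minimal sketch follows; the function name, the naming scheme for variables and the three region codes are hypothetical.

```python
def create_variables(indicator, classification=None, codes=()):
    """Combine an indicator with each code of a classification.

    Without a classification the indicator yields a single 'total' variable.
    """
    if classification is None:
        return [f"{indicator}, total"]
    return [f"{indicator} by {classification} {code}" for code in codes]


# Hypothetical classification with three region codes.
by_region = create_variables("Number of employees", "region", ["R1", "R2", "R3"])
```

With a full regional or NACE classification, the same call would produce one variable per classification code, which is where variable counts in the tens or hundreds come from.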
Dimensions of objects and indicators (example)
Main dimensions (vectors) of respondents (objects O(t)):
NACE
REGIONS (Territory)
OWNERSHIP AND ENTREPRENEURSHIP
EMPLOYEES GROUP
TURNOVER GROUP
Dimensions (vectors) of indicators:
Number of employees, total: 100
Number of employees in breakdown by kind of activity: NACE 1 = 55, NACE 2 = 35, NACE 3 = 5, NACE 4 = 5
Number of employees in breakdown by regions: Region 1 = 60, Region 2 = 25, Region 3 = 15
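In this example every breakdown of an indicator adds up to the same total: the figures by kind of activity and by region both sum to 100. A small consistency check, sketched with the example figures (the function name is hypothetical):

```python
TOTAL = 100
BY_ACTIVITY = {"NACE 1": 55, "NACE 2": 35, "NACE 3": 5, "NACE 4": 5}
BY_REGION = {"Region 1": 60, "Region 2": 25, "Region 3": 15}


def is_consistent(total, breakdown):
    """A breakdown is consistent if its parts sum to the indicator total."""
    return sum(breakdown.values()) == total
```

Both `is_consistent(TOTAL, BY_ACTIVITY)` and `is_consistent(TOTAL, BY_REGION)` hold for the figures above.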
Integrated Metadata Driven Quasi Process Oriented Technology
[Diagram: the process-oriented approach is shown in rectangles. SURVEY 1, SURVEY 2, ..., SURVEY N pass through the same steps: metadata entry, standardized data entry interface, data validation procedure, data aggregation procedure, data output and dissemination through the standardized output data dissemination interface. Data stores shown: Metadata base, Business Register, respondent list, Microdata base, Macrodata base. Also shown: IMPORT-EXPORT FOR PROCEDURES OUTSIDE ISDMS and EXPORT FOR PROCEDURES OUTSIDE ISDMS.]
Metadata base link with Microdata and Macrodata bases
[Diagram: description steps held in the METADATA BASE (REPOSITORY): general description of survey; description of survey version; description of chapters (data matrix); description of rows and columns; selecting indicators; selecting attributes; creating of variables; linking variables to cells; defining of data aggregation rules. Automatic functions shown: generation of the form for data entry, and the data aggregation function. Data stores shown: MICRO DATABASE and MACRO DATABASE, with IMPORT and EXPORT.]
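The link between cells, variables and aggregation can be sketched as follows. This is an illustration only: it assumes the aggregation rule is a plain sum over respondents, and the cell-to-variable links, variable names and microdata records are hypothetical.

```python
from collections import defaultdict

# Hypothetical links of cells to variables, as they would be described
# in the Metadata base.
CELL_TO_VARIABLE = {
    (2010, 1): "food_total_turnover",
    (2010, 2): "food_retail_turnover",
}

# Hypothetical microdata records: (respondent, row, column, value).
MICRODATA = [
    ("R1", 2010, 1, 700), ("R1", 2010, 2, 300),
    ("R2", 2010, 1, 500), ("R2", 2010, 2, 200),
]


def aggregate(microdata, cell_to_variable):
    """Sum cell values over respondents, per linked variable (assumed rule)."""
    macrodata = defaultdict(int)
    for _respondent, row, column, value in microdata:
        variable = cell_to_variable.get((row, column))
        if variable is not None:  # cells without a linked variable are skipped
            macrodata[variable] += value
    return dict(macrodata)
```

Because the function only reads the links, changing which cells feed a variable is a change to the descriptions, not to the code.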
Data entry and validation
[Diagram: elements shown are the BUSINESS REGISTER and the METADATA BASE; creating the list of respondents; description of data entry forms; description of validation rules; standard data entry and validation; data import from files; mass data entry; data validation and full data validation; RAW DATA BASE; data transfer to the Microdata base; MICRO DATA BASE; firewall; Web data entry and validation; Web data validation; RAW Web DATA BASE.]
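The flow from a raw data base through validation to the Microdata base can be sketched as a filter step. This is a sketch under stated assumptions: the record layout and the validity rule are hypothetical, and the actual ISDMS transfer logic is not described in the slides.

```python
def transfer_valid(raw_records, is_valid):
    """Split raw records into those accepted for transfer and those rejected."""
    accepted, rejected = [], []
    for record in raw_records:
        (accepted if is_valid(record) else rejected).append(record)
    return accepted, rejected


# Hypothetical raw records and validity rule.
RAW = [
    {"respondent": "R1", "turnover": 1200},
    {"respondent": "R2", "turnover": -50},
]


def has_valid_turnover(record):
    return record["turnover"] >= 0


microdata, to_correct = transfer_valid(RAW, has_valid_turnover)
```

Keeping rejected records in a separate list reflects the idea of a staging area: data that fails validation stays outside the Microdata base until it is corrected.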
LESSONS LEARNED
The design of a new information system should be based on the results of a thorough analysis of the statistical processes and data flows.
Clear objectives have to be set, discussed and approved by all parties involved:
Statisticians
IT personnel
Administration
LESSONS LEARNED
Both parties, statisticians and IT specialists, should be involved in the design and implementation of a metadata driven integrated statistical information system from the very beginning.
Both parties need a clear understanding of all statistical processes that will be covered by the system, as well as of the meaning and role of metadata within the system, from both the production and the user side.
LESSONS LEARNED
- The initiative to move from the classical stove-pipe production approach to a process-oriented one has to come from the statisticians' side, not from IT personnel or administration.
- Motivating the statisticians to move from the existing to the new data processing environment is essential.
- Improving knowledge about metadata is one of the most important tasks throughout the design and implementation phases of the project.
LESSONS LEARNED
- A clear division of tasks and responsibilities between statisticians and IT personnel is the key to successful implementation.
- To achieve the best performance of the entire system, it is important to execute the statistical processes in the right sequence.
- New surveys, and questionnaires in particular, should be designed in accordance with the system requirements; the same applies to changes in existing ones.
LESSONS LEARNED
- As a result of the feasibility study we clearly understood that some steps of statistical data processing defy standardization across surveys: some surveys may require complementary functionality (non-standard procedures) that is needed only for processing that particular survey's data.
- To handle non-standard procedures, interfaces for data export from and import to the system have been developed, so that standard statistical data processing software packages and other generalized software available on the market can be used.
LESSONS LEARNED
- It is necessary to establish and train a special group of statisticians who will maintain the Metadata base and be responsible for the accuracy of the metadata.
- Administration and maintenance of the system require well-trained IT staff who are familiar with MS SQL Server 2000 administration, MS Analysis Services, other MS tools, the PC-Axis family of products, the system data model and the system applications.