Open GSBPM compliant data processing system in Statistics Estonia (VAIS)


Open GSBPM compliant
data processing system
in Statistics Estonia
(VAIS)
2011 MSIS Conference
Maia Ennok
Head of Data Warehouse Service
Data Processing Systems Department
Statistics Estonia
23rd of May 2011
Strategy of Statistics Estonia
2008–2011
“From data collector to information service provider”
Objective: High-quality information service
Standardise the process of data processing:
Indicator: introduction of unified data processing
software
 Development and introduction of a universal data processing
information system
25.05.2016
Architecture of the information system
[Figure: architecture of the information system. Stages shown: data collection, processing, statistical analysis, dissemination. Components shown: iMETA (metadata), KUNDE system, eSTAT, VVIS, ADAM, VAIS, SRS (statistical registers), Data Warehouse, PX-Web, eGeostat, Census-HUB. Sources and recipients shown: economic entities, persons, administrative registers, users.]
Data processing system (VAIS)
 VAIS is a collection of tools and technologies aimed
at automating data processing (Phase 5 in GSBPM).
 In essence, the task of checking, cleaning and
transforming statistical activity data can be described
as taking the raw data from one or more sources and
transforming it into the input database structures of
the analytical system (observation registry).
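The check, clean and transform task can be illustrated with a minimal Python sketch. It is not VAIS code: the `Observation` layout, the field names and the two sample sources are assumptions made for illustration only, since the slides do not describe the actual structure of the observation registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One row of a hypothetical observation registry (long format)."""
    unit_id: str
    variable: str
    period: str
    value: float


def clean(record):
    """Strip surrounding whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def check(record):
    """Accept a record only if it identifies a unit and a period."""
    return bool(record.get("unit_id")) and bool(record.get("period"))


def transform(record):
    """Turn one wide raw record into long-format observations."""
    keys = ("unit_id", "period")
    return [
        Observation(record["unit_id"], name, record["period"], float(value))
        for name, value in record.items()
        if name not in keys
    ]


def process(sources):
    """Run clean, check and transform over every record of every source."""
    observations = []
    for source in sources:
        for raw in source:
            record = clean(raw)
            if check(record):
                observations.extend(transform(record))
    return observations


# Two hypothetical raw sources for the same statistical activity.
survey = [{"unit_id": " A1 ", "period": "2011", "turnover": "120.5"}]
register = [
    {"unit_id": "A1", "period": "2011", "employees": 7},
    {"unit_id": "", "period": "2011", "employees": 3},  # fails the check
]
registry = process([survey, register])
```

The record without a unit identifier is dropped by `check`; a real system would more likely route it to a correction task than discard it.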
Framework for …
 Integrate data
 Classify & code
 Review, validate and edit
 Impute
 Derive new variables
& statistical units
 Calculate weights
 Calculate aggregates
 Finalize data files
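The sub-processes listed above can be chained as ordinary functions, each taking and returning a dataset. The following Python sketch covers six of them on toy data; "Classify & code" and "Finalize data files" are left out for brevity. All function names, field names, the editing rule and the mean imputation are assumptions chosen for illustration, not the methods used in VAIS.

```python
from statistics import mean


def integrate(sources):
    """Integrate data: merge records from all sources by unit."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["unit"], {}).update(record)
    return list(merged.values())


def validate(records):
    """Review, validate and edit: blank out negative turnover."""
    edited = []
    for r in records:
        bad = r["turnover"] is not None and r["turnover"] < 0
        edited.append({**r, "turnover": None} if bad else r)
    return edited


def impute(records):
    """Impute: fill missing turnover with the mean of observed values."""
    fill = mean(r["turnover"] for r in records if r["turnover"] is not None)
    return [
        {**r, "turnover": fill if r["turnover"] is None else r["turnover"]}
        for r in records
    ]


def derive(records):
    """Derive new variables: turnover per employee."""
    return [
        {**r, "turnover_per_employee": r["turnover"] / r["employees"]}
        for r in records
    ]


def weigh(records, population_size):
    """Calculate weights: equal weights scaling the sample to the population."""
    weight = population_size / len(records)
    return [{**r, "weight": weight} for r in records]


def aggregate(records):
    """Calculate aggregates: weighted total of turnover."""
    return sum(r["weight"] * r["turnover"] for r in records)


frame = [
    {"unit": "A", "employees": 2},
    {"unit": "B", "employees": 4},
    {"unit": "C", "employees": 5},
]
survey = [
    {"unit": "A", "turnover": 100.0},
    {"unit": "B", "turnover": -5.0},
    {"unit": "C", "turnover": 300.0},
]

records = weigh(derive(impute(validate(integrate([frame, survey])))), population_size=6)
total = aggregate(records)
```

Keeping each sub-process as a separate step with a uniform input and output is what makes it possible to recombine the steps per statistical activity.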
Metadata driven template based tool
The template driven approach provides a universal
solution for the three main goals of the VAIS project:
 Create an easy-to-use statistical data processing
tool requiring minimal programming skills for
transformation package creation.
 Create a metadata-driven, process-oriented and
automated statistical data processing tool.
 Create an extendable data transformation tool.
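The idea of a metadata driven template can be shown in a few lines of Python. In this sketch the standard library's `string.Template` stands in for a real template engine, and the template text, table name and rule conditions are hypothetical. The point is only that one template plus a list of metadata entries yields many executable statements without hand-written code per rule.

```python
from string import Template

# One reusable template: select the units that violate a rule.
VALIDATION_TEMPLATE = Template(
    "SELECT unit_id FROM $table WHERE NOT ($condition)"
)

# Hypothetical validation metadata for one statistical activity.
rules = [
    {"table": "raw_activity_n", "condition": "turnover >= 0"},
    {"table": "raw_activity_n", "condition": "employees IS NOT NULL"},
]

# Rendering the template once per metadata entry gives one statement per rule.
statements = [VALIDATION_TEMPLATE.substitute(rule) for rule in rules]

for statement in statements:
    print(statement)
```

Adding a new validation rule then means adding a metadata entry, which is the sense in which package creation needs minimal programming skills.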
[Figure: design phase. The common metadata repository holds, for statistical activity N, the data sources, validation rules, imputation method, aggregation definition and target dataset. Together with the common XDTL packages (LOAD DATA, INTEGRATE DATA, VALIDATE, IMPUTE, AGGREGATE) these yield the data processing package (XDTL) for statistical activity N.]
Data processing with VAIS
 Automating and speeding up data transformation
 Raw data, transformation metadata
and source data audit trails
 Metadata driven template
based tool
 Balancing automation
and manual intervention
VAIS architecture
Balancing automation and
manual intervention
[Figure: RAW data enters automated data processing, which is driven by metadata (validation and transformation rules). An "OK?" decision leads either to the Data Warehouse or to manual data processing.]
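A minimal Python sketch of this balance between automation and manual intervention follows. Records that pass every validation rule go straight to the warehouse; the others become tasks for an operator, and corrected records are run through the same automated check again. The rule names, record fields and the `corrections` mapping are assumptions for illustration, not part of VAIS.

```python
# Validation rules kept as metadata: rule name -> predicate on a record.
RULES = {
    "turnover_non_negative": lambda r: r["turnover"] >= 0,
    "employees_present": lambda r: r["employees"] is not None,
}


def failed_rules(record):
    """Return the names of all rules the record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]


def run_automated(records):
    """Split records into warehouse-ready rows and tasks for manual work."""
    warehouse, tasks = [], []
    for record in records:
        failures = failed_rules(record)
        if failures:
            tasks.append({"record": record, "failed": failures})
        else:
            warehouse.append(record)
    return warehouse, tasks


raw = [
    {"unit": "A", "turnover": 100, "employees": 2},
    {"unit": "B", "turnover": -5, "employees": None},
]
warehouse, tasks = run_automated(raw)

# Manual intervention: an operator supplies corrected values per unit.
corrections = {"B": {"turnover": 5, "employees": 1}}
corrected = [
    {**task["record"], **corrections[task["record"]["unit"]]} for task in tasks
]

# Corrected records pass through the same automated validation again.
accepted, remaining = run_automated(corrected)
warehouse.extend(accepted)
```

Re-validating corrected records keeps the manual path subject to the same rules as the automated one.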
VAIS applications and roles
[Table: roles (Designer, Data Warehouse programmer, Chief operator, Operator, Administrator) against applications (VAIS Designer, VAIS Operator, VAIS Administrator, URMA); an x marks each application a role may use.]
URMA
 User rights management application
 Allows existing user accounts to be used for authorization
 Allows roles to be created and users to be linked with roles
 Allows rights to be set according to the domain of statistical work
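The three capabilities can be sketched as a small role-based rights registry in Python. This is an illustration only: the class, its methods, and the role, user and right names are all hypothetical and say nothing about how URMA is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class RightsRegistry:
    """In-memory stand-in for a user rights store (hypothetical)."""
    role_rights: dict = field(default_factory=dict)  # role -> {(statistical_work, right)}
    user_roles: dict = field(default_factory=dict)   # user -> {role}

    def create_role(self, role, rights):
        """Create a role with rights scoped to statistical works."""
        self.role_rights[role] = set(rights)

    def link_user(self, user, role):
        """Link an already existing user account with a role."""
        if role not in self.role_rights:
            raise KeyError(f"unknown role: {role}")
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, statistical_work, right):
        """Check whether any of the user's roles grants the right."""
        return any(
            (statistical_work, right) in self.role_rights[role]
            for role in self.user_roles.get(user, ())
        )


registry = RightsRegistry()
registry.create_role("operator_census", [("census_2011", "resolve_tasks")])
registry.link_user("existing.user", "operator_census")

print(registry.is_allowed("existing.user", "census_2011", "resolve_tasks"))
print(registry.is_allowed("existing.user", "census_2011", "design_packages"))
```

Scoping each right to a statistical work lets the same role name carry different permissions in different domains.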
VAIS Designer
 Application for data processing design
 User interfaces for designing each processing
procedure
 Procedures are grouped into packages
 Package setup follows ETL principles
 Packages are designed for each statistical work
version
VAIS Operator
 Allows the user to intervene manually in data
processing.
 Allows tasks raised by data validation to be resolved.
 The data processing report gives an overview of the data
being processed.
 Gives users the information they need to decide how
to resolve tasks.
Technical platform
VAIS is built on open-source, freely available technology components.
 XDTL (eXtensible Data Transformation Language, an XML-based
descriptive language designed for specifying data transformations,
see http://xdtl.org) run-time engine (XDTL RT).
 MMX Metadata Repository, part of Metadata Framework (a MOF
compliant metadata management environment designed with a wide
variety of metadata-driven applications in mind, see
http://mmframework.org).
 Apache Foundation's Velocity template engine
(http://velocity.apache.org) is used as the template engine; it combines
template rendering functionality with an easy-to-use
template language.
 The user applications are programmed in Java, based on the Wicket MVC
framework (http://wicket.apache.org).
 Quartz scheduling framework (http://www.quartz-scheduler.org) is
used for execution scheduling.
Implementation
 VAIS development 05.2010-10.2011
 Data processing of the Population and Housing Census 2011
(31.12.2011)
 Reuse of administrative data (2012)
 Development of the data collection system for administrative
data (ADAM) and of eSTAT, so that eSTAT questionnaires can be
prefilled with administrative data (annual bookkeeping
report) (31.08.2011). VAIS is used for converting
administrative data into the statistical data format (for
the 2012 data collection, i.e. reference year 2011).
 Data processing of other statistical activities (first pilots 2013)
 Data processing of the next register-based Population and Housing
Census (pilot 2014)
Questions?
Thank you!