Data capture and data storage

Assoc. Prof. Dr. Александар Маркоски (Aleksandar Markoski), Technical Faculty – Bitola, 2008

Enviromatics 2008 - Data capture and data storage 1

Introduction

• Because of political, legal and administrative developments, both the amount and the quality of environmental data being collected have increased considerably over the past 35 years. This development was influenced by improvements in the collection, management and utilisation of environmental information.

• Sensor networks have been installed and upgraded to monitor the quality of water, air and soil. Satellite data and other remote sensing data are increasingly used to obtain environmental information.

• Environmental data sets are large and complex. Their administration requires powerful processors and efficient storage technologies. The real question, however, is how to handle and process these large, unstructured data sets so as to obtain efficient decision support.


Object taxonomies

• The term data capture denotes the process of deriving environmental data objects from environmental objects, where any real-world object can be regarded as an environmental object. Living and non-living environmental objects are grouped into a number of classes with typical attributes (e.g. a taxonomy of species).

• A simpler taxonomy of the environment is given by the environmental media soil, water and air. This taxonomy is commonly used by governmental environmental agencies. If the environment is understood as an integrated and complex system, this taxonomy leads to interdisciplinary tasks.
• Therefore, interdisciplinary task forces and environmental network organisations are becoming increasingly common.


General examples of object taxonomies

• atmosphere, which includes all objects above the surface of the earth;
• hydrosphere, which contains all water-related objects;
• lithosphere, which relates to soil, sediments and rocks;
• biosphere, which comprises all living matter;
• technosphere, which is used to denote man-made objects;
• sociosphere, which denotes social and economic interrelationships within human society.

Object taxonomy of ecology

The object taxonomy of ecology is given by:
• Autecology (interrelationships between species and their interrelations with the abiotic environment). Ecological processes take place at the ecosystem scale;
• Synecology (interrelationships between communities and their environments, and between populations within a community). Ecological processes take place at the community scale;
• Demecology, or population ecology (interrelationships of individuals within a population, and interrelationships of populations with the biotic and abiotic environment). The processes take place at the population scale.


Mapping the environment

• The main question is which environmental objects should be monitored, and what data should be collected on them.
• There are many ways to obtain environmental data objects from environmental objects.
• The results are typically obtained as time series of measurements.
• The incoming raw data have to be subjected to domain-specific and device-specific processing.
• Depending on the data source, this may include operations such as optical rectification, noise suppression, filtering or contrast enhancement.
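As a minimal illustration of the noise-suppression step, the following Python sketch applies a sliding median filter to a measurement time series. The function name `median_filter`, the window size of 3 and the readings are hypothetical assumptions, not values from the source; real processing chains are domain- and device-specific.

```python
from statistics import median


def median_filter(series: list[float], window: int = 3) -> list[float]:
    """Suppress isolated spikes in a time series with a sliding median.

    The window is centred on each sample and truncated at both ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(median(series[lo:hi]))
    return smoothed


# Hypothetical hourly readings with one spurious spike at index 3
readings = [41.0, 42.5, 43.0, 250.0, 44.0, 43.5, 42.0]
print(median_filter(readings))  # [41.75, 42.5, 43.0, 44.0, 44.0, 43.5, 42.75]
```

A median is used rather than a moving average because a single outlier does not pull the result towards it; the spike at index 3 disappears entirely.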


Raw data processing

Raw data processing procedures include complex analytical techniques (laboratory methods) that may be used to survey toxicants in the environment.

• Aerial and satellite imagery is increasingly used to monitor remote areas and to recognise long-term environmental loads. The raw imagery is usually processed and represented as a thematic map in order to visualise the load distribution.

• For forest and wildlife management, manual counts of animals and plants are often the most reliable source of data.
• For objects from the technosphere or the sociosphere, it is useful to study printed documentation in order to extract and condense the required environmental data objects.


Data validation procedures

Data validation procedures include:

1. Temporal validation: recent measurements are compared to previous measurements and to reference data obtained under similar conditions.
2. Geographic validation: data that do not fit the usual patterns are cross-validated with measurements from other equipment in the same area that measures the same parameter.
3. Space-time validation: data are compared with previous measurements from the same equipment.
4. Parameter validation: data that do not fit the norm are cross-validated with equipment that measures different parameters.
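The first two validation procedures can be sketched in Python as follows. This is a hypothetical illustration: the functions `temporal_check` and `geographic_check`, the three-standard-deviation rule and the tolerance are assumptions chosen for the example, not thresholds prescribed by the source.

```python
from statistics import mean, pstdev


def temporal_check(value: float, history: list[float], k: float = 3.0) -> bool:
    """Temporal validation (assumed rule): accept a value that lies within
    k population standard deviations of earlier measurements taken under
    similar conditions."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value == mu
    return abs(value - mu) <= k * sigma


def geographic_check(value: float, neighbours: list[float], tolerance: float) -> bool:
    """Geographic validation (assumed rule): accept a value that is within
    `tolerance` of the mean reported by other instruments in the same area
    measuring the same parameter."""
    return abs(value - mean(neighbours)) <= tolerance


# Hypothetical readings of one parameter at a station and at nearby stations
history = [20.1, 19.8, 20.4, 20.0, 19.9]
neighbours = [27.1, 27.9, 26.8]

new_value = 27.5
print(temporal_check(new_value, history))            # False: unusual for this station
print(geographic_check(new_value, neighbours, 1.0))  # True: confirmed by neighbours
```

In this example, the new reading fails the temporal check against the station's own history but agrees with the neighbouring instruments, which points to a genuine event rather than a sensor fault.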


Advanced techniques

• Knowledge-based systems have considerable potential for the processing and initial evaluation of raw environmental data objects. With regard to knowledge representation, the requirements of environmental applications can be met by standard database and artificial intelligence techniques.
• Knowledge representation
• Data merging
• Bayesian probability theory and uncertain information
• Data storage and data security
• Data base management systems
• Databases
• Geographic Information Systems


Knowledge representation

• Static knowledge is stored in specialised file systems or in relational or object-oriented databases.
• Object-oriented databases give users the ability to group similar objects into classes and to connect those classes in an inheritance hierarchy.
• The objects in each class all share a set of attributes and possibly a number of methods, i.e., special procedures that take one or more objects of the class as arguments.
• The notion of inheritance is that attributes and methods defined for some class C higher up in the inheritance hierarchy are also valid for all classes in the subtree below C.
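The class-and-inheritance idea can be illustrated with ordinary Python classes, which follow the same principle as an object-oriented database schema. The class names `EnvironmentalObject`, `WaterBody` and `Lake` and their attributes are hypothetical. A real object-oriented DBMS would also handle persistence and querying, which this sketch omits.

```python
class EnvironmentalObject:
    """Class C at the top of the hierarchy: its attributes and methods are
    inherited by every class in the subtree below it."""

    def __init__(self, name: str, location: tuple[float, float]):
        self.name = name
        self.location = location  # (longitude, latitude), illustrative only

    def describe(self) -> str:
        return f"{type(self).__name__} '{self.name}' at {self.location}"


class WaterBody(EnvironmentalObject):
    """Adds an attribute and a method that takes two objects of the class."""

    def __init__(self, name: str, location: tuple[float, float], volume_m3: float):
        super().__init__(name, location)
        self.volume_m3 = volume_m3

    def is_larger_than(self, other: "WaterBody") -> bool:
        return self.volume_m3 > other.volume_m3


class Lake(WaterBody):
    """Inherits name, location, volume_m3, describe() and is_larger_than()."""


lake = Lake("Lake X", (21.3, 41.0), 2.0e6)
pond = Lake("Pond Y", (21.4, 41.1), 1.0e3)
print(lake.describe())            # Lake 'Lake X' at (21.3, 41.0)
print(lake.is_larger_than(pond))  # True
```

`Lake` defines nothing of its own, yet it can use `describe()` from the root class and `is_larger_than()` from `WaterBody`.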


Dynamic knowledge

• Dynamic knowledge in environmental information systems (EIS) is represented by IF-THEN rules. The concept of a rule-based knowledge system is to encode the available information on environmental objects in a possibly large number of relatively simple rules rather than in a complex procedural program.
• Each rule consists of an IF part and a THEN part. Starting from some initial state, the system checks which of the rules are currently applicable.
• If more than one rule can be applied, the system picks one of them according to a given priority scheme.
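A rule-based system of this kind can be sketched in a few lines of Python. The sketch assumes a simple priority scheme where the applicable rule with the highest priority fires, and it lets each rule fire at most once. The rule names, the nitrate threshold and the `run` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Rule:
    name: str
    priority: int                       # higher value wins when several rules apply
    condition: Callable[[State], bool]  # IF part
    action: Callable[[State], None]     # THEN part


def run(rules: List[Rule], state: State, max_steps: int = 100) -> List[str]:
    """Repeatedly fire the highest-priority applicable rule; each rule fires
    at most once (an assumption of this sketch). Returns the firing order."""
    fired: List[str] = []
    for _ in range(max_steps):
        applicable = [r for r in rules if r.name not in fired and r.condition(state)]
        if not applicable:
            break
        chosen = max(applicable, key=lambda r: r.priority)
        chosen.action(state)
        fired.append(chosen.name)
    return fired


rules = [
    Rule("routine_check", 0, lambda s: True, lambda s: s.update(checked=True)),
    Rule("high_nitrate", 2, lambda s: s["nitrate_mg_l"] > 50,
         lambda s: s.update(alert="nitrate")),
    Rule("notify", 1, lambda s: "alert" in s, lambda s: s.update(notified=True)),
]

state: State = {"nitrate_mg_l": 62.0}
print(run(rules, state))  # ['high_nitrate', 'notify', 'routine_check']
```

In the first step, both `high_nitrate` and `routine_check` apply, and the priority scheme selects `high_nitrate`. Its THEN part makes `notify` applicable in the next step.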


Data merging

• Environmental data capture can be performed with techniques that are standard in the areas of statistical classification, database management and artificial intelligence. When raw data are aggregated and evaluated, the input data are only one part of the relevant information.
• Other circumstantial information is also taken into account in order to extract the environmental data objects the user is interested in. Human experts always take such information into account when evaluating a sample.
• A promising strategy is to form a working hypothesis and to support this hypothesis based on the available information. This has to include the possibility that pieces of input information partly contradict each other.


Bayesian probability theory and uncertain information

• Environmental data are often inaccurate and uncertain. Only probabilistic statements can be derived from such data.
• Bayesian statistics, as commonly applied to combine evidence, requires that events are independent of each other.
• This assumption is rarely true in an environmental context.
• Uncertainties within raw data sets and databases can be evaluated by the Dempster-Shafer approach, which is widely used for environmental data capture.

Dempster-Shafer approach

• The key idea is that the arguments for and against a given hypothesis $H$ should be logically separated. This separation is achieved by distinguishing between the belief $B(H)$ and the plausibility $\mathrm{Pl}(H)$.
• Both concepts are represented by a number between zero and one.
• The belief represents the weight of the facts that support the working hypothesis.
• In contrast, the plausibility is one minus the weight of the facts speaking against $H$.


Degree of uncertainty

• Therefore
$$\mathrm{Pl}(H) = 1 - B(H^*),$$
where $H^*$ denotes the hypothesis that $H$ is false.
• The belief in the counter-hypothesis, $B(H^*)$, is sometimes referred to as the doubt $D(H)$ with respect to the working hypothesis $H$. Therefore
$$\mathrm{Pl}(H) = 1 - D(H).$$
• In Bayesian probability theory, belief and plausibility coincide:
$$p(H) = B(H) = \mathrm{Pl}(H) = 1 - p(H^*).$$
• In Dempster-Shafer theory, in general:
$$B(H) \le \mathrm{Pl}(H).$$
• The difference between $B(H)$ and $\mathrm{Pl}(H)$ represents the degree of uncertainty $U(H)$ about the hypothesis:
$$U(H) = \mathrm{Pl}(H) - B(H).$$
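For a single hypothesis, these quantities can be computed from a mass assignment over the frame {H, H*}, in which part of the evidence may remain uncommitted to either H or H*. The mass-function formulation (the basic probability assignment of Dempster-Shafer theory) and the function name `belief_plausibility` are additions for illustration, and the sketch only covers this two-element case.

```python
def belief_plausibility(m_h: float, m_not_h: float) -> tuple[float, float, float]:
    """Return (B(H), Pl(H), U(H)) on the frame {H, H*}.

    m_h: mass committed to H (weight of facts supporting H)
    m_not_h: mass committed to H* (weight of facts against H, the doubt D(H))
    The remainder 1 - m_h - m_not_h is uncommitted and expresses ignorance.
    """
    if m_h < 0 or m_not_h < 0 or m_h + m_not_h > 1:
        raise ValueError("masses must be non-negative and sum to at most 1")
    belief = m_h                         # B(H)
    plausibility = 1 - m_not_h           # Pl(H) = 1 - D(H)
    uncertainty = plausibility - belief  # U(H) = Pl(H) - B(H)
    return belief, plausibility, uncertainty


print(belief_plausibility(0.5, 0.25))   # (0.5, 0.75, 0.25)
print(belief_plausibility(0.75, 0.25))  # (0.75, 0.75, 0.0)
```

When all mass is committed to either H or H*, as in the second call, belief and plausibility coincide and the uncertainty vanishes, which reproduces the Bayesian case.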


Data storage and data security

• In former years, most environmentally relevant data were only available in analogue form. This concerns historical data records but also a large number of more recent thematic maps, images and documents.
• Those historical data sets that are relevant to current and future applications are rapidly being digitised. This process is supported by continuous progress in scanning technologies.
• New data are almost always captured in some digital format, and it is mainly a question of logistics to make the data available.

• There are essentially two options for storing a given digital data set.


Data storage and data security (2)

1. A database management system (DBMS) with a well-defined data model, typically relational, object-relational or object-oriented;
2. an application-specific file system, as is still used by many geographic information systems (GIS).

• Environmental data place special demands on databases and data storage. In most cases, environmental data consist of three parts of information:
– matter- or substance-based information,
– time information, and
– space information.
• An environmental database is characterised by the type of data stored in it, by the management system used for data storage and by the type of information available from it.

Data storage and data security (3)

• Operations between applications and queries are organised through interfaces. While data storage and data processing were tightly coupled in the past, more recent systems make a clear distinction between these tasks.
• This trend results from the general tendency towards open systems. As users demand convenient interfaces between different hardware and software tools across heterogeneous computer platforms, vendors have been forced to decompose their products along the lines of more narrowly defined functionalities.
• GIS in particular have traditionally combined data storage, data querying and data visualisation of geographic information in a tightly integrated manner.


Data base management systems

• A DBMS provides a complete set of data languages, consisting of:
– a data definition language (DDL),
– a query language (QL),
– a data manipulation language (DML).
• Interfaces to high-level programming languages are provided. In most cases the Structured Query Language (SQL) is used.
• All data operations within the database are performed as transactions, which should allow multi-user operation. Most commercial DBMS are the result of application-oriented developments.
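The three language components and transaction handling can be illustrated with Python's built-in `sqlite3` module. SQLite is used here only as a convenient example of an SQL DBMS. The `measurement` table, which stores substance, time and space (station) information, and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define a table that stores substance, time and space information
conn.execute(
    """
    CREATE TABLE measurement (
        id        INTEGER PRIMARY KEY,
        substance TEXT NOT NULL,   -- matter or substance
        measured  TEXT NOT NULL,   -- time (ISO 8601)
        station   TEXT NOT NULL,   -- space (station identifier)
        value     REAL NOT NULL
    )
    """
)

insert_sql = "INSERT INTO measurement (substance, measured, station, value) VALUES (?, ?, ?, ?)"

# DML in a transaction: "with conn" commits on success and rolls back on error
with conn:
    conn.executemany(
        insert_sql,
        [
            ("NO2", "2008-03-01T08:00", "S1", 38.0),
            ("NO2", "2008-03-01T09:00", "S1", 45.0),
            ("PM10", "2008-03-01T08:00", "S2", 52.0),
        ],
    )

# A failing transaction is rolled back as a whole: the valid first row is discarded too
try:
    with conn:
        conn.execute(insert_sql, ("NO2", "2008-03-01T10:00", "S1", 40.0))
        conn.execute(insert_sql, ("O3", "2008-03-01T10:00", None, 60.0))  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

# QL: mean NO2 value per station
rows = conn.execute(
    "SELECT station, AVG(value) FROM measurement WHERE substance = ? GROUP BY station",
    ("NO2",),
).fetchall()
count = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]
print(rows)   # [('S1', 41.5)]
print(count)  # 3
```

The second transaction shows why operations are grouped into transactions. Either all of its inserts take effect or none do, so concurrent users never see a half-applied update.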

Geographic Information Systems

• Geographic information systems (GIS) are essential environmental informatics tools for the management of the environment, including decision support and the visualisation of large amounts of environmental data. The original idea behind GIS was to computerise the metaphor of a thematic map.
• In general, GIS are computer-based tools to capture, manipulate, process and display spatial or georeferenced data. The spatial data are still mostly held in proprietary file systems, and most of the underlying data models are layer-based. The information is encoded in a number of thematic maps, such as vegetation maps, soil maps or topographic maps.
• With regard to geometry, each map corresponds to a partition of the universe into disjoint polygons. Each polygon represents a region that is sufficiently homogeneous with respect to the theme of the map. Maps may be enhanced by lines and points to represent specific features, such as roads or cities.
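In a layer-based model, a basic operation is determining which polygon of a thematic map contains a given point. The following Python sketch uses the even-odd ray-casting test. This is one common technique for the task, and the source does not itself specify it. The `soil_map` layer and its rectangular polygons are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Even-odd rule: cast a horizontal ray from p to the right and count
    how many polygon edges it crosses; an odd count means p is inside."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(p: Point, layer: Dict[str, List[Point]]) -> Optional[str]:
    """Return the theme value of the polygon containing p, or None."""
    for theme, polygon in layer.items():
        if point_in_polygon(p, polygon):
            return theme
    return None


# Hypothetical soil map: two adjacent rectangular regions
soil_map = {
    "loam": [(0, 0), (5, 0), (5, 4), (0, 4)],
    "clay": [(5, 0), (10, 0), (10, 4), (5, 4)],
}
print(classify((2, 1), soil_map))   # loam
print(classify((7, 3), soil_map))   # clay
print(classify((12, 2), soil_map))  # None
```

Because the polygons of one map partition the area, a point falls into at most one region, so `classify` can stop at the first match.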


Questions?
