A comprehensive model for Data Quality, Value of Data, and User Interface Design
Andrew U. Frank
Geoinformation
TU Vienna
Andrew Frank
7/18/2015
What are the most important problems hindering the wide use of GIS today?
Gueting said: Support for temporal data
Spaccapietra said: Semantics
What are the most important practical problems for the GI industry?
Consider that the market for GI in Europe is only 1/10 the size of the comparable industry in the USA (with approximately the same population).
Impediments to business:
User Interface
Value of Data
Data Quality
Comprehensive model of GI use
Different GIS applications operate with very different concepts of what the GIS produces:
Produce maps (for decision makers)
Analyze situations
Explore data
Each time, a different user interface must be learned, which is a high cost and a large impediment.
Economic value of information
(Geographic) information can only be used to improve decisions.
This is the only situation in which data can
produce economic value.
Read:
Varian & Shapiro: Network economy
Model of rational decision making:
A rational man (a.k.a. homo economicus) decides between actions such that his well-being is optimized.
Multiple critiques:
- Not just economic (monetary) optimization, but general well-being.
- Bounded rationality: neither the information nor the inference resources are available to make the optimal decision.
…
The model of rational decision making is (only) a model
Descriptive model: it is often used when we rationalize our behavior after the fact; we explain our actions in terms of optimizing our utility.
Prescriptive model: for administrative decisions, the model is used to justify a decision and to communicate the arguments to others.
Core model of rational decision making
1. Produce all candidate actions.
2. Exclude actions by non-compensatory criteria.
3. Evaluate the utility of the remaining candidate actions using compensatory criteria and weights.
4. Select the best action (i.e. the action with the highest utility).
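The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `decide`, the representation of K.O. criteria as predicates, and the hotel scores are all hypothetical, and step 1 (producing the candidates) is left to the caller.

```python
from typing import Callable, Dict, List, Optional

# A candidate action, described by (already normalized) property values.
Candidate = Dict[str, float]


def decide(
    candidates: Dict[str, Candidate],
    ko_criteria: List[Callable[[Candidate], bool]],
    weights: Dict[str, float],
) -> Optional[str]:
    """Return the name of the acceptable candidate with the highest utility.

    Step 1 (producing all candidate actions) is done by the caller.
    """
    # Step 2: drop candidates that fail any non-compensatory (K.O.) criterion.
    acceptable = {
        name: props
        for name, props in candidates.items()
        if all(passes(props) for passes in ko_criteria)
    }
    if not acceptable:
        return None  # no candidate left: the K.O. criteria are too strict

    # Step 3: utility is the weighted sum over the compensatory criteria.
    def utility(props: Candidate) -> float:
        return sum(w * props.get(prop, 0.0) for prop, w in weights.items())

    # Step 4: select the action with the highest utility.
    return max(acceptable, key=lambda name: utility(acceptable[name]))


# Hypothetical hotels: beach and garden scored 0..10, stars = classification.
hotels = {
    "Hotel A": {"beach": 8, "garden": 2, "stars": 4},
    "Hotel B": {"beach": 5, "garden": 9, "stars": 3},
    "Hotel C": {"beach": 9, "garden": 9, "stars": 2},
}
best = decide(hotels, [lambda h: h["stars"] >= 3], {"beach": 2.0, "garden": 1.0})
print(best)  # Hotel B
```

Hotel C would score highest, but the K.O. criterion on the classification excludes it before any weighing takes place; this is what "non-compensatory" means.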
Actions change the state of the world:
Hotel for a weekend: candidates
My Criteria
Distance to beach
Classification of hotel
Restaurant
Garden
Trail access
Noise
Price
Collection of data for these criteria
Normalize data
Data is collected on different measurement scales (cf. Stevens’s paper in Science, 1946).
Make it comparable by normalizing it, for example to a scale of 0..10 (or 0..1), but allow positive and negative utility.
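One possible reading of this normalization, sketched in Python under stated assumptions: min-max scaling onto 0..10, with a sign flip for criteria where more is worse (price, noise, distance) so that they yield negative utility. The function name `normalize`, the `higher_is_better` flag, and the distances are hypothetical; other normalizations (e.g. letting the weight carry the sign) are equally valid.

```python
from typing import Dict


def normalize(raw: Dict[str, float], higher_is_better: bool = True,
              scale: float = 10.0) -> Dict[str, float]:
    """Min-max normalize one criterion's raw measurements onto a common scale.

    Benefit criteria map to 0..scale. For cost criteria (more is worse, e.g.
    price, noise, distance), pass higher_is_better=False: values then map to
    -scale..0, so the criterion contributes negative utility.
    """
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    result = {}
    for name, value in raw.items():
        # If all candidates are equal, the criterion cannot discriminate: score 0.
        x = 0.0 if span == 0 else (value - lo) / span * scale
        # 0.0 - x (rather than -x) avoids producing a negative zero.
        result[name] = x if higher_is_better else 0.0 - x
    return result


# Hypothetical distances to the beach in metres (nearer is better).
print(normalize({"Hotel A": 100, "Hotel B": 400, "Hotel C": 250}, higher_is_better=False))
# → {'Hotel A': 0.0, 'Hotel B': -10.0, 'Hotel C': -5.0}
```

Min-max scaling only makes sense for interval or ratio data in Stevens's sense; ordinal data such as a hotel classification would need an explicit mapping to scores.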
Non-compensatory criteria
Non-compensatory criteria (a.k.a. K.O. criteria) must be fulfilled for a candidate to be acceptable.
Compensatory criteria
These criteria list the contributions of the properties of the candidate actions.
Weights indicate the contribution to utility per unit of the property.
Unifying criteria
Interaction with the spreadsheet:
The weights are not well determined; this is one of the major critiques of the method.
Too many non-compensatory criteria: no candidates are left.
Reduce the non-compensatory criteria.
Many similar solutions: reduce the weight of the criterion they have in common.
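When no candidates are left, the user needs to see which non-compensatory criterion is responsible. A minimal sketch, assuming K.O. criteria are named predicates: `eliminated_by` (a hypothetical helper) lists the candidates each criterion rejects, and relaxing the one that rejects the most is a simple heuristic, not a rule from the talk.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, float]


def eliminated_by(
    candidates: Dict[str, Candidate],
    ko_criteria: Dict[str, Callable[[Candidate], bool]],
) -> Dict[str, List[str]]:
    """For each named K.O. criterion, list the candidates it rejects."""
    return {
        label: [name for name, props in candidates.items() if not passes(props)]
        for label, passes in ko_criteria.items()
    }


# Hypothetical data: every hotel fails at least one K.O. criterion.
hotels = {
    "Hotel A": {"stars": 4, "noise": 7},
    "Hotel B": {"stars": 2, "noise": 3},
    "Hotel C": {"stars": 3, "noise": 8},
}
report = eliminated_by(hotels, {
    "at least 3 stars": lambda h: h["stars"] >= 3,
    "noise at most 5": lambda h: h["noise"] <= 5,
})
# Heuristic: the criterion that rejects the most candidates is a natural one to relax.
to_relax = max(report, key=lambda label: len(report[label]))
print(report)
print(to_relax)  # noise at most 5
```

In a direct-manipulation interface, this report would be shown continuously while the user moves the thresholds.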
User interaction style
The user interface must use “direct manipulation”: it must not require a rational analysis, but give a ‘feeling’ for the connections between criteria and the optimal selection.
User Interface Consideration
Shneiderman has pointed out that the only interface style that works consistently is direct manipulation.
Such interfaces exploit human abilities that are not based on verbal (rational) understanding, but on the connection between actions and reactions.
Direct manipulation:
The user has some controls, and the result reacts immediately to changes.
Emotional aspects
Experience shows that users play with the weights until the solution feels right.
This means that it is emotionally acceptable.
Modern neurophysiology has observed that actual decision making in human brains is not rational, but emotionally controlled.
Insert a property ‘likable’ and assess each candidate on it. The weight then given to this property indicates the emotional influence.
What are the controls in the rational decision model?
Non-compensatory criteria: threshold for fulfillment.
Compensatory criteria: weight.
What data is considered: the properties for which either a threshold or a weight is set.
A first sketch of an interface:
A very simple interface.
The interface is expressed entirely in the language of the user.
General user interface because the model is general
The rational decision model is general: EVERY decision can be modeled with it.
Users have to learn only one conceptual model, not many different ones.
The decision model links directly to the user's task
Intermediate elements are excluded, which simplifies the conceptualization (less is better!).
Compare with the standard approach:
The GIS produces a map, which is used as input to the decision process.
Many details of the map's form must be fixed that are not relevant for the decision process.
The user interface must have controls for these.
Value of decision
In the model of rational decision making, the value of data can be estimated:
The value of the data is the improvement of the decision compared to having no information.
For decisions on actions that have a cost, the difference between the highest and the lowest cost can be used as an estimate of the value of the decision.
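Both estimates can be written down directly. The cost spread follows the slide; the second function is an additional assumption: it models "no information" as picking a candidate at random, so the improvement is the best utility minus the mean utility. Function names, prices, and utilities are hypothetical.

```python
from typing import Dict


def decision_value_from_costs(costs: Dict[str, float]) -> float:
    """Estimate the value of a decision as the spread between the highest
    and the lowest cost among the candidate actions."""
    return max(costs.values()) - min(costs.values())


def improvement_over_uninformed(utilities: Dict[str, float]) -> float:
    """ASSUMPTION: 'no information' is modeled as picking a candidate at random,
    so the improvement is the best utility minus the mean utility."""
    values = list(utilities.values())
    return max(values) - sum(values) / len(values)


# Hypothetical weekend prices (EUR) and utilities for three hotels.
print(decision_value_from_costs({"Hotel A": 240.0, "Hotel B": 180.0, "Hotel C": 310.0}))  # 130.0
print(improvement_over_uninformed({"Hotel A": 18.0, "Hotel B": 19.0, "Hotel C": 11.0}))  # 3.0
```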
Value of data
Properties with more weight contribute more to the decision.
The value of the decision can be distributed to the data according to the weights.
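Distributing the value according to the weights amounts to a proportional split. A minimal sketch, assuming non-negative weights; `value_per_property` and the numbers are hypothetical.

```python
from typing import Dict


def value_per_property(decision_value: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Distribute the value of a decision to the data for each property,
    in proportion to the property's weight (weights assumed non-negative)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {prop: decision_value * w / total for prop, w in weights.items()}


# Hypothetical: a decision worth 130 EUR and the weights chosen by the user.
print(value_per_property(130.0, {"beach": 2.0, "garden": 1.0, "price": 1.0}))
# → {'beach': 65.0, 'garden': 32.5, 'price': 32.5}
```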
Price of data
The value of the data is not the price at which it can be sold:
Deduct the cost of obtaining and using it.
The price must be set for many users; the value is specific to a decision.
Opportunities for specialized user interfaces, connections to data collections, and thus BUSINESS.
Data Quality
The quality of data is typically measured from the perspective of the data producer.
Metadata standards codify this approach.
Observations indicate that users do not use metadata. How should a user decide on the usability of data from metadata?
Data quality from a user perspective:
Data is good if it leads to the best decision. It is bad if it makes me take the wrong decision.
Data quality is the risk of my making the wrong decision.
Can we translate a producer's assessment of data quality into the risk of the user making the wrong decision?
Example: Precision
The data producer states that the distance to the beach is 100 m ± 50 m (one standard deviation; assuming a normal distribution, about 68% of all values lie between 50 and 150 m).
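For a K.O. criterion such as "beach within 200 m", the stated precision can be turned into the probability that the test gives the wrong answer. A sketch assuming a normal error distribution around the reported value; `prob_above` and `ko_risk` are hypothetical names, and the normal CDF is computed with `math.erf`.

```python
import math


def prob_above(reported: float, sd: float, threshold: float) -> float:
    """P(true value > threshold), assuming the true value is normally
    distributed around the reported value with standard deviation sd."""
    z = (threshold - reported) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def ko_risk(reported: float, sd: float, max_allowed: float) -> float:
    """Risk that the K.O. test 'value <= max_allowed', applied to the
    reported value, gives the wrong answer for the true value."""
    p = prob_above(reported, sd, max_allowed)
    # Accepted on the reported value: wrong if the true value is above the limit.
    # Rejected on the reported value: wrong if the true value is within the limit.
    return p if reported <= max_allowed else 1.0 - p


# Reported distance to the beach: 100 m, one standard deviation 50 m.
print(round(ko_risk(100.0, 50.0, 200.0), 3))  # beach within 200 m: 0.023
print(round(ko_risk(100.0, 50.0, 150.0), 3))  # beach within 150 m: 0.159
```

The same reported value carries a much higher risk when the threshold lies close to it, so precision alone does not determine the risk; the decision does.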
Translation of completeness to risk
Incomplete data can make us miss the best solution. The risk is comparable to the amount of missing data.
Example:
50% of the data are missing (realistic when selecting hotels by web browsing).
Reduce the value of the data proportionally to the risk.
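If each candidate is missing independently with probability p, the best candidate is in the data with probability 1 − p, which matches the proportional reduction. A minimal sketch under that assumption; `value_after_missing_data` and the numbers are hypothetical.

```python
def value_after_missing_data(value: float, missing_fraction: float) -> float:
    """Reduce the value of the data in proportion to the risk, taking the
    risk to be the fraction of missing data."""
    if not 0.0 <= missing_fraction <= 1.0:
        raise ValueError("missing_fraction must be between 0 and 1")
    return value * (1.0 - missing_fraction)


# Hypothetical decision value of 130 EUR with 50% of the hotels missing.
print(value_after_missing_data(130.0, 0.5))  # 65.0
```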
Temporal currency
Temporal currency is a standard data quality element.
Temporal currency is not ‘separable’ from other criteria.
Effects of temporal currency
The time passed since collection reduces:
- Precision
- Completeness (omissions, commissions).
The data does not change, but its quality is diluted with time:
Estimate the movement per period and reduce precision proportionally.
Estimate the appearance/disappearance of objects and reduce completeness proportionally.
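Converting temporal currency into reduced precision and completeness can be sketched as follows. This assumes linear degradation: the estimated movement is simply added to the standard deviation (one could argue for combining them as a root-sum-square instead), and the estimated rate of appearing and disappearing objects is subtracted from completeness. `aged_quality` and all parameter values are hypothetical.

```python
from typing import Tuple


def aged_quality(sd0: float, completeness0: float, years: float,
                 movement_per_year: float, change_per_year: float) -> Tuple[float, float]:
    """Degrade precision and completeness linearly with the age of the data.

    sd0               standard deviation at collection time (e.g. metres)
    completeness0     fraction of real objects present at collection (0..1)
    movement_per_year estimated movement per year, added to the standard deviation
    change_per_year   estimated fraction of objects appearing or disappearing per year
    """
    sd = sd0 + movement_per_year * years
    completeness = max(0.0, completeness0 - change_per_year * years)
    return sd, completeness


# Hypothetical: data collected 2 years ago, 10 m movement and 25% change per year.
print(aged_quality(50.0, 1.0, 2.0, 10.0, 0.25))  # (70.0, 0.5)
```

The aged values can then be fed into the same precision and completeness risk calculations as fresh data.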
The decision model translates data quality to risk
The decision model translates
1. data quality to risk and
2. risk to a reduction in the value of the data.
Conclusion
The model of rational decision making gives a single conceptual framework in which three important practical problems of today's use of Geographic Information can be discussed:
User Interface
Decisions can be modeled as the selection of the action that optimizes the utility, given some conditions.
The user must select:
- the elements that influence the decision (selection of data layers, themes, etc.),
- the candidate actions,
- the minimal requirements for each property,
- his preferences, translated to weights for each property.
This is the same for many (all?) decision situations.
Value of data
The value of the data lies in the improvement of the decision.
The contribution of each data element is comparable to the weight of its property.
Data quality from a user perspective
Better data reduces the risk of taking a wrong decision.
Precision and completeness can be translated directly into the risk of taking a wrong decision, which reduces the value of the data.
Temporal currency is first converted into reduced precision and completeness (this should be done by the data provider).
Closed-loop semantics
My answer to the problem of semantics:
Link the observation semantics in the database to the action semantics in the decision.
My choice: