Information systems ‘theory’ Peter Fox Xinformatics – ITEC, CSCI, ERTH 4400/6400 Week 3, February 4, 2014

Information systems ‘theory’
Peter Fox
Xinformatics – ITEC, CSCI, ERTH 4400/6400
Week 3, February 4, 2014
1
Contents
• Review of last class
• Discussion of reading
• Information systems theory and principles
covering a range of traditional foundation
aspects
• Next class(es) and assignments
2
Reading
• http://en.wikipedia.org/wiki/Use_case
• http://alistair.cockburn.us/index.php/Use_cases,_ten_years_later
• Or questions about last week’s material?
3
Systems
• Regardless of the type of system, be it an irrigation
system, a communications relay system, an
information system, or whatever, all systems have
three basic properties:
– A system has a purpose - such as to distribute water to plant
life, bouncing a communications signal around the country to
consumers, or producing information for people to use in
conducting business.
– A system is a grouping of two or more components which
are held together through some common and cohesive
bond. The bond may be water as in the irrigation system, a
microwave signal as used in communications, or, as we will
see, data in an information system.
– A system operates routinely and, as such, it is predictable in
terms of how it works and what it will produce.
4
Thinking in systems
• Consists of primarily three things (Meadows)
– Elements
– Interconnections
– Function/ Purpose
• Three attributes of systems
– Resilience
– Self Organization
– Hierarchy
5
Twelve Leverage Points
• 12. Constants, parameters, numbers (such as subsidies, taxes,
standards)
• 11. The size of buffers and other stabilizing stocks, relative to their flows
• 10. Structure of material stocks and flows (such as transport network,
population age structures)
• 9. Length of delays, relative to the rate of system changes
• 8. Strength of negative feedback loops, relative to the effect they are
trying to correct against
• 7. Gain around driving positive feedback loops
• 6. Structure of information flow (who does and does not have access to
what kinds of information)
• 5. Rules of the system (such as incentives, punishment, constraints)
• 4. Power to add, change, evolve, or self-organize system structure
• 3. Goal of the system
• 2. Mindset or paradigm that the system — its goals, structure, rules,
delays, parameters — arises from
• 1. Power to transcend paradigms
6
First information system?
• The first on-line, real-time, interactive, data
base system was double-entry bookkeeping
which was developed by the merchants of
Venice in 1200 A.D. (Bryce’s Law). ***
• Truth, author’s opinion or urban legend?
7
Data-Information-Knowledge Ecosystem
[Figure: ecosystem diagram. Labels: Producers, Consumers, Experience, Context; Data (Creation, Gathering); Information (Presentation, Organization); Knowledge (Integration, Conversation)]
8
Presentation
• See reading for this week
• Separation of content from presentation!!
• The theory here is more empirical or semi-empirical
• It is developed based on a solid understanding
of minimizing information uncertainty
beginning with content, context and structural
considerations and, as we will see, adding
cognitive and social factors to reduce
uncertainty.
• Physiology for humans, color, …
9
Organization
• But also, organization of information
presentation, e.g. layout on a web page, in a
table, or figure, or report
• Also (again) content, context and structure
• Think about how you organize your
– Class notes
– Calendar and assignment schedule
– Your social life (just kidding)
– Assignments
– Do, or do not, connect with others’ ways of organizing!
• A system??
– Elements, Interconnections, Function/ Purpose
10
All take a deep breath
11
[Figure: "The Physics of Information", © 2005 EvREsearch LTD]
Equations!
13
Information theory
• Entropy and randomness
– Critical for information system design
– More on why in a few slides
14
Huh? Entropy?
• No, you are not in a physics class
• Information is always a measure of the decrease of uncertainty at a receiver.
15
Entropy and Rates
• $R = H(x) - H(x \mid y)$ (Shannon, 1948)
• $R$ is the rate of transmission; the conditional entropy $H(x \mid y)$ measures the average
ambiguity of the received signal
• $H(x) = -\sum_i p(x_i) \log_2 p(x_i)$, where $p(x_i)$ is the probability mass function of outcome $x_i$.
"The entropy rate of a data source means the
average number of bits per symbol needed to
encode it. Shannon's experiments with human
predictors show an information rate of between 0.6
and 1.3 bits per character, depending on the
experimental setup; the PPM compression
algorithm can achieve a compression ratio of 1.5
bits per character in English text." (Wikipedia)
16
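A minimal sketch of the entropy calculation, assuming base-2 logarithms so that the result is in bits. The function names `entropy` and `char_entropy` are illustrative and do not come from the lecture. The character estimate treats every character as independent, so on real English text it would come out well above the 0.6 to 1.3 bits per character quoted from human predictors, who exploit context.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy in bits: the sum of -p * log2(p) over all outcomes.

    Outcomes with zero probability contribute nothing and are skipped.
    """
    return sum(-p * math.log2(p) for p in probs if p > 0)


def char_entropy(text):
    """Zero-order estimate: each character is treated as an independent draw."""
    counts = Counter(text)
    total = len(text)
    return entropy(c / total for c in counts.values())


print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit
print(entropy([0.25] * 4))    # four equally likely outcomes: 2.0 bits
print(char_entropy("abab"))   # two equally frequent symbols: 1.0 bit
```

A certain outcome (`entropy([1.0])`) gives zero bits: nothing is learned at the receiver.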
Not a perfect story
• Many authors criticize the use of the
term entropy, and physics of information
• Information conservation, diffusion,
viscosity, advection, dissipation,
instability, steady state, conversion …
sort of all make some sense…
• But entropy arises in thermodynamics
not directly in other places
17
That’s not going to stop us!
• However the idea is very relevant to
– modeling (sometimes equations)
– design (variables)
– architecture (how they are put together)
– as well as how we “condition the system”
• We’ll revisit the components of information
soon but first let’s take some examples
18
For information systems?
• 31045?
• (03) 1045
• 783-1045
• +161397831045
• What helps reduce entropy / uncertainty?
• Notice: ‘signs’ as information representations
19
Information integrity
• We’ve seen that the information (content) of a
random variable is defined as $-\sum p \log p$, where p = probability. It represents the
uncertainty of the variable.
• In later classes we cover cognitive and social
factors in reducing the conditional entropy,
and thus the uncertainty, thereby
increasing information content and value
• We will cover semiotics (signs) as a prelude
to visualization as a presentation mechanism
for information
20
Think of web pages
21
Not worst but poor
22
One more
23
Information gain/loss
• The mutual information of two variables
defines how much information one variable
contains about the other.
• It is therefore defined as the decrease of the
uncertainty of one variable by knowing the
other.
• In probabilistic terms, the entropy decreases
by conditioning on the distribution.
• What does this mean for an information
system? E.g. a website or web service?
24
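A short sketch of mutual information as a decrease in uncertainty, assuming a small discrete joint distribution given as a table `joint[x][y]`. The function names and the two example tables are hypothetical; the identities used are H(X|Y) = H(X,Y) - H(Y) and I(X;Y) = H(X) - H(X|Y).

```python
import math


def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y): uncertainty left in X once Y is known."""
    p_y = [sum(col) for col in zip(*joint)]        # marginal of Y (column sums)
    p_xy = [p for row in joint for p in row]       # flattened joint distribution
    return entropy(p_xy) - entropy(p_y)


def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y): how much knowing Y reduces uncertainty about X."""
    p_x = [sum(row) for row in joint]              # marginal of X (row sums)
    return entropy(p_x) - conditional_entropy(joint)


# Hypothetical example tables.
perfectly_linked = [[0.5, 0.0], [0.0, 0.5]]        # Y determines X
independent = [[0.25, 0.25], [0.25, 0.25]]         # Y says nothing about X

print(mutual_information(perfectly_linked))        # 1.0 bit gained
print(mutual_information(independent))             # 0.0 bits gained
```

For a website or web service, X might be what the user is looking for and Y what the page shows them; the design aim is for Y to leave as little residual uncertainty about X as possible.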
Information retrieval
• A vast field (metrics with theory behind them)
– Precision (very relevant to this class) - is the
fraction of the items retrieved that are relevant to
the user's information need.
– Recall - is the fraction of the items that are
relevant to the query that are successfully
retrieved.
– Fall-out – is the proportion of non-relevant
documents that are retrieved, out of all non-relevant documents available, e.g. 0 fall-out if you
return 0 items!
– F-measure – is the weighted harmonic mean of
precision and recall
25
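The four metrics can be sketched in a few lines, assuming documents are identified by hashable IDs and that relevance judgments are known for the whole collection. The function name `ir_metrics`, its parameters and the sample IDs are illustrative. Returning 0.0 when a denominator is empty is a convention chosen here, not something the definitions require.

```python
def ir_metrics(retrieved, relevant, collection, beta=1.0):
    """Precision, recall, fall-out and F-measure for one query."""
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    hits = retrieved & relevant                  # relevant items that were retrieved
    non_relevant = collection - relevant         # everything not relevant to the query

    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    fall_out = len(retrieved & non_relevant) / len(non_relevant) if non_relevant else 0.0

    # Weighted harmonic mean of precision and recall; beta=1 weights them equally.
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall,
            "fall_out": fall_out, "f_measure": f_measure}


# Hypothetical query: 10 documents, 4 relevant, 3 retrieved (2 of them relevant).
m = ir_metrics(retrieved={1, 2, 5}, relevant={1, 2, 3, 4}, collection=range(1, 11))
print(m)
```

Here precision is 2/3, recall is 1/2, fall-out is 1/6 and the balanced F-measure is 4/7. Returning nothing at all gives zero fall-out, but also zero recall.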
Models of IR
26
Machine Learning
• Daniel Wilkerson: “The precise study of how
to make decisions with only incomplete
information is deep”.
• Part of information systems design (and
architecture and implementation underneath)
is to ensure people make the best and most
robust decisions in the face of uncertainty.
27
Context
[Figure: page from a paper titled "Measuring the context of information systems use", showing a "Contexts of use" diagram with axes "External reasons for use" and "Internal reasons for use" (Low to High) and four quadrants labelled Useless, Utilitarian, Dual and Recreational]
– Curiosity
– User profiles
– Analytics
• External
• Internal - Human context, tacit knowledge
– Domain context
– Skill/ education context
– Organizational
– Procedural/ process
– Unknown/ random
– what is an example of this?
28
Structure
• Is information stored or only presented?
• Structural representation of information
content can bias presentation, e.g.
– Modern image capture devices (digital cameras)
often convert 2-byte integers to float, or to 4-byte
integers; what are the implications?
• Appropriate choice of information structure
can significantly decrease uncertainty, e.g.
returning land images in GeoTIFF, which can
encode geographic location, instead of
PNG
29
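One implication of the integer-to-float question can be checked directly. The sketch below assumes "float" means a 4-byte IEEE 754 single-precision value, which has a 24-bit significand; the helper name `through_float32` is hypothetical. Every 2-byte integer survives the round trip, but 4-byte integers above 2**24 may not.

```python
import struct


def through_float32(n):
    """Store an integer as a 4-byte IEEE 754 float and read it back."""
    packed = struct.pack("<f", float(n))       # 4 bytes, little-endian
    return int(struct.unpack("<f", packed)[0])


# All signed 2-byte integers are represented exactly.
print(all(through_float32(v) == v for v in range(-32768, 32768)))   # True

# A 4-byte integer just above 2**24 is silently rounded.
print(through_float32(2 ** 24 + 1))   # 16777216, not 16777217
```

The stored structure therefore changes what can be presented later: widening to a 4-byte integer keeps the values exact, while converting to float may or may not, depending on the range.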
Content
• Presentation
– We’ve covered a fair bit of this so far
– What other factors have you thought of?
• Translation
– Almost essential when transmitting between data
structures, e.g. serialization over network
protocols, sometimes at multiple levels; HTTP,
TCP/IP
• Encoding
– Lossless (Huffman, entropy based, 1952)
– Lossy (mpeg, jpeg)
30
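To make the lossless case concrete, here is a compact sketch of Huffman coding, assuming the symbols are the characters of a string and the code is written as a string of "0" and "1" characters rather than packed bits. The function names are illustrative. Frequent symbols get short codewords, which is where the link to entropy comes from.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                        # a single symbol still needs one bit
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)       # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:                      # prefix-free: first match is the symbol
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


text = "abracadabra"
code = huffman_code(text)
bits = encode(text, code)
print(len(bits), "bits instead of", 8 * len(text))   # 23 bits instead of 88
print(decode(bits, code) == text)                    # True: nothing was lost
```

A lossy encoder such as JPEG would fail the final check by design: it discards detail that cannot be recovered on decoding.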
Organizational control of
content
• Of encoding standards, e.g.
– Lempel-Ziv-Welch (LZW) was proprietary for many
years (until 2003) and was used in the GIF format
encoding, http://en.wikipedia.org/wiki/Lempel–Ziv–Welch
– Moving Picture Experts Group (MPEG) and
ISO/IEC JTC1/SC29/WG11, mpeg.org
– Joint Photographic Experts Group (JPEG),
jpeg.org
31
Noise
• Most often refers to ‘data’ but does apply to
information
• Uncertainty, especially any that is introduced, is a
source of noise, or more accurately – bias in the
use or interpretation of the information
• Noise/ bias is context and structure dependent
• Noise/ bias contamination is rampant in information
systems
• Quality control and verification are less developed for
information sources, e.g. ‘people do not report
problems’
32
Mode of noise introduction
From Shannon and Weaver (1949)
[Figure: Shannon and Weaver's communication model annotated for the web. Labels: Information Source; Msg?; Web Content, Structure; Signal?; Noise source; Recvd?; Web browser?; Msg?; HTML page, user]
33
Means of conduct
34
In Information Systems:
• An example of inductive research:
– Gather data
– Analyze and reanalyze the data
– Organize the data within broad topics
– Create categories within the topics
– Identify relationships among the categories
– Synthesize the patterns into conclusions
35
Must be inductive? (Haverty)
• It does not have an existing body of theory
which typically guides the work of a field
– Theory constrains acceptable solutions through
formal validation
– Without it, IAs – Information Architectures tend to
treat each problem as novel
• Also, it supports emergent phenomena
– The IA domain has a small set of initial
components and a relatively simple set of rules
– These lead to a large number of complex
patterns
36
Content, structure, navigation,
interaction
• in any given information system, there are
many interactions that can emerge when
people use it, influenced by the IA of the site
• IAs use combinations of these components to
define the framework that constrains user
interactions
– Problem: we don’t understand well how to study
and design for emerging user experiences
– We don’t know how each contributes to the user
experience
• This is why we need inductive analysis
37
Constructive induction (ci)
• IA as constructive induction
– This is a process for generating a design solution
using two intertwined searches
– First: identify the most adequate representational
framework for the problem
– Second: locate the best design solution within the
framework and translate it to the problem at
hand
• ci is useful when existing theory cannot
adequately explain the object of study
38
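The slides do not give an algorithm for ci, so the following is only a toy illustration of the "two intertwined searches" idea, borrowed from the machine-learning sense of constructive induction. Everything here is hypothetical: the data, the candidate representations and the function names. The outer search tries representations; the inner search finds the best simple rule within each one.

```python
def best_threshold(values, labels):
    """Inner search: best rule of the form 'value > t' within one representation."""
    best_acc, best_t = 0.0, None
    for t in sorted(set(values)):
        acc = sum((v > t) == lab for v, lab in zip(values, labels)) / len(labels)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t


def constructive_search(points, labels, representations):
    """Outer search: evaluate each candidate representation, keep the best pair."""
    results = []
    for name, transform in representations.items():
        acc, t = best_threshold([transform(p) for p in points], labels)
        results.append((acc, name, t))
    return max(results, key=lambda r: r[0])


# Hypothetical data: label is True for points far from the origin.
points = [(0, 0), (0.5, 0), (0, -0.5), (2, 0), (-2, 0), (0, 2), (0, -2)]
labels = [False, False, False, True, True, True, True]

representations = {
    "x": lambda p: p[0],
    "y": lambda p: p[1],
    "radius_squared": lambda p: p[0] ** 2 + p[1] ** 2,
}

print(constructive_search(points, labels, representations))
# (1.0, 'radius_squared', 0.25)
```

No threshold on x or y alone separates the two groups, but once the problem is re-represented as distance from the origin a single threshold does. The analogy to IA is loose: choosing the framework first changes which design solutions are even expressible.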
What are the steps for applying ci?
• Well, actually, the steps are exactly those for
use case development, modeling, design
and implementation
• Thus the need for experience in preparing a
use case.
39
Interaction theory
• We can come to a system with an “information
task”
• Problem-solving: we go through a patterned
process and end with a relevance judgment
• We can also have chance encounters, encounters
with information, scanning activities
• These are less patterned but still end with some
type of judgment
• Then we browse, navigate, search, evaluate…
• Information interaction is the basis of the person’s
use experience
40
Deductive Information Systems
41
But wait!
• We develop and implement means (designs,
architectures, systems, etc.) that perpetuate
these two modes of investigation
• That’s a good thing? Right?
• Well, sometimes…
42
So what about abductive IS?
• This is another warm up for next class
• Abductive reasoning starts when an inquirer
considers a set of seemingly unrelated facts,
armed with an intuition that they are somehow
connected.
• The term abduction is commonly presumed to mean
the same thing as hypothesis; however, an
abduction is actually the process of inference that
produces a hypothesis as its end result
43
Huh abduction?
• Is a method of logical inference introduced by C. S. Peirce** which comes prior to
induction and deduction, for which the colloquial name is to have a "hunch"
Is abductive reasoning new?
• NO – but we’ve beaten it out of modern
information systems…..
• Why?
– Closed world approaches – huh?
– We’ve programmed “systems”
– Too much data/ information
– We lost sight of other options
45
Abductive Information System?
• What would this look like?
• If you consent that induction is fundamentally part of
how most (all) information systems are developed, then
how would you allow for abduction before induction
may be possible?
46
Abductive Information System?
• Choices?
– More or less
• Presentation?
– How would that look
different?
• Design factors?
– To invoke the human side
• Architecture factors?
– Hide what’s not needed,
but expose what is
• Cognitive factors?
47
Geographic Information Systems
• Why mention a specific IS?
• Geography!
– Spatial
– Provides context
– Provides structure
– Often predetermines content form
• Wikipedia: “the term describes any information
system that integrates, stores, edits, analyzes,
shares, and displays geographic information”
• Discuss: a lesson for constraining uncertainty!
48
Questions on?
• About systems
• Information systems
• The elements of theory so far
– Entropy/ uncertainty
• Content, context, structure
• Presentation, organization, noise
• Induction, deduction, abduction
49
Reading for this week
• Is retrospective but … relates to a coming
assignment
– Information entropy
– Information Is Not Entropy, Information Is Not Uncertainty!
– More on entropy
– Context
– Information retrieval
– Abductive reasoning
50
What is next
• Week 4 – Foundations; semiotics, library,
cognitive and social science and class
exercise - information modeling
• Assignment 2
51