Information systems ‘theory’ Peter Fox Xinformatics – ITEC, CSCI, ERTH 4400/6400 Week 3, February 4, 2014
Contents
• Review of last class
• Discussion of reading
• Information systems theory and principles covering a range of traditional foundation aspects
• Next class(es) and assignments

Reading
• http://en.wikipedia.org/wiki/Use_case
• http://alistair.cockburn.us/index.php/Use_cases,_ten_years_later
• Or questions about last week’s material?

Systems
• Regardless of the type of system, be it an irrigation system, a communications relay system, an information system, or whatever, all systems have three basic properties:
  – A system has a purpose, such as distributing water to plant life, bouncing a communications signal around the country to consumers, or producing information for people to use in conducting business.
  – A system is a grouping of two or more components which are held together through some common and cohesive bond. The bond may be water as in the irrigation system, a microwave signal as used in communications, or, as we will see, data in an information system.
  – A system operates routinely and, as such, it is predictable in terms of how it works and what it will produce.

Thinking in systems
• Consists of primarily three things (Meadows)
  – Elements
  – Interconnections
  – Function/ Purpose
• Three attributes of systems
  – Resilience
  – Self Organization
  – Hierarchy

Twelve Leverage Points
• 12. Constants, parameters, numbers (such as subsidies, taxes, standards)
• 11. The size of buffers and other stabilizing stocks, relative to their flows
• 10. Structure of material stocks and flows (such as transport network, population age structures)
• 9. Length of delays, relative to the rate of system changes
• 8. Strength of negative feedback loops, relative to the effect they are trying to correct against
• 7. Gain around driving positive feedback loops
• 6. Structure of information flow (who does and does not have access to what kinds of information)
• 5. Rules of the system (such as incentives, punishment, constraints)
• 4. Power to add, change, evolve, or self-organize system structure
• 3. Goal of the system
• 2. Mindset or paradigm that the system (its goals, structure, rules, delays, parameters) arises from
• 1. Power to transcend paradigms

First information system?
• The first on-line, real-time, interactive, data base system was double-entry bookkeeping which was developed by the merchants of Venice in 1200 A.D. (Bryce’s Law).
• Truth, author’s opinion or urban legend?

Data-Information-Knowledge Ecosystem
[Diagram labels: Producers, Consumers, Experience, Data, Creation, Gathering, Information, Presentation, Organization, Knowledge, Integration, Conversation, Context]

Presentation
• See reading for this week
• Separation of content from presentation!!
• The theory here is more empirical or semi-empirical
• It is developed from a solid understanding of minimizing information uncertainty, beginning with content, context and structural considerations and, as we will see, adding cognitive and social factors to reduce uncertainty.
• Physiology for humans, color, …

Organization
• But also, organization of information presentation, e.g. layout on a web page, in a table, or figure, or report
• Also (again) content, context and structure
• Think about how you organize your
  – Class notes
  – Calendar and assignment schedule
  – Your social life (just kidding)
  – Assignments
  – Do, or do not, connect with others’ ways of organizing!
• A system??
  – Elements, Interconnections, Function/ Purpose

All take a deep breath

THE PHYSICS OF INFORMATION
© 2005 EvREsearch LTD

Equations!

Information theory
• Entropy and randomness
  – Critical for information system design
  – More on why in a few slides

Huh? Entropy?
• No, you are not in a physics class
• Information is always a measure of the decrease of uncertainty at a receiver.
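The idea that information is the decrease of uncertainty at a receiver can be sketched numerically. The block below is only an illustration under stated assumptions: uncertainty is measured as Shannon entropy in bits (base-2 logarithm), the channel is a toy binary one, and the function names (entropy, conditional_entropy, information_gain) are hypothetical helpers, not part of the course material.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(x|y) for a joint distribution given as {(x, y): p(x, y)}.

    This is the uncertainty about the sent symbol x that remains
    after the receiver has seen y.
    """
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return -sum(p * math.log2(p / p_y[y]) for (_, y), p in joint.items() if p > 0)


def information_gain(joint):
    """Decrease of uncertainty at the receiver: H(x) - H(x|y)."""
    p_x = defaultdict(float)
    for (x, _), p in joint.items():
        p_x[x] += p
    return entropy(p_x.values()) - conditional_entropy(joint)


# Assumed toy channels: x is the bit sent, y is the bit received.
noiseless = {(0, 0): 0.5, (1, 1): 0.5}                 # y always equals x
ten_percent_flips = {(0, 0): 0.45, (0, 1): 0.05,
                     (1, 0): 0.05, (1, 1): 0.45}       # y is wrong 10% of the time
pure_noise = {(0, 0): 0.25, (0, 1): 0.25,
              (1, 0): 0.25, (1, 1): 0.25}              # y says nothing about x

for name, joint in [("noiseless", noiseless),
                    ("10% flips", ten_percent_flips),
                    ("pure noise", pure_noise)]:
    print(f"{name}: information received = {information_gain(joint):.3f} bits")
```

With a fair coin as the source, the receiver starts with 1 bit of uncertainty. A noiseless channel removes all of it, pure noise removes none, and a partly noisy channel falls in between.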
Entropy and Rates
• R = H(x) - H(x|y) (Shannon, 1948)
• R is the rate of transmission; the conditional entropy H(x|y) measures the average ambiguity of the received signal
• $H(x) = -\sum_i p(x_i) \log p(x_i)$, where p(xi) is the probability mass function of outcome xi.
• “The entropy rate of a data source means the average number of bits per symbol needed to encode it. Shannon's experiments with human predictors show an information rate of between 0.6 and 1.3 bits per character, depending on the experimental setup; the PPM compression algorithm can achieve a compression ratio of 1.5 bits per character in English text.” (Wikipedia)

Not a perfect story
• Many authors criticize the use of the term entropy, and physics of information
• Information conservation, diffusion, viscosity, advection, dissipation, instability, steady state, conversion … sort of all make some sense…
• But entropy arises in thermodynamics, not directly in other places

That’s not going to stop us!
• However the idea is very relevant to
  – modeling (sometimes equations)
  – design (variables)
  – architecture (how they are put together)
  – as well as how we “condition the system”
• We’ll revisit the components of information soon but first let’s take some examples

For information systems?
• 31045?
• (03) 1045
• 783-1045
• +161397831045
• What helps reduce entropy / uncertainty?
• Notice: ‘signs’ as information representations

Information integrity
• We’ve seen that the information (content) of a random variable is defined as the negative sum of p × log p over its outcomes, where p = probability. It represents the uncertainty of the variable.
• In later classes we cover cognitive and social factors in decreasing the conditional entropy, thus reducing the uncertainty and increasing information content and value
• We will cover semiotics (signs) as a prelude to visualization as a presentation mechanism for information

Think of web pages

Not worst but poor

One more

Information gain/loss
• The mutual information of two variables defines how much information one variable contains about the other.
• It is therefore defined as the decrease of the uncertainty of one variable by knowing the other.
• In probabilistic terms, the entropy decreases by conditioning on the distribution.
• What does this mean for an information system? E.g. a website or web service?

Information retrieval
• A vast field (metrics with theory behind them)
  – Precision (very relevant to this class) - is the fraction of the items retrieved that are relevant to the user's information need.
  – Recall - is the fraction of the items that are relevant to the query that are successfully retrieved.
  – Fall-out – is the proportion of non-relevant documents that are retrieved, out of all non-relevant documents available, e.g. 0 fall-out if you return 0 items!
  – F-measure – is the weighted harmonic mean of precision and recall

Models of IR

Machine Learning
• Daniel Wilkerson: “The precise study of how to make decisions with only incomplete information is deep”.
• Part of information systems design (and architecture and implementation underneath) is to ensure people make the best and most robust decisions in the face of uncertainty.

Context
  – Curiosity
  – User profiles
  – Analytics
• External
• Internal - Human context, tacit knowledge
  – Domain context
  – Skill/ education context
  – Organizational
  – Procedural/ process
  – Unknown/ random – what is an example of this?
[Figure: Contexts of use. A two-axis scale of external reasons for use against internal reasons for use, each running from Low to High, divided into four quadrants labelled Useless, Utilitarian, Dual and Recreational.]

Structure
• Is information stored or only presented?
• Structural representation of information content can bias presentation, e.g.
  – Modern image capture devices (digital camera) often convert 2 byte integer to float, or 4 byte integer; what are the implications?
• Appropriate choice of information structure can significantly decrease uncertainty, e.g. returning land images in GeoTIFF, which can encode geographic location, instead of PNG

Content
• Presentation
  – We’ve covered a fair bit of this so far
  – What other factors have you thought of?
• Translation
  – Almost essential when transmitting between data structures, e.g. serialization over network protocols, sometimes at multiple levels; HTTP, TCP/IP
• Encoding
  – Lossless (Huffman, entropy based, 1952)
  – Lossy (MPEG, JPEG)

Organizational control of content
• Of encoding standards, e.g.
  – Lempel-Ziv-Welch (LZW) was proprietary for many years (until 2003) and was used in the GIF format encoding, http://en.wikipedia.org/wiki/Lempel–Ziv–Welch
  – Moving Picture Experts Group (MPEG) and ISO/IEC JTC1/SC29/WG11, mpeg.org
  – Joint Photographic Experts Group (JPEG), jpeg.org

Noise
• Most often refers to ‘data’ but does apply to information
• Uncertainty, especially any that is introduced, is a source of noise, or more accurately – bias in the use or interpretation of the information
• Noise/ bias is context and structure dependent
• Noise/ bias contamination is rampant in information systems
• Quality control and verification is less developed for information sources, e.g. ‘people do not report problems’

Mode of noise introduction
From Shannon and Weaver (1949)
[Diagram labels: Msg?, Information Source, Signal?, Web Content, Structure, Recvd?, Msg?, Web browser?, Noise source, HTML page, user]

Means of conduct

In Information Systems:
• An example of inductive research:
  – Gather data
  – Analyze and reanalyze the data
  – Organize the data within broad topics
  – Create categories within the topics
  – Identify relationships among the categories
  – Synthesize the patterns into conclusions

Must be inductive?
(Haverty)
• It does not have an existing body of theory which typically guides the work of a field
  – Theory constrains acceptable solutions through formal validation
  – Without it, IAs (Information Architectures) tend to treat each problem as novel
• Also, it supports emergent phenomena
  – The IA domain has a small set of initial components and a relatively simple set of rules
  – These lead to a large number of complex patterns

Content, structure, navigation, interaction
• In any given information system, there are many interactions that can emerge when people use it, influenced by the IA of the site
• IAs use combinations of these components to define the framework that constrains user interactions
  – Problem: we don’t understand well how to study and design for emerging user experiences
  – We don’t know how each contributes to the user experience
• This is why we need inductive analysis

Constructive induction (ci)
• IA as constructive induction
  – This is a process for generating a design solution using two intertwined searches
  – First: identify the most adequate representational framework for the problem
  – Second: locate the best design solution within the framework and translate it to the problem at hand
• ci is useful when existing theory cannot adequately explain the object of study

What are the steps for applying ci?
• Well, actually, the steps are exactly those for use case development, modeling, design and implementation
• Thus the need for experience in preparing a use case.

Interaction theory
• We can come to a system with an “information task”
• Problem-solving: we go through a patterned process and end with a relevance judgment
• We can also have chance encounters, encounters with information, scanning activities
• These are less patterned but still end with some type of judgment
• Then we browse, navigate, search, evaluate…
• Information interaction is the basis of the person’s use experience

Deductive Information Systems

But wait!
• We develop and implement means (designs, architectures, systems, etc.) that perpetuate these two modes of investigation
• That’s a good thing? Right?
• Well, sometimes…

So what about abductive IS?
• This is another warm up for next class
• Abductive reasoning starts when an inquirer considers a set of seemingly unrelated facts, armed with an intuition that they are somehow connected.
• The term abduction is commonly presumed to mean the same thing as hypothesis; however, an abduction is actually the process of inference that produces a hypothesis as its end result

Huh abduction?
• A method of logical inference introduced by C. S. Peirce, which comes prior to induction and deduction and for which the colloquial name is to have a “hunch”

Is abductive reasoning new?
• NO – but we’ve beaten it out of modern information systems…..
• Why?
  – Closed world approaches – huh?
  – We’ve programmed “systems”
  – Too much data/ information
  – We lost sight of other options

Abductive Information System?
• What would this look like?
• If you accept that induction is fundamentally part of how most (all) information systems are developed, then how would you allow for abduction before induction may be possible?

Abductive Information System?
• Choices?
  – More or less
• Presentation?
  – How would that look different?
• Design factors?
  – To invoke the human side
• Architecture factors?
  – Hide what’s not needed, but expose what is
• Cognitive factors?

Geographic Information Systems
• Why mention a specific IS?
• Geography!
  – Spatial
  – Provides context
  – Provides structure
  – Often predetermines content form
• Wikipedia: “the term describes any information system that integrates, stores, edits, analyzes, shares, and displays geographic information”
• Discuss: a lesson for constraining uncertainty!

Questions on?
• About systems
• Information systems
• The elements of theory so far
  – Entropy/ uncertainty
• Content, context, structure
• Presentation, organization, noise
• Induction, deduction, abduction

Reading for this week
• Is retrospective but … relates to a coming assignment
  – Information entropy
  – Information Is Not Entropy, Information Is Not Uncertainty!
  – More on entropy
  – Context
  – Information retrieval
  – Abductive reasoning

What is next
• Week 4 – Foundations; semiotics, library, cognitive and social science and class exercise - information modeling
• Assignment 2