Visualization in Text
Analysis Problems
VAC Consortium Meeting
Stanford, May 24, 2006
Marti Hearst
School of Information, UC Berkeley
Outline
 Some Visualization Design Principles
 Illustrated with a new example
 Why Text is Tricky to Visualize
 How to do good visualization design with
text while meeting analysts’ needs?
 Focus on Flexibility with Reproducibility
 Examples from 4 different domains
What Makes for a Good Visualization?
 Visually illuminates important aspects of
the underlying data and domain.
 Supports the users’ tasks (better than
without the visualization).
 Adheres to good design principles.
Example from Software Engineering
Marat Boshernitsan, UC Berkeley PhD Dissertation 2006
 Problem: need to make complex changes
throughout code.
 Example: convert from one API to another.
A Typical Solution
 Either requires programmers to understand
and manipulate abstract syntax trees …
 Or requires learning another programming
language (or both)!
First Attempt
Second Attempt
A Better Solution
 Build on how programmers think about
programming.
 Operate on the textual representation of code.
Users Operate on Familiar Visual
Representation of Code
Context-and-Domain Sensitive
Visual Cues
Lessons from this Example
 User-centered Design
 This was the third attempt.
 First 2 attempts did not accurately reflect how
users think about the problem.
 Careful design of labels and interaction cues
 Very intelligent backend, but user-activated.
 Visually and interactively reflects how
programmers think about programming.
What Makes for a Good Visualization
for Analysts?
 Visually illuminates important aspects of
the underlying data and domain.
 Supports the users’ tasks (better than
without the visualization).
 Adheres to good design principles.
Goals vs. Tasks
 Analysts’ Goals:
 Understand current and past situations
 Predict and anticipate future situations
Observations by Pirolli & Card ’05:
 Different analysts starting with people,
organizations, tasks, and time:
 predict coup likelihood
 understand bio-warfare threats
 understand relations within cartel
Goals vs. Tasks
 Analysts’ tasks:
 Explore
 Extract
 Filter
 Link
 Arrange
 Compare
 Hypothesize
 (A combination of Foraging and Sensemaking)
 Tasks should be done only in support of the goals.
Design Principles for Analysts
 Experienced analysts notice what is
missing or unexpected (Wright et al. ’06)
 Thus consistency and reproducibility are
important.
Design Principles for Analysts
 Analysts must guard against
confirmation bias. (Pirolli & Card ’05)
 Thus it is important for analysts to
 Be able to easily arrange and re-arrange,
 View information flexibly from many angles,
While at the same time retaining
consistency and reproducibility.
 However … it’s hard to do this with text.
Working with Text
Text is especially difficult to visualize
 Very high dimensionality
 Tens to hundreds of thousands of features
 Compositional
 Can be combined together in innumerable ways
 Abstract
 And so difficult to visualize
 Not pre-attentive
 Must foveate to read
 Subtle
 Small differences matter
 Unordered
Text Meaning is NOT pre-attentive
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
Why Text is Tough
 Abstract concepts are difficult to visualize
 Combinations of abstract concepts are
even more difficult to visualize
 time
 shades of meaning
 social and psychological concepts
 causal relationships
Why Text is Tough
The dog.
The dog cavorts.
The dog cavorted.
The man.
The man walks.
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
As the man walks the cavorting dog, thoughts
arrive unbidden of the previous spring, so unlike
this one, in which walking was marching and
dogs were baleful sentinels outside unjust halls.
How do we visualize this?
Why Text is Tough
 Language only hints at meaning
 Most meaning of text lies within our minds and
common understanding
 “How much is that doggy in the window?”
 how much: social system of barter and trade
(not the size of the dog)
 “doggy” implies childlike, plaintive, probably
cannot do the purchasing on their own
 “in the window” implies behind a store window,
not really inside a window, requires notion of
window shopping
Why Text is Tough
 General categories have no standard
ordering (nominal data)
 Categorization of documents by single
topics misses important distinctions
 Consider an article about
 NAFTA
 The effects of NAFTA on truck manufacture
 The effects of NAFTA on productivity of
truck manufacture in the neighboring cities
of El Paso and Juarez
Why Text is Tough
 Other issues about language
 Ambiguous (many different meanings for
the same words and phrases)
 Same meaning implied by different
combinations
 Different combinations imply different
meanings
Why Text is (Deceptively) Easy
 Text is easier when you have a lot of it
 Web search now usually treats a query as a conjunction of its terms
 Text has a lot of redundancy
 A very simple algorithm can:
 Pull out “important” phrases
 Find “meaningfully” related words
 Create a “summary” from document
 Group “related” documents
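As a rough illustration of how little machinery the first item needs, here is a minimal Python sketch that treats frequently repeated adjacent word pairs as "important" phrases. The stopword list, the function names, and the three sample sentences are assumptions made for this example, not material from the talk; real systems use more careful statistics.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_phrases(docs, top_n=3):
    """Count adjacent word pairs that contain no stopword; return the most common."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[(first, second)] += 1
    return [" ".join(pair) for pair, _ in counts.most_common(top_n)]


# Hypothetical mini-collection, loosely echoing the NAFTA example earlier in the talk.
docs = [
    "The effects of NAFTA on truck manufacture",
    "Truck manufacture in El Paso",
    "Productivity of truck manufacture",
]
phrases = frequent_phrases(docs, top_n=1)
print(phrases)
```

On these sample sentences the top pair is "truck manufacture": redundancy in the text is enough to surface the dominant theme without any linguistic analysis.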
Simple Text Analysis can Mislead
 Most frequent words
 Biases towards concepts with unique identifiers.
From Spink, Wolfram, Jansen, Saracevic, JASIS ’01
Major Trends vs. Minor Discoveries
 With text, it’s easy to extract and show the
largest, main trends
 But often we want the rare but unexpected
and important event:
 Russian oil company example
 Schwarzenegger and Enron
 Cigarettes and kids
 Person on the periphery who is working stealthily to
influence things
 This is really difficult to solve!
Design Principles for Analysts
 Experienced analysts notice what is missing or
unexpected.
 Analysts must guard against confirmation bias.
 Need to be able to easily arrange and re-arrange,
 View information flexibly from many angles,
 While at the same time retaining consistency and
reproducibility.
 Interfaces should reflect the domain and data.
 How to achieve this with text collections?
 Must transform text in understandable ways
 Must provide multiple, consistent views that
nevertheless allow for new discovery and insight
Why Emphasize Flexibility?
 Can’t view representations of all the
text content at once.
 Instead, analysts need ways to flexibly
navigate, group, organize, and explore.
 See important pieces over time.
The Importance of Flexibility
 Russell, Slaney, Qu, Houston ’05
 The ease of viewing and manipulation in the system
strongly influenced the kind of analysis operations done.
Examples of Flexibility on Text Data
 PaperLens (Conference proceedings)
 TAKMI (Customer service requests)
 Faceted Browsing (e-commerce)
 Flamenco
 Ebay Express
 FaThumb
 TRIST and Sandbox (Analysts)
Flexible views
 InfoVis 2004 contest
 Visualize 8 years of conference proceedings
 Tasks:
1. Static overview of 10 years of InfoVis
2. Characterize the research areas and their evolution
3. The people in InfoVis
4. Which papers/authors are most often referenced?
5. How many papers conducted a user study?
 PaperLens integrated solution by Lee, Czerwinski,
Robertson, Bederson
 Uses graphical elements and brushing and linking
to flexibly elucidate a collection’s contents.

http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
Flexibility in Foraging and Analysis
 TAKMI, by Nasukawa and Nagano, ’01
 The system integrates:
 Analysis tasks (customer service help)
 Content analysis
 Information Visualization
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
 Documents containing “windows 98”
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
TAKMI, by Nasukawa and Nagano, 2001
 Patent documents containing “inkjet”,
organized by entity and year
Flexibility in Category Navigation
 Browsing Information Collections using
(Hierarchical) Faceted Metadata
What are facets?
 Sets of categories, each of which describes
a different aspect of the objects in the
collection.
 Each of these can be hierarchical.
 (Not necessarily mutually exclusive nor
exhaustive, but often that is a goal.)
GeoRegion + Time/Date + Topic
Facet example: Recipes
Cooking Method
Ingredient
Stir-fry
Chicken
Red Bell Pepper
Course
Main Course
Curry
Cuisine
Thai
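To make the facet idea concrete, the following is a minimal Python sketch of faceted selection over a handful of recipe records. The facet names follow the slide; the recipe titles, the extra facet values, and the helper names `select` and `facet_counts` are hypothetical and chosen only for illustration. It is not how Flamenco or any other system named in this talk is implemented.

```python
from collections import Counter

# Hypothetical recipe records: each facet maps to a set of category values.
RECIPES = [
    {"title": "Thai chicken stir-fry",
     "Cuisine": {"Thai"}, "Course": {"Main Course"},
     "Cooking Method": {"Stir-fry"},
     "Ingredient": {"Chicken", "Red Bell Pepper"}},
    {"title": "Thai vegetable curry",
     "Cuisine": {"Thai"}, "Course": {"Main Course"},
     "Cooking Method": {"Simmer"},
     "Ingredient": {"Red Bell Pepper"}},
    {"title": "Grilled chicken salad",
     "Cuisine": {"American"}, "Course": {"Salad"},
     "Cooking Method": {"Grill"},
     "Ingredient": {"Chicken"}},
]


def select(items, chosen):
    """Keep items that carry every chosen facet value (conjunction across facets)."""
    return [item for item in items
            if all(value in item[facet] for facet, value in chosen.items())]


def facet_counts(items, facet):
    """Count how many of the given items fall under each value of one facet."""
    return Counter(value for item in items for value in item[facet])


thai = select(RECIPES, {"Cuisine": "Thai"})
print([r["title"] for r in thai])
print(facet_counts(thai, "Ingredient"))
print([r["title"] for r in select(RECIPES, {"Cuisine": "Thai", "Ingredient": "Chicken"})])
```

Because every facet is an independent dimension, the user can start from any one of them and narrow in any order, and the per-value counts show what remains before the next choice is made.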
Nobel Prize Winners Collection
New Site: eBay Express
Is This Visualization?
 Prior experience and other people’s
attempts seem to suggest that fewer
graphics and more text are better.
 Details of layout, font and color contrast,
label selection, and interaction make all
the difference.
Earlier Variation on the Idea
 Cat-a-Cone, 1997
Mobile Variation
 FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06
 Well-received, but visualization part not looked at.
Flexibility in Sensemaking
 DLITE by Cousins et al. ’97
 Sandbox by Wright et al. ’06
TRIST (The Rapid Information Scanning Tool) is the work space for
Information Retrieval and Information Triage.
Flexibility in Sensemaking
TRIST, Jonkers et al. ’05
User Defined and Automatic Categorization
Launch Queries
Comparative Analysis of Answers and Content
Rapid Scanning with Context
Entities
Query History
Dimensions
Annotated Document Browser
Linked Multi-Dimensional Views Speed Scanning
Flexibility for Sensemaking Support
Sandbox, Wright et al. ’06
Quick Emphasis of Items of Importance.
Dynamic Analytical Models.
Direct interaction with Gestures (no dialog, no controls).
Assertions with Proving/Disproving Gates.
Communication-Centric Text
 Email, conversations, blogs
 The first thought is usually nodes and
links
 Doesn’t have the desired flexibility
 Some alternatives:
 The Network
 Multivariate Networks
Re-envisioning Networks
 Viewing people’s shared workplaces,
hometowns, schools over time.
 www.theyrule.net:
Re-envisioning Networks
 First cut:
 Hastings, Snow, and King ’05
Re-envisioning Networks
 Better version:
 Hastings, Snow, and King ’05
Re-envisioning Networks
 Wattenberg ’06
 OLAP on directed labeled graphs
Network Flexibility
Martin Wattenberg, “Visual Exploration of Multivariate Graphs”
[Figure labels: M, F; Location A, Location B, Location C, Location D, Location E]
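One way to read "OLAP on directed labeled graphs" is as a roll-up: nodes that share attribute values are merged into one group, and the edges between groups are counted. The Python sketch below shows that reading under stated assumptions. The people, the links, the attribute names, and the interpretation of M and F as a gender attribute are all hypothetical; this is not Wattenberg's implementation.

```python
from collections import Counter

# Hypothetical people with two categorical attributes.
NODES = {
    "ann": {"gender": "F", "location": "Location A"},
    "bob": {"gender": "M", "location": "Location A"},
    "cat": {"gender": "F", "location": "Location B"},
    "dan": {"gender": "M", "location": "Location B"},
}
# Hypothetical directed links (source, destination) between those people.
EDGES = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "dan"), ("dan", "ann")]


def roll_up(nodes, edges, attrs):
    """Merge nodes that agree on attrs and count the edges between the groups."""
    def key(name):
        return tuple(nodes[name][attr] for attr in attrs)

    group_sizes = Counter(key(name) for name in nodes)
    group_edges = Counter((key(src), key(dst)) for src, dst in edges)
    return group_sizes, group_edges


# Roll up by one attribute, then by two, to get views at different granularity.
sizes, links = roll_up(NODES, EDGES, ["gender"])
print(sizes)
print(links)
fine_sizes, fine_links = roll_up(NODES, EDGES, ["gender", "location"])
print(len(fine_sizes))
```

Changing the `attrs` list changes the view while the underlying graph stays the same, which is the kind of flexible yet reproducible rearrangement the talk argues for.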
Re-envisioning Networks
 Idea: vary these ideas to apply to
email and other communication text.
Summary:
Text Viz Design Guidelines
 An emphasis on flexible views on text data
 Emphasize brushing and linking using appropriate
visual cues.
 Interaction flow should guide the user but also be
flexible.
 Information structure should be consistent and
reproducible.
 Other guidelines:
 Make text visible.
 Visual components should reflect the data and tasks.
Thank you!
www.sims.berkeley.edu/~hearst