Transcript Slide 1
Visualization in Text
Analysis Problems
VAC Consortium Meeting
Stanford, May 24, 2006
Marti Hearst
School of Information, UC Berkeley
Outline
Some Visualization Design Principles
Illustrated with a new example
Why Text is Tricky to Visualize
How to do good visualization design with
text while meeting analysts’ needs?
Focus on Flexibility with Reproducibility
Examples from 4 different domains
What Makes for a Good Visualization?
Visually illuminates important aspects of
the underlying data and domain.
Supports the users’ tasks (better than
without the visualization).
Adheres to good design principles.
Example from Software Engineering
Marat Boshernitsan, UC Berkeley PhD Dissertation 2006
Problem: need to make complex changes
throughout code.
Example: convert from one API to another.
A Typical Solution
Either requires programmers to understand
and manipulate abstract syntax trees …
Or requires learning another programming
language (or both)!
First Attempt
Second Attempt
A Better Solution
Build on how programmers think about
programming.
Operate on the textual representation of code.
Users Operate on Familiar Visual
Representation of Code
Context-and-Domain Sensitive
Visual Cues
Lessons from this Example
User-centered Design
This was the third attempt.
First 2 attempts did not accurately reflect how
users think about the problem.
Careful design of labels and interaction cues
Very intelligent backend, but user-activated.
Visually and interactively reflects how
programmers think about programming.
What Makes for a Good Visualization
for Analysts?
Visually illuminates important aspects of
the underlying data and domain.
Supports the users’ tasks (better than
without the visualization).
Adheres to good design principles.
Goals vs. Tasks
Analysts’ Goals:
Understand current and past situations
Predict and anticipate future situations
Observations by Pirolli & Card ’05:
Different analysts starting with people,
organizations, tasks, and time:
predict coup likelihood
understand bio-warfare threats
understand relations within cartel
Goals vs. Tasks
Analysts’ tasks:
Explore
Extract
Filter
Link
Arrange
Compare
Hypothesize
(A combination of Foraging and Sensemaking)
Should do the tasks only to support the goals.
Design Principles for Analysts
Experienced analysts notice what is
missing or unexpected (Wright et al. ’06)
Thus consistency and reproducibility are
important.
Design Principles for Analysts
Analysts must guard against
confirmation bias. (Pirolli & Card ’05)
Thus it is important for analysts to
Be able to easily arrange and re-arrange,
View information flexibly from many angles,
While at the same time retaining
consistency and reproducibility.
However … it’s hard to do this with text.
Working with Text
Text is especially difficult to visualize
Very high dimensionality
Tens to hundreds of thousands of features
Compositional
Can be combined together in innumerable ways
Abstract
And so difficult to visualize
Not pre-attentive
Must foveate to read
Subtle
Small differences matter
Unordered
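The "very high dimensionality" point above can be made concrete with a minimal bag-of-words sketch. The three sentences and the helper names `vocabulary` and `to_vector` are illustrative assumptions, not part of the talk. Every distinct word becomes one dimension, so even three tiny sentences need six.

```python
import re

def vocabulary(documents):
    """Return the sorted list of distinct lowercase word tokens across all documents."""
    words = set()
    for doc in documents:
        words.update(re.findall(r"[a-z]+", doc.lower()))
    return sorted(words)

def to_vector(doc, vocab):
    """Represent one document as word counts, one dimension per vocabulary word."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [tokens.count(word) for word in vocab]

# Illustrative sentences (assumed for this sketch).
docs = ["The dog cavorts.", "The man walks.", "The man walks the cavorting dog."]
vocab = vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
print(len(vocab), "dimensions:", vocab)
```

Note that "cavorts" and "cavorting" land in separate dimensions, which hints at why small differences matter and why a realistic collection reaches tens of thousands of features.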
Text Meaning is NOT pre-attentive
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
GOVERNS PRECISE EXAMPLE MERCURY SNREVOG ESICERP ELPMAXE YRUCREM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
SUBJECT PUNCHED QUICKLY OXIDIZED TCEJBUS DEHCNUP YLKCIUQ DEZIDIXO
CERTAIN QUICKLY PUNCHED METHODS NIATREC YLKCIUQ DEHCNUP SDOHTEM
SCIENCE ENGLISH RECORDS COLUMNS ECNEICS HSILGNE SDROCER SNMULOC
Why Text is Tough
Abstract concepts are difficult to visualize
Combinations of abstract concepts are
even more difficult to visualize
time
shades of meaning
social and psychological concepts
causal relationships
Why Text is Tough
The dog.
Why Text is Tough
The dog.
The dog cavorts.
The dog cavorted.
Why Text is Tough
The man.
The man walks.
Why Text is Tough
The man walks the cavorting dog.
So far, we can sort of show this in pictures.
Why Text is Tough
As the man walks the cavorting dog, thoughts
arrive unbidden of the previous spring, so unlike
this one, in which walking was marching and
dogs were baleful sentinels outside unjust halls.
How do we visualize this?
Why Text is Tough
Language only hints at meaning
Most meaning of text lies within our minds and
common understanding
“How much is that doggy in the window?”
how much: social system of barter and trade
(not the size of the dog)
“doggy” implies childlike, plaintive, probably
cannot do the purchasing on their own
“in the window” implies behind a store window,
not really inside a window, requires notion of
window shopping
Why Text is Tough
General categories have no standard
ordering (nominal data)
Categorization of documents by single
topics misses important distinctions
Consider an article about
NAFTA
The effects of NAFTA on truck manufacture
The effects of NAFTA on productivity of
truck manufacture in the neighboring cities
of El Paso and Juarez
Why Text is Tough
Other issues about language
Ambiguous (many different meanings for
the same words and phrases)
Same meaning implied by different
combinations
Different combinations imply different
meanings
Why Text is (Deceptively) Easy
Text is easier when you have a lot of it
Web search now usually treats queries as conjunctions (AND)
Text has a lot of redundancy
A very simple algorithm can:
Pull out “important” phrases
Find “meaningfully” related words
Create a “summary” from document
Group “related” documents
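A minimal sketch of how simple such an algorithm can be, here for pulling out "important" phrases. It counts adjacent word pairs that contain no stopword, which works only because text is redundant. The stopword list, the sample documents, and the function names are assumptions made for illustration, not a method from the talk.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequent_phrases(documents, top_n=3):
    """Count adjacent word pairs that contain no stopword; return the most common."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[(first, second)] += 1
    return [" ".join(pair) for pair, _ in counts.most_common(top_n)]

# Hypothetical mini-collection.
docs = [
    "The effects of NAFTA on truck manufacture",
    "Truck manufacture in El Paso and Juarez",
    "Productivity of truck manufacture rose",
]
top_phrases = frequent_phrases(docs, top_n=1)
print(top_phrases)
```

The scare quotes on the slide apply here too: the pair that repeats most is surfaced, whether or not an analyst would call it important.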
Simple Text Analysis can Mislead
Most frequent words
Biases towards concepts with unique identifiers.
From Spink, Wolfram, Jansen, Saracevic, JASIS ‘01
Major Trends vs. Minor Discoveries
With text, it’s easy to extract and show the
largest, main trends
But often we want the rare but unexpected
and important event:
Russian oil company example
Schwarzenegger and Enron
Cigarettes and kids
Person on the periphery who is working stealthily to
influence things
This is really difficult to solve!
Design Principles for Analysts
Experienced analysts notice what is missing or
unexpected.
Analysts must guard against confirmation bias.
Need to be able to easily arrange and re-arrange,
View information flexibly from many angles,
While at the same time retaining consistency and
reproducibility.
Interfaces should reflect the domain and data.
How to achieve this with text collections?
Must transform text in understandable ways
Must provide multiple, consistent views that
nevertheless allow for new discovery and insight
Why Emphasize Flexibility?
Can’t view representations of all the
text content at once.
Instead, need ways to flexibly
navigate, group, organize, explore
See important pieces over time.
The Importance of Flexibility
Russell, Slaney, Qu, Houston ’05
The ease of viewing and manipulation in the system
strongly influenced the kind of analysis operations done.
Examples of Flexibility on Text Data
PaperLens (Conference proceedings)
TAKMI (Customer service requests)
Faceted Browsing (e-commerce)
Flamenco
Ebay Express
FaThumb
TRIST and Sandbox (Analysts)
Flexible views
InfoVis 2004 contest
Visualize 8 years of conference proceedings
Tasks:
1. Static Overview of 10 years of InfoVis
2. Characterize the research areas and their evolution
3. The people in InfoVis
4. Which papers/authors are most often referenced?
5. How many papers conducted a user study?
PaperLens integrated solution by Lee, Czerwinski,
Robertson, Bederson
Uses graphical elements and brushing and linking
to flexibly elucidate a collection’s contents.
http://www.cs.umd.edu/hcil/InfovisRepository/contest-2004/index.shtml
Flexibility in Foraging and Analysis
TAKMI, by Nasukawa and Nagano, ‘01
The system integrates:
Analysis tasks (customer service help)
Content analysis
Information Visualization
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
Documents containing “windows 98”
Flexibility in Analysis
TAKMI, by Nasukawa and Nagano, 2001
Patent documents containing “inkjet”,
organized by entity and year
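The kind of view described here, documents matching a term and then organized by entity and year, can be sketched as a simple cross-tabulation. This is not TAKMI's implementation. The records, company names, and the `crosstab` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (year, entity mentioned, document text).
documents = [
    (1998, "Company A", "inkjet nozzle design"),
    (1998, "Company B", "laser toner cartridge"),
    (1999, "Company A", "inkjet ink formulation"),
    (1999, "Company B", "inkjet print head"),
    (1999, "Company A", "inkjet paper feed"),
]

def crosstab(records, keyword):
    """Count documents containing `keyword`, grouped by entity and then by year."""
    table = defaultdict(lambda: defaultdict(int))
    for year, entity, text in records:
        if keyword in text.lower():
            table[entity][year] += 1
    return {entity: dict(years) for entity, years in table.items()}

inkjet_counts = crosstab(documents, "inkjet")
for entity, years in sorted(inkjet_counts.items()):
    print(entity, years)
```

Swapping the keyword or the grouping keys gives a different cut of the same collection, which is the flexibility the slide is pointing at.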
Flexibility in Category Navigation
Browsing Information Collections using
(Hierarchical) Faceted Metadata
What are facets?
Sets of categories, each of which describe
a different aspect of the objects in the
collection.
Each of these can be hierarchical.
(Not necessarily mutually exclusive nor
exhaustive, but often that is a goal.)
GeoRegion + Time/Date + Topic
Facet example: Recipes
Cooking Method
Ingredient
Stir-fry
Chicken
Red Bell Pepper
Course
Main Course
Curry
Cuisine
Thai
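A minimal sketch of faceted metadata as a data structure, assuming each item stores a set of values per facet. The recipes, the extra facet values, and the helpers `select` and `facet_counts` are hypothetical. They are not taken from Flamenco or any other system named in this talk.

```python
# Each item carries a set of values for every facet.
# The recipes below are hypothetical illustration data.
recipes = [
    {"name": "Thai chicken stir-fry",
     "Cuisine": {"Thai"}, "Course": {"Main Course"},
     "Cooking Method": {"Stir-fry"},
     "Ingredient": {"Chicken", "Red Bell Pepper"}},
    {"name": "Roast chicken",
     "Cuisine": {"French"}, "Course": {"Main Course"},
     "Cooking Method": {"Roast"},
     "Ingredient": {"Chicken"}},
    {"name": "Mango sticky rice",
     "Cuisine": {"Thai"}, "Course": {"Dessert"},
     "Cooking Method": {"Steam"},
     "Ingredient": {"Mango", "Rice"}},
]

def select(items, constraints):
    """Keep items whose facet values include every selected value."""
    return [item for item in items
            if all(value in item[facet] for facet, value in constraints.items())]

def facet_counts(items, facet):
    """Count how many of the given items fall under each value of a facet."""
    counts = {}
    for item in items:
        for value in item[facet]:
            counts[value] = counts.get(value, 0) + 1
    return counts

thai = select(recipes, {"Cuisine": "Thai"})
thai_names = [r["name"] for r in thai]
thai_courses = facet_counts(thai, "Course")
thai_chicken = select(recipes, {"Cuisine": "Thai", "Ingredient": "Chicken"})
print(thai_names, thai_courses)
```

Because the facets are independent, constraints can be added or dropped in any order, and the per-value counts can be recomputed after each step to show what remains.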
Nobel Prize Winners Collection
New Site: eBay Express
Is This Visualization?
Prior experience and other people’s
attempts seem to suggest that fewer
graphics and more text is better.
Details of layout, font and color contrast,
label selection, and interaction make all
the difference.
Earlier Variation on the Idea
Cat-a-Cone, 1997
Mobile Variation
FaThumb: Karlson, Robertson, Robbins, Czerwinski, Smith ’06
Well-received, but visualization part not looked at.
Flexibility in Sensemaking
DLITE by Cousins et al. ‘97
Sandbox by Wright et al. ‘06
TRIST (The Rapid Information Scanning Tool) is the work space for
Information Retrieval and Information Triage.
Flexibility in Sensemaking
TRIST, Jonker et al. ’05
User Defined and Automatic Categorization
Launch Queries
Comparative Analysis of Answers and Content
Rapid Scanning with Context
Entities
Query History
Dimensions
Annotated Document Browser
Linked Multi-Dimensional Views Speed Scanning
Flexibility for Sensemaking Support
Sandbox, Wright et al ‘06
Quick Emphasis of Items of Importance.
Dynamic Analytical Models.
Direct interaction with Gestures (no dialog, no controls).
Assertions with Proving/Disproving Gates.
Communication-Centric Text
Email, conversations, blogs
The first thought is usually nodes and
links
Doesn’t have the desired flexibility
Some alternatives:
The Network
Multivariate Networks
Re-envisioning Networks
Viewing people’s shared workplaces,
hometowns, schools over time.
www.theyrule.net:
Re-envisioning Networks
First cut:
Hastings, Snow, and King ’05
Re-envisioning Networks
Better version:
Hastings, Snow, and King ’05
Re-envisioning Networks
Wattenberg ’06
OLAP on directed labeled graphs
Network Flexibility
Martin Wattenberg, “Visual Exploration of Multivariate Graphs”
[Figure: graph with nodes labeled M and F, and Location A through Location E]
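One way to read "OLAP on directed labeled graphs" is as a roll-up: nodes are aggregated by a categorical attribute and the edges between the resulting groups are counted. The sketch below is only an illustration under that reading, not Wattenberg's implementation. The people, links, and the `roll_up` helper are hypothetical.

```python
from collections import Counter

# Hypothetical people with two categorical attributes (gender, location),
# and directed links between them.
people = {
    "ann": ("F", "Location A"),
    "bob": ("M", "Location A"),
    "cat": ("F", "Location B"),
    "dan": ("M", "Location B"),
}
links = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "dan"), ("ann", "dan")]

ATTRIBUTE_INDEX = {"gender": 0, "location": 1}

def roll_up(nodes, edges, attribute):
    """Aggregate nodes by one attribute and count edges between the groups."""
    idx = ATTRIBUTE_INDEX[attribute]
    node_counts = Counter(attrs[idx] for attrs in nodes.values())
    edge_counts = Counter((nodes[src][idx], nodes[dst][idx]) for src, dst in edges)
    return dict(node_counts), dict(edge_counts)

gender_nodes, gender_edges = roll_up(people, links, "gender")
location_nodes, location_edges = roll_up(people, links, "location")
print(gender_nodes, gender_edges)
print(location_nodes, location_edges)
```

The same graph yields a different small summary for each attribute, so switching the roll-up dimension is a cheap way to view a network from another angle.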
Re-envisioning Networks
Idea: vary these ideas to apply to
email and other communication text.
Summary:
Text Viz Design Guidelines
An emphasis on flexible views on text data
Emphasize brushing and linking using appropriate
visual cues.
Interaction flow should guide the user but also be
flexible.
Information structure should be consistent and
reproducible.
Other guidelines:
Make text visible.
Visual components should reflect the data and tasks.
Thank you!
www.sims.berkeley.edu/~hearst