Visual Thinking
&
Thinking about Visualization
William Ribarsky
Charlotte Visualization Center
SouthEast RVAC
Visual Thinking
Visual Reasoning
Foraging, Analysis, Reasoning, and Decision Making for Large Data and Complex Problems
• Objective – Develop capabilities
for collecting evidence from
large and multiple data sources,
with multiple analysis tools.
Build hypotheses and use them to
steer data collection. Methods
must be automated but subject
to user control. Integrate all for
presentation or decision.
• DHS Mission Impact – New
means of support for intelligence
analysts, disaster prevention
planners, and emergency
responders.
Sensemaking Loop
STAB
RESIN
Foraging/Analysis
Loop
RESIN: Foraging, Analysis, and Reasoning
STAB
Problem – How to reason
towards a decision with
massive data, several visual
analytics tools, and limited
time and other resources
Solution –
1. Build an end-to-end process
for reasoning towards a
decision with limited time
and resources
2. Provide a mixed initiative
capability so that computer
and user can work together,
but always under user
control.
3. Provide a capability for
reasoning about complex
problems with several
aspects.
RESIN
A runtime view of RESIN’s
control panel while solving a
task with a tight deadline. The
image browser tool is chosen to
analyze the data. On the bottom
left is a hierarchical description
of the problem solving process
which also captures the real-time execution information of
various sub-tasks. On the top
right is a partial view of the
Markov Decision Process used
to compute the decision policy
for the task instance.
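The slide does not show RESIN's actual Markov Decision Process, so the following is only a minimal sketch of the general idea: computing a tool-selection policy under a deadline by backward induction over the remaining time. The tool names, probabilities, time costs, and rewards are all made-up assumptions for illustration.

```python
from functools import lru_cache

# Hypothetical tools (not RESIN's real model): each outcome is
# (probability, time_cost, reward).
TOOLS = {
    "image_browser": [(0.8, 2, 6.0), (0.2, 4, 6.0)],
    "text_search": [(1.0, 1, 2.0)],
}

@lru_cache(maxsize=None)
def value(t):
    """Best expected total reward with t time units left (stopping earns 0)."""
    return max([0.0] + [q_value(t, a) for a in TOOLS])

def q_value(t, action):
    """Expected reward of running `action` now, then acting optimally."""
    total = 0.0
    for p, cost, reward in TOOLS[action]:
        if cost <= t:  # this outcome finishes before the deadline
            total += p * (reward + value(t - cost))
        # otherwise the deadline is missed and the outcome earns nothing
    return total

def policy(t):
    """Tool to run with t time units left, or None if nothing is worthwhile."""
    best = max(TOOLS, key=lambda a: q_value(t, a))
    return best if q_value(t, best) > 0 else None
```

With these invented numbers the policy changes with the deadline: with two time units left the slower but richer tool is preferred, while with only one unit left only the quick tool can finish.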
Multimedia: Automated Video Content Analysis
Multimedia: Automated Video Semantic Analysis
• News Interestingness Prediction
[Diagram labels: News Story Collection; Set of news stories; User Preference; Usage History; Predictor; Interestingness. Quantities shown: P(S_j, G), P(S_j, L), a weight w_L, and a log term.]
Multimedia: Video Semantic Analysis
• News Theme Network Visualization
Multimedia: News Broadcast Analysis
Problem – How should we
handle the stream of
thousands of stories and
themes from many
sources over time?
“Ultimately, you gotta read (view)
the stories” –John Stasko
Solution –
1. Develop LensRiver and
EventRiver capabilities.
2. Develop highly interactive
ways to explore themes
and sub-themes, their
interlinkages, and stories
over time.
LensRiver hierarchical display
Multimedia: News Broadcast Analysis
Deep Exploration &
Reasoning Capabilities
• Hierarchical exploration
• Filter by theme (also,
broaden/narrow)
• Shoebox
• Search by Example
EventRiver
Multimedia: News Broadcast Analysis
Emerging events
Comparing themes and
sub-themes for
different channels
[Figure labels: Karr; plot; JonBenét Ramsey; Emerging Themes]
CNN (above) and Fox top 30
themes from 8/1 to 8/24/2006
With These Tools, There Is Much That Can Be Done
(Some of Which Is Underway Already)
• All the news. High value content from official and “semi-official” news sources at all levels.
• Identification and tracking of events and themes.
• High quality knowledge structures over time.
• Analysis of different viewpoints, different opinions based
on origin of story, what is being talked about, who is
talking, etc.
– Local vs. national
– Different broadcast styles (e.g., Fox vs. CNN vs. Al Jazeera)
• News at one level (e.g., local or for a foreign region) that
is not being reported nationally or in other regions.
Integrating Terrorism Data Analysis
and News Analysis
• News analysis is the foundation of systematic
databases such as the Global Terrorism Database
and the Minorities at Risk Database.
• News is a source for most investigative analyses
(e.g., fraud and money laundering analysis).
• Compiling the systematic databases is very labor
intensive requiring experienced (i.e., expensive)
investigators.
• Other investigations are also laborious
Integrating Terrorism Data Analysis
and News Analysis
• News stories are as much viewpoint and opinion as
news.
• Can thus get different angles from local, regional, or
different national news sources.
• Automated news analysis provides a complete record
of everything that’s going on over any period of time.
• News stories have strong relationships.
• News can follow the flow and change of a story over
time.
• News is immediate, but it is also rough and
incomplete
Integrating Terrorism Data Analysis
and News Analysis
[Diagram components: Terrorism Databases; Terrorism Visual Analysis (Terrorism VA); News Story Databases; Broadcast News Visual Analysis (News VA); Framing, Affective Analysis; Jigsaw; NVAC; STAB/RESIN Reasoning Environment]
Next: full, Web-based multimedia content and
the Dark Web
Visual Analysis of Terrorism Data:
Supporting The Investigative Process
Where
When
Who
What
Example: selections on the GTD
spatio-temporal interface that support
investigative analysis. User would be
able to follow over time and space.
The user-driven investigation
addresses the issues of why.
Five Flexible
Entry
Components
Terrorism Data Analysis
• Combine continuous and
categorical data
• Curved ribbons for better
readability of the data
• Layering of ribbons
• First results:
- Number of terrorists
killed depends strongly
on type of entity
attacked:
- large number killed
when attacking
police/military
- few terrorists killed in
most other cases, like
businesses,
transportation, etc.
Terrorism Data Analysis
Number of female terrorists
depends on the region:
-Female terrorists in Latin
America and Europe
-Hardly any female
terrorists in Asia, the
Middle East, and
throughout Africa
Future plans for curved/forked
ribbons:
•Full interaction with these
ribbons: reordering,
highlighting
•Histograms on numerical
axes
•Filtering by categorical or
numerical axes (including
time)
Applying Visual Analytics to
Financial Transactions
Relevant Properties of Visual Analytics
•Positioned for exploration and discovery.
-Highly interactive, contextual views, “unstructured”
exploration
•Meant for large and/or complex data, with uncertainty,
with missing data (but we may not know where the holes
are), with data that are constructed to be purposely
misleading.
•Support of analytical reasoning, argument building,
evidence gathering and marshaling.
•Support of argument presentation and reporting (smart
reporting).
Application: Financial Fraud Analysis
All transaction activity
Identify
Interactive
Visualization
Google
Prioritize
Investigate
Report
WireVis:
Challenges to Financial Fraud Detection
• Bad guys are smart
– Automatic detection (black box) approach is reactive
to already known patterns
– Usually, bad guys are one step ahead
• Evaluation is difficult
– Difficult to obtain “Ground Truth”
– Financial Institutions do not perform law enforcement
• Suspicious reports are filed
• Turnaround time on the accuracy of reports can be long
• What is the percentage of fraudulent activities that are
actually found and reported?
WireVis:
Challenges with Wire Fraud Detection
• Size
– More than 200,000 transactions per day
• “No transaction by itself is suspicious”
– “It’s like searching for a needle in a stack of
needles” –Bill Fox
• Lack of International Wire Standard
– Loosely structured data with inherent ambiguity
London
Charlotte, NC
Singapore
Indonesia
WireVis:
Challenges with Wire Fraud Detection
London
Charlotte, NC
Singapore
Indonesia
• No Standard Form…
– When a wire leaves Bank of America in Charlotte…
– The recipient can appear to be receiving in London,
Indonesia, or Singapore
• Vice versa, when a wire arrives in Charlotte from Indonesia
– The sender can appear to be originating from London,
Singapore, or Indonesia
WireVis: Using Keywords
• Keywords…
– Words that are used to filter all transactions
• Only transactions containing keywords are flagged
– Highly secretive
– Typically include
• Geographical information (country, city names)
• Business types
• Specific goods and services
• Etc.
– Updated based on intelligence reports
– The list ranges from 200 to 350 words
– Could reduce the number of transactions by up to
90%
– Most importantly, keywords give quantifiable meanings
(labels) to each transaction, and are repositories
of expert knowledge.
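A minimal sketch of keyword flagging as described above, assuming transactions are free-text strings. The real keyword list is secret, so the three keywords and the sample wires here are invented, and the function name `flag` is hypothetical.

```python
import re

# Made-up keywords; the actual list is confidential and holds 200-350 words.
KEYWORDS = {"singapore", "textiles", "consulting"}

def flag(transactions, keywords=KEYWORDS):
    """Return (transaction, matched keyword labels) for flagged transactions only."""
    flagged = []
    for tx in transactions:
        tokens = set(re.findall(r"[a-z]+", tx.lower()))
        hits = tokens & keywords
        if hits:  # transactions with no keyword are dropped
            flagged.append((tx, sorted(hits)))
    return flagged

wires = [
    "Payment for textiles shipped to Singapore",
    "Monthly rent",
    "Consulting fee",
]
flagged = flag(wires)
```

Each surviving transaction carries its matched keywords as labels, which is what later views can count and compare.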
WireVis:
Current Practice at Bank of America
• Database Querying
– Experts filter the transactions by keywords, amounts,
date, etc.
– Results are displayed in a spreadsheet.
• Problems
– Cannot see more than a week or two of transactions
• Difficult to see temporal patterns
– It is difficult to be exploratory using a querying system
WireVis:
System Overview
Heatmap View
(Accounts to Keywords
Relationship)
Search by Example
(Find Similar
Accounts)
Keyword Network
(Keyword
Relationships)
Strings and Beads
(Relationships over Time)
WireVis:
Heatmap View
• List of keywords, sorted by frequency from high to low (left to right)
• Hierarchical clusters of accounts, sorted by activity from big companies to individuals (top to bottom)
• Fast “binning” that takes O(3n)
• Number of occurrences of keywords: light color indicates few occurrences
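The slide does not describe the binning algorithm itself. As a rough sketch of the data behind such a heatmap, the counts can be gathered in one pass over the keyword hits. This simplification uses individual accounts as rows rather than hierarchical clusters, and the function name and sample data are assumptions.

```python
from collections import Counter, defaultdict

def heatmap_counts(records):
    """records: iterable of (account, keyword) pairs, one per keyword hit."""
    cell = defaultdict(Counter)  # account -> keyword -> count
    totals = Counter()           # keyword -> overall count
    for account, keyword in records:
        cell[account][keyword] += 1
        totals[keyword] += 1
    # Columns ordered by overall frequency, high to low (left to right).
    columns = [k for k, _ in totals.most_common()]
    rows = sorted(cell)
    matrix = [[cell[a][k] for k in columns] for a in rows]
    return rows, columns, matrix

hits = [("A", "gold"), ("A", "gold"), ("B", "gold"), ("B", "oil")]
rows, columns, matrix = heatmap_counts(hits)
```

Each matrix entry would then be mapped to a color, with small counts drawn in light colors.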
WireVis:
Strings and Beads
• Each string corresponds to a cluster of accounts
in the Heatmap view
• Each bead represents a day
• Y-axis can be amounts, number of transactions, etc.
• Fixed or logarithmic scale
• X-axis: time
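One way to picture the data behind this view: a bead is a per-cluster, per-day aggregate, and a string is that cluster's beads in date order. The sketch below assumes positive amounts and ISO date strings, and the function name `beads` is hypothetical.

```python
import math
from collections import defaultdict

def beads(transactions, log_scale=False):
    """transactions: (cluster, day, amount) tuples with positive amounts.

    Returns cluster -> list of (day, y) beads sorted by day.
    """
    sums = defaultdict(float)
    for cluster, day, amount in transactions:
        sums[(cluster, day)] += amount
    strings = defaultdict(list)
    for (cluster, day), total in sorted(sums.items()):
        y = math.log10(total) if log_scale else total
        strings[cluster].append((day, y))
    return dict(strings)

txs = [
    ("c1", "2007-01-02", 100.0),
    ("c1", "2007-01-01", 900.0),
    ("c1", "2007-01-01", 100.0),
]
linear = beads(txs)
logged = beads(txs, log_scale=True)
```

Swapping the summed amount for a transaction count would give the other y-axis option mentioned on the slide.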
WireVis:
Keyword Network
• Each dot is a keyword
• The position of each keyword is
based on its relationships
– Keywords close to each
other appear together
more frequently
– Using a spring network,
keywords in the center are
the most frequently
occurring keywords
• Links between keywords
denote co-occurrence
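A small sketch of the co-occurrence counts that could feed such a network, assuming each transaction has already been reduced to its set of keywords. The spring layout itself is not shown, and the function name and sample data are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(keyword_sets):
    """Count keyword frequencies and how often each keyword pair appears together."""
    freq = Counter()
    edges = Counter()
    for kws in keyword_sets:
        unique = sorted(set(kws))  # sorted so each pair has one canonical order
        freq.update(unique)
        edges.update(combinations(unique, 2))
    return freq, edges

sets = [["gold", "oil"], ["gold", "oil", "toys"], ["gold"]]
freq, edges = cooccurrence(sets)
```

In a force-directed layout, the edge counts would typically act as spring strengths, pulling frequently co-occurring keywords together.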
WireVis:
Search by Example
• Target account
• Histogram depicts the occurrences of keywords
• User interactively selects the features within the
histogram that are used in the comparison
• Accounts that are within the similarity threshold
appear ranked (most similar on top)
• Similarity threshold slider
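The slide does not name the similarity measure. The sketch below assumes cosine similarity over the user-selected keyword counts, purely as an illustration. Account names and counts are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search_by_example(target, accounts, selected, threshold):
    """Rank accounts whose similarity to the target, on the selected keywords, meets the threshold."""
    def pick(hist):
        return [hist.get(k, 0) for k in selected]
    scored = [(cosine(pick(target), pick(h)), name) for name, h in accounts.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

target = {"gold": 4, "oil": 2}
accounts = {
    "acct1": {"gold": 2, "oil": 1},
    "acct2": {"oil": 5},
    "acct3": {"gold": 3, "oil": 3},
}
matches = search_by_example(target, accounts, selected=["gold", "oil"], threshold=0.9)
```

Moving the threshold slider corresponds to changing `threshold`: lowering it to 0.4 would also admit `acct2`.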
WireVis:
Case Study
• Evaluation performed with James Price, lead analyst of
WireWatch of Bank of America
• Dataset has been sanitized and downsampled
• Video
• This system is generalizable to visual analysis of
transactional data
WireVis:
Integrated with Full Transaction Database
• Scalability
– We’re now connected to the database at Bank of America with
10-20 million records over the course of a rolling year (13
months)
– Connecting to a database makes interactive visualization tricky
• Unexpected Results (Access through the VA interface!)
– “go to where the data is” – operations relating to the data are
pushed onto the database (e.g., clustering).
Database
SQL
JDBC
Stored
Procedure
Temp Tables
WireVis Client
Raw Data
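A toy illustration of the “go to where the data is” idea: the aggregation runs inside the database and lands in a temp table, so only a small summary reaches the client. This uses Python's built-in `sqlite3` with an in-memory database as a stand-in. The actual system used JDBC and stored procedures, and the table and column names here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wires (account TEXT, keyword TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO wires VALUES (?, ?, ?)",
    [("A", "gold", 100.0), ("A", "gold", 50.0), ("B", "oil", 75.0)],
)

# Aggregate inside the database and keep the result in a temp table,
# so the raw rows never have to be shipped to the visualization client.
conn.execute(
    "CREATE TEMP TABLE summary AS "
    "SELECT account, keyword, COUNT(*) AS n, SUM(amount) AS total "
    "FROM wires GROUP BY account, keyword"
)
summary = conn.execute(
    "SELECT account, keyword, n, total FROM summary ORDER BY account, keyword"
).fetchall()
```

The client then renders `summary` directly. Drill-down would issue further queries rather than holding raw data in memory.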
WireVis:
Integrated with Full Transaction Database
• Performance Measurements
– Data-driven operations such as re-clustering,
drilldown, and transaction search by keywords take
1-2 minutes in the worst case.
– All other interactions remain real time
• No pre-computation / caching
• Single CPU desktop computer
• WireVis is in deployment on James Price’s
computer at WireWatch for testing and
evaluation
This is a general approach applicable to all types
of data.
WireVis:
Future Work
• Use text analysis (like IN-SPIRE) to automatically
identify keywords and associated important terms.
• Relationships between Accounts
– Seeing who sends money to whom (over time) is
important
• Evaluation
– Working with analysts, try to understand how they use
the system and how to improve their workflow
• Tracking and Reporting
– With tracking, we can make the analysis results
“repeatable”, “sharable”, and “accountable”
Financial Visual Analytics Workshop
•Met in Charlotte on December 3-4, 2007
•Participants from federal agencies (DHS, CIA, FinCEN,
Treasury, DEA), NVAC, Banks, National Insurance Crime
Bureau, and including several key university researchers.
•Report and recommendations coming out and to be
disseminated within the month.
Visual Reasoning (Knowledge Visualization)
+
Interaction Theory
Can we identify (conjecture) some (design) principles
even without a full theory?
Just thinking about visualization tasks in this way can
pay off.
Some Ideas That Could Lead to Principles
•“The interaction is the analysis.” --Remco Chang
•Keep interaction simple and direct.
•For more complex problems, have multiple views
(more pixels).
-Each one optimized for its purpose & integrated
with the others.
-Balanced interaction among views.
-There is a trade-off. How many views?
•Each interactive visualization should have the highest
value for that moment in the reasoning process.
Knowledge Visualization
More Ideas
•Determine the highest value (how?)
-Task-dependent
•But are there valuable visual artifacts that are
general, or that would be useful for a whole set of
tasks? Or are there general tasks?
-General Task: Exploration and Discovery
•Alternatively, are there ways to set up high value
visualizations where the artifacts that populate them
are task-dependent but the way to set them up is
general (e.g., spatio-temporal layouts)?
•Can we build models, even rather crude heuristic
models, with predictive capability?
Determining the Value: Knowledge Visualization
Data
Visualization
Information
Visualization
Knowledge
Visualization
Properties of Knowledge
•Knowledge is of higher value than information or
data.
•Knowledge begets knowledge.
•Knowledge is compact.
•Knowledge is connected (more connections, more
value).
•Labeling is important (also, captions, titles, text
annotations).
•Knowledge artifacts are the elements of reasoning.
•Knowledge can be made independent of user and
context (including domain).
What is Knowledge?
Knowledge is the “perception of agreement or disagreement of
two ideas.” -- John Locke (1689)
Ideas: The content of cognition; specific thoughts.
•To distinguish between ideas, one needs an
inferential framework.
•The basic element in such a framework is two
concepts (or ideas) and a connecting inference.
Thus knowledge is built of ideas and their
inferential relations.
•In an ontology, the basic element is two
objects or concepts and their linking
(inferential) relation.
[Figure labels: United States, Montana, Washington]
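The basic element described here, two concepts plus a linking relation, maps naturally onto a triple. The sketch below is only illustrative: the `part_of` relation is an assumed reading of the United States / Montana / Washington figure labels, and the `Relation` and `share_link` names are hypothetical.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    subject: str    # first concept
    inference: str  # linking (inferential) relation
    obj: str        # second concept

# Assumed example facts; the relation name is not given on the slide.
facts = {
    Relation("Montana", "part_of", "United States"),
    Relation("Washington", "part_of", "United States"),
}

def share_link(a, b, facts):
    """True if concepts a and b are tied by the same relation to a common concept."""
    def targets(x):
        return {(f.inference, f.obj) for f in facts if f.subject == x}
    return bool(targets(a) & targets(b))

linked = share_link("Montana", "Washington", facts)
```

More triples mean more connections to traverse, in line with the earlier point that connected knowledge carries more value.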
The Value of Visualization
Visualization Model
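[Figure: van Wijk's visualization model, spanning three regions: data, visualization, user. Symbols:]
D: data
S: specifications
V: visualization
Im: resultant image
P: knowledge process
K: knowledge
E: interactive exploration
dK/dt, dS/dt: rates of change of knowledge and specifications
-van Wijk, 2005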
The Value of Visualization
Knowledge
$Im(t) = V(D, S, t)$
$\frac{dK}{dt} = P(Im, K)$
$K(t) = K_0 + \int_0^t P(Im, K, t)\,dt$
[Plot: value of Knowledge and Data over time]
P is a functional and is a path integral!
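To make the accumulation of knowledge concrete, here is a rough numerical sketch that integrates dK/dt = P(Im, K, t) with forward Euler steps. The particular knowledge process `P` (gain proportional to image content and to what is still unknown) and the constant image value are invented for illustration and are not part of van Wijk's model.

```python
def integrate_knowledge(P, image, k0, t_end, dt=0.01):
    """Forward-Euler estimate of K(t_end) = K0 + integral of P(Im, K, t) dt."""
    k, t = k0, 0.0
    for _ in range(round(t_end / dt)):
        k += P(image(t), k, t) * dt
        t += dt
    return k

# Hypothetical knowledge process: diminishing returns as K approaches 1.
def P(im, k, t):
    return im * (1.0 - k)

k_final = integrate_knowledge(P, image=lambda t: 0.5, k0=0.0, t_end=2.0)
```

Because the increment at each step depends on the knowledge already gained, a different sequence of images generally yields a different final K. That is the sense in which the result depends on the whole path.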
Cost/Benefit Analysis
Return on Investment
$G = n m W(\Delta K)$
Profit
$F = G - C = nm(W(\Delta K) - C_s - kC_e) - C_i - nC_u$
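A small worked example of the profit formula. The symbol meanings are assumed to follow van Wijk (2005): n users, m sessions per user, k exploration steps per session, C_i initial development cost, C_u cost per user, C_s cost per session, C_e cost per exploration step. All numbers are made up.

```python
def profit(n, m, k, w_dk, c_i, c_u, c_s, c_e):
    """F = G - C, with G = n*m*W(dK) and C = C_i + n*C_u + n*m*C_s + n*m*k*C_e."""
    gain = n * m * w_dk
    cost = c_i + n * c_u + n * m * c_s + n * m * k * c_e
    return gain - cost

# 100 users, 10 sessions each, 5 exploration steps per session (invented values).
f = profit(n=100, m=10, k=5, w_dk=3.0, c_i=1000.0, c_u=5.0, c_s=0.5, c_e=0.1)
```

Here the gain is 3000 and the four cost terms are 1000, 500, 500, and 500, so the profit comes to 500. Halving the number of sessions per user would turn it negative, because the fixed costs no longer amortize.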
What is the Role of Interaction?
•The principal role of interaction in knowledge
visualization is to involve the user intimately in
exploration, discovery, and knowledge creation.
•The best interactive interface should have an air of
inevitability, successfully answering the question
“what next?”
$K(t) = K_0 + \int_0^t P(Im, K, t)\,dt$
Interaction selects the path that maximizes the above.
Knowledge Visualization: Bioinformatics
Questions?
www.srvac.uncc.edu
www.viscenter.uncc.edu