Thinking Interactively with Visualizations

1/25
Intro
VA
Apps
Dist Func
ATG
Wrap-up
Visual Analytics Research at Tufts
Remco Chang
Assistant Professor
Tufts University
2/25
Problem Statement
• The growth of data is exceeding our ability to analyze it.
• The amount of digital information generated in the years 2002, 2006, and 2010:
– 2002: 22 EB (exabytes, 10^18 bytes)
– 2006: 161 EB
– 2010: 988 EB (almost 1 ZB)
1: Data courtesy of Dr. Joseph Kielman, DHS
2: Image courtesy of Dr. Maria Zemankova, NSF
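The three figures above imply a steep compound growth rate. A minimal sketch of that arithmetic, using only the numbers quoted on the slide (the function name `annual_growth` is made up for illustration):

```python
# Figures quoted on the slide, in exabytes (1 EB = 10**18 bytes).
data_eb = {2002: 22, 2006: 161, 2010: 988}

def annual_growth(start_year, end_year, volumes=data_eb):
    """Compound annual growth factor between two of the quoted years."""
    years = end_year - start_year
    return (volumes[end_year] / volumes[start_year]) ** (1 / years)

overall = annual_growth(2002, 2010)
print(f"2002-2010: x{data_eb[2010] / data_eb[2002]:.1f} overall, "
      f"about {100 * (overall - 1):.0f}% growth per year")
```

Under these numbers the yearly volume grows by roughly 60% per year, which is the gap the rest of the talk argues that analysis capacity cannot close on its own.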
3/25
Problem Statement
• The data is often complex, ambiguous, and noisy; analyzing it requires human understanding.
– About 2 GB of digital information is produced per person per year
– 95% of the Digital Universe’s information is unstructured
1: Data courtesy of Dr. Joseph Kielman, DHS
2: Image courtesy of Dr. Maria Zemankova, NSF
4/25
Example: What Does Fraud Look Like?
• Financial institutions like Bank of America have a legal responsibility to report all suspicious activities
• Data size: approximately 200,000 transactions per day (73 million transactions per year)
• Problems:
– Automated approaches can only detect known patterns
– Bad guys are smart: patterns are constantly changing
– No single transaction appears fraudulent
– Few experts: fraud detection is considered an “art”
– Data is messy: the lack of international standards results in ambiguous data
• Current methods:
– 10 analysts monitoring and analyzing all transactions
– Using SQL queries and spreadsheet-like interfaces
– Limited to a 2-week time scale
5/25
WireVis: Financial Fraud Analysis
• In collaboration with Bank of America
– Looks for suspicious wire transactions
– Currently beta-deployed at WireWatch
– Visualizes 7 million transactions over 1 year
• Uses interaction to coordinate four perspectives:
– Keywords to Accounts
– Keywords to Keywords
– Keywords/Accounts over Time
– Account similarities (search by example)
6/25
WireVis: Financial Fraud Analysis
Heatmap View (Accounts to Keywords Relationship)
Search by Example (Find Similar Accounts)
Keyword Network (Keyword Relationships)
Strings and Beads (Relationships over Time)
R. Chang et al., Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information Visualization, 2008.
R. Chang et al., WireVis: Visualization of categorical, time-varying data from financial transactions. IEEE VAST, 2007.
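To make the heatmap and search-by-example views concrete, here is a small sketch. It assumes each transaction has already been reduced to an (account, keyword) pair and uses cosine similarity to rank accounts; the toy data, the function names, and the choice of similarity measure are all illustrative assumptions, not a description of how WireVis is implemented.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: (account, keyword) pairs extracted from wire transactions.
transactions = [
    ("acct_A", "gold"), ("acct_A", "gold"), ("acct_A", "shipping"),
    ("acct_B", "gold"), ("acct_B", "shipping"),
    ("acct_C", "tuition"), ("acct_C", "tuition"), ("acct_C", "rent"),
]

def keyword_profiles(rows):
    """Account -> keyword counts, i.e. one row of an accounts-by-keywords heatmap."""
    profiles = {}
    for account, keyword in rows:
        profiles.setdefault(account, Counter())[keyword] += 1
    return profiles

def cosine(a, b):
    """Cosine similarity between two keyword-count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(account, profiles):
    """Rank the other accounts by similarity to the example account."""
    target = profiles[account]
    others = [(cosine(target, p), name) for name, p in profiles.items() if name != account]
    return [name for score, name in sorted(others, reverse=True)]

profiles = keyword_profiles(transactions)
print(most_similar("acct_A", profiles))
```

In this toy data `acct_B` shares both keywords with `acct_A` and ranks first, while `acct_C` shares none.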
7/25
What is Visual Analytics?
• Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces [Thomas & Cook 2005]
• Since 2004, the field has grown significantly. Aside from tens to hundreds of domestic and international partners, it now has an IEEE conference (IEEE VAST), an NSF program (FODAVA), and a forthcoming IEEE Transactions journal.
8/25
Individually Not Unique
• Data Representation and Transformation
– Data Mining
– Machine Learning
– Databases
– Information Retrieval
– etc.
• Analytical Reasoning and Interaction
– Interaction Design
– Cognitive Psychology
– Intelligence Analysis
– etc.
• Production, Presentation, and Dissemination
– Tech Transfer
– Report Generation
– etc.
• Visual Representation
– InfoVis
– SciVis
– Graphics
– etc.
• Validation and Evaluation
– Quality Assurance
– User studies (HCI)
– etc.
9/25
In Combinations of 2 or 3…
• Data Representation and Transformation
– Data Mining
– Machine Learning
– Databases
– Information Retrieval
– etc.
• Visual Representation
– InfoVis
– SciVis
– Graphics
– etc.
• Analytical Reasoning and Interaction
• Production, Presentation, and Dissemination
• Validation and Evaluation
10/25
In Combinations of 2 or 3…
• Analytical Reasoning and Interaction
– Interaction Design
– Cognitive Psychology
– Intelligence Analysis
– etc.
• Production, Presentation, and Dissemination
– Tech Transfer
– Report Generation
– etc.
• Data Representation and Transformation
• Visual Representation
• Validation and Evaluation
11/25
Extending Visual Analytics Principles
• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods
[Figure labels: Who, Where, What, When, Evidence Box, Original Data]
R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum, 2008.
12/25
Extending Visual Analytics Principles
• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods
R. Chang et al., An Interactive Visual Analytics System for Bridge Management, Journal of Computer Graphics Forum, 2010. To Appear.
13/25
Extending Visual Analytics Principles
• Global Terrorism Database
– Application of the investigative 5 W’s
• Bridge Maintenance
– Exploring subjective inspection reports
• Biomechanical Motion
– Interactive motion comparison methods
R. Chang et al., Interactive Coordinated Multiple-View Visualization of Biomechanical Motion Data, IEEE Vis (TVCG) 2009.
14/25
Human + Computer
A Mixed-Initiative Perspective
• So far, our approach is mostly user-driven
• Human vs. Artificial Intelligence
Garry Kasparov vs. Deep Blue (1997)
– Computer takes a “brute force” approach without analysis
– “As for how many moves ahead a grandmaster sees,” Kasparov concludes: “Just one, the best one”
• Artificial Intelligence vs. Augmented Intelligence
Hydra vs. Cyborgs (1998)
– Grandmaster + 1 computer > Hydra (equiv. of Deep Blue)
– Amateur + 3 computers > Grandmaster + 1 computer [1]
• How to systematically repeat the success?
– Unsupervised machine learning + User
– User’s interactions with the computer
1. http://www.collisiondetection.net/mt/archives/2010/02/why_cyborgs_are.php
[Figure labels: Computer, Translation, Human]
15/25
Examples of Human + Computer Computing
• CAPTCHA
– reCAPTCHA
– General Crowd-Sourcing
• Adaptive / Intelligent User Interfaces (IUI)
• User-assisted clustering / searching
16/25
Simple Example
• Distance Function
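\[ A_{change} = \{ (x_i, x_j) \mid x_i \in Y_1,\ x_j \in Y_2 \ \text{or}\ x_i \in Y_2,\ x_j \in Y_1 \} \]
\[ A_{other} = \{ (x_i, x_j) \mid (x_i, x_j) \notin A_{change} \} \]
\[ \arg\min \Bigl[ \sum_{(x_i, x_j) \in A_{change}} \bigl| D(x_i, x_j) \,\ldots\, I \,\ldots\, D(x_i, x_j \mid \ldots^{t+1}) \bigr| + \sum_{(x_i, x_j) \in A_{other}} \bigl| D(x_i, x_j) \,\ldots\, D(x_i, x_j \mid \ldots^{t+1}) \bigr| \Bigr] \]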
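The two pair sets on this slide can be sketched in a few lines. The example below assumes that Y1 and Y2 are two groups of points the user wants pulled apart, treats pairs as unordered, and uses a weighted Euclidean distance as a stand-in for D; the distance form, the toy points, and the function names are assumptions for illustration, since the slide does not spell them out.

```python
from itertools import combinations
from math import sqrt

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance; the weights stand in for the learned parameters."""
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

def split_pairs(n_points, y1, y2):
    """Partition all index pairs into A_change (one point in Y1, the other in Y2) and A_other."""
    a_change, a_other = [], []
    for i, j in combinations(range(n_points), 2):
        crosses = (i in y1 and j in y2) or (i in y2 and j in y1)
        (a_change if crosses else a_other).append((i, j))
    return a_change, a_other

points = [(0.0, 1.0), (0.1, 5.0), (1.0, 1.2), (1.1, 4.8)]
y1, y2 = {0, 1}, {2, 3}          # hypothetical groups picked by the user
a_change, a_other = split_pairs(len(points), y1, y2)
uniform = [0.5, 0.5]             # starting weights before any feedback
print(a_change, weighted_distance(points[0], points[2], uniform))
```

An optimizer would then adjust the weights so that distances for pairs in `a_change` move as the user requested while distances for pairs in `a_other` stay close to their previous values.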
17/25
Application 1: Find Important Features
• Data set: X, 178x13
• 3 classes
• Add 10 random-number columns as extra features
[Figure: 2D scatter plot of the data set]
18/25
1st Step: Success
Trying to separate the circled green dots from all blue dots
[Figure: two 2D scatter plots]
19/25
Result
• Recall the structure of the data set
– Original Wine Dataset: each instance has 13 feature values
– 10 randomly generated feature values for every instance
• Weight vector:
– Randomly generated features get low weights
[Figure: weight vector (23 values): 0.096, 0.150, 0.062, 0, 0.018, 0.011, 0.025, 0.039, 0.037, 0.047, 0.038, 0.011, 0, 0.017, 0, 0.046, 0, 0, 0, 0, 0.091, 0.186, 0.127]
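The effect described here, noise features ending up with low weights, can be reproduced on synthetic data with a much simpler scoring rule. The sketch below is not the optimization from the talk: it swaps in a Fisher-style score (between-class variance over within-class variance) and uses generated data rather than the Wine data set, so every name and number in it is an assumption.

```python
import random
from statistics import mean, pvariance

random.seed(0)

def make_row(label):
    """Two informative features (shifted by class) followed by three pure-noise features."""
    informative = [random.gauss(3.0 * label, 1.0) for _ in range(2)]
    noise = [random.gauss(0.0, 1.0) for _ in range(3)]
    return informative + noise

labels = [c for c in range(3) for _ in range(30)]
rows = [make_row(c) for c in labels]

def separation_weights(rows, labels):
    """Score each feature by between-class over within-class variance, then normalize."""
    classes = sorted(set(labels))
    scores = []
    for k in range(len(rows[0])):
        by_class = [[r[k] for r, c in zip(rows, labels) if c == cls] for cls in classes]
        between = pvariance([mean(v) for v in by_class])
        within = mean(pvariance(v) for v in by_class)
        scores.append(between / within)
    total = sum(scores)
    return [s / total for s in scores]

weights = separation_weights(rows, labels)
print([round(w, 3) for w in weights])
```

The first two entries (the informative features) receive almost all of the weight, and the three noise features receive weights close to zero.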
20/25
Visual Analytics for Political Science
21/25
Aggregate Temporal Graph
• 1000 simulations
• 60 time steps in each simulation
• Time step == a node; edge == a transition
• Time steps are merged if two states are the same
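A minimal sketch of how such a graph could be assembled, assuming each simulation is a list of hashable states in time order. The state names and the function `build_atg` are hypothetical, and counting visits and transitions is an added convenience rather than something the slide specifies.

```python
from collections import Counter

def build_atg(simulations):
    """Merge simulation runs into one graph: node = distinct state, edge = observed transition."""
    node_visits = Counter()
    edge_counts = Counter()
    for run in simulations:
        node_visits.update(run)                  # identical states collapse into one node
        edge_counts.update(zip(run, run[1:]))    # consecutive time steps form an edge
    return node_visits, edge_counts

# Hypothetical runs of three time steps each.
runs = [
    ["calm", "protest", "reform"],
    ["calm", "protest", "crackdown"],
    ["calm", "calm", "protest"],
]
nodes, edges = build_atg(runs)
print(len(nodes), edges[("calm", "protest")])
```

Nine time steps across the three runs collapse into four nodes, and the transition from `calm` to `protest` is shared by all three runs.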
22/25
Aggregate Temporal Graph
23/25
Gateways and Terminals
Each of the yellow vertices is a Gateway
to the vertex set of {A}. That is, every
maximal path leaving a yellow vertex
eventually passes through A.
Vertex G is a Gateway to each of the
yellow vertices, or Terminals. That is,
every maximal path leaving G passes
eventually through each of the yellow
vertices.
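The gateway definition can be checked with a simple graph search. This sketch assumes the graph is acyclic and stored as a dictionary of successor lists, so that a maximal path is one that ends at a vertex with no outgoing edges; the example graph and the name `is_gateway` are made up for illustration.

```python
def is_gateway(graph, start, targets):
    """True if every maximal path leaving `start` passes through some vertex in `targets`.

    Assumes `graph` is a DAG given as {vertex: [successors]}, so a maximal path
    is one that ends at a vertex with no outgoing edges.
    """
    targets = set(targets)
    stack, seen = [start], set()
    while stack:
        v = stack.pop()
        if v in targets or v in seen:
            continue              # branch already hit the target set, or was explored
        seen.add(v)
        successors = graph.get(v, [])
        if not successors:
            return False          # reached a dead end without touching the targets
        stack.extend(successors)
    return True

# Hypothetical graph: every route out of G goes through T1, then T2, then A; X can escape via Z.
graph = {"G": ["P", "Q"], "P": ["T1"], "Q": ["T1"], "T1": ["T2"],
         "T2": ["A"], "A": [], "X": ["A", "Z"], "Z": []}
print(is_gateway(graph, "G", {"A"}), is_gateway(graph, "X", {"A"}))
print(all(is_gateway(graph, "G", {t}) for t in ("T1", "T2")))
```

The second call pattern covers the terminal case: G is a gateway to each terminal when the check succeeds for every terminal taken on its own.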
24/25
Applications of Aggregate Temporal Graphs
• A generalizable representation for problems involving parameter spaces that are too large to explore as a whole, but which are composed of related individual parts that can be examined independently
• Collaborative Analysis
– Each analyst’s trail is a simulation
– Each configuration state is a node
• Web Analytics
– Each visit is a simulation
– Each configuration of a page is a node
25/25
Conclusion
• Visual Analytics is a growing new area that is looking to address some pressing needs
– Too much (messy) data, too little time
• By combining strengths and findings in existing disciplines, we have demonstrated that
– There are some great benefits
– But there are also some difficult challenges
[Figure labels: Analytical Reasoning and Interaction; Data Representation and Transformation; Production, Presentation, and Dissemination; Visual Representation; Validation and Evaluation]
26/25
Questions?
Thank you!
27/25
Backup Slides
28/25
(2) Investigative GTD
[Figure labels: Who, Where, What, When, Evidence Box, Original Data]
R. Chang et al., Investigative Visual Analysis of Global Terrorism, Journal of Computer Graphics Forum (Eurovis), 2008.
29/25
(2) Investigative GTD:
Revealing Global Strategy
This group’s attacks are bounded not by geo-location but by religious beliefs.
Its attack patterns changed as the group developed.
30/25
(2) Investigative GTD:
Discovering Unexpected Temporal Pattern
Domestic Group
A geographically bounded entity in the Philippines.
The ThemeRiver shows its rise and fall as an entity and its modus operandi.
31/25
What is in a User’s Interactions?
[Figure: Human → Input (keyboard, mouse, etc.) → Visualization → Output (images on a monitor) → Human]
• Types of Human-Visualization Interactions
– Word editing (input heavy, little output)
– Browsing, watching a movie (output heavy, little input)
– Visual Analysis (closer to 50-50)
32/25
Discussion
• What interactivity is not good for:
– Presentation
– YMMV = “your mileage may vary”
• Reproducibility: users behave differently each time.
• Evaluation is difficult due to opportunistic discoveries.
– Often sacrifices accuracy
• iPCA – SVD takes time on large datasets; use iterative approximation algorithms such as onlineSVD.
• WireVis – Clustering of large datasets is slow. Either pre-compute or use simpler “binning” methods.
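As an illustration of trading exactness for speed, the sketch below approximates only the leading principal direction with power iteration. This is a simple stand-in for the idea of an iterative approximation, not the onlineSVD algorithm named above, and the data and function name are assumptions.

```python
from math import sqrt

def top_component(rows, iterations=100):
    """Approximate the leading principal direction of `rows` by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    v = [1.0 / sqrt(d)] * d
    for _ in range(iterations):
        # Multiply by X^T X without forming the covariance matrix explicitly.
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(p * row[k] for p, row in zip(proj, centered)) for k in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data lying close to the line y = 2x.
noise = [0.1, -0.1, 0.05, 0.0, -0.05, 0.1, 0.0, -0.1, 0.05, 0.0]
data = [(x, 2 * x + e) for x, e in zip(range(10), noise)]
v = top_component(data)
print(v)
```

Each iteration costs one pass over the data, so the number of iterations can be capped to keep an interactive system responsive, at the price of a less exact answer.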
33/25
Discussion
• Interestingly,
– It doesn’t save you time…
– And it doesn’t make a user more accurate in performing a task.
• However, there is empirical evidence that, with interactivity:
– Users are more engaged (don’t give up)
– Users prefer these systems over static (query-based) systems
– Users have a faster learning curve
• We need better measurements to determine the “benefits of interactivity”