Information Visualization for
Knowledge Discovery:
Big Insights from Big Data
Ben Shneiderman
[email protected]
Founding Director (1983-2000), Human-Computer Interaction Lab
Professor, Department of Computer Science
Member, Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
Real Users, Real Problems, Real Data
Interdisciplinary research community
- Computer Science & Info Studies
- Psych, Socio, Educ, Jour & MITH
www.cs.umd.edu/hcil
vimeo.com/72440805
Design Issues
• Input devices & strategies
  • Keyboards, pointing devices, voice
  • Direct manipulation
  • Menus, forms, commands
• Output devices & formats
  • Screens, windows, color, sound
  • Text, tables, graphics
  • Instructions, messages, help
• Collaboration & Social Media
• Help, tutorials, training
• Search
  • Visualization
www.awl.com/DTUI
Fifth Edition: 2010
HCI Pride: Serving 5B Users
Mobile, desktop, web, cloud
• Diverse users: novice/expert, young/old, literate/illiterate, abled/disabled, cultural, ethnic & linguistic diversity, gender, personality, skills, motivation, ...
• Diverse applications: E-commerce, law, health/wellness, education, creative arts, community relationships, politics, IT4ID, policy negotiation, mediation, peace studies, ...
• Diverse interfaces: Ubiquitous, pervasive, embedded, tangible, invisible, multimodal, immersive/augmented/virtual, ambient, social, affective, empathic, persuasive, ...
Obama Unveils “Big Data” Initiative (3/2012)
Big Data challenges:
• Developing scalable algorithms for processing imperfect data in distributed data stores
• Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
Integrating Statistics & Visualization
“…exaggerated reports appear of the secrets that can be
uncovered by setting learning algorithms loose on
oceans of data”
-- Witten & Frank (2000, 2005, 2011)
“…attempting to bring statistical principles to bear on
massive data…may yield results that are not useful,
at best, or harmful at worst.”
--Frontiers in Massive Data Analysis (2013)
Integrating Statistics & Visualization
“As yet I know of no person or group that is taking nearly
adequate advantage of the graphical potentialities of
the computer... In exploration they are going to be the
data analyst's greatest single resource.”
-- John Tukey (1965)
“visualization and (interactive) exploration of complex
and vast data constitute a crucial component of an
analytics infrastructure.”
--Frontiers in Massive Data Analysis (2013)
Information Visualization & Visual Analytics
• Visual bandwidth is enormous
  • Human perceptual skills are remarkable
  • Trend, cluster, gap, outlier...
  • Color, size, shape, proximity...
• Three challenges
  • Meaningful visual displays of massive data
  • Interaction: widgets & window coordination
  • Process models for discovery
1999
2004
2010
Spotfire: Retinol’s role in embryos & vision
Spotfire: DC natality data
http://registration.spotfire.com/eval/default_edu.asp
10M-100M pixels: Large displays
16M pixels: My New Workstation
100M pixels & more
1M pixels & less: Small mobile devices
Information Visualization: Mantra
• Overview, zoom & filter, details-on-demand
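The mantra is an interaction sequence rather than an algorithm, but its three steps map onto three operations over a dataset. A minimal Python sketch, assuming a hypothetical `Record` type with `category` and `value` fields (not taken from any tool named in the talk): the overview aggregates everything, zoom & filter narrows by constraints, and details-on-demand fetches a single record only when asked.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record type, for illustration only.
    id: int
    category: str
    value: float


def overview(records):
    """Overview: summarize the whole dataset (here, a count per category)."""
    return Counter(r.category for r in records)


def zoom_and_filter(records, category=None, lo=float("-inf"), hi=float("inf")):
    """Zoom & filter: keep only the records that satisfy the current constraints."""
    return [
        r for r in records
        if (category is None or r.category == category) and lo <= r.value <= hi
    ]


def details_on_demand(records, record_id):
    """Details-on-demand: return one full record only when the user asks for it."""
    return next((r for r in records if r.id == record_id), None)


data = [Record(1, "A", 3.0), Record(2, "B", 7.5), Record(3, "A", 9.1), Record(4, "B", 1.2)]
print(overview(data))                                    # Counter({'A': 2, 'B': 2})
print([r.id for r in zoom_and_filter(data, "A", lo=5)])  # [3]
print(details_on_demand(data, 3))                        # Record(id=3, category='A', value=9.1)
```

In an interactive tool each step would be bound to a user action (a click, a slider drag, a hover) and the display redrawn immediately; these functions only show the data side.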
Information Visualization: Data Types
SciViz
• 1-D Linear: Document Lens, SeeSoft, Info Mural
• 2-D Map: GIS, ArcView, PageMaker, Medical imagery
• 3-D World: CAD, Medical, Molecules, Architecture
InfoViz
• Multi-Var: Spotfire, Tableau, Qliktech, Visual Insight
• Temporal: EventFlow, TimeSearcher, Palantir, DataMontage
• Tree: Cone/Cam/Hyperbolic, SpaceTree, Treemap
• Network: Pajek, UCINet, NodeXL, Gephi, Tom Sawyer
• Text: TagClouds, Wordle, ManyEyes, Ngram Viewer
flowingdata.com, visual.ly, infosthetics.com, visualcomplexity.com, perceptualedge.com, visualizing.org, eagereyes.org, datakind.org, infovis.org
Temporal Data: TimeSearcher 1.3
• Time series
  • Stocks
  • Weather
  • Genes
• User-specified patterns
• Rapid search
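TimeSearcher's user-specified patterns are drawn as rectangular query regions (timeboxes) over the series. A simplified Python sketch of that idea, not TimeSearcher's own code: a box constrains every value between time indices `t0` and `t1` to lie in `[lo, hi]`, and several boxes combine as a conjunction. The function names and the stock data are hypothetical.

```python
def box_query(series_by_name, t0, t1, lo, hi):
    """Return names of series whose values at every time index in [t0, t1] lie in [lo, hi]."""
    matches = []
    for name, values in series_by_name.items():
        window = values[t0:t1 + 1]  # inclusive time range
        if window and all(lo <= v <= hi for v in window):
            matches.append(name)
    return matches


def multi_box_query(series_by_name, boxes):
    """Conjunction of several (t0, t1, lo, hi) boxes: a series must satisfy all of them."""
    result = set(series_by_name)
    for t0, t1, lo, hi in boxes:
        result &= set(box_query(series_by_name, t0, t1, lo, hi))
    return sorted(result)


# Hypothetical daily prices, for illustration only.
stocks = {
    "AAA": [10, 12, 15, 14, 18],
    "BBB": [20, 19, 18, 25, 30],
    "CCC": [11, 13, 14, 13, 12],
}
print(box_query(stocks, 1, 3, 12, 15))                             # ['AAA', 'CCC']
print(multi_box_query(stocks, [(1, 3, 12, 15), (4, 4, 15, 20)]))  # ['AAA']
```

Rerunning a scan like this on every drag of a box is feasible for modest datasets; larger collections would likely need an index.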
Temporal Data: TimeSearcher 2.0
• Long time series (>10,000 time points)
• Multiple variables
• Controlled precision in match (linear, offset, noise, amplitude)
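The slide lists four aspects of a match whose precision the user controls. A minimal sketch of how some of them might be handled, offered as an assumption rather than TimeSearcher 2.0's documented algorithm: offset is ignored by subtracting each window's mean, amplitude by dividing by its standard deviation, and noise is tolerated through a root-mean-square distance threshold. Linear-trend removal is omitted for brevity, and the names `normalize` and `find_matches` are hypothetical.

```python
import math
import statistics


def normalize(window, ignore_offset, ignore_amplitude):
    """Optionally remove offset (subtract the mean) and amplitude (divide by the std. dev.)."""
    out = list(window)
    if ignore_offset:
        m = statistics.fmean(out)
        out = [v - m for v in out]
    if ignore_amplitude:
        s = statistics.pstdev(out)
        if s > 0:
            out = [v / s for v in out]
    return out


def find_matches(series, pattern, tolerance, ignore_offset=True, ignore_amplitude=False):
    """Slide `pattern` along `series`; return the start index of every window whose
    root-mean-square distance to the pattern is at most `tolerance` (the noise allowance)."""
    n = len(pattern)
    p = normalize(pattern, ignore_offset, ignore_amplitude)
    hits = []
    for start in range(len(series) - n + 1):
        w = normalize(series[start:start + n], ignore_offset, ignore_amplitude)
        rms = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, p)) / n)
        if rms <= tolerance:
            hits.append(start)
    return hits


# Hypothetical series: the same peak shape at a low offset, a high offset, and doubled amplitude.
series = [0, 1, 2, 1, 0, 10, 11, 12, 11, 10, 0, 2, 4, 2, 0]
pattern = [1, 2, 3, 2, 1]
print(find_matches(series, pattern, 0.01))                         # [0, 5]
print(find_matches(series, pattern, 0.01, ignore_amplitude=True))  # [0, 5, 10]
```

The sliding window costs O(N × m) for a series of N points and a pattern of m points, which is workable for a single long series but would need indexing for many long series at once.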
LifeLines: Patient Histories
www.cs.umd.edu/hcil/lifelines
LifeLines2: Align-Rank-Filter & Summarize
www.cs.umd.edu/hcil/lifelines2
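The Align-Rank-Filter & Summarize operators named on the slide can be sketched as plain operations over per-patient event lists. A minimal Python sketch, assuming each record is a list of `(day, event_type)` pairs; the patients, event names, and functions are hypothetical, and this is not LifeLines2's implementation.

```python
from collections import Counter

# Hypothetical patient records: patient id -> list of (day, event_type).
records = {
    "p1": [(3, "admit"), (5, "ICU"), (9, "discharge")],
    "p2": [(1, "admit"), (2, "ICU"), (4, "ICU"), (7, "discharge")],
    "p3": [(2, "admit"), (8, "discharge")],
}


def align(records, sentinel):
    """Align: shift each record so its first `sentinel` event falls on day 0.
    Records without the sentinel event are dropped."""
    aligned = {}
    for pid, events in records.items():
        days = [d for d, e in events if e == sentinel]
        if days:
            t0 = min(days)
            aligned[pid] = [(d - t0, e) for d, e in events]
    return aligned


def rank(records, event_type):
    """Rank: order patients by their number of `event_type` events, most first."""
    return sorted(records, key=lambda pid: -sum(e == event_type for _, e in records[pid]))


def filter_records(records, event_type):
    """Filter: keep only patients with at least one `event_type` event."""
    return {pid: ev for pid, ev in records.items() if any(e == event_type for _, e in ev)}


def summarize(records, event_type):
    """Summarize: count `event_type` occurrences at each aligned day offset."""
    return Counter(d for ev in records.values() for d, e in ev if e == event_type)


a = align(records, "admit")
print(a["p1"])                                      # [(0, 'admit'), (2, 'ICU'), (6, 'discharge')]
print(rank(a, "ICU"))                               # ['p2', 'p1', 'p3']
icu = filter_records(a, "ICU")
print(sorted(summarize(icu, "ICU").items()))        # [(1, 1), (2, 1), (3, 1)]
```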
EventFlow: Temporal Events
Airway, Breathing, Circulation, Disability, Secondary Survey
http://youtu.be/dz86_nSXt-M
Children’s Hospital Trauma Care: EventFlow
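EventFlow summarizes many records by aggregating those that share the same sequence of events. A minimal Python sketch of that aggregation idea, not EventFlow's implementation: it counts how many records share each sequence prefix and flags records whose Airway, Breathing, Circulation, Disability steps are missing, repeated, or out of order. The patient sequences and function names are hypothetical.

```python
from collections import Counter

# Step order as listed on the slide.
EXPECTED = ["Airway", "Breathing", "Circulation", "Disability"]


def prefix_counts(sequences):
    """Count how many sequences share each prefix; such counts can drive an aggregated view."""
    counts = Counter()
    for seq in sequences:
        for i in range(1, len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return counts


def deviates(seq, expected=EXPECTED):
    """True unless each expected step appears exactly once, in order (other events are ignored)."""
    performed = [s for s in seq if s in expected]
    return performed != expected


# Hypothetical patient sequences, for illustration only.
patients = {
    "p1": ["Airway", "Breathing", "Circulation", "Disability"],
    "p2": ["Airway", "Circulation", "Breathing", "Disability"],
    "p3": ["Airway", "Breathing", "Circulation", "Disability"],
}
counts = prefix_counts(patients.values())
print(counts[("Airway",)])                               # 3
print(counts[("Airway", "Breathing")])                   # 2
print([p for p, s in patients.items() if deviates(s)])   # ['p2']
```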
Treemap: Gene Ontology
+ Space filling
+ Space limited
+ Color coding
+ Size coding
- Requires learning
(Shneiderman, ACM Trans. on Graphics, 1992 & 2003)
www.cs.umd.edu/hcil/treemap/
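The original 1992 treemap layout, usually called slice-and-dice, gives each node a rectangle whose area is proportional to its size, alternating the split direction at each level of the hierarchy. A minimal Python sketch of that layout, assuming a hypothetical nested-dict tree in which leaves carry a `size`; later variants that aim for more square-like rectangles are not shown.

```python
def total_size(node):
    """A leaf carries its own size; an internal node's size is the sum of its children's."""
    if "children" in node:
        return sum(total_size(c) for c in node["children"])
    return node["size"]


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out `node` in rectangle (x, y, w, h) and recurse into its children.
    Even depths split along x, odd depths along y; each child's share of the
    parent's rectangle equals its share of the parent's total size.
    Returns a list of (name, x, y, w, h) tuples, parents before children."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    total = total_size(node)
    offset = 0.0
    for child in node.get("children", []):
        frac = total_size(child) / total if total else 0.0
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# Hypothetical hierarchy, for illustration only.
tree = {"name": "root", "children": [
    {"name": "A", "children": [{"name": "A1", "size": 2}, {"name": "A2", "size": 2}]},
    {"name": "B", "size": 4},
]}
layout = slice_and_dice(tree, 0.0, 0.0, 100.0, 50.0)
rects = {name: (x, y, w, h) for name, x, y, w, h in layout}
print(rects["A2"])  # (0.0, 25.0, 50.0, 25.0)
print(rects["B"])   # (50.0, 0.0, 50.0, 50.0)
```

Color coding would be applied afterward by mapping a second attribute of each leaf to a fill color; the layout itself uses only size.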
Treemap: Smartmoney MarketMap
www.smartmoney.com/marketmap
Market falls steeply Feb 27, 2007, with one exception
Market mixed, February 8, 2008
Energy & Technology up, Financial & Health Care down
Market rises, September 1, 2010, Gold contrarians
Treemap: Newsmap (Marcos Weskamp)
newsmap.jp
Treemap: Nutritional Analysis
www.hivegroup.com
Treemap: Spotfire Bond Portfolio Analysis
www.spotfire.com
Treemap: NY Times – Car & Truck Sales
www.cs.umd.edu/hcil/treemap/
Treemap (Voronoi): NY Times - Inflation
www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html
VisualComplexity.com: Manuel Lima
SocialAction
• Integrates statistics & visualization
• 4 case studies, 4-8 weeks (journalist, bibliometrician, terrorism analyst, organizational analyst)
• Identified desired features, gave strong positive feedback about benefits of integration
www.cs.umd.edu/hcil/socialaction
Perer & Shneiderman, CHI2008, IEEE CG&A 2009
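Integrating statistics with visualization can be illustrated by computing a statistic for every node, then ranking and filtering by it so the display can highlight the top nodes. A minimal Python sketch of that rank-then-filter step, using normalized degree centrality only because it is short to compute; the edge list and function names are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree for every node in at least one edge: neighbors / (n - 1).
    Edges are treated as undirected; self-loops are ignored."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in neighbors.items()}


def rank_and_filter(scores, threshold):
    """Rank nodes by score (highest first, ties broken by name) and keep those >= threshold."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(node, score) for node, score in ranked if score >= threshold]


# Hypothetical network, for illustration only.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat"), ("dan", "eve")]
scores = degree_centrality(edges)
print(rank_and_filter(scores, 0.5))  # [('ann', 0.75), ('bob', 0.5), ('cat', 0.5), ('dan', 0.5)]
```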
NodeXL:
Network Overview for Discovery & Exploration in Excel
www.codeplex.com/nodexl
NodeXL: Import Dialogs
www.codeplex.com/nodexl
Tweets at #WIN09 Conference: 2 groups
Twitter discussion of #GOP
Red: Republicans, anti-Obama, mention Fox
Blue: Democrats, pro-Obama, mention CNN
Green: non-affiliated
Node size is number of followers
Politico is major bridging group
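The note that Politico is a major bridging group can be approximated with a simple count: for each account, how many of its ties reach an account in a different group. A minimal Python sketch, assuming a hypothetical edge list and the red/blue/green labels from the legend; it is only a rough proxy, and NodeXL also offers standard metrics such as betweenness centrality for this purpose.

```python
from collections import Counter


def cross_group_ties(edges, group):
    """For each node, count the edges whose other endpoint is in a different group."""
    counts = Counter()
    for u, v in edges:
        if group[u] != group[v]:
            counts[u] += 1
            counts[v] += 1
    return counts


# Hypothetical accounts and group labels, for illustration only.
group = {"r1": "red", "r2": "red", "b1": "blue", "b2": "blue", "g1": "green"}
edges = [("r1", "r2"), ("b1", "b2"), ("g1", "r1"), ("g1", "b1"), ("g1", "b2"), ("r2", "b2")]
ties = cross_group_ties(edges, group)
print(ties.most_common(1))  # [('g1', 3)]
```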
Twitter networks: #SOTU
Group-In-A-Box: Twitter Network for #CI2012
Twitter Network for “TTW”
Pennsylvania Innovation Network
Innovation Patterns: 11,000 vertices, 26,000 edges
[Figure legend: No Location; Philadelphia; Pittsburgh Metro; Patent; Tech; Navy; SBIR (federal); PA DCED (state); Related patent; Pharmaceutical/Medical; Westinghouse Electric; numbered groups 2: Federal agency, 3: Enterprise, 5: Inventors, 9: Universities, 10: PA DCED, 11/12: Philadelphia/Pittsburgh metro, 13-15: Semi-rural/rural, 17: Foreign countries, 19: Other states]
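For a sense of scale: if this innovation network is treated as undirected (an assumption; the slide does not say whether edges are directed), each edge adds one to the degree of two vertices, so the mean number of ties per vertex for 11,000 vertices and 26,000 edges is

```latex
\bar{k} = \frac{2\,|E|}{|V|} = \frac{2 \times 26{,}000}{11{,}000} \approx 4.7
```

so a typical vertex links directly to only a handful of others.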
Innovation Clusters: People, Locations, Companies
[Figure legend: same categories as the Pennsylvania Innovation Network map, with groups 11/12 and 13-15 labeled Philadelphia/Pittsburgh metro counties and semi-rural/rural counties]
6 Twitter Network Structures
• Polarized Crowds [Divided]
• Tight Crowd [Unified]
• Brand Clusters [Fragmented]
• Community Clusters [Clustered]
• Broadcast Network [In-Hub & Spoke]
• Support Network [Out-Hub & Spoke]
www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters
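Two of the six structures differ mainly in the direction of ties around a hub: many accounts pointing in to one hub (in-hub & spoke, broadcast network) versus one hub pointing out to many accounts (out-hub & spoke, support network). A minimal Python sketch that tells the two apart from a directed edge list; the 0.5 share cutoff, the function name, and the example accounts are hypothetical choices for illustration, not the method used in the Pew report.

```python
from collections import Counter


def hub_direction(edges, share=0.5):
    """Classify a directed edge list by its most concentrated hub.
    Returns "in-hub" if one node receives at least `share` of all edges,
    "out-hub" if one node sends at least `share` of them, otherwise None."""
    if not edges:
        return None
    indeg = Counter(v for _, v in edges)
    outdeg = Counter(u for u, _ in edges)
    top_in = indeg.most_common(1)[0][1] / len(edges)
    top_out = outdeg.most_common(1)[0][1] / len(edges)
    if top_in >= share and top_in >= top_out:
        return "in-hub"   # many accounts point at one hub, as in a broadcast network
    if top_out >= share:
        return "out-hub"  # one hub points at many accounts, as in a support network
    return None


# Hypothetical edge lists (source, target), for illustration only.
broadcast = [(f"fan{i}", "news") for i in range(8)] + [("fan1", "fan2")]
support = [("helpdesk", f"user{i}") for i in range(8)] + [("user1", "user2")]
print(hub_direction(broadcast))  # in-hub
print(hub_direction(support))    # out-hub
```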
Analyzing Social Media Networks with NodeXL
I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social media: New Technologies of Collaboration
3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
4. Layout, Visual Design & Labeling
5. Calculating & Visualizing Network Metrics
6. Preparing Data & Filtering
7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
8. Email
9. Threaded Networks
10. Twitter
11. Facebook
12. WWW
13. Flickr
14. YouTube
15. Wiki Networks
www.elsevier.com/wps/find/bookdescription.cws_home/723354/description
Social Media Research Foundation
Researchers who want to
- create open tools
- generate & host open data
- support open scholarship
Map, measure & understand social media
Support tool projects to collect, analyze & visualize social media data.
smrfoundation.org
8 Golden Rules of Data Science
Preparation
• Choose actionable problems & compelling theories
• Open your mind: domain experts & statisticians
Exploration
• If you don’t have questions, you’re not ready.
• Clean, clean, clean… your data (gently on the screen)
• Know thy data: ranges, patterns, clusters, gaps, outliers, missing values, uncertainty
Decision
• Evaluate your efficacy, refine your theory
• Take responsibility, own your failures
• World is complex, proceed with humility
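The "Know thy data" rule lists what to check before drawing conclusions, including ranges, gaps, outliers, and missing values. A minimal profiling sketch for one numeric column, assuming missing values are encoded as `None` and using the common 1.5 × IQR fence for outliers (a convention chosen here, not one the talk prescribes); `gap_threshold` is likewise a hypothetical parameter. Clusters, patterns, and uncertainty need richer tools and are left out.

```python
import statistics


def profile(values, gap_threshold):
    """Summarize one numeric column: range, missing values, outliers, and gaps.
    Missing values are assumed to be encoded as None."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-missing values")
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "range": (present[0], present[-1]),
        "missing": len(values) - len(present),
        # Outliers: values beyond the 1.5 x IQR fences (a common convention).
        "outliers": [v for v in present if v < lo_fence or v > hi_fence],
        # Gaps: jumps between consecutive sorted values larger than gap_threshold.
        "gaps": [(a, b) for a, b in zip(present, present[1:]) if b - a > gap_threshold],
    }


# Hypothetical column with two missing entries and one suspicious value.
ages = [34, 29, None, 41, 38, 35, 30, None, 33, 97, 36]
print(profile(ages, gap_threshold=10))
# {'range': (29, 97), 'missing': 2, 'outliers': [97], 'gaps': [(41, 97)]}
```

Whether 97 is an error to clean or a real value to keep is exactly the judgment the rule asks an analyst to make on screen, with the data in view.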
Campus Visualization Partnership
Goal: Make UMd a national role model for use of visualization across the curriculum for research & teaching
• Lecture Series (Livestreamed & Archived)
  • Spring: Wattenberg, Ericson, Nowell, etc.
  • Fall: Nick & Niklas, Downs, Gaither, Schwabish, etc.
• Equipment Grants & Software Licenses
• Hire & Win Over
• Courses
viz.umd.edu
UN Millennium Development Goals
To be achieved by 2015
• Eradicate extreme poverty and hunger
• Achieve universal primary education
• Promote gender equality and empower women
• Reduce child mortality
• Improve maternal health
• Combat HIV/AIDS, malaria and other diseases
• Ensure environmental sustainability
• Develop a global partnership for development
www.cs.umd.edu/hcil
@benbendc
For More Information
• Visit the HCIL website for 700+ papers & info on videos: www.cs.umd.edu/hcil
• See Chapter 14 on Info Visualization:
  Shneiderman, B. and Plaisant, C. (2010) Designing the User Interface: Strategies for Effective Human-Computer Interaction, Fifth Edition. www.awl.com/DTUI
• Edited Collections:
  Card, S., Mackinlay, J., and Shneiderman, B. (1999) Readings in Information Visualization: Using Vision to Think
  Bederson, B. and Shneiderman, B. (2003) The Craft of Information Visualization: Readings and Reflections
For More Information
• Treemaps
  • Hive Group: www.hivegroup.com
  • Marketwatch: www.marketwatch.com/marketmap
  • HCIL Treemap 4.0: www.cs.umd.edu/hcil/treemap
• Spotfire: www.spotfire.com
• TimeSearcher: www.cs.umd.edu/hcil/timesearcher
• NodeXL: nodexl.codeplex.com
• Hierarchical Clustering Explorer: www.cs.umd.edu/hcil/hce
• LifeLines2: www.cs.umd.edu/hcil/lifelines2
• EventFlow: www.cs.umd.edu/hcil/eventflow
Black Swan, Nassim Taleb (2007)
• A highly improbable event:
  • Unpredictable
  • Massive impact
  • Retrospective narrative fabrication
• The world is far, far more complicated than we think, which is not a problem, except when most of us don’t know it.
The Signal & The Noise: Nate Silver (2012)
• New ideas are sometimes found in the most granular details of a problem where few others bother to look.
• Technology is a labor-saving device, but we should not expect machines to do our thinking for us.
• Financial, sports, politics
• Weather, earthquakes, economics, infectious diseases
• Chess, poker
• Global warming, terrorism, bubbles in financial markets
Choose Actionable Problems
• Problems that matter (personal vs. global)
• Adequate time for intervention (short vs. long)
• Reliable predictions (low vs. high)
Choose Appropriate Theories
Theories
• Descriptive
• Explanatory
• Prescriptive
• Predictive
• Generative
Sources
• Social
• Political
• Economic
• Chemical
• Physical
• Biological
• Cognitive
• Perceptual