Information Visualization for Knowledge Discovery: Big Insights from Big Data
Ben Shneiderman
Founding Director (1983-2000), Human-Computer Interaction Lab
Professor, Department of Computer Science
Member, Institute for Advanced Computer Studies
University of Maryland, College Park, MD 20742

Real Users, Real Problems, Real Data

Interdisciplinary research community
- Computer Science & Info Studies
- Psych, Socio, Educ, Jour & MITH
www.cs.umd.edu/hcil
vimeo.com/72440805

Design Issues
• Input devices & strategies
  • Keyboards, pointing devices, voice
  • Direct manipulation
  • Menus, forms, commands
• Output devices & formats
  • Screens, windows, color, sound
  • Text, tables, graphics
  • Instructions, messages, help
• Collaboration & Social Media
• Help, tutorials, training
• Visualization
• Search
www.awl.com/DTUI
Fifth Edition: 2010

HCI Pride: Serving 5B Users
Mobile, desktop, web, cloud
Diverse users: novice/expert, young/old, literate/illiterate, abled/disabled, cultural, ethnic & linguistic diversity, gender, personality, skills, motivation, ...
Diverse applications: E-commerce, law, health/wellness, education, creative arts, community relationships, politics, IT4ID, policy negotiation, mediation, peace studies, ...
Diverse interfaces: Ubiquitous, pervasive, embedded, tangible, invisible, multimodal, immersive/augmented/virtual, ambient, social, affective, empathic, persuasive, ...

Obama Unveils “Big Data” Initiative (3/2012)
Big Data challenges:
• Developing scalable algorithms for processing imperfect data in distributed data stores
• Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf

Integrating Statistics & Visualization
“…exaggerated reports appear of the secrets that can be uncovered by setting learning algorithms loose on oceans of data”
-- Witten & Frank (2000, 2005, 2011)
“…attempting to bring statistical principles to bear on massive data…may yield results that are not useful, at best, or harmful at worst.”
-- Frontiers in Massive Data Analysis (2013)
“As yet I know of no person or group that is taking nearly adequate advantage of the graphical potentialities of the computer... In exploration they are going to be the data analyst's greatest single resource.”
-- John Tukey (1965)
“visualization and (interactive) exploration of complex and vast data constitute a crucial component of an analytics infrastructure.”
-- Frontiers in Massive Data Analysis (2013)

Information Visualization & Visual Analytics
• Visual bandwidth is enormous
• Human perceptual skills are remarkable
  • Trend, cluster, gap, outlier...
  • Color, size, shape, proximity...
• Three challenges
  • Meaningful visual displays of massive data
  • Interaction: widgets & window coordination
  • Process models for discovery
1999 2004 2010

Spotfire: Retinol’s role in embryos & vision
Spotfire: DC natality data
http://registration.spotfire.com/eval/default_edu.asp

10M - 100M pixels: Large displays
16M pixels: My New Workstation
100M-pixels & more
1M-pixels & less
Small mobile devices

Information Visualization: Mantra
• Overview, zoom & filter, details-on-demand

Information Visualization: Data Types
SciViz
• 1-D Linear: Document Lens, SeeSoft, Info Mural
• 2-D Map: GIS, ArcView, PageMaker, Medical imagery
• 3-D World: CAD, Medical, Molecules, Architecture
InfoViz
• Multi-Var: Spotfire, Tableau, Qliktech, Visual Insight
• Temporal: EventFlow, TimeSearcher, Palantir, DataMontage
• Tree: Cone/Cam/Hyperbolic, SpaceTree, Treemap
• Network: Pajek, UCINet, NodeXL, Gephi, Tom Sawyer
• Text: TagClouds, Wordle, ManyEyes, Ngram Viewer
flowingdata.com visual.ly infosthetics.com visualcomplexity.com perceptualedge.com visualizing.org eagereyes.org datakind.org infovis.org

Temporal Data: TimeSearcher 1.3
• Time series
  • Stocks
  • Weather
  • Genes
• User-specified patterns
• Rapid search

Temporal Data: TimeSearcher 2.0
• Long time series (>10,000 time points)
• Multiple variables
• Controlled precision in match (linear, offset, noise, amplitude)

LifeLines: Patient Histories
www.cs.umd.edu/hcil/lifelines

LifeLines2: Align-Rank-Filter & Summarize
www.cs.umd.edu/hcil/lifelines2

EventFlow: Temporal Events
Airway Breathing Circulation Disability
http://youtu.be/dz86_nSXt-M
Secondary Survey

Children’s Hospital Trauma Care: EventFlow

Treemap: Gene Ontology
+ Space filling
+ Space limited
+ Color coding
+ Size coding
- Requires learning
(Shneiderman, ACM Trans. on Graphics, 1992 & 2003)
www.cs.umd.edu/hcil/treemap/

Treemap: Smartmoney MarketMap
www.smartmoney.com/marketmap
Market falls steeply Feb 27, 2007, with one exception
Market mixed, February 8, 2008: Energy & Technology up, Financial & Health Care down
Market rises, September 1, 2010, Gold contrarians

Treemap: Newsmap (Marcos Weskamp)
newsmap.jp

Treemap: Nutritional Analysis
www.hivegroup.com

Treemap: Spotfire Bond Portfolio Analysis
www.spotfire.com

Treemap: NY Times – Car&Truck Sales
www.cs.umd.edu/hcil/treemap/

Treemap (Voronoi): NY Times - Inflation
www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html

VisualComplexity.com : Manuel Lima

SocialAction
• Integrates statistics & visualization
• 4 case studies, 4-8 weeks (journalist, bibliometrician, terrorism analyst, organizational analyst)
• Identified desired features, gave strong positive feedback about benefits of integration
www.cs.umd.edu/hcil/socialaction
Perer & Shneiderman, CHI2008, IEEE CG&A 2009

NodeXL: Network Overview for Discovery & Exploration in Excel
www.codeplex.com/nodexl

NodeXL: Import Dialogs
www.codeplex.com/nodexl

Tweets at #WIN09 Conference: 2 groups

Twitter discussion of #GOP
Red: Republicans, anti-Obama, mention Fox
Blue: Democrats, pro-Obama, mention CNN
Green: non-affiliated
Node size is number of followers
Politico is major bridging group

Twitter networks: #SOTU
Group-In-A-Box: Twitter Network for #CI2012
Twitter Network for “TTW”

Pennsylvania Innovation Network
Innovation Patterns: 11,000 vertices, 26,000 edges
Innovation Clusters: People, Locations, Companies
[Figure labels: No Location, Philadelphia, Patent Tech, Navy SBIR (federal), PA DCED (state), Related patent, Pharmaceutical/Medical, Pittsburgh Metro, Westinghouse Electric]
[Figure groups: 2: Federal agency; 3: Enterprise; 5: Inventors; 9: Universities; 10: PA DCED; 11/12: Phil/Pitt metro cnty; 13-15: Semi-rural/rural cnty; 17: Foreign countries; 19: Other states]

6 Twitter Network Structures
[Divided] Polarized Crowds
[Unified] Tight Crowd
[Fragmented] Brand Clusters
[Clustered] Community Clusters
[In-Hub & Spoke] Broadcast Network
[Out-Hub & Spoke] Support Network
www.pewinternet.org/2014/02/20/mapping-twitter-topic-networks-from-polarized-crowds-to-community-clusters

Analyzing Social Media Networks with NodeXL
I. Getting Started with Analyzing Social Media Networks
  1. Introduction to Social Media and Social Networks
  2. Social Media: New Technologies of Collaboration
  3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
  4. Layout, Visual Design & Labeling
  5. Calculating & Visualizing Network Metrics
  6. Preparing Data & Filtering
  7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
  8. Email
  9. Threaded Networks
  10. Twitter
  11. Facebook
  12. WWW
  13. Flickr
  14. YouTube
  15. Wiki Networks
www.elsevier.com/wps/find/bookdescription.cws_home/723354/description

Social Media Research Foundation
Researchers who want to
- create open tools
- generate & host open data
- support open scholarship
Map, measure & understand social media
Support tool projects to collect, analyze & visualize social media data.
smrfoundation.org

8 Golden Rules of Data Science
Preparation
• Choose actionable problems & compelling theories
• Open your mind: domain experts & statisticians
Exploration
• If you don’t have questions, you’re not ready.
• Clean, clean, clean,…your data (gently on the screen)
• Know thy data: ranges, patterns, clusters, gaps, outliers, missing values, uncertainty
Decision
• Evaluate your efficacy, refine your theory
• Take responsibility, own your failures
• World is complex, proceed with humility

Campus Visualization Partnership
Goal: Make UMd a national role model for use of visualization across the curriculum for research & teaching
• Lecture Series (Livestreamed & Archived)
  • Spring: Wattenberg, Ericson, Nowell, etc.
  • Fall: Nick & Niklas, Downs, Gaither, Schwabish, etc.
• Equipment Grants & Software Licenses
• Hire & Win Over
• Courses
viz.umd.edu

UN Millennium Development Goals
To be achieved by 2015
• Eradicate extreme poverty and hunger
• Achieve universal primary education
• Promote gender equality and empower women
• Reduce child mortality
• Improve maternal health
• Combat HIV/AIDS, malaria and other diseases
• Ensure environmental sustainability
• Develop a global partnership for development
www.cs.umd.edu/hcil
@benbendc

For More Information
• Visit the HCIL website for 700+ papers & info on videos: www.cs.umd.edu/hcil
• See Chapter 14 on Info Visualization:
  Shneiderman, B.
  and Plaisant, C., Designing the User Interface: Strategies for Effective Human-Computer Interaction: Fifth Edition (2010) www.awl.com/DTUI
• Edited Collections:
  Card, S., Mackinlay, J., and Shneiderman, B. (1999) Readings in Information Visualization: Using Vision to Think
  Bederson, B. and Shneiderman, B. (2003) The Craft of Information Visualization: Readings and Reflections

For More Information
• Treemaps
  • Hive Group: www.hivegroup.com
  • Marketwatch: www.marketwatch.com/marketmap
  • HCIL Treemap 4.0: www.cs.umd.edu/hcil/treemap
• Spotfire: www.spotfire.com
• TimeSearcher: www.cs.umd.edu/hcil/timesearcher
• NodeXL: nodexl.codeplex.com
• Hierarchical Clustering Explorer: www.cs.umd.edu/hcil/hce
• LifeLines2: www.cs.umd.edu/hcil/lifelines2
• EventFlow: www.cs.umd.edu/hcil/eventflow

Black Swan, Nassim Taleb (2007)
• A highly improbable event:
  • Unpredictable
  • Massive impact
  • Retrospective narrative fabrication
• The world is far, far more complicated than we think, which is not a problem, except when most of us don’t know it.

The Signal & The Noise: Nate Silver (2012)
• New ideas are sometimes found in the most granular details of a problem where few others bother to look.
• Technology is a labor-saving device, but we should not expect machines to do our thinking for us.
• Financial, sports, politics
• Weather, earthquakes, economics, infectious diseases
• Chess, poker
• Global warming, terrorism, bubbles in financial markets

Choose Actionable Problems
• Problems that matter (personal vs global)
• Adequate time for intervention (short vs. long)
• Reliable predictions (lo vs. hi)

Choose Appropriate Theories
Theories
• Descriptive
• Explanatory
• Prescriptive
• Predictive
• Generative
Sources
• Social
• Political
• Economic
• Chemical
• Physical
• Biological
• Cognitive
• Perceptual
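
Supplementary sketch: Treemap layout
The Treemap slides describe the technique as space filling with size coding. As a rough illustration of what that means, the following is a minimal sketch of a slice-and-dice layout, in which each node's rectangle is divided among its children in proportion to their sizes and the split direction alternates with depth. It is not the HCIL Treemap 4.0 implementation. The Node class, the slice_and_dice function, and the sample data are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node. Leaves carry a size; internal nodes sum their children."""
    name: str
    size: float = 0.0
    children: list = field(default_factory=list)

    def total(self) -> float:
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, w, h) rectangles, one per leaf."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: divide the width, left to right.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: divide the height, top to bottom.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical sample hierarchy (not taken from the slides).
market = Node("market", children=[
    Node("energy", children=[Node("oil", 30), Node("gas", 10)]),
    Node("tech", children=[Node("software", 40), Node("hardware", 20)]),
])

rects = slice_and_dice(market, 0.0, 0.0, 100.0, 50.0)
for name, rx, ry, rw, rh in rects:
    print(f"{name:10s} x={rx:6.2f} y={ry:6.2f} w={rw:6.2f} h={rh:6.2f}")
```

Each leaf's area is proportional to its size and the leaves together cover the whole rectangle, which is the "space filling" and "size coding" property listed on the slide. Color coding would be a separate mapping from a second attribute to a fill color for each rectangle.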