Transcript Slide 1
Visual Thinking & Thinking about Visualization
William Ribarsky
Charlotte Visualization Center
SouthEast RVAC

Visual Thinking
Visual Reasoning

Foraging, Analysis, Reasoning, and Decision-Making for Large Data and Complex Problems
• Objective – Develop capabilities for collecting evidence from large and multiple data sources, with multiple analysis tools. Build hypotheses and use them to steer data collection. Methods must be automated but subject to user control. Integrate all for presentation or decision.
• DHS Mission Impact – New means of support for intelligence analysts, disaster prevention planners, and emergency responders.
[Diagram labels: Sensemaking Loop; STAB; RESIN; Foraging/Analysis Loop]

RESIN: Foraging, Analysis, and Reasoning
Problem – How to reason towards a decision with massive data, several visual analytics tools, and limited time and other resources.
Solution –
1. Build an end-to-end process for reasoning towards a decision with limited time and resources.
2. Provide a mixed-initiative capability so that computer and user can work together, but always under user control.
3. Provide a capability for reasoning about complex problems with several aspects.
[Diagram label: STAB]

RESIN
A runtime view of RESIN’s control panel while solving a task with a tight deadline. The image browser tool is chosen to analyze the data. On the bottom left is a hierarchical description of the problem-solving process, which also captures the real-time execution information of various sub-tasks. On the top right is a partial view of the Markov Decision Process used to compute the decision policy for the task instance.
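To make the idea of a deadline-aware decision policy concrete, here is a minimal sketch of a finite-horizon Markov Decision Process solved by backward induction. It is not RESIN's actual model: the tool names, time costs, success probabilities, and the "evidence quality" reward are all hypothetical values chosen for illustration.

```python
from functools import lru_cache

# Hypothetical tools: name -> (time cost in steps, probability that the tool
# raises the evidence quality by one level). Not taken from RESIN.
TOOLS = {
    "image_browser": (1, 0.4),   # quick but less reliable
    "text_analysis": (2, 0.9),   # slower but more reliable
}
MAX_QUALITY = 3  # hypothetical cap on the evidence-quality level


@lru_cache(maxsize=None)
def plan(time_left: int, quality: int):
    """Return (expected final quality, best action) for one state.

    The state is (time remaining before the deadline, current quality).
    Stopping yields the current quality as the reward.
    """
    best_value, best_action = float(quality), "stop"
    if quality < MAX_QUALITY:
        for name, (cost, p_success) in TOOLS.items():
            if cost > time_left:
                continue  # this tool cannot finish before the deadline
            expected = (
                p_success * plan(time_left - cost, quality + 1)[0]
                + (1 - p_success) * plan(time_left - cost, quality)[0]
            )
            if expected > best_value:
                best_value, best_action = expected, name
    return best_value, best_action


if __name__ == "__main__":
    for deadline in (1, 2, 3):
        print(deadline, plan(deadline, 0))
```

With these made-up numbers the policy picks the quick tool when only one time step remains and the slower, more reliable tool when two steps are available, which is the kind of deadline-dependent choice the control panel caption describes.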
Multimedia: Automated Video Content Analysis

Multimedia: Automated Video Semantic Analysis
• News Interestingness Prediction
[Diagram labels: News Story Collection (set of news stories); User Preference; Usage History; Interestingness Predictor]

Multimedia: Video Semantic Analysis
• News Theme Network Visualization

Multimedia: News Broadcast Analysis
Problem – How should we handle the stream of thousands of stories and themes from many sources over time?
“Ultimately, you gotta read (view) the stories” –John Stasko
Solution –
1. Develop LensRiver and EventRiver capabilities.
2. Develop highly interactive ways to explore themes and sub-themes, their interlinkages, and stories over time.
[Figure caption: LensRiver hierarchical display]

Multimedia: News Broadcast Analysis
Deep Exploration & Reasoning Capabilities
• Hierarchical exploration
• Filter by theme (also, broaden/narrow)
• Shoebox
• Search by Example
[Figure label: EventRiver]

Multimedia: News Broadcast Analysis
Emerging events
Comparing themes and sub-themes for different channels
[Figure labels: Karr plot; JonBenet Ramsey; Emerging Themes]
CNN (above) and Fox top 30 themes from 8/1 to 8/24/2006

With These Tools, There Is Much That Can Be Done (Some of Which Is Underway Already)
• All the news. High value content from official and “semiofficial” news sources at all levels.
• Identification and tracking of events and themes.
• High quality knowledge structures over time.
• Analysis of different viewpoints, different opinions based on origin of story, what is being talked about, who is talking, etc.
  – Local vs. national
  – Different broadcast styles (e.g., Fox vs. CNN vs. Al Jazeera)
• News at one level (e.g., local or for a foreign region) that is not being reported nationally or in other regions.

Integrating Terrorism Data Analysis and News Analysis
• News analysis is the foundation of systematic databases such as the Global Terrorism Database and the Minorities at Risk Database.
• News is a source for most investigative analyses (e.g., fraud and money laundering analysis).
• Compiling the systematic databases is very labor intensive, requiring experienced (i.e., expensive) investigators.
• Other investigations are also laborious.

Integrating Terrorism Data Analysis and News Analysis
• News stories are as much viewpoint and opinion as news.
• Can thus get different angles from local, regional, or different national news sources.
• Automated news analysis provides a complete record of everything that’s going on over any period of time.
• News stories have strong relationships.
• News can follow the flow and change of a story over time.
• News is immediate, but it is also rough and incomplete.

Integrating Terrorism Data Analysis and News Analysis
[Diagram labels: Terrorism Visual Analysis; Terrorism Databases; Terrorism VA; STAB/RESIN Reasoning Environment; Jigsaw; NVAC; Framing, Broadcast Affective Analysis; News VA; Visual Analysis; News Story Databases]
Next: full, Web-based multimedia content and the Dark Web

Visual Analysis of Terrorism Data: Supporting The Investigative Process
[Interface labels: Where; When; Who; What]
Example: selections on the GTD spatio-temporal interface that support investigative analysis. The user would be able to follow over time and space. The user-driven investigation addresses the issues of why.
Five Flexible Entry Components

Terrorism Data Analysis
• Combine continuous and categorical data
• Curved ribbons for better readability of the data
• Layering of ribbons
• First results:
  - Number of terrorists killed depends strongly on type of entity attacked:
    - large number killed when attacking police/military
    - few terrorists killed in most other cases, like businesses, transportation, etc.
Terrorism Data Analysis
Number of female terrorists depends on the region:
  - Female terrorists in Latin America and Europe
  - Hardly any female terrorists in Asia, Middle East, and throughout Africa
Future plans for curved/forked ribbons:
• Full interaction with these ribbons: reordering, highlighting
• Histograms on numerical axes
• Filtering by categorical or numerical axes (including time)

Applying Visual Analytics to Financial Transactions

Relevant Properties of Visual Analytics
• Positioned for exploration and discovery.
  - Highly interactive, contextual views, “unstructured” exploration
• Meant for large and/or complex data, with uncertainty, with missing data (but we may not know where the holes are), with data that are constructed to be purposely misleading.
• Support of analytical reasoning, argument building, evidence gathering and marshaling.
• Support of argument presentation and reporting (smart reporting).

Application: Financial Fraud Analysis
[Diagram labels: All transaction activity; Identify; Interactive Visualization; Google; Prioritize; Investigate; Report]

WireVis: Challenges to Financial Fraud Detection
• Bad guys are smart
  – Automatic detection (black box) approach is reactive to already known patterns
  – Usually, bad guys are one step ahead
• Evaluation is difficult
  – Difficult to obtain “Ground Truth”
  – Financial institutions do not perform law enforcement
    • Suspicious reports are filed
    • Turnaround time on accuracy of reports could be long
    • What is the percentage of fraudulent activities that are actually found and reported?
WireVis: Challenges with Wire Fraud Detection
• Size
  – More than 200,000 transactions per day
• “No transaction by itself is suspicious”
  – “It’s like searching for a needle in a stack of needles” –Bill Fox
• Lack of International Wire Standard
  – Loosely structured data with inherent ambiguity
[Map labels: London; Charlotte, NC; Singapore; Indonesia]

WireVis: Challenges with Wire Fraud Detection
[Map labels: London; Charlotte, NC; Singapore; Indonesia]
• No Standard Form…
  – When a wire leaves Bank of America in Charlotte…
  – The recipient can appear as if receiving at London, Indonesia, or Singapore
• Vice versa, if receiving from Indonesia to Charlotte
  – The sender can appear as if originating from London, Singapore, or Indonesia

WireVis: Using Keywords
• Keywords…
  – Words that are used to filter all transactions
    • Only transactions containing keywords are flagged
  – Highly secretive
  – Typically include
    • Geographical information (country, city names)
    • Business types
    • Specific goods and services
    • Etc.
  – Updated based on intelligence reports
  – Ranges from 200-350 words
  – Could reduce the number of transactions by up to 90%
  – Most importantly, give quantifiable meanings (labels) to each transaction, and are repositories of expert knowledge.

WireVis: Current Practice at Bank of America
• Database Querying
  – Experts filter the transactions by keywords, amounts, date, etc.
  – Results are displayed in a spreadsheet.
• Problems
  – Cannot see more than a week or two of transactions
    • Difficult to see temporal patterns
  – It is difficult to be exploratory using a querying system

WireVis: System Overview
Heatmap View (Accounts to Keywords Relationship)
Search by Example (Find Similar Accounts)
Keyword Network (Keyword Relationships)
Strings and Beads (Relationships over Time)

WireVis: Heatmap View
List of Keywords: sorted by frequency from high to low (left to right)
Hierarchical Clusters of Accounts: sorted by activities from big companies to individuals (top to bottom)
Fast “binning” that takes O(3n)
Number of occurrences of keywords: light color indicates few occurrences

WireVis: Strings and Beads
Each string corresponds to a cluster of accounts in the Heatmap view
Each bead represents a day
Y-axis can be amounts, number of transactions, etc.
Fixed or logarithmic scale
[Axis label: Time]

WireVis: Keyword Network
• Each dot is a keyword
• Position of each keyword is based on its relationships
  – Keywords close to each other appear together more frequently
  – Using a spring network, keywords in the center are the most frequently occurring keywords
• Links between keywords denote co-occurrence

WireVis: Search by Example
Target Account
Histogram depicts the occurrences of keywords
User interactively selects features within the histogram to use in the comparison
Accounts that are within the similarity threshold appear ranked (most similar on top)
Similarity threshold slider

WireVis: Case Study
• Evaluation performed with James Price, lead analyst of WireWatch of Bank of America
• Dataset has been sanitized and down-sampled
• Video
• This system is generalizable to visual analysis of transactional data

WireVis: Integrated with Full Transaction Database
• Scalability
  – We’re now connected to the database at Bank of America with 10-20 million records over the course of a rolling year (13 months)
  – Connecting to a database makes interactive visualization tricky
• Unexpected Results (Access through the VA
interface!)
  – “go to where the data is”
  – operations relating to the data are pushed onto the database (e.g., clustering).
[Diagram labels: Database; SQL; JDBC; Stored Procedure; Temp Tables; WireVis Client; Raw Data]

WireVis: Integrated with Full Transaction Database
• Performance Measurements
  – Data-driven operations such as re-clustering, drill-down, and transaction search by keywords take 1-2 minutes in the worst case.
  – All other interactions remain real time
    • No pre-computation / caching
    • Single CPU desktop computer
• WireVis is in deployment on James Price’s computer at WireWatch for testing and evaluation
This is a general approach applicable to all types of data.

WireVis: Future Work
• Use text analysis (like IN-SPIRE) to automatically identify keywords and associated important terms.
• Relationships between Accounts
  – Seeing who sends money to whom (over time) is important
• Evaluation
  – Working with analysts, try to understand how they use the system and how to better their workflow
• Tracking and Reporting
  – With tracking, we can make the analysis results “repeatable”, “sharable”, and “accountable”

Financial Visual Analytics Workshop
• Met in Charlotte on December 3-4, 2007
• Participants from federal agencies (DHS, CIA, FinCEN, Treasury, DEA), NVAC, banks, the National Insurance Crime Bureau, and several key university researchers.
• Report and recommendations coming out and to be disseminated within the month.

Visual Reasoning (Knowledge Visualization) + Interaction Theory
Can we identify (conjecture) some (design) principles even without a full theory?
Just thinking about visualization tasks in this way can pay off.

Some Ideas That Could Lead to Principles
• “The interaction is the analysis.” --Remco Chang
• Keep interaction simple and direct.
• For more complex problems, have multiple views (more pixels).
  - Each one optimized for its purpose & integrated with the others.
  - Balanced interaction among views.
  - There is a trade-off. How many views?
• Each interactive visualization should have the highest value for that moment in the reasoning process.
Knowledge Visualization

More Ideas
• Determine the highest value (how?)
  - Task-dependent
• But are there valuable visual artifacts that are general, or that would be useful for a whole set of tasks? Or are there general tasks?
  - General Task: Exploration and Discovery
• Alternatively, are there ways to set up high value visualizations where the artifacts that populate them are task-dependent but the way to set them up is general (e.g., spatio-temporal layouts)?
• Can we build models, even rather crude heuristic models, with predictive capability?

Determining the Value: Knowledge Visualization
[Diagram labels: Data Visualization; Information Visualization; Knowledge Visualization]

Properties of Knowledge
• Knowledge is of higher value than information or data.
• Knowledge begets knowledge.
• Knowledge is compact.
• Knowledge is connected (more connections, more value).
• Labeling is important (also, captions, titles, text annotations).
• Knowledge artifacts are the elements of reasoning.
• Knowledge can be made independent of user and context (including domain).

What is Knowledge?
Knowledge is the “perception of agreement or disagreement of two ideas.” -- John Locke (1689)
Ideas: The content of cognition; specific thoughts.
• To distinguish between ideas, one needs an inferential framework.
• The basic element in such a framework is two concepts (or ideas) and a connecting inference.
Thus knowledge is built of ideas and their inferential relations.
• In an ontology, the basic element is two objects or concepts and their linking (inferential) relation.
[Diagram label: United States]
[Diagram labels: Montana; Washington]

The Value of Visualization
Visualization Model
[Diagram: blocks "data", "visualization", and "user", connected through D, S, V, Im, P, K, E, dK/dt, and dS/dt]
D: data
S: specifications
V: visualization
P: knowledge process
Im: resultant image
E: interactive exploration
-van Wijk, 2005

The Value of Visualization
[Plot labels: Knowledge; Data; value; time t]
$Im(t) = V(D, S, t)$
$\frac{dK}{dt} = P(Im, K)$
$K(t) = K_0 + \int_0^t P(Im, K, t)\,dt$
P is a functional, and the integral is a path integral!

Cost/Benefit Analysis
Return on Investment: $G = nm\,W(\Delta K)$
Profit: $F = G - C = nm\,(W(\Delta K) - C_s - kC_e) - C_i - nC_u$

What is the Role of Interaction?
• The principal role of interaction in knowledge visualization is to involve the user intimately in exploration, discovery, and knowledge creation.
• The best interactive interface should have an air of inevitability, successfully answering the question “what next?”
$K(t) = K_0 + \int_0^t P(Im, K, t)\,dt$
Interaction selects the path that maximizes the above.

Knowledge Visualization: Bioinformatics

Questions?
www.srvac.uncc.edu
www.viscenter.uncc.edu
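
As a numerical companion to the "Value of Visualization" and "Cost/Benefit Analysis" slides, the sketch below evaluates the knowledge integral and the profit expression for toy inputs. Several things here are assumptions rather than slide content: the image Im is folded into a single callable standing in for P, the integral is approximated with a simple Euler sum, and the readings of n (users), m (sessions per user), k (exploration steps per session), and the cost terms follow the usual summary of van Wijk's 2005 model, which the slide itself does not spell out. All numbers are made up.

```python
def accumulate_knowledge(perceive, k0, t_end, steps=1000):
    """Euler approximation of K(t_end) = K0 + integral_0^t_end P dt.

    `perceive(k, t)` is a hypothetical stand-in for P(Im, K, t); the image
    is assumed to be captured inside the callable.
    """
    dt = t_end / steps
    k = k0
    for i in range(steps):
        k += perceive(k, i * dt) * dt
    return k


def profit(n, m, k, value_of_gain, c_i, c_u, c_s, c_e):
    """F = G - C = n*m*(W(dK) - C_s - k*C_e) - C_i - n*C_u.

    Assumed meanings: n users, m sessions per user, k exploration steps per
    session; c_i initial development cost, c_u per-user cost, c_s per-session
    cost, c_e cost per exploration step; value_of_gain is W(dK).
    """
    gain = n * m * value_of_gain                          # G = n m W(dK)
    cost = c_i + n * c_u + n * m * c_s + n * m * k * c_e  # total cost C
    return gain - cost


if __name__ == "__main__":
    # Constant knowledge rate of 2 units per unit time over 5 time units.
    print(accumulate_knowledge(lambda k, t: 2.0, k0=0.0, t_end=5.0))
    print(profit(n=10, m=5, k=2, value_of_gain=4.0,
                 c_i=20.0, c_u=3.0, c_s=1.0, c_e=0.5))
```

Because the profit grows with n·m while the initial costs do not, a visualization used by many people over many sessions can pay off even when each session yields a modest knowledge gain.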