No Slide Title


MURI — Info. Management Group

Group Co-Leaders:

 Jiawei Han (UIUC)   Chris Clifton (Purdue)  Hillol Kargupta (UMBC)

Core Contributors:

 Latifur Khan (UT Dallas)  Chengxiang Zhai (UIUC)  

Collaborators:

 Murat Kantarcioglu (UT-Dallas)  Shouhuai Xu (UT- San Antonio)  Ninghui Li (Purdue)

Liaisons:

 Ravi Sandhu (UT- San Antonio)  Anupam Joshi (UMBC) April 26, 2020 1

Core Contributors & Current Ph.D. Students

Jiawei Han (UIUC)

 Lu An Tang  Zhijun Yin 

Chengxiang Zhai (UIUC)

 Yuanhua Lv  Hyun Duk Kim 

Hillol Kargupta (UMBC)

 Kamalika Das 

Latifur Khan (UTD)

 Mehedy Masud 

Chris Clifton (Purdue)

 Mummoorthy Murugesan

General Project Goals

Provide information management and analysis support for the project

Major research themes

• Knowledge discovery
• Data integration and fusion
• Measuring and maintaining information quality
• Provenance tracking
• Confidentiality in information management and analysis

Posters Reported in the Kick-Off Meeting

   

Plausibly Deniable Search

 Mummoorthy Murugesan and Chris Clifton

Conforming to Truth with Multiple Conflicting Information Providers on the Web

 Jiawei Han, Xiaoxin Yin, and Philip S. Yu

Privacy-preserving Data Mining within Anonymous Credential Systems

 Shouhuai Xu

User-Centered Adaptive Information Retrieval

 Xuehua Shen, Bin Tan, and ChengXiang Zhai  

Privacy Preserving Distributed Data Mining: A Game-Theoretic Approach

 Kamalika Das and Hillol Kargupta

Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment.

 Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham

On-Going Research Projects

     

Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment

 Mohammad M. Masud, Jing Gao, Latifur Khan, Jiawei Han, and Bhavani Thuraisingham (UTD/UIUC)

Confidentiality Preserving Data Cubes

 Jiawei Han, Lu An Tang and Bolin Ding (UIUC)

Scalable Distributed Privacy-Preserving Local Algorithms for Large Peer-to-Peer Data Mining: A Game Theoretic Approach

 Hillol Kargupta and Kamalika Das (UMBC)

Confidential peer to peer extension to personalized search

 Chengxiang Zhai, Chris Clifton, and Mummoorthy Murugesan (UIUC/Purdue)

Information quality: Understanding and identifying provenance

 ChengXiang Zhai and Jiawei Han (UIUC)

SPDU: A Secure Provenance Management Framework

 Shouhuai Xu and Ravi Sandhu (UTSA)

Discovery in Data Streams for Security Protection

Novel Class Detection in Concept-Drifting Data Streams in a Shared Environment

• Novelty/anomaly detection: a major issue in many applications, especially in a streaming environment
• Goal: detect new classes in data streams
• Approach: efficiently handle the novel class detection task in the presence of concept drift and multiple classes
• The approach is non-parametric: it does not assume any underlying distribution of the data
• Comparisons with state-of-the-art stream classification techniques show that our approach outperforms them
• The technique can be extended to a distributed environment with multiple sources

Confidentiality-Preserving Data Cubes

• Confidentiality-/privacy-/sensitivity-preserving data cubes
• Researchers have been studying confidentiality-preserving database systems (for query processing) and confidentiality-preserving data mining systems
• We propose to investigate confidentiality-preserving data cubes for multidimensional analysis of data warehouses
• Goal: work out mechanisms that let one access maximal information in data cubes for information understanding while losing minimal private information, even under different combinations of OLAP queries
• Extensions: how knowledge discovery can help confidentiality preservation
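To make the trade-off between information access and privacy loss concrete, here is a minimal sketch of small-cell suppression on a count cube. This is a swapped-in textbook safeguard used only for illustration, not the mechanism the project proposes; the function names and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict


def cube_counts(rows, dims):
    """Aggregate rows (dicts) into a count cuboid over the given dimensions."""
    counts = defaultdict(int)
    for row in rows:
        counts[tuple(row[d] for d in dims)] += 1
    return dict(counts)


def suppress_small_cells(counts, min_count):
    """Hide (set to None) every cell backed by fewer than min_count rows."""
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}
```

For example, `suppress_small_cells(cube_counts(rows, ("dept",)), 2)` hides any department with a single record. Suppression applied to each cuboid separately does not by itself stop an analyst from combining several OLAP queries to recover a hidden cell, which is the harder problem the goal above points at.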

Data and Information Integration for Security Protection

• Data fusion: merge/integrate the same objects with different names or identities
• Data distinction: distinguish different objects with identical names
• Information integration by information network analysis
• Veracity analysis to conform to truth with conflicting information provided by multiple websites or other information providers
• Correlation analysis to reduce redundancy and control information disclosure
  – E.g., medical records, patients, medical treatments

Data and Information Access and Management for Security Protection

• Data separation vs. data integration and their role in sensitive information disclosure and correlation discovery
• Privacy-aware indexing to support fast/efficient data access
• Sensitivity-aware query processing and data publishing
• Any other data/information management and analysis issues needed from other groups in the project

Scalable Distributed Local Algorithms for Peer-to-Peer Knowledge Discovery from Sensitive Data

Hillol Kargupta
University of Maryland, Baltimore County
www.cs.umbc.edu/~hillol
www.agnik.com

Acknowledgement: Chengxiang Zhai, Kamalika Das, Kanishka Bhaduri, Kun Liu


Scalable Privacy-Preserving Information Assurance

Challenges in Scalable Knowledge Discovery

• Scaling in large asynchronous distributed environments
• Confidentiality/privacy-preserving data analysis
• Heterogeneous policies and strategies

Applications

• Distributed collaboration
• Distributed search and information retrieval

Motivation: Secure Multi-Party Sum Computation

• Each party has a number
• Compute the sum without divulging the numbers
• Consider a sequence of secure sum operations

Parties 1, 2, and 3 hold $v_1$, $v_2$, and $v_3$; $R$ is uniformly distributed in $[0, N-1]$:

$z_1 = (R + v_1) \bmod N$
$z_2 = (z_1 + v_2) \bmod N$
$z_3 = (z_2 + v_3) \bmod N$
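The chain of masked partial sums can be sketched as follows. This is a minimal single-process simulation, assuming honest, non-colluding parties and a true sum below N. The final unmasking step, in which the first party subtracts R, is the usual way this protocol is completed and is not shown on the slide; the function name `secure_sum` is hypothetical.

```python
import random


def secure_sum(values, modulus, rng=random):
    """Simulate the ring-based secure sum; values[k] is party k+1's private input."""
    if any(not 0 <= v < modulus for v in values):
        raise ValueError("each value must lie in [0, modulus)")
    r = rng.randrange(modulus)       # R, uniform in [0, N-1], known only to party 1
    z = (r + values[0]) % modulus    # z_1 = (R + v_1) mod N
    messages = [z]                   # what each party forwards to the next one
    for v in values[1:]:
        z = (z + v) % modulus        # z_k = (z_{k-1} + v_k) mod N
        messages.append(z)
    total = (z - r) % modulus        # party 1 removes the mask (assumed final step)
    return total, messages
```

Calling `secure_sum([5, 12, 7], modulus=100)` returns a total of 24 together with the three forwarded messages. Because R is uniform, each forwarded message on its own is uniformly distributed and reveals nothing about the inputs added so far.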

Locality Sensitive Distributed Algorithms

• Global algorithms: communicate with the entire network
  – Every node needs to maintain information about the entire network
  – Maintaining this information is resource intensive for large networks
• Local algorithms: communicate only with the local neighborhood
• Bounded-communication local algorithms

Distributed Sum Computation: A Local Approach

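• Each node has a number $x_i[0]$
• Compute the sum
• Update at node $i$, where $\Gamma_i$ is the set of neighbors of node $i$ and $\rho$ is a step size:

$$x_i[t] = x_i[t-1] + \rho \sum_{j \in \Gamma_i} \left( x_j[t-1] - x_i[t-1] \right)$$

• Asymptotically converges to the global sum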

Optimization, Games, and Privacy Preserving Knowledge Discovery

• Multi-party privacy preservation as an optimization problem
• Multi-party, multi-objective optimization
• Blending game theory and mechanism design
• Asynchronous algorithms for achieving equilibrium states

Privacy/Confidentiality Preservation: An Optimization Perspective

• Multi-objective optimization perspective
  – Policies
  – Strategies
  – Performance
• Distributed games for optimizing utility functions

Summary of the Approach

Local Asynchronous Distributed Knowledge Discovery Algorithms that preserve Privacy/Confidentiality

Distributed Search and Information Retrieval Algorithms

Multi-party Optimization Perspective of Privacy/Confidentiality Preservation and Design of Distributed Game Theoretic Mechanisms


Example: Cross-Domain Network Threat Detection

Correlating threats from different network domains

Copyright, Agnik

Motivation: P2P Search Engine

What is the most visited news page in the network today?

Has anybody found a cheap store to buy a digital camera?

What is the best search-key to search for “Child Care”?

Useful Browser Data

• Web-browser history
• Browser cache
• Click-stream data stored at browser (browsing pattern)
• Search queries typed in the search engine
• User profile
• Bookmarks

Challenges

• Indexing, clustering, data analysis in a decentralized asynchronous manner
• Scalability
• Privacy

User-Centered Adaptive Information Retrieval

[Figure: personalized search agents placed between the user and several Web search engines, drawing on viewed Web pages, query history, desktop files, and email; example query “java”.]

User-Centered Adaptive IR

A novel retrieval strategy emphasizing

– user modeling (“user-centered”)
– search context modeling (“adaptive”)
– interactive retrieval

Implemented as a personalized search agent that

– sits on the client side (owned by the user)
– integrates information around a user (1 user vs. N sources as opposed to 1 source vs. N users)
– collaborates with other agents
– goes beyond search toward task support

Reranking of Search Results with UCAIR Toolbar
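As a rough illustration of client-side reranking, the sketch below builds a term-frequency user model from past queries and reorders result snippets by their overlap with that model. This is a deliberately simple stand-in, not the UCAIR toolbar's actual method; the function names and the scoring rule are assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and keep purely alphanumeric tokens."""
    return [w for w in text.lower().split() if w.isalnum()]


def build_user_model(history):
    """Term frequencies over the user's past queries, kept on the client."""
    model = Counter()
    for entry in history:
        model.update(tokenize(entry))
    return model


def rerank(results, model):
    """Order snippets by summed model weight of their distinct terms (stable on ties)."""
    def score(snippet):
        return sum(model[w] for w in set(tokenize(snippet)))
    return sorted(results, key=score, reverse=True)
```

With a history of "java programming tutorial" and "java compiler error", a snippet about the Java programming language ranks above one about travel to the island of Java, even though both match the query “java”. The model never leaves the client, which is the point of placing the agent on the user's side.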


Research Agenda

Develop a scalable methodology for Knowledge Discovery from Multi-Party Data

Design local asynchronous algorithms with bounded communication

Multi-objective Distributed Optimization, Mechanism Design, and Local Algorithms

Designing the Next Generation of Privacy Preserving Distributed Knowledge Discovery Algorithms

Research Agenda

Privacy-preserving user modeling:

How can we model a user’s information need while preserving privacy?

How can we aggregate user models and information needs to control privacy?

P2P information recommendation

  

• P2P architecture: flexible information sharing
• What’s the right protocol for information recommendation?
• How to extend collaborative filtering algorithms to protect user privacy?

Collaborative Search

How can we match information needs with information content at different levels of representation?

From Collaborative Query/Filtering to Information Push

   

Chengxiang Zhai and Chris Clifton (UIUC/Purdue)

• Personalized search: profile of information needs
  – Profile based on prior search, without requiring explicit definition of profile
  – Assist information sources in identifying need to share
• Challenge: profile / search may be sensitive
  – May not be able to reveal to information source (unless they have needed information?)
• Research thrusts:
  – Turning personalized search into profiles
  – Matching information to profiles without disclosing either

SPDU: A Secure Provenance Management Framework

   

Shouhuai Xu and Ravi Sandhu (UTSA)

• Security of provenance management is critical to many applications, including assured information sharing
• The state of the art is that we know little about the security aspect of provenance management
• We propose investigating a comprehensive framework for secure provenance management, as well as supporting architectures and mechanisms for realizing the framework


SPDU

Shouhuai Xu and Ravi Sandhu

• A comprehensive framework for securing provenance and the corresponding information
  – We cannot talk about provenance without touching what the provenance is for (i.e., both data and their provenance are the goals for protection)
• Supporting architectures and mechanisms for realizing the framework

SPDU framework

• The above challenges call for a novel framework for secure provenance management.
• We propose the SPDU framework for this purpose.
  – S stands for Source trustworthiness management
  – P stands for Processing trustworthiness management
  – D stands for Dissemination management
  – U stands for Usage management
• SPDU is application-neutral: it allows plug-and-play application-specific modules (e.g., semantic similarity between two documents)
• SPDU covers the whole lifecycle of information sharing

[Figure: the information-sharing lifecycle of Source, Processing (recursive), Dissemination, and Usage, with a label reading "Information trustworthiness management".]

Eight facets of SPDU

[Figure: "Secure provenance management" at the center, surrounded by the eight facets.]

• Source accountability
• Source privacy
• Processing accountability
• Processing privacy
• Dissemination accountability
• Dissemination privacy
• Usage accountability
• Usage privacy

Information Quality: Understanding and Identifying Provenance

    

ChengXiang Zhai and Jiawei Han (UIUC)

• Credibility of information, particularly information presumed to be from multiple sources, is a challenging issue
• Are multiple reports independent confirmation of the same event? Based on a common report? Reports of different events?
• Propose to use data mining techniques to identify similarities/differences in information that is apparently from different sources, to estimate the likelihood that data is from a single source or from independent sources, and about the same or multiple events
• Propose to develop novel text mining algorithms to analyze "information genealogy" in large amounts of text data from multiple sources and summarize contradictory opinions on a topic
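One very simple baseline for asking whether two reports share a common origin is word-shingle overlap: near-identical wording suggests a shared source rather than independent confirmation. The sketch below uses Jaccard similarity over three-word shingles. It is a swapped-in baseline for illustration only, not the text mining algorithms proposed here; the function names and the 0.5 threshold are arbitrary assumptions.

```python
def shingles(text, k=3):
    """Set of consecutive k-word sequences in the text (empty if too short)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_common_source(report_a, report_b, k=3, threshold=0.5):
    """Flag two reports whose wording overlaps heavily (threshold is an assumption)."""
    return jaccard(shingles(report_a, k), shingles(report_b, k)) >= threshold
```

High overlap flags reports worth treating as one piece of evidence; low overlap does not prove independence, since a paraphrased copy would score low. Distinguishing those cases is where the proposed analysis of information genealogy would be needed.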


Summarizing Contradictory Information

    

• Given a set of text articles from different sources with contradictory information, how can we help analysts digest the information?
• Problem 1: Semantic integration of information from multiple sources
• Problem 2: Detection of contradictory information
• Problem 3: Summarization of contradictory information
• Techniques to explore:
  – text mining with probabilistic models
  – information extraction (e.g., entity/relation extraction)

Questions for YOU!

  

Other data analysis / global statistical model needs?

  Data quality? Lifecycle?

What sort of global statistical models would be of interest to Intelligence Analysts?

Models that transcend data silos

Scenarios for testing

 Sample/surrogate data to support scenarios


Thanks and Questions
