Massive Data Analysis Lab (MassDAL) S. Muthukrishnan CS Dept


MassDAL

• Agenda: Gather, manage and process massive data logs: Web, IP/wireless traffic data, location trajectories of objects, and sensor readings of the physical world.

• Key Challenges:

– Scale: Beyond the traditional “human” scale. E.g., IP data at a single router interface for one hour exceeds the total yearly worldwide credit card transactions!

– Data Collection: probes/sensors with associated data quality and communication problems.

• Need breakthroughs in Mathematics, Algorithms, Systems and Engineering to meet these challenges.

• Potential: Major impact in Homeland Security, Telecom, Transportation and society at large.

State of MassDAL

• Mathematics and Computer Science.

– Algorithmic tools for embedding vectors, strings, trees and other objects for “compact” representation.

– Algorithmic tools for analyzing data summaries for heavy hitters, deviants, clustering, decision trees, etc.

– Invited talks at ACM, SIAM and European conferences in Algorithms, Databases, Statistics and Data Mining on novel models and algorithms.

– Over a dozen research papers in the last 2 years on experience with massive data analysis.

– Supported by NSF grants. Partners: MIT, DIMACS.
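As an illustration of the kind of small-space data summary meant by “heavy hitters”, here is a minimal Python sketch of the classic Misra-Gries frequent-items algorithm. It is a well-known textbook technique, not necessarily one of the lab's own tools; the function name `misra_gries` and the sample IP addresses are hypothetical. With k - 1 counters it is guaranteed to retain every item that occurs more than n/k times in a stream of n items, using memory that does not grow with n.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return at most k - 1 candidate heavy hitters with lower-bound counts.

    Any item occurring more than n/k times in a stream of length n is
    guaranteed to appear in the result; each reported count is at most
    n/k below that item's true count.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all of them and drop any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical flow records keyed by source IP address.
flows = ["10.0.0.1"] * 6 + ["10.0.0.2"] * 3 + ["10.0.0.3", "10.0.0.4", "10.0.0.5"]
print(misra_gries(flows, k=3))  # {'10.0.0.1': 3}
```

With k = 3 and 12 records, 10.0.0.1 (6 occurrences, more than 12/3) must survive; its reported count of 3 undercounts the true 6 by no more than 12/3 = 4, which is the trade-off that makes the summary fit in a fixed number of counters.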

State of MassDAL

• Science

– Developing wearable sensors for tracking location of objects as well as “interactions” between objects. Measuring behavioral data.

– Current partner: Telcordia. Their initial investment: $300k/3 months (est.). Potential partner in the works: Los Alamos National Lab.

– Potential: Analysis of social networks for epidemiology, Homeland Security and the health industry.

State of MassDAL

• Engineering.

– Consulting in analysis of wireless network logs: AT&T Wireless, 3rd largest in the US, 20 million customers. Terabytes/month. Fully operational, telco grade!

– Incorporated novel algorithms in operational IP network data analysis tools. Partner: Gigascope.

– Developed a principled approach to data cleaning and data quality monitoring for an operational IP network. Partner: PACMAN.

– Developed new burst-detection algorithms for text streams. Partner: DIMACS, Monitoring message streams.
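The burst-detection algorithms are not described on the slide. As a point of reference only, a simple baseline flags a time window when a term's count jumps well above its recent history. This z-score threshold is an assumed stand-in, not the lab's method; the function name `detect_bursts`, the history length and the threshold are hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_bursts(counts: Sequence[int], history: int = 5, z: float = 3.0) -> List[int]:
    """Return indices of time windows whose count is a burst.

    counts[t] is the number of messages containing some term in window t.
    Window t is flagged when its count exceeds the mean of the previous
    `history` windows by more than z standard deviations.
    """
    if history < 1:
        raise ValueError("history must be at least 1")
    bursts = []
    for t in range(history, len(counts)):
        past = counts[t - history:t]
        mu = mean(past)
        # Floor the deviation at 1 so a perfectly flat history does not
        # flag a window that is only marginally higher.
        sigma = max(pstdev(past), 1.0)
        if counts[t] > mu + z * sigma:
            bursts.append(t)
    return bursts


# Hypothetical hourly counts for one term; hour 7 spikes.
hourly = [2, 3, 2, 3, 2, 3, 2, 15, 3, 2]
print(detect_bursts(hourly))  # [7]
```

After the spike, hour 7 inflates the history's mean and deviation, so the following hours are not flagged; a fixed look-back window keeps the per-window cost constant as the stream grows.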

Future of MassDAL

• See http://cs.rutgers.edu/~muthu/massdal.html

• Research:

– Need breakthrough research in mathematics, systems, databases, algorithms and sensor networking.

• Expand data domains.

– Potential partners: Google, NJ auto insurance fraud data, USPTO patent data, AWS location trajectories, etc.

• Build a state-of-the-art facility at Rutgers.

– Secure, 24x7 data hosting and analysis infrastructure capable of gathering and processing petabytes of data per month across domains and data sources. Unique in the world!

• Potential.

– Every wireless, telecom and Internet service provider is looking to outsource this crucial piece of their operations. Estimated market for these services: hundreds of millions of US dollars per year. Crucial for NJ State. Interest from multiple VCs now.