CMU SCS
SDM’07 Panel
Data Mining Research: Current
Status and Future Opportunities
Christos Faloutsos
CMU
Questions
Q1: What are future challenges and
opportunities for data mining that are not
presently receiving as much attention as
they deserve?
Q2: Are there things we are doing now that
we should be rethinking in considering
future challenges and opportunities for
data mining?
SDM'07
C. Faloutsos, CMU
Past + current successes
• cross-disciplinarity: DM = Stat, ML, DB
• fascinating apps:
– Bio-informatics
– privacy
– security
– streams
– social network mining
– ...
Machine Learning to support Systems Biology:
Subcellular Location - Bob Murphy
• Cell images of many proteins
• Feature extraction, graphical models, clustering of proteins by pattern
• Generative models for each pattern
• Combine to enable accurate simulation of cell behavior
Q1: Challenges to focus on
• Scalability – mining terabytes and petabytes
– stream mining (anomaly, intrusion detection, sensors)
– graph mining (text/web mining, marketing, ...)
– autonomic systems
– search engines
– national security
– ...
Scalability
• Google: > 450,000 processors in clusters of ~2000 processors each
[Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro, 2003]
• target: hundreds of terabytes to several petabytes
E.g.: self-* system @ CMU
• >200 nodes
• 40 racks of computing equipment
• 774 kW of power
• target: 1 petabyte
• goal: self-correcting, self-securing, self-monitoring, self...
DM for Tera- and Peta-bytes
Two-way street:
<- DM can use such infrastructures to find patterns
-> DM can help such infrastructures become self-healing, self-adjusting, ‘self-*’
Q2: What to do differently
• emphasis on Systems – DM collaboration