Machine Learning for Big Data, Methods, Trends and

Big Data Mining and Machine Learning
A. Taylan Cemgil
24.12.2012, ITO Istanbul
http://www.cmpe.boun.edu.tr/pilab



Machine Learning
Use Cases
Supervised Learning
  • Classification

Unsupervised Learning
  • Clustering
  • Dimensionality Reduction

Probabilistic Approach to Machine Learning

Probability Theory
Graphical Models, Probabilistic Expert Systems
Time Series
Matrix and Tensor Factorization
Sensor Fusion
Scaling up Machine Learning
  • Architectures

References
ML for Big Data, Cemgil, 24.12.2012

Collection of computational methods to …
  • Detect hidden patterns in data
  • Create useful predictions about unseen data
  • Make decisions under uncertainty
  • Transform raw data into useful knowledge
Mathematics and Statistics
• Optimization
• Numerical Linear Algebra
• Probability Theory
Computer Science
• Databases
• Parallel Processing
• Artificial Intelligence
• Information Retrieval
• Graphics/Visualization
Electrical Engineering
• Pattern Recognition
• Signal Processing
• Detection/Estimation
• Information Theory
• Data Compression



Facets of the same problem
Differences in emphasis/terminology
Historical Evolution of the fields
  • Data Mining: Database systems, Data Structures
  • Statistics: Probability Theory, Mathematics
  • Machine Learning: Artificial Intelligence, Pattern Recognition




Thinking about old methods with a new mindset
… and inventing new ones
Curse/Blessing of Dimensionality
Infrastructure is cheaper
  • Cloud Computing
  • Sensor Networks (“new kind of data”)
  • Speed (“real time”)


Emphasis on System Integration
Reached Critical Mass/Mature technology


“data explosion is bigger than Moore's law”
Computers get faster and cheaper every year but
the amount of data that needs to be processed
grows even faster.
[Figure: curves labelled DATA and CPU]
Power    American/Turkish (short)    European (long)
10^3     Thousand                    Thousand
10^6     Million                     Million
10^9     Billion                     Milliard
10^12    Trillion                    Billion
10^15    Quadrillion                 Billiard
10^18    Quintillion                 Trillion
…        …                           …

General form: 1000 × 1000^n (short), 1000000^n (long)
Unit              Decimal    Binary
kilobyte (kB)     10^3       2^10
megabyte (MB)     10^6       2^20
gigabyte (GB)     10^9       2^30
terabyte (TB)     10^12      2^40
petabyte (PB)     10^15      2^50
exabyte (EB)      10^18      2^60
zettabyte (ZB)    10^21      2^70
yottabyte (YB)    10^24      2^80
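The decimal and binary readings of each prefix listed above drift apart as the units grow. A minimal sketch of that arithmetic follows; the names PREFIXES and binary_excess are illustrative and not from the slides.

```python
# Prefixes from the list above: decimal value 10^(3k), binary value 2^(10k).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def binary_excess(k):
    """Relative excess of 2^(10k) over 10^(3k), as a fraction."""
    return 2 ** (10 * k) / 10 ** (3 * k) - 1

for k, name in enumerate(PREFIXES, start=1):
    print(f"{name:>5}: 10^{3 * k} vs 2^{10 * k} differ by {binary_excess(k):.1%}")
```

At the kilobyte level the two readings differ by 2.4%, and the gap widens with every further prefix.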
1 TB = 1 000 000 000 000 bytes = 1 trillion bytes
1 PB = 1 000 000 000 000 000 bytes = 1 quadrillion bytes

CERN: The Large Hadron Collider produces about 15 petabytes of data per year
(15 000 × 1 TB)

Google processes about 24 petabytes of data per day
(24 000 × 1 TB)

Facebook’s Hadoop Distributed File System (HDFS) is reported to hold about 100 PB
(100 000 × 1 TB)

Global Internet traffic per month in 2011 is estimated at about 27 500 PB (source: Cisco)
(27 500 000 × 1 TB)
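The multipliers quoted with these figures follow from 1 PB = 1000 TB under decimal prefixes. A small sketch of the conversion, using only the volumes quoted above; the names volumes_pb and one_tb_units are illustrative.

```python
TB_PER_PB = 1000  # decimal prefixes: 1 PB = 10^15 bytes = 1000 x 10^12 bytes

# Volumes in petabytes, as quoted on the slides.
volumes_pb = {
    "CERN LHC, per year": 15,
    "Google, processed per day": 24,
    "Facebook HDFS": 100,
    "Global Internet traffic, per month (2011)": 27_500,
}

def one_tb_units(petabytes):
    """Number of 1 TB units needed to hold the given number of petabytes."""
    return petabytes * TB_PER_PB

for label, pb in volumes_pb.items():
    print(f"{label}: {pb} PB = {one_tb_units(pb):,} x 1 TB")
```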
We are drowning in data and starving for knowledge
– J. Naisbitt
(from Machine Learning, a probabilistic perspective, KP Murphy)






Product Recommendation
Market Basket Analysis
Event/Activity/Behavior Analysis
Campaign management and optimization
Supply-chain management and analytics
Market and consumer segmentations

Netflix: 18K movies × 500K users, 99% sparse
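A back-of-envelope sketch of what this sparsity implies for storage. It assumes "99% sparse" means that 99% of the movie-user cells hold no rating, and it assumes 4-byte fields; both assumptions are illustrative and not stated on the slide.

```python
movies = 18_000
users = 500_000
sparse_percent = 99  # share of empty cells, from the slide

cells = movies * users
observed = cells * (100 - sparse_percent) // 100

# Assumption: 4 bytes per cell when stored densely. A coordinate (COO) layout
# stores (row, column, value) as three 4-byte fields per observed rating.
dense_bytes = cells * 4
coo_bytes = observed * 12

print(f"cells:    {cells:,}")
print(f"observed: {observed:,}")
print(f"dense storage: {dense_bytes / 1e9:.2f} GB")
print(f"COO storage:   {coo_bytes / 1e9:.2f} GB")
```

Under these assumptions only the observed ratings need to be stored, which is why sparse layouts are the usual choice for recommendation data of this shape.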







Network Monitoring and Performance Optimization
Pricing Optimization
Customer Churn Management
Call Detail Record (CDR) Analysis
(Mobile) User Behavior Analysis
Cybersecurity, Detection and Prevention of DDoS Attacks
Infrastructure Planning



Fraud Detection/Risk Estimation
High Speed Trading
Anomaly/Changepoint Detection






Clickstream Segmentation and Analysis
Ad Targeting/Selection, Forecasting and Optimization
Click Fraud Detection/Prevention
Social Graph Analysis
Customer Segmentation
Newsgroup/Blog/Social Media opinion tracking

Community Detection (source: matlab exchange)

Ad Personalization: Match ads with users
  • Key income generator for Google, Yahoo




Urban Traffic Management
Energy Grid Management/Optimization,
Power Generation Management
Environment Monitoring





Diagnosis and Medical Expert systems
Health Insurance fraud detection
Patient care quality and program analysis
Drug discovery
Remote Monitoring

X(gene, sample, time)

Pragmatic view
  • Small Data: Naïve algorithms are feasible
  • Medium Data: Feasibly processed on one machine
  • Big Data: Does not fit on one machine

Complex relational data
  • Analysis of pairwise/higher-order interactions between entities

Classification
Feature 1    Feature 2    Feature 3    Feature 4    Class
5.1          4.3          2.1          0.3          0
5.7          3.5          3.2          0.8          0
3.4          5.2          0.4          0.6          1
x_1          x_2          x_3          x_4          c

c \approx f(w_1 x_1 + w_2 x_2 + \cdots + w_N x_N)
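The linear form above can be sketched on the three table rows. The slide does not say which function f is used, so the logistic function here is an assumption, as are the step size, the number of passes and the helper names; the weights are fitted by simple stochastic gradient descent with no bias term.

```python
import math

# Rows from the slide's table: four features and a binary class label.
X = [
    [5.1, 4.3, 2.1, 0.3],
    [5.7, 3.5, 3.2, 0.8],
    [3.4, 5.2, 0.4, 0.6],
]
labels = [0, 0, 1]

def f(z):
    """Logistic squashing function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """c ~ f(w_1 x_1 + ... + w_N x_N)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)))

def train(X, labels, passes=2000, lr=0.05):
    """Fit the weights by stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(passes):
        for x, c in zip(X, labels):
            err = predict(w, x) - c
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

w = train(X, labels)
predicted = [round(predict(w, x)) for x in X]
print("weights:", [round(wi, 2) for wi in w])
print("predicted classes:", predicted)
```

With only three rows this illustrates the mechanics, not a realistic model: the data are trivially separable and there is nothing held out to test on.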

Ad Prediction on a Cluster of 1000 Machines

What is the probability that a given ad will be clicked, given some context?
A Reliable Effective Terascale Linear Learning System, Agarwal et al., 2012
Features: 16 million
Number of examples: 17 billion
3TB Entries
1000 Machines
1. On each node, use online learning independently to find a parameter vector.
2. Use AllReduce to average the weights.
3. On each node, compute the sum of the gradients over its examples.
4. Use AllReduce to add the gradients across nodes.
5. Use L-BFGS to update the weight vector; go to step 3.
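The five steps above can be simulated on a single machine to show the data flow. This is a sketch under stated assumptions, not the system from the paper: the "nodes" are list slices, allreduce_sum is a stand-in for a real AllReduce, the data are synthetic, and step 5 uses a plain gradient step in place of L-BFGS to keep the example short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def local_gradient(w, shard):
    """Step 3: sum of logistic-loss gradients over one node's examples."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sigmoid(score(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += err * xj
    return g

def online_pass(shard, dim, lr=0.1):
    """Step 1: one pass of online (stochastic) gradient descent on a node."""
    w = [0.0] * dim
    for example in shard:
        g = local_gradient(w, [example])
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def allreduce_sum(vectors):
    """Stand-in for AllReduce: the element-wise sum every node would receive."""
    return [sum(column) for column in zip(*vectors)]

def train(shards, dim, rounds=50, lr=0.5):
    n = sum(len(s) for s in shards)
    # Steps 1-2: independent online learning, then average the weights.
    total = allreduce_sum([online_pass(s, dim) for s in shards])
    w = [t / len(shards) for t in total]
    for _ in range(rounds):
        # Steps 3-4: local gradient sums, added across nodes.
        g = allreduce_sum([local_gradient(w, s) for s in shards])
        # Step 5: the slide uses L-BFGS; a plain gradient step is swapped in.
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]
    return w

# Synthetic, linearly separable data (hypothetical, for illustration only).
random.seed(0)
true_w = [2.0, -1.0]
data = []
for _ in range(400):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, 1 if score(true_w, x) > 0 else 0))

shards = [data[i::4] for i in range(4)]  # four simulated nodes
w = train(shards, dim=2)
accuracy = sum((score(w, x) > 0) == (y == 1) for x, y in data) / len(data)
print("weights:", w, "training accuracy:", accuracy)
```

The point of the structure is that only weight and gradient vectors cross node boundaries; the examples themselves never leave their shard.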



Clustering
Dimensionality Reduction
Visualization

Terms-Documents

Probability Theory
  • “Probability theory is nothing but common sense reduced to calculation” – P. Laplace

Graphical Models, Probabilistic Expert Systems
Time Series
  • Example: Network flow classification
Graphical Model Through Time
Mobile 3G usage patterns: monitor applications without Deep Packet Inspection (DPI)
8-hour capture, anonymised, without payload: 1 TB
Joint work with Kurt, Mungan and Saygun, with Ericsson/Avea (FP7 MEVICO)

Tracking
Matrix with missing entries (?):

  1    2    1.5
  ?    4    3
  3    6    ?
  4    8    6.1

The same matrix with a column factor (left) and a row factor (top):

       1    2    1.5
  1    1    2    1.5
  2    ?    4    3
  3    3    6    ?
  4    4    8    6.1

Missing entries filled in from the factors:

       1    2    1.5
  1    1    2    1.5
  2    2    4    3
  3    3    6    4.5
  4    4    8    6.1
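The fill-in shown above can be reproduced by fitting a rank-one model to the observed cells only. The sketch below uses alternating least squares, which is one possible method and not necessarily the one used in the talk; the function name rank_one_als and the iteration count are illustrative.

```python
# Observed entries of the 4 x 3 matrix from the slides; None marks a missing cell.
X = [
    [1.0, 2.0, 1.5],
    [None, 4.0, 3.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 6.1],
]

def rank_one_als(X, iters=200):
    """Fit X[i][j] ~ u[i] * v[j] on observed cells by alternating least squares."""
    rows, cols = len(X), len(X[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        # Fix v, solve a one-dimensional least-squares problem for each u[i].
        for i in range(rows):
            obs = [j for j in range(cols) if X[i][j] is not None]
            u[i] = sum(X[i][j] * v[j] for j in obs) / sum(v[j] ** 2 for j in obs)
        # Fix u, solve for each v[j] in the same way.
        for j in range(cols):
            obs = [i for i in range(rows) if X[i][j] is not None]
            v[j] = sum(X[i][j] * u[i] for i in obs) / sum(u[i] ** 2 for i in obs)
    return u, v

u, v = rank_one_als(X)
filled = [
    [X[i][j] if X[i][j] is not None else u[i] * v[j] for j in range(len(X[0]))]
    for i in range(len(X))
]
for row in filled:
    print([round(value, 2) for value in row])
```

Because the observed entry 6.1 is slightly off the exact product 4 × 1.5, the fitted values land near 2 and 4.5 rather than exactly on them.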
Slide from ICML 2011 tutorial, Langford et al.



A. Gray, Analyzing Massive Datasets, Skytree, ML Company
Data Scientist: The Sexiest Job of the 21st Century (HBR)
Agarwal et al., A Reliable Effective Terascale Linear Learning System

Data is not Knowledge
  • More Data is not more Knowledge

ML for Big Data requires a new mindset for algorithm design
Big Data is not only about entities but also about their relations and interactions
Many applications; ML provides viable solutions
New CS education: need more Maths, Physics and Social Science majors
Big Data = Big Potential





Ground Truth Labelling
Difficult but a must
Cheaters abound
Validation of labellers + qualification test
Amazon Mechanical Turk