Benchmarking Datacenter and Big Data Systems

Wanling Gao, Zhen Jia, Lei Wang, Yuqing Zhu, Chunjie Luo, Yingjie Shi, Yongqiang He, Shiming Gong, Xiaona Li, Shujie Zhang, Bizhu Qiu, Lixin Zhang, Jianfeng Zhan

INSTITUTE OF COMPUTING TECHNOLOGY
http://prof.ict.ac.cn/jfzhan
Acknowledgements

This work is supported by the Chinese 973 project (Grant No. 2011CB302502), the Hi-Tech Research and Development (863) Program of China (Grant No. 2011AA01A203, No. 2013AA01A213), the NSFC project (Grant No. 60933003, No. 61202075), the BNSF project (Grant No. 4133081), and Huawei funding.
Executive summary

 An open-source project on datacenter and big data benchmarking: ICTBench
• http://prof.ict.ac.cn/ICTBench
 Several case studies using ICTBench
Question One

 Gap between Industry and Academia
 The distance grows longer and longer in:
• Code
• Data sets
Question Two

 Different communities have different benchmark requirements
 Architecture communities
• Simulation is very slow
• Need small data and code sets
 System communities
• Large-scale deployment is valuable
 Users need real-world applications
• "There are three kinds of lies: lies, damn lies, and benchmarks"
State-of-Practice Benchmark Suites

 SPEC CPU
 TPC-C
 SPEC Web
 HPCC
 Gridmix
 PARSEC
 YCSB
Why a New Benchmark Suite for Datacenter Computing

 No existing benchmark suite covers the diversity of datacenter workloads
 State-of-the-art: CloudSuite
• Includes only six applications, selected according to their popularity
Why a New Benchmark Suite (Cont’)

 Memory Level Parallelism (MLP): the number of simultaneously outstanding cache misses

[Figure: MLP of CloudSuite vs. our benchmark suite DCBench.]
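As a point of reference, MLP is commonly quantified in the architecture literature (a standard definition, not one stated on the slide) as the average number of outstanding misses over the cycles in which at least one miss is outstanding:

```latex
\mathrm{MLP} \;=\; \frac{\sum_{c \in C_{\mathrm{miss}}} \mathrm{outstanding}(c)}{\lvert C_{\mathrm{miss}} \rvert},
\qquad
C_{\mathrm{miss}} = \{\, c \;:\; \mathrm{outstanding}(c) \ge 1 \,\}
```

A higher MLP means the memory system overlaps more misses, which is why it is a useful axis for contrasting CloudSuite and DCBench workloads.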
Why a New Benchmark Suite (Cont’)

 Scale-out performance

[Figure: speedup on 1, 4, and 8 working nodes for the DCBench workloads (sort, grep, wordcount, svm, kmeans, fkmeans, all-pairs, Bayes, HMM) and the CloudSuite data analysis benchmark; y-axis: speedup, 1 to 6.]
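Speedup here is presumably the usual strong-scaling ratio (my reading of the axis labels, not a definition given on the slide):

```latex
\mathrm{Speedup}(n) \;=\; \frac{T_{1}}{T_{n}}
```

where T_1 is the execution time on one working node and T_n the time on n nodes, so ideal scaling on 8 nodes would reach a speedup of 8.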
Outline

 Background and Motivation
 Our ICTBench
 Case studies
ICTBench Project

 ICTBench: three benchmark suites
• DCBench: architecture (application, OS, and VM execution)
• BigDataBench: system (large-scale big data applications)
• CloudRank: cloud benchmarks (distributed management); not covered in this talk
 Project homepage: http://prof.ict.ac.cn/ICTBench
• The source code is available
DCBench

 DCBench: typical datacenter workloads
• Different from scientific computing (FLOPS-centric)
• Covers applications in important domains, e.g., search engines and electronic commerce
• Each benchmark = a single application
 Purposes
• Architecture and (small-to-medium scale) system research
BigDataBench

 Characterizing big data applications
• Not including data-intensive supercomputing
• Synthetic data sets varying from 10 GB to PB scale
• Each benchmark = a single big data application
 Purposes
• Large-scale system and architecture research
CloudRank

 Cloud computing
• Elastic resource management
• Consolidating different workloads
 Cloud benchmarks
• Each benchmark = a group of consolidated datacenter workloads
• Services / data processing / desktop
 Purposes
• Capacity planning, system evaluation and research
• Users can customize their benchmarks
Benchmarking Methodology

 Decide and rank the main application domains according to a publicly available metric
• e.g., page views and daily visitors
 Single out the main applications from the main application domains
Top Sites on the Web

[Pie chart: share of the top sites on the web by domain — Search Engine, Social Network, Electronic Commerce, Media Streaming, and Others; segments of 40%, 25%, 15%, 15%, and 5%.]

More details at http://www.alexa.com/topsites/global;0
Main Algorithms in Search Engines

[The top-sites pie chart repeated for context.]
Algorithms used in search:
• PageRank
• Graph mining
• Segmentation
• Feature reduction
• Grep
• Statistical counting
• Vector calculation
• Sort
• Recommendation
• ……

Main Algorithms in Search Engines (Nutch)

[Workflow diagram of the Nutch search engine; stages include BFS, Word Grep, Word Count, Segmentation, Merge Sort, Vector calculate, PageRank, Classification (DecisionTree), Scoring & Sort, and Sort.]
Main Algorithms in Social Networks

[The top-sites pie chart repeated for context.]
Algorithms used in social networks:
• Recommendation
• Clustering
• Classification
• Graph mining
• Grep
• Feature reduction
• Statistical counting
• Vector calculation
• Sort
• ……
Main Algorithms in Electronic Commerce

[The top-sites pie chart repeated for context.]
Algorithms used in electronic commerce:
• Recommendation
• Association rule mining
• Warehouse operation
• Clustering
• Classification
• Statistical counting
• Vector calculation
• ……
Overview of DCBench

Category | Workloads | Programming model | Language | Source
Basic operation | Sort | MapReduce | Java | Hadoop
Basic operation | Wordcount | MapReduce | Java | Hadoop
Basic operation | Grep | MapReduce | Java | Hadoop
Classification | Naïve Bayes | MapReduce | Java | Mahout
Classification | Support Vector Machine | MapReduce | Java | Implemented by ourselves
Cluster | K-means | MapReduce / MPI | Java / C++ | Mahout / IBM PML
Cluster | Fuzzy k-means | MapReduce / MPI | Java / C++ | Mahout / IBM PML
Recommendation | Item-based Collaborative Filtering | MapReduce | Java | Mahout
Association rule mining | Frequent pattern growth | MapReduce | Java | Mahout
Segmentation | Hidden Markov model | MapReduce | Java | Implemented by ourselves
Overview of DCBench (Cont’)

Category | Workloads | Programming model | Language | Source
Warehouse operation | Database operations | MapReduce | Java | Hive-bench
Feature reduction | Principal Component Analysis | MPI | C++ | IBM PML
Feature reduction | Kernel Principal Component Analysis | MPI | C++ | IBM PML
Vector calculate | Paper similarity analysis | All-Pairs | C & C++ | Implemented by ourselves
Graph mining | Breadth-first search | MPI | C++ | Graph500
Graph mining | PageRank | MapReduce | Java | Mahout
Service | Search engine | C/S | Java | Nutch
Service | Auction | C/S | Java | RUBiS
Service | Media streaming | C/S | Java | CloudSuite
Methodology of Generating Big Data

 Preserve the characteristics of real-world data (a toy version of the expand step is sketched below)

[Diagram: small-scale data → analysis → characteristic data → expand → big data. The preserved characteristics are semantics (e.g., word frequency) and locality, both temporal (word reuse distance) and spatial (word distribution in documents).]
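As a concrete illustration of the expand step, here is a minimal sketch (our own, not the project's actual generator) that preserves only the semantic characteristic, word frequency; the real methodology also models temporal locality (word reuse distance) and spatial locality (word distribution in documents):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Toy expander: grows a small seed corpus into a large synthetic one
 *  while preserving the seed's word-frequency distribution. */
public class FreqPreservingExpander {
  public static void main(String[] args) throws Exception {
    String seed = new String(Files.readAllBytes(Paths.get(args[0])));
    long targetWords = Long.parseLong(args[1]);

    // Analysis: extract the characteristic data (word frequencies).
    Map<String, Integer> freq = new HashMap<>();
    for (String w : seed.split("\\s+")) {
      if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
    }

    // Build a cumulative distribution for weighted sampling.
    List<String> words = new ArrayList<>(freq.keySet());
    long[] cdf = new long[words.size()];
    long total = 0;
    for (int i = 0; i < words.size(); i++) {
      total += freq.get(words.get(i));
      cdf[i] = total;
    }

    // Expansion: draw words in proportion to their seed frequency.
    // (Buffered in memory for brevity; a real generator streams to disk.)
    Random rng = new Random(42);
    StringBuilder out = new StringBuilder();
    for (long i = 0; i < targetWords; i++) {
      long r = (long) (rng.nextDouble() * total);
      int lo = 0, hi = words.size() - 1;
      while (lo < hi) {                      // binary-search the CDF
        int mid = (lo + hi) >>> 1;
        if (cdf[mid] <= r) lo = mid + 1; else hi = mid;
      }
      out.append(words.get(lo)).append(i % 20 == 19 ? '\n' : ' ');
    }
    Files.write(Paths.get(args[2]), out.toString().getBytes());
  }
}
```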
Workloads in BigDataBench 1.0 Beta

 Analysis workloads
• Simple but representative operations: Sort, Grep, Wordcount (Wordcount is sketched below)
• Highly recognized algorithms: Naïve Bayes, SVM
 Search engine service workloads
• Widely deployed services: Nutch Server
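For flavor, the tables above list Wordcount with Hadoop as its source, so the workload is essentially Hadoop's classic example job. The sketch below uses the standard org.apache.hadoop.mapreduce API and mirrors the version shipped with Hadoop; it is not copied from the benchmark's own source tree:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation cuts shuffle I/O
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```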
Variety of Workloads are Included

[Diagram classifying the six workloads along off-line vs. on-line, base operations vs. machine learning, and I/O bound vs. CPU bound vs. hybrid: Sort, Wordcount, and Grep are off-line base operations; Naïve Bayes and SVM are off-line machine learning; Nutch Server is the on-line workload.]
Features of Workloads

Workloads | Resource characteristic | Computing complexity | Instructions
Sort | I/O bound | O(n*lgn) | Integer comparison domination
Wordcount | CPU bound | O(n) | Integer comparison and calculation domination
Grep | Hybrid | O(n) | Integer comparison domination
Naïve Bayes | CPU bound | O(m*n) [m: the length of dictionary] | Floating-point computation domination
SVM | CPU bound | O(M*n) [M: the number of support vectors * dimension] | Floating-point computation domination
Nutch Server | I/O & CPU bound | / | Integer comparison domination
Outline

 Background and Motivation
 Our ICTBench
 Case studies
Use Case 1: Microarchitecture Characterization

 Using DCBench
 Five-node cluster
• One master and four slaves (working nodes)
• Each node: [hardware configuration listed on the slide]
Instruction Execution Level

[Figure: breakdown of kernel-level vs. application-level instructions for the DCBench data analysis workloads (Naive Bayes, SVM, Grep, WordCount, K-means, Fuzzy K-means, PageRank, Sort, Hive-bench, IBCF, HMM), the service workloads (Software Testing, Media Streaming, Data Serving, Web Search, Web Serving), SPEC (FP, INT, Web), and HPCC (COMM, DGEMM, FFT, HPL, PTRANS, RandomAccess, STREAM).]

 Data analysis workloads have more application-level instructions
 Service workloads have higher percentages of kernel-level instructions
Pipeline Stall

 DC workloads suffer severe front-end stalls (i.e., instruction fetch stalls)
 Services: more RAT (Register Allocation Table) stalls
 Data analysis: more RS (Reservation Station) and ROB (ReOrder Buffer) full stalls

[Figure: breakdown of stall cycles — instruction fetch stall, RAT stall, load stall, RS full stall, store stall, ROB full stall.]
Architecture Block Diagram

[Block diagram of the processor microarchitecture.]
Front End Stall Reasons

 For DC workloads, high instruction cache miss and instruction TLB miss rates make the front end inefficient

[Figure: L1 I-cache misses per kilo-instruction (0 to 100) and ITLB page walks per kilo-instruction (0 to 0.35).]
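The per-kilo-instruction normalization used on this and the following slides is simply:

```latex
\mathrm{MPKI} \;=\; \frac{1000 \times \text{event count}}{\text{retired instructions}}
```

so, for example, an L1 I-cache MPKI of 40 means 40 instruction-cache misses for every thousand instructions retired.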
MLC Behaviors

 DC workloads have more MLC (L2 cache) misses than HPC workloads
 Data analysis workloads have better locality (fewer L2 cache misses)

[Figure: L2 cache misses per kilo-instruction for the data analysis, service, and HPCC workload groups.]
LLC Behaviors

 The LLC is good enough for DC workloads
 Most L2 cache misses can be satisfied by the LLC

[Figure: the ratio of L2 cache misses satisfied by the L3 cache for the same set of workloads.]
DTLB Behaviors

 DC workloads incur more DTLB misses than HPC workloads
 Most data analysis workloads have fewer DTLB misses

[Figure: DTLB page walks per kilo-instruction (0 to 2.5) for the data analysis, service, and HPCC workload groups.]
Branch Prediction

 DC:
• Data analysis workloads have pretty good branch behaviors
• The service workloads' branches are hard to predict

[Figure: branch misprediction ratio (0.00% to 8.00%) for the data analysis, service, and HPCC workload groups.]
DC Workloads Characteristics

 Data analysis applications share many inherent characteristics, which place them in a different class from desktop, HPC, traditional server, and scale-out service workloads
 More details can be found in our IISWC 2013 paper:
• Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013)
Use Case 2: Architecture Research

 Using BigDataBench 1.0 Beta
 Data scale: 10 GB – 2 TB
 Hadoop configuration: 1 master and 14 slave nodes
Use Case 2: Architecture Research

 Some microarchitectural events tend toward stability once the data volume increases to a certain extent (a convergence check is sketched below)
 Cache and TLB behaviors show different trends with increasing data volume for different workloads
• L1I_miss/1000ins: increases for Sort, decreases for Grep
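The practical question this raises for simulation-based studies is how much data is "enough". One simple convergence check (our own sketch, not a tool from the talk; the volumes and MPKI values below are made up) is to sweep the data volume and report the smallest size after which the metric's relative change stays within a tolerance:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Finds the smallest data volume after which a metric (e.g., L1I MPKI)
 *  changes by at most `tol` (relative) between consecutive volumes. */
public class StablePoint {
  static long stableVolume(Map<Long, Double> metricByVolume, double tol) {
    Long prevVol = null;
    double prevVal = 0;
    long candidate = -1;
    for (Map.Entry<Long, Double> e : metricByVolume.entrySet()) {
      if (prevVol != null) {
        double rel = Math.abs(e.getValue() - prevVal) / prevVal;
        // Keep a candidate while consecutive changes stay small; reset otherwise.
        candidate = (rel <= tol) ? (candidate < 0 ? prevVol : candidate) : -1;
      }
      prevVol = e.getKey();
      prevVal = e.getValue();
    }
    return candidate; // -1 if the metric never settles
  }

  public static void main(String[] args) {
    Map<Long, Double> l1iMpki = new LinkedHashMap<>(); // GB -> MPKI (made-up numbers)
    l1iMpki.put(10L, 8.4);
    l1iMpki.put(50L, 9.6);
    l1iMpki.put(100L, 10.1);
    l1iMpki.put(500L, 10.2);
    l1iMpki.put(2000L, 10.25);
    System.out.println("Stable from ~" + stableVolume(l1iMpki, 0.05) + " GB");
  }
}
```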
Search Engine Service Experiments

 The same phenomenon is observed
• Index size: 2 GB ~ 8 GB
• Segment size: 4.4 GB ~ 17.6 GB
 Microarchitectural events tend toward stability when the index size increases to a certain extent
 Big data imposes challenges on architecture research, since large-scale simulation is time-consuming
Use Case 3: System Evaluation

 Using BigDataBench 1.0 Beta
 Data scale: 10 GB – 2 TB
 Hadoop configuration: 1 master and 14 slave nodes
System Evaluation

 There is a threshold for each workload (100 MB ~ 1 TB)
• The system is fully loaded when the data volume exceeds the threshold
 Sort is an exception
• An inflexion point appears (10 GB ~ 1 TB): the data processing rate decreases after this point
• Global data access requirements cause I/O and network bottlenecks
 System performance is dependent on applications and data volumes
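Reading data processing rate in the usual sense (an assumption on my part, not a definition from the talk):

```latex
\mathrm{Rate}(V) \;=\; \frac{V}{T(V)}
```

where V is the input data volume and T(V) the time to process it; past the inflexion point, T(V) grows faster than V because global data access turns I/O and the network into the bottleneck.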
Conclusion

 ICTBench: an open-source project on datacenter and big data benchmarking
• DCBench
• BigDataBench
• CloudRank
 http://prof.ict.ac.cn/ICTBench
Publications

 Characterizing OS Behavior of Scale-out Data Center Workloads. Chen Zheng et al. Seventh Annual Workshop on the Interaction amongst Virtualization, Operating Systems and Computer Architecture (WIVOSCA 2013), in conjunction with ISCA 2013.
 Characterization of Real Workloads of Web Search Engines. Huafeng Xi et al. 2011 IEEE International Symposium on Workload Characterization (IISWC 2011).
 The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems. Zhen Jia et al. Second Workshop on Big Data Benchmarking (WBDB 2012 India) & Lecture Notes in Computer Science (LNCS).
 CloudRank-D: Benchmarking and Ranking Cloud Computing Systems for Data Processing Applications. Chunjie Luo et al. Frontiers of Computer Science (FCS) 2012, 6(4): 347–362.
 BigDataBench: a Big Data Benchmark Suite from Web Search Engines. Wanling Gao, et al. The Third Workshop on Architectures and Systems for Big Data (ASBD 2013), in conjunction with ISCA 2013.
 Characterizing Data Analysis Workloads in Data Centers. Zhen Jia, et al. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013).
Thank you!
Any questions?