On Grid Performance Evaluation using Synthetic Workloads


Resource and Test Management in Grids
Dick Epema, Catalin Dumitrescu, Hashim Mohamed,
Alexandru Iosup, Ozan Sonmez
Parallel and Distributed Systems Group
Delft University of Technology
July 7, 2015
Rapid Prototyping in e-Science VL-e Workshop, Amsterdam, NL
A Brief Introduction to Grid Computing
• Typical grid environment, e.g., the DAS
• Applications [!]
• Resources
• Compute (Clusters)
• Storage
• (Dedicated) Network
• Virtual Organizations, Projects (e.g., VL-e), Groups, Users
• Grids vs. (traditional) parallel production environments
• Dynamic
• Heterogeneous
• Very large-scale (world)
• No central administration
→ Most problems are NP-hard, need experimental validation
Outline
• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
- The Co-Allocation Problem in Grids
- The Koala Design
- Koala and the DAS Community
- The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
- Grid Performance Evaluation Issues
- The GrenchMark Architecture
- Experience with GrenchMark
• Take home message
The Co-allocation Problem in Grids (1)
Motivation
• Co-allocation = the simultaneous allocation of resources
in multiple clusters to single applications
which consist of multiple components
• Reasons
• Use more resources than are available at a single cluster at a given time
• Create a specific virtual environment (e.g., visualization cluster,
geographically spread data)
• Achieve reliability through replication on multiple clusters
• Avoid resource contention on the same site (e.g., batches)
The Co-allocation Problem in Grids (2)
Overall Example
[Figure: KOALA with a global queue on top of clusters that have local queues with local schedulers (LS); labels: global job, local jobs, load sharing, co-allocation]
Source: Dick Epema
The Co-allocation Problem in Grids (3)
Details: Processors and Data Co-Alloc.
• Jobs have access to processors and data from many sites
• Files stored at different file sites, replicas may exist
• Scheduler decides on job component placement at execution sites
• Jobs can be of high or low priority
Source: Hashim Mohamed
The Co-allocation Problem in Grids (4)
Details: Co-Allocated Job Types
• fixed jobs: job component size and placement fixed by user
• non-fixed jobs: job component size fixed by user, placement by scheduler decision
• semi-fixed jobs: job component size and placement by scheduler decision / fixed by user
• flexible jobs: job component size and placement by scheduler decision
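The four job types can be told apart by what the user pins down. The sketch below is one possible reading, not Koala's actual data model: `Component`, `job_type`, and the rule that a job with a mix of user-placed and scheduler-placed components counts as semi-fixed are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    """Hypothetical job component; None means 'left to the scheduler'."""
    size: Optional[int] = None   # number of processors requested
    site: Optional[str] = None   # execution site chosen by the user


def job_type(components):
    """Classify a job by which attributes the user has fixed (assumed rules)."""
    sizes_fixed = all(c.size is not None for c in components)
    sites_fixed = [c.site is not None for c in components]
    if not sizes_fixed:
        return "flexible"        # scheduler decides size and placement
    if all(sites_fixed):
        return "fixed"           # user decides size and placement
    if any(sites_fixed):
        return "semi-fixed"      # assumed: some placements by user, some by scheduler
    return "non-fixed"           # user decides size, scheduler decides placement
```

For example, `job_type([Component(4, "A"), Component(4)])` is classified as semi-fixed under these assumed rules.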
The Koala Design
Source: Hashim Mohamed
• Selection: placing job components
• Control: transfer executable and input files
• Instantiation: claiming resources selected for each job component
• Run: submit, then monitor job execution (fault-tolerance)
The Koala Selection Step
Many Placement Policies
• Originally supported co-allocation policies:
• Worst-Fit: balance job components across sites
• Close-to-Files: take into account the locations of input files
to minimize transfer times
• (Flexible) Cluster Minimization: mitigate inter-cluster
communication; can also split the job automatically
• But, different application types require
different ways of component placement
• So:
• Modular structure with pluggable policies
• Take into account internal structure of applications
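A minimal sketch of a Worst-Fit style placement, to make the "balance job components across sites" idea concrete. It assumes components are handled largest first and each goes to the cluster with the most idle processors at that moment; the function name, the input format, and the failure rule (return `None` if any component does not fit) are illustrative assumptions, not Koala's implementation.

```python
def worst_fit(component_sizes, idle):
    """Place each component on the cluster with the most idle processors.

    component_sizes: list of processor counts, one per job component.
    idle: dict mapping cluster name -> idle processors (not modified).
    Returns a list of (size, cluster) pairs, or None if placement fails.
    """
    idle = dict(idle)  # work on a copy so the caller's view is untouched
    placement = []
    for size in sorted(component_sizes, reverse=True):
        site = max(idle, key=idle.get)   # cluster with the most idle processors
        if idle[site] < size:
            return None                  # even the emptiest cluster is too small
        idle[site] -= size
        placement.append((size, site))
    return placement
```

With `idle = {"A": 16, "B": 13, "C": 6}` and components of sizes 4, 8, and 2, the component of size 8 goes to A and the other two go to B. A Close-to-Files variant would replace the `max` criterion with an estimate of file transfer time.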
The Koala Selection Step
HOCs: Exploiting Application Structure
• Higher-Order Components:
• Pre-packaged software components with generic
patterns of parallel behavior
• Patterns: master-worker, pipelines, wavefront
• Benefits:
• Facilitates parallel programming in grids
• Enables user-transparent scheduling in grids
• Most important additional middleware:
• Translation layer that builds a performance model
from the HOC patterns and the user-supplied
application parameters
• Supported by KOALA (with Univ. of Münster)
• Initial results: up to 50%
reduction in runtimes
The Koala Instantiation Step
The Runners
• Problem:
How to support many application
types, each with specific
(and difficult) requirements?
• Solution:
runners (=interface modules)
• Currently supported:
• Any type of single-component job
• MPI/DUROC jobs
• Ibis jobs
• HOC applications
• API for extensions: write your own!
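The "write your own" idea can be pictured as a small plug-in registry. This is a hypothetical sketch only: `Runner`, `register`, `runner_for`, and the `submit` signature are invented for illustration and do not reflect the actual Koala runner API.

```python
from abc import ABC, abstractmethod

RUNNERS = {}  # hypothetical registry: application type -> runner class


class Runner(ABC):
    """Hypothetical interface module for one application type."""

    @abstractmethod
    def submit(self, component, site):
        """Start one job component on the given execution site."""


def register(app_type):
    """Class decorator that makes a runner available under app_type."""
    def decorator(cls):
        RUNNERS[app_type] = cls
        return cls
    return decorator


@register("single")
class SingleComponentRunner(Runner):
    def submit(self, component, site):
        # A real runner would talk to the grid middleware here.
        return f"submit {component} to {site}"


def runner_for(app_type):
    """Look up and instantiate the runner for an application type."""
    return RUNNERS[app_type]()
```

Supporting a new application type then means adding one more decorated class, without touching the scheduler core.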
Koala and the DAS Community
• Extensive experience gathered while assessing various
co-allocation policies: over 25,000 completed jobs!
• Koala was released on the DAS in Sep 2005
[ www.st.ewi.tudelft.nl/koala/ ]
• Hands-on Tutorials (last in Spring 2006)
• Documentation (web-site)
• Papers
• IEEE Cluster’04, Dagstuhl FGG’04, EGC’05,
IEEE CCGrid’05, IEEE Cluster’06, etc.
• Koala helps you get results:
• IEEE CCGrid’06, others submitted
The Future of Koala
• Support for more application types, e.g.,
• Workflows, Parameter sweep applications
• Scheduling your application?
• Communication-aware and
application-aware scheduling policies:
• Take into account the communication pattern of applications
when co-allocating
• Also schedule bandwidth (in DAS3)
• Support heterogeneity
• DAS3
• DAS2 + DAS3
• DAS3 + Grid’5000 + RoGRID
• Peer-to-peer structure instead of hierarchical grid scheduler
Outline
• A Brief Introduction to Grid Computing
• Koala: Processor and Data Co-Allocation in Grids
- The Co-Allocation Problem in Grids
- The Koala Design
- Koala and the DAS Community
- The Future of Koala
• GrenchMark: Analyzing, Testing, and Comparing Grids
- Grid Performance Evaluation Issues
- The GrenchMark Architecture
- GrenchMark and the DAS Community
• Take home message
Grid Performance Evaluation
Current Practice
• Performance Indicators
• Define my own metrics, or use utilization (U) and average wait/response time (AWT/ART), or both
• Workload Structure
• Run my own workload; mostly under the (unrealistic) assumption
that all users are created equal
• Do not make comparisons (incompatible workloads)
• No repeatability of results (e.g., background load)
→ Need a common performance evaluation framework for Grid: GrenchMark
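For reference, the commonly used indicators can be computed from per-job timestamps as sketched below. The `Job` record and the utilization formula (processor-seconds used divided by processor-seconds available between the first submission and the last completion) are assumptions for illustration; other definitions of U exist.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float  # submission time (s)
    start: float   # start of execution (s)
    end: float     # end of execution (s)
    procs: int     # processors used


def awt(jobs):
    """Average wait time: mean of (start - submit)."""
    return sum(j.start - j.submit for j in jobs) / len(jobs)


def art(jobs):
    """Average response time: mean of (end - submit)."""
    return sum(j.end - j.submit for j in jobs) / len(jobs)


def utilization(jobs, total_procs):
    """Assumed definition: used processor-seconds over available ones."""
    span = max(j.end for j in jobs) - min(j.submit for j in jobs)
    busy = sum((j.end - j.start) * j.procs for j in jobs)
    return busy / (span * total_procs)
```

Two experiments are only comparable on these numbers if they ran the same workload under the same background load, which is the gap a common framework is meant to close.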
GrenchMark: a Framework for
Analyzing, Testing, and Comparing Grids
• What’s in a name?
grid benchmark → working towards a generic tool for the whole
community: help standardize the testing procedures,
but it is too early for benchmarks; we use
synthetic grid workloads instead
• What’s it about?
A systematic approach to analyzing, testing, and comparing grid settings,
based on synthetic workloads
• A set of metrics and workload units for analyzing grid settings [JSSPP’06]
• A set of representative grid applications
• Both real and synthetic
• Easy-to-use tools to create synthetic grid workloads
• Flexible, extensible framework
GrenchMark Overview: Easy to
Generate and Run Synthetic Workloads
… but More Complicated Than You Think
• Workload structure
• User-defined and
statistical models
• Dynamic job arrival
• Burstiness and self-similarity
• Feedback, background load
• Machine usage assumptions
• Users, VOs
• Measurement methods
• Long workloads
• Saturated / non-saturated system
• Start-up, production, and
cool-down scenarios
• Scaling workload to system
• Applications
• Synthetic
• Real
• Metrics
• A(W) Run/Wait/Resp. Time
• Efficiency, MakeSpan
• Failure rate [!]
• (Grid) notions
• Co-allocation, interactive jobs,
malleable, moldable, …
• Workload definition language
• Base language layer
• Extended language layer
• Other
• Can use the same workload for
both simulations and real
environments
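A minimal sketch of a statistical workload model with dynamic job arrival, assuming Poisson arrivals (exponential inter-arrival times) and a uniformly random choice of application per job. The function name, the job record layout, and the application names in the usage note are hypothetical; this is not GrenchMark's workload definition language, and it does not model burstiness or self-similarity.

```python
import random


def synthetic_workload(n_jobs, mean_interarrival, apps, seed=0):
    """Generate n_jobs job descriptions with exponential inter-arrival times.

    A fixed seed makes the workload repeatable, so the same job stream can
    be submitted to a simulator and to a real environment.
    """
    rng = random.Random(seed)
    t = 0.0
    jobs = []
    for i in range(n_jobs):
        t += rng.expovariate(1.0 / mean_interarrival)  # rate = 1 / mean
        jobs.append({"id": i, "submit": t, "app": rng.choice(apps)})
    return jobs
```

Calling `synthetic_workload(100, 30.0, ["app-a", "app-b"], seed=42)` twice yields the same 100 jobs, which addresses the repeatability concern as far as the workload itself is concerned; background load still has to be controlled separately.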
GrenchMark and the DAS community
• Generic Performance Evaluation [IEEE CCGrid’06]
• Grid System Analysis
• Performance testing, What-if analysis
• Functionality Testing in Grid Environments
• System functionality testing, Periodic testing
• Comparing Grid Settings
• Single site vs. co-allocated jobs
• Releasing the Koala Grid Scheduler on the DAS
• 5,000+ jobs successfully run (in all workloads)
• Functionality tests for 3 different job submission modules
• GrenchMark was released in Nov 2005
[ grenchmark.st.ewi.tudelft.nl ]
Take home message
• PDS Group/TU Delft
- resource and test management in Grid systems
• Koala: Processor and Data Co-Allocation in Grids
[ www.st.ewi.tudelft.nl/koala/ ]
- Grid scheduling with co-allocation and fault-tolerance
- many placement policies available
- extensible runners system
- easy-to-use, flexible
- tutorials, on-line documentation, papers
• GrenchMark: Analyzing, Testing, and Comparing Grids
[ grenchmark.st.ewi.tudelft.nl ]
- generic tool for the whole community
- generates diverse grid workloads
- easy-to-use, flexible, portable, extensible, …
Thank you!
Questions?
Remarks? Observations?
All welcome!
www.st.ewi.tudelft.nl/koala
grenchmark.st.ewi.tudelft.nl/