Super Scaling PROOF to Very Large Clusters
Maarten Ballintijn, Kris Gulbrandsen, Gunther Roland / MIT
Rene Brun, Fons Rademakers / CERN
Philippe Canal / FNAL
CHEP 2004
Outline
PROOF Overview
Benchmark Package
Benchmark results
Other developments
Future plans
PROOF – Parallel ROOT Facility
Interactive analysis of very large sets of ROOT data files on a cluster of computers
Exploits the inherent parallelism in event data
The main design goals are:
Transparency, scalability, adaptability
On the GRID, extends from a local cluster to a wide-area virtual cluster or a cluster of clusters
A collaboration between the ROOT group at CERN and the MIT Heavy Ion Group
PROOF, continued
(diagram: the user connects over the Internet to a master, which drives multiple slaves)
Multi-tier architecture
Optimized for data locality
WAN-ready and GRID-compatible
PROOF - Architecture
Data Access Strategies
Local data first; also rootd, rfio, SAN/NAS
Transparency
Input objects are copied from the client
Output objects are merged and returned to the client (see the sketch below)
Scalability and Adaptability
Vary the packet size (with the specific workload, slave performance and dynamic load)
Heterogeneous Servers
Migrate to multi-site configurations
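As an illustration of the transparency goal, the session sketch below shows input objects being shipped to the slaves and the merged output being retrieved on the client; the master name, input object and selector name are placeholders, and the exact calls can differ between ROOT versions.

% root
root [0] gROOT->Proof("master.example.org")             // open a PROOF session (placeholder master)
root [1] gProof->AddInput(new TNamed("cuts","pt>1"))    // input objects are copied to every slave
root [2] dset->Process("MySelector.C")                  // dset is a TDSet describing the data files
root [3] TList *out = gProof->GetOutputList()           // merged output objects, back on the client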
Outline
PROOF Overview
Benchmark Package
Dataset generation
Benchmark TSelector
Statistics and Event Trace
Benchmark results
Other developments
Future plans
Dataset generation
Use the ROOT "Event" example class
A script for creating the PAR file is provided
Data are generated on all nodes that run slaves
Slaves generate the data files in parallel
Specify the location, size and number of files
% make_event_par.sh
% root
root [0] gROOT->Proof()
root [1] .X make_event_trees.C("/tmp/data",100000,4)
root [2] .L make_tdset.C
root [3] TDSet *d = make_tdset()
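The make_tdset.C macro referenced above ships with the benchmark package; a minimal sketch of such a macro, assuming four nodes and an "EventTree" tree name (both placeholders, not the actual benchmark values), could look like this:

// make_tdset.C -- sketch: build a TDSet describing the generated benchmark files
// (node names, path and tree name are placeholders)
TDSet *make_tdset()
{
   TDSet *d = new TDSet("TTree", "EventTree");
   const char *nodes[] = { "node1", "node2", "node3", "node4" };
   for (int i = 0; i < 4; i++)
      d->Add(Form("root://%s//tmp/data/event_%d.root", nodes[i], i + 1));
   return d;
}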
Benchmark TSelector
Three selectors are used:
EventTree_NoProc.C – empty Process() function, reads no data
EventTree_Proc.C – reads all the data and fills a histogram (in fact only about 35% of the bytes are read in this test)
EventTree_ProcOpt.C – reads a fraction of the data (20%) and fills a histogram
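To illustrate how a selector can read only a fraction of the data, the sketch below shows a Process() implementation that fetches a single branch of the Event example and fills a histogram; the branch, member and histogram names are illustrative and are not taken from EventTree_ProcOpt.C.

// Sketch of a partial-read Process(), in the spirit of EventTree_ProcOpt.C
// b_fNtrack and fHist are assumed to be set up in Init()/SlaveBegin()
Bool_t MySelector::Process(Long64_t entry)
{
   b_fNtrack->GetEntry(entry);   // read only the branch that is needed, not the full event
   fHist->Fill(fNtrack);         // fill the histogram that will be merged on the master
   return kTRUE;
}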
Statistics and Event Trace
Global histograms to monitor the master
Number of packets, number of events, processing time and get-packet latency, per slave
Can be viewed using the standard feedback mechanism (see the sketch after this list)
Trace tree: a detailed log of events during the query
Master only, or master and slaves
A detailed list of the recorded events follows
Implemented using standard ROOT classes and PROOF facilities
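A minimal sketch of viewing such statistics through the feedback mechanism is given below; the feedback object name and master name are placeholders, and the exact names depend on the PROOF version in use.

% root
root [0] gROOT->Proof("master.example.org")           // placeholder master
root [1] gProof->AddFeedback("PROOF_PacketsHist")     // request a feedback object (illustrative name)
root [2] TDrawFeedback fb(gProof)                     // draws feedback histograms while the query runs
root [3] dset->Process("EventTree_Proc.C")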
Events recorded in Trace
Each event contains a timestamp and the
recording slave or master
Begin and End of Query
Begin and End of File
Packet details and processing time
File Open statistics (slaves)
File Read statistics (slaves)
Easy to add new events
Outline
PROOF Overview
Benchmark Package
Benchmark results
Other developments
Future plans
Benchmark Results
CDF cluster at Fermilab
160 nodes, initial tests
Pharm, the PHOBOS private cluster, 24 nodes:
6 × dual 730 MHz P3
6 × dual 930 MHz P3
12 × dual 1.8 GHz P4
Dataset:
1 file per slave, 60,000 events, 100 MB
Results on Pharm
Results on Pharm, continued
Local and remote file open
(figure: file-open time measurements, local vs. remote)
Slave I/O Performance
Benchmark Results
Phobos-RCF, the central facility at BNL, 370 nodes total:
75 × dual 3.05 GHz P4, IDE
99 × dual 2.4 GHz P4, IDE
18 × dual 1.4 GHz P3, IDE
Dataset:
1 file per slave, 60,000 events, 100 MB
PHOBOS RCF LAN Layout
Results on Phobos-RCF
Looking at the problem
Processing time distributions
Processing time, detailed
Request packet from Master
Benchmark Conclusions
The benchmark and measurement facility
has proven to be a very useful tool
Do not use NFS-based home directories
LAN topology is important
LAN speed is important
More testing is required to pinpoint the sporadic long latencies
Outline
PROOF Overview
Benchmark Package
Benchmark results
Other developments
Future plans
Other developments
Packetizer fixes and a new development version
Parallel PROOF startup
TDrawFeedback
TParameter utility class (see the sketch below)
TCondor improvements
Authentication improvements
Introduction of Long64_t
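As an example of the TParameter utility class, the sketch below passes a named scalar to the slaves through the input list and reads it back in the selector; the parameter name and value are purely illustrative.

// Client side: ship a named value to all slaves via the input list
gProof->AddInput(new TParameter<Long64_t>("MaxEvents", 60000));   // placeholder name and value

// Slave side, e.g. in TSelector::SlaveBegin(): read the value back from fInput
TParameter<Long64_t> *p =
   dynamic_cast<TParameter<Long64_t>*>(fInput->FindObject("MaxEvents"));
Long64_t maxEvents = p ? p->GetVal() : -1;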
Outline
PROOF Overview
Benchmark Package
Benchmark results
Other developments
Future plans
Future plans
Understand and solve the LAN latency problem
In the prototype stage:
TProof::Draw()
Multi-level master configuration
Documentation
HowTo
Benchmarking
PEAC, the PROOF Grid scheduler
The End
Questions?
Parallel Script Execution
(diagram: a local PC running root connects through TNetFile to a remote PROOF cluster; the "proof" master server, configured via proof.conf, drives "proof" slave servers on node1–node4, each reading its local *.root files with TFile; stdout and the result objects return to the local ana.C session)

#proof.conf
slave node1
slave node2
slave node3
slave node4

$ root
root [0] tree->Process("ana.C")    // local run, equivalent to .x ana.C
root [1] gROOT->Proof("remote")    // connect to the remote PROOF master
root [2] dset->Process("ana.C")    // the same script now runs in parallel on the cluster
Simplified message flow
Client -> Master: SendFile
Master -> Slave(s): SendFile
Client -> Master: Process(dset, sel, inp, num, first)
Master -> Slave(s): GetEntries
Master -> Slave(s): Process(dset, sel, inp, num, first)
Slave(s) -> Master: GetPacket
Slave(s) -> Master: ReturnResults(out, log)
Master -> Client: ReturnResults(out, log)
TSelector control flow
On the client (TProof/TSelector): Begin()
Input objects are sent to the slaves
On each slave (TSelector): SlaveBegin(), then Process() for every entry
Output objects are returned to the client
On each slave: SlaveTerminate()
On the client: Terminate()
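A minimal TSelector skeleton following this control flow might look like the sketch below; the class and histogram names are placeholders, and the data members a real selector needs (tree pointer, branch addresses) are omitted.

// Sketch of a TSelector matching the control flow above (names are placeholders)
class MySelector : public TSelector {
public:
   TH1F *fHist;                          // created on each slave, merged via the output list

   void   Begin(TTree *)      { /* client: runs once before the query starts */ }
   void   SlaveBegin(TTree *) {
      fHist = new TH1F("h", "example", 100, 0., 100.);
      fOutput->Add(fHist);               // objects in fOutput are merged and sent back
   }
   Bool_t Process(Long64_t entry) {
      // read the branches needed for this entry and fill fHist
      return kTRUE;
   }
   void   SlaveTerminate()    { /* slave: runs after its last entry */ }
   void   Terminate()         { /* client: runs after merging, e.g. draw fHist */ }

   ClassDef(MySelector, 0)
};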
PEAC System Overview
Active Files during Query
Pharm Slave I/O
Active Files during Query