Super Scaling PROOF to very large clusters

Maarten Ballintijn, Kris Gulbrandsen, Gunther Roland / MIT
Rene Brun, Fons Rademakers / CERN
Philippe Canal / FNAL
CHEP 2004
Outline
• PROOF Overview
• Benchmark Package
• Benchmark results
• Other developments
• Future plans
PROOF – Parallel ROOT Facility
• Interactive analysis of very large sets of ROOT data files on a cluster of computers
• Employ inherent parallelism in event data
• The main design goals are:
  • Transparency, scalability, adaptability
• On the GRID, extended from local cluster to wide-area virtual cluster or cluster of clusters
• Collaboration between ROOT group at CERN and MIT Heavy Ion Group
PROOF, continued
[Diagram: the user connects over the Internet to a PROOF master, which drives multiple slave servers.]
• Multi-tier architecture
• Optimize for data locality
• WAN ready and GRID compatible
PROOF – Architecture
• Data access strategies
  • Local data first; also rootd, rfio, SAN/NAS
• Transparency
  • Input objects copied from client
  • Output objects merged, returned to client
• Scalability and adaptability
  • Vary packet size (specific workload, slave performance, dynamic load)
  • Heterogeneous servers
  • Migrate to multi-site configurations
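As a small illustration of the transparency point above: objects added to the input list on the client appear in the selector's fInput list on every slave, and whatever the slaves add to fOutput is merged and returned. The sketch below is illustrative only; the object name "RunTag" and the commented-out dataset are assumptions, not part of PROOF or the benchmark.

// Sketch: input objects travel client -> slaves, output objects come back merged.
#include "TProof.h"
#include "TNamed.h"

void transparency_example()
{
   // Client side: shipped to every slave together with the next query.
   gProof->AddInput(new TNamed("RunTag", "chep04-benchmark"));
   // gProof->Process(dset, "EventTree_Proc.C+");
}

// Slave side, inside the selector:
//   TNamed *tag = (TNamed*) fInput->FindObject("RunTag");   // copied input object
//   fOutput->Add(fHist);                                     // merged and returned to the client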
Outline
• PROOF Overview
• Benchmark Package
  • Dataset generation
  • Benchmark TSelector
  • Statistics and Event Trace
• Benchmark results
• Other developments
• Future plans
Dataset generation
• Use the ROOT "Event" example class
  • Script for creating the PAR file is provided
• Generate data on all nodes with slaves
  • Slaves generate data files in parallel
  • Specify location, size and number of files

% make_event_par.sh
% root
root[0] gROOT->Proof()
root[1] .x make_event_trees.C("/tmp/data",100000,4)
root[2] .L make_tdset.C
root[3] TDSet *d = make_tdset()
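make_tdset.C itself is not reproduced in the slides. Purely as a sketch of what such a macro could look like, assuming the benchmark trees are named "EventTree" and that each node wrote its files under /tmp/data (the host names and file names below are placeholders):

// make_tdset.C -- hypothetical sketch; the real paths follow the arguments
// passed to make_event_trees.C above.
#include "TDSet.h"

TDSet *make_tdset()
{
   TDSet *d = new TDSet("TTree", "EventTree");     // assumed tree name
   // One file per slave, stored locally on each node.
   d->Add("root://node1//tmp/data/event_tree_1.root");
   d->Add("root://node2//tmp/data/event_tree_2.root");
   // ... one Add() per generated file ...
   return d;
}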
Benchmark TSelector
• Three selectors are used:
  • EventTree_NoProc.C – Empty Process() function, reads no data
  • EventTree_Proc.C – Reads all data and fills histogram (actually only 35% read in this test)
  • EventTree_ProcOpt.C – Reads a fraction of the data (20%) and fills histogram
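The selectors themselves ship with the benchmark package and are not listed in the slides. The following is only a sketch of what the optimized variant could look like, assuming a split tree of the ROOT "Event" example class; the branch, member and histogram names are illustrative:

// Sketch of an EventTree_ProcOpt-style selector: read a single branch
// instead of the whole event. EventTree_Proc.C would call
// fChain->GetEntry(entry) to read everything, and EventTree_NoProc.C
// would return from Process() without reading at all.
#include "TSelector.h"
#include "TTree.h"
#include "TBranch.h"
#include "TH1F.h"

class EventTree_ProcOpt : public TSelector {
public:
   TTree   *fChain;     // current tree, set in Init()
   TBranch *fNtrackBr;  // the only branch this selector reads
   Int_t    fNtrack;
   TH1F    *fHist;

   EventTree_ProcOpt() : fChain(0), fNtrackBr(0), fNtrack(0), fHist(0) {}

   void Init(TTree *tree) {
      fChain = tree;
      fChain->SetBranchAddress("fNtrack", &fNtrack, &fNtrackBr);
   }
   void SlaveBegin(TTree *) {
      fHist = new TH1F("ntrack", "tracks per event", 100, 0, 1000);
      fOutput->Add(fHist);           // returned to the client and merged
   }
   Bool_t Process(Long64_t entry) {
      fNtrackBr->GetEntry(entry);    // reads only a fraction of the data
      fHist->Fill(fNtrack);
      return kTRUE;
   }
   Int_t Version() const { return 2; }

   ClassDef(EventTree_ProcOpt, 0);
};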
Statistics and Event Trace
• Global histograms to monitor the master
  • Number of packets, number of events, processing time, get-packet latency; per slave
  • Can be viewed using standard feedback
• Trace tree, detailed log of events during query
  • Master only, or master and slaves
  • Detailed list of recorded events follows
• Implemented using standard ROOT classes and PROOF facilities
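A rough illustration of how the feedback mechanism mentioned above might be used from a client session; the histogram name "PROOF_PacketsHist" is an assumption for this example, the real feedback object names are defined inside PROOF:

// Sketch: watch the per-slave monitoring histograms while a query runs.
#include "TProof.h"
#include "TDrawFeedback.h"

void watch_feedback()
{
   TProof *p = TProof::Open("master.example.org");  // gROOT->Proof() in 2004-era ROOT
   p->AddFeedback("PROOF_PacketsHist");             // request a monitoring histogram
   TDrawFeedback fb(p);                             // draws feedback objects as they arrive
   // p->Process(dset, "EventTree_Proc.C+");        // histograms update during the query
}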
Events recorded in Trace
• Each event contains a timestamp and the recording slave or master
• Begin and end of query
• Begin and end of file
• Packet details and processing time
• File open statistics (slaves)
• File read statistics (slaves)
• Easy to add new events
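The layout of the trace tree is not given in the slides. Purely as a hypothetical example of how such a log could be inspected after a query (the file, tree and branch names here are invented for illustration):

// Hypothetical: plot per-packet processing time recorded by one slave.
#include "TFile.h"
#include "TTree.h"

void plot_packet_times()
{
   TFile f("query_trace.root");                       // assumed output file
   TTree *trace = (TTree*) f.Get("PROOF_TraceTree");  // assumed tree name
   if (trace)
      trace->Draw("procTime", "slave==3");            // assumed branch names
}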
Benchmark Results
• CDF cluster at Fermilab
  • 160 nodes, initial tests
• Pharm, Phobos private cluster, 24 nodes
  • 6 dual 730 MHz P3 nodes
  • 6 dual 930 MHz P3 nodes
  • 12 dual 1.8 GHz P4 nodes
• Dataset:
  • 1 file per slave, 60000 events, 100 MB
Results on Pharm
Results on Pharm, continued
Local and remote File open
Slave I/O Performance
Benchmark Results
• Phobos-RCF, central facility at BNL, 370 nodes total
  • 75 dual 3.05 GHz P4 nodes, IDE
  • 99 dual 2.4 GHz P4 nodes, IDE
  • 18 dual 1.4 GHz P3 nodes, IDE
• Dataset:
  • 1 file per slave, 60000 events, 100 MB
PHOBOS RCF LAN Layout
Results on Phobos-RCF
Looking at the problem
Processing time distributions
Processing time, detailed
Request packet from Master
Benchmark Conclusions
• The benchmark and measurement facility has proven to be a very useful tool
• Don't use NFS-based home directories
• LAN topology is important
• LAN speed is important
• More testing is required to pinpoint sporadic long latency
Other developments
• Packetizer fixes and new dev version
• PROOF parallel startup
• TDrawFeedback
• TParameter utility class
• TCondor improvements
• Authentication improvements
• Long64_t introduction
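The TParameter utility class listed above wraps a simple value in a TObject so it can travel through the PROOF input and output lists. A minimal sketch of how it might be used; the parameter name "MaxTracks" is an arbitrary example:

// Sketch: pass a configuration value to the slaves via the input list.
#include "TProof.h"
#include "TParameter.h"

void send_parameter(TProof *p)
{
   p->AddInput(new TParameter<Int_t>("MaxTracks", 500));
}

// In the selector, e.g. in SlaveBegin():
//   TParameter<Int_t> *m =
//      dynamic_cast<TParameter<Int_t>*>(fInput->FindObject("MaxTracks"));
//   Int_t maxTracks = m ? m->GetVal() : 500;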
Future plans
• Understand and solve LAN latency problem
• In prototype stage:
  • TProof::Draw()
  • Multi-level master configuration
• Documentation
  • HowTo
  • Benchmarking
• PEAC PROOF Grid scheduler
The End
• Questions?
Parallel Script Execution

[Diagram: a ROOT session on the local PC runs ana.C (locally via .x ana.C or tree->Process("ana.C")) and connects over the network to a PROOF master (proof = master server) on the remote cluster, which drives proof slave servers (proof = slave server) on node1–node4; each slave opens its local *.root files with TFile, the client reaches the cluster through TNetFile, and stdout/objects are returned to the client.]

#proof.conf
slave node1
slave node2
slave node3
slave node4

Local session:
$ root
root [0] tree->Process("ana.C")

PROOF session:
root [1] gROOT->Proof("remote")
root [2] dset->Process("ana.C")
Simplified message flow
• Client → Master: SendFile
• Master → Slave(s): SendFile
• Client → Master: Process(dset,sel,inp,num,first)
• Master → Slave(s): GetEntries
• Master → Slave(s): Process(dset,sel,inp,num,first)
• Slave(s) → Master: GetPacket
• Slave(s) → Master: ReturnResults(out,log)
• Master → Client: ReturnResults(out,log)
TSelector control flow
• Client (TProof): TSelector::Begin()
• Input objects are sent to the slave(s)
• Slave(s): TSelector::SlaveBegin()
• Slave(s): TSelector::Process(), called for each entry
• Output objects are returned to the client
• Slave(s): TSelector::SlaveTerminate()
• Client: TSelector::Terminate()
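To connect the flow above to code, here is a minimal TSelector skeleton annotated with where each method runs; it is a sketch only (a real selector would also implement Init() and read its inputs):

#include "TSelector.h"

class MySelector : public TSelector {
public:
   void   Begin(TTree *)      { /* client: before the query starts          */ }
   void   SlaveBegin(TTree *) { /* each slave: book histograms, read fInput */ }
   Bool_t Process(Long64_t)   { /* each slave: called once per entry        */ return kTRUE; }
   void   SlaveTerminate()    { /* each slave: after its last entry         */ }
   void   Terminate()         { /* client: fOutput holds the merged results */ }
   Int_t  Version() const     { return 2; }

   ClassDef(MySelector, 0);
};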
PEAC System Overview
Active Files during Query
Pharm Slave I/O
Active Files during Query