Computer Model Working Group 2 New Analysis Model

BABAR: the new computing model
1. Computer Model Working Group
2. Analysis Model
3. Event Store Technologies
4. Computing/Analysis Sites
5. Implementation/Migration
J. Walsh
INFN-Pisa
Note: The new Babar Computing Model is still under
development. Everything I present here today is still under
discussion within Babar.
CSN1, Nov. 2002
J. Walsh,
INFN-Pisa
Need for new computing model
• Babar Computing Model established end of 2000
• Babar Computing Review: April, 2002
– changes in computing environment → require review
– large luminosity of PEP-II puts big burden on Babar computing resources
– Analysis Groups produce huge ntuples that essentially duplicate micro-dst information → not scalable with luminosity
– two event store technologies, Root/IO (Kanga/Root) and Objectivity → burden on Computing Group
– several other issues raised: role of remote sites, etc.
• Computer Model Working Group 2 - update Babar Computing Model. So far work concentrated in two main areas
1. New proposal for analysis model
2. Event store technology - Root vs. Objectivity debate
• Plus, will comment on Remote sites – Tier A, Tier C, etc.
Luminosity Projection
• design luminosity (3×10^33 cm^-2 s^-1) exceeded in 2001
• already expect 3 times current data sample in 2005
[Figure: luminosity projection plot, with the point "where we are now" marked]
Computing Model Working Group 2: Members
• Members:
David Brown/LBL
Claudio Campagnari
Andreas Hoecker
Hassan Jawahery (co-chair)
Yury Kolomensky (co-chair)
David Lange
Mike Roney
Aaron Roodman
Anders Ryd
Bernhard Spaan
John Walsh
Fergus Wilson
• Technical Experts:
Jacek Becla
Fabrizio Bianchi
Nicole Chevalier
Nicolo De Groot
Gregory Dubois-Felsmann
Peter Elmer
Steffen Luitz
Mauro Morandin
• Ex-Officio:
Stephen Gowdy (CC)
Rainer Bartoldus (DepCC)
Richard Mount (SCS)
Dominique Boutigny (Tier-A Rep.)
Livio Lanceri (DepPAC)
Current Computing Model - Jargon
• Analysis:
– performed exclusively on micro format
– Beta/Framework is analysis program, C++ based
– Central Skims produce subsets of data pertinent to various analyses
– Analysis Groups (AWGs) often produce ntuples (or rootuples) that are large and redundant (contain same info as micro)
– Fortran (or root scripts) analyze ntuples (rootuples)
• Event Store:
– Originally only in Objectivity
– Kanga/Root developed for micro format only, to enhance analysis
performance, data distribution
– micro currently maintained in Objy and Kanga/Root
• Computing/Analysis Sites:
– Full copy of micro data available at SLAC, IN2P3, RAL
– Production reprocessing in place at INFN-Padova
– Numerous institutes produce MC at remote sites
[Slide annotation on the sites listed above: "Tier A Sites"]
New Analysis Model – Key Points
• Data formats:
– Mini: new format with detailed detector information available.
– New Micro: upgraded form of current micro, allowing faster
access and customizable content
– Tag (nano-dst): no major changes, general cleanup
– Note: large RECO and RAW objects no longer written to event
store
• Skims:
– New Micro will make centralized skims more useful and
efficient
• Large ntuple production:
– Hopefully, rendered obsolete, freeing up resources (CPU, disk
space and manpower)
Mini Format
• New format introduced over the last year:
– contains essentially full detector information
• track hits, calorimeter crystals, DRC hits, etc.
– efficiently packed to optimize space
• about 8 kBytes per event (compare to micro: 2
kBytes/event)
– increased analysis capability w.r.t. micro, e.g.:
• track extrapolation through detector material
• follow changing conditions (e.g. SVT alignment)
• event display
• etc.
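The per-event sizes quoted above translate directly into storage requirements. The following is a minimal sketch of that arithmetic; the sample size `n_events` is a hypothetical value chosen only for illustration and is not a Babar figure, and the helper name `store_size_tb` is likewise an assumption.

```python
# Per-event sizes quoted on the slide (kBytes per event).
MINI_KB_PER_EVENT = 8.0
MICRO_KB_PER_EVENT = 2.0


def store_size_tb(n_events: float, kb_per_event: float) -> float:
    """Return the store size in TB (taking 1 TB = 1e9 kB) for n_events events."""
    return n_events * kb_per_event / 1e9


# Hypothetical sample size, for illustration only (not a Babar number).
n_events = 1e9

mini_tb = store_size_tb(n_events, MINI_KB_PER_EVENT)
micro_tb = store_size_tb(n_events, MICRO_KB_PER_EVENT)
size_ratio = MINI_KB_PER_EVENT / MICRO_KB_PER_EVENT

print(f"mini:  {mini_tb:.1f} TB")
print(f"micro: {micro_tb:.1f} TB")
print(f"mini/micro size ratio: {size_ratio:.0f}")
```

Whatever the actual event count, the mini takes about four times the space of the micro for the same sample, which is the trade-off behind keeping both formats.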
Mini Format - II
• Additional characteristics:
– customization
– larger and slower than micro-dst:
• develop coherent staging system to retrieve events
from tape. Target access times:
– < 1 hr for small (< 100 events) samples
– < 1 day for medium (< 1 k events) samples
– < 2-4 weeks for large samples
– exact use pattern of mini not really predictable
• need to remain flexible on implementation
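The target access times above can be read as a simple lookup by sample size. The sketch below encodes them; the function name `target_access_time` and the treatment of the boundaries (exactly 100 events counts as medium, exactly 1000 as large) are assumptions, since the slide gives only "<" limits.

```python
def target_access_time(n_events: int) -> str:
    """Return the target retrieval time quoted on the slide for a sample size."""
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    if n_events < 100:       # "small" sample
        return "< 1 hr"
    if n_events < 1000:      # "medium" sample
        return "< 1 day"
    return "< 2-4 weeks"     # "large" sample


for n in (50, 500, 1_000_000):
    print(n, "events ->", target_access_time(n))
```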
New-Micro Format
• Radical improvement w.r.t. current micro-dst
1. Dual usage:
1. regular framework/Beta job (current Babar norm)
2. interactive use with Root
2. Customizable content
– option to store detector info or not
– additional user info can be added
– composite candidates
– track fits with different mass hypotheses, etc.
3. High speed: the aim is to reach 1 kHz with framework/Beta →
will require Beta development
– current rate: few tens of Hz
– higher read rates envisioned with Root access (at cost of
reduced functionality)
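To see what the 1 kHz target means in practice, one can compare the time a single job needs to read a sample at the current and target rates. This is a rough sketch: the sample size is hypothetical, and 30 Hz is an assumed stand-in for the "few tens of Hz" quoted above.

```python
def processing_time_hours(n_events: float, rate_hz: float) -> float:
    """Return the wall-clock hours a single job needs to read n_events at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return n_events / rate_hz / 3600.0


n_events = 1e8            # hypothetical sample size, for illustration only
current_rate_hz = 30.0    # assumed value standing in for "few tens of Hz"
target_rate_hz = 1000.0   # the 1 kHz aim quoted on the slide

current_hours = processing_time_hours(n_events, current_rate_hz)
target_hours = processing_time_hours(n_events, target_rate_hz)
speedup = current_hours / target_hours

print(f"current: {current_hours:.0f} h, target: {target_hours:.1f} h")
print(f"speed-up: {speedup:.1f}x")
```

Real analyses run many jobs in parallel, so the absolute times are only indicative; the ratio between the two rates is the point.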
New Micro Format – II
• Impact on users:
– much analysis in Babar done at ntuple level
– ntuple analysis code will have to be adapted/converted to
new-micro (use of paw/Fortran discouraged) →
potentially disruptive
• Comment on Objectivity:
– since interactive Root access is a basic feature of the
new-micro → Objectivity event store is not an option
– new-micro will be in Root/IO format
Event Store Debate
• Current system: hybrid with Objectivity at SLAC and IN2P3
and Kanga/Root at RAL, INFN-Padova
• Problems with Objectivity:
– lock collisions
– Prompt Reconstruction and Simulation Production performance
issues
– poor record of scaling with luminosity: every jump in data
sample has been accompanied by Objy problems
– distribution difficulties → getting data samples to Tier C sites
– other HEP experiments have dropped Objy as an option
– concerns about viability of Objectivity Company
• we don’t have source code
• how much expertise will be around in 2007?
Event Store Debate - II
• Kanga/Root
– much easier maintenance
– easier to export
– smaller event size (although Objy event size is
decreasing with deployment of compression and redesign
of navigational info)
– more efficient CPU usage
– becoming HEP standard – easy to attract manpower to
support Kanga/Root
Event Store Debate - III
• So, why not drop Objy and go with Kanga/Root system?
– Cost of migration: effort and disruption. Estimates ranged from
1 to 2 years to achieve migration → most in Babar agree a
switch that takes more than 2 years is probably not worth
doing.
– Kanga/Root has some technical issues that need to be
addressed:
• file server to handle huge number of files
• lack of transactions
• lack of cross-file associations (e.g. mini-to-micro navigation)
• bookkeeping
• staging system
– Political/human issues.
Work to address these issues is ongoing (not just in Babar context) →
probably no showstoppers, but it is work.
• Note: conditions database implemented in Objectivity
– too costly (estimate 2-3 years) to convert to Root-based DB
– Babar relationship with Objy will continue in any case
Event Store Debate - IV
• Alternative to Kanga/Root-only system: a hybrid
system where:
– new-micro in Kanga/Root format only
– everything else (event reconstruction, simulation
production, mini format) in Objectivity
• Hybrid system has advantage of easier, less-disruptive
migration, but we still need to support 2
event store technologies
• Final decision/recommendation on event store
coming soon
Computing/Analysis Sites
• The Working Group is just starting on this subject → just
present the issues
• Role of Tier A sites - large site that reduces significantly
computing burden at SLAC
– Primarily analysis: IN2P3, RAL
– Production: INFN-Padova
– Issues:
• data replication at Tier A’s
• data partitioning at Tier A’s (micro, mini, beam data, MC)
• transparent access to data across Tier A’s (BabarGrid)
• specialization of Tier A’s: skimming, (re-)processing, etc.
• Role of Tier C sites – smaller sites at remote institutes
– main contribution so far in MC production (majority of MC
events produced away from SLAC)
– analysis at Tier C’s has been difficult due to problems with data
distribution → need to resolve with new Computing Model
Implementation
• Mini
– Already implemented in Objectivity (minor fixes,
improvements ongoing)
– Feasibility of Root implementation has been studied →
could be ready by early 2003
• New-micro
– Dual usage (Beta and Root) prototype implementation
has been achieved. Additional development needed:
1. customization
2. persistent composites
3. Beta/Framework optimization
Migration
• Essential requirement: minimal disturbance to Babar capability to
produce physics results
• Mini
– currently doing reprocessing in Padova of all data, producing mini
format output
– the mini is a “new” feature, so disruption of ongoing analyses is minimal
• New-micro
– exploit Babar’s data replication to ease migration
• maintain old-micro at SLAC and IN2P3 sites
• introduce new-micro at RAL site
• users have choice of format during transition period
• Dependence on other parts of Computing Model
– use of Tier A sites
– choice of event store technology, etc.
Summary
• Babar is currently updating its computing model, to be able to deal
with large increase in data set in the coming years
• A new analysis model, based on the new-micro and mini data
formats, has been proposed and largely agreed to.
– the mini will permit more in-depth analysis
– the new-micro will eliminate largely wasteful/redundant ntuples
• The working group is also considering the future of event store
technologies employed in Babar.
– Should Objectivity event store be dropped in favor of Root-based technology?
– Is Kanga/Root ready to be used as a full-scale event store?
– Does a hybrid system do enough to alleviate the problems of Objy?
• An important part of the new model will be how best to use remote
computing/analysis sites: Tier A and Tier C
– Work starting on this subject within the Working Group
the following are backup slides
Skims with New-Micro
• The customization features of the new-micro make it an
attractive tool to use with Centralized Skims
• The idea is that each Analysis Working Group (AWG) will
provide the appropriate event selection and content
customization to the Central Skim group
• Small skims will be encouraged: deep-copies, which provide
fast access, will be possible for small skims (< few %
selection rate)
• In addition to skims, a generic new-micro containing all
physics events will be available → important for new analyses
• Aiming for increased frequency for Central Skims – every 3
months
– feasibility being evaluated
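The rule of thumb above (deep copies for small skims, pointers otherwise) can be written as a one-line decision. This is only a sketch: the slide says "< few %" without fixing a number, so the 3% default for `deep_copy_max_rate` is an assumed placeholder, and the function name is hypothetical.

```python
def choose_skim_type(selection_rate: float, deep_copy_max_rate: float = 0.03) -> str:
    """Pick a skim type from the fraction of events a skim selects.

    deep_copy_max_rate is an assumed stand-in for the slide's "< few %".
    """
    if not 0.0 <= selection_rate <= 1.0:
        raise ValueError("selection_rate must be between 0 and 1")
    return "deep copy" if selection_rate < deep_copy_max_rate else "pointer"


for rate in (0.01, 0.20):
    print(f"selection rate {rate:.0%}: {choose_skim_type(rate)} skim")
```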
Tag (nano-dst) Format
• The Tag format will continue to be maintained
• Optimization/cleanup to remove unused or
redundant information – should get a factor of 2
size reduction
Deep Copy vs. Pointer Skims
• Deep copy
– copy full event to new location
– faster read rate
– more disk space
• Pointer (shallow) copy
– write pointer only
– slower read rate
– less disk space
[Figure: from original events Ev 1 to Ev 4, the deep copy holds full copies of Ev 2 and Ev 4, while the shallow copy holds only pointers Ptr 2 and Ptr 4]
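The difference between the two skim types can be illustrated with a toy in-memory event store. This is a sketch under stated assumptions, not the Babar implementation: the `Event` class, the dictionary store, the 2048-byte payload and the selection of events 2 and 4 (mirroring the figure) are all hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    event_id: int
    payload: bytes  # stands in for the stored event content


# Toy event store holding Ev 1 .. Ev 4 (hypothetical 2048-byte payloads).
store: Dict[int, Event] = {i: Event(i, bytes(2048)) for i in range(1, 5)}


def selected(event: Event) -> bool:
    """Toy selection that keeps Ev 2 and Ev 4, as in the figure."""
    return event.event_id % 2 == 0


# Deep-copy skim: full events are duplicated into a new collection.
deep_skim: List[Event] = [copy.deepcopy(ev) for ev in store.values() if selected(ev)]

# Pointer skim: only references (here, event ids) are written.
pointer_skim: List[int] = [ev.event_id for ev in store.values() if selected(ev)]


def read_pointer_skim(pointers: List[int], source: Dict[int, Event]) -> List[Event]:
    """Reading a pointer skim costs an extra lookup into the original store."""
    return [source[i] for i in pointers]


deep_bytes = sum(len(ev.payload) for ev in deep_skim)
print("deep skim events:", [ev.event_id for ev in deep_skim], "payload bytes:", deep_bytes)
print("pointer skim:", pointer_skim)
```

The deep skim can be read without touching the original store, at the cost of duplicating the payloads; the pointer skim stores almost nothing but must go back to the original events on every read.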
Use Cases
1. Mature analysis (like sin2β) could create a Mini skim of a
relatively small number of events and work from that
2. An analysis with loose skim cuts (2-body charmless) would
customize a new micro skim, dropping unneeded info and
saving B candidates. Mini could be used near end of analysis
when number of events is sufficiently reduced.
3. A new analysis would use allEvents generic new micro to
explore concept and define cuts. Final analysis could require
a customized new micro skim or a mini skim (if event sample
is small enough).
4. An AWG could produce a skim that serves many analyses.
Specific analyses could make pointer skims of the skim, or
deep copy skims if small enough.
5. etc.
