Computer Model Working Group 2 New Analysis Model

BABAR: the new computing model
Computing Model Working Group
Analysis Model
Event Store Technologies
Computing/Analysis Sites
J. Walsh
Note: The new Babar Computing Model is still under
development. Everything I present here today is still under
discussion within Babar.
CSN1, Nov. 2002
J. Walsh,
Need for new computing model
Babar Computing Model established end of 2000
Babar Computing Review: April, 2002
changes in computing environment → require review
large luminosity of PEP-II puts big burden on Babar computing
Analysis Groups produce huge ntuples that essentially
duplicate micro-dst information → not scalable with luminosity
two event store technologies, Root/IO (Kanga/Root) and
Objectivity → burden on Computing Group
several other issues raised: role of remote sites, etc.
Computing Model Working Group 2: update Babar Computing
Model. So far, work has concentrated in two main areas:
1. New proposal for analysis model
2. Event store technology - Root vs. Objectivity debate
Plus, will comment on Remote sites – Tier A, Tier C, etc.
Luminosity Projection
design luminosity (3x10^33) exceeded in 2001
already expect 3 times current data sample in 2005
[Plot: luminosity projection, with a "where we are" marker]
Computing Model Working Group 2: Members
David Brown/LBL
Claudio Campagnari
Andreas Hoecker
Hassan Jawahery (co-chair)
Yury Kolomensky (co-chair)
David Lange
Mike Roney
Aaron Roodman
Anders Ryd
Bernhard Spaan
John Walsh
Fergus Wilson
Technical Experts:
Jacek Becla
Fabrizio Bianchi
Nicole Chevalier
Nicolo De Groot
Gregory Dubois-Felsmann
Peter Elmer
Steffen Luitz
Mauro Morandin
Stephen Gowdy CC
Rainer Bartoldus DepCC
Richard Mount SCS
Dominique Boutigny Tier-A Rep.
Livio Lanceri DepPAC
Current Computing Model - Jargon
• Analysis:
– performed exclusively on micro format
– Beta/Framework is the analysis program, C++ based
– Central Skims produce subsets of data pertinent to various analyses
– Analysis Working Groups (AWGs) often produce ntuples (or rootuples) that are
large and redundant (contain same info as micro)
– Fortran (or Root scripts) analyze ntuples (rootuples)
• Event Store:
– Originally only in Objectivity
– Kanga/Root developed for micro format only, to enhance analysis
performance, data distribution
– micro currently maintained in Objy and Kanga/Root
• Computing/Analysis Sites:
– Full copy of micro data available at SLAC, IN2P3, RAL
– Production reprocessing in place at INFN-Padova
– Numerous institutes produce MC at remote sites
(slide label on the sites above: Tier A)
New Analysis Model – Key Points
• Data formats:
– Mini: new format with detailed detector information available.
– New Micro: upgraded form of current micro, allow faster
access, customizable content
– Tag (nano-dst): no major changes, general cleanup
– Note: large RECO and RAW objects no longer written to event store
• Skims:
– New Micro will make centralized skims more useful and
• Large ntuple production:
– Hopefully, rendered obsolete, freeing up resources (CPU, disk
space and manpower)
Mini Format
• New format introduced over the last year:
– contains essentially full detector information
• track hits, calorimeter crystals, DRC hits, etc.
– efficiently packed to optimize space
• about 8 kBytes per event (compare to micro: 2
– increased analysis capability w.r.t. micro, e.g.:
• track extrapolation through detector material
• follow changing conditions (e.g. SVT alignment)
• event display
• etc.
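As a rough feel for what "about 8 kBytes per event" implies for storage, here is a minimal Python sketch. The helper name `mini_store_size_tb` is hypothetical, the event count in the usage note is purely illustrative (not a Babar figure), and the conversion assumes 1 kByte = 1000 bytes and 1 TB = 10^12 bytes.

```python
MINI_EVENT_SIZE_KB = 8.0  # "about 8 kBytes per event", as quoted on the slide


def mini_store_size_tb(n_events: float,
                       event_size_kb: float = MINI_EVENT_SIZE_KB) -> float:
    """Estimate the total mini size in TB.

    Assumes 1 kB = 1000 bytes and 1 TB = 1e12 bytes (an assumption of
    this sketch, not a statement about Babar bookkeeping).
    """
    if n_events < 0 or event_size_kb < 0:
        raise ValueError("inputs must be non-negative")
    return n_events * event_size_kb * 1e3 / 1e12
```

For a hypothetical sample of 10^9 events, `mini_store_size_tb(1e9)` gives about 8 TB.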
Mini Format - II
• Additional characteristics:
– customization
– larger and slower than micro-dst:
• develop coherent staging system to retrieve events
from tape. Target access times:
– < 1 hr for small (< 100 events) samples
– < 1 day for medium (< 1 k events) samples
– < 2-4 weeks for large samples
– exact use pattern of mini not really predictable
• need to remain flexible on implementation
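The target access times above amount to a lookup by sample size. A minimal Python sketch, using only the thresholds quoted on the slide (100 and 1 k events); the function name `target_access_time` is hypothetical and not part of any Babar software:

```python
def target_access_time(n_events: int) -> str:
    """Return the staging target quoted on the slide for a sample size."""
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    if n_events < 100:       # "small" sample
        return "< 1 hr"
    if n_events < 1000:      # "medium" sample
        return "< 1 day"
    return "< 2-4 weeks"     # "large" sample
```

Since the use pattern of the mini is not yet predictable, thresholds like these would more naturally live in configuration than in code.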
New-Micro Format
Radical improvement w.r.t. current micro-dst
1. Dual usage:
1. regular framework/Beta job (current Babar norm)
2. interactive use with Root
2. Customizable content
– option to store detector info or not
– additional user info can be added
– composite candidates
– different mass hypo track fits, etc.
3. High speed: the aim is to reach 1 kHz with framework/Beta →
will require Beta development
– current rate: few tens of Hz
– higher read rates envisioned with Root access (at cost of
reduced functionality)
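To see what the rate target means in practice, the time to read a sample is the number of events divided by the read rate. The Python sketch below assumes a hypothetical sample of 10^8 events and takes "few tens of Hz" to be 30 Hz; both numbers are illustrative assumptions, not Babar figures, and `read_time_hours` is a hypothetical helper.

```python
def read_time_hours(n_events: float, rate_hz: float) -> float:
    """Wall-clock hours for a single job to read n_events at rate_hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return n_events / rate_hz / 3600.0


N_EVENTS = 1e8                                  # illustrative sample size (assumption)
current = read_time_hours(N_EVENTS, 30.0)       # "few tens of Hz", taken as 30 Hz
target = read_time_hours(N_EVENTS, 1000.0)      # the 1 kHz aim
speedup = current / target                      # ratio of the two rates
```

Under these assumptions a single job drops from roughly 926 hours to roughly 28 hours, a factor of about 33.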
New Micro Format – II
• Impact on users:
– much analysis in Babar done at ntuple level
– ntuple analysis code will have to be adapted/converted to
new-micro (use of PAW/Fortran discouraged) →
potentially disruptive
• Comment on Objectivity:
– since interactive Root access is a basic feature of the
new-micro → Objectivity event store is not an option
– new-micro will be in Root/IO format
Event Store Debate
• Current system: hybrid with Objectivity at SLAC and IN2P3
and Kanga/Root at RAL, INFN-Padova
• Problems with Objectivity:
– lock collisions
– Prompt Reconstruction and Simulation Production performance
– poor record of scaling with luminosity: every jump in data
sample has been accompanied by Objy problems
– distribution difficulties → getting data samples to Tier C sites
– other HEP experiments have dropped Objy as an option
– concerns about viability of Objectivity Company
• we don’t have source code
• how much expertise will be around in 2007?
Event Store Debate - II
• Kanga/Root
– much easier maintenance
– easier to export
– smaller event size (although Objy event size is
decreasing with deployment of compression and redesign
of navigational info)
– more efficient CPU usage
– becoming HEP standard – easy to attract manpower to
support Kanga/Root
Event Store Debate - III
• So, why not drop Objy and go with Kanga/Root system?
– Cost of migration: effort and disruption. Estimates ranged from
1 to 2 years to achieve migration → most in Babar agree a
switch that takes more than 2 years is probably not worth it
– Kanga/Root has some technical issues that need to be addressed:
• file server to handle huge number of files
• lack of transactions
• lack of cross-file associations (e.g. mini-to-micro navigation)
• staging system
Work to address these issues is ongoing (not just in Babar context) →
probably no showstoppers, but it is
– Political/human issues.
• Note: conditions database implemented in Objectivity
– too costly (estimate 2-3 years) to convert to Root-based DB
– Babar relationship with Objy will continue in any case
Event Store Debate - IV
• Alternative to Kanga/Root-only system: a hybrid
system where:
– new-micro in Kanga/Root format only
– everything else (event reconstruction, simulation
production, mini format) in Objectivity
• Hybrid system has advantage of easier, less-disruptive migration, but we still need to support 2
event store technologies
• Final decision/recommendation on event store
coming soon
Computing/Analysis Sites
• The Working Group is just starting on this subject → just
present the issues
• Role of Tier A sites: large sites that significantly reduce the
computing burden at SLAC
– Primarily analysis: IN2P3, RAL
– Production: INFN-Padova
– Issues:
• data replication at Tier A’s
• data partitioning at Tier A’s (micro, mini, beam data, MC)
• transparent access to data across Tier A’s (BabarGrid)
• specialization of Tier A’s: skimming, (re-)processing, etc.
• Role of Tier C sites: smaller sites at remote institutes
– main contribution so far in MC production (majority of MC
events produced away from SLAC)
– analysis at Tier C’s has been difficult due to problems with data
distribution → need to resolve with new Computing Model
Mini
Already implemented in Objectivity (minor fixes,
improvements ongoing)
Feasibility of Root implementation has been studied →
could be ready by early 2003
New-micro
Dual usage (Beta and Root) prototype implementation
has been achieved. Additional development needed:
1. customization
2. persistent composites
3. Beta/Framework optimization
Essential requirement: minimal disturbance to Babar capability to
produce physics results
– currently doing reprocessing in Padova of all data, producing mini
format output
– the mini is “new” feature, so disruption of ongoing analyses is minimal
– exploit Babar’s data replication to ease migration
• maintain old-micro at SLAC and IN2P3 sites
• introduce new-micro at RAL site
• users have choice of format during transition period
Dependence on other parts of Computing Model
– use of Tier A sites
– choice of event store technology, etc.
Babar is currently updating its computing model to be able to deal
with the large increase in data set in the coming years
A new analysis model, based on the new-micro and mini data
formats, has been proposed and largely agreed to.
– the mini will permit more in-depth analysis
– the new-micro will eliminate largely wasteful/redundant ntuples
The working group is also considering the future of event store
technologies employed in Babar.
Should Objectivity event store be dropped in favor of Root-based technology?
Is Kanga/Root ready to be used as a full-scale event store?
Does a hybrid system do enough to alleviate the problems of Objy?
An important part of the new model will be how best to use remote
computing/analysis sites: Tier A and Tier C
– Work starting on this subject within the Working Group
the following are backup slides
Skims with New-Micro
• The customization features of the new-micro make it an
attractive tool to use with Centralized Skims
• The idea is that each Analysis Working Group (AWG) will
provide the appropriate event selection and content
customization to the Central Skim group
• Small skims will be encouraged: deep-copies, which provide
fast access, will be possible for small skims (< few %
selection rate)
• In addition to skims, a generic new-micro containing all
physics events will be available  important for new analyses
• Aiming for increased frequency for Central Skims – every 3
– feasibility being evaluated
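The skim idea above (an AWG supplies an event selection plus a content customization, and small skims qualify for deep copies) can be sketched in Python as follows. Everything here is illustrative: events are modeled as plain dictionaries, the field names (`n_tracks`, `b_candidates`, `detector_info`) are made up, and the 3% default stands in for "few %", which the slide does not pin down.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]  # toy stand-in for a new-micro event (assumption)


def make_skim(events: Iterable[Event],
              select: Callable[[Event], bool],
              keep: List[str]) -> List[Event]:
    """Apply the event selection, then keep only the requested content."""
    return [{k: ev[k] for k in keep if k in ev} for ev in events if select(ev)]


def selection_rate(n_selected: int, n_total: int) -> float:
    """Fraction of events passing the skim selection."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_selected / n_total


def deep_copy_recommended(rate: float, max_rate: float = 0.03) -> bool:
    """True if the skim is small enough for a deep copy.

    max_rate = 3% is a hypothetical reading of "< few %".
    """
    return rate < max_rate


# Toy input: four events, one of which passes the selection.
events = [
    {"id": 1, "n_tracks": 2, "b_candidates": [], "detector_info": "hits-1"},
    {"id": 2, "n_tracks": 5, "b_candidates": ["B0"], "detector_info": "hits-2"},
    {"id": 3, "n_tracks": 3, "b_candidates": [], "detector_info": "hits-3"},
    {"id": 4, "n_tracks": 1, "b_candidates": [], "detector_info": "hits-4"},
]
skim = make_skim(events, lambda ev: ev["n_tracks"] >= 4, ["id", "b_candidates"])
rate = selection_rate(len(skim), len(events))
```

In this toy sample the selection rate is 25%, so `deep_copy_recommended(rate)` is false and a pointer skim would be the natural choice.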
Tag (nano-dst) Format
• The Tag format will continue to be maintained
• Optimization/cleanup to remove unused or
redundant information – should get a factor of 2
size reduction
Deep Copy vs. Pointer Skims
Deep copy
– copy full event to new location
– faster read rate
– more disk space
Pointer (shallow)
– write pointer only
– slower read rate
– less disk space
[Diagram: from events Ev 1 to Ev 4, the deep copy holds Ev 2 and Ev 4; the shallow copy holds Ptr 2 and Ptr 4]
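The difference between a deep-copy skim and a pointer skim can be illustrated with a toy event store in Python. This is only a sketch of the concept: the `Event` class and the three functions are hypothetical and do not reflect the actual Babar or Root implementation.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    event_id: int
    payload: bytes  # stands in for the full event content


def deep_copy_skim(store: List[Event], selected: Set[int]) -> List[Event]:
    """Copy the full selected events to a new, independent collection."""
    return [Event(ev.event_id, bytes(ev.payload))
            for ev in store if ev.event_id in selected]


def pointer_skim(store: List[Event], selected: Set[int]) -> List[int]:
    """Record only the positions of the selected events in the original store."""
    return [i for i, ev in enumerate(store) if ev.event_id in selected]


def read_pointer_skim(store: List[Event], pointers: List[int]) -> List[Event]:
    """Reading a pointer skim requires going back to the original store."""
    return [store[i] for i in pointers]


store = [Event(i, f"raw{i}".encode()) for i in range(1, 5)]  # Ev 1 .. Ev 4
deep = deep_copy_skim(store, {2, 4})       # holds full copies of Ev 2 and Ev 4
pointers = pointer_skim(store, {2, 4})     # holds positions only
```

The deep copy can be read without the original store, at the cost of duplicated payloads. The pointer skim stores only small indices, but every read goes back through the original store.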
Use Cases
Mature analysis (like sin2β) could create a Mini skim of a
relatively small number of events and work from that
An analysis with loose skim cuts (2-body charmless) would
customize a new micro skim, dropping unneeded info and
saving B candidates. Mini could be used near end of analysis
when number of events is sufficiently reduced.
A new analysis would use allEvents generic new micro to
explore concept and define cuts. Final analysis could require
a customized new micro skim or a mini skim (if event sample
is small enough).
An AWG could produce a skim that serves many analyses.
Specific analyses could make pointer skims of the skim, or
deep copy skims if small enough.