Anaphe - OO Libraries for Data Analysis using C++ and Python

Andreas Pfeiffer, CERN/EP
CHEP, 27-Mar-2003
Andreas Pfeiffer, CERN/EP-SFT,
[email protected]
Anaphe: what it is
Analysis for physics experiments
Modular replacement of CERNLIB functionality for use in HEP experiments
CERNLIB functionality:
memory management
I/O (“persistency”)
foundation classes
histogramming
minimizing/fitting
visualization
interactive data analysis
Anaphe Architecture
Component based
Functionality defines content of a component
Components interact only through interfaces
Trying to use standards wherever possible
AIDA for interfaces to analysis components
Basic functionalities (histograms, fitting, etc.) are available as individual C++ class libraries.
Allows interchange between different implementations
Eases evolution as components are decoupled
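
The idea that components interact only through interfaces can be illustrated with a minimal Python sketch. It is not the AIDA API: the interface `IHistogram1D`, both implementations, and the `analyse` function are hypothetical names made up for this example. It only shows how client code written against an abstract interface can swap implementations freely.

```python
from abc import ABC, abstractmethod


class IHistogram1D(ABC):
    """Hypothetical user-level interface (illustrative, not the AIDA one)."""

    @abstractmethod
    def fill(self, x: float, weight: float = 1.0) -> None: ...

    @abstractmethod
    def entries(self) -> int: ...

    @abstractmethod
    def mean(self) -> float: ...


class BinnedHistogram(IHistogram1D):
    """Fixed-width bins; the mean is estimated from the bin centres."""

    def __init__(self, nbins: int, low: float, high: float) -> None:
        self._low = low
        self._width = (high - low) / nbins
        self._bins = [0.0] * nbins
        self._entries = 0

    def fill(self, x, weight=1.0):
        index = int((x - self._low) // self._width)
        if 0 <= index < len(self._bins):  # out-of-range values are ignored
            self._bins[index] += weight
            self._entries += 1

    def entries(self):
        return self._entries

    def mean(self):
        total = sum(self._bins)
        if not total:
            return 0.0
        centres = (self._low + (i + 0.5) * self._width
                   for i in range(len(self._bins)))
        return sum(c * h for c, h in zip(centres, self._bins)) / total


class RunningSumHistogram(IHistogram1D):
    """Alternative implementation that keeps running sums instead of bins."""

    def __init__(self) -> None:
        self._entries = 0
        self._sumw = 0.0
        self._sumwx = 0.0

    def fill(self, x, weight=1.0):
        self._entries += 1
        self._sumw += weight
        self._sumwx += weight * x

    def entries(self):
        return self._entries

    def mean(self):
        return self._sumwx / self._sumw if self._sumw else 0.0


def analyse(hist: IHistogram1D, data) -> float:
    """Client code depends only on the interface, never on a concrete class."""
    for x in data:
        hist.fill(x)
    return hist.mean()
```

Calling `analyse(BinnedHistogram(3, 0.0, 3.0), data)` and `analyse(RunningSumHistogram(), data)` exercises the same client code with two interchangeable implementations.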
AIDA
Abstract Interfaces for Data Analysis
See separate talk in Tuesday’s cat-9 parallel session
Anaphe Architecture
[Diagram: interface layers for the histogram, plotter and fitter components]
AIDA interfaces (user level): IHistogram, IPlotter, IFitter
Developer level interfaces: IDevHistogram, IDevPlotter, IDevFitter
Wrapper layer (optional): AIDA Plotter, AIDA Fitter
Basic components: Histo library, Grace Plotter, FML
Anaphe Components
[Diagram: layered view of the Anaphe components]
Lizard (Interactive Commands): Analyzer, Histograms, NTuples, Fitting, Plotting, Functions, DataPointSet
Abstract types: AIDA (Abstract Interfaces for Data Analysis)
Anaphe Implementations: Histogram Library, Ntuples Library, Fitting and Minim., Plotter, Store libraries + other utility libs
User’s C++ code
Other packages shown: Python / SWIG, XML parser, Qt, Grace, NAG-C, Objectivity, CLHEP, CERNLIB, HepODBMS
Diagram labels: optional, HEP implementations, non-HEP components, commercial components
Interactive Data Analysis Tool
Aim: “OO replacement for PAW” (at least)
analysis of “ntuple-like data” (“Tags”, “Ntuples”, …)
visualisation of data (Histograms, scatter-plot, “Vectors”)
fitting of histograms (and other data)
access to experiment specific data/code
Component based approach
Maximize flexibility and re-use
Foresee customization/integration with experiment’s s/w
Plan for extensions
Ensure maintainability
Choose Python as interactive OO language
Interactivity: Lizard
Lizard provides a Python environment for interactive analysis
Unified user interface at top level
AIDA types and methods mapped into Python commands
User modules can be plugged in as required
Analyzer module provides on-the-fly compilation, loading and running of user code
Python as scripting language:
Easy to use
Object Oriented language
Maps well to C++ and Java
Huge user base with lots of free software (networking, GUI, OS, scientific, etc.)
[Diagram: Lizard and the C++ component libraries C1 to C5]
Tutorials and Examples available (http://cern.ch/anaphe)
DIstributed ANalysis Environment
Easy to use
Hide complex details of underlying technology
Using master-worker model
Covers most typical jobs: ntuple analysis, event-level distributed reconstruction and simulation
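
The master-worker model applied to ntuple analysis can be sketched as follows. This is only an illustration under stated assumptions: it is not DIANE code, a local thread pool stands in for remote workers, and the function names and the `energy` column are hypothetical. The master splits the ntuple into chunks, each worker fills a partial histogram, and the master merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_partial(rows, nbins, low, high):
    """Worker task: histogram one chunk of ntuple rows.

    The 'energy' column name is a hypothetical example.
    """
    width = (high - low) / nbins
    counts = [0] * nbins
    for row in rows:
        index = int((row["energy"] - low) // width)
        if 0 <= index < nbins:  # out-of-range values are ignored
            counts[index] += 1
    return counts


def run_master(ntuple, n_workers=4, chunk_size=1000,
               nbins=10, low=0.0, high=100.0):
    """Master: split the ntuple into chunks, farm them out, merge the results."""
    chunks = [ntuple[i:i + chunk_size]
              for i in range(0, len(ntuple), chunk_size)]
    merged = [0] * nbins
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(
            lambda chunk: fill_partial(chunk, nbins, low, high), chunks)
        for counts in partials:
            # Bin-wise addition: the merge step is independent of chunk order.
            merged = [a + b for a, b in zip(merged, counts)]
    return merged
```

Because merging is a plain bin-wise sum, the merged histogram is the same as the one obtained by calling `fill_partial` once on the whole ntuple, whatever the number of workers or the chunk size.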
DIstributed ANalysis Environment
Parallel cluster processing
make fine tuning and customization easy
transparently using GRID technology
application independent
Using the prototype with Geant-4
Successful runs of prototype with various Geant-4 simulations (ESA missions, underground experiments, medical)
Analysis implemented with AIDA/Anaphe
Separate talk in Monday’s cat-1 session!
Increased performance
generate more events – debug simulation faster
shift from batch to semi-interactive simulation
user can study the results of the simulation faster and more often
Correctness and ease of use
preserve reproducibility of the results
parallel run should look as local run to users
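
One common way to keep a parallel run reproducible is to derive each task's random seed from the task number only, so the outcome does not depend on how many workers run or which worker takes which task. The sketch below illustrates that idea. It is an assumption for illustration, not how the prototype or Geant-4 handles random numbers, and all names in it are hypothetical. A real simulation would rely on its generator's own facilities for independent streams.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_task(task_id, n_events, base_seed=12345):
    """Simulate one task with its own generator, seeded from the task id only."""
    rng = random.Random(base_seed + task_id)
    return [rng.gauss(0.0, 1.0) for _ in range(n_events)]


def run_local(n_tasks, n_events):
    """Reference run: all tasks executed sequentially."""
    return [simulate_task(task, n_events) for task in range(n_tasks)]


def run_parallel(n_tasks, n_events, n_workers):
    """Same tasks on a worker pool; map() returns results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda task: simulate_task(task, n_events),
                             range(n_tasks)))
```

With this scheme `run_parallel(8, 50, 3)` returns the same events as `run_local(8, 50)`, so to the user the parallel run looks like a local one.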
Anaphe History
LHC++ project started in 1997
HEP foundation libraries developed 1997-2000
Anaphe started in 2000 with first version of Lizard (interactive Python component)
Prototype for evaluation
Production version Summer 2001
Fully based on components with abstract interfaces
Major re-design in 2002 to integrate with AIDA
AIDA 2.2 compliant version Summer 2002
Anaphe History (II)
New version in October 2002 implementing AIDA 3.0
Improved Histograms and Ntuples libraries
Native implementations of the interfaces
New Plotter library based on Grace
Powerful interactive graphics
Introduction of XML store
Store all AIDA objects in XML format in file(s)
Histograms, Clouds, DataPointSets, Ntuples, Fits, …
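
Storing an analysis object as XML can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`histogram1d`, `bin`, `index`, `height`) are invented for this example and do not reproduce the actual AIDA XML format. The sketch only shows a write and read round trip for a one-dimensional histogram.

```python
import xml.etree.ElementTree as ET


def histogram_to_xml(name, low, high, counts):
    """Serialise a 1D histogram to an XML string (hypothetical schema)."""
    root = ET.Element("histogram1d", name=name, low=repr(low), high=repr(high))
    for index, height in enumerate(counts):
        ET.SubElement(root, "bin", index=str(index), height=repr(height))
    return ET.tostring(root, encoding="unicode")


def histogram_from_xml(text):
    """Read back the name, axis range and bin heights written above."""
    root = ET.fromstring(text)
    bins = sorted(root.findall("bin"), key=lambda b: int(b.get("index")))
    counts = [float(b.get("height")) for b in bins]
    return (root.get("name"), float(root.get("low")),
            float(root.get("high")), counts)
```

Bin heights are written with `repr`, so floating-point values survive the round trip unchanged.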
Anaphe Status
Further development of Anaphe has been suspended by SC2
Maintenance and support continue
Anaphe architecture is fully compatible with LCG Blueprint architecture:
Interface based component architecture
Adoption of AIDA
Use of Python
Activities are now going to be aligned with LCG Application Area projects (SEAL, PI, …)
Anaphe Users
AIDA spoken here!
LHC Computing Grid project (LCG) has adopted AIDA
LHC experiment environments
IGUANA (CMS visualization)
GAUDI (LHCb/HARP) framework
ATHENA (ATLAS) framework
Geant 4 has adopted AIDA as a tool-independent analysis standard
In the advanced examples (ATLAS and CMS calorimeter test beam simulations) and in various analyses of underground and astroparticle experiments and medical applications (radiotherapy)
Adopted for GEANT4 test and validation process
Presently more than 100 real downloads of latest version (not counting direct uses through AFS)
Summary
The architecture of Anaphe shows some important items for flexible and modular data analysis:
Weak coupling between components through use of abstract interfaces
Basic functionality is covered by individual C++ class libraries
Emphasis on usability and maintainability
Major criteria are flexibility, extensibility and interoperability
Recent example: GEANT-4 examples (based on AIDA)
Python as interactive OO language for flexibility
Fully compatible with LCG Architectural Blueprint
Components, AIDA, Python
Activities will be aligned with LCG Application Area projects
More information
On Anaphe:
cern.ch/Anaphe
On AIDA:
aida.freehep.org/
For questions send mail to:
[email protected]