RooUnfold: unfolding framework and algorithms

Tim Adye
Rutherford Appleton Laboratory
BaBar Statistics Working Group
BaBar Collaboration Meeting
13th December 2005
Outline
• What is Unfolding?
• and why might you want to do it?
• Overview of a few techniques
• Regularised unfolding
• Iterative method
• RooUnfold package
• Currently implements three methods with a common
interface
• Status and Plans
• References
Unfolding
• In other fields known as “deconvolution”, “unsmearing”
• Given a “true” PDF in μ, that is corrupted by detector
effects, described by a response function, R, we measure a
distribution in ν. In terms of histograms
$$\nu_i = \sum_{j=1}^{M} R_{ij}\,\mu_j , \qquad i = 1, \ldots, N$$
• This may involve
1. inefficiencies: lost events
2. bias and smearing: events moving between bins
(off-diagonal $R_{ij}$)
• With infinite statistics, it would be possible to recover the
original PDF by inverting the response matrix
$$\mu = R^{-1}\,\nu$$
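As a small illustration of the histogram relation on this slide, the following Python sketch folds a true histogram with a response matrix. The 3-bin matrix and the bin contents are invented numbers, not taken from any analysis. Column sums below 1 stand for inefficiency, and off-diagonal entries for migration between bins.

```python
# Illustrative numbers only: R[i][j] = P(observed in bin i | true value in bin j).
R = [
    [0.70, 0.10, 0.00],
    [0.15, 0.70, 0.10],
    [0.00, 0.10, 0.60],
]
mu = [100.0, 200.0, 100.0]  # hypothetical true bin contents


def fold(R, mu):
    """Expected measured histogram: nu_i = sum_j R_ij * mu_j."""
    return [sum(r_ij * mu_j for r_ij, mu_j in zip(row, mu)) for row in R]


def efficiencies(R):
    """Efficiency of true bin j: the column sum, sum_i R_ij."""
    return [sum(row[j] for row in R) for j in range(len(R[0]))]


nu = fold(R, mu)        # expected measured bin contents
eff = efficiencies(R)   # fraction of events kept, per true bin
print(nu, eff)
```

Applying `fold` to an MC truth histogram is also the simpler route when only a comparison with MC is needed.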
Not so simple…
• Unfortunately, if there are statistical fluctuations
between bins this information is destroyed
• Since R washes out statistical fluctuations, $R^{-1}$ cannot
distinguish between wildly fluctuating and smooth PDFs
• Obtain large negative correlations between adjacent bins
• Large fluctuations in reconstructed bin contents
• Need some procedure to remove wildly fluctuating
solutions
1. Give added weight to “smoother” solutions
2. Solve for μ iteratively, starting with a reasonable guess and
truncate iteration before it gets out of hand
3. Ignore bin-to-bin fluctuations altogether
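The instability can be reproduced with a toy example. The Python sketch below uses an invented 5-bin smearing matrix and a fixed alternating perturbation standing in for a statistical fluctuation. `solve` is a small helper written for this illustration, not part of any package.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


n = 5
# Invented smearing: 40% of events stay in their bin, 30% move to each neighbour.
R = [[0.4 if i == j else 0.3 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
mu = [10.0, 40.0, 100.0, 40.0, 10.0]                       # smooth "truth"
nu = [sum(R[i][j] * mu[j] for j in range(n)) for i in range(n)]
delta = [3.0, -3.0, 3.0, -3.0, 3.0]                        # stand-in fluctuation
data = [v + d for v, d in zip(nu, delta)]

exact = solve(R, nu)       # no fluctuation: recovers mu
unfolded = solve(R, data)  # with fluctuation: oscillates around mu
worst = max(abs(u - m) for u, m in zip(unfolded, mu))
print(unfolded, worst)
```

In this toy case the deviation of the unfolded bins from the truth is several times larger than the perturbation applied to the measured bins, with alternating sign from bin to bin.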
What happens if you don’t smooth
True Gaussian, with Gaussian smearing, systematic translation, and
variable inefficiency – trained using a different Gaussian
Double Breit-Wigner, with Gaussian smearing, systematic translation,
and variable inefficiency – trained using a single Gaussian
So why don’t we always do this?
• If the true PDF and resolution function can be
parameterised, then a Maximum Likelihood fit is
usually more convenient
• Directly returns parameters of interest
• Does not require binning
• If the response function doesn’t include smearing
(i.e. it’s diagonal), then apply bin-by-bin efficiency
correction directly
• If the result is just needed for comparison (e.g. with MC),
could apply the response function to the MC
• simpler than un-applying the response to data
When to use unfolding
• Use unfolding to recover theoretical distribution where
• there is no a priori parameterisation
• this is needed for the result and not just comparison with MC
• there is significant bin-to-bin migration of events
Where could we use unfolding?
• Traditionally used to extract structure functions
• Widely used outside particle physics for image reconstruction
• Dalitz plots
• Cross-feed between bins due to misreconstruction
• “True” decay momentum distributions
• Theory at parton level, we measure hadrons
• Correct for hadronisation as well as detector effects
1. Regularised Unfolding
• Use Maximum Likelihood to fit smeared bin contents to measured
data, but include regularisation function
$$\ln L(\mu) \to \ln L(\mu) + \alpha\, S(\mu)$$
where the regularisation parameter, α, controls the degree of
smoothness (select α to, e.g., minimise mean squared error)
• Various choices of regularisation function, S, are used
• Tikhonov regularisation: minimise curvature
• for some definition of curvature, e.g.
$$S(\mu) = -\sum_{i=2}^{M-1} \left[ (\mu_{i+1} - \mu_i) - (\mu_i - \mu_{i-1}) \right]^2$$
• RooUnfHistoSvd by Kerstin Tackmann and Heiko Lacker
• based on GURU by Andreas Höcker and Vakhtang Kartvelishvili
• uses Singular Value Decomposition
• RUN by Volker Blobel
• Maximum entropy:
$$S(\mu) = -\sum_{i=1}^{M} \frac{\mu_i}{\mu_{tot}} \ln \frac{\mu_i}{\mu_{tot}}$$
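To make the two regularisation functions concrete, here is a short Python sketch that evaluates the curvature and entropy forms of S for a smooth and a strongly fluctuating histogram. The bin contents are invented, and the function names are chosen for this illustration only.

```python
import math


def tikhonov_curvature(mu):
    """S = -sum over interior bins of [(mu[i+1]-mu[i]) - (mu[i]-mu[i-1])]^2."""
    return -sum(
        ((mu[i + 1] - mu[i]) - (mu[i] - mu[i - 1])) ** 2
        for i in range(1, len(mu) - 1)
    )


def entropy(mu):
    """S = -sum_i (mu_i/mu_tot) ln(mu_i/mu_tot); empty bins contribute 0."""
    total = sum(mu)
    return -sum((m / total) * math.log(m / total) for m in mu if m > 0)


def regularised_log_likelihood(log_l, alpha, s):
    """The quantity to maximise: ln L + alpha * S."""
    return log_l + alpha * s


smooth = [10.0, 20.0, 30.0, 20.0, 10.0]  # same total, gentle shape
jagged = [30.0, 0.0, 30.0, 0.0, 30.0]    # same total, bin-to-bin oscillation

print(tikhonov_curvature(smooth), tikhonov_curvature(jagged))
print(entropy(smooth), entropy(jagged))
```

With either choice the smooth histogram has the larger S, so for α > 0 the added term favours it over the oscillating one when the likelihoods are similar.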
2. Iterative method
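• Uses Bayes’ theorem to invert
$$R_{ij} = P(\text{observed in bin } i \mid \text{true value in bin } j)$$
and, using an initial set of probabilities, $p_i$ (e.g. flat), obtain an
improved estimate
$$\hat{\mu}_i = \frac{1}{\varepsilon_i} \sum_{j=1}^{N} \frac{R_{ji}\, p_i}{\sum_k R_{jk}\, p_k}\; n_j$$
where $n_j$ is the measured content of bin j and
$\varepsilon_i = \sum_j R_{ji}$ is the efficiency for true bin i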
• Repeating with new pi from these new bin contents converges
quite rapidly
• Truncating the iteration prevents us seeing the bad effects of
statistical fluctuations
• Fergus Wilson and I have implemented this method in
ROOT/C++
• Supports 1D, 2D, and 3D cases
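The update step can be sketched in a few lines of Python. This is a minimal 1D illustration under stated assumptions, not the ROOT/C++ implementation mentioned above: the matrix is indexed as `R[observed][true]`, the numbers are invented, and error propagation is left out entirely.

```python
def bayes_iteration(R, n, p):
    """One update of the true-bin estimate.

    R[j][i] = P(observed in bin j | true value in bin i)
    n[j]    = measured content of observed bin j
    p[i]    = current probabilities for the true bins
    """
    n_obs, n_true = len(R), len(R[0])
    eff = [sum(R[j][i] for j in range(n_obs)) for i in range(n_true)]
    mu_hat = []
    for i in range(n_true):
        total = 0.0
        for j in range(n_obs):
            norm = sum(R[j][k] * p[k] for k in range(n_true))
            if norm > 0:
                # Bayes' theorem: P(true i | observed j), times the observed count
                total += R[j][i] * p[i] / norm * n[j]
        mu_hat.append(total / eff[i] if eff[i] > 0 else 0.0)
    return mu_hat


def unfold_bayes(R, n, iterations=4):
    """Start from flat probabilities and stop after a fixed number of updates."""
    n_true = len(R[0])
    p = [1.0 / n_true] * n_true
    mu_hat = [0.0] * n_true
    for _ in range(iterations):
        mu_hat = bayes_iteration(R, n, p)
        total = sum(mu_hat)
        if total <= 0:
            break
        p = [m / total for m in mu_hat]  # new probabilities from new bin contents
    return mu_hat


R = [
    [0.70, 0.10, 0.00],
    [0.15, 0.70, 0.10],
    [0.00, 0.10, 0.60],
]
n = [90.0, 165.0, 80.0]  # invented measured histogram
mu_hat = unfold_bayes(R, n, iterations=4)
print(mu_hat)
```

The `iterations` argument plays the role of the regularisation parameter here: a small value keeps the result close to the starting guess, a large value approaches the unregularised solution.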
2D Unfolding Example
2D smearing, bias, variable efficiency, and variable rotation
RooUnfold Package
• Make these different methods available as ROOT/C++
classes with a common interface to specify
• unfolding method and parameters
• response matrix
• pass directly or fill from MC sample
• measured histogram
• return reconstructed truth histogram and errors
• full covariance matrix
• Easy to do with multiple dimensions (when supported)
• This should make it easy to try and compare different
methods in your analysis
• Could also be useful outside BaBar!
RooUnfold Classes
• RooUnfoldResponse
• response matrix with various filling and access methods
• create from MC, use on data (can be stored in a file)
• RooUnfold – unfolding algorithm base class
• RooUnfoldBayes – Iterative method
• RooUnfoldSvd – Interface to RooUnfHistoSvd package
• RooUnfoldBinByBin – Simple bin-by-bin method
• Trivial implementation, but useful to compare with full
unfolding
• RooUnfoldExample – Simple 1D example
• RooUnfoldTest and RooUnfoldTest2D
• Test with different training and unfolding distributions
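The division of labour between a response object and interchangeable algorithm classes can be pictured with a small Python analogue. Everything in it is hypothetical: the class and method names (`Response`, `fill`, `miss`, `Unfold`, `reco`) are invented for this sketch and are not the RooUnfold C++ interface, and the bin-by-bin method is assumed to be a simple MC truth/measured correction factor.

```python
class Response:
    """Response information filled from MC (hypothetical analogue, 1D only)."""

    def __init__(self, nbins, lo, hi):
        self.nbins, self.lo, self.hi = nbins, lo, hi
        self.matrix = [[0.0] * nbins for _ in range(nbins)]  # [measured][true]
        self.truth = [0.0] * nbins      # all generated events per true bin
        self.measured = [0.0] * nbins   # reconstructed events per measured bin

    def _bin(self, x):
        if not self.lo <= x < self.hi:
            return None
        index = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
        return min(index, self.nbins - 1)

    def fill(self, x_measured, x_true):
        """Record an MC event that was reconstructed."""
        i, j = self._bin(x_measured), self._bin(x_true)
        if j is not None:
            self.truth[j] += 1
        if i is not None:
            self.measured[i] += 1
        if i is not None and j is not None:
            self.matrix[i][j] += 1

    def miss(self, x_true):
        """Record an MC event that was generated but not reconstructed."""
        j = self._bin(x_true)
        if j is not None:
            self.truth[j] += 1


class Unfold:
    """Base class: every algorithm takes a response and a measured histogram."""

    def __init__(self, response, measured):
        self.response, self.measured = response, measured

    def reco(self):
        raise NotImplementedError


class UnfoldBinByBin(Unfold):
    """Scale each measured bin by the MC ratio truth/measured."""

    def reco(self):
        r = self.response
        return [n * t / m if m > 0 else 0.0
                for n, t, m in zip(self.measured, r.truth, r.measured)]


response = Response(2, 0.0, 2.0)
for x_true, x_meas in [(0.5, 0.6), (0.5, 0.4), (1.5, 1.2), (1.5, 1.6)]:
    response.fill(x_meas, x_true)
response.miss(0.5)
response.miss(1.5)
unfolded = UnfoldBinByBin(response, [20.0, 30.0]).reco()
print(unfolded)
```

Because every algorithm shares the same constructor and `reco` call in this layout, swapping methods in an analysis would only mean changing the class name.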
RooUnfold Status
• Available in CVS
• Announced in Statistics HN
• See README file for details of building and running
• Interface can still be adjusted based on comments
• I already have an idea for simplifying use in multidimensional case
Plans and possible improvements
• So far this is mostly a programming exercise
• Would be interesting to compare the different methods
for some real analysis distributions
• But YMMV
• Add common tools, useful for all algorithms
• Inputs and results in different formats
• already supports histograms and ROOT vectors/matrices
• Automatic calculation of figures of merit (e.g. $\chi^2$)
• can also use standard ROOT functions on histograms
• Simplify selection of regularisation parameter
• More algorithms?
• Maximum entropy regularisation
• Simple matrix inversion without regularisation
• perhaps useful with large statistics
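One possible figure of merit is a χ² between the unfolded histogram and a known truth histogram in a test. The Python sketch below is a simplified version that uses only per-bin errors and ignores the correlations between bins that unfolding introduces, so it is an assumption-laden stand-in for a calculation with the full covariance matrix. All numbers are invented.

```python
def chi2_diagonal(unfolded, errors, truth):
    """Sum of squared pulls per bin, ignoring bin-to-bin correlations."""
    return sum(
        ((u - t) / e) ** 2
        for u, e, t in zip(unfolded, errors, truth)
        if e > 0  # skip bins with no error estimate
    )


unfolded = [98.0, 205.0, 97.0]   # hypothetical unfolded bin contents
errors = [10.0, 14.0, 10.0]      # hypothetical per-bin errors
truth = [100.0, 200.0, 100.0]    # truth used to generate the test sample
value = chi2_diagonal(unfolded, errors, truth)
print(value)
```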
References - Overview
• G. Cowan, A Survey of Unfolding Methods for Particle Physics, Proc.
Advanced Statistical Techniques in Particle Physics, Durham (2002)
http://www.ippp.dur.ac.uk/Workshops/02/statistics/
• G. Cowan, Statistical Data Analysis, Oxford University Press (1998),
Chapter 11: Unfolding
• R. Barlow, SLUO Lectures on Numerical Methods in HEP (2000),
Lecture 9: Unfolding
www-group.slac.stanford.edu/sluo/Lectures/Stat_Lectures.html
References - Techniques
• V. Blobel, Unfolding Methods in High Energy Physics,
DESY 84-118 (1984); also CERN 85-02
• A. Höcker and V. Kartvelishvili, SVD Approach to Data Unfolding,
NIM A 372 (1996) 469
www.lancs.ac.uk/depts/physics/staff/kartvelishvili.html
• K. Tackmann, H. Lacker, Unfolding the Hadronic Mass Spectrum
in $B \to X_u \ell \nu$ Decays, BAD 894.
• G. D’Agostini, A multidimensional unfolding method based on
Bayes’ theorem, NIM A 362 (1995) 487