Exploring Artificial Neural Networks to discover the Higgs boson at the LHC
Overview
• Introduction
– The Standard Model and the mass problem
– Higgs search at the LHC (and ANNs)
• ttH, H to bb and other channels
– Production process
– Decay
– Experimental signatures
– Background processes
• ANNs: a possible solution (theory)
• ANN development issues (on a simple 2-D classification problem + results)
• ANNs applied to Higgs data (results)
• Summary
Introduction
• Origin of mass is the last big unanswered question in the SM.
• The Standard Model
– To make the physical equations of the SM gauge invariant we require new terms; these correspond directly to gauge bosons (e.g. the photon).
– Massless particles would preserve the SM’s gauge symmetry (easiest, but not the case).
– The Higgs mechanism allows generation of mass in the SM by breaking the gauge invariance of the vacuum (spontaneous symmetry breaking).
– Needs a further particle: the HIGGS BOSON!!
• So the search for the Higgs is important to our understanding of particle interactions.
• It may be that nature has chosen another mass-generating mechanism,
but whatever this mechanism is, it should show itself at the LHC.
Search at the Large Hadron Collider (LHC)
(Higgs discovery is one of its main aims)
• H mass is not predicted by the SM, but production and decay rates can be predicted as a function of mH.
– From LEP:
• 114.4 GeV < mH (SM).
– The LHC, with its detectors ATLAS and CMS (due to go online in 2007), will collide p-p at
• 14 TeV
• Higgs mass reach up to about 1 TeV.
– High luminosity:
• But Higgs production is very rare! (10^16 proton-proton interactions will occur per year, but fewer than 100,000 Higgs bosons will be produced.)
– As well as the Higgs, the LHC hopes to find evidence for new physics:
• Supersymmetry (SUSY) modifies the SM to include a whole new series of particles, the supersymmetric partners of all the particles so far known. It has many desirable features, mending some shortcomings of the SM.
• If SUSY is the theory, we do not know how many Higgs bosons we would see (minimum 5).
ttH, H to bb Channel
• H production processes:
– Gluon fusion is the dominant Higgs production process, gg to H (but it is difficult to separate the signal from the large QCD background).
– Associated production, ttH: lower cross section but has leptonic final states.
• Dominant decay mode at mH < 130 GeV is H to bb.
[Figure: Branching ratios for the SM Higgs boson as a function of Higgs mass (100–200 GeV), showing the bb, WW and ZZ modes.]
• ttH, H to bb could account for half the Higgs discovery potential at ATLAS (Cammin).
• Background:
– ttjj (most important, 94% after TDR analysis).
• Full reconstruction of the final state is necessary to minimize the combinatorial background and to discriminate the signal from the large background.
[Diagram: ttH, H to bb final state with b-jets, light-quark jets and a lepton from W decay.]
TDR analysis has 3 steps:
• Preselection
– 1 isolated lepton,
– At least 6 jets,
– Exactly 4 tagged as b-jets.
• Reconstruction
– Reconstruction of the 2 top quarks; minimise (a sketch of this minimisation follows below):
  Δ² = (mlνb – mt)² + (mjjb – mt)²
• Cuts on the reconstructed t and H masses (where ANNs come in)
– Reconstructed top masses must be within 20 GeV of mt.
– mbb = mH ± 30 GeV.
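A minimal sketch of that Δ² minimisation, assuming the candidate masses mlνb and mjjb for each jet assignment have already been computed from the reconstructed objects (the helper name and the nominal top-mass value are illustrative, not the actual analysis code):

```python
import numpy as np

M_TOP = 175.0  # GeV; nominal top mass (assumed value, for illustration)

def best_assignment(m_lnub_candidates, m_jjb_candidates):
    """Pick the (leptonic, hadronic) top candidate pair that minimises
    Δ² = (m_lνb - m_t)² + (m_jjb - m_t)².
    A real analysis would only allow combinations that use each jet once."""
    best_pair, best_d2 = None, np.inf
    for i, m_lep in enumerate(m_lnub_candidates):
        for j, m_had in enumerate(m_jjb_candidates):
            d2 = (m_lep - M_TOP) ** 2 + (m_had - M_TOP) ** 2
            if d2 < best_d2:
                best_pair, best_d2 = (i, j), d2
    return best_pair, best_d2
```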
• After this TDR analysis, the significance S/√B = 1.94 (for a 120 GeV Higgs).
• Could increase the significance by:
– Better jet pairing
– Improving the ‘final selection’ (after top reconstruction, applied to events in mbb = mH ± 30 GeV)
Applying ANNs is promising as it makes use of the event topology, not just mass cuts! I.e. minimising the Δ² equation does not take into account additional information such as spatial differences between jets!
• I looked at the final selection!
• Used 10 variables generated by Pythia (which gave separation between the signal and background distributions).
• Fed the variables into a neural network (to classify each event as signal or background).
ANNs
• Artificial Neural Networks (ANNs) are computational modelling tools
• Inspired by the biological nervous system
• Good at:
– generalization,
– non-linear mappings,
– learning by example.
• Want to train the network with examples to recognise the right data (a classification task) and reject the rest
• (ANNs perform better than cut-based methods in theory because they can separate classes in feature space non-linearly)
• (but training is difficult; optimisation is harder than for cut-based methods)
How do ANNs Work?
[Diagrams: a single neural node computing g(∑ w·x), and a neural network with inputs xk, hidden nodes hj, weights wjk and wij, and output oi.]
• Response function (a small numerical sketch follows below):
  oi = g(∑j wij g(∑k wjk xk))
  which is non-linear, so the network is able to perform non-linear mappings.
• Architecture and weight settings are what change the classification!
• We want the network to output 1 for signal and 0 for all background.
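As a rough illustration of this response function (not the SNNS implementation used in the project), a small numpy sketch with a sigmoid activation g and one hidden layer:

```python
import numpy as np

def g(a):
    """Sigmoid activation: non-linear, so the network can perform non-linear mappings."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, w_jk, w_ij):
    """Response of a one-hidden-layer network: o_i = g( sum_j w_ij * g( sum_k w_jk * x_k ) )."""
    h = g(w_jk @ x)   # hidden-node activations h_j
    o = g(w_ij @ h)   # output-node activations o_i
    return o

# Example shapes only: 2 inputs, 3 hidden nodes, 1 output
rng = np.random.default_rng(0)
w_jk = rng.normal(size=(3, 2))   # input-to-hidden weights w_jk
w_ij = rng.normal(size=(1, 3))   # hidden-to-output weights w_ij
print(forward(np.array([0.2, 0.7]), w_jk, w_ij))
```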
• Weights are changed in proportion to the difference (error) between the target output and the actual network output for each example.
• Minimize the summed square error function:
  E = ½ ∑p ∑i (oi(p) – ti(p))²
  with respect to the weights.
• The error is a function of all the weights and forms an irregular, complex multidimensional surface with many peaks, saddle points and minima.
• The error is minimized by finding the set of weights that corresponds to the global minimum (i.e. get close to 1 for signal and close to 0 for background).
• Done with the gradient descent method (weights incrementally updated in proportion to ∂E/∂wij).
[Figure: error surface.]
Summary of learning algorithm
1. Initialize wij and wjk with random values.
2. Pick a pattern p from the training set.
• Present the input and calculate the output from:
  oi = g(∑j wij g(∑k wjk xk))
• Update the weights according to:
  wij(t + 1) = wij(t) + Δwij
  wjk(t + 1) = wjk(t) + Δwjk
  (…etc… for extra hidden layers),
  where Δw = –η ∂E/∂w.
3. When no change (within some accuracy) occurs, the weights are frozen and the network is ready to use on data it has never seen.
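A minimal numpy sketch of this learning loop (sigmoid activation, summed square error, and updates Δw = –η ∂E/∂w for a single hidden layer); the learning rate, network size and stopping tolerance are illustrative, not the settings used in the project:

```python
import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))  # sigmoid activation

def train(X, T, n_hidden=3, eta=0.1, n_epochs=1000, tol=1e-6, seed=0):
    """Random weight initialisation, forward pass, and gradient-descent updates
    for E = 1/2 sum_p sum_i (o_i(p) - t_i(p))^2."""
    rng = np.random.default_rng(seed)
    w_jk = rng.normal(scale=0.5, size=(n_hidden, X.shape[1]))  # input-to-hidden
    w_ij = rng.normal(scale=0.5, size=(1, n_hidden))           # hidden-to-output
    prev_err = np.inf
    for epoch in range(n_epochs):
        err = 0.0
        for x, t in zip(X, T):                      # pick pattern p from the training set
            h = g(w_jk @ x)                         # hidden activations h_j
            o = g(w_ij @ h)                         # network output o_i
            err += 0.5 * np.sum((o - t) ** 2)
            delta_o = (o - t) * o * (1 - o)         # output-layer error term
            delta_h = (w_ij.T @ delta_o) * h * (1 - h)  # back-propagated hidden term
            w_ij += -eta * np.outer(delta_o, h)     # Δw_ij = -η ∂E/∂w_ij
            w_jk += -eta * np.outer(delta_h, x)     # Δw_jk = -η ∂E/∂w_jk
        if abs(prev_err - err) < tol:               # freeze weights once error stops changing
            break
        prev_err = err
    return w_jk, w_ij
```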
2-D problem
• Initially looked at a simple ANN classification problem:
– Separate out a single point in a 2-D plane of randomly generated numbers.
• Generated 2 sets of random numbers (a toy version is sketched after this list).
• Fed the network (using SNNS; 2 inputs, 1 output) (show diag!!) examples of signal and background data (desired outputs 1 and 0 respectively).
• Used 300 patterns in both the training and validation sets.
• Background to signal ratio was 3 to 1.
• Looked at various net architectures.
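A toy version of this setup; the exact random distributions used on the slide are not specified, so the signal and background shapes below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy distributions: signal clustered around a single point in the plane,
# background spread uniformly over it.
n_total = 300                 # patterns per set (training and validation)
n_sig = n_total // 4          # background-to-signal ratio of 3 to 1
n_bkg = n_total - n_sig

signal = rng.normal(loc=[0.5, 0.5], scale=0.05, size=(n_sig, 2))
background = rng.uniform(0.0, 1.0, size=(n_bkg, 2))

X = np.vstack([signal, background])                    # 2 network inputs
T = np.concatenate([np.ones(n_sig), np.zeros(n_bkg)])  # desired outputs 1 / 0
```

The resulting X and T arrays could then be fed to a 2-input, 1-output network such as the training sketch above.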
Results:
– Learning shown by the error curves.
– Projections show the hyperplanes.
– 3 hidden nodes solve the classification task fully! (effectively 1 hidden node is the equivalent of 1 linear hyperplane)
– Got spiking behaviour in some error curves.
• Showed inconsistent learning (updating of weights).
• Was solved by adjusting some network parameters (made learning more stable!!!):
– Learning parameter, η.
– dmax.
– Shuffle option.
• To get a deeper understanding of the learning, also looked at the weight and bias variables.
Using ANNs for Higgs search
• Worked with data after the reconstruction of the top quarks.
• Variables used (a sketch of how some of these could be computed follows the list):
– mbb: the invariant mass of the two b-jets assigned to the Higgs boson,
– Δη(tnear, bb): the difference in pseudorapidity between the bb-system and the reconstructed top quark nearest in ΔR,
– cos θ*(b,b): the cosine of the decay angle of the two b-jets from the Higgs boson in the rest frame of the bb-system,
– Δη(b,b): the difference in pseudorapidity between the two b-jets from the Higgs boson,
– mbb(1): the combination with the smallest invariant mass mbb out of the six combinations which are possible when selecting two b-jets out of four b-jets,
– mbb(2): the combination with the second smallest invariant mass mbb out of the six combinations which are possible when selecting two b-jets out of four b-jets,
– Δφ(t1, t2): the difference in phi between the reconstructed top quarks,
– pTt1 + pTt2: the sum of the transverse momenta of the reconstructed top quarks.
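For illustration, a couple of these variables could be computed from the reconstructed jet kinematics along these lines; the helper names and the (pt, η, φ, m) inputs are assumptions, not the actual analysis code:

```python
import numpy as np

def four_vector(pt, eta, phi, m):
    """(E, px, py, pz) from transverse momentum, pseudorapidity, phi and mass."""
    px, py = pt * np.cos(phi), pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    E = np.sqrt(px**2 + py**2 + pz**2 + m**2)
    return np.array([E, px, py, pz])

def inv_mass(p1, p2):
    """Invariant mass of a two-jet system, e.g. mbb of the Higgs candidate."""
    p = p1 + p2
    return np.sqrt(max(p[0]**2 - p[1]**2 - p[2]**2 - p[3]**2, 0.0))

def delta_eta(eta1, eta2):
    """Difference in pseudorapidity, e.g. between the two b-jets from the Higgs."""
    return eta1 - eta2
```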
[Figures: signal (red) and background distributions.]
• (Only the ttjj background was used.)
• Rescaled the data to [0,1].
• Separated the data into training and validation sets (sketched below).
• Used 1:1 for signal to background.
• Looked at various architectures (1 and 2 hidden layers).
• Weak generalization.
• Output for the best architecture (6-20-20-1) gave:
[Figure: network output distribution; signal in red.]
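A sketch of the preprocessing described above (min–max rescaling to [0,1] and a training/validation split); the function names and the 50/50 split fraction are illustrative:

```python
import numpy as np

def rescale_01(X):
    """Min-max rescale each input variable to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)   # assumes no variable is constant

def split_train_val(X, T, frac=0.5, seed=2):
    """Shuffle the patterns and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(frac * len(X))
    return (X[idx[:n_tr]], T[idx[:n_tr]]), (X[idx[n_tr:]], T[idx[n_tr:]])
```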
Summary
• Optimisation difficulties and solutions have been identified in the net development.
• Some classification produced for the Higgs data.
• More work on the architecture could be needed (more data, lack of generalization).
• S/√B as a function of the cut on the network output.
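A sketch of how S/√B could be scanned as a function of the cut on the network output; the per-event weights standing in for the cross-section normalisation are assumptions:

```python
import numpy as np

def significance_vs_cut(out_sig, out_bkg, w_sig=1.0, w_bkg=1.0, cuts=None):
    """S/sqrt(B) versus the cut on the network output.
    out_sig / out_bkg: network outputs for signal / background events;
    w_sig / w_bkg: per-event weights (normalisation assumed)."""
    if cuts is None:
        cuts = np.linspace(0.0, 1.0, 101)
    sig = []
    for c in cuts:
        S = w_sig * np.sum(out_sig > c)   # signal events passing the cut
        B = w_bkg * np.sum(out_bkg > c)   # background events passing the cut
        sig.append(S / np.sqrt(B) if B > 0 else 0.0)
    return cuts, np.array(sig)
```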