Neural nets for pedestrians
Lee Pondrom
University of Wisconsin
Credits
• The NN program used is JETNET, written by the Lund group.
• A ROOT C++ interface for CDF users was written by Catalin Ciobanu, now at UIUC.
• Catalin maintains an introduction to the use of Root_Jetnet on fcdflnx* computers.
• That introduction includes the MC top and background files used in this presentation.
More credits
• I also learned from Rick Field: R.D. Field et al., Phys. Rev. D 53, 2296 (1996).
• And from Matt Herndon.
So what is a neural net?
• I don’t really know, but it is a collection of
computer codes which form a nonlinear
information processing system.
• The configuration we will consider is about the simplest form that contains all of the essential ingredients.
• JETNET allows lots of flexibility that we won't use.
Two inputs and two hidden nodes
[Diagram: inputs x1 and x2 feed hidden nodes H1 and H2; their outputs a1 and a2 feed the single output node b.]
• Each output ranges from 0 to 1
• Inputs normalized to |x| < 1
• Network ‘trained’ on signal and background
The formulas
• Output function f(z)=(1+tanh(z))/2
• Or, equivalently, f(z) = 1/(1 + exp(-2z)); for -∞ < z < ∞, 0 < f < 1 (checked numerically below)
• The idea of the training is to determine the free parameters, called weights and thresholds, so that f is near one for signal and near zero for background
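A minimal Python sketch (not from the talk; the function names are mine) verifying that the two forms of f(z) agree:

```python
import numpy as np

def f_tanh(z):
    return (1.0 + np.tanh(z)) / 2.0         # first form of the output function

def f_exp(z):
    return 1.0 / (1.0 + np.exp(-2.0 * z))   # equivalent logistic form

z = np.linspace(-5.0, 5.0, 101)
assert np.allclose(f_tanh(z), f_exp(z))     # agree for all z, with 0 < f < 1
```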
More formulas
• For this problem, called 2-2-1, there are nine free parameters: four weights and two thresholds for the hidden nodes, and two weights and one threshold for the output.
• The general formula for the number of parameters is Ntot = Nin·N1 + 2·N1 + 1, N1 being the number of hidden nodes (checked in the sketch below).
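A minimal sketch of this counting rule, with the two networks from this talk as checks (the function name n_params is an assumption of mine):

```python
def n_params(n_in, n_hidden):
    """Ntot = Nin*N1 + 2*N1 + 1: Nin*N1 input-to-hidden weights,
    N1 hidden thresholds, N1 hidden-to-output weights, 1 output threshold."""
    return n_in * n_hidden + 2 * n_hidden + 1

assert n_params(2, 2) == 9    # the 2-2-1 example above
assert n_params(9, 9) == 100  # the 9-9-1 network trained later
```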
Still more formulas
• a_j = f( Σ_i (w1)_ij x_i + (T1)_j )
• b = f( Σ_i (w2)_i a_i + T2 )
• The minimization process uses the network error function
• χ²_net = (1/Nsig) Σ_n (b(n) − 1)² + (1/Nbkg) Σ_n (b(n) − 0)² (sketched in code below)
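Putting these formulas together, a minimal Python sketch of the 2-2-1 forward pass and error function (the names forward, chi2_net, w1, t1, w2, t2 are my assumptions, not JETNET's API):

```python
import numpy as np

def f(z):
    return (1.0 + np.tanh(z)) / 2.0  # output function from the earlier slide

def forward(x, w1, t1, w2, t2):
    """2-2-1 forward pass: a_j = f(Σ_i (w1)_ij x_i + (T1)_j),
    then b = f(Σ_i (w2)_i a_i + T2)."""
    a = f(x @ w1 + t1)      # hidden-node outputs a_1, a_2
    return f(a @ w2 + t2)   # single output b in (0, 1)

def chi2_net(b_sig, b_bkg):
    """(1/Nsig) Σ (b-1)² + (1/Nbkg) Σ (b-0)² over the two samples."""
    return np.mean((b_sig - 1.0) ** 2) + np.mean(b_bkg ** 2)
```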
Example: top quark production
• Use Monte Carlo to train the network to distinguish ttbar -> W+jets from the QCD W+jets background.
• ttbar is Pythia; the W+jets background is VECBOS.
• Nine kinematic distributions are available as inputs.
NN inputs 1-3 [plots of the nine kinematic input distributions]
Train on all nine variables first
• This determines 100 weights and thresholds, so it is not easy to understand.
• The χ² minimization moves in w-T space in the direction -∇χ²_net (gradient descent; a sketch follows below)
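To illustrate the step along -∇χ²_net, a hedged sketch using a finite-difference gradient on toy data; JETNET's actual minimization (and the real training samples) differ:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for signal and background (2 inputs, |x| < 1); hypothetical data.
x_sig = rng.uniform(0.0, 1.0, size=(100, 2))
x_bkg = rng.uniform(-1.0, 0.0, size=(100, 2))

w = rng.normal(scale=0.5, size=9)  # all 9 weights and thresholds in one vector

def f(z):
    return (1.0 + np.tanh(z)) / 2.0

def output(x, w):
    w1, t1, w2, t2 = w[:4].reshape(2, 2), w[4:6], w[6:8], w[8]
    return f(f(x @ w1 + t1) @ w2 + t2)

def chi2(w):
    return np.mean((output(x_sig, w) - 1.0) ** 2) + np.mean(output(x_bkg, w) ** 2)

eta, eps = 0.5, 1e-6
for epoch in range(200):  # one pass = one 'epoch' in the slides' language
    grad = np.array([(chi2(w + eps * np.eye(9)[k]) - chi2(w)) / eps
                     for k in range(9)])
    w = w - eta * grad    # move in w-T space along -∇χ²_net
```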
Learning performance and error
• The next two slides plot performance and
error vs epoch. An epoch is a training
cycle.
• Performance is a measure of the efficiency
for separating the two samples. The
training and testing samples are
statistically independent.
• The error is related to the χ²_net defined earlier.
Learning performance [plot of performance vs. epoch]
Learning error [plot of error vs. epoch]
Neural net output for 9 inputs [histogram; signal in red]
Choose only two variables
• ET1 and HT
• Now there are six weights and three thresholds (nine parameters in all)
2-2-1 output [plot of the network output]
2-2-1 neural net scatter plots for 100 top and 100 background MC events [plots]
Conclusions
• The 2-2-1 neural net is reasonable.
• The 9-9-1 neural net is not easy to comprehend, but its performance is not too different from the 2-2-1.
• The NN output should be treated just like any other histogram. Its power is the ability to consider many variables at once.
• A likelihood function does similar things.