ACAT05, May 22 - 27, 2005, DESY, Zeuthen, Germany

Search for the Higgs boson at LHC by using Genetic Algorithms
Mostafa MJAHED
Ecole Royale de l'Air, Mathematics and Systems Dept., Marrakech, Morocco

Outline
• Introduction
• Genetic Algorithms
• Search for the Higgs boson at LHC by using Genetic Algorithms
  - Optimization of discriminant functions
  - Optimization of neural weights
  - Hyperplane search
  - Hypersurface search
• Conclusion

Introduction: Higgs production at LHC
• Several mechanisms contribute to the production of the SM Higgs boson in proton collisions.
• The dominant mechanism is the gluon fusion process, gg → H.

Introduction: Decay modes
• Decay into quarks: H → bb and H → cc
• Leptonic decay: H → τ+τ-
• Gluonic decay: H → gg
• Decay into a virtual W boson pair: H → W+W-

Introduction: Main discovery modes (MH < 2MZ)
• H → γγ
• H → bb + X
• H → W+W- → lν lν
• H → ZZ → 4l

H → W+W- → lν lν (1)
• The decay channel chosen: H → W+W- → e+ν μ-ν̄, μ+ν e-ν̄, e+ν e-ν̄, μ+ν μ-ν̄.
• Signature:
  - two oppositely charged leptons with large transverse momentum P_T,
  - two energetic jets in the forward detectors,
  - large missing transverse momentum P'_T.
• Main backgrounds:
  - tt̄ production, with tt̄ → WbWb → lν j lν j,
  - QCD W+W- + jets production,
  - electroweak WWjj production.

H → W+W- → lν lν (2): Main variables
• Δη_ll, Δφ_ll: the pseudo-rapidity and azimuthal angle differences between the two leptons.
• Δη_jj, Δφ_jj: the pseudo-rapidity and azimuthal angle differences between the two jets.
• M_ll, M_jj: the invariant masses of the two leptons and of the two jets.
• M_nm (n, m = 1, 2, 3, ...): rapidity-weighted transverse momentum moments,
  $M_{nm} = \sum_{i \in \mathrm{event}} \eta_i^{\,n}\, p_{iT}^{\,m}$,
  where $\eta_i$ is the rapidity of the leptons or jets and $p_{iT}$ their transverse momentum.
  (A sketch of how such variables could be computed is given after this part.)

Genetic Algorithms

Pattern recognition
Measurement → Feature extraction / feature selection → Feature classification → Decision

Pattern recognition methods
• Statistical methods: PCA, discriminant analysis, decision trees, clustering, wavelets.
• Connectionist methods: neural networks.
• Other methods: genetic algorithms, fuzzy logic.

Genetic Algorithms
• Based on Darwin's theory of "survival of the fittest": living organisms reproduce, individuals evolve/mutate, individuals survive or die based on fitness.
• The input is an initial set of possible solutions.
• The output of a genetic algorithm is the set of "fittest solutions" that will survive in a particular environment.
• The process:
  - produce the next generation (by a crossover function),
  - evolve solutions (by a mutation function),
  - discard weak solutions (based on a fitness function).

Genetic Algorithms
• Preparation:
  - define an encoding to represent solutions (i.e., use a character sequence to represent a class),
  - create possible initial solutions (and encode them as strings),
  - perform the 3 genetic functions: crossover, mutation, fitness test.
• Why Genetic Algorithms (GAs)?
  - Many real-life problems cannot be solved in a polynomial amount of time by a deterministic algorithm.
  - Near-optimal solutions that can be generated quickly are sometimes more desirable than optimal solutions that require a huge amount of time.
  - Such problems can be modeled as optimization problems.
  (A minimal GA loop illustrating these steps is sketched after this part.)
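As a companion to the variables defined above, the following is a minimal Python sketch (not taken from the slides) of how the angular differences, invariant masses, rapidities and the rapidity-weighted moments M_nm could be computed from lepton and jet four-momenta. The four-vector layout (E, px, py, pz) and the helper names are assumptions made for illustration.

# Minimal sketch (assumption, not from the slides): discriminating variables
# from (E, px, py, pz) four-momenta of the two leptons and two leading jets.
import math

def eta_phi_pt(E, px, py, pz):
    """Rapidity eta = 0.5*log((E + pz)/(E - pz)), azimuth phi, transverse momentum pT."""
    eta = 0.5 * math.log((E + pz) / (E - pz))
    phi = math.atan2(py, px)
    pt = math.hypot(px, py)
    return eta, phi, pt

def delta_eta_phi(p1, p2):
    """Rapidity and azimuthal-angle differences between two objects (leptons or jets)."""
    eta1, phi1, _ = eta_phi_pt(*p1)
    eta2, phi2, _ = eta_phi_pt(*p2)
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:                  # wrap the azimuthal difference into [0, pi]
        dphi = 2 * math.pi - dphi
    return abs(eta1 - eta2), dphi

def invariant_mass(p1, p2):
    """Invariant mass of a pair of four-momenta."""
    E = p1[0] + p2[0]
    px, py, pz = (p1[i] + p2[i] for i in (1, 2, 3))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))

def moment(objects, n, m):
    """Rapidity-weighted transverse-momentum moment M_nm = sum_i eta_i^n * pT_i^m."""
    return sum(eta ** n * pt ** m
               for eta, _, pt in (eta_phi_pt(*p) for p in objects))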
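The GA loop announced above can be illustrated by the following small, generic Python sketch: binary chromosomes, roulette-wheel selection, one-point crossover and bit-flip mutation, iterated for a fixed number of generations. The population size, rates and the toy "one-max" fitness are illustrative assumptions, not the settings used in the analysis.

# Minimal generic GA loop (illustrative settings, not those of the analysis).
import random

def roulette_select(pop, fitnesses):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, fit in zip(pop, fitnesses):
        acc += fit
        if acc >= r:
            return chrom
    return pop[-1]

def crossover(a, b):
    """One-point crossover of two equal-length bit strings."""
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chrom, rate=0.01):
    """Flip each bit with probability 'rate'."""
    return [bit ^ 1 if random.random() < rate else bit for bit in chrom]

def run_ga(fitness, n_bits=16, pop_size=20, n_generations=100):
    """Initialize population, then select / cross / mutate until termination."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_generations):
        fits = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = roulette_select(pop, fits), roulette_select(pop, fits)
            for child in crossover(a, b):
                new_pop.append(mutate(child))
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# Toy usage: maximise the number of 1-bits ("one-max").
best = run_ga(fitness=sum)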
Genetic functions
• Crossover: two parent bit strings exchange their tails at a randomly chosen cut point.
• Mutation: individual bits of a chromosome are flipped at random.
• Roulette wheel selection: chromosomes are drawn with a probability proportional to their fitness.
[Diagram: crossover and mutation on binary chromosomes; roulette wheel selection.]

GA process
Initialize population → evaluate fitness → terminate?
  - Yes: output the solution.
  - No: perform selection, crossover and mutation, then evaluate fitness again.

GAs for pattern classification
• Optimization of discriminant functions
• Optimization of neural weights
• Hyperplane search
• Hypersurface search

Efficiency and purity of classification
Validation with test events (N1 events of class C1, N2 events of class C2):

  Test events   Classified as C1   Classified as C2
  C1: N1        N11                N12
  C2: N2        N21                N22
  Total         M1                 M2

• Efficiency for Ci classification: $E_i = N_{ii} / N_i$
• Purity for Ci events: $P_i = N_{ii} / M_i$
• Misclassification rate (error) for Ci: $\sum_{j \neq i} N_{ij} / N_i$
(A sketch computing these quantities from the confusion-matrix counts is given after this part.)

Search for the Higgs boson at LHC by using Genetic Algorithms
• Signal: pp → HX → W+W-X → l+ν l-ν̄ X
• Background: pp → tt̄X → WbWbX → lν j lν j X and pp → qqX → W+W-X
• Events generated with the LUND MC PYTHIA 6.1 at √s = 14 TeV, MH = 115 - 150 GeV/c².
• 10000 Higgs events and 10000 background events are used.
• Search for discriminating variables.

Variables
• Δη_ll, Δφ_ll: the pseudo-rapidity and azimuthal angle differences between the two leptons.
• Δη_jj, Δφ_jj: the pseudo-rapidity and azimuthal angle differences between the two jets.
• M_ll, M_jj: the invariant masses of the two leptons and of the two jets.
• Rapidity-weighted transverse momentum moments M_nm (n = 1, ..., 6):
  $M_{nm} = \sum_{i \in \mathrm{jets}} \eta_i^{\,n}\, p_{iT}^{\,m}$, with rapidity
  $\eta_i = \tfrac{1}{2} \log\!\left(\frac{E_i + p_{i\parallel}}{E_i - p_{i\parallel}}\right)$.
• Selected variables: Δη_ll, Δφ_ll, Δη_jj, Δφ_jj, M_ll, M_jj, M11, M21, M31, M41.

Optimization of discriminant functions (1)
• Discriminant analysis: $F(x) = (\bar{g}_{\mathrm{signal}} - \bar{g}_{\mathrm{back}})^{T}\, V^{-1}\, x = \sum_i \alpha_i x_i$.
• The most separating discriminant function between the classes CHiggs and CBack is
  $F_{\mathrm{Higgs/Back}} = -0.02 + 0.12\,\Delta_{ll} + 0.4\,\Delta_{jj} + 0.35\,M_{ll} + 0.61\,M_{jj} + 0.74\,M_{11} + 1.04\,M_{21}$,
  where $\Delta_{ll}$ and $\Delta_{jj}$ denote the lepton-pair and jet-pair angular separation variables.
• A test event x0 is classified according to the condition:
  if $F_{\mathrm{Higgs/Back}}(x_0) \geq 0$ then x0 ∈ CHiggs, else x0 ∈ CBack.
• Classification of test events:

  Test events   Efficiency   Purity
  CHiggs        0.601        0.606
  CBack         0.610        0.604

Optimization of discriminant functions (2)
• GA parameters:
  - chromosome: the coefficients (α0, ..., α6) of the discriminant function, starting from (-0.02, 0.12, 0.4, 0.35, 0.61, 0.74, 1.01),
  - fitness function: the misclassification rate,
  - number of generations; selection, crossover, mutation.
• GA code: Matlab GA Toolbox.
• Classification of test events:

  Test events   Efficiency   Purity
  CHiggs        0.601        0.606
  CBack         0.610        0.604
  Mean efficiency 0.6055, error 0.3945.

Optimization of discriminant functions (3)
• Generation of N solutions: chromosome j = (α0j, α1j, ..., α6j), j = 1, ..., N.
• Fitness function evaluated on the coefficients of F.
• Genetic process: initialize population → evaluate fitness → terminate? If no, perform selection, crossover and mutation and re-evaluate fitness; if yes, output the solution.
  (A sketch of this GA optimization of the discriminant coefficients is given after this part.)

Optimization of discriminant functions (4)
• Optimization results: number of generations = 10000, CPU time: 120 s.
• Optimal discriminant function coefficients: -0.02, 0.12, 0.4, 0.35, 0.61, 0.74, 1.01.
• Classification of test events:

  Test events   Efficiency   Purity
  CHiggs        0.652        0.649
  CBack         0.648        0.650
  Mean efficiency 0.65, error 0.35.
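The efficiency, purity and misclassification-rate definitions above translate directly into a few lines of Python. This is a minimal sketch; the confusion-matrix counts in the usage example are illustrative numbers, not the counts of the analysis.

# Minimal sketch: efficiency, purity and error from confusion-matrix counts N_ij
# (row = true class, column = assigned class; class 0 = Higgs, class 1 = background).
def efficiency(N, i):
    """E_i = N_ii / N_i : fraction of true C_i events classified as C_i."""
    return N[i][i] / sum(N[i])

def purity(N, i):
    """P_i = N_ii / M_i : fraction of events classified as C_i that are truly C_i."""
    column_total = sum(row[i] for row in N)
    return N[i][i] / column_total

def error(N, i):
    """Misclassification rate for C_i: sum over j != i of N_ij / N_i."""
    return 1.0 - efficiency(N, i)

# Illustrative counts only (not the numbers from the analysis):
N = [[601, 399],    # true Higgs events classified as (Higgs, background)
     [390, 610]]    # true background events classified as (Higgs, background)
print(efficiency(N, 0), purity(N, 0), error(N, 0))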
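The slides perform the discriminant-coefficient optimization with the Matlab GA Toolbox; the Python sketch below only illustrates the idea of evolving the seven coefficients (α0, ..., α6) with the misclassification rate as fitness. The truncation selection, averaging crossover and Gaussian mutation used here are simplified stand-ins for the operators described in the slides, and the event lists are assumed to be provided by the caller.

# Sketch (assumption, not the Matlab GA Toolbox setup): evolve the coefficients
# of F(x) = alpha_0 + sum_i alpha_i * x_i, minimising the misclassification rate.
import random

N_COEF = 7  # constant term + 6 variables (Delta_ll, Delta_jj, M_ll, M_jj, M_11, M_21)

def classify(alpha, x):
    """Event x (6 variables) is assigned to the Higgs class if F(x) >= 0."""
    return alpha[0] + sum(a * v for a, v in zip(alpha[1:], x)) >= 0

def misclassification(alpha, higgs_events, back_events):
    """Fraction of events put in the wrong class (the quantity the GA minimises)."""
    wrong = sum(not classify(alpha, x) for x in higgs_events)
    wrong += sum(classify(alpha, x) for x in back_events)
    return wrong / (len(higgs_events) + len(back_events))

def evolve(higgs_events, back_events, pop_size=20, n_gen=1000, sigma=0.05):
    """Very small GA: truncation selection, averaging crossover, Gaussian mutation."""
    pop = [[random.uniform(-1, 1) for _ in range(N_COEF)] for _ in range(pop_size)]
    fit = lambda a: misclassification(a, higgs_events, back_events)
    for _ in range(n_gen):
        pop.sort(key=fit)                          # best (lowest error) first
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [(ai + bi) / 2 + random.gauss(0, sigma) for ai, bi in zip(a, b)]
            children.append(child)
        pop = parents + children
    return min(pop, key=fit)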
Optimization of neural weights (1)
• Inputs: Δη_ll, Δφ_ll, Δη_jj, Δφ_jj, M_ll, M_jj, M11, M21, M31, M41.
• Neural analysis, NN architecture (10, 10, 10, 1):
  $o_1 = f\Big(\sum_j w^{h_2 o}_{j1}\, h_{2,j} - \theta_{o_1}\Big)$,
  $h_{2,i} = f\Big(\sum_j w^{h_1 h_2}_{ji}\, h_{1,j} - \theta_{h_2,i}\Big)$,
  $h_{1,i} = f\Big(\sum_j w^{x h_1}_{ji}\, x_j - \theta_{h_1,i}\Big)$.
• Classification of test events: if $O_1(x) \geq 0.5$ then x ∈ CHiggs, else x ∈ CBack.

  Test events   Efficiency   Purity
  CHiggs        0.654        0.663
  CBack         0.669        0.659
  Mean efficiency 0.661, error 0.338.

Optimization of neural weights (2)
• GA parameters: connection weights and thresholds
  - $W^{x h_1}_{ij}$, $\theta^{h_1}_i$: 100 + 10,
  - $W^{h_1 h_2}_{ij}$, $\theta^{h_2}_i$: 100 + 10,
  - $W^{h_2 o}_{ij}$: 10.
• Total number of parameters to be optimized: 230.
• Fitness: misclassification rate.
• Optimization results: number of generations = 1000, CPU time: 6 min.

  Test events   Efficiency   Purity
  CHiggs        0.691        0.696
  CBack         0.699        0.693
  Mean efficiency 0.695, error 0.305.
(A sketch of a forward pass through such a network, with the weights decoded from a flat GA chromosome, is given at the end of this transcript.)

Hyperplane search (1)
• Hyperplane: $H(x) = \alpha_0 + \sum_{i=1}^{10} \alpha_i x_i = 0$.
• Hyperplane Hj is encoded by the coefficients (α0j, α1j, ..., α10j): 11 parameters to optimize.
• Classification rule: if $H(x) \geq 0$ then x ∈ CHiggs, else x ∈ CBack.
• Genetic process:
  - generation of N = 20 hyperplanes Hj, j = 1, ..., N,
  - number of generations = 10000, CPU time: 4 min,
  - initialize population → evaluate fitness → terminate? If no, perform selection, crossover and mutation and re-evaluate fitness; if yes, output the solution.

Hyperplane search (2)
• Hyperplane search results, classification of test events:

  Test events   Efficiency   Purity
  CHiggs        0.661        0.655
  CBack         0.651        0.657
  Mean efficiency 0.656, error 0.344.

• Essentially the same results as the discriminant function optimization:

  Test events   Efficiency   Purity
  CHiggs        0.652        0.649
  CBack         0.648        0.650
  Mean efficiency 0.65, error 0.35.

Hypersurface search (1)
• Hypersurface: $S(x) = \alpha_0 + \sum_{i=1}^{10} \left(\alpha_i x_i + \beta_i x_i^2 + \gamma_i x_i^3\right)$.
• Hypersurface Sj is encoded by αij (i = 0, ..., 10), βij (i = 1, ..., 10), γij (i = 1, ..., 10): 31 parameters to optimize.
• Classification rule: if $S(x) \geq 0$ then x ∈ CHiggs, else x ∈ CBack.
• Genetic process:
  - generation of N = 20 hypersurfaces Sj, j = 1, ..., N,
  - number of generations = 10000, CPU time: 6 min,
  - initialize population → evaluate fitness → terminate? If no, perform selection, crossover and mutation and re-evaluate fitness; if yes, output the solution.
  (A sketch of the hypersurface classification rule is given at the end of this transcript.)

Hypersurface search (2)
• Hypersurface search results, classification of test events:

  Test events   Efficiency   Purity
  CHiggs        0.689        0.696
  CBack         0.693        0.693
  Mean efficiency 0.691, error 0.309.

• Essentially the same results as the neural weights optimization:

  Test events   Efficiency   Purity
  CHiggs        0.691        0.696
  CBack         0.699        0.693
  Mean efficiency 0.695, error 0.305.

Conclusion: methods
• Importance of pattern recognition methods.
• The improvement of any identification rests on the multidimensional treatment offered by PR methods combined with the discriminating power of the proposed variables.
• The genetic algorithm method makes it possible to minimize the classification error and to improve the efficiencies and purities of the classifications.
• The performances are on average 3 to 5 % higher than those obtained with the other methods.
• Discriminant function optimization: comparable to the hyperplane search approach.
• Neural weights optimization: comparable to the hypersurface search approach.

Conclusion (continued)
• Variables: for the characterisation of Higgs boson events, other variables should be examined.
• Physics processes: other processes should be considered.
• Detector effects should be added to the simulated events.
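As referenced in the neural-weights part above, the following Python sketch decodes one flat chromosome of 230 values into the weights and thresholds of the (10, 10, 10, 1) network and performs a forward pass. The logistic activation is an assumption (the slides only call the activation f), and a threshold on the output neuron is omitted to match the stated parameter count.

# Sketch (assumption): forward pass of the (10, 10, 10, 1) network described above,
# with 100 + 10 + 100 + 10 + 10 = 230 parameters decoded from one flat GA chromosome.
# The logistic activation is an assumption; the slides only call it f.
import math

N_IN, N_H1, N_H2 = 10, 10, 10

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(chromosome):
    """Split a flat list of 230 numbers into W_xh1 (100), th_h1 (10),
    W_h1h2 (100), th_h2 (10) and W_h2o (10)."""
    it = iter(chromosome)
    w_xh1 = [[next(it) for _ in range(N_IN)] for _ in range(N_H1)]
    th_h1 = [next(it) for _ in range(N_H1)]
    w_h1h2 = [[next(it) for _ in range(N_H1)] for _ in range(N_H2)]
    th_h2 = [next(it) for _ in range(N_H2)]
    w_h2o = [next(it) for _ in range(N_H2)]
    return w_xh1, th_h1, w_h1h2, th_h2, w_h2o

def output(chromosome, x):
    """o1 = f(sum_j w_h2o_j * h2_j), with h2 and h1 built layer by layer."""
    w_xh1, th_h1, w_h1h2, th_h2, w_h2o = decode(chromosome)
    h1 = [sigmoid(sum(w * xj for w, xj in zip(row, x)) - th)
          for row, th in zip(w_xh1, th_h1)]
    h2 = [sigmoid(sum(w * hj for w, hj in zip(row, h1)) - th)
          for row, th in zip(w_h1h2, th_h2)]
    return sigmoid(sum(w * hj for w, hj in zip(w_h2o, h2)))

def classify(chromosome, x):
    """Event x is assigned to the Higgs class if o1(x) >= 0.5."""
    return output(chromosome, x) >= 0.5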
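The hypersurface classification rule referenced above is a direct formula; the minimal sketch below implements it with the 31-parameter encoding (α0, αi, βi, γi over the 10 variables). The parameter vector and event format are illustrative; the hyperplane of the previous part is recovered by setting all βi and γi to zero.

# Minimal sketch of the hypersurface classification rule described above:
# S(x) = alpha_0 + sum_i (alpha_i*x_i + beta_i*x_i^2 + gamma_i*x_i^3), 31 parameters.
N_VAR = 10

def decode_surface(params):
    """Split the 31-parameter chromosome into alpha_0, alpha[1..10], beta[1..10], gamma[1..10]."""
    assert len(params) == 1 + 3 * N_VAR
    alpha0 = params[0]
    alpha = params[1:1 + N_VAR]
    beta = params[1 + N_VAR:1 + 2 * N_VAR]
    gamma = params[1 + 2 * N_VAR:]
    return alpha0, alpha, beta, gamma

def surface_value(params, x):
    alpha0, alpha, beta, gamma = decode_surface(params)
    return alpha0 + sum(a * xi + b * xi ** 2 + c * xi ** 3
                        for a, b, c, xi in zip(alpha, beta, gamma, x))

def classify(params, x):
    """x is assigned to the Higgs class if S(x) >= 0, else to the background class."""
    return surface_value(params, x) >= 0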