Classification PSF Analysis
• A New Analysis Tool: Insightful Miner
• Classification Trees
• From Cuts to Classification Trees:
Recasting of the GLAST PSF Analysis
• Energy Dependencies
• Present status of GLAST PSFs
Bill Atwood, Nov. 2002
GLAST
A Data Mining Tool
An Insightful Miner Analysis Program!
Miner Details
What is a Data Miner?
o A graphical user programming environment
o An ensemble of Data Manipulation Tools
o A Set of Data Modelling Tools
o A “widget” scripting language
o An interface to databases
[Figure labels: A Traditional “CUT”, INPUT, OUTPUT; A Properties Browser to set parameters]
Why use a Data Miner?
o Fast and Easy prototyping of Analysis
o Encourages “exploration”
o Allows a more “Global” View of Analysis
Classification Trees
Given a “categorical variable”, split the data into two pieces
using the “best” independent continuous variable
Example: VTX.Type = 1 if “vertex” direction is best
                    2 if “best-track” direction is best
Use “Entropy” to decide which independent variable to use:
Entropy_i = -\sum_k p_{ik} \log(p_{ik})
where k runs over categories and i is the ith node
(There are other criteria)
Continue the process, treating each branch as a new “root.”
Terminate according to statistics in the last node and/or change in Entropy
[Figure: tree diagram with Root, Branch 1, and Branch 2]
Example: Classification Tree from Miner
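A minimal sketch of the entropy criterion described on this slide, written in Python rather than in Miner. The function names `entropy` and `best_split`, the toy values, and the choice of natural log are illustrative assumptions; the slide does not say how Miner scans thresholds, so this simply tries every distinct value of one continuous variable and keeps the cut with the largest drop in entropy.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy -sum_k p_k log(p_k) of the category labels in one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Scan cuts on one continuous variable; return (threshold, entropy drop)."""
    n = len(labels)
    parent = entropy(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        # Entropy of the two branches, weighted by how much data each holds
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best

# Toy data (invented): one continuous variable and VTX.Type labels (1 or 2)
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
vtx_type = [1, 1, 1, 2, 2, 2]
threshold, gain = best_split(x, vtx_type)
```

Repeating `best_split` over every candidate variable and then recursing into each branch gives the tree-growing procedure the slide outlines.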
Classification Trees
Why use Classification Trees?
1. Simplicity of method – recursive application of a
decision making rule
2. Easily captures non-linear behavior in predictors as well
as interactions among them
3. Not limited to just 2 categories
There are numerous texts on this subject...
In the following analysis Classification Trees will be used to:
o Separate out the good “vertex” events
o Predict how “good” an event really is
GLAST PSF Analysis
This portion of the code
o Reads in the data
o Culls out bad data
o Adds new columns for analysis
o Makes Global Cuts (ACD.DOCA > 350 & Energy > .5*MC.Energy)
o Splits the data into 2 pieces (TKR.1.z0 > 250)
   o Thin Radiators
   o Thick Radiators
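The cut-and-split steps on this slide can be sketched in plain Python as follows. The column names come from the slide; the event values are invented, and the helper `passes_global_cuts` is hypothetical. The transcript does not record which side of the TKR.1.z0 cut is the "Thin" sample, so the two pieces are named only by the side of the cut they fall on.

```python
# Invented events; only the column names are taken from the slide.
events = [
    {"ACD.DOCA": 400.0, "Energy": 90.0, "MC.Energy": 100.0, "TKR.1.z0": 300.0},
    {"ACD.DOCA": 100.0, "Energy": 95.0, "MC.Energy": 100.0, "TKR.1.z0": 400.0},
    {"ACD.DOCA": 500.0, "Energy": 40.0, "MC.Energy": 100.0, "TKR.1.z0": 100.0},
    {"ACD.DOCA": 600.0, "Energy": 80.0, "MC.Energy": 100.0, "TKR.1.z0": 120.0},
]

def passes_global_cuts(ev):
    """Global cuts from the slide: ACD.DOCA > 350 & Energy > .5*MC.Energy."""
    return ev["ACD.DOCA"] > 350 and ev["Energy"] > 0.5 * ev["MC.Energy"]

selected = [ev for ev in events if passes_global_cuts(ev)]

# Split the surviving events into 2 pieces on TKR.1.z0 > 250.
# Assumption: the Thin/Thick assignment of each piece is not stated here.
z0_above = [ev for ev in selected if ev["TKR.1.z0"] > 250]
z0_below = [ev for ev in selected if not ev["TKR.1.z0"] > 250]
```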
The VTX Classification Tree
Relative amounts of Categories
Relative amount of Data
CPA: To Vertex or not to Vertex?
Probability is not continuous:
it's essentially binned by the finite
number of leaves (ending nodes)
There is a “gap” at 0.5. Use that to
determine which solution to use
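To illustrate why the tree's probability is "binned", here is a small sketch with invented leaf counts. Every event that lands in a leaf inherits that leaf's class fraction, so the prediction can take only as many distinct values as there are leaves. The leaf contents below are hypothetical and are chosen so that no leaf sits near 0.5.

```python
# Invented leaves: (events where "vertex" is best, total events in the leaf)
leaves = [(90, 100), (40, 50), (14, 20), (9, 30), (5, 50)]

# One probability per leaf: the fraction of "vertex is best" events in it
leaf_probs = sorted(n_vtx / n_all for n_vtx, n_all in leaves)

# The empty interval around 0.5 is the "gap" used to choose a solution
below = max(p for p in leaf_probs if p < 0.5)
above = min(p for p in leaf_probs if p >= 0.5)
gap = (below, above)
```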
Do the Vertex Split!
Predictor created by Classification Tree
Use 2-Track Solution
From “Thin” Split
Use 1-Track Solution
Rename probability column
The data are now divided into 2 subsets according to the
probability that the 2-Track (“vertex”) solution is best.
No data have been eliminated: failed vertexed solutions
are tried again as 1-Track events
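A rough sketch of the split itself, assuming the tree's prediction is stored in a column here called `VTX.Prob` (a hypothetical name; the slide only says the probability column is renamed). The two list comprehensions are complements of each other, which is what guarantees that no event is dropped.

```python
def vertex_split(events, prob_column="VTX.Prob", threshold=0.5):
    """Partition events into (2-Track, 1-Track) subsets; nothing is discarded."""
    two_track = [ev for ev in events if ev[prob_column] > threshold]
    one_track = [ev for ev in events if not ev[prob_column] > threshold]
    return two_track, one_track

# Invented events carrying only an id and the predicted probability
events = [{"id": i, "VTX.Prob": p} for i, p in enumerate([0.9, 0.1, 0.7, 0.3, 0.8])]
two_track, one_track = vertex_split(events)
```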
Bin the PSF
Continuous Variable
Categorical Variable
Target Class: Class #1 – MS PSF Limited Bin
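Turning a continuous variable into a categorical one, as this slide describes, might look like the sketch below. The transcript records neither the variable being binned nor the bin edges, so `EDGES`, `psf_class`, and the sample values are all hypothetical; only the idea that Class #1 is the target class comes from the slide.

```python
import bisect

# Hypothetical bin edges; the real edges are not in the transcript.
EDGES = [1.0, 2.0, 4.0]

def psf_class(value):
    """Map a continuous value to a class number 1 .. len(EDGES) + 1."""
    return bisect.bisect_right(EDGES, value) + 1

# Invented continuous values, one per event
classes = [psf_class(v) for v in [0.4, 1.5, 3.0, 10.0]]

# Class #1 is the target class for the classification tree
is_target = [c == 1 for c in classes]
```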
2 Track Classification Tree
1 Track Classification Tree
Combining Results
Example PSFs at FoM Max
100 MeV: PSF-68 = 2.7°, 95/68 = 2.65
1000 MeV: PSF-68 = 0.35°, 95/68 = 2.3
10000 MeV: PSF-68 = 0.1°, 95/68 = 2.9
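The PSF-68 and 95/68 figures quoted here are containment quantities. As a sketch of how such numbers could be computed from a list of angular errors, assuming a simple order-statistic definition (the slides do not say which estimator was used), with invented data and a hypothetical helper `containment_radius`:

```python
import math

def containment_radius(errors, fraction):
    """Smallest angular error that contains `fraction` of the events."""
    ordered = sorted(errors)
    # Small tolerance guards against floating-point overshoot in fraction * n
    index = max(0, math.ceil(fraction * len(ordered) - 1e-9) - 1)
    return ordered[index]

# Invented angular errors in degrees: 0.01, 0.02, ..., 1.00 (100 events)
errors = [i / 100 for i in range(1, 101)]

psf68 = containment_radius(errors, 0.68)
psf95 = containment_radius(errors, 0.95)
ratio_95_68 = psf95 / psf68
```

The 95/68 ratio measures the tails of the distribution: the larger it is, the more events lie far outside the 68% core.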
Before and After Trees
Using Classification Trees
PSF: 2.1°
95%/68%: 2.34
Aeff: 1387 cm²
Before and After Trees
Using Classification Trees
Best results obtained using the “cuts” to achieve a good PSF
PSF: 2.1°
95%/68%: 2.34
Aeff: 1387 cm²
[Figure: plots with axes labeled Aeff, PSF68, PSF95/68 (95/68 Ratio), and Vtx. Angle]