Transcript Document

Data Mining Tools for ZLE
Copying and Use Restrictions:
Material under this presentation is the Intellectual Property of HP
Corporation and Genus Software. Any use of this material, in part
or in whole, except in the context of Genus Data Mining Integrator and Data
Mart Builder, without written permission from HP and Genus is
prohibited.
© 2002
agenda
• data mining in ZLE solutions
• ZLE data mining toolkit
• toolkit demonstration
Meta Group
• process of identifying and/or extracting previously unknown, non-trivial, unanticipated, important information from large sets of data
Gartner Group
• process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies, statistical and mathematical techniques
• role
– determine most effective responses to business events
• ZLE facilitates mining by providing
– a rich, integrated, current data source
– an integrated operational environment into which models can be deployed
• data mining helps to realize the full business value of a ZLE system
ZLE data mining process
• understand the opportunity
– identify and define business opportunity
• prepare data (typically about 75% of process)
– profile and understand data
– derive attributes
– transform data
– create case set
• build models
– train models
– assess model performance
• use models
– deploy model
– monitor model performance
the ZLE data mining toolkit
• goal:
– provide tools that facilitate ZLE data mining
– reduce process cycle times dramatically
• three tools being developed by Genus Software:
– data preparation
– data transfer
– model deployment
• partners: Genus, MicroStrategy, SAS
• product names:
– Genus Mining Integrator for NonStop SQL (all three tools)
– Genus Mart Builder for NonStop SQL (first two tools only)
ZLE data mining analytical cycle
• Data Preparation (profiling/transforming data)
• Data Transfer (fast parallel streams)
• Modeling (SAS Enterprise Miner)
• Model Deployment (written to DB tables)
• Real-Time Scoring (using the Recommender)
• components: Interaction Manager, Rules Engine, Scoring Engine, Agg. Engine
• stores: Data Store (NonStop SQL), Mining Mart (Tru64/Windows)
• legend: part of Genus toolkit, available from SAS, part of ZDK 3
toolkit demonstration
• credit card fraud detection example
• opportunity: use ZLE data store data to predict, in real-time, which credit card purchases are likely to be fraudulent
• use tools to:
– build a case set table with one row describing each purchase
– transfer table to SAS server for modeling
– deploy predictive model to ZLE data store
– execute model in real-time to make fraud predictions
• steps described, including many tool screen shots
toolkit data preparation solution
• based on the MicroStrategy (MSI) Business Intelligence toolset, leverages GUI, logical data model support, SQL generation, etc.
• uses NonStop SQL/MX DBMS, leverages sampling, TRANSPOSE, statistical functions, …
• custom tool developed by Genus using MSI SDK for NonStop SQL operations and functionality not supported by MSI tools
two main ZLE data preparation tasks
1. profile tables
– column names and types
– partitioning information, attributes, key structure, …
– column values
2. transform source tables
– derive new attributes
– aggregate to appropriate level
– clean data
– pivot
– combine to form case set
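The "column values" profiling task listed above can be illustrated with a small sketch. This is not the MicroStrategy or Genus implementation; it only assumes rows held as Python dictionaries, and the column names `BillState` and `Frd` are hypothetical stand-ins for a billing-state column and a 0/1 fraud flag.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, most common values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(3),
    }

def fraud_rate_by(rows, column, flag="Frd"):
    """Cross-tabulate a 0/1 fraud flag against one column (fraud share per value)."""
    totals, frauds = Counter(), Counter()
    for row in rows:
        totals[row[column]] += 1
        frauds[row[column]] += row[flag]
    return {value: frauds[value] / totals[value] for value in totals}

# Made-up sample rows; BillState and Frd are hypothetical column names.
sample = [
    {"BillState": "CA", "Frd": 0},
    {"BillState": "CA", "Frd": 1},
    {"BillState": "NY", "Frd": 0},
    {"BillState": None, "Frd": 0},
]
state_profile = profile_column(sample, "BillState")
rates = fraud_rate_by([r for r in sample if r["BillState"]], "BillState")
```

In a real deployment this kind of summary would more likely be pushed down to the database as generated SQL, which is the role the deck assigns to the MSI toolset.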
the MicroStrategy desktop
MSI profile report: fraud vs. billing state
NonStop SQL/MX sampling
• source table sampling
– insert into CustSamp
    select * from Cust
      sample random 1 percent clusters of 10 blocks
    union
    select * from Cust
      where CardNo in (select CardNo from FrdFlg)
• enables interactive and exploratory data prep
• cleanly integrated into SQL
• performed efficiently in DP2
• easily accessible through Genus tool
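The idea behind the sampling statement above (a small random cluster sample, unioned with every row for a flagged card so that rare fraud cases are not lost) can be sketched outside the database. This is a plain-Python illustration under stated assumptions, not how DP2 performs sampling: the `block_size` parameter and the in-memory list of rows are hypothetical.

```python
import random

def sample_with_fraud(cust_rows, fraud_cards, percent=1.0,
                      block_size=4, cluster_blocks=10, seed=0):
    """Cluster-sample about `percent` of the rows, then union in all flagged rows."""
    rng = random.Random(seed)
    cluster_size = block_size * cluster_blocks  # rows per cluster (assumed layout)
    clusters = [cust_rows[i:i + cluster_size]
                for i in range(0, len(cust_rows), cluster_size)]
    # Each cluster is kept or dropped as a whole with probability percent/100.
    picked = [row for cluster in clusters
              if rng.random() < percent / 100.0
              for row in cluster]
    # Second branch of the union: every row whose CardNo is flagged as fraud.
    flagged = [row for row in cust_rows if row["CardNo"] in fraud_cards]
    # UNION semantics: duplicate rows appear only once.
    seen, result = set(), []
    for row in picked + flagged:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

# Made-up data: 1000 customers, three flagged cards.
cust = [{"CardNo": i, "Name": f"c{i}"} for i in range(1000)]
cust_samp = sample_with_fraud(cust, fraud_cards={3, 500, 999})
```

Sampling whole clusters rather than individual rows trades some statistical independence for far fewer block reads, which is presumably why the slide stresses that it is performed efficiently in DP2.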
creating a materialized sample table using the Genus Data Mart Builder
identifying source and sample method
specifying materialized sample table
transforming source data
• diagram: source tables Purchase (billions of purchases), Account (millions of accounts), Store, Purchase History, Item Summary and Fraud feed an "Aggregate and Pivot" step; sample rows omitted
• column headers shown: PurchDt, Amt, Store, Acct, Size, Age, CS, CR, CrLim, Ten, P1, S1, A1, P3, S3, A3, Min, Max, Elec, Vid, Jewl, Frd?
result: a case set for modeling
• diagram: the resulting case set table; sample rows omitted
• hundreds of attributes
• one row per purchase
• mix of fraud and no-fraud purchases
• column headers shown: PurchDt, Amt, Store, Acct, Size, Age, CS, CR, CrLim, Ten, P1, S1, A1, P3, S3, A3, Min, Max, Elec, Vid, Jewl, Frd?
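The aggregate-and-pivot step that produces one case set row per purchase can be sketched as follows. This is a minimal illustration, not the SQL the Genus tool generates: the `PurchId` and `Category` fields and all data values are hypothetical, while `Elec`, `Vid`, `Jewl`, `CrLim` and the fraud flag follow column headers visible on the slides.

```python
from collections import defaultdict

CATEGORIES = ("Elec", "Vid", "Jewl")  # item categories named on the slide

def build_case_set(purchases, items, accounts, fraud_ids):
    """Return one row per purchase with pivoted item flags, account data and target."""
    # Aggregate: count item rows per (purchase, category).
    counts = defaultdict(int)
    for item in items:
        counts[(item["PurchId"], item["Category"])] += 1

    case_set = []
    for p in purchases:
        row = {"PurchId": p["PurchId"], "Amt": p["Amt"], "Store": p["Store"]}
        # Join: bring in an account-level attribute.
        row["CrLim"] = accounts[p["Acct"]]["CrLim"]
        # Pivot: one 0/1 column per item category.
        for cat in CATEGORIES:
            row[cat] = 1 if counts[(p["PurchId"], cat)] > 0 else 0
        # Target column used for modeling.
        row["Frd"] = 1 if p["PurchId"] in fraud_ids else 0
        case_set.append(row)
    return case_set

# Made-up input tables.
purchases = [
    {"PurchId": 1, "Acct": "acct-1", "Amt": 88.38, "Store": 221},
    {"PurchId": 2, "Acct": "acct-2", "Amt": 19.99, "Store": 73},
]
items = [
    {"PurchId": 1, "Category": "Elec"},
    {"PurchId": 1, "Category": "Elec"},
    {"PurchId": 2, "Category": "Jewl"},
]
accounts = {"acct-1": {"CrLim": 4600}, "acct-2": {"CrLim": 1700}}
case_set = build_case_set(purchases, items, accounts, fraud_ids={2})
```

Each output row carries its own target value, so a mix of fraud and no-fraud purchases in the input yields a labeled case set ready for model training.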
MSI Datamart report summarizing items
data transfer tool
• task: transfer case set from data store to mining mart
• design
– client side: Web browser client, HTTP/HTML, Web App. on Web server
– Data Store (NonStop SQL/MX): JDBC, coordinator, parallel transfer processes (four shown)
– Mining Mart: coordinator, parallel receive processes (four shown), parallel SAS import steps (four shown)
– data flow: transfer, receive, ASCII files, SAS import, SAS data set
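The parallel-stream design above can be mimicked in a few lines to show the shape of the data flow. This is only a sketch under loud assumptions: threads and a temporary directory stand in for the transfer and receive processes, CSV text stands in for the ASCII files, and a simple read-back stands in for the SAS import. None of the function names come from the Genus tool.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_streams):
    """Split rows round-robin into n_streams lists, one per transfer stream."""
    return [rows[i::n_streams] for i in range(n_streams)]

def transfer_stream(stream_id, rows, out_dir):
    """One stream: write its share of the case set to its own ASCII file."""
    path = os.path.join(out_dir, f"stream_{stream_id}.txt")
    with open(path, "w", newline="", encoding="ascii") as f:
        csv.writer(f).writerows(rows)
    return path

def import_files(paths):
    """Stand-in for the import step: read all stream files into one data set."""
    data = []
    for path in paths:
        with open(path, newline="", encoding="ascii") as f:
            data.extend(csv.reader(f))
    return data

def transfer_case_set(rows, n_streams=4):
    """Run all streams concurrently, then import the files they produced."""
    with tempfile.TemporaryDirectory() as out_dir, \
            ThreadPoolExecutor(max_workers=n_streams) as pool:
        parts = partition(rows, n_streams)
        paths = list(pool.map(transfer_stream, range(n_streams), parts,
                              [out_dir] * n_streams))
        return import_files(paths)

# Made-up case set rows, already rendered as text fields.
rows = [[str(i), f"{i * 1.5:.2f}"] for i in range(10)]
received = transfer_case_set(rows)
```

Row order is not preserved across streams, which is acceptable for a case set because each row is an independent training case.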
data transfer specification screen
transfer monitoring
modeling in SAS Enterprise Miner
model export
• score converter node generates Java model code
• reporter node exports code and HTML report to project directory
model deployment tool
• task
– copy model information to a ZLE Data Store
• design
– client side: Web browser client, HTTP/HTML, Web App. on Web Server
– Data Store (NonStop SQL/MX): JDBC access
– Mining Mart (File/SAS server): File/registry access, SAS Open Metadata server
– SAS Enterprise Miner: model export/registration
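Deployment as described above amounts to copying model information into database tables. The sketch below shows that idea with Python's built-in `sqlite3` module standing in for NonStop SQL/MX; the `models` table, its columns and the sample model name are all hypothetical and are not the schema used by the Genus tool.

```python
import sqlite3

def deploy_model(conn, name, version, code):
    """Store one exported model (name, version, scoring code) in a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models ("
        "name TEXT, version INTEGER, code TEXT, "
        "PRIMARY KEY (name, version))"
    )
    conn.execute("INSERT INTO models VALUES (?, ?, ?)", (name, version, code))
    conn.commit()

def list_models(conn):
    """Return (name, version) for every deployed model."""
    return conn.execute(
        "SELECT name, version FROM models ORDER BY name, version"
    ).fetchall()

# In-memory database as a stand-in for the data store.
conn = sqlite3.connect(":memory:")
deploy_model(conn, "fraud_tree", 1, "// exported scoring code would go here")
deploy_model(conn, "fraud_tree", 2, "// exported scoring code would go here")
deployed = list_models(conn)
```

Keying on name and version lets a newer model be deployed next to the one currently in use, which matches the later slides that list models in the Data Store before deploying another.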
starting the model deployment tool
connecting to a Data Store
a list of models in the Data Store
viewing a deployed model
selecting a SAS report directory
viewing available reports
viewing an Enterprise Miner report
deploying a model
deployment confirmation
real-time scoring using the Recommender
• Interaction Manager
• Aggregation Engine: Aggregate Definitions, Customer Data, Model Aggregates
• Scoring Engine: Deployed Models, Model Scores
• Rules Engine: Business Rules, Offers / Advice
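The three engines above suggest a simple chain: aggregates are computed from customer data, a deployed model turns them into a score, and business rules turn the score into advice. The sketch below illustrates that chain only; the logistic model, its weights, the aggregate names and the 0.5 threshold are all made-up assumptions, not part of the Recommender.

```python
import math

def aggregate(prior_purchases, new_purchase):
    """Aggregation step: derive model inputs from customer data."""
    amounts = [p["Amt"] for p in prior_purchases]
    return {
        "amt": new_purchase["Amt"],
        "n_prior": len(amounts),
        "max_prior": max(amounts, default=0.0),
    }

def score(aggs, weights, bias=0.0):
    """Scoring step: a hypothetical logistic model over the aggregates."""
    z = bias + sum(weights[name] * aggs[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

def apply_rules(fraud_score, threshold=0.5):
    """Rules step: map the model score to advice for the caller."""
    return "refer for review" if fraud_score >= threshold else "approve"

# Made-up model weights and customer history.
weights = {"amt": 0.01, "n_prior": -0.5, "max_prior": -0.01}
history = [{"Amt": 20.0}, {"Amt": 30.0}]

large = apply_rules(score(aggregate(history, {"Amt": 500.0}), weights))
small = apply_rules(score(aggregate(history, {"Amt": 25.0}), weights))
```

Keeping the three steps separate mirrors the slide: aggregate definitions, deployed models and business rules can each change without touching the other two.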
how to get the data mining tools
• Product Names
– Genus Mining Integrator for NonStop SQL
(Data Preparation, Data Transfer, and Model Deployment tools)
– Genus Mart Builder for NonStop SQL (first two tools only)
• Can be ordered through HP, support provided by Genus
• Availability: calendar Q4 2002
• For more information, contact
– [email protected] (Product Manager)
– [email protected] (Program Manager)
– [email protected] (Development)