Data mining : A Bird's Eye View


KISS-SIGDB Tutorial 1998
Data Mining Concepts and Research Trends
1998. 5. 21.
Do-Heon LEE
Database Laboratory
Dept. of Computer Science
Chonnam National University

Table of Contents
• Definition and Motivation of Data Mining
• Classification of Data Mining Techniques
• Mining Association Rules
• Attribute Dependencies
• Database Summarization
• Data Mining Projects
  – DBMiner/GeoMiner/WebMiner
  – MineSet
• Data Mining and Data Warehousing
• References

Definition of Data Mining
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from large volumes of actual data.
– implicit : beyond what is stored explicitly in databases and catalogs
– previously unknown : excludes well-known knowledge
– potentially useful : usefulness is application-dependent
– large volumes : a performance perspective
– actual data : missing and erroneous data must be handled
Some counterexamples
– The 3rd attribute of table ‘EMP’ is ‘SALARY’.
  • Explicit information in the DB catalog
– Most college students have graduated from high school.
  • Well-known information, common sense

Motivation of Data Mining Research
• Growing reliance on database systems
• Fast advance of database system technology
• Increasing volume of data stored in databases
==> Mining databases for useful knowledge that can be exploited in decision making
• Database = operational data collection + useful resource reflecting domain characteristics

Comparison with Machine Learning
Data Mining                       Machine Learning
Dynamic data                      Static data
Erroneous data                    Error-free data
Uncertain data                    Exact data
Missing data                      No missing data
Coexistence of irrelevant data    Only relevant data
Immense size                      Moderate size
Structured data                   Flat collection of data
• Data mining is a practical application of machine learning methodologies to actual data.

Classification of Data Mining Techniques
• On knowledge types to be discovered
  – Characterization : generalized description of data characteristics
  – Classification : description of discriminating characteristics
  – Clustering : grouping data that have common properties
  – Association : co-occurrence relationships among multiple events
  – Trend analysis : characterizing the evolution trends of temporal data
  – Pattern analysis : finding specified patterns in large DBs
  – The types of mining targets keep evolving according to emerging application demands (cf. the evolution of SQL).
• On database types to be mined
  – relational, transactional, object-oriented, temporal, multimedia, etc.
• On techniques adopted
  – statistics, symbolic learning, neural networks, visualization, etc.

Association Rules : Definition and Applications
• QUEST project at IBM Almaden Research Center
• Association rules (among items)
  – Given a collection of transactions, each of which is a set { item-1, ..., item-n }, an association rule has the form
      { item-11, item-12, ..., item-1m } --> { item-21, item-22, ..., item-2k }
        (antecedent items)                   (consequent items)
  – The existence of an item (or items) in a transaction implies the existence of other item(s) in the same transaction.
  – In a POS (Point-Of-Sale) data set,
      10/15/13:01 { coke, bread, hamburger }
      10/15/14:21 { coke, hamburger, juice }
      10/15/14:25 { milk, sandwich, juice }
      10/15/15:13 { sandwich, milk, juice, bread }
      10/15/16:31 { hamburger, juice, coke }
    association rules such as
      { hamburger } --> { coke }
      { sandwich, juice } --> { milk }
    aid decision making for shelf layout design, direct mailing, etc.
• Other applications
  – Customer usage patterns in public communication services
  – Fault co-occurrence analysis in complex systems

Association Rules : Usefulness Measures
• Two measures for identifying useful association rules
  – support : statistical significance; the fraction of all transactions that contain every item of the rule (antecedent and consequent items)
  – confidence : rule strength; among the transactions that contain the antecedent items, the fraction that also contain the consequent items
• Example with ten transactions (o = contains, x = does not contain)
  Transaction                         hamburger   coke   both
  { coke, bread, hamburger }              o         o      o
  { coke, hamburger, juice }              o         o      o
  { milk, sandwich, juice }               x         x      x
  { sandwich, milk, juice, bread }        x         x      x
  { hamburger, juice, coke }              o         o      o
  { coke, bread, hamburger }              o         o      o
  { coke, hamburger, juice }              o         o      o
  { hamburger, juice }                    o         x      x
  { milk, hamburger, sweater }            o         x      x
  { coke, milk, juice }                   x         o      x
  Count                                   7         6      5
• For the association rule {coke} --> {hamburger},
  support : 5 out of 10 = 50 %
  confidence : 5 out of 6 = 83 %
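
The two measures can be computed directly from the ten transactions above. The following is a minimal Python sketch, assuming each transaction is held as a Python set of item names; the helper names `count`, `support` and `confidence` are illustrative only and do not come from any particular mining system.

```python
from typing import List, Set

# The ten transactions of the example above, one Python set per transaction.
TRANSACTIONS: List[Set[str]] = [
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"},
    {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"},
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"},
    {"coke", "milk", "juice"},
]


def count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent: Set[str], consequent: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of all transactions that contain both sides of the rule."""
    return count(antecedent | consequent, transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str], transactions: List[Set[str]]) -> float:
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


if __name__ == "__main__":
    s = support({"coke"}, {"hamburger"}, TRANSACTIONS)
    c = confidence({"coke"}, {"hamburger"}, TRANSACTIONS)
    print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 50%, confidence = 83%
```

Confidence is not symmetric: with the same counts, {hamburger} --> {coke} has a confidence of 5 out of 7 (about 71 %), because the denominator is the count of the antecedent.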

Association Rules : Mining Procedures
• Transactions (12)
  { coke, bread, hamburger }
  { coke, hamburger, juice }
  { milk, sandwich, juice }
  { sandwich, milk, juice, bread }
  { hamburger, juice, coke }
  { coke, bread, hamburger }
  { coke, hamburger, juice }
  { hamburger, juice }
  { milk, hamburger, sweater }
  { coke, milk, juice }
  { coke, juice }
  { coke, sweater }
• The first phase : finding frequent item sets (high support)
  – The threshold value for support is given as 40 %, i.e. at least 5 of the 12 transactions.
  {coke} : 8
  {bread} : 3
  {hamburger} : 7
  {juice} : 8
  {milk} : 4
  {sandwich} : 2
  {sweater} : 2
  {coke, hamburger} : 5
  {coke, juice} : 5
  {hamburger, juice} : 4
  {coke, hamburger, juice} : 3
  – Frequent item sets : {coke}, {hamburger}, {juice}, {coke, hamburger}, {coke, juice}
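
The first phase can be sketched as a level-wise search in the style of Apriori: frequent k-item sets are joined into (k+1)-item candidates, and a candidate is discarded before counting if any of its k-item subsets is infrequent. This is a simplified in-memory illustration under the assumption that transactions fit in Python sets; it does not reproduce the tree-structured candidate storage of the published algorithm, and the function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# The twelve transactions of the example above.
TRANSACTIONS: List[Set[str]] = [
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"milk", "sandwich", "juice"},
    {"sandwich", "milk", "juice", "bread"},
    {"hamburger", "juice", "coke"},
    {"coke", "bread", "hamburger"},
    {"coke", "hamburger", "juice"},
    {"hamburger", "juice"},
    {"milk", "hamburger", "sweater"},
    {"coke", "milk", "juice"},
    {"coke", "juice"},
    {"coke", "sweater"},
]


def frequent_itemsets(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], int]:
    """Level-wise search; returns every frequent item set with its transaction count."""
    min_count = min_support * len(transactions)
    # Level 1: the candidates are the single items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result: Dict[FrozenSet[str], int] = {}
    k = 1
    while candidates:
        # Count every candidate of this level with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        result.update(level)
        # Join: unions of two frequent k-item sets that have exactly k+1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: drop candidates that have an infrequent k-item subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return result


if __name__ == "__main__":
    result = frequent_itemsets(TRANSACTIONS, 0.40)
    for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), n)
```

Run on the twelve transactions with a 40 % threshold, this yields exactly the five frequent item sets listed above. Note that {coke, hamburger, juice} is never counted at all: its subset {hamburger, juice} is infrequent, so the pruning step removes it. This avoidance of counting is what separates a level-wise search from the blind search over all 2^N candidates.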
• The second phase : finding strong associations (high confidence)
  – The threshold value for confidence is given as 70 %.
  {coke} --> {hamburger} : 5 out of 8 = 62.5 %
  {hamburger} --> {coke} : 5 out of 7 = 71 %
  {coke} --> {juice} : 5 out of 8 = 62.5 %
  {juice} --> {coke} : 5 out of 8 = 62.5 %
  – Only {hamburger} --> {coke} meets the 70 % threshold.
• Algorithms for finding frequent item sets
  – Blind search : 2^N candidates for N distinct items
  – AIS : basic algorithm
  – SETM : sort-merge algorithm
  – Apriori : tree-structured candidate sets
  – AprioriTid : temporary table generation
  – Partition : partitioned mining
  – DHP : hash-based algorithm
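
The second phase needs no further pass over the data: for each frequent item set and each non-empty proper subset used as the antecedent, confidence = count(item set) / count(antecedent), and both counts are already known from the first phase because every subset of a frequent item set is itself frequent. A minimal sketch, assuming the first-phase counts are supplied as a dictionary (the name `strong_rules` is hypothetical):

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Counts of the frequent item sets from the first phase (12 transactions, support >= 40 %).
FREQUENT: Dict[FrozenSet[str], int] = {
    frozenset({"coke"}): 8,
    frozenset({"hamburger"}): 7,
    frozenset({"juice"}): 8,
    frozenset({"coke", "hamburger"}): 5,
    frozenset({"coke", "juice"}): 5,
}

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def strong_rules(frequent: Dict[FrozenSet[str], int], min_confidence: float) -> List[Rule]:
    """Return (antecedent, consequent, confidence) for every rule meeting min_confidence."""
    rules: List[Rule] = []
    for itemset, n in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a non-empty consequent
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                antecedent = frozenset(lhs)
                # The antecedent is a subset of a frequent set, so its count is in the table.
                conf = n / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


if __name__ == "__main__":
    for lhs, rhs, conf in strong_rules(FREQUENT, 0.70):
        print(sorted(lhs), "-->", sorted(rhs), f"{conf:.1%}")  # ['hamburger'] --> ['coke'] 71.4%
```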
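
Sequential Patterns
• Transactions of five customers
  CID   Time       Items
  1     95/06/25   30
  1     95/06/30   90
  2     95/06/10   10, 20
  2     95/06/15   30
  2     95/06/20   40, 60, 70
  3     95/06/25   30, 50, 70
  4     95/06/25   30
  4     95/06/30   40, 70
  4     95/07/25   90
  5     95/06/12   90
• Customer sequences (each customer's transactions ordered by time)
  CID   Sequence
  1     <(30) (90)>
  2     <(10,20) (30) (40,60,70)>
  3     <(30,50,70)>
  4     <(30) (40,70) (90)>
  5     <(90)>
• Maximal sequential patterns with support > 25 %, where support is the fraction of customers whose sequences contain the pattern (here, at least 2 of the 5 customers)
  <(30) (90)>      : customers 1 and 4
  <(30) (40,70)>   : customers 2 and 4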
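
Support for a sequential pattern counts customers, not transactions: a customer supports a pattern if each element of the pattern is a subset of a distinct transaction in the customer's sequence, in the same order. A minimal sketch of that containment test, assuming each sequence is a Python list of item sets in time order (the names `contains` and `support` are illustrative):

```python
from typing import Dict, List, Set

Sequence = List[Set[int]]

# Customer sequences from the table above, keyed by CID.
CUSTOMERS: Dict[int, Sequence] = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if each pattern element is a subset of a distinct transaction, in order."""
    i = 0  # index of the next pattern element still to be matched
    for transaction in sequence:
        if i < len(pattern) and pattern[i] <= transaction:
            i += 1  # match greedily against the earliest qualifying transaction
    return i == len(pattern)


def support(pattern: Sequence) -> float:
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(1 for seq in CUSTOMERS.values() if contains(seq, pattern))
    return hits / len(CUSTOMERS)


if __name__ == "__main__":
    print(support([{30}, {90}]))            # 0.4
    print(support([{30}, {40, 70}]))        # 0.4
    print(support([{30}, {40, 70}, {90}]))  # 0.2
```

Greedy matching is sufficient: if any in-order embedding exists, matching each element against the earliest qualifying transaction also finds one. The third call shows why <(30) (40,70)> is maximal here: extending it with (90) leaves only customer 4, which is 20 % and below the threshold.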

Telecommunication Network Diagnosis
[Figure: a communication network of nodes A through I, with alarms (C, 123), (E, 256) and (F, 678) observed within a time window of 30 minutes]
“Co-occurrence of alarm 123 in node C and alarm 256 in node E implies alarm 678 in node F within 30 minutes.”

Attribute Dependencies
• Given attributes A1, A2, ..., Am,
  – f(A1, A2, ..., Am, a set of constants) ==> g(A1, A2, ..., Am, a set of constants)
    where f and g are arbitrary (Boolean) functions,
    e.g. if (A1 = c1 and A2 = c2) then (A3 = c3 and A4 = c4)
  – Finding such dependencies is intractable in general, because the numbers of possible functions and constants are potentially infinite.
  – Thus, several constraints are imposed to make the problem tractable in actual domains,
    e.g. if the LHS is a conjunction of simple predicates and the RHS is an assertion of a class, the task becomes the classification problem.

Classification
• Symbolic classification rules (e.g. decision trees)
  – The most well-studied area among inductive learning problems
  A1   A2   C
  a    d    1
  a    e    2
  b    f    3
  b    g    3
  Decision tree for the table:
  A1 = a --> A2 = d --> C = 1
             A2 = e --> C = 2
  A1 = b --> C = 3
• Neural network approach
  – Weight values on edges --> symbolic description of classification rules
  – Still far from a practical solution, because learning is too costly
    ; suitable for problems where the network is trained once and then run many times
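
The tree above can be reproduced from the four records by a small top-down induction sketch. This is an illustration under stated assumptions, not the method of any system named in this tutorial: it splits on the attribute with the highest gain ratio, the criterion used in Quinlan's C4.5. The choice matters here: with plain information gain, A2 would win at the root (1.5 bits against 1.0 bits for A1) because each of its four values isolates a single record, whereas the gain ratio divides by the split information (2 bits for A2, 1 bit for A1) and so prefers A1. The function names `induce` and `classify` are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Union

Record = Dict[str, str]
# A tree is either a class label (leaf) or {"attr": name, "branches": {value: subtree}}.
Tree = Union[str, dict]

# Training records from the table above.
RECORDS: List[Record] = [
    {"A1": "a", "A2": "d", "C": "1"},
    {"A1": "a", "A2": "e", "C": "2"},
    {"A1": "b", "A2": "f", "C": "3"},
    {"A1": "b", "A2": "g", "C": "3"},
]


def entropy(records: List[Record], target: str) -> float:
    """Shannon entropy (in bits) of the class distribution."""
    n = len(records)
    counts = Counter(r[target] for r in records)
    return -sum(c / n * log2(c / n) for c in counts.values())


def gain_ratio(records: List[Record], attr: str, target: str) -> float:
    """Information gain of splitting on `attr`, divided by the split information."""
    n = len(records)
    parts: Dict[str, List[Record]] = {}
    for r in records:
        parts.setdefault(r[attr], []).append(r)
    remainder = sum(len(p) / n * entropy(p, target) for p in parts.values())
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())
    gain = entropy(records, target) - remainder
    return gain / split_info if split_info > 0 else 0.0


def induce(records: List[Record], attrs: List[str], target: str) -> Tree:
    """Top-down induction: split on the attribute with the highest gain ratio."""
    classes = Counter(r[target] for r in records)
    if len(classes) == 1 or not attrs:
        return classes.most_common(1)[0][0]  # leaf labelled with the (majority) class
    best = max(attrs, key=lambda a: gain_ratio(records, a, target))
    rest = [a for a in attrs if a != best]
    branches = {
        value: induce([r for r in records if r[best] == value], rest, target)
        for value in sorted({r[best] for r in records})
    }
    return {"attr": best, "branches": branches}


def classify(tree: Tree, record: Record) -> str:
    """Follow the branches matching the record's values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"][record[tree["attr"]]]
    return tree


if __name__ == "__main__":
    tree = induce(RECORDS, ["A1", "A2"], "C")
    print(tree)
    # {'attr': 'A1', 'branches': {'a': {'attr': 'A2', 'branches': {'d': '1', 'e': '2'}}, 'b': '3'}}
```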

Bottom-Up Summarization
• DBLEARN project at J. Han's Lab., Simon Fraser Univ., Canada
• Source table
  Name   Major        Birth_Place   GPA   vote
  Lee    music        Kwangju       3.4   1
  Kim    physics      Sunchon       3.9   1
  Yoon   math         Mokpo         3.7   1
  Park   painting     Yeosu         3.4   1
  Choi   computing    Taegu         3.8   1
  Hong   statistics   Suwon         3.2   1
• Attribute-oriented substitution (Name is removed; the other values are replaced by higher-level concepts)
  Major     Birth_Place   GPA         vote
  art       Chunnam       good        1
  science   Chunnam       excellent   1
  science   Chunnam       excellent   1
  art       Chunnam       good        1
  science   Kyungbuk      excellent   1
  science   Kyonggi       good        1
• Merging redundant records (votes are summed)
  Major     Birth_Place   GPA         vote
  art       Chunnam       good        2
  science   Chunnam       excellent   2
  science   Kyungbuk      excellent   1
  science   Kyonggi       good        1
• Domain knowledge (concept hierarchies)
  – Major : art = { music, painting, ... }; science = { physics, math, computing, ... }
  – Birth_Place : Korea = { Chunnam, Kyungbuk, ... }, with Chunnam = { Kwangju, Sunchon, ... }; Foreign = { ... }
  – GPA : excellent = [3.5, 4.0]; good = [3.0, 3.5); bad = [0.0, 3.0)
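
The two steps of this example (attribute-oriented substitution, then merging redundant records) can be sketched in a few lines of Python. This is an illustration, not DBLEARN's implementation: it assumes each concept hierarchy is climbed by exactly one level through a lookup table, and that GPA is generalized by the intervals above. The mappings for Mokpo, Yeosu, Taegu, Suwon and statistics are read off the substituted table rather than the hierarchy figure, and the names `gpa_level` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Source relation: (name, major, birth_place, gpa); every source record has vote 1.
STUDENTS: List[Tuple[str, str, str, float]] = [
    ("Lee", "music", "Kwangju", 3.4),
    ("Kim", "physics", "Sunchon", 3.9),
    ("Yoon", "math", "Mokpo", 3.7),
    ("Park", "painting", "Yeosu", 3.4),
    ("Choi", "computing", "Taegu", 3.8),
    ("Hong", "statistics", "Suwon", 3.2),
]

# One level of each concept hierarchy, as implied by the substituted table.
MAJOR = {"music": "art", "painting": "art", "physics": "science",
         "math": "science", "computing": "science", "statistics": "science"}
PLACE = {"Kwangju": "Chunnam", "Sunchon": "Chunnam", "Mokpo": "Chunnam",
         "Yeosu": "Chunnam", "Taegu": "Kyungbuk", "Suwon": "Kyonggi"}


def gpa_level(gpa: float) -> str:
    """Generalize a numeric GPA by the intervals of the domain knowledge."""
    if gpa >= 3.5:
        return "excellent"  # [3.5, 4.0]
    if gpa >= 3.0:
        return "good"       # [3.0, 3.5)
    return "bad"            # [0.0, 3.0)


def summarize(rows: List[Tuple[str, str, str, float]]) -> Dict[Tuple[str, str, str], int]:
    """Drop Name, substitute higher-level concepts, and merge identical tuples by summing votes."""
    votes: Counter = Counter()
    for _name, major, place, gpa in rows:
        votes[(MAJOR[major], PLACE[place], gpa_level(gpa))] += 1
    return dict(votes)


if __name__ == "__main__":
    for (major, place, gpa), vote in summarize(STUDENTS).items():
        print(major, place, gpa, vote)
```

The printed rows match the merged table above: art/Chunnam/good with vote 2, science/Chunnam/excellent with vote 2, and one vote each for science/Kyungbuk/excellent and science/Kyonggi/good. Keeping a vote count per generalized tuple lets the summary report how many source records each row stands for.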

Top-Down Summarization
• CLEVER system at DB Lab., KAIST
[Figure: a table to be summarized over the attributes PROGRAM (vi, emacs, word, gcc, tetris) and USER (John, Tom, Lee, Park, Yang); fuzzy set hierarchies PROG_01 and USR_01 over concepts such as editor, game, engineering, developer, programmer and marketer; and candidate summaries with their degrees (tSD = 0.4), with a legend marking the user's selection:
  < w, w > 1.000
  < engineering, w > 0.833, < w, developer > 0.800, < w, programmer > 0.589, < w, marketer > 0.411
  < engineering, developer > 0.700, < engineering, programmer > 0.522, < editor, developer > 0.489, < editor, programmer > 0.456]

Data Mining Projects
• QUEST : IBM Almaden Research Center
  – a common set of operations in a unified framework
  – classification, association, etc.
• KDW (Knowledge Discovery Workbench) : GTE Laboratories Inc.
  – focus on architectural issues of data mining systems
  – clustering, classification, summarization, deviation detection, etc.
• IMACS (Intelligent Market Analysis and Classification System) : AT&T Bell Laboratories
  – focus on human interaction in data mining
  – data archaeology
• CoverStory : Information Resources Incorporated
  – summarization of supermarket scanner data
• DBMiner/GeoMiner/WebMiner : Simon Fraser Univ.
• MineSet : Silicon Graphics Inc.

DBMiner
• DBMiner Research Group in Simon Fraser Univ., Canada
• DMQL : an SQL-like Data Mining Query Language
• Data structures : generalized relations, multi-dimensional data cube
[Figure: DBMiner architecture comprising a Graphical User Interface, Discovery Modules, a Concept Hierarchy, an SQL Server, and the DB/data]

DBMiner (cont’d)
• Functions
  – Characterizer : the general characteristics of a set of user-specified data
    • attribute-oriented induction
    • e.g. Cold(x) => headache(x) and cough(x)
    • e.g. Fever(x) => headache(x) and low-leucocyte-count(x)
  – Discriminator : features that distinguish the target class from contrasting classes
    • e.g. Low-leucocyte-count(x) => Fever(x)
  – Classifier : generalization-based decision tree induction
  – Association rule finder : multi-level association rules
  – Meta-rule guided miner : confines the search to rules of specific forms
    • e.g. meta-rule : major(s : student, x) and p(s, y) => GPA(s, z)
  – Predictor : predicts possible values for missing data, after factor analysis
    • e.g. an employee’s potential salary can be predicted from the salary distribution of similar employees in the company
  – Data evolution evaluator
    • e.g. growth patterns of certain stocks
  – Deviation evaluator
    • e.g. a set of stocks whose growth patterns deviate from the major trend

GeoMiner/WebMiner
• GeoMiner with GMQL (Geo-Mining Query Language)
  – An extension of DBMiner for spatial data mining
  – Modules
    • Geo-characterizer
      – e.g. given spatial hierarchies of Western Canada, discover general weather patterns according to region partitions
    • Geo-comparator (= discriminator)
      – e.g. the differences in weather patterns between British Columbia and Alberta
    • Geo-associator
• WebMiner with WebQL
  – Finds resources on the Internet related to a specific topic
  – e.g. What is the most popular document about data mining in terms of number of accesses?
• cf. Web traversal pattern discovery (by Chen, Park and Yu, 1996)
  – e.g. if a user visits h1 => h2 => h5, then he or she is likely to visit h8 => h11

MineSet
• Developed by Silicon Graphics Inc.
• Combines intelligent data mining algorithms and multidimensional data visualization techniques
• Association rule generator/rule visualizer
• Classification tools
  – MLC++ based classification modules
  – Decision tree inducer
  – Option tree inducer
  – Evidence classifier inducer
  – Decision table inducer
  – Tree/evidence visualizers
• Map visualizer : spatial data analysis
• Clustering module
• Regression tree inducer : predicts unknown values

Rule Visualizer of MineSet
[Figure cited from the Silicon Graphics Inc. home pages]

Decision Tree Visualizer of MineSet
[Figure cited from the Silicon Graphics Inc. home pages]

Map Visualizer of MineSet
[Figure cited from the Silicon Graphics Inc. home pages]

Two Perspectives on Data Mining
• AI practitioner’s perspective
  – Extensions of machine learning technology
  – Focus on sophisticated measures and theories rather than on efficiency improvement
• DB practitioner’s perspective
  – Application of machine learning paradigms to massive, actual data management problems
• A suggestion as a DB practitioner
  – First step : blindly search for possible knowledge ==> “Data Mining”
    • There is no guru who could guide the search directions.
    • No heuristics are available; heuristics are better ignored when looking for unknown patterns.
  – Second step : validate the discovered rough knowledge in detail

Data Mining and Data Warehousing
[Figure: process-oriented operational data (Relational DB-1, Relational DB-2, Object-oriented DB-1, Object-oriented DB-2, Legacy DB-1, File system-1) --> data warehouse builder/manager (with metadata) --> subject-oriented data warehouse and data marts 1 to 5, i.e. data for decision support --> data mining]

Research Issues
• Looking for useful mining targets
  – Associations, characteristic rules, classification, clustering
  – Functional dependencies, regression trees
  – Similar sequential patterns/time series
• Variations of association rules
  – Alternatives to the simple support and confidence measures
  – Generalized/multilevel association rules
• Performance enhancement for association rule discovery
• System implementation issues
  – Identify core functions (e.g. a tightly-coupled architecture [MEO98], MLC++)
  – Elicit common DBMS requirements for various data mining tasks
  – Integration with relational databases and/or multi-dimensional databases
• Data/knowledge visualization
• Extended query languages or extended CLIs : e.g. DMQL
• And so on ...

References
[Data Mining General]
[FRW91] W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge Discovery in Databases : An Overview”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. J. Frawley Ed., AAAI Press, 1991, pp. 1-27
[AGR93a] R. Agrawal, T. Imielinski and A. Swami, “Database Mining : A Performance Perspective”, IEEE Trans. on Knowledge and Data Engineering, Vol. 5, No. 6, 1993, pp. 914-925
[MAT93] C. J. Matheus, P. Chan and G. Piatetsky-Shapiro, “Systems for Knowledge Discovery in Databases”, IEEE TKDE, Vol. 5, No. 6, 1993, pp. 903-913
[HOL94a] M. Holsheimer and A. Siebes, “Data Mining : The Search for Knowledge in Databases”, Report CSR9406, ISSN 0169-118X, CWI (Centrum voor Wiskunde en Informatica), The Netherlands, 1994
[Association Rules]
[AGR93b] R. Agrawal, T. Imielinski and A. Swami, “Mining Associations between Sets of Items in Massive Databases”, Proc. ACM SIGMOD, Washington D.C., May 1993
[AGR94] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases”, Proc. VLDB, Santiago, Sep. 1994, pp. 487-499
[KLE94] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. Verkamo, “Finding Interesting Rules from Large Sets of Discovered Association Rules”, Proc. CIKM, Gaithersburg, Nov. 1994, pp. 401-407
[HOT95] M. Houtsma and A. Swami, “Set-Oriented Mining for Association Rules in Relational Databases”, Proc. ICDE, Taipei, Mar. 1995, pp. 25-33
[SAV95] A. Savasere, E. Omiecinski and S. Navathe, “An Efficient Algorithm for Mining Association Rules in Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 432-444
[SRI95] R. Srikant and R. Agrawal, “Mining Generalized Association Rules”, Proc. VLDB, Zurich, Sep. 1995, pp. 407-419
[HAN95] J. Han and Y. Fu, “Discovery of Multiple-level Association Rules from Large Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 420-431
[PAR95a] J.-S. Park, M.-S. Chen and P. S. Yu, “An Effective Hash Based Algorithm for Mining Association Rules”, Proc. SIGMOD, 1995, pp. 175-186
[PAR95b] J.-S. Park, M.-S. Chen and P. S. Yu, “Efficient Parallel Data Mining for Association Rules”, Proc. CIKM, 1995
[SRI96] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 1-12
[FUK96] T. Fukuda, Y. Morimoto, S. Morishita and T. Tokuyama, “Data Mining Using Two-Dimensional Optimized Association Rules : Scheme, Algorithms, and Visualization”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 13-23
[CHE96] D. Cheung, J. Han, V. Ng and C. Wong, “Maintenance of Discovered Association Rules in Large Databases : An Incremental Updating Technique”, Proc. ICDE, New Orleans, Feb. 1996, pp. 106-114
[BRI97a] S. Brin, R. Motwani, J. Ullman and S. Tsur, “Dynamic Itemset Counting and Implication Rules for Market Basket Data”, Proc. SIGMOD, 1997, pp. 255-264
[BRI97b] S. Brin, R. Motwani and C. Silverstein, “Beyond Market Baskets : Generalizing Association Rules to Correlations”, Proc. SIGMOD, 1997, pp. 265-276
[HAN97] E. H. Han, G. Karypis and V. Kumar, “Scalable Parallel Data Mining for Association Rules”, Proc. SIGMOD, 1997, pp. 277-288
[AGG98] C. C. Aggarwal and P. S. Yu, “Online Generation of Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 402-411
[OZD98] B. Özden, S. Ramaswamy and A. Silberschatz, “Cyclic Association Rules”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 412-423
[LIN98] J.-L. Lin and M. H. Dunham, “Mining Association Rules : Anti-Skew Algorithms”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 486-493
[SAV98] A. Savasere, E. Omiecinski and S. Navathe, “Mining for Strong Negative Associations in a Large Database of Customer Transactions”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 494-502
[RAS98] R. Rastogi and K. Shim, “Mining Optimized Association Rules with Categorical and Numeric Attributes”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 503-513
[Characterization]
[HAN91] Y. Cai, N. Cercone and J. Han, “Attribute-Oriented Induction in Relational Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 213-228
[HAN92a] J. Han, Y. Cai and N. Cercone, “Knowledge Discovery in Databases : An Attribute-Oriented Approach”, Proc. VLDB, 1992, pp. 547-559
[HAN92b] J. Han, Y. Cai, N. Cercone and Y. Huang, “DBLEARN : A Knowledge Discovery System for Large Databases”, Proc. CIKM, 1992, pp. 473-481
[HAN93] J. Han, Y. Cai and N. Cercone, “Data-Driven Discovery of Quantitative Rules in Relational Databases”, IEEE TKDE, Vol. 5, No. 1, Feb. 1993, pp. 29-40
[LEE94] D.-H. Lee and M. H. Kim, “Discovering Database Summaries through Refinements of Fuzzy Hypotheses”, Proc. ICDE, Houston, Feb. 1994, pp. 223-230
[LEE97] D.-H. Lee and M. H. Kim, “Database Summarization Using Fuzzy ISA Hierarchies”, IEEE Trans. on Systems, Man and Cybernetics, Vol. 27, No. 4, Aug. 1997, pp. 671-680
[Sequential Patterns]
[AGR93c] R. Agrawal, C. Faloutsos and A. Swami, “Efficient Similarity Search in Sequence Databases”, Proc. of the 4th Int’l Conf. on Foundations of Data Organization and Algorithms, Chicago, Oct. 1993
[FAL94] C. Faloutsos, M. Ranganathan and Y. Manolopoulos, “Fast Subsequence Matching in Time-Series Databases”, Proc. SIGMOD, Minneapolis, May 1994, pp. 419-429
[AGR95a] R. Agrawal and R. Srikant, “Mining Sequential Patterns”, Proc. ICDE, Taipei, Mar. 1995, pp. 3-14
[AGR95b] R. Agrawal, K. Lin, H. Sawhney and K. Shim, “Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases”, Proc. VLDB, Zurich, Sep. 1995, pp. 490-501
[AGR95c] R. Agrawal, G. Psaila, E. Wimmers and M. Zait, “Querying Shapes of Histories”, Proc. VLDB, Zurich, Sep. 1995, pp. 502-514
[HAT96] K. Hatonen, M. Klemettinen, H. Mannila, P. Ronkainen and H. Toivonen, “Knowledge Discovery from Telecommunication Network Alarm Databases”, Proc. ICDE, New Orleans, Feb. 1996, pp. 115-123
[SHA96] H. Shatkay and S. Zdonik, “Approximate Queries and Representations for Large Data Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 536-545
[LI96] C. Li, P. Yu and V. Castelli, “HierarchyScan : A Hierarchical Similarity Search Algorithm for Databases of Long Sequences”, Proc. ICDE, New Orleans, Feb. 1996, pp. 546-555
[CHE96] M.-S. Chen, J. S. Park and P. S. Yu, “Data Mining for Path Traversal Patterns in a Web Environment”, Proc. ICDCS, 1996, pp. 385-392
[SHA97] J. Shafer and R. Agrawal, “Parallel Algorithms for High-Dimensional Proximity Joins”, Proc. VLDB, 1997, pp. 176-185
[Classification/Clustering]
[QUI89] J. Quinlan and R. Rivest, “Inferring Decision Trees Using the Minimum Description Length Principle”, Information and Computation, Vol. 80, 1989, pp. 227-248
[YAS91] R. Yasdi, “Learning Classification Rules from Database in the Context of Knowledge Acquisition and Representation”, IEEE TKDE, Vol. 3, No. 3, Sep. 1991, pp. 293-306
[CHA91] K. Chan and A. Wong, “A Statistical Technique for Extracting Classificatory Knowledge from Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 107-123
[UTH91] R. Uthurusamy, U. Fayyad and S. Spangler, “Learning Useful Rules from Inconclusive Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 141-157
[ZIA91] W. Ziarko, “The Discovery, Analysis and Representation of Data Dependencies in Databases”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 195-209
[PIA91] G. Piatetsky-Shapiro, “Discovery, Analysis and Presentation of Strong Rules”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 229-248
[MAN91] M. Manago and Y. Kodratoff, “Induction of Decision Trees from Complex Structured Data”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 289-306
[SMY92] P. Smyth and R. Goodman, “An Information Theoretic Approach to Rule Induction from Databases”, IEEE TKDE, Vol. 4, No. 4, Aug. 1992, pp. 301-316
[WAN92] L. Wang and J. Mendel, “Generating Fuzzy Rules by Learning from Examples”, IEEE TSMC, Vol. 22, No. 6, Nov. 1992, pp. 1414-1427
[AGR92] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer and A. Swami, “An Interval Classifier for Database Mining Applications”, Proc. VLDB, Vancouver, Aug. 1992, pp. 207-216
[LU95] H. Lu, R. Setiono and H. Liu, “NeuroRule : A Connectionist Approach to Data Mining”, Proc. VLDB, Zurich, Sep. 1995, pp. 478-489
[HON91] J. Hong and C. Mao, “Incremental Discovery of Rules and Structure by Hierarchical and Parallel Clustering”, Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W. Frawley Ed., AAAI Press, 1991, pp. 177-194
[NG94] R. Ng and J. Han, “Efficient and Effective Clustering Methods for Spatial Data Mining”, Proc. VLDB, 1994, pp. 144-155
[XU98] X. Xu, M. Ester, H.-P. Kriegel and J. Sander, “A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 324-333
[System Implementations]
[SEL96] P. Selfridge, D. Srivastava and L. Wilson, “IDEA : Interactive Data Exploration and Analysis”, Proc. SIGMOD, Quebec, Jun. 1996, pp. 24-34
[MEO98] R. Meo, G. Psaila and S. Ceri, “A Tightly-Coupled Architecture for Data Mining”, Proc. Int’l Conf. on Data Engineering, 1998, pp. 316-323
[HAN96] J. Han et al., “DBMiner : A System for Mining Knowledge in Large Relational Databases”, Proc. KDD, 1996
[HAN97] J. Han et al., “GEOMiner : A System Prototype for Spatial Data Mining”, Proc. SIGMOD, 1997
[HAN98] “WebMiner : A Resource and Knowledge Discovery System for the Internet”, http://db.cs.sfu.ca/WebMiner/
[KOH96] R. Kohavi et al., “Data Mining Using MLC++ : A Machine Learning Library in C++”, Proc. Tools with AI, 1996, pp. 234-245
[HAL98] C. Hall ed., “MineSet 2.0 for Data Mining and Multidimensional Data Analysis”, http://www.cgi.com/Products/software/MineSet/DMStrategies/index.html