Discussion (KDD 2 of 3): Feature Selection for KDD Lecture 7


Lecture 7
Discussion (KDD 2 of 3):
Feature Selection for KDD
Tuesday, December 7, 1999
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings:
Paper #2: Liu and Motoda, Chapter 3
CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Lecture Outline
• Readings: Liu and Motoda
  – Feature Selection for Knowledge Discovery and Data Mining
  – Chapter 3: “Feature selection aspects”
• What is Feature Selection?
• Generation Scheme
  – How do we generate subsets?
  – Forward, backward, bidirectional, random, opportunistic
• Evaluation Measure
  – How do we tell how good a candidate subset is?
  – Accuracy, consistency, scores (information gain, cross entropy, variance, etc.)
• Search Strategy
  – How do we systematically search for a good subset?
  – Blind (uninformed) search
  – Heuristic (informed) search
• Next Class: Presentation
What is Feature Selection?
• Problem: Choosing Inputs x for Supervised Learning
• Applications
  – Concept learning for monitoring
  – Extraction of temporal features
  – Sensor and data fusion
• Solutions
  – Decomposition of spatiotemporal data
    • Attribute-driven problem redefinition
    • Constructive induction
  – Model selection
• Approach
  – Hierarchy of temporal submodels
  – Probabilistic subnetworks
    • ANNs
    • Bayesian networks
  – Quantitative (metric-based) model selection
[Figure: constructive induction pipeline: (x, y) → attribute partitioning / feature construction → (x1’, …, xn’) → cluster definition → ((x1’, y1’), …, (xn’, yn’))]
Issues: Generation Scheme and Evaluation Measure
• Generation Scheme
  – Directed subset construction
    • Forward – start with Ø and grow until U(S) is “high enough”
    • Backward – start with S and shrink while U(S) is still “high enough”
    • Bidirectional – “meet in the middle” (S, F boundaries)
  – Random – iterative improvement (cf. simulated annealing) using F
  – Opportunistic – prior knowledge guides generation (compare: heuristic search)
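The forward scheme above can be sketched in a few lines. This is a minimal illustration, assuming a caller-supplied utility function U(S) that scores a candidate subset; names and the greedy tie-breaking are illustrative, not from the lecture.

```python
# A minimal sketch of sequential forward selection: start with the empty
# set and greedily grow it until the utility U(S) is "high enough" or no
# single feature improves the score.
def forward_select(features, utility, threshold):
    selected = set()
    remaining = set(features)
    while remaining:
        # Pick the single feature whose addition maximizes utility.
        best = max(remaining, key=lambda f: utility(selected | {f}))
        if utility(selected | {best}) <= utility(selected):
            break  # no remaining feature improves U(S)
        selected.add(best)
        remaining.remove(best)
        if utility(selected) >= threshold:
            break  # U(S) is high enough
    return selected
```

Backward elimination is the mirror image: start from the full set S and drop the feature whose removal hurts U(S) least, while the score stays acceptable.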
• Evaluation Measure
  – Correlation (MAXR): r(xi, y) = Cov(xi, y) / sqrt(Var(xi) · Var(y))
  – Accuracy
  – Consistency
  – Classical scores
    • Information gain
    • Cross entropy
    • Variance
    • Many others (Gini coefficient, dependence)
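The correlation score above can be computed directly from its definition. A small self-contained sketch:

```python
# Correlation evaluation measure: r(xi, y) = Cov(xi, y) /
# sqrt(Var(xi) * Var(y)), used to rank an individual feature xi
# against the target y.
import math

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)
```

A perfectly linearly related feature scores ±1; an irrelevant one scores near 0, so ranking features by |r| gives a simple filter-style measure.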
Search: Subset Inclusion State Space
• Poset relation: set inclusion; A ⊇ B = “B is a subset of A”
• “Up” operator: DELETE; “Down” operator: ADD
[Figure: subset inclusion lattice over four features, from {} (bit vector 0,0,0,0) through the singletons {1}, {2}, {3}, {4}, the pairs {1,2}, …, {3,4}, and the triples {1,2,3}, …, {2,3,4}, up to {1,2,3,4} (1,1,1,1)]
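The state space above is easy to make concrete: a state is a feature subset, and the ADD/DELETE operators generate its neighbors in the lattice. A minimal sketch (function names are illustrative):

```python
# States in the subset-inclusion lattice are feature subsets; ADD moves
# "down" (toward the full set), DELETE moves "up" (toward the empty set).

def add_successors(state, n):
    """ADD operator: all states reachable by including one more feature."""
    return [state | {f} for f in range(1, n + 1) if f not in state]

def delete_successors(state):
    """DELETE operator: all states reachable by dropping one feature."""
    return [state - {f} for f in state]
```

Forward selection searches this lattice downward from {} with ADD; backward elimination searches upward from the full set with DELETE; bidirectional schemes run both at once.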
Feature Selection and Construction as
Unsupervised Learning
• Unsupervised Learning in Support of Supervised Learning
  – Given: D ≡ labeled vectors (x, y)
  – Return: D’ ≡ new training examples (x’, y’)
  – Constructive induction: transformation step in KDD
    • Feature “construction”: generic term
    • Cluster definition
• Feature Construction: Front End
  – Synthesizing new attributes
    • Logical: x1 ∧ ¬x2; arithmetic: x1 + x5 / x2
    • Other synthetic attributes: f(x1, x2, …, xn), etc.
  – Dimensionality-reducing projection, feature extraction
  – Subset selection: finding relevant attributes for a given target y
  – Partitioning: finding relevant attributes for given targets y1, y2, …, yp
• Cluster Definition: Back End
  – Form, segment, and label clusters to get intermediate targets y’
  – Change of representation: find good (x’, y’) for learning target y
[Figure: pipeline: (x, y) → constructive induction (feature/attribute construction and partitioning) → x’ = (x1’, …, xp’) → cluster definition → (x’, y’) or ((x1’, y1’), …, (xp’, yp’))]
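The front-end step can be sketched as a plain mapping from an input vector to synthetic attributes. The attribute names and the particular combinations below are illustrative assumptions, not the lecture’s examples:

```python
# A small sketch of front-end feature construction: synthesizing logical
# and arithmetic attributes from the original inputs. The specific
# combinations chosen here are illustrative only.
def construct_features(row):
    """Map an input vector x to a dict of synthetic attributes x'."""
    x1, x2, x3 = row
    return {
        "x1_and_not_x2": bool(x1) and not bool(x2),          # logical
        "x1_plus_x3_over_x2": x1 + x3 / x2 if x2 else None,  # arithmetic
    }
```

In a full KDD pipeline this mapping is applied to every row of D, and the back end then clusters the transformed vectors to define intermediate targets y’.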
Wrappers for Performance Enhancement
• Wrappers
  – “Outer loops” for improving inducers
  – Use inducer performance to optimize
• Applications of Wrappers
  – Combining knowledge sources
    • Committee machines (static): bagging, stacking, boosting
    • Other sensor and data fusion
  – Tuning hyperparameters
    • Number of ANN hidden units
    • GA control parameters
    • Priors in Bayesian learning
  – Constructive induction
    • Attribute (feature) subset selection
    • Feature construction
• Implementing Wrappers
  – Search [Kohavi, 1995]
  – Genetic algorithm
[Figure: decision support system: heterogeneous data (multiple sources) feeds supervised and unsupervised single-task and task-specific model selection; new learning problems are defined by reduction of inputs (relevant inputs, single objective), subdivision of inputs (relevant inputs, multiple objectives), and decomposition methods]
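The wrapper idea, an outer loop scored by the inducer’s own performance, can be sketched without any library. The 1-nearest-neighbor inducer and leave-one-out scoring here are illustrative stand-ins, not the lecture’s choices:

```python
# A minimal wrapper for feature subset selection: exhaustively score each
# candidate subset by the inducer's own held-out accuracy (here, 1-NN
# with leave-one-out evaluation as a toy inducer/performance element).
from itertools import combinations

def loo_accuracy_1nn(X, y, subset):
    """Leave-one-out accuracy of 1-NN restricted to the given features."""
    correct = 0
    for i in range(len(X)):
        dists = [
            (sum((X[i][f] - X[j][f]) ** 2 for f in subset), j)
            for j in range(len(X)) if j != i
        ]
        _, nearest = min(dists)
        correct += (y[nearest] == y[i])
    return correct / len(X)

def wrapper_select(X, y, max_size):
    """Evaluate all subsets up to max_size; keep the best-scoring one."""
    n_features = len(X[0])
    candidates = [
        s for k in range(1, max_size + 1)
        for s in combinations(range(n_features), k)
    ]
    return max(candidates, key=lambda s: loo_accuracy_1nn(X, y, s))
```

Exhaustive enumeration is exponential in the number of features, which is why practical wrappers replace it with the forward/backward/bidirectional generation schemes or a genetic algorithm.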
Supervised Learning Framework
[Figure: attribute selection and partitioning: a multiattribute data set x is divided by subproblem definition into subproblems (x1’, y1’), …, (xn’, yn’); a partition evaluator and metric-based model selection assign a (learning architecture, learning method) pair to each subproblem as its learning specification; per-subproblem predictions are combined by data fusion into the overall prediction]
Case Study:
Automobile Insurance Risk Analysis
Terminology
• Supervised Learning
  – Inducer
    • Supervised inductive learning framework (L, H)
    • L – learning algorithm, H – hypothesis space (language)
  – Relevance determination – finding inputs that are important to the performance element (e.g., regression or classification)
• Feature Selection
  – Related terms: feature, attribute, variable
  – Definition: problem of determining, for a given inducer, which subset of attributes is relevant
  – Related problems: feature extraction, construction (synthesis), partitioning
• Methods for Feature Selection
  – Feature ranking
  – Subset selection: minimum subset (Min-Set)
  – Set generation (regression): sequential forward (forward selection), sequential backward (backward elimination), bidirectional, random
  – Search strategies: uninformed, informed
  – Filters vs. wrappers
Summary Points
• Feature Selection and Knowledge Discovery in Databases (KDD)
  – Virtuous cycle of data mining: iterative refinement
  – Feedback from supervised learning
• Role of Feature Selection in Data Mining
  – Relevance determination
  – Methodologies
    • Filters vs. wrappers
    • Generation scheme, evaluation measure, search strategy
• Resources Online
  – MLC++
    • FSS wrapper
    • Many inducers, including ID3, OC1
    • http://www.sgi.com/Technology/mlc
  – Jenesis
    • Part of NCSA D2K: http://lorax.ncsa.uiuc.edu
    • KSU KDD Group: http://www.kddresearch.org/Info
  – C4.5 / C5.0: http://www.rulequest.com