Extending Propositional Satisfiability to Determine Minimal Fuzzy

Download Report

Transcript Extending Propositional Satisfiability to Determine Minimal Fuzzy

Extending Propositional Satisfiability to
Determine Minimal Fuzzy-Rough
Reducts
Richard Jensen
Aberystwyth University, UK
Andrew Tuson
City University, UK
Qiang Shen
Aberystwyth University, UK
Richard Jensen, Andrew Tuson and Qiang Shen
Outline
• The importance of feature selection
• Rough set theory
• Fuzzy-rough feature selection (FRFS)
• FRFS-SAT
• Experimentation
• Conclusion
Richard Jensen, Andrew Tuson and Qiang Shen
Feature selection
• Why dimensionality reduction/feature selection?
High dimensional
data
Dimensionality
Reduction
Intractable
Low dimensional
data
Processing System
• Growth of information - need to manage this effectively
• Curse of dimensionality - a problem for machine learning
Richard Jensen, Andrew Tuson and Qiang Shen
Rough set theory
Upper
Approximation
Set A
Lower
Approximation
Equivalence
class Rx
Rx is the set of all points that are indiscernible
with point x in terms of feature subset B
Richard Jensen, Andrew Tuson and Qiang Shen
Discernibility approach
• Decision-relative discernibility matrix
• Compare objects
• Examine attribute values
• For attributes that differ:
cij  {a  C | a( xi )  a( x j ), d ( xi )  d ( x j )}
• If decision values differ, include attributes in matrix
• Else leave slot blank
• Construct discernibility function:
fC(a1,...,am)  {(cij ) :1  j  i | U |, cij  }
Richard Jensen, Andrew Tuson and Qiang Shen
Example
• Remove duplicates
fC(a,b,c,d) =
{a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀
{b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀
{a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d}
• Remove supersets
fC(a,b,c,d) = {b ⋁ c} ⋀ {d}
Richard Jensen, Andrew Tuson and Qiang Shen
Finding reducts
• Usually too expensive to search exhaustively for
reducts with minimal cardinality
• Reducts found through:
• Converting from CNF to DNF (expensive)
• Hill-climbing search using clauses (non-optimal)
• Other search methods - GAs etc (non-optimal)
• RSAR-SAT
• Solve directly in SAT formulation.
• DPLL approach is both fast and ensures optimal reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Fuzzy discernibility matrices
• Extension of crisp approach
• Previously, attributes had {0,1} membership to clauses
• Now have membership in [0,1]
• Allows real-coded data as well as nominal.
• Fuzzy DMs can be used to find fuzzy-rough
reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Formulation
• Fuzzy satisfiability
• In crisp SAT, a clause is fully satisfied if at least
one variable in the clause has been set to true
• For the fuzzy case, clauses may be satisfied to a
certain degree depending on which variables
have been assigned the value true
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: setup
• 9 benchmark datasets
• Features – 10 to 39
• Objects – 120 to 690
• Methods used:
• FRFS-SAT
• Greedy hill-climbing: fuzzy dependency, fuzzy boundary region and
fuzzy discernibility.
• Evolutionary algorithms: genetic algorithms (GA) and particle swarm
optimization (PSO) using fuzzy dependency
• 10x10-fold cross validation
• FS performed on the training folds, test folds reduced using
discovered reducts
Richard Jensen, Andrew Tuson and Qiang Shen
Experimentation: results
Richard Jensen, Andrew Tuson and Qiang Shen
Conclusion
• Extended propositional satisfiability to enable
search for fuzzy-rough reducts
• New framework for fuzzy satisfiability
• New DPLL algorithm
• Fuzzy clause simplification
• Future work:
•
•
•
•
Non-chronological backtracking
Better heuristics
Unsupervised FS
Other extensions in propositional satisfiability
Richard Jensen, Andrew Tuson and Qiang Shen
• WEKA implementations of all fuzzy-rough
feature selectors and classifiers can be
downloaded from:
Richard Jensen, Andrew Tuson and Qiang Shen
Feature selection
• Feature selection (FS) is a DR technique that
preserves data semantics (meaning of data)
Generation
Evaluation
Stopping
Criterion
Validation
• Subset generation: forwards, backwards, random…
• Evaluation function: determines ‘goodness’ of subsets
• Stopping criterion: decide when to stop subset search
Richard Jensen, Andrew Tuson and Qiang Shen
Algorithm
Richard Jensen, Andrew Tuson and Qiang Shen
Example
Richard Jensen, Andrew Tuson and Qiang Shen