
Lecture 45: Course Review and Future Research Directions

Friday, May 5, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu

Readings: Chapters 1-10, 13, Mitchell; Chapters 14-21, Russell and Norvig

CIS 830: Advanced Topics in Artificial Intelligence
Kansas State University Department of Computing and Information Sciences

Main Themes: Artificial Intelligence and KDD

• Analytical Learning: Combining Symbolic and Numerical AI
  – Inductive learning
  – Role of knowledge and deduction in integrated inductive and analytical learning
• Artificial Neural Networks (ANNs) for KDD
  – Common neural representations: current limitations
  – Incorporating knowledge into ANN learning
• Uncertain Reasoning in Decision Support
  – Probabilistic knowledge representation
  – Bayesian knowledge and data engineering (KDE): elicitation, causality
• Data mining: KDD applications
  – Role of causality and explanations in KDD
  – Framework for data mining: wrappers for performance enhancement
• Genetic Algorithms (GAs) for KDD
  – Evolutionary algorithms (GAs, GP) as optimization wrappers
  – Introduction to classifier systems

Class 0: A Brief Overview of Machine Learning

• Overview: Topics, Applications, Motivation
• Learning = Improving with Experience at Some Task
  – Improve over task T
  – with respect to performance measure P
  – based on experience E
• Brief Tour of Machine Learning
  – A case study
  – A taxonomy of learning
  – Intelligent systems engineering: specification of learning problems
• Issues in Machine Learning
  – Design choices
  – The performance element: intelligent systems
• Some Applications of Learning
  – Database mining, reasoning (inference/decision support), acting
  – Industrial usage of intelligent systems

Class 1: Integrating Analytical and Inductive Learning

• Learning Specification (Inductive, Analytical)
  – Instances X, target function (concept) c: X → H, hypothesis space H
  – Training examples D: positive, negative examples of target function c
  – Analytical learning: also given domain theory T for explaining examples
• Domain Theories
  – Expressed in formal language: propositional logic, predicate logic
  – Set of assertions (e.g., well-formed formulae) for reasoning about domain
    • Expresses constraints over relations (predicates) within model
    • Example: Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
• Determine
  – Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
  – Such h are consistent with training data and domain theory T
• Integration Approaches
  – Explanation (proof and derivation)-based learning: EBL
  – Pseudo-experience: incorporating knowledge of environment, actuators
  – Top-down decomposition: programmatic (procedural) knowledge, advice
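The recursive Ancestor rule in the domain-theory example above can be exercised directly. A minimal Python sketch, with an invented parent relation (the names "tom", "ann", "sue" are hypothetical) and an explicit base case Ancestor(x, y) ← Parent(x, y) that the slide leaves implicit:

```python
# Domain theory as rules:
#   Ancestor(x, y) <- Parent(x, y)                     (base case, assumed)
#   Ancestor(x, y) <- Parent(x, z) ^ Ancestor(z, y)    (rule from the slide)
parents = {("tom", "ann"), ("ann", "sue")}  # invented Parent relation

def ancestor(x, y):
    if (x, y) in parents:          # base case: x is a direct parent of y
        return True
    # recursive case: some child z of x is an ancestor of y
    return any(ancestor(z, y) for (p, z) in parents if p == x)

print(ancestor("tom", "sue"))  # True: tom -> ann -> sue
print(ancestor("sue", "tom"))  # False
```

Each call unwinds one application of the rule, mirroring the proof (explanation) structure that EBL generalizes.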

Classes 2-3: Explanation-Based Neural Networks

• Paper
  – Topic: Explanation-Based and Inductive Learning in ANNs
  – Title: Integrating Inductive Neural Network Learning and EBL
  – Authors: Thrun and Mitchell
  – Presenter: William Hsu
• Key Strengths
  – Idea: (state, action)-to-state mappings as steps in generalizable proof (explanation) for observed episode
  – Generalizable approach (significant for RL, other learning-to-predict inducers)
• Key Weaknesses
  – Other numerical learning models (HMMs, DBNs) may be more suited to EBG
  – Tradeoff: domain theory of EBNN lacks semantic clarity of symbolic EBL
• Future Research Issues
  – How to get the best of both worlds (clear DT, ability to generate explanations)?
  – Applications: to explanation in commercial, military, legal decision support
  – See work by: Thrun, Mitchell, Shavlik, Towell, Pearl, Heckerman

Classes 4-5: Phantom Induction

• Paper
  – Topic: Distal Supervised Learning and Phantom Induction
  – Title: Iterated Phantom Induction: A Little Knowledge Can Go a Long Way
  – Authors: Brodie and DeJong
  – Presenter: Steve Gustafson
• Key Strengths
  – Idea: apply knowledge to generate (pseudo-experiential) training data
  – Speedup: learning curve significantly shortened with respect to RL by application of a “small amount” of knowledge
• Key Weaknesses
  – Haven’t yet seen how to produce plausible, comprehensible explanations
  – How much knowledge is “a small amount”? (How to measure?)
• Future Research Issues
  – Control, planning domains similar (but not identical) to robot games
  – Applications: adaptive (e.g., ANN, BBN, MDP, GA) agent control, planning
  – See work by: Brodie, DeJong, Rumelhart, McClelland, Sutton, Barto

Classes 6-7: Top-Down Hybrid Learning

• Paper
  – Topic: Learning with Prior Knowledge
  – Title: A Divide-and-Conquer Approach to Learning from Prior Knowledge
  – Authors: Chown and Dietterich
  – Presenter: Aiming Wu
• Key Strengths
  – Idea: apply programmatic (procedural) knowledge to select training data
  – Uses simulation to boost inductive learning performance (cf. model checking)
  – Divide-and-conquer approach (multiple experts)
• Key Weaknesses
  – Doesn’t illustrate form, structure of programmatic knowledge clearly
  – Doesn’t systematize and formalize model checking / simulation approach
• Future Research Issues
  – Model checking and simulation-driven hybrid learning
  – Applications: “consensus under uncertainty”, simulation-based optimization
  – See work by: Dietterich, Frawley, Mitchell, Darwiche, Pearl

Classes 8-9: Learning Using Prior Knowledge

• Paper
  – Topic: Refinement of Approximate Domain-Theoretic Knowledge
  – Title: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks
  – Authors: Towell, Shavlik, and Noordewier
  – Presenter: Li-Jun Wang
• Key Strengths
  – Idea: build relational explanations; compile into ANN representation
  – Applies structural, functional, constraint-based knowledge
  – Uses ANN to further refine domain theory
• Key Weaknesses
  – Can’t get refined domain theory back!
  – Explanations also no longer clear after “compilation” (transformation) process
• Future Research Issues
  – How to retain semantic clarity of explanations, DT, knowledge representation
  – Applications: intelligent filters (e.g., fraud detection), decision support
  – See work by: Shavlik, Towell, Maclin, Sun, Schwalb, Heckerman

Class 10: Introduction to Artificial Neural Networks

• Architectures
  – Nonlinear transfer functions
  – Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
  – Hidden layer representations
• Backpropagation of Error
  – The backpropagation algorithm
    • Relation to error gradient function for nonlinear units
    • Derivation of training rule for feedforward multi-layer networks
  – Training issues: local optima, overfitting
• References: Chapter 4, Mitchell; Chapter 4, Bishop; Rumelhart et al.
• Research Issues: How to…
  – Learn from observation, rewards and penalties, and advice
  – Distribute rewards and penalties through learning model, over time
  – Generate pseudo-experiential training instances in pattern recognition
  – Partition learning problems on the fly, via (mixture) parameter estimation
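As a concrete companion to the backpropagation summary above, here is a minimal sketch of the training rule for a feedforward network of sigmoid units, trained by gradient descent on squared error over XOR. This is illustrative only, not from the lecture; the layer sizes, random seed, and learning rate are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    return np.hstack([a, np.ones((a.shape[0], 1))])  # append constant bias input

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR targets

W1 = rng.normal(size=(3, 4))   # (2 inputs + bias) -> 4 hidden sigmoid units
W2 = rng.normal(size=(5, 1))   # (4 hidden + bias) -> 1 output sigmoid unit
eta = 0.5                      # learning rate (arbitrary)

def forward():
    h = sigmoid(add_bias(X) @ W1)
    o = sigmoid(add_bias(h) @ W2)
    return h, o

before = float(np.mean((forward()[1] - y) ** 2))
for _ in range(5000):
    h, o = forward()
    delta_o = (o - y) * o * (1 - o)                # output-layer error term
    delta_h = (delta_o @ W2[:-1].T) * h * (1 - h)  # error backpropagated to hidden layer
    W2 -= eta * add_bias(h).T @ delta_o            # gradient-descent weight updates
    W1 -= eta * add_bias(X).T @ delta_h
after = float(np.mean((forward()[1] - y) ** 2))
# after < before: squared error decreases with training
```

The two `delta` terms are the per-layer error signals from the backpropagation derivation; everything else is bookkeeping.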

Classes 11-12: Reinforcement Learning and Advice

• Paper
  – Topic: Knowledge and Reinforcement Learning in Intelligent Agents
  – Title: Incorporating Advice into Agents that Learn from Reinforcements
  – Authors: Maclin and Shavlik
  – Presenter: Kiranmai Nandivada
• Key Strengths
  – Idea: compile advice into ANN representation for RL
  – Advice expressed in terms of constraint-based knowledge
  – Like KBANN, achieves knowledge refinement through ANN training
• Key Weaknesses
  – Like KBANN, lose semantic clarity of advice, policy, explanations
  – How to evaluate “refinement” effectively? Quantitatively? Logically?
• Future Research Issues
  – How to retain semantic clarity of explanations, DT, knowledge representation
  – Applications: intelligent agents, web mining (spiders, search engines), games
  – See work by: Shavlik, Maclin, Stone, Veloso, Sun, Sutton, Pearl, Kuipers

Classes 13-14: Reinforcement Learning Over Time

• Paper
  – Topic: Temporal-Difference Reinforcement Learning
  – Title: TD Models: Modeling the World at a Mixture of Time Scales
  – Author: Sutton
  – Presenter: Vrushali Koranne
• Key Strengths
  – Idea: combine state-action evaluation function (Q) estimates over multiple time steps of lookahead
  – Effective temporal credit assignment (TCA)
  – Biologically plausible (simulates TCA aspects of dopaminergic system)
• Key Weaknesses
  – TCA methodology is effective but semantically hard to comprehend
  – Slow convergence: can knowledge help? How will we judge?
• Future Research Issues
  – How to retain clarity, improve convergence speed, of multi-time RL models
  – Applications: control systems, robotics, game playing
  – See work by: Sutton, Barto, Mitchell, Kaelbling, Smyth, Shafer, Goldberg
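The temporal credit assignment idea can be made concrete with the standard TD(λ) eligibility-trace update. This sketch runs it on a 5-state random-walk task invented here for illustration (it is not the paper's multi-time-scale model); all parameter values are arbitrary choices.

```python
import random

# TD(lambda) value estimation on a random walk over states 0..4.
# Episodes start in the middle; each step moves left or right uniformly.
# Exiting on the right yields reward 1; exiting on the left yields 0.
n_states = 5
alpha, gamma, lam = 0.1, 1.0, 0.8   # step size, discount, trace decay
V = [0.0] * n_states
random.seed(0)

for _ in range(2000):
    s = n_states // 2
    e = [0.0] * n_states            # eligibility traces
    while True:
        s2 = s + random.choice([-1, 1])
        done = s2 < 0 or s2 >= n_states
        r = 1.0 if s2 >= n_states else 0.0
        target = r if done else r + gamma * V[s2]
        delta = target - V[s]       # TD error for this transition
        e[s] += 1.0                 # accumulate trace for the visited state
        for i in range(n_states):   # credit the error back along the trace
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam     # decay all traces
        if done:
            break
        s = s2
# V should roughly approach (i + 1) / 6 for state i.
```

The trace vector `e` is what spreads one step's TD error over earlier states, which is exactly the temporal credit assignment the slide refers to.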

Classes 15-16: Generative Neural Models

• Paper
  – Topic: Pattern Recognition using Unsupervised ANNs
  – Title: The Wake-Sleep Algorithm for Unsupervised Neural Networks
  – Authors: Hinton, Dayan, Frey, and Neal
  – Presenter: Prasanna Jayaraman
• Key Strengths
  – Idea: use two-phase algorithm to generate training instances (“dream” stage) and maximize conditional probability of data given model (“wake” stage)
  – Compare: expectation-maximization (EM) algorithm
  – Good for image recognition
• Key Weaknesses
  – Not all data admits this approach (small samples, ill-defined features)
  – Not immediately clear how to use for problem-solving performance elements
• Future Research Issues
  – Studying information theoretic properties of Helmholtz machine
  – Applications: image/speech/signal recognition, document categorization
  – See work by: Hinton, Dayan, Frey, Neal, Kirkpatrick, Hajek, Gharahmani

Classes 17-18: Modularity in Neural Systems

• Paper
  – Topic: Combining Models using Modular ANNs
  – Title: Modular and Hierarchical Learning Systems
  – Authors: Jordan and Jacobs
  – Presenter: Afrand Agah
• Key Strengths
  – Idea: use interleaved EM update steps to update expert, gating components
  – Effect: forces specialization among ANN components (GLIMs); boosts performance of single experts; very fast convergence in some cases
  – Explores modularity in neural systems (artificial and biological)
• Key Weaknesses
  – Often cannot achieve higher accuracy than ML, MAP, Bayes optimal estimation
  – Doesn’t provide experts that specialize in spatial, temporal pattern recognition
• Future Research Issues
  – Constructing, selecting mixtures of other ANN components (not just GLIMs)
  – Applications: pattern recognition, time series prediction
  – See work by: Jordan, Jacobs, Nowlan, Hinton, Barto, Jaakkola, Hsu

Class 19: Introduction to Probabilistic Reasoning

• Architectures
  – Bayesian (Belief) Networks
    • Tree structured, polytrees
    • General
  – Decision networks
  – Temporal variants (beyond scope of this course)
• Parameter Estimation
  – Maximum likelihood (MLE), maximum a posteriori (MAP)
  – Bayes optimal classification, Bayesian learning
• References: Chapter 6, Mitchell; Chapters 14-15, 19, Russell and Norvig
• Research Issues: How to…
  – Learn from observation, rewards and penalties, and advice
  – Distribute rewards and penalties through learning model, over time
  – Generate pseudo-experiential training instances in pattern recognition
  – Partition learning problems on the fly, via (mixture) parameter estimation
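The MLE/MAP distinction under "Parameter Estimation" above fits in two lines for a Bernoulli (coin-flip) parameter; the observed counts and the Beta(2, 2) prior are arbitrary values chosen for the example.

```python
# MLE vs. MAP estimation of a Bernoulli parameter theta.
heads, tails = 7, 3   # observed data D (invented)
a, b = 2, 2           # Beta(a, b) prior pseudo-counts (invented)

# MLE: argmax_theta P(D | theta) = relative frequency
theta_mle = heads / (heads + tails)                          # 0.7

# MAP: argmax_theta P(theta | D), mode of the Beta posterior
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)    # 8/12 = 0.666...
```

The MAP estimate is pulled toward the prior mean (0.5) relative to the MLE; as the data grow, the two estimates converge.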

Classes 20-21: Approaches to Uncertain Reasoning

• Paper
  – Topic: The Case for Probability
  – Title: In Defense of Probability
  – Author: Cheeseman
  – Presenter: Pallavi Paranjape
• Key Strengths
  – Idea: probability is mathematically sound way to represent uncertainty
  – Views of probability considered: objectivist, frequentist, logicist, subjectivist
  – Argument made for meta-subjectivist belief measure concept of probability
• Key Weaknesses
  – Highly dogmatic view without concrete justification for all assertions
  – Does not quantitatively, empirically compare Bayesian, non-Bayesian methods
• Future Research Issues
  – Integrating symbolic and numerical (statistical) models of uncertainty
  – Applications: uncertain reasoning, pattern recognition, learning
  – See work by: Cheeseman, Cox, Good, Pearl, Zadeh, Dempster, Shafer

Classes 22-23: Learning Bayesian Network Structure

• Paper
  – Topic: Learning Bayesian Networks from Data
  – Title: Learning Bayesian Network Structure from Massive Datasets
  – Authors: Friedman, Pe'er, and Nachman
  – Presenter: Jincheng Gao
• Key Strengths
  – Idea: can use graph constraints, scoring functions to select candidate parents in constructing directed graph model of probability (BBN)
  – Tabu search, greedy score-based methods (K2), etc. also considered
• Key Weaknesses
  – Optimal Bayesian network structure learning still intractable for conventional (single-instruction sequential) architectures
  – More empirical comparison among alternative methods warranted
• Future Research Issues
  – Scaling up to massive real-world data sets (e.g., medical, agricultural, DSS)
  – Applications: diagnosis, troubleshooting, user modeling, intelligent HCI
  – See work by: Friedman, Goldszmidt, Heckerman, Cooper, Beinlich, Koller

Classes 24-25: Bayesian Networks for User Modeling

• Paper
  – Topic: Decision Support Systems and Bayesian User Modeling
  – Title: The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
  – Authors: Horvitz, Breese, Heckerman, Hovel, Rommelse
  – Presenter: Yuhui (Cathy) Liu
• Key Strengths
  – Idea: BBN model is developed from user logs, used to infer mode of usage
  – Can infer goals, skill level of user
• Key Weaknesses
  – Need high accuracy in inferring goals to deliver meaningful content
  – May be better to use next-generation search engine (more interactivity, less passive monitoring)
• Future Research Issues
  – Designing better interactive user modeling
  – Applications: clickstream monitoring, e-commerce, web search, help
  – See work by: Horvitz, Breese, Heckerman, Lee, Huang

Classes 26-27: Causal Reasoning

• Paper
  – Topic: KDD and Causal Reasoning
  – Title: Symbolic Causal Networks for Reasoning about Actions and Plans
  – Authors: Darwiche and Pearl
  – Presenter: Yue Jiao
• Key Strengths
  – Idea: use BBN to represent symbolic constraint knowledge
  – Can use to generate mechanistic explanations
    • Model actions
    • Model sequences of actions (plans)
• Key Weaknesses
  – Integrative methods (numerical, symbolic BBNs) still need exploration
  – Unclear how to incorporate methods for learning to plan
• Future Research Issues
  – Reasoning about systems
  – Applications: uncertain reasoning, pattern recognition, learning
  – See work by: Horvitz, Breese, Heckerman, Lee, Huang

Classes 28-29: Knowledge Discovery from Scientific Data

• Paper
  – Topic: KDD for Scientific Data Analysis
  – Title: KDD for Science Data Analysis: Issues and Examples
  – Authors: Fayyad, Haussler, and Stolorz
  – Presenter: Arulkumar Elumalai
• Key Strengths
  – Idea: investigate how and whether KDD techniques (OLAP, learning) scale up to huge data sets
  – Answer: “it depends” – on computational complexity, many other factors
• Key Weaknesses
  – Haven’t developed clear theory yet of how to assess “how much data is really needed”
  – No technical treatment or characterization of data cleaning
• Future Research Issues
  – Data cleaning (aka data cleansing), pre- and post-processing (OLAP)
  – Applications: intelligent databases, visualization, high-performance CSE
  – See work by: Fayyad, Smyth, Uthurusamy, Haussler, Foster

Classes 30-31: Relevance Determination

• Paper
  – Topic: Relevance Determination in KDD
  – Title: Irrelevant Features and the Subset Selection Problem
  – Authors: John, Kohavi, and Pfleger
  – Presenter: DingBing Yang
• Key Strengths
  – Idea: cast problem of choosing relevant attributes (given “top-level” learning problem specification) as search
  – Effective state space search (A/A*-based) approach demonstrated
• Key Weaknesses
  – May not have good enough heuristics!
  – Can either develop them (via information theory) or use MCMC methods
• Future Research Issues
  – Selecting relevant data channels from continuous sources (e.g., sensors)
  – Applications: bioinformatics (genomics, proteomics, etc.), prognostics
  – See work by: Kohavi, John, Rendell, Donoho, Hsu, Provost

Classes 32-33: Learning for Text Document Categorization

• Paper
  – Topic: Text Documents and Information Retrieval (IR)
  – Title: Hierarchically Classifying Documents using Very Few Words
  – Authors: Koller and Sahami
  – Presenter: Yan Song
• Key Strengths
  – Idea: use rank frequency scoring methods to find “keywords that make a difference”
  – Break into meaningful hierarchy
• Key Weaknesses
  – Sometimes need to derive semantically meaningful cluster labels
  – How to integrate this method with dynamic cluster segmentation, labeling?
• Future Research Issues
  – Bayesian architectures using “non-Bayesian” learning algorithms (e.g., GAs)
  – Applications: digital libraries (hierarchical, distributed dynamic indexing), intelligent search engines, intelligent displays (and help indices)
  – See work by: Koller, Sahami, Roth, Charniak, Brill, Yarowsky

Classes 34-35: Web Mining

• Paper
  – Topic: KDD and The Web
  – Title: Learning to Extract Symbolic Knowledge from the World Wide Web
  – Authors: Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, and Slattery
  – Presenter: Ping Zou
• Key Strengths
  – Idea: build probabilistic model of web documents using “keywords that matter”
  – Use probabilistic model to represent knowledge for indexing into web database
• Key Weaknesses
  – How to account for concept drift?
  – How to explain and express constraints (e.g., “proper nouns that are person names don’t matter”)? Not considered here…
• Future Research Issues
  – Using natural language processing (NLP), image / audio / signal processing
  – Applications: searchable hypermedia, digital libraries, spiders, other agents
  – See work by: McCallum, Mitchell, Roth, Sahami, Pratt, Lee

Class 36: Introduction to Evolutionary Computation

• Architectures
  – Genetic algorithms (GAs), genetic programming (GP), genetic wrappers
  – Simple vs. parameterless GAs
• Issues
  – Loss of diversity
    • Consequence: collapse of Pareto front
    • Solutions: niching (sharing, preselection, crowding)
  – Parameterless GAs
  – Other issues (not covered): genetic drift, population sizing, etc.
• References: Chapter 9, Mitchell; Chapters 1-6, Goldberg; Chapters 1-5, Koza
• Research Issues: How to…
  – Design GAs based on credit assignment system (in performance element)
  – Build hybrid analytical / inductive learning GP systems
  – Use GAs to perform relevance determination in KDD
  – Control diversity in GA solutions for hard optimization problems
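The architecture of a simple GA referred to above (selection, crossover, mutation over a bit-string population) can be sketched in a few lines. This illustration runs on the classic OneMax problem (maximize the count of 1-bits); all parameter values are arbitrary choices, and tournament selection stands in for the selection operator.

```python
import random

random.seed(1)
LENGTH, POP, GENS, P_MUT = 20, 30, 40, 0.02   # arbitrary example parameters

def fitness(ind):
    return sum(ind)                            # OneMax: count of 1-bits

# random initial population of bit strings
pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]

def select():
    # binary tournament selection: better of two random individuals
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

for _ in range(GENS):
    nxt = []
    while len(nxt) < POP:
        p1, p2 = select(), select()
        cut = random.randrange(1, LENGTH)      # one-point crossover
        child = p1[:cut] + p2[cut:]
        # bit-flip mutation with probability P_MUT per bit
        child = [bit ^ 1 if random.random() < P_MUT else bit for bit in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)                   # fitness approaches LENGTH
```

Crossover is what recombines building blocks (schemata); mutation keeps diversity from collapsing entirely, the issue the slide raises.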

Classes 37-38: Genetic Algorithms and Classifier Systems

• Paper
  – Topic: Classifier Systems and Inductive Learning
  – Title: Generalization in the XCS Classifier System
  – Author: Wilson
  – Presenter: Elizabeth Loza-Garay
• Key Strengths
  – Idea: incorporate performance element (classifier system) into GA design
  – Solid theoretical foundation: advanced building block (aka schema) theory
  – Can use to engineer more efficient GA model, tune parameters
• Key Weaknesses
  – Need to progress from toy problems (e.g., MUX learning) to real-world ones
  – Need to investigate scaling up of GA principles (e.g., building block mixing)
• Future Research Issues
  – Building block scalability in classifier systems
  – Applications: reinforcement learning, mobile robotics, other animats, a-life
  – See work by: Wilson, Goldberg, Holland, Booker

Classes 39-40: Knowledge-Based Genetic Programming

• Paper
  – Topic: Genetic Programming and Multistrategy Learning
  – Title: Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach
  – Authors: Aler, Borrajo, and Isasi
  – Presenter: Yuhong Cheng
• Key Strengths
  – Idea: use knowledge-based system to calibrate starting state of MCMC optimization system (here, GP)
  – Can incorporate knowledge (as in CIS 830 Part 1 of 5)
• Key Weaknesses
  – Generalizability of HAMLET population seeding method not well established
  – “General-purpose” problem solving systems can become Rube Goldberg-ian
• Future Research Issues
  – Using multistrategy GP systems to provide knowledge-based decision support
  – Applications: logistics (military, industrial, commercial), other problem solving
  – See work by: Aler, Borrajo, Isasi, Carbonell, Minton, Koza, Veloso

Classes 41-42: Genetic Wrappers for Inductive Learning

• Paper
  – Topic: Genetic Wrappers for KDD Performance Enhancement
  – Title: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm
  – Authors: Raymer, Punch, Goodman, Sanschagrin, Kuhn
  – Presenter: Karthik K. Krishnakumar
• Key Strengths
  – Idea: use GA to empirically (statistically) validate inducer
  – Can use to select, synthesize attributes (aka features)
  – Can also use to tune other GA parameters (hence “wrapper”)
• Key Weaknesses
  – Systematic experimental studies of genetic wrappers have not yet been done
  – Wrappers don’t yet take performance element into explicit account
• Future Research Issues
  – Improving supervised learning inducers (e.g., in MLC++)
  – Applications: better combiners; feature subset selection, construction
  – See work by: Raymer, Punch, Cherkauer, Shavlik, Freitas, Hsu, Cantu-Paz

Classes 43-44: Genetic Algorithms for Optimization

• Paper
  – Topic: Genetic Optimization and Decision Support
  – Title: A Niched Pareto Optimal Genetic Algorithm for Multiobjective Optimization
  – Authors: Horn, Nafpliotis, and Goldberg
  – Presenter: Li Lian
• Key Strengths
  – Idea: control representation of neighborhoods of the Pareto optimal front by niching
  – Gives abstract and concrete case studies of niching (sharing) effects
• Key Weaknesses
  – Need systematic exploration, characterization of “sweet spot”
  – Shows static comparisons, not small-multiple visualizations that led to them
• Future Research Issues
  – Biologically (ecologically) plausible models
  – Applications: engineering (ag / bio, civil, computational, environmental, industrial, mechanical, nuclear) optimization; computational life sciences
  – See work by: Goldberg, Horn, Schwefel, Punch, Minsker, Kargupta

Class 45: Meta-Summary

• Data Mining / KDD Problems
  – Business decision support
  – Classification
  – Recommender systems
  – Control and policy optimization
• Data Mining / KDD Solutions: Machine Learning, Inference Techniques
  – Models
    • Version space, decision tree, perceptron, winnow
    • ANN, BBN, SOM
    • Q functions
    • GA/GP building blocks (schemata), GP building blocks
  – Algorithms
    • Candidate elimination, ID3, delta rule, MLE, Simple (Naïve) Bayes
    • K2, EM, backprop, SOM convergence, LVQ, ADP, simulated annealing
    • Q-learning, TD(λ)
    • Simple GA, GP
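Several of the listed algorithms fit on one screen. As a closing example, here is a sketch of Simple (Naïve) Bayes with add-one (Laplace) smoothing; the tiny weather-style data set is invented for the demonstration.

```python
import math
from collections import Counter

# Invented training data: ((outlook, windy), play?)
data = [
    (("sunny", "no"), "yes"), (("sunny", "yes"), "no"),
    (("rain", "no"), "yes"), (("rain", "yes"), "no"),
    (("overcast", "no"), "yes"), (("overcast", "yes"), "yes"),
]

class_counts = Counter(label for _, label in data)
n_features = 2
# distinct values seen for each feature (for the smoothing denominator)
values = [{feats[i] for feats, _ in data} for i in range(n_features)]

def log_posterior(x, label):
    # log P(label) + sum_i log P(x_i | label), with add-one smoothing
    score = math.log(class_counts[label] / len(data))
    for i, v in enumerate(x):
        match = sum(1 for feats, lab in data if lab == label and feats[i] == v)
        score += math.log((match + 1) / (class_counts[label] + len(values[i])))
    return score

def classify(x):
    return max(class_counts, key=lambda label: log_posterior(x, label))

print(classify(("overcast", "no")))   # "yes"
print(classify(("sunny", "yes")))     # "no"
```

The conditional-independence assumption is what reduces parameter estimation to simple per-feature counting, which is why Naïve Bayes appears alongside MLE in the list above.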