Computational Discovery of Communicable Knowledge

Computational Discovery of
Explanatory Process Models
Pat Langley
School of Computing and Informatics
Arizona State University
Tempe, Arizona
http://cll.stanford.edu/~langley
Thanks to N. Asgharbeygi, K. Arrigo, D. Billman, S. Borrett, W. Bridewell, S. Dzeroski,
O. Shiran, and L. Todorovski for their contributions to this research, which is funded by
a grant from the National Science Foundation.
The Challenge of Systems Science
Disciplines like Earth science and computational biology differ
from traditional fields in that they:
• focus on synthesis rather than analysis in their operation;
• develop system-level models with many variables / relations;
• rely on computational methods to aid in their construction.
However, the key challenge involves search through the model
space, not running rapid simulations or handling large data sets.
Example: Explain Data from the Ross Sea
A Model of the Ross Sea Ecosystem
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
Differential equation models of this sort are regularly used to
explain observations and predict future behavior.
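As a rough illustration of how such a model predicts future behavior, the Python sketch below integrates the four Ross Sea equations with a fixed-step forward Euler method. The coefficients are those of the model above; the initial concentrations, the step size `dt`, the number of steps, and the function names are hypothetical choices made only for illustration, and Euler integration is simply the most basic solver, not necessarily the one the authors used.

```python
def derivatives(s):
    """Right-hand sides d[x,t,1] of the Ross Sea model, using its fitted coefficients."""
    return {
        "phyto": -0.307 * s["phyto"] - 0.495 * s["zoo"] + 0.411 * s["phyto"],
        "zoo": -0.251 * s["zoo"] + 0.615 * 0.495 * s["zoo"],
        "detritus": (0.307 * s["phyto"] + 0.251 * s["zoo"]
                     + 0.385 * 0.495 * s["zoo"] - 0.005 * s["detritus"]),
        "nitro": -0.098 * 0.411 * s["phyto"] + 0.005 * s["detritus"],
    }


def simulate(state, dt=0.1, steps=100):
    """Forward-Euler integration; returns the trajectory as a list of state dicts."""
    trajectory = [dict(state)]
    for _ in range(steps):
        d = derivatives(state)
        state = {k: state[k] + dt * d[k] for k in state}
        trajectory.append(state)
    return trajectory


# Hypothetical initial concentrations (arbitrary units), for illustration only.
initial = {"phyto": 1.0, "zoo": 0.1, "detritus": 0.0, "nitro": 5.0}
final = simulate(initial)[-1]
print({k: round(v, 3) for k, v in final.items()})
```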
The Task of Model Construction
Environmental scientists are confronted with a challenging task:
• Given: A set of variables of interest to the scientist;
• Given: Observations of how these variables change over time;
• Find: A model that explains these variations in plausible terms and that generalizes well to future observations.
Automating such model construction is a natural task for artificial
intelligence and machine learning.
We can develop algorithms that search the space of differential
equation models, but this space is huge, so we need constraints.
Another Account of the Ross Sea Ecosystem
As phytoplankton takes up nitrogen, its concentration increases and nitrogen decreases. This continues until the nitrogen supply is exhausted, which leads to a phytoplankton die-off. This produces detritus, which gradually remineralizes to replenish the nitrogen. Zooplankton grazes on phytoplankton, which slows the latter’s increase and also produces detritus.
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
Processes in the Ross Sea Ecosystem
Knowledge about candidate processes requires that some terms
occur either together or not at all.
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
Here we highlight the terms related to phytoplankton loss, which decreases phyto concentration and increases detritus: −0.307 × phyto in d[phyto,t,1] and 0.307 × phyto in d[detritus,t,1].
Processes in the Ross Sea Ecosystem
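Here we highlight terms related to zooplankton grazing, which decreases phyto but increases zoo and detritus: −0.495 × zoo in d[phyto,t,1], 0.615 × 0.495 × zoo in d[zoo,t,1], and 0.385 × 0.495 × zoo in d[detritus,t,1].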
d[phyto,t,1] = −0.307 × phyto − 0.495 × zoo + 0.411 × phyto
d[zoo,t,1] = −0.251 × zoo + 0.615 × 0.495 × zoo
d[detritus,t,1] = 0.307 × phyto + 0.251 × zoo + 0.385 × 0.495 × zoo − 0.005 × detritus
d[nitro,t,1] = −0.098 × 0.411 × phyto + 0.005 × detritus
We can use knowledge about processes to reorganize models and
constrain search through the model space.
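A back-of-the-envelope count shows how much this constraint can help. The Python sketch below makes a simplifying assumption: a model structure is nothing more than a choice of which terms to include. The Ross Sea equations contain 11 terms contributed by five processes; if every term could be switched on or off independently there would be 2^11 structures, but requiring each process's terms to appear together leaves only 2^5. The dictionary name is hypothetical, and the real space would be larger once alternative functional forms are considered.

```python
# Terms each process contributes to the Ross Sea equations (counted from the model).
TERMS_PER_PROCESS = {
    "phyto_loss": 2,              # d[phyto], d[detritus]
    "zoo_loss": 2,                # d[zoo], d[detritus]
    "zoo_phyto_grazing": 3,       # d[zoo], d[detritus], d[phyto]
    "nitro_uptake": 2,            # d[phyto], d[nitro]
    "nitro_remineralization": 2,  # d[nitro], d[detritus]
}

n_terms = sum(TERMS_PER_PROCESS.values())  # 11 terms in the four equations
independent = 2 ** n_terms                  # any subset of terms may appear
by_process = 2 ** len(TERMS_PER_PROCESS)    # terms appear only as whole processes
print(n_terms, independent, by_process)     # 11 2048 32
```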
A Process Model for the Ross Sea
model Ross_Sea_Ecosystem
  variables: phyto, zoo, nitro, detritus
  observables: phyto, nitro
  process phyto_loss
    equations: d[phyto,t,1] = −0.307 × phyto
               d[detritus,t,1] = 0.307 × phyto
  process zoo_loss
    equations: d[zoo,t,1] = −0.251 × zoo
               d[detritus,t,1] = 0.251 × zoo
  process zoo_phyto_grazing
    equations: d[zoo,t,1] = 0.615 × 0.495 × zoo
               d[detritus,t,1] = 0.385 × 0.495 × zoo
               d[phyto,t,1] = −0.495 × zoo
  process nitro_uptake
    equations: d[phyto,t,1] = 0.411 × phyto
               d[nitro,t,1] = −0.098 × 0.411 × phyto
  process nitro_remineralization
    equations: d[nitro,t,1] = 0.005 × detritus
               d[detritus,t,1] = −0.005 × detritus
This model is equivalent to a standard differential equation model, but it makes explicit assumptions about which processes are involved. For completeness, we must also make assumptions about how to combine influences from multiple processes; here, the terms that different processes contribute to the same derivative are summed.
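One way to make the combination rule concrete is to store each process as a map from variables to the term it contributes, and to add together all contributions that target the same variable. The Python sketch below does this for the Ross Sea process model. Additive combination is the assumption here (it is the rule under which the process model reproduces the flat equations given earlier); the dictionary layout, the function name, and the sample state are hypothetical.

```python
from collections import defaultdict

# Each process maps a variable to a function giving its term in d[variable,t,1].
ROSS_SEA_PROCESSES = {
    "phyto_loss": {
        "phyto": lambda s: -0.307 * s["phyto"],
        "detritus": lambda s: 0.307 * s["phyto"],
    },
    "zoo_loss": {
        "zoo": lambda s: -0.251 * s["zoo"],
        "detritus": lambda s: 0.251 * s["zoo"],
    },
    "zoo_phyto_grazing": {
        "zoo": lambda s: 0.615 * 0.495 * s["zoo"],
        "detritus": lambda s: 0.385 * 0.495 * s["zoo"],
        "phyto": lambda s: -0.495 * s["zoo"],
    },
    "nitro_uptake": {
        "phyto": lambda s: 0.411 * s["phyto"],
        "nitro": lambda s: -0.098 * 0.411 * s["phyto"],
    },
    "nitro_remineralization": {
        "nitro": lambda s: 0.005 * s["detritus"],
        "detritus": lambda s: -0.005 * s["detritus"],
    },
}


def combined_derivatives(processes, state):
    """Sum every process's contribution to each variable's derivative."""
    totals = defaultdict(float)
    for equations in processes.values():
        for var, term in equations.items():
            totals[var] += term(state)
    return dict(totals)


state = {"phyto": 1.0, "zoo": 0.1, "detritus": 0.0, "nitro": 5.0}
print(combined_derivatives(ROSS_SEA_PROCESSES, state))
```

Keeping processes separate makes it easy to drop or swap one process during search, while the summed derivatives remain an ordinary differential equation model for simulation.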
The Task of Inductive Process Modeling
We can use these ideas to reformulate the modeling problem:
• Given: A set of variables of interest to the scientist;
• Given: Observations of how these variables change over time;
• Given: Background knowledge about plausible processes;
• Find: A process model that explains these variations and that generalizes well to future observations.
We can use background knowledge about candidate processes to
make search much more tractable.
Moreover, the resulting model will be consistent with this domain
knowledge, making it more comprehensible.
Generic Processes as Background Knowledge
We cast background knowledge as generic processes that specify:
• the variables involved in a process and their types;
• the parameters appearing in a process and their ranges;
• the forms of conditions on the process; and
• the forms of associated equations and their parameters.
Generic processes are building blocks from which one can compose
a specific process model.
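To show what composing a specific process from a generic one involves, the Python sketch below encodes a generic process as typed roles, parameter ranges, and equation templates, and binds it to particular variables and parameter values after checking both constraints. The class and method names, the template syntax, and the parameter name `alpha` (standing for the loss rate of the exponential_loss process on the next slide) are hypothetical; binding phyto, detritus, and 0.307 reproduces the phyto_loss process of the Ross Sea model.

```python
import re
from dataclasses import dataclass


@dataclass
class GenericProcess:
    """Hypothetical encoding of a generic process: typed roles, ranges, templates."""
    name: str
    variables: dict   # role -> required type, e.g. {"S": "species"}
    parameters: dict  # parameter -> (low, high) range
    equations: list   # equation templates written in terms of roles and parameters

    def bind(self, roles, var_types, values):
        """Instantiate the templates after checking variable types and parameter ranges."""
        for role, var in roles.items():
            needed = self.variables[role]
            if var_types[var] != needed:
                raise TypeError(f"{var} is a {var_types[var]}, but role {role} needs a {needed}")
        for param, value in values.items():
            low, high = self.parameters[param]
            if not low <= value <= high:
                raise ValueError(f"{param}={value} lies outside [{low}, {high}]")
        # Replace whole-word occurrences of role and parameter names in a single pass.
        mapping = {**roles, **{p: str(v) for p, v in values.items()}}
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
        return [pattern.sub(lambda m: mapping[m.group(1)], eq) for eq in self.equations]


EXPONENTIAL_LOSS = GenericProcess(
    name="exponential_loss",
    variables={"S": "species", "D": "detritus"},
    parameters={"alpha": (0.0, 1.0)},
    equations=["d[S,t,1] = -1 * alpha * S", "d[D,t,1] = alpha * S"],
)
ROSS_TYPES = {"phyto": "species", "zoo": "species", "detritus": "detritus", "nitro": "nutrient"}

for eq in EXPONENTIAL_LOSS.bind({"S": "phyto", "D": "detritus"}, ROSS_TYPES, {"alpha": 0.307}):
    print(eq)
# d[phyto,t,1] = -1 * 0.307 * phyto
# d[detritus,t,1] = 0.307 * phyto
```

Binding S to nitro raises a TypeError, because nitro is a nutrient rather than a species; this is the kind of type constraint that prunes instantiations.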
Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = −1 × α × S
             d[D,t,1] = α × S
generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: β [0, 1]
  equations: d[N,t,1] = β × D
             d[D,t,1] = −1 × β × D
generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ × ρ × S1
             d[D,t,1] = (1 − γ) × ρ × S1
             d[S2,t,1] = −1 × ρ × S1
generic process constant_inflow
  variables: N{nutrient}
  parameters: ν [0, 1]
  equations: d[N,t,1] = ν
generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], μ [0, 1], ν [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ × S
             d[N,t,1] = −1 × ν × μ × S
Our current library contains about 20 generic processes, including ones with alternative functional forms for loss and grazing processes.
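The conditions field is what sets nutrient_uptake apart from the unconditional processes: its equations apply only while the nutrient exceeds a threshold. The Python sketch below evaluates one such instance. The names tau, mu, and nu are labels chosen here for the threshold, the growth rate, and the nutrient consumption ratio; the values 0.411 and 0.098 are borrowed from the nitro_uptake process of the Ross Sea model, and the threshold 0.5 is hypothetical.

```python
def nutrient_uptake(S, N, tau, mu, nu):
    """Return (d[S,t,1], d[N,t,1]) contributed by one nutrient_uptake instance.

    The process is active only while its condition N > tau holds; otherwise it
    contributes nothing to either derivative.
    """
    if N <= tau:
        return 0.0, 0.0
    growth = mu * S                   # d[S,t,1] = mu * S
    return growth, -1 * nu * growth   # d[N,t,1] = -1 * nu * mu * S


# Growth rate and ratio from nitro_uptake in the Ross Sea model; tau = 0.5 is hypothetical.
print(nutrient_uptake(S=1.0, N=5.0, tau=0.5, mu=0.411, nu=0.098))  # condition holds
print(nutrient_uptake(S=1.0, N=0.2, tau=0.5, mu=0.411, nu=0.098))  # condition fails: (0.0, 0.0)
```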
Constructing Process Models
[Figure: heuristic search takes generic processes, variables, and observations as input and produces a process model.]

Generic processes:
process exponential_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] × P
process logistic_growth
  variables: P {population}
  equations: d[P,t] = [0, 1, ∞] × P × (1 − P / [0, 1, ∞])
process constant_inflow
  variables: I {inorganic_nutrient}
  equations: d[I,t] = [0, 1, ∞]
process consumption
  variables: P1 {population}, P2 {population}, nutrient_P2
  equations: d[P1,t] = [0, 1, ∞] × P1 × nutrient_P2,
             d[P2,t] = −[0, 1, ∞] × P1 × nutrient_P2
process no_saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P
process saturation
  variables: P {number}, nutrient_P {number}
  equations: nutrient_P = P / (P + [0, 1, ∞])

Variables:
phyto, nitro, zoo, nutrient_nitro, nutrient_phyto

Process model:
model AquaticEcosystem
  variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto
  observables: nitro, phyto, zoo
  process phyto_exponential_growth
    equations: d[phyto,t] = 0.1 × phyto
  process zoo_logistic_growth
    equations: d[zoo,t] = 0.1 × zoo × (1 − zoo / 1.5)
  process phyto_nitro_consumption
    equations: d[nitro,t] = −1 × phyto × nutrient_nitro,
               d[phyto,t] = 1 × phyto × nutrient_nitro
  process phyto_nitro_no_saturation
    equations: nutrient_nitro = nitro
  process zoo_phyto_consumption
    equations: d[phyto,t] = −1 × zoo × nutrient_phyto,
               d[zoo,t] = 1 × zoo × nutrient_phyto
  process zoo_phyto_saturation
    equations: nutrient_phyto = phyto / (phyto + 0.5)
A Method for Process Model Construction
Our initial system, IPM, constructs process models from generic
components in four stages:
1. Find all ways to instantiate known generic processes with
specific variables, subject to type constraints;
2. Combine instantiated processes into candidate generic models
subject to additional constraints (e.g., number of processes);
3. For each generic model, carry out search through parameter
space to find good coefficients;
4. Return the parameterized model with the best overall score.
Our typical evaluation metric is squared error, but we have also
explored other measures of explanatory adequacy.
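The four stages can be sketched on a toy problem in Python. This is a minimal illustration under stated assumptions, not IPM itself: the generic-process library is hypothetical and restricted to processes that bind a single variable, simulation uses forward Euler, exhaustive grid search stands in for the parameter search of stage 3, and the "observations" are synthetic, generated from logistic growth inside the sketch. The constant_inflow process is never instantiated because no variable has type nutrient, which illustrates the type constraint of stage 1.

```python
import itertools

# Hypothetical toy library: each generic process binds ONE variable of a required type,
# declares parameter ranges, and gives its contribution to that variable's derivative.
GENERIC = {
    "exponential_growth": ("population", [(0.0, 1.0)], lambda x, p: p[0] * x),
    "exponential_loss": ("population", [(0.0, 1.0)], lambda x, p: -p[0] * x),
    "logistic_growth": ("population", [(0.0, 1.0), (1.0, 3.0)],
                        lambda x, p: p[0] * x * (1 - x / p[1])),
    "constant_inflow": ("nutrient", [(0.0, 1.0)], lambda x, p: p[0]),
}


def instantiate(var_types):
    """Stage 1: bind each generic process to every variable of the required type."""
    return [(name, var) for name, (vtype, _, _) in GENERIC.items()
            for var, t in var_types.items() if t == vtype]


def structures(instances, max_processes=2):
    """Stage 2: candidate structures are sets of 1..max_processes instantiated processes."""
    for k in range(1, max_processes + 1):
        yield from itertools.combinations(instances, k)


def simulate(structure, params, init, dt=0.5, steps=20):
    """Forward-Euler run in which contributions to the same variable are summed."""
    state, trajectory = dict(init), [dict(init)]
    for _ in range(steps):
        deriv = {v: 0.0 for v in state}
        for (name, var), p in zip(structure, params):
            deriv[var] += GENERIC[name][2](state[var], p)
        state = {v: state[v] + dt * deriv[v] for v in state}
        trajectory.append(state)
    return trajectory


def squared_error(trajectory, observed):
    """Sum of squared errors over all time points and observed variables."""
    return sum((pred[v] - obs[v]) ** 2
               for pred, obs in zip(trajectory, observed) for v in obs)


def grid(low, high, n=11):
    """Evenly spaced parameter values; a stand-in for a real parameter optimizer."""
    return [low + i * (high - low) / (n - 1) for i in range(n)]


def ipm(var_types, observed):
    """Stages 3-4: grid-search each structure's parameters and keep the best overall."""
    best = (float("inf"), None, None)
    for structure in structures(instantiate(var_types)):
        per_process = [list(itertools.product(*(grid(lo, hi) for lo, hi in GENERIC[name][1])))
                       for name, _ in structure]
        for params in itertools.product(*per_process):
            err = squared_error(simulate(structure, params, observed[0]), observed)
            if err < best[0]:  # strict '<' keeps the earlier (smaller) structure on ties
                best = (err, structure, params)
    return best


# Synthetic observations from logistic growth (r = 0.5, K = 2.0); not real data.
observed = simulate((("logistic_growth", "P"),), ((0.5, 2.0),), {"P": 0.2})
err, structure, params = ipm({"P": "population"}, observed)
print(structure, params, err)
# (('logistic_growth', 'P'),) ((0.5, 2.0),) 0.0
```

Because structures are enumerated from smallest to largest and ties keep the first model found, the sketch prefers the single logistic process over larger structures that fit equally well, such as logistic growth plus a zero-rate loss process.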
Results on Observations from Ross Sea
We provided IPM with 188 samples of phytoplankton, nitrate, and ice measures taken from the Ross Sea. From 2035 distinct model structures, it found accurate models that limited phyto growth by the available nitrate and light. Some high-ranking models incorporated zooplankton, whereas others did not.
Results with Inductive Process Modeling
[Figure panels: population dynamics, battery behavior, hydrology, biochemical kinetics]
Extensions to Inductive Process Modeling
In recent work, we have extended our system to incorporate:
• heuristic beam search through the space of process models;
• hierarchical generic processes that further constrain search;
• an ensemble-like method that mitigates overfitting effects;
• an EM-like method that deals with missing observations.
This approach has great potential to speed the construction of scientific models – provided that domain users adopt it.
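The first of these extensions can be illustrated with a generic beam search over sets of processes, sketched here in Python. It is not the authors' implementation: the candidate names come from the Ross Sea model, but the scoring function (the number of processes that differ from a hidden target set) is a hypothetical stand-in for the fitted error a real system would compute after parameter search, and the beam width and depth are arbitrary.

```python
def beam_search(candidates, score, beam_width=2, max_size=3):
    """Grow process sets one process at a time, keeping the beam_width best per level.

    Returns (best_score, best_model) over every level visited; lower scores are better.
    """
    beam = [frozenset()]
    best = (float("inf"), frozenset())
    for _ in range(max_size):
        expansions = {model | {p} for model in beam for p in candidates if p not in model}
        if not expansions:
            break
        # Sort by score, breaking ties alphabetically so the result is deterministic.
        beam = sorted(expansions, key=lambda m: (score(m), sorted(m)))[:beam_width]
        if score(beam[0]) < best[0]:
            best = (score(beam[0]), beam[0])
    return best


PROCESSES = ["phyto_loss", "zoo_loss", "zoo_phyto_grazing",
             "nitro_uptake", "nitro_remineralization"]
# Hypothetical score: how many processes differ from a target set (stand-in for fitted error).
TARGET = frozenset({"phyto_loss", "nitro_uptake", "nitro_remineralization"})


def toy_score(model):
    return len(model ^ TARGET)


print(beam_search(PROCESSES, toy_score))
```

Unlike exhaustive enumeration, the beam keeps only a fixed number of partial models at each level, so its cost grows roughly linearly with model size rather than exponentially; the price is that it can miss the best model if good partial models score poorly early on.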
Interfacing with Scientists
Because few scientists want to be replaced, we are developing an
interactive environment, PROMETHEUS, that lets users:
• specify a quantitative process model of the target system;
• display and edit the model’s structure and details graphically;
• simulate the model’s behavior over time and situations;
• compare the model’s predicted behavior to observations;
• invoke a revision module in response to detected anomalies.
The environment offers computational assistance in forming and
evaluating models but lets the user retain control.
Viewing a Process Model Graphically
Viewing a Process Model as Equations
Adding a Process Manually
Requesting Automatic Model Revision
Results of Automatic Model Revision
Directions for Future Research
Despite our progress to date, we need further work in order to:
• provide better ways to visualize models, data, and their relation;
• offer users more natural ways to define the space of models:
  ◦ specifying constraints on relations among entities and processes;
  ◦ characterizing subsystems that decompose complex models;
• incorporate intuitive metrics like match to trajectory shape;
• more generally improve the usability of PROMETHEUS.
Taken together, these will make inductive process modeling a
more robust approach to scientific model construction.
Intellectual Influences
Our approach to aiding scientific model construction incorporates
ideas from many traditions:
• computational scientific discovery (e.g., Langley et al., 1983);
• theory revision in machine learning (e.g., Towell, 1991);
• qualitative physics and simulation (e.g., Forbus, 1984);
• languages for scientific simulation (e.g., STELLA, MATLAB);
• interactive tools for data analysis (e.g., Shneiderman, 2001).
Our work combines, in novel ways, insights from machine learning,
AI, programming languages, and human-computer interaction.
Contributions of the Research
In summary, our work on computational model construction has
produced an approach that:
• incorporates a formalism that is familiar to many scientists;
• takes into account background knowledge about the domain;
• produces meaningful results from small amounts of data;
• generates models that explain rather than describe observations;
• provides an interactive environment for model construction.
We need much more research in computational systems science
that addresses these challenges.