Sketching for Knowledge Capture - ACT-R

Symbolic Supercomputer
for
Artificial Intelligence
and
Cognitive Science
Research
Kenneth D. Forbus
Dedre Gentner
Northwestern University
Overview
• Why symbolic supercomputing?
• Off-line experiments
– Work in progress: Large-scale corpus analysis
– Distributed experiments harness
• Interactive Cognitive Architecture experiments
– Companion Cognitive Systems (DARPA)
– Explanation Agent
Off-line experiments
• Sensitivity Analysis
– Every cognitive simulation has parameters
• Analyzing how performance depends on parameters is important
for understanding models
– Sensitivity analyses can be expensive
• 1994 MAC/FAC simulations took weeks of CPU time
• 2000: 4.8 million SME runs in SEQL sensitivity analyses took
23 days (400 MHz PII); should take about 4 days today.
• Corpora Analyses
– Text
– Sketches
– Problems
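A rough check of the sensitivity-analysis cost quoted above, as a minimal sketch. It uses only the slide's figures (4.8 million SME runs, 23 days then, about 4 days projected) and assumes the runs were executed back to back on one CPU.

```python
# Figures quoted on the slide.
SME_RUNS = 4_800_000
DAYS_IN_2000 = 23   # on a 400 MHz PII
DAYS_TODAY = 4      # the slide's projection, not a measurement

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_run(runs: int, days: float) -> float:
    """Average wall-clock seconds per run, assuming serial execution."""
    return days * SECONDS_PER_DAY / runs


per_run_2000 = seconds_per_run(SME_RUNS, DAYS_IN_2000)
per_run_today = seconds_per_run(SME_RUNS, DAYS_TODAY)
implied_speedup = DAYS_IN_2000 / DAYS_TODAY

print(f"{per_run_2000:.3f} s per SME run in 2000")
print(f"{per_run_today:.3f} s per SME run at the projected rate")
print(f"implied speedup: {implied_speedup:.2f}x")
```

Even at well under half a second per match, the number of runs is what makes a sweep expensive.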
Larger-scale simulations
• Goal: Increased use of
automatically generated
inputs
– Reduce tailorability
– Increase # of stimuli
generated and used.
• Processes
– Analogical Encoding
– Conceptual problem
solving
Symbolic models and parallelism
• Our approach is based on Gentner’s (1983)
structure-mapping theory
– Assumes parallel processing both within modules and
between modules
– Currently emulate on serial processors
• Coarse-grained parallelism could provide
important benefits
– Continue to simulate within-module parallelism on
single CPUs
– Exploit parallel processing between modules
• Incrementally update retrievals during reasoning
• Incrementally construct generalizations during reasoning
• Reason about domain, interactions, and self in parallel
Traditional Supercomputers ineffective for
symbolic processing
• Optimized for
– Floating-point processing
– Pipelined, with vector or grid model
– Result: okay CPUs, low RAM, fast floating point
• Symbolic processing
– Involves many pointer operations
– Some floating-point, but over irregular structures
(graphs, sparse-vectors)
– Needs: fast CPUs, high RAM, okay floating point
Optimizing a cluster for symbolic processing
1. Use the fastest CPU available.
2. Distribute the processing in large,
functionally-organized units.
– Avoid communication overhead
– Data-parallel programming style poor fit for
clusters
– Replicate knowledge base as needed
3. Organize memory to be as fast as possible.
– Maximize RAM, cache
– Avoid virtual memory
Why large memories are crucial
• If a program is going to know a lot, it has to put it
somewhere
• Example: Subset of Cyc KB contents we use
– 35,070 concepts, 8939 relations, and 3,917 functions
– 1,283,835 axioms, divided into 3,537 microtheories
– Added knowledge (DARPA HPKB, CPOF, RKF)
• Military tasks, units, equipment
• Countries, international relationships, terrorist incidents
• Qualitative models, terrain, trafficability, visual representation
conventions, developed by our group
– Takes roughly 495 MB of storage, due to indexing
overhead
• May double in size as we learn by accumulating experiences
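A back-of-the-envelope reading of the storage figures above, as a sketch. It assumes 1 MB = 2^20 bytes and compares the doubled KB against the 3 GB of RAM per Mk2 node quoted elsewhere in this talk.

```python
AXIOMS = 1_283_835
KB_MEGABYTES = 495               # "roughly 495 MB", including indexing overhead
NODE_RAM_MEGABYTES = 3 * 1024    # 3 GB RAM per Mk2 node

# Average storage per axiom, indexing included (assumes 1 MB = 2**20 bytes).
bytes_per_axiom = KB_MEGABYTES * 1024 * 1024 / AXIOMS

# "May double in size as we learn by accumulating experiences."
doubled_kb_megabytes = 2 * KB_MEGABYTES
fraction_of_node_ram = doubled_kb_megabytes / NODE_RAM_MEGABYTES

print(f"about {bytes_per_axiom:.0f} bytes per axiom")
print(f"doubled KB: {doubled_kb_megabytes} MB, "
      f"{fraction_of_node_ram:.0%} of one node's RAM")
```

On these numbers, a replicated copy of the doubled KB would occupy about a third of one node's RAM, leaving the rest for working memory.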
Mk2
• Hardware: Linux
Networx
– 5 year maintenance
contract
• 67 nodes
– Dual 3.2 GHz Xeon CPUs
– 3 GB RAM/node
– 80 GB disk/node
• Allegro Common Lisp for
Linux
– Provides flexible
development environment
Mk2 Cluster Network
[Figure: Mk2 cluster network. Master host and nodes 1 through 67 on a backplane subnet (gigabit switched Ethernet); Cisco PIX firewall/router (packet filtering, trusted whitelist of hosts); frontplane subnet; public Internet.]
One-command provisioning, P2P data distribution system
Qualitative reasoning for
intelligent agents
(ONR AI Program)
[Figure: Explanation Agent. Labels: situation updates; queries; estimates, warnings; model of ongoing situation/system.]
Objective
Create science base for intelligent
software agents that can
• Reason about the physical
phenomena and systems in a
human-like way
• Extend their knowledge
incrementally, by communicating
with human collaborators in natural
language.
Technical Approach
• Develop qualitative reasoning techniques for solving problems under time pressure with partial, incomplete knowledge (“back of the envelope” reasoning)
• Explore the use of qualitative representations as part of the semantics for a natural language system.
• Develop techniques to assimilate controlled-language reports to extend an agent’s models of the physical world.
[Figure labels: New examples; Knowledge Base (general knowledge + libraries of cases)]
QP Theory in Natural Language Semantics
• Idea: Qualitative Process theory can be used as a framework for understanding NL descriptions of physical phenomena.
– Right level of abstraction
– Consistent with human mental models
– Support for compositionality
• Approach
– Identify syntactic patterns corresponding to QP theory concepts via corpus analysis
– Recast QP theory in terms of frames
– Use controlled subset of English to simplify parsing, focus on semantics
• Current status
– NL system translates paragraph-sized texts about physical processes into formal representations
– Tested on a dozen examples
• Next steps
– Expand range of texts handled
– Develop knowledge assimilation techniques to construct knowledge bases by reading multiple texts
Example text (figure shows two cylinders, C1 and C2):
(1) A pipe connects cylinder c1 to cylinder c2.
(2) Cylinder C1 contains 5 liters of water.
(3) Cylinder C2 contains 2 liters of water.
(4) Water flows from cylinder C1 to cylinder C2, because the pressure in cylinder C1 is greater than the pressure in cylinder C2.
(5) The higher the pressure in cylinder C1, the higher the flowrate of the water.
(6) When the pressure in cylinder C2 increases, the flowrate of the water decreases.
Process frame:
Type:
(isa flow3606 Translation-Flow)
Participants:
(isa c1 Container) [QuantityFrame q3609]
(isa c2 Container) [QuantityFrame q3603]
Conditions:
(> (pressure c1) (pressure c2))
Quantities:
[QuantityFrames q3608 and q3605]
Consequences:
(qprop (flowrate flow3606) (pressure c1))
(qprop- (flowrate flow3606) (pressure c2))
(I- (water c1) (flowrate flow3606))
(I+ (water c2) (flowrate flow3606))
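One way to read the process frame above is as a small data structure that can be queried for qualitative consequences. The sketch below is illustrative only: the `ProcessFrame` class, its field names, and the pressure values are hypothetical, and it is not the Explanation Agent's actual representation. It also assumes the flow rate is positive whenever the process is active.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessFrame:
    """Hypothetical container mirroring the slots of the frame on the slide."""
    name: str
    condition: tuple                                  # (operator, quantity, quantity)
    qprops: list = field(default_factory=list)        # (sign, dependent, independent)
    influences: list = field(default_factory=list)    # (sign, quantity, rate)


flow = ProcessFrame(
    name="flow3606",
    condition=(">", ("pressure", "c1"), ("pressure", "c2")),
    qprops=[(+1, ("flowrate", "flow3606"), ("pressure", "c1")),
            (-1, ("flowrate", "flow3606"), ("pressure", "c2"))],
    influences=[(-1, ("water", "c1"), ("flowrate", "flow3606")),
                (+1, ("water", "c2"), ("flowrate", "flow3606"))],
)


def is_active(frame: ProcessFrame, values: dict) -> bool:
    """The process is active only while its condition holds."""
    operator, left, right = frame.condition
    if operator != ">":
        raise ValueError(f"unsupported condition operator: {operator}")
    return values[left] > values[right]


def direct_influences(frame: ProcessFrame, values: dict) -> dict:
    """Sign of change for each directly influenced quantity (empty if inactive)."""
    if not is_active(frame, values):
        return {}
    return {quantity: sign for sign, quantity, _rate in frame.influences}


# Hypothetical pressures; the example text only says C1's pressure is greater.
print(direct_influences(flow, {("pressure", "c1"): 5.0, ("pressure", "c2"): 2.0}))
```

With the condition satisfied, the I- and I+ consequences say the water in c1 is decreasing and the water in c2 is increasing; once the pressures equalize the process stops and nothing is influenced.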
The EA natural language system
[Figure: EA natural language system (Sven Kuehne’s Ph.D. thesis). Components shown: input text; parser with QRG-CE grammar (only 15 out of ~100 grammar rules are QP-specific); retrieval of semantic information (1.2 million fact subset of Cyc KB); word-sense disambiguation (lexicon, WSD data); frame construction (frame rules, merge rules, patterns for QP-specific constituents); process frame construction (process rules, QP theory constraints); QP frames; facts.]
Corpus Analysis (in progress)
• Kuehne and Forbus (2002) used by-hand corpus analysis to identify syntactic patterns
– Four chapters of an introductory science book, 216 sentences total
– 43% of the material in physical explanatory text could be captured via QP theory.
• Do the syntactic patterns that we found for explanatory physical texts apply to everyday texts?
– If they do, what is their coverage?
– How many more patterns are there?
Looking for quantities
• 1999 volume of the New York Times, consisting of 6.4 million sentences
• First stage used a 30-word list for filtering (7.5 hours)
– ~172,000 sentences output
• Second stage used regular expressions (12 hours)
– Derived from vocabulary and syntactic patterns from previous corpus analysis.
– Result: ~19,000 sentences worth examining more closely
• Third stage uses modified version of our Explanation Agent NLU system
(less than 2 days, 17 hours, on 3 nodes)
– Previously used Quantity and PhysicalQuantity
– Generalized to the Cyc concept ScalarInterval
• Subsumes temperament, monetary values, feeling attributes, formality/politeness of
speech, plus others.
• 14,000+ quantities found.
– 0.2% of the sentences mention a recognizable quantity
– Lexicon limitations may have a strong effect here
• Expanding it via hand-labor (Cycorp) plus co-training is probably necessary
• e.g., “intensification of the war effort”
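The three stages above form a cascade: cheap filters run over the whole corpus so that the expensive NLU step only sees the survivors. The sketch below shows that shape only. The word list, the regular expressions, and the `deep_analysis` callback are hypothetical stand-ins, since the talk does not give the actual 30-word list, patterns, or NLU interface.

```python
import re
from typing import Callable, Dict, List

# Hypothetical stand-ins for the 30-word list and the stage-two patterns.
QUANTITY_WORDS = {"temperature", "pressure", "price", "rate", "amount"}
QUANTITY_PATTERNS = [
    re.compile(r"\bthe \w+ of the \w+\b", re.IGNORECASE),
    re.compile(r"\b(rose|fell|increased|decreased)\b", re.IGNORECASE),
]


def word_list_filter(sentences: List[str]) -> List[str]:
    """Stage 1: keep sentences containing any word from the list."""
    kept = []
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if tokens & QUANTITY_WORDS:
            kept.append(sentence)
    return kept


def regex_filter(sentences: List[str]) -> List[str]:
    """Stage 2: keep sentences matching at least one syntactic pattern."""
    return [s for s in sentences if any(p.search(s) for p in QUANTITY_PATTERNS)]


def run_pipeline(sentences: List[str],
                 deep_analysis: Callable[[str], bool]) -> Dict[str, int]:
    """Run all three stages and report how many sentences survive each."""
    stage1 = word_list_filter(sentences)
    stage2 = regex_filter(stage1)
    stage3 = [s for s in stage2 if deep_analysis(s)]  # expensive step runs last
    return {"input": len(sentences), "stage1": len(stage1),
            "stage2": len(stage2), "stage3": len(stage3)}


sample = [
    "The price of the stock rose sharply.",
    "The mayor gave a speech.",
    "Pressure is mounting on the committee.",
]
# Toy stand-in for the NLU system: accept sentences that mention a price.
counts = run_pipeline(sample, lambda s: "price" in s.lower())
print(counts)
```

On the slide's figures the same funnel goes from 6.4 million sentences to roughly 172,000, then roughly 19,000, so the NLU stage processes about 0.3% of the corpus.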
Qualitative changes in the New York Times
• Starting point: Corpus of 6.4 million sentences
• Filter using word list of 89 synonyms for
increases, 66 for decreases (~10 hours each)
– 62,117 candidate sentences mentioning decreases
– 195,452 candidate sentences mentioning increases
– Around 4% of corpus
– Contrast: 43% of the material in physical explanatory
text could be captured via QP theory.
• Larger analysis only concerns qualitative proportionalities
• Qualitative representations may play a smaller role in
understanding political texts versus physical texts.
• Genre differences: newspapers versus explanatory material
– E.g., “(X i.e., Y)” common on web, not in newspapers
Dexp: Distributed Experiment Tool
• Provides support for running distributed
experiments
– Written in Common Lisp
– Uses sandbox to avoid configuration issues
• Experimenter divides computation into work units
– Example: For N queries, find all of the solutions to
them
– Provides list of work units to dexp as a file, along with
a startup file and code tree to use
– Gets back a set of files containing the results.
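The work-unit model above can be sketched in a few lines. This is not dexp itself, which is written in Common Lisp and runs units on cluster nodes inside a sandbox. Here a local thread pool stands in for the experiment pool, and `run_experiment` and `count_solutions` are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_experiment(work_units, run_unit, max_workers=4):
    """Run every work unit; collect results and failures separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_unit, unit): unit for unit in work_units}
        for future in as_completed(futures):
            unit = futures[future]
            try:
                results[unit] = future.result()
            except Exception as error:
                # One failed unit must not sink the rest of the experiment.
                failures[unit] = repr(error)
    return results, failures


def count_solutions(query):
    """Hypothetical work unit: pretend to count solutions to a query."""
    if query == "q3":
        raise MemoryError("heap exhausted")
    return len(query)


results, failures = run_experiment(["q1", "q2", "q3"], count_solutions)
print(results)
print(failures)
```

Keeping failures separate from results is the point of the design: completed units are available immediately, and the failed ones can be diagnosed on their own.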
dexp Architecture
• Experiment Coordinator
– Manages distribution and execution of work units
– Collects results
• Experiment pool nodes
– Execute a work unit, return results.
– Execution uses sandbox for configuration control
• Load Balancer
– Dynamically allocates nodes for work units
– Will balance demands from multiple simultaneous experiments
[Figure: coordinator and load balancer (*) in front of the distributed experiment pool; example nodes n66, n15, n65, n34, n31, n33.]
How dexp simplifies experiments: Example
• An experiment analyzing semantic translations in
ResearchCyc KB consisted of ~1200 work units
– Each consisted of a query to see how many examples in
the KB satisfied the semantic patterns given for verbs
• With 24 nodes, most of the experiment was
completed in 34 minutes
– Estimate: 11 hours on a single CPU, if no failures
• Five work units churned for 12 hours, failed to
finish due to heap blow-out
– Most of the results were available quickly
– Much easier to diagnose what was going wrong, instead
of waiting for hours to hit a failure.
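The speedup implied by these figures can be worked out directly. This is a rough sketch: the 11 hours is the slide's own single-CPU estimate, and the 34 minutes covers most of the experiment, not the five work units that failed.

```python
SINGLE_CPU_HOURS = 11    # the slide's estimate, assuming no failures
CLUSTER_MINUTES = 34     # time for most of the experiment on the cluster
NODES = 24

speedup = SINGLE_CPU_HOURS * 60 / CLUSTER_MINUTES
efficiency = speedup / NODES   # fraction of ideal linear speedup

print(f"speedup: about {speedup:.1f}x on {NODES} nodes")
print(f"parallel efficiency: about {efficiency:.0%}")
```

Work units that are independent queries need no communication between nodes, which is consistent with efficiency this close to linear.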
Companion Cognitive Systems
A new cognitive systems
architecture
• Robust reasoning and learning
– Companions will learn about
their domains, their users, and
themselves.
• Longevity
– Companions will operate
continuously over weeks and
months at a time.
• Interactivity
– Companions will be capable of
high bandwidth interaction with
their human partners. This
includes taking advice.
– Sketching is a major
interaction modality
Mk2
(ONR, 67 nodes)
Central hypotheses
• Analogical processing will
enable us to create systems
with human-like learning and
reasoning abilities
– Able to handle relational
information
– Able to incrementally adapt
and extend their knowledge
– Able to apply what they learn
in one domain to other
domains
• Using a cluster can make an
analogical processing
architecture fast enough to be
used in interactive systems
– Changes the kinds of
experiments that become
feasible as well.
Colossus
(DARPA, 5
nodes)
Companions as Structure-Mapping Architecture
Psychological Bets
• Ubiquitous use of
structure-mapping for
reasoning and learning
– SME for matching
– MAC/FAC for similarity-based retrieval
– SEQL for generalization
• Qualitative representations
play central role
– Part of visual structure in
spatial reasoning
– Representation of causal
knowledge and arguments
Engineering Choices
• Distributed agent
architecture using KQML
• Logic-based TMS for
working memory
• No hardwired working-memory capacity limits
Companion Architecture Year One
[Figure: Companion architecture, year one. Labels: Cluster (node, master node); MAC/FAC Domain Tickler; Facilitator; Session Reasoner; User’s Windows box; Session Manager; nuSketch System (sKEA or nuSketch Battlespace); Relational Concept Map.]
With Thomas Hinrichs, Jeff Usher, Matt Klenk, Greg Dunham, Emmett Tomai, Tom Ouyang, Hyeonkyeong Kim, and Brian Kyckelhahn
Bennett Mechanical Comprehension Test
• Widely used standardized exam
for technicians
• Used in cognitive psychology
as indicator of spatial ability
• Difficulty lies in breadth of
situations, not narrow technical
knowledge
• Best score to date: 10 correct
out of a subset of 13 BMCT
problems (77%). [P < 0.001]
Q: Which crane is more stable?
– Example describes how physical principles apply to a real-world situation
– Analogies with the example provide the causal models needed for solution
Suggesting visual/conceptual relations by analogy
[Figure: generating suggestions by analogy. Labels: Knowledge Base (including case libraries of examples); MAC/FAC; Candidate Inference Extraction; Filtering; Suggestions. Candidate counts shown: 109, 184, 189, 109.]
Analogical inferences are surmises, not certainties
Visual/Conceptual Relations: Experimental Results
• Ex1: Focused Tasking
– 54 sketches (18 situations
drawn by three KEs) as case
library for BMCT
experiment
• Round Robin method: For
each sketch, remove from
library, remove its VCR
answers, generate
suggestions via analogy
– Yielded “exam” of 181
VCR questions
– Score = 74.25 (P << 10^-5)
– Coverage = 54%
– Accuracy = 87%
• Ex2: Open tasking
– 10 situations selected from
BMCT problems, covering
larger range of phenomena
(e.g., “a boat moving in
water”, “a bicycle”)
– Each situation sketched by
two graduate students, told
to “illustrate the principle(s)
you think are important.”
• Round Robin method
– Yielded “exam” of 138
questions
– Score = 21.75 (P < 10^-7)
– Coverage = 46%
– Accuracy = 57%
Companions Architecture as of 9/05
[Figure: architecture diagram. Labels: Cluster; User’s Windows box; Session Manager; Facilitator; Executive; Session Reasoner; Dialogue Manager; nuSketch GUI; Headless nuSketch; Relational Concept Map; Interactive Explanation Interface; MAC/FAC Domain Model Tickler; MAC/FAC Self Model Tickler; MAC/FAC User Model Tickler; SEQL Domain Generalizer; SEQL Self Model Generalizer; SEQL User Model Generalizer; Offline Learning (three instances).]
Explanation Agent Prototype
• Use Companions Architecture as infrastructure
• Incorporate other ONR advances
– EA NLU system (Sven Kuehne)
– Back of the envelope reasoning (Praveen Paritosh)
– Spatial prepositions model to link language and
sketches (Kate Lockwood)
– Analogical Problem Solver (Tom Ouyang)
• Use for cognitive simulations
– Natural language, sketching for stimulus input
Back of the Envelope Reasoning (Paritosh)
[Figure: example questions: “How much oxygen is left?”, “Is anyone still alive in there?”, “How long to repair it?”]
Goal: Develop theories that enable software to reason quantitatively
in real-world situations
• Qualitative representations essential for framing the problems, supporting comparisons
• Analogical reasoning used to find similar situations for estimation models, construct qualitative representations via generalization over experience
[Figure: estimation strategies. Estimate parameter directly: use known value if available; feel for numbers; estimate based on similar situation. Create estimation model: problem solving; find modeling strategy; find values for parameters in model.]
Back of the Envelope Reasoning Progress
• Implemented BoTE-Solver
– Solves 13 problems to date
• Examples
• How many K-8 school
teachers are in the USA?
• How much money is spent on
newspapers in USA per year?
• What is the total annual
gasoline consumption by cars
in US?
• What is the annual cost of
healthcare in USA?
• How much power can an adult
human generate?
• Claim: There is a core
collection of strategic
knowledge, specifically, seven
strategies that capture most of
back of the envelope
reasoning.
• Source:
– Strategies in BoTE-Solver
– Analysis of all problems (n=44)
from Force and Pressure,
Rotation and Mechanics, Heat
and Astronomy from Clifford
Swartz’s Back-of-the-Envelope
Physics.
CARVE: Using analogy to generate qualitative
representations
[Figure: CARVE pipeline. Input cases C1 … Cj; dimensional partitioning for each quantity (k-means clustering); add these facts to original cases, e.g. (isa Algeria (HighValueContextualizedFn Area AfricanCountries) …; cases + structural limit points and distributional partitions; structural clustering using SEQL. Other labels: Quantity 1; S1, S2, S3; L1, L2.]
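The dimensional-partitioning step can be illustrated with a one-dimensional k-means over a single quantity. This is a minimal sketch, not CARVE's implementation: the initialization scheme, the Low/Medium/High label names, the country names, and the area values are all hypothetical. Only the `HighValueContextualizedFn` fact shape comes from the slide.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on one quantity; returns sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for value in ordered:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # An empty group keeps its previous center.
        updated = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def partition_label(value, centers, labels=("Low", "Medium", "High")):
    """Label a value by its nearest center (assumes len(labels) == len(centers))."""
    nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
    return labels[nearest]


def high_value_fact(entity, quantity, collection):
    """Fact in the shape shown on the slide."""
    return f"(isa {entity} (HighValueContextualizedFn {quantity} {collection}))"


# Hypothetical areas in arbitrary units.
areas = {"CountryA": 10, "CountryB": 11, "CountryC": 12,
         "CountryD": 100, "CountryE": 105, "CountryF": 110,
         "CountryG": 950, "CountryH": 1000, "CountryI": 1100}
centers = kmeans_1d(areas.values())
new_facts = [high_value_fact(name, "Area", "SomeCountries")
             for name, area in areas.items()
             if partition_label(area, centers) == "High"]
print(new_facts)
```

The generated facts are symbolic, so they can be added to the original cases and take part in structural matching like any other fact.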
Analogical Estimation
• Analogical estimator: makes guesses for a numeric
parameter based on analogy.
(GrossDomesticProduct Brazil ?x)
– The value is known.
– Find an analogous case for which value is known.
– Find anything in the KB which might be a basis for an
estimate.
• Hypothesis: Representations augmented with
symbolic representation will lead to more accurate
estimates.
Basketball Stats Domain
• Quantities (e.g., points per game, rebounds per
game, assists per game, etc.)
• Causal relationships
– Being taller helps being able to rebound and block
– Power forwards are taller and are expected to shoot,
rebound and block
– Being good at getting 3 point field goals means one is a
good shooter, so their free throw success rates will be
higher.
• Case library
– 15 players from different positions on field
– 11 facts per player
(seasonThreePointsPercent JasonKidd 0.404)
(qprop seasonThreePointPercent seasonFreeThrowPercent
BasketballPlayers)
Results: Errors
[Figure: bar chart of mean % error (axis 0 to 80), comparing enriched and raw case libraries. Categories: All, Rebounds, Three point %, Points, Free throws, Assists, Height.]
SpaceCase: Motivation
• Multimodal interfaces
potentially useful for
military needs
– Language plus diagrams,
other spatial displays
• Software’s notion of
similarity needs to be like
that of its human partners
– Including visual properties
– Including retrieval, for
shared history
– Including shared language
• Recent research points to
role of non-geometric
properties in spatial
preposition use
– Coventry 1994; Coventry &
Prat-Sala, 1999; Herskovits,
1986; Feist & Gentner,
2003; Garrod et al., 1999;
Coventry & Garrod, 2004;
Carlson & van der Zee,
2005
• Spatial language can affect
retrieval of pictures
– Feist and Gentner, 2001
Lockwood, K., Forbus, K., and Usher, J.
SpaceCase: a Model of Spatial Preposition Use
Proceedings of CogSci-05, to appear
sKEA Sketching Interface
[Figure: sKEA sketch of a firefly and a dish. Annotations: firefly -> insect -> animate; functions as weak container; ground_supports_figure; medium_curvature.]
Sketch corpus crucial for model development
• Building a corpus of sketches
– Gathering library of examples from literature
– Use sKEA to capture them in machine-understandable
form
– Estimate: ~ 200 sketches will be needed to cover the set
of prepositions and phenomena to be modeled
• Cluster will be used for
– Regression testing
– Sensitivity analyses: How does performance depend on
parameter values?
Problem-solving experiments
• Starting point: Pisan’s (1998) Thermodynamics
Problem Solver
– Solved 80% of the problems typically found in first four chapters
in engineering thermodynamics textbooks
– Used graphs and property tables
– Produced human-like solutions
• Generalize: Analogical Problem Solver
– Focus on conceptual comprehension questions
– Declarative strategies now include analogical processing
• when/what to retrieve, what candidate inferences to use, level of
effort in testing
– Experiment in progress: Can strategy variations explain
novice/expert differences?
• Pilot results promising, should have full data by end of summer.
Questions?
Technology Transfer
The Whodunit Problem
• Goal: Generate plausible
hypotheses about who
performed an event.
• Formal version: Given
some event E whose
perpetrator is unknown,
construct a small set of
hypotheses {Hp} about
the identity of the
perpetrator of E.
– Include explanations as to
why these are the likely
ones
– Able to explain on demand
why others are less likely.
Assumptions & Limitations
• Formal inputs. Structured
descriptions, including
relational information,
expressed in CycL.
• Accurate inputs.
• One-shot operation. No
incremental updates.
• Passive operation.
Doesn’t generate
differential diagnosis
information
Method 1: Closest Exemplar
MAC/FAC models similarity-based retrieval
• Scales to large memories
• Accounts for psychological phenomena
[Figure: MAC/FAC. The probe is compared against the memory pool with CVmatch (cheap, fast, non-structural), then SME. Output = memory item + SME results.]
• Memory pool = All cases concerning the 98 perpetrators, minus the test set.
1. Use MAC/FAC to retrieve events similar to E.
2. For each similar event, remove it if it doesn't include a candidate inference about the perpetrator.
3. Iterate until enough hypotheses are generated.
4. (Optional) Generate explanations and expectations by analyzing the similarities and differences between each Hp and E.
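The two-stage shape of Method 1 can be sketched as follows. Several things here are stand-ins and not the real algorithms: `structural_score` counts identical facts where SME would compute a structural mapping, the MAC stage keeps a fixed number of candidates, and the perpetrator is read straight from the retrieved case where the real method projects it as a candidate inference. The cases and group names are hypothetical.

```python
from collections import Counter


def content_vector(case):
    """Count how often each predicate occurs in a case (a list of tuples)."""
    return Counter(fact[0] for fact in case)


def mac_score(probe_cv, item_cv):
    """Dot product of two content vectors: cheap, fast, non-structural."""
    return sum(count * item_cv[pred] for pred, count in probe_cv.items())


def structural_score(probe, item):
    """Stand-in for SME: number of identical facts (not real structure mapping)."""
    return len(set(probe) & set(item))


def mac_fac(probe, memory, mac_width=3):
    """MAC: shortlist by content-vector score. FAC: pick the best structural match."""
    probe_cv = content_vector(probe)
    ranked = sorted(memory,
                    key=lambda name: mac_score(probe_cv, content_vector(memory[name])),
                    reverse=True)
    shortlist = ranked[:mac_width]
    return max(shortlist, key=lambda name: structural_score(probe, memory[name]))


def perpetrator_hypotheses(probe, memory, wanted=2):
    """Retrieve repeatedly, skipping cases that say nothing about a perpetrator."""
    pool, hypotheses = dict(memory), []
    while pool and len(hypotheses) < wanted:
        case = pool.pop(mac_fac(probe, pool))
        for predicate, value in case:
            if predicate == "perpetrator" and value not in hypotheses:
                hypotheses.append(value)
    return hypotheses[:wanted]


memory = {
    "event1": [("eventType", "bombing"), ("location", "cityA"), ("perpetrator", "groupX")],
    "event2": [("eventType", "kidnapping"), ("location", "cityB"), ("perpetrator", "groupY")],
    "event3": [("weather", "rain")],
}
probe = [("eventType", "bombing"), ("location", "cityA")]
print(perpetrator_hypotheses(probe, memory))
```

The cheap first stage is what lets retrieval scale to large memories: the expensive structural comparison runs only on the shortlist.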
Method 2: Closest Generalization
SEQL models generalization
• Assimilate new exemplars into a generalization when close enough.
• Models psychological data, used to make successful predictions of human behavior.
• Recent extension: use probability to improve noise immunity
[Figure: SEQL. A new example is compared via SME against generalizations and exemplars.]
• Preprocessing:
1. Partition case library according to perpetrator.
2. Use SEQL to construct generalizations for each perpetrator.
• Generating hypotheses:
1. Given an incident E, pick the n closest generalizations, as determined by SME's structural evaluation score.
Whodunit Experiment
• Used 3,379 terrorist
incidents from
Cycorp’s Terrorist
knowledge base
– Between 6 and 158
propositions per case,
20 on average
• 98 perpetrators
involved in at least 3
incidents in the TKB
– Pick one incident at
random for test set,
remove perpetrator
• Elaborate via
inference
– Add attributes (e.g.,
(CityInCountryFn
Italy)) using genls
hierarchy
• Three performance
levels:
– Best bet
– Top 3: Best plus
plausible alternatives
– Top Ten list: Foci for
additional collection,
analysis
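The three performance levels amount to top-k accuracy with k = 1, 3, and 10. A minimal sketch of that measure follows; the incidents, group names, and rankings are hypothetical and only illustrate the calculation.

```python
def top_k_accuracy(rankings, truth, k):
    """Fraction of test incidents whose true perpetrator is in the first k hypotheses."""
    hits = sum(1 for incident, ranked in rankings.items()
               if truth[incident] in ranked[:k])
    return hits / len(rankings)


# Hypothetical ranked hypotheses for four held-out incidents.
rankings = {
    "incident1": ["groupA", "groupB", "groupC"],
    "incident2": ["groupB", "groupA", "groupC"],
    "incident3": ["groupC", "groupB", "groupA"],
    "incident4": ["groupB", "groupC", "groupD"],
}
truth = {incident: "groupA" for incident in rankings}

for label, k in [("Best bet", 1), ("Top 3", 3), ("Top Ten", 10)]:
    print(f"{label}: {top_k_accuracy(rankings, truth, k):.0%}")
```

Accuracy can only grow with k, so the three levels trade precision for the chance of including the true perpetrator among the alternatives.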
Whodunit Example
Whodunit Results
[Figure: bar chart of correctness (0% to 60%) for MAC/FAC, SEQL, and SEQL+P, with bars for Correct, Top-3, and Top-10.]
• Pure retrieval surprisingly good
• Adding probability yielded 5% improvement
• Symbolic generalization adds value for weaker criteria
Background Material
Basketball Stats Estimation by Analogy
Given: An estimation problem
(seasonThreePointsPercent JasonKidd ?x) and
a case library
Find the most similar player to JasonKidd in the
case library for whom we know the value for
seasonThreePointsPercent.
Use that as an estimate for the given problem.
Compare accuracy over the initial case library, and
the case library enriched with representations from
CARVE.
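The estimation procedure above can be sketched as a nearest-case lookup. The similarity function below is a stand-in that compares shared numeric attributes, where the actual system uses analogical matching. The players, attribute names, and values are hypothetical.

```python
def attribute_similarity(target_facts, other_facts, skip):
    """Stand-in similarity: negated sum of differences over shared attributes."""
    shared = [key for key in target_facts if key in other_facts and key != skip]
    return -sum(abs(target_facts[key] - other_facts[key]) for key in shared)


def estimate_by_analogy(target, parameter, library, similarity=attribute_similarity):
    """Use the most similar case with a known value as the estimate."""
    candidates = {name: facts for name, facts in library.items()
                  if name != target and parameter in facts}
    if not candidates:
        return None
    best = max(candidates,
               key=lambda name: similarity(library[target], candidates[name], parameter))
    return candidates[best][parameter]


def percent_error(estimate, actual):
    return abs(estimate - actual) / abs(actual) * 100


# Hypothetical case library.
library = {
    "PlayerA": {"heightInches": 76, "assistsPerGame": 9.0, "threePointPercent": 0.40},
    "PlayerB": {"heightInches": 75, "assistsPerGame": 8.0, "threePointPercent": 0.38},
    "PlayerC": {"heightInches": 84, "assistsPerGame": 2.0, "threePointPercent": 0.10},
}
estimate = estimate_by_analogy("PlayerA", "threePointPercent", library)
error = percent_error(estimate, library["PlayerA"]["threePointPercent"])
print(estimate, round(error, 1))
```

Running the same loop over the raw library and over one enriched with CARVE's added facts gives the two mean % error figures being compared.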
SpaceCase
[Figure: SpaceCase. Components shown: sKEA input stimulus; ink processing routines; KB; evidence rules; Bayesian updating algorithm; spatial preposition label.]
Performance
• Labeling task (Feist &
Gentner, 2003)
– <figure> is in/on the
<ground>
• 36 total stimuli
– {firefly, coin}
– {bowl, dish, plate, slab,
rock, hand}
– {low, medium, high}
• Consistent on all 36
trials for values of
parameters given
Modeling a spatial language/memory interaction
• Feist and Gentner (2001)
– Use spatial preposition when showing someone a situation (“On”)
– Given novel stimulus, they are more likely to claim they have seen it before
• Use SpaceCase to confirm unsuitability of original stimuli for ON
– initial sketch: 0.363
– plus variant: 0.859
– minus variant: 0.2428
• Retrieval via MAC/FAC
– Initial sketch plus variants stored as memory
– Initial as probe retrieves itself
– Initial plus relation for spatial preposition retrieves plus variant
SpaceCase next steps
• Expand model
– more prepositions
– more complex input
• Cross-linguistic modeling
auf
an
