Status of the Component Library

Progress on Building the
Component Library
Bruce Porter, Peter Clark
Ken Barker, Art Souther, John Thompson
James Fan, Dan Tecuci, Peter Yeh
Charles Benton, Marwan Elrakabawy,
Cheyenne Kohnlein
November 1, 2000
The Purpose of the
Component Library
• To represent the set of common actions, states,
objects, and properties so that SMEs can build
KBs by simply instantiating and assembling
them.
• Representing actions has been our primary focus
for four months.
• Most team members have used a few prototype
components to build relatively simple scenarios.
Now we’re trying to properly build a more
comprehensive set of components.
Refresher…
Slides from kickoff meeting in
New Orleans
Representation of Bioremediation
[Diagram: the Bioremediation concept. Objects: Soil, Biotechnologist, Microbes, Oil, Fertilizer. Script steps: Get, Apply, Break Down, Absorb, linked by "then". Slot labels: environment, agent, remediator, patient, script, pollutant, product, absorbed, contains, rate, amount. Rate and Amount quantities are linked by qualitative influences (I-, Q+, Q-).]
An underlying abstraction...
[Diagram: the same Bioremediation graph, overlaid with a Conversion abstraction. Conversion labels: rawmaterials, product, Substance, Rate, Amount, rate, amount, with qualitative influences (Q+, Q-, I-).]
Another abstraction...
[Diagram: the same Bioremediation graph, overlaid with a Digest abstraction. Digest labels: eater, food, Agent, Substance, absorbed, and a script of Break Down then Absorb.]
Another abstraction...
[Diagram: the same Bioremediation graph, overlaid with a Treatment abstraction. Treatment labels: Agent, patient, substance, and a script of Get then Apply.]
The Space of Actions
• Based on various linguistic resources and an
analysis of 2 texts by Alberts, we’re working
toward this set of about 190 action components.
• We’ve built components for about half of them, as
shown here.
• Our coding rate has increased significantly, and
we’re now able to productively add more
personnel.
Schedule
• Through the end of 2000:
– Focus on action components, completing about 90% of
those currently planned.
– Start coding pump-priming knowledge, building basic
representations of about 200 objects and events.
• January through March 2001:
– Focus on exercising the component library by encoding
significant portions of Alberts. This work doubles as
essential pump-priming.
– Begin to represent generic objects, especially “role
concepts” (more on this later).
– Integrate the component library with core knowledge
developed by other team members (more on this later).
What’s in a Component?
• The specification gives the definition, slot
constraints, and links to standard linguistic
sources. Here’s an example.
• The KM code gives the axioms and an
explicit interface to the user. Here’s an
example. Note that the code includes only
local axioms; KM infers the rest. Here’s the
complete expansion.
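A minimal sketch of the "local axioms only" idea, assuming the rest is filled in by inheritance down the action hierarchy. This is Python, not KM, and the class names, hierarchy, and slot constraints (Action, Move, Carry, Entity, and so on) are hypothetical illustrations rather than the Library's actual contents.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Component:
    """A component that stores only its own (local) slot constraints."""
    name: str
    parent: Optional["Component"] = None
    local_axioms: Dict[str, str] = field(default_factory=dict)

    def expand(self) -> Dict[str, str]:
        """Return the complete expansion: inherited constraints,
        overridden by this component's local ones."""
        inherited = self.parent.expand() if self.parent else {}
        return {**inherited, **self.local_axioms}


# Hypothetical hierarchy and constraints, for illustration only.
action = Component("Action", None, {"agent": "Entity"})
move = Component("Move", action, {"object": "Tangible-Entity"})
carry = Component("Carry", move, {"agent": "Animate-Entity"})

expansion = carry.expand()
```

Here Carry states one constraint locally, and its expansion also picks up the object constraint declared on Move.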
Our Process for Building a Component
• form initial clusters of actions (e.g. transfer) based on an analysis
of Alberts, Roget’s clusters, Cyc, and other linguistic sources.
• write a specification for each action.
• search Alberts for all occurrences (including all morphological
variants) of each action, and make sure that the representation
will accommodate them. Here’s the result of analyzing the
actions in one chapter. These “coded examples” will be useful for
training SMEs.
• organize the actions taxonomically and pull out commonalities
that can be handled with various types of composition.*
• code the actions in KM along with simple test cases, commit
them to the CVS-managed library, and run all test cases daily.
Larger scenarios will provide the next level: integration testing.*
* These points will be elaborated below.
How to access the
Component Library
• Click here to visit the component library.
• It’s updated every day unless some test case
fails.
• We’ll add a feature to download the entire
library via FTP.
The Dictionary of Slots
• We want a simple, small, and slowly growing set of slots.
Ours currently has 78 slots (53 relations and 25 properties)
and is inspired by well-studied sets of semantic roles from
linguistics (surveyed in Ken’s dissertation).
• Slots should apply intuitively to knowledge expressed
informally. We have early evidence based on 3 large
experiments.
• The semantics of the slots must be axiomatized. Here
are some examples.
• Slots must make the distinctions necessary for
inferencing (at least to the fidelity of the KR language).
• The slot language must continue to evolve.
Non-taxonomic composition:
Clichés
• a cliché is a small pattern of axioms that recurs throughout the
hierarchy. For example:
• Reflexive:
requiredslot: agent, object
agent=object
• Reciprocal:
requiredslot: agent, object
agent is object of an instance of this action
having this object as agent
• Undo(A):
precondition: object is the object of the
resulting-state of action A
postcondition: object is no longer the object of the
resulting-state of action A
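The three clichés above can be sketched as small checks over action instances. This is an illustrative Python rendering, not the KM axioms themselves: the Action record, the representation of the world as a set of (state, object) facts, and the example names (Wash, Greet, Be-Closed) are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Action:
    kind: str    # the action type, e.g. "Greet" (hypothetical)
    agent: str   # required slot
    object: str  # required slot


def is_reflexive(a: Action) -> bool:
    """Reflexive: agent = object."""
    return a.agent == a.object


def is_reciprocal(a: Action, others: Iterable[Action]) -> bool:
    """Reciprocal: the agent is the object of another instance of the
    same action that has this action's object as its agent."""
    return any(
        b.kind == a.kind and b.agent == a.object and b.object == a.agent
        for b in others
    )


Fact = Tuple[str, str]  # (resulting state of action A, object)


def undo(resulting_state: str, obj: str, world: FrozenSet[Fact]) -> FrozenSet[Fact]:
    """Undo(A): requires that obj is in A's resulting state, and
    returns a world in which it no longer is."""
    fact = (resulting_state, obj)
    if fact not in world:
        raise ValueError("precondition failed: object is not in the resulting state")
    return world - {fact}


wash = Action("Wash", "Pat", "Pat")
greet_1 = Action("Greet", "Pat", "Lee")
greet_2 = Action("Greet", "Lee", "Pat")
after_undo = undo("Be-Closed", "door", frozenset({("Be-Closed", "door")}))
```

Each cliché is independent of where an action sits in the taxonomy, which is what lets the same pattern recur throughout the hierarchy.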
Non-taxonomic composition:
Utility Concepts
• concepts that have natural homes within the
hierarchy, but also form a part of the
semantics of concepts across the hierarchy
• Copy:
– reasonable as a standalone concept
– also part of Transcribe, Forge, Encode,
Reproduce, etc.
Non-taxonomic composition:
model-as
• Many concepts in the KB are “role concepts”
– e.g., container, nutrient
– are generic
– are highly reusable (can be applied in many concepts)
• “If the DNA containing the 5S rRNA genes is …”
• “many DNA sequences produce two or more distinct proteins”
• “The DNA guides the synthesis of specific RNA molecules…”
• “The DNA is enclosed in …”
• “The idea that DNA transfers information…”
• By separating the “model” (e.g. container) and its
application (e.g. to DNA), we can apply & reuse the
same model in many ways.
Applying models
• Traditional: “Hard-wire” models to the modeled things
Cell
generalizations: Container Consumer …?
• Better: Define machine-selectable “views”
Cell
model-as: Container (wall = membrane, ..)
Consumer (consumes = organic molecules, ..)
Vehicle (transported = DNA, …)
….
• Control when and how components apply
• Allows generic components to be used multiple ways
(more reuse) - difficult in the traditional approach!
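A rough sketch of the "views" idea, using the Cell example from this slide. It is Python rather than KM, and the View and Concept classes and the view() lookup are hypothetical names chosen for illustration. Only the role bindings shown on the slide are filled in; the elided ones are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class View:
    """One way of modeling a concept: a generic model plus role bindings."""
    model: str                # e.g. "Container"
    bindings: Dict[str, str]  # model role -> filler in the modeled concept


@dataclass
class Concept:
    name: str
    model_as: List[View] = field(default_factory=list)

    def view(self, model: str) -> Optional[View]:
        """Select the view for a given model, or None if the concept
        is not modeled that way."""
        for v in self.model_as:
            if v.model == model:
                return v
        return None


cell = Concept(
    "Cell",
    model_as=[
        View("Container", {"wall": "membrane"}),
        View("Consumer", {"consumes": "organic molecules"}),
        View("Vehicle", {"transported": "DNA"}),
    ],
)

container_view = cell.view("Container")
```

Because Cell is not declared a subclass of Container, the container model applies only when that view is selected, and the same generic Container component can be bound differently in another concept.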
How others can contribute to the
Component Library
• Because the Library is only 4 months old
and we’ve focused on particular types of
knowledge, much remains to be done. We
have several suggestions for how it might
be usefully expanded.
How SMEs might index the
Component Library
• SMEs will undoubtedly adjust to our tools
somewhat, but they start with English. We
should index the Library by English terms.
• Here’s a simple way to do that ... (next slide)
Mapping from Verbs to Actions
SME: I would like to use transport.
Shaken: Which of these senses of transport would you like?
- v. send from one person or place to another (see: Transfer)
- v. move while supporting … (see: Carry)
- v. hold spellbound
- v. transport commercially
- v. move something or somebody around (see: Move)
- n. the commercial enterprise of transporting goods and materials
- n. something that serves as a means of transportation (see: Transport-Device)
- n. a mechanism to transport magnetic tape over the head …
- n. an exchange of molecules across a membrane (see: Molecular-Transport)
- n. a state of being carried away by overwhelming emotion
We get “for free” the mapping from transport to:
Transfer, Carry, Move, Transport-Device, and Molecular-Transport
by linking our components to synsets in Wordnet.
The red components are currently in the Library; the blue components are planned.
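The lookup in this dialogue can be sketched as a table from word senses to components. The sketch below uses a hypothetical in-memory table in place of WordNet, with the glosses and links copied from the slide (truncated glosses are left truncated). The names SENSES, menu, and components_for are assumptions for illustration, not part of Shaken.

```python
from typing import Dict, List, Optional, Tuple

# (part of speech, gloss, linked component or None)
Sense = Tuple[str, str, Optional[str]]

# Hypothetical stand-in for WordNet synsets linked to Library components.
SENSES: Dict[str, List[Sense]] = {
    "transport": [
        ("v", "send from one person or place to another", "Transfer"),
        ("v", "move while supporting …", "Carry"),
        ("v", "hold spellbound", None),
        ("v", "transport commercially", None),
        ("v", "move something or somebody around", "Move"),
        ("n", "the commercial enterprise of transporting goods and materials", None),
        ("n", "something that serves as a means of transportation", "Transport-Device"),
        ("n", "a mechanism to transport magnetic tape over the head …", None),
        ("n", "an exchange of molecules across a membrane", "Molecular-Transport"),
        ("n", "a state of being carried away by overwhelming emotion", None),
    ],
}


def menu(word: str) -> List[str]:
    """Build the list of senses shown to the SME."""
    lines = []
    for pos, gloss, component in SENSES.get(word, []):
        suffix = f" (see: {component})" if component else ""
        lines.append(f"- {pos}. {gloss}{suffix}")
    return lines


def components_for(word: str) -> List[str]:
    """Return the components reachable from any sense of the word."""
    return [c for _, _, c in SENSES.get(word, []) if c is not None]


linked = components_for("transport")
```

Senses with no linked component still appear in the menu, so the SME can see that the word has meanings the Library does not cover.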
Other types of Knowledge
we’re Encoding
• Properties usually surface as adjectives. We
have a framework for representing them, and a
plan for populating the KB.
• Pump-priming knowledge. We have proposed a
scenario for Jan’01 and started to represent
knowledge of biological objects. We start with
taxonomies and partonomies (like those SMEs build),
then convert them automatically to KM.
Coordinating our efforts on
developing Core Knowledge
• The Core Knowledge Workshop in Austin next month
• Proposed agenda:
– Address representation challenges: continuous processes,
modes of existence, time, space, causality, modals and
counterfactuals, …
– Develop a detailed plan for integrating other core theories,
such as ‘Everyday Semantics’
– Design the Core Knowledge for Shaken 1.0
• Schedule:
– Duration: we suggest 3 days
– Dates: we suggest mid-December