
March 11, 2004

Knowledge Representation Chapter 10


Outline

• KR Introduction
• Ontological Engineering
• Categories and Objects
• Actions, Situations, and Events
• Mental Events and Mental Objects
• Reasoning Systems for Categories
• Reasoning with Default Information
• Truth Maintenance Systems
• Bio-Ontologies

KR Introduction

• General problem in Computer Science
• Solutions = data structures
  – words
  – arrays
  – records
  – lists
• More specific problem in AI
• Solutions = knowledge structures
  – lists
  – trees
  – procedural representations
  – logic and predicate calculus
  – rules
  – semantic nets and frames
  – scripts

Kinds of Knowledge

Things we need to talk about and reason about; what do we know?

• Objects
  – Descriptions
  – Classifications
• Events
  – Time sequence
  – Cause and effect
• Relationships
  – Among objects
  – Between objects and events
• Meta-knowledge

Distinguish between knowledge and its representation

Representation Mappings

[Figure: representation mappings among Facts, Internal Representation, English Representation, and Reasoning Programs]

• Knowledge Level
• Symbol Level
• Mappings are not one-to-one
• Never get it complete or exactly right

Ontological Engineering

• Like knowledge engineering, but applied to general-purpose knowledge bases
• Ultimate goal is to represent everything in the world!
• Result is an upper ontology

[Figure: upper ontology tree rooted at Anything. Node labels: Sets, Numbers, Categories, AbstractObjects, RepresentationalObjects, Sentences, Measurements, Times, Weights, Interval, GeneralizedEvents, Places, PhysicalObjects, Processes, Moments, Things, Stuff, Animals, Agents, Humans, Solid, Liquid, Gas]

Special- and General-purpose Ontologies

• Special-purpose ontology:
  – Designed to represent a specific domain of knowledge:
    • genetics (GO)
    • immune system (IMGT)
    • mathematics (Tom Gruber)
• General-purpose ontology:
  – Should be applicable in any special-purpose domain
  – Unifies different domains of knowledge
• Upper ontology provides the highest-level framework; all other concepts follow from it

Cyc Upper Ontology

• Cycorp released 3,000 upper-level concepts into the public domain
• Cyc Upper Ontology satisfies two important criteria:
  – It is universal: every concept can be linked to it
  – It is articulate: distinctions are necessary and sufficient for most purposes

Categories - Representation

• Two choices for representation:
  – Predicate
    • Basketball(b)
  – Object
    • Basketballs
    • Member(b, Basketballs)
    • Subset(Basketballs, Balls)

Categories - Organizing

• Inheritance:
  – All instances of the category Food are edible
  – Fruit is a subclass of Food
  – Apples is a subclass of Fruit
  – Therefore, Apples are edible
• The class/subclass relationships among Food, Fruit, and Apples form a taxonomy
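The inheritance argument above can be sketched in a few lines of Python. This is only an illustration: the dictionaries `SUBCLASS_OF` and `PROPERTIES` and the function name are hypothetical, and only Food, Fruit, Apples, and "edible" come from the slide.

```python
# Hypothetical encoding of the taxonomy: child category -> parent category.
SUBCLASS_OF = {"Fruit": "Food", "Apples": "Fruit"}
# Properties asserted directly on a category.
PROPERTIES = {"Food": {"edible"}}

def inherited_properties(category):
    """Collect properties from the category and from every ancestor."""
    props = set()
    while category is not None:
        props |= PROPERTIES.get(category, set())
        category = SUBCLASS_OF.get(category)  # None once the root is passed
    return props

print(inherited_properties("Apples"))
```

Walking up the subclass chain is what lets Apples pick up "edible" without anyone asserting it on Apples directly.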

Categories - Partitioning

• Disjoint: The categories have no members in common
• Exhaustive Decomposition: Every member of the category is included in at least one of the subcategories
• Partition: Disjoint exhaustive decomposition


Categories - Partitioning

Disjoint({Animals, Vegetables})
Disjoint(s) ⇔ (∀ c1,c2  c1 ∈ s ∧ c2 ∈ s ∧ c1 ≠ c2 ⇒ Intersection(c1,c2) = {})

ExhaustiveDecomposition({Americans, Canadians, Mexicans}, NorthAmericans)
ExhaustiveDecomposition(s,c) ⇔ (∀ i  i ∈ c ⇔ ∃ c2  c2 ∈ s ∧ i ∈ c2)

Partition({Males, Females}, Animals)
Partition(s,c) ⇔ Disjoint(s) ∧ ExhaustiveDecomposition(s,c)
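As a rough check of these three definitions, categories can be modeled as Python sets of individuals. The individuals and variable names below are made up for illustration; only the three predicates follow the slide.

```python
from itertools import combinations

def disjoint(s):
    """Disjoint(s): no two distinct categories in s share a member."""
    categories = {frozenset(c) for c in s}  # keep distinct categories only (c1 != c2)
    return all(not (c1 & c2) for c1, c2 in combinations(categories, 2))

def exhaustive_decomposition(s, c):
    """i is in c exactly when i is in some category of s."""
    return set().union(*map(set, s)) == set(c)

def partition(s, c):
    """Partition(s, c): disjoint and an exhaustive decomposition."""
    return disjoint(s) and exhaustive_decomposition(s, c)

# Hypothetical sample data.
males, females = {"bull", "rooster"}, {"cow", "hen"}
animals = males | females
americans, canadians, mexicans = {"Ann", "Bo"}, {"Bo", "Cy"}, {"Di"}
north_americans = {"Ann", "Bo", "Cy", "Di"}

print(partition([males, females], animals))
print(exhaustive_decomposition([americans, canadians, mexicans], north_americans))
print(disjoint([americans, canadians, mexicans]))
```

The North American example is exhaustive but not disjoint here because "Bo" is modeled as a dual citizen, so it is not a partition.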

Categories - More

• PartOf
PartOf(Bucharest,Romania)
PartOf(Romania,EasternEurope)
PartOf(EasternEurope,Europe)
PartOf(Europe,Earth)

• Composite Objects
Biped(a) ⇒ ∃ c1,c2,b  Leg(c1) ∧ Leg(c2) ∧ Body(b) ∧ PartOf(c1,a) ∧ PartOf(c2,a) ∧ PartOf(b,a) ∧ Attached(c1,b) ∧ Attached(c2,b) ∧ c1 ≠ c2 ∧ [∀ c3  Leg(c3) ∧ PartOf(c3,a) ⇒ (c3=c1 ∨ c3=c2)]

Categories – And More

• Count Nouns and Mass Nouns
  – How many aardvarks? How many butters!?!
  x ∈ Butter ∧ PartOf(y,x) ⇒ y ∈ Butter
• Intrinsic and Extrinsic Properties
  – Intrinsic properties belong to the very substance of the object; e.g. flavor, color, density, boiling point, etc.
  – Extrinsic properties change if the object is changed (cut in half); e.g. weight, length, shape, etc.

Actions, Situations and Events


Situation Calculus

• The states resulting from executing actions
• Ontology:
  – Situations: logical terms describing the initial situation and all situations that result from executing an action in a given situation, Result(a,s)
  – Fluents: functions and predicates that may differ from one situation to the next; Age(Wumpus,S0) is the Wumpus's age in situation S0
  – Atemporal or eternal: functions and predicates that are constant across all situations; Gold(G1)

Situation Calculus – Actions

• Each action is described by two axioms:
  – Possibility Axiom: Preconditions ⇒ Poss(a,s)
  – Effect Axiom: Poss(a,s) ⇒ changes that result from taking the action

Situation Calculus - Example

Possibility Axioms:
At(Agent,x,s) ∧ Adjacent(x,y) ⇒ Poss(Go(x,y),s).
Gold(g) ∧ At(Agent,x,s) ∧ At(g,x,s) ⇒ Poss(Grab(g),s).
Holding(g,s) ⇒ Poss(Release(g),s).

Effect Axioms:
Poss(Go(x,y),s) ⇒ At(Agent,y,Result(Go(x,y),s)).
Poss(Grab(g),s) ⇒ Holding(g,Result(Grab(g),s)).
Poss(Release(g),s) ⇒ ¬Holding(g,Result(Release(g),s)).

Go for the Gold!

GOAL: Bring the gold from [1,2] to [1,1]
At(Agent,[1,1],S0) ∧ At(G1,[1,2],S0).
¬Holding(G1,S0).
Gold(G1).
Adjacent([1,1],[1,2]) ∧ Adjacent([1,2],[1,1]).

Do It: Go([1,1],[1,2]).
Result: At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the gold?
Grab(G1).


The Frame Problem

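Result: At(Agent,[1,2],Result(Go([1,1],[1,2]),S0)).

Now, can I grab the Gold?
Grab(G1).

What in the knowledge base allows me to go from my Result (above) to Grab(G1)?

Nothing. The effect axioms say what an action changes, but no axiom says that the gold is still at [1,2] after the Go action.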

The Frame Problem

• How do we represent all the things in the world that stay the same?
  – Represent all things at all situations: the representational frame problem
  – Project the results of a sequence of actions: the inferential frame problem

Representational Frame Problem

• Successor-State Axiom:
  Action is possible ⇒ (Fluent is true in result state ⇔ Action’s effect made it true ∨ It was true before and action left it alone).
• Truth value of each fluent in the next state depends on the action and on the truth value in the current state
  Poss(a,s) ⇒ (At(Agent,y,Result(a,s)) ⇔ a = Go(x,y) ∨ (At(Agent,y,s) ∧ a ≠ Go(y,z))).
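The successor-state axiom for the agent's location can be tried out with a small Python sketch. It assumes Poss(a,s) already holds and encodes actions as tuples; the tuple encoding and function names are hypothetical, not part of the slides.

```python
def go(x, y):
    """Hypothetical encoding of the action Go(x, y)."""
    return ("Go", x, y)

def agent_at_after(y, action, agent_location):
    """At(Agent, y, Result(a, s)), assuming Poss(a, s) holds.

    True if the action moved the agent to y, or the agent was already
    at y and the action was not a Go away from y.
    """
    made_true = action[0] == "Go" and action[2] == y          # a = Go(x, y)
    was_true = agent_location == y                             # At(Agent, y, s)
    left_alone = not (action[0] == "Go" and action[1] == y)    # a != Go(y, z)
    return made_true or (was_true and left_alone)

start = (1, 1)
move = go((1, 1), (1, 2))
print(agent_at_after((1, 2), move, start))               # location after Go
print(agent_at_after((1, 2), ("Grab", "G1"), (1, 2)))    # Grab leaves location alone
```

The second call is the point of the axiom: a Grab action says nothing about location, yet the agent is still known to be at [1,2] afterwards.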

Time and Event Calculus

• Event Calculus is based on points in time
• Fluents hold at points in time, as opposed to holding in situations

“A fluent is true at a point in time if the fluent was initiated by an event at some time in the past and was not terminated by an intervening event.”
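The quoted definition can be read almost literally as code. The sketch below assumes events are given as (time, kind, fluent) triples and that "intervening" means strictly between the initiating event and the query time; both the encoding and the function name are illustrative assumptions.

```python
def holds_at(fluent, t2, events):
    """True if some event initiated the fluent before t2 and no event
    terminated it strictly between that time and t2."""
    starts = [t for t, kind, f in events
              if f == fluent and kind == "initiates" and t < t2]
    stops = [t for t, kind, f in events
             if f == fluent and kind == "terminates"]
    return any(not any(t < t1 < t2 for t1 in stops) for t in starts)

# Hypothetical timeline: the gold is grabbed at time 1 and released at time 5.
events = [(1, "initiates", "Holding(G1)"), (5, "terminates", "Holding(G1)")]
print(holds_at("Holding(G1)", 3, events))
print(holds_at("Holding(G1)", 7, events))
```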

Event Calculus

• Initiates(e,f,t) and Terminates(w,f,t)
• Event Calculus Axiom: T(f,t2) ⇔ ∃ e,t Happens(e,t) ∧ (t < t2) ∧ …

Event Calculus - more

• Can be extended to handle:
  – indirect effects
  – continuous change
  – nondeterministic effects
  – causal constraints
  – . . .

Generalized Events

• Combines aspects of space and time calculus
• Allows representation of events occurring in a space-time continuum

World War II is an event that happened in various geographic locations during a specific period of time within the 20th century.

Processes

• Discrete Events: the event is a whole, and a part of the event is no longer the same event
• Processes can include subintervals; a part of a plane flight is still a member of the Flying class (aka liquid events)
• Stated more precisely: “Any subinterval of a process is also a member of the same process category.”

Intervals

• Moment: has temporal duration of zero
• Extended Interval: has temporal duration greater than zero

Partition({Moments,ExtendedIntervals},Intervals)
Member(i,Moments) ⇔ Duration(i) = Seconds(0).

Intervals Ontology

Meet(i,j) ⇔ Time(End(i)) = Time(Start(j)).
Before(i,j) ⇔ Time(End(i)) < Time(Start(j)).
After(j,i) ⇔ Before(i,j).
During(i,j) ⇔ Time(Start(j)) ≤ Time(Start(i)) ∧ Time(End(i)) ≤ Time(End(j)).
Overlap(i,j) ⇔ ∃ k During(k,i) ∧ During(k,j).
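These interval relations translate directly into comparisons on start and end times. The sketch below assumes intervals are (start, end) pairs of numbers with start ≤ end; the `Interval` type and the sample values are illustrative.

```python
from collections import namedtuple

Interval = namedtuple("Interval", ["start", "end"])

def meet(i, j):
    return i.end == j.start

def before(i, j):
    return i.end < j.start

def after(j, i):
    return before(i, j)

def during(i, j):
    return j.start <= i.start and i.end <= j.end

def overlap(i, j):
    # Some interval k lies within both i and j exactly when the latest
    # start is no later than the earliest end (k may be a single moment).
    return max(i.start, j.start) <= min(i.end, j.end)

a, b, c, d = Interval(0, 5), Interval(5, 9), Interval(1, 3), Interval(6, 8)
print(meet(a, b), before(a, d), after(d, a), during(c, a), overlap(a, d))
```

Note that under this reading two intervals that meet also overlap, because the shared endpoint is a moment contained in both.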

Mental Events and Mental Objects

• Knowledge about beliefs, specifically about those beliefs held by an agent
  – “Which agent knows about the geography of Maine?”
• Gives an agent the ability to reason about the beliefs of agents
• However, need to define propositional attitudes, such as Believes, Knows and Wants, as relations where the second argument is referentially opaque (no substitution of equal terms)

Reasoning Systems for Categories

• Categories are KR building blocks
• Two primary systems for reasoning:
  – Semantic Networks
    • Graphical aids for visualizing knowledge
    • Mechanisms for inferring properties of objects based on category membership
  – Description Logics
    • Formal language for constructing and combining category definitions
    • Algorithms for classifying objects and determining subsumption relationships

Semantic Networks

• Graphical notation with underlying logical representation
• A form of logic, but not FOL
• Capable of representing objects, relations, quantification, …
• Convenient representation of inheritance
• Multiple inheritance (sometimes)
• Inverse links
• Extendable using procedural attachments

Semantic Networks - More

• Can only express binary relationships, making it more difficult to express n-ary predicates; e.g. Fly(Shankar,NewYork,NewDelhi,Monday)
• Negation, disjunction, nested function symbols, and existential quantification are missing
• Some SNs include procedural attachments
• Represents default values: assertions may be overridden by more specific values

Semantic Networks

[Figure: semantic network. Nodes: Mammals, Persons, Females, Males, Mary, John. Links: SubsetOf, HasMother, SisterOf, Legs (values 2 and 1)]

Description Logics

• Notations to make it easier to describe definitions and properties of categories
• Taxonomic structure is the organizing principle
• Subsumption: determine if one category is a subset of another
• Classification: determine the category to which an object belongs
• Consistency: determine if membership criteria are logically satisfiable

Description Logics

• CLASSIC was one of the first languages (Borgida et al., 1989)
• “All bachelors are unmarried adult males.”
  – DL: Bachelor = And(Unmarried,Adult,Male).
  – FOL: Bachelor(x) ⇔ Unmarried(x) ∧ Adult(x) ∧ Male(x)
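For definitions that are plain conjunctions of primitive concepts, such as the Bachelor example, subsumption and classification reduce to subset tests. The sketch below covers only that restricted case, which is far less than CLASSIC offers; the `DEFINITIONS` table and function names are hypothetical.

```python
# Defined concepts as conjunctions (And) of other concepts.
DEFINITIONS = {"Bachelor": {"Unmarried", "Adult", "Male"}}

def expand(concept):
    """Expand a concept into its set of primitive conjuncts."""
    if concept in DEFINITIONS:
        return set().union(*(expand(c) for c in DEFINITIONS[concept]))
    return {concept}

def subsumes(general, specific):
    """general subsumes specific if every conjunct of general is required by specific."""
    return expand(general) <= expand(specific)

def classify(features):
    """Defined concepts whose membership criteria an individual's features satisfy."""
    return [name for name in DEFINITIONS if expand(name) <= set(features)]

print(subsumes("Male", "Bachelor"))
print(subsumes("Bachelor", "Male"))
print(classify({"Unmarried", "Adult", "Male", "Professor"}))
```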

Description Logics

• What does this DL statement say?

And(Man, AtLeast(3,Son), AtMost(2,Daughter),
    All(Son, And(Unemployed, Married, All(Spouse,Doctor))),
    All(Daughter, And(Professor, Fills(Department,Physics,Math)))).

Description Logics - More

• Emphasis on tractability of inference
• Inference proceeds by:
  – Describing the problem instance
  – Asserting the instance into the KB, where the subsumption apparatus handles it
• In FOL, the time to a solution cannot be predicted
• DLs are designed so that subsumption testing runs in time polynomial in the size of the descriptions
• DLs usually lack disjunction and negation (for time/speed considerations)

Current Description Logic

• DAML+OIL
  – DARPA Agent Markup Language + Ontology Inference Layer (OIL)
  – Comes out of a DARPA initiative
  – OIL from University of Manchester
  – http://www.w3.org/TR/daml+oil-reference
• OWL
  – Web Ontology Language
  – A language for the semantic web
  – “Next generation” DAML+OIL
  – Flavors: OWL Lite, OWL DL and OWL Full
  – W3C recommendation as of Feb 10, 2004
  – http://www.w3.org/TR/2004/REC-owl-features-20040210/

Reasoning with Default Information

• Open and Closed worlds
  – Open World: information provided is not assumed to be complete, therefore inferences may result in sentences whose truth value is unknown
  – Closed World: information provided is assumed complete, therefore ground sentences not asserted to be true are assumed false
  – Negation as Failure: a negative literal, not P, can be “proved” true if the proof of P fails
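The difference between the three notions shows up clearly on a toy knowledge base of ground facts. In this sketch a "proof" is nothing more than a lookup in a set, which is a deliberate simplification; the facts and function names are invented for illustration.

```python
# Hypothetical KB of ground facts.
KB = {"Bird(Tweety)", "Bird(Opus)", "Penguin(Opus)"}

def open_world(query):
    """True if asserted; otherwise the truth value is unknown (None)."""
    return True if query in KB else None

def closed_world(query):
    """Ground sentences not asserted to be true are assumed false."""
    return query in KB

def negation_as_failure(query):
    """'not query' succeeds exactly when the attempt to prove query fails."""
    return not closed_world(query)

print(open_world("Penguin(Tweety)"))
print(closed_world("Penguin(Tweety)"))
print(negation_as_failure("Penguin(Tweety)"))
```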

Nonmonotonic Logics: Circumscription

• A version of the closed-world assumption
• Specify predicates that are assumed false for every object except those for which they are known to be true
• Default rule stating that birds fly: Bird(x) ∧ ¬Abnormal(x) ⇒ Flies(x)
• Abnormal() is circumscribed: the reasoner assumes ¬Abnormal() unless Abnormal() is known to be true
• Circumscription is a model preference logic, built on the notion of preferred models of the KB

Nonmonotonic Logics: Default Logic

• Default rules express contingencies: Bird(x) : Flies(x) / Flies(x)
• If Bird(x) is true, and Flies(x) is consistent with the KB, then conclude Flies(x) (by default)
• Default rule form is: P : J1, …, Jn / C
• P = Prerequisite; J = Justifications; C = Conclusion
• If any J can be proven false, then C cannot be concluded
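A single application of a default rule can be sketched as follows. The consistency check is approximated by looking for the explicit negation of each justification in the KB, which is much weaker than a real consistency test; the string encoding with a leading "~" for negation is an assumption made for this example.

```python
def negate(literal):
    """Negation on string literals: '~P' <-> 'P' (illustrative encoding)."""
    return literal[1:] if literal.startswith("~") else "~" + literal

def apply_default(kb, prerequisite, justifications, conclusion):
    """Apply the default rule P : J1, ..., Jn / C once.

    Adds C when P is in the KB and no justification is explicitly
    contradicted; otherwise returns the KB unchanged.
    """
    if prerequisite in kb and all(negate(j) not in kb for j in justifications):
        return kb | {conclusion}
    return kb

tweety = apply_default({"Bird(Tweety)"}, "Bird(Tweety)", ["Flies(Tweety)"], "Flies(Tweety)")
opus = apply_default({"Bird(Opus)", "~Flies(Opus)"}, "Bird(Opus)", ["Flies(Opus)"], "Flies(Opus)")
print(tweety)
print(opus)
```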


Truth Maintenance Systems

• Designed to handle Belief Revision:
  Let’s say our KB contains sentence P
  But P is found to be incorrect/untrue
  So, we want to say Tell(KB, ¬P)
  First, though, Retract(KB,P) to avoid P ∧ ¬P
  What if P ⇒ Q? What happens to Q?
  Retract Q?
  But what if we also have R ⇒ Q?
  Therefore…

Truth Maintenance Systems

• “Rollback” mechanism: doesn’t scale up
• Justification-based Truth Maintenance System (JTMS)
  – Includes in the KB, for each sentence, the set of sentences from which it was inferred
  – Sentences are in or out, based on the truth value of their supporting sentences
• Assumption-based Truth Maintenance System (ATMS)
  – Maintains, for each sentence, sets of supporting assumptions, representing all states
  – A sentence holds in just those cases where all assumptions in one of its assumption sets hold


Justification-based TMS

• Each sentence in the KB is annotated with its justification: the set of sentences that made it true
  Q, inferred from P and P ⇒ Q, has justification {P, P ⇒ Q}
• What if Q has the following justifications, and we Retract(P)?
  {P, P ⇒ Q}
  {P, P ∨ R ⇒ Q}
  {R, P ∨ R ⇒ Q}
• Sentences that make up justifications are marked in or out (not removed from the KB), for efficiency
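The retraction question on this slide can be explored with a small sketch. It assumes the usual JTMS rule that retracting a premise marks out exactly those sentences for which the premise appears in every justification; sentences are plain strings and the function name is hypothetical.

```python
def marked_out_by_retract(justifications, premise):
    """Return the sentences that go 'out' when premise is retracted.

    justifications maps each derived sentence to a list of justification
    sets. A sentence survives if at least one justification avoids premise.
    """
    return {sentence for sentence, js in justifications.items()
            if js and all(premise in j for j in js)}

two = {"Q": [{"P", "P => Q"}, {"P", "P v R => Q"}]}
three = {"Q": two["Q"] + [{"R", "P v R => Q"}]}

print(marked_out_by_retract(two, "P"))
print(marked_out_by_retract(three, "P"))
```

With only the first two justifications Q goes out, because both depend on P; adding the third justification, which rests on R instead, spares Q.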

Assumption-based TMS

• Designed to make Belief Revision efficient
• Represents all states at the same time
• Each sentence in the KB has a set of assumption sets
• For each sentence in the KB, the sentence holds when all assumptions in one of its assumption sets hold
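The ATMS condition in the last bullet amounts to a subset test over labels. In the sketch below each sentence carries a list of assumption sets; the sentence names and assumptions are invented for illustration.

```python
def holds(assumption_sets, current_assumptions):
    """A sentence holds when every assumption in at least one of its
    assumption sets is among the currently adopted assumptions."""
    return any(a <= current_assumptions for a in assumption_sets)

# Hypothetical label: Q follows from {A1, A2} or from {A3} alone.
label_of_q = [frozenset({"A1", "A2"}), frozenset({"A3"})]

print(holds(label_of_q, {"A1", "A2"}))
print(holds(label_of_q, {"A1"}))
print(holds(label_of_q, {"A3", "A4"}))
```

Because the labels are stored for every sentence, switching to a different set of assumptions needs no retraction; the same test is simply re-evaluated.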

Ontologies in Practice: The BioOntologies Consortium


Outline

• Motivation
  – The problem
  – The solution
• Exchange Languages Evaluation
  – Initial Evaluation
  – Second-level Evaluation
  – Conclusions/Recommendations
• Future Work

Motivation

• Explosive and uncontrolled growth of bioinformation
• It is increasingly important in the life sciences to integrate information across scientific disciplines and business areas
  – Terminology in the domain of molecular biology is inconsistent, so information searches can be incomplete and inaccurate
  – Definitions and descriptions of life sciences objects differ among data sources, so significant time and effort is required to integrate those data sources

What is DNA Topoisomerase ?

UMLS says it’s ==> EC 5.99.1.2

DNA Nicking-Closing Protein
DNA Relaxing Enzyme
DNA Relaxing Protein
DNA Topoisomerase
DNA Topoisomerase I
DNA Type 1 Topoisomerase
DNA Untwisting Enzyme
DNA Untwisting Protein
Omega Protein
Topoisomerase I
Type I DNA Topoisomerase
Nicking-closing enzyme
Relaxing enzyme
Untwisting enzyme
w-Protein
Swivelase

Motivation - Shared Ontologies

• Ontologies in the life sciences currently exist, but not in a coordinated/shared manner
• Shared ontologies provide benefits:
  – sharing the work
  – database integration
  – exchange of biological data
  – developing shared understandings
  – differences can provide focus on interesting problems

The Solution: Ontologies

“An ontology is a specification of a conceptualization.”

“An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents. ... A common ontology defines the vocabulary with which queries and assertions are exchanged among agents.”

T.R. Gruber (1993)

Goals of Ontologies

• Provide standardized vocabularies for text mining and information retrieval
• Formalized ontologies are expressed in a common language (or a small number of languages), facilitating representation and exchange of ontological knowledge
• Building common ontologies will establish shared understandings within the community
⇒ So, create a consortium as a forum to develop these ontologies

Bio-Ontologies Consortium Goals

• Enable interoperability/exchange of life sciences information
• Establish a consortium for promoting and sharing open-source ontologies in the Life Sciences
• Establish user community for sharing experiences with designing and building ontologies for the Life Sciences
• Develop synergies with the Knowledge Management community to target tools/languages to life sciences ontologies
• Create a permanent portal for the exchange of ontologies and ontology building tools

Bio-Ontologies Consortium Activity

• Enable interoperability/exchange of life sciences information
  – Successful exchange depends on:
    • Common, shared definitions
    • Common language to describe definitions
  – Therefore, select a language, or a small set of languages, for the exchange of life sciences ontologies

Select Candidate Languages (1)

• Ontolingua
  – Long-standing effort in KR community
  – Based on work for common interchange language
• CycL
  – Significant effort in KR community
  – Largest commercial vendor of ontological tools
• OML/CKML
  – XML-based language
  – New language, so possible to influence development
• OPM
  – OO model to describe single- and multi-DB schemas
  – Tool used in bioinformatics community

Select Candidate Languages (2)

• XML and XML/RDF
  – Web-based language
  – Significant work going on to extend expressivity
• UML
  – Widely used modeling tool in commercial marketplace
  – Based on OO concepts (supported by industry)
• OKBC
  – API for accessing distributed Knowledge Bases
  – Current work by KR community
• ASN.1
  – Early representation language for Bioinformatics
• ODL
  – De facto standard for OO databases

Evaluation Criteria (1)

• Language Support and Standardization
  – Does the language have a formal specification?
  – What support (documentation, tutorials, tech support, …) is available?
  – Does the language implement a standard? If so, who controls this standard?
• Data model/capabilities
  – How rich is the expressiveness of the language, i.e., does the language support negation, conjunction, disjunction, relations, ...?

Evaluation Criteria (2)

• Performance
  – Scalability to real-world problems
  – Stability (languages with tools/environments)
• Other Issues/Pragmatics
  – Current users of the language
  – Domains in which the language has been applied
  – Connection to data sources (knowledge sources and storage formats: relational, OO, …)

Initial Evaluation - Results

• Keys to acceptance:
  – Rich expressive power
  – Stability and history of use
  – Approachable/understandable syntax
  – Open to collaboration
• Keys to non-acceptance:
  – Proprietary language
  – Wedded to a commercial system

Initial Evaluation - Results

          Ontolingua  OML/CKML  XML/RDF  OKBC  OPM   CycL  UML/XMI
Express   High        High      Low      Med   Med   High  Low
Stable    High        High      High     Low   Med   High  Low
Tools     Med         High      High     Med   High  High  High
Support   Med         Med       High     Med   Med   Med   Low
Collab    Med         High      High     Med   Low   Low   Low
Status    IN          IN        OUT      OUT   OUT   OUT   OUT

Next Level Evaluation

• Two languages stood out as strong candidates:
  – Ontolingua
  – OML/CKML
• Conduct experiments to represent biological entities:
  – select two life sciences ontologies
    • Ecocyc Gene Ontology
    • GeneClinics data model/ontology
  – represent each ontology in both Ontolingua and OML

Gene Ontology - Ontolingua (1)

(DEFINE-CLASS |Genes| (?X)
  "The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley. One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins (Product-Types)."
  :DEF (AND (|DNA-Segments| ?X)))

(DEFINE-FUNCTION CENTISOME-POSITION (?FRAME) :-> ?VALUE
  "This slot lists the map position of this gene on the chromosome in centisome units."
  :DEF (AND (|Genes| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION CITATIONS (?FRAME ?VALUE)
  "This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form [reference-id]."
  :DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION COMMENT (?FRAME ?VALUE)
  "The Comment slot stores a general comment about the object that contains the slot."
  :DEF (AND (:THING ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION COMMON-NAME (?FRAME) :-> ?VALUE
  "The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made)."
  :DEF (AND (|Organisms| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION EVIDENCE (?FRAME ?VALUE)
  "Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence analysis."
  :DEF (AND (|Genes| ?FRAME) ((:ONE-OF :EXPERIMENT :SEQUENCE-ANALYSIS) ?VALUE)))

Gene Ontology - Ontolingua (2)

(DEFINE-RELATION HISTORY (?FRAME ?VALUE)
  "Contains a textual history of changes made to this frame. Each item is either a string or a note frame."
  :DEF (AND (:THING ?FRAME) ((:OR :STRING |Notes|) ?VALUE)))

(DEFINE-FUNCTION INTERRUPTED? (?FRAME) :-> ?VALUE
  "The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon inserted."
  :DEF (AND (|Genes| ?FRAME) (BOOLEAN ?VALUE)))

(DEFINE-FUNCTION LEFT-END-POSITION (?FRAME) :-> ?VALUE
  ""
  :DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION PRODUCT (?FRAME ?VALUE)
  "This slot lists the product of a gene, which could be a polypeptide or a tRNA. Multiple products will be recorded in the case that several chemically modified forms of the protein product exist."
  :DEF (AND (|Genes| ?FRAME) ((:OR |Polypeptides| RNA) ?VALUE)))

(DEFINE-RELATION PRODUCT-STRING (?FRAME ?VALUE)
  "This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame)."
  :DEF (AND (|Genes| ?FRAME) (STRING ?VALUE)))

(DEFINE-RELATION PRODUCT-TYPES (?FRAME ?VALUE)
  "Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc."
  :DEF (AND (|Genes| ?FRAME) ((:ONE-OF :ENZYME :REGULATOR :LEADER :MEMBRANE :TRANSPORT :STRUCTURAL :RNA :PHENOTYPE :FACTOR :CARRIER) ?VALUE)))

Gene Ontology - Ontolingua (3)

(DEFINE-FUNCTION RIGHT-END-POSITION (?FRAME) :-> ?VALUE
  ""
  :DEF (AND (|DNA-Segments| ?FRAME) (NUMBER ?VALUE)))

(DEFINE-RELATION SYNONYMS (?FRAME ?VALUE)
  "One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object."
  :DEF (AND (|Generalized-Reactions| ?FRAME) (STRING ?VALUE)))

(DEFINE-FUNCTION TRANSCRIPTION-DIRECTION (?FRAME) :-> ?VALUE
  "This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -."
  :DEF (AND (DNA ?FRAME) ((:ONE-OF "+" "-") ?VALUE)))

Gene Ontology - OML/CKML (1)

This OML ontology defines an encoding of the gene classification system developed by Monica Riley.

The class of all genes is divided into several subclasses. Genes whose function is unknown or known only approximately are grouped into the classes ORFs and Unclassified-Genes, respectively. Genes of known function have been classified using two orthogonal classification schemes developed by Monica Riley.

One scheme classifies genes according to the physiological role of their product class (Physiological-Roles); the other scheme classifies genes according to the function of their product, such as enzymes and transport proteins

HISTORY

?" srcType="Genes" tgtType="data.Boolean"> The value of this slot is T for genes that are interrupted, i.e., those that have an early stop codon " srcType="CKML#Object" tgtType="data.String"> Contains a textual history of changes made to this frame. Each item is either a string or a note March 11, 2004 88

Gene Ontology - OML/CKML (2)

EVIDENCE

" srcType="Genes" tgtType="Evidence"> Describes evidence for the defined function of this object. Currently we distinguish between function that is determined experimentally, and function that is determined through computational sequence

CENTISOME-POSITION

CITATIONS COMMENT COMMON-NAME

" srcType="Genes" tgtType="data.Real"> This slot lists the map position of this gene on the chromosome in centisome units. " srcType="CKML#Object" tgtType="data.String"> This slot lists general citations pertaining to the object containing the slot. Each value of the slot is a citation of the form [reference-id]. " srcType="CKML#Object" tgtType="data.String"> The Comment slot stores a general comment about the object that contains the slot. " srcType="CKML#Object" tgtType="data.String"> The primary name by which an object is known to scientists -- a widely used and familiar name (in some cases arbitrary choices must be made).

TRANSCRIPTION-DIRECTION

" srcType="Genes" tgtType="Transcription-Direction"> This slot specifies the direction along the chromosome in which this gene is transcribed; allowable values are + or -. 89

Gene Ontology - OML/CKML (3)

One or more secondary names for an object -- names that a scientist might attempt to use to retrieve the object. The Synonyms should include any name a user might use to try to retrieve an object.

PRODUCT-TYPES RIGHT-END-POSITION

" srcType="Genes" tgtType="data.String"> This slot holds a text string that describes the product of this gene; this slot is only used when EcoCyc does not describe the gene product as a frame (such as a polypeptide frame). " srcType="Genes" tgtType="Product-Types"> Describes the type of the gene product, e.g., is it an enzyme, an RNA, etc. " srcType="Genes" tgtType="data.Real"/> March 11, 2004 90

Gene Ontology - OML/CKML (4)

<Genes id="EG10707" text="pheA">
<LEFT-END-POSITION tgt="2735765"/>
<CENTISOME-POSITION tgt="58.97035d0"/>
<TRANSCRIPTION-DIRECTION tgt="+"/>
<RIGHT-END-POSITION tgt="2736925"/>
<EVIDENCE src="EG10707" tgt="EXPERIMENT"/>
<NAMES src="EG10707" tgt="pheA"/>
<NAMES src="EG10707" tgt="b2599"/>
<PRODUCT src="EG10707" tgt="CHORISMUTPREPHENDEHYDRAT-MONOMER"/>
<PRODUCT-STRING src="EG10707" tgt="chorismate mutase-P and prephenate dehydratase"/>

Experiments - Results (Ecocyc)

• OML representation:
  – OML’s expressive capabilities captured most aspects of the gene ontology
  – Some limitations in expressive capability: no facets, cardinality or multiple collection types
  – Terminology differences, and definitions not modular
• Ontolingua representation:
  – Ontolingua expressed all of the gene ontology
  – Lisp syntax of Ontolingua not readily approachable

Experiments - Results (GeneClinics)

• OML representation:
  – Expressive capabilities adequate to the job
  – OML/CKML is based on conceptual graphs and may have more expressive capabilities in the long term
• Ontolingua representation:
  – Ontolingua is based on frame semantics, which more closely aligns with relational and OO data models
  – Lisp syntax not acceptable to the larger community
• Both languages would benefit from life sciences examples

Conclusions and Recommendations

• The language most suitable for the exchange of life sciences ontologies should have the following key characteristics:
  – Frame-based representation
    • Long history of work with frame-based representation model
    • Mappings between this model and relational and/or OO data sources are easily expressed
  – XML-based syntax
    • Critical for exchange among physically dispersed community
    • New tools being developed in XML community
    • Lots of momentum in the web-based community

Current Efforts

• Developed specification for an XML-based exchange language
  – XOL (XML Ontology Language), based on Ontolingua (Karp/Chaudhri)
• Frame-based semantics for OML/CKML
• Developing process for submission of life sciences ontologies to the Bio-Ontologies Consortium

Other Ontology Efforts

• Gene Ontology Consortium (http://genome-www.stanford.edu/GO/)
• BioPathways Consortium (http://www.3rdmill.com/BioPathways)
• mmCIF (http://ndbserver.rutgers.edu/mmcif)

Bio-Ontologies Consortium Future Work

• Content development
• Elicit and review ontology submissions
• Synergies with OMG
• Provide public-domain ontologies to the Life Sciences community and encourage use of those ontologies
• Bio-Ontologies 2000