CS440 (Advanced Information Modeling) Lecture 1

Formalization of ORM Revisited
Terry Halpin
INTI International University, Malaysia and LogicBlox, Australia
E-mail: [email protected]
1
Contents
• Introduction
• Entities, Semantic Values and Data Values
• Facts, Fact Types, Constraints and Reference Schemes
• Derivation Options
• Future Plans
2
Introduction
• To ensure that ORM models and queries are unambiguous and executable, they must be formalizable in terms of underlying logics.
• My 1989 PhD thesis provided an algorithm to map extended NIAM (= ORM 1) models (schemas plus populations) to predicate logic.
• This thesis used deduction trees to prove various theorems concerning strong satisfiability (population consistency) as well as schema equivalence and implication (using conservative extensions to define predicates unique to the other schema before evaluation).
• In the 1990s, other formalizations of fact-based modeling were provided (e.g. De Troyer’s PhD, and the formalization of PSM by ter Hofstede, Proper & van der Weide).
3
• In 2005, ORM 2 was introduced, modifying the graphical notation and adding new features (e.g. semiderived fact types, asserted subtypes, role value constraints, deontic constraints).
• From 2009 onwards, transforms were implemented at LogicBlox to map most of ORM 2 to extended datalog.
• In 2011, ring constraints for local reflexivity, transitivity and strong intransitivity were added to ORM 2.
• In recent years, mappings have been provided from large parts of ORM 2 to various description logics.
• This presentation discusses aspects of my latest formalization of ORM 2, which is based on classical first-order logic, with some extensions (e.g. modal operators, bag comprehension, recursion, localized CWA for negations in derivation rules).
4
Entities, Semantic Values and Data Values
• My PhD treated named value types (e.g. CountryCode) as subtypes of conceptual datatypes (e.g. String).
• My new approach (also adopted by the FBM WG) replaces this subtyping by a mapping relationship.
An ORM object is an individual (as in classical logic, not OWL).
[Diagram: ObjectType(.name), EntityType, SemanticValueType, DataType and ValueType; SemanticValueType maps to DataType]
EntityType examples: Country, Person, Employee
SemanticValueType maps to DataType:
CountryCode maps to String
FamilyName maps to String
EmployeeNr maps to Unsigned Integer
5
If value types are subtypes of datatypes, then this model is allowed.
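[Diagram: IATAairlineCode and CountryCode are both value types based on CharString. Airline(.name) has IATAairlineCode; Country(.name) has CountryCode; CountryCode is based on a term in Language(.name). Sample population: Bemidji Airlines has ‘CH’; Switzerland has ‘CH’; ‘CH’ is based on a term in Latin.]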
But does the CountryCode ‘CH’ = the IATA airline code ‘CH’?
I say No, because of PII (Identical objects have exactly the same properties):
∀x,y [x = y → ∀Φ (Φx ≡ Φy)]
where Φ ranges over predicates.¹
The country code ‘CH’ is based on a term in Latin. (true)
The IATA airline code ‘CH’ is based on a term in Latin. (false)
¹ Implemented in first-order logic as the SI inference rule.
6
[Diagram: Country, CountryCode and String, using the new dotted line notation for datatypes]
ORM adopts finite model theory.
In each ORM model, the population of entities and semantic values is finite
(and hence so is the population of data values in use).
As in datalog, syntactic restrictions on derivation rules and queries ensure that
the use of datatype operations (e.g. numeric addition or string concatenation) does
not generate infinite sets.
Every ORM model implicitly includes the unary type predicates Object, Entity,
SemanticValue and DataValue. The above schema basically formalizes thus:
∀x (Country x → Entity x)
∀x (CountryCode x → SemanticValue x)
∀x (String x → DataValue x)
7
Semantic values are effectively typed constants, e.g. CountryCode ‘CH’.
The identity relationship “=” is defined between objects of any type, and the
usual axioms for identity apply (reflexive, symmetric, transitive, SI).
The representation relationship “↦” (is represented by) provides an injective
mapping from semantic values of a given type to data values,
e.g.
[Diagram: CountryCode populated with ‘US’]
is shorthand for
[Diagram: CountryCode ↦ String, populated with CountryCode ‘US’ ↦ ‘US’]
Axioms for ↦ (is represented by):
A1  ∀x,y [x ↦ y → (SemanticValue x & DataValue y)]
A2  ∀x (SemanticValue x → ∃¹y x ↦ y)
A3  ∀y,T ∃⁰··¹x (Tx & x ↦ y)    2nd-order (T is a type variable)¹
¹ A3 is implemented as an axiom schema in first-order logic by replacing quantification over
types by a conjunction over the finite set of domain predicates.
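As a rough illustration of axioms A1 to A3, the following Python sketch checks them over a small finite population. The data layout (semantic values as tagged pairs, the representation relationship as a set of pairs), the function name `check_axioms` and the sample value types are assumptions made for this example; they are not part of ORM or FORML.

```python
from collections import Counter

def check_axioms(semantic_values, data_values, represents):
    """Check A1-A3 on a finite population.

    semantic_values: set of (type_name, key) pairs, e.g. ("CountryCode", "US")
    data_values:     set of plain data values, e.g. {"US"}
    represents:      set of (semantic_value, data_value) pairs
    """
    # A1: the relationship only links semantic values to data values.
    a1 = all(x in semantic_values and y in data_values for x, y in represents)

    # A2: each semantic value is represented by exactly one data value.
    images = Counter(x for x, _ in represents)
    a2 = all(images[x] == 1 for x in semantic_values)

    # A3: within one type, at most one semantic value maps to a given data value.
    per_type = Counter((x[0], y) for x, y in represents)
    a3 = all(n <= 1 for n in per_type.values())

    return a1, a2, a3

# Hypothetical sample population: two semantic values sharing one data value.
cc_us = ("CountryCode", "US")
pronoun_us = ("Pronoun", "US")
population = {cc_us, pronoun_us}
mapping = {(cc_us, "US"), (pronoun_us, "US")}

result = check_axioms(population, {"US"}, mapping)
print(result)  # (True, True, True)
```

Two semantic values of different types may share a data value without violating A3, because the "at most one" restriction applies per type.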
8
[Diagrams: (a) CountryCode populated with ‘US’; (b) CountryCode ↦ String, populated with CountryCode ‘US’ ↦ ‘US’]
Either of the above may now be formalized thus:
∀x (CountryCode x → SemanticValue x) & ∀x (String x → DataValue x)
∀x,y [(CountryCode x & x ↦ y) → String y]
∃x (CountryCode x & x ↦ ‘US’)


[Diagram: CountryCode and Pronoun both ↦ String, populated with CountryCode ‘US’ ↦ ‘US’ and Pronoun ‘US’ ↦ ‘US’]
The additional part formalizes thus:
∀x (Pronoun x → SemanticValue x)
∀x,y [(Pronoun x & x ↦ y) → String y] & ∃x (Pronoun x & x ↦ ‘US’)
Although the country code ‘US’ ≠ the pronoun ‘US’, their datatypes are compatible,
and lexical comparisons may be performed by comparing their data values.
9
FORML 2 (our textual language for ORM) is intended to be used for model
output (verbalization), model input, and for queries.
Once the reference scheme is declared, e.g. Gender(.code),
mapping from entities to semantic values to data values is implicitly understood,
e.g.
Gender ‘M’
is implicitly expanded to
Gender that has GenderCode that ↦ ‘M’.
Moreover, a user-comparison such as
CountryCode = Pronoun
is interpreted as a comparison between their data value representations
because that is the likely intent.
In practice, such implicit mappings can be implemented efficiently
(e.g. using tagged types and autoboxing/unboxing).
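One way to picture the tagged-type idea is the following Python sketch. The class `SemanticValue` and the two helper functions are hypothetical names chosen for illustration, and this is only one possible encoding, not how FORML or any ORM tool is actually implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticValue:
    """A data value tagged with the name of its value type."""
    type_name: str
    data: str

def same_object(a, b):
    # Identity: both the type tag and the data value must agree.
    return a == b

def same_representation(a, b):
    # Lexical comparison: unbox both values and compare the data only.
    return a.data == b.data

country_code = SemanticValue("CountryCode", "US")
pronoun = SemanticValue("Pronoun", "US")

print(same_object(country_code, pronoun))          # False
print(same_representation(country_code, pronoun))  # True
```

Under this encoding the country code ‘US’ and the pronoun ‘US’ remain distinct objects, while a user-level comparison can still be answered by comparing their unboxed data values.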
10
Facts, Fact Types, Constraints and Reference Schemes
(a) Country(.code)
(b) Country has CountryCode
Schema (a) abbreviates schema (b).
Reserving “isIdentified” for “is identified by a preferred reference scheme”,
this formalizes thus:
∀x (Country x → Entity x) & ∀x (CountryCode x → SemanticValue x)
∀x,y [x hasCountryCode y → (Country x & CountryCode y)]
∀x (Country x → ∃¹y x hasCountryCode y)
∀y (CountryCode y → ∃⁰··¹x x hasCountryCode y)
∀x,y (x CountryCodeIsOfCountry y ≡ y hasCountryCode x)
∀x (x isIdentified → Entity x)
∀x,y [x hasCountryCode y → x isIdentified]
11
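The typing, "exactly one" and "at most one" conditions in the Country(.code) formalization above can be checked mechanically on a finite population. The sketch below is a minimal illustration; the function names and the sample countries and codes are assumptions, not part of the source.

```python
def exactly_one(items, pred):
    # Counting quantifier: exactly one item satisfies pred.
    return sum(1 for i in items if pred(i)) == 1

def at_most_one(items, pred):
    # Counting quantifier: zero or one item satisfies pred.
    return sum(1 for i in items if pred(i)) <= 1

def is_simple_reference_scheme(countries, codes, has_code):
    """has_code is a set of (country, code) pairs."""
    # Typing: the predicate only links countries to country codes.
    typed = all(x in countries and y in codes for x, y in has_code)
    # Each country has exactly one country code.
    mandatory_unique = all(
        exactly_one(codes, lambda y: (x, y) in has_code) for x in countries)
    # Each country code is the code of at most one country.
    injective = all(
        at_most_one(countries, lambda x: (x, y) in has_code) for y in codes)
    return typed and mandatory_unique and injective

# Hypothetical sample population.
countries = {"Switzerland", "Germany"}
codes = {"CH", "DE", "US"}
has_code = {("Switzerland", "CH"), ("Germany", "DE")}

ok = is_simple_reference_scheme(countries, codes, has_code)
bad = is_simple_reference_scheme(countries, codes, has_code | {("Germany", "CH")})
print(ok, bad)  # True False
```

The unused code ‘US’ is acceptable because the code-to-country direction is only "at most one", whereas giving Germany a second code breaks the "exactly one" condition.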
[Diagram: Scientist(.name) was born in Country(.code); sample fact: Albert Einstein was born in DE]
Top-level entity types (ignoring Object, Entity etc.) are implicitly mutually exclusive,
e.g.
∀x (Scientist x → ~Country x)
Similarly for top-level semantic value types.
Ignoring reference schemes, the fact type may be formalized thus¹:
∀x,y [x was born in y → (Scientist x & Country y)]
∀x (Scientist x → ∃¹y x was born in y)
The fact instance formalizes thus:
∃s,sn,c,cc (s hasScientistName sn & sn ↦ ‘Albert Einstein’
& c hasCountryCode cc & cc ↦ ‘DE’ & s wasBornIn c)
¹ See the paper for a discussion of predicate naming and typing options.
12
[Diagram: State is in Country(.code); State has StateCode]
The preferred, external uniqueness constraint formalizes thus:
∀c,sc ∃⁰··¹s (s isInCountry c & s hasStateCode sc)
∀s,c,sc [(s isInCountry c & s hasStateCode sc) → s isIdentified]
[Diagram: President is president of Country(.code)]
The entity-to-entity reference scheme formalizes thus:
∀p (President p → ∃¹c p isPresidentOf c)
∀c (Country c → ∃⁰··¹p p isPresidentOf c)
∀p,c [p isPresidentOf c → p isIdentified]
13
[Diagram: ChiefOfficer is chief executive officer of Company; ChiefOfficer is chief technical officer of Company]
This disjunctive reference scheme allows chief officers to be identified by definite
descriptions such as “the CEO of Megasoft” or “the CTO of Megasoft”.
This formalizes thus:
∀x,y [x is chief executive officer of y → (ChiefOfficer x & Company y)]
∀x,y [x is chief technical officer of y → (ChiefOfficer x & Company y)]
∀x [ChiefOfficer x → ∃¹y (x is chief executive officer of y
∨ x is chief technical officer of y)]
∀x,y [x is chief executive officer of y → ~∃z(x is chief technical officer of z)]
∀y (Company y → ∃⁰··¹x x is chief executive officer of y)
∀y (Company y → ∃⁰··¹x x is chief technical officer of y)
∀x,y [(x is chief executive officer of y ∨ x is chief technical officer of y)
→ x isIdentified]
14
A mandatory, functional, acyclic relationship.
Should this be allowed?
[Diagram: Employee reports to Employee (mandatory, functional, acyclic)]
No, because it can only be satisfied in an infinite model.
ORM conforms to finite model theory, so this model is illegal.
Instead, model thus:
[Diagram: Employee is chief executive (#≤1); Employee reports to Employee]
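The claim that a mandatory, functional, acyclic relationship has no finite model can be checked by brute force for small populations. The Python sketch below enumerates every way of assigning each employee a manager and counts the acyclic assignments. The function names are hypothetical, and the `mandatory=False` variant (where an employee may report to no one) is a simplification used only for contrast; it is not the exact repaired schema shown on the slide.

```python
from itertools import product

def is_acyclic(reports_to):
    """reports_to maps each employee to one employee, or to None."""
    for start in reports_to:
        current = reports_to[start]
        # Any cycle through `start` has length at most len(reports_to).
        for _ in range(len(reports_to)):
            if current is None:
                break
            if current == start:
                return False
            current = reports_to[current]
    return True

def count_models(n, mandatory):
    """Count acyclic 'reports to' assignments over n employees."""
    employees = list(range(n))
    targets = employees if mandatory else employees + [None]
    count = 0
    for choice in product(targets, repeat=n):
        reports_to = dict(zip(employees, choice))
        if is_acyclic(reports_to):
            count += 1
    return count

print([count_models(n, mandatory=True) for n in range(1, 5)])  # [0, 0, 0, 0]
print(count_models(2, mandatory=False))                         # 3
```

With the mandatory constraint in place, following the chain of managers from any employee must eventually revisit someone, so no finite population satisfies all three constraints; once some employee is allowed to report to no one, finite models exist.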
15
Modal operators may be used to explicitly assert modalities,
e.g.
[Diagram: Person is licensed to drive; Person drives Car]
The binary predicate is m:n (the default for binaries in classical logic).
We can record this as a conscious declaration of the modeler using the
alethic possibility operator ◇:
◇(∃p ∃¹··c p drives c & ∃c ∃¹··p p drives c)
The deontic subset “constraint” may be formalized using the obligation
operator O,
e.g.
O∀p(∃c p drives c → p is licensed to drive)
16
Derivation Options
An ORM object type or fact type may be
• Asserted (all its instances are simply asserted)
• Derived (all its instances are derived)
• Semiderived (it may have asserted instances and derived instances)
In classical logic, predicates that are currently asserted or fully derived may
become semiderived if the user chooses to add a relevant derivation rule or
assertion respectively.
17
In ORM, complete derivation rules (iff-rules) are formalized as universally
quantified equivalences,
e.g.
∀x,y [x is father of y ≡ (x is a parent of y & x is male)]
Incomplete derivation rules (if-rules) are formalized as universally quantified
conditionals,
e.g.
∀p (p is cancer prone ← p smokes)
∀p1,p2 [p1 is a grandparent of p2 ← ∃p3(p1 is a parent of p3
& p3 is a parent of p2)]
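To make the difference between iff-rules and if-rules concrete, here is a small Python sketch that evaluates the father and grandparent rules over sets of facts. The function names and the sample people are invented for illustration. A fully derived fact type gets all of its instances from the rule, while a semiderived one is the union of asserted and derived instances.

```python
def derive_fathers(parent_of, male):
    # iff-rule: x is father of y iff x is a parent of y and x is male.
    return {(x, y) for x, y in parent_of if x in male}

def derive_grandparents(parent_of):
    # if-rule body: some p3 exists with p1 a parent of p3 and p3 a parent of p2.
    return {(p1, p2)
            for p1, p3 in parent_of
            for p3b, p2 in parent_of
            if p3 == p3b}

# Hypothetical sample facts.
parent_of = {("Ann", "Bob"), ("Bob", "Cy"), ("Dan", "Bob")}
male = {"Bob", "Cy", "Dan"}
asserted_grandparents = {("Eve", "Cy")}  # asserted directly, not derivable here

# Derived fact type: every instance comes from the rule.
father_of = derive_fathers(parent_of, male)

# Semiderived fact type: asserted instances plus derived instances.
grandparent_of = asserted_grandparents | derive_grandparents(parent_of)

print(sorted(father_of))       # [('Bob', 'Cy'), ('Dan', 'Bob')]
print(sorted(grandparent_of))  # [('Ann', 'Cy'), ('Dan', 'Cy'), ('Eve', 'Cy')]
```

Because the grandparent rule is only an if-rule, the asserted fact about Eve is admissible even though no parent facts support it; under an iff-rule it would not be.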
18
Universally quantified equivalences and conditionals may also be used to
formalize equality and subset constraints.
ORM distinguishes between different ways to implement such equivalences or
conditionals, e.g. as constraints or (semi-)derivations. These differences matter.
E.g. if we choose the subset constraint below, we cannot assert that Pat smokes
unless we also assert that Pat is cancer-prone.
This is not the case for these other options.
[Diagram 1: Person smokes; Person is cancer-prone; subset constraint from smokes to is cancer-prone]
[Diagram 2: Person smokes; Person is cancer-prone +]
+ Person is cancer-prone if Person smokes.
[Diagram 3: Person smokes; Person is cancer-prone *]
* Person is cancer-prone iff
Person smokes
or has been exposed to excessive radiation
or ... .
In mapping to DatalogLB, we use <- rules for derivations and -> rules for constraints.
Such distinctions are not yet supported in OWL.
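The practical difference between reading the same conditional as a constraint or as a derivation rule can be sketched in Python as follows. This is not DatalogLB syntax, and the exception class and function names are made up for the example; it only mimics the behaviour described above.

```python
class ConstraintViolation(Exception):
    """Raised when an assertion would break the subset constraint."""

def assert_with_subset_constraint(smokes, cancer_prone, person):
    # Constraint reading: a smoker must already be recorded as cancer-prone.
    if person not in cancer_prone:
        raise ConstraintViolation(
            f"{person} smokes but is not recorded as cancer-prone")
    return smokes | {person}

def cancer_prone_semiderived(smokes, asserted_cancer_prone):
    # Semiderived reading: asserted instances plus those derived from smokes.
    return asserted_cancer_prone | smokes

# Constraint: asserting that Pat smokes is rejected on an empty database.
try:
    assert_with_subset_constraint(set(), set(), "Pat")
    rejected = False
except ConstraintViolation:
    rejected = True
print(rejected)  # True

# Derivation: the same assertion is accepted and the conclusion is inferred.
print(sorted(cancer_prone_semiderived({"Pat"}, {"Lee"})))  # ['Lee', 'Pat']
```

The constraint restricts which updates are allowed, whereas the derivation rule never rejects an update and instead adds inferred facts.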
19
The previous slides focused on some newer aspects of my ORM formalization.
For formalization of the three new ring constraints, see:
Halpin, T. & Curland, M. 2011, ‘Enriched Support for Ring Constraints’, On the
Move to Meaningful Internet Systems 2011: OTM 2011 Workshops, eds.
R. Meersman, T. Dillon, P. Herrero. Springer LNCS 7046, pp. 309-318.
Most other kinds of ORM constraints are formalized in a similar way to that
provided in my PhD thesis, except that I now prefer sorted logic.
Any sorted logic formula may be trivially rewritten in unsorted logic,
e.g.
∀p:Person ∃d:Date p was born on d
∀p[ Person p → ∃d (Date d & p was born on d) ]
20
FORML is based on sorted logic, using subscripts where relevant to distinguish
individual variables of the same type (e.g. Person1 instead of p:Person),
adopting Horn clause abbreviations (e.g. implicit universal quantification for
unqualified head variables), and using pronouns for correlation.
E.g.
Person1 is a grandson of Person2 iff
Person1 is male
and is a child of some Person3 who is a child of Person2.
∀p1:Person,p2:Person
[p1 is a grandson of p2 ≡ (p1 is male
& ∃p3:Person(p1 is a child of p3 & p3 is a child of p2))]
21
Future Plans
Our current and planned future research efforts include providing a more
rigorous and complete formalization and implementation of
• ORM derivation rules and queries that include negation
• ORM derivation rules and queries that include aggregate functions
• Dynamic constraints
22