ICBO 2011 July 28-30, 2011 Buffalo, New York,
An Advanced Strategy for Integration of Biological Measurement Data
Hiroshi Masuya 1, Georgios V. Gkoutos 2, Nobuhiko Tanaka 1, Kazunori Waki 1, Yoshihiro Okuda 3, Tatsuya Kushida 3, Norio Kobayashi 4, Koji Doi 4, Kouji Kozaki 5, Robert Hoehndorf 1, Shigeharu Wakana 1, Tetsuro Toyoda 4 and Riichiro Mizoguchi 5
1: RIKEN BioResource Center, Tsukuba, Japan
2: Department of Genetics, University of Cambridge, UK
3: NalaPro Technologies, Inc, Tokyo, Japan
4: RIKEN BASE, Yokohama, Japan
5: Department of Knowledge Systems, ISIR, Osaka University, Japan
Motivation of this study
Phenotypes represent a broad range of variations in measured qualities.
[Figure: phenotypic information from Organisms A, B, C and D is integrated into a whole through a sophisticated informatics infrastructure (ontology) and mined for biological knowledge.]
Aim: to contribute to the development of an informatics infrastructure for the description, exchange and mining of phenotypic data.
Phenotypic Quality (PATO):
PATO provides a practical basis for vocabulary and semantics for the description of phenotype information across species.
• Single-hierarchy model of “quality”, suited to BFO.
• Standard for phenotype annotation across species (“EQ” annotation).
• Less confusing than “EAV” annotation for people unfamiliar with ontologies.
• Basis for inferring cross-species phenotype equivalence with EQ (e.g. mouse phenotype and disease):
MP:0002269 ! muscular atrophy = E: MA:0000015 ! muscle + Q: PATO:0001623 ! atrophied
HP:0003202 ! Amyotrophy = E: FMA:30316 ! muscle + Q: PATO:0001623 ! atrophied
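The equivalence inference in the example above can be sketched roughly as follows. This is only an illustration: the `ANATOMY_BRIDGE` table and its `"muscle"` key are hypothetical stand-ins for a real cross-species anatomy mapping, and the function names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """An Entity-Quality annotation: an anatomy term plus a PATO term."""
    entity: str   # e.g. an MA or FMA term ID
    quality: str  # a PATO term ID


# EQ decompositions of the two pre-composed terms shown in the example.
EQ_DEFINITIONS = {
    "MP:0002269": EQ("MA:0000015", "PATO:0001623"),  # muscular atrophy
    "HP:0003202": EQ("FMA:30316", "PATO:0001623"),   # Amyotrophy
}

# ASSUMPTION: a hand-written bridge between species-specific anatomy terms.
# "muscle" is a hypothetical species-neutral key, not a real ontology ID.
ANATOMY_BRIDGE = {
    "MA:0000015": "muscle",
    "FMA:30316": "muscle",
}


def equivalent(term_a: str, term_b: str) -> bool:
    """Treat two phenotype terms as equivalent when their qualities are
    identical and their entities map to the same bridged anatomy concept."""
    a, b = EQ_DEFINITIONS[term_a], EQ_DEFINITIONS[term_b]
    entity_a = ANATOMY_BRIDGE.get(a.entity)
    entity_b = ANATOMY_BRIDGE.get(b.entity)
    return a.quality == b.quality and entity_a is not None and entity_a == entity_b
```

Calling `equivalent("MP:0002269", "HP:0003202")` compares the mouse and human terms through their shared PATO quality and the bridged entity.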
Expansion of PATO
We attempted to expand the PATO ontology to ensure a more advanced, explicit and consistent knowledge framework.
Objectives:
1. To provide a fundamental classification of quality values on the basis of measurement scales.
2. To provide a strict data model for handling the context dependencies of ordinal values.
3. To provide a model of a datum (or description) as an informational entity with the structure of common formalisms.
Fundamental classification of quality-value (1)
Refraining from the 2-hierarchy model (and the EAV formalism)
There has been much discussion about PATO adopting the 1-hierarchy model and EQ…
1. A number of studies claim that the fundamental classification of values, the “scales of measurement” (Stevens S.S., 1946), is beneficial for data integration in experimental science.
length: 20 cm; long, short
temperature: 37 ℃, 310.15 K; high, low
color: red, blue...
This classification takes the permissible mathematical operations as its starting point!
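As a rough sketch of what classifying by mathematical operation means, the four scales of Stevens (nominal, ordinal, interval, ratio) can be encoded together with the operations that become meaningful at each level. The assignment of the example values to scales in `EXAMPLES` is this sketch's own reading, and all names are illustrative.

```python
from enum import IntEnum


class Scale(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operation that first becomes meaningful at each scale; a higher scale
# keeps every operation permitted on the lower ones.
NEW_OPERATION = {
    Scale.NOMINAL: "equals",
    Scale.ORDINAL: "less_than",
    Scale.INTERVAL: "difference",
    Scale.RATIO: "ratio",
}


def permitted(scale: Scale) -> set:
    """Return all operations that are meaningful on the given scale."""
    return {NEW_OPERATION[level] for level in Scale if level <= scale}


# ASSUMPTION: one possible reading of the example values listed above.
EXAMPLES = {
    "red, blue": Scale.NOMINAL,
    "long, short": Scale.ORDINAL,
    "high, low": Scale.ORDINAL,
    "37 ℃": Scale.INTERVAL,    # zero point is a convention
    "310.15 K": Scale.RATIO,   # absolute zero point
    "20 cm": Scale.RATIO,
}


def celsius_to_kelvin(celsius: float) -> float:
    """Shift the zero point; differences are preserved, ratios are not."""
    return celsius + 273.15
```

Under this reading, "twice as hot" is not a permitted statement about 37 ℃, while it is permitted for the same temperature expressed in kelvin.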
Fundamental classification of quality-value (2)
2. A foundation for the explicit description of change of quality is needed.
[Figure: objects 1 and 2 at times t1 and t2. Color 1: green to orange. Color 2: orange to green. A further example: a growing boy and his height quality.]
Explicit description of color change is needed.
Ontology: BFO, PATO | DOLCE
System of quality: 1-hierarchy | 2-hierarchy
Formalism: EQ | EAV
Qualitative and quantitative descriptions are integrated in a single knowledge framework in DOLCE. For the coordination of ongoing efforts, equivalence mapping of these systems is beneficial.
Model of context-dependency of ordinal value (1)
Problem of “large ant and small elephant”
[Figure: two animals, one saying “I’m big!!” and the other “I’m small..”]
How to classify value instances?
Context A: simple comparison
[Figure: values A, B, C and D lie on an axis from smaller to larger; Threshold X (some value) separates the “Small” class from the “Large” class.]
Model of context-dependency of ordinal value (2)
Problem of “large ant and small elephant”
[Figure: two animals, one saying “I’m big!!” and the other “I’m small..”]
How to classify value instances?
Context B: deviation-based comparison (the context for inferring cross-species equivalence of phenotypes)
[Figure: values A, B, C and D are compared by their deviation; Thresholds Y1 and Y2 (deviation-based values) separate the “abnormally small”, “normal size” and “abnormally large” classes, each with its own smaller-to-larger axis.]
A knowledge model of the context dependencies of ordinal-scale values is needed!
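The two contexts can be contrasted in a minimal sketch. It assumes that the deviation-based thresholds Y1 and Y2 are placed at the mean minus and plus two sample standard deviations of a per-organism reference set; that cutoff, the function names and all weights below are invented for illustration.

```python
from statistics import mean, stdev


def classify_simple(value: float, threshold_x: float) -> str:
    """Context A: one absolute threshold shared by all organisms."""
    return "large" if value > threshold_x else "small"


def classify_by_deviation(value: float, reference: list, cutoff: float = 2.0) -> str:
    """Context B: compare a value with its own organism's distribution.

    ASSUMPTION: Y1 and Y2 are mean -/+ cutoff * sample standard deviation.
    """
    z = (value - mean(reference)) / stdev(reference)
    if z > cutoff:
        return "abnormally large"
    if z < -cutoff:
        return "abnormally small"
    return "normal size"


# Hypothetical body weights in kg (invented numbers, illustration only).
ant_weights = [0.000003, 0.000004, 0.000005, 0.000004, 0.000004]
elephant_weights = [4000.0, 5000.0, 6000.0, 5000.0, 5000.0]

big_ant = 0.000010
small_elephant = 2000.0
```

With a shared threshold of, say, 1 kg, `classify_simple` puts the big ant in the small class and the small elephant in the large class, whereas `classify_by_deviation` judges each against its own reference weights.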
Model of datum as an informational entity
The current versions of DOLCE, BFO and PATO deal only with primary reality and do not deal with quality description.
1. Distinction of a “true value” and an “empirical measurement” as an approximation is needed.
[Figure: a weight in reality (unknown…) and weights recorded as information.]
2. Modeling informational entities with common formalisms (e.g. EQ, EAV and so on) and their relationships would be useful!
[Figure: a weight in reality (unknown...) and its descriptions as information in the EQ and EAV formalisms.]
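One way to picture the distinction between a true value and an empirical measurement is the sketch below. It is not the YAMATO model itself: the class names, the symmetric `error` bound and the overlap test in `compatible` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quality:
    """A quality in reality. Its true value is unknown, so none is stored."""
    bearer: str
    kind: str


@dataclass(frozen=True)
class Measurement:
    """An informational entity: an empirical approximation of a quality."""
    about: Quality
    value: float
    unit: str
    error: float  # ASSUMPTION: symmetric absolute error bound

    def interval(self) -> tuple:
        return (self.value - self.error, self.value + self.error)


def compatible(m1: Measurement, m2: Measurement) -> bool:
    """True if both measurements describe the same quality in the same
    unit and their error intervals overlap."""
    if m1.about != m2.about or m1.unit != m2.unit:
        return False
    lo1, hi1 = m1.interval()
    lo2, hi2 = m2.interval()
    return lo1 <= hi2 and lo2 <= hi1
```

Several `Measurement` objects can then point at one `Quality` without any of them being treated as the true value.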
Expansion of PATO with YAMATO framework
[Figure: BFO (quality), PATO and OBI are linked to YAMATO by “Mapping”; YAMATO is linked to DOLCE (quale, quality-space) by “Interoperability”.]
A reference ontology: “PATO2YAMATO”
• Equivalence mapping between 1- and 2-hierarchy models
• Model of context dependency
• Model of datum with common formalisms
Practical use based proposals…
YAMATO: Yet Another More Advanced Top-Level Ontology (Mizoguchi, 2009)
Features:
• Framework of interoperability of quality-related concepts between top-level ontologies.
• Support of classification of scales of measurements.
• Model of context dependency with “role”.
• Detailed model of “representation” (an informational object) that involves quality representation.
Equivalence mapping of 1- and 2-hierarchy models
[Figure: upper-level concepts of BFO, YAMATO and DOLCE placed side by side. Concept labels shown: quality, quantity, property, generic quality and quality value (BFO and YAMATO side); quality, region, quality space and quale (DOLCE side). Links between corresponding concepts are labeled “identical” or “(convertible)”.]
About 1,000 PATO terms were manually mapped to the YAMATO framework.
Classification of quality value (scales of measurement: Stevens S.S., 1946)
Modeling of context dependency with “role”
An entity often plays different “roles” with different characteristics under different contexts.
[Figure: the same person says “I’m a teacher.” (at school) and “I’m a husband.” (at home).]
In the context of a distribution for weight, some weight quality values play large-roles and thereby become role-holders: abnormally heavy.
[Figure: concept model of role and role-holder. Context: distribution for weight. Role: large-role, which depends on the context. Potential player: weight quality value, which can play the role. Role-holder (an entity playing a role): abnormally heavy. Further labels: “heavier than normal value”, “qualitative value for weight”.]
Implementation and representation in the Hozo ontology editor
[Figure: the potential player and the role-holder as represented in Hozo.]
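The role and role-holder idea can be approximated in a few lines. This is a loose sketch and not how Hozo represents roles: the classes, the `playable_by` predicate and the normal range of 20 to 30 g are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Context:
    """A context such as a distribution for weight."""
    name: str
    normal_low: float   # hypothetical lower bound of the normal range
    normal_high: float  # hypothetical upper bound of the normal range


@dataclass(frozen=True)
class Role:
    """A role depends on a context and states who can play it."""
    name: str
    context: Context
    playable_by: Callable[[float, Context], bool]


@dataclass(frozen=True)
class RoleHolder:
    """A potential player that is actually playing a role."""
    label: str
    player: float
    role: Role


def play(value: float, role: Role, label: str) -> Optional[RoleHolder]:
    """Return a role-holder if the value can play the role, else None."""
    if role.playable_by(value, role.context):
        return RoleHolder(label, value, role)
    return None


# ASSUMPTION: invented normal range for a mouse weight distribution, in g.
weight_distribution = Context("distribution for weight", 20.0, 30.0)
large_role = Role("large-role", weight_distribution,
                  lambda v, ctx: v > ctx.normal_high)

heavy = play(45.0, large_role, "abnormally heavy")
ordinary = play(25.0, large_role, "abnormally heavy")
```

Here the same kind of weight value becomes an “abnormally heavy” role-holder only when it satisfies the role's condition in the given context.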
Inter-relationships among contexts
[Figure: classification of organisms, with “Inherit” links relating “abnormally heavy” to the context of distribution of weight in elephant and the context of distribution of weight in ant.]
Inference of classification: “Abnormally light in elephant is heavier than abnormally heavy in ant”
Coordination of ordinal values under different contexts (simple comparison context); each class is larger than the one below it:
Abnormally heavy in elephant
Normal weight in elephant
Abnormally light in elephant
Abnormally heavy in ant
Normal weight in ant
Abnormally light in ant
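The coordination of the six classes can be sketched as a two-level sort key. The sketch assumes that the weight distributions of the two organisms do not overlap, so that the organism decides the comparison before the role does; the names and rank tables are illustrative only.

```python
# ASSUMPTION: the two organisms' weight distributions do not overlap, so
# every elephant class outranks every ant class in simple comparison.
CONTEXT_RANK = {"ant": 0, "elephant": 1}
ROLE_RANK = {"abnormally light": 0, "normal weight": 1, "abnormally heavy": 2}


def sort_key(holder: tuple) -> tuple:
    """Rank a (organism, role) pair: organism first, then role."""
    organism, role = holder
    return (CONTEXT_RANK[organism], ROLE_RANK[role])


def compare(a: tuple, b: tuple) -> str:
    """Compare two role-holders in the simple comparison context."""
    key_a, key_b = sort_key(a), sort_key(b)
    if key_a > key_b:
        return "larger"
    if key_a < key_b:
        return "smaller"
    return "same class"


ALL_HOLDERS = [(organism, role) for organism in CONTEXT_RANK for role in ROLE_RANK]

# Largest class first, matching the order of the list above.
ranking = sorted(ALL_HOLDERS, key=sort_key, reverse=True)
```

`compare(("elephant", "abnormally light"), ("ant", "abnormally heavy"))` then reflects the inference across the two deviation-based contexts.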
Quality representation in YAMATO
YAMATO provides “quality representation” as the foundation for formalized informational entities such as EQ and EAV.
[Figure: a weight quality in reality is turned, by symbolization, into a quality representation among the informational entities.]
Quality representation is modeled in a way consistent with the content-bearing informational entity, “representation”.
Basic structure for representation by symbol (Mizoguchi, 2004):
Quality representation, EP (=EQ) (BFO, PATO): tuple (*entity, #property); content: quality measurement
Quality representation, EAV (DOLCE): triple (*entity, #generic quality, value); content: quality measurement
Sentence of natural language: natural language alphabet; content: anything…
Coding of genetic information: nucleotide sequence, molecular symbol; content: specification of gene product
*: symbolization operation, #: Class => individual operation (equivalent to punning in OWL 2)
Current status of the reference ontology: PATO2YAMATO
• About 1,000 PATO terms have been included in the YAMATO framework.
• The basic forms of context-dependent ordinal values are defined. They are workable under the classification of organisms.
• The basic forms of quality representation (EAV and EQ) are already defined in YAMATO.
http://www.brc.riken.go.jp/lab/bpmp/ontology/ontology_pato2yato.html
Preliminary trial of simple conversion of EQ to EAV
1,450 EQ annotations (OBO cross-product file for the Mammalian Phenotype ontology)
→ conversion with reference to PATO2YAMATO
→ EAV quality representations in the YAMATO framework
The ontology helps the automatic conversion from EQ to EAV!
We are planning a full conversion of EQ annotations across multiple species into coordinated EAV quality representations.
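The shape of such a conversion can be sketched as a lookup from a PATO quality to an attribute and a value. The lookup table below is a hypothetical stand-in for the reference ontology: its single entry, including the choice of "size" as the attribute, is illustrative and not taken from PATO2YAMATO.

```python
from typing import NamedTuple


class EQ(NamedTuple):
    entity: str
    quality: str


class EAV(NamedTuple):
    entity: str
    attribute: str
    value: str


# ASSUMPTION: hypothetical lookup standing in for the reference ontology.
# The attribute/value split here is invented for illustration.
QUALITY_TO_ATTRIBUTE_VALUE = {
    "PATO:0001623": ("size", "atrophied"),
}


def eq_to_eav(eq: EQ) -> EAV:
    """Split the single EQ quality into an attribute and a value."""
    try:
        attribute, value = QUALITY_TO_ATTRIBUTE_VALUE[eq.quality]
    except KeyError:
        raise ValueError(f"no mapping for quality {eq.quality}") from None
    return EAV(eq.entity, attribute, value)
```

For example, `eq_to_eav(EQ("MA:0000015", "PATO:0001623"))` keeps the entity and replaces the quality with the attribute and value found in the table.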
Summary of this talk
This study shows:
• YAMATO’s framework helps to coordinate different “qualities” for phenotype information at both the reality level and the description level.
• The role model successfully coordinated ordinal values dependent on multiple contexts (deviation-based and simple comparison).
Future views:
• Automatic conversion of EQ annotations of multiple species to EAV.
• Modeling of contexts of experimental conditions.
• Integration of qualitative and quantitative phenotype data.
• Coordination of more complicated phenotype data sets from multiple species and experiments.
Acknowledgements
RIKEN BioResource Center: Nobuhiko Tanaka, Kazunori Waki, Terue Takatsuki
University of Cambridge: Georgios V. Gkoutos, Robert Hoehndorf
NalaPro Technologies Inc: Yoshihiro Okuda, Tatsuya Kushida
Enegate corp: Mamoru Ota
RIKEN BASE: Norio Kobayashi, Koji Doi, Tetsuro Toyoda
Department of Knowledge Systems, ISIR, Osaka University: Koji Kozaki, Riichiro Mizoguchi
以和為貴
“Harmony is to be valued.”
From the “Seventeen-article constitution” (A.D. 603, YAMATO imperial court in ancient Japan), authored by Prince Shōtoku (A.D. 573–621)
Thank you!