Enhancing GO for the sake of clinical bioinformatics, Anand Kumar


Enhancing GO for the sake of clinical bioinformatics

Anand Kumar IFOMIS, University of Leipzig /Saarbrücken

Clinical Bioinformatics

• Clinical Informatics / Medical Informatics
• Bioinformatics
• Biology – Medicine
• Bioinformatics – Medical Informatics
• Biology – Bioinformatics – Medical Informatics – Medicine

The need for clinical bioinformatics

• Representation of entities present at various levels of granularity
• Must respect nature
• Must respect the professionals and the levels of granularity they deal with
• Tissues: borderline?

Gene Ontology

• Very widely used for annotations of gene products
• Not without problems
• Moving target: updates and improvements are made regularly
• Axes: cellular component, molecular function, biological process

Problems with GO

• Universals and particulars (red blood cell and my red blood cell)
• Continuants and occurrents (enzyme and enzymatic activity)
• Dependent and independent entities (respiration and lungs)
• Definitions of parents and children

Problems with GO

• Parthood relations (nucleus part of cell)
• Circular definitions
• Synonyms (antigen and antibody binding)
• Compositional nature of terms (activator of the establishment of competence for transformation activity)
• Representation of time

Levels of Granularity

• Organism (human being)
• Organ system (Respiratory system, Alimentary system)
• Cardinal body parts (Head, Upper limbs)
• Organs (Lung, Liver)
• Organ parts (Upper lobe of lung, Renal pelvis)

Levels of Granularity

• Tissue (Pulmonary alveolar epithelium)
• Tissue subdivision (Anterior epithelium of iris)
• Collection of cells (Menstrual secretion)
• Cells (Pulmonary epithelial cell)
• Collection of subcellular organelles (Rough endoplasmic reticulum)
• Subcellular organelles (Mitochondrion)
• Molecules (Enzymes)

Levels of granularity

• Count nouns vs. mass nouns
• Mass nouns as collections
• Structure
• Issue of tissues
• Portions? – Subdivisions
• Units? – Lobules, not tissues
• Maximal portion? – Name tissue itself
• Parts? – Tissue subdivisions

Granularity and Parthood

• Cardinal body parts and organ systems overlap
• Definitions, for example gr(ribosome) = Subcellular
• Can we call ribosome part-of hepatocyte?
• Parthood has a closure for both parts and wholes

Granularity and Parthood

• Not all ribosomes are present within hepatocytes
• Should we create a class: hepatocytic cell’s ribosome?
• Consider the lipopolysaccharides within the ribosomal membrane present within the hepatocyte
• Should we create a class: hepatocytic cell ribosome’s cellular membrane’s lipopolysaccharides?

Granularity and Parthood

• hepatocyte*ribosome: those ribosomes which are parts of hepatocytes
• Each entity existing at a coarser level of granularity has an entity existing at each finer level of granularity as its part
• Each entity existing at a finer level of granularity is a part of an entity existing at each coarser level of granularity
• In some situations not all coarser levels exist: collection of cells, areolar tissue
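The restricted class hepatocyte*ribosome can be sketched in Python as a filter over instances and their wholes. This is only an illustration under stated assumptions, not something taken from the slides: the level list is abridged from the slides above, and `Entity`, `wholes`, `restricted_class`, and all sample instances are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Granularity levels, coarsest first (abridged from the slides).
LEVELS = ["organism", "organ", "tissue", "cell", "subcellular organelle", "molecule"]

@dataclass
class Entity:
    name: str                           # instance label (hypothetical)
    kind: str                           # universal it instantiates
    level: str                          # one of LEVELS
    part_of: Optional["Entity"] = None  # direct whole, if any

def is_finer(a, b):
    """True if level a is finer-grained than level b."""
    return LEVELS.index(a) > LEVELS.index(b)

def wholes(entity):
    """Yield every whole reached by following part_of transitively."""
    current = entity.part_of
    while current is not None:
        yield current
        current = current.part_of

def restricted_class(whole_kind, part_kind, entities):
    """whole*part: instances of part_kind that are parts of some whole_kind."""
    return [e for e in entities
            if e.kind == part_kind
            and any(w.kind == whole_kind for w in wholes(e))]

# Toy instances: one ribosome inside a hepatocyte, one inside a neuron.
liver = Entity("liver#1", "liver", "organ")
hepatocyte = Entity("hepatocyte#1", "hepatocyte", "cell", part_of=liver)
neuron = Entity("neuron#1", "neuron", "cell")
r1 = Entity("ribosome#1", "ribosome", "subcellular organelle", part_of=hepatocyte)
r2 = Entity("ribosome#2", "ribosome", "subcellular organelle", part_of=neuron)

hepatocyte_ribosomes = restricted_class("hepatocyte", "ribosome", [r1, r2])
print([e.name for e in hepatocyte_ribosomes])
```

Only the ribosome whose chain of wholes contains a hepatocyte is selected, so no separate class such as "hepatocytic cell’s ribosome" has to be declared in advance.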

Gene Ontology and Granularity

• The cellular component axis contains anatomical entities, with the cell as its highest level of granularity
• Molecular functions and biological processes do not end at that level, at least not for multicellular organisms and higher orders

Gene Ontology and Granularity

• Functions and processes are dependent entities, dependent on the independent entities which bear them
• Since the independent entities, in this case anatomical entities, have the cell as their highest granularity, any biological process which occurs at a granularity coarser than cells cannot get an adequate representation there

Gene Ontology and Granularity

• Processes at cellular and subcellular levels could together be some or all parts of the larger biological processes
• Biological processes like behavior, response to extracellular stimulus, and sex determination clearly have some component processes which occur at the cellular level of granularity, but not all of them do

Gene Ontology and Granularity

• GO has the term extracellular
• Defined as: the space external to the outermost structure of a cell
• Since there is no external limit defined, that space could include virtually all the space within the human body which is not intracellular
• Not a good solution

Gene Ontology and Granularity

• GO does not provide links between the three orthogonal axes, though there are various teams working on it
• Problems arise especially when biological processes need to be linked to cellular components
• Automatic and semi-automatic methods find far too many cellular components eligible as bearers of general biological processes like growth, metabolism, homeostasis
• Another reason to deal with granularity

Gene Ontology and Granularity

• GO entities are not species-specific, except those marked with “sensu”, like cytosolic ribosome (sensu Bacteria)
• This makes it difficult to understand the meaning behind entities like adult behavior
• It could mean a cellular level of granularity when applied to unicellular organisms and an organism level of granularity for human beings

Gene Ontology and Granularity

• One way to get around this problem is to consider the Gene Ontology Annotations, which are species-specific
• Human gene products which are annotated to entities within GO’s orthogonal axes relate those entities specifically to human beings
• This might mean that relations between entities within GO’s orthogonal axes would be different for different species, or may not exist at all
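One possible reading of this idea is to derive links between axes from the gene products of a single species. The Python sketch below assumes that co-annotation of one gene product to a process term and a component term counts as a candidate link. That rule is an assumption, not something the slides specify, and the rows in `ANNOTATIONS` are toy data with hypothetical gene product names rather than real GO annotations.

```python
from collections import defaultdict

# (gene product, species, GO axis, GO term): toy rows, NOT real annotation data.
ANNOTATIONS = [
    ("geneA", "human", "biological_process", "adult behavior"),
    ("geneA", "human", "cellular_component", "cytosol"),
    ("geneB", "yeast", "biological_process", "response to extracellular stimulus"),
    ("geneB", "yeast", "cellular_component", "nucleus"),
]

def cross_axis_links(annotations, species):
    """Pair process and component terms that share a gene product in `species`."""
    # gene product -> axis -> set of terms, restricted to one species
    by_product = defaultdict(lambda: defaultdict(set))
    for product, sp, axis, term in annotations:
        if sp == species:
            by_product[product][axis].add(term)

    links = set()
    for axes in by_product.values():
        for process in axes["biological_process"]:
            for component in axes["cellular_component"]:
                links.add((process, component))
    return links

human_links = cross_axis_links(ANNOTATIONS, "human")
yeast_links = cross_axis_links(ANNOTATIONS, "yeast")
print(human_links)
print(yeast_links)
```

Because the filter runs per species, the resulting link sets can differ between species or be empty, which matches the last bullet above.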

Granularity of Functions and processes

• More complicated than anatomical entities
• Three possible ways to represent the granularity
• On the basis of granularity levels existing for anatomical entities: cellular functions (dependent on cells) exist at the cellular level of granularity
• Subcellular organelle functions, organ functions and so on thus exist at the respective levels of granularity of their bearers

Granularity of Functions and processes

• On the basis of time
• Time periods can be years, months, weeks, days, hours, minutes, seconds and so on
• A process which takes place over a longer period of time has a coarser temporal granularity than a process which takes place over a shorter period of time
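The time-based view can be sketched as a mapping from a process duration to the coarsest time unit that fits into it. The unit list follows the slide. The fixed lengths for months and years, the function name `temporal_granularity`, and the sample durations are assumptions made for illustration.

```python
from datetime import timedelta

# Time units, coarsest first. Month and year lengths are rough approximations.
UNITS = [
    ("years", timedelta(days=365)),
    ("months", timedelta(days=30)),
    ("weeks", timedelta(weeks=1)),
    ("days", timedelta(days=1)),
    ("hours", timedelta(hours=1)),
    ("minutes", timedelta(minutes=1)),
    ("seconds", timedelta(seconds=1)),
]

def temporal_granularity(duration):
    """Return the coarsest unit that fits at least once into the duration."""
    for name, length in UNITS:
        if duration >= length:
            return name
    return "sub-second"

def is_temporally_coarser(a, b):
    """A longer-lasting process has the coarser temporal granularity."""
    return a > b

# Hypothetical durations of two processes.
long_process = timedelta(days=270)
short_process = timedelta(seconds=90)

print(temporal_granularity(long_process))
print(temporal_granularity(short_process))
print(is_temporally_coarser(long_process, short_process))
```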

Granularity of Functions and processes

• On the basis of parthood relations existing between various functions and processes (most complicated)
• Different functions and processes have different numbers of parts and different depths to which one can identify those parts
• Not all instances of a smaller process are parts of a larger process (hexokinase 1 activity: glycolysis, fructose and mannose metabolism, galactose metabolism, starch and sucrose metabolism, and aminosugar metabolism)

Granularity of Functions and processes

• hexokinase 1 activity involved in glycolytic pathway, to represent those cases where hexokinase 1 activity is involved in glycolysis (could be glycolytic pathway*hexokinase 1 activity)
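The restricted process class can be sketched in the same way as the anatomical one: keep only those instances of the activity that occur as part of the chosen pathway. In this Python sketch, `ActivityInstance`, `restricted_activity`, and `restricted_name` are hypothetical names, and the instance list simply contains one toy occurrence per pathway named on the previous slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActivityInstance:
    activity: str  # the smaller process or function
    pathway: str   # the larger process this occurrence is part of

# Pathways listed on the previous slide for hexokinase 1 activity.
PATHWAYS = [
    "glycolysis",
    "fructose and mannose metabolism",
    "galactose metabolism",
    "starch and sucrose metabolism",
    "aminosugar metabolism",
]

# One toy occurrence of the activity per pathway.
INSTANCES = [ActivityInstance("hexokinase 1 activity", p) for p in PATHWAYS]

def restricted_activity(pathway, activity, instances):
    """pathway*activity: occurrences of the activity that are part of the pathway."""
    return [i for i in instances
            if i.activity == activity and i.pathway == pathway]

def restricted_name(pathway, activity):
    """Build a label in the whole*part notation used on the slides."""
    return f"{pathway}*{activity}"

glycolytic = restricted_activity("glycolysis", "hexokinase 1 activity", INSTANCES)
print(restricted_name("glycolytic pathway", "hexokinase 1 activity"))
print(len(glycolytic), "of", len(INSTANCES))
```

Only one of the five toy occurrences falls into the restricted class, which illustrates why not every instance of the smaller process is a part of the larger one.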

Acknowledgements

• Work on this paper was carried out under the auspices of the Wolfgang Paul Program of the Humboldt Foundation and also of the EU Network of Excellence in Semantic Datamining and the project "Forms of Life" sponsored by the Volkswagen Foundation.
• Thanks to Barry Smith, Cornelius Rosse, Onard Mejino, Alan Rector, Jeremy Rogers
