www.julesberman.info



Implementing an RDF Schema for
Pathology Images, From the
Association for Pathology
Informatics
Jules J. Berman, Ph.D., M.D.
APIII, Pittsburgh, PA
Monday, September 10, 2007
7:30 am – 8:30 am
Pathology images have no value unless they are
annotated with information that describes the
image.
Important descriptors of an image might include:
File information
Image capture information
Image format information
Specimen information
Patient information
Pathology information
Region of interest information
The API (Association for Pathology Informatics)
wants to provide anyone using pathology image
data with optional methods for annotating any kind
of pathology image, in any image format they
prefer.
We did not want to create yet another standard
that obligates people to use a particular image
format.
Yet, we wanted to provide methods that could be
understood by colleagues using existing, free
standards for specifying data.
From 2004 to 2007, the API sponsored LDIP, the
Laboratory Digital Imaging Project, which
consisted of API members and imaging software
developers.
The original purpose of LDIP was to develop a
new, open data specification for pathology images.
LDIP had monthly conference calls, and the
minutes of their discussions are available for
anyone to review.
In 2007, after much discussion, the API Council
determined that adequate methods for annotating
images already existed. LDIP was
dissolved, and the API Council adopted the
primary goal of providing the field of pathology
informatics with a document that describes
available open annotation methods.
As a secondary goal, the API would provide a very
short RDF Schema that would permit those who
prefer RDF annotations to type their metadata
under general classes and properties that have
particular relevance to pathologists (more about
this later).
A technical white paper (by Jules Berman and Bill
Moore) that contains detailed methods for
annotating images is published today at:
www.julesberman.info/rdfimage.pdf
This paper is distributed under an open source
license, and can be downloaded, copied, redistributed, and even re-posted at other web sites.
The paper describes 6 levels of image annotation,
organized by increasing difficulty and complexity.
The methods use existing standards (including
RDF, JPEG, Exif, Dublin Core, XML Schema, and W3C
Semantic Image Annotation) and do not create any
new standards, apart from one very short new RDF
Schema document.
Level 1. Simply compose a free-text description
of your image and any other information you'd like
to add, such as your name, and add the
information as a Comment field in the header of the
image file. The Comment does not alter the encoded
image data or the visual form of the image.
When the file is copied, it will retain the header
comment, and anyone receiving the image can
read what you've added, using a simple Perl or
Ruby script provided in the document, or using a
simple extraction program prepared in any
preferred programming language.
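The Level 1 idea can be sketched in Python for JPEG files. JPEG stores free-text comments in a COM marker segment (0xFFFE). This is a minimal sketch, not the Perl or Ruby script from the white paper. It assumes a well-formed JPEG that begins with the SOI marker, and it skips edge cases such as fill bytes between markers. The function names `add_comment` and `read_comments` are hypothetical.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG stream
COM = 0xFE         # second byte of the comment marker 0xFFFE
SOS = 0xDA         # start-of-scan marker: entropy-coded image data follows
EOI = 0xD9         # end-of-image marker

def add_comment(jpeg: bytes, text: str) -> bytes:
    """Return a copy of `jpeg` with one COM segment inserted right after SOI.

    Only header bytes are added; the entropy-coded image data is untouched.
    """
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG stream (missing SOI marker)")
    payload = text.encode("utf-8")
    if len(payload) > 65533:
        raise ValueError("comment too long for a single COM segment")
    # The 2-byte big-endian length field counts itself plus the payload.
    segment = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return SOI + segment + jpeg[2:]

def read_comments(jpeg: bytes) -> list[str]:
    """Return the text of every COM segment found before the image data."""
    comments = []
    pos = 2  # skip SOI
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in (SOS, EOI):  # header segments end here
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == COM:
            body = jpeg[pos + 4:pos + 2 + length]
            comments.append(body.decode("utf-8", errors="replace"))
        pos += 2 + length  # 2 marker bytes + segment length
    return comments

# A toy byte string with JPEG framing only (SOI, an empty SOS segment, EOI).
# It is not a displayable image.
toy = b"\xff\xd8\xff\xda\x00\x02\xff\xd9"
tagged = add_comment(toy, "Lymph node, H&E stain, 40x objective")
print(read_comments(tagged))  # ['Lymph node, H&E stain, 40x objective']
```

The comment goes in right after SOI, so everything from the original second marker onward is copied byte for byte.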
Level 2. Insert the Dublin Core file descriptors into
your Comment.
The Dublin Core is a basic set of metadata elements
designed by librarians to provide a minimal
description of the contents of an electronic document.
When the file is copied, it will retain the Dublin Core
metadata, and anyone receiving the image can
read what you've added, using a simple Perl or
Ruby program provided in the document, or using a
simple extraction program prepared in any
preferred programming language.
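Level 2 needs a way to write Dublin Core descriptors as comment text. One possible serialization is a small XML fragment in the Dublin Core element namespace (http://purl.org/dc/elements/1.1/). The wrapper element name `metadata` and both helper functions are hypothetical, and the white paper may use a different layout. Because the result is ordinary text, it can be stored in a JPEG COM segment like any other comment.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {"title", "creator", "subject", "description", "publisher",
               "contributor", "date", "type", "format", "identifier",
               "source", "language", "relation", "coverage", "rights"}

def dublin_core_comment(fields: dict[str, str]) -> str:
    """Serialize Dublin Core fields as an XML fragment suitable for a comment."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def parse_dublin_core(comment: str) -> dict[str, str]:
    """Recover Dublin Core fields from a comment made by dublin_core_comment.

    Repeated elements (e.g. several creators) collapse to the last value.
    """
    prefix = f"{{{DC_NS}}}"
    return {child.tag[len(prefix):]: child.text or ""
            for child in ET.fromstring(comment)
            if child.tag.startswith(prefix)}

fields = {"title": "Lymph node biopsy, H&E", "creator": "Jules Berman",
          "date": "2007-09-10", "format": "image/jpeg"}
comment = dublin_core_comment(fields)
print(comment)
```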
Level 3. Insert an RDF (Resource Description
Framework) document into your image file.
The RDF document can be extracted from the image,
and its triples can be parsed and integrated with
other data.
RDF, developed by the W3C (World Wide Web
Consortium), can be used to specify any kind of data.
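Once the RDF/XML text has been pulled out of the image, its triples can be listed with the Python standard library. This is a deliberately minimal sketch. It handles only `rdf:Description` elements with an `rdf:about` subject and simple property elements. A full RDF/XML parser, such as the one in the rdflib package, covers the rest of the syntax. The `path` namespace (http://example.org/pathology#) and its property names are hypothetical placeholders, not terms from the API RDF Schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def clark_to_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' tag into 'namespacelocal'."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag

def triples_from_rdfxml(text: str) -> list[tuple[str, str, str]]:
    """Extract (subject, predicate, object) triples from simple RDF/XML.

    The object is the rdf:resource URI if present, else the literal text.
    """
    triples = []
    for desc in ET.fromstring(text).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about", "")
        for prop in desc:
            obj = prop.get(f"{{{RDF}}}resource", prop.text or "")
            triples.append((subject, clark_to_uri(prop.tag), obj))
    return triples

# Hypothetical annotation of a hypothetical image URL.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:path="http://example.org/pathology#">
  <rdf:Description rdf:about="http://example.org/images/slide42.jpg">
    <path:specimen>lymph node</path:specimen>
    <path:stain>H&amp;E</path:stain>
  </rdf:Description>
</rdf:RDF>"""

for triple in triples_from_rdfxml(SAMPLE):
    print(triple)
```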
RDF files are collections of statements expressed as data triples:
<identified subject><metadata><data>
“Jules Berman” “blood glucose level” “85”
“Mary Smith” “eye color” “brown”
“Samuel Rice” “eye color” “blue”
“Jules Berman” “eye color” “brown”
When you bind a key/value pair to an identified subject, you move
from the realm of data structure (e.g., XML) into the realm of data
meaning.
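The four statements above can be held as plain Python tuples. Binding each metadata/data pair to its subject is what makes queries like "what do we know about Jules Berman?" possible. This sketch keeps the plain-string subjects from the slide. Real RDF identifies subjects with URIs. The helper names are hypothetical.

```python
Triple = tuple[str, str, str]  # (identified subject, metadata, data)

TRIPLES: list[Triple] = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
]

def describe(subject: str, store: list[Triple]) -> dict[str, str]:
    """Collect every metadata/data pair bound to one identified subject."""
    return {meta: data for subj, meta, data in store if subj == subject}

def subjects_with(meta: str, data: str, store: list[Triple]) -> list[str]:
    """List the subjects for which a given metadata/data pair is asserted."""
    return sorted(subj for subj, m, d in store if m == meta and d == data)

print(describe("Jules Berman", TRIPLES))
# {'blood glucose level': '85', 'eye color': 'brown'}
print(subjects_with("eye color", "brown", TRIPLES))
# ['Jules Berman', 'Mary Smith']
```

On its own, the value "brown" says nothing. The query results carry meaning only because every value stays attached to a subject and a metadata term.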
RDF permits data to be merged between different files:
Medical file:
“Jules Berman” “blood glucose level” “85”
“Mary Smith” “eye color” “brown”
“Samuel Rice” “eye color” “blue”
“Jules Berman” “eye color” “brown”
Hat file:
“Sally Frann” “hat size” “8”
“Jules Berman” “hat size” “9”
“Fred Garfield” “hat size” “9”
“Fred Garfield” “hat_type” “bowler”
Merged Jules Berman database:
“Jules Berman” “blood glucose level” “85”
“Jules Berman” “eye color” “brown”
“Jules Berman” “hat size” “9”
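Because every triple is a self-contained statement, merging the medical file and the hat file reduces to a set union. Selecting one subject then yields the merged Jules Berman database. This sketch assumes both files spell the subject identically. In practice RDF relies on shared URIs for that. The function names are hypothetical.

```python
Triple = tuple[str, str, str]

MEDICAL: set[Triple] = {
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
}
HATS: set[Triple] = {
    ("Sally Frann", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Fred Garfield", "hat size", "9"),
    ("Fred Garfield", "hat_type", "bowler"),
}

def merge(*stores: set[Triple]) -> set[Triple]:
    """Union of triple sets; identical statements from different files collapse."""
    return set().union(*stores)

def about(subject: str, store: set[Triple]) -> list[Triple]:
    """All triples whose subject matches, sorted for stable output."""
    return sorted(t for t in store if t[0] == subject)

for triple in about("Jules Berman", merge(MEDICAL, HATS)):
    print(triple)
# ('Jules Berman', 'blood glucose level', '85')
# ('Jules Berman', 'eye color', 'brown')
# ('Jules Berman', 'hat size', '9')
```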
Level 4. Insert your image into an RDF document.
The image can be extracted from the RDF
document.
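For Level 4, one way to carry image binaries inside an RDF document is a base64-encoded literal typed as XML Schema's `base64Binary`. This is a minimal sketch under that assumption. The `path:imageData` property, its namespace, and the helper names are hypothetical, and the white paper may encode the image differently.

```python
import base64
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PATH = "http://example.org/pathology#"  # hypothetical annotation namespace
XSD_BASE64 = "http://www.w3.org/2001/XMLSchema#base64Binary"
ET.register_namespace("rdf", RDF)
ET.register_namespace("path", PATH)

def embed_image(image: bytes, about: str) -> str:
    """Return an RDF/XML document that carries the image as a base64 literal."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    data = ET.SubElement(desc, f"{{{PATH}}}imageData",
                         {f"{{{RDF}}}datatype": XSD_BASE64})
    data.text = base64.b64encode(image).decode("ascii")
    return ET.tostring(root, encoding="unicode")

def extract_image(rdf_text: str) -> bytes:
    """Recover the original image bytes from a document made by embed_image."""
    node = ET.fromstring(rdf_text).find(f".//{{{PATH}}}imageData")
    if node is None or not node.text:
        raise ValueError("no embedded image data found")
    return base64.b64decode(node.text)

fake_jpeg = b"\xff\xd8\xff\xd9"  # toy bytes standing in for a real image file
doc = embed_image(fake_jpeg, "http://example.org/images/slide42.jpg")
print(extract_image(doc) == fake_jpeg)  # True
```

Base64 inflates the binary by about a third. That overhead is one reason to prefer pointing to the image (Level 5) when the files can live separately.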
Level 5. Point to your image file from an RDF
document.
The RDF document and the image file (for example,
a JPEG) can be separate documents linked by URLs.
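For Level 5, the RDF statements can simply be *about* the image's URL via `rdf:about`, leaving the JPEG as a separate file. This sketch writes such a document and resolves the (possibly relative) image links against the RDF document's own location. The `path` namespace, the URLs, and the helper names are hypothetical. The resolved URL could then be fetched with `urllib.request.urlopen`, which is not done here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PATH = "http://example.org/pathology#"  # hypothetical annotation namespace
ET.register_namespace("rdf", RDF)
ET.register_namespace("path", PATH)

def annotate_by_url(image_url: str, annotations: dict[str, str]) -> str:
    """Write an RDF document whose statements are about the image at image_url."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": image_url})
    for name, value in annotations.items():
        ET.SubElement(desc, f"{{{PATH}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def image_links(rdf_text: str, base_url: str = "") -> list[str]:
    """Return the image URLs an RDF document points to, resolved against base_url."""
    links = []
    for desc in ET.fromstring(rdf_text).iter(f"{{{RDF}}}Description"):
        about = desc.get(f"{{{RDF}}}about")
        if about:
            links.append(urljoin(base_url, about))
    return links

doc = annotate_by_url("../images/slide42.jpg", {"stain": "H&E"})
print(image_links(doc, base_url="http://example.org/rdf/slide42.rdf"))
# ['http://example.org/images/slide42.jpg']
```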
Level 6. Break up your annotative data and your
image binaries into multiple documents that can
point to one another and that can include or
exclude RDF or image binary data as desired.
The RDF data can be distributed into multiple
documents, and each RDF document may point to
more than one image file.
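When annotations are spread across several RDF documents, each of which may describe several images, a reader can rebuild a per-image view by indexing statements on their `rdf:about` URL. This is a minimal stdlib sketch under the same simplified RDF/XML assumptions as before. The sample documents, the `path` namespace, and `index_by_image` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def index_by_image(*rdf_docs: str) -> dict[str, list[tuple[str, str]]]:
    """Map each described image URL to its (predicate, object) pairs across documents."""
    index: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
    for doc in rdf_docs:
        for desc in ET.fromstring(doc).iter(f"{{{RDF}}}Description"):
            about = desc.get(f"{{{RDF}}}about", "")
            for prop in desc:
                # Convert '{namespace}local' to a full predicate URI.
                pred = prop.tag[1:].replace("}", "", 1)
                obj = prop.get(f"{{{RDF}}}resource", prop.text or "")
                index[about].append((pred, obj))
    return dict(index)

HEAD = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:path="http://example.org/pathology#">')
# One document pointing to two images...
DOC_A = HEAD + """
  <rdf:Description rdf:about="http://example.org/images/slide1.jpg">
    <path:stain>H&amp;E</path:stain>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/images/slide2.jpg">
    <path:stain>PAS</path:stain>
  </rdf:Description>
</rdf:RDF>"""
# ...and a second document adding a diagnosis for one of them.
DOC_B = HEAD + """
  <rdf:Description rdf:about="http://example.org/images/slide1.jpg">
    <path:diagnosis>reactive lymph node</path:diagnosis>
  </rdf:Description>
</rdf:RDF>"""

index = index_by_image(DOC_A, DOC_B)
print(index["http://example.org/images/slide1.jpg"])
```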
By annotating our images, we can ensure that the
image conveys meaning and value.
By using RDF, we can ensure that the individual
triples can be integrated with heterogeneous data
sources beyond those of images.
By using pre-existing international standards for
describing data, we attain interoperability and avoid
the confusion and complexity that occurs whenever
a new standard is created.
See: www.julesberman.info/rdfimage.pdf