www.julesberman.info

Open Source Solutions for
Tissue Banking Informatics
Jules J. Berman, Ph.D., M.D.
INFORMATICS FOR REPOSITORIES
Wednesday, May 21, 2008
3:30 pm – 4:05 pm
Approaches to finding open source
solutions
1. Generalize (don't specialize). Wherever
possible, don't think of your tissue repository
problems as being unique. Try to think of your
problems as instances of very general informatics
problems.
In most cases, the same open source solutions
that work for bioinformaticians, astronomers, and
factory inventories will likely work for you.
Approaches to finding open source
solutions
2. Learn a popular open source programming
language that is easy to learn and that is
supplemented by an enthusiastic biomedical
community
Perl
Python
Ruby
Approaches to finding open source
solutions
3. Use open source, unencumbered
nomenclatures, codes, and syntactic formats.
Otherwise, you can't share or post data through the web.
MeSH (standard, open source, free)
UMLS (standard, encumbered)
SNOMED (standard, encumbered)
Neoplasm Classification (non-standard, open
source, free, standard syntax – XML, RDF)
http://www.julesberman.info/
Approaches to finding open source
solutions
4. Use an open source and general data syntax
HTML (formatting and linking)
XML (describing data)
RDF (getting meaning from described data)
All data can be specified using RDF, developed by the
W3C.
RDF files are collections of statements expressed as data triples
<identified subject><metadata><data>
“Jules Berman” “blood glucose level” “85”
“Mary Smith” “eye color” “brown”
“Samuel Rice” “eye color” “blue”
“Jules Berman” “eye color” “brown”
When you bind a key/value pair to a specified object, you're moving
from the realm of data structure (i.e., XML) into the realm of data
meaning.
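As a minimal sketch of the triple idea, the four statements listed here can be held as plain Python tuples and queried by subject or by assertion. The names Triple, describe, and subjects_with are illustrative only; a real RDF file would identify each subject with a unique URI rather than a bare name.

```python
from collections import namedtuple

# A triple binds a metadata/data pair to an identified subject.
Triple = namedtuple("Triple", ["subject", "metadata", "data"])

triples = [
    Triple("Jules Berman", "blood glucose level", "85"),
    Triple("Mary Smith", "eye color", "brown"),
    Triple("Samuel Rice", "eye color", "blue"),
    Triple("Jules Berman", "eye color", "brown"),
]


def describe(subject, triples):
    """Collect every metadata/data pair asserted about one subject."""
    return {t.metadata: t.data for t in triples if t.subject == subject}


def subjects_with(metadata, data, triples):
    """List every subject for which the given assertion was made."""
    return [t.subject for t in triples
            if t.metadata == metadata and t.data == data]


jules = describe("Jules Berman", triples)
brown_eyed = subjects_with("eye color", "brown", triples)
```

Because each value stays attached to its subject, a question such as "who has brown eyes?" can be answered without knowing anything about how the file was laid out.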
RDF permits data to be merged between different files
Medical file:
“Jules Berman” “blood glucose level” “85”
“Mary Smith” “eye color” “brown”
“Samuel Rice” “eye color” “blue”
“Jules Berman” “eye color” “brown”
Hat file:
“Sally Frann” “hat size” “8”
“Jules Berman” “hat size” “9”
“Fred Garfield” “hat size” “9”
“Fred Garfield” “hat_type” “bowler”
Merged Jules Berman database:
“Jules Berman” “blood glucose level” “85”
“Jules Berman” “eye color” “brown”
“Jules Berman” “hat size” “9”
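The merge on this slide can be sketched in a few lines of Python. This is an illustration, not a full RDF tool: the function name merge_for_subject is hypothetical, and it assumes that the same subject string in both files really refers to the same person (in practice a unique identifier would be needed).

```python
medical_file = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Mary Smith", "eye color", "brown"),
    ("Samuel Rice", "eye color", "blue"),
    ("Jules Berman", "eye color", "brown"),
]

hat_file = [
    ("Sally Frann", "hat size", "8"),
    ("Jules Berman", "hat size", "9"),
    ("Fred Garfield", "hat size", "9"),
    ("Fred Garfield", "hat_type", "bowler"),
]


def merge_for_subject(subject, *files):
    """Gather every triple about one subject from any number of files."""
    merged = []
    for triples in files:
        for triple in triples:
            # Keep triples about the subject, skipping exact duplicates.
            if triple[0] == subject and triple not in merged:
                merged.append(triple)
    return merged


merged = merge_for_subject("Jules Berman", medical_file, hat_file)
for triple in merged:
    print(triple)
```

Neither file had to know about the other. The shared subject is the only thing that ties the statements together.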
Approaches...
5. Use open source utilities – not software
applications (open source or otherwise)
Utilities are simple programs that do one type of job
very well. They often work from the command line (i.e., no
GUI).
Once you've mastered a dozen or so utilities, you
can handle most informatics tasks that you'll come
across.
Applications are often complex and seldom provide
the functionality you need (now or in the future).
Approaches ...
6. Learn the algorithms for your discipline.
Algorithms are process descriptions that work
every time.
Most informatics algorithms can be implemented
in under ten lines of software code
You can think of software applications as many
algorithms working under a GUI
If you really understand algorithms, you can make
important contributions to your field.
Approaches...
7. De-emphasize standards.
Most standards are difficult to understand, and
there are many of them, often covering obscure
domains. Many standards are just bad.
Data kept in a standard today may be nonstandard legacy data tomorrow.
Unlike physical standards, data standards are
transformable (so why fuss over any one
standard?).
Standards can be encumbered.
Specifications are often a better solution than
standards
Specifications are just descriptions of your data.
A specification requires a common language for
describing data (so that you and your computer can
understand what it's trying to convey).
Specifications give you enormous freedom to create
and describe new and unconventional data objects.
This is usually done in RDF.
If you've specified your data well, you can port
between standards when you need to.
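As a rough sketch of what porting can look like, the same list of triples can be written out in two different target layouts using only the Python standard library. The element and attribute names (statements, statement, subject, metadata) are made up for illustration and do not correspond to any published standard.

```python
import json
import xml.etree.ElementTree as ET

triples = [
    ("Jules Berman", "blood glucose level", "85"),
    ("Jules Berman", "eye color", "brown"),
]


def to_json(triples):
    """Port the triples to a JSON document grouped by subject."""
    grouped = {}
    for subject, metadata, data in triples:
        grouped.setdefault(subject, {})[metadata] = data
    return json.dumps(grouped, sort_keys=True)


def to_xml(triples):
    """Port the same triples to a flat XML document (names are illustrative)."""
    root = ET.Element("statements")
    for subject, metadata, data in triples:
        statement = ET.SubElement(
            root, "statement", subject=subject, metadata=metadata
        )
        statement.text = data
    return ET.tostring(root, encoding="unicode")


json_text = to_json(triples)
xml_text = to_xml(triples)
```

The data is described once, as triples, and each target format is just another short conversion function.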
Example: Pathology image annotation
Important descriptors of an image might include:
File information
Image capture information
Image format information
Specimen information
Patient information
Pathology information
Region of interest information
JPEG is an image format that is used by millions
of people in all types of professions, including the
medical profession
JPEG can now be used without worrying about IP
issues
You can put any information you want into the
header of a JPEG image (including an RDF
document) so that specified clinical/pathological
information can be conveyed with the image
Because images are non-physical, it is usually easy to
interconvert image formats
By annotating our images, we can ensure that the
image conveys meaning and value
By using RDF, we can ensure that the individual
triples can be integrated with heterogeneous data
sources beyond those of images.
By using pre-existing international general
standards for describing any kind of data, we attain
interoperability and avoid the confusion and
complexity that occurs whenever a new standard is
created.
See: http://www.julesberman.info/spec2img.htm
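One way to carry annotation text inside a JPEG file is a comment (COM) segment in the header. The sketch below uses only the Python standard library and is not necessarily the method described at the link above. The function names add_comment and read_comments are hypothetical. It assumes the annotation fits in a single segment, and it places the comment directly after the start-of-image marker, which most viewers tolerate but some strict readers may not, since they expect an APP0 segment first.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker
COM = b"\xff\xfe"  # comment-segment marker


def add_comment(jpeg_bytes, text):
    """Return a copy of the JPEG with text stored in a COM segment after SOI."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream")
    payload = text.encode("utf-8")
    if len(payload) > 65533:  # the 2-byte length field counts itself
        raise ValueError("comment too long for one segment")
    segment = COM + struct.pack(">H", len(payload) + 2) + payload
    return SOI + segment + jpeg_bytes[2:]


def read_comments(jpeg_bytes):
    """Return the text of every COM segment found before the scan data."""
    comments = []
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image
            break
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        if marker == 0xFE:
            body = jpeg_bytes[pos + 4:pos + 2 + length]
            comments.append(body.decode("utf-8"))
        pos += 2 + length
    return comments


# A degenerate stream (SOI + EOI only) stands in for a real image file.
empty_jpeg = b"\xff\xd8\xff\xd9"
annotation = '"image_001" "specimen" "colon biopsy"'  # hypothetical triple
annotated = add_comment(empty_jpeg, annotation)
```

With a real image, the bytes would come from a file opened in binary mode, and the annotation string would be the serialized RDF document.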
Would you like to write a
Tissue Repository/Tissue Informatics
book?
[email protected]