caDSR Content Development


Presentation Title: The NCI’s cancer Data Standards
Repository (caDSR): Harmonization of Ontology,
Information Models and Metadata
9th Open Forum on Metadata Registries
Harmonization of Terminology, Ontology and Metadata
20th – 22nd March, 2006, Kobe, Japan.
Day: March 22
Slot No. P21
Name: Denise Warzel
Organization: National Cancer Institute Center for Bioinformatics
Who is the NCI Center for Bioinformatics?
– Part of US Government National Institutes of Health (NIH)
– The Center for Bioinformatics is the National Cancer
Institute’s strategic and tactical arm for research information
management
– We collaborate with both intramural and extramural groups
– Mission to integrate and harmonize disparate research data
– Production, service-oriented data management
9th Open Forum for Metadata Registry, Kobe, 2006
NCI Goal: Relieve pain, suffering and death due to cancer by the year 2015
– Enable investigators and research teams nationwide, or worldwide, to combine and leverage their findings and expertise in order to meet this goal
Our advantage: A homogeneous community of
interest with a common focus, willing to adopt
standards and recommended best practices
Critical Success Factor: Semantic Interoperability
Interoperability: the ability of a system to access and use the parts or equipment of another system
– Syntactic interoperability
– Semantic interoperability
Emphasis on machine interoperability over a cancer bioinformatics Grid
Bruce Bargmeyer
We have come to join … Terminology & Metadata … & Information Models
The NCI’s
Cancer Data Standards Repository (caDSR)
An XMDR approach to Harmonization of
Terminology, Information Models and
Metadata
Achieved through:
- 3 layers of semantics
- Infrastructure and compatibility guidelines
- Model Driven Architecture
- Open Standards
- Tools
- Community Governance
caBIG Compatibility Guidelines
[Figure: compatibility guidelines, with one SYNTACTIC and three SEMANTIC categories]
MDA Approach

Analyze the problem space and develop the artifacts for each scenario
– Use Cases
Use Unified Modeling Language (UML) to standardize model representations and artifacts. Design the system by developing artifacts based on the use cases
– Class Diagram: Information Model
– Sequence Diagram: Temporal Behavior
Use meta-model tools to generate the code
Limitations of MDA
– Limited expressivity for semantics
– No facility for runtime semantic metadata management
caCORE
MDA plus a whole lot more!
Open software
Open development
ISO/IEC 11179
caCORE Tools
– NCI Terminology Browser to search and navigate NCI Thesaurus and other terminologies curated by EVS
– caDSR: repository and administration tool
– CDE Browser to search for, view and download
– Side-by-Side Compare
– Form Builder to create user-specified collections of CDEs
– Sentinel Tool to generate end-user ‘Alerts’ triggered by metadata changes
– caCORE SDK, a toolkit to create “semantically integrated” applications: all exposed API elements have runtime-accessible metadata that defines the meaning of the elements using controlled vocabularies.
Access, Develop, Manage, Consume
caCORE Toolkit
SDK Components
– UML Modeling Tool (any with XMI export)
– Semantic Connector (concept binding utility)
– UML Loader (model registration in caDSR)
– Codegen (middleware code generator)
– Security Adaptor (Common Security Module)
caCORE SDK Generates a caBIG Silver-Compliant System
caCORE Semantic Components
3 Layers of Semantics
Application Domain Model
(UML)
Common Data Elements
(11179)
Enterprise Vocabulary
(OWL-DL)
SECURITY
Application Objects
(UML)
Common Data Elements
(11179)
– What do all those data classes and attributes actually mean, anyway?
– Data descriptors or “semantic metadata” required
– Computable, commonly structured, reusable units of metadata are “Common Data Elements” or CDEs.
– NCI uses the ISO/IEC 11179 standard for metadata structure and registration
– Semantics drawn from Enterprise Vocabulary Service resources
Enterprise Vocabulary
Description Logic
Concept Code
Relationships
Preferred Name
Definition
Synonyms
Enterprise Vocabulary
Description Logic
“Carcinoma” Disease_Associated_with_Disease “Lytic Bone
Lesions”
Relationships
“TP53” Gene_associated_with_Disease “Breast Carcinoma”

Semantic Types:
– Gene_associated_with_Disease C43780 Molecular
abnormalities in the gene may be associated with the
manifestation of disease. The role is used to assert a
link between gene and disease and is considered to
have clinical relevance. The domain and range kind
for this role are Gene_Kind and
Findings_and_Disorders_Kind, respectively.
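The domain and range constraint in the role definition above can be sketched in code. This is a minimal Python sketch, not the EVS implementation: the `Concept` and `Role` classes, the `assert_link` method, and the placeholder codes for TP53 and Breast Carcinoma are assumptions made for illustration. Only the role name, its code C43780, and the two kinds come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    code: str            # concept code in the vocabulary
    preferred_name: str
    kind: str            # top-level kind the concept belongs to


@dataclass(frozen=True)
class Role:
    name: str
    code: str
    domain_kind: str     # kind required of the source concept
    range_kind: str      # kind required of the target concept

    def assert_link(self, source: Concept, target: Concept) -> tuple:
        """Return a (source, role, target) triple if both kinds fit the role."""
        if source.kind != self.domain_kind:
            raise ValueError(f"{source.preferred_name} is not a {self.domain_kind}")
        if target.kind != self.range_kind:
            raise ValueError(f"{target.preferred_name} is not a {self.range_kind}")
        return (source.preferred_name, self.name, target.preferred_name)


GENE_ASSOCIATED_WITH_DISEASE = Role(
    name="Gene_associated_with_Disease",
    code="C43780",
    domain_kind="Gene_Kind",
    range_kind="Findings_and_Disorders_Kind",
)

# Placeholder codes: the slide does not give codes for these two concepts.
tp53 = Concept("PLACEHOLDER-1", "TP53", "Gene_Kind")
breast_carcinoma = Concept(
    "PLACEHOLDER-2", "Breast Carcinoma", "Findings_and_Disorders_Kind"
)

triple = GENE_ASSOCIATED_WITH_DISEASE.assert_link(tp53, breast_carcinoma)
print(triple)
```

Swapping the arguments raises `ValueError`, which is the point of declaring a domain and range: a link from a disease to a gene is rejected for this role.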
How? Cancer Ontologic Research
Environment (caCORE)
• MDA - UML domain models (Blue)
Model Driven Architecture (MDA)
Simplify application development
Embed semantic integration – annotate model with cancer
Concepts
XMDR
• caDSR ISO/IEC 11179 Metadata Registry (Gold)
- Common Data Elements
- provides the semantics for data elements
- NCI expanded 11179 to register models: UML, forms, protocols, analytic services, analytic tools, data services
• EVS Shared Terminology (Red)
- Enterprise Vocabulary Services (EVS)
- Standard terms and definitions
- Cancer specific ontology
Created a UML to caDSR Mapping
ValueDomain:Enumeration
caCORE SDK - Common Methodology Workflow
[Workflow diagram. Artifacts: UML Model, XMI File, EVS Report, Fixed XMI, Verified Annotated Fixed XMI, Approved Annotated Fixed XMI. Tools and services: Semantic Integration Workbench (SIW), EVS Terminology Services, caDSR Services, Code Generator, UML Loader (Stage and Prod), caDSR STAGE, caDSR Production, Public APIs for Metadata Retrieval. Steps: Load to Stage, Compatibility Review. Decision points with YES/NO branches include "Using CodeGen?" and "Successful Test?".]
caCORE
Application Domain Models
(UML)
Computable Interoperability
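[Figure: two UML classes, Agent and Drug, from two different models (“My model”, “Your model”), each annotated with concept code C1708; the attribute nSCNumber carries the code C1708:C41243. Attributes shown: name, id, nSCNumber, NDCCode, CTEPName, approvalDate, FDAIndID, approver, IUPACName, fdaCode.]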
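One way to read this slide: two independently built models become comparable once their classes and attributes carry shared concept codes. The Python sketch below illustrates that idea under stated assumptions. The dictionary layout, the function names, the assignment of attributes to classes, and the `nscId` attribute on Drug are hypothetical; only the class names, the codes C1708 and C1708:C41243, and the attribute name nSCNumber come from the slide.

```python
# "My model": class Agent annotated with C1708.
MY_MODEL = {
    "Agent": {
        "concept": "C1708",
        "attributes": {
            "name": None,                   # annotation not shown on the slide
            "nSCNumber": "C1708:C41243",
        },
    }
}

# "Your model": class Drug annotated with the same concept code.
YOUR_MODEL = {
    "Drug": {
        "concept": "C1708",
        "attributes": {
            "nscId": "C1708:C41243",        # hypothetical attribute, for illustration
            "NDCCode": None,                # annotation not shown on the slide
        },
    }
}


def annotated_elements(model):
    """Yield (path, concept_code) for every annotated class and attribute."""
    for class_name, spec in model.items():
        yield class_name, spec["concept"]
        for attr_name, code in spec["attributes"].items():
            if code is not None:
                yield f"{class_name}.{attr_name}", code


def match_by_concept(model_a, model_b):
    """Pair elements of two models that share the same concept code."""
    index = {}
    for path, code in annotated_elements(model_b):
        index.setdefault(code, []).append(path)
    return [
        (path, other, code)
        for path, code in annotated_elements(model_a)
        for other in index.get(code, [])
    ]


matches = match_by_concept(MY_MODEL, YOUR_MODEL)
for mine, yours, code in matches:
    print(f"{mine} <-> {yours} via {code}")
```

The names Agent and Drug never need to agree; the match is made entirely on the concept codes, which is what makes the interoperability computable.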
Tying it all together: The caCORE
semantic management framework
[Figure: three linked layers. Application Objects map to Common Data Elements (CDEs), which map to Description Logic concept codes in the Enterprise Vocabulary. CDE identifiers shown: 2223333, 2223866, 2223869, 2223870, 2223871. Concept codes shown: C1708, C1708:C41243, C1708:C25393, C1708:C25683, C1708:C42614.]
caCORE Infrastructure
Public APIs
Domain object metadata
Data
Elements
Harmonized
Common data elements
(CDEs)
Vocabulary for
CDE specification
Dictionary, thesaurus
services
Kevin Keck
XMDR 11179 Edition 3
Concepts and Relationships
*Concept Use and Integration with 11179 Part 3, Edition 2
Context: caCORE
Classification Schemes: caDSRTraining
Conceptual Domain: Agent
Object Class: Chemopreventive Agent
Property: NSCNumber
Representation: Code
Data Element Concept: Chemopreventive Agent NSC Number
Value Domain: NSC Code
Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol
Data Element: Chemopreventive Agent Name
Concept Use and Integration
Everything in Red in the caDSR is directly associated with a CONCEPT
UML Model/Package
Conceptual Domain
Drugs and Chemicals
C1913
UML datatype
or Enumeration
Classification Schemes
caDSR Training
UML Class
Valid Values
Object Class
Chemopreventive Agent
C1892
Data Element Concept
Chemopreventive Agent Name
Property
Name
C42614
UML attribute
Value Domain
Drug Name Text
Anethole Trithione C246
Cyclooxygenase Inhibitor C1323
Ginger C2691
Green Tea C2694
Iloprost C48397
…
Ursodiol C1818
Representation
Name
C42614
Data Element: Chemopreventive Agent Drug Name
caDSR 11179 ID: 2008765v1.0
Semantic Signature: C1913.C1892.C42614.C42614.C246.C1323….
(ConceptualDomain.ObjectClass.Property.Representation.Values)
* based on ISO/IEC 11179 Part 3 Metamodel
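The semantic signature on this slide is a dot-joined sequence of concept codes in the order ConceptualDomain.ObjectClass.Property.Representation.Values. A minimal Python sketch of that construction follows; the function name and argument layout are assumptions, and the value list contains only the codes printed on the slide, so the values elided by "…" are not represented.

```python
def semantic_signature(conceptual_domain, object_class, prop, representation, values):
    """Join concept codes in the order given on the slide, separated by dots."""
    return ".".join([conceptual_domain, object_class, prop, representation, *values])


# Codes as listed on the slide for "Chemopreventive Agent Drug Name".
signature = semantic_signature(
    conceptual_domain="C1913",   # Drugs and Chemicals
    object_class="C1892",        # Chemopreventive Agent
    prop="C42614",               # Name
    representation="C42614",     # Name
    values=[
        "C246",    # Anethole Trithione
        "C1323",   # Cyclooxygenase Inhibitor
        "C2691",   # Ginger
        "C2694",   # Green Tea
        "C48397",  # Iloprost
        "C1818",   # Ursodiol (values elided on the slide are omitted here)
    ],
)
print(signature)
```

Because the signature is built only from concept codes, two data elements with different names can be compared by comparing their signatures, assuming both were annotated against the same vocabulary.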
Where have we been? Where are we
now?…& where are we planning to go?
caGrid
System manuals
caCORE
Data dictionaries
caDSR
11179 E1
XML & related standards
EVS/caDSR
11179 E2
XMDR Project
11179 E3
EVS
Complex semantics management
Data engineering/XML Data
Semantics management for data
Data Standards/Data Administration
Acknowledgements

NCICB
– Peter Covitz
– Denise Warzel
– George Komatsoulis
– Frank Hartel
– Sherri De Coronado
– Gilberto Fragoso
Semantic Bits
– Ram Chilukuri
MSD
– Nicole Thomas
ScenPro
– Bill McCurry
– Tom Phillips
– Robert Harding
– Jennifer Brush
– Larry Hebel
– Smita Hastak
Oracle
– Steve Alred
– Prerna Aggarwal
– Christophe Ludet
– Shaji Kakkodi
– Jane Jiang
– Anwar Anhad
– Jennifer Dong
XMDR
– Bruce Bargmeyer (LBNL)
– Kevin Keck (LBNL)
– Frank Olken (LBNL)
– John McCarthy (LBNL)
– Karlo Berket (LBNL)
– Harold Solbrig (Mayo)
– Gayle Hodge (USGS)
– Denise Warzel (NCI)
– Larry Fitzwater (EPA)
– Nancy Lawler (DOD)
– Sam Chance (DOD)
ISO
– ISO/IEC 11179 Information Technology - Metadata Registries (MDR) Parts 1-6, 2002(E) +
NCICB URL
– http://ncicb.nci.nih.gov/infrastructure/cacore_overview