The Fifth China-US Roundtable on Scientific Data Cooperation October 27-28, 2011, Beijing, China Biomedical Data Integration and Knowledgebase Lei Liu, Ph.D. Shanghai Center for Bioinformation.


The Fifth China-US Roundtable on Scientific Data Cooperation
October 27-28, 2011, Beijing, China
Biomedical Data Integration and Knowledgebase
Lei Liu, Ph.D.
Shanghai Center for Bioinformation Technology and
Shanghai Institutes for Biological Sciences, CAS
Part 1: Ontology
Knowledge Management
Data Integration and Exchange
Semantic Interoperability
Decision Support and Reasoning
Knowledge Management
Annotating Data and Resources
Accessing Biomedical Information
Mapping across Biomedical Ontologies
Ontology
Data Exchange & Semantic Interoperability
Information and Data Integration
Semantic Interoperability
Ontology
Decision Support and Reasoning
Data Selection
Data Aggregation
Decision Support
Natural Language Processing Applications
Knowledge Discovery
Ontology
Example: Ontology Server
Example: Building Knowledge Base
Edit
Example: Building Knowledge Base
Search tool
Part 2: SNOMED CT
SNOMED CT
CAP: Systematized Nomenclature of Medicine Reference Terminology (SNOMED RT)
NHS: Clinical Terms (CT)
SNOMED RT + Clinical Terms (CT) → SNOMED CT
CAP: College of American Pathologists
NHS: National Health Service
IHTSDO: International Health Terminology Standards Development Organisation
Core contents
SNOMED CT
Applications
Electronic Health Record Systems
Computerized Provider Order Entry (CPOE)
Knowledge databases used in clinical decision support systems (CDSS)
Remote Intensive Care Unit Monitoring
Laboratory Reporting
Cancer Reporting
Genetic Databases
SNOMED CT
Medical domains of the 100 Medline-indexed papers in which a specific medical domain has been described (BMC Medical Informatics and Decision Making 2008, 8(Suppl 1):S2)
SNOMED CT
Example: Mapping
Example: Encoding
Example: Standardization of Terminology
Part 3: OpenEHR
Objectives
• Promote and publish formal specifications
• Work closely with standards bodies
• Promote and publish EHR architectures and models
• Implement EHR architectures into clinical use
• Maintain open source “reference” implementation
• Interoperable health informatics system
openEHR introduction
• Definition:
– openEHR is an open standard specification in health informatics that describes the management, storage, retrieval, and exchange of health data in electronic health records (EHRs)
• Features:
– Patient-centric
– Lifelong
– Vendor-independent
Architecture of OpenEHR
OpenEHR Release 1.0.2
Two-level modeling of openEHR
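Two-level modeling separates a small, stable reference model from archetypes that express clinical constraints on it. The following is a minimal sketch of that idea only, not the openEHR reference model or the ADL archetype language: the classes `Element` and `Archetype`, the field names, and the numeric ranges are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Element:
    """Level 1: a generic reference-model node (hypothetical, simplified)."""
    name: str
    value: float
    units: str


@dataclass
class Archetype:
    """Level 2: clinical constraints applied to level-1 elements."""
    concept: str
    # element name -> (minimum, maximum, required units); values are illustrative
    constraints: Dict[str, Tuple[float, float, str]]

    def validate(self, elements: List[Element]) -> List[str]:
        """Return a list of constraint violations (empty when valid)."""
        errors = []
        for el in elements:
            rule = self.constraints.get(el.name)
            if rule is None:
                errors.append(f"{el.name}: not allowed in {self.concept}")
                continue
            low, high, units = rule
            if el.units != units or not (low <= el.value <= high):
                errors.append(f"{el.name}: expected {low}-{high} {units}")
        return errors


blood_pressure = Archetype(
    concept="blood pressure",
    constraints={
        "systolic": (0, 1000, "mm[Hg]"),
        "diastolic": (0, 1000, "mm[Hg]"),
    },
)

valid = blood_pressure.validate([Element("systolic", 120, "mm[Hg]")])
invalid = blood_pressure.validate([Element("pulse", 70, "/min")])
```

In this sketch the `Element` class never changes when a new clinical concept is added; only a new `Archetype` instance is defined.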
openEHR EHR system implementation
Applicability
• Applies to
– Storing data
– Searching data
– Sharing data
• Does not apply to
– Controlling the exchange flow
Integration of SNOMED CT into OpenEHR
HL7 v3 introduction
• Mission: provide standards for interoperability
• Features
– Standardized data, using the Reference Information Model (RIM)
• CDA standardizes clinical documents for exchange
– Support for healthcare workflows (V3 messaging)
RIM
Applicability
• Applies to
– Exchanging information
– Controlling the exchange flow
– Controlling the size of the exchanged data
• Does not apply to
– Storing data (CDAs can be stored, but this is not a best practice)
– Searching data
Ongoing Biomedical Informatics Projects
Clinical Data Warehouse
Common Data Element Editor
HL7 V3 Message Model
Clinical Decision Support
Medical Natural Language Processing
Research Data Entry System
CDA Transfer Engine
Medical Channel
Data Sharing
Synchronous Liver Metastasis Model
+
Decision Support Drug Knowledge Base
Tissue Bank Annotation
Medical Terminology Service
Clinical Guideline Computerization
Others
Clinical Data and Samples Are at the Core of Translational Medicine
Biomarker
Genotypes
Clinical Data
Clinical Practice
Clinical Trial
LIMS
Biospecimen
caBIG® Workspaces
Domain Workspaces
• Clinical Trials Management Systems (CTMS): https://cabig.nci.nih.gov/workspaces/CTMS/
• Integrative Cancer Research (ICR): https://cabig.nci.nih.gov/workspaces/ICR
• Tissue Banks & Pathology Tools (TBPT): https://cabig.nci.nih.gov/workspaces/TBPT
• In Vivo Imaging: https://cabig.nci.nih.gov/workspaces/Imaging
Cross Cutting & Strategic Workspaces
• Vocabularies & Common Data Elements (VCDE): https://cabig.nci.nih.gov/workspaces/VCDE
• Data Sharing & Intellectual Capital (DSIC): https://cabig.nci.nih.gov/working_groups/DSIC_SLWG
• Architecture: https://cabig.nci.nih.gov/workspaces/Architecture
• Documentation & Training (D&T): https://cabig.nci.nih.gov/working_groups/Training_SLWG
References and Standards
Collaboration with NCI and caBIG:
• Attended the caBIG annual meeting and visited caBIG in 2008
• Two people from our center attended the Boot Camp
References used:
① caCORE (Cancer Common Ontologic Representation Environment)
② caDSR (Cancer Data Standards Repository)
③ NCI CBIIT (National Cancer Institute Center for Biomedical Informatics and Information Technology)
Tissue Bank Information Management System
Sample Database Information Management System
Comprehensive Solution
Biobank Information Management Platform
Use Cases
Combined Tissue Bank Annotation from Operation Summary and Pathology Report
Medical Natural Language Processing
Difficulty acquiring data and repeated manual entry

Direct connection to HIS, LIS, and EMR

Automatic data transfer without manual entry by staff

Active reminder system for follow-up

Automatic Data Query and Extraction Across Systems
Clinical Information Enquiry System: the overall framework and subsystems
PACS database
Patient follow-up database
General enquiries
Patients' treatment status database
Personalized treatment procedures
Diagnostic tests database
HIS database
LIS database
D-QIS database
Molecular classification database
Sample database
Clinical Information Enquiry System
Clinical Data Warehouse
Component for processing RIM objects (create, delete, update, query)
MIF document
CALL
JavaSIG
XML document
RIM processor (business layer)
RIM object
Persistence layer (Hibernate)
TABLE
RDB
RIM database structure
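The layering on this slide (a RIM processor in the business layer, sitting on a persistence layer and relational tables) can be sketched roughly as follows. The slide names Java and Hibernate; this sketch swaps in Python's built-in `sqlite3` module for brevity. The `Act` dataclass, the `rim_act` table, and the `RimProcessor` class are hypothetical simplifications, not the actual warehouse schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Act:
    """Hypothetical, heavily simplified stand-in for a RIM Act object."""
    act_id: str
    class_code: str  # e.g. "OBS" (observation)
    mood_code: str   # e.g. "EVN" (event), "RQO" (request)


class RimProcessor:
    """Business layer: create, query, update, and delete Act rows."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rim_act ("
            "act_id TEXT PRIMARY KEY, class_code TEXT, mood_code TEXT)"
        )

    def create(self, act: Act) -> None:
        self.conn.execute(
            "INSERT INTO rim_act VALUES (?, ?, ?)",
            (act.act_id, act.class_code, act.mood_code),
        )

    def query(self, act_id: str) -> Optional[Act]:
        row = self.conn.execute(
            "SELECT act_id, class_code, mood_code FROM rim_act WHERE act_id = ?",
            (act_id,),
        ).fetchone()
        return Act(*row) if row else None

    def update(self, act: Act) -> None:
        self.conn.execute(
            "UPDATE rim_act SET class_code = ?, mood_code = ? WHERE act_id = ?",
            (act.class_code, act.mood_code, act.act_id),
        )

    def delete(self, act_id: str) -> None:
        self.conn.execute("DELETE FROM rim_act WHERE act_id = ?", (act_id,))


processor = RimProcessor(sqlite3.connect(":memory:"))
processor.create(Act("a1", "OBS", "EVN"))
processor.update(Act("a1", "OBS", "RQO"))
updated = processor.query("a1")
processor.delete("a1")
```

Callers work only with `Act` objects; the SQL stays inside the processor, which is the role the persistence layer plays in the diagram.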
Clinical Data Warehouse
R-MIM Model
Database Structure
SOA Service Bus
Clinical Document
(XML)
Database Records
Clinical Document CDA Transfer Engine
Schema for Clinical Document
HL7 CDA Schema
Discharge Summary
Mapping
Transfer Engine
CDA File
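The mapping step of the transfer engine (discharge summary in, CDA file out) might look roughly like the sketch below. It is an illustration only: the input field names (`title`, `patient_id`) are hypothetical, only a handful of CDA elements are emitted, and the result is not a schema-valid CDA document.

```python
import xml.etree.ElementTree as ET

CDA_NS = "urn:hl7-org:v3"  # XML namespace used by HL7 v3 and CDA documents


def discharge_summary_to_cda(record: dict) -> str:
    """Map a flat discharge-summary record onto a few CDA-style elements."""
    ET.register_namespace("", CDA_NS)
    doc = ET.Element(f"{{{CDA_NS}}}ClinicalDocument")
    # Document type: LOINC code for a discharge summary
    ET.SubElement(doc, f"{{{CDA_NS}}}code", {
        "code": "18842-5",
        "codeSystem": "2.16.840.1.113883.6.1",
        "displayName": "Discharge summary",
    })
    ET.SubElement(doc, f"{{{CDA_NS}}}title").text = record["title"]
    record_target = ET.SubElement(doc, f"{{{CDA_NS}}}recordTarget")
    patient_role = ET.SubElement(record_target, f"{{{CDA_NS}}}patientRole")
    ET.SubElement(patient_role, f"{{{CDA_NS}}}id",
                  {"extension": record["patient_id"]})
    return ET.tostring(doc, encoding="unicode")


cda_xml = discharge_summary_to_cda(
    {"title": "Discharge Summary", "patient_id": "P0001"}
)
```

A real engine would drive this mapping from the two schemas shown on the slide rather than from hard-coded element names.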
Common Medical Terminology Service
Difficulties of Extracting Data
METHODS
Model's Performance
Biomedical Data Integration and Mining
Medical Informatics
Disease
Integration
Personalized Medicine
Databases
Data Mining
Personalized Medicine
Decision Support System
Bioinformatics
Translational Medicine
Genomics
Gene
Drug
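Disease and Gene Integration
Disease information
Reference literature information
GAD
Disease classification information
COSMIC
Main table
Experimental sample information
Data Integration
Mutation information
Gene2Disease Databases
Genetic Polymorphisms: 39,910
Gene Mutations: 1,506,545
19 Major Diseases
Structured Gene Information: 31,412
Gene information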
Drug and Drug Metabolism Study
• Drug-Target-SNP Integration and Databases
dbSNP
HapMap
DrugBank
SNP
Drug
Data Tables
Drug Info
Target Info
Data Integration
Enzymes: 50
Drug Targets: 3,866
Drug-Enzymes relationship: 4,387
Drug Metabolism
Drug Information: 4,414
SNP Info
Enzymes-SNP: 9,558
Drug-Target Information
Drug-Target Polymorphism Databases
Query Drug-Target-SNP
Records: 12,051
rsSNP: 332,476
GenBank_SNP: 337,259
ssSNP: 1,745,368
Mutations from populations: 1,839,782
Total: 201 MB
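A drug-target-SNP query over integrated tables can be sketched as a join from drugs to target genes to SNPs. The table and column names below (`drug_target`, `gene_snp`) are hypothetical, and the three rows are well-known illustrative examples inserted by hand, not records taken from the databases described on these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug_target (drug TEXT, gene TEXT);   -- drug to target gene
CREATE TABLE gene_snp (gene TEXT, rs_id TEXT);     -- gene to dbSNP rs number
INSERT INTO drug_target VALUES ('warfarin', 'VKORC1');
INSERT INTO gene_snp VALUES ('VKORC1', 'rs9923231');
INSERT INTO gene_snp VALUES ('CYP2C9', 'rs1799853');
""")


def snps_for_drug(drug: str) -> list:
    """Return (drug, target gene, rs ID) rows for SNPs in the drug's targets."""
    return conn.execute(
        "SELECT t.drug, t.gene, s.rs_id "
        "FROM drug_target t JOIN gene_snp s ON s.gene = t.gene "
        "WHERE t.drug = ? ORDER BY s.rs_id",
        (drug,),
    ).fetchall()


result = snps_for_drug("warfarin")
```

A drug-enzyme table could be joined in the same way to reach polymorphisms in metabolizing enzymes, matching the separate Enzymes-SNP relationship listed above.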
Mutation Information Integration
• Extraction from locus-specific databases (LSDBs)
– LSDB addresses collected using a wiki
– 1300 LSDBs
– Classification of genes
– Link to the OMIM database
http://129.89.44.120/twiki/bin/view
Mutation Information Extraction
Natural Language Processing
Data Extraction from Two LSDBs
Alzheimer Disease & Frontotemporal Dementia Mutation Database
Sarcomere Protein Gene Mutation Database
1725 mutation records
Mutation Information Integration
• Mutation Association with Disease Phenotypes
• Standards
• Gene Names: HUGO
• Diseases (ICD-10)
– Mapping ICD-10 to MeSH using keyword search
– Adopt SNOMED CT, build disease ontologies
ICD-10 disease vocabulary after mapping
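Keyword-based mapping between ICD-10 terms and MeSH headings could be approximated as below. This is a sketch under assumptions: the slides do not say how keywords were compared, so the stopword list, the crude trailing-"s" normalization, and the Jaccard overlap score are all choices made here for illustration.

```python
import re
from typing import List, Optional

STOPWORDS = {"of", "the", "and", "with"}


def keywords(term: str) -> frozenset:
    """Lower-case, split on non-alphanumerics, drop stopwords, strip trailing 's'."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return frozenset(t.rstrip("s") for t in tokens if t not in STOPWORDS)


def best_match(icd_term: str, mesh_headings: List[str]) -> Optional[str]:
    """Return the heading with the highest Jaccard keyword overlap, or None."""
    source = keywords(icd_term)
    best, best_score = None, 0.0
    for heading in mesh_headings:
        target = keywords(heading)
        union = source | target
        if not union:
            continue  # nothing to compare
        score = len(source & target) / len(union)
        if score > best_score:
            best, best_score = heading, score
    return best


MESH = ["Breast Neoplasms", "Diabetes Mellitus, Type 2"]
diabetes = best_match("Type 2 diabetes mellitus", MESH)
breast = best_match("Malignant neoplasm of breast", MESH)
unmapped = best_match("Fracture of femur", MESH)
```

Terms with no shared keyword stay unmapped (`None`), which is where a curated ontology such as SNOMED CT would be needed.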
Mutation Information Integration
Disease Related Unique Mutation Search Engine (DRUMS)
http://www.scbit.org/glif
Genes, Diseases, Mutations, Sequences
More than 170,000 mutations, 6000 genes
External Links
Documents upload
Query
By Genes
By Diseases
By Mutation types
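The three query modes listed above (by gene, by disease, by mutation type) can be sketched as optional filters over mutation records. The `MutationRecord` fields and the `search` function are hypothetical and do not reflect the actual DRUMS schema or interface; the two sample records are well-known published mutations used only as illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MutationRecord:
    gene: str           # HUGO gene symbol
    disease: str
    mutation_type: str
    change: str         # amino-acid change


RECORDS = [
    MutationRecord("APP", "Alzheimer disease", "missense", "V717I"),
    MutationRecord("MYH7", "Hypertrophic cardiomyopathy", "missense", "R403Q"),
]


def search(records: List[MutationRecord],
           gene: Optional[str] = None,
           disease: Optional[str] = None,
           mutation_type: Optional[str] = None) -> List[MutationRecord]:
    """Filter by any combination of criteria; None means 'do not filter'."""
    def matches(r: MutationRecord) -> bool:
        return (
            (gene is None or r.gene == gene)
            and (disease is None or disease.lower() in r.disease.lower())
            and (mutation_type is None or r.mutation_type == mutation_type)
        )
    return [r for r in records if matches(r)]


by_gene = search(RECORDS, gene="APP")
by_disease = search(RECORDS, disease="cardiomyopathy")
by_type = search(RECORDS, mutation_type="missense")
```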
Mutation Information Integration
• DRUMS Query Results
Biomedical Informatics Systems for Translational Research
• EMR for Research
• EMR for Clinical Trial
• Follow-up Information Systems
BioBank
• Omic Databases
• LIMS
• Bioinformatics Analysis Platform
Database Establishment for Translational Research
SD Database
HEO
Data Parsing
Star Server
Data Parsing
EDW
DE-IDENTIFICATION
Information collected during clinical care
One-way hash
Restructuring for research
Access through secured online application
Data export
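The one-way hash step in this de-identification pipeline could be sketched as follows. The slide does not name an algorithm, so the keyed HMAC-SHA-256 used here is an assumption, as are the record field names and the list of direct identifiers to drop.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """One-way keyed hash of an identifier (HMAC-SHA-256, hex encoded)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes,
               direct_identifiers=("name", "address", "phone")) -> dict:
    """Drop direct identifiers and replace the patient ID with its hash."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Assumption: the key is stored separately from the research copy of the data.
KEY = b"replace-with-a-secret-key"
research_row = deidentify(
    {"patient_id": "MRN-001", "name": "Example Patient", "diagnosis": "C22.0"},
    KEY,
)
```

Because the same identifier always hashes to the same value, records for one patient can still be linked in the research database without exposing the original identifier.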
[Figure: clinical records with identifying text scrubbed and identifiers replaced by one-way hash values such as B699tre563msd.. and F5rt783mbncds…]
Informatics in EMR-based PGx Studies
DNA Biobank
EMR
Informatics Approaches
• Natural language processing (NLP)
• Machine learning & data mining
Drug Exposure
Drug Response
Information Flow in Translational Medicine
New Therapeutic Knowledge
Clinical Practice
Clinical Data
Biospecimen
High Throughput Research
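CODATA Task Group of Biomedical Ontology
Identify the most critical problems in biomedical data interoperability
Propose the key research directions
Propose research approaches and possible technical roadmaps
Discuss the expected research results and possible applications
Discuss the possibility of establishing this research as a formal project
The Interoperability of Biomedical Data
Ontology Building Principles
Data Sharing Strategies
Technical Roadmap
Expected Achievements
Plan to hold the first discussion meeting in 2011 to propose research ideas, form a core team, and draw up a research plan.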