The Fifth China-US Roundtable on Scientific Data Cooperation October 27-28, 2011, Beijing, China Biomedical Data Integration and Knowledgebase Lei Liu, Ph.D. Shanghai Center for Bioinformation.
Lei Liu, Ph.D., Shanghai Center for Bioinformation Technology and Shanghai Institutes for Biological Sciences, CAS

Part 1: Ontology
• Knowledge Management
• Data Integration and Exchange
• Semantic Interoperability
• Decision Support and Reasoning

Ontology: Knowledge Management
• Annotating Data and Resources
• Accessing Biomedical Information
• Mapping across Biomedical Ontologies

Ontology: Data Exchange & Semantic Interoperability
• Information and Data Integration
• Semantic Interoperability

Ontology: Decision Support and Reasoning
• Data Selection
• Data Aggregation
• Decision Support
• Natural Language Processing Applications
• Knowledge Discovery

Ontology
• Example: Ontology Server
• Example: Building Knowledge Base (Edit)
• Example: Building Knowledge Base (Search tool)

Part 2: SNOMED CT
• CAP: Systematized Nomenclature of Medicine Reference Terminology (SNOMED RT)
• NHS: Clinical Terms (CT)
• CAP: College of American Pathologists
• NHS: National Health Service
• IHTSDO: International Health Terminology Standards Development Organization

Core contents

SNOMED CT Applications
• Electronic Health Record Systems
• Computerized Provider Order Entry (CPOE)
• Knowledge databases used in clinical decision support systems (CDSS)
• Remote Intensive Care Unit Monitoring
• Laboratory Reporting
• Cancer Reporting
• Genetic Databases

SNOMED CT
Medical domains of the 100 Medline indexed papers in which a specific medical domain has been described.
(BMC Medical Informatics and Decision Making 2008, 8(Suppl 1):S2)

SNOMED CT
• Example: Mapping
• Example: Encoding
• Example: Standardization of Terminology

Part 3: OpenEHR

Objectives
• Promote and publish formal specification
• Work closely with standards bodies
• Promote and publish EHR architectures and models
• Implement EHR architectures into clinical use
• Maintain open source “reference” implementation
• Interoperable health informatics system

openEHR introduction
• Definition:
– openEHR is an open standard specification in health informatics that describes the management and storage, retrieval and exchange of health data in electronic health records (EHRs)
• Features:
– Patient-centric
– Lifelong
– Vendor-independent

Architecture of OpenEHR
OpenEHR Release 1.0.2
Two-level modeling of openEHR
openEHR EHR system implementation

Applicability
• Applies to
– Storing data
– Searching data
– Sharing data
• Does not apply to
– Controlling the exchange flow

Integration of SNOMED CT into OpenEHR

HL7 v3 introduction
• Mission: provides standards for interoperability
• Features
– Standard data, using the Reference Information Model (RIM)
• CDA, which standardizes clinical documents for exchange
– Supports healthcare workflows (V3 messaging)

RIM applicability
• Applies to
– Exchanging information
– Controlling the exchange flow
– Controlling the size of the exchanged data
• Does not apply to
– Storing data (CDAs can be stored, but this is not a best practice)
– Searching data

Ongoing Biomedical Informatics Projects
• Clinical Data Warehouse
• Common Data Element Editor
• HL7 V3 Message Model
• Clinical Decision Support
• Medical Natural Language Processing
• Research Data Entry System
• CDA Transfer Engine
• Medical Channel Data Sharing
• Synchronous Liver Metastasis Model + Decision Support
• Drug Knowledge Base
• Tissue Bank Annotation
• Medical Terminology Service
• Clinical Guideline Computerization
• Others

Clinical Data and Samples Are at the Core of Translational Medicine
[Diagram labels: Biomarker, Genotypes, Clinical Data, Clinical Practice, Clinical Trial, LIMS, Biospecimen]

caBIG® Workspaces
Domain
Workspaces
• Clinical Trials Management Systems (CTMS): https://cabig.nci.nih.gov/workspaces/CTMS/
• Integrative Cancer Research (ICR): https://cabig.nci.nih.gov/workspaces/ICR
• Tissue Banks & Pathology Tools (TBPT): https://cabig.nci.nih.gov/workspaces/TBPT
• In Vivo Imaging: https://cabig.nci.nih.gov/workspaces/Imaging

Cross Cutting & Strategic Workspaces
• Vocabularies & Common Data Elements (VCDE): https://cabig.nci.nih.gov/workspaces/VCDE
• Data Sharing & Intellectual Capital (DSIC): https://cabig.nci.nih.gov/working_groups/DSIC_SLWG
• Architecture: https://cabig.nci.nih.gov/workspaces/Architecture
• Documentation & Training (D&T): https://cabig.nci.nih.gov/working_groups/Training_SLWG

References and Standards
Collaboration with NCI and caBIG:
• Attended the caBIG annual meeting and visited caBIG in 2008
• Two people from our center attended the Boot Camp
References used:
① caCORE (Cancer Common Ontologic Representation Environment)
② caDSR (Cancer Data Standards Repository)
③ NCI CBIIT (National Cancer Institute Center for Biomedical Informatics and Information Technology)

Tissue Bank Information Management System
Sample database information management system: a complete solution

Biobank Information Management Platform
Use Cases
Combined Tissue Bank Annotation from Operation Summary and Pathology Report
Medical Natural Language Processing

• Difficulty acquiring data and repeated data entry
• Direct connection to HIS, LIS, and EMR
• Automatic data transfer without manual entry by staff
• Active reminder system for follow-up

Automatic Data Query and Extraction Across Systems

Clinical Information Enquiry System: The overall framework and subsystems
• PACS database
• Patient follow-up database
• General enquiries
• Patient treatment status database
• Personalized treatment procedures
• Diagnostic tests database
• HIS database
• LIS database
• D-QIS database
• Molecular classification database
• Sample database

Clinical Information Enquiry System
Clinical Data Warehouse
Part
to process RIM objects (create, delete, update, query RIM objects)
[Diagram labels: mif document, CALL Javasig, xml document, RIM processor (business layer), RIM object, Persistence layer (Hibernate), TABLE, RDB]
RIM database structure

Clinical Data Warehouse
[Diagram labels: R-MIM Model, Database Structure, SOA Service Bus, Clinical Document (XML), Database Records, Clinical Document]

CDA Transfer Engine
[Diagram labels: Schema for Clinical Document, HL7 CDA Schema, Discharge Summary, Mapping, Transfer Engine, CDA File]

Common Medical Terminology Service
Difficulties of Extracting Data
METHODS
Model's Performance (three slides)

Biomedical Data Integration and Mining
[Diagram labels: Medical Informatics Disease Integration Personalized Medicine Databases Data Mining Personalized Medicine Decision Support System Bioinformatics Translational Medicine Genomics Gene Drug]

Disease and Gene Integration
[Diagram labels: disease information, reference information, GAD, disease classification information, COSMIC, main table, experimental sample information, Data Integration, mutation information]

Gene2Disease Databases
• Genetic Polymorphisms: 39910
• Gene Mutations: 1506545
• 19 Major Diseases
• Structured Gene Information: 31412 (gene information)

Drug and Drug Metabolism Study
• Drug-Target-SNP Integration and Databases
[Diagram labels: dbSNP, HapMap, DrugBank, SNP, Drug, Data Tables, Drug Info, Target Info, Data Integration]

Drug Metabolism
• Enzymes: 50
• Drug Targets: 3866
• Drug-Enzymes relationship: 4387
• Drug Information: 4414
SNP Info
• Enzymes-SNP: 9558
Drug-Target Information
Drug-Target Polymorphism Databases
Query
• Drug-Target-SNP Records: 12051
• rsSNP: 332476
• GenBank_SNP: 337259
• ssSNP: 1745368
• Mutations from populations: 1839782
• Total: 201MB

Mutation Information Integration
• Extraction from Locus-specific databases

LSDB Addresses
• Using WiKi
• Collect LSDB Addresses
• 1300 LSDB
• Classification of Genes
• Link to OMIM Database
http://129.89.44.120/twiki/bin/view

Mutation Information Extraction
• Natural Language Processing
• Two LSDB Data Extraction
– Alzheimer Disease & Frontotemporal Dementia Mutation Database
– Sarcomere Protein Gene Mutation Database
• 1725 mutation records

Mutation Information Integration
• Mutation Association with Disease
Phenotypes

Standards
• Gene Names: HUGO
• Diseases (ICD-10)
– Mapping ICD-10 and MeSH, using keyword search
– Adopt SNOMED CT, build disease ontologies
ICD-10 disease vocabulary after mapping

Mutation Information Integration
Disease Related Unique Mutation Search Engine (DRUMS)
http://www.scbit.org/glif
• Genes, Diseases, Mutations, Sequences
• More than 170,000 mutations, 6000 genes
• External Links
• Documents upload
• Query
– By Genes
– By Diseases
– By Mutation types

Mutation Information Integration
• DRUMS Query Results

Biomedical Informatics Systems for Translational Research
• EMR for Research
• EMR for Clinical Trial
• Follow-up Information Systems
BioBank
• Omic Databases
• LIMS
• Bioinformatics Analysis Platform

Database Establishment for Translational Research
[Diagram labels: SD Database, HEO, Data Parsing, Star Server, Data Parsing, EDW]

DE-IDENTIFICATION
• Information collected during clinical care
• One way hash
• Restructuring for research
• Access through secured online application
• Data export
[Figure: records with identifying text marked "scrubbed" and identifiers replaced by one-way hash values such as B699tre563msd.. and F5rt783mbncds…]
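The one-way hash step on the de-identification slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system described in the talk: the field names, the secret key, and the choice of HMAC-SHA256 as the one-way function are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret kept by the data warehouse only (assumption, not from the slides).
SECRET_KEY = b"replace-with-a-site-secret"

# Hypothetical direct identifiers to strip from each clinical record.
IDENTIFIER_FIELDS = ("patient_id", "name")


def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way hash (HMAC-SHA256, hex) of a patient identifier."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Copy a record, dropping direct identifiers and adding a stable research ID."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    cleaned["research_id"] = pseudonym(str(record["patient_id"]), key)
    return cleaned


if __name__ == "__main__":
    visit = {"patient_id": "P0001", "name": "Example Patient", "diagnosis": "C22.0"}
    print(deidentify(visit))
```

Because the same patient identifier always maps to the same research ID, records from different source systems can still be linked after de-identification, while the original identifier cannot be read back from the hash without the key.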
Informatics in EMR-based PGx Studies
[Diagram labels: DNA Biobank, EMR, Drug Exposure, Drug Response]
Informatics Approaches
• Natural language processing (NLP)
• Machine learning & data mining

Information Flow in Translational Medicine
[Diagram labels: New Therapeutic Knowledge, Clinical Practice, Clinical Data, Biospecimen, High Throughput Research]

CODATA Task Group of Biomedical Ontology
• Identify the most critical problems in biomedical data interoperability
• Propose the key research directions
• Propose research approaches and possible technical routes
• Discuss the expected research results and possible applications
• Discuss the possibility of establishing this research as a formal project

The Interoperability of Biomedical Data
• Ontology Building Principles
• Data Sharing Strategies
• Technical Roadmap
• Expected Achievements

Plan to hold the first discussion meeting within 2011, to propose research ideas, form a core team, and draw up a research plan.