Semantic Web Approach to Biological Database Integration
Biological Databases, Integration, and Semantic Web
Kei Cheung, Ph.D.
Yale Center for Medical Informatics
Genomics and Bioinformatics, December 4, 2006
Outline
• Database introduction
  – Overview
  – Query language
• Database integration
  – Issues
• Semantic Web approach to database integration
  – Overview of Semantic Web
Introduction
• The Human Genome Project has transformed the biological sciences into information sciences
• Advances in the biological sciences depend on:
  – creation of new knowledge
  – effective information management
• Future progress in biological research will be highly dependent on the ability of the scientific community to both deposit and utilize stored information on-line.
• The database challenge for the future will be to develop new ways to acquire, store and retrieve not only biological data, but also the biological context for these data.
Variety of Biological Databases
• Different data categories
  – DNA sequence, gene expression, protein structure, pathway, etc.
• Community vs. lab-specific vs. proprietary databases
• Mega vs. medium vs. boutique databases
• One thing in common: many of them are Web accessible
Food for thought
• Will a biological database be different from a biological journal?
What is a database?
• A database is a collection of records stored in a computer in a systematic way, so that a computer program can consult it to answer questions.
• The items retrieved in answer to queries become information that can be used to make decisions.
• The computer program used to manage and query a database is known as a database management system (DBMS)
  – E.g., Oracle, MS Access, MySQL
Database components
• The central concept of a database is that of a collection of records, or pieces of knowledge
• For a given database, there is a structural description of the type of facts held in that database: this description is known as a schema
• The schema describes the objects that are represented in the database, and the relationships among them.
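To make the idea of a schema concrete, here is a minimal sketch using SQLite through Python's standard library. The two tables (chromosome, gene) and their columns are hypothetical and are not taken from the slides; the point is only that the schema names the objects, records a relationship between them, and is itself stored by the DBMS where it can be read back.

```python
import sqlite3

# In-memory database; the table and column names below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE chromosome (
        chromosome_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE gene (
        gene_id       TEXT PRIMARY KEY,
        symbol        TEXT NOT NULL,
        chromosome_id INTEGER NOT NULL REFERENCES chromosome(chromosome_id)
    );
""")

# SQLite keeps the schema as queryable metadata in the sqlite_master table.
schema = dict(conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall())

for table_name, ddl in schema.items():
    print(table_name)
    print(ddl)
```

The REFERENCES clause on gene.chromosome_id is where the schema records the relationship between the two objects.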
Data Model
• There are a number of different ways of organizing a schema (i.e., of modeling the database structure): these are known as data models.
  – Relational model
  – Hierarchical model
  – Network model
  – Object oriented model
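As a rough illustration of how two of these data models differ, the sketch below stores the same made-up facts twice: once as a tree, where each gene hangs under exactly one chromosome (hierarchical), and once as flat tables linked by key values (relational). All names and values are placeholders chosen for the example, and a nested Python dictionary is only a stand-in for a real hierarchical DBMS.

```python
import sqlite3

# Hierarchical model: each child record sits under exactly one parent.
hierarchical = {
    "chromosome 1": {"genes": ["geneA", "geneB"]},
    "chromosome 2": {"genes": ["geneC"]},
}

# Relational model: flat tables, linked by shared key values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chromosome (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE gene (name TEXT, chromosome_id INTEGER)")
conn.executemany(
    "INSERT INTO chromosome VALUES (?, ?)",
    [(1, "chromosome 1"), (2, "chromosome 2")],
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("geneA", 1), ("geneB", 1), ("geneC", 2)],
)

# The same question asked of both: which genes are on chromosome 1?
from_tree = hierarchical["chromosome 1"]["genes"]
from_tables = [row[0] for row in conn.execute(
    "SELECT gene.name FROM gene "
    "JOIN chromosome ON gene.chromosome_id = chromosome.id "
    "WHERE chromosome.name = ? ORDER BY gene.name",
    ("chromosome 1",),
)]

print(from_tree, from_tables)
```

In the tree the answer is found by walking down from the parent; in the relational form it is found by joining tables on matching key values.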
Query Language
• A query language is a computer language used to create, modify, retrieve and manipulate data in databases
• SQL (Structured Query Language) is a well-known query language for relational databases
  – SQL is an ANSI standard language for RDBMSs
  – Different RDBMS vendors may provide slightly different SQL syntax or additional proprietary extensions that are applicable only to their systems
SQL
• CREATE TABLE
• INSERT
• SELECT
• UPDATE
• DELETE
• CREATE VIEW
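The six statement types can be tried end to end with a small sketch. It assumes SQLite (via Python's standard sqlite3 module) rather than any of the DBMSs named in the slides, and the gene table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE: define the structure of a table.
cur.execute(
    "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, symbol TEXT, chromosome INTEGER)"
)

# INSERT: add records (placeholder values).
cur.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [("G1", "alpha", 1), ("G2", "beta", 1), ("G3", "gamma", 2)],
)

# SELECT: retrieve records that match a condition.
chr1 = cur.execute(
    "SELECT symbol FROM gene WHERE chromosome = 1 ORDER BY symbol"
).fetchall()

# UPDATE: modify existing records.
cur.execute("UPDATE gene SET symbol = 'beta2' WHERE gene_id = 'G2'")

# DELETE: remove records.
cur.execute("DELETE FROM gene WHERE gene_id = 'G3'")

# CREATE VIEW: save a query under a name so it can be queried like a table.
cur.execute(
    "CREATE VIEW chr1_genes AS "
    "SELECT gene_id, symbol FROM gene WHERE chromosome = 1"
)
view_rows = cur.execute(
    "SELECT gene_id, symbol FROM chr1_genes ORDER BY gene_id"
).fetchall()
remaining = cur.execute("SELECT COUNT(*) FROM gene").fetchone()[0]

print(chr1, view_rows, remaining)
```

Because a view is a stored query and not a copy of the data, chr1_genes reflects the UPDATE even though the view was created afterwards from the same table.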
CREATE TABLE
Example
CREATE TABLE sgd_features(
    sgd_id VARCHAR(20) NOT NULL PRIMARY KEY,
    feature_type VARCHAR(20) NOT NULL DEFAULT 'ORF',
    quality VARCHAR(20),
    feature_name VARCHAR(20),
    standard_name VARCHAR(20),
    chromosome INT(2) NOT NULL,
    start_coord INT(10) NOT NULL,
    end_coord INT(10) NOT NULL,
    strand CHAR(1) NOT NULL,
    description VARCHAR(500)
);
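The effect of the NOT NULL and DEFAULT clauses in this statement can be checked with a short sketch. It runs the same definition on SQLite through Python's standard library, which is an assumption: the slide's syntax looks like MySQL, and SQLite accepts type names such as VARCHAR(20) and INT(2) without enforcing the lengths. The inserted rows are placeholders, not real SGD records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sgd_features(
        sgd_id        VARCHAR(20) NOT NULL PRIMARY KEY,
        feature_type  VARCHAR(20) NOT NULL DEFAULT 'ORF',
        quality       VARCHAR(20),
        feature_name  VARCHAR(20),
        standard_name VARCHAR(20),
        chromosome    INT(2) NOT NULL,
        start_coord   INT(10) NOT NULL,
        end_coord     INT(10) NOT NULL,
        strand        CHAR(1) NOT NULL,
        description   VARCHAR(500)
    )
""")

# Placeholder row (not real SGD data); feature_type is left out on purpose.
conn.execute(
    "INSERT INTO sgd_features (sgd_id, chromosome, start_coord, end_coord, strand) "
    "VALUES (?, ?, ?, ?, ?)",
    ("EXAMPLE_ID_1", 1, 100, 200, "W"),
)
feature_type = conn.execute(
    "SELECT feature_type FROM sgd_features WHERE sgd_id = 'EXAMPLE_ID_1'"
).fetchone()[0]

# A row that omits a NOT NULL column with no default is rejected by the DBMS.
try:
    conn.execute("INSERT INTO sgd_features (sgd_id) VALUES ('EXAMPLE_ID_2')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(feature_type, rejected)
```

The first insert succeeds and picks up the default feature type; the second fails because chromosome, start_coord, end_coord and strand are required and were not supplied.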
INSERT
INSERT INTO