Transcript Slides

Relational Databases:
Basic Concepts
BCHB524
2014
Lecture 21
11/12/2014
BCHB524 - 2014 - Edwards
Outline







What is a (relational) database?
When are relational databases used?
Commonly used database management
systems
Using existing databases
Creating and populating new databases
Python and relational databases
Exercises
11/12/2014
BCHB524 - 2014 - Edwards
2
(Relational) Databases


Databases store information
Bioinformatics has lots of file-based information:






FASTA sequence databases
Genbank format sequences
Store sequence, annotation, references, annotation
Good as archive or comprehensive reference
Poor for a few items
Relational databases also store information


11/12/2014
Good for a few items at a time
Flexible on which items
BCHB524 - 2014 - Edwards
3
Relational Databases

Store information in a table


Rows represent items
Columns represent items' properties or attributes
Name
Continent
Region
Surface Area
Population
GNP
Brazil
South America
South America
8547403
170115000
776739
Indonesia
Asia
Southeast Asia
1904569
212107000
84982
India
Asia
Southern and Central Asia
3287263
1013662000
447114
China
Asia
Eastern Asia
9572900
1277558000
982268
Pakistan
Asia
Southern and Central Asia
796095
156483000
61289
United States
North America
North America
9363520
278357000
8510700
11/12/2014
BCHB524 - 2014 - Edwards
4
Relational Databases


Tables can be millions of rows
Can access a few rows fast




Countries more than 100,000,000 in population?
Countries on the “Asia” continent?
Countries that start with “U”?
Countries with GNP = 776739
Name
Continent
Region
Surface Area
Population
GNP
Brazil
South America
South America
8547403
170115000
776739
Indonesia
Asia
Southeast Asia
1904569
212107000
84982
India
Asia
Southern and Central Asia
3287263
1013662000
447114
China
Asia
Eastern Asia
9572900
1277558000
982268
Pakistan
Asia
Southern and Central Asia
796095
156483000
61289
United States
North America
North America
9363520
278357000
8510700
11/12/2014
BCHB524 - 2014 - Edwards
5
When are Relational
Databases Used?

LARGE datasets


Store data first ...



For single key, simple data structures often work
Store results of expensive compute or data-cleanup


... ask questions later
Lookup or sort by many keys


Does data fit in memory?
Compute once and return results many times
"Random" or unknown access patterns
Specialized data-structures not appropriate

11/12/2014
Use string/sequence indexes for sequence data
BCHB524 - 2014 - Edwards
6
Common DBMS

Oracle


MySQL


Commercial, market leader, widely used in
businesses
Free, open-source, widely used in bioinformatics,
suitable for large scale deployment
Sqlite

11/12/2014
Free, open-source, minimal installation
requirements, no users, suitable for small scale
deployment
BCHB524 - 2014 - Edwards
7
Lets look at some examples

We'll use a third-party program to "look at"
Sqlite databases:


Download examples:


SqliteStudio (Linux), SqliteSpy (Windows), …
World.db3, taxa.db3 from Course data folder
Use SqliteStudio to look at examples

11/12/2014
World.db3, taxa.db3
BCHB524 - 2014 - Edwards
8
Using existing databases

Use the "select" SQL command to find
relevant rows








select * from Country where Population > 100000000;
select * from Country where Continent = 'Asia';
select * from Country where Name like 'U%';
select * from Country where GNP = 776739;
Each command ends in semicolon ";".
"where" specifies the condition/constraint/rule.
"*" asks for all attributes from the relevant rows.
Lets experiment with world and taxa databases.
11/12/2014
BCHB524 - 2014 - Edwards
9
Using existing databases

Select can combine (“join”) multiple tables

Use the where condition to match rows from each
table and “link” corresponding rows…
select * from taxonomy, name
where
taxonomy.rank = 'species'
and
name.name_class = 'misspelling'
and
name.tax_id = taxonomy.tax_id
11/12/2014
BCHB524 - 2014 - Edwards
10
Using existing databases

Select can sort and/or return top 10
select * from taxonomy
limit 10;
select * from taxonomy
order by scientific_name;
select * from taxonomy
order by tax_id desc
limit 10;
11/12/2014
BCHB524 - 2014 - Edwards
11
Using existing databases

Select can count and do string matching.
select count(*) from taxonomy
where scientific_name like 'D%';

"like" uses special symbols:



% matches zero or more symbols
_ match exactly one symbol
Some RDBMS support regular expressions

11/12/2014
MySQL, for example.
BCHB524 - 2014 - Edwards
12
Creating databases

Use the "create" SQL command to create
tables
CREATE TABLE taxonomy (
tax_id INTEGER PRIMARY KEY,
scientific_name TEXT,
rank TEXT,
parent_id INT
);
CREATE TABLE name (
id INTEGER PRIMARY KEY,
tax_id INT,
name TEXT,
name_class TEXT
);
11/12/2014
BCHB524 - 2014 - Edwards
13
Populating databases

Use the "insert" SQL command to add rows
to tables
Usually, the special id column is initialized
automatically

INSERT INTO name (tax_id,name,name_class)
VALUES (9606,'H. sapiens','synonym');
SELECT * from name where tax_id = 9606;
11/12/2014
BCHB524 - 2014 - Edwards
14
Python and Relational
Databases

Issue select statements from python and
iterate through the results
import sqlite3
conn = sqlite3.connect('taxa.db3')
c = conn.cursor()
c.execute("""
select * from name
where name like 'D%'
limit 10;
""")
for row in c:
print row

Sometimes it is easiest to make Python do
some of the work!
11/12/2014
BCHB524 - 2014 - Edwards
15
Python and Relational
Databases

Use parameter substitution for run-time
values
import sys
import sqlite3
tid = int(sys.argv[1])
11/12/2014
conn = sqlite3.connect('taxa.db3')
params = [tid,'scientific name']
c = conn.cursor()
c.execute("""
select * from name
where tax_id = ? and name_class = ?;
""",params)
for row in c:
print row BCHB524 - 2014 - Edwards
16
Next-time:
Object-relational mappers

Setup python to treat tables as classes, rows
as objects
# Set up data-model
from model import *
hs = Taxonomy.get(9606)
for n in hs.names:
print n.name, "|", n.nameClass
condition = Name.q.name.startswith('Da')
for n in Name.select(condition):
print n.name, "|", n.nameClass
11/12/2014
BCHB524 - 2014 - Edwards
17
Lab exercises

Read through an online course in SQL


sqlcourse.com, sql-tutorial.net, ...
Write a python program to lookup the
scientific name for a user-supplied organism
name.
11/12/2014
BCHB524 - 2014 - Edwards
18