IELM 511: Information System design
Introduction
Part I: ISD for well-structured data – relational and other DBMS
Info storage (modeling, normalization)
Info retrieval (Relational algebra, Calculus, SQL)
DB integrated API’s
Part II: ISD for systems with non-uniformly structured data
Basics of web-based IS (www, web2.0, …)
Markup’s, HTML, XML
Design tools for Info Sys: UML
Part III: (one out of)
API’s for mobile apps
Security, Cryptography
IS product lifecycles
Algorithm analysis, P, NP, NPC
Agenda
Structured Query Language (SQL)
DB API’s
Recall our Bank DB design
BRANCH( b_name, city, assets)
CUSTOMER( cssn, c_name, street, city, banker, banker_type)
LOAN( l_no, amount, br_name)
PAYMENT( l_no, pay_no, date, amount)
EMPLOYEE( e_ssn, e_name, tel, start_date, mgr_ssn)
ACCOUNT( ac_no, balance)
SACCOUNT( ac_no, int_rate)
CACCOUNT( ac_no, od_amt)
BORROWS( cust_ssn, loan_num)
DEPOSIT( c_ssn, ac_num, access_date)
DEPENDENT( emp_ssn, dep_name)
[Diagram: the relations are linked by 1:n and n:m relationship lines; the individual links are not recoverable from this transcript.]
Background: Structured Query Language
Basics of SQL:
A DataBase Management System is an IT system
Core requirements:
- A structured way to store the definition of data [why?] → DDL
- Manipulation of data [obviously!] → DML
SQL: a combined DDL+DML
SQL as a DDL
A critical element of any design is to store the definitions of its components.
In DB design, we deal with tables, using table names, attribute names etc.
Each of these terms should have unambiguous syntax and semantics.
A systematic way to specify and store these meta-data is by the use of a
Data Definition Language
The information about the data is stored in a Data Dictionary
SQL provides a unified DDL + a Data Manipulation Language (DML).
SQL as a DDL: create command
A DB stores one or more tables
and one or more indexes
To create a new database:
create database my_database;
A table stores data
To create a new table:
create table my_table (
    attribute_name attribute_type constraint,
    …,
    constraint, …
);
To create an index on a table:
create index my_index on my_table( attribute);
An index is a special file for faster DB look-up when searching the specified table for some data using the specified attribute.
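A minimal sketch of the create index command, run through Python's built-in sqlite3 module on an in-memory SQLite database. SQLite and the index name idx_loan_branch are assumptions made for illustration; the DBMS used in the course may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("create table loan (l_no text primary key, amount integer, br_name text)")

# Index on the attribute we expect to search by (hypothetical index name).
conn.execute("create index idx_loan_branch on loan (br_name)")

# SQLite lists indexes in its catalog table sqlite_master.
index_names = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index' and tbl_name = 'loan'")]
print(index_names)
```

A lookup such as `select * from loan where br_name = 'Downtown'` can then use the index instead of scanning every row.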
SQL as a DDL: create command examples
create database bank;
LOAN( l_no, amount, br_name)
create table loan (
    l_no     char(10),
    amount   double,
    br_name  char(30) references branch(b_name),
    primary key (l_no)
);
BORROWS( cust_ssn, loan_num)
create table borrows (
    cust_ssn  char(11),
    loan_num  char(10),
    primary key (cust_ssn, loan_num),
    constraint borrows_c1 foreign key (cust_ssn) references customer( cssn),
    constraint borrows_c2 foreign key (loan_num) references loan( l_no)
);
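The create table examples can be tried out with a sketch like the following, which uses Python's built-in sqlite3 module (an assumption; the course DBMS may differ). The branch and customer tables are abbreviated stand-ins so that the foreign keys have something to reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
create table branch (
    b_name char(30) primary key,
    city   char(30),
    assets double
);
create table loan (
    l_no    char(10),
    amount  double,
    br_name char(30) references branch(b_name),
    primary key (l_no)
);
create table customer (          -- abbreviated: only two attributes
    cssn   char(11) primary key,
    c_name char(30)
);
create table borrows (
    cust_ssn char(11),
    loan_num char(10),
    primary key (cust_ssn, loan_num),
    constraint borrows_c1 foreign key (cust_ssn) references customer(cssn),
    constraint borrows_c2 foreign key (loan_num) references loan(l_no)
);
""")

table_names = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'table' order by name")]
print(table_names)
```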
Note on metadata: system catalogs
Metadata = data about data.
DBMS manages a ‘data dictionary’, sometimes called a ‘system catalog’, with
- When the DB and each table were created/modified,
- Name of each attribute, its data type, and comments describing it,
- List of all users who can access the DB and their passwords,
- Which user can do what (read/add/update/delete/authorize) to the data.
System catalog itself is stored in a table, and users can see (if they
have authority) the data in it.
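As an illustration of a catalog stored in a table, the sketch below reads SQLite's own catalog through Python's sqlite3 module. SQLite is an assumption here; other DBMSs expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (l_no char(10) primary key, amount double, br_name char(30))")

# sqlite_master holds one row per table or index, including the SQL that defined it.
catalog_rows = conn.execute(
    "select type, name, sql from sqlite_master where name = 'loan'").fetchall()

# PRAGMA table_info returns one row per attribute:
# (cid, name, type, notnull, dflt_value, pk)
attributes = [(row[1], row[2]) for row in conn.execute("pragma table_info('loan')")]
print(catalog_rows[0][0], attributes)
```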
SQL as a DML: insert, drop commands
To add one row into a table:
insert into branch values( 'Downtown', 'Brooklyn', 9000000);
insert into loan values( 'L17', 1000, 'Downtown');
Note: char( ), date, datetime types: data must be quoted (standard SQL uses single quotes);
integer, single, double (number data types) are not quoted.
The sequence in which you execute ‘insert’ matters!
The insert into ‘loan’ will fail unless table ‘branch’ already has a row with ‘Downtown’.
To remove an entire table from the DB:
drop table branch;
Note: this ‘drop’ command will fail if, e.g. there is data in table ‘loan’ [why?]
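The ordering rules can be observed with a small sketch using Python's sqlite3 module. It assumes SQLite with foreign-key enforcement switched on; error behaviour in other DBMSs may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("create table branch (b_name text primary key, city text, assets integer)")
conn.execute("create table loan (l_no text primary key, amount integer, "
             "br_name text references branch(b_name))")

# Inserting the loan first fails: no branch row named 'Downtown' exists yet.
try:
    conn.execute("insert into loan values ('L17', 1000, 'Downtown')")
    loan_first_failed = False
except sqlite3.IntegrityError:
    loan_first_failed = True

# Parent row first, then the child row: both succeed.
conn.execute("insert into branch values ('Downtown', 'Brooklyn', 9000000)")
conn.execute("insert into loan values ('L17', 1000, 'Downtown')")
conn.commit()

# Dropping branch now would leave the loan row pointing at nothing, so it is refused.
try:
    conn.execute("drop table branch")
    drop_failed = False
except sqlite3.Error:
    drop_failed = True

branch_rows = conn.execute("select count(*) from branch").fetchone()[0]
print(loan_first_failed, drop_failed, branch_rows)
```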
SQL as a DML: select command
To get some data from a (set of) table(s):
select attribute1, …, attribute_n                                           [required]
from table_1, …, table_m                                                    [required]
where selection_or_join_condition1 and/or … selection_or_join_condition_r   [optional]
group by attribute_i                                                        [optional]
having condition on aggregate_function( attribute_j, … )                    [optional]
order by attribute_k                                                        [optional]
SQL as a DML: select command
To get some data from a (set of) table(s):
select customer, loan_no
from borrows;
select * from borrows;
BORROWS (result of both queries)
customer      loan_no
111-12-0000   L17
222-12-0000   L23
333-12-0000   L15
444-00-0000   L93
666-12-0000   L17
111-12-0000   L11
999-12-0000   L17
777-12-0000   L16
select customer as "customer ssn"
from borrows;
customer ssn
111-12-0000
222-12-0000
333-12-0000
444-00-0000
666-12-0000
111-12-0000
999-12-0000
777-12-0000
select distinct loan_no
from borrows;
loan_no
L17
L23
L15
L93
L11
L16
Notes:
* is a wildcard
as: gives alias name to attribute
SQL select: row filters
Example: Find the names of all branches that have given loans larger than 1200
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
select distinct branch_name
from loan
where amount > 1200
Note: all operations in ‘where’ are applied one row at a time
Result:
branch_name
Redwood
Pennyridge
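The row filter above can be run as a sketch with Python's sqlite3 module (SQLite is an assumption; the data is the LOAN table from this slide). An order by is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])

# 'where' is tested against each row; 'distinct' then removes the repeated branch.
big_loan_branches = [row[0] for row in conn.execute(
    "select distinct branch_name from loan where amount > 1200 order by branch_name")]
print(big_loan_branches)
```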
SQL select: joins
Example: Find the customer ssn, loan no, amount and branch name for all loans > 1200
BORROWS
customer      loan_no
111-12-0000   L17
222-12-0000   L23
333-12-0000   L15
444-00-0000   L93
666-12-0000   L17
111-12-0000   L11
999-12-0000   L17
777-12-0000   L16
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
select customer, loan.*
from borrows, loan
where loan_no = loan_number      ← θ-condition for join of loan, borrows
and amount > 1200                ← selection condition
WHERE clause:
multiple θ-conditions → and, or, not
comparing cell values: >, =, !=, <, etc.
Result:
customer      loan_number   amount   branch_name
222-12-0000   L23           2000     Redwood
333-12-0000   L15           1500     Pennyridge
777-12-0000   L16           1300     Pennyridge
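A runnable sketch of the join above, using Python's sqlite3 module (an assumption) and the BORROWS and LOAN rows from this slide. The order by is added only so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.execute("create table borrows (customer text, loan_no text)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])
conn.executemany("insert into borrows values (?, ?)", [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
    ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
])

joined_rows = conn.execute("""
    select customer, loan.*
    from borrows, loan
    where loan_no = loan_number   -- join condition
      and amount > 1200           -- selection condition
    order by loan_number
""").fetchall()
for row in joined_rows:
    print(row)
```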
SQL select: joins with table and column aliases
Example: Find the names of employees and their manager.
EMPLOYEE (used twice, as E and as M)
e_ssn         e_name   tel     start_date   mgr_ssn
111-22-3333   Jones    12345   Nov-2005     321-32-4321
333-11-4444   Smith    54321   Mar-1998     111-22-3333
123-45-6789   Lee      54321   Mar-1998     111-22-3333
555-66-8888   Turner   55555   Aug-2002     321-32-4321
987-65-4321   Jones    87621   Mar-1995     888-99-9999
888-99-9999   Chan     87654   Feb-1980     777-77-7777
321-32-4321   Adams    77777   Feb-1990     777-77-7777
777-77-7777   Black    99111   Jan-1980     null
select E.e_name as worker, M.e_name as boss
from employee as E, employee as M
where E.mgr_ssn = M.e_ssn
Note:
E, M are aliases (copies) of employee table
Result:
worker   boss
Jones    Adams
Smith    Jones
Lee      Jones
Turner   Adams
Jones    Chan
Chan     Black
Adams    Black
Black has mgr_ssn = null, which matches no e_ssn, so this join returns no row with Black as worker; an outer join would be needed to list Black with a null boss.
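The self-join can be checked with the sketch below, which uses Python's sqlite3 module (an assumption) and only the three EMPLOYEE attributes the query needs. It also runs a left outer join, which is not on the slide, to show what happens to the employee whose mgr_ssn is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (e_ssn text primary key, e_name text, mgr_ssn text)")
conn.executemany("insert into employee values (?, ?, ?)", [
    ("111-22-3333", "Jones", "321-32-4321"), ("333-11-4444", "Smith", "111-22-3333"),
    ("123-45-6789", "Lee", "111-22-3333"), ("555-66-8888", "Turner", "321-32-4321"),
    ("987-65-4321", "Jones", "888-99-9999"), ("888-99-9999", "Chan", "777-77-7777"),
    ("321-32-4321", "Adams", "777-77-7777"), ("777-77-7777", "Black", None),
])

# Plain join: a row appears only when E.mgr_ssn equals some M.e_ssn.
inner_pairs = conn.execute("""
    select E.e_name as worker, M.e_name as boss
    from employee as E, employee as M
    where E.mgr_ssn = M.e_ssn
""").fetchall()

# Left outer join: every employee appears; boss is None when there is no match.
outer_pairs = conn.execute("""
    select E.e_name as worker, M.e_name as boss
    from employee as E left join employee as M on E.mgr_ssn = M.e_ssn
""").fetchall()

print(len(inner_pairs), len(outer_pairs))
```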
SQL select: nested queries, in
Example: Find ssn of customers who have both deposit and loan
DEPOSIT
c_ssn         ac_num   accessDate
888-12-0000   A101     Jan 1, 09
222-12-0000   A215     Feb 1, 09
333-12-0000   A102     Feb 28, 09
555-00-0000   A305     Mar 10, 09
888-12-0000   A201     Mar 1, 98
111-12-0000   A217     Mar 1, 09
000-12-0000   A101     Feb 25, 09
BORROWS
customer      loan_no
111-12-0000   L17
222-12-0000   L23
333-12-0000   L15
444-00-0000   L93
666-12-0000   L17
111-12-0000   L11
999-12-0000   L17
777-12-0000   L16
select c_ssn
from deposit
where c_ssn in ( select customer
                 from borrows)
Notes:
‘in’ performs a set membership test
Result:
c_ssn
222-12-0000
333-12-0000
111-12-0000
SQL select: nested queries, not in
Example: Find ssn of customers who have a deposit but no loan
(DEPOSIT and BORROWS hold the same rows as in the previous example.)
select c_ssn
from deposit
where c_ssn not in ( select customer
                     from borrows)
Result:
c_ssn
888-12-0000
555-00-0000
888-12-0000
000-12-0000
Notes:
‘not in’ is true if ‘in’ is false.
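Both nested queries ('in' and 'not in') can be run side by side with a sketch like this one, using Python's sqlite3 module (an assumption) and the DEPOSIT and BORROWS rows from the slides. Dates are omitted because the queries do not use them, and an order by is added for a predictable row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table deposit (c_ssn text, ac_num text)")
conn.execute("create table borrows (customer text, loan_no text)")
conn.executemany("insert into deposit values (?, ?)", [
    ("888-12-0000", "A101"), ("222-12-0000", "A215"), ("333-12-0000", "A102"),
    ("555-00-0000", "A305"), ("888-12-0000", "A201"), ("111-12-0000", "A217"),
    ("000-12-0000", "A101"),
])
conn.executemany("insert into borrows values (?, ?)", [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
    ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
])

# Depositors who also appear in borrows.
deposit_and_loan = [row[0] for row in conn.execute(
    "select c_ssn from deposit "
    "where c_ssn in (select customer from borrows) order by c_ssn")]

# Depositors who never appear in borrows (one output row per deposit row).
deposit_no_loan = [row[0] for row in conn.execute(
    "select c_ssn from deposit "
    "where c_ssn not in (select customer from borrows) order by c_ssn")]

print(deposit_and_loan)
print(deposit_no_loan)
```

Customer 888-12-0000 is listed twice in the second result because that customer has two deposit rows; adding distinct would collapse them.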
SQL select: nested, correlated queries, exists
Existential quantifier (a generalization of ‘in’)
Example: Find the names of branches that have given no loan
BRANCH
branch_name   city         assets
Downtown      Brooklyn     9000000
Redwood       Palo Alto    2100000
Pennyridge    Horseneck    1700000
Mianus        Horseneck    400000
Round Hill    Horseneck    8000000
Pownal        Bennington   300000
North Town    Rye          3700000
Brighton      Brooklyn     7100000
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
select branch_name
from branch
where not exists ( select *
                   from loan
                   where branch.branch_name = loan.branch_name)
1. Correlated: ‘where’ clause of inner query refers to outer query
2. ‘exists’ is true if the inner query returns >= 1 row; ‘not exists’ is true if ‘exists’ is false
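A sketch of the correlated 'not exists' query, run with Python's sqlite3 module (an assumption) on the BRANCH and LOAN rows from this slide. Only the branch name is stored for BRANCH, since the query uses nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text primary key)")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.executemany("insert into branch values (?)", [
    ("Downtown",), ("Redwood",), ("Pennyridge",), ("Mianus",),
    ("Round Hill",), ("Pownal",), ("North Town",), ("Brighton",),
])
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])

# The inner query is re-evaluated for each branch row of the outer query.
branches_without_loans = [row[0] for row in conn.execute("""
    select branch_name
    from branch
    where not exists (select *
                      from loan
                      where branch.branch_name = loan.branch_name)
    order by branch_name
""")]
print(branches_without_loans)
```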
SQL select: arithmetic operations on columns
Report the branch name and assets in units of millions
BRANCH
branch_name   city         assets
Downtown      Brooklyn     9000000
Redwood       Palo Alto    2100000
Pennyridge    Horseneck    1700000
Mianus        Horseneck    400000
Round Hill    Horseneck    8000000
Pownal        Bennington   300000
North Town    Rye          3700000
Brighton      Brooklyn     7100000
select branch_name, assets*0.000001 as "assets (m)"
from branch
Notes: arithmetic ops can be used in SELECT, WHERE, HAVING
Result:
branch_name   assets (m)
Downtown      9.0
Redwood       2.1
Pennyridge    1.7
Mianus        0.4
Round Hill    8.0
Pownal        0.3
North Town    3.7
Brighton      7.1
SQL select: group by, group-wise aggregation functions
Example: Report the average, maximum amount, and number of loans by branch
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
select branch_name, avg( amount) as Avg, max( amount) as Max,
count( branch_name) as no_loans
from loan
group by branch_name
order by no_loans desc
Result:
branch_name   Avg    Max    no_loans
Pennyridge    1400   1500   2
Downtown      1000   1000   1
Redwood       2000   2000   1
Mianus        500    500    1
Round Hill    900    900    1
1. Aggregating functions: avg, max, min, sum, count
2. avg/max return average/max for each group
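The grouped aggregation can be reproduced with the following sketch, using Python's sqlite3 module (an assumption). The aliases avg_amount and max_amount replace the slide's Avg and Max to avoid any clash with function names, and branch_name is added as a tie-breaker in order by so the output order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])

# One output row per branch; each aggregate is computed within its group.
branch_stats = conn.execute("""
    select branch_name,
           avg(amount)        as avg_amount,
           max(amount)        as max_amount,
           count(branch_name) as no_loans
    from loan
    group by branch_name
    order by no_loans desc, branch_name
""").fetchall()
for row in branch_stats:
    print(row)
```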
SQL select: group by, having
‘having’ is used to screen out groups from the output
Example: Report the small loans (<= 1500) held by 2 or more people.
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
BORROWS
customer      loan_no
111-12-0000   L17
222-12-0000   L23
333-12-0000   L15
444-00-0000   L93
666-12-0000   L17
111-12-0000   L11
999-12-0000   L17
777-12-0000   L16
select loan_number, amount, count( loan_number) as no_debtors
from loan, borrows
where loan_number = loan_no and amount <= 1500
group by loan_number, amount
having count(loan_number) >= 2
Note: every selected attribute that is not aggregated (here loan_number and amount) should appear in ‘group by’; many DBMSs reject the query otherwise.
Result:
loan_number   amount   no_debtors
L17           1000     3
‘having’ conditions are only applied to data after rows have been grouped
‘order by’ used with ‘group by’ will be applied to groups.
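A sketch of the 'having' example with Python's sqlite3 module (an assumption), on the LOAN and BORROWS rows from this slide. Both non-aggregated attributes are listed in group by, which is the portable form of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.execute("create table borrows (customer text, loan_no text)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])
conn.executemany("insert into borrows values (?, ?)", [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
    ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
])

# 'where' filters rows before grouping; 'having' filters the groups afterwards.
shared_small_loans = conn.execute("""
    select loan_number, amount, count(loan_number) as no_debtors
    from loan, borrows
    where loan_number = loan_no and amount <= 1500
    group by loan_number, amount
    having count(loan_number) >= 2
""").fetchall()
print(shared_small_loans)
```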
SQL select: date functions
SQL provides special functions to handle dates, times and strings
Example: report those customers who have been inactive for over 5 years
DEPOSIT
c_ssn         ac_num   accessDate
888-12-0000   A101     Jan 1, 09
222-12-0000   A215     Feb 1, 09
333-12-0000   A102     Feb 28, 09
555-00-0000   A305     Mar 10, 09
888-12-0000   A201     Mar 1, 98
111-12-0000   A217     Mar 1, 09
000-12-0000   A101     Feb 25, 09
select c_ssn
from deposit
where datediff( yy, accessDate, getdate( ) ) > 5
datediff units: yy (years), …, ns (nano-seconds)
Note: datediff and getdate are Microsoft SQL Server functions; other DBMSs provide date functions under different names.
Matching DEPOSIT row (the query returns its c_ssn):
c_ssn         ac_num   accessDate
888-12-0000   A201     Mar 1, 98
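SQLite has no datediff or getdate, so the sketch below swaps in SQLite's julianday function to express the same "inactive for over 5 years" test. This is a substitute technique, not the slide's syntax. The ISO date strings and the fixed reference date 2010-03-15 are assumptions chosen so the result does not depend on when the code is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table deposit (c_ssn text, ac_num text, accessDate text)")
conn.executemany("insert into deposit values (?, ?, ?)", [
    ("888-12-0000", "A101", "2009-01-01"), ("222-12-0000", "A215", "2009-02-01"),
    ("333-12-0000", "A102", "2009-02-28"), ("555-00-0000", "A305", "2009-03-10"),
    ("888-12-0000", "A201", "1998-03-01"), ("111-12-0000", "A217", "2009-03-01"),
    ("000-12-0000", "A101", "2009-02-25"),
])

reference_date = "2010-03-15"  # hypothetical "today", fixed for reproducibility

# julianday() converts a date to a day count; dividing by 365.25 gives elapsed years.
inactive_accounts = conn.execute("""
    select c_ssn, ac_num, accessDate
    from deposit
    where (julianday(?) - julianday(accessDate)) / 365.25 > 5
""", (reference_date,)).fetchall()
print(inactive_accounts)
```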
SQL select: string functions
It is often useful to use wild-cards for string matching
CUSTOMER
ssn           name       street    city         banker        b_type
111-12-0000   Jones      Main      Harrison     321-32-4321   CRM
222-12-0000   Smith      North     Rye          321-32-4321   CRM
333-12-0000   Hayes      Main      Harrison     321-32-4321   CRM
444-12-0000   Curry      North     Rye          333-11-4444   LO
555-12-0000   Turner     Putnam    Stamford     888-99-9999   DO
666-12-0000   Williams   Nassau    Princeton    333-11-4444   LO
777-12-0000   Adams      Spring    Pittsfield   123-45-6789   LO
888-12-0000   Johnson    Alma      Palo Alto    888-99-9999   DO
999-12-0000   Brooks     Senator   Brooklyn     123-45-6789   LO
000-12-0000   Lindsay    Park      Pittsfield   888-99-9999   DO
select ssn, name, street, city
from customer
where name LIKE 'J%'
or street LIKE '[^mnp]%'
or city LIKE '%[ ]%'
Wildcards:
% → zero or more chars
[asd] → match one char out of list [asd]
[^asd] → matches any one char except a, s, d.
Result:
ssn           name      street    city
111-12-0000   Jones     Main      Harrison
777-12-0000   Adams     Spring    Pittsfield
888-12-0000   Johnson   Alma      Palo Alto
999-12-0000   Brooks    Senator   Brooklyn
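The bracket wildcards shown above are not supported by LIKE in every DBMS. As a sketch under that caveat, the example below uses Python's sqlite3 module (an assumption), where LIKE handles % and the separate GLOB operator handles character lists. GLOB uses * instead of % and is case-sensitive, so both upper and lower case letters are listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (ssn text primary key, name text, street text, city text)")
conn.executemany("insert into customer values (?, ?, ?, ?)", [
    ("111-12-0000", "Jones", "Main", "Harrison"), ("222-12-0000", "Smith", "North", "Rye"),
    ("333-12-0000", "Hayes", "Main", "Harrison"), ("444-12-0000", "Curry", "North", "Rye"),
    ("555-12-0000", "Turner", "Putnam", "Stamford"),
    ("666-12-0000", "Williams", "Nassau", "Princeton"),
    ("777-12-0000", "Adams", "Spring", "Pittsfield"),
    ("888-12-0000", "Johnson", "Alma", "Palo Alto"),
    ("999-12-0000", "Brooks", "Senator", "Brooklyn"),
    ("000-12-0000", "Lindsay", "Park", "Pittsfield"),
])

matched_names = [row[0] for row in conn.execute("""
    select name
    from customer
    where name like 'J%'                -- name starts with J
       or street glob '[^mnpMNP]*'      -- street does not start with m, n or p
       or city like '% %'               -- city contains a space
    order by name
""")]
print(matched_names)
```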
SQL as a DML: update command…
To modify an entry in a cell
update loan
set amount = amount - 200
where loan_number in ( select loan_no
                       from borrows, customer
                       where customer = ssn and name = 'Jones' )
Note: Jones (ssn 111-12-0000) holds two loans, L17 and L11, so the subquery returns two rows; ‘in’ is needed because ‘=’ accepts only a single value.
(LOAN, BORROWS and CUSTOMER before the update: see the ‘Bank tables’ slide.)
select * from loan
LOAN (after the update)
loan_number   amount   branch_name
L17           800      Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           700      Round Hill
L16           1300     Pennyridge
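A sketch of the update, run with Python's sqlite3 module (an assumption). It uses 'in' with the subquery because the customer named Jones holds more than one loan; CUSTOMER is cut down to the two attributes the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text primary key, amount integer, branch_name text)")
conn.execute("create table borrows (customer text, loan_no text)")
conn.execute("create table customer (ssn text primary key, name text)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L17", 1000, "Downtown"), ("L23", 2000, "Redwood"), ("L15", 1500, "Pennyridge"),
    ("L93", 500, "Mianus"), ("L11", 900, "Round Hill"), ("L16", 1300, "Pennyridge"),
])
conn.executemany("insert into borrows values (?, ?)", [
    ("111-12-0000", "L17"), ("222-12-0000", "L23"), ("333-12-0000", "L15"),
    ("444-00-0000", "L93"), ("666-12-0000", "L17"), ("111-12-0000", "L11"),
    ("999-12-0000", "L17"), ("777-12-0000", "L16"),
])
conn.executemany("insert into customer values (?, ?)", [
    ("111-12-0000", "Jones"), ("222-12-0000", "Smith"), ("333-12-0000", "Hayes"),
])

# Reduce by 200 every loan that has Jones among its borrowers.
conn.execute("""
    update loan
    set amount = amount - 200
    where loan_number in (select loan_no
                          from borrows, customer
                          where borrows.customer = customer.ssn
                            and customer.name = 'Jones')
""")

amounts = dict(conn.execute("select loan_number, amount from loan"))
print(amounts)
```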
SQL as a DML: delete command…
To delete a row from a table
delete from loan
    → all rows of the loan table are deleted
delete from customer
where name = 'Jones'
    → request to delete the row of the customer table with name = 'Jones' [will it succeed?]
(BORROWS and CUSTOMER contents: see the ‘Bank tables’ slide.)
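Whether the delete succeeds depends on the foreign keys declared in the schema. The sketch below uses Python's sqlite3 module and assumes that borrows declares a foreign key to customer, as in the earlier create table example, with SQLite's enforcement switched on. Tables are cut down to a few rows and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("create table customer (ssn text primary key, name text)")
conn.execute("create table borrows (customer text references customer(ssn), loan_no text)")
conn.executemany("insert into customer values (?, ?)", [
    ("111-12-0000", "Jones"), ("222-12-0000", "Smith"),
])
conn.execute("insert into borrows values ('111-12-0000', 'L17')")
conn.commit()

# A borrows row still refers to Jones, so the delete is rejected.
try:
    conn.execute("delete from customer where name = 'Jones'")
    delete_failed = False
except sqlite3.IntegrityError:
    delete_failed = True

# Removing the referencing rows first makes the same delete legal.
conn.execute("delete from borrows where customer = '111-12-0000'")
conn.execute("delete from customer where name = 'Jones'")
remaining_customers = [row[0] for row in conn.execute("select name from customer")]
print(delete_failed, remaining_customers)
```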
Views in SQL
A view is a virtual table defined on a given Database:
The columns of the view are either
(i) columns from some (actual or virtual) table of the DB
or
(ii) columns that are computed (from other columns)
Main uses of a view:
- Security (selective display of information to different users)
- Ease-of-use
-- Explicit display of derived attributes
-- Explicit display of related information from different tables
-- Intermediate table can be used to simplify SQL query
Views in SQL..
Create a view showing the names of employees, their ssn, telephone number,
their manager's name, and how many years they have worked in the bank.
create view bank_employee as
select e.e_ssn as ssn, e.e_name as name, e.tel as phone, m.e_name as manager,
datediff( yy, e.start_date, getdate( )) as n_years
from EMPLOYEE as e, EMPLOYEE as m
where e.mgr_ssn = m.e_ssn
select * from bank_employee
ssn           name     phone   manager   n_years
111-22-3333   Jones    12345   Adams     15
333-11-4444   Smith    54321   Jones     12
123-45-6789   Lee      54321   Jones     12
555-66-8888   Turner   55555   Adams     8
987-65-4321   Jones    87621   Chan      15
888-99-9999   Chan     87654   Black     30
321-32-4321   Adams    77777   Black     30
Note: Black (mgr_ssn = null) matches no manager row under this join condition, so the view as defined omits Black; an outer join would be needed to list Black with a null manager.
Operations on Views
View definition is persistent – once you define it, the definition stays
permanently in the DB until you drop the view.
The DBMS only computes the data in a view when it is referenced
in a SQL command (e.g. in a select … command);
no physical table corresponding to the view is kept in storage.
You can use the view in any SQL query just the same as any other table, BUT
(1) You cannot modify the value of a computed attribute
(2) If an update/delete command is executed on the view, the underlying data in the
referenced table of the view is updated/deleted.
[this can cause unexpected changes in your DB]
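That a view is recomputed each time it is referenced can be seen in a small sketch using Python's sqlite3 module (an assumption). The view name branch_m and its computed column assets_m are hypothetical, chosen to mirror the earlier "assets in millions" example. Updating through a view is not shown, because SQLite treats views as read-only unlike the DBMS behaviour described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text primary key, city text, assets integer)")
conn.executemany("insert into branch values (?, ?, ?)", [
    ("Downtown", "Brooklyn", 9000000), ("Redwood", "Palo Alto", 2100000),
])

# Only the definition is stored; assets_m is a computed column.
conn.execute("""
    create view branch_m as
    select branch_name, assets / 1000000.0 as assets_m
    from branch
""")

before = dict(conn.execute("select branch_name, assets_m from branch_m"))

# Change the underlying table; the view itself is not touched.
conn.execute("update branch set assets = 9500000 where branch_name = 'Downtown'")

after = dict(conn.execute("select branch_name, assets_m from branch_m"))
print(before["Downtown"], after["Downtown"])
```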
Concluding remarks on SQL
The SQL language has some other useful commands and operators.
In addition, most DBMSs provide many non-standard operators and services
to facilitate information system deployment and administration.
DBMSs can handle very large amounts of data, and process queries very fast.
IBM’s DB2 can handle over 6m transactions per min (tpm); Oracle 10g, over 4m tpm
To speed up queries, you can use indexes.
Common DBMSs: IBM DB2, Oracle 10g, Microsoft SQL Server, Sybase, MySQL; all support SQL.
Database API’s
Most people use DBs, but almost always through some computer program interface (API).
Most DBMSs provide program ‘libraries’ (collections of compiled functions) with functions to:
- Connect to the DBMS
- Select a DB
- Send a SQL command, and receive the response in some standard data structure.
Each DBMS typically provides a library for each supported programming language.
On Windows™ (and several other) systems, a common standard interface for these libraries is ODBC (Open Database Connectivity).
[Figure: a Client App (your code → odbc func → more code) sends a SQL query through the odbc library (DLL) to the DBMS, which accesses the DB and returns a Response.]
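The connect / send SQL / receive response cycle can be sketched with Python's built-in sqlite3 library. This is an assumption standing in for an ODBC library; with SQLite, connecting and selecting a DB are a single step, and the helper name run_query is hypothetical.

```python
import sqlite3


def run_query(conn, sql, params=()):
    """Send one SQL command and return all result rows as a list of tuples."""
    cursor = conn.execute(sql, params)
    return cursor.fetchall()


# Connect to the DBMS and select a DB (here: a private in-memory database).
conn = sqlite3.connect(":memory:")

# Send SQL commands; values are passed separately as parameters.
run_query(conn, "create table branch (branch_name text primary key, city text, assets integer)")
run_query(conn, "insert into branch values (?, ?, ?)", ("Downtown", "Brooklyn", 9000000))
run_query(conn, "insert into branch values (?, ?, ?)", ("Redwood", "Palo Alto", 2100000))

# Receive the response in a standard data structure (a list of tuples).
brooklyn_branches = run_query(
    conn, "select branch_name, assets from branch where city = ?", ("Brooklyn",))
print(brooklyn_branches)

conn.close()
```

Passing values through the params argument, rather than pasting them into the SQL string, leaves quoting to the library.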
Bank tables..
BRANCH
branch_name   city         assets
Downtown      Brooklyn     9000000
Redwood       Palo Alto    2100000
Pennyridge    Horseneck    1700000
Mianus        Horseneck    400000
Round Hill    Horseneck    8000000
Pownal        Bennington   300000
North Town    Rye          3700000
Brighton      Brooklyn     7100000
EMPLOYEE
e_ssn         e_name   tel     start_date   mgr_ssn
111-22-3333   Jones    12345   Nov-2005     321-32-4321
333-11-4444   Smith    54321   Mar-1998     111-22-3333
123-45-6789   Lee      54321   Mar-1998     111-22-3333
555-66-8888   Turner   55555   Aug-2002     321-32-4321
987-65-4321   Jones    87621   Mar-1995     888-99-9999
888-99-9999   Chan     87654   Feb-1980     777-77-7777
321-32-4321   Adams    77777   Feb-1990     777-77-7777
777-77-7777   Black    99111   Jan-1980     null
CUSTOMER
ssn           name       street    city         banker        b_type
111-12-0000   Jones      Main      Harrison     321-32-4321   CRM
222-12-0000   Smith      North     Rye          321-32-4321   CRM
333-12-0000   Hayes      Main      Harrison     321-32-4321   CRM
444-12-0000   Curry      North     Rye          333-11-4444   LO
555-12-0000   Turner     Putnam    Stamford     888-99-9999   DO
666-12-0000   Williams   Nassau    Princeton    333-11-4444   LO
777-12-0000   Adams      Spring    Pittsfield   123-45-6789   LO
888-12-0000   Johnson    Alma      Palo Alto    888-99-9999   DO
999-12-0000   Brooks     Senator   Brooklyn     123-45-6789   LO
000-12-0000   Lindsay    Park      Pittsfield   888-99-9999   DO
DEPOSIT
c_ssn         ac_num   accessDate
888-12-0000   A101     Jan 1, 09
222-12-0000   A215     Feb 1, 09
333-12-0000   A102     Feb 28, 09
555-00-0000   A305     Mar 10, 09
888-12-0000   A201     Mar 1, 98
111-12-0000   A217     Mar 1, 09
000-12-0000   A101     Feb 25, 09
LOAN
loan_number   amount   branch_name
L17           1000     Downtown
L23           2000     Redwood
L15           1500     Pennyridge
L93           500      Mianus
L11           900      Round Hill
L16           1300     Pennyridge
BORROWS
customer      loan_no
111-12-0000   L17
222-12-0000   L23
333-12-0000   L15
444-00-0000   L93
666-12-0000   L17
111-12-0000   L11
999-12-0000   L17
777-12-0000   L16
Not all tables of our normalized
design are shown; please create
and populate for practice.
References and Further Reading
Silberschatz, Korth, Sudarshan, Database Systems Concepts, McGraw Hill
Next: IS for non-structured data