C20.0046: Database Management Systems Lecture #1 Matthew P. Johnson

Download Report

Transcript C20.0046: Database Management Systems Lecture #1 Matthew P. Johnson

C20.0046: Database
Management Systems
Lecture #1
Matthew P. Johnson
Stern School of Business, NYU
Spring, 2004
M.P. Johnson, DBMS, Stern/NYU, Sp2004
1
Personnel

Instructor: Matthew P. Johnson


KMEC 8-176, [email protected]
Office hours: KMC 8-176, Th 12:15-2:15


please visit!
Tutor/TF/grader: Matthew P. Johnson!
M.P. Johnson, DBMS, Stern/NYU, Sp2004
2
Communications

Web page: http://www.columbia.edu/~mpj9/dbms




syllabus
course policies
may move in the future…
Blackboard web site


Some materials will be available here
Discussion board



send general-interest messages here to benefit all!
Go to http://sternclasses.nyu.edu
Click on C20.0046
M.P. Johnson, DBMS, Stern/NYU, Sp2004
3
Acknowledgements


Thanks to Ramesh, Ullman, et al., Raghu and
Johannes, Dan Suciu, Arthur Keller, David
Kuijt for course materials
See classpage for other related, antecedent
DBMS courses
M.P. Johnson, DBMS, Stern/NYU, Sp2004
4
What Is a Database?


A very large, integrated collection of data.
Models real-world enterprise.


Entities (e.g., students, courses, instructors, TAs)
Relationships (e.g., Joe is taking C20.0046)




George is currently taking C20.0046
Dick is currently teaching C20.0046
Condi is currently TA-ing C20.0046 but took it last
semester
A Database Management System (DBMS)
is a software package designed to store and
manage databases.
M.P. Johnson, DBMS, Stern/NYU, Sp2004
5
Databases are everywhere
Example: Ordering a pizza
Databases involved?
Pizza Hut’s DB


1.


stores previous orders by customer
stores previous credit cards used
Credit card records
2.


huge databases of (attempted) purchases
location, date, amount, parties
phone company’s records
3.

Local Usage Details (“Pull his LUDs, Lenny.”)
Caller ID
4.

ensures reported address matches destination
M.P. Johnson, DBMS, Stern/NYU, Sp2004
6
Your wallet is full of DB records








Driver’s license
Credit cards
NYUCard
Medical insurance card
Social security card
Gym membership
Money (serial numbers)
Maybe even photos
M.P. Johnson, DBMS, Stern/NYU, Sp2004
7
Databases are everywhere


Q: Websites backed by DBMSs?
 retail: Amazon, etc.
 data-mining: Page You Made
 search engines: Google, etc.
 directories: Internic, etc.
 searchable DBs: IMDB, tvguide.com, etc.
Q: Non-web examples of DBMSs?
 criminal/terrorist: TIA
 airline bookings
 NYPD’s CompStat
 all serious crime stats by precinct
 Retailers: Wal-Mart, etc.
 when to re-order, purchase patterns, data-mining

Genomics!
M.P. Johnson, DBMS, Stern/NYU, Sp2004
8
Example of a Traditional DB App
Suppose we are building a system
to store the information about:
 checking accounts
 savings accounts
 account holders
 state of each of each person’s accounts
M.P. Johnson, DBMS, Stern/NYU, Sp2004
9
Can we do it without a DBMS?
Sure we can! Start by storing the data in files:
checking.txt
savings.txt
customers.txt
Now write C or Java programs to implement
specific tasks
M.P. Johnson, DBMS, Stern/NYU, Sp2004
10
Doing it without a DBMS...

Transfer $100 from George’s savings to
checking:
Write a C program to do the following:
Read ‘savings.txt’
Find&update the record “George”
balance -= 100
Write ‘savings.txt’
Read ‘checking.txt’
Find&update the record “George”
balance += 100
Write ‘checking.txt’
M.P. Johnson, DBMS, Stern/NYU, Sp2004
11
Problems without an DBMS...
1. System crashes:
Read ‘savings.txt’
Find&update the rec “George.”
Write ‘savings.txt’
Read ‘checking.txt’
Find&update the rec “George”
Write ‘checking.txt’



CRASH !
Q: What is the problem ?
A: George lost his $100
Same problem even if reordered
2. Simultaneous access by many users


George and Dick visit ATMs at same
Lock checking.txt before each use– what is the problem?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
12
Problems without an DBMS...
3.Large data sets (say 50GB)


No indices


Why is this a problem?
Finding “George” in huge flatfile is expensive
Modifications intractable without better data
structures


“George”  “Georgie” is very expensive
Deletions are very expensive
M.P. Johnson, DBMS, Stern/NYU, Sp2004
13
Problems without an DBMS...
5.Security?


File system may be insecure
File system security may be coarse
6.Application programming interface (API)?

suppose need other apps to access DB
7.How to interact with other DBMSs?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
14
General problems to solve




In building our own system, many Qs arise:
how do we store the data? (file organization, etc.)
how do we query the data? (write programs…)
make sure that updates don’t mess things up?


leave the DB “consistent”
provide different views on the data?

e.g., ATM user’s view v. bank teller’s view

how do we deal with crashes?

Too hard! Go buy a DBMS!

Q: How does a DBMS solve these problems?
A: See third part of course

M.P. Johnson, DBMS, Stern/NYU, Sp2004
15
Big issue: Transaction processing


Grouping of several queries (or other
database actions) into one transaction
ACID properties

Atomicity


Consistency


constraints on relationships
Isolation



all or nothing
concurrency control
Simulated solipsim
Durability

Crash recovery
M.P. Johnson, DBMS, Stern/NYU, Sp2004
16
Atomicity & Durability


Saw how George lost $100 with makeshift DBMS
DBMS prevents this outcome


xacts are all or nothing
One idea: Keep a log (history) of all actions in set
of xacts


Durability: Use log to redo or undo certain ops in
crash recovery
Atomicity: don’t really commit changes until end
 Then, all at once
M.P. Johnson, DBMS, Stern/NYU, Sp2004
17
Isolation

Concurrent execution is essential for performance.




Interleaving actions of different user programs
can lead to inconsistency:


Frequent, slow disk accesses
 don’t waste CPU – keep running
e.g., two programs simultaneously withdraw from the
same account
DBMS ensures such problems don’t arise:

users can pretend they are using a single-user
system.
M.P. Johnson, DBMS, Stern/NYU, Sp2004
18
Isolation

Contrast with a file in two Notepads




Strategy: ignore multiple users
whichever saves last wins
first save is overwritten
Contrast with a file in two Words



Strategy: blunt isolation
One can edit
To the other it’s read-only
M.P. Johnson, DBMS, Stern/NYU, Sp2004
19
Consistency

Each xant (on a consistent DB) must leave it
in a consistent state


can define integrity constraints
checks the defined claims about the data remain
true
M.P. Johnson, DBMS, Stern/NYU, Sp2004
20
Data Models



Any DBMS uses a data model: collection of
concepts for describing data
Schema: description of partic set of data,
using some data model
Relational data model: most widely used (by
far) data model



Oracle, DB2, SQLServer, other SQL DBMSs
main concept: relation ~ table of rows & columns
a rel’s schema defines its fields
M.P. Johnson, DBMS, Stern/NYU, Sp2004
21
Example: university database

Conceptual schema:




Physical schema:



Students(ssn: string, name: string, login: string,
age: int, gpa: real)
Courses(cid: string, cname: string, credits: int)
Enrolled(sid:string, cid:string, grade: string)
Relations stored as unordered text files.
Indices on first column of each rel
External Schema (View):


Course_info(ssn: string, name: string)
My_courses(cname: string, grade: string)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
22
How the programmer sees the DBMS

Start with DDL to create tables:
CREATE TABLE Students (
Name CHAR(30)
SSN CHAR(9) PRIMARY KEY NOT NULL,
Category CHAR(20)
) ...

Continue with DML to populate tables:
INSERT INTO Students
VALUES(‘Howard, ‘123456789’, ‘undergraduate’)
. . . .
M.P. Johnson, DBMS, Stern/NYU, Sp2004
23
How the programmer sees the DBMS

Tables:
Takes:
Students:
SSN
123-45-6789
234-56-7890
Courses:
CID
C20.0046
C20.0056

Name
Howard
Wesley
…
Category
undergrad
grad
…
SSN
123-45-6789
CID
C20.0046
123-45-6789
C20.0056
234-56-7890
C20.0046
…
CName
Databases
Advanced Software
semester
Spring,
2004
Spring,
2004
Fall, 2003
Still implemented as files, but behind the
scenes can be quite complex
“data independence” = separate logical view
from physical implementation
M.P. Johnson, DBMS, Stern/NYU, Sp2004
24
Querying: Structured Query Language

Find all the students who have taken C20.0046:


Find all the students who C20.0046 last fall:


SELECT SSN
FROM Takes
WHERE CID=“C20.0046” AND Semester=“Fall, 2003”
Find the students’ names:


SELECT SSN
FROM Takes
WHERE CID=“C20.0046”
SELECT Name
FROM Students, Takes
WHERE Students.SSN=Takes.SSN AND
CID=“C20.0046” AND Semester=“Fall, 2003”
Query processor does this efficiently.
M.P. Johnson, DBMS, Stern/NYU, Sp2004
25
Database Industry

Relational databases are a great success of
theoretical ideas.






based on most “theoretical” type of math there is: set theory
DBMS companies are among the largest software
companies in the world.
Oracle, IBM (with DB2), Microsoft (SQL Server,
Microsoft Access), Sybase.
Also opensource: MySQL, Postgres, etc.
$20B+ industry.
XML (“semi-structured data”) also important

New lingua franca for exchanging data
M.P. Johnson, DBMS, Stern/NYU, Sp2004
26
The Study of DBMS

Several aspects:




This course covers all three


Modeling and design of databases
DBMS programming: querying and update
DBMS implementation
though more time on first two
Also will look at some more advanced areas

XML, data-mining, LDAP?
M.P. Johnson, DBMS, Stern/NYU, Sp2004
27
Databases are used by

DB app programmers



Database administrators (DBAs)






desktop app programmers
web developers
design schemas
security/authorization
crash recovery
tuning
better paid than programmers!
Everyone else (perhaps indirectly)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
28
Course outline

Database design:



The relational model:



Entity/Relationship models
Modeling constraints
Relational algebra
Transforming E/R models to relational schemas
SQL

Views and triggers
M.P. Johnson, DBMS, Stern/NYU, Sp2004
29
Outline (Continued)






Connecting to a database from a
programming language
Storage and indexing
Transactions
XML
Advanced topics
May change as course progresses

partly in response to audience
M.P. Johnson, DBMS, Stern/NYU, Sp2004
30
Textbook

Database Systems: The Complete Book



Available:




Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer
D. Widom
1st Edition (2001)
NYU bookstore
Amazon/BN (direct links on classpage)
Amazon.co.uk (total is/was about $20 less)
First two chapters in PDF on classpage
M.P. Johnson, DBMS, Stern/NYU, Sp2004
31
SQL Readings


Optional reference: SQL in a Nutshell
Online (free) SQL tutorials include:


A Gentle Introduction to SQL (http://sqlzoo.net/)
SQL for Web Nerds
(http://philip.greenspun.com/sql/)
M.P. Johnson, DBMS, Stern/NYU, Sp2004
32
Grading

Prerequisites: Programming experience


Work & Grading:








presumably C/C++/Java
Homework 30%: O(4)
Project: 30% - see below.
Midterm (closed book/notes): 15%
Final (closed book/notes): 20%
Class participation: 5%
Stern Curve
Class attendance is required
Absences will affect your total grade
M.P. Johnson, DBMS, Stern/NYU, Sp2004
33
The Project: design end-to-end DB app

data model



creation of DB in Oracle


Identify entities (and fields), relationships
Identify resulting relations (tables)
Insertion of real(alistic) data
(web) app for accessing/modifying data


Identification of “interesting” questions to ask
Production of DBMS interface

Work in pairs (start forming now)

Choose topic on your own: previous e.g.s online
Start forming your group today!

M.P. Johnson, DBMS, Stern/NYU, Sp2004
34
Collaboration






Homework and exams done individually
Project done with your team members only
Non-cited use of others’ problem solutions,
code, etc. = plagiarism
See Stern’s stern academic honesty policy
Contact me if you’re at all unclear before a
particular case
Cite any materials used if you’re at all unclear
after a particular case
M.P. Johnson, DBMS, Stern/NYU, Sp2004
35
On-going Feedback


Don’t wait until the end-of-semester course
evals to complain or give feedback on how to
improve course. (It’s too late for you then!)
Come see me early on during my office hours


or send me email with your concerns
“We’re in touch, so you be in touch.”
M.P. Johnson, DBMS, Stern/NYU, Sp2004
36
Summary



DBMS used to maintain, query large
datasets.
Benefits include recovery from system
crashes, concurrent access, data integrity,
security, and quick application development.
Database skills are critical in financial
services, marketing and other business
areas!
M.P. Johnson, DBMS, Stern/NYU, Sp2004
37
So what is this course about, really ?
A bit of everything !
 Languages: SQL, XPath, XQuery
 Data modeling
 Some theory!



Algorithms and data structures (in the third part)




Functional dependencies, normal forms
e.g., how to find most efficient schema for data
e.g., indices make data much faster to find – but how?
Lots of implementation and hacking for the project
Business DBMS examples/cases
Most importantly: how to meet real-world needs
M.P. Johnson, DBMS, Stern/NYU, Sp2004
38
For next time


Get the book
Read chapters 1, 2.1-2.2
M.P. Johnson, DBMS, Stern/NYU, Sp2004
39
For right now: written survey






name
previous cs/is/math/logic courses
previous programming experience
career plans: programmer, DBA, MBA, etc.
why taking class
any religious holidays during class
M.P. Johnson, DBMS, Stern/NYU, Sp2004
40