Introduction to Database Principles http://cbb.sjtu.edu.cn
Download
Report
Transcript Introduction to Database Principles http://cbb.sjtu.edu.cn
2012-2013 (2)
BI203: Introduction to Database Principles
http://cbb.sjtu.edu.cn/course/database
MaoyingWu (吴茂英)
[email protected]
July 18, 2015
Course Syllabus
Terminologies and Concepts
Data (数据)
Database (数据库)
Database Management System (DBMS,数据库管理系统)
Database Administrator (DBA,数据库管理员)
Database System (DBS,数据库系统)
DBMS: 数据库管理系统
hardware
Operating
system
DBMS/compi
ler
Application
developer
Application
DBS: 数据库系统
user
user
Applications
Applications
developer
Database
Administrator
DBMS
Operating
system
Database
user
Textbook and Reference
① Abraham Silberschatz, Henry F. Korth, S. Sudarshan.
Database System Concepts. McGraw-Hill Companies.
② 杨冬青,马秀莉,唐世渭。数据库系统概念。机
械工业出版社
③ Jeffrey D. Ullman, Jennifer Wildom. A First Course in
Database Systems. Dept. of CS, Stanford University
④ 王珊,萨师煊。数据库系统概论。高等教育出版
社
Frequently-used DBMS
Oracle
MySQL
PostgreSQL
Microsoft SQL Server
IBM DB2
Course Outline
Ch1. Introduction
Section 1: Relational database (关系型数据库)
Ch2. Relational model (关系模型)
Ch3. SQL (结构化查询语言)
Ch4. Advanced SQL
Ch5. Other Relational Query Languages
Section 2: Database design (数据库设计)
Ch6. Database Design: Entity-Relationship Model (实体-关系模型)
Ch7. Relational Database Design (关系数据库设计)
Ch8. Application Design (应用程序设计)
Course Outline (cont.)
Section 3: Data storage、query (数据存储与查询)
Ch9. Storage and File Structure (存储与文件结构)
Ch10. Indexing and Hashing (索引与哈希)
Ch11. Query Processing (查询处理)
Ch12. Query Optimization (查询优化)
Section 4: Transaction Management (事务管理)
Ch13. Transaction (事务概念)
Ch14. Concurrency Control (并发控制)
Ch15. Recovery system (系统恢复)
Section 5: Object-Based Databases and XML (面向对象数据库)
Ch16. Object-Based Databases
Ch17. XML
Ch18. Data mining (数据挖掘)
Schedule
Lecture + Discussion
32 hours
Computer Labs
16 hours
Group-based Project
> 10 hours
Final Exam (written)
2 hours
Scoring Policy
Assignments
10%
Computer Lab Reports
20%
Group projects
Final
10%
60%
Evaluation form for reports
0: Not Done
1: Incomplete
2: Late Complete
3: Needs Improvement
4: Complete
5: Well Done
Chapter 1: Introduction
-- Introduction to database principles
MaoyingWu ([email protected])
23-Feb-13
Outlines
Database History
Why database
Database Architecture
Data Models
Conceptual Model
Logic Model
Physical Model
Data Storage and Queries
Transaction Management
Database Users and Administrators
History of Database Systems
1950s and early 1960s
Magnetic tapes for storage
provide only sequential access
Punched cards for input
Late 1960s and 1970s:
Hard disks allow direct access to data
Network and hierarchical data models
Ted Codd: the relational data model
ACM Turing Award for this work
IBM Research begins System R prototype
UC Berkeley begins Ingres prototype
High-performance transaction processing
History (cont.)
1980s:
Relational prototypes evolve into commercial systems
SQL becomes industrial standard
Parallel and distributed database systems
Object-oriented database systems
1990s:
Large decision support and data-mining applications
Large multi-terabyte data warehouses
Emergence of Web commerce
2000s:
XML and Xquery standards
Automated database administration
Why database systems?
Drawbacks of early-days file-based data storing
Data redundancy and inconsistency
Difficulties in accessing the data
Data isolation – multiple files and formats
Integrity problems
integrity constraints become “buried” in program codes rather than being
stated explicitly
hard to add new constraints or change existing ones
Atomicity of updates
Concurrent access by multiple users
Security problems
The solution
The Database system (DBS)
Database Architecture
Architecture levels
Physical level (物理层)
describes how a record (e.g., customer) is stored.
Logical level (逻辑层)
describes data stored in database, and the relationships among the data.
type customer = record
customer_id: string;
customer_name: string;
customer_street: string;
customer_city: integer;
end;
View level (可视层)
application hide details of data types. Views can also hide information (such as an
employee’s salary) for security purposes.
DBMS Components
Data Models (数据模型)
Data model is the abstract of real-world data characteristics,
which include
data definition (数据描述、定义)
data organization (数据组织)
data manipulation (数据操作)
real
world
Information World
Conceptual Model
Computer World
DBMS data model
Data Models: Components
Data structure (数据结构)
the static characteristic (静态特征) of the data
set of object type (描述对象类型的集合)
Data manipulation (数据操作)
dynamic characteristics (动态特征)
Query/modify (insert/delete/update)
Integrity constraints (完整性约束)
entity integrity (实体完整性)
referential integrity (参照完整性)
Data Models (数据模型)
Concept:A collection of tools for describing
Data (数据)
Data relationships (关系)
Data semantics (语义)
Data constraints (约束)
Known Models:
Relational model (关系模型)
Entity-Relationship (E-R) model (数据库设计)
Object-based models (Object-oriented and Object-relational, 面向
对象、对象关系模型)
Semi-structured model (XML)
Older: Network model, hierarchical model
Relational Model (关系模型)
Relation (关系): a 2-dimensional table to represent data (关系)
Attributes (属性): names for the columns of a relation
(title, year, length, genre)
Schema (关系模式): name of a relation + set of attributes
Movies(title, year, length, genre)
Tuple (元组): a row of a relation
(StarWars, 1977, 124, sciFi)
Domain (域): the elementary type associated with each attribute of a relation
Movies(title:string, year:integer, length:integer, genre:string)
Keys of relations (键/码)
Many constraints for relational models
key constraint is one kind of constraint
a set of attributes forms a key for a relation if we do not
allow two tuples in a relation instances to have the same
values in all the attributes of the key
Examples:
Movies(title, year, length, genre)
Database Schema: Movies
Relation terminology
Jargon (专业术语)
Vulgo (俗称)
Relation name (关系名)
Table name (表名)
Schema (关系模式)
Table header (表头/表描述)
Relation (关系)
2-d table (二维表)
Tuple (元组)
Row (行)
Attribute (属性)
Column (列)
Attribute name (属性名)
Column name (列名)
Attribute value (属性值)
Column value (列值)
Domain (域)
Column type (列类型)
In-class Exercise 1
Two relations in a banking database: accounts and
customers
Indicate the following:
acctNo
type
Balance
attributes of each relation
12345
savings
12000
23456
checking
1000
tuples of each relation
34567
savings
25
components of one tuple from each
firstName
lastName
id
account
Robbie
Banks
901-222
12345
Lena
Hand
805-333
23456
Lena
Hand
805-333
34567
relation
relation schema for each relation
database relation schema
a suitable domain for each relation
considering the order, how many ways
to present each of the two relation
instances?
DML (数据操作语言)
Data Manipulation Language
Language for accessing and manipulating the data organized
by appropriate data model
aka, query language (查询语言)
Two classes of languages
procedural (过程性): specifies what data is required and how to get
these data
declarative (声明性): specifies what data is required without
knowing how to get these data
SQL: most widely used query language
DDL (数据定义语言)
Data-defintion Language
Specified notation for defining the database schema
Example:
create table account(
account_no
char(10),
balance integer)
DDL compiler generates a set of tables stored in a data dictionary (数据字典)
Data dictionary contains metadata (元数据)
Database schema
Data storage and definition language
specifies the storage structure and access methods (存储结构和访问方法)
Integrity constraints
domain constraints (类型约束)
referential integrity (参考约束)
assertions (要求约束)
Authorization (权限)
SQL (结构化查询语言)
Structural Query Language
Principal language to describe and manipulate relational
databases
SQL-99 standard
Two aspects:
Data-Definition language (DDL) for declaring database schemas
Data-Manipulation language (DML) for querying and modifying
SQL: 3 kinds of relations
tables (表):relations stored in the database allowing
modification as well queries
views (视图): relations defined by a computation,
constructed when needed
temporary tables (临时表): constructed by SQL
processor when performing queries.
Database Design (数据库设计)
The process of designing the general structure of the database
Conceptual design (概念设计):real world =>conceptual
model
business decision: what attributes is required?
computational decision: what relation schemas? how to
distribute the attributes among various relation schemas?
Logical design (逻辑设计):decision on the database
schema.
Physical design (物理设计): Decision on the physical layout
of the database
E-R Model (实体-关系模型)
The Entity-Relation Model
Models the enterprise as a collection of entities (实体) and
relationships (关系)
Entity: a “thing” or “object” in the enterprise that is
distinguishable from other objects
Relationship: an association among several entities
Object-relational model
对象关系模型
Extend the relational data model by including object orientation
and structs to deal with added data types.
Allow attributes of tuples to have complex types, including nonatomic values such as nested relations.
Preserve relational foundations, in particular the declarative access
to data, while extending modeling power.
Provide upward compatibility with existing relational languages.
XML (可扩展标记语言)
Extensible markup language
WWW Consortium (W3C)
Originally as a document markup language
The ability to specify new tags, and to create nested tag
structures
great way to exchange data
Basis for all new-generation data interchange formats.
A wide variety of tools is available for parsing, browsing and
querying XML documents/data
XML
Extensional Markup Language
Storage management (存储管理)
Storage manager
A module that provides the interface between the low-level data
stored in the database and the application and queries submitted to
the system.
Responsibilities:
Interaction with the file manager
Efficient storing, retrieving and updating of data
Issues:
Storage access
File organization
Indexing and hashing
Query processing: 3 steps
Parsing and translation (解读与翻译)
Optimization (优化)
Evaluation (评价)
Query processing (cont.)
Alternative ways of evaluating a given query
Equivalent expressions
Different algorithms for each operation
Cost difference between a good and a bad way of evaluating a
query can be enormous
Need to estimate the cost of operations
Depends critically on statistical information about relations
which the database must maintain
Need to estimate statistics for intermediate results to compute
cost of complex expressions
Transaction Management
事务管理
Transaction (事务): A collection of operations that
performs a single logical function in a database application
Transaction-management component (事务管理组
件) ensures consistent (correct) state of database in case of
system failures (e.g., power failures and operating system
crashes) and transaction failures.
Concurrency-control manager (并发控制管理器)
controls the interaction among the concurrent transactions,
to ensure the consistency of the database.
Database Architecture
数据库架构
The architecture is greatly influenced by the underlying
computer system
Centralized (集中式)
Client/Server (C/S, 客户端/服务器)
Parallel (multi-processor, 并行)
Distributed (分布式)
Database users (数据库用户)
Users are distinguished by their way to interact with the database
system
Application Programmers (应用程序员) – interact with
system through DML calls
Specialized users (专业用户) - write specialized applications
that do not fit into the traditional data processing framework
Naïve users (普通用户) – invoke one of the permanent
applications that have been written previously
Persons accessing the database over the web, bank tellers, clerical staff
Database Administrator
数据库管理员
Responsible for coordinating all the activities of the database
system
good understanding of the enterprises’ information resources and
needs
Tasks include:
schema definition (模式描述)
storage structure (存储结构) and access method definition (访问方
法描述)
schema and physical organization and modification (物理层面)
authorities grants (权限分配)
specifying integrity constraints (完整性约束)
acting as liaison with users (与其他用户的联系纽带)
monitoring performance and responding to changes in need (性能检
测、应对需求和危机)
Learning strategies
Know how => know why
适当的不求甚解
After-class assignment 1
Install MySQL on your own computer
Linux: from source package or LAMP
Windows: from rebuilt package or XAMP
Learn some basic knowledge on HTML/PHP and B/S
development
Review of Perl/Python/C programming skills