Multi-tier Architectures
&
Distributed Databases
CP3410
Daryle Niedermayer, I.S.P.,
PMP
Topics:
A history of database processing
• Dumb Terminals & Mainframes
• Client-Server
• Multi-tier Configurations
The Need for Reliability
• New Hardware Configurations
E-commerce Considerations
Distributed Systems
A Brief History of Database
Processing
Computers as a tool of modern
business only took off in the late
1950s/early 1960s.
For the first 20 years (~1960-1980)
databases sat on a large mainframe
computer. Users connected directly
to the mainframe using “dumb
terminals.”
What are Dumb Terminals?
They are a monitor and a keyboard
and a network connection
• There is no hard-drive, no CPU
• They can’t do work on their own
• They know enough to connect to the
mainframe
• Data entered by a user is sent to the
mainframe for processing
• The mainframe sends the results back
to the terminal to draw on the screen
They are a way for users to work on
the mainframe while sitting in their
own offices
All processing was done by the
mainframe. The terminal was just an
input/output device.
What were Dumb Terminals like?
Pros:
• Very fast (for its day)
• Easy
• Good enough for the amount of data
required (which wasn’t much)
Cons:
• Reports were simple; not well formatted
• Everyone got to watch a black screen
with green printing all day.
Client-Server Architectures
(aka 2-Tier Architectures)
1980-present
With the introduction of smart
workstations and PCs, processing
could be shared between the
mainframe and the local machine:
• Early workstations included Sun workstations and DEC PDP-11 minicomputers
• Eventually, IBM-compatible PCs with 386, 486, and Pentium processors
Server Roles
Store the data
Organize, index and manipulate data
Manage contention and data
concurrency
Receive and process queries and
other operations
Client Roles
Decide what operation to ask the
server to perform
Display and format data
Some notes on Client-Server
The client has some “smarts” (unlike
the dumb terminals):
• Using software it decides what data it
needs from the server.
• It asks for that data and receives the
results.
• It formats or uses the results for further
processing.
Examples: MySQL Query Browser, MS-SQL client tools
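As a rough illustration of this division of labour, here is a minimal Python sketch of the client side, assuming a reachable MySQL server and the third-party pymysql driver; the host, credentials, and table names are invented:

```python
# A minimal sketch of the client side of a 2-tier exchange, assuming a
# reachable MySQL server and the third-party "pymysql" driver. Host,
# credentials, and table names are invented for illustration.
import pymysql

# The client decides what data it needs from the server...
conn = pymysql.connect(host="db.example.com", user="appuser",
                       password="secret", database="sales")
try:
    with conn.cursor() as cur:
        # ...asks only for that data (the query runs on the server)...
        cur.execute("SELECT id, name FROM customers WHERE region = %s",
                    ("Doha",))
        # ...and only the matching rows travel over the network.
        for row in cur.fetchall():
            print(row)  # the client formats/uses the results locally
finally:
    conn.close()
```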
Pros:
• Shares the processing between the server and
client
• Both sides can play to their strengths
• Only the data that is needed goes over the
network
Cons:
• Requires more expensive hardware at the client
end
• Software can be more expensive (a copy for
every workstation)
A note on MS Access
MS Access can look like a client-server system, but it usually isn’t.
• Most of the time, the database sits on a
fileserver, not a database server. This
means that the entire file must be
downloaded to the local machine before
Access can use any of it. This is not
how a client-server system behaves!
However, it is possible…
MS Access can be used as a client
front-end with a full Database Server
handling the server side.
• MS-SQL or MySQL can serve as the
“back-end” server.
• MS Access on the client then connects
to the server using an ODBC
connection.
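A minimal sketch of that arrangement in Python, assuming the third-party pyodbc module and a configured ODBC data source; the DSN name "SalesDB" and the credentials are hypothetical:

```python
# A minimal sketch of the same pattern over ODBC, the way an Access
# front-end would connect to a back-end server. Assumes the third-party
# "pyodbc" module and a configured DSN; "SalesDB" is hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=SalesDB;UID=appuser;PWD=secret")
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM orders")  # runs on the back-end server
print(cur.fetchone()[0])                    # only the result crosses the wire
conn.close()
```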
Multi-Tier Architectures
(aka n-Tier Architectures)
1990-Present
Came with the birth of the Internet
and TCP/IP
• TCP/IP gives us a way for machines to
communicate regardless of what
application they are using
N-Tier simply means more than two tiers
Internet Applications
Internet Applications are almost
always N-Tier
• Need to be very scalable (quickly grow
capacity)
• Need to have high-availability (it’s
always business hours somewhere
around the world)
• Need to have strong security
The Need for Reliability
In the previous slide, there were
multiple Application Servers
This allows for:
• The system to respond to huge
differences in traffic volumes
• The system to still be available even if
one server crashes
More servers can be added to meet
demand
Other Redundancies
Although the diagram does not show
it, additional firewalls and Proxy
Servers can be added for
redundancy as well.
High Availability Configurations
Hardware can be configured to have
“High Availability”
• HA means that the hardware itself will
recover from a system problem without
having to wait for human intervention.
• Recovery typically takes under 15
seconds.
High Availability Appliances
Firewall Appliances and Proxy
Servers usually have static
configurations:
• Their content and configurations do not
change often;
• Their content and configuration only
change as a result of operator input
HA Firewalls
Both firewalls are powered on
with identical configurations
A “heartbeat” signal is shared
between them every few
seconds
If the Standby Firewall does
not get a heartbeat when
expected, it takes over the IP
address and traffic of the
Active Firewall until an
operator fixes the problem
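A toy Python sketch of the standby unit’s side of this scheme; real appliances use dedicated failover protocols such as VRRP, and the port number and take_over_ip() helper below are invented:

```python
# A toy sketch of the standby unit's heartbeat logic. Real HA appliances
# use dedicated failover protocols (e.g. VRRP); the UDP port and the
# take_over_ip() helper here are invented for illustration.
import socket

HEARTBEAT_PORT = 9999   # assumed port for heartbeat datagrams
TIMEOUT_SECONDS = 5     # no heartbeat for this long -> assume failure

def take_over_ip():
    # Placeholder: a real device would claim the active unit's IP
    # address (e.g. via gratuitous ARP) and start passing its traffic.
    print("Standby taking over the active firewall's IP address")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", HEARTBEAT_PORT))
sock.settimeout(TIMEOUT_SECONDS)

while True:
    try:
        sock.recv(64)          # heartbeat arrived: active unit is alive
    except socket.timeout:     # no heartbeat when expected
        take_over_ip()         # take over until an operator intervenes
        break
```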
HA Databases
HA Databases are much more
difficult:
• How do you take over the data when it
changes all the time?
• How do you take over in the middle of a
transaction?
• How do you take over the data if it is on
a hard drive inside a disabled server?
SAN to the Rescue
Storage Area Networks (SANs) store data
outside of a server.
They are huge racks of disk drives that
are connected to a SAN controller.
The SAN controller along with its switches
is known as “the Fabric” (you’ll see why).
The SAN controller itself is also mirrored in
a HA configuration.
Together with its
servers, a SAN is
more of a “Fabric”
than a network.
Any failure is
immediately
recoverable
through other
connections
Capacity
A SAN can hold terabytes (1,000 GB) or
even petabytes (1,000,000 GB) of data for
dozens or even hundreds of servers at the
same time.
SAN disks are usually configured in a RAID
array so that disks are mirrored. This way,
if one disk fails, the data is still on at least
one other disk.
Connections are usually fibre optics rather
than copper wires to ensure high
bandwidth and transmission speeds.
Back to HA Databases…
If we put our data on a SAN rather
than on a hard drive inside the DB
server, we can still access the data
even if the DB server itself fails.
A standby server then just takes
over the Fabric connections of the
failed server, as well as its IP address
and network connections.
HA Clusters
Because we’re not failing over
everything (since the data is on the
SAN), the DB servers only need
enough disk space to boot
themselves up.
We call this configuration a “Cluster”
and each physical server is a “Node”
in the “Cluster”
Other Advantages of Clusters
Multiple Database servers can
provide load balancing for each other
• We can even have 3 or more nodes with
2 or more active and the last one
serving as a spare for any of the others
By manually switching in the
Standby server, the Active Server
can be upgraded without taking a
system outage
E-Commerce Considerations
In planning an E-commerce system,
you need to consider the following:
• If your customers are all over the
world, you can never unplug your
system for maintenance without losing
customers.
• You need to manage transactional
integrity across multiple Application
Servers.
• Transactions need to be managed across
multiple web pages:
During the first dozen pages, the user puts
together their shopping cart
Then the user goes to “check-out.” This
involves a few more pages as they input
their identity, their shipping information,
and their payment details.
• What if they abandon the transaction?
When do you rollback? (See the check-out
sketch after this list.)
• How do you protect customers’ data?
What personal information do you store
about your customers in your database?
Do you store this information “in the clear”
(plaintext) or encrypted so that no one else
can make use of it if your system is
cracked?
How do you protect your customers’
information from your own employees? (See
the encryption sketch after this list.)
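One common answer to the rollback question, sketched in Python with the third-party pymysql driver: keep the cart in session state so no database transaction stays open while the user browses, and wrap the entire check-out in a single transaction. Table and column names are invented:

```python
# No transaction is held open while the user assembles the cart; the
# whole check-out commits or rolls back as one unit. Names are invented.
import pymysql

def place_order(conn, customer_id, cart_items):
    """cart_items comes from session state: a list of (item_id, qty)."""
    try:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO orders (customer_id) VALUES (%s)",
                        (customer_id,))
            order_id = cur.lastrowid
            for item_id, qty in cart_items:
                cur.execute("INSERT INTO order_lines (order_id, item_id, qty)"
                            " VALUES (%s, %s, %s)", (order_id, item_id, qty))
        conn.commit()        # the multi-page check-out becomes durable at once
    except Exception:
        conn.rollback()      # an abandoned or failed check-out leaves no trace
        raise
```

And a sketch of one answer to the encryption question, assuming the third-party cryptography package; key management, that is, keeping the key away from the database and from most employees, is the hard part and is not shown:

```python
# Field-level encryption at rest using the "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, load from a secrets store
f = Fernet(key)

card_number = "4111111111111111"
stored = f.encrypt(card_number.encode())   # what goes in the database
print(stored)                              # useless without the key
print(f.decrypt(stored).decode())          # recoverable by the application
```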
Distributed Systems
Imagine a Database
Cluster that spans the
globe:
• One node is in London
• One node is in Tokyo
• One node is in Toronto
• One node is in Doha
This is a Distributed
Database
Management System
or DDBMS
Why DDBMS?
Communications used to be
expensive
• Rather than have 1000 employees all
over the world connect over a 56K
modem to a DBMS in London, we would
pay for high speed connections between
each DBMS node and then have users
connect to their local node (at cheaper
rates)
For Example:
A modem call over a telephone line from
Doha to a non-GCC country costs about
$0.90/minute.
If there are 100 users in Doha, this would
cost $90/minute or $5400/hour for these
users to connect.
It may be cheaper to put a database
server in Doha and then synchronize the
data over a high-speed line.
Why This Doesn’t Work…
Outside of Qatar, international
telephone line charges are now
about $0.02/minute. For 100 users,
this works out to $120/hour, which is
certainly affordable if users need
dial-up.
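The arithmetic behind this slide and the previous one, taking the quoted per-minute rates at face value:

```python
# Connection-cost comparison using the quoted rates.
users = 100
minutes_per_hour = 60
old_rate = 0.90   # USD/minute per user: Doha to non-GCC, then
new_rate = 0.02   # USD/minute per user: typical international rate now

print(users * old_rate * minutes_per_hour)  # 5400.0 USD/hour
print(users * new_rate * minutes_per_hour)  # 120.0 USD/hour
```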
Why This Doesn’t Work (2)
As well, High Speed Internet costs
have also dropped:
• 512 Kbps (effectively about 380 Kbps)
costs about $60USD/month in Qatar.
• In Canada, 6,500 Kbps costs about
$45USD.
So, it’s not a problem for everyone to
connect to the database in London.
Why This Doesn’t Work (3)
DDBMS also have a great deal of
difficulty with:
• Synchronizing data: How do you
manage concurrency across thousands
of miles and different networks and
telephone companies? (You thought
database locking on a local machine was
hard)
• Networking:
DDBMS require very high speed networks.
There is a lot of data to be synchronized
constantly
DDBMS need very fault tolerant networks.
Network paths between nodes need to be
redundant and reliable
These networks are very, very expensive
• Security: How do you make sure the
data is being transmitted between
nodes securely?
• Increased Storage: You are copying
data in every location. This requires
duplicate hardware (SANs are not
cheap) and a lot of extra disk space.
• Increased demand for very specialized
expertise. The knowledge of how to look
after a DDBMS is not easy to come by.
These people are in demand.
Where a DDBMS Makes Sense
When you can copy the same metadata
across all systems but the actual data is
geographically specific.
• E.g., the customer and employee data for Qatar
is stored in Doha and nowhere else; the
customer and employee data for Europe is
stored in London and nowhere else. If an
employee transfers from London, his record is
physically moved from the London database
server to the Doha database server.
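A minimal Python sketch of the routing this implies: the same schema everywhere, but each region’s rows live on exactly one node. The host names and region mapping are invented:

```python
# Geographic partitioning: route each query to the one node that
# physically holds that region's data. Names are illustrative only.
NODE_FOR_REGION = {
    "Qatar":  "db.doha.example.com",
    "Europe": "db.london.example.com",
}

def node_for(region):
    """Return the one node that physically holds this region's data."""
    return NODE_FOR_REGION[region]

print(node_for("Qatar"))   # db.doha.example.com
print(node_for("Europe"))  # db.london.example.com
```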
Other uses of DDBMS
Disaster Recovery planning for some
HA Financial Systems as well as
public health and safety systems:
• Credit Card authorizations (Visa,
MasterCard)
• Banking Systems (ATMs)
• Public Utilities (999 service and
Telephone companies)
• Air Traffic Control Systems
Assessment of DDBMS
There are very few reasons to have a
DDBMS
• They are expensive to set up and run
• They have problems in managing data
synchronization (making sure that all
the data is up to date in all nodes)
• There are usually better, cheaper
options to share the data across a large
geographic area.
Acknowledgements
SAN photograph:
www.nasi.com/images/IBM_SAN256M.jpg
SAN Configuration:
http://www.microsoft.com/library/media/1033/technet/images/itsolutions/wssra/raguide/storagedevices/igsdpg03_big.gif