Transcript Servlets

CS6320 – Performance
L. Grewe
Number of requests a website receives is unpredictable
CNN, NY Times, ABC News unavailable from 9-10 AM (Eastern Time)
[Figure: CNN.com page views/day (in millions), usual load vs. 9/11]
Content providers' dilemma: how many resources to provision?
Need on-demand scalability
Content Delivery Network (CDN) Solution
[Figure: CNN.com page views/day (in millions) on a normal day, on 12-Sep-01, and on Election day (Nov 2, 2004)]
On 12 Sep 01, the page was 1.2k instead of 50k
CNN.com used Akamai on Election day
Source: http://www.tcsa.org/lisa2001/cnn.txt
http://www.akamai.com/en/html/about/press/press479.html
Typical Web-Site Architecture
[Diagram: users send a request to the home server, where the Web server and App server execute code and access the DB; the response returns to the users]
CDN Architecture
[Diagram: users reach CDN nodes in the Internet core, which deliver content on behalf of content providers]
CDNs excel at delivering static content.
Advantages of CDNs
– Large infrastructure handles load spikes
– Clients charged on a per-usage basis
  • no need to guess what resources to provision
– Moves data closer to end-users
  • decreases latency and increases throughput
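CDN Application Services
CDNs can also run applications
[Diagram: users reach applications running on CDN nodes across the Internet; the applications access a central DB]
but for data-intensive dynamic applications…
the database server becomes the bottleneck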
Methods to scale the database component
– In-house database scalability: [DBCache, DBProxy, MTCache, NEC Cache Portal]
  • Must provision for peak load
– Database outsourcing: Database as a service [Hacigumus+ ICDE ’02, SIGMOD ’02]
  • Have to cede control of data
– Database Scalability Service (DBSS): Shared infrastructure that caches applications’ data [INRIA/LIP6, CIDR ’05, SIGMOD ’06, ICDE ’07]
S3 Database Scalability Service
– CDN-like proxy nodes cache results of database queries
  • reduces load on central database servers
– All database updates sent to central server
  • clients don’t cede ownership of their data
– Uses publish/subscribe system to maintain data consistency
  • avoids additional load at the central server
– Content provider may encrypt data
Database Scalability Service
[Diagrams: three deployments in which the DBSS sits between its clients and the home server databases across the Internet: (1) users reach a Content Delivery Network that uses the DBSS; (2) users reach the organization's Web and application servers, which use the DBSS; (3) client apps use the DBSS directly]
Outline
– Need for on-demand scalability
– S3 invalidation mechanism
– Security-scalability tradeoff
– Reducing latency

Addressing consistency
– TTL is wasteful:
  • Often refreshes cached data unnecessarily (workloads dominated by reads)
  • Must set TTL=0 for strong consistency!
– Solution: update or invalidate cached data only when affected by updates
  • Naïve approach: home organizations notify proxy servers of relevant updates → not scalable
Our approach: a fully-distributed, proxy-to-proxy update notification mechanism
Distributed Consistency Mechanism
[Diagram: users send an update to a proxy node; the proxy node publishes an update notification into the multicast environment, which delivers it to the other proxy nodes]
• Distributed app-level multicast environment, e.g., Scribe
• Forward all updates to backend home servers
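The mechanism above can be sketched in a few lines of Python. This is a minimal in-process model under stated assumptions, not the S3 code: `MulticastBus` is a hypothetical stand-in for an app-level multicast layer such as Scribe, the home server is a plain dict, and using one channel per table is an assumption made only to keep the example small (the channel choices are discussed on the following slides).

```python
from collections import defaultdict


class MulticastBus:
    """Hypothetical in-process stand-in for an app-level multicast layer like Scribe."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # channel name -> set of proxy nodes

    def subscribe(self, channel, node):
        self.subscribers[channel].add(node)

    def publish(self, channel, notification):
        # Copy the set so handlers may change subscriptions while we iterate
        for node in list(self.subscribers[channel]):
            node.on_notification(channel, notification)


class ProxyNode:
    def __init__(self, name, bus, home_db):
        self.name, self.bus, self.home_db = name, bus, home_db
        self.cache = {}  # (table, key) -> cached value

    def query(self, table, key):
        cache_key = (table, key)
        if cache_key not in self.cache:
            # Cache miss: fetch from the home server, then listen for updates.
            # Assumption: one channel per table, named after the table.
            self.cache[cache_key] = self.home_db[table].get(key)
            self.bus.subscribe(table, self)
        return self.cache[cache_key]

    def update(self, table, key, value):
        # Every update is forwarded to the home server, which owns the data
        self.home_db[table][key] = value
        # The proxy, not the home server, publishes the notification
        self.bus.publish(table, (table, key))

    def on_notification(self, channel, notification):
        # Drop only the cache entry the update touched
        self.cache.pop(notification, None)
```

A design note on this sketch: the home server only applies the write; fan-out to other proxies happens on the multicast layer, which is what keeps the extra load off the central server.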
Configuring Multicast Channels
– Key observation: Web applications typically interact with the DB via a small, fixed set of query/update templates (usually 10-100)
Example:
SELECT qty FROM inv WHERE id = ?
UPDATE inv SET qty = ? WHERE id = ?
– Templates: a natural way to configure channels
– Options: Channel-by-query or Channel-by-update
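Because the templates are fixed, whether an update template can affect a query template can be decided once, ahead of time. The sketch below is a deliberately conservative, simplified rule of our own (not the S3 analysis): an INSERT or DELETE affects every query on the same table, and an UPDATE affects a query only if it writes a column the query reads (selected or in the WHERE clause). The `name` column in the usage is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTemplate:
    table: str
    columns_read: frozenset  # selected columns plus WHERE-clause columns


@dataclass(frozen=True)
class UpdateTemplate:
    table: str
    kind: str  # "INSERT", "DELETE" or "UPDATE"
    columns_written: frozenset = frozenset()


def may_affect(u, q):
    """Conservative template-level check: can some instance of u change a result of q?"""
    if u.table != q.table:
        return False
    if u.kind in ("INSERT", "DELETE"):
        return True  # adding or removing rows can change any query on the table
    return bool(u.columns_written & q.columns_read)


# SELECT qty FROM inv WHERE id = ?  vs.  UPDATE inv SET qty = ? WHERE id = ?
q_qty = QueryTemplate("inv", frozenset({"qty", "id"}))
u_qty = UpdateTemplate("inv", "UPDATE", frozenset({"qty"}))
print(may_affect(u_qty, q_qty))
```

The `print` call outputs `True`, since the update writes `qty`, which the query selects.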
Channel-by-Query Option
– One channel per query template Q: C(Q)
– Begin caching result(s) of query template Q → subscribe to C(Q)
– Evict the last cached result for Q → unsubscribe from C(Q)
– Issue update → determine which query templates Q1, …, Qn are affected; send a notification on each C(Qi)
– Few subscriptions/cached result
– Many invalidation notifications/update
– Conflicts determined lazily (upon update)
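A minimal sketch of channel-by-query, assuming the template conflict analysis has already been run offline. The `AFFECTS` table and the template names `Q1`, `U1`, etc. are hypothetical. Caching subscribes once to C(Q); the updater does the conflict lookup and publishes on every affected C(Qi), which is why each update can cost several notifications.

```python
from collections import defaultdict

# Hypothetical offline analysis result: update template -> query templates it may affect
AFFECTS = {"U1": {"Q1", "Q2"}, "U2": {"Q3"}}


class Bus:
    def __init__(self):
        self.subs = defaultdict(set)  # channel -> subscribed proxies
        self.sent = 0  # number of notifications published

    def publish(self, channel, msg):
        self.sent += 1
        for proxy in list(self.subs[channel]):
            proxy.invalidate(msg)


class Proxy:
    def __init__(self, bus):
        self.bus, self.cache = bus, {}

    def cache_result(self, q, params, result):
        self.cache[(q, params)] = result
        self.bus.subs[("C", q)].add(self)  # one subscription per query template: C(Q)

    def invalidate(self, q):
        # Evict every cached instance of template q, then leave C(Q)
        for key in [k for k in self.cache if k[0] == q]:
            del self.cache[key]
        self.bus.subs[("C", q)].discard(self)


def issue_update(bus, u):
    # Conflicts resolved lazily, at update time: one notification per affected Q
    for q in sorted(AFFECTS.get(u, ())):
        bus.publish(("C", q), q)
```

Here an update via `U1` publishes on both C(Q1) and C(Q2), even if no proxy currently caches `Q2`.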
Channel-by-Update Option
– One channel per update template U: C(U)
– Begin caching result(s) of query template Q → determine which update templates U1, …, Un apply; subscribe to each C(Ui)
– Evict the last cached result for Q → unsubscribe from all C(Ui) above
– Issue update using template U → send a notification on C(U)
– Many subscriptions/cached result
– Few invalidation notifications/update
– Conflicts determined eagerly (when caching)
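The mirror-image sketch for channel-by-update, again with a hypothetical precomputed `AFFECTED_BY` table. Here the proxy resolves conflicts when it caches a result, subscribing to every C(Ui) that can touch it, and the updater sends exactly one notification. Unsubscription on eviction is omitted to keep the example short.

```python
from collections import defaultdict

# Hypothetical offline analysis result: query template -> update templates that may affect it
AFFECTED_BY = {"Q1": {"U1"}, "Q2": {"U1", "U3"}, "Q3": {"U2"}}


class Bus:
    def __init__(self):
        self.subs = defaultdict(set)  # channel -> subscribed proxies
        self.sent = 0  # number of notifications published

    def publish(self, channel, msg):
        self.sent += 1
        for proxy in list(self.subs[channel]):
            proxy.on_update(msg)


class Proxy:
    def __init__(self, bus):
        self.bus, self.cache = bus, {}
        self.subscriptions = 0

    def cache_result(self, q, params, result):
        self.cache[(q, params)] = result
        # Conflicts resolved eagerly, at caching time: join every C(U) that may affect q
        for u in AFFECTED_BY.get(q, ()):
            if self not in self.bus.subs[("C", u)]:
                self.bus.subs[("C", u)].add(self)
                self.subscriptions += 1

    def on_update(self, u):
        # Evict cached results of every query template that u may affect
        for key in [k for k in self.cache if u in AFFECTED_BY.get(k[0], ())]:
            del self.cache[key]


def issue_update(bus, u):
    bus.publish(("C", u), u)  # exactly one notification per update
```

Caching one `Q2` result costs two subscriptions (C(U1) and C(U3)), while an update via `U1` costs a single publish.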
Parameter-Specific Channels
– Optimization: consider parameter bindings supplied at runtime … for example:
Q5: SELECT qty FROM inv WHERE id = ?
  • When issued with id = 29, create an extra parameter-specific channel C(5, 29)
  • Subscribe to both C(5) and C(5, 29)
– Upon update:
  • If the update affects a single item with id = X, send the notification on channel C(5, X)
    (saves work if X ≠ 29)
  • Updates affecting multiple items are sent to C(5)
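The routing rule above can be sketched as follows. Channel names are modeled as tuples, `(5,)` for C(5) and `(5, 29)` for C(5, 29), and proxies are plain strings; both are assumptions of this illustration. `publish` returns the set of proxies that would receive the notification, so the savings are easy to see.

```python
from collections import defaultdict


class Bus:
    def __init__(self):
        self.subs = defaultdict(set)  # channel tuple -> subscribed proxies

    def publish(self, channel):
        """Return the proxies that would receive a notification on this channel."""
        return set(self.subs[channel])


def subscribe_query(bus, proxy, template_id, param):
    # Join both the template channel C(Q) and the parameter channel C(Q, param)
    bus.subs[(template_id,)].add(proxy)
    bus.subs[(template_id, param)].add(proxy)


def notify_update(bus, template_id, affected_ids):
    if len(affected_ids) == 1:
        # Single-item update: only proxies caching that id hear about it
        (x,) = affected_ids
        return bus.publish((template_id, x))
    # Multi-item update: fall back to the template-wide channel
    return bus.publish((template_id,))


bus = Bus()
subscribe_query(bus, "proxyA", 5, 29)
subscribe_query(bus, "proxyB", 5, 30)
print(notify_update(bus, 5, {29}))
```

The final line prints `{'proxyA'}`: the single-item update for id = 29 reaches only the proxy caching that id, and proxyB is spared.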

S3 Prototype
– Tomcat as proxy web server/servlet container
– Proxy database cache written in Java
  • Queries: access cached data when possible
    • Cache JDBC query results (i.e., materialized views)
    • Index results by JDBC query representation
  • Updates: sent to back-end database
– MySQL4 as back-end database
– Invalidation notifications delivered via Scribe
– Experiments on Emulab (Utah) – Thanks!
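A rough Python analogue of the proxy cache described above: results are cached and indexed by the query representation (template text plus bound parameters), and updates go straight to the back end. This is only a sketch. `sqlite3` stands in for the MySQL back end and for JDBC, and the `invalidate` predicate is a hypothetical placeholder for the multicast notifications.

```python
import sqlite3


class QueryResultCache:
    """Caches query results keyed by (template, parameters)."""

    def __init__(self, backend):
        self.backend = backend  # a sqlite3 connection standing in for MySQL
        self.results = {}
        self.misses = 0

    def query(self, template, params=()):
        key = (template, tuple(params))
        if key not in self.results:
            self.misses += 1
            self.results[key] = self.backend.execute(template, params).fetchall()
        return self.results[key]

    def update(self, statement, params=(), invalidate=lambda key: True):
        # Updates always go to the back-end database
        self.backend.execute(statement, params)
        self.backend.commit()
        # Stand-in for a notification: drop entries the predicate flags
        # (the default predicate conservatively drops everything)
        for key in [k for k in self.results if invalidate(k)]:
            del self.results[key]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inv (id INTEGER PRIMARY KEY, qty INTEGER)")
db.execute("INSERT INTO inv VALUES (29, 5)")
cache = QueryResultCache(db)
print(cache.query("SELECT qty FROM inv WHERE id = ?", (29,)))
```

The last line prints `[(5,)]`. A repeated identical query is served from the dictionary without touching the database.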
Benchmark Applications
– Bookstore (TPC-W, from UW-Madison)
  • Online bookseller, a standard web benchmark
  • Changed the popularity of books
– Auction (RUBiS, from Rice)
  • Modeled after eBay
– Bulletin board (RUBBoS, from Rice)
  • Modeled after Slashdot
Benchmarks model popular websites
Impact of Cooperative Caching
[Figure: throughput (WIPS) of NoProxy, NoCache, SimpleCache, and Ferdinand on the bookstore browsing mix, bookstore shopping mix, and auction benchmarks]
Selective: cache queries only if subscribed to parameter-dependent groups
Outline
– Need for on-demand scalability
– S3 invalidation mechanism
– Security-scalability tradeoff
– Reducing latency

Guaranteeing security in a DBSS setting
Limit the ability to observe an application’s data by:
– the DBSS administrator
– an unauthorized application through the DBSS
Security-scalability tradeoff in the DBSS setting
Analyzing the code helps in managing this tradeoff
A simple solution for guaranteeing security
– Outsource database scalability
  • Home server: master copies of all data; handles updates directly
– No query execution on the DBSS
  • DBSS caches query results (read-only), kept consistent by invalidation
All data passing through the DBSS can be encrypted: queries, updates, and query results
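A sketch of what the DBSS can and cannot do when everything passing through it is encrypted. Assumptions: the application derives an opaque lookup token for each query with a keyed hash (HMAC-SHA256 here, as a stand-in for deterministic encryption of the query text), and results arrive already encrypted by the application (that step is not shown). Because the DBSS cannot read an update either, its only safe reaction is to drop every cached result.

```python
import hashlib
import hmac


class BlindDBSS:
    """Caches opaque results under opaque query tokens; can read neither."""

    def __init__(self):
        self.cache = {}  # opaque query token -> opaque (encrypted) result

    def lookup(self, token):
        return self.cache.get(token)

    def store(self, token, encrypted_result):
        self.cache[token] = encrypted_result

    def on_update(self, encrypted_update):
        # The update is unreadable, so any cached result might be stale
        self.cache.clear()


def query_token(key, sql, params):
    # Application side: identical queries map to identical tokens,
    # without revealing the query text to the DBSS
    msg = (sql + "\x00" + repr(params)).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()
```

Matching identical queries still works, so repeated reads hit the cache, but every update empties it: maximal security, minimal scalability.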
A Simple Example
Table toys (toy_id, toy_name); the home server database holds (11, Barbie) and (15, GI Joe)
Q1: SELECT toy_id FROM toys WHERE toy_name="GI Joe"
U1: DELETE FROM toys WHERE toy_id=5
– Nothing is encrypted: the DBSS caches Q1's result (toy_id=15); when U1 arrives, it can see that toy 5 is not in the result → no invalidations
– Results are encrypted: the DBSS cannot read Q1's cached result, so when U1 arrives it must invalidate it → Q1's cache entry becomes empty
More encryption leads to more invalidations
Challenge: providing scalability while guaranteeing security
– When updates occur, the DBSS needs to invalidate cached results
– The application faces a dilemma in what data to encrypt (secure):
  • More encryption → conservative invalidation → security
  • Less encryption → precise invalidation → scalability
Security-scalability tradeoff
Opportunity for managing the tradeoff
Not all data is equally sensitive:
– Completely insensitive (e.g., bestsellers list): don’t care
– Moderately sensitive (e.g., inventory records, customer records): care, but worried about scalability impact
– Extremely sensitive (e.g., credit card information): secure at all costs
But for most data, it is nontrivial to assess:
1. Data sensitivity
2. Scalability impact of securing the data
Key Insight: arbitrary queries and updates are not possible
function get_toy_id ($toy_name) {
  $template := "SELECT toy_id FROM toys WHERE toy_name=?";
  $query := attach_to_template ($template, $toy_name);
  execute ($query);
  …
}
Given the templates, we can statically identify data not needed for precise invalidation
Data not useful for invalidation: examples
Example 1:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT toy_name FROM toys WHERE toy_id=?
No data is needed for precise
invalidation
Example 2:
Q1: SELECT toy_id FROM toys WHERE toy_name=?
U1: DELETE FROM toys WHERE toy_id=?
Query parameters are not needed for precise
invalidation (the query result is needed though)
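The two examples can be reproduced with a small static check over the templates. This is a simplified rule set of our own, not the analysis from the SIGMOD ’06 work: it handles only single-equality templates and DELETE-style updates. Parameters help when an update filters on the same column as the query; the result helps when the update's filter column appears in the query's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    table: str
    select_cols: frozenset
    where_col: str  # single equality predicate: where_col = ?


@dataclass(frozen=True)
class Update:
    table: str
    where_col: str  # e.g. DELETE FROM table WHERE where_col = ?


def needed_for_invalidation(q, updates):
    """Which parts of q's instances help precise invalidation (simplified rules)."""
    needs = set()
    for u in updates:
        if u.table != q.table:
            continue  # updates to other tables never affect q
        if u.where_col == q.where_col:
            needs.add("parameters")  # compare q's parameter with the update's
        if u.where_col in q.select_cols:
            needs.add("result")  # look for the update's key in q's result
    return needs


Q1 = Query("toys", frozenset({"toy_id"}), "toy_name")
U1 = Update("toys", "toy_id")
print(needed_for_invalidation(Q1, [U1]))
```

This prints `{'result'}`, matching Example 2: Q1's parameter (a toy name) can be encrypted for free, but its result is needed. With no updates (Example 1), the function returns an empty set for both Q1 and Q2.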
Security without hurting scalability
– Data not needed for invalidation can be secured “for free” (without hurting scalability)
– Security Conscious Scalability Approach [SIGMOD ’06]
– As a result, the tradeoff has to be managed only over the remaining data
Sample experiment: methodology
• Scalability: max # concurrent users with acceptable response times
• Security: # templates with encrypted results
[Setup: users ↔ CDN and DBSS: 5 ms; CDN and DBSS ↔ home server: 100 ms]
– California Privacy Law determined sensitive data
– Non-transactional invalidation
– Start with a cold cache
Security-Scalability Tradeoff
Q1: SELECT toy_id FROM toys WHERE toy_name=?
Q2: SELECT qty FROM toys WHERE toy_id=?
Q3: SELECT cust_name FROM customers WHERE cust_id=?
U1: DELETE FROM toys WHERE toy_id=5

Exposure level | Template | Parameters | Query result | Invalidations caused by U1
Blind          | x        | x          | x            | All Q1, Q2, Q3
Template       |          | x          | x            | All Q1, Q2
Statement      |          |            | x            | All Q1; Q2 with toy_id=5
View           |          |            |              | Q1 with toy_id=5; Q2 with toy_id=5

x denotes encrypted; blank denotes visible. Security increases toward Blind; scalability increases toward View.
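The table's invalidation column can be derived mechanically from what the DBSS is allowed to see. The sketch below encodes that reasoning for this example only. Assumptions: each template has one equality predicate, the update is a single-key DELETE, results are lists of dicts, and the cached instance (Q1, "Robot") with toy_id 5 is hypothetical data added so the View level has something to invalidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    table: str
    where_col: str
    select_cols: tuple


@dataclass(frozen=True)
class Delete:
    table: str
    where_col: str
    value: object


def must_invalidate(level, template, param, result, upd):
    """Levels, least to most exposed: blind, template, statement, view."""
    if level == "blind":
        return True  # nothing visible: invalidate everything
    if template.table != upd.table:
        return False  # template visible: other tables are unaffected
    if level == "template":
        return True
    if template.where_col == upd.where_col:
        return param == upd.value  # parameters visible: exact match
    if level == "statement":
        return True
    if upd.where_col in template.select_cols:
        # Result visible: invalidate only if the deleted key appears in it
        return any(row[upd.where_col] == upd.value for row in result)
    return True  # nothing visible helps: stay conservative


Q1 = Template("Q1", "toys", "toy_name", ("toy_id",))
Q2 = Template("Q2", "toys", "toy_id", ("qty",))
Q3 = Template("Q3", "customers", "cust_id", ("cust_name",))
U1 = Delete("toys", "toy_id", 5)
cached = [
    (Q1, "GI Joe", [{"toy_id": 15}]),
    (Q1, "Robot", [{"toy_id": 5}]),  # hypothetical row
    (Q2, 5, [{"qty": 3}]),
    (Q2, 15, [{"qty": 7}]),
    (Q3, 1, [{"cust_name": "Ann"}]),
]


def invalidated(level):
    return [(t.name, p) for t, p, r in cached if must_invalidate(level, t, p, r, U1)]
```

Calling `invalidated` with each level reproduces the table's rows, from all five entries under Blind down to just `("Q1", "Robot")` and `("Q2", 5)` under View.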
Magnitude of Security-Scalability tradeoff
[Figure: scalability (number of concurrent users supported) under the View, Statement, Template, and Blind exposure levels for the Auction, Bboard, and Bookstore benchmark applications]
Security Results
[Figure: for each benchmark application (Auction, Bboard, Bookstore), the number of query templates whose parameters and result, only result, or nothing can be encrypted “for free”]
Security Results in Detail
– Auction: the historical record of user bids was not exposed
– Bboard: the ratings users give one another based on the quality of their postings
– Bookstore: book purchase association rules discovered by the vendor (customers who purchase book A also purchase book B)
Scalability Conscious Security Approach (SCSA) to managing the tradeoff
[Figure: scalability (number of concurrent users supported) vs. security (number of query templates with encrypted results) for three configurations: nothing encrypted, everything encrypted, and SCSA]
1. It is easy to get either good scalability or good security
2. SCSA presents a shortcut to manage the tradeoff
Outline
– Need for on-demand scalability
– S3 invalidation mechanism
– Security-scalability tradeoff
– Reducing latency

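Contributors to User Latency
[Diagram: in the traditional architecture, users' requests and responses cross high-latency links to the Web server/App server, which sit next to the database; in the DBSS architecture, users reach a nearby CDN and DBSS, and the high-latency link lies between the DBSS and the database]
A single HTTP request → multiple database queries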
Sample Web Application Code
function find_comments ($user_id) {
  $template := "SELECT from_id, body FROM comments WHERE to_id=?";
  $query := attach_to_template ($template, $user_id);
  $result := execute ($query);
  foreach ($row in $result)
    print (get_body ($row), get_name (get_id ($row)));
}
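(N+1) queries are issued (one for the comments, plus one get_name lookup per comment) because:
• It is convenient for programmers to abstract database values
• It has no noticeable effect in the traditional setting, where the app server and database are co-located
Found many examples in the benchmark applications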
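To see why grouping helps, here is the same lookup written both ways against an in-memory SQLite database, counting round trips. The `users` table with a `name` column is a hypothetical schema standing in for whatever `get_name` reads. The merged version is a hand-written join in the spirit of the MERGING transformation, not output of the source-to-source compiler described next.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE comments (from_id INTEGER, to_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO comments VALUES (2, 1, 'hi'), (3, 1, 'hey');
    """)
    return db


class CountingDB:
    """Wraps a connection and counts round trips to the database."""

    def __init__(self, db):
        self.db, self.round_trips = db, 0

    def execute(self, sql, params=()):
        self.round_trips += 1
        return self.db.execute(sql, params).fetchall()


def find_comments_naive(db, user_id):
    out = []
    rows = db.execute("SELECT from_id, body FROM comments WHERE to_id = ?", (user_id,))
    for from_id, body in rows:
        # One extra query per comment: the "N" in N+1
        (name,) = db.execute("SELECT name FROM users WHERE id = ?", (from_id,))[0]
        out.append((body, name))
    return out


def find_comments_merged(db, user_id):
    # All N lookups folded into a single join: one round trip
    return db.execute(
        "SELECT c.body, u.name FROM comments c JOIN users u ON u.id = c.from_id "
        "WHERE c.to_id = ?", (user_id,))
```

With two comments, the naive version makes three round trips and the merged version one. With a 100 ms DBSS-to-home-server link, that difference dominates response time.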
Reducing User Latency in a DBSS Setting
Transformations to reduce the number of round-trips:
1. Group execution of queries: MERGING transformation
2. Overlap execution of queries: NONBLOCKING transformation
[Diagram: Web application code (a procedural program with embedded SQL) → holistic transformations using source-to-source compilers → transformed code (transformed program and SQL)]
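The NONBLOCKING idea can be illustrated by hand: issue independent queries without waiting, then collect the answers. In this sketch `remote_query` is a hypothetical stand-in that just sleeps for an assumed 100 ms round trip, and a thread pool plays the role that asynchronous query submission would play in transformed code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.1  # assumed DBSS-to-home-server round trip, in seconds


def remote_query(user_id):
    """Hypothetical query that must travel to the home server."""
    time.sleep(LATENCY)
    return f"user-{user_id}"


def get_names_blocking(ids):
    # Each query waits for the previous round trip to finish
    return [remote_query(i) for i in ids]


def get_names_nonblocking(ids):
    # Submit every independent query first, then gather results in order
    with ThreadPoolExecutor(max_workers=max(1, len(ids))) as pool:
        futures = [pool.submit(remote_query, i) for i in ids]
        return [f.result() for f in futures]
```

For four ids, the blocking version takes roughly four round trips while the nonblocking one takes roughly one. The rewrite is only safe when the queries do not depend on each other's results.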