
A Server-less Architecture for Building
Scalable, Reliable, and Cost-Effective
Video-on-demand Systems
Raymond Leung and Jack Y.B. Lee
Department of Information Engineering
The Chinese University of Hong Kong
Contents

- Introduction
- Server-less Architecture
- Performance Evaluation
- System Scalability
- Summary
Introduction
Client-Server Architecture

- Traditional client-server architecture
  - clients connect to the server for streaming
  - system capacity is limited by server capacity
Introduction
Motivation

- Limitations of the client-server system
  - system capacity is limited by server capacity
  - a high-capacity server is very expensive
- Availability of powerful client-side devices, also called set-top boxes (STB)
  - home entertainment center - VCD/DVD player, digital music jukebox, etc.
  - relatively high processing capability and local hard-disk storage
- Server-less architecture
  - eliminates the dedicated server
  - each user node (STB) serves both as a client and as a mini-server
  - fully distributed storage, processing, and streaming
Architecture
Server-less Architecture

- Basic principles
  - the dedicated server is eliminated
  - users are divided into clusters
  - video data is distributed to the nodes in a cluster
Architecture
Challenges

- Data placement policy
- Retrieval and transmission scheduling
- Fault tolerance
- Distributed directory service
- System adaptation and dynamic reconfiguration
- etc.
Architecture
Data Placement Policy

- Block-based striping (see the sketch below)
  - video data is divided into fixed-size blocks and then distributed among the nodes in the cluster
  - low storage requirement, load balanced
  - capable of fault tolerance using redundant unit(s)
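
A minimal illustration of block-based striping (Python); the round-robin block-to-node mapping is our assumed placement, since the deck only states that fixed-size blocks are distributed among the nodes:

    # Round-robin block-based striping (sketch; placement is our assumption).
    def node_for_block(block_index, n_nodes):
        """Block i of a video is stored on node i mod N."""
        return block_index % n_nodes

    print([node_for_block(i, 5) for i in range(12)])
    # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1] -> balanced load across nodes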
Architecture
Retrieval and Transmission Scheduling

- Round-based schedulers
  - retrieve a data block in each micro-round
  - transmission starts at the end of the micro-round
  - transmission rate: Rv/N

[Figure: scheduling timeline - in each micro-round (Tg), a Q-byte block is retrieved from disk and then transmitted over the following rounds; Tf marks the macro round spanning rounds 0-2 for group 0.]
Architecture
Retrieval and Transmission Scheduling

- Disk retrieval scheduling
  - Grouped Sweeping Scheme (GSS)1
  - able to control the tradeoff between disk efficiency and buffer requirement
- Transmission scheduling
  - Macro round length
    - the time required for every node to send out a data block of Q bytes
    - depends on system scale, data block size, and video bitrate

$$T_f = \frac{NQ}{R_v}$$

where T_f is the macro round length, N the number of nodes within a cluster, Q the data block size, and R_v the video bit-rate.

1 P.S. Yu, M.S. Chen and D.D. Kandlur, "Grouped Sweeping Scheduling for DASD-based Multimedia Storage Management", ACM Multimedia Systems, vol. 1, pp. 99-109, 1993.
Architecture
Retrieval and Transmission Scheduling

- Transmission scheduling
  - Micro round length
    - under GSS scheduling, the duration of one GSS group within each macro round
    - depends on the macro round length and the number of GSS groups

$$T_g = \frac{T_f}{g} = \frac{NQ}{gR_v}$$

where T_g is the micro round length, T_f the macro round length, and g the number of GSS groups.
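
These two round lengths are simple enough to compute directly. A minimal sketch (Python; the function names are ours, and the parameters are the example values used later in this deck):

    # Round lengths under GSS scheduling (sketch; names are ours).
    def macro_round(n_nodes, block_bytes, video_rate):
        """T_f = N*Q/R_v: time for every node to send one Q-byte block."""
        return n_nodes * block_bytes / video_rate

    def micro_round(n_nodes, block_bytes, video_rate, g):
        """T_g = T_f/g: duration of one GSS group within a macro round."""
        return macro_round(n_nodes, block_bytes, video_rate) / g

    # Example: Q = 4 KB, R_v = 150 KB/s, N = 200 nodes, g = N.
    N, Q, Rv = 200, 4 * 1024, 150 * 1024
    print(macro_round(N, Q, Rv))     # T_f ~ 5.33 s
    print(micro_round(N, Q, Rv, N))  # T_g = Q/R_v ~ 0.027 s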
Architecture
Fault Tolerance

- Node characteristics
  - lower reliability than a high-end server
  - shorter mean time to failure (MTTF)
  - the system fails if any one of the nodes fails
- Fault tolerance mechanism
  - erasure correction code to implement fault tolerance
    - Reed-Solomon Erasure code (RSE)2
    - retrieve and transmit coded data at a higher data rate
    - recover data blocks at the receiver node

2 A.J. McAuley, "Reliable Broadband Communication Using a Burst Erasure Correcting Code", in Proc. ACM SIGCOMM '90, Philadelphia, PA, September 1990, pp. 287-306.
Architecture
Fault Tolerance

- Redundancy
  - encode redundant data from the video data
  - recover lost data in case of node failure(s)
Performance Evaluation
Performance Evaluation

- Storage capacity
- Network capacity
- Disk access bandwidth
- Buffer requirement
- System response time
Performance Evaluation
Storage Capacity

- What is the minimum number of nodes required to store a given amount of video data?
- For example:
  - video bitrate: 150 KB/s
  - video length: 2 hours
  - storage required for 100 videos: 102.9 GB
- If each node can allocate 1 GB for video storage, then
  - 103 nodes are needed (without redundancy); or
  - 108 nodes are needed (with 5 nodes added for redundancy)
- This sets the lower limit on the cluster size.
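
The arithmetic behind this example, as a small sketch (Python; assumes 1 GB = 1024^2 KB, which reproduces the slide's figures):

    import math

    # Storage-capacity example from the slide (sketch; names are ours).
    video_rate_kb = 150          # video bitrate, KB/s
    video_len_s = 2 * 3600       # video length: 2 hours
    n_videos = 100
    node_storage_gb = 1.0        # per-node allocation
    redundancy_nodes = 5

    total_gb = video_rate_kb * video_len_s * n_videos / 1024**2
    print(total_gb)              # ~102.99 GB (quoted as 102.9 GB)

    nodes = math.ceil(total_gb / node_storage_gb)
    print(nodes)                      # 103 nodes without redundancy
    print(nodes + redundancy_nodes)   # 108 nodes with redundancy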
Performance Evaluation
Network Capacity

- How many nodes can be connected given a certain network switching capacity?
- For example:
  - network switching capacity of 32 Gbps, assuming 60% utilization
  - video bitrate: 150 KB/s
  - up to 8388 nodes (without redundancy)
- Network switching capacity is not a bottleneck.
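
A sketch of how this node count is reached (Python). Counting 2·Rv of switch bandwidth per node, for the stream each node sends as a mini-server plus the stream it receives as a client, is our reading; it reproduces the slide's 8388 figure:

    # Network-capacity example (sketch; the 2*Rv per node is our reading).
    switch_bps = 32 * 1024**3         # 32 Gbps switching capacity
    utilization = 0.6                 # assumed utilization
    video_rate_bps = 150 * 1024 * 8   # 150 KB/s video bitrate

    per_node_bps = 2 * video_rate_bps # each node sends Rv and receives Rv
    max_nodes = int(switch_bps * utilization / per_node_bps)
    print(max_nodes)                  # 8388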
Performance Evaluation
Disk Access Bandwidth

- Recall the retrieval and transmission scheduling (transmission rate: Rv/N):

[Figure: the scheduling timeline from earlier - Q-byte disk retrievals in micro-rounds (Tg), transmissions spanning the macro round (Tf).]

- Continuous data transmission constraint:
  - retrieval must finish before transmission starts in each micro-round
  - need to quantify the disk retrieval round length and verify it against the above constraint
Performance Evaluation
Disk Access Bandwidth

- Disk retrieval round length
  - the time required to retrieve the data blocks for transmission
  - depends on seek overhead, rotational latency, and data block size
  - suppose there are k requests per GSS group; the maximum retrieval round length is

$$t_{round}(k, Q) = t_{fix} + k \, t_{seek}^{max}(k) + k \left( W^{-1} + \frac{Q}{r_{min}} \right)$$

where t_fix is the fixed overhead, t_seek^max(k) the maximum seek time for k requests, W^{-1} the rotational latency, r_min the minimum transfer rate, and Q the data block size.

- Continuous data transmission constraint:

$$t_{round}\left(\frac{N}{g}, Q\right) \le \frac{T_f}{g}$$
Performance Evaluation
Disk Access Bandwidth

- Example:
  - Disk: Quantum Atlas 10K3
  - Data block size (Q): 4 KB
  - Video bitrate (Rv): 150 KB/s
  - Number of nodes: N
  - Number of GSS groups (g): N (reduces to FCFS scheduling)
  - Micro round length: T_g = T_f / g = NQ / (g R_v) = Q / R_v = 0.027 s
  - Disk retrieval round length: 0.017 s < 0.027 s
- Therefore the constraint is satisfied even if the FCFS scheduler is used.

3 G. Ganger and J. Schindler, "Database of Validated Disk Parameters for DiskSim", http://www.ece.cmu.edu/~ganger/disksim/diskspecs.html
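
A sketch of this feasibility check (Python; the 0.017 s retrieval round length is taken from the slide's Quantum Atlas 10K figures rather than re-derived here):

    # Continuous-transmission check for g = N (FCFS): T_g = Q / R_v.
    Q = 4 * 1024              # data block size, bytes
    Rv = 150 * 1024           # video bitrate, bytes/s
    t_round = 0.017           # retrieval round length quoted on the slide, s

    Tg = Q / Rv               # micro round length, ~0.027 s
    print(Tg, t_round <= Tg)  # 0.0266... True -> constraint satisfied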
Performance Evaluation
Buffer Requirement

- Receiver buffer requirement
  - double-buffering scheme:
    - one buffer for storing data received from the network plus locally retrieved data blocks
    - another one for the video decoder

$$B_r = 2NQ$$

- Sender buffer requirement
  - under GSS scheduling:

$$B_s = \left(1 + \frac{1}{g}\right) NQ$$
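These two formulas as a quick sketch (Python), using the deck's Q = 4 KB and g = N:

    # Per-node buffer requirements (sketch; names are ours).
    def buffers_mb(n_nodes, block_bytes, g):
        recv = 2 * n_nodes * block_bytes            # B_r = 2NQ
        send = (1 + 1 / g) * n_nodes * block_bytes  # B_s = (1 + 1/g)NQ
        return recv / 2**20, send / 2**20           # in MB

    Q = 4 * 1024
    for N in (100, 200, 500):
        br, bs = buffers_mb(N, Q, g=N)
        print(N, round(br, 2), round(bs, 2), round(br + bs, 2))
    # At N = 500 the total is ~5.9 MB, consistent with the plot below.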

Performance Evaluation
Buffer Requirement

[Figure: total buffer requirement versus system scale - receiver, sender, and total buffer (MB, 0-6) against the number of nodes (0-500); data block size 4 KB, number of GSS groups g = N.]
Performance Evaluation
System Response Time

- System response time
  - the time required from sending out a request until playback begins
  - scheduling delay + prefetch delay
- Scheduling delay under GSS
  - the time required from sending out a request until data retrieval starts
  - can be analyzed using an urns model; detailed derivation available elsewhere4
- Prefetch delay
  - the time required from retrieving data until playback begins
  - one micro round to retrieve a data block and one macro round to transmit the whole block to the client node

4 Lee, J.Y.B., "Concurrent push - a scheduling algorithm for push-based parallel video servers", IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, April 1999, pp. 467-477.
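
The prefetch-delay part of this breakdown can be sketched directly (Python); reading "one micro round plus one macro round" as T_g + T_f is our interpretation, and the urns-model scheduling delay is not reproduced here:

    # Prefetch delay ~ one micro round + one macro round (our reading).
    Q, Rv = 4 * 1024, 150 * 1024    # 4 KB blocks, 150 KB/s bitrate

    def prefetch_delay(n_nodes, g):
        Tf = n_nodes * Q / Rv       # macro round
        Tg = Tf / g                 # micro round
        return Tg + Tf

    # ~5.36 s of the 5.615 s response time quoted later at 200 nodes.
    print(prefetch_delay(200, g=200))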
Performance Evaluation
System Response Time

- For example, with data block size 4 KB:

[Figure: scheduling delay, prefetch delay, and system response time (seconds, log scale 0.01-100) versus the number of nodes (0-500).]
System Scalability
System Scalability

- Not limited by network or disk bandwidth
- Limited by system response time
  - prefers the FCFS disk scheduler over SCAN
  - prefetch delay increases linearly with system scale
  - example: response time of 5.615 s at a scale of 200 nodes
- Solution
  - form new clusters to expand the system scale
  - use a smaller block size (limited by disk efficiency)
Summary
Summary

- Server-less architecture proposed for VoD
  - dedicated server is eliminated
  - each node serves as both a client and a mini-server
  - inherently scalable
- Challenges addressed:
  - data placement policy
  - retrieval and transmission scheduling
  - fault tolerance
- Performance evaluation
  - acceptable storage and buffer requirements
  - scalability limited by system response time
End of Presentation
Thank you
Question & Answer Session
Appendix
Reliability

- Higher reliability achieved by redundancy
  - each node has independent failure and recovery rates, λ and μ respectively
  - let state i be the system state where i out of the N nodes have failed
  - at state i, the transition rates to state (i+1) and state (i-1) are λ_i and μ_i respectively
  - assume the system can tolerate up to h failures using redundancy
  - the system state diagram is shown as follows:

[State diagram: a birth-death chain of states 0, 1, 2, ..., h, h+1, ..., with failure transitions λ_i from state i to i+1 and repair transitions μ_i from state i to i-1; states beyond h represent system failure.]
Appendix
Reliability

- System mean time to failure (MTTF)
  - can be analyzed with a continuous-time Markov chain model
  - solving for the expected time from state 0 to state (h+1) in the previous diagram:

$$T_0 = \sum_{i=0}^{h} \sum_{j=0}^{i} \frac{\prod_{k=0}^{j-1} \mu_{i-k}}{\prod_{k=0}^{j} \lambda_{i-k}}$$







Appendix
Impact of Redundancy

- Bandwidth requirement (without redundancy)
  - (N-1) blocks are received from the network and one is retrieved locally from disk

$$C = \frac{N-1}{N} R_v$$

where R_v is the video bit-rate.

- Bandwidth requirement (with redundancy h)
  - additional network bandwidth is needed for transmitting the redundant data

$$C_R = C \cdot \frac{N}{N-h} = \frac{N-1}{N-h} R_v$$
Appendix
Impact of Redundancy

- Data block size (without redundancy)
  - block size: Q bytes
- Data block size (with redundancy h)
  - block size:

$$Q_r = Q \cdot \frac{N}{N-h}$$