A Server-less Architecture for Building
Scalable, Reliable, and Cost-Effective
Video-on-demand Systems
Raymond Leung and Jack Y.B. Lee
Department of Information Engineering
The Chinese University of Hong Kong
Contents
Introduction
Server-less Architecture
Performance Evaluation
System Scalability
Summary
Introduction
Client-Server Architecture
Traditional client-server architecture
clients connect to server for streaming
system capacity limited by server capacity
Introduction
Motivation
Limitation of client-server system
Availability of powerful client-side devices, known as set-top boxes (STB)
system capacity limited by server capacity
high-capacity server is very expensive
home entertainment center - VCD/DVD player, digital music jukebox, etc.
relatively high processing capability, and local HD storage
Server-less architecture
eliminates the dedicated server
each user node (STB) serves both as a client and as a mini-server
fully distributed storage, processing, and streaming
Architecture
Server-less Architecture
Basic principles
dedicated server is eliminated
users are divided into clusters
video data is distributed to nodes in a cluster
Architecture
Challenges
Data placement policy
Retrieval and transmission scheduling
Fault tolerance
Distributed directory service
System adaptation and dynamic reconfiguration
etc.
Architecture
Data Placement Policy
Block-based striping
video data is divided into fixed-size blocks and then distributed
among nodes in the cluster
low storage requirement, load balanced
capable of fault tolerance using redundant unit(s)
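As a sketch of this placement policy, assuming fixed-size blocks assigned round-robin (the function name and units are illustrative, not from the paper):

```python
def place_blocks(video_size_bytes, block_size, num_nodes):
    """Map each fixed-size block of a video to a node, round-robin."""
    num_blocks = -(-video_size_bytes // block_size)  # ceiling division
    placement = {node: [] for node in range(num_nodes)}
    for block in range(num_blocks):
        placement[block % num_nodes].append(block)  # round-robin striping
    return placement

# Example: a 40 KB video striped in 4 KB blocks over a 4-node cluster.
layout = place_blocks(40 * 1024, 4 * 1024, 4)
# Every node holds either 2 or 3 blocks, so storage stays balanced.
```

Round-robin striping is what gives the low per-node storage requirement and the load balance noted above; a redundant block per stripe can be added for fault tolerance.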
Architecture
Retrieval and Transmission Scheduling
Round-based Schedulers
retrieves data block in each micro-round
transmission starts at the end of micro-round
transmission rate: Rv/N
[Figure: round-based scheduling. In each micro round (Tg) a group retrieves a data block of Q bytes from disk; transmission of the block then takes one macro round (Tf).]
Architecture
Retrieval and Transmission Scheduling
Disk retrieval scheduling
Grouped Sweeping Scheme (GSS)1
able to control the tradeoff between disk efficiency and buffer
requirement
Transmission scheduling
Macro round length
time required for every node to send out a data block of Q bytes
depends on system scale, data block size and video bitrate
Tf = NQ / Rv

Tf – macro round length
N – number of nodes within a cluster
Q – data block size
Rv – video bit-rate

1 P.S. Yu, M.S. Chen and D.D. Kandlur, "Grouped Sweeping Scheduling for DASD-based Multimedia Storage Management", ACM Multimedia Systems, vol. 1, pp. 99-109, 1993.
Architecture
Retrieval and Transmission Scheduling
Transmission scheduling
Micro round length
under GSS scheduling, the duration of one GSS group within each
macro round
depends on the macro round length and the number of GSS groups
Tg = Tf / g = NQ / (g * Rv)

Tg – micro round length
Tf – macro round length
g – number of GSS groups
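The two round lengths above can be computed directly; a minimal sketch (function names and units are assumed for illustration):

```python
def macro_round(n_nodes, block_kb, rate_kbps):
    """Tf = N*Q/Rv: time for every node to send one Q-byte block at rate Rv/N."""
    return n_nodes * block_kb / rate_kbps

def micro_round(n_nodes, block_kb, rate_kbps, n_groups):
    """Tg = Tf/g: duration of one GSS group within a macro round."""
    return macro_round(n_nodes, block_kb, rate_kbps) / n_groups

# With Q = 4 KB, Rv = 150 KB/s and g = N, Tg reduces to Q/Rv (about 0.027 s).
tg = micro_round(100, 4, 150, 100)  # KB and KB/s units cancel out
```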
Architecture
Fault Tolerance
Node characteristics
lower reliability than high-end server
shorter mean time to failure (MTTF)
system fails if any one of the nodes fails
Fault tolerance mechanism
erasure correction code to implement fault tolerance
Reed-Solomon Erasure code (RSE)2
retrieve and transmit coded data at a higher data rate
recover data blocks at the receiver node
2 A.J. McAuley, "Reliable Broadband Communication Using a Burst Erasure Correcting Code", in Proc. ACM SIGCOMM '90, Philadelphia, PA, September 1990, pp. 287-306.
Architecture
Fault Tolerance
Redundancy
encode redundant data from video data
recover lost data in case of node failure(s)
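The paper uses the Reed-Solomon Erasure code; purely as an illustration of the erasure-coding principle, the simplest single-failure (h = 1) case is one XOR parity block. This sketch is not RSE itself:

```python
def xor_parity(blocks):
    """Compute one redundant block as the byte-wise XOR of all data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])

# One stripe of three data blocks held on three nodes, plus a parity node.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)
assert recover([data[0], data[2]], parity) == data[1]  # node 1 failed
```

RSE generalizes this idea to tolerate h simultaneous failures by adding h coded blocks per stripe.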
Performance Evaluation
Performance Evaluation
Storage capacity
Network capacity
Disk access bandwidth
Buffer requirement
System response time
Performance Evaluation
Storage Capacity
What is the minimum number of nodes required to store a
given amount of video data?
For example:
If each node can allocate 1GB for video storage, then
video bitrate: 150 KB/s
video length: 2 hours
storage required for 100 videos: 102.9GB
103 nodes are needed (without redundancy); or
108 nodes are needed (with 5 nodes added for redundancy)
This sets the lower limit on the cluster size.
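The arithmetic behind this example can be reproduced as follows, assuming binary units (1 GB = 2^20 KB), which is what yields the 102.9 GB figure:

```python
import math

bitrate_kb = 150          # video bit-rate, KB/s
length_s = 2 * 3600       # 2-hour video
num_videos = 100
node_gb = 1               # storage each node allocates for video

storage_kb = bitrate_kb * length_s * num_videos
storage_gb = storage_kb / 2**20              # ~102.9 GB for 100 videos
nodes = math.ceil(storage_gb / node_gb)      # 103 nodes without redundancy
nodes_redundant = nodes + 5                  # 108 with 5 redundant nodes
```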
Performance Evaluation
Network Capacity
How many nodes can be connected given a certain
network switching capacity?
For example:
If the network switching capacity is 32Gbps, and assume
60% utilization
video bitrate: 150KB/s
up to 8388 nodes (without redundancy)
Network switching capacity is not a bottleneck.
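The 8388-node figure is consistent with each node both sending and receiving one stream, assuming binary units (1 Gbps = 2^30 bit/s, 1 KB = 1024 bytes); this reading of the example is an assumption:

```python
switch_bps = 32 * 2**30 * 0.60         # 32 Gbps switch at 60% utilization
per_node_bps = 2 * 150 * 1024 * 8      # send + receive one 150 KB/s stream
max_nodes = int(switch_bps // per_node_bps)
```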
Performance Evaluation
Disk Access Bandwidth
Recall the retrieval and transmission scheduling:
transmission rate: Rv/N
[Figure: the round-based scheduling diagram repeated from the Architecture section - disk retrieval of Q bytes in each micro round (Tg), transmission over each macro round (Tf).]
Continuous data transmission constraint:
must finish retrieval before transmission in each micro-round
need to quantify the disk retrieval round length and verify
against the above constraint
Performance Evaluation
Disk Access Bandwidth
Disk retrieval round length
time required to retrieve the data blocks for transmission
depends on seeking overhead, rotational latency and data block
size
suppose k requests per GSS group
t_round(k, Q) – maximum retrieval round length:

t_round(k, Q) = t_fix + k * t_seek_max(k) + k * (W^-1 + Q / r_min)

t_fix – fixed overhead
t_seek_max(k) – maximum seek time for k requests
W^-1 – worst-case rotational latency
r_min – minimum transfer rate
Q – data block size

Continuous data transmission constraint:

t_round(N/g, Q) <= Tf / g
Performance Evaluation
Disk Access Bandwidth
Example:
Disk: Quantum Atlas 10K3
Data block size (Q): 4KB
Video bitrate (Rv): 150KB/s
Number of nodes: N
GSS group number (g): N (reduced to FCFS scheduling)
Micro round length: Tg = Tf / g = NQ / (g * Rv) = Q / Rv = 0.027s
Disk retrieval round length: 0.017s < 0.027s
Therefore the constraint is satisfied even if the FCFS scheduler is used.
3 G. Ganger and J. Schindler, "Database of Validated Disk Parameters for DiskSim", http://www.ece.cmu.edu/~ganger/disksim/diskspecs.html
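A back-of-the-envelope check of the constraint under FCFS (g = N, so k = 1 request per group). The disk parameters below are assumed ballpark figures for a 10,000 RPM drive, not values taken from the paper:

```python
# Assumed disk parameters (illustrative only, roughly a 10K RPM drive):
t_fix = 0.0005        # fixed overhead, s
t_seek_max = 0.010    # maximum seek time for a single request, s
rotation = 0.006      # worst-case rotational latency (one revolution), s
r_min = 18e6          # minimum transfer rate, bytes/s
Q = 4096              # data block size, bytes
Rv = 150 * 1024       # video bit-rate, bytes/s

t_round = t_fix + t_seek_max + rotation + Q / r_min   # k = 1 under FCFS
t_micro = Q / Rv                                      # Tg = Q/Rv when g = N
assert t_round < t_micro  # retrieval finishes before transmission starts
```

With these assumed figures t_round comes out near the 0.017 s quoted above, comfortably below the 0.027 s micro round.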
Performance Evaluation
Buffer Requirement
Receiver buffer requirement
double-buffering scheme:
one for storing data received from the network plus locally
retrieved data blocks
another one for video decoder
Br = 2NQ
Sender buffer requirement
under GSS scheduling:
Bs = (1 + 1/g) NQ
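A quick sketch of these two formulas (with g = N, as in the evaluation that follows):

```python
def receiver_buffer(n, q):
    """Br = 2NQ: double-buffering at the receiver."""
    return 2 * n * q

def sender_buffer(n, q, g):
    """Bs = (1 + 1/g)NQ: sender-side buffer under GSS."""
    return (1 + 1 / g) * n * q

q = 4 * 1024                  # 4 KB data block
n = 500                       # cluster size
total = receiver_buffer(n, q) + sender_buffer(n, q, n)
total_mb = total / 2**20      # ~5.9 MB at 500 nodes, consistent with the plot
```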
Performance Evaluation
Buffer Requirement
Total buffer requirement versus system scale
Data block size: 4KB, Number of GSS groups: g=N
[Figure: buffer requirement (MB, 0-6) versus number of nodes (0-500), with curves for Receiver Buffer, Sender Buffer, and Total Buffer; the total grows roughly linearly to about 6 MB at 500 nodes.]
Performance Evaluation
System Response Time
System response time
time required from sending out a request to when playback begins
scheduling delay + prefetch delay
Scheduling delay under GSS
time required from sending out a request to the start of data retrieval
can be analyzed using an urns model
detailed derivation available elsewhere4
Prefetch delay
time required from retrieving data to when playback begins
one micro round to retrieve a data block and one macro round to
transmit the whole block to the client node
4 Lee, J.Y.B., "Concurrent push - a scheduling algorithm for push-based parallel video servers", IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, April 1999, pp. 467-477.
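The prefetch component is therefore one micro round plus one macro round; with g = N this is Q/Rv + NQ/Rv. A sketch (the scheduling-delay term is omitted, since its derivation is in the cited paper):

```python
def prefetch_delay(n_nodes, q_kb=4, rv_kbps=150, g=None):
    """One micro round (Tg) plus one macro round (Tf); g defaults to N (FCFS)."""
    g = g if g is not None else n_nodes
    tf = n_nodes * q_kb / rv_kbps   # macro round, seconds
    tg = tf / g                     # micro round, seconds
    return tg + tf

# At 200 nodes: (1 + 1/200 * 200) ... = (201) * 4 / 150 = 5.36 s,
# the dominant part of the ~5.6 s response time at that scale.
delay = prefetch_delay(200)
```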
Performance Evaluation
System Response Time
For example:
Data block size: 4KB
[Figure: Scheduling Delay, Prefetch Delay, and System Response Time (seconds, log scale from 0.01 to 100) versus Number of Nodes (0-500).]
System Scalability
System Scalability
Not limited by network or disk bandwidth
Limited by system response time
prefers FCFS disk scheduler over SCAN
prefetch delay increases linearly with system scale
example: response time of 5.615s at a scale of 200 nodes
Solution
form new clusters to expand the system scale
use a smaller block size (limited by disk efficiency)
Summary
Summary
Server-less architecture proposed for VoD
dedicated server is eliminated
each node serves as both a client and a mini-server
inherently scalable
Challenges addressed:
data placement policy
retrieval and transmission scheduling
fault tolerance
Performance evaluation
acceptable storage and buffer requirement
scalability limited by system response time
End of Presentation
Thank you
Question & Answer Session
Appendix
Reliability
Higher reliability achieved by redundancy
each node has an independent failure rate λ and recovery rate μ, respectively
let state i be the system state where i out of the N nodes have failed
at state i, the transition rates to state (i+1) and state (i-1) are λi and μi respectively
assume the system can tolerate up to h failures using
redundancy
the system state diagram is shown as follows:
[Figure: birth-death state diagram with states 0, 1, 2, ..., h, h+1, ...; failures move the system right, repairs move it left, and reaching state h+1 means system failure.]
Appendix
Reliability
System mean time to failure (MTTF)
can be analyzed by continuous time Markov Chain model
solving the expected time from state 0 to state (h+1) in previous
diagram,
T0 = Σ_{i=0}^{h} Σ_{j=0}^{i} ( Π_{k=j+1}^{i} μk ) / ( Π_{k=j}^{i} λk )
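A numerical sketch of this formula. The per-state rates λi and μi are assumed here to be (N - i)λ and iμ (i failed nodes leave N - i subject to failure and i under repair); the paper's exact rate definitions may differ:

```python
from math import prod

def system_mttf(lam, mu, n_nodes, h):
    """Expected time from state 0 (all up) to state h+1 (system failure)."""
    lam_i = lambda i: (n_nodes - i) * lam   # failure rate in state i (assumed)
    mu_i = lambda i: i * mu                 # repair rate in state i (assumed)
    total = 0.0
    for i in range(h + 1):
        for j in range(i + 1):
            num = prod(mu_i(k) for k in range(j + 1, i + 1))  # empty -> 1
            den = prod(lam_i(k) for k in range(j, i + 1))
            total += num / den
    return total

# Sanity check: with no redundancy (h = 0) the system fails on the first
# node failure, so MTTF = 1 / (N * lam).
```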
Appendix
Impact of Redundancy
Bandwidth requirement (without redundancy)
(N-1) received from network and one locally retrieved from disk
C = ((N - 1) / N) * Rv
Rv – video bit-rate
Bandwidth requirement (with h redundancy)
additional network bandwidth will be needed for transmitting
the redundant data
CR = C * N / (N - h) = ((N - 1) / (N - h)) * Rv
Appendix
Impact of Redundancy
Data block size (without redundancy)
block size: Q bytes
Data block size (with h redundancy)
block size: Qr = Q * N / (N - h)
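Both redundancy overheads scale by the same N/(N - h) factor; a sketch with illustrative numbers (the example values are assumptions, not from the paper):

```python
def bandwidth(n, rv, h=0):
    """Per-node receive bandwidth CR = (N-1)/(N-h) * Rv (C when h = 0)."""
    return (n - 1) / (n - h) * rv

def block_size(q, n, h=0):
    """Qr = Q * N / (N - h): larger blocks carry the redundant data."""
    return q * n / (n - h)

# Example with N = 100 nodes, h = 5 redundant, Rv = 150 KB/s, Q = 4 KB:
c = bandwidth(100, 150)        # (99/100) * 150 = 148.5 KB/s
cr = bandwidth(100, 150, 5)    # (99/95)  * 150 ~ 156.3 KB/s
qr = block_size(4, 100, 5)     # 4 * 100/95     ~ 4.21 KB
```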