A Server-less Architecture for
Building Scalable, Reliable, and
Cost-Effective Video-on-demand
Systems
Presented by: Raymond Leung Wai Tak
Supervisor: Prof. Jack Lee Yiu-bun
Department of Information Engineering
The Chinese University of Hong Kong
Contents
1. Introduction
2. Challenges
3. Server-less Architecture
4. Reliability Analysis
5. Performance Modeling
6. System Dimensioning
7. Multiple Parity Groups
8. Conclusion
1. Introduction
Traditional Client-server Architecture
Clients connect to the server and request video
Server capacity limits the system capacity
Cost increases with system scale
1. Introduction
Server-less Architecture
Motivated by the availability of powerful user devices
Each user node (STB) serves both as a client and as a mini-server
Each user node contributes to the system
Memory
Processing power
Network bandwidth
Storage
Costs shared by users
1. Introduction
Architecture Overview
Composed of autonomous clusters
[Figure: several autonomous clusters, each formed from STBs; every STB handles both playback and serving]
2. Challenges
Video Data Storage Policy
Retrieval and Transmission Scheduling
Fault Tolerance
Distributed Directory Service
Heterogeneous User Nodes
System Adaptation – node joining/leaving
3. Server-less Architecture
Storage Policy
Video data is divided into fixed-size blocks (Q bytes)
Data blocks are distributed among nodes in the cluster (data striping)
Low storage requirement and load balancing
Capable of fault tolerance using redundant blocks (discussed later)
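The striping policy above can be sketched as a round-robin placement of fixed-size blocks; the block size, cluster size, and the `stripe` helper are illustrative names, not from the slides.

```python
# Round-robin striping: a video is cut into fixed-size Q-byte blocks and
# block i is placed on node (i mod N), spreading storage and load evenly.
# Q, N and stripe() are illustrative choices, not from the slides.

def stripe(video: bytes, n_nodes: int, q: int):
    """Return, per node, the list of (block_index, block_bytes) it stores."""
    placement = [[] for _ in range(n_nodes)]
    blocks = [video[i:i + q] for i in range(0, len(video), q)]
    for idx, blk in enumerate(blocks):
        placement[idx % n_nodes].append((idx, blk))
    return placement

nodes = stripe(b"ABCDEFGHIJKL", n_nodes=3, q=4)
```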
3. Server-less Architecture
Retrieval and Transmission Scheduling
Round-based scheduler
Grouped Sweeping Scheduling¹ (GSS)
Composed of macro rounds and micro rounds
Tradeoff between disk efficiency and buffer requirement
¹P.S. Yu, M.S. Chen & D.D. Kandlur, "Grouped Sweeping Scheduling for DASD-based Multimedia Storage Management", ACM Multimedia Systems, vol. 1, pp. 99-109, 1993
3. Server-less Architecture
Retrieval and Transmission Scheduling
Data retrieved in current micro round will be transmitted
immediately in next micro round
Each retrieval block is divided into b transmission blocks for transmission
Transmission block size: U = Q/b
Transmission lasts for one macro round
[Figure: disk retrieval of Q-byte blocks scheduled in micro rounds (T_g) within macro rounds (T_f); each retrieved block is then transmitted as U-byte transmission blocks over the following macro round, group by group]
3. Server-less Architecture
Retrieval and Transmission Scheduling
Macro round length
Defined as the time required for all nodes to transmit one retrieval block
Number of requests served: N
Macro round length: T_f = N·Q/R_v
Micro round length
Each macro round is divided into g micro rounds
Number of requests served per micro round: N/g
Micro round length: T_g = T_f/g = N·Q/(g·R_v)
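A quick numeric check of the round lengths, T_f = N·Q/R_v and T_g = T_f/g, with R_v taken in bytes per second so the byte units of Q cancel; all parameter values are illustrative assumptions.

```python
# GSS round lengths: macro round Tf = N*Q/Rv, micro round Tg = Tf/g.
# Parameter values are illustrative; Rv is given in bytes/s here so the
# byte units of Q cancel directly.
N = 200          # streams served per macro round
Q = 65536        # retrieval block size, bytes
Rv = 500_000     # video bitrate, bytes/s (i.e. 4 Mb/s)
g = 8            # micro rounds per macro round

Tf = N * Q / Rv  # macro round length, seconds
Tg = Tf / g      # micro round length, seconds
```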
3. Server-less Architecture
Modification in Storage Policy
Since the retrieval blocks are divided into transmission blocks for transmission,
video data is striped across transmission blocks instead of retrieval blocks
3. Server-less Architecture
Fault Tolerance
Recovers not only from a single node failure, but also from multiple simultaneous node failures
Redundancy by Forward Error Correction (FEC) Code
e.g. Reed-Solomon Erasure Code (REC)
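The slides use a Reed-Solomon erasure code for general h; as a minimal sketch of the idea, the h = 1 case can be realised with a single XOR parity block, from which any one lost block is recoverable.

```python
# XOR parity as the simplest erasure code (h = 1): the parity block is the
# XOR of all data blocks, so any single missing block equals the XOR of
# the surviving blocks plus parity. Reed-Solomon generalises this to h > 1.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]              # N - h data blocks
parity = xor_blocks(data)                       # the redundant block

# simulate losing data[1], then recover it from the survivors + parity
recovered = xor_blocks([data[0], data[2], parity])
```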
3. Server-less Architecture
Impact of Fault Tolerance on Block Size
Tolerate up to h simultaneous failures
To keep the amount of video data transmitted in each macro round unchanged, the retrieval block size is increased to Q_r:
Q_r = Q·N/(N - h)
Similarly, the transmission block size is increased to U_r:
U_r = Q_r/b = U·N/(N - h)
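The block-size inflation, Q_r = Q·N/(N - h) and U_r = Q_r/b, in a short numeric sketch; the parameter values are illustrative.

```python
# Block-size inflation with h redundant nodes:
# Qr = Q*N/(N-h) and Ur = Qr/b = U*N/(N-h). Values are illustrative.
N, h, b = 200, 20, 10
Q = 65536
U = Q / b

Qr = Q * N / (N - h)   # enlarged retrieval block
Ur = Qr / b            # enlarged transmission block
```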
4. Reliability Analysis
Reliability Analysis
Find out the system mean time to failure (MTTF)
Assuming independent node failure/repair rate
Tolerate up to h failures by redundancy
Analysis by Markov chain model
[Figure: Markov chain with states 0, 1, 2, ..., h, h+1, where state i means i nodes have failed; failure transitions at rate λ_i, repair transitions at rate μ_i; state h+1 is system failure]
4. Reliability Analysis
Reliability Analysis
With the assumption of independent failure and repair rates:
λ_i = (N - i)λ,  μ_i = iμ
Let T_i be the expected time for the system to reach state h+1 from state i. Then
T_0 = 1/λ_0 + T_1
T_i = 1/(λ_i + μ_i) + (λ_i/(λ_i + μ_i))·T_{i+1} + (μ_i/(λ_i + μ_i))·T_{i-1},  for 1 ≤ i ≤ h
T_{h+1} = 0
4. Reliability Analysis
Reliability Analysis
By solving the above set of equations, the system MTTF (T_0) is
T_0 = Σ_{i=0}^{h} Σ_{j=0}^{i} (1/λ_j) · Π_{k=j+1}^{i} (μ_k/λ_k)
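The MTTF can also be computed without the closed form by accumulating m_k, the expected time to step from state k to k+1 (m_0 = 1/λ_0, m_k = 1/λ_k + (μ_k/λ_k)·m_{k-1}, T_0 = Σ m_k); the node failure and repair rates below are assumed values for illustration.

```python
# System MTTF from the birth-death Markov chain with lambda_i = (N-i)*lam
# and mu_i = i*mu. m_k is the expected time to step from state k to k+1:
# m_0 = 1/lambda_0, m_k = 1/lambda_k + (mu_k/lambda_k)*m_{k-1},
# and T_0 = sum of m_k for k = 0..h.

def mttf(N, h, lam, mu):
    m = 1.0 / (N * lam)                  # m_0
    total = m
    for k in range(1, h + 1):
        lam_k = (N - k) * lam
        mu_k = k * mu
        m = 1.0 / lam_k + (mu_k / lam_k) * m
        total += m
    return total

# illustrative rates: node MTTF 50,000 hrs, node repair time 24 hrs
T0 = mttf(N=200, h=5, lam=1 / 50_000, mu=1 / 24)
```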
With a target system MTTF, we can find the redundancy (h) required
4. Reliability Analysis
Redundancy Level
Defined as the proportion of nodes serving redundant data (h/N)
Redundancy level versus number of nodes for achieving the target system MTTF:
[Figure: required redundancy level (0.1-0.25) versus number of nodes (50-500), for target system MTTFs of 1,000, 10,000 and 100,000 hrs]
5. Performance Modeling
Storage Requirement
Network Bandwidth Requirement
Buffer Requirement
System Response Time
Assumptions:
Zero network delay
Zero processing delay
Bounded clock jitters among nodes
5. Performance Modeling
Storage Requirement
Let S_A be the combined size of all video titles to be stored in the cluster
With redundancy h, additional storage is required
Storage requirement per node: S_N = S_A/(N - h)
5. Performance Modeling
Bandwidth Requirement
Assume video bitrate of Rv bps
Without redundancy, each node transmits (N - 1) streams of video data to the other nodes in the cluster,
each stream consuming a bitrate of R_v/N bps
With redundancy h, additional bandwidth is required
Bandwidth requirement per node: C_R = R_v·(N - 1)/(N - h)
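The two per-node requirements, S_N = S_A/(N - h) and C_R = R_v·(N - 1)/(N - h), in a short numeric sketch; all parameter values are illustrative assumptions.

```python
# Per-node requirements: storage SN = SA/(N-h) and upload bandwidth
# CR = Rv*(N-1)/(N-h). Parameter values are illustrative.
N, h = 200, 20
SA = 351.6 * 2**30      # combined size of all videos, bytes
Rv = 4 * 2**20          # video bitrate, bits/s (4 Mb/s)

SN = SA / (N - h)             # storage per node, bytes
CR = Rv * (N - 1) / (N - h)   # upload bandwidth per node, bits/s
```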
5. Performance Modeling
Buffer Requirement
Composed of sender buffer requirement and receiver buffer
requirement
Sender Buffer Requirement
Under GSS scheduling:
B_{s,r} = (1 + 1/g)·N·Q_r = (1 + 1/g)·N²·Q/(N - h)
5. Performance Modeling
Receiver Buffer Requirement
Store the data temporarily before playback
Absorb the deviations in data arrival time caused by clock jitter
B_{r,r} = (2⌈bτ/T_f⌉ + 1)·N·U_r
where τ is the bound on clock jitter among nodes
Total Buffer Requirement
One data stream is for local playback rather than transmission
Buffer sharing for this local playback stream
Subtract b buffer blocks of size U_r from the receiver buffer:
B_{t,r} = (1 + 1/g)·N²·Q/(N - h) + (2⌈bτ/T_f⌉ + 1)·N·U_r - b·U_r
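A numeric sketch of the buffer requirements using the formulas as reconstructed here (sender (1 + 1/g)·N·Q_r; receiver (2⌈bτ/T_f⌉ + 1)·N·U_r; total subtracts the b shared playback blocks); the clock-jitter bound τ and all other values are assumptions.

```python
import math

# Buffer requirements (reconstructed formulas; all values illustrative):
#   sender   Bs = (1 + 1/g) * N * Qr
#   receiver Br = (2*ceil(b*tau/Tf) + 1) * N * Ur
#   total    Bt = Bs + Br - b*Ur   (local playback stream shares buffers)
N, h, g, b = 200, 20, 8, 10
Q = 65536
Rv = 500_000            # bytes/s
tau = 0.2               # assumed clock-jitter bound, seconds

Qr = Q * N / (N - h)
Ur = Qr / b
Tf = N * Q / Rv

Bs = (1 + 1 / g) * N * Qr
Br = (2 * math.ceil(b * tau / Tf) + 1) * N * Ur
Bt = Bs + Br - b * Ur
```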
5. Performance Modeling
System Response Time
Time from sending out a request until playback begins
Scheduling delay + prefetch delay
Scheduling delay under GSS
Time from sending out a request until disk retrieval starts
Can be analyzed using an urns model
Detailed derivation available in Lee’s work2
[Figure: a new request arriving during a micro round (T_g) waits until its group's next disk-retrieval slot within the macro round (T_f)]
²Lee, J.Y.B., "Concurrent push: a scheduling algorithm for push-based parallel video servers", IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, no. 3, April 1999, pp. 467-477
5. Performance Modeling
Prefetch delay
Time from the start of data retrieval until playback begins
One micro round to retrieve a data block, plus the buffering time to fill up the receiver's prefetch buffer
Additional delay is incurred due to clock jitter among the nodes
D_p = (1/g + (1/b)·⌈1 + bτ/T_f⌉)·T_f
where τ is the bound on clock jitter among nodes
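Evaluating the prefetch-delay expression as reconstructed here, D_p = (1/g + (1/b)⌈1 + bτ/T_f⌉)·T_f; the jitter bound τ and the other parameter values are assumptions.

```python
import math

# Prefetch delay (reconstructed formula; values illustrative):
# Dp = (1/g + (1/b) * ceil(1 + b*tau/Tf)) * Tf,
# i.e. one micro round plus the time to fill the prefetch buffer,
# padded for clock jitter tau.
N, g, b = 200, 8, 10
Q = 65536
Rv = 500_000            # bytes/s
tau = 0.2               # assumed clock-jitter bound, seconds
Tf = N * Q / Rv

Dp = (1 / g + (1 / b) * math.ceil(1 + b * tau / Tf)) * Tf
```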
6. System Dimensioning
Storage Requirement
What is the minimum number of nodes required to store a given
amount of video data?
For example:
If each node can allocate 2 GB for video storage, then
video bitrate: 4 Mb/s
video length: 2 hours
storage required for 100 videos: 351.6 GB
176 nodes are needed (without redundancy); or
209 nodes are needed (with 33 nodes added for redundancy)
This sets the lower limit on the cluster size
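The storage arithmetic in the example checks out when Mb and GB are read as 2^20 bits and 2^30 bytes; a quick verification:

```python
import math

# Verifying the storage-dimensioning example: 100 two-hour videos at
# 4 Mb/s, with 2 GB of video storage per node. Interpreting Mb as 2**20
# bits and GB as 2**30 bytes reproduces the slide's 351.6 GB figure.
bitrate = 4 * 2**20                      # bits per second
video_bytes = bitrate / 8 * 2 * 3600     # bytes in one 2-hour video
total_gb = 100 * video_bytes / 2**30     # storage for 100 videos, GB

min_nodes = math.ceil(total_gb / 2)      # 2 GB per node, no redundancy
```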
6. System Dimensioning
Network Capacity
How many nodes can be connected given a certain network
switching capacity?
For example:
If the network switching capacity is 32 Gbps, and assuming 60% utilization
video bitrate: 4 Mb/s
up to 2412 nodes (without redundancy)
Network switching capacity is not a bottleneck
6. System Dimensioning
Disk Access Bandwidth
Determine the value of Q and g to evaluate the buffer requirement
and the system response time
Finite disk access bandwidth limits the value of Q and g
Disk Model on Disk Service Time
Time required to retrieve data blocks for transmission
Depends on seeking overhead, rotational latency and data block size
Suppose k requests are served per GSS group
Maximum service time in the worst-case scenario:
t_round(k, Q_r) = t_0 + k·t_seek^max(k) + k·(W^-1 + Q_r/r_min)
t_round(k, Q_r) – maximum round service time
t_0 – fixed overhead
t_seek^max(k) – maximum seek time for k requests
W^-1 – rotational latency
r_min – minimum transfer rate
Q_r – data block size
6. System Dimensioning
Constraint for Smooth Data Flow
Disk service round must finish before transmission begins
Disk service time must be shorter than the micro round length:
t_round(N/g, Q_r) ≤ T_f/g
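A sketch of checking the smooth-flow constraint under the disk model; the disk parameters (fixed overhead, worst-case seek, rotational latency, minimum transfer rate) are all assumed values, not figures from the slides.

```python
import math

# Disk model: t_round(k, Qr) = t0 + k*t_seek_max(k) + k*(W_inv + Qr/r_min),
# checked against the smooth-flow constraint t_round(N/g, Qr) <= Tf/g.
# All disk parameters below are assumptions for illustration.
t0 = 0.005           # fixed overhead, s
W_inv = 0.006        # rotational latency, s
r_min = 30e6         # minimum transfer rate, bytes/s

def t_seek_max(k):
    return 0.010     # assumed worst-case seek per request, s

def t_round(k, Qr):
    return t0 + k * t_seek_max(k) + k * (W_inv + Qr / r_min)

N, h, g = 200, 20, 8
Q = 65536
Rv = 500_000
Qr = Q * N / (N - h)
Tf = N * Q / Rv

k = math.ceil(N / g)                 # requests per micro round
ok = t_round(k, Qr) <= Tf / g        # smooth data flow holds?
```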
6. System Dimensioning
Buffer Requirement
Decreasing block size (Qr) and increasing number of groups (g) to
achieve minimum system response time, provided that the smooth
data flow constraint is satisfied
[Figure: sender, receiver and total buffer requirement (0-20 MB) versus number of nodes (0-500)]
6. System Dimensioning
System Response Time
System response time versus number of nodes in the cluster
[Figure: scheduling delay, prefetch delay and total system response time (0-15 s) versus number of nodes (0-500)]
6. System Dimensioning
Scheduling Delay
Relatively constant while system scales up
Prefetch Delay
Time required to receive the first group of blocks from all nodes
Increases linearly with system scale – not scalable
Ultimately limits the cluster size
What is the Solution?
Multiple parity groups
7. Multiple Parity Groups
Primary Limit in Cluster Scalability
Prefetch delay in system response time
Multiple Parity Groups
Instead of a single parity group, the redundancy is encoded across multiple parity groups
Decreases the number of blocks that must be received before playback
Playback begins once the data of the first parity group has been received
Reduces the prefetch delay
7. Multiple Parity Groups
Multiple Parity Groups
Transmission of the different parity groups is staggered
[Figure: nodes 0 through N-1 transmit over rounds 0, 1, 2, ...; the transmission slots of parity group 1 and parity group 2 are staggered across nodes and rounds]
7. Multiple Parity Groups
Impact on Performance
Buffer requirement
System response time
Redundancy requirement
Buffer Requirement
The number of blocks within the same parity group is reduced
Receiver buffer requirement is reduced:
B_{r,p} = (2⌈b_p·τ/T_f⌉ + 1)·N·U_p
where b_p and U_p are the per-parity-group transmission block count and size
7. Multiple Parity Groups
System Response Time
Playback begins once the data of the first parity group has been received
System response time is reduced:
D_{p,p} = (1/g + (1/b_p)·⌈1 + b_p·τ/T_f⌉)·T_f
7. Multiple Parity Groups
Redundancy Requirement
The cluster is divided into parity groups containing fewer nodes each
A higher redundancy level is needed to maintain the same system MTTF
Tradeoff between response time and redundancy level
[Figure: required redundancy level (0.1-0.5) versus number of nodes (50-500) for p = 1 to 5 parity groups, each achieving an MTTF of 10,000 hrs]
7. Multiple Parity Groups
Performance Evaluation
[Figure: system response time (sec) and total buffer (MB), both on 0-60 scales, versus redundancy level (0.1-0.5)]
Buffer requirement and system response time versus redundancy level, at a cluster size of 1500 nodes
Both system response time and buffer requirement decrease with more redundancy (i.e. more parity groups)
7. Multiple Parity Groups
Cluster Scalability
What are the system configurations if the system
a. achieves an MTTF of 10,000 hours, and
b. keeps under a response time constraint of 5 seconds, and
c. keeps under a buffer requirement of 8/16 MB?
[Figure: redundancy level (0.15-0.35) and system response time (1-5 s) versus number of nodes (0-1600), under 8 MB and 16 MB buffer constraints]
7. Multiple Parity Groups
Cluster Scalability
The cluster is divided into more parity groups whenever it exceeds either
the response time constraint, or
the buffer constraint
The redundancy level stays relatively constant: the larger cluster improves redundancy efficiency, which compensates for the extra redundancy overhead incurred by the multiple parity group scheme (e.g. under the 16 MB buffer constraint)
7. Multiple Parity Groups
Shifted Bottleneck in Cluster Scalability
The sender (transmission) buffer grows linearly with cluster size and cannot be reduced by the multiple parity group scheme
The system is forced to divide into more parity groups to keep the receiver buffer requirement within the buffer constraint
The redundancy overhead then rises sharply while the system response time drops sharply (e.g. under the 8 MB buffer constraint)
Eventually the total buffer requirement exceeds the buffer constraint even when the cluster is divided into still more parity groups
Scalability bottleneck shifts to the buffer requirement
System can be further scaled up by forming autonomous clusters
8. Conclusion
Server-less Architecture
Scalable
Reliable
An acceptable redundancy level achieves a reasonable response time within a cluster
Further scale-up by forming new autonomous clusters
Fault tolerance by redundancy
Reliability comparable to a high-end server, as shown by the Markov chain analysis
Cost-Effective
Dedicated server is eliminated
Costs shared by all users
8. Conclusion
Future Work
Distributed Directory Service
Heterogeneous User Nodes
Dynamic System Adaptation
Node joining/leaving
Data re-distribution
End of Presentation
Thank you
Question & Answer Session.