PADRES A Content-based Pub/Sub System


MIDDLEWARE SYSTEMS
RESEARCH GROUP
MSRG.ORG
Distributed Ranked Data
Dissemination in Social Networks
ICDCS, July 9-11th 2013
Kaiwen Zhang
Joint work with:
Mo Sadoghi
Vinod Muthusamy
Hans-Arno Jacobsen
http://www.padres.msrg.utoronto.ca
University of Toronto
Top-k & publish/subscribe for social networks
[Figure: publisher, broker, and subscribers, with advertisement, subscription, and publication paths; the broker matches and forwards. Labels: name = `John Doe’; location = `USA’, `Philadelphia’, `New York’. One subscriber: name = `John Doe’. Another subscriber: location = `America’, k = 1, W = 2, closest to Philadelphia.]
Use cases

Event-heavy applications require top-k

Social networks


News feeds homepage
Location-based applications


Online games
Efficient support for top-k in pub/sub



Top-k publications for a subscription
Mixed subscriptions (top-k and regular)
Topology is provided
Outline





Top-k model for publish/subscribe
Related work
Current late vs. naive early approach
Proposed window chunking solution
Evaluation
Top-k processing
Regular broker operation
Count-based window: parameters supplied by subscription
k is # publications
W is window size
δ is shift size
Each publication is scored and the top-k are extracted
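The count-based window semantics can be sketched in a few lines of Python. This is a minimal illustration, not the PADRES implementation: the function names `sliding_windows` and `top_k`, the sample stream, and the identity scoring function are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")

def sliding_windows(stream: Sequence[T], W: int, delta: int) -> List[List[T]]:
    """Return every full count-based window of size W, shifting by delta."""
    return [list(stream[i:i + W]) for i in range(0, len(stream) - W + 1, delta)]

def top_k(window: Iterable[T], k: int, score: Callable[[T], float]) -> List[T]:
    """Return the k highest-scoring publications of one window."""
    return sorted(window, key=score, reverse=True)[:k]

# Hypothetical stream: each publication is simply its own score.
stream = [5, 1, 4, 2, 3]
results = [top_k(w, k=2, score=lambda p: p)
           for w in sliding_windows(stream, W=4, delta=1)]
print(results)  # [[5, 4], [4, 3]]
```

With W = 4 and δ = 1 the five publications yield two windows, and each window contributes its two best publications.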
Related work

Top-k computation for pub/sub




Defining scoring functions
Data structures for storing top-k results
Approximate solutions based on histograms
Top-k processing in databases


Reverse problem: find existing data for a query
No work on top-k dissemination in pub/sub


Top-k computation occurs within a single broker
Collect the entire stream at the edge
Current late approach
Submit a top-k subscription {k = 2, W = 4, δ = 1}
Publisher streams: (a,b,c,d) and (1,2,3,4)
Rest of the topology is agnostic to the top-k semantics: they simply forward matching events
The entire matching stream is collected at the edge and processed to determine the top-k.
(1,2,a,b,3,c,4,d) => [1,2,a,b][2,a,b,3][a,b,3,c][b,3,c,4][3,c,4,d]
Maintains top-k processing & converts into regular subscription
This approach is not efficient! Low scoring publications are propagated to the edge and then filtered out.
Can we reduce intra-network traffic by pushing the top-k computation upstream?
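The window list on this slide can be reproduced mechanically. The sketch below assumes the merged arrival order (1,2,a,b,3,c,4,d) shown on the slide; the helper name `sliding_windows` is hypothetical.

```python
def sliding_windows(stream, W, delta):
    """Return every full count-based window of size W, shifting by delta."""
    return [stream[i:i + W] for i in range(0, len(stream) - W + 1, delta)]

# Merged stream as it arrives at the subscriber edge broker (from the slide).
merged = ["1", "2", "a", "b", "3", "c", "4", "d"]
windows = sliding_windows(merged, W=4, delta=1)

print("".join("[" + ",".join(w) + "]" for w in windows))
# [1,2,a,b][2,a,b,3][a,b,3,c][b,3,c,4][3,c,4,d]
```

All eight publications cross the network to form these five windows, even though each window keeps only k = 2 of its four publications.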
Naive early approach
Merge top-k streams to obtain final results
Only disseminate top-k publications
(1,2,b,c)
[a,b,c,d] (b,c)
[1,2,3,4]
(1,2)
Fill windows at publisher edge &
compute top-k publications
Correctness criterion

Goal - same result as the late approach


No false positives or negatives
Stream reconstruction criterion


A stream of top-k publications is correct if some reconstructed stream of all publications, possible according to the ordering semantics, can be processed centrally to obtain the same result.
Ordering guarantees?


Consider per-publisher FIFO ordering
Multiple interleavings of publications possible
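To make "multiple interleavings" concrete, the following sketch enumerates every merge of two publisher streams that respects per-publisher FIFO order. The function name `fifo_interleavings` is hypothetical, and the two four-publication streams are borrowed from the running example.

```python
from itertools import combinations

def fifo_interleavings(s1, s2):
    """Yield every merge of s1 and s2 that keeps each stream's own order."""
    n = len(s1) + len(s2)
    # Choose which positions of the merged stream are taken by s1.
    for positions in combinations(range(n), len(s1)):
        slots = set(positions)
        it1, it2 = iter(s1), iter(s2)
        yield [next(it1) if i in slots else next(it2) for i in range(n)]

merges = list(fifo_interleavings(list("abcd"), list("1234")))
print(len(merges))                 # 70
print(list("12ab3c4d") in merges)  # True
```

Two streams of four publications already admit 70 valid orderings, so any of them is a candidate reconstructed stream under this ordering guarantee.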
Naive counter-example
k = 2, W = 4, δ = 1
Fill windows at the publisher edge brokers
Forward local top-k results
Publications are delivered according to per-source ordering
Reconstructing the stream: we fail to consider windows such as [b,c,d,1] which are “overlapping”
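A small numeric sketch shows how a false negative can arise. The scores below are hypothetical, chosen only so that the local top-2 results match the earlier example, (b,c) and (1,2). The comparison is against late processing of one particular FIFO-consistent interleaving, a,b,c,d,1,2,3,4.

```python
def sliding_windows(stream, W, delta):
    """Return every full count-based window of size W, shifting by delta."""
    return [stream[i:i + W] for i in range(0, len(stream) - W + 1, delta)]

def top_k(window, k, score):
    """Return the k highest-scoring publications of one window."""
    return sorted(window, key=score.get, reverse=True)[:k]

# Hypothetical scores (not from the slides).
score = {"a": 1, "b": 9, "c": 8, "d": 5, "1": 4, "2": 3, "3": 2, "4": 1}
k, W, delta = 2, 4, 1

# Naive early approach: each publisher edge forwards only its local top-k.
forwarded = set(top_k(list("abcd"), k, score)) | set(top_k(list("1234"), k, score))

# Late approach applied to the interleaving a,b,c,d,1,2,3,4.
expected = set()
for w in sliding_windows(list("abcd1234"), W, delta):
    expected |= set(top_k(w, k, score))

print(sorted(expected - forwarded))  # ['d']
```

Publication d never wins in its own source window, but it is in the top-2 of overlapping windows such as [c,d,1,2], so filtering it at the publisher edge loses a result.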
Overlapping windows problem

Key idea: Send a few more publications



Enough to prevent false negatives
Less than all publications to be efficient
Key insight: Computing overlapping
windows



Publishers compute windows for own publications
Windows which contain publications from
different source brokers can only be computed
downstream
Need full knowledge of publications in such
windows
Window chunking technique
Each chunk contains publications from a single source broker
Left and right guards are full windows of publications that start and end a chunk
Chunks contain a stream of top-k publications for successive windows
Overlapping windows can only occur in the intrachunk region, for which we have guard windows which can be fully processed downstream
The subscriber edge broker must process one chunk before choosing another chunk to process
Hybrid solution
Send all publications for overlapping windows
Reduce the occurrence of overlapping windows
Early top-k filtering of local source windows
Late top-k filtering of overlapping windows
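One possible reading of the chunk layout is sketched below. It is an assumption-laden illustration, not the PADRES algorithm: the function name `chunk_to_forward`, the rule that chunks shorter than 2W are forwarded whole, and the sample scores are all hypothetical. The guards (first and last W publications) are sent in full, and any other publication is sent only if it is in the top-k of some local window.

```python
def chunk_to_forward(chunk, k, W, delta, score):
    """Return the publications of one single-source chunk that are forwarded."""
    n = len(chunk)
    if n < 2 * W:
        return list(chunk)  # assumption: too short to filter, forward everything
    # Left and right guards: full windows that start and end the chunk.
    keep = set(range(W)) | set(range(n - W, n))
    # Early top-k filtering of the local windows inside the chunk.
    for start in range(0, n - W + 1, delta):
        window = range(start, start + W)
        best = sorted(window, key=lambda i: score(chunk[i]), reverse=True)[:k]
        keep.update(best)
    return [chunk[i] for i in sorted(keep)]

# Hypothetical chunk: each publication is simply its own score.
chunk = [3, 9, 1, 2, 1, 8, 1, 1, 7, 2]
sent = chunk_to_forward(chunk, k=1, W=3, delta=1, score=lambda p: p)
print(sent)  # [3, 9, 1, 2, 8, 1, 7, 2]
```

In this sketch two low-scoring publications in the middle of the chunk are dropped, while the guards reach the subscriber edge broker intact so that windows spanning two chunks can still be computed there.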
Evaluation summary
Setup:
PADRES implementation
SciNet: cluster of ~1000 cores
Main metric: throughput reduction
Normalized to late approach
Sensitivity analysis
Top-k semantics, workload, etc.
Performance analysis
Traces from Twitter and Facebook
Timing sensitivity
Does not scale when mixed
Offset chunks
Publication is filtered
for S1 but part of the guard of S2:
must be forwarded
Future work:
solve the issue by
synchronizing chunks
adaptively
A publication can only be filtered
if it is not part of any guards
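The filtering rule can be written as a simple predicate. In this sketch the function `can_filter`, the dictionary fields `guards` and `top_k`, and the index ranges are hypothetical; they only illustrate how offset chunks of a second subscription can force a publication to be forwarded.

```python
def can_filter(index, subscriptions):
    """Return True only if no subscription needs the publication at `index`.

    Each subscription is a dict with hypothetical fields:
      "guards": list of (start, end) index ranges, end exclusive
      "top_k":  set of indices selected by that subscription's local windows
    """
    for sub in subscriptions:
        in_guard = any(start <= index < end for start, end in sub["guards"])
        if in_guard or index in sub["top_k"]:
            return False
    return True

s1 = {"guards": [(0, 4), (12, 16)], "top_k": {6, 9}}
s2 = {"guards": [(5, 9), (17, 21)], "top_k": {10}}  # chunks offset from S1

print(can_filter(7, [s1]))       # True: filtered when only S1 is considered
print(can_filter(7, [s1, s2]))   # False: index 7 lies in a guard of S2
print(can_filter(11, [s1, s2]))  # True: needed by neither subscription
```

The more the guards of different subscriptions are offset from one another, the fewer indices pass this check, which is why synchronizing chunks is listed as future work.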
Social workload properties
Use of popularity
as scoring function
Social workload properties
Top-k “cuts” the long tail
of Facebook popularity function:
unpopular publications are filtered
Twitter has a wider tail:
a wider variety of publications
are found in top-k's
Offset chunks are present:
windows are filled at different times
Conclusions
Top-k support for publish/subscribe
For event-heavy applications (social networks)
Efficient top-k distribution
Reduce intra-network traffic
Maintain correctness
Proposed hybrid chunking solution
Early top-k computation of local windows
Late top-k computation of overlapping windows
Evaluation observations
Need for chunk synchronization
Topic popularity in social networks beneficial
Thank you! Questions?
padres.msrg.org
Scoring function sensitivity
Uniform distribution: does not scale as all publications are selected by at least one subscriber
Zipfian distribution: traffic reduction even at 1000 subscriptions
Same top-k for every subscription: maximize pruning
Impact of deduplication
Non-buffering solution even worse!
Deduplication is essential
Constant traffic reduction (Best-case scalability)
Latency comparison
Similar latency:
Computation overhead
is not considered