Gigascope: a stream database for network monitoring

Heartbeat Mechanism and its Applications in Gigascope
Vladislav Shkapenyuk (speaker),
Muthu S. Muthukrishnan
Rutgers University
Theodore Johnson
Oliver Spatscheck
AT&T Labs – Research
Unblocking streaming operators
• Data stream management systems (DSMS) work with infinite streams of tuples
• How do we get answers out of joins, aggregations, etc. before the end of time?
– limit the scope of output tuples that an input tuple can affect
• Two approaches
– define a window over the input streams of blocking operators (STREAM, TelegraphCQ)
– use pipelined operators that exploit an existing sort order (Gigascope, Tribeca)
• Most queries reference timestamps
Unblocking streaming operators
• Some stream attributes are labeled with temporal properties (e.g., monotone increasing)
• In an aggregation query, at least one grouping attribute must have a temporal property (timestampness):
SELECT tb, srcIP, count(*) FROM TCP
GROUP BY time/60 as tb, srcIP
tb is inferred to be monotone increasing as well
• Similarly, stream merge (union) and join need a set of attributes that have temporal properties
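The inference that tb is monotone increasing can be illustrated with a small check. This is a sketch under two assumptions not stated on the slide: "monotone increasing" is read as non-decreasing, and time/60 is integer division. A monotone function of a monotone attribute keeps the property; a non-monotone one such as time % 60 (shown only for contrast) does not.

```python
def is_nondecreasing(values):
    """True if every element is >= its predecessor."""
    return all(a <= b for a, b in zip(values, values[1:]))


def time_bucket(t, width=60):
    """GROUP BY time/60 as tb, assuming integer division."""
    return t // width


times = [5, 59, 60, 61, 119, 120, 300]     # monotone (non-decreasing) input
tbs = [time_bucket(t) for t in times]      # [0, 0, 1, 1, 1, 2, 5]
wrapped = [t % 60 for t in times]          # not monotone: wraps every minute

print(is_nondecreasing(tbs))       # True: tb inherits the ordering
print(is_nondecreasing(wrapped))   # False: time % 60 does not
```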
What if a data stream stalls?
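• Consider a query that merges multiple streams
• The presence of tuples carries temporal information; their absence doesn't
– if one input (e.g., the backup link) stalls, the merge cannot advance and must buffer the other inputs, leading to memory overflow at the merge
• Similar issues arise with every operator that has multiple input streams (e.g., joins)
[Figure: query plan with a High-level Aggregation over a Stream Merge of two Low-level Aggregations, one on the main link and one on the backup link]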
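The memory problem at the merge can be made concrete by counting what an order-preserving merge must hold back. In this sketch (the function name and inputs are hypothetical), the merge may only release tuples whose timestamp is at most the minimum, over all inputs, of the last timestamp seen on that input; a stalled input therefore pins that bound, and everything newer from the other inputs accumulates.

```python
def merge_backlog(inputs):
    """Number of tuples an order-preserving merge must keep buffered.

    inputs: one list of timestamps per input stream, each non-decreasing.
    Only tuples with ts <= min(last timestamp seen on each input) are safe
    to release; all newer tuples stay in memory.
    """
    if any(not ts for ts in inputs):
        # An input that has produced nothing blocks the merge entirely.
        return sum(len(ts) for ts in inputs)
    bound = min(ts[-1] for ts in inputs)
    return sum(1 for ts in inputs for t in ts if t > bound)


backup = [0]                          # backup link: one tuple, then stalls
main_1k = list(range(1, 1001))        # main link keeps producing
main_2k = list(range(1, 2001))

print(merge_backlog([main_1k, backup]))   # 1000 tuples held
print(merge_backlog([main_2k, backup]))   # 2000: grows with the main stream
```

The backlog grows without bound for as long as the backup input stays silent, even though nothing is wrong with the data itself.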
Stream Punctuations
• Unblock operators by embedding special marks in the stream
– the marks indicate the end of a subset of the data
• A stalled stream can notify its parent about the end of an epoch
Many open issues
– How can these punctuations be generated and propagated?
– How do we integrate such a mechanism into a high-performance DSMS?
Gigascope Architecture
• DSMS designed for monitoring
high-rate data streams
– pure stream database (no stored
relations or continuous queries)
– pipelined operators that rely on
temporal properties of the stream
• Two-layer architecture for early data reduction
– fast, lightweight data reduction queries (LFTAs)
– high-level queries for expensive processing (HFTAs)
[Figure: Gigascope architecture showing the NIC, low-level queries, a ring buffer, high-level queries, and the application]
Pipelined Operators
• Aggregation:
SELECT tb, srcIP, count(*) FROM TCP
GROUP BY time/60 as tb, srcIP
• The merge operator performs a union of two streams R and S in a way that preserves timestamps:
MERGE R.tb : S.tb
FROM Inpackets R, Outpackets S
• A join query on streams R and S must contain a join predicate such as R.tb = S.tb:
SELECT R.sourceIP, R.tb, R.length_sum + S.length_sum
OUTER_JOIN from Inpackets R, Outpackets S
where R.sourceIP = S.destIP and R.tb = S.tb
Gigascope heartbeats
• Initially designed to collect statistics about operator load
• Special messages propagated using the regular tuple routing mechanism
– performance monitoring
– failure detection
[Figure: heartbeat propagation between low-level and high-level operators]
Unblocking operators using heartbeats
• Stream punctuation mechanism
– injects special temporal update tuples into an operator's output stream
– notifies the receiving operator about the end of a subset of the data (the end of the time window on which aggregations, stream merges, and joins operate)
• Heartbeats are ideal vehicles for carrying temporal update tuples
– they propagate regularly through the operator DAG
– they unblock every operator along the way in a timely manner
Temporal update tuples
• Temporal update tuples generated by an operator have the same schema as its regular tuples
– only the values of temporal attributes are initialized (the rest are ignored)
– future tuples are guaranteed not to violate the temporal properties of the stream
Operator output schema:
(Timebucket, SrcIP, DestIP, PacketCount)
Timebucket is monotone increasing
Temporal tuple:
(T, Uninitialized, Uninitialized, Uninitialized)
– guarantees that all future tuples will have Timebucket >= T
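The temporal tuple above can be modeled as an ordinary tuple of the same schema in which only the temporal attribute is set. In this sketch, None stands for "uninitialized" and the is_temporal flag is a hypothetical marker; the slides do not say how Gigascope distinguishes the two kinds of tuples internally. The respects check expresses the promise Timebucket >= T.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowTuple:
    """Output schema from the slide: (Timebucket, SrcIP, DestIP, PacketCount)."""
    timebucket: int
    src_ip: Optional[str]
    dest_ip: Optional[str]
    packet_count: Optional[int]
    is_temporal: bool = False   # hypothetical flag for temporal update tuples


def temporal_update(t):
    """Same schema, only the temporal attribute initialized (None elsewhere)."""
    return FlowTuple(timebucket=t, src_ip=None, dest_ip=None,
                     packet_count=None, is_temporal=True)


def respects(promise, later):
    """A later tuple honors the promise if its Timebucket >= T."""
    return later.timebucket >= promise.timebucket


hb = temporal_update(17)
ok = FlowTuple(17, "10.0.0.1", "10.0.0.2", 3)
late = FlowTuple(16, "10.0.0.1", "10.0.0.2", 1)
print(respects(hb, ok), respects(hb, late))   # True False
```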
Heartbeat generation
• Naïve solution
– operators emit the last produced tuple cast as a temporal tuple
– too conservative to be useful: such heartbeats carry no additional information
• Goal: generate the values of temporal attributes aggressively
– set attributes to the maximum values we can safely guarantee
Heartbeat generation
• Two approaches
– infer the values of temporal update tuples from the tuples the operator has received so far
– infer them from the system time
• Inference based on received tuples
– works when operators observe some tuples, even if those tuples are filtered out by selection predicates
– works at every level of query execution
• Inference based on the system clock
– works even with completely stalled streams
– applies only to time-based temporal attributes
– potentially dangerous (clock skew, propagation delays)
Inferring temporal attributes
• Every operator maintains the state required to correctly generate temporal update tuples
– the last seen values of all temporal attributes referenced in the SELECT clause
– operator-specific state
• Attribute values of temporal tuples are computed using inference rules
SELECT tb, srcIP, count(*) FROM TCP
GROUP BY time/60 as tb, srcIP
If the last seen value of time is X, infer that the value of tb in the temporal update tuple should be X/60
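The inference rule, and why it beats the naïve solution, can be sketched for this aggregation query. The class and method names are hypothetical, and X/60 is assumed to be integer division. Because time is monotone, every future input has time >= X, so every future output has tb >= X/60; the naïve rule can only repeat the tb of the last tuple actually produced, which may lag far behind.

```python
class GroupByTimeOperator:
    """Heartbeat state for: SELECT tb, srcIP, count(*) FROM TCP
    GROUP BY time/60 as tb, srcIP   (sketch; names are hypothetical)."""

    def __init__(self, width=60):
        self.width = width
        self.last_seen_time = None    # last value of `time` on the input
        self.last_emitted_tb = None   # tb of the last tuple actually output

    def observe(self, time):
        self.last_seen_time = time

    def emitted(self, tb):
        self.last_emitted_tb = tb

    def naive_heartbeat_tb(self):
        # Naïve solution: recast the last produced tuple as a temporal tuple.
        return self.last_emitted_tb

    def inferred_heartbeat_tb(self):
        # Inference rule: last seen time X  ->  tb = X/60.
        if self.last_seen_time is None:
            return None
        return self.last_seen_time // self.width


op = GroupByTimeOperator()
op.emitted(2)       # last output group belonged to tb = 2
op.observe(605)     # input time has since advanced to 605
print(op.naive_heartbeat_tb(), op.inferred_heartbeat_tb())   # 2 10
```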
Inferring temporal attributes
• What if the stream is completely stalled?
– the operator cannot advance the values of temporal attributes
• Inference based on system time
– works if the temporal attribute can be correlated with the system clock (usually the case in network streams)
– unsafe for high-level operators (requires reasoning about propagation delays)
– requires care with clock skew
• Gigascope uses skew information entered by the administrator to infer the values of temporal attributes
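A clock-based inference for a low-level operator might look like the following sketch. The function and its max_skew_sec parameter are hypothetical, and the code assumes that the stream's time attribute is in seconds on the same scale as the system clock and that no packet arrives with a timestamp older than now minus the administrator-supplied skew bound. A larger skew bound yields a more conservative (smaller) tb.

```python
import time


def clock_based_tb(now=None, max_skew_sec=0.0, width=60):
    """Infer a safe tb from the system clock (sketch, low-level operators only).

    Assumes stream `time` uses the system clock's scale (seconds) and that
    no tuple arrives with time < now - max_skew_sec.
    """
    if now is None:
        now = time.time()
    safe_time = now - max_skew_sec
    return int(safe_time // width)


print(clock_based_tb(now=1000.0))                    # 16
print(clock_based_tb(now=1000.0, max_skew_sec=50))   # 15: skew makes it conservative
```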
Selection & merge operators
• Selection operator (filtering):
– saves the last seen values of temporal attributes regardless of whether the tuple passes the selection predicate
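The key detail for selection is the order of operations: the temporal state is updated before the predicate is evaluated, so a filter that drops most tuples can still advance its heartbeats. A minimal sketch with a hypothetical tuple layout (dicts with a "time" field):

```python
class Selection:
    """Filter that keeps tracking time for heartbeats (sketch)."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.last_seen_time = None

    def process(self, tup):
        # Record the temporal attribute BEFORE testing the predicate.
        self.last_seen_time = tup["time"]
        return [tup] if self.predicate(tup) else []

    def heartbeat(self):
        # Only the temporal attribute is initialized.
        return {"time": self.last_seen_time}


sel = Selection(lambda t: t["port"] == 80)
out = []
for tup in [{"time": 1, "port": 80}, {"time": 5, "port": 22},
            {"time": 9, "port": 443}]:
    out += sel.process(tup)
print(out, sel.heartbeat())   # one tuple passes, but the heartbeat reaches time 9
```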
• Merge (stream union):
– combines multiple streams while preserving their ordering properties
• Requires buffering of input streams
– maintains the maximum timestamp value observed on each input
• S1_max, S2_max, …, Sn_max
– uses MIN(S1_max, S2_max, …, Sn_max) to generate the temporal update tuple
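The merge rule can be sketched as follows (class and method names are hypothetical). Each input's maximum is advanced by both regular tuples and incoming temporal update tuples; buffered tuples up to MIN(S1_max, …, Sn_max) are released in timestamp order, and the same MIN is the value of the merge's own temporal update tuple. Here a heartbeat from a stalled backup input releases everything the main input has buffered.

```python
from collections import deque


class HeartbeatMerge:
    """Order-preserving union driven by per-input maxima (sketch)."""

    def __init__(self, n_inputs):
        self.buffers = [deque() for _ in range(n_inputs)]
        self.max_ts = [None] * n_inputs          # S1_max ... Sn_max

    def on_tuple(self, i, ts):
        self.buffers[i].append(ts)
        self.max_ts[i] = ts

    def on_heartbeat(self, i, ts):
        # A temporal update tuple advances Si_max without adding data.
        self.max_ts[i] = ts

    def safe_bound(self):
        # MIN(S1_max, ..., Sn_max); also the merge's outgoing temporal value.
        if any(m is None for m in self.max_ts):
            return None
        return min(self.max_ts)

    def drain(self):
        bound = self.safe_bound()
        out = []
        if bound is None:
            return out
        while True:
            heads = [(buf[0], i) for i, buf in enumerate(self.buffers)
                     if buf and buf[0] <= bound]
            if not heads:
                return out
            _, i = min(heads)                    # smallest head keeps order
            out.append(self.buffers[i].popleft())


m = HeartbeatMerge(2)
m.on_tuple(1, 0)                 # backup input: one tuple, then it stalls
for ts in range(1, 1001):
    m.on_tuple(0, ts)            # main input keeps producing
m.on_heartbeat(1, 1000)          # heartbeat: backup promises ts >= 1000
released = m.drain()
print(len(released), m.safe_bound())   # 1001 1000
```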
Aggregation & sampling operator
• Maintains a hash table of aggregates for the current time window
– when the time window advances, the table contents are flushed
– uses traffic shaping (slow flush) to avoid emitting excessive amounts of data at once
• Slow flush can lead to incorrect generation of temporal tuples
– if there are unflushed tuples in the hash table, generate temporal tuples based on the unflushed tuples
– otherwise, use the last seen values saved by the operator
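Why slow flush matters for heartbeats can be seen in a small sketch of count(*) GROUP BY time/60 as tb, srcIP. The flush_batch parameter (groups released per incoming tuple) is a hypothetical stand-in for Gigascope's traffic shaping. Right after the window advances, the last seen tb is already 1, yet groups with tb = 0 are still waiting to be emitted; the heartbeat must therefore use the smallest tb among unflushed groups.

```python
class SlowFlushAggregator:
    """count(*) GROUP BY time/60 as tb, srcIP with a slow flush (sketch).

    flush_batch is hypothetical: max closed groups emitted per input tuple.
    """

    def __init__(self, width=60, flush_batch=1):
        self.width = width
        self.flush_batch = flush_batch
        self.current = {}        # (tb, srcIP) -> count for the open window
        self.pending = []        # closed-window groups not yet emitted
        self.last_seen_tb = None

    def process(self, time, src_ip):
        tb = time // self.width
        if self.last_seen_tb is not None and tb > self.last_seen_tb:
            # Window advanced: queue old groups for gradual flushing.
            self.pending.extend(sorted(self.current.items()))
            self.current = {}
        self.last_seen_tb = tb
        key = (tb, src_ip)
        self.current[key] = self.current.get(key, 0) + 1
        out = self.pending[:self.flush_batch]
        self.pending = self.pending[self.flush_batch:]
        return out

    def heartbeat_tb(self):
        if self.pending:
            # Unflushed groups will still be emitted; they bound the promise.
            return min(tb for (tb, _), _ in self.pending)
        return self.last_seen_tb


agg = SlowFlushAggregator(flush_batch=1)
for t, ip in [(0, "a"), (10, "b"), (20, "c")]:
    agg.process(t, ip)
first = agg.process(65, "a")      # window advances; only one group flushed
print(first, agg.heartbeat_tb())  # [((0, 'a'), 1)] 0
```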
Join operators
• A stream join between R and S relates a timestamp in R to a timestamp in S (e.g., R.ts = S.ts)
– critical for guaranteeing bounded memory
– supports inner, left, right, and full outer equi-joins
• Maintains the maximum values of timestamps observed on each stream (Rmax and Smax)
– Rmax and Smax can be composite structures storing the max values of all attributes that are part of the timestamp
• Infers the attribute values of temporal update tuples from MIN(Rmax, Smax)
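For a join with predicate R.tb = S.tb, the heartbeat state reduces to two running maxima. This sketch assumes a single scalar timestamp attribute (the composite case is not modeled) and uses hypothetical names. Since both inputs are monotone, no future output can carry a tb below MIN(Rmax, Smax).

```python
class JoinHeartbeatState:
    """Tracks Rmax and Smax for a join on R.tb = S.tb (scalar tb assumed)."""

    def __init__(self):
        self.r_max = None
        self.s_max = None

    def see_r(self, tb):
        self.r_max = tb if self.r_max is None else max(self.r_max, tb)

    def see_s(self, tb):
        self.s_max = tb if self.s_max is None else max(self.s_max, tb)

    def heartbeat_tb(self):
        # Value of tb in the join's temporal update tuple: MIN(Rmax, Smax).
        if self.r_max is None or self.s_max is None:
            return None
        return min(self.r_max, self.s_max)


j = JoinHeartbeatState()
j.see_r(5)
j.see_s(3)
print(j.heartbeat_tb())   # 3: S is the slower input
j.see_s(7)
print(j.heartbeat_tb())   # 5: now R lags behind
```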
Experimental Evaluation
• Two main data feeds
– DAG4.3GE Gigabit Ethernet interfaces
– 100,000 packets/sec (about 400 Mbit/sec)
• One low-rate control data feed
– 100 Mbit interface
– a good representative of a backup interface
• Dual 2.8 GHz P4 server with 4 GB of RAM, FreeBSD 4.8
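As a sanity check on the feed figures, and assuming the packet rate and the bit rate describe the same feed, the two numbers imply an average packet size of roughly 500 bytes:

```latex
\bar{s} \approx \frac{400 \times 10^{6}\ \text{bit/s}}{10^{5}\ \text{packets/s}}
        = 4000\ \text{bit/packet}
        = 500\ \text{bytes/packet}
```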
Merge Query
SELECT tb, protocol, srcIP, destIP, srcPort,
destPort, count(*)
FROM DataProtocol
GROUP BY time/10 as tb, protocol, srcIP,
destIP, srcPort, destPort
[Figure: query plan with a High-level Aggregation at the root, two Stream Merge operators, and Low-level Aggregations on the control, main1, and main2 feeds]
Performance Evaluation
[Figure: Query memory usage (MB, 0 to 500) vs. heartbeat interval (sec, 0 to 35)]
Outer Join Query
Query flow1:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM [main0_and_control].DataProtocol
GROUP BY time/10 as tb,protocol,srcIP,destIP,srcPort,destPort;
Query flow2:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM main1.DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort;
Query full_flow:
SELECT flow1.tb, flow1.protocol, flow1.srcIP, flow1.destIP, flow1.srcPort,
flow1.destPort, flow1.cnt, flow2.cnt
OUTER_JOIN FROM flow1, flow2
WHERE flow1.srcIP=flow2.srcIP and flow1.destIP=flow2.destIP and
flow1.srcPort=flow2.srcPort and flow1.destPort=flow2.destPort and
flow1.protocol=flow2.protocol and
flow1.tb = flow2.tb
Outer Join Query
[Figure: query plan with an Outer Join over two High-level Aggregations, a Stream Merge, and Low-level Aggregations on the backup, main1, and main2 feeds]
Performance Evaluation
[Figure: Query memory usage (MB, 0 to 600) vs. heartbeat interval (sec, 0 to 70)]
• CPU load with heartbeats enabled: 37.5%
• CPU load with heartbeats disabled: 37.3%
Other heartbeat applications
• Fault tolerance
– Heartbeats regularly propagate through query DAGs
– Failed nodes are easy to detect
• System performance analysis
– Every heartbeat message is timestamped by the receiving node
– Timestamp traces are well suited to analyzing queuing delays
• Distributed query optimization
– Every heartbeat message carries runtime statistics (operator selectivities, sampling rates, in/out rates, memory footprint, etc.)
– Collected statistics can be fed to a distributed query optimizer
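The first two applications can be sketched with a few lines of bookkeeping. Everything here is hypothetical: the node names, the timeout value, and the assumption that all receive timestamps come from a common (or synchronized) clock, without which the per-hop differences would mix clock offset into the delay. A node is flagged as failed when its most recent heartbeat is older than the timeout; per-hop queuing delay is the difference between consecutive receive timestamps of one heartbeat.

```python
def failed_nodes(last_heartbeat, now, timeout):
    """Nodes whose most recent heartbeat is older than `timeout` seconds."""
    return sorted(node for node, t in last_heartbeat.items()
                  if now - t > timeout)


def queueing_delays(trace):
    """Per-hop delays from one heartbeat's receive timestamps.

    trace: list of (node, receive_time) pairs in DAG order; assumes all
    timestamps come from a common clock.
    """
    return [(b_node, b_t - a_t)
            for (_, a_t), (b_node, b_t) in zip(trace, trace[1:])]


last = {"lfta_main1": 100.0, "lfta_main2": 101.0, "hfta_merge": 92.0}
print(failed_nodes(last, now=102.0, timeout=5.0))   # ['hfta_merge']

trace = [("lfta_main1", 100.0), ("hfta_merge", 100.25), ("hfta_agg", 101.0)]
print(queueing_delays(trace))   # [('hfta_merge', 0.25), ('hfta_agg', 0.75)]
```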
Conclusions
• Punctuation-carrying heartbeats
– effective at unblocking streaming operators at all levels
– significantly reduce query memory usage
– capable of operating at multi-Gigabit line speeds
• A variety of other uses
– fault tolerance, performance analysis, distributed query optimization
• Part of the production version of Gigascope