Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication

Maarten Wiggers, PhD student, University of Twente, NL
Co-author and supervisor: Marco Bekooij, NXP Semiconductors Research
Gerard Smit, University of Twente

Outline
Context
– Streaming applications
– Programming multiprocessor architectures
Problem
– Problem statement
– Related work
Variable Rate Dataflow
– Chain topology
– Arbitrary graph topology
Experiment
Conclusion
[Wiggers – DATE 2008, Wiggers – RTAS 2008]

Maarten Wiggers -- University of Twente

Multi-stream car-entertainment system

Application model
Jobs process streams of data
Jobs are composed of tasks
Simultaneously running jobs together form use-cases
Jobs often have real-time requirements
– Firm (FRT) if deadline misses are highly undesirable (steep quality degradation)
[Figure: use-case with an FRT video job and an FRT audio job; tasks connect input data streams to an output stream to display and an output stream to speakers]

Task graphs
Jobs are implemented as task graphs
– Tasks communicate fixed-sized containers over fixed-sized FIFO buffers
Container is a place-holder for data
– Task has random access in container
Task only starts an execution on sufficient
– Full containers in input buffers
– Empty containers in output buffers (back-pressure)
  • Back-pressure robustly prevents buffer overflow
Required quanta of containers can be
– Known at design-time
– Dependent on the actual processed stream

Example job – MP3 playback
[Figure: MP3 playback task graph, annotated n=[0,960]]
MP3 decoding task consumes a variable number of bytes per frame
– Every
execution a different number of bytes consumed
– BR task executes aperiodically
– No static-order schedule for BR and MP3, hence run-time arbitration
Throughput constraint: sink needs to execute strictly periodically
– All tasks are pushing data towards the sink
– For sufficiently large buffers, sink can execute strictly periodically

Example job – H.263 video decoder
[Figure: H.263 decoder task graph, annotated m=[0,6536] and n=[0,2376]]
Variable length decoder (VLD) consumes a variable number of bytes per frame
VLD produces a variable number of blocks per frame
DQ and IDCT process blocks
Motion compensator assembles a frame from blocks
Throughput constraint: sink needs to execute strictly periodically

Application trend
Behaviour of applications is increasingly input-data dependent, e.g.
– Entropy encoding
– Adaptation to channel conditions by digital radios
Reflected in
– Input-data dependent execution times
– Conditional execution of code
– Mode changes
– Input-data dependent execution rates
Input-data dependent execution rates require run-time arbitration

Trend challenge
Required properties
– Functionally deterministic behaviour: output values completely determined by input values
– Deadlock free
– Throughput constraint satisfied
Research challenge is to define models
– For which the required properties are decidable
– That can model applications with input-data dependent behaviour
– That include effects of run-time arbitration
– E.g.
Variable-Rate Dataflow

Multi-processor architecture template
Multi-processor system required for performance and power reasons
[Figure: architecture template; blocks labelled P, DSP, $, External SDRAM ctrl, Arb, mem, I/O, CA and NI, connected by a Network-on-Chip]
[Hansson – TODAES 2008]

Compute settings
[Figure: dataflow synthesis takes a (cyclic) task graph, WCET, throughput and latency constraint, and a multiprocessor instance, and yields scheduler settings and buffer capacities]
Guarantees on end-to-end throughput require guarantees on deadlock-freedom
Models that provide end-to-end throughput guarantees are not Turing complete
– Poses restrictions on
  • Applications: e.g. inter-task synchronisation behaviour
  • Architectures: e.g. applicable run-time arbitration schemes
Goal: define a model that can guarantee throughput for H.263

Example
Every execution, task B can choose to consume either 2 or 3
Required buffer capacity for deadlock freedom?

Example (cont.)
Attempt: assume maximum consumption quantum in every execution
Requires buffer capacity of 3 for deadlock freedom

Example (cont.)
However, when consuming the minimum quantum
Buffer capacity of 3 is insufficient!
Deadlock!
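The deadlock in this example can be reproduced with a small untimed simulation. The sketch below is illustrative and not taken from the slides: it assumes that task A produces 3 containers per execution (the transcript only states that B consumes 2 or 3), and the function name `deadlocks` is hypothetical. Back-pressure is modelled by letting A fire only when enough empty containers are available.

```python
def deadlocks(capacity, produce, consume_seq):
    """Return True if producer A and consumer B deadlock on one FIFO buffer.

    capacity    -- number of containers in the buffer
    produce     -- containers A fills per execution (assumed fixed)
    consume_seq -- quanta that B consumes in its successive executions
    """
    tokens = 0                      # full containers currently in the buffer
    pending = list(consume_seq)     # B's remaining executions
    while pending:
        need = pending[0]
        if tokens >= need:
            # B is enabled: enough full containers on its input.
            tokens -= need
            pending.pop(0)
        elif capacity - tokens >= produce:
            # A is enabled: enough empty containers (back-pressure).
            tokens += produce
        else:
            # Neither task can start an execution.
            return True
    return False


if __name__ == "__main__":
    # B always consumes the maximum quantum: capacity 3 suffices.
    print(deadlocks(3, 3, [3] * 10))
    # B always consumes the minimum quantum: capacity 3 deadlocks.
    print(deadlocks(3, 3, [2] * 10))
```

Under the assumed production quantum, the second call deadlocks after B's first execution: one full container remains, so B lacks data while A lacks the three empty containers it needs.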

Problem
Compute buffer capacities
– Guarantee satisfaction of throughput constraint
– Tasks can require data-dependent quantum of data and space per execution
Assumptions
– Run-time arbitration on shared resources
– Upper and lower bounds on transferred quanta
– Upper bound on execution time
– Throughput constraint: sink or source that executes strictly periodically

Related work
Quasi static-order scheduling
– Transfer quanta change only after (sub)graph iterations
– For every iteration a static-order schedule is computed
  • Bounded memory is decidable
– Models are amenable to code synthesis
– Examples
  • Heterochronous Dataflow [Girault – TCAD 1999]
  • Parameterised Dataflow [Bhattacharya – TSP 2001]
– Requirement on changes only after graph iterations is a global requirement
  • Iteration is a graph property
  • VLD parses stream and decides next quantum locally
– Static-order scheduling excludes overlapped schedules of graphs with different transfer quanta

Requirements on quanta change
Quasi static-order scheduling: 2*A and 3*B before change
Variable-Rate Dataflow: can change every firing

Related work
Variable token sizes instead of
variable number of transferred tokens
– [Sen – ASSP 2005]
– Experiment will show that this results in larger buffers
– Variable consumption quantum by VLD depends on processed stream
  • BR task is unaware of the semantics of the stream, so it cannot know the quantum

Related work
Run-time arbitration
– Not required to compute schedules at design-time
– Only need to show that for all transfer quanta a schedule exists
– State-of-the-art
  • Real-time calculus (group of Thiele at ETH Zurich)
  • SymTA/S (group of Ernst at TU Braunschweig)
– These approaches have
  • Difficulties with cyclic dependencies that influence the temporal behaviour
  • No means to reason about bounded memory or deadlock properties
– E.g.
no concept similar to consistency

Phase 1 and 2
Next slides discuss buffer capacity computation in case of chain topology
Subsequent slides discuss extension to graphs

Variable Rate Dataflow (by example)
Implementation = Task graph
Model = Dataflow graph

Variable Rate Dataflow
Task graph (input specification)
– Tasks
– Buffers
Tasks
– Have a bounded response time
– Consume and produce data between start and finish
Buffers have a finite and fixed capacity
Dataflow graph (analysis vehicle)
– Actors
– Queues
Actors
– Have a fixed response time
– Consume tokens atomically at the start
– Produce tokens atomically at the finish
Queues have infinite depth

Execution time and response time
Time-slice $T_x$, period $T$
$\mathrm{WCRT}(w_x) = \mathrm{WCET}(w_x) + \left\lceil \mathrm{WCET}(w_x) / T_x \right\rceil (T - T_x)$
Explained in detail in [Wiggers – RTAS 2007]
Generalisation that includes all starvation-free schedulers in [Wiggers – SCOPES 2007]
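The step from execution time to response time on these slides can be sketched in code. The block below is a minimal illustration that assumes the garbled formula is the usual time-division bound, WCRT = WCET + ceil(WCET / T_x) * (T - T_x), with T_x the time-slice and T the period. That reading, the function name `wcrt_time_slice` and the example numbers are assumptions, not content from the slides.

```python
def wcrt_time_slice(wcet, time_slice, period):
    """Worst-case response time of a task under a time-slice scheduler.

    Assumed bound: every started slice of execution can be preceded by
    (period - time_slice) time units during which other tasks run.
    All arguments are integers in the same time unit.
    """
    if wcet < 0:
        raise ValueError("wcet must be non-negative")
    if not 0 < time_slice <= period:
        raise ValueError("require 0 < time_slice <= period")
    slices_needed = -(-wcet // time_slice)   # integer ceiling division
    return wcet + slices_needed * (period - time_slice)


if __name__ == "__main__":
    # Hypothetical numbers: 5 units of work, slice of 2 in a period of 10.
    print(wcrt_time_slice(5, 2, 10))
```

In the dataflow model this value would serve as the fixed response time of the actor that models the task, which keeps the analysis conservative as long as the bound holds for the scheduler in use.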

Approach
Model task graph on architecture by Variable-Rate Dataflow graph
Let actor vτ model the throughput constraining task
Compute sufficient number of tokens to enable actor vτ to execute strictly periodically
Computed number of tokens equals required buffer capacity
– One-to-one correspondence
  • Containers in task graph – tokens in dataflow graph
  • Enabling condition task – firing rule actor
  • Containers consumed and produced – tokens consumed and produced
– Execution times of actors are upper bounds on execution times of tasks
– Self-timed execution of Variable-Rate Dataflow is temporally monotonic

Monotonic temporal behaviour
VRDF actors have sequential firing rules [Lee – 1995]
– The number of tokens that is required to be present on inputs is completely determined by already consumed tokens
VRDF actors are functional
– The produced tokens are a function of the consumed tokens
Given self-timed execution, if a token arrives earlier on an input, then
– This can only lead to an earlier satisfaction of the firing rule, and
– This can only lead to an earlier production of the same tokens
E.g. a smaller response time of a VRDF actor cannot lead to any later token arrival time
Because of scheduling anomalies this is not true for the task graph!
– A smaller response time can lead to later container arrival times
Token arrival times conservatively bound container arrival times

Approach – computation of sufficient
tokens
Find valuation of token transfer parameters that leads to maximum required token transfer rates
On each edge, take maximum required rate as the slope of
– A linear upper bound on token production times, and
– A linear lower bound on token consumption times
Derive offset of linear bounds such that for all sequences of transfer quanta there exists a schedule for which bounds are conservative
– Offset is relative to start of first firing of actor
Use linear bounds to compute sufficient number of initial tokens
This number of tokens is also sufficient for smaller transfer rates

Approach – step 1
Determine on each edge the maximum required transfer and firing rates
Sink has to fire strictly periodically
Maximum required transfer rate on edge for
– Maximum consumption quantum
Maximum required firing rate of A for
– Minimum production quantum

Approach – step 2
Given linear bounds on production and consumption times
Find difference between bounds that allows existence of schedule for all sequences of quanta
Actor starts at t=0
Consumes tokens at start
Produces tokens at finish
Finish − start = response time
Larger quantum implies larger difference between bounds
Larger quantum implies larger delay of next start time
If largest quantum is between bounds, then every sequence is between bounds

Approach – step 3
Difference between linear bounds is buffer capacity
Buffer capacity is maximum difference between tokens consumed and produced

Approach – step 4
Buffer capacities are sufficient for smaller rates
Smaller rate by A implies delay in schedule of A
VRDF graphs have linear temporal behaviour
– A delay Δ in production time cannot lead to a production that is delayed by more than Δ

Chains of buffers
Find the maximum firing rates for all actors
Compute buffer capacities for these rates
If MP3 consumes less, then starts of BR are postponed
By linearity data will still arrive on time at MP3
Computed buffer capacities verified in our dataflow simulator

Relaxing constraints on topology
Graph definition
– Consistency of task graph
– Consistency is not sufficient for bounded memory
Computation of buffer capacities is now a global problem

Parameter communication
Communication of parameter values
Enables modelling of conditional execution of tasks
Sequential firing rules

if-then-else
Buffer capacities computed for all combinations of sequences of t and f
t=!f (mutual exclusivity) is just a subset
Model abstracts from actual relations between parameters

Consistency
Transfer quanta on edges determine relative firing rates [Lee – TC 1987] [Lee –
TPDS 1991]
Multiple paths between two actors
– Requires check whether there exist firing rates with bounded memory
Fixed transfer quanta cannot model data-dependent behaviour
Allowing for different transfer quantum in every firing
Specification of intervals is insufficient
Therefore introduce transfer parameters
Variable-Rate Dataflow graph is (strongly) consistent if there exists a non-trivial symbolic solution to the symbolic balance equations

Consistency is insufficient
Boolean dataflow graph
Bounded memory depends on control values
Bounded memory can be undecidable [Buck – 1993]

Chosen restriction
In the VRDF graph we require that repetition rate of actors in this subgraph is one
Every parameter value should correspond with an iteration of this sub-graph
This restriction implies that (strong) consistency is sufficient for bounded memory

Buffer capacities
Requirement
– Sink determines throughput for all transfer quanta
Tasks are pushing data to sink
– Different quanta imply different task execution rates
– Tasks always need to be able to follow
Buffer capacity
– Should enable tasks to follow maximum required rate
– Variation in quanta requires larger buffers

Buffer capacity (I) to (IV)
(III) Minimise difference in start times
(IV) Required buffer capacity

General topology
Minimum difference between start times of actors
– Not a property of an edge
– Determined by all paths
[Figure: actors A, B and C annotated with β=1 and start times s=0, s=1 and s=2]
Buffer capacity with β=1: required buffer capacity
Buffer capacity with β=2: required buffer capacity
Network flow problem
– Constraints
  • Minimum differences per edge
– Objective
  • Start times as close together as possible

H.263 decoder
m is number of bytes read per picture
n is number of blocks per picture
Motion compensation needs to know how many blocks to read to assemble a picture

Alternative implementation

Buffer capacity
Our implementation
– Buffer capacity is in blocks
Alternative implementation
– Buffer capacity is in frames

Conclusion
Trend: streaming applications are increasingly dynamic
– Include tasks that have data-dependent execution rates
– Implies run-time arbitration
Variable Rate Dataflow
– Production and consumption quanta can change in every execution
– Can include effects of run-time arbitration
– Efficient checks on execution in bounded memory
Compute buffer capacities that guarantee satisfaction of a throughput constraint
– Temporal monotonicity: token arrival times are conservative bounds on container arrival times
– Temporal linearity: a token arrival that is Δ later cannot result in any token arrival time that is delayed by more than Δ

Questions?
[email protected]

References
[Bhattacharya – TSP 2001] B. Bhattacharya and S.S. Bhattacharyya. Parameterized Dataflow Modeling for DSP Systems. IEEE Transactions on Signal Processing. October 2001
[Buck – 1993] J. Buck. Scheduling Dynamic Dataflow Graphs with Bounded Memory using the Token Flow Model. PhD thesis, University of California, Berkeley. 1993
[Girault – TCAD 1999] A. Girault, B. Lee and E.A. Lee. Hierarchical Finite State Machines with Multiple Concurrency Models. IEEE Transactions on CAD. June 1999
[Hansson – TODAES 2008] A. Hansson, K.G.W.
Goossens, M.J.G. Bekooij and J. Huisken. CoMPSoC: A Composable and Predictable Multi-Processor System on Chip Template. ACM Transactions on Design Automation of Electronic Systems. To appear
[Lee – TC 1987] E.A. Lee and D. Messerschmitt. Static Scheduling of Synchronous Dataflow Programs for Digital Signal Processing. IEEE Transactions on Computers. January 1987
[Lee – TPDS 1991] E.A. Lee. Consistency in Dataflow Graphs. IEEE Transactions on Par. and Distr. Systems. 1991
[Lee – 1995] E.A. Lee and T. Parks. Dataflow Process Networks. Proc. of the IEEE. May 1995
[Sen – ASSP 2005] M. Sen, S.S. Bhattacharyya, T. Lv, and W. Wolf. Modeling Image Processing Systems with Homogeneous Parameterized Dataflow Graphs. In Proc. ASSP. March 2005
[Wiggers – RTAS 2007] M.H. Wiggers, M.J.G. Bekooij, P.G. Jansen and G.J.M. Smit. Efficient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure. In Proc. RTAS. April 2007
[Wiggers – SCOPES 2007] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Modelling Run-Time Arbitration by Latency-Rate Servers in Dataflow Graphs. In Proc. SCOPES. April 2007
[Wiggers – DATE 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Computation of Buffer Capacities for Throughput Constrained and Data-Dependent Inter-Task Communication. In Proc. DATE. April 2008
[Wiggers – RTAS 2008] M.H. Wiggers, M.J.G. Bekooij and G.J.M. Smit. Buffer Capacity Computation for Throughput Constrained Streaming Applications with Data-Dependent Inter-Task Communication. In Proc. RTAS. April 2008