Full Packet Monitoring Sensors: Hardware and Software Challenges
Vladimír Smotlacha
CESNET

High-speed network monitoring
Scalability limited by:
• throughput of local bus
  - flow at 10 Gb/s exceeds throughput of PCI-X 64/133
• CPU performance
• data handling in RAM
• disk systems
  - amount of stored data
  - sustained write speed

Flow based monitoring
Motivation: describe dynamics of link traffic
• Elementary flow specified by
  - source and destination IP address
  - transport protocol
  - source and destination port (if applicable)
  - start and end time (timeouts!)
• Flow data aggregation
  - end point: host, network, AS
  - time granularity
• Example: NetFlow
  - implemented in routers
  - database of open flows
  - statistics of each flow

Packet based monitoring
Motivation: describe dynamics of selected connections
• Flow specification
  - all packets that match arbitrary criteria (e.g., “all UDP and TCP packets sent to port 456”)
  - flow is treated as a generalized socket
  - filter is expressed in a special language (e.g., BPF, FPL, C library)
• Example: pcap
  - based on BPF
  - used in tcpdump, snort, ntop, ngrep, ethereal, ...
  - intuitive way of writing filters

Software optimization
• Performance
  - efficient filters
  - CPU instructions per packet
  - optimal packet handling
  - memory mapping
  - parallelism in packet processing
Examples:
• FFPF
  - new extensible language
  - intensive computation pushed into kernel
  - support of network processors
• nCap
  - handles full 1 Gbps data flow

Monitoring API
Basic abstraction: network flow
  - create & terminate the flow
  - read packets from the flow
  - apply functions to the flow
  - read results of functions
MAPI functions
  - filtering (BPF filters)
  - logging
  - accounting
  - sampling
  - cooking (IP defragmentation & TCP reassembly)
  - string search

Hardware-software codesign
Pushing functionality down into the hardware
• FFPF
  - support of network processors
• MAPI
  - utilizes available functionality
  - DAG cards
  - SCAMPI cards

Intelligent hardware adapters
Goals
  - reduce the amount of data passing the local bus
  - reduce CPU load and memory requests
  - do complex classification of packets
  - move computationally intensive algorithms to the adapter
  - introduce new parallel algorithms
  - accurate timestamps

Adapters functionality
• Timestamping
  - unique accurate timestamp for each packet
  - clock synchronization required
• Header based filtering
  - rule specifying which packets pass through
or
• Header based classification
  - one rule per class
  - disjunctive (disjoint) rules: each packet belongs to one class
  - non-disjunctive rules: a packet can belong to several classes

Adapters functionality (cont)
• Packet shrinking
  - cut unnecessary payload to reduce data
• Sampling
  - reduction of packet number
  - deterministic vs. probabilistic
• Calculation of statistics
  - based on packet length or on the time interval between packets
• String searching
  - packets containing the string pass the unit

SCAMPI adapter
Packet classification
CAM
  - matching a (sub)field with a constant value (e.g., IP address, network address, protocol)
Processing unit
  - arithmetic comparison with a constant value (e.g., port, interval of port
    values)
Whenever possible, comparison is done in CAM

Pair (C, P)
• C - CAM row (with “don’t care” bits)
• P - sequence of comparison (conditional jump) instructions
Semantics
• matching row C of CAM points to an instruction sequence P
• instruction result:
  - assign packet to a class & stop (packet classified)
  - stop without assigning (not classified)
  - continue with next instruction

Filter language - FL
• Primitive operation: comparison of an arbitrary header field with a constant
• Filter specification: expression consisting of primitive operations, ‘and’, ‘or’, ‘not’ and brackets
• Implementation
  - expression is transformed to DNF
    example: “A and (C or D) and (E or F) or G and H” is equal to “ACE or ACF or ADE or ADF or GH”
  - each primitive operation or a conjunction of them is translated to at most one pair (C, P)
  - FL expression in DNF is translated to a number of pairs (C, P)

String search
• CAM with 272-bit wide rows
• Algorithm implemented in hardware:
  - 16-byte string stored in 16 CAM rows, shifted by 0, 1, 2, ..., 15 bytes
  - comparison with 32 bytes of payload in one CAM
  - in the next cycle, payload is shifted by 16 bytes
• Implementation in SCAMPI
  - search for more than 100 strings simultaneously
  - designed throughput 3 Gb/s
• Issues
  - finds only the first occurrence of any string
  - for longer strings, many false positives -> additional software verification

Open problems
• Searched string occurs on the border of two packets
  - solution: flow cooking in adapter
• Dealing with non-disjunctive (overlapping) classes
  - solution: evaluation of all intersections -> possibly exponential number of new pairs (C, P)
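
Illustration (not part of the original slides): the DNF step from the "Filter language - FL" slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the SCAMPI implementation: the function names `to_dnf` and `format_dnf` are hypothetical, primitive operations are represented by plain names such as A or G, and 'not' is deliberately left out. Each resulting conjunction corresponds to one candidate for a pair (C, P).

```python
import re


def to_dnf(text):
    """Expand an 'and'/'or'/bracket expression into DNF.

    Returns a list of conjunctions, each a tuple of primitive names.
    Assumption of this sketch: 'not' is not supported.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        # 'or' simply concatenates the conjunction lists of both sides.
        terms = parse_and()
        while peek() == "or":
            take()
            terms = terms + parse_and()
        return terms

    def parse_and():
        # 'and' distributes over 'or': cross product of both sides.
        terms = parse_atom()
        while peek() == "and":
            take()
            right = parse_atom()
            terms = [l + r for l in terms for r in right]
        return terms

    def parse_atom():
        tok = take()
        if tok == "(":
            terms = parse_or()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return terms
        if tok in ("and", "or", "not", ")"):
            raise ValueError(f"unexpected token: {tok}")
        return [(tok,)]  # a single primitive operation

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return result


def format_dnf(dnf):
    """Render the DNF in the compact notation used on the slide."""
    return " or ".join("".join(conj) for conj in dnf)


if __name__ == "__main__":
    print(format_dnf(to_dnf("A and (C or D) and (E or F) or G and H")))
```

Run on the slide's example, the sketch prints "ACE or ACF or ADE or ADF or GH". The cross product in `parse_and` also shows why the number of conjunctions, and hence of pairs (C, P), can grow quickly with nested 'or' sub-expressions.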