FPGA Acceleration of Information Management Services
Richard Linderman, Mark Linderman, Air Force Research Laboratory, Information Directorate
Chun-Shin Lin, Electrical and Computer Engr., Univ. of Missouri-Columbia
2005 MAPLD/203 Linderman

Introduction
• Information Management capabilities are built upon core services such as publish and subscribe.
• Enhanced publish and subscribe (Pub-Sub) services allow subscribers to specify predicates that filter out undesired publications.
• The goal of this research is to accelerate the evaluation of XML predicates using FPGAs.
• Recent advances include 2-bit logic encodings that improve partial evaluation, and incremental design techniques that handle volatile predicates.

Pub-Sub Brokering Problem
• Each publication is described by an XML metadata document.
• Subscribers express what they want as XPath predicates.
• The pub-sub brokering system evaluates the predicates against each XML document to find matches.
Metadata in XML: an example
<metadata>
  <baseObject>
    <InfoObjectType>
      <Name>mil.af.rl.mti.report</Name>
      <MajorVersion>1</MajorVersion>
      <MinorVersion>0</MinorVersion>
    </InfoObjectType>
    <PayloadFormat>text/plain</PayloadFormat>
    <TemporalExtent>
      <Instantaneous>2003-08-10T14:20:00</Instantaneous>
    </TemporalExtent>
    <PublicationTime/>
    <InfoObjectID/>
    <PublisherID/>
    <PlatformID/>
  </baseObject>
  <IntelReportObject>
    <OriginatorID>VMAQ1</OriginatorID>
    <DetectionDateTime>20030728T163105Z</DetectionDateTime>
    <Latitude>42.538888888888884</Latitude>
    <Longitude>19.0</Longitude>
    <MTIObject>
      <TrackID>000001</TrackID>
    </MTIObject>
  </IntelReportObject>
</metadata>

Examples of Predicates
• (((/metadata/IntelReportObject/Latitude > 60) or (/metadata/IntelReportObject/Longitude < 60)) and (/metadata/IntelReportObject/OriginatorID = 'bravo'))
• ((/metadata/IntelReportObject/MTIObject/TrackID > 17) and (/metadata/IntelReportObject/OriginatorID != 'alpha') and (/metadata/IntelReportObject/Latitude > 45) and (/metadata/IntelReportObject/Longitude > 45))

FPGAs for Acceleration
• An FPGA implements a finite state machine that parses the metadata document. The microprocessor writes the XML document into the FPGA's block RAM through DMA.
• Predicates are evaluated in parallel by combinational logic, using the data generated by the parser.

The System
[Figure: a microprocessor (a node on the HHPC) connected over a 64-bit bus to the FPGA board, which holds an Input FIFO & XML Parser, a Predicates Evaluator, and an Output FIFO]

Software on PC
[Figure: tool flow with blocks labeled Schema, List of Leaves Generator, List of Leaves, Perfect Hash Parameters & Table Generator, Hash Table, Predicates, VHDL generator/modifier blocks for the top level, the predicates, and the parser, and Xilinx ISE; generated files pred.vhd, constant.vhd, and parser.vhd; outputs pe.hex or pe.x86; parsed results go to the HHPC]
A one-step tool has been developed that automatically generates the “x86” file used to configure the FPGA.
Handling of Data Types
• Character strings: converted to 16-bit values by hashing
• Numbers: up to 3 decimal digits are supported (sufficient for standard lat/lon representations)
• Date/time: converted into a 32-bit representation

Experimental Evaluation
To measure how much time the FPGA saves, we compared predicate evaluation time on two configurations:
1. Xeon® processor (software only): the Xeon® computes all predicates.
2. Xeon® with the FPGA coprocessor (H/W + S/W): the Xeon® evaluates only the residual (maybe true/maybe false) predicates.
All HHPC software was written in C.

Timing Results
                                 Character-string (leaves)   Numerical-data (leaves)
                                 dominant case               dominant case
Clock                            75 MHz                      50 MHz
Publications / predicates        10 / 30                     20 / 100
Average DMA transaction          17.6 µs                     20.0 µs
Average check time               16.2 µs                     10.15 µs
Average check-all time           406.6 µs                    2614.45 µs

The Need for Incremental Design
• When the set of predicates becomes large, synthesis can take a long time (hours for thousands of predicates).
• If only a few predicates change, re-synthesizing the whole set is time-consuming.
• Partitioning the set into subsets allows only the altered subset(s) to be re-synthesized, which saves time.

Stable and Volatile Predicate Sets
Stable set
• Contains stable predicates
• Size is BIG
• Synthesis time could be LONG
• Re-synthesis is not required
Volatile set
• Contains volatile predicates
• Size is small
• Re-synthesis takes a short time

Experimental Results Using Stable and Volatile Sets
1. 400 stable predicates plus 6 in the volatile set
                       Nonincremental                 Incremental
   Synthesis           1197 sec                       7 sec
   Place and route     480 sec (with area grouping)   183 sec (with area grouping)
2. 1000 stable predicates plus 6 in the volatile set
                       Nonincremental                 Incremental
   Synthesis           7240 sec                       7 sec
   Place and route     752 sec (with area grouping)   278 sec (with area grouping)
* Most of the savings come from synthesis.
* Synthesis time grows much faster than the set size: 2.5 times as many stable predicates took about 6 times as long to synthesize (7240 sec vs. 1197 sec).

Equal Size Small Predicate Sets
[Figure: a column of equal-size sets, several stable sets followed by several volatile sets]
• All sets are of equal size.
• For better efficiency, volatile predicates should be placed into one or a few subsets, although any set may be re-synthesized.
Advantage
• Any set can be re-synthesized inexpensively.
Disadvantage
• More hardware is used.

Experimental Results Using Small Sets (1000 predicates)
                                             One set    5 subsets     10 subsets
                                             (1000)     (200 each)    (100 each)
Total synthesis time                         7240 s     1644 s        976 s
Place and route                              286 s      299 s         369 s
Reconfiguration when one subset changes      7526 s     < 609 s       < 455 s
Hardware (FFs)                               3698       3698          3698
Hardware (LUTs)                              10205      12391         14738

Experimental Results (cont.): Observations/Explanation
1. Synthesis time decreases significantly when smaller subsets are used. Reason: logic minimization is local to each subset, so each synthesis problem is smaller.
2. Place-and-route time increases when smaller subsets are used. Reason: the number of LUTs increases, so there is more hardware to place and route.
3. The number of flip-flops (FFs) stays the same. Reason: the predicate hardware is purely combinational.
4. The number of LUTs increases as the subsets get smaller. Reason: logic minimization is local and therefore less complete.
Conclusion
• Hardware + software evaluation times are substantially better than those of software-only implementations.
• Incremental synthesis helps significantly, whereas incremental place and route appears to help only a little.
• Partitioning the predicates into smaller subsets makes incremental synthesis considerably more efficient; the drawback is increased hardware usage.
• For about 1000 predicates, reconfiguration time can be reduced from 7526 sec (over 2 hours) to several minutes (e.g., 455 seconds), depending on how the set is partitioned.
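The trade-off in the conclusion can be checked with simple arithmetic on the small-set results for 1000 predicates. The one-set reconfiguration figure matches total synthesis plus place and route; for the partitioned cases the table gives upper bounds (< 609 s, < 455 s), so the ratios below are lower bounds on the speed-up. LUT ratios are relative to the single-set design.

```latex
\begin{aligned}
\text{one set: } & t_{\text{reconf}} = 7240\,\text{s} + 286\,\text{s} = 7526\,\text{s}\\
\text{5 subsets: } & \frac{7526\,\text{s}}{609\,\text{s}} \approx 12.36 \;\Rightarrow\; \text{speed-up} > 12.3,
  \qquad \frac{12391}{10205} \approx 1.21 \text{ (LUTs)}\\
\text{10 subsets: } & \frac{7526\,\text{s}}{455\,\text{s}} \approx 16.54 \;\Rightarrow\; \text{speed-up} > 16.5,
  \qquad \frac{14738}{10205} \approx 1.44 \text{ (LUTs)}
\end{aligned}
```

So the 10-subset partition trades roughly 44% more LUTs for a reconfiguration that is more than 16 times faster, with the flip-flop count unchanged at 3698.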