FPGA Acceleration of Information
Management Services
Richard Linderman
Mark Linderman
Air Force Research Laboratory
Information Directorate
Chun-Shin Lin
Electrical and Computer Engr.
Univ. of Missouri-Columbia
2005 MAPLD/203
1
Linderman
Introduction
• Information Management capabilities are built
upon core services such as publish and
subscribe
• Enhanced publish and subscribe (Pub-Sub)
services allow subscribers to specify
predicates that filter out undesired publications
• The goal of this research is to accelerate XML
predicate evaluations using FPGAs
• Recent advances include the incorporation of
2-bit logic encodings to enhance partial
evaluations and incremental design techniques
to handle volatile predicates
Pub-Sub Brokering Problem
• Information about a publication is
described in an XML metadata
document.
• What subscribers want is defined
using XPath predicates.
• The pub-sub brokering system evaluates
the predicates against the XML document to
find matches.
Metadata in XML: an example
<metadata>
  <baseObject>
    <InfoObjectType>
      <Name>mil.af.rl.mti.report</Name>
      <MajorVersion>1</MajorVersion>
      <MinorVersion>0</MinorVersion>
    </InfoObjectType>
    <PayloadFormat>text/plain</PayloadFormat>
    <TemporalExtent>
      <Instantaneous>2003-08-10T14:20:00</Instantaneous>
    </TemporalExtent>
    <PublicationTime/>
    <InfoObjectID/>
    <PublisherID/>
    <PlatformID/>
  </baseObject>
  <IntelReportObject>
    <OriginatorID>VMAQ1</OriginatorID>
    <DetectionDateTime>20030728T163105Z</DetectionDateTime>
    <Latitude>42.538888888888884</Latitude>
    <Longitude>19.0</Longitude>
    <MTIObject>
      <TrackID>000001</TrackID>
    </MTIObject>
  </IntelReportObject>
</metadata>
Examples of Predicates
• (((/metadata/IntelReportObject/Latitude>60)
or (/metadata/IntelReportObject/Longitude <60))
and (/metadata/IntelReportObject/OriginatorID ='bravo'))
• ((/metadata/IntelReportObject/MTIObject/TrackID>17)
and (/metadata/IntelReportObject/OriginatorID !='alpha')
and (/metadata/IntelReportObject/Latitude>45)
and (/metadata/IntelReportObject/Longitude >45))
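For reference, the two example predicates can be evaluated in software against a trimmed copy of the example metadata. This is only a minimal sketch of the software-only path, not the authors' code (the slides state that the HHPC software is written in C). The helper `leaf` and the variable names are hypothetical, and the comparisons are done in Python because `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example metadata (only the leaves the predicates use).
DOC = """<metadata>
  <IntelReportObject>
    <OriginatorID>VMAQ1</OriginatorID>
    <Latitude>42.538888888888884</Latitude>
    <Longitude>19.0</Longitude>
    <MTIObject><TrackID>000001</TrackID></MTIObject>
  </IntelReportObject>
</metadata>"""

def leaf(root, path):
    """Return the text of the leaf at an absolute path such as /metadata/a/b."""
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return None
    rel = "/".join(parts[1:])
    return root.findtext(rel) if rel else root.text

root = ET.fromstring(DOC)
P = "/metadata/IntelReportObject/"
lat = float(leaf(root, P + "Latitude"))
lon = float(leaf(root, P + "Longitude"))
origin = leaf(root, P + "OriginatorID")
track = int(leaf(root, P + "MTIObject/TrackID"))

# First example predicate: (Latitude > 60 or Longitude < 60) and OriginatorID = 'bravo'
pred1 = (lat > 60 or lon < 60) and origin == "bravo"
# Second example predicate: TrackID > 17 and OriginatorID != 'alpha'
# and Latitude > 45 and Longitude > 45
pred2 = track > 17 and origin != "alpha" and lat > 45 and lon > 45
```

With this document neither predicate matches: the originator is VMAQ1 rather than 'bravo', and the track ID is 1.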
FPGAs for Acceleration
• Use an FPGA to implement a finite
state machine that parses the metadata
document. The XML document is
written into the block RAM of the FPGA
from a microprocessor through DMA.
• Predicates are evaluated in parallel
(in combinational logic) using the data
generated by the parser.
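The parsing step can be pictured as a small state machine that consumes one character at a time and emits a value for each leaf. The following is a software sketch of that idea only; it is not the VHDL parser from the talk, the function name `parse_leaves` is hypothetical, and it assumes tags carry no attributes.

```python
def parse_leaves(xml_text):
    """Scan the document one character at a time, as a hardware FSM would."""
    TEXT, TAG = 0, 1                 # the two parser states
    state = TEXT
    stack, leaves = [], {}           # open-element stack, path -> leaf text
    tag, text = "", ""
    for ch in xml_text:
        if state == TEXT:
            if ch == "<":
                state, tag = TAG, ""
            else:
                text += ch
        else:                        # state == TAG
            if ch != ">":
                tag += ch
                continue
            state = TEXT
            if tag.startswith("/"):  # closing tag: emit leaf text, if any
                if text.strip():
                    leaves["/" + "/".join(stack)] = text.strip()
                stack.pop()
            elif tag.endswith("/"):  # empty element such as <PublisherID/>
                pass
            else:                    # opening tag
                stack.append(tag)
            text = ""
    return leaves

sample = "<metadata><IntelReportObject><Latitude>42.5</Latitude><PlatformID/></IntelReportObject></metadata>"
leaves = parse_leaves(sample)
```

Once the leaf values are available, each predicate depends only on those values, which is why the predicates can be evaluated side by side.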
The System
[Figure: system block diagram. A microprocessor (a node on the HHPC) connects over a 64-bit bus to the FPGA board, which contains the input FIFO and XML parser, the predicates evaluator, and the output FIFO.]
Software on PC
[Figure: PC-side tool flow. Blocks shown: Schema; List of Leaves Generator; List of Leaves; Perfect Hash Parameters & Table Generator; Hash Table; Predicates; Predicate VHDL Generator; Parser VHDL Modifier; Top-level VHDL; generated files pred.vhd, constant.vhd, and parser.vhd; Xilinx ISE; output pe.hex or pe.x86; parsed results to HHPC.]
A one-step tool has been developed to generate the “x86” file automatically for configuring an FPGA.
Handling of Data Types
• Character strings: converted to 16-bit data by
hashing
• Numbers: supports up to 3 decimal digits
(sufficient for standard lat/lon representations)
• Date/time: converted into a 32-bit
representation
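The three conversions above can be sketched in software as follows. Every detail here is an assumption made for illustration: the slides mention a perfect hash generated from the schema, whereas `hash16` simply takes the low 16 bits of CRC-32; `fixed3` reads "3 decimal digits" as three digits after the decimal point; and `time32` uses seconds since the Unix epoch, treated as UTC. None of the function names come from the talk.

```python
import zlib
from datetime import datetime, timezone

def hash16(s):
    """Stand-in 16-bit string hash (low 16 bits of CRC-32); not the talk's perfect hash."""
    return zlib.crc32(s.encode("utf-8")) & 0xFFFF

def fixed3(text):
    """Fixed-point integer keeping three digits after the decimal point (assumed reading)."""
    return round(float(text) * 1000)

def time32(iso_text):
    """Seconds since the Unix epoch, treated as UTC and masked to 32 bits (assumed format)."""
    dt = datetime.fromisoformat(iso_text).replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) & 0xFFFFFFFF

originator = hash16("VMAQ1")               # 16-bit value compared for (in)equality
latitude = fixed3("42.538888888888884")    # integer that can be compared with < and >
instant = time32("2003-08-10T14:20:00")    # 32-bit value that preserves time order
```

A general-purpose hash such as this stand-in can map two different strings to the same 16-bit value, so an equality match on hashes alone would need confirmation; a perfect hash built from the known set of strings avoids collisions within that set.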
Experimental Evaluation
To measure how much time the FPGA saves, we
compared predicate evaluation time on two
configurations:
1. Xeon® processor (software only)
• The Xeon® computes all predicates
2. Xeon® with the FPGA coprocessor (H/W + S/W)
• The Xeon® evaluates only the residual (maybe
true/maybe false) predicates
All HHPC software is written in C.
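The "maybe true/maybe false" residual results can be illustrated with a two-bit encoding of each predicate value. The encoding below is an assumption chosen for the sketch, not necessarily the one used in the hardware: bit 1 means "may be true" and bit 0 means "may be false", so a fully resolved value sets exactly one bit and an unresolved value sets both. The function names are hypothetical.

```python
# Assumed encoding (not taken from the slides):
# bit 1 = "may be true", bit 0 = "may be false".
TRUE, FALSE, MAYBE = 0b10, 0b01, 0b11

def and2(a, b):
    may_true = (a >> 1) & (b >> 1) & 1   # may be true only if both operands may be true
    may_false = (a | b) & 1              # may be false if either operand may be false
    return (may_true << 1) | may_false

def or2(a, b):
    may_true = ((a | b) >> 1) & 1        # may be true if either operand may be true
    may_false = a & b & 1                # may be false only if both operands may be false
    return (may_true << 1) | may_false

def not2(a):
    return ((a & 1) << 1) | ((a >> 1) & 1)   # swap the two bits

def needs_software(result):
    """Only unresolved results have to go back to the processor."""
    return result == MAYBE

# Shape of the first example predicate: (lat > 60 or lon < 60) and originator = 'bravo'
unresolved = and2(or2(FALSE, TRUE), MAYBE)   # string comparison left undecided
resolved = and2(or2(FALSE, TRUE), FALSE)     # string comparison decided as false
```

Under this encoding a definite false on one operand of an AND settles the whole expression even when another operand is undecided, which is what lets partial evaluation shrink the residual set.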
Timing Results
Character string (leaves) dominant case (75 MHz clock)
For 10 publications and 30 predicates:
------------------------------------------
Average DMA transaction: 17.600000 usec
Average check time: 16.200000 usec
Average check all time: 406.600000 usec
------------------------------------------
Numerical data (leaves) dominant case (50 MHz clock)
For 20 publications and 100 predicates:
------------------------------------------
Average DMA transaction: 20.000000 usec
Average check time: 10.150000 usec
Average check all time: 2614.450000 usec
------------------------------------------
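A rough ratio can be read off these numbers under one assumption that the slide does not state explicitly: that "check all time" is the software-only cost of evaluating every predicate, and that the hardware-assisted cost is the DMA transaction plus the "check time" for the residual predicates. The variable names below are hypothetical.

```python
# Timing figures copied from the slide (microseconds).
runs = {
    "string-dominant (75 MHz)": {"dma": 17.6, "check": 16.2, "check_all": 406.6},
    "numeric-dominant (50 MHz)": {"dma": 20.0, "check": 10.15, "check_all": 2614.45},
}

speedup = {}
for name, t in runs.items():
    hw_sw = t["dma"] + t["check"]            # assumed hardware + software path
    speedup[name] = t["check_all"] / hw_sw   # assumed software-only path over it
    print(f"{name}: {hw_sw:.2f} usec vs {t['check_all']:.2f} usec, "
          f"ratio {speedup[name]:.1f}x")
```

Under that reading the ratio is roughly 12x in the string-dominant case and roughly 87x in the numeric-dominant case; a different interpretation of the three timers would change these figures.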
The Need for Incremental Design
• When the set of predicates becomes large, the
synthesis time can become quite long (hours for
thousands of predicates).
• If only a few predicates change, re-synthesizing
the whole set is time consuming.
• Partitioning the set into subsets allows re-synthesis
of only the altered subset(s) and thus saves
time.
Stable and Volatile Predicate Sets
Stable Set
• Includes stable predicates
• Size is big
• Synthesis time could be long
• Re-synthesis is not required
Volatile Set
• Includes volatile predicates
• Size is small
• Re-synthesis takes a short time
Experimental Results
Using Stable and Volatile Sets:
1. For 400 stable predicates plus 6 in the Volatile Set

                   Nonincremental                  Incremental
Synthesis          1197 sec                        7 sec
Place and route    480 sec (with area grouping)    183 sec (with area grouping)

2. For 1000 stable predicates plus 6 in the Volatile Set

                   Nonincremental                  Incremental
Synthesis          7240 sec                        7 sec
Place and route    752 sec (with area grouping)    278 sec (with area grouping)

* The saving on synthesis is the major one.
* Synthesis time grows much faster than linearly with set size (1197 sec for 400 stable predicates vs. 7240 sec for 1000).
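Adding synthesis and place-and-route times gives the total turnaround for each flow. The short calculation below uses only the figures reported above; the dictionary layout and names are illustrative.

```python
# (synthesis seconds, place-and-route seconds) as reported on the slide.
results = {
    400: {"nonincremental": (1197, 480), "incremental": (7, 183)},
    1000: {"nonincremental": (7240, 752), "incremental": (7, 278)},
}

totals = {
    n: {flow: synth + par for flow, (synth, par) in flows.items()}
    for n, flows in results.items()
}

for n, t in totals.items():
    ratio = t["nonincremental"] / t["incremental"]
    print(f"{n} stable predicates: {t['nonincremental']} s vs "
          f"{t['incremental']} s ({ratio:.1f}x)")
```

In the incremental flow almost all of the remaining time is place and route, which matches the remark that the saving on synthesis is the major one.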
Equal Size Small Predicate Sets
• All sets are equal size.
[Figure: the predicates are divided into equal-size sets, some stable and some volatile.]
For better efficiency, it is suggested to
put volatile predicates into one or a few
subsets, although any set may be
re-synthesized.
Advantage
• Any set can be re-synthesized
inexpensively.
Disadvantage
• More hardware is used.
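The suggestion to keep volatile predicates together can be sketched as a simple partitioning routine. This is an illustration of the idea only; `partition`, `dirty_subsets`, and the predicate labels are hypothetical and do not come from the authors' tool.

```python
def partition(predicates, volatile, subset_size):
    """Split predicates into equal-size subsets, with volatile ones grouped first."""
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(predicates, key=lambda p: p not in volatile)
    return [ordered[i:i + subset_size] for i in range(0, len(ordered), subset_size)]

def dirty_subsets(subsets, changed):
    """Indices of subsets that contain a changed predicate and need re-synthesis."""
    return [i for i, s in enumerate(subsets) if any(p in changed for p in s)]

predicates = [f"p{i}" for i in range(1000)]
volatile = {"p3", "p250", "p999"}

grouped = partition(predicates, volatile, 100)   # volatile predicates share subset 0
naive = partition(predicates, set(), 100)        # original order, nothing grouped

grouped_dirty = dirty_subsets(grouped, volatile)
naive_dirty = dirty_subsets(naive, volatile)
```

With the volatile predicates grouped, a change to any of them touches one subset; in the ungrouped layout the same three predicates fall into three different subsets, each of which would have to be re-synthesized.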
Experimental Results
Using Small Sets: for 1000 predicates

                                One set    5 subsets     10 subsets
                                (1000)     (200 each)    (100 each)
________________________________________________________________________
- Total synthesis time          7240 s     1644 s        976 s
- Place and route               286 s      299 s         369 s
- Reconfigure when one          7526 s     < 609 s       < 455 s
  subset has been changed
- Hardware (FFs)                3698       3698          3698
- Hardware (LUTs)               10205      12391         14738
Experimental Results (cont.)
Observations/Explanation
1. Synthesis time decreases significantly when smaller subsets are used.
Reason: minimization is local.
2. Place and route time increases when smaller subsets are used.
Reason: the number of LUTs increases, so there is more hardware to handle.
3. The number of flip-flops (FFs) stays the same.
Reason: predicate hardware is completely combinational.
4. The number of LUTs increases when the subsets are smaller.
Reason: logic function minimization is local and thus less
efficient/complete.
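The trade-off between reconfiguration time and hardware can be quantified from the reported results for 1000 predicates. Since the reconfiguration times for 5 and 10 subsets are reported as upper bounds ("<"), the time ratios computed here are lower bounds. The names below are illustrative.

```python
# Reported figures for 1000 predicates, keyed by number of subsets.
table = {
    1: {"reconfigure_s": 7526, "luts": 10205},
    5: {"reconfigure_s": 609, "luts": 12391},    # reported as "< 609 s"
    10: {"reconfigure_s": 455, "luts": 14738},   # reported as "< 455 s"
}

base = table[1]
tradeoff = {}
for n, row in table.items():
    time_ratio = base["reconfigure_s"] / row["reconfigure_s"]
    lut_overhead_pct = 100 * (row["luts"] - base["luts"]) / base["luts"]
    tradeoff[n] = (time_ratio, lut_overhead_pct)
    print(f"{n} subset(s): at least {time_ratio:.1f}x faster reconfiguration, "
          f"{lut_overhead_pct:.1f}% more LUTs")
```

On these numbers, going from one set to 10 subsets cuts reconfiguration time by a factor of more than 16 at a cost of about 44% more LUTs, with the flip-flop count unchanged.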
Conclusion
• Hardware + software evaluation times are substantially better
than those of software-only implementations.
• Incremental synthesis helps significantly; the benefit from
incremental place and route appears limited.
• Partitioning predicates into smaller subsets makes incremental
synthesis much more efficient. The drawback is increased
hardware usage.
• For about 1000 predicates, reconfiguration time
can be reduced from 7526 sec (over 2 hours) to several
minutes (e.g., 455 seconds), depending on how the set is partitioned.