Architecture for a Hardware Based, TCP/IP Content Scanning

TCP STREAM PROCESSING
AT GIGABIT LINE RATES
TCP
Processor
HARDWARE CIRCUIT
David Vincent Schuehler
Dissertation Defense
Washington University in St. Louis
Department of Computer Science and Engineering
November 3, 2004
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Motivation
• Inspect data moving through networks
• Enable application level data processing
• Secure networks
– Safeguard confidential data
• Detect and prevent intrusions
– Worms, viruses, spam, espionage
• Mitigate denial of service attacks
• Characterize and analyze network traffic
• Operate at multi-gigabit data rates
Transmission Control Protocol
• 86% to 90% of all Internet traffic uses TCP
– Web, email, file transfer, remote login, secure communications
• Provides virtual bit pipe between two end systems
– Retransmission services
– Data reordering services
– Flow control services
– Congestion avoidance services
[Figure: network data packets moving through the network from source to destination, each made up of a header and a payload. Layout of a single packet: IP Hdr, TCP Hdr, Data Payload.]
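The single-packet layout (IP header, TCP header, data payload) can be illustrated with a short parser. This is only a sketch: it assumes IPv4 and uses the standard header-length fields to find where the payload starts; the function name `split_tcp_packet` is hypothetical and not taken from the presentation.

```python
import struct


def split_tcp_packet(packet: bytes):
    """Split a raw IPv4/TCP packet into (ip_header, tcp_header, payload)."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    if packet[9] != 6:  # IP protocol number 6 is TCP
        raise ValueError("not a TCP packet")
    # IHL and TCP data offset are both counted in 32-bit words.
    ip_header_len = (version_ihl & 0x0F) * 4
    total_len = struct.unpack("!H", packet[2:4])[0]
    tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
    payload_start = ip_header_len + tcp_header_len
    return (
        packet[:ip_header_len],
        packet[ip_header_len:payload_start],
        packet[payload_start:total_len],
    )
```

A packet whose payload slice comes back empty (a bare ACK, for example) is what later slides call a zero length TCP packet.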
Internet
[Figure: Internet topology. End systems (hand held computer, cell phone via cellular tower, satellite uplink, laptop, computers) and sites (government agency, university, corporation, municipality, Internet service provider) connect through gateway routers (G) and core routers (C).]
Internet
[Figure: Internet topology of gateway routers (G) and core routers (C) connecting end systems and sites, with SPAM, INTRUSION, and VIRUS traffic marked in the network.]
Cost of Internet Attacks
Year   Economic Impact Worldwide (mi2g’04)   Representative Attacks (cost)
2003   $236 Billion                          Sobig.F ($2B), Blaster ($1.3B), Slammer ($1.2B)
2002   $118 Billion                          KLEZ ($9B), Bugbear ($950M)
2001   $36 Billion                           Nimda ($635M), Code Red ($2.62B), SirCam ($1.15B)
2000   $26 Billion                           Love Bug ($8.75B)
1999   $20 Billion                           Melissa ($1.10B)
Economic Damage Estimate
Design Requirements
• Architecture that is fast
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Architecture that is scalable
– Performance improves with advances in technology
• In-line traffic processing model
• Implementation using reasonable resources
– FPGA implementation can be done in research lab
• Framework that is flexible
– Integrates with multiple applications
– Multi-device coordination of TCP stream processing
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
TCP-Processor Architecture
[Figure: TCP Processing Architecture block diagram. Blocks: Input Buffer, TCP Processing Engine, Packet Routing, Egress, Stats, and a State Store Manager connected to Off-Chip Memory; a Data Processing Circuit attaches to the architecture.]
TCP Processing Engine
[Figure: TCP Processing Engine block diagram. Blocks: Input State Machine, Frame FIFO, Checksum Engine, TCP State Processing, Flow Hash Computation, Control & State FIFO, Output State Machine; the engine connects to the State Store Manager.]
Challenges and Design Choices
• Performance
– Operate at multi-gigabit data rates
– Hardware-based design exploiting pipelining and parallelism
• Flow classification
– Open addressing hash with limited bucket sizes
• Context storage and retrieval
– Requires memory read and write for each packet
– 64-byte per-flow context - use burst read/write operations
• Reassembly of out-of-order packets
– Multiple processing modes (guaranteed and passive)
• TCP processing
– Flow monitoring instead of flow termination
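The flow classification and context storage choices above can be sketched in software. The slides give only "open addressing hash with limited bucket sizes" and a 64-byte per-flow context; everything else here is an assumption made for illustration: bounded linear probing as the reading of "limited bucket sizes", CRC-32 as the hash function, the `MAX_PROBES` value, and the `FlowTable` and `flow_key` names. It is not the hardware design.

```python
import struct
import zlib

RECORD_SIZE = 64   # per-flow context size quoted on the slide
MAX_PROBES = 4     # hypothetical probe limit standing in for "limited bucket sizes"


def flow_key(src_ip: int, dst_ip: int, src_port: int, dst_port: int) -> bytes:
    """Pack the TCP/IP 4-tuple into a fixed 12-byte key."""
    return struct.pack("!IIHH", src_ip, dst_ip, src_port, dst_port)


class FlowTable:
    """Open-addressing table that examines at most MAX_PROBES slots per flow."""

    def __init__(self, slots: int):
        self.slots = slots
        self.keys = [None] * slots
        self.records = [bytes(RECORD_SIZE) for _ in range(slots)]

    def _probe(self, key: bytes):
        start = zlib.crc32(key) % self.slots  # assumed hash function
        for i in range(min(MAX_PROBES, self.slots)):
            yield (start + i) % self.slots

    def lookup_or_insert(self, key: bytes):
        """Return the slot for this flow, or None if every probed slot is taken."""
        free = None
        for idx in self._probe(key):
            if self.keys[idx] == key:
                return idx
            if free is None and self.keys[idx] is None:
                free = idx
        if free is not None:
            self.keys[free] = key
            self.records[free] = bytes(RECORD_SIZE)
        return free

    def store(self, idx: int, record: bytes) -> None:
        """Write back one fixed-size context record (one burst write in hardware)."""
        if len(record) != RECORD_SIZE:
            raise ValueError("context record must be exactly 64 bytes")
        self.records[idx] = record

    def remove(self, key: bytes) -> bool:
        """Free the slot of a finished flow (for example after FIN or RST)."""
        for idx in self._probe(key):
            if self.keys[idx] == key:
                self.keys[idx] = None
                return True
        return False
```

Bounding the probe count keeps the per-packet memory cost fixed, at the price of refusing a new flow when all probed slots are occupied.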
Link Speeds and Packet Rates
Link Type          Data rate   40 byte     64 byte     500 byte    1500 byte
                               pkts/sec    pkts/sec    pkts/sec    pkts/sec
OC-3               155 Mbps    0.48 M      0.3 M       38 K        12 K
OC-12              622 Mbps    1.9 M       1.2 M       0.16 M      52 K
GigE               1.0 Gbps    3.1 M       2.0 M       0.25 M      83 K
OC-48              2.5 Gbps    7.8 M       4.8 M       0.63 M      0.21 M
OC-192 / 10 GigE   10 Gbps     31 M        20 M        2.5 M       0.83 M
OC-768             40 Gbps     125 M       78 M        10 M        3.3 M
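The packet rates in this table follow approximately from dividing the link rate by the packet size in bits. The sketch below makes that arithmetic explicit; it assumes back-to-back packets and ignores all framing overhead (SONET, Ethernet preamble, inter-frame gap), which is a simplification and not something the slide states.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Upper bound on packet rate for back-to-back packets, ignoring framing overhead."""
    return link_bps / (packet_bytes * 8)


# Link rates as listed on the slide, in bits per second.
LINKS = {
    "OC-3": 155e6,
    "OC-12": 622e6,
    "GigE": 1.0e9,
    "OC-48": 2.5e9,
    "OC-192 / 10 GigE": 10e9,
    "OC-768": 40e9,
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        row = [packets_per_second(rate, size) for size in (40, 64, 500, 1500)]
        print(name, ["%.2f M" % (pps / 1e6) for pps in row])
```

At 1.0 Gbps with 40-byte packets this gives 3.125 million packets per second, the 3.1 M entry in the GigE row.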
Systems with TCP Processors
• Load balancing systems
– Content (cookie) based request routing
– Delayed binding technique
– Limited to scanning start of flow
• TCP offload engines
– Move TCP protocol processing to NIC
– Targeting Gigabit NIC market
– Intel, NEC, Adaptec, Lucent, and others
• SSL Accelerators
– Offload encryption/decryption
– Protocol translation
• Intrusion Detection Systems
– Traffic Rates < 1Gbps
– Perform content scanning and some stream reassembly
[Figure: load balancer. The end user exchanges SYN, SYN ACK, ACK, Request, and Response with the load balancer, which exchanges the same sequence with the web server.]
[Figure: SSL accelerator. The end user exchanges SYN, SYN ACK, ACK, Request, and Response with the SSL accelerator (encrypted), which exchanges the same sequence with the web server (not encrypted).]
Related Work in TCP Processing
• Software-based TCP processing
– Ethereal, tcpdump, etc. – require post processing
– Snort w/TCP option – larger virtual packets
– Cluster-based online monitoring system (Mao: WIDM’01)
– Bro – rule based processing (Paxson: Computer Networks’99)
– STAT/STATL – state based processing (Vigna: DISCEX’00)
– Intel – Xeon as packet processor (Regnier: HotI’03)
• Hardware-based TCP processing
– Georgia Tech – 1 flow/circuit (Necker: FCCM’02)
– University of Oslo – 1 flow/circuit (Li: FPL’03)
– Indiana University and Imperial College – Netflow statistics
– University of Tokyo – multi-flow stream scanning (Sugawara: FPL’04)
– Intel TCP processor – 8k connections, 9Gbps (Xu: HotChips’03)
• Network processors
– Intel IXP 1200, 2400, 2800, 2850
– Motorola PowerQUICC
Taxonomy of Packet Processors
[Figure: packet processors placed along a Software to Hardware axis against Data Rate x Context Records. Labels in the chart: Store little or no state, IP Lookup, Packet Forwarding, Packet Capture, Load Balancer, SSL Accelerator, Snort w/TCP option, BRO/STATL, TCP Termination, Software based systems, Network Processors, Intel projects, Experimental TCP Processor, Other FPGA TCP Processors, TCP-Processor.]
Multi-Device Coordination
• Encodes interface signals
• Regenerates waveforms on separate device
• Provides extensible format & self describing structure
[Figure: Device 1 holds the TCP Processing Circuit and an Encode block; a Transport carries the encoded signals to Decode blocks feeding Data Processing Circuit 1 and Data Processing Circuit 2 on Device 2 and Device 3.]
Place & Route Results
• Including Protocol Wrappers & Encoder/Decoder
• Target Xilinx Virtex XCV2000E-8
• FPX Platform
• Number of BLOCKRAMs
– 95 out of 160 (59%)
• Number of SLICEs
– 7279 out of 19200 (37%)
• Maximum clock frequency: 85.565MHz
• Maximum data throughput: 2.7 Gbps
• Maximum packets per second: 2.9M packets/sec
– Min 29 clock cycles per packet (345 ns)
– Throughput limited by memory latency
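The headline numbers above can be cross-checked with simple arithmetic. The clock frequency and the 29-cycle minimum come from the slide; the 32-bit datapath width is an assumption, chosen here because it reproduces the quoted 2.7 Gbps, and is not stated in the presentation.

```python
CLOCK_HZ = 85.565e6          # maximum clock frequency from the slide
MIN_CYCLES_PER_PACKET = 29   # minimum clock cycles per packet from the slide
DATAPATH_BITS = 32           # assumed datapath width (not stated on the slide)

# One datapath word per clock cycle gives the peak data throughput.
throughput_gbps = CLOCK_HZ * DATAPATH_BITS / 1e9

# One packet every MIN_CYCLES_PER_PACKET cycles gives the peak packet rate.
max_pps = CLOCK_HZ / MIN_CYCLES_PER_PACKET

# Time spent on a minimum-cost packet at this clock frequency.
ns_per_packet = MIN_CYCLES_PER_PACKET / CLOCK_HZ * 1e9

if __name__ == "__main__":
    print("throughput: %.2f Gbps" % throughput_gbps)
    print("packet rate: %.2f M packets/sec" % (max_pps / 1e6))
    print("per packet: %.0f ns" % ns_per_packet)
```

At the quoted maximum clock, 29 cycles take roughly 339 ns, slightly below the 345 ns on the slide; the presentation does not say which clock value that figure was computed from.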
Content Scanning
[Figure: content scanning across two Xilinx XCV2000E FPGAs, one holding the TCP circuit and one the Scan circuit. Labels in the TCP circuit: TCP-Processor, State Store, TCP Encode, TCP Decode, Ctl Cell Processor, IPWrapper, Frame Wrapper, Cell Wrapper, PC100 SDRAM (D[64]), ZBT SRAM (D[36]). Labels in the Scan circuit: Scan Circuit, TCP Decode, TCP Encode, CTL Proc, query state, update state, IPWrapper, Frame Wrapper, Cell Wrapper, PC100 SDRAM (D[64]), ZBT SRAM (D[36]). External connections: Control Interface, Network Traffic.]
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Washington University Network
• 384 Mbps total Internet bandwidth
– 300 Mbps Internet
– 84 Mbps Internet2
• Approx 19,000 active end systems
• Approx 10,000 students
• Traffic analyzed for 5 week period
– Aug 20th to Sep 24th
– Over 1000 charts generated
• Selected highlights presented
Washington University Network
[Figure: campus network diagram showing the Internet / Internet2 connection and the link "To TCP Processor".]
Live Internet Traffic Analysis
[Figure: live traffic analysis setup. WashU Internet traffic passes through GigE line cards (Port 0, Port 1) on a WUGS-20 switch. Labeled ports: TCP Processor, Scan Circuit, PortTracker Circuit, G-Link Switch Ctrl, one unused port, and one empty port (Ports 2 to 7), with an External Stats Monitor attached. Also labeled: Standalone FPX-in-a-Box.]
Data Collection
Real-time processing
• Statistics are sent to the StatsCollector application from the hardware circuits
• StatsCollector spools raw data to disk files and retransmits the stats
• An SNMP agent publishes the statistics in a standard format
• MRTG (Multi-Router Traffic Grapher) queries the SNMP agent and generates traffic charts
• A Perl script reads the raw data files and calls gnuplot to generate charts
[Figure: two example charts of Pkts over Time, one produced by MRTG and one by gnuplot.]
Current Live Traffic
Collected Statistics
TCP Statistics
• Configuration Information
• SSM: New Connections, End Connections, Reused Connections, Active Connections
• INB: Input Words, Input Packets, Dropped Packets, Output Packets
• ENG: TCP Packets, SYN Packets, FIN Packets, RST Packets, Zero Length Packets, Retransmitted Packets, Out-of-Sequence Pkts, Bad Checksums
• RTR: TCP Data Bytes, Client Packets, Bypass Packets
• EGR: Client Packets In, Bypass Packets In, TCP Checksum Update, Packets Out
Protocol Statistics
• Cells In, Cells Dropped, Cells Bypass, Cells Out
• Frame Words In, Frame Packets In
• IP Packets Dropped, IP Packet Fragments, IP Packets In, IP Words In, IP Packets Bypass, IP Words Bypass, IP Bad Checksum
Port Statistics
• FTP, SSH, Telnet, SMTP, TIM, Nameserv, Whois, Login, DNS, TFTP, Gopher, Finger, HTTP, POP, SFTP, SQL, NNTP, NetBIOS, SNMP, BGP, GACP, IRC, DLS, LDAP, HTTPS, DHCP, Lower, Upper
Scan Statistics
• String 1, String 2, String 3, String 4
Typical Daily Traffic Pattern
Lowest activity
Highest activity
IP and TCP Traffic Rates
>90% TCP packets
Zero Length TCP Packets
20-40% zero length pkts
Fragmented IP Packets
.25% Fragmented
Packet Sequencing
3x-4x more retransmitted
Packet Sequencing (cont)
3%-4% Retransmitted
1% Out of Seq
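The slides report retransmitted and out-of-sequence packets separately but do not say how the hardware tells them apart. One common way, sketched here as an assumption, compares each segment's sequence number with the next byte expected on the flow, using 32-bit wraparound arithmetic. The labels and the function names are hypothetical, and a segment that only partly overlaps old data is simply counted as retransmitted.

```python
SEQ_SPACE = 1 << 32  # TCP sequence numbers are 32 bits wide


def seq_delta(seq: int, expected: int) -> int:
    """Signed distance from expected to seq in wraparound sequence space."""
    diff = (seq - expected) % SEQ_SPACE
    return diff - SEQ_SPACE if diff >= SEQ_SPACE // 2 else diff


def classify_segment(seq: int, length: int, expected: int):
    """Return (label, next_expected) for one TCP segment of a monitored flow."""
    if length == 0:
        return "zero-length", expected
    delta = seq_delta(seq, expected)
    if delta == 0:
        return "in-order", (expected + length) % SEQ_SPACE
    if delta < 0:
        # Starts before the expected byte: data already seen (simplified).
        return "retransmitted", expected
    # Starts after the expected byte: a gap, so an earlier segment is missing.
    return "out-of-sequence", expected
```

For example, with byte 1100 expected next, a segment starting at 1000 is classed as retransmitted and one starting at 1300 as out-of-sequence.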
Worm/Virus Detection
• Search for digital signatures
• MyDoom (appeared 1/26/04)
– Spread via email attachment
– Opens back door via ports 3127-3198
– Contains SMTP engine to replicate itself
– Contains denial of service attack (25% operational)
– At peak, 1 in 12 emails contained virus
• Netsky (appeared 3/1/04)
– Spread via email attachment
– Scans drives C through Z looking for email addresses
– Contains SMTP engine to replicate itself
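Searching a TCP stream for a signature differs from scanning packets one at a time, because a signature can straddle two segments. The sketch below shows the idea with a small amount of per-flow state. It assumes segments arrive already in order (the service the TCP-Processor provides to the scan circuit), and the `StreamScanner` class and the example signature bytes are invented for illustration; they are not the scan circuit's design or a real virus signature.

```python
class StreamScanner:
    """Find a byte signature in TCP streams, even when it spans segments."""

    def __init__(self, signature: bytes):
        if not signature:
            raise ValueError("signature must not be empty")
        self.signature = signature
        self.tails = {}  # flow id -> last len(signature) - 1 bytes seen

    def scan(self, flow_id, segment: bytes) -> bool:
        """Feed the next in-order segment of a flow; True if the signature appears."""
        keep = len(self.signature) - 1
        window = self.tails.get(flow_id, b"") + segment
        found = self.signature in window
        # Keep just enough bytes to complete a match in the next segment.
        self.tails[flow_id] = window[-keep:] if keep else b""
        return found

    def close(self, flow_id) -> None:
        """Drop the per-flow state when the connection ends."""
        self.tails.pop(flow_id, None)
```

Because the saved tail is one byte shorter than the signature, every reported match includes at least one byte of the new segment, so the same occurrence is not reported twice.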
MyDoom Virus Detection
Netsky Virus Detection
Denial of Service Attack
• TCP SYN Attack
– 8 minutes in duration
– 71,000 TCP pkts/sec avg (34,000 normal)
– 40,000 TCP SYN pkts/sec avg (2,000 normal)
• IP attack (non TCP traffic)
– 3.5 minutes in duration
– 91,000 IP pkts/sec peak (36,000 normal)
– 57,000 Non-TCP pkts/sec peak (2,000 normal)
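The rates above show why per-type counters matter: the aggregate packet rate roughly doubles during each attack, while the SYN and non-TCP counters jump by an order of magnitude. The sketch below computes those ratios from the slide's figures; the 10x alarm threshold and the function names are hypothetical choices for illustration, not part of the presented system.

```python
def spike_factor(observed: float, baseline: float) -> float:
    """How many times the normal rate the observed rate is."""
    return observed / baseline


def is_anomalous(observed: float, baseline: float, threshold: float = 10.0) -> bool:
    """Flag a counter whose rate reaches threshold times its baseline (assumed 10x)."""
    return spike_factor(observed, baseline) >= threshold


# (observed, normal) packets/sec taken from the slide.
COUNTERS = {
    "TCP pkts": (71_000, 34_000),
    "TCP SYN pkts": (40_000, 2_000),
    "IP pkts": (91_000, 36_000),
    "Non-TCP pkts": (57_000, 2_000),
}

if __name__ == "__main__":
    for name, (observed, normal) in COUNTERS.items():
        print(name, "%.1fx" % spike_factor(observed, normal),
              "ALERT" if is_anomalous(observed, normal) else "ok")
```

With these inputs only the SYN counter (20x) and the non-TCP counter (28.5x) cross the threshold.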
Attack Difficult to Detect
TCP: 10:25 to 10:34am
IP: 10:37 to 10:41am
Both Attacks Visible
Non-TCP attack
TCP attack
TCP SYN Attack
20x increase in SYN packets
Attack Directed at SSH Port
counter saturated
True spike at 2.4 M pkts
Non-TCP Attack
29x increase in non-TCP packets
Flow Classification and Attacks
• State store contains 1 million records
• Record removed after TCP FIN or RST
• Stale records are not aged out
• 500,000 to 800,000 active records normal
• DoS attack can cause flow saturation
• Table quickly settles back to normal range
Active State Store Records
400,000 new flows
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Insights
• 20%-40% zero length packets
– Increase from 18% to 22% (Shalunov: Internet2’01)
– Implies larger amount of 1-way traffic
– Optimization skips processing of these packets
• 5% out of order packets
– Agrees with results from (Jaiswal: Infocom’03)
• Flow classification tables need to be larger
– Flow table ½ to ¾ full during normal processing
– 1M entry table saturated during attack
• Automated response systems required
– Short lived attacks difficult to address manually
Contributions
• Developed Architecture for TCP-Processor
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Implemented TCP-Processor in Reprogrammable Hardware
– Operates at 85 MHz on Xilinx Virtex 2000E FPGA
– Maximum throughput of 2.7 Gbps
– Maximum 2.9M packets/sec
• Created inter-device protocol for TCP applications
– Multi-device coordination of TCP stream processing
– Interfaces with TCP-Processor
– Self-describing/extensible transport protocol
• Analyzed live Internet traffic
– Insight into Internet traffic profiles
• Supported academic and commercial endeavors
Future Work
• Packet defragmentation
• Flow classification
• Packet storage manager
• 10Gbps and 40Gbps data processing
• Histogram (packet size, packet type, etc.)
• Event rate detection
• Traffic sampling and real-time analysis
• Application integration
Acknowledgments
• Advisor & committee
– John Lockwood (advisor)
– Chris Gill
– Ron Loui
– Ron Indeck
– Dave Schimmel
• Reuters (formerly Bridge)
– Scott Parsons
– Deb Grossman
– John Leighton
• Recommendations
– Scott Parsons
– Don Bertier
– Andy Cox
– Chris Gray
• ARL faculty & staff
– Jon Turner
– Patrick Crowley
– Fred Kuhns
– John DeHart
• CSE faculty & staff
• ARL & FPX students
• NTS
• Reviewers
– Tanya Yatzeck
– James Hartley
• Family
– Jerry & Lois (parents)
– Chris & Kreslyn
– Nancy, Jeff & Nathan
– Steve Wiese
• Global Velocity
– Matthew Kulig
• Friends
Questions