An Encryption-Enabled Network Protocol Accelerator

Steffen Peter, Mario Zessack, Frank Vater, Goran Panic, Horst Frankenfeldt, and Michael Methfessel IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany

IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany www.ihp-microelectronics.com © 2008 - All rights reserved

Outline

Motivation

TCP

General Hardware Design

Cryptographic Accelerators

Implementation

Conclusions

Motivation

[Figure: a wireless sensor network of tiny sensor nodes and a cluster head, connected to the Internet. Requirements noted: standard TCP, high data rates, security, low energy.]

Motivation

Increasing amount of data

even in mobile and ubiquitous scenarios

need for good transport performance

Low cost

Small silicon area

Energy efficient

Need for security

Secrecy, Integrity, Reliability

Support of standard protocols

- TCP (transport)
- AES (data encryption)
- ECC/ECDSA (signature, key agreement)

Is dedicated hardware the solution?

TCP

Standard transport protocol of the Internet

Connection-based protocol

- Three-way handshake
- Complicated connection tear-down

Basic data integrity mechanism

- Checksum

Error correction mechanisms

- Fast retransmit, slow (re-)start, many others

Flow control

- Buffer and congestion control

No actual security mechanisms

TCP – Profiling Results

[Figure: profiling results for the transmit and receive paths.]

TCP Profiling – Implications

Copying data consumes most time and energy

Reduce copy operations as much as possible

Protocol handling needs merely 1/5th of the total computation

Is it worth hard-wiring the TCP state machines in hardware?

Trade-off: performance vs. flexibility

Checksum is the most expensive computation

The obvious dedicated hardware unit

How to integrate in the data flow?

Memory allocation needs more than 5 percent of time

Can a dedicated unit help here?
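
The checksum discussed on this slide is the 16-bit one's-complement Internet checksum. The following is a minimal software sketch of that computation, assuming big-endian 16-bit words and zero padding for odd lengths; the function name is illustrative, and a real TCP checksum additionally covers a pseudo-header that is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add one big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

Every payload byte has to pass through this loop once, which is why the computation is a natural candidate for a unit that works while the data is being moved anyway.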

TCP Profiling – Our Answers

One copy architecture

Data is copied directly from the peripheral handler to the right memory location (assigned by CPU)

During this one copy operation other operational blocks (checksum, encryption) listen on the bus and do their work

MIPS CPU performs complicated (but low effort) TCP logic

Connection build-up/tear-down, error handling, congestion control

Software handling allows protocol variations and debugging

Dedicated checksum block

- Checksum block computes the checksum during the one-copy operation

No dedicated memory manager unit

- Hard-wired memory manager reduces memory allocation time by 77 percent (from 5% to 1% of total time)
- BUT high hardware costs (300 flip-flops) and lack of flexibility
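
The one-copy idea can be modeled in software as a single loop that writes each word to its destination while other units observe the same word. This is only a behavioral sketch under stated assumptions: `ChecksumSnooper`, `one_copy`, the 16-bit word width, and the slot offset are hypothetical names and parameters, not the chip's actual interfaces.

```python
class ChecksumSnooper:
    """Hypothetical model of a unit that watches bus transfers and sums them."""

    def __init__(self) -> None:
        self.total = 0

    def observe(self, word: int) -> None:
        # One's-complement addition with end-around carry.
        self.total += word & 0xFFFF
        self.total = (self.total & 0xFFFF) + (self.total >> 16)

    def result(self) -> int:
        return ~self.total & 0xFFFF


def one_copy(packet: bytes, sram: bytearray, slot_offset: int, listeners) -> None:
    """Copy packet into its SRAM slot once; each listener sees every word in passing."""
    for i in range(0, len(packet), 2):
        chunk = packet[i:i + 2]
        # The single copy: peripheral data goes straight to the assigned slot.
        sram[slot_offset + i:slot_offset + i + len(chunk)] = chunk
        word = int.from_bytes(chunk.ljust(2, b"\x00"), "big")
        for unit in listeners:
            unit.observe(word)


if __name__ == "__main__":
    sram = bytearray(64)
    checksum_unit = ChecksumSnooper()
    one_copy(bytes.fromhex("0001f203f4f5f6f7"), sram, 8, [checksum_unit])
    print(hex(checksum_unit.result()))
```

An encryption unit could be added to `listeners` in the same way, which mirrors the slide's point that the payload is touched only once.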

General Design

MIPS CPU handles complex protocols.

- CPU never touches payload.

Internal 32 kByte SRAM stores packets.

AMBA bus connects system components.

- Standard bus system allows modular approach.
- Peripheral bus (APB) connects GPIO, UART, SPI ports.

System concept: interacting independent units.

- Units exchange commands and status using register file.
- General-purpose formalism for command/status syntax.

[Figure: block diagram. Labels: to Host, Host interface handler, CPU, System Bus, SRAM, RF, Data I/O, Checksum, to MAC/PHY.]

General Design - Flow

[Figure: packet flow drawn on the block diagram (to Host, System Bus, SRAM, CPU, RF, Data I/O, Checksum, to MAC/PHY). Step labels: Incoming packet; Basic header; Select memory slot for packet; Packet received and checksum ok?; Full header processing; Window update; Sleeps.]

Results (Performance and power consumption)

Simulation results: split power among different entities.

- Maximum data rate in pure software on MIPS is 20.7 Mbit/sec.
- Hardware accelerators reduce load on CPU and save about 50% of power (89 mW vs. 46 mW at 20.7 Mbit/sec).
- Maximum data rate with hardware accelerators is 40 Mbit/sec.

Case  Rate (Mb/sec)  CPU active  CPU  AMBA bus  Reg file  Card bus  EPP/UA  Total power
SW    20.7           100%        60   14        7         4         4       89 mW
HW    20.7           15%         9    14        7         4         12      46 mW
HW    40.0           31%         18   14        7         4         12      55 mW

Measured power consumption is 2.5 times simulated power.

- Measured power includes pads.
- Consumed power varies for different production runs.
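
One way to compare the three simulated cases is energy per transmitted bit, which follows directly from total power divided by data rate. The short sketch below does that arithmetic on the simulated totals listed above; the function name and the choice of nJ/bit as the unit are assumptions for illustration.

```python
# (case, data rate in Mbit/s, total simulated power in mW), taken from the slide
ROWS = [("SW", 20.7, 89.0), ("HW", 20.7, 46.0), ("HW", 40.0, 55.0)]


def energy_per_bit_nj(rate_mbit_s: float, power_mw: float) -> float:
    """mW / (Mbit/s) = 1e-3 J/s / 1e6 bit/s = 1e-9 J/bit, i.e. nJ per bit."""
    return power_mw / rate_mbit_s


if __name__ == "__main__":
    for case, rate, power in ROWS:
        print(f"{case} at {rate} Mbit/s: {energy_per_bit_nj(rate, power):.2f} nJ/bit")
```

On these figures the accelerated design spends less energy per bit at the higher data rate than at the lower one, because the bus and register-file shares stay constant while throughput doubles.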

Cryptographic Accelerators

AES (Advanced Encryption Standard)

- Symmetric block cipher
- Suitable for low-power high-throughput data streams
- Standardized in November 2001 (NIST, National Institute of Standards and Technology, USA)
- Input data length: 128 bit; key length: 128, 192, 256 bit
- Assumed to be secure for the next 70 years

ECC (Elliptic Curve Cryptography)

- Asymmetric cryptography
- Suitable for signatures and key establishment
- Key length 160-571 bit (NIST standard)

Advanced Encryption Standard (AES)

[Figure: AES data flow. Key ("SECRETKEY") and data ("IHP Microelectro") pass through 10 rounds of Calc key, key xor, S-Box, Shift row, and Mix Column. Output data: 84 30 AE CB 97 38 AD 58 7A 0E 67 CF FE 43 17 80.]

Huge design space

Sharing S-Boxes reduces performance but leads to smaller designs

Pipelining and Parallelism boost performance – but cost area and energy

AES - Results

- Throughput: ~52 Mbit/s @ 33 MHz (includes input and output of the data blocks)
- Size: 0.336 mm² in 0.25 µm CMOS (8,450 equivalent gates)
- 70 clock cycles per 128-bit data block for en-/decryption
- 72 times faster than the software implementation on MIPS (33 MHz), and it requires 0.4% of the energy of the software solution

Company             Size (gates)               Clock cycles  Clock frequency  Throughput
Elliptic Semi: AES  8,000 + 1,400 bit memory   379           100 MHz          33.7 Mbit/s
IHP AES             8,450                      70            66 MHz           108 Mbit/s
IAIK Graz           8,930                      64            64 MHz           128 Mbit/s
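
The throughput figures relate to cycle count and clock frequency by simple arithmetic: bits per block times clock rate divided by cycles per block. The sketch below applies that relation to the cycle counts and frequencies listed above; `throughput_mbit_s` is a hypothetical helper, and it gives the raw core rate without any data transfer overhead.

```python
BLOCK_BITS = 128  # AES block size in bits


def throughput_mbit_s(cycles_per_block: int, clock_mhz: float) -> float:
    """Raw throughput in Mbit/s for one block every cycles_per_block cycles."""
    return BLOCK_BITS * clock_mhz / cycles_per_block


if __name__ == "__main__":
    for name, cycles, mhz in [("Elliptic Semi", 379, 100.0),
                              ("IHP", 70, 66.0),
                              ("IHP", 70, 33.0),
                              ("IAIK Graz", 64, 64.0)]:
        print(f"{name}: {throughput_mbit_s(cycles, mhz):.1f} Mbit/s at {mhz} MHz")
```

For the IHP core the raw result at 33 MHz is somewhat above the quoted ~52 Mbit/s, which is consistent with the slide's note that the quoted figure includes input and output of the data blocks.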

Elliptic Curve Cryptography (ECC)

Asymmetric cryptography

Basis for many key exchange and signature algorithms (ECDSA)

Trapdoor: Elliptic Curve Point Multiplication

- one can compute: Q = kP
- it is infeasible to determine k for given Q and P

Higher security with shorter key lengths

- about 1/10th of RSA’s key size

Still, operations on elliptic curves are expensive

- one 233-bit EC point multiplication needs 1200 additions, 1500 multiplications, 800 squarings, and 1 division (each on 233-bit finite-field elements)
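
The point multiplication Q = kP can be illustrated with the textbook double-and-add method. The sketch below is not the chip's algorithm: it uses a toy curve y^2 = x^3 + 2x + 2 over the prime field GF(17) instead of the 233-bit binary field, and affine coordinates, so it performs one field inversion per point operation rather than the single division quoted on the slide. All names are illustrative.

```python
P_MOD = 17   # toy prime field, NOT the 233-bit binary field used in hardware
A = 2        # curve: y^2 = x^3 + 2x + 2 over GF(17)
G = (5, 1)   # base point; it has order 19 on this toy curve


def inv(x: int) -> int:
    """Field inverse via Fermat's little theorem (P_MOD is prime)."""
    return pow(x % P_MOD, P_MOD - 2, P_MOD)


def add(p, q):
    """Add two points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = point at infinity
    if p == q:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD  # tangent slope (doubling)
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P_MOD         # chord slope (addition)
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k: int, point):
    """Compute k * point by scanning the bits of k from least significant."""
    result, addend = None, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


if __name__ == "__main__":
    print(scalar_mult(2, G), scalar_mult(3, G), scalar_mult(19, G))
```

The loop runs once per bit of k, so a 233-bit scalar needs a few hundred doublings and additions, each built from several field multiplications; this is where operation counts of the size quoted above come from.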

ECC - Design

[Figure: utilization and area shares of the ECC design's components. Percentages shown: 50%, 15%, 5%, 95%, 70%, 20%.]

ECC – Implementation Results

Time for one ECPM (233 bit):

- MIPS: 410 ms
- HW: 0.4 ms

Energy for one ECPM (233 bit):

- MIPS: 16 mWs
- HW: 0.03 mWs

Company               Size (kGates)  Time/Op (ms)  Technology
Elliptic Semi: B-233  71             6.68          0.13 µm
IBM B-163             117            0.19          0.13 µm
IHP B-233             72-42          0.08-0.35     0.25 µm

Energy (mWs): 0.14; 0.02-0.04 (IHP)
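
The gain of the hardware unit over the MIPS software implementation follows from the time and energy figures for one 233-bit ECPM. The sketch below only divides the numbers quoted on this slide; the helper name is an assumption.

```python
def improvement(software: float, hardware: float) -> float:
    """Factor by which the hardware figure improves on the software figure."""
    return software / hardware


# Figures for one 233-bit ECPM as quoted on the slide.
time_factor = improvement(410.0, 0.4)    # ms on MIPS vs. ms in hardware
energy_factor = improvement(16.0, 0.03)  # mWs on MIPS vs. mWs in hardware

if __name__ == "__main__":
    print(f"time: about {time_factor:.0f}x, energy: about {energy_factor:.0f}x")
```

On these figures the time gain is roughly three orders of magnitude and the energy gain is somewhat below that.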

Implemented Chip (Design)

[Figure: chip block diagram. Blocks shown: CardBus (Linux/Windows Host); CardBus (Master); MIPS Processor Core; I-Cache (16 kB); D-SPRAM (8 kB); EJTAG (Debug); AMBA AHB Bus; Memory Controller (AHB Slave); CPU Control Bus; EPP; UART; Registers & Control; Internal SRAM (32 kB); Check Sum; ECC; Data I/O; Serial 1+2; GPIO; Flash; SRAM.]

Implemented Chip (Chip Photo)

Size: 7.3 x 7.4 mm (54 mm²)

Core: 44 mm²

Pads: 219 in QFP256 package

Transistors: 4.8 M in 0.25 μm

Packet SRAM: 32 kByte

Instruction cache: 16 kByte

Data scratchpad: 8 kByte

Implementation (Test Board)

Allows:

- Testing the implementation in practice
- Tests of interoperability
- Performance tests
- Energy measurements

Conclusions

Profiling of TCP/IP code identified bottlenecks

- TCP checksum and copying use 90% of power.
- High data rate needs hardware accelerators.

Chip is a hardware solution for TCP/IP handling

- Takes care of middle protocol layers efficiently.

AMBA-based bus as prototype for modular systems

- Assemble different systems quickly.
- Pre-tested components lead to reliability.

Cryptographic components allow security at low cost

- Designs for AES and ECC improve performance and energy consumption for security operations by up to three orders of magnitude.

TCP chip creates basis for further developments

- Extension to higher data rates (Gbit/sec).
- Use as component of complex single-chip systems.

Thank You

Questions?
