cesel: flexible crypto acceleration talks/2018/kiningham.pdfcrypto background symmetric ciphers aes,...

31
CESEL: Flexible Crypto Acceleration Kevin Kiningham Dan Boneh, Mark Horowitz, Philip Levis

Upload: others

Post on 12-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

CESEL: Flexible Crypto Acceleration

Kevin KininghamDan Boneh, Mark Horowitz, Philip Levis

Page 2: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Cryptography

• Mathematical operations to secure data

• Fundamental for building secure systems

• Computationally intensive: takes time and energy

Page 3: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Example: Emergency Response Drone

• Secure connections for real-time control

• Authentication

• Real-time control means these operations must be efficient and fast• Want feedback loops on 1-10ms scale• Some algorithms take 10-100ms

• For other use cases, energy efficiency is also important

Page 4: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Hardware Acceleration• Solution today: fixed hardware crypto accelerators• Implement crypto algorithm directly in silicon• Huge energy/latency improvement, but inflexible

• Problem: Security requirements change over time• Requirements change• Vulnerabilities emerge

• Current accelerators can’t adapt to new crypto • New cipher requires physical replacement of device• Can’t replace drones every 2 years

Page 5: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Flexible Acceleration• We need flexible acceleration• Support wide range of crypto• “Good enough” power reduction

Energy

Flexibility

ASIC

Software

CESEL

Page 6: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Outline• Motivation

• Common Operations in Cryptography

• Features for Flexible Acceleration• Bitslicing and Permutation• Flexible SIMD

• CESEL Architecture

• Evaluation

Page 7: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Crypto BackgroundSymmetric

Ciphers AES, DES, …

Hash Functions

SHA, BLAKE, …

Asymmetric Ciphers

RSA, ECC, …

Page 8: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Crypto BackgroundSymmetric

Ciphers AES, DES, …

Hash Functions

SHA, BLAKE, …

Asymmetric Ciphers

RSA, ECC, …

• Based around idea of “diffusion”• Single input bit change should randomize output

• Traditionally done with substitution tables and bit or byte permutation networks• Easy to construct, very efficient in hardware

• Some recent ciphers use arithmetic operations and bitshifting• More efficient in software, easier to parallelize

Page 9: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Crypto BackgroundSymmetric

Ciphers AES, DES, …

Hash Functions

SHA, BLAKE, …

Asymmetric Ciphers

RSA, ECC, …

• Based around a cryptographically hard problem• E.g. discrete log, factoring

• Many “long-word” arithmetic operations• E.g. RSA typically uses words with >1024 bits

• Much more computationally expensive than symmetric

Page 10: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Commonalities Between Crypto Algorithms

• Investigated 38 crypto algorithms• Drawn from major protocols/libraries/competitions

• Symmetric ciphers/Hash functions• Bit/Byte permutations• Internally parallel• Operation width <= 64 bits

• Asymmetric ciphers• Long-word arithmetic operations (>128 bits)

Page 11: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Outline• Motivation

• Common Operations in Cryptography

• Features for Flexible Acceleration • Bitslicing and Permutation• Flexible SIMD

• CESEL Architecture

• Evaluation

Page 12: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Efficient Permutations

• Problem: How to support efficient permutations?• Arbitrary bit-level permutations are expensive

• Insight: Byte permutations are common case• Especially true for software targeted ciphers• Let’s support those first

• Supported using a crossbar network

Page 13: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Efficient Permutations

2 3 1 4

Inputs

Outputs

Page 14: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Efficient Permutations

2 3 1 4

1

Inputs

Outputs

Page 15: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Efficient Permutations

2 3 1 4

2

3

1

4

Inputs

Outputs

N^2 = 16 connections

Page 16: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Efficient Permutations• Can also reduce size of network by splitting permutation over

multiple cycles

• In CESEL, 32-byte permutation in 4 cycles• Additional optimizations allow word (4-byte) permutations in

single cycle• “Best” network depends on area/perf. constraints

2 3 1 4

1

2

2 3 1 4

3

4

Cycle #1 Cycle #2

(N^2) / 2 = 8 Connections

Page 17: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Bitslicing

• Problem: Crossbar does not scale to bit permutations• 256-bit permutation requires 65k connections

• Insight: Vast majority of ciphers don’t require arbitrary permutations, just same permutation applied in parallel

• Use bitslicing to convert parallel bit permutations into word permutations• Commonly used in software crypto implementations• Very fast in hardware

Page 18: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Bitslicing Example1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

1 1 1 1 2 2 2 2 3 3 3 3 44 4 4

• Bitslicing groups bits by their position in each word

• After regrouping, word permutation is performed

• Finally, bits are regrouped into original positions

• Allows cheap parallel application of bit permutations

Page 19: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Bitslicing Example1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

1 1 1 1 2 2 2 2 3 3 3 3 44 4 4

• Bitslicing groups bits by their position in each word

• After regrouping, word permutation is performed

• Finally, bits are regrouped into original positions

• Allows cheap parallel application of bit permutations

1 1 1 13 3 3 3 2 2 2 244 4 4

Page 20: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Bitslicing Example1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4

1 1 1 1 2 2 2 2 3 3 3 3 44 4 4

• Bitslicing groups bits by their position in each word

• After regrouping, word permutation is performed

• Finally, bits are regrouped into original positions

• Allows cheap parallel application of bit permutations

1 1 1 13 3 3 3 2 2 2 244 4 4

3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2

Page 21: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Flexible SIMD

• Problem 2: Ciphers have a wide range of bitwidths for their “fundamental” operations• E.g 8-bits: AES, 32-bits: DES, ChaCha, 64-bits:

SHA512• Asymmetric algorithms require very large bitwidths

• Solution flexible SIMD• Allow large number of lane widths in ISA• CESEL supports between 8 and 256-bits

• Long bitwidths supported by mapping to smaller widths and executing over multiple cycles

Page 22: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Outline• Motivation

• Common Operations in Cryptography

• Features for Flexible Acceleration• Bitslicing and Permutation• Flexible SIMD

• CESEL Architecture

• Evaluation

Page 23: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

• In order SIMD architecture with 256-bit data path

• Designed to execute as co-processor

• Split into Frontend (fetch/decode) and Backend (execute/writeback)

• Number of lanes and lane width is flexible

CESEL Overview

Page 24: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

CESEL Frontend• No data dependent control flow• Not needed in vast majority of crypto code• Simplifies implementation

• Loops use hardware “Loop Stack”• Hardware keeps track of loop iteration +

boundary• Fetch stage can always predict next

instruction

• 16-bit instructions• Minimizes energy cost of instruction

fetches (significant in practice)

Page 25: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

CESEL Backend

• SIMD lane width is flexible• 32x8 bits up to 1x256 bits• Set by special instruction during

execution

• Internally, each operation mapped onto 16-bit execution units

• Fast byte permutation + bitslice

Page 26: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Programming Model

• Stream Based Coprocessor• Allows computing over very large inputs• E.g. perform integrity check of all of flash

• Main processor never needs to read keys

• Also allows easy integration with DMA• E.g. encrypt every ADC read

Page 27: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Outline• Motivation

• Common Operations in Cryptography

• Features for Flexible Acceleration• Bitslicing and Permutation• Flexible SIMD

• CESEL Architecture

• Evaluation

Page 28: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

• Implemented CESEL in 180nm TSMC

• Baseline was 32-bit RISC-V CPU optimized for similar area/cycle time

• Measured total system energy for fair comparison

Experimental Setup

Page 29: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Results

• 3.8-60x improvement vs RISC-V baseline

• ~20x more energy than dedicated ASIC

Type ASIC CESEL RISC-VBitsliced

AESSymmetric Encryption

7.07 (0.05x)

147.5 (1x)

9,036 (60x)

SHA2 Hash - 3,279 (1x)

12,360 (3.8x)

ChaCha Symmetric Encryption

15.3 (0.05x)

302.4 (1x)

2,021 (6.7x)

Curve25519

Key Exchange - 8,163

(1x)40,454 (5.0x)

RSA Signatures - 16,840 (1x)

87,401 (5.2x)

R-LWE Post-Quantum - 19,822

(1x)109,502 (5.5x)

Total estimated energy in nJ

Page 30: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Results• Does this improvement matter?

• Estimated energy savings in real IoT application• Sensor collecting/transmitting ~1KB over BLE• Curve25519 key exchange once per week

No Crypto

With CESEL

RISC-V Only

Application 3100 3100 3100Crypto 0 460 2300

Total 3100(1x)

3560(1.14x)

5400(1.74x)

Total estimated energy in nJ

Page 31: CESEL: Flexible Crypto Acceleration Talks/2018/Kiningham.pdfCrypto Background Symmetric Ciphers AES, DES, … Hash Functions SHA, BLAKE, … Asymmetric Ciphers RSA, ECC, … • Based

Recap

• IoT needs crypto acceleration + flexibility

• Our solution: CESEL• Wide SIMD + long word support• Special instructions (permute, bitslice)• No data-dependent control flow

• Significant energy savings compared to software• ~5x for most ciphers• 1.5x longer deployment time