
Cache Attacks and Countermeasures: the Case of AES

Dag Arne Osvik, Adi Shamir and Eran Tromer

Presented by Ophir Arbiv

ophirarb@post.tau.ac.il

Sources

[1] Cache Attacks and Countermeasures: the Case of AES (Extended Version), 2005, Dag Arne Osvik, Adi Shamir and Eran Tromer.

[2] theory.csail.mit.edu/~tromer/SKC2006/cache-skc06.ppt – Tromer’s lecture at MIT.

[3] www.l-sec.be/calit/present/AdiShamir.pdf – Adi Shamir’s lecture at the Weizmann Institute.

• 1997 - With DES becoming outdated, NIST announces a competition to design a successor.

• Evaluation criteria - security, cost, and algorithm & implementation characteristics.

• 21 algorithms were received. In 2000, NIST selected Rijndael as the proposed AES algorithm; the standard (FIPS 197) was published in 2001.

• Rijndael was proposed by Dr. Vincent Rijmen and Dr. Joan Daemen from Belgium.

• Properties:

– Symmetric

– Block Cipher

– Based on finite-field mathematics

– 128-bit data block; key sizes of 128, 192 or 256 bits.

– Resistant to known attacks.

AES – Advanced Encryption Standard

Source: http://klabs.org/mapld05/presento/103_swankoski_p.ppt

AES Algorithm

• The mathematical description of the algorithm: [equation not recoverable from the slide]

Efficient Implementation

• Originally proposed in the Rijndael spec, and now widely used.

• Uses pre-computed table lookups.

Tables: $T_0, T_1, T_2, T_3$ and $T_0^{(10)}, T_1^{(10)}, T_2^{(10)}, T_3^{(10)}$

Round implementation:

• Each round - 16 table lookups, 16 XORs, and 12 shifts.

• Tables occupy 4 KB (×2).
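The round formula itself did not survive on this slide. As a sketch, the standard table-based round can be written as follows, using the notation of the paper [1]: $x_i^{(r)}$ is byte $i$ of the state at the start of round $r$, and $K_j^{(r)}$ is the $j$-th 32-bit word of the round key (these symbols are taken from the paper, not from the slide text). Each line performs 4 lookups and 4 XORs, which gives the 16 lookups and 16 XORs per round; the four tables hold 256 entries of 4 bytes each, i.e. 4 × 256 × 4 bytes = 4 KB.

```latex
\begin{aligned}
(x_0^{(r+1)}, x_1^{(r+1)}, x_2^{(r+1)}, x_3^{(r+1)}) &\leftarrow T_0[x_0^{(r)}] \oplus T_1[x_5^{(r)}] \oplus T_2[x_{10}^{(r)}] \oplus T_3[x_{15}^{(r)}] \oplus K_0^{(r+1)} \\
(x_4^{(r+1)}, x_5^{(r+1)}, x_6^{(r+1)}, x_7^{(r+1)}) &\leftarrow T_0[x_4^{(r)}] \oplus T_1[x_9^{(r)}] \oplus T_2[x_{14}^{(r)}] \oplus T_3[x_3^{(r)}] \oplus K_1^{(r+1)} \\
(x_8^{(r+1)}, x_9^{(r+1)}, x_{10}^{(r+1)}, x_{11}^{(r+1)}) &\leftarrow T_0[x_8^{(r)}] \oplus T_1[x_{13}^{(r)}] \oplus T_2[x_2^{(r)}] \oplus T_3[x_7^{(r)}] \oplus K_2^{(r+1)} \\
(x_{12}^{(r+1)}, x_{13}^{(r+1)}, x_{14}^{(r+1)}, x_{15}^{(r+1)}) &\leftarrow T_0[x_{12}^{(r)}] \oplus T_1[x_1^{(r)}] \oplus T_2[x_6^{(r)}] \oplus T_3[x_{11}^{(r)}] \oplus K_3^{(r+1)}
\end{aligned}
```

In the first round $x_i^{(0)} = p_i \oplus k_i$, so every first-round lookup index depends directly on one plaintext byte and one key byte.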

• During AES selection, only branch statements, arithmetic, and data-dependent shifts were considered vulnerable.

• The proposed algorithms were widely analyzed.

• Apparently because it uses only table lookups, XORs, and shifts, NIST declared Rijndael “not vulnerable to timing attacks.”

• 2003 - NSA declared that AES-128 can be used to protect US Government data up to the Secret level; Top Secret data requires AES-192 or AES-256.

• No direct attacks are known as of today.

• Expected to be the standard for 20+ years.

AES - summary

Side Channels

[Diagram: Plaintext → Cipher → Ciphertext]

• Any observable information emitted as a byproduct of the physical implementation of the cryptosystem.

Side Channels

Source: www.stanford.edu/~jbonneau/AES_side_channel.ppt

Examples of side channels:

• Power consumption (simple, differential…)

• Time

• Heat

• Acoustic noise (keyboards…)

• Cache

• Fault (power glitch, jitter…)

• Electromagnetic radiation

• Visual

Side Channels

Annual speed increase: CPU core 60% (until recently); main memory 7-9%.

Typical latency: CPU core 0.3 ns; main memory 50-150 ns → timing gap (bridged by the cache).

Why Cache Analysis?

• The cache is a shared resource.

=> cache state affects, and is affected by, all processes.

=> possible crosstalk between processes.

• Process memory is usually protected but…

• Information about the memory access patterns of other processes is leaked.

• Cache attacks are pure software attacks.

• Very cheap.

• A process with no special privileges and (in some variants) no interaction with the cryptographic code can attack the cryptographic code.

Cache Attacks

[Diagram: main memory divided into memory blocks (B bytes each); cache divided into cache sets, each holding W cache lines (B bytes each)]

How Does the Cache Work?

Memory Access

• The cache holds copies of aligned blocks of B bytes of main memory (memory blocks).

• When a memory access instruction is processed, the memory cell is first looked up in the cache.

• On a cache miss, the full memory block is copied into one of the W cache lines of the appropriate cache set (out of S possible sets).
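The mapping from an address to its memory block and cache set can be sketched as below. This is a minimal model, assuming the simple "block number modulo S" indexing that is common in set-associative caches; real processors may index differently. The class name and the example geometry (64-byte lines, 64 sets, 8 ways) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    line_size: int  # B: bytes per cache line / memory block
    num_sets: int   # S: number of cache sets
    ways: int       # W: cache lines per set

    def block_of(self, address: int) -> int:
        """Index of the aligned B-byte memory block containing `address`."""
        return address // self.line_size

    def set_of(self, address: int) -> int:
        """Cache set that the block containing `address` maps to (assumed modulo indexing)."""
        return self.block_of(address) % self.num_sets

    def capacity(self) -> int:
        """Total cache size in bytes: S sets x W lines x B bytes."""
        return self.num_sets * self.ways * self.line_size


# Hypothetical geometry: 64 sets x 8 ways x 64 bytes = 32 KB.
geometry = CacheGeometry(line_size=64, num_sets=64, ways=8)
print(geometry.set_of(0x1040), geometry.capacity())
```

Two addresses compete for the same W cache lines exactly when `set_of` returns the same value for both, which is what lets one process disturb the cached data of another.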

What Does a Cached Table Look Like?

Notation

• δ – the cache line size B divided by the size of each table entry (usually 64/4 = 16).

• $\langle y \rangle$ = the memory block of index $y$ in $T_\ell$.

$\langle y \rangle = \langle z \rangle$ iff, when used as lookup indices into the same table $T_\ell$, $y$ and $z$ would cause access to the same memory block.

• $Q_k(p,\ell,y) = 1$ iff the AES encryption of the plaintext $p$ under the encryption key $k$ accesses the memory block of index $y$ in $T_\ell$ at least once (during the 10 rounds).
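The notation can be made concrete with a few lines of Python. This is only an illustration, assuming 64-byte cache lines, 4-byte table entries, and tables aligned to a cache-line boundary; the `accesses` trace format (a list of `(table, index)` pairs) is a hypothetical stand-in for what one encryption touches.

```python
LINE_SIZE = 64                    # B, assumed cache line size in bytes
ENTRY_SIZE = 4                    # assumed size of one table entry in bytes
DELTA = LINE_SIZE // ENTRY_SIZE   # delta = 16 table entries per memory block


def block(y: int) -> int:
    """<y>: the memory block (within its table) that holds entry y."""
    return y // DELTA


def q(accesses, table: int, y: int) -> int:
    """Q(p, l, y) for one encryption, given its (table, index) access trace."""
    return int(any(t == table and block(i) == block(y) for t, i in accesses))


# Hypothetical trace: the encryption read T0[0x23] and T1[0x80].
trace = [(0, 0x23), (1, 0x80)]
print(q(trace, 0, 0x2A), q(trace, 0, 0x30), q(trace, 1, 0x8F))
```

With δ = 16, `block(y)` is simply the high nibble of `y`, which is why the attacks below recover high nibbles first.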

Cache Attacks on AES

• The efficient implementation of the algorithm has a big weakness:

The lookup addresses depend strongly on the encryption key (the secret).

• Therefore, by knowing which memory cells were accessed we can extract the key (for example, by observing the memory bus).

• Usually the attacker has no access to the bus, and memory is partitioned and protected by the OS.

• The solution: the cache is a shared resource through which we can learn about the memory access patterns of other processes.

Synchronous Attacks

• The plaintext or ciphertext is known.

• The attacker can operate synchronously with the encryption (on the same processor).

• Examples:

– Sending data packets through a secure channel in a VPN.

– Linux’s dm-crypt and cryptoloop services.

• The Attack Scheme

1. Obtain a set of random samples $M_k(p,\ell,y)$ of the predicate $Q_k(p,\ell,y)$.

2. Perform off-line cryptanalysis:

a) Guess small parts of the key.

b) Use the guess to predict memory accesses.

c) Check whether the predictions are consistent with the collected data.

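One Round Attack

• Consider one of the memory accesses in the 1st round: $T_0[p_0 \oplus k_0]$.

• Given a candidate value $k'_0$ and samples of $Q(p,\ell,y)$:

– The useful samples are those that fulfill $\langle p_0 \oplus k'_0 \rangle = \langle y \rangle$.

– If $\langle k'_0 \rangle = \langle k_0 \rangle$, then for all useful samples:

• $\langle p_0 \oplus k_0 \rangle = \langle p_0 \oplus k'_0 \rangle = \langle y \rangle$, so

• $T_0[p_0 \oplus k_0]$ accesses the memory block of $y$ => $Q(p,\ell,y)=1$.

– Otherwise:

• $\langle p_0 \oplus k_0 \rangle \neq \langle p_0 \oplus k'_0 \rangle = \langle y \rangle$ => $Q(p,\ell,y)=0$ …

• … but there are 35 more “random” accesses to $T_0$, so $Q(p,\ell,y)=0$ is observed only with probability $(1-1/16)^{35} \approx 0.104$.

=> A few hundred (!) random samples suffice to eliminate all bad candidates.

=> The high nibble ($\log_2(256/\delta)$ bits) of every key byte is extracted (64 bits in total).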
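The elimination argument can be tried out on an idealized model. The sketch below does not run AES: it assumes δ = 16, models the 35 other lookups into T0 as uniformly random blocks, and assumes noise-free measurements, so it illustrates the counting argument rather than the authors' experiment. The function names, the seed, and the secret byte 0xA7 are hypothetical.

```python
import random

DELTA = 16              # table entries per memory block (assumed)
BLOCKS = 256 // DELTA   # memory blocks per table


def simulate_sample(k0, rng):
    """One idealized measurement: (p0, set of T0 blocks the encryption touched)."""
    p0 = rng.randrange(256)
    accessed = {(p0 ^ k0) // DELTA}  # the first-round lookup T0[p0 ^ k0]
    # The 35 remaining T0 lookups, modeled here as uniformly random blocks.
    accessed.update(rng.randrange(BLOCKS) for _ in range(35))
    return p0, accessed


def surviving_candidates(samples):
    """High-nibble guesses for k0 that are consistent with every sample."""
    survivors = []
    for cand in range(BLOCKS):
        # A guess predicts that block <p0 ^ k'0> is accessed in every encryption.
        if all((p0 ^ (cand * DELTA)) // DELTA in accessed
               for p0, accessed in samples):
            survivors.append(cand)
    return survivors


rng = random.Random(2005)
secret_k0 = 0xA7
samples = [simulate_sample(secret_k0, rng) for _ in range(300)]
print(surviving_candidates(samples))   # guesses left after 300 samples
print((1 - 1 / 16) ** 35)              # chance one sample exposes a wrong guess
```

In this model each sample exposes a given wrong guess with probability about 0.104, so a wrong guess survives 300 samples with probability about 0.896^300, which is negligible; only the true high nibble should remain.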

Full Key Extraction

• With a straightforward method, we managed to narrow down each byte of the key to δ possibilities (in the common case this means extracting half the key: 64 bits).

• This is all the information available from 1st-round accesses.

• By moving to the 2nd round and taking advantage of the non-linearity of the S-box, we can extract the full key!

• These equations for the 2nd round are easily derived from the Rijndael specification; for example, for byte 2 of the state after the 1st round:

$$x_2^{(1)} = s(p_0 \oplus k_0) \oplus s(p_5 \oplus k_5) \oplus 2 \bullet s(p_{10} \oplus k_{10}) \oplus 3 \bullet s(p_{15} \oplus k_{15}) \oplus s(k_{15}) \oplus k_2$$

{ $s(\cdot)$ denotes the Rijndael S-box function and $\bullet$ denotes multiplication over GF(256). }

• $x_2^{(1)}$ is used as an index to $T_2$.

• The only relevant unknowns in the index are the low nibbles of $k_0, k_5, k_{10}$ and $k_{15}$ ($2^{16}$ candidates).

• A candidate can be tested as before:

– Predict this lookup according to the guess $\{k'_0, k'_5, k'_{10}, k'_{15}\}$ (the low nibble of $k_2$ is irrelevant).

– Identify the useful samples, i.e., those where $y$ is in the same memory block as the prediction.

– Check whether $Q(p,\ell,y)=1$ for all useful samples.

• There are 3 more accesses of this special form, with disjoint sets of relevant low nibbles.

=> Full key recovery using ~2000 random samples.
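To make the second-round prediction concrete, the sketch below evaluates the index $x_2^{(1)} = s(p_0 \oplus k_0) \oplus s(p_5 \oplus k_5) \oplus 2 \bullet s(p_{10} \oplus k_{10}) \oplus 3 \bullet s(p_{15} \oplus k_{15}) \oplus s(k_{15}) \oplus k_2$ for a full 16-byte plaintext and key. The S-box is generated from its definition (inverse in GF(256) followed by the affine map) rather than typed in. The function names are hypothetical; the key and plaintext are the example vectors from FIPS 197, Appendix B, where byte 2 of the state at the start of round 2 is 0x7f.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(256) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Rijndael S-box: multiplicative inverse followed by the affine transform."""
    sbox = []
    for value in range(256):
        # Brute-force inverse; 0 has no inverse and maps to 0 by convention.
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        out = inv
        for _ in range(4):  # XOR together inv and its four left rotations
            inv = ((inv << 1) | (inv >> 7)) & 0xFF
            out ^= inv
        sbox.append(out ^ 0x63)
    return sbox


SBOX = build_sbox()


def x2_after_round1(p: bytes, k: bytes) -> int:
    """Index used for the second-round lookup into T2."""
    s = SBOX
    return (s[p[0] ^ k[0]] ^ s[p[5] ^ k[5]]
            ^ gf_mul(2, s[p[10] ^ k[10]]) ^ gf_mul(3, s[p[15] ^ k[15]])
            ^ s[k[15]] ^ k[2])


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
index = x2_after_round1(plaintext, key)
print(hex(index), "-> memory block", index // 16)
```

An attacker who already knows the high nibbles would run the same function over the $2^{16}$ guesses for the low nibbles of $k_0, k_5, k_{10}, k_{15}$ and compare the predicted memory block against the measurements.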

Two Round Attack

• How do we obtain the measurements $M_k(p,\ell,y)$ of the predicate $Q_k(p,\ell,y)$?

• Inter-process crosstalk can be exploited in two ways:

– Effect of the cache on the encryption (timing).

– Effect of the encryption on the cache.

Measurement Methods

1. Make sure the tables are cached

2. Evict one cache set

3. Time an encryption and see if it’s slow

Measurement Method 1: Evict + Time

Results

• Weakness of this method:

– It relies on timing the triggered encryption

=> it is very sensitive to variations in the operation (noise due to scheduling, branches, cache contention, etc.).

• The authors were able to extract the key only from an artificial service (using the OpenSSL libraries), not from real services.

1. Completely evict tables from cache

2. Trigger a single encryption

3. Access attacker memory again and see which cache sets are slow

Measurement Method 2: Prime + Probe

• Tries to discover the set of memory blocks read by the encryption a posteriori, by examining the state of the cache after the encryption.

Results

• Yields more information ($4 \cdot 256/\delta$ samples) from a single encryption.

• Not a timing attack! The attacker is timing a simple operation performed by itself!

• Insensitive to timing variance in the encryption code path (crucial for effective attacks on complicated systems).

• No real need to trigger the encryption: the attacker can wait until it happens by itself…
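The Prime + Probe steps can be followed on a toy cache model. The sketch below is a simulation, not a working attack: it assumes an LRU set-associative cache with modulo indexing, counts misses instead of measuring time, and uses hypothetical addresses (`ATTACKER_BASE`, `TABLE_BASE`) and a hypothetical victim that reads three entries of one table.

```python
from collections import OrderedDict

NUM_SETS, WAYS, LINE = 64, 4, 64   # assumed toy geometry (S, W, B)
ATTACKER_BASE = 1 << 20            # hypothetical attacker buffer, maps to set 0
TABLE_BASE = 0x4000                # hypothetical line-aligned address of table T0


class ToyCache:
    """Set-associative cache with LRU replacement (a simplification of real hardware)."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, address: int) -> bool:
        """Touch `address`; return True on a hit and False on a miss."""
        blk = address // LINE
        lines = self.sets[blk % NUM_SETS]
        hit = blk in lines
        if hit:
            lines.move_to_end(blk)
        else:
            if len(lines) >= WAYS:
                lines.popitem(last=False)  # evict the least recently used line
            lines[blk] = True
        return hit


def attacker_address(cache_set: int, way: int) -> int:
    return ATTACKER_BASE + (way * NUM_SETS + cache_set) * LINE


def prime(cache):
    """Step 1: fill every cache set with attacker data, evicting the tables."""
    for s in range(NUM_SETS):
        for w in range(WAYS):
            cache.access(attacker_address(s, w))


def probe(cache):
    """Step 3: re-read attacker data; sets with misses were touched by the victim."""
    touched = set()
    for s in range(NUM_SETS):
        misses = sum(not cache.access(attacker_address(s, w)) for w in range(WAYS))
        if misses:
            touched.add(s)
    return touched


cache = ToyCache()
prime(cache)
for i in (0x23, 0x8F, 0xF0):          # step 2: victim reads T0[i], 4-byte entries
    cache.access(TABLE_BASE + 4 * i)
evicted_sets = probe(cache)
print(sorted(evicted_sets))           # prints [2, 8, 15]
```

The reported sets 2, 8 and 15 are exactly the memory blocks of the three table indices (their high nibbles), which is the information the predicate Q captures. On real hardware the "miss" test is replaced by timing the attacker's own memory reads.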

Synchronous Attacks - summary

• For a known plaintext and a synchronous attacker.

• Two measurement methods.

• Results:

– OpenSSL libraries on Athlon 64:

• Evict + Time – 500,000 encryptions. (why?)

• Prime + Probe – 300 encryptions (16K on P4E).

– Real Linux dm-crypt:

• Prime + Probe – 800 write operations – 65 ms + 3 sec of offline analysis.

• Variants …

Asynchronous Attack

• Someone runs encryption computations using a secret key.

• The attacker process runs on the same CPU at (roughly) the same time.

• Assume the plaintext/ciphertext has a non-uniform (conditional) distribution:

– English

– Formatted data

– Headers

– Ciphertext gleaned from the wire

• Examples: just about any use of crypto on a multi-user system.

Finding the key

• Compare two distributions:

– Measured memory access statistics.

– Predicted memory access statistics, under the given plaintext distribution and the key hypothesis.

• Find the key that yields the best correlation.
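The comparison of distributions can be sketched on an idealized model. The code below does not run AES and does not reproduce the authors' statistics: it assumes δ = 16, looks only at the first-round lookup into T0, models all other lookups as uniform noise, and uses a plain dot product as the correlation score. The corpus, seed, secret byte and function names are hypothetical.

```python
import random
from collections import Counter

BLOCKS = 16  # memory blocks per table, assuming delta = 16


def predicted_hist(corpus: bytes, guess: int):
    """Predicted distribution of the block <p0 ^ k0> if the high nibble of k0 is `guess`."""
    counts = Counter((p >> 4) ^ guess for p in corpus)
    return [counts[b] / len(corpus) for b in range(BLOCKS)]


def measure(k0: int, corpus: bytes, encryptions: int, rng):
    """Simulated access statistics; the attacker never sees the plaintexts themselves."""
    counts = Counter()
    for _ in range(encryptions):
        p0 = rng.choice(corpus)           # plaintext byte drawn from English-like text
        counts[(p0 ^ k0) >> 4] += 1       # first-round lookup T0[p0 ^ k0]
        for _ in range(35):               # other lookups, modeled as uniform noise
            counts[rng.randrange(BLOCKS)] += 1
    total = sum(counts.values())
    return [counts[b] / total for b in range(BLOCKS)]


def best_guess(measured, corpus: bytes) -> int:
    """High-nibble hypothesis whose predicted histogram correlates best with the data."""
    def score(guess):
        return sum(m * q for m, q in zip(measured, predicted_hist(corpus, guess)))
    return max(range(BLOCKS), key=score)


rng = random.Random(7)
corpus = b"the quick brown fox jumps over the lazy dog"
secret_k0 = 0x5C
measured = measure(secret_k0, corpus, 5000, rng)
print(best_guess(measured, corpus))
```

The approach works because English-like text concentrates on a few high nibbles, so XORing with the wrong key nibble visibly shifts the predicted histogram away from the measured one.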

Countermeasures

• The authors consider numerous countermeasures, e.g.:

– Avoiding Memory Accesses

– Alternative Lookup Tables

– Data-Oblivious Memory Access Pattern

– Cache State Normalization and Process Blocking

– Disabling Cache Sharing

– Static or Disabled Cache

– Dynamic Table Storage

– Hiding the Timing

• None of them solves the problem completely. Some are architecture- or application-dependent, or require changes to the system.

• None is at once secure, efficient (or cheap), and generic.

=> Case-specific solutions, probably a combination of the methods.

Thank you!

Questions?

Homework

1. What is the difference between the Evict+Time and Prime+Probe measurement methods?

2. In the case of known cipher-text, how would the attack change?

(hint: can be more efficient – see paper)

3. Why is a first-round synchronous attack able to extract only half the key bits? (on a δ=16 platform)

4. Does the addition of random delay to the encryption algorithm improve the immunity against synchronous attacks? Why?
