
SRAM-FPGA Implementation of Masked S-Box Based DPA countermeasure for AES

Najeh Kamoun (1), Lilian Bossuet (2), Adel Ghazel (1)

(1) CIRTA’COM Lab, SUP’COM, Cité technologique des Communications El Ghazala, Ariana, Tunisia (2) IMS Lab, ENSEIRB, Bordeaux, FRANCE

e-mail : [email protected], [email protected], [email protected]

Abstract: This paper presents the FPGA implementation and overhead evaluation of an algorithmic DPA countermeasure for the Advanced Encryption Standard (AES). To reduce the implementation overhead, the masked compact S-Box proposed by Canright was chosen to implement the DPA countermeasure on an SRAM FPGA. The results show that the secured AES IP increases the number of slices by 60.1% and decreases the frequency by 4%.

Keywords: AES, DPA, Masked S-Box, SRAM FPGA.

I. INTRODUCTION

The expanding use of digital communications, electronic financial transactions and digital signature applications has raised demanding security issues: the secrecy, integrity and non-repudiation of exchanged confidential information must be guaranteed. In this context, cryptographic algorithms and devices are fundamental building blocks of every secure communication system. Symmetric ciphers are widely used because they are simple and efficient. Since 2001, the Advanced Encryption Standard (AES) [1] has been the symmetric block cipher standard of the National Institute of Standards and Technology (NIST). It is a secure encryption algorithm, used not only for U.S. government documents but also in electronic commerce.

Besides executing a cipher algorithm, a cryptographic device must also physically protect the secret data it manipulates during execution. Traditionally, cryptanalysis has targeted algorithms, but in recent years the hardware implementation has come to be considered a fundamental part of the security evaluation of a cryptographic design. The new challenges in this field are the so-called side-channel attacks [2], which exploit information leaking from the cryptographic device through physical phenomena such as power consumption, electromagnetic radiation and execution time. These attacks monitor a physical quantity and apply statistical analysis to extract confidential information from extremely noisy signals.

Many research works have focused on Differential Power Analysis (DPA) [2-6] and have proposed countermeasure techniques at different levels: algorithmic, system and logic [7-8].

In this paper we focus on an algorithmic DPA countermeasure for AES. We first choose a low-complexity masking technique, then design its FPGA implementation, and finally analyze the hardware complexity and timing performance of the proposed solution.

The present paper is organized as follows. Section II describes the AES algorithm and DPA principles. The masked S-Box based DPA countermeasure is chosen in section III. Section IV presents the FPGA implementation details and results. Section V concludes the paper.

II. AES AND DPA PRINCIPLES

A. AES Algorithm processing

The Advanced Encryption Standard (AES) [1] is a symmetric cipher which encrypts and decrypts 128-bit data blocks with a key size of 128, 192 or 256 bits. In this paper, we are interested in the implementation of the AES encryption scheme with a 128-bit key. The 16-byte input plain text Di (i∈[1..16]) is arranged in a four-by-four byte matrix called the state, as indicated in Table I. All the transformations in AES operate on the state.

Table I. Data arrangement in AES state

D1 D5 D9 D13

D2 D6 D10 D14

D3 D7 D11 D15

D4 D8 D12 D16
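The column-wise arrangement of Table I can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `bytes_to_state` is hypothetical and not part of the original design, and the bytes D1..D16 are assumed to arrive as a flat list.

```python
def bytes_to_state(block):
    """Arrange 16 input bytes D1..D16 column by column into a 4x4 state.

    Hypothetical helper for illustration; state[row][col] follows Table I.
    """
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    # Byte D(4*col + row + 1) sits at flat index 4*col + row.
    return [[block[4 * col + row] for col in range(4)] for row in range(4)]


if __name__ == "__main__":
    state = bytes_to_state(list(range(1, 17)))  # D1..D16 encoded as 1..16
    for row in state:
        print(row)
```

The first row of the resulting state holds D1, D5, D9 and D13, matching the first row of Table I.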

The main AES cipher transformations are:

AddRoundKey : A round key is added to the state matrix using the XOR operation. The round keys are derived from the cipher key using the Key Expansion algorithm.

ShiftRows : The second row of the state matrix is cyclically shifted by one byte to the left, the third row by two bytes and the fourth row by three bytes. The first one remains unchanged. The ShiftRows transformation increases the "diffusion" properties of AES.

SubBytes : Each byte of the state matrix is substituted using a bijective Substitution Box (S-Box for short). The S-Box is based on the non-linear inversion in the finite field GF(2^8) and a bitwise affine transformation. The S-Box step increases the "confusion" properties of AES.

MixColumns : This step is a linear transformation, which increases the diffusion properties of AES. Equation (1) describes the relation between the output column bytes ci and the input column bytes bi.

\[
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
\qquad (1)
\]

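The matrix elements 03, 02 and 01 correspond to the polynomials x + 1, x and 1 in GF(2^8). Each column is treated as a polynomial with coefficients in GF(2^8) and multiplied modulo the polynomial x^4 + 1; the byte multiplications themselves are carried out in GF(2^8), modulo the AES irreducible polynomial x^8 + x^4 + x^3 + x + 1.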

Key Expansion: The key expansion derives the round keys from the cipher key.
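The linear transformations listed above can be sketched in Python as follows. This is a minimal illustrative model, not the hardware design of the paper; the function names are hypothetical, and the state is assumed to be a list of four rows as in Table I.

```python
def xtime(a):
    """Multiply a byte by 02 (the polynomial x) in GF(2^8)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
    return a


def add_round_key(state, round_key):
    """XOR the state with a round key of the same 4x4 shape."""
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]


def shift_rows(state):
    """Rotate row r cyclically to the left by r bytes (row 0 unchanged)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_column(b):
    """Apply the matrix of equation (1) to one column [b0, b1, b2, b3]."""
    b0, b1, b2, b3 = b

    def mul3(x):
        return xtime(x) ^ x  # 03 = 02 + 01

    return [
        xtime(b0) ^ mul3(b1) ^ b2 ^ b3,
        b0 ^ xtime(b1) ^ mul3(b2) ^ b3,
        b0 ^ b1 ^ xtime(b2) ^ mul3(b3),
        mul3(b0) ^ b1 ^ b2 ^ xtime(b3),
    ]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([hex(c) for c in mix_column(column)])
```

Multiplication by 02 and 03 is built from a single shift-and-reduce primitive, which is also how MixColumns is commonly reduced to XOR logic in hardware.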

Figure 1 describes all the transformations of an AES encryption. The number of rounds is 10 for the 128-bit key.

Figure 1. Basic transformations of an AES encryption (blocks: In, Key, Key Expansion, AddRoundKey, SubBytes, ShiftRows, MixColumns, Out)

B. DPA principle

Power analysis attacks [2] generally require a hypothetical model of the device under attack in order to predict its power consumption. For example, FPGAs are usually made of CMOS gates, and the switching activity of these gates is responsible for the main component of the power consumption. For a single CMOS gate, the dynamic power consumption PD, given by equation (2), is the product of the gate load capacitance CL, the square of the supply voltage VDD, the probability ptr of an output transition from 0 to 1, and the clock frequency F [4].

\[ P_D = C_L \, V_{DD}^2 \, p_{tr} \, F \qquad (2) \]

Equation (2) shows that the power consumption PD depends on the processed data through the transition probability ptr. Consequently, an attacker may estimate the device power consumption at time t from the Hamming weight of the corresponding output inside the device. Based on this simple observation, power analysis attacks have been applied to numerous algorithms and devices, including FPGAs [3, 6]. We focus on attacks targeting the AES algorithm. The attacker proceeds as follows. First, he targets the most significant byte of the key, KMS. Then, for N different plain texts, he predicts the Hamming weight at the S-Box output for every possible value of KMS. The result of this prediction is an N×256 selected prediction matrix MP.
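The construction of the prediction matrix MP can be sketched in Python as below. This is an illustrative model under stated assumptions, not the authors' tooling: the S-Box is generated from its definition (inversion in GF(2^8) followed by the affine transformation), and the helper names `hamming_weight` and `prediction_matrix` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254; 0 is mapped to 0 as in AES."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def affine(x):
    """Bitwise affine transformation of the AES S-Box (constant 0x63)."""
    out = 0
    for i in range(8):
        bit = (x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
        bit ^= (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]


def hamming_weight(value):
    """Number of bits set to 1 in a byte."""
    return bin(value).count("1")


def prediction_matrix(plaintext_bytes):
    """Return the N x 256 matrix MP: one row per plain text byte,
    one column per key-byte hypothesis."""
    return [[hamming_weight(SBOX[p ^ k]) for k in range(256)]
            for p in plaintext_bytes]


if __name__ == "__main__":
    mp = prediction_matrix([0x00, 0x53, 0xA7])
    print(len(mp), "rows x", len(mp[0]), "columns")
```

Each row corresponds to one of the N plain texts (here reduced to the targeted byte), and each column to one of the 256 hypotheses on the key byte.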

In the second part of the attack, the adversary lets the circuit encrypt the same N plain texts with a fixed key and measures the power consumption of the device while the chip executes the targeted operation. This results in an N×1 measurement vector VM.

Finally, the attacker computes the correlation between the measurement vector and all the columns of the selected prediction matrix MP. If the attack is successful, it is expected that only one value, corresponding to the correct key bits, leads to a high correlation. An efficient way to compute the correlation is to use the Pearson correlation coefficient, which can be expressed as indicated in equation (3).

\[
C(V_M, M_P(c_i)) = \frac{E(V_M \cdot M_P(c_i)) - E(V_M) \cdot E(M_P(c_i))}{\sqrt{\mathrm{var}(V_M) \cdot \mathrm{var}(M_P(c_i))}}
\qquad (3)
\]

In this expression, E(VM) denotes the mean of the measurement set VM, var(VM) its variance, and MP(ci) the i-th column of the matrix MP. More explanations of the power analysis attack principles can be found in previous publications [5].
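The correlation step can be sketched in Python as follows. This is a minimal sketch on synthetic numbers, not measured traces; the function names `pearson` and `best_hypothesis` are hypothetical, and the matrix is assumed to be stored as a list of rows.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean E(.) of a list of numbers."""
    return sum(values) / len(values)


def pearson(vm, column):
    """Pearson coefficient between the measurement vector and one column of MP."""
    e_v, e_m = mean(vm), mean(column)
    e_vm = mean([v * m for v, m in zip(vm, column)])
    var_v = mean([v * v for v in vm]) - e_v ** 2
    var_m = mean([m * m for m in column]) - e_m ** 2
    if var_v <= 0 or var_m <= 0:
        return 0.0  # a constant vector carries no information
    return (e_vm - e_v * e_m) / sqrt(var_v * var_m)


def best_hypothesis(vm, mp):
    """Return (index, correlation) of the column of MP that best matches VM."""
    n_columns = len(mp[0])
    scores = [pearson(vm, [row[i] for row in mp]) for i in range(n_columns)]
    best = max(range(n_columns), key=lambda i: scores[i])
    return best, scores[best]


if __name__ == "__main__":
    vm = [1.0, 2.0, 3.0, 4.0]            # synthetic "measurements"
    mp = [[4, 1, 2], [3, 2, 2], [2, 3, 2], [1, 4, 2]]
    print(best_hypothesis(vm, mp))
```

In a real attack MP would have 256 columns, and the index returned would be the key-byte hypothesis with the highest correlation.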

III. ALGORITHMIC DPA COUNTERMEASURE CHOICE

Power analysis attacks are feasible because the power consumption of a cryptographic device depends on the intermediate values of the cryptographic algorithm it executes. Hence, the idea of countermeasures is to make the power consumption independent of those intermediate values. Security is only as strong as the weakest link; therefore cryptographic designs should be protected at all levels of abstraction: algorithm, system and logic. Each abstraction layer raises specific modeling, design and implementation issues that must be covered for secure system operation [6]. Figure 2 explains the possible actions at each level.

System-level countermeasures cannot be integrated inside the design, so they are easily located by the adversary. Their effect can be removed by smoothing the power consumption. Moreover, they consume a significant part of the chip power.

Figure 2. Classification of security levels. Countermeasure levels and the element each one acts on: logic level (logic gates), system level (power supply), algorithmic level (random mask).

Algorithmic countermeasures can be restricted to masking schemes. The basic idea of masking, explained in references [7, 8], is to randomize the intermediate results that are produced during the computation of a cryptographic algorithm.

We take the case of an AES cipher. All the steps in a round of AES are affine, except for the Galois field inversion sub-step of the SubBytes step. For the other steps, the calculation of the mask correction is linear, so an additive mask is most convenient. Some previous research works have suggested switching to a multiplicative mask for the Galois inversion step [8], but one inescapable weakness is that a zero data byte is left unmasked by multiplication. Oswald et al. [9] suggested the use of additive and multiplicative masks for the SubBytes function with the tower field representation. In this representation, inversion in GF(2^8) involves several multiplications and one inversion in the subfield GF(2^4), which in turn involves multiplications and an inversion in GF(2^2). In the sub-subfield GF(2^2), inversion is identical to squaring, and so is linear (over GF(2)). They showed how to compute the mask correction for the tower field approach. Many of the correction terms involve multiplications in subfields, and they mention how some of these multiplications can be eliminated through clever re-use of parts of the input mask for the output.
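Three of the properties stated above can be checked numerically with a short Python sketch. It is illustrative only: the helper names are hypothetical, multiplication by 02 stands in for a generic linear step, and GF(2^2) is represented in a polynomial basis modulo x^2 + x + 1, which is an assumption of this sketch (the compact S-Box designs may use a different basis).

```python
def gf256_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf4_mul(a, b):
    """Multiply two elements (0..3) of GF(2^2) modulo x^2 + x + 1."""
    result = 0
    for _ in range(2):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0b100:
            a ^= 0b111
        b >>= 1
    return result


def additive_mask_commutes_with_linear_step():
    """f(x ^ m) == f(x) ^ f(m) for the linear map f = multiplication by 02."""
    return all(gf256_mul(x ^ m, 2) == gf256_mul(x, 2) ^ gf256_mul(m, 2)
               for x in range(256) for m in range(256))


def zero_byte_defeats_multiplicative_mask():
    """A zero data byte stays zero whatever the non-zero mask is."""
    return all(gf256_mul(0, m) == 0 for m in range(1, 256))


def gf4_inversion_is_squaring():
    """In GF(2^2), a * a^2 == 1 for every non-zero a, so a^-1 == a^2."""
    return all(gf4_mul(a, gf4_mul(a, a)) == 1 for a in range(1, 4))


def gf4_squaring_is_linear():
    """(a ^ b)^2 == a^2 ^ b^2 over GF(2)."""
    return all(gf4_mul(a ^ b, a ^ b) == gf4_mul(a, a) ^ gf4_mul(b, b)
               for a in range(4) for b in range(4))


if __name__ == "__main__":
    print(additive_mask_commutes_with_linear_step(),
          zero_byte_defeats_multiplicative_mask(),
          gf4_inversion_is_squaring(),
          gf4_squaring_is_linear())
```

The first check is why the mask correction is cheap for the affine steps, the second is the zero-value weakness of multiplicative masking, and the last two explain why the innermost inversion of the tower field needs no non-linear mask correction.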

Canright and Batina [10] proposed to incorporate this masking approach into the compact S-Box introduced in [11]. Applying the optimizations of the unmasked S-Box to the mask correction terms results in a more compact masked S-Box. The S-Box layer occupies 60% of the area in an efficient hardware implementation of the AES [1]. The area overhead is then above 120% if 16 S-Boxes are used. Mentens et al. [12] reduce this figure to 20% by cutting the number of S-Box functions to four, without taking into consideration the area of the True Random Number Generator (TRNG) [13]. Note that Mentens et al. used the Satoh S-Box [12].

Algorithmic countermeasures are global solutions and can be adapted to various technologies. They require a true random number generator for the mask function. This generator occupies a significant area, and the robustness of the countermeasure depends on the quality of its randomness [9].

IV. SRAM FPGA IMPLEMENTATION AND RESULTS

Figure 3 presents the experimental set-up for the DPA and its countermeasure implementation and evaluation for secure AES implemented on an FPGA board.

An Agilent 54622D digital storage oscilloscope with a bandwidth of 100 MHz and a maximum sampling rate of 200 MSample/s is used. To obtain enough sample points per cycle, we lowered our design speed to 95 Hz. The communication between the scope and the PC is done via the General Purpose Interface Bus (GPIB, IEEE-488). A 0.2 Ω resistor is inserted between the power supply and the FPGA board. The voltage difference between CH1 and CH2 is measured with a 20 MHz low-pass probe to reject DC signal.
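Assuming that CH1 and CH2 probe the two terminals of the 0.2 Ω resistor (the set-up description does not state this explicitly), the measured voltage difference maps to the supply current by Ohm's law. The instantaneous power then follows if the supply voltage V_DD, which is not given here, is known:

\[
I(t) = \frac{V_{CH1}(t) - V_{CH2}(t)}{R}, \quad R = 0.2\ \Omega, \qquad P(t) \approx V_{DD} \cdot I(t)
\]

Since V_DD and R are constants, the voltage difference itself can serve as the measurement vector for the correlation, which is insensitive to such scaling.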

A. Implementation of secure S-Box

The S-Box module, composed of the AddRoundKey and SubBytes functions as indicated in figure 4, was considered for a hardware implementation on a Xilinx Virtex 4 FPGA (LX25-FF676). The input data Din is combined with a secret key byte K by exclusive-or. Then the S-Box is applied to produce the output samples S. The design of the compact S-Box proposed by Canright [11] was used for this implementation.
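The data flow of this module, and the effect of masking it, can be modeled in Python as below. Note the substitution: this sketch uses simple table re-computation masking, not the tower-field masked compact S-Box implemented in this work, and it only illustrates that a masked module yields the same result once the output mask is removed. All function names are hypothetical, and `secrets.randbelow` merely stands in for the random mask source.

```python
import secrets


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Inverse in GF(2^8) computed as a^254; 0 is mapped to 0 as in AES."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def affine(x):
    """Bitwise affine transformation of the AES S-Box (constant 0x63)."""
    out = 0
    for i in range(8):
        bit = (x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
        bit ^= (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


SBOX = [affine(gf_inv(x)) for x in range(256)]


def sbox_module(din, key_byte):
    """Unsecured module: AddRoundKey (XOR) followed by SubBytes."""
    return SBOX[din ^ key_byte]


def masked_table(mask_in, mask_out):
    """Table T such that T[x ^ mask_in] == SBOX[x] ^ mask_out."""
    table = [0] * 256
    for x in range(256):
        table[x ^ mask_in] = SBOX[x] ^ mask_out
    return table


def masked_sbox_module(din_masked, key_byte, table):
    """Masked module: works on din ^ mask_in and returns the output ^ mask_out."""
    return table[din_masked ^ key_byte]


if __name__ == "__main__":
    din, key_byte = 0x32, 0x2B
    m_in, m_out = secrets.randbelow(256), secrets.randbelow(256)
    masked = masked_sbox_module(din ^ m_in, key_byte, masked_table(m_in, m_out))
    print(hex(masked ^ m_out), hex(sbox_module(din, key_byte)))
```

The unmasked S-Box output never appears as an intermediate value of the masked module; it is recovered only by removing the output mask at the end.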

Figure 3. Experimental set-up for DPA and countermeasure implementation

Figure 4. Data flow chart of the unsecured AES S-Box module (blocks: Din, AddRoundKey, SubBytes, Dout)

Table 2. FPGA implementation results for AES S-Box module.

Performance           Unsecured AES S-Box   Secure AES S-Box with masking
Area (slices)         36                    100
Area overhead         0%                    44%
Frequency (MHz)       88                    60
Frequency decrease    0%                    31%

We notice from table 2 that the secure AES S-Box with masking needs a larger area: the number of used slices increased by 44%. In addition, the frequency decreased from 88 MHz to 60 MHz. This is due to the increase in the number of used slices.

B. Implementation of secure AES design

The results of table 3 show that the secure AES with the masking scheme needs a larger area than the unsecured one. As for the previous implementation, the number of used slices rose, here by 60.1%. The frequency decreased by 4%. This is due to the increase in the number of used slices.

Table 3. FPGA implementation results for AES box

Performance           Unsecured AES   Secure AES with masking
Area (slices)         1424            2281
Area overhead         0%              +60.1%
Frequency (MHz)       143             137
Frequency decrease    0%              4%

V. CONCLUSION

In this paper, an FPGA implementation of a first-order algorithmic DPA countermeasure for the Advanced Encryption Standard (AES) is proposed. After presenting AES processing and the DPA principle, the authors focused on choosing a low-complexity algorithmic countermeasure technique. The study of the techniques proposed in recent literature showed that the best solution can be obtained with the Canright masked compact AES S-Box. The originality of the results presented in this work comes from the exploration, for the first time, of the SRAM FPGA implementation overhead of this DPA countermeasure technique. The results showed that the secured AES IP increases the number of slices by 60.1% and decreases the frequency by 4%.
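The overhead figures quoted for the full AES design follow directly from the Table 3 values, as the short Python check below illustrates. The helper name `relative_change_percent` is hypothetical and used for illustration only.

```python
def relative_change_percent(reference, value):
    """Relative change of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


# Table 3 values: slices and maximum frequency (MHz), unsecured vs. masked AES.
area_overhead = relative_change_percent(1424, 2281)
frequency_decrease = -relative_change_percent(143, 137)

if __name__ == "__main__":
    print(f"area overhead: {area_overhead:.2f}%")
    print(f"frequency decrease: {frequency_decrease:.2f}%")
```

Both values agree with the reported 60.1% and 4% up to rounding.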

ACKNOWLEDGMENT

The authors would like to acknowledge the University of 7th of November at Carthage for financing this work and the IMS Lab at the University of Bordeaux I for giving access to their security lab facilities.

REFERENCES

[1] National Institute of Standards and Technology (NIST). FIPS-197: Advanced Encryption Standard, November 2001. Available online at http://www.itl.nist.gov/fipspubs/.

[2] P. Kocher, J. Jaffe, B. Jun, Differential Power Analysis, in the proceedings of Crypto 1999, Lecture Notes in Computer Science, vol 1666, pp 398-412, Santa-Barbara, USA, August 1999, Springer-Verlag.

[3] S. B. Örs, E. Oswald and B. Preneel, Power-Analysis Attacks on an FPGA--First Experimental Results. Proceedings of Cryptographic Hardware and Embedded Systems – CHES 2003, 5th International Workshop Cologne, Germany, September 8–10, 2003, pp. 35-50, LNCS 2779.

[4] J.M. Rabaey, Digital Integrated Circuits, Prentice Hall International, 1996.

[5] F.-X. Standaert, E. Peeters, F. Macé, J.-J. Quisquater, Updates on the Security of FPGAs Against Power Analysis Attacks, in the proceedings of ARC 2006, Lecture Notes in Computer Science, vol 3985, pp 335-346, Delft, The Netherlands, March 2006, Springer-Verlag

[6] L. Batina, N. Mentens, and I. Verbauwhede, Side-channel Issues for Designing Secure Hardware Implementations, In IEEE international online testing symposium, IOLTS 2005, special session on side-channel and fault attacks, IEEE, 4 pages, July 6-8, 2005, Saint Raphael, France.

[7] T. Popp, S. Mangard, E. Oswald, Power Analysis Attacks and Countermeasures , IEEE Design & Test of Computers - Design and Test of ICs for Secure Embedded Computing, IEEE, 2007.

[8] M.-L. Akkar and C. Giraud, An implementation of DES and AES, secure against some attacks. In CHES 2001, volume 2162 of Lecture Notes in Computer Science, pages 309-318, 2001.

[9] E. Oswald and K. Schramm, An Efficient Masking Scheme for AES Software Implementations, Proceedings of WISA 2005, LNCS 3786, Springer, pp. 292-305, 2006.

[10] D. Canright and L. Batina, A Very Compact "Perfectly Masked" S-Box for AES, Applied Cryptography and Network Security, ACNS 2008, June 3-6, New York.

[11] D. Canright . A Very Compact S-Box for AES, Workshop on Cryptographic Hardware and Embedded Systems (CHES2005), Lecture Notes in Computer Science 3659, pp.441-455, Springer-Verlag 2005.

[12] N. Mentens, L. Batina, B. Preneel, and I. Verbauwhede, An FPGA Implementation of Rijndael: Trade-offs for side-channel security, In IFAC Workshop - PDS 2004, Programmable Devices and Systems, Elsevier, pp. 493-498, 2004.

[13] V. Fischer, M. Drutarovsky, M. Simka, N. Brochard. High Performance True Random Number Generator in Altera Stratix FPLDs. In Field Programmable Logic and Application (FPL), September 2004.

[14] T. Messerges, E. Dabbish, and R. Sloan, Investigations of Power Analysis Attacks on Smartcards, IEEE Trans. Computers, vol. 51,no. 5, May 2002.
