
Hardware and Software Implementation of RIJNDAEL AES Algorithm

EE-540 Project Report

Vipul Joshi, Ali Bassam, Sanjit Kurup, Nasrin Ahmed, Shafaat Qureshi

Table of Contents

1. Introduction
   1.1 Product Perspective
2. Design
   2.1 Spiral Design Model
   2.2 Design Flow
   2.3 Project Modeling
3. The Advanced Encryption Algorithm
   3.1 Introduction
   3.2 Overview of Encryption
   3.3 Process for Decryption
4. Modules
   4.1 Software Module
   4.2 Intel’s IXP 2850
   4.3 IXP 2850 Cryptographic Unit
   4.4 IXP SDK 3.0
   4.5 Implementing AES on IXP 2850
5. Hardware Implementation of AES
   5.1 Design Process
   5.2 Interface Design
   5.3 Encryption Block
   5.4 Decryption Block
6. Synthesis of the AES Chip
7. Conclusion
8. References

1. Introduction

The Rijndael block cipher was chosen by NIST as the new Advanced Encryption Standard (AES). Because DES is no longer regarded as a standard, industry is moving to adopt AES for the cryptographic functions in its products. No practical attack on AES has been published so far, but its strength comes at a cost in performance. Various hardware implementations of AES exist, each with its own advantages and drawbacks, and a lot of work is still being done in this area to improve them.

1.1 Product Perspective

The growth of the network processor industry in the last two years has shown the promise of these devices, whose primary purpose is packet processing at speeds of up to OC-192. Because they deliver much better performance for this task than general-purpose processors, network processor units are now an integral part of the gigabit Internet.

The Advanced Encryption Standard (AES), the latest encryption standard approved by NIST, is fast becoming the default choice for encryption in networked applications. A hardware implementation of the algorithm gives better performance, but it offers less flexibility and is more difficult and time-consuming to develop than a software implementation. With the security block in the IXP 2850, Intel has implemented encryption as dedicated ASIC hardware alongside its processor. Keeping the block minimally specialized preserves flexibility for various applications; this flexibility comes from a programming model that runs on the microengines.

Our objective is to implement the Advanced Encryption Standard in hardware on an FPGA and, in parallel, to develop a software implementation using SDK 3.0 for Intel’s IXP 2400/2800. The Intel SDK 3.0 provides an API that gives the programmer a reasonable degree of control.

Comparing the two approaches with the same flow charts, block lengths, key lengths and data will show the overhead of the hardware approach and whether it is worth it. The comparison also tests the performance of Intel’s programming model.

2. Design

2.1 Spiral Design Model

We followed the spiral model in our design, so the design was constantly updated as more requirements were met. The feasibility of the system was checked first, and the model was updated as the specifications were defined. Once the system feasibility and the specifications had been determined, a prototype was developed, which led to an initial system. The initial system then evolved into an enhanced system as more requirements and specifications were met.

Figure 1 – Spiral Model

The spiral model (figure 1) successively refines the system as we move from initial basic systems to full-scale systems. This is advantageous because the overall system is updated incrementally, so the design never has to be restarted. In contrast, a design following the waterfall model may have to be redone as new features arise. One disadvantage of the spiral model is that it takes more time. This is a trade-off, however: if a system design fails, it may have to be redesigned from scratch, which can end up consuming even more time.

2.2 Design Flow

The entire project can be broken down into two major modules, i.e. the hardware module and the software module.

Figure 2 Design flow (Requirements/Specification → Architecture → Hardware design and Software design → Integration → Testing)

The requirements and specifications were considered first, followed by the architecture design. The hardware and software modules were then designed separately. The final steps were integrating the modules and testing the result.

2.3 Project Modeling

Figure 3 Project Modeling

3. The Advanced Encryption Algorithm

3.1 Introduction

In 1997 the National Institute of Standards and Technology (NIST) of the United States put out a call for proposals for a new symmetric algorithm, to be called the Advanced Encryption Standard (AES). DES had been the standard for symmetric algorithms since 1976, and a replacement was necessary. In 1998, 15 candidate algorithms were accepted, and after one year of research 5 of them were announced as finalists:


• MARS (IBM Corporation)

• RC6 (RSA Laboratories)

• Rijndael (J. Daemen & V. Rijmen)

• Serpent (E. Biham et al.)

• Twofish (B. Schneier et al.)

Extensive research has been done on all of these algorithms to find attacks or weaknesses. According to NIST, all 5 finalists appear to offer adequate security. Much research has also been done to test the performance of the 5 algorithms in both software and hardware. In 2000 NIST announced that Rijndael had been chosen as the successor of DES, i.e. as the AES. The combination of security, performance, efficiency, implementability and flexibility made Rijndael an appropriate selection for the AES.

The candidates for the AES algorithm had to fulfill certain design criteria. First, the algorithm had to be symmetric and resistant against all known attacks. Furthermore, the AES had to be efficient in performance and memory on different platforms. The design had to be simple and able to handle different key lengths (128, 192 and 256 bits). The block length of the cipher had to be 128 bits.

3.2 Overview of Encryption

Figure 4 Overview of AES


The cipher starts with an initial round, followed by a number of standard rounds, and ends with a final round. Only four different operations (AddRoundKey, SubBytes, ShiftRows and MixColumns) are needed to compute these rounds, together with a key schedule.

Rijndael can use different key lengths according to the security level required by the application: it is defined as a block cipher with key lengths of 128, 192 or 256 bits, and its possible block lengths are also 128, 192 or 256 bits. The AES algorithm is exactly the same as the Rijndael algorithm, except that it defines only one block length, 128 bits.

Rijndael is designed so that every bit of the State depends on all bits of the State two rounds earlier, i.e. full diffusion is provided after two rounds. The number of rounds that must be run depends on the key length.

Variant    Key length Nk (words)    Number of rounds Nr
AES-128    4                        10
AES-192    6                        12
AES-256    8                        14

Figure 5 Key length
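The table follows a simple rule: because AES fixes the block at Nb = 4 words, the number of rounds is Nr = Nk + 6. A minimal Python sketch of that lookup (the function name num_rounds is illustrative, not part of the report's design):

```python
def num_rounds(key_bits: int) -> int:
    """Return the number of AES rounds Nr for a 128-, 192- or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys must be 128, 192 or 256 bits long")
    nk = key_bits // 32  # key length in 32-bit words
    return nk + 6        # the AES block is fixed at Nb = 4 words, so Nr = Nk + 6


for bits in (128, 192, 256):
    print(f"AES-{bits}: Nk = {bits // 32}, Nr = {num_rounds(bits)}")
```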

3.2.1 AddRoundKey

The AddRoundKey operation is a simple XOR operation between the State and the RoundKey. The RoundKey is derived from the CipherKey by means of the key schedule. The State and the RoundKey are the same size, and the next State is obtained by XORing them element by element:

$$ s'_{i,j} = s_{i,j} \oplus w_{i,j} \qquad (5) $$

where s is the current State, s' the next State and w the round key.


Figure 6 AddRoundKey
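Equation (5) translates directly into a byte-wise XOR. The sketch below is illustrative Python, assuming the State and round key are each held as 16 bytes. With the FIPS-197 example plaintext and cipher key (the same values used in section 5.3.4) it reproduces the round-1 input used for the sub-bytes example in section 5.3.2; for AES-128 the first round key is simply the cipher key.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR every byte of the 16-byte State with the matching round-key byte."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("State and round key must both be 16 bytes")
    return bytes(s ^ w for s, w in zip(state, round_key))


plaintext = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
cipher_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")

# Initial round of AES-128: the first round key equals the cipher key.
state = add_round_key(plaintext, cipher_key)
print(state.hex())  # 193de3bea0f4e22b9ac68d2ae9f84808

# XOR is its own inverse, so applying the same round key again restores the input.
print(add_round_key(state, cipher_key) == plaintext)  # True
```

Because XOR is self-inverse, the same operation serves unchanged in decryption.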

3.2.2 Sub Bytes

The SubBytes operation is similar to the S-boxes used in the DES algorithm, but Rijndael has only one S-box. The S-box is designed to resist known differential and linear cryptanalysis as well as attacks using algebraic manipulation.

Figure 7 The AES S Box
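Rather than typing in the table of figure 7, the S-box can be generated from its definition in FIPS-197: each byte is replaced by its multiplicative inverse in GF(2^8) (with 00 mapped to 00) and then passed through a fixed affine transformation with the constant 63. A minimal Python sketch, using a brute-force inverse search for clarity rather than speed:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce modulo the AES polynomial
        b >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox() -> list[int]:
    sbox = []
    for x in range(256):
        # Multiplicative inverse by exhaustive search; 00 has no inverse and maps to 00.
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        # Affine transformation: b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
print(f"{SBOX[0x19]:02x} {SBOX[0x53]:02x} {SBOX[0x00]:02x}")  # d4 ed 63
```

The value for 19 (d4) is the same mapping used in the sub-bytes example of section 5.3.2.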


3.2.3 Shift Rows

Figure 8 Shift Rows

In ShiftRows the rows of the State are cyclically shifted to the left by different offsets: row 0 is not shifted, row 1 is shifted over c1 bytes, row 2 over c2 bytes and row 3 over c3 bytes. The values of c1, c2 and c3 depend on the block length Nb; for the 128-bit AES block (Nb = 4) they are c1 = 1, c2 = 2 and c3 = 3.
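A minimal Python sketch of ShiftRows for Nb = 4, assuming the 16-byte block is stored column by column as in FIPS-197 (the byte at row r, column c sits at index r + 4c). The example input is the sub-bytes output used in section 5.3.2.

```python
def shift_rows(state: bytes) -> bytes:
    """Rotate row r of a 16-byte State left by r bytes (c1 = 1, c2 = 2, c3 = 3)."""
    if len(state) != 16:
        raise ValueError("State must be 16 bytes")
    # After a left rotation by r, column c of row r holds the byte from column c + r.
    return bytes(state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))


after_sub_bytes = bytes.fromhex("d42711aee0bf98f1b8b45de51e415230")
print(shift_rows(after_sub_bytes).hex())  # d4bf5d30e0b452aeb84111f11e2798e5
```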

3.2.4 MixColumns

Figure 9 MixColumns


The MixColumns transformation operates on each column of the State separately. To calculate it, each column of the current State is considered as a polynomial over GF(2^8) and multiplied modulo x^4 + 1 by the fixed polynomial a(x) = {03}x^3 + {01}x^2 + {01}x + {02}.
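Multiplying by the fixed polynomial is equivalent to multiplying each column by the matrix [02 03 01 01; 01 02 03 01; 01 01 02 03; 03 01 01 02], where multiplication by 02 is a shift with conditional reduction (often called xtime) and multiplication by 03 is 02 XOR 01. A minimal Python sketch, assuming a 16-byte State stored column by column; the example input is the shift rows output of section 5.3.2.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (that is, by 02) in GF(2^8), reducing modulo 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def mix_single_column(col: list[int]) -> list[int]:
    """Multiply one column by the fixed MixColumns matrix; addition is XOR."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,  # 02*a0 + 03*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,  # a0 + 02*a1 + 03*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),  # a0 + a1 + 02*a2 + 03*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),  # 03*a0 + a1 + a2 + 02*a3
    ]


def mix_columns(state: bytes) -> bytes:
    """Apply MixColumns to a 16-byte State stored column by column."""
    out = bytearray()
    for c in range(0, 16, 4):
        out += bytes(mix_single_column(list(state[c:c + 4])))
    return bytes(out)


after_shift_rows = bytes.fromhex("d4bf5d30e0b452aeb84111f11e2798e5")
print(mix_columns(after_shift_rows).hex())  # 046681e5e0cb199a48f8d37a2806264c
```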

3.2.5 Key schedule

The RoundKeys are derived from the CipherKey by means of a key schedule. The number of RoundKeys needed to encrypt one block depends on the block length and the key length, since these determine the number of rounds. For a block length and key length of 128 bits, 11 RoundKeys are needed (1 for the initial round, 9 for the standard rounds and 1 for the final round).

Figure 10 KeySchedule
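For a 128-bit key, the schedule produces 44 four-byte words: the first four are the cipher key, and every later word is the word four positions back XORed with the previous word, which for every fourth word is first rotated by one byte (RotWord), passed through the S-box (SubWord) and XORed with a round constant. A minimal, self-contained Python sketch of this AES-128 expansion (it rebuilds the S-box from its definition rather than storing the table); the printed values are the FIPS-197 round keys for the key used in section 5.3.4.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (02) in GF(2^8), reducing modulo 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_entry(x: int) -> int:
    """S-box value: GF(2^8) inverse (00 -> 00) followed by the affine transform."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    s = inv
    for n in range(1, 5):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return s ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key_128(key: bytes) -> list[bytes]:
    """Expand a 16-byte cipher key into 11 round keys (44 words of 4 bytes)."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]      # RotWord: rotate the word left by one byte
            temp = [SBOX[b] for b in temp]  # SubWord: S-box on each byte
            temp[0] ^= rcon                 # XOR the round constant into the first byte
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(b for w in words[i:i + 4] for b in w) for i in range(0, 44, 4)]


round_keys = expand_key_128(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(round_keys[1].hex())   # a0fafe1788542cb123a339392a6c7605
print(round_keys[10].hex())  # d014f9a8c9ee2589e13f0cc8b6630ca6
```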

3.3 Process for Decryption

The inverse cipher is obtained by inverting the cipher described in section 3.2; it is simply the reverse of the encryption process. In this section the inverse cipher applied to the State is described in detail. In each round the State undergoes the following decryption sequence:

a. Inverse shift row
b. Inverse sub bytes transformation using the inverse S-box
c. Addition of the round key
d. Inverse mix column (omitted in the final round)

3.3.1 Inverse Shift row


The inverse shift row step is the inverse of the shift row operation used in encryption. The first row is left unchanged, the second row is cyclically shifted right by one place, the third row by two places and the fourth row by three places. The inverse shift row operation for Nb = 4 is shown in figure 11.

S0,0 S0,1 S0,2 S0,3  →  S0,0 S0,1 S0,2 S0,3
S1,0 S1,1 S1,2 S1,3  →  S1,3 S1,0 S1,1 S1,2
S2,0 S2,1 S2,2 S2,3  →  S2,2 S2,3 S2,0 S2,1
S3,0 S3,1 S3,2 S3,3  →  S3,1 S3,2 S3,3 S3,0

Figure 11: Inverse shift row

3.3.2 Inverse Sub bytes transformation using S-Box

The inverse sub bytes transformation uses the inverse S-box table shown in figure 12.

Figure 12: Inverse S-Box table

3.3.3 Inverse Mix column


The inverse mix column transformation acts independently on every column of the state.

Figure 13: Inverse mix column
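The inverse mix column multiplies each column by the inverse of the encryption matrix, whose entries are 0e, 0b, 0d and 09 (FIPS-197). A minimal Python sketch, assuming a 16-byte State stored column by column; applied to the mix column result from section 5.3.2, it recovers the shift rows output.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


# Fixed InvMixColumns matrix; each row is the previous one rotated right by one entry.
INV_MIX = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def inv_mix_single_column(col: list[int]) -> list[int]:
    out = []
    for row in INV_MIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)  # addition in GF(2^8) is XOR
        out.append(acc)
    return out


def inv_mix_columns(state: bytes) -> bytes:
    out = bytearray()
    for c in range(0, 16, 4):
        out += bytes(inv_mix_single_column(list(state[c:c + 4])))
    return bytes(out)


mixed = bytes.fromhex("046681e5e0cb199a48f8d37a2806264c")
print(inv_mix_columns(mixed).hex())  # d4bf5d30e0b452aeb84111f11e2798e5
```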

3.3.4 Inverse Round Key

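The round key addition is its own inverse, since XORing the same round key twice restores the State. A round key is generated for each round, and in decryption the round keys are used in the reverse order of encryption.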

4. Modules

4.1 Software Module

The software module implements the Advanced Encryption Standard on Intel’s IXP 2850. The IXP 2850 contains two cryptographic units with hardware cores for AES, 3DES and SHA-1, and comes with an SDK for implementing various network-processing functions.

4.2 Intel’s IXP 2850

The newest member of the Intel® second generation network processor product family,

the Intel® IXP2850 network processor, delivers high-performance packet and content

processing with robust security features in a single platform. By integrating capabilities

that have typically required multiple specialized processors, the Intel IXP2850 provides a

secure and cost-effective platform that enables a broad range of emerging applications.


4.3 IXP 2850 Cryptographic Unit

The cryptography unit performs authentication and bulk encryption, two datapath tasks that are considered critical strategic functions for a network processor. The crypto engines have a very small footprint, yet together the two engines provide 10 Gb/s of throughput.


4.4 IXP SDK 3.0

The Developer Workbench is an integrated development environment for assembling, compiling, linking and debugging microcode that runs on the IXP2400, IXP2800 and IXP2850 Network Processor Microengines. The Workbench is a Microsoft Win32 application that runs on Windows 2000 platforms.

Features

Important Workbench features include:

• Source level debugging.

• Debug-only project creation mode.

• Execution history.

• Statistics.

• Media Bus device and network traffic simulation for the Network Processors

• Command line interface to the Network Processor simulators (Transactors).

• Customizable graphical user interface (GUI) components.

Debugging support

The Workbench supports debugging in four different configurations:

• Local simulation with no foreign model, in which the Workbench and the Network Processor simulator (Transactor) both run on the same Microsoft Windows platform.

• Local simulation with local foreign models, in which the Workbench, the Transactor,

and one or more foreign model Dynamic-Link Libraries all run on the same Windows

platform.


• Local simulation with a remote foreign model, in which the Workbench and the

Transactor both run on the same Windows platform and communicate over the network

with a foreign model running on a remote system.

• Hardware, in which the Workbench runs on a Windows host and communicates over a

network or a serial port with a subsystem containing actual Network Processors. (Not

currently available for IXP2800 Network Processors).

4.5 Implementing AES on IXP 2850

4.5.1 Design

Figure 14 AES design


4.5.2 Overview

The software implementation performs both encryption and decryption on a single microengine using one thread. The steps are as follows:

1. The plain text to be encrypted is written from the DRAM transfer registers into the input RAM of the crypto unit.
2. The key is written into both crypto units.
3. The text is encrypted using crypto unit 0, and the resulting cipher text is written into the DRAM read registers.
4. The cipher text is copied from the DRAM read registers into the DRAM write registers and then transferred to the input RAM of crypto unit 1.
5. The cipher text is decrypted and then written into the DRAM read transfer registers.
6. The decrypted text is compared with the original text to verify that the program works.

4.5.3 Software Functions

The following important functions of the SDK 3.0 API are used for the above

steps:

Step 1: Writing the plain text into the crypto input RAM

xbuf_alloc($$orig_plain_text, 16, write)
crypto_write_ram(
    $$orig_plain_text[0],
    DATA_RAM_ADDR,
    8,
    ENCRYPT_UNIT,
    ram_sig)
ctx_arb[ram_sig]

Step 2: Loading the key

crypto_load_key(
    $$key[0],
    3,
    DECRYPT_UNIT,
    CRYPTO_BANK,
    DECRYPT_STATE,
    key_sig)
ctx_arb[iv_sig, key_sig]

Step 3: Encrypt

crypto_cipher(
    $$encrypt_data[0],
    DATA_RAM_ADDR,
    8,
    CRYPTO_CIPHER_ENCRYPT,
    CRYPTO_CIPHER_NO_CBC,
    CRYPTO_CIPHER_AES_128,
    ENCRYPT_UNIT,
    CRYPTO_BANK,
    ENCRYPT_STATE,
    cipher_sig)
ctx_arb[cipher_sig]

Step 4: Decrypt

crypto_cipher(
    $$new_plain_text[0],
    DATA_RAM_ADDR,
    8,
    CRYPTO_CIPHER_DECRYPT,
    CRYPTO_CIPHER_NO_CBC,
    CRYPTO_CIPHER_AES_128,
    DECRYPT_UNIT,
    CRYPTO_BANK,
    DECRYPT_STATE,
    cipher_sig)

4.5.4 Simulation Output

Figure 15 Original plain text

The figure above shows the data watch window of the simulation output, in which the text loaded into the input RAM of the crypto unit can be seen.


Figure 13 Encrypted Text

The figure above shows the cipher text after encryption with a 128-bit AES key.


Figure 14 Writing Encrypted text to DRAM

Figure 15 Decrypted text same as original plain text


5. Hardware Implementation of AES

5.1 Design Process

The hardware design process was divided into blocks, and each block was handled in turn. The design started with the interface, which defines the data and instruction flow between the AES co-processor and the IXP-2400. After the interface was designed, the encryption and decryption blocks were written in VHDL and simulated to make sure the modules worked. The VHDL modules were then fed to the synthesizer to generate a netlist, which was simulated again to make sure the module produced the expected results. Once the netlist was available, floor-planning and placement tools were used to lay out the final chip.

5.2 Interface Design

The interface of the AES block is designed to allow data flow between the Intel IXP 2400 board and the security co-processor. A block diagram of the hardware implementation of the AES algorithm and its interface is shown below.

Block diagram of the AES hardware and interface (labels: PCI Bus; CONTROL; interface signals P2AESreq, P2AESRnW, P2AESburst, AES2pack, busy, mptaddr, mptdatain and mptdataout; Encryption and Decryption blocks, each with Data_in, Key_in and Data_out ports; bus widths of 32 bits on the interface side and 128 bits on the Data_in, Key_in and Data_out ports)

Burst Write Handshake:

1- The processor checks whether the busy line is high. If it is high (busy), the processor drops its request. If busy is low (not busy), the processor drives P2AESBURST high, requesting a burst write transfer (4 long words per burst write).
2- The processor drives a valid address on the address/data bus and drives P2AESRnW low, then drives the data and pulses P2AESREQ high for one clock cycle, signaling to the AES chip that a valid address/data word is available on the bus.
3- The AES chip latches the address/data and waits for the next data word, until all four long words are latched.
4- When the processor has finished its burst write transfer, it drives P2AESBURST low, signaling the end of the burst write cycle. The burst write handshake waveform is shown below.

Figure 16: Burst write Handshake

Burst Read Handshake:

1- The processor checks whether the busy line is high. If it is high (busy), the processor drops its request. If busy is low (not busy, i.e. the AES chip has finished encrypting or decrypting), the processor drives P2AESREQ and P2AESBURST high, requesting a burst read transfer (4 long words per burst read).
2- The processor drives a valid address on the address/data bus and drives P2AESRnW high, then waits for the AES chip to place data on the address/data bus.
3- When the AES chip is ready, it drives the least significant word onto the address/data bus and pulses AES2PACK high for one cycle, signaling that valid data is now available on the bus.
4- The processor latches the first data word and waits for the following data.
5- The AES chip repeats step 3 until all 128 bits of data have been transferred.
6- When all 128 bits have been received, the processor drives P2AESREQ low, signaling the end of the burst read cycle. The burst read handshake waveform is shown below.

Figure 17: Burst Read Handshake
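Both handshakes move a 128-bit block as four 32-bit long words. The sketch below models only that data framing, not the signal timing, and assumes the least significant word travels first in both directions: the report states this order for the burst read, while applying it to the burst write is an assumption. The function names are hypothetical.

```python
WORD_MASK = 0xFFFF_FFFF


def split_into_burst_words(block: int) -> list[int]:
    """Split a 128-bit block into four 32-bit long words, least significant word first."""
    if not 0 <= block < (1 << 128):
        raise ValueError("block must fit in 128 bits")
    return [(block >> (32 * i)) & WORD_MASK for i in range(4)]


def assemble_from_burst_words(words: list[int]) -> int:
    """Rebuild the 128-bit block from four long words received least significant first."""
    if len(words) != 4:
        raise ValueError("a burst transfer carries exactly four long words")
    block = 0
    for i, word in enumerate(words):
        block |= (word & WORD_MASK) << (32 * i)
    return block


cipher_text = int("3925841d02dc09fbdc118597196a0b32", 16)
burst = split_into_burst_words(cipher_text)
print([f"{w:08x}" for w in burst])  # ['196a0b32', 'dc118597', '02dc09fb', '3925841d']
print(assemble_from_burst_words(burst) == cipher_text)  # True
```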


5.3 Encryption Block

For a 128-bit key, the encryption process goes through 10 rounds. The initial round only adds the starting key to the input data, and the result is the input of round 1. In rounds 1 through 10 the data goes through the sub-byte transformation, the shift rows transformation and the mix column transformation, and is then added to the round key generated for that round from the previous round key; the mix column transformation is skipped in round 10. The flowchart of the top-level module controlling the encryption block is shown in figure 18. All the modules are implemented in VHDL and are provided in the appendix.

Figure 18: Top level AES encryption block


5.3.1 Control

The state machine defined in the control module, shown in figure 19, controls the AES encryption rounds. The round constants used to generate the key schedule are different for each round and are provided by the state machine.


Figure 19: Control Finite State Machine
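The round constants follow a simple rule: start at 01 and multiply by x (that is, by 02) in GF(2^8) for each subsequent round. A minimal Python sketch; listed in reverse, the values match the binary round constants that appear in the decryption state machine of figure 28.

```python
def round_constants(count: int = 10) -> list[int]:
    """AES round constants: start at 01 and multiply by x (02) in GF(2^8) each round."""
    values = []
    rcon = 0x01
    for _ in range(count):
        values.append(rcon)
        rcon <<= 1
        if rcon & 0x100:
            rcon ^= 0x11B  # reduce modulo x^8 + x^4 + x^3 + x + 1
    return values


print([f"{c:02x}" for c in round_constants()])
# ['01', '02', '04', '08', '10', '20', '40', '80', '1b', '36']
print([f"{c:08b}" for c in reversed(round_constants())])
# ['00110110', '00011011', '10000000', '01000000', '00100000',
#  '00010000', '00001000', '00000100', '00000010', '00000001']
```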


5.3.2 Rounds

Rijndael encryption goes through 10 rounds. In each round the input data is mapped through the S-box, shift rows and mix column are performed on it, and the result is added to the round key. This operation is shown in figure 20.

Figure 20: Block diagram of Round operation (blocks: S-Box, Shift Row, Mix Column, Add Round Key and a Mux selected by last_mux_sel; signals data_in and data_out)

The S-box transformation, shift rows and mix column are explained in detail below, and the simulation results are shown for each block.

i> Sub-Bytes Transformation

In the sub-bytes transformation the input data is mapped through the S-box table shown in figure 7. For example, if the input data is “193de3bea0f4e22b9ac68d2ae9f84808”, the sub-bytes transformation is applied to each byte XY, where the first 4 bits give ‘X’ and the last 4 bits give ‘Y’. In the given input data the first byte is 19, where

X = 1 and Y = 9

From the table in figure 7, the mapped value for 19 is d4. The following bytes are mapped in the same way. The complete transformation is shown in figure 21.

19 a0 9a e9  →  d4 e0 b8 1e
3d f4 c6 f8  →  27 bf b4 41
e3 e2 8d 48  →  11 98 5d 52
be 2b 2a 08  →  ae f1 e5 30

Figure 21: Sub-byte transformation

The mapped result is “d42711aee0bf98f1b8b45de51e415230”. The simulation result in figure 22 shows the “bytesub” result as “d42711aee0bf98f1b8b45de51e415230”, which confirms that the sub-bytes transformation module is working.

Figure 22: Simulation of rounds

ii> Shift Rows

After the sub-byte transformation, the shift rows operation is performed on the result: the first row is kept as it is, the second row is shifted left once, the third row is shifted left twice and the fourth row is shifted left three times. For example, if the sub-byte output is “d42711aee0bf98f1b8b45de51e415230”, shift rows is performed as shown in figure 23.

d4 e0 b8 1e  →  d4 e0 b8 1e
27 bf b4 41  →  bf b4 41 27
11 98 5d 52  →  5d 52 11 98
ae f1 e5 30  →  30 ae f1 e5

Figure 23: Shift row transformation

The shifted result is “d4bf5d30e0b452aeb84111f11e2798e5”. The simulation result in figure 22 shows the “shiftrow” result as “d4bf5d30e0b452aeb84111f11e2798e5”, which confirms that the shift rows module is working.

iii> Mix Column

The result of the shift row operation is multiplied by a fixed matrix in the mix column module, with all byte multiplications and additions carried out in GF(2^8). The operation is shown in figure 24.

$$
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} d4 & e0 & b8 & 1e \\ bf & b4 & 41 & 27 \\ 5d & 52 & 11 & 98 \\ 30 & ae & f1 & e5 \end{bmatrix}
=
\begin{bmatrix} 04 & e0 & 48 & 28 \\ 66 & cb & f8 & 06 \\ 81 & 19 & d3 & 26 \\ e5 & 9a & 7a & 4c \end{bmatrix}
$$

Figure 24: Mix Column operation

The result of the mix column operation is “046681e5e0cb199a48f8d37a2806264c”. Figure 22 shows the “mixcolumn” result as “046681e5e0cb199a48f8d37a2806264c”, which confirms that the mix column module is working.

iv> Key addition

In this step the round key generated by the key schedule (explained next) is XORed with the mix column result.

5.3.3 Key Schedule

In encryption the key schedule is generated from the 128-bit input key. The block diagram of the key scheduler is shown in figure 25.

Figure 25: Key Scheduler (diagram labels: user_key, next_key, MUX with select key_reg_mux_sel, KEY REGISTER with load signal load_key_reg, key_word(0) to key_word(3), SHIFT, S BOX (four instances), round_constant, T, and five XOR gates)

5.3.4 Simulation Result

The encryption block is simulated with input data and a key of 128 bits each. The input data is “3243f6a8885a308d313198a2e0370734” and the input key is “2b7e151628aed2a6abf7158809cf4f3c”. The encrypted cipher output is “3925841d02dc09fbdc118597196a0b32”, which matches the example in the AES specification (FIPS-197).

Figure 26. Encrypted result
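As a software cross-check of the simulation, the whole AES-128 encryption can be sketched in a few dozen lines of Python. This is a straightforward reference model written for clarity, not a translation of the VHDL design; it assumes the column-by-column byte order of FIPS-197 and, as the standard requires, skips mix column in the final round. Run on the inputs above, it prints the same cipher text.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (02) in GF(2^8), reducing modulo 0x11B."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two bytes in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_entry(x: int) -> int:
    """GF(2^8) inverse (00 -> 00) followed by the FIPS-197 affine transform."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    s = inv
    for n in range(1, 5):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return s ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key(key: bytes) -> list[bytes]:
    """AES-128 key schedule: 44 words, returned as 11 round keys of 16 bytes."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [bytes(b for w in words[i:i + 4] for b in w) for i in range(0, 44, 4)]


def sub_bytes(s: bytes) -> bytes:
    return bytes(SBOX[b] for b in s)


def shift_rows(s: bytes) -> bytes:
    # Byte (row r, column c) is stored at index r + 4c; row r rotates left by r.
    return bytes(s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))


def mix_columns(s: bytes) -> bytes:
    out = bytearray()
    for c in range(0, 16, 4):
        a0, a1, a2, a3 = s[c:c + 4]
        out += bytes([
            xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3,  # 02*a0 ^ 03*a1 ^ a2 ^ a3
            a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3,  # a0 ^ 02*a1 ^ 03*a2 ^ a3
            a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3,  # a0 ^ a1 ^ 02*a2 ^ 03*a3
            xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3),  # 03*a0 ^ a1 ^ a2 ^ 02*a3
        ])
    return bytes(out)


def add_round_key(s: bytes, k: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(s, k))


def encrypt_block(plaintext: bytes, key: bytes) -> bytes:
    round_keys = expand_key(key)
    state = add_round_key(plaintext, round_keys[0])  # initial round
    for rnd in range(1, 10):                          # rounds 1 to 9
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))              # round 10: no mix column
    return add_round_key(state, round_keys[10])


pt = bytes.fromhex("3243f6a8885a308d313198a2e0370734")
key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(encrypt_block(pt, key).hex())  # 3925841d02dc09fbdc118597196a0b32
```

A model like this can also generate expected values for the intermediate round registers when debugging a hardware implementation.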


5.4 Decryption Block

For a 128-bit key, the decryption process also goes through 10 rounds. The initial round only adds the last round key of the key schedule (round key 10) to the input cipher text, and the result is the input of round 1. In each of rounds 1 through 10 the data goes through the inverse shift rows transformation and the inverse sub-bytes transformation, is added to the round key for that round, and then goes through the inverse mix column transformation, which is skipped in the final round. The round keys are used in the reverse order of encryption. The flowchart of the top-level module controlling the decryption block is shown in figure 27. All the modules are implemented in VHDL and are provided in the appendix.

Figure 27: Top level AES block controlling decryption


5.4.1 Control

The state machine defined in the control module, shown in figure 28, controls the AES decryption rounds. The round constants used to generate the key schedule are different for each round and are provided by the state machine.

Figure 28: Finite state machine

States shown in figure 28: INIT, Load Inputs and Round 0 to Round 10, with the transition label Decrypt=1. Output settings listed in the diagram:

Round 0: round_const = 00110110, data_reg_mux_sel=00, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00011011, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 10000000, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 01000000, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00100000, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00010000, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00001000, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00000100, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00000010, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
round_const = 00000001, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
last_mux_sel=1, data_reg_mux_sel=01, key_reg_mux_sel=1, load_data_reg=1, load_key_reg=1
data_reg_mux_sel=11, key_reg_mux_sel=0, load_data_reg=1, load_key_reg=1
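The report does not state how the decryption block obtains its round keys. The round constants in figure 28 appear in reverse order (36, 1b, 80, ..., 01), which is consistent with running the AES-128 key schedule backwards, one step per round, starting from the last round key. A minimal Python sketch of that technique, under that assumption; the function name previous_round_key is hypothetical. Starting from round key 10 of the key used in section 5.3.4, ten reverse steps recover the original cipher key.

```python
def xtime(b: int) -> int:
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def sbox_entry(x: int) -> int:
    """S-box value: GF(2^8) inverse (00 -> 00) followed by the affine transform."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    s = inv
    for n in range(1, 5):
        s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
    return s ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def previous_round_key(round_key: bytes, rcon: int) -> bytes:
    """Recover AES-128 round key i-1 from round key i, given the Rcon used to build key i."""
    w = [list(round_key[j:j + 4]) for j in range(0, 16, 4)]
    prev = [[0] * 4 for _ in range(4)]
    # Forward schedule: w[j] = prev[j] ^ w[j-1] for j = 1..3, hence prev[j] = w[j] ^ w[j-1].
    for j in (1, 2, 3):
        prev[j] = [a ^ b for a, b in zip(w[j], w[j - 1])]
    # Forward schedule: w[0] = prev[0] ^ T, with T = SubWord(RotWord(prev[3])) ^ Rcon.
    t = [SBOX[b] for b in prev[3][1:] + prev[3][:1]]
    t[0] ^= rcon
    prev[0] = [a ^ b for a, b in zip(w[0], t)]
    return bytes(b for word in prev for b in word)


# Round constants in the order listed in figure 28 (36 first, 01 last).
DECRYPT_RCONS = [0x36, 0x1B, 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01]

round_key = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")  # round key 10
for rcon in DECRYPT_RCONS:
    round_key = previous_round_key(round_key, rcon)
print(round_key.hex())  # 2b7e151628aed2a6abf7158809cf4f3c
```

Generating keys backwards this way avoids storing all eleven round keys, at the cost of first computing the last round key once.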

5.4.2 Rounds

Rijndael decryption with a 128-bit key goes through 10 rounds. In each round, inverse shift rows is performed first and the result is mapped through the inverse s-box. The round key is then added to the result, and finally the inverse mix column operation is performed. Since the final round of AES decryption omits the inverse mix column step, a multiplexer controlled by Last_mux_sel selects the round output. This operation is shown in figure 29.

[Figure 29 blocks: data_in → Inverse Shift Row → Inverse S Box → Add Round Key → Inv. Mix Column → Mux (select: Last_mux_sel) → data_out]

Figure 29: Block diagram of Round operation

The inverse s-box transformation, inverse shift rows and inverse mix column operations are explained in detail below, together with the simulation results for each block.

i> Inverse Shift Rows

The inverse shift rows operation is performed on the input data. The 16 input bytes fill a 4×4 state matrix column by column; the first row is kept as it is, the second row is cyclically shifted right by one byte, the third row by two bytes and the fourth row by three bytes. For example, if the input data is “e9317db5cb322c723d2e895faf090794”, inverse shift rows is performed as shown in figure 30.

e9 cb 3d af

31 32 2e 09

7d 2c 89 07

b5 72 5f 94

Figure 30: Inverse Shift row transformation

The shifted result is “e9098972cb31075f3d327d94af2e2cb5”. The simulation result in figure 32 shows the “shiftrow” output as “e9098972cb31075f3d327d94af2e2cb5”, which confirms that the inverse shift rows module is working.
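To make the byte ordering explicit, here is a minimal Python reference model of inverse shift rows (the function name inv_shift_rows is ours, not a module of the report's VHDL). It assumes, as figure 30 does, that the 16 input bytes fill the state column by column, so the byte in row r, column c sits at index 4c + r. Rotating row r right by r positions means output column c takes its byte from input column (c − r) mod 4.

```python
def inv_shift_rows(state: bytes) -> bytes:
    """Inverse ShiftRows on a 16-byte AES state stored column by column."""
    if len(state) != 16:
        raise ValueError("AES state must be 16 bytes")
    out = bytearray(16)
    for c in range(4):
        for r in range(4):
            # Row r is rotated right by r, so (r, c) comes from (r, (c - r) mod 4).
            out[4 * c + r] = state[4 * ((c - r) % 4) + r]
    return bytes(out)


print(inv_shift_rows(bytes.fromhex("e9317db5cb322c723d2e895faf090794")).hex())
# prints e9098972cb31075f3d327d94af2e2cb5
```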

ii> Inverse Sub-Bytes Transformation

In the inverse sub-bytes transformation, the shifted data is mapped through the inverse s-box table shown in figure 12. For example, if the input data is “e9098972cb31075f3d327d94af2e2cb5”, each byte XY is substituted, where the upper 4 bits give ‘X’ (the table row) and the lower 4 bits give ‘Y’ (the table column). In the given input data the first byte is e9, so

X = e and Y = 9

From the table in figure 12, the mapped value for e9 is eb. Each following byte is mapped in the same way. The complete transformation is shown in figure 31.

e9 cb 3d af

09 31 32 2e

89 07 7d 2c

72 5f 94 b5

Figure 31: Inverse Sub-byte transformation

The mapped result is “eb40f21e592e38848ba113e71bc342d2”. The simulation result shown in figure xxx shows the “bytesub” output as “eb40f21e592e38848ba113e71bc342d2”, which confirms that the inverse sub-bytes module is working.
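The inverse s-box table of figure 12 can also be generated instead of typed in. The Python sketch below (all helper names are ours) builds the forward AES S-box from the GF(2^8) multiplicative inverse followed by the affine transform, inverts it, and applies it byte by byte. It is meant as a software cross-check of the example, not as a description of how the VHDL module stores its table.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else a << 1
        b >>= 1
    return p


def rotl8(v: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((v << n) | (v >> (8 - n))) & 0xFF


def build_inv_sbox() -> list:
    """Build the forward AES S-box, then invert it into the inverse s-box table."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):  # x^254 is the inverse of x in GF(2^8); it maps 0 to 0
            inv = gf_mul(inv, x)
        # Affine transform of the forward S-box.
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    inv_sbox = [0] * 256
    for x, s in enumerate(sbox):
        inv_sbox[s] = x
    return inv_sbox


INV_SBOX = build_inv_sbox()


def inv_sub_bytes(state: bytes) -> bytes:
    """Replace every byte XY with the inverse s-box entry at row X, column Y."""
    return bytes(INV_SBOX[b] for b in state)


print(hex(INV_SBOX[0xE9]))  # prints 0xeb
print(inv_sub_bytes(bytes.fromhex("e9098972cb31075f3d327d94af2e2cb5")).hex())
# prints eb40f21e592e38848ba113e71bc342d2
```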

Figure 32: Simulation of rounds

iii> Key addition

In this step, the round key generated by the key schedule (explained next) is XOR-ed with the result of the inverse sub-bytes transformation, whose state matrix is:

eb 59 8b 1b

40 2e a1 c3

f2 38 13 42

1e 84 e7 d2
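Key addition is a plain byte-wise XOR, so a Python sketch is short (add_round_key is our name). For the example, the round key 9 value below is taken from the FIPS-197 example key expansion; it is not printed in this report, so treat it as an external input. As a consistency check, the XOR result read column by column (47 37 94 ed, 40 d4 e4 a5, ...) is exactly the input matrix of figure 33.

```python
def add_round_key(state: bytes, round_key: bytes) -> bytes:
    """XOR a 16-byte state with a 16-byte round key, byte by byte."""
    if len(state) != 16 or len(round_key) != 16:
        raise ValueError("AES-128 state and round key must both be 16 bytes")
    return bytes(s ^ k for s, k in zip(state, round_key))


sub_bytes_out = bytes.fromhex("eb40f21e592e38848ba113e71bc342d2")
# Round key 9 of the FIPS-197 example key expansion (assumed input, not from this report).
round_key_9 = bytes.fromhex("ac7766f319fadc2128d12941575c006e")
print(add_round_key(sub_bytes_out, round_key_9).hex())
# prints 473794ed40d4e4a5a3703aa64c9f42bc
```

Because XOR is its own inverse, applying the same round key twice returns the original state, which is why the same operation serves both encryption and decryption.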

iv> Inverse Mix Column

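The result of the key addition is multiplied by a fixed matrix in the inverse mix column module, using multiplication and addition (XOR) in GF(2^8). The operation is shown in figure 33.

$$
\begin{bmatrix} 87 & f2 & 4d & 97 \\ 6e & 4c & 90 & ec \\ 46 & e7 & 4a & c3 \\ a6 & 8c & d8 & 95 \end{bmatrix}
=
\begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix}
\times
\begin{bmatrix} 47 & 40 & a3 & 4c \\ 37 & d4 & 70 & 9f \\ 94 & e4 & 3a & 42 \\ ed & a5 & a6 & bc \end{bmatrix}
$$

Figure 33: Inverse Mix Column operation

The result of the inverse mix column operation is “876e46a6f24ce78c4d904ad897ecc395”. From figure xxx it is seen that the “mixcolumn” result is “876e46a6f24ce78c4d904ad897ecc395”, which confirms that the inverse mix column module is working.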
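The matrix product in figure 33 uses GF(2^8) arithmetic: multiplication modulo x^8 + x^4 + x^3 + x + 1 and addition by XOR. A minimal Python sketch (helper names are ours) that treats each group of four consecutive bytes as one column reproduces the figure's result:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else a << 1
        b >>= 1
    return p


# Row r of the inverse mix column matrix is (0e, 0b, 0d, 09) rotated right by r.
INV_MIX_ROW = (0x0E, 0x0B, 0x0D, 0x09)


def inv_mix_columns(state: bytes) -> bytes:
    """Multiply each 4-byte state column by the inverse mix column matrix."""
    out = bytearray()
    for c in range(4):
        col = state[4 * c:4 * c + 4]
        for r in range(4):
            acc = 0
            for j in range(4):
                acc ^= gf_mul(INV_MIX_ROW[(j - r) % 4], col[j])  # addition in GF(2^8) is XOR
            out.append(acc)
    return bytes(out)


print(inv_mix_columns(bytes.fromhex("473794ed40d4e4a5a3703aa64c9f42bc")).hex())
# prints 876e46a6f24ce78c4d904ad897ecc395
```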

5.4.3 Key Schedule

In decryption, the key schedule runs the encryption key expansion in reverse: starting from the round 10 key, each earlier round key is computed from the one after it. The block diagram of the key scheduler is shown in figure 34.

[Figure 34 blocks: MUX (select key_reg_mux_sel; inputs user_key and next_key), KEY REGISTER (load_key_reg), key words key_word(0) to key_word(3), five XOR gates, SHIFT, S BOX (×4), intermediate word T, round_constant, output next_key_word(3).]

Figure 34: Key Scheduler
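A minimal Python sketch of one backward key-schedule step for AES-128 follows (function names are ours). Given round key i as words k0 to k3, it recovers round key i−1 as p3 = k3 XOR k2, p2 = k2 XOR k1, p1 = k1 XOR k0 and p0 = k0 XOR SubWord(RotWord(p3)) XOR Rcon. That is five XOR operations plus one rotation and four S-box lookups, which appears to match the blocks of figure 34, although the exact hardware wiring is our assumption. Starting from the round 10 key used in the simulation, ten steps with the figure 28 constants should return the original cipher key.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else a << 1
        b >>= 1
    return p


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


def build_sbox() -> list:
    """Forward AES S-box: GF(2^8) inverse (x^254) followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()


def xor_words(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def previous_round_key(key: bytes, rcon: int) -> bytes:
    """Derive AES-128 round key i-1 from round key i by undoing one expansion step."""
    k = [key[4 * j:4 * j + 4] for j in range(4)]
    p3 = xor_words(k[3], k[2])
    p2 = xor_words(k[2], k[1])
    p1 = xor_words(k[1], k[0])
    rotated = p3[1:] + p3[:1]               # RotWord ("SHIFT")
    t = bytes(SBOX[b] for b in rotated)     # SubWord (four S-boxes)
    t = bytes([t[0] ^ rcon]) + t[1:]        # XOR the round constant into the first byte
    p0 = xor_words(k[0], t)
    return p0 + p1 + p2 + p3


key = bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6")  # round 10 key
for rcon in (0x36, 0x1B, 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
    key = previous_round_key(key, rcon)
print(key.hex())  # prints 2b7e151628aed2a6abf7158809cf4f3c (FIPS-197 example key)
```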


5.4.4 Simulation Result

The decryption block is simulated with 128-bit input data and a 128-bit key. The input data is “3925841d02dc09fbdc118597196a0b32”, and the input key is the key generated in round 10 of the encryption process, “d014f9a8c9ee2589e13f0cc8b6630ca6”. The decrypted output is “3243f6a8885a308d313198a2e0370734”, which matches the example in the specification.
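Putting the pieces together, the following self-contained Python model (a sketch, not the report's VHDL; all names are ours) decrypts the block above using the round structure of figure 29 and a backward key schedule. One assumption is worth stating: like the FIPS-197 inverse cipher, it XORs the round 10 key into the ciphertext before the first round, whereas the report's figures do not show where that initial key addition happens in hardware.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else a << 1
        b >>= 1
    return p


def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF


# Forward S-box (GF(2^8) inverse plus affine map) and its inverse table.
SBOX = []
for x in range(256):
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    SBOX.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
INV_SBOX = [0] * 256
for x, s in enumerate(SBOX):
    INV_SBOX[s] = x


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))


def inv_shift_rows(st: bytes) -> bytes:
    # Byte (row r, column c) is st[4*c + r]; row r is rotated right by r positions.
    return bytes(st[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4))


def inv_mix_columns(st: bytes) -> bytes:
    row = (0x0E, 0x0B, 0x0D, 0x09)
    out = bytearray()
    for c in range(4):
        col = st[4 * c:4 * c + 4]
        for r in range(4):
            acc = 0
            for j in range(4):
                acc ^= gf_mul(row[(j - r) % 4], col[j])
            out.append(acc)
    return bytes(out)


def previous_round_key(key: bytes, rcon: int) -> bytes:
    k = [key[4 * j:4 * j + 4] for j in range(4)]
    p3, p2, p1 = xor_bytes(k[3], k[2]), xor_bytes(k[2], k[1]), xor_bytes(k[1], k[0])
    t = bytes(SBOX[b] for b in p3[1:] + p3[:1])       # SubWord(RotWord(p3))
    p0 = xor_bytes(k[0], bytes([t[0] ^ rcon]) + t[1:])
    return p0 + p1 + p2 + p3


def decrypt_block(ciphertext: bytes, round10_key: bytes) -> bytes:
    """AES-128 decryption that starts from the round 10 key."""
    key = round10_key
    state = xor_bytes(ciphertext, key)  # assumed initial key addition with round key 10
    rcons = (0x36, 0x1B, 0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01)
    for i, rcon in enumerate(rcons):
        key = previous_round_key(key, rcon)  # round keys 9, 8, ..., 0
        state = bytes(INV_SBOX[b] for b in inv_shift_rows(state))
        state = xor_bytes(state, key)
        if i < len(rcons) - 1:  # the last round has no inverse mix column
            state = inv_mix_columns(state)
    return state


plaintext = decrypt_block(bytes.fromhex("3925841d02dc09fbdc118597196a0b32"),
                          bytes.fromhex("d014f9a8c9ee2589e13f0cc8b6630ca6"))
print(plaintext.hex())  # prints 3243f6a8885a308d313198a2e0370734
```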

Figure 35: Decrypted result


6 Synthesis of the AES Chip

The previous sections of this document dealt with the front end design of the chip. Once the RTL is complete in VHDL, the next step is to synthesize the chip. This part of the document walks through the synthesis process.

6.1 Division of the Chip design work:

The chip design work was divided as follows:

Fig 6.1

ARCHITECTURE: Mr. Ali Bassam, “Partition of the chip in functional blocks and design of the interface”

FRONT END DESIGN: Ms. Nasreen Ahmed, Mr. Shafaat Qureshi, “Enter the HDL code and verify the functionality”

BACK END DESIGN: Mr. Sanjit Kurup, “Synthesize, Floor plan, place and route, clock tree generation, physical verification”

FABRICATION: MOSIS


Architecture design: This step involves partitioning the chip into functional units. As seen in the hardware section, the chip was divided into functional blocks such as the interface, the encryptor and the decryptor. Within each of these blocks there were several fundamental divisions, such as the S-box, shift rows, mix columns and add round key. This task is one of the most challenging, as it requires a clear understanding of the specification.

Front End Design: The main task of the front end design team was to design the state machines of the individual blocks according to the specification. Based on the state machines or flow charts, RTL code is written in VHDL. This code must be tested for functionality with a simulator; the tool used here was NCSim from Cadence, and the results are as explained in the previous section.

Back End Design: The back end design involves synthesis of the logic circuit, gate level simulation, place and route, and many other intricacies of the chip layout. This section of the document explains each of these steps in detail, in the context of the AES chip.

Fabrication: The AES chip has not yet been fabricated, for the reasons explained in the subsequent sections. Once a GDS2 or CIF file is generated for the chip, the obvious choice for fabrication would be the MOS Implementation Service (MOSIS).

Before going into the details of the chip design, the flow chart below shows the different steps and processes involved in chip design and synthesis. As shown in the figure, HDL entry and its functional verification are complete; next come synthesis and simulation of the netlist it generates.


Fig 6.2

6.2 Synthesis of the chip:

The process of converting the RTL into an equivalent gate level circuit is called synthesis. It is an automated process carried out by the synthesis software; this project used Physically Knowledgeable Synthesis (PKS) from Cadence Design Systems Inc. Fig 6.3 shows a snapshot of the PKS tool.


Fig 6.3

Among the steps involved in synthesis, the first is loading the design libraries. The design library used is gscLib2.0. This is a generic library, and chips synthesized with it cannot be fabricated by a commercial foundry: the tools were donated by Cadence for educational purposes only, not for commercial manufacturing or IP development.

The technology cell library contains the definitions of basic cells such as:

• Inverters and buffer cells of various drive strengths

• Cells with basic logic functions: AND, NAND, AND_OR, OR, NOR, OR_AND, XOR, XNOR and MUX

• Various input configurations of each, i.e. two, three or four input cells (for example, the AND2, AND3 and AND4 cells)


After the library is loaded, a generic netlist is generated, i.e. gate level circuitry is derived from the VHDL logic. The gates used are not specific to the synthesis library; they are generic, which is why it is called a generic netlist.

To elaborate on the process, consider the following VHDL code. The entity and architecture wrapper, with the illustrative name mux_example, has been added so that the original process compiles on its own:

    library ieee;
    use ieee.std_logic_1164.all;
    entity mux_example is  -- illustrative wrapper; the report shows only the process
      port (ain, bin, cin : in std_logic; bout : out std_logic);
    end entity;
    architecture rtl of mux_example is
    begin
      process (ain, bin, cin)
      begin
        if cin = '1' then
          bout <= ain;
        else
          bout <= bin;
        end if;
      end process;
    end architecture;

[Figure: a MUX with data inputs ain and bin, select input cin and output bout]

In the above example, the VHDL code generates a multiplexer: bout follows ain when cin = '1' and bin otherwise. This is the process of generating the generic netlist. However, the library used for the actual synthesis does not necessarily contain every basic gate used in the generic netlist.

For example, if the generic netlist creates a 13 input AND gate, the technology library need not have a 13 input AND gate; instead, it has AND gates with a maximum of 4 inputs. There must therefore be a process that converts the 13 input AND gate into an equivalent netlist of 4 input AND gates. This is done in the second step of synthesis, called technology mapping, in which the generic netlist is redefined using the technology library. Fig 6.4 graphically explains the concept.


Fig 6.4 (panels: Generic Map, Technology Map)
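The 13 input AND example can be made concrete with a small Python sketch (map_wide_and and the n0, n1, ... net names are hypothetical). It greedily groups signals into cells of at most four inputs, level by level. This is only an illustration: PKS performs cost and timing driven mapping, not this fixed tree. For 13 inputs it yields four AND4 cells in two levels, since three AND4 cells plus one pass-through signal feed a final AND4.

```python
def map_wide_and(inputs: list, max_inputs: int = 4):
    """Decompose one wide AND into a tree of library cells (AND2..AND4).

    Returns (cells, output_net); each cell is (cell_name, input_nets, output_net).
    """
    cells = []
    level = list(inputs)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), max_inputs):
            group = level[i:i + max_inputs]
            if len(group) == 1:
                next_level.append(group[0])  # a lone signal passes straight to the next level
                continue
            out = f"n{len(cells)}"  # hypothetical internal net name
            cells.append((f"AND{len(group)}", group, out))
            next_level.append(out)
        level = next_level
    return cells, level[0]


cells, out = map_wide_and([f"a{i}" for i in range(13)])
for cell in cells:
    print(cell)
print("output:", out)
```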

Defining the Constraints for Synthesis:

Before technology mapping is performed, the synthesis constraints for the chip must be defined. This is done in two stages. The first is to write a .tcl file (Tcl stands for Tool Command Language), a script that defines the constraints on the input pins, clock pins and output pins. The .tcl file used to synthesize the AES chip is given below:

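    set_global fanout_load_limit 20
    # 30.30 ns period (about 33 MHz); rising edge at 0 ns, falling edge at 15.15 ns (50% duty cycle)
    set_clock clk -period 30.30 -waveform { 0 15.15 }
    set_clock_root -clock clk clk
    set_clock_uncertainty 0.150
    # 2 ns external delay on all outputs and 2 ns arrival delay on all inputs, relative to clk
    set_external_delay -clock clk 2 [ get_names [ find -outputs ] ]
    set_input_delay -clock clk 2 [ get_names [ find -inputs ] ]
    set_clock_insertion_delay 0.1 {clk}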


As the .tcl file shows, the clock period is 30.30 ns, i.e. a clock frequency of about 33 MHz, with a 50% duty cycle, and the clock uncertainty is 0.15 ns. set_input_delay sets the arrival delay of all inputs to 2 ns relative to clk, and set_external_delay similarly gives all outputs an external delay of 2 ns.
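The numbers in the .tcl file can be checked with a few lines of Python. The path budgets computed below are a simplification we introduce (they ignore setup time, the flip-flop clock-to-output delay and the 0.1 ns clock insertion delay), so they only indicate roughly how much of each 30.30 ns cycle remains for logic on input and output paths.

```python
# Values copied from the .tcl constraint file above (units: ns).
period_ns = 30.30
waveform_ns = (0.0, 15.15)      # rising edge, falling edge
uncertainty_ns = 0.150
input_delay_ns = 2.0            # set_input_delay on all inputs
output_delay_ns = 2.0           # set_external_delay on all outputs

freq_mhz = 1000.0 / period_ns
duty_cycle = (waveform_ns[1] - waveform_ns[0]) / period_ns

# Simplified budgets: ignore setup time, clock-to-output delay and clock insertion delay.
input_to_register_ns = period_ns - input_delay_ns - uncertainty_ns
register_to_output_ns = period_ns - output_delay_ns - uncertainty_ns

print(f"clock: {freq_mhz:.2f} MHz, duty cycle {duty_cycle:.0%}")
print(f"budget input->register: {input_to_register_ns:.2f} ns")
print(f"budget register->output: {register_to_output_ns:.2f} ns")
```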

Fig 6.5

Fig 6.5, above, shows the other constraints defined during synthesis. No optimization was specified for the synthesis; the reason for not constraining the design further is explained in the next paragraph.

After the parameters and constraint files are set, technology mapping is performed. The tool took approximately 1 hour 15 minutes to synthesize the design, and synthesis of commercial chips has been known to take days. This is the main reason the design was not set up for optimization: to save time during the initial stages of synthesis.

After synthesis, the output looks like the screens given below.


Fig 6.6


Fig 6.7

Fig 6.8


Fig 6.9

Fig 6.10


Fig 6.11

Figs 6.7 and 6.8 give different views of the synthesized chip.

Synthesis Reports: Several reports are generated during synthesis, and it is essential to review them to verify that the synthesized circuit works. Fig 6.8 and Fig 6.9 summarize these reports. The worst slack is 14.506 ns for the 33 MHz clock, a comfortable positive margin. The core utilization and cell area are also given. Fig 6.9 gives details of the process technology.

One of the most important reports is the timing report. Fig 6.10 gives the detailed timing analysis. Although it is only a portion of the complete report, it gives the gist and can be used to find the slack, setup time and hold time.
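A positive worst slack also gives a first-order estimate of how much faster the synthesized design could be clocked: if the period were reduced by exactly the worst slack, the worst path would just meet timing. The Python arithmetic below assumes the slack is reported in ns and that all other constraints stay unchanged. Post-layout wire delays and re-synthesis at a tighter clock would change the result, so the estimate is likely optimistic.

```python
period_ns = 30.30        # from set_clock -period 30.30
worst_slack_ns = 14.506  # worst slack from the synthesis report (assumed to be in ns)

# If the clock period shrank by exactly the worst slack, that path would just meet timing.
min_period_ns = period_ns - worst_slack_ns
fmax_mhz = 1000.0 / min_period_ns

print(f"minimum period ~ {min_period_ns:.3f} ns, fmax ~ {fmax_mhz:.1f} MHz")
```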


Gate Level Simulation:

One of the outputs of synthesis is the Verilog netlist. The file has a .v extension and is used to verify the functionality of the synthesized logic. A portion of the .v file generated for the AES chip is given in Fig 6.11.

Fig 6.12

This Verilog file is used for gate level simulation. The test bench used for functional simulation of the VHDL code is also used to apply the test vectors to this Verilog netlist. After running the simulation and fixing a few bugs, the Verilog netlist produced waveforms exactly the same as those of the VHDL simulation shown in the previous section.

    // Excerpt of the synthesized netlist for the control module (not the complete file)
    module control(reset, clk, encrypt, data_sel, load_data, key_sel, round_const,
                   last_mux_sel, load_key);
      input reset;
      input clk;
      input encrypt;
      output [1:0] data_sel;
      output load_data;
      output key_sel;
      output [7:0] round_const;
      output last_mux_sel;
      output load_key;
      wire [3:0] state;

      BUFX3 i_16753(.A(\nbus_135[1] ), .Y(data_sel[1]));
      BUFX3 i_16717(.A(n_4876), .Y(last_mux_sel));
      BUFX3 i_16705(.A(n_4861), .Y(load_data));
      BUFX3 i_16679(.A(n_4831), .Y(key_sel));
      BUFX3 i_16667(.A(n_4816), .Y(load_key));
      NAND2X1 i_49(.A(n_60), .B(n_69), .Y(n_50));
      OAI21X1 i_2(.A0(n_57), .A1(n_58), .B0(n_68), .Y(n_49));
      NAND2X1 i_3(.A(n_44), .B(n_67), .Y(n_47));
      NAND2X1 i_41(.A(n_60), .B(n_66), .Y(n_44));
      AOI21X1 i_4(.A0(state[3]), .A1(n_56), .B0(n_41), .Y(n_43));
      MX2X1 i_19(.A(state[1]), .B(n_61), .S0(state[2]), .Y(n_42));
      AOI21X1 i_37(.A0(n_59), .A1(n_57), .B0(n_42), .Y(n_41));


Place and Route of the Chip: The System on Chip (SoC) Encounter tool from Cadence Design Systems Inc. was used to place and route the chip, as shown in the snapshots in Fig 6.12 and Fig 6.13.

Fig 6.13


Fig 6.14

The I/O pads with buffers have not been designed yet, which is why the GDS2 or CIF file has not been generated. This will be done shortly and the document updated accordingly. Fig 6.14 gives a short summary of the chip.


Fig 6.15


7. Conclusions

1. The entire code, running on one microengine in a single-thread environment, took 1185 microengine cycles. Further optimization is possible by applying parallelism and using more microengines.

2. The crypto units of the IXP2850 operate at 700 MHz, providing high-speed bulk encryption and decryption. All symmetric-key encryption is offloaded to the crypto units, which are also used to calculate message digests, message authentication checks and checksums. The crypto units can perform the bulk of packet transformations at high speed, freeing the microengines to perform other packet processing tasks.

3. The Intel SDK 3.0 provides an excellent development environment with many built-in functions. It has a good programming environment and debugging interface.

4. The VHDL implementation of the AES algorithm was successful. Both the encryption and decryption blocks were designed and simulated, and the results are as expected.

5. The AES chip was successfully synthesized using a set of tools from Cadence Design Systems Inc. The netlist generated by synthesis was successfully tested for functionality, and the place and route of the chip was completed.


8. References

Daemen J., Rijmen V.; “AES Proposal: Rijndael”; 1999.

“Security: Adding protection to network via the network processor”; Intel Technology Journal; 2002.

“Network processors”; Intel Technology Journal; 2002.

“Intel IXP2850 Network processor programmer reference model”; Feb 2003.

“Intel IXP2400/IXP2800 Network processor development”; Feb 2003.

“Intel IXP2400/IXP2800 Network processor programmer reference model”; Feb 2003.

“Intel integrated performance primitives for Intel architecture Reference model Volume 4 Cryptography”; 2003.

Gladman B.; “The AES Algorithm (Rijndael) in C and C++, performance of the optimized implementation”; http://fp.gladman.plus.com/cryptography_technology/rijndael/index.htm

“System on Chip CECS 460 B materials”; http://crete.cadence.com
