
CHAPTER 1

INTRODUCTION

Encryption has become a significant aspect of all types of communication
networks. It provides safe transmission over insecure channels and protects data
from attackers who threaten the privacy and confidentiality of critical
information. Reliable and secure transmission is therefore one of today's
challenges. In October 2000, during a five-year standardization process, the
National Institute of Standards and Technology (NIST) selected the Rijndael
cryptographic algorithm as the AES. We chose to work on an FPGA mainly because of
its reconfigurability. A reconfigurable device is a major advantage for a
cryptographic algorithm, because it allows changes to be made to the design with
minimal additional cost or time.

The AES is a block cipher which operates on a 4x4 array of bytes, termed the
'state'. For encryption, each round of AES consists of four stages (SubBytes,
ShiftRows, MixColumns and AddRoundKey), except the last round, in which the
MixColumns transformation is excluded.

1.1 Methodology

In this work we have developed an iterative architecture, as shown in Fig. 1.1.
In our implementation of the AES, the required intermediate states are fed back
through a MUX to the encryption core. The 2x1 MUX selects the input data for the
initial round and the feedback data for all remaining rounds. The 4x1 MUX
selects the AddRoundKey output for the last round, the ShiftRows output for the
second-last round and the MixColumns output for all the other rounds. Each stage
was implemented and tested as an individual module, and instances of these
modules were then used in the main code of our design. The resources used for the
first round are reused for all subsequent rounds, which saves device resources
significantly.
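The control flow of this iterative architecture can be sketched in software. The following Python model is only an illustration under stated assumptions, not the VHDL of this work: the name run_iterative_core and the idea of passing the three stage functions in as parameters are hypothetical, and the MUX behaviour is modelled by ordinary branching. It shows one round datapath being reused for every round, with MixColumns bypassed in the final round.

```python
def run_iterative_core(block, round_keys, sub_bytes, shift_rows, mix_columns):
    """Software model of a single round datapath that is reused for all rounds.

    block       -- 16 input bytes
    round_keys  -- list of Nr + 1 round keys, 16 bytes each (assumed precomputed)
    sub_bytes, shift_rows, mix_columns -- stage functions (hypothetical parameters)
    """
    nr = len(round_keys) - 1

    # Initial round: the input MUX passes the external data, which is only
    # XORed with the first round key.
    state = [b ^ k for b, k in zip(block, round_keys[0])]

    for rnd in range(1, nr + 1):
        # Remaining rounds: the input MUX passes the fed-back state.
        state = shift_rows(sub_bytes(state))
        if rnd != nr:
            # MixColumns is skipped in the last round.
            state = mix_columns(state)
        state = [b ^ k for b, k in zip(state, round_keys[rnd])]

    return state
```

With 11 round keys (AES-128), the loop body runs ten times over the same stage functions, which mirrors the resource reuse described above.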

SubBytes

There are two ways of implementing the SubBytes stage. The first is to construct
the S-box by performing two transformations: taking the multiplicative inverse in
the finite field $GF(2^8)$, and then applying a standard affine transformation
(over $GF(2)$). The second method is to store a precalculated S-box directly in
ROM. The storage can be realized either by configuring FPGA slices as distributed
RAM or by using the embedded block RAMs (BRAMs) of the device. When the S-box is
implemented using LUTs as RAM, it utilizes 75% of the resources. Our SubBytes
stage has been implemented using the embedded BRAMs in dual-port configuration
and read-only mode. By using BRAMs we avoid all the calculations and the extra
delay needed to determine the S-box values. The state consists of 16 bytes, each
of which has to be replaced from the S-box, and thus 16 S-boxes are required for
every round. One S-box has 256 entries of 8 bits each and so takes only 2 Kb of
storage. In our design two S-boxes are stored per BRAM. Thus, 8 BRAMs are needed
for each round.
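The first construction route (inverse followed by affine transformation) can be used offline to generate the table that the second route stores in ROM. The Python sketch below is illustrative only and is not part of the VHDL design; the helper names gf_mul, gf_inv and affine are assumptions. It also confirms the 2 Kb storage figure for one table.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by m(x)
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; the byte 0 maps to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def affine(b):
    """AES affine transformation: XOR of four bit rotations plus constant 0x63."""
    def rotl(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF
    return b ^ rotl(b, 1) ^ rotl(b, 2) ^ rotl(b, 3) ^ rotl(b, 4) ^ 0x63


# Precalculated table, as it would be loaded into ROM.
SBOX = [affine(gf_inv(x)) for x in range(256)]

# Storage needed for one table: 256 entries of 8 bits each.
SBOX_BITS = len(SBOX) * 8
```

Computing the table once and storing it trades memory for the removal of the inversion logic from the datapath, which is the motivation given above for the BRAM approach.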

Figure 1.1. AES Encryption Architecture

ShiftRows

The ShiftRows transformation is a simple operation that cyclically shifts each
row of the state by a fixed number of byte positions. We achieved this by simply
assigning the bytes of the input state (of the ShiftRows stage) to the
corresponding shifted positions of the output state.
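Because the shift amounts are fixed, the whole transformation reduces to a fixed reassignment of byte positions. The following Python sketch illustrates that mapping; it assumes the state is held as 16 bytes in column-major order (index r + 4c), and the function name shift_rows is hypothetical.

```python
def shift_rows(state):
    """Cyclically shift row r of the state left by r byte positions.

    state -- 16 bytes in column-major order, i.e. byte (r, c) is at index r + 4*c.
    """
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]
```

In hardware the same mapping costs no logic at all, since it is only a rewiring of the 16 byte lanes.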

MixColumns

The MixColumns transformation is performed by polynomial multiplication modulo
$x^4 + 1$, with coefficients in the Galois field $GF(2^8)$. Each column of the
input state of this stage is multiplied by a constant matrix to obtain the output
state. For encryption, the only non-trivial multiplications are by the constants
2 and 3. Multiplication by 2 is done by a left shift of the byte by one bit,
followed by an XOR with {1b} if the bit shifted out is 1. Multiplication by 3 is
done as the sum (XOR) of the byte and its product with 2.
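The shift-and-XOR arithmetic described above can be sketched for a single column as follows. This is a minimal Python illustration rather than the VHDL module of this work; the names xtime and mix_column are assumptions.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8): left shift, then conditional XOR with 0x1b."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # clears bit 8 and XORs the low byte with 0x1b
    return b


def mix_column(col):
    """Apply the MixColumns matrix to one column [a0, a1, a2, a3]."""
    a0, a1, a2, a3 = col

    def mul3(b):
        # 3*b = 2*b + b, where + is XOR
        return xtime(b) ^ b

    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]
```

For example, the column [db, 13, 53, 45] is transformed to [8e, 4d, a1, bc].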

AddRoundKey

In the AddRoundKey transformation, the precalculated round keys are stored in
registers. In every round, the corresponding round key is read and XORed bitwise
with the state.

The Field Programmable Gate Array

The FPGA, being a reprogrammable device, combines the flexibility of software
with the speed of hardware. Recent FPGAs offer special features such as math
functions (comparators, counters, trigonometric functions, etc.), embedded
memories and storage elements. These features make the design of cryptographic
cores easier. FPGAs also provide a reasonably cheap solution for designing and
implementing various algorithms. That is why our encryption core is based on an
FPGA device.

1.2 Organization of work

Chapter 2 gives a concise introduction to VHDL and its design synthesis.

Chapter 3 deals with the notations, acronyms, algorithm parameters, symbols,
functions and mathematical preliminaries involved in AES.

Chapter 4 gives a concise introduction to AES and its four transformations, and
also covers the implementation issues involved in AES.

Chapter 5 gives a brief introduction to Xilinx and illustrates the FPGA
programming process.

Chapter 6 provides the complete code for the implementation of AES encryption.

Chapter 7 gives the results of simulation and synthesis.

Chapter 8 deals with the conclusion and future scope.


CHAPTER 2

VHDL AND DESIGN SYNTHESIS

2.1 Use of VHDL tools in VLSI design

IC designers are always looking for ways to increase their productivity without
degrading the quality of their designs, so it is no wonder that they have
embraced logic synthesis tools. In the last few years, these tools have become
capable of producing designs as good as those of a human designer. Logic
synthesis is now helping to bring about a switch to design using a hardware
description language (HDL) to describe the structure and behavior of circuits, as
evidenced by the availability of logic synthesis tools based on the Very High
Speed Integrated Circuit Hardware Description Language (VHDL).

Logic synthesis tools can automatically produce a gate-level netlist, allowing
designers to formulate their design in a high-level description such as VHDL.
Logic synthesis provides two fundamental capabilities: automatic translation of
high-level descriptions into logic designs, and optimization to decrease the
circuit's area and increase its speed. Synthesized designs are comparable to
designs created manually in terms of chip area occupied and signal speed, but are
much faster to produce.

The ability to translate a high-level description into a netlist automatically
can improve design efficiency markedly. It quickly gives designers an accurate
estimate of their logic's potential speed and chip real-estate needs. In
addition, designers can quickly implement a variety of architectural choices and
compare their area and speed characteristics. In a design methodology based on
synthesis, the designer begins by describing a design's behavior in high-level
code, capturing its intended functionality rather than its implementation. Once
the functionality has been thoroughly verified through simulation, the designer
reformulates the design in terms of large structural blocks such as registers,
arithmetic units, storage registers, and combinational logic. Although
combinational logic typically constitutes only about 20% of a chip's area,
creating it can easily absorb 80% of the gate-level design time. The resulting
description is called register transfer level (RTL), since its equations describe
how data is transferred from one register to another.

In the logic synthesis process, the tool's first step is to minimize the
complexity, and hence the size, of the logic equations by finding common terms
that can be used repeatedly. In a translation step called technology mapping, the
minimized equations are mapped onto a set of gates; the non-synthesized portions
of the logic are mapped onto technology-specific gates as well. For this step the
designer selects the application-specific integrated circuit (ASIC) vendor
library in which to implement the chip, so that the logic synthesis tool can
efficiently apply the gates available in that library.

The primary consideration in the entire synthesis process is the quality of the
resulting circuit. Quality in logic synthesis is measured by how close the
circuit comes to meeting the designer's speed, chip area and power goals. These
goals can apply to the entire IC or to portions of the logic. Logic synthesis has
achieved its greatest success with synchronous designs that have significant
amounts of combinational logic. Asynchronous designs require that designers
formulate timing constraints explicitly. Unlike that of asynchronous designs, the
behavior of synchronous designs is not affected by events such as the arrival of
signals.

The ability to steer the synthesis process towards various solutions allows
designers to rapidly implement many versions of a circuit and choose the solution
best suited to their specific situation. Designers can therefore explore their
options in a way that has not been practical before. The ability of synthesis
tools to synthesize sequential logic and optimize it in any chosen technology
makes it possible to retarget designs quickly as improved technologies become
available, and to try out a circuit in several technologies and then choose the
best one. Furthermore, modifying logic manually is tedious and takes a great deal
of time, so human designers do as little of it as possible. Synthesis tools, on
the other hand, make many passes over all the possible logic combinations of a
circuit. Tight integration of timing analysis within the optimization algorithms
enables some synthesis systems to quickly find a circuit's critical paths (the
paths that determine the overall clock rate) and to re-optimize when necessary.
Thus, in a fraction of the time it would take a designer to do one manual
version, the tools iterate through many solutions to determine the best one.

When a designer starts a synthesis process by translating an RTL description into
a netlist, the synthesis tool must first be able to understand the RTL
description. A number of languages known as hardware description languages (HDLs)
have been developed for this purpose. HDL statements can describe circuits in
terms of structure, behavior, or both. One reason HDLs are so powerful is that
they support a variety of design descriptions. An HDL simulator handles all those
descriptions, applying the same simulation and test vectors from the design's
behavioral level all the way down to the gate level. This integrated approach
reduces the problems that can result from different descriptions of the same
design.

As logic synthesis matures, it will allow designers to concentrate less on the
details of the circuit and more on its actual function and behavior. Logic
synthesis tools are becoming capable of more behavioral-level tasks, such as
synthesizing sequential logic and deciding if and where storage elements are
needed in a design. Existing logic synthesis tools are moving up the design
ladder, while behavioral research is extending down to the RTL level. Eventually
they will merge, giving designers a complete set of tools to automate designs
from concept to layout.

2.2 Scope of VHDL

VHDL satisfies all the requirements for the hierarchical description of
electronic circuits from system level down to switch level. It can support all
levels of timing specification and constraints and is capable of detecting and
signaling timing violations. The language models the concurrency present in
digital systems and supports the recursive nature of finite state machines. The
concept of packages and configurations allows the creation of design libraries
for the reuse of previously designed parts.

2.3 Why VHDL?

Design engineers in the electronics industry use a hardware description language
to keep pace with the productivity of their competitors. The capabilities that
VHDL offers are described as follows.

Power and flexibility

VHDL has powerful language constructs with which to write code descriptions of
complex control logic. It also has multiple levels of design description for
controlling design implementation. It supports design libraries and the creation
of reusable components for design and simulation.

Device-independent design

VHDL permits us to create a design without first having to choose a device for
implementation. With one design description, we can target many device
architectures. Without being familiar with a particular architecture, we can
optimize our design for resource utilization or performance. VHDL also permits
multiple styles of design description.

Portability

VHDL's portability permits us to simulate the same design description that we
have synthesized. Simulating a large design description before synthesizing it
can save considerable time. Because VHDL is a standard, a design description can
be taken from one simulator to another, one synthesis tool to another, and one
platform to another, which means that design descriptions can be reused in
multiple projects.

Benchmarking capabilities

Device-independent design and portability allow a design to be benchmarked using
different device architectures and different synthesis tools. We can take a
completed design description and synthesize it, create logic for it, evaluate the
results and finally choose the device, a CPLD (complex programmable logic device)
or an FPGA, that best fits our design requirements.

ASIC Migration

The efficiency that VHDL provides allows our product to reach the market quickly
if it has been synthesized on a CPLD or FPGA. When production volume reaches
appropriate levels, VHDL facilitates the development of an application-specific
integrated circuit (ASIC). Sometimes the exact code used with the PLD can be used
with the ASIC, and because VHDL is a well-defined language, we can be assured
that the ASIC vendor will deliver a device with the expected functionality.

Quick Time-to-Market and low cost

VHDL and programmable logic together facilitate a speedy design process. VHDL
permits designs to be described quickly. Programmable logic eliminates
non-recurring engineering (NRE) expenses and facilitates quick design iterations.
Synthesis makes it all possible. Together, VHDL and programmable logic are a
powerful vehicle for bringing products to market in record time.

The design process can be explained in six steps:

1. Define the design requirements.

2. Describe the design in VHDL (formulate and code the design).

3. Simulate the source code.

4. Synthesize, optimize and fit the design onto a suitable device.

5. Simulate the post-layout design model.

6. Program the device.

2.4 Define the design requirements

Before launching into writing code for our design, we must have a clear idea of
the design objectives and requirements: the function of the design, the required
setup and clock-to-output times, the maximum frequency of operation, and the
critical paths.

2.5 Describe the design in VHDL

Formulate the design

Having an idea of the design requirements, we have to write efficient code that
is realized, through synthesis, as the logic implementation we intended.

Code the design

After deciding upon a design methodology, we should code the design with
reference to the block, dataflow, and state diagrams, such that the code is
syntactically and semantically correct.

2.6 Simulate the source code

With source code simulation, faults can be detected early in the design cycle,
allowing us to make corrections with the least possible impact on the schedule.
This is especially valuable for larger designs, for which synthesis and place and
route can take a couple of hours.


2.7 Synthesis, Optimize, and Fit the design

Synthesis

It is a process by which net lists or equations are created from design

descriptions, which may be abstracted. VHDL synthesis software tools convert VHDL

descriptions to technology specific net lists or set of equations.

Optimization

The optimization process depends on three things: the form of the Boolean
expression, the type of resources available, and automatic or user-applied
synthesis directives (sometimes called constraints). Optimization for CPLDs
involves reducing the logic to a minimal sum-of-products, which is then further
optimized for a minimal literal count. This reduces the product-term utilization
and the number of logic block inputs required for any given expression.

Fitting

Fitting is the process of taking the logic produced by the synthesis and
optimization process and placing it into a logic device, transforming the logic
(if necessary) to obtain the best fit. The term is typically used to describe the
process of allocating resources for CPLD-type architectures.

2.8 Simulate the Post-layout design model

A post-layout simulation enables us to verify not only the functionality of our
design but also its timing, such as setup, clock-to-output, and
register-to-register times. If we are unable to meet our design objectives, we
need to re-synthesize the design and/or fit it to a new logic device.

2.9 Program the device

After completing the design description, synthesizing, optimizing, fitting, and

successfully simulating our design, we are ready to program our device and continue


work on the rest of our system design. The synthesis, optimization, and fitting

software will produce a file for use in programming the device.

CHAPTER 3

Notations and Mathematical Preliminaries for AES

3.1 Glossary of Terms and Acronyms

The following definitions are used throughout this standard:

AES             Advanced Encryption Standard.

Affine          A transformation consisting of multiplication by a matrix
Transformation  followed by the addition of a vector.

Array           An enumerated collection of identical entities (e.g., an array
                of bytes).

Bit             A binary digit having a value of 0 or 1.

Block           Sequence of binary bits that comprise the input, output, State,
                and Round Key. The length of a sequence is the number of bits
                it contains. Blocks are also interpreted as arrays of bytes.

Byte            A group of eight bits that is treated either as a single entity
                or as an array of 8 individual bits.

Cipher          Series of transformations that converts plaintext to ciphertext
                using the Cipher Key.

Cipher Key      Secret, cryptographic key that is used by the Key Expansion
                routine to generate a set of Round Keys; can be pictured as a
                rectangular array of bytes, having four rows and Nk columns.

Ciphertext      Data output from the Cipher or input to the Inverse Cipher.

Key Expansion   Routine used to generate a series of Round Keys from the
                Cipher Key.

Plaintext       Data input to the Cipher or output from the Inverse Cipher.

Rijndael        Cryptographic algorithm specified in this Advanced Encryption
                Standard (AES).

Round Key       Round keys are values derived from the Cipher Key using the
                Key Expansion routine; they are applied to the State in the
                Cipher.

State           Intermediate Cipher result that can be pictured as a
                rectangular array of bytes, having four rows and Nb columns.

S-box           Non-linear substitution table used in several byte substitution
                transformations and in the Key Expansion routine to perform a
                one-for-one substitution of a byte value.

Word            A group of 32 bits that is treated either as a single entity or
                as an array of 4 bytes.

3.2 Algorithm Parameters, Symbols and Functions

The following algorithm parameters, symbols and functions are used throughout
this standard:

AddRoundKey()   Transformation in the Cipher and Inverse Cipher in which a
                Round Key is added to the State using an XOR operation. The
                length of a Round Key equals the size of the State (i.e., for
                Nb = 4, the Round Key length equals 128 bits/16 bytes).

K               Cipher Key.

MixColumns()    Transformation in the Cipher that takes all of the columns of
                the State and mixes their data (independently of one another)
                to produce new columns.

Nb              Number of columns (32-bit words) comprising the State. For this
                standard, Nb = 4.

Nk              Number of 32-bit words comprising the Cipher Key. For this
                standard, Nk = 4, 6, or 8.

Nr              Number of rounds, which is a function of Nk and Nb (which is
                fixed). For this standard, Nr = 10, 12, or 14.

Rcon[]          The round constant word array.

RotWord()       Function used in the Key Expansion routine that takes a
                four-byte word and performs a cyclic permutation.

ShiftRows()     Transformation in the Cipher that processes the State by
                cyclically shifting the last three rows of the State by
                different offsets.

SubBytes()      Transformation in the Cipher that processes the State using a
                nonlinear byte substitution table (S-box) that operates on each
                of the State bytes independently.

SubWord()       Function used in the Key Expansion routine that takes a
                four-byte input word and applies an S-box to each of the four
                bytes to produce an output word.

XOR             Exclusive-OR operation.

⊕               Exclusive-OR operation.

⊗               Multiplication of two polynomials (each with degree < 4)
                modulo $x^4 + 1$.

•               Finite field multiplication.

3.3 Notation and Conventions

3.3.1 Inputs and Outputs

The input and output for the AES algorithm each consist of sequences of 128
bits (digits with values of 0 or 1). These sequences will sometimes be referred
to as blocks and the number of bits they contain will be referred to as their
length. The Cipher Key for the AES algorithm is a sequence of 128, 192 or 256
bits. Other input, output and Cipher Key lengths are not permitted by this
standard.

The bits within such sequences will be numbered starting at zero and ending at
one less than the sequence length (block length or key length). The number i
attached to a bit is known as its index and will be in one of the ranges
$0 \le i < 128$, $0 \le i < 192$ or $0 \le i < 256$, depending on the block
length and key length (specified above).

3.3.2 Bytes

The basic unit for processing in the AES algorithm is a byte, a sequence of
eight bits treated as a single entity. The input, output and Cipher Key are
processed as arrays of bytes that are formed by dividing these sequences into
groups of eight contiguous bits. For an input, output or Cipher Key denoted by a,
the bytes in the resulting array will be referenced using one of the two forms,
$a_n$ or a[n], where n will be in one of the following ranges:

    Key length = 128 bits, $0 \le n < 16$;    Block length = 128 bits, $0 \le n < 16$;
    Key length = 192 bits, $0 \le n < 24$;
    Key length = 256 bits, $0 \le n < 32$.

All byte values in the AES algorithm will be presented as the concatenation of
their individual bit values (0 or 1) between braces in the order
{b7, b6, b5, b4, b3, b2, b1, b0}. These bytes are interpreted as finite field
elements using a polynomial representation:

    $b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 = \sum_{i=0}^{7} b_i x^i$    (3.1)

For example, {01100011} identifies the finite field element $x^6 + x^5 + x + 1$.
Byte values are also written in hexadecimal notation, with each group of four
bits denoted by a single character. Hence the element {01100011} can be
represented as {63}, where the character denoting the four-bit group containing
the higher-numbered bits is again to the left. Some finite field operations
involve one additional bit (b8) to the left of an 8-bit byte. Where this extra
bit is present, it will appear as '{01}' immediately preceding the 8-bit byte;
for example, a 9-bit sequence will be presented as {01}{1b}.

3.3.3 Arrays of Bytes

Arrays of bytes will be represented in the following form:

    $a_0, a_1, a_2, \ldots, a_{15}$

The bytes and the bit ordering within bytes are derived from the 128-bit input
sequence

    $input_0, input_1, input_2, \ldots, input_{126}, input_{127}$

as follows:

    $a_0 = \{input_0, input_1, \ldots, input_7\}$;
    $a_1 = \{input_8, input_9, \ldots, input_{15}\}$;
    ...
    $a_{15} = \{input_{120}, input_{121}, \ldots, input_{127}\}$.

The pattern can be extended to longer sequences (i.e., for 192- and 256-bit
keys), so that, in general,

    $a_n = \{input_{8n}, input_{8n+1}, \ldots, input_{8n+7}\}$    (3.2)
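The grouping rule of equation (3.2) can be illustrated with a short Python sketch. It is a minimal illustration only; the function name bits_to_bytes is an assumption, and the first bit of each group is taken as the most significant bit (b7), following the bit ordering given in Section 3.3.2.

```python
def bits_to_bytes(bits):
    """Group a bit sequence into bytes: a_n = {input_8n, ..., input_8n+7}.

    bits -- list of 0/1 values whose length is a multiple of 8.
    The first bit of each group becomes the most significant bit of the byte.
    """
    if len(bits) % 8 != 0:
        raise ValueError("bit sequence length must be a multiple of 8")
    result = []
    for n in range(len(bits) // 8):
        value = 0
        for bit in bits[8 * n: 8 * n + 8]:
            value = (value << 1) | bit
        result.append(value)
    return result
```

For instance, the eight bits 0, 1, 1, 0, 0, 0, 1, 1 form the byte {63}.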

Figure 3.1 Indices for Bytes and Bits.


3.3.4 The State

Internally, the AES algorithm's operations are performed on a two-dimensional
array of bytes called the State. The State consists of four rows of bytes, each
containing Nb bytes, where Nb is the block length divided by 32. In the State
array denoted by the symbol s, each individual byte has two indices, with its row
number r in the range $0 \le r < 4$ and its column number c in the range
$0 \le c < Nb$. This allows an individual byte of the State to be referred to as
either $s_{r,c}$ or s[r,c]. For this standard, Nb = 4, i.e., $0 \le c < 4$.

The input, the array of bytes $in_0, in_1, \ldots, in_{15}$, is copied into the
State array. The Cipher or Inverse Cipher operations are then conducted on this
State array, after which its final value is copied to the output, the array of
bytes $out_0, out_1, \ldots, out_{15}$.

Figure 3.2 State array input and output.

Hence, at the beginning of the Cipher or Inverse Cipher, the input array, in, is
copied to the State array according to the scheme

    s[r, c] = in[r + 4c]    for $0 \le r < 4$ and $0 \le c < Nb$,    (3.3)

and at the end of the Cipher and Inverse Cipher, the State is copied to the
output array out as follows:

    out[r + 4c] = s[r, c]    for $0 \le r < 4$ and $0 \le c < Nb$.    (3.4)
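Equations (3.3) and (3.4) describe a column-by-column fill of the State. The following Python sketch shows the two copies for Nb = 4; it is an illustration only, and the function names to_state and from_state are assumptions.

```python
NB = 4  # number of columns in the State


def to_state(block):
    """Copy 16 input bytes into the State: s[r][c] = in[r + 4c]."""
    return [[block[r + 4 * c] for c in range(NB)] for r in range(4)]


def from_state(state):
    """Copy the State to 16 output bytes: out[r + 4c] = s[r][c]."""
    return [state[r][c] for c in range(NB) for r in range(4)]
```

With the input bytes 0 to 15, row 0 of the State holds 0, 4, 8, 12, so consecutive input bytes run down the columns rather than along the rows.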

3.3.5 The State as an Array of Columns

The four bytes in each column of the State array form 32-bit words, where the
row number r provides an index for the four bytes within each word. The State can
hence be interpreted as a one-dimensional array of 32-bit words (columns),
$w_0 \ldots w_3$, where the column number c provides an index into this array.
Hence, for the example in Fig. 3.2, the State can be considered as an array of
four words, as follows:

    $w_0 = s_{0,0}\, s_{1,0}\, s_{2,0}\, s_{3,0}$
    $w_1 = s_{0,1}\, s_{1,1}\, s_{2,1}\, s_{3,1}$
    $w_2 = s_{0,2}\, s_{1,2}\, s_{2,2}\, s_{3,2}$
    $w_3 = s_{0,3}\, s_{1,3}\, s_{2,3}\, s_{3,3}$    (3.5)


3.4 Mathematical Preliminaries

All bytes in the AES algorithm are interpreted as finite field elements using the
notation introduced in Section 3.3.2. Finite field elements can be added and
multiplied, but these operations are different from those used for numbers. The
following subsections introduce the basic mathematical concepts.

3.4.1 Addition

The addition of two elements in a finite field is achieved by "adding" the
coefficients for the corresponding powers in the polynomials for the two
elements. The addition is performed with the XOR operation (denoted by $\oplus$),
i.e., modulo 2, so that $1 \oplus 1 = 0$, $1 \oplus 0 = 1$, and
$0 \oplus 0 = 0$. Consequently, subtraction of polynomials is identical to
addition of polynomials.

Alternatively, addition of finite field elements can be described as the modulo 2
addition of corresponding bits in the byte. For two bytes
$\{a_7a_6a_5a_4a_3a_2a_1a_0\}$ and $\{b_7b_6b_5b_4b_3b_2b_1b_0\}$, the sum is
$\{c_7c_6c_5c_4c_3c_2c_1c_0\}$, where each $c_i = a_i \oplus b_i$ (i.e.,
$c_7 = a_7 \oplus b_7$, $c_6 = a_6 \oplus b_6$, ..., $c_0 = a_0 \oplus b_0$).

For example, the following expressions are equivalent to one another:

    $(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$    (polynomial notation);
    $\{01010111\} \oplus \{10000011\} = \{11010100\}$    (binary notation);
    $\{57\} \oplus \{83\} = \{d4\}$    (hexadecimal notation).
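The equivalence of the three notations can be checked with a few lines of Python. This is only an illustrative sketch; the helper names gf_add and poly_str are assumptions introduced here.

```python
def gf_add(a, b):
    """Addition in GF(2^8) is a bitwise XOR of the two bytes."""
    return a ^ b


def poly_str(byte):
    """Render a byte as its polynomial, e.g. 0x57 -> 'x^6 + x^4 + x^2 + x + 1'."""
    terms = []
    for i in range(7, -1, -1):
        if (byte >> i) & 1:
            if i == 0:
                terms.append("1")
            elif i == 1:
                terms.append("x")
            else:
                terms.append(f"x^{i}")
    return " + ".join(terms) or "0"
```

Because each element is its own additive inverse, applying gf_add with the same operand twice returns the original byte.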

3.4.2 Multiplication

In the polynomial representation, multiplication in $GF(2^8)$ (denoted by
$\bullet$) corresponds with the multiplication of polynomials modulo an
irreducible polynomial of degree 8. A polynomial is irreducible if its only
divisors are one and itself. For the AES algorithm, this irreducible polynomial
is

    $m(x) = x^8 + x^4 + x^3 + x + 1$,    (3.6)

or {01}{1b} in hexadecimal notation.

For example, $\{57\} \bullet \{83\} = \{c1\}$, because

    $(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1)$
        $= x^{13} + x^{11} + x^9 + x^8 + x^7 + x^7 + x^5 + x^3 + x^2 + x + x^6 + x^4 + x^2 + x + 1$
        $= x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$

and

    $x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1$ modulo $(x^8 + x^4 + x^3 + x + 1)$
        $= x^7 + x^6 + 1$.

The modular reduction by m(x) ensures that the result will be a binary

polynomial of degree less than 8, and thus can be represented by a byte. Unlike

addition, there is no simple operation at the byte level that corresponds to this

multiplication.
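The two-step procedure used in the worked example above (polynomial product, then reduction modulo m(x)) can be sketched in Python as follows. This is an illustration under stated assumptions: the names clmul, reduce_mod_m and gf_mul are hypothetical, and polynomials are held as integers whose bit i is the coefficient of x^i.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def clmul(a, b):
    """Product of two binary polynomials (no carries: partial products are XORed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def reduce_mod_m(p):
    """Reduce a binary polynomial modulo m(x) by cancelling its leading term repeatedly."""
    while p.bit_length() > 8:
        p ^= M << (p.bit_length() - 9)
    return p


def gf_mul(a, b):
    """Multiplication in GF(2^8): polynomial product followed by reduction."""
    return reduce_mod_m(clmul(a, b))
```

The intermediate product for {57} and {83} is the 14-bit value 0x2b79, which corresponds to the degree-13 polynomial in the example before reduction.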

The multiplication defined above is associative, and the element {01} is the
multiplicative identity. For any non-zero binary polynomial b(x) of degree less
than 8, the multiplicative inverse of b(x), denoted $b^{-1}(x)$, can be found as
follows: the extended Euclidean algorithm is used to compute polynomials a(x) and
c(x) such that

    $b(x)a(x) + m(x)c(x) = 1$.    (3.7)

Hence, $a(x) \bullet b(x) \bmod m(x) = 1$, which means

    $b^{-1}(x) = a(x) \bmod m(x)$.    (3.8)

Moreover, for any a(x), b(x) and c(x) in the field, it holds that

    $a(x) \bullet (b(x) + c(x)) = a(x) \bullet b(x) + a(x) \bullet c(x)$.

It follows that the set of 256 possible byte values, with XOR used as addition
and the multiplication defined as above, has the structure of the finite field
$GF(2^8)$.
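The use of the extended Euclidean algorithm in equations (3.7) and (3.8) can be sketched in Python. This is a minimal illustration, not an implementation taken from this work; the names clmul, poly_divmod and gf_inverse are assumptions, and polynomials are held as integers whose bit i is the coefficient of x^i.

```python
M = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1


def clmul(a, b):
    """Product of two binary polynomials (partial products are XORed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def poly_divmod(a, b):
    """Quotient and remainder of the binary polynomial division a / b (b non-zero)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q ^= 1 << shift
        a ^= b << shift
    return q, a


def gf_inverse(b):
    """Return a(x) with a(x) * b(x) = 1 modulo m(x), via the extended Euclidean algorithm."""
    if b == 0:
        raise ZeroDivisionError("{00} has no multiplicative inverse")
    r_prev, r = M, b
    a_prev, a = 0, 1  # invariant: a * b is congruent to r modulo m(x)
    while r != 1:
        q, rem = poly_divmod(r_prev, r)
        r_prev, r = r, rem
        a_prev, a = a, a_prev ^ clmul(q, a)
    return a
```

Because m(x) is irreducible, the remainder sequence always reaches 1 for a non-zero input, at which point the tracked coefficient is the inverse.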

Multiplication by x

Multiplying the binary polynomial defined in equation (3.1) by the polynomial x
results in

    $b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x$.    (3.9)

The result $x \bullet b(x)$ is obtained by reducing the above result modulo m(x),
as defined in equation (3.6). If b7 = 0, the result is already in reduced form.
If b7 = 1, the reduction is accomplished by subtracting (i.e., XORing) the
polynomial m(x). It follows that multiplication by x (i.e., {00000010} or {02})
can be implemented at the byte level as a left shift and a subsequent conditional
bitwise XOR with {1b}. This operation on bytes is denoted by xtime().
Multiplication by higher powers of x can be implemented by repeated application
of xtime(). By adding intermediate results, multiplication by any constant can be
implemented.

For example, $\{57\} \bullet \{13\} = \{fe\}$ because

    $\{57\} \bullet \{02\} = xtime(\{57\}) = \{ae\}$
    $\{57\} \bullet \{04\} = xtime(\{ae\}) = \{47\}$
    $\{57\} \bullet \{08\} = xtime(\{47\}) = \{8e\}$
    $\{57\} \bullet \{10\} = xtime(\{8e\}) = \{07\}$,

thus,

    $\{57\} \bullet \{13\} = \{57\} \bullet (\{01\} \oplus \{02\} \oplus \{10\})$
                           $= \{57\} \oplus \{ae\} \oplus \{07\}$
                           $= \{fe\}$.
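The xtime() operation and its repeated application can be sketched in Python as follows. The function name xtime comes from the text above; mul_const is a hypothetical helper introduced here to show how intermediate results are added for an arbitrary constant.

```python
def xtime(b):
    """Multiply a byte by x ({02}): left shift, then XOR with {1b} if b7 was set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # drops the shifted-out bit and XORs the low byte with 0x1b
    return b


def mul_const(b, c):
    """Multiply byte b by constant c by XORing the xtime() powers selected by the bits of c."""
    result = 0
    power = b  # b * x^0, then b * x^1, b * x^2, ...
    while c:
        if c & 1:
            result ^= power
        power = xtime(power)
        c >>= 1
    return result
```

For the constant {13} = {01} + {02} + {10}, mul_const XORs the first, second and fifth intermediate results, exactly as in the worked example.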

3.4.3 Polynomials with Coefficients in $GF(2^8)$

Four-term polynomials can be defined, with coefficients that are finite field
elements, as

    $a(x) = a_3x^3 + a_2x^2 + a_1x + a_0$    (3.10)

which will be denoted as a word in the form $[a_0, a_1, a_2, a_3]$. Note that the
polynomials in this section behave somewhat differently than the polynomials used
in the definition of finite field elements, even though both types of polynomials
use the same indeterminate, x. The coefficients in this section are themselves
finite field elements, i.e., bytes, instead of bits; also, the multiplication of
four-term polynomials uses a different reduction polynomial, defined below. The
distinction should always be clear from the context. To illustrate the addition
and multiplication operations, let

    $b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$    (3.11)

define a second four-term polynomial. Addition is performed by adding the finite
field coefficients of like powers of x. This addition corresponds to an XOR
operation between the corresponding bytes in each of the words; in other words,
the XOR of the complete word values.

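Thus, using the equations of (3.10) and (3.11),

    $a(x) + b(x) = (a_3 \oplus b_3)x^3 + (a_2 \oplus b_2)x^2 + (a_1 \oplus b_1)x + (a_0 \oplus b_0)$    (3.12)

Multiplication is achieved in two steps. In the first step, the polynomial
product $c(x) = a(x) \bullet b(x)$ is algebraically expanded, and like powers are
collected to give

    $c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0$    (3.13)

where

    $c_0 = a_0 \bullet b_0$
    $c_1 = a_1 \bullet b_0 \oplus a_0 \bullet b_1$
    $c_2 = a_2 \bullet b_0 \oplus a_1 \bullet b_1 \oplus a_0 \bullet b_2$
    $c_3 = a_3 \bullet b_0 \oplus a_2 \bullet b_1 \oplus a_1 \bullet b_2 \oplus a_0 \bullet b_3$
    $c_4 = a_3 \bullet b_1 \oplus a_2 \bullet b_2 \oplus a_1 \bullet b_3$
    $c_5 = a_3 \bullet b_2 \oplus a_2 \bullet b_3$
    $c_6 = a_3 \bullet b_3$

The result, c(x), does not represent a four-byte word. Therefore, the second step
of the multiplication is to reduce c(x) modulo a polynomial of degree 4; the
result can be reduced to a polynomial of degree less than 4. For the AES
algorithm, this is accomplished with the polynomial $x^4 + 1$, so that

    $x^i \bmod (x^4 + 1) = x^{i \bmod 4}$    (3.14)

The modular product of a(x) and b(x), denoted by $a(x) \otimes b(x)$, is given by
the four-term polynomial d(x), defined as follows:

    $d(x) = d_3x^3 + d_2x^2 + d_1x + d_0$    (3.15)

with

    $d_0 = (a_0 \bullet b_0) \oplus (a_3 \bullet b_1) \oplus (a_2 \bullet b_2) \oplus (a_1 \bullet b_3)$
    $d_1 = (a_1 \bullet b_0) \oplus (a_0 \bullet b_1) \oplus (a_3 \bullet b_2) \oplus (a_2 \bullet b_3)$
    $d_2 = (a_2 \bullet b_0) \oplus (a_1 \bullet b_1) \oplus (a_0 \bullet b_2) \oplus (a_3 \bullet b_3)$
    $d_3 = (a_3 \bullet b_0) \oplus (a_2 \bullet b_1) \oplus (a_1 \bullet b_2) \oplus (a_0 \bullet b_3)$    (3.16)

When a(x) is a fixed polynomial, the operation defined in equation (3.16) can be
written in matrix form as

    $\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} = \begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}$    (3.17)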


Because $x^4 + 1$ is not an irreducible polynomial over $GF(2^8)$, multiplication
by a fixed four-term polynomial is not necessarily invertible. However, the AES
algorithm specifies a fixed four-term polynomial that does have an inverse:

    $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$    (3.18)

    $a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$.    (3.19)

Another polynomial used in the AES algorithm (see the RotWord() function) has
$a_0 = a_1 = a_2 = \{00\}$ and $a_3 = \{01\}$, which is the polynomial $x^3$.
Inspection of equation (3.17) above will show that its effect is to form the
output word by rotating bytes in the input word. This means that
$[b_0, b_1, b_2, b_3]$ is transformed into $[b_1, b_2, b_3, b_0]$.
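The modular product of four-term polynomials, the inverse pair of equations (3.18) and (3.19), and the rotation effect of the polynomial $x^3$ can all be checked with a short Python sketch. It is illustrative only; the names gf_mul and poly4_mul are assumptions, and words are written as lists [a0, a1, a2, a3].

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def poly4_mul(a, b):
    """Product of two four-term polynomials modulo x^4 + 1.

    The term a[i] * b[j] has power i + j, which reduces to power (i + j) mod 4.
    """
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d


A = [0x02, 0x01, 0x01, 0x03]      # a(x) of equation (3.18), as [a0, a1, a2, a3]
A_INV = [0x0E, 0x09, 0x0D, 0x0B]  # inverse of a(x), equation (3.19)
X3 = [0x00, 0x00, 0x00, 0x01]     # the polynomial x^3
```

Multiplying A by A_INV gives the word [01, 00, 00, 00], i.e. the constant polynomial 1, and multiplying any word by X3 rotates its bytes as described above.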


CHAPTER 4

Advanced Encryption Standard (AES)

In cryptography, the Advanced Encryption Standard (AES) is an encryption

standard adopted by the U.S. government. The standard comprises three block ciphers,

AES-128, AES-192 and AES-256, adopted from a larger collection originally

published as Rijndael. Each AES cipher has a 128-bit block size, with key sizes of

128, 192 and 256 bits, respectively. The AES ciphers have been analyzed extensively

and are now used worldwide, as was the case with its predecessor, the Data Encryption

Standard (DES).

AES is based on a design principle known as a Substitution permutation

network. It is fast in both software and hardware.Unlike its predecessor, DES, AES

does not use a Feistel network. AES has a fixed block size of 128 bits and a key size of

128, 192, or 256 bits, whereas Rijndael can be specified with block and key sizes in

any multiple of 32 bits, with a minimum of 128 bits and a maximum of 256 bits. AES

operates on a 4×4 array of bytes, termed the state (versions of Rijndael with a larger

block size have additional columns in the state). Most AES calculations are done in a

special finite field. The AES cipher is specified as a number of repetitions of

transformation rounds that convert the input plaintext into the final output of

ciphertext. Each round consists of several processing steps, including one that depends

on the encryption key. A set of reverse rounds is applied to transform ciphertext back

into the original plaintext using the same encryption key.

AES was announced by the National Institute of Standards and Technology

(NIST) as U.S. FIPS PUB 197 (FIPS 197) on November 26, 2001 after a 5-year

standardization process in which fifteen competing designs were presented and

evaluated before Rijndael was selected as the most suitable. It became effective as a

Federal government standard on May 26, 2002 after approval by the Secretary of

Commerce. It is available in many different encryption packages. AES is the first

publicly accessible and open cipher approved by the NSA for top secret information.

The Rijndael cipher was developed by two Belgian cryptographers, Joan

Daemen and Vincent Rijmen, and submitted by them to the AES selection process.

Rijndael (pronounced [rɛindaːl]) is a portmanteau of the names of the two inventors.

For the AES algorithm, the length of the input block, the output block and the State is

128 bits. This is represented by Nb = 4, which reflects the number of 32-bit words

(number of columns) in the State.

Figure 4.1 AES encryption structure

For the AES algorithm, the length of the Cipher Key, K, is 128, 192, or 256

bits. The key length is represented by Nk = 4, 6, or 8, which reflects the number of 32-

bit words (number of columns) in the Cipher Key. For the AES algorithm, the number

of rounds to be performed during the execution of the algorithm is dependent on the

key size. The number of rounds is represented by Nr, where Nr = 10 when Nk = 4, Nr

= 12 when Nk = 6, and Nr = 14 when Nk = 8.

Figure 4.2. Key-Block-Round Combinations.

For both its Cipher and Inverse Cipher, the AES algorithm uses a round function that

is composed of four different byte-oriented transformations:

1) byte substitution using a substitution table (S-box),

2) shifting rows of the State array by different offsets,

3) mixing the data within each column of the State array, and

4) adding a Round Key to the State.

4.1 Cipher

At the start of the Cipher, the input is copied to the State array. After an initial

Round Key addition, the State array is transformed by implementing a round function

10, 12, or 14 times (depending on the key length), with the final round differing

slightly from the first Nr - 1 rounds. The final State is then copied to the output. The

round function is parameterized using a key schedule that consists of a one-

dimensional array of four-byte words derived using the Key Expansion routine.
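The overall Cipher structure can be sketched in software. The Python model below is a hedged illustration for a 128-bit key only (Nk = 4, Nr = 10); it is not the VHDL design of this report. All function names are assumptions, the state is a flat list of 16 bytes in column order, and the S-box is computed by brute force rather than stored as a lookup table. The loop mirrors the iterative architecture: one round function is reused ten times and MixColumns is skipped in the last round.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
    return result


def build_sbox():
    """S-box: multiplicative inverse in GF(2^8), then the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = 0
        for i in range(8):
            bit = (inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
            bit ^= (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)
            s |= (bit & 1) << i
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Key Expansion for a 16-byte key: returns 44 four-byte words."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        w.append([a ^ b for a, b in zip(w[i - 4], temp)])
    return w


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # s[r + 4*c] holds row r, column c; row r is rotated left by r positions.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(2, a[r]) ^ gf_mul(3, a[(r + 1) % 4])
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def add_round_key(s, w, rnd):
    key = [b for word in w[4 * rnd:4 * rnd + 4] for b in word]
    return [x ^ k for x, k in zip(s, key)]


def encrypt_block(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key."""
    w = expand_key(key)
    state = add_round_key(list(plaintext), w, 0)   # initial Round Key addition
    for rnd in range(1, 11):
        state = shift_rows(sub_bytes(state))
        if rnd < 10:                               # final round omits MixColumns
            state = mix_columns(state)
        state = add_round_key(state, w, rnd)
    return bytes(state)


if __name__ == "__main__":
    pt = bytes.fromhex("00112233445566778899aabbccddeeff")
    print(encrypt_block(pt, bytes(range(16))).hex())
```

The output of such a model can be compared against the example vectors published in FIPS 197 when validating a hardware simulation.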

4.1.1 SubBytes() Transformation

In the SubBytes step, each byte $a_{ij}$ in the state is replaced with its entry in a fixed 8-bit lookup table $S$, the Rijndael S-box, so that $b_{ij} = S(a_{ij})$. This operation provides the non-linearity in the cipher. The S-box is derived from the multiplicative inverse over $GF(2^8)$, which is known to have good non-linearity properties. To avoid attacks based on simple algebraic properties, the S-box is constructed by combining the inverse function with an invertible affine transformation. The S-box is also chosen to avoid any fixed points, where $S(a) = a$ (and so is a derangement), and any opposite fixed points, where $S(a)$ equals the bitwise complement of $a$.

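The SubBytes() transformation is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box). This S-box (Fig. 4.4), which is invertible, is constructed by composing two transformations:

1. Take the multiplicative inverse in the finite field $GF(2^8)$; the element {00} is mapped to itself.

2. Apply the following affine transformation (over $GF(2)$):

$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$ (4.1)

for $0 \le i < 8$, where $b_i$ is the ith bit of the byte, and $c_i$ is the ith bit of a byte c with the value {63} or {01100011}. Here and elsewhere, a prime on a variable (e.g., $b'$) indicates that the variable is to be updated with the value on the right. In matrix form, the affine transformation element of the S-box can be expressed as

$\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix} \oplus \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$ (4.2)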

Figure 4.3 SubBytes() applies the S-box to each byte of the State.

The S-box used in the SubBytes() transformation is presented in hexadecimal form in Fig. 4.4. For example, if $s_{1,1} = \{53\}$, then the substitution value would be determined by the intersection of the row with index '5' and the column with index '3' in Fig. 4.4. This would result in $s'_{1,1}$ having a value of {ed}.

Figure 4.4. S-box substitution values for the byte xy (in hexadecimal format).
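The two-step construction of the S-box can be reproduced in a few lines. The Python sketch below is an illustration under stated assumptions: the names `gf_inverse` and `affine` are invented here, and the inverse is found by exhaustive search, which is adequate for a 256-entry table but is not how a hardware S-box would be built.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
    return result


def gf_inverse(x):
    """Step 1: multiplicative inverse in GF(2^8); {00} is mapped to itself."""
    if x == 0:
        return 0
    return next(y for y in range(1, 256) if gf_mul(x, y) == 1)


def affine(b):
    """Step 2: affine transformation of equation (4.1) with c = {63}."""
    c = 0x63
    out = 0
    for i in range(8):
        bit = (b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
        bit ^= (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)
        out |= (bit & 1) << i
    return out


SBOX = [affine(gf_inverse(x)) for x in range(256)]

if __name__ == "__main__":
    # Row '5', column '3' of the table, as in the example above.
    print(format(SBOX[0x53], "02x"))
```

A table generated this way can be used to cross-check the constants of a lookup-table implementation entry by entry.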

4.1.2 ShiftRows() Transformation

In the ShiftRows step, the bytes in each row of the state are shifted cyclically to the left by an offset that depends on the row. For AES, the first row is left unchanged, each byte of the second row is shifted one position to the left, and the third and fourth rows are shifted by two and three positions respectively. The shifting pattern is the same for block sizes of 128 bits and 192 bits. In this way, each column of the output state of the ShiftRows step is composed of bytes from each column of the input state. Rijndael variants with a larger block size have slightly different offsets: for a 256-bit block, the first row is unchanged and the second, third and fourth rows are shifted by 1, 3 and 4 bytes respectively. This change applies only to the Rijndael cipher used with a 256-bit block, as AES does not use 256-bit blocks.

In the ShiftRows() transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes (offsets). The first row, r = 0, is not shifted. Specifically, the ShiftRows() transformation proceeds as follows:

$s'_{r,c} = s_{r,(c + shift(r,Nb)) \bmod Nb}$ for $0 < r < 4$ and $0 \le c < Nb$, (4.3)

where the shift value shift(r, Nb) depends on the row number, r, as follows (recall that Nb = 4):

shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3. (4.4)

This has the effect of moving bytes to “lower” positions in the row (i.e., lower

values of c in a given row), while the “lowest” bytes wrap around into the “top” of the

row (i.e., higher values of c in a given row).

Figure 4.5 ShiftRows() cyclically shifts the last three rows in the State.
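Equations (4.3) and (4.4) translate directly into an index computation. In the Python sketch below the state is assumed to be stored as four rows of four bytes, `state[r][c]`; the function name is illustrative only.

```python
def shift_rows(state):
    """Apply s'[r][c] = s[r][(c + r) mod 4] to a 4x4 state given as rows."""
    return [[row[(c + r) % 4] for c in range(4)] for r, row in enumerate(state)]


if __name__ == "__main__":
    state = [
        [0x00, 0x01, 0x02, 0x03],   # row 0: not shifted
        [0x10, 0x11, 0x12, 0x13],   # row 1: shifted left by 1
        [0x20, 0x21, 0x22, 0x23],   # row 2: shifted left by 2
        [0x30, 0x31, 0x32, 0x33],   # row 3: shifted left by 3
    ]
    for row in shift_rows(state):
        print([format(b, "02x") for b in row])
```

In hardware this step needs no logic at all: it is a fixed rewiring of the 128 state bits.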

4.1.3 MixColumns() Transformation

In the MixColumns step, the four bytes of each column of the state are combined using an invertible linear transformation. The MixColumns function takes four bytes as input and outputs four bytes, where each input byte affects all four output bytes. Together with ShiftRows, MixColumns provides diffusion in the cipher. The step can also be viewed as a multiplication by a particular MDS matrix over the finite field $GF(2^8)$.

The MixColumns() transformation operates on the State column-by-column, treating each column as a four-term polynomial. The columns are considered as polynomials over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with a fixed polynomial $a(x)$, whose coefficients are bytes written in hexadecimal notation, given by

$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$. (4.5)

As described in Chapter 3 (equation (3.17)), this can be written as a matrix multiplication. Let $s'(x) = a(x) \otimes s(x)$:

$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$ for $0 \le c < Nb$. (4.6)

As a result of this multiplication, the four bytes in a column are replaced by the following:

$s'_{0,c} = (\{02\} \bullet s_{0,c}) \oplus (\{03\} \bullet s_{1,c}) \oplus s_{2,c} \oplus s_{3,c}$
$s'_{1,c} = s_{0,c} \oplus (\{02\} \bullet s_{1,c}) \oplus (\{03\} \bullet s_{2,c}) \oplus s_{3,c}$
$s'_{2,c} = s_{0,c} \oplus s_{1,c} \oplus (\{02\} \bullet s_{2,c}) \oplus (\{03\} \bullet s_{3,c})$
$s'_{3,c} = (\{03\} \bullet s_{0,c}) \oplus s_{1,c} \oplus s_{2,c} \oplus (\{02\} \bullet s_{3,c})$

Figure 4.6 MixColumns() operates on the State column-by-column.
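Since the only multipliers in MixColumns are {01}, {02} and {03}, the whole step reduces to shifts and XORs. The Python sketch below illustrates this for a single column; `xtime` and `mix_column` are assumed names, and multiplication by {03} is written as `xtime(s) ^ s`.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8): shift left, reduce on overflow."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def mix_column(col):
    """Transform one column [s0, s1, s2, s3] of the State."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ (xtime(s1) ^ s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ (xtime(s2) ^ s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ (xtime(s3) ^ s3),
        (xtime(s0) ^ s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


if __name__ == "__main__":
    print([format(b, "02x") for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

The same decomposition is what makes the transformation cheap in FPGA logic: each output byte is a small XOR network over the input bits.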

4.1.4 AddRoundKey() Transformation

In the AddRoundKey step, each byte of the state is combined with the corresponding byte of the round subkey using the bitwise XOR operation (⊕). For each round, a subkey is derived from the main key using Rijndael's key schedule; each subkey is the same size as the state.

In the AddRoundKey() transformation, a Round Key is added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb words from the key schedule. Those Nb words are each added into the columns of the State, such that

$[s'_{0,c}, s'_{1,c}, s'_{2,c}, s'_{3,c}] = [s_{0,c}, s_{1,c}, s_{2,c}, s_{3,c}] \oplus [w_{round \cdot Nb + c}]$ for $0 \le c < Nb$, (4.7)

where $[w_i]$ are the key schedule words, and round is a value in the range $0 \le round \le Nr$. In the Cipher, the initial Round Key addition occurs when round = 0, prior to the first application of the round function (see Fig. 4.1). The application of the AddRoundKey() transformation to the Nr rounds of the Cipher occurs when $1 \le round \le Nr$.

Figure 4.7 AddRoundKey() XORs each column of the State with a word from the key schedule.
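A short Python sketch of the column-wise XOR follows. The state and the round key are both assumed to be held as four columns of four bytes; the helper `columns` is a convenience invented for this example.

```python
def columns(hex_string):
    """Split 16 bytes given in hex into four 4-byte columns (words)."""
    data = bytes.fromhex(hex_string)
    return [list(data[4 * c:4 * c + 4]) for c in range(4)]


def add_round_key(state, words):
    """XOR column c of the state with key schedule word c."""
    return [[state[c][r] ^ words[c][r] for r in range(4)] for c in range(4)]


if __name__ == "__main__":
    state = columns("3243f6a8885a308d313198a2e0370734")
    round_key = columns("2b7e151628aed2a6abf7158809cf4f3c")
    result = add_round_key(state, round_key)
    print("".join(format(b, "02x") for col in result for b in col))
```

Because XOR is its own inverse, applying the same Round Key a second time restores the original state, which is why the Inverse Cipher can reuse this transformation unchanged.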

4.2 Key Expansion

The AES algorithm takes the Cipher Key, K, and performs a Key Expansion routine to generate a key schedule. The Key Expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted $[w_i]$, with i in the range $0 \le i < Nb(Nr + 1)$.

SubWord() is a function that takes a four-byte input word and applies the S-box to each of the four bytes to produce an output word. The function RotWord() takes a word $[a_0, a_1, a_2, a_3]$ as input, performs a cyclic permutation, and returns the word $[a_1, a_2, a_3, a_0]$. The round constant word array, Rcon[i], contains the values given by $[x^{i-1}, \{00\}, \{00\}, \{00\}]$, with $x^{i-1}$ being powers of x (x is denoted as {02}) in the field $GF(2^8)$ (note that i starts at 1, not 0).

The first Nk words of the expanded key are filled with the Cipher Key. Every following word, w[i], is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. For words in positions that are a multiple of Nk, a transformation is applied to w[i-1] prior to the XOR, followed by an XOR with a round constant, Rcon[i/Nk]. This transformation consists of a cyclic shift of the bytes in a word (RotWord()), followed by the application of a table lookup to all four bytes of the word (SubWord()). It is important to note that the Key Expansion routine for 256-bit Cipher Keys (Nk = 8) is slightly different than for 128- and 192-bit Cipher Keys. If Nk = 8 and i-4 is a multiple of Nk, then SubWord() is applied to w[i-1] prior to the XOR.
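The Key Expansion routine described above can be sketched as follows. This Python model is an illustration, not the `key_schedule` component of this report: it expands the whole schedule at once, whereas an iterative hardware design would typically produce one Round Key per clock cycle. The function names are assumptions, and the S-box is recomputed by brute force to keep the example self-contained.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a = ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
    return result


def build_sbox():
    """S-box: multiplicative inverse in GF(2^8), then the affine transformation."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = 0
        for i in range(8):
            bit = (inv >> i) ^ (inv >> ((i + 4) % 8)) ^ (inv >> ((i + 5) % 8))
            bit ^= (inv >> ((i + 6) % 8)) ^ (inv >> ((i + 7) % 8)) ^ (0x63 >> i)
            s |= (bit & 1) << i
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """Return the key schedule as a list of Nb*(Nr+1) four-byte words."""
    nk = len(key) // 4            # 4, 6 or 8 words
    nr = nk + 6                   # 10, 12 or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                   # x^(i-1), starting at i = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon                                # XOR with Rcon[i/Nk]
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]                 # extra SubWord for Nk = 8
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w


if __name__ == "__main__":
    schedule = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
    print(len(schedule), bytes(schedule[4]).hex())
```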

4.3 Implementation Issues

Key Length Requirements

An implementation of the AES algorithm shall support at least one of the three

key lengths 128, 192, or 256 bits (i.e., Nk = 4, 6, or 8, respectively). Implementations

may optionally support two or three key lengths, which may promote the

interoperability of algorithm implementations.

Keying Restrictions

No weak or semi-weak keys have been identified for the AES algorithm, and

there is no restriction on key selection.

Parameterization of Key Length, Block Size, and Round Number

This standard explicitly defines the allowed values for the key length (Nk),

block size (Nb), and number of rounds (Nr). However, future reaffirmations of this

standard could include changes or additions to the allowed values for those

parameters. Therefore, implementers may choose to design their AES implementations

with future flexibility in mind.

Implementation Suggestions Regarding Various Platforms

Implementation variations are possible that may, in many cases, offer

performance or other advantages. Given the same input key and data (plaintext or

ciphertext), any implementation that produces the same output (ciphertext or plaintext)

as the algorithm specified in this standard is an acceptable implementation of the AES.

CHAPTER-5

Xilinx FPGA Programming

5.1. Introduction

Xilinx leads one of the fastest growing segments of the semiconductor industry

– programmable logic devices.

Xilinx FPGAs

The Xilinx Virtex-II Pro series redefined programmable logic by expanding the traditional capabilities of field programmable gate arrays (FPGAs) with new levels of integration and features that address high-performance system design issues. In a single, off-the-shelf programmable device, systems architects can take advantage of microprocessors, high-density on-chip memory, multi-gigabit serial transceivers, digital clock managers, on-chip termination and more. As a result, these FPGAs help designers to simplify board layout, reduce the bill of materials, and get products to market faster.

Virtex-II Pro FPGAs are available with up to four immersed IBM PowerPC 405 processors and up to 16 high-speed transceivers that operate at 3.125 gigabits per second. Xilinx RocketIO transceivers offer a complete serial interface solution, supporting 10 Gigabit Ethernet with XAUI, PCI Express and Serial ATA. Each IBM PowerPC runs at 300-plus MHz, delivering 450 Dhrystone MIPS, and is supported by IBM CoreConnect bus technology. With these devices, systems designers can partition and repartition their systems between hardware and software at any time during the development cycle, even after the product has shipped, and debug hardware and software simultaneously at speed. The devices range in density from 3,168 to 50,832 logic cells.

The Spartan-3E family, by contrast, is a low-cost FPGA family without embedded PowerPC processors or multi-gigabit transceivers; it provides configurable logic blocks, block RAM, dedicated multipliers and digital clock managers.

5.2. Overview of ISE and Synthesis Tools

Overview of ISE

ISE controls all aspects of the design flow. Through the Project Navigator

interface, you can access all of the various design entry and design implementation

tools. You can also access the files and documents associated with your project.

Project Navigator maintains a flat directory structure; therefore, you can maintain

revision control through the use of snapshots.

Project Navigator Interface

The Project Navigator Interface is divided into four main sub windows. On the

top left is the Sources in Project window which hierarchically displays the elements

included in the project. Beneath the Sources in Project window is the Processes for

Current Source window which displays available processes. The third window at the

bottom of the Project Navigator is the Console window which displays status

messages, errors, and warnings, and which is updated during all project actions. The

fourth window to the right is a multi-document Interface (MDI) window for viewing

ASCII text files and HDL Bencher™ Waveforms. Each window may be resized,

undocked from Project Navigator or moved to a new location within the main Project

Navigator window. Selecting View → Restore Default Layout always restores the

default layout. These windows are discussed in more detail in the following sections.

Figure 5-1 Project Navigator

Sources in Project Window

This window consists of three tabs, which provide information for the user.

Each tab is discussed in further detail below.

Module View

The Module View tab displays the project name, any user documents, the

specified part type and design flow/synthesis tool, and design source files. Each file in

the Module View has an associated icon. The icon indicates the file type (HDL file,

schematic, core, or text file, for example). For a complete list of possible source types

and their associated icons, see the Project Navigator online help. Select Help → ISE Help

Contents, select the Index tab and click Source / file types. If a file contains lower

levels of hierarchy, the icon has a + to the left of the name. HDL files have this + to

show the entities (VHDL) or modules (Verilog) within the file. You can expand the

hierarchy by clicking the +. You can open a file for editing by double-clicking on the

filename.

Snapshot View

The Snapshot View tab displays all snapshots associated with the project

currently open in Project Navigator. A snapshot is a copy of the project including all

files in the working directory, and synthesis and simulation subdirectories. A snapshot

is stored with the project for which it is taken, and can be viewed in the Snapshot View.

You can view the reports, user documents, and source files for all snapshots. All

information displayed in the Snapshot View is read-only. Using snapshots provides an

excellent version control system, enabling sub teams to do simultaneous development

on the same design.

Note: Remote sources are not copied with the snapshot. A reference is maintained in

the snapshot.

Library View

The Library View tab displays all libraries associated with the project open in

Project Navigator.

Processes for Current Source Window

This window contains the Process View tab.

Process View

The Process View tab is context sensitive and changes based upon the source

type selected in the Sources in Project window. From the Process View tab, you can

run the functions necessary to define, run and view your design. The Process Window

provides access to the following functions:

Design Entry Utilities

Provides access to symbol generation, instantiation templates, HDL Converter,

View Command Line Log File, Launch MTI, and simulation library compilation.

User Constraints

Provides access to editing location and timing constraints.

Synthesis

Provides access to Check Syntax, synthesis, View RTL Schematic, and

synthesis reports. This varies depending on the synthesis tools you use.

Implement Design

Provides access to implementation tools, design flow reports, and point tools.

Generate Programming File

Provides access to the configuration tools and bit stream generation. The

Processes for Current Source window incorporates automake technology. This

enables the user to select any process in the flow and the software automatically runs

the processes necessary to get to the desired step. For example, when you run the

Implementation process, Project Navigator also runs the synthesis process because

implementation is dependent on up-to-date synthesis results.

Note: To view a running log of command line arguments in the Console window,

expand Design Entry Utilities and select View Command Line Log File.

Console Window

The Console window displays errors, warnings, and informational messages.

Errors are signified by a red box next to the message, while warnings have a yellow

box. Warning and Error messages may also be viewed separately from other console

text messages by selecting either the Warnings or Errors tab at the bottom of the

console window.

Error Navigation to Source

You can navigate from a synthesis error or warning message in the Console

window to the location of the error in a source HDL file. To do so, select the error or

warning message, right-click the mouse, and from the menu select Go to Source. The

HDL source file opens and the cursor moves to the line with the error.

Error Navigation to Solution Record

You can navigate from an error or warning message in the Console window to

the relevant solution records on the support.xilinx.com website. These types of errors or

warnings can be identified by the web icon to the left of the error. To navigate to the

solution record, select the error or warning message, right-click the mouse, and from

the menu select Go to Solution Record. The default web browser opens and displays all

solution records applicable to this message.

Synthesizing the Design

So far you have used XST for verifying syntax. Next, you will synthesize the

design. The synthesis tool uses the design’s HDL code and generates a supported

netlist type (EDIF or NGC for the Xilinx® implementation tools). The synthesis tools

perform three general steps (although all synthesis tools further break down these

general steps) to create the netlist:

Analyze / Check Syntax

Checks the syntax of the source code.

Compile

Translates and optimizes the HDL code into a set of components that the

synthesis tool can recognize.

Map

Translates the components from the compile stage into the target technology’s primitive components.

The RTL Viewer

XST can generate a schematic representation of the HDL code that you have

entered. A schematic view of the code is helpful for analyzing your design to see a

graphical connection between the various components that XST has inferred. To view

a schematic representation of your RTL code:

1. In Project Navigator, click + next to Synthesize to expand the process

hierarchy.

2. Double-click View RTL Schematic.

Fig.5.2 RTL Viewer

Entering Synthesis Options through ISE

Synthesis options enable you to modify the behavior of the synthesis tool to

optimize according to the needs of the design. One option is to control synthesis by

optimizing based on area or speed. Other options include controlling the maximum fan

out of a signal from a flip-flop or setting the desired frequency of the design.

For this tutorial, set the global synthesis options:

1. Select stopwatch.vhd (or stopwatch.v).

2. Right-click the Synthesis process.

3. From the menu, select Properties.

4. Click the Synthesis Options tab, and set the Default Frequency to 50MHz.

5. Click the Netlist Options tab, and ensure that the Do Not Write NCF box is

unchecked.

6. Click the Constraint File Options tab, and select the stopwatch.ctr file created

in LeonardoSpectrum, in the “Modifying Constraints” section above.

7. Click OK to accept these values.

8. Select stopwatch.vhd (or stopwatch.v) and double-click the Synthesize process

in the Processes for Source window.

The RTL/Technology Viewer

LeonardoSpectrum can generate a schematic representation of the HDL code

that you have entered. A schematic view of the code is helpful for analyzing your

design to see a graphical connection between the various components that

LeonardoSpectrum has inferred.

To launch the design in LeonardoSpectrum’s RTL viewer, double-click the

View RTL Schematic process. The following figure displays the design in an RTL

view.

Overview of Behavioral Simulation Flow

Behavioral simulation is done before the design is synthesized to verify that the

logic you have created is correct. This allows a designer to find and fix any bugs in the

design before spending time with Synthesis or Implementation. Xilinx® ISE provides

an integrated flow with the ModelTech ModelSim simulator that allows simulations to

be run from the Xilinx Project Navigator graphical user interface (GUI). The examples

in this tutorial show how to use this integrated flow. For additional information about

simulation and for a list of the other supported simulators, refer to Chapter 6 of

Synthesis and Verification Guide. This Guide is available with the collection of

software manuals and is accessible from ISE by selecting Help → Online Documentation,

or from the web at http://support.xilinx.com/support/sw_manuals/xilinx6/.

5.3 ChipScope ICON/VIO/ILA

The Xilinx ChipScope tools package has several modules that you can add to

your Verilog design to capture input and output directly from the FPGA hardware.

These are:

• ICON (Integrated CONtroller): A controller module that provides communication between the ChipScope host PC and ChipScope modules in the design (such as VIO and ILA).

• VIO (Virtual Input/Output): A module that can monitor and drive signals in your design in real time. You can think of them as virtual push-buttons (for input) and LEDs (for output). These can be used for debugging purposes, or they can be incorporated into your design as a permanent I/O interface.

• ILA (Integrated Logic Analyzer): A module that lets you view and trigger on signals in your hardware design. Think of it as a digital oscilloscope (like ModelSim’s waveform viewer) that you can place in your design to aid in debugging.

These ChipScope modules are extremely useful because they allow you to

view and manipulate signals directly from hardware during run-time. Since they are

real Verilog modules and netlists, they get incorporated, synthesized, and implemented

into your design just like any other Verilog code you would write. Whether you know

it or not, you’ve been using ChipScope modules in your designs for the past few

weeks. Take a look at the top-level modules for all the previous labs we’ve finished—

they all contain declarations and instantiations for ICON, VIO, and/or ILA modules.

After working through this tutorial, you’ll know how to add these modules to your

design by yourself.

Figure 5.3. ChipScope Organization Details

Take a look at the ChipScope organization diagram above. To use ChipScope

modules in your design, you must always generate and instantiate an ICON controller

module. The ICON controller module communicates with the host PC and sends

commands to other ChipScope modules via a control port. Your ICON controller

module must be generated with the same number of control ports as there are other

ChipScope modules in your design. For example, if you want to add an ILA module

and a VIO module to your design, generate an ICON module with two control ports.

Once you’ve added the ICON module to your design, you can add as many ChipScope

modules as you have control ports. VIO and ILA modules take the ICON control port

as an input and then interact with the modules in your design through

sync_in/sync_out and trigger ports respectively.

Incorporating ChipScope Modules into Your Design

Now that you’ve determined that you need ChipScope modules in your design,

whether for debugging or as a permanent I/O interface, it’s simple to add them to your

design. You follow a four-step process:

1. Generate the ChipScope modules, using the ChipScope Core Generator.

2. Incorporate and instantiate the ChipScope modules into the top-level

module in your design.

3. Connect the ChipScope modules to your design.

4. Synthesize, implement, and run the design on the FPGA.

5.4 FPGA Programming

Implementing a logic design with an FPGA usually consists of the following

steps (depicted in the figure which follows):

1. You enter a description of your logic circuit using a hardware description

language (HDL) such as VHDL or Verilog. You can also draw your design

using a schematic editor.

2. You use a logic synthesizer program to transform the HDL or schematic into a

netlist. The netlist is just a description of the various logic gates in your design

and how they are interconnected.

3. You use the implementation tools to map the logic gates and interconnections

into the FPGA. The FPGA consists of many configurable logic blocks, which

can be further decomposed into look-up tables that perform logic operations.

The CLBs and LUTs are interwoven with various routing resources. The

mapping tool collects your netlist gates into groups that fit into the LUTs and

then the place & route tool assigns the groups to specific CLBs while opening

or closing the switches in the routing matrices to connect them together.

4. Once the implementation phase is complete, a program extracts the state of the

switches in the routing matrices and generates a bitstream where the ones and

zeroes correspond to open or closed switches.

5. The bitstream is downloaded into a physical FPGA chip. The electronic

switches in the FPGA open or close in response to the binary bits in the

bitstream. Upon completion of the downloading, the FPGA will perform the

operations specified by your HDL code or schematic.

That's really all there is to it. XILINX ISE provides the HDL and schematic

editors, logic synthesizer, fitter, and bitstream generator software. The XSTOOLs

from XESS provide utilities for downloading the bitstream into the FPGA on the XSA

Board.

Fig 5.4. Dumping Procedure

CHAPTER-6

SOURCE CODE OF AES

AES CODE

-- AES Encryption System
-- The top level of the Rijndael algorithm
-- Each round finished in one clock cycle

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_arith.ALL;
--USE work.ALL;

ENTITY aes IS
PORT(
    plaintext  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    user_key   : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    ciphertext : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
    encrypt    : IN  STD_LOGIC;
    clk        : IN  STD_LOGIC;
    reset      : IN  STD_LOGIC
);
END aes;

ARCHITECTURE beh OF aes IS
--component declarations
COMPONENT round
PORT(
    e_in         : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    key          : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    last_mux_sel : IN  STD_LOGIC;
    d_out        : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END COMPONENT;
COMPONENT key_schedule
PORT(
    clk             : IN  STD_LOGIC;
    reset           : IN  STD_LOGIC;
    key_in          : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    key_out         : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
    key_reg_mux_sel : IN  STD_LOGIC;
    round_constant  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    load_key_reg    : IN  STD_LOGIC
);
END COMPONENT;
COMPONENT control
PORT(
    reset            : IN  STD_LOGIC;
    clk              : IN  STD_LOGIC;
    encrypt          : IN  STD_LOGIC;
    data_reg_mux_sel : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
    load_data_reg    : OUT STD_LOGIC;
    key_reg_mux_sel  : OUT STD_LOGIC;
    round_const      : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
    last_mux_sel     : OUT STD_LOGIC;
    load_key_reg     : OUT STD_LOGIC
);
END COMPONENT;
--internal signal declarations
SIGNAL data_reg_in, data_reg_out, round0_out, round1_10_out, key : STD_LOGIC_VECTOR(127 DOWNTO 0);
SIGNAL key_reg_mux_sel : STD_LOGIC;
SIGNAL round_constant : STD_LOGIC_VECTOR(7 DOWNTO 0);
SIGNAL data_reg_mux_sel : STD_LOGIC_VECTOR(1 DOWNTO 0);
SIGNAL load_data_reg, load_key_reg, last_mux_sel : STD_LOGIC;
--------------------------------------------
BEGIN
--mux to the register input
WITH data_reg_mux_sel SELECT
    data_reg_in <= round0_out    WHEN "00",
                   round1_10_out WHEN "01",
                   plaintext     WHEN OTHERS;

--initial round key addition (round 0)
round0_out <= data_reg_out XOR key;

--rounds 1 to 10, where the same hardware gets reused
layers: round
PORT MAP(
    e_in         => data_reg_out,
    key          => key,
    last_mux_sel => last_mux_sel,
    d_out        => round1_10_out
);

--register to store the value after each round
data_register: PROCESS(clk, reset, load_data_reg, data_reg_in)
BEGIN
    IF(reset='1') THEN
        --clear all 128 bits of the data register
        data_reg_out <= (OTHERS => '0');
    ELSIF(clk'event AND clk='1') THEN
        IF(load_data_reg='1') THEN
            data_reg_out <= data_reg_in;
        END IF;
    END IF;
END PROCESS data_register;

--key generator for each round
key_generator: key_schedule
PORT MAP(
    clk             => clk,
    reset           => reset,
    key_in          => user_key,
    key_out         => key,
    key_reg_mux_sel => key_reg_mux_sel,
    round_constant  => round_constant,
    load_key_reg    => load_key_reg
);

--system control
contrl: control
PORT MAP(
    reset            => reset,
    clk              => clk,
    encrypt          => encrypt,
    data_reg_mux_sel => data_reg_mux_sel,
    load_data_reg    => load_data_reg,
    key_reg_mux_sel  => key_reg_mux_sel,
    round_const      => round_constant,
    last_mux_sel     => last_mux_sel,
    load_key_reg     => load_key_reg
);

--encryption output
ciphertext <= data_reg_out;

END beh;

ROUND KEY CODE

-- One round of the Rijndael algorithm

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
--USE work.ALL;

ENTITY round IS
PORT(
    e_in         : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    key          : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    last_mux_sel : IN  STD_LOGIC;
    d_out        : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END round;

ARCHITECTURE beh OF round IS
--component declarations
COMPONENT s_box
PORT(
    s_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    s_out : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END COMPONENT;
COMPONENT shift_row
PORT(
    shiftrow_in  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    shiftrow_out : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END COMPONENT;
COMPONENT mix_column
PORT(
    mixcolumn_in  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    mixcolumn_out : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END COMPONENT;
--internal signal declarations
SIGNAL bytesub_out, shiftrow_out, mixcolumn_out, mux_out : STD_LOGIC_VECTOR(127 DOWNTO 0);

--description of a Rijndael round
-- PLAINTEXT ==> |S_BOX --> SHIFT_ROW --> MIX_COLUMN --> ADD_ROUND_KEY| ==> CIPHERTEXT
BEGIN
--16 replicas of the 8-bit S-box are generated
sboxes: FOR i IN 15 DOWNTO 0 GENERATE
    sbox_map: s_box
    PORT MAP(
        s_in  => e_in(8*i+7 DOWNTO 8*i),
        s_out => bytesub_out(8*i+7 DOWNTO 8*i)
    );
END GENERATE sboxes;

ShiftRow: shift_row
PORT MAP(
    shiftrow_in  => bytesub_out,
    shiftrow_out => shiftrow_out
);

MixColumn: mix_column
PORT MAP(
    mixcolumn_in  => shiftrow_out,
    mixcolumn_out => mixcolumn_out
);

--mux to skip the mix column operation in the last round
WITH last_mux_sel SELECT
    mux_out <= mixcolumn_out WHEN '0',
               shiftrow_out  WHEN OTHERS;

--round key addition
d_out <= mux_out XOR key;

END beh;


S-BOX CODE

-- 8-bit input S-Box
-- Implemented as a lookup table

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY s_box IS
PORT(
    s_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    s_out : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END s_box;

ARCHITECTURE beh OF s_box IS
BEGIN
--s box table
WITH s_in(7 DOWNTO 0) SELECT
    s_out(7 DOWNTO 0) <=

    --first row
    "01100011" WHEN "00000000", --(X"63")
    "01111100" WHEN "00000001", --(X"7C")
    "01110111" WHEN "00000010", --(X"77")
    "01111011" WHEN "00000011", --(X"7B")
    "11110010" WHEN "00000100", --(X"F2")
    "01101011" WHEN "00000101", --(X"6B")
    "01101111" WHEN "00000110", --(X"6F")
    "11000101" WHEN "00000111", --(X"C5")
    "00110000" WHEN "00001000", --(X"30")
    "00000001" WHEN "00001001", --(X"01")
    "01100111" WHEN "00001010", --(X"67")
    "00101011" WHEN "00001011", --(X"2B")
    "11111110" WHEN "00001100", --(X"FE")
    "11010111" WHEN "00001101", --(X"D7")
    "10101011" WHEN "00001110", --(X"AB")
    "01110110" WHEN "00001111", --(X"76")
    --second row
    "11001010" WHEN "00010000", --(X"CA")
    "10000010" WHEN "00010001", --(X"82")
    "11001001" WHEN "00010010", --(X"C9")
    "01111101" WHEN "00010011", --(X"7D")
    "11111010" WHEN "00010100", --(X"FA")
    "01011001" WHEN "00010101", --(X"59")
    "01000111" WHEN "00010110", --(X"47")
    "11110000" WHEN "00010111", --(X"F0")
    "10101101" WHEN "00011000", --(X"AD")
    "11010100" WHEN "00011001", --(X"D4")


    "10100010" WHEN "00011010", --(X"A2")
    "10101111" WHEN "00011011", --(X"AF")
    "10011100" WHEN "00011100", --(X"9C")
    "10100100" WHEN "00011101", --(X"A4")
    "01110010" WHEN "00011110", --(X"72")
    "11000000" WHEN "00011111", --(X"C0")
    --third row
    "10110111" WHEN "00100000", --(X"B7")
    "11111101" WHEN "00100001", --(X"FD")
    "10010011" WHEN "00100010", --(X"93")
    "00100110" WHEN "00100011", --(X"26")
    "00110110" WHEN "00100100", --(X"36")
    "00111111" WHEN "00100101", --(X"3F")
    "11110111" WHEN "00100110", --(X"F7")
    "11001100" WHEN "00100111", --(X"CC")
    "00110100" WHEN "00101000", --(X"34")
    "10100101" WHEN "00101001", --(X"A5")
    "11100101" WHEN "00101010", --(X"E5")
    "11110001" WHEN "00101011", --(X"F1")
    "01110001" WHEN "00101100", --(X"71")
    "11011000" WHEN "00101101", --(X"D8")
    "00110001" WHEN "00101110", --(X"31")
    "00010101" WHEN "00101111", --(X"15")
    --fourth row
    "00000100" WHEN "00110000", --(X"04")
    "11000111" WHEN "00110001", --(X"C7")
    "00100011" WHEN "00110010", --(X"23")
    "11000011" WHEN "00110011", --(X"C3")
    "00011000" WHEN "00110100", --(X"18")
    "10010110" WHEN "00110101", --(X"96")
    "00000101" WHEN "00110110", --(X"05")
    "10011010" WHEN "00110111", --(X"9A")
    "00000111" WHEN "00111000", --(X"07")
    "00010010" WHEN "00111001", --(X"12")
    "10000000" WHEN "00111010", --(X"80")
    "11100010" WHEN "00111011", --(X"E2")
    "11101011" WHEN "00111100", --(X"EB")
    "00100111" WHEN "00111101", --(X"27")
    "10110010" WHEN "00111110", --(X"B2")
    "01110101" WHEN "00111111", --(X"75")
    --fifth row
    "00001001" WHEN "01000000", --(X"09")
    "10000011" WHEN "01000001", --(X"83")
    "00101100" WHEN "01000010", --(X"2C")
    "00011010" WHEN "01000011", --(X"1A")
    "00011011" WHEN "01000100", --(X"1B")
    "01101110" WHEN "01000101", --(X"6E")
    "01011010" WHEN "01000110", --(X"5A")
    "10100000" WHEN "01000111", --(X"A0")
    "01010010" WHEN "01001000", --(X"52")


    "00111011" WHEN "01001001", --(X"3B")
    "11010110" WHEN "01001010", --(X"D6")
    "10110011" WHEN "01001011", --(X"B3")
    "00101001" WHEN "01001100", --(X"29")
    "11100011" WHEN "01001101", --(X"E3")
    "00101111" WHEN "01001110", --(X"2F")
    "10000100" WHEN "01001111", --(X"84")
    --sixth row
    "01010011" WHEN "01010000", --(X"53")
    "11010001" WHEN "01010001", --(X"D1")
    "00000000" WHEN "01010010", --(X"00")
    "11101101" WHEN "01010011", --(X"ED")
    "00100000" WHEN "01010100", --(X"20")
    "11111100" WHEN "01010101", --(X"FC")
    "10110001" WHEN "01010110", --(X"B1")
    "01011011" WHEN "01010111", --(X"5B")
    "01101010" WHEN "01011000", --(X"6A")
    "11001011" WHEN "01011001", --(X"CB")
    "10111110" WHEN "01011010", --(X"BE")
    "00111001" WHEN "01011011", --(X"39")
    "01001010" WHEN "01011100", --(X"4A")
    "01001100" WHEN "01011101", --(X"4C")
    "01011000" WHEN "01011110", --(X"58")
    "11001111" WHEN "01011111", --(X"CF")
    --seventh row
    "11010000" WHEN "01100000", --(X"D0")
    "11101111" WHEN "01100001", --(X"EF")
    "10101010" WHEN "01100010", --(X"AA")
    "11111011" WHEN "01100011", --(X"FB")
    "01000011" WHEN "01100100", --(X"43")
    "01001101" WHEN "01100101", --(X"4D")
    "00110011" WHEN "01100110", --(X"33")
    "10000101" WHEN "01100111", --(X"85")
    "01000101" WHEN "01101000", --(X"45")
    "11111001" WHEN "01101001", --(X"F9")
    "00000010" WHEN "01101010", --(X"02")
    "01111111" WHEN "01101011", --(X"7F")
    "01010000" WHEN "01101100", --(X"50")
    "00111100" WHEN "01101101", --(X"3C")
    "10011111" WHEN "01101110", --(X"9F")
    "10101000" WHEN "01101111", --(X"A8")
    --eighth row
    "01010001" WHEN "01110000", --(X"51")
    "10100011" WHEN "01110001", --(X"A3")
    "01000000" WHEN "01110010", --(X"40")
    "10001111" WHEN "01110011", --(X"8F")
    "10010010" WHEN "01110100", --(X"92")
    "10011101" WHEN "01110101", --(X"9D")
    "00111000" WHEN "01110110", --(X"38")
    "11110101" WHEN "01110111", --(X"F5")


    "10111100" WHEN "01111000", --(X"BC")
    "10110110" WHEN "01111001", --(X"B6")
    "11011010" WHEN "01111010", --(X"DA")
    "00100001" WHEN "01111011", --(X"21")
    "00010000" WHEN "01111100", --(X"10")
    "11111111" WHEN "01111101", --(X"FF")
    "11110011" WHEN "01111110", --(X"F3")
    "11010010" WHEN "01111111", --(X"D2")
    --ninth row
    "11001101" WHEN "10000000", --(X"CD")
    "00001100" WHEN "10000001", --(X"0C")
    "00010011" WHEN "10000010", --(X"13")
    "11101100" WHEN "10000011", --(X"EC")
    "01011111" WHEN "10000100", --(X"5F")
    "10010111" WHEN "10000101", --(X"97")
    "01000100" WHEN "10000110", --(X"44")
    "00010111" WHEN "10000111", --(X"17")
    "11000100" WHEN "10001000", --(X"C4")
    "10100111" WHEN "10001001", --(X"A7")
    "01111110" WHEN "10001010", --(X"7E")
    "00111101" WHEN "10001011", --(X"3D")
    "01100100" WHEN "10001100", --(X"64")
    "01011101" WHEN "10001101", --(X"5D")
    "00011001" WHEN "10001110", --(X"19")
    "01110011" WHEN "10001111", --(X"73")
    --tenth row
    "01100000" WHEN "10010000", --(X"60")
    "10000001" WHEN "10010001", --(X"81")
    "01001111" WHEN "10010010", --(X"4F")
    "11011100" WHEN "10010011", --(X"DC")
    "00100010" WHEN "10010100", --(X"22")
    "00101010" WHEN "10010101", --(X"2A")
    "10010000" WHEN "10010110", --(X"90")
    "10001000" WHEN "10010111", --(X"88")
    "01000110" WHEN "10011000", --(X"46")
    "11101110" WHEN "10011001", --(X"EE")
    "10111000" WHEN "10011010", --(X"B8")
    "00010100" WHEN "10011011", --(X"14")
    "11011110" WHEN "10011100", --(X"DE")
    "01011110" WHEN "10011101", --(X"5E")
    "00001011" WHEN "10011110", --(X"0B")
    "11011011" WHEN "10011111", --(X"DB")
    --eleventh row
    "11100000" WHEN "10100000", --(X"E0")
    "00110010" WHEN "10100001", --(X"32")
    "00111010" WHEN "10100010", --(X"3A")
    "00001010" WHEN "10100011", --(X"0A")
    "01001001" WHEN "10100100", --(X"49")
    "00000110" WHEN "10100101", --(X"06")
    "00100100" WHEN "10100110", --(X"24")


    "01011100" WHEN "10100111", --(X"5C")
    "11000010" WHEN "10101000", --(X"C2")
    "11010011" WHEN "10101001", --(X"D3")
    "10101100" WHEN "10101010", --(X"AC")
    "01100010" WHEN "10101011", --(X"62")
    "10010001" WHEN "10101100", --(X"91")
    "10010101" WHEN "10101101", --(X"95")
    "11100100" WHEN "10101110", --(X"E4")
    "01111001" WHEN "10101111", --(X"79")
    --twelfth row
    "11100111" WHEN "10110000", --(X"E7")
    "11001000" WHEN "10110001", --(X"C8")
    "00110111" WHEN "10110010", --(X"37")
    "01101101" WHEN "10110011", --(X"6D")
    "10001101" WHEN "10110100", --(X"8D")
    "11010101" WHEN "10110101", --(X"D5")
    "01001110" WHEN "10110110", --(X"4E")
    "10101001" WHEN "10110111", --(X"A9")
    "01101100" WHEN "10111000", --(X"6C")
    "01010110" WHEN "10111001", --(X"56")
    "11110100" WHEN "10111010", --(X"F4")
    "11101010" WHEN "10111011", --(X"EA")
    "01100101" WHEN "10111100", --(X"65")
    "01111010" WHEN "10111101", --(X"7A")
    "10101110" WHEN "10111110", --(X"AE")
    "00001000" WHEN "10111111", --(X"08")
    --thirteenth row
    "10111010" WHEN "11000000", --(X"BA")
    "01111000" WHEN "11000001", --(X"78")
    "00100101" WHEN "11000010", --(X"25")
    "00101110" WHEN "11000011", --(X"2E")
    "00011100" WHEN "11000100", --(X"1C")
    "10100110" WHEN "11000101", --(X"A6")
    "10110100" WHEN "11000110", --(X"B4")
    "11000110" WHEN "11000111", --(X"C6")
    "11101000" WHEN "11001000", --(X"E8")
    "11011101" WHEN "11001001", --(X"DD")
    "01110100" WHEN "11001010", --(X"74")
    "00011111" WHEN "11001011", --(X"1F")
    "01001011" WHEN "11001100", --(X"4B")
    "10111101" WHEN "11001101", --(X"BD")
    "10001011" WHEN "11001110", --(X"8B")
    "10001010" WHEN "11001111", --(X"8A")
    --fourteenth row
    "01110000" WHEN "11010000", --(X"70")
    "00111110" WHEN "11010001", --(X"3E")
    "10110101" WHEN "11010010", --(X"B5")
    "01100110" WHEN "11010011", --(X"66")
    "01001000" WHEN "11010100", --(X"48")
    "00000011" WHEN "11010101", --(X"03")


    "11110110" WHEN "11010110", --(X"F6")
    "00001110" WHEN "11010111", --(X"0E")
    "01100001" WHEN "11011000", --(X"61")
    "00110101" WHEN "11011001", --(X"35")
    "01010111" WHEN "11011010", --(X"57")
    "10111001" WHEN "11011011", --(X"B9")
    "10000110" WHEN "11011100", --(X"86")
    "11000001" WHEN "11011101", --(X"C1")
    "00011101" WHEN "11011110", --(X"1D")
    "10011110" WHEN "11011111", --(X"9E")
    --fifteenth row
    "11100001" WHEN "11100000", --(X"E1")
    "11111000" WHEN "11100001", --(X"F8")
    "10011000" WHEN "11100010", --(X"98")
    "00010001" WHEN "11100011", --(X"11")
    "01101001" WHEN "11100100", --(X"69")
    "11011001" WHEN "11100101", --(X"D9")
    "10001110" WHEN "11100110", --(X"8E")
    "10010100" WHEN "11100111", --(X"94")
    "10011011" WHEN "11101000", --(X"9B")
    "00011110" WHEN "11101001", --(X"1E")
    "10000111" WHEN "11101010", --(X"87")
    "11101001" WHEN "11101011", --(X"E9")
    "11001110" WHEN "11101100", --(X"CE")
    "01010101" WHEN "11101101", --(X"55")
    "00101000" WHEN "11101110", --(X"28")
    "11011111" WHEN "11101111", --(X"DF")
    --sixteenth row
    "10001100" WHEN "11110000", --(X"8C")
    "10100001" WHEN "11110001", --(X"A1")
    "10001001" WHEN "11110010", --(X"89")
    "00001101" WHEN "11110011", --(X"0D")
    "10111111" WHEN "11110100", --(X"BF")
    "11100110" WHEN "11110101", --(X"E6")
    "01000010" WHEN "11110110", --(X"42")
    "01101000" WHEN "11110111", --(X"68")
    "01000001" WHEN "11111000", --(X"41")
    "10011001" WHEN "11111001", --(X"99")
    "00101101" WHEN "11111010", --(X"2D")
    "00001111" WHEN "11111011", --(X"0F")
    "10110000" WHEN "11111100", --(X"B0")
    "01010100" WHEN "11111101", --(X"54")
    "10111011" WHEN "11111110", --(X"BB")
    "00010110" WHEN "11111111", --(X"16")
    "XXXXXXXX" WHEN OTHERS;

END beh;


SHIFT_ROWS CODE

-- ShiftRow Transformation
-- 1st row gets shifted left by zero
-- 2nd row gets shifted left by one
-- 3rd row gets shifted left by two
-- 4th row gets shifted left by three

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY shift_row IS
PORT(
    shiftrow_in  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    shiftrow_out : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END shift_row;

ARCHITECTURE beh OF shift_row IS
-- type describing the state as an array of 16 bytes
TYPE matrix_index is array (15 downto 0) of std_logic_vector(7 downto 0);
SIGNAL b, c : matrix_index;

BEGIN
--initial mapping of the input into a byte matrix array named b
matrix_mapping: PROCESS(shiftrow_in)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        b(15-i) <= shiftrow_in(8*i+7 DOWNTO 8*i);
    END LOOP;
END PROCESS matrix_mapping;

--shift row transformation
--     b(i)       -->       c(i)
--
-- | 0 4  8 12 |     |  0  4  8 12 | (no shift)
-- | 1 5  9 13 | ==> |  5  9 13  1 | ( 1 left shift)
-- | 2 6 10 14 |     | 10 14  2  6 | ( 2 left shift)
-- | 3 7 11 15 |     | 15  3  7 11 | ( 3 left shift)

--shifted first column
c(0) <= b(0);
c(1) <= b(5);
c(2) <= b(10);
c(3) <= b(15);
--shifted second column
c(4) <= b(4);
c(5) <= b(9);
c(6) <= b(14);
c(7) <= b(3);
--shifted third column
c(8) <= b(8);
c(9) <= b(13);
c(10) <= b(2);
c(11) <= b(7);
--shifted fourth column
c(12) <= b(12);
c(13) <= b(1);
c(14) <= b(6);
c(15) <= b(11);

--mapping the temporary c array into the shifted-row output
matrix_mapping_back: PROCESS(c)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        shiftrow_out(8*i+7 DOWNTO 8*i) <= c(15-i);
    END LOOP;
END PROCESS matrix_mapping_back;

END beh;
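
The byte permutation above can be checked in isolation against the round 1 intermediate values of the FIPS-197 cipher example. The following is only a sketch: the test bench name shift_row_tb and its signals din and dout are hypothetical and not part of the design, and the shift_row entity listed above is assumed to be compiled into the work library.

```vhdl
-- Hypothetical known-answer test bench for shift_row (names are illustrative only)
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY shift_row_tb IS
END shift_row_tb;

ARCHITECTURE sim OF shift_row_tb IS
    COMPONENT shift_row
    PORT(
        shiftrow_in  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
        shiftrow_out : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
    );
    END COMPONENT;

    SIGNAL din, dout : STD_LOGIC_VECTOR(127 DOWNTO 0);
BEGIN
    dut: shift_row
    PORT MAP(
        shiftrow_in  => din,
        shiftrow_out => dout
    );

    stimulus: PROCESS
    BEGIN
        -- state after SubBytes in round 1 of the FIPS-197 example
        din <= to_stdlogicvector(bit_vector'(X"d42711aee0bf98f1b8b45de51e415230"));
        WAIT FOR 10 ns;
        -- state after ShiftRows in the same round
        ASSERT dout = to_stdlogicvector(bit_vector'(X"d4bf5d30e0b452aeb84111f11e2798e5"))
            REPORT "shift_row known-answer test failed" SEVERITY error;
        WAIT;
    END PROCESS stimulus;
END sim;
```

The qualified expression bit_vector'(...) is used so that the hexadecimal literal has an unambiguous type regardless of the VHDL revision the simulator applies.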

MIX_COLUMN CODE

-- MixColumn Transformation

-- multiplied matrix
-- | 02 03 01 01 |     | c(0) c(4) c(8)  c(12) |
-- | 01 02 03 01 |  X  | c(1) c(5) c(9)  c(13) |
-- | 01 01 02 03 |     | c(2) c(6) c(10) c(14) |
-- | 03 01 01 02 |     | c(3) c(7) c(11) c(15) |

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY mix_column IS
PORT(
    mixcolumn_in  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    mixcolumn_out : OUT STD_LOGIC_VECTOR(127 DOWNTO 0)
);
END mix_column;

ARCHITECTURE beh OF mix_column IS
--signal declaration
TYPE matrix_index is array (15 downto 0) of std_logic_vector(7 downto 0);
TYPE shift_index is array (15 downto 0) of std_logic_vector(8 downto 0);
SIGNAL shifted_2, shifted_3, xored : shift_index;
SIGNAL c, c_out, mult_2, mult_3 : matrix_index;

BEGIN
-- mapping input to a 4X4 matrix
-- mixcolumn_in(127 downto 0) --> | c(0) c(4) c(8)  c(12) |
--                                | c(1) c(5) c(9)  c(13) |
--                                | c(2) c(6) c(10) c(14) |
--                                | c(3) c(7) c(11) c(15) |
input_matrix_mapping: PROCESS(mixcolumn_in)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        c(15-i) <= mixcolumn_in(8*i+7 DOWNTO 8*i);
    END LOOP;
END PROCESS input_matrix_mapping;

--all elements in the matrix are multiplied by 2
multiplication_by_2: PROCESS(c, shifted_2)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        shifted_2(i) <= c(i) & '0'; -- left shift, which multiplies by 2
        IF (shifted_2(i)(8)='1') THEN
            -- the result no longer fits in 8 bits (degree above 7):
            -- reduce it modulo x^8 + x^4 + x^3 + x + 1
            mult_2(i) <= shifted_2(i)(7 downto 0) XOR "00011011";
        ELSE
            mult_2(i) <= shifted_2(i)(7 downto 0);
        END IF;
    END LOOP;
END PROCESS multiplication_by_2;

--all elements in the matrix are multiplied by 3
--which is equivalent to 3*Z = 2*Z xor Z
multiplication_by_3: PROCESS(c, shifted_3, xored)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        shifted_3(i) <= c(i) & '0';                  -- 2*Z
        xored(i) <= shifted_3(i) XOR ('0' & c(i));   -- (2*Z) xor Z
        IF (xored(i)(8)='1') THEN
            -- degree above 7: reduce modulo x^8 + x^4 + x^3 + x + 1
            mult_3(i) <= xored(i)(7 downto 0) XOR "00011011";
        ELSE
            mult_3(i) <= xored(i)(7 downto 0);
        END IF;
    END LOOP;
END PROCESS multiplication_by_3;

--mix column transformation
--row one
c_out(0)  <= mult_2(0)  XOR mult_3(1)  XOR c(2)  XOR c(3);
c_out(4)  <= mult_2(4)  XOR mult_3(5)  XOR c(6)  XOR c(7);
c_out(8)  <= mult_2(8)  XOR mult_3(9)  XOR c(10) XOR c(11);
c_out(12) <= mult_2(12) XOR mult_3(13) XOR c(14) XOR c(15);
--row two
c_out(1)  <= c(0)  XOR mult_2(1)  XOR mult_3(2)  XOR c(3);
c_out(5)  <= c(4)  XOR mult_2(5)  XOR mult_3(6)  XOR c(7);
c_out(9)  <= c(8)  XOR mult_2(9)  XOR mult_3(10) XOR c(11);
c_out(13) <= c(12) XOR mult_2(13) XOR mult_3(14) XOR c(15);
--row three
c_out(2)  <= c(0)  XOR c(1)  XOR mult_2(2)  XOR mult_3(3);
c_out(6)  <= c(4)  XOR c(5)  XOR mult_2(6)  XOR mult_3(7);
c_out(10) <= c(8)  XOR c(9)  XOR mult_2(10) XOR mult_3(11);
c_out(14) <= c(12) XOR c(13) XOR mult_2(14) XOR mult_3(15);
--row four
c_out(3)  <= mult_3(0)  XOR c(1)  XOR c(2)  XOR mult_2(3);
c_out(7)  <= mult_3(4)  XOR c(5)  XOR c(6)  XOR mult_2(7);
c_out(11) <= mult_3(8)  XOR c(9)  XOR c(10) XOR mult_2(11);
c_out(15) <= mult_3(12) XOR c(13) XOR c(14) XOR mult_2(15);

--mapping back to a vector
map_to_vector: PROCESS(c_out)
BEGIN
    FOR i IN 15 DOWNTO 0 LOOP
        mixcolumn_out(8*i+7 DOWNTO 8*i) <= c_out(15-i);
    END LOOP;
END PROCESS map_to_vector;

END beh;
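
The multiplications by 2 and by 3 used in the listing above can be illustrated on single bytes. The sketch below is not part of the design: the entity gf_check and the function xtime are hypothetical names, and the function is an alternative formulation of the shift-and-reduce step rather than the author's process-based code. The expected values are the worked products of {57} given in FIPS-197.

```vhdl
-- Hypothetical self-checking sketch of GF(2^8) multiplication by {02} and {03}
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY gf_check IS
END gf_check;

ARCHITECTURE sim OF gf_check IS
    -- multiply a byte by {02}: shift left, then reduce modulo
    -- x^8 + x^4 + x^3 + x + 1 if the shifted-out bit was set
    FUNCTION xtime(a : STD_LOGIC_VECTOR(7 DOWNTO 0)) RETURN STD_LOGIC_VECTOR IS
        VARIABLE r : STD_LOGIC_VECTOR(7 DOWNTO 0);
    BEGIN
        r := a(6 DOWNTO 0) & '0';
        IF a(7) = '1' THEN
            r := r XOR "00011011";
        END IF;
        RETURN r;
    END xtime;
BEGIN
    check: PROCESS
        CONSTANT a : STD_LOGIC_VECTOR(7 DOWNTO 0) := "01010111"; -- {57}
    BEGIN
        -- {02}*{57} = {AE}, no reduction needed
        ASSERT xtime(a) = "10101110"
            REPORT "{02}*{57} mismatch" SEVERITY error;
        -- {03}*{57} = ({02}*{57}) xor {57} = {F9}
        ASSERT (xtime(a) XOR a) = "11111001"
            REPORT "{03}*{57} mismatch" SEVERITY error;
        -- {02}*{AE} = {47}, this case exercises the reduction
        ASSERT xtime(xtime(a)) = "01000111"
            REPORT "{04}*{57} mismatch" SEVERITY error;
        WAIT;
    END PROCESS check;
END sim;
```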

KEY_SCHEDULE CODE

-- Key Scheduling
-- Datapath for generating the key for each round.
-- One 128-bit register is used to store each round key.
-- Every clock cycle, one 4X4 round-key matrix is calculated.

-- original 4X4 key matrix
-- key (127 downto 0) --> | k(0) k(4) k(8)  k(12) |
--                        | k(1) k(5) k(9)  k(13) |
--                        | k(2) k(6) k(10) k(14) |
--                        | k(3) k(7) k(11) k(15) |
--                            |
--                            V
-- 32-bit words          key_word(0)

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY key_schedule IS
PORT(
    clk             : IN  STD_LOGIC;
    reset           : IN  STD_LOGIC;
    key_in          : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    key_out         : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
    key_reg_mux_sel : IN  STD_LOGIC;
    round_constant  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    load_key_reg    : IN  STD_LOGIC
);
END key_schedule;

ARCHITECTURE beh OF key_schedule IS
--component declaration
COMPONENT s_box_4
PORT(
    s_word_in  : IN  STD_LOGIC_VECTOR(31 DOWNTO 0);
    s_word_out : OUT STD_LOGIC_VECTOR(31 DOWNTO 0)
);
END COMPONENT;

--signal declaration
TYPE word_array is array (3 downto 0) of std_logic_vector(31 downto 0);
SIGNAL key_word, next_key_word : word_array;

SIGNAL T, temp_shift, temp_sbox : std_logic_vector(31 downto 0);
SIGNAL key_reg_in, next_key, key_reg_out : std_logic_vector(127 downto 0);
SIGNAL upperbyte_trans : std_logic_vector(7 downto 0);

BEGIN
--key register, which stores a round key
key0: PROCESS(reset, clk, key_reg_in, load_key_reg)
BEGIN
    IF(reset='1') THEN
        key_reg_out <= (OTHERS => '0');
    ELSIF(clk'event AND clk='1') THEN
        IF(load_key_reg='1') THEN
            key_reg_out <= key_reg_in;
        END IF;
    END IF;
END PROCESS key0;

--mux at the input of the key register
key_reg_in <= key_in WHEN key_reg_mux_sel='0' ELSE --original key, 1st round
              next_key;                            --for following rounds

--mapping a vector into an array of words
key_word(0) <= key_reg_out(127 downto 96);
key_word(1) <= key_reg_out(95 downto 64);
key_word(2) <= key_reg_out(63 downto 32);
key_word(3) <= key_reg_out(31 downto 0);

--calculating the next key words (next key columns)
next_key_word(0) <= key_word(0) XOR T;
next_key_word(1) <= key_word(1) XOR key_word(0) XOR T;
next_key_word(2) <= key_word(2) XOR key_word(1) XOR key_word(0) XOR T;
next_key_word(3) <= key_word(3) XOR key_word(2) XOR key_word(1) XOR key_word(0) XOR T;

--converting the word array back to a vector
next_key <= next_key_word(0) & next_key_word(1) & next_key_word(2) & next_key_word(3);

--below describes the calculation of T
--left rotation of the last key word by one byte
temp_shift <= key_word(3)(23 downto 16) &
              key_word(3)(15 downto 8) &
              key_word(3)(7 downto 0) &
              key_word(3)(31 downto 24);

--key subbyte transformation
sbox_lookup: s_box_4
PORT MAP(
    s_word_in  => temp_shift,
    s_word_out => temp_sbox
);

--XOR the upper byte and the round constant
upperbyte_trans <= temp_sbox(31 downto 24) XOR round_constant;

--finally vector T is calculated
T <= upperbyte_trans & temp_sbox(23 downto 0);

--connecting the signal to the entity port
key_out <= key_reg_out;

END beh;
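
The key schedule can be exercised on its own with the key expansion example from FIPS-197, where the cipher key 2b7e1516 28aed2a6 abf71588 09cf4f3c yields the first round key a0fafe17 88542cb1 23a33939 2a6c7605. The test bench below is a sketch under assumptions: the name key_schedule_tb, the 100 ns clock and the stimulus timing are hypothetical, and the key_schedule, s_box_4 and s_box entities listed in this appendix are assumed to be compiled into the work library.

```vhdl
-- Hypothetical known-answer test bench for key_schedule (names are illustrative only)
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY key_schedule_tb IS
END key_schedule_tb;

ARCHITECTURE sim OF key_schedule_tb IS
    COMPONENT key_schedule
    PORT(
        clk             : IN  STD_LOGIC;
        reset           : IN  STD_LOGIC;
        key_in          : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
        key_out         : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
        key_reg_mux_sel : IN  STD_LOGIC;
        round_constant  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
        load_key_reg    : IN  STD_LOGIC
    );
    END COMPONENT;

    SIGNAL clk : STD_LOGIC := '0';
    SIGNAL reset, key_reg_mux_sel, load_key_reg : STD_LOGIC;
    SIGNAL round_constant : STD_LOGIC_VECTOR(7 DOWNTO 0);
    SIGNAL key_in, key_out : STD_LOGIC_VECTOR(127 DOWNTO 0);
BEGIN
    dut: key_schedule
    PORT MAP(
        clk             => clk,
        reset           => reset,
        key_in          => key_in,
        key_out         => key_out,
        key_reg_mux_sel => key_reg_mux_sel,
        round_constant  => round_constant,
        load_key_reg    => load_key_reg
    );

    -- free-running clock with a 100 ns period
    clk <= NOT clk AFTER 50 ns;

    stimulus: PROCESS
    BEGIN
        reset <= '1';
        key_in <= to_stdlogicvector(bit_vector'(X"2b7e151628aed2a6abf7158809cf4f3c"));
        key_reg_mux_sel <= '0';          -- select the original key
        load_key_reg <= '1';
        round_constant <= "00000001";    -- first round constant
        WAIT FOR 100 ns;
        reset <= '0';
        WAIT UNTIL clk'event AND clk = '1';   -- original key is registered
        key_reg_mux_sel <= '1';               -- select next_key from now on
        WAIT UNTIL clk'event AND clk = '1';   -- first round key is registered
        load_key_reg <= '0';
        WAIT FOR 10 ns;
        ASSERT key_out = to_stdlogicvector(bit_vector'(X"a0fafe1788542cb123a339392a6c7605"))
            REPORT "key_schedule known-answer test failed" SEVERITY error;
        WAIT;
    END PROCESS stimulus;
END sim;
```

Because the clock never stops, the simulation has to be run for a fixed time (a few hundred nanoseconds is enough here) rather than until it ends by itself.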

S_BOX_4 CODE

-- Four S Boxes grouped for key scheduling
-- Applies the S-box to each byte of a 32-bit word, as required in the
-- key expansion for every key word whose index is a multiple of 4

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY s_box_4 IS
PORT(
    s_word_in  : IN  STD_LOGIC_VECTOR(31 DOWNTO 0);
    s_word_out : OUT STD_LOGIC_VECTOR(31 DOWNTO 0)
);
END s_box_4;

ARCHITECTURE beh OF s_box_4 IS
--component declaration
COMPONENT s_box
PORT(
    s_in  : IN  STD_LOGIC_VECTOR(7 DOWNTO 0);
    s_out : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
);
END COMPONENT;

BEGIN
--generating 4 s-boxes
sboxes: FOR i IN 3 DOWNTO 0 GENERATE
    sbox_map: s_box
    PORT MAP(
        s_in  => s_word_in(8*i+7 downto 8*i),
        s_out => s_word_out(8*i+7 downto 8*i)
    );
END GENERATE sboxes;

END beh;


CONTROL CODE

-- Control for Rijndael system
-- Control generates load and mux select signals to the datapath.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY control IS
PORT(
    reset            : IN  STD_LOGIC;
    clk              : IN  STD_LOGIC;
    encrypt          : IN  STD_LOGIC;
    data_reg_mux_sel : OUT STD_LOGIC_VECTOR(1 DOWNTO 0);
    load_data_reg    : OUT STD_LOGIC;
    key_reg_mux_sel  : OUT STD_LOGIC;
    round_const      : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
    last_mux_sel     : OUT STD_LOGIC;
    load_key_reg     : OUT STD_LOGIC
);
END control;

ARCHITECTURE beh OF control IS
--state declaration
TYPE control_type IS (init, load_inputs, round1, round2, round3, round4, round5,
                      round6, round7, round8, round9, round10, round0);
SIGNAL control_ps, control_ns : control_type;

BEGIN
--finite state machine for control
control_FSM:
PROCESS (clk, encrypt, reset, control_ns, control_ps)
BEGIN
    IF(reset='1') THEN
        control_ps <= init;
    ELSIF (clk'event AND clk='1') THEN
        control_ps <= control_ns;
    END IF;
    --default outputs
    key_reg_mux_sel  <= '1';
    data_reg_mux_sel <= "01";
    round_const      <= "00000000";
    load_key_reg     <= '1';
    load_data_reg    <= '1';
    last_mux_sel     <= '0';
    --combinatorial part
    CASE control_ps IS
        WHEN init =>
            key_reg_mux_sel <= '0';
            load_key_reg <= '0';
            load_data_reg <= '0';
            IF (encrypt='1') THEN
                control_ns <= load_inputs;
            ELSE
                control_ns <= init;
            END IF;
        WHEN load_inputs =>
            data_reg_mux_sel <= "11";
            key_reg_mux_sel <= '0';
            control_ns <= round0;
        --key0 loaded, XOR key0 and plaintext
        WHEN round0 =>
            round_const <= "00000001";
            data_reg_mux_sel <= "00";
            control_ns <= round1;
        --key1, start of normal rounds
        WHEN round1 =>
            round_const <= "00000010";
            control_ns <= round2;

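        --states round2 to round9 follow the same pattern as round1;
        --the round constants follow the AES round-constant sequence
        --key2
        WHEN round2 =>
            round_const <= "00000100";
            control_ns <= round3;
        --key3
        WHEN round3 =>
            round_const <= "00001000";
            control_ns <= round4;
        --key4
        WHEN round4 =>
            round_const <= "00010000";
            control_ns <= round5;
        --key5
        WHEN round5 =>
            round_const <= "00100000";
            control_ns <= round6;
        --key6
        WHEN round6 =>
            round_const <= "01000000";
            control_ns <= round7;
        --key7
        WHEN round7 =>
            round_const <= "10000000";
            control_ns <= round8;
        --key8
        WHEN round8 =>
            round_const <= "00011011";
            control_ns <= round9;
        --key9
        WHEN round9 =>
            round_const <= "00110110";
            control_ns <= round10;
        --key10, last round excludes the mix column step
        WHEN round10 =>
            last_mux_sel <= '1';
            load_key_reg <= '0';
            control_ns <= init;
    END CASE;
END PROCESS control_FSM;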

END beh;
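
The round_const values driven by the control FSM belong to the AES round-constant sequence, in which each constant is the previous one multiplied by {02} in GF(2^8). The sketch below checks the ten constants needed for a 128-bit key against that rule. It is illustrative only: the entity rcon_check, the type rcon_table and the function xtime are hypothetical names that do not appear in the design.

```vhdl
-- Hypothetical self-checking sketch of the AES round-constant sequence
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY rcon_check IS
END rcon_check;

ARCHITECTURE sim OF rcon_check IS
    TYPE rcon_table IS ARRAY (1 TO 10) OF STD_LOGIC_VECTOR(7 DOWNTO 0);
    -- round constants for the ten round keys: 01 02 04 08 10 20 40 80 1B 36
    CONSTANT expected : rcon_table := (
        "00000001", "00000010", "00000100", "00001000", "00010000",
        "00100000", "01000000", "10000000", "00011011", "00110110");

    -- multiply a byte by {02} modulo x^8 + x^4 + x^3 + x + 1
    FUNCTION xtime(a : STD_LOGIC_VECTOR(7 DOWNTO 0)) RETURN STD_LOGIC_VECTOR IS
        VARIABLE r : STD_LOGIC_VECTOR(7 DOWNTO 0);
    BEGIN
        r := a(6 DOWNTO 0) & '0';
        IF a(7) = '1' THEN
            r := r XOR "00011011";
        END IF;
        RETURN r;
    END xtime;
BEGIN
    check: PROCESS
        VARIABLE rc : STD_LOGIC_VECTOR(7 DOWNTO 0) := "00000001";
    BEGIN
        FOR i IN 1 TO 10 LOOP
            ASSERT rc = expected(i)
                REPORT "round constant mismatch" SEVERITY error;
            rc := xtime(rc);   -- next constant is {02} times the current one
        END LOOP;
        WAIT;
    END PROCESS check;
END sim;
```

Generating the constants this way would trade the hard-coded values in the FSM for one 8-bit register and a conditional XOR; whether that saves resources on a given FPGA is not something this sketch establishes.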

TEST BENCH FOR AES

-- Test bench for AES

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.ALL;

ENTITY test_bench IS
END test_bench;

ARCHITECTURE beh OF test_bench IS
--system under test
COMPONENT aes
PORT(
    plaintext  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    user_key   : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    ciphertext : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
    encrypt    : IN  STD_LOGIC;
    clk        : IN  STD_LOGIC;
    reset      : IN  STD_LOGIC
);
END COMPONENT;

--signal declarations
SIGNAL plaintext, user_key, ciphertext, valid_cipher : STD_LOGIC_VECTOR(127 DOWNTO 0);
SIGNAL encrypt, clk, reset : std_logic;
CONSTANT period : time := 100 ns;
SIGNAL test_result : STD_LOGIC_VECTOR(31 downto 0);

BEGIN
--system under test
test_system: aes
PORT MAP(
    plaintext  => plaintext,
    user_key   => user_key,
    ciphertext => ciphertext,
    encrypt    => encrypt,
    clk        => clk,
    reset      => reset
);

--clock signal
clock: PROCESS
BEGIN
    clk <= '1';
    wait for period/2;
    clk <= '0';
    wait for period/2;
END PROCESS clock;

--test vector
inputs: PROCESS
BEGIN
    reset <= '1';
    wait for period;
    reset <= '0';
    wait for period;
    encrypt <= '1';
    plaintext <= to_stdlogicvector(X"3243f6a8885a308d313198a2e0370734");
    user_key <= to_stdlogicvector(X"2b7e151628aed2a6abf7158809cf4f3c");
    wait for period;
    encrypt <= '0';
    wait for period*20;
    assert ciphertext=to_stdlogicvector(X"3925841d02dc09fbdc118597196a0b32")
        report "TEST FAILED -- WRONG CIPHERTEXT!" severity error;
    wait;
END PROCESS inputs;

--PASS or FAIL message
test_output: PROCESS(ciphertext)
BEGIN
    IF(ciphertext=to_stdlogicvector(X"3925841d02dc09fbdc118597196a0b32")) THEN
        test_result <= "01010000010000010101001101010011"; --ASCII "PASS"
    ELSE
        test_result <= "01000110010000010100100101001100"; --ASCII "FAIL"
    END IF;
END PROCESS test_output;

--correct encrypted message
valid_cipher <= to_stdlogicvector(X"3925841d02dc09fbdc118597196a0b32");

END beh;

TOP MODULE CODE

----------------------------------------------------------------------------------
-- Company:
-- Engineer:
--
-- Create Date:    15:04:31 03/19/2010
-- Design Name:
-- Module Name:    top - Behavioral
-- Project Name:
-- Target Devices:
-- Tool versions:
-- Description:
--
-- Dependencies:
--
-- Revision:
-- Revision 0.01 - File Created
-- Additional Comments:
--
----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity top is
  PORT(
    clk        : IN  STD_LOGIC;
    -- plaintext  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    -- user_key   : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
    ciphertext : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
    encrypt    : IN  STD_LOGIC;
    reset      : IN  STD_LOGIC
  );
end top;

architecture Behavioral of top is
  constant plaintext : STD_LOGIC_VECTOR(127 DOWNTO 0) :=
    "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111";
  constant user_key : STD_LOGIC_VECTOR(127 DOWNTO 0) :=
    "00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111";
  signal CONTROL0    : STD_LOGIC_VECTOR ( 35 downto 0 );
  signal CONTROL     : STD_LOGIC_VECTOR ( 35 downto 0 );
  signal SYNC_OUT    : STD_LOGIC_VECTOR ( 129 downto 0 );
  signal SYNC_IN     : STD_LOGIC_VECTOR ( 129 downto 0 );
  signal encrypt1    : STD_LOGIC;
  signal reset1      : STD_LOGIC;
  signal ciphertext1 : STD_LOGIC_VECTOR(127 DOWNTO 0);

  component aes IS
    PORT(
      clk        : IN  STD_LOGIC;
      -- plaintext  : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
      -- user_key   : IN  STD_LOGIC_VECTOR(127 DOWNTO 0);
      ciphertext : OUT STD_LOGIC_VECTOR(127 DOWNTO 0);
      encrypt    : IN  STD_LOGIC;
      reset      : IN  STD_LOGIC
    );
  END component;

  component icon is
    port (
      CONTROL0 : inout STD_LOGIC_VECTOR ( 35 downto 0 )
    );
  end component;

  component vio is
    port (
      CLK      : in    STD_LOGIC := 'X';
      CONTROL  : inout STD_LOGIC_VECTOR ( 35 downto 0 );
      SYNC_OUT : out   STD_LOGIC_VECTOR ( 129 downto 0 );
      SYNC_IN  : in    STD_LOGIC_VECTOR ( 129 downto 0 )
    );
  end component;

begin
  aesinst1 : aes  port map(CLK=>CLK, ciphertext=>ciphertext1, encrypt=>encrypt1, reset=>reset1);
  iconinst : icon port map(CONTROL0=>CONTROL0);
  vioinst  : vio  port map(CLK=>CLK, CONTROL=>CONTROL0, SYNC_OUT=>SYNC_OUT, SYNC_IN=>SYNC_IN);

  reset1   <= SYNC_OUT(0);
  encrypt1 <= SYNC_OUT(1);
  SYNC_IN  <= ("00" & ciphertext1);
end Behavioral;


CHAPTER-7

SIMULATION RESULTS

FIG.7.1.TEST BENCH WAVEFORM

FIG.7.2.OUTPUT OF SIMULATION


FIG.7.3.SIMULATION RESULT SHOWING OUTPUT OF EACH ROUND


FIG.7.4.RESULT SHOWN ON CHIPSCOPE ANALYZER


The following diagram shows the values in the State array as the Cipher

progresses for a block length and a Cipher Key length of 16 bytes each (i.e., Nb = 4

and Nk = 4).

Input = 32 43 F6 A8 88 5A 30 8D 31 31 98 A2 E0 37 07 34

Cipher Key = 2B 7E 15 16 28 AE D2 A6 AB F7 15 88 09 CF 4F 3C

The Round Key values are taken from the Key Expansion example in Appendix A.
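
The first step of the Cipher for this example, the initial AddRoundKey, can be checked in simulation with a few lines of VHDL. The sketch below is not part of the project code: the entity name add_round_key_check and the constant names are hypothetical, the expected value is obtained by XORing the Input and Cipher Key above byte by byte, and the hexadecimal literals assume a VHDL-93 or later simulator.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical stand-alone check, not one of the project modules
ENTITY add_round_key_check IS
END add_round_key_check;

ARCHITECTURE sim OF add_round_key_check IS
  -- Input and Cipher Key of the example above
  CONSTANT input_block : STD_LOGIC_VECTOR(127 DOWNTO 0) :=
    X"3243f6a8885a308d313198a2e0370734";
  CONSTANT cipher_key  : STD_LOGIC_VECTOR(127 DOWNTO 0) :=
    X"2b7e151628aed2a6abf7158809cf4f3c";
  -- Byte-wise XOR of the two values above (State after the initial AddRoundKey)
  CONSTANT expected    : STD_LOGIC_VECTOR(127 DOWNTO 0) :=
    X"193de3bea0f4e22b9ac68d2ae9f84808";
BEGIN
  check : PROCESS
  BEGIN
    -- AddRoundKey is a bitwise XOR of the State with the round key
    ASSERT (input_block XOR cipher_key) = expected
      REPORT "initial AddRoundKey mismatch"
      SEVERITY error;
    WAIT;
  END PROCESS check;
END sim;
```

If the assertion stays silent, the State entering round 1 begins 19 3D E3 BE, the bytes on which the first SubBytes transformation then operates.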


Fig 7.5 Cipher Example


CHAPTER-8

CONCLUSIONS

The efficiency of an AES implementation may be measured in terms of area or

throughput. Our focus was on an area-efficient architecture. We achieved a compact

architecture that uses only a small percentage of the device resources while still

reaching a clock frequency of 206.28 MHz and a throughput of 2.64 Gbps. The use of

BRAMs for S-box storage leaves the device slices available for implementing the

other logic of any application that requires our encryption core.
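
As a rough consistency check on these figures, the throughput of an iterative core can be estimated from the block size, the clock frequency, and the number of clock cycles spent per block. The value of 10 cycles per 128-bit block used below is an assumption inferred from the reported numbers; it is not stated in the text.

```latex
\[
\text{Throughput}
  = \frac{128\ \text{bits} \times f_{clk}}{N_{cycles}}
  = \frac{128 \times 206.28 \times 10^{6}\ \text{Hz}}{10}
  \approx 2.64\ \text{Gbps}
\]
```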

8.1 FUTURE WORK

Our future work includes incorporating the AES decryption process into this core.

We will then work on improving the throughput and size of the core. This will be

done by focusing on the target device, clocking techniques, and area constraints, by

combining consecutive transformation rounds in one module, and by making efficient

use of the available embedded BRAMs.


REFERENCES

[1] D.-e.-S. Kundi, S. Zaka, Qurat-Ul-Ain, and A. Aziz, "A compact AES encryption core on Xilinx FPGA," IEEE, PN Eng. Coll., Nat. Univ. of Sci. & Technol., Karachi, 17-18 Feb. 2009.

[2] William Stallings, Cryptography and Network Security, 3rd Edition, Pearson Education.

[3] A. Lee, NIST Special Publication 800-21, Guideline for Implementing Cryptography in the Federal Government, National Institute of Standards and Technology, November 1999.

[4] www.wikipedia.org

[5] Xilinx, Using Block RAM in Spartan-3 Generation FPGAs, available at http://www.xilinx.com/bvdocs/appnotes/xapp463.pdf, 2004.

[6] Xilinx, Spartan-3 Field Programmable Gate Array data sheets, available at http://www.xilinx.com/spartan3.
