EECS 150 Homework 11 Solutions Fall 2008

The first three questions concern 8, 16, and 32 bit microcontroller chips that all have family members available for under $1. The files referenced are available at: http://www.eecs.berkeley.edu/~pister/150fa08

1) Look at the user's guide for the Texas Instruments (TI) MSP430 a. Is this a Princeton or Harvard architecture? How many bits? (Fig 1-1)

Program and data are stored together in the same address space, so this is a Princeton architecture. The datapath is 16 bits.

b. What is the maximum number of bytes available for program and data memory? (Fig 1-2) The total number of bytes available for [program + data] is 0FFDF – 0200 = 0FDDF, or 64991 bytes. The amount used for code and the amount used for data depends on the configuration.
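The subtraction above can be checked with a short Python sketch. The two bounds are the ones quoted in the answer; the variable names are only illustrative. Note that a count including both endpoint addresses is one larger than the plain difference.

```python
# Address bounds quoted in the answer above (hexadecimal).
LOW_ADDR = 0x0200
HIGH_ADDR = 0xFFDF

# Plain difference, as computed in the answer.
difference = HIGH_ADDR - LOW_ADDR

# Number of byte addresses if both endpoints are counted.
inclusive_count = difference + 1

print(hex(difference), difference)
print(inclusive_count)
```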

c. How many inputs and outputs are there from the register file? (Fig. 3-1) The Register file has 1 input: the memory data bus. It has 4 outputs: 2 to the ALU, 1 to the memory data bus, and 1 to the memory address bus.

d. Give an example of an instruction that can execute in a single cycle. What is the most cycles that it takes for an instruction to execute, and how many bytes long is that instruction? (Table 3-16) (Extra for experts: what does that instruction do?) An instruction that executes in a single cycle: MOV R5, R8. The maximum number of cycles needed to execute an instruction is 6, and several instructions take that long. Each of these instructions is 3 words (6 bytes) long, because MSP430 instructions are built from 16-bit words. For example: MOV 4(R7), TONI

2) Look at the paper on the ARM Cortex-M3 a. Is this a Princeton or a Harvard architecture? How many bits?

There is separate memory for the program code and the data, so this is a Harvard architecture. The datapath is 32 bits.

b. Is the first paragraph in section 2.1 truth, or marketing hype? While there is some truth to this, it is mostly marketing hype. The benefit of a Harvard architecture over a Princeton architecture is that in every cycle, you can execute the current instruction and fetch the next instruction. This is possible because the memory address and data buses are NOT used to fetch the instruction, so they are available to the executing instruction if it needs them. In contrast, a Princeton architecture uses the memory address and data buses to fetch each instruction from memory. If the instruction only involves registers and does not access memory, it can still execute in a single cycle. If, however, the instruction accesses memory, it takes 2 cycles: one to fetch the instruction from memory, and one to execute it. As it turns out, the ratio of instructions that access memory to those that don’t is roughly 1:4. Therefore only about every 5th instruction accesses memory, and the average number of cycles per instruction in a Princeton architecture is ~1.2. Going from ~1.2 cycles to 1 cycle means a Harvard architecture could offer a reduction of roughly 17% in cycles (0.2/1.2), which is not nearly as drastic as this white paper implies.
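The estimate above can be reproduced with a small calculation. This is a sketch of the simple model used in the answer (1 cycle for register-only instructions, 2 cycles for memory-access instructions on a Princeton machine, 1 cycle for everything on a Harvard machine); the function and variable names are illustrative.

```python
def princeton_avg_cycles(mem_fraction: float) -> float:
    """Average cycles per instruction when memory-access instructions
    take 2 cycles and all other instructions take 1 cycle."""
    return 1.0 * (1.0 - mem_fraction) + 2.0 * mem_fraction

# A 1:4 ratio of memory to non-memory instructions means 1 in 5 touches memory.
mem_fraction = 1 / 5
princeton_cpi = princeton_avg_cycles(mem_fraction)
harvard_cpi = 1.0  # assumed: every instruction completes in one cycle

# Fraction of cycles saved by moving from Princeton to Harvard.
reduction = (princeton_cpi - harvard_cpi) / princeton_cpi

print(f"Princeton CPI: {princeton_cpi:.2f}")
print(f"Cycle reduction with Harvard: {reduction:.1%}")
```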

c. How does the M3 memory map differ from the MSP430? (Figure 4) A few examples of differences between the memory maps: 1) The ARM M3 has a much larger address space: 4GB compared to 64KB for the MSP430. 2) The memory maps are inverted: the ARM M3 stores the code at the lowest addresses, while the MSP430 stores code at the highest addresses. 3) Both the MSP430 and the ARM M3 have memory-mapped peripherals, but the ARM M3 also includes addresses for several external devices, such as external RAM and external peripheral devices.

d. What does the Bit Band Aliased region let you do? Is it wasteful of address space (i.e. it uses 32MB of address space – should I care?) The Bit Band Aliased region allows you to access/modify individual bits in the Bit Band in a single instruction. This is a significant improvement over the standard way to access a single bit: read the data, mask the desired bit, and re-write the data. The Bit Band region contains 32-bit words. Each individual bit in the Bit Band word corresponds to a specific address in the Bit Band Alias region. Writing to the address in the Bit Band Alias region essentially writes the individual bit in the Bit Band. This is all done in hardware that sets or clears a single bit in the Bit Band, and there is actually no physical memory in the Bit Band Alias region. Although this uses 32M addresses, this is not wasteful since 32 bits of address allows the Cortex to address far more memory than any embedded application will actually have.
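The mapping from a bit in the Bit Band to its word in the Bit Band Alias region can be sketched as follows. The base addresses (0x20000000 for the SRAM bit-band region, 0x22000000 for its alias) and the 1 MB region size are taken from ARM's Cortex-M3 documentation rather than from the answer above, so treat them as assumptions; the function name is illustrative.

```python
# Assumed base addresses for the Cortex-M3 SRAM bit-band region and its alias.
BITBAND_BASE = 0x20000000
ALIAS_BASE = 0x22000000
BITBAND_SIZE = 0x100000  # assumed 1 MB bit-band region

def bitband_alias_address(byte_addr: int, bit: int) -> int:
    """Return the alias-region word address that maps to a single bit.

    byte_addr: address of a byte inside the bit-band region
    bit:       bit position within that byte (0..7)
    """
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    byte_offset = byte_addr - BITBAND_BASE
    if not 0 <= byte_offset < BITBAND_SIZE:
        raise ValueError("address is outside the bit-band region")
    # Each bit gets its own 32-bit word: 8 bits per byte * 4 bytes per word = 32.
    return ALIAS_BASE + byte_offset * 32 + bit * 4

# 1 MB of bit-band memory expands to 32 MB of alias addresses,
# which is a small fraction of the 4 GB address space.
alias_span_bytes = BITBAND_SIZE * 32
fraction_of_4gb = alias_span_bytes / 2**32

print(hex(bitband_alias_address(0x20000300, 2)))
print(f"{fraction_of_4gb:.2%} of the 4 GB address space")
```

Writing 1 or 0 to the returned address would set or clear just that bit, with no read-modify-write sequence in software.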

3) Look at the user's guide for the Atmel ATmega128 a. Is this a Princeton or a Harvard architecture? How many bits?

Harvard; 8 bit.

b. Are the program counter and stack pointer registers part of the main register file? (Fig 2) No. The PC and SP are separate from the general-purpose registers in the ATmega128. In contrast, the PC and SP are in the main register file in the MSP430 and the Cortex-M3.

4) For the simplified 32 bit processor that we’ve been working with in class (on the following pages)

a. Use copies of the following page to trace the execution of (See instruction traces 4ai – 4aiii on the next page…)

i. back-to-back ADD3 instructions The ADD3 command does not use the MA or MD bus, so it can fetch the next instruction while it is executing the ADD.

ii. PUSH R2 To improve performance and make this instruction execute in only two cycles, one option would be to make the SP a counter, like the PC. If the SP could self-increment and self-decrement, the PUSH and the MOVE_SP could both be executed in the same cycle.

iii. A branch taken, and a branch not taken It would also be possible for the Branch instruction to fetch the next instruction in the same cycle that it evaluates the conditions for the branch. This would remove one cycle in the case of a branch not taken, but would not change the performance for a branch taken.



b. Which instructions require a field from the IR to pass through the ALU? Specify which bits, and if they need to be shifted, sign extended, etc.

LD2: #immed (4 bits), shifted left 2 bits
LD1: #immed (8 bits)
ST2: #immed (4 bits), shifted left 2 bits
ADD2: #immed (4 bits)
SUB2: #immed (4 bits)
BRx: offset (10 bits), shifted left 1 bit and sign extended

Note that the instructions LSL, LSR, ASR, and ROT will also require a field from the IR (the 4 bit “shift” value) to control the shifter on the input to the ALU.
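The shift and sign-extension steps listed for these fields can be sketched in Python. The field widths and shift amounts come from the answer; the positions of the fields inside the IR are not given here, so the functions take an already-extracted field value. Treating the LD2/ST2 immediate as unsigned is an assumption, and the function names are illustrative.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def brx_offset(field: int) -> int:
    """BRx: 10-bit offset, sign extended and shifted left 1 bit."""
    return sign_extend(field, 10) << 1

def ld2_st2_offset(field: int) -> int:
    """LD2/ST2: 4-bit immediate shifted left 2 bits (assumed unsigned)."""
    return (field & 0xF) << 2

print(brx_offset(0x3FF))     # all ones in the 10-bit field
print(ld2_st2_offset(0xF))   # largest 4-bit immediate
```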

c. What instructions would you need to use to load -243 into R2? To load an immediate value, we use the LD1 instruction. However, LD1 can only load a positive value, so we first load 243 into a register and then pass it through the ALU to negate it. Using only the instructions explicitly listed, we subtract from zero to negate:

LD1 R1, 243
LD1 R2, 0
SUB3 R2, R2, R1

We aren’t explicitly given an instruction to negate a register value, but negation is part of the basic functionality of an ALU. We can save one instruction by using a Negate command:

LD1 R2, 243
NEG R2, R2 (R2 := -R2)
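The subtract-from-zero sequence can be traced with a small Python model. This is only a sketch: it assumes SUB3 Rd, Ra, Rb computes Rd := Ra - Rb and that results wrap to 32 bits in two's complement; the helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF  # the processor in this problem is 32 bits wide

def sub3(a: int, b: int) -> int:
    """Model SUB3 Rd, Ra, Rb as Rd := Ra - Rb, wrapped to 32 bits (assumed)."""
    return (a - b) & MASK32

def to_signed32(x: int) -> int:
    """Read a 32-bit register value as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x

regs = {"R1": 0, "R2": 0}
regs["R1"] = 243                            # LD1 R1, 243
regs["R2"] = 0                              # LD1 R2, 0
regs["R2"] = sub3(regs["R2"], regs["R1"])   # SUB3 R2, R2, R1

print(hex(regs["R2"]), to_signed32(regs["R2"]))
```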

d. What changes do you need to make to the architecture to allow the process status word to be PUSHed and POPed? The process status word (PSW) is not defined in the given architecture. We assume that the PSW is separate from the general-purpose register file, similar to how the Instruction Register is shown. In order to be able to PUSH and POP the PSW to and from the stack, it needs to be connected to the memory data bus.

[Figure: PSW register connected to the memory data bus]



e. Assuming that the IR and the PSW are part of the control FSM (and hold most of its state) how many additional bits of state do you need in the control FSM? You would need enough bits to tell you which cycle of the instruction you were currently in. For example, the BRx instruction shown in part (a) has a “fetch” cycle and a “BR” cycle. These are different states in the FSM, but the IR and PSW may not be able to distinguish between the two states. The number of additional bits of state is set by the instruction with the maximum number of cycles. Also seen in part (a), PUSH R2 took 3 cycles: “fetch”, “Push”, and “Move_SP”. Thus we need two additional bits of state in order to distinguish which cycle we are in.
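The bit count above follows from the longest instruction: the counter must distinguish as many values as that instruction has cycles. A minimal sketch, with an illustrative function name:

```python
def cycle_counter_bits(max_cycles: int) -> int:
    """Bits needed to distinguish max_cycles different cycles of an instruction."""
    if max_cycles < 1:
        raise ValueError("an instruction takes at least one cycle")
    # The counter values run from 0 to max_cycles - 1.
    return (max_cycles - 1).bit_length()

# PUSH R2 takes 3 cycles in part (a): "fetch", "Push", and "Move_SP".
print(cycle_counter_bits(3))
```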

f. Draw a subset of the state transition (bubble) diagram which implements only ADD3, LD2, ST2, and BRx

Note that all states except ADD3 go back to the Fetch state after executing the instruction. ADD3 fetches the next instruction in the same cycle as the execution of the ADD, so it does not need to go back to Fetch before transitioning to the next instruction state. Note that most instructions look like the ADD3 instruction, in that they will also fetch the next instruction in the same cycle that they execute. For each of the branch states, the branch address is calculated regardless of whether the branch condition is met. As seen in the instruction trace in part (a), the REG_EN line is only set if the condition is met, but the new address is calculated either way. The REG_EN is controlled by the NCVZ outputs from the ALU (Negative, Carry-out, oVerflow, Zero).
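The transitions described in this note can be sketched as a next-state function. The bubble diagram itself is not reproduced here, so this is one reading of the text: each instruction is collapsed into a single execute state (the real diagram may use several branch states), and the state and function names are illustrative.

```python
from enum import Enum

class State(Enum):
    FETCH = "Fetch"
    ADD3 = "ADD3"
    LD2 = "LD2"
    ST2 = "ST2"
    BRX = "BRx"

# Decode table: opcode held in the IR -> execute state (assumed naming).
OPCODE_TO_STATE = {
    "ADD3": State.ADD3,
    "LD2": State.LD2,
    "ST2": State.ST2,
    "BRx": State.BRX,
}

def next_state(current: State, ir_opcode: str) -> State:
    """Return the next control state.

    ir_opcode is the opcode in the IR at the end of the current cycle.
    It only matters after Fetch and after ADD3, because those are the
    cycles that leave a newly fetched instruction in the IR.
    """
    if current in (State.FETCH, State.ADD3):
        # ADD3 fetched its successor while executing, so it dispatches
        # directly to the next instruction's state, just like Fetch does.
        return OPCODE_TO_STATE[ir_opcode]
    # LD2, ST2, and BRx use the memory buses (or change the PC),
    # so they return to Fetch.
    return State.FETCH

print(next_state(State.ADD3, "LD2"))
print(next_state(State.LD2, "ADD3"))
```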
