microprocessor

http://www.eastaughs.fsnet.co.uk/cpu/index.htm

The microprocessor is sometimes referred to as the 'brain' of the personal computer, and is responsible for processing the instructions that make up computer software. It houses the central processing unit, commonly referred to as the CPU, and as such is a crucially important part of the home PC. However, how many people really understand how the chip itself works? This tutorial aims to provide an introduction to the various parts of the microprocessor, and to teach the basics of the architecture and workings of the CPU across three sections:

CPU Structure: Using a simplified model of a central processing unit as an example, this section takes you through the role of each of the major constituent parts of the CPU. It also looks more closely at each part, examining how it is constructed and how it performs its role within the microprocessor.

Instruction Execution: Once you are familiar with the various elements of the processor, this section looks at how they work together to process and execute a program. It covers how the instructions that form the program are recognised, together with the processes and actions carried out during the instruction execution cycle itself.

Further Features: With the basics covered, this section explores advances in microprocessor architecture that have occurred in recent years. It explains techniques such as pipelining and hyperthreading, and looks at cache memory and trends in CPU architecture.

The first section of this tutorial relates to the structure of the central processing unit.

As there are a great many variations in architecture between the different kinds of CPU, we shall begin by looking at a simplified model of the structure. The model is shown in the accompanying diagram, and is a good basis on which to build your knowledge of the workings of a microprocessor. The simplified model consists of five parts, which are:

Arithmetic & Logic Unit (ALU): The part of the central processing unit that deals with operations such as integer addition, subtraction and multiplication, and Boolean (logical) operations. It receives control signals from the control unit telling it to carry out these operations.

Control Unit (CU): This controls the movement of instructions into and out of the processor, and also controls the operation of the ALU. It consists of a decoder, control logic circuits, and a clock to ensure everything happens at the correct time. It is also responsible for performing the instruction execution cycle.

Register Array: This is a small amount of internal memory used for the quick storage and retrieval of data and instructions. Most processors include some common registers used for specific functions, namely the program counter, instruction register, accumulator, memory address register and stack pointer.

System Bus: This comprises the control bus, data bus and address bus. It connects the processor, memory and peripherals, and carries data between them.

Memory: The memory is not actually part of the CPU itself, and is instead housed elsewhere on the motherboard. However, it is where the program being executed is stored, and as such it is a crucial part of the overall structure involved in program execution.

Each of these parts is described in more detail on the pages that follow, beginning with the arithmetic and logic unit.

The simplified model of the central processing unit.

The ALU, or arithmetic and logic unit, is the section of the processor that executes operations of an arithmetic or logical nature. It works in conjunction with the register array for many of these, in particular the accumulator and the flag register. The accumulator holds the results of operations, while the flag register contains a number of individual bits that store information about the last operation carried out by the ALU. More on these registers can be found in the register array section. The ALU can be viewed as comprising many subcomponents, each for a specific task that it is required to perform. Some of these tasks and their subcomponents are:

The simplified model of the central processing unit, with the arithmetic & logic unit highlighted in red.

Addition and subtraction: These two tasks are performed by constructs of logic gates, such as half adders and full adders. While they may be termed 'adders', they can also perform subtraction with the aid of inverters and two's complement arithmetic. The topic of logic gates is too expansive to be covered in full here, but many resources exist on the internet and elsewhere, so further reading in this area is recommended.

Multiplication and division: In most modern processors, integer multiplication and division are handled by dedicated hardware within the CPU, while floating-point arithmetic is handled by a separate floating-point unit. Earlier processors either relied on additional chips known as maths co-processors (for floating-point arithmetic) or performed these operations by other means, such as sequences of shifts and additions.

Logical tests: Further logic gates within the ALU perform a number of different logical tests, such as checking whether an operation produced a result of zero. Most of these tests change the values stored in the flag register, so that they may be checked later by separate instructions. Others produce a result which is stored and used later in further processing.

Comparison: Comparison operations determine whether one number is greater than, less than or equal to another. They can be performed by subtracting one number from the other, and so can be handled by the adder circuits described above. However, the result of the subtraction does not need to be stored, because the amount by which the values differ is not required. Instead, the appropriate status flags in the flag register are set, and are checked to determine the result of the comparison.
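The adder, subtraction and comparison behaviour described above can be sketched in software. This is a minimal Python model, not how any particular CPU is wired: it assumes an 8-bit word (the WIDTH constant is an illustrative choice), builds a ripple-carry adder from one-bit full adders, subtracts by inverting the second operand and adding with a carry-in of 1 (two's complement), and performs a comparison that keeps only the flags. The flag names Z, N and C and the meaning given to the carry flag are assumptions for this sketch; real processors differ in their conventions.

```python
WIDTH = 8                    # assumed word size for this sketch
MASK = (1 << WIDTH) - 1      # keeps results within WIDTH bits


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder expressed with XOR, AND and OR gates."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_add(x: int, y: int, carry_in: int = 0) -> tuple[int, int]:
    """Chain WIDTH full adders, passing each carry on to the next bit."""
    result, carry = 0, carry_in
    for i in range(WIDTH):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry     # carry is the carry out of the top bit


def subtract(x: int, y: int) -> tuple[int, int]:
    """x - y using two's complement: invert y, then add with carry-in 1."""
    inverted = ~y & MASK     # the 'inverters' in front of the adder
    return ripple_add(x, inverted, carry_in=1)


def compare(x: int, y: int) -> dict[str, int]:
    """Subtract to set the flags, then discard the difference itself."""
    difference, carry = subtract(x, y)
    return {
        "Z": int(difference == 0),             # zero flag: x == y
        "N": (difference >> (WIDTH - 1)) & 1,  # top (sign) bit of the difference
        "C": carry,                            # in this sketch 1 means no borrow: x >= y (unsigned)
    }


print(ripple_add(100, 27))   # (127, 0)
print(subtract(3, 5))        # (254, 0): 254 is -2 in 8-bit two's complement
print(compare(5, 5))         # {'Z': 1, 'N': 0, 'C': 1}
```

Note that compare never returns the difference: as the text says, only the flags are kept, and a later instruction would inspect them.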

Bit shifting: Shifting operations move bits left or right within a word, with different operations filling the resulting gaps in different ways. This is accomplished using a shift register, which uses pulses from the clock within the control unit to move each bit along to its neighbouring position across the word. Again, this is quite a complicated logical procedure, and further reading may aid your understanding.

The control unit is arguably the most complicated part of this model CPU, and is responsible for controlling much of the operation of the rest of the processor. It does this by issuing control signals to the other areas of the processor, instructing them on what should be performed next.

The simplified model of the central processing unit, with the control unit highlighted in red.

Similarly to the arithmetic and logic unit, the control unit can be broken down further for easier understanding. Its three main elements are as follows:

Decoder: This is used to decode the instructions that make up a program as they are being processed, and to determine what actions must be taken in order to process them. These decisions are normally made by looking at the opcode of the instruction, together with the addressing mode used. This is covered in greater detail in the instruction execution section of this tutorial.

Timer or clock: The timer or clock ensures that all processes and instructions are carried out and completed at the right time. Pulses are sent to the other areas of the CPU at regular intervals (related to the processor clock speed), and actions only occur when a pulse is detected. This ensures that the actions themselves also occur at these same regular intervals, meaning that the operations of the CPU are synchronised.

Control logic circuits: The control logic circuits create the control signals themselves, which are then sent around the processor. These signals inform the arithmetic and logic unit and the register array what actions and steps they should be performing, what data they should use to perform those actions, and what should be done with the results.

Further detail on the control unit is not required at this stage, though there is clearly much detail at lower levels that has yet to be touched on. The next element of the processor to be covered is the register array.

A register is a memory location within the CPU itself, designed to be accessed quickly for fast data retrieval. Processors normally contain a register array, which houses many such registers. These contain instructions, data and other values that may need to be accessed quickly during the execution of a program. Many types of register are common to most microprocessor designs. These are:

The simplified model of the central processing unit, with the register array highlighted in red.

Program Counter (PC): This register holds the memory address of the next instruction to be executed in a program. This ensures that the CPU knows at all times where it has reached, that it can continue from the correct point once an instruction has been executed, and that the program is executed correctly.

Instruction Register (IR): This holds the current instruction in the processor while it is being decoded and executed, in order to reduce the time taken by the whole execution process. This is because accessing the instruction register takes much less time than repeatedly reading the memory location itself.

Accumulator (A, or ACC): The accumulator holds the results of operations performed by the arithmetic and logic unit, as covered in the section on the ALU.

Memory Address Register (MAR): Used to store memory addresses, usually the addresses involved in the instructions held in the instruction register. The control unit checks this register when it needs to know which memory address to read data from or write data to.

Memory Buffer Register (MBR): When an instruction or data is obtained from the memory or elsewhere, it is first placed in the memory buffer register. The next action to take is then determined and carried out, and the data is moved on to the desired location.

Flag register / status flags: The flag register is specially designed to contain all the appropriate 1-bit status flags, which are changed as a result of operations involving the arithmetic and logic unit. Further information can be found in the section on the ALU.

Other general purpose registers: These registers have no specific purpose, but are generally used for the quick storage of pieces of data that are required later in program execution. In the model used here these are named A and B, with the suffixes L and U indicating the lower and upper sections of each register respectively.

The final main area of the model microprocessor used in this tutorial is the system bus. The system bus is a set of wires which carries data communication between the major components of the computer, including the microprocessor. Not all of the communication that uses the bus involves the CPU, although naturally the examples used in this tutorial will centre on such instances.

The simplified model of the central processing unit, with the system bus highlighted in red.

The system bus consists of three different groups of wiring, called the data bus, control bus and address bus. These all have separate responsibilities and characteristics, which can be outlined as follows:

Control Bus: The control bus carries the signals relating to the control and co-ordination of the various activities across the computer, which can be sent from the control unit within the CPU. Different architectures result in differing numbers of lines within the control bus, as each line is used to perform a specific task. For instance, separate, specific lines are used for each of read, write and reset requests.

Data Bus: This is used for the exchange of data between the processor, memory and peripherals, and is bi-directional, allowing data to flow in both directions along the wires. Again, the number of wires in the data bus (sometimes known as its 'width') can differ. Each wire carries the signal for a single bit of binary data, so a greater width allows more data to be transferred at the same time.

Address Bus: The address bus contains the connections between the microprocessor and memory that carry the addresses the CPU is working with at that time, such as the locations it is reading from or writing to. The width of the address bus determines the maximum addressing capacity, that is, how many distinct memory locations the bus can select. Addresses are transferred in binary, with each line of the address bus carrying a single binary digit. Therefore the number of addressable locations is two to the power of the number of lines (2^lines), and the largest address is 2^lines - 1, since addresses start at zero.
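The relationship between bus width and capacity is simple arithmetic, sketched here in Python. The function names are illustrative, and the example widths (16 address lines, 32- and 64-line data buses) are chosen only for demonstration; 16 happens to match the size of the address field in this tutorial's model instruction format.

```python
def addressable_locations(address_lines: int) -> int:
    """Each line carries one bit, so n lines can select one of 2**n locations."""
    return 2 ** address_lines


def highest_address(address_lines: int) -> int:
    """Addresses start at 0, so the largest is one less than the count."""
    return addressable_locations(address_lines) - 1


def transfers_needed(num_bytes: int, data_lines: int) -> int:
    """Bus transfers needed to move num_bytes when each data line carries one bit."""
    total_bits = num_bytes * 8
    return (total_bits + data_lines - 1) // data_lines   # integer ceiling division


print(addressable_locations(16))   # 65536
print(highest_address(16))         # 65535
print(transfers_needed(8, 32))     # 2
print(transfers_needed(8, 64))     # 1
```

By the same rule, 32 address lines give 4,294,967,296 locations, which with one byte per location is the familiar 4 GB limit.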

This concludes the look at the simplified model processor that will be used for the remainder of this tutorial. The next section will look at the instruction execution process, and how these different parts work together to execute programs. Following on from looking at the structure and architecture of the central processing unit itself, we shall now look at how the CPU is used to execute programs and make the computer as a whole run smoothly and efficiently. To do this, we must take a step back from concentrating solely on the processor, and look at the complete computer unit.

A flow diagram illustrating the flow of data within the PC during program execution and the saving of data. Further explanation can be found below.

When software is installed onto a modern-day personal computer (most commonly from a CD-ROM, though other media or downloading from the internet are also common), the code comprising the program and any associated files is stored on the hard drive. This code consists of a series of instructions for performing designated tasks, and data associated with these instructions. The code remains there until the user chooses to execute the program in question, at which point sections of the code are loaded into the computer's memory. The CPU then executes the program from memory, processing each instruction in turn. Of course, in order to execute the instructions, the CPU must understand what each instruction is telling it to do. Therefore, the processor must be designed to recognise the instructions it could encounter. The instructions that can be recognised by a processor are referred to as an 'instruction set', and are described in greater detail later in this tutorial. Once an instruction has been recognised, and the actions that should be carried out have been decided upon, those actions are performed before the CPU proceeds to the next instruction in memory. This process is called the 'instruction execution cycle', and is also covered later in this tutorial. Results can then be stored back in memory, later saved to the hard drive, and possibly backed up onto removable media or in separate locations. This is the same flow of information as when a program is executed, only in reverse, as illustrated in the diagram above.

As outlined in the introduction to this section, for a processor to be able to process an instruction, it needs to be able to determine what the instruction is asking to be carried out. For this to occur, the CPU needs to know what actions it may be asked to perform, and have pre-determined methods available to carry out these actions. This is the reasoning behind the 'instruction set'. When a processor is executing a program, the program is in a machine language. However, programmers almost never write their programs directly in this form. Whatever form a program was originally written in, it is translated into machine language at some point before execution so that it can be understood by the CPU. Machine language can be interpreted directly by the hardware itself, and can easily be encoded as a string of binary bits and sent via electrical signals. The instruction set is a collection of pre-defined machine codes, which the CPU is designed to expect and to act upon when detected. Different processors have different instruction sets, reflecting differences in their features, in ease of programming, and in the underlying architecture of the processor itself. Each machine code of an instruction set consists of two separate fields: the opcode and the operand(s).

The opcode is a short code which indicates what operation is to be performed. Each operation has a unique opcode. The operand, or operands, indicate where the data required for the operation can be found and how it can be accessed (the addressing mode, which is discussed in full later). The length of a machine code can vary; common lengths range from one to twelve bytes. The exact format of the machine codes is again CPU-dependent. For the purpose of this tutorial, we will presume we are using a 24-bit CPU. This means that the minimum length of the machine codes used here is 24 binary bits, which in this instance are split as shown in the table below:

Field        Bits              Purpose
Opcode       6 bits (18-23)    Allows for 64 unique opcodes (2^6)
Operand(s)   18 bits (0-17)    16 bits (0-15) for address values;
                               2 bits (16-17) for specifying the addressing mode to be used

Opcodes are also given mnemonics (short names) so that they can be easily referred to in code listings and similar documentation. For example, an instruction to store the contents of the accumulator in a given memory address could be given the binary opcode 000001, which may then be referred to using the mnemonic STA (short for STore Accumulator). Such mnemonics will be used for the examples on upcoming pages.

Now that we know what form the data is in when it is read by the CPU, it is necessary to learn about the cycle by which the instructions of a program are executed.

Once a program is in memory it has to be executed. To do this, each instruction must be looked at, decoded and acted upon in turn until the program is completed. This is achieved by the use of what is termed the 'instruction execution cycle', which is the cycle by which each instruction in turn is processed. However, to ensure that the execution proceeds smoothly, it is also necessary to synchronise the activities of the processor.

Diagram showing the basics of the instruction execution cycle. Each instruction is fetched from memory, decoded, and then executed.

To keep events synchronised, the clock located within the CPU control unit is used. This produces regular pulses on the system bus at a specific frequency, so that each pulse follows the last after an equal interval of time. This pulse frequency is linked to the clock speed of the processor: the higher the clock speed, the shorter the time between pulses. Actions only occur when a pulse is detected, so that commands can be kept in time with each other across the whole computer unit.

The instruction execution cycle can be clearly divided into three different parts, which will now be looked at in more detail.

Fetch Cycle: The fetch cycle reads the instruction stored at the address held in the program counter, copies it from memory into the instruction register, and increments the program counter so that it points to the next instruction.

Decode Cycle: Here, the control unit checks the instruction that is now stored within the instruction register. It determines which opcode and addressing mode have been used, and as such what actions need to be carried out in order to execute the instruction in question.

Execute Cycle: The actual actions which occur during the execute cycle of an instruction depend on both the instruction itself and the addressing mode specified for accessing any data that may be required. However, four main groups of actions do exist, which are discussed in full later on.

Once the instruction has been fetched and stored in the instruction register, the next step is to decode it, in order to work out what actions should be performed to execute it. This involves examining the opcode to see which of the machine codes in the CPU's instruction set it corresponds to, and also checking which addressing mode needs to be used to obtain any required data. Therefore, using the CPU model from this tutorial, bits 16 to 23 should be examined. Once the opcode is known, the execute cycle can occur. Different actions need to be carried out depending on the opcode, with no two opcodes requiring the same actions. However, there are generally four groups of actions that can occur: transfer of data between the CPU and memory; transfer of data between the CPU and an input or output device; processing of data, possibly involving the arithmetic and logic unit; and a control operation, which changes the sequence of subsequent operations. Control operations can be conditional, based on the values stored at that point within the flag register.
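The fetch and decode steps for this tutorial's 24-bit model can be sketched in Python. The register names follow the register array section (PC, MAR, MBR, IR), and the field positions follow the instruction format table (opcode in bits 18-23, addressing mode in bits 16-17, address in bits 0-15). Everything else is an assumption for illustration: memory is modelled as a Python list with one 24-bit word per address, and the Registers class and the encode, fetch and decode functions are hypothetical helpers, not part of any real CPU or library.

```python
from dataclasses import dataclass

# Field layout from the tutorial's 24-bit instruction format.
OPCODE_SHIFT, OPCODE_BITS = 18, 6     # bits 18-23
MODE_SHIFT, MODE_BITS = 16, 2         # bits 16-17
ADDRESS_SHIFT, ADDRESS_BITS = 0, 16   # bits 0-15


@dataclass
class Registers:
    """Hypothetical subset of the model CPU's register array."""
    pc: int = 0    # program counter: address of the next instruction
    mar: int = 0   # memory address register
    mbr: int = 0   # memory buffer register
    ir: int = 0    # instruction register


def field(word: int, shift: int, bits: int) -> int:
    """Extract `bits` bits of `word`, starting at bit position `shift`."""
    return (word >> shift) & ((1 << bits) - 1)


def encode(opcode: int, mode: int, address: int) -> int:
    """Pack the three fields into one 24-bit instruction word."""
    if not (0 <= opcode < 64 and 0 <= mode < 4 and 0 <= address < 65536):
        raise ValueError("field out of range")
    return (opcode << OPCODE_SHIFT) | (mode << MODE_SHIFT) | address


def fetch(regs: Registers, memory: list[int]) -> None:
    """Fetch cycle: copy the word addressed by PC into IR and advance PC."""
    regs.mar = regs.pc            # address of the instruction to fetch
    regs.mbr = memory[regs.mar]   # word arrives from memory
    regs.pc += 1                  # point at the following instruction
    regs.ir = regs.mbr            # hold the instruction for decoding


def decode(ir: int) -> tuple[int, int, int]:
    """Decode cycle: split IR into opcode, addressing mode and address."""
    return (field(ir, OPCODE_SHIFT, OPCODE_BITS),
            field(ir, MODE_SHIFT, MODE_BITS),
            field(ir, ADDRESS_SHIFT, ADDRESS_BITS))


# Opcode 000001 is the STA example from the text; address 0x00FF is arbitrary.
# Mode 0 is also arbitrary: the tutorial does not say which 2-bit pattern
# selects which addressing mode.
memory = [encode(0b000001, 0, 0x00FF)]
regs = Registers()
fetch(regs, memory)
print(regs.pc, decode(regs.ir))   # 1 (1, 0, 255)
print(regs.ir >> 16)              # bits 16-23 (opcode and mode together): 4
```

The last line shows why the decoder only needs bits 16 to 23: shifting away the 16 address bits leaves the opcode and addressing mode together.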

For simplicity, and as describing all the possible instructions is unnecessary, the following tutorial pages only look at a few possible instructions. These are:

Mnemonic   Description
MOV        Moves a data value from one location to another
ADD        Adds two data values using the ALU, and returns the result to the accumulator
STO        Stores the contents of the accumulator in the specified location
END        Marks the end of the program in memory

The four instructions used in the examples for the remainder of this section of the tutorial

The following three pages of the tutorial look at the first two of these instructions, and how they are executed in each of the three main addressing modes. These addressing modes are:

Immediate addressing: With immediate addressing, no lookup of data is actually required. The data is located within the operands of the instruction itself, not in a separate memory location. This is the quickest of the addressing modes to execute, but the least flexible, since the value is fixed within the instruction and so can only be used for constant data.

Direct addressing: For direct addressing, the operands of the instruction contain the memory address where the data required for execution is stored. For the instruction to be processed, the required data must first be fetched from that location.

Indirect addressing: When using indirect addressing, the operands give a location in memory, as with direct addressing. However, rather than holding the data, this location holds another memory address, at which the data is actually located. This is the most flexible of the modes, but also the slowest, as two memory lookups are required.

Looking at each mode in turn, the first is immediate addressing. When writing out code in mnemonic form, operands that require this mode are marked with a # symbol. With immediate addressing, the data required for execution of the instruction is located directly within the operands of the instruction itself, so no lookup of data from memory is required.

The second mode is direct addressing. When writing out code in mnemonic form, no symbol is required to mark operands which use this mode. Direct addressing means that the operands of the instruction hold the address of the location in memory where the required data can be found. The data is then fetched from this location in order to allow the instruction to be executed.

The final mode is indirect addressing. When writing out code in mnemonic form, operands that require this mode are marked with an @ symbol.
Indirect addressing means that the memory address given in the operands of the instruction is not the location of the actual data required. Instead, that address holds a further address, at which the data is stored. This can prove useful if decisions need to be made during execution, as the address held at that location can be changed while the program runs, without altering the instruction itself.

Now that we have covered all the stages of the instruction execution process, and also the three main addressing modes that are used, we are able to examine the full execution of simple programs.
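The difference between the three addressing modes can be seen in a short Python sketch that resolves an operand to its data value and counts the memory lookups needed. The #, @ and unmarked notations come from the text; the parse_operand and resolve helpers, and the memory contents (modelled as a dictionary from address to value), are hypothetical.

```python
def parse_operand(text: str) -> tuple[str, int]:
    """Read the mnemonic notation: '#' immediate, '@' indirect, no mark direct."""
    if text.startswith("#"):
        return "immediate", int(text[1:])
    if text.startswith("@"):
        return "indirect", int(text[1:])
    return "direct", int(text)


def resolve(text: str, memory: dict[int, int]) -> tuple[int, int]:
    """Return (data value, number of memory lookups) for an operand."""
    mode, operand = parse_operand(text)
    if mode == "immediate":
        return operand, 0            # the data is inside the instruction
    if mode == "direct":
        return memory[operand], 1    # the operand is the data's address
    pointer = memory[operand]        # indirect: first lookup yields an address
    return memory[pointer], 2        # second lookup yields the data


memory = {10: 42, 20: 10, 30: 7}
print(resolve("#10", memory))   # (10, 0)
print(resolve("10", memory))    # (42, 1)
print(resolve("@20", memory))   # (42, 2)

memory[20] = 30                 # change the stored address, not the instruction
print(resolve("@20", memory))   # (7, 2)
```

The final call illustrates why indirect addressing helps when decisions are made during execution: the same operand, @20, reaches different data once the address stored at location 20 changes.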

In the previous two sections the basics of the workings and architecture of the central processing unit have been explained. There has been a general look at a simple processor architecture, an explanation of the method by which instructions are executed, and of how the various addressing modes affect how the CPU processes instructions. However, modern CPUs are very rarely as simple as the ones that have been discussed thus far. While the information covered up to this point is still applicable and relevant to the majority of microprocessors, many refinements to the workings and architecture have also been implemented. In this final section of the tutorial there will be a brief look at three main areas where these refinements have occurred:

Pipelining: This is a method by which the processor can be involved in the execution of more than a single instruction at one time. This enables the program to be executed more quickly, but is not without complications and problems, which have to be overcome by careful design.

CISC and RISC architectures: Over the course of the development of the modern-day processor, two competing architectures have emerged. CISC and RISC have several major differences in features and ideas, but both were designed with the intention of improving CPU performance. Current processors tend not to adhere strictly to either architecture, instead being a mix of the two ideals.

Modern architectures: Outside of pipelining, RISC and CISC, many other improvements to the general architecture of the microprocessor have been developed, in areas such as cache memory and specialised instruction set extensions. New advancements are added with each new generation of processors.

The first of these areas to be covered is pipelining.
Up until this point in the tutorial we have assumed that the processor is only able to process one instruction at a time. All examples have shown an instruction having to be executed in full before the next one can be started. However, this is not how modern CPUs work. Pipelining is the name given to the process by which the processor can be working on more than one instruction at once. The simplest way to approach pipelining is to consider the three-stage fetch, decode and execute cycle outlined earlier. During each of these subcycles there are times when main memory is not being accessed, and the CPU could be considered 'idle'. The idea, therefore, is to begin the fetch stage for a second instruction while the first instruction is being decoded. Then, while instruction one is being executed and instruction two is being decoded, a third instruction can be fetched.
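The gain from this simple form of pipelining can be estimated with a short Python sketch. It assumes an idealised three-stage pipeline in which every stage takes exactly one time cycle and no instruction ever has to wait for another; the function names are hypothetical. Under those assumptions, a non-pipelined CPU finishes one instruction every three cycles, while a pipelined one finishes its first instruction after three cycles and one more on every cycle after that.

```python
STAGES = ("F", "D", "E")   # fetch, decode, execute; one cycle each (assumed)


def completed_without_pipelining(cycles: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return cycles // len(STAGES)


def completed_with_pipelining(cycles: int) -> int:
    """The first instruction finishes after len(STAGES) cycles, then one per cycle."""
    return max(0, cycles - len(STAGES) + 1)


def timeline(cycles: int) -> list[str]:
    """One row per instruction; instruction n is fetched in cycle n."""
    rows = []
    for n in range(cycles):
        cells = []
        for c in range(cycles):
            stage = c - n          # how far instruction n has progressed by cycle c
            cells.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(f"I{n + 1}: " + " ".join(cells))
    return rows


print(completed_without_pipelining(9))   # 3
print(completed_with_pipelining(9))      # 7
for row in timeline(9):
    print(row)
```

In the printed timeline, rows I8 and I9 are the two instructions that have been started but not finished by the ninth cycle.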

Across nine time cycles, and assuming each stage takes one cycle, the non-pipelined method completely executes three instructions. With pipelining, seven instructions are executed in full, and another two are started. However, pipelining is not without problems, and does not necessarily work as well as this.

While pipelining can greatly cut the time taken to execute a program, there are problems that prevent it from working as well as it perhaps should. The three stages of the instruction execution process do not necessarily take an equal amount of time, with 'execute' generally taking longer than 'fetch'. This makes it much harder to synchronise the various stages of the different instructions. Also, some instructions may depend on the results of earlier instructions. This can arise when data produced earlier needs to be used, or when a conditional branch based on a previous outcome is used. One of the simplest ways to reduce the effects of these problems is to break the instruction execution cycle into stages that are more likely to be of equal duration. For example, the diagram below shows how the cycle can be broken down into six stages rather than three:

Diagram showing the differences between the common three-stage model of the instruction execution cycle and the six-stage model used in more advanced pipelining.
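Why balanced stages help can be seen with a little arithmetic. In a synchronous pipeline every stage must complete within one clock period, so the slowest stage sets the pace for all of them. The sketch below assumes made-up stage latencies in arbitrary time units, with 'execute' slower than 'fetch' as described above, and compares three unbalanced stages against the same total work split into six equal stages. Only 'fetch instruction', 'fetch operands' and 'write operand' are named in this tutorial; the other three stage names are an assumption based on a common textbook version of the six-stage model.

```python
# Stage latencies are hypothetical, in arbitrary time units.

def pipeline_time(latencies: list[int], n_instructions: int) -> int:
    """Time for n instructions through an ideal, stall-free pipeline.

    Every stage must fit in one clock period, so the slowest stage sets the
    period. The first instruction takes len(latencies) periods to emerge and
    each later instruction finishes one period after the previous one.
    """
    period = max(latencies)
    return (len(latencies) + n_instructions - 1) * period


def sequential_time(latencies: list[int], n_instructions: int) -> int:
    """Time if each instruction passes through every stage before the next starts."""
    return sum(latencies) * n_instructions


# Three unbalanced stages: 'execute' takes longer than 'fetch'.
THREE_STAGE = {"fetch": 2, "decode": 1, "execute": 3}

# The same six units of work split into six equal stages. 'fetch instruction',
# 'fetch operands' and 'write operand' are named in the text; the other three
# names are assumed.
SIX_STAGE = {
    "fetch instruction": 1,
    "decode instruction": 1,
    "calculate operands": 1,
    "fetch operands": 1,
    "execute instruction": 1,
    "write operand": 1,
}

if __name__ == "__main__":
    n = 100
    print(sequential_time(list(THREE_STAGE.values()), n))  # 600
    print(pipeline_time(list(THREE_STAGE.values()), n))    # 306
    print(pipeline_time(list(SIX_STAGE.values()), n))      # 105
```

This ignores the small delay added by the registers between stages, as well as any stalls caused by dependencies, so real gains would be smaller than these figures suggest.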

However, while this may solve some of the problems outlined above, it creates further problems of its own. Firstly, an instruction will not always use all six of these stages. Simple load instructions, for example, do not need the final 'write operand' stage, which could upset the synchronisation. There is also the matter of potential conflicts within the memory system, as three of the above stages (fetch instruction, fetch operands and write operand) require access to memory. Many memory systems would not allow three separate instructions to access memory at once, so pipelining would not be as beneficial as it might first seem. On top of this, the problems of conditional branching and result-dependent instructions still occur. The processor therefore needs to be carefully designed to cope with these potential interruptions to the flow of data.

As you can tell, there are many issues to take into consideration with the technique of pipelining. While it is a powerful technique for increasing CPU performance, it requires careful design to achieve the best possible results.
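The memory conflict can be illustrated by counting how many memory accesses an ideal six-stage pipeline would request in each cycle. This sketch assumes a new instruction enters every cycle and nothing stalls; 'FI', 'FO' and 'WO' stand for the three memory-accessing stages named above, while 'DI', 'CO' and 'EI' are assumed names for the remaining stages. The optional `no_write` argument marks instructions, such as simple loads, that skip the write operand stage.

```python
# Ideal six-stage pipeline: one new instruction per cycle, no stalls.
# FI, FO and WO are the memory-accessing stages named in the text;
# DI, CO and EI are assumed names for the other three stages.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]
MEMORY_STAGES = {"FI", "FO", "WO"}


def memory_requests(n_instructions: int, no_write: frozenset[int] = frozenset()) -> list[int]:
    """Return how many memory accesses are requested in each cycle.

    Instruction i (0-based) occupies stage s during cycle i + s.
    Instructions whose index is in `no_write` (e.g. simple loads)
    skip the memory access in the WO stage.
    """
    counts = [0] * (n_instructions + len(STAGES) - 1)
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            if stage not in MEMORY_STAGES:
                continue
            if stage == "WO" and i in no_write:
                continue
            counts[i + s] += 1
    return counts


if __name__ == "__main__":
    counts = memory_requests(6)
    print(counts)  # [1, 1, 1, 2, 2, 3, 2, 2, 2, 1, 1]
    conflicts = sum(1 for c in counts if c > 1)
    print(f"cycles with more than one memory access: {conflicts}")  # 6
```

Every cycle with a count above one would be a conflict for a memory system that can serve only one access at a time, and such a system would have to stall the pipeline in those cycles. Passing, say, `no_write=frozenset({0, 2, 4})` shows how instructions that skip a stage leave irregular gaps in the pattern.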

Next, we shall move on to look at the competing architectures of CISC and RISC.

Years of development have gone into improving the architecture of the central processing unit, with the main aim of improving performance. Two competing architectures were developed for this purpose, and different processors conformed to each one. Both had their strengths and weaknesses, and as such also had supporters and detractors.

CISC: Complex Instruction Set Computers

Earlier developments were based around the idea that making the CPU more complex and supporting a larger number of potential instructions would lead to increased performance. This idea is at the root of CISC processors, such as the Intel x86 range, which have very large instruction sets reaching up to and above three hundred separate instructions. They also have increased complexity in other areas, with many more specialised addressing modes and registers, and instruction codes of variable length. Performance was improved here by allowing program compilers to be simplified: because more advanced instructions were available, fewer refinements had to be made during compilation. However, the resulting complexity of the processor hardware and architecture can make such chips difficult to understand and program for, and can also make them expensive to produce.

RISC: Reduced Instruction Set Computers

In opposition to CISC, the mid-1980s saw the beginnings of the RISC philosophy. The idea here was that the best way to improve performance would be to simplify the processor workings as much as possible. RISC processors, such as the IBM PowerPC processor, have a greatly simplified and reduced instruction set, numbering in the region of one hundred instructions or fewer.
Addressing modes are simplified to four or fewer, and the instruction length is fixed in order to allow standardisation across the instruction set. Changing the architecture to this extent means that fewer transistors are used to produce the processors, which makes RISC chips much cheaper to produce than their CISC counterparts. The reduced instruction set also means that the processor can execute instructions more quickly, potentially allowing for greater speeds. However, allowing only such simple instructions places a greater burden upon the software itself: fewer instructions in the instruction set means a greater emphasis on writing efficient software with the instructions that are available. Supporters of the CISC architecture point out that their processors offer good enough performance and cost to make such efforts not worth the trouble.

Feature                 CISC                  RISC
Instruction set         Large (100 to 300)    Small (100 or less)
Addressing modes        Complex (8 to 20)     Simple (4 or less)
Instruction format      Specialised           Simple
Code lengths            Variable              Fixed
Execution cycles        Variable              Standard for most
Cost / CPU complexity   Higher                Lower
Simplifies              Compilation           Processor design
Complicates             Processor design      Software

Summary of the main differences between the two competing architectures
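One reason fixed-length instruction codes suit RISC designs is that the processor can find where each instruction starts without examining the ones before it. The toy example below (not a real instruction set) assumes a hypothetical variable-length format in which the first byte of each instruction holds its total length, and a fixed format in which every instruction is four bytes long.

```python
# A toy encoding, not a real instruction set.
FIXED_WIDTH = 4  # bytes per instruction in the hypothetical fixed-length format


def fixed_start(n: int) -> int:
    """Offset of instruction n in the fixed format: one multiplication, no scanning."""
    return n * FIXED_WIDTH


def variable_start(code: bytes, n: int) -> int:
    """Offset of instruction n in the variable format.

    Each instruction's first byte holds its length, so every earlier
    instruction has to be examined before instruction n can be found.
    """
    offset = 0
    for _ in range(n):
        offset += code[offset]
    return offset


def split_variable(code: bytes) -> list[bytes]:
    """Break a variable-length byte stream into its individual instructions."""
    instructions, offset = [], 0
    while offset < len(code):
        length = code[offset]
        if length == 0:
            raise ValueError(f"zero-length instruction at offset {offset}")
        instructions.append(code[offset:offset + length])
        offset += length
    return instructions


if __name__ == "__main__":
    code = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 5, 1, 2, 3, 4])
    print([len(i) for i in split_variable(code)])  # [1, 3, 2, 5]
    print(variable_start(code, 3))                 # 6
    print(fixed_start(3))                          # 12
```

This is a simplification: real variable-length decoders use extra hardware to find instruction boundaries, which is part of the added complexity described above.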
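The extra burden that a reduced instruction set places on software can be seen by writing the same task both ways. This sketch runs on an imaginary machine with made-up mnemonics (not real x86 or PowerPC code): a CISC-style instruction adds one memory location to another directly, while the RISC-style version must load both values into registers, add them and store the result.

```python
# Hypothetical mnemonics on an imaginary machine, for illustration only.

def run(program, memory):
    """Execute a list of (opcode, *operands) tuples on a tiny machine."""
    regs = {}
    for op, *args in program:
        if op == "ADDM":     # CISC-style: memory[a] += memory[b] in one instruction
            a, b = args
            memory[a] = memory[a] + memory[b]
        elif op == "LOAD":   # register <- memory
            r, a = args
            regs[r] = memory[a]
        elif op == "ADD":    # register <- register + register
            d, x, y = args
            regs[d] = regs[x] + regs[y]
        elif op == "STORE":  # memory <- register
            r, a = args
            memory[a] = regs[r]
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


cisc_program = [("ADDM", "x", "y")]
risc_program = [
    ("LOAD", "r1", "x"),
    ("LOAD", "r2", "y"),
    ("ADD", "r1", "r1", "r2"),
    ("STORE", "r1", "x"),
]

if __name__ == "__main__":
    print(run(cisc_program, {"x": 5, "y": 7}))   # {'x': 12, 'y': 7}
    print(run(risc_program, {"x": 5, "y": 7}))   # {'x': 12, 'y': 7}
    print(len(cisc_program), len(risc_program))  # 1 4
```

The RISC program is four times as long, and generating such sequences is the compiler's job; in exchange, each of its instructions is simple enough to be decoded and executed quickly.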

Looking at the most modern processors, it becomes evident that the rivalry between CISC and RISC is no longer of great importance. This is because the two architectures are converging, with CPUs from each side incorporating ideas from the other. CISC processors now use many of the same techniques as RISC ones, while the reduced instruction sets of RISC processors contain similar numbers of instructions to those found in certain CISC chips. However, it is still important to understand the ideas behind these two differing architectures, and why each design path was chosen.

Beyond the topics already covered in this section, there are many other ways in which microprocessor manufacturers have attempted to improve the performance of their CPUs. Most of these are too advanced to be discussed in detail within this tutorial, but three of the most common are introduced briefly below.

The Pentium 4 is the first of Intel's desktop processors to make use of the new hyperthreading technology.

Cache memory

Cache memory is a small amount of high-speed memory used as a fast and effective store for frequently used instructions and data. Most programs end up accessing the same data and instructions over and over again at some point in their execution. Keeping these in higher-speed storage such as a cache greatly reduces processing time compared with repeatedly fetching them from slower main memory. Home computer processors have traditionally built the cache directly into their architecture, in what is known as a 'Level 1' cache. The most modern CPUs also make use of external caches, referred to as 'Level 2' caches, which are much larger than 'Level 1' caches. More recent processors have larger caches: for instance, the Intel 486 had a cache of only eight kilobytes, while the Pentium II used multiple stores totalling up to two megabytes of storage space.

Specialised instruction set extensions

The best-known extensions to the traditional CPU instruction set are Intel's MMX and AMD's 3DNow! technology. Both come into use when the processor is asked to perform operations involving graphics, audio and video, and consist of a number of specific instructions that are specialised to perform the short repetitive tasks that make up the large majority of multimedia processing. These extensions use SIMD (Single Instruction, Multiple Data) instructions to greatly reduce the time taken, as such instructions perform their operations on multiple pieces of data at the same time. MMX makes use of fifty-seven SIMD instructions, while the SSE2 extensions introduced with the Pentium 4 add a further one hundred and forty-four. These include instructions intended to improve internet-related operations, such as the streaming of music and video files. The improved 3DNow! technology found in the AMD Athlon processor also contains SIMD instructions for this purpose. Such extensions ultimately enhance the performance of the processor in activities relating to gaming, multimedia applications, and use of the internet and other forms of communication.

Hyperthreading

Hyperthreading is a new technology, introduced by Intel with their most recent Pentium 4 processors. It works by using what is known as 'simultaneous multithreading' to make the single processor appear to the operating system as multiple logical processors. Through shared hardware resources, this enables the CPU to execute multiple separate parts of a program (or 'threads') at the same time. This technology does not provide the same performance increase as actual separate processors would, but it provides a considerable boost for less cost and power consumption than multiple processors would require. Current processors such as the Pentium 4 split the CPU into two logical processors. Intel is currently working on further advancements that will allow higher numbers of threads to be executed simultaneously.
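The benefit of the cache memory described above can be sketched with a small simulation. This example assumes a fully associative cache that evicts the least recently used address (LRU), and hypothetical costs of 1 time unit for a cache hit and 10 for a main memory access; real caches are organised differently and use various replacement policies.

```python
from collections import OrderedDict

# Hypothetical costs in arbitrary time units.
CACHE_TIME = 1
MEMORY_TIME = 10


class LRUCache:
    """A tiny fully associative cache that evicts the least recently used address."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()  # cached addresses, least recently used first
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Record an access; return True on a hit, False on a miss."""
        if address in self.lines:
            self.lines.move_to_end(address)  # now the most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.lines[address] = None           # bring the address into the cache
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict the least recently used
        return False


def run_trace(trace: list[int], capacity: int) -> tuple[int, int]:
    """Feed a sequence of addresses through a cache and return (hits, misses)."""
    cache = LRUCache(capacity)
    for address in trace:
        cache.access(address)
    return cache.hits, cache.misses


def access_time(hits: int, misses: int) -> int:
    """Total time, assuming hits cost CACHE_TIME and misses cost MEMORY_TIME."""
    return hits * CACHE_TIME + misses * MEMORY_TIME


if __name__ == "__main__":
    loop = [100, 101, 102, 103] * 10    # a four-instruction loop run ten times
    hits, misses = run_trace(loop, capacity=8)
    print(hits, misses)                 # 36 4
    print(access_time(hits, misses))    # 76
    print(len(loop) * MEMORY_TIME)      # 400 without any cache
    print(run_trace(loop, capacity=3))  # (0, 40)
```

With eight entries the small loop fits in the cache, so only the first pass goes to main memory. With three entries the same loop misses on every access, because each address is evicted just before it is needed again; this is one reason why larger caches help.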
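Python cannot issue MMX or 3DNow! instructions directly, so this sketch imitates the SIMD idea in software with the 'SIMD within a register' technique: eight 8-bit values are packed into one 64-bit word, the same size as an MMX register, and all eight are added using a handful of whole-word operations. The result wraps around in each lane independently, much like a packed byte addition such as MMX's PADDB; it is an emulation for illustration, not the real instruction.

```python
LANES = 8                       # eight 8-bit lanes in a 64-bit word
HIGH_BITS = 0x8080808080808080  # top bit of every lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F   # lower seven bits of every lane


def pack(values: list[int]) -> int:
    """Pack eight byte values into one 64-bit word (lane 0 in the lowest byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word: int) -> list[int]:
    """Split a 64-bit word back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(LANES)]


def simd_add_bytes(a: int, b: int) -> int:
    """Add all eight lanes at once, wrapping modulo 256 within each lane.

    Adding only the low seven bits means no carry can cross into the next
    lane; the top bit of each lane is then fixed up with an XOR.
    """
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    return low_sum ^ ((a ^ b) & HIGH_BITS)


def scalar_add_bytes(a: list[int], b: list[int]) -> list[int]:
    """Reference version: one addition per element, as a non-SIMD CPU would do."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [10, 20, 30, 40, 250, 100, 0, 255]
    b = [1, 2, 3, 4, 10, 100, 0, 1]
    print(unpack(simd_add_bytes(pack(a), pack(b))))  # [11, 22, 33, 44, 4, 200, 0, 0]
    print(scalar_add_bytes(a, b))                    # [11, 22, 33, 44, 4, 200, 0, 0]
```

The scalar version needs one addition per element, while the packed version handles all eight with one addition plus a few bitwise operations; real SIMD hardware goes further by doing the whole packed operation in a single instruction.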
Hopefully you should now have a greater understanding of the architecture of the microprocessor and how it works.