CAO-II Model Test Paper 1



Computer Architecture & Organization - II
Model Set - I

1. (a) Why does pipelining improve performance? (Year - 2008)

Solution:

1. Pipelining is an implementation technique in which multiple instructions are overlapped in execution. Today's processors are fast largely because of pipelining.

2. A pipeline is like an assembly line: each step completes one piece of the whole job. An assembly line does not reduce the time it takes to complete an individual job; it increases the number of jobs being built simultaneously, and thus the rate at which jobs are completed.

3. A pipe stage (or pipe segment) is a small piece of the pipeline that completes one part of an instruction.

4. Therefore, pipelining improves instruction throughput rather than individual instruction execution time. The throughput of the instruction pipeline is determined by how often an instruction exits the pipeline.

5. The goal of designers is to balance the length of each stage; otherwise there will be idle time during a stage. If the stages are perfectly balanced, then the time between instructions on the pipelined machine = Time between instructions (non-pipelined) / Number of pipe stages.
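As a quick worked sketch of this formula (the 40 ns figure and the 5-stage count below are hypothetical, not from the question):

    # Hypothetical figures: a 40 ns non-pipelined instruction time and a
    # 5-stage pipeline with perfectly balanced stages.
    time_non_pipelined = 40.0          # ns between instruction completions
    n_stages = 5
    time_pipelined = time_non_pipelined / n_stages
    print(time_pipelined)              # 8.0 ns between completions: throughput
                                       # improves; one instruction still takes 40 ns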

(b) Differentiate between RISC and CISC machines. (Year - 2008)

Solution:

RISC (Reduced Instruction Set Computer). A computer with fewer instructions having simple constructs, so that they can be executed much faster within the CPU without having to use memory as often, is classified as a reduced instruction set computer, or RISC.

RISC characteristics:
(i) Relatively few instructions.
(ii) Relatively few addressing modes.
(iii) Memory access limited to load and store instructions.
(iv) All operations done within the registers of the CPU.
(v) Fixed-length, easily decoded instruction format.
(vi) Single-cycle instruction execution.
(vii) Hardwired rather than microprogrammed control.

CISC (Complex Instruction Set Computer). A computer with a large number of instructions is classified as a complex instruction set computer. Characteristics of CISC:

(i) A large number of instructions, typically from 100 to 250.
(ii) Some instructions that perform specialized tasks and are used infrequently.
(iii) A large variety of addressing modes, typically from 5 to 20 different modes.
(iv) Variable-length instruction format.
(v) Instructions that manipulate operands in memory.


(c) Why is the performance of a parallel computer improved by using a two-level cache memory? (Year - 2008)

Solution: Modern high-end PCs and workstations all have at least two levels of cache: a very fast, and hence not very big, first-level (L1) cache together with a larger but slower L2 cache. Some recent microprocessors have three levels.

When a miss occurs in L1, L2 is examined, and only if a miss occurs there is main memory referenced.

So the average miss penalty for an L1 miss is

(L2 hit rate)*(L2 time) + (L2 miss rate)*(L2 time + memory time)

We are assuming L2 time is the same for an L2 hit or L2 miss. We are also assuming that the access doesn't begin to go to memory until the L2 miss has occurred.
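A small numeric sketch of this formula (all timings below are hypothetical, chosen only to exercise the expression above):

    # Hypothetical timings, just to exercise the miss-penalty formula above.
    l2_hit_rate  = 0.8
    l2_miss_rate = 1.0 - l2_hit_rate
    l2_time      = 10      # cycles; same for an L2 hit or miss, as assumed above
    memory_time  = 100     # cycles

    l1_miss_penalty = l2_hit_rate * l2_time + l2_miss_rate * (l2_time + memory_time)
    print(l1_miss_penalty)   # 0.8*10 + 0.2*110 = 30 cycles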

(d) Write at least four differences between a multiprocessor and multicomputer system. (Year - 2008)

Solution:

Multiprocessor: -

1. A multiprocessor has more than one CPU, or one CPU with more than one core, in a single machine. Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor and/or the ability to allocate tasks between them.

2. A multiprocessor system is simply a computer that has more than one CPU on its motherboard. If the operating system is built to take advantage of this, it can run different processes (or different threads belonging to the same process) on different CPUs.

3. There are many variations on this basic theme, and the definition of multiprocessing can vary with context, mostly as a function of how CPUs are defined (multiple cores on one die, multiple chips in one package, multiple packages in one system unit, etc.).

4. Multiprocessing sometimes refers to the execution of multiple concurrent software processes in a system as opposed to a single process at any one instant. However, the terms multitasking or multiprogramming are more appropriate to describe this concept, which is implemented mostly in software, whereas multiprocessing is more appropriate to describe the use of multiple hardware CPUs.

Multicomputer: -

1. A multicomputer is more than one computer, i.e., a network of computers: a computer system made up of several computers, similar in spirit to parallel computing.

2. A multicomputer may be considered to be either a loosely coupled NUMA computer or a tightly coupled cluster. Multicomputers are commonly used when strong computer power is required in an environment with restricted physical space or electrical power.

3. Distributed computing deals with hardware and software systems containing more than one processing element or storage element, concurrent processes, or multiple programs, running under a loosely or tightly controlled regime.

4. In distributed computing a program is split up into parts that run simultaneously on multiple computers communicating over a network. Distributed computing is a form of parallel computing, but parallel computing is most commonly used to describe program parts running simultaneously on multiple processors in the same computer.

(e) Discuss anti-dependence / name dependence vs. true dependence. (Year - 2006)

Solution:

Anti-dependency occurs when an instruction requires a value that is later updated. In the following example, instruction 3 anti-depends on instruction 2: the ordering of these instructions cannot be changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would affect the final value of A.

1. B = 3
2. A = B + 1
3. B = 7

Anti-dependency is an example of a name dependency. That is, renaming of variables could remove the dependency, as in the next example:

1. B = 3
N. B2 = B
2. A = B2 + 1
3. B = 7

A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in parallel.

True dependence.

However, the modification has introduced a new dependency: instruction 2 is now truly dependent on instruction N, which is truly dependent upon instruction 1. As true dependencies, these new dependencies are impossible to safely remove.

(f) What do you mean by cache coherence? (Year - 2006)

Solution:

In a shared memory multiprocessor with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in the main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be changed also. Cache coherence is the discipline that ensures that changes in the values of shared operands are propagated throughout the system in a timely fashion.

There are three distinct levels of cache coherence:

1. Every write operation appears to occur instantaneously.
2. All processes see exactly the same sequence of changes of values for each separate operand.
3. Different processes may see an operand assume different sequences of values. (This is considered noncoherent behavior.)

In both level 2 behavior and level 3 behavior, a program can observe stale data. Recently, computer designers have come to realize that the programming discipline required to deal with level 2 behavior is sufficient to deal also with level 3 behavior. Therefore, at some point only level 1 and level 3 behavior will be seen in machines.

(g) Explain what a structural hazard is, with a suitable example.

Solution:

A structural hazard occurs when a combination of instructions cannot be accommodated because of resource conflicts. Structural hazards often arise when some functional unit is not fully pipelined. For example, a load uses the register file's write port during its 5th stage:

Cycle:    1    2      3    4    5
Load:     IF   RF/ID  EX   MEM  WB

R-type uses register file’s write port during the 4th stage.


Cycle:    1    2      3    4
R-type:   IF   RF/ID  EX   WB

Example:

Consider a load followed immediately by a store on a processor that has only a single register-file write port.
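To make the conflict concrete, here is a minimal Python sketch (using the load and R-type stage timings charted above for the two instructions; the helper name is ours) that checks when two back-to-back instructions both need the single write port:

    # Stage sequences from the two charts above; WB uses the register-file
    # write port.
    load_stages   = ["IF", "RF/ID", "EX", "MEM", "WB"]   # WB in its 5th cycle
    r_type_stages = ["IF", "RF/ID", "EX", "WB"]          # WB in its 4th cycle

    def wb_cycle(stages, issue_cycle):
        # Absolute cycle in which an instruction issued at issue_cycle
        # occupies the WB stage.
        return issue_cycle + stages.index("WB")

    # A load issued in cycle 1 followed by an R-type issued in cycle 2:
    if wb_cycle(load_stages, 1) == wb_cycle(r_type_stages, 2):
        print("Structural hazard: both need the single write port in cycle",
              wb_cycle(load_stages, 1))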

(h) What do you mean by locality of reference?

Solution:

There are two main components to locality of reference:

Temporal: there is a tendency for a program to reference, in the near future, memory items that it has referenced in the recent past. Examples: loops, temporary variables, arrays, stacks.

Spatial: there is a tendency for a program to make references to a portion of memory in the neighbourhood of the last memory reference.
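As an illustrative sketch (not from the original answer), a simple array-summing loop exhibits both kinds of locality:

    data = list(range(1024))
    total = 0
    for i in range(len(data)):
        total += data[i]   # i and total are re-referenced every iteration
                           # (temporal locality); data[0], data[1], ... lie at
                           # adjacent addresses (spatial locality)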

(i) Write down methods for improving cache performance. (Year - 2008)

Solutions:

Methods for improving cache performance:
increase cache size
increase block size
increase associativity
add a second-level cache

(j) Identify the kind of hazard that occurs while executing the following instructions in the pipeline. Draw the path to avoid the hazard.

Here a load instruction is followed immediately by a store, and the processor has only a single write port.

Solutions: This is a structural hazard (a resource conflict on the single write port).

1. Delay the instruction until the functional unit is ready: hardware inserts a pipeline stall, or bubble, that delays execution of all instructions that follow (previous instructions continue). This increases CPI above the ideal value of 1.


2. Build more sophisticated functional units so that all combinations of instructions can be accommodated. Example: Allow two simultaneous writes to the register file.

Write Back Stall Solution: Delay R-Type register write by one cycle.

Cycle:    1    2      3    4    5
R-type:   IF   RF/ID  EX   MEM  WB

2. (a) What do you mean by interleaved memory organization? (Year-2008)

Solution:

1. Pipeline and vector processors often require simultaneous access to memory from two or more sources. For example, an instruction pipeline may need to fetch an instruction and an operand at the same time, and an arithmetic pipeline usually requires two or more operands to enter the pipeline at the same time.

2. Instead of using two memory buses for simultaneous access, the memory can be partitioned into a number of modules connected to common memory address and data buses. Each memory array has its own address register AR and data register DR.

3. The modular system permits one module to initiate a memory access while other modules are in the process of reading or writing a word and each module can honor a memory request independent of the state of the other modules.

Advantage:

1. Different sets of addresses are assigned to different memory modules. For example, in a two-module memory system, the even addresses may be in one module and the odd addresses in the other.

2. A modular memory is useful in systems with pipeline and vector processing. A vector processor that uses an n-way interleaved memory can fetch n operands from n different modules.

3. In this way the effective memory cycle time can be reduced by a factor close to the number of modules.
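A minimal sketch of the address mapping behind this (low-order interleaving; the function and variable names are ours, for illustration):

    def interleave(addr, n_modules):
        """Map a word address to (module, offset) under low-order interleaving."""
        return addr % n_modules, addr // n_modules

    # With two modules, even addresses fall in module 0 and odd addresses in
    # module 1, matching the two-module example above.
    for addr in range(6):
        print(addr, "->", interleave(addr, 2))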

(b) Explain in details cache coherence mechanisms.

Solutions: Cache coherence mechanisms


1. Directory-based coherence: In a directory-based system, the data being shared is placed in a common directory that maintains the coherence between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.

2. Snooping is the process where the individual caches monitor address lines for accesses to memory locations that they have cached. When a write operation is observed to a location that a cache has a copy of, the cache controller invalidates its own copy of the snooped memory location.

3. Snarfing is where a cache controller watches both address and data in an attempt to update its own copy of a memory location when a second master modifies a location in main memory.
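A minimal write-invalidate snooping sketch (the class and method names are ours, for illustration only): every cache watches writes on the shared bus and invalidates its own stale copy.

    class Bus:
        def __init__(self):
            self.caches = []

        def attach(self, cache):
            self.caches.append(cache)

        def broadcast_write(self, addr, source):
            # Every other cache snoops the write on the shared bus.
            for cache in self.caches:
                if cache is not source:
                    cache.snoop_write(addr)

    class SnoopingCache:
        def __init__(self, bus):
            self.lines = {}          # address -> cached value
            self.bus = bus
            bus.attach(self)

        def write(self, addr, value):
            self.lines[addr] = value
            self.bus.broadcast_write(addr, source=self)

        def snoop_write(self, addr):
            self.lines.pop(addr, None)   # invalidate our copy, if any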

3. (a) List down various pipeline hazards. (Year - 2008)

Solution:

Structural hazards: an attempt to use the same resource in two different ways at the same time. E.g., two instructions try to read the same memory at the same time.

Data hazards: an attempt to use an item before it is ready; an instruction depends on the result of a prior instruction still in the pipeline. E.g.:

add r1, r2, r3
sub r4, r2, r1

Control hazards: an attempt to make a decision before a condition is evaluated, e.g., branch instructions:

beq r1, loop
add r1, r2, r3

Hazards can always be resolved by waiting:
• pipeline control must detect the hazard
• take action (or delay action) to resolve the hazard

(b) Identify the data hazards while executing the following instructions in the DLX pipeline. Draw the forwarding path to avoid the hazard. (Year - 2008)

ADD R1, R2, R3
SUB R4, R1, R5
AND R6, R1, R7
OR  R8, R1, R9
XOR R10, R1, R11

Solution:

Data hazards occur when the pipeline changes the order of read/write access to operands so that the order differs from the order seen by sequentially executing instructions; they are caused by several types of dependencies. Here, ADD writes R1 and every following instruction reads R1, so SUB, AND, OR and XOR all have read-after-write (RAW) dependences on ADD.

Data Hazard Solution:

(i) Stalls: delay the next instruction until the operand is ready.

(ii) Have the register file write in the first half of the cycle and read in the second half, or "forward" the data directly to the unit that needs it.
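A small Python sketch (names ours) that scans the sequence from the question and reports which instructions need a forwarding path; consumers three or more slots after the producer are already covered by the write-first/read-second register file:

    # (op, dest, sources) for each instruction in the question's sequence.
    program = [
        ("ADD", "R1",  ("R2", "R3")),
        ("SUB", "R4",  ("R1", "R5")),
        ("AND", "R6",  ("R1", "R7")),
        ("OR",  "R8",  ("R1", "R9")),
        ("XOR", "R10", ("R1", "R11")),
    ]

    for i, (op, dest, _) in enumerate(program):
        for dist in (1, 2):                      # result still in EX/MEM or MEM/WB
            if i + dist < len(program) and dest in program[i + dist][2]:
                consumer = program[i + dist][0]
                print(f"{consumer} needs {dest} forwarded from {op}")

This prints that SUB and AND need R1 forwarded from ADD; OR reads R1 from the register file in the same cycle ADD writes it, and XOR reads it normally.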

4. (a) Describe the Flynn’s classification of computer architecture. (Year - 2006)

Solution: Flynn's taxonomy

                Single Instruction   Multiple Instruction
Single Data     SISD                  MISD
Multiple Data   SIMD                  MIMD

Classifications

The four classifications defined by Flynn are based upon the number of concurrent instruction (or control) and data streams available in the architecture:

(i) Single Instruction, Single Data stream (SISD)

A sequential computer which exploits no parallelism in either the instruction or data streams. Examples of SISD architecture are the traditional uniprocessor machines like a PC or old mainframes.

[Figures: (i) SISD, (ii) SIMD]

(ii) Single Instruction, Multiple Data streams (SIMD)

A computer which exploits multiple data streams against a single instruction stream to perform operations which may be naturally parallelized. For example, an array processor or GPU.


[Figures: (iii) MISD, (iv) MIMD]

(iii) Multiple Instruction, Single Data stream (MISD)

Multiple instructions operate on a single data stream. Uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.

(iv) Multiple Instruction, Multiple Data streams (MIMD)

Multiple autonomous processors simultaneously executing different instructions on different data. Distributed systems are generally recognized to be MIMD architectures; either exploiting a single shared memory space or a distributed memory space.

(b) Discuss various Levels of Parallelism. (Year-2006)

Solution:

Exploiting Parallelism

Taking advantage of parallelism is another very important method for improving the performance of a computer system. We consider three examples that demonstrate the advantages of parallelism at three different levels:

System level
Processor level
Detailed digital design level

System level Parallelism

The aim of this example is to improve the throughput performance of a server system with respect to a particular benchmark, e.g., SPEC Web. The parallelism takes the form of

multiple processors
multiple disc drives

The general idea is to spread the overall workload amongst the available processors and disc drives

Scalability is viewed as a valuable asset for server applications. Ideally, the overall improvement in performance over a single processor would be a factor of N,

where N is the number of processors.

Processor level Parallelism

Advantage can be taken of the fact that not all instructions in a program rely on the result of their predecessors

Thus sequences of instructions can be executed with varying degrees of overlap, which is a form of parallelism

This is the basis of instruction pipelining, which we study later in the module. Instruction pipelining has the effect of improving performance by decreasing the CPI of a

processor


Detailed Digital Design Level Parallelism

Examples:
Set-associative caches use multiple banks of memory that may be searched in parallel to find a desired item.
Modern ALUs use carry-lookahead, which uses parallelism to speed up the process of computing sums from linear to logarithmic in the number of bits per operand.

5. (a) Explain "pipelining & pipeline taxonomies". (Year - 2006)

Solution:

There are two main ways to increase the performance of a processor through high-level system architecture

Increasing the memory access speed
Increasing the number of supported concurrent operations

Pipelining Parallelism

Pipelining is the process by which instructions are parallelized over several overlapping stages of execution, in order to maximize data path efficiency

Pipelining is analogous to many everyday scenarios:
Car manufacturing process
Batch laundry jobs
Basically, any assembly-line operation applies

Two important concepts:
New inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
The number of operations performed per second is increased, even though the elapsed time needed to perform any one operation remains the same.

Pipeline Taxonomies

There are two types of pipelines used in computer systems:
Arithmetic pipelines, used to pipeline data-intensive functionalities.
Instruction pipelines, used to pipeline the basic instruction fetch-and-execute sequence.

Other classifications include:
Linear vs. nonlinear pipelines: the presence (or lack) of feed-forward and feedback paths between stages.
Static vs. dynamic pipelines: dynamic pipelines are multifunctional, taking on a different form depending on the function being executed.
Scalar vs. vector pipelines: vector pipelines specifically target computations using vector data.

(b) Consider an improvement to a processor that makes the original processor run 10 times faster, but is only usable for 40% of the time. What is the overall speedup gained by incorporating this improvement using Amdahl’s Law?

Solution:


The performance gain that can be made by improving some portion of the operation of a computer can be calculated using Amdahl’s Law

Amdahl’s Law states that the improvement gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used

There is little point in improving rare tasks. Amdahl's Law defines the overall speedup that can be gained for a task by using the new feature

designed to speed up the execution of the task

Overall speedup = ExTime_old / ExTime_new
                = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

With Fraction_enhanced = F = 0.4 and Speedup_enhanced = S = 10:

Overall speedup = 1 / ((1 - 0.4) + 0.4/10) = 1 / 0.64 ≈ 1.56
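The same computation as a one-function Python sketch:

    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        """Overall speedup from Amdahl's Law."""
        return 1.0 / ((1.0 - fraction_enhanced)
                      + fraction_enhanced / speedup_enhanced)

    print(round(amdahl_speedup(0.4, 10), 2))   # 1.56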

6. (a) Explicit parallelism vs. implicit parallelism. (Year - 2006)

Solution:

Explicit parallelism:

1. Explicit parallel programming gives the programmer absolute control over the parallel execution.

2. In some instances, explicit parallelism may be avoided with the use of an optimizing compiler that automatically extracts the parallelism inherent to computations (see implicit parallelism).

3. In computer programming, explicit parallelism is the representation of concurrent computations by means of primitives in the form of special-purpose directives or function calls.

4. Most parallel primitives are related to process synchronization, communication or task partitioning. As they seldom contribute to actually carrying out the intended computation of the program, their computational cost is often counted as parallelization overhead.

Advantage

A skilled parallel programmer takes advantage of explicit parallelism to produce very efficient code.

Disadvantage

However, programming with explicit parallelism is often difficult, especially for non-computing specialists, because of the extra work involved in planning the task division and synchronization of concurrent processes.


Programming with explicit parallelism:
Message Passing Interface (MPI)
Parallel Virtual Machine (PVM)
Ease programming language
Ada programming language
Java programming language
JavaSpaces
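As a minimal sketch of the explicit style (using plain Python multiprocessing rather than any of the systems listed above; all names are ours): the programmer explicitly partitions the work, creates the processes, and collects the results.

    from multiprocessing import Pool

    def partial_sum(chunk):
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]   # explicit task partitioning
        with Pool(processes=4) as pool:           # explicit process creation
            total = sum(pool.map(partial_sum, chunks))   # communication + sync
        print(total)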

Implicit parallelism

1. In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's constructs.

2. A pure implicitly parallel language does not need special directives, operators or functions to enable parallel execution.

Programming languages with implicit parallelism include LabVIEW and MATLAB M-code.

Example:

If a particular problem involves performing the same operation on a group of numbers (such as taking the sine or logarithm of each in turn), a language that provides implicit parallelism might allow the programmer to write the instruction thus:

numbers = [0 1 2 3 4 5 6 7];
result = sin(numbers);

The compiler or interpreter can calculate the sine of each element independently, spreading the effort across multiple processors if available.

Advantages

Implicit parallelism generally facilitates the design of parallel programs and therefore results in a substantial improvement of programmer productivity.

Disadvantages

1. Languages with implicit parallelism reduce the control that the programmer has over the parallel execution of the program.
2. Experiments with implicit parallelism showed that it made debugging difficult.

(b) Write a short note on Instruction-level parallelism.

Solutions:

1. Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. Consider the following program:

1. e = a + b
2. f = c + d
3. g = e * f

Operation 3 depends on the results of operations 1 and 2, so it cannot be calculated until both of them are completed. However, operations 1 and 2 do not depend on any other operation, so they can be calculated simultaneously. If we assume that each operation can be completed in one unit of time then these three instructions can be completed in a total of two units of time, giving an ILP of 3/2.
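The ILP figure can be reproduced mechanically by levelizing the dependence graph; a small Python sketch (the variable names are ours):

    # Each operation's level = 1 + max level of the operations it depends on.
    deps = {"e": [], "f": [], "g": ["e", "f"]}   # g = e * f depends on 1 and 2

    levels = {}
    for op in ("e", "f", "g"):
        levels[op] = 1 + max((levels[d] for d in deps[op]), default=0)

    critical_path = max(levels.values())   # 2 time units
    print(len(deps) / critical_path)       # 3 ops / 2 units = 1.5, i.e. ILP 3/2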

2. A goal of compiler and processor designers is to identify and take advantage of as much ILP as possible.

3. ILP allows the compiler and the processor to overlap the execution of multiple instructions or even to change the order in which instructions are executed.


4. How much ILP exists in programs is very application specific. In certain fields, such as graphics and scientific computing the amount can be very large.

5. Micro-architectural techniques that are used to exploit ILP include:

Instruction pipelining, where the execution of multiple instructions can be partially overlapped.
Superscalar execution, in which multiple execution units are used to execute multiple instructions in parallel.
Out-of-order execution, where instructions execute in any order that does not violate data dependencies.
Register renaming, a technique used to avoid unnecessary serialization of program operations imposed by the reuse of registers by those operations, used to enable out-of-order execution.

7. (a) How do tightly coupled systems differ from loosely coupled ones? (Year - 2008)

Solution:

Shared Memory Systems

Tightly Coupled Systems
Uniform and Non-Uniform Memory Access

Tightly Coupled Systems

Multiple CPUs share memory. Each CPU has full access to all shared memory through a common bus. Communication between nodes occurs via shared memory. Performance is limited by the bandwidth of the memory bus.

Performance:

Performance is potentially limited in a tightly coupled system by a number of factors. These include various system components such as the memory bandwidth, CPU to CPU communication bandwidth, the memory available on the system, the I/O bandwidth, and the bandwidth of the common bus.

Uniform and Non-Uniform Memory Access

Shared memory systems can also be loosely coupled with memory: access may be uniform (UMA), with the same latency from every CPU, or non-uniform (NUMA), where the latency depends on which part of memory a CPU references.

Advantages

Memory access is cheaper than inter-node communication. This means that internal synchronization is faster than using the distributed lock manager.

Shared memory systems are easier to administer than a cluster.

Shared Disk Systems

Shared disk systems are typically loosely coupled.

Loosely Coupled Systems


Each node consists of one or more CPUs and associated memory. Memory is not shared between nodes. Communication occurs over a common high-speed bus. Each node has access to the same disks and other resources.

Advantages

Shared disk systems permit high availability: all data is accessible even if one node dies.
These systems have the concept of one database, which is an advantage over shared-nothing systems.
Shared disk systems provide for incremental growth.

Disadvantages

Inter-node synchronization is required, involving DLM overhead and greater dependency on high-speed interconnect.

If the workload is not partitioned well, there may be high synchronization overhead. There is operating system overhead of running shared disk software.

(b) What do you understand by the quantitative principles of computer design? (Year - 2008)

Solution:

Quantitative Principles of Design:
1. Take Advantage of Parallelism
2. Principle of Locality
3. Focus on the Common Case
4. Amdahl's Law
5. The Processor Performance Equation

1) Taking Advantage of Parallelism

• Increasing throughput of a server computer via multiple processors or multiple disks
• Detailed HW design:
– Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand
– Multiple memory banks searched in parallel in set-associative caches
• Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence.

– Not every instruction depends on its immediate predecessor ⇒ executing instructions completely or partially in parallel is possible

– Classic 5-stage pipeline:

1) Instruction Fetch (Ifetch), 2) Register Read (Reg), 3) Execute (ALU), 4) Data Memory Access (Dmem), 5) Register Write (Reg)


2) The Principle of Locality

• The Principle of Locality:
– Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
– Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse).
– Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon.

3) Focus on the Common Case

• Common sense guides computer design
– Since it is engineering, common sense is valuable
• In making a design trade-off, favor the frequent case over the infrequent case.
– Ex: the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first.
• The frequent case is often simpler and can be done faster than the infrequent case.
– Ex: overflow is rare when adding two numbers, so improve performance by optimizing the more common case of no overflow.
• What is the frequent case, and how much is performance improved by making that case faster? => Amdahl's Law

4) Amdahl’s Law

Speedup_overall = ExTime_old / ExTime_new
                = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

The best you could ever hope to do: Speedup_maximum = 1 / (1 - Fraction_enhanced)

For example, with Fraction_enhanced = 0.9 and Speedup_enhanced = 10:

Speedup_overall = 1 / ((1 - 0.9) + 0.9/10) = 1 / 0.19 ≈ 5.26

5) Processor performance equation

CPU time = Instruction count × Clock cycles per instruction × Clock cycle time

The execution time of a program can be refined into three components:
number of instructions
number of clock cycles per instruction
duration of clock cycle

It is relatively straightforward to count the number of instructions executed and the number of processor clock cycles for a program. We can then calculate the average number of clock cycles per instruction (CPI):

CPI = CPU clock cycles for the program / Instruction count
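A worked sketch of the performance equation with hypothetical counts (the numbers below are ours, chosen only for illustration):

    # Hypothetical counts, just to exercise the performance equation above.
    instruction_count = 2_000_000
    cpu_clock_cycles  = 3_000_000
    clock_cycle_time  = 1e-9        # seconds (a 1 GHz clock)

    cpi = cpu_clock_cycles / instruction_count            # 1.5 cycles/instruction
    cpu_time = instruction_count * cpi * clock_cycle_time
    print(cpi, cpu_time)                                  # 1.5  0.003 seconds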

8. (a) Write a short note on massively parallel (MPP) systems.

Massively parallel (MPP) systems have the following characteristics:

From only a few nodes up to thousands of nodes are supported.
The cost per processor may be extremely low because each node is an inexpensive processor.
Each node has associated non-shared memory.
Each node has its own devices, but in case of failure other nodes can access the devices of the failed node (on most systems).
Nodes are organized in a grid, mesh, or hypercube arrangement.
Oracle instances can potentially reside on any or all nodes.

System: A Hypercube Example

Note: A hypercube is an arrangement of processors such that each processor is connected to log2(n) other processors, where n is the number of processors in the hypercube. log2(n) is said to be the "dimension" of the hypercube. For example, in the 8-processor hypercube shown in the figure, dimension = 3; each processor is connected to three other processors.
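In a hypercube, a node's neighbours are exactly the nodes whose binary labels differ from it in one bit; a tiny sketch (the function name is ours):

    def hypercube_neighbors(node, dimension):
        """Neighbours of `node`: flip each of the `dimension` address bits."""
        return [node ^ (1 << bit) for bit in range(dimension)]

    # 8-processor hypercube: dimension = log2(8) = 3, so 3 neighbours per node.
    print(hypercube_neighbors(0, 3))   # [1, 2, 4]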

A massively parallel system may have as many as several thousand nodes. Each node may have its own Oracle instance, with all the standard facilities of an instance.


An MPP has access to a huge amount of real memory for all database operations (such as sorts or the buffer cache), since each node has its own associated memory. This advantage, which avoids disk I/O, is significant in long-running queries and sorts. It is not available on 32-bit machines, which have a 2 GB addressing limit; the total amount of memory on an MPP system may well be over 2 GB.

As with loosely coupled systems, cache consistency on MPPs must still be maintained across all nodes in the system. Thus, the overhead for cache management is still present.

Advantages

Shared nothing systems provide for incremental growth. System growth is practically unlimited. MPPs are good for read-only databases and decision support applications. Failure is local: if one node fails, the others stay up.

Disadvantages

More coordination is required.
A process can only work on the node that owns the desired disk.
If one node dies, processes cannot access its data.
Physically separate databases which are logically one database can be extremely complex and time-consuming to administer.
Adding nodes means reconfiguring and laying out data on disks.
If there is a heavy workload of updates or inserts, as in an online transaction processing system, it may be worthwhile to consider data-dependent routing to alleviate contention.

(b) Identify the data hazards while executing the following instructions in the DLX pipeline.

Draw the forwarding path to avoid the hazard.

LW  R1, 0(R2)
SUB R4, R1, R6
AND R6, R1, R7
OR  R8, R1, R9

Solution:

Hardware stall: a pipeline interlock checks for the hazard and stops instruction issue. SUB, AND and OR all read R1, so they have RAW dependences on the LW; because LW produces R1 only at the end of its MEM stage, even forwarding cannot deliver R1 to the immediately following SUB in time, and one stall cycle must be inserted.

Data hazard with forwarding
