Reg. No. :

M.E. DEGREE EXAMINATION, JUNE 2010

Second Semester

Applied Electronics

AP9222 — COMPUTER ARCHITECTURE AND PARALLEL PROCESSING

(Common to M.E-Computer and Communication, M.E-VLSI Design and

M.E-Embedded System Technologies)

(Regulation 2009)

Time : Three hours Maximum : 100 Marks

Answer ALL Questions

PART A — (10 × 2 = 20 Marks)

1. Define Bernstein conditions related to parallelism and dependence relations.

2. A workstation uses a 15-MHz processor with a claimed 10-MIPS rating to

execute a given program mix. What is the effective CPI of this computer

assuming a one-cycle delay for each memory access?
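
The arithmetic behind Question 2 can be sketched as follows. This is a minimal sketch, assuming the usual relation MIPS rate = clock rate / (CPI × 10^6); the variable names are illustrative and not part of the question.

```python
# Effective CPI from the clock rate and the claimed MIPS rating.
# Assumed relation: MIPS = clock_hz / (CPI * 1e6), so CPI = clock_hz / (MIPS * 1e6).
clock_hz = 15e6      # 15-MHz processor (given in the question)
mips_rating = 10.0   # claimed 10-MIPS rating (given in the question)

effective_cpi = clock_hz / (mips_rating * 1e6)
print(f"Effective CPI = {effective_cpi:.2f}")
```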

3. List the parameters used for evaluating parallel computations.

4. Topologically equivalent networks are those whose graph representations are

isomorphic with the same interconnection capabilities. Prove that the Omega

network is topologically equivalent to the Baseline network.

5. A two-level memory system has eight virtual pages on a disk to be mapped

into four page frames in the main memory. A certain program generated the

following page trace:

1,0,2,2,1,7,6,7,0,1,2,0,3,0,4,5,1,5,2,4,5,6,7,6,7,2,4,2,7,3,3,2,3

Show the successive virtual pages residing in the four page frames with

respect to the above trace using LRU replacement policy. Compute the hit ratio

in the main memory. Assume the page frames are initially empty.
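
One way to check a hand-worked answer to Question 5 is to simulate the LRU policy. The sketch below is illustrative only: the function name `simulate_lru` is hypothetical, and the resident pages are listed in recency order (least recently used first) rather than by physical frame number.

```python
from collections import OrderedDict

def simulate_lru(trace, num_frames):
    """Simulate LRU replacement.

    Returns (hits, history), where history[i] is the list of resident pages
    after reference i, ordered from least to most recently used.
    """
    frames = OrderedDict()   # keys are resident pages, kept in recency order
    hits = 0
    history = []
    for page in trace:
        if page in frames:
            hits += 1
            frames.move_to_end(page)        # mark as most recently used
        else:
            if len(frames) == num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
        history.append(list(frames))
    return hits, history

# Page trace and frame count taken from the question; frames start empty.
trace = [1, 0, 2, 2, 1, 7, 6, 7, 0, 1, 2, 0, 3, 0, 4, 5, 1,
         5, 2, 4, 5, 6, 7, 6, 7, 2, 4, 2, 7, 3, 3, 2, 3]
hits, history = simulate_lru(trace, num_frames=4)

for page, resident in zip(trace, history):
    print(page, resident)
print(f"hit ratio = {hits}/{len(trace)} = {hits / len(trace):.3f}")
```

The hit ratio is the number of hits divided by the total number of references in the trace.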

6. State the two sufficient conditions to achieve sequential consistency in shared

memory access.

7. Why are MIMD, MPMD or SPMD control preferred over SIMD data

parallelism?

Question Paper Code: J7605

8. Compare the advantages and disadvantages of chained directories for cache

coherence control in large-scale multiprocessor systems.

9. Bring out the differences in the message passing OS models.

10. Distinguish between spin locks and suspended locks for sole access to a critical

section.

PART B — (5 × 16 = 80 Marks)

11. (a) (i) Analyze the data dependencies among the following statements in

the given program:

s1: Load R1, 1024

s2: Load R2, M(10)

s3: Add R1, R2

s4: Store M(1024), R1

s5: Store M((R2)), 1024

where (Ri) means the content of register Ri and M(10) contains 64

initially.

(1) Draw a dependence graph to show all the dependencies.

(2) Are there any resource dependencies if only one copy of each

functional unit is available in the CPU? (8)

(ii) Explain about the theoretical models of parallel computers used by

algorithm designers and chip developers. (8)

Or

(b) Characterize the architectural operations of SIMD and MIMD computers.

Distinguish between multiprocessors and multicomputers based on their

structures, resource sharing and interprocessor communications. Also

explain the differences among UMA, NUMA, COMA and NORMA

computers. (16)

12. (a) Explain the applicability and restrictions involved in using Amdahl’s law,

Gustafson’s law, and Sun and Ni’s law to estimate the speedup performance

of an n-processor system compared with that of a single-processor system

ignoring all communication overheads. (16)
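
The three speedup laws named in Question 12(a) can be compared numerically. This is a minimal sketch under stated assumptions: `alpha` is taken to be the sequential fraction of the workload, communication overheads are ignored as the question specifies, and `growth` is a hypothetical callable standing for the workload scaling factor G(n) in the memory-bounded model. The function names are illustrative.

```python
import math

def amdahl_speedup(alpha, n):
    """Fixed-workload speedup: S = 1 / (alpha + (1 - alpha) / n)."""
    return 1.0 / (alpha + (1.0 - alpha) / n)

def gustafson_speedup(alpha, n):
    """Fixed-time (scaled-workload) speedup: S = alpha + (1 - alpha) * n."""
    return alpha + (1.0 - alpha) * n

def sun_ni_speedup(alpha, n, growth):
    """Memory-bounded speedup with workload scaling factor G(n) = growth(n)."""
    g = growth(n)
    return (alpha + (1.0 - alpha) * g) / (alpha + (1.0 - alpha) * g / n)

alpha, n = 0.1, 16
print("Amdahl:   ", amdahl_speedup(alpha, n))
print("Gustafson:", gustafson_speedup(alpha, n))
# G(n) = 1 reduces the memory-bounded model to Amdahl's law,
# and G(n) = n reduces it to Gustafson's law.
print("Sun-Ni, G(n)=1:", sun_ni_speedup(alpha, n, lambda k: 1.0))
print("Sun-Ni, G(n)=n:", sun_ni_speedup(alpha, n, lambda k: float(k)))
```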

Or

(b) (i) Compare control flow, data flow and reduction computers in terms

of the program flow mechanism used. Comment on the advantages

and disadvantages of the above computer models. (8)

(ii) Explain the steps involved in calculating the grain size and

communication latency for multiplying two 2 × 2 matrices. (8)

13. (a) (i) Explain the difference between superscalar and VLIW architectures

in terms of hardware and software requirements. (8)

(ii) Consider a two level memory hierarchy M1 and M2. Denote the hit

ratio of M1 as h. Let c1 and c2 be the costs per kilobyte, s1 and s2

the memory capacities, and t1 and t2 the access times respectively.

(8)

(1) Under what conditions will the average cost of the entire

memory approach c2?

(2) What is the effective memory access time of this hierarchy?

(3) Let r=t2/t1 be the speed ratio of the two memories.

Let E=t1/ta be the access efficiency of the memory system.

Express E in terms of r and h.

(4) What is the required hit ratio h to make E>0.95 if r=100?
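
Parts (2) to (4) above can be checked numerically. The sketch below assumes the effective access time model ta = h·t1 + (1 − h)·t2, which gives E = 1 / (h + (1 − h)·r); some texts instead use ta = t1 + (1 − h)·t2, which would change the result. The function names are hypothetical.

```python
def effective_access_time(h, t1, t2):
    """Assumed model: ta = h * t1 + (1 - h) * t2."""
    return h * t1 + (1.0 - h) * t2

def access_efficiency(h, r):
    """E = t1 / ta expressed in terms of r = t2 / t1 and the hit ratio h."""
    return 1.0 / (h + (1.0 - h) * r)

def min_hit_ratio(target_e, r):
    """Solve 1 / (h + (1 - h) * r) = E for h: h = (r - 1/E) / (r - 1)."""
    return (r - 1.0 / target_e) / (r - 1.0)

h_min = min_hit_ratio(0.95, 100.0)
print(f"E > 0.95 with r = 100 requires h > {h_min:.6f}")
print(f"check: E at h_min = {access_efficiency(h_min, 100.0):.4f}")
```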

Or

(b) (i) Describe the daisy chaining and the distributed arbiter for bus

arbitration on a multiprocessor system. State the advantages and

shortcomings of each from both the implementational and

operational points of view. (8)

(ii) Consider the following three interleaved memory designs for a main

memory system with 16 memory modules. Each module is assumed

to have a capacity of 1Mbyte. The machine is byte-addressable.

Design 1: 16-way interleaving with one memory bank.

Design 2: 8-way interleaving with two memory banks.

Design 3: 4-way interleaving with four memory banks.

(1) Specify the address formats for each of the above memory

organizations.

(2) Determine the maximum memory bandwidth obtained if only

one memory module fails in each of the above memory

organizations.

(3) Comment on the relative merits of the three interleaved

memory organizations. (8)

14. (a) (i) Why are fine-grain processors chosen for future multiprocessors

over medium-grain processors used in the past? From scalability

point of view why is fine-grain parallelism more appealing than

medium-grain or coarse-grain parallelism for building MPP

systems? (8)

(ii) Compare the Connection Machines CM-2 and CM-5 in their

architectures, operation modes, functional capabilities and potential

performance. Comment on the improvement made in CM-5 over

CM-2 from the viewpoints of a computer architect and a machine

programmer. (8)

Or

(b) (i) Prove that the greedy algorithm for multicast routing on a

wormhole routed hypercube network always yields the minimum

network traffic and minimum distance from the source to any of the

destinations. (6)

(ii) Consider the following reservation table for a four stage pipeline

with a clock cycle τ = 20 ns.

1 2 3 4 5 6

S1 X X

S2 X X

S3 X

S4 X X

One non-compute delay stage can be inserted into the pipeline to make a

latency of 1 permissible in the shortest greedy cycle. The purpose is to

yield a new reservation table leading to an optimal latency equal to the

lower bound. (10)

(1) Show the modified reservation table with five rows and seven

columns.

(2) Draw the state transition diagram for the optimal cycle.

(3) List all the simple and greedy cycles from the state diagram.

(4) Prove that the new MAL equals the lower bound.

(5) What is the optimal throughput of this pipeline?

15. (a) (i) What is perfect decomposition? Discuss the differences in program

replication techniques on multi-computers as opposed to program

partitioning on multiprocessors. (8)

(ii) Explain the multiprocessor UNIX design goals in the areas of

compatibility, portability, address space, load balancing, parallel

I/O and network services. (8)

Or

(b) Explain loop transformation theory and discuss how it can be applied for

loop vectorization or parallelization.

———————
