

    Pipelined Instruction Processing

    In

    Computer Organization and Architecture

    Vishal Patyal

    Reg. No = 3050060122

    97806-66742 (M)

    (0190) 5230-650 (Home)

    [email protected]

    Term Paper for CSE-211

    Computer Arithmetic

    Fifth Term 2008

    ABSTRACT

This term paper is a minor project for Computer System Organization and Architecture in which I studied the performance metrics for parallel systems, a topic that will be helpful in the future in the field of the IT industry. One frequently needs to compare the performance of two or more parallel computers, but how should this be done? This report discusses some of these issues so that the problems they can cause may be avoided in the future. We review the many performance metrics that have been proposed for parallel systems (i.e., program-architecture combinations). These include the many variants of speedup, efficiency, and isoefficiency. We give reasons why none of these metrics should be used independently of the run time of the parallel system.

Keywords: Principle, Problems and Solutions, Advantages & Disadvantages, Examples, Implementation, Development, Conclusion, References


    Acknowledgements:-

First of all, I would like to express my sincere gratitude to the Almighty for encouraging me to complete this term paper. The following are some important people in my life who gave me strength and valuable suggestions to complete the task.

First, my parents and friends, whose love and affection give me the strength and encouragement to face the challenges of life.

Second, my teacher Bijoy Chatterjee, whose inspiration, motivation, and spiritual guidance always keep me on the right path and keep my faith in God Almighty, without whose blessings nothing is possible.

Finally, thanks to Lovely Professional University, which gave me a great opportunity to write this term paper.

    DATED: VISHAL PATYAL

    3050060122


CONTENTS:

An introduction to performance metrics

Performance Metrics for Parallel Systems

1. Performance metrics for parallel systems: Execution time
2. Performance metrics for parallel systems: Total parallel overhead
3. Performance metrics for parallel systems: Speedup
4. Performance metrics for parallel systems: Superlinear speedup
5. Performance metrics for parallel systems: Efficiency

File systems for parallel I/O

Evaluating performance metrics for parallel systems


    An introduction to performance metrics

    1. The term performance

Performance can be considered a sub-characteristic of dependability, although the classical literature does not include it in its definition. If users perceive the performance of a system as bad, for example because of high response times or requests denied due to overload, they will also perceive the system as less dependable.

    Some of the most commonly used performance metrics are response time, throughput and

    utilization.

Response time is defined as the time interval between a user's request for a service and the response of the system. This metric is sometimes called responsiveness. Because responses often cannot be delivered instantaneously, there are two possible interpretations of this definition: the response time is either the interval between the user issuing a request and the system starting to respond, or the interval until the system finishes responding.
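To make this metric concrete, here is a minimal Python sketch (not part of the original paper) that measures response time under the second interpretation, i.e., until the response is finished; the service() function is a made-up stand-in for whatever request the user issues.

    import time

    def service():
        """Hypothetical service call; stands in for any user request."""
        time.sleep(0.05)          # simulate 50 ms of processing
        return "response"

    # Response time = interval between issuing the request and receiving
    # the complete response (the "finishing" interpretation above).
    start = time.perf_counter()
    result = service()
    response_time = time.perf_counter() - start

    print(f"response time: {response_time * 1000:.1f} ms")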

Performance metrics are measures of an organization's activities and performance. Performance metrics should support a range of stakeholder needs, from customers and shareholders to employees. While traditionally many metrics are finance based, focusing inwardly on the performance of the organization, metrics may also focus on performance against customer requirements and value.

    Developing performance metrics usually follows a process of:

1. Establishing critical processes/customer requirements,
2. Developing measures,
3. Establishing targets against which the results can be scored.

A criticism of performance metrics is that when the value of information is computed using mathematical methods, it turns out that even performance metrics professionals choose measures that have little value. This is referred to as the "measurement inversion". For example, metrics seem to emphasize what organizations find immediately measurable, even if it is of low value, and tend to ignore high-value measurements simply because they seem harder to measure (whether they are or not).


    Performance Metrics for Parallel Systems

    It is important to study the performance of parallel programs with a view to determining the best

    algorithm, evaluating hardware platforms, and examining the benefits from parallelism. A

    number of metrics have been used based on the desired outcome of performance analysis.

    1. Execution time

The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment a parallel computation starts to the moment the last processing element finishes execution. We denote the serial runtime by T_S and the parallel runtime by T_P.

2. Total parallel overhead

The overheads incurred by a parallel program are encapsulated into a single expression referred to as the overhead function. We define the overhead function, or total overhead, of a parallel system as the total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element. We denote the overhead function of a parallel system by the symbol T_o.

The total time spent in solving a problem, summed over all processing elements, is p T_P. T_S units of this time are spent performing useful work, and the remainder is overhead. Therefore, the overhead function (T_o) is given by

T_o = p T_P - T_S
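As a small illustration of this definition (not from the original paper; the runtimes are assumed), the following Python sketch computes the combined processor time p*T_P and the overhead function T_o from a measured serial and parallel runtime.

    def total_overhead(t_serial, t_parallel, p):
        """Overhead function T_o = p*T_P - T_S.

        t_serial   -- serial runtime T_S of the fastest known sequential algorithm (s)
        t_parallel -- parallel runtime T_P on p processing elements (s)
        p          -- number of processing elements
        """
        t_all = p * t_parallel        # total time spent by all processing elements
        return t_all - t_serial       # everything beyond useful work is overhead

    # Assumed runtimes: T_S = 100 s, T_P = 30 s on p = 4 processing elements.
    t_s, t_p, p = 100.0, 30.0, 4
    print("T_all =", p * t_p, "s")                       # 120.0 s of combined time
    print("T_o   =", total_overhead(t_s, t_p, p), "s")   # 20.0 s of overhead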


1. Performance metrics for parallel systems: Execution time

The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution. We denote the serial runtime by T_S and the parallel runtime by T_P.

    2. Performance metrics for parallel systems: Total parallel overhead

Let T_all be the total time collectively spent by all the processing elements, and let T_S be the serial time. Observe that T_all - T_S is then the total time spent by all processors combined on non-useful work; this is called the total overhead. The total time collectively spent by all the processing elements is T_all = p T_P (where p is the number of processors). The overhead function (T_o) is therefore given by

T_o = p T_P - T_S

3. Performance metrics for parallel systems: Speedup

Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements:

S = T_S / T_P


Performance metrics: Speedup example

Consider the problem of adding n numbers by using n processing elements. If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.

Example: computing the global sum of 16 partial sums using 16 processing elements, where Σ(i..j) denotes the sum of the numbers with consecutive labels from i to j.

Performance metrics: Example (continued)

If an addition takes constant time, say t_c, and communication of a single word takes time t_s + t_w, we have the parallel time T_P = Θ(log n). We know that T_S = Θ(n), so the speedup S is given by S = Θ(n / log n).
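To make the log n behaviour concrete, here is a small Python sketch (a sequential simulation, not real parallel code) that sums 16 numbers by combining neighbouring partial sums pairwise and counts how many parallel steps the binary-tree reduction takes.

    import math

    def tree_sum_steps(values):
        """Simulate a binary-tree reduction with one virtual processor per value.

        In every round all neighbouring pairs are combined "in parallel",
        halving the number of partial sums; the number of rounds is the
        parallel step count.
        """
        partial = list(values)
        steps = 0
        while len(partial) > 1:
            partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
            steps += 1
        return partial[0], steps

    total, steps = tree_sum_steps(range(1, 17))    # n = 16 numbers
    print(total, steps)                            # 136 and 4 steps
    print(steps == int(math.log2(16)))             # True: log2(16) parallel steps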

    Performance metrics: Speedup

For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes and may be parallelizable to different degrees. For the purpose of computing speedup, we always consider the best sequential program as the baseline.

Consider the problem of parallel bubble sort. Suppose the serial time for bubble sort is 150 seconds, and the parallel time for odd-even sort (an efficient parallelization of bubble sort) is 40 seconds. The speedup would appear to be 150/40 = 3.75. But is this really a fair assessment of the system? What if serial quicksort took only 30 seconds? In this case, the speedup is 30/40 = 0.75, which is a more realistic assessment of the system.
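A tiny sketch of this baseline rule, using the assumed timings above:

    def speedup(t_best_serial, t_parallel):
        """Speedup S = T_S / T_P, with T_S taken from the best sequential program."""
        return t_best_serial / t_parallel

    t_parallel_odd_even = 40.0                   # assumed parallel odd-even sort time (s)
    print(speedup(150.0, t_parallel_odd_even))   # 3.75 -- misleading: bubble sort baseline
    print(speedup(30.0, t_parallel_odd_even))    # 0.75 -- fair: quicksort baseline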


4. Performance metrics for parallel systems: Superlinear speedup

Resource-based superlinearity: the higher aggregate cache/memory bandwidth can result in better cache-hit ratios, and therefore superlinearity.

Example: a processor with 64 KB of cache yields an 80% hit ratio. If two processors are used, then since the problem size per processor is smaller, the hit ratio goes up to 90%. Of the remaining 10% of accesses, 8% come from local memory and 2% from remote memory. If DRAM access time is 100 ns, cache access time is 2 ns, and remote memory access time is 400 ns, this corresponds to a speedup of 2.43!
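A short Python check of this arithmetic, assuming the computation is bound by memory accesses (runtime proportional to the average access latency) and that the two processors split the accesses evenly:

    # One processor: 80% cache hits (2 ns), 20% local DRAM (100 ns).
    avg_access_1p = 0.80 * 2 + 0.20 * 100                 # 21.6 ns per access

    # Two processors: 90% cache hits, 8% local DRAM, 2% remote memory (400 ns).
    avg_access_2p = 0.90 * 2 + 0.08 * 100 + 0.02 * 400    # 17.8 ns per access

    # With N total accesses, T_1 = N * 21.6 ns and T_2 = (N / 2) * 17.8 ns,
    # so the speedup is 2 * 21.6 / 17.8.
    speedup = 2 * avg_access_1p / avg_access_2p
    print(round(speedup, 2))                              # 2.43 -- superlinear (> 2)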

5. Performance metrics for parallel systems: Efficiency

Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
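Efficiency is commonly computed as E = S / p, the speedup divided by the number of processing elements (a standard formula, though not written out above). A minimal sketch using the earlier adding-n-numbers example, with unit cost assumed for each addition and communication step:

    import math

    def efficiency(t_serial, t_parallel, p):
        """Efficiency E = S / p, where S = T_S / T_P is the speedup on p elements."""
        return (t_serial / t_parallel) / p

    # Adding n numbers on n processing elements: T_S ~ n, T_P ~ log2(n).
    n = 16
    print(efficiency(t_serial=n, t_parallel=math.log2(n), p=n))   # 0.25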

    File systems for parallel I/O

The section opens with a survey of "The Vesta parallel file system" (Peter F. Corbett and Dror G. Feitelson, 1996). IBM's AIX Parallel I/O File System, offered on the SP2, is based on Vesta. While many previous parallel file systems hide how a file is distributed (striped) across nodes, Vesta provides user interface-level access to a file's parallel structure. Vesta also aims to facilitate file access via different types of decomposition of a file's data (i.e., data might be accessed by row, and also by column). A Vesta file has a two-dimensional structure built from cells. A cell is defined as "a container where data can be deposited, or alternatively, as virtual I/O nodes that are then mapped to available physical I/O nodes." Thus, one file dimension is the cell dimension, and the other is the data within the cell. As a file is written, a parameter specifies how many cells it can occupy. Given that the number of cells determines the amount of parallelism a file uses, that feature is especially useful in the presence of small files, since distributing a small file over many nodes results in potentially poor access to that file. The article describes many other Vesta features, and offers an evaluation of the file system's performance using a parallel sorting application as an example.

    "The Zebra striped network file system," (John H. Hartman and John K. Ousterhout, 1995),describes an extension to log structured file systems. In such a system, "each client forms its newdata for all files into a sequential log that it stripes across the storage servers." That batching of

    writes from each node results in much improved performance, especially in the presence of many

    small writes. Zebra also takes advantage of RAID-style redundancy techniques: It can recoverfrom a single server failure without loss of information (but not from multiple concurrent server

    failures). File block creations, updates, and deletes are communicated between nodes in the form

    of deltas. The paper gives an overview of log-structured file systems, and then outlines Zebra's


    components (including a "stripe cleaner" to reclaim unused space). It concludes with a

    performance evaluation focusing on small-file writes.

Many parallel file systems hide the striping and other data distribution strategies from client applications in order to simplify the interface application programmers must be aware of. However, that facade also makes it difficult to experiment with, and explore, distributed file system characteristics in general. "PPFS: A High-Performance Portable Parallel File System" (James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S. Blumenthal), which is an updated version of a 1995 original paper, defines a parallel file system that lends itself to experimentation. It is a rich user-level library implemented on top of an operating system's native file system, effectively interposing itself between that OS and an application program. The article outlines PPFS's design philosophy, and provides two benchmark application results, one from a read-intensive gene sequencing application, and the other based on a write-intensive electron scattering code from plasma physics.

    "The Global File system" (Steven R. Soltis, Thomas M. Ruwart, Grant M. Erickson, Kenneth W.

    Preslan, and Matthew T. O'Keefe, 1996) is a cluster-based distributed file system. Instead offocusing on serving a large number of clients, GFS aims to deliver high performance to arelatively small number of systems. Its target applications are those requiring high bandwidth

    and large amounts of storage place, such as multimedia information systems. GFS takes

    advantage of device locks to ensure data consistency, and outlines a file access pattern that issomewhat similar to how processes access shared memory in an SMP. In GFS, clients obtain a

    lock when they read or write data on shared network storage, and those locks correspond to

    device-level locks on the storage system; in the prototype implementation, those locks areimplemented with the SCSI Dlock command. Because that design allows GFS to atomically

    modify data, GFS clients can remain unaware of each other's presence. To provide a unified

    logical storage place, GFS collects storage devices into a network storage pool; subpools divide

    network storage pools based on device type. GFS enables clients to export file systems tomachines not directly connected to a storage pool: a GFS client can act as an NFS server, and

    can even serve HTTP clients, enabling access to the storage pool via the Web.

    "Serverless network file systems" (1996, Thomas E. Anderson, Michael D. Dahlin, Jeanna M.Neefe Matthews, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang) describes xFS,

    the file system for the network of workstations (NOW) system. NOW defines a cooperation

    mechanism for independent workstations. For storage, that implies that there is no central server,and that file system services are provided collectively by the workstations: Any workstation on

    the network can store, cache, and access any block of data. Since any machine on the network

    can fail as well, xFS also provides for the redundancy of data. xFS incorporates previous

    research and extends it to enable a serverless mode of operation. The paper discusses RAID, log-structured storage, Zebra, different multiprocessor cache consistency mechanisms, and then

    focuses on how that research is adapted to xFS's unique needs. For instance, xFS distributes file

    system management services among the workstations: It attempts to assign a file used by a clientto a manager located on that client machine. The paper describes a simulation study showing that

    co-locating a file's management with that file's client improves locality, and thereby significantly

    reduces network-bound client requests. Data distribution in xFS is achieved by a softwareimplementation of RAID, utilizing a Zebra-like log-based striping.


    Evaluating performance metrics for parallel systems:

In conducting any evaluation, we need to identify a set of performance metrics that we would like to measure and the techniques and tools that will be used to gather these metrics.

Metrics which capture the available compute power are often not a true indicator of the performance actually delivered by a parallel system. Metrics for parallel systems performance evaluation should quantify this gap between available and delivered compute power, since understanding application and architectural bottlenecks is crucial for application restructuring and architectural enhancements. Many performance metrics, such as speedup, scaled speedup, and isoefficiency, have been proposed to quantify the match between the application and the architecture in a parallel system. While these metrics are useful for tracking overall performance trends, they provide little additional information about where performance is lost. Some of these metrics attempt to identify the cause of the problem when the parallel system does not scale as expected. However, it is essential to find the individual application and architectural artifacts that lead to these bottlenecks and quantify their relative contribution towards limiting the overall scalability of the system. Traditional metrics do not help further in this regard.

Parallel system overheads may be broadly classified into a purely algorithmic component, a component arising from the interaction of the application with the system software, and a component arising from the interaction of the application with the hardware.

Evaluation Techniques

Experimentation, analytical modeling, and simulation are three well-known techniques for evaluating parallel systems. Experimentation involves implementing the application on the actual hardware and measuring its performance. Analytical models abstract hardware and application details in a parallel system and capture complex system features by simple mathematical formulae. These formulae are usually parameterized by a limited number of degrees of freedom so that the analysis is kept tractable. Simulation is a valuable technique which exploits computer resources to model and imitate the behavior of a real system in a controlled manner. Each technique has its own limitations. The amount of statistics that can be gleaned by experimentation (to quantify the overhead functions) is limited by the monitoring/instrumentation support provided by the underlying system, and additional instrumentation can sometimes disturb the evaluation. Analytical models are often criticized for the unrealism and simplifying assumptions made in expressing the complex interaction between the application and the architecture. Simulation of a realistic computer system demands considerable resources in terms of space and time.
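As a toy illustration of the analytical-modeling approach (this model and its parameters are made up for illustration and are not taken from any of the papers above), one might capture a parallel system with a two-parameter formula such as T_P(p) = T_S / p + t_comm * log2(p), where t_comm lumps all per-step communication overhead into a single degree of freedom:

    import math

    def predicted_parallel_time(t_serial, p, t_comm):
        """Toy analytical model: ideal division of work plus a logarithmic communication term."""
        return t_serial / p + t_comm * math.log2(p)

    # Predict runtime and speedup for an assumed T_S = 100 s and t_comm = 0.5 s per step.
    t_serial, t_comm = 100.0, 0.5
    for p in (2, 4, 8, 16, 32):
        t_p = predicted_parallel_time(t_serial, p, t_comm)
        print(f"p={p:2d}  T_P={t_p:6.2f} s  speedup={t_serial / t_p:5.2f}")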

