Pipelined Instruction Processing
In
Computer Organization and Architecture
Vishal Patyal
Reg. No = 3050060122
97806-66742 (M)
(0190) 5230-650 (Home)
Term Paper for CSE-211
Computer Arithmetic
Fifth Term 2008
ABSTRACT
This term paper is a minor project for Computer System Organization and Architecture in which I studied performance metrics for parallel systems, a topic that will be very helpful to us in the IT industry. One frequently needs to compare the performance of two or more parallel computers, but how should this be done? This report discusses some of these issues so that the problems they can cause may be avoided in the future. We review the many performance metrics that have been proposed for parallel systems (i.e., program-architecture combinations). These include the many variants of speedup, efficiency, and isoefficiency. We give reasons why none of these metrics should be used independently of the run time of the parallel system.
Keywords: Principle, Problems and Solutions, Advantages & Disadvantages, Examples, Implementation, Development, Conclusion, References
Acknowledgements:-
First of all, I would like to express my sincere gratitude to the Almighty for encouraging me to complete this term paper. The following are some important people in my life who gave me strength and valuable suggestions to complete the task.
First, my parents and friends, whose love and affection give me the strength and encouragement to face the challenges of life.
Second, my teacher Bijoy Chatterjee, whose inspiration, motivation and spiritual guidance always keep me on the right path and keep my faith in God Almighty, without whose blessings nothing is possible.
Finally, thanks to Lovely Professional University, which gave me the great opportunity to prepare this term paper.
DATED: VISHAL PATYAL
3050060122
CONTENTS:
An introduction to performance metrics
1. Performance Metrics for Parallel Systems
    1. Execution time
    2. Total parallel overhead
1. Performance metrics for parallel systems: Execution time
2. Performance metrics for parallel systems: Total parallel overhead
Performance metrics for parallel systems: Speedup
Performance metrics for parallel systems: Superlinear speedups
Performance metrics for parallel systems: Efficiency
File systems for parallel I/O
Evaluating performance metrics for parallel systems
An introduction to performance metrics
1. The term performance
Performance can be considered a sub-characteristic of dependability, although the classical literature does not include it in that definition. Users may perceive the performance of a system as bad, for example, because of high response times or requests denied due to overload.
Some of the most commonly used performance metrics are response time, throughput and
utilization.
Response time is defined as the time interval between a user's request for a service and the response of the system. This metric is sometimes called responsiveness. As responses often cannot be delivered instantaneously, there are two possible implications of this definition: the response time is defined either as the time interval between the user issuing a request and the system starting to respond, or as the interval until the system finishes responding.
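As a concrete illustration (a minimal sketch in Python; the service call is a stand-in and not from the text), the response time of a single request can be measured as the wall-clock interval from issuing the request to receiving the complete response:

    import time

    def handle_request(payload):
        # Stand-in for the service being measured; any callable would do.
        time.sleep(0.05)   # simulate 50 ms of processing
        return "ok"

    start = time.perf_counter()                    # user issues the request
    result = handle_request("example")             # system processes and responds
    response_time = time.perf_counter() - start    # interval until the response finishes

    print(f"Response time: {response_time * 1000:.1f} ms")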
Performance metrics are measures of an organization's activities and performance. Performance metrics should support a range of stakeholder needs, from customers and shareholders to employees. While traditionally many metrics are finance based and focus inwardly on the performance of the organization, metrics may also focus on performance against customer requirements and value.
Developing performance metrics usually follows a process of:
1. Establishing critical processes/customer requirements,
2. Developing measures,
3. Establishing targets which the results can be scored against.
A criticism of performance metrics is that when the value of information is computed using mathematical methods, it shows that even performance metrics professionals choose measures that have little value. This is referred to as the "measurement inversion". For example, metrics seem to emphasize what organizations find immediately measurable, even if those are of low value, and tend to ignore high-value measurements simply because they seem harder to measure (whether they are or not).
Performance Metrics for Parallel Systems
It is important to study the performance of parallel programs with a view to determining the best
algorithm, evaluating hardware platforms, and examining the benefits from parallelism. A
number of metrics have been used based on the desired outcome of performance analysis.
1. Execution time
The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment a parallel computation starts to the moment the last processing element finishes execution. We denote the serial runtime by TS and the parallel runtime by TP.
2. Total parallel overhead
The overheads incurred by a parallel program are encapsulated into a single expression referred to as the overhead function. We define the overhead function, or total overhead, of a parallel system as the total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element. We denote the overhead function of a parallel system by the symbol To.
The total time spent in solving a problem, summed over all processing elements, is p TP. TS units of this time are spent performing useful work, and the remainder is overhead. Therefore, the overhead function (To) is given by

    To = p TP - TS
1. Performance metrics for parallel systems: Execution time
The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution. We denote the serial runtime by TS and the parallel runtime by TP.
2. Performance metrics for parallel systems: Total parallel overhead
Let Tall be the total time collectively spent by all the processing elements, and let TS be the serial time. Observe that Tall - TS is then the total time spent by all processors combined on non-useful work. This is called the total overhead. The total time collectively spent by all the processing elements is Tall = p TP (where p is the number of processors). The overhead function (To) is therefore given by

    To = p TP - TS
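As a minimal numeric sketch (Python; the values below are illustrative and are not taken from the paper), the overhead function can be computed directly from p, TP and TS:

    def total_overhead(p, t_parallel, t_serial):
        # To = p*TP - TS: time spent by all processors beyond the best serial time.
        return p * t_parallel - t_serial

    # Illustrative values: 8 processors, TP = 2.0 s, best serial time TS = 12.0 s.
    print(total_overhead(8, 2.0, 12.0))  # 8*2.0 - 12.0 = 4.0 time units of overhead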
4. Performance metrics for parallel systems: Speedup
Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements:

    S = TS / TP
Performance metrics: Speedup example
Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements.
Consider the problem of adding n numbers by using n processing elements. If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.
Example (figure omitted): computing the global sum of 16 partial sums using 16 processing elements, where the sum labelled from i to j denotes the sum of the numbers with consecutive labels from i to j.
Performance metrics: Example (continued)
If an addition takes constant time, say tc, and communication of a single word takes time ts + tw, we have a parallel time of TP = Θ(log n). We know that TS = Θ(n), so the speedup S is given by S = Θ(n / log n).
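A small sketch (Python; illustrative only, not part of the original example) mimics this pairwise-summation tree and confirms that 16 values are combined in log2(16) = 4 parallel steps:

    def tree_sum(values):
        # Pairwise reduction: each pass halves the number of active elements,
        # so the number of passes (parallel steps) is ceil(log2(n)).
        steps = 0
        while len(values) > 1:
            values = [values[i] + values[i + 1] if i + 1 < len(values) else values[i]
                      for i in range(0, len(values), 2)]
            steps += 1
        return values[0], steps

    total, steps = tree_sum(list(range(16)))   # 16 numbers, one per "processor"
    print(total, steps)                        # 120, 4 steps = log2(16)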
Performance metrics: Speedup
For a given problem, there might be many serial algorithms available. These algorithms
may have different asymptotic runtimes and may be parallelizable to different degrees.
For the purpose of computing speedup, we always consider the best sequential program as the baseline.
Consider the problem of parallel bubble sort. The serial time for bubble sort is 150 seconds.
The parallel time for odd-even sort (efficient parallelization of bubble sort) is 40 seconds.
The speedup would appear to be 150/40 = 3.75.
But is this really a fair assessment of the system?
What if serial quicksort only took 30 seconds? In this case, the speedup is 30/40 = 0.75. This is a more realistic assessment of the system.
5. Performance metrics for parallel systems: Superlinear speedups
Resource-based superlinearity: the higher aggregate cache/memory bandwidth can result in better cache-hit ratios, and therefore superlinearity.
Example: A processor with 64 KB of cache yields an 80% hit ratio. If two processors are used, then, since the problem size per processor is smaller, the hit ratio goes up to 90%. Of the remaining 10% of accesses, 8% come from local memory and 2% from remote memory. If DRAM access time is 100 ns, cache access time is 2 ns, and remote memory access time is 400 ns, this corresponds to a speedup of 2.43!
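The 2.43 figure can be reproduced from these numbers under the assumption that execution time is dominated by memory-access cost; a brief sketch (Python):

    # Access times from the example (ns).
    t_cache, t_dram, t_remote = 2.0, 100.0, 400.0

    # One processor: 80% cache hits, 20% DRAM accesses.
    one_proc = 0.80 * t_cache + 0.20 * t_dram                     # 21.6 ns per access

    # Two processors: 90% hits, 8% local DRAM, 2% remote memory.
    two_procs = 0.90 * t_cache + 0.08 * t_dram + 0.02 * t_remote  # 17.8 ns per access

    speedup = 2 * one_proc / two_procs   # two processors, each with cheaper accesses
    print(round(speedup, 2))             # ~2.43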
6. Performance metrics for parallel systems: Efficiency
Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
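Although the text stops at the definition, efficiency is conventionally computed as E = S / p, i.e., the speedup divided by the number of processing elements. A short sketch (Python), continuing the n-number addition example where S = Θ(n / log n) on p = n processors:

    def efficiency(speedup, p):
        # E = S / p: the fraction of time each processor spends doing useful work.
        return speedup / p

    # Adding n = 16 numbers on 16 processors: S ~ n / log2(n) = 4, so E = 0.25.
    print(efficiency(16 / 4, 16))  # 0.25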
File systems for parallel I/O
The section opens with a survey of "The Vesta parallel file system" (Peter F. Corbett and Dror G. Feitelson, 1996). IBM's AIX Parallel I/O File System, offered on the SP2, is based on Vesta. While many previous parallel file systems hide how a file is distributed (striped) across nodes, Vesta provides user interface-level access to a file's parallel structure. Vesta also aims to facilitate file access via different types of decomposition of a file's data (i.e., data might be accessed by row, and also by column). A Vesta file has a 2-dimensional structure, called a cell. A cell is defined as "a container where data can be deposited, or alternatively, as virtual I/O nodes that are then mapped to available physical I/O nodes." Thus, one file dimension is the cell dimension, and the other is the data within the cell. As a file is written, a parameter specifies how many cells it can occupy. Given that the number of cells determines the amount of parallelism a file uses, that feature is especially useful in the presence of small files, since distributing a small file over many nodes results in potentially poor access to that file. The article describes many other Vesta features, and offers an evaluation of the file system's performance using a parallel sorting application as an example.
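To make the idea of distributing a file's data across I/O nodes concrete, here is a toy sketch (Python). The round-robin block-to-cell mapping and the names used are hypothetical illustrations, not Vesta's actual interface:

    def block_to_cell(block_index, num_cells):
        # Round-robin striping: block k is placed in cell k mod num_cells.
        # Fewer cells mean less parallelism, but a small file is not scattered
        # across many I/O nodes (the trade-off discussed above).
        return block_index % num_cells

    # A small file written into 2 cells versus a larger file written into 8 cells.
    print([block_to_cell(b, 2) for b in range(6)])   # [0, 1, 0, 1, 0, 1]
    print([block_to_cell(b, 8) for b in range(6)])   # [0, 1, 2, 3, 4, 5]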
"The Zebra striped network file system," (John H. Hartman and John K. Ousterhout, 1995),describes an extension to log structured file systems. In such a system, "each client forms its newdata for all files into a sequential log that it stripes across the storage servers." That batching of
writes from each node results in much improved performance, especially in the presence of many
small writes. Zebra also takes advantage of RAID-style redundancy techniques: It can recoverfrom a single server failure without loss of information (but not from multiple concurrent server
failures). File block creations, updates, and deletes are communicated between nodes in the form
of deltas. The paper gives an overview of log-structured file systems, and then outlines Zebra's
components (including a "stripe cleaner" to reclaim unused space). It concludes with a
performance evaluation focusing on small-file writes.
Many parallel file systems hide the striping and other data distribution strategies from client applications in order to simplify the interface that application programmers must be aware of. However, that facade also makes it difficult to experiment with, and explore, distributed file system characteristics in general. "PPFS: A High-Performance Portable Parallel File System" (James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S. Blumenthal), which is an updated version of a 1995 original paper, defines a parallel file system that lends itself to experimentation. It is a rich user-level library implemented on top of an operating system's native file system, effectively interposing itself between that OS and an application program. The article outlines PPFS's design philosophy, and provides two benchmark application results, one from a read-intensive gene sequencing application, and the other based on a write-intensive electron scattering code from plasma physics.
"The Global File system" (Steven R. Soltis, Thomas M. Ruwart, Grant M. Erickson, Kenneth W.
Preslan, and Matthew T. O'Keefe, 1996) is a cluster-based distributed file system. Instead offocusing on serving a large number of clients, GFS aims to deliver high performance to arelatively small number of systems. Its target applications are those requiring high bandwidth
and large amounts of storage place, such as multimedia information systems. GFS takes
advantage of device locks to ensure data consistency, and outlines a file access pattern that issomewhat similar to how processes access shared memory in an SMP. In GFS, clients obtain a
lock when they read or write data on shared network storage, and those locks correspond to
device-level locks on the storage system; in the prototype implementation, those locks areimplemented with the SCSI Dlock command. Because that design allows GFS to atomically
modify data, GFS clients can remain unaware of each other's presence. To provide a unified
logical storage place, GFS collects storage devices into a network storage pool; subpools divide
network storage pools based on device type. GFS enables clients to export file systems tomachines not directly connected to a storage pool: a GFS client can act as an NFS server, and
can even serve HTTP clients, enabling access to the storage pool via the Web.
"Serverless network file systems" (1996, Thomas E. Anderson, Michael D. Dahlin, Jeanna M.Neefe Matthews, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang) describes xFS,
the file system for the network of workstations (NOW) system. NOW defines a cooperation
mechanism for independent workstations. For storage, that implies that there is no central server,and that file system services are provided collectively by the workstations: Any workstation on
the network can store, cache, and access any block of data. Since any machine on the network
can fail as well, xFS also provides for the redundancy of data. xFS incorporates previous
research and extends it to enable a serverless mode of operation. The paper discusses RAID, log-structured storage, Zebra, different multiprocessor cache consistency mechanisms, and then
focuses on how that research is adapted to xFS's unique needs. For instance, xFS distributes file
system management services among the workstations: It attempts to assign a file used by a clientto a manager located on that client machine. The paper describes a simulation study showing that
co-locating a file's management with that file's client improves locality, and thereby significantly
reduces network-bound client requests. Data distribution in xFS is achieved by a softwareimplementation of RAID, utilizing a Zebra-like log-based striping.
Evaluating performance metrics for parallel systems:
In conducting any evaluation, we need to identify a set of performance metrics that we would
like to measure and the techniques and tools that will be used to gather these metrics.
Metrics which capture the available compute power are often not a true indicator of the performance actually delivered by a parallel system. Metrics for parallel systems performance evaluation should quantify this gap between available and delivered compute power, since understanding application and architectural bottlenecks is crucial for application restructuring and architectural enhancements. Many performance metrics, such as speedup, scaled speedup and isoefficiency, have been proposed to quantify the match between the application and the architecture in a parallel system. While these metrics are useful for tracking overall performance trends, they provide little additional information about where performance is lost. Some of these metrics attempt to identify the cause of the problem when the parallel system does not scale as expected. However, it is essential to find the individual application and architectural artifacts that lead to these bottlenecks and quantify their relative contribution towards limiting the overall scalability of the system. Traditional metrics do not help further in this regard.
Parallel system overheads may be broadly classified into a purely algorithmic component, a component arising from the interaction of the application with the system software, and a component arising from the interaction of the application with the hardware.
2.2 Evaluation Techniques
Experimentation, analytical modeling and simulation are three well-known techniques for evaluating parallel systems. Experimentation involves implementing the application on the actual hardware and measuring its performance. Analytical models abstract hardware and application details in a parallel system and capture complex system features by simple mathematical formulae. These formulae are usually parameterized by a limited number of degrees of freedom so that the analysis is kept tractable. Simulation is a valuable technique which exploits computer resources to model and imitate the behavior of a real system in a controlled manner. Each technique has its own limitations. The amount of statistics that can be gleaned by experimentation (to quantify the overhead functions) is limited by the monitoring/instrumentation support provided by the underlying system. Additional instrumentation can sometimes disturb the evaluation. Analytical models are often criticized for the unrealism of the simplifying assumptions made in expressing the complex interaction between the application and the architecture. Simulating a realistic computer system demands considerable resources in terms of space and time.
References:
1. http://www.google.com
2. http://www.wikipedia.com
3. http://www.guruji.com
4. http://www.google.com/introduction+to+performance+metrics&btnG=Search
5. http://scholar.google.com/s
6. http://www.stormingmedia.us/html