

    Pipelined Instruction Processing

    In

    Computer Organization and Architecture

    Vishal Patyal

    Reg. No = 3050060122

    97806-66742 (M)

    (0190) 5230-650 (Home)

    [email protected]

    Term Paper for CSE-211

    Computer Arithmetic

    Fifth Term 2008

    ABSTRACT

This term paper is a minor project for Computer System Organization and Architecture in which I studied the performance metrics for parallel systems, a topic that will be helpful in the future in the field of the IT industry. One frequently needs to compare the performance of two or more parallel computers, but how should this be done? This report discusses some of these issues so that the problems they can cause may be avoided in the future. We review the many performance metrics that have been proposed for parallel systems (i.e., program-architecture combinations). These include the many variants of speedup, efficiency, and isoefficiency. We give reasons why none of these metrics should be used independently of the run time of the parallel system.

Keywords: Principle, Problems and Solutions, Advantages & Disadvantages, Examples, Implementation, Development, Conclusion, References


    Acknowledgements:-

First of all, I would like to express my sincere gratitude to the Almighty for encouraging me to complete this term paper. The following are some important people in my life who gave me strength and valuable suggestions to complete the task.

First, my parents and friends, whose love and affection give me the strength and encouragement to face the challenges of life.

Second, my teacher Bijoy Chatterjee, whose inspiration, motivation, and spiritual guidance always keep me on the right path and keep my faith in God Almighty, without whose blessings nothing is possible.

Finally, thanks to Lovely Professional University, which gave me a great opportunity to write this term paper.

    DATED: VISHAL PATYAL

    3050060122


CONTENTS:

An introduction to performance metrics

Performance Metrics for Parallel Systems

1. Performance metrics for parallel systems: Execution time
2. Performance metrics for parallel systems: Total parallel overhead
3. Performance metrics for parallel systems: Speedup
4. Performance metrics for parallel systems: Superlinear speedup
5. Performance metrics for parallel systems: Efficiency

File systems for parallel I/O

Evaluating performance metrics for parallel systems


    An introduction to performance metrics

    1. The term performance

Performance can be considered a sub-characteristic of dependability, although the classical literature does not include it in its definition. If users perceive the performance of a system as bad, for example because of high response times or requests denied due to overload, they will also perceive the system as less dependable.

    Some of the most commonly used performance metrics are response time, throughput and

    utilization.

Response time is defined as the time interval between a user's request for a service and the response of the system. This metric is sometimes called responsiveness. Because responses often cannot be delivered instantaneously, there are two possible interpretations of this definition: the response time is either the interval between the user issuing a request and the system starting to respond, or the interval until the system finishes responding.
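To make this metric concrete, here is a minimal Python sketch (not part of the original paper) that measures response time under the second interpretation, i.e., until the response is finished; the service() function is a made-up stand-in for whatever request the user issues.

    import time

    def service():
        """Hypothetical service call; stands in for any user request."""
        time.sleep(0.05)          # simulate 50 ms of processing
        return "response"

    # Response time = interval between issuing the request and receiving
    # the complete response (the "finishing" interpretation above).
    start = time.perf_counter()
    result = service()
    response_time = time.perf_counter() - start

    print(f"response time: {response_time * 1000:.1f} ms")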

Performance metrics are measures of an organization's activities and performance. Performance metrics should support a range of stakeholder needs, from customers and shareholders to employees. While traditionally many metrics are finance based, focusing inwardly on the performance of the organization, metrics may also focus on performance against customer requirements and value.

    Developing performance metrics usually follows a process of:

1. Establishing critical processes/customer requirements,
2. Developing measures,
3. Establishing targets against which the results can be scored.

A criticism of performance metrics is that when the value of information is computed using mathematical methods, it turns out that even performance metrics professionals choose measures that have little value. This is referred to as the "measurement inversion". For example, metrics seem to emphasize what organizations find immediately measurable, even if it is of low value, and tend to ignore high-value measurements simply because they seem harder to measure (whether they are or not).


    Performance Metrics for Parallel Systems

    It is important to study the performance of parallel programs with a view to determining the best

    algorithm, evaluating hardware platforms, and examining the benefits from parallelism. A

    number of metrics have been used based on the desired outcome of performance analysis.

    1. Execution time

The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment a parallel computation starts to the moment the last processing element finishes execution. We denote the serial runtime by T_S and the parallel runtime by T_P.

2. Total parallel overhead

The overheads incurred by a parallel program are encapsulated into a single expression referred to as the overhead function. We define the overhead function, or total overhead, of a parallel system as the total time collectively spent by all the processing elements over and above that required by the fastest known sequential algorithm for solving the same problem on a single processing element. We denote the overhead function of a parallel system by the symbol T_o.

The total time spent in solving a problem, summed over all processing elements, is p T_P. T_S units of this time are spent performing useful work, and the remainder is overhead. Therefore, the overhead function (T_o) is given by

T_o = p T_P - T_S
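As a small illustration of this definition (not from the original paper; the runtimes are assumed), the following Python sketch computes the combined processor time p*T_P and the overhead function T_o from a measured serial and parallel runtime.

    def total_overhead(t_serial, t_parallel, p):
        """Overhead function T_o = p*T_P - T_S.

        t_serial   -- serial runtime T_S of the fastest known sequential algorithm (s)
        t_parallel -- parallel runtime T_P on p processing elements (s)
        p          -- number of processing elements
        """
        t_all = p * t_parallel        # total time spent by all processing elements
        return t_all - t_serial       # everything beyond useful work is overhead

    # Assumed runtimes: T_S = 100 s, T_P = 30 s on p = 4 processing elements.
    t_s, t_p, p = 100.0, 30.0, 4
    print("T_all =", p * t_p, "s")                       # 120.0 s of combined time
    print("T_o   =", total_overhead(t_s, t_p, p), "s")   # 20.0 s of overhead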


1. Performance metrics for parallel systems: Execution time

The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer. The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution. We denote the serial runtime by T_S and the parallel runtime by T_P.

    2. Performance metrics for parallel systems: Total parallel overhead

Let T_all be the total time collectively spent by all the processing elements, and let T_S be the serial time. Observe that T_all - T_S is then the total time spent by all processors combined on non-useful work; this is called the total overhead. The total time collectively spent by all the processing elements is T_all = p T_P (where p is the number of processors). The overhead function (T_o) is therefore given by

T_o = p T_P - T_S

3. Performance metrics for parallel systems: Speedup

Speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p identical processing elements:

S = T_S / T_P


Performance metrics: Speedup example

Consider the problem of adding n numbers by using n processing elements. If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.

Example: computing the global sum of 16 partial sums using 16 processing elements, where Σ(i..j) denotes the sum of the numbers with consecutive labels from i to j.

Performance metrics: Example (continued)

If an addition takes constant time, say t_c, and communication of a single word takes time t_s + t_w, we have the parallel time T_P = Θ(log n). We know that T_S = Θ(n), so the speedup S is given by S = Θ(n / log n).
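To make the log n behaviour concrete, here is a small Python sketch (a sequential simulation, not real parallel code) that sums 16 numbers by combining neighbouring partial sums pairwise and counts how many parallel steps the binary-tree reduction takes.

    import math

    def tree_sum_steps(values):
        """Simulate a binary-tree reduction with one virtual processor per value.

        In every round all neighbouring pairs are combined "in parallel",
        halving the number of partial sums; the number of rounds is the
        parallel step count.
        """
        partial = list(values)
        steps = 0
        while len(partial) > 1:
            partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
            steps += 1
        return partial[0], steps

    total, steps = tree_sum_steps(range(1, 17))    # n = 16 numbers
    print(total, steps)                            # 136 and 4 steps
    print(steps == int(math.log2(16)))             # True: log2(16) parallel steps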

    Performance metrics: Speedup

For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes and may be parallelizable to different degrees. For the purpose of computing speedup, we always consider the best sequential program as the baseline.

Consider the problem of parallel bubble sort. Suppose the serial time for bubble sort is 150 seconds, and the parallel time for odd-even sort (an efficient parallelization of bubble sort) is 40 seconds. The speedup would appear to be 150/40 = 3.75. But is this really a fair assessment of the system? What if serial quicksort took only 30 seconds? In this case, the speedup is 30/40 = 0.75, which is a more realistic assessment of the system.
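A tiny sketch of this baseline rule, using the assumed timings above:

    def speedup(t_best_serial, t_parallel):
        """Speedup S = T_S / T_P, with T_S taken from the best sequential program."""
        return t_best_serial / t_parallel

    t_parallel_odd_even = 40.0                   # assumed parallel odd-even sort time (s)
    print(speedup(150.0, t_parallel_odd_even))   # 3.75 -- misleading: bubble sort baseline
    print(speedup(30.0, t_parallel_odd_even))    # 0.75 -- fair: quicksort baseline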


4. Performance metrics for parallel systems: Superlinear speedup

Resource-based superlinearity: the higher aggregate cache/memory bandwidth can result in better cache-hit ratios, and therefore superlinearity.

Example: a processor with 64 KB of cache yields an 80% hit ratio. If two processors are used, then since the problem size per processor is smaller, the hit ratio goes up to 90%. Of the remaining 10% of accesses, 8% come from local memory and 2% from remote memory. If DRAM access time is 100 ns, cache access time is 2 ns, and remote memory access time is 400 ns, this corresponds to a speedup of 2.43!
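A short Python check of this arithmetic, assuming the computation is bound by memory accesses (runtime proportional to the average access latency) and that the two processors split the accesses evenly:

    # One processor: 80% cache hits (2 ns), 20% local DRAM (100 ns).
    avg_access_1p = 0.80 * 2 + 0.20 * 100                 # 21.6 ns per access

    # Two processors: 90% cache hits, 8% local DRAM, 2% remote memory (400 ns).
    avg_access_2p = 0.90 * 2 + 0.08 * 100 + 0.02 * 400    # 17.8 ns per access

    # With N total accesses, T_1 = N * 21.6 ns and T_2 = (N / 2) * 17.8 ns,
    # so the speedup is 2 * 21.6 / 17.8.
    speedup = 2 * avg_access_1p / avg_access_2p
    print(round(speedup, 2))                              # 2.43 -- superlinear (> 2)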

5. Performance metrics for parallel systems: Efficiency

Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
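Efficiency is commonly computed as E = S / p, the speedup divided by the number of processing elements (a standard formula, though not written out above). A minimal sketch using the earlier adding-n-numbers example, with unit cost assumed for each addition and communication step:

    import math

    def efficiency(t_serial, t_parallel, p):
        """Efficiency E = S / p, where S = T_S / T_P is the speedup on p elements."""
        return (t_serial / t_parallel) / p

    # Adding n numbers on n processing elements: T_S ~ n, T_P ~ log2(n).
    n = 16
    print(efficiency(t_serial=n, t_parallel=math.log2(n), p=n))   # 0.25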

    File systems for parallel I/O

The section opens with a survey of "The Vesta parallel file system" (Peter F. Corbett and Dror G. Feitelson, 1996). IBM's AIX Parallel I/O File System, offered on the SP2, is based on Vesta. While many previous parallel file systems hide how a file is distributed (striped) across nodes, Vesta provides user interface-level access to a file's parallel structure. Vesta also aims to facilitate file access via different types of decomposition of a file's data (i.e., data might be accessed by row, and also by column). A Vesta file has a two-dimensional structure built from cells. A cell is defined as "a container where data can be deposited, or alternatively, as virtual I/O nodes that are then mapped to available physical I/O nodes." Thus, one file dimension is the cell dimension, and the other is the data within the cell. As a file is written, a parameter specifies how many cells it can occupy. Given that the number of cells determines the amount of parallelism a file uses, that feature is especially useful in the presence of small files, since distributing a small file over many nodes results in potentially poor access to that file. The article describes many other Vesta features, and offers an evaluation of the file system's performance using a parallel sorting application as an example.

    "The Zebra striped network file system," (John H. Hartman and John K. Ousterhout, 1995),describes an extension to log structured file systems. In such a system, "each client forms its newdata for all files into a sequential log that it stripes across the storage servers." That batching of

    writes from each node results in much improved performance, especially in the presence of many

    small writes. Zebra also takes advantage of RAID-style redundancy techniques: It can recoverfrom a single server failure without loss of information (but not from multiple concurrent server

    failures). File block creations, updates, and deletes are communicated between nodes in the form

    of deltas. The paper gives an overview of log-structured file systems, and then outlines Zebra's


    components (including a "stripe cleaner" to reclaim unused space). It concludes with a

    performance evaluation focusing on small-file writes.

Many parallel file systems hide the striping and other data distribution strategies from client applications in order to simplify the interface application programmers must be aware of. However, that facade also makes it difficult to experiment with, and explore, distributed file system characteristics in general. "PPFS: A High-Performance Portable Parallel File System" (James V. Huber, Jr., Christopher L. Elford, Daniel A. Reed, Andrew A. Chien, and David S. Blumenthal), which is an updated version of a 1995 original paper, defines a parallel file system that lends itself to experimentation. It is a rich user-level library implemented on top of an operating system's native file system, effectively interposing itself between that OS and an application program. The article outlines PPFS's design philosophy, and provides two benchmark application results, one from a read-intensive gene sequencing application, and the other based on a write-intensive electron scattering code from plasma physics.

    "The Global File system" (Steven R. Soltis, Thomas M. Ruwart, Grant M. Erickson, Kenneth W.

    Preslan, and Matthew T. O'Keefe, 1996) is a cluster-based distributed file system. Instead offocusing on serving a large number of clients, GFS aims to deliver high performance to arelatively small number of systems. Its target applications are those requiring high bandwidth

    and large amounts of storage place, such as multimedia information systems. GFS takes

    advantage of device locks to ensure data consistency, and outlines a file access pattern that issomewhat similar to how processes access shared memory in an SMP. In GFS, clients obtain a

    lock when they read or write data on shared network storage, and those locks correspond to

    device-level locks on the storage system; in the prototype implementation, those locks areimplemented with the SCSI Dlock command. Because that design allows GFS to atomically

    modify data, GFS clients can remain unaware of each other's presence. To provide a unified

    logical storage place, GFS collects storage devices into a network storage pool; subpools divide

    network storage pools based on device type. GFS enables clients to export file systems tomachines not directly connected to a storage pool: a GFS client can act as an NFS server, and

    can even serve HTTP clients, enabling access to the storage pool via the Web.

    "Serverless network file systems" (1996, Thomas E. Anderson, Michael D. Dahlin, Jeanna M.Neefe Matthews, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang) describes xFS,

    the file system for the network of workstations (NOW) system. NOW defines a cooperation

    mechanism for independent workstations. For storage, that implies that there is no central server,and that file system services are provided collectively by the workstations: Any workstation on

    the network can store, cache, and access any block of data. Since any machine on the network

    can fail as well, xFS also provides for the redundancy of data. xFS incorporates previous

    research and extends it to enable a serverless mode of operation. The paper discusses RAID, log-structured storage, Zebra, different multiprocessor cache consistency mechanisms, and then

    focuses on how that research is adapted to xFS's unique needs. For instance, xFS distributes file

    system management services among the workstations: It attempts to assign a file used by a clientto a manager located on that client machine. The paper describes a simulation study showing that

    co-locating a file's management with that file's client improves locality, and thereby significantly

    reduces network-bound client requests. Data distribution in xFS is achieved by a softwareimplementation of RAID, utilizing a Zebra-like log-based striping.


    Evaluating performance metrics for parallel systems:

In conducting any evaluation, we need to identify a set of performance metrics that we would like to measure and the techniques and tools that will be used to gather these metrics.

Metrics which capture the available compute power are often not a true indicator of the performance actually delivered by a parallel system. Metrics for parallel systems performance evaluation should quantify this gap between available and delivered compute power, since understanding application and architectural bottlenecks is crucial for application restructuring and architectural enhancements. Many performance metrics, such as speedup, scaled speedup, and isoefficiency, have been proposed to quantify the match between the application and the architecture in a parallel system. While these metrics are useful for tracking overall performance trends, they provide little additional information about where performance is lost. Some of these metrics attempt to identify the cause of the problem when the parallel system does not scale as expected. However, it is essential to find the individual application and architectural artifacts that lead to these bottlenecks and quantify their relative contribution towards limiting the overall scalability of the system. Traditional metrics do not help further in this regard.

Parallel system overheads may be broadly classified into a purely algorithmic component, a component arising from the interaction of the application with the system software, and a component arising from the interaction of the application with the hardware.

Evaluation Techniques

Experimentation, analytical modeling, and simulation are three well-known techniques for evaluating parallel systems. Experimentation involves implementing the application on the actual hardware and measuring its performance. Analytical models abstract hardware and application details in a parallel system and capture complex system features by simple mathematical formulae. These formulae are usually parameterized by a limited number of degrees of freedom so that the analysis is kept tractable. Simulation is a valuable technique which exploits computer resources to model and imitate the behavior of a real system in a controlled manner. Each technique has its own limitations. The amount of statistics that can be gleaned by experimentation (to quantify the overhead functions) is limited by the monitoring/instrumentation support provided by the underlying system, and additional instrumentation can sometimes disturb the evaluation. Analytical models are often criticized for the unrealism and simplifying assumptions made in expressing the complex interaction between the application and the architecture. Simulation of a realistic computer system demands considerable resources in terms of space and time.
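As a toy illustration of the analytical-modeling approach (this model and its parameters are made up for illustration and are not taken from any of the papers above), one might capture a parallel system with a two-parameter formula such as T_P(p) = T_S / p + t_comm * log2(p), where t_comm lumps all per-step communication overhead into a single degree of freedom:

    import math

    def predicted_parallel_time(t_serial, p, t_comm):
        """Toy analytical model: ideal division of work plus a logarithmic communication term."""
        return t_serial / p + t_comm * math.log2(p)

    # Predict runtime and speedup for an assumed T_S = 100 s and t_comm = 0.5 s per step.
    t_serial, t_comm = 100.0, 0.5
    for p in (2, 4, 8, 16, 32):
        t_p = predicted_parallel_time(t_serial, p, t_comm)
        print(f"p={p:2d}  T_P={t_p:6.2f} s  speedup={t_serial / t_p:5.2f}")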

