Understanding Performance Benchmarks

BY SHARON HANSON, DIEGO ESTEVES, AND CLINT ESPINOZA

Benchmarks provide objective information that can be used to
compare computer platforms, components, operating systems, and
specific system configurations. This article discusses
characteristics of credible benchmarks, guidelines for evaluating
benchmark results, and some of the main benchmarks used at Dell
for assessing the performance of server, workstation, and client
systems.
At their best, performance benchmarks provide impar-
tial information that can be used to evaluate and
compare the performance of computer systems. Dell and
the computer industry promote objective and credible
benchmarking in various ways, including participation in
standards bodies such as the Standard Performance Eval-
uation Corporation (SPEC), Business Applications Perfor-
mance Corporation (BAPCo), Transaction Processing
Performance Council (TPC), and Storage Performance
Council. When properly run and documented, the bench-
marks produced by these and other groups help provide
objective information that can be used to compare com-
puter platforms, components, operating systems, and spe-
cific system configurations.
Dell is committed to furthering industry practices that
yield objective industry-standard benchmark results. Orga-
nizations can use these benchmarks to evaluate and com-
pare Dell™ systems to competitors’ systems. Dell also uses
the benchmarks when developing new products and
assessing new technologies.
The Dell benchmark philosophy is based on three
tenets:
• Benchmark in a way that closely resembles how
organizations use applications on Dell systems
• Ensure that anyone can reproduce results with a
system shipped directly from Dell, using publicly
available drivers
• Promote benchmark and run-rule changes that
reflect this approach to benchmarking
This article discusses characteristics of credible bench-
marks and presents high-level guidelines for evaluating
benchmark results. It concludes with a list of the key
benchmarks used at Dell to evaluate server, workstation,
and client system performance.
Characteristics of credible performance benchmarks

A computer performance benchmark is a standard by
which a computer system can be measured and judged.
Many of the well-known benchmarks are developed and
regulated by standards organizations such as SPEC and
BAPCo. Just as common are unregulated benchmarks that
measure system performance when running specific appli-
cations such as Adobe® Photoshop®, Microsoft® Exchange,
Parametric® Pro/E®, or Id Software® Quake III® software.
These benchmarks can help administrators evaluate system
performance on a single, critical application such as Pro/E
or Microsoft Exchange. Such benchmarks can be run—and
their results reported—with varying degrees of flexibility.
In contrast, regulated benchmarks tend to have well-defined and
documented methodologies, and their results are documented and
reproducible. A good example is the SPEC® CPU2000 benchmark,
which is produced by SPEC, a nonprofit corporation. According to
SPEC, the organization’s mission is to establish, maintain, and
endorse a standardized set of relevant benchmarks. SPEC develops
suites of benchmarks and also reviews and publishes submitted
results from member organizations and other benchmark licensees.1
The SPEC organization has industry-wide representation and its
benchmark suites are well accepted and credible.
The SPEC CPU2000 benchmark provides performance measurements
that can be used to compare compute-intensive workloads (both
integer and floating point) on different computer systems. These
compute-intensive benchmarks measure the performance of a
system’s processor, memory architecture, and compiler. CPU2000
consists of a set of objective tests that must be compiled and
run according to SPEC run rules. SPEC provides the benchmarks as
source code so they can be compiled to run on a variety of
platforms, including industry-standard Intel® architecture–based
systems and SPARC® processor–based Sun™ systems.
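
To make "compute-intensive" concrete, the following Python sketch
times a small floating-point kernel and a small integer kernel. It
is only an illustration of the kind of work CPU2000 measures, not
SPEC code: the actual suite is licensed C, C++, and Fortran source
compiled under SPEC run rules, and the kernels and problem sizes
here are arbitrary.

# Toy illustration of compute-intensive (integer and floating-point)
# workloads of the kind SPEC CPU2000 measures. This is NOT SPEC code;
# kernels and problem sizes are arbitrary.
import time

def fp_kernel(n):
    """Floating-point work: midpoint-rule integration of x^2 on [0, 1]."""
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += x * x * step
    return total

def int_kernel(n):
    """Integer work: count primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        d = 2
        while d * d <= candidate:
            if candidate % d == 0:
                break
            d += 1
        else:                      # no divisor found: candidate is prime
            count += 1
    return count

for name, fn, arg in [("floating point", fp_kernel, 2_000_000),
                      ("integer", int_kernel, 50_000)]:
    start = time.perf_counter()
    fn(arg)
    print(f"{name} kernel: {time.perf_counter() - start:.3f} s")
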
In addition, SPEC provides
guidelines for legitimately opti-
mizing the performance of tested
systems on the benchmark. These
guidelines are designed to ensure that the hardware and software
configurations of tested systems are suitable to run real-world appli-
cations. The organization also requires a full disclosure report,
which provides benchmark results and configuration details suffi-
cient to independently reproduce the results. SPEC encourages
submission of reports for publication on the SPEC Web site
(http://www.spec.org). These reports undergo a peer-review process
before publication. Because of these rigorous requirements, CPU2000
benchmark results that are published on the SPEC Web site are
widely used to compare the CPU, memory, and compiler perfor-
mance of client and server systems.
BAPCo, TPC, and the Storage Performance Council are also
nonprofit corporations that provide industry-standard benchmarks
widely used to compare the performance of client, server, and stor-
age systems. TPC was founded to define transaction processing and
database benchmarks. The BAPCo charter is to develop and distribute
a set of objective performance benchmarks based on popular com-
puter applications and industry-standard operating systems. The
goal of the Storage Performance Council is to define, promote, and
enforce vendor-neutral benchmarks that characterize the perfor-
mance of storage subsystems.
Guidelines for evaluating benchmark results

When using benchmark results to evaluate and compare systems,
administrators should understand the benchmark, be aware of
system optimizations, and ensure comparable system compar-
isons, as follows.
Understand the benchmark

It is essential to understand which aspects of system performance
a benchmark is testing as well as what the system’s workload will
be. Those who are evaluating benchmarks should consider whether
the benchmark workload is reasonably representative of the real-
world applications that will be run on the system. For instance, if
a client system will be used to run mainstream business produc-
tivity applications, the BAPCo SYSmark® or Ziff Davis® Business
Winstone® benchmarks are good candidates.2 On the other hand,
if the test subject is a workstation system that will be used prima-
rily to run Pro/E, the Pro/E application benchmark is suitable. If
possible, those who are evaluating benchmarks should focus on reg-
ulated benchmarks from standards bodies such as SPEC and BAPCo
or on benchmarks that are standard industry applications.
Application benchmarks can be run with a variety of inputs, each
of which attempts to represent different usage scenarios. For exam-
ple, Adobe Photoshop performance varies greatly depending on the
size of the image and the operations performed on it. Moreover, some
Photoshop operations may be better suited or optimized for a par-
ticular system architecture. Even within a particular operation (such
as the Gaussian Blur filter), the end user may be able to modify how
the filter is applied. Different code algorithms may be used, result-
ing in significantly different performance results. These variables make
it relatively easy to create a suite of Photoshop benchmark opera-
tions that greatly favor a particular system architecture. For this
reason, Dell recommends that organizations look beyond summary
benchmark results to help ensure that the operations performed are
representative of their specific usage models.
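
The following Python sketch illustrates this variability by timing
a single operation (a Gaussian blur) at different image sizes and
filter widths. It assumes the NumPy and SciPy libraries are
installed; the sizes and sigma values are arbitrary choices, not
drawn from any published Photoshop benchmark.

# Time one operation (Gaussian blur) across input sizes and filter
# parameters. Assumes NumPy and SciPy are installed; the sizes and
# sigma values below are arbitrary illustrations.
import time
import numpy as np
from scipy.ndimage import gaussian_filter

for side in (512, 2048):                 # image dimensions in pixels
    image = np.random.rand(side, side).astype(np.float32)
    for sigma in (1.0, 10.0):            # blur width parameter
        start = time.perf_counter()
        gaussian_filter(image, sigma=sigma)
        elapsed = (time.perf_counter() - start) * 1000
        print(f"{side}x{side} image, sigma={sigma}: {elapsed:.1f} ms")

On most systems the large-image, wide-blur case runs many times
slower than the small-image case, which is exactly the kind of
spread that a single summary score can hide.
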
Be aware of system optimizations

Some optimization of the tested system is expected and allowed on
all benchmarks. SPEC outlines broad optimization guidelines in its
run rules for each benchmark. The intent of these guidelines is to
avoid optimizations that are so extreme as to render the system
unsuitable for real-world applications. For example, when running
SPEC benchmarks, Dell often uses publicly available compilers that
support new CPU features. These features can improve system perfor-
mance and better demonstrate the capability of Dell systems. This
practice conforms to the spirit of the SPEC guidelines. The compil-
ers are publicly available to software developers to use in building
their own applications; therefore, the benchmark results are repre-
sentative of possible real-world applications.
In contrast, it is not uncommon for a benchmark to be run on
a system that has been specially tuned to do well on the bench-
mark. Such tuning can be so extreme that the benchmark results
are neither credible nor useful. Even regulated benchmarks can be
misused in this way, so it is important that the benchmark results
include complete configuration information for the tested system.
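
As a hedged illustration of what such disclosure might capture, the
following Python sketch records basic system details alongside a
result using the standard platform module. The field names are
illustrative only; they are not the full disclosure format of SPEC,
TPC, or any other standards body.

# Record basic system configuration alongside a benchmark result so
# the result can be interpreted and reproduced. Field names are
# illustrative, not any standards body's disclosure format.
import json
import platform

def disclose(benchmark, score):
    report = {
        "benchmark": benchmark,
        "score": score,
        "os": platform.platform(),          # OS name, version, build
        "machine": platform.machine(),      # CPU architecture
        "processor": platform.processor(),  # CPU description string
        "python": platform.python_version(),
    }
    return json.dumps(report, indent=2)

print(disclose("toy-kernel", 123.4))
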
Ensure comparable system comparisons

When comparing the benchmark results of systems from multiple
vendors, test engineers should ensure that the tested systems and
their benchmark settings are comparable. This requires organizations
conducting benchmark tests to supply adequate documentation for
system and benchmark configurations.
Benchmarking at Dell

Dell uses benchmarks throughout the technology assessment and
systems development process to help ensure that Dell server and
client systems provide the appropriate balance of performance,
features, cost, quality, and reliability. Dell supports industry efforts
to standardize performance benchmarks and is an active partici-
pant in all the standards bodies discussed in this article. Figures 1
and 2 list key benchmarks that Dell uses to evaluate the performance
of server and client systems.
When used appropriately, benchmarks can provide valuable
information that can help administrators compare and evaluate
computer systems. In addition to benchmark results, other factors
should weigh heavily in the evaluation process, including features,
support, and price, as well as the ability to service, upgrade, and
manage the system under consideration.
Sharon Hanson ([email protected]) is a technical writer in the office of the Dell
CTO. She has written and produced Dell white papers and technical articles on industry tech-
nology trends for the past eight years. Sharon has a B.B.A. from The University of Texas at Austin.
Diego Esteves ([email protected]) is a systems engineer and consultant currently
working on Dell Precision™ workstation SPEC performance and independent software vendor
(ISV) application certifications. Diego has a B.S.B.A. from Xavier University in Cincinnati,
Ohio. He currently represents Dell on the SPEC CPU subcommittee, the body responsible for
the industry-standard SPEC CPU2000 benchmarks.
Clint Espinoza ([email protected]) is a storage performance engineer specializing
in RAID adapter performance. Clint has a B.A. from Trinity University in San Antonio, Texas.
Database
• Online transaction processing (OLTP): TPC-C
• Decision support: TPC-H and TPC-R
• Java™: SPECjbb®

Messaging
• Microsoft Exchange: MAPI (Messaging Application Programming
  Interface) Messaging Benchmark 2 (MMB2) and MMB3
• Lotus Notes®: NotesBench®
• Simple Mail Transfer Protocol (SMTP)/Post Office Protocol 3
  (POP3): SPECmail® 2001

Web services
• Hypertext Transfer Protocol (HTTP): SPECweb®99
• HTTP over SSL (HTTPS): SPECweb99_SSL

File and print services
• Ziff Davis NetBench®
• SPECsfs®

Storage
• SPC Benchmark 1™ (SPC-1)
• Iometer

CPU and high-performance computing
• SPEC CPU2000
• Linpack
• NASTRAN®
• STREAM
• Hierarchical INTegration (HINT)

Microbenchmarks*
• LMbench
• Netperf

*A microbenchmark measures one specific feature of a system
isolated from other features.
Figure 1. Typical server benchmarks
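
To make the microbenchmark footnote in Figure 1 concrete, the
following Python sketch isolates a single feature, in-memory copy
throughput, from everything else. LMbench and Netperf are far more
careful, portable tools; the buffer size and iteration count here
are arbitrary.

# Minimal microbenchmark sketch: measure one isolated feature
# (in-memory copy throughput). Buffer size and iteration count are
# arbitrary illustrations.
import time

SIZE = 64 * 1024 * 1024          # 64 MiB buffer
ITERATIONS = 10

src = bytearray(SIZE)
start = time.perf_counter()
for _ in range(ITERATIONS):
    dst = bytes(src)             # allocate and copy the full buffer
elapsed = time.perf_counter() - start

gib_per_s = (SIZE * ITERATIONS) / elapsed / (1024 ** 3)
print(f"copy throughput: {gib_per_s:.2f} GiB/s")
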
Business productivity
• SYSmark 2004
• Content Creation Winstone 2004
• Business Winstone 2004

Mainstream 3-D performance
• Futuremark® 3DMark®

CPU, memory subsystem, and compiler
• SPEC CPU2000
• Linpack

Gaming
• Quake III
• Epic Games® Unreal Tournament 2003
• Ubisoft™ Splinter Cell®

Portable computer battery life
• Business Winstone 2002 BatteryMark®
• BAPCo MobileMark® 2002

3-D graphics
• SPECviewperf® 7.1

Mechanical computer-aided design (MCAD)
• SPECapc for Pro/ENGINEER™
• NASTRAN®

2-D graphics
• Photoshop
• Autodesk® AutoCAD®
Figure 2. Typical client system benchmarks
1 For more information about SPEC, visit http://www.spec.org.
2 For more information about the BAPCo SYSmark benchmark, visit
http://www.bapco.com; for more information about the Ziff Davis
Winstone benchmark, visit
http://www.veritest.com/benchmarks/bwinstone/default.asp.

FOR MORE INFORMATION
SPEC: http://www.spec.org
BAPCo: http://www.bapco.com
TPC: http://www.tpc.org
Storage Performance Council: http://www.storageperformance.org