
  • Performance Evaluations: An overview and some lessons learned
    "Knowing is not enough; we must apply. Willing is not enough; we must do." - Johann Wolfgang von Goethe
    Performance Evaluations - 2007 (U. Roehm)


  • Performance Evaluation
    - Research methodology: the quantitative evaluation of an existing software or hardware artefact (the SUT - system under test)
    - Uses a set of experiments, each consisting of a series of performance measurements, to collect realistic performance data


  • Performance Evaluations
    - Prepare: What do you want to measure?
    - Plan: How can you measure it, and what is needed?
    - Implement: Pitfalls in the correct implementation of benchmarks.
    - Evaluate: How to conduct performance experiments.
    - Analyse and Visualise: What does the outcome mean?


  • Step 1: Preparations
    - What do you want to show? An understanding of the behaviour of an existing system? Or proof of an approach's superiority?
    - Which performance metrics are you interested in? Mean response times, throughput, scalability
    - What are the variables? Parameters of your own system / algorithm, e.g. number of concurrent users, number of nodes, ...


  • Performance Metrics
    - Response Time (rt): the time the SUT takes to answer a request. Note: for complex tasks, response time ≠ runtime.
    - Mean Response Time (mrt): the mean of the response times of a set of requests. Only successful requests count!
    - Throughput (thp): the number of successful requests per time unit (a small computation sketch follows below)
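    A minimal sketch, not from the slides, of how these two aggregate metrics could be computed from per-request measurements; the RequestResult class and its fields are made up for illustration:

        import java.util.List;

        // Hypothetical per-request measurement: response time in ms and a success flag.
        class RequestResult {
            final long responseTimeMillis;
            final boolean success;
            RequestResult(long responseTimeMillis, boolean success) {
                this.responseTimeMillis = responseTimeMillis;
                this.success = success;
            }
        }

        class Metrics {
            // Mean response time (mrt): average over successful requests only.
            static double meanResponseTime(List<RequestResult> results) {
                long sum = 0, count = 0;
                for (RequestResult r : results) {
                    if (r.success) { sum += r.responseTimeMillis; count++; }
                }
                return count == 0 ? Double.NaN : (double) sum / count;
            }

            // Throughput (thp): successful requests per second over the measurement interval.
            static double throughput(List<RequestResult> results, long intervalMillis) {
                long successful = 0;
                for (RequestResult r : results) {
                    if (r.success) successful++;
                }
                return successful / (intervalMillis / 1000.0);
            }
        }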


  • Runtime versus Response Time
    - Model: client-server communication with the server queueing incoming requests (e.g. web servers or database servers); the client sends a request a
    - runt(a) - runtime to complete request a:  runt(a) = t_receiveN(last_result) - t_send
    - rt(a) - response time for action a (until the first result comes back to the client!):  rt(a) = t_receive1(first_result) - t_send  (a client-side measurement sketch follows after these definitions)
    - wt(a) - waiting time of action a in the server queue
    - et(a) - execution time of action a (after the server took the request from the queue)
    - frt(a) - first result time of action a (Note: frt(a) ...)
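    A minimal client-side sketch of the rt/runt distinction, assuming a hypothetical streaming interface that signals the first and the last result separately; the Server and ResultStream types below are made up for illustration:

        // Hypothetical streaming client interface, for illustration only.
        interface ResultStream {
            void awaitFirstResult();
            void awaitLastResult();
        }

        interface Server {
            ResultStream send(String request);
        }

        class ResponseTimeProbe {
            // Measures rt(a) (time to first result) and runt(a) (time to last result) for one request.
            static long[] measure(Server server, String request) {
                long tSend = System.nanoTime();
                ResultStream results = server.send(request);

                results.awaitFirstResult();
                long rtNanos = System.nanoTime() - tSend;    // rt(a) = t_receive1(first_result) - t_send

                results.awaitLastResult();
                long runtNanos = System.nanoTime() - tSend;  // runt(a) = t_receiveN(last_result) - t_send

                return new long[] { rtNanos, runtNanos };
            }
        }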
  • More Performance Metrics
    - Scalability / Speedup:  thp_n / thp_1  (see the worked example after this list)

    - Fairness:

    - Resource consumption: memory usage, CPU load, energy consumption. Note: the latter are typically server-side statistics!
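    As a worked example of the speedup metric (the numbers are made up for illustration): if one node sustains thp_1 = 100 requests/s and four nodes sustain thp_4 = 320 requests/s, then speedup = thp_4 / thp_1 = 3.2, i.e. sub-linear scalability (linear scaling would give 4).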


  • Step 2: Planning
    - Which experiments will show the intended results?
    - What do you need to run those experiments? Hardware, software, data (!)
    - Prepare an evaluation schedule - evaluations always take longer than you expect!
    - Expect to change some goals / your approach based on the outcome of initial experiments; some initial runs might be helpful to explore the space


  • Typical Client-Server Evaluation Setup
    - Client emulator(s) connect over a test network to the system under test (SUT); the servers may additionally be linked by a separate server-server network
    - In general, the SUT can be arbitrarily complex, e.g. clustered servers or multi-tier architectures
    - The client emulator(s) should run on a separate machine from the server(s).


  • Workload Specifications
    - Multiprogramming Level (MPL): how many concurrent users / clients?
    - Heterogeneous workload? Is every client doing the same, or are there variations? Typically: a well-defined set of transaction / request kinds with a defined distribution
    - Do you emulate programs or users? If you are just interested in peak performance, issue as many requests as possible; sometimes a more complex user model is needed, e.g. TPC-C users with think times and sleep times (see the sketch after this list)
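    A minimal sketch of an emulated user with think times, loosely in the spirit of the TPC-C-style user model mentioned above; the Workload interface, the seed handling and the exponential think-time distribution are assumptions for illustration:

        import java.util.Random;

        // Sketch of one emulated user that issues requests separated by randomised think times.
        class EmulatedUser implements Runnable {
            interface Workload { void issueNextRequest(); }    // made-up hook for the real request logic

            private final Workload workload;
            private final Random rng;                          // one generator per thread (see RNG pitfalls later)
            private final long meanThinkTimeMillis;
            private volatile boolean running = true;

            EmulatedUser(Workload workload, long seed, long meanThinkTimeMillis) {
                this.workload = workload;
                this.rng = new Random(seed);                   // distinct seed per emulated user
                this.meanThinkTimeMillis = meanThinkTimeMillis;
            }

            public void run() {
                try {
                    while (running) {
                        workload.issueNextRequest();                       // the request itself is measured elsewhere
                        long thinkTime = (long) (-meanThinkTimeMillis
                                * Math.log(1.0 - rng.nextDouble()));       // exponentially distributed think time (assumption)
                        Thread.sleep(thinkTime);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }

            void stop() { running = false; }
        }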


  • Experimental Data
    - Problem: how do we get reasonable test data?
    - Approach 1: Tracing. If you have an existing system available, trace typical usages and use those traces to drive your experiments
    - Approach 2: Standard benchmarks. Use the data generator of a standard benchmark; in some areas there are explicit data corpora to evaluate against
    - Approach 3: Make something up yourself. Always the least preferable way!!! Justify why you think that your data setup is representative, e.g. by using the pattern of a standard benchmark (a generator sketch follows below)
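    If Approach 3 is unavoidable, a minimal, purely hypothetical generator sketch is shown below; the schema, value distributions and file name are all made up and would still need to be justified against a standard benchmark's pattern:

        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.PrintWriter;
        import java.util.Random;

        // Hypothetical synthetic data generator (Approach 3): fixed seed for reproducibility,
        // made-up "customer" schema purely for illustration.
        class DataGenerator {
            public static void main(String[] args) throws IOException {
                Random rng = new Random(42);        // fixed seed: the generated data set is reproducible
                int numCustomers = 100000;
                try (PrintWriter out = new PrintWriter(new FileWriter("customers.csv"))) {
                    for (int id = 1; id <= numCustomers; id++) {
                        int region = rng.nextInt(10);                                  // uniform over 10 regions (assumption)
                        double balance = Math.round(rng.nextGaussian() * 500 + 1000);  // normal-ish balances (assumption)
                        out.println(id + "," + region + "," + balance);
                    }
                }
            }
        }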


  • Standard Benchmarks
    - There are many standard benchmarks available: they are very helpful for making results more comparable, most come with synthetic data generators, and they make your results more publishable (reviewers will have more trust in your experimental setup)
    - Disadvantages: standard benchmarks can be very complex, and some specifications are not free but cost money
    - Examples: TPC-C, TPC-H, TPC-R, TPC-W, ECPerf, SPECjAppServer, IBM's Trade2, etc.


  • Example: TPC Benchmarks
    - TPC - Transaction Processing Performance Council (tpc.org): a non-profit corporation of commercial software vendors that has defined a set of database and e-business performance benchmarks
    - TPC-C: measures the performance of OLTP systems (order-entry scenario); v1.0 became official in 1992; current version v5.8
    - TPC-H and TPC-R (formerly TPC-D): performance of OLAP systems (warehouse scenario); in 1999 TPC-D was replaced by TPC-H (ad-hoc queries) + TPC-R (reporting)
    - TPC-W and TPC-App: transactional web benchmark simulating an interactive e-business website; TPC-W obsolete since April 2005, replaced(?) by TPC-App
    - TPC-E: new OLTP benchmark that simulates the workload of a brokerage firm


  • Step 3: Implementation
    - Goal: evaluation program(s) (client emulators) that measure what you have planned.
    - Typical elements to take care of: accurate timing; random number generation; fast logging; no hidden serialisation, e.g. via global singletons; no screen output during the measurement interval; avoid measuring the test harness rather than the SUT


  • Time Measurements
    - Every programming language offers some timing functions, but be aware that there is a timer resolution
    - E.g. Java's System.currentTimeMillis() suggests by its name that it measures time in milliseconds; the question is how many milliseconds pass between updates (the probe sketch below shows one way to check)
    - There is no point in trying to measure something taking microseconds with a timer of milliseconds resolution!
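    A minimal sketch, not from the slides, of empirically probing the effective resolution of System.currentTimeMillis(): it busy-waits until the reported value changes and prints the observed step size:

        // Probe the effective resolution of System.currentTimeMillis() by spinning
        // until the returned value changes and recording the observed step sizes.
        class TimerResolutionProbe {
            public static void main(String[] args) {
                for (int i = 0; i < 5; i++) {
                    long t0 = System.currentTimeMillis();
                    long t1;
                    do {
                        t1 = System.currentTimeMillis();   // busy-wait until the clock ticks
                    } while (t1 == t0);
                    System.out.println("observed step: " + (t1 - t0) + " ms");
                }
            }
        }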


  • Example: Java Timing
    - Standard (all JDKs): System.currentTimeMillis(). Be aware: it has different resolutions depending on the OS and platform (see the table below)!

    - Since JDK 1.4.2 (portable, undocumented!!): sun.misc.Perf. Example:

        // may throw SecurityException
        sun.misc.Perf perf = sun.misc.Perf.getPerf();
        long ticksPerSecond = perf.highResFrequency();
        long currTick = perf.highResCounter();
        long milliSeconds = (currTick * 1000) / ticksPerSecond;

    - In JDK 1.5: java.lang.System.nanoTime(). Always uses the best precision available on a system, but with no guaranteed resolution (a usage sketch follows after the table below)
    - Some third-party solutions use, e.g., Windows HighPerformanceTimers through Java JNI (hence limited portability, best for Windows)

    Resolution of System.currentTimeMillis() by platform:
      Linux (2.2, x86)                1 ms
      Mac OS X                        1 ms
      Windows 2000                   10 ms
      Windows 98                     60 ms
      Solaris (2.7/i386, 2.8/sun4u)   1 ms
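    For JDK 1.5 and later, a minimal sketch of measuring an elapsed interval with System.nanoTime(); the doWork() body is just a placeholder for the operation under test:

        // Elapsed-time measurement with System.nanoTime() (JDK 1.5+).
        // nanoTime() is only meaningful for differences, not as a wall-clock timestamp.
        class NanoTimingExample {
            static void doWork() {                        // placeholder for the operation under test
                double x = 0;
                for (int i = 1; i < 1000000; i++) x += Math.sqrt(i);
            }

            public static void main(String[] args) {
                long start = System.nanoTime();
                doWork();
                long elapsedNanos = System.nanoTime() - start;
                System.out.println("elapsed: " + (elapsedNanos / 1e6) + " ms");
            }
        }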


  • Example: Wrong Timer Usage


  • Example: Same Experiment - High-res Timer


    (Chart: client response times for GetShares(), 500 beans; series: cache-hit, TimesTen, MySQL (local), MySQL (remote), Oracle (remote); relative slowdown factors of roughly 1 / 1.5 / 2.0 / 2.4 / 10.5 compared to the cache-hit case.)

  • Random Number Generators - Common Mistakes
    - A multi-threaded client, but all threads use the same global Random object: this effectively serialises your threads!
    - A large set of random numbers is generated within the measured code: we do not want to measure how fast Java can generate random numbers; use an array of pre-generated random numbers (a space vs. time trade-off)
    - All seeds are the same: you make your program deterministic (see the sketch after this list)
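    A minimal sketch that avoids the first and third pitfalls (and pre-generates values outside the measurement loop, addressing the second); the thread count, seeds and array size are arbitrary illustration values:

        import java.util.Random;

        // One Random instance per thread with a distinct seed, and a pre-generated
        // table of values so the measurement loop itself does no RNG work.
        class PerThreadRandomness {
            public static void main(String[] args) throws InterruptedException {
                int numThreads = 8;
                Thread[] threads = new Thread[numThreads];
                for (int t = 0; t < numThreads; t++) {
                    final long seed = 12345L + t;                    // distinct seed per thread
                    threads[t] = new Thread(new Runnable() {
                        public void run() {
                            Random rng = new Random(seed);           // thread-local, no shared lock
                            int[] pregenerated = new int[100000];    // filled before the measurement interval
                            for (int i = 0; i < pregenerated.length; i++) {
                                pregenerated[i] = rng.nextInt(1000);
                            }
                            // ... the measurement loop would only read from 'pregenerated' ...
                        }
                    });
                    threads[t].start();
                }
                for (Thread th : threads) th.join();
            }
        }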


  • Logging
    - Goal: fast logging of results during experiments, without interfering with the measurements
    - Approach: log to a file, not the screen. Screen output / scrolling is VERY slow - a very common mistake
    - Use standard log libraries with low overhead, e.g. Java's log4j (http://sourceforge.net/projects/log4j/) or the Windows performance counter API
    - If your client reads data from a hard drive, write your log data to a different disk
    - Log asynchronously, be fast, be thread-safe (careful!) - see the sketch after this list
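    A minimal hand-rolled sketch of asynchronous, thread-safe logging to a file (an alternative illustration to the log4j recommendation above): measurement threads only enqueue lines, and a single background thread does the disk I/O. The queue capacity and timeout values are arbitrary:

        import java.io.FileWriter;
        import java.io.IOException;
        import java.io.PrintWriter;
        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;
        import java.util.concurrent.TimeUnit;

        // Asynchronous, thread-safe result logger: measurement threads enqueue lines,
        // a single background thread writes them to disk.
        class AsyncResultLogger {
            private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>(100000);
            private final Thread writerThread;
            private volatile boolean closed = false;

            AsyncResultLogger(String fileName) throws IOException {
                final PrintWriter out = new PrintWriter(new FileWriter(fileName));
                writerThread = new Thread(new Runnable() {
                    public void run() {
                        try {
                            while (!closed || !queue.isEmpty()) {
                                String line = queue.poll(100, TimeUnit.MILLISECONDS);
                                if (line != null) out.println(line);
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            out.close();
                        }
                    }
                });
                writerThread.start();
            }

            void log(String line) {          // cheap for the caller: just an enqueue
                queue.offer(line);           // drops the entry if the queue is full (trade-off)
            }

            void close() throws InterruptedException {
                closed = true;
                writerThread.join();
            }
        }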


  • Windows Performance Monitor
    - Windows includes a performance monitor application: online GUI, can capture to a file, supports remote monitoring!
    - Based on the Windows API performance counters: \\ComputerName\Object(Instance)\Counter
    - Supported by basically every server application; a huge number of statistics; can be used in your own programs
    - Cf. http://technet2.microsoft.com/WindowsServer/en/library/3fb01419-b1ab-4f52-a9f8-09d5ebeb9ef21033.mspx?mfr=true
    - From Java: http://www.javaworld.com/javaworld/jw-11-2004/jw-1108-windowspm.html


  • Step 4: Evaluation
    - Objective: to collect accurate performance data in a set of experiments.
    - Three major issues: a controlled evaluation environment, documentation, and archiving of raw data (a small archiving sketch follows below)
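    A minimal sketch of the documentation/archiving point: store the experiment's configuration and environment next to the raw result file so each archived run documents how it was produced. All names, parameters and file names here are made up:

        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.util.Date;
        import java.util.Properties;

        // Sketch: store experiment metadata alongside the raw measurement log,
        // so every archived run documents how it was produced.
        class ExperimentMetadata {
            public static void main(String[] args) throws IOException {
                Properties meta = new Properties();
                meta.setProperty("experiment", "throughput-vs-mpl");      // made-up experiment name
                meta.setProperty("date", new Date().toString());
                meta.setProperty("mpl", "50");                            // made-up parameter
                meta.setProperty("server.host", "sut-host-1");            // made-up host name
                meta.setProperty("jvm.version", System.getProperty("java.version"));
                meta.setProperty("os", System.getProperty("os.name") + " " + System.getProperty("os.version"));

                FileOutputStream out = new FileOutputStream("run-001.properties");
                try {
                    meta.store(out, "Metadata for raw results in run-001.log");
                } finally {
                    out.close();
                }
            }
        }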


  • Evaluation Environment
    - We want a stable evaluation environment that allows us to measure the system under test under repeatable settings, without interference
    - Clean computer initialisation (client(s) and server(s)): no concurrent programs, a minimum set of system services, disable anti-virus software!
    - Decide: open or closed system? Decide: cold or warm caches?
    - Make sure you do not measure any side effects!
    - Many prefer t ...