AMD uProf User Guide

Release v1.2

AMD Developer Tools Team

May 01, 2018

CONTENTS

1 Introduction
  1.1 System Requirements
    1.1.1 Operating Systems
    1.1.2 CPU Profiling
    1.1.3 Power Profiling
  1.2 Installation
    1.2.1 On Windows
    1.2.2 On Linux (using .tar.gz binary)
    1.2.3 On RHEL (using .rpm installer)
    1.2.4 On Ubuntu (using .deb installer)
  1.3 Installing the Power Profiler Linux Driver
  1.4 Wider Linux Power Profiler Support (DKMS)
  1.5 Support

2 Profiler Concepts
  2.1 CPU Profiler
    2.1.1 Introduction
    2.1.2 CPU Profile Key Concepts
    2.1.3 IBS Derived Events
    2.1.4 L3 and DF Events
    2.1.5 References
  2.2 Power Profiler
    2.2.1 Introduction
    2.2.2 Supported Counters
    2.2.3 Power Profile Key Concepts
    2.2.4 References

3 Graphical User Interface
  3.1 CPU Profiler
    3.1.1 Starting a Profile
    3.1.2 Profile Overview
    3.1.3 Counter Samples
    3.1.4 Source-level Attribution
    3.1.5 Call Graph
    3.1.6 Importing Profile Databases
    3.1.7 Settings
    3.1.8 Miscellaneous
  3.2 Power Profiler
    3.2.1 Live Power Profile
    3.2.2 Application Analysis
    3.2.3 Settings

4 Command Line Interface
  4.1 CPU Profiler
    4.1.1 CLI Options
    4.1.2 Examples
    4.1.3 Java Application Profiling
  4.2 Power Profiler
    4.2.1 CLI Options
    4.2.2 Examples
    4.2.3 Sample Report

5 SDKs
  5.1 AMDPowerProfileAPI Library
    5.1.1 Introduction
    5.1.2 APIs
    5.1.3 Data Types
    5.1.4 Examples
    5.1.5 How to use API Library
  5.2 AMDProfileControl APIs
    5.2.1 Required File Paths
    5.2.2 The control APIs
    5.2.3 Calling the APIs
    5.2.4 Compiling the target application
    5.2.5 Profiling with target application and control APIs

6 Limitations
  6.1 CPU Profiler
  6.2 Power Profiler

CHAPTER

ONE

INTRODUCTION

AMD uProf is a comprehensive tool suite that allows developers to harness the benefits of AMD CPUs and APUs. The suite helps developers optimize applications running on AMD processors. It offers the following functionalities:

• CPU Profiler

• Power Profiler

AMD uProf is available as graphical user interface and command-line interface tools for the Windows® and Linux® operating systems.

This document is intended for software developers. The chapter on CPU Profiling assumes an understanding of the concepts of threads and processes, as well as familiarity with CPU architecture.

1.1 System Requirements

1.1.1 Operating Systems

• Microsoft Windows 7 64-bit

• Microsoft Windows 10 64-bit

• Microsoft Windows Server 2016 64-bit

• Ubuntu 16.04 64-bit & later

• RHEL 7.0 64-bit & later

1.1.2 CPU Profiling

• Supported platforms - AMD CPUs or APUs, and Intel CPUs

1.1.3 Power Profiling

• Supported platforms - Kaveri, Mullins, Temash, Carrizo, Ryzen, Threadripper, EPYC

For detailed system requirements see the Release Notes.

1.2 Installation

Install AMD uProf using one of the following methods.


1.2.1 On Windows

Run the AMDuProf-x.y.z.exe installer.

1.2.2 On Linux (using .tar.gz binary)

Just extract the tar.gz binary.

$ tar -xvzf AMDuProf_Linux_x64_x.y.z.tar.gz

To install the Power Profiler Linux driver, refer to Section 1.3.

1.2.3 On RHEL (using .rpm installer)

Install using the rpm or yum command.

$ sudo rpm --install amduprof-x.y-z.x86_64.rpm

OR

$ sudo yum install amduprof-x.y-z.x86_64.rpm

1.2.4 On Ubuntu (using .deb installer)

Install using the dpkg command.

$ sudo dpkg --install amduprof_x.y-z_amd64.deb

1.3 Installing the Power Profiler Linux Driver

On Linux systems, the gcc and make software packages are prerequisites for installing the Power Profiler’s Linux driver. If you don’t have these packages, install them using sudo yum install gcc make (on RPM systems) or sudo apt install build-essential (on Debian systems).

The AMD uProf Debian and RPM packages perform the driver installation automatically. However, if you’ve downloaded the AMD uProf tar archive, you have to install the Power Profiler’s Linux driver manually. This requires one simple step: running the AMDPowerProfilerDriver.sh script with root credentials.

Example:

$ tar -xf AMDuProf_Linux_x64_x.y.z.tar.gz

$ cd AMDuProf_Linux_x64_x.y.z/bin

$ sudo ./AMDPowerProfilerDriver.sh install

The installer creates a source tree for the power profiler driver under /usr/src/AMDPowerProfiler-<version number>. All the source files required for module compilation are located in this directory and are under the MIT license.

To uninstall the driver run the following command:

$ cd <AMDµProf-install-dir>/bin

$ sudo ./AMDPowerProfilerDriver.sh uninstall

1.4 Wider Linux Power Profiler Support (DKMS)

On Linux machines, the Power Profiler driver can also be installed with Dynamic Kernel Module Support (DKMS) framework support. The DKMS framework automatically upgrades the power profiler driver module whenever the existing kernel changes. This saves the user from manually upgrading the power profiler driver module.


The DKMS package needs to be installed on the target machine before running the installation steps described in the section above. The AMDPowerProfilerDriver.sh installer script automatically takes care of DKMS-related configuration if the DKMS package is installed on the target machine.

Example (for Ubuntu system):

$ sudo apt-get install dkms

$ tar -xf AMDuProf_Linux_x64_x.y.z.tar.gz

$ cd AMDuProf_Linux_x64_x.y.z/bin

$ sudo ./AMDPowerProfilerDriver.sh install

If the user upgrades the kernel version frequently, it is recommended to use DKMS for installation.

1.5 Support

Visit the following sites for bug reports, support and feature requests.

• AMD uProf Release page

• AMD Developer Community


CHAPTER

TWO

PROFILER CONCEPTS

2.1 CPU Profiler

2.1.1 Introduction

AMD uProf’s CPU Profiler is used for performance analysis and tuning of applications running on the CPU. The CPU Profiler lets you identify the main performance bottlenecks of the profiled application or the entire system.

Features

• Supported Profile Types

– Time-Based Profile (TBP) - To identify the “hot-spots” in the profiled applications. (Hot-spots are code areas that use significantly more time compared to other areas in the code.)

– Event-Based Profile (EBP) - To identify CPU and memory related performance issues in the profiled applications.

– Instruction-Based Sampling (IBS) - To record and count the instructions that trigger HW events, as well as calculate various metrics, such as data cache latency.

• Supported Profile Modes

– Per-Process - Profiles the launched process and its children.

– System-Wide - Profiles the entire system.

– Attach to an already running process.

– User Mode and Kernel Mode Profiling.

• Supported Languages

– C, C++ and Fortran applications

– Java applications

– CLR/.NET applications (only on Windows)

• Call Stack Sampling (CSS) for all profile types.

• Sample Aggregation

– Process/Thread/Module/Function/Source and Instruction level aggregation.

– Instruction Mix (IMIX)

• Miscellaneous Features

– Profiling C++ inline functions.

– HW events counter multiplexing.

• Debugging Info Formats Supported


– Unmanaged Executables compiled by MS Visual Studio or GCC (under Linux or other Unix-like systems, such as Cygwin and MinGW).

– Debug Information Formats - PDB, COFF, DWARF, STABS.

– Managed Executables - JVMTI for Java applications and COR for .NET applications.

• Supporting Virtualized Environments

– Time-Based Profile (TBP) and Event-Based Profile (EBP) are supported in guest OS running on VMware Workstation 11.0 or later.

– Time-Based Profile (TBP) is supported on Microsoft Hyper-V.

– Time-Based Profile (TBP) is supported on Xen Project hypervisor.

– Time-Based Profile (TBP) is supported on Linux KVM hypervisor.

• APIs to control CPU Profiling (i.e., to pause and resume profiling) from the target application to limit the profiling scope.

2.1.2 CPU Profile Key Concepts

This section explains various key concepts related to CPU Profiling. AMD uProf offers CPU Profiling through its graphical user interface tool AMDuProf and command-line interface tool AMDuProfCLI.

Profiling Concepts

Statistical Sampling: The CPU Profiler follows a statistical sampling-based approach to gather the profile data periodically. It uses a variety of SW and HW resources available in AMD x86-based processor families: the SW timer, the HW Performance Monitor Counters (PMC), and the HW IBS feature. The most time-consuming parts of a program collect the largest number of samples, because they have a higher probability of being executed while samples are being taken by the CPU Profiler.

Sampling Interval: The time between the collection of two consecutive samples is the Sampling Interval. For example, in TBP, if the time interval is 1 millisecond, then roughly 1,000 TBP samples are collected every second for each processor core.

The meaning of the sampling interval depends on the profile type. For TBP, the sampling interval is in milliseconds; for PMC events, it is the number of occurrences of that event; and for IBS, it is the number of processed instructions after which an instruction is tagged.
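The relationship between the sampling interval and the resulting sample counts can be illustrated with a small simulation. This is only a sketch, not how the CPU Profiler is implemented: the function name `sample_timeline` and the one-second timeline are hypothetical, and the code simply records which function is running at each timer tick.

```python
from collections import Counter


def sample_timeline(segments, interval_ms):
    """Sample a timeline of back-to-back function executions at a fixed interval.

    segments: list of (function_name, duration_ms) tuples (hypothetical workload).
    interval_ms: sampling interval in milliseconds.
    Returns a Counter mapping each function name to its number of samples.
    """
    samples = Counter()
    total_ms = sum(duration for _, duration in segments)
    t = 0
    while t < total_ms:
        # Find the function that is executing at time t and attribute the sample to it.
        elapsed = 0
        for name, duration in segments:
            if t < elapsed + duration:
                samples[name] += 1
                break
            elapsed += duration
        t += interval_ms
    return samples


# Hypothetical one-second run on a single core.
timeline = [("parse", 100), ("compute", 800), ("write", 100)]
counts = sample_timeline(timeline, interval_ms=1)
print(counts.most_common())
```

With a 1 millisecond interval, the one-second timeline yields 1,000 samples, and the function that runs for 80% of the time receives 80% of them, which is why the sample count can be read as a hot-spot indicator.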

Hardware Sources

Performance Monitor Counters (PMC): AMD’s x86-based processors have Performance Monitor Counters (PMC) that let them monitor various micro-architectural events in a CPU core. The PMC counters are used in two modes:

• In counting mode, these counters are used to count the specific events that occur in a CPU core.

• In sampling mode, these counters are programmed to count a specific number of events. Once the count reaches the programmed number of occurrences (called the sampling interval), an interrupt is triggered. During the interrupt handling, the CPU Profiler collects profile data.

The number of hardware performance event counters available in each processor is implementation-dependent (see the BIOS and Kernel Developer’s Guide [BKDG] of the specific processor for the exact number of hardware performance counters). The operating system and/or BIOS can reserve one or more counters for internal use. Thus, the actual number of available hardware counters may be less than the number of hardware counters implemented by the processor. The CPU Profiler uses all available counters for profiling.
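The difference between the two PMC modes can be sketched in software. The following is an illustrative model only: the function name `sampling_interrupts` and the synthetic event stream are hypothetical, and real counters are programmed through processor registers rather than Python.

```python
def sampling_interrupts(event_stream, sampling_interval):
    """Model a counter in sampling mode.

    event_stream: iterable of booleans, True when the monitored event occurs in that cycle.
    sampling_interval: number of event occurrences between two interrupts.
    Returns the cycle indices at which a sampling interrupt would fire.
    """
    if sampling_interval <= 0:
        raise ValueError("sampling_interval must be positive")
    interrupts = []
    counter = 0
    for cycle, occurred in enumerate(event_stream):
        if occurred:
            counter += 1
            if counter == sampling_interval:
                interrupts.append(cycle)  # the profiler would collect a sample here
                counter = 0               # the counter is re-armed for the next interval
    return interrupts


# Hypothetical stream in which the event occurs on every third cycle.
stream = [cycle % 3 == 0 for cycle in range(30)]

total_events = sum(stream)                                   # counting mode: just count
fired_at = sampling_interrupts(stream, sampling_interval=4)  # sampling mode
print(total_events, fired_at)
```

In counting mode the result is a single total, whereas in sampling mode each interrupt gives the profiler a point at which to record where the program was executing.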

Instruction-Based Sampling (IBS): IBS is a code profiling mechanism that enables the processor to select a random instruction fetch or micro-Op after a programmed time interval has expired and record specific performance information about the operation.


An interrupt is generated when the operation is complete, as specified by the IBS Control MSR. An interrupt handler can then read the performance information that was logged for the operation.

The IBS mechanism is split into two parts:

• Instruction fetch performance

• Instruction execution performance

Instruction fetch sampling provides information about instruction TLB and instruction cache behavior for fetched instructions.

Instruction execution sampling provides information about micro-Op execution behavior.

The data collected for instruction fetch performance is independent of the data collected for instruction execution performance. Support for the IBS feature is indicated by Core::X86::Cpuid::FeatureExtIdEcx[IBS].

Instruction execution performance is profiled by tagging one micro-Op associated with an instruction. Instructions that decode to more than one micro-Op return different performance data depending upon which micro-Op associated with the instruction is tagged. These micro-Ops are associated with the RIP of the next instruction.

In this mode, the CPU Profiler uses the IBS HW supported by the AMD x86-based processor to observe the effect of instructions on the processor and on the memory subsystem. In IBS, HW events are linked with the instruction that caused them. The CPU Profiler also uses these HW events to derive various metrics, such as data cache latency.

IBS is supported starting from the AMD processor family 10h.
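Before choosing an IBS profile, it can be useful to check whether the processor reports the IBS feature. The sketch below assumes a Linux system, where the kernel is expected to list the CPUID feature as the `ibs` flag in /proc/cpuinfo; that flag name and the helper name `cpuinfo_reports_ibs` are assumptions of this example and are not part of AMD uProf.

```python
def cpuinfo_reports_ibs(cpuinfo_text):
    """Return True if any 'flags' line of /proc/cpuinfo-style text lists 'ibs'.

    Assumption: the Linux kernel exposes the IBS CPUID feature bit under the flag name 'ibs'.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "ibs" in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print("IBS reported:", cpuinfo_reports_ibs(cpuinfo.read()))
    except OSError:
        # Not a Linux system, or /proc is not mounted.
        print("/proc/cpuinfo is not available on this system")
```

The check only tells you what the kernel reports; whether IBS profiling is usable also depends on the operating system and on virtualization, as noted in the feature list above.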

L3 Cache Performance Monitor Counters (L3): A Core Complex (CCX) is a group of CPU cores that share L3 cache resources. All the cores in a CCX share a single L3 cache. In Family 17h, 8 MB of L3 cache is shared across all cores within the CCX. Family 17h processors support L3PMCs to monitor the performance of L3 resources. Refer to the Family 17h PPR for more details.

Data Fabric Performance Monitor Counters (DF): Family 17h processors support DFPMCs to monitor the performance of Data Fabric resources. Refer to the Family 17h PPR for more details.

Profile Types

Time-Based Profile (TBP): In this profile mode, the profile data is periodically collected based on the specified timer interval. It is used to identify the hot-spots of the profiled applications.

Event-Based Profile (EBP): In this mode, the CPU Profiler uses the PMCs to monitor the various micro-architectural events supported by the AMD x86-based processor. It helps to identify the CPU and memory related performance issues in profiled applications. To analyze a particular aspect of the profiled application (or system), a specific set of relevant events is grouped and monitored together. The CPU Profiler provides a list of predefined event configurations, such as Assess Performance and Investigate Branching. You can select any of these predefined configurations to profile and analyze the runtime characteristics of your application. You can also create your own custom configurations of events to profile.

In this profile mode, a delay called skid occurs between the time at which the sampling interrupt occurs and the time at which the sampled instruction address is collected. This skid distributes the samples in the neighborhood of the actual instruction that triggered a sampling interrupt. This produces an inaccurate distribution of samples, and events are often attributed to the wrong instructions.

Instruction-Based Sampling (IBS): In this mode, the CPU Profiler uses the IBS HW supported by the AMD x86-based processor to observe the effect of instructions on the processor and on the memory subsystem. In IBS, HW events are linked with the instruction that caused them. The CPU Profiler also uses these HW events to derive various metrics, such as data cache latency.

Event-Counter Multiplexing: If the number of monitored PMC events is less than or equal to the number of available performance counters, then each event can be assigned to a counter, and each event can be monitored 100% of the time. In a single-profile measurement, if the number of monitored events is larger than the number of available counters, the CPU Profiler time-shares the available HW PMC counters. (This is called event counter multiplexing.) Multiplexing lets the profiler monitor more events, but it decreases the actual number of samples for each event, thus reducing data accuracy. The CPU Profiler auto-scales the sample counts to compensate for this event counter multiplexing. For example, if an event is monitored 50% of the time, the CPU Profiler scales the number of event samples by a factor of 2.
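The scaling described for event counter multiplexing amounts to dividing the raw sample count by the fraction of time the event was actually monitored. The sketch below illustrates that arithmetic; the function names are hypothetical, and the even time-sharing assumed in `monitored_fraction` is a simplification of this example, not a statement about how the CPU Profiler schedules events.

```python
def monitored_fraction(num_events, num_counters):
    """Fraction of time each event is monitored, assuming events share counters evenly."""
    if num_events <= 0 or num_counters <= 0:
        raise ValueError("num_events and num_counters must be positive")
    return min(1.0, num_counters / num_events)


def scale_multiplexed_count(raw_samples, fraction):
    """Scale a raw sample count to compensate for partial monitoring time."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return raw_samples / fraction


# Hypothetical setup: 8 events time-sharing 4 counters.
fraction = monitored_fraction(num_events=8, num_counters=4)
scaled = scale_multiplexed_count(raw_samples=500, fraction=fraction)
print(fraction, scaled)
```

Here each event is monitored 50% of the time, so 500 raw samples are scaled by a factor of 2. The scaled value is an estimate: it assumes the event rate during the unmonitored periods matches the rate during the monitored ones.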

2.1.3 IBS Derived Events

uProf translates the IBS information produced by the hardware into derived event sample counts that resemble EBP sample counts. All IBS-derived events have “IBS” in the event name and abbreviation. Although IBS-derived events and sample counts look similar to EBP events and sample counts, the source and sampling basis for the IBS event information are different.

Arithmetic should never be performed between IBS-derived event sample counts and EBP event sample counts. It is not meaningful to directly compare the number of samples taken for events that represent the same hardware condition. For example, a smaller number of IBS DC miss samples is not necessarily better than a larger number of EBP DC miss samples.

IBS Fetch derived events list.

IBS Fetch Event | Description
All IBS fetch samples | The number of all IBS fetch samples. This derived event counts the number of all IBS fetch samples that were collected including IBS-killed fetch samples.
IBS fetch killed | The number of IBS sampled fetches that were killed fetches. A fetch operation is killed if the fetch did not reach ITLB or IC access. The number of killed fetch samples is not generally useful for analysis and are filtered out in other derived IBS fetch events (except Event Select 0xF000, which counts all IBS fetch samples including IBS killed fetch samples).
IBS fetch attempted | The number of IBS sampled fetches that were not killed fetch attempts. This derived event measures the number of useful fetch attempts and does not include the number of IBS killed fetch samples. This event should be used to compute ratios such as the ratio of IBS fetch IC misses to attempted fetches. The number of attempted fetches should equal the sum of the number of completed fetches and the number of aborted fetches.
IBS fetch completed | The number of IBS sampled fetches that completed. A fetch is completed if the attempted fetch delivers instruction data to the instruction decoder. Although the instruction data was delivered, it may still not be used (e.g., the instruction data may have been on the “wrong path” of an incorrectly predicted branch).
IBS fetch aborted | The number of IBS sampled fetches that aborted. An attempted fetch is aborted if it did not complete and deliver instruction data to the decoder. An attempted fetch may abort at any point in the process of fetching instruction data. An abort may be due to a branch redirection as the result of a mispredicted branch. The number of IBS aborted fetch samples is a lower bound on the amount of unsuccessful, speculative fetch activity. It is a lower bound since the instruction data delivered by completed fetches may not be used.
IBS ITLB hit | The number of IBS attempted fetch samples where the fetch operation initially hit in the L1 ITLB (Instruction Translation Lookaside Buffer).
IBS L1 ITLB misses (and L2 ITLB hits) | The number of IBS attempted fetch samples where the fetch operation initially missed in the L1 ITLB and hit in the L2 ITLB.
IBS L1 L2 ITLB miss | The number of IBS attempted fetch samples where the fetch operation initially missed in both the L1 ITLB and the L2 ITLB.
IBS instruction cache misses | The number of IBS attempted fetch samples where the fetch operation initially missed in the IC (instruction cache).
IBS instruction cache hit | The number of IBS attempted fetch samples where the fetch operation initially hit in the IC.
IBS 4K page translation | The number of IBS attempted fetch samples where the fetch operation produced a valid physical address (i.e., address translation completed successfully) and used a 4-KByte page entry in the L1 ITLB.
IBS 2M page translation | The number of IBS attempted fetch samples where the fetch operation produced a valid physical address (i.e., address translation completed successfully) and used a 2-MByte page entry in the L1 ITLB.
IBS fetch latency | The total latency of all IBS attempted fetch samples. Divide the total IBS fetch latency by the number of IBS attempted fetch samples to obtain the average latency of the attempted fetches that were sampled.
IBS fetch L2 cache miss | The instruction fetch missed in the L2 Cache.
IBS ITLB refill latency | The number of cycles when the fetch engine is stalled for an ITLB reload for the sampled fetch. If there is no reload, the latency will be 0.
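Several of the fetch event descriptions name ratios that are meant to be computed from the sample counts: IC misses over attempted fetches, total fetch latency over attempted fetches, and the check that attempted fetches equal completed plus aborted fetches. The sketch below works through that arithmetic; the function name `fetch_metrics`, the dictionary keys, and the sample numbers are all hypothetical and are not taken from a real profile.

```python
def fetch_metrics(attempted, completed, aborted, ic_misses, total_latency):
    """Compute ratios suggested by the IBS fetch derived event descriptions.

    All arguments are IBS-derived sample counts, except total_latency,
    which is the summed latency of the attempted fetch samples.
    """
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return {
        # Attempted fetches should equal completed plus aborted fetches.
        "consistent": attempted == completed + aborted,
        "ic_miss_ratio": ic_misses / attempted,
        "abort_ratio": aborted / attempted,
        "avg_fetch_latency": total_latency / attempted,
    }


# Hypothetical counts for one module.
metrics = fetch_metrics(attempted=1000, completed=900, aborted=100,
                        ic_misses=50, total_latency=12000)
print(metrics)
```

Every value here comes from IBS-derived counts only, in line with the rule that IBS-derived and EBP sample counts should never be combined in the same calculation.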

IBS Op derived events list.

IBS Op Event DescriptionAll IBS op samples The number of all IBS op samples that were collected. These

op samples may be branch ops, resync ops, ops that performload/store operations, or undifferentiated ops (e.g., those ops thatperform arithmetic operations, logical operations, etc.). IBS col-lects data for retired ops. No data is collected for ops that areaborted due to pipeline flushes, etc. Thus, all sampled ops arearchitecturally significant and contribute to the successful for-ward progress of executing programs.

IBS tag-to-retire cycles The total number of tag-to-retire cycles across all IBS op sam-ples. The tag-to-retire time of an op is the number of cycles fromwhen the op was tagged (selected for sampling) to when the opretired.

IBS completion-to-retirecycles

The total number of completion-to-retire cycles across all IBS opsamples. The completion-to-retire time of an op is the numberof cycles from when the op completed to when the op retired.

IBS branch op The number of IBS retired branch op samples. A branch opera-tion is a change in program control flow and includes uncondi-tional and conditional branches, subroutine calls and subroutinereturns. Branch ops are used to implement AMD64 branch se-mantics.

IBS mispredicted branchop

The number of IBS samples for retired branch operations thatwere mispredicted. This event should be used to compute theratio of mispredicted branch operations to all branch operations.

IBS taken branch op The number of IBS samples for retired branch operations thatwere taken branches.

IBS mispredicted takenbranch op

The number of IBS samples for retired branch operations thatwere mispredicted taken branches.

Continued on next page

8 Chapter 2. Profiler Concepts

Page 12: AMD uProf User Guide · AMD uProf User Guide Release v1.2 ... 3.1.6 Importing Profile Databases ... (see the BIOS and Kernel Developer’s Guide

AMD uProf User Guide, Release v1.2

Table 2.2 – continued from previous pageIBS Op Event DescriptionIBS return op The number of IBS retired branch op samples where the opera-

tion was a subroutine return. These samples are a subset of allIBS retired branch op samples.

IBS mispredicted returnop

The number of IBS retired branch op samples where the opera-tion was a mispredicted subroutine return. This event should beused to compute the ratio of mispredicted returns to all subrou-tine returns.

IBS resync op The number of IBS resync op samples. A resync op is onlyfound in certain microcoded AMD64 instructions and causes acomplete pipeline flush.

IBS all load store ops The number of IBS op samples for ops that perform either a loadand/or store operation. An AMD64 instruction may be translatedinto one (“single fastpath”), two (“double fastpath”), or several(“vector path”) ops. Each op may perform a load operation, astore operation or both a load and store operation (each to thesame address). Some op samples attributed to an AMD64 in-struction may perform a load/store operation while other op sam-ples attributed to the same instruction may not. Further, somebranch instructions perform load/store operations. Thus, a mixof op sample types may be attributed to a single AMD64 instruc-tion depending upon the ops that are issued from the AMD64instruction and the op types.

IBS load ops The number of IBS op samples for ops that perform a load oper-ation.

IBS store ops The number of IBS op samples for ops that perform a store op-eration.

IBS L1 DTLB hit The number of IBS op samples where either a load or store op-eration initially hit in the L1 DTLB (data translation lookasidebuffer).

IBS L1 DTLB misses L2hits

The number of IBS op samples where either a load or store op-eration initially missed in the L1 DTLB and hit in the L2 DTLB.

IBS L1 and L2 DTLBmisses

The number of IBS op samples where either a load or store op-eration initially missed in both the L1 DTLB and the L2 DTLB.

IBS data cache misses The number of IBS op samples where either a load or store op-eration initially missed in the data cache (DC).

IBS data cache hits The number of IBS op samples where either a load or store op-eration initially hit in the data cache (DC).

IBS misaligned data ac-cess

The number of IBS op samples where either a load or store oper-ation caused a misaligned access (i.e., the load or store operationcrossed a 128-bit boundary).

IBS bank conflict onload op

The number of IBS op samples where either a load or store op-eration caused a bank conflict with a load operation.

IBS bank conflict onstore op

The number of IBS op samples where either a load or store op-eration caused a bank conflict with a store operation.

IBS store-to-load for-warded

The number of IBS op samples where data for a load operationwas forwarded from a store operation.

IBS store-to-load can-celled

The number of IBS op samples where data forwarding to a loadoperation from a store was cancelled.

IBS UC memory access The number of IBS op samples where a load or store operationaccessed uncacheable (UC) memory.

IBS WC memory access The number of IBS op samples where a load or store operationaccessed write combining (WC) memory.

IBS locked operation The number of IBS op samples where a load or store operationwas a locked operation.

Continued on next page

2.1. CPU Profiler 9

Page 13: AMD uProf User Guide · AMD uProf User Guide Release v1.2 ... 3.1.6 Importing Profile Databases ... (see the BIOS and Kernel Developer’s Guide

AMD uProf User Guide, Release v1.2

Table 2.2 – continued from previous pageIBS Op Event DescriptionIBS MAB hit The number of IBS op samples where a load or store operation

hit an already allocated entry in the Miss Address Buffer (MAB).IBS L1 DTLB 4K page The number of IBS op samples where a load or store operation

produced a valid linear (virtual) address and a 4-KByte page en-try in the L1 DTLB was used for address translation.

IBS L1 DTLB 2M page | The number of IBS op samples where a load or store operation produced a valid linear (virtual) address and a 2-MByte page entry in the L1 DTLB was used for address translation.

IBS L1 DTLB 1G page | The number of IBS op samples where a load or store operation produced a valid linear (virtual) address and a 1-GByte page entry in the L1 DTLB was used for address translation.

IBS L2 DTLB 4K page | The number of IBS op samples where a load or store operation produced a valid linear (virtual) address, hit the L2 DTLB, and used a 4-KByte page entry for address translation.

IBS L2 DTLB 2M page | The number of IBS op samples where a load or store operation produced a valid linear (virtual) address, hit the L2 DTLB, and used a 2-MByte page entry for address translation.

IBS L2 DTLB 1G page | The number of IBS op samples where a load or store operation produced a valid linear (virtual) address, hit the L2 DTLB, and used a 1-GByte page entry for address translation.

IBS data cache miss load latency | The total DC miss load latency (in processor cycles) across all IBS op samples that performed a load operation and missed in the data cache. The miss latency is the number of clock cycles from when the data cache miss was detected to when data was delivered to the core. Divide the total DC miss load latency by the number of data cache misses to obtain the average DC miss load latency.

IBS load resync | Load Resync.

IBS Northbridge local | The number of IBS op samples where a load operation was serviced from the local processor. Northbridge IBS data is only valid for load operations that miss in both the L1 data cache and the L2 data cache. If a load operation crosses a cache line boundary, then the IBS data reflects the access to the lower cache line.

IBS Northbridge remote | The number of IBS op samples where a load operation was serviced from a remote processor.

IBS Northbridge local L3 | The number of IBS op samples where a load operation was serviced by the local L3 cache.

IBS Northbridge local core L1 or L2 cache | The number of IBS op samples where a load operation was serviced by a cache (L1 data cache or L2 cache) belonging to a local core which is a sibling of the core making the memory request.

IBS Northbridge remote core L1, L2, L3 cache | The number of IBS op samples where a load operation was serviced by a remote L1 data cache, L2 cache or L3 cache after traversing one or more coherent HyperTransport links.

IBS Northbridge local DRAM | The number of IBS op samples where a load operation was serviced by local system memory (local DRAM via the memory controller).

IBS Northbridge remote DRAM | The number of IBS op samples where a load operation was serviced by remote system memory (after traversing one or more coherent HyperTransport links and through a remote memory controller).

IBS Northbridge local APIC MMIO Config PCI | The number of IBS op samples where a load operation was serviced from local MMIO, configuration or PCI space, or from the local APIC.

IBS Northbridge remote APIC MMIO Config PCI | The number of IBS op samples where a load operation was serviced from remote MMIO, configuration or PCI space.

IBS Northbridge cache modified state | The number of IBS op samples where a load operation was serviced from local or remote cache, and the cache hit state was the Modified (M) state.

IBS Northbridge cache owned state | The number of IBS op samples where a load operation was serviced from local or remote cache, and the cache hit state was the Owned (O) state.

IBS Northbridge local cache latency | The total data cache miss latency (in processor cycles) for load operations that were serviced by the local processor.

IBS Northbridge remote cache latency | The total data cache miss latency (in processor cycles) for load operations that were serviced by a remote processor.
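
The “IBS data cache miss load latency” entry in the table above says to divide the total latency by the number of data cache misses to get an average. A minimal sketch of that arithmetic follows. The function name and the sample figures are made up for illustration and do not come from uProf output.

```python
def average_dc_miss_load_latency(total_latency_cycles: int, dc_miss_loads: int) -> float:
    """Average DC miss load latency in cycles (hypothetical helper, not a uProf API)."""
    if dc_miss_loads == 0:
        # No load missed in the data cache, so there is nothing to average.
        return 0.0
    return total_latency_cycles / dc_miss_loads


# Illustrative numbers only: 48,000 total latency cycles over 320 missed loads.
print(average_dc_miss_load_latency(48_000, 320))  # 150.0
```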

2.1.4 L3 and DF Events

L3 and DF PMC event IDs are very similar to core PMC event IDs. To keep L3 and DF event IDs distinguishable from the core PMC event IDs, the L3 and DF event IDs are prefixed with unique values.

All L3 event IDs are prefixed with 0xb.
All DF event IDs are prefixed with 0xc.

The L3 PMC event ID for “L3 Cache Accesses” is 0x0001. To keep this ID unique, uProf uses 0xb001 for “L3 Cache Accesses”. This event ID transformation is only relevant if you are passing the event ID in the CLI option (--eevent=0xb001,...).

Similarly, if a DF PMC event ID is 0x000y, it is transformed to 0xc00y.

This transformation is done only in the uProf front end, to make sure core PMC, L3 PMC and DF PMC events are distinguishable from each other.
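
The prefixing rule can be sketched in a few lines. This is an illustration of the arithmetic only, assuming the raw event ID fits in three hex digits as in the examples above. The helper name `to_uprof_event_id` and the `"l3"`/`"df"` keys are hypothetical and are not part of uProf.

```python
# Prefix nibbles as stated in the text: 0xb for L3 events, 0xc for DF events.
PREFIX = {"l3": 0xB, "df": 0xC}


def to_uprof_event_id(unit: str, pmc_event_id: int) -> int:
    """Return the prefixed front-end event ID (hypothetical helper, not a uProf API)."""
    if not 0 <= pmc_event_id <= 0xFFF:
        raise ValueError("expected a PMC event ID that fits in three hex digits")
    # Place the prefix in the top nibble of a 16-bit value, e.g. 0x001 -> 0xb001.
    return (PREFIX[unit] << 12) | pmc_event_id


print(hex(to_uprof_event_id("l3", 0x001)))  # 0xb001
print(hex(to_uprof_event_id("df", 0x007)))  # 0xc007
```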

To collect L3 and DF event profile data, combine these events with timer-based profiling.

2.1.5 References

AMD Developer Documents
AMD uProf Home page

2.2 Power Profiler

2.2.1 Introduction

AMD uProf’s Power Profiler is a tool to analyze the energy efficiency of systems based on AMD CPUs, APUs and dGPUs. It is used to collect and analyze the power, frequency and thermal characteristics data.

Features

• Reports the following data:

– Estimated average energy consumption of Socket and CPU Cores.

– Estimated average power consumption of supported APUs and dGPUs.

– Core Effective Frequency of the CPU cores.

– Average Frequency of the supported iGPUs and dGPUs.

– Thermal characteristics of the CPU compute-units, iGPUs and dGPUs.

– CPU Core P-States.

– Application Analysis

• Supported Languages for Application Analysis:

– C, C++ and Fortran applications

• Supported Processors:

– AMD CPUs: Ryzen, Threadripper, EPYC

– AMD APUs: Carrizo, Kaveri, Mullins, Temash, Stoney, Bristol

– AMD dGPUs: Graphics IP 7 GPUs, Radeon and FirePro models

2.2.2 Supported Counters

The supported Power, Frequency and Thermal counters depend on the processor family and model. The following is the list of supported counters available on various platforms.

Category | Name | Description

Energy | RAPL Package energy | RAPL MSR based Package energy. Note: Available only on Ryzen, Threadripper and EPYC.

Energy | RAPL Core energy | RAPL MSR based Core energy. Note: Available only on Ryzen, Threadripper and EPYC.

Correlated Power | Socket Power | Correlated Average Socket Power for the sampling period, reported in Watts. This is an estimated consumption value based on platform activity levels. Counters are available on socket level. Note: Available only on Ryzen, Threadripper and EPYC.

Correlated Power | VDDCR Soc Power | Average Correlated VDDCR_SOC Power for the sampling period, reported in Watts. This is an estimated consumption value based on platform activity levels. Counters are available on socket level. Note: Available only on Ryzen, Threadripper and EPYC.

Correlated Power | Correlated VDDCR CPU Power | Correlated Average VDDCR_CPU power from per-part calculations for the sampling period, reported in Watts. This is an estimated consumption value based on platform activity levels. Counters are available on socket level. Note: Available only on Ryzen, Threadripper and EPYC.

Correlated Power | Correlated VDDIO Mem Power | Correlated Average VDDIO_MEM Power for the sampling period, reported in Watts. This is an estimated consumption value based on platform activity levels. Counters are available on socket level. Note: Available only on Ryzen, Threadripper and EPYC.

Correlated Power | Correlated VDD18 Power | Correlated Average VDD18 Power for the sampling period, reported in Watts. This is an estimated consumption value based on platform activity levels. Counters are available on socket level. Note: Available only on Ryzen, Threadripper and EPYC.

Power | Total APU Power | Average APU Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on APU activity levels.

Power | CPU Compute Unit Power | Average CPU Compute Unit Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on APU activity levels.

Power | iGPU Power | Average Integrated-GPU Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on APU activity levels.

Power | PCIe-Controller Power | Average PCIe-Controller Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on APU activity levels. This value does not include the power consumed by PCIe devices connected to the PCIe bus.

Power | Memory-Controller Power | Average DDR Memory-Controller Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on APU activity levels. This value does not include the power consumed by the memory DIMMs.

Power | Display-Controller Power | Average Display-Controller Power for the sampling period, reported in Watts. This value refers to the APU’s internal display controller which may be used in notebook and embedded configurations. This is an estimated consumption value which is calculated based on APU activity levels. This value does not include the power consumed by the display.

Power | Cumulative APU Power | The accumulated energy consumed by the APU throughout the profile session. Reported in Joules. Available only in the command line tool.

Power | Cumulative Compute Unit Power | The accumulated energy consumed by the CPU Compute Unit throughout the profile session. Reported in Joules. Note: Available only in the Command line interface tool.

Power | Cumulative iGPU Power | The accumulated energy consumed by the APU’s Internal GPU throughout the profile session. Reported in Joules. Note: Available only in the Command line interface tool.

Power | dGPU power | Average Discrete-GPU Power for the sampling period, reported in Watts. This is an estimated consumption value which is calculated based on dGPU activity levels. The dGPU family name is prefixed with this counter name.

Frequency | CPU Core/Thread Average Frequency | Average CPU Core Frequency for the sampling period, reported in MHz. This is the Core Effective Frequency (CEF). The core can go into various P-States within the sampling period, each with its own frequency. The CEF is the average of the core frequencies over the sampling period.

Frequency | iGPU Average Frequency | Average Integrated-GPU Frequency for the sampling period, reported in MHz.

Frequency | dGPU Average Frequency | Average Discrete-GPU Frequency for the sampling period, reported in MHz. The dGPU family name is prefixed with this counter name.

Frequency | CPU Core/Thread Frequency Histogram | Histogram of CPU Core Effective Frequency (average frequency for the sampling period). Note: Available only in the Command line interface tool.

Frequency | iGPU Frequency Histogram | Histogram of Internal-GPU Effective Frequency (average frequency for the sampling period). Note: Available only in the Command line interface tool.

Temperature | CPU Compute-Unit Measured Temperature | Measured CPU Compute Unit Average Temperature, reported in Celsius. The reported value is normalized and scaled, relative to the specific processor’s maximum operating temperature. This value can be used to indicate the rise and decline of temperature.

Temperature | iGPU Measured Temperature | Measured Integrated-GPU Average Temperature, reported in Celsius. The reported value is normalized and scaled, relative to the specific processor’s maximum operating temperature. This value can be used to indicate rise and decline of temperature.

Temperature | dGPU Measured Temperature | Measured Discrete-GPU Average Temperature, reported in Celsius. The reported value is normalized and scaled, relative to the specific processor’s maximum operating temperature. This value can be used to indicate rise and decline of temperature. The dGPU family name is prefixed with this counter name.

P-State | CPU P-State | CPU Core P-State at the time when sampling was performed.
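
The “CPU Core/Thread Average Frequency” entry describes the Core Effective Frequency as an average over the P-State frequencies a core visits within one sampling period. The sketch below shows one way to read that statement, as a time-weighted average. It assumes the time spent at each frequency is known, which is an assumption made here for illustration. It is not how the profiler itself obtains the value, and the function name and figures are hypothetical.

```python
def core_effective_frequency(residency):
    """Time-weighted average frequency in MHz.

    residency: list of (frequency_mhz, time_ms) pairs for one sampling period.
    Hypothetical helper for illustration, not a uProf API.
    """
    total_time = sum(time_ms for _, time_ms in residency)
    if total_time == 0:
        raise ValueError("sampling period has zero duration")
    weighted = sum(freq * time_ms for freq, time_ms in residency)
    return weighted / total_time


# Illustrative 100 ms sampling period spent in three P-States.
period = [(3400, 25), (2800, 50), (1600, 25)]
print(core_effective_frequency(period))  # 2650.0
```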

2.2.3 Power Profile Key Concepts

This section explains key concepts related to power profiling. AMD uProf offers live power profiling and application power analysis through the graphical user interface tool AMDuProf and the command line interface tool AMDuProfCLI.

Profiling Concepts

Live Power Profile: The Power Profiler follows a statistical sampling-based approach to gather the profile data periodically. The profile data is collected from various hardware counters, PMC (Performance Monitor Counters), SMU (System Management Unit) and PCIe (Peripheral Component Interconnect Express) address space registers. This profile data is plotted in a running timechart graph, which helps users monitor the current state of the system hardware components. For example, on a Ryzen system, Socket Power, PPT Limit and Energy of each core can be selected together for observation. At every sampling period, the profiler collects the profile data and plots it in a live timechart graph.

Time Chart: Similar to Live Power Profile, the CLI provides a way to collect the statistically sampled profile data and report it in text or CSV file format. This option has much lower overhead than the Live Power Profile.

Application Analysis: This analysis can be used to identify the most power-intensive application/thread/module/function/source. The profiler attributes the energy consumption to the various applications running in the system, using IPC. This analysis can be used to optimize the application to reduce its energy consumption.
Note: This feature is available only on Windows OS.

Hardware Resources

Running Average Power Limit (RAPL) MSRs: RAPL MSRs provide socket and physical core level energy consumption data. These special registers are available on AMD Family 17h processors: Ryzen, Threadripper and EPYC.

System Management Unit (SMU): The SMU within the AMD processor provides an interface to retrieve various power, frequency and thermal characteristics, such as average CPU Core power, Package power, iGPU power, dGPU power, temperature and frequency.

Model Specific Registers: A number of Model Specific Registers (MSRs) are exposed by AMD processors. Details can be found in the BKDG/PPR documents available on the AMD Developer web page. The Power Profiler makes use of a number of MSRs to calculate Core Effective Frequency, P-State and other profile data.

2.2.4 References

AMD Developer Documents
AMD uProf Home page

CHAPTER

THREE

GRAPHICAL USER INTERFACE

3.1 CPU Profiler

AMDuProf GUI provides a visual way to profile and analyze performance data.

On opening the GUI, you will be greeted with the welcome screen. This screen contains various quick links and hyperlinks to jump-start workflows. On first launch it looks like this:

The various sections are:

1. “Create a new profile” link lets you generate a new profile with requisite options. “Open existing profile config” lets you browse through various saved profile configurations and choose any one.

2. Once you have profiled the system/application or imported profile databases, you will see the last 5 opened databases and a link to see the complete list as well.

3. The “Quick Links” section contains two entries which let you start profiles with minimal configuration. The first one starts system-wide time-based profiling until stopped by the user and then displays the collected data. The second option takes you to a section where numerous counters can be selected and presents a live view of the data through graphs.

4. The “Developer Links” section provides useful links.

3.1.1 Starting a Profile

To start a profile, click either the “PROFILE” button at the top or the “Create a new profile” hyperlink. This will navigate to the following page:

The options are grouped into collapsible sections. Not all of them are compulsory. Different types of profile target can be selected from the “Target” pane. They are:

1. “Launch Application”: Select this target when you want to launch an application and profile it (or launch it and do a system-wide profile). The only compulsory option is a valid path to the executable. (By default, the path to the executable becomes the working directory unless you specify a path.) Once provided, it will look like this:

2. “System Wide”: Select this if you do not wish to launch any application but want to perform either a system-wide profile or a profile of a specific set of cores. The screen will look like this:

3. “Attach to Process”: Select this if you want to profile an application/process which is already running. This will bring up a process table which can be refreshed. Selecting one of the processes from the table is mandatory in order to start the profile.

All other panes are optional. These panes include:

1. “Extra Options”: Use this to set options like the path to the source (if the sources are relocated), the path to symbol servers (if PDB files are available on symbol servers) and core affinity (to bind the launched application to a specific set of cores).

2. “Profile Scheduling”: Use this pane to set scheduling options like the profile duration and start delay. If Instrumentation APIs are used for fine-grained profiling, the “Profile Instrumentation API” option should be checked.

3. “Call Stack Options”: Use this to collect call-stack data. It is disabled by default; to enable it, click on the option “Enable CSS”. Once enabled, you can set the collection mode and call stack depth, and enable FPO (which leads to better call-stack collection). Note: “Enable FPO” is not available on Linux.

Once you have set the requisite options, click on the “Next” button. Note that specifying any invalid option will disable the “Next” button. On clicking the “Next” button, you will be presented with a grid of possible profile types. This selection decides what kind of profile data is collected. For example, if you are profiling a memory bandwidth limited application, you can go for “Investigate Data Access”. The screen will look like this:

The grid contains certain default profiles which contain the necessary relevant events. However, these may not always fit your needs. In that case you can click on the “Custom Profile” box. (You may need to scroll down if there are lots of different boxes; it is usually the last entry.) Once clicked, you will be taken immediately to the counter/event selection dialog.

Selecting any event from the tree will show its corresponding description. You can select one or multiple such events (Ctrl + Click) and then click on the “Add Selected Event” button to add them to the table on the right. To remove events, select one or more from the table and click the “Removed Selected Event” button.

After adding events, clicking any row in the table will show the corresponding mask for the event with its description. Furthermore, for each event, you can optionally set the mode (User and Kernel) and the sampling interval. To change the mask, check or uncheck the corresponding mask entries; the table will reflect the resulting value.

To search for a specific event, type the name in the “Filters” text box; a drop-down will show the matching counter names.

Note: You can always double-click an event in the tree to add it and double-click it in the table to remove it. Also, the same event cannot be added multiple times.

Once the options are set correctly, the green “Start Profile” button at the bottom will be enabled and you can click on it to start the profile. After profile initialization, you will see a running timer displaying the number of seconds elapsed, starting from zero. It will look like this:

Here you can either cancel the profile, which will take you back to the profile options, or stop the profile. Once the profiling is stopped, the collected data will be processed.

Note: The data translation can take a long time, depending on the options provided during profiling. Enabling call stack collection significantly increases the translation time and so does enabling remote symbol servers.

3.1.2 Profile Overview

Once the translation completes, you will be presented with an overview of the hot spots for the profile.

The hot spots are shown for functions, modules, processes and threads. (Processes and threads are shown only if there are at least two of them.) Hot spots are shown per counter, which can be selected from the drop-down in the top right corner. Changing it to any other counter updates the hot-spot data accordingly.

The hot-spot donut chart for functions is interactive: you can click on any section and the corresponding function’s source view will open in a separate tab.

Apart from this, you can also see ancillary information about the opened profile and about the system on which the profiling was carried out.

3.1.3 Counter Samples

Click on the “ANALYZE” button at the top to go to counter-wise sample data for processes/modules/threads. This view will look like this:

The view contains data in two different formats:

1. The upper tree represents counter samples grouped by “Process”. The tree can be expanded to see the child entries for each parent (i.e. for a process). The load modules and threads are child entries for the selected process entry.

2. The lower table contains function-level counter samples. This data depends on what is selected in the tree. For more specific data, you can select a child entry from the tree and the corresponding function data will be updated.

Not all entries will be loaded for a profile. To load more than the default entries, click the “Load more function/profile data” buttons in the top right corners to fetch more data. The columns can also be sorted by clicking on the column headers.

The data can be filtered by expanding the “Options and Filters” collapsible pane. The “View” controls the counters that are displayed. By default, all the counters are displayed, but the view can be narrowed down to certain predefined sets.

Furthermore, you can change the group-by criteria to either group by modules or threads.

You can also change the sample data type (from absolute count to percentage) and change whether to list system modules or not.

3.1.4 Source-level Attribution

Double-clicking any entry for a function will make the GUI load the source view for that function. If the GUI can find the path to the source for the function, it will try to open the file; failing that, you will be prompted to locate it.

Note: The name of the file will be present in the title bar of the browse dialog. If locating and opening the file is successful, it will be displayed in a separate tree in the “SOURCE” tab with the file name displayed. It looks like:

Each row in the source view can be expanded to see the assembly instructions generated for the corresponding source line. The tree will also show the offset for each assembly instruction along with counter samples.

For multi-threaded or multi-process applications, if a function has been executed from multiple threads/processes, each of them will be listed in the processes and threads drop-downs in the filters. Changing them will update the counter sample data for that particular selection. By default, data for all processes and all threads is shown. Note that selecting a different process will reset the “Threads” drop-down to “All Threads”, as each process can spawn multiple threads.

As in the counter samples section, filters are present here as well. You can modify the process, threads and value show type (absolute vs. percentage). This helps you to get more granular data. The function drop-down at the top right lets you see the source-level attribution for all the functions present in the file which have some samples. Selecting any one of them will take you there. However, the data for the previous one will be removed.

If the source file cannot be located or opened, only disassembly will be displayed.

3.1.5 Call Graph

If the profile was created with call-stack sampling enabled, the call graph will be displayed in the “ANALYZE” tab and can be reached by clicking “Call Graph”. It will look like:

The data can be browsed based on processes and counters (which can be selected from the drop-down at the top). The top central table displays call-stack samples for each function. Clicking on any function updates the bottom two tables. These tables display the callers and callees, respectively, of the selected function.

3.1.6 Importing Profile Databases

Profile databases generated through the CLI or from elsewhere can be imported in the GUI. To do this, navigate from the “HOME” page to “Open Profile” and you will see the following:

While importing, you can specify the DB file and, along with it, the sources directory for the application which was profiled and symbol servers if applicable. The database format is .db; the raw data format is .caperf on Linux and .pdata or .prd on Windows. These files are recognized by the GUI and can be opened. For raw data, importing will trigger a translation to .db format before opening and thus may take some time. However, if the GUI finds an already existing translation, it will instead ask you whether to open the existing translation, re-translate and open, or simply cancel.
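
The file types listed above can be told apart by extension before importing, for example in a script that sorts collected profile files. The sketch below is only an illustration of that mapping; the function name and return labels are hypothetical and do not correspond to any uProf interface.

```python
from pathlib import Path

# Raw data extensions named in the text: .caperf (Linux), .pdata and .prd (Windows).
RAW_EXTENSIONS = {".caperf", ".pdata", ".prd"}


def classify_profile_file(path: str) -> str:
    """Classify a profile file by extension (hypothetical helper, not a uProf API)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".db":
        return "database"  # opens directly
    if suffix in RAW_EXTENSIONS:
        return "raw"       # needs translation to .db before opening
    return "unknown"


print(classify_profile_file("session1.db"))      # database
print(classify_profile_file("session2.caperf"))  # raw
```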

Once the translation completes, the database will be opened and you will see the hot-spots.

3.1.7 Settings

There are certain application-wide settings to customize the experience. The “SETTINGS” tab is located in the top-right corner and is divided into three sections, each having a short description of what it contains. Changed settings can be applied by clicking the “Apply” button. Some settings are shared with the profile data filters, and any change to them applied through the “Apply” button only takes effect in views which do not have local filters set. If you want to override them, you can click on the “Override . . . ” button. (Note: You will lose all local filters applied.) You can always reset the settings by clicking the “Reset” button, or click “Cancel” to discard any changes that you don’t want to apply.

3.1.8 Miscellaneous

Once you have opened/imported profile databases, a history is created and records of the last 50 opened profile databases are stored (i.e. where they are located). This list also appears in the welcome section.

If there are more than 5 entries, you can open “Recent Profiles” in the “HOME” page to see the list of all profile databases opened.

Any entry can be clicked (by the name) to open it. Clicking on “See Details” will open a sliding right pane which will show some details about the database.

Similarly, for CPU profiles (when you set the options and start profiling), if a profile generates at least one valid database, it will be stored with the options set and can be loaded again in future. This list is available in the “PROFILE” page in the “Saved Configurations” section. Note that by default the profile name is generated by the application; if you want to reuse a profile, you should ideally name it so that it is easy to locate. This can be done by typing a name in the bottom right corner when setting the profile options.


When you click the “See Details” button to pop up the details, you can further see the databases generated by the profile by clicking on the hyperlink in the details pane. The databases are shown as a table.

3.2 Power Profiler

The Power Profiler GUI provides the following functionality:

• “Live Power Profile” configuration to collect and report the power, frequency and thermal data in live graphs.


• “Power Profile” configuration for application analysis to identify the energy intensive applications/modules/functions/source lines.

3.2.1 Live Power Profile

The AMDuProf GUI lets the user select counters from the available counter categories. It also provides a settings section to specify the sampling period, the launched application to be profiled and other related configurations. Based on the user input, the GUI plots the collected profile data and displays it in live graphs.

New Profile Session

There are two ways to start a new power profile session from the home page below.

• For quick profiling, click the “See what’s guzzling power in your system” link. This takes you to the “Select Live Counters” page to select the counters of interest.

• An advanced user can click the “Create a new profile” link under the heading “Start Here” and proceed to the “Set Profile Options” page for further configuration.


Set Profile Options

Select Live Counters

Counters are arranged based on categories such as energy, power, frequency, temperature etc. Within a category, counters are arranged in a tree structure based on the device hierarchy. The user can select or unselect any number of counters by checking or unchecking the checkbox adjacent to the counter name. A context menu offers options to select or unselect all counters of the current category and to collapse or expand the device tree for that counter category.

There should be at least one counter selected to move to the next window for profiling.


Time Chart

Once the counters of interest are selected on the “Select Live Counters” page, clicking the “Start Profile” button starts the data collection and plots the data in graphs. A data table adjacent to each graph displays the current value of the counters.

Line graphs are grouped together and plotted based on the category. However, if too many counters are selected from the same category, counters are grouped based on both the category and the devices they belong to.

Based on the graph type selected, graphs will either be scrolling or scaling. Refer to the “SETTINGS” section to know more about graph types.

While plotting is in progress, the user can either stop the data collection and start analyzing the data, or close the profile session. Once plotting is stopped, the user can hover the mouse over a graph to view the corresponding counter values in the data table. The user can also click on any graph line to highlight that line.

3.2.2 Application Analysis

The “Power Profiler” configuration in the “Select Configuration Type” page is used to analyze the energy consumption of a launched application or of all the running applications in the system. It can be used to identify the energy-intensive functions relative to the energy consumption of the application, or of all applications running in the system during the profile run. The user can drill down to source code level, provided the required symbol information and source files are available for the target application. The user can also attach to a specific process selected from the list of running processes.

Power profile application analysis and CPU profile application analysis follow a common GUI workflow. Please refer to the “CPU Profiler” section of Graphical User Interface in this document.

3.2.3 Settings

Save Live Data

Live data can be saved in a SQLite db file for future use. This option enables writing the profile data into a database file, which can be opened in any SQLite browser to analyze the profile data later.

Save Live Data at

User can choose a path to store the output database file.
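The saved database can also be inspected from a script. The following is a minimal sketch using Python's standard sqlite3 module. It only lists the tables present in the file and assumes nothing about the schema that uProf writes; the function name list_tables is illustrative and the path passed to it is whatever was chosen under “Save Live Data at”.

```python
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

Listing the tables first is a cautious starting point, since the table and column names are not described in this guide.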

Graph Types (Scaling & Scrolling)

While a profile is in progress, the collected data is continuously plotted on the graph. Once the plotting area of the graph is covered, the behavior depends on the graph type. If “Scrolling Graph” is selected, older data is replaced by newly collected counter data, so at any point in time the user sees only a fixed number of data points on the graph. If “Scaling Graph” is selected, the graph is compressed to accommodate the new data, and the user can view the entire data from the beginning of the profile.
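The difference between the two graph types can be sketched in a few lines of Python. This is an illustration only, not uProf's implementation: the function names are hypothetical, and averaging equal-sized buckets is just one assumed way to compress a series into a fixed number of points.

```python
from collections import deque

def scrolling_window(samples, width):
    """Keep only the newest `width` samples, as a scrolling graph would show."""
    window = deque(maxlen=width)
    for value in samples:
        window.append(value)  # the oldest sample drops out once the window is full
    return list(window)

def scaling_view(samples, width):
    """Compress all samples into at most `width` points by averaging buckets."""
    if len(samples) <= width:
        return list(samples)
    bucket = -(-len(samples) // width)  # ceiling division: samples per plotted point
    view = []
    for start in range(0, len(samples), bucket):
        chunk = samples[start:start + bucket]
        view.append(sum(chunk) / len(chunk))
    return view
```

With ten samples 0..9 and room for four points, the scrolling view keeps the last four samples, while the scaling view keeps one averaged point per bucket, so the start of the run stays visible at reduced resolution.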

Sampling Interval

The sampling interval is the interval at which the profiler collects data from the driver and plots it in the graph. The default value is 100ms; on EPYC processors, the default sampling interval is 500ms. A larger sampling interval is recommended to reduce the overhead of data collection from the driver. Note that the profiler may prevent the system from transitioning to a sleep state if the sampling interval is too small.


CHAPTER

FOUR

COMMAND LINE INTERFACE

4.1 CPU Profiler

4.1.1 CLI Options

AMD uProf’s CPU Profiler provides a command line interface utility AMDuProfCLI to collect and analyze the profile data. It can also be used from a batch file or a test script.

Usage:

> AMDuProfCLI.exe [--version] [--help] <command> <options> [<application>] [<arguments>]

--version      : Displays the version of AMD uProf tool.
-h | --help    : Displays the quick help details.
<application>  : Path to the launch application to be profiled.
<arguments>    : Arguments (if any) to be passed to the launch application.

Following commands are supported:

collect   Run the given input application and collect CPU profile samples.
report    Process the given CPU profile data file and generate a CPU profile report in CSV format.
info      Displays generic information about system, CPU topology etc.
help      Opens this HTML help page in a web browser.

Following options are supported with collect command.

Option                        Description
-h | --help                   Displays this help information.
--list <type>                 Lists the supported items for the following types:
                                cpu-configs : Predefined profile configurations that can be used with --config option.
                                cpu-events  : Raw PMC events that can be used with --event option.
--config <config-type>        Predefined data-collection configuration to be used to collect samples. Use --list cpu-configs to get the list of supported configs.


-e | --event <EVENT>          Specify a Timer, PMU, L3, DF or IBS event as comma-separated key=value pairs. Supported keys are:
                                event=<event-id> or <timer | ibs-fetch | ibs-op>
                                umask=<unit-mask>
                                user=<0 | 1>
                                os=<0 | 1>
                                interval=<sampling-interval> (in milliseconds)
                                ibsop-count-control=<0 | 1>
                                slicemask=<L3-slice-mask>
                                threadmask=<L3-thread-mask>
                                component=<DF-component>
                              The PMU-event-select and unit-mask values can be in decimal or hex (prefixed with 0x). Default values are umask=0, user=1, os=1, interval=0, ibsop-count-control=0. For a PMC event, if the interval is not set or is 0, the event is monitored in count mode. For timer, ibs-fetch and ibs-op events, a valid sampling interval is required. For timer, the interval is in milliseconds. If ibsop-count-control is 0, clock cycles are counted; otherwise dispatched micro-ops are counted. Multiple occurrences of -e are allowed. Use --list cpu-events to get the list of supported event-ids.
-p | --pid <PID,PID,..>       Profile existing processes (processes to attach to). Process IDs are separated by comma.
-a | --system-wide            System Wide Profile (SWP). If this flag is not set, the command line tool profiles only the launched application or the attached PIDs.
-c | --cpu <core-id,..>       Comma-separated list of CPUs to profile. Ranges of CPUs can also be specified with -, e.g. 0-3. Use the info command to get the list of core-ids (info --cpu-topology).
--call-graph <I:D:S:F>        Enable callstack sampling. Specify the Unwind Interval (I) in milliseconds and the Unwind Depth (D) value. Specify the Scope (S) by choosing one of the following:
                                user   : Collect only for user space code.
                                kernel : Collect only for kernel space code.
                                all    : Collect for code executed in both user and kernel space.
                              Specify whether to collect frames missing due to Frame Pointer Omission (F) by the compiler:
                                fpo   : Collect missing callstack frames.
                                nofpo : Do not collect missing callstack frames.
                              Scope (S) and FPO (F) are only supported on Windows.
-g                            Enable callstack sampling with the following default values:
                                Unwind Interval (I) : 1
                                Unwind Depth (D)    : 128
                                Scope (S)           : user
                                FPO (F)             : nofpo
-d | --duration <n>           Profile duration n in seconds.
--affinity <core-id,..>       Core affinity. Comma-separated list of core-ids. Ranges of core-ids can also be specified, e.g. 0-3. Default affinity is all the available cores. In Per-Process profile, processor affinity is set for the launched application.
--no-inherit                  Do not profile the children of the launched application (i.e. processes launched by the profiled application).
-b | --terminate              Terminate the launched application after profile data collection ends. Only the launched application process will be killed. Its children, if any, may continue to execute.


--start-delay <n>             Start Delay n in seconds. Start profiling after the specified duration. When n is 0, it has no impact.
--start-paused                Profiling is paused indefinitely until the target application resumes it using the profile control APIs. This option is expected to be used only when the launched application is instrumented to control the profile data collection using the resume and pause APIs defined in the AMDProfileControl library, which is provided along with the AMD uProf installer.
-w | --working-dir <dir>      Specify the working directory. Default is the directory of the launch application.
-o | --output <file name>     Base name of the output file. If this option is skipped, the default path is used: %Temp%\AMD-CpuProfile-<timestamp>.* on Windows and /tmp/AMD-CpuProfile-<timestamp>.* on Linux.
-v | --verbose <n>            Specify debug log messaging level. Valid values are:
                                1 : INFO
                                2 : DEBUG
                                3 : EXTENSIVE
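Because the --event value is a comma-separated list of key=value pairs, scripts that drive the CLI may find it convenient to assemble it from named arguments. The following Python sketch does that under stated assumptions: event_spec is a hypothetical helper, not part of uProf, keyword names use underscores where the CLI keys use hyphens, and the only rule enforced is the one stated above, that timer, ibs-fetch and ibs-op events need a sampling interval.

```python
def event_spec(event, **keys):
    """Build the value for one -e option, e.g. 'event=0x76,interval=250000'."""
    allowed = {"umask", "user", "os", "interval", "ibsop_count_control",
               "slicemask", "threadmask", "component"}
    unknown = set(keys) - allowed
    if unknown:
        raise ValueError(f"unsupported keys: {sorted(unknown)}")
    # Timer and IBS events cannot run in count mode, so interval must be non-zero.
    if event in ("timer", "ibs-fetch", "ibs-op") and not keys.get("interval"):
        raise ValueError(f"{event} requires a non-zero sampling interval")
    parts = [f"event={event}"]
    # Keyword order is preserved; underscores map back to the hyphenated CLI keys.
    parts += [f"{name.replace('_', '-')}={value}" for name, value in keys.items()]
    return ",".join(parts)
```

For instance, event_spec("0xc0", user=1, os=0, interval=250000) yields the string event=0xc0,user=1,os=0,interval=250000, which can then be passed after -e.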

Following options are supported with report command.

Option                        Description
-h | --help                   Displays this help information.
--list cpu-views              Lists the supported report view configurations that can be used with --view option.
-i | --input <file name>      Input file name. Either the raw profile data file (.prd on Windows and .caperf on Linux) or the processed data file (.db) can be specified.
-o | --output <output dir>    Output directory in which the processed data files (.db, .csv) will be created. The default path will be %Temp%\<base-name-of-input-file>.* on Windows and /tmp/<base-name-of-input-file>.* on Linux.
--summary                     Report only the overview of the profile.
--group-by <section>          Specify the report to be generated. Supported report options are:
                                process : Report process details.
                                module  : Report module details.
                                thread  : Report thread details.
--view <view-config>          Report only the events present in the given view file. Use --list cpu-views to get the list of supported view-configs.
--src                         Generate detailed function report with source statements.
--disasm                      Generate detailed function report with assembly instructions.
--sort-by <event-index>       Specify the event index on which the reported profile data will be sorted. This event is also used to generate the CallGraph section.
--imix                        Generate Instruction MIX report.
--ignore-system-module        Ignore samples from System Modules.
--show-percentage             Show Percentage.
--src-path <path1;..>         Source file directories. (Semicolon separated paths.)
--symbol-path <path1;..>      Debug Symbol paths. (Semicolon separated paths.)
--symbol-server <path1;..>    Symbol Server directories. (Semicolon separated paths.)
--symbol-cache-dir <path>     Path to store the symbols downloaded from the Symbol Servers.


--cutoff <n>                  Cutoff to limit the number of processes, threads, modules and functions to be reported. n is the minimum number of entries to be reported in various report sections. Default value is 10.
-v | --verbose <n>            Specify debug log messaging level. Valid values are:
                                1 : INFO
                                2 : DEBUG
                                3 : EXTENSIVE

Following options are supported with info command.

Option                        Description
--system                      Displays processor information of this system.
--cpu-topology                Displays CPU topology information of this system.
--disasm <binary-path>        Disassembles the given binary file.

4.1.2 Examples

Collect Command

Launch the application classic.exe and collect Time-based profile (TBP) samples:

> AMDuProfCLI.exe collect -o c:\Temp\cpuprof-tbp classic.exe

Launch classic.exe and do ‘Assess Performance’ profile for 10 seconds:

> AMDuProfCLI.exe collect --config assess -o c:\Temp\cpuprof-assess -d 10 classic.exe

Launch classic.exe and collect ‘IBS’ samples in SWP mode:

> AMDuProfCLI.exe collect --config ibs -a -o c:\Temp\cpuprof-ibs-swp classic.exe

Collect ‘TBP’ samples in SWP mode for 10 seconds:

> AMDuProfCLI.exe collect -a -o c:\Temp\cpuprof-TBP-swp -d 10

Launch classic.exe and collect ‘TBP’ with Callstack sampling:

> AMDuProfCLI.exe collect --config tbp -g -o c:\Temp\cpuprof-tbp classic.exe

Launch classic.exe and collect samples for PMC events 0x76 and 0xc0:

> AMDuProfCLI.exe collect -e event=0x76,interval=250000 -e event=0xc0,user=1,os=0,interval=250000 -o c:\Temp\cpuprof-tbp classic.exe

Launch classic.exe and collect samples for IBS OP with interval 50000:

> AMDuProfCLI.exe collect -e event=ibs-op,interval=50000 -o c:\Temp\cpuprof-tbp classic.exe

Collect L3 samples in SWP mode:

> AMDuProfCLI.exe collect -e event=timer,interval=1 -e event=0xb001,umask=0x80,slicemask=0xF,threadmask=0xFF -a -d 10 -o c:\Temp\cpuprof-l3

Collect DF samples in SWP mode: (using dummy event, umask, component values below)

> AMDuProfCLI.exe collect -e event=timer,interval=1 -e event=0xabcd,umask=0x00,component=comp0 -a -d 10 -o c:\Temp\cpuprof-df

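When the collect command is driven from a test script, building the argument list in code avoids the spacing and quoting mistakes that are easy to make in long command lines. The following Python sketch assembles such a list from the options documented above. The function collect_command and its parameter names are hypothetical, and only a subset of the options is covered.

```python
def collect_command(output, app=None, app_args=(), config=None, events=(),
                    pids=(), system_wide=False, duration=None, callstack=False,
                    exe="AMDuProfCLI.exe"):
    """Return the argument list for one AMDuProfCLI 'collect' invocation."""
    if app is None and not system_wide and not pids:
        raise ValueError("nothing to profile: give app, pids or system_wide=True")
    argv = [exe, "collect"]
    if config:
        argv += ["--config", config]
    for spec in events:             # one -e per event specification
        argv += ["-e", spec]
    if pids:
        argv += ["-p", ",".join(str(pid) for pid in pids)]
    if system_wide:
        argv.append("-a")
    if callstack:
        argv.append("-g")           # callstack sampling with the default values
    if duration is not None:
        argv += ["-d", str(duration)]
    argv += ["-o", output]
    if app is not None:
        argv += [app, *app_args]    # the application and its arguments go last
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True), which passes each element as a separate argument without going through a shell.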

Report Command

Generate report from the raw datafile:

> AMDuProfCLI.exe report -i c:\Temp\cpuprof-tbp.prd -o c:\Temp\cpuprof-tbp-out

Generate IMIX report from the raw datafile:

> AMDuProfCLI.exe report --imix -i c:\Temp\cpuprof-tbp.prd -o c:\Temp\cpuprof-tbp-out

Generate report with Symbol Server paths:

> AMDuProfCLI.exe report --symbol-path C:\AppSymbols;C:\DriverSymbols --symbol-server http://msdl.microsoft.com/download/symbols --symbol-cache-dir C:\symbols -i c:\Temp\cpuprof-tbp.prd -o c:\Temp\cpuprof-tbp-out

Others

Open this HTML help in a web browser:

> AMDuProfCLI.exe help

Print help in the console/terminal:

> AMDuProfCLI.exe --help

Print uProf version:

> AMDuProfCLI.exe --version

Print system details:

> AMDuProfCLI.exe info --system

Print CPU topology details:

> AMDuProfCLI.exe info --cpu-topology

To disassemble classic.exe into classic-disasm.txt file:

> AMDuProfCLI.exe info --disasm classic.exe > classic-disasm.txt

4.1.3 Java Application Profiling

AMDuProfCLI supports profiling Java applications running on a JVM. To support this, it uses the JVM Tool Interface (JVMTI).

AMDuProfCLI provides JVMTI Agent libraries: AMDJvmtiAgent.dll (on Windows) and libAMDJvmtiAgent.so (on Linux). This JvmtiAgent must be attached to the target JVM before profiling starts; only then can AMDuProfCLI collect the profile data. In this release, AMDuProfCLI cannot attach the JvmtiAgent dynamically to an already running JVM. Hence, for any JVM process profiled through the attach-process mechanism, AMDuProfCLI cannot capture class information unless the JvmtiAgent is already attached to the JVM. When the Java application is launched by AMDuProfCLI, no manual step is needed; AMDuProfCLI attaches the JvmtiAgent in the background.

If you need to attach to an already running JVM, attach the uProf JvmtiAgent manually when launching the Java application, using the -agentpath option. AMDuProfCLI can later attach to the JVM PID to collect profile data.

For a 64-bit JVM on Linux:

$ java -agentpath:<path-to-uProf/bin/ProfileAgents/x64/libAMDJvmtiAgent.so> <java-app-launch-options>


For a 64-bit JVM on Windows:

> java -agentpath:<C:\\Program Files\\AMD\\AMDuProf\\bin\\ProfileAgents\\x64\\AMDJvmtiAgent.dll> <java-app-launch-options>

Note the process id (PID) of the above JVM instance. Then launch AMDuProfCLI with the usual options and --pid <jvm-pid>.
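The two-step flow above (start the JVM with the agent, then attach by PID) can be scripted. The following Python sketch only builds the two argument lists; the helper names are hypothetical, the agent path is whatever matches the local uProf installation, and the Java launch options are supplied by the caller.

```python
def java_with_agent(agent_path, java_args, java="java"):
    """Return the argument list that starts a JVM with the JVMTI agent attached."""
    # -agentpath takes the absolute path of the agent library.
    return [java, f"-agentpath:{agent_path}", *java_args]

def attach_collect(jvm_pid, output, exe="AMDuProfCLI.exe"):
    """Return the 'collect' argument list that attaches to a running JVM."""
    return [exe, "collect", "--pid", str(jvm_pid), "-o", output]
```

A script could start the JVM with subprocess.Popen(java_with_agent(...)), read the pid attribute of the returned object, and pass it to attach_collect. Passing the arguments as a list also avoids quoting problems when the agent path contains spaces.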

4.2 Power Profiler

4.2.1 CLI Options

AMD uProf’s Power Profiler provides a command line interface utility AMDuProfCLI to collect the power, frequency and thermal characteristics data. This utility is targeted at users who prefer command interpreters such as cmd.exe on Windows and bash on Linux. It can also be used from a batch file or script.

Usage:

> AMDuProfCLI.exe [--version] [--help] <command> <options> [<application>] [<arguments>]

--version      : Displays the version of AMD uProf tool.
-h | --help    : Displays the quick help details.
<application>  : Path to the launch application to be profiled.
<arguments>    : Arguments (if any) to be passed to the launch application.

Following commands are supported:

collect     Run the given input application and collect power profile samples. Supported only on Windows.
timechart   Run power profiler and collect counters for supported combinations of device and category types.
report      Process the given power profile data file and generate a report in CSV format.
help        Opens this HTML help page in a web browser.

Following options are supported with collect command.

Option                        Description
-h | --help                   Displays this help information.
--config power                Predefined data-collection configuration to be used to collect power samples.
-p | --pid <PID,PID,..>       Profile existing processes (processes to attach to). Process IDs are separated by comma.
-a | --system-wide            System Wide Profile (SWP). If this flag is not set, the command line tool profiles only the launched application or the attached PIDs.
-c | --cpu <core-id,..>       Comma-separated list of CPUs to profile. Ranges of CPUs can also be specified with -, e.g. 0-3.
-d | --duration <n>           Profile duration n in seconds.
--affinity                    Core affinity. Comma-separated list of CPUs. Ranges of CPUs can also be specified, e.g. 0-3. Default affinity is all the available cores. In Per-Process profile, processor affinity is set for the launched application.


-b | --terminate              Terminate the launched application after profile data collection ends. Only the launched application process will be killed. Its children, if any, may continue to execute.
--start-delay <n>             Start Delay n in seconds. Start profiling after the specified duration. If n is 0, wait indefinitely, which is used for profile control. This option is expected to be used only when the launched application is instrumented to control the profile data collection using the enable or disable APIs defined in the AMDProfileControl library, which is provided along with the AMD uProf installer.
--start-paused                Profiling is paused indefinitely until the target application resumes it using the profile control APIs. Refer to the AMDProfileControl library, which is provided along with the AMD uProf installer.
-w | --working-dir <dir>      Specify the working directory. Default is the directory of the launch application.
-o | --output <file name>     Base name of the output file. If this option is skipped, the default path is used: %Temp%\AMD-PowerProfile-<timestamp>.* on Windows and /tmp/AMD-PowerProfile-<timestamp>.* on Linux.
-v | --verbose <n>            Specify debug log messaging level. Valid values are:
                                1 : INFO
                                2 : DEBUG
                                3 : EXTENSIVE
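The --cpu and --affinity options accept comma-separated ids with optional ranges such as 0-3. A script that needs to reason about which cores a given value covers can expand it as in the Python sketch below. The helper expand_core_ids is hypothetical, and rejecting descending ranges is an assumption of this sketch; the guide does not say how the CLI treats them.

```python
def expand_core_ids(spec):
    """Expand a string such as '0-3,6' into the list [0, 1, 2, 3, 6]."""
    cores = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                # Assumption of this sketch, not documented CLI behavior.
                raise ValueError(f"descending range: {part}")
            cores.extend(range(first, last + 1))  # ranges are inclusive
        else:
            cores.append(int(part))
    return cores
```

Checking the expanded ids against the output of info --cpu-topology before starting a long run is one way to catch a mistyped range early.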


Following options are supported with timechart command.

Option                        Description
--list                        Display all the supported devices and categories.
-e | --event <type,...>       Collect counters for the specified type or comma-separated list of types, where type can be a device or a category.
                              Supported device list:
                                socket : Collect profile data from socket.
                                die    : Collect profile data from die.
                                core   : Collect profile data from core.
                                thread : Collect profile data from thread.
                              Supported category list:
                                power           : Collect all available power counters.
                                frequency       : Collect all available frequency counters.
                                temperature     : Collect all available temperature counters.
                                voltage         : Collect all available voltage counters.
                                current         : Collect all available current counters.
                                dvfs            : Collect all available Dynamic Voltage and Frequency Scaling (DVFS) counters.
                                energy          : Collect all available energy counters.
                                correlatedpower : Collect all available correlated power counters.
                                cac             : Collect all available cac counters.
                                controllers     : Collect all available controllers counters.
                              Multiple occurrences of -e are allowed.
--histogram                   Collect histogram counters. Allowed only with an occurrence of -e frequency.
--cumulative                  Collect cumulative counters. Allowed only with an occurrence of -e power.
-t | --interval <n>           Sampling interval n in milliseconds. The minimum value is 10ms.
-d | --duration <n>           Profile duration n in seconds.
--affinity <core-id,..>       Core affinity. Comma-separated list of core-ids. Ranges of core-ids can also be specified, e.g. 0-3. Default affinity is all the available cores. Affinity is set for the launched application.
-w | --working-dir <dir>      Set the working directory for the launched target application.
-f | --format <fmt>           Output file format. Supported formats are:
                                txt : Text (.txt) format.
                                csv : Comma Separated Value (.csv) format.
                              Default file format is CSV.
-o | --output <filepath>      Output file path.
-h | --help                   Displays this help information.

Following options are supported with report command.

Option                        Description
-h | --help                   Displays this help information.
-i | --input <file name>      Input file name. Either the raw profile data file (.pdata on Windows) or the processed data file (.db) can be specified.
-o | --output <output dir>    Output directory in which the processed data files (.db, .csv) will be created. The default path will be %Temp%\<base-name-of-input-file>.* on Windows and /tmp/<base-name-of-input-file>.* on Linux.


--summary                     Report only the overview of the profile.
--group-by <section>          Specify the report to be generated. Supported report options are:
                                process : Report process details.
                                module  : Report module details.
                                thread  : Report thread details.
--src                         Generate detailed function report with source statements.
--disasm                      Generate detailed function report with assembly instructions.
--sort-by <event-index>       Specify the event index on which the reported profile data will be sorted.
--ignore-system-module        Ignore samples from System Modules.
--show-percentage             Show Percentage.
--src-path <path1;..>         Source file directories. (Semicolon separated paths.)
--symbol-path <path1;..>      Debug Symbol paths. (Semicolon separated paths.)
--symbol-server <path1;..>    Symbol Server directories. (Semicolon separated paths.)
--symbol-cache-dir <path>     Path to store the symbols downloaded from the Symbol Servers.
--cutoff <n>                  Cutoff to limit the number of processes, threads, modules and functions to be reported. n is the minimum number of entries to be reported in various report sections. Default value is 10.
-v | --verbose <n>            Specify debug log messaging level. Valid values are:
                                1 : INFO
                                2 : DEBUG
                                3 : EXTENSIVE

4.2.2 Examples

Collect Command

Launch classic.exe and collect ‘Power’ samples in SWP mode:

> AMDuProfCLI.exe collect --config power -a -o c:\Temp\pwrprof-swp classic.exe

Launch classic.exe and collect ‘Power’ samples for application:

> AMDuProfCLI.exe collect --config power -o c:\Temp\pwrprof classic.exe

Timechart Command

Collect all the power counter values for the duration of 10 seconds with sampling interval of 100 milliseconds:

> AMDuProfCLI.exe timechart --event power --interval 100 --duration 10

Collect all frequency counter values for 10 seconds, sampling them every 500 milliseconds and dumping the results to a csv file:

> AMDuProfCLI.exe timechart --event frequency -o %Temp%\PowerOutput --interval 500 --duration 10

Collect all frequency counter values at core 0 to 3 for 10 seconds, sampling them every 500 milliseconds and dumping the results to a text file:

> AMDuProfCLI.exe timechart --event core=0-3,frequency --output %Temp%\PowerOutput --interval 500 --duration 10 --format txt
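The timechart options carry a few constraints that are easy to miss in a script: the sampling interval has a 10ms minimum, --histogram needs the frequency category, and --cumulative needs the power category. The Python sketch below checks these before building the argument list. The function timechart_command and its parameter names are hypothetical, and requiring at least one event is an assumption of the sketch rather than a documented rule.

```python
def timechart_command(events, interval_ms=None, duration_s=None, output=None,
                      fmt=None, histogram=False, cumulative=False,
                      exe="AMDuProfCLI.exe"):
    """Return the argument list for one AMDuProfCLI 'timechart' invocation."""
    if not events:
        # Assumption of this sketch: a run without any event type is not useful.
        raise ValueError("at least one event type is required")
    # Each --event value may itself be a comma-separated list of types.
    types = {t for spec in events for t in spec.split(",")}
    if interval_ms is not None and interval_ms < 10:
        raise ValueError("the minimum sampling interval is 10 ms")
    if histogram and "frequency" not in types:
        raise ValueError("--histogram needs the frequency category")
    if cumulative and "power" not in types:
        raise ValueError("--cumulative needs the power category")
    if fmt not in (None, "txt", "csv"):
        raise ValueError("format must be txt or csv")
    argv = [exe, "timechart"]
    for spec in events:
        argv += ["--event", spec]
    if histogram:
        argv.append("--histogram")
    if cumulative:
        argv.append("--cumulative")
    if interval_ms is not None:
        argv += ["--interval", str(interval_ms)]
    if duration_s is not None:
        argv += ["--duration", str(duration_s)]
    if fmt is not None:
        argv += ["--format", fmt]
    if output is not None:
        argv += ["--output", output]
    return argv
```

Validating in the script gives an immediate error message instead of a failed or empty profile run.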


Report Command

Generate report from the raw datafile:

> AMDuProfCLI.exe report -i c:\Temp\pwrprof.pdata -o c:\Temp\pwrprof-out

Generate report with Symbol Server paths:

> AMDuProfCLI.exe report --symbol-path C:\AppSymbols;C:\DriverSymbols --symbol-server http://msdl.microsoft.com/download/symbols --symbol-cache-dir C:\symbols -i c:\Temp\pwrprof.pdata -o c:\Temp\pwrprof-out

4.2.3 Sample Report

The following command collects all the correlatedpower counter values for the duration of 1 second with a sampling interval of 100 milliseconds:

> AMDuProfCLI.exe timechart --event correlatedpower --interval 100 --duration 1 -o %Temp%\PowerOutput

It will generate a timechart report in the file: %Temp%\PowerOutput.csv


CHAPTER

FIVE

SDKS

AMD uProf provides the following development libraries for developers:

5.1 AMDPowerProfileAPI Library

5.1.1 Introduction

The AMDPowerProfileAPI library provides APIs to configure, collect and report the supported power profiling counters on various AMD platforms. The library is useful for analyzing the energy efficiency of systems based on AMD CPUs, APUs and dGPUs (Discrete GPU). These APIs provide an interface to read the power, thermal and frequency characteristics of APU/dGPU and their subcomponents. They are targeted at software developers who want to write their own application to sample the power counters based on their specific use case.

5.1.2 APIs

AMDTPwrProfileInitialize

This API loads and initializes the AMDT Power Profile drivers. This API should be the first one to be called.

AMDTResult AMDTPwrProfileInitialize(AMDTPwrProfileMode profileMode)

Parameters

• profileMode: Client should select any one of the predefined profile modes that are defined in AMDTPwrProfileMode.

Returns

• AMDT_STATUS_OK: Success

• AMDT_ERROR_INVALIDARG: An invalid profileMode parameter was passed

• AMDT_ERROR_DRIVER_UNAVAILABLE: Driver not available

• AMDT_DRIVER_VERSION_MISMATCH: Mismatch between the expected and installed driver versions

• AMDT_ERROR_PLATFORM_NOT_SUPPORTED: Platform not supported

• AMDT_WARN_SMU_DISABLED: SMU is disabled and hence power and thermal values provided by SMU will not be available

• AMDT_WARN_IGPU_DISABLED: Internal GPU is disabled

• AMDT_ERROR_FAIL: An internal error occurred

• AMDT_ERROR_PREVIOUS_SESSION_NOT_CLOSED: Previous session was not closed.


AMDTPwrGetSupportedCounters

This API provides the list of counters supported by the platform. The pointers returned will be valid until the client calls the AMDTPwrProfileClose() function.

AMDTResult AMDTPwrGetSupportedCounters(AMDTUInt32 *pNumCounters, AMDTPwrCounterDesc **ppCounterDescs)

Parameters

• pNumCounters: Number of counters supported by the device

• ppCounterDescs: Description of each counter supported by the device

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_INVALIDARG: NULL pointer was passed as ppCounterDescs or pNumCounters parameters

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() function was not called or did not succeed

• AMDT_ERROR_INVALID_DEVICEID: Invalid deviceId parameter was passed

• AMDT_ERROR_OUTOFMEMORY: Failed to allocate required memory

• AMDT_ERROR_FAIL: An internal error occurred

AMDTPwrGetCounterId

This API provides the counter id for a basic counter.

AMDTResult AMDTPwrGetCounterId(AMDTCounter counter, AMDTUInt32 *pCounterId)

Parameters

• counter: Supported counter whose counter id is requested

• pCounterId: Counter id of the given counter

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_INVALIDARG: NULL pointer was passed as pCounterId parameter

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() function was not called or did not succeed

• AMDT_ERROR_INVALID_COUNTERID: Invalid counterId parameter was passed

• AMDT_ERROR_FAIL: An internal error occurred

AMDTPwrGetCounterDesc

This API provides the description for the given counter index.

AMDTResult AMDTPwrGetCounterDesc(AMDTUInt32 counterId, AMDTPwrCounterDesc *pCounterDesc)

Parameters


• counterId: Counter index

• pCounterDesc: Description of the counter whose index is counterId

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_INVALIDARG: NULL pointer was passed as the pCounterDesc parameter

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_INVALID_COUNTERID: Invalid counterId parameter was passed

• AMDT_ERROR_FAIL: An internal error occurred

AMDTPwrEnableCounter

This API enables the counter to be sampled. It cannot be used once the profile has started.

AMDTResult AMDTPwrEnableCounter(AMDTUInt32 counterId)

Parameters

• counterId: Counter index

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_INVALID_COUNTERID: Invalid counterId parameter was passed

• AMDT_ERROR_COUNTER_ALREADY_ENABLED: Specified counter is already enabled

• AMDT_ERROR_PROFILE_ALREADY_STARTED: Counters cannot be enabled on the fly when the profile is already started

• AMDT_ERROR_PREVIOUS_SESSION_NOT_CLOSED: Previous session was not closed

• AMDT_ERROR_COUNTER_NOT_ACCESSIBLE: Counter is not accessible

• AMDT_ERROR_FAIL: An internal error occurred

AMDTPwrSetTimerSamplingPeriod

This API sets the driver to periodically sample the counter values and store them in a buffer. It cannot be called once the profile run has started. Calling this API is not required if AMDTPwrProfileInitialize() was called with AMDT_PWR_MODE_INSTANT_COUNTER as profileMode.

AMDTResult AMDTPwrSetTimerSamplingPeriod(AMDTUInt32 interval)

Parameters

• interval: Sampling period in milliseconds

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_INVALIDARG: Invalid interval value was passed

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_PROFILE_ALREADY_STARTED: Timer interval cannot be changed when the profile is already started

• AMDT_ERROR_PREVIOUS_SESSION_NOT_CLOSED: Previous session was not closed

• AMDT_ERROR_FAIL: An internal error occurred
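As a rough planning aid, the sampling period determines how many samples a run of a given length should produce. The sketch below is plain arithmetic and does not call the library. The function name ExpectedSampleCount is hypothetical, and the result is only an estimate that assumes the driver takes exactly one sample per interval.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the AMDPowerProfileAPI library).
// Estimates how many samples a run should yield when one sample is taken
// every intervalMs milliseconds for durationSec seconds.
// Returns 0 for an interval of 0, which is not a usable sampling period.
std::uint64_t ExpectedSampleCount(std::uint32_t intervalMs, std::uint32_t durationSec)
{
    if (intervalMs == 0)
    {
        return 0;
    }

    // Widen before multiplying so long durations do not overflow 32 bits.
    const std::uint64_t durationMs = static_cast<std::uint64_t>(durationSec) * 1000u;
    return durationMs / intervalMs;
}
```

For instance, a 100 ms period over a 10 second run gives an estimate of 100 samples, each carrying the values of all enabled counters.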

AMDTPwrStartProfiling

This API starts the profiling, and the driver collects the data at the regular interval specified by AMDTPwrSetTimerSamplingPeriod(). It has to be called after enabling the required counters by using AMDTPwrEnableCounter().

AMDTResult AMDTPwrStartProfiling()

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_TIMER_NOT_SET: Sampling timer was not set

• AMDT_ERROR_COUNTERS_NOT_ENABLED: No counter enabled for collecting profile data

• AMDT_ERROR_PROFILE_ALREADY_STARTED: Profile is already started

• AMDT_ERROR_PREVIOUS_SESSION_NOT_CLOSED: Previous session was not closed

• AMDT_ERROR_BIOS_VERSION_NOT_SUPPORTED: BIOS needs to be upgraded

• AMDT_ERROR_FAIL: An internal error occurred

• AMDT_ERROR_ACCESSDENIED: Profiler is busy, currently not accessible
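The return codes above imply a call order: initialize, enable at least one counter, set the timer, then start. The following sketch models only that ordering so an application can check its own bookkeeping before calling AMDTPwrStartProfiling(). It does not call the library. StartCheck, SessionState and CheckBeforeStart are hypothetical names, and the order in which the checks are applied is an assumption, not something this guide specifies.

```cpp
// Hypothetical model of the call-order rules listed above. It does not call
// the library; the precedence of the checks is an assumption.
enum class StartCheck
{
    Ok,                    // nothing in this model blocks the start
    DriverUninitialized,   // corresponds to AMDT_ERROR_DRIVER_UNINITIALIZED
    ProfileAlreadyStarted, // corresponds to AMDT_ERROR_PROFILE_ALREADY_STARTED
    CountersNotEnabled,    // corresponds to AMDT_ERROR_COUNTERS_NOT_ENABLED
    TimerNotSet            // corresponds to AMDT_ERROR_TIMER_NOT_SET
};

// What the application has done so far in this session.
struct SessionState
{
    bool initialized;         // AMDTPwrProfileInitialize() succeeded
    unsigned enabledCounters; // number of counters enabled so far
    bool timerSet;            // AMDTPwrSetTimerSamplingPeriod() succeeded
    bool started;             // AMDTPwrStartProfiling() already succeeded
};

// Returns the first unmet precondition, or StartCheck::Ok if none is found.
StartCheck CheckBeforeStart(const SessionState& state)
{
    if (!state.initialized)
    {
        return StartCheck::DriverUninitialized;
    }

    if (state.started)
    {
        return StartCheck::ProfileAlreadyStarted;
    }

    if (state.enabledCounters == 0)
    {
        return StartCheck::CountersNotEnabled;
    }

    if (!state.timerSet)
    {
        return StartCheck::TimerNotSet;
    }

    return StartCheck::Ok;
}
```

A check like this only catches ordering mistakes in the caller. Conditions such as an unsupported BIOS version or a busy profiler can only be reported by the library itself.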

AMDTPwrReadAllEnabledCounters

This API reads all the counters that are enabled. It can return an array of {CounterID, Float-Value}. If there are no new samples, this API returns AMDT_ERROR_PROFILE_DATA_NOT_AVAILABLE and pNumOfSamples points to a value of zero. If there are new samples, this API returns AMDT_STATUS_OK and pNumOfSamples points to a value greater than zero.

AMDTResult AMDTPwrReadAllEnabledCounters(AMDTUInt32 *pNumOfSamples, AMDTPwrSample **ppData)

Parameters

• pNumOfSamples: Number of samples, depending on the option set through AMDTPwrSetSampleValueOption()

• ppData: Processed profile data. The caller does not need to allocate or free this memory; the data is valid until the next call to this API

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_INVALIDARG: NULL pointer was passed as the pNumOfSamples or ppData parameter

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_PROFILE_NOT_STARTED: Profile is not started

• AMDT_ERROR_PROFILE_DATA_NOT_AVAILABLE: Profile data is not yet available

• AMDT_ERROR_OUTOFMEMORY: Memory not available

• AMDT_ERROR_SMU_ACCESS_FAILED: One of the configured SMU data is not accessible

• AMDT_ERROR_FAIL: An internal error occurred
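Because "no new samples yet" and a real failure are reported through the same return value, a polling loop usually needs to tell them apart. The sketch below shows one way to classify a read result. ReadOutcome and ClassifyRead are hypothetical names, and the two status codes are passed in as arguments so that the sketch does not guess the numeric values of AMDT_STATUS_OK and AMDT_ERROR_PROFILE_DATA_NOT_AVAILABLE.

```cpp
#include <cstdint>

// Hypothetical classification of one AMDTPwrReadAllEnabledCounters() call.
enum class ReadOutcome
{
    HaveSamples,  // success and at least one sample to process
    NoNewSamples, // nothing to process yet; the caller can retry later
    Failed        // any other status; the caller should stop polling
};

// result     : value returned by the read call
// numSamples : value written through pNumOfSamples
// okCode     : the SDK's AMDT_STATUS_OK (supplied by the caller)
// noDataCode : the SDK's AMDT_ERROR_PROFILE_DATA_NOT_AVAILABLE (supplied by the caller)
ReadOutcome ClassifyRead(std::uint32_t result, std::uint32_t numSamples,
                         std::uint32_t okCode, std::uint32_t noDataCode)
{
    if (result == okCode)
    {
        return (numSamples > 0) ? ReadOutcome::HaveSamples : ReadOutcome::NoNewSamples;
    }

    if (result == noDataCode)
    {
        return ReadOutcome::NoNewSamples;
    }

    return ReadOutcome::Failed;
}
```

A caller would typically sleep for one sampling interval and retry on NoNewSamples, and leave the loop on Failed rather than retrying indefinitely.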

AMDTPwrStopProfiling

This API stops the profiling run that was started by the AMDTPwrStartProfiling() function call.

AMDTResult AMDTPwrStopProfiling()

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

• AMDT_ERROR_PROFILE_NOT_STARTED: Profile is not started

• AMDT_ERROR_FAIL: An internal error occurred

AMDTPwrProfileClose

This API closes the power profiler, unregisters the driver, and cleans up all memory allocated during AMDTPwrProfileInitialize().

AMDTResult AMDTPwrProfileClose()

Returns

• AMDT_STATUS_OK: On Success

• AMDT_ERROR_FAIL: An internal error occurred

• AMDT_ERROR_DRIVER_UNINITIALIZED: AMDTPwrProfileInitialize() was either not called or did not succeed

5.1.3 Data Types

AMDTPwrProfileMode

enum AMDTPwrProfileMode

The following power profile modes are supported.

• AMDT_PWR_PROFILE_MODE_ONLINE : Counter values are collected at every specified sampling interval.

• AMDT_PWR_MODE_INSTANT_COUNTER : Counter values are collected instantly.

Note: AMDT_PWR_MODE_INSTANT_COUNTER mode is supported only for AMD Family 17h Model 10h based processors.

AMDTCounter

enum AMDTCounter

The following power profile counters are supported for AMDTPwrGetCounterId().

• AMD_PWR_SOCKET_POWER : Socket Power

• AMD_PWR_SOCKET_STAPM_LIMIT : Socket Stapm Limit

• AMD_PWR_SOCKET_PPT_FAST_LIMIT: Fast PPT Limit

• AMD_PWR_SOCKET_PPT_SLOW_LIMIT: Slow PPT Limit

AMDTPwrUnit

enum AMDTPwrUnit

The following unit types are used for the output values of the counters.

• AMDT_PWR_UNIT_TYPE_COUNT : Count index

• AMDT_PWR_UNIT_TYPE_PERCENT : Percentage

• AMDT_PWR_UNIT_TYPE_RATIO : Ratio

• AMDT_PWR_UNIT_TYPE_MILLI_SECOND : Time in milliseconds

• AMDT_PWR_UNIT_TYPE_JOULE : Energy consumption

• AMDT_PWR_UNIT_TYPE_WATT : Power consumption

• AMDT_PWR_UNIT_TYPE_VOLT : Voltage

• AMDT_PWR_UNIT_TYPE_MILLI_AMPERE : Current

• AMDT_PWR_UNIT_TYPE_MEGA_HERTZ : Frequency

• AMDT_PWR_UNIT_TYPE_CENTIGRADE : Temperature
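When printing counter values, the unit type can be turned into a short suffix. The sketch below uses a stand-in enumeration, sketch::Unit, that only mirrors the names in the list above so that it compiles without the SDK header; its numeric values are not taken from the SDK, and the suffix strings are a presentation choice, not something the library defines.

```cpp
#include <string>

namespace sketch
{
// Stand-in for AMDTPwrUnit, mirroring only the names listed above.
// The enumerator values here are NOT the SDK's values.
enum class Unit
{
    Count,
    Percent,
    Ratio,
    MilliSecond,
    Joule,
    Watt,
    Volt,
    MilliAmpere,
    MegaHertz,
    Centigrade
};

// Returns a short display suffix for a unit; empty for unitless values.
std::string UnitSuffix(Unit unit)
{
    switch (unit)
    {
        case Unit::Percent:     return "%";
        case Unit::MilliSecond: return "ms";
        case Unit::Joule:       return "J";
        case Unit::Watt:        return "W";
        case Unit::Volt:        return "V";
        case Unit::MilliAmpere: return "mA";
        case Unit::MegaHertz:   return "MHz";
        case Unit::Centigrade:  return "C";
        case Unit::Count:
        case Unit::Ratio:
            return "";
    }

    return "";
}
} // namespace sketch
```

With the real header, the same switch would be written over the AMDT_PWR_UNIT_TYPE_* enumerators and fed from the m_units member of a counter descriptor.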

AMDTResult

type AMDTResult

typedef unsigned int AMDTResult

AMDTPwrCounterDesc

type AMDTPwrCounterDesc

struct AMDTPwrCounterDesc encapsulates the details of a supported power counter and its associated device.

Data Members

• AMDTUInt32 m_counterID : Counter index

• AMDTUInt32 m_deviceId : Device Id

• AMDTDeviceType m_devType : Device type- Package/Die/Compute unit/Core/dGPU

• AMDTUInt32 m_devInstanceId : Device instance id within the device type

• char *m_description : Description of the counter

• char *m_name : Name of the counter

• AMDTPwrCategory m_category : Power/Frequency/Temperature

• AMDTPwrAggregation m_aggregation : Single/Histogram/Cumulative

• AMDTPwrUnit m_units : Seconds/MHz/Joules/Watts/Volt/Ampere

• AMDTUInt32 m_parentCounterId : If the counter has some child counters

AMDTPwrSample

type AMDTPwrSample

struct AMDTPwrSample encapsulates an output sample with a timestamp and the counter values for all the enabled counters.

Data Members

• AMDTPwrSystemTime m_systemTime : Start time of Profiling

• AMDTUInt64 m_elapsedTimeMs : Elapsed time in milliseconds - relative to the start time of the profile

• AMDTUInt64 m_recordId : Record id

• AMDTUInt32 m_numOfCounter : Number of counter values available

• AMDTPwrCounterValue *m_counterValues : List of counter values

AMDTPwrSystemTime

type AMDTPwrSystemTime

struct AMDTPwrSystemTime represents the system time in seconds and milliseconds

Data Members

• AMDTUInt64 m_second : Seconds

• AMDTUInt64 m_microSecond : Milliseconds

AMDTPwrCounterValue

type AMDTPwrCounterValue

struct AMDTPwrCounterValue represents a counter id and its value

Data Members

AMDTUInt32 m_counterID; // Counter index
AMDTUInt32 m_valueCnt; // Number of values for this counter
union
{
    AMDTFloat32 m_data; // Counter value
    AMDTFloat32 *m_pData; // Pointer to the multi value array
};
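To show how a sample and its counter values fit together, the sketch below formats one sample as a line of text by indexing into the counter value list. The types in namespace sketch are stand-ins that mirror only the members listed in this section so that the code compiles without the SDK header; they are not the real definitions. FormatSample is a hypothetical helper, and it assumes every counter is single-valued, so it reads m_data and ignores m_pData.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

namespace sketch
{
// Stand-in for AMDTPwrCounterValue (members as listed above).
struct CounterValue
{
    std::uint32_t m_counterID; // Counter index
    std::uint32_t m_valueCnt;  // Number of values for this counter
    union
    {
        float  m_data;  // Counter value
        float* m_pData; // Pointer to the multi value array
    };
};

// Stand-in for AMDTPwrSample, reduced to the members used here.
struct Sample
{
    std::uint64_t m_elapsedTimeMs; // Elapsed time since the profile started
    std::uint32_t m_numOfCounter;  // Number of counter values available
    CounterValue* m_counterValues; // List of counter values
};

// Formats one sample as "<elapsed> ms: <id>=<value> <id>=<value> ...".
// Assumption: each counter holds a single value, so m_data is read.
std::string FormatSample(const Sample& sample)
{
    std::ostringstream out;
    out << sample.m_elapsedTimeMs << " ms:";

    if (sample.m_counterValues != nullptr)
    {
        for (std::uint32_t i = 0; i < sample.m_numOfCounter; ++i)
        {
            // Index into the list instead of advancing the stored pointer,
            // so the sample itself is left unmodified.
            const CounterValue& value = sample.m_counterValues[i];
            out << ' ' << value.m_counterID << '=' << value.m_data;
        }
    }

    return out.str();
}
} // namespace sketch
```

With the real types, the counter name could be looked up through AMDTPwrGetCounterDesc() and printed in place of the numeric id.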

5.1.4 Examples

//=============================================================
// (c) 2017 Advanced Micro Devices, Inc.
//
/// \author AMDuProf Developer Tools
/// \brief Example program using the AMDTPowerProfile APIs.
//
//=============================================================

// - Start the profiling
// - Periodically read the counter values and report till the user has
//   requested to stop

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <time.h>
#include <string.h>
#ifdef __linux__
#include <unistd.h>
#endif
#if defined ( WIN32 )
#include <windows.h> // for Sleep()
#endif

#include <AMDTPowerProfileApi.h>

void CollectAllCounters()
{
    AMDTResult hResult = AMDT_STATUS_OK;

    // Initialize online mode
    hResult = AMDTPwrProfileInitialize(AMDT_PWR_MODE_TIMELINE_ONLINE);
    // --- Handle the error

    // Configure the profile run
    // 1. Get the supported counters
    // 2. Enable all the counter
    // 3. Set the timer configuration

    // 1. Get the supported counter details
    AMDTUInt32 nbrCounters = 0;
    AMDTPwrCounterDesc* pCounters = nullptr;

    hResult = AMDTPwrGetSupportedCounters(&nbrCounters, &pCounters);
    assert(AMDT_STATUS_OK == hResult);

    AMDTPwrCounterDesc* pCurrCounter = pCounters;

    for (AMDTUInt32 cnt = 0; cnt < nbrCounters; cnt++, pCurrCounter++)
    {
        if (nullptr != pCurrCounter)
        {
            // Enable all the counters
            hResult = AMDTPwrEnableCounter(pCurrCounter->m_counterID);
            assert(AMDT_STATUS_OK == hResult);
        }
    }

    // Set the timer configuration
    AMDTUInt32 samplingInterval = 100;  // in milliseconds
    AMDTUInt32 profilingDuration = 10;  // in seconds

    hResult = AMDTPwrSetTimerSamplingPeriod(samplingInterval);
    assert(AMDT_STATUS_OK == hResult);

    // Start the Profile Run
    hResult = AMDTPwrStartProfiling();
    assert(AMDT_STATUS_OK == hResult);

    // Collect and report the counter values periodically
    // 1. Take the snapshot of the counter values
    // 2. Read the counter values
    // 3. Report the counter values

    volatile bool isProfiling = true;
    bool stopProfiling = false;
    AMDTUInt32 nbrSamples = 0;

    while (isProfiling)
    {
        // sleep for refresh duration - at least equivalent to the
        // sampling interval specified
#if defined ( WIN32 )
        // Windows
        Sleep(samplingInterval);
#else
        // Linux
        usleep(samplingInterval * 1000);
#endif

        // read all the counter values
        AMDTPwrSample* pSampleData = nullptr;

        hResult = AMDTPwrReadAllEnabledCounters(&nbrSamples, &pSampleData);

        if (AMDT_STATUS_OK != hResult)
        {
            continue;
        }

        if (nullptr != pSampleData)
        {
            // iterate over all the samples and report the sampled counter values
            for (AMDTUInt32 idx = 0; idx < nbrSamples; idx++)
            {
                // Iterate over the sampled counter values and print
                for (unsigned int i = 0; i < pSampleData[idx].m_numOfCounter; i++)
                {
                    if (nullptr != pSampleData[idx].m_counterValues)
                    {
                        AMDTUInt32 id = pSampleData[idx].m_counterValues->m_counterID;
                        // Get the counter descriptor to print the counter name
                        AMDTPwrCounterDesc counterDesc;
                        AMDTPwrGetCounterDesc(id, &counterDesc);

                        fprintf(stdout, "%s : %f ", counterDesc.m_name,
                                pSampleData[idx].m_counterValues->m_data);

                        pSampleData[idx].m_counterValues++;
                    }
                } // iterate over the sampled counters

                fprintf(stdout, "\n");
            } // iterate over all the samples collected

            // check if we exceeded the profile duration
            if ((profilingDuration > 0)
                && (pSampleData->m_elapsedTimeMs >= (profilingDuration * 1000)))
            {
                stopProfiling = true;
            }

            if (stopProfiling)
            {
                // stop the profiling
                hResult = AMDTPwrStopProfiling();
                assert(AMDT_STATUS_OK == hResult);
                isProfiling = false;
            }
        }
    }

    // Close the profiler
    hResult = AMDTPwrProfileClose();
    assert(AMDT_STATUS_OK == hResult);
}

int main()
{
    AMDTResult hResult = AMDT_STATUS_OK;
    CollectAllCounters();
    return hResult;
}

5.1.5 How to use API Library

The example program uses AMDTPowerProfileAPI.dll to compile and run. Make sure the drivers are up and running before running it.

To build and run an example application on a Linux machine, perform the following steps.

1. Example CollectAllCounters is located at

<AMDuProf-install-dir>/Examples/CollectAllCounters

$ cd <AMDuProf-install-dir>/Examples/CollectAllCounters

2. Set LD_LIBRARY_PATH

$ export LD_LIBRARY_PATH=<AMDuProf-install-dir>/bin

3. Compile application code

$ g++ -O -std=c++11 CollectAllCounters.cpp \
    -I<AMDuProf-install-dir>/include \
    -l AMDPowerProfileAPI -L<AMDuProf-install-dir>/bin \
    -Wl,-rpath <AMDuProf-install-dir>/bin -o CollectAllCounters

4. Execute

$ ./CollectAllCounters

5.2 AMDProfileControl APIs

AMDProfileControl APIs allow the user to limit the profiling scope to a specific portion of the code within the target application. Usually, profiling captures samples for the complete application, i.e. from the start of execution until the end of the application execution. The control APIs can be used to enable the profiler only for a specific part of the application, e.g. a CPU intensive loop, a hot function, etc. The target application needs to be recompiled after adding the control APIs within the application.

5.2.1 Required File Paths

All the paths mentioned below are with respect to uProf installation directory (UPROF-INSTALL-DIR).

Header File:

<UPROF-INSTALL-DIR>\include\AMDProfileController.h (on Windows platforms)

<UPROF-INSTALL-DIR>/include/AMDProfileController.h (on Linux platforms)

Static Library Files on Windows platforms:

<UPROF-INSTALL-DIR>\lib\x64\AMDProfileController.lib

<UPROF-INSTALL-DIR>\lib\x86\AMDProfileController.lib

Static Library File on Linux platforms:

<UPROF-INSTALL-DIR>/lib/x64/libAMDProfileController.a

5.2.2 The control APIs

To resume CPU profiling, call the API below.

bool amdProfileResume(AMD_PROFILE_CPU);

To pause CPU profiling, call the API below.

bool amdProfilePause(AMD_PROFILE_CPU);

The CPU Profiler only profiles the code within each Resume-Pause API pair. After adding these APIs, compile your target application and profile only the desired part of the code.

5.2.3 Calling the APIs

Include the header file AMDProfileController.h, and call the resume and pause APIs within the code. The code enclosed within a resume-pause API pair will be profiled by the CPU Profiler. These APIs can be called multiple times to profile different parts of the code. The calls can be spread across multiple functions, i.e. resume called from one function and pause called from another function. They can also be spread across threads, i.e. resume called from one thread and pause called from another thread of the same target application.

In the below code snippet, the CPU Profiler is restricted to the execution of multiply_matrices() function.

#include <AMDProfileController.h>

int main(int argc, char* argv[])
{
    // Initialize the matrices
    initialize_matrices();

    // Resume the CPU profiler
    amdProfileResume(AMD_PROFILE_CPU);

    // Multiply the matrices
    multiply_matrices();

    // Pause the CPU Profiler
    amdProfilePause(AMD_PROFILE_CPU);

    return 0;
}
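When the profiled region has several exit paths, pairing resume and pause by hand is easy to get wrong. One option is a small scope guard that resumes on construction and pauses on destruction. The sketch below is generic and takes the two actions as callables, so it compiles without AMDProfileController.h; ScopedProfileRegion and MakeProfileRegion are hypothetical names and are not part of the control API.

```cpp
#include <utility>

// Hypothetical scope guard: calls resume() when constructed and pause()
// when destroyed, so the two calls always come in pairs.
template <typename ResumeFn, typename PauseFn>
class ScopedProfileRegion
{
public:
    ScopedProfileRegion(ResumeFn resume, PauseFn pause)
        : m_pause(std::move(pause)), m_active(true)
    {
        resume();
    }

    // Moving transfers the responsibility for calling pause().
    ScopedProfileRegion(ScopedProfileRegion&& other)
        : m_pause(std::move(other.m_pause)), m_active(other.m_active)
    {
        other.m_active = false;
    }

    ScopedProfileRegion(const ScopedProfileRegion&) = delete;
    ScopedProfileRegion& operator=(const ScopedProfileRegion&) = delete;

    ~ScopedProfileRegion()
    {
        if (m_active)
        {
            m_pause();
        }
    }

private:
    PauseFn m_pause;
    bool m_active;
};

// Helper so the callable types are deduced (C++11 has no class template
// argument deduction).
template <typename ResumeFn, typename PauseFn>
ScopedProfileRegion<ResumeFn, PauseFn> MakeProfileRegion(ResumeFn resume, PauseFn pause)
{
    return ScopedProfileRegion<ResumeFn, PauseFn>(std::move(resume), std::move(pause));
}
```

In an application that includes the real header, the two callables would be lambdas wrapping amdProfileResume(AMD_PROFILE_CPU) and amdProfilePause(AMD_PROFILE_CPU), and the guard would be declared at the top of the block to be profiled.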

5.2.4 Compiling the target application

To compile the application in Microsoft Visual Studio, update the configuration properties to include the paths of the AMDProfileController header file and LIB file.

To compile a C++ application on Linux using G++, use the following command:

$ g++ -std=c++11 <source-file.cpp> -I<UPROF-INSTALL-DIR>/include -L<UPROF-INSTALL-DIR>/lib/x64/ -lAMDProfileController -lrt -pthread

Note: Do not use -static option while compiling with G++.

5.2.5 Profiling with target application and control APIs

After compiling the target application, create a project in uProf using it and set the desired CPU profile session options. While setting the CPU profile session options, in the ‘Profile Scheduling’ section, select ‘Are you using Profile Instrumentation API?’.

Once all the settings are done, start the CPU profiling. The CPU Profiler starts in the paused state and the target application begins executing. When the resume API is called from the target application, the CPU Profiler starts profiling and continues until the pause API is called from the target application. As soon as the pause API is called, the CPU Profiler stops profiling and waits for the next control API call.

To profile from CLI, option --start-paused should be used to start the profiler in pause state.

Sample command on Windows platforms:

> AMDuProfCLI.exe collect --config tbp --start-paused -o C:\Temp\cpuprof-tbp <target-application.exe>

Sample command on Linux platforms:

$ ./AMDuProfCLI collect --config tbp --start-paused -o /tmp/cpuprof-tbp <target-application>

CHAPTER SIX: LIMITATIONS

6.1 CPU Profiler

• CPU Profiler expects that the executable binaries of the profiled application are not compressed or obfuscated by any software protector tool, e.g. VMProtect.

• In case of Zeppelin B1 parts, only one PMC register is used at a time for the PMC events based profiling.

6.2 Power Profiler

• Only one profile session can be run at a time.

• Make sure the latest Radeon driver is installed before running the power profiler. Newer dGPUs may go to a sleep (low power) state frequently if there is no activity on the dGPU. In that case, the power profiler may emit the warning AMDT_WARN_SMU_DISABLED, and counters may not be accessible in this state. Before running the power profiler, it is advisable to bring the dGPU to an active state.

• The ICELAND dGPU series (Topaz-XT, Topaz PRO, Topaz XTL, Topaz LE) is not supported.

• Application Analysis is supported only on Windows OS.

• The minimum supported sampling period in the CLI is 10 ms. In the GUI, the minimum supported sampling period is 100 ms for platforms other than EPYC and 500 ms for EPYC. A large sampling period is recommended to reduce the sampling and rendering overhead.

• If SMU is not accessible while profiling is in progress, behavior may be undefined.

• Live power profile analysis is memory intensive. Longer duration profiling may not work as intended.
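The sampling period minimums listed above can be captured in a small helper that raises a requested period to the supported floor. This is a sketch built only from the numbers in this list; Frontend, MinSamplingPeriodMs and ClampSamplingPeriodMs are hypothetical names, and detecting whether the platform is EPYC is left to the caller.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical selector for which front end will run the power profile.
enum class Frontend
{
    CLI,
    GUI
};

// Minimum sampling period in milliseconds, per the limitations above:
// 10 ms in the CLI; in the GUI, 500 ms on EPYC and 100 ms elsewhere.
std::uint32_t MinSamplingPeriodMs(Frontend frontend, bool isEpyc)
{
    if (frontend == Frontend::CLI)
    {
        return 10;
    }

    return isEpyc ? 500 : 100;
}

// Raises the requested period to the minimum if it is too small.
std::uint32_t ClampSamplingPeriodMs(std::uint32_t requestedMs, Frontend frontend, bool isEpyc)
{
    return std::max(requestedMs, MinSamplingPeriodMs(frontend, isEpyc));
}
```

Requests at or above the minimum pass through unchanged, which matches the recommendation to prefer larger periods.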
