perf scripts jiri olsa - events.static.linuxfound.org3 perf scripts | jiri olsa perf stat counting...

PERF SCRIPTS | JIRI OLSA

perf scripts

jiri olsa


HI

● basics

● perf in python

● post process scripts


COUNTING

● perf stat

[diagram: WORKLOAD running on CPU 0, CPU 1 and CPU 2; events are counted between start and stop]

$ perf stat -e 'cycles,instructions' WORKLOAD

 Performance counter stats for 'find ..':

   104,142,555 cycles
    64,785,445 instructions
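The two counters are enough for a derived metric. A minimal sketch, assuming perf stat's default human-readable output with comma-grouped numbers as shown above (the grouping depends on the locale); parse_perf_stat is a hypothetical helper, and for real scripting the field-separated output of perf stat -x, is the sturdier input:

```python
import re

# A counter line: a (possibly comma-grouped) integer followed by the event name.
_COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)")


def parse_perf_stat(text):
    """Map event name -> count for each counter line of perf stat's text output."""
    counts = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


OUTPUT = """
 Performance counter stats for 'find ..':

   104,142,555 cycles
    64,785,445 instructions
"""

counts = parse_perf_stat(OUTPUT)
ipc = float(counts["instructions"]) / counts["cycles"]  # instructions per cycle
print(counts)
print("%.2f insns per cycle" % ipc)  # prints "0.62 insns per cycle"
```

Lines such as "1.674587325 seconds time elapsed" do not match the pattern, because the number is not followed by whitespace.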


SAMPLING

● perf record

[diagram: WORKLOAD running on CPU 0, CPU 1 and CPU 2; samples are taken between start and stop and written to perf.data]

$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB perf.data

● a SAMPLE can hold: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT data


PERF INTERFACE IN NUTSHELL

$ perf stat -e 'cycles' WORKLOAD

   104,142,555 cycles

[diagram: the perf tool in user space, the perf EVENT in the kernel]

● SYS_PERF_EVENT_OPEN creates the EVENT in the kernel

    the event is attached to a task (TASK 1), to CPUs (CPU 0, CPU 1) or to a CGROUP

● SYS_READ reads the counter back: 104,142,555 cycles

$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB perf.data

● SYS_MMAP maps the event's ring buffer into user space

    perf record reads each SAMPLE from it and stores the samples in perf.data
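To make the SYS_PERF_EVENT_OPEN and SYS_READ steps concrete, here is a minimal sketch that counts user-space cycles for a python callable through ctypes, without the perf tool. Assumptions: a Linux x86_64 or aarch64 host (the syscall numbers 298 and 241 are hard-coded), the 64-byte first version of struct perf_event_attr, and a perf_event_paranoid setting that lets an unprivileged user count their own process. count_cycles is a hypothetical helper that returns None when the event cannot be opened, for example in containers that filter perf_event_open or on machines without a hardware PMU.

```python
import ctypes
import os
import platform
import struct
import sys

# Values from <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0
PERF_COUNT_HW_CPU_CYCLES = 0
PERF_ATTR_SIZE_VER0 = 64       # size of the first published perf_event_attr
ATTR_EXCLUDE_KERNEL = 1 << 5   # bit positions inside the attr flag bitfield
ATTR_EXCLUDE_HV = 1 << 6

# __NR_perf_event_open is per-architecture; only these two are handled here.
NR_PERF_EVENT_OPEN = {"x86_64": 298, "aarch64": 241}


class PerfEventAttr(ctypes.Structure):
    """The first 64 bytes of struct perf_event_attr (PERF_ATTR_SIZE_VER0)."""
    _fields_ = [
        ("type", ctypes.c_uint32),
        ("size", ctypes.c_uint32),
        ("config", ctypes.c_uint64),
        ("sample_period", ctypes.c_uint64),  # union with sample_freq
        ("sample_type", ctypes.c_uint64),
        ("read_format", ctypes.c_uint64),
        ("flags", ctypes.c_uint64),          # disabled, inherit, ... bitfield
        ("wakeup_events", ctypes.c_uint32),  # union with wakeup_watermark
        ("bp_type", ctypes.c_uint32),
        ("config1", ctypes.c_uint64),        # union with bp_addr
    ]


def count_cycles(workload):
    """Run workload() and return the user-space cycles counted for the calling
    thread, or None if the counter cannot be opened on this machine."""
    nr = NR_PERF_EVENT_OPEN.get(platform.machine())
    if not sys.platform.startswith("linux") or nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long

    attr = PerfEventAttr()
    attr.type = PERF_TYPE_HARDWARE
    attr.size = PERF_ATTR_SIZE_VER0
    attr.config = PERF_COUNT_HW_CPU_CYCLES
    attr.flags = ATTR_EXCLUDE_KERNEL | ATTR_EXCLUDE_HV  # enabled at open, user only

    # SYS_PERF_EVENT_OPEN: pid=0 (calling thread), cpu=-1 (any CPU),
    # group_fd=-1 (no group), flags=0
    fd = libc.syscall(ctypes.c_long(nr), ctypes.byref(attr), ctypes.c_long(0),
                      ctypes.c_long(-1), ctypes.c_long(-1), ctypes.c_ulong(0))
    if fd < 0:
        return None  # e.g. EACCES (paranoid setting), ENOENT (no PMU), EPERM
    try:
        workload()
        # SYS_READ: with read_format == 0 the kernel returns one u64 count
        return struct.unpack("=Q", os.read(fd, 8))[0]
    finally:
        os.close(fd)


print(count_cycles(lambda: sum(range(10 ** 6))))
```

Opening the event enabled keeps the sketch short; perf itself opens events disabled and enables them around the workload, which a fuller version would do with the PERF_EVENT_IOC_ENABLE/DISABLE ioctls.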


● 2 areas of script support

● use perf in python scripts

● post process perf data via python/perl

PERF SCRIPTS


● use perf in python scripts

● perf module

PYTHON SCRIPTS


PYTHON SCRIPTS

[diagram: the same kernel interface (EVENT on TASK 1, CPU 0, CPU 1 or CGROUP; SYS_PERF_EVENT_OPEN, SYS_READ, SYS_MMAP; samples) driven directly by a python script instead of perf writing perf.data]

● perf.so python extension

# yum install python-perf

PYTHON PERF MODULE

#!/usr/bin/python
import perf

def main():
    cpus = perf.cpu_map()
    threads = perf.thread_map()

    # describe the event: which records we want and which fields each sample carries
    evsel = perf.evsel(task = 1, comm = 1, mmap = 0,
                       wakeup_events = 1, watermark = 1,
                       sample_id_all = 1,
                       sample_type = perf.SAMPLE_PERIOD | perf.SAMPLE_TID | perf.SAMPLE_CPU)
    # SYS_PERF_EVENT_OPEN for every cpu/thread
    evsel.open(cpus = cpus, threads = threads)

    # SYS_MMAP the ring buffers
    evlist = perf.evlist(cpus, threads)
    evlist.add(evsel)
    evlist.mmap()

    # wait for data, then print every event read from each cpu's buffer
    while True:
        evlist.poll(timeout = -1)
        for cpu in cpus:
            event = evlist.read_on_cpu(cpu)
            if not event:
                continue
            print(event)

if __name__ == '__main__':
    main()
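The sample_type argument is a bitmask of PERF_SAMPLE_* flags, and it also fixes the layout of every sample the kernel writes into the ring buffer. A minimal sketch, assuming the module's perf.SAMPLE_* constants carry the kernel's PERF_SAMPLE_* values, native byte order, and no PERF_SAMPLE_IDENTIFIER bit (which would move the id to the front); decode_sample_type and parse_sample are hypothetical helpers that decode only the fixed-size leading fields of a sample:

```python
import struct

# PERF_SAMPLE_* bits from <linux/perf_event.h>
SAMPLE_BITS = [
    ("IP", 1 << 0), ("TID", 1 << 1), ("TIME", 1 << 2), ("ADDR", 1 << 3),
    ("READ", 1 << 4), ("CALLCHAIN", 1 << 5), ("ID", 1 << 6), ("CPU", 1 << 7),
    ("PERIOD", 1 << 8), ("STREAM_ID", 1 << 9), ("RAW", 1 << 10),
    ("BRANCH_STACK", 1 << 11),
]
BIT = dict(SAMPLE_BITS)


def decode_sample_type(mask):
    """Names of the PERF_SAMPLE_* bits set in mask, in bit order."""
    return [name for name, bit in SAMPLE_BITS if mask & bit]


# Fixed-size fields in the order the kernel writes them into a sample record.
# READ, CALLCHAIN, RAW, ... are variable-sized and follow PERIOD.
FIXED_LAYOUT = [
    ("IP", "=Q", ("ip",)),
    ("TID", "=II", ("pid", "tid")),
    ("TIME", "=Q", ("time",)),
    ("ADDR", "=Q", ("addr",)),
    ("ID", "=Q", ("id",)),
    ("STREAM_ID", "=Q", ("stream_id",)),
    ("CPU", "=II", ("cpu", "res")),
    ("PERIOD", "=Q", ("period",)),
]


def parse_sample(mask, body):
    """Decode the fixed-size leading fields of a sample record body."""
    fields, offset = {}, 0
    for name, fmt, keys in FIXED_LAYOUT:
        if mask & BIT[name]:
            values = struct.unpack_from(fmt, body, offset)
            fields.update(zip(keys, values))
            offset += struct.calcsize(fmt)
    return fields


mask = BIT["PERIOD"] | BIT["TID"] | BIT["CPU"]   # same bits as the evsel above
print(mask, decode_sample_type(mask))
body = struct.pack("=IIIIQ", 1234, 1234, 1, 0, 4000)  # made-up sample body
print(parse_sample(mask, body))
```

For the evsel above the mask is 386 (TID | CPU | PERIOD), so each sample starts with pid, tid, cpu, a reserved word and the period.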



PYTHON PERF MODULE

● needs some love

    counting interface

    stabilize

● volunteers welcome ;-)

    $KERNEL/tools/perf/util/python.c


● interface for processing perf data from:

    perf record

    perf stat

POST PROCESS SCRIPTING


POST PROCESS SCRIPTING - SAMPLING

# Children      Self  Command  Shared Object     Symbol
# ........  ........  .......  ................  ...............................
#
    51.40%     0.00%  ls       [kernel.vmlinux]  [k] system_call
     9.71%     0.00%  ls       [kernel.vmlinux]  [k] __alloc_pages_nodemask
     9.71%     9.71%  ls       [kernel.vmlinux]  [k] clear_page
                  |
                  --- clear_page
                      __alloc_pages_nodemask
                      alloc_pages_vma
                      handle_mm_fault
                      __do_page_fault
                      do_page_fault
                      page_fault
                      _int_malloc
     8.73%     8.30%  ls       [kernel.vmlinux]  [k] perf_event_context_sched_in
                  |
                  --- perf_event_context_sched_in
                     |
                     |--95.07%-- __perf_event_task_sched_in
                     |          finish_task_switch
                     |          __schedule
                     |          _cond_resched
                     |          sys_write
                     |          system_call
                     |          __GI___libc_write
                     |          0x2d646c6975622d66
                     |
                      --4.93%-- perf_event_exec
                                setup_new_exec

[diagram: perf report reads the SAMPLEs stored in perf.data]


● perf script wrapper: perf script -s <script>

● python/perl support




#!/usr/bin/python

# call site -> (allocations, bytes requested, bytes allocated)
stats = {}

def insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu):
    # minimal stand-in for the helper the handlers call (hypothetical body)
    count, req, alloc = stats.get(call_site, (0, 0, 0))
    stats[call_site] = (count + 1, req + bytes_req, alloc + bytes_alloc)

def process_event(d):
    # non-tracepoint events: print every field of the event dictionary
    for k, v in d.items():
        print("%s = %s" % (k, v))

def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kmem_cache_alloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                           call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kmalloc_node(event, ctxt, cpu, s, ns, tid, comm, callchain,
                       call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain, call_site, ptr):
    pass

def kmem__kmem_cache_free(event, ctxt, cpu, s, ns, tid, comm, callchain,
                          call_site, ptr):
    pass

def trace_begin():
    print("start")

[diagram: perf script reads the SAMPLEs from perf.data and hands each one to the script's handlers]


● set of callbacks:

    trace_begin/trace_end

    process_event (non-tracepoint events)

    $(SUBSYSTEM)__$(EVENT) (tracepoint events)

POST PROCESS SCRIPTING - INTERFACE


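● process_event(args)

    args: dictionary with the event's fields

● $(SUBSYSTEM)__$(EVENT)(...)

    common arguments: event, ctxt, cpu, s, ns, tid, comm, callchain
    (event name, context, cpu, timestamp seconds and nanoseconds, thread id, command, callchain)

    + tracepoint-specific arguments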

POST PROCESS SCRIPTING - INTERFACE

def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain, call_site, ptr, bytes_req, bytes_alloc, gfp_flags): pass
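For non-tracepoint events everything arrives in the one dictionary. A minimal sketch of a script that counts samples per command, assuming the dictionary carries a 'comm' key as perf's python engine provides; samples_per_comm is a hypothetical name:

```python
from collections import Counter

samples_per_comm = Counter()  # command name -> number of samples


def trace_begin():
    samples_per_comm.clear()


def process_event(param_dict):
    # called once per sample; fall back to '?' if no command name is attached
    samples_per_comm[param_dict.get("comm", "?")] += 1


def trace_end():
    for comm, count in samples_per_comm.most_common():
        print("%-16s %8d" % (comm, count))
```

Saved as, say, count_comm.py (a hypothetical file name), it would run as perf script -s count_comm.py over an existing perf.data.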





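● perf record ...

● perf script -g lang: generate a script skeleton for the recorded events

● perf script -s <script.py>: run a script over perf.data

● perf script -l: list the available scripts

● perf script record|report <script>: run the record or the report part of a script

● man perf-script ;-)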

POST PROCESS SCRIPTING


● native perf language

● post process perf stat output

POST PROCESS SCRIPTING - COUNTING








1.34357914 cpi    (cycles per instructions ratio)

cpi {
    events {
        CY = cycles:u
        IN = instructions:u
    }

    cpi = CY / IN

    print cpi
}

    cpi                formula name
    events { ... }     events
    cpi = CY / IN      calculation formula
    print cpi          variable to print

$ perf stat -f formula.conf -e formulacpi -a
^C
 Performance counter stats for 'system wide':

   739,225,320 cycles:u                 [100.00%]
   620,227,854 instructions:u           #    0.84  insns per cycle

   1.674587325 seconds time elapsed

    1.19186089 cpi



branch {
    events {
        IN = instructions:u
        BI = branch-instructions:u
        BM = branch-misses:u
    }

    branchrate      = BI / IN
    branchmissrate  = BM / IN
    branchmissratio = BM / BI

    print branchrate
    print branchmissrate
    print branchmissratio
}

$ perf stat -f formula.conf -e formulabranch du -sh /
^Cdu: Interrupt

 Performance counter stats for 'du -sh /':

    39,285,799 instructions:u
     8,865,310 branch-instructions:u
       273,038 branch-misses:u          #    3.08% of all branches

    0.22566195 branchrate
    0.00695004 branchmissrate
    0.03079847 branchmissratio

● branch example

● branch-rate/miss/ratio



● needs more testing/users

● not upstream yet: https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/ perf/formula
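The formula language lives only in the perf/formula branch, but its semantics are simple enough to emulate. The sketch below is a hypothetical re-implementation in python, not the perf code: it parses one block in the cpi/branch syntax, binds each event variable to a counter value, evaluates the assignments in order and returns the printed variables. It assumes one statement per line and only + - * / arithmetic.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node, env):
    """Evaluate +, -, *, / over numbers and already-defined variables."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")


def parse_formula(text):
    """Split 'name { events { VAR = event ... } statements ... }' into parts."""
    m = re.match(r"\s*(\w+)\s*\{\s*events\s*\{(.*?)\}(.*)\}\s*$", text, re.S)
    if m is None:
        raise ValueError("not a formula block")
    name, events_txt, body = m.groups()
    events = dict(re.findall(r"(\w+)\s*=\s*(\S+)", events_txt))
    assigns, prints = [], []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("print "):
            prints.append(line.split()[1])
        elif "=" in line:
            var, expr = line.split("=", 1)
            assigns.append((var.strip(), ast.parse(expr.strip(), mode="eval").body))
    return name, events, assigns, prints


def evaluate(text, counts):
    """Return [(variable, value)] for every 'print' statement of the formula."""
    _, events, assigns, prints = parse_formula(text)
    env = {var: counts[event] for var, event in events.items()}
    for var, tree in assigns:
        env[var] = _eval(tree, env)
    return [(var, env[var]) for var in prints]


CPI = """
cpi {
    events {
        CY = cycles:u
        IN = instructions:u
    }
    cpi = CY / IN
    print cpi
}
"""

for var, value in evaluate(CPI, {"cycles:u": 739225320, "instructions:u": 620227854}):
    print("%.8f %s" % (value, var))  # prints "1.19186089 cpi"
```

Fed the counter values from the cpi and branch runs above, the emulation reproduces the printed cpi, branchrate, branchmissrate and branchmissratio values.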



● store stat data into perf.data

$ perf stat -e 'cycles,instructions' record WORKLOAD

[diagram: the counter values are stored as STAT EVENTs in perf.data]

$ perf stat report





def trace_begin():
    print("in trace_begin")

def trace_end():
    print("in trace_end")

def stat__cycles(cpu, sec, nsec, val, ena, run):
    print("%6d.%09d CPU%d %d cycles" % (sec, nsec, cpu, val))
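The ena and run arguments of stat__cycles allow correcting counts when the kernel multiplexes counters. A minimal sketch, assuming (the slides do not say) that ena and run are the counter's time-enabled and time-running values and that each stat__cycles call carries the current value for one CPU. The val * ena / run estimate is the scaling perf stat applies to multiplexed events (the [100.00%] column above shows the running/enabled ratio); cycles_per_cpu and scaled_count are hypothetical names.

```python
# Hypothetical per-CPU totals: the latest scaled reading for each CPU.
cycles_per_cpu = {}


def scaled_count(val, ena, run):
    """Estimate the full count of a multiplexed counter as val * ena / run."""
    if run == 0:
        return 0  # the counter never ran, nothing to scale
    return int(round(float(val) * ena / run))


def trace_begin():
    cycles_per_cpu.clear()


def stat__cycles(cpu, sec, nsec, val, ena, run):
    cycles_per_cpu[cpu] = scaled_count(val, ena, run)


def trace_end():
    for cpu in sorted(cycles_per_cpu):
        print("CPU%d %d cycles" % (cpu, cycles_per_cpu[cpu]))
    print("total %d cycles" % sum(cycles_per_cpu.values()))
```

Keeping the latest reading per CPU, rather than adding readings up, avoids double counting if the values are cumulative.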

[diagram: perf script reads perf.data and calls the script's callbacks]



● not upstream, work in progress


THANKS, QUESTIONS?

Jiri Olsa

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Developer_Guide/index.html#perf
