perf scripts jiri olsa - events.static.linuxfound.org3 perf scripts | jiri olsa perf stat counting...
TRANSCRIPT
PERF SCRIPTS | JIRI OLSA1
perf scripts
jiri olsa
HI
● basics
● perf in python
● post process scripts
COUNTING
● perf stat
[diagram: WORKLOAD running on CPU 0, CPU 1 and CPU 2, counted from start to stop]
$ perf stat -e 'cycles,instructions' WORKLOAD
 Performance counter stats for 'find ..':
       104,142,555 cycles
        64,785,445 instructions
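As a rough illustration of what the counting output carries, the sketch below parses the two counter lines shown on this slide and derives instructions per cycle. The function name `parse_counts` and the regular expression are assumptions made for this example; real `perf stat` output has more columns and varies between versions.

```python
import re

# Counter lines as shown on the slide (illustrative text, not captured live).
STAT_OUTPUT = """\
 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions
"""

def parse_counts(text):
    """Return {event_name: count} for lines of the form '<count> <event>'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([\d,]+)\s+([\w:.-]+)", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts

counts = parse_counts(STAT_OUTPUT)
ipc = counts["instructions"] / counts["cycles"]
print(counts)
print("instructions per cycle: %.2f" % ipc)
```

The header line is skipped because it does not start with a number, so only the counter rows end up in the dictionary.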
SAMPLING
● perf record
[diagram: WORKLOAD running on CPU 0, CPU 1 and CPU 2, sampled from start to stop; each SAMPLE is written to perf.data]
$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB perf.data
sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT
PERF INTERFACE IN NUTSHELL
[diagram: user | kernel]
user:
$ perf stat -e 'cycles' WORKLOAD
 Performance counter stats for 'find ..':
       104,142,555 cycles
$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB
syscalls: SYS_PERF_EVENT_OPEN, SYS_READ, SYS_MMAP
kernel: EVENT (TASK 1, CPU 0, CPU 1, CGROUP)
sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT
SAMPLE, SAMPLE, SAMPLE, SAMPLE -> perf.data
PERF SCRIPTS
● 2 areas of script support
● use perf in python scripts
● post process perf data via python/perl
PYTHON SCRIPTS
● use perf in python scripts
● perf module
PYTHON SCRIPTS
[diagram: user | kernel; EVENT (TASK 1, CPU 0, CPU 1, CGROUP); SYS_PERF_EVENT_OPEN, SYS_READ, SYS_MMAP; sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT]
#!/usr/bin/python
import perf
def main():
    cpus = perf.cpu_map()
    threads = perf.thread_map()
    evsel = perf.evsel(task = 1, wakeup_eve.. sample_id_
    evsel.open(cpus = cpus, threads ..)
    while True:
        evlist.poll(timeout = -1)
        for cpu in cpus:
            event = evlist.read_on_cpu(cpu)
            if not event:
                continue
            print event
while True:
    print "every1 likes python anyway.."
perf.so: python extension
# yum install python-perf
PYTHON PERF MODULE
● only sampling interface atm
● quite simple one:
    cpu_map
    thread_map
    evsel (open)
    evlist (open, mmap, poll, add, read_on_cpu)
    (mmap|task|comm|lost|read|sample|throttle)_event
PYTHON PERF MODULE
import perf
def main():
    # CPU and thread maps built with default arguments.
    cpus = perf.cpu_map()
    threads = perf.thread_map()
    # One event selector: task and comm events, wake up on every event.
    evsel = perf.evsel(task = 1, comm = 1, mmap = 0,
                       wakeup_events = 1, watermark = 1,
                       sample_id_all = 1,
                       sample_type = perf.SAMPLE_PERIOD | perf.SAMPLE_TID | perf.SAMPLE_CPU)
    evsel.open(cpus = cpus, threads = threads)
    # Put the selector on an event list and map its buffers.
    evlist = perf.evlist(cpus, threads)
    evlist.add(evsel)
    evlist.mmap()
    while True:
        # Wait for data, then read at most one event per CPU.
        evlist.poll(timeout = -1)
        for cpu in cpus:
            event = evlist.read_on_cpu(cpu)
            if not event:
                continue
            print event
if __name__ == '__main__':
    main()
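The slide's code uses the Python 2 `print` statement. A minimal Python 3 sketch of the same poll and `read_on_cpu` loop follows. To keep it runnable without the `perf` extension, the loop takes the event list as a parameter and is exercised with a stand-in class. `FakeEvlist`, `drain_once` and the event strings are hypothetical names for this example; only `poll` and `read_on_cpu` come from the slide.

```python
class FakeEvlist:
    """Stand-in for perf.evlist: hands out queued events per CPU (hypothetical)."""

    def __init__(self, queued):
        self.queued = queued  # {cpu: [event, ...]}

    def poll(self, timeout=-1):
        # The real call waits for the kernel to signal data; here it is a no-op.
        return None

    def read_on_cpu(self, cpu):
        events = self.queued.get(cpu, [])
        return events.pop(0) if events else None


def drain_once(evlist, cpus, timeout=-1):
    """One pass of the slide's loop body: poll, then read one event per CPU."""
    evlist.poll(timeout=timeout)
    seen = []
    for cpu in cpus:
        event = evlist.read_on_cpu(cpu)
        if not event:
            continue
        seen.append((cpu, event))
    return seen


evlist = FakeEvlist({0: ["comm: ls"], 2: ["exit: ls"]})
result = drain_once(evlist, cpus=[0, 1, 2])
for cpu, event in result:
    print("cpu %d: %s" % (cpu, event))
```

With the real extension, `evlist` would be the `perf.evlist` object built as on the slide and `cpus` the `perf.cpu_map()` it was created with.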
PYTHON PERF MODULE
● needs some love
counting interface
stabilize
● volunteers welcome ;-)
$KERNEL/tools/perf/util/python.c
POST PROCESS SCRIPTING
● interface for processing perf data from:
    perf record
    perf stat
POST PROCESS SCRIPTING - SAMPLING
[diagram: perf.data (SAMPLE, SAMPLE, ...) -> perf report]
# Children      Self  Command  Shared Object     Symbol
# ........  ........  .......  ................  ...............................
#
    51.40%     0.00%  ls       [kernel.vmlinux]  [k] system_call
     9.71%     0.00%  ls       [kernel.vmlinux]  [k] __alloc_pages_nodemask
     9.71%     9.71%  ls       [kernel.vmlinux]  [k] clear_page
            |
            --- clear_page
                __alloc_pages_nodemask
                alloc_pages_vma
                handle_mm_fault
                __do_page_fault
                do_page_fault
                page_fault
                _int_malloc
     8.73%     8.30%  ls       [kernel.vmlinux]  [k] perf_event_context_sched_in
            |
            --- perf_event_context_sched_in
               |
               |--95.07%-- __perf_event_task_sched_in
               |          finish_task_switch
               |          __schedule
               |          _cond_resched
               |          sys_write
               |          system_call
               |          __GI___libc_write
               |          0x2d646c6975622d66
               |
                --4.93%-- perf_event_exec
                          setup_new_exec
POST PROCESS SCRIPTING - SAMPLING
● perf script wrapper
    perf script -s <script>
● python/perl support
POST PROCESS SCRIPTING - SAMPLING
[diagram: perf.data (SAMPLE, SAMPLE, ...) -> perf script]
#!/usr/bin/python
def process_event(d):
    for k,v in d.items():
        print "%s = %s" % (k, v)
def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)
def kmem__kmem_cache_alloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                           call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)
def kmem__kmalloc_node(event, ctxt, cpu, s, ns, tid, comm, callchain,
                       call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)
def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain,
                call_site, ptr):
    pass
def kmem__kmem_cache_free(event, ctxt, cpu, s, ns, tid, comm, callchain,
                          call_site, ptr):
    pass
def trace_begin():
    print "start"
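The handlers above call `insert_stat`, which the slide does not show. One plausible shape for it, written in Python 3, is a per-call-site accumulator. The dictionary layout, the `report` helper and the sample arguments are assumptions for illustration, not the script's actual code.

```python
from collections import defaultdict

# call_site -> running totals (hypothetical layout; the slide does not define it)
stats = defaultdict(lambda: {"hits": 0, "bytes_req": 0, "bytes_alloc": 0})

def insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu):
    """Accumulate one allocation; ptr and cpu are accepted but unused here."""
    entry = stats[call_site]
    entry["hits"] += 1
    entry["bytes_req"] += bytes_req
    entry["bytes_alloc"] += bytes_alloc

def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def report():
    """Return (call_site, wasted_bytes) pairs, largest waste first."""
    rows = [(site, e["bytes_alloc"] - e["bytes_req"]) for site, e in stats.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Feed two made-up events through the handler, as perf script would.
kmem__kmalloc("kmem__kmalloc", None, 0, 1, 0, 42, "ls", [], 0xffff1000, 0x1, 100, 128, 0)
kmem__kmalloc("kmem__kmalloc", None, 1, 1, 5, 42, "ls", [], 0xffff2000, 0x2, 8, 8, 0)
print(report())
```

Keying on `call_site` groups allocations by the kernel code that requested them, which is usually the first question when looking for over-allocation.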
POST PROCESS SCRIPTING - INTERFACE
● set of callbacks:
    trace_begin/trace_end
    process_event (non tracepoint)
    $(SUBSYSTEM)__$(EVENT) (tracepoint)
POST PROCESS SCRIPTING - INTERFACE
● process_event (args)
    args – dictionary with arguments
#!/usr/bin/python
def process_event(d):
    for k,v in d.items():
        print "%s = %s" % (k, v)
● $(SUBSYSTEM)__$(EVENT)(...)
    event, ctxt, cpu, s, ns, tid, comm, callchain
    + tracepoint specific arguments
def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    pass
def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain,
                call_site, ptr):
    pass
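The `$(SUBSYSTEM)__$(EVENT)` naming rule can be modelled in a few lines of Python 3. This only illustrates the convention and is not how perf itself looks callbacks up; `handler_name`, the `callbacks` table and the sample argument values are hypothetical.

```python
def handler_name(subsystem, event):
    """Build the callback name for a tracepoint, e.g. kmem + kfree."""
    return "%s__%s" % (subsystem, event)

def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain, call_site, ptr):
    # Common arguments first, then the tracepoint specific ones.
    return "kfree of 0x%x on cpu %d" % (ptr, cpu)

# Look the callback up by its derived name, as a script engine might.
callbacks = {"kmem__kfree": kmem__kfree}
name = handler_name("kmem", "kfree")
message = callbacks[name](name, None, 3, 0, 0, 42, "ls", [], 0xffff1000, 0xABC)
print(name, "->", message)
```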
POST PROCESS SCRIPTING
● perf record ...
● perf script -g lang
● perf script -s <script.py>
● perf script -l
● perf script record|report <script>
● man perf-script ;-)
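The commands on this slide chain into a record, generate, run workflow. The Python 3 sketch below only builds and prints the command lines and does not execute them. The `workflow` helper, the `ls -l` workload and the `perf-script.py` file name are assumptions for this example; the `kmem:kmalloc` event matches the handlers shown on the earlier slides.

```python
import shlex

def workflow(event, workload, script, lang="python"):
    """Return the command lines for record -> generate -> run, in order."""
    return [
        ["perf", "record", "-e", event, "--"] + shlex.split(workload),
        ["perf", "script", "-g", lang],    # generate a skeleton script from perf.data
        ["perf", "script", "-s", script],  # run the script over perf.data
    ]

steps = workflow("kmem:kmalloc", "ls -l", "perf-script.py")
for argv in steps:
    print("$ " + " ".join(shlex.quote(arg) for arg in argv))
```

Passing each list to `subprocess.run` would execute the steps on a machine where perf is installed and the caller is allowed to record the event.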
POST PROCESS SCRIPTING - COUNTING
● native perf language
● post process perf stat output
$ perf stat -e 'cycles,instructions' WORKLOAD
 Performance counter stats for 'find ..':
       104,142,555 cycles
        64,785,445 instructions
        1.34357914 cpi    cycles per instructions ratio
POST PROCESS SCRIPTING - COUNTING
cpi {
    events {
        CY = cycles:u
        IN = instructions:u
    }
    cpi = CY / IN
    print cpi
}
formula name: cpi
events: CY, IN
calculation formula: cpi = CY / IN
variable to print: cpi
$ perf stat -f formula.conf -e formula-cpi -a
^C
 Performance counter stats for 'system wide':
       739,225,320 cycles:u                  [100.00%]
       620,227,854 instructions:u            #    0.84  insns per cycle
       1.674587325 seconds time elapsed
        1.19186089 cpi
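The printed `cpi` value can be checked by hand from the two counts in the output. A small Python 3 sketch of that arithmetic follows; the `events` dictionary and the `cpi` function are names chosen for this example and are not part of the formula support itself.

```python
# Counts taken from the perf stat output on the slide.
events = {"CY": 739225320, "IN": 620227854}

def cpi(events):
    """Cycles per instruction, i.e. the formula 'cpi = CY / IN'."""
    return events["CY"] / events["IN"]

value = cpi(events)
print("%.8f cpi" % value)
print("%.2f insns per cycle" % (1 / value))
```

The second line is the inverse ratio, which is what perf stat already prints as a comment next to the instructions count.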
POST PROCESS SCRIPTING - COUNTING
● branch example
● branch-rate/miss/ratio
branch {
    events {
        IN = instructions:u
        BI = branch-instructions:u
        BM = branch-misses:u
    }
    branch-rate = BI / IN
    branch-miss-rate = BM / IN
    branch-miss-ratio = BM / BI
    print branch-rate
    print branch-miss-rate
    print branch-miss-ratio
}
$ perf stat -f formula.conf -e formula-branch du -sh /
^Cdu: Interrupt
 Performance counter stats for 'du -sh /':
        39,285,799 instructions:u
         8,865,310 branch-instructions:u
           273,038 branch-misses:u           #    3.08% of all branches
       0.923258595 seconds time elapsed
        0.22566195 branch-rate
        0.00695004 branch-miss-rate
        0.03079847 branch-miss-ratio
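The three branch ratios follow directly from the counted events. The Python 3 sketch below evaluates them from the counts shown in the output; the `formulas` table and the `evaluate` helper are an illustrative model of the config block, not the parser used by the formula branch.

```python
# Counts taken from the perf stat output on the slide.
counts = {"IN": 39285799, "BI": 8865310, "BM": 273038}

# name -> (numerator, denominator), mirroring the three formulas in the config.
formulas = {
    "branch-rate": ("BI", "IN"),
    "branch-miss-rate": ("BM", "IN"),
    "branch-miss-ratio": ("BM", "BI"),
}

def evaluate(formulas, counts):
    """Evaluate each ratio against the counted events."""
    return {name: counts[num] / counts[den] for name, (num, den) in formulas.items()}

metrics = evaluate(formulas, counts)
for name, value in metrics.items():
    print("%.8f %s" % (value, name))
```

The last ratio, misses over branch instructions, is the same quantity perf stat reports as the percentage of all branches.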
POST PROCESS SCRIPTING - COUNTING
● needs more testing/users
● not upstream yet
    https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/ perf/formula
POST PROCESS SCRIPTING - COUNTING
● store stat data into perf.data
$ perf stat -e 'cycles,instructions' record WORKLOAD
 Performance counter stats for 'find ..':
       104,142,555 cycles
        64,785,445 instructions
[diagram: perf.data holding STAT EVENT, STAT EVENT]
$ perf stat report
 Performance counter stats for 'find ..':
       104,142,555 cycles
        64,785,445 instructions
POST PROCESS SCRIPTING - COUNTING
[diagram: perf.data (SAMPLE, SAMPLE, ...) -> perf script]
def trace_begin():
    print "in trace_begin"
def trace_end():
    print "in trace_end"
def stat__cycles(cpu, sec, nsec, val, ena, run):
    print "%6d.%09d CPU%d %d cycles" % (sec, nsec, cpu, val)
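A Python 3 variant of the `stat__cycles` callback is sketched below. It keeps the slide's argument list and format string, and additionally scales the count, assuming `ena` and `run` are the time-enabled and time-running values that perf reports alongside a counter. The `scaled` helper, the `lines` list and the sample values are hypothetical.

```python
def scaled(val, ena, run):
    """Scale a count for a counter that ran only part of the enabled time."""
    if run == 0 or ena == run:
        return val
    return int(val * ena / run)

lines = []

def stat__cycles(cpu, sec, nsec, val, ena, run):
    # Same layout as the slide: seconds.nanoseconds, CPU number, count.
    lines.append("%6d.%09d CPU%d %d cycles" % (sec, nsec, cpu, scaled(val, ena, run)))

# Two made-up stat events: one counted the whole time, one half of the time.
stat__cycles(0, 1, 5, 100, 10, 10)
stat__cycles(1, 1, 5, 100, 20, 10)
for line in lines:
    print(line)
```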
POST PROCESS SCRIPTING - COUNTING
● not upstream, work in progress