
Post on 02-Oct-2020


PERF SCRIPTS | JIRI OLSA1

perf scripts

jiri olsa


HI

● basics

● perf in python

● post process scripts

COUNTING

● perf stat

[diagram: WORKLOAD scheduled across CPU 0, CPU 1 and CPU 2; counters run from start to stop]

$ perf stat -e 'cycles,instructions' WORKLOAD

 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions

SAMPLING

● perf record

[diagram: WORKLOAD scheduled across CPU 0, CPU 1 and CPU 2; between start and stop each SAMPLE is written to perf.data]

$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB perf.data

sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT

PERF INTERFACE IN NUTSHELL

[diagram: user space on one side, kernel on the other]

user:

$ perf stat -e 'cycles' WORKLOAD

 Performance counter stats for 'find ..':

       104,142,555      cycles

$ perf record -e 'cycles' WORKLOAD
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.048 MB

kernel:

SYS_PERF_EVENT_OPEN     creates the EVENT, bound to a task (TASK 1), a cpu (CPU 0, CPU 1) or a CGROUP
SYS_READ                reads the counter value (perf stat)
SYS_MMAP                maps the buffer the samples are read from (perf record)

sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT

SAMPLE, SAMPLE, SAMPLE, SAMPLE -> perf.data

PERF SCRIPTS

● 2 areas of script support

● use perf in python scripts

● post process perf data via python/perl

PYTHON SCRIPTS

● use perf in python scripts

● perf module

PYTHON SCRIPTS

[diagram: the same user / kernel picture (SYS_PERF_EVENT_OPEN, SYS_READ, SYS_MMAP; EVENT on TASK 1, CPU 0, CPU 1, CGROUP; sample: ID, PID, CPU, ADDRESS, CALLCHAIN, BRANCHES, MEMORY, TRACEPOINT), with the perf commands in user space replaced by a python script]

script preview (clipped on the slide):

#!/usr/bin/python

import perf

def main():
    cpus    = perf.cpu_map()
    threads = perf.thread_map()

    evsel  = perf.evsel(task = 1,
                        wakeup_eve..
                        sample_id_

    evsel.open(cpus = cpus, threads ..)

    while True:
        evlist.poll(timeout = -1)
        for cpu in cpus:
            event = evlist.read_on_cpu(cpu)
            if not event:
                continue

            print(event)

    while True:
        print("every1 likes python anyway..")

perf.so

python extension

# yum install python-perf

PYTHON PERF MODULE

● only sampling interface atm

● quite simple one:

    cpu_map
    thread_map
    evsel (open)
    evlist (open, mmap, poll, add, read_on_cpu)
    (mmap|task|comm|lost|read|sample|throttle)_event

PYTHON PERF MODULE

import perf

def main():
    # cpus and threads to monitor
    cpus    = perf.cpu_map()
    threads = perf.thread_map()

    # task = 1 asks for fork/exit records, comm = 1 for comm records
    evsel  = perf.evsel(task = 1, comm = 1, mmap = 0,
                        wakeup_events = 1, watermark = 1,
                        sample_id_all = 1,
                        sample_type = perf.SAMPLE_PERIOD | perf.SAMPLE_TID | perf.SAMPLE_CPU)

    evsel.open(cpus = cpus, threads = threads)

    # collect the event in a list and map its buffers
    evlist = perf.evlist(cpus, threads)
    evlist.add(evsel)
    evlist.mmap()

    # wait for data, then drain every cpu buffer
    while True:
        evlist.poll(timeout = -1)
        for cpu in cpus:
            event = evlist.read_on_cpu(cpu)
            if not event:
                continue

            print(event)

if __name__ == '__main__':
    main()

PYTHON PERF MODULE

● needs some love

    counting interface
    stabilize

● volunteers welcome ;-)

    $KERNEL/tools/perf/util/python.c

POST PROCESS SCRIPTING

● interface for processing perf data from:

    perf record
    perf stat

POST PROCESS SCRIPTING - SAMPLING

[diagram: SAMPLE records in perf.data -> perf report]

# Children      Self  Command  Shared Object     Symbol
# ........  ........  .......  ................  ...............................
#
    51.40%     0.00%  ls       [kernel.vmlinux]  [k] system_call
     9.71%     0.00%  ls       [kernel.vmlinux]  [k] __alloc_pages_nodemask
     9.71%     9.71%  ls       [kernel.vmlinux]  [k] clear_page
                 |
                 ---clear_page
                    __alloc_pages_nodemask
                    alloc_pages_vma
                    handle_mm_fault
                    __do_page_fault
                    do_page_fault
                    page_fault
                    _int_malloc

     8.73%     8.30%  ls       [kernel.vmlinux]  [k] perf_event_context_sched_in
                 |
                 ---perf_event_context_sched_in
                    |
                    |--95.07%-- __perf_event_task_sched_in
                    |          finish_task_switch
                    |          __schedule
                    |          _cond_resched
                    |          sys_write
                    |          system_call
                    |          __GI___libc_write
                    |          0x2d646c6975622d66
                    |
                     --4.93%-- perf_event_exec
                               setup_new_exec

POST PROCESS SCRIPTING - SAMPLING

● perf script wrapper

    perf script -s <script>

● python/perl support

POST PROCESS SCRIPTING - SAMPLING

[diagram: SAMPLE records in perf.data -> perf script]

#!/usr/bin/python

# insert_stat() is called below but is not shown on the slide

# called for non-tracepoint events; d is a dictionary with the event data
def process_event(d):
    for k,v in d.items():
        print("%s = %s" % (k, v))

# tracepoint handlers, one per <subsystem>__<event>
def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kmem_cache_alloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                           call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kmalloc_node(event, ctxt, cpu, s, ns, tid, comm, callchain,
                       call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node):
    insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu)

def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain,
                call_site, ptr):
    pass

def kmem__kmem_cache_free(event, ctxt, cpu, s, ns, tid, comm, callchain,
                          call_site, ptr):
    pass

# called once before the first event
def trace_begin():
    print("start")
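The kmem handlers on this slide call insert_stat(), which the slide does not show. A minimal sketch of what such a helper could look like, assuming it only accumulates requested and allocated bytes per call site. The helper body, the `stats` dictionary and the `trace_end` report are illustrative assumptions, not the author's code:

```python
from collections import defaultdict

# call_site -> [number of allocations, bytes requested, bytes allocated]
# (hypothetical layout; the slide does not show how the stats are kept)
stats = defaultdict(lambda: [0, 0, 0])

def insert_stat(call_site, ptr, bytes_req, bytes_alloc, cpu):
    """Accumulate one allocation event under its call site."""
    entry = stats[call_site]
    entry[0] += 1
    entry[1] += bytes_req
    entry[2] += bytes_alloc

def trace_end():
    """Print a per call site summary, largest allocators first."""
    print("%-18s %8s %12s %12s" % ("call_site", "hits", "requested", "allocated"))
    for call_site, (hits, req, alloc) in sorted(
            stats.items(), key=lambda item: item[1][2], reverse=True):
        print("%#-18x %8d %12d %12d" % (call_site, hits, req, alloc))
```

Keeping the totals in a module-level dictionary lets every tracepoint handler share one helper, and the summary is printed once from trace_end instead of per event.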

POST PROCESS SCRIPTING - INTERFACE

● set of callbacks:

    trace_begin/trace_end
    process_event (non tracepoint)
    $(SUBSYSTEM)__$(EVENT) (tracepoint)

POST PROCESS SCRIPTING - INTERFACE

● $(SUBSYSTEM)__$(EVENT)(...)

    event, ctxt, cpu, s, ns, tid, comm, callchain
    + tracepoint specific arguments

def kmem__kmalloc(event, ctxt, cpu, s, ns, tid, comm, callchain,
                  call_site, ptr, bytes_req, bytes_alloc, gfp_flags):
    pass

def kmem__kfree(event, ctxt, cpu, s, ns, tid, comm, callchain,
                call_site, ptr):
    pass

● process_event (args)

    args – dictionary with arguments

#!/usr/bin/python

def process_event(d):
    for k,v in d.items():
        print("%s = %s" % (k, v))
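Since process_event receives a plain dictionary, a script can aggregate instead of printing every event. A small sketch that counts samples per (comm, symbol) pair; the key names "comm" and "symbol" are assumptions about the dictionary layout and should be checked against what the print loop on this slide shows on your perf version:

```python
from collections import Counter

# (comm, symbol) -> number of samples seen
hits = Counter()

def process_event(d):
    # "comm" and "symbol" are assumed key names; .get() keeps the
    # script working when a sample does not carry them
    comm = d.get("comm", "[unknown]")
    symbol = d.get("symbol", "[unknown]")
    hits[(comm, symbol)] += 1

def trace_end():
    # print the ten most frequently sampled (comm, symbol) pairs
    for (comm, symbol), count in hits.most_common(10):
        print("%8d  %-16s %s" % (count, comm, symbol))
```

This would be run the same way as any other script, via perf script -s.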

POST PROCESS SCRIPTING

● perf record ...

● perf script -g lang

● perf script -s <script.py>

● perf script -l

● perf script record|report <script>

● man perf-script ;-)

POST PROCESS SCRIPTING - COUNTING

● native perf language

● post process perf stat output

$ perf stat -e 'cycles,instructions' WORKLOAD

 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions

        1.34357914 cpi                      cycles per instructions ratio

POST PROCESS SCRIPTING - COUNTING

cpi {
        events {
                CY = cycles:u
                IN = instructions:u
        }

        cpi = CY / IN

        print cpi
}

annotations on the slide:

    cpi { ... }         formula name
    events { ... }      events
    cpi = CY / IN       calculation formula
    print cpi           variable to print

$ perf stat -f formula.conf -e formula-cpi -a
^C
 Performance counter stats for 'system wide':

       739,225,320       cycles:u                 [100.00%]
       620,227,854       instructions:u           #    0.84  insns per cycle

       1.674587325 seconds time elapsed

        1.19186089 cpi
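Without the formula support, the same cpi value can be derived by post processing plain perf stat output. A rough sketch, assuming the default human-readable layout of "<count> <event>" per line; the regular expression and the function name are illustrative, not part of perf:

```python
import re

# one counter line of perf stat's default output: "<count>  <event> ..."
COUNTER_LINE = re.compile(r"^\s*([\d,]+)\s+(\S+)")

def parse_counts(text):
    """Return {event name: count} for every counter line in text."""
    counts = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts

# counter lines copied from the system wide run on this slide
output = """
       739,225,320       cycles:u                 [100.00%]
       620,227,854       instructions:u           #    0.84  insns per cycle
"""

counts = parse_counts(output)
cpi = counts["cycles:u"] / float(counts["instructions:u"])
print("%.8f cpi" % cpi)
```

Lines such as "seconds time elapsed" are skipped because their leading number contains a decimal point and so does not match the counter pattern.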

Page 48: perf scripts jiri olsa - events.static.linuxfound.org3 PERF SCRIPTS | JIRI OLSA perf stat COUNTING WORKLOAD WORKLOAD WORKLOAD WORKLOAD WORKLOAD CPU 0 CPU 1 CPU 2 $ perf stat e 'cycles,instructions

PERF SCRIPTS | JIRI OLSA48

POST PROCESS SCRIPTING - COUNTING

branch {
        events {
                IN = instructions:u
                BI = branch-instructions:u
                BM = branch-misses:u
        }

        branch-rate       = BI / IN
        branch-miss-rate  = BM / IN
        branch-miss-ratio = BM / BI

        print branch-rate
        print branch-miss-rate
        print branch-miss-ratio
}

$ perf stat -f formula.conf -e formula-branch du -sh /
^Cdu: Interrupt

 Performance counter stats for 'du -sh /':

        39,285,799       instructions:u
         8,865,310       branch-instructions:u
           273,038       branch-misses:u          #    3.08% of all branches

       0.923258595 seconds time elapsed

        0.22566195 branch-rate
        0.00695004 branch-miss-rate
        0.03079847 branch-miss-ratio

● branch example

● branch-rate/miss/ratio
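To make the three branch formulas concrete, here is a small Python sketch that evaluates "name = A / B" lines against the counter values from the du run. It is a toy evaluator written for this example only, assuming every formula is a single division of two event aliases; it is not how perf parses formula.conf, and the names `counts`, `formulas` and `evaluate` are hypothetical.

```python
import re

# Counter values copied from the perf stat output above.
counts = {
    "IN": 39285799,   # instructions:u
    "BI": 8865310,    # branch-instructions:u
    "BM": 273038,     # branch-misses:u
}

# The three formulas from the 'branch' block.
formulas = [
    "branch-rate       = BI / IN",
    "branch-miss-rate  = BM / IN",
    "branch-miss-ratio = BM / BI",
]

# Matches 'name = LEFT / RIGHT' where both operands are event aliases.
FORMULA_RE = re.compile(r"^\s*([\w-]+)\s*=\s*(\w+)\s*/\s*(\w+)\s*$")


def evaluate(line, values):
    """Evaluate one 'name = A / B' line against a dict of counter values."""
    match = FORMULA_RE.match(line)
    if match is None:
        raise ValueError("unsupported formula: %r" % line)
    name, num, den = match.groups()
    return name, values[num] / float(values[den])


results = dict(evaluate(line, counts) for line in formulas)
for name in ("branch-rate", "branch-miss-rate", "branch-miss-ratio"):
    print("%.8f %s" % (results[name], name))
```

The branch-miss-ratio result corresponds to the "3.08% of all branches" annotation that perf stat prints next to branch-misses.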

POST PROCESS SCRIPTING - COUNTING

● needs more testing/users

● not upstream yet
  https://git.kernel.org/cgit/linux/kernel/git/jolsa/perf.git/ perf/formula

POST PROCESS SCRIPTING - COUNTING

● store stat data into perf.data

$ perf stat -e 'cycles,instructions' WORKLOAD

 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions

POST PROCESS SCRIPTING - COUNTING

● store stat data into perf.data

$ perf stat -e 'cycles,instructions' record WORKLOAD

 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions

[diagram: perf.data file holding STAT EVENT records]

$ perf stat report

 Performance counter stats for 'find ..':

       104,142,555      cycles
        64,785,445      instructions

POST PROCESS SCRIPTING - COUNTING

def trace_begin():
    print("in trace_begin")

def trace_end():
    print("in trace_end")

def stat__cycles(cpu, sec, nsec, val, ena, run):
    print("%6d.%09d CPU%d %d cycles" % (sec, nsec, cpu, val))

[diagram: perf.data file holding SAMPLE records, processed by perf script]
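A handler like stat__cycles can also aggregate instead of printing each event. The sketch below keeps the handler signature shown on the slide and sums cycles per CPU. It assumes that val is the raw count, and that ena and run are the enabled and running times, so that a multiplexed counter can be scaled by ena / run. Since this interface is work in progress, the real arguments may differ. The `scale` helper, the `totals` dict and the sample readings in the driver are made up for illustration.

```python
from collections import defaultdict

# Per-CPU running totals of the (scaled) cycles count.
totals = defaultdict(int)


def scale(val, ena, run):
    """Extrapolate a counter that was only counting for part of the time.

    ena is the time the event was enabled, run the time it was actually
    counting; if run < ena the raw value is scaled up by ena / run.
    """
    if run == 0:
        return 0
    return int(val * float(ena) / run)


def trace_begin():
    totals.clear()


def stat__cycles(cpu, sec, nsec, val, ena, run):
    # Same signature as the handler on the slide.
    totals[cpu] += scale(val, ena, run)


def trace_end():
    for cpu in sorted(totals):
        print("CPU%d %d cycles" % (cpu, totals[cpu]))
    print("total %d cycles" % sum(totals.values()))


if __name__ == "__main__":
    # Stand-in for perf feeding events to the script; readings are made up.
    trace_begin()
    stat__cycles(0, 1, 0, 1000, 100, 100)
    stat__cycles(1, 1, 0, 500, 100, 50)    # counted half the time
    stat__cycles(0, 2, 0, 2000, 100, 100)
    trace_end()
```

When run standalone, the driver at the bottom plays the role of perf and calls the handlers directly.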

POST PROCESS SCRIPTING - COUNTING

● not upstream, work in progress