TRANSCRIPT
The New Systems Performance
Brendan Gregg (and many others)
Prentice Hall, 2013
Wednesday, November 20, 13
• G’Day, I’m Brendan
• Recipient of the LISA 2013 Award for Suddenly Giving a Plenary With 1 Hour Notice
• Recipient of the LISA 2013 Award for Outstanding Achievement in System Administration! (Thank you!)
• Work/Research: tools, methodologies, visualizations
• Author of Systems Performance, primary author of DTrace (Prentice Hall, 2011)
• Lead Performance Engineer @joyent
whoami
• Analysis of apps to metal. Think LAMP not AMP.
• An activity for everyone: from casual to full time.
• The basis is the system
• The target is everything
• All software can cause performance problems
Systems Performance
[Diagram: the operating system stack, from Applications down to Metal. Software layers: Applications, System Libraries, System Call Interface; in the kernel: VFS, File Systems, Volume Managers, Block Device Interface, Sockets, TCP/UDP, IP, Ethernet, Scheduler, Virtual Memory, Resource Controls, Device Drivers; below: Firmware, Metal]
1990’s Systems Performance
• Proprietary Unix, closed source, static tools
• Limited metrics and documentation
• Some perf issues could not be solved
• Analysis methodology constrained by tools
• Perf experts used inference and experimentation
• Literature is still around
$ vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr cd cd s0 s5   in   sy   cs us sy id
 0 0 0 8475356 565176  2   8  0  0  0  0  1  0  0 -0 13  378  101  142  0  0 99
 1 0 0 7983772 119164  0   0  0  0  0  0  0 224  0  0  0 1175 5654 1196  1 15 84
 0 0 0 8046208 181600  0   0  0  0  0  0  0 322  0  0  0 1473 6931 1360  1  7 92
[...]
2010’s Systems Performance
• Open source (the norm)
  • Ultimate documentation
• Dynamic tracing
  • Observe everything
• Visualizations
  • Comprehend many metrics
• Cloud computing
  • Resource controls
• Methodologies
  • Where to begin, and steps to root cause
1990’s Operating System Internals
• Studied in text books
[Diagram: applications on top of system libraries and syscalls on top of the kernel; the whole labeled "operating system"]
2010’s Operating System Internals
[Diagram: the same operating system stack of applications, libraries, syscalls, and kernel]
• Study the source, observe in production with dynamic tracing
1990's Systems Observability

For example, Solaris 9 (which was state of the art!):

[Diagram: the Solaris 9 kernel and hardware. Software stack: Applications (DBs, all server types, ...), System Libraries, System Call Interface, VFS, File Systems, Volume Managers, Block Device Interface, Sockets, TCP/UDP, IP, Ethernet, Scheduler, Virtual Memory, Device Drivers. Hardware: CPU, CPU interconnect, memory bus, DRAM, I/O bus, I/O bridge, I/O controller, network controller, disks, ports, transports. Annotated with the observability tools for each area: vmstat, mpstat, prstat, ps, truss, apptrace, sotruss, cpustat, cputrack, iostat, snoop, netstat, lockstat, kstat, sar]
2010's Systems Observability

For example, SmartOS:

[Diagram: the same software stack and hardware, now for the illumos kernel, annotated with the observability tools for each area: vmstat, mpstat, prstat, ps, truss, cpustat, cputrack, iostat, snoop, netstat, nicstat, lockstat, plockstat, intrstat, kstat, sar, and dtrace]
Dynamic Tracing turned on the light

• Example DTrace scripts from the DTraceToolkit, DTrace book, ...

[Diagram: the software stack annotated with DTrace scripts per area:]
• Syscalls/system libraries: opensnoop, statsnoop, errinfo, dtruss, rwtop, rwsnoop, mmap.d, kill.d, shellsnoop, zonecalls.d, weblatency.d, fddist, dnlcsnoop.d
• VFS/file systems: fswho.d, fssnoop.d, sollife.d, solvfssnoop.d
• ZFS: zfsslower.d, ziowait.d, ziostacks.d, spasync.d, metaslab_free.d
• Block device/disk: iosnoop, iotop, disklatency.d, satacmds.d, satalatency.d, scsicmds.d, scsilatency.d, sdretry.d, sdqueue.d, ide*.d, mpt*.d
• Network: soconnect.d, soaccept.d, soclose.d, socketio.d, so1stbyte.d, sotop.d, soerror.d, ipstat.d, ipio.d, ipproto.d, ipfbtsnoop.d, ipdropper.d, tcpstat.d, tcpaccept.d, tcpconnect.d, tcpioshort.d, tcpio.d, tcpbytes.d, tcpsize.d, tcpnmap.d, tcpconnlat.d, tcp1stbyte.d, tcpfbtwatch.d, tcpsnoop.d, tcpconnreqmaxq.d, tcprefused.d, tcpretranshosts.d, tcpretranssnoop.d, tcpsackretrans.d, tcpslowstart.d, tcptimewait.d, udpstat.d, udpio.d, icmpstat.d, icmpsnoop.d
• Network drivers: macops.d, ixgbecheck.d, ngesnoop.d, ngelink.d
• Scheduler: priclass.d, pridist.d, cv_wakeup_slow.d, displat.d, capslat.d
• Virtual memory: minfbypid.d, pgpginbypid.d
• Language providers: hotuser, umutexmax.d, lib*.d, node*.d, erlang*.d, j*.d, js*.d, php*.d, pl*.d, py*.d, rb*.d, sh*.d
• Databases: mysql*.d, postgres*.d, redis*.d, riak*.d
• Services: cifs*.d, iscsi*.d, nfsv3*.d, nfsv4*.d, ssh*.d, httpd*.d
Actual DTrace Example
• Given an interesting kernel function, e.g., ZFS SPA sync:
• Trace and print timestamp with latency:
usr/src/uts/common/fs/zfs/spa.c:

/*
 * Sync the specified transaction group. New blocks may be dirtied as
 * part of the process, so we iterate until it converges.
 */
void
spa_sync(spa_t *spa, uint64_t txg)
{
        dsl_pool_t *dp = spa->spa_dsl_pool;
[...]
# dtrace -n 'fbt::spa_sync:entry { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ { printf("%Y %d ms", walltimestamp,
    (timestamp - self->ts) / 1000000); self->ts = 0; }'
dtrace: description 'fbt::spa_sync:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  53625            spa_sync:return 2013 Jul 26 17:37:02 12 ms
  0  53625            spa_sync:return 2013 Jul 26 17:37:08 726 ms
  6  53625            spa_sync:return 2013 Jul 26 17:37:17 6913 ms
  6  53625            spa_sync:return 2013 Jul 26 17:37:17 59 ms
Linux Dynamic Tracing Example

• Two DTrace ports in progress (dtrace4linux, Oracle), and SystemTap
• perf has basic dynamic tracing (not programmatic); e.g.:
# perf probe --add='tcp_sendmsg'
Add new event:
  probe:tcp_sendmsg    (on tcp_sendmsg)
[...]
# perf record -e probe:tcp_sendmsg -aR -g sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.091 MB perf.data (~3972 samples) ]
# perf report --stdio
[...]
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  ...........
   100.00%     sshd  [kernel.kallsyms]  [k] tcp_sendmsg
               |
               --- tcp_sendmsg
                   sock_aio_write
                   do_sync_write
                   vfs_write
                   sys_write
                   system_call
                   __GI___libc_write
active traced call stacks from arbitrary kernel locations!
1990’s Performance Visualizations
$ iostat -x 1
                    extended device statistics
device    r/s   w/s   kr/s  kw/s wait actv svc_t  %w  %b
sd0       0.0   0.1    5.2   3.9  0.0  0.0  69.8   0   0
sd5       0.0   0.0    0.0   0.0  0.0  0.0   1.1   0   0
sd12      0.0   0.2    0.2   1.1  0.0  0.0   3.1   0   0
sd12      0.0   0.0    0.0   0.0  0.0  0.0   0.0   0   0
sd13      0.0   0.0    0.0   0.0  0.0  0.0   0.0   0   0
sd14      0.0   0.0    0.0   0.0  0.0  0.0   1.9   0   0
sd15      0.0   0.0    0.0   0.0  0.0  0.0   0.0   0   0
sd16      0.0   0.0    0.0   0.0  0.0  0.0   0.0   0   0
nfs6      0.0   0.0    0.0   0.0  0.0  0.0   0.0   0   0
[...]
Text-based and line graphs
2010’s Performance Visualizations
• Latency, utilization, and subsecond offset heat maps
• ... and Flame Graphs!
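As a minimal sketch of how a latency heat map is built (hypothetical helper, not any tool's actual implementation): each (timestamp, latency) sample is bucketed into a grid of time columns and latency rows, and the count per cell drives the color saturation when drawn.

```python
from collections import Counter

def heatmap_buckets(samples, time_step=1.0, lat_step=0.001):
    """Bucket (timestamp, latency) samples into a 2D histogram:
    x = time column, y = latency row, value = sample count per cell."""
    counts = Counter()
    for ts, lat in samples:
        col = int(ts // time_step)   # which time column
        row = int(lat // lat_step)   # which latency row
        counts[(col, row)] += 1
    return counts

# Hypothetical I/O samples: (seconds since start, latency in seconds)
samples = [(0.1, 0.0005), (0.2, 0.0004), (1.5, 0.0121), (1.6, 0.0122)]
cells = heatmap_buckets(samples)
# Two fast I/O land in cell (0, 0); two slow ones land in cell (1, 12)
```

The heat map preserves the full latency distribution over time, so bimodal behavior and outliers that a line graph of average latency would hide remain visible.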
Flame Graphs
• An interactive visualization of stack traces
• Consumes any profile data with stack traces and a value, e.g.:
• CPU Sample Flame Graphs (aka "Flame Graphs")
• I/O Latency Flame Graphs
• ...
[Diagram: Profile Data (.txt) is converted by some Perl into a Flame Graph (.svg)]
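The conversion works on "folded" stack lines, one stack per line with a sample count. A minimal sketch (hypothetical helper, not the real flamegraph.pl logic) of how frame widths fall out of that input: every prefix of a stack is a rectangle, and its width is its share of all samples.

```python
from collections import defaultdict

def frame_totals(folded_lines):
    """Sum sample counts for every (depth-ordered) frame prefix.
    A frame's flame-graph width is its total over the grand total."""
    totals = defaultdict(int)
    grand = 0
    for line in folded_lines:
        stack_str, count = line.rsplit(" ", 1)
        count = int(count)
        grand += count
        frames = stack_str.split(";")
        # every prefix of the stack is one rectangle in the graph
        for depth in range(1, len(frames) + 1):
            totals[tuple(frames[:depth])] += count
    return totals, grand

# Hypothetical folded profile, as a stackcollapse step might emit
folded = ["main;parse;read 3", "main;parse;eval 5", "main;gc 2"]
totals, grand = frame_totals(folded)
# main spans all 10 samples; main;parse spans 8 of them
```

Because any profiler output can be collapsed into this format, the same visualization works for CPU samples, I/O latency, or any other stack-plus-value data.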
Flame Graphs: MySQL Example
[Flame graph of the same profile data, annotated: one stack sample; all stack samples; the "show status" stack; where CPU is really consumed]
Flame Graphs: Syscall Towers
Flame Graphs: Both Kernel and User Stacks
[Flame graph annotated: kernel stack, user stack, and user-only frames]
Memory Allocation Flame Graphs: malloc()
I/O Time Flame Graph: MySQL
Off-CPU Time Flame Graph: MySQL Idle
[Flame graph of off-CPU time across the mysqld threads: buf_flush_page_cleaner_thread, dict_stats_thread, fts_optimize_thread, io_handler_thread, lock_wait_timeout_thread, mysqld_main, srv_monitor_thread, srv_master_thread, srv_error_monitor_thread, pfs_spawn_thread]
Wakeup Latency Flame Graph: ssh
Chain Graph
[Diagram: a chain graph. Off-CPU stacks (why I blocked) are extended with wakeup stacks (why I waited), chained through wakeup thread 1, wakeup thread 2, ..., each joined at an "I woke up" marker]
Cloud Computing
• Performance may be limited by cloud resource controls, rather than physical limits
• Hardware Virtualization complicates things – as a guest you can’t analyze down to metal directly
• Hopefully the cloud provider provides an API for accessing physical statistics, or does the analysis on your behalf
1990’s Performance Methodologies
• Tools Method
  1. For each vendor tool:
  2. For each useful tool metric:
  3. Look for problems
• Ad Hoc Checklist Method
  • 10 recent issues: run A, if B, fix C
• Analysis constrained by tool metrics
2010’s Performance Methodologies
• Documented in “Systems Performance”:
  • Workload Characterization
  • USE Method
  • Thread State Analysis Method
  • Drill-down Analysis
  • Latency Analysis
  • Event Tracing
  • ...
• Also the topic of my LISA'12 talk
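To illustrate the shape of one of these, the USE Method asks, for every resource, about its Utilization, Saturation, and Errors. A toy sketch of expanding that checklist (resource and metric names here are illustrative, not a prescribed list):

```python
# Illustrative USE Method checklist: for every resource,
# ask about utilization, saturation, and errors.
resources = ["CPU", "Memory", "Disk I/O", "Network"]
metrics = ["utilization", "saturation", "errors"]

def use_checklist(resources, metrics):
    """Expand resources x metrics into the full list of questions."""
    return [f"{r}: {m}?" for r in resources for m in metrics]

checks = use_checklist(resources, metrics)
# 4 resources x 3 metrics = 12 checks, starting with "CPU: utilization?"
```

The point of the method is completeness: iterating the full cross product ensures no resource is skipped, rather than only looking where the familiar tools happen to shine light.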
The New Systems Performance
• Really understand how systems work
• New observability, visualizations, methodologies
• Older performance tools and approaches still used as appropriate
• Great time to be working with systems, both enterprise and the cloud
• Thank you!