Performance Instrumentation Beyond What You Do Now

Posted on 16-Jun-2015

TRANSCRIPT

Performance Instrumentation beyond what you do now
Cary Millsap, cary.millsap@method-r.com
Percona Performance Conference, Santa Clara, California
9:00a-9:55a, Thursday 23 April 2009

Introductions
Cary Millsap: carymillsap.blogspot.com, cary_millsap
Timeline (1986, 1989, 1999, 2008): Software Developer and Performance Analyst
Method R Corporation, http://method-r.com
What we do at Method R Corporation: write code for you; troubleshoot performance problems; teach you how to do what we do; write software tools that make your work easier.

Thinking clearly about performance

Performance is HARD
"Our users say that everything is slow, but I don't know where to begin."
"Our users are complaining, but all our dials are green."
A story.
In the beginning... (1989: Oracle 6.0.26)
Tuning was...
bstat.sql ... estat.sql ... report.txt
V$PARAMETER, sar, V$DB_OBJECT_CACHE, ps, iostat, V$OPEN_CURSOR, V$SESSTAT, netstat, V$FIXED_VIEW_DEFINITION, V$LATCH, nfsstat, V$TRANSACTION, V$PROCESS, V$FILESTAT, V$LOCK, vmstat, V$SQL, V$SESSION, V$SYSSTAT, V$SQLTEXT, V$SESS_IO, V$LIBRARYCACHE, V$ROLLSTAT, V$ROWCACHE, V$WAITSTAT, pstat, V$TIMER
People looked for bad numbers.
Inefficiencies.
But how can you know what causes a specific task to be slow?
"It's latches." "It's I/O." "It's always I/O." "It's bad SQL." "It's always bad SQL." "There's not enough memory." "There's never enough memory."
My problem
How can you possibly know that?
Reminded me of... (Image: vailroger.googlepages.com/orionconstellation)
You do see it... Right?
But who says that is what you have to see?
Why not?
Performance is hard.
"A good pilot makes it look easy." Van R. Millsap, 1936-2004

Performance is EASY
How?
It's the user's experience that matters.
A user's performance experience consists of two elements: 1. a task; 2. time.

Task
The things we used to computerize tasks. (Image: http://olathe.lib.ks.us/images/Image/Computer%20User.jpg)
A task is a business unit of work: post to the General Ledger; enter an order; look up a book by author.
Tasks can nest. "Print Addresses" is a task; "Print Address #42" is a (sub)task. Often, a program is a task. Often, a tiny part of a program is a task. (Diagram: Posting, with PO, AP, AR, FA.)
Tasks are it. Business people don't care about the system except through execution of the tasks that make up their business. Tasks are what system owners care about.

Time
Performance is about time.
How fast: "Daddy, can your car go 500 miles?" He meant 500 miles per hour. To talk about performance (speed), you have to talk about time.
Two ways to measure performance: tasks per time (that's throughput); time per task (that's response time).
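Both ways of measuring performance can be computed from a log of task executions. The sketch below assumes a hypothetical record holding a task name plus begin and end timestamps in seconds, and it measures throughput over the span from the first begin to the last end (one reasonable choice of measurement window, not the only one). For one user running tasks back to back, throughput comes out to exactly the inverse of average response time; when executions overlap, it does not.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskExecution:
    """One execution of a task; begin/end are timestamps in seconds (hypothetical format)."""
    task: str
    begin: float
    end: float

    @property
    def response_time(self) -> float:
        # Time per task: the elapsed duration of this one execution.
        return self.end - self.begin


def mean_response_time(executions: Sequence[TaskExecution]) -> float:
    """R: average seconds per task execution."""
    return sum(e.response_time for e in executions) / len(executions)


def throughput(executions: Sequence[TaskExecution]) -> float:
    """X: task executions per second, over the span from first begin to last end."""
    span = max(e.end for e in executions) - min(e.begin for e in executions)
    return len(executions) / span


# Hypothetical log: one user enters four orders back to back, 0.25 s each.
serial = [TaskExecution("Enter an order", 0.25 * i, 0.25 * (i + 1)) for i in range(4)]
print(mean_response_time(serial), throughput(serial))  # 0.25 4.0

# Hypothetical log: two users each enter two orders at the same time, so executions overlap.
overlapping = [
    TaskExecution("Enter an order", 0.0, 0.25),
    TaskExecution("Enter an order", 0.0, 0.25),
    TaskExecution("Enter an order", 0.25, 0.5),
    TaskExecution("Enter an order", 0.25, 0.5),
]
print(mean_response_time(overlapping), throughput(overlapping))  # 0.25 8.0
```

In the serial case X = 4 orders/second and R = 0.25 seconds/order, so X = 1/R. In the overlapping case R is unchanged but X doubles, which is why the inverse relationship only holds "kind of".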
Throughput and response time
Throughput (X): the tasks-per-time way. Number of task executions completed in a given duration; e.g., orders/second.
Response time (R): the time-per-task way. Elapsed duration of an execution of a given task; e.g., seconds/order.
X = 1/R (kind of)
Average throughput is the inverse of average response time. X = 1,000 txn/sec? Then R = (1 sec)/(1,000 txn) = .001 sec/txn. But...
Adding load to create higher throughput changes response time.
Which leads to a whole 'nother conversation I'd love to have with you some other time.

Sequence Diagram
A simple way to view response time is with a UML sequence diagram. (Diagram: response time R_A; http://www.websequencediagrams.com)
More complicated systems have nested levels of suppliers and consumers. (Diagram: R_A, R_B)
The tiers represent the way your system is constructed. (Diagram: R_User)
This sequence diagram shows the complicated interactions among consumers and suppliers. (Diagram: R_User)
The sequence diagram is a good conceptual tool.
But when you need to analyze thousands of calls, you need something else.

Profile
A profile is a complete account of a task's response time.

Response time (s)      %R   # Calls   R/call (s)   Call name
            0.769   50.3%     5,003     0.000154   unaccounted-for between db calls
            0.393   25.7%     5,010     0.000078   SQL*Net message from client
            0.381   24.9%     5,013     0.000076   CPU service, execute calls
            0.090    5.9%        11     0.008194   CPU service, prepare calls
            0.027    1.8%         1     0.027396   log file sync
            0.008    0.5%     5,010     0.000002   SQL*Net message to client
            0.000    0.0%         9     0.000000   CPU service, fetch calls
           -0.138   -9.1%     5,031    -0.000028   unaccounted-for within db calls
            1.530  100.0%                          Total
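A flat profile like the one above can be built by grouping call-by-call records by call name. The following is a minimal sketch, assuming a hypothetical `Call` record of (name, elapsed seconds) and a separately measured response time R for the task. It does not reproduce how any particular trace profiler works; it only shows the arithmetic: per-line totals, share of R, call counts, mean time per call, descending sort, and an explicit unaccounted-for line so that the lines always sum to R.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Optional

UNACCOUNTED = "unaccounted-for"


class Call(NamedTuple):
    """One measured call made during a task (hypothetical record format)."""
    name: str
    duration: float  # elapsed seconds


class ProfileLine(NamedTuple):
    name: str
    seconds: float             # total time attributed to this line
    share: float               # fraction of the task's response time R
    calls: int                 # 0 for the unaccounted-for line
    per_call: Optional[float]  # mean seconds per call; None when calls == 0


def flat_profile(response_time: float, calls: Iterable[Call]) -> List[ProfileLine]:
    """Aggregate call-by-call records into a flat profile sorted by descending time."""
    seconds = defaultdict(float)
    counts = defaultdict(int)
    for call in calls:
        seconds[call.name] += call.duration
        counts[call.name] += 1

    # Whatever the measured calls do not explain gets its own line, so the
    # profile always sums to R. It comes out negative if the calls over-count.
    gap = response_time - sum(seconds.values())
    if abs(gap) > 1e-9:
        seconds[UNACCOUNTED] += gap

    lines = [
        ProfileLine(name, total, total / response_time, counts[name],
                    total / counts[name] if counts[name] else None)
        for name, total in seconds.items()
    ]
    lines.sort(key=lambda line: line.seconds, reverse=True)
    return lines


def print_profile(lines: List[ProfileLine]) -> None:
    print(f"{'Seconds':>9} {'%R':>7} {'Calls':>7} {'R/call':>10}  Call name")
    for line in lines:
        per_call = f"{line.per_call:10.6f}" if line.per_call is not None else " " * 10
        print(f"{line.seconds:9.3f} {line.share:7.1%} {line.calls:7,d} {per_call}  {line.name}")
    print(f"{sum(l.seconds for l in lines):9.3f} {sum(l.share for l in lines):7.1%}  Total")


# Hypothetical call-by-call data for one task whose measured response time was 1.0 s.
calls = (
    [Call("CPU service, execute calls", 0.1)] * 3
    + [Call("SQL*Net message from client", 0.2)] * 2
    + [Call("log file sync", 0.05)]
)
print_profile(flat_profile(1.0, calls))
```

Keeping unaccounted-for time as a visible line, rather than dropping it, is what keeps the profile a complete account of R; a large value there is a hint that the instrumentation is missing something.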
You've done this before, if you've ever used:
gcc -pg ; gprof
java -prof ; java ProfilerViewer
perl -d:DProf ; dprofpp
dbms_monitor.session_trace_enable(); p5prof
Profile: full account of response time; contributions as %R; spanning and non-overlapping (contributions sum to R); duration per call: mean, minimum, maximum, skew; sorted by descending R; drill-down: useful dimension, individual call level of detail; flat profile, maybe even deeper: call graph.

Response Time
To optimize throughput, you must analyze response time.
(Proof) You cannot optimize X for a task that's inefficient. You cannot measure a task's efficiency without measuring its R. Therefore, to optimize X, you must first analyze R.
"The universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail." Donald Knuth
(Programmers aren't very good at guessing where their code spends time.)
To optimize performance (throughput or response time), people need profiles.

Performance is EASY
Performance is easy if you can stop guessing where your code is slow.
When you have profiles for task response times, performance problems cannot hide from you.
Some surprising things I've learned by measuring R
Disk I/O is often less important than people think. http://carymillsap.blogspot.com/2009/04/cary-on-joel-on-ssd.html
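A profile line's mean time per call can hide skew: many fast calls plus one very slow call can average out to the same mean as uniformly moderate calls. The sketch below, using made-up durations, reports count, mean, minimum and maximum for one call name and buckets its durations by power of ten. The power-of-ten bucketing is an assumed, simple way to expose skew when drilling down, not a prescribed method.

```python
import math
from collections import Counter
from statistics import mean
from typing import Dict, List, NamedTuple


class CallStats(NamedTuple):
    count: int
    total: float
    average: float
    minimum: float
    maximum: float


def call_stats(durations: List[float]) -> CallStats:
    """Summary of one profile line's per-call durations (seconds)."""
    return CallStats(len(durations), sum(durations), mean(durations),
                     min(durations), max(durations))


def skew_histogram(durations: List[float]) -> Dict[int, int]:
    """Count calls per power-of-ten bucket.

    Key e means 10**e <= duration < 10**(e + 1). Durations must be positive.
    """
    buckets = Counter(math.floor(math.log10(d)) for d in durations)
    return dict(sorted(buckets.items()))


def print_histogram(name: str, durations: List[float]) -> None:
    stats = call_stats(durations)
    print(f"{name}: {stats.count} calls, mean {stats.average:.6f}s, "
          f"min {stats.minimum:.6f}s, max {stats.maximum:.6f}s")
    for exponent, count in skew_histogram(durations).items():
        print(f"  [1e{exponent}, 1e{exponent + 1}) s: {count} calls")


# Hypothetical data: two call names with the same count and mean, very different shapes.
steady = [0.02] * 100
spiky = [0.0002] * 99 + [1.9802]
print_histogram("steady", steady)
print_histogram("spiky", spiky)
```

Both lists would show the same count and R/call in a flat profile; only the minimum, maximum and histogram reveal that nearly all of "spiky"'s time is one call.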
Common performance problems: CPU; network I/O; software serialization.
The point
Your problems have nothing to do with experiences I've had. So measure.

Finding what you need to see
How are you supposed to create these profiles?
You have to insist on seeing where time goes for any task you think is important.
To drill down, you need call-by-call data. (NOT data about aggregations of calls.)
In Oracle, we do it with a feature called extended SQL tracing. See Cary Millsap, "For Developers: Making Friends with the Oracle Database for Fast, Scalable Applications," http://method-r.com/downloads/doc_details/10-for-developers-making-friends-with-the-oracle-database-cary-millsap; and Cary Millsap with Jeff Holt, Optimizing Oracle Performance.
The stuff you need

Feature (attribute)       Oracle             MySQL   App tier
Task identification       y
Call-by-call coverage     98%+
DB call begin sequence    partly derivable
DB call begin time        partly derivable
DB call end time          y
DB call context info      y
OS call begin sequence    partly derivable
OS call begin time        derivable
OS call end time          y
OS call context info      y
Call SQL context          y
Call CPU (sys mode)       -
Call CPU (usr mode)       -
Call CPU (total)          y
SQL execution plans       y

Recap
Here's what I hope you take away today
Performance is about time and tasks.
If you're interest