End-to-End Troubleshooting CHECKLIST for Microsoft SQL Server Kevin Kline Director of Engineering Services, SQL Sentry SQL Server MVP since 2003 Social media at @KEKline Blog: http://blogs.sqlsentry.com/KevinKline

Post on 04-Aug-2018

Page 1: End-to-End Troubleshooting CHECKLIST for Microsoft SQL Server - End-to-En… · End-to-End Troubleshooting CHECKLIST for Microsoft SQL Server Kevin Kline • Director of Engineering

End-to-End Troubleshooting CHECKLIST

for Microsoft SQL Server

Kevin Kline • Director of Engineering Services, SQL Sentry

• SQL Server MVP since 2003

• Social media at @KEKline

• Blog: http://blogs.sqlsentry.com/KevinKline

Tuning blog: http://www.sqlperformance.com/

E-mail [email protected] for free copies of our e-books:

Agenda

• Methodology for troubleshooting

• Troubleshooting tools and techniques using the native SQL Server tool kit:

– Wait Stats

– Windows Performance Monitor (PerfMon)

– SQL Profiler, Server-Side Traces, and XEvents

– SQL Server DMVs

– Execution Plans

• Summary, Resources, and Q&A

Where to Begin?

• There’s not a “right” or “wrong” place to start. You can start at any of the points shown in the diagram.

• Start with the information source that provides actionable information most quickly.

Starting points (from the slide diagram): Error Logs; PerfMon; DMVs; Profiler & Trace; SSMS (Execution Plan)

Methodology

• Effective troubleshooting is like a funnel or series of continuously more refined sieves.

• Each successive sieve filters out smaller “chunks”; that is, harder and more transient errors and problems

• More work is required…

Funnel labels (from the slide diagram): Errors in the log; Resource issues; Specific SQL Cmds; Identification & Resolution

Troubleshooting checklist

1. Shortcut! Has anything changed?

2. Inside or outside of SQL Server?

3. Is the issue caused by a SQL Server error?

4. Are there excessive wait stats?

5. Correlate wait stats against other metrics.

6. Follow-up

Your best shortcut? Know what has changed!

• sp_configure or sys.configurations

• sp_dboption (pre-SQL2012) or sys.databases (SQL2012 +)

• DDL triggers for meta-data changes:

– Developers?

– Unfettered access to databases?

Remember: Change = Risk
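
One way to know what has changed is to record DDL events as they happen. The following is a minimal sketch of a database-level DDL trigger, assuming SQL Server 2008 or later. The table dbo.DdlChangeLog and the trigger name trg_LogDdlChanges are hypothetical names chosen for illustration; they do not come from the presentation.

```sql
-- Hypothetical audit table; the name and columns are illustrative only.
CREATE TABLE dbo.DdlChangeLog
(
    LogId      int IDENTITY(1,1) PRIMARY KEY,
    EventTime  datetime2(3)  NOT NULL DEFAULT SYSDATETIME(),
    LoginName  sysname       NOT NULL,
    EventType  nvarchar(128) NULL,
    ObjectName nvarchar(256) NULL,
    EventXml   xml           NOT NULL
);
GO

-- Fires for CREATE/ALTER/DROP statements issued in the current database.
CREATE TRIGGER trg_LogDdlChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;

    -- EVENTDATA() returns an XML description of the DDL statement that fired the trigger.
    DECLARE @e xml = EVENTDATA();

    INSERT INTO dbo.DdlChangeLog (LoginName, EventType, ObjectName, EventXml)
    VALUES
    (
        ORIGINAL_LOGIN(),
        @e.value('(/EVENT_INSTANCE/EventType)[1]',  'nvarchar(128)'),
        @e.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(256)'),
        @e
    );
END;
GO
```

Anyone who runs DDL needs permission to insert into the log table; otherwise the trigger fails and the DDL statement is rolled back with it.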

Error Logs

• The Windows Application event log helps eliminate non-SQL Server problems

• SQL Server Error Log and SQL Server Agent Log

– Available both as TXT and through the GUI

– By default, SQL Server keeps the current log plus the six most recent archives, cycling to a new log each time the instance restarts

• WARNING! Always make sure to enable SQL Server Agent notifications for severity level 18 or greater!
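
The warning above can be sketched with the SQL Server Agent procedures sp_add_alert and sp_add_notification in msdb. This is a minimal example, not the presenter's script: it assumes Database Mail is configured and that an operator already exists. The operator name 'DBA Team' and the alert names are hypothetical.

```sql
USE msdb;
GO

DECLARE @severity   int = 18;
DECLARE @alert_name sysname;

-- Create one alert per severity level, 18 through 25.
WHILE @severity <= 25
BEGIN
    SET @alert_name = N'Severity ' + CAST(@severity AS nvarchar(2)) + N' error';

    EXEC dbo.sp_add_alert
         @name = @alert_name,
         @message_id = 0,                     -- 0 = alert on severity, not on a specific message
         @severity = @severity,
         @enabled = 1,
         @delay_between_responses = 60,       -- seconds; throttles repeated notifications
         @include_event_description_in = 1;   -- 1 = include the error text in e-mail

    -- 'DBA Team' is a hypothetical operator that must already exist.
    EXEC dbo.sp_add_notification
         @alert_name = @alert_name,
         @operator_name = N'DBA Team',
         @notification_method = 1;            -- 1 = e-mail

    SET @severity += 1;
END;
```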

Acting Upon Error Logs

• No further research required

– Error found with easy fix

– Error found with difficult fix or restore required

• Further research required

– Error found, but time or symptoms of error do not correlate to the problem

– No error found

Demo

• Error Notification

Demo of Error Notifications in SSMS

Advanced Error Notification in SSMS

• Error notification can be difficult with lots of SQL Servers.

• Ease the pain by setting up Event Forwarding under the Advanced properties of the SQL Server Agent.

• All events from remote servers are forwarded to one (or more) central servers.

• Now, only one instance of SQLMail/DBMail is needed in your environment.

Rocks, Gravel, or Sand

• We retrieved the top level information, “the big rocks”

• Now, what’s the best way to go deeper?

Wait Stats: It’s all about Bottlenecks!

• Anytime a task in SQL Server waits for something:

– It is reported as a wait type

– Reveals where the bottlenecks are

• SQL Server 2005 aggregates wait type information

• SQL Server 2008 provides new pre-emptive wait stats
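
The aggregated wait information mentioned above is exposed through sys.dm_os_wait_stats. A minimal sketch of a "top waits" query follows; the TOP (10) cutoff and the column aliases are illustrative choices.

```sql
-- Top waits accumulated since the last restart (or since the stats were last cleared).
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms,                                    -- time spent waiting for CPU after the resource was ready
       wait_time_ms - signal_wait_time_ms AS resource_wait_ms  -- time spent waiting for the resource itself
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0
ORDER BY wait_time_ms DESC;
```

A high share of signal wait time relative to total wait time tends to point at CPU pressure rather than at the resource named by the wait type.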

Schedulers & Wait Stats

• 1 Window = 1 Scheduler

• Users are assigned to a thread

Cartoon captions from the slide: “Uh oh! They’re out of soda!” “No problem. Step aside… More syrup for the sodas!” (Goes to the waiting or “suspended” queue.) “Yeah! I’m next in line!”

Waits by Task

• sys.dm_os_waiting_tasks

• Wait information

• Task level

• Very accurate

• Transient data
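
Because this data is transient, it is typically queried while the problem is happening. The sketch below joins sys.dm_os_waiting_tasks to the session and request DMVs to show who is waiting, on what, and who is blocking them. The filter on user sessions and the column selection are illustrative choices.

```sql
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       wt.resource_description,
       st.text AS current_sql
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = wt.session_id
LEFT JOIN sys.dm_exec_requests AS r
  ON r.session_id = wt.session_id
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE s.is_user_process = 1          -- skip system sessions
ORDER BY wt.wait_duration_ms DESC;
```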

Buffer and Transaction Bottlenecks

• PAGELATCH_xx and LATCH_xx

– PAGELATCH_xx usually comes from contention on pages in the buffer pool

– LATCH_xx commonly arises from contention on resources other than the buffer pool, especially due to heaps or text data types

• LCK_M_xx arises from heavy locking and blocking, often caused by overlong transactions, improperly indexed tables, or poorly configured hardware

CPU Bottlenecks

• SOS_SCHEDULER_YIELD

– Yielding processor time

• CXPACKET

– Query parallelism due to splitting and merging overhead

IO Bottlenecks

• WRITELOG

– Writing transactions to the log on disk

• PAGEIOLATCH_xx

– Represent memory-to-disk transfers

• IO_COMPLETION

– Awaiting I/O task completion

External Bottlenecks

• OLEDB

– Wait on the OLEDB provider, for example Full-Text Search or lots of linked servers

• NETWORKIO (named ASYNC_NETWORK_IO in SQL Server 2005 and later)

– Often caused by slow client processing of results, in addition to physical networking

Other Bottlenecks

• SLEEP_BPOOL_FLUSH

– Checkpoint IO throttling

• RESOURCE_SEMAPHORE_QUERY_COMPILE

– Throttling query compilations

– Compilations, re-compilations, non-cacheable plans

• RESOURCE_SEMAPHORE

– Waiting for a memory grant

Are These Bottlenecks?

• WAITFOR

– T-SQL WAITFOR command

• SQLTRACE_BUFFER_FLUSH

– Default trace

• LAZYWRITER_SLEEP

– System process waiting to start

Top 10 Waits from the Field

CPU PRESSURE

• CPU pressure: SOS_SCHEDULER_YIELD

• Parallelism: CXPACKET

LOCKING

• Long term blocking: LCK_M_xx, such as LCK_M_U & LCK_M_X

MEMORY

• Buffer latch: PAGELATCH_xx

• Non-buffer latch: LATCH_xx

• Memory grants: RESOURCE_SEMAPHORE

I/O

• Buffer I/O latch: PAGEIOLATCH_xx

• Tran log disk subsystem: WRITELOG & LOGBUFFER

• General I/O issues: ASYNC_IO_COMPLETION & IO_COMPLETION

NETWORK PRESSURE

• Network I/O: ASYNC_NETWORK_IO
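
The categories above can be applied directly to sys.dm_os_wait_stats. The query below is a rough sketch of that grouping: the bucket names follow the slide, while the pattern matching, the 'Other' bucket, and the short exclusion list of usually harmless waits are assumptions made for illustration.

```sql
SELECT w.category,
       SUM(w.wait_time_ms) AS total_wait_ms
FROM (
    SELECT wait_time_ms,
           CASE
               WHEN wait_type IN (N'SOS_SCHEDULER_YIELD', N'CXPACKET')  THEN 'CPU pressure'
               WHEN wait_type LIKE N'LCK[_]M[_]%'                       THEN 'Locking'
               WHEN wait_type LIKE N'PAGEIOLATCH[_]%'
                 OR wait_type IN (N'WRITELOG', N'LOGBUFFER',
                                  N'ASYNC_IO_COMPLETION', N'IO_COMPLETION') THEN 'I/O'
               WHEN wait_type LIKE N'PAGELATCH[_]%'
                 OR wait_type LIKE N'LATCH[_]%'
                 OR wait_type = N'RESOURCE_SEMAPHORE'                   THEN 'Memory'
               WHEN wait_type = N'ASYNC_NETWORK_IO'                     THEN 'Network pressure'
               ELSE 'Other'
           END AS category
    FROM sys.dm_os_wait_stats
    -- Waits that usually do not indicate a bottleneck; extend this list as needed.
    WHERE wait_type NOT IN (N'WAITFOR', N'SQLTRACE_BUFFER_FLUSH', N'LAZYWRITER_SLEEP')
) AS w
GROUP BY w.category
ORDER BY total_wait_ms DESC;
```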

Correlating PERF Information

• With wait stats, other older standbys are not as frequently needed.

– But they still help!

• PerfMon

• XEvents and Traces (either Profiler or Server-side)

• DMVs

PerfMon

• Benefits: Shows the rate of resource consumption or activity in a wide variety of areas on the server, for example

– Disk IO; Memory; Network

– SQL Server activity - Locking, Blocking, and Deadlocking; Cache Activity; Object Utilization

• Limitations

– Very hard to know what to track and what values indicate good or bad performance

– Doesn’t offer good root-cause analysis, only resource consumption info

Windows Performance Monitor

OS PerfMon Counters

Object | Counter | Value | Notes
Paging File | % Usage | < 70% | Amount of page file currently in use.
Processor | % Processor Time | <= 80% | The higher it is, the more likely users are delayed.
Processor | % Privileged Time | < 30% of % Processor Time | Amount of time spent executing kernel commands like SQL Server IO requests.
Process (sqlservr), Process (msmdsrv) | % Processor Time | < 80% | Percentage of elapsed time spent on SQL Server and Analysis Services process threads.
System | Processor Queue Length | < 4 | < 12 per CPU is good/fair, < 8 is better, < 4 is best.

IO and Mem PerfMon Counters

Object | Counter | Value | Notes
Physical Disk | Avg. Disk sec/Read | < 8 ms | > 20 ms is poor, < 20 ms is good/fair, < 12 ms is better, < 8 ms is best.
Physical Disk | Avg. Disk sec/Write | < 8 ms or < 1 ms | Without cache: > 20 ms poor, < 20 ms fair, < 12 ms better, < 8 ms best. With cache: > 4 ms poor, < 4 ms fair, < 2 ms better, < 1 ms best.
Memory | Available MBytes | > 100 | Amount of physical memory available to run processes on the machine.
SQL Server: Memory Manager | Memory Grants Pending | ~0 | Current number of processes waiting for a workspace memory grant.
SQL Server: Buffer Manager | Free List Stalls/sec | < 2 | Frequency that requests for db buffer pages are suspended because there are no buffers.

RED Flag PerfMon Counters

Object | Counter | Value | Notes
:Access Methods | Forwarded Records/sec | < 10* | Tables with records traversed by a pointer. Should be < 10 per 100 batch requests/sec.
:Access Methods | Page Splits/sec | < 20* | Number of 8k pages that filled and split into two new pages. Should be < 20 per 100 batch requests/sec.
:Databases | Log Growths/sec; Percent Log Used | < 1 and < 80%, resp. | Don’t let transaction log growth happen randomly!
:SQL Statistics | Batch Requests/sec | * | No firm number without benchmarking, but > 1000 is a very busy system.
:SQL Statistics | Compilations/sec; Recompilations/sec | * | Compilations should be < 10% of batch requests/sec; recompilations should be < 10% of compilations/sec.
:Locks | Deadlocks/sec | < 1 | Number of lock requests that caused a deadlock.
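
The SQL Server counters above are also readable from T-SQL through sys.dm_os_performance_counters. One caveat is that "per second" counters are stored there as cumulative totals, so a rate has to be computed from two samples. A minimal sketch for Batch Requests/sec follows; the 10-second interval and the variable names are arbitrary choices.

```sql
DECLARE @first bigint, @second bigint;

-- First sample of the cumulative counter.
SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%:SQL Statistics%'   -- the prefix differs for named instances
  AND counter_name = N'Batch Requests/sec';

WAITFOR DELAY '00:00:10';

-- Second sample, 10 seconds later.
SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%:SQL Statistics%'
  AND counter_name = N'Batch Requests/sec';

-- Average rate over the sampling interval.
SELECT (@second - @first) / 10.0 AS batch_requests_per_sec;
```

The same two-sample pattern applies to the other "/sec" counters in the table, which makes the "per 100 batch requests" ratios straightforward to compute.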

Profiler / XEvents

• Monitors SQL Server for the occurrence of events

• When an event fires, Profiler logs the event and information about it

• Useful for:

– Finding and diagnosing slow-running code.

– Capturing the series of SQL statements that lead to a problem

– Replaying and reproducing a problem on a test machine

• Doesn’t offer resource consumption info, just granular details

Server-side Traces

• Warning! Profiler can be overwhelmed by a high throughput system!

• Server-side traces happen entirely on the server (no client GUI) and are controlled using stored procedures

• Useful for “auto-start” logging and high performance scenarios

• TIP! Profiler can be used to create a server-side trace. That means no procedures to learn.
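
For reference, the stored procedures behind a server-side trace look roughly like this. It is a minimal sketch, similar in shape to what Profiler's script export produces, that captures batches running for one second or longer. The file path is a hypothetical location that must exist and be writable by the SQL Server service account.

```sql
DECLARE @TraceID       int,
        @MaxFileSizeMB bigint = 50,
        @On            bit    = 1,
        @MinDurationUs bigint = 1000000;   -- 1 second, expressed in microseconds

-- Hypothetical path; SQL Server appends the .trc extension itself.
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\LongBatches', @MaxFileSizeMB, NULL;

-- Event 12 = SQL:BatchCompleted. Columns: 1 TextData, 12 SPID, 13 Duration, 14 StartTime.
EXEC sp_trace_setevent @TraceID, 12, 1,  @On;
EXEC sp_trace_setevent @TraceID, 12, 12, @On;
EXEC sp_trace_setevent @TraceID, 12, 13, @On;
EXEC sp_trace_setevent @TraceID, 12, 14, @On;

-- Filter on Duration (column 13): logical operator 0 = AND, comparison operator 4 = >=.
EXEC sp_trace_setfilter @TraceID, 13, 0, 4, @MinDurationUs;

-- Status 1 starts the trace.
EXEC sp_trace_setstatus @TraceID, 1;

SELECT @TraceID AS trace_id;   -- keep this value to stop the trace later
```

To finish, call sp_trace_setstatus with status 0 to stop the trace and then with status 2 to close it and remove its definition. SQL Trace has been deprecated in favor of Extended Events since SQL Server 2012, so new monitoring is usually better built on XEvents.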

Demo

• Correlating PerfMon and Profiler Information

Demo of SQL Profiler

• 1: Invoke SQL Profiler

• 2: Choose a template, Standard is usually ok.

• 3: Choose your Events from the Events Selection tab.

• 4: Click RUN to begin the trace.

• 5: Click STOP to end the trace and write it to a file.

Correlating PerfMon and Trace Data

1. After collecting a PerfMon log and a Profiler trace file, load the Profiler file using File > Open > Trace File.

2. Choose File > Import Performance Data to load the PerfMon log file.

3. Choose the PerfMon counters to show on your analysis screen.

4. You’ll then have an overlay of PerfMon & Profiler data (shown in the slide screenshot).

5. You can click anywhere on the timeline to see what was happening at that point in time.

Dynamic Management Views (DMV)

• Tell exactly what’s happening at present inside of SQL Server

• Multitude of DMVs, which can tell things like:

– What are the top 10 most CPU-intensive queries?

– What are the 5 biggest objects in cache?

– Which objects get the most IO?

– Which users consume the most resources?

• DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);
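
As an illustration of the first question above, the following sketch lists the ten cached statements that have used the most CPU, based on sys.dm_exec_query_stats. It only covers plans still in the plan cache, and the column aliases are illustrative.

```sql
SELECT TOP (10)
       qs.total_worker_time / 1000                      AS total_cpu_ms,   -- worker time is in microseconds
       qs.execution_count,
       qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
       -- Extract the individual statement from the batch text (offsets are in bytes).
       SUBSTRING(st.text,
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```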

Essential DMVs

Performance & Wait Stats

• sys.dm_os_wait_stats

• sys.dm_os_performance_counters

• sys.dm_os_waiting_tasks

I/O

• sys.dm_io_virtual_file_stats

• sys.dm_io_pending_io_requests

Transactions

• sys.dm_tran_locks

• sys.dm_db_index_operational_stats

• sys.dm_db_index_usage_stats

SPID Activity & SQL Statements

• sys.dm_exec_sessions

• sys.dm_exec_requests

• sys.dm_exec_query_stats

• sys.dm_exec_{procedure | trigger}_stats

Why code it yourself? Get Glenn Berry’s latest scripts at http://sqlserverperformance.wordpress.com/category/diagnostic-queries/
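
To give a feel for the I/O group above, here is a small sketch that uses sys.dm_io_virtual_file_stats to show average read and write stalls per database file. It is not taken from the diagnostic scripts referenced above, and the column aliases are illustrative.

```sql
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       -- NULLIF avoids divide-by-zero for files that have not been read or written yet.
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs   -- NULL, NULL = all databases, all files
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall DESC;
```

The figures are cumulative since the instance started, so they describe long-run averages rather than the current moment.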

1. Inside or Outside of MSSQL?

Check Windows Server logs. Resolve any errors and recheck.

2. Caused by an MSSQL or SQLAgent error?

Check SQL Server and SQL Agent logs. Resolve any errors and recheck.

3. Excessive wait stats?

Assess wait statistics to categorize the bottleneck using sys.dm_os_wait_stats.

4. Wait stats correlate to specific sessions or components?

Assess session wait stats using sys.dm_os_waiting_tasks. Resolve problematic user activity or process.

Assess other problem areas using other DMVs, like sys.dm_os_performance_counters. Resolve system misconfiguration, design problem, or resource shortage.

SQL Server Management Studio

• Once the root problem is revealed, you still have to fix it.

• Common resolutions using SSMS include:

– Debug a SQL Server procedure or function

– Tune one or more SQL statements

– Add or alter indexes

• Tuning SQL code can be difficult unless you know how to read an execution plan:

– SQL Sentry Plan Explorer is FREE!

Execution Plans

• Execution (explain) plans tell you all the internal steps that SQL Server takes to complete the action

• Read graphic plans from right to left. (Textual ones from bottom to top)

• Graphic plans use icons to represent actions, while arrows represent data flows
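
One simple way to get a plan to read is SET STATISTICS XML, which returns the actual execution plan alongside the query results. The query below is only a stand-in against system catalog views so that it runs in any database; substitute the statement being tuned.

```sql
SET STATISTICS XML ON;    -- return the actual execution plan as an extra XML result set

SELECT o.name,
       COUNT(*) AS column_count
FROM sys.objects AS o
JOIN sys.columns AS c
  ON c.object_id = o.object_id
GROUP BY o.name;

SET STATISTICS XML OFF;
```

In SSMS, clicking the returned XML opens the graphical plan, which is then read right to left as described above.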

Demo

• Reading Basic Execution Plans

Demo of SSMS Graphic Execution Plans

Fixing Bad Code In SSMS

• Fixing bad code is an exercise in experimentation

– Lots of tips & tricks to try

– Check out our tuning content at http://sqlsentry.tv

• Figure out the work done in the code, then try a new version that might improve it:

– Different search arguments in the WHERE or JOIN clauses to make better use of indexes

– Use an alternative pattern

– Apply a different locking strategy

– Use a query hint

• Clear your caches, then rewrite, test, repeat…
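
As one example of the "different search arguments" idea, the sketch below rewrites a predicate so that an index on the filtered column can be used for a seek. The table dbo.Orders and its columns are hypothetical and appear only for illustration. The cache-clearing commands at the end are real, but belong on a test server only.

```sql
-- Hypothetical table and column names, for illustration only.

-- Before: wrapping the column in a function usually prevents an index seek on OrderDate.
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2018;

-- After: an equivalent range predicate on the bare column can seek on an OrderDate index.
SELECT OrderId, CustomerId
FROM dbo.Orders
WHERE OrderDate >= '20180101'
  AND OrderDate <  '20190101';

-- TEST SERVERS ONLY: start each comparison from a cold cache.
CHECKPOINT;               -- flush dirty pages so the buffers can be dropped
DBCC DROPCLEANBUFFERS;    -- empty the buffer pool
DBCC FREEPROCCACHE;       -- empty the plan cache
```

Comparing both versions with the same cold-cache starting point keeps the test fair. On a production server these commands force every query to recompile and reread data from disk.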

Summary

TOOLS FOR FINDING PERFORMANCE PROBLEMS

• Adam Machanic’s sp_whoisactive

• Brent Ozar’s sp_askBrent, sp_blitz

• Extended events – Jonathan Kehayias

• DMVs – Glenn Berry’s diagnostic queries

• System info: dm_os_performance_counters, dm_os_wait_stats

• Query info: dm_exec_requests, dm_exec_query_stats

• Plans: dm_exec_query_plan, dm_exec_plan_attributes

• Cache/buffer pool: dm_exec_cached_plans, dm_os_buffer_descriptors

• Index info: dm_db_index_usage_stats, dm_io_virtual_file_stats

THANK YOU!

• Performance tuning blog at http://SQLPerformance.com.

• Videos at http://SQLSentry.TV

• E-mail [email protected] for free copies of our e-books:

o Just tell them where you met me