analyze database system using a 3 d method

ANALYZE DATABASE SYSTEM USING 3D METHOD
Ajith Narayanan, 19th Feb 2015, Mumbai, India

Upload: ajith-narayanan

Post on 13-Apr-2017


TRANSCRIPT



Who Am I?
Ajith Narayanan
• Oracle ACE Associate
• 11 years of Oracle [APPS] DBA experience
• Blogger: http://oracledbascriptsfromajith.blogspot.com
• Speaker: conferences of AIOUG, DOAG, NZOUG, UKOUG, OTNYathra, OTN APAC Tour, etc.
• Website Chair (2010-2013): ORACLERACSIG (http://www.oracleracsig.org)
• Member: OAUG, AIOUG, ORACLERACSIG
• AIOUG Real Application Clusters SIG Leader


Agenda
• What Is Utilization?
• Where Do OS Tools Collect Performance Data?
• How Does Oracle Database Collect the Server Metrics?
• Simulate SAR – Using /proc FS
• Calculating OS CPU Utilization: OraPub Core Method
• Calculating OS CPU Utilization: OraPub Busy:Idle Method
• Find The True Units of Power In DB Server


Agenda
• Performance Analysis
• 3D Method
• 3D Method - Example-1
• 3D Method - Example-2
• Benefits of 3D Method
• Time Based Analysis
• TBA - Oracle Process - CPU Time
• TBA - Oracle Process - Idle Time
• How Do We Classify Idle & Non-Idle Wait Events?
• Have a Diagnostic Framework Handy – For Quick Troubleshooting
• Q&A


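What Is Utilization?

$$U = \frac{R}{C} = \frac{\text{Requirement}}{\text{Capacity}} = \frac{500\ \text{mL}}{700\ \text{mL}} \approx 0.71 = 71\%$$

$$\text{Busy\%} + \text{Idle\%} = 100$$

$$\text{User\%} + \text{System\%} + \text{wio\%} + \text{Idle\%} = 100$$

$$\text{User\%} + \text{Nice\%} + \text{System\%} + \text{Steal\%} + \text{wio\%} + \text{Idle\%} = 100$$

• NICE TIME: time spent running niced processes. A niced process has a higher nice value, i.e. a lower scheduling priority, so it is "nicer" to other processes.
• STEAL TIME: time a virtual CPU was ready to run but the hypervisor was serving another virtual machine, i.e. time "stolen" by other VMs.
• IDLE TIME: time the CPUs are idle. wio (I/O wait) is idle time during which the CPU is waiting for outstanding I/O.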
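As a minimal sketch with made-up tick counts, the utilization formula and the rule that the percentages add up to 100 can be checked in a few lines of Python. Counting wio as idle time follows the definitions above; the function names are hypothetical.

```python
# Minimal sketch: U = R / C, plus a check that a CPU time breakdown sums to 100%.
# The tick counts in the example are made up for illustration only.

def utilization(requirement: float, capacity: float) -> float:
    """Return utilization as a fraction: U = R / C."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return requirement / capacity


def breakdown_pct(ticks: dict[str, int]) -> dict[str, float]:
    """Convert CPU time counters (any common unit) into percentages of the total."""
    total = sum(ticks.values())
    return {name: 100.0 * value / total for name, value in ticks.items()}


if __name__ == "__main__":
    print(f"glass: {utilization(500, 700):.0%}")  # 500 mL needed out of 700 mL
    pct = breakdown_pct({"user": 300, "nice": 10, "system": 90,
                         "steal": 0, "iowait": 50, "idle": 550})
    busy = pct["user"] + pct["nice"] + pct["system"] + pct["steal"]
    idle = pct["iowait"] + pct["idle"]  # wio is counted as idle time here
    print(f"busy {busy:.1f}%  idle {idle:.1f}%  total {busy + idle:.1f}%")
```

With these sample counts the script reports the glass as 71% full and a 40% busy / 60% idle split.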


Where Do OS Tools Collect Performance Data?
Common OS tools (note: sar is not installed by default; the sysstat RPM must be installed first):
• sar -u 3 9
• vmstat 3 9
• top
Where do these tools get the raw data from? Trace one of them:

strace vmstat 2 3

• For I/O: iostat
• For memory: sar
• For network: netstat

For details, Google "linux proc filesystem documentation".
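A minimal sketch, assuming a Linux host, of reading the same raw data these tools sample: the aggregate cpu line of /proc/stat. The field order follows the proc(5) man page; the sample line and the function names are illustrative only.

```python
# Minimal sketch: parse the aggregate "cpu" line of /proc/stat, the kernel
# counters that tools such as vmstat and sar sample.
# Field order (proc(5)): user nice system idle iowait irq softirq steal ...
from pathlib import Path

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    """Return the first eight counters (in clock ticks) of a 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer columns; treat missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def read_proc_stat(path: str = "/proc/stat") -> dict[str, int]:
    """Read and parse the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    sample = "cpu  4705 356 584 3699176 23527 0 12 0 0 0"  # illustrative values
    print(parse_cpu_line(sample))
    if Path("/proc/stat").exists():
        print(read_proc_stat())
```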


How Does Oracle Database Collect the Server Metrics?
• strace is a debugging utility for Linux and some other Unix-like systems. It monitors the system calls used by a program and the signals it receives, similar to the truss utility on other Unix systems. This is made possible by a kernel feature known as ptrace.
• MMNL: the Memory Monitor Light (MMNL) background process, new in Oracle 10g and later, works with the Automatic Workload Repository (AWR) to write out full statistics buffers to disk as needed.

[ajithpathiyil1:oracle]> ps -ef | grep mnl
oracle  3334     1  0   Mar 22 ?       188:00 ora_mmnl_ajithrac1
oracle 21355 16615  0 08:26:34 pts/15    0:00 grep mnl
[ajithpathiyil1:oracle]>

[ajithpathiyil1:oracle]> strace -p 3334 2>&1 | grep '/proc/stat'

• /proc/stat was opened as file descriptor 15, and the reopen cycled every 15 seconds.
[ajithpathiyil1:oracle]> strace -p 3334 2>&1 | grep 'read(15, "cpu'

• The read on file descriptor 15 occurred every 15 seconds.


Simulate SAR – Using /proc FS

#!/bin/ksh
# cpuinfo.sh: simulate "sar -u" from the aggregate "cpu" line of /proc/stat
interval=${1:?"usage: cpuinfo.sh <interval>"}
echo "user  nice  system  idle  wio"
while true
do
  # /proc/stat fields: $1=cpu $2=user $3=nice $4=system $5=idle $6=iowait (clock ticks)
  set -- `head -1 /proc/stat`
  usr0=$2; nic0=$3; sys0=$4; idl0=$5; wio0=$6
  sleep $interval
  set -- `head -1 /proc/stat`
  # Tick deltas over the interval, printed as percentages of their total
  echo $(( $2 - usr0 )) $(( $3 - nic0 )) $(( $4 - sys0 )) $(( $5 - idl0 )) $(( $6 - wio0 )) |
    awk '{ t = $1 + $2 + $3 + $4 + $5
           printf "%.2f  %.2f  %.2f  %.2f  %.2f\n", 100*$1/t, 100*$2/t, 100*$3/t, 100*$4/t, 100*$5/t }'
done

Sample shell script: cpuinfo.sh
Usage: $> ./cpuinfo.sh <interval>
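For readers who prefer Python, a rough equivalent of cpuinfo.sh might look like the sketch below. It is not a replacement for sar: like the script, it uses only the user, nice, system, idle and iowait counters and ignores irq, softirq and steal. The function names and the sample snapshots are hypothetical.

```python
# Minimal sketch of the cpuinfo.sh idea: take two snapshots of the aggregate
# "cpu" line of /proc/stat and turn the counter deltas into percentages.
import time

NAMES = ("user", "nice", "system", "idle", "wio")


def cpu_counters(line: str) -> list[int]:
    """First five counters of a 'cpu' line: user nice system idle iowait."""
    return [int(v) for v in line.split()[1:6]]


def cpu_pct(line_t0: str, line_t1: str) -> dict[str, float]:
    """Percentage of the sampled ticks spent in each state between two snapshots."""
    deltas = [b - a for a, b in zip(cpu_counters(line_t0), cpu_counters(line_t1))]
    total = sum(deltas)
    if total == 0:
        raise ValueError("no ticks elapsed between the two samples")
    return {name: round(100.0 * d / total, 2) for name, d in zip(NAMES, deltas)}


def sample(interval: float, path: str = "/proc/stat") -> dict[str, float]:
    """Linux only: read /proc/stat, sleep for the interval, read it again."""
    with open(path) as f:
        t0 = f.readline()
    time.sleep(interval)
    with open(path) as f:
        t1 = f.readline()
    return cpu_pct(t0, t1)


if __name__ == "__main__":
    # Two illustrative snapshots 1000 ticks apart (made-up numbers).
    print(cpu_pct("cpu 1000 0 200 5000 100", "cpu 1300 0 300 5550 150"))
```

On a Linux box, `sample(3)` plays the role of one line of `./cpuinfo.sh 3` output.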


Calculating OS CPU Utilization: OraPub Core Method
• OS CPU utilization using the core method, computed entirely from v$ views, especially v$osstat.
U = R/C
1 CPU core, 1 minute = 60 secs
1 CPU core, 2 minutes = 120 secs
2 CPU cores, 1 minute = 120 secs
2 CPU cores, 2 minutes = 240 secs
• CAPACITY
Elapsed time (duration) × CPU cores = capacity
12 cores × 3600 secs = 43200 secs (of CPU)
24 cores × 60 min × 60 secs/min = 86400 secs (of CPU)
From the AWR report: NUM_CPUS and ELAPSED (60 mins, or 3600 secs)
Related CPU count statistics: num_cpus, cpu_cores, cpu_sockets, vcpus, lcpus, num_cores, num_threads
• REQUIREMENTS
AWR OS Statistics: BUSY_TIME (in centiseconds); divide by 100 to get seconds.


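Calculating OS CPU Utilization: OraPub Core Method

$$U = \frac{\text{Requirements}}{\text{Capacity}} = \frac{\text{time used}}{\text{time available}}$$

$$\text{time used} = \frac{\text{busy\_time}}{100} = \frac{1913617\ \text{cs}}{100} \approx 19136\ \text{s}$$

$$\text{time available} = \text{duration} \times \text{num\_cpus} = 60\ \text{min} \times 60\ \text{s/min} \times 24 = 86400\ \text{s}$$

$$U = \frac{19136\ \text{s}}{86400\ \text{s}} \approx 0.22 = 22\%$$

busy_time and num_cpus come from v$osstat; U is the average CPU utilization over those 60 minutes of AWR data.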
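The core-method arithmetic can be sketched in a few lines of Python. The inputs are the slide's example values; the function names are hypothetical, and busy_time is assumed to be the BUSY_TIME delta between two AWR snapshots.

```python
# Minimal sketch of the OraPub core method with the numbers from the slide.
# busy_time_cs: v$osstat BUSY_TIME delta in centiseconds (assumed AWR delta)
# elapsed_s:    AWR snapshot interval in seconds
# num_cpus:     v$osstat NUM_CPUS

def capacity_cpu_seconds(elapsed_s: float, num_cpus: int) -> float:
    """Capacity = elapsed time x CPU cores (CPU seconds available)."""
    return elapsed_s * num_cpus


def core_method_utilization(busy_time_cs: float, elapsed_s: float, num_cpus: int) -> float:
    """U = (busy_time / 100) / (elapsed x num_cpus)."""
    time_used_s = busy_time_cs / 100.0  # centiseconds -> seconds
    return time_used_s / capacity_cpu_seconds(elapsed_s, num_cpus)


if __name__ == "__main__":
    print(capacity_cpu_seconds(3600, 12))  # 43200 CPU seconds
    print(capacity_cpu_seconds(3600, 24))  # 86400 CPU seconds
    u = core_method_utilization(1913617, 3600, 24)
    print(f"U = {u:.2f} ({u:.0%})")
```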


Calculating OS CPU Utilization: OraPub Busy:Idle Method

$$U = \frac{\text{Requirements}}{\text{Capacity}} = \frac{\text{time used}}{\text{time available}}$$

$$\text{time used} = \text{busy\_time} = 1913617\ \text{cs}$$

$$\text{time available} = \text{busy\_time} + \text{idle\_time} = 1913617\ \text{cs} + 7159367\ \text{cs} = 9072984\ \text{cs}$$

$$U = \frac{1913617\ \text{cs}}{9072984\ \text{cs}} \approx 0.21 = 21\%$$

Source: delta values from v$osstat, found in all AWR reports.

This is interesting. Just watch this:

$$\frac{1913617}{1913617 + 7159367} \approx \frac{1913}{1913 + 7159} \approx \frac{19}{19 + 72} \approx 0.21$$
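A minimal sketch of the Busy:Idle calculation, using the same v$osstat deltas as the slide. Because capacity is busy_time + idle_time, neither num_cpus nor the elapsed time is needed, and rounding both inputs barely moves the result. The function name is hypothetical.

```python
# Minimal sketch of the OraPub Busy:Idle method: capacity is taken as
# busy_time + idle_time (both v$osstat deltas, in the same unit), so no CPU
# count or elapsed time is needed.

def busy_idle_utilization(busy_time: float, idle_time: float) -> float:
    """U = busy_time / (busy_time + idle_time)."""
    total = busy_time + idle_time
    if total <= 0:
        raise ValueError("busy_time + idle_time must be positive")
    return busy_time / total


if __name__ == "__main__":
    exact = busy_idle_utilization(1913617, 7159367)
    rounded = busy_idle_utilization(19, 72)  # both values in units of 100,000 cs
    print(f"exact {exact:.2f}, rounded {rounded:.2f}")
```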


Find The True Units of Power In DB Server

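True units of power in the DB server:

$$U = \frac{R}{C} = \frac{\text{busy\_time}}{\text{busy\_time} + \text{idle\_time}} \qquad \text{(OraPub Busy:Idle method)}$$

$$U = \frac{\text{busy\_time}}{\text{duration} \times \text{units\_of\_power}} \qquad \text{(OraPub Core method)}$$

Setting the two expressions equal gives:

$$\text{units\_of\_power} = \frac{\text{busy\_time} + \text{idle\_time}}{\text{duration}}$$

with busy_time, idle_time and duration in the same time unit (v$osstat reports busy_time and idle_time in centiseconds).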
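The derived formula can be checked numerically. This sketch assumes the same AWR interval as the core-method example (60 minutes, busy_time 1913617 cs) together with the idle_time from the Busy:Idle example (7159367 cs); the function name is hypothetical.

```python
# Minimal sketch: units_of_power = (busy_time + idle_time) / duration.
# busy_time and idle_time are v$osstat deltas in centiseconds (converted to
# seconds here); duration is the AWR snapshot interval in seconds.

def units_of_power(busy_time_cs: float, idle_time_cs: float, duration_s: float) -> float:
    """Effective number of CPUs that supplied busy + idle time over the interval."""
    return (busy_time_cs + idle_time_cs) / 100.0 / duration_s


if __name__ == "__main__":
    # Example values: busy 1913617 cs, idle 7159367 cs, 60-minute AWR interval.
    print(f"{units_of_power(1913617, 7159367, 3600):.1f}")
```

With these inputs the result is about 25.2 units of power, slightly more than the 24 reported by num_cpus. That difference in the denominator is exactly why the core method gives 22% while the Busy:Idle method gives 21% for the same busy_time.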


Performance Analysis
- A very broad topic.
- Oracle time-based analysis is only one part of this large endeavor of troubleshooting performance issues.
- Where does it fit in the larger performance analysis?

The big picture has lots of confusing things; that leads to guessing and, at the end of the day, paralyzes our performance tuning effort. Time-based analysis is very different.

I analyze the performance of systems from 3 different perspectives, which gives me a complete understanding of the system landscape:

OS -> Oracle -> Application (3 overlapping circles)

A very powerful method, and simple to understand.


3D Method
Circle 1 - OS: Start by looking at the OS. We expect there to be a bottleneck, but there is not always one.

Within the OS, I look at the CPU, I/O, network and memory subsystems.

Circle 2 - Oracle: It is all about work and time. Look for big chunks of time that can be reduced; this is where time-based analysis fits in.

Circle 3 - Application: It is all about the SQL statements and the server processes that actually do the work. In complex situations where I do not know the SQL statements, Circles 1 and 2 can point me to the right SQL statements to look at.


3D Method - Example-1
OS: massive CPU bottleneck.
Oracle (do TBA): 99% of database time is CPU consumption; the remaining 1% is wait time on memory serialization control (latching).
Application: look for high-CPU SQL.

Now, solutions with 3D:

OS: Can we get more CPU power for the CPU bottleneck? Alternatively, remove high-impact processes during peak time.
Oracle (do TBA): add more cache buffers chains latches.
Application: look for high-CPU SQL and tune it to reduce executions (and hard parsing).


3D Method - Example-2
OS: massive I/O read bottleneck.
Oracle (do TBA): 30% CPU, 70% waiting for single-block synchronous reads.
Application: look for SQL with high physical reads.

Now, solutions with 3D:

OS: Ask for more I/O read capacity, perhaps more devices.
Oracle (do TBA): Because processes spend 70% of their time waiting to burn CPU, keep those blocks in the buffer cache: increase the buffer cache, or use the KEEP pool to cache the key tables. This helps only if the blocks are touched repeatedly.
Application: Find the top read SQL statements, and tune them or reduce their execution rate.


Benefits of 3D Method
Now that you have understood the collaboration between these 3 circles:

• Each circle not only helps identify the issues but also suggests solutions.

• Each circle can be analyzed in parallel by a different group of people, making troubleshooting quicker.

• It unites the teams to work in collaboration and eliminates finger-pointing.


Time Based Analysis
• TBA is a quantitative analysis technique
• Based on database time
• Exposes opportunities
• Provides direction
• Helps in understanding what users experience
• Advanced analysis setup

• If users' complaints are about time, our analysis should also be about time, and we try to create a link between the two. This is TBA.

• At its core, Oracle TBA categorizes time into 2 types: CPU time and non-idle wait time.


Time Based Analysis
• CPU time: an Oracle process wants to burn CPU, and Oracle keeps track of this consumption. It could be 3 s or 3 ms, but it is CPU consumption time.

• Non-idle wait time: an Oracle process cannot burn CPU for some reason and pauses. This pause is called wait time, and Oracle keeps track of the time a process spends waiting to burn CPU.

E.g. an Oracle process is consuming CPU, discovers that the block it needs is not in the cache, and makes an I/O subsystem call to fetch it.

The wait event tells us what the Oracle process was waiting for while it was not consuming CPU.

There are around 1500 wait events. Don't worry: in real life you may need to care about only a dozen of them.


TBA - Oracle Process - CPU Time
• How much CPU is consumed?
• Is CPU time reliable?
• How do we get that information?
• How do we use this information?

• Oracle processes keep track of the CPU they consume themselves, a kind of self-monitoring.

• For a specific Oracle session: v$sess_time_model
• For all Oracle processes (system-wide): v$sys_time_model

• Both views contain similar columns.


TBA - Oracle Process - Idle Time
• How do we classify idle and non-idle wait events?
• DBWR waiting on its 3-second timer before writing to disk is an idle wait event: as a user, this event is not significant.
• But when a server process is waiting for the I/O subsystem to get a block from disk, it is a non-idle wait event and directly impacts the user experience.

• Many wait events are classified as idle wait events. (Such wait events are not considered in TBA.)

• In practice, we may have to worry about 2, or at most 4, wait events. Most of the other wait events tend to cluster under these main events and help explain them.

• Database Time, or DB Time
• DB Time = CPU consumption + non-idle wait time
• It can be applied to a specific SQL statement or a group of SQL statements, a session, or an entire cluster.
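A minimal sketch of DB Time = CPU time + non-idle wait time. The waits are hypothetical and chosen to reproduce the 30% CPU / 70% single-block read split of Example-2; treating every wait class other than 'Idle' as non-idle is an assumption modeled on the WAIT_CLASS column that Oracle exposes in views such as v$event_name.

```python
# Minimal sketch of DB Time = CPU time + non-idle wait time.
# The wait samples are hypothetical; the idle/non-idle split assumes that
# only waits whose wait class is 'Idle' are excluded, as TBA does.
from dataclasses import dataclass


@dataclass
class Wait:
    event: str
    wait_class: str
    seconds: float


def db_time(cpu_seconds: float, waits: list[Wait]) -> float:
    """CPU time plus the time of every wait whose class is not 'Idle'."""
    non_idle = sum(w.seconds for w in waits if w.wait_class != "Idle")
    return cpu_seconds + non_idle


def breakdown(cpu_seconds: float, waits: list[Wait]) -> dict[str, float]:
    """Fraction of DB time for CPU and for each non-idle wait event."""
    total = db_time(cpu_seconds, waits)
    shares = {"CPU": cpu_seconds / total}
    for w in waits:
        if w.wait_class != "Idle":
            shares[w.event] = shares.get(w.event, 0.0) + w.seconds / total
    return shares


if __name__ == "__main__":
    waits = [
        Wait("db file sequential read", "User I/O", 700.0),
        Wait("SQL*Net message from client", "Idle", 5000.0),  # ignored by TBA
    ]
    print(db_time(300.0, waits))    # 1000.0
    print(breakdown(300.0, waits))  # {'CPU': 0.3, 'db file sequential read': 0.7}
```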


How Do We Classify Idle & Non-Idle Wait Events?
• Where does this raw data come from?

• Use strace or truss on the OS process ID.

• Look for the specific system call getrusage to find how much CPU time is consumed. This is how an Oracle process gathers the CPU time it has consumed itself.

• Summary:
• Oracle time-based analysis starts with CPU and non-idle wait times.
• Oracle processes get their CPU usage directly from OS system calls.
• We see CPU consumption via the time model views.
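To illustrate the self-monitoring idea outside Oracle, any process can call getrusage on itself. This Python sketch (Unix only) uses the standard resource module, which wraps getrusage(2); it is an analogy for what the slide describes, not Oracle's own code, and the helper names are hypothetical.

```python
# Minimal sketch (Unix only): a process asks the kernel for its own CPU usage
# via getrusage(2), here through Python's resource module, and takes the
# difference around a unit of work.
import resource


def cpu_seconds() -> float:
    """User + system CPU time consumed so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime


def burn_cpu(n: int) -> int:
    """Some CPU-bound work to measure."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    before = cpu_seconds()
    burn_cpu(2_000_000)
    used = cpu_seconds() - before
    print(f"CPU consumed: {used:.3f} s")
```

Running strace on this script would show the same getrusage calls the slide suggests looking for on an Oracle process.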


Have a Diagnostic Framework Handy – For Quick Troubleshooting
• The AWR report is awesome.
• The AWR report gives a hell of a lot of information.
• How do I interpret the AWR report?

• Let's go for a DEMO.


Q&A