Danny Kellett Java System Solutions Investigating performance outside of workflow and indexes

Page 1

Danny Kellett, Java System Solutions

Investigating performance outside of workflow and indexes

Page 2

© 2013 WWRUG Canada Inc. All Rights Reserved

• Husband and father
• Worked for Remedy / Peregrine / BMC from 1999 to 2007
• BSM ITSM Solution Architect / Consultant
• Single Sign-On architect for Java System Solutions

Page 3

Warning: there is a lot of content in this presentation!

This is intentional. Not all of this information will mean something to you yet, but if and when you decide to look at your own system, this presentation will hopefully fill in the knowledge gaps.

Feel free to email me at [email protected]

Page 4

Agenda

Latency
- What is it and why is it such a big deal?
- How do I test for it?
- How to use this information to demonstrate performance for your users

Queues & Threads
- What are they?
- How to read the logs
- How to let your AR Server tell you how many it needs

Plugins
- What types exist?
- How they can impact performance
- The process of diagnosis and fix

Page 5

Objectives and Results

Objectives
- To understand that no matter how many CPUs and how much RAM you have, a poor network can bring an application to its knees
- Demystify the confusion about queues and threads
- Understand the black boxes called plugins

Results
- Understand what the above means, and have the tools and knowledge to understand your own AR System

Skills developed
- To sit through a very technical and possibly boring tech talk and live to talk about it
- The knowledge to understand the “not so well documented” parts of the AR System

Page 6

Latency

“Higher latency decreases app response time, user performance and perceived app quality”

Page 7

Latency :: What Is It And Why Is It Such A Big Deal?

What is it?
- In a network, latency, a synonym for delay, is an expression of how much time it takes for a packet of data to get from one designated point to another. There are two typical types:
  - One way: the time from the source sending a packet to the destination receiving it
  - Round trip: the one-way latency from source to destination plus the one-way latency from the destination back to the source

Latency is not bandwidth
- The two key elements of network performance are bandwidth and latency. The average person is more familiar with the concept of bandwidth, as that is the one advertised by manufacturers of network equipment. However, latency matters equally to the end user experience.

Page 8

Latency :: What Is It And Why Is It Such A Big Deal?

Why is it such a big deal to us and BMC Software?
- The BMC AR System architecture has multiple network node points
  - Each line in the diagram is affected by latency
  - If each line added even a few milliseconds, it all adds up!
  - If any point adds a delay, then the whole “trip” is affected

The AR System API (ARAPI) is very “chatty”
- Instead of eating with a big spoon, it eats with a small spoon

Page 9

Latency :: What Is It And Why Is It Such A Big Deal?

Real life example
- ITSM 7.6.04 SP2, load balanced environment as shown in the diagram. Browser was Firefox v21.0. URL is HTTPS

Test
- Incident console, double clicking an incident

Baseline (initial, first load example)
- Took a measurement on the current network to get the number of trips, amount of data and response time in seconds

Second test with latency
- With added latency of 50ms (0.05s) from the client (browser on desktop) through the Load Balancer and to one of the Mid Tier servers

Page 10

Latency :: What Is It And Why Is It Such A Big Deal?

Results of baseline (initial, first load example)

                    # Trips   Amount of data   Time
Client to server    65        126.6k
Server to client    103       318.6k
Total               168       445.2k           9s

Results of test with 50ms added latency

                    # Trips   Amount of data   Time
Total               168       445.2k           12s

Test summary: just 50ms of network latency in just one piece of the BMC architecture, from the browser to the Mid Tier, can add ⅓ to your end user response times (9s to 12s in this test)!
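The arithmetic behind the test summary can be checked directly. A minimal sketch using only the figures above: 3 extra seconds on a 9 second baseline is the ⅓, and 3s divided by 50ms means the extra time is equivalent to roughly 60 back-to-back 50ms delays, far fewer than the 168 trips. Reading that gap as "the browser overlaps many of its trips" is an interpretation, not something this test measured.

```python
# Figures taken from the baseline and 50ms tests above.
BASELINE_S = 9.0        # total response time with no added latency
WITH_LATENCY_S = 12.0   # total response time with 50ms added
ADDED_DELAY_S = 0.050   # the delay introduced by the proxy
TOTAL_TRIPS = 168


def relative_increase(baseline_s: float, slowed_s: float) -> float:
    """Fraction by which response time grew (0.333... means one third)."""
    return (slowed_s - baseline_s) / baseline_s


def delay_equivalents(baseline_s: float, slowed_s: float, added_delay_s: float) -> float:
    """How many back-to-back delays of added_delay_s would explain the extra time.

    Rough model: treats the extra time as whole, non-overlapping delays.
    """
    return (slowed_s - baseline_s) / added_delay_s


increase = relative_increase(BASELINE_S, WITH_LATENCY_S)
equivalents = delay_equivalents(BASELINE_S, WITH_LATENCY_S, ADDED_DELAY_S)
print(f"Response time grew by {increase:.0%}")  # 33%
print(f"Extra time is about {equivalents:.0f} back-to-back 50ms delays "
      f"across {TOTAL_TRIPS} trips")
```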

Page 11

Latency :: How Did I Do Those Tests?

Before you start, understand these things
- What happens when you type the Mid Tier address in the URL bar:
  - Your browser will use your desktop network configuration to get the network details of your Mid Tier.
  - First it will look in your local hosts file for the Mid Tier host name. If it is not in there, it will ask the Domain Name System (DNS). If your Mid Tier's IP address is configured in your DNS, then the browser will connect to it and everything works: you see the application, etc.
  - BUT if you added a line to your local hosts file (c:\windows\system32\drivers\etc\hosts) so that your desktop believes the host name points not to the real IP address but to a different one, e.g. 127.0.0.1, then the browser will try to connect to that instead.
  - 127.0.0.1 is the loopback address, which basically means the machine you are typing on. Unless you have a Mid Tier running on your machine, the connection will fail.

Page 12

Latency :: How Did I Do Those Tests?

Before you start, understand these things
- What happens when you type the Mid Tier address in the URL bar (part 2)
  - What if you had a Mid Tier on your desktop and you added the same Mid Tier host name to your local hosts file with 127.0.0.1? Your browser would still display the correct URL address, but you would be connecting to the Mid Tier on your desktop and not the one on the network.
- OK, so why do I need to know that?
  - If you installed a piece of software on your desktop that wasn't a Mid Tier, but something that connected to your REAL Mid Tier on the network, BUT delayed all connections, adding latency... then this is called a proxy, and this is what I used. Confused? See the next slide for a diagram.

Page 13

Latency :: How Did I Do Those Tests?

Add a proxy on the desktop to simulate latency

Page 14

Latency :: How Did I Do Those Tests?

- Find your Mid Tier's real IP address using ping or nslookup
- Insert that IP value into the proxy app as the MAP IP
- Add the Mid Tier URL host name to the loopback address in your local hosts file, e.g.
  127.0.0.1 try.onbmc.com
- Click Start on the proxy. Use your browser as before and record the timings.
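The JSS Network Simulator does this for you. To make the idea concrete, here is a minimal sketch of a latency-injecting TCP proxy using Python's asyncio, written for illustration only and not the JSS tool's implementation. Assumptions: it listens on 127.0.0.1 (where the hosts-file entry now points the Mid Tier host name), forwards raw bytes to the real Mid Tier IP and port you give it, and holds each chunk for a fixed delay in each direction. Because it only relays bytes, HTTPS passes through untouched. The names `delayed_pipe`, `handle_client` and `main` are made up for this example.

```python
import asyncio
import functools


async def delayed_pipe(reader: asyncio.StreamReader,
                       writer: asyncio.StreamWriter,
                       delay_s: float) -> None:
    """Forward bytes from reader to writer, releasing each chunk delay_s after it arrived."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    async def receive() -> None:
        try:
            while True:
                data = await reader.read(65536)
                await queue.put((loop.time() + delay_s, data))  # timestamp on arrival
                if not data:  # b"" means the peer closed its side
                    return
        except OSError:
            await queue.put((loop.time(), b""))  # treat connection errors as EOF

    receiver = asyncio.create_task(receive())
    try:
        while True:
            due, data = await queue.get()
            wait = due - loop.time()
            if wait > 0:
                await asyncio.sleep(wait)  # the simulated one-way latency
            if not data:
                if writer.can_write_eof():
                    writer.write_eof()  # pass the half-close along
                return
            writer.write(data)
            await writer.drain()
    finally:
        receiver.cancel()


async def handle_client(client_reader, client_writer, *,
                        target_host: str, target_port: int,
                        delay_s: float) -> None:
    """Connect to the real Mid Tier and relay both directions with added delay."""
    try:
        server_reader, server_writer = await asyncio.open_connection(
            target_host, target_port)
    except OSError:
        client_writer.close()
        return
    await asyncio.gather(
        delayed_pipe(client_reader, server_writer, delay_s),
        delayed_pipe(server_reader, client_writer, delay_s),
        return_exceptions=True,  # one side failing should not crash the proxy
    )
    server_writer.close()
    client_writer.close()


async def main(listen_port: int, target_host: str, target_port: int,
               delay_ms: float) -> None:
    """Listen on the loopback address and proxy to the real Mid Tier."""
    handler = functools.partial(handle_client, target_host=target_host,
                                target_port=target_port,
                                delay_s=delay_ms / 1000)
    server = await asyncio.start_server(handler, "127.0.0.1", listen_port)
    async with server:
        await server.serve_forever()
```

To try it, run something like `asyncio.run(main(443, "192.0.2.10", 443, 50))`, replacing 192.0.2.10 (a documentation-only address) with your Mid Tier's real IP. Listening on ports below 1024 may need administrator or root rights. Note that a delay of 50 here is applied in each direction, so each round trip gains about 100ms. Timestamping chunks on arrival and releasing them once their delay has passed, rather than sleeping a full delay before every write, keeps chunks that arrive together from having their delays stacked up.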

Page 15

Page 16

Latency :: How Can This Predict Your Response Times?

Obtain your users latency times to the Mid Tier server.

EXAMPLE

Location   Latency (ms)   Open Incident
London     15
Paris      45
Houston    484

Page 17

Latency :: How Can This Predict Your Response Times?

Obtain your users' latency times to the Mid Tier server.
- Using ping, which uses ICMP but is sometimes turned off on network equipment
- Or http-ping with the free JSS Network Simulator
- Those times are round trip, i.e. the time taken from client to server AND back from server to client. Therefore, when testing, halve those values!
  - E.g. 156 / 2 = 78
  - The one-way latency is 78ms
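When ICMP is blocked, timing a TCP connection is one common stand-in for ping: connect() only returns after the SYN / SYN-ACK exchange, so it costs roughly one round trip. A minimal sketch of that technique (the slides don't describe how the JSS http-ping works, so this is not it); it resolves the host name once so DNS lookup time isn't counted, takes the median of a few samples, and halves the result as described above. The function names are made up.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 5,
                       timeout_s: float = 3.0) -> float:
    """Median TCP connect time to host:port, in milliseconds.

    connect() returns once the SYN / SYN-ACK exchange completes, so each
    sample costs roughly one network round trip plus a little local overhead.
    """
    # Resolve once up front so DNS lookup time is not part of the measurement.
    ip = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((ip, port), timeout=timeout_s):
            pass  # connected: the handshake round trip is done
        timings.append((time.perf_counter() - start) * 1000)
    return statistics.median(timings)


def one_way_ms(round_trip_ms: float) -> float:
    """Ping-style figures are round trip; halve them for one-way latency."""
    return round_trip_ms / 2
```

For example, `one_way_ms(tcp_connect_rtt_ms("try.onbmc.com", 443))` would give a one-way figure for that host, and `one_way_ms(156)` returns 78.0, matching the example. The median keeps a single slow sample from skewing the result.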

Page 18

Latency :: How Can This Predict Your Response Times?

Screenshot of free JSS app https://www.javasystemsolutions.com/download/networksim/jss-networksim.zip

Example data for onbmc.com

Page 19

Latency :: How Can This Predict Your Response Times?

Using the free JSS tools, you can test the response times of all your users from your own desktop and, more importantly, before your users do.

EXAMPLE

Location   Latency (ms)   Open Incident
London     15             6s
Paris      45             8s
Houston    484            30s

Page 20

Latency :: If Your Latency Is High

Speak to your network teams about Quality of Service (QoS)
- Some network equipment can prioritise certain protocols. The Mid Tier uses either HTTP, typically on port 80, or HTTPS, typically on port 443; have these prioritised if possible

Make sure your architecture has as little latency as possible between the Mid Tiers and AR Servers and, more importantly, between the AR Server and the database.
- ITSM 7.6.04 with approx. 900 concurrent users fires approx. 127 SQL statements per second at the database. High latency would bring the app to its knees!
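Little's law (average number in progress = arrival rate × time each one spends in the system) gives a rough feel for why AR Server to database latency matters so much. A minimal sketch using the 127 statements per second from this slide and some illustrative round-trip times (not measurements). It counts only network wait, so real figures would be higher: at a 20ms round trip, about 2.5 statements are waiting on the network on average, each one roughly tying up an AR Server thread and its database connection.

```python
SQL_PER_SECOND = 127  # figure from the slide (ITSM 7.6.04, ~900 concurrent users)


def in_flight(arrival_rate_per_s: float, time_in_system_s: float) -> float:
    """Little's law, L = lambda * W: average number of items in progress."""
    return arrival_rate_per_s * time_in_system_s


def network_wait_floor(db_round_trip_ms: float,
                       rate_per_s: float = SQL_PER_SECOND) -> float:
    """Average number of statements waiting on the network alone.

    Assumes each statement costs one round trip (one that fetches many rows
    may need more) and ignores database execution time: a lower bound.
    """
    return in_flight(rate_per_s, db_round_trip_ms / 1000)


for rtt_ms in (0.5, 5, 20, 50):  # illustrative round trips, not measurements
    print(f"{rtt_ms:>4} ms round trip: ~{network_wait_floor(rtt_ms):.2f} "
          "statements waiting on the network on average")
```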

Install local Mid Tier instances near your end users.
- There is more traffic between the browser and the Mid Tier than there is from the Mid Tier to the AR Server.

Make customisations to workflow to remove trips altogether.
- Tell story at large outsourcer

Page 21

Threads & Queues

Page 22

Threads :: Let's All Get Up To Speed On Queues & Threads

A queue is an entry point into the AR System
- Queues are identified by a number, and are sometimes referred to as RPC queues. Here are some examples:
  - 390600 = Admin
  - 390603 = Escalation
  - 390620 = Fast API calls (just a name, not intended to indicate performance)
  - 390635 = List API calls (also just a name, but aimed at calls that search and return lists or large amounts of data)

One queue has one or more threads
- A queue can have one or more threads defined for it. On start-up, each thread creates a connection to the database that it uses throughout its existence. Threads only close when you shut down the server or when they cannot connect to the database.

Page 23

Threads :: Let's All Get Up To Speed On Queues & Threads

The AR System has a set of queues (some pre-defined, some private and defined per instance), and each of them has a number of processing threads as configured in the ar.cfg/ar.conf.
- If an API call gets routed to a queue and all the current threads are being used, the server will look at the Max Threads value configured for that RPC queue. If the current thread count is lower, it will create another thread and use that.
- If the List queue is at maximum resource, it will put the work on the Fast queue, and vice versa.
- If both are full, it will move the work to the Admin thread.
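The routing rules above can be modelled in a few lines. A minimal sketch of the behaviour as described on this slide, not BMC's dispatcher code: each queue starts with Min Threads, grows one thread at a time up to Max Threads, List and Fast overflow into each other, and Admin takes the work when both are full. What happens when Admin is also busy isn't covered here, so the sketch returns "waiting" as a placeholder. The class and function names are made up.

```python
from dataclasses import dataclass


@dataclass
class RpcQueue:
    """A queue whose thread pool grows on demand up to max_threads."""
    name: str
    min_threads: int
    max_threads: int
    started: int = 0  # threads that currently exist
    busy: int = 0     # threads currently working on a call

    def __post_init__(self) -> None:
        self.started = self.min_threads  # Min Threads are created at start-up

    def try_take(self) -> bool:
        """Assign a thread to a call: reuse an idle one or start a new one."""
        if self.busy < self.started:
            self.busy += 1
            return True
        if self.started < self.max_threads:
            self.started += 1  # server adds a thread, up to Max Threads
            self.busy += 1
            return True
        return False

    def release(self) -> None:
        """Call finished: the thread goes idle but stays alive."""
        self.busy -= 1


# Overflow order as described on the slide: List <-> Fast, then Admin.
OVERFLOW = {"Fast": ["Fast", "List", "Admin"],
            "List": ["List", "Fast", "Admin"]}


def route(target: str, queues: dict) -> str:
    """Return the name of the queue that ends up running the call."""
    for name in OVERFLOW.get(target, [target]):
        if queues[name].try_take():
            return name
    return "waiting"  # not covered by the slide; placeholder assumption


queues = {"Fast": RpcQueue("Fast", 2, 3),
          "List": RpcQueue("List", 2, 3),
          "Admin": RpcQueue("Admin", 1, 1)}
results = [route("List", queues) for _ in range(8)]
print(results)
```

With Min 2 / Max 3 on both Fast and List and one Admin thread, eight List calls land as three on List (the third forcing a new thread), three on Fast, one on Admin and one waiting.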

Page 24

Threads :: Confusing Or Mixed Messages

Search: Google / BMC Support / ARSlist / BMC Communities
- BMC Atrium Core 8.0.00_20120921_docs.pdf
  - Page 87: set Min Threads to 5 and Max Threads to 10
  - Page 262, with the same information repeated on pages 356 and 383:
    - Fast threads: at minimum, the same number as you have CPU cores; at maximum, 3 times the number of CPU cores, but not exceeding 32
    - List threads: at minimum, the same number as you have CPU cores; at maximum, 5 times the number of CPU cores, but not exceeding 32
  - Page 2048: CPU x 1.5 for the Private queue
- SW00427239: Fast and List threads are not set as per the recommended Queue settings
- Doug Mueller ARSList post: in theory, there is no reason you cannot have 10s or even 100s of threads in a queue
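To compare the documented rules with what your own logs show, they can be turned into a small calculator. A sketch applying the Fast, List and Private-queue rules exactly as quoted above; rounding the 1.5 × CPU figure up, and treating it as a single thread count, are assumptions, since the quote doesn't say. Note that past 32 cores the minimum would exceed the capped maximum, one more example of the mixed messages.

```python
import math


def documented_thread_ranges(cpu_cores: int) -> dict:
    """Apply the CPU-based queue rules quoted from the Atrium Core 8.0 docs.

    Returns (min, max) pairs for Fast and List, plus the Private-queue
    figure. Rounding the Private figure up is an assumption.
    """
    return {
        "Fast": (cpu_cores, min(3 * cpu_cores, 32)),
        "List": (cpu_cores, min(5 * cpu_cores, 32)),
        "Private": math.ceil(cpu_cores * 1.5),
    }


for cores in (4, 8, 16):
    print(cores, documented_thread_ranges(cores))
```

For 4 cores this gives Fast 4 to 12, List 4 to 20 and Private 6; treat these as starting points to check against the logs, not as answers.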

Page 25

Threads :: Confusing Or Mixed Messages

- I do not believe the number of threads should be based on the number of CPUs alone. In all my tests, the CPU usage never rose over 55% (excluding Developer Studio work).
- More threads are fine if the infrastructure can handle them. E.g. an MSSQL database allows a maximum of 32767 connections, so theoretically, if the AR Server could fire that many connections and process them, then why not?
- Just for now, think about a database connection that is running a long query and is held up: the CPU on the AR Server is still the quickest component in the architecture, so it has to wait, and therefore it will do “other things”.
- It's like saying a car can only handle 100BHP
  - Sure it can handle more, if the rest of the car's components and the driver can handle it!

Page 26

Threads :: In My Experience The Answer Is…

There are so many variables that make your system unique
- CPU types, HT, SMT, cores, etc.
- A virtual CPU differs from a bare metal CPU
  - http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
  - http://scn.sap.com/thread/1646435
- Virtualisation or bare metal
- Operating systems and settings

Therefore, after 14 years of experience, hours and hours of research, “Googling” the WHOLE internet on system architecture, and posting and reading on so many forums, my answer is:
- Every environment is different, so suck it and see, and get the system to tell you
- And here's how I do it.

Page 27

Threads :: Here's How :: Simple Principles

In a queue, if all the threads are constantly busy, then you need to do some investigation
- If you already have a high number of threads and they are taking too long to complete, investigate the queries to the database and work with your DBA to speed them up
- If the above doesn't work, or the DBA doesn't want to play ball, then it's time to increase the Max Threads count in the AR System Administration Console

Each queue and thread takes system resources such as CPU cycles and memory. If those resources are maximised, then it's time for an upgrade or to add another AR Server to the server group
- There is no substitute for capacity management and load sharing

Page 28

Threads :: Here's How :: Step 1 – Create Logs

In case you didn't know, you can write multiple types of log data to one file. Therefore, start a log of API, Escalation, SQL and Filter as required.

Run this log in your peak periods if you can, or, if you are not live, use a volume and performance testing application. This log will get large, so make sure you have enough space, and use your better judgment regarding the log file size and the amount of time you leave it on. Monitor it: do not turn it on and go home.

Page 29

Threads :: Here's How :: Step 2 – Run Log Analyzer

There are a couple of tools you can use
- ARLogAnalyzer: https://communities.bmc.com/docs/DOC-2973
- AR Log File Analyser: http://www.missingpiecessoftware.com/products/ar-log-file-analyser

Page 30

Threads :: Here's How :: Step 2 – Run Log Analyzer

Understanding the detail in the AR logs
- Green: the same user
- Purple: the same RPC, 390620, which is the Fast queue
- Red: TWO RPC IDs, meaning two different API calls being processed
  - Every call that the dispatcher thread receives is assigned an RPC ID that can be used to identify the call from the time the call is placed into the queue until a response is sent back to the client
- Cyan: one thread executing both API calls, one after the other

Page 31

Threads :: Here's How :: Step 3 – What To Look For?

Verify the number of threads that are actually running. If the numbers in the log file match the Max Threads, then you know that at some point all threads were utilised.

ARLogAnalyzer / AR Log File Analyser
Both results show: Fast Max is 30 but 27 are being used; List Max is 40 but 35 are being used
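The analyzer tools do this for you, but the underlying check is simple: count the distinct thread IDs seen per queue and compare with Max Threads. A minimal sketch, assuming API log lines carry `<TID: 0000000130>`-style and `<Queue: Fast >`-style fields as in the made-up sample lines below; check the regular expressions against your own logs, since the layout may vary between versions. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed field layout of an AR API log line (verify against your own logs).
TID_RE = re.compile(r"<TID:\s*(\d+)\s*>")
QUEUE_RE = re.compile(r"<Queue:\s*([^>]+?)\s*>")


def threads_per_queue(lines) -> dict:
    """Count distinct thread IDs seen on each queue."""
    seen = defaultdict(set)
    for line in lines:
        tid, queue = TID_RE.search(line), QUEUE_RE.search(line)
        if tid and queue:
            seen[queue.group(1)].add(tid.group(1))
    return {q: len(tids) for q, tids in seen.items()}


def compare_to_max(counts: dict, max_threads: dict) -> dict:
    """(observed, max, reached max?) for each configured queue."""
    return {q: (counts.get(q, 0), mx, counts.get(q, 0) >= mx)
            for q, mx in max_threads.items()}


# Hypothetical sample lines; "..." stands for fields omitted here.
sample = [
    "<API > <TID: 0000000027> <RPC ID: 0000000101> <Queue: Fast      > ... +GE ARGetEntry",
    "<API > <TID: 0000000028> <RPC ID: 0000000102> <Queue: Fast      > ... +GE ARGetEntry",
    "<API > <TID: 0000000027> <RPC ID: 0000000103> <Queue: Fast      > ... -GE OK",
    "<API > <TID: 0000000040> <RPC ID: 0000000104> <Queue: List      > ... +GLE ARGetListEntry",
]
counts = threads_per_queue(sample)
print(counts)
print(compare_to_max(counts, {"Fast": 2, "List": 40}))
```

For a real log, pass an open file object instead of the sample list; lines are read one at a time, so a large log does not need to fit in memory.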

Page 32

Threads :: Here's How :: Step 4 – Idle Time

- Thread idle time is the time from when a thread completes some work to when it starts work again. Therefore, the lower the idle time, the busier the thread is.
- Remember this will probably spike during the working day, but this is truly the best way to monitor when your busy periods are and how busy your system is.
- There is no such thing as 0 idle time. Even getting work from the dispatcher takes at least some time.
  - Therefore ignore the MIN Idle Time; examples include 0.0007
- Look for very small numbers in the AVE idle time column
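Idle time as defined above is just the gap between one call ending and the next call starting on the same thread. A minimal sketch, assuming you have already pulled each thread's call start and end times (in seconds) out of the log; the analyzer tools do that parsing for you. The function names are made up.

```python
from statistics import mean


def idle_times(calls) -> list:
    """Gaps between consecutive calls on ONE thread.

    calls: iterable of (start_s, end_s) pairs for that thread.
    """
    ordered = sorted(calls)
    return [next_start - prev_end
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def idle_summary(calls):
    """MIN / AVE / MAX idle time, similar to the analyzer columns (None if < 2 calls)."""
    gaps = idle_times(calls)
    if not gaps:
        return None
    return {"min": min(gaps), "ave": mean(gaps), "max": max(gaps)}


thread_calls = [(0.0, 0.5), (0.75, 1.0), (3.0, 3.25)]
print(idle_summary(thread_calls))
```

For these three calls the gaps are 0.25s and 2.0s, giving MIN 0.25, AVE 1.125 and MAX 2.0.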

Page 33

Threads :: Identifying Busy Periods

You can identify when the AR Server has needed to increase the thread count on a queue. We can see one queue, 390626, which is configured to start with 6 threads (Min Threads value below).

Looking through the log, we can see the thread numbers increment sequentially for the first 6, 28 to 33, then we see 130.

Page 34

Threads :: Identifying Busy Periods

Note the thread ID 0000000130 underlined in red on the previous slide. Search your log file for TID: 0000000130

The above screenshot of the log entry is on one line, but I had to cut it to fit on the slide. Find the first instance of the thread ID (TID) and note the time; in this example it is 13:08. This is when the AR Server decided it needed to create a new thread on the 390626 queue.
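Finding the first line for each TID can be scripted too. A minimal sketch, assuming each log line has a `<TID: …>` field and a timestamp between `/*` and `*/`; the sample lines and the function name are made up. Python dictionaries keep insertion order, so the result lists threads in the order they first appear. A thread that was simply idle early in the log would also show up late, so check the TID numbering as on the slides.

```python
import re

# Assumed formats: "<TID: 0000000130>" and a timestamp inside /* ... */.
TID_RE = re.compile(r"<TID:\s*(\d+)\s*>")
TIME_RE = re.compile(r"/\*\s*(.*?)\s*\*/")


def first_seen(lines) -> dict:
    """Map each thread ID to the timestamp text of its first log line."""
    first = {}
    for line in lines:
        tid, ts = TID_RE.search(line), TIME_RE.search(line)
        if tid and ts and tid.group(1) not in first:
            first[tid.group(1)] = ts.group(1)
    return first


# Hypothetical sample lines for queue 390626.
sample = [
    "<API > <TID: 0000000028> <RPC ID: 0000000201> <Queue: 390626 > /* Tue Jan 22 2013 09:00:01.0001 */ +GE",
    "<API > <TID: 0000000130> <RPC ID: 0000000202> <Queue: 390626 > /* Tue Jan 22 2013 13:08:12.0002 */ +GE",
    "<API > <TID: 0000000130> <RPC ID: 0000000203> <Queue: 390626 > /* Tue Jan 22 2013 13:09:40.0003 */ +GE",
]
for tid, when in first_seen(sample).items():
    print(tid, when)
```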

Page 35

Threads :: Server Statistics

Another way of identifying the number of threads started
- AR System Administration Console > System > General > Review Statistics

Page 36

Plugins…

Page 37

Plugins :: Let's All Get Up To Speed On Plugins

Three types of plugins. Each does a different thing
- AREA: External Authentication
- ARF: Filter plugins, which are used to extend the actions of Filters
- ARDBC: access data outside of forms while mimicking the behaviour of forms

There are two plugin implementations: C and Java
- Which obviously relates to the programming language they are built with
- C: runs through an executable file, arplugin.exe (Windows) or arplugin (*NIX)
- Java: surprisingly, runs from separate Java processes, or Java Virtual Machines
- They have completely separate configuration, logging output, etc.

Page 38

Plugins :: Let's All Get Up To Speed On Plugins

C specific
- arplugin.exe / arplugin is started via armonitor and configured to run in armonitor.cfg / armonitor.conf
  - E.g. /opt/bmc/ARSystem/bin/arplugin -s srv1 -i /opt/bmc/ARSystem
- Configured through the ar.cfg / ar.conf
  - “Plugin:”, “Plugin-Path:” & “Plugin-Port:” apply only to the C plugin daemon
- How to identify them? In the ar.cfg/ar.conf, look for “Plugin:” followed by a .dll on Windows, or a .so or .a on *NIX systems
  - E.g. Plugin: ServerAdmin.so
  - E.g. Plugin: ardbcconf.dll
- Logging is controlled through the AR System Administration Console

Page 39

Plugins :: Let's All Get Up To Speed On Plugins

Java specific
- A Java process is started via armonitor and configured to run in armonitor.cfg / armonitor.conf
  - E.g. /usr/java/jdk1.6.0_06/jre/bin/java -Xmx512m -classpath /opt/bmc/ARSystem/pluginsvr:/opt/bmc/ARSystem/pluginsvr/arpluginsvr75.jar com.bmc.arsys.pluginsvr.ARPluginServerMain -x svr1 -i /opt/bmc/ARSystem
- A typical ITSM instance has 4 Java plugin servers running:
  - Primary plugin server
  - Full Text Search Engine
  - 2 CMDB plugin servers
- Configured via three separate pluginsvr_config.xml files
- How to identify them? Within the pluginsvr_config.xml files; some are also aliased in the ar.cfg/ar.conf
- Logging is controlled through each log4j_pluginsvr.xml file

Page 40

Plugins :: Let's All Get Up To Speed On Plugins

2 main types of functionality within the plugins: 1-Way and 2-Way
- 1-Way: the AR Server calls the plugin and the plugin returns a response
- 2-Way: the AR Server calls the plugin, but in order to complete that request, the plugin must connect back to the AR Server and look up some data, and then it returns a response

The 2-way plugins are typically the ones to look out for with regard to performance
- E.g. REMEDY.ARDBC.APPQUERY

Page 41

Plugins :: Monitoring The Configuration In Log Files

Plugins use the same process of assigning queue numbers, which permits monitoring their activity within the logs
- E.g. if the plugin is configured on RPC queue 390624, then the API and SQL calls the plugin executes against the AR Server will appear in the API and SQL logs with: <Client-RPC: 390624 >

Plugins that connect back to the AR Server are just clients, which use the same API as the User Tool, driver, Mid Tier, etc.
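Because a plugin's calls carry its Client-RPC number, its share of the API and SQL logs can be pulled out with a simple filter. A minimal sketch, assuming the `<Client-RPC: 390624 >` field format shown above; the sample lines and the function name are hypothetical.

```python
import re

# Assumed field format, as shown on the slide: "<Client-RPC: 390624 >".
CLIENT_RPC_RE = re.compile(r"<Client-RPC:\s*(\d+)\s*>")


def lines_for_client_rpc(lines, rpc: int):
    """Yield log lines whose Client-RPC field equals rpc (e.g. a plugin's queue)."""
    for line in lines:
        match = CLIENT_RPC_RE.search(line)
        if match and int(match.group(1)) == rpc:
            yield line


# Hypothetical sample lines; "..." stands for fields omitted here.
sample = [
    "<API > <TID: 0000000301> <RPC ID: 0000000901> <Client-RPC: 390624   > ... +GLE ARGetListEntry",
    "<API > <TID: 0000000302> <RPC ID: 0000000902> <Client-RPC: 390680   > ... +GE ARGetEntry",
    "<SQL > <TID: 0000000301> <RPC ID: 0000000901> <Client-RPC: 390624   > ... SELECT ...",
]
plugin_lines = list(lines_for_client_rpc(sample, 390624))
print(len(plugin_lines), "lines from the plugin's queue")
```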

Page 42

Plugins :: Example - REMEDY.ARDBC.APPQUERY

Seen in the log files when viewing the Overview Console
- Find out whether it's a C or a Java plugin by looking in the ar.cfg/ar.conf:
  Server-Plugin-Alias: REMEDY.ARDBC.APPQUERY REMEDY.ARDBC.APPQUERY srv1:9999
- Note the port number, :9999. If Plugin-Port: 9999, then it's a C plugin; otherwise you can tell it's a Java plugin.

Search for the NAME in the Java plugin config XML (pluginsvr_config.xml)

Now look for the above classname line in the log4j_pluginsvr.xml
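The port check on this slide can be scripted against ar.cfg / ar.conf. A minimal sketch of exactly that rule and nothing more: it only reads the Plugin-Port and Server-Plugin-Alias lines. The config extract, the plugin names and the function name are hypothetical.

```python
def plugin_kind(ar_cfg_text: str, plugin_name: str) -> str:
    """Classify an aliased plugin as C or Java using the slide's port rule.

    If the port in its Server-Plugin-Alias entry matches Plugin-Port, it is
    served by the C plugin daemon; otherwise it is a Java plugin.
    """
    plugin_port = None
    alias_target = None
    for raw in ar_cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, parts = key.strip(), value.split()
        if key == "Plugin-Port" and parts:
            plugin_port = parts[0]
        elif key == "Server-Plugin-Alias" and parts and parts[0] == plugin_name:
            alias_target = parts[-1]  # e.g. "srv1:9999"
    if alias_target is None:
        return "not aliased"
    _, sep, port = alias_target.rpartition(":")
    if not sep:
        return "unknown (no port in alias)"
    return "C" if port == plugin_port else "Java"


cfg = """# hypothetical ar.cfg extract
Plugin-Port: 9998
Server-Plugin-Alias: EXAMPLE.C.PLUGIN EXAMPLE.C.PLUGIN srv1:9998
Server-Plugin-Alias: EXAMPLE.JAVA.PLUGIN EXAMPLE.JAVA.PLUGIN srv1:9999
"""
print(plugin_kind(cfg, "EXAMPLE.C.PLUGIN"))
print(plugin_kind(cfg, "EXAMPLE.JAVA.PLUGIN"))
```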

Page 43

Plugins :: Example - REMEDY.ARDBC.APPQUERY

Change warn to trace, restart, and open the arjavaplugin.log

Now look for the above line in the log4j_pluginsvr.xml

ITSM OOTB (out of the box), this plugin is not configured to run on its own queue. Add the line in the pluginsvr_config.xml

Page 44

Plugins :: Summary

- Plugins that connect back to the AR Server are clients, just like the User Tool, Mid Tier, etc.
- Most, if not all, are not configured to run on their own queues OOTB
- It's OK to run trace logs in production as long as it is managed!
  - There is no better log than one with real user transactions, on a real working system
  - Use log rotation with some form of log monitoring
  - You don't need a lot of log data. An hour can be enough
- Include this analysis in your capacity management assessment
  - E.g. if we add 1000 users, will I need to increase my thread count?

Page 45

Conclusion

- A lot of content, but hopefully it was practical content
- Each system is unique, so use these techniques on your own system
- Don't be afraid to monitor. I agree log files do grow fast, but manage this rather than not taking any logs at all
- Email me if you have any questions; I am geeky enough to actually enjoy this

Page 46

Wrap-up