webinar: node.js transaction tracing & root cause analysis with strongloop arc

TRANSACTION TRACING & ROOT CAUSE ANALYSIS WITH STRONGLOOP ARC Jordan Kasper | Developer Evangelist

Upload: jguerrero999

Post on 15-Aug-2015


Category:

Software



TRANSCRIPT


STEP ONE
The first step in monitoring, profiling, and tracing your Node application is to run it in a process manager!

BUILD YOUR APP WITH SLC
~$ npm install -g strongloop
~/my-app$ slc build
...
~/my-app$ ls
... ... my-app-0.1.0.tgz

INSTALL AND RUN STRONG PM
On your deployment machine...
~$ npm install -g strong-pm
~$ sl-pm-install

DEPLOY TO STRONG PM
From our development machine (or staging, etc.)...
~/my-app$ slc deploy http+ssh://myserver.com:8701

RUNNING LOCALLY
If you need to profile things locally (your machine or a staging/testing server), run slc start from your app directory:
~/my-app$ slc start
Process Manager is attempting to run app ..
To confirm it is started: slc ctl status tracing-example-app
To view the last logs: slc ctl log-dump tracing-example-app
...
Then start the Arc UI:
~/my-app$ slc arc

METRICS AND MONITORING

VIEWING METRICS

AVAILABLE METRICS
CPU Load (system)
Heap Memory Usage
Event Loop Count
Event Loop Tick Timing
HTTP Connections
Database Connections (Oracle, MySQL, Mongo, Postgres)
Misc other modules (Redis, Memcache(d), Message queues)

WHAT DO I LOOK FOR?
CPU Usage is pretty obvious, just watch your high points!
With Heap Memory Usage you want to see a "sawtooth" chart, where each drop indicates garbage collection. Seeing no drops is bad!
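The sawtooth check can also be approximated in code by sampling heap usage and counting the drops. This is only a rough sketch: `countHeapDrops` and its 10% default drop ratio are hypothetical choices for illustration, not something Arc exposes.

```javascript
// Hypothetical helper: counts how often heap usage falls by at least
// `minDropRatio` between consecutive samples (a rough proxy for a GC run).
function countHeapDrops(samples, minDropRatio) {
  var ratio = minDropRatio === undefined ? 0.1 : minDropRatio;
  var drops = 0;
  for (var i = 1; i < samples.length; i++) {
    var prev = samples[i - 1];
    if (prev > 0 && (prev - samples[i]) / prev >= ratio) {
      drops++;
    }
  }
  return drops;
}

// Real samples could be collected with the built-in process.memoryUsage():
//   samples.push(process.memoryUsage().heapUsed);
```

A result of zero over a long sampling window would correspond to the "no drops" chart described above.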


WHAT DO I LOOK FOR?
The two Event Loop metrics are opposed. You want the loop count to remain high under normal load (more ticks per metrics cycle is good). Any dips may be bad.
The Loop timing, on the other hand, indicates how long event loop ticks are taking. Any spikes here are bad!
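To get a feel for what the loop timing metric captures, event loop delay can be approximated with a plain timer: if the timer fires later than scheduled, something held the loop up. The sketch below assumes nothing about how the StrongLoop agent measures this; `computeLag` and `measureLoopLag` are hypothetical names.

```javascript
// Lag is how much later than expected a timer actually fired (never negative).
function computeLag(expectedMs, actualMs) {
  return Math.max(0, actualMs - expectedMs);
}

// Calls onSample(lagMs) every intervalMs; a large lag means the event
// loop was busy with other work when the timer was due.
function measureLoopLag(intervalMs, onSample) {
  var expected = Date.now() + intervalMs;
  var timer = setInterval(function () {
    var now = Date.now();
    onSample(computeLag(expected, now));
    expected = now + intervalMs;
  }, intervalMs);
  return timer; // pass to clearInterval() to stop sampling
}
```

For example, `measureLoopLag(100, console.log)` would print values near zero on an idle process and larger values while synchronous code is blocking the loop.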

SETUP METRICS COLLECTION
On our production machine, with strong-pm installed, simply set the collection location:
~$ export STRONGLOOP_METRICS="log:/path/to/api-metrics.log"
~$ export STRONGLOOP_METRICS="syslog"
~$ export STRONGLOOP_METRICS="statsd://my-log-server.com:1234"
~$ export STRONGLOOP_METRICS="graphite://my-log-server.com:1234"
~$ export STRONGLOOP_METRICS="splunk://my-log-server.com:1234"

SETUP METRICS COLLECTION
Alternatively, on the production machine you can run:
~$ sl-pm-install --metrics <url>
Or during runtime:
~$ slc ctl env-set my-app STRONGLOOP_METRICS=<url>

PROFILING

PROFILING
We can spot issues using the metrics being monitored, but now we need to find the cause of those issues.
Profiling CPU usage and memory is the way to do this.

PROFILING IN ARC

CPU PROFILES

MEMORY PROFILES

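PROGRAMMATIC MEMORY MONITORING
If we have memory issues, it may be helpful to monitor memory usage dynamically.
~$ npm install heapdump --save

var heapdump = require('heapdump');
var THRESHOLD = 500; // in MB

// Every 5 minutes, check resident memory and write a heap snapshot
// if it exceeds the threshold.
setInterval(function () {
  var memMB = process.memoryUsage().rss / 1048576; // bytes to MB
  if (memMB > THRESHOLD) {
    // Snapshots are written to the current working directory.
    process.chdir('/path/to/writeable/dir');
    heapdump.writeSnapshot();
  }
}, 60000 * 5);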

MEMORY MONITORING
Caution: Taking a heap snapshot is not trivial on resources.
If you already have a memory problem, this could kill your process!
Unfortunately sometimes you have no alternative.
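Because each snapshot is expensive, one possible mitigation is to limit how often a snapshot can be triggered. The following is a sketch under that assumption, not part of the talk: `createSnapshotGuard` and the cooldown idea are hypothetical, and the snapshot function is passed in so the logic does not depend on any particular module.

```javascript
// Hypothetical guard: calls `writeSnapshot` only when memory is over the
// threshold AND at least `cooldownMs` has passed since the last snapshot.
function createSnapshotGuard(thresholdMB, cooldownMs, writeSnapshot) {
  var lastSnapshotAt = -Infinity;
  return function check(rssBytes, nowMs) {
    var memMB = rssBytes / 1048576; // bytes to MB
    if (memMB <= thresholdMB) { return false; }
    if (nowMs - lastSnapshotAt < cooldownMs) { return false; }
    lastSnapshotAt = nowMs;
    writeSnapshot();
    return true;
  };
}

// Possible wiring (assumes the heapdump module is installed):
//   var heapdump = require('heapdump');
//   var guard = createSnapshotGuard(500, 30 * 60000, function () {
//     heapdump.writeSnapshot();
//   });
//   setInterval(function () {
//     guard(process.memoryUsage().rss, Date.now());
//   }, 60000 * 5);
```

With a 30 minute cooldown, a process that stays above the threshold writes at most one snapshot per half hour instead of one on every check.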

SMART PROFILING
How can we use the monitoring to profile?
"Smart profiling" based on event loop blockage:
~$ slc ctl cpu-start 1.1.49408 20
1. Monitors a specific worker ( 1.1.49408 )
2. Event loop blocked for more than 20ms, start CPU profile
3. Stop profiling once event loop resumes
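The start/stop logic in steps 2 and 3 can be illustrated with a small function over a series of event loop lag samples. This is only a sketch of the idea; it is not how strong-pm implements smart profiling, and `profilingWindows` is a hypothetical name.

```javascript
// Hypothetical helper: given event loop lag samples (ms), returns the
// [startIndex, endIndex] windows during which a profiler would be running.
function profilingWindows(lagSamples, thresholdMs) {
  var windows = [];
  var start = -1;
  for (var i = 0; i < lagSamples.length; i++) {
    var blocked = lagSamples[i] > thresholdMs;
    if (blocked && start === -1) {
      start = i;                    // step 2: blockage detected, start profiling
    } else if (!blocked && start !== -1) {
      windows.push([start, i - 1]); // step 3: loop resumed, stop profiling
      start = -1;
    }
  }
  if (start !== -1) {
    windows.push([start, lagSamples.length - 1]); // still blocked at the end
  }
  return windows;
}
```

With a 20ms threshold, the samples `[5, 25, 40, 3, 21]` would yield two windows: one covering the second and third samples and one covering the last.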

FINDING THE WORKER ID
~$ slc ctl status
Service ID: 1
Service Name: my-app
Environment variables:
    No environment variables defined
Instances:
    Version  Agent version  Cluster size
     4.1.0       1.5.1           4
Processes:
        ID      PID   WID  Listening Ports  Tracking objects?  CPU profiling?
    1.1.49401  49401   0
    1.1.49408  49408   1     0.0.0.0:3001
    1.1.49409  49409   2     0.0.0.0:3001
    1.1.49410  49410   3     0.0.0.0:3001
    1.1.49411  49411   4     0.0.0.0:3001

TRANSACTION TRACING

DEEP TRANSACTION TRACING
Analyze performance of your application from a high level down to the function level.

RESOURCE CONSUMPTION TIMELINE

ANOMALY INSPECTION
See something off?
Click on that point in the resource usage chart.
(The orange triangles at the bottom identify anomalies beyond three-sigma deviations.)
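A three-sigma check flags points that lie more than three standard deviations from the mean. How Arc computes its baseline is not described in the slides, so the following is a generic sketch of the technique; `sigmaAnomalies` is a hypothetical helper that uses the population standard deviation over the whole series.

```javascript
// Returns the indexes of values more than `k` standard deviations from
// the mean (population standard deviation; k defaults to 3).
function sigmaAnomalies(values, k) {
  var limit = k === undefined ? 3 : k;
  var n = values.length;
  if (n === 0) { return []; }
  var mean = values.reduce(function (a, b) { return a + b; }, 0) / n;
  var variance = values.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean);
  }, 0) / n;
  var sigma = Math.sqrt(variance);
  var anomalies = [];
  values.forEach(function (v, i) {
    // A flat series (sigma of 0) has no anomalies by this definition.
    if (sigma > 0 && Math.abs(v - mean) > limit * sigma) {
      anomalies.push(i);
    }
  });
  return anomalies;
}
```

Note that a single outlier can only exceed three sigma when the series is long enough (roughly eleven or more points with this definition), so short series rarely flag anything.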

VIEW TRACE SEQUENCES

TRACING WATERFALL

By clicking on the "sync" line we can inspect the costs of the synchronous code.

FLAME CHARTS

FLAME CHARTS
The flame chart identifies each function in the call stack, organized in color by module.
The size of the bar represents the total resource consumption for that function and all of its function calls.
Clicking on a function shows that function's resource usage.
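The bar size described above is an inclusive cost: a function's own cost plus the cost of everything it calls. A minimal sketch of that calculation over a call tree follows; the node shape (`name`, `selfMs`, `children`) and the sample data are hypothetical, not Arc's trace format.

```javascript
// Each node: { name, selfMs, children: [...] } (a hypothetical shape).
// Inclusive time = the function's own time plus that of everything it calls.
function inclusiveMs(node) {
  var total = node.selfMs;
  (node.children || []).forEach(function (child) {
    total += inclusiveMs(child);
  });
  return total;
}

// Made-up call tree for illustration only.
var handler = {
  name: 'handleRequest', selfMs: 2,
  children: [
    { name: 'parseBody', selfMs: 3, children: [] },
    { name: 'queryDb', selfMs: 10, children: [
      { name: 'serialize', selfMs: 5, children: [] }
    ] }
  ]
};
```

Here `handleRequest` does only 2ms of its own work but would be drawn as the widest bar, because its inclusive time covers all of its callees.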

LOOKING FOR MORE?
Check out our blog post on Transaction Tracing and identifying a DoS attack!
http://bit.ly/arc-tracing

THANK YOU!
QUESTIONS?
Jordan Kasper | Developer Evangelist
Join us for more events!
strongloop.com/developers/events