Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten
![Page 1: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/1.jpg)
Of Pigs, Snakes & Paper Cuts
The 1x1 (Basics) of Performance Troubleshooting
Rainer Schuppe, AppDynamics GmbH
![Page 2: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/2.jpg)
about me
• Customer Support
• System Support / Ops
• Consultant / Dev
• Solution Architect
• Sales Engineer
![Page 3: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/3.jpg)
Where to start? What to do? Who to blame?
Tooling
Symptoms
Diagnose
Oh no! Not again! or: Why care about performance
![Page 4: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/4.jpg)
How Many Users Abandon Your Slow Website After 3 Seconds?*
• 43 %: These stay and suffer through a poor experience
• 57 %: These leave and find your competitor
*PhoCusWright and Akamai study
![Page 5: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/5.jpg)
And What About 18-24 Year Olds After Only 2 Seconds Of Waiting?*
35 % / 65 %
The future of your business just left and found your competition
*PhoCusWright and Akamai study
![Page 6: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/6.jpg)
Complexity increases
[Diagram of a distributed application landscape. Labels: business transactions (Login, Search Flight, View Flight Status, Make Reservation); components (Tomcat, JBoss, Weblogic, .NET, ESB with Mule / Tibco / AG, MQ, ATG / Vignette / Sharepoint, Oracle, SQL Server); trends (BIG DATA: Hadoop, Cassandra, MongoDB, Coherence, Memcached; CLOUD: Amazon EC2, Windows Azure, VMWare; SOA; WEB 2.0: browser logic, AJAX, web frameworks; AGILE: a separate release stream per component, e.g. Release 1.4 to 2.0, Release 3.4 to 4.0, Release 4.4 to 5.0)]
![Page 7: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/7.jpg)
Generic Troubleshooting Process
Alert / Detection → Triage → Diagnosis → Rootcause Detection → Solution Finding → Fix → Move on with life
Data / Information
![Page 8: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/8.jpg)
Triage
• Determine who needs to fix it
• Starts with overview and comparison to “normal” performance
• First level task (Operators)
• First indication of problem type
• Needs transactional data
![Page 9: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/9.jpg)
Business Transactions can help
• 46,463 Checkouts processed
  ◦ 482 returned an error, 1325 were slow, 576 were very slow and 111 stalled
• 3,956 Payments processed
  ◦ 12 returned an error, 242 were slow, 96 were very slow and 79 stalled
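The counts on this slide imply that every transaction is bucketed against what is "normal" for it. A minimal sketch of such bucketing follows. The class name `TransactionTriage`, the 2x / 4x baseline thresholds, and the stall timeout are assumptions made for illustration; they are not taken from the talk or from any particular APM product.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: buckets transactions relative to a "normal" baseline.
class TransactionTriage {
    enum Health { NORMAL, SLOW, VERY_SLOW, STALLED, ERROR }

    private final double baselineMillis;   // what "normal" looks like for this transaction
    private final long stallTimeoutMillis; // at or beyond this we call it stalled

    TransactionTriage(double baselineMillis, long stallTimeoutMillis) {
        this.baselineMillis = baselineMillis;
        this.stallTimeoutMillis = stallTimeoutMillis;
    }

    // The 2x and 4x multipliers are illustrative assumptions.
    Health classify(long millis, boolean failed) {
        if (failed) return Health.ERROR;
        if (millis >= stallTimeoutMillis) return Health.STALLED;
        if (millis > 4 * baselineMillis) return Health.VERY_SLOW;
        if (millis > 2 * baselineMillis) return Health.SLOW;
        return Health.NORMAL;
    }

    // Counts successful transactions per bucket for a list of response times.
    Map<Health, Integer> summarize(List<Long> responseTimesMillis) {
        Map<Health, Integer> counts = new EnumMap<>(Health.class);
        for (long millis : responseTimesMillis) {
            counts.merge(classify(millis, false), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        TransactionTriage checkout = new TransactionTriage(100.0, 30_000L);
        System.out.println(checkout.summarize(Arrays.asList(90L, 250L, 900L, 45_000L)));
    }
}
```

Grouping the counts per business transaction (Checkout, Payment) rather than per server is what lets a first-level operator see which part of the business is affected.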
![Page 10: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/10.jpg)
[Diagram: the same application landscape as on page 6, annotated with response times between 1 ms and 310 ms]
![Page 11: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/11.jpg)
[Diagram: the same application landscape as on page 6, with a "Problem" marker]
![Page 12: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/12.jpg)
![Page 13: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/13.jpg)
Diagnose
• Determine the root of the problem
• Uses first level information to narrow scope
• Needs specialists
• Needs lots of data / information, both real-time and historical
• Usually needs iterations
• More than 1 tool used in the process
![Page 14: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/14.jpg)
Rootcause detection
• Confirm the root cause after you have diagnosed it
• Document it
• Recreate it in test if possible
• Needs the same data as diagnostics
![Page 15: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/15.jpg)
Solution finding
• Find a solution for the problem
• Architect a workaround or a fix
• Again needs the diagnostic data
• Run some test runs with different options - check them in real time
• Confirm the idea for the fix
• May be a different team than the troubleshooters
![Page 16: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/16.jpg)
How to get the data?
• Intuition
• Experience
• Tools
• Logfiles
• Communication
![Page 17: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/17.jpg)
© val-j - sxc.hu
Tooling
![Page 18: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/18.jpg)
3 Key Things Impact Performance & Availability
• Concurrency
• Data Volume
• Resource
![Page 19: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/19.jpg)
Why do things crash and slow down?
• Development: Concurrency, Data Volume, Resource
• QA/Test: Concurrency, Data Volume, Resource
• Production: Concurrency, Data Volume, Resource
![Page 20: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/20.jpg)
Technologies (Dev, Test, Prod)
• Logging
• ARM (Application Response Measurement)
• Bytecode Instrumentation / Aspects
• Sampling
• JMX (Java Management Extensions)
• PMI (IBM WebSphere specific)
![Page 21: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/21.jpg)
Logfiles (Dev, Test, Prod)
Pros:
• Anything can be logged
• Easy to implement (if you have the source code)
Cons:
• Only what the developer thinks is needed
• I/O heavy
• No chance for change if you don't own the source code
• Lots of files - usually no TX context
• How to correlate in a distributed environment?
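One common answer to the correlation question is to stamp every log line with a transaction ID that travels with the request. The sketch below does this with `java.util.logging` and a thread-local. `TxContext`, `TxFormatter` and the `[tx=...]` prefix are hypothetical names chosen for illustration; passing the ID between processes (for example in a request header) is assumed and not shown.

```java
import java.util.UUID;
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch: carry a transaction ID per thread and stamp it on log lines.
class TxContext {
    private static final ThreadLocal<String> TX_ID = new ThreadLocal<>();

    // Reuse the caller's ID if one was passed along, otherwise start a new one.
    static String begin(String incomingId) {
        String id = (incomingId != null) ? incomingId : UUID.randomUUID().toString();
        TX_ID.set(id);
        return id;
    }

    static String current() {
        String id = TX_ID.get();
        return (id != null) ? id : "-";
    }

    static void end() {
        TX_ID.remove();
    }
}

// Prefixes each message with the transaction ID of the thread doing the formatting.
// Works for synchronous handlers; an asynchronous handler would have to capture
// the ID at the time the record is created instead.
class TxFormatter extends Formatter {
    @Override
    public String format(LogRecord record) {
        return "[tx=" + TxContext.current() + "] " + record.getLevel() + " "
                + formatMessage(record) + System.lineSeparator();
    }
}

class TxLoggingDemo {
    public static void main(String[] args) {
        Logger log = Logger.getLogger("checkout");
        log.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setFormatter(new TxFormatter());
        log.addHandler(handler);

        TxContext.begin(null);
        try {
            log.info("payment authorised");
        } finally {
            TxContext.end(); // always clear, threads are usually pooled
        }
    }
}
```

With the same ID in every file, lines belonging to one transaction can be found with a plain text search across servers.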
![Page 22: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/22.jpg)
Logfiles
[#|2013-04-16T16:04:44.319+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Starting to initialize the Top Summary Stats Data Store timer|#]
[#|2013-04-16T16:04:44.335+0200|INFO|sun-appserver2.1|com.appdynamics.TOP.SUMMARY.STATS.WRITE|_ThreadID=14;_ThreadName=pool-1-thread-9;|START TIME for timer service(TopSummaryStatsWriterTimerTaskBean) will be: Tue Apr 16 16:05:00 CEST 2013|#]
[#|2013-04-16T16:04:44.338+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Successfully initialized the Top Summary Stats Data Store timer|#]
[#|2013-04-16T16:04:44.338+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Starting to initialize the Top Summary Stats Data Purger timer|#]
[#|2013-04-16T16:04:44.369+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Successfully initialized the Top Summary Stats Data Purger timer|#]
[#|2013-04-16T16:04:44.369+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Starting to initialize the Top Summary Stats Detail String cache timer|#]
[#|2013-04-16T16:04:44.376+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Successfully initialized the Top Summary Stats Detail String cache timer|#]
[#|2013-04-16T16:04:44.376+0200|INFO|sun-appserver2.1|com.singularity.ee.controller.beans.ControllerManagerBean|_ThreadID=14;_ThreadName=pool-1-thread-9;|Starting to initialize the Top Summary Stats rollup timer|#]
![Page 23: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/23.jpg)
Profiler (Dev, Test)
Pros:
• No config needed
• Lots of data - lots of detail
Cons:
• Lots of data - not suitable for production
• Needs experience
• No transactional concept / context
![Page 24: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/24.jpg)
Profiler
![Page 25: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/25.jpg)
JMX (and similar) (Dev, Test, Prod)
Pros:
• Built into most application servers
• JConsole is part of the JDK
• Easy to implement MBeans
Cons:
• No transaction context
• Not available for 3rd party
• No historical data
• Usually one JVM only
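To illustrate the "easy to implement MBeans" point, here is a minimal sketch of a standard MBean registered with the platform MBean server, where a tool such as JConsole can read it. The `CheckoutStats` bean and the object name `com.example.shop:type=CheckoutStats` are invented for this example. The two types are nested only so the sketch fits in one file; the JMX naming convention (implementation name plus `MBean`) and the public interface are what matter.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical example: expose a checkout counter as a standard MBean.
class JmxDemo {
    // Standard MBean convention: interface name = implementation name + "MBean".
    public interface CheckoutStatsMBean {
        long getCheckoutCount(); // read-only attribute "CheckoutCount"
        void reset();            // operation
    }

    public static class CheckoutStats implements CheckoutStatsMBean {
        private final AtomicLong count = new AtomicLong();

        public void recordCheckout() { count.incrementAndGet(); }
        @Override public long getCheckoutCount() { return count.get(); }
        @Override public void reset() { count.set(0); }
    }

    // Registers the bean under an illustrative name and returns that name.
    static ObjectName register(CheckoutStats stats) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example.shop:type=CheckoutStats");
        if (server.isRegistered(name)) {
            server.unregisterMBean(name); // keep the sketch re-runnable
        }
        server.registerMBean(stats, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        CheckoutStats stats = new CheckoutStats();
        ObjectName name = register(stats);
        stats.recordCheckout();
        Object value = ManagementFactory.getPlatformMBeanServer()
                .getAttribute(name, "CheckoutCount");
        System.out.println("CheckoutCount via JMX = " + value);
    }
}
```

The cons on the slide still apply: the counter is a single number for one JVM, with no transaction context and no history unless something polls and stores it.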
![Page 26: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/26.jpg)
JMX (and similar)
![Page 27: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/27.jpg)
APM tools (free) (Dev, Test, Prod)
Pros:
• They are free
• Transaction context (most of them)
• Quick setup (the commercial ones)
Cons:
• Usually functionally constrained (commercial)
• Hard to configure (open source)
• Usually no history
![Page 28: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/28.jpg)
APM tools (commercial) (Dev, Test, Prod)
Pros:
• Transactions, historical data
• Distributed monitoring
• Deep dive diagnostics
• Fit for production
Cons:
• Costly
• You have to choose the right one
![Page 29: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/29.jpg)
Link Tip
http://java.dzone.com/articles/java-performance-troubleshooti-0
![Page 30: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/30.jpg)
There are just 2 sorts of issues
Diagnosis
![Page 31: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/31.jpg)
codecentric AG 31
© NLTeddy - sxc.hu
![Page 32: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/32.jpg)
© ross666 - sxc.hu
![Page 33: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/33.jpg)
50 shades of slow (approx.)
• Constantly slow (Turtle)
• Slowly, but constantly slower
• Exponentially slower
• Suddenly slower
• Sporadically slow
• Spontaneous crash
![Page 34: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/34.jpg)
The wonderful world of errors
• Sudden outage
• Always erroneous
• Sporadic error messages
• Silent death / Bleed to death
• Increasing error rates
• Wrong / meaningless error messages
![Page 35: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/35.jpg)
Diagnosis – Rough Flow
• Look at symptoms
• Eliminate definite non-causes
• Prioritize the suspicions
• Confirm suspicion / Eliminate suspicion
• Compare with “normal”
• Gather more information
• Define root cause and confirm it
• Redo from Start
![Page 36: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/36.jpg)
Possible Causes (in no particular order)
• Bad Coding
• Too much load
• Backend not reachable / slow
• Conflicting resources
• Memory Leak
• Resource Leak
• Network / Hardware Problem
![Page 37: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/37.jpg)
Possible Symptoms
• Consistent slowness
• Slower and slower against some variable
  ◦ Time / Load
• Sporadic hangs / random errors
• Foreseeable lockups
• “Sudden chaos”
• High utilization of resources (CPU, memory, network, etc.)
![Page 38: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/38.jpg)
The Causes
![Page 39: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/39.jpg)
Linear Memory Leak
• Symptoms:
  ◦ OOM (out-of-memory error)
  ◦ Slow over time with spikes
  ◦ Hockeystick graph
• Causes:
  ◦ Objects added to linear structures without being removed (e.g., linked lists)
  ◦ Other API misuse (addListener() without corresponding removeListener(), etc.)
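The addListener() / removeListener() cause can be reproduced in a few lines. In this sketch `EventBus` and both request handlers are hypothetical; the point is only that the list inside the bus grows by one entry per request when nothing is ever removed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event source illustrating addListener() without removeListener().
class EventBus {
    interface Listener { void onEvent(String event); }

    private final List<Listener> listeners = new ArrayList<>();

    void addListener(Listener l) { listeners.add(l); }
    void removeListener(Listener l) { listeners.remove(l); }
    int listenerCount() { return listeners.size(); }
}

class LeakDemo {
    // Leaky: every request registers a listener that is never removed, so the
    // bus keeps one more entry (and whatever it references) reachable per request.
    static void handleRequestLeaky(EventBus bus) {
        bus.addListener(event -> { /* react to event */ });
    }

    // Fixed: the listener is removed once the request is done.
    static void handleRequestFixed(EventBus bus) {
        EventBus.Listener l = event -> { /* react to event */ };
        bus.addListener(l);
        try {
            // ... do the request's work ...
        } finally {
            bus.removeListener(l);
        }
    }

    public static void main(String[] args) {
        EventBus leaky = new EventBus();
        EventBus fixed = new EventBus();
        for (int i = 0; i < 1_000; i++) {
            handleRequestLeaky(leaky);
            handleRequestFixed(fixed);
        }
        // prints "leaky=1000 fixed=0"
        System.out.println("leaky=" + leaky.listenerCount() + " fixed=" + fixed.listenerCount());
    }
}
```

Each leaked entry is tiny, which is why this kind of leak shows up as slow, steady heap growth rather than as one obvious culprit.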
![Page 40: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/40.jpg)
Linear Memory Leak
• Aggregate detection:
  ◦ Linear growth in heap utilization
  ◦ GC time growth
• Specific detection:
  ◦ Figure out object types being leaked
  ◦ Verbose GC
  ◦ Find related APIs and search code for misuse
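Aggregate detection boils down to watching two numbers over time: used heap and accumulated GC time. A rough sketch using the standard `java.lang.management` beans plus a least-squares slope follows. The `HeapTrend` class, the sample count and the sampling interval are assumptions for illustration; a real check would sample over hours and preferably use the heap level right after a full GC, since raw samples form a sawtooth.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch: sample heap usage and GC time, then look at the trend.
class HeapTrend {
    // Least-squares slope of y over x = 0, 1, 2, ... (units of y per sample).
    static double slope(double[] y) {
        int n = y.length;
        if (n < 2) throw new IllegalArgumentException("need at least 2 samples");
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double v : y) meanY += v;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (y[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    // Sum of collection time over all collectors; collectors reporting -1 are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] heap = new double[5];
        for (int i = 0; i < heap.length; i++) {
            heap[i] = usedHeapBytes();
            Thread.sleep(200); // illustrative interval only
        }
        System.out.println("heap slope (bytes/sample): " + slope(heap));
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

A slope that stays positive across many GC cycles, together with growing GC time, is the aggregate signal; finding the leaking object types still needs the specific techniques listed above.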
![Page 41: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/41.jpg)
Linear Memory Leak
• Challenges:
  ◦ References - many small objects are referenced in one collection
  ◦ Death by 1000 cuts (paper cuts)
• Specific detection:
  ◦ Figure out object types being leaked
  ◦ Verbose GC
  ◦ Find related APIs and search code for misuse
![Page 42: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/42.jpg)
Specific detection
• Heap Dump Comparison
  ◦ Needs at least 2 dumps
  ◦ Stops the JVM
  ◦ Can take several minutes each
  ◦ Creates tons of data
  ◦ Finds the object, not the code responsible for the leak
• Profiler
  ◦ High overhead - not for production
  ◦ Lots of data
• APM Solution
  ◦ Collection-based algorithm - finds only collection leaks
  ◦ Instance counting
  ◦ Trade-off between low overhead and usefulness of data
![Page 43: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/43.jpg)
![Page 44: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/44.jpg)
![Page 45: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/45.jpg)
![Page 46: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/46.jpg)
![Page 47: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/47.jpg)
Exponential Memory Leak
• Causes:
  ◦ Objects added to most data structures without being removed (e.g., vectors, hashtables)
  ◦ Other API misuse (as Linear Leak)
• Aggregate detection:
  ◦ Exponential growth in heap
• Specific detection:
  ◦ Same as Linear Leak
![Page 48: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/48.jpg)
Resource Leak
• Causes:
  ◦ API misuse of Java objects with resource-style lifecycle (create -> use -> destroy)
• Aggregate detection:
  ◦ Slow over time
  ◦ Growth in heap (if you're lucky)
• Specific detection:
  ◦ Audit code for API misuses
  ◦ Object instance tracking
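The create -> use -> destroy lifecycle and the idea of object instance tracking can be sketched together. `TrackedConnection` is a made-up stand-in for a real resource such as a database connection; the static counter is a crude form of instance tracking, and try-with-resources is the usual way to guarantee the destroy step.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource with a create -> use -> destroy lifecycle.
class TrackedConnection implements AutoCloseable {
    // Crude instance tracking: how many connections are currently open.
    private static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    TrackedConnection() { OPEN.incrementAndGet(); }

    void query(String sql) {
        if (closed) throw new IllegalStateException("connection is closed");
        // ... talk to the backend ...
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }

    static int openCount() { return OPEN.get(); }
}

class ResourceLeakDemo {
    // Leaky: the destroy step is missing, the connection is never given back.
    static void leaky() {
        TrackedConnection c = new TrackedConnection();
        c.query("SELECT 1");
    }

    // Safe: try-with-resources calls close() even if query() throws.
    static void safe() {
        try (TrackedConnection c = new TrackedConnection()) {
            c.query("SELECT 1");
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            leaky();
            safe();
        }
        System.out.println("still open: " + TrackedConnection.openCount());
    }
}
```

An open-instance count that only ever rises is the signal to look for; the heap may hardly move because the scarce thing is the external resource, not memory.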
![Page 49: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/49.jpg)
Resource conflict / blocking
• Causes:
  ◦ Overcautious data integrity strategy
  ◦ “Synchronising is always good”
• Aggregate detection:
  ◦ Stalled threads
  ◦ High thread usage - low CPU usage
• Specific detection:
  ◦ Thread dumps as needed
  ◦ Stack traces / graphs
  ◦ CPU block / wait timing measurement
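"Thread dumps as needed" can also be done from inside the JVM. The sketch below uses the standard `ThreadMXBean` to count threads per state and to list which lock each blocked thread is waiting for. The class name `ThreadStateSnapshot` is hypothetical, and a single snapshot is only a hint; several snapshots a few seconds apart say more.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: a programmatic mini thread dump.
class ThreadStateSnapshot {
    // Number of live threads per state (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) continue; // thread ended while we were looking
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // Prints, for every BLOCKED thread, the lock it wants and who holds it.
    static void printBlocked() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countByState());
        printBlocked();
    }
}
```

Many BLOCKED threads combined with low CPU usage matches the aggregate symptoms on the slide.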
![Page 50: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/50.jpg)
Resource conflict (block / wait)
![Page 51: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/51.jpg)
[Chart: Trx/min, Avg RT, Pool Limit, Pool Usage, Trx Stalls]
Production ground to a halt for 2 hours. And again the next day.
![Page 52: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/52.jpg)
Bad Coding: Infinite Loop
• Causes:
  ◦ Infinite loop in code
• Aggregate detection:
  ◦ Stalled threads
  ◦ Permanently high usage of CPU / threads
• Specific detection:
  ◦ Thread dumps as needed
  ◦ Stack traces / graphs
![Page 53: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/53.jpg)
Bad Coding: CPU-Bound Component
• Causes:
  ◦ Idiot with a “Learn Java in 24 Hours” book
• Aggregate Detection:
  ◦ Response time measurement
  ◦ Aggregate CPU utilization
• Specific Detection:
  ◦ Detailed CPU utilization
• Typical Cure:
  ◦ Cache of data or of performed calculations
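The typical cure, caching performed calculations, can be as small as a memoizing wrapper. `Memoizer` below is a hypothetical helper built on `ConcurrentHashMap.computeIfAbsent`. It is deliberately unbounded to keep the sketch short, which in a long-running server would itself become one of the memory leaks described earlier; a production cache would need a size limit or expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: remembers results of an expensive, side-effect-free function.
class Memoizer<K, V> {
    // Unbounded on purpose to keep the sketch short; real caches need eviction.
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    Memoizer(Function<K, V> compute) {
        this.compute = compute;
    }

    // Computes the value on the first request for a key and reuses it afterwards.
    V get(K key) {
        return cache.computeIfAbsent(key, compute);
    }
}

class MemoizerDemo {
    // Stand-in for a CPU-heavy calculation.
    static long slowSquare(int n) {
        long result = 0;
        for (int i = 0; i < n; i++) {
            result += n;
        }
        return result;
    }

    public static void main(String[] args) {
        Memoizer<Integer, Long> squares = new Memoizer<>(MemoizerDemo::slowSquare);
        System.out.println(squares.get(50_000)); // computed
        System.out.println(squares.get(50_000)); // served from the cache
    }
}
```

Caching only helps when the same inputs recur, so it is worth confirming with detailed CPU data which calculation is actually repeated before adding a cache.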
![Page 54: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/54.jpg)
Layer-itis
• Causes:
  ◦ Poorly implemented data bridge layer, or simply too many of them
  ◦ DB -> XML -> XSLT -> More XML -> “Custom Data Management Layer” -> Consumer
• Aggregate Detection:
  ◦ Response time measurements
• Specific Detection:
  ◦ Call graphs - call trace (stack trace not enough)
  ◦ Ask for a design or architecture document
![Page 55: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/55.jpg)
O/R Mapper misuse
• Causes:
  ◦ “Hibernate fixes everything”
  ◦ Massive SQL statements (length and number)
  ◦ Wrong data strategy
• Aggregate Detection:
  ◦ Response time measurements
  ◦ DB time measurements
• Specific Detection:
  ◦ Call stacks / snapshots
![Page 56: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/56.jpg)
Caching issues
![Page 57: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/57.jpg)
The Unending Retry
• Causes:
  ◦ Continual attempts to call backend + unavailable backend
• Aggregate Detection / Specific Detection:
  ◦ Response time measurement
  ◦ Backend detection - measurement (time & # of calls)
  ◦ Stalled TX count
  ◦ Exceptions
  ◦ Busy thread count
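The slide lists how to detect the unending retry, not how to cure it. One common mitigation, shown here as an assumption rather than as the speaker's recommendation, is to bound the number of attempts and back off between them, so that an unavailable backend produces a quick, visible failure instead of a pile of busy threads. `BoundedRetry` and its parameters are hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retry a backend call a limited number of times.
class BoundedRetry {
    // Tries the call up to maxAttempts times, doubling the pause after each failure.
    // Rethrows the last failure once the attempts are used up.
    static <T> T call(Callable<T> backend, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return backend.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        try {
            BoundedRetry.call(() -> {
                throw new IllegalStateException("backend unavailable");
            }, 3, 50);
        } catch (Exception e) {
            System.out.println("gave up after 3 attempts: " + e.getMessage());
        }
    }
}
```

Counting the exceptions and attempts made by such a wrapper also feeds the detection metrics listed above (exceptions, number of backend calls).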
![Page 58: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/58.jpg)
don’t forget about thrown exceptions
![Page 59: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/59.jpg)
Threading: Deadlock / Livelock
• Causes:
  ◦ Fundamental error in threading / lock acquisition strategy
• Aggregate Detection:
  ◦ Stalled threads / permanently high concurrent usage
• Specific Detection:
  ◦ Deadlock detection in JVM
  ◦ Thread dumps
  ◦ Busy thread count
![Page 60: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/60.jpg)
Threading: Deadlock
Found one Java-level deadlock:
=============================
"Thread-2":
  waiting to lock monitor 102054308 (object 7f3113800, a java.lang.Object),
  which is held by "Thread-1"
"Thread-1":
  waiting to lock monitor 1020348b8 (object 7f3113810, a java.lang.Object),
  which is held by "Thread-2"
Java stack information for the threads listed above:
===================================================
"Thread-2":
  at DeadlockTest$2.run(DeadlockTest.java:42)
  - waiting to lock <7f3113800> (a java.lang.Object)
  - locked <7f3113810> (a java.lang.Object)
  at java.lang.Thread.run(Thread.java:680)
"Thread-1":
  at DeadlockTest$1.run(DeadlockTest.java:26)
  - waiting to lock <7f3113810> (a java.lang.Object)
  - locked <7f3113800> (a java.lang.Object)
  at java.lang.Thread.run(Thread.java:680)
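The deadlock report above comes from a thread dump; the same check is available programmatically through `ThreadMXBean.findDeadlockedThreads()`. The sketch below provokes a two-lock deadlock, similar in spirit to the `DeadlockTest` in the dump (whose source is not part of the slides), and then asks the JVM to find it. `DeadlockProbe` and its thread names are invented, and the threads are daemons so the demo can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: provoke a lock-ordering deadlock and let the JVM detect it.
class DeadlockProbe {
    // IDs of deadlocked threads, or an empty array if there is no deadlock.
    static long[] deadlockedThreadIds() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when nothing is deadlocked
        return (ids != null) ? ids : new long[0];
    }

    // Polls until a deadlock is reported or the timeout expires.
    static long[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        long[] ids = deadlockedThreadIds();
        while (ids.length == 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            ids = deadlockedThreadIds();
        }
        return ids;
    }

    // Two daemon threads take the same two locks in opposite order.
    static void provokeDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("probe-A", a, b, bothHoldFirstLock);
        startDaemon("probe-B", b, a, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object first, Object second,
                                    CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached: the other thread holds this lock and wants ours
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) {
        provokeDeadlock();
        System.out.println("deadlocked threads: " + awaitDeadlock(5_000).length);
    }
}
```

The fix for this class of deadlock is a consistent lock acquisition order, which is the "fundamental error in lock acquisition strategy" named on the previous slide.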
![Page 61: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/61.jpg)
Threading: Chokepoint
• Causes:
  ◦ Many threads bottlenecked waiting for one lock
• Aggregate Detection:
  ◦ Stalled threads / high concurrent usage
  ◦ Exponential slowness
  ◦ Low CPU usage
• Specific Detection:
  ◦ Request response time monitoring
  ◦ CPU block / wait timing
![Page 62: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/62.jpg)
Threading: Chokepoint
![Page 63: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/63.jpg)
Internal Resource Bottleneck
• Causes:
  ◦ Overusage of internal resource (threads, database connections, etc.)
  ◦ Underallocation of same
• Aggregate Detection:
  ◦ Stalled threads / high concurrent usage
  ◦ Call rate and average response time of internal resource
• Specific Detection:
  ◦ Also compare with methods from Resource Leak, External Bottleneck, and Overusage of External System
![Page 64: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/64.jpg)
External Bottleneck
• Causes:
  ◦ External system (database, authentication server) is slow
  ◦ Compare with Overusage of External System
• Aggregate Detection:
  ◦ Response time on backend calls
  ◦ Exceptions
• Specific Detection:
  ◦ Call graphs
  ◦ Specific monitoring on those backends
![Page 65: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/65.jpg)
Commit happy
![Page 66: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/66.jpg)
[Chart: Trx/min, Avg RT, Pool Limit, Pool Usage, Trx Stalls]
Production ground to a halt for 2 hours. And again the next day.
![Page 67: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/67.jpg)
Overusage of External System
• Causes:
  ◦ Poor design or tuning of interaction with backend system (e.g., join between two million-row tables for each user logon)
  ◦ O/R mapper misconfiguration
• Aggregate Detection:
  ◦ Response time measurement
• Specific Detection:
  ◦ Timing on backend systems
  ◦ Also need tools for those backend systems
![Page 68: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/68.jpg)
excessive database access
![Page 69: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/69.jpg)
query too much data
![Page 70: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/70.jpg)
• One interesting problem occurs when the size of transactions with backend systems needs to be tuned
• Can be intertwined with / exacerbated by Layer-itis and Overusage of External System
• Many small requests: “Death by a thousand cuts”
  ◦ System constantly wastes resources dispatching / unmarshalling many transactions and results
• One HUGE request: “Pig in a Python”
  ◦ System periodically slows to a crawl as many resources get thrown at a large chunk of work
• In between: “Just Right”
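Tuning the request size usually means making the batch size a parameter and measuring. The sketch below only shows the mechanical part: splitting a list of work items into batches of a configurable size. `Batcher` and the example sizes are hypothetical, and the right batch size depends on the backend and has to be found by testing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split work into batches of a tunable size.
class Batcher {
    // batchSize = 1 gives many small requests ("death by a thousand cuts");
    // batchSize >= items.size() gives one huge request ("pig in a python").
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> orderIds = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            orderIds.add(i);
        }
        // Each inner list would become one request to the backend.
        System.out.println(partition(orderIds, 4));
    }
}
```

Running the same workload with a few different batch sizes and comparing response times and backend load is the practical way to locate the middle ground.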
![Page 71: Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Papierschnitten](https://reader033.vdocuments.site/reader033/viewer/2022052315/5563a6b1d8b42a2d538b56de/html5/thumbnails/71.jpg)
Questions?