production debugging web applications
TRANSCRIPT
© Copyright SELA Software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel | www.selagroup.com
SELA DEVELOPER PRACTICE
December 11-15, 2016
Ido Flatow
Production Debugging Web Applications
THE STORIES YOU ARE ABOUT TO HEAR ARE BASED ON ACTUAL CASES. LOCATIONS, TIMELINES, AND NAMES
HAVE BEEN CHANGED FOR DRAMATIC PURPOSES AND TO
PROTECT THOSE INDIVIDUALS WHO ARE STILL LIVING.
For the Next 60 Minutes…
Introduction
Service hangs
Unexplained exceptions
High memory consumption
Why Are You Here?
You are going to hear about
  Bugs in web applications
  Tips for better coding
  Debugging tools, and when to use them
You will not leave here as expert debuggers! Sorry
But… you will leave with a good starting point
And probably anxious to check your code
How Are We Going to Do This?
What did the client report?
Which steps did we use to troubleshoot the issue?
What did we find?
How did we fix it?
What were those tools we used?
The Tired WCF Service
Client
  Local bank
Reported
  WCF service works fine for a few hours, then stops handling requests
  Clients call the service, wait, then time out
  Server CPU is high
Workaround
  Restart the IIS application pool
Troubleshooting
Configured WCF to output performance counters
Used Performance Monitor to watch WCF’s counters, specifically
Instances Percent Of Max Concurrent Calls
Troubleshooting - cont'd
Waited for the service to hang
Inspected counter values
  Value was at 100% (101.563% to be exact)
  At this point, no clients were active!
Reminder - WCF throttles concurrent calls (16 x #Cores)
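
The arithmetic behind the reminder can be sketched in C#. This is an illustration only: it assumes the WCF 4.x default throttle of 16 concurrent calls per processor and assumes the counter is simply calls in progress divided by that limit. The 4-core server and the 65 stuck calls are hypothetical numbers, chosen because they happen to give a reading just above 100%; they are not taken from the case.

```csharp
using System;

public static class WcfThrottleMath
{
    // WCF 4.x default for MaxConcurrentCalls: 16 per processor.
    public static int DefaultMaxConcurrentCalls(int processorCount)
    {
        return 16 * processorCount;
    }

    // Calls in progress expressed as a percentage of the throttle limit.
    public static double PercentOfMaxConcurrentCalls(int callsInProgress, int maxConcurrentCalls)
    {
        return 100.0 * callsInProgress / maxConcurrentCalls;
    }

    public static void Main()
    {
        int cores = 4;        // hypothetical server size
        int stuckCalls = 65;  // hypothetical number of calls that never returned
        int max = DefaultMaxConcurrentCalls(cores);
        Console.WriteLine("Max concurrent calls: " + max);
        Console.WriteLine("Percent of max: " + PercentOfMaxConcurrentCalls(stuckCalls, max));
    }
}
```

Once the stuck calls fill every slot, new requests queue behind the throttle until the client times out, which matches what the bank reported.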
Troubleshooting - cont'd
Watched w3wp thread stacks with Process Explorer
Noticed many .NET threads in a sleep loop
Issue found - requests hung in the service, causing it to throttle new requests
Fixed the code to stop the endless loop – problem solved!
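
The service's actual code is not shown in the slides, so the following is only a minimal sketch of the pattern, with hypothetical method names: a polling loop that sleeps until a condition becomes true, next to a bounded version that gives up after a timeout so the call can fail and release its throttle slot.

```csharp
using System;
using System.Threading;

public static class BoundedWait
{
    // Problematic pattern: if isReady never returns true, the request thread
    // sleeps forever and keeps occupying one concurrent-call slot.
    public static void WaitForever(Func<bool> isReady)
    {
        while (!isReady())
        {
            Thread.Sleep(100);
        }
    }

    // Bounded alternative: returns false once the timeout elapses,
    // letting the caller fail the request instead of hanging.
    public static bool WaitWithTimeout(Func<bool> isReady, TimeSpan timeout)
    {
        return SpinWait.SpinUntil(isReady, timeout);
    }

    public static void Main()
    {
        bool ready = WaitWithTimeout(() => false, TimeSpan.FromMilliseconds(200));
        Console.WriteLine(ready ? "Resource ready" : "Timed out, failing the request");
    }
}
```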
The Tools in Use
Performance Monitor (perfmon.exe)
  View counters that show the state of various application aspects
  Most people use it to check CPU, memory, disk, and network state
  .NET CLR has useful counters for memory, GC, JIT, locks, threads, exceptions, etc.
  Other useful counters: WCF, ASP.NET, IIS, and database providers
Sysinternals Process Explorer
  Alternative to Task Manager
  Select a process and view its managed and native threads and stacks
  Examine each thread’s CPU utilization
  View .NET CLR performance counters per process
  https://download.sysinternals.com/files/ProcessExplorer.zip
Why We Do Volume Tests
Client
  QA team. Government collaboration app
Reported
  MVC web application works in regular day-to-day use
  Application succeeded under load tests
  Under volume tests, application throws unexplained errors
  Returns HTTP 500, with no specific error message
  Application logs are not showing any relevant information
Workaround
  None. Failed under volume tests
Troubleshooting
Checked Event Viewer for errors, found nothing
Used Fiddler to view the HTTP 500 response
  Error text was too general, not very useful
Troubleshooting - cont'd
Decided to use IIS Failed Request Tracing
  Luckily, the MVC app had an exception filter that used tracing
  Created a Failed Request Tracing rule for HTTP 500
  Added the System.Web.IisTraceListener to the web.config
Waited for the test to reach its breaking point…
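
How the exception filter's traces reach a listener can be sketched as follows. The filter's real code is not in the slides, so `ExceptionTracing.TraceFailure` is a hypothetical stand-in for it, and `CollectingTraceListener` is a hypothetical in-memory substitute for System.Web.IisTraceListener (which is also a TraceListener, but forwards messages to IIS tracing). In the case above the listener was registered in web.config rather than in code.

```csharp
#define TRACE // Trace.* calls are compiled out unless TRACE is defined
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for IisTraceListener: collects messages in memory.
public sealed class CollectingTraceListener : TraceListener
{
    public readonly List<string> Messages = new List<string>();

    public override void Write(string message) { Messages.Add(message); }
    public override void WriteLine(string message) { Messages.Add(message); }
}

public static class ExceptionTracing
{
    // What an exception filter might do: report the failure through System.Diagnostics.Trace.
    public static void TraceFailure(Exception ex)
    {
        Trace.TraceError("Unhandled exception: {0}", ex.Message);
    }

    public static void Main()
    {
        var listener = new CollectingTraceListener();
        Trace.Listeners.Add(listener);
        TraceFailure(new InvalidOperationException("string too big"));
        Trace.Listeners.Remove(listener);
        foreach (string message in listener.Messages)
        {
            Console.WriteLine(message);
        }
    }
}
```

Any registered listener receives the same messages, which is why adding one more listener was enough to surface the exception in the Failed Request Tracing file.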
Troubleshooting - cont'd
Opened the newly created trace file in IE
Found an error! Exception in JSON serialization - string too big
Stack Overflow to the rescue…
Troubleshooting - cont'd
Ran the test again – failed again!
Checked the JavaScriptSerializer serialization code
  Where is MaxJsonLength set?
Inspected MVC’s JsonResult code
  Found the code that configured the serializer
Troubleshooting – almost done
Code fix was quite easy
  Before: return Json(data);
  After: return new JsonResult { Data = data, MaxJsonLength = NEW_MAX_SIZE };
But how big was our JSON string? 5MB? 1GB? Time to grab a memory dump…
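
The limit behind this fix can be reproduced outside MVC with a small sketch. It assumes .NET Framework and a reference to System.Web.Extensions.dll, where JavaScriptSerializer lives; `FitsWithin` and the 3-million-character payload are hypothetical, made up for the illustration.

```csharp
// Requires .NET Framework and a reference to System.Web.Extensions.dll.
using System;
using System.Web.Script.Serialization;

public static class JsonLengthDemo
{
    // Returns true when the payload serializes within maxJsonLength characters.
    public static bool FitsWithin(object payload, int maxJsonLength)
    {
        var serializer = new JavaScriptSerializer { MaxJsonLength = maxJsonLength };
        try
        {
            serializer.Serialize(payload);
            return true;
        }
        catch (InvalidOperationException)
        {
            // Thrown when the generated JSON is longer than MaxJsonLength.
            return false;
        }
    }

    public static void Main()
    {
        string big = new string('x', 3000000); // hypothetical oversized payload
        Console.WriteLine("Fits default limit: " + FitsWithin(big, 2097152));
        Console.WriteLine("Fits raised limit: " + FitsWithin(big, int.MaxValue));
    }
}
```

2097152 characters is the serializer's documented default, so a response only needs to grow past that size under volume to start failing.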
Troubleshooting – just one more thing
Quickest way to dump on an exception - DebugDiag
Troubleshooting – final piece of the puzzle
Tricky part, using WinDbg to find the values
Which thread had the exception - !Threads
Get the thread’s call stack - !ClrStack
  JavaScriptSerializer.Serialize takes a StringBuilder …
List objects in the stack - !DumpStackObjects (!dso)
Get the object’s fields and values - !DumpObj (!do)
The Tools in Use
Fiddler
  HTTP(S) proxy and web debugger
  Inspect, create, and manipulate HTTP(S) traffic
  View message content according to its type, such as image, XML/JSON, and JS
  Record traffic, save for later inspection, or export as web tests
  http://www.fiddlertool.com
IIS Failed Request Tracing
  Troubleshoot request/response processing failures
  Collects traces from IIS modules, ASP.NET pipeline, and your own trace messages
  Writes each HTTP context’s trace messages to a separate file
  Create trace file on: status code, execution time, event severity
  http://www.iis.net/learn/troubleshoot/using-failed-request-tracing
The Tools in Use
Decompilers
  Browse content of .NET assemblies (.dll and .exe)
  Decompile IL to C# or VB
  Find usage of a field/method/property
  Some tools support extensions and Visual Studio integration
  http://ilspy.net
  https://www.jetbrains.com/decompiler
  http://www.telerik.com/products/decompiler.aspx
The Tools in Use
DebugDiag
  Memory dump collector and analyzer
  Can generate stack trees, mini dumps, and full dumps
  Automatic dump on crash, hung requests, perf. counter triggers, etc.
  Contains an analysis tool that scans dump files for known issues
  https://www.microsoft.com/en-us/download/details.aspx?id=49924
WinDbg
  Managed and native debugger, for processes and memory dumps
  Shows lists of threads, stack trees, and stack memory
  Query the managed heap(s), object content, and GC roots
  Various extensions to view HTTP requests, detect deadlocks, etc.
  https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk
Leaking Memory In .NET – It Is Possible!
Client
  Local insurance company
Reported
  Worker process memory usage increases over time
  Not sure if it’s a managed or a native issue
Workaround
  Increase application pool recycling to twice a day
Troubleshooting
First, need to know if the leak is native or managed
Checked process memory with Sysinternals VMMap
Looking at multiple snapshots, seems to be managed (.NET) related
Troubleshooting - cont'd
Time to get some memory dumps
  Need several dumps, so we can compare them
  Very simple to do, using Windows Task Manager
Next, open them and compare memory heaps
Troubleshooting - cont'd
Compared the dumps with Visual Studio 2015 (requires the Enterprise edition)
Troubleshooting - cont'd
Didn’t take long to notice the culprit and reason
  Hundreds of DimutFile objects, each containing large byte arrays
Troubleshooting - cont'd
These objects were not “leaked”, they were cached!
Recommended fix included
  Do not cache many large objects
  Cache using an expiration (sliding / fixed)
Troubleshooting – wait a second…
The memory diff. had another suspicious leak
  Why are we leaking the HomeController?
Troubleshooting - cont'd
Checked roots
  Controller is also cached, why?
  Referenced by the CacheItemRemovedCallback event
Troubleshooting - cont'd
Checked the code one last time
  CacheItemRemoved is registered to the event, but it is an instance method
  Note - adding an instance method to a global event may leak its containing object
The fix - change the callback method to static
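
Why a static callback avoids the leak can be sketched without ASP.NET. Everything here is hypothetical: `GlobalCache` stands in for the process-wide cache that holds removal callbacks, and the two controller classes are made up to contrast an instance handler with a static one. A delegate to an instance method carries a reference to its object, so a long-lived holder of that delegate keeps the object reachable.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a process-wide cache that keeps callbacks alive.
public static class GlobalCache
{
    public static event Action<string> ItemRemoved;

    public static void Clear() { ItemRemoved = null; }
}

public sealed class LeakyController
{
    public LeakyController()
    {
        // Instance method: the delegate's Target is this object, so the static event roots it.
        GlobalCache.ItemRemoved += OnItemRemoved;
    }

    private void OnItemRemoved(string key) { }
}

public sealed class FixedController
{
    public FixedController()
    {
        // Static method: the delegate has no Target, so the controller is not referenced.
        GlobalCache.ItemRemoved += OnItemRemoved;
    }

    private static void OnItemRemoved(string key) { }
}

public static class CallbackLeakDemo
{
    // Creates the object in a separate, non-inlined method so no local keeps it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference Track(Func<object> create)
    {
        return new WeakReference(create());
    }

    public static void Main()
    {
        WeakReference leaky = Track(() => new LeakyController());
        WeakReference fixedOne = Track(() => new FixedController());
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine("Leaky controller still reachable: " + leaky.IsAlive);
        Console.WriteLine("Fixed controller still reachable: " + fixedOne.IsAlive);
    }
}
```

The leaky controller stays reachable for as long as the event holds its delegate; whether the fixed one is already collected at the time of the check depends on the garbage collector, so the second line is indicative rather than guaranteed.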
The Tools in Use
Sysinternals VMMap
  Helps in understanding and optimizing memory usage
  Shows a breakdown of the process memory types
  Displays virtual and physical memory
  Can show a detailed memory map of address spaces and usage
  https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx
Visual Studio managed memory debug (Enterprise)
  Part of Visual Studio’s dump debugger
  Displays list of object types and their inclusive/exclusive sizes
  Tracks each object’s root paths
  Compare memory heaps between dump files
  https://msdn.microsoft.com/en-us/library/dn342825.aspx
When SSL/TLS Fails…
Client
  Airport shuttle service site
Reported
  Application suddenly fails to communicate with external services over HTTPS
  Error is “Could not establish trust relationship for the SSL/TLS secure channel”
  Cannot reproduce the error in dev/test
Workaround
  Restart IIS (iisreset.exe)
Troubleshooting
Checked Event Viewer for any related error
Found the SSL/TLS error in the Application and System logs
According to the MSDN documentation, error code 70 means the protocol version is not supported
Troubleshooting - cont'd
Used Microsoft Message Analyzer (network sniffer) to watch the TLS handshake messages
  Before issue starts – client asks for TLS 1.0, handshake completes
  After issue starts – client asks for TLS 1.2, handshake stops
Troubleshooting - cont'd
Checked the Server Hello, it returned TLS 1.1, not 1.2
Switched to TCP view to verify client’s behavior
  Client indeed sends a FIN, and server responds with an RST
Troubleshooting – moment of clarity
Developer remembered adding code to support the new PayPal standards of using only TLS 1.2
Code set to only use TLS 1.2, removing support for TLS 1.0 and 1.1
Suggested fix
  Use enum flags to support all TLS versions – Tls | Tls11 | Tls12
  This is the actual default for .NET 4.6 and on
  For .NET 4.5.2 and below – default is Ssl3 | Tls
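
The difference between replacing and combining the protocol flags can be sketched with the SecurityProtocolType enum. The slides do not show the application's code; assigning the value to ServicePointManager.SecurityProtocol is an assumption about how it was applied, and the class and method names below are hypothetical.

```csharp
using System;
using System.Net;

public static class TlsProtocolFlags
{
    // What the problematic code effectively did: only TLS 1.2 remains enabled.
    public static SecurityProtocolType OnlyTls12()
    {
        return SecurityProtocolType.Tls12;
    }

    // Combining flags keeps older versions available for servers that lack TLS 1.2.
    public static SecurityProtocolType AllTlsVersions()
    {
        return SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
    }

    public static void Main()
    {
        // In an application, the chosen value would presumably be assigned to
        // ServicePointManager.SecurityProtocol (assumption, not shown in the slides).
        Console.WriteLine("Only 1.2 allows TLS 1.1: " + OnlyTls12().HasFlag(SecurityProtocolType.Tls11));
        Console.WriteLine("Combined allows TLS 1.1: " + AllTlsVersions().HasFlag(SecurityProtocolType.Tls11));
    }
}
```

Because the setting is process-wide, one component restricting it to TLS 1.2 affects every outgoing HTTPS call in the application, including calls to servers that only offer TLS 1.1.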
The Tools in Use
Microsoft Message Analyzer
  Replaces Microsoft’s Network Monitor (NetMon)
  Captures, displays, and analyzes network traffic
  Can listen on local/remote NICs, loopback, Bluetooth, and USB
  Supports capturing HTTPS pre-encryption, using Fiddler proxy component
  https://www.microsoft.com/en-us/download/details.aspx?id=44226
Event Viewer (eventvwr.exe)
  Discussed previously
Additional Tools (for next time…)
Process monitoring
  IIS Request Monitoring, Sysinternals Process Monitor
Tracing and logs
  PerfView (CLR/ASP.NET/IIS ETW tracing), IIS/HTTP.sys logs, IIS Advanced Logging, Log Parser Studio
Dumps
  Sysinternals ProcDump, DebugDiag Analysis
Network sniffers
  Wireshark
How to Start?
Understand what is happening
Be able to reproduce the problem “on-demand”
Choose the right tool for the task
When in doubt – get a memory dump!
Resources
You had them throughout the slides
My Info
@IdoFlatow // [email protected] // http://www.idoflatow.net/downloads