ADVANCED trouble-shooting of real-time systems
Bernd Hufmann, Ericsson
ADVANCED trouble-shooting of critical real-time systems | © Ericsson AB 2017 | 2017-02-21 | Page 2
AGENDA
1 Introduction
2 Trace Compass Overview
3 Timing Analysis
4 References
5 Q&A
Trace Compass Overview
› Troubleshooting tool
› Framework to build trace visualization and analysis tools
› Scalable: handles traces larger than the available memory
› Extensible to any trace or log format: binary, text, XML, etc.
› Reusable views and widgets
› Available as a standalone product or as a set of plug-ins
Trace Compass Overview
Data Flow
COMMON Features
› Events Table
› Searching
› Filtering
› Highlighting
› Trace annotations (bookmarks) and markers
STATEFUL Analyses
XML Analysis & Views
› Pattern analysis
– Find a sequence of data within a trace
› Customize Trace Compass without adding code
– Generate state systems
– Do timing analysis
– Define specialized views
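Conceptually, a state system records, for each attribute, the values it takes over time, so that the state at any timestamp can be queried afterwards. The toy in-memory model below illustrates only that idea; `ToyStateSystem`, `modify`, `query` and the attribute path `CPUs/0/Status` are hypothetical names, not the Trace Compass state system API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a state system (hypothetical, not the Trace Compass API).
final class ToyStateSystem {
    // attribute path -> (change time -> value that holds from that time on)
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();

    /** Records that the attribute takes the given value from time t onward. */
    void modify(String attribute, long t, String value) {
        history.computeIfAbsent(attribute, k -> new TreeMap<>()).put(t, value);
    }

    /** Returns the attribute's value at time t, or null if it had none yet. */
    String query(String attribute, long t) {
        TreeMap<Long, String> changes = history.get(attribute);
        if (changes == null) {
            return null;
        }
        // The most recent change at or before t determines the state at t.
        Map.Entry<Long, String> change = changes.floorEntry(t);
        return change == null ? null : change.getValue();
    }

    public static void main(String[] args) {
        ToyStateSystem ss = new ToyStateSystem();
        ss.modify("CPUs/0/Status", 0, "IDLE");
        ss.modify("CPUs/0/Status", 100, "RUNNING");
        ss.modify("CPUs/0/Status", 250, "IDLE");
        System.out.println(ss.query("CPUs/0/Status", 150)); // prints RUNNING
    }
}
```

A sorted map per attribute makes each point query a single `floorEntry` lookup; a real implementation must additionally scale to traces that do not fit in memory.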
Call Stack View
› Extensible view to display call stacks over time
› Sources: LTTng-UST and GCC's -finstrument-functions
Trace Correlation
› Trace Compass can open multiple traces together and view them as one
– This is called an Experiment
› Useful for
– Traces coming from multiple nodes
– Traces from applications written in different languages
– Traces from different layers (network, etc.)
› Traces can be synchronized in time
– Manually
– Automatically, with an extensible algorithm
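Viewing several traces as one boils down to merging their events into a single time-ordered stream. A minimal sketch, assuming each trace is already sorted by timestamp and that manual synchronization is a constant clock offset per trace: a k-way merge driven by a priority queue. `TraceEvent` and `ExperimentMerger` are hypothetical names, not Trace Compass API, and the automatic synchronization algorithm (which estimates the clock relationship from matching events) is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical event type: one timestamped event from one trace of an experiment.
record TraceEvent(String trace, long timestampNs, String name) {}

final class ExperimentMerger {
    // Read position in one trace plus the constant clock offset applied to it.
    private static final class Cursor {
        final List<TraceEvent> events;
        final long offsetNs;
        int index = 0;

        Cursor(List<TraceEvent> events, long offsetNs) {
            this.events = events;
            this.offsetNs = offsetNs;
        }

        long headTime() {
            return events.get(index).timestampNs() + offsetNs;
        }
    }

    /**
     * Merges traces (each sorted by timestamp) into one time-ordered list.
     * offsetsNs[i] is added to every timestamp of traces.get(i) (manual synchronization).
     */
    static List<TraceEvent> merge(List<List<TraceEvent>> traces, long[] offsetsNs) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>(Comparator.comparingLong(Cursor::headTime));
        for (int i = 0; i < traces.size(); i++) {
            if (!traces.get(i).isEmpty()) {
                queue.add(new Cursor(traces.get(i), offsetsNs[i]));
            }
        }
        List<TraceEvent> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll(); // the trace whose next event is the earliest
            TraceEvent e = c.events.get(c.index);
            merged.add(new TraceEvent(e.trace(), e.timestampNs() + c.offsetNs, e.name()));
            c.index++;
            if (c.index < c.events.size()) {
                queue.add(c); // re-queue with its new head timestamp
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<TraceEvent> nodeA = List.of(new TraceEvent("A", 10, "send"));
        List<TraceEvent> nodeB = List.of(new TraceEvent("B", 5, "recv"));
        // Assumption for the example: trace B's clock runs 20 ns behind trace A's.
        merge(List.of(nodeA, nodeB), new long[] {0, 20}).forEach(System.out::println);
    }
}
```

The merge reads each trace sequentially and keeps only one cursor per trace in memory, which fits the goal of handling traces larger than memory.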
Built-in Trace Types
› Linux Trace Toolkit: next generation (LTTng), UST and kernel
› Text & XML logs (custom parsers)
› Common Trace Format (CTF)
– Application, kernel, hardware, bare metal, etc.
› Packet capture (pcap)
› Best Trace Format (BTF)
› GDB tracepoints
TIMING ANALYSIS
› In real-time systems there are two metrics to analyze: what the data is and when it arrived
› Timing is as important as the data
› Measure the time between a start state and an end state
– Simple case: a start event and an end event
– Often: a state machine determines the start and the end
› Represents execution times, latencies, latency chains, etc.
› Locate timing problems
– Missed deadlines
– Potentially missed deadlines (find the problem before it occurs)
› Analyze timing problems
– Find the root cause and a solution
– Solve sporadic problems that are difficult to debug
EXAMPLE
› Soft IRQ Latency
Figure: softirq_raise → softirq_handler_entry (Latency 1) → softirq_handler_exit (Latency 2); Total Latency spans softirq_raise to softirq_handler_exit
Parameters: CPU ID, IRQ #
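The soft IRQ example is a small state machine, instantiated once per analysis parameter pair (CPU ID, IRQ #). The Java sketch below assumes a hypothetical `KernelEvent` record carrying the event names used on the slide, a nanosecond timestamp, the CPU and the softirq vector; real LTTng kernel event names and fields may differ. For each handled soft IRQ it emits Latency 1 (raise to handler entry), Latency 2 (handler entry to exit) and their sum.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical kernel event: name, timestamp (ns), CPU ID and softirq vector (IRQ #).
record KernelEvent(String name, long ts, int cpu, int vec) {}

// One handled soft IRQ: Latency 1 (raise -> entry) and Latency 2 (entry -> exit).
record SoftIrqLatency(int cpu, int vec, long raiseTs, long latency1, long latency2) {
    long total() {
        return latency1 + latency2;
    }
}

final class SoftIrqAnalysis {
    // State of the in-flight soft IRQ for one (CPU, vector) pair; -1 means "not seen".
    private static final class Pending {
        long raiseTs = -1;
        long entryTs = -1;
    }

    static List<SoftIrqLatency> analyze(List<KernelEvent> events) {
        Map<String, Pending> state = new HashMap<>();
        List<SoftIrqLatency> results = new ArrayList<>();
        for (KernelEvent e : events) {
            // One state machine instance per parameter pair (CPU ID, IRQ #).
            Pending p = state.computeIfAbsent(e.cpu() + "/" + e.vec(), k -> new Pending());
            switch (e.name()) {
                case "softirq_raise" -> {
                    // Simplification: a raise while one is already pending is ignored.
                    if (p.raiseTs < 0) {
                        p.raiseTs = e.ts();
                    }
                }
                case "softirq_handler_entry" -> {
                    if (p.raiseTs >= 0) {
                        p.entryTs = e.ts();
                    }
                }
                case "softirq_handler_exit" -> {
                    if (p.raiseTs >= 0 && p.entryTs >= 0) {
                        results.add(new SoftIrqLatency(e.cpu(), e.vec(), p.raiseTs,
                                p.entryTs - p.raiseTs, e.ts() - p.entryTs));
                    }
                    p.raiseTs = -1; // back to the initial state
                    p.entryTs = -1;
                }
                default -> {
                    // Other events do not affect this state machine.
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<KernelEvent> trace = List.of(
                new KernelEvent("softirq_raise", 1_000, 0, 3),
                new KernelEvent("softirq_handler_entry", 1_400, 0, 3),
                new KernelEvent("softirq_handler_exit", 2_100, 0, 3));
        for (SoftIrqLatency l : analyze(trace)) {
            System.out.println(l + " total=" + l.total());
        }
    }
}
```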
Generalization
› Time between start and end
› Time for each transition
› Percentage of each sub-duration relative to the total
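All three generalized measurements follow from the timestamps at which a matched sequence entered each of its states. A minimal sketch, assuming those states and timestamps are already known; `TransitionBreakdown` is a hypothetical helper. Because consecutive differences telescope, the per-transition durations always add up to the total.

```java
import java.util.ArrayList;
import java.util.List;

final class TransitionBreakdown {
    // One transition between consecutive states of a matched sequence.
    record Transition(String from, String to, long durationNs, double percentOfTotal) {}

    /**
     * states.get(i) was entered at timestampsNs[i] (non-decreasing). The total is
     * the time between the first (start) and the last (end) state.
     */
    static List<Transition> breakdown(List<String> states, long[] timestampsNs) {
        if (states.size() != timestampsNs.length || timestampsNs.length < 2) {
            throw new IllegalArgumentException("need at least two matching states/timestamps");
        }
        long total = timestampsNs[timestampsNs.length - 1] - timestampsNs[0];
        List<Transition> out = new ArrayList<>();
        for (int i = 1; i < timestampsNs.length; i++) {
            long d = timestampsNs[i] - timestampsNs[i - 1];
            double pct = total == 0 ? 0.0 : 100.0 * d / total;
            out.add(new Transition(states.get(i - 1), states.get(i), d, pct));
        }
        return out;
    }

    public static void main(String[] args) {
        breakdown(List.of("raise", "entry", "exit"), new long[] {100, 120, 150})
                .forEach(System.out::println);
    }
}
```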
Your Timing Analysis
› Define a state machine for timing analysis
– Either implemented in Java as a Trace Compass extension
– Or defined as data-driven pattern matching (in XML)
› Define timing analyses on the fly
› Store the results in a built-in segment store
› Visualize the data in the various supplied views
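A segment store holds the start/end intervals that a timing analysis produces and answers range queries for the views. The toy version below keeps segments sorted by start time and scans for overlaps; `Segment` and `SimpleSegmentStore` are hypothetical names, and this is not the built-in Trace Compass segment store or its API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One segment: a matched start/end interval produced by a timing analysis.
record Segment(long start, long end, String label) {
    long duration() {
        return end - start;
    }
}

// Toy in-memory segment store (hypothetical; not the Trace Compass segment store API).
final class SimpleSegmentStore {
    private final List<Segment> segments = new ArrayList<>();
    private boolean sorted = true;

    void add(Segment s) {
        if (s.end() < s.start()) {
            throw new IllegalArgumentException("end before start");
        }
        if (!segments.isEmpty() && s.start() < segments.get(segments.size() - 1).start()) {
            sorted = false; // re-sort lazily on the next query
        }
        segments.add(s);
    }

    /** Segments overlapping the closed range [from, to], ordered by start time. */
    List<Segment> intersecting(long from, long to) {
        if (!sorted) {
            segments.sort(Comparator.comparingLong(Segment::start));
            sorted = true;
        }
        List<Segment> out = new ArrayList<>();
        for (Segment s : segments) {
            if (s.start() > to) {
                break; // sorted by start: no later segment can overlap
            }
            if (s.end() >= from) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleSegmentStore store = new SimpleSegmentStore();
        store.add(new Segment(0, 10, "a"));
        store.add(new Segment(30, 40, "b"));
        System.out.println(store.intersecting(5, 20));
    }
}
```

The linear scan keeps the sketch short; a store meant for large traces would use an index such as an interval tree instead.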
Visualization
› Table
– Get raw data
– Explore data
– Sorting, highlighting, filtering
› Scatter Chart
– Latency vs. time
– Get the big picture of the current range
› Statistics
– Min, max, average, etc.
– Find the worst offenders
– Find the worst possible offender combination
› Distribution Chart
– Find outliers and modes easily
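The statistics and the distribution chart reduce to simple computations over segment durations. A minimal sketch, assuming durations are given as a `long[]` in nanoseconds and reading "worst offenders" as the longest segments; `LatencyStats`, `worstOffenders` and `histogram` are hypothetical names. A sparsely populated high bucket in the histogram is what an outlier looks like in a distribution chart.

```java
import java.util.Arrays;

// Summary statistics over segment durations (e.g. latencies in ns).
final class LatencyStats {
    final long min;
    final long max;
    final double average;

    LatencyStats(long[] durations) {
        if (durations.length == 0) {
            throw new IllegalArgumentException("no durations");
        }
        long mn = Long.MAX_VALUE;
        long mx = Long.MIN_VALUE;
        double sum = 0;
        for (long d : durations) {
            mn = Math.min(mn, d);
            mx = Math.max(mx, d);
            sum += d;
        }
        min = mn;
        max = mx;
        average = sum / durations.length;
    }

    /** Indices of the n longest durations ("worst offenders"), longest first. */
    static int[] worstOffenders(long[] durations, int n) {
        Integer[] idx = new Integer[durations.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Long.compare(durations[b], durations[a]));
        return Arrays.stream(idx).limit(n).mapToInt(Integer::intValue).toArray();
    }

    /** Counts per equal-width bucket over [min, max], for a distribution chart. */
    static int[] histogram(long[] durations, int buckets) {
        LatencyStats s = new LatencyStats(durations);
        long width = (s.max - s.min) / buckets + 1; // +1 keeps max inside the last bucket
        int[] counts = new int[buckets];
        for (long d : durations) {
            counts[(int) ((d - s.min) / width)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] latencies = {10, 20, 30, 40, 100};
        LatencyStats s = new LatencyStats(latencies);
        System.out.println("min=" + s.min + " max=" + s.max + " avg=" + s.average);
        System.out.println(Arrays.toString(histogram(latencies, 3)));
    }
}
```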
Example Root Causes
› System overload
› System misconfiguration (e.g., wrong task priorities)
› Priority inversion
– A lower-priority task (indirectly) blocks a higher-priority task
› Blocked threads, starvation, deadlock
› Slow code
Resources View
› Displays resource states (color-coded) over time
– CPUs, IRQs, SoftIRQs
Critical Path
› Displays the system wait chains for a given process
PRIORITY VIEW
› Groups processes per CPU and priority
› Quickly find priority inversion or misconfigured task priorities
› Note: the view is not mainlined yet; it is a prototype!
FUTEX Analysis
› Find contention at the kernel level using LTTng
› Realized as an XML pattern analysis
› Counts simultaneous waits
› Shows the results in the timing analysis views
› Gantt chart of futex address (uaddr) vs. thread
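The "count of simultaneous waits" amounts to a sweep over wait intervals, grouped by futex address. Trace Compass realizes this analysis in XML; the Java sketch below is only a standalone illustration, assuming the wait intervals (`FutexWait`, a hypothetical record) have already been extracted by pairing each thread's futex wait syscall entry with its exit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical futex wait interval: thread tid waited on uaddr from start to end (ns).
record FutexWait(long uaddr, int tid, long start, long end) {}

final class FutexContention {
    /** For each uaddr, the maximum number of threads waiting on it at the same time. */
    static Map<Long, Integer> maxSimultaneousWaits(List<FutexWait> waits) {
        // Boundaries per futex address: +1 when a wait starts, -1 when it ends.
        Map<Long, List<long[]>> edges = new HashMap<>();
        for (FutexWait w : waits) {
            List<long[]> l = edges.computeIfAbsent(w.uaddr(), k -> new ArrayList<>());
            l.add(new long[] {w.start(), +1});
            l.add(new long[] {w.end(), -1});
        }
        Map<Long, Integer> result = new HashMap<>();
        for (Map.Entry<Long, List<long[]>> e : edges.entrySet()) {
            List<long[]> l = e.getValue();
            // Sort by time; at equal times ends (-1) come before starts (+1),
            // so back-to-back waits are not counted as overlapping.
            l.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
            int current = 0;
            int max = 0;
            for (long[] edge : l) {
                current += (int) edge[1];
                max = Math.max(max, current);
            }
            result.put(e.getKey(), max);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FutexWait> waits = List.of(
                new FutexWait(0x1000L, 1, 0, 100),
                new FutexWait(0x1000L, 2, 50, 150));
        System.out.println(maxSimultaneousWaits(waits));
    }
}
```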
OS Tracing Overview
› Overloaded resources
› CPU, memory and I/O usage
› Counter-intuitive example: CPU usage is too low
– Kernel memory usage is rising: find the offending process
– I/O usage is high: maybe it's swapping
– Too many seeks?
› Low I/O, low CPU, low memory usage and low bandwidth
FLAME GRAPH View
› Aggregation of function durations per call stack
› Highlights the most time-consuming execution paths
› Find candidate functions for performance optimization
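A flame graph aggregates durations by call path rather than by time. A minimal sketch for a single thread, assuming properly nested entry/exit events (`CallEvent` is hypothetical, e.g. derived from `-finstrument-functions` instrumentation). All calls to the same function from the same caller path merge into one node, whose width in the graph would correspond to `totalNs`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical function entry/exit event from one thread.
record CallEvent(boolean entry, String function, long ts) {}

final class FlameNode {
    final String function;
    long totalNs; // summed duration of all calls at this call-stack position
    long calls;
    final Map<String, FlameNode> children = new LinkedHashMap<>();

    FlameNode(String function) {
        this.function = function;
    }

    /** Time spent in this node itself, excluding its callees. */
    long selfNs() {
        long childSum = 0;
        for (FlameNode c : children.values()) {
            childSum += c.totalNs;
        }
        return totalNs - childSum;
    }

    /** Aggregates properly nested entry/exit events into a call-path tree. */
    static FlameNode build(List<CallEvent> events) {
        FlameNode root = new FlameNode("<root>");
        Deque<FlameNode> nodes = new ArrayDeque<>();
        Deque<Long> starts = new ArrayDeque<>();
        nodes.push(root);
        for (CallEvent e : events) {
            if (e.entry()) {
                // Same function under the same caller path -> same node.
                FlameNode child = nodes.peek().children.computeIfAbsent(e.function(), FlameNode::new);
                nodes.push(child);
                starts.push(e.ts());
            } else if (nodes.size() > 1) { // ignore an exit without a matching entry
                FlameNode done = nodes.pop();
                done.totalNs += e.ts() - starts.pop();
                done.calls++;
            }
        }
        return root;
    }

    public static void main(String[] args) {
        FlameNode root = build(List.of(
                new CallEvent(true, "main", 0),
                new CallEvent(true, "work", 10),
                new CallEvent(false, "work", 90),
                new CallEvent(false, "main", 100)));
        FlameNode mainNode = root.children.get("main");
        System.out.println("main total=" + mainNode.totalNs + " self=" + mainNode.selfNs());
    }
}
```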
Future Development
› User-configurable periodic markers
› Custom charts
› Enhanced call graph analysis and views
› Call stack views using data-driven analysis
› Pinning and cloning of views
› Time-based import of traces/experiments
› Scalable segment store
› Enhanced searching, filtering and highlighting in Gantt charts
› Data-driven analysis and view enhancements
› Cropping of traces
› Priority view
› …
REFERENCES
› Project pages
– http://tracecompass.org
– http://projects.eclipse.org/projects/tools.tracecompass
› Documentation
– Trace Compass User Guide
– Trace Compass Developer Guide
› Linux Trace Toolkit: next generation (LTTng)
– http://lttng.org/
› Diagnostic and Monitoring Working Group
– http://diamon.org/
› Common Trace Format (CTF)
– http://diamon.org/ctf/
› Trace Research Project
– http://hsdm.dorsal.polymtl.ca/
CONTACTS
› Mailing list
› IRC
– oftc.net #tracecompass
› Mattermost
– https://mattermost-test.eclipse.org/eclipse/channels/trace-compass
Custom Parsers
› Custom text and XML parsers
– Text: line-based parser using regular expressions
– XML: extracts data from XML elements and their attributes
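A line-based custom text parser essentially applies a regular expression to each line and maps the capture groups to event fields. A minimal sketch in plain Java, assuming a hypothetical log format `<timestamp in µs> <LEVEL> <message>`; in Trace Compass such parsers are defined through the UI rather than in Java code, and this sketch simply skips lines that do not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One parsed log entry; the field layout follows a hypothetical example format.
record LogEntry(long timestampUs, String level, String message) {}

final class LineParser {
    // Assumed format: "<timestamp in us> <LEVEL> <message>"
    private static final Pattern LINE =
            Pattern.compile("^(\\d+)\\s+(INFO|WARN|ERROR)\\s+(.*)$");

    /** Parses every matching line; non-matching lines are skipped. */
    static List<LogEntry> parse(List<String> lines) {
        List<LogEntry> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                // Capture groups map to event fields: timestamp, level, message.
                out.add(new LogEntry(Long.parseLong(m.group(1)), m.group(2), m.group(3)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        parse(List.of("1000 INFO started", "1500 ERROR disk full")).forEach(System.out::println);
    }
}
```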
EXAMPLE
› High-resolution timer: the cyclictest application of rt-tests
› Latency from timer expiry until the task starts
› Latency = Δ1 + Δ2 + Δ3 + Δ4
Figure: timeline of Events 1, 2, 3, 4, 5 with Δ1 to Δ4 between consecutive events; the task then runs until the next Event 1
› Event 1: Timer expires
› Event 2: Interrupt begins executing
› Event 3: Interrupt handler wakes up the task (marks it runnable)
› Event 4: Linux scheduler switches to the task
› Event 5: Application task begins executing
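Writing t_i for the timestamp of Event i (notation introduced here), each Δ is the gap between two consecutive events, and the sum telescopes:

```latex
\begin{aligned}
\Delta_i &= t_{i+1} - t_i, \qquad i = 1, \dots, 4 \\
\text{Latency} &= \sum_{i=1}^{4} \Delta_i
  = (t_2 - t_1) + (t_3 - t_2) + (t_4 - t_3) + (t_5 - t_4)
  = t_5 - t_1
\end{aligned}
```

The total is therefore simply the time from timer expiry (Event 1) to the start of the application task (Event 5), while the individual Δ values show which stage contributes most to it.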