keystone real time trace workshop - texas...
TRANSCRIPT
TI Confidential – NDA Restrictions 1
Keystone Real Time Trace Workshop March 2013
Vikas Varshney, Engineering Manager – Debug and Emulation
Imtaz Ali, Engineering Manager – Analysis Tools
Agenda
• Tools roadmap overview (15 minutes)
• Multicore debug and trace features (15 minutes)
• Workshops (120 minutes)
– Setup overview
– WS1: Getting started with DSP trace
– WS2: Customize DSP trace for data tracing
– WS3: Hotspot analysis with function profiling, stalls, and cache analysis
– WS4: Getting started with non-intrusive system trace (STM) SoC profiling
– WS5: Customize SoC profiling for DDR and MSMC bandwidth and latency analysis
– WS6: DDR bandwidth analysis and latency by interfacing embedded APIs (cToolsLib)
– WS7: Command line trace decode interface for offline decode
– WS8: Getting started with Cortex A program execution trace and function profiling
• SerDes based trace receiver – preview (15 minutes)
• Q & A (15 minutes)
Roadmaps
CCS Keystone support roadmap
[Figure: CCS release roadmap, 4Q’12 to 4Q’13. Legend: Production, Early Adopter, In Development, Future.]
CCS v5: Eclipse 3.8, GUI Composer, System Analyzer 1.1, XDS2xx support, enabler for Keystone2
• Releases shown: 5.3.0, 5.4.0, 5.5.0 (dates shown: 26 Nov’12, 15 Apr’13, Aug’13)
• Features shown: Trace Analyzer (Code Coverage, Statistical Function Profiling); Windows 8 host support; Trace Usability Improvements I; CGT Optimizer Assistant; Keystone2 simulators; Edison / Lamarr support; enhancements to System and Trace Analysis; CGT C6000 v8; smaller downloadable images
CCS v6: Eclipse v4.3, Cortex A15 hypervisor debug support, Trace Usability Improvements II
• Milestones shown: CCSv6 EA, 6.0.0 Beta, CCSv6 GA (dates shown: Apr’13 to Jun’13, Jul’13 to Oct’13, Dec’13)
• Features shown: Eclipse v4.3 (pre-release version); partial / incremental implementation of new features and enhancements; bug fixes; Code Coverage; Statistical Function Profiler; Context Aware Function Profiler; CGT C6000 v8 Beta; new installer framework (InstallBuilder)
XDS roadmap
[Figure: XDS emulator roadmap. Timeline: Y10, 1Q11, 2Q11, 3Q11, 4Q11, 1H12, 2H12, Future. Legend: Production, Early Adopter/Beta, In Development, Plan/Future.]
• XDS100 v2 (< $100): entry-level JTAG; USB 2.0; TI 14 and CTI 20 native; open HW reference design
• XDS100 v3 (cJTAG) (< $150): entry-level JTAG and cJTAG; USB 2.0; TI 14 and CTI 20 native; open HW reference design
• XDS200 (~$299): performance JTAG and cJTAG at low cost; ARM SWD and SWO support; USB 2.0 and optional ENET; low-cost embedded emulator option; bidirectional GPIOs for instrumentation; TI, MIPI, and ARM connector options; 3P EPK licensed
• XDS510 ($1.5K): performance JTAG; USB 2.0; TI 14 and CTI 20 native; 3P EPK licensed
• XDS560 ($3K): high-performance JTAG; PCI and USB 2.0; TI 14 and CTI 20 native; 3P EPK licensed
• XDS560 v2 STM ($1.5K): high-performance JTAG and cJTAG; USB 2.0 and ENET; STM receiver, 4 pin @ 100 MHz; 128 MB storage; MIPI 60 native; 3P EPK licensed
• XDS560 v2 derivatives (~1K)
• XDS560T Trace Receiver ($10K): external DSP trace receiver; USB 2.0; 10 pin @ 167 MHz DDR; 64 MB trace storage buffer; auto skew and jitter calibration; TI 60 native; discontinued
• XDS Pro Trace Receiver ($3.5K): multi-channel ARM, DSP, and STM; USB 2.0 and ENET; 32 pin @ 250 MHz DDR; 2 GB trace storage buffer; auto skew and jitter calibration; MIPI 60 native / add-on cable to any XDS; 3P EPK licensed
• Future: on-chip
Multicore debug and trace features
Keystone debug and trace – key goals
• Efficiency: multicore debug IP within SoC
• Consistency: across all KeyStone devices
• Product Life Cycle: development and deployment
• Eco System Enablement: 3P framework and low cost tools
Keystone debug & trace strategy
Multicore Interactions
• Debugging cores independently or as a group
• Synchronization and execution correlation
• Runtime visibility into data flows spanning multiple cores
Application Optimization
• CPU loading and load balancing
• Actual application execution sequence
• Impact of cache and CPU stalls
Optimize Data Flow in the System
• Bus transaction visibility
• Characterization of interface throughput and bottlenecks
Keystone-1 (debug & trace view)
[Block diagram: Keystone-1 debug and trace view. Blocks shown, grouped approximately by subsystem:]
• DSP SS: C66x DSP (Shannon: 8 cores; Nyquist: 4 cores; Appleton: 4 cores) with JTAG, Trace, AET, ETB, and SW Msg
• ARM SS: Cortex A8 (Shannon: NA; Nyquist: NA; Appleton: 1 core) with JTAG, SW Msg, HW BP/WP, CTM, ETM Trace, PMU, ETB, Trace Funnel, Replicator, and TPIU
• CTools Debug SS: 1149.1, ICEpick, STM, ETB, XTI, Trace Pin Export, and STM Pin Export
• CP_Tracers on the Switch Fabric: DDR, L2 Memory, and Peripherals
Keystone-2 (debug & trace view)
[Block diagram: Keystone-2 debug and trace view. Blocks shown, grouped approximately by subsystem:]
• DSP SS: C66x DSP (Kepler: 8 cores) with JTAG, Trace, AET, ETB, and SW Msg
• Tetris SS: Cortex A15 (Kepler: 4 cores) with JTAG, SW Msg, HW BP/WP, CTM, PTM Trace, ARM STM, PMU, Trace Funnel, Replicator, and TPIU
• CTools Debug SS: 1149.1, ICEpick, STM, TBR, XTI, Trace Pin Export, and STM Pin Export
• CP_Tracers on the Switch Fabric: DDR, L2 Memory, and Peripherals
Debug capabilities summary
Feature                                     Shannon  Nyquist  Appleton        Kepler
JTAG debug                                  √        √        √               √
CoreSight ARM Debug for Cortex              --       --       √               √
DSP AET – HWBP, WP, & sequencer             √        √        √               √
DSP Trace – PC, data, & events              √        √        √               √
Cortex HWBP & WP                            --       --       √               √
Cortex Performance Measurement Units (PMU)  --       --       √               √
Cortex Processor Trace – PC & timing        --       --       √               √
Cortex Processor Trace – Data               --       --       √               --
STM SW messages                             √        √        √               √
STM CP Tracers                              17       16       18              32
On-Chip Trace Buffer – DSP Trace            4KB      4KB      4KB             4KB
On-Chip Trace Buffer – STM                  32KB     32KB     32KB            32KB (TBR)
On-Chip Trace Buffer – Cortex Trace         --       --       32KB            16KB (TBR)
Trace Export                                20 pins  20 pins  32 pins (TPIU)  32 pins (TPIU)
Embedded debug and analysis
• cToolsLib: enables embedded debug, trace setup, and analysis use cases
• Easy access to debug capabilities via simple C APIs
• Very low latency and small footprint (order of few KBs)
• Easy OS integration
• Integrated with MCSDK
• Easy import and data visualization via CCS
Field Deployed Debug and Trace
API                                  Shannon  Nyquist  Appleton  Kepler
AETLib                               √        √        √         √
ETBLib (with DMA draining support)   √        √        √         √
DSPTraceLib                          √        √        √         √
ETMLib                               --       --       √         √
STMLib                               √        √        √         √
STM Linux driver                     --       --       √         √
CPTLib                               √        √        √         √
Workshops
MCSDK Image processing demo overview
[Figure: image processing demo data flow]
• Core0 (master core) processing: File Read of the input bitmap image, then RGB extract & slicing into RGB Slice 0 (Core 0), RGB Slice 1 (Core 1), through RGB Slice 3 (Core 3)
• Core0-3 (slave core) processing, per slice: RGB to Y produces Luma (Y); IMGLIB Sobel filter produces ImageGradient; IMGLIB Threshold produces ImageEdge
• Core0 (master core) processing: combine the edge slices and create the bitmap image, then File Write of the output bitmap image
• This application shows an implementation of an image processing system using a simple multicore framework. It runs TI image processing kernels (a.k.a. imagelib) on multiple cores to process (e.g., edge detection) an input image from the host PC.
• For more details on MCSDK Image processing demo, please refer to: http://processors.wiki.ti.com/index.php/MCSDK_Image_Processing_Demonstration_Guide
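The slicing step can be pictured with a short sketch. It is not taken from the demo source: `split_rows` is a hypothetical helper, and it assumes each core receives a contiguous band of image rows, split as evenly as possible.

```python
def split_rows(height, n_cores):
    """Split image rows [0, height) into n_cores contiguous (start, stop) bands."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    base, extra = divmod(height, n_cores)
    bands = []
    start = 0
    for core in range(n_cores):
        # The first `extra` cores take one additional row each.
        stop = start + base + (1 if core < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


# A 1080-row image (such as the 1920x1080 demo bitmap) split across four cores.
print(split_rows(1080, 4))
```

With a 1080-row input and four cores, each band in this sketch covers 270 rows.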
Setup and Installation overview
• Hardware Setup
– XDS560v2 Pro Trace
– Nyquist (C6670) EVM
– Appleton (C6614) EVM (only required for ARM trace workshops)
– Ethernet cable connected between Nyquist EVM and host PC
• Software Installation
– Code Composer Studio v5.4
– BIOS-MCSDK v02.01.02.06 (+ patch 02.01.02.06P01) or newer
Setup and Installation overview
1. Download the following from the links specified on the previous page
i. CCS5.4.0.000xx_win32
ii. bios_mcsdk_02_01_02_06_setupwin32.exe (or newer)
iii. bios_mcsdk_02_01_02_06_patch01_setupwin32.exe (if using v2_01_02_06)
2. Install Code Composer Studio
3. Install BIOS MCSDK into c:\ti folder
4. Install BIOS MCSDK Patch (if required) into c:\ti folder
5. Start Code Composer Studio
6. Select a workspace when requested
7. Wait until CCS Add Discovered Products window comes up
8. Select OK
9. If warning pop-up, select OK
10. Say Yes to restarting CCS when requested
11. After CCS restarts, if requested to add other versions of NDK, select Cancel
12. Close TI Resource Explorer window
Steps to import and build the demo
1. From the CCS main menu, select Project -> Import Existing CCS Eclipse Project
2. Browse to folder C:\ti\mcsdk_2_01_02_06\demos\image_processing\ipc\evmc6670l
3. Select OK
4. Select the following projects
i. Image_processing_evmc6670l_slave
ii. Image_processing_evmc6670l_master
iii. Image_processing_evmc6670l_total_bandwidth_master
5. Select Finish
6. In Project Explorer, right-click each imported project and select Build Project
Steps to run the demo (i)
• To set up the demo in static IP mode, SW9 position 2 should be OFF. Other DIP switch settings, listed from position 1 to position 4:
– SW3: OFF, ON, ON, ON
– SW4: ON, ON, ON, ON
– SW5: ON, ON, ON, ON
– SW6: OFF, ON, ON, ON
– SW9: ON, OFF, ON, ON
• Connect the Nyquist EVM to the host PC using an Ethernet cable
• Connect the XDS560v2 Pro Trace to the EVM and to the PC (via USB). Power up the EVM and the XDS560v2.
• Change the host PC network settings to use static IP address 192.168.2.101.
– Go to “Control Panel” -> “Network and Sharing Center” -> “Change Adapter Settings”
– Right-click on “Local Area Network” and change the “Properties”
– If you prefer to use DHCP, the IP address is shown in the CCS console after the target is run.
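A quick sanity check of the static IP plan, as a sketch. It assumes a /24 netmask (255.255.255.0), which these steps do not state, and uses the EVM address 192.168.2.100 given in step (iv) of the demo steps. `same_subnet` is an illustrative helper, not part of CCS or the MCSDK.

```python
import ipaddress


def same_subnet(host_ip, target_ip, prefix_len=24):
    """Return True if target_ip lies in the host's subnet (prefix_len is assumed)."""
    network = ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(target_ip) in network


HOST_IP = "192.168.2.101"  # static address assigned to the host PC
EVM_IP = "192.168.2.100"   # EVM address used in the browser step

print(same_subnet(HOST_IP, EVM_IP))
```

If the check fails for the addresses actually configured, the browser step is unlikely to reach the EVM without a router in between.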
Steps to run the demo (ii)
• In CCS, set up a C6670 target configuration with the XDS560v2 Pro Trace USB connection
– Go to File -> New -> Target Configuration File
– Type the file name C6670_XDS560v2 and click Finish
– Select Spectrum Digital XDSPRO USB Emulator as the Connection
– Type C6670 in the Device field to filter the device names, then check TMS320C6670
– Click on Target Configuration under Advanced Setup (right-hand side)
– Select C66x_0 and, in the “initialization script” box (right-hand side), add <CCS_INSTALL>\ccsv5\ccs_base\emulation\boards\evmc6670l\gel\evm6670l.gel. Click on Save
– Select View -> Target Configurations to see a list of all configuration files
– Select the one you just created (C6670_XDS560v2.ccxml) under User Defined
– Launch the debug session by selecting Launch Selected Configuration in the context menu
• Group all four C66x cores into a single group
Steps to run the demo (iii)
• Connect the cores
• Load image_processing_evmc6670l_master.out (mcsdk_2_01_02_06\demos\image_processing\ipc\evmc6670l\master\no_instrumentation\Debug) on core 0.
• Load image_processing_evmc6670l_slave.out (mcsdk_2_01_02_06\demos\image_processing\ipc\evmc6670l\slave\no_instrumentation\Debug) on cores 1, 2, and 3.
• Run all the cores from the CCS debug view
Steps to run the demo (iv)
• Open a web browser and type 192.168.2.100 (the EVM’s IP address) in the address box. The interactive demo webpage appears.
• Select Number of Cores as “Four”
• Browse to the bitmap image evmc6678l_1920x1080_5_93MB.bmp (available at mcsdk_2_01_02_06\demos\image_processing\images)
• Click on Process
Steps to run the demo (v)
Image processing demo output includes details such as processing time and the output image.
WS1 Getting started with DSP trace
WS1 – Tracing program execution
1. If the Image Processing Demo is not already running, complete steps (i) to (iii) of Steps to run the demo, as described in the workshop setup
2. Select C66xx_0 in the debugger
3. In the menu, select Tools -> Hardware Trace Analyzer -> PC Trace to start PC Trace
Known issue: the first time this menu is selected, it may take a few seconds. The list of available analyses depends on the selected core(s) and their state.
4. Click the button (icon shown on the slide) to clear any saved/cached settings from a previous run
5. Set Trace Range = End at Address
6. Set End Address = convert_rgb_to_y
7. Select Start to open Trace Viewer
Callout: description of default configuration
WS1 – Tracing program execution
8. Trace Viewer status shows that the buffer has already wrapped; the collected data is shown only when recording ends
9. In this case, recording ends when convert_rgb_to_y is executed or when C66xx_0 is halted
10. Ignore the warning in the view for now. The clock frequency will be obtained when data collection stops
11. Complete step (iv) of Steps to run the demo (as described in the workshop setup) to process the image
12. Wait for Trace Viewer to show all collected data
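The "buffer wrapped" status is easier to picture with a small model of a circular trace buffer. This is only a sketch of the general idea (oldest records are overwritten once the buffer is full, so what remains is the history leading up to the end condition); `CircularTraceBuffer` is a hypothetical class and does not model the real ETB record format or size.

```python
from collections import deque


class CircularTraceBuffer:
    """Keeps only the most recent `capacity` records, like a wrapping trace buffer."""

    def __init__(self, capacity):
        self._records = deque(maxlen=capacity)
        self._written = 0

    def write(self, record):
        # Once full, deque(maxlen=...) silently drops the oldest record.
        self._records.append(record)
        self._written += 1

    @property
    def wrapped(self):
        return self._written > self._records.maxlen

    def contents(self):
        return list(self._records)


buf = CircularTraceBuffer(capacity=4)
for pc in range(0x100, 0x10A):  # ten made-up program addresses
    buf.write(pc)

print(buf.wrapped, [hex(pc) for pc in buf.contents()])
```

Only the last four addresses survive in this toy buffer, which mirrors why Trace Viewer shows the program addresses executed just before convert_rgb_to_y.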
WS1 – Analyze trace result
13. Grab and drag column borders to resize as needed
14. Grab and drag column headers to reposition columns as required
15. Trace Viewer shows the program addresses executed leading up to convert_rgb_to_y
Callouts: use the auto-fit button to fit all column widths; grab a column edge and drag to resize the column; grab a column header and drag to move the column
WS1 – View source code
1. In Trace Viewer, click on the record before convert_rgb_to_y
2. From Trace Viewer right-click-context-menu select Trace Viewer -> View Source Code
3. The file mcip_core.c opens at line 116, showing the source code corresponding to the program address in the selected record
4. Scroll down in Trace Viewer to the record containing convert_rgb_to_y and notice the function convert_rgb_to_y is highlighted in the source file
WS1 – Function execution graph
1. From Trace Viewer toolbar, select Analyze -> Function Execution Graph
2. Click on the + next to Function on y-axis to expand graph
3. Double-click on graph title to expand graph to full-screen
4. Click multiple times on the Zoom out button in the graph toolbar to see entire execution
WS1 – Function execution graph
5. Grab the y-axis with the mouse and drag to see more of the function names
6. Place the mouse just below the x-axis and select the last part of the graph to zoom into the selected region
7. (Optional) From the Function Execution Graph toolbar, select Display Properties
8. (Optional) In the properties view, click on the State/Event Categories tab, uncheck Visibility for functions that are not of interest, and select OK. This fits more of the graph in view
WS1 – Function execution graph
9. Note that the graph shows which function is executing, not function entry/exit
10. In the latter part of the graph, observe process_rgb making some UIA logging calls and then calling convert_rgb_to_y
11. Double-click on the graph title to collapse the full-screen view
12. Click anywhere in the graph to automatically scroll Trace Viewer to the same cycle position
13. Click anywhere in Trace Viewer to scroll the graph to the same cycle position
14. Click the Enable/Disable Grouping button on the graph toolbar to disable grouping
15. Now click anywhere in the graph and note that Trace Viewer is no longer scrolled
Callouts: Function Name; Cycle count; Running Function; Sort Functions; Expand & Collapse; Enable/Disable Grouping
WS1 – Function execution graph
16. Click the button (icon shown on the slide) in the Function Execution Graph toolbar, then click at the beginning of an instance of process_rgb in the graph. This inserts measurement marker 1 (X1)
17. Repeat the previous step, but this time click at the end of the same instance of process_rgb. This inserts measurement marker 2 (X2)
18. Look at the top left corner of the graph to see the number of cycles between X1 and X2
19. While holding the Shift key, use the mouse to select and drag X2. Notice the change in the number at the top left corner of the graph
20. Double-click on X2 to remove it
21. From the context menu, select Remove All Measurement Marks to remove the remaining markers (in this case only X1)
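The cycle count between X1 and X2 can be turned into elapsed time once the CPU clock is known. The sketch below assumes a 1 GHz clock purely for illustration; the workshop does not state the clock, and Trace Viewer obtains the actual frequency when data collection stops. The marker values are made up.

```python
def cycles_to_microseconds(cycles, cpu_hz):
    """Convert a CPU cycle count to microseconds at the given clock rate."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return cycles * 1e6 / cpu_hz


ASSUMED_CPU_HZ = 1_000_000_000  # assumption: 1 GHz, not taken from the workshop
x1_cycle, x2_cycle = 12_000, 62_000  # hypothetical marker positions

print(cycles_to_microseconds(x2_cycle - x1_cycle, ASSUMED_CPU_HZ))
```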
WS1 – Program address graph
1. From the Trace Viewer toolbar, select Analyze -> Program Address vs. Cycle
2. Click the graph zoom out button multiple times to see the entire range of program addresses executed
3. (Optional) While holding the Alt key, use the mouse to zoom into a selected region
4. (Optional) Use the zoom reset button on the graph toolbar to restore the original zoom
WS1: What did we learn?
• Can use DSP trace to get real-time tracing of program execution
• Analyses are available in Trace Viewer to process collected data
• Function Execution Graph provides a bird’s eye view of program execution
• Function Execution Graph can be used to measure the number of cycles between operations
• Program Address Graph shows what program addresses are executed
• Views have numerous features to help navigate the large volume of data that may be collected
WS2 Customize DSP trace for data tracing
WS2 – Tracing data access
1. Skip this step if continuing from WS1 or if the Image Processing Demo is already running. Otherwise, complete steps (i) to (iii) of Steps to run the demo (as described in the workshop setup)
2. Select C66xx_0 in the debugger
3. Open PC Trace analysis from Tools -> Hardware Trace Analyzer -> PC Trace
4. If PC Trace was already running (from WS1), select Close PC Trace in the “Resource already in Use!” dialog that pops up (only one trace analysis can run on a CPU at any time)
5. Note that the configuration from the previous run is restored
6. In the Hardware Trace Configuration dialog, select Advanced Settings
WS2 – Tracing data access
7. In the Advanced Properties dialog, note that this analysis has three trace “jobs”: a receiver (in this case the ETB) and two trigger jobs, one to start trace and the other to end trace (when the program address at convert_rgb_to_y is executed)
8. Select PC Trace in the left column
9. In the right column, expand the Properties tree to the What to Trace properties and enable tracing of Write Data and Read Data
10. Select OK
11. Select Start in Hardware Trace Analysis Configuration
WS2 – Tracing data access
12. Complete step iv of Steps to Run Demo to process image
13. Wait for Trace Viewer to update with collected data
14. Note that Trace Viewer does not show the Read Data and Write Data columns by default
15. In the Trace Viewer toolbar, click on the Column Settings button
16. In the Column Settings dialog, enable visibility of Read Data and Write Data
17. Select OK to exit the dialog
18. In Trace Viewer, resize and reposition columns as required
19. Scroll through Trace Viewer to see what data was read/written
WS2 – Saving configuration
1. Click on the Analysis Properties button in the Trace Viewer toolbar
– This reopens the configuration dialog, where properties can be modified and re-applied to the analysis
– We will not modify any properties; instead, we will save the current configuration for future reuse
2. Press the Save button at the bottom of the configuration dialog
3. In the Save Configuration dialog, enter My Data Trace for Analysis Name, then press Save
4. Press Cancel to exit the Hardware Trace Analysis Configuration dialog
5. Close the Trace Viewer
6. Go to Tools -> Hardware Trace Analyzer -> Saved Analysis and note that My Data Trace is now available for reuse
WS2 – Sharing configuration
7. Create a c:\temp folder on your hard disk
8. Select Tools -> Hardware Trace Analyzer -> Saved Analysis -> My Data Trace
9. Click the Export Analysis button at the bottom of the configuration dialog
10. Browse to the c:\temp folder and select Save
11. Click on the Delete button at the bottom of the configuration dialog to delete this saved analysis
12. Go to Tools -> Hardware Trace Analyzer. Note that Saved Analysis no longer exists
13. Select Tools -> Hardware Trace Analyzer -> Import Analysis …
14. Browse to c:\temp, select File Name My Data Trace.zip, and click Open
15. Go to Tools -> Hardware Trace Analyzer. Note that Saved Analysis now exists and contains My Data Trace
WS2 – What did we learn?
• Trace can be used to monitor what data addresses and values are accessed
• Can further customize trace configuration using Advanced Settings
• Configurations can be saved for reuse
• Saved configurations can be exported/imported
WS3 Hotspot analysis with function, stall and cache profiler
WS3 – Running function profiler
1. If the Image Processing Demo is not running, complete steps (i) to (iii) of Steps to run the demo
2. Select C66xx_0 in the debugger
3. Open Function Profiler from Tools -> Hardware Trace Analyzer -> Function Profiling
4. Click the button (icon shown on the slide) to reset to original settings
5. Change Transport/Receiver Type to Pro Trace with Buffer Size 1MB
WS3 – Running function profiler
6. Click Data Collection Settings to expand it
7. Select Start and Stop at Address for Trace Range. Note: reaching the Stop Address does not end the trace; it only pauses collection until the Start Address is encountered again
8. Set Start Address = IMG_sobel_3x3_8 and End Address = MultiProc_self
9. Select Start
10. Complete step (iv) of Steps to run the demo
11. Wait for the demo to complete
12. Press Stop in the Trace Viewer toolbar
13. Note that Trace Viewer and Exclusive Function Profiler begin processing data
Known issue: the next time this use case is run, End Address will equal Start Address. End Address will have to be explicitly modified.
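The Start and Stop at Address behaviour (collection pauses at the stop address and resumes at the next start address) can be sketched as a small filter over a stream of program addresses. This is a conceptual model only: `collect_between` is a hypothetical helper, the addresses are made up, and whether the hardware records the stop address itself is an assumption (here it is not recorded).

```python
def collect_between(pcs, start_addr, stop_addr):
    """Return the program addresses seen while collection is switched on."""
    collecting = False
    collected = []
    for pc in pcs:
        if pc == start_addr:
            collecting = True    # (re)start collection
        elif pc == stop_addr:
            collecting = False   # pause; a later start address resumes it
        if collecting:
            collected.append(pc)
    return collected


START, STOP = 0x1000, 0x2000  # stand-ins for IMG_sobel_3x3_8 and MultiProc_self
stream = [0x0F00, 0x1000, 0x1004, 0x2000, 0x2004, 0x1000, 0x1008, 0x2000]

print([hex(pc) for pc in collect_between(stream, START, STOP)])
```

Both passes through the start/stop window end up in the result, which is why one profiling run can cover every slice the core processes.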
WS3 – Analyzing function profile results
1. Wait for Trace Viewer and Exclusive Function Profiler processing to complete
2. Resize the column widths of the Exclusive Function Profiler view as needed
3. Click on the CPU Cycle Total column header twice to sort data in descending order
4. Note that assembly routines are shown as unknown_<address of first symbol above>_<address of first symbol below – 1>_<first symbol above>. Explicit names can be provided via an XML file specified in the preferences
5. Note that two functions, IMG_thr_le2min_8() and IMG_sobel_3x3_8(), take ~99% of the time
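The naming pattern for unnamed assembly routines can be illustrated as follows. The pattern comes from the step above; the exact address formatting that CCS uses is not given here, so lowercase hex with a 0x prefix is an assumption, and the addresses and symbol name are made up.

```python
def unknown_name(addr_symbol_above, addr_symbol_below, symbol_above):
    """Build a placeholder name for code between two known symbols.

    Follows the pattern unknown_<addr above>_<addr below - 1>_<symbol above>;
    the hex formatting is an assumption, not the tool's documented output.
    """
    return f"unknown_{addr_symbol_above:#x}_{addr_symbol_below - 1:#x}_{symbol_above}"


# Hypothetical addresses and symbol name, for illustration only.
print(unknown_name(0x0C001000, 0x0C001400, "my_c_func"))
```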
WS3 – Analyzing function profile results
6. Scroll to the right in the Exclusive Function Profiler table
7. Observe that ~92% (1.2M cycles) of the IMG_thr_le2min_8() time was the result of pipeline stalls
WS3 – Running stall profiler
1. Select Tools -> Hardware Trace Analyzer -> Function Profiling (C66xx_0) -> Close Session to close the currently running Function Profiler
2. Open Stall Profiler from Tools -> Hardware Trace Analyzer -> Stall Profiling
WS3 – Running stall profiler
3. Click the button (icon shown on the slide) to reset to original settings
4. Change Transport/Receiver Type to Pro Trace with Buffer Size 1MB
5. Click on Advanced Settings to set up the Start/Stop condition (Start/Stop support will be added to the configuration dialog in the next release)
WS3 – Running stall profiler
6. Select Pipeline Stall Analysis trigger in left column of Advanced Properties dialog
7. Expand Properties tree in left column and change Actions to Start Trace and Location to IMG_sobel_3x3_8
8. Expand Global Category to see what events are collected by default
9. Click the button (icon shown on the slide) in the left margin to add another trigger (the default name can be changed)
10. Select Trigger2 in the left column and change Actions to End Trace and Location to MultiProc_self. Click somewhere else in the property view to allow the symbol to be evaluated
WS3 – Running stall profiler
11. Select OK in Advanced Properties dialog
12. Select Start in Hardware Trace Analysis Configuration dialog
13. Complete step iv of Steps to Run Demo
14. Wait for Demo to complete
15. Press Stop in Trace Viewer toolbar
16. Wait for Trace Viewer and Stall Cycle Profiler processing to complete
WS3 – Analyzing stall profiler results
1. Observe that ~1M of the stall cycles for IMG_thr_le2min_8() are the result of L1D read misses
WS3 – Running cache analysis
1. Open Cache Analyzer from Tools -> Hardware Trace Analyzer -> Cache Analyzer
2. Close Stall Profiler when requested
3. Click the button (icon shown on the slide) to reset to original settings
4. Change Transport/Receiver Type to Pro Trace with Buffer Size 1MB
5. Expand Data Collection Settings and select L1D Cache Miss Analysis
6. Click on Advanced Settings
WS3 – Running cache analysis
7. Select the L1D Cache Miss Analysis trigger in the left column
8. Expand the Properties tree and change Actions to Start Trace and Location to IMG_sobel_3x3_8
9. Expand Global Category to see what events are collected by default
10. Click the button (icon shown on the slide) in the left margin to add another trigger
11. Select Trigger2 in the left column and change Actions to End Trace and Location to MultiProc_self
WS3 – Running cache analysis
12. Select OK in Advanced Properties dialog
13. Select Start in Hardware Trace Analysis Configuration dialog
14. Complete step (iv) of Steps to run the demo
15. Wait for the demo to complete
16. Press Stop in the Trace Viewer toolbar
17. Wait for Trace Viewer and Cache Event Profiler processing to complete
WS3 – Analyzing cache results
1. Observe that the 1.1M L1D read miss cycles of IMG_thr_le2min_8() result from 16320 cache misses
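The WS3 figures can be combined with a little arithmetic. The sketch below uses only the approximate values quoted in these steps (1.1M L1D read miss cycles, 16320 misses, and 1.2M stall cycles at ~92% of the function's time); the helper names are illustrative, and the results are only as precise as those rounded inputs.

```python
def cycles_per_miss(miss_cycles, miss_count):
    """Average stall cycles attributed to each cache miss."""
    if miss_count <= 0:
        raise ValueError("miss_count must be positive")
    return miss_cycles / miss_count


def implied_total_cycles(stall_cycles, stall_fraction):
    """Total cycles implied when stall_cycles make up stall_fraction of them."""
    if not 0 < stall_fraction <= 1:
        raise ValueError("stall_fraction must be in (0, 1]")
    return stall_cycles / stall_fraction


# Approximate figures quoted in the WS3 steps for IMG_thr_le2min_8().
L1D_READ_MISS_CYCLES = 1_100_000
L1D_READ_MISSES = 16_320
PIPELINE_STALL_CYCLES = 1_200_000
PIPELINE_STALL_FRACTION = 0.92

print(round(cycles_per_miss(L1D_READ_MISS_CYCLES, L1D_READ_MISSES), 1))
print(round(implied_total_cycles(PIPELINE_STALL_CYCLES, PIPELINE_STALL_FRACTION)))
```

With these inputs, each L1D read miss costs roughly 67 cycles on average, and the function as a whole takes about 1.3M cycles.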
WS3 – Using files to view data across analysis
1. Open Function Profiler from Tools -> Hardware Trace Analyzer -> Function Profiling
2. Close Cache Analysis when requested
3. Set End Address = MultiProc_self. This works around the known issue where the End Address is not remembered.
4. Select Start
5. Complete step iv of Steps to Run Demo
6. Wait for Demo to complete
7. Press Stop in Trace Viewer toolbar
8. Wait for Trace Viewer and Exclusive Function Profiler processing to complete
9. Create a c:\temp folder on your hard disk if not already present
10. Select Save in Trace Viewer toolbar
11. Browse to c:\temp, Specify File Name mytrace and select Save
WS3 – Using files to view data across analysis
12. Click Start in the Trace Viewer toolbar to restart tracing
13. Repeat steps 5 to 8 above to profile the application again
14. Select Tools -> Hardware Trace Analyzer -> Open File
15. Browse to c:\temp, select File Name mytrace.tdf, and select Open
16. In Trace Viewer – mytrace.tdf, select Analyze -> Exclusive Function Profiling
17. The current profile result can now be visually compared with the saved result
WS3 – Exporting data
1. In the Exclusive Function Profile – C66xx_0 view, right-click and select Data -> Export All … from the context menu to export all records
2. (Optional) In the Export Data dialog, add or remove columns to export
3. (Optional) In the Export Data dialog, use the Move button to rearrange the order in which columns are exported
4. Browse to the c:\temp folder, specify File Name myexporttrace, select Save, then select OK to export all records
5. Data is exported in CSV format, which can be consumed by CCS and by other tools such as Excel
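Because the export is plain CSV, it can also be post-processed with a script. The sketch below ranks functions by cycle count. The column name "CPU Cycle Total" appears in the profiler view, but the "Function" column name and the exact layout of the exported file are assumptions, and the sample rows are made-up values rather than measured results.

```python
import csv
import io


def top_functions(csv_text, cycles_column, name_column, limit=5):
    """Return (name, cycles) pairs sorted by cycles, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row[cycles_column]), reverse=True)
    return [(row[name_column], int(row[cycles_column])) for row in rows[:limit]]


# Illustrative data only; a real export would be read from myexporttrace.csv.
SAMPLE = """Function,CPU Cycle Total
process_rgb,1200
IMG_sobel_3x3_8,640000
IMG_thr_le2min_8,1300000
"""

print(top_functions(SAMPLE, "CPU Cycle Total", "Function", limit=2))
```

For a real file, the text would come from `open(path, newline="").read()`, with the column names adjusted to match the exported header row.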
WS3 – Importing data
1. Select Tools -> Hardware Trace Analyzer -> Open File
2. At the bottom right corner of the Open Trace File dialog, select CSV trace data file (*.csv)
3. Browse to c:\temp, select File Name myexporttrace.csv, and select Open
4. The data from the CSV file is now visible in the Trace Viewer
WS3 – Using analysis dashboard
1. Select Tools -> Hardware Trace Analyzer -> Analysis Dashboard
2. Observe the features of the Dashboard listed in the callouts below
3. Select Remove All to remove all running analyses
Callouts:
• List of all running analyses
• Run additional analysis
• Delete selected analysis
• Delete all analyses
• Expand/collapse all nodes
• Enable/disable analysis; disabling frees up all hardware resources
• Open configuration dialog
• What is the data source
• Click to collapse/expand
• Double-click to open/select view
WS3 – What did we learn?
• DSP Trace can be used to profile hotspots in an application
• Hotspots can be analyzed further using stall and cache profiling
• Data can be saved to a binary file for comparison with future results or to share with others
• Data can be exported/imported via CSV file
• Analysis Dashboard provides access to all analyses
WS4 Getting started with non-intrusive system trace (STM) SoC profiling
WS4 – Setup memory throughput analysis (i)
1. Continue from WS3
2. Go to Tools -> Hardware Trace Analyzer -> Memory Throughput and Access Analysis
3. Select Transport Type Pro Trace, Buffer Type Stop-on-full, Buffer Size 64 MB, and Number of Pins 4 pin. Go to the advanced settings
WS4 – Setup memory throughput analysis (ii)
4. By default, DDR3 memory throughput will be captured.
5. For DDR3, under Transaction Master, enable only C66x_0 (core 0) and disable all other masters (see the snapshot on the slide).
6. Click OK.
7. Now click on Start to set up the trace
8. Run all the cores from CCS, if not already running. Then run the demo by following Steps to run the demo (iv) and (v).
WS4 – Analysis view
9. Hit Stop in the Trace Viewer tab
10. DDR CP Tracer messages are captured in the Trace Viewer tab
WS4 – View core0 DDR3 bandwidth utilization
11. Select the Memory Throughput – CSSTM_0 tab, select DDR:CPU, and zoom to the portion of the graph where the image is being processed
WS4 – View system DDR3 bandwidth utilization
12. Now select DDR:All Bus Masters
WS4 – View system DDR3 latency
13. Select the Minimum Average Latency – CSSTM_0 tab and zoom to the portion of the graph where the image is being processed
WS4 – What did we learn?
• On the MCSDK image processing demo, we compared core 0’s DDR3 bandwidth usage with the complete system’s DDR3 bandwidth usage.
• On the MCSDK image processing demo, we captured the system’s DDR3 latency.
• We were able to set up and analyze DDR3 memory performance and access analysis for the MCSDK image processing demo.
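The comparison between core 0's DDR3 usage and the whole system's comes down to two simple quantities: throughput over a time window, and one master's share of the total. The sketch below shows that arithmetic with made-up byte counts. It does not parse CP Tracer messages, and the numbers are illustrative rather than measured on the EVM.

```python
def throughput_mb_per_s(byte_count, window_seconds):
    """Average throughput in MB/s (1 MB = 1e6 bytes) over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return byte_count / window_seconds / 1e6


def share_percent(master_bytes, total_bytes):
    """Percentage of the total traffic generated by one bus master."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * master_bytes / total_bytes


# Hypothetical byte counts observed in one 10 ms window.
CORE0_BYTES = 6_000_000
ALL_MASTERS_BYTES = 24_000_000
WINDOW_SECONDS = 0.010

print(throughput_mb_per_s(CORE0_BYTES, WINDOW_SECONDS))
print(throughput_mb_per_s(ALL_MASTERS_BYTES, WINDOW_SECONDS))
print(share_percent(CORE0_BYTES, ALL_MASTERS_BYTES))
```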
WS5 Customize STM SoC profiling for DDR and MSMC bandwidth and latency analysis
WS5 – Setup (i)
1. Continue from WS4
2. Click on Analysis Properties to bring up the setup configuration dialog box
3. Now go to the advanced settings
WS5 – Setup (ii)
4. By default, DDR3 memory throughput will be captured.
5. Now add a custom trigger for capturing MSMC memory throughput.
6. In the properties, set Transaction Monitor to MSMC_0 and Average Access Size/Rate to false
7. Click OK.
8. Now click on Start to set up the trace
9. Run all the cores from CCS, if not already running. Then run the demo by following Steps to run the demo (iv) and (v)
WS5- Analysis view
72
10. Hit Stop in Trace Viewer tab:
11. DDR and MSMC CP tracer messages are captured in the Trace viewer tab:
TI Confidential – NDA Restrictions
WS5- View core0 DDR3 bandwidth utilization
73
12. Select Memory Throughput – CSSTM_0 tab and select DDR:CPU zoom to the portion of the graph where the image is being processed:
TI Confidential – NDA Restrictions
WS5- View system DDR3 bandwidth utilization
13. Now select DDR:All Bus Masters.
WS5- View cores MSMC_0 bandwidth utilization
14. Now select MSMC_0:CPU.
WS5- View system MSMC_0 bandwidth utilization
15. Now select MSMC_0:All Bus Masters.
WS5- What did we learn?
• How to customize a non-intrusive SoC profiling (memory performance and access analysis) job to add MSMC memory bandwidth measurement to the default DDR memory bandwidth job.
• On the MCSDK image processing demo, we compared Core0’s DDR3 bandwidth usage with the complete system’s DDR3 bandwidth usage.
• On the MCSDK image processing demo, we compared the MSMC (bank0) bandwidth usage of all cores (0-3) with the complete system’s MSMC (bank0) bandwidth usage.
• As with MSMC_0, bandwidth at any other memory end point (core0 L2, core1 L2, ...) can be measured.
WS6 DDR bandwidth and latency analysis by interfacing embedded APIs (cToolsLib)
cToolsLib Software
[Figure: cToolsLib software layers]
• Keystone Debug and Trace HW
• cToolsLib libraries: AETLib, DSPTraceLib, ETBLib, CPTLib, STMLib
• Ctools Use-Case Library (Ctools_UCLib)
– CP Tracer profiling: system bandwidth, system latency, master bandwidth, total bandwidth, event profiling
– PC Trace: trace capture on an exception; start and stop PC + timing trace
– Memory watch: capture a list of unintended accesses to a particular memory range; raise an exception on the first unintended access
– Statistical profiling
• Application SW
• The cToolsLib package is a collection of libraries that provide access to Keystone debug and trace features via software APIs.
• cToolsLib information and APIs are available at http://processors.wiki.ti.com/index.php/CToolsLib
• MCSDK also ships with the cToolsLib package and a Use Case library that encapsulates cToolsLib APIs into high-level use cases for easy integration (c:\ti\ctoolslib_1_0_0_2)
WS6- Setup (i)
1. cToolsLib instrumentation APIs are embedded for the master core (core0). This workshop uses the image processing demo with cToolsLib instrumentation.
2. Power cycle the C6670 EVM.
3. Follow all the steps in the "Steps to run the demo (ii)" slide.
4. Connect the cores.
5. Load image_processing_evmc6670l_total_bandwidth_master.out (mcsdk_2_01_02_06\demos\image_processing\ipc\evmc6670l\master\total_bandwidth\Debug) on core0.
6. Load image_processing_evmc6670l_slave.out (mcsdk_2_01_02_06\demos\image_processing\ipc\evmc6670l\slave\no_instrumentation\Debug) on cores 1, 2, and 3.
WS6- Setup (ii)
6. Go to Tools -> Hardware Trace Analyzer -> Custom System Trace and select Transport Type: Pro Trace, Buffer Type: Stop-on-full, Buffer Size: 64 MB, and Number of Pins: 4 pin.
7. Hit Start; a Trace Viewer – CSSTM_0 tab opens.
8. Now run all the cores. Run the demo, following all the steps in the "Steps to run the demo (iv & v)" slides.
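The Buffer Type setting decides which samples survive once the trace buffer fills. The toy model below contrasts a stop-on-full buffer with a circular one. It is a sketch of the general idea only, with hypothetical class names, and says nothing about how the trace receiver implements its buffers.

```python
from collections import deque


class StopOnFullBuffer:
    """Keeps the first `capacity` samples and drops everything after."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []

    def write(self, sample):
        if len(self.data) >= self.capacity:
            return False  # buffer full: sample is discarded
        self.data.append(sample)
        return True


class CircularBuffer:
    """Keeps the most recent `capacity` samples, overwriting the oldest."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def write(self, sample):
        self.data.append(sample)  # deque drops the oldest entry when full
        return True


stop_on_full = StopOnFullBuffer(3)
circular = CircularBuffer(3)
for sample in range(5):
    stop_on_full.write(sample)
    circular.write(sample)
```

After five writes into three slots, the stop-on-full buffer holds the earliest samples and the circular buffer holds the latest ones. Stop-on-full therefore suits capturing what happens right after trace starts, while circular suits capturing what led up to the moment trace was stopped.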
WS6- Analysis view
9. Hit Stop in the Trace Viewer tab.
10. DDR CP tracer messages are captured in the Trace Viewer tab.
WS6- View core0 DDR3 bandwidth utilization
11. From the Trace Viewer tab, select Analyze -> Memory Throughput.
WS6- View system DDR3 bandwidth utilization
12. Now select DDR:All Bus Masters.
WS6- View system DDR3 Latency
13. From the Trace Viewer tab, select Analyze -> Minimum Average Latency.
WS6- What did we learn?
• How to use cToolsLib embedded APIs to perform non-intrusive system trace (STM) SoC profiling.
• On the MCSDK image processing demo, we compared Core0’s DDR3 bandwidth usage with the complete system’s DDR3 bandwidth usage.
• On the MCSDK image processing demo, we captured the system’s DDR3 latency.
• Using cToolsLib embedded APIs, we were able to set up and analyze DDR3 memory performance and access for the MCSDK image processing demo.
WS7 Command line trace decode interface for offline decode
WS7- Invoking command line decoder
1. This workshop uses TD (http://processors.wiki.ti.com/index.php/TD), a command line decoder that converts hardware trace into human-readable format.
2. Open a command line shell and change directory to <CCS>\ccsv5\ccs_base\emulation\analysis\bin
3. We use the trace file (TDF) saved earlier in WS3 to decode offline.
4. Type the following command:
td -procid 66x -bin C:\temp\mytrace.tdf -app C:/ti/MCSDK_2_01_02_06/demos/image_processing/ipc/evm6670l/image_processing_evmc6670l_master.out -rcvr Pro -format CSV_NO_TPOS_QUOTE -columns "Program Address,Cycles,Trace Status" -timestamp abs -output mytrace.csv
Notes:
• Without the "-output" option, the output is written to stdout.
• If you do not specify the "-columns" option, all columns are output.
• Help is available with the "-help" option.
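When the decode has to be repeated for several trace files, the command can be assembled in a script. The sketch below only builds the argument list from the options shown on this slide. The function name is hypothetical, and it assumes the options are spelled with a plain ASCII hyphen and that td is on the PATH or run from its bin directory.

```python
def build_td_command(tdf_path, app_path, columns,
                     output_path=None, timestamp="abs"):
    """Return the td argument list for one trace file.

    Only options that appear in the workshop command are used.
    If output_path is None, td is left to write to standard output.
    """
    cmd = [
        "td",
        "-procid", "66x",
        "-bin", tdf_path,
        "-app", app_path,
        "-rcvr", "Pro",
        "-format", "CSV_NO_TPOS_QUOTE",
        "-columns", ",".join(columns),
        "-timestamp", timestamp,
    ]
    if output_path is not None:
        cmd += ["-output", output_path]
    return cmd


# Placeholder paths, for illustration only.
cmd = build_td_command(
    r"C:\temp\mytrace.tdf",
    "master.out",
    ["Program Address", "Cycles", "Trace Status"],
    output_path="mytrace.csv",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids quoting problems with the column names, which contain spaces.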
WS7- Visualizing the output
1. The output from the command line decoder is shown below (opened in MS Excel). It includes all the information.
2. If needed, the output CSV file can also be imported into CCS for further analysis.
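Besides Excel and CCS, the CSV can be read by a short script. The sketch below counts rows per Trace Status value. It assumes the file starts with a header row carrying the column names given to the -columns option; that layout is an assumption, and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_by_status(stream):
    """Count decoded rows per value of the "Trace Status" column.

    Assumes the first row is a header with the column names.
    """
    reader = csv.DictReader(stream)
    return Counter((row.get("Trace Status") or "").strip() for row in reader)


# Made-up rows for illustration only; real decoder output will differ.
sample = io.StringIO(
    "Program Address,Cycles,Trace Status\n"
    "0x0C001000,10,\n"
    "0x0C001004,12,Pipeline stall\n"
    "0x0C001008,3,\n"
)
counts = count_by_status(sample)
```

For a real file, open it with `open("mytrace.csv", newline="")` and pass the file object to `count_by_status`.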
WS7- Further processing the output
1. The output of the tool can be piped to other general-purpose post-processing tools for custom command line processing.
td -procid 66x -bin C:\temp\mytrace.tdf -app C:/ti/MCSDK_2_01_02_06/demos/image_processing/ipc/evm6670l/image_processing_evmc6670l_master.out -rcvr Pro -format CSV_NO_TPOS_QUOTE -columns "Program Address,Cycles,Trace Status" -timestamp delta | grep "Pipeline stall"
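On a host without grep (a plain Windows shell, for example), the same filtering can be done with a few lines of script. This is a minimal sketch with a hypothetical function name; it does a fixed-string match like the grep call above, and the sample lines are made up.

```python
def filter_lines(lines, needle="Pipeline stall"):
    """Yield only the lines that contain needle (fixed-string match)."""
    for line in lines:
        if needle in line:
            yield line


# Made-up decoder lines, for illustration only.
decoded = [
    "0x0C001000,10,\n",
    "0x0C001004,12,Pipeline stall\n",
    "0x0C001008,3,\n",
]
stalls = list(filter_lines(decoded))
```

To use it in a pipe, iterate over `sys.stdin` instead of the sample list and write each yielded line to `sys.stdout`.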
WS7- What did we learn?
• How to invoke the command line decoder for offline decode of trace captured elsewhere.
• The output can also be imported into CCS for analysis.
• The output can be stored in a text file or piped to other tools for additional processing.
WS8 Function profiling using Cortex A program trace
WS8- Setup (i)
1. In CCS, set up a C6614 target configuration with an XDS560v2 Pro Trace USB connection:
– Go to File -> New -> Target Configuration File.
– Type C6614_XDS560v2 as the file name and click Finish.
– Select Spectrum Digital XDS560V2 STM USB Emulator as the Connection.
– Type C6614 in the Device field; the device names are filtered; check TMS320C6614.
– Click Target Configuration under Advanced Setup (right-hand side).
– Select View -> Target Configurations to see a list of all configuration files.
– Select the one you just created (C6614_XDS560v2.ccxml) under User Defined.
– Launch the debug session by selecting Launch Selected Configuration in the context menu.
2. Connect the Cortex A8 core.
WS8- Setup (ii)
3. Download the Cortex A8 demo example (A8.zip) and unzip the files on your PC.
4. Load modem.out from the downloaded example onto the Cortex A8.
5. In the menu, select Tools -> Hardware Trace Analyzer -> PC Trace to start PC Trace.
6. Click Start to set up the trace and open the Trace Viewer.
WS8- Tracing program execution
1. Run the application from the Debug view (Resume/F8), then halt or suspend execution.
2. The program trace shows up in the Trace Viewer.
WS8- Profiling and execution flow graph
1. Function profiling can be run on the program execution trace via Analyze -> Exclusive Function Profiler.
2. The function execution graph can be launched via Analyze -> Function Execution Graph.
WS8- What did we learn?
• Cortex A8 ETM trace can be used to get a real-time trace of program execution.
• Analyses are available in the Trace Viewer to run on the collected data.
• Profiling analysis provides a summary of executed functions and the cycles spent in each.
• The Function Execution Graph provides a bird’s eye view of program execution.
• The Function Execution Graph can be used to measure the number of cycles between operations.
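The idea behind a function profile summary can be shown on a handful of trace samples. The sketch below assumes each sample has already been mapped to the function it executed in and carries a cycle count; it then totals cycles per function. It is a toy illustration, not the algorithm CCS uses, and the function names and numbers are made up.

```python
from collections import defaultdict


def exclusive_profile(samples):
    """Total the cycles spent directly in each function.

    samples is an iterable of (function_name, cycles) pairs.
    Returns a list of (function_name, total_cycles), largest first.
    """
    totals = defaultdict(int)
    for function_name, cycles in samples:
        totals[function_name] += cycles
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples, for illustration only.
samples = [
    ("main", 5),
    ("fir_filter", 20),
    ("main", 3),
    ("fir_filter", 10),
]
profile = exclusive_profile(samples)
```

Sorting by total cycles puts the hotspot first, which is the same ordering one would typically look for in the profiler table.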
SerDes based trace receiver - preview
XDS SerDes Trace Integration Kit (TRIK)
• A low-cost, high-speed trace receiver built from an off-the-shelf Keystone EVM
– Uses available SERDES functional interfaces (e.g. Hyperlink)
– EVM DDR memory as trace buffer (e.g. 512 MB with Nyquist EVM)
– Comes with deployable TRIK embedded software (source and binary form)
• Supports simultaneous trace export from multiple cores (DSP, ARM, and STM).
• Seamless CCS integration.
[Figure: TRIK setup. Labels: Target SoC; Hyperlink cable; EVM as XDS SERDES trace receiver (trace receiver DSP, DDR3 memory); Ethernet cable; Host PC (running CCS); XDS100 or XDS200 (JTAG over USB or Ethernet).]
XDS SerDes Trace Integration Kit (TRIK)
• Non-intrusive to target DDR memory (both in terms of memory and read/write bandwidth).
• Needs dedicated target DMA resources and has very little impact on target TeraNet data movement.
[Figure: TRIK trace data path. Labels recovered from the figure:
– Target device generating trace samples, C6670 (Nyquist): CPU and Trace Sub-System with CORE 0 to CORE 3 and STM, each with its own ETB (CORE ETB 0 to CORE ETB 3, STM ETB); DMA Controller and TeraNet (Chip Infrastructure); DDR3 System Memory; Hyperlink.
– Trace Receiver, C6670 (Nyquist) or low-cost C6657 (Gauss): Hyperlink; DMA Controller and TeraNet (Chip Infrastructure); DDR3 System Memory (2GB); Ethernet, PCIe, or USB to the Host Computer (for decoding and display). The EVM acts as a low-cost trace receiver with the required amount of DDR3 memory.
– Hyperlink cable: 4 lanes @ 12.5 Gbps per lane, 50 Gbps.
– Legend: Trace Fill Path, Trace Drain Path.]
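The 50 Gbps figure is the raw sum of four lanes at 12.5 Gbps each. As a rough sanity check, the sketch below also estimates how long a 512 MB trace buffer would last if trace arrived at that full raw rate. It ignores link encoding and protocol overhead and treats 512 MB as 512 x 2^20 bytes; both are assumptions, so the result is only a lower bound on the fill time.

```python
def aggregate_gbps(lanes, gbps_per_lane):
    """Raw aggregate link rate in Gbps."""
    return lanes * gbps_per_lane


def fill_time_seconds(buffer_bytes, link_gbps):
    """Time to fill buffer_bytes at link_gbps (10^9 bits per second)."""
    return buffer_bytes * 8 / (link_gbps * 1e9)


link_rate = aggregate_gbps(4, 12.5)          # 4 lanes at 12.5 Gbps
buffer_bytes = 512 * 2**20                   # 512 MB, assumed binary megabytes
fill_time = fill_time_seconds(buffer_bytes, link_rate)
```

Under these assumptions the buffer would fill in a little under a tenth of a second at the full link rate. Actual trace streams are normally much slower than the raw link, so the buffer lasts correspondingly longer.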
XDS SerDes Trace Integration Kit (TRIK)
[Figure: TRIK buffer flow. Labels recovered from the figure:
– Target DSP (trace generator): ETB0, ETB1, ETB2, ... ETB7 and STM ETB; EDMA; ETB0 to ETB7 and STM ETB buffers (circular); Hyperlink Master, Hyperlink Slave, Hyperlink PHY.
– Trace Receiver DSP: Hyperlink Master, Hyperlink Slave, Hyperlink PHY; ETB0 to ETB7 and STM ETB buffers (circular or stop on full).
– Memories: MSMC memory, DDR3 memory.
– Transfer modes: Hyperlink push, Hyperlink pull.]
Q & A