Renesas Electronics America Inc.
ID 130C: Increasing Application Performance and Data Throughput with SH-2A MCUs
Dean Chang
Product Marketing Manager
12 October 2010
Version: 1.1
2
Mr. Dean Chang
Product Marketing Manager SH-2A MCU/MPUs
Wi-Fi Wireless LAN Partners
Building Automation Segment Marketing
Previous experience: responsible for the launch of a new line of Cortex-M3 MCUs at Fujitsu Semiconductor
Active in wireless standards activities such as Wi-Fi (IEEE 802.11), WiMAX (802.16, board member and chair of the service provider working group), Bluetooth (802.15) and ZigBee (802.15)
BSEE from Cal Poly San Luis Obispo
3
Renesas Technology and Solution Portfolio
Microcontrollers & Microprocessors
#1 market share worldwide *
Analog and Power Devices
#1 market share in low-voltage MOSFET **
ASIC, ASSP & Memory
Advanced and proven technologies
Solutions for Innovation
* MCU: 31% revenue basis from Gartner "Semiconductor Applications Worldwide Annual Market Share: Database" 25 March 2010
** Power MOSFET: 17.1% on unit basis from Marketing Eye 2009.
5
Microcontroller and Microprocessor Line-up
Superscalar, MMU, Multimedia: up to 1200 DMIPS; 45, 65 & 90nm process; video and audio processing on Linux; server, industrial & automotive
Up to 500 DMIPS; 150 & 90nm process; 600uA/MHz, 1.5uA standby; medical, automotive & industrial
Legacy cores: next-generation migration to RX
High Performance CPU, FPU, DSC
Embedded Security
Up to 10 DMIPS; 130nm process; 350uA/MHz, 1uA standby; capacitive touch
Up to 25 DMIPS; 150nm process; 190uA/MHz, 0.3uA standby; application-specific integration
Up to 25 DMIPS; 180, 90nm process; 1mA/MHz, 100uA standby; crypto engine, hardware security
Up to 165 DMIPS; 90nm process; 500uA/MHz, 2.5uA standby; Ethernet, CAN, USB, motor control, TFT display
High Performance CPU, Low Power
Ultra Low Power
General Purpose
6
Microcontroller and Microprocessor Line-up (same line-up as the previous slide, with the SuperH family highlighted)
7
Innovation – Smart Energy Network
[Diagram labels: Utility, Electric Meter, IP Network Core, Solar Inverter, Utility, Smart Meter]
8
Our SH-2A MCU Solution
High-performance SH-2A processor core
– Core efficiency equivalent to application processors (2 DMIPS/MHz)
– Combined with high-performance memory and peripherals on MCUs
The SH-2A with Floating Point Unit (FPU)
– Integrates enough functionality to replace a DSP + MCU with a single chip
– Benefits include smaller form factors, lower cost, less EMI
– Software development is simple because everything is coded on a single platform
Application Processor Core with FPU
MCU Memory and Peripherals
9
Agenda
SH-2A Key Features and Benefits
SH-2A Architecture
Peripherals to Enhance Your Applications
Applications
Development Tools and Starter Kits
Q&A
10
Key Takeaways
By the end of this session you will be able to:
Identify the strengths of the SH-2A architecture and peripherals
Identify the right applications for SH-2A MCUs
Understand some key ways to optimize performance while minimizing CPU bandwidth
11
SH-2A Key Features and Benefits
2.0 DMIPS/MHz superscalar RISC core
Double-precision 64-bit FPU
Short 9-cycle interrupt latency
High-speed memory
– Industry’s fastest 10ns flash memory (1MB)
– Large SRAM, up to 128KB
Peripherals
– Multifunction timers, 2 inverters supported
– 12-bit ADC, fast 1.0µs sampling rate
Rich connectivity
– 10/100 Ethernet, CAN 2.0, USB 2.0
SH-2A MCUs shine when the customer needs performance & high throughput
12
SH-2A Architecture
13
Super Scalar versus Dual Core
Scalar – one thread, one instruction at a time
– Single instruction stream, single pipeline
– Fetch, Decode, Execute
Superscalar – one or more threads, more than one instruction at a time
– Example: one thread, two instructions at a time
– 2 Fetch, 2 Decode, 2 Execute
Dual core – 2 independent threads
14
SH-2A Features: Superscalar Pipeline / Floating Point Unit
[Figure: SH-2A-FPU CPU core containing the CPU and the FPU; superscalar pipeline with 5 stages (1 to 5)]
2.4 DMIPS/MHz from RAM
2.0 DMIPS/MHz from Flash
15
SH-2A offers the Highest DMIPS/MHz
[Bar chart: DMIPS/MHz by core]
SH-2A: 2
PIC32: 1.56
Cortex-M3: 1.25
AVR32 UC3A: 1.2
ARM7TDMI: 0.9
Source: Respective vendors’ web sites.
16
QUESTIONS?
What is the DMIPS/MHz performance of the SH-2A when executing out of Flash memory?
– 2.0 DMIPS/MHz
Does the SH-2A allow you to execute 4 simultaneous instructions?
– No, 2 simultaneous instructions
17
Floating Point Unit
2 million floating-point operations per second per MHz (2 MFLOPS/MHz)
– Total of 400 MFLOPS @ 200MHz
IEEE754-compliant
– Easily share data with other systems
Single (32-bit) & double (64-bit) precision
– Precise and faster control loops & algorithms
Designed for embedded systems
– Automatic scaling of floating format
– Supports FMAC, FABS, FLOAT, FDIV, FSQRT etc.
Function (double precision) and time* (ns):
sin 680
cos 650
tan 900
asin 995
acos 1225
atan 695
log 910
exp 950
pow 1140
* Based on SH7203 (SH-2A core with FPU) at 200MHz, executing from SDRAM with cache enabled. Performance using a flash-based MCU & FPU at 200MHz is not available at this time.
18
FPU Advantages
Floating-point based math is easy to understand
– Simulations
HW FPU based math is faster and requires less code space
– FPU performance of a polynomial formula (R32C @ 32MHz)
– SUM(An * x^n), where n = 0 to 5 and A0 to A5 are constants

With Floating Point:
// Read the ADC code into a float value
rawADCFloatValue = (float)adcCode;
// Linearize the ADC value
actualTemperatureValue = 1.23456f * rawADCFloatValue + 45.8f;

Without Floating Point:
// Read the ADC code
rawADCFixedValue = adcCode;
// Linearize the ADC value
actualTemperatureValue = FIX12_MUL(FIX12_fromfloat(1.23456), rawADCFixedValue) + FIX12_fromfloat(45.8);
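The polynomial formula and the float versus fixed-point contrast on this slide can be sketched in C as follows. This is an illustration only: `poly5_eval`, `q12_t`, `q12_from_float`, `q12_mul` and `linearize_q12` are hypothetical names, and the Q12 helpers are stand-ins for the slide's `FIX12_*` macros, whose definitions are not shown in the deck.

```c
#include <stdint.h>

/* Evaluate SUM(a[n] * x^n) for n = 0..5 with Horner's rule:
 * five multiply-accumulate steps, a natural fit for an FPU with FMAC. */
static float poly5_eval(const float a[6], float x)
{
    float acc = a[5];
    for (int n = 4; n >= 0; --n) {
        acc = acc * x + a[n];
    }
    return acc;
}

/* Hypothetical Q12 fixed-point helpers (12 fractional bits). */
typedef int32_t q12_t;
#define Q12_SHIFT 12

/* Round a non-negative float to the nearest Q12 value. */
static q12_t q12_from_float(float v)
{
    return (q12_t)(v * (float)(1 << Q12_SHIFT) + 0.5f);
}

/* Multiply two Q12 values using a 64-bit intermediate. */
static q12_t q12_mul(q12_t a, q12_t b)
{
    return (q12_t)(((int64_t)a * (int64_t)b) >> Q12_SHIFT);
}

/* Fixed-point version of 1.23456 * code + 45.8; the result is in Q12. */
static q12_t linearize_q12(uint16_t adc_code)
{
    q12_t raw = (q12_t)adc_code << Q12_SHIFT;  /* integer code to Q12 */
    return q12_mul(q12_from_float(1.23456f), raw) + q12_from_float(45.8f);
}
```

The fixed-point path needs explicit scaling at every step, which is the extra code and care the slide argues a hardware FPU removes.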
19
SH-2A Fast Interrupt Response
[Timeline diagram, not to scale]
Typical MCUs: INT trigger -> CPU latency -> save context (by compiler) -> user code -> restore context
SH-2A MCU: INT trigger -> 9 cycles (CPU latency + save context) -> user code -> restore context
HW saves the context in a register bank LIFO (one primary register bank + 15 register banks)
MCU interrupt latency (cycles)
SH7216: 9
Cortex-M3: 18+
ARM7TDMI: 24 – 42
PIC32: 18 – 40+
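To put the cycle counts in time units, a small helper can convert cycles to nanoseconds. This is a minimal sketch; `cycles_to_ns` is a hypothetical name, and the 200 MHz figure is taken from elsewhere in this deck as an assumed CPU clock.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds, rounded to nearest.
 * cpu_mhz must be non-zero. */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t cpu_mhz)
{
    return (cycles * 1000u + cpu_mhz / 2u) / cpu_mhz;
}
```

Under that assumption, `cycles_to_ns(9, 200)` gives 45 ns for the 9-cycle SH-2A latency.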
20
SH-2A Bus Structure
[Block diagram] Harvard architecture: SH-2A CPU (superscalar) with FPU
Buses: F bus (instruction), M bus (data), I bus (internal bus, 32 bit/1 cycle), P bus (peripheral bus)
Other bus widths shown in the diagram: 32 bit/1 cycle, 32 bit/1 cycle, 16 bit/3 cycles
Blocks: on-chip RAM, on-chip Flash, cache controller, DMAC/DTC, bus state controller, bridge, external bus (SDRAM, SRAM, etc. I/F)
Peripherals on the P bus: timers, ADC, SCI, PORT
Cache memory: instruction/data cache 8KB/8KB, 4-way set associative (LRU)
21
High Performance 100 MHz Flash
[Figure: processing performance versus MCU frequency. SH with 100 MHz flash keeps the IF, D, E, M, WB pipeline running with no wait up to 100 MHz. A competing MCU with 30 MHz flash runs with no wait only up to 30 MHz, then needs 1 wait cycle and then 2 wait cycles (W) as the frequency rises.]
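The effect in the figure can be approximated with a deliberately simple model, sketched below. It assumes every fetch goes to flash and that one fetch takes 1 + wait-state cycles, and it ignores caches and prefetch buffers. `flash_wait_states` and `effective_fetch_mhz` are hypothetical names, not part of any Renesas API.

```c
#include <stdint.h>

/* Wait states needed so that one flash access spans enough CPU cycles
 * to stay within the flash's maximum access rate.
 * flash_max_mhz must be non-zero. */
static uint32_t flash_wait_states(uint32_t cpu_mhz, uint32_t flash_max_mhz)
{
    if (cpu_mhz == 0u) {
        return 0u;
    }
    /* ceil(cpu_mhz / flash_max_mhz) CPU cycles per access, minus the first */
    return (cpu_mhz + flash_max_mhz - 1u) / flash_max_mhz - 1u;
}

/* Fetches per microsecond when each fetch costs 1 + wait-state cycles. */
static uint32_t effective_fetch_mhz(uint32_t cpu_mhz, uint32_t flash_max_mhz)
{
    return cpu_mhz / (1u + flash_wait_states(cpu_mhz, flash_max_mhz));
}
```

In this model a 90 MHz CPU on 30 MHz flash needs 2 wait states and fetches no faster than a 30 MHz part, while a 100 MHz CPU on 100 MHz flash fetches at the full 100 MHz.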
22
Fast Flash = More RAM for Application
With slow flash, code or frequently used tables must be placed in RAM to achieve full speed
Competitor MCU with slow flash: flash access is slower, so code/tables share the RAM (fastest access) with data, leaving less RAM for data
SH MCU with fast flash: code runs from flash (fast access), leaving more RAM (fastest access) for data
Result: SH MCUs can execute similar applications with less RAM
23
QUESTIONS?
Name one unique feature of the FPU in the SH-2A relative to other Flash-based MCUs.
– Supports double-precision (64-bit) floating-point math
How many register banks does the SH-2A contain, and why does it matter?
– 16. They reduce the interrupt latency to just 9 cycles.
What’s the maximum frequency of the flash without adding a wait state?
– 100 MHz
24
Peripherals to Enhance your Application
25
Multi-function Timer Units
Timer Unit 2
– 6x 16-bit timers
– 12x PWMs
– 2x encoder inputs
– Dead time compensation
– Auto shutdown
– ADC, DTC and DMA triggers
Timer Unit 2S
– 3x 16-bit timers
– 8x PWMs
– 100MHz clock
– Dead time compensation
– Auto shutdown
– ADC, DTC and DMA triggers
• Support for two 3-phase motors at the same time
• Computational power to support advanced sensorless vector algorithms
• Up to 8 different operational modes
26
Dual 12-bit ADC with 8 Channels
[Block diagram: 3 simultaneous sample & hold (S/H) circuits feeding a SAR converter, multiple ADC result registers, ADC interrupt (ADC INT)]
ADC clock up to 50 MHz
Fast 1µs conversion rate
Analog input supports 0-5V
SH7216 (single ADC shown here)
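Converting a 12-bit result to a voltage is simple integer arithmetic, sketched below. The sketch assumes a 5.0 V reference that spans the full 0-5V input range, which the slide does not state. `adc_code_to_mv` and both constants are hypothetical names.

```c
#include <stdint.h>

#define ADC_FULL_SCALE_CODE 4095u   /* 2^12 - 1 for a 12-bit converter */
#define ADC_VREF_MV         5000u   /* assumed reference voltage in millivolts */

/* Convert a 12-bit ADC code to millivolts, rounded to nearest. */
static uint32_t adc_code_to_mv(uint16_t code)
{
    if (code > ADC_FULL_SCALE_CODE) {
        code = ADC_FULL_SCALE_CODE;   /* clamp out-of-range input */
    }
    return ((uint32_t)code * ADC_VREF_MV + ADC_FULL_SCALE_CODE / 2u)
           / ADC_FULL_SCALE_CODE;
}
```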
27
External Memory Interface
Flash/ROM
SRAM
BurstROM
SDRAM
Separate read & write wait cycles for each CS
8 CS Regions
Bus Arbitrator
SDRAM Auto Refresh
Little/Big Endian
8/16/32-bit bus
28
Data Transfer Controller (DTC)
[Diagram labels: on-chip IRQ source, INTC, Data Transfer Controller, IRQ clear]
Fewer Interrupts Increase CPU Efficiency
29
QUESTIONS?
Name one kind of advanced motor algorithm that is supported by the SH-2A Multi-function Timers.
– Sensorless vector algorithms
Name an advantage of having a Data Transfer Controller.
– Extends the number of DMAs via software, limited only by memory
– Reduces the number of interrupts on the CPU for greater efficiency
30
Rich Connectivity
31
10/100 Ethernet MAC
10/100 Ethernet MAC
– Full and half-duplex modes
– Can connect to any MII-compliant PHY
– Magic Packet detection & Wake-on-LAN
– Transmit and receive FIFO, 2 KB each
– Two integrated DMA channels
TCP/IP
– Open source TCP/IP supporting uIP in Renesas Demonstration Kit
– Many TCP/IP options available from third parties
[Diagram labels: SH7216, 100pin, 10/100 Ethernet MAC, MII, PHY, Magnetics]
32
Controller Area Network (CAN)
[Block diagram: CAN 2.0B protocol engine, CPU interface, common control/status registers, control registers, message buffer (15 Tx/Rx + 1 Rx), acceptance filter, INTs, clock, data and control signals, RX and TX pins]
Up to 1Mbps data rate
Unique features: hardware support to simplify SW & reduce CPU load
– Disable automatic retransmission on bus error
– Automatic priority-based transmission, mailbox number or ID-based
SH7137, SH7147, SH7286, SH7216
33
USB 2.0 Full Speed Device
[Block diagram: status & control, FIFO, USB engine, transceiver, D+ and D- lines]
Integrated USB transceiver
External 48MHz clock or shared 12MHz + PLL clock
Ability to disable USB module to save power
128-byte FIFO on transmit and receive
SH7285, SH7286, SH7216
34
Renesas Wi-Fi Solutions
Wi-Fi module: IEEE 802.11a/b/g/n, integrated TCP/IP stack, up to 10Mbps, SPI/UART interface
MCU side: Redpine driver and example demo over SPI/UART
• 32-bit RISC Flash MCU
• Up to 400 DMIPS @ 200MHz
• 32/64-bit FPU, Ethernet, USB, CAN
www.am.renesas.com/wifi
35
Low Cost Motor Control Demo Board
On-board 24VDC PMAC motor
– USB powered to 6000 RPM
– External power to 10000 RPM
– Drive larger motor with external power module
Pre-programmed vector control algorithm
– 3 shunt current detection
– Hall & encoder connectors
PC application to learn/experiment
– Real-time display of parameters
36
Target Applications
37
Target Applications
Factory Automation: precision motion control, industrial connectivity, fast I/O control, operator panels
Scientific & Medical: signal analysis, quiet motor control, connectivity, operator panels
Building Automation: high-end security systems, image processing, speech, connectivity, thermostat, control panels
Office Automation: image processing, precise stepper control, connectivity, operator panels
White Goods: energy efficient motor control, information displays
Consumer: media players, user interfaces
38
SH7216 Application Example
Factory Automation
39
Real Life Performance Enhancements
Analog Data Collection with Hardware Assist
– Combination of MTU timers, ADC, DMA and buffers
– Saves 7% CPU bandwidth
Data Transfer Controller – Sound Pump
– Announces phone numbers
– 10 sec of sound for a single interrupt, rather than interrupts at 8 kHz
Data Transfer Controller – Data Scattering
– ADC collects data from 4 different sources
– After completion, one interrupt is generated
– Data is automatically stored at different buffer locations
Complex Stepper Motor S-Curve Profiles in FLASH
– Timer sets “Profile Rate” and triggers DTC
– DTC transfers PWM data based on profile
All Features Increase CPU Efficiency
40
Development Tools & Software Solutions
41
Development Tools
C/C++ Compilers
MULTI®
KPIT GNU Tools
FREE
Evaluation Systems
Motor Control
Emulators
E10A-USB E200F
FREE
Development Environments
MULTI®
FREE
Sample Code & Libraries from Renesas (FREE)
RTOS & Middleware
42
Hardware Debuggers
E10A: on-chip debug interfaces
– USB E10A for JTAG
– Advanced User Debug (AUD) version
– Pipeline trace
– RTOS aware
E200F: full in-circuit emulators
– Non-intrusive debugging
– Application uses all package pins
– Application uses all ROM & RAM
– Advanced debug features: complex events, full bus trace, coverage
Seamless Integration with HEW
43
Free SW from Renesas
Sample code for major peripherals
Vector Control Motor Algorithms
Ethernet Send/Receive, Open Source TCP/IP supporting uIP
CAN API – compatible with R8C & R32C API
USB Device – CDC, MSC, HID
Fixed Point Math & DSP Libraries for SH-2A FPU
Available on am.renesas.com
44
Third Party Support by SW Components
Columns: Third Party | IDE | Compiler | Debug | RTOS | TCP/IP Stack | USB Device | USB Host | Graphics | File
CMX - - -SH7216
SH7264SH7216 SH7216 No - SH7216
Express Logic - - -SH7216SH7264
SH7264SH7216
SH7216 SH7264 SH7264SH7216
SH7264
FreeRTOS.org - - -SH7216
SH7264SH7216 - - - -
IARSH7216SH7264
SH7216SH7264
SH7216SH7264
- - - - - -
KPIT GNU ToolsSH7216SH7264
SH7216SH7264
SH7216SH7264
- - - - - -
Jungo - - - - - SH7264 SH7264 - -
Micrium - - -SH7216SH7264
SH7216SH7264*
SH7216SH7264*
SH7264* SH7264*SH7216SH7264*
Micro Digital - - - No No No SH7264 - No
RoweBots - - -SH7216SH7264
SH7216*SH7264*
SH7216*SH7264*
SH7264* -SH7216SH7264
Segger - - -SH7216SH7264
SH7216 SH7216 SH7264* SH7264SH7216SH7264
* = In Development
No = Not Yet
‘-’ = Not Offered
45
Our SH-2A MCU Solution
High Performance Core at 2 DMIPS/MHz
Combined with High Performance Memory and Peripherals on MCUs
Single chip can replace DSP + MCU combination
– Smaller form factor
– Lower cost
– Simplified design
Application Processor Core with FPU
MCU Memory and Peripherals
46
Questions?
47
Innovation – Smart Energy Network
[Diagram labels: Utility, Electric Meter, IP Network Core, Solar Inverter, Utility, Smart Meter]
48
Thank You!
49
Appendix
50
Analog Data collection and DSP Processing
– MTU2 triggers ADC (accurate sample rate)
– ADC complete triggers DMAC
– DMAC transfers data to buffer
– Half interrupt (PING ready) can be used to trigger the filter task
– Complete interrupt (PONG ready) triggers the filter task and reloads
[Diagram: AN0 and AN1 feed ADC0, triggered by MTU2 channel 0 (AD trigger, 200kHz); AD complete triggers DMAC channel 4, which fills the PING/PONG buffer in memory; the half interrupt (PING ready) and complete interrupt (PONG ready) pass data to the filter task]
HW assist to acquire and transfer data to buffer saves 7% CPU bandwidth
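The PING/PONG hand-off can be illustrated with a software model, sketched below. It runs on a host, with `dma_transfer_sample` standing in for one ADC-complete-triggered DMAC transfer. All names are hypothetical and no real DMAC registers are touched.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_LEN 4u                       /* samples per half buffer */

static int16_t dma_buffer[2u * HALF_LEN]; /* PING half followed by PONG half */
static size_t  write_index;               /* models the DMAC destination pointer */
static long    ping_sum, pong_sum;        /* results of the stand-in filter task */

/* Stand-in for the filter task: sums one half buffer. */
static long filter_task(const int16_t *half)
{
    long sum = 0;
    for (size_t i = 0; i < HALF_LEN; i++) {
        sum += half[i];
    }
    return sum;
}

/* Model of one DMAC transfer from the ADC result register to the buffer. */
static void dma_transfer_sample(int16_t sample)
{
    dma_buffer[write_index++] = sample;
    if (write_index == HALF_LEN) {
        /* Half interrupt: PING is ready while PONG keeps filling. */
        ping_sum = filter_task(&dma_buffer[0]);
    } else if (write_index == 2u * HALF_LEN) {
        /* Complete interrupt: PONG is ready, the transfer reloads. */
        write_index = 0;
        pong_sum = filter_task(&dma_buffer[HALF_LEN]);
    }
}
```

The CPU only runs at the two interrupt points. Per-sample movement is what the hardware takes over.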
51
DSP Processing
DMAC interrupt “passes” buffer to DSP filters/library
– Buffer is passed to the filter task created by the Signal Processing Library
Signal chain for each stream (Sample1, Sample2):
250kHz/signed 16Bit -> float conversion -> Boxcar N=8, decimate by 8 -> 31.25kHz/float -> FIR 7 tap, nD=4 -> 7.8125kHz/float
Created by decim process init call, executed by decim process call
Extensive Signal Processing Library Available for Free
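The first stage of the chain, float conversion followed by a boxcar average with decimation by 8, can be sketched as below. This is not the Renesas Signal Processing Library API, which the deck does not show. `boxcar_decimate8` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define BOXCAR_N 8u

/* Average each block of 8 signed 16-bit samples into one float output.
 * Returns the number of outputs written (n_in / 8). Leftover samples
 * that do not fill a block are ignored. */
static size_t boxcar_decimate8(const int16_t *in, size_t n_in, float *out)
{
    size_t n_out = n_in / BOXCAR_N;
    for (size_t i = 0; i < n_out; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < BOXCAR_N; k++) {
            acc += (float)in[i * BOXCAR_N + k];   /* float conversion */
        }
        out[i] = acc / (float)BOXCAR_N;
    }
    return n_out;
}
```

A 250 kHz input stream leaves this stage at 250 / 8 = 31.25 kHz, matching the rate shown on the slide. The following FIR stage with nD=4 then gives 31.25 / 4 = 7.8125 kHz.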
52
Data Transfer Control (DTC) Sound “pump”
MTU2 triggers DTC at “sample rate”
DTC sends data out “analog” (sound) port
– D/A, PWM, etc.
DTC “chains” to next part of sound
DTC continues until complete
Example: announcing a phone number
– 8kHz sample rate
– 10 numbers
– CPU runs through the 10 blocks updating source pointer to correct “numbers”
– Software enables DTC start at “head” of chain
[Diagram: chain of DTC transfers for the digits “1” to “9” and “0”, ending in a single interrupt (intr)]
Only one interrupt serviced with 10sec of sound, rather than 8kHz interrupts
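The chaining idea can be modelled in software as a linked list of transfer descriptors, sketched below. The layout of `sound_block_t` is hypothetical and is not the DTC's real descriptor format. The inner loop stands in for the timer-triggered hardware transfers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one block of samples and a link to the next. */
typedef struct sound_block {
    const uint8_t *src;                /* samples for one spoken digit */
    size_t len;                        /* number of samples in the block */
    const struct sound_block *next;    /* next block, or NULL at the end */
} sound_block_t;

static unsigned interrupts_raised;     /* CPU interrupts taken */
static size_t   samples_output;        /* samples sent to the sound port */
static uint32_t output_checksum;       /* stand-in for the D/A or PWM port */

/* Model of the chained transfer: every sample moves without CPU work,
 * and one interrupt is raised only when the whole chain has finished. */
static void run_chain(const sound_block_t *head)
{
    for (const sound_block_t *b = head; b != NULL; b = b->next) {
        for (size_t i = 0; i < b->len; i++) {
            output_checksum += b->src[i];
            samples_output++;
        }
    }
    interrupts_raised++;
}
```

For the slide's example, 10 s of sound at 8 kHz is 80,000 samples, so a per-sample interrupt scheme would take 80,000 interrupts where the chain takes one.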
53
DTC “Data Scattering”
ADC collecting 4 pieces of “unrelated” data
– 4 contiguous result registers
ADC “complete” triggers DTC
– DTC scatters ADC data to correct (non-contiguous) buffers
Interrupt after ALL buffers updated
– Post flags for tasks waiting on data
[Diagram: four DTC transfers followed by one interrupt (intr)]
– Current sample: transfer to Motor Control
– “Pot” reading sample: transfer to User Interface
– Current sample: transfer to PFC Current Control
– Voltage sample: transfer to PFC Voltage Control
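A software model of the scatter step is sketched below: four contiguous result registers are copied to four unrelated destinations, then a single completion event is counted. `scatter_adc_results` and the counter are hypothetical names. On the real part the copies are done by the DTC, not by a CPU loop.

```c
#include <stddef.h>
#include <stdint.h>

#define ADC_CHANNELS 4u

static unsigned scatter_done_interrupts;   /* completion interrupts taken */

/* Copy each contiguous ADC result to its own, non-contiguous destination,
 * then signal completion once for the whole set. */
static void scatter_adc_results(const volatile uint16_t result_regs[ADC_CHANNELS],
                                uint16_t *const dest[ADC_CHANNELS])
{
    for (size_t ch = 0; ch < ADC_CHANNELS; ch++) {
        *dest[ch] = result_regs[ch];
    }
    scatter_done_interrupts++;   /* one interrupt after ALL buffers updated */
}
```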
54
DTC “Data Gathering”
Input task is “malloc-ing” buffers as data comes in, in 128-byte chunks
After collecting 4 buffers, data must be written to a “sector” of the FLASH drive (512 bytes)
Operation:
– CPU updates DTC source pointers to the 4 buffers
– Command sequence to FLASH drive (Write Sector command)
– Start DTC to transfer “data block”
– DTC interrupt: “free” buffers, FLASH drive writer task goes idle
[Diagram: buffers 1 to 4 in memory (non-contiguous buffers) are moved to the FLASH drive by DTC transfers 1 to 4]
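The gather operation is the mirror image of scattering and can be sketched as below, with `memcpy` standing in for the four DTC transfers and a RAM array standing in for the FLASH drive sector. `gather_sector` and the constants are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES  128u
#define CHUNK_COUNT  4u
#define SECTOR_BYTES (CHUNK_BYTES * CHUNK_COUNT)   /* 512 bytes */

/* Assemble four non-contiguous 128-byte buffers into one 512-byte sector,
 * in the order given by the chunks array. */
static void gather_sector(const uint8_t *const chunks[CHUNK_COUNT],
                          uint8_t sector[SECTOR_BYTES])
{
    for (size_t i = 0; i < CHUNK_COUNT; i++) {
        memcpy(&sector[i * CHUNK_BYTES], chunks[i], CHUNK_BYTES);
    }
}
```

Only the array of source pointers has to be updated by the CPU before each sector write, which corresponds to the first step of the operation listed on the slide.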
55
Complex Stepper profiles using DTC
S-Curve profiles in FLASH
– Timer sets “Profile Rate” and triggers DTC
– DTC transfers PWM data based on profile
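One way to produce such a profile table is sketched below. The deck does not say which curve is used. This sketch assumes a smoothstep shape, 3t^2 - 2t^3, computed in integer arithmetic. `build_s_curve` and `PROFILE_STEPS` are hypothetical. In a real design the table would more likely be precomputed and stored as a const array in flash for the DTC to read.

```c
#include <stddef.h>
#include <stdint.h>

#define PROFILE_STEPS 9u   /* table entries, first and last included */

/* Fill table[] with an S-shaped ramp from start to end (requires end >= start).
 * With t = i / (PROFILE_STEPS - 1), entry i is
 *   start + (end - start) * (3t^2 - 2t^3). */
static void build_s_curve(uint16_t table[PROFILE_STEPS],
                          uint16_t start, uint16_t end)
{
    const uint64_t den  = PROFILE_STEPS - 1u;
    const uint64_t span = (uint64_t)(end - start);

    for (size_t i = 0; i < PROFILE_STEPS; i++) {
        uint64_t t = i;
        /* Numerator of 3t^2 - 2t^3 over den^3; never negative for t <= den. */
        uint64_t num = 3u * t * t * den - 2u * t * t * t;
        table[i] = (uint16_t)(start + (span * num) / (den * den * den));
    }
}
```

Each timer tick at the “Profile Rate” would then move the next table entry to the PWM register, with the CPU involved only at the end of the profile.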