NSF ERC for Wireless Integrated MicroSystems (WIMS)
Eric D. Marsman1, Robert M. Senger1, Michael S. McCorquodale2, Matthew R. Guthaus1, Rajiv A. Ravindran1,
Ganesh S. Dasika1, Scott A. Mahlke1, Richard B. Brown3
1University of Michigan, 2Mobius Microsystems, 3University of Utah
IEEE International Symposium on Circuits and Systems
May 23rd – May 26th, 2005, Kobe, Japan
A 16-Bit Low-Power Microcontroller with Monolithic MEMS-LC Clocking
Overview
• Motivation
• Microsystem Architecture
  – Microcontroller
  – Clock Generation
  – Dynamic Frequency Scaling (DFS)
• Microsystem Measured Results
  – Microcontroller
  – Compiler Utilization
  – Instruction Level Power Modeling
  – Clock Generation
  – DFS
• Future Directions
• Conclusion
Motivation
Wireless Integrated Microsystems (WIMS)
• Environmental sensors: gas chromatograph, heavy metals
• Biomedical implants: cochlear implant, deep brain implants
Motivation (cont)
• Power minimization
  – Frequency scaling
  – Voltage scaling
  – Memory architecture
  – Process technology
  – Leakage current mitigation
Core              Process   Frequency  No. Bits  Core Power
ARM7TDMI          0.18 µm   88 MHz     32        22 mW
Tensilica Xtensa  0.18 µm   200 MHz    32        80 mW
MIPS32 M4K        0.13 µm   300 MHz    32        84 mW
Infineon C166S    0.18 µm   80 MHz     16        160 mW
Commercially available cores
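As a rough way to compare the cores in the table above, core power can be normalized by clock frequency. The sketch below does only that arithmetic with the table's values; it ignores differences in process, supply voltage, and bit width, so it is an illustration rather than a fair benchmark. The names `CORES` and `power_per_mhz` are hypothetical.

```python
# Core power and frequency as listed in the "Commercially available cores" table.
CORES = {
    "ARM7TDMI": (22.0, 88.0),          # (core power in mW, frequency in MHz)
    "Tensilica Xtensa": (80.0, 200.0),
    "MIPS32 M4K": (84.0, 300.0),
    "Infineon C166S": (160.0, 80.0),
}

def power_per_mhz(power_mw, freq_mhz):
    """Return normalized power in mW/MHz (numerically equal to nJ per cycle)."""
    return power_mw / freq_mhz

if __name__ == "__main__":
    for name, (p, f) in CORES.items():
        print(f"{name:18s} {power_per_mhz(p, f):.2f} mW/MHz")
```

On these numbers the 32-bit cores fall between 0.25 and 0.40 mW/MHz, while the 16-bit C166S comes out at 2.0 mW/MHz.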
Microsystem Architecture
• 16-bit, 3-stage pipeline
• Software controlled register interface to clock generator
• Peripheral communication interfaces for flexibility
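[Figure: microsystem block diagram showing the register files; the Fetch, Decode, Execute pipeline; memory management unit; boot ROM; 64 KB SRAM; 64 KB external memory; loop cache; timer and test interface blocks (with multiplicity labels X3 and X2); USART; SPI; and the CMOS-MEMS clock generator]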
Microcontroller Architecture
• Primarily a Load-Store architecture
• 77 instructions, 8 addressing modes
• Data and address registers split into two windows
• Hardware support for one level of interrupts and subroutines
• Banked memory architecture with additional external memory interface
  – Energy/area tradeoffs compared to single 64 kB bank
• Low-power loop cache for commonly executed instructions
[Figure: normalized area and power versus RAM structure ('banks' x 'size in kB': 1x64, 2x32, 4x16, 8x8, 16x4, 32x2). Series: Power, Area. Annotation: 15.9% more area, 69.2% less power]
Monolithic Clock Generation
• Complementary, cross coupled, negative-transconductance tank
• Frequency trimming via modulation of tail current with vtrim
• CMOS compatible
• 1.056 GHz oscillation frequency
• Buffer amplifier removes amplitude variation
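[Figure: clock generator schematic with an LC tank (L, C), a tail current controlled by vtrim, a buffer amplifier, and a DFF divider chain with outputs labeled 16f0 and 2f0]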
Dynamic Frequency Scaling
• Fully synthesized logic, no custom design
• Synchronization chain ensures glitch-free output
• Optional external clock input
[Figure: DFS block diagram. A clock divider (DFF chain driven by 2f0) produces f0, f1, ..., f15; Clock Sel chooses fclk; a clock synchronizer (FF0, FF1, ..., FF4) drives the system clock to the clock tree; External Clock and External Clock Sel provide the optional external input]
$f_n = \dfrac{f_0}{2^n}, \quad n = 1, 2, \ldots, 15$
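The divider relation on this slide, f_n = f0 / 2^n, can be tabulated directly. The sketch below assumes f0 = 66 MHz, inferred from the 1.056 GHz oscillator and the 16f0 label in the clock generator schematic (1056 MHz / 16); the function name `divider_outputs` is hypothetical.

```python
F_OSC_HZ = 1.056e9        # LC oscillation frequency quoted on the clock slide
F0_HZ = F_OSC_HZ / 16     # assumption: the oscillator runs at 16*f0, so f0 = 66 MHz

def divider_outputs(f0_hz=F0_HZ, n_max=15):
    """Return {n: f_n} with f_n = f0 / 2**n for n = 0..n_max (n = 0 is f0 itself)."""
    return {n: f0_hz / 2**n for n in range(n_max + 1)}

if __name__ == "__main__":
    for n, f in divider_outputs().items():
        print(f"f{n:<2d} = {f / 1e6:10.6f} MHz")
```

Under these assumptions f1 through f4 come out at 33, 16.5, 8.25, and 4.125 MHz, and f15 is about 2 kHz, in line with the 0.002 – 66 MHz output range reported for the clock.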
Dynamic Frequency Scaling (cont)
• Glitch suppression example
[Figure: timing diagram with signals 2f0, f0, f1, f2, Clock Sel (stepping through f0, f2, f1), FF0.Q, FF4.Q, and fclk, with the potential glitch marked]
Microsystem Measured Results
• TSMC 0.18 µm MM/RF bulk CMOS
• 3.5 million transistors
• Operates up to 92 MHz
• 33.9 mW core power consumption @ 92 MHz & 1.8 V
• 1.4 mW core power consumption @ 10 MHz & 1.1 V
• 17.28 mW MEMS clock source power consumption @ 1.8 V
• 740 µW sleep power consumption @ 1.1 V
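The two measured core operating points can be cross-checked against the first-order dynamic power relation P ∝ f·Vdd². This is a textbook approximation that ignores leakage and short-circuit current, not a model taken from the slides; the function name `scale_dynamic_power` is hypothetical.

```python
def scale_dynamic_power(p_ref_mw, f_ref_mhz, v_ref, f_new_mhz, v_new):
    """Scale a reference power with P ~ f * Vdd^2 (first-order CMOS dynamic power)."""
    return p_ref_mw * (f_new_mhz / f_ref_mhz) * (v_new / v_ref) ** 2

if __name__ == "__main__":
    # Reference point from the slide: 33.9 mW at 92 MHz and 1.8 V.
    predicted = scale_dynamic_power(33.9, 92.0, 1.8, 10.0, 1.1)
    print(f"Predicted core power at 10 MHz, 1.1 V: {predicted:.2f} mW")
    # Frequency scaling alone (Vdd held at 1.8 V) for comparison.
    freq_only = scale_dynamic_power(33.9, 92.0, 1.8, 10.0, 1.8)
    print(f"Frequency scaling only:                {freq_only:.2f} mW")
```

Under this approximation the prediction is about 1.38 mW, close to the measured 1.4 mW, while frequency scaling alone at 1.8 V would give roughly 3.7 mW.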
[Figure: die photo with a 3.54 mm dimension marked and labeled regions: four 16 KB SRAM banks, PIPELINE, CLK, PERIPHERALS, CACHE, ANALOG, TEST]
Microcontroller Measured Results
• Static loop cache utilization provides 4 to 20% energy savings
• Vdd scaling across different frequencies allows for adjustment to program workload requirements
[Figure: Power vs. Vdd across frequency ranges. X-axis: Core Vdd (V), 1.10 to 2.20. Y-axis: Core Power (mW), 0 to 60. Series: Chip #1, Chip #2, Chip #3, Chip #4, with curve groups at 90 MHz, 50 MHz, and 10 MHz]
[Figure: Loop cache energy savings. X-axis: Data1, Data2, Data3, Data4, Fetch1, Fetch2. Left Y-axis: Measured Power (mW), 0 to 45. Right Y-axis: Power Savings using Loop Cache, 0% to 45%. Series: SRAM Only, SRAM and Loop Cache, Percentage Savings. Bar annotations give the share of loop cache (LC) accesses: 56%, 93%, 29%, 23%, 28%, 23%]
WIMS C Compiler
• Windowed versus non-windowed machine
  – 19% reduction in power consumption
  – 30% performance improvement
• Dynamic instruction placement in 512B loop cache achieves 43% energy savings over static placement
[Figure: Energy savings in 64B loop cache. Y-axis: % Energy Savings, 0 to 60. Series: Dynamic, Static. Benchmarks: sha, epic, gsmdec, rasta, rawc, rawd, cjpeg, djpeg, blowfish, unepic, gsmenc, rijndael, pegwitdec, average]
Instruction Level Power Modeling
• Divide ISA into groups of similar instructions
• noops model inter-instruction pipeline switching
• Account for memory access energy separately
Instruction Group  Energy (nJ)    Instruction Group  Energy (nJ)
add-sub            0.2403         win swap           0.1832
shift              0.1950         load imm           0.1961
boolean            0.2127         branch-nt          0.1720
compare            0.2082         branch-t           0.5741
multiply           2.7702         jmp abs            0.5372
divide             2.7160         jmp rel            0.4020
copy               0.2127         jmp abs sub        0.5658
bit                0.6137         jmp rel sub        0.3527
load abs           0.5249         return             0.3700
load rel           0.3661         swi                0.5585
store abs          0.4427
store rel          0.3070         noop               0.1931
Energy per instruction group
               Ext Mem (nJ) [1]  Loop (nJ)  MMR (nJ)  Boot ROM (nJ)
inst fetch     -0.0554           -0.0507    -         -0.0420
bit [2]        -0.1643           -0.1615    -0.1909   -
load abs [2]   -0.0976           -0.1016    -0.0877   -
load rel [2]   -0.1039           -0.1039    -0.1091   -
store abs [2]  -0.0411           -0.0461    -0.0427   -
store rel [2]  -0.0525           -0.0633    -0.0575   -
[1] Excludes memory access energy as this is memory dependent
[2] Fetch energy counted separately
Memory access energy
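A minimal sketch of how such a model might be applied: the core energy of a program is estimated by summing per-group energies over an instruction count profile. The energies are copied from the "Energy per instruction group" table; the instruction mix and the names `GROUP_ENERGY_NJ` and `core_energy_nj` are hypothetical, and memory access energy is deliberately left out since the slide accounts for it separately.

```python
# Energy per instruction group in nJ, copied from the slide (subset shown).
GROUP_ENERGY_NJ = {
    "add-sub": 0.2403,
    "shift": 0.1950,
    "compare": 0.2082,
    "multiply": 2.7702,
    "load rel": 0.3661,
    "store rel": 0.3070,
    "branch-nt": 0.1720,
    "branch-t": 0.5741,
    "noop": 0.1931,
}

def core_energy_nj(counts):
    """Sum count * energy over instruction groups; memory energy is excluded."""
    return sum(n * GROUP_ENERGY_NJ[group] for group, n in counts.items())

if __name__ == "__main__":
    # Hypothetical dynamic instruction counts for a small loop.
    mix = {"add-sub": 100, "load rel": 20, "branch-t": 10}
    print(f"Estimated core energy: {core_energy_nj(mix):.3f} nJ")
```

The table also makes the relative costs visible: on these figures a single multiply costs more than eleven add-sub instructions.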
Clock Generation Results
• No external reference
• No PLL/DLL
• High frequency accuracy
• Low start-up latency
• Low temperature coefficient
• Broad operating temperature range
• Low jitter
• Minimal area overhead (3% of die)
• Low power
• All-Si technology
Metric/Parameter                    LC Clock
Reference frequency                 1056 MHz
Output frequencies                  0.002 – 66 MHz
Frequency accuracy across lot       ±0.75%
Frequency precision (no trim)       ±2%
Trimmed frequency accuracy          100 ppm
Worst case duty cycle               48/52
Worst case RMS period jitter        <300 ppm
Temperature stability               ±0.9% (-40 to 100 °C)
Max. operation temperature          150 °C
Power supply                        1.8 V
Bias current                        9.6 mA
Power dissipation                   17.28 mW
Min. operating power                7.2 mW
Start-up latency (25 °C / 125 °C)   18 ns / 28 ns
Si footprint                        0.3 mm²
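Several entries in the table are linked by simple arithmetic. The sketch below checks that the quoted power dissipation equals supply voltage times bias current, and converts the 100 ppm trimmed accuracy into an absolute error at the 66 MHz maximum output. The current implied by the 7.2 mW minimum is a derived value that assumes the supply stays at 1.8 V; it is not stated on the slide. All names are hypothetical.

```python
VDD_V = 1.8            # power supply from the table
BIAS_MA = 9.6          # bias current from the table
MIN_POWER_MW = 7.2     # minimum operating power from the table
F_OUT_MAX_HZ = 66e6    # upper end of the output frequency range

def power_mw(vdd_v, current_ma):
    """DC power in mW from supply voltage (V) and current (mA)."""
    return vdd_v * current_ma

def ppm_to_hz(freq_hz, ppm):
    """Absolute frequency error in Hz for a relative error given in ppm."""
    return freq_hz * ppm * 1e-6

if __name__ == "__main__":
    print(f"Power at 9.6 mA:   {power_mw(VDD_V, BIAS_MA):.2f} mW")
    # Assumption: the minimum-power point still uses the 1.8 V supply.
    print(f"Current at 7.2 mW: {MIN_POWER_MW / VDD_V:.1f} mA (derived)")
    print(f"100 ppm at 66 MHz: {ppm_to_hz(F_OUT_MAX_HZ, 100):.0f} Hz")
```

The product 1.8 V × 9.6 mA reproduces the 17.28 mW in the table, and 100 ppm at 66 MHz corresponds to 6.6 kHz.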
MEMS Fabrication
• Post-processing etch using PAD cut
• Suspended inductor
• Varactor etch unsuccessful
  – No etch chemistry for MiM oxy-nitride dielectric
  – Use transconductance modulation instead
DFS Results
• Glitch-free switching
• Switching latency is 5/(2f0), or 37.45 ns for this implementation
[Figure: waveforms of glitch-free frequency switching, one between 33 MHz and 1 MHz and one stepping through 33 MHz, 16 MHz, 8 MHz, and 4 MHz]
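The latency expression can be evaluated for a given f0. The sketch below assumes a nominal f0 of 66 MHz (1.056 GHz / 16), which is an inference from the clock generator slides rather than a stated value, and also inverts the formula to show which f0 the quoted 37.45 ns would correspond to. Function names are hypothetical.

```python
def switching_latency_s(f0_hz):
    """Latency of 5 periods of the 2*f0 clock: 5 / (2 * f0)."""
    return 5.0 / (2.0 * f0_hz)

def f0_from_latency_hz(latency_s):
    """Invert the latency expression to recover f0."""
    return 5.0 / (2.0 * latency_s)

if __name__ == "__main__":
    nominal_f0 = 1.056e9 / 16   # assumption: f0 = oscillator frequency / 16
    print(f"Latency at nominal f0:  {switching_latency_s(nominal_f0) * 1e9:.2f} ns")
    print(f"f0 implied by 37.45 ns: {f0_from_latency_hz(37.45e-9) / 1e6:.2f} MHz")
```

With the nominal value the formula gives about 37.9 ns; the quoted 37.45 ns corresponds to an f0 near 66.8 MHz, roughly 1% above nominal.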
Future Directions
• Add DSP for cochlear implants and other biomedical devices
• Include ring oscillator as a lower-power alternative
• ISA improvements to reduce compiler bottlenecks
  – Address register support
  – Separate data and address register windows
  – DMA instructions
• Decrease sleep mode power
• Explore microsystem design in advanced technologies
[Figure: Preliminary next generation system. Floorplan with a 3.0 mm dimension marked and labeled regions: four 8 KB SRAM banks, CACHE, PIPELINE, I/O, DSP, CLK]
Conclusion
• Described a highly-functional, low-power Microsystem ideally suited for remote and bio-medical applications
• DFS allows on-the-fly, low-latency adaptation to workload requirements from 33.9 mW @ 90 MHz to 1.4 mW @ 10 MHz or sleep mode at 740 µW
• Monolithic clock reference decreases system size, cost, and power consumption compared to other techniques
• Power-aware compiler takes advantage of low-power architectural features to achieve maximum power reduction
Acknowledgements
• NSF ERC for WIMS
• MOSIS Educational Program
• Artisan Components
• TSMC
• Cadence
• Synopsys
• Mentor Graphics
• Coventor