Multi-Criteria Performance Estimation for Systems-on-Chip (SoC)
Jean Luc Dekeyser
Many features, fast, small, cheap…
• Time-to-prototype
• Time-to-market
• Flexibility (maintainability)
• Low power / heat dissipation (battery life)
• Production cost
• Fast adaptation to new standards
• Reliability, security…
Today's SoCs must meet all of these requirements.
Motivations: Product trends
ARM PrimeXsys
Wireless platform: Standard SoC Kernel based on ARM926EJ-S
Source: ©ARM
Triscend A7 CSoC
ARM7TDMI + FPGA
Source: ©Triscend
ASIP: reconfigurable microprocessor
Tensilica Xtensa
Source: ©Tensilica
Motivations: Target platforms
[Figure: design space built from three template families. Computation templates: RISC, DSP, FPGA, SDRAM, LookUp, Cipher. Communication templates. Scheduling/arbitration: proportional share, WFQ, static, dynamic, fixed priority, EDF, TDMA, FCFS. Two candidate architectures (Architecture #1 and Architecture #2) combine these elements.]
Which architecture suits our application?
Design space exploration
[Figure: exploration flow. Application and Architecture feed Mapping, which is followed by Analysis.]
This methodology can be applied at different abstraction levels.
• Execution time (frequency)
• Energy consumption (or power)
• Silicon area (transistors)
• Cost
These criteria can be estimated at different abstraction levels. Academic and industrial tools are developed to estimate each criterion.
Multi-criteria system analysis
• Application/architecture matching:
– multi-objective optimization
– find a set of trade-offs: time, power, size, cost…
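One way to read "find a set of trade-offs" is as a Pareto front: the candidates that no other candidate beats on every criterion at once. The sketch below is a minimal illustration under that assumption. The candidate names and all numbers are hypothetical, and every criterion is assumed to be minimized.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One application/architecture mapping with its estimated criteria."""
    name: str
    time_ms: float    # execution time
    power_mw: float   # power
    area_mm2: float   # silicon area
    cost: float       # production cost (arbitrary unit)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every criterion and better on at least one."""
    av, bv = a[1:], b[1:]  # skip the name field
    no_worse = all(x <= y for x, y in zip(av, bv))
    strictly_better = any(x < y for x, y in zip(av, bv))
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical estimates, for illustration only.
candidates = [
    Candidate("arch1_risc_dsp", 12.0, 300.0, 20.0, 5.0),
    Candidate("arch2_dsp_fpga", 8.0, 450.0, 35.0, 9.0),
    Candidate("arch3_risc_only", 20.0, 320.0, 22.0, 6.0),  # worse than arch1 everywhere
]

front = pareto_front(candidates)
print([c.name for c in front])
```

The front is a set rather than a single winner: choosing among its members still requires a designer's priorities (for example, battery life over speed).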
Semiconductor Industry Roadmap
Year                              2004   2007   2013   2016
Technology generation (nm)          90     65     32     22
Wafer size (cm)                     30     30     45     45
Defect density (per cm²)          0.14   0.14   0.14   0.14
µP die size (cm²)                  3.1    3.1    3.1    3.1
Chip frequency (GHz)               4.2    9.3     23   39.6
MTx per chip (microprocessor)      553   1204   4424   8848
MaxPwr (W), high performance       158    189    251    288
Semiconductor Technology Roadmap
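As a rough worked example, the roadmap figures above can be combined into derived quantities such as power density and power per million transistors. The sketch assumes the maximum power is spread uniformly over the microprocessor die, which is a simplification introduced here and not a statement of the roadmap.

```python
# Values copied from the roadmap table above.
years = [2004, 2007, 2013, 2016]
die_cm2 = [3.1, 3.1, 3.1, 3.1]          # microprocessor die size
mtx_per_chip = [553, 1204, 4424, 8848]  # millions of transistors
max_power_w = [158, 189, 251, 288]      # high-performance parts

# Assumption: power is dissipated uniformly over the die.
power_density = [p / a for p, a in zip(max_power_w, die_cm2)]    # W/cm^2
tx_density = [m / a for m, a in zip(mtx_per_chip, die_cm2)]      # MTx/cm^2
power_per_mtx = [p / m for p, m in zip(max_power_w, mtx_per_chip)]  # W per MTx

for y, pd, td, ppm in zip(years, power_density, tx_density, power_per_mtx):
    print(f"{y}: {pd:6.1f} W/cm^2, {td:7.1f} MTx/cm^2, {ppm:.4f} W/MTx")
```

With a constant die size, total power and power density grow together, while the power budget available per transistor shrinks sharply.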
Evolution of the transistor count
[Figure: power density (W/cm², log scale from 1 to 10,000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with reference levels: hot plate, nuclear reactor, rocket nozzle. Courtesy, Intel.]
Power density too high to keep junctions at low temperature.
Power density
[Figure: IC capacity versus design productivity, 1981 to 2009, log scales. Left axis: logic transistors per chip (in millions), from 0.001 to 10,000. Right axis: productivity (K transistors per staff-month), from 0.01 to 100,000. The widening distance between the IC capacity curve and the productivity curve is labeled "Gap".]
• 1981: a leading-edge chip required 100 designer-months (10,000 transistors / 100 transistors per month).
• 2002: a leading-edge chip requires 30,000 designer-months (150,000,000 transistors / 5,000 transistors per month).
• Designer cost increased from $1M to $300M.
Transistor count / Productivity
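The arithmetic behind these figures can be checked directly. The short sketch below recomputes the designer-months and derives the growth factors and the implied cost per designer-month; the function name is illustrative only.

```python
def designer_months(transistors: float, transistors_per_month: float) -> float:
    """Design effort = chip size / designer productivity."""
    return transistors / transistors_per_month


effort_1981 = designer_months(10_000, 100)          # designer-months in 1981
effort_2002 = designer_months(150_000_000, 5_000)   # designer-months in 2002

capacity_growth = 150_000_000 / 10_000    # how much chips grew
productivity_growth = 5_000 / 100         # how much designers improved
effort_growth = effort_2002 / effort_1981 # what remains as extra effort

# Implied cost of one designer-month, from the $1M and $300M totals.
cost_per_month_1981 = 1_000_000 / effort_1981
cost_per_month_2002 = 300_000_000 / effort_2002

print(effort_1981, effort_2002)
print(capacity_growth, productivity_growth, effort_growth)
print(cost_per_month_1981, cost_per_month_2002)
```

Chip capacity grew far faster than productivity, and the ratio of the two growth factors is exactly the growth in design effort, which is the gap shown in the chart.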
Power vs. transistor count
[Figure: ITRS 2003 forecasts plotted by year (2003, 2005, 2007, 2009, 2012, 2015, 2018): transistor size (nm), frequency F (GHz), power (W), supply voltage Vdd, and transistor density (M Tr/cm²).]
Abstraction level                          Objectives
Functional                                 Application
TLM (Transaction Level Modeling):
  Communicating Processes (CP)             System description as communicating processes; data exchange between functions
  Programmer View (PV)                     Defined architecture; functional verification; communication through channels
Cycle Accurate and/or Bit Accurate (CABA)  Cycle accurate: architecture, pipeline, …; bit accurate: communication protocol
RTL (Register Transfer Level)              Implementation details: functional units, logic gates
Accuracy increases toward RTL; simulation speed-up increases toward the functional level.
Abstraction levels for simulation
Performance estimation techniques
• Emulation: an existing, real platform
– Fully or partially reconfigurable
– Example: FPGA platform (ALTERA, XILINX…)
– Direct performance measurements: execution time, power consumption, area...
• Simulation: the platform does not exist yet
– Based on a description of the system
– Different levels: RTL (Register Transfer Level), CABA (Cycle Accurate Bit Accurate), TLM (Transaction Level Modeling) and functional level
– Different description languages: VHDL, SystemC, Verilog…
Emulation
[Figure: emulation flow. Input: an asm or C program, or a reconfiguration (VHDL). Outputs: time measurement, energy calculation, analysis.]
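To make the "time measurement" and "energy calculation" steps concrete, here is a minimal sketch. It assumes the emulation board exposes supply-current samples taken at a fixed period under a constant supply voltage; that setup, the sample values and the function name are all hypothetical.

```python
def energy_joules(current_samples_a, v_supply, dt_s):
    """Energy = V * integral of I dt, using the trapezoidal rule on the samples."""
    charge = 0.0  # ampere-seconds
    for i0, i1 in zip(current_samples_a, current_samples_a[1:]):
        charge += 0.5 * (i0 + i1) * dt_s
    return v_supply * charge


# Hypothetical measurement: 5 current samples, 1 ms apart, 3.3 V supply.
samples = [0.10, 0.12, 0.15, 0.12, 0.10]  # amperes
dt = 1e-3                                  # seconds between samples
vdd = 3.3                                  # volts

exec_time = (len(samples) - 1) * dt        # measured execution time
energy = energy_joules(samples, vdd, dt)   # consumed energy
avg_power = energy / exec_time             # average power over the run

print(f"time = {exec_time:.3e} s, energy = {energy:.3e} J, power = {avg_power:.3f} W")
```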
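[Figure: implementation levels of a component, from most abstract to most detailed: SYSTEM, MODULE, GATE, CIRCUIT (Vin, Vout), DEVICE (transistor with G, S, D and n+ regions). Accuracy increases toward the device level; simulation speed-up increases toward the system level.]
Implementation levels of a component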
Physical-level models reflect the actual circuit layout and include geometric information. They cannot be simulated directly: behavior can be deduced by correlating the layout model with a behavioral description at a higher level, or by extracting circuits from the layout.
Wire lengths and capacitances are frequently extracted from the layout and back-annotated to descriptions at higher levels, which gives more precise delay and power estimates.
Physical-level simulation
[Figure: layout example with signals din, powlo, powhi and dout. © Mosis (http://www.mosis.org/Technical/Designsupport/polyflowC.html); tool: Cadence]
Physical-level simulation: example
Transistor-level simulation
Uses an analog simulator (SPICE).
Input:
• Models (transistor, gates, macro)
• Textual netlist (schematic, extracted layout, behavioral)
Output:
• Circuit response (waveforms, patterns)
• Time domain
• Frequency domain
• Power analysis
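To give a feel for what a time-domain analysis with power output produces, the sketch below integrates a single RC charging circuit by hand with forward Euler. It is not SPICE and uses no SPICE models; the component values and the function name are hypothetical.

```python
def simulate_rc_charge(r_ohm, c_farad, vdd, t_end, dt):
    """Forward-Euler transient of a capacitor charged from vdd through a resistor.

    Returns the time points, the capacitor voltage waveform and the
    energy drawn from the supply over the simulated interval.
    """
    v = 0.0            # capacitor starts discharged
    energy = 0.0       # energy delivered by the supply
    times, volts = [0.0], [0.0]
    steps = int(round(t_end / dt))
    for k in range(steps):
        i = (vdd - v) / r_ohm        # current through the resistor
        energy += vdd * i * dt       # supply power integrated over the step
        v += i * dt / c_farad        # capacitor voltage update
        times.append((k + 1) * dt)
        volts.append(v)
    return times, volts, energy


# Hypothetical values: R = 1 kOhm, C = 1 pF (time constant 1 ns), Vdd = 1.2 V.
times, volts, supply_energy = simulate_rc_charge(1e3, 1e-12, 1.2, 10e-9, 1e-12)

print(f"final voltage  = {volts[-1]:.4f} V")
print(f"supply energy  = {supply_energy:.3e} J")
print(f"C * Vdd^2      = {1e-12 * 1.2**2:.3e} J")
```

After ten time constants the waveform has settled near Vdd and the supply has delivered close to C·Vdd², half of which is stored in the capacitor and half dissipated in the resistor.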
Transistor-level simulation: example
Gate-level simulation
Models contain gates as the basic components. They provide accurate information about signal transition probabilities and can therefore also be used for power estimation.
Delay calculations can be more precise than at the RTL. Typically there is no information about wire lengths, so delays are still estimates.
The term is sometimes also used to denote Boolean functions (no physical gates, only the behavior of the gates). Such models should be called “Boolean function models”.
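The link between signal transition probabilities and power can be sketched as follows. The example assumes statistically independent inputs, independence between consecutive cycles, and the usual dynamic power model P = α·C·Vdd²·f; the load capacitance, voltage and frequency are hypothetical values chosen for illustration.

```python
from itertools import product


def output_one_probability(gate, input_probs):
    """Probability that the gate output is 1, assuming independent inputs."""
    p_one = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        p = 1.0
        for bit, p_in in zip(bits, input_probs):
            p *= p_in if bit else 1.0 - p_in
        if gate(*bits):
            p_one += p
    return p_one


def switching_activity(p_one):
    """Probability of a 0 -> 1 output transition between two independent cycles."""
    return (1.0 - p_one) * p_one


def dynamic_power(alpha, c_load, vdd, freq):
    """Dynamic power model: alpha * C * Vdd^2 * f."""
    return alpha * c_load * vdd ** 2 * freq


def nand2(a, b):
    return 1 - (a & b)


p_out = output_one_probability(nand2, [0.5, 0.5])
alpha = switching_activity(p_out)
# Hypothetical load: 10 fF, 1.2 V supply, 1 GHz clock.
power_w = dynamic_power(alpha, 10e-15, 1.2, 1e9)

print(p_out, alpha, power_w)
```

A gate-level simulator obtains the same activity figures by counting actual transitions on each net instead of assuming input probabilities.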
Gate-level simulation: example
source: http://geda.seul.org/screenshots/screenshot-schem2.png
At this level, all components are modeled at the register-transfer level, including arithmetic/logic units (ALUs), registers, memories, muxes and decoders. Models at this level are always cycle-true. Automatic synthesis from such models is not a major challenge.
RTL simulation
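What "cycle-true" means can be shown with a very small model: combinational blocks (mux, ALU) are evaluated from the current register values, and registers change only at the clock edge. The sketch below is a toy accumulator datapath written for illustration. Its structure and signal names (`acc_write`, `sel_zero`) are hypothetical and do not describe any particular processor.

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFF  # 32-bit datapath


def alu(op, a, b):
    """Combinational ALU."""
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "and":
        return a & b
    raise ValueError(f"unknown ALU operation: {op}")


def mux(sel, in0, in1):
    """Two-input multiplexer."""
    return in1 if sel else in0


@dataclass
class Datapath:
    acc: int = 0     # accumulator register
    cycle: int = 0   # number of clock cycles simulated

    def clock(self, op, operand, acc_write, sel_zero=0):
        """Simulate exactly one clock cycle."""
        # Combinational phase: uses register values from before the edge.
        a = mux(sel_zero, self.acc, 0)
        result = alu(op, a, operand)
        # Sequential phase: the register is updated only if write is enabled.
        if acc_write:
            self.acc = result
        self.cycle += 1
        return result


dp = Datapath()
dp.clock("add", 5, acc_write=1, sel_zero=1)         # acc <- 0 + 5
dp.clock("add", 7, acc_write=1)                     # acc <- 5 + 7
peek = dp.clock("sub", 2, acc_write=0)              # ALU computes, acc unchanged
print(dp.acc, dp.cycle, peek)
```

Because every call to `clock` corresponds to one cycle, the cycle counter gives the execution time in clock periods, which is what an RTL-level performance estimate relies on.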
RTL simulation: example
[Figure: RTL model of a processor datapath: controller, PC, memory, instruction register (IR), register file (Reg), sign_extend, shift left by 2 ("<< 2"), ALU and alu_control, connected through multiplexers. Control signals: PCSource, TargetWrite, ALUOp, ALUSelA, ALUSelB, RegWrite, RegDest, MemToReg, IRWrite, MemRead, MemWrite, PCWrite, PCWriteC, IorD. Instruction bit fields: 31:26, 25:21, 20:16, 25:0, 15:0, 15:11.]