Source: attackschool.di.uminho.pt/slides/slides_fxs1.pdf (2014-11-03)
UCL Crypto Group, Microelectronics Laboratory · SCAs against Embedded Crypto Devices - L1
SCAs against Embedded Crypto Devices
F.-X. Standaert
UCL Crypto Group, Université catholique de Louvain
Lecture 1 - Hardware Implementations
Outline
- Different types of computing devices
- Two key concepts
- Hardware performance indicators
- Implementation tradeoffs
- Technology scaling
- Design tradeoffs
- FPGAs
- Application to block ciphers
- Further readings
Different types of computing devices
- General-purpose computers (e.g. microprocessors)
  - Software-programmed
- Reconfigurable devices (e.g. FPGAs)
- Application-Specific Integrated Circuits (e.g. a dedicated AES circuit)
  - Hard-coded
- Tradeoff: flexibility vs. performance
Sequential logic
- 1 cycle: read from memory, operate, store in memory
- The operation delay must exceed the critical path delay: $T_{op} > T_{ph}$ (in s)
- Operation frequency $f_{op} = 1/T_{op}$ (in Hz)
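The two constraints above fix the clock: $f_{op} = 1/T_{op}$ can never exceed $1/T_{ph}$. A minimal sketch, assuming a hypothetical 10 ns critical path and a hypothetical 10 % safety margin (neither value comes from the slides):

```python
def max_frequency_hz(t_ph_s: float) -> float:
    """Upper bound on f_op: since T_op > T_ph, f_op < 1 / T_ph."""
    if t_ph_s <= 0:
        raise ValueError("critical path delay must be positive")
    return 1.0 / t_ph_s


def clock_period_s(t_ph_s: float, margin: float = 0.1) -> float:
    """Pick T_op = T_ph * (1 + margin) so that T_op > T_ph holds with some slack."""
    return t_ph_s * (1.0 + margin)


t_ph = 10e-9                    # hypothetical critical path delay: 10 ns
t_op = clock_period_s(t_ph)     # about 11 ns with the assumed 10 % margin
print(f"f_max = {max_frequency_hz(t_ph) / 1e6:.1f} MHz")
print(f"f_op  = {1.0 / t_op / 1e6:.1f} MHz")
```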
Abstraction levels (for memory & operations)
Hardware performance indicators
- Hardware cost (in gates, transistors or circuit size)
- Operation frequency (in Hz)
- Data throughput (in bit/s)
- Data latency (in clock cycles)
- Power and energy (in W and J)
  - Not equivalent, e.g.:
    - Power matters for RFID devices (powered remotely by the reader's field)
    - Energy matters for battery-supplied devices
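To see why power and energy are not equivalent, one can compare two designs where the lower-power one needs more cycles per block. The sketch below assumes energy = power × latency and no pipelining; both designs and all their numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Design:
    name: str
    power_w: float        # average power while encrypting (W)
    cycles: int           # latency of one block encryption (clock cycles)
    f_op_hz: float        # clock frequency (Hz)
    block_bits: int = 128

    def latency_s(self) -> float:
        return self.cycles / self.f_op_hz

    def throughput_bps(self) -> float:
        # one block every `cycles` cycles (no pipelining assumed)
        return self.block_bits * self.f_op_hz / self.cycles

    def energy_per_block_j(self) -> float:
        # energy = power x time spent on one block
        return self.power_w * self.latency_s()


# Hypothetical designs, chosen only to illustrate the distinction.
fast = Design("fast", power_w=1e-3, cycles=10, f_op_hz=1e6)
small = Design("small", power_w=0.5e-3, cycles=40, f_op_hz=1e6)

for d in (fast, small):
    print(f"{d.name}: {d.power_w * 1e3:.2f} mW, "
          f"{d.energy_per_block_j() * 1e9:.1f} nJ/block, "
          f"{d.throughput_bps() / 1e3:.1f} kbit/s")
```

Here the "small" design halves the power (good for an RFID tag) but doubles the energy per block (bad for a battery).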
Implementation tradeoffs
- $T_{ph} \propto LD \cdot \frac{C_L \cdot V_{dd}}{I_{on}}$, with:
  - $LD$ the operation logic depth (in gates)
  - $C_L$ the load capacitance (in F)
  - $V_{dd}$ the circuit supply voltage
  - $I_{on}$ the MOSFET drain current in the ON state
- "$T_{ph}$ decreases with larger $V_{dd}$", since $I_{on}$ grows faster than $V_{dd}$
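The delay formula can be explored numerically once $I_{on}$ is given a model. The sketch below swaps in the alpha-power law $I_{on} = k (V_{dd} - V_{th})^{\alpha}$, which is our assumption and not stated on the slide; $V_{th} = 0.3$ V, $\alpha = 1.3$, $LD$, $C_L$ and $k$ are hypothetical values in arbitrary units:

```python
def t_ph(v_dd: float, ld: int = 10, c_l: float = 1.0,
         v_th: float = 0.3, alpha: float = 1.3, k: float = 1.0) -> float:
    """Relative critical-path delay T_ph ~ LD * C_L * V_dd / I_on.

    I_on follows the alpha-power law k * (V_dd - V_th)**alpha, an assumed
    model that is only meaningful above threshold (V_dd > V_th).
    """
    if v_dd <= v_th:
        raise ValueError("model only valid for V_dd > V_th")
    i_on = k * (v_dd - v_th) ** alpha
    return ld * c_l * v_dd / i_on


for v in (0.5, 0.8, 1.0, 1.2):
    print(f"V_dd = {v:.1f} V -> relative T_ph = {t_ph(v):.2f}")
```

Because $\alpha \geq 1$ and $V_{th} > 0$, the ratio $V_{dd}/I_{on}$ shrinks as $V_{dd}$ grows, which is the trend quoted on the slide.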
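Implementation tradeoffs (II)
- Sources of power and energy consumption: $P_{tot} = P_{dyn} + P_{stat}$
  - $P_{dyn} \propto N_{gates} \cdot C_L \cdot V_{dd}^2 \cdot f_{op} \cdot \alpha \, (1 + \beta_{sc})$
  - $\alpha$: activity factor; $\beta_{sc}$: short-circuit factor
  - $P_{stat} \propto I_{leak} \cdot V_{dd}$
  - The static energy per operation, $P_{stat} \cdot T_{op}$, grows at smaller $V_{dd}$ because $T_{op} = T_{ph}$ increases
- "The minimum energy point best trades $P_{dyn}$ against $P_{stat}$" (here with $T_{op} = T_{ph}$)
- ⇒ there is a frequency/energy tradeoff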
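The minimum-energy point can be reproduced with a toy model: with $f_{op} = 1/T_{op}$, the dynamic energy per operation scales as $V_{dd}^2$, while the static energy is $I_{leak} \cdot V_{dd} \cdot T_{ph}(V_{dd})$. This sketch assumes the alpha-power delay model, a constant $I_{leak}$, and hypothetical constants in arbitrary units:

```python
def t_ph(v_dd, v_th=0.3, alpha=1.3):
    # Relative delay from the assumed alpha-power law (valid for V_dd > V_th).
    return v_dd / (v_dd - v_th) ** alpha


def energy_per_op(v_dd, c_eff=1.0, i_leak=0.05):
    # E_dyn = P_dyn * T_op with f_op = 1 / T_op, hence proportional to V_dd^2
    # (N_gates, C_L, alpha and beta_sc are folded into c_eff).
    e_dyn = c_eff * v_dd ** 2
    # E_stat = P_stat * T_op = I_leak * V_dd * T_ph(V_dd), with T_op = T_ph.
    e_stat = i_leak * v_dd * t_ph(v_dd)
    return e_dyn, e_stat


voltages = [0.32 + 0.01 * i for i in range(89)]     # 0.32 V ... 1.20 V
totals = [sum(energy_per_op(v)) for v in voltages]
best = min(range(len(totals)), key=lambda i: totals[i])
print(f"minimum-energy V_dd ~ {voltages[best]:.2f} V (toy model)")
```

Lowering $V_{dd}$ first saves dynamic energy, but near threshold the slowdown makes leakage energy dominate, so the total has an interior minimum.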
Technology scaling
- $P_{dyn}$ dominates in older technologies (down to 0.1 µm)
- $P_{stat}$ becomes significant in nanoscale devices
- Inter-device variability also increases with scaling!
Design tradeoffs
- Resource sharing, e.g. with the AES ByteSub
  - Low-cost design: 1 S-box, 16 cycles
  - Fast design: 16 S-boxes, 1 cycle
  - Low cost implies more control logic ⇒ less efficient (in throughput per unit of area)
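The sharing tradeoff can be modelled behaviourally: one S-box applied byte by byte over 16 cycles versus 16 S-box instances working in one cycle. This is a sketch, not RTL; the S-box is computed from its FIPS-197 definition (inversion in GF(2^8) followed by the affine map), and the cycle and instance counts simply mirror the slide:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Multiplicative inverse a^254 (0 maps to 0, as in AES)."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result if a else 0


def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def bytesub_shared(state):
    """Low-cost design: one S-box instance, one byte per clock cycle."""
    out, cycles = [], 0
    for byte in state:        # in hardware, a counter + MUXes sequence the bytes
        out.append(sbox(byte))
        cycles += 1
    return out, cycles, 1     # (result, cycles, S-box instances)


def bytesub_parallel(state):
    """Fast design: one S-box per byte, whole state in a single cycle."""
    return [sbox(b) for b in state], 1, len(state)


state = list(range(16))
print(bytesub_shared(state)[1:], bytesub_parallel(state)[1:])
```

Both designs compute the same result; they only trade S-box instances against cycles (plus the control logic the shared version needs).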
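Design tradeoffs (II)
- Inner pipelining, e.g. with the AES round (a register inserted inside the round)
  - Ideally: $f_{op} \times 2$ (usually worse in practice)
  - Latency: 11 → 22 (cycles), i.e. ideally unchanged in seconds since the clock is twice as fast
  - Throughput? $\frac{128 \text{ bits}}{11 \text{ cycles}} \cdot f_{op} \rightarrow \frac{256 \text{ bits}}{22 \text{ cycles}} \cdot 2 f_{op}$ ⇒ ideally ×2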
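The "ideally ×2, usually worse" remark can be made concrete by adding a per-stage register overhead (setup plus clock-to-Q time). In this sketch the 100 MHz base clock and the 1 ns overhead are hypothetical values; only the 11 → 22 cycles and 128 → 256 bits figures come from the slide:

```python
def throughput_bps(block_bits, blocks_in_flight, latency_cycles, f_op_hz):
    """Bits completed per latency window, times the clock frequency."""
    return blocks_in_flight * block_bits * f_op_hz / latency_cycles


def pipelined_freq(f_op_hz, stages, t_reg_s=0.0):
    """Clock frequency after cutting the round logic into `stages` equal parts.

    t_reg_s is a hypothetical register overhead paid once per stage;
    with t_reg_s = 0 the frequency scales ideally by `stages`.
    """
    t_round = 1.0 / f_op_hz - t_reg_s      # pure logic delay of one round
    return 1.0 / (t_round / stages + t_reg_s)


f_op = 100e6                                          # assumed unpipelined clock
base = throughput_bps(128, 1, 11, f_op)
ideal = throughput_bps(128, 2, 22, pipelined_freq(f_op, 2))
real = throughput_bps(128, 2, 22, pipelined_freq(f_op, 2, t_reg_s=1e-9))
print(f"ideal gain x{ideal / base:.2f}, with 1 ns register overhead x{real / base:.2f}")
```

The extra register delay is paid in every stage, so the gain stays below 2; filling both pipeline slots also requires two independent blocks (no feedback mode).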
Design tradeoffs (III)
- Further improvements of the throughput ($f_{op}$ fixed)
  - Parallelism (several loop architectures side by side) is less efficient than outer pipelining (unrolled rounds separated by registers)
FPGAs
- "Sea" of programmable logic blocks
- Connected by programmable routing
- Functionality determined by configuration bits
- Different technologies
  - 0.18 µm → 45 nm
- Several manufacturers
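"Functionality determined by configuration bits" can be illustrated with a look-up table: a k-input LUT is just a $2^k$-bit truth table addressed by its inputs, so configuring it means choosing those bits. The `LUT` class and `configure` helper below are hypothetical names for this sketch:

```python
from itertools import product


class LUT:
    """A k-input look-up table whose function is set by 2**k configuration bits."""

    def __init__(self, k: int, config_bits: list[int]):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.config = config_bits

    def __call__(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        addr = 0
        for bit in inputs:                  # first input is the MSB of the address
            addr = (addr << 1) | (bit & 1)
        return self.config[addr]


def configure(k, func):
    """Compute the configuration bits that make a k-LUT implement `func`."""
    return LUT(k, [func(*bits) for bits in product((0, 1), repeat=k)])


xor4 = configure(4, lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = configure(4, lambda a, b, c, d: a & b & c & d)
print(xor4.config)                          # the 16 configuration bits
print(xor4(1, 0, 1, 1), and4(1, 1, 1, 1))
```

The same physical LUT becomes an XOR or an AND gate depending only on the bits loaded into it.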
FPGAs (II)
FPGAs (III)
- Logic blocks
  - From 3-input look-up tables (LUTs)... to 8-bit arithmetic and logic units (ALUs)
  - The granularity of the device influences both the design performance and the configuration time
- Routing blocks
  - Structured according to the interconnect length
  - Major impact on the final performance
- Embedded blocks
  - Memories, multipliers, ...
(How to use) FPGAs (IV)
- Compared to ASICs, fabrication and packaging are replaced by configuration (i.e. sending a programming file, or bitstream, to the chip to determine the functionality of its "gates")
FPGAs (V)
- Example: Xilinx logic block
Application to block ciphers
- Target FPGA 1 has logic blocks (LB1) made of:
  - Two 4-input LUTs
  - One 1-bit MUX to combine the LUTs
  - Two registers
- Target FPGA 2 has logic blocks (LB2) made of:
  - Four 6-input LUTs
  - Three 1-bit MUXes to combine the LUTs
  - Four registers
- Embedded memory, with each block (MB) made of:
  - 4096-bit RAM memories
  - Dual-ported (i.e. 2 R/W operations per cycle)
  - Configurable (4096 × 1, 2048 × 2, ...)
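One way to read the LB1/LB2 descriptions: if the MUXes are driven by extra inputs, two 4-LUTs plus one MUX implement any 5-input function, and four 6-LUTs plus three MUXes (a 4:1 MUX tree) implement any 8-input function, by Shannon expansion. That wiring is our assumption, not stated on the slide; the sketch checks the decomposition on two hypothetical example functions:

```python
from itertools import product


def truth_table(n, func):
    """Outputs of func over all n-bit inputs, first argument as MSB."""
    return [func(*bits) for bits in product((0, 1), repeat=n)]


def split_into_luts(table, n, k):
    """Shannon-expand an n-input truth table into 2**(n-k) k-input sub-tables.

    The top n-k inputs select the sub-table (through a MUX tree); the
    remaining k inputs address the selected k-LUT.
    """
    size = 2 ** k
    return [table[i * size:(i + 1) * size] for i in range(2 ** (n - k))]


def evaluate(luts, k, bits):
    """Evaluate the 'LUTs + MUX tree' network on an input vector (MSB first)."""
    sel, addr = 0, 0
    for b in bits[:-k]:
        sel = (sel << 1) | b       # MUX select lines
    for b in bits[-k:]:
        addr = (addr << 1) | b     # address inputs shared by all LUTs
    return luts[sel][addr]


def mux_count(n, k):
    """2:1 MUXes needed to combine 2**(n-k) LUT outputs into one."""
    return 2 ** (n - k) - 1


def majority5(*x):
    # Hypothetical 5-input example function.
    return int(sum(x) >= 3)


lb1 = split_into_luts(truth_table(5, majority5), 5, 4)
print(len(lb1), "LUTs,", mux_count(5, 4), "MUX")

lb2 = split_into_luts(truth_table(8, lambda *x: x[0] ^ (x[3] & x[7])), 8, 6)
print(len(lb2), "LUTs,", mux_count(8, 6), "MUXes")
```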
S-box implementations
- "Minimum memory" cost (in bits) of S1/S2? { . . . }
- Cost of S1/S2 in LB1/LB2? { . . . }
- Would you use the memory to implement S1/S2?
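The sizes of S1 and S2 are given in a figure that is not reproduced here, so the sketch below only provides generic cost formulas, evaluated on hypothetical 4-bit and 8-bit S-boxes. It assumes the minimum memory for an $n \rightarrow m$ S-box is $2^n \cdot m$ bits, and that each output bit of an $n$-input function needs $2^{n-k}$ k-LUTs (the MUXes combining them are not counted):

```python
import math

MB_BITS = 4096          # size of one embedded memory block (from the slide)


def min_memory_bits(n_in: int, n_out: int) -> int:
    """Smallest table for an n_in -> n_out S-box: 2**n_in words of n_out bits."""
    return (2 ** n_in) * n_out


def memory_blocks(n_in: int, n_out: int) -> int:
    """Lower bound on 4096-bit blocks, ignoring which aspect ratios exist."""
    return math.ceil(min_memory_bits(n_in, n_out) / MB_BITS)


def lut_count(n_in: int, n_out: int, k: int) -> int:
    """k-input LUTs when each output bit is Shannon-expanded into
    2**(n_in - k) LUTs (combining MUX logic not counted)."""
    return n_out * 2 ** max(0, n_in - k)


for n in (4, 8):        # hypothetical S-box sizes, not necessarily S1/S2
    print(f"{n}-bit S-box: {min_memory_bits(n, n)} bits, "
          f">= {memory_blocks(n, n)} MB, "
          f"{lut_count(n, n, 4)} 4-LUTs, {lut_count(n, n, 6)} 6-LUTs")
```

Since the memory blocks are dual-ported, a single block could serve two look-ups of the same table per cycle.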
Block cipher design
- Consider an AES-like cipher with the following round:
- To be implemented in FPGA 1 with S-box S2
- With MixColumn implemented in 256 LUTs, with a logic depth of 2 LUTs
- And the full cipher iterating 11 rounds
Block cipher design
- What is the cost of one round in LUTs?
- Design and evaluate the cost (in LUTs and registers) of:
  - A 1-round loop architecture without pipeline
  - A 1-round loop architecture with maximum pipeline
- What is the latency (in cycles) of these architectures?
- Assuming $T_{LUT} = 10$ ns, what throughput do these architectures achieve (in bit/s)?
- Is this assumption realistic (physically speaking)?
- "Ideally", what would happen if we moved to a 2-round loop architecture, or to a 32-bit loop architecture?
- { . . . }
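The latency and throughput questions follow a simple pattern once the round's logic depth is known. Since the depth depends on S2 (not reproduced here), the sketch uses a hypothetical depth of 4 LUTs; it counts LUT delays only (no routing or register delay) and assumes a non-feedback mode, so that all pipeline stages can hold independent blocks:

```python
def loop_architecture(round_depth_luts, t_lut_s=10e-9, rounds=11,
                      block_bits=128, pipelined=False):
    """Latency (cycles) and throughput (bit/s) of a 1-round loop architecture.

    Without pipeline: one round per cycle, T_op = depth * T_LUT.
    Maximum pipeline: a register after every LUT level, T_op = T_LUT,
    and `round_depth_luts` independent blocks are in flight at once.
    """
    if pipelined:
        t_op = t_lut_s
        latency = rounds * round_depth_luts
        in_flight = round_depth_luts
    else:
        t_op = round_depth_luts * t_lut_s
        latency = rounds
        in_flight = 1
    throughput = in_flight * block_bits / (latency * t_op)
    return latency, throughput


depth = 4   # hypothetical round logic depth in LUTs (depends on S-box S2)
for p in (False, True):
    lat, thr = loop_architecture(depth, pipelined=p)
    print(f"pipelined={p}: latency {lat} cycles, {thr / 1e6:.0f} Mbit/s")
```

Ignoring routing is precisely what makes a pure $T_{LUT}$-based estimate optimistic on a real FPGA.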
Examples
- FPGA implementations of the AES Rijndael

| Index | E (encrypt) / D (decrypt) | Key schedule | Feedback? | Device | Architecture |
|---|---|---|---|---|---|
| 1 | E only | on-the-fly | no | Virtex-E | 128-bit unrolled |
| 2 | E only | on-the-fly | no | Virtex-E | 128-bit loop |
| 3 | E/D | precomputed | yes | Virtex-II | 32-bit loop |
| 4 | E/D | precomputed | yes | Spartan-II | 8-bit loop |
| 5 | E/D | precomputed | yes | Spartan-II | PicoBlaze |

| Index | LUTs | Regs. | Slices | RAMBs | Freq. | Throughput |
|---|---|---|---|---|---|---|
| 1 | 3516 | 3840 | 2784 | 100 | 92 MHz | 11.7 Gbit/s |
| 2 | 3846 | 2517 | 2257 | 0 | 169 MHz | 2 Gbit/s |
| 3 | 288 | 113 | 146 | 3 | 123 MHz | 358 Mbit/s |
| 4 | - | - | 124 | 2 | 67 MHz | 2.2 Mbit/s |
| 5 | - | - | 119 | 2 | 90 MHz | 710 kbit/s |
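As a rough cross-check of the table, and assuming 128-bit blocks with throughput = 128 · f / (cycles per block), one can back out the approximate number of cycles each design spends per block. These are estimates derived from the rounded table entries, not figures from the original implementations:

```python
BLOCK_BITS = 128  # AES block size

# (frequency in Hz, throughput in bit/s) copied from the table
rows = {
    1: (92e6, 11.7e9),
    2: (169e6, 2e9),
    3: (123e6, 358e6),
    4: (67e6, 2.2e6),
    5: (90e6, 710e3),
}


def cycles_per_block(freq_hz: float, throughput_bps: float) -> float:
    """Invert throughput = BLOCK_BITS * f / cycles_per_block."""
    return BLOCK_BITS * freq_hz / throughput_bps


for idx, (f, thr) in rows.items():
    print(f"{idx}: ~{cycles_per_block(f, thr):.0f} cycles per 128-bit block")
```

Design 1 comes out at about one block per cycle, consistent with a fully unrolled, pipelined architecture, and design 2 at about 11 cycles, consistent with one round per cycle in a 128-bit loop.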
Summarizing
- Specialized hardware implementations (ASICs, FPGAs) can be used to reach high performance
- Many different metrics exist (cost, speed, ...)
- Hardware design optimizations (e.g. sharing, pipelining) depend on algorithmic features
- Technology scaling can have a high impact too!
Further readings
- International Technology Roadmap for Semiconductors, http://www.itrs.net/
- F. Rodriguez-Henriquez, F. Saqib, N.A. Diaz, C.K. Koc, Cryptographic Algorithms on Reconfigurable Hardware, Springer, 2007.
- H. Kaeslin, Digital Integrated Circuit Design, Cambridge University Press, 2008.
- J.M. Rabaey, Digital Integrated Circuits: A Design Perspective, second edition, Prentice Hall, 2003.
Thanks