MORUS: A Fast Authenticated Cipher
Hongjun Wu, Tao Huang
Nanyang Technological University
DIAC 2016, Nagoya, 26 Sep 2016
Different Design Approaches:
• Fast
  • AES-NI (AEGIS)
  • SIMD (MORUS)
• Lightweight
  • Mode (JAMBU)
  • Dedicated (ACORN)
DIAC 2016 MORUS
Design Motivation and Main Features
• To design a high-speed authenticated cipher:
  • No AES-NI
  • Make use of the SIMD (SSE2, AVX2) instructions
• Features
  • Fast in software: 0.69 cpb on Haswell
  • Fast in hardware: 95.8 Gbps on Xilinx Virtex 7; 250 Gbps on 65 nm ASIC (ETH implementation)
  • Nonce-based
Changes in MORUS v2
• Tweaks are only applied to the finalization of MORUS
  • Remove register $S_3$ from the message word of the finalization
  • Change the tag generation to work the same way as the keystream generation
  • Increase the number of steps from 8 to 10 (compensating for the change in tag generation)
• Rationale for tweaks
  • Improve the hardware efficiency of MORUS
MORUS: Parameters
Name             State size   Key size   Tag size   Plaintext size   AD size
                 (bits)       (bits)     (bits)     (bits)           (bits)
MORUS-1280-128   1280         128        128        < 2^64           < 2^64
MORUS-640-128    640          128        128        < 2^64           < 2^64
MORUS-1280-256   1280         256        128        < 2^64           < 2^64
MORUS: State and Operations
• State organization
  • MORUS-1280: five 256-bit words
  • MORUS-640: five 128-bit words
• Operations:
  • XOR, AND, SHIFT
  • $\mathrm{Rotl\_128\_32}(x, n)$: divide a 128-bit block $x$ into four 32-bit words and rotate each word left by $n$ bits.
  • $\mathrm{Rotl\_256\_64}(x, n)$: divide a 256-bit block $x$ into four 64-bit words and rotate each word left by $n$ bits.
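The two lane-wise rotations can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes a block is held as a Python integer with lane 0 in the least significant bits, and the helper name `rotl_lanes` is hypothetical.

```python
def rotl_lanes(x, n, lane_bits, lanes=4):
    """Rotate each of `lanes` lanes of x left by n bits, independently.

    Assumption: x is a non-negative int of lanes * lane_bits bits,
    with lane 0 stored in the least significant bits.
    """
    lane_mask = (1 << lane_bits) - 1
    n %= lane_bits
    out = 0
    for i in range(lanes):
        lane = (x >> (i * lane_bits)) & lane_mask
        # Rotate within the lane; bits never cross a lane boundary.
        rotated = ((lane << n) | (lane >> (lane_bits - n))) & lane_mask
        out |= rotated << (i * lane_bits)
    return out


def rotl_128_32(x, n):
    """Rotl_128_32: four 32-bit words, each rotated left by n."""
    return rotl_lanes(x, n, 32)


def rotl_256_64(x, n):
    """Rotl_256_64: four 64-bit words, each rotated left by n."""
    return rotl_lanes(x, n, 64)
```

Because every lane is rotated by the same amount, the operation maps directly onto a SIMD instruction pair (shift left, shift right, then OR) on SSE2 or AVX2 registers.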
MORUS: Initialization
• Load IV, key and constants into the initial state
• Update state: 16 steps
• Key is XORed into the state at the end of the initialization
MORUS: Keystream Generation
• State $S = \{S_0, S_1, S_2, S_3, S_4\}$
• For MORUS-640:
  • $\mathit{keystream} = S_0 \oplus (S_1 \lll 96) \oplus (S_2 \,\&\, S_3)$
• For MORUS-1280:
  • $\mathit{keystream} = S_0 \oplus (S_1 \lll 192) \oplus (S_2 \,\&\, S_3)$
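The keystream formulas above can be sketched in a few lines. This is an illustration only: it assumes each state word is a Python integer, that `<<<` denotes a left rotation of the whole word (the slides do not define it), and the names `rotl` and `keystream_block` are hypothetical.

```python
def rotl(x, n, width):
    """Rotate a width-bit integer x left by n bits (whole-word rotation)."""
    n %= width
    mask = (1 << width) - 1
    return ((x << n) | (x >> (width - n))) & mask


def keystream_block(state, width):
    """Keystream word S0 ^ (S1 <<< C) ^ (S2 & S3).

    state: sequence of five ints (S0..S4), each `width` bits wide.
    width: 128 for MORUS-640, 256 for MORUS-1280.
    """
    s0, s1, s2, s3, _s4 = state  # S4 does not enter the keystream word
    c = {128: 96, 256: 192}[width]
    return s0 ^ rotl(s1, c, width) ^ (s2 & s3)
```

The AND term is the only nonlinear part of the output function, which is why the state words are never exposed directly in the keystream.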
MORUS: Finalization (Tweaked!)
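MORUS v1
• State update: 8 steps
• Message: $S_3 \oplus (\mathit{adlen} \,\|\, \mathit{msglen})$
• Tag generation: $S_1 \oplus S_2 \oplus S_3 \oplus S_4$
MORUS v2
• State update: 10 steps
• Message: $(\mathit{adlen} \,\|\, \mathit{msglen})$
• Tag generation: $S_0 \oplus (S_1 \lll C) \oplus (S_2 \,\&\, S_3)$, where $C = 96$ for MORUS-640 and $C = 192$ for MORUS-1280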
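The difference between the two finalizations can be sketched side by side. This is an illustration under assumptions: state words are Python integers, `lengths_word` stands for the already-encoded value adlen||msglen (its encoding is not given on the slide), `<<<` is taken to be a whole-word left rotation, and any truncation to the 128-bit tag size is not shown. All function names are hypothetical.

```python
def rotl(x, n, width):
    """Rotate a width-bit integer x left by n bits (whole-word rotation)."""
    n %= width
    mask = (1 << width) - 1
    return ((x << n) | (x >> (width - n))) & mask


# Number of state-update steps in the finalization, per the slide.
FINAL_STEPS = {"v1": 8, "v2": 10}


def final_message_v1(state, lengths_word):
    """v1 message word: S3 ^ (adlen || msglen)."""
    return state[3] ^ lengths_word


def final_message_v2(state, lengths_word):
    """v2 message word: (adlen || msglen) only; S3 is no longer mixed in."""
    return lengths_word


def tag_v1(state):
    """v1 tag: S1 ^ S2 ^ S3 ^ S4."""
    _s0, s1, s2, s3, s4 = state
    return s1 ^ s2 ^ s3 ^ s4


def tag_v2(state, width):
    """v2 tag: S0 ^ (S1 <<< C) ^ (S2 & S3), same form as the keystream."""
    s0, s1, s2, s3, _s4 = state
    c = 96 if width == 128 else 192
    return s0 ^ rotl(s1, c, width) ^ (s2 & s3)
```

In v2 the tag reuses the keystream output function, so a hardware design needs no separate four-input XOR tree over the state words.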
MORUS: Security Goal
Name             Confidentiality (bits)   Integrity (bits)
MORUS-640-128    128                      128
MORUS-1280-128   128                      128
MORUS-1280-256   256                      128
Security of MORUS: Initialization
• Algebraic degree
  • After 10 steps, the algebraic degree exceeds 256
• Differential cryptanalysis
  • Differential probability < $2^{-256}$
Security of MORUS: Encryption
• Guess-and-determine attack
  • State size of MORUS is at least five times the key size
  • Keystream generation function: state bits are not directly known to the adversary
Security of MORUS: Finalization
• Internal state collision
  • Probability < $2^{-128}$
• Differential forgery attack on the finalization
  • 10 steps, differential probability < $2^{-256}$
Security of MORUS
• Remark on the analysis by Mileva et al. in BalkanCryptSec 2015
  • Not that relevant to the security of MORUS
  • Collision on the state update function: assumes a special difference in the state, which is unrealistic
  • Distinguisher in nonce-reuse scenarios: excluded in our security claim
  • “Differential bias”: becomes invalid when a different key is used
MORUS: Hardware Performance
• State update function of MORUS is designed to be fast in hardware
  • AND and XOR gates are used
  • Short critical path
MORUS: Hardware Performance
• Current implementation on FPGA using CAESAR API
  • Virtex 7, Xilinx Vivado 2016.2

Name         Area      Area    Frequency   TP       TP/LUT
             (Slice)   (LUT)   (MHz)       (Gbps)   (Mbps/LUT)
MORUS-640    681       2129    342.4       43.8     20.6
MORUS-1280   1045      3746    370.4       95.8     25.6
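The TP/LUT column can be recomputed from the throughput and LUT columns. The sketch below does only that arithmetic; it assumes 1 Gbps = 1000 Mbps, and the function name `mbps_per_lut` is hypothetical.

```python
def mbps_per_lut(throughput_gbps, luts):
    """Throughput per LUT in Mbps/LUT, assuming 1 Gbps = 1000 Mbps."""
    return throughput_gbps * 1000.0 / luts


# (throughput in Gbps, area in LUTs), copied from the FPGA table above.
FPGA_RESULTS = {
    "MORUS-640": (43.8, 2129),
    "MORUS-1280": (95.8, 3746),
}

for name, (tp_gbps, luts) in FPGA_RESULTS.items():
    print(f"{name}: {mbps_per_lut(tp_gbps, luts):.1f} Mbps/LUT")
```

Rounded to one decimal place, the results agree with the 20.6 and 25.6 Mbps/LUT figures in the table.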
MORUS: Hardware Performance
[Figure: three bar charts comparing MORUS v1 and MORUS v2 on Area (LUT), Throughput (Gbps), and Throughput/LUT (Mbps/LUT)]
Comparison between MORUS-1280 v1 and MORUS-1280 v2
MORUS: Hardware Performance
• Performance on ASIC: high throughput/area (Michael Muehlberghuber and Frank K. Gürkaynak, DIAC 2015)
• Performance on ASIC: high throughput (250 Gbps) (Michael Muehlberghuber and Frank K. Gürkaynak, DIAC 2015)
MORUS: Software Performance
Message length    16B     64B     512B   1024B   4096B   16384B
MORUS-640 (EA)    40.64   10.35   2.30   1.72    1.30    1.19
MORUS-640 (DV)    38.47   10.13   2.30   1.72    1.29    1.18
MORUS-1280 (EA)   45.32   10.38   1.85   1.24    0.80    0.69
MORUS-1280 (DV)   45.74   10.66   1.91   1.28    0.81    0.70
• Speed in cycles per byte (cpb) on Haswell; AVX2 is used in MORUS-1280
• EA: encryption-authentication; DV: decryption-verification
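The steep drop in cpb from short to long messages reflects a fixed per-message cost (initialization and finalization) spread over more bytes. The sketch below converts the MORUS-1280 (EA) figures to total cycles and derives a rough two-point estimate of the fixed and marginal cost. The linear cost model and the function names are assumptions for illustration, not part of the original measurements.

```python
# cpb for MORUS-1280 (EA), keyed by message length in bytes (from the table).
CPB_MORUS_1280_EA = {
    16: 45.32, 64: 10.38, 512: 1.85,
    1024: 1.24, 4096: 0.80, 16384: 0.69,
}


def total_cycles(cpb, length_bytes):
    """Total cycles for one message: cycles per byte times message length."""
    return cpb * length_bytes


def two_point_fit(table, short_len, long_len):
    """Fit cycles = fixed + marginal * length through two table entries.

    Assumes a linear cost model; returns (marginal_cpb, fixed_cycles).
    """
    c_short = total_cycles(table[short_len], short_len)
    c_long = total_cycles(table[long_len], long_len)
    marginal = (c_long - c_short) / (long_len - short_len)
    fixed = c_short - marginal * short_len
    return marginal, fixed


marginal, fixed = two_point_fit(CPB_MORUS_1280_EA, 4096, 16384)
print(f"marginal cost: {marginal:.2f} cpb, fixed cost: {fixed:.0f} cycles")
```

Under this model the two longest message sizes give a marginal cost of about 0.65 cpb and a fixed cost of about 600 cycles per message. The table values are rounded, so these are rough estimates only.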
MORUS: Software Performance
• Faster than AES-GCM on Haswell (AES-GCM: 1.03 cpb)
• Almost the same as MORUS v1 for long messages
• Reasons:
  • Benefits from SIMD
  • Removed the redundant operations in the cipher
Conclusion
• MORUS
  • The fastest candidate on platforms with SIMD but no AES-NI (0.69 cpb with AVX2)
  • The most efficient candidate in hardware (MORUS-1280: 95.88 Gbps, 3764 LUTs, 25.6 Mbps/LUT)
• MORUS v2
  • Tweaked finalization to reduce hardware area; throughput/area is increased by 28%