rns modular computations for cryptographic applicationsrns modular computations for cryptographic...
TRANSCRIPT
HAL Id: hal-01141347https://hal.inria.fr/hal-01141347
Submitted on 11 Apr 2015
HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.
RNS Modular Computations for CryptographicApplications
Karim Bigou, Arnaud Tisserand
To cite this version:Karim Bigou, Arnaud Tisserand. RNS Modular Computations for Cryptographic Applications.RAIM: 7ème Rencontre Arithmétique de l’Informatique Mathématique, Apr 2015, Rennes, France.2015. �hal-01141347�
RNS Modular Computations for Cryptographic ApplicationsKarim Bigou & Arnaud Tisserand
1. Elliptic Curve Cryptography (ECC)
Elliptic curve over FP: y2 = x3 + a x + b with P a `-bit prime
y2 = x3 + 4x + 20 over F1009
Security levels: ` ∈ {160, . . . ,600} bitsCurve level operations:I point addition (ADD): Q + Q′
I point doubling (DBL): Q + QI scalar multiplication:
[k ]Q = Q + Q + . . . + Q︸ ︷︷ ︸k times
Security (ECDLP): knowing Q and[k ]Q, k cannot be recoveredECDLP : Elliptic Curve Discrete Logarithm Problem
3. RNS Computation Flow in ECC Applications
RNS allows to perform some field level operations in parallel
mod m1mod m2mod m3mod m4mod m5
+,−,×,−1 in Fp
ADD, DBL
[k]Q
±× over one channel over one RNS vector(i.e. n channels)
base extension modulo P in RNS
1 n
time
n
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
• • •
• • •
• • •
• • •
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
±×
±×
±×
±×
•
••
5. New RNS Modular Inversion (MI) (CHES 2013)
State-of-the-art RNS MI methods:I based on Fermat’s Little Theorem (FLT-MI): X−1 = X P−2 mod P
i.e. a large exponentiation with a lot of modular reductionswhich costs O(log2 P × n2) EMMs
I very limited parallelization due to internal data dependencies
Proposed method PM-MI:I extended binary Euclidean algorithm (binary-ternary version)I uses the plus-minus trick:
if X and Y are odd then X + Y = 0 mod 4 or X − Y = 0 mod 4I PM-MI works without BE and costs O(log2 P × n) EMMs
CTRL
(shared)
local reg.
{@, en, r/w}
Arithmetic Unit(6 pipeline stages)
{rst, mode, . . .}
ww
w
w w
INw
OUTw
cmpw
= 1̂ = −̂1
precomp.mult.
≈ 2n × w w
@1
precomp.ri (×2)
@2
dlog2r ie
precomp.add.
17 × w
@3
w
Example: # EMMs for ` = 192 bitsn × w FLT-MI PM-MI Gain Factor
12× 17 103140 5474 189× 22 61884 4106 157× 29 40110 3193 12
0 50
100 150 200 250 300 350 400 450 500
Inve
rsio
n t
ime
[µ
s]
192 bits
FLT−MIPM−MI
256 bits 384 bits 521 bits
4 5 6 7 8 9
10
7 8 9 10 11 12
sp
ee
d u
p
n 8 9 10 11 12
n 10 12 14 16 18 20 22
n 15 16 17 18 19
n
0
500
1000
1500
2000
2500
3000
3500
4000
7 9 12
slic
es
FLT−MI 192 bits
7 9 12
PM−MI 192 bits
8 9 12
FLT−MI 256 bits
8 9 12
PM−MI 256 bits
0
10
20
30
40
50
60
70
80
7 9 12
# b
locks (
DS
P /
BR
AM
)
n
DSPBRAM
7 9 12
n
8 9 12
n
8 9 12
n
0
2000
4000
6000
8000
10000
12000
10 12 14 17 18 20 22
slic
es
FLT−MI 384 bits
10 12 14 17 18 20 22
PM−MI 384 bits
15 16 19
FLT−MI 521 bits
15 16 19
PM−MI 521 bits
0
20
40
60
80
100
120
10 12 14 17 18 20 22
# b
locks (
DS
P /
BR
AM
)
n
DSPBRAM
10 12 14 17 18 20 22
n
15 16 19
n
15 16 19
n
2. Residue Number System (RNS)
X a large `-bit integer is represented by:−→X = (x1, . . . , xn) = (X mod m1, . . . ,X mod mn)
channel 1
±×mod m1
w
z1
w
y1
w
x1
channel 2
±×mod m2
w
z2
w
y2
w
x2 . . .. . .
. . .
. . .
channel n
±×mod mn
w
zn
w
yn
w
xnX
Y
Z
RNS base B = (m1, . . . ,mn)n pairwise w-bit co-primeswith n × w > `
The Chinese remaindertheorem (CRT) is the baseof RNSEMM elementary modular multiplication (w bits)
Pros:I carry free between channelsI fast parallel +, −, × and some exact divisionsI non-positional number system, randomization against SCAsI flexibility for hardware implementations
Cons:I comparison, modular reduction and division are much harder
4. State-of-the-Art Algorithms and Architectures
RNS Montgomery ReductionInput:
−→X ,−→X ′
Output: (−→ω ,−→ω ′) with ω ≡ X ×M−1 mod P−→Q ←−
−→X × (−
−→P −1) (in base B)−→
Q ′ ←−BE(−→Q ,B,B′)−→
S ′ ←−−→X ′ +
−→Q ′ ×
−→P ′ (in base B′)
−→ω ′ ←−−→S ′ ×
−→M−1 (in base B′)
−→ω ←−BE(−→ω ′,B′,B)
B B′ו
•×+ו
•
BE
BE
BE: base extensionM =
∏mi
channel 1
rower 1
w
w
channel 2
rower 2
w
w
. . .
channel n
rower n
w
w
cox
. . .
1
t
w
w
Output
Input
n× w
w w w w w w
CTRL
6. Fast Patterns for RNS Computations (ASAP 2014)
Cost of standard and modular multiplications in RNS:I standard: n EMMs fully parallelI modular: 2n2 + O(n) EMMs 1 mult. & 1 red.
Proposed method:I splits operands into 2 parts:
−→X =
−−→(Kx) ×
−−−→(Ma) +
−−→(Rx)
allows to replace 2n moduli by only 32n
I reuses split result in various computation patternsI requires an hypothesis on P: OK for ECC/DH, but not for RSA
Cost for some patterns (#EMMs):
Operations s-o-t-a our
AB mod P 2n2 + 4n 2.5n2 + 12.5n
A2 mod P 2n2 + 4n 1.75n2 + 10.5n
Cst×A mod P 2n2 + 4n 1.75n2 + 7n
Cst×A2 mod P 4n2 + 8n 2.75n2 + 16.5n
Usage for Diffie-Hellman or ElGamal:
0.70.80.91.01.11.2
10 20 30 40 50 60 70
Our
/ R
ef
n
EMM Expo. LSBF
0.7
0.8
0.9
1.0
1.1
1.2
Our
/ R
ef
EMM Expo. Montg.
baseextension
(BE)
computation
sin
1base
SPLIT PR MR
baseB a
Xa
Ya
Ua
Kx
Ky
Ry = Ya
Rx = Xa
Qa Sa
baseB b
Xb
Yb
Rx Kx
Ry Ky
Ub Qb Sb
baseB c
Xc
Yc
Rx Kx
Ry Ky
Uc Qc Sc
Funding from DGA-INRIA PhD grant and project PAVOIS ANR 12 BS02 002 01
http://pavois.irisa.fr/