rns modular computations for cryptographic applicationsrns modular computations for cryptographic...

HAL Id: hal-01141347https://hal.inria.fr/hal-01141347

Submitted on 11 Apr 2015

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.

RNS Modular Computations for CryptographicApplications

Karim Bigou, Arnaud Tisserand

To cite this version:Karim Bigou, Arnaud Tisserand. RNS Modular Computations for Cryptographic Applications.RAIM: 7ème Rencontre Arithmétique de l’Informatique Mathématique, Apr 2015, Rennes, France.2015. �hal-01141347�

https://hal.inria.fr/hal-01141347

https://hal.archives-ouvertes.fr

RNS Modular Computations for Cryptographic ApplicationsKarim Bigou & Arnaud Tisserand

1. Elliptic Curve Cryptography (ECC)

Elliptic curve over FP: y2 = x3 + a x + b with P a `-bit prime

y2 = x3 + 4x + 20 over F1009

Security levels: ` ∈ {160, . . . ,600} bitsCurve level operations:I point addition (ADD): Q + Q′

I point doubling (DBL): Q + QI scalar multiplication:

[k ]Q = Q + Q + . . . + Q︸︷︷︸k times

Security (ECDLP): knowing Q and[k ]Q, k cannot be recoveredECDLP : Elliptic Curve Discrete Logarithm Problem

3. RNS Computation Flow in ECC Applications

RNS allows to perform some field level operations in parallel

mod m1mod m2mod m3mod m4mod m5

+,−,×,−1 in Fp

ADD, DBL

[k]Q

±× over one channel over one RNS vector(i.e. n channels)

base extension modulo P in RNS

1 n

time

n

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

• • •

• • •

• • •

• • •

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

±×

±×

±×

±×

•

••

5. New RNS Modular Inversion (MI) (CHES 2013)

State-of-the-art RNS MI methods:I based on Fermat’s Little Theorem (FLT-MI): X−1 = X P−2 mod P

i.e. a large exponentiation with a lot of modular reductionswhich costs O(log2 P × n2) EMMs

I very limited parallelization due to internal data dependencies

Proposed method PM-MI:I extended binary Euclidean algorithm (binary-ternary version)I uses the plus-minus trick:

if X and Y are odd then X + Y = 0 mod 4 or X − Y = 0 mod 4I PM-MI works without BE and costs O(log2 P × n) EMMs

CTRL

(shared)

local reg.

{@, en, r/w}

Arithmetic Unit(6 pipeline stages)

{rst, mode, . . .}

ww

w

w w

INw

OUTw

cmpw

= 1̂ = −̂1

precomp.mult.

≈ 2n × w w

@1

precomp.ri (×2)

@2

dlog2r ie

precomp.add.

17 × w

@3

w

Example: # EMMs for ` = 192 bitsn × w FLT-MI PM-MI Gain Factor

12× 17 103140 5474 189× 22 61884 4106 157× 29 40110 3193 12

0 50

100 150 200 250 300 350 400 450 500

Inve

rsio

n t

ime

[µ

s]

192 bits

FLT−MIPM−MI

256 bits 384 bits 521 bits

4 5 6 7 8 9

10

7 8 9 10 11 12

sp

ee

d u

p

n 8 9 10 11 12

n 10 12 14 16 18 20 22

n 15 16 17 18 19

n

0

500

1000

1500

2000

2500

3000

3500

4000

7 9 12

slic

es

FLT−MI 192 bits

7 9 12

PM−MI 192 bits

8 9 12

FLT−MI 256 bits

8 9 12

PM−MI 256 bits

0

10

20

30

40

50

60

70

80

7 9 12

# b

locks (

DS

P /

BR

AM

)

n

DSPBRAM

7 9 12

n

8 9 12

n

8 9 12

n

0

2000

4000

6000

8000

10000

12000

10 12 14 17 18 20 22

slic

es

FLT−MI 384 bits

10 12 14 17 18 20 22

PM−MI 384 bits

15 16 19

FLT−MI 521 bits

15 16 19

PM−MI 521 bits

0

20

40

60

80

100

120

10 12 14 17 18 20 22

# b

locks (

DS

P /

BR

AM

)

n

DSPBRAM

10 12 14 17 18 20 22

n

15 16 19

n

15 16 19

n

2. Residue Number System (RNS)

X a large `-bit integer is represented by:−→X = (x1, . . . , xn) = (X mod m1, . . . ,X mod mn)

channel 1

±×mod m1

w

z1

w

y1

w

x1

channel 2

±×mod m2

w

z2

w

y2

w

x2 . . .. . .

. . .

. . .

channel n

±×mod mn

w

zn

w

yn

w

xnX

Y

Z

RNS base B = (m1, . . . ,mn)n pairwise w-bit co-primeswith n × w > `

The Chinese remaindertheorem (CRT) is the baseof RNSEMM elementary modular multiplication (w bits)

Pros:I carry free between channelsI fast parallel +, −, × and some exact divisionsI non-positional number system, randomization against SCAsI flexibility for hardware implementations

Cons:I comparison, modular reduction and division are much harder

4. State-of-the-Art Algorithms and Architectures

RNS Montgomery ReductionInput:

−→X ,−→X ′

Output: (−→ω ,−→ω ′) with ω ≡ X ×M−1 mod P−→Q ←−

−→X × (−

−→P −1) (in base B)−→

Q ′ ←−BE(−→Q ,B,B′)−→

S ′ ←−−→X ′ +

−→Q ′ ×

−→P ′ (in base B′)

−→ω ′ ←−−→S ′ ×

−→M−1 (in base B′)

−→ω ←−BE(−→ω ′,B′,B)

B B′×•

•×+×•

•

BE

BE

BE: base extensionM =

∏mi

channel 1

rower 1

w

w

channel 2

rower 2

w

w

. . .

channel n

rower n

w

w

cox

. . .

1

t

w

w

Output

Input

n× w

w w w w w w

CTRL

6. Fast Patterns for RNS Computations (ASAP 2014)

Cost of standard and modular multiplications in RNS:I standard: n EMMs fully parallelI modular: 2n2 + O(n) EMMs 1 mult. & 1 red.

Proposed method:I splits operands into 2 parts:

−→X =

−−→(Kx) ×

−−−→(Ma) +

−−→(Rx)

allows to replace 2n moduli by only 32n

I reuses split result in various computation patternsI requires an hypothesis on P: OK for ECC/DH, but not for RSA

Cost for some patterns (#EMMs):

Operations s-o-t-a our

AB mod P 2n2 + 4n 2.5n2 + 12.5n

A2 mod P 2n2 + 4n 1.75n2 + 10.5n

Cst×A mod P 2n2 + 4n 1.75n2 + 7n

Cst×A2 mod P 4n2 + 8n 2.75n2 + 16.5n

Usage for Diffie-Hellman or ElGamal:

0.70.80.91.01.11.2

10 20 30 40 50 60 70

Our

/ R

ef

n

EMM Expo. LSBF

0.7

0.8

0.9

1.0

1.1

1.2

Our

/ R

ef

EMM Expo. Montg.

baseextension

(BE)

computation

sin

1base

SPLIT PR MR

baseB a

Xa

Ya

Ua

Kx

Ky

Ry = Ya

Rx = Xa

Qa Sa

baseB b

Xb

Yb

Rx Kx

Ry Ky

Ub Qb Sb

baseB c

Xc

Yc

Rx Kx

Ry Ky

Uc Qc Sc

Funding from DGA-INRIA PhD grant and project PAVOIS ANR 12 BS02 002 01

http://pavois.irisa.fr/

rns modular computations for cryptographic applicationsrns modular computations for cryptographic...

Documents