workpackage 5: high performance mathematical computing · 2021. 1. 6. · t5.4 singular d5.6,...

48
Workpackage 5: High Performance Mathematical Computing Cl´ ement Pernet Final OpenDreamKit Project review Luxembourg, October 30, 2019 C. Pernet: WP 5 1 Luxembourg, 2019-10-30

Upload: others

Post on 08-Mar-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Workpackage 5:High Performance Mathematical Computing

Clement Pernet

Final OpenDreamKit Project review

Luxembourg, October 30, 2019

C. Pernet: WP 5 1 Luxembourg, 2019-10-30

Page 2: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

High Performance Mathematical Computing

Mathematical computing

Computing with a large variety of objects

I Z,Q,Z/pZ,Fq, 17541718814389012164632

I Polynomials over Z,Q,Z/pZ,Fq, 25x

3 + x2 − 119x+ 2

I Matrices over Z,Q,Z/pZ,Fq,[27 3 −19 0 2

]I Matrices of polynomials over Z,Q,Z/pZ,Fq,

[3x2+3 2x2+34x2+1 x2+4x+4

]

I Tree algebras

3q2−q5

q5+2q4+3q3+3q2+2q+1

a

b c

d

+ 2qq4+q3+2q2+q+1

a

b

c d

for applications where all digits matter (most often).

C. Pernet: WP 5 2 Luxembourg, 2019-10-30

Page 3: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

High performance mathematical computing

Need for High performance: applications where size is crucial:

Experimental maths: testing conjectures

I larger instances give higher confidence

Algebraic cryptanalysis: security = computational difficulty

I key size determined by the largest solvable problem

Example

Breaking RSA by integer factorization: n = pq. Last record:I n of 768 bits

I linear algebra in dimension 192 796 550 over F2 (105Gb)

I About 2000 CPU years

3D data analysis, shape recognition:

I via persistent homologyI large sparse matrices over F2, Z

C. Pernet: WP 5 3 Luxembourg, 2019-10-30

Page 4: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

High performance mathematical computing

Need for High performance: applications where size is crucial:

Experimental maths: testing conjectures

I larger instances give higher confidence

Algebraic cryptanalysis: security = computational difficulty

I key size determined by the largest solvable problem

Example

Breaking RSA by integer factorization: n = pq. Last record:I n of 768 bits

I linear algebra in dimension 192 796 550 over F2 (105Gb)

I About 2000 CPU years

3D data analysis, shape recognition:

I via persistent homologyI large sparse matrices over F2, Z

C. Pernet: WP 5 3 Luxembourg, 2019-10-30

Page 5: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

High performance mathematical computing

Need for High performance: applications where size is crucial:

Experimental maths: testing conjectures

I larger instances give higher confidence

Algebraic cryptanalysis: security = computational difficulty

I key size determined by the largest solvable problem

Example

Breaking RSA by integer factorization: n = pq. Last record:I n of 768 bits

I linear algebra in dimension 192 796 550 over F2 (105Gb)

I About 2000 CPU years

3D data analysis, shape recognition:

I via persistent homologyI large sparse matrices over F2, Z

C. Pernet: WP 5 3 Luxembourg, 2019-10-30

Page 6: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Goal: delivering high performance to maths users

Harnessing modern hardware parallelisation

I in-core parallelism (SIMD vectorisation)

I multi-core parallelism

I distributed computing: clusters, cloud

LinBoxFLINT NumPy

Systems :

Components :

SageMath SingularGAP PARI/GP

MPIR PPL

C. Pernet: WP 5 4 Luxembourg, 2019-10-30

Page 7: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Goal: delivering high performance to maths users

Harnessing modern hardware parallelisation

I in-core parallelism (SIMD vectorisation)

I multi-core parallelism

I distributed computing: clusters, cloud

LinBoxFLINT NumPy

CloudMulticoreserver

HPC clusterSIMD

Systems :

Components :

Architectures :

SageMath SingularGAP PARI/GP

MPIR PPL

C. Pernet: WP 5 4 Luxembourg, 2019-10-30

Page 8: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Goal: delivering high performance to maths users

Languages

I Computational Maths software uses high level languages (e.g. Python)

I High performance delivered by languages close to the metal (C, assembly)

compilation, automated optimisation

LinBoxFLINT

Python

Cython Pythran

C

NumPy

CloudMulticoreserver

HPC clusterSIMD

Systems :

Components :

Languages :

Architectures :

SageMath SingularGAP PARI/GP

MPIR PPL

C. Pernet: WP 5 4 Luxembourg, 2019-10-30

Page 9: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

High performance mathematical computing

Goal:

I Improve/Develop parallelization of software components

I Expose them through the software stack

I Offer High Performance Computing to VRE’s users

C. Pernet: WP 5 5 Luxembourg, 2019-10-30

Page 10: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Computational Kernels

Linear algebra

Arithmetic

C. Pernet: WP 5 6 Luxembourg, 2019-10-30

Page 11: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Computational Kernels

Linear algebra

Similarities with numerical HPC

I building block to which others reduce toI (rather) simple algorithmicI high compute/memory intensity

Specificities

I Multiprecision arithmetic : lifting from finite precision (Fp)I Rank deficiency : unbalanced dynamic blockingI Early adopter of subcubic matrix arithemtic : recursion

Arithmetic

C. Pernet: WP 5 6 Luxembourg, 2019-10-30

Page 12: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Computational Kernels

Linear algebra

Arithmetic

I Wide variety of computing domains: Z,Q,Z/pZ,Fq,Fq[X],Z[X,Y, Z], ..

I possibly with dynamic size

Challenge

I most often memory intensive operations : hard to parallelizeI very fine grain, but billions of instance : fine tuning

C. Pernet: WP 5 6 Luxembourg, 2019-10-30

Page 13: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 7 Luxembourg, 2019-10-30

Page 14: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 8 Luxembourg, 2019-10-30

Page 15: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outcome of WorkPackage 5

Component Review 1 Review 2 Final review

T5.1 Pari/GP D5.16T5.2 GAP D5.15T5.3 LinBox D5.12 D5.14T5.4 Singular D5.6, D5.7 D5.13T5.5 MPIR D5.5, D5.7T5.6 Combinatorics D5.1 D5.11T5.7 Pythran D5.2 D5.11T5.8 SunGrid Engine D5.3

Overall

I 31 software releases

I 16 research papers in journals or conference proceedings

C. Pernet: WP 5 9 Luxembourg, 2019-10-30

Page 16: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Addressing recommendations of review 1 and 2

RP1 Recommendation 10: Regarding WP5, make contacts with HPCcommunity in order to ascertain current state-of-the-art. The work in thisWP needs to be nearer the leading edge.

Leading edge achievements in linear algebraI symmetric factorization outperforms LAPACK implementationI new non-hierarchical generator for quasiseparable matricesI large scale parallelization of rational linear solver

C. Pernet: WP 5 10 Luxembourg, 2019-10-30

Page 17: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Addressing recommendations of review 1 and 2

RP1 Recommendation 10: Regarding WP5, make contacts with HPCcommunity in order to ascertain current state-of-the-art. The work in thisWP needs to be nearer the leading edge.

Leading edge achievements in linear algebraI symmetric factorization outperforms LAPACK implementationI new non-hierarchical generator for quasiseparable matricesI large scale parallelization of rational linear solver

C. Pernet: WP 5 10 Luxembourg, 2019-10-30

Page 18: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Addressing recommendations of review 1 and 2

RP2 Recommendation 1: A minor aspect: In the deliverable D5.11,authors have to clarify the reason why the speedup with the use of cores isnot so high when you increment the number of cores. The presentation hasalso to be improved.

I D5.11 was complemented with a clarification, polished resubmitted afterthe review.

RP2 Recommendation 10: Some guidelines (set of recommendations) forusing the different hardware architectures would be recommendable.

I A Blog post was produced as a use case and published onopendreamkit.org.

C. Pernet: WP 5 11 Luxembourg, 2019-10-30

Page 19: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Addressing recommendations of review 1 and 2

RP2 Recommendation 1: A minor aspect: In the deliverable D5.11,authors have to clarify the reason why the speedup with the use of cores isnot so high when you increment the number of cores. The presentation hasalso to be improved.

I D5.11 was complemented with a clarification, polished resubmitted afterthe review.

RP2 Recommendation 10: Some guidelines (set of recommendations) forusing the different hardware architectures would be recommendable.

I A Blog post was produced as a use case and published onopendreamkit.org.

C. Pernet: WP 5 11 Luxembourg, 2019-10-30

Page 20: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 12 Luxembourg, 2019-10-30

Page 21: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

PARI-GP

PARI ecosystem

PARI library: dedicated routines for number theory

PARI-GP: an interactive system

GP2C: a GP to C compiler

Generic parallelization engine for the whole suite

Delivers support for

I sequential computations

I POSIX threads

I MPI for distributed computing

Features:

I Same code base

I automated parallelization

I full control for power users/developpers

C. Pernet: WP 5 13 Luxembourg, 2019-10-30

Page 22: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Main Achievements

I Fast linear algebra over the rationals and cyclotomic fields

I Fast Chinese remainders and multimodular reduction

I Parallel polynomial resultant

I Fast modular polynomials and applications

I MPQS integer factorization rewrite

Well-honed strategy after preliminary assessment

I Creation of ”worker” functions from existing code

I Insertion of actual parallel instructions

I Incremental buildup, independently instrumenting one high-level functionat a time.

C. Pernet: WP 5 14 Luxembourg, 2019-10-30

Page 23: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Highlight

Linear algebra over rationals:

I Required for cyclotomic rings

I Fast Chinese remaindering

I Fast CUP decomposition overfinite fields

I Parallelization

Fourier transform of L-functions

10 20 30

10

20

30

ideal sp

eedup

number of threads

spee

du

p

Parallel speedup

matdetmatker

C. Pernet: WP 5 15 Luxembourg, 2019-10-30

Page 24: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Highlight

Linear algebra over rationals:

I Required for cyclotomic rings

I Fast Chinese remaindering

I Fast CUP decomposition overfinite fields

I Parallelization

Fourier transform of L-functions10 20 30

10

20

30

ideal sp

eedup

number of threads

spee

du

p

Parallel speedup

matdetmatker

ellan

C. Pernet: WP 5 15 Luxembourg, 2019-10-30

Page 25: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 16 Luxembourg, 2019-10-30

Page 26: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

HPC GAP

HPC-GAP

I Multi-threaded GAP.I Targets:I multicore servers.I good speedupsI high level abstraction

Achievement

I Fork from GAP, diverged for a long time

I Huge effort to bring it back in:GAP 4.9.1 (Month 33).: first GAP release with HPCGAPintegrated as a compile-time option

C. Pernet: WP 5 17 Luxembourg, 2019-10-30

Page 27: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Dense linear algebra over small finite fields

I Matrix multiplication, Gaussian elimination, echelon forms, etcI A key kernel for many GAP computations

MeatAxe64

A new C and assembler library, tuned for performance at all levels:

I new data representations and assembler kernels

I new algorithms for many fields

I control of cache usage and memory bandwidth: allowing for sharing between threads and cores

I purpose built highly efficient task farm: 1M x 1M dense matrix multiply over GF(2) in 8 hours (64 core AMDbulldozer).

Fully available from GAP.

GAP DemoC. Pernet: WP 5 18 Luxembourg, 2019-10-30

Page 28: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 19 Luxembourg, 2019-10-30

Page 29: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Parallel Rational solver algorithmic

Method Bit complexity

Gauss over Q 2O(n)

Gauss mod bound(sol) O(n5)Chinese Remaindering O (n4)p-adic lifting O (n3)

Chinese Remaindering

1. Solve the system independentlymodulo p1, p2, . . . , pk

2. Reconstruct a solution modulop1 × p2 × . . . , pk.

3. Reconstruct over Q

p-adic lifting

1. Solve the system modulo p

2. Iteratively lift the solution modulop2, p3, . . . , pk

3. Reconstruct over Q

C. Pernet: WP 5 20 Luxembourg, 2019-10-30

Page 30: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Distributed memory Chinese Remaindering

Conclusions

I (almost) embarrassingly parallel

I but overwhelming computational cost (O (n4))

I hybrid OpenMP-MPI version slightly slower but better memory efficiency

C. Pernet: WP 5 21 Luxembourg, 2019-10-30

Page 31: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Shared memory p-adic lifting

A new hybrid algorithm: Chinese Remaindering within p-adic lifting

I Smaller critical path

I Higher degree of parallelism

Improving state of the art efficiency

I Improved sequential efficiency(memory access pattern, BLAS3)

I Chinese remaindering deliversgood parallel scaling

0

100

200

300

400

500

600

100 200 300 400 500 600 700 800

tim

e (

s)

matrix dimension (bitsize 400)

Classic Dixon, sequentialRNS-Dixon with 32 primes, sequentialRNS-Dixon with 32 primes, 2 threadsRNS-Dixon with 32 primes, 4 threadsRNS-Dixon with 32 primes, 8 threads

RNS-Dixon with 32 primes, 16 threads

C. Pernet: WP 5 22 Luxembourg, 2019-10-30

Page 32: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

GPU enabled fflas-ffpack

0

500

1000

1500

2000

2500

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Speed (

Gfo

ps)

matrix dimension n

FFLAS with CUBLAS (1 GPU)FFLAS with OpenBLAS (CPU) 32 cores

FFLAS with OpenBLAS (CPU) 1 core

I Matrix product overZ/131071Z

I 32 Skylake-X cores(AVX512)

I 1 Nvidia Tesla V100 GPU

Limitations and perspectives

Bottleneck in the transfer between GPU and RAM

I deport more computations to the GPU : dedicated GPU kernels

I communication avoiding block scheduling : deep structural change

C. Pernet: WP 5 23 Luxembourg, 2019-10-30

Page 33: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

GPU enabled fflas-ffpack

0

500

1000

1500

2000

2500

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

Speed (

Gfo

ps)

matrix dimension n

FFLAS with CUBLAS (1 GPU)FFLAS with OpenBLAS (CPU) 32 cores

FFLAS with OpenBLAS (CPU) 1 core

I Matrix product overZ/131071Z

I 32 Skylake-X cores(AVX512)

I 1 Nvidia Tesla V100 GPU

Limitations and perspectives

Bottleneck in the transfer between GPU and RAM

I deport more computations to the GPU : dedicated GPU kernels

I communication avoiding block scheduling : deep structural change

C. Pernet: WP 5 23 Luxembourg, 2019-10-30

Page 34: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outline

Workpackage management

T5.1: Number theory with PARI/GP

T5.2: Group theory with GAP

T5.3: Exact linear algebra with LinBox

T5.4: Polynomial arithmetic with Singular

C. Pernet: WP 5 24 Luxembourg, 2019-10-30

Page 35: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Singular demo

Demo

C. Pernet: WP 5 25 Luxembourg, 2019-10-30

Page 36: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Main achievements

Multivariate Polynomial Multiplication in FLINT

0 10 20 30

0

10

20

30

threads

spee

du

p

Z 20349 x 20349 → 28398035

Z/pZ 20349 x 20349 → 28398035 I over Z, Q, Fp, Fpn forp < 264.

I Lex, degLex anddegGrevLex orderingsupported

I Sequential alg: improvesSingular’s

I Close to linear scaling

I Singular now relies onFLINT

C. Pernet: WP 5 26 Luxembourg, 2019-10-30

Page 37: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Main achievements

Multivariate Polynomial Multiplication in FLINT

0 10 20 30

0

10

20

30

threads

spee

du

p

Z 20349 x 20349 → 28398035

Z/pZ 20349 x 20349 → 28398035Z 53130 x 53130 → 95033335

Z/pZ 53130 x 53130 → 95033335

I over Z, Q, Fp, Fpn forp < 264.

I Lex, degLex anddegGrevLex orderingsupported

I Sequential alg: improvesSingular’s

I Close to linear scaling

I Singular now relies onFLINT

C. Pernet: WP 5 26 Luxembourg, 2019-10-30

Page 38: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Main achievements

Multivariate Polynomial GCD in FLINT

0 5 10 150

5

10

15

threads

spee

du

p

Z delivered

Z/pZ delivered I over Z, Q, Fp, Fpn forp < 264.

I Lex, degLex anddegGrevLex orderingsupported

I Sequential alg: improvesSingular’s

I Close to linear scaling

I Singular now relies onFLINT

C. Pernet: WP 5 26 Luxembourg, 2019-10-30

Page 39: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Main achievements

Multivariate Polynomial GCD in FLINT

0 5 10 150

5

10

15

threads

spee

du

p

Z delivered

Z/pZ deliveredZ improved

Z/pZ improved

I over Z, Q, Fp, Fpn forp < 264.

I Lex, degLex anddegGrevLex orderingsupported

I Sequential alg: improvesSingular’s

I Close to linear scaling

I Singular now relies onFLINT

C. Pernet: WP 5 26 Luxembourg, 2019-10-30

Page 40: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

New Perspectives and directions

Parallelizing memory intensive kernels

I Overhead of split and combine

I Only amortized for large instances

I Parallelization at a coarser grain

Data placement

I Heavy use of dynamicallocation (malloc)

I Causes performances fickle

I To be used with parsimony

I Investigate dedicated allocators

New directions

Application driven:

I More orderings: block, weighted

I Factorization: (harnessing most T5.4 contributions)

C. Pernet: WP 5 27 Luxembourg, 2019-10-30

Page 41: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

New Perspectives and directions

Parallelizing memory intensive kernels

I Overhead of split and combine

I Only amortized for large instances

I Parallelization at a coarser grain

Data placement

I Heavy use of dynamicallocation (malloc)

I Causes performances fickle

I To be used with parsimony

I Investigate dedicated allocators

New directions

Application driven:

I More orderings: block, weighted

I Factorization: (harnessing most T5.4 contributions)

C. Pernet: WP 5 27 Luxembourg, 2019-10-30

Page 42: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

New Perspectives and directions

Parallelizing memory intensive kernels

I Overhead of split and combine

I Only amortized for large instances

I Parallelization at a coarser grain

Data placement

I Heavy use of dynamicallocation (malloc)

I Causes performances fickle

I To be used with parsimony

I Investigate dedicated allocators

New directions

Application driven:

I More orderings: block, weighted

I Factorization: (harnessing most T5.4 contributions)

C. Pernet: WP 5 27 Luxembourg, 2019-10-30

Page 43: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Conclusion

Outcome of WorkPackage 5

Review 1 Review 2 Final review

T5.1 Pari/GP D5.16T5.2 GAP D5.15T5.3 LinBox D5.12 D5.14T5.4 Singular D5.6, D5.7 D5.13T5.5 MPIR D5.5, D5.7T5.6 Combinatorics D5.1 D5.11T5.7 Pythran D5.2 D5.11T5.8 SunGrid Engine D5.3

Overall

I 31 software relases

I 16 research papers in journals or conference proceedings

C. Pernet: WP 5 28 Luxembourg, 2019-10-30

Page 44: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Lessons learnt

From dedicated to general purpose HPC components:

I Early instances of HPC computer algebra: dedicated to sometarget application (breaking RSA, etc)

I Building a general purpose HPC component:I challengingI longer term sustainabilityI integration/composition of parallel components

Identifying the right place to focus efforts on

I Premature focus on embarassingly parallel codes may be an error

Risk of technology dependency

I Cilk: from success to shut-downI Interchangeability and modularity (PARI, LinBox)

C. Pernet: WP 5 29 Luxembourg, 2019-10-30

Page 45: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Perspectives

Exploiting emerging technologies:

I Non-Volatile RAM:: cheaper fat nodes,: but deeper cache hierarchy

Interractive control over the architecture at the VRE level

I threads per component, GPUs, distributed nodes

: user decision based on algorithms, not systems: slick interface of the VRE vs. compilation hurdle

Parallelism friendly portable containers

I supporting SIMD, multicores, accelerators

C. Pernet: WP 5 30 Luxembourg, 2019-10-30

Page 46: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Extra Slides

Extra Slides

C. Pernet: WP 5 31 Luxembourg, 2019-10-30

Page 47: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outsourced computing security (D5.12)

Exploratory aspect: Computation over the Cloud

Outsourcing computation on the cloud:

I trusted lightweight client computerI untrusted powerful cloud server

⇒ need for certification protocols

Multiparty computation:

I each player contribute with a share of the inputI shares must remain private

Contribution

ISSAC’17, JSC’19: Linear time certificates for LU, Det., Rank Profilematrix, etc

IWSEC’19: Secure multiparty Strassen’s algorithm

C. Pernet: WP 5 32 Luxembourg, 2019-10-30

Page 48: Workpackage 5: High Performance Mathematical Computing · 2021. 1. 6. · T5.4 Singular D5.6, D5.7D5.13 T5.5 MPIR D5.5, D5.7 T5.6 Combinatorics D5.1D5.11 T5.7 Pythran D5.2D5.11 T5.8

Outsourced computing security (D5.12)

Exploratory aspect: Computation over the Cloud

Outsourcing computation on the cloud:

I trusted lightweight client computerI untrusted powerful cloud server

⇒ need for certification protocols

Multiparty computation:

I each player contribute with a share of the inputI shares must remain private

Contribution

ISSAC’17, JSC’19: Linear time certificates for LU, Det., Rank Profilematrix, etc

IWSEC’19: Secure multiparty Strassen’s algorithm

C. Pernet: WP 5 32 Luxembourg, 2019-10-30