Challenges in Exploiting Higher Precision Floating Point Arithmetic

Nick Higham
School of Mathematics, The University of Manchester
higham@ma.man.ac.uk
http://www.ma.man.ac.uk/~higham
@nhigham

Chebfun and Beyond Conference, Oxford, September 17-19, 2012


Floating Point Number System

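Floating point number system $F \subset \mathbb{R}$:

$$y = \pm m \times \beta^{e-t}, \qquad 0 \le m \le \beta^t - 1.$$

Base $\beta$, precision $t$, exponent range $e_{\min} \le e \le e_{\max}$.

Floating point numbers are not equally spaced.

If $\beta = 2$, $t = 3$, $e_{\min} = -1$, and $e_{\max} = 3$:

[Figure: number line with ticks at 0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0 marking the floating point numbers of this system.]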
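The toy system with $\beta = 2$, $t = 3$ can be enumerated directly. The Python sketch below is an illustration only: the function name `toy_floats` is made up here, and it assumes normalized mantissas ($\beta^{t-1} \le m \le \beta^t - 1$) plus zero, which the slide does not spell out.

```python
from fractions import Fraction

def toy_floats(beta=2, t=3, emin=-1, emax=3):
    """Nonnegative numbers m * beta**(e - t) with normalized m, plus zero."""
    values = {Fraction(0)}
    for e in range(emin, emax + 1):
        for m in range(beta ** (t - 1), beta ** t):   # normalized mantissas
            values.add(Fraction(m) * Fraction(beta) ** (e - t))
    return sorted(values)

xs = toy_floats()
gaps = [b - a for a, b in zip(xs[1:], xs[2:])]       # spacing between positive neighbours
print([float(x) for x in xs])
print("smallest gap:", min(gaps), "largest gap:", max(gaps))
```

Exact rationals are used so that the spacing between neighbours can be read off without any rounding of its own.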

University of Manchester Nick Higham Higher Precision Arithmetic 2 33


Precision versus Accuracy

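Unit roundoff $u = \tfrac{1}{2}\beta^{1-t}$.

$$\begin{aligned}
fl(abc) &= ab(1+\delta_1) \cdot c(1+\delta_2), \qquad |\delta_i| \le u \\
        &= abc(1+\delta_1)(1+\delta_2) \\
        &\approx abc(1+\delta_1+\delta_2).
\end{aligned}$$

Precision $= u$. Accuracy $\approx 2u$.

Accuracy is not limited by precision.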
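The bound on $fl(abc)$ can be checked numerically. This is a small sketch, assuming IEEE double precision and operands in $[1, 2)$ so that no overflow or underflow occurs; the helper name `relative_error_of_product` is hypothetical.

```python
import random
import sys
from fractions import Fraction

U = Fraction(sys.float_info.epsilon) / 2      # unit roundoff of IEEE double, 2^-53

def relative_error_of_product(a, b, c):
    """Exact relative error of the double precision product (a*b)*c."""
    computed = (a * b) * c                            # two rounded multiplications
    exact = Fraction(a) * Fraction(b) * Fraction(c)   # exact rational product
    return abs(Fraction(computed) - exact) / abs(exact)

random.seed(0)
worst = max(
    relative_error_of_product(*(random.uniform(1, 2) for _ in range(3)))
    for _ in range(1000)
)
print(float(worst / U))    # worst observed error as a multiple of u
```

The observed error is reported in units of $u$; the two rounding errors give the rigorous bound $2u + u^2$.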


Walkers' Trouser Review

• PARAMO Torres Trousers, £105
Waterproof insulated trousers to be worn as an outer layer or next to the skin. I understand the concept but the execution is unusual. The waist is wide relative to thighs and the cut is narrow around backside and crotch (despite a gusset). There is only [...] it. The waist connects like a nappy: adjust the static integral webbing belt via a large quick-release buckle (which can clash with a rucksack or camera bag hipbelt), then fold a fabric flap up to the belt and secure with two Velcro tabs. This leaves an air gap on each side between waist and the top of the leg zip. Knees articulate, but when the leg is raised above step height the fabric is restrictive across the thigh and backside. Full length side zips (legs cannot be easily shortened) allow rapid zipping on or off, useful if they are worn as a super-warm overtrouser. Paramo say Torres represent a practical survival aid, but as a next-to-skin trouser they feel compromised by the cut, and an insulated overtrouser has a limited market.

THE LOWDOWN
Fabric: Nikwax Analogy Insulator (polyester microfibre outer, 100g polyester fill). Sizes: XS-XL (unisex). Inside leg: 79cm only. Waist: integral belt, front flap with Velcro tabs.
Paramo, 01892 786444, www.paramo.co.uk

• THE NORTH FACE Insulated Trekker Pant, £65
Heading to the Antarctic? Pack a pair of these. Inside the stretch nylon exterior is a quilted taffeta lining: it's like being wrapped in a duvet. The only hitch is that outside cold, dry conditions they're often too warm. There are no leg vents and no water repellency, although the insulation stops moisture (mist, not rain) seeping through. Breathability is good for such a warm garment and they are super-comfortable in chilly weather. A choice of leg lengths is available and the plain hem is easily adjusted. The waist is generous, with belt loops and a static drawcord to keep them in place, although the drawcord isn't very effective against the weight and bulk of the trousers. Pockets are odd: the zipped side pockets are only just big enough for my (small) hands and I'm still searching for a use for the zipped pocket behind the left thigh. I found these pants too warm for British hill walking but appreciated them in cold, dry weather in the French Alps.

THE LOWDOWN
Fabric: 90% nylon, 10% elastane, polyester insulation. Sizes: Men 30-38, Women 8-16. Inside leg: Men Regular 80-83cm, Long 85-88cm; Women Short 71-75cm, Regular 76-80cm, Long 81-85cm. All size graduated. Waist: zip, 2 press studs, belt loops, static drawcord. Pockets: 2 zipped front, 1 zipped rear, 1 zipped back thigh.
The North Face, 01539 822155, www.thenorthface.com/eu

Source: Outdoor Photography, January 2011, p. 75.

RGB to XYZ

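From CIE Standard (1931):

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.49 & 0.31 & 0.20 \\
0.17697 & 0.81240 & 0.01063 \\
0 & 0.01 & 0.99
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

But in many books:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix}
0.49000 & 0.31000 & 0.20000 \\
0.17697 & 0.81240 & 0.01063 \\
0 & 0.01000 & 0.99000
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}$$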


IEEE Standard 754-1985

Binary: $\beta = 2$.

Type      Size      Range            $u = 2^{-t}$
single    32 bits   $10^{\pm 38}$    $2^{-24} \approx 6.0 \times 10^{-8}$
double    64 bits   $10^{\pm 308}$   $2^{-53} \approx 1.1 \times 10^{-16}$

Arithmetic ops (+, −, ∗, /, √) performed as if first calculated to infinite precision, then rounded.
Default: round to nearest, round to even in case of tie.
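The default rounding rule can be observed from any language that exposes IEEE doubles. The following Python sketch assumes the platform `float` is IEEE double precision in round-to-nearest mode (true of mainstream CPython builds); the variable names are illustrative.

```python
import sys

u = sys.float_info.epsilon / 2      # unit roundoff 2^-53 for IEEE double

# 1 + u lies exactly halfway between 1 and 1 + 2u. The tie goes to the
# neighbour whose last significand bit is even, which is 1.
tie_down = 1.0 + u

# (1 + 2u) + u lies halfway between 1 + 2u (odd last bit) and 1 + 4u (even
# last bit), so this tie is rounded upwards.
tie_up = (1.0 + 2 * u) + u

print(tie_down == 1.0, tie_up == 1.0 + 4 * u)
```

Both ties are resolved to the even neighbour, once downwards and once upwards, which is what keeps the rule free of systematic drift.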

Kahan on Higher Precision

"For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format. That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

Source: Computer Benchmarks Versus Accuracy (1994)

IEEE Standard 754-2008

Type        Size       Range             $u = 2^{-t}$
single      32 bits    $10^{\pm 38}$     $2^{-24} \approx 6.0 \times 10^{-8}$
double      64 bits    $10^{\pm 308}$    $2^{-53} \approx 1.1 \times 10^{-16}$
quadruple   128 bits   $10^{\pm 4932}$   $2^{-113} \approx 9.6 \times 10^{-35}$

Extended and Mixed Precision BLAS (2002)

• Part of new standard developed by BLAS Technical Forum.
• Extra precision used internally; input and output arguments remain single or double.
• Extra precision is anything at least 1.5 times as accurate as double precision and wider than 80 bits.
• Counterparts of selected level 1, 2, and 3 BLAS routines. Extra input argument specifies precision of internal computations.
• Reference implementation uses double-double format, giving extra precision of about 106 bits.
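To give a feel for the double-double format, here is a minimal Python sketch built on Knuth's error-free TwoSum. It is not the reference XBLAS code: the function names `two_sum` and `dd_add` are hypothetical, and `dd_add` is the simplest ("sloppy") variant of double-double addition.

```python
from fractions import Fraction

def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and s + err == a + b exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def dd_add(x, y):
    """Add two double-double numbers, each stored as a (hi, lo) pair of doubles."""
    s, e = two_sum(x[0], y[0])      # exact sum of the leading parts
    e += x[1] + y[1]                # fold in the trailing parts
    return two_sum(s, e)            # renormalize so that |lo| is small relative to hi

tiny = 2.0 ** -80
print(1.0 + tiny == 1.0)                       # the term is lost in plain double
hi, lo = dd_add((1.0, 0.0), (tiny, 0.0))
print(Fraction(hi) + Fraction(lo) == 1 + Fraction(1, 2 ** 80))
```

The pair (hi, lo) carries the term that a single double drops, which is where the roughly 106 bits come from.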

Need for Higher Precision

• Bailey, Simon, Barton & Fouts, Floating Point Arithmetic in Future Supercomputers, Internat. J. Supercomputer Appl. 3, 86–90, 1989.
• Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics and Dynamics, Appl. Math. Comput. 218, 10106–10121, 2012.

• Long-time simulations.
• Resolving small-scale phenomena.
• Large-scale simulations.

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algs precision-independent?

Direct Matrix Factorizations

• Threshold pivoting for sparse matrices may depend on precision.
• Condition estimation to detect nearly singular matrices:

>> A = gallery('lotkin',14); x = A\randn(14,1);
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 2.761711e-18.

• Iterative refinement.
• Equilibration.

But all "rely only on LAPACK DLAMCH".
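Why a singularity warning is precision dependent can be sketched in a few lines of Python. The threshold rule below (flag the matrix when the reciprocal condition estimate falls below the unit roundoff) is an assumption made for illustration, not a statement of what MATLAB or LAPACK actually test, and the function name is hypothetical.

```python
import sys

U_DOUBLE = sys.float_info.epsilon / 2    # 2^-53, about 1.1e-16
U_QUAD = 2.0 ** -113                     # about 9.6e-35 (IEEE quadruple)

def flags_near_singularity(rcond, u):
    """Assumed rule: flag the matrix when rcond drops below the unit roundoff u."""
    return rcond < u

rcond = 2.761711e-18                     # value reported in the warning for the Lotkin matrix
print(flags_near_singularity(rcond, U_DOUBLE))   # flagged at double precision
print(flags_near_singularity(rcond, U_QUAD))     # not flagged at quadruple precision
```

If the threshold is obtained from a machine-parameter query such as DLAMCH, as opposed to a hard-coded constant, the same logic carries over to a new precision unchanged.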

Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

• Padé degree chosen to achieve accuracy $u$.
• Padé coeffs and algorithm parameters need rederiving for a different $u$. Logic may change.
• expm, logm need changing for smaller $u$.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.

Scalar elementary functions.
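How the required approximation degree depends on $u$ can be illustrated with a simpler stand-in. The sketch below uses a truncated Taylor series for the scalar exponential, not a Padé approximant, and picks the smallest degree whose first neglected term is below $u$ for arguments of size at most $\theta$; the names and the choice $\theta = 1/2$ are assumptions made for the example.

```python
from fractions import Fraction
from math import factorial

def min_taylor_degree(theta, u):
    """Smallest m such that theta**(m+1) / (m+1)! <= u."""
    m = 0
    while theta ** (m + 1) / factorial(m + 1) > u:
        m += 1
    return m

theta = Fraction(1, 2)
degrees = {
    name: min_taylor_degree(theta, Fraction(1, 2 ** t))
    for name, t in [("single", 24), ("double", 53), ("quadruple", 113)]
}
print(degrees)
```

The degree has to be recomputed for each target precision, which mirrors the point that constants tuned for double precision cannot simply be reused.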

Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration: Nakatsukasa, Bai & Gygi (2010).

$$X_0 = A/\alpha \in \mathbb{C}^{m \times n} \quad (m \ge n),$$

$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad
X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

• Cubically convergent with $\lim_{k \to \infty} X_k = U$, where $A = UH$ is a polar decomposition.
• Forms basis of new spectral divide & conquer algs for symmetric eigenproblem and SVD (H & Nakatsukasa 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

$u$             $10^{-16}$   $10^{-32}$   $10^{-64}$   $10^{-128}$
Scaled Newton   9            11           13           15
QDWH            6            7            8            9

Increasing the precision makes little difference to the cost in flops.

Derivative Approximations

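Fréchet derivative of $f: \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$:

$$L_f(A, E) \approx \frac{f(A + hE) - f(A)}{h} \qquad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( \dfrac{u \|f(A)\|}{\|E\|^2} \right)^{1/2}$.

If $f: \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ and $A, E \in \mathbb{R}^{n \times n}$ (Al-Mohy & H 2010):

$$L_f(A, E) \approx \operatorname{Im} \frac{f(A + ihE)}{h} \qquad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g., $10^{-100}$.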
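The contrast between the two approximations is already visible for a scalar function. The Python sketch below differentiates $\exp$ at $x = 1$ as a scalar stand-in for the matrix case; the step $h = \sqrt{u}$ for the forward difference is a conventional choice, not taken from the slide, and the function names are hypothetical.

```python
import cmath
import math

def forward_difference(f, x, h):
    """(f(x + h) - f(x)) / h, subject to cancellation for small h."""
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    """Im f(x + ih) / h, which involves no subtraction."""
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.exp(x)                  # d/dx exp(x) = exp(x)
u = 2.0 ** -53

fd = forward_difference(math.exp, x, math.sqrt(u))
cs = complex_step(cmath.exp, x, 1e-100)

fd_err = abs(fd - exact) / exact
cs_err = abs(cs - exact) / exact
print(fd_err, cs_err)
print(forward_difference(math.exp, x, 1e-100))   # h is below the spacing of doubles near 1
```

With $h = 10^{-100}$ the forward difference returns zero because $x + h$ rounds to $x$, whereas the complex step keeps $h$ in the imaginary part, where it is not absorbed.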


Mixed Precision

Even if we have higher precision, we might not need to use it everywhere.

• Iterative refinement for $Ax = b$.
• Iterative refinement for eigenvalue problems.
• Polynomial root finding (Boyd, this workshop).
• Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
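A minimal Python sketch of mixed precision iterative refinement for $Ax = b$ follows. It solves in double precision and uses exact rational arithmetic as a stand-in for the higher precision in the residual; the function names, the fixed number of steps, and the repeated elimination (a real code would reuse the LU factors) are simplifications made for illustration.

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting, all in double precision."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def residual(A, b, x):
    """r = b - A x accumulated exactly, then rounded once to double."""
    return [
        float(Fraction(bi) - sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)))
        for row, bi in zip(A, b)
    ]

def refine(A, b, steps=3):
    x = solve(A, b)
    for _ in range(steps):
        d = solve(A, residual(A, b, x))    # correction from the accurate residual
        x = [xi + di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]                       # chosen so that the solution is (1, 2, 3)
print(refine(A, b))
```

Only the residual needs the extra precision; the solves stay in the working precision.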

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref  Settings for Chebfun.

   eps - Relative tolerance used in construction and
         subsequent operations.
         Factory value is 2^-52 (Matlab's factory
         value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g., integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;
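The hard-coded 1e-15 in such tests is tied to double precision. As an illustration of the alternative, the Python sketch below uses the standard `decimal` module as a stand-in for a variable precision arithmetic and derives the tolerance from the working precision. The test quantity $|(1/3) \cdot 3 - 1|$, the factor 10, and all names are assumptions chosen for the example, not Chebfun code.

```python
from decimal import Decimal, localcontext

def unit_roundoff(digits):
    """u = (1/2) * 10**(1 - digits) for decimal arithmetic with `digits` digits."""
    return Decimal(5) * Decimal(10) ** (-digits)

def one_third_times_three_error(digits):
    """|(1/3)*3 - 1| evaluated with `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return abs(Decimal(1) / Decimal(3) * 3 - 1)

FIXED_TOL = Decimal("1e-15")             # tolerance frozen for double precision

def passes(err, digits, factor=10):
    """Precision-aware check: the error must be a small multiple of u."""
    return err < factor * unit_roundoff(digits)

err20 = one_third_times_three_error(20)  # result of a 20-digit computation
err50 = one_third_times_three_error(50)  # result of a 50-digit computation
print(err20 < FIXED_TOL, passes(err20, 50), passes(err50, 50))
```

The fixed tolerance accepts the 20-digit result even when 50 digits are expected, while the tolerance scaled by $u$ rejects it.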

Unstable Algorithms in Higher Precision

For an unstable alg where error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n \times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization":

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d); tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).


Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem
If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem
Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad
y = \begin{bmatrix} 1 \\ 2 \times 10^{-22} \end{bmatrix}, \qquad
\theta_\infty(x, y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
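Both statements can be checked in IEEE double precision ($u = 2^{-53}$). The Python sketch below reproduces the two-vector example and exhibits a pair of scalars attaining the lower bound; the function name `theta_inf` is hypothetical, and `math.nextafter` requires Python 3.9 or later.

```python
import math

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

u = 2.0 ** -53

# The vector example: the largest components agree exactly, so the
# normwise error only sees the difference in the tiny components.
x = [1.0, 1e-22]
y = [1.0, 2e-22]
theta = theta_inf(x, y)
print(theta, theta < u)

# Attainability for scalars: 2 and its predecessor differ by 2^-52,
# so the relative difference is exactly u.
pred = math.nextafter(2.0, 0.0)
gap = (2.0 - pred) / 2.0
print(gap == u)
```

A normwise error far below $u$ therefore signals that the dominant components are identical, not that the computation was carried out in higher precision.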


Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

 t    y
20    262537412640768744.00
25    262537412640768744.0000000
30    262537412640768743.999999999999

Is the last digit before the decimal point 4?

 t    y
35    262537412640768743.99999999999925007
40    262537412640768743.9999999999992500725972

So no, it's 3.
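The experiment can be repeated with Python's standard `decimal` module. This is a sketch under assumptions: the slide does not say how its values were computed, so the function `ramanujan`, the use of 10 guard digits, and rounding the result to $t$ digits at the end are choices made here. The series for $\pi$ is the recipe given in the `decimal` documentation.

```python
from decimal import Decimal, getcontext, localcontext

def pi_decimal():
    """pi to the current context precision (series from the decimal module docs)."""
    getcontext().prec += 2                 # guard digits for the summation
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                              # round back to the context precision

def ramanujan(t, guard=10):
    """exp(pi * sqrt(163)) rounded to t significant digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard               # work with guard digits (an assumption)
        y = (pi_decimal() * Decimal(163).sqrt()).exp()
        ctx.prec = t
        return +y                          # round to t digits

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

Guard digits matter because the exponential magnifies the relative error in its argument by roughly $\pi\sqrt{163} \approx 40$.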


Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a \sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error (log scale, $10^{-14}$ to $10^{-4}$) plotted against $t$ from 10 to 40.]
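An experiment of this kind can be simulated by rounding every intermediate result to $t$ bits. The Python sketch below is a rough emulation under stated assumptions: the helper `fl` rounds a double to $t$ significant bits, $\sin$ is evaluated in double precision before rounding, and the reference value is the plain double precision result, so the errors it reports are only meaningful for $t$ well below 53.

```python
import math

def fl(x, t):
    """Round x to t significant bits (round to nearest, ties to even)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                   # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0 ** t), e - t)

def y_in_precision(t, a=1e-8, b=2.0 ** 24):
    """Evaluate x + a*sin(b*x) at x = 1/7 with every result rounded to t bits."""
    x = fl(1.0 / 7.0, t)
    s = fl(math.sin(fl(b * x, t)), t)      # sin itself is computed in double
    return fl(x + fl(a * s, t), t)

# Double precision reference value (an assumption of this sketch).
y_ref = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)

errors = {t: abs(y_in_precision(t) - y_ref) / abs(y_ref) for t in range(10, 41)}
for t, err in errors.items():
    print(t, err)
```

Printing the errors for successive $t$ shows whether they decrease steadily as bits are added.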

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

• An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
• Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
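The accumulation of rounding errors over successive edits can be made concrete with a toy experiment. The Python sketch below applies a darkening curve and then its inverse, rounding to integers after each step, first on 8-bit levels and then on 16-bit levels. The choice of curve (squaring, then square root), the promotion rule (multiply by 257), and the names are assumptions for illustration; the count of changed levels says nothing about visible quality.

```python
def darken_then_restore(levels):
    """Return a function applying v -> v^2 and then its inverse on the scale 0..levels,
    rounding to an integer after each of the two edits."""
    def roundtrip(v):
        dark = round(levels * (v / levels) ** 2)
        return round(levels * (dark / levels) ** 0.5)
    return roundtrip

rt8 = darken_then_restore(255)
rt16 = darken_then_restore(65535)

# 8-bit workflow: both edits are rounded to integers in 0..255.
lost_8bit = sum(rt8(a) != a for a in range(256))

# 16-bit workflow: promote a to 257*a, edit in 0..65535, then demote by rounding.
lost_16bit = sum(round(rt16(257 * a) / 257) != a for a in range(256))

print(lost_8bit, lost_16bit)
```

In this toy setting the 8-bit round trip changes some levels, starting with the darkest ones, while the 16-bit round trip returns every level unchanged.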


16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit:    $0 : 1 : 255$
16-bit:   $0 : 3.9 \times 10^{-3} : 255$

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.


High Precision in MATLAB

• VPA arithmetic in Symbolic Math Toolbox.
• Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m


Conclusions & Future Directions

• Need to prepare for quad precision in hardware.
• Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?
• High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. xviii+366 pp. ISBN 0-321-35678-0.

  • Appendix

Floating Point Number System

Floating point number system F sub R

y = plusmnm times βeminust 0 le m le βt minus 1

Base βprecision t exponent range emin le e le emax

Floating point numbers are not equally spaced

If β = 2 t = 3 emin = minus1 and emax = 3

0 05 10 20 30 40 50 60 70

University of Manchester Nick Higham Higher Precision Arithmetic 2 33

Floating Point Number System

Floating point number system F sub R

y = plusmnm times βeminust 0 le m le βt minus 1

Base βprecision t exponent range emin le e le emax

Floating point numbers are not equally spaced

If β = 2 t = 3 emin = minus1 and emax = 3

0 05 10 20 30 40 50 60 70

University of Manchester Nick Higham Higher Precision Arithmetic 2 33

Precision versus Accuracy

Unit roundoff u = 12β

1minust

fl(abc) = ab(1 + δ1) middot c(1 + δ2) |δi | le u= abc(1 + δ1)(1 + δ2)

asymp abc(1 + δ1 + δ2)

Precision = uAccuracy asymp 2u

Accuracy is not limited by precision

University of Manchester Nick Higham Higher Precision Arithmetic 3 33

Precision versus Accuracy

Unit roundoff u = 12β

1minust

fl(abc) = ab(1 + δ1) middot c(1 + δ2) |δi | le u= abc(1 + δ1)(1 + δ2)

asymp abc(1 + δ1 + δ2)

Precision = uAccuracy asymp 2u

Accuracy is not limited by precision

University of Manchester Nick Higham Higher Precision Arithmetic 3 33

Walkersrsquo Trouser Review

bull pARAMO Torres Trousers pound105 Waterproof insulated trousers to be worn as

an outer layer or next to the skin I understand the concept but the execution is unusual The waist is

bull wide relative to thighs and the cut is narrow around backside and crotch (despi te a gusset) There is only

it The wa ist connects like a nappy adjust the stati c integral webbing belt via a large quick-release buckle (which can clash with a rucksack or camera bag hipbelt) then

fold a fabric fla p up to the belt and secure with two Ve lcro tabs This leaves an air gap on each side between

waist and the top of the leg zip Knees articulate but when the leg is raised above step height the fabric is restrictive across the thigh and backside Full length side zips (legs cannot be easily shortened) allow rapid zipping on or off - useful if they are worn as a supershywarm overtrouser Paramo say Torres represent a practical su rviva l aid but as a next-to-skin trouser they fee l compromised by the cut and an insulated overt rouser has a limited market

THE LOWDOWN

Fabric Nikwax Analogy Insulator (polyester microfibre outer 100g polyester fill) Sizes XS-XL (unisex) Inside leg 79cm only Waist integral belt front flap with Velcro tabs

IHtIrnl Paramo 01892 786444 wwwparamo_couk

Cit THE NORTH FACE Insulated Trekker Pant Heading to the Antarctic Pack a pair of

these Inside the stretch nylon exterior is a quilted taffeta lining its li ke being wrapped in a duvet The on ly hitch is that outside co ld dry cond it ions theyre often too warm There are no leg vents and no water repellency although the insulation stops moisture (mist not rain) seeping through Breathability is good for such a warm garment and they are super-comfortable in chilly weather A choice of leg lengths is available and the plain hem is easily adjusted The waist is generous with belt loops and a static drawcord to keep them in place although the drawcord isnt very effective against the weight and bulk of the trousers Pockets are odd the zipped side pockets are on ly just big enough for my (small) hands and Im sti ll searching for a use for the zipped pocket behind the left thigh I found these pants too warm for British hill walking but appreciated them in cold dry weather in the French Alps

THE LOWDOWN Fabric 90 nylon 10 elastane polyester insulation Sizes Men 30-38 Women 8-16

IHtImU The North Face 01539822155 wwwthenorthfacecomeu

Inside leg Men Regular 80-83cm Long 85-88cm Women Short 71-75cm Regular 76-80cm Long 81-85cm_ All size graduated Waist zip 2 press studs belt loops static drawcord Pockets 2 zipped front 1 zipped rear 1 zipped back thigh

pound65

January 2011 Outdoor Photography 75

University of Manchester Nick Higham Higher Precision Arithmetic 5 33

RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksXYZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksX

YZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

IEEE Standard 754-1985

Binary β = 2

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

Arithmetic ops (+minus lowast radic) performed as if firstcalculated to infinite precision then roundedDefault round to nearest round to even in case of tie

University of Manchester Nick Higham Higher Precision Arithmetic 7 33

Kahan on Higher Precision

For now the 10-byte Extended format is atolerable compromise between

the value of extra-precise arithmetic and the price ofimplementing it to run fast

very soon two more bytes of precision will become tolerableand ultimately a 16-byte format

That kind of gradual evolution towards wider precision

was already in view whenIEEE Standard 754 for Floating-Point Arithmetic was framed

mdash Computer Benchmarks Versus Accuracy (1994)

University of Manchester Nick Higham Higher Precision Arithmetic 8 33

IEEE Standard 754-2008

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

quadruple 128 bits 10plusmn4932 2minus113 asymp 96times 10minus35

University of Manchester Nick Higham Higher Precision Arithmetic 9 33

Extended and Mixed Precision BLAS (2002)

Part of new standard developed by BLAS TechnicalForumExtra precision used internally input and outputarguments remain single or doubleExtra precision is anything at least 15 times asaccurate as double precision and wider than 80 bitsCounterparts of selected level 1 2 and 3 BLASroutines Extra input argument specifies precision ofinternal computationsReference implementation uses double-double formatgiving extra precision of about 106 bits

University of Manchester Nick Higham Higher Precision Arithmetic 10 33

Need for Higher Precision

Bailey Simon Barton amp Fouts Floating PointArithmetic in Future Supercomputers Internat JSupercomputer Appl 3 86ndash90 1989Bailey Barrio amp Borwein High-Precision ComputationMathematical Physics and Dynamics Appl MathComput 218 10106ndash10121 2012

Long-time simulationsResolving small-scale phenomenaLarge-scale simulations

University of Manchester Nick Higham Higher Precision Arithmetic 11 33

Going to Higher Precision

If we have quadruple or higher precision what do we needto do to modify existing algorithms

To what extent are existing algs precision-independent

University of Manchester Nick Higham Higher Precision Arithmetic 13 33

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend onprecisionCondition estimation to detect nearly singular matrices

gtgt A = gallery(rsquolotkinrsquo14) x = Arandn(n1)Warning Matrix is close to singular or badly scaledResults may be inaccurate RCOND = 2761711e-18

Iterative refinementEquilibration

But all ldquorely only on LAPACK DLAMCHrdquo

University of Manchester Nick Higham Higher Precision Arithmetic 14 33

Matrix Functions

(Inverse) scaling and squaring-type algorithms for eAlog(A) cos(A) At use Padeacute approximants

Padeacute degree chosen to achieve accuracy uPadeacute coeffs and algorithm parameters need rederivingfor a different u Logic may changeexpm logm need changing for smaller u

Methods based on best Linfin approximations to eA forHermitian A also need higher order approximationsderiving

Scalar elementary functions

University of Manchester Nick Higham Higher Precision Arithmetic 15 33

Iterative Algorithms

QR-based Dynamically Weighted Halley IterationNakatsukasa Bai amp Gygi (2010)X0 = Aα isin Cmtimesn (m ge n)[radic

ck Xk

I

]=

[Q1

Q2

]R

Xk+1 =bk

ckXk +

1radicck

(ak minus

bk

ck

)Q1Qlowast2

Cubically convergent with limkrarrinfin Xk = U whereA = UH is a polar decompositionForms basis of new spectral divide amp conquer algs forsymm eirsquoproblem and SVD (H amp Nakatsukasa 2012)

University of Manchester Nick Higham Higher Precision Arithmetic 16 33

Maximum Number of Iterations

If κ2(A) le uminus1

u 10minus16 10minus32 10minus64 10minus128

Scaled Newton 9 11 13 15QDWH 6 7 8 9

Increasing the precision makes little difference to thecost in flops

University of Manchester Nick Higham Higher Precision Arithmetic 17 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision we might not need to useit everywhere

Iterative refinement for Ax = bIterative refinement for ersquovalue problemsPolynomial root finding (Boyd this workshop)Petschow Quintana-Ortiacute amp Bientinesi (2012) usingquad precision to improve orthogonality of doubleprecision MRRR

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref  Settings for Chebfun

 eps - Relative tolerance used in construction and subsequent operations.
       Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;
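
The hard-coded 1e-15 thresholds in the test above are meaningful only near double precision. One possible remedy, sketched here in Python's `decimal` module rather than in Chebfun, is to express each tolerance as a small multiple of the unit roundoff of the working precision. The factor 10 and the helper names are assumptions made for illustration.

```python
from decimal import Decimal, getcontext

def unit_roundoff(prec):
    """u = (1/2) * 10**(1 - prec) for decimal arithmetic with prec significant digits."""
    return Decimal(5) * Decimal(10) ** (-prec)

def roundtrip_error(prec):
    """|sqrt(2)**2 - 2| evaluated with prec significant digits."""
    getcontext().prec = prec
    s = Decimal(2).sqrt()
    return abs(s * s - 2)

passed = {}
for prec in (16, 34, 100):
    tol = 10 * unit_roundoff(prec)        # tolerance scales with the working precision
    passed[prec] = roundtrip_error(prec) <= tol
print(passed)
```

With a tolerance of this form the same test stays equally strict at every precision, whereas a literal 1e-15 becomes a very weak test once the arithmetic carries 34 or 100 digits.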

Unstable Algorithms in Higher Precision

For unstable algs where error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization":

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa( rand(n) ));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress)
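
The 10^{-d/2} behaviour can be seen on the smallest defective example, the 2×2 Jordan block A = [1 1; 0 1], whose exponential is e·[1 1; 0 1]. The sketch below is an illustration under stated assumptions, not the method of the talk: the perturbation is a fixed entry tol = 10^{-d/2} in the (2,1) position rather than a random matrix, so that the eigendecomposition is available in closed form (eigenvalues 1 ± s with s = sqrt(tol), eigenvectors [1, ±s]). The function name is hypothetical.

```python
from decimal import Decimal, getcontext

def exp_via_perturbed_diagonalization(d):
    """Relative error of exp(A) computed by diagonalizing A + tol*e2*e1^T in d digits."""
    getcontext().prec = d
    tol = Decimal(10) ** (-(d // 2))
    s = tol.sqrt()
    f1, f2 = (1 + s).exp(), (1 - s).exp()          # f applied to the eigenvalues 1 + s, 1 - s
    # F = V * diag(f1, f2) * inv(V), with V = [1 1; s -s], written out entrywise.
    F = [[(f1 + f2) / 2, (f1 - f2) / (2 * s)],
         [s * (f1 - f2) / 2, (f1 + f2) / 2]]
    e = Decimal(1).exp()
    exact = [[e, e], [Decimal(0), e]]
    err = max(abs(F[i][j] - exact[i][j]) for i in range(2) for j in range(2))
    return err / e

rel_errs = {d: exp_via_perturbed_diagonalization(d) for d in (16, 32, 64)}
for d, r in rel_errs.items():
    print(d, r)
```

Doubling the number of digits d doubles the number of correct digits in the result, which is why the approach becomes attractive once higher precision is cheap.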

Tiny Relative Errors

Normwise relative errors

$\theta_\infty(x, y) = \dfrac{\|x - y\|_\infty}{\|x\|_\infty}$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011)

Theorem

If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}$, $y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}$, $\theta_\infty(x, y) = 10^{-22}$.

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
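
Both statements are easy to check in IEEE double precision (t = 53). The snippet below is a small illustration written for this transcript, with a hypothetical helper `theta_inf`; it reproduces the example vectors and shows that the lower bound u of the first theorem is attained by x = 1 and its predecessor.

```python
def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

u = 2.0 ** -53                      # unit roundoff of IEEE double

# Example from the slide: the vectors differ, yet theta is far below u
# because the largest components agree exactly.
x = [1.0, 1e-22]
y = [1.0, 2e-22]
theta = theta_inf(x, y)
print(theta, theta < u, x[0] == y[0])

# First theorem: for distinct scalars the relative difference is at least u,
# attained here by 1 and the next double below it.
x1, y1 = 1.0, 1.0 - u
gap = abs(x1 - y1) / abs(x1)
print(gap == u)
```

A normwise relative error below u therefore does not signal that higher precision was used; it only says that the largest components are identical.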

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at t digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it's 3.
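
The experiment can be repeated with Python's `decimal` module. This is a sketch, not the code used for the slide: `pi()` is the series recipe from the Python `decimal` documentation, and each value is computed with 10 guard digits and then rounded to t digits, which is an assumption about how the slide's values were produced.

```python
from decimal import Decimal, getcontext

def pi():
    """pi to the current context precision (recipe from the Python decimal docs)."""
    getcontext().prec += 2
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s

def ramanujan(t, guard=10):
    """exp(pi * sqrt(163)) rounded to t significant digits."""
    getcontext().prec = t + guard
    y = (pi() * Decimal(163).sqrt()).exp()
    getcontext().prec = t
    return +y                      # unary plus rounds to the current precision

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

The integer part appears to end in 4 until enough digits are carried to resolve the run of nines after the decimal point.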

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$y = x + a\sin(bx)$, $x = 1/7$, $a = 10^{-8}$, $b = 2^{24}$.

[Figure: error in the computed y plotted against t for t = 10 to 40; the error axis is logarithmic, running from 10^-14 to 10^-4.]
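
An experiment of this kind can be simulated in double precision by rounding every intermediate result to t significant bits. The sketch below is an assumption about the setup, not the code behind the figure: `chop` is a hypothetical helper, sin is evaluated in double precision before rounding, and the double precision value of y serves as the reference.

```python
import math

def chop(v, t):
    """Round v to t significant bits (round to nearest)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)                     # v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** t), e - t)

def y_at_precision(t):
    """Evaluate y = x + a*sin(b*x), rounding each operation to t bits."""
    x = chop(1.0 / 7.0, t)
    a = chop(1e-8, t)
    b = 2.0 ** 24
    s = chop(math.sin(chop(b * x, t)), t)
    return chop(x + chop(a * s, t), t)

y_ref = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)   # double precision reference
errors = {t: abs(y_at_precision(t) - y_ref) for t in range(10, 41)}
for t, err in errors.items():
    print(t, f"{err:.2e}")
```

Printing the whole table, rather than only its end points, is what shows whether each extra bit actually buys a smaller error.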

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
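
The effect of rounding after every edit can be seen by applying a curve and then undoing it. The sketch below is an illustration with assumed ingredients: a square-root curve and its inverse stand in for two editing operations, and the mapping of 8-bit levels to 16-bit levels by a factor of 257 (65535/255) is a choice made for the example.

```python
import math

def curve_roundtrip(a, maxval):
    """Apply a square-root curve, then its inverse, rounding to an integer after each op."""
    b = round(maxval * math.sqrt(a / maxval))
    return round(maxval * (b / maxval) ** 2)

# 8-bit editing: count how many of the 256 levels survive the round trip unchanged.
survivors_8 = sum(curve_roundtrip(a, 255) == a for a in range(256))

# 16-bit editing of the same levels, converted back to 8 bits only at the end.
survivors_16 = sum(round(curve_roundtrip(257 * a, 65535) / 257) == a for a in range(256))

print(survivors_8, survivors_16)
```

In 8-bit the curve merges neighbouring bright levels, so the inverse cannot separate them again; in 16-bit the intermediate rounding errors are far below one 8-bit level. Whether the lost levels are visible in practice is a separate question.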

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit:   0, 1, ..., 255
16-bit:  0, 3.9×10^-3, ..., 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double method
CMATLAB2012btoolboxsymbolicsymbolicsymeigm sym method
dmatlabMultiprecision_Computing_Toolboxmpm mp method
CMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray method
dmatlabchebfunlinopeigm linop method
dmatlabchebfunchebopeigm chebop method

>> which -all qr
built-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double method
dmatlabMultiprecision_Computing_Toolboxmpm mp method
CMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray method
dmatlabchebfunchebfunqrm chebfun method

>> which -all toeplitz
CMATLAB2012btoolboxmatlabelmattoeplitzm

Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?
High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Advanpix LLC. Multiprecision Computing Toolbox for MATLAB. Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Precision versus Accuracy

Unit roundoff $u = \tfrac{1}{2}\beta^{1-t}$.

$$fl(abc) = ab(1+\delta_1)\cdot c(1+\delta_2), \qquad |\delta_i| \le u,$$
$$= abc(1+\delta_1)(1+\delta_2)$$
$$\approx abc(1+\delta_1+\delta_2).$$

Precision = $u$. Accuracy $\approx 2u$.

Accuracy is not limited by precision.
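The bound on $fl(abc)$ can be checked numerically. The following is a minimal Python sketch (Python is used here only for illustration; the talk's own snippets are MATLAB). It assumes IEEE double precision, so $u = 2^{-53}$, and the values of `a`, `b`, `c` are arbitrary choices, not taken from the talk. Exact rational arithmetic from the standard `fractions` module supplies the reference product.

```python
from fractions import Fraction

u = Fraction(1, 2**53)          # unit roundoff of IEEE double precision

a, b, c = 0.1, 0.2, 0.3         # arbitrary doubles, far from overflow/underflow
computed = (a * b) * c          # two rounded multiplications
exact = Fraction(a) * Fraction(b) * Fraction(c)   # exact product of the stored doubles

rel_err = abs(Fraction(computed) - exact) / exact
bound = (1 + u) ** 2 - 1        # |(1 + d1)(1 + d2) - 1| <= 2u + u^2

print(float(rel_err), float(bound))
```

The observed relative error may be well below the bound; the bound of about $2u$ is a worst case over the two rounding errors.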

Walkers' Trouser Review

• PARAMO Torres Trousers, £105

Waterproof insulated trousers to be worn as an outer layer or next to the skin. I understand the concept but the execution is unusual. The waist is wide relative to thighs and the cut is narrow around backside and crotch (despite a gusset). There is only [...] it. The waist connects like a nappy: adjust the static integral webbing belt via a large quick-release buckle (which can clash with a rucksack or camera bag hipbelt), then fold a fabric flap up to the belt and secure with two Velcro tabs. This leaves an air gap on each side between waist and the top of the leg zip. Knees articulate, but when the leg is raised above step height the fabric is restrictive across the thigh and backside. Full length side zips (legs cannot be easily shortened) allow rapid zipping on or off, useful if they are worn as a super-warm overtrouser. Paramo say Torres represent a practical survival aid, but as a next-to-skin trouser they feel compromised by the cut, and an insulated overtrouser has a limited market.

THE LOWDOWN
Fabric: Nikwax Analogy Insulator (polyester microfibre outer, 100g polyester fill). Sizes: XS-XL (unisex). Inside leg: 79cm only. Waist: integral belt, front flap with Velcro tabs.
Paramo, 01892 786444, www.paramo.co.uk

• THE NORTH FACE Insulated Trekker Pant, £65

Heading to the Antarctic? Pack a pair of these. Inside the stretch nylon exterior is a quilted taffeta lining: it's like being wrapped in a duvet. The only hitch is that outside cold, dry conditions they're often too warm. There are no leg vents and no water repellency, although the insulation stops moisture (mist, not rain) seeping through. Breathability is good for such a warm garment and they are super-comfortable in chilly weather. A choice of leg lengths is available and the plain hem is easily adjusted. The waist is generous, with belt loops and a static drawcord to keep them in place, although the drawcord isn't very effective against the weight and bulk of the trousers. Pockets are odd: the zipped side pockets are only just big enough for my (small) hands and I'm still searching for a use for the zipped pocket behind the left thigh. I found these pants too warm for British hill walking but appreciated them in cold, dry weather in the French Alps.

THE LOWDOWN
Fabric: 90% nylon, 10% elastane; polyester insulation. Sizes: Men 30-38, Women 8-16. Inside leg: Men Regular 80-83cm, Long 85-88cm; Women Short 71-75cm, Regular 76-80cm, Long 81-85cm. All size graduated. Waist: zip, 2 press studs, belt loops, static drawcord. Pockets: 2 zipped front, 1 zipped rear, 1 zipped back thigh.
The North Face, 01539 822155, www.thenorthface.com/eu

Outdoor Photography, January 2011, p. 75

RGB to XYZ

From CIE Standard (1931):

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01 & 0.99 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

But in many books:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49000 & 0.31000 & 0.20000 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01000 & 0.99000 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

IEEE Standard 754-1985

Binary: $\beta = 2$.

Type      Size      Range            $u = 2^{-t}$
single    32 bits   $10^{\pm 38}$    $2^{-24} \approx 6.0 \times 10^{-8}$
double    64 bits   $10^{\pm 308}$   $2^{-53} \approx 1.1 \times 10^{-16}$

• Arithmetic ops ($+$, $-$, $*$, $/$, $\sqrt{\ }$) performed as if first calculated to infinite precision, then rounded.
• Default: round to nearest, round to even in case of tie.

Kahan on Higher Precision

"For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

From Computer Benchmarks Versus Accuracy (1994).

IEEE Standard 754-2008

Type        Size       Range             $u = 2^{-t}$
single      32 bits    $10^{\pm 38}$     $2^{-24} \approx 6.0 \times 10^{-8}$
double      64 bits    $10^{\pm 308}$    $2^{-53} \approx 1.1 \times 10^{-16}$
quadruple   128 bits   $10^{\pm 4932}$   $2^{-113} \approx 9.6 \times 10^{-35}$
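The $u$ column of the table can be reproduced in a few lines. This is a small Python sketch for illustration (the talk itself uses MATLAB); the dictionary `formats` is a name introduced here, holding the significand precisions $t$ of the three binary formats.

```python
import math

# Significand precision t in bits (including the implicit leading bit).
formats = {"single": 24, "double": 53, "quadruple": 113}

for name, t in formats.items():
    u = 2.0 ** (-t)                    # unit roundoff u = 2^(-t)
    digits = t * math.log10(2)         # equivalent number of decimal digits
    print(f"{name:10s} t = {t:3d}  u = {u:.1e}  (~{digits:.1f} decimal digits)")
```

Going from double to quadruple roughly doubles the number of significant decimal digits, from about 16 to about 34.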

Extended and Mixed Precision BLAS (2002)

• Part of new standard developed by BLAS Technical Forum.
• Extra precision used internally: input and output arguments remain single or double.
• Extra precision is anything at least 1.5 times as accurate as double precision and wider than 80 bits.
• Counterparts of selected level 1, 2 and 3 BLAS routines. Extra input argument specifies precision of internal computations.
• Reference implementation uses double-double format, giving extra precision of about 106 bits.
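A double-double number is an unevaluated sum of two doubles, which is where the figure of about 106 bits comes from. The basic building block of such arithmetic is an error-free addition. The sketch below shows Knuth's TwoSum in Python; it is a generic illustration of the idea and is not taken from the reference implementation, and the function name `two_sum` is chosen here.

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e = a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

hi, lo = two_sum(1.0, 2.0 ** -60)   # 2^-60 is lost in an ordinary double addition
print(hi, lo)

# The unevaluated pair (hi, lo) represents the sum exactly.
assert Fraction(hi) + Fraction(lo) == Fraction(1.0) + Fraction(2.0 ** -60)
```

The pair `(hi, lo)` carries the information that a single double would discard, at the cost of a handful of extra double precision operations.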

Need for Higher Precision

• Bailey, Simon, Barton & Fouts, Floating Point Arithmetic in Future Supercomputers, Internat. J. Supercomputer Appl. 3, 86–90, 1989.
• Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics and Dynamics, Appl. Math. Comput. 218, 10106–10121, 2012.

• Long-time simulations.
• Resolving small-scale phenomena.
• Large-scale simulations.

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algs precision-independent?

Direct Matrix Factorizations

• Threshold pivoting for sparse matrices may depend on precision.
• Condition estimation to detect nearly singular matrices:

>> A = gallery('lotkin',14); x = A\randn(n,1);
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 2.761711e-18.

• Iterative refinement.
• Equilibration.

But all "rely only on LAPACK DLAMCH".

Matrix Functions

• (Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.
  • Padé degree chosen to achieve accuracy $u$.
  • Padé coeffs and algorithm parameters need rederiving for a different $u$. Logic may change.
  • expm, logm need changing for smaller $u$.
• Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.
• Scalar elementary functions.

Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration (QDWH), Nakatsukasa, Bai & Gygi (2010):

$X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\,X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

• Cubically convergent with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
• Forms basis of new spectral divide & conquer algs for symm e'problem and SVD (H & Nakatsukasa, 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

$u$             $10^{-16}$   $10^{-32}$   $10^{-64}$   $10^{-128}$
Scaled Newton   9            11           13           15
QDWH            6            7            8            9

Increasing the precision makes little difference to the cost in flops.

Derivative Approximations

Fréchet derivative of $f: \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$$L_f(A,E) \approx \frac{f(A+hE) - f(A)}{h} \qquad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( u\,\|f(A)\| / \|E\|^2 \right)^{1/2}$.

If $f: \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H, 2010):

$$L_f(A,E) \approx \operatorname{Im} \frac{f(A+ihE)}{h} \qquad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.
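The difference between the two approximations is easiest to see in the scalar case. The following Python sketch is an illustration only: it differentiates the scalar function $e^x$ rather than a matrix function, the helper names `forward_difference` and `complex_step` are introduced here, and the forward difference step $h = \sqrt{u}$ is the usual rule of thumb rather than the $h_{\mathrm{opt}}$ formula for matrices.

```python
import cmath
import math

def forward_difference(f, x, h):
    # (f(x + h) - f(x)) / h suffers cancellation as h shrinks.
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    # Valid when f is real on the real axis and analytic near x.
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.exp(x)                        # d/dx exp(x) = exp(x)

u = 2.0 ** -53
h_fd = math.sqrt(u)                        # roughly balances truncation and rounding error
fd = forward_difference(math.exp, x, h_fd)
cs = complex_step(cmath.exp, x, 1e-100)    # no subtraction, so no cancellation

print("forward difference relative error:", abs(fd - exact) / exact)
print("complex step relative error:      ", abs(cs - exact) / exact)
```

In double precision the forward difference typically recovers about half of the available digits, while the complex step result is accurate to nearly full precision even with $h = 10^{-100}$.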

Mixed Precision

Even if we have higher precision, we might not need to use it everywhere.

• Iterative refinement for $Ax = b$.
• Iterative refinement for e'value problems.
• Polynomial root finding (Boyd, this workshop).
• Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref  Settings for Chebfun.

 eps - Relative tolerance used in construction and subsequent operations.
       Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products, and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d);
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x,y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem. If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem. Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x,y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \qquad \theta_\infty(x,y) = 10^{-22}.$$

• Tiny relative errors can corrupt performance profiles.
• See cure in Dingle & H (2011).
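Both statements can be checked directly in IEEE double precision, where $u = 2^{-53}$. The Python sketch below is illustrative; `theta_inf` is a helper name introduced here, and `math.nextafter` requires Python 3.9 or later.

```python
import math

u = 2.0 ** -53                                # unit roundoff, IEEE double

# Scalars: the bound |x - y|/|x| >= u is attained by x = 2 and its predecessor.
x = 2.0
y = math.nextafter(x, 0.0)                    # largest double below 2
assert abs(x - y) / abs(x) == u

def theta_inf(x, y):
    """Normwise relative difference ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

# Vectors: the example from the slide.
xv = [1.0, 1e-22]
yv = [1.0, 2e-22]
print(theta_inf(xv, yv))                      # about 1e-22, far below u
```

A normwise relative difference far below $u$ is therefore possible in double precision, but only when the largest components agree exactly and the difference lies in the small components.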

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it's 3.
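The experiment can be repeated with Python's standard `decimal` module in place of the arithmetic used in the talk. This is a sketch under that assumption: the function name `ramanujan` is introduced here, `pi()` follows the recipe given in the `decimal` documentation, and the digits printed at low precision may differ from the slide's in the last places because the intermediate roundings are not identical.

```python
from decimal import Decimal, getcontext

def pi():
    """Compute pi to the current context precision (recipe from the decimal docs)."""
    getcontext().prec += 2                 # guard digits for the intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                              # unary plus rounds to the restored precision

def ramanujan(digits):
    """Evaluate exp(pi*sqrt(163)) with each operation rounded to `digits` digits."""
    getcontext().prec = digits
    return (pi() * Decimal(163).sqrt()).exp()

for digits in (20, 25, 30, 35, 40):
    print(digits, ramanujan(digits))
```

Only once the working precision comfortably exceeds 30 digits does the long run of nines after the decimal point end and reveal that the integer part ends in 3.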

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error plotted against $t$ for $t$ from 10 to 40; the vertical axis is logarithmic, from $10^{-14}$ to $10^{-4}$.]
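An experiment of this kind can be imitated by rounding every intermediate result to $t$ significant bits. The Python sketch below is a simulation under stated assumptions, not the code behind the talk's figure: the helper `fl` rounds a double to $t$ bits, each operation is first carried out in double precision (so a double rounding is possible), and the double precision value serves as the reference, which is adequate for $t \le 40$.

```python
import math

def fl(x, t):
    """Round x to t significant bits (round to nearest, ties to even)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** t), e - t)

def y_at_precision(t):
    """Evaluate y = x + a*sin(b*x) with every intermediate rounded to t bits."""
    x = fl(1.0 / 7.0, t)
    a = fl(1e-8, t)
    b = 2.0 ** 24                        # a power of two, exact at any t
    s = fl(math.sin(fl(b * x, t)), t)
    return fl(x + fl(a * s, t), t)

# Reference value in double precision (t = 53).
y_ref = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)
errors = {t: abs(y_at_precision(t) - y_ref) for t in range(10, 41)}

for t in sorted(errors):
    print(f"t = {t:2d}   error = {errors[t]:.2e}")
```

Printing the errors for successive $t$ lets one see directly whether each extra bit of precision actually reduces the error.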

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

• An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
• Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?

16-bit vs 8-bit editing

• SLR cameras generate image at 12–14 bits internally.

8-bit:   0, 1, ..., 255
16-bit:  0, $3.9 \times 10^{-3}$, ..., 255

• Controversial: Margulis says using 16 bits makes no practical difference in quality.
• Relevant metric is not normwise relative error.
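The effect of rounding after each editing operation can be illustrated with a toy experiment. The Python sketch below is an assumption-laden illustration, not a model of any real image editor: the edit is a square-root tone curve followed by its exact inverse, `round_trip` is a name introduced here, and 8-bit levels are promoted to 16 bits by multiplying by 257.

```python
def round_trip(levels, depth):
    """Apply a square-root curve and then its inverse, rounding after each step."""
    top = 2 ** depth - 1                            # 255 for 8-bit, 65535 for 16-bit
    scale = top // 255                              # 1 for 8-bit, 257 for 16-bit
    out = []
    for a in levels:
        v = a * scale                               # promote the 8-bit level
        v = round(top * (v / top) ** 0.5)           # brighten: f(v) = top*sqrt(v/top)
        v = round(top * (v / top) ** 2)             # undo it: f^{-1}(v) = top*(v/top)^2
        out.append(round(v / scale))                # return to 8 bits
    return out

levels = list(range(256))
after8 = round_trip(levels, 8)
after16 = round_trip(levels, 16)

print("distinct levels after 8-bit editing: ", len(set(after8)))
print("distinct levels after 16-bit editing:", len(set(after16)))
```

In this toy case the 8-bit path merges some neighbouring levels while the 16-bit path returns every level unchanged. Whether such losses are visible in practice is exactly the point of controversy, and a count of distinct levels is not a measure of perceived quality.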

High Precision in MATLAB

• VPA arithmetic in Symbolic Math Toolbox.
• Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m


  • Appendix

Precision versus Accuracy

Unit roundoff u = 12β

1minust

fl(abc) = ab(1 + δ1) middot c(1 + δ2) |δi | le u= abc(1 + δ1)(1 + δ2)

asymp abc(1 + δ1 + δ2)

Precision = uAccuracy asymp 2u

Accuracy is not limited by precision

University of Manchester Nick Higham Higher Precision Arithmetic 3 33

Walkersrsquo Trouser Review

bull pARAMO Torres Trousers pound105 Waterproof insulated trousers to be worn as

an outer layer or next to the skin I understand the concept but the execution is unusual The waist is

bull wide relative to thighs and the cut is narrow around backside and crotch (despi te a gusset) There is only

it The wa ist connects like a nappy adjust the stati c integral webbing belt via a large quick-release buckle (which can clash with a rucksack or camera bag hipbelt) then

fold a fabric fla p up to the belt and secure with two Ve lcro tabs This leaves an air gap on each side between

waist and the top of the leg zip Knees articulate but when the leg is raised above step height the fabric is restrictive across the thigh and backside Full length side zips (legs cannot be easily shortened) allow rapid zipping on or off - useful if they are worn as a supershywarm overtrouser Paramo say Torres represent a practical su rviva l aid but as a next-to-skin trouser they fee l compromised by the cut and an insulated overt rouser has a limited market

THE LOWDOWN

Fabric Nikwax Analogy Insulator (polyester microfibre outer 100g polyester fill) Sizes XS-XL (unisex) Inside leg 79cm only Waist integral belt front flap with Velcro tabs

IHtIrnl Paramo 01892 786444 wwwparamo_couk

Cit THE NORTH FACE Insulated Trekker Pant Heading to the Antarctic Pack a pair of

these Inside the stretch nylon exterior is a quilted taffeta lining its li ke being wrapped in a duvet The on ly hitch is that outside co ld dry cond it ions theyre often too warm There are no leg vents and no water repellency although the insulation stops moisture (mist not rain) seeping through Breathability is good for such a warm garment and they are super-comfortable in chilly weather A choice of leg lengths is available and the plain hem is easily adjusted The waist is generous with belt loops and a static drawcord to keep them in place although the drawcord isnt very effective against the weight and bulk of the trousers Pockets are odd the zipped side pockets are on ly just big enough for my (small) hands and Im sti ll searching for a use for the zipped pocket behind the left thigh I found these pants too warm for British hill walking but appreciated them in cold dry weather in the French Alps

THE LOWDOWN Fabric 90 nylon 10 elastane polyester insulation Sizes Men 30-38 Women 8-16

IHtImU The North Face 01539822155 wwwthenorthfacecomeu

Inside leg Men Regular 80-83cm Long 85-88cm Women Short 71-75cm Regular 76-80cm Long 81-85cm_ All size graduated Waist zip 2 press studs belt loops static drawcord Pockets 2 zipped front 1 zipped rear 1 zipped back thigh

pound65

January 2011 Outdoor Photography 75

University of Manchester Nick Higham Higher Precision Arithmetic 5 33

RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksXYZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksX

YZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

IEEE Standard 754-1985

Binary β = 2

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

Arithmetic ops (+minus lowast radic) performed as if firstcalculated to infinite precision then roundedDefault round to nearest round to even in case of tie

University of Manchester Nick Higham Higher Precision Arithmetic 7 33

Kahan on Higher Precision

For now the 10-byte Extended format is atolerable compromise between

the value of extra-precise arithmetic and the price ofimplementing it to run fast

very soon two more bytes of precision will become tolerableand ultimately a 16-byte format

That kind of gradual evolution towards wider precision

was already in view whenIEEE Standard 754 for Floating-Point Arithmetic was framed

mdash Computer Benchmarks Versus Accuracy (1994)

University of Manchester Nick Higham Higher Precision Arithmetic 8 33

IEEE Standard 754-2008

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

quadruple 128 bits 10plusmn4932 2minus113 asymp 96times 10minus35

University of Manchester Nick Higham Higher Precision Arithmetic 9 33

Extended and Mixed Precision BLAS (2002)

Part of new standard developed by BLAS TechnicalForumExtra precision used internally input and outputarguments remain single or doubleExtra precision is anything at least 15 times asaccurate as double precision and wider than 80 bitsCounterparts of selected level 1 2 and 3 BLASroutines Extra input argument specifies precision ofinternal computationsReference implementation uses double-double formatgiving extra precision of about 106 bits

University of Manchester Nick Higham Higher Precision Arithmetic 10 33

Need for Higher Precision

Bailey Simon Barton amp Fouts Floating PointArithmetic in Future Supercomputers Internat JSupercomputer Appl 3 86ndash90 1989Bailey Barrio amp Borwein High-Precision ComputationMathematical Physics and Dynamics Appl MathComput 218 10106ndash10121 2012

Long-time simulationsResolving small-scale phenomenaLarge-scale simulations

University of Manchester Nick Higham Higher Precision Arithmetic 11 33

Going to Higher Precision

If we have quadruple or higher precision what do we needto do to modify existing algorithms

To what extent are existing algs precision-independent

University of Manchester Nick Higham Higher Precision Arithmetic 13 33

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend onprecisionCondition estimation to detect nearly singular matrices

gtgt A = gallery(rsquolotkinrsquo14) x = Arandn(n1)Warning Matrix is close to singular or badly scaledResults may be inaccurate RCOND = 2761711e-18

Iterative refinementEquilibration

But all ldquorely only on LAPACK DLAMCHrdquo

University of Manchester Nick Higham Higher Precision Arithmetic 14 33

Matrix Functions

(Inverse) scaling and squaring-type algorithms for eAlog(A) cos(A) At use Padeacute approximants

Padeacute degree chosen to achieve accuracy uPadeacute coeffs and algorithm parameters need rederivingfor a different u Logic may changeexpm logm need changing for smaller u

Methods based on best Linfin approximations to eA forHermitian A also need higher order approximationsderiving

Scalar elementary functions

University of Manchester Nick Higham Higher Precision Arithmetic 15 33

Iterative Algorithms

QR-based Dynamically Weighted Halley IterationNakatsukasa Bai amp Gygi (2010)X0 = Aα isin Cmtimesn (m ge n)[radic

ck Xk

I

]=

[Q1

Q2

]R

Xk+1 =bk

ckXk +

1radicck

(ak minus

bk

ck

)Q1Qlowast2

Cubically convergent with limkrarrinfin Xk = U whereA = UH is a polar decompositionForms basis of new spectral divide amp conquer algs forsymm eirsquoproblem and SVD (H amp Nakatsukasa 2012)

University of Manchester Nick Higham Higher Precision Arithmetic 16 33

Maximum Number of Iterations

If κ2(A) le uminus1

u 10minus16 10minus32 10minus64 10minus128

Scaled Newton 9 11 13 15QDWH 6 7 8 9

Increasing the precision makes little difference to thecost in flops

University of Manchester Nick Higham Higher Precision Arithmetic 17 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision we might not need to useit everywhere

Iterative refinement for Ax = bIterative refinement for ersquovalue problemsPolynomial root finding (Boyd this workshop)Petschow Quintana-Ortiacute amp Bientinesi (2012) usingquad precision to improve orthogonality of doubleprecision MRRR

University of Manchester Nick Higham Higher Precision Arithmetic 19 33

Chebfun

How would Chebfun need to change with higher precisionarithmetic

gtgt help chebfunprefchebfunpref Settings for Chebfun

eps - Relative tolerance used in construction andsubsequent operationsFactory value is 2^-52 (Matlabrsquos factoryvalue of machine epsilon)

University of Manchester Nick Higham Higher Precision Arithmetic 20 33

Chebfun

function pass = complexrotation This code makes sure a few things are ok if you make them complex eg integration norms inner products and singular values LNT 20 May 2008

f = chebfun((x) exp(x))fi = chebfun((x) 1iexp(x))g = chebfun(rsquo1(2-x)rsquo)gi = chebfun(rsquo1i(2-x)rsquo)A = [f g]

pass(1) = (sum(fi)==1isum(f))pass(2) = norm(fi)==norm(f)pass(3) = abs((frsquog)-((1if)rsquo(1ig))) lt 1e-15pass(4) = (norm(giinf)-norm(ginf)) lt 1e-15pass(5) = (norm(fi1)-norm(f1)) lt 1e-15pass(6) = norm(svd(A) - svd(1iA)) lt 1e-15

University of Manchester Nick Higham Higher Precision Arithmetic 21 33

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available increase theprecision accordingly

University of Manchester Nick Higham Higher Precision Arithmetic 22 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

    1.32e-22  3.39e-22  3.39e-21  8.67e-20
    1.39e-18  4.36e-18  5.30e-18  5.83e-18
    1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \quad \theta_\infty(x, y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
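Both statements can be checked directly in IEEE double precision, where $u = 2^{-53}$. The following Python sketch (function names are illustrative; `math.nextafter` needs Python 3.9 or later) shows a pair of scalars attaining the lower bound and reproduces the vector example.

```python
import math

u = 2.0 ** -53  # unit roundoff for IEEE double precision

def rel_diff(x, y):
    return abs(x - y) / abs(x)

# The neighbour of 1.0 just below it attains the lower bound exactly.
below_one = math.nextafter(1.0, 0.0)
print(rel_diff(1.0, below_one) == u)

def theta_inf(x, y):
    """Normwise relative error in the infinity norm."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

x = [1.0, 1e-22]
y = [1.0, 2e-22]
print(theta_inf(x, y), theta_inf(x, y) < u, x == y)
```

The vectors differ, yet their normwise relative error is far below u because the largest components agree exactly, as the second theorem requires.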


Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

    t    y
    20   262537412640768744.00
    25   262537412640768744.0000000
    30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

    t    y
    35   262537412640768743.99999999999925007
    40   262537412640768743.9999999999992500725972

So no, it's 3!
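The table can be explored with Python's standard `decimal` module. This is a sketch under assumptions, not the computation used for the slide: it works with 10 guard digits and then rounds to t digits, and it takes pi from the series recipe given in the `decimal` documentation.

```python
from decimal import Decimal, getcontext, localcontext

def pi_decimal():
    """pi to the current context precision (recipe from the decimal docs)."""
    getcontext().prec += 2
    lasts, t, s, n, na, d, da = 0, Decimal(3), 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s

def ramanujan(t, guard=10):
    """exp(pi*sqrt(163)) rounded to t significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard          # guard digits absorb intermediate rounding
        y = (pi_decimal() * Decimal(163).sqrt()).exp()
    with localcontext() as ctx:
        ctx.prec = t
        return +y                     # unary plus rounds to the context precision

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

Up to 25 digits the value rounds to an integer ending in 4; only from about 30 digits does the run of nines below ...743 become visible.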


Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \quad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error (log scale, from $10^{-14}$ to $10^{-4}$) plotted against $t$, for $t$ from 10 to 40.]
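The experiment can be imitated in Python by rounding every intermediate result to t significant bits. This is only a simulation under assumptions: `fl` is a hypothetical helper, `sin` is evaluated in double precision and then rounded, and the reference value is the double precision result.

```python
import math

def fl(v, t):
    """Round v to t significant bits (round to nearest, ties to even)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)              # v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(math.ldexp(m, t)), e - t)

def evaluate(t, a=1e-8, b=2.0 ** 24):
    """y = x + a*sin(b*x) with x = 1/7, rounding every operation to t bits."""
    x = fl(1.0 / 7.0, t)
    bx = fl(b * x, t)
    s = fl(math.sin(bx), t)
    return fl(x + fl(fl(a, t) * s, t), t)

reference = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)   # double precision
errors = {t: abs(evaluate(t) - reference) for t in range(10, 41)}
for t in (10, 20, 25, 30, 35, 40):
    print(t, errors[t])
```

Because b amplifies the rounding error in x before the sine is taken, the error need not fall steadily as t grows; printing the whole `errors` dictionary shows where it stalls.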

Myth: Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
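One way to make the rounding loss concrete is to apply an editing op and then undo it, rounding after each step. The Python sketch below is an illustration only: the gamma curve with exponent 2.2 stands in for a generic edit, and counting surviving grey levels is an assumed metric that says nothing about visible quality.

```python
def roundtrip_levels(bits, gamma=2.2):
    """Count how many of the 256 8-bit grey levels survive applying a gamma
    curve and then its inverse, rounding to `bits`-bit integers after each op."""
    top = 2 ** bits - 1
    survivors = set()
    for a in range(256):
        v = round(a * top / 255)                       # promote to working depth
        v = round(top * (v / top) ** gamma)            # editing op 1 (darken)
        v = round(top * (v / top) ** (1.0 / gamma))    # editing op 2 (undo it)
        survivors.add(round(v * 255 / top))            # back to 8-bit
    return len(survivors)

print(roundtrip_levels(8), roundtrip_levels(16))
```

In 8-bit working depth the darkest levels all collapse to 0 after the first op and cannot be recovered, while at 16 bits nearly all levels return to where they started.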


16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

    8-bit:   $0, 1, \dots, 255$
    16-bit:  $0, 3.9 \times 10^{-3}, \dots, 255$

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.


High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

    >> which -all eig
    built-in (CMATLAB2012btoolboxmatlabmatfunsingleeig)  % single method
    built-in (CMATLAB2012btoolboxmatlabmatfundoubleeig)  % double method
    CMATLAB2012btoolboxsymbolicsymbolicsymeigm  % sym method
    dmatlabMultiprecision_Computing_Toolboxmpm  % mp method
    CMATLAB2012btoolboxdistcompparallelcodistributedeigm  % codistributed method
    CMATLAB2012btoolboxdistcompgpugpuArrayeigm  % gpuArray method
    dmatlabchebfunlinopeigm  % linop method
    dmatlabchebfunchebopeigm  % chebop method

    >> which -all qr
    built-in (CMATLAB2012btoolboxmatlabmatfunsingleqr)  % single method
    built-in (CMATLAB2012btoolboxmatlabmatfundoubleqr)  % double method
    dmatlabMultiprecision_Computing_Toolboxmpm  % mp method
    CMATLAB2012btoolboxdistcompparallelcodistributedqrm  % codistributed method
    CMATLAB2012btoolboxdistcompgpugpuArrayqrm  % gpuArray method
    dmatlabchebfunchebfunqrm  % chebfun method

    >> which -all toeplitz
    CMATLAB2012btoolboxmatlabelmattoeplitzm


Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Walkers' Trouser Review

• PARAMO Torres Trousers £105

Waterproof insulated trousers to be worn as an outer layer or next to the skin. I understand the concept but the execution is unusual. The waist is wide relative to thighs and the cut is narrow around backside and crotch (despite a gusset). There is only [...] it. The waist connects like a nappy: adjust the static integral webbing belt via a large quick-release buckle (which can clash with a rucksack or camera bag hipbelt), then fold a fabric flap up to the belt and secure with two Velcro tabs. This leaves an air gap on each side between waist and the top of the leg zip. Knees articulate, but when the leg is raised above step height the fabric is restrictive across the thigh and backside. Full length side zips (legs cannot be easily shortened) allow rapid zipping on or off - useful if they are worn as a super-warm overtrouser. Paramo say Torres represent a practical survival aid, but as a next-to-skin trouser they feel compromised by the cut, and an insulated overtrouser has a limited market.

THE LOWDOWN

Fabric: Nikwax Analogy Insulator (polyester microfibre outer, 100g polyester fill). Sizes: XS-XL (unisex). Inside leg: 79cm only. Waist: integral belt, front flap with Velcro tabs.

Paramo 01892 786444 www.paramo.co.uk

• THE NORTH FACE Insulated Trekker Pant

Heading to the Antarctic? Pack a pair of these. Inside the stretch nylon exterior is a quilted taffeta lining: it's like being wrapped in a duvet. The only hitch is that outside cold, dry conditions they're often too warm. There are no leg vents and no water repellency, although the insulation stops moisture (mist, not rain) seeping through. Breathability is good for such a warm garment and they are super-comfortable in chilly weather. A choice of leg lengths is available and the plain hem is easily adjusted. The waist is generous, with belt loops and a static drawcord to keep them in place, although the drawcord isn't very effective against the weight and bulk of the trousers. Pockets are odd: the zipped side pockets are only just big enough for my (small) hands and I'm still searching for a use for the zipped pocket behind the left thigh. I found these pants too warm for British hill walking but appreciated them in cold, dry weather in the French Alps.

THE LOWDOWN

Fabric: 90% nylon, 10% elastane; polyester insulation. Sizes: Men 30-38, Women 8-16.

The North Face 01539 822155 www.thenorthface.com/eu

Inside leg: Men: Regular 80-83cm, Long 85-88cm. Women: Short 71-75cm, Regular 76-80cm, Long 81-85cm. All size graduated. Waist: zip, 2 press studs, belt loops, static drawcord. Pockets: 2 zipped front, 1 zipped rear, 1 zipped back thigh.

£65

January 2011 Outdoor Photography 75

RGB to XYZ

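From CIE Standard (1931):

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01 & 0.99 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

But in many books:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49000 & 0.31000 & 0.20000 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01000 & 0.99000 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$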


IEEE Standard 754-1985

Binary, $\beta = 2$.

    Type     Size      Range            u = 2^{-t}
    single   32 bits   $10^{\pm 38}$    $2^{-24} \approx 6.0 \times 10^{-8}$
    double   64 bits   $10^{\pm 308}$   $2^{-53} \approx 1.1 \times 10^{-16}$

Arithmetic ops ($+$, $-$, $*$, $/$, $\sqrt{\ }$) performed as if first calculated to infinite precision, then rounded.
Default: round to nearest, round to even in case of tie.

Kahan on Higher Precision

"For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

W. Kahan, Computer Benchmarks Versus Accuracy (1994)

IEEE Standard 754-2008

    Type        Size       Range             u = 2^{-t}
    single      32 bits    $10^{\pm 38}$     $2^{-24} \approx 6.0 \times 10^{-8}$
    double      64 bits    $10^{\pm 308}$    $2^{-53} \approx 1.1 \times 10^{-16}$
    quadruple   128 bits   $10^{\pm 4932}$   $2^{-113} \approx 9.6 \times 10^{-35}$
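The unit roundoffs in the table, and the round-to-nearest, ties-to-even default, can be checked from Python, whose `float` is an IEEE double. This is a small illustrative sketch; the quadruple value is computed only as a number, not by doing quad arithmetic.

```python
import sys

u = sys.float_info.epsilon / 2      # unit roundoff of double: 2**-53

# 1 + u lies exactly halfway between 1 and 1 + 2u; the tie goes to the
# neighbour with an even last bit, which is 1.
print(1.0 + u == 1.0)

# (1 + 2u) + u is halfway between 1 + 2u (odd last bit) and 1 + 4u (even),
# so it rounds up to 1 + 4u.
print((1.0 + 2 * u) + u == 1.0 + 4 * u)

# Unit roundoffs of the binary formats in decimal.
for name, t in (("single", 24), ("double", 53), ("quadruple", 113)):
    print(f"{name:9s} u = 2^-{t} = {2.0 ** -t:.2e}")
```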

Extended and Mixed Precision BLAS (2002)

Part of new standard developed by BLAS Technical Forum.
Extra precision used internally; input and output arguments remain single or double.
Extra precision is anything at least 1.5 times as accurate as double precision and wider than 80 bits.
Counterparts of selected level 1, 2 and 3 BLAS routines. Extra input argument specifies precision of internal computations.
Reference implementation uses double-double format, giving extra precision of about 106 bits.
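A double-double number is an unevaluated sum hi + lo of two doubles, and its arithmetic is built on error-free transformations such as TwoSum. The Python sketch below is not the reference implementation mentioned above: it shows TwoSum and a simple compensated summation in the same spirit, with illustrative names.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e = a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def compensated_sum(values):
    """Accumulate in a (hi, lo) pair: hi is the running double sum and
    lo collects the rounding errors that hi discarded."""
    hi, lo = 0.0, 0.0
    for v in values:
        hi, e = two_sum(hi, v)
        lo += e
    return hi + lo

values = [1.0, 1e-20, -1.0]
naive = 0.0
for v in values:
    naive += v                       # plain double accumulation loses 1e-20
print(naive, compensated_sum(values))
```

Inputs and outputs stay in double; only the accumulation carries extra information, which mirrors the interface described for the extended precision BLAS.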

Need for Higher Precision

Bailey, Simon, Barton & Fouts, Floating Point Arithmetic in Future Supercomputers, Internat. J. Supercomputer Appl. 3, 86–90, 1989.
Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics and Dynamics, Appl. Math. Comput. 218, 10106–10121, 2012.

Long-time simulations.
Resolving small-scale phenomena.
Large-scale simulations.

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algs precision-independent?

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend on precision.
Condition estimation to detect nearly singular matrices:

    >> A = gallery('lotkin',14); x = A\randn(n,1);
    Warning: Matrix is close to singular or badly scaled.
    Results may be inaccurate. RCOND = 2.761711e-18.

Iterative refinement.
Equilibration.

But all "rely only on LAPACK DLAMCH".
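The point of relying on a machine-parameter query is that thresholds follow the arithmetic instead of being hard-coded. The Python sketch below imitates that idea; it is not LAPACK code, and both function names and the "rcond below epsilon" rule are assumptions made for illustration.

```python
import sys

def machine_epsilon():
    """Find the spacing of floats at 1.0 by repeated halving, in the spirit
    of a machine-parameter query (this is not LAPACK code)."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps

def nearly_singular(rcond, eps=None):
    """Flag a reciprocal condition estimate that is below machine epsilon."""
    if eps is None:
        eps = machine_epsilon()
    return rcond < eps

print(machine_epsilon() == sys.float_info.epsilon)
print(nearly_singular(2.761711e-18))                   # RCOND from the warning above
print(nearly_singular(2.761711e-18, eps=2.0 ** -112))  # same estimate, quad-sized epsilon
```

With a quadruple-sized epsilon the same matrix would no longer be flagged, so code that asks the arithmetic for its parameters adapts to higher precision without being rewritten.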

Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

Padé degree chosen to achieve accuracy $u$.
Padé coeffs and algorithm parameters need rederiving for a different $u$. Logic may change.
expm, logm need changing for smaller $u$.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.

Scalar elementary functions
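A scalar caricature shows how the Padé degree depends on u. For the diagonal $[m/m]$ Padé approximant $r_m$ to $e^x$, the leading error term is $(m!)^2 x^{2m+1} / ((2m)!\,(2m+1)!)$. The Python sketch below picks the smallest m for which this term at $|x| = 1$ is at most u. It is a simplification under assumptions: real expm codes use backward error bounds and scaling, not this leading-term test.

```python
from fractions import Fraction
from math import factorial

def pade_leading_error(m):
    """Leading coefficient of exp(x) - r_m(x) for the [m/m] Pade approximant:
    (m!)^2 / ((2m)! (2m+1)!), the factor multiplying x^(2m+1)."""
    return Fraction(factorial(m) ** 2, factorial(2 * m) * factorial(2 * m + 1))

def min_degree(u):
    """Smallest m whose leading error term at |x| = 1 does not exceed u."""
    m = 1
    while pade_leading_error(m) > u:
        m += 1
    return m

for name, t in (("single", 24), ("double", 53), ("quadruple", 113)):
    print(name, min_degree(Fraction(1, 2 ** t)))
```

The required degree grows with the precision, so any constants tabulated for double precision have to be rederived for a smaller u.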

Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration, Nakatsukasa, Bai & Gygi (2010). $X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

Cubically convergent with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
Forms basis of new spectral divide & conquer algs for symm. eigenproblem and SVD (H & Nakatsukasa, 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

    u               10^{-16}   10^{-32}   10^{-64}   10^{-128}
    Scaled Newton   9          11         13         15
    QDWH            6          7          8          9

Increasing the precision makes little difference to the cost in flops.
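The slow growth of the iteration counts is a consequence of quadratic and cubic convergence. A scalar Python sketch using the `decimal` module illustrates this: it runs the unscaled Newton and Halley iterations for sign(x) from x = 0.5 and counts steps until the iterate is within 10^-d of 1. These are scalar analogues chosen for illustration, not scaled Newton or QDWH, so the counts differ from the table.

```python
from decimal import Decimal, localcontext

def iterations_to_converge(step, digits, x0="0.5"):
    """Iterate x <- step(x) from x0 until |x - 1| <= 10**-digits,
    working with `digits + 10` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 10
        x, tol, k = Decimal(x0), Decimal(10) ** -digits, 0
        while abs(x - 1) > tol:
            x = step(x)
            k += 1
        return k

def newton(x):   # quadratically convergent iteration for sign(x)
    return (x + 1 / x) / 2

def halley(x):   # cubically convergent iteration for sign(x)
    return x * (3 + x * x) / (1 + 3 * x * x)

for d in (16, 32, 64, 128):
    print(d, iterations_to_converge(newton, d), iterations_to_converge(halley, d))
```

Each doubling of the number of digits costs at most one extra step for either iteration, which matches the qualitative message of the table.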

Derivative Approximations

Fréchet derivative of $f: \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$$L_f(A, E) \approx \frac{f(A + hE) - f(A)}{h} \qquad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( u \|f(A)\| / \|E\|^2 \right)^{1/2}$.

If $f: \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H, 2010),

$$L_f(A, E) \approx \operatorname{Im} \frac{f(A + ihE)}{h} \qquad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.
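The contrast between the two approximations already shows up for a scalar function. The Python sketch below is a scalar analogue with f = exp at x = 1, not a matrix Fréchet derivative; the function names and the choice h = 2^-26 (close to the square root of u) are illustrative assumptions.

```python
import cmath
import math

def forward_difference(f, x, h):
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h=1e-100):
    """Im f(x + i*h) / h: no subtraction, so no cancellation as h shrinks."""
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.exp(x)                          # derivative of exp is exp
fd = forward_difference(math.exp, x, 2.0 ** -26)
cs = complex_step(cmath.exp, x)
print(abs(fd - exact), abs(cs - exact))
```

The forward difference is limited to about half the available digits because of cancellation, while the complex step result is accurate to about the unit roundoff even with a tiny h.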


Mixed Precision

Even if we have higher precision we might not need to use it everywhere.

Iterative refinement for $Ax = b$.
Iterative refinement for e'value problems.
Polynomial root finding (Boyd, this workshop).
Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
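Iterative refinement for $Ax = b$ is the standard example of using higher precision only where it matters. The Python sketch below is a toy under assumptions: the "solver" is Cramer's rule on a 2-by-2 system with every operation rounded to single precision (via a hypothetical `f32` helper), while residuals and updates are computed in double.

```python
import struct

def f32(v):
    """Round a Python float (double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", v))[0]

def solve2_single(A, b):
    """Solve a 2x2 system by Cramer's rule, rounding every operation to single."""
    (a11, a12), (a21, a22) = [[f32(v) for v in row] for row in A]
    b1, b2 = f32(b[0]), f32(b[1])
    det = f32(f32(a11 * a22) - f32(a12 * a21))
    x1 = f32(f32(f32(a22 * b1) - f32(a12 * b2)) / det)
    x2 = f32(f32(f32(a11 * b2) - f32(a21 * b1)) / det)
    return [x1, x2]

def refine(A, b, steps=4):
    """Single-precision solves, residuals and updates in double."""
    x = solve2_single(A, b)
    history = [x]
    for _ in range(steps):
        r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        d = solve2_single(A, r)
        x = [x[0] + d[0], x[1] + d[1]]
        history.append(x)
    return history

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
exact = [1.0 / 11.0, 7.0 / 11.0]
for x in refine(A, b):
    print(max(abs(x[i] - exact[i]) for i in range(2)))
```

The first solve is accurate only to single precision, and a few cheap refinement steps recover an answer accurate to roughly double precision.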


RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksXYZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

RGB to XYZ

From CIE Standard (1931)XYZ

=

049 031 020017697 081240 001063

0 001 099

RGB

But in many booksX

YZ

=

049000 031000 020000017697 081240 001063

0 001000 099000

RGB

University of Manchester Nick Higham Higher Precision Arithmetic 6 33

IEEE Standard 754-1985

Binary β = 2

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

Arithmetic ops (+minus lowast radic) performed as if firstcalculated to infinite precision then roundedDefault round to nearest round to even in case of tie

University of Manchester Nick Higham Higher Precision Arithmetic 7 33

Kahan on Higher Precision

For now the 10-byte Extended format is atolerable compromise between

the value of extra-precise arithmetic and the price ofimplementing it to run fast

very soon two more bytes of precision will become tolerableand ultimately a 16-byte format

That kind of gradual evolution towards wider precision

was already in view whenIEEE Standard 754 for Floating-Point Arithmetic was framed

mdash Computer Benchmarks Versus Accuracy (1994)

University of Manchester Nick Higham Higher Precision Arithmetic 8 33

IEEE Standard 754-2008

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

quadruple 128 bits 10plusmn4932 2minus113 asymp 96times 10minus35

University of Manchester Nick Higham Higher Precision Arithmetic 9 33

Extended and Mixed Precision BLAS (2002)

Part of new standard developed by BLAS TechnicalForumExtra precision used internally input and outputarguments remain single or doubleExtra precision is anything at least 15 times asaccurate as double precision and wider than 80 bitsCounterparts of selected level 1 2 and 3 BLASroutines Extra input argument specifies precision ofinternal computationsReference implementation uses double-double formatgiving extra precision of about 106 bits

University of Manchester Nick Higham Higher Precision Arithmetic 10 33

Need for Higher Precision

Bailey Simon Barton amp Fouts Floating PointArithmetic in Future Supercomputers Internat JSupercomputer Appl 3 86ndash90 1989Bailey Barrio amp Borwein High-Precision ComputationMathematical Physics and Dynamics Appl MathComput 218 10106ndash10121 2012

Long-time simulationsResolving small-scale phenomenaLarge-scale simulations

University of Manchester Nick Higham Higher Precision Arithmetic 11 33

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algorithms precision-independent?

Direct Matrix Factorizations

• Threshold pivoting for sparse matrices may depend on precision.
• Condition estimation to detect nearly singular matrices:

>> A = gallery('lotkin',14); x = A\randn(14,1);
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 2.761711e-18.

• Iterative refinement.
• Equilibration.

But all "rely only on LAPACK DLAMCH".
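Code that obtains its machine parameters from the arithmetic, rather than from literals such as 1e-16, carries over to a new precision unchanged. The following Python sketch probes the working precision in the spirit of a DLAMCH query. The singularity test `nearly_singular` and its threshold are hypothetical, chosen only to show a tolerance expressed relative to u.

```python
import sys

def machine_epsilon(one=1.0):
    """Smallest power of two eps such that one + eps > one in the working arithmetic."""
    eps = one
    while one + eps / 2 > one:
        eps = eps / 2
    return eps

eps = machine_epsilon()
u = eps / 2          # unit roundoff under round to nearest

def nearly_singular(rcond, unit_roundoff=u):
    """Hypothetical test: flag a matrix whose reciprocal condition estimate is below u."""
    return rcond < unit_roundoff

print(eps == sys.float_info.epsilon)
print(nearly_singular(2.761711e-18))
```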


Matrix Functions

• (Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.
  • Padé degree chosen to achieve accuracy $u$.
  • Padé coefficients and algorithm parameters need rederiving for a different $u$. Logic may change.
  • expm, logm need changing for smaller $u$.
• Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher-order approximations deriving.
• Scalar elementary functions.
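The dependence of the approximation degree on $u$ can be seen in a much simpler setting than Padé approximation. The Python sketch below uses a truncated Taylor series for $e^x$ on $|x| \le 1$ as a stand-in, with the crude rule that the first neglected coefficient $1/(m+1)!$ should not exceed $u$. This is an illustrative assumption, not the degree selection logic of expm.

```python
from fractions import Fraction

def taylor_degree(u):
    """Smallest m with 1/(m+1)! <= u: a crude degree choice for exp(x) on |x| <= 1."""
    m, term = 0, Fraction(1)        # invariant: term == 1/(m+1)!
    while term > u:
        m += 1
        term /= (m + 1)
    return m

for name, t in (("single", 24), ("double", 53), ("quadruple", 113)):
    print(name, taylor_degree(Fraction(1, 2**t)))
```

Any constants tabulated for double precision (degrees, scaling thresholds) would have to be regenerated in the same way for each new $u$.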


Iterative Algorithms

QR-based Dynamically Weighted Halley (QDWH) iteration, Nakatsukasa, Bai & Gygi (2010):

$X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\,X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

• Cubically convergent, with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
• Forms basis of new spectral divide and conquer algorithms for the symmetric eigenproblem and the SVD (H. & Nakatsukasa, 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

$u$              $10^{-16}$  $10^{-32}$  $10^{-64}$  $10^{-128}$
Scaled Newton    9           11          13          15
QDWH             6           7           8           9

Increasing the precision makes little difference to the cost in flops.
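The slow growth of the QDWH counts can be explored with a scalar model. The Python sketch below follows the smallest singular value $\ell_k$ of $X_k$, starting from the worst case $\ell_0 = u$, and counts steps until $1 - \ell_k \le u$. The weight formulas for $a_k$, $b_k$, $c_k$ are written from recollection of the Nakatsukasa, Bai & Gygi paper and should be treated as an assumption, as should the stopping rule; the function name and guard-digit choice are illustrative.

```python
from decimal import Decimal, localcontext

def qdwh_iteration_count(u_exponent, guard_digits=20, max_iter=50):
    """Scalar QDWH steps needed to bring l_0 = 10**-u_exponent to within u of 1."""
    with localcontext() as ctx:
        ctx.prec = u_exponent + guard_digits      # work well beyond the target u
        one = Decimal(1)
        u = Decimal(10) ** -u_exponent
        l = u                                     # worst case: kappa_2(A) = 1/u
        for k in range(1, max_iter + 1):
            l2 = l * l
            d = (4 * (one - l2) / (l2 * l2)) ** (one / 3)
            s = (one + d).sqrt()
            a = s + (8 - 4 * d + 8 * (2 - l2) / (l2 * s)).sqrt() / 2
            b = (a - one) ** 2 / 4
            c = a + b - one
            # Map of the smallest singular value; clamp against rounding above 1.
            l = min(l * (a + b * l2) / (one + c * l2), one)
            if one - l <= u:
                return k
    raise RuntimeError("no convergence within max_iter steps")

for p in (16, 32, 64, 128):
    print(f"u = 1e-{p}: {qdwh_iteration_count(p)} iterations")
```

Because convergence is cubic near the limit, each squaring of $1/u$ should add only about one step in this model.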


Derivative Approximations

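Fréchet derivative of $f\colon \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$$L_f(A,E) \approx \frac{f(A + hE) - f(A)}{h} \quad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( \dfrac{u\,\|f(A)\|}{\|E\|^2} \right)^{1/2}$.

If $f\colon \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H., 2010):

$$L_f(A,E) \approx \operatorname{Im} \frac{f(A + ihE)}{h} \quad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.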
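The two approximations can be compared in the scalar case, where $A = x$, $E = 1$ and the Fréchet derivative reduces to $f'(x)$. The Python sketch below uses $f = \exp$ at $x = 1$ as an assumed test problem; the matrix setting of the slide is not reproduced here.

```python
import cmath
import math

def forward_difference(f, x, h):
    """(f(x + h) - f(x)) / h, subject to cancellation for small h."""
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    """Im f(x + ih) / h, which involves no subtraction."""
    return f(complex(x, h)).imag / h

u = 2.0**-53                                   # unit roundoff of IEEE double
x = 1.0
exact = math.e                                 # derivative of exp at x = 1

h_opt = math.sqrt(u * abs(math.exp(x)))        # slide's h_opt with E = 1
fd = forward_difference(math.exp, x, h_opt)
cs = complex_step(cmath.exp, x, 1e-100)

print(f"forward difference error: {abs(fd - exact):.1e}")
print(f"complex step error:       {abs(cs - exact):.1e}")
```

The forward difference is limited to roughly $\sqrt{u}$ accuracy, so its error shrinks with higher precision only if $h$ is retuned; the complex step needs no such retuning.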


Mixed Precision

Even if we have higher precision we might not need to use it everywhere.

• Iterative refinement for $Ax = b$.
• Iterative refinement for eigenvalue problems.
• Polynomial root finding (Boyd, this workshop).
• Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
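Iterative refinement for $Ax = b$ is the classic case: solve cheaply in low precision, compute the residual in higher precision, and correct. The Python sketch below simulates the low precision solver by rounding a $2 \times 2$ Cramer's rule solution to IEEE single, with double as the higher precision. The matrix, right-hand side and helper names are assumptions made for illustration.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve2x2_low(A, b):
    """Stand-in for a low precision solver: Cramer's rule, result rounded to single."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return [to_single(x1), to_single(x2)]

def refine(A, b, steps=3):
    """Return the list of iterates produced by mixed precision refinement."""
    x = solve2x2_low(A, b)
    history = [x]
    for _ in range(steps):
        # Residual computed in the higher (double) precision.
        r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        d = solve2x2_low(A, r)                    # correction from the cheap solver
        x = [x[i] + d[i] for i in range(2)]       # update in double
        history.append(x)
    return history

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
exact = [1 / 11, 7 / 11]

def error(x):
    return max(abs(x[i] - exact[i]) for i in range(2))

for x in refine(A, b):
    print(f"{error(x):.1e}")
```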


Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref Settings for Chebfun.

 eps - Relative tolerance used in construction and subsequent operations.
       Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g., integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For an unstable algorithm for which error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H. & Relton (in progress).


Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22  3.39e-22  3.39e-21  8.67e-20
1.39e-18  4.36e-18  5.30e-18  5.83e-18
1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H. (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2 \times 10^{-22} \end{bmatrix}, \quad \theta_\infty(x, y) = 10^{-22}.$$

• Tiny relative errors can corrupt performance profiles.
• See cure in Dingle & H. (2011).
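Both theorems are easy to check in IEEE double, where $t = 53$. The Python sketch below is an illustration with an assumed helper `theta_inf`; it verifies that two neighbouring doubles attain the bound $u$ and that the example vectors give a normwise error far below $u$ even though they differ.

```python
import math

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

u = 2.0**-53                        # unit roundoff of IEEE double

# First theorem: the neighbouring doubles 2 and 2 - 2^-52 attain the bound u.
x = 2.0
y = math.nextafter(x, 0.0)          # requires Python 3.9 or later
print((x - y) / x == u)             # True

# Second theorem's example: the difference sits in the small component,
# so theta_inf is tiny although the vectors are not equal.
print(theta_inf([1.0, 1e-22], [1.0, 2e-22]) < u)   # True
```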


Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it's 3.
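The experiment can be repeated with Python's `decimal` module. In the sketch below the value is computed with a few guard digits and then rounded to $t$ significant digits, which is an assumption about how "t digit precision" should be emulated; the series for $\pi$ is the recipe given in the `decimal` documentation, and `ramanujan` is an illustrative name.

```python
from decimal import Context, Decimal, localcontext

def pi_decimal():
    """Pi to the current context precision (series from the decimal module docs)."""
    with localcontext() as ctx:
        ctx.prec += 2                            # extra digits for intermediate steps
        lasts, t, s, n, na, d, da = 0, Decimal(3), 3, 1, 0, 0, 24
        while s != lasts:
            lasts = s
            n, na = n + na, na + 8
            d, da = d + da, da + 32
            t = (t * n) / d
            s += t
    return +s                                    # round to the caller's precision

def ramanujan(t, guard=10):
    """exp(pi*sqrt(163)) computed with guard digits, then rounded to t digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard
        y = (pi_decimal() * Decimal(163).sqrt()).exp()
    return Context(prec=t).plus(y)

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

Without the guard digits, rounding errors in the argument $\pi\sqrt{163} \approx 40.1$ are amplified by the exponential and can disturb the trailing digits at the lower precisions.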


Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a \sin(bx), \quad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error plotted against $t$ for $t$ from 10 to 40; logarithmic error axis from $10^{-14}$ to $10^{-4}$.]

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.
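The example $y = x + a\sin(bx)$ can be used to test the myth numerically. The Python sketch below simulates $t$-bit arithmetic by rounding every intermediate result to $t$ significant bits and compares with the double precision value. The helper `chop` and the choice of double as the reference are assumptions; the printed errors are for inspection and need not decrease steadily as $t$ grows.

```python
import math

def chop(x, t):
    """Round x to t significant bits (round to nearest), for 1 <= t <= 53."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                     # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(math.ldexp(m, t)), e - t)

def evaluate(t):
    """y = x + a*sin(b*x) with every operation rounded to t bits."""
    x = chop(1 / 7, t)
    a = chop(1e-8, t)
    b = 2.0**24
    bx = chop(b * x, t)
    s = chop(math.sin(bx), t)
    return chop(x + chop(a * s, t), t)

reference = 1 / 7 + 1e-8 * math.sin(2.0**24 / 7)    # double precision value
for t in range(10, 41, 5):
    print(t, f"{abs(evaluate(t) - reference):.1e}")
```

The rounding error in $x$ is multiplied by $b = 2^{24}$ before the sine is taken, so at low precision the term $a\sin(bx)$ carries no correct digits at all.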

Rounding Errors in Digital Imaging

• An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
• Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
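One way to see the effect of the rounding in $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ is to apply an edit and then its inverse and count how many of the 256 levels survive. The Python sketch below uses a square-root curve followed by squaring as an assumed pair of edits, once at 8-bit working depth and once at 16-bit; it says nothing about whether the lost levels are visible.

```python
def round_trip_levels(max_value):
    """Apply a square-root curve and then its inverse, rounding to integers in
    0..max_value after each step; return the set of 8-bit levels that survive."""
    survivors = set()
    scale = max_value // 255                  # 1 for 8-bit, 257 for 16-bit
    for level in range(256):
        a = level * scale                     # promote the 8-bit level to the working depth
        a = round(max_value * (a / max_value) ** 0.5)   # first edit
        a = round(max_value * (a / max_value) ** 2)     # second edit (inverse curve)
        survivors.add(round(a / scale))       # convert back to 8 bits
    return survivors

print(len(round_trip_levels(255)))      # levels left after editing in 8-bit
print(len(round_trip_levels(65535)))    # levels left after editing in 16-bit
```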


16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit:   $0 : 1 : 255$
16-bit:  $0 : 3.9 \times 10^{-3} : 255$

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.


High Precision in MATLAB

• VPA arithmetic in Symbolic Math Toolbox.
• Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) % single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) % double method
CMATLAB2012btoolboxsymbolicsymbolicsymeigm % sym method
dmatlabMultiprecision_Computing_Toolboxmpm % mp method
CMATLAB2012btoolboxdistcompparallelcodistributedeigm % codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayeigm % gpuArray method
dmatlabchebfunlinopeigm % linop method
dmatlabchebfunchebopeigm % chebop method

>> which -all qr
built-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) % single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) % double method
dmatlabMultiprecision_Computing_Toolboxmpm % mp method
CMATLAB2012btoolboxdistcompparallelcodistributedqrm % codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayqrm % gpuArray method
dmatlabchebfunchebfunqrm % chebfun method

>> which -all toeplitz
CMATLAB2012btoolboxmatlabelmattoeplitzm


Conclusions amp Future Directions

• Need to prepare for quad precision in hardware.
• Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?
• High precision very useful for computing "exact answers" for testing algorithms.

NLA group


References

Advanpix LLC, Tokyo. Multiprecision Computing Toolbox for MATLAB. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

RGB to XYZ

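From CIE Standard (1931):

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01 & 0.99 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$

But in many books:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.49000 & 0.31000 & 0.20000 \\ 0.17697 & 0.81240 & 0.01063 \\ 0 & 0.01000 & 0.99000 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$$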


  • Appendix

IEEE Standard 754-1985

Binary β = 2

Type Size Range u = 2minust

single 32 bits 10plusmn38 2minus24 asymp 60times 10minus8

double 64 bits 10plusmn308 2minus53 asymp 11times 10minus16

Arithmetic ops (+minus lowast radic) performed as if firstcalculated to infinite precision then roundedDefault round to nearest round to even in case of tie

University of Manchester Nick Higham Higher Precision Arithmetic 7 33

Kahan on Higher Precision

"For now the 10-byte Extended format is a tolerable compromise between the value of extra-precise arithmetic and the price of implementing it to run fast; very soon two more bytes of precision will become tolerable, and ultimately a 16-byte format ... That kind of gradual evolution towards wider precision was already in view when IEEE Standard 754 for Floating-Point Arithmetic was framed."

Source: Kahan, Computer Benchmarks Versus Accuracy (1994)

IEEE Standard 754-2008

Type        Size       Range             $u = 2^{-t}$
single      32 bits    $10^{\pm 38}$     $2^{-24} \approx 6.0 \times 10^{-8}$
double      64 bits    $10^{\pm 308}$    $2^{-53} \approx 1.1 \times 10^{-16}$
quadruple   128 bits   $10^{\pm 4932}$   $2^{-113} \approx 9.6 \times 10^{-35}$
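The last column of the table follows from the significand lengths alone. A short sketch that reproduces it with exact rational arithmetic (the dictionary and function names are illustrative, not from the talk):

```python
from fractions import Fraction

# Significand lengths t (including the implicit leading bit).
FORMATS = {"single": 24, "double": 53, "quadruple": 113}

def unit_roundoff(t):
    """Return u = 2^(-t) exactly, as a rational number."""
    return Fraction(1, 2 ** t)

for name, t in FORMATS.items():
    u = unit_roundoff(t)
    print(f"{name:10s} t = {t:3d}   u = 2^-{t} ~ {float(u):.1e}")
```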

Extended and Mixed Precision BLAS (2002)

• Part of new standard developed by BLAS Technical Forum.
• Extra precision used internally; input and output arguments remain single or double.
• Extra precision is anything at least 1.5 times as accurate as double precision and wider than 80 bits.
• Counterparts of selected level 1, 2 and 3 BLAS routines. Extra input argument specifies precision of internal computations.
• Reference implementation uses double-double format, giving extra precision of about 106 bits.
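The idea of "extra precision used internally, double in and out" can be illustrated with a sum whose rounding errors are carried in a second double. This is a simplified compensated summation built on Knuth's TwoSum, in the spirit of double-double arithmetic; it is not the XBLAS reference code, and the function names are made up for the sketch.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and s + e = a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def sum_plain(xs):
    """Ordinary left-to-right summation in double precision."""
    s = 0.0
    for x in xs:
        s = s + x
    return s

def sum_extra(xs):
    """Sum xs, carrying the rounding errors in a second double (hi, lo pair)."""
    hi, lo = 0.0, 0.0
    for x in xs:
        hi, e = two_sum(hi, x)
        lo += e            # keep what plain double addition discarded
    return hi + lo         # round back to double only at the end

data = [1e16, 1.0, -1e16]
print(sum_plain(data))     # the 1.0 is lost when added to 1e16
print(sum_extra(data))     # the 1.0 survives in the low-order part
```

The explicit loop in `sum_plain` is deliberate: recent Python versions compensate inside the built-in `sum`, which would hide the effect.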

Need for Higher Precision

• Bailey, Simon, Barton & Fouts, Floating Point Arithmetic in Future Supercomputers, Internat. J. Supercomputer Appl. 3, 86–90, 1989.
• Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics and Dynamics, Appl. Math. Comput. 218, 10106–10121, 2012.

• Long-time simulations.
• Resolving small-scale phenomena.
• Large-scale simulations.

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algs precision-independent?

Direct Matrix Factorizations

• Threshold pivoting for sparse matrices may depend on precision.
• Condition estimation to detect nearly singular matrices:

>> A = gallery('lotkin',14); x = A\randn(n,1);
Warning: Matrix is close to singular or badly scaled.
         Results may be inaccurate. RCOND = 2.761711e-18.

• Iterative refinement.
• Equilibration.

But all "rely only on LAPACK DLAMCH".
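Code that queries the machine parameters at run time, as DLAMCH does, carries over to a new precision unchanged, whereas a hard-coded threshold does not. A minimal sketch of the idea, assuming base 2 and round to nearest; `nearly_singular` is a hypothetical test written for this illustration, not LAPACK's or MATLAB's actual criterion.

```python
import sys

def unit_roundoff_probe():
    """Find u for the working precision by experiment: halve a trial value
    until 1 + trial rounds to 1 (assumes base 2, round to nearest)."""
    trial = 1.0
    while 1.0 + trial > 1.0:
        trial /= 2.0
    return trial          # first power of 2 with fl(1 + trial) == 1

def nearly_singular(rcond, u):
    """Hypothetical warning test: flag the matrix when the reciprocal
    condition estimate falls below the unit roundoff."""
    return rcond < u

u = unit_roundoff_probe()
print(u == sys.float_info.epsilon / 2)

rcond = 2.761711e-18                          # value from the warning above
print(nearly_singular(rcond, u))              # judged in double precision
print(nearly_singular(rcond, 2.0 ** -113))    # judged against quadruple u
```

The same RCOND that triggers a warning in double would be unremarkable in quadruple, which is why the threshold must come from the arithmetic rather than from a constant.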

Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

• Padé degree chosen to achieve accuracy $u$.
• Padé coeffs and algorithm parameters need rederiving for a different $u$. Logic may change.
• expm, logm need changing for smaller $u$.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.

Scalar elementary functions.
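How an approximation degree depends on $u$ can be seen in a much simpler setting than Padé approximation of matrix functions. The sketch below uses a plain Taylor polynomial of the scalar exponential on $|x| \le 1$ as a stand-in (an assumption of this example, not the method used by expm): since $e < 3$, the truncation error of the degree-$m$ polynomial is at most $3/(m+1)!$.

```python
from fractions import Fraction
from math import factorial

def taylor_degree(u):
    """Smallest m with 3/(m+1)! <= u, which guarantees that the degree-m
    Taylor polynomial of exp has truncation error at most u on |x| <= 1."""
    m = 0
    while Fraction(3, factorial(m + 1)) > u:
        m += 1
    return m

for name, t in [("single", 24), ("double", 53), ("quadruple", 113)]:
    print(name, taylor_degree(Fraction(1, 2 ** t)))
```

The degree is a function of the target accuracy, so any table of degrees or coefficients derived for double precision has to be regenerated when $u$ changes.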

Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration, Nakatsukasa, Bai & Gygi (2010): $X_0 = A/\alpha \in \mathbb{C}^{m \times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

• Cubically convergent, with $\lim_{k \to \infty} X_k = U$, where $A = UH$ is a polar decomposition.
• Forms basis of new spectral divide & conquer algs for symm eigenproblem and SVD (H & Nakatsukasa, 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

$u$             $10^{-16}$   $10^{-32}$   $10^{-64}$   $10^{-128}$
Scaled Newton   9            11           13           15
QDWH            6            7            8            9

Increasing the precision makes little difference to the cost in flops.
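The slow growth of the iteration count is a consequence of the high order of convergence: each cubically convergent step roughly triples the number of correct digits. A scalar sketch, using the unweighted Halley iteration for the sign function in Python's `decimal` module; this is an illustration only and does not reproduce the worst-case matrix counts tabulated above (the starting value and the guard-digit count are arbitrary choices).

```python
from decimal import Decimal, localcontext

def halley_iterations(digits, x0="0.5"):
    """Count steps of the scalar Halley iteration x <- x(3 + x^2)/(1 + 3x^2),
    which converges cubically to 1 for x0 > 0, until |x - 1| <= 10^(-digits).
    Arithmetic is carried out with digits + 10 significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 10           # a few guard digits
        tol = Decimal(10) ** (-digits)
        x, k = Decimal(x0), 0
        while abs(x - 1) > tol:
            x = x * (3 + x * x) / (1 + 3 * x * x)
            k += 1
        return k

counts = {d: halley_iterations(d) for d in (16, 32, 64, 128)}
print(counts)
```

Going from 16 to 128 digits multiplies the cost of each arithmetic operation, but adds only a couple of iterations.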

Derivative Approximations

Fréchet derivative of $f: \mathbb{C}^{n \times n} \to \mathbb{C}^{n \times n}$:

$$L_f(A,E) \approx \frac{f(A + hE) - f(A)}{h} \qquad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( \dfrac{u \|f(A)\|}{\|E\|^2} \right)^{1/2}$.

If $f: \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ and $A, E \in \mathbb{R}^{n \times n}$ (Al-Mohy & H, 2010),

$$L_f(A,E) \approx \mathrm{Im}\, \frac{f(A + ihE)}{h} \qquad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.
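The contrast between the two approximations already shows up for a scalar function. A minimal sketch for $f(x) = e^x$ at $x = 1$, assuming IEEE double arithmetic; the matrix (Fréchet derivative) case on the slide follows the same pattern but is not implemented here.

```python
import cmath
import math

def forward_difference(f, x, h):
    """(f(x + h) - f(x)) / h: suffers cancellation when h is small."""
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    """Im f(x + ih) / h: no subtraction, so h can be tiny without cancellation."""
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.exp(x)                        # d/dx exp(x) = exp(x)
fd = forward_difference(math.exp, x, 1e-8) # h near sqrt(u)
cs = complex_step(cmath.exp, x, 1e-100)

fd_err = abs(fd - exact) / exact
cs_err = abs(cs - exact) / exact
print(fd_err)   # limited by the balance of truncation and rounding error
print(cs_err)   # of the order of the unit roundoff
```

The forward difference cannot do better than roughly half the available digits, while the complex step is limited only by the accuracy of `f` in complex arithmetic.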

Mixed Precision

Even if we have higher precision we might not need to use it everywhere.

• Iterative refinement for $Ax = b$.
• Iterative refinement for e'value problems.
• Polynomial root finding (Boyd, this workshop).
• Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
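Iterative refinement for $Ax = b$ is the classic mixed precision pattern: solve cheaply in low precision, compute residuals in higher precision, and correct. The sketch below imitates two precisions with Python's `decimal` module (8 and 30 significant digits, both arbitrary choices for illustration) on a 2-by-2 system whose exact solution is $(2/3, -1/3)$.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

LOW, HIGH = 8, 30      # significant decimal digits (illustrative choices)

def solve_low(A, b):
    """Solve the 2x2 system Ax = b by Gaussian elimination (no pivoting),
    rounding every operation to LOW digits."""
    with localcontext() as ctx:
        ctx.prec = LOW
        l21 = A[1][0] / A[0][0]
        u22 = A[1][1] - l21 * A[0][1]
        y2 = b[1] - l21 * b[0]
        x2 = y2 / u22
        x1 = (b[0] - A[0][1] * x2) / A[0][0]
        return [x1, x2]

def refine(A, b, steps):
    """Iterative refinement: residual and update in HIGH digits, solves in LOW."""
    x = solve_low(A, b)
    iterates = [x]
    for _ in range(steps):
        with localcontext() as ctx:
            ctx.prec = HIGH
            r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        d = solve_low(A, r)               # correction from the cheap solver
        with localcontext() as ctx:
            ctx.prec = HIGH
            x = [x[0] + d[0], x[1] + d[1]]
        iterates.append(x)
    return iterates

A = [[Decimal(2), Decimal(1)], [Decimal(1), Decimal(2)]]
b = [Decimal(1), Decimal(0)]
exact = [Fraction(2, 3), Fraction(-1, 3)]

def error(x):
    """Largest componentwise error, measured exactly."""
    return max(abs(Fraction(xi) - ei) for xi, ei in zip(x, exact))

errors = [error(x) for x in refine(A, b, steps=3)]
for k, e in enumerate(errors):
    print(k, float(e))
```

Only the residual and the update use the expensive arithmetic; every solve stays in the low precision.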

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref   Settings for Chebfun.

 eps - Relative tolerance used in construction and subsequent operations.
       Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n \times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x - y| / |x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2 \times 10^{-22} \end{bmatrix}, \quad \theta_\infty(x, y) = 10^{-22}.$$

• Tiny relative errors can corrupt performance profiles.
• See cure in Dingle & H (2011).
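Both statements are easy to check in double precision. A small sketch, assuming Python floats are IEEE doubles so that $u = 2^{-53}$; the helper names are illustrative.

```python
u = 2.0 ** -53                      # unit roundoff, IEEE double (assumed)

def rel_dist(x, y):
    """Relative distance |x - y| / |x| between two scalars."""
    return abs(x - y) / abs(x)

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

# Scalars: 2 and its predecessor are adjacent doubles and attain the bound.
x, y = 2.0, 2.0 - 2.0 ** -52
print(rel_dist(x, y) == u)

# Vectors: the largest components agree, so theta can fall far below u
# even though the vectors differ.
theta = theta_inf([1.0, 1e-22], [1.0, 2e-22])
print(theta, theta < u)
```

A normwise relative error below $u$ therefore says nothing about which precision was used; it only says that the dominant components are identical.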

Increasing the Precision

$y = e^{\pi \sqrt{163}}$ evaluated at $t$ digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it's 3.
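The table can be approximated with Python's `decimal` module. The sketch below is one way to do it and makes an assumption about how the digits were produced: it evaluates with 20 guard digits and then rounds to $t$ digits, rather than working strictly in $t$-digit arithmetic. The series for $\pi$ is the recipe given in the `decimal` module documentation; the function names are illustrative.

```python
from decimal import Decimal, localcontext

def pi_decimal(prec):
    """pi to about prec digits (series from the decimal module documentation)."""
    with localcontext() as ctx:
        ctx.prec = prec + 2
        lasts, t, s, n, na, d, da = 0, Decimal(3), 3, 1, 0, 0, 24
        while s != lasts:
            lasts = s
            n, na = n + na, na + 8
            d, da = d + da, da + 32
            t = (t * n) / d
            s += t
    return s

def exp_pi_sqrt163(t, guard=20):
    """exp(pi*sqrt(163)) computed with t + guard digits, then rounded to t digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard
        y = (pi_decimal(t + guard) * Decimal(163).sqrt()).exp()
        ctx.prec = t
        return +y                   # unary plus rounds to the current precision

for t in (20, 25, 30, 35, 40):
    print(t, exp_pi_sqrt163(t))
```

At 20 and 25 digits the value rounds up to an apparent integer ending in 4; only from 30 digits onward does the run of 9s reveal that the integer part ends in 3.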

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a \sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error plotted against $t$ for $t$ from 10 to 40, on a logarithmic error axis running from $10^{-14}$ to $10^{-4}$.]

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

• An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
• Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
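The effect of rounding after every edit can be simulated on the 256 grey levels. The sketch below applies a brightening curve and then its exact inverse, once with 8-bit rounding and once with 16-bit rounding followed by conversion back to 8 bits. The curve is a made-up example, not an operation from any particular image editor, and the count of surviving levels says nothing about whether the loss is visible.

```python
def curve(v, vmax):
    """Brightening curve f(v) = vmax*sqrt(v/vmax), rounded to an integer level."""
    return round(vmax * (v / vmax) ** 0.5)

def inverse_curve(v, vmax):
    """The inverse edit, v -> vmax*(v/vmax)^2, again rounded to an integer level."""
    return round(vmax * (v / vmax) ** 2)

levels = range(256)

# Edit entirely in 8 bits: each op rounds to one of 256 levels.
out8 = [inverse_curve(curve(v, 255), 255) for v in levels]

# Edit in 16 bits (8-bit level v maps to 257*v) and convert back at the end.
out16 = [round(inverse_curve(curve(257 * v, 65535), 65535) / 257) for v in levels]

print(len(set(out8)))    # distinct levels surviving the 8-bit round trip
print(len(set(out16)))   # distinct levels surviving the 16-bit round trip
```

In 8 bits, neighbouring bright levels are merged by the first edit and cannot be separated again; in 16 bits the round trip returns every original level.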

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit    0 : 1 : 255
16-bit   0 : $3.9 \times 10^{-3}$ : 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error!

High Precision in MATLAB

• VPA arithmetic in Symbolic Math Toolbox.
• Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)                % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)                % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m                        % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m                            % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m              % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m                        % gpuArray method
d:\matlab\chebfun\@linop\eig.m                                             % linop method
d:\matlab\chebfun\@chebop\eig.m                                            % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)                 % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)                 % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m                            % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m               % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m                         % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m                                            % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m

Conclusions amp Future Directions

• Need to prepare for quad precision in hardware.
• Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?
• High precision very useful for computing "exact answers" for testing algs.

NLA group

IEEE Standard 754-2008

Type | Size | Range | $u = 2^{-t}$
single | 32 bits | $10^{\pm 38}$ | $2^{-24} \approx 6.0 \times 10^{-8}$
double | 64 bits | $10^{\pm 308}$ | $2^{-53} \approx 1.1 \times 10^{-16}$
quadruple | 128 bits | $10^{\pm 4932}$ | $2^{-113} \approx 9.6 \times 10^{-35}$
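
The unit roundoffs in the IEEE table can be reproduced exactly with rational arithmetic. The sketch below is only an illustration: the names `FORMATS` and `unit_roundoff` are made up for this example, and the values of $t$ (24, 53, 113) are the significand precisions implied by the table's $u = 2^{-t}$ column.

```python
from fractions import Fraction
import sys

# Significand precision t in bits (implicit leading bit included), per the table.
FORMATS = {"single": 24, "double": 53, "quadruple": 113}

def unit_roundoff(t):
    """Return u = 2**(-t) exactly as a rational number."""
    return Fraction(1, 2**t)

for name, t in FORMATS.items():
    u = unit_roundoff(t)
    print(f"{name:10s} t = {t:3d}   u = 2^-{t} ~ {float(u):.1e}")

# Python floats are IEEE doubles, so machine epsilon should equal 2u.
double_eps_matches = (2 * float(unit_roundoff(53)) == sys.float_info.epsilon)
```

Note that machine epsilon as reported by most environments is $2u$, the spacing of the numbers just above 1, which is why the check compares against `2 * u`.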

Extended and Mixed Precision BLAS (2002)

Part of new standard developed by BLAS Technical Forum.
Extra precision used internally; input and output arguments remain single or double.
Extra precision is anything at least 1.5 times as accurate as double precision and wider than 80 bits.
Counterparts of selected level 1, 2 and 3 BLAS routines. Extra input argument specifies precision of internal computations.
Reference implementation uses double-double format, giving extra precision of about 106 bits.
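
To give a feel for the double-double idea mentioned above, here is a minimal sketch of summation with an unevaluated (hi, lo) pair of doubles, built on Knuth's error-free TwoSum. It is not the XBLAS reference code; the function names are hypothetical and the renormalisation is the simplest possible one.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, err) with s = fl(a + b) and a + b = s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def dd_add(acc, y):
    """Add the double y to the double-double accumulator acc = (hi, lo)."""
    hi, lo = acc
    s, e = two_sum(hi, y)
    e += lo
    return two_sum(s, e)          # renormalise so that |lo| is small relative to hi

def dd_sum(values):
    """Sum doubles using a double-double accumulator (roughly 106 bits internally)."""
    acc = (0.0, 0.0)
    for v in values:
        acc = dd_add(acc, v)
    return acc

def naive_sum(values):
    """Plain left-to-right summation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total

data = [1e16, 1.0, -1e16]
hi, lo = dd_sum(data)
plain = naive_sum(data)
print("double-double:", hi, " plain double:", plain)
```

As in the extended precision BLAS, the inputs and the returned `hi` are ordinary doubles; only the internal accumulation carries extra precision.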

Need for Higher Precision

Bailey, Simon, Barton & Fouts, Floating Point Arithmetic in Future Supercomputers, Internat. J. Supercomputer Appl. 3, 86–90, 1989.
Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics and Dynamics, Appl. Math. Comput. 218, 10106–10121, 2012.

Long-time simulations.
Resolving small-scale phenomena.
Large-scale simulations.

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algs precision-independent?

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend on precision.
Condition estimation to detect nearly singular matrices:

>> n = 14; A = gallery('lotkin',n); x = A\randn(n,1);
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 2.761711e-18.

Iterative refinement.
Equilibration.

But all “rely only on LAPACK DLAMCH”.

Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

Padé degree chosen to achieve accuracy $u$.
Padé coeffs and algorithm parameters need rederiving for a different $u$. Logic may change.
expm, logm need changing for smaller $u$.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.

Scalar elementary functions.
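
The dependence of algorithm parameters on $u$ can be seen in a much simpler setting than Padé approximation. The sketch below uses a truncated Taylor series for the scalar exponential instead (a deliberate simplification, not what expm does) and finds the smallest degree whose leading truncation term is below $u$. The function name `taylor_degree` and the choice $\theta = 1$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial

def taylor_degree(theta, u):
    """Smallest m with theta**(m+1) / (m+1)! <= u.

    theta bounds |x|; the quantity tested is the leading term of the
    truncation error of the degree-m Taylor polynomial of exp(x).
    """
    m = 0
    while theta ** (m + 1) / factorial(m + 1) > u:
        m += 1
    return m

theta = Fraction(1)
degrees = {}
for digits in (16, 32, 64, 128):
    u = Fraction(1, 10**digits)
    degrees[digits] = taylor_degree(theta, u)
    print(f"u = 1e-{digits}: degree {degrees[digits]}")
```

A degree that is hard-coded for double precision is therefore too small at higher precision, which is the point the slide makes about rederiving parameters.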

Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration, Nakatsukasa, Bai & Gygi (2010):

$X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ $(m \ge n)$,

$$\begin{bmatrix} \sqrt{c_k}\,X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

Cubically convergent, with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
Forms basis of new spectral divide & conquer algs for symmetric eigenproblem and SVD (H & Nakatsukasa, 2012).

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

$u$ | $10^{-16}$ | $10^{-32}$ | $10^{-64}$ | $10^{-128}$
Scaled Newton | 9 | 11 | 13 | 15
QDWH | 6 | 7 | 8 | 9

Increasing the precision makes little difference to the cost in flops.
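
The slow growth of the QDWH iteration count can be explored with a scalar model: each singular value $\sigma$ of $X_k$ is mapped to $\sigma(a_k + b_k\sigma^2)/(1 + c_k\sigma^2)$, so it is enough to track the lower bound $\ell_k$ on the singular values. The weight formulas below are taken from the Nakatsukasa, Bai & Gygi paper rather than from the slides, and the function names, the clamping of $\ell$ at 1 and the stopping tolerance are assumptions of this sketch.

```python
import math

def qdwh_weights(l):
    """Dynamic weights (a, b, c) for a lower bound l in (0, 1] on the singular values."""
    d = (4.0 * (1.0 - l * l) / l**4) ** (1.0 / 3.0)
    s = math.sqrt(1.0 + d)
    a = s + 0.5 * math.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l * l) / (l * l * s))
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return a, b, c

def qdwh_iterations(kappa, tol, max_iter=50):
    """Count scalar QDWH steps until the lower bound l satisfies 1 - l <= tol."""
    l = 1.0 / kappa
    for k in range(1, max_iter + 1):
        a, b, c = qdwh_weights(l)
        # Clamp at 1 so rounding cannot push l above 1 (which would make d negative).
        l = min(1.0, l * (a + b * l * l) / (1.0 + c * l * l))
        if 1.0 - l <= tol:
            return k
    raise RuntimeError("no convergence within max_iter steps")

steps = qdwh_iterations(kappa=1e16, tol=1e-15)
print("scalar QDWH steps for kappa = 1e16:", steps)
```

Because the model runs in double precision it can only probe the $u = 10^{-16}$ column of the table; the other columns would need a multiprecision type.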

Derivative Approximations

Fréchet derivative of $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$$L_f(A,E) \approx \frac{f(A+hE) - f(A)}{h} \quad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( u\,\|f(A)\| / \|E\|^2 \right)^{1/2}$.

If $f : \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H, 2010),

$$L_f(A,E) \approx \operatorname{Im} \frac{f(A+ihE)}{h} \quad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.
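
A scalar analogue shows why the complex step tolerates a tiny $h$ while the forward difference does not. The sketch below differentiates $\exp$ at $x = 1$; it stands in for the matrix case only by analogy, and the helper names are invented for this example.

```python
import cmath
import math

def forward_difference(f, x, h):
    """(f(x + h) - f(x)) / h: subtractive cancellation limits how small h can be."""
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    """Im f(x + ih) / h: no subtraction, so h can be taken extremely small."""
    return f(complex(x, h)).imag / h

x = 1.0
exact = math.exp(x)                               # d/dx exp(x) = exp(x)
fd = forward_difference(math.exp, x, 1e-8)
cs = complex_step(cmath.exp, x, 1e-100)

fd_err = abs(fd - exact)
cs_err = abs(cs - exact)
print(f"forward difference error: {fd_err:.1e}")
print(f"complex step error:       {cs_err:.1e}")
```

The complex step requires that `f` be evaluated in complex arithmetic without taking absolute values or conjugates, which is the same caveat that applies in the matrix setting.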

Mixed Precision

Even if we have higher precision, we might not need to use it everywhere.

Iterative refinement for $Ax = b$.
Iterative refinement for e'value problems.
Polynomial root finding (Boyd, this workshop).
Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
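
Iterative refinement for $Ax = b$ is the classic mixed precision pattern: solve in the working precision, compute the residual in higher precision, and correct. The sketch below uses double precision for the solves and exact rational arithmetic as a stand-in for the higher precision residual. The Hilbert test matrix, the function names and the number of steps are all choices made for this illustration, and the factorization is recomputed at every step purely to keep the code short.

```python
from fractions import Fraction

def lu_solve(A, b):
    """Solve A x = b in double precision by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):                          # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def refine(A_exact, b_exact, steps=3):
    """Return the iterates of iterative refinement with exactly computed residuals."""
    A = [[float(a) for a in row] for row in A_exact]
    x = lu_solve(A, [float(v) for v in b_exact])
    history = [x[:]]
    for _ in range(steps):
        # Residual r = b - A x evaluated exactly (the "higher precision" part).
        r = [bi - sum(aij * Fraction(xj) for aij, xj in zip(row, x))
             for row, bi in zip(A_exact, b_exact)]
        d = lu_solve(A, [float(ri) for ri in r])          # correction in double
        x = [xi + di for xi, di in zip(x, d)]
        history.append(x[:])
    return history

n = 6
H = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]  # Hilbert matrix
b = [sum(row) for row in H]                                          # exact solution: all ones
errors = [max(abs(xi - 1.0) for xi in x) for x in refine(H, b)]
for k, err in enumerate(errors):
    print(f"step {k}: max error {err:.1e}")
```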

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref Settings for Chebfun

 eps - Relative tolerance used in construction and subsequent operations.
       Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g., integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): “approximate diagonalization”:

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d); tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).
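
The approximate diagonalization idea can be tried on the smallest non-diagonalizable example, a $2\times 2$ Jordan block, where the eigendecomposition of the perturbed matrix has a closed form. Everything in the sketch below (the test matrix, the helpers `eig2` and `fun_via_perturbed_eig`, the fixed random seed) is an assumption made for illustration; it mirrors the MATLAB lines above only in spirit, with $d \approx 16$ digits and tol $= 10^{-8}$.

```python
import cmath
import math
import random

def eig2(M):
    """Eigenvalues and eigenvector matrix of a 2x2 matrix.

    Assumes distinct eigenvalues and M[0][1] != 0, so that (b, lam - a)
    is an eigenvector for each eigenvalue lam.
    """
    (a, b), (c, d) = M
    mean = (a + d) / 2
    root = cmath.sqrt(((a - d) / 2) ** 2 + b * c)
    lams = (mean + root, mean - root)
    V = [[b, b], [lams[0] - a, lams[1] - a]]      # columns are the eigenvectors
    return lams, V

def fun_via_perturbed_eig(f, A, tol, rng):
    """Approximate f(A) as V f(D) V^{-1}, where A + tol*E = V D V^{-1}, E Gaussian."""
    Ap = [[A[i][j] + tol * rng.gauss(0.0, 1.0) for j in range(2)] for i in range(2)]
    lams, V = eig2(Ap)
    f1, f2 = f(lams[0]), f(lams[1])
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    Vinv = [[V[1][1] / det, -V[0][1] / det], [-V[1][0] / det, V[0][0] / det]]
    VD = [[V[0][0] * f1, V[0][1] * f2], [V[1][0] * f1, V[1][1] * f2]]
    return [[sum(VD[i][k] * Vinv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1.0, 1.0], [0.0, 1.0]]                      # Jordan block: not diagonalizable
e = math.e
exact = [[e, e], [0.0, e]]                        # exp(A) in closed form
F = fun_via_perturbed_eig(cmath.exp, A, 1e-8, random.Random(0))
rel_err = max(abs(F[i][j] - exact[i][j]) for i in range(2) for j in range(2)) / e
print(f"relative error of exp(A) via perturbed eig: {rel_err:.1e}")
```

With tol $= 10^{-8}$ the observed error should be of roughly the same order as the perturbation, in line with the $10^{-d/2}$ estimate quoted above.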

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x,y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22 3.39e-22 3.39e-21 8.67e-20
1.39e-18 4.36e-18 5.30e-18 5.83e-18
1.45e-17 3.76e-17 3.76e-17 4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem. If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem. Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x,y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \quad \theta_\infty(x,y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
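
Both theorems are easy to check in IEEE double precision, where $t = 53$ in the slide's convention $u = 2^{-t}$. The sketch below evaluates the scalar relative difference exactly with rationals and reproduces the two-component example; the helper names are hypothetical.

```python
from fractions import Fraction

T = 53                                   # double precision significand bits
U = Fraction(1, 2**T)                    # unit roundoff u = 2**(-53)

def rel_diff(x, y):
    """Exact value of |x - y| / |x| for doubles x != 0 and y."""
    return abs(Fraction(x) - Fraction(y)) / abs(Fraction(x))

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf, computed in double."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

# The bound is attained when x is a power of 2 and y is its predecessor.
attained = rel_diff(2.0, 2.0 - 2.0**-52)

# Neighbours just above 1 are twice as far apart in relative terms.
above_one = rel_diff(1.0, 1.0 + 2.0**-52)

# The slide's example: the vectors differ, yet theta is far below u,
# because they agree in the component of largest magnitude.
x = [1.0, 1e-22]
y = [1.0, 2e-22]
theta = theta_inf(x, y)
print(attained == U, above_one == 2 * U, theta)
```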

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t | y
20 | 262537412640768744.00
25 | 262537412640768744.0000000
30 | 262537412640768743.999999999999

Is the last digit before the decimal point 4?

t | y
35 | 262537412640768743.99999999999925007
40 | 262537412640768743.9999999999992500725972

So no, it's 3!
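
The table can be regenerated with Python's `decimal` module. The sketch below is an assumption-laden stand-in for whatever produced the slide: it evaluates $e^{\pi\sqrt{163}}$ with some guard digits and then rounds to $t$ significant digits, and it obtains $\pi$ from the series recipe given in the `decimal` documentation. The names `pi_decimal` and `ramanujan` and the default of 15 guard digits are choices made here.

```python
from decimal import Decimal, localcontext

def pi_decimal():
    """pi to the current context precision (series recipe from the decimal docs)."""
    with localcontext() as ctx:
        ctx.prec += 2                     # extra digits for intermediate steps
        three = Decimal(3)
        lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
        while s != lasts:
            lasts = s
            n, na = n + na, na + 8
            d, da = d + da, da + 32
            t = (t * n) / d
            s += t
    return +s                             # unary plus rounds to the caller's precision

def ramanujan(t, guard=15):
    """exp(pi * sqrt(163)) rounded to t significant digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard              # work with guard digits
        y = (pi_decimal() * Decimal(163).sqrt()).exp()
        ctx.prec = t
        return +y                         # final rounding to t digits

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

The value is extremely close to an integer, so every precision up to about 30 digits rounds to something that looks like 262537412640768744, and only the later rows reveal the digit 3.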

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \quad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error in the computed $y$ plotted against $t$ for $t = 10$ to $40$; the error axis is logarithmic, from $10^{-14}$ to $10^{-4}$.]
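
Since the plot did not survive extraction, the data behind it can be approximated by emulating $t$-bit arithmetic in double precision. The sketch below rounds every intermediate result to $t$ significant bits and compares against a plain double precision evaluation. The rounding helper `fl`, the order of operations and the use of the double result as the reference are assumptions, so the numbers need not match the original figure exactly.

```python
import math

def fl(z, t):
    """Round the double z to t significant bits (nearest, ties to even), t <= 53."""
    if z == 0.0:
        return 0.0
    m, e = math.frexp(z)                   # z = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0**t), e - t)

def evaluate(t):
    """y = x + a*sin(b*x) with x = 1/7, a = 1e-8, b = 2**24, rounding each step to t bits."""
    x = fl(1.0 / 7.0, t)
    a = fl(1e-8, t)
    b = 2.0**24                            # a power of 2, exact at any precision
    s = fl(math.sin(fl(b * x, t)), t)
    return fl(x + fl(a * s, t), t)

reference = 1.0 / 7.0 + 1e-8 * math.sin(2.0**24 / 7.0)
errors = {t: abs(evaluate(t) - reference) for t in range(10, 41)}
for t in range(10, 41, 5):
    print(f"t = {t:2d}   error = {errors[t]:.2e}")
```

Printing the full `errors` table makes it possible to check, for this emulation, whether the error decreases steadily as $t$ grows.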

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit: 0 : 1 : 255
16-bit: 0 : $3.9 \times 10^{-3}$ : 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.
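
The rounding model $a \leftarrow \mathrm{round}(f(a))$ can be made concrete with a single edit followed by its inverse. The sketch below applies a gamma curve and then undoes it, storing the intermediate result either with 256 or with 65536 levels, and counts how many of the 256 original values come back unchanged. The curve, the function name `round_trip` and the survivor count as a measure are illustrative assumptions; the sketch says nothing about whether the lost levels are visible, which is the point of the controversy noted above.

```python
def round_trip(levels, gamma=0.5):
    """Apply a gamma curve and its inverse to every 8-bit value.

    The intermediate image is stored with `levels` grey levels; the return
    value is how many of the 256 original values are recovered exactly.
    """
    top = levels - 1
    survivors = 0
    for a in range(256):
        mid = round(top * (a / 255) ** gamma)             # edit, rounded to working depth
        back = round(255 * (mid / top) ** (1 / gamma))    # inverse edit, rounded to 8-bit
        survivors += (back == a)
    return survivors

kept_8 = round_trip(256)       # intermediate stored in 8 bits
kept_16 = round_trip(65536)    # intermediate stored in 16 bits
print("values recovered with 8-bit intermediate: ", kept_8)
print("values recovered with 16-bit intermediate:", kept_16)
```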

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

Need for Higher Precision

Bailey Simon Barton amp Fouts Floating PointArithmetic in Future Supercomputers Internat JSupercomputer Appl 3 86ndash90 1989Bailey Barrio amp Borwein High-Precision ComputationMathematical Physics and Dynamics Appl MathComput 218 10106ndash10121 2012

Long-time simulationsResolving small-scale phenomenaLarge-scale simulations

University of Manchester Nick Higham Higher Precision Arithmetic 11 33

Going to Higher Precision

If we have quadruple or higher precision, what do we need to do to modify existing algorithms?

To what extent are existing algorithms precision-independent?

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend on precision.
Condition estimation to detect nearly singular matrices:

>> A = gallery('lotkin',14); x = A\randn(14,1);
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 2.761711e-18.

Iterative refinement.
Equilibration.

But all "rely only on LAPACK DLAMCH".

Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

Padé degree chosen to achieve accuracy u.
Padé coeffs and algorithm parameters need rederiving for a different u. Logic may change.
expm, logm need changing for smaller u.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian A also need higher order approximations deriving.

Scalar elementary functions.

Iterative Algorithms

QR-based Dynamically Weighted Halley (QDWH) iteration, Nakatsukasa, Bai & Gygi (2010):

$X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\,X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

Cubically convergent, with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
Forms basis of new spectral divide & conquer algs for symmetric eigenproblem and SVD (H & Nakatsukasa, 2012).
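The QDWH recurrence can be sketched in MATLAB as follows. This is a minimal illustration and not the authors' implementation: the formulas for the weights a_k, b_k, c_k and for the lower bound l_k are quoted from Nakatsukasa, Bai & Gygi (2010) as recalled here, alpha and l_0 are computed with norm and cond purely for convenience (a practical code would use cheap estimates), and the test matrix, the cube-root stopping tolerance and all variable names are assumptions. The tolerance is derived from eps rather than hard-coded, which is the kind of change a precision-independent code needs.

```matlab
% Sketch of the QR-based dynamically weighted Halley (QDWH) iteration
% for the unitary polar factor U of A = U*H. Illustrative only.
A = [4 1 0; 1 3 1; 0 1 2; 1 0 1];        % assumed 4-by-3 full-rank test matrix
[m, n] = size(A);

alpha = norm(A);                          % alpha = ||A||_2 (an estimate would suffice)
X = A / alpha;                            % X_0 = A/alpha
l = 1 / cond(A);                          % lower bound on smallest singular value of X_0
tol = eps(class(A))^(1/3);                % assumed tolerance, motivated by cubic convergence

for k = 1:20
    % Dynamically chosen weights (formulas assumed from the QDWH paper).
    d = (4 * (1 - l^2) / l^4)^(1/3);
    a = sqrt(1 + d) + 0.5 * sqrt(8 - 4*d + 8 * (2 - l^2) / (l^2 * sqrt(1 + d)));
    b = (a - 1)^2 / 4;
    c = a + b - 1;

    % QR factorization of the stacked matrix [sqrt(c)*X; I] = [Q1; Q2]*R.
    [Q, R] = qr([sqrt(c) * X; eye(n)], 0);
    Q1 = Q(1:m, :);
    Q2 = Q(m+1:end, :);

    Xnew = (b / c) * X + (1 / sqrt(c)) * (a - b / c) * (Q1 * Q2');
    delta = norm(Xnew - X, 'fro');
    X = Xnew;

    % Update the lower bound; clamp at 1 so rounding cannot make 1 - l^2 negative.
    l = min(1, l * (a + b * l^2) / (1 + c * l^2));

    if delta <= tol
        break
    end
end

U = X;                                    % approximate unitary polar factor
H = (U' * A + (U' * A)') / 2;             % symmetrized Hermitian factor
iters = k;
```

Only `tol` refers to the working precision here, so the same loop could in principle be run on a higher precision type that overloads `qr`, `norm` and `eps`.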

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

u               10^-16   10^-32   10^-64   10^-128
Scaled Newton      9       11       13       15
QDWH               6        7        8        9

Increasing the precision makes little difference to the cost in flops.

Derivative Approximations

Fréchet derivative of $f: \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$$L_f(A,E) \approx \frac{f(A+hE) - f(A)}{h} \qquad \text{(forward difference)}$$

with error $O(h)$ and $h_{\mathrm{opt}} = \left( u\,\|f(A)\| / \|E\|^2 \right)^{1/2}$.

If $f: \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H, 2010),

$$L_f(A,E) \approx \operatorname{Im} \frac{f(A+ihE)}{h} \qquad \text{(complex step)}$$

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g. $10^{-100}$.
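The two approximations can be compared on a small example. The sketch below takes f = expm and, as a reference value, uses the block identity expm([A E; 0 A]) = [expm(A) L(A,E); 0 expm(A)]. The test matrices, the use of the Frobenius norm in h_opt, and all variable names are assumptions made for illustration.

```matlab
% Forward difference versus complex step for the Frechet derivative of expm.
A = magic(4) / 34;                 % assumed real test matrix
E = ones(4) / 4 + eye(4);          % assumed real direction matrix
n = size(A, 1);

% Reference value: the (1,2) block of expm([A E; 0 A]) is L_exp(A,E).
B = expm([A E; zeros(n) A]);
Lref = B(1:n, n+1:2*n);

% Forward difference with h_opt = (u*||f(A)||/||E||^2)^(1/2).
u = eps / 2;
h_fd = sqrt(u * norm(expm(A), 'fro')) / norm(E, 'fro');
L_fd = (expm(A + h_fd * E) - expm(A)) / h_fd;

% Complex step: no subtraction of nearby quantities, so h can be tiny.
h_cs = 1e-100;
L_cs = imag(expm(A + 1i * h_cs * E)) / h_cs;

err_fd = norm(L_fd - Lref, 'fro') / norm(Lref, 'fro');
err_cs = norm(L_cs - Lref, 'fro') / norm(Lref, 'fro');
fprintf('forward difference: %.2e, complex step: %.2e\n', err_fd, err_cs);
```

The forward difference is limited to roughly half the working digits however h is chosen, so its attainable accuracy improves with the precision, whereas the complex step is not limited by cancellation.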

Mixed Precision

Even if we have higher precision we might not need to use it everywhere.

Iterative refinement for Ax = b.
Iterative refinement for eigenvalue problems.
Polynomial root finding (Boyd, this workshop).
Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
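As an illustration of the first item, here is a sketch of iterative refinement for Ax = b in which single precision plays the role of the working precision and double precision the role of the higher precision. The test matrix, the iteration limit and the stopping test are assumptions, not taken from the talk.

```matlab
% Mixed precision iterative refinement: factorize once in single precision,
% compute residuals and accumulate the solution in double precision.
n = 50;
A = toeplitz([4 -1 zeros(1, n-2)]);          % assumed well-conditioned test matrix
xtrue = ones(n, 1);
b = A * xtrue;

[L, U, P] = lu(single(A));                   % the O(n^3) work, in low precision

x = double(U \ (L \ (P * single(b))));       % initial single precision solve
for k = 1:10
    r = b - A * x;                           % residual in double precision
    d = double(U \ (L \ (P * single(r))));   % correction from the single LU factors
    x = x + d;                               % update in double precision
    if norm(d, inf) <= eps * norm(x, inf)
        break
    end
end

relerr = norm(x - xtrue, inf) / norm(xtrue, inf);
fprintf('refinement steps: %d, relative error: %.2e\n', k, relerr);
```

Only the residual and the update use the higher precision, at O(n^2) cost per step.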

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
chebfunpref Settings for Chebfun.

eps - Relative tolerance used in construction and subsequent operations.
Factory value is 2^-52 (Matlab's factory value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products, and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;
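The hard-coded 1e-15 in tests like these is tuned to double precision. The sketch below does not use Chebfun at all. It uses single versus double as a stand-in for double versus a higher precision, and compares a fixed tolerance with one scaled by eps of the class in use. The test quantity sqrt(2)^2 - 2 and the factor 10 are arbitrary choices for illustration.

```matlab
% Compare a hard-coded tolerance with one scaled by the working precision.
classes = {'double', 'single'};
pass_fixed  = false(1, 2);
pass_scaled = false(1, 2);
for j = 1:2
    two = cast(2, classes{j});
    s = sqrt(two);
    err = abs(s * s - two);                              % rounding error in sqrt(2)^2
    pass_fixed(j)  = err < 1e-15;                        % tolerance tuned for double
    pass_scaled(j) = err < 10 * eps(classes{j}) * two;   % tolerance tied to the precision in use
end
disp(pass_fixed);
disp(pass_scaled);
```

In IEEE arithmetic the fixed tolerance passes in double and fails in single, while the scaled tolerance passes in both. In the other direction, a fixed 1e-15 would be far too lenient for a quadruple precision test.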

Unstable Algorithms in Higher Precision

For unstable algorithms where error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$.
Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel. err $\approx 10^{-d/2}$.

H & Relton (in progress).
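The double precision snippet can be tried on a matrix that is not diagonalizable. In the sketch below, the Jordan block test matrix, the choice fun = @exp and the comparison against expm are assumptions added for illustration. The perturbation is random, so the printed error varies from run to run.

```matlab
% Approximate diagonalization in double precision, following the snippet above.
n = 4;
J = 0.5 * eye(n) + diag(ones(n-1, 1), 1);   % Jordan block: not diagonalizable
A = J / norm(J);                             % scale so that ||A||_2 = 1
fun = @exp;

tol = 1e-8;                                  % about sqrt(u) in double precision
[V, D] = eig(A + tol * randn(n));            % random perturbation splits the eigenvalue
F = V * diag(feval(fun, diag(D))) / V;       % f(A) via the perturbed eigensystem

Fref = expm(A);                              % reference value
relerr = norm(F - Fref, 'fro') / norm(Fref, 'fro');
fprintf('relative error: %.2e\n', relerr);
```

With d digits available, the same recipe with tol = 10^(-d/2) is what the VPA version above does.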

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x,y) = \frac{\|x-y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22  3.39e-22  3.39e-21  8.67e-20
1.39e-18  4.36e-18  5.30e-18  5.83e-18
1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x-y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x,y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \qquad \theta_\infty(x,y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles.
See cure in Dingle & H (2011).
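Both statements can be checked directly in IEEE double precision, where u = 2^-53 = eps/2. The sketch below uses x = 2 and its predecessor 2 - eps as one pair attaining the bound (a choice made here, not taken from the slide), then reproduces the two-vector example.

```matlab
u = eps / 2;                        % unit roundoff for IEEE double, 2^-53

% First theorem: the bound |x - y|/|x| >= u is attained by x = 2 and its
% floating point predecessor y = 2 - eps.
x1 = 2;
y1 = 2 - eps;
attained = (abs(x1 - y1) / abs(x1) == u);

% Example from the slide: the largest components agree and the small ones
% differ by 100%, yet the normwise relative error is far below u.
x = [1; 1e-22];
y = [1; 2e-22];
theta = norm(x - y, inf) / norm(x, inf);
componentwise = max(abs(x - y) ./ abs(x));
fprintf('theta = %.2e, u = %.2e, componentwise = %.2f\n', theta, u, componentwise);
```

A normwise relative error below u therefore does not mean the computation was done in higher precision. It means the components of largest magnitude agree exactly.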

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at t digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it's 3!

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \qquad x = 1/7,\quad a = 10^{-8},\quad b = 2^{24}.$$

[Figure: error (log scale, $10^{-14}$ to $10^{-4}$) plotted against t, for t from 10 to 40.]
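An experiment of this kind can be imitated in double precision by rounding every intermediate result to t significant bits. The sketch below is one way to do that and is not the code behind the figure: the helper fl, the places where rounding is applied, and the use of the double precision value as reference are all assumptions.

```matlab
% Simulated t-bit evaluation of y = x + a*sin(b*x).
% fl(z,t) rounds a nonzero double z to t significant bits.
fl = @(z, t) round(z .* 2.^(t - floor(log2(abs(z))) - 1)) ...
             .* 2.^(floor(log2(abs(z))) + 1 - t);

x0 = 1/7;
a = 1e-8;
b = 2^24;
yref = x0 + a * sin(b * x0);        % double precision reference value

ts = 10:40;
err = zeros(size(ts));
for k = 1:numel(ts)
    t = ts(k);
    x = fl(x0, t);                  % round the data
    s = fl(sin(fl(b * x, t)), t);   % round the argument and the sine
    y = fl(x + fl(a * s, t), t);    % round the product and the sum
    err(k) = abs(y - yref) / abs(yref);
end

disp([ts; err]');
```

Whether the error decreases at every step can be inspected with `any(diff(err) > 0)`, which is true whenever some increase in t made the error larger.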

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
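The effect of the rounding in each editing op can be seen on a toy example: apply a tone curve and then its inverse to every 8-bit level, rounding after each step on either an 8-bit or a 16-bit scale. The power-law curve with exponent 2.2 and the variable names are assumptions chosen for illustration. The sketch measures rounding only and says nothing about perceived quality.

```matlab
% Round-trip a tone curve and its inverse, rounding to 8-bit or 16-bit levels.
a = 0:255;                                   % every 8-bit level
g = 2.2;                                     % assumed curve exponent

% 8-bit pipeline: round to integers in 0..255 after each operation.
a1 = round(255 * (a / 255).^g);
a2 = round(255 * (a1 / 255).^(1/g));

% 16-bit pipeline: same operations on integers in 0..65535, then back to 8-bit.
b  = a * 257;                                % 255*257 = 65535
b1 = round(65535 * (b / 65535).^g);
b2 = round(65535 * (b1 / 65535).^(1/g));
c2 = round(b2 / 257);

levels8  = numel(unique(a2));                % distinct levels surviving in 8-bit
levels16 = numel(unique(c2));                % distinct levels surviving via 16-bit
maxerr8  = max(abs(a2 - a));
maxerr16 = max(abs(c2 - a));
fprintf('levels kept: %d (8-bit), %d (16-bit)\n', levels8, levels16);
fprintf('max error:   %d (8-bit), %d (16-bit)\n', maxerr8, maxerr16);
```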

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit:  0, 1, ..., 255
16-bit: 0, 3.9 × 10^-3, ..., 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (CMATLAB2012btoolboxmatlabmatfunsingleeig)  % single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleeig)  % double method
CMATLAB2012btoolboxsymbolicsymbolicsymeigm  % sym method
dmatlabMultiprecision_Computing_Toolboxmpm  % mp method
CMATLAB2012btoolboxdistcompparallelcodistributedeigm  % codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayeigm  % gpuArray method
dmatlabchebfunlinopeigm  % linop method
dmatlabchebfunchebopeigm  % chebop method

>> which -all qr
built-in (CMATLAB2012btoolboxmatlabmatfunsingleqr)  % single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleqr)  % double method
dmatlabMultiprecision_Computing_Toolboxmpm  % mp method
CMATLAB2012btoolboxdistcompparallelcodistributedqrm  % codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayqrm  % gpuArray method
dmatlabchebfunchebfunqrm  % chebfun method

>> which -all toeplitz
CMATLAB2012btoolboxmatlabelmattoeplitzm

Conclusions amp Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Going to Higher Precision

If we have quadruple or higher precision what do we needto do to modify existing algorithms

To what extent are existing algs precision-independent

University of Manchester Nick Higham Higher Precision Arithmetic 13 33

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend onprecisionCondition estimation to detect nearly singular matrices

gtgt A = gallery(rsquolotkinrsquo14) x = Arandn(n1)Warning Matrix is close to singular or badly scaledResults may be inaccurate RCOND = 2761711e-18

Iterative refinementEquilibration

But all ldquorely only on LAPACK DLAMCHrdquo

University of Manchester Nick Higham Higher Precision Arithmetic 14 33

Matrix Functions

(Inverse) scaling and squaring-type algorithms for eAlog(A) cos(A) At use Padeacute approximants

Padeacute degree chosen to achieve accuracy uPadeacute coeffs and algorithm parameters need rederivingfor a different u Logic may changeexpm logm need changing for smaller u

Methods based on best Linfin approximations to eA forHermitian A also need higher order approximationsderiving

Scalar elementary functions

University of Manchester Nick Higham Higher Precision Arithmetic 15 33

Iterative Algorithms

QR-based Dynamically Weighted Halley IterationNakatsukasa Bai amp Gygi (2010)X0 = Aα isin Cmtimesn (m ge n)[radic

ck Xk

I

]=

[Q1

Q2

]R

Xk+1 =bk

ckXk +

1radicck

(ak minus

bk

ck

)Q1Qlowast2

Cubically convergent with limkrarrinfin Xk = U whereA = UH is a polar decompositionForms basis of new spectral divide amp conquer algs forsymm eirsquoproblem and SVD (H amp Nakatsukasa 2012)

University of Manchester Nick Higham Higher Precision Arithmetic 16 33

Maximum Number of Iterations

If κ2(A) le uminus1

u 10minus16 10minus32 10minus64 10minus128

Scaled Newton 9 11 13 15QDWH 6 7 8 9

Increasing the precision makes little difference to thecost in flops

University of Manchester Nick Higham Higher Precision Arithmetic 17 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision we might not need to useit everywhere

Iterative refinement for Ax = bIterative refinement for ersquovalue problemsPolynomial root finding (Boyd this workshop)Petschow Quintana-Ortiacute amp Bientinesi (2012) usingquad precision to improve orthogonality of doubleprecision MRRR

University of Manchester Nick Higham Higher Precision Arithmetic 19 33

Chebfun

How would Chebfun need to change with higher precisionarithmetic

gtgt help chebfunprefchebfunpref Settings for Chebfun

eps - Relative tolerance used in construction andsubsequent operationsFactory value is 2^-52 (Matlabrsquos factoryvalue of machine epsilon)

University of Manchester Nick Higham Higher Precision Arithmetic 20 33

Chebfun

function pass = complexrotation This code makes sure a few things are ok if you make them complex eg integration norms inner products and singular values LNT 20 May 2008

f = chebfun((x) exp(x))fi = chebfun((x) 1iexp(x))g = chebfun(rsquo1(2-x)rsquo)gi = chebfun(rsquo1i(2-x)rsquo)A = [f g]

pass(1) = (sum(fi)==1isum(f))pass(2) = norm(fi)==norm(f)pass(3) = abs((frsquog)-((1if)rsquo(1ig))) lt 1e-15pass(4) = (norm(giinf)-norm(ginf)) lt 1e-15pass(5) = (norm(fi1)-norm(f1)) lt 1e-15pass(6) = norm(svd(A) - svd(1iA)) lt 1e-15

University of Manchester Nick Higham Higher Precision Arithmetic 21 33

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available increase theprecision accordingly

University of Manchester Nick Higham Higher Precision Arithmetic 22 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

Tiny Relative Errors

Normwise relative errors

θinfin(x y) =x minus yinfinxinfin

from a numerical experiment

132e-22 339e-22 339e-21 867e-20139e-18 436e-18 530e-18 583e-18145e-17 376e-17 376e-17 427e-17

What precision was used

University of Manchester Nick Higham Higher Precision Arithmetic 24 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Another Example

Consider the evaluation in precision u = 2minust of

y = x + a sin(bx) x = 17 a = 10minus8 b = 224

10 15 20 25 30 35 4010

minus14

10minus13

10minus12

10minus11

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

t

error

University of Manchester Nick Higham Higher Precision Arithmetic 27 33

MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

High Precision in MATLAB

VPA arithmetic in Symbolic Math ToolboxAdvanpix Multiprecision Computing Toolbox forMATLAB

University of Manchester Nick Higham Higher Precision Arithmetic 31 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

Conclusions amp Future Directions

Need to prepare for quad precision in hardwareArbitrary precision arithmetic in software increasinglyuseful Are we fully exploiting it

High precision veryuseful for computingldquoexact answersrdquo fortesting algs

NLA group

University of Manchester Nick Higham Higher Precision Arithmetic 33 33

References I

Multiprecision Computing Toolbox for MATLABAdvanpix LLC Tokyo|httpwwwadvanpixcom|

A H Al-Mohy and N J HighamThe complex step approximation to the Freacutechetderivative of a matrix functionNumer Algorithms 53(1)133ndash148 2010

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IInt J High Performance Computing Applications 16(1)1ndash111 2002

University of Manchester Nick Higham Higher Precision Arithmetic 1 4

References II

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IIInt J High Performance Computing Applications 16(2)115ndash199 2002

E B DaviesApproximate diagonalizationSIAM J Matrix Anal Appl 29(4)1051ndash1064 2007

University of Manchester Nick Higham Higher Precision Arithmetic 2 4

References III

N J Dingle and N J HighamReducing the influence of tiny normwise relative errorson performance profilesMIMS EPrint 201190 Manchester Institute forMathematical Sciences The University of ManchesterUK Nov 201111 pp

W KahanComputer benchmarks versus accuracyDraft manuscript June 1994

University of Manchester Nick Higham Higher Precision Arithmetic 3 4

References IV

X S Li J W Demmel D H Bailey G Henry Y HidaJ Iskandar W Kahan S Y Kang A Kapur M CMartin B J Thompson T Tung and D J YooDesign implementation and testing of extended andmixed precision BLASACM Trans Math Software 28(2)152ndash205 June 2002

D MargulisPhotoshop LAB Color The Canyon Conundrum andOther Adventures in the Most Powerful ColorspacePeachpit Press Berkeley CA USA 2006ISBN 0-321-35678-0xviii+366 pp

University of Manchester Nick Higham Higher Precision Arithmetic 4 4

  • Appendix

Direct Matrix Factorizations

Threshold pivoting for sparse matrices may depend onprecisionCondition estimation to detect nearly singular matrices

gtgt A = gallery(rsquolotkinrsquo14) x = Arandn(n1)Warning Matrix is close to singular or badly scaledResults may be inaccurate RCOND = 2761711e-18

Iterative refinementEquilibration

But all ldquorely only on LAPACK DLAMCHrdquo

University of Manchester Nick Higham Higher Precision Arithmetic 14 33

Matrix Functions

(Inverse) scaling and squaring-type algorithms for eAlog(A) cos(A) At use Padeacute approximants

Padeacute degree chosen to achieve accuracy uPadeacute coeffs and algorithm parameters need rederivingfor a different u Logic may changeexpm logm need changing for smaller u

Methods based on best Linfin approximations to eA forHermitian A also need higher order approximationsderiving

Scalar elementary functions

University of Manchester Nick Higham Higher Precision Arithmetic 15 33

Iterative Algorithms

QR-based Dynamically Weighted Halley IterationNakatsukasa Bai amp Gygi (2010)X0 = Aα isin Cmtimesn (m ge n)[radic

ck Xk

I

]=

[Q1

Q2

]R

Xk+1 =bk

ckXk +

1radicck

(ak minus

bk

ck

)Q1Qlowast2

Cubically convergent with limkrarrinfin Xk = U whereA = UH is a polar decompositionForms basis of new spectral divide amp conquer algs forsymm eirsquoproblem and SVD (H amp Nakatsukasa 2012)

University of Manchester Nick Higham Higher Precision Arithmetic 16 33

Maximum Number of Iterations

If κ2(A) le uminus1

u 10minus16 10minus32 10minus64 10minus128

Scaled Newton 9 11 13 15QDWH 6 7 8 9

Increasing the precision makes little difference to thecost in flops

University of Manchester Nick Higham Higher Precision Arithmetic 17 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision we might not need to useit everywhere

Iterative refinement for Ax = bIterative refinement for ersquovalue problemsPolynomial root finding (Boyd this workshop)Petschow Quintana-Ortiacute amp Bientinesi (2012) usingquad precision to improve orthogonality of doubleprecision MRRR

University of Manchester Nick Higham Higher Precision Arithmetic 19 33

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref Settings for Chebfun.

 eps - Relative tolerance used in construction and
       subsequent operations.
       Factory value is 2^-52 (Matlab's factory
       value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products, and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;
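
One reading of this slide is that hard-coded tolerances such as 1e-15, and exact `==` tests, are tied to double precision. The Python sketch below illustrates a tolerance derived from the working precision instead. It is not Chebfun code: the helper names `unit_roundoff` and `close`, the factor of 10, and the 50-digit setting are all hypothetical choices for the example.

```python
from decimal import Decimal, localcontext

def unit_roundoff(digits):
    """u = (1/2) * 10**(1 - digits) for decimal arithmetic with `digits` digits."""
    return Decimal(5) * Decimal(10) ** (-digits)

def close(a, b, digits, factor=10):
    """Relative comparison with a tolerance tied to the working precision."""
    tol = factor * unit_roundoff(digits)
    return abs(a - b) <= tol * max(abs(a), abs(b))

digits = 50
with localcontext() as ctx:
    ctx.prec = digits
    one = Decimal(1) / Decimal(3) * 3   # 0.999...9: off by one unit in the last place
    bad = one + Decimal("1e-30")        # error far above the 50-digit roundoff level

    fixed_tol_accepts_bad = abs(bad - 1) < Decimal("1e-15")
    scaled_tol_accepts_one = close(one, Decimal(1), digits)
    scaled_tol_accepts_bad = close(bad, Decimal(1), digits)

print(fixed_tol_accepts_bad, scaled_tol_accepts_one, scaled_tol_accepts_bad)
```

A fixed 1e-15 threshold accepts a result that is wrong by twenty orders of magnitude relative to the 50-digit roundoff level, while the scaled test rejects it.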


Unstable Algorithms in Higher Precision

For an unstable algorithm where error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel. err. $\approx 10^{-d/2}$.

H. & Relton (in progress).


Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H. (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \qquad \theta_\infty(x, y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H. (2011).
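
Both theorems can be checked numerically in IEEE double precision, for which $t = 53$ in the convention $u = 2^{-t}$ used here. The Python sketch below is only an illustration: the scalar pair (2 and its predecessor) is an assumed example of a pair attaining the bound, and the vectors are those of the slide.

```python
u = 2.0 ** -53  # unit roundoff u = 2**-t for IEEE double precision, t = 53

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    return max(abs(a - b) for a, b in zip(x, y)) / max(abs(a) for a in x)

# Scalar case: x = 2 and its predecessor y = 2 - 2**-52 attain |x - y| / |x| = u.
x_s, y_s = 2.0, 2.0 - 2.0 ** -52
scalar_rel = abs(x_s - y_s) / abs(x_s)

# Vector case from the slide: the largest components agree exactly, so the
# normwise relative error can fall far below u even though x != y.
x = [1.0, 1e-22]
y = [1.0, 2e-22]
theta = theta_inf(x, y)

print("scalar relative difference equals u:", scalar_rel == u)
print("theta_inf(x, y) =", theta, " below u:", theta < u)
```

A value such as 1e-22 in a table of double precision results therefore says that the dominant components are identical, not that the computation was carried out in higher precision.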


Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t     y
20    262537412640768744.00
25    262537412640768744.0000000
30    262537412640768743.999999999999

Is the last digit before the decimal point 4?

t     y
35    262537412640768743.99999999999925007
40    262537412640768743.9999999999992500725972

So no, it's 3!
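
The experiment can be repeated with Python's standard `decimal` module, as sketched below. The `pi` helper is adapted from the recipe in the `decimal` documentation, and `ramanujan` is a hypothetical name. Because every intermediate result is rounded to t digits, the trailing digits at a given t need not match the table above, which came from a different system.

```python
from decimal import Decimal, getcontext, localcontext

def pi():
    """pi to the current context precision (recipe from the decimal module docs)."""
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s  # unary plus rounds to the restored precision

def ramanujan(t):
    """exp(pi * sqrt(163)) with all operations carried out at t significant digits."""
    with localcontext() as ctx:
        ctx.prec = t
        return (pi() * Decimal(163).sqrt()).exp()

for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```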


Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error (log scale, $10^{-14}$ to $10^{-4}$) plotted against $t$ for $10 \le t \le 40$.]
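
A rough way to tabulate this error against t is to simulate t-bit arithmetic by rounding each double precision operation to t significant bits. The Python sketch below does this under stated assumptions: the helper `fl` is hypothetical, the sine is computed in double precision and then rounded (adequate only for t well below 53), and the reference value is itself a double precision evaluation, accurate to roughly 1e-16.

```python
import math

def fl(v, t):
    """Round v to t significant bits (round to nearest, ties to even)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)  # v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2 ** t), e - t)

def y_at_precision(t):
    """Evaluate y = x + a*sin(b*x) with every operation rounded to t bits."""
    x = fl(1.0 / 7.0, t)
    a = fl(1e-8, t)
    b = 2.0 ** 24  # a power of two, exactly representable at any t
    s = fl(math.sin(fl(b * x, t)), t)
    return fl(x + fl(a * s, t), t)

# Reference computed in double precision; good enough for comparing t <= 40.
reference = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)

errors = {t: abs(y_at_precision(t) - reference) for t in range(10, 41)}
for t in range(10, 41, 5):
    print(t, f"{errors[t]:.2e}")
```

Printing the full dictionary rather than every fifth entry shows how the error behaves between the sampled values of t.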


Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?


16-bit vs 8-bit editing

SLR cameras generate images at 12–14 bits internally.

8-bit:   0, 1, ..., 255
16-bit:  0, 3.9 × 10^{-3}, ..., 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.
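
The rounding in $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ can be made concrete with a small Python sketch. The editing operation here, a square-root tone curve followed by its inverse, is a hypothetical choice, and the 16-bit version maps levels 0..255 onto 0..65535, which matches the spacing 255/65535 ≈ 3.9 × 10^{-3} on the 0..255 scale. The sketch only counts which levels survive a round trip; it says nothing about whether the difference is visible.

```python
def round_trip_8bit(a):
    """Tone curve and its inverse, rounding to 8-bit levels after each op."""
    v = round(255 * (a / 255) ** 0.5)
    return round(255 * (v / 255) ** 2)

def round_trip_16bit(a):
    """Same two ops with 16-bit intermediates, converted back to 8 bits at the end."""
    v = a * 257  # maps 0..255 onto 0..65535 exactly
    v = round(65535 * (v / 65535) ** 0.5)
    v = round(65535 * (v / 65535) ** 2)
    return round(v / 257)

levels = range(256)
kept_8 = sum(round_trip_8bit(a) == a for a in levels)
kept_16 = sum(round_trip_16bit(a) == a for a in levels)

print(f"levels recovered exactly: 8-bit {kept_8}/256, 16-bit {kept_16}/256")
```

In the 8-bit pipeline the curve compresses the brighter levels into fewer output values, so some inputs cannot be recovered; the 16-bit intermediates leave enough room for all 256 levels to survive this particular pair of operations.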


High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double method
CMATLAB2012btoolboxsymbolicsymbolicsymeigm sym method
dmatlabMultiprecision_Computing_Toolboxmpm mp method
CMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray method
dmatlabchebfunlinopeigm linop method
dmatlabchebfunchebopeigm chebop method

>> which -all qr
built-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single method
built-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double method
dmatlabMultiprecision_Computing_Toolboxmpm mp method
CMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed method
CMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray method
dmatlabchebfunchebfunqrm chebfun method

>> which -all toeplitz
CMATLAB2012btoolboxmatlabelmattoeplitzm


Conclusions amp Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.


Matrix Functions

(Inverse) scaling and squaring-type algorithms for $e^A$, $\log(A)$, $\cos(A)$, $A^t$ use Padé approximants.

Padé degree chosen to achieve accuracy $u$.
Padé coeffs and algorithm parameters need re-deriving for a different $u$. Logic may change.
expm, logm need changing for smaller $u$.

Methods based on best $L_\infty$ approximations to $e^A$ for Hermitian $A$ also need higher order approximations deriving.

Scalar elementary functions.
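
To see why the Padé degree depends on $u$, one can look at the scalar case. The Python sketch below uses the leading term of the error of the $[m/m]$ Padé approximant $r_m$ to $e^x$, whose coefficient has magnitude $(m!)^2 / ((2m)!\,(2m+1)!)$, and finds the smallest $m$ for which that term falls below $u$ at $|x| = 1$. This is a crude heuristic assumed for illustration; it is not the backward error analysis on which expm-type algorithms are based, and the evaluation point $|x| = 1$ is an arbitrary choice.

```python
from fractions import Fraction
from math import factorial

def leading_error_coeff(m):
    """Magnitude of the x**(2m+1) coefficient in exp(x) - r_m(x), r_m the [m/m] Pade approximant."""
    return Fraction(factorial(m) ** 2, factorial(2 * m) * factorial(2 * m + 1))

def smallest_degree(u, x=Fraction(1)):
    """Smallest m whose leading error term at |x| is below u (heuristic only)."""
    m = 1
    while leading_error_coeff(m) * x ** (2 * m + 1) >= u:
        m += 1
    return m

# The target accuracies are the unit roundoffs listed elsewhere in these slides.
exponents = (16, 32, 64, 128)
degrees = [smallest_degree(Fraction(1, 10 ** k)) for k in exponents]

for k, m in zip(exponents, degrees):
    print(f"u = 1e-{k}: degree m = {m}")
```

The required degree grows as $u$ shrinks, so coefficients and parameters tuned for double precision cannot simply be reused.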


Iterative Algorithms

QR-based Dynamically Weighted Halley Iteration (QDWH)
Nakatsukasa, Bai & Gygi (2010)

$X_0 = A/\alpha \in \mathbb{C}^{m\times n}$ ($m \ge n$),

$$\begin{bmatrix} \sqrt{c_k}\, X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R, \qquad X_{k+1} = \frac{b_k}{c_k} X_k + \frac{1}{\sqrt{c_k}} \left( a_k - \frac{b_k}{c_k} \right) Q_1 Q_2^*.$$

Cubically convergent, with $\lim_{k\to\infty} X_k = U$, where $A = UH$ is a polar decomposition.
Forms basis of new spectral divide & conquer algs for symm. eig'problem and SVD (H. & Nakatsukasa, 2012).


  • Appendix

Iterative Algorithms

QR-based Dynamically Weighted Halley IterationNakatsukasa Bai amp Gygi (2010)X0 = Aα isin Cmtimesn (m ge n)[radic

ck Xk

I

]=

[Q1

Q2

]R

Xk+1 =bk

ckXk +

1radicck

(ak minus

bk

ck

)Q1Qlowast2

Cubically convergent with limkrarrinfin Xk = U whereA = UH is a polar decompositionForms basis of new spectral divide amp conquer algs forsymm eirsquoproblem and SVD (H amp Nakatsukasa 2012)

University of Manchester Nick Higham Higher Precision Arithmetic 16 33

Maximum Number of Iterations

If κ2(A) le uminus1

u 10minus16 10minus32 10minus64 10minus128

Scaled Newton 9 11 13 15QDWH 6 7 8 9

Increasing the precision makes little difference to thecost in flops

University of Manchester Nick Higham Higher Precision Arithmetic 17 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision we might not need to useit everywhere

Iterative refinement for Ax = bIterative refinement for ersquovalue problemsPolynomial root finding (Boyd this workshop)Petschow Quintana-Ortiacute amp Bientinesi (2012) usingquad precision to improve orthogonality of doubleprecision MRRR

University of Manchester Nick Higham Higher Precision Arithmetic 19 33

Chebfun

How would Chebfun need to change with higher precisionarithmetic

gtgt help chebfunprefchebfunpref Settings for Chebfun

eps - Relative tolerance used in construction andsubsequent operationsFactory value is 2^-52 (Matlabrsquos factoryvalue of machine epsilon)

University of Manchester Nick Higham Higher Precision Arithmetic 20 33

Chebfun

function pass = complexrotation This code makes sure a few things are ok if you make them complex eg integration norms inner products and singular values LNT 20 May 2008

f = chebfun((x) exp(x))fi = chebfun((x) 1iexp(x))g = chebfun(rsquo1(2-x)rsquo)gi = chebfun(rsquo1i(2-x)rsquo)A = [f g]

pass(1) = (sum(fi)==1isum(f))pass(2) = norm(fi)==norm(f)pass(3) = abs((frsquog)-((1if)rsquo(1ig))) lt 1e-15pass(4) = (norm(giinf)-norm(ginf)) lt 1e-15pass(5) = (norm(fi1)-norm(f1)) lt 1e-15pass(6) = norm(svd(A) - svd(1iA)) lt 1e-15

University of Manchester Nick Higham Higher Precision Arithmetic 21 33

Unstable Algorithms in Higher Precision

For unstable alg where error bounds available increase theprecision accordingly

University of Manchester Nick Higham Higher Precision Arithmetic 22 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

Tiny Relative Errors

Normwise relative errors

θinfin(x y) =x minus yinfinxinfin

from a numerical experiment

132e-22 339e-22 339e-21 867e-20139e-18 436e-18 530e-18 583e-18145e-17 376e-17 376e-17 427e-17

What precision was used

University of Manchester Nick Higham Higher Precision Arithmetic 24 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Another Example

Consider the evaluation in precision u = 2minust of

y = x + a sin(bx) x = 17 a = 10minus8 b = 224

10 15 20 25 30 35 4010

minus14

10minus13

10minus12

10minus11

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

t

error

University of Manchester Nick Higham Higher Precision Arithmetic 27 33

MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit: 0 : 1 : 255
16-bit: 0 : 3.9 × 10^{-3} : 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.


High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.


>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m


Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algs.

NLA group


References

Advanpix LLC, Tokyo. Multiprecision Computing Toolbox for MATLAB. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Maximum Number of Iterations

If $\kappa_2(A) \le u^{-1}$:

u               10^{-16}   10^{-32}   10^{-64}   10^{-128}
Scaled Newton   9          11         13         15
QDWH            6          7          8          9

Increasing the precision makes little difference to the cost in flops.


Derivative Approximations

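Fréchet derivative of $f\colon \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$:

$L_f(A,E) \approx \dfrac{f(A+hE) - f(A)}{h}$ (forward difference)

with error $O(h)$ and $h_{\mathrm{opt}} = \left( \dfrac{u\,\|f(A)\|}{\|E\|^2} \right)^{1/2}$.

If $f\colon \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n}$ and $A, E \in \mathbb{R}^{n\times n}$ (Al-Mohy & H., 2010),

$L_f(A,E) \approx \operatorname{Im} \dfrac{f(A+ihE)}{h}$ (complex step)

has error $O(h^2)$ and can take $h$ arbitrarily small, e.g., $10^{-100}$.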
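The two approximations can be compared for $f = \exp$ with a short script. The test matrices, the use of the 1-norm in $h_{\mathrm{opt}}$, and the reference value (the (1,2) block of `expm([A E; 0 A])`, which equals $L_{\exp}(A,E)$) are choices made here for illustration and are not taken from the slide.

```matlab
A = [0.1 0.2; 0.3 0.4];            % small real, nonsymmetric test matrix
E = [1 0; 1 1];                    % direction of the derivative
n = size(A, 1);

% Reference: the (1,2) block of expm([A E; 0 A]) is L_exp(A,E).
X = expm([A E; zeros(n) A]);
L = X(1:n, n+1:2*n);

% Forward difference with h = (u*||f(A)||/||E||^2)^(1/2).
u    = eps/2;
h_fd = sqrt(u * norm(expm(A), 1)) / norm(E, 1);
L_fd = (expm(A + h_fd*E) - expm(A)) / h_fd;

% Complex step with a tiny h: no subtraction, hence no cancellation.
h_cs = 1e-100;
L_cs = imag(expm(A + 1i*h_cs*E)) / h_cs;

err_fd = norm(L_fd - L, 1) / norm(L, 1);
err_cs = norm(L_cs - L, 1) / norm(L, 1);
fprintf('forward difference: %.1e\n', err_fd);
fprintf('complex step:       %.1e\n', err_cs);
```

The complex step relies on the function being evaluated without complex arithmetic of its own, so it should not be assumed to work for every matrix function code.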


Mixed Precision

Even if we have higher precision, we might not need to use it everywhere.

Iterative refinement for $Ax = b$.
Iterative refinement for e'value problems.
Polynomial root finding (Boyd, this workshop).
Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
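As an illustration of the first item, here is a minimal sketch of iterative refinement for $Ax = b$ using the single/double pair available in base MATLAB: the LU factorization is done once in single precision, while residuals and updates are computed in double. The test matrix, right-hand side and iteration count are arbitrary choices.

```matlab
A = toeplitz([4 1 0 0 0]);           % small, well-conditioned test matrix
b = (1:5)';
xref = A \ b;                        % double precision solve, used as reference

[L, U, p] = lu(single(A), 'vector'); % factorize once, in single precision
x = double(U \ (L \ single(b(p))));  % initial solve, about single precision accuracy

err = zeros(1, 4);
err(1) = norm(x - xref, inf) / norm(xref, inf);
for k = 1:3
    r = b - A*x;                            % residual in double precision
    d = double(U \ (L \ single(r(p))));     % correction from the single LU factors
    x = x + d;
    err(k+1) = norm(x - xref, inf) / norm(xref, inf);
end
disp(err)
```

The same pattern applies with double as the low precision and a higher precision for the residual, given an arithmetic that provides it.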


Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref Settings for Chebfun.

 eps - Relative tolerance used in construction and
       subsequent operations.
       Factory value is 2^-52 (Matlab's factory
       value of machine epsilon).


Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g., integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;
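One reading of this slide is that tolerances such as 1e-15 are tied to double precision and would have to follow the working precision instead. A minimal sketch of that idea for the built-in floating point classes is below; `reltol` is a hypothetical helper and is not part of Chebfun.

```matlab
% reltol is a hypothetical helper: a tolerance that scales with the
% precision of the data instead of being fixed at 1e-15.
reltol = @(x) 10 * eps(class(x));

xs = single(pi);
xd = pi;
fprintf('single: %.2e   double: %.2e\n', reltol(xs), reltol(xd));
```

A multiprecision type would need to supply its own equivalent of `eps`; how it does so is toolbox-specific and is not assumed here.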


Unstable Algorithms in Higher Precision

For unstable alg where error bounds available, increase the precision accordingly.


Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H. & Relton (in progress).
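The double precision version can be tried on a matrix that is not diagonalizable. The sketch below uses a 2-by-2 Jordan block with $f = \exp$ and compares against `expm`; the choice of matrix, function and error measure are assumptions made for illustration.

```matlab
A   = [0 1; 0 0];                    % Jordan block: not diagonalizable, norm(A) = 1
fun = @exp;
n   = size(A, 1);

tol = 1e-8;                          % about sqrt(u) in double precision
[V, D] = eig(A + tol*randn(n));      % perturbed matrix is diagonalizable with probability 1
F = V * diag(feval(fun, diag(D))) / V;

relerr = norm(F - expm(A), 1) / norm(expm(A), 1);
fprintf('relative error = %.1e\n', relerr);
```

Because the perturbation is random, the reported error changes from run to run, and `F` may carry a small imaginary part when the perturbed eigenvalues are complex.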


Tiny Relative Errors

Normwise relative errors

$\theta_\infty(x,y) = \dfrac{\|x-y\|_\infty}{\|x\|_\infty}$

from a numerical experiment:

1.32e-22  3.39e-22  3.39e-21  8.67e-20
1.39e-18  4.36e-18  5.30e-18  5.83e-18
1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?


Base $\beta = 2$, $u = 2^{-t}$. Dingle & H. (2011).

Theorem
If $x \ne 0$ and $y$ are distinct normalized flpt numbers then $|x-y|/|x| \ge u$, and this lower bound is attainable.

Theorem
Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized flpt numbers. If $\theta_\infty(x,y) < u$ then $x_k = y_k$ for all $k$ s.t. $|x_k| = \|x\|_\infty$.

$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}$, $y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}$, $\theta_\infty(x,y) = 10^{-22}$.

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H. (2011).
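Both statements are easy to check in double precision, where $t = 53$ and $u = 2^{-53}$. The following sketch uses the vectors from the slide; the variable names are arbitrary.

```matlab
u = eps/2;                           % unit roundoff in double precision, 2^-53

% Scalar case: 1 and its predecessor 1 - u are adjacent doubles.
x = 1;
y = 1 - u;
gap = abs(x - y) / abs(x);           % equals u, so the bound is attained

% Vector case from the slide: the largest components agree exactly.
xv = [1; 1e-22];
yv = [1; 2e-22];
theta = norm(xv - yv, inf) / norm(xv, inf);

fprintf('gap = %.3e   u = %.3e   theta = %.3e\n', gap, u, theta);
```

So a normwise relative error far below $u$ does not indicate that higher precision was used; it can arise in double precision when the largest components coincide and only small components differ.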





Derivative Approximations

Freacutechet derivative of f Cntimesn rarr Cntimesn

Lf (AE) asymp f (A + hE)minus f (A)h

forward difference

with error O(h) and hopt =(

uf (A)E2

)12

If f Rntimesn rarr Rntimesn and AE isin Rntimesn (Al-Mohy amp H 2010)

Lf (AE) asymp Imf (A + ihE)

hcomplex step

has error O(h2) and can take h arbitrarily small eg 10minus100

University of Manchester Nick Higham Higher Precision Arithmetic 18 33

Mixed Precision

Even if we have higher precision, we might not need to use it everywhere.

• Iterative refinement for Ax = b.
• Iterative refinement for e'value problems.
• Polynomial root finding (Boyd, this workshop).
• Petschow, Quintana-Ortí & Bientinesi (2012): using quad precision to improve orthogonality of double precision MRRR.
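As an illustration of the first item, here is a minimal Python sketch of iterative refinement for Ax = b in which only the residual is formed in higher precision. Exact rational arithmetic (`fractions.Fraction`) stands in for the higher precision, the 3 × 3 test matrix is an arbitrary choice, and the solver is re-run at each step instead of reusing a factorization; all of these are assumptions made for brevity, not details from the talk.

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with partial pivoting; works for float or Fraction entries."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def residual_exact(A, b, x):
    """r = b - A*x evaluated exactly, then rounded once to working precision."""
    return [float(Fraction(bi) - sum(Fraction(a) * Fraction(xj) for a, xj in zip(row, x)))
            for row, bi in zip(A, b)]

def refine(A, b, steps=3):
    """Working-precision solve followed by `steps` refinement corrections."""
    x = solve(A, b)
    history = [x]
    for _ in range(steps):
        d = solve(A, residual_exact(A, b, x))         # correction in working precision
        x = [xi + di for xi, di in zip(x, d)]
        history.append(x)
    return history

def rel_error(x, x_exact):
    """Infinity-norm relative error against the exact rational solution."""
    num = max(abs(Fraction(xi) - ei) for xi, ei in zip(x, x_exact))
    return float(num / max(abs(ei) for ei in x_exact))

A = [[1.0, 1/2, 1/3], [1/2, 1/3, 1/4], [1/3, 1/4, 1/5]]   # Hilbert-like test matrix
b = [1.0, 1.0, 1.0]
x_exact = solve([[Fraction(a) for a in row] for row in A], [Fraction(v) for v in b])
errors = [rel_error(x, x_exact) for x in refine(A, b)]
print(errors)
```

Everything except the residual runs in double precision, which is the sense in which the higher precision is not needed everywhere.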

Chebfun

How would Chebfun need to change with higher precision arithmetic?

>> help chebfunpref
 chebfunpref Settings for Chebfun.

 eps - Relative tolerance used in construction and
       subsequent operations.
       Factory value is 2^-52 (Matlab's factory
       value of machine epsilon).

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products, and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For unstable alg. where error bounds available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization".

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H. & Relton (in progress).

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22  3.39e-22  3.39e-21  8.67e-20
1.39e-18  4.36e-18  5.30e-18  5.83e-18
1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$ (Dingle & H., 2011).

Theorem
If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem
Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \qquad \theta_\infty(x, y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H. (2011).
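Both statements can be checked numerically for IEEE double precision, for which t = 53 and u = 2^-53 in the convention above. The sketch below is illustrative only: it uses Python's exact rational arithmetic to avoid any rounding in the check itself, and the helper names are assumptions.

```python
import math
from fractions import Fraction

U = Fraction(1, 2**53)   # unit roundoff of IEEE double with u = 2^-t, t = 53

def rel_diff(x, y):
    """|x - y| / |x| computed exactly in rational arithmetic."""
    return abs(Fraction(x) - Fraction(y)) / abs(Fraction(x))

def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf, computed exactly."""
    num = max(abs(Fraction(a) - Fraction(b)) for a, b in zip(x, y))
    den = max(abs(Fraction(a)) for a in x)
    return num / den

# Scalar bound: it is attained when x is a power of two and y is its predecessor.
x = 2.0
y = math.nextafter(x, 0.0)       # largest double below 2 (needs Python 3.9+)
print(rel_diff(x, y) == U)       # True: bound attained
print(rel_diff(y, x) > U)        # True: strictly above u the other way round

# Vector example from the slide: the largest components agree, the small ones differ.
xv = [1.0, 1e-22]
yv = [1.0, 2e-22]
print(float(theta_inf(xv, yv)))  # about 1e-22, far below u
```

A normwise relative error below u therefore says only that the largest components coincide, which is how values such as 1e-22 can arise in a double precision experiment.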

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

 t   y
 20  262537412640768744.00
 25  262537412640768744.0000000
 30  262537412640768743.999999999999

Is the last digit before the decimal point 4?

 t   y
 35  262537412640768743.99999999999925007
 40  262537412640768743.9999999999992500725972

So no, it's 3.
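The table can be reproduced, under some assumptions, with Python's `decimal` module. The sketch below computes y once with 60 working digits (an arbitrary choice) and then rounds that value to t significant digits, which mimics a correctly rounded t-digit result; it is not necessarily how the numbers on the slide were produced. The series for pi follows the recipe in the Python `decimal` documentation.

```python
from decimal import Decimal, getcontext, localcontext

def pi_decimal():
    """pi to the current context precision (series recipe from the decimal docs)."""
    getcontext().prec += 2               # guard digits for the intermediate sums
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s                            # unary plus rounds to the restored precision

def ramanujan_constant(working_digits=60):
    """exp(pi * sqrt(163)) computed with `working_digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = working_digits
        return (pi_decimal() * Decimal(163).sqrt()).exp()

def round_to_digits(y, t):
    """Round y to t significant digits."""
    with localcontext() as ctx:
        ctx.prec = t
        return +y

y = ramanujan_constant()
for t in (20, 25, 30, 35, 40):
    print(t, round_to_digits(y, t))
```

At 20 and 25 digits the rounded value ends in 744 followed by zeros; only from 30 digits onwards does the integer part show as 743.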

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error versus t on a logarithmic scale; t from 10 to 40, error from 10^{-14} to 10^{-4}.]
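An experiment of this kind can be approximated in Python by rounding every operand and intermediate result to t significant bits. This is a sketch under stated assumptions: the helper `fl`, the order of the roundings, and the use of a double precision reference value are choices made here, so the numbers need not match the plotted ones.

```python
import math

def fl(x, t):
    """Round x to t significant bits (round to nearest), simulating unit roundoff 2**-t."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0**t), e - t)

def evaluate(t, x=1/7, a=1e-8, b=2.0**24):
    """Evaluate y = x + a*sin(b*x), rounding operands and results to t bits."""
    x, a = fl(x, t), fl(a, t)
    s = fl(math.sin(fl(b * x, t)), t)    # sin itself is computed in double, then rounded
    return fl(x + fl(a * s, t), t)

y_ref = 1/7 + 1e-8 * math.sin(2.0**24 / 7)   # double precision reference value
errors = {t: abs(evaluate(t) - y_ref) for t in range(10, 41)}

for t, err in errors.items():
    print(f"t = {t:2d}  error = {err:.2e}")

# Precisions at which one extra bit made the answer worse, if any.
print([t for t in range(11, 41) if errors[t] > errors[t - 1]])
```

The last line lists any precisions at which adding a bit increased the error; whether and where that happens depends on the details of the simulated arithmetic.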

Myth: Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit:  0 : 1 : 255
16-bit: 0 : 3.9 × 10^{-3} : 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.
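The effect of rounding after every editing operation can be sketched with a toy edit: halve every grey level, then double it again, rounding to the integer grid each time. The pair of operations and the helper names are invented for illustration and are not taken from the talk; the 16-bit path maps an 8-bit value v to 257·v, edits on the finer grid, and converts back at the end.

```python
import math

def quantize(value, levels):
    """Round to the nearest integer level and clip to [0, levels - 1]."""
    return min(levels - 1, max(0, math.floor(value + 0.5)))

def edit_round_trip(pixels, levels):
    """Halve then double every pixel, rounding to the integer grid after each operation."""
    darkened = [quantize(p * 0.5, levels) for p in pixels]
    return [quantize(p * 2.0, levels) for p in darkened]

original = list(range(256))                       # every 8-bit grey level

# Edit directly on the 8-bit grid.
out8 = edit_round_trip(original, 256)

# Edit on a 16-bit grid (8-bit value v maps to 257*v), then convert back to 8 bits.
out16_wide = edit_round_trip([257 * p for p in original], 65536)
out16 = [quantize(v / 257, 256) for v in out16_wide]

# Number of distinct output levels and largest change from the original.
print(len(set(out8)), max(abs(a - b) for a, b in zip(original, out8)))     # 129 1
print(len(set(out16)), max(abs(a - b) for a, b in zip(original, out16)))   # 256 0
```

The 8-bit path loses almost half of the grey levels while changing no pixel by more than one level, which is one reason a normwise error measure says little about visible quality.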

High Precision in MATLAB

• VPA arithmetic in Symbolic Math Toolbox.
• Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB\2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB\2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB\2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\@mp\mp.m  % mp method
C:\MATLAB\2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB\2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB\2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB\2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\@mp\mp.m  % mp method
C:\MATLAB\2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB\2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB\2012b\toolbox\matlab\elmat\toeplitz.m

Conclusions & Future Directions

• Need to prepare for quad precision in hardware.
• Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?
• High precision very useful for computing "exact answers" for testing algs.

NLA group

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.


University of Manchester Nick Higham Higher Precision Arithmetic 4 4

  • Appendix

Chebfun

function pass = complexrotation
% This code makes sure a few things are ok if you make them complex,
% e.g. integration, norms, inner products and singular values.
% LNT 20 May 2008

f = chebfun(@(x) exp(x));
fi = chebfun(@(x) 1i*exp(x));
g = chebfun('1./(2-x)');
gi = chebfun('1i./(2-x)');
A = [f g];

pass(1) = (sum(fi)==1i*sum(f));
pass(2) = norm(fi)==norm(f);
pass(3) = abs((f'*g)-((1i*f)'*(1i*g))) < 1e-15;
pass(4) = (norm(gi,inf)-norm(g,inf)) < 1e-15;
pass(5) = (norm(fi,1)-norm(f,1)) < 1e-15;
pass(6) = norm(svd(A) - svd(1i*A)) < 1e-15;

Unstable Algorithms in Higher Precision

For an unstable algorithm for which error bounds are available, increase the precision accordingly.

Randomization

$A \in \mathbb{C}^{n \times n}$ with $\|A\| = 1$. Davies (2007): "approximate diagonalization":

tol = 1e-8;
[V,D] = eig(A + tol*randn(n));
F = V*diag(feval(fun,diag(D)))/V;

Manchester usage in VPA:

digits(d)
tol = 10^(-d/2);
[V,D] = eig(vpa(A) + tol*vpa(rand(n)));
F = V*diag(feval(fun,diag(D)))/V;

Davies' analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).
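To see the double precision recipe in action, the following is a minimal sketch that applies it to a 2-by-2 Jordan block, a matrix that is not diagonalizable, and compares the result with MATLAB's expm. The test matrix, the choice fun = @exp, and the variable relerr are assumptions made for illustration; they are not taken from Davies (2007) or from the talk. The perturbation is random, so the printed error varies from run to run.

```matlab
% Approximate diagonalization of a defective matrix (illustrative sketch).
A = [1 1; 0 1];                  % Jordan block: not diagonalizable
A = A / norm(A);                 % scale so that norm(A) = 1
n = size(A, 1);
fun = @exp;                      % assumed test function, so f(A) = expm(A)

tol = 1e-8;                      % roughly sqrt(u) in double precision
[V, D] = eig(A + tol*randn(n));  % the perturbed matrix is diagonalizable in practice
F = V*diag(feval(fun, diag(D)))/V;

% Compare with the built-in matrix exponential.
relerr = norm(F - expm(A), 1) / norm(expm(A), 1);
fprintf('relative error = %.2e\n', relerr);
```

The error is expected to be of the order of tol rather than of the unit roundoff, which is the price paid for making the eigenvector matrix reasonably well conditioned.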

Tiny Relative Errors

Normwise relative errors

$\theta_\infty(x,y) = \dfrac{\|x - y\|_\infty}{\|x\|_\infty}$

from a numerical experiment:

1.32e-22  3.39e-22  3.39e-21  8.67e-20
1.39e-18  4.36e-18  5.30e-18  5.83e-18
1.45e-17  3.76e-17  3.76e-17  4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem

If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem

Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x,y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}$, $y = \begin{bmatrix} 1 \\ 2 \times 10^{-22} \end{bmatrix}$, $\theta_\infty(x,y) = 10^{-22}$.

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
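Both theorems can be checked directly in IEEE double precision, where $t = 53$ and $u = 2^{-53}$. The sketch below is illustrative only: the pair 2 and 2 - eps is one choice of numbers that attains the lower bound, and the variable names are assumptions.

```matlab
% Relative differences in IEEE double precision: beta = 2, t = 53, u = 2^(-53).
u = eps/2;                       % eps = 2^(-52) is the spacing of doubles in [1, 2)

% First theorem: distinct normalized numbers differ relatively by at least u.
x1 = 2;
y1 = 2 - eps;                    % the double immediately below 2
rel = abs(x1 - y1)/abs(x1);      % equals u, so the bound is attained

% Second theorem: theta_inf < u forces agreement in the largest components.
x = [1; 1e-22];
y = [1; 2e-22];
theta = norm(x - y, inf)/norm(x, inf);

fprintf('rel = %.3e, u = %.3e, theta = %.3e\n', rel, u, theta);
```

Here theta is far below u even though the vectors differ, because the difference lies entirely in a component that is negligible relative to the largest one.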

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t   y
20  262537412640768744.00
25  262537412640768744.0000000
30  262537412640768743.999999999999

Is the last digit before the decimal point 4?

t   y
35  262537412640768743.99999999999925007
40  262537412640768743.9999999999992500725972

So no, it's 3.
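A table of this kind can be generated with the VPA arithmetic of the Symbolic Math Toolbox, along the lines of the sketch below. It assumes the toolbox is installed; the loop and variable names are illustrative. Because vpa works with guard digits, the digits printed need not match the table character for character.

```matlab
% Evaluate exp(pi*sqrt(163)) at increasing working precision.
% Requires the Symbolic Math Toolbox (digits, vpa).
oldDigits = digits;              % remember the current setting
for t = [20 25 30 35 40]
    digits(t);                   % work with t significant decimal digits
    y = exp(vpa(pi)*sqrt(vpa(163)));
    fprintf('%2d  %s\n', t, char(y));
end
digits(oldDigits);               % restore the previous setting
```

The point of the experiment is that agreement between successive precisions (here t = 20 and t = 25) does not by itself show that the displayed digits are correct.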

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$y = x + a \sin(bx)$, $x = 1/7$, $a = 10^{-8}$, $b = 2^{24}$.

[Figure: error (log scale, from $10^{-14}$ to $10^{-4}$) plotted against $t$, for $t$ from 10 to 40.]
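An experiment of this kind can be imitated in plain MATLAB by rounding every intermediate result to $t$ significant bits. The sketch below is an assumption about how such an experiment might be set up, not the code behind the figure: the helper fl, the order of the roundings, and the use of the double precision value as the reference are all illustrative choices.

```matlab
% Simulate t-bit binary arithmetic by rounding every intermediate result.
% fl(z,t) rounds a nonzero z to t significant bits (round to nearest).
expo = @(z) floor(log2(abs(z))) + 1;          % e such that |z| lies in [2^(e-1), 2^e)
fl   = @(z,t) round(z .* 2.^(t - expo(z))) .* 2.^(expo(z) - t);

a = 1e-8;  b = 2^24;
yref = 1/7 + a*sin(b*(1/7));                  % double precision reference value

ts  = 10:40;
err = zeros(size(ts));
for k = 1:numel(ts)
    t = ts(k);
    x = fl(1/7, t);                           % stored value of x
    s = fl(sin(fl(b*x, t)), t);               % sin evaluated at the rounded argument
    y = fl(x + fl(fl(a, t)*s, t), t);         % rounded product, then rounded sum
    err(k) = abs(y - yref);
end
fprintf('%2d  %.2e\n', [ts; err]);            % one line per precision: t, error
```

Plotting err against ts on a logarithmic scale, for example with semilogy(ts, err), gives a picture that can be compared with the figure.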

Myth

Increasing the precision at which a computation is performed increases the accuracy of the answer.

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
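The effect of rounding after every editing operation can be illustrated by applying a curve and then its exact inverse to all 256 levels of one channel. The sketch below is a toy model under stated assumptions: the square-root curve, the scaling by 257 between 8-bit and 16-bit levels, and the variable names are illustrative and do not describe how any particular image editor works.

```matlab
% Round-trip a brightening curve and its inverse at 8-bit and 16-bit depth.
a  = (0:255)';                            % all 8-bit levels of one channel
f  = @(v, vmax) vmax*sqrt(v/vmax);        % assumed edit: a gamma 0.5 curve
fi = @(v, vmax) vmax*(v/vmax).^2;         % its exact inverse

% 8-bit workflow: round to integers in 0..255 after every operation.
b8    = round(f(a, 255));
back8 = round(fi(b8, 255));

% 16-bit workflow: promote to 0..65535, edit, and return to 8 bits at the end.
a16    = 257*a;                           % 255*257 = 65535
b16    = round(f(a16, 65535));
back16 = round(round(fi(b16, 65535))/257);

fprintf('levels changed by the 8-bit round trip:  %d\n', nnz(back8 ~= a));
fprintf('levels changed by the 16-bit round trip: %d\n', nnz(back16 ~= a));
```

In this model the 8-bit round trip merges neighbouring highlight levels, so some levels cannot be recovered, while the 16-bit round trip returns every level unchanged. Whether such differences are visible in a real photograph is a separate question.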

16-bit vs 8-bit editing

SLR cameras generate image at 12–14 bits internally.

8-bit: 0:1:255
16-bit: 0:$3.9 \times 10^{-3}$:255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m

Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algs.

NLA group

References I

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

References II

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

References III

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

References IV

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.


Unstable Algorithms in Higher Precision

For unstable alg where error bounds available increase theprecision accordingly

University of Manchester Nick Higham Higher Precision Arithmetic 22 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

Tiny Relative Errors

Normwise relative errors

θinfin(x y) =x minus yinfinxinfin

from a numerical experiment

132e-22 339e-22 339e-21 867e-20139e-18 436e-18 530e-18 583e-18145e-17 376e-17 376e-17 427e-17

What precision was used

University of Manchester Nick Higham Higher Precision Arithmetic 24 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Another Example

Consider the evaluation in precision u = 2minust of

y = x + a sin(bx) x = 17 a = 10minus8 b = 224

10 15 20 25 30 35 4010

minus14

10minus13

10minus12

10minus11

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

t

error

University of Manchester Nick Higham Higher Precision Arithmetic 27 33

MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

High Precision in MATLAB

VPA arithmetic in Symbolic Math ToolboxAdvanpix Multiprecision Computing Toolbox forMATLAB

University of Manchester Nick Higham Higher Precision Arithmetic 31 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

Conclusions amp Future Directions

Need to prepare for quad precision in hardwareArbitrary precision arithmetic in software increasinglyuseful Are we fully exploiting it

High precision veryuseful for computingldquoexact answersrdquo fortesting algs

NLA group

University of Manchester Nick Higham Higher Precision Arithmetic 33 33

References I

Multiprecision Computing Toolbox for MATLABAdvanpix LLC Tokyo|httpwwwadvanpixcom|

A H Al-Mohy and N J HighamThe complex step approximation to the Freacutechetderivative of a matrix functionNumer Algorithms 53(1)133ndash148 2010

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IInt J High Performance Computing Applications 16(1)1ndash111 2002

University of Manchester Nick Higham Higher Precision Arithmetic 1 4

References II

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IIInt J High Performance Computing Applications 16(2)115ndash199 2002

E B DaviesApproximate diagonalizationSIAM J Matrix Anal Appl 29(4)1051ndash1064 2007

University of Manchester Nick Higham Higher Precision Arithmetic 2 4

References III

N J Dingle and N J HighamReducing the influence of tiny normwise relative errorson performance profilesMIMS EPrint 201190 Manchester Institute forMathematical Sciences The University of ManchesterUK Nov 201111 pp

W KahanComputer benchmarks versus accuracyDraft manuscript June 1994

University of Manchester Nick Higham Higher Precision Arithmetic 3 4

References IV

X S Li J W Demmel D H Bailey G Henry Y HidaJ Iskandar W Kahan S Y Kang A Kapur M CMartin B J Thompson T Tung and D J YooDesign implementation and testing of extended andmixed precision BLASACM Trans Math Software 28(2)152ndash205 June 2002

D MargulisPhotoshop LAB Color The Canyon Conundrum andOther Adventures in the Most Powerful ColorspacePeachpit Press Berkeley CA USA 2006ISBN 0-321-35678-0xviii+366 pp

University of Manchester Nick Higham Higher Precision Arithmetic 4 4

  • Appendix

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

RandomizationA isin Cntimesn with A = 1Davies (2007) ldquoapproximate diagonalizationrdquo

tol = 1e-8[VD] = eig(A + tolrandn(n))F = Vdiag(feval(fundiag(D)))V

Manchester usage in VPA

digits(d)tol = 10^(-d2)[VD] = eig(vpa(A) + tolvpa( rand(n) ))F = Vdiag(feval(fundiag(D)))V

Daviesrsquo analysis suggests rel err asymp 10minusd2

H amp Relton (in progress)

University of Manchester Nick Higham Higher Precision Arithmetic 23 33

Tiny Relative Errors

Normwise relative errors

θinfin(x y) =x minus yinfinxinfin

from a numerical experiment

132e-22 339e-22 339e-21 867e-20139e-18 436e-18 530e-18 583e-18145e-17 376e-17 376e-17 427e-17

What precision was used

University of Manchester Nick Higham Higher Precision Arithmetic 24 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Another Example

Consider the evaluation in precision u = 2minust of

y = x + a sin(bx) x = 17 a = 10minus8 b = 224

10 15 20 25 30 35 4010

minus14

10minus13

10minus12

10minus11

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

t

error

University of Manchester Nick Higham Higher Precision Arithmetic 27 33

MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

High Precision in MATLAB

VPA arithmetic in Symbolic Math ToolboxAdvanpix Multiprecision Computing Toolbox forMATLAB

University of Manchester Nick Higham Higher Precision Arithmetic 31 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm

University of Manchester Nick Higham Higher Precision Arithmetic 32 33

Conclusions amp Future Directions

Need to prepare for quad precision in hardwareArbitrary precision arithmetic in software increasinglyuseful Are we fully exploiting it

High precision veryuseful for computingldquoexact answersrdquo fortesting algs

NLA group

University of Manchester Nick Higham Higher Precision Arithmetic 33 33

References

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Randomization

$A \in \mathbb{C}^{n\times n}$ with $\|A\| = 1$. Davies (2007): “approximate diagonalization”.

    tol = 1e-8;
    [V,D] = eig(A + tol*randn(n));      % perturb A, then diagonalize
    F = V*diag(feval(fun,diag(D)))/V;   % F approximates fun(A)

Manchester usage in VPA:

    digits(d);                          % work with d significant digits
    tol = 10^(-d/2);
    [V,D] = eig(vpa(A) + tol*vpa(rand(n)));
    F = V*diag(feval(fun,diag(D)))/V;

Davies’ analysis suggests rel err $\approx 10^{-d/2}$.

H & Relton (in progress).

Tiny Relative Errors

Normwise relative errors

$$\theta_\infty(x, y) = \frac{\|x - y\|_\infty}{\|x\|_\infty}$$

from a numerical experiment:

1.32e-22   3.39e-22   3.39e-21   8.67e-20
1.39e-18   4.36e-18   5.30e-18   5.83e-18
1.45e-17   3.76e-17   3.76e-17   4.27e-17

What precision was used?

Base $\beta = 2$, $u = 2^{-t}$. Dingle & H (2011).

Theorem. If $x \ne 0$ and $y$ are distinct normalized floating point numbers then $|x - y|/|x| \ge u$, and this lower bound is attainable.

Theorem. Let $0 \ne x, y \in \mathbb{R}^n$ be vectors of normalized floating point numbers. If $\theta_\infty(x, y) < u$ then $x_k = y_k$ for all $k$ such that $|x_k| = \|x\|_\infty$.

$$x = \begin{bmatrix} 1 \\ 10^{-22} \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 2\times 10^{-22} \end{bmatrix}, \qquad \theta_\infty(x, y) = 10^{-22}.$$

Tiny relative errors can corrupt performance profiles. See cure in Dingle & H (2011).
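Both theorems can be checked numerically in IEEE double precision, where $t = 53$ and $u = 2^{-53}$. The following is a minimal Python sketch, used here only for illustration: the helper name `theta_inf` is hypothetical, and `math.nextafter` needs Python 3.9 or later.

```python
import math

U = 2.0 ** -53  # unit roundoff u = 2^(-t) for IEEE double precision (t = 53)


def theta_inf(x, y):
    """Normwise relative error ||x - y||_inf / ||x||_inf."""
    num = max(abs(xi - yi) for xi, yi in zip(x, y))
    den = max(abs(xi) for xi in x)
    return num / den


# First theorem: the bound |x - y| / |x| >= u is attained at x = 2
# with y the next double below 2.
x = 2.0
y = math.nextafter(x, 0.0)
attained = abs(x - y) / abs(x)

# Slide example: the largest components agree exactly, so theta_inf
# measures only the difference in the tiny second component.
xv = [1.0, 1e-22]
yv = [1.0, 2e-22]
theta = theta_inf(xv, yv)

print("bound attained:", attained == U)
print("theta_inf =", theta, " below u:", theta < U)
```

A value of $\theta_\infty$ below $u$ therefore does not mean that a higher precision was used; it can only arise when the components of largest magnitude agree exactly.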

Increasing the Precision

$y = e^{\pi\sqrt{163}}$ evaluated at $t$ digit precision:

t    y
20   262537412640768744.00
25   262537412640768744.0000000
30   262537412640768743.999999999999

Is the last digit before the decimal point 4?

t    y
35   262537412640768743.99999999999925007
40   262537412640768743.9999999999992500725972

So no, it’s 3!
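The table can be reproduced with any arbitrary precision package. The sketch below uses Python's standard `decimal` module rather than MATLAB's VPA. The π routine is the series recipe given in the `decimal` documentation; the ten guard digits and the function name `ramanujan` are assumptions made for this illustration.

```python
from decimal import Decimal, getcontext, localcontext


def pi():
    """Compute pi to the current context precision (arcsin series recipe)."""
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s  # unary plus rounds to the restored precision


def ramanujan(t, guard=10):
    """Return exp(pi*sqrt(163)) rounded to t significant digits."""
    with localcontext() as ctx:
        ctx.prec = t + guard  # work with guard digits
        y = (pi() * Decimal(163).sqrt()).exp()
        ctx.prec = t
        return +y  # round the result to t digits


for t in (20, 25, 30, 35, 40):
    print(t, ramanujan(t))
```

The guard digits matter: the exponent is about 40, so an error in the argument is magnified roughly fortyfold in the relative error of the result, and working with exactly $t$ digits throughout would not be expected to give a correctly rounded $t$ digit answer.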

Another Example

Consider the evaluation in precision $u = 2^{-t}$ of

$$y = x + a\sin(bx), \qquad x = 1/7, \quad a = 10^{-8}, \quad b = 2^{24}.$$

[Figure: error plotted against t for t from 10 to 40, with a logarithmic error axis running from 10^{-14} to 10^{-4}.]

Myth: Increasing the precision at which a computation is performed increases the accuracy of the answer.
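One way to experiment with the example $y = x + a\sin(bx)$ without a multiprecision package is to simulate $t$-bit arithmetic by rounding every intermediate result to $t$ significant bits. The Python sketch below does that on top of IEEE double precision. The helper `fl`, the order of operations, and the use of the double precision value as the reference are assumptions of this illustration, not necessarily how the plot on the slide was produced.

```python
import math


def fl(v, t):
    """Round the double v to t significant bits (round to nearest)."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)  # v = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * 2.0 ** t), e - t)


def eval_y(t):
    """Evaluate y = x + a*sin(b*x) with every result rounded to t bits."""
    x = fl(1.0 / 7.0, t)
    a = fl(1e-8, t)
    b = 2.0 ** 24  # power of two: exactly representable for any t
    s = fl(math.sin(fl(b * x, t)), t)
    return fl(x + fl(a * s, t), t)


# Reference value computed in double precision (t = 53).
y_ref = 1.0 / 7.0 + 1e-8 * math.sin(2.0 ** 24 / 7.0)

# Relative error for each simulated precision t = 10, ..., 40.
errors = {t: abs(eval_y(t) - y_ref) / abs(y_ref) for t in range(10, 41)}

for t, err in errors.items():
    print(f"t = {t:2d}   relative error = {err:.2e}")
```

The printed table can be compared with the plot: what matters is not only the size of the errors but whether they shrink steadily as $t$ grows.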

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.

Every editing op (levels, curves, colour balance, …) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?

16-bit vs 8-bit editing

SLR cameras generate images at 12–14 bits internally.

8-bit:    0, 1, …, 255
16-bit:   0, $3.9\times 10^{-3}$, …, 255

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.
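The effect of rounding after each editing operation can be illustrated with a round trip: apply a tone curve and then its inverse, once with 8-bit intermediate storage and once with 16-bit. The Python sketch below is hypothetical in every detail (the exponent 1.5, the function names, and the choice of a curve followed by its inverse are assumptions made for illustration); it counts how many of the 256 input levels survive.

```python
GAMMA = 1.5  # assumed tone curve exponent, chosen only for illustration


def curve(v, top):
    """Apply v -> top*(v/top)**GAMMA and round to an integer level."""
    return round(top * (v / top) ** GAMMA)


def inverse_curve(v, top):
    """Apply the inverse curve v -> top*(v/top)**(1/GAMMA) and round."""
    return round(top * (v / top) ** (1.0 / GAMMA))


levels = list(range(256))

# 8-bit workflow: every intermediate result is rounded to 0..255.
out8 = [inverse_curve(curve(a, 255), 255) for a in levels]

# 16-bit workflow: convert to 0..65535, edit there, convert back at the end.
out16 = []
for a in levels:
    v = a * 257  # 255 * 257 = 65535, so the conversion is exact
    v = inverse_curve(curve(v, 65535), 65535)
    out16.append(round(v / 257))

print(len(set(out8)), "distinct levels after the 8-bit round trip")
print(len(set(out16)), "distinct levels after the 16-bit round trip")
```

The factor 257 corresponds to the 16-bit spacing of about $3.9\times 10^{-3}$ on the 0 to 255 scale. Counting surviving levels is only one possible measure, and it says nothing about whether the loss is visible in a real image, which is the point of the controversy.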

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.

Advanpix Multiprecision Computing Toolbox for MATLAB.

Base β = 2 u = 2minust Dingle amp H (2011)

Theorem

If x 6= 0 and y are distinct normalized flpt numbers then|x minus y ||x | ge u and this lower bound is attainable

Theorem

Let 0 6= x y isin Rn be vectors of normalized flpt numbers Ifθinfin(x y) lt u then xk = yk for all k st |xk | = xinfin

x =

[1

10minus22

] y =

[1

2times 10minus22

] θinfin(x y) = 10minus22

Tiny relative errors can corrupt performance profilesSee cure in Dingle amp H (2011)

University of Manchester Nick Higham Higher Precision Arithmetic 25 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Increasing the Precision

y = eπradic

163 evaluated at t digit precision

t y20 2625374126407687440025 262537412640768744000000030 262537412640768743999999999999

Is the last digit before the decimal point 4

t y35 2625374126407687439999999999992500740 2625374126407687439999999999992500725972

So no itrsquos 3

University of Manchester Nick Higham Higher Precision Arithmetic 26 33

Another Example

Consider the evaluation in precision u = 2minust of

y = x + a sin(bx) x = 17 a = 10minus8 b = 224

10 15 20 25 30 35 4010

minus14

10minus13

10minus12

10minus11

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

t

error

University of Manchester Nick Higham Higher Precision Arithmetic 27 33

MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

>> which -all eig
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\eig)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\eig)  % double method
C:\MATLAB2012b\toolbox\symbolic\symbolic\@sym\eig.m  % sym method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\eig.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\eig.m  % gpuArray method
d:\matlab\chebfun\@linop\eig.m  % linop method
d:\matlab\chebfun\@chebop\eig.m  % chebop method

>> which -all qr
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@single\qr)  % single method
built-in (C:\MATLAB2012b\toolbox\matlab\matfun\@double\qr)  % double method
d:\matlab\Multiprecision_Computing_Toolbox\mp.m  % mp method
C:\MATLAB2012b\toolbox\distcomp\parallel\@codistributed\qr.m  % codistributed method
C:\MATLAB2012b\toolbox\distcomp\gpu\@gpuArray\qr.m  % gpuArray method
d:\matlab\chebfun\@chebfun\qr.m  % chebfun method

>> which -all toeplitz
C:\MATLAB2012b\toolbox\matlab\elmat\toeplitz.m

Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing “exact answers” for testing algs.
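As an illustration of using high precision to obtain an "exact answer" for testing, the sketch below compares two double precision evaluations of (exp(x) - 1)/x against a 50-digit reference computed with Python's `decimal` module. The test function, the helper names `f_naive`, `f_expm1`, `f_reference` and `rel_error`, and the choice of 50 digits are assumptions made for this example rather than anything taken from the talk.

```python
import math
from decimal import Decimal, localcontext


def f_naive(x):
    """(exp(x) - 1) / x evaluated directly in double precision."""
    return (math.exp(x) - 1.0) / x


def f_expm1(x):
    """Same function via math.expm1, which avoids the cancellation in exp(x) - 1."""
    return math.expm1(x) / x


def f_reference(x, digits=50):
    """High precision reference value for the double x, using decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        xd = Decimal(x)                 # exact conversion of the double
        return (xd.exp() - 1) / xd


def rel_error(computed, reference):
    """Relative error of a double against the high precision reference."""
    with localcontext() as ctx:
        ctx.prec = 50
        return float(abs((Decimal(computed) - reference) / reference))


x = 1e-10
ref = f_reference(x)
print("naive :", rel_error(f_naive(x), ref))
print("expm1 :", rel_error(f_expm1(x), ref))
```

The reference is evaluated for the double x itself, so the measured errors reflect the algorithms alone and not the conversion of 1e-10 to binary.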

NLA group

References I

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

References II

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

References III

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

References IV

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.


MythIncreasing the precision at which a computation isperformed increases the accuracy of the answer

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

Rounding Errors in Digital Imaging

An 8-bit RGB image is an m times n times 3 array of integersaijk isin 01 255Every editing op (levels curves colour balance )aijk larr round(fijk(aijk)) incurs a rounding error

Should we edit in 16-bit

University of Manchester Nick Higham Higher Precision Arithmetic 29 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

16-bit vs 8-bit editing

SLR cameras generate image at 12ndash14 bits internally

8-bit 0 1 25516-bit 0 39times 10minus3 255

Controversial Margulis says using 16 bits makes nopractical difference in quality

Relevant metric is not normwise relative error

University of Manchester Nick Higham Higher Precision Arithmetic 30 33

High Precision in MATLAB

VPA arithmetic in Symbolic Math ToolboxAdvanpix Multiprecision Computing Toolbox forMATLAB

University of Manchester Nick Higham Higher Precision Arithmetic 31 33

gtgt which -all eigbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleeig) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleeig) double methodCMATLAB2012btoolboxsymbolicsymbolicsymeigm sym methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedeigm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayeigm gpuArray methoddmatlabchebfunlinopeigm linop methoddmatlabchebfunchebopeigm chebop method

gtgt which -all qrbuilt-in (CMATLAB2012btoolboxmatlabmatfunsingleqr) single methodbuilt-in (CMATLAB2012btoolboxmatlabmatfundoubleqr) double methoddmatlabMultiprecision_Computing_Toolboxmpm mp methodCMATLAB2012btoolboxdistcompparallelcodistributedqrm codistributed methodCMATLAB2012btoolboxdistcompgpugpuArrayqrm gpuArray methoddmatlabchebfunchebfunqrm chebfun method

gtgt which -all toeplitzCMATLAB2012btoolboxmatlabelmattoeplitzm


Conclusions & Future Directions

Need to prepare for quad precision in hardware.
Arbitrary precision arithmetic in software increasingly useful. Are we fully exploiting it?

High precision very useful for computing "exact answers" for testing algorithms.
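As a minimal sketch of using higher precision to obtain an "exact answer" for testing, the Python snippet below compares two double precision summation routines against a reference computed in exact rational arithmetic. The test data and the helper names (`naive_sum`, `exact_sum`, `relative_error`) are assumptions made for illustration and do not come from the talk.

```python
from fractions import Fraction
import math


def naive_sum(xs):
    """Left-to-right summation carried out in double precision."""
    total = 0.0
    for x in xs:
        total += x
    return total


def exact_sum(xs):
    """Exact sum of the doubles in xs, using rational arithmetic."""
    return sum((Fraction(x) for x in xs), Fraction(0))


def relative_error(computed, exact):
    """Relative error of a double against an exact rational reference."""
    return abs(Fraction(computed) - exact) / abs(exact)


# Hypothetical test data: one large term followed by many tiny ones.
xs = [1.0] + [1e-16] * 10_000

exact = exact_sum(xs)
err_naive = float(relative_error(naive_sum(xs), exact))
err_fsum = float(relative_error(math.fsum(xs), exact))
print(f"naive: {err_naive:.2e}, fsum: {err_fsum:.2e}")
```

Because the reference is exact, the measured errors reflect only the algorithms under test and not any rounding in the reference itself.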

NLA group


References I

Multiprecision Computing Toolbox for MATLAB. Advanpix LLC, Tokyo. http://www.advanpix.com

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010.

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard I. Int. J. High Performance Computing Applications, 16(1):1–111, 2002.

References II

Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard II. Int. J. High Performance Computing Applications, 16(2):115–199, 2002.

E. B. Davies. Approximate diagonalization. SIAM J. Matrix Anal. Appl., 29(4):1051–1064, 2007.

References III

N. J. Dingle and N. J. Higham. Reducing the influence of tiny normwise relative errors on performance profiles. MIMS EPrint 2011.90, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2011. 11 pp.

W. Kahan. Computer benchmarks versus accuracy. Draft manuscript, June 1994.

References IV

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.

  • Appendix

Rounding Errors in Digital Imaging

An 8-bit RGB image is an $m \times n \times 3$ array of integers $a_{ijk} \in \{0, 1, \dots, 255\}$.
Every editing op (levels, curves, colour balance, ...) $a_{ijk} \leftarrow \mathrm{round}(f_{ijk}(a_{ijk}))$ incurs a rounding error.

Should we edit in 16-bit?
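The effect of rounding after every edit can be sketched in a few lines of Python. The example applies a power-law curve and then its inverse to all 256 grey levels, once on 8-bit integers and once on values promoted to 16 bits, and counts how many distinct levels survive. The curve, the exponent `GAMMA`, and the helper names are assumptions chosen for illustration; they are not taken from the talk.

```python
GAMMA = 2.2  # hypothetical curve exponent, chosen only for illustration


def apply_curve(values, exponent, max_level):
    """Apply a power-law curve, rounding each result to an integer level."""
    return [round(max_level * (v / max_level) ** exponent) for v in values]


def round_trip(values, max_level):
    """Apply a curve and then its inverse, rounding after each edit."""
    edited = apply_curve(values, 1.0 / GAMMA, max_level)
    return apply_curve(edited, GAMMA, max_level)


levels = list(range(256))

# Edit directly on the 8-bit levels.
out_8bit = round_trip(levels, 255)

# Promote to 16 bits (255 * 257 == 65535), edit, then convert back to 8 bits.
promoted = [257 * v for v in levels]
out_16bit = [round(v / 257) for v in round_trip(promoted, 65535)]

distinct_8bit = len(set(out_8bit))
distinct_16bit = len(set(out_16bit))
print(distinct_8bit, distinct_16bit)
```

In this sketch the 8-bit round trip merges some neighbouring levels, while the 16-bit round trip returns every original level. Whether such differences are visible in a real image is a separate question.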


16-bit vs 8-bit editing

SLR cameras generate images at 12–14 bits internally.

8-bit: $0 : 1 : 255$
16-bit: $0 : 3.9 \times 10^{-3} : 255$

Controversial: Margulis says using 16 bits makes no practical difference in quality.

Relevant metric is not normwise relative error.
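The spacing of about 3.9 × 10⁻³ quoted for the 16-bit grid can be checked with a short calculation, assuming (this is an interpretation of the slide, not something it states) that the 16-bit format places 2^16 equally spaced levels on the same interval [0, 255] as the 8-bit format. The function name `level_spacing` is hypothetical.

```python
def level_spacing(bits, full_scale=255.0):
    """Spacing between adjacent levels when 2**bits levels span [0, full_scale]."""
    return full_scale / (2**bits - 1)


spacing_8 = level_spacing(8)    # 255 / 255
spacing_16 = level_spacing(16)  # 255 / 65535, which equals 1 / 257

# With round-to-nearest, a single rounding moves a value by at most
# half a spacing.
max_error_8 = spacing_8 / 2
max_error_16 = spacing_16 / 2
print(f"{spacing_8:.1f} {spacing_16:.1e}")
```

Under this assumption each rounding in 16-bit editing perturbs a value by at most 1/257 of the corresponding 8-bit bound.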


High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox.
Advanpix Multiprecision Computing Toolbox for MATLAB.

Conclusions amp Future Directions

Need to prepare for quad precision in hardwareArbitrary precision arithmetic in software increasinglyuseful Are we fully exploiting it

High precision veryuseful for computingldquoexact answersrdquo fortesting algs

NLA group

University of Manchester Nick Higham Higher Precision Arithmetic 33 33

References I

Multiprecision Computing Toolbox for MATLABAdvanpix LLC Tokyo|httpwwwadvanpixcom|

A H Al-Mohy and N J HighamThe complex step approximation to the Freacutechetderivative of a matrix functionNumer Algorithms 53(1)133ndash148 2010

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IInt J High Performance Computing Applications 16(1)1ndash111 2002

University of Manchester Nick Higham Higher Precision Arithmetic 1 4

References II

Basic Linear Algebra Subprograms Technical (BLAST)Forum Standard IIInt J High Performance Computing Applications 16(2)115ndash199 2002

E B DaviesApproximate diagonalizationSIAM J Matrix Anal Appl 29(4)1051ndash1064 2007

University of Manchester Nick Higham Higher Precision Arithmetic 2 4

References III

N J Dingle and N J HighamReducing the influence of tiny normwise relative errorson performance profilesMIMS EPrint 201190 Manchester Institute forMathematical Sciences The University of ManchesterUK Nov 201111 pp

W KahanComputer benchmarks versus accuracyDraft manuscript June 1994


References IV

X. S. Li, J. W. Demmel, D. H. Bailey, G. Henry, Y. Hida, J. Iskandar, W. Kahan, S. Y. Kang, A. Kapur, M. C. Martin, B. J. Thompson, T. Tung, and D. J. Yoo. Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Software, 28(2):152–205, June 2002.

D. Margulis. Photoshop LAB Color: The Canyon Conundrum and Other Adventures in the Most Powerful Colorspace. Peachpit Press, Berkeley, CA, USA, 2006. ISBN 0-321-35678-0. xviii+366 pp.


  • Appendix

High Precision in MATLAB

VPA arithmetic in Symbolic Math Toolbox

Advanpix Multiprecision Computing Toolbox for MATLAB

