Automated Floating-Point Precision Analysis
Michael O. Lam
Ph.D. Defense, 6 Jan 2014
Jeff Hollingsworth, Advisor
Context
• Floating-point arithmetic is ubiquitous
Context
Floating-point arithmetic represents real numbers as ± 1.frac × 2^exp
– Sign bit
– Exponent
– Significand (“mantissa” or “fraction”)

Single precision: sign (1 bit) | exponent (8 bits) | significand (23 bits)
Double precision: sign (1 bit) | exponent (11 bits) | significand (52 bits)
Context
Representing 2.0:
– Single (8-bit exponent, 23-bit significand): 0x40000000
– Double (11-bit exponent, 52-bit significand): 0x4000000000000000
Context
Representing 2.625:
– Single (8-bit exponent, 23-bit significand): 0x40280000
– Double (11-bit exponent, 52-bit significand): 0x4005000000000000
Context
Representing 0.1:
– Single (8-bit exponent, 23-bit significand): 0x3DCCCCCD
– Double (11-bit exponent, 52-bit significand): 0x3FB999999999999A
Context
Representing 1.234:
– Single (8-bit exponent, 23-bit significand): 0x3F9DF3B6
– Double (11-bit exponent, 52-bit significand): 0x3FF3BE76C8B43958
Context
• Floating-point is ubiquitous but problematic
  – Rounding error
    • Accumulates after many operations
    • Not always intuitive (e.g., non-associative)
    • Naïve approach: higher precision
  – Lower precision is preferable
    • Tesla K20X is 2.3X faster in single precision
    • Xeon Phi is 2.0X faster in single precision
    • Single precision uses 50% of the memory bandwidth
Problem
• Current analysis solutions are lacking
  – Numerical analysis methods are difficult
  – Static analysis is too conservative
  – Trial-and-error is time-consuming
• We need better analysis solutions
  – Produce easy-to-understand results
  – Incorporate runtime effects
  – Automated or semi-automated
Thesis
Automated runtime analysis techniques can inform application developers regarding floating-point behavior,
and can provide insights to guide developers towards reducing precision with minimal impact on accuracy.
Contributions
1. Floating-point software analysis framework
2. Cancellation detection
3. Mixed-precision configuration
4. Reduced-precision analysis

Initial emphasis on capability over performance
Example: Sum2PI_X

/* SUM2PI_X – approximate pi*x in a computationally-
 * heavy way to demonstrate various CRAFT analyses */

/* constants */
#define PI  3.14159265359
#define EPS 1e-7

/* loop iterations; OUTER is X */
#define OUTER 2000
#define INNER 30

int sum2pi_x() {
    int i, j, k;
    real x, y, acc, sum;
    real final = PI * OUTER;          /* correct answer */

    sum = 0.0;
    for (i = 0; i < OUTER; i++) {
        acc = 0.0;
        for (j = 1; j < INNER; j++) {
            /* calculate 2^j */
            x = 1.0;
            for (k = 0; k < j; k++)
                x *= 2.0;             /* 870K execs */

            /* approximately calculate pi */
            y = (real)PI / x;         /* 58K execs */
            acc += y;                 /* 58K execs */
        }
        sum += acc;                   /* 2K execs */
    }
    real err = abs(final - sum) / abs(final);
    if (err < EPS) printf("SUCCESSFUL!\n");
    else           printf("FAILED!!!\n");
}
Contribution 1 of 4
Software Framework
Framework
CRAFT: Configurable Runtime Analysis for Floating-point Tuning
Framework
• Dyninst: a binary analysis library
  – Parses executable files (InstructionAPI & ParseAPI)
  – Inserts instrumentation (DyninstAPI)
  – Supports full binary modification (PatchAPI)
  – Rewrites binary executable files (SymtabAPI)
• Binary-level analysis benefits
  – Programming language-agnostic
  – Supports closed third-party libraries
  – Sensitive to compiler transformations
Framework
• CRAFT framework
  – Dyninst-based binary mutator (C/C++)
  – Swing-based GUI viewers (Java)
  – Automated search scripts (Ruby)
• Proof-of-concept analyses
  – Instruction counting
  – Not-a-Number (NaN) detection
  – Range tracking (from Brown et al. 2007)
Sum2PI_X
No NaNs detected
Contribution 2 of 4
Cancellation Detection
Cancellation
• Loss of significant digits due to subtraction
• Cancellation detection
  – Instrument every addition and subtraction
  – Report cancellation events

  2.491264 (7)           1.613647 (7)
- 2.491252 (7)         - 1.613647 (7)
  0.000012 (2)           0.000000 (0)
(5 digits cancelled)   (all digits cancelled)
Cancellation: GUI
Cancellation: GUI
Cancellation: Sum2PI_X
Version   Significand size (bits)   Canceled bits
Single    23                        18
Mixed     23/52                     23
Double    52                        29
Cancellation: Results
• Gaussian elimination
  – Detect effects of a small pivot value
  – Highlight algorithmic differences
• Domain-specific insights
  – Dense point fields
  – Color saturations
• Error checking
  – Larger cancellations are better
Cancellation: Conclusions
• Automated analysis can detect cancellation
• Cancellation detection serves a wide variety of purposes
• Later work expanded the ability to identify problematic cancellation [Benz et al. 2012]
Contribution 3 of 4
Mixed Precision
Mixed Precision
• Tradeoff: Single (32 bits) vs. Double (64 bits)
• Single precision is faster
  – 2X+ computational speedup in recent hardware
  – 50% reduction in memory storage and bandwidth
• Double precision is more accurate
  – 16 digits vs. 7 digits
Mixed Precision
• Most operations use single precision
• Crucial operations use double precision

Mixed-precision linear solver [Buttari 2008]:
 1: LU ← PA
 2: solve Ly = Pb
 3: solve Ux_0 = y
 4: for k = 1, 2, ... do
 5:   r_k ← b - A·x_{k-1}
 6:   solve Ly = P·r_k
 7:   solve U·z_k = y
 8:   x_k ← x_{k-1} + z_k
 9:   check for convergence
10: end for
Red text indicates double-precision (all other steps are single-precision)

Difficult to prototype
50% speedup on average (12X in special cases)
Mixed Precision
Workflow: original binary (double precision) + mixed config → CRAFT → modified binary (mixed precision)
Mixed Precision
• Simulate single precision by storing the 32-bit version inside the 64-bit double-precision field (a downcast conversion)

Replaced double: high word = 0x7FF4DEAD (non-signalling NaN flag)
                 low word  = the down-cast single-precision value
Mixed Precision
gvec[i,j] = gvec[i,j] * lvec[3] + gvar

1: movsd 0x601e38(%rax,%rbx,8) → %xmm0
2: mulsd -0x78(%rsp) * %xmm0 → %xmm0
3: addsd -0x4f02(%rip) + %xmm0 → %xmm0
4: movsd %xmm0 → 0x601e38(%rax,%rbx,8)
Mixed Precision
gvec[i,j] = gvec[i,j] * lvec[3] + gvar

1: movsd 0x601e38(%rax,%rbx,8) → %xmm0
   check/replace -0x78(%rsp) and %xmm0
2: mulss -0x78(%rsp) * %xmm0 → %xmm0
   check/replace -0x4f02(%rip) and %xmm0
3: addss -0x4f02(%rip) + %xmm0 → %xmm0
4: movsd %xmm0 → 0x601e38(%rax,%rbx,8)
Mixed Precision
Mixed Precision
push %rax
push %rbx
<for each input operand>
    <copy input into %rax>
    mov %rbx, 0xffffffff00000000
    and %rax, %rbx                  # extract high word
    mov %rbx, 0x7ff4dead00000000
    test %rax, %rbx                 # check for flag
    je next                         # skip if replaced
    <copy input into %rax>
    cvtsd2ss %rax, %rax             # down-cast value
    or %rax, %rbx                   # set flag
    <copy %rax back into input>
next:
<next operand>
pop %rbx
pop %rax
<replaced instruction>              # e.g. addsd => addss
Mixed Precision
• Question: Which parts to replace?
• Answer: Automatic search
  – Empirical, iterative feedback loop
  – User-defined verification routine
  – Heuristic search optimization
Automated Search
Automated Search
Automated Search
• Keys to search algorithm
  – Depth-first search
    • Look for replaceable larger structures first
    • Modules, functions, blocks, etc.
  – Prioritization
    • Inspect highly-executed routines first
Mixed Precision: Sum2PI_X
Failed single-precision replacement
Mixed Precision: Sum2PI_X

/* SUM2PI_X – approximate pi*x in a computationally-
 * heavy way to demonstrate various CRAFT analyses */

/* constants */
#define PI  3.14159265359
#define EPS 1e-7

/* loop iterations; OUTER is X */
#define OUTER 2000
#define INNER 30

int sum2pi_x() {
    int i, j, k;
    real x, y, acc;
    sum_type sum;
    real final = PI * OUTER;

    sum = 0.0;
    for (i = 0; i < OUTER; i++) {
        acc = 0.0;
        for (j = 1; j < INNER; j++) {
            x = 1.0;
            for (k = 0; k < j; k++)
                x *= 2.0;
            y = (real)PI / x;
            acc += y;
        }
        sum += acc;
    }
    real err = abs(final - sum) / abs(final);
    if (err < EPS) printf("SUCCESSFUL!\n");
    else           printf("FAILED!!!\n");
}

                real = 32   real = 64
sum_type = 32       ✗
sum_type = 64       ?           ✔
Mixed Precision: Sum2PI_X

(same Sum2PI_X code as the previous slide)

                real = 32   real = 64
sum_type = 32       ✗
sum_type = 64       ✔           ✔
Mixed Precision: Results
• SuperLU
  – Lower error threshold = fewer replacements

Threshold   % Executions replaced   Final error
1.0e-03     99.9                    1.59e-04
1.0e-04     87.3                    4.42e-05
7.5e-05     52.5                    4.40e-05
5.0e-05     45.2                    3.00e-05
2.5e-05     26.6                    1.69e-05
1.0e-05      1.6                    7.15e-07
1.0e-06      1.6                    4.7e-07
Mixed Precision: Results
• AMGmk
  – Highly-adaptive multigrid microkernel
  – Built-in error tolerance
  – Search found complete replacement
  – Manual conversion
    • Speedup: 175s to 95s (1.8X)
    • Conventional x86_64 hardware
Mixed Precision: Results

Benchmark      Candidate      Configurations   % Dynamic
(name.CLASS)   instructions   tested           replaced
bt.W           6,228          3,934            83.2
bt.A           6,262          4,000            78.6
cg.W             962            251             7.4
cg.A             956            255             5.6
ep.W             423            117            47.2
ep.A             423            114            45.5
ft.W             426             75             0.3
ft.A             426             74             0.2
lu.W           6,038          4,117            57.4
lu.A           6,014          3,057            57.4
mg.W           1,393            443            39.2
mg.A           1,393            437            36.6
sp.W           4,458          5,124            40.5
sp.A           4,507          4,920            30.5
Mixed Precision: Results
• Memory-based analysis
  – Replacement candidates: output operands
  – Generally higher replacement rates
  – Analysis found several valid variable-level replacements

Benchmark      Candidate   Configurations   % Executions
(name.CLASS)   operands    tested           replaced
bt.A           2,342         300            97.0
cg.A             287          68            71.3
ep.A             236          59            37.9
ft.A             466         108            46.2
lu.A           1,742         104            99.9
mg.A             597         153            83.4
sp.A           1,525       1,094            88.9
Mixed Precision: Conclusions
• Automated tools can prototype mixed-precision configurations
• Automated search can provide precision-level replacement insights
• Precision analysis could provide another “knob” for application tuning
• Even if computation requires double precision, storage/communication may not
Contribution 4 of 4
Reduced Precision
Reduced Precision
• Simulate reduced precision with truncation
  – Truncate result after every operation
  – Allows zero up to double (64-bit) precision
  – Less overhead (fewer added operations)
• Search routine
  – Identifies component-level precision requirements

Precision scale: 0 … single … double (continuous, truncation) vs. single/double only (mixed precision)
Reduced Precision: GUI
• Bit-level precision requirements, shown on a scale from 0 to single to double
Reduced Precision: Sum2PI_X
0 bits (single – exponent only)
22 bits (single)
27 bits (double – overly conservative)
32 bits (double)
Reduced Precision
• Faster search convergence compared to mixed-precision analysis

Benchmark   Instructions   Original wall time (s)   Speedup
cg.A          956            1,305                  59.2%
ep.A          423              978                  42.5%
ft.A          426              825                  50.2%
lu.A        6,014          514,332                  86.7%
mg.A        1,393            2,898                  66.0%
sp.A        4,507          422,371                  44.1%
Reduced Precision
• General precision requirement profiles, ranging from low sensitivity to high sensitivity
Reduced Precision: Results
NAS (top): bt.A (78.6%), mg.A (36.6%), ft.A (0.2%); LAMMPS (bottom): chute, lj, rhodo
Reduced Precision: Results
NAS mg.W (incremental search, time by threshold):
>5.0% - 4:66
>1.0% - 5:93
>0.5% - 9:45
>0.1% - 15:45
>0.05% - 23:60
Full - 28:71
Reduced Precision: Conclusions
• Automated analysis can identify general precision level requirements
• Reduced-precision analysis provides results more quickly than mixed-precision analysis
• Incremental searches reduce the time to solution without sacrificing fidelity
Contributions
• General floating-point analysis framework
  – 32.3K LOC total in ~200 files
  – LGPL on Sourceforge: sf.net/p/crafthpc
• Cancellation detection
  – WHIST’11 paper, PARCO 39/3 article
• Mixed-precision configuration
  – SC’12 poster, ICS’13 paper
• Reduced-precision analysis
  – ICS’14 submission in preparation
Future Work
• Short term
  – Optimization and platform ports
  – Analysis extension and composition
  – Further case studies
• Long term
  – Compiler-based implementation
  – IDE and development cycle integration
  – Program modeling and verification
Conclusion
Automated runtime analysis techniques can inform application developers regarding floating-point behavior,
and can provide insights to guide developers towards reducing precision with minimal impact on accuracy.
Acknowledgements

– Collaborators –
Jeff Hollingsworth (advisor) and Pete Stewart (UMD)
Bronis de Supinski, Matt Legendre, et al. (LLNL)

– Colleagues –
Ananta Tiwari, Tugrul Ince, Geoff Stoker, Nick Rutar, Ray Chen, et al.
CS Department @ UMD
Intel XED2

– Family & Friends –
Lindsay Lam (spouse)
Neil & Alice Lam, Barry & Susan Walters
Wallace PCA and Elkton EPC

(cartoon by Nick Rutar)