Floating Point Numbers

Upload: glynn

Post on 05-Jan-2016



TRANSCRIPT

Page 1: Floating Point Numbers

Floating Point Numbers

Page 2: Floating Point Numbers

It's all just 1s and 0s

Computers are fundamentally driven by logic, and thus by bits of data

Manipulation of bits can be done incredibly quickly

Given n bits of information, there are 2^n possible combinations

These 2^n representations can encode pretty much anything you want: letters, numbers, instructions...

Page 3: Floating Point Numbers

Bases of number systems

Base 10 numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

3107 = 3×10^3 + 1×10^2 + 0×10^1 + 7×10^0

Base 2 numbers: 0, 1

Place values: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048

3107 = 1×2^11 + 1×2^10 + 0×2^9 + 0×2^8 + 0×2^7 + 0×2^6 + 1×2^5 + 0×2^4 + 0×2^3 + 0×2^2 + 1×2^1 + 1×2^0

= 110000100011 (binary)

Addition, multiplication, etc. all proceed the same way
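The place-value expansion on this slide can be checked mechanically. The following is a minimal Python sketch; the helper names `to_binary` and `from_digits` are illustrative and not from the slides.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary digits by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next least-significant bit
        n //= 2
    return "".join(reversed(digits))


def from_digits(digits: str, base: int) -> int:
    """Evaluate a digit string in the given base using place values."""
    value = 0
    for d in digits:
        value = value * base + int(d)
    return value


print(to_binary(3107))                  # 110000100011
print(from_digits("110000100011", 2))   # 3107
print(from_digits("3107", 10))          # 3107
```

The same `from_digits` routine works for base 10 and base 2, which is the sense in which arithmetic "proceeds the same way" in any base.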

Page 4: Floating Point Numbers

Base Notation

What does 10 mean?

10 in binary = 2 decimal

10 in octal (base 8) = 8 decimal

10 in decimal = 10 decimal

We need some method of differentiating between these possibilities

To avoid confusion, where necessary we write the base as a subscript: 10_10 is ten, 10_2 is two
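Programming languages face the same ambiguity and resolve it with an explicit base argument or a prefix. A small Python illustration (the variable name `values` is only for this sketch):

```python
# Interpret the same digit string "10" in three different bases.
values = {base: int("10", base) for base in (2, 8, 10)}
for base, value in values.items():
    print(f"10 in base {base} = {value} decimal")

# Python literals mark the base with a prefix instead of a subscript.
print(0b10, 0o10, 10)   # 2 8 10
```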

Page 5: Floating Point Numbers

Integer Representation

Integers obviously fit into this base 2 notation

The remaining challenge is to represent negative numbers:

2s complement

Excess-N

An extra choice is the order of bits

The choice is made chip by chip, which affects portability
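As a rough illustration of the two encodings for negative numbers and of the ordering choice, here is a Python sketch. The function names, the 8-bit width, and the bias of 127 are assumptions chosen for the example, not values taken from the slides; the ordering is shown at the byte level, which is how it usually surfaces in practice.

```python
def twos_complement(value: int, bits: int) -> str:
    """Bit pattern of value in two's complement with the given width."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the requested width")
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def excess_n(value: int, bits: int, bias: int) -> str:
    """Bit pattern of value stored as the unsigned number value + bias."""
    stored = value + bias
    if not 0 <= stored < (1 << bits):
        raise ValueError("value does not fit in the requested width")
    return format(stored, f"0{bits}b")


print(twos_complement(-5, 8))   # 11111011
print(excess_n(-5, 8, 127))     # 01111010

# The same 16-bit integer laid out in the two common byte orders.
print((3107).to_bytes(2, "big").hex())      # 0c23
print((3107).to_bytes(2, "little").hex())   # 230c
```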

Page 6: Floating Point Numbers

Floating Point Representation

Computers represent floating point numbers in binary form

For generality, they use a binary form of scientific notation

29.25 = 0.2925 × 10^2

In binary, we can use powers of 2

29.25 = 11101.01 (binary) = 0.1110101 (binary) × 2^5
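Python can show the binary scientific notation for 29.25 directly. This is a small sketch using the standard `math.frexp` function, which returns a fraction in [0.5, 1) and a power of 2, matching the "0.xxx times a power of the base" form used above.

```python
import math

# 29.25 == fraction * 2**exponent with 0.5 <= fraction < 1
fraction, exponent = math.frexp(29.25)
print(fraction, exponent)        # 0.9140625 5
print(fraction * 2 ** exponent)  # 29.25

# Hexadecimal form: normalized as 1.xxx times a power of 2
print(float.hex(29.25))          # 0x1.d400000000000p+4
```

The hexadecimal form reads as 1.d4 (hex) × 2^4, which is the same value as 0.1110101 (binary) × 2^5.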

Page 7: Floating Point Numbers

Floating Point Size

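In IEEE.h:

#define IEEE_FLOAT_SIZE 4     /* single precision: 4 bytes (32 bits) */
#define IEEE_DOUBLE_SIZE 8    /* double precision: 8 bytes (64 bits) */
#define IEEE_QUAD_SIZE 16     /* quad precision: 16 bytes (128 bits) */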

Page 8: Floating Point Numbers

Distribution

Precision   # bits   Mantissa bits   Exponent bits   Sign bit
Single      32       23              8               1
Double      64       52              11              1
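The three fields of a double can be pulled apart with the standard `struct` module. This is a minimal sketch; `decompose_double` is a hypothetical helper name, and the exponent bias of 1023 comes from the IEEE 754 double format (an Excess-N encoding), not from the slides.

```python
import struct


def decompose_double(x: float):
    """Split a 64-bit double into (sign, stored exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret as a 64-bit integer
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, leading 1 is implicit
    return sign, exponent, mantissa


sign, exponent, mantissa = decompose_double(29.25)
print(sign, exponent - 1023, hex(mantissa))   # 0 4 0xd400000000000
```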

Page 9: Floating Point Numbers

In Decimal Terms

Each binary floating point double holds roughly 16 decimal digits (technically, a relative precision of 2^(-52), about 2.2e-16)

MATLAB example
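The slide's MATLAB example is not included in this transcript. As a stand-in (not the original example), here is a short Python sketch of the same idea; Python's `float` is also an IEEE double, so the numbers correspond to what MATLAB's `eps` reports.

```python
import sys

eps = sys.float_info.epsilon      # gap between 1.0 and the next larger double
print(eps == 2.0 ** -52)          # True
print(1.0 + eps > 1.0)            # True: eps is large enough to register
print(1.0 + eps / 2 == 1.0)       # True: half of eps is rounded away
print(sys.float_info.dig)         # 15: decimal digits guaranteed to survive a round trip
```

The "roughly 16 digits" figure sits between the 15 digits that are always preserved and the 17 digits needed to print any double unambiguously.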

Page 10: Floating Point Numbers

Advantages

Scientific notation can work on any scale (all handled by the exponent)

So long as errors are small relative to the scale of the data values, calculations are accurate, right?

Page 11: Floating Point Numbers

Example 1

1e12 + 0.2 - 1e12
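Evaluating this expression in double precision shows the problem. A small Python sketch (the variable names are illustrative); near 1e12 the spacing between adjacent doubles is 2^(-13), so 0.2 cannot be added exactly.

```python
left_to_right = 1e12 + 0.2 - 1e12   # 0.2 is rounded to the grid of doubles near 1e12
reordered = 1e12 - 1e12 + 0.2       # the large terms cancel first, so 0.2 survives

print(left_to_right)   # 0.199951171875
print(reordered)       # 0.2
```

The result 0.199951171875 is exactly 1638/8192, the nearest multiple of 2^(-13) to 0.2.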

Page 12: Floating Point Numbers

Problem

Nice decimal numbers (such as 0.2) can have non-terminating binary representations

Just as 1/3 = 0.3333333… in decimal, 0.2 in binary is

0.0011 0011 0011 0011…

Analogy: adding and then subtracting a large number when only a fixed number of digits is kept
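The repeating binary expansion of 0.2 can be generated with exact rational arithmetic. This is a minimal Python sketch; `binary_fraction_digits` is a hypothetical helper written for this illustration.

```python
from decimal import Decimal
from fractions import Fraction


def binary_fraction_digits(x: Fraction, n: int) -> str:
    """First n binary digits after the point of a fraction 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x >= 1)      # next binary digit
        digits.append(str(bit))
        x -= bit
    return "".join(digits)


print(binary_fraction_digits(Fraction(1, 5), 16))   # 0011001100110011

# The double closest to 0.2 is therefore not exactly 1/5.
print(Fraction(0.2) == Fraction(1, 5))              # False
print(Decimal(0.2))                                 # exact decimal value of the stored double
```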

Page 13: Floating Point Numbers

Roundoff Error

Roundoff error will always be present

Roundoff error is more significant when you are subtracting two almost equal quantities, e.g. in decimal, 255.67 - 255.69
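The subtraction of two nearly equal values can be tried directly in double precision. A short Python sketch, assuming the exact answer -0.02 as the reference (variable names are illustrative):

```python
import sys

a = 255.67
b = 255.69
diff = a - b                      # the leading digits cancel, leaving the rounding errors
expected = -0.02
relative_error = abs(diff - expected) / abs(expected)

print(diff)
print(relative_error)                            # far larger than machine epsilon
print(relative_error / sys.float_info.epsilon)   # error measured in units of epsilon
```

Each operand is stored with a relative error of about 1e-16, but after cancellation the same absolute error is measured against a result that is roughly 10,000 times smaller.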

Page 14: Floating Point Numbers

Example 2

A = 112000000
B = 100000
C = 0.0009
X = A - B / C
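One way to read this example, offered here as an interpretation rather than the slide's stated point, is that dividing by the small C produces a quotient close to A, so the subtraction magnifies any error in C. The Python sketch below uses a hypothetical perturbed value `C_perturbed` to measure that magnification.

```python
from fractions import Fraction

A = 112000000
B = 100000
C = 0.0009

X = A - B / C
exact = Fraction(A) - Fraction(B) / Fraction(9, 10000)   # exactly 8000000/9
print(X, float(exact))


def relative_change(new: float, old: float) -> float:
    """Relative difference between two values."""
    return abs(new - old) / abs(old)


# Assumed perturbation: change C in its fifth significant digit.
C_perturbed = 0.00090001
X_perturbed = A - B / C_perturbed
amplification = relative_change(X_perturbed, X) / relative_change(C_perturbed, C)
print(amplification)   # the relative error in C grows by about two orders of magnitude in X
```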

Page 15: Floating Point Numbers

Common occurrence

Delta x in finite element methods and in numerical differentiation

Places where more closely packed data gives

Page 16: Floating Point Numbers

Example 3: Numerical Diff.
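The content of this slide is not in the transcript. As an illustrative stand-in, the Python sketch below applies a forward difference to sin(x) at x = 1 (a test function chosen here, not taken from the slides) and shrinks the step h. The error first falls as h gets smaller, then rises again once roundoff in the subtraction dominates.

```python
import math


def forward_difference(f, x: float, h: float) -> float:
    """Approximate f'(x) with the one-sided difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


x = 1.0
exact = math.cos(x)   # true derivative of sin at x
errors = []
for k in range(1, 16):
    h = 10.0 ** -k
    error = abs(forward_difference(math.sin, x, h) - exact)
    errors.append(error)
    print(f"h = 1e-{k:02d}   error = {error:.3e}")
```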

Page 17: Floating Point Numbers

Example 4: Recursion

Comparing the sum of delta x with the real sum:

t = 0;
N = 10000;
dx = 1/N;
for I = 1:N
    t = t + dx;   % add dx N times; the exact sum is 1
end
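A Python rendering of the same loop, with a comparison against `math.fsum`, which tracks the lost low-order bits. The comparison is an addition for illustration and is not part of the original slide.

```python
import math

N = 10000
dx = 1 / N

t = 0.0
for _ in range(N):
    t += dx               # each addition can introduce a small rounding error

compensated = math.fsum([dx] * N)   # correctly rounded sum of the same terms

print(t, abs(t - 1.0))                      # naive running sum and its error
print(compensated, abs(compensated - 1.0))
```

The naive sum typically differs from 1 in its last few digits because the errors accumulate over the 10000 additions, while the compensated sum stays within one rounding of the true value.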

Page 18: Floating Point Numbers

Avoiding (Large) Roundoff Error

Avoid subtracting almost-equal quantities

Avoid dividing by small quantities

Avoid sums over large loops, especially with different orders of magnitude in the sum

Avoid recursive calculations, where errors will accumulate
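As one illustration of the first guideline, a difference of nearly equal square roots can be rewritten algebraically so that no cancellation occurs. The expression sqrt(x + 1) - sqrt(x) and the value x = 1e15 are assumptions chosen for this sketch, not examples from the slides.

```python
import math
from decimal import Decimal, getcontext


def diff_naive(x: float) -> float:
    """Direct form: subtracts two almost-equal numbers."""
    return math.sqrt(x + 1) - math.sqrt(x)


def diff_stable(x: float) -> float:
    """Same quantity after multiplying by the conjugate: no subtraction."""
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))


getcontext().prec = 50   # high-precision reference value
x = 1e15
reference = Decimal(x + 1).sqrt() - Decimal(x).sqrt()


def relative_error(value: float) -> float:
    return float(abs(Decimal(value) - reference) / reference)


print(diff_naive(x), relative_error(diff_naive(x)))
print(diff_stable(x), relative_error(diff_stable(x)))
```

Both functions compute the same mathematical quantity; only the order of operations differs, and with it the size of the roundoff error.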