Floating Point Numbers & Parallel Computing



2

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


3

Fixed-point Numbers

• How to represent rational numbers in binary?
• One way: define a binary “point” between the integer and fraction parts
• Analogous to the point between integer and fraction for decimal numbers:

6.75
(integer . fraction)


4

Fixed-point Numbers

• The point’s position is static (it cannot be changed)
• E.g., the point goes between bits 3 and 4 of a byte (counting the LSB as bit 0):

0110.1100

4 bits for the integer component, 4 bits for the fraction component


5

Fixed-point Numbers

• Integer component: binary interpreted as before
• LSB is 2^0

0110.1100 (integer part)
= 2^2 + 2^1
= 4 + 2
= 6


6

Fixed-point Numbers

• Fraction component: binary interpreted slightly differently
• MSB is 2^-1

0110.1100 (fraction part)
= 2^-1 + 2^-2
= 0.5 + 0.25
= 0.75


7

Fixed-point Numbers

0110.1100
integer part:  2^2 + 2^1   = 4 + 2      = 6
fraction part: 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
result: 6.75
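To double-check this interpretation, here is a minimal C sketch (my illustration, not from the slides) that treats an 8-bit pattern as an unsigned fixed-point number with 4 integer and 4 fraction bits by dividing the raw value by 2^4 = 16; the helper name uq44_to_double is just illustrative.

```c
#include <stdio.h>
#include <stdint.h>

/* Interpret an 8-bit value as unsigned fixed point with 4 integer
 * and 4 fraction bits: divide the raw integer by 2^4 = 16.        */
double uq44_to_double(uint8_t raw) {
    return raw / 16.0;
}

int main(void) {
    /* 0110.1100 -> 0x6C -> 108 / 16 = 6.75 */
    printf("%.2f\n", uq44_to_double(0x6C));
    return 0;
}
```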


8

Fixed-point Numbers

• How to represent negative numbers?
• 2’s complement notation

-2.375 = 1101.1010


9

Fixed-point Numbers

1. Invert bits
2. Add 1
3. Convert to fixed-point decimal
4. Multiply by -1

1101.1010
invert bits: 0010.0101
add 1:       0010.0110
= 2^1 + 2^-2 + 2^-3
= 2 + 0.25 + 0.125
= 2.375
multiply by -1: -2.375
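The same recipe collapses to one step in C: reinterpret the byte as a signed two’s-complement integer and divide by 2^4. A minimal sketch under the same assumed 4.4 format (the name q44_to_double is illustrative).

```c
#include <stdio.h>
#include <stdint.h>

/* Interpret an 8-bit pattern as two's-complement fixed point with
 * 4 integer and 4 fraction bits: reinterpret as signed, then
 * divide by 2^4 = 16.                                             */
double q44_to_double(uint8_t raw) {
    return (int8_t)raw / 16.0;
}

int main(void) {
    /* 1101.1010 -> 0xDA; as int8_t this is -38, and -38/16 = -2.375 */
    printf("%.3f\n", q44_to_double(0xDA));
    return 0;
}
```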


10

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


11

Floating Point Numbers

• Analogous to scientific notation
• E.g., 4.1 × 10^3 = 4100
• Gets around the limitations of constant integer and fraction sizes
• Allows representation of very small and very large numbers


12

Floating Point Numbers

• Just like scientific notation, floating point numbers have:
  • sign (±)
  • mantissa (M)
  • base (B)
  • exponent (E)

4.1 × 10^3 = 4100
M = 4.1, B = 10, E = 3


13

Floating Point Numbers

• Floating point numbers in binary (32 bits total):

sign: 1 bit | exponent: 8 bits | mantissa: 23 bits


14

Floating Point Numbers

• Example: convert 228 to floating point

228 = 1110 0100 = 1.1100100 × 2^7

sign = positive
exponent = 7
mantissa = 1.1100100
base = 2 (implicit)


15

Floating Point Numbers

228 = 1110 0100 = 1.1100100 × 2^7

sign = positive (0)
exponent = 7
mantissa = 1.1100100
base = 2 (implicit)

0 0000 0111 11100100000000000000000


16

Floating Point Numbers

• In binary floating point, the MSB of the mantissa is always 1
• No need to store the MSB of the mantissa (the 1 is implied)
• Called the “implicit leading 1”

with explicit leading 1: 0 0000 0111 11100100000000000000000
leading 1 dropped:       0 0000 0111 11001000000000000000000


17

Floating Point Numbers

• The exponent must represent both positive and negative numbers
• Floating point uses a biased exponent
  • Biased exponent = original exponent plus a constant bias
  • 32-bit floating point uses a bias of 127
• E.g., exponent -4 (2^-4) would be stored as -4 + 127 = 123 = 0111 1011
• E.g., exponent 7 (2^7) would be stored as 7 + 127 = 134 = 1000 0110

unbiased exponent: 0 0000 0111 11001000000000000000000
biased exponent:   0 1000 0110 11001000000000000000000


18

Floating Point Numbers

• E.g., 228 in floating point binary (IEEE 754 standard)

0 1000 0110 11001000000000000000000

sign bit = 0 (positive)
8-bit biased exponent: E = number – bias = 134 – 127 = 7
23-bit mantissa, stored without the implicit leading 1
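The encoding can be verified programmatically. Below is a small C sketch (an illustration, not from the slides) that copies the raw bits of 228.0f into an integer and extracts the three fields; on an IEEE 754 machine it should report 0x43640000, i.e. sign 0, biased exponent 134, and mantissa 1100100 followed by zeros.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 228.0f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret the float's raw bits */

    uint32_t sign     = bits >> 31;          /* 1 bit                        */
    uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127        */
    uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1  */

    printf("raw      = 0x%08X\n", bits);     /* expected: 0x43640000         */
    printf("sign     = %u\n", sign);         /* expected: 0                  */
    printf("exponent = %u (unbiased %d)\n",
           exponent, (int)exponent - 127);   /* expected: 134, 7             */
    printf("mantissa = 0x%06X\n", mantissa); /* expected: 0x640000           */
    return 0;
}
```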


19

Floating Point Numbers

• Special cases: 0, ±∞, NaN

value   sign bit   exponent   mantissa
0       N/A        00000000   000…000
+∞      0          11111111   000…000
-∞      1          11111111   000…000
NaN     N/A        11111111   non-zero
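These special encodings can be inspected the same way; a short C sketch (illustrative, and the hex values in the comments assume a standard IEEE 754 single-precision layout):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Print the raw 32-bit pattern of a float next to its value. */
static void show(const char *label, float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    printf("%-4s  value=%f  bits=0x%08X\n", label, f, bits);
}

int main(void) {
    show("0",    0.0f);      /* 0x00000000: exponent and mantissa all zero       */
    show("+inf", INFINITY);  /* 0x7F800000: exponent all ones, mantissa zero     */
    show("-inf", -INFINITY); /* 0xFF800000                                       */
    show("NaN",  NAN);       /* exponent all ones, mantissa non-zero, e.g. 0x7FC00000 */
    return 0;
}
```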


20

Floating Point Numbers

• Single versus double precision
• Single: 32-bit float
  • Range: ±1.175494 × 10^-38 to ±3.402824 × 10^38
• Double: 64-bit double
  • Range: ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308

        # bits (total)   # sign bits   # exponent bits   # mantissa bits
float   32               1             8                 23
double  64               1             11                52
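The ranges in the table correspond to the standard <float.h> constants, which a short C sketch can print directly:

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    /* Smallest normalized and largest finite values for each type,
     * as defined by the standard <float.h> constants.              */
    printf("float : %e .. %e (%d significant digits)\n",
           FLT_MIN, FLT_MAX, FLT_DIG);
    printf("double: %e .. %e (%d significant digits)\n",
           DBL_MIN, DBL_MAX, DBL_DIG);
    return 0;
}
```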


21

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


22

Superscalar Processors

• Multiple hardwired copies of the datapath
• Allows multiple instructions to execute simultaneously
• E.g., a 2-way superscalar processor:
  • Fetches / executes 2 instructions per cycle
  • 2 ALUs
  • 2-port memory unit
  • 6-port register file (4 source ports, 2 write-back ports)


23

Superscalar Processors

• Datapath for a 2-way superscalar processor

[Figure: 2-way superscalar datapath with 2 ALUs, a 2-port memory unit, and a 6-port register file]


24

Superscalar Processors

• Pipeline for a 2-way superscalar processor
• 2 instructions per cycle:


25

Superscalar Processors

• Commercial processors can be 3-, 4-, or even 6-way superscalar
• Very difficult to manage dependencies and hazards

Intel Nehalem (6-way superscalar)


26

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


27

Multithreading (Terms)

• Process: a program running on a computer
  • Can have multiple processes running at the same time
  • E.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run simultaneously
  • E.g., word processor: threads to read input, print, spell check, auto-save


28

Multithreading (Terms)

• Instruction-level parallelism (ILP): the number of instructions that can be executed simultaneously for a given program / microarchitecture
  • Practical processors rarely achieve an ILP greater than 2 or 3
• Thread-level parallelism (TLP): the degree to which a process can be split into threads


29

Multithreading

• Keeps a processor with many execution units busy
  • Even if ILP is low or the program is stalled (waiting for memory)
• For single-core processors, threads give the illusion of simultaneous execution
  • Threads take turns executing (according to the OS)
  • The OS decides when a thread’s turn begins / ends


30

Multithreading

• When one thread’s turn ends:
  -- the OS saves the architectural state
  -- the OS loads the architectural state of another thread
  -- the new thread begins executing
• This is called a context switch
• If context switches are fast enough, the user perceives the threads as running simultaneously (even on a single core)



31

Multithreading

• Multithreading does NOT improve ILP, but DOES improve processor throughput
  • Threads use resources that are otherwise idle
• Multithreading is relatively inexpensive
  • Only need to save the PC and the register file

[Figure: execution units left idle vs. filled with the next task]
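To make the thread/process distinction concrete, here is a minimal POSIX-threads sketch in C (my example, not from the slides). On a single core the OS interleaves the four threads via context switches; on a multicore the same program can run them truly simultaneously. Compile with -pthread.

```c
#include <stdio.h>
#include <pthread.h>

/* Each thread of the process runs this function. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);   /* wait for all threads to finish */
    return 0;
}
```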


32

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


33

Homogeneous Multiprocessing

• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with a single shared memory
• Easier to design (than heterogeneous)
• Multiple cores on the same (or different) chip(s)
• In 2005, architectures made the shift to SMP


34

Homogeneous Multiprocessing

• Multiple cores can execute threads concurrently
• True simultaneous execution
• Multi-threaded programming can be tricky…

[Figure: threads with a single core vs. multiple cores (core #1 through core #4)]
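How many threads can actually run at once depends on the number of cores; a tiny C sketch that queries it (sysconf with _SC_NPROCESSORS_ONLN is widely available on Linux and macOS, though it is an extension rather than a strict POSIX requirement):

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Number of processors currently online. */
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online cores: %ld\n", cores);
    return 0;
}
```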


35

Outline

• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

3.141592653589793238462643383…


36

Heterogeneous Multiprocessing

• AKA asymmetric multiprocessing (AMP)

• 2 (or more) different processors
• Specialized processors used for specific tasks
  • E.g., graphics, floating point, FPGAs

• Adds complexity

[Figure: Nvidia GPU]


37

Heterogeneous Multiprocessing

• Clustered:
  • Each processor has its own memory
  • E.g., PCs connected on a network
• Memory not shared, must pass information between nodes…
• Can be costly
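One common way to pass information between cluster nodes is a message-passing library such as MPI; the slides do not name a specific mechanism, so this is purely an illustrative sketch in which node 0 sends an integer to node 1.

```c
/* Build with an MPI compiler wrapper, e.g.: mpicc cluster.c && mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* No shared memory: node 0 must explicitly send the value to node 1. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d from node 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```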