Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4)


Presentation 12

MAD MAC 525

Farhan Mohamed Ali (W2-1)
Jigar Vora (W2-2)
Sonali Kapoor (W2-3)
Avni Jhunjhunwala (W2-4)

W2

Design Manager: Zack Menegakis

26th April, 2006
Short Final Presentation

Project Objective: Design a crucial part of a GPU called the Multiply Accumulate Unit (MAC) which will revolutionize graphics.


Agenda

• Marketing (Jigar)

• Project Description (Farhan)

• Algorithmic Description (Farhan)

• Design Process (Sonali)

• Floorplan Evolution (Sonali)

• Layout (Avni)

• Design Specifications (Avni)

• Conclusion (Jigar)


MARKETING

• Application of product: HDR rendering in gaming graphics

• Why HDR? Used in games like Far Cry

• Optimization for speed (chosen because of the market)

• Competition: possible barriers to entry if we enter the market


MAD MAC and HDR

• What is HDR?

• Show animation explaining concept


MAD MAC and HDR

• MAD MAC accelerates FP16 blending to enable true HDR graphics

• What is HDR?

• HDR = High Dynamic Range

• Dynamic range is defined as the ratio of the largest value of a signal to the lowest measurable value

• Dynamic range of luminance in real-world scenes can be 100,000 : 1

• With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics

• Nature isn’t clamped to [0..1] and neither should CG be

• In lay terms:

• Bright things can be really bright

• Dark things can be really dark

• And the details can be seen in both
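
To put the 100,000 : 1 figure in perspective, the short Python sketch below converts a dynamic range into stops (doublings) and contrasts clamped with unclamped additive blending. The 255 : 1 figure for an 8-bit channel, the sample intensities, and the function names are assumptions made for illustration; they are not from the slides.

```python
import math

def dynamic_range_stops(largest, smallest):
    """Express a dynamic range (largest : smallest) in stops, i.e. doublings."""
    return math.log2(largest / smallest)

def blend_additive(dst, src, clamp):
    """Additively blend two intensities; optionally clamp the result to [0, 1]."""
    total = dst + src
    return min(1.0, max(0.0, total)) if clamp else total

real_world_stops = dynamic_range_stops(100_000, 1)  # ratio quoted on the slide
eight_bit_stops = dynamic_range_stops(255, 1)       # assumption: 8-bit channel

ldr = blend_additive(0.75, 0.5, clamp=True)    # traditional: detail above 1.0 is lost
hdr = blend_additive(0.75, 0.5, clamp=False)   # HDR: the value above 1.0 is kept

print(f"real world: {real_world_stops:.1f} stops, 8-bit: {eight_bit_stops:.1f} stops")
print(f"clamped blend: {ldr}, HDR blend: {hdr}")
```

The unclamped result is what a floating point blend such as FP16 can carry forward to later passes.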


PROJECT DESCRIPTION

• Multiply Accumulate unit (MAC)

• Executes the function AB+C on 16-bit floating point inputs. Inputs will be in OpenEXR format.

• Multiply and add in parallel to greatly speed up operation

• Rounding is performed only once, so accuracy is greater than with individual multiply and add functions.

• Also known as:

– Fused Multiply Add (FMA)

– Multiply Add (MAD/MADD) in graphics shader programs

• Many applications benefit from a fast FMA

– Graphics: HDR rendering, blending and shader ops

– DSPs: computing vector dot-products in digital filters

– Fast division, square root: eliminates extra hardware

• Available in many newer CPUs and DSPs because it’s so cool

• One ring (circuit) to rule them all!
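
The accuracy benefit of rounding once can be seen with a small numeric experiment. The Python sketch below is an illustration only: it uses the standard library's `struct` format `"e"` (IEEE 754 binary16) as a stand-in for the 16-bit format named on the slide, and the operand values and function names are assumptions chosen to expose the difference. For these particular operands the intermediate double-precision arithmetic is exact, so the only rounding is the conversion to 16 bits.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest binary16 value (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mul_then_add(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return to_fp16(to_fp16(a * b) + c)

def fused_mac(a, b, c):
    """Fused A*B+C: the exact result is rounded a single time."""
    return to_fp16(a * b + c)

# Hypothetical operands, all exactly representable in binary16.
a = to_fp16(1 + 2**-10)
b = to_fp16(1 + 2**-9)
c = to_fp16(-1.0)

separate = mul_then_add(a, b, c)
fused = fused_mac(a, b, c)

print("separate:", separate)
print("fused:   ", fused)
print("exact:   ", a * b + c)
```

Here the separate version discards the low bit of the product before the subtraction, while the fused version keeps it and returns the exact answer.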


ALGORITHMIC DESCRIPTION

• Step through entire process

• Multiply and align occur concurrently; C is always aligned to A*B

• Outputs go to the adder, normalize, round, overflow checker and output register
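
The steps above can be sketched as a bit-level behavioral model. The Python below is a minimal sketch under stated assumptions, not the team's Verilog: it assumes a 1/5/10-bit sign/exponent/fraction layout with bias 15, finite inputs only, round-to-nearest-even, and flush-to-zero for denormals (the slides say denormal arithmetic was not implemented). The names `decode` and `mac` are hypothetical, and Python's unbounded integers replace the fixed-width datapath.

```python
def decode(bits):
    """Split a 16-bit word into (sign, integer significand, exponent).

    value = (-1)**sign * significand * 2**exponent. An exponent field of 0
    is treated as zero (assumption: denormals flushed). Inf/NaN not handled.
    """
    sign = bits >> 15
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0:
        return sign, 0, 0
    return sign, 1024 + frac, exp_field - 25  # hidden 1; bias 15 plus 10 fraction bits

def mac(a_bits, b_bits, c_bits):
    """Compute A*B+C on 16-bit floating point words with a single rounding."""
    sa, ma, ea = decode(a_bits)
    sb, mb, eb = decode(b_bits)
    sc, mc, ec = decode(c_bits)

    # Multiply: 11 x 11 bit significand product; exponents add.
    sign_p, sig_p, exp_p = sa ^ sb, ma * mb, ea + eb

    # Align: bring the product and C to a common exponent.
    exp = min(exp_p, ec)
    p = sig_p << (exp_p - exp)
    c = mc << (ec - exp)

    # Add or subtract according to the signs.
    total = (-p if sign_p else p) + (-c if sc else c)
    sign = 1 if total < 0 else 0
    mag = abs(total)
    if mag == 0:
        return 0

    # Normalize to an 11-bit significand, then round to nearest even.
    shift = mag.bit_length() - 11
    if shift > 0:
        q, rem = mag >> shift, mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and q & 1):
            q += 1
        if q == 2048:          # rounding carried out of the significand
            q, shift = 1024, shift + 1
    else:
        q = mag << -shift

    # Overflow check and packing of the result word.
    exp_field = exp + shift + 25
    if exp_field >= 31:
        return (sign << 15) | 0x7C00   # overflow: return infinity
    if exp_field <= 0:
        return sign << 15              # underflow: flush to zero
    return (sign << 15) | (exp_field << 10) | (q - 1024)

# 1.5 * 2.0 + 0.25, with operands given as raw 16-bit words
result = mac(0x3E00, 0x4000, 0x3400)
print(hex(result))
```

In hardware the alignment shift and the multiply run side by side, which is why C is always shifted relative to the product rather than the other way round; the model above performs them sequentially only because it is software.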


Block Diagram

[Figure: block diagram. Three 16-bit inputs feed RegArray A, RegArray B and RegArray C. Blocks shown: Multiplier, Exp Calc, Align, Control Logic & Sign Dtrmin, Adder/Subtractor, Leading 0 Anticipator, Normalize, Round, Ovf Checker, and the 16-bit output register RegY. Internal bus widths are not recoverable from the transcript.]


IMPLEMENTATION

• Implementation of each module: how and why we chose a particular method, keeping in mind the goal of speed (multiplier, adder)


Design Decisions (contd.):

• Multiplier Implementation

– 11 x 11 Carry-Save Multiplier

– Reasons:

• Fast because it avoids having ripple carry in every stage

• Enables Compact Layout
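
The idea behind the carry-save multiplier can be illustrated in a few lines of Python. This is a sketch of the general technique, not the team's circuit: each partial product is folded into a (sum, carry) pair by a row of independent full adders, so no carry ripples across the word until one final carry-propagate addition. The function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor: reduce three numbers to a sum word and a carry word.

    Every bit position is handled independently, so nothing ripples.
    Invariant: sum_word + carry_word == x + y + z.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word

def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned width-bit integers using carry-save accumulation."""
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product: a shifted by i if bit i of b is set, else zero.
        partial = (a << i) if (b >> i) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # A single carry-propagate addition resolves the final result.
    return sum_word + carry_word

product = carry_save_multiply(2047, 2047)  # largest 11-bit operands
print(product)
```

The 11-bit width matches the 10 fraction bits plus the hidden leading 1 of each 16-bit floating point significand.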


Design Process

• Verilog -> Schematic -> Layout

– Behavioral -> Structural Verilog

– Transistors/gates -> Full Schematic

– Gate/Component Layout -> Top Level

• Transistor count fluctuated from 20,200 to 12,800

• Major design decisions

– Decided against implementing denormal arithmetic because it would increase the complexity of the project beyond the scope of the class

– Round performed only once at the end.

– Picked nPass over Tgate in the normalize shifter

– Adder: variable length carry select -> Han-Carlson binary tree adder
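
For readers unfamiliar with the Han-Carlson structure, the Python sketch below models its prefix network bit by bit: one stage combines each odd bit with its even neighbor, Kogge-Stone style stages then run on the odd bits only, and a final stage fills in the even bits. It is a functional illustration of the general technique rather than the team's schematic; the default width of 16 and the function name are assumptions, since the slides do not state the adder width.

```python
def han_carlson_add(a, b, width=16):
    """Add two unsigned width-bit integers with a Han-Carlson prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]

    # Stage 1: each odd bit absorbs its even neighbor.
    for i in range(1, width, 2):
        G[i] = g[i] | (p[i] & g[i - 1])
        P[i] = p[i] & p[i - 1]

    # Kogge-Stone stages on odd bits only, at distances 2, 4, 8, ...
    d = 2
    while d < width:
        prev_G, prev_P = G[:], P[:]   # all cells in a stage update in parallel
        for i in range(d + 1, width, 2):
            G[i] = prev_G[i] | (prev_P[i] & prev_G[i - d])
            P[i] = prev_P[i] & prev_P[i - d]
        d *= 2

    # Final stage: even bits take the carry from the odd bit below them.
    for i in range(2, width, 2):
        G[i] = g[i] | (p[i] & G[i - 1])

    # Sum bit i is the bit propagate XOR the carry into bit i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]

total, carry_out = han_carlson_add(40000, 30000)
print(total, carry_out)
```

Compared with a full Kogge-Stone network, this arrangement uses roughly half the prefix cells at the cost of one extra logic level, which is one reason it is a common choice when both speed and area matter.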


VERIFICATION OF DESIGN

Verilog Simulations (show outputs)

– Overview

– How/Why it works

– Behavioral/Structural

Explain why we couldn’t get a high-level simulator and how we tested our Verilog design.


SCHEMATICS

• Show schematics of major blocks: adder, multiplier, and top-level

• HOW WE VERIFIED: analog simulation


Top Level Schematic


Multiplier Schematic


Adder Schematic


FLOORPLAN EVOLUTION

• Initial floorplan

• How it evolved (with animation)- why and how we changed it


Main Floorplan

[Figure: main floorplan. Blocks shown: Reg A, Reg B, Reg C, Multiplier, ExpCalc, Align C, Adder, Ld Zero, Normalize, Round and Reg Y, with Pipeline Regs between groups of blocks.]


Floorplan


Full Chip Layout

[Figure: full chip layout with labeled regions: Exponent, Align, Zero, Adder, Multiplier, Normalize, Round, Ovf.]


Pipelining

• Initially planned 5-6 pipeline stages

• Reduced to 4 pipeline stages – made possible by implementing fast carry lookahead adders in critical path modules (adder and multiplier)


Pipelining Stages

[Figure: floorplan annotated with pipeline stages. Blocks shown: Reg A, Reg B, Reg C, Multiplier, ExpCalc, Align C, Adder, Ld Zero, Normalize, Round, Overflow checker and Reg Y, separated by Pipeline Regs.]


LAYOUT

• Final Layout

• Layout of large blocks such as multiplier, adder and normalize


Layout Decisions

• 3 standard cell heights

• Uniform width vdd and ground rails

• Wider vdd and ground rails in power hungry modules

• Max of 8 flip flops per clock pulse generator

• Metal directionality


Multiplier Layout with pipelining


Adder Layout


Normalize Layout


FINAL LAYOUT


Design Specifications

• Worst case delay = 2.25ns

• Long buses are all buffered (not tested yet)

• Estimated clocking speed = 400MHz

• Height by width = 193.86 um * 301.545 um

• Area = 58,458 um^2

• Aspect ratio = 1:1.55

• Total transistor density = 0.22 transistors/um^2
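
The derived figures on this slide can be cross-checked from the raw numbers. The Python sketch below does so, taking the total transistor count of 12,730 from the area table later in the deck. The clock bound is simply the reciprocal of the worst case delay; the slides do not say how the 400 MHz estimate was chosen, so treating the difference as timing margin is an assumption.

```python
height_um = 193.86
width_um = 301.545
worst_case_delay_s = 2.25e-9
transistors = 12_730  # total transistor count reported in the area table

area_um2 = height_um * width_um                 # slide: 58,458 um^2
aspect_ratio = width_um / height_um             # slide: 1:1.55
density = transistors / area_um2                # slide: 0.22 transistors per um^2
max_clock_mhz = 1.0 / worst_case_delay_s / 1e6  # upper bound set by the delay

print(f"area         = {area_um2:,.0f} um^2")
print(f"aspect ratio = 1:{aspect_ratio:.3f}")
print(f"density      = {density:.2f} transistors/um^2")
print(f"clock bound  = {max_clock_mhz:.0f} MHz")
```

The reciprocal of 2.25 ns is roughly 444 MHz, so the 400 MHz estimate sits below the bound set by the worst case delay.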


Layout densities

• Active : 14.05%

• Poly : 9.25%

• Metal 1 : 33.89%

• Metal 2 : 18.00%

• Metal 3 : 14.99%

• Metal 4 : 6.29%


Layer Masks - Poly


Layer Masks – Metal 1


Layer Masks – Metal 2


Layer Masks – Metal 3


Layer Masks – Metal 4


Module          Schematic Power     Layout Power   Schematic Delay   Layout Delay
                (mW, at 350 MHz)    (mW)
Multiplier      2.97                N/A            3.38n             N/A
 -w/ pipeline   ??                  ??             1.9n              2.25n
Exponents       1.608               2.21           1.01n             1.2n
Align           0.094               0.113          480p              637p
Adder           8.48                9.73           1.34n             1.7n
Leading 0       0.232               0.857          506p              551p
Normalize       1.458               1.546          407p              437p
Round           0.631               1.21           864p              986p
OvfCheck        0.13                0.19           453p              475p
Registers       ??                  ??             179p              193p
Total           ??                  ??             -                 -


Module                      Area (um^2)   Transistor Count   Transistor Density
Multiplier (w/ pipeline)    20,388        4,496              0.22
Exponents                   5,163         738                0.14
Align                       3,995         500                0.13
Adder                       13,202        3,174              0.24
Leading 0                   1,253         364                0.29
Normalize                   3,190         942                0.3
Round                       1,802         494                0.28
OvfCheck                    200           70                 0.35
Registers, etc              N/A           1,948              N/A
Total                       58,458        12,730             0.22


Conclusion

• More marketing

• Summarize chip functionality

• Extending applications of chip


Comments?