Crosscutting Issues: The Rôle of Compilers
• Architects must be aware of current compiler technology
[Diagram: Compiler and Architecture]
Modern Compilers
• Front End
• High-level Optimisations (e.g. procedure inlining, loop transformations)
• Global Optimiser (register allocation)
• Code Generator (machine-dependent optimisations)
Compiler Technology
• Multiple passes complicate matters
  – E.g. common subexpression elimination must assume that a register will be allocated for the temporary value
  – E.g. procedure inlining before size is known
• Register allocation is critical
  – Uses graph colouring techniques
  – Requires at least 16 registers to be effective
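The graph colouring idea can be sketched as follows. This is a minimal greedy heuristic, not the allocator of any particular compiler: temporaries are nodes, an edge joins two temporaries that are live at the same time, and each register is a colour. The function name and the example interference graph are hypothetical.

```python
def colour_interference_graph(interference, num_registers):
    """Greedy colouring sketch: returns (assignment, spilled).

    interference maps each temporary to the set of temporaries
    that are live at the same time (its neighbours in the graph).
    """
    assignment = {}
    spilled = []
    # Colour the most constrained temporaries (highest degree) first.
    for temp in sorted(interference, key=lambda t: (-len(interference[t]), t)):
        taken = {assignment[n] for n in interference[temp] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[temp] = free[0]
        else:
            # No register left: this temporary would live in memory.
            spilled.append(temp)
    return assignment, spilled


# Hypothetical graph: a, b, c are mutually live; d overlaps only with c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
assignment, spilled = colour_interference_graph(graph, num_registers=3)
_, spilled_with_two = colour_interference_graph(graph, num_registers=2)
```

With three registers nothing is spilled; with two, one temporary no longer fits. The same effect on a larger scale is why too few registers make the technique ineffective.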
Architectural Issues
• How are variables allocated and addressed?
  – Stack: local variables, scalars
  – Global data area: global variables, constants, arrays
  – Heap: dynamic objects, not scalars
• How many registers are needed?
  – Integer: 26 registers
  – FP: 20 registers
Aiding Compiler Writers
• Architectures should:
  – Be regular (orthogonal instruction set)
  – Provide primitives, not solutions
  – Simplify trade-offs among alternatives
  – Not require run-time interpretation of data known at compile-time
    • VAX CALLS
Keep it simple!
Compiler Support for Multimedia Instructions
• SIMD instructions act on multiple smaller data items in a large “word”
  – Solutions, not primitives!
  – Too few registers!
  – Data types not found in programming languages!
Result: Only used by low-level graphics libraries.
Multimedia Instructions
• These SIMD instructions act like a “mini-vector” architecture
  – E.g. MMX in 64 bits
    • 8 × 8-bit vectors
    • 4 × 16-bit vectors
    • 2 × 32-bit vectors
  – SSE: 128 bits
  – Much more limited than genuine vector processors
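A partitioned add on the 8 × 8-bit layout can be modelled as below. This is a sketch of the idea only, not of any specific MMX instruction; the helper names and the lane ordering (lane 0 in the low byte) are assumptions made for illustration.

```python
LANE_BITS = 8
LANES = 8
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack eight 8-bit values into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def partitioned_add(x, y):
    """Add lane by lane with wrap-around; carries never cross a lane boundary."""
    sums = [(a + b) & LANE_MASK for a, b in zip(unpack_lanes(x), unpack_lanes(y))]
    return pack_lanes(sums)


x = pack_lanes([1, 2, 3, 4, 250, 6, 7, 8])
y = pack_lanes([10, 20, 30, 40, 10, 60, 70, 80])
result = unpack_lanes(partitioned_add(x, y))
```

Lane 4 overflows (250 + 10) and wraps to 4 without disturbing lane 5, which is the difference between a partitioned add and an ordinary 64-bit add.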
Putting It All Together: MIPS
• 64-bit load/store design
• RISC features:
  – GPR, load-store architecture
  – Small, simple instruction set
  – Designed for efficient pipelining (fixed length instructions)
  – Efficient compiler target
MIPS
• 32 64-bit integer registers
  – R0…R31
  – R0 fixed: 0
• 32 64-bit or 32-bit floating point registers
  – Supports “paired single” operations
MIPS Data Types
• Integer:
  – Bytes, 16-bit halfwords, 32-bit words, 64-bit double words
    • Operations are all 64-bit
• Floating point:
  – 32-bit and 64-bit
MIPS Addressing Modes
• Only immediate and displacement
  – 16-bit displacements/immediates
  – Register-indirect: set displacement = 0
  – 16-bit absolute: use R0
• Byte addressable with 64-bit addresses
• Big-endian or little-endian
• Alignment required
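The way one displacement mode covers register-indirect and absolute addressing can be sketched as below. The function names and register contents are hypothetical, and the sketch assumes the 16-bit displacement is sign-extended before being added to the base register.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(registers, base, displacement, access_size):
    """Base register plus sign-extended displacement, with an alignment check."""
    address = (registers[base] + sign_extend16(displacement)) & 0xFFFFFFFFFFFFFFFF
    if address % access_size != 0:
        raise ValueError(f"misaligned {access_size}-byte access at {address:#x}")
    return address


registers = [0] * 32      # R0 is never written, so it stays 0
registers[4] = 0x1000     # hypothetical base address in R4

displacement_mode = effective_address(registers, 4, 0x0010, 8)  # R4 + 16
register_indirect = effective_address(registers, 4, 0, 8)       # displacement = 0
absolute = effective_address(registers, 0, 0x0200, 4)           # base = R0
```

An access such as `effective_address(registers, 4, 3, 4)` raises `ValueError` in this model, mirroring the alignment requirement.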
MIPS Instructions
• Three instruction formats (field widths in bits, 32 bits in total):
  I-type:  opcode (6) | rs (5) | rt (5) | immediate (16)
  R-type:  opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
  J-type:  opcode (6) | offset (26)
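The three layouts can be checked by packing fields into a 32-bit word. The sketch below uses hypothetical helper names; the opcode and funct values in the examples (ADD: opcode 0, funct 0x20; ADDI: opcode 0x08; J: opcode 0x02) come from the classic 32-bit MIPS encoding rather than from the slides.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """opcode(6) | rs(5) | rt(5) | immediate(16)"""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)"""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_j_type(opcode, offset):
    """opcode(6) | offset(26)"""
    return (opcode << 26) | (offset & 0x3FFFFFF)


# add R8, R9, R10  (R-type)
add_word = encode_r_type(opcode=0, rs=9, rt=10, rd=8, shamt=0, funct=0x20)
# addi R8, R9, -1  (I-type; the immediate is stored as 16-bit two's complement)
addi_word = encode_i_type(opcode=0x08, rs=9, rt=8, immediate=-1)
# j with a 26-bit offset field of 0x100  (J-type)
jump_word = encode_j_type(opcode=0x02, offset=0x100)
```

Because every format is exactly 32 bits and the opcode, rs and rt fields sit in the same positions wherever they appear, decoding can begin before the format is fully known, which is part of what makes fixed-length instructions easy to pipeline.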
MIPS Operations
• Load-store
• ALU operations
  – Add, subtract, multiply, divide, and, or, xor, LUI (load upper immediate), shifts
• Control transfer
  – Set conditions
  – Branch (reg = 0, reg ≠ 0, reg1 = reg2, reg1 ≠ reg2), jump, jump-and-link (call)
  – Conditional move
• Floating point
  – Paired single operations
  – Multiply-add (DSP)
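LUI exists because immediates are only 16 bits wide: a larger constant is built in two steps, upper half first. The sketch below models this with a 32-bit unsigned value and an OR-immediate that zero-extends its operand; the function names are hypothetical and sign extension into a 64-bit register is deliberately ignored.

```python
def split_constant(value):
    """Split a 32-bit constant into its upper and lower 16-bit halves."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def lui(immediate):
    """Load upper immediate: place 16 bits in the upper half, zero the lower half."""
    return (immediate & 0xFFFF) << 16


def ori(register_value, immediate):
    """OR with a zero-extended 16-bit immediate."""
    return register_value | (immediate & 0xFFFF)


constant = 0x12345678
upper, lower = split_constant(constant)
rebuilt = ori(lui(upper), lower)
```

Two simple instructions replace one instruction with a 32-bit immediate, which would not fit the fixed 32-bit instruction length.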
MIPS: Instruction Usage
• Integer applications:
  – Load, add, branch, store, or, compare
• FP applications:
  – Add (int), load (int), load, multiply, add, store
Figure 2.34.
Another View: Trimedia Media Processor
• Embedded processor for multimedia applications
  – E.g. set-top boxes (decoders, etc.) and TVs
• Very different architecture
  – 128 32-bit registers (FP or int)
  – Partitioned (SIMD) instructions
  – 2’s complement and saturating arithmetic
  – VLIW architecture
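The difference between 2's complement and saturating arithmetic shows up only on overflow. A minimal sketch for signed 16-bit values follows; the width and the function names are assumptions chosen for illustration, not taken from the Trimedia instruction set.

```python
def wrapping_add16(a, b):
    """Two's-complement 16-bit add: overflow wraps around."""
    total = (a + b) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total


def saturating_add16(a, b):
    """Saturating 16-bit add: overflow clamps to the nearest representable value."""
    return max(-32768, min(32767, a + b))


wrapped = wrapping_add16(30000, 10000)
clamped = saturating_add16(30000, 10000)
```

For media data the clamped result is usually the useful one: an over-bright pixel or over-loud sample stays at the maximum instead of flipping to a large negative value.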
Trimedia: VLIW Approach
• Compiler can group up to five instructions for simultaneous execution
  – Must be independent
  – Use NOPs if there are insufficient independent instructions
    • Large program size
    • Trimedia uses memory compression
    • Programs are 2-3 times larger than MIPS (even with compression)!
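How NOP padding inflates code size can be seen with a toy bundler. This is a simplified in-order sketch, not the Trimedia compiler's scheduler: it assumes an instruction may join the current bundle unless it reads or writes a register already written in that bundle, and the instruction tuples are hypothetical.

```python
NOP = "nop"
ISSUE_WIDTH = 5  # up to five instructions per bundle


def pack_bundles(instructions):
    """Greedy in-order packing of (name, dest, sources) tuples into bundles."""
    bundles = []
    current, written = [], set()
    for name, dest, sources in instructions:
        depends = dest in written or any(s in written for s in sources)
        if current and (depends or len(current) == ISSUE_WIDTH):
            # Close the bundle, padding unused slots with NOPs.
            bundles.append(current + [NOP] * (ISSUE_WIDTH - len(current)))
            current, written = [], set()
        current.append(name)
        written.add(dest)
    if current:
        bundles.append(current + [NOP] * (ISSUE_WIDTH - len(current)))
    return bundles


program = [
    ("load r1", "r1", []),
    ("load r2", "r2", []),
    ("add r3", "r3", ["r1", "r2"]),
    ("mul r4", "r4", ["r3"]),
    ("load r5", "r5", []),
]
bundles = pack_bundles(program)
nop_count = sum(slot == NOP for bundle in bundles for slot in bundle)
```

Five useful instructions occupy three bundles, so two thirds of the slots hold NOPs. A real scheduler reorders instructions to do better, but dependent code still leaves slots empty.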
Fallacies and Pitfalls
• Pitfall: Designing a “high-level” instruction set to support HLLs
  – Seldom provides an exact match
  – Often too general (VAX CALLS)
Fallacies and Pitfalls
• Fallacy: There is such a thing as a typical program
  – Programs vary very significantly
• Pitfall: Designing an architecture to reduce code size without considering compilers
  – Compilers have much greater impact on code size
  – Start with densest compiled code
Fallacies and Pitfalls
• Pitfall: Expecting good compiled performance for DSPs
  – Hand-tuned assembler is faster and more compact
• Fallacy: An architecture with flaws cannot be successful
  – 80x86!
    • Segments, accumulators, stack-based FP
Fallacies and Pitfalls
• Fallacy: You can design a flawless architecture
  – All designs have trade-offs
    • VAX: code size more important than easy decoding
    • Early RISCs: delayed branches
    • Address space
2.15. Concluding Remarks
• 1960’s: Stack architectures
  – Matched the compiler technology of the day
• 1970’s: CISC era
  – Tried to support HLL features in hardware
• Today: RISC era
  – Simple, load-store architectures
Concluding Remarks
• Trends in the 1990’s:
  – Move to 64 bits
  – Conditional instructions
    • Eliminating branches
  – Optimisation of cache access (prefetch instructions)
  – Support for multimedia
  – Faster floating point
The Future
• Trend towards VLIW architectures
• Increased use of conditional execution
• Blending of general-purpose and DSP architectures
• Emulating 80x86 architecture