c-programming-optimization techniques class 1

Upload: jackharish

Post on 12-Jun-2015

Page 1: C-Programming-Optimization Techniques Class 1

Profiling Tools 1

Optimization Techniques

Session -1

Content

• Program optimization: introduction
• Optimization techniques for embedded systems development
• C for embedded systems

Session Objectives

• To learn the importance of program optimization
• To know the different optimization techniques for embedded systems design
• To understand why C is used for embedded system development

Introduction

The Problem

• PC speed has increased 500 times since 1981, but today's software is more complex and still hungry for more resources
• How can software run faster on the same hardware and OS architecture?
• Highly optimized applications can run tens of times faster than poorly written ones
• Using efficient algorithms and well-designed implementations leads to high-performance applications

Writing Fast Programs

• Use a fast algorithm: it does not make sense to optimize a bad algorithm
• Implement it efficiently: detect hotspots using a profiler and fix them
• Understanding of the target system architecture, such as the cache structure, is often required
• Use platform-specific compiler extensions: memory prefetching, cache-control instructions, branch prediction, SIMD instructions
• Write multithreaded applications
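
As an illustration of the platform-specific compiler extensions listed above, here is a minimal sketch using two GCC/Clang built-ins: `__builtin_expect` for branch-prediction hints and `__builtin_prefetch` for memory prefetching. It assumes a GCC-compatible compiler; the `unlikely()` macro, the function `sum_nonnegative()` and the prefetch distance `PREFETCH_AHEAD` are hypothetical names and values chosen for the example, not tuned settings.

```c
#include <stddef.h>

/* GCC/Clang extension, not ISO C. The macro name is a common convention. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical prefetch distance in elements; a real value must be tuned. */
#define PREFETCH_AHEAD 64

/* Sums the non-negative elements of a[0..n-1]. */
static long sum_nonnegative(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request data PREFETCH_AHEAD elements ahead: 0 = read access,
           1 = low temporal locality. The guard keeps the address in bounds. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);

        /* Tell the compiler negative values are rare, so it can lay out
           the common path as the fall-through. */
        if (unlikely(a[i] < 0))
            continue;
        s += a[i];
    }
    return s;
}
```

Both built-ins are hints only: the result is the same without them, and whether they help has to be confirmed with a profiler on the target.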

Writing Fast Programs

• Use good coding practices
• Use good data structures
• Apply appropriate optimization techniques
• Optimizing code takes time and reduces source code readability

Optimizing Embedded Software

• Embedded software often runs on processors with limited computation power, so optimizing the code becomes a necessity
• A program can generally be made either faster or smaller, but not both: an improvement in one of these areas can have a negative impact on the other
• It is up to the programmer to decide which of these improvements is most important
• Recommendation: reduce the size of your program

Optimizing For Program Size

Goal:
• Reduce the hardware cost of memory
• Reduce the power consumption of memory units

Two opportunities:
• Data
  • Reuse constants, variables, and data buffers in different parts of the code (requires careful verification of correctness)
  • Generate data using instructions
• Instructions
  • Avoid function inlining
  • Choose a CPU with compact instructions
  • Use specialized instructions where possible
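
The "generate data using instructions" idea can be sketched with byte parity: instead of storing a 256-entry lookup table, the parity is computed with three shifts and XORs. The function name `parity8()` is hypothetical, and whether this saves memory overall depends on how compact the target's instructions are.

```c
#include <stdint.h>

/* Returns 1 if x has an odd number of set bits, 0 otherwise.
   Folding the byte onto itself with XOR replaces a 256-byte
   parity table with a handful of instructions. */
static uint8_t parity8(uint8_t x)
{
    x ^= x >> 4;   /* fold high nibble onto low nibble */
    x ^= x >> 2;   /* fold remaining 4 bits down to 2 */
    x ^= x >> 1;   /* fold 2 bits down to 1 */
    return x & 1u; /* bit 0 now holds the parity of all 8 bits */
}
```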

Cost Of High Performance

Performance: Where To Look

“Maximize performance - who knows where to optimize and where not to optimize”

• Spend your time optimizing the portions of code where the most time is taken
• Run the compiled program to learn where it spends its time
• You may also profile the usage of other computational resources: space, power, I/O
• This resource usage is not easy to estimate by static analysis; it requires dynamic analysis
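
To make "run the compiled program to learn where it spends its time" concrete, here is a minimal timing sketch using the standard C `clock()` function. The routine `work()` and the helper `cpu_seconds()` are hypothetical; a manual stopwatch like this only measures what you wrap, which is why a profiler is the usual way to find the hotspots in the first place.

```c
#include <time.h>

/* Hypothetical routine standing in for one part of the program. */
static unsigned long work(unsigned long n)
{
    unsigned long s = 0;
    for (unsigned long i = 0; i < n; i++)
        s += i ^ (s >> 3);
    return s;
}

/* Returns the processor time, in seconds, used by `reps` calls of work(),
   as reported by clock(). Results go to a volatile object so the calls
   cannot be removed as dead code; an optimizing compiler may still reorder
   or simplify the work, so treat the number as a rough estimate. */
static double cpu_seconds(unsigned long n, int reps, volatile unsigned long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink = work(n + (unsigned long)r);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```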

Performance: Where To Look

Problem: You're given a program's source code (which someone else wrote) and asked to improve its performance by at least 20%.

Where do you begin?
• Look at the source code and try to find inefficient C code
• Try rewriting some of it in assembly
• Rewrite it using a different algorithm
• (Remove random portions of the code)

Performance: Where To Look

How can you figure out where a program is spending its time?
• Count every static instruction, to know which routines (functions) were the biggest
  • Big deal: large functions that aren't executed often don't really matter
• Count every dynamic instruction, to know which routines executed the most instructions
  • Excellent! It tells the “relative importance” of each function
  • But it doesn't account for the memory system
• Count how many cycles were spent in each routine, to know which routines took the most time

The Software Optimization Process

The process is an iterative cycle:
• Create benchmark
• Find hotspots
• Investigate causes
• Modify application
• Retest using benchmark, then repeat

Hotspots are areas in your code that take a long time to execute.

Extreme Optimization Pitfalls

• A large application's performance cannot be improved before it runs
• Build the application, then see what machine it runs on
• "Runs great on my computer…"
• Debug versus release builds
• Performance requires assembly language programming
• Code features first, then optimize if there is time left over

Key Point:

Software optimization doesn't begin where coding ends. It is an ongoing process that starts at the design stage and continues all the way through development.

90/10 Rule

• 90% of execution time is spent in 10% of the code
• So the ‘hot’ 10% is the code that must be optimized
• Optimization takes time but gives efficient code, so use it only for the 10%
• Simple interpretation is quick but gives slow code, so use it for the 90%
• Tradeoff: you need to get the balance right!

How To Find Performance Bottlenecks

Determine how the system resources are being utilized to identify system-level bottlenecks

Measure the execution time for each module and function in the application

Determine how the various modules running on the system affect the performance of each other

Identify the most time-consuming function calls and call sequences within the application

Determine how the application is executing at the processor level to identify microarchitecture-level performance problems

Improving Program Performance

• Compiler writers try to apply several standard optimizations, but do not always succeed
• Compiler writers sometimes apply aggressive optimizations, but are often not “informed” enough to know that a change will help rather than hurt
• Optimizations based on specific architecture/implementation characteristics can be very helpful
  • These are much harder for compiler writers, because they require multiple, generally very different, “back end” implementations

Improving Program Performance

How can one help?
• Better code, algorithms and data structures (of course)
• Reorganize code to help the compiler find opportunities for improvement
• Replace poorly optimized code with assembly code (i.e., bypass the compiler)

Writing Efficient C code

To write efficient C code, you must be aware of:
• The areas where the C compiler has to be conservative
• The limits of the processor architecture the C compiler is mapping to
• The limits of a specific C compiler, which depend on the compiler vendor: look at the compiler's documentation or experiment with the compiler

Performance Tools Overview

• Timing mechanisms
  • Stopwatch: UNIX time tool
• Optimizing compiler (the easy way)
• System load monitors
  • vmstat, iostat, perfmon.exe, VTune Counter
• Software profilers
  • gprof, VTune, Visual C++ Profiler, IBM Quantify
• Memory debuggers/profilers
  • Valgrind, IBM Purify, Parasoft Insure++

Optimization Techniques

• Bad memory management has serious impacts
  • Poor data locality causes high power dissipation
  • Poor memory throughput leads to poor performance
• Optimization techniques
  • Platform independent
    • Loop transformation
    • Data reuse
    • Processor partitioning

Optimization Techniques

• Architecture specific
  • Memory modeling optimization
    • Register allocation (graph coloring)
    • Custom memory architecture
  • Memory address generation
    • General compilers: generated addresses are periodic
    • Embedded systems: the address sequence might not be periodic

Optimization Techniques

The "scope" of the optimization:
• Local optimizations: performed in a part of one procedure
  • Common sub-expression elimination (e.g., those occurring when translating array indices to memory addresses)
  • Using registers for temporary results, and if possible for variables
  • Replacing multiplication and division by shift and add operations
• Global optimizations: performed with the help of data-flow analysis and split-lifetime analysis
  • Code motion (hoisting) outside of loops
  • Value propagation
  • Strength reductions
• Inter-procedural optimizations
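
A before/after sketch of three of the optimizations listed above: common sub-expression elimination, code motion out of a loop, and strength reduction (a multiplication replaced by a shift). An optimizing compiler normally performs these itself; the second function only writes them out by hand so the difference is visible. The function names are hypothetical.

```c
#include <stddef.h>

/* Before: row * cols is recomputed on every iteration even though it
   never changes inside the loop, and each element is multiplied by 8. */
static unsigned long sum_row_naive(const unsigned *m, size_t cols, size_t row)
{
    unsigned long s = 0;
    for (size_t j = 0; j < cols; j++)
        s += (unsigned long)m[row * cols + j] * 8u;
    return s;
}

/* After: the row address is computed once, outside the loop (common
   sub-expression elimination plus code motion), and the multiply by 8
   becomes a shift (strength reduction). For unsigned operands,
   x * 8 == x << 3 and x / 8 == x >> 3. */
static unsigned long sum_row_opt(const unsigned *m, size_t cols, size_t row)
{
    const unsigned *p = m + row * cols;   /* hoisted out of the loop */
    unsigned long s = 0;
    for (size_t j = 0; j < cols; j++)
        s += (unsigned long)p[j] << 3;
    return s;
}
```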

Optimization Techniques

What is improved in the optimization:
• Space optimizations: reduce the size of the executable/object
  • Constant pooling
  • Dead-code elimination
• Speed optimizations: most optimizations belong to this category

Optimization Techniques

There are important optimizations not covered above, e.g., the various loop transformations:
• Loop unrolling: full or partial transformation of a loop into straight-line code
• Loop blocking (tiling): minimizes cache misses by replacing each array-processing loop with two loops, dividing the "iteration space" into smaller "blocks"
• Loop interchange: changes the nesting order of loops; may make it possible to perform other transformations
• Loop distribution: replaces a loop by two (or more) equivalent loops
• Loop fusion: makes one loop out of two (or more)
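
Two of these loop transformations can be sketched in C: partial loop unrolling by a factor of 4, and loop interchange on a row-major 2-D array. The names and the `MAT_N` size are hypothetical, and any actual speed-up depends on the compiler and the cache; each pair of functions computes the same result.

```c
#include <stddef.h>

#define MAT_N 64   /* hypothetical matrix size for the example */

/* Partial unrolling: four elements per iteration, so the loop counter
   and branch are evaluated a quarter as often. */
static long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* remainder loop: 0 to 3 elements */
        s += a[i];
    return s;
}

/* Before interchange: the inner loop walks down a column, jumping
   MAT_N ints in memory on every step. */
static long sum_col_order(int m[MAT_N][MAT_N])
{
    long s = 0;
    for (size_t j = 0; j < MAT_N; j++)
        for (size_t i = 0; i < MAT_N; i++)
            s += m[i][j];
    return s;
}

/* After interchange: C stores arrays row by row, so making j the inner
   loop reads memory sequentially. */
static long sum_row_order(int m[MAT_N][MAT_N])
{
    long s = 0;
    for (size_t i = 0; i < MAT_N; i++)
        for (size_t j = 0; j < MAT_N; j++)
            s += m[i][j];
    return s;
}
```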

C Language In Embedded Systems

C Language In Embedded Systems

Several factors have contributed to the increased popularity of C in the embedded systems area:
• The ever-increasing complexity of applications drives programmers from assembly to high-level languages
• The high-level programming language C offers good support for high-speed, low-level I/O operations
  • Programmers of embedded applications particularly appreciate this mixed high/low-level approach
• In comparison to other high-level language compilers, C compilers tend to deliver more compact code

C Language In Embedded Systems

• Virtually all mathematical modeling tools generate C source code
• C offers significant productivity gains, with opportunities for code re-use, improved code maintenance, and ongoing development over the life of the application
• C can be written in a structured manner that reduces the chance of producing errors
• C can also be written in a very condensed manner, which is hard to comprehend and dramatically increases the likelihood of introducing errors

C Language In Embedded Systems

• The compiler does not necessarily detect small typing errors
  • Consider the operators &&, &, ||, |, +=, =, and ==, and think of the ease with which a typo will still lead to perfectly valid C code
• Not every programmer is fully aware of the effects of all the possible constructs in the C language
  • Casts (implicit or explicit) can cause both confusion and errors
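
A small sketch of how slips with these operators, and an implicit conversion, still produce valid C. The function names are hypothetical; each one compiles (some warnings only appear with options such as -Wall or -Wextra) but does not do what its name suggests.

```c
/* Intended: x == 0. Written: x = 0, which assigns 0 and then tests it,
   so the branch is never taken. The extra parentheses even silence
   GCC's -Wparentheses warning. */
static int is_zero_typo(int x)
{
    if ((x = 0))
        return 1;
    return 0;
}

/* Intended: a && b (both nonzero). Written: bitwise a & b,
   which is 0 for 1 & 2 even though both values are "true". */
static int both_set_typo(int a, int b)
{
    return (a & b) != 0;
}

/* Implicit conversion: comparing int with unsigned int converts the int
   to unsigned, so -1 becomes UINT_MAX and "-1 < 10u" is false. */
static int less_than_len(int i, unsigned int len)
{
    return i < len;
}
```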

C Language In Embedded Systems

• One of the main reasons that C compilers do a great job of generating compact, efficient code is the limited run-time checking in C
• There are no provisions in C that would prevent arithmetic exceptions such as divide by zero or overflow, invalid addresses or pointers, or overrunning array boundaries from causing a run-time software failure
• It is therefore easy to understand why programmers with a special interest in writing robust, consistent code have concerns about the C programming language
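
Because C does none of these checks at run time, robust code has to do them explicitly. A minimal sketch, with hypothetical helper names, guarding integer division, signed addition, and array indexing; each check costs a few instructions, which is exactly the overhead C leaves out by default.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Division that rejects the two undefined cases for int:
   divide by zero, and INT_MIN / -1 (the quotient overflows). */
static bool safe_div(int a, int b, int *q)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *q = a / b;
    return true;
}

/* Addition that detects signed overflow before it happens. */
static bool safe_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Array read with explicit null-pointer and bounds checks. */
static bool get_at(const int *arr, size_t len, size_t idx, int *out)
{
    if (arr == NULL || idx >= len)
        return false;
    *out = arr[idx];
    return true;
}
```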

C Language In Embedded Systems

• Many of the companies developing safety-related embedded applications have written guidelines that restrict the use of error-prone C constructs, with the intention of reducing the probability of errors
• The goal of these standards is to increase portability, reduce maintenance, and above all improve clarity
• A mixed coding style is harder to maintain than a bad coding style

C Language In Embedded Systems

• These standards recognize that individual programmers have the right to make judgments about how best to achieve the goal of code clarity
• All code should be ANSI standard C and should compile without warnings under at least its principal compiler
• Any warnings that cannot be eliminated should be commented in the code

Optimizing C Code

Help From The Compiler

• Always use compiler optimization settings to build an application for use with performance tools
• Understanding and using all the features of an optimizing compiler is required for maximum performance with the least effort
• Use a compiler that supports your CPU
• Avoid compiler optimization when debugging; compiler optimization may:
  • Cause certain variables to vanish
  • Prevent stepping through each line of the code
  • Make it impossible to place breakpoints freely
• Identify your machine to the compiler: gcc -march=athlon

Help From The Compiler

• Ask the compiler to unroll loops: gcc -funroll-loops, gcc -funroll-all-loops
• Ask the compiler to generate procedures inline: gcc -finline-functions
• Ask the compiler to generate conditional expressions in place of branches: gcc -O
• Use hand-tuned library calls for your platform
  • There is very little gain in optimizing the string copy function... someone already did this for you

GCC Optimization Levels

• -O0: don't optimize; reduce the cost of compilation and make debugging possible; only variables declared register are placed in registers
• -O1: basic optimizations for execution time and space reduction; only functions declared inline are expanded inline
• -O2: most optimization flags are turned on; the compiler optimizes variable register usage but does not do any space-speed trade-offs (i.e., no inlining)
• -O3: turns on all -O2 optimizations plus more aggressive ones; the compiler will attempt inlining for all compact functions; the generated code is larger than with -O2 but often only slightly faster

Optimizing Compiler: Choosing a Combination of Optimization Flags

Optimizing Compiler’s Effect

Helping The Compiler

Variables
• Avoid complicated pointer arithmetic; use array indexes
• Use aliases
• Use const and register where appropriate
• Use integer arithmetic in place of floating point
• Use local variables in place of function arguments
• Use word-sized variables if possible
• Avoid globals; use static variables as a last resort
• Avoid volatile unless you mean it
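
One way to read the "use local variables" and "avoid globals" advice: copy a global into a `const` local before a loop, so the compiler does not have to assume that every store through a pointer might modify it. A minimal sketch; `g_scale` and both functions are hypothetical.

```c
#include <stddef.h>

/* Hypothetical global with external linkage, for illustration only. */
int g_scale = 3;

/* out[] has the same type as g_scale, so a store to out[i] could change
   g_scale; the compiler must therefore reload it on every iteration. */
static void scale_from_global(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * g_scale;
}

/* The global is copied into a const local once, so it can stay in a
   register for the whole loop. Equivalent to the version above only if
   callers never pass &g_scale as out. */
static void scale_from_local(int *out, const int *in, size_t n)
{
    const int scale = g_scale;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * scale;
}
```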

Helping The Compiler

Functions
• Declare compact functions as inline
• Declare local functions as static
• Avoid function calls in tight and frequent loops
• Avoid indirect calls
• Avoid recursion, unless necessary
• Use __attribute__ ((noreturn))
• Use __attribute__ ((const))
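
A sketch of the function-level hints above, assuming GCC or Clang (`__attribute__` is a compiler extension, not ISO C). The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Compact helper used only in this file: static lets the compiler see
   every caller, and inline invites it to expand the body at each call. */
static inline int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* const: the result depends only on the arguments and no global memory
   is read, so repeated calls with the same argument may be merged. */
static __attribute__((const)) int square(int x)
{
    return x * x;
}

/* noreturn: control never comes back from fatal(), so the compiler
   need not generate a return path after calls to it. */
static __attribute__((noreturn)) void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Converts '0'..'9' to 0..9 and stops the program on anything else. */
static int parse_digit(char c)
{
    if (c < '0' || c > '9')
        fatal("not a digit");
    return c - '0';
}
```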

Helping The Compiler

Control flow
• Simple design will often prevent extra branches
• Fewer branches lead to more effective branch prediction
• Faster for loop
• If..else…
• Switch
• Loop breaking
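
A sketch of two of these control-flow ideas: a dense `switch` that a compiler may turn into a jump table or lookup table instead of a chain of comparisons, and breaking out of a loop as soon as the answer is known. Taking "loop breaking" to mean an early exit is an assumption, and the enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical command codes, for illustration only. */
enum cmd { CMD_READ, CMD_WRITE, CMD_ERASE, CMD_RESET };

/* Dense, contiguous case values give the compiler the option of a
   table lookup rather than up to four compare-and-branch steps. */
static int cmd_cost(enum cmd c)
{
    switch (c) {
    case CMD_READ:  return 1;
    case CMD_WRITE: return 4;
    case CMD_ERASE: return 16;
    case CMD_RESET: return 2;
    default:        return -1;
    }
}

/* Early exit: return as soon as key is found instead of scanning the
   rest of the array and testing a "found" flag afterwards. */
static ptrdiff_t find_first(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key)
            return (ptrdiff_t)i;
    }
    return -1;   /* not found */
}
```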

Helping The Compiler

Files
• Keep closely related functions together
• Little optimization is done (by ld) at the linking stage

Libraries
• Use functions best suited for the task
• memcpy can be faster than strcpy if you know the length
• puts is faster than printf
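
A sketch of the library advice above, with hypothetical function names: `memcpy()` when the string length is already known, and `puts()` for fixed text that needs no formatting.

```c
#include <stdio.h>
#include <string.h>

/* Copies a string whose length (excluding '\0') the caller already knows.
   strcpy() has to examine every byte to find the terminator; memcpy() is
   told the size up front and can copy in larger blocks. */
static void copy_known_len(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len + 1);   /* +1 copies the '\0' terminator too */
}

/* Fixed text needs no format parsing; puts() appends the newline. */
static void greet(void)
{
    puts("hello");
}
```

GCC commonly rewrites a call such as `printf("hello\n")` into `puts("hello")` on its own, so part of this gain may already be present in optimized builds.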

Summary

Software optimization doesn't begin where coding ends. It is an ongoing process that starts at the design stage and continues all the way through development.

• Optimization techniques
  • Platform independent
    • Loop transformation
    • Data reuse
    • Processor partitioning