premature optimisation workshop

ACCU 2015 PROJECT DATE CONFERENCE 23 APRIL PREMATURE OPTIMISATION WORKSHOP ARJAN VAN LEEUWEN

Upload: arjan-van-leeuwen

Post on 15-Jul-2015


Page 1: Premature optimisation workshop

ACCU 2015

PROJECT

DATE CONFERENCE
23 APRIL

PREMATURE OPTIMISATION WORKSHOP
ARJAN VAN LEEUWEN

Page 2: Premature optimisation workshop

WWW.OPERA.COM
JOIN THE COOL KIDS ON THE INFORMATION SUPERHIGHWAY

Page 4: Premature optimisation workshop

A short conversation

Page 5: Premature optimisation workshop

OPTIMISING IS FUN
AND KNOWING HOW TO DO IT CAN BE USEFUL

Page 6: Premature optimisation workshop

PREMATURE OPTIMISATION IS THE ROOT OF ALL EVIL

DONALD KNUTH, “STRUCTURED PROGRAMMING WITH GO TO STATEMENTS”

Page 7: Premature optimisation workshop

PROGRAMMERS WASTE ENORMOUS AMOUNTS OF TIME THINKING ABOUT, OR WORRYING ABOUT, THE SPEED OF NONCRITICAL PARTS OF THEIR PROGRAMS, AND THESE ATTEMPTS AT EFFICIENCY ACTUALLY HAVE A STRONG NEGATIVE IMPACT WHEN DEBUGGING AND MAINTENANCE ARE CONSIDERED.

WE SHOULD FORGET ABOUT SMALL EFFICIENCIES, SAY ABOUT 97% OF THE TIME: PREMATURE OPTIMISATION IS THE ROOT OF ALL EVIL.

YET WE SHOULD NOT PASS UP OUR OPPORTUNITIES IN THAT CRITICAL 3%.

Page 8: Premature optimisation workshop

IN ESTABLISHED ENGINEERING DISCIPLINES A 12% IMPROVEMENT, EASILY OBTAINED, IS NEVER CONSIDERED MARGINAL AND I BELIEVE THE SAME VIEWPOINT SHOULD PREVAIL IN SOFTWARE ENGINEERING.

Page 9: Premature optimisation workshop

SMALL THINGS CAN MAKE A DIFFERENCE
AND ARE WORTH STUDYING

Page 10: Premature optimisation workshop

Goals

Find small changes that can make a difference

Don’t sacrifice elegance for speed

Give ideas on how to optimise

Page 11: Premature optimisation workshop

In the toolbox

Common sense (doing nothing is always faster)

Disassembler

Time measurement

Profiling tools
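For the "time measurement" item, a minimal sketch of a timing helper is shown below. It assumes a C++11 compiler; the name MeasureMilliseconds and its parameters are hypothetical and are not taken from the workshop code.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// elapsed wall-clock time in milliseconds.
template <typename Work>
double MeasureMilliseconds(Work work, std::size_t repetitions) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repetitions; ++i)
    work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

std::chrono::steady_clock is used because it never jumps backwards. Repeating a measurement several times and comparing the runs helps to smooth out noise.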

Page 12: Premature optimisation workshop

MICRO-OPTIMISATIONS IN C++

Page 13: Premature optimisation workshop

C++

Close to the metal

Object model well-defined [Lippman96]

Efficiency has been a major design goal for C++ from the beginning

“You don’t pay for what you don’t use”

Benefits from years of C optimisation experience

Page 14: Premature optimisation workshop

Branches

Basis of much we do in imperative languages

Compare and branch

Page 15: Premature optimisation workshop

if-else-if

    void GetAndProcessResult() {
      if (GetResult() == DOWNLOADED)
        return ProcessDownloadedFile();
      else if (GetResult() == NEEDS_DOWNLOAD)
        return DownloadFile();
      else if (GetResult() == NOT_AVAILABLE)
        return ReportNotAvailable();
      else if (GetResult() == ERROR)
        return ReportError();
    }

Page 16: Premature optimisation workshop

if-else-if

    void GetAndProcessResult() {
      const int result = GetResult();
      if (result == DOWNLOADED)
        return ProcessDownloadedFile();
      else if (result == NEEDS_DOWNLOAD)
        return DownloadFile();
      else if (result == NOT_AVAILABLE)
        return ReportNotAvailable();
      else if (result == ERROR)
        return ReportError();
    }

Page 17: Premature optimisation workshop

if-else-if → switch!

    void GetAndProcessResult() {
      switch (GetResult()) {
        case DOWNLOADED: return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD: return DownloadFile();
        case NOT_AVAILABLE: return ReportNotAvailable();
        case ERROR: return ReportError();
      }
    }

Page 18: Premature optimisation workshop

The joys of switch

Clarifies intention

Clearer warnings / error messages

Lets the compiler generate a jump table or a binary search over the case values

O(1) lookups with a jump table, O(log n) with a binary search
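To make the O(1) claim concrete, here is a hand-written sketch of roughly what a compiler-generated jump table does. It is an illustration, not compiler output: the enum values, the RESULT_COUNT sentinel and the int return markers are assumptions added so that the dispatch can be observed.

```cpp
#include <cstddef>

// Hypothetical result codes, numbered contiguously from 0.
enum Result { DOWNLOADED, NEEDS_DOWNLOAD, NOT_AVAILABLE, ERROR, RESULT_COUNT };

// Stand-in handlers; each returns a marker so the dispatch is observable.
int ProcessDownloadedFile() { return 10; }
int DownloadFile() { return 20; }
int ReportNotAvailable() { return 30; }
int ReportError() { return 40; }

// One bounds check, then an indexed load and an indirect call: the cost
// does not depend on the number of cases.
int Dispatch(Result result) {
  using Handler = int (*)();
  static const Handler kTable[RESULT_COUNT] = {
      &ProcessDownloadedFile, &DownloadFile, &ReportNotAvailable, &ReportError};
  const std::size_t index = static_cast<std::size_t>(result);
  if (index >= RESULT_COUNT)
    return -1;  // no matching case
  return kTable[index]();
}
```

With a switch the compiler builds such a table by itself when the case values are dense enough, so there is rarely a reason to write it by hand.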

Page 19: Premature optimisation workshop

Jump table

    void GetAndProcessResult() {
      switch (GetResult()) {
        case DOWNLOADED: return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD: return DownloadFile();
        case NOT_AVAILABLE: return ReportNotAvailable();
        case ERROR: return ReportError();
      }
    }

Page 20: Premature optimisation workshop

Jump table

    void GetAndProcessResult() {
      switch (GetResult()) {
        case 0: return ProcessDownloadedFile();
        case 1: return DownloadFile();
        case 2: return ReportNotAvailable();
        case 3: return ReportError();
      }
    }

Page 21: Premature optimisation workshop

Jump table

case 0: return ProcessDownloadedFile();

case 1: return DownloadFile();

case 2: return ReportNotAvailable();

case 3: return ReportError();

Page 23: Premature optimisation workshop

Jump table

    void GetAndProcessResult() {
      switch (GetResult()) {
        case 102: return ProcessDownloadedFile();
        case 103: return DownloadFile();
        case 104: return ReportNotAvailable();
        case 105: return ReportError();
      }
    }

Page 24: Premature optimisation workshop

Jump table

    void GetAndProcessResult() {
      switch (GetResult()) {
        case 102+0: return ProcessDownloadedFile();
        case 102+1: return DownloadFile();
        case 102+2: return ReportNotAvailable();
        case 102+3: return ReportError();
      }
    }

Page 25: Premature optimisation workshop

Jump table?

    void GetAndProcessResult() {
      switch (GetResult()) {
        case 1: return ProcessDownloadedFile();
        case 16: return DownloadFile();
        case 88: return ReportNotAvailable();
        case 65536: return ReportError();
      }
    }

Page 26: Premature optimisation workshop

Jump table → Binary search

    void GetAndProcessResult() {
      switch (GetResult()) {
        case 1: return ProcessDownloadedFile();
        case 16: return DownloadFile();
        case 88: return ReportNotAvailable();
        case 65536: return ReportError();
      }
    }

Compilers are smart

Page 27: Premature optimisation workshop

Predicting branches

Predicting branches is hard

Automated mechanisms (profile-guided optimisations) can offer big gains at the cost of having to profile your build

If you’re very certain of your case, some compilers offer built-in hints such as __builtin_expect (gcc, clang)
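A minimal sketch of how such a hint could be used is shown below. The UNLIKELY macro and the SumValid function are hypothetical names; __builtin_expect itself is a GCC/Clang extension, so the macro falls back to the plain condition on other compilers.

```cpp
#include <cstddef>

// __builtin_expect is a GCC/Clang extension; elsewhere the hypothetical
// UNLIKELY macro is just the condition itself.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(condition) __builtin_expect(!!(condition), 0)
#else
#define UNLIKELY(condition) (condition)
#endif

// Sums the non-negative values; negative values are assumed to be rare
// and are counted as errors instead.
long SumValid(const int* values, std::size_t count, std::size_t* errors) {
  long sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (UNLIKELY(values[i] < 0)) {
      ++*errors;
      continue;
    }
    sum += values[i];
  }
  return sum;
}
```

The hint only changes how the compiler lays out the code, never the result. A wrong hint can make things slower, so it is worth measuring before and after.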

Page 28: Premature optimisation workshop

Strings

Most used and mis-used type in programming

Mutable strings are the root of all evil

Page 29: Premature optimisation workshop

Strings misuse

String is not a basic type

A mutable string is a dynamic array of characters

Almost anything you can do with a string is a function of the characters in that string

Think about what will happen with long strings

Page 30: Premature optimisation workshop

Using std::string

Be careful with modifying operations such as append()

Avoid building a string out of many parts; it is better to create it at once

Look into when alternative string types are useful

Page 31: Premature optimisation workshop

Growing strings

    std::string CopyString(const char* to_copy, size_t length) {
      std::string copied;

      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i, std::min(BLOCKSIZE, length - i));

      return copied;
    }

Page 32: Premature optimisation workshop

Growing strings

    std::string CopyString(const char* to_copy, size_t length) {
      std::stringstream copied;

      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.write(to_copy + i, std::min(BLOCKSIZE, length - i));

      return copied.str();
    }

Page 33: Premature optimisation workshop

Growing strings

    std::string CopyString(const char* to_copy, size_t length) {
      std::string copied;
      copied.reserve(length);

      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i, std::min(BLOCKSIZE, length - i));

      return copied;
    }

Page 34: Premature optimisation workshop

Growing strings

Method                                               Time spent, 3-run average (ms)
std::string::append()                                1399
std::stringstream                                    5102
std::string::append() with std::string::reserve()    851

Page 35: Premature optimisation workshop

Converting numbers to strings and vice versa

Can be a major source of slowness

Often more features than needed

Investigate alternative libraries (boost::spirit)

Writing specialised functions is a possibility (but with its own maintainability issues)
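As an illustration of the last point, a specialised integer-to-string function could look like the sketch below. IntToString is a hypothetical name and not part of the workshop code; it assumes int is at most 32 bits and supports only plain decimal output.

```cpp
#include <string>

// Hypothetical specialised conversion: decimal only, no locale, no padding.
std::string IntToString(int value) {
  char buffer[16];  // assumes int is at most 32 bits: 10 digits plus sign fit
  char* end = buffer + sizeof(buffer);
  char* pos = end;

  // Work on the unsigned magnitude so that INT_MIN does not overflow.
  unsigned magnitude = static_cast<unsigned>(value);
  if (value < 0)
    magnitude = 0u - magnitude;

  // Write the digits backwards, starting from the end of the buffer.
  do {
    *--pos = static_cast<char>('0' + magnitude % 10);
    magnitude /= 10;
  } while (magnitude != 0);

  if (value < 0)
    *--pos = '-';

  return std::string(pos, end);
}
```

The INT_MIN handling shows the maintainability cost: even a small hand-written conversion has edge cases that the library versions already take care of.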

Page 36: Premature optimisation workshop

Integer-to-string conversion

    std::string Convert(int i) {
      std::stringstream stream;
      stream << i;
      return stream.str();
    }

Page 37: Premature optimisation workshop

Integer-to-string conversion

    std::string Convert(int i) {
      return std::to_string(i);
    }

Page 38: Premature optimisation workshop

Integer-to-string conversion

    std::string Convert(int i) {
      namespace karma = boost::spirit::karma;
      std::string converted;
      std::back_insert_iterator<std::string> sink(converted);

      karma::generate(sink, karma::int_, i);
      return converted;
    }

Page 39: Premature optimisation workshop

Integer-to-string conversion

Method                  Time spent, 3-run average (ms)
std::stringstream       2959
std::to_string          1012
boost::spirit::karma    332

Page 40: Premature optimisation workshop

String-to-integer conversion

    int Convert(const std::string& str) {
      return std::stoi(str);
    }

Page 41: Premature optimisation workshop

String-to-integer conversion

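    int Convert(const std::string& str) {
      namespace qi = boost::spirit::qi;
      int converted = 0;

      // qi::parse takes the start iterator by reference and advances it,
      // so it needs a named iterator rather than a temporary.
      std::string::const_iterator first = str.begin();
      qi::parse(first, str.end(), qi::int_, converted);
      return converted;
    }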

Page 42: Premature optimisation workshop

String-to-integer conversion

Method               Time spent, 3-run average (ms)
std::stoi            3920
boost::spirit::qi    1276

Page 43: Premature optimisation workshop

Function calls

Function calls have overhead

Lookup in virtual function table (for virtual functions)

Setting up stack, restoring stack

Page 44: Premature optimisation workshop

Avoiding virtual functions or virtual function calls

Only declare functions (this includes destructors) virtual when it’s actually needed

Don’t use virtual functions for types that are handled by value

If the exact type is known at compile time, no lookup is needed

Sometimes compile-time polymorphism offers an alternative
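A minimal sketch of compile-time polymorphism is given below. The Square and Rectangle types and the TotalArea function are hypothetical examples, not taken from the workshop material.

```cpp
#include <vector>

// Two hypothetical shape types with the same interface but no common base
// class and no virtual functions.
struct Square {
  double side;
  double Area() const { return side * side; }
};

struct Rectangle {
  double width;
  double height;
  double Area() const { return width * height; }
};

// The template is instantiated per type, so each call to Area() is resolved
// at compile time and can be inlined; no vtable lookup is involved.
template <typename Shape>
double TotalArea(const std::vector<Shape>& shapes) {
  double total = 0.0;
  for (const Shape& shape : shapes)
    total += shape.Area();
  return total;
}
```

The trade-off is that one container can then hold only one concrete type. Mixing types in a single container still needs runtime dispatch of some kind.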

Page 45: Premature optimisation workshop

Avoiding function calls

For small functions called in tight loops, inlining helps

Allow the compiler to inline functions where it makes sense (have definition available)

If the compiler doesn’t co-operate and you’re sure it makes sense (measure this), force it
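Forcing inlining is compiler-specific. The sketch below wraps the GCC/Clang and MSVC extensions in a hypothetical FORCE_INLINE macro; Clamp and SumClamped are made-up example functions.

```cpp
// Hypothetical FORCE_INLINE macro wrapping compiler-specific extensions;
// standard C++ offers only the `inline` hint.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // plain hint only
#endif

FORCE_INLINE int Clamp(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

// Tight loop calling the small function; with the definition visible the
// compiler would usually inline it anyway.
long SumClamped(const int* values, int count) {
  long sum = 0;
  for (int i = 0; i < count; ++i)
    sum += Clamp(values[i], 0, 255);
  return sum;
}
```

Checking the disassembly before and after shows whether the attribute changed anything at all.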

Page 46: Premature optimisation workshop

Tail calls

A tail call happens when a function call is the last thing another function does before returning

Tail calls can be eliminated, so that they end up being a jump construction

Eliminates call overhead

Be aware of this and create tail calls where possible

Also allows efficient recursive functions

Page 47: Premature optimisation workshop

Facilitating tail calls

    unsigned djb_hash(const char* string) {
      int c = *string;
      if (!c)
        return 5381;

      // Not a tail call: the multiplication and addition happen after the call.
      return djb_hash(string + 1) * 33 + c;
    }

Page 48: Premature optimisation workshop

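Facilitating tail calls

    // Call with seed 5381 to start from the same initial value.
    unsigned djb_hash(const char* string, unsigned seed) {
      int c = *string;
      if (!c)
        return seed;

      // Tail call: nothing is left to do after the recursive call returns.
      return djb_hash(string + 1, seed * 33 + c);
    }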

Page 49: Premature optimisation workshop

Facilitating tail calls

Method                                Time spent, 3-run average (ms)
Tail call elimination not possible    2274
Tail call elimination possible        1097
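To show what tail-call elimination amounts to, the sketch below puts the accumulator version next to an equivalent loop. djb_hash_loop is a hypothetical name added for comparison; the loop is written by hand and is not actual compiler output.

```cpp
// Accumulator version: the recursive call is a tail call.
unsigned djb_hash(const char* string, unsigned seed) {
  int c = *string;
  if (!c)
    return seed;
  return djb_hash(string + 1, seed * 33 + c);
}

// What tail-call elimination effectively turns it into: the call becomes a
// jump back to the top, with the arguments updated in place.
unsigned djb_hash_loop(const char* string, unsigned seed) {
  for (; *string; ++string)
    seed = seed * 33 + *string;
  return seed;
}
```

Both functions return the same value for the same input. Whether the recursive one is actually turned into a loop depends on the compiler and the optimisation level.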

Page 50: Premature optimisation workshop

Use lambda functions

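Each C++11 lambda has its own type, so the compiler knows the call target and can usually inline it; a call through a function pointer is inlined only if the compiler can prove where it points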

Offers an elegant and fast way of processing data

Combines well with aggregate functions
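As a sketch of the last point, lambdas can be passed straight to the standard algorithms. SumOfSquares and CountAbove are hypothetical example functions.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sum of squares: the lambda is passed as its own closure type, so the
// compiler knows exactly which code std::accumulate will call.
int SumOfSquares(const std::vector<int>& values) {
  return std::accumulate(values.begin(), values.end(), 0,
                         [](int sum, int value) { return sum + value * value; });
}

// Counts the values above a threshold; the lambda captures the threshold.
int CountAbove(const std::vector<int>& values, int threshold) {
  return static_cast<int>(
      std::count_if(values.begin(), values.end(),
                    [threshold](int value) { return value > threshold; }));
}
```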

Page 51: Premature optimisation workshop

Use lambda functions

    void twice(int& value) { value *= 2; }

    std::vector<int> EverythingTwice(const std::vector<int>& original) {
      std::vector<int> result(original);
      std::for_each(result.begin(), result.end(), &twice);
      return result;
    }

Page 52: Premature optimisation workshop

Use lambda functions

    std::vector<int> EverythingTwice2(const std::vector<int>& original) {
      std::vector<int> result(original);
      std::for_each(result.begin(), result.end(),
                    [](int& value) { value *= 2; });
      return result;
    }

Page 53: Premature optimisation workshop

Use lambda functions

Method                            Time spent, 3-run average (ms)
Function pointer (not inlined)    1684
Lambda function (inlined)         220

Page 54: Premature optimisation workshop

Return-value optimisation

Allows the compiler to avoid copy construction of the returned object by building it directly in the caller's storage

Applied by compilers when a function returns a temporary or one named variable

Be aware of where it could be possible, allow the compiler to help you

But sometimes it’s more helpful to implement…
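Before moving on, here is a sketch of the "one named variable" case from this slide. Tracked and MakeTracked are hypothetical names; the copy counter exists only to make the behaviour observable.

```cpp
#include <string>
#include <utility>

// Hypothetical type that counts how often it is copied.
struct Tracked {
  static int copies;
  std::string content;

  explicit Tracked(std::string text) : content(std::move(text)) {}
  Tracked(const Tracked& other) : content(other.content) { ++copies; }
  Tracked(Tracked&& other) noexcept : content(std::move(other.content)) {}
};
int Tracked::copies = 0;

// One named local, returned on every path: a candidate for named
// return-value optimisation, constructing `result` directly in the caller.
Tracked MakeTracked() {
  Tracked result("built in place");
  result.content += "!";
  return result;
}
```

After `Tracked t = MakeTracked();` the copy counter stays at zero. If the compiler does not elide the copy, a returned local is moved rather than copied, which is where move semantics come in.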

Page 55: Premature optimisation workshop

Move semantics

User defines for movable types how they can be moved correctly

‘Guaranteed’ way of getting the benefit of return value optimisation: if the copy is not elided, the returned object is moved instead of copied

Helpful in combination with std::vector (to keep data local)

Can come for free using “Rule of zero”

Page 56: Premature optimisation workshop

Move semantics

    class Typical {
    public:
      Typical() : content_("this is a typical string") {}
      Typical(const Typical& other) : content_(other.content_) {}

    private:
      std::string content_;
    };

Page 57: Premature optimisation workshop

Move semantics

    class TypicalMove {
    public:
      TypicalMove() : content_("this is a typical string") {}

    private:
      std::string content_;
    };

Page 58: Premature optimisation workshop

Move semantics

    std::vector<Typical> CreateTypical() {
      std::vector<Typical> new_content;
      for (int i = 0; i < 1024; ++i)
        new_content.push_back(Typical());

      return new_content;
    }

Page 59: Premature optimisation workshop

Move semantics

Method                      Time spent, 3-run average (ms)
With copy constructor       2617
Following “Rule of zero”    1002
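The difference can be explained with a small sketch: declaring a copy constructor suppresses the implicitly generated move constructor, so "moves" silently become copies. Probe, WithCopyConstructor and RuleOfZero are hypothetical types used only to make this visible.

```cpp
#include <utility>

// Hypothetical member type that records how it was last constructed.
struct Probe {
  bool was_moved = false;
  Probe() = default;
  Probe(const Probe&) : was_moved(false) {}
  Probe(Probe&&) noexcept : was_moved(true) {}
};

// User-declared copy constructor: the implicit move constructor is not
// generated, so "moving" silently falls back to copying.
struct WithCopyConstructor {
  WithCopyConstructor() = default;
  WithCopyConstructor(const WithCopyConstructor& other) : probe(other.probe) {}
  Probe probe;
};

// Rule of zero: no special members declared, so the compiler generates
// both copy and move operations member by member.
struct RuleOfZero {
  Probe probe;
};
```

Constructing a WithCopyConstructor from std::move(a) leaves was_moved false in the new object, while doing the same with RuleOfZero sets it to true.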

Page 60: Premature optimisation workshop

Data

Make sure that all data you need in a loop is physically as close together as possible

Allows CPU to use its cache efficiently

Use contiguous memory arrays where possible

Avoid data structures that rely on pointers (e.g. linked lists)

Page 61: Premature optimisation workshop

Data

    int sum() {
      std::forward_list<int> data(1024, 5);
      int result = 0;
      for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(data.begin(), data.end(), 0);
      }
      return result;
    }

Page 62: Premature optimisation workshop

Data

    int sum() {
      std::vector<int> data(1024, 5);
      int result = 0;
      for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(data.begin(), data.end(), 0);
      }
      return result;
    }

Page 63: Premature optimisation workshop

Data

Method               Time spent, 3-run average (ms)
std::forward_list    1115
std::vector          61

Page 64: Premature optimisation workshop

MICRO-OPTIMISATIONS IN PYTHON

Page 65: Premature optimisation workshop

Python

Emphasises readability

Dynamic type system, automatic memory management

Several projects dedicated to improving performance

Always try to avoid calling functions many times

Page 66: Premature optimisation workshop

Prefer literals over “constructors”

def a(): return dict(firstkey=1, secondkey=2)

Page 67: Premature optimisation workshop

Prefer literals over “constructors”

def a(): return dict(firstkey=1, secondkey=2)

def b(): return { 'firstkey': 1, 'secondkey': 2 }

Page 68: Premature optimisation workshop

Prefer literals over “constructors”

Method                 Time spent, 3-run minimum (ms)
dict()                 376
Dictionary literals    135

Page 69: Premature optimisation workshop

Prefer slice notation over “copy constructor”

l = [ 'a', 'b', 'c', 'd', 'e', 'f' ]

def a(): return list(l)

Page 70: Premature optimisation workshop

Prefer slice notation over “copy constructor”

l = [ 'a', 'b', 'c', 'd', 'e', 'f' ]

def a(): return list(l)

def b(): return l[:]

Page 71: Premature optimisation workshop

Prefer slice notation over “copy constructor”

Method             Time spent, 3-run minimum (ms)
Copy via list()    2671
Slice notation     1679

Page 72: Premature optimisation workshop

All functions have overhead

Function call overhead in Python is substantial

All functions can be redefined - even built-ins need to be looked up first

Try to avoid function calls (even more so than in C++)

Using literals or other built-in constructs can help avoid function calls

Page 73: Premature optimisation workshop

String formatting

Python has a built-in function str() to convert other types to string

In most cases this offers enough features for conversions of types to strings

Faster than formatting

Page 74: Premature optimisation workshop

String formatting

    def a():
        a = 5
        b = 2
        c = 3
        return "%d" % (a*(b+c))

Page 75: Premature optimisation workshop

String formatting

    def a():
        a = 5
        b = 2
        c = 3
        return "%d" % (a*(b+c))

    def b():
        a = 5
        b = 2
        c = 3
        return str(a*(b+c))

Page 76: Premature optimisation workshop

String formatting

    def a():
        a = 5
        b = 2
        c = 3
        return "%d" % (a*(b+c))

    def c():
        a = 5
        b = 2
        c = 3
        return "%s" % (a*(b+c))

Page 77: Premature optimisation workshop

String formatting

Method    Time spent, 3-run minimum (ms)
“%d”      514
str()     260
“%s”      233

Page 78: Premature optimisation workshop

Prefer aggregate functions

    def a():
        s = 0
        for i in range(50000):
            s += i
        return s

Page 79: Premature optimisation workshop

Prefer aggregate functions

    def a():
        s = 0
        for i in range(50000):
            s += i
        return s

    def b():
        return sum(range(50000))

Page 80: Premature optimisation workshop

Prefer aggregate functions

Method              Time spent, 3-run minimum (ms)
Summing manually    1728
Using sum()         587

Page 81: Premature optimisation workshop

Prefer aggregate functions

Python has a number of built-in functions for aggregates: all(), min(), max(), sum(), etc

Using them brings big speed advantages

Always preferred over manually iterating

Page 82: Premature optimisation workshop

Use list comprehensions

    def a():
        l = []
        for i in range(1000):
            l.append(i)
        return l

Page 83: Premature optimisation workshop

Use list comprehensions

    def a():
        l = []
        for i in range(1000):
            l.append(i)
        return l

    def b():
        return [i for i in range(1000)]

Page 84: Premature optimisation workshop

Use list comprehensions

Method                Time spent, 3-run minimum (ms)
Append to list        701
List comprehension    321

Page 85: Premature optimisation workshop

Use list comprehensions

List comprehensions offer a concise way of creating lists

Speed as well as readability advantages

Can be nested as well!

Page 86: Premature optimisation workshop

Don’t use optimisations from other languages

    def a():
        x = 1
        for i in range(1000):
            x = x + x
        return x

Page 87: Premature optimisation workshop

Don’t use optimisations from other languages

    def b():
        x = 1
        for i in range(1000):
            x = x * 2
        return x

Page 88: Premature optimisation workshop

Don’t use optimisations from other languages

    def c():
        x = 1
        for i in range(1000):
            x = x << 1
        return x

Page 89: Premature optimisation workshop

Don’t use optimisations from other languages

Method    Time spent, 3-run minimum (ms)
x + x     736
x * 2     1001
x << 1    1342

Page 90: Premature optimisation workshop

LET’S TRY IT
PREPARE YOUR LAPTOPS!

Page 91: Premature optimisation workshop

PYTHON: WWW.CYBER-DOJO.ORG

E94905

C++: git clone https://github.com/avl7771/premature_optimization.git

Page 92: Premature optimisation workshop

Conclusions

Optimising is fun!

Knowledge about optimisations can help you help your compiler or interpreter

Not all optimisations worsen maintainability

Micro-optimisations can differ between languages, compilers, architectures…

Measuring works!

Test your assumptions

Page 93: Premature optimisation workshop

ARJAN VAN LEEUWEN
@AVL7771