premature optimisation workshop
PROJECT: PREMATURE OPTIMISATION WORKSHOP
CONFERENCE: ACCU 2015
DATE: 23 APRIL
ARJAN VAN LEEUWEN
WWW.OPERA.COM
JOIN THE COOL KIDS ON THE INFORMATION SUPERHIGHWAY
A short conversation
OPTIMISING IS FUN
AND KNOWING HOW TO DO IT CAN BE USEFUL
PREMATURE OPTIMISATION IS THE ROOT OF ALL EVIL
DONALD KNUTH, “STRUCTURED PROGRAMMING WITH GOTO STATEMENTS”
“PROGRAMMERS WASTE ENORMOUS AMOUNTS OF TIME THINKING ABOUT, OR WORRYING ABOUT, THE SPEED OF NONCRITICAL PARTS OF THEIR PROGRAMS, AND THESE ATTEMPTS AT EFFICIENCY ACTUALLY HAVE A STRONG NEGATIVE IMPACT WHEN DEBUGGING AND MAINTENANCE ARE CONSIDERED.
WE SHOULD FORGET ABOUT SMALL EFFICIENCIES, SAY ABOUT 97% OF THE TIME: PREMATURE OPTIMISATION IS THE ROOT OF ALL EVIL.
YET WE SHOULD NOT PASS UP OUR OPPORTUNITIES IN THAT CRITICAL 3%.”
“IN ESTABLISHED ENGINEERING DISCIPLINES A 12% IMPROVEMENT, EASILY OBTAINED, IS NEVER CONSIDERED MARGINAL AND I BELIEVE THE SAME VIEWPOINT SHOULD PREVAIL IN SOFTWARE ENGINEERING.”
SMALL THINGS CAN MAKE A DIFFERENCE
AND ARE WORTH STUDYING
Goals
Find small changes that can make a difference
Don’t sacrifice elegance for speed
Give ideas on how to optimise
In the toolbox
Common sense (doing nothing is always faster)
Disassembler
Time measurement
Profiling tools
MICRO-OPTIMISATIONS IN C++
C++
Close to the metal
Object model well-defined [Lippman96]
Efficiency has been a major design goal for C++ from the beginning
“You don’t pay for what you don’t use”
Benefits from years of C optimisation experience
Branches
Basis of much we do in imperative languages
Compare and branch
if-else-if
    void GetAndProcessResult() {
      if (GetResult() == DOWNLOADED)
        return ProcessDownloadedFile();
      else if (GetResult() == NEEDS_DOWNLOAD)
        return DownloadFile();
      else if (GetResult() == NOT_AVAILABLE)
        return ReportNotAvailable();
      else if (GetResult() == ERROR)
        return ReportError();
    }
if-else-if
    void GetAndProcessResult() {
      const int result = GetResult();
      if (result == DOWNLOADED)
        return ProcessDownloadedFile();
      else if (result == NEEDS_DOWNLOAD)
        return DownloadFile();
      else if (result == NOT_AVAILABLE)
        return ReportNotAvailable();
      else if (result == ERROR)
        return ReportError();
    }
if-else-if → switch!
    void GetAndProcessResult() {
      switch (GetResult()) {
        case DOWNLOADED: return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD: return DownloadFile();
        case NOT_AVAILABLE: return ReportNotAvailable();
        case ERROR: return ReportError();
      }
    }
The joys of switch
Clarifies intention
Clearer warnings / error messages
Always allows the compiler to create a jump table or do a binary search
O(1) lookups when a jump table is used
Jump table
    void GetAndProcessResult() {
      switch (GetResult()) {
        case DOWNLOADED: return ProcessDownloadedFile();
        case NEEDS_DOWNLOAD: return DownloadFile();
        case NOT_AVAILABLE: return ReportNotAvailable();
        case ERROR: return ReportError();
      }
    }
Jump table
    void GetAndProcessResult() {
      switch (GetResult()) {
        case 0: return ProcessDownloadedFile();
        case 1: return DownloadFile();
        case 2: return ReportNotAvailable();
        case 3: return ReportError();
      }
    }
Jump table
case 0: return ProcessDownloadedFile();
case 1: return DownloadFile();
case 2: return ReportNotAvailable();
case 3: return ReportError();
Jump table
    void GetAndProcessResult() {
      switch (GetResult()) {
        case 102: return ProcessDownloadedFile();
        case 103: return DownloadFile();
        case 104: return ReportNotAvailable();
        case 105: return ReportError();
      }
    }
Jump table
    void GetAndProcessResult() {
      switch (GetResult()) {
        case 102+0: return ProcessDownloadedFile();
        case 102+1: return DownloadFile();
        case 102+2: return ReportNotAvailable();
        case 102+3: return ReportError();
      }
    }
Jump table?
    void GetAndProcessResult() {
      switch (GetResult()) {
        case 1: return ProcessDownloadedFile();
        case 16: return DownloadFile();
        case 88: return ReportNotAvailable();
        case 65536: return ReportError();
      }
    }
Jump table → Binary search
    void GetAndProcessResult() {
      switch (GetResult()) {
        case 1: return ProcessDownloadedFile();
        case 16: return DownloadFile();
        case 88: return ReportNotAvailable();
        case 65536: return ReportError();
      }
    }
Compilers are smart
Predicting branches
Predicting branches is hard
Automated mechanisms (profile-guided optimisations) can offer big gains at the cost of having to profile your build
If you’re very certain of your case, some compilers offer built-in hints such as __builtin_expect (gcc, clang)
Strings
Most used and mis-used type in programming
Mutable strings are the root of all evil
Strings misuse
String is not a basic type
A mutable string is a dynamic array of characters
Almost anything you can do with a string is a function of the characters in that string
Think about what will happen with long strings
Using std::string
Be careful with modifying operations such as append()
Avoid building a string out of many parts; it is better to create it at once
Look into when alternative string types are useful
Growing strings
    std::string CopyString(const char* to_copy, size_t length) {
      std::string copied;
      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i, std::min(BLOCKSIZE, length - i));
      return copied;
    }
Growing strings
    std::string CopyString(const char* to_copy, size_t length) {
      std::stringstream copied;
      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.write(to_copy + i, std::min(BLOCKSIZE, length - i));
      return copied.str();
    }
Growing strings
    std::string CopyString(const char* to_copy, size_t length) {
      std::string copied;
      copied.reserve(length);
      for (size_t i = 0; i < length; i += BLOCKSIZE)
        copied.append(to_copy + i, std::min(BLOCKSIZE, length - i));
      return copied;
    }
Growing strings
Method Time spent, 3 run average (ms)
std::string::append() 1399
std::stringstream 5102
std::string::append() with std::string::reserve() 851
Converting numbers to strings and vice versa
Can be a major source of slowness
Often more features than needed
Investigate alternative libraries (boost::spirit)
Writing specialised functions is a possibility (but with its own maintainability issues)
Integer-to-string conversion
    std::string Convert(int i) {
      std::stringstream stream;
      stream << i;
      return stream.str();
    }
Integer-to-string conversion
std::string Convert(int i) { return std::to_string(i); }
Integer-to-string conversion
    std::string Convert(int i) {
      namespace karma = boost::spirit::karma;
      std::string converted;
      std::back_insert_iterator<std::string> sink(converted);
      karma::generate(sink, karma::int_, i);
      return converted;
    }
Integer-to-string conversion
Method Time spent, 3 run average (ms)
std::stringstream 2959
std::to_string 1012
boost::spirit::karma 332
String-to-integer conversion
int Convert(const std::string& str) { return std::stoi(str); }
String-to-integer conversion
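    int Convert(const std::string& str) {
      namespace qi = boost::spirit::qi;
      int converted = 0;  // initialised in case nothing is parsed
      // qi::parse takes its first iterator by reference, so it needs an lvalue
      std::string::const_iterator first = str.begin();
      qi::parse(first, str.end(), qi::int_, converted);
      return converted;
    }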
String-to-integer conversion
Method Time spent, 3 run average (ms)
std::stoi 3920
boost::spirit::qi 1276
Function calls
Function calls have overhead
Lookup in virtual function table
Setting up stack, restoring stack
Avoiding virtual functions or virtual function calls
Only declare functions (this includes destructors) virtual when it’s actually needed
Don’t use virtual functions for types that are handled by value
If type is known, no lookup is needed
Sometimes compile-time polymorphism offers an alternative
Avoiding function calls
For small functions called in tight loops, inlining helps
Allow the compiler to inline functions where it makes sense (have definition available)
If the compiler doesn’t co-operate and you’re sure it makes sense (measure this), force it
Tail calls
A tail call happens when a function call is the last thing another function does
Tail calls can be eliminated, so that the call becomes a plain jump
Eliminates call overhead
Be aware of this and create tail calls where possible
Also allows efficient recursive functions
Facilitating tail calls
    unsigned djb_hash(const char* string) {
      int c = *string;
      if (!c)
        return 5381;
      return djb_hash(string + 1) * 33 + c;
    }
Facilitating tail calls
    unsigned djb_hash(const char* string, unsigned seed) {
      int c = *string;
      if (!c)
        return seed;
      return djb_hash(string + 1, seed * 33 + c);
    }
Facilitating tail calls
Method Time spent, 3 run average (ms)
Tail call elimination not possible 2274
Tail call elimination possible 1097
Use lambda functions
C++11 lambdas have a unique type, so the compiler sees the call target and can usually inline them, unlike calls through function pointers
Offers an elegant and fast way of processing data
Combines well with aggregate functions
Use lambda functions
    void twice(int& value) { value *= 2; }
    std::vector<int> EverythingTwice(const std::vector<int>& original) {
      std::vector<int> result(original);
      std::for_each(result.begin(), result.end(), &twice);
      return result;
    }
Use lambda functions
    std::vector<int> EverythingTwice2(const std::vector<int>& original) {
      std::vector<int> result(original);
      std::for_each(result.begin(), result.end(),
                    [](int& value){ value *= 2; });
      return result;
    }
Use lambda functions
Method Time spent, 3 run average (ms)
Function pointer (not inlined) 1684
Lambda function (inlined) 220
Return-value optimisation
Allows the compiler to avoid copy construction on temporaries
Executed by compilers when function returns one named variable
Be aware of where it could be possible, allow the compiler to help you
But sometimes it’s more helpful to implement…
Move semantics
User defines for movable types how they can be moved correctly
‘Guaranteed’ way of getting return value optimisation
Helpful in combination with std::vector (to keep data local)
Can come for free using “Rule of zero”
Move semantics
    class Typical {
     public:
      Typical() : content_("this is a typical string") {}
      Typical(const Typical& other) : content_(other.content_) {}
     private:
      std::string content_;
    };
Move semantics
    class Typical {
     public:
      // No user-declared copy constructor, so the compiler also generates moves
      Typical() : content_("this is a typical string") {}
     private:
      std::string content_;
    };
Move semantics
    std::vector<Typical> CreateTypical() {
      std::vector<Typical> new_content;
      for (int i = 0; i < 1024; ++i)
        new_content.push_back(Typical());
      return new_content;
    }
Move semantics
Method Time spent, 3 run average (ms)
With copy constructor 2617
Following “Rule of zero” 1002
Data
Make sure that all data you need in a loop is physically as close together as possible
Allows CPU to use its cache efficiently
Use contiguous memory arrays where possible
Avoid data structures that rely on pointers (e.g. linked lists)
Data
    int sum() {
      std::forward_list<int> data(1024, 5);
      int result;
      for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(data.begin(), data.end(), 0);
      }
      return result;
    }
Data
    int sum() {
      std::vector<int> data(1024, 5);
      int result;
      for (int i = 0; i < 1000000; ++i) {
        result = std::accumulate(data.begin(), data.end(), 0);
      }
      return result;
    }
Data
Method Time spent, 3 run average (ms)
std::forward_list 1115
std::vector 61
MICRO-OPTIMISATIONS IN PYTHON
Python
Emphasises readability
Dynamic type system, automatic memory management
Several projects dedicated to improving performance
Always try to avoid calling functions many times
Prefer literals over “constructors”
def a(): return dict(firstkey=1, secondkey=2)
def b(): return { 'firstkey': 1, 'secondkey': 2 }
Prefer literals over “constructors”
Method Time spent, 3 run minimum (ms)
dict() 376
Dictionary literals 135
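A measurement like the one in the table can be sketched with the standard timeit module. The helper name measure_ms and the iteration count are assumptions made for illustration, not the harness used for the numbers above. It reports the fastest of three runs in milliseconds.

```python
import timeit


def a():
    return dict(firstkey=1, secondkey=2)


def b():
    return {'firstkey': 1, 'secondkey': 2}


def measure_ms(func, number=100000, repeat=3):
    """Return the fastest of `repeat` runs of `number` calls, in milliseconds."""
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timeit.repeat(func, number=number, repeat=repeat)) * 1000.0


if __name__ == "__main__":
    for name, func in (("dict()", a), ("Dictionary literals", b)):
        print("%-20s %8.1f ms" % (name, measure_ms(func)))
```

Absolute numbers will differ per machine and interpreter version, so only the comparison between the two rows is meaningful.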
Prefer slice notation over “copy constructor”
l = [ 'a', 'b', 'c', 'd', 'e', 'f' ]
def a(): return list(l)
def b(): return l[:]
Prefer slice notation over “copy constructor”
Method Time spent, 3 run minimum (ms)
Copy via list() 2671
Slice notation 1679
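Both spellings make the same kind of copy, which is why one can stand in for the other. A small check, with variable names chosen here for illustration:

```python
l = ['a', 'b', 'c', 'd', 'e', 'f']

copy_via_list = list(l)   # looks up and calls the built-in list
copy_via_slice = l[:]     # slicing syntax, no name lookup

# Both are new list objects with the same contents...
print(copy_via_list == copy_via_slice == l)      # True
print(copy_via_list is l, copy_via_slice is l)   # False False

# ...so changing a copy leaves the original untouched.
copy_via_slice.append('g')
print(len(l), len(copy_via_slice))               # 6 7
```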
All functions have overhead
Function call overhead in Python is substantial
All functions can be redefined - even built-ins need to be looked up first
Try to avoid function calls (even more so than in C++)
Using literals or other built-in constructs can help avoid function calls
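The lookup and call behind a "constructor" can be made visible with the standard dis module. This is a sketch for CPython only: the helper names are made up for this example, and the exact instruction names vary between interpreter versions, so the check only looks for a global name lookup and for any call instruction.

```python
import dis


def with_constructor():
    return dict(firstkey=1, secondkey=2)


def with_literal():
    return {'firstkey': 1, 'secondkey': 2}


def opnames(func):
    """Return the bytecode instruction names of func, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


def looks_up_global(func):
    # dict is not a keyword: the name has to be resolved every time it is used.
    return "LOAD_GLOBAL" in opnames(func)


def makes_call(func):
    # Call instructions are named CALL, CALL_KW, CALL_FUNCTION_KW, ...
    return any(name.startswith("CALL") for name in opnames(func))


for func in (with_constructor, with_literal):
    print(func.__name__, looks_up_global(func), makes_call(func))
```

On CPython the dict() version is expected to show both a lookup and a call, while the literal shows neither because the dictionary is built directly by the interpreter.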
String formatting
Python has a built-in function str() to convert other types to string
In most cases this offers enough features for conversions of types to strings
Faster than formatting
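A short illustration of when str() is enough and when formatting does something different. The values are chosen for this example only:

```python
value = 5 * (2 + 3)

# For an integer, all three spellings produce the same text.
print("%d" % value, str(value), "%s" % value)   # 25 25 25

# They differ once the value is not an integer: %d truncates, str() does not.
print("%d" % 3.7)   # 3
print(str(3.7))     # 3.7
```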
String formatting
    def a():
        a = 5
        b = 2
        c = 3
        return "%d" % (a*(b+c))
    def b():
        a = 5
        b = 2
        c = 3
        return str(a*(b+c))
    def c():
        a = 5
        b = 2
        c = 3
        return "%s" % (a*(b+c))
String formatting
Method Time spent, 3 run minimum (ms)
“%d” 514
str() 260
“%s” 233
Prefer aggregate functions
    def a():
        s = 0
        for i in range(50000):
            s += i
        return s
    def b():
        return sum(range(50000))
Prefer aggregate functions
Method Time spent, 3 run minimum (ms)
Summing manually 1728
Using sum() 587
Prefer aggregate functions
Python has a number of built-in functions for aggregates: all(), min(), max(), sum(), etc
Using them brings big speed advantages
Always preferred over manually iterating
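The built-ins listed above replace loops that would otherwise be written by hand. A minimal comparison, where the manual_ functions are hypothetical stand-ins for such hand-written loops:

```python
values = [3, 8, -2, 7]


def manual_max(items):
    """Hand-written equivalent of max() for a non-empty list."""
    largest = items[0]
    for item in items[1:]:
        if item > largest:
            largest = item
    return largest


def manual_all_positive(items):
    """Hand-written equivalent of all(item > 0 for item in items)."""
    for item in items:
        if item <= 0:
            return False
    return True


print(manual_max(values) == max(values))                                # True
print(manual_all_positive(values) == all(item > 0 for item in values))  # True
print(sum(values), min(values), max(values))                            # 16 -2 8
```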
Use list comprehensions
    def a():
        l = []
        for i in range(1000):
            l.append(i)
        return l
    def b():
        return [i for i in range(1000)]
Use list comprehensions
Method Time spent, 3 run minimum (ms)
Append to list 701
List comprehension 321
Use list comprehensions
List comprehensions offer a concise way of creating lists
Speed as well as readability advantages
Can be nested as well!
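Nesting can mean two things, both shown in this small sketch. The matrix is example data made up for illustration:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Loop version: two nested for statements and repeated append() calls.
flat_loop = []
for row in matrix:
    for item in row:
        flat_loop.append(item)

# Nested comprehension: the for clauses appear in the same order as the loops.
flat = [item for row in matrix for item in row]

# A comprehension can also sit inside another one, here to transpose.
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]

print(flat)         # [1, 2, 3, 4, 5, 6]
print(transposed)   # [[1, 4], [2, 5], [3, 6]]
```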
Don’t use optimisations from other languages
    def a():
        x = 1
        for i in range(1000):
            x = x + x
        return x
    def b():
        x = 1
        for i in range(1000):
            x = x * 2
        return x
    def c():
        x = 1
        for i in range(1000):
            x = x << 1
        return x
Don’t use optimisations from other languages
Method Time spent, 3 run minimum (ms)
x + x 736
x * 2 1001
x << 1 1342
LET’S TRY IT
PREPARE YOUR LAPTOPS!
PYTHON: WWW.CYBER-DOJO.ORG
E94905
C++: git clone https://github.com/avl7771/premature_optimization.git
Conclusions
Optimising is fun!
Knowledge about optimisations can help you help your compiler or interpreter
Not all optimisations worsen maintainability
Micro-optimisations can differ between languages, compilers, architectures…
Measuring works!
Test your assumptions
ARJAN VAN LEEUWEN
@AVL7771