lies, damn lies, and benchmarks

Post on 12-Jul-2015

  • Lies, Damn Lies & Benchmarks

    Steven Lembark
    Workhorse Computing
    lembark@wrkhors.com

  • Perl is too slow

    Heard that before? Yeah...
    Mostly wrong, but you can't refute it without data.
    Need to benchmark the times.

  • Damn lies...

    Good benchmarks find realistic times.
    Most benchmarks prove a point.
    They get ignored.
    Ignored results are not lazy.

  • Benchmarking perl

    The *NIX time command.
    Good enough to answer most questions.
    Avoids much Benchmarking Stuff (BS).

  • Simplest tool: time

    Reports real, system, and user times.
    Real time is heavily affected by system load.
    System + user is a better indication of work.
    Real - work = blocked.
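Reading the last line of this slide as "real minus work (system + user) equals blocked time", the arithmetic can be sketched as a small shell helper. The name blocked_ms is hypothetical and not part of the talk; it assumes the three figures are passed in seconds, as time prints them after the minutes field.

```shell
# blocked_ms REAL USER SYS
# Hypothetical helper: prints the blocked time in whole milliseconds,
# given the three figures reported by time, each in seconds.
blocked_ms() {
    # blocked = real - (user + sys); the + 0.5 rounds to the nearest ms
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%d\n", (real - user - sys) * 1000 + 0.5 }'
}
```

For example, blocked_ms 0.019 0.010 0.000 prints 9: a run with 19ms real and 10ms user time spent about 9ms blocked.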

  • bash takes less time to start up

    perl isn't any slower: Zero work for both. Real is all blocked.

    $ time perl -e 0

    real 0m0.005s
    user 0m0.000s
    sys 0m0.000s

    $ time bash /dev/null

    real 0m0.005s
    user 0m0.000s
    sys 0m0.000s

  • BS: Startup Times

    If something just ran, it is probably still in core.
    That saves overhead when running it the second time.
    Run everything twice to benchmark startups.
    Multiple runs or single-user mode manage background noise.
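One way to follow the "run it twice" and "multiple runs" advice is a small repeat loop. This is a minimal sketch, not something from the talk: repeat_cmd is a hypothetical helper name, and it assumes the command under test is safe to run several times in a row.

```shell
# repeat_cmd COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND with ARGS exactly COUNT times in sequence.
repeat_cmd() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}
```

A possible use in bash is to warm the cache with repeat_cmd 1 perl -e 0 and then run time repeat_cmd 10 perl -e 0, dividing the reported figures by ten. The loop itself adds a little shell overhead, so the result is only a rough per-run estimate.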

  • Minimizing startup issues

    Save kernel calls, context switches, interrupts, latency, transfer I/O...
    tmpfs on Linux minimizes overhead.
    Test with an un-loaded system.
    Avoid virtual systems (CPU, EMC) unless that is what you are testing.

  • What does startup time tell us?

    Opterons are fast?
    Useless by itself.
    Necessary baseline.
    Differences are a warning.

  • Analyzing startup times.

    Big differences usually indicate a problem:
    Mis-compiled: -O0 -g on production code.
    Mixing 32- and 64-bit code and O/S.
    Background noise from other running jobs.
    Botched startups leave everything else suspect.

  • Do something!

    OK, let's time an operation.
    Listing a directory is common enough.
    ls lists the contents, sorts lexically.
    Perl's glob is similar.

  • Trivial pursuit: ls vs glob.

    lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'

    real 0m0.007s
    user 0m0.000s
    sys 0m0.000s

    lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'

    real 0m0.019s
    user 0m0.010s
    sys 0m0.000s

    Mostly blocked: 7ms bash vs. 9ms perl.
    Failing to clear the screen can skew results!
    So can remote displays and virtual machines.

  • BS: Milliseconds matter

    Really care about 12ms? OK, perl is slower.
    Most of the difference is in blocked time.
    Hint: perl and shell block at the same rate.
    perl compiles a statement, which adds overhead.
    Use ls for what it is.

  • Doing more

    Search files using their basenames:
    Find all of the basenames from 2012.05.05 through 2012.05.16.
    First step: How many files are there?

  • Times

    Compare File::Find with /bin/find.
    Roughly the same system time, added user time for the compile.
    Shell is faster because it is single-purpose.

    $ time find . -type f | wc -l
    18583

    real 0m0.080s
    user 0m0.020s
    sys 0m0.050s

    $ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i },"."; print $i, "\n"'
    18583

    real 0m0.274s
    user 0m0.220s
    sys 0m0.050s

  • Multi-layer pipes

    Compare the basename to a regex.
    Shell:

    find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])'

    Find files, extract basenames, and search with extended syntax (largely borrowed from Perl).
    One-liner with perl, File::Find & File::Basename.

  • BS: Forks & pipes are free.

    Real, user, and system time are higher for bash.
    xargs has to fork/exec many copies of basename.
    System overhead from buffering pipes is also higher.
    Plumbing is expensive!

    $ time find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])' | wc -l
    1604

    real 0m29.823s
    user 0m0.710s
    sys 0m4.220s

    $ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return; /2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'
    1604

    real 0m0.301s
    user 0m0.170s
    sys 0m0.130s
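The fork cost on this slide comes from starting one basename process per file. For illustration only, the directory part can also be stripped inside the shell with parameter expansion, which starts no extra process per path. This is a sketch under assumptions rather than the talk's method: base_names is a hypothetical helper, it expects one path per line on standard input with no trailing slashes, and it makes no timing claim of its own.

```shell
# base_names
# Hypothetical helper: reads one path per line from standard input and
# prints the part after the last "/", without forking basename per path.
base_names() {
    while IFS= read -r path; do
        # ${path##*/} removes the longest leading match of "*/"
        printf '%s\n' "${path##*/}"
    done
}
```

It would slot into the pipeline as find . -type f | base_names | egrep ... in place of the xargs stage. Whether that closes the gap with the perl one-liner is something to measure, not assume.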

  • Replacing content in place

    perl's -i replaces files in place.
    Shell pre-opens files, can't sort -d < a > a.
    Shell requires sort -d < a > b && mv b a.
    Now imagine filtering a few thousand files...
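To make the "few thousand files" case concrete, the sort -d < a > b && mv b a pattern can be wrapped in a loop. This is a minimal sketch, not from the talk: sort_in_place is a hypothetical helper, and it assumes mktemp is available and that each temporary file can be created next to its original.

```shell
# sort_in_place FILE...
# Hypothetical helper: applies the "sort to a temp file, then mv" pattern
# to each FILE, because the shell truncates FILE before sort could read it.
sort_in_place() {
    for file in "$@"; do
        tmp=$(mktemp "${file}.XXXXXX") || return 1
        if sort -d < "$file" > "$tmp"; then
            mv "$tmp" "$file"      # replace the original only on success
        else
            rm -f "$tmp"           # leave the original untouched on failure
            return 1
        fi
    done
}
```

Each file costs at least a mktemp, a sort, and a mv process, and the replaced file takes on the temporary file's permissions. That is the bookkeeping perl -i takes care of.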

  • perl -n & -p with -i

    Say you have to update the package names for a few hundred modules from ::Source to ::RDS.
    Mixing shell with perl:

    find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g';

    Exercise: Try writing this in pure shell.

  • Running it doesn't take long either

    Nice division of labor:
    find & xargs deal with the names.
    perl deals with the regex.
    Not much typing either way.
    Not much time either.

    $ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'

    real 0m0.112s
    user 0m0.044s
    sys 0m0.016s

  • What this means to you.

    Plumbing and forks are not free.
    Single-purpose programs are faster for one thing.
    Chaining the simpler tools adds overhead.
    Languages are faster for multi-stage tasks.
