map grep sort
TRANSCRIPT
Copyright 2014 Daina Pettit
map, grep, sort – slide 1
Streamlining and simplifying your Perl code using
Map, Grep, and Sort
Daina Pettit
map, grep, sort – slide 2
“Perl culture” sometimes gets shortened to “Perl cult”.*
Larry Wall
*Wall, Larry, Perl, the first postmodern computer language, Linux World [Conference], March 3, 1999
map, grep, sort – slide 3
Overview
● What are map, grep, & sort and why should I care?
● map details
● grep details
● sort details
● Combining map, grep, & sort
● Advanced combinations
● Schwartzian Transform
● Orcish Maneuver
● Guttman-Rosler Transform
● Alternatives
map, grep, sort – slide 4
What are they?
map, grep, & sort are iterator functions that operate on lists or arrays.
1. map performs an action on each element.
2. grep tests each element.
3. sort orders the elements.
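A minimal sketch of the three functions side by side. The list contents and variable names are made up for illustration and are not from the slides.

```perl
use strict;
use warnings;

my @numbers = ( 12, 3, 25, 7 );

# map: perform an action on each element (here, double it).
my @doubled = map { $_ * 2 } @numbers;        # 24 6 50 14

# grep: test each element and keep the ones that pass.
my @small = grep { $_ < 10 } @numbers;        # 3 7

# sort: order the elements (numerically here).
my @ordered = sort { $a <=> $b } @numbers;    # 3 7 12 25

print "@doubled\n@small\n@ordered\n";
```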
map, grep, sort – slide 5
General Form
All have similar forms.
@array = map { exp } @list;
@array = grep { exp } @list;
@array = sort { exp } @list;
and
@array = map exp, @list;
@array = grep exp, @list;
@array = sort @list;
map, grep, sort – slide 6
General Form—code blocks
Damian Conway in Perl Best Practices* recommends:
“Always use a block with a map and grep”
This is a syntactic aid: the block helps you avoid mistakes in grouping arguments. Block enclosures actually incur more overhead than the expression form. Not much, but some.
*Conway, Damian, Perl Best Practices, O'Reilly Media, Sebastopol, CA, 2005, pp 169-170.
@array = map { exp } @list;
@array = grep { exp } @list;
map, grep, sort – slide 9
What is map?
Map is essentially a loop that processes a list, much like a foreach loop.
foreach $line ( @lines ) {
    $line = uc $line;
}
@lines = map uc, @lines;
@lines = map { uc } @lines;
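A small sketch of the difference between the two forms, using made-up data. map builds a new list and leaves its input alone, while foreach aliases the loop variable to each element, so assigning to it changes the array in place.

```perl
use strict;
use warnings;

my @lines = ( "alpha\n", "beta\n" );

# map returns a new list; @lines itself is not modified here.
my @upper = map { uc } @lines;

# foreach aliases $line to each element of @copy,
# so the assignment changes @copy in place.
my @copy = @lines;
foreach my $line ( @copy ) {
    $line = uc $line;
}

print @upper;    # same text as @copy
print @lines;    # still lower case
```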
map, grep, sort – slide 12
Aside—foreach inside-out
The alternate single-line foreach is as concise as map and slightly faster, but more cryptic.
@lines = map uc, @lines;
$_ = uc foreach @lines;
foreach ( @lines ) {
    $_ = uc;
}
map, grep, sort – slide 13
What is the best way to use map?
● map is best for creating new lists.
● foreach is best for transforming a list.
@words = map { split } @lines;
foreach ( @lines ) {
    $_ = uc;
}
map, grep, sort – slide 16
Alternatives for dumping out a hash
foreach ( sort keys %h ) {
    print "$_ => $h{$_}\n";
}
map { print "$_ => $h{$_}\n" } sort keys %h;
print "$_ => $h{$_}\n" foreach sort keys %h;
map, grep, sort – slide 17
map {} is list context
Damian Conway in Perl Even-Better Practices* recommends:
"Use explicitly scalar map expressions"
*Thoughtstream Pty Ltd, 2013 pp 10-11
@dates = map {
    localtime $_    # Wrong! In list context localtime returns a list of fields.
} @epoch_times;
map, grep, sort – slide 18
map {} is list context
@dates = map { scalar localtime $_ } @epoch_times;
map, grep, sort – slide 19
map {} is list context
@words = map {
    scalar split    # Wrong!
} @lines;
map, grep, sort – slide 20
map {} is list context
@words = map { split } @lines;
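A minimal sketch of why the context matters for localtime, using two made-up epoch values. The formatted strings depend on the local time zone, so only the element counts are shown.

```perl
use strict;
use warnings;

my @epoch_times = ( 0, 86_400 );

# List context: each localtime call returns nine fields
# (sec, min, hour, mday, mon, year, wday, yday, isdst).
my @fields = map { localtime $_ } @epoch_times;

# Forced scalar context: one formatted date string per element.
my @dates = map { scalar localtime $_ } @epoch_times;

printf "%d fields, %d dates\n", scalar @fields, scalar @dates;
# prints "18 fields, 2 dates"
```

With split the situation is reversed: the list of words is what is wanted, so forcing scalar context there is the mistake.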
map, grep, sort – slide 21
map {} confusion
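How does perl know whether { 6 } is a code block or an anonymous hash? It guesses from what follows the opening brace. To get a hash, write +{ 6 } and follow it with a comma. Without the +, perl takes the braces as a code block here, and the comma after it is a syntax error.
map +{ 6 }, @stuff;    # hash
map { 6 } @stuff;      # code block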
map, grep, sort – slide 22
Using map in void context
● Frowned upon.
● Incurs extra overhead.
map { print "$_ => $h{$_}\n" } sort keys %h;
map, grep, sort – slide 25
Creating a hash in map
map { $age_of{$_} = -M } @files;
foreach ( @files ) {
    $age_of{$_} = -M;
}
$age_of{$_} = -M for @files;
map, grep, sort – slide 26
Skipping in map
● Drop an item using an empty list.
● Do NOT use an explicit return.
@ones = map { $_ < 10 ? $_ : (); } @numbers;
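A short sketch of the empty-list trick with made-up numbers. Because the block may return zero elements, map can filter and transform in a single pass.

```perl
use strict;
use warnings;

my @numbers = ( 4, 15, 8, 23 );

# Return () to drop an element, or the element itself to keep it.
my @ones = map { $_ < 10 ? $_ : () } @numbers;             # 4 8

# The same pattern can transform the kept elements as well.
my @squares = map { $_ < 10 ? $_ * $_ : () } @numbers;     # 16 64

print "@ones | @squares\n";    # prints "4 8 | 16 64"
```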
map, grep, sort – slide 28
What is grep?
● Similar to Unix command-line utility grep
● Given a list, grep returns only certain items
@ones = map { $_ < 10 ? $_ : (); } @numbers;
@ones = grep { $_ < 10 } @numbers;
map, grep, sort – slide 29
Boolean Scalar Context
● Anywhere in perl where a true/false is expected: if, while, and, or, not, &&, ||, !, etc.
● If evaluation results in 0, “0”, 0.0, “”, or undef, then it is false. Everything else is true.
if ( 0 ) {}          # False
if ( 400 ) {}        # True
if ( 1 ) {}          # True
if ( "false" ) {}    # True!
if ( "00" ) {}       # True!
undef $x;
if ( $x ) {}         # False
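The rules above can be checked with a small table-driven sketch. The labels and the extra "0.0" string case are additions for illustration.

```perl
use strict;
use warnings;

# Each entry is [ label, value ]; the labels are only for display.
my @tests = (
    [ '0',       0       ],
    [ '"0"',     "0"     ],
    [ '0.0',     0.0     ],
    [ '""',      ""      ],
    [ 'undef',   undef   ],
    [ '"00"',    "00"    ],
    [ '"0.0"',   "0.0"   ],
    [ '"false"', "false" ],
);

# The ternary operator evaluates each value in boolean context.
my @results = map { $_->[1] ? 'true' : 'false' } @tests;

printf "%-8s %s\n", $tests[$_][0], $results[$_] for 0 .. $#tests;
```

The first five values come out false and the last three true: the strings "00" and "0.0" are not the exact string "0", so they count as true.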
map, grep, sort – slide 30
Examples of grep
● Expression can be any valid perl expression.
● Expression is in scalar boolean context.
@ones = grep { $_ < 10 } @numbers;
@dirs = grep { -d } @files;
@no_dup = grep { ! $h{$_}++ } @old;
@errors = grep { /error/i } @log;
@true = grep { $_ } @all;
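A sketch of how the duplicate-removal example works, with made-up data and a hash named %seen in place of %h. It also shows grep in scalar context, where it returns a count.

```perl
use strict;
use warnings;

my @old = qw( pear apple pear fig apple );

# $seen{$_}++ yields a false value the first time a string appears,
# so the negated test is true only for first occurrences.
my %seen;
my @no_dup = grep { !$seen{$_}++ } @old;    # pear apple fig

# In scalar context grep returns the number of matching elements.
my $count = grep { /p/ } @old;              # 4

print "@no_dup ($count)\n";
```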
map, grep, sort – slide 31
Sorting Basics
Sort can be called in three ways:
1. With no comparison directives
2. With a subroutine that returns comparison directives
3. With a code block (an anonymous subroutine) that returns comparison directives
@sorted = sort @unsorted;
@sorted = sort sub @unsorted;
@sorted = sort { exp } @unsorted;
map, grep, sort – slide 32
Sorting Basics
Sort requires the comparison to return a negative value, zero, or a positive value (conventionally -1, 0, or 1) to tell whether any two elements, $a and $b, are in order (-1), the same (0), or out of order (1).
cmp and <=> conveniently provide this for string or numeric comparisons, respectively.
We don't have to use cmp and <=>. We just have to return -1, 0, or 1.
$a <=> $b
map, grep, sort – slide 33
Sorting Basics
Basic ASCII-betical sort:
@sorted = sort @list;
Basic numeric sort:
@sorted = sort { $a <=> $b } @list;
map, grep, sort – slide 34
Sorting Basics
Basic ASCII-betical sort:
@sorted = sort { $a cmp $b } @list;
Basic numeric sort:
@sorted = sort { $a <=> $b } @list;
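A small sketch, with made-up numbers, of how the default string sort differs from the numeric sort, plus a hand-written comparison that returns -1, 0, or 1 without using <=>.

```perl
use strict;
use warnings;

my @list = ( 10, 9, 100, 1 );

my @as_strings = sort @list;                   # 1 10 100 9
my @as_numbers = sort { $a <=> $b } @list;     # 1 9 10 100

# Same numeric order, spelling out the three return values.
my @by_hand = sort {
      $a < $b ? -1
    : $a > $b ?  1
    :            0
} @list;                                       # 1 9 10 100

print "@as_strings\n@as_numbers\n@by_hand\n";
```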
map, grep, sort – slide 35
Sorting Basics--reverse
Reverse ASCII-betical sort:
@sorted = sort { $b cmp $a } @list;
Reverse numeric sort:
@sorted = sort { $b <=> $a } @list;
map, grep, sort – slide 36
Sorting Basics--reverse
Or just use the reverse function:
@sorted = reverse sort @list;
Reverse numeric sort:
@sorted = reverse sort { $a <=> $b } @list;
map, grep, sort – slide 37
Sorting Basics--subroutine
Using a subroutine instead of a code block
You can also use anonymous subroutines.
These subroutines cannot be recursive!
sub compare {
    uc ( $a ) cmp uc ( $b );
}
$comp = \&compare;
@sorted = sort $comp @list;
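A runnable sketch of the subroutine form with made-up data. It calls the comparison both by name and through a code reference, and the result is a case-insensitive sort.

```perl
use strict;
use warnings;

# Case-insensitive comparison; sort supplies $a and $b.
sub compare {
    uc( $a ) cmp uc( $b );
}

my @list = qw( banana apple Cherry );

my @default = sort @list;             # Cherry apple banana
my @by_name = sort compare @list;     # apple banana Cherry

my $comp   = \&compare;
my @by_ref = sort $comp @list;        # apple banana Cherry

print "@default\n@by_name\n@by_ref\n";
```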
map, grep, sort – slide 38
Complicated Sorting
You can sort on anything you can get to through $a and $b.
@sorted = sort {
    @array_a = split / /, $a;
    @array_b = split / /, $b;
    $array_a[5] cmp $array_b[5];
} @lines;
map, grep, sort – slide 39
Complicated Sorting
Sorting hash keys
@sorted_keys = sort keys %hash;
Sorting hash keys by value
@sorted_keys = sort { $hash{$a} cmp $hash{$b} } keys %hash;
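A minimal sketch with a made-up hash. The values here are numbers, so this variant compares them with <=> rather than the cmp used on the slide.

```perl
use strict;
use warnings;

my %hash = ( apple => 3, pear => 1, fig => 2 );

# Keys in string order.
my @by_key = sort keys %hash;                                # apple fig pear

# Keys ordered by the value each one maps to.
my @by_value = sort { $hash{$a} <=> $hash{$b} } keys %hash;  # pear fig apple

print "@by_key\n@by_value\n";
```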
map, grep, sort – slide 40
Complicated Sorting
We can sort with multiple keys, such as by year, then by month, then by day, even if the data is mm-dd-yyyy.
@sorted_dates = sort {
    ( $ma, $da, $ya ) = split /-/, $a;
    ( $mb, $db, $yb ) = split /-/, $b;
    $ya <=> $yb || $ma <=> $mb || $da <=> $db;
} @dates;
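A runnable sketch of the multi-key sort with made-up mm-dd-yyyy dates. The || chain moves on to the next key only when the previous comparison returns 0, that is, when the two values tie.

```perl
use strict;
use warnings;

my @dates = qw( 12-25-2013 01-15-2014 12-01-2013 01-15-2012 );

my @sorted_dates = sort {
    my ( $ma, $da, $ya ) = split /-/, $a;
    my ( $mb, $db, $yb ) = split /-/, $b;
    # Year first, then month, then day.
    $ya <=> $yb || $ma <=> $mb || $da <=> $db;
} @dates;

print "$_\n" for @sorted_dates;
# 01-15-2012
# 12-01-2013
# 12-25-2013
# 01-15-2014
```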
map, grep, sort – slide 41
Complicated Sorting
We don't have to always use the comparison operators. We can make up our own unique order.
@order = sort {
    return -1 if $a eq 'King' && $b ne 'King';
    return  1 if $a ne 'King' && $b eq 'King';
    return  0;
} @cards;    # King first,
             # the rest doesn't matter.
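A runnable sketch of the made-up ordering with an invented hand of cards. Only the position of the Kings is determined by this comparison, so the order of the remaining cards should not be relied on.

```perl
use strict;
use warnings;

my @cards = qw( Queen King Ace King Ten );

my @order = sort {
    return -1 if $a eq 'King' && $b ne 'King';    # $a goes first
    return  1 if $a ne 'King' && $b eq 'King';    # $b goes first
    return  0;                                    # no preference
} @cards;

print "@order\n";    # both Kings lead the list
```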
map, grep, sort – slide 42
Combinations
Since map, grep, and sort all take and return lists, you can chain them together.
@pics = map { lc } grep { /\.jpe?g$/i } sort @list;
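A sketch of the chain on made-up file names. The chain runs right to left, so in the first form the sort sees the original mixed-case names; the second form is a variation that sorts after lower-casing.

```perl
use strict;
use warnings;

my @list = qw( Zebra.JPG notes.txt apple.jpeg Mango.jpg );

# Right to left: sort, keep the JPEG names, then lower-case them.
my @pics = map { lc } grep { /\.jpe?g$/i } sort @list;
# mango.jpg zebra.jpg apple.jpeg

# Variation: filter and lower-case first, then sort the result.
my @pics_sorted = sort map { lc } grep { /\.jpe?g$/i } @list;
# apple.jpeg mango.jpg zebra.jpg

print "@pics\n@pics_sorted\n";
```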
map, grep, sort – slide 43
Optimizing sort
Given a list of files, sort by the age of the files.
chomp ( @files = `ls -1` );
“file1” “file7” “a.out” “x.pl” “5.dat”
map, grep, sort – slide 44
Optimizing sort
Sorts by name, but not by age.
@sorted = sort @files; # by name
“file1” “file7” “a.out” “x.pl” “5.dat”
map, grep, sort – slide 45
Optimizing sort
Sorts by date, but slow for large data sets.
-M is called twice every time sort compares!
@sorted = sort { -M $a <=> -M $b } @files;
“file1” “file7” “a.out” “x.pl” “5.dat”
map, grep, sort – slide 46
Optimizing sort
We want to call -M once for each file, save the result, and use it each time sort needs to compare.
Map will do this for us!
@order = map { [ $_, -M ] } @files;
“file1” “file7” “a.out” “x.pl” “5.dat”
1.2 2.9 3.1 1.1 2.9
map, grep, sort – slide 47
Optimizing sort
Then we want to sort based on just the date part.
But now we need to get rid of the date part.
@order = sort { $a->[1] <=> $b->[1] }
         map { [ $_, -M ] } @files;
“x.pl” “file1” “5.dat” “file7” “a.out”
1.1 1.2 2.9 2.9 3.1
map, grep, sort – slide 48
Optimizing sort
Now use map to extract just element 0 and we are back to the original list, now sorted by date.
This is known as the Schwartzian Transform.*
*Perl idiom named for Randal Schwartz, author of Learning Perl, coined by Tom Christiansen.
@order = map { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map { [ $_, -M ] } @files;
“x.pl” “file1” “5.dat” “file7” “a.out”
map, grep, sort – slide 52
Optimizing sort
Key points to remember for ST:
● map sort map idiom
● Use proper comparison
● Extract value to compare
● Everything else stays the same.
@order = map { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map { [ $_, -M ] } @files;
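A self-contained sketch of the Schwartzian Transform that runs without touching the file system. The age_of function and its lookup table are hypothetical stand-ins for -M, filled with the ages shown on the earlier slides, and the counter is there only to show that each key is computed once.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive call such as -M.
my %age = ( 'file1' => 1.2, 'file7' => 2.9, 'a.out' => 3.1,
            'x.pl'  => 1.1, '5.dat' => 2.9 );
my $calls = 0;
sub age_of { $calls++; return $age{ $_[0] } }

my @files = ( 'file1', 'file7', 'a.out', 'x.pl', '5.dat' );

my @order =
    map  { $_->[0] }                  # 3. keep only the name
    sort { $a->[1] <=> $b->[1] }      # 2. compare the saved keys
    map  { [ $_, age_of($_) ] }       # 1. pair each name with its key
    @files;

print "@order ($calls key lookups)\n";
```

The key is looked up five times, once per file, no matter how many comparisons sort makes. The two files with the same age (file7 and 5.dat) may come out in either order.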
map, grep, sort – slide 53
Optimizing sort—Orcish Maneuver*
Uses “or” cache (in a hash) to remember values already computed: ||=
● Simpler than ST
● Almost as fast as ST
● Faster if list contains duplicates
*Term coined by Joseph Hall in Effective Perl Programming, Addison-Wesley Professional, Boston, MA, 1998.
@order = sort {
    ( $cache{$a} ||= -M $a ) <=> ( $cache{$b} ||= -M $b )
} @files;
map, grep, sort – slide 57
Optimizing sort—Orcish Maneuver
Key points to remember for OM:
● Only sort
● Compute comparison data
● Use proper comparison operator
● Everything else stays the same.
@order = sort {
    ( $cache{$a} ||= -M $a ) <=> ( $cache{$b} ||= -M $b )
} @files;
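A self-contained sketch of the Orcish Maneuver. As an assumption for illustration, age_of and its table are hypothetical stand-ins for -M, and the counter shows how often the key is really computed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive call such as -M.
my %age = ( 'file1' => 1.2, 'file7' => 2.9, 'a.out' => 3.1,
            'x.pl'  => 1.1, '5.dat' => 2.9 );
my $calls = 0;
sub age_of { $calls++; return $age{ $_[0] } }

my @files = ( 'file1', 'file7', 'a.out', 'x.pl', '5.dat' );

# ||= stores the key the first time a name is compared
# and reuses the cached value on later comparisons.
my %cache;
my @order = sort {
    ( $cache{$a} ||= age_of($a) ) <=> ( $cache{$b} ||= age_of($b) )
} @files;

print "@order ($calls key lookups)\n";
```

One caveat with ||=: a key that is 0 or otherwise false is never treated as cached and gets recomputed on every comparison. On perl 5.10 and later, //= tests for definedness instead and avoids that.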
map, grep, sort – slide 58
Optimizing sort—Guttman-Rosler Transform*
This is a tweak on ST. Takes advantage of substr and sprintf being faster than array manipulation. Also uses default string sort which is slightly faster.
*A Fresh Look at Efficient Perl Sorting, Uri Guttman and Larry Rosler, approx. 1999.
@order = map { substr $_, 10 }
         sort
         map { m#(\d{4})/(\d+)/(\d+)#;
               sprintf "%d-%02d-%02d%s", $1, $2, $3, $_ } @dates;
map, grep, sort – slide 59
Optimizing sort—Guttman-Rosler Transform
Faster than ST
Harder to code and less readable
Not suitable for all sorts
@order = map { substr $_, 10 }
         sort
         map { m#(\d{4})/(\d+)/(\d+)#;
               sprintf "%d-%02d-%02d%s", $1, $2, $3, $_ } @dates;
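A runnable sketch of the Guttman-Rosler Transform on made-up records that each contain a yyyy/m/d date. It is a slight variation on the slide: the match results are captured into named variables instead of $1, $2, and $3. The key is always 10 characters wide ("yyyy-mm-dd"), which is why substr starts at offset 10.

```perl
use strict;
use warnings;

my @dates = ( 'due 2014/3/9', 'paid 2013/12/25', 'sent 2014/3/10' );

my @order =
    map { substr $_, 10 }                  # 3. drop the 10-character key
    sort                                   # 2. plain default string sort
    map {                                  # 1. prefix a fixed-width key
        my ( $y, $m, $d ) = m{(\d{4})/(\d+)/(\d+)};
        sprintf "%d-%02d-%02d%s", $y, $m, $d, $_;
    } @dates;

print "$_\n" for @order;
# paid 2013/12/25
# due 2014/3/9
# sent 2014/3/10
```

The zero padding is what makes the string sort agree with date order: without it, "2014-3-10" would sort before "2014-3-9".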
map, grep, sort – slide 60
Further List & Sort Options
List::Util
    shuffle, reduce, any, first, max, min, ...
List::MoreUtils
    uniq, natatime, ...
Sort::Key
    May be faster than ST or GRT
Sort::Naturally
    Automatically sorts numeric when appropriate
Sort::Maker
    Internally uses OM, ST, or GRT.
map, grep, sort – slide 61
Q&A
Comments?
Questions?
map, grep, sort – slide 62
Resources
http://www.perlmonks.org
http://www.cpan.org
http://www.hidemail.de/blog/perl_tutor.shtml
http://perldoc.perl.org/
http://www.stonehenge.com/writing.html
For profiling:
perldoc Devel::NYTProf
perldoc Benchmark