sed, awk, & perl - Virginia Tech, courses.cs.vt.edu/~cs2204/spring2005/lectures/sedawkperl.pdf
sed, awk, & perl
CS 2204
Class meeting 13
*Notes by Mir Farooq Ali and other members of the CS faculty at Virginia Tech. Copyright 2003.
© Mir Farooq Ali, 2003 2
sed
Stream editor
Originally derived from the "ed" line editor
Used primarily for non-interactive operations
  operates on data streams, hence its name
Usage:
    sed options 'address action' file(s)
Example:
    sed '1,$s/^bold/BOLD/g' foo
sed: Line Addressing
Using line numbers (like 1,3p)
    sed '3,4p' foo.txt
  "For each line, if that line is the third through fourth line, print the line"
    sed '4q' foo.txt
  "For each line, if that line is the fourth line, stop"
    sed -n '3,4p' foo.txt
  Since sed prints each line anyway, if we only want lines 3 & 4 (instead of all lines with lines 3 & 4 duplicated) we use the -n option
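The effect of -n is easiest to see side by side. The sketch below assumes a hypothetical five-line foo.txt (the slides do not show the file's contents) and runs the three commands above against it.

```shell
# Build a hypothetical five-line sample file (contents are an assumption).
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/foo.txt"

# Without -n: every line is printed once, and lines 3-4 a second time.
with_dupes=$(sed '3,4p' "$tmp/foo.txt")

# With -n: only the lines selected by the address are printed.
only_3_4=$(sed -n '3,4p' "$tmp/foo.txt")

# 4q: lines are printed as usual, then sed quits after line 4.
first_four=$(sed '4q' "$tmp/foo.txt")

printf '%s\n' "$only_3_4"
rm -r "$tmp"
```

Counting the lines of with_dupes (for example with wc -l) should give seven for this five-line input, since lines 3 and 4 each appear twice.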
sed: Line addressing (... continued)
    sed -n '$p' foo.txt
  "For each line, if that line is the last line, print"
  $ represents the last line
Reversing line criteria (!)
    sed -n '3,$!p' foo.txt
  "For each line, if that line is the third through last line, do not print it, else print"
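A small check of the $ address and the ! negation, again using a hypothetical five-line foo.txt as sample input. The single quotes matter here: they stop the shell from treating $ and ! specially before sed sees them.

```shell
# Hypothetical sample file (contents are an assumption).
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$tmp/foo.txt"

# $ addresses the last line of input.
last_line=$(sed -n '$p' "$tmp/foo.txt")

# 3,$!p prints every line NOT in the range 3..last, i.e. lines 1 and 2.
before_third=$(sed -n '3,$!p' "$tmp/foo.txt")

printf 'last: %s\n' "$last_line"
printf '%s\n' "$before_third"
rm -r "$tmp"
```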
sed: Context Addressing
Use patterns/regular expressions rather than explicitly specifying line numbers
    sed -n '/^From: /p' $HOME/mbox
  retrieves all the sender lines from the mailbox file
  "For each line, if that line starts with 'From: ', print it."
  Note that the / / mark the beginning and end of the pattern to match
    ls -l | sed -n '/^.....w/p'
  "For each line, if the sixth character is a w, print"
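To try both patterns without a real mailbox or directory, the sketch below feeds sed some made-up text. The addresses and the two "ls -l" style lines are hypothetical stand-ins, chosen so that exactly one of the listing lines has a w in column six (the group-write bit).

```shell
# Hypothetical stand-in for $HOME/mbox.
mbox='From: alice@example.com
Subject: hello
From: bob@example.com
To: carol@example.com'
senders=$(printf '%s\n' "$mbox" | sed -n '/^From: /p')

# Hypothetical stand-in for "ls -l" output.
listing='-rw-rw-r-- 1 user group 10 notes.txt
-rw-r--r-- 1 user group 20 report.txt'
# Five arbitrary characters (.....) followed by a literal w in column six.
group_writable=$(printf '%s\n' "$listing" | sed -n '/^.....w/p')

printf '%s\n' "$senders"
printf '%s\n' "$group_writable"
```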
sed: Substitution
Strongest feature of sed
Syntax is [address]s/expression1/string2/flag
    sed 's/|/:/' data.txt
  substitute the first '|' on each line with ':'
    sed 's/|/:/g' data.txt
  s = substitute, g = global (replace every '|' on each line)
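The difference the g flag makes, and the optional address in front of s, can be checked on a one-line sample. The input strings here are made up for illustration.

```shell
line='a|b|c'

# No flag: only the first | on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/|/:/')

# g flag: every | on the line is replaced.
every_one=$(printf '%s\n' "$line" | sed 's/|/:/g')

# With an address: the substitution is applied to line 2 only.
second_line_only=$(printf 'a|b\nc|d\n' | sed '2s/|/:/')

printf '%s\n' "$first_only" "$every_one" "$second_line_only"
```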
sed: Using files
Tedious to type in commands at the prompt, especially if commands are repetitive
Can put commands in a file and sed can use them
    sed -f cmds.sed data.txt
  cmds.sed is the file with commands
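A minimal sketch of a command file in use. The slides do not show what cmds.sed or data.txt contain, so both are invented here: the script deletes comment lines and then turns every | into :.

```shell
tmp=$(mktemp -d)

# cmds.sed holds one sed command per line (contents are an assumption).
printf '%s\n' '/^#/d' 's/|/:/g' > "$tmp/cmds.sed"

# Hypothetical data file.
printf '%s\n' '# header' 'a|b|c' 'd|e' > "$tmp/data.txt"

# -f reads the commands from the file instead of the command line.
result=$(sed -f "$tmp/cmds.sed" "$tmp/data.txt")

printf '%s\n' "$result"
rm -r "$tmp"
```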
awk
Powerful pattern scanning and processing language
Named after its creators Aho, Weinberger and Kernighan (Don’t you love how commands are named?)
Most commands operate on an entire line
  awk operates on fields within each line
Usage:
    awk options [scriptfile] file(s)
Example:
    awk -f awk.script foo.txt
awk: Processing model
BEGIN { commands executed before any input is read }
{ main input loop, run for each line of input }
END { commands executed after all input is read }
awk: First example
# Begin Processing
BEGIN {print "Print Totals"}
# Body Processing
{total = $1 + $2 + $3}
{print $1 " + " $2 " + " $3 " = " total}
# End Processing
END {print "End Totals"}
Input and output files
Input
22 78 44
66 31 70
52 30 44
88 31 66
Output
Print Totals
22 + 78 + 44 = 144
66 + 31 + 70 = 167
52 + 30 + 44 = 126
88 + 31 + 66 = 185
End Totals
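The totals script can be run directly from the shell by passing the program as a quoted argument instead of a script file. This is only a sketch of one way to invoke it; the input lines are the four records listed above, and the comments mark which block runs when.

```shell
input='22 78 44
66 31 70
52 30 44
88 31 66'

output=$(printf '%s\n' "$input" | awk '
BEGIN { print "Print Totals" }               # runs once, before any input
{ total = $1 + $2 + $3 }                     # runs for every input line
{ print $1 " + " $2 " + " $3 " = " total }   # also runs for every input line
END   { print "End Totals" }                 # runs once, after all input
')

printf '%s\n' "$output"
```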
awk: command line processing
Input
1 clothing 3141
1 computers 9161
1 textbooks 21312
2 clothing 3252
2 computers 12321
2 supplies 2242
2 textbooks 15462
Output
1 computers 9161
2 computers 12321
Command
    awk '{ if ($2 == "computers") print }' sales.dat
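For a command-line filter like this, the if statement has to sit inside an action block (braces), because awk does not accept a bare if as a pattern. The sketch below rebuilds sales.dat from the records listed on this slide and also shows the equivalent pattern-only form, which is an alternative spelling rather than something the slides use.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '1 clothing 3141' \
  '1 computers 9161' \
  '1 textbooks 21312' \
  '2 clothing 3252' \
  '2 computers 12321' \
  '2 supplies 2242' \
  '2 textbooks 15462' > "$tmp/sales.dat"

# Explicit if inside an action block.
with_if=$(awk '{ if ($2 == "computers") print }' "$tmp/sales.dat")

# Same test written as a pattern; the default action is to print the line.
with_pattern=$(awk '$2 == "computers"' "$tmp/sales.dat")

printf '%s\n' "$with_if"
rm -r "$tmp"
```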
awk: Other features
Formatted printing using printf
Conditional statements (if-else)
Loops
  for
  while
  do-while
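A short sketch that combines three of these features in one program: a for loop over the fields of each line, an if-else, and printf. The threshold of 150 and the labels "high" and "low" are arbitrary choices for illustration.

```shell
report=$(printf '22 78 44\n66 31 70\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)        # for loop over every field on the line
        sum += $i
    if (sum > 150)                   # if-else picks a label
        label = "high"
    else
        label = "low"
    printf "%-5d%s\n", sum, label    # left-justify the sum in 5 columns
}')

printf '%s\n' "$report"
```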
awk: Associative arrays
Normal arrays use integers for their indices
Associative arrays use strings as their indices
Example: Age["Robert"] = 56
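A minimal sketch of storing, testing and iterating over string keys. Age["Robert"] comes from the slide; the second entry is a hypothetical addition so that the loop has more than one key to visit.

```shell
ages=$(awk 'BEGIN {
    Age["Robert"] = 56               # from the slide
    Age["Alice"]  = 41               # hypothetical second entry
    if ("Robert" in Age)             # membership test on the key
        print "Robert is " Age["Robert"]
    n = 0
    for (name in Age)                # visits every key; order is unspecified
        n++
    print n " entries"
}')

printf '%s\n' "$ages"
```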
awk: Example
# salesDeptLoop.awk script
BEGIN {OFS = "\t"}
{deptSales[$2] += $3}
END {
    for (item in deptSales) {
        print item, ":", deptSales[item]
        totalSales += deptSales[item]
    } # for
    print "Total Sales", ":", totalSales
} # END
Input and output
Input
1 clothing 3141
1 computers 9161
1 textbooks 21312
2 clothing 3252
2 computers 12321
2 supplies 2242
2 textbooks 15462
Output
computers : 21482
supplies : 2242
textbooks : 36774
clothing : 6393
Total Sales : 66891
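The per-department totals can be reproduced with the sketch below, which rebuilds sales.dat from the input records and passes the program inline rather than via a script file. Two things to expect when running it: the fields are separated by tabs because OFS is set to "\t", and the department lines may come out in a different order than shown above, since awk does not define the order of a for-in loop.

```shell
tmp=$(mktemp -d)
printf '%s\n' \
  '1 clothing 3141' \
  '1 computers 9161' \
  '1 textbooks 21312' \
  '2 clothing 3252' \
  '2 computers 12321' \
  '2 supplies 2242' \
  '2 textbooks 15462' > "$tmp/sales.dat"

totals=$(awk '
BEGIN { OFS = "\t" }                 # print fields separated by tabs
{ deptSales[$2] += $3 }              # accumulate sales per department name
END {
    for (item in deptSales) {
        print item, ":", deptSales[item]
        totalSales += deptSales[item]
    }
    print "Total Sales", ":", totalSales
}' "$tmp/sales.dat")

printf '%s\n' "$totals"
rm -r "$tmp"
```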
Perl
"Practical Extraction and Report Language"
written by Larry Wall and first released in 1987
rumour: name came first, then the acronym
"Perl is a language for easily manipulating text, files and processes": originally aimed at systems administrators and developers
Features
enables quick development of programs
no need to define variable types
portable
extensible (module import/export mechanism)
powerful "regular expression" capabilities
simple I/O model
many modules
support for static scoping
built-in debugger
Common uses
text-stream filters
  transforming, stripping, annotating, combining
simple text manipulation
Common Gateway Interface (CGI) scripts
report generation
system scripting
general solution prototyping
Hello, World!
    print ("Hello, world!\n");
    print "Hello, world!\n";
    print STDOUT "Hello, world!\n";
Executing Perl scripts
"bang path" convention for scripts:
  can invoke Perl at the command line, or
  add #!/public/bin/perl at the beginning of the script
  exact value of path depends upon your platform (use "which perl" to find the path)
From the command line:
    % perl
    print "Hello, World!\n";
    CTRL-D
    Hello, World!
Basics
kinds of variable:
  scalars, lists, "hashes" (also called "associative arrays" or "dictionaries")
some rudimentary support for object-orientation, but not really designed as an OOP language
advanced perl supports pointers (called references in Perl), user-defined structures, subroutine references
Basics (contd)
An example:
    #!/public/bin/perl
    $fruit{"apples"} = 5;
    $fruit{"oranges"} = 3;
    $fruit{"lemons"} = 2;
    $fruit{"limes"} = 2;
    @keys = keys(%fruit);
    foreach $f (@keys) {
        print "We have $fruit{$f} $f\n";
    }
Control structures
Similar to those in C:
  if () { }
  if () { } else { }
  if () { } elsif () { } else { } (note spelling)
  while () { }
  do { } while ()
  for (;;) { }
foreach: iterates over each element in a list
No "switch" statement:
  must use sequence like "if-elsif-elsif-else"
conditional expressions as in C:
  non-zero value: true
  zero value: false (in Perl the empty string and undef are false as well)
using shell commands in Perl
example:
    $file_01 = "/home/foobar/ex1.txt";
    $file_02 = "/home/foobar/ex2.txt";
    …
    $result = system("diff $file_01 $file_02");
    if ($result == 0) {
        # files were the same
    } else {
        # files were different
    }
if we are interested in only the result value and not the output from the command, redirect output to /dev/null
example:
    …
    system("diff $file_01 $file_02 >/dev/null");