Introduction to Awk
Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.
Awk
Works well on record-type data
Reads input file(s) a line at a time
Parses each line into fields
Performs user-defined tests against each line and performs actions on matches
Other Common Uses
Input validation
  Does every record have the same # of fields?
  Do values make sense (negative time, hourly wage > $1000, etc.)?
Filtering out certain fields
Searches
  Who got a zero on lab 3?
  Who got the highest grade?
Many others
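
As a rough preview of the input-validation use, the sketch below flags records that have the wrong number of fields or implausible values. The three-column layout (name, hourly rate, hours worked), the helper name validate_emp, and the sample records are all assumptions made for illustration.

```shell
# validate_emp: hypothetical helper that reports records failing basic sanity checks.
# Assumed layout: name, hourly rate, hours worked.
validate_emp() {
  awk '
    NF != 3    { print "line " NR ": expected 3 fields, found " NF; next }
    $2 > 1000  { print "line " NR ": hourly rate " $2 " looks too high" }
    $3 < 0     { print "line " NR ": negative hours " $3 }
  ' "$@"
}

# Made-up sample input: line 2 is short, line 3 has a huge rate, line 4 has negative hours.
printf 'Beth 4.00 0\nDan 3.75\nKathy 4000 10\nMark 5.00 -2\n' | validate_emp
```

Each check is an ordinary pattern, so new rules can be added one line at a time; `next` skips the remaining checks once a record is known to be malformed.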
Invocation
Can write little one-liners on the command line (very handy), e.g. print the 3rd field of every line:
$ awk '{ print $3 }' input.txt
Execute an awk script file:
$ awk -f script.awk input.txt
Or, use this shebang as the first line, and give your script execute permissions (use the path where awk is installed on your system, often /usr/bin/awk):
#!/bin/awk -f
Form of an AWK program
AWK programs are entries of the form:
pattern { action }
pattern: some test, either a regular expression to look for or a C-like condition
  if omitted, the action is applied to every line
action: a statement or set of statements
  if not provided, the default action is to print the entire line, much like grep
Input files are parsed a record (line) at a time
Each line is checked against each pattern, in order
There are 2 special patterns:
  BEGIN: true before any records are read
  END: true at end of input (after all records have been read)
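
The pattern { action } form and the two special patterns can be sketched together as follows. The helper name summarize, the header text, and the sample records (name, hourly rate, hours worked) are assumptions for illustration only.

```shell
# summarize: hypothetical helper combining BEGIN, an ordinary pattern, and END.
summarize() {
  awk '
    BEGIN   { print "name pay" }                                # once, before any record is read
    $3 > 0  { print $1, $2 * $3; paid++ }                       # for each record with hours > 0
    END     { print paid + 0, "of", NR, "employees were paid" } # once, after the last record
  ' "$@"
}

# Made-up sample input.
printf 'Beth 4.00 0\nKathy 4.00 10\nMark 5.00 20\n' | summarize
```

The `paid + 0` forces a numeric 0 to be printed when no record matched, since an unset awk variable is otherwise an empty string.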
Awk Features
Patterns can be regular expressions or C-like conditions.
Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
Input lines are parsed and split into fields, which are accessed by $1,…,$NF, where NF is a variable set to the number of fields.
The variable $0 contains the entire line, and by default lines are split by white space (blanks, tabs).
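
A minimal sketch of field access: for each line it prints the field count, the first field, and the last field. The helper name first_last and the sample text are made up for the example.

```shell
# first_last: hypothetical helper; prints NF, the first field, and the last field ($NF).
first_last() {
  awk '{ print NF, $1, $NF }' "$@"
}

# Made-up input with a different number of fields on each line.
printf 'one two three\nalpha beta\n' | first_last
```

Because NF holds the number of fields, $NF always names the last field, however many fields the current line has.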
Variables
Not declared, nor typed
No character type
  Only strings and floats (integer values work too)
$n refers to the nth field (where n is some integer value)
# prints each field on the line
for( i=1; i<=NF; ++i )
    print $i
Some Built-in Variables
FS: the input field separator
OFS: the output field separator
NF: # of fields; changes w/ each record
NR: the # of records read so far, i.e. the current record #
FNR: the # of records read so far from the current file; reset for each named file
$0: the entire input line
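
The difference between NR and FNR only shows up with more than one input file. The sketch below prints NR, FNR, and NF for every record of two throwaway files; the helper name show_counters and the file contents are assumptions for the demonstration.

```shell
# show_counters: hypothetical helper that prints NR, FNR and NF for every record.
show_counters() {
  awk '{ print NR, FNR, NF }' "$@"
}

# Two temporary sample files with made-up contents.
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one.txt"
printf 'f\n' > "$tmp/two.txt"
show_counters "$tmp/one.txt" "$tmp/two.txt"
rm -r "$tmp"
```

On the first record of the second file NR keeps counting (3) while FNR starts again at 1.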
Example
$ cat emp.data
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Print pay for those employees who actually worked
$ awk '$3>0 {print $1, $2*$3}' emp.data
Kathy 40
Mark 100
Mary 121
Susie 76.5
Example: CSV file
$ cat students.csv
smith,john,js12
jones,fred,fj84
bee,sue,sb23
fife,ralph,rf86
james,jim,jj22
cook,nancy,nc54
banana,anna,ab67
russ,sam,sr77
loeb,lisa,guitarHottie
$ cat getEmails.awk
#!/bin/awk -f
BEGIN { FS = "," }
{ printf( "%s's email is: %[email protected]\n", $2, $3 ); }
$ getEmails.awk students.csv
john's email is: [email protected]
fred's email is: [email protected]
sue's email is: [email protected]
ralph's email is: [email protected]
jim's email is: [email protected]
nancy's email is: [email protected]
anna's email is: [email protected]
sam's email is: [email protected]
lisa's email is: [email protected]
Example: output separator
$ cat out.awk
#!/bin/awk -f
BEGIN { FS = ","; OFS = "-*-"; }
{ print $1, $2, $3; }
$ out.awk students.csv
smith-*-john-*-js12
jones-*-fred-*-fj84
bee-*-sue-*-sb23
fife-*-ralph-*-rf86
james-*-jim-*-jj22
cook-*-nancy-*-nc54
banana-*-anna-*-ab67
russ-*-sam-*-sr77
loeb-*-lisa-*-guitarHottie
Flow Control
Awk syntax is much like C
Same loops, if statements, etc.
AWK: Aho, Weinberger, Kernighan
Kernighan and Ritchie wrote "The C Programming Language"; Ritchie created C
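
To illustrate the C-like flow control, here is a small sketch using an if/else chain and a while loop. The helper name grade_report, the grade thresholds, and the sample scores are hypothetical.

```shell
# grade_report: hypothetical helper showing C-style if/else and while in an awk action.
grade_report() {
  awk '{
    # if / else if / else chain, as in C (thresholds are made up)
    if ($2 >= 90)      grade = "A"
    else if ($2 >= 70) grade = "B"
    else               grade = "C"

    # while loop: build a bar with one star per 10 points
    bar = ""
    n = int($2 / 10)
    while (n-- > 0)
      bar = bar "*"

    print $1, grade, bar
  }' "$@"
}

# Made-up sample input: name and score.
printf 'Fred 90\nSam 50\nSue 75\n' | grade_report
```

Unlike C, string concatenation is written by simply placing two values next to each other, as in `bar "*"`.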
Associative Arrays
Awk also supports arrays that can be indexed by arbitrary strings. They are implemented using hash tables.
Total["Sue"] = 100;
It is possible to loop over all indices that have currently been assigned values.
for (name in Total)
    print name, Total[name];
Example using Associative Arrays
$ cat scores
Fred 90
Sue 100
Fred 85
Sam 70
Sue 98
Sam 50
Fred 70
$ cat total.awk
{ Total[$1] += $2}
END {
for (i in Total)
print i, Total[i];
}
$ awk -f total.awk scores
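Sue 198
Sam 120
Fred 245
(the order in which for-in visits the indices is unspecified, so the lines may come out in a different order)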
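
A variation on the totals example, sketched under the same input layout (name, score): a second array counts the entries per name so that an average can be printed. The helper name average_scores and the Count array are additions for illustration; the output is piped through sort because for-in order is not guaranteed.

```shell
# average_scores: hypothetical helper; keeps a running total and a count per name.
average_scores() {
  awk '
    { Total[$1] += $2; Count[$1]++ }
    END {
      for (name in Total)
        printf "%s %.1f\n", name, Total[name] / Count[name]
    }
  ' "$@" | sort
}

# Same records as the scores file above.
printf 'Fred 90\nSue 100\nFred 85\nSam 70\nSue 98\nSam 50\nFred 70\n' | average_scores
```

Array elements spring into existence on first use, so neither Total nor Count needs to be initialized.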
Useful one-liners
Line count:
awk 'END {print NR}'
grep
awk '/pat/'
head
awk 'NR<=10'
Add line #s to a file
awk '{print NR, $0}'
awk '{ printf( "%5d %s\n", NR, $0 )}'
Many more. See the resources tab on the course webpage for links to more examples.