Upload: jordan-mitchell

Post on 13-Jan-2016

Introduction to Awk

Awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data manipulation tasks.

Awk

- Works well on record-type data
- Reads input file(s) a line at a time
- Parses each line into fields
- Performs user-defined tests against each line and performs actions on matches

Other Common Uses

- Input validation
  - Does every record have the same # of fields?
  - Do the values make sense (negative time, hourly wage > $1000, etc.)?
- Filtering out certain fields
- Searches
  - Who got a zero on lab 3?
  - Who got the highest grade?
- Many others
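
A minimal sketch of the validation and search uses above. The helper names validate and top_score and the record layouts ("name rate hours" and "name score") are assumptions made for illustration only.

```sh
# validate: hypothetical helper; assumes records of the form "name rate hours"
validate() {
    awk '
        NR == 1             { expected = NF }   # the first record sets the expected field count
        NF != expected      { printf "line %d: %d fields (expected %d)\n", NR, NF, expected }
        NF >= 3 && $3 < 0   { printf "line %d: negative hours (%s)\n", NR, $3 }
    ' "$@"
}

# top_score: hypothetical helper; assumes records of the form "name score"
top_score() {
    awk '
        NR == 1 || $2 + 0 > max { max = $2 + 0; who = $1 }   # "+ 0" forces a numeric comparison
        END { if (NR > 0) print who, max }
    ' "$@"
}

printf 'Beth 4.00 0\nDan 3.75\nKathy 4.00 -2\n' | validate
# → line 2: 2 fields (expected 3)
# → line 3: negative hours (-2)

printf 'Fred 90\nSue 100\nSam 70\n' | top_score
# → Sue 100
```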

Invocation

- Can write little one-liners on the command line (very handy). Print the 3rd field of every line:

    $ awk '{ print $3 }' input.txt

- Execute an awk script file:

    $ awk -f script.awk input.txt

- Or, use this sha-bang as the first line, and give your script execute permissions (the path to awk varies by system; /usr/bin/awk is also common):

    #!/bin/awk -f
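
A small sketch of the script-file route. The file name third.awk is hypothetical, and the /bin/awk path on its first line is an assumption that may need adjusting on a given system.

```sh
# third.awk is a hypothetical file name used only for this sketch
cat > third.awk <<'EOF'
#!/bin/awk -f
# print the 3rd field of every line
{ print $3 }
EOF
chmod +x third.awk

# Run it through the interpreter explicitly. "./third.awk input.txt" also works
# when the path on the first line matches where awk is installed.
printf 'a b c\nd e f\n' | awk -f third.awk
# → c
# → f
```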

Form of an AWK program

AWK programs are entries of the form:

    pattern { action }

- pattern – some test: either a regular expression or a C-like condition
  - if omitted, the action is applied to every line
- action – a statement or set of statements
  - if omitted, the default action is to print the entire line, much like grep
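
The three shapes an entry can take, sketched on a tiny made-up data set (the shell variable data and its contents are assumptions for illustration):

```sh
data='alpha 1
beta 20
gamma 3'

# pattern only: the default action prints the matching line
printf '%s\n' "$data" | awk '/mm/'
# → gamma 3

# action only: the action runs for every line
printf '%s\n' "$data" | awk '{ print $1 }'
# → alpha
# → beta
# → gamma

# pattern and action together, using a C-like condition
printf '%s\n' "$data" | awk '$2 > 2 { print $1, $2 * 2 }'
# → beta 40
# → gamma 6
```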

Form of an AWK program

- Input files are parsed a record (line) at a time
- Each line is checked against each pattern, in order
- There are 2 special patterns:
  - BEGIN – true before any records are read
  - END – true at end of input (after all records have been read)
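
A sketch of how BEGIN, an ordinary entry, and END fit together. The helper name average and the "label value" input layout are assumptions for illustration.

```sh
# average: hypothetical helper; assumes records of the form "label value"
average() {
    awk '
        BEGIN { print "averaging column 2" }                  # runs once, before any input is read
              { sum += $2 }                                   # runs for every record
        END   { if (NR > 0) print "average:", sum / NR }      # runs once, after the last record
    ' "$@"
}

printf 'a 10\nb 20\nc 30\n' | average
# → averaging column 2
# → average: 20
```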

Awk Features

- Patterns can be regular expressions or C-like conditions.
- Each line of the input is matched against the patterns, one after the next. If a match occurs, the corresponding action is performed.
- Input lines are parsed and split into fields, which are accessed by $1,…,$NF, where NF is a variable set to the number of fields. The variable $0 contains the entire line, and by default lines are split on white space (blanks, tabs).
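
A quick sketch of default field splitting. The helper name describe is hypothetical; the input deliberately mixes leading blanks, a tab, and runs of spaces to show that they all count as a single separator.

```sh
# describe: hypothetical helper that prints the field count, first field, and last field
describe() {
    awk '{ print NF, $1, $NF }' "$@"
}

printf '  the quick\tbrown   fox \n' | describe
# → 4 the fox
```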

Variables

- Not declared, nor typed
- No character type
- Only strings and floats (with support for ints)
- $n refers to the nth field (where n is some integer value)

    # prints each field on the line, one per output line
    {
        for (i = 1; i <= NF; ++i)
            print $i
    }
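
Since variables are neither declared nor typed, awk converts between strings and numbers as needed. A minimal sketch (the variable names x, y, and n are arbitrary):

```sh
awk 'BEGIN {
    x = "3" + 4          # the string "3" is converted to a number: x is 7
    y = x " apples"      # x is converted back to a string and concatenated
    print x
    print y
    print n + 0          # n was never assigned: it acts as 0 (and as "")
    print 7 / 2          # arithmetic is floating point
}'
# → 7
# → 7 apples
# → 0
# → 3.5
```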

Some Built-in Variables

- FS – the input field separator
- OFS – the output field separator
- NF – # of fields; changes with each record
- NR – the # of records read so far, i.e. the current record #
- FNR – the # of records read so far in the current file; reset for each named file
- $0 – the entire input line
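
The difference between NR and FNR only shows up with more than one input file. A sketch, assuming two throwaway files named one.txt and two.txt; FILENAME is another standard built-in (not listed above) that holds the name of the current file.

```sh
# one.txt and two.txt are hypothetical sample files
printf 'a\nb\n' > one.txt
printf 'c\n' > two.txt

# nr_fnr: hypothetical helper that shows both counters for every record
nr_fnr() {
    awk '{ print FILENAME, NR, FNR }' "$@"
}

nr_fnr one.txt two.txt
# → one.txt 1 1
# → one.txt 2 2
# → two.txt 3 1
```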

Example

    $ cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18

Print pay for those employees who actually worked:

    $ awk '$3 > 0 { print $1, $2 * $3 }' emp.data
    Kathy 40
    Mark 100
    Mary 121
    Susie 76.5
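
The same data can feed a summary in an END block. A sketch that recreates emp.data and then counts who did not work and totals the pay; the variable names idle and total are arbitrary.

```sh
# recreate the sample file from the example
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

awk '
    $3 == 0 { idle++ }                 # employees with zero hours
    $3 > 0  { total += $2 * $3 }       # rate * hours for everyone else
    END     { printf "did not work: %d\ntotal pay: %.2f\n", idle, total }
' emp.data
# → did not work: 2
# → total pay: 337.50
```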

Example – CSV file

    $ cat students.csv
    smith,john,js12
    jones,fred,fj84
    bee,sue,sb23
    fife,ralph,rf86
    james,jim,jj22
    cook,nancy,nc54
    banana,anna,ab67
    russ,sam,sr77
    loeb,lisa,guitarHottie

    $ cat getEmails.awk
    #!/bin/awk -f
    BEGIN { FS = "," }
    { printf( "%s's email is: %[email protected]\n", $2, $3 ); }

    $ getEmails.awk students.csv
    john's email is: [email protected]
    fred's email is: [email protected]
    sue's email is: [email protected]
    ralph's email is: [email protected]
    jim's email is: [email protected]
    nancy's email is: [email protected]
    anna's email is: [email protected]
    sam's email is: [email protected]
    lisa's email is: [email protected]
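
A runnable sketch of the same idea. The mail domain is not recoverable from the slide, so example.edu is a placeholder assumption passed in with -v; the file name emails.awk and the variable name domain are likewise hypothetical.

```sh
# emails.awk: hypothetical script; "domain" is supplied on the command line with -v
cat > emails.awk <<'EOF'
BEGIN { FS = "," }
{ printf("%s's email is: %s@%s\n", $2, $3, domain) }
EOF

# example.edu is a placeholder, not the domain used on the original slide
printf 'smith,john,js12\njones,fred,fj84\n' | awk -v domain=example.edu -f emails.awk
# → john's email is: js12@example.edu
# → fred's email is: fj84@example.edu
```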

Example – output separator

    $ cat out.awk
    #!/bin/awk -f
    BEGIN { FS = ","; OFS = "-*-"; }
    { print $1, $2, $3; }

    $ out.awk students.csv
    smith-*-john-*-js12
    jones-*-fred-*-fj84
    bee-*-sue-*-sb23
    fife-*-ralph-*-rf86
    james-*-jim-*-jj22
    cook-*-nancy-*-nc54
    banana-*-anna-*-ab67
    russ-*-sam-*-sr77
    loeb-*-lisa-*-guitarHottie
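
A short sketch of when OFS is and is not applied, using one record from students.csv. The commas in the print statement are what trigger it; assigning to any field (here the common $1 = $1 idiom) makes awk rebuild $0 with OFS.

```sh
# commas in print: OFS is inserted between the items
printf 'smith,john,js12\n' | awk 'BEGIN { FS = ","; OFS = "-*-" } { print $1, $2 }'
# → smith-*-john

# no commas: the items are concatenated and OFS is not used
printf 'smith,john,js12\n' | awk 'BEGIN { FS = ","; OFS = "-*-" } { print $1 $2 }'
# → smithjohn

# assigning to a field rebuilds $0 with OFS, so a plain print shows the new separator
printf 'smith,john,js12\n' | awk 'BEGIN { FS = ","; OFS = "-*-" } { $1 = $1; print }'
# → smith-*-john-*-js12
```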

Flow Control

- Awk syntax is much like C
  - Same loops, if statements, etc.
- AWK: Aho, Weinberger, Kernighan
  - Kernighan co-wrote "The C Programming Language" with Dennis Ritchie, who created C
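
A sketch of C-style flow control in an action. The helper name grade, the "name score score ..." layout, and the letter cutoffs are all assumptions for illustration.

```sh
# grade: hypothetical helper; assumes records of the form "name score score ..."
grade() {
    awk '
        NF < 2 { next }                      # skip records that have no scores
        {
            total = 0
            for (i = 2; i <= NF; i++)        # for loop over the score fields
                total += $i
            avg = total / (NF - 1)

            if (avg >= 90)      letter = "A"
            else if (avg >= 80) letter = "B"
            else                letter = "C or below"

            print $1, avg, letter
        }
    ' "$@"
}

printf 'Fred 90 100 95\nSue 80 85\nSam 70 50\n' | grade
# → Fred 95 A
# → Sue 82.5 B
# → Sam 60 C or below
```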

Associative Arrays

Awk also supports arrays that can be indexed by arbitrary strings. They are implemented using hash tables.

    Total["Sue"] = 100;

It is possible to loop over all indices that have currently been assigned values.

    for (name in Total)
        print name, Total[name];
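
Two related operations, sketched with the same Total array: the in operator tests whether an index exists without creating it, and delete removes an entry. The sample names and values are made up.

```sh
awk 'BEGIN {
    Total["Sue"]  = 100
    Total["Fred"] = 90

    if ("Sue" in Total)    print "Sue has a total"
    if (!("Sam" in Total)) print "Sam has no total"     # the test does not create Total["Sam"]

    delete Total["Fred"]                                # remove one entry

    n = 0
    for (name in Total) n++                             # count what is left
    print n, "entry left"
}'
# → Sue has a total
# → Sam has no total
# → 1 entry left
```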

Example using Associative Arrays

    $ cat scores
    Fred 90
    Sue 100
    Fred 85
    Sam 70
    Sue 98
    Sam 50
    Fred 70

    $ cat total.awk
    { Total[$1] += $2 }
    END {
        for (i in Total)
            print i, Total[i];
    }

    $ awk -f total.awk scores
    Sue 198
    Sam 120
    Fred 245
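
The order in which for (i in Total) visits the indices is not specified, so the output order can differ between awk implementations. A sketch that keeps a second array for counts, prints per-student averages, and pipes through sort for a stable order; the helper name averages and the Count array are assumptions.

```sh
# recreate the sample file from the example
cat > scores <<'EOF'
Fred 90
Sue 100
Fred 85
Sam 70
Sue 98
Sam 50
Fred 70
EOF

# averages: hypothetical helper; Count tracks how many scores each name has
averages() {
    awk '
        { Total[$1] += $2; Count[$1]++ }
        END {
            for (name in Total)
                printf "%s %.1f\n", name, Total[name] / Count[name]
        }
    ' "$@" | sort
}

averages scores
# → Fred 81.7
# → Sam 60.0
# → Sue 99.0
```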

Useful one-liners

- Line count:

    awk 'END { print NR }'

- grep:

    awk '/pat/'

- head:

    awk 'NR <= 10'

- Add line #s to a file:

    awk '{ print NR, $0 }'
    awk '{ printf( "%5d %s\n", NR, $0 ) }'

Many more. See the resources tab on the course webpage for links to more examples.
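
A few more one-liners in the same spirit, sketched on a made-up three-line sample (the shell variable sample is an assumption; its middle line is empty):

```sh
sample='one two three

four five'

# word count (like wc -w): add up the field counts
printf '%s\n' "$sample" | awk '{ words += NF } END { print words }'
# → 5

# print only the non-empty lines
printf '%s\n' "$sample" | awk 'NF > 0'
# → one two three
# → four five

# print the last field of every non-empty line
printf '%s\n' "$sample" | awk 'NF { print $NF }'
# → three
# → five
```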