Chapter 5: Advanced Editors
awk, sed, tr, cut
Objectives:
After studying this lesson, you should be able to use:
– awk: a pattern scanning and processing language
– sed: stream editor
– tr: translate one character to another
– cut: cut specific columns vertically
Awk
• awk is a pattern scanning and processing language.
• Named after its developers Aho, Weinberger, and Kernighan (developed in 1977).
• Searches files for lines that match specified patterns and then performs the associated actions.
awk
Syntax:
awk -F(separator) 'pattern{action}' filenames
• awk checks to see if the input records in the specified files satisfy the pattern.
• If they do, awk executes the action associated with it.
• If no pattern is specified, the action affects every input record.
• A common use of awk is to process input files by formatting them, and then output the results in the chosen form.
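The pattern/action behaviour described above can be tried on a throwaway file. This is a minimal sketch, assuming a hypothetical three-line file named fruit created in a temporary directory; the file name, data, and variable names are illustrative only.

```shell
# Build a small colon-separated sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf 'apple:3\nbanana:12\ncherry:7\n' > "$tmpdir/fruit"

# Pattern and action: print field 1 of the records that contain "an".
with_pattern=$(awk -F: '/an/ { print $1 }' "$tmpdir/fruit")

# No pattern: the action runs on every input record.
no_pattern=$(awk -F: '{ print $1 }' "$tmpdir/fruit")

echo "$with_pattern"
echo "$no_pattern"
rm -r "$tmpdir"
```

Only banana contains the string "an", so the first command prints one name while the second prints all three.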
awk
• A sample data file named countries
Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia
• Fields: country name, area (thousands of square miles), population (millions), continent
awk
awk -F: '{ printf "%-10s \t%d \t%d \t%15s \n",$1,$2,$3,$4 }' countries
Outputs:
Canada          3852    25        North America
USA             3615    237       North America
Brazil          3286    134       South America
England         94      56               Europe
France          211     55               Europe
Japan           144     120                Asia
Mexico          762     78        North America
China           3705    1032               Asia
India           1267    746                Asia
Some built-in variables
• NF - Number of fields in current record
• $NF - Last field of current record
• NR - Number of records processed so far
• FILENAME - Name of current input file
• FS - Field separator, space or TAB by default
• $0 - Entire line
• $1, $2, …, $n - Field 1, 2, …, n
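A short sketch of the built-in variables in use, assuming two records in the style of the countries file; the variable names sample and summary are illustrative only.

```shell
# Two colon-separated records (in the style of the countries file).
sample='Canada:3852:25:North America
Japan:144:120:Asia'

# NR is the record number, NF the field count, $1 the first field,
# and $NF the last field of the current record.
summary=$(printf '%s\n' "$sample" | awk -F: '{ print NR, NF, $1, $NF }')
echo "$summary"
```

Each output line shows the record number, the number of fields (4 here), the country name, and the continent.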
Formatted output
printf syntax:
printf "control-string" arg1, arg2, ... , argn
• The control-string determines how printf will format arg1 - argn.
• The control-string contains conversion specifications, one for each argument. A conversion specification has the following format: %[-][x[.y]]conv
Formatted output
%[-][x[.y]]conv
• - causes printf to left justify the argument.
• x is the minimum field width.
• .y is the number of places to the right of a decimal point in a number.
• conv is a letter from the following list:
d decimal
e exponential notation
f floating point number
g use f or e, whichever is shorter
o unsigned octal
s string of characters
x unsigned hexadecimal
printf examples
• printf "I have %d %s\n", how_many, animal_type
• printf "%-10s has $%6.2f in their account\n", name, amount
• printf "%10s %-4.2f %-6d\n", name, interest_rate, account_number
• printf "\t%d\t%d\t%6.2f\t%s\n", id_no, age, balance, name
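To see what the width, precision, and justification parts of a conversion specification do, here is a small sketch. The input line and the | characters (used only to make the padding visible) are assumptions for illustration.

```shell
# One input record with a name and an amount (made-up values).
# %-10s : string, left justified in a field at least 10 wide
# %8.2f : floating point, at least 8 wide, 2 places after the decimal point
# %d    : decimal integer (the fractional part is dropped)
line=$(echo "Brooks 1234.5" | awk '{ printf "%-10s|%8.2f|%d\n", $1, $2, $2 }')
echo "$line"
```

The name is padded on the right to 10 characters, and the amount is padded on the left to 8 characters.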
awk
• awk opens a file and reads it serially, one line at a time.
• By specifying a pattern, we can select only those lines that contain a certain string of characters.
• For example we could use a pattern to display all countries from our data file which are situated within Europe.
awk '/Europe/' countries
Comparison operators
• Using the sample data file named countries:
• awk -F: '$3 == 55' countries prints the records whose third field equals 55.
• Comparison operators are:
== equal to
!= not equal to
> greater than
< less than
>= greater than or equal to
<= less than or equal to
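The comparisons can be tried against the countries data. This sketch recreates the file in a temporary directory; the thresholds 200 and 211 and the variable names are chosen only for illustration.

```shell
# Recreate the countries sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/countries" <<'EOF'
Canada:3852:25:North America
USA:3615:237:North America
Brazil:3286:134:South America
England:94:56:Europe
France:211:55:Europe
Japan:144:120:Asia
Mexico:762:78:North America
China:3705:1032:Asia
India:1267:746:Asia
EOF

# Third field greater than 200.
third_over_200=$(awk -F: '$3 > 200 { print $1 }' "$tmpdir/countries")

# Second field less than or equal to 211.
second_upto_211=$(awk -F: '$2 <= 211 { print $1 }' "$tmpdir/countries")

echo "$third_over_200"
echo "$second_upto_211"
rm -r "$tmpdir"
```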
File Breaking
• By default, awk splits each record into fields on spaces and tabs. A run of contiguous white space counts as a single separator, and leading separators are discarded.
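A quick sketch of the default splitting rule, with an explicit separator shown for contrast. The input strings are made up; the note that an explicit single-character separator keeps empty fields goes beyond what the slide states.

```shell
# Default separator: leading blanks are dropped and runs of
# spaces/tabs count as one separator, so this line has 3 fields.
default_split=$(printf '   ford \t mondeo   1990\n' | awk '{ print NF ": " $2 }')

# Explicit separator (-F:): every colon counts, so the empty
# field between the two colons is kept and NF is 3.
colon_split=$(printf 'a::b\n' | awk -F: '{ print NF }')

echo "$default_split"
echo "$colon_split"
```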
Logic Operations
Sample file named cars:
ford mondeo 1990 5800
ford fiesta 1991 4575
honda accord 1991 6000
toyota tercel 1992 6500
buick centry 1990 5950
buick centry 1991 6450
• $ awk '$3 >=1991 && $4 < 6250' cars
• $ awk '$1 == "ford" || $1 == "buick"' cars
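The two commands above can be run without creating a file by piping the sample data in. This is a sketch; reading field 3 as a year and field 4 as a price is an assumption, since the slide does not label the columns.

```shell
# The cars sample data from the slide, held in a shell variable.
cars='ford mondeo 1990 5800
ford fiesta 1991 4575
honda accord 1991 6000
toyota tercel 1992 6500
buick centry 1990 5950
buick centry 1991 6450'

# AND: field 3 is at least 1991 and field 4 is below 6250.
recent_cheap=$(printf '%s\n' "$cars" | awk '$3 >= 1991 && $4 < 6250')

# OR: field 1 is either ford or buick.
ford_or_buick=$(printf '%s\n' "$cars" | awk '$1 == "ford" || $1 == "buick"')

echo "$recent_cheap"
echo "$ford_or_buick"
```

With no action given, awk prints each matching record in full.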
Data processing
• Sample file named wages:
Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40
Fields: name, $/hour, hours/week
• Calculate $/week and tax/week (tax rate 25%).
awk '{ print $1,$2,$3,$2*$3,$2*$3*0.25 }' wages
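The same calculation can be combined with printf to line up the columns. This is one possible layout, not the slide's; the variable pay and the chosen field widths are assumptions for illustration.

```shell
# The wages sample data: name, $/hour, hours/week.
wages='Brooks 10 35
Everest 8 40
Hatcher 12 20
Phillips 8 30
Wilcox 12 40'

# pay holds $/week; the tax column is 25% of pay.
report=$(printf '%s\n' "$wages" |
    awk '{ pay = $2 * $3; printf "%-10s %4d %7.2f\n", $1, pay, pay * 0.25 }')
echo "$report"
```

For Brooks, 10 $/hour times 35 hours gives 350 $/week and 87.50 tax.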
Other examples
• $ who | awk '{ print $5, $1 }' | sort
prints login time and name, sorted by time
• $ awk -F: '{ print $1 }' /etc/passwd | sort
prints the existing user names, sorted
• $ awk -F: '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd
prints user name and user id
sed
• sed stands for stream editor.
• sed is a non-interactive editor used to make global changes to entire files at once.
• An interactive editor like vi would be too cumbersome for replacing large amounts of information at once.
• The sed command is primarily used to substitute one pattern for another.
sed
Typical usage of sed:
• edit files too large for interactive editing
• edit files of any size where the editing sequence is too complicated to type in interactive mode
• perform "multiple global" editing functions efficiently in one pass through the input
• edit multiple files automatically
• good tool for writing conversion programs
sed
• Syntax:
sed -e 'command' file(s)
sed -e 'command' -e 'command' … file(s)
sed -f scriptfile file(s)
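The -e and -f forms can be compared side by side. This is a sketch under stated assumptions: the file names foo and edits.sed and the sample text are hypothetical, created in a temporary directory.

```shell
# A two-line input file (hypothetical content).
tmpdir=$(mktemp -d)
printf 'Tx is big\nTx is hot\n' > "$tmpdir/foo"

# Two commands given with -e, applied in order to each line.
with_e=$(sed -e 's/Tx/Texas/' -e 's/big/large/' "$tmpdir/foo")

# The same commands stored one per line in a script file, loaded with -f.
printf 's/Tx/Texas/\ns/big/large/\n' > "$tmpdir/edits.sed"
with_f=$(sed -f "$tmpdir/edits.sed" "$tmpdir/foo")

# sed writes the edited text to standard output; foo is unchanged.
original=$(cat "$tmpdir/foo")

echo "$with_e"
rm -r "$tmpdir"
```

Both forms produce the same output, so a script file is mainly a convenience when the list of commands grows.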
sed
• Whole line oriented functions
DELETE d
APPEND a
CHANGE c
SUBSTITUTE s
INSERT i
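A sketch of the d, a, i, and c functions on a three-line input. The input text and variable names are made up. The a, i, and c commands are written in the portable form, with a backslash after the letter and the new text on the following line.

```shell
input='alpha
beta
gamma'

# d: delete line 2.
deleted=$(printf '%s\n' "$input" | sed '2d')

# a: append a new line after line 1.
appended=$(printf '%s\n' "$input" | sed '1a\
after alpha')

# i: insert a new line before line 1.
inserted=$(printf '%s\n' "$input" | sed '1i\
before alpha')

# c: change (replace) line 2 with new text.
changed=$(printf '%s\n' "$input" | sed '2c\
BETA')

echo "$deleted"; echo "$appended"; echo "$inserted"; echo "$changed"
```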
sed examples
• sed 's/Tx/Texas/' foo
replaces the first Tx on each line of the file foo with Texas (the result goes to standard output; foo itself is not changed)
• sed -e '1,10d' foo
deletes lines 1-10 from the file foo
• sed '/^Co*t/,/[0-9]$/d' foo
deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc. through the first line that ends with a digit
sed examples
• cat file
I have three dogs and two cats
sed -e 's/dog/cat/g' -e 's/cat/elephant/g' file
I have three elephants and two elephants
• sed -e '/^$/d' foo
deletes all blank lines
• sed -e 6d foo
deletes line 6.
sed examples
• sed '11,$d' foo
A dollar sign ($) can be used to indicate the last line in a file. For example, this deletes lines 11 through the end of the file foo.
sed examples
• sed can also delete lines based on a matching string, using /string/d. For example, sed '/warning/d' log deletes every line in the file log that contains the string warning.
• To delete a string, not the entire line containing the string, substitute the text with nothing. For example, sed 's/draft//g' foo removes the string draft everywhere it occurs in the file foo.
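The difference between deleting matching lines and deleting a matching string can be seen on a small input. This is a sketch; the three input lines are made up for illustration.

```shell
log='warning: disk full
draft1 ready
redraft of draft2'

# /warning/d removes every whole line that contains "warning".
no_warning_lines=$(printf '%s\n' "$log" | sed '/warning/d')

# s/draft//g removes only the string "draft", keeping the rest of each line.
no_draft=$(printf '%s\n' "$log" | sed 's/draft//g')

echo "$no_warning_lines"
echo "$no_draft"
```

The first command drops one line; the second keeps all three lines but shortens two of them.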
tr
• translates characters from stdin to stdout.
• Syntax:
tr [options] string1 [string2]
Options:
• -c complement set with respect to the entire ASCII character set
• -s squeeze duplicates to single characters
• -d delete all input characters contained in string1
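Each option can be tried on a one-line input. This is a small sketch with made-up strings; the variable names are illustrative only.

```shell
# -s: squeeze runs of a and b down to a single character each.
squeezed=$(printf 'aabbccdd\n' | tr -s 'ab')

# -d: delete every l and o.
deleted=$(printf 'hello world\n' | tr -d 'lo')

# -c: translate everything that is NOT a lower case letter to _.
# (No trailing newline in the input, so none is translated.)
complemented=$(printf 'abc 123' | tr -c 'a-z' '_')

echo "$squeezed"
echo "$deleted"
echo "$complemented"
```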
tr examples
Typical usages:
• tr chars1 chars2 < inputfile > outputfile
• tr chars1 chars2 < inputfile | less
tr
• tr s z replaces all instances of s with z
• tr so zx replaces all instances of s with z and o with x
• tr '[a-z]' '[A-Z]' replaces all lower case characters with upper case
• tr '[a-m]' '[A-M]' translates only lower case a through m to upper case A through M
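The translations above can be checked on short inputs. This is a sketch; the input words are chosen only for illustration.

```shell
# s becomes z and o becomes x.
swapped=$(printf 'sos\n' | tr so zx)

# All lower case letters become upper case.
upper=$(printf 'hello\n' | tr '[a-z]' '[A-Z]')

# Only a through m are translated; o is outside the range and stays.
half=$(printf 'hello\n' | tr '[a-m]' '[A-M]')

echo "$swapped"
echo "$upper"
echo "$half"
```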
tr examples
• tr '.,:;?!' '.'
converts all punctuation to a period
• tr -c '0-9a-zA-Z' '_'
converts all non-alphanumeric characters (including spaces and newlines) to _
• tr -s 'a-zA-Z'
squeezes each run of a repeated letter down to a single letter
tr
• The output of tr can be redirected to a file or piped to another filter and its input can be redirected from a file or piped from another command
• Because tr is run from the shell, certain characters must be protected from the shell by quotes or \, such as: spaces : ; & ( ) | ^ < > [ ] \ ! NEWLINE TAB
• Example: tr o ' ' replaces all o's with a blank (space)
tr
• tr -d lets you delete any character matched in string1.
• Examples
tr -d '[a-z]' deletes all lower case characters
tr -d aeiou deletes all vowels
tr -dc aeiou deletes all characters except vowels (note: the deleted characters include spaces, TABs, and NEWLINEs as well)
tr
• tr -cs '[A-Z][a-z]' '[\n*]' <in_file > out_file
• It replaces every character that is not a letter (-c) with a newline ( \n ) and then squeezes multiple newlines into a single newline (-s). The * after \n means as many repetitions as needed.
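The effect is to put each word on its own line. The sketch below uses a shorter spelling of the same idea ('A-Za-z' and a plain '\n'), which is an assumption of this example rather than the slide's exact command; the input sentence is made up.

```shell
# Every character that is not a letter becomes a newline (-c),
# and runs of newlines are squeezed to one (-s).
words=$(printf 'Hello, world! 42 times\n' | tr -cs 'A-Za-z' '\n')
echo "$words"
```

The punctuation, spaces, and digits all disappear, leaving one word per line.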
cut
• cut - used to cut specific columns vertically
• cut -c2-5 filename
cuts out characters (columns) 2 through 5, inclusive, from each line of the file filename.
• cut -f3-4 filename
if the lines of filename contain field delimiters (TAB by default), individual fields can be cut out using the -f option; here, fields 3 and 4.
cut
A sample file named bar
madan;SS;MRC-LMB;Ohio
christine;SS;MRC-LMB;Nebraska
This particular example has 4 fields which are 'delimited' by a ; so to get field number four, you should run
cut -f4 -d';' bar
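The command can be tried by recreating bar in a temporary directory. This is a sketch; the extra -f1,3 and -c2-5 calls and the variable names are additions for illustration.

```shell
# Recreate the sample file bar.
tmpdir=$(mktemp -d)
printf 'madan;SS;MRC-LMB;Ohio\nchristine;SS;MRC-LMB;Nebraska\n' > "$tmpdir/bar"

# Field 4, using ; as the delimiter.
fourth=$(cut -f4 -d';' "$tmpdir/bar")

# Fields 1 and 3; the delimiter is kept between the selected fields.
first_and_third=$(cut -d';' -f1,3 "$tmpdir/bar")

# Characters 2 through 5 of each line.
chars=$(cut -c2-5 "$tmpdir/bar")

echo "$fourth"
echo "$first_and_third"
echo "$chars"
rm -r "$tmpdir"
```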
Summary
• awk: a pattern scanning and processing language
• sed: stream editor
• tr: translate one character to another
• cut: cut specific columns vertically