sed regular expression


Uploaded by: keerthanasudarshan

Posted on 15-Apr-2016


Page 1: Sed Regular Expression

Contents

SED Regular expression

Page 2: Sed Regular Expression

Contents

To execute sed from a file

Sed regular expression

Page 4: Sed Regular Expression

Using the file u3, do the following using sed, displaying the result on the screen

1. Output only the lines that contain cow Answer: sed -n '/cow/p' u3

2. Delete any line that contains cow Answer: sed '/cow/d' u3

3. Change the first instance of * on each line to ! Answer: sed 's/\*/!/' u3 (\* is a literal asterisk; a bare * at the very start of a regex is also taken literally, but escaping it makes the intent clear)

4. Change all occurrences of * on each line to ! Answer: sed 's/\*/!/g' u3

Page 5: Sed Regular Expression

5. Output only the lines that contain either cow or calf Answer: sed -n -E '/cow|calf/p' u3 (with sed -n -e '/cow/p' -e '/calf/p' u3, a line containing both words would be printed twice)

6. Output the file after changing cow to COW on lines 10-20 Answer: sed '10,20s/cow/COW/g' u3

7. Output the entire file except lines 1-20 Answer: sed '1,20d' u3

8. Delete any lines containing the string "news" Answer: sed '/news/d' u3
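To try exercises 1-8 without the original u3 (its contents are not shown in the slides), the sketch below builds a small made-up stand-in file. It assumes a POSIX shell and a sed that accepts -E (GNU or BSD sed), and it also shows the double-printing effect of the two -e scripts in exercise 5.

```shell
# Hypothetical stand-in for u3; the real exercise file is not shown
u3=$(mktemp)
printf '%s\n' 'the cow * jumped * over' 'a calf and a cow' 'no animals here' > "$u3"

# 1. Only the lines that contain cow
only_cow=$(sed -n '/cow/p' "$u3")
# 3. First * on each line becomes !
first_star=$(sed 's/\*/!/' "$u3")
# 4. Every * on each line becomes !
all_stars=$(sed 's/\*/!/g' "$u3")
# 5a. Two -e scripts: the line with both words is printed twice
twice=$(sed -n -e '/cow/p' -e '/calf/p' "$u3" | wc -l)
# 5b. One alternation with -E: each matching line is printed once
once=$(sed -n -E '/cow|calf/p' "$u3" | wc -l)

printf '%s\n' "$first_star"
rm -f "$u3"
```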

Page 6: Sed Regular Expression

Given a file named file containing:

line 1 (one)
line 2 (two)
line 3 (three)

Command: sed -e '1,2s/line/LINE/' file

Output:
LINE 1 (one)
LINE 2 (two)
line 3 (three)

Page 7: Sed Regular Expression

9. Command: sed -e '1,2d' file

Output: line 3 (three)

10. Command: sed -e '3d' file

Output:
line 1 (one)
line 2 (two)
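The three commands above can be checked with a short script that recreates the example file in a temporary location (the temporary file is an assumption; the slides simply call it file).

```shell
# Recreate the three-line example file
file=$(mktemp)
printf '%s\n' 'line 1 (one)' 'line 2 (two)' 'line 3 (three)' > "$file"

upper=$(sed -e '1,2s/line/LINE/' "$file")   # substitute on lines 1-2 only
last=$(sed -e '1,2d' "$file")               # delete lines 1-2
first_two=$(sed -e '3d' "$file")            # delete line 3

printf '%s\n' "$upper"
rm -f "$file"
```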

Page 8: Sed Regular Expression

11. Write a script to insert

12. Write a script to change

Page 9: Sed Regular Expression

Sed from a file

If your sed script is getting long, you can put it into a file, like so:

# This file is named "sample.sed"
# (some older sed versions accept comments only in a block at the beginning of the script)
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g

Then call sed with the "-f" flag:

sed -f sample.sed filename
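A minimal sketch of the -f workflow, assuming mktemp is available: the script is written to a temporary path instead of sample.sed, and a one-line input is piped through it.

```shell
# Write the substitutions from sample.sed into a temporary script file
script=$(mktemp)
cat > "$script" <<'EOF'
# This file plays the role of "sample.sed"
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g
EOF

# -f reads the editing commands from the file instead of the command line
result=$(printf '%s\n' 'a color film at the theater' | sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```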

Page 10: Sed Regular Expression

Or, you can make an executable sed script:

#!/usr/bin/sed -f
# This file is named "sample2.sed"
s/color/colour/g
s/flavor/flavour/g
s/theater/theatre/g

Then give it execute permission (the #! line must point to where sed is installed on your system, e.g. /bin/sed): chmod u+x sample2.sed

and then call it like so: ./sample2.sed filename

Page 11: Sed Regular Expression

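In sed's default (basic) regular expressions, these operators must be written with a backslash:

curly braces \{ \}

parentheses \( \)

plus \+ (GNU extension)

question mark \? (GNU extension)

The star is the exception: a bare * is the repetition operator, and \* matches a literal asterisk. With sed -E (extended regular expressions) the operators are written without backslashes, and a backslash makes them literal instead.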
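A quick check of these escaping rules, assuming GNU sed (other seds agree on \*, \{ \} and \( \), but may lack \+ and \?):

```shell
# \* is a literal asterisk: this matches the three characters a, *, b
star=$(echo 'a*b' | sed 's/a\*b/X/')
# \{2\} is an interval: exactly two a's
interval=$(echo 'caat' | sed 's/a\{2\}/A/')
# Without backslashes the braces are ordinary characters
literal=$(echo 'a{2}' | sed 's/a{2}/B/')
# \( \) groups, so * repeats the whole "ab"
group=$(echo 'ababc' | sed 's/\(ab\)*/G/')
printf '%s %s %s %s\n' "$star" "$interval" "$literal" "$group"
```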

Page 12: Sed Regular Expression

Special character     Usage

^                     Matches the beginning of the line
$                     Matches the end of the line
.                     Matches any single character
*                     Matches zero or more occurrences of the preceding character
\+                    Matches one or more occurrences of the preceding character (GNU extension)
\?                    Matches zero or one occurrence of the preceding character (GNU extension)
[ ]                   Matches any one character enclosed in [ ]
[^ ]                  Matches any one character not enclosed in [ ]
(character)\{m,n\}    Matches m to n repetitions of (character)
(character)\{m,\}     Matches m or more repetitions of (character)
(character)\{,n\}     Matches at most n (possibly 0) repetitions of (character); not defined by POSIX
(character)\{n\}      Matches exactly n repetitions of (character)
\(expression\)        Group operator; also saves the matched text for the backreferences \1 \2 ... \9
\n                    Backreference: matches the same text as the nth group (n is a digit from 1 to 9)
&                     In the replacement part of an s command, stands for the whole matched text

Regular expression in sed
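A small sketch of the group operator, backreferences and &, using made-up input lines:

```shell
# \(...\) captures; \2 and \1 reuse the captured words in swapped order
swapped=$(echo 'hello world' | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')
# & in the replacement stands for the whole matched text
wrapped=$(echo 'order 42 shipped' | sed 's/[0-9][0-9]*/(&)/')
# \1 inside the pattern itself: print lines containing a doubled word
doubled=$(printf '%s\n' 'the the end' 'no repeat' | sed -n '/\([a-z][a-z]*\) \1/p')
```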

Page 13: Sed Regular Expression

Regular Expressions (character classes)

The following POSIX character classes are shorthand for common sets of characters. They are only valid inside a bracket expression, for example [[:digit:]] or [^[:space:]].

[:alnum:] Alphanumeric characters

[:alpha:] Alphabetic characters

[:blank:] Space and tab characters

[:cntrl:] Control characters

[:digit:] Numeric characters

[:graph:] Printable and visible (non-space) characters

[:lower:] Lowercase characters

[:print:] Printable characters (includes the space character)

[:punct:] Punctuation characters

[:space:] Whitespace characters

[:upper:] Uppercase characters

[:xdigit:] Hexadecimal digits
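Character classes in use on made-up input, assuming a POSIX sed; note the double brackets, since the class name goes inside a bracket expression:

```shell
# Replace every digit with #
digits=$(echo 'Room 101, floor 3' | sed 's/[[:digit:]]/#/g')
# Squeeze each run of spaces and tabs into a single space
spaces=$(printf 'a\t b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')
# Remove all uppercase letters
upper=$(echo 'Mixed Case' | sed 's/[[:upper:]]//g')
```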

Page 14: Sed Regular Expression

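The '^' character means the beginning of the line. Example: sed 's/^Thu /Thursday /' filename

will turn "Thu " into "Thursday ", but only at the beginning of the line (the space is repeated in the replacement so the next word does not run into "Thursday").

Example: sed -e '/^#/d' filename deletes every line that starts with #, such as comment lines.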
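Both ^ examples run on made-up input; the indented comment line survives because ^# requires the # to be in the first column:

```shell
# Only the "Thu " at the start of the line is replaced
days=$(echo 'Thu Thu' | sed 's/^Thu /Thursday /')
# Delete lines whose very first character is #
code=$(printf '%s\n' '# a comment' 'echo hi' '  # indented, kept' | sed -e '/^#/d')
```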

Page 15: Sed Regular Expression

Example: /[Uu]nix/!d deletes lines that do not contain the word Unix or unix.

6d deletes line 6

/^$/d deletes all empty lines (lines containing only spaces or tabs are not empty and are kept)

1,10d deletes lines 1 through 10

1,/^$/d deletes from line 1 through the first blank line

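/^$/,$d deletes from the first blank line through the last line of the file (/^$/,/$/d would not do this: /$/ matches every line, so that range ends on the line right after the blank one)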

/^$/,10d deletes from the first blank line through line 10

/^Co*t/,/[0-9]$/d deletes from the first line that begins with Ct, Cot, Coot, Cooot, etc. through the first line that ends with a digit
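The address ranges above tried on a hypothetical five-line text with one blank line, including the /^$/,/$/d pitfall:

```shell
# Hypothetical input: two short paragraphs separated by a blank line
text=$(printf '%s\n' 'para one' 'still one' '' 'para two' 'still two')
# Delete from the first blank line through the last line
head_part=$(printf '%s\n' "$text" | sed '/^$/,$d')
# Delete from line 1 through the first blank line
tail_part=$(printf '%s\n' "$text" | sed '1,/^$/d')
# /$/ matches every line, so this range covers only the blank line and the next one
pitfall=$(printf '%s\n' "$text" | sed '/^$/,/$/d')
```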

Page 16: Sed Regular Expression

`[a-zA-Z0-9]'

This matches any letters or digits.

`[^a-z A-Z]' This matches any character that is not a letter or a space.
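A negated bracket expression that deletes everything except letters and spaces. LC_ALL=C is set on the assumption that a-z and A-Z should mean ASCII letters only, which is guaranteed in the C locale:

```shell
export LC_ALL=C
# Each character that is not a letter or a space is removed
letters=$(echo 'R2-D2 and C-3PO!' | sed 's/[^a-z A-Z]//g')
```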

Page 17: Sed Regular Expression

Repetition using *

* means 0 or more of the previous single character pattern.

[abc]* matches "aaaaa" or "acbca"

Hi Dave.* matches "Hi Dave" or "Hi Daveisgoofy"

0*10 matches "010" or "0000010" or "10"

Page 18: Sed Regular Expression

Repetition using +

+ means 1 or more of the previous single character pattern. In sed's default basic syntax, write it as \+ (a GNU extension) or use sed -E; a bare + is an ordinary character in basic regular expressions.

[abc]+ matches "aaaaa" or "acbca"

Hi Dave.+ matches "Hi Dave." or "Hi Dave...."

0+10 matches "010" or "0000010" but does not match "10"

a\+b\+ matches one or more `a's followed by one or more `b's: `ab' is the shortest possible match, but other examples are `aaaab' or `abbbbb' or `aaaaaabbbbbbb'.
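The three spellings of "one or more" compared, assuming GNU sed (\+ is a GNU extension; -E is also accepted by BSD sed):

```shell
# Basic syntax: a bare + is an ordinary character
literal=$(echo 'a+b aab' | sed 's/a+b/X/')
# GNU basic syntax: \+ is the operator
gnu=$(echo 'xaaab' | sed 's/a\+b/Y/')
# Extended syntax (-E): the bare + is the operator
ere=$(echo 'xaaab' | sed -E 's/a+b/Y/')
# 0\+10 needs at least one 0 before "10", so a plain "10" is not printed
nomatch=$(echo '10' | sed -n '/0\+10/p')
```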

Page 19: Sed Regular Expression

? Repetition Operator

? means 0 or 1 of the previous single character pattern (write it as \? in GNU sed's basic syntax, or as ? with sed -E).

x[abc]?x matches "xax" or "xx"

A[0-9]?B matches "A1B" or "AB" but does not match "a1b" or "A123B"
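The ? examples above checked with GNU sed's \? and with -E (the test strings come from the slide, plus a made-up non-match xdx):

```shell
# GNU basic syntax: \? makes the preceding item optional
opt=$(printf '%s\n' A1B AB a1b A123B | sed -n '/A[0-9]\?B/p')
# Extended syntax (-E): the bare ? is the operator
ere=$(printf '%s\n' xax xx xdx | sed -n -E '/x[abc]?x/p')
```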

Page 20: Sed Regular Expression

`.\{9\}A$'

This matches an A that is the last character on the line, with at least nine preceding characters.

`^.\{15\}A'

This matches an A that is the 16th character on a line.

Page 21: Sed Regular Expression

sed G myfile.txt > newfile.txt

In the above example, sed's G command double-spaces the file myfile.txt and the result is written to newfile.txt.

sed = myfile.txt | sed 'N;s/\n/. /'

The above example prints each line of myfile.txt preceded by its line number, a period and a space: the first sed prints each line number on a line of its own, and the second joins each number with the line that follows it. As with the first example, the output can be redirected to another file using > and a file name.
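Both one-liners run on a two-line piped stand-in for myfile.txt:

```shell
# G appends a newline plus the (empty) hold space after every line
doubled=$(printf '%s\n' one two | sed G)
# = prints each line number on its own line; the second sed joins each pair
numbered=$(printf '%s\n' alpha beta | sed = | sed 'N;s/\n/. /')
```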

Page 22: Sed Regular Expression

sed 's/test/example/g' myfile.txt > newfile.txt

Opens the file myfile.txt, replaces every occurrence of the string "test" with "example" (including occurrences inside longer words such as "testing"), and writes the result to newfile.txt; myfile.txt itself is not changed.

sed -n '$=' myfile.txt

This command counts the number of lines in myfile.txt and outputs the result.
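Both commands on piped input instead of myfile.txt, showing that the substitution also rewrites "test" inside longer words:

```shell
# Every occurrence of the string is replaced, even inside other words
out=$(echo 'test the testing tests' | sed 's/test/example/g')
# $= prints the number of the last line, i.e. the line count
count=$(printf '%s\n' a b c | sed -n '$=')
```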

Page 23: Sed Regular Expression

Regular Expressions (cont…)

/^M.*/                  Line begins with capital M, 0 or more chars follow
/..*/                   At least 1 character long (/.+/ means the same thing with sed -E)
/^$/                    The empty line
ab|cd                   Either 'ab' or 'cd'
a(b*|c*)d               Matches any string beginning with a letter a, followed by either zero or more of the letter b, or zero or more of the letter c, followed by the letter d
[[:space:][:alnum:]]    Matches any character that is either a white space character or alphanumeric

The | and ( ) forms above are extended syntax (sed -E); in GNU sed's basic syntax they are written ab\|cd and a\(b*\|c*\)d.

Note:

sed always tries to find the longest match, starting at the leftmost position where any match is possible. How would you match a single tag in an HTML document?
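One possible answer to the question, sketched on a made-up line: the greedy .* runs on to the last > on the line, while a negated class [^>]* cannot cross a > and so stops at the end of each tag.

```shell
# Greedy: one match from the first < to the last >
greedy=$(echo '<b>bold</b>' | sed 's/<.*>/[tag]/')
# Negated class: each match covers exactly one tag
tags=$(echo '<b>bold</b>' | sed 's/<[^>]*>/[tag]/g')
```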

Page 26: Sed Regular Expression

Grouping with parens

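• If you put a subpattern inside parentheses, you can apply +, * and ? to the entire subpattern (in sed's basic syntax the parentheses are written \( \); with sed -E they are written plainly).

a(bc)*d matches "ad" and "abcbcd" but does not match "abcxd" or "bcbcd"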
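The grouping example in basic syntax (\( \)), run against the four strings from the slide:

```shell
# Basic-syntax spelling of a(bc)*d
matches=$(printf '%s\n' ad abcbcd abcxd bcbcd | sed -n '/a\(bc\)*d/p')
```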

Page 32: Sed Regular Expression

9. Append three exclamation points to the end of each line in u3 that contains student.
10. Repeat the previous command, but only output the lines that you change.
11. If you wanted to actually change the original file for questions #3, 4, 6, 7, and 9, how would you do it?

9. sed '/student/s/$/!!!/' u3
10. sed -n '/student/s/$/!!!/p' u3
11. Save the output of the sed command in a temporary file and then use the mv command to rename it to the original. Never redirect output to the same file you are using for input within the same command or pipeline: the shell truncates the output file before sed even starts reading. Example (#9):
sed '/student/s/$/!!!/' u3 > xxx
mv xxx u3
(GNU sed can also edit the file in place with sed -i.)
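A sketch of answers 9-11 on a made-up stand-in for u3 (the real file is not shown), using mktemp for the temporary file instead of the fixed name xxx:

```shell
# Hypothetical u3 contents
u3=$(mktemp)
printf '%s\n' 'student one' 'teacher' 'student two' > "$u3"

# 11: write to a temporary file first, then rename it over the original
tmp=$(mktemp)
sed '/student/s/$/!!!/' "$u3" > "$tmp" && mv "$tmp" "$u3"
result=$(cat "$u3")

# 10: -n plus the p flag prints only the lines that were changed
changed=$(printf '%s\n' 'student one' 'teacher' | sed -n '/student/s/$/!!!/p')
rm -f "$u3"
```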

6. Change all occurrences of cow to "cows and cows" using the parenthesis operators and \1 substitution Answer: sed 's/\(cow\)/\1s and \1s/g' u3

Page 33: Sed Regular Expression

Using the file u3, do the following using sed, displaying the result on the screen

1. Output only the lines that contain MCIS Answer: sed -n '/MCIS/p' u3

2. Delete any line that contains mcis Answer: sed '/mcis/d' u3

3. Change the first instance of * on each line to ! Answer: sed 's/\*/!/' u3

4. Change all occurrences of * on each line to ! Answer: sed 's/\*/!/g' u3

Page 34: Sed Regular Expression

5. Output only the lines that contain either MCIS or VLSI

Answer: sed -n -E '/MCIS|VLSI/p' u3

6. Output the file after changing mcis to MCIS on lines 10-20

Answer: sed '10,20s/mcis/MCIS/g' u3

7. Output the entire file except lines 1-20

Answer: sed '1,20d' u3

8. Delete any lines containing the string "news"

Answer: sed '/news/d' u3

Page 35: Sed Regular Expression

Given a file named file containing:

line 1 (one)
line 2 (two)
line 3 (three)

Command: sed -e '1,2s/line/LINE/' file

Output:
LINE 1 (one)
LINE 2 (two)
line 3 (three)

Page 36: Sed Regular Expression

9. Command: sed -e '1,2d' file

Output: line 3 (three)

10. Command: sed -e '3d' file

Output:
line 1 (one)
line 2 (two)

Page 37: Sed Regular Expression

11. Write a shell script that takes two words and a file name as input from the user. Let the inputs be word1, word2, and filename. Write scripts to do the following:

To insert word2 at every place word1 is present in the file "u3"

Answer:
#!/bin/sh
# printf is used instead of echo -n, which is not portable in /bin/sh
printf 'Enter the word to search for (word1): '
read -r string1
printf 'Enter the word to insert (word2): '
read -r string2
printf 'Enter the filename: '
read -r filename
# i inserts a new line containing word2 before every line that contains word1
# (the one-line "i text" form is a GNU sed extension)
sed "/$string1/i $string2" "$filename"
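If the task is read as putting word2 directly in front of each occurrence of word1 on the same line, rather than inserting a separate line, the s command with & is one alternative. This sketch hard-codes hypothetical example words (in the script above they come from read); words containing / or regex metacharacters would need escaping first.

```shell
# Hypothetical inputs standing in for the values read from the user
string1='cow'
string2='brown '
# & is the matched word1, so word2 is placed in front of every occurrence
result=$(echo 'the cow met another cow' | sed "s/$string1/$string2&/g")
```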

Page 45: Sed Regular Expression

Print only lines of 65 characters or longer:

sed -n '/^.\{65\}/p'

Print only lines of less than 65 characters:

sed -n '/^.\{65\}/!p' # method 1, corresponds to above
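Both length filters on a generated 70-character line and a short made-up line:

```shell
# printf pads 0 with zeros to a width of 70 characters
long=$(printf '%070d' 0)
lines=$(printf '%s\n' "$long" 'abc')
# Lines with at least 65 characters
long_only=$(printf '%s\n' "$lines" | sed -n '/^.\{65\}/p')
# Lines with fewer than 65 characters
short_only=$(printf '%s\n' "$lines" | sed -n '/^.\{65\}/!p')
```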

Page 46: Sed Regular Expression

Print line number 52

sed -n '52p' # method 1

sed '52!d' # method 2

Page 47: Sed Regular Expression

Print the section of a file between two regular expressions (both boundary lines included):

sed -n '/Iowa/,/Montana/p' # case sensitive

Print all of the file EXCEPT the section between two regular expressions:

sed '/Iowa/,/Montana/d'
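The two range commands on a made-up list of state names:

```shell
states=$(printf '%s\n' Alaska Iowa Kansas Montana Texas)
# Print from the Iowa line through the Montana line
between=$(printf '%s\n' "$states" | sed -n '/Iowa/,/Montana/p')
# Delete the same section and print the rest
outside=$(printf '%s\n' "$states" | sed '/Iowa/,/Montana/d')
```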

Page 48: Sed Regular Expression

The q or quit command

There is one more simple command that can restrict the changes to a set of lines: the "q" (quit) command.

Another way to duplicate the head command is: sed '10 q'

which prints line 10 and then quits. (q still prints the current line before exiting, so sed '11 q' would print 11 lines, one more than head's default of 10.)

This command is most useful when you wish to abort the editing after some condition is reached.

The "q" command accepts at most one address; it cannot take a range of addresses.
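Counting output lines shows why 10 q, not 11 q, matches head's default of 10 lines (seq is assumed to be available, as it is on Linux and BSD systems):

```shell
# q prints the current line before quitting
ten=$(seq 1 30 | sed '10 q' | wc -l)
eleven=$(seq 1 30 | sed '11 q' | wc -l)
```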

Page 49: Sed Regular Expression

Relationships between d, p, and !

As you may have noticed, there are often several ways to solve the same problem with sed. This is because print and delete are opposite functions, and it appears that "!p" is similar to "d", while "!d" is similar to "p". I wanted to test this, so I created a 20-line file and tried every different combination. The following table, which shows the results, demonstrates the difference:

Relations between d, p, and !

Sed      Range   Command   Results
--------------------------------------------------------
sed -n   1,10    p         Print first 10 lines
sed -n   11,$    !p        Print first 10 lines
sed      1,10    !d        Print first 10 lines
sed      11,$    d         Print first 10 lines
--------------------------------------------------------
sed -n   1,10    !p        Print last 10 lines
sed -n   11,$    p         Print last 10 lines
sed      1,10    d         Print last 10 lines
sed      11,$    !d        Print last 10 lines
--------------------------------------------------------
sed -n   1,10    d         Nothing printed
sed -n   1,10    !d        Nothing printed
sed -n   11,$    d         Nothing printed
sed -n   11,$    !d        Nothing printed
--------------------------------------------------------
sed      1,10    p         Print first 10 lines twice, then next 10 lines once
sed      11,$    !p        Print first 10 lines twice, then last 10 lines once
--------------------------------------------------------
sed      1,10    !p        Print first 10 lines once, then last 10 lines twice
sed      11,$    p         Print first 10 lines once, then last 10 lines twice

At a shell prompt, quote the range and command together, e.g. sed -n '11,$ !p', so that the shell does not interpret $ or !.
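A few rows of the table reproduced with a 20-line file generated by seq (assumed available); each comment names the row being checked:

```shell
f=$(mktemp)
seq 1 20 > "$f"
first10=$(sed -n '1,10p' "$f" | wc -l)      # sed -n 1,10 p: first 10 lines
kept_first=$(sed '11,$!d' "$f" | head -n 1) # sed 11,$ !d: last 10 lines, starting at 11
none=$(sed -n '1,10d' "$f" | wc -l)         # sed -n 1,10 d: nothing printed
twice=$(sed '1,10p' "$f" | wc -l)           # sed 1,10 p: 10 lines twice + 10 once = 30
rm -f "$f"
```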

Page 52: Sed Regular Expression

Obviously the command

sed '1,10 q'

cannot quit 10 times. Instead

sed '1 q' or sed '10 q'

is correct.

Page 53: Sed Regular Expression

1. Delete lines that contain "O" at the beginning of the line.

Answer: sed '/^O/d' list.txt

2. Translate capital C, R, O into lowercase c, r, o

Answer: sed 'y/CRO/cro/' list.txt

3. Delete empty lines

Answer: sed '/^$/d' list.txt

4. Remove lines containing anything other than letters, numbers, or spaces

Answer: sed '/[^0-9a-zA-Z ]/d' list.txt
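Exercises 1-4 run on a made-up stand-in for list.txt (LC_ALL=C keeps the a-z and A-Z ranges ASCII-only):

```shell
export LC_ALL=C
list=$(printf '%s\n' 'Oracle DB' 'CRON job' '' 'tab-separated!' 'plain text 42')
no_O=$(printf '%s\n' "$list" | sed '/^O/d')               # 1. lines starting with O
lowered=$(echo 'CRON' | sed 'y/CRO/cro/')                 # 2. N is not in the list, so it stays
no_empty=$(printf '%s\n' "$list" | sed '/^$/d')           # 3. empty lines
clean=$(printf '%s\n' "$list" | sed '/[^0-9a-zA-Z ]/d')   # 4. lines with any other character
```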

Page 54: Sed Regular Expression

Specifying a Range of Characters with [...]

If you want to match specific characters,

you can use the square brackets to identify the exact characters you are searching for.

The pattern that will match any line of text that consists of exactly one digit is

^[0123456789]$

This is verbose.

You can use the hyphen between two characters to specify a range: ^[0-9]$

Page 55: Sed Regular Expression

You can intermix explicit characters with character ranges.

This pattern will match a single character that is a letter, number, or underscore:

[A-Za-z0-9_]

Page 56: Sed Regular Expression

If you wanted to search for a word that:

started with a capital letter "T",
was the first word on a line,
had a lowercase second letter, and
had a vowel as its third letter,

the regular expression would be "^T[a-z][aeiou]".
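The pattern tried against a few made-up lines; only lines whose first word starts with T, a lowercase letter and then a vowel are printed:

```shell
export LC_ALL=C
found=$(printf '%s\n' 'Tea time' 'Tin can' 'The end' 'tea' 'Stop' | sed -n '/^T[a-z][aeiou]/p')
```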

Page 57: Sed Regular Expression

Delete all lines NOT beginning with a, e, E or I

Answer: sed '/^[^aeEI]/d' list.txt

(Empty lines are not deleted, because they have no first character to test.)

You can easily search for all characters except those in square brackets by putting a "^" as the first character after the "[".

To match all characters except vowels, use "[^aeiou]".
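The answer run on made-up input; note that the empty line is kept:

```shell
kept=$(printf '%s\n' apple '' Echo orange Ice banana egg | sed '/^[^aeEI]/d')
```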

Page 62: Sed Regular Expression

Let's look at another example of *: /a*bc[e-g]*[0-9]*/ matches (part of) each of these strings:

aaaaabcfgh19919234
bc
abcefg123456789
abc45
Aabcggg87310

d*avid will match avid, david, ddavid, dddavid and any other word with repeated d's followed by avid
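Wrapping the matched text in brackets shows how much of each string /a*bc[e-g]*[0-9]*/ actually matches (the first match stops at h, and in the last string the match starts at the lowercase a):

```shell
marked=$(printf '%s\n' aaaaabcfgh19919234 bc abcefg123456789 abc45 Aabcggg87310 |
  sed 's/a*bc[e-g]*[0-9]*/[&]/')
```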

Page 63: Sed Regular Expression

Compress all consecutive sequences of zeroes into a single zero.

Answer: s/00*/0/g
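The answer as a complete command on a made-up number string; 00* is the portable way to say "one or more zeroes":

```shell
squeezed=$(echo '100020000305' | sed 's/00*/0/g')
```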

Page 66: Sed Regular Expression

`a\?b' matches `b' or `ab' (\? is GNU sed's basic-syntax spelling of the ? operator).

Page 68: Sed Regular Expression

Match any character with .

The character "." is one of those special metacharacters.

By itself it will match any character. Input lines are read without their trailing newline, so "." never has an end-of-line character to match (in GNU sed it does match a newline that N has added to the pattern space).

The pattern that will match a line with a single character is ^.$

• Any character (except a metacharacter!) matches itself.
• The "." character matches any single character.
• "F." matches an 'F' followed by any character.
• "a.b" matches 'a' followed by any one character followed by 'b'.

Page 70: Sed Regular Expression

If you really want to match '.', you can use "\."

Regex     Matches    Does not match
a\.b      a.b        axb
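The escaped and unescaped dot, plus ^.$, on made-up lines:

```shell
escaped=$(printf '%s\n' a.b axb | sed -n '/a\.b/p')   # only the literal dot
any=$(printf '%s\n' a.b axb | sed -n '/a.b/p')        # any middle character
single=$(printf '%s\n' x xy '' | sed -n '/^.$/p')     # lines of exactly one character
```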

Page 71: Sed Regular Expression

Matching a specified number of the pattern using the curly brackets {}

Using \{n\}, we match exactly n occurrences of the previous expression (with sed -E the braces are written without backslashes, as {n}).

If we want to match 'aaaa' then we could use: a\{4\} This would match exactly four a's.

If we want to match the pattern 1999 in our file bazaar.txt, then we would do: sed -n '/19\{3\}/p' bazaar.txt This prints all lines containing the pattern 1999 in the bazaar.txt file (without -n, sed would print every line, and the matching lines twice).

Page 72: Sed Regular Expression

The following expression would match a minimum of four a's but a maximum of 10 a's in a particular pattern: a\{4,10\} Let's say we wanted to match any character a minimum of 3 times, but a maximum of 7 times; then we could use a regular expression like: .\{3,7\}
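The interval examples checked with sed -n on made-up values; the second pattern is anchored so that only whole lines of 4 to 10 a's qualify:

```shell
# "1" followed by exactly three "9"s, anywhere on the line
years=$(printf '%s\n' 1999 1899 19999 | sed -n '/19\{3\}/p')
# Whole line of 4 to 10 a's
ranged=$(printf '%s\n' aaa aaaa aaaaaaaaaaaa | sed -n '/^a\{4,10\}$/p')
```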

Page 73: Sed Regular Expression

`\{I\}' As `*', but matches exactly I sequences (I is a decimal integer; for portability, keep it between 0 and 255 inclusive). `\{I,J\}' Matches between I and J, inclusive, sequences. `\{I,\}' Matches more than or equal to I sequences.

Page 74: Sed Regular Expression

`.\{9\}A$'

This matches nine characters followed by an `A' at the end of the line. `^.\{15\}A'

This matches the start of a string that contains 16 characters, the last of which is an `A'.

Page 75: Sed Regular Expression

`\(REGEXP\)'

Groups the inner REGEXP as a whole. This is used to apply postfix operators, like `\(abcd\)*': this will search for zero or more whole sequences of `abcd', while `abcd*' would search for `abc' followed by zero or more occurrences of `d'. Note that support for `\(abcd\)*' is required by POSIX 1003.1-2001, but many non-GNU implementations do not support it and hence it is not universally portable.

Page 76: Sed Regular Expression

`REGEXP1\|REGEXP2'

Matches either REGEXP1 or REGEXP2 (a GNU extension; with sed -E write REGEXP1|REGEXP2).

Use parentheses to group complex alternatives, e.g. `a\(b\|c\)d'.

The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used.
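Alternation in GNU basic syntax and with -E, on made-up words; the last line repeats the a(b*|c*)d example from earlier in the deck:

```shell
# GNU basic syntax: \| separates alternatives
both=$(printf '%s\n' cow calf horse | sed -n '/cow\|calf/p')
# Extended syntax: plain |
both_ere=$(printf '%s\n' cow calf horse | sed -n -E '/cow|calf/p')
# Parentheses group the alternatives; abcd fails because b and c are mixed
grouped=$(printf '%s\n' ad abbd accd abcd | sed -n -E '/^a(b*|c*)d$/p')
```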

Page 77: Sed Regular Expression

`N'

Add a newline to the pattern space, then append the next line of input to the pattern space.

If there is no more input, sed exits without processing any more commands (GNU sed prints the pattern space before exiting, unless -n is given).
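A common use of N is joining pairs of lines; this sketch assumes the made-up input always has an even number of lines:

```shell
# N appends the next line after a newline; s then replaces that newline with =
joined=$(printf '%s\n' key1 value1 key2 value2 | sed 'N;s/\n/=/')
```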

Page 78: Sed Regular Expression

File spacing:

Double-space a file:

sed G filename

Insert a blank line below every line which matches "regex":

sed '/regex/G'

Count lines (emulates "wc -l"):

sed -n '$='
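The regex form of G on made-up log lines: only the matching lines get a blank line after them.

```shell
marked=$(printf '%s\n' 'ERROR disk' 'ok' 'ERROR net' | sed '/ERROR/G')
```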