
Page 1

Regular Expressions

More Patterns

Copyright © Software Carpentry 2010

This work is licensed under the Creative Commons Attribution License

See http://software-carpentry.org/license.html for more information.

Page 2

Notebook #1

Site      Date         Evil (millivaders)
----      ----         ------------------
Baker 1   2009-11-17   1223.0
Baker 1   2010-06-24   1122.7
Baker 2   2009-07-24   2819.0
Baker 2   2010-08-25   2971.6
Baker 1   2011-01-05   1410.0
Baker 2   2010-09-04   4671.6
⋮         ⋮            ⋮

Page 3

Notebook #2

Site/Date/Evil
Davison/May 22, 2010/1721.3
Davison/May 23, 2010/1724.7
Pertwee/May 24, 2010/2103.8
Davison/June 19, 2010/1731.9
Davison/July 6, 2010/2010.7
Pertwee/Aug 4, 2010/1731.3
Pertwee/Sept 3, 2010/4981.0
⋮

Page 4

    import re

    # Get date from record.
    def get_date(record):
        '''Return (Y, M, D) as strings, or None.'''

        # 2010-01-01
        m = re.search('([0-9]{4})-([0-9]{2})-([0-9]{2})',
                      record)
        if m:
            return m.group(1), m.group(2), m.group(3)

        # Jan 1, 2010 (comma optional, day may be 1 or 2 digits)
        m = re.search('/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/',
                      record)
        if m:
            return m.group(3), m.group(1), m.group(2)

        return None

Page 5

    import re

    # Get all fields from record.
    def get_fields(record):
        '''Return (Y, M, D, site, reading) or None.'''

        # Each entry: a pattern, then the group numbers that hold
        # year, month, day, site, and reading, in that order.
        patterns = (
            ('(.+)\t([0-9]{4})-([0-9]{2})-([0-9]{2})\t(.+)',
             2, 3, 4, 1, 5),
            ('(.+)/([A-Z][a-z]+) ([0-9]{1,2}),? ([0-9]{4})/(.+)',
             4, 2, 3, 1, 5)
        )

        # The match object gets its own name so that it does not
        # overwrite the month index m.
        for p, y, m, d, s, r in patterns:
            match = re.search(p, record)
            if match:
                return (match.group(y), match.group(m), match.group(d),
                        match.group(s), match.group(r))

        return None

Pages 12-15

Notebook #3

Date Site Evil(mvad)
May 29 2010 (Hartnell) 1029.3
May 30 2010 (Hartnell) 1119.2
June 1 2010 (Hartnell) 1319.4
May 29 2010 (Troughton) 1419.3
May 30 2010 (Troughton) 1420.0
June 1 2010 (Troughton) 1419.8
⋮

Fields are separated by spaces

But how to match parentheses?

The '()' in '(.+)' don't actually match characters

Pages 16-19

Put '\(' and '\)' in a regular expression to match the parenthesis characters '(' and ')'

Another escape sequence, like '\t' in a string for tab

In order to get the '\' in the string, must write '\\'

So the strings representing the regular expressions that match parentheses are '\\(' and '\\)'

Pages 20-22

    # find '()' in text
    m = re.search('\\(\\)', text)

    '\\(\\)'    program text
    \(\)        Python string
    ()          finite state machine
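The three layers above (program text, Python string, finite state machine) can be checked directly. The sketch below is not from the slides: the sample text is made up, and the raw string literal r'...' is a standard Python feature that the deck does not cover, shown here only as a possible alternative to doubling each backslash.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "call f() then g(x)"

# Doubled backslashes in an ordinary string literal, as on the slide.
doubled = '\\(\\)'
# A raw string literal keeps each backslash exactly as written.
raw = r'\(\)'

# Both literals produce the same four-character Python string: \(\)
same_pattern = (doubled == raw)
pattern_length = len(doubled)

# The regular expression then matches the two characters '()'.
m = re.search(doubled, text)
matched_text = m.group(0) if m else None
matched_at = m.start() if m else None
```

Here the pattern finds the empty call parentheses after f and ignores g(x), because the two escaped parentheses must be adjacent.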

Pages 23-27

Notebook #3

Date Site Evil(mvad)
May 29 2010 (Hartnell) 1029.3
May 30 2010 (Hartnell) 1119.2
June 1 2010 (Hartnell) 1319.4
May 29 2010 (Troughton) 1419.3
May 30 2010 (Troughton) 1420.0
June 1 2010 (Troughton) 1419.8
⋮

'([A-Z][a-z]+) ([0-9]{1,2}) ([0-9]{4}) \\((.+)\\) (.+)'

'\\(' and '\\)' match actual '(' and ')'

'(' and ')' create a group and save the match
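As a rough check of the Notebook #3 pattern, the sketch below applies it to one record from the table. The function name parse_notebook3 and the constant NOTEBOOK3 are hypothetical; the output order (Y, M, D, site, reading) is borrowed from the get_fields docstring earlier in the deck.

```python
import re

# Pattern from the slide, written with doubled backslashes so that
# the regular expression receives \( and \) for the literal parentheses.
NOTEBOOK3 = '([A-Z][a-z]+) ([0-9]{1,2}) ([0-9]{4}) \\((.+)\\) (.+)'

def parse_notebook3(record):
    '''Return (Y, M, D, site, reading) as strings, or None.'''
    m = re.search(NOTEBOOK3, record)
    if m:
        # Groups: 1 month, 2 day, 3 year, 4 site, 5 reading.
        return m.group(3), m.group(1), m.group(2), m.group(4), m.group(5)
    return None

fields = parse_notebook3('May 29 2010 (Hartnell) 1029.3')
# A Notebook #2 record has a comma after the day, so it does not match.
no_match = parse_notebook3('Davison/May 22, 2010/1721.3')
```

The escaped parentheses are consumed by the match but not saved; only the unescaped pair around '.+' captures the site name.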

Pages 28-33

Some character sets come up frequently enough to be given abbreviations

\d    digits             '0', '4', '9'
\s    white space        ' ', '\t', '\r', '\n'
\w    word characters    '[A-Za-z0-9_]'
      actually the set of characters that can appear in a variable name in C (or Python)

Need to double up the '\' when writing as string
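One way to see the abbreviations at work is to rewrite the Notebook #3 pattern with them. This is only a sketch: the rewritten pattern is not from the slides, and the re.ASCII flag is an assumption added so that '\d' and '\w' stay limited to the ASCII sets listed above (in Python 3 they otherwise also accept other Unicode digits and letters).

```python
import re

# Notebook #3 pattern rewritten with \d, \s and \w. Every backslash is
# doubled because the pattern is an ordinary (non-raw) string literal.
pattern = '(\\w+)\\s(\\d{1,2})\\s(\\d{4})\\s\\((\\w+)\\)\\s([\\d.]+)'

# re.ASCII restricts \d and \w to [0-9] and [A-Za-z0-9_].
m = re.search(pattern, 'May 29 2010 (Hartnell) 1029.3', re.ASCII)
groups = m.groups() if m else None
```

The abbreviated pattern is shorter but also looser than the original: '\w+' would accept a month written as 'may' or '05', which '[A-Z][a-z]+' would reject.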

Pages 34-40

And now for an example of really bad design

\S    non-space characters    (that's an upper-case 'S')
\W    non-word characters     (that's an upper-case 'W')

Very easy to mis-type

Even easier to mis-read
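A small comparison may make the upper-case and lower-case pairs easier to tell apart. The sketch below runs each one over a Notebook #3 record; the variable names are illustrative only, and re.split and re.findall are standard functions in Python's re module that the slides do not otherwise use.

```python
import re

record = 'May 29 2010 (Hartnell) 1029.3'

# Lower-case \s: split the record wherever white space occurs.
split_on_space = re.split('\\s+', record)
# Upper-case \S: collect the runs of non-space characters instead.
non_space_runs = re.findall('\\S+', record)

# Lower-case \w: collect the runs of word characters.
word_runs = re.findall('\\w+', record)
# Upper-case \W: collect the runs of everything else.
non_word_runs = re.findall('\\W+', record)
```

Changing the case of a single letter flips the result from the fields themselves to the separators between them, which is why a mis-typed 'S' or 'W' can be hard to spot.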

Pages 41-47

Can also match things that aren't actual characters

^    At the start of a pattern, matches the beginning of the input text

     re.search('^mask', 'mask size') => match
     re.search('^mask', 'unmask')    => None

$    At the end of a pattern, matches the end of the input text

     re.search('temp$', 'high-temp')   => match
     re.search('temp$', 'temperature') => None
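Using '^' and '$' together is one way to require that a pattern account for the whole input rather than some part of it. The sketch below assumes a hypothetical helper, is_iso_date, built around the date pattern from get_date.

```python
import re

def is_iso_date(text):
    '''Return True if the whole of text looks like 2010-01-01.'''
    # ^ pins the match to the start of the text and $ to the end.
    # Note: in Python, $ also matches just before a trailing newline.
    return re.search('^[0-9]{4}-[0-9]{2}-[0-9]{2}$', text) is not None

bare_date = is_iso_date('2009-11-17')
# A full Notebook #1 record contains a date but is not only a date.
whole_record = is_iso_date('Baker 1\t2009-11-17\t1223.0')
# Without the anchors the same pattern finds the date inside the record.
unanchored = re.search('[0-9]{4}-[0-9]{2}-[0-9]{2}',
                       'Baker 1\t2009-11-17\t1223.0') is not None
```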

Pages 48-50

\b    Boundary between word and non-word characters

      re.search('\\bage\\b', 'the age of') => match
      re.search('\\bage\\b', 'phage')      => None
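The doubled backslash matters more for '\b' than for most escapes, because in an ordinary Python string literal '\b' already means the backspace character. The sketch below, with illustrative variable names, shows what happens when the backslash is not doubled.

```python
import re

# In an ordinary string literal, '\b' is the single backspace character,
# while '\\b' is a backslash followed by the letter b.
backspace = '\b'
boundary = '\\b'

# Doubled backslashes: the regular expression sees \b (word boundary).
whole_word = re.search('\\bage\\b', 'the age of')
inside_word = re.search('\\bage\\b', 'phage')

# Undoubled: the pattern asks for backspace, 'age', backspace,
# which does not occur in the text.
undoubled = re.search('\bage\b', 'the age of')
```

The undoubled version raises no error; it quietly fails to match, so this mistake is easy to overlook.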

Page 51

created by

Greg Wilson

June 2010
