Regular Expressions BKF03
Brian Ciccolo
Agenda
• Definition
• Uses – within Aspen and beyond
• Matching
• Replacing
What’s a Regular Expression?
In computing, regular expressions, also referred to as regex
or regexp, provide a concise and flexible means for
matching strings of text, such as particular characters,
words, or patterns of characters. A regular expression is
written in a formal language that can be interpreted by a
regular expression processor, a program that either serves
as a parser generator or examines text and identifies parts
that match the provided specification.
http://en.wikipedia.org/wiki/Regular_expression
Why Use a Regex?
• Validate data entry
Example: Verify the format of a date field is mm/dd/yyyy
• Find/replace on steroids
Example: Reformat phone numbers to (###) ###-####
Regex Use in Aspen
• Data validation
o Date, time field input
o Validation rules (new in 3.0 – see session TEC07)
• Find/replace on steroids
o System Log filter
o Field formatting
RegEx Examples Using Notepad++
• Select the proper Search Mode
• Select the "Regular expression" option for our examples
Matching – The Basics
• Literals - plain old text
• Classes
Example      Definition
[abc]        a, b, or c
[a-z]        Any lowercase letter
[a-zA-Z]     Any lowercase or uppercase letter
[0-9]        Any digit, 0 through 9
[^a-zA-Z]    Not a letter (could be a digit or punctuation)
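To see the classes in action outside a text editor, here is a small sketch using Python's re module. The sample strings are invented for the example; re.findall returns every non-overlapping match as a list.

```python
import re

sample = "Room B12, floor 3!"

one_of_abc = re.findall(r"[abc]", "cabbage")    # ['c', 'a', 'b', 'b', 'a']
digits = re.findall(r"[0-9]", sample)           # ['1', '2', '3']
lowercase = re.findall(r"[a-z]", sample)        # lowercase letters only
non_letters = re.findall(r"[^a-zA-Z]", sample)  # spaces, digits, punctuation

print(one_of_abc, digits, lowercase, non_letters)
```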
Matching – Predefined Classes
Predefined Class   Definition
.                  Any character (in most engines, any character except a newline)
\d                 Any digit: [0-9]
\D                 Any non-digit: [^0-9]
\s                 A whitespace character (space, tab, newline)
\S                 A non-whitespace character: [^\s]
\w                 A word character: [a-zA-Z_0-9]
\W                 A non-word character (e.g., punctuation or whitespace): [^\w]
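A quick way to get a feel for the predefined classes is to run each one over the same string. The sketch below uses Python's re module and an invented sample value; note that \W picks up the space as well as the punctuation.

```python
import re

sample = "Tel: 555-1234"

digit_runs = re.findall(r"\d+", sample)      # ['555', '1234']
non_digit_runs = re.findall(r"\D+", sample)  # ['Tel: ', '-']
whitespace = re.findall(r"\s", sample)       # [' ']
words = re.findall(r"\w+", sample)           # ['Tel', '555', '1234']
non_word = re.findall(r"\W", sample)         # [':', ' ', '-']

print(digit_runs, non_digit_runs, whitespace, words, non_word)
```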
Matching – Quantifiers
Quantifier   Definition
?            Matches 0 or 1 time (not supported by Notepad++)
+            Matches 1 or more times
*            Matches 0 or more times
{n,m}        Matches at least n times but no more than m times (not supported by Notepad++)
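Since two of these quantifiers cannot be tried in Notepad++, here is a short sketch of all four using Python's re module instead. The sample words and numbers are invented for the example.

```python
import re

# ? : the "u" may appear zero or one time
optional_u = [w for w in ("color", "colour", "colouur")
              if re.fullmatch(r"colou?r", w)]

# + : one or more digits in a row
one_or_more = re.findall(r"\d+", "a1b22c333")

# * : an "a" followed by zero or more "b" characters
zero_or_more = re.findall(r"ab*", "a ab abbb")

# {n,m} : at least 2 but no more than 3 digits per match
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

print(optional_u)    # ['color', 'colour']
print(one_or_more)   # ['1', '22', '333']
print(zero_or_more)  # ['a', 'ab', 'abbb']
print(bounded)       # ['22', '333', '444']
```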
Matching – Greedy vs. Lazy
• Quantifiers are “greedy” by default –
they match as many characters as possible
• Sometimes you want to match the fewest
characters possible – enter “lazy” quantifiers
Quantifier Lazy Equivalent*
? ??
+ +?
* *?
* Not supported by Notepad++
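The difference is easiest to see on text with two quoted values. A minimal sketch in Python (the sample string is invented): the greedy pattern runs from the first quote to the last one, while the lazy pattern stops at the nearest closing quote.

```python
import re

text = '"one" and "two"'

greedy = re.findall(r'".*"', text)   # as many characters as possible
lazy = re.findall(r'".*?"', text)    # as few characters as possible

print(greedy)  # ['"one" and "two"']
print(lazy)    # ['"one"', '"two"']
```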
Replacing – Groups
• “Groups” in the regex can be used in the
replacement value
• Delimited with parentheses in the regex
• Identified with \n where n is the nth
group in the original expression
• \0 represents the entire match
(not supported in Notepad++)
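Here is a small sketch of groups in a replacement, written in Python with invented sample values. Python's re.sub accepts the same \1, \2 notation for groups; for the entire match it uses \g<0> rather than \0.

```python
import re

# Two groups: last name and first name. The replacement swaps them.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Ciccolo, Brian")

# The entire match, wrapped in square brackets.
bracketed = re.sub(r"\d+", r"[\g<0>]", "room 12")

print(swapped)    # Brian Ciccolo
print(bracketed)  # room [12]
```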
Reformatting Dates
• Change mm/dd/yyyy to yyyy-mm-dd
• Regex: (\d+)/(\d+)/(\d+)
• Replacement: \3-\1-\2
Step 2 – pad the single digits!
• Regex: -(\d)([-"])
• Replacement: -0\1\2
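The two steps can be sketched in Python as follows, assuming the dates sit inside double quotes (as the `"` in the step 2 regex suggests). The function name to_iso and the sample values are made up for the example. One thing worth knowing: when both the month and the day are single digits, the first match consumes the hyphen that the second match would need, so in Python a single pass pads only the month. The sketch therefore repeats step 2 until nothing changes; other tools may need the replace run a second time for the same reason.

```python
import re

PAD = re.compile(r'-(\d)([-"])')

def to_iso(text):
    # Step 1: reorder mm/dd/yyyy into yyyy-mm-dd using the three groups.
    text = re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\1-\2", text)
    # Step 2: pad single digits, repeating until the text stops changing.
    previous = None
    while previous != text:
        previous = text
        text = PAD.sub(r"-0\1\2", text)
    return text

# A single pass of step 2 leaves the day unpadded.
single_pass = PAD.sub(r"-0\1\2", '"2009-3-5"')

print(single_pass)             # "2009-03-5"
print(to_iso('"3/5/2009"'))    # "2009-03-05"
print(to_iso('"12/25/2009"'))  # "2009-12-25"
```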
Reformatting Phone Numbers (v1)
• Wrap the area code in parentheses
• Regex: "(\d\d\d)-
• Replacement: "(\1) 
  (Note: the replacement ends with a space!)
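The same find/replace, sketched in Python with an invented phone number. The leading double quote in the regex anchors the match to the start of a quoted value, so only the area code is wrapped.

```python
import re

# The replacement string ends with a space after the closing parenthesis.
wrapped = re.sub(r'"(\d\d\d)-', r'"(\1) ', '"617-555-1234"')

print(wrapped)  # "(617) 555-1234"
```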
Reformatting Phone Numbers (v2)
• Strip punctuation (numbers only)
• Regex: \((\d+)\) (\d+)-(\d+)
• Replacement: \1\2\3
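A sketch of the v2 replacement in Python, again with an invented number. The literal parentheses around the area code are escaped as \( and \) so they are not read as a group.

```python
import re

digits_only = re.sub(r"\((\d+)\) (\d+)-(\d+)", r"\1\2\3", "(617) 555-1234")

print(digits_only)  # 6175551234
```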
Reformatting Social Security Numbers
• Format SSN as ###-##-####
• Do it in Aspen!
• Define a record in the Regular Expression
Library table
• Set the regex on the Person ID field in the
Data Dictionary
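The regex and format values entered in Aspen are not reproduced here, so the following is only a sketch of the idea in Python. It assumes the stored value is nine plain digits; the pattern, the replacement, and the name format_ssn are all assumptions made for the example, not Aspen's actual settings.

```python
import re

def format_ssn(raw):
    # Assumed input: exactly nine digits. Anything else is returned unchanged.
    return re.sub(r"^(\d{3})(\d{2})(\d{4})$", r"\1-\2-\3", raw)

print(format_ssn("123456789"))  # 123-45-6789
print(format_ssn("12345"))      # 12345
```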
Define a Regular Expression
Regex and format properties
Update the Data Dictionary
Link to the regex
Verify the Results
Extras
• Wikipedia Entry
  http://en.wikipedia.org/wiki/Regular_expression
• Regular Expressions Cheat Sheet (V2)
  http://www.addedbytes.com/cheat-sheets/regular-expressions-cheat-sheet
• Java regex support
  http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html
• Notepad++ text editor and regex support
  http://notepad-plus.sourceforge.net
  http://notepad-plus.sourceforge.net/uk/regExpList.php
Thank you.