regular expression in action
Who We Are
We are a Development Partner for our customers
Design software solutions, not just implement them
Focus on the solution – Platform and technology agnostic
Expertise in building applications that are:
Mobile, Social, Cloud-based, Gamified
What We Do
Areas of Focus
Enterprise
Custom enterprise applications
Product development targeting the enterprise
Mobile
Custom mobile apps for iOS, Android, Windows Phone, BB OS
Mobile platform (server-to-server) development
Social Media
CMS based websites for consumers and enterprise (corporate, consumer,
community & social networking)
Social media platform development (enterprise & consumer)
Folio3 At a Glance
Founded in 2005
Over 200 full time employees
Offices in the US, Canada, Bulgaria & Pakistan
Palo Alto, CA
Sofia, Bulgaria
Karachi, Pakistan
Toronto, Canada
Areas of Focus: Enterprise
Automating workflows
Cloud based solutions
Application integration
Platform development
Healthcare
Mobile Enterprise
Digital Media
Supply Chain
Areas of Focus: Mobile
Serious enterprise applications for Banks, Businesses
Fun consumer apps for app discovery, interaction, exercise gamification and play
Educational apps
Augmented Reality apps
Mobile Platforms
Areas of Focus: Web & Social Media
Community Sites based on Content Management Systems
Enterprise Social Networking
Social Games for Facebook & Mobile
Companion Apps for games
Agenda
What are Regular Expressions
Literal characters and Special characters
Building blocks of Regular Expressions
Grouping and Backreferences
Unicode characters in regular expressions
Regex Matching Modes
Lookarounds
Parse a log file…
What are Regular Expressions?
Regular expressions provide a concise and flexible means for matching strings of text, such as particular characters, words, or patterns of characters.
Literal and Special characters
The most basic regular expression consists of a literal, which
behaves just like plain string matching. For example:
cat will match cat in About cats and dogs.
Special characters, known as meta characters, need to be
escaped with a \ in regular expressions if they are used as
part of a literal:
dogs\. will match dogs. in About cats and dogs.
Meta characters are:
[ \ ^ $ . | ? * + ( ) {
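The literal and escaped examples above can be tried out directly. This is a minimal sketch that assumes Python's built-in re module as the engine; the slides do not name one, and other engines may differ in detail.

```python
import re

text = "About cats and dogs."

# A literal pattern behaves like a plain substring search.
literal = re.search(r"cat", text)
print(literal.group(), literal.start())  # cat 6

# An escaped dot matches only a real "." character.
escaped = re.search(r"dogs\.", text)
print(escaped.group())  # dogs.

# re.escape() adds the backslashes when the literal comes from data.
escaped_literal = re.escape("dogs.")
print(escaped_literal)  # dogs\.
```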
Character Classes and Shorthands
With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. For example, gr[ae]y will match both grey and gray.
Ranges can be specified using a dash. For example, [0-9] will match any digit from 0 to 9, and [0-9a-fA-F] will match any single hexadecimal digit.
A caret immediately after the opening square bracket negates the character class, so that it matches any single character that is not in the class. For example, [^0-9] will match any character except a digit. A negated class must still match a character: q[^u] will not match in Iraq, because nothing follows the q, but it will match the q and the following space in Iraq is a country.
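A short sketch of the character class examples, again assuming Python's re module; the test strings other than the ones quoted in the slide are made up for illustration.

```python
import re

# One character out of a set: both spellings match.
spellings = re.findall(r"gr[ae]y", "grey or gray")
print(spellings)  # ['grey', 'gray']

# A range inside a class: any single hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "x7Fz")
print(hex_digits)  # ['7', 'F']

# A negated class still has to consume one character after the q.
no_match = re.search(r"q[^u]", "Iraq")
with_space = re.search(r"q[^u]", "Iraq is a country")
print(no_match)                  # None
print(repr(with_space.group()))  # 'q '
```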
Character Classes and Shorthands
Most meta characters work without escaping inside character classes; only ], \, ^ and - keep a special meaning there.
For example, [+*] is a valid expression and matches either * or +.
There are some pre-defined character classes, known as shorthand character classes:
\w stands for [A-Za-z0-9_]
\s stands for [ \t\r\n]
\d stands for [0-9]
If a character class is repeated using the ?, * or + operators, the entire character class is repeated, not just the character that it matched. For example:
[0-9]+ can match 837 as well as 222
([0-9])\1+ will match 222 but not 837.
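The difference between repeating a class and repeating the matched character can be checked with a small sketch, assuming Python's re module. In Python 3, \w and \d also match non-ASCII letters and digits unless the re.ASCII flag is set, so the bracket expansions above are an approximation there.

```python
import re

# + and * are literal inside a character class.
operators = re.findall(r"[+*]", "2+3*4")
print(operators)  # ['+', '*']

# Shorthand classes.
numbers = re.findall(r"\d+", "order 837, item 222")
words = re.findall(r"\w+", "foo_bar baz1")
print(numbers)  # ['837', '222']
print(words)    # ['foo_bar', 'baz1']

# [0-9]+ repeats the class: each digit may differ.
class_repeat = re.fullmatch(r"[0-9]+", "837") is not None
# ([0-9])\1+ repeats the captured digit: all digits must be equal.
same_digit = re.fullmatch(r"([0-9])\1+", "222") is not None
mixed_digit = re.fullmatch(r"([0-9])\1+", "837") is not None
print(class_repeat, same_digit, mixed_digit)  # True True False
```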
Building blocks of Regular Exp.
The famous dot “.” operator matches any single character except a newline. For example:
a.b will match abb, aab, a+b etc.
^ and $ are used to match the start and end of the string. For example:
^My.*\.$ will match anything starting with My and ending with a dot.
The pipe operator is used to match either its left or its right part. For example:
(cat|dog) can match either cat or dog.
Question: If the expression is Get|GetValue|Set|SetValue and the string is SetValue, what will this match and why? What if the expression becomes Get(Value)?|Set(Value)?
* (or {0,}) and + (or {1,}) are used to control repetitions.
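The building blocks above, including the alternation question, can be explored with the following sketch. It assumes Python's re module, which is a backtracking engine that tries alternatives from left to right; engines that return the longest match (for example POSIX-style ones) may answer the question differently. The sample strings other than SetValue are made up for illustration.

```python
import re

# The dot needs exactly one character between a and b.
candidates = ["abb", "aab", "a+b", "ab"]
dot_matches = [s for s in candidates if re.fullmatch(r"a.b", s)]
print(dot_matches)  # ['abb', 'aab', 'a+b']

# ^ and $ anchor the pattern to the start and end of the string.
anchored_ok = bool(re.search(r"^My.*\.$", "My cat."))
anchored_bad = bool(re.search(r"^My.*\.$", "Oh My."))
print(anchored_ok, anchored_bad)  # True False

# Alternatives are tried left to right; the first one that succeeds wins.
plain_alt = re.match(r"Get|GetValue|Set|SetValue", "SetValue").group()
# The greedy optional group takes Value when it is present.
optional_alt = re.match(r"Get(Value)?|Set(Value)?", "SetValue").group()
print(plain_alt)     # Set
print(optional_alt)  # SetValue

# {0,} behaves like * and {1,} behaves like +.
zero_or_more = re.fullmatch(r"ab{0,}", "a") is not None
one_or_more = re.fullmatch(r"ab{1,}", "a") is not None
print(zero_or_more, one_or_more)  # True False
```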
Grouping and Backreferences
Round brackets, besides grouping part of a regular expression together, also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses. For example, ([0-9])\1+ will match 222 but not 837.
If a backreference is not required, you can optimize the expression with a non-capturing group: Set(?:Value)?
Backreferences can be used in the expression itself or in the replacement text. For example, <([A-Za-z][A-Za-z0-9]*)>.*</\1> will match a pair of matching opening and closing tags.
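A sketch of capturing groups, non-capturing groups, and backreferences in both the pattern and the replacement text, assuming Python's re module. The HTML snippet and the number pair are made-up inputs.

```python
import re

html = "<b>bold</b> and <i>broken</em>"

# \1 must repeat whatever tag name group 1 captured.
tag = re.compile(r"<([A-Za-z][A-Za-z0-9]*)>.*</\1>")
first = tag.search(html)
print(first.group(0), first.group(1))  # <b>bold</b> b

# <i>...</em> is skipped because the closing name differs.
tag_names = tag.findall(html)
print(tag_names)  # ['b']

# A capturing group stores its text; a non-capturing group does not.
capturing = re.match(r"Set(Value)?", "SetValue").groups()
non_capturing = re.match(r"Set(?:Value)?", "SetValue").groups()
print(capturing, non_capturing)  # ('Value',) ()

# Backreferences in replacement text: swap the two numbers.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")
print(swapped)  # 20-10
```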
Unicode characters in Regular Exp.
Unicode characters can be used as \uxxxx in regular expressions.
For example, عطاری can be matched in an expression as:
\u0639\u0637\u0627\u0631\u06cc
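The \uxxxx form can be checked with a small sketch, assuming Python's re module, which accepts \uXXXX escapes inside string patterns. The word is built from the same code points listed above rather than typed as a literal, so the example does not depend on how the text was encoded.

```python
import re

# The subject string, built from the code points given in the slide.
word = "\u0639\u0637\u0627\u0631\u06cc"

# A raw string keeps the backslashes, so the regex engine itself
# interprets the \uXXXX escapes.
pattern = r"\u0639\u0637\u0627\u0631\u06cc"

matched = re.fullmatch(pattern, word) is not None
code_points = [format(ord(ch), "04x") for ch in word]
print(matched)      # True
print(code_points)  # ['0639', '0637', '0627', '0631', '06cc']
```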
Regular Exp. Matching Modes
/i makes the regex match case insensitive.
[A-Z] will match A and a with this modifier.
/s enables "single-line mode". In this mode, the dot matches newlines as well.
.* will match all of sheraz\r\nattari with this modifier.
/m enables "multi-line mode". In this mode, the caret and dollar also match after and before newlines in the subject string, that is, at the start and end of each line.
.* will still match only sheraz in sheraz\r\nattari with this modifier, because /m changes the anchors and not the dot.
/x enables "free-spacing mode". In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment.
#sheraz\r\n\r\n.* is read as a comment line followed by .*, so it will match only sheraz with this modifier.
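The four modes map onto flags in most engines. The sketch below assumes Python's re module, where /i, /s, /m and /x correspond to re.I, re.S, re.M and re.X. The subject uses \n rather than \r\n, because Python's dot treats only \n as a newline and would otherwise include the \r in the match.

```python
import re

subject = "sheraz\nattari"

# re.I: case-insensitive matching.
letters = re.findall(r"[A-Z]", "Aa", re.I)
print(letters)  # ['A', 'a']

# Default: the dot stops at the newline.
default_dot = re.match(r".*", subject).group()
# re.S: the dot matches the newline too.
dotall = re.match(r".*", subject, re.S).group()
print(default_dot)        # sheraz
print(dotall == subject)  # True

# re.M: ^ and $ match at the start and end of every line.
single_line = re.findall(r"^\w+$", subject)
multi_line = re.findall(r"^\w+$", subject, re.M)
print(single_line)  # []
print(multi_line)   # ['sheraz', 'attari']

# re.X: whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    # sheraz   (this whole line is a comment)
    .*         # any run of characters except a newline
""", re.X)
free_spacing = verbose.match(subject).group()
print(free_spacing)  # sheraz
```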
Lookarounds with Conditions…
A conditional is a special construct that will first evaluate a lookaround, and then execute one sub-regex if the lookaround succeeds, and another sub-regex if the lookaround fails.
An example of positive lookahead is:
q(?=uv*) will match the q in quvvvv and in qu.
An example of negative lookahead is:
q(?!uv*) will match a q that is not followed by u (and therefore not by uv either).
An example of positive lookbehind is:
(?<=b)a will match an a prefixed by b, as in ba.
An example of negative lookbehind is:
(?<!b)a will match an a not prefixed by b, as in ca and da.
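The four lookaround examples can be verified with a short sketch, assuming Python's re module. Python's re supports conditionals only on capturing groups, not on lookarounds, so the conditional construct described above is not shown here. The sample strings are made up for illustration.

```python
import re

# Positive lookahead: the text after q is checked but not consumed.
ahead = re.search(r"q(?=uv*)", "quvvvv")
print(ahead.group(), ahead.end())  # q 1

# Negative lookahead: only a q that is NOT followed by u.
not_u = [m.start() for m in re.finditer(r"q(?!uv*)", "Iraq quit qatar")]
print(not_u)  # [3, 10]

# Positive lookbehind: an a that is preceded by b.
after_b = [m.start() for m in re.finditer(r"(?<=b)a", "ba ca da")]
print(after_b)  # [1]

# Negative lookbehind: an a that is NOT preceded by b.
not_after_b = [m.start() for m in re.finditer(r"(?<!b)a", "ba ca da")]
print(not_after_b)  # [4, 7]
```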
Contact
For more details about our
services, please get in touch with
us.
US Office: (408) 365-4638
www.folio3.com