grokking regex
DESCRIPTION
Understanding regular expressions gives developers a powerful tool for operations that would otherwise be tedious or difficult. This presentation covers how to build and test regular expressions so developers can start using them in their own code.
TRANSCRIPT
![Page 1: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/1.jpg)
php[tek] 2014
David Stockton
May 21, 2014
Grokking Regex
![Page 2: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/2.jpg)
What are regular expressions?
![Page 3: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/3.jpg)
Patterns to describe text
![Page 4: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/4.jpg)
Regular
![Page 5: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/5.jpg)
Extremely Powerful
![Page 6: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/6.jpg)
Often Abused.
![Page 7: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/7.jpg)
Regular Expression Joke
![Page 8: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/8.jpg)
How to use regex in PHP
● The preg_* functions
○ Use Perl-compatible regular expressions
○ Probably the most common regex syntax
● Don't use the ereg_* functions (deprecated in PHP 5.3, removed in PHP 7)
![Page 9: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/9.jpg)
PHP Functions
preg_match - Search a subject for a match
preg_match_all - Searches a subject for all matches
preg_replace - Replace a pattern with something else
preg_split - Split a string based on regex delimiter
![Page 10: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/10.jpg)
PHP Functions
preg_replace_callback - Replacement defined in a callback
preg_grep - Return array of elements that match a pattern
preg_quote - Quote regular expression characters
preg_last_error - Error code of last regex function
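A minimal sketch of several of these functions in use. The sample strings and variable names are made up for illustration and are not from the talk.

```php
<?php
// Sample input; the strings and variable names are illustrative only.
$subject = 'php[tek] 2014 ran from May 19 to May 23';

// preg_match: returns 1 on the first match, 0 if none, false on error.
$found = preg_match('/\d{4}/', $subject, $firstMatch);

// preg_match_all: returns the number of matches found.
$count = preg_match_all('/\d+/', $subject, $allMatches);

// preg_split: split on runs of whitespace.
$words = preg_split('/\s+/', 'one  two   three');

// preg_grep: keep only the array elements that match (keys are preserved).
$phpFiles = preg_grep('/\.php$/', ['index.php', 'style.css', 'Foo.php']);

// preg_quote: escape characters that are special in a regex.
$quoted = preg_quote('php[tek]', '/');

// preg_replace_callback: compute each replacement in a callback.
$doubled = preg_replace_callback('/\d+/', function (array $m) {
    return (string) ((int) $m[0] * 2);
}, '3 apples and 10 pears');
```

Note that preg_match returns false on error, so comparing the result with `=== 1` is safer than a loose truthiness check.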
![Page 11: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/11.jpg)
Starting Pattern
● Matches one or more letters, numbers, dots, underscores, plus signs, or equals signs
● Followed by @
● Followed by one or more letters, numbers, dots, or dashes
● Followed by a dot
● Followed by 2 to 4 letters
/[A-Z0-9._+=]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i
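A quick way to try the starting pattern against a few inputs. This sketch assumes the domain part is meant to repeat (a `+` after `[A-Z0-9.-]`), as the bullet list describes. The sample addresses are invented.

```php
<?php
// Assumed form of the slide's pattern, with "+" after the domain class.
$emailPattern = '/[A-Z0-9._+=]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i';

// Invented sample inputs.
$candidates = [
    'dave@example.com',
    'first.last+tag@mail.example.org',
    'not an email',
];

$results = [];
foreach ($candidates as $candidate) {
    // preg_match returns 1 when the pattern is found anywhere in the string.
    $results[$candidate] = preg_match($emailPattern, $candidate) === 1;
}
```

The pattern is not anchored, so it reports whether the string contains something address-like, not whether the whole string is an address.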
![Page 12: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/12.jpg)
What does it mean?
![Page 13: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/13.jpg)
Email Addresses
![Page 14: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/14.jpg)
Some Email Addresses
![Page 15: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/15.jpg)
The "real" email address regex
[The slide is filled edge to edge with one very long address-matching regex. The extraction lost its escape sequences, so it is not reproduced here.]
![Page 16: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/16.jpg)
More "real" regex
[The same regex continues and fills a second slide.]
![Page 17: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/17.jpg)
How do we implement this regex?
![Page 18: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/18.jpg)
Time for real learning
![Page 19: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/19.jpg)
Letters and Numbers
Letters and numbers match... letters and numbers
/a/ - Matches a string that contains "a"
/7/ - Matches a string that contains a 7
![Page 20: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/20.jpg)
Match a word
/regex/ - Matches a string with the word "regex" in it
![Page 21: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/21.jpg)
Match a choice of words
Use pipe when you want a choice
/pizza|steak|cheeseburger/
![Page 22: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/22.jpg)
Delimiters
So far, the delimiter has been /
Delimiters tell the regex engine where the pattern starts and ends
Other delimiters can be used
#\\My\\PHP\\Namespace#
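Choosing a different delimiter mostly pays off when the pattern itself contains slashes. A small sketch with an invented URL path:

```php
<?php
// Invented sample path.
$path = '/users/42/posts';

// With "/" as the delimiter, every literal slash must be escaped.
$slashDelimited = preg_match('/\/users\/42\/posts/', $path);

// With "#" as the delimiter, the slashes can be written as-is.
$hashDelimited = preg_match('#/users/42/posts#', $path);
```

Both calls match the same text; the second is just easier to read.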
![Page 23: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/23.jpg)
Character Matching
/[Pp][Hh][Pp]/ - Matches PHP in any case
Define ranges
/[abcdefghijklmnopqrstuvwxyz]/ - Any lower case alpha
/[a-z]/ - Any lower case alpha
![Page 24: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/24.jpg)
Character Ranges
Combine ranges:
/[A-Za-z0-9]/ - Matches any alphanumeric
/[A-Fa-f0-9]/ - Matches a hex character
Invert character selection:
/[^0-9]/ - Non-digit characters
/[^ ]/ - Non-space characters
/[.!@#$%^&*]/ - Some punctuation (not inverted: ^ is only special as the first character)
![Page 25: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/25.jpg)
Special Characters
Dot (.) matches any character (except a newline, by default)
/./ - Matches any one character
/../ - Matches any two characters
To match an actual dot character, escape it
/\./
Escaping is not needed inside a character selection
/[.]/
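The difference between an unescaped and an escaped dot can be checked directly. The sample strings are invented.

```php
<?php
// An unescaped dot matches any character, so "abc" matches.
$anyChar = preg_match('/a.c/', 'abc');

// An escaped dot only matches a literal ".".
$literalMiss = preg_match('/a\.c/', 'abc');
$literalHit  = preg_match('/a\.c/', 'a.c');

// Inside a character selection the dot is already literal.
$classMiss = preg_match('/a[.]c/', 'abc');
$classHit  = preg_match('/a[.]c/', 'a.c');
```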
![Page 26: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/26.jpg)
Character Classes
\d means digit - [0-9] (with the u modifier it can also match other Unicode digits)
\D means non-digit - [^0-9]
\w means word characters - [A-Za-z0-9_]
\W means non-word characters - [^A-Za-z0-9_]
\s means whitespace characters, such as [ \t\n\r\f]
\S means non-whitespace characters
![Page 27: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/27.jpg)
Repetition
Match two digits in a row
● /\d\d/
● /[0-9][0-9]/
● /\d{2}/
● /[0-9]{2}/
Match at least one, as many as possible: /\d+/
Zero or more: /\d*/
![Page 28: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/28.jpg)
Repetition Repeated
● * match 0 or more
● + match 1 or more
● {x} match exactly x
● {x,} match x or more
● {0,y} match up to y ({,y} is treated as literal text by older PCRE versions)
● {x,y} match between x and y
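A short sketch that combines character classes with the quantifiers above, using preg_match_all to collect every match. The sample text is invented.

```php
<?php
// Invented sample text.
$text = 'Order 66 shipped 3 crates, tracking id AB12F0';

// + : one or more digits (this also finds the digits inside AB12F0).
preg_match_all('/\d+/', $text, $numbers);

// {2,} : runs of at least two digits.
preg_match_all('/\d{2,}/', $text, $longNumbers);

// {6} : exactly six uppercase hex characters in a row.
preg_match_all('/[A-F0-9]{6}/', $text, $hexIds);

// \w+ : runs of word characters, i.e. a rough word count.
$wordCount = preg_match_all('/\w+/', $text);
```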
![Page 29: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/29.jpg)
More special characters
? - Preceding selection is optional
![Page 30: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/30.jpg)
Step by Step
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
![Page 31: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/31.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Opening delimiter
![Page 32: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/32.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Optional open paren
![Page 33: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/33.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Capture group - Parens capture pattern inside
![Page 34: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/34.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Three digits (captured)
![Page 35: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/35.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Optional closing paren
![Page 36: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/36.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Space or dash character
![Page 37: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/37.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Optional space or dash character
![Page 38: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/38.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Another three digit capture group
![Page 39: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/39.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Optional space or dash character
![Page 40: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/40.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Capture group for four digits
![Page 41: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/41.jpg)
Break it down
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Closing delimiter
![Page 42: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/42.jpg)
More special characters
Put it together:
/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
Matches 720-675-7471 or (720)675-7471 or (720) 675-7471 or 7206757471 or 720 675 7471
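The pattern can be wrapped in a small function that returns the three captured groups. parsePhone() is a hypothetical helper name chosen for this sketch, not something from the talk.

```php
<?php
// parsePhone() is a hypothetical helper, not part of the talk.
function parsePhone(string $input): ?array
{
    $pattern = '/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/';
    if (preg_match($pattern, $input, $m) !== 1) {
        return null; // no match (or a regex error)
    }
    // $m[0] is the whole match; $m[1], $m[2], $m[3] are the capture groups.
    return [$m[1], $m[2], $m[3]];
}

$dashed = parsePhone('720-675-7471');
$parens = parsePhone('(720) 675-7471');
$plain  = parsePhone('7206757471');
$dotted = parsePhone('720.675.7471');
```

Because the pattern has no anchors, it also matches a phone-like run inside a longer string.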
![Page 43: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/43.jpg)
Phone number matching
Does not match 720.675.7471 or a number of other formats.
Other ways?
Replace all non-digits, check for length of 10
![Page 44: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/44.jpg)
PHP Codes
$number = preg_replace( '/[^0-9]/', '', $potentialNumber);
$valid = strlen($number) == 10;
![Page 45: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/45.jpg)
Regex Anchors
![Page 46: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/46.jpg)
Specify Position With Anchors
/^ab/ - Matches abcdefg but not cab
/ab$/ - Matches cab but not abcdefg
/^[a-z]+$/ - Matches a string of only lowercase characters
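The anchor examples above, written as checks that can be run. The last two inputs are invented.

```php
<?php
// ^ ties the match to the start of the string.
$startHit  = preg_match('/^ab/', 'abcdefg');
$startMiss = preg_match('/^ab/', 'cab');

// $ ties the match to the end of the string.
$endHit  = preg_match('/ab$/', 'cab');
$endMiss = preg_match('/ab$/', 'abcdefg');

// Both anchors together: the whole string must fit the pattern.
$allLowerHit  = preg_match('/^[a-z]+$/', 'regex');
$allLowerMiss = preg_match('/^[a-z]+$/', 'Regex 101');
```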
![Page 47: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/47.jpg)
Word Boundaries
\b means word boundary
● Before the first character, if it is a word character
● After the last character, if it is a word character
● Between two characters, if one is a word character and the other isn't
/\bfish\b/ matches fish but not fisherman or catfish
/fish\b/ matches fish and catfish
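Counting matches in one sentence shows the effect of each boundary. The sample sentence is made up from the words on the slide.

```php
<?php
$sentence = 'fish fisherman catfish';

// Boundaries on both sides: only the standalone word "fish".
$wholeWord = preg_match_all('/\bfish\b/', $sentence);

// Boundary only at the end: "fish" and the tail of "catfish".
$endsWith = preg_match_all('/fish\b/', $sentence);

// No boundaries: "fish" is found inside all three words.
$anywhere = preg_match_all('/fish/', $sentence);
```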
![Page 48: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/48.jpg)
Alternation
/cow|boy/ - Matches cow or boy or cowboy or coward, etc.
/\b(cow|boy)\b/ - Matches cow or boy but not cowboy or coward
Parens capture the matching word - more on that later
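A brief check of both alternation patterns, including the word captured by the parentheses. The sample strings are invented.

```php
<?php
// Without boundaries, "cow" is found inside "coward".
$loose = preg_match('/cow|boy/', 'coward');

// With boundaries around the group, "coward" no longer matches.
$strict = preg_match('/\b(cow|boy)\b/', 'coward');

// The parentheses capture whichever alternative matched.
$hit = preg_match('/\b(cow|boy)\b/', 'the boy ran', $m);
$captured = $m[1];
```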
![Page 49: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/49.jpg)
Greedy vs Lazy
Default is greedy - match as much as possible
Grab starting HTML tag:
/<.+>/
Matches the entire string: <h1>Welcome to Tek</h1>
Not what we want.
![Page 50: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/50.jpg)
Make it lazy.
![Page 51: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/51.jpg)
Lazy Matching
/<.+?>/
Now matches only the opening tag:
<h1> in <h1>Welcome to Tek</h1>
![Page 52: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/52.jpg)
Another way to match tags
/<[^>]+>/
Literally match: “Less than” followed by one or more non-“greater than” characters followed by a “greater than” character.
Usually faster than the last example, because it needs far less backtracking.
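The three tag patterns can be compared side by side on the same input. This sketch only inspects what each pattern matches; it does not measure speed.

```php
<?php
$html = '<h1>Welcome to Tek</h1>';

// Greedy: .+ runs to the last ">" in the string.
preg_match('/<.+>/', $html, $greedy);

// Lazy: .+? stops at the first ">" it can.
preg_match('/<.+?>/', $html, $lazy);

// Negated class: cannot run past a ">" in the first place.
preg_match('/<[^>]+>/', $html, $negated);

// The same pattern with preg_match_all finds both tags.
preg_match_all('/<[^>]+>/', $html, $tags);
```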
![Page 53: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/53.jpg)
Capture Part of Regex
![Page 54: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/54.jpg)
Capturing Regex - Backreference
/__(construct|destruct)/
Backreference will contain construct or destruct so you can use it later
/([a-z]+)\1/
Matches a repeated sequence of characters
![Page 55: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/55.jpg)
Backreference
/([a-z]{3})\1/
Matches words like booboo or bambam
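A small sketch of both backreference patterns. The word "banana" is an added example, not from the slides.

```php
<?php
// Three letters, then the same three letters again.
$booboo = preg_match('/([a-z]{3})\1/', 'booboo', $m1);
$bambam = preg_match('/([a-z]{3})\1/', 'bambam', $m2);
$banana = preg_match('/([a-z]{3})\1/', 'banana');

// Any length: "banana" contains "an" followed by "an".
$repeat = preg_match('/([a-z]+)\1/', 'banana', $m3);
```

In each case `$m[1]` holds the text that `\1` had to repeat.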
![Page 56: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/56.jpg)
Practical Backreference Uses
Search and replace
preg_replace('/\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/', '(\1) \2-\3', $phone);
Formats phone numbers from a variety of input styles as (xxx) xxx-xxxx
![Page 57: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/57.jpg)
More Practical Backreferences
preg_replace( '/\b(\w+)\s+\1\b/', '\1', $string);
Before: Replace duplicated words that that have inadvertently been left in.
After: Replace duplicated words that have inadvertently been left in.
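The duplicate-word replacement can be run on a short sentence to see the effect. The sentences are adapted for this sketch, and the `i` modifier in the second call is an added variation.

```php
<?php
// Input with an accidental doubled word.
$text = 'Replace duplicated words that that have been left in.';

// \b(\w+) captures a word, \s+\1\b requires the same word again.
$fixed = preg_replace('/\b(\w+)\s+\1\b/', '\1', $text);

// With the i modifier, "The the" also counts as a duplicate.
$fixedCase = preg_replace('/\b(\w+)\s+\1\b/i', '\1', 'The the end');
```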
![Page 58: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/58.jpg)
Non-capturing groups
Match an IPv4 address
/((?:\d{1,3}\.){3}\d{1,3})/
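Match 1-3 digits followed by a dot, repeat that group 3 times, then match 1-3 more digits. The (?: ) group is repeated without being captured, so only the full address is captured.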
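The pattern checks the shape of an address but also accepts values such as 999.999.999.999. A hedged sketch that adds a range check in PHP follows. isIpv4() is a hypothetical helper name, and the `^`, `$` and `D` additions are assumptions made here so the whole string has to match.

```php
<?php
// The slide's pattern: only one capture group, despite two sets of parens.
preg_match('/((?:\d{1,3}\.){3}\d{1,3})/', 'server at 10.0.0.1', $m);
$groupCount = count($m); // whole match plus one capture

// isIpv4() is a hypothetical helper, not part of the talk.
function isIpv4(string $candidate): bool
{
    // Anchors and the D modifier are added so the entire string must match.
    if (preg_match('/^((?:\d{1,3}\.){3}\d{1,3})$/D', $candidate, $m) !== 1) {
        return false;
    }
    // The regex only checks the shape; each octet must also be 0-255.
    foreach (explode('.', $m[1]) as $octet) {
        if ((int) $octet > 255) {
            return false;
        }
    }
    return true;
}
```

For production code, PHP's built-in `filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4)` is probably the simpler choice.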
![Page 60: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/60.jpg)
Pattern Modifiers
Modifiers after the last delimiter:
i - case-insensitive matching
m - multiline matching (^ and $ also match at line breaks)
s - dot matches all characters, including \n
x - ignore whitespace characters if not escaped or in a character class
![Page 61: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/61.jpg)
More Pattern Modifiers
D - $ matches only at the very end of the string (ignored if m is set)
U - Invert the meaning of greediness
Other modifiers can be seen here:
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
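Each modifier is easiest to understand by running the same pattern with and without it. All sample strings below are invented.

```php
<?php
// i: case-insensitive.
$withoutI = preg_match('/php/', 'PHP');
$withI    = preg_match('/php/i', 'PHP');

// m: ^ and $ match at each line break, not just at the string ends.
$lines = "one\ntwo\nthree";
$withoutM = preg_match_all('/^\w+$/', $lines);
$withM    = preg_match_all('/^\w+$/m', $lines);

// s: dot also matches a newline.
$withoutS = preg_match('/a.c/', "a\nc");
$withS    = preg_match('/a.c/s', "a\nc");

// x: unescaped whitespace in the pattern is ignored.
$withX = preg_match('/ \d{3} - \d{4} /x', '675-7471');

// D: without it, $ also matches just before a trailing newline.
$withoutD = preg_match('/^abc$/', "abc\n");
$withD    = preg_match('/^abc$/D', "abc\n");

// U: quantifiers become lazy by default.
preg_match('/<.+>/U', '<h1>Hi</h1>', $ungreedy);
```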
![Page 62: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/62.jpg)
Named Capture Groups
Instead of numbers, get back names
No need to renumber in code later if you add another capture group
![Page 63: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/63.jpg)
Named Capture Group - Phone
preg_match('/
    \(?                    # opt. open paren
    (?P<area_code>\d{3})   # area code
    \)?                    # opt. closed paren
    [ -]?                  # opt. space/dash
    (?P<exchange>\d{3})    # exchange
    [ -]?                  # opt. space/dash
    (?P<number>\d{4})      # last 4 digits
    /x', // ignore spaces and comment stuff
    $number, $matches);
![Page 64: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/64.jpg)
Named Capture Group Result
array(7) {
[0] => string(10) "7206757471"
['area_code'] => string(3) "720"
[1] => string(3) "720"
['exchange'] => string(3) "675"
[2] => string(3) "675"
['number'] => string(4) "7471"
[3] => string(4) "7471"
}
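Once the groups are named, calling code can read them by name instead of by position. A minimal sketch, using a single-line version of the pattern and an invented input:

```php
<?php
// Same pattern as above, written on one line without the x modifier.
$pattern = '/\(?(?P<area_code>\d{3})\)?[ -]?(?P<exchange>\d{3})[ -]?(?P<number>\d{4})/';

preg_match($pattern, '(720) 675-7471', $matches);

// Read the parts by name; adding another group later will not shift these.
$formatted = sprintf(
    '(%s) %s-%s',
    $matches['area_code'],
    $matches['exchange'],
    $matches['number']
);

// Keep only the named entries and drop the numeric duplicates.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
```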
![Page 65: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/65.jpg)
Positive Look Ahead Matches
Find a pattern followed by another pattern
/p(?=h)/ - Match a p followed by an "h" but don't include the "h"
Matches "phone", "phish", "telegraph"
Does not match "potassium"
![Page 66: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/66.jpg)
Negative Look Ahead
Look for a pattern which is not followed by some other pattern
/p(?!h)/ - p not followed by h
Matches potassium
Does not match phone, telegraph or phish
![Page 67: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/67.jpg)
Look aheads
● Positive and negative lookaheads do not capture anything
● They determine if a match is possible
● They are zero-width
● /p[^h]/ is not the same as /p(?!h)/
● /ph/ is not the same as /p(?=h)/
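The last two bullets are easiest to see by comparing what each pattern matches. The word "stop" is an added example.

```php
<?php
// /ph/ consumes the "h"; /p(?=h)/ only checks that it is there.
preg_match('/ph/', 'phone', $consumed);
preg_match('/p(?=h)/', 'phone', $lookedAhead);

// /p[^h]/ needs a character after the "p"; /p(?!h)/ does not.
$classAtEnd     = preg_match('/p[^h]/', 'stop');
$lookaheadAtEnd = preg_match('/p(?!h)/', 'stop');
```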
![Page 68: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/68.jpg)
Look behinds
Positive Look Behind
/(?<=oo)d/ - d preceded by oo
- Matches the d in "food" and "mood"
Negative Look Behind
/(?<!oo)d/ - d not preceded by oo
- Matches the d in "dude", "crude" and "d"
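Using PREG_OFFSET_CAPTURE shows that only the d itself is matched, and where. A minimal sketch:

```php
<?php
// Positive look behind: the "d" in "food" is at offset 3.
preg_match('/(?<=oo)d/', 'food', $positive, PREG_OFFSET_CAPTURE);

// Negative look behind: no "d" in "food" qualifies.
$negativeMiss = preg_match('/(?<!oo)d/', 'food');

// In "dude" the first "d" has nothing before it, so it qualifies.
preg_match('/(?<!oo)d/', 'dude', $negative, PREG_OFFSET_CAPTURE);
```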
![Page 69: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/69.jpg)
With Great Power...
Test your regular expressions before they go to production
It's much easier to get them wrong than to get them right if you don't test
Use tools like Sublime Text, Atom
![Page 70: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/70.jpg)
When to not use regex
When they are not needed
If you can use strstr, strpos or str_replace
If you cannot use those, maybe regex is appropriate
Don't use regex when you need a parser
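For fixed strings, the plain string functions named above do the job without any pattern syntax. A short sketch with an invented input:

```php
<?php
$haystack = 'Grokking Regex at php[tek]';

// strpos: is the substring present? Compare with false, since 0 is a valid position.
$containsRegex = strpos($haystack, 'Regex') !== false;

// str_replace: literal replacement, so the brackets need no escaping.
$replaced = str_replace('php[tek]', 'the conference', $haystack);

// strstr: everything from the first occurrence onward.
$tail = strstr($haystack, 'at');
```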
![Page 71: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/71.jpg)
Resources
http://regular-expressions.info
http://php.net/manual/en/ref.pcre.php
http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
![Page 72: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/72.jpg)
Photo Credits
● http://www.flickr.com/photos/justinbaeder/5317820857 (Hammer & Screw)
● http://www.flickr.com/photos/doug88888/5891638442 (Water Pattern)
● http://www.flickr.com/photos/mwparenteau/7566437660 (Laxative Cereal)
● http://www.flickr.com/photos/auyuchuco/3669864253 (Mantis Shrimp)
● http://www.flickr.com/photos/anderspiren/4678572968 (Spray Can)
● http://www.flickr.com/photos/dcmatt/473127479 (Comedy Club)
● http://www.flickr.com/photos/gschueler/72294706 (License Plate)
● http://www.flickr.com/photos/horiavarlan/4514164700 (Puzzle @ sign)
● http://www.flickr.com/photos/proimos/4199675334 (Facepalm)
● http://www.flickr.com/photos/mklapper/5812224468 (Teacher in Classroom)
● http://www.flickr.com/photos/light_arted/3927322326 (Anchor)
● http://www.flickr.com/photos/kpcauchi/5376768095 (Lizard)
● http://www.flickr.com/photos/focusshoot/5617788347 (Spider web)
● http://www.flickr.com/photos/oberazzi/318947873 (Cuff links)
![Page 74: Grokking regex](https://reader034.vdocuments.site/reader034/viewer/2022042521/558a39a1d8b42aa51d8b46b1/html5/thumbnails/74.jpg)
Please rate this talk
https://joind.in/10642