beneath the surface: regular expressions in ruby
Post on 10-Jul-2015
Photo by Mr. Christopher Thomas, Creative Commons Attribution-ShareAlike 2.0 Generic License
Beneath the Surface
Embracing the True Power of Regular Expressions in Ruby
@nellshamrell
^4[0-9]{12}(?:[0-9]{3})?$
Source: regular-expressions.info
We fear what we do not understand
Regular Expressions
+ Ruby
Photo by Shayan, Creative Commons Attribution-ShareAlike 2.0 Generic License
Regex Matching in Ruby
Ruby Methods
Onigmo
Onigmo is a fork of Oniguruma
Onigmo reads the regex, parses it into an abstract syntax tree, and compiles that tree into a series of instructions
Finite State Machines
Photo by Felipe Skroski, Creative Commons Attribution Generic 2.0
A Finite State Machine Shows How Something Works
Annie the Dog
States: In the House, Out of House
Transition between the states: Door (one in each direction)
Finite State Machine
Multiple States
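As a rough illustration of the idea, the Annie the Dog machine can be sketched in a few lines of Ruby. The constant `ANNIE_TRANSITIONS`, the method `next_state`, and the state and event names are invented for this example; they are not from the talk or from Onigmo.

```ruby
# Hypothetical model of the slide's diagram: two states joined by a door.
ANNIE_TRANSITIONS = {
  in_the_house: { door: :out_of_house },
  out_of_house: { door: :in_the_house }
}.freeze

# Returns the state reached from `state` when `event` happens.
# Raises KeyError for a state or event the machine does not know.
def next_state(state, event)
  ANNIE_TRANSITIONS.fetch(state).fetch(event)
end

state = :in_the_house
state = next_state(state, :door) # Annie walks out through the door
puts state
state = next_state(state, :door) # and comes back in again
puts state
```

A regex engine works on the same principle: each character read from the string is an event that either moves the machine to a new state or leaves it with nowhere to go.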
/force/
re = /force/
string = "Use the force"
re.match(string)
State machine for /force/: f, o, r, c, e in sequence
Matching against "Use the force":
Path Doesn't Match
Still Doesn't Match
Path Matches! (Fast Forward)
We Have A Match!
re = /force/
string = "Use the force"
re.match(string)
=> #<MatchData "force">
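The walk above can be observed from Ruby itself. This small sketch uses only standard `MatchData` methods to show where the match landed and how much of the string the engine passed over first; the second pattern, `/farce/`, is an invented example of a path that never matches.

```ruby
re = /force/
string = "Use the force"

match = re.match(string)

puts match[0]        # the text that matched
puts match.begin(0)  # index in the string where the match starts
puts match.pre_match # everything the engine passed over before matching

# A pattern with no possible path through the string returns nil.
p(/farce/.match(string))
```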
Alternation
Photo by Shayan, Creative Commons Attribution Generic 2.0
/Y(olk|oda)/
Pipe
re = /Y(olk|oda)/
string = "Yoda"
re.match(string)
State machine for /Y(olk|oda)/: Y, then one of two branches: o, l, k or o, d, a
Matching against "Yoda":
Which To Choose?
Saves To Backtrack Stack
Uh Oh, No Match
Backtracks To Here
We Have A Match!
re = /Y(olk|oda)/
string = "Yoda"
re.match(string)
=> #<MatchData "Yoda" 1:"oda">
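To see which branch of the alternation succeeded, the capture group can be inspected. This is a small sketch using the slide's own regex; the extra input strings "Yolk" and "Yo" are added here for contrast.

```ruby
re = /Y(olk|oda)/

yoda = re.match("Yoda")
puts yoda[0] # whole match
puts yoda[1] # text captured by the group, i.e. the branch that succeeded

yolk = re.match("Yolk")
puts yolk[1]

# Neither branch fits here, so every saved alternative is exhausted
# and the match fails.
p re.match("Yo")
```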
Photo by Fancy Horse, Creative Commons Attribution Generic 2.0
Quantifiers
/No+/
Plus Quantifier
re = /No+/
string = "Noooo"
re.match(string)
State machine for /No+/: N, then o, with a loop back onto o
Matching against "Noooo":
Return Match? Or Keep Looping?
Greedy Quantifier Keeps Looping
Greedy quantifiers match as much as possible
Greedy quantifiers use maximum effort for maximum return
We Have A Match!
re = /No+/
string = "Noooo"
re.match(string)
=> #<MatchData "Noooo">
Lazy Quantifiers
Lazy quantifiers match as little as possible
Lazy quantifiers use minimum effort for minimum return
/No+?/
The ? after the + Makes Quantifier Lazy
re = /No+?/
string = "Noooo"
re.match(string)
State machine for /No+?/: N, then o, with a loop back onto o
Matching against "Noooo":
Return Match? Or Keep Looping?
We Have A Match!
re = /No+?/
string = "Noooo"
re.match(string)
=> #<MatchData "No">
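Putting the greedy and lazy forms side by side makes the difference easy to check. The sketch below uses the two regexes from the slides on the same input; the variable names are chosen only for this example.

```ruby
string = "Noooo"

greedy = /No+/.match(string)  # keeps looping while "o" is available
lazy   = /No+?/.match(string) # returns as soon as one "o" has matched

puts greedy[0]
puts lazy[0]
```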
Greedy quantifiers are greedy but reasonable
/.*moon/
Star Quantifier
re = /.*moon/
string = "That's no moon"
re.match(string)
State machine for /.*moon/: . (looping), then m, o, o, n
Matching against "That's no moon":
Loops
Which To Match? (Fast Forward)
Keeps Looping
No More Characters?
Backtrack or Fail?
Backtracks
Huzzah!
We Have A Match!
re = /.*moon/
string = "That's no moon"
re.match(string)
=> #<MatchData "That's no moon">
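One way to observe that `.*` first swallows the whole string and then gives characters back is to wrap it in a capture group. This is a sketch, not part of the talk: the capture groups and the input "no moon, no moon" are added here so the effect is visible.

```ruby
string = "no moon, no moon"

greedy = /(.*)moon/.match(string)  # .* runs to the end, then backtracks
lazy   = /(.*?)moon/.match(string) # .*? grows one character at a time

# The greedy star only gives back as much as it must, so it stops at
# the LAST "moon"; the lazy star stops at the FIRST one.
puts greedy[1].inspect
puts lazy[1].inspect

# The slide's example: the whole string is matched.
puts(/.*moon/.match("That's no moon")[0])
```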
Backtracking = Slow
/No+w+/
re = /No+w+/
string = "Noooo"
re.match(string)
State machine for /No+w+/: N, then o (looping), then w (looping)
Matching against "Noooo":
Loops
Uh Oh
Backtrack or Fail?
Backtracks
Match FAILS
Possessive Quantifiers
Possessive quantifiers do not backtrack
/No++w+/
The second + Makes Quantifier Possessive
State machine for /No++w+/: N, then o (looping), then w (looping)
Matching against "Noooo":
Loops
Uh Oh
Backtrack or Fail?
Match FAILS
Possessive quantifiers fail faster by controlling backtracking
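Both `/No+w+/` and `/No++w+/` fail on "Noooo", so the difference between them is only in how much work happens before the failure. A pattern where the outcome itself changes shows the "no backtracking" rule more directly. The patterns `/No+o/` and `/No++o/` below are invented for this sketch and do not appear in the talk.

```ruby
string = "Noooo"

# From the slides: no "w" in the string, so both fail.
p(/No+w+/.match(string))
p(/No++w+/.match(string))

# Greedy o+ takes every "o", then gives one back so the final "o" can match.
greedy = /No+o/.match(string)
puts greedy[0]

# Possessive o++ takes every "o" and refuses to give any back,
# so the final "o" has nothing left to match and the result is nil.
p(/No++o/.match(string))
```

The possessive form is worth reaching for when it is already known that giving characters back can never help, since it spares the engine those retries.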
Tying It All Together
Photo by Keith Ramos, Creative Commons Attribution 2.0 Generic
snake_case to CamelCase
Find first letter of string and capitalize it
Find any character that follows an underscore and capitalize it
Remove underscores
Find first letter of string and capitalize it
snake_case to CamelCase
case_converter_spec.rb
before(:each) do
  @case_converter = CaseConverter.new
end
it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("method")
  result.should == "Method"
end
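/^/
^ Anchors Match To Beginning Of String (strictly, in Ruby ^ anchors to the beginning of each line; \A anchors to the beginning of the string)
/^\w/
\w Matches Any Word Character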
case_converter.rb
def upcase_chars(string)
  re = /^\w/
  string.gsub(re){|char| char.upcase}
end
Spec Passes!
case_converter_spec.rb
it "capitalizes the first letter" do
  result = @case_converter.upcase_chars("_method")
  result.should == "_Method"
end
Spec Fails!
Spec Failure:
Expected: "_Method"
Got: "_method"
Problem: \w Matches Letters AND Underscores
/^\w/
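The failure is easy to reproduce outside the spec. The sketch below checks directly that `\w` accepts an underscore, and that upcasing an underscore changes nothing, which is why the spec sees "_method" come back untouched.

```ruby
# \w is shorthand for letters, digits, AND the underscore.
puts "_".match?(/\w/)

# So /^\w/ matches the leading "_" and never reaches the "m".
result = "_method".gsub(/^\w/) { |char| char.upcase }
puts result

# Upcasing "_" is a no-op, which hides the mistake.
puts "_".upcase
```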
/^[a-z]/
[a-z] Matches Only Lowercase Letters
/^[^a-z][a-z]/
[^a-z] Matches everything BUT lowercase letters
/^[^a-z]?[a-z]/
? Makes Character Class Optional
case_converter.rb
def upcase_chars(string)
  re = /^[^a-z]?[a-z]/
  string.gsub(re){|char| char.upcase}
end
Spec Passes!
Find any character that follows an underscore and capitalize it
snake_case to CamelCase
case_converter_spec.rb
it "capitalizes letters after an underscore" do
  result = @case_converter.upcase_chars("some_method")
  result.should == "Some_Method"
end
/^[^a-z]?[a-z]/
/^[^a-z]?[a-z]|[a-z]/
| Pipe For Alternation
/^[^a-z]?[a-z]|(?<=_)[a-z]/
(?<=_) Look Behind
case_converter.rb
def upcase_chars(string)
  re = /^[^a-z]?[a-z]|(?<=_)[a-z]/
  string.gsub(re){|char| char.upcase}
end
Spec Passes!
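A look behind checks what comes before the current position without making it part of the match. The sketch below contrasts it with a plain `/_[a-z]/`, a pattern introduced here only for comparison, using `scan` to list exactly what each regex matched.

```ruby
string = "some_method"

# Without a look behind, the underscore is part of the match.
p string.scan(/_[a-z]/)

# With (?<=_), the underscore must be there but is not consumed.
p string.scan(/(?<=_)[a-z]/)

# The second alternative from the slides, on its own: only the letter
# after the underscore is handed to the block.
puts string.gsub(/(?<=_)[a-z]/) { |char| char.upcase }
```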
Remove underscores
snake_case to CamelCase
case_converter_spec.rb
it "removes underscores" do
  result = @case_converter.rmv_underscores("some_method")
  result.should == "somemethod"
end
/_/
Matches An Underscore
case_converter.rb
def rmv_underscores(string)
  re = /_/
  string.gsub(re, "")
end
Spec Passes!
Combine results of two methods
snake_case to CamelCase
case_converter_spec.rb
it "converts snake_case to CamelCase" do
  result = @case_converter.snake_to_camel("some_method")
  result.should == "SomeMethod"
end
case_converter.rb
def snake_to_camel(string)
  rmv_underscores(upcase_chars(string))
end
Spec Passes!
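For reference, the three methods can be assembled into one runnable file. This is a sketch pieced together from the slides, with the regexes reconstructed from the slide annotations; it may differ in detail from the code in the author's repository.

```ruby
# Sketch of the CaseConverter class as built up step by step in the slides.
class CaseConverter
  # Upcases the first lowercase letter of the string (allowing one
  # leading non-letter such as "_") and any letter after an underscore.
  def upcase_chars(string)
    re = /^[^a-z]?[a-z]|(?<=_)[a-z]/
    string.gsub(re) { |char| char.upcase }
  end

  # Deletes every underscore.
  def rmv_underscores(string)
    string.gsub(/_/, "")
  end

  # Capitalizes first, then strips underscores. The order matters:
  # the look behind needs the underscores to still be present.
  def snake_to_camel(string)
    rmv_underscores(upcase_chars(string))
  end
end

converter = CaseConverter.new
puts converter.upcase_chars("some_method")
puts converter.snake_to_camel("some_method")
```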
Code is available here: https://github.com/nellshamrell/snake_to_camel_case
Conclusion
Photo by Steve Jurvetson, Creative Commons Attribution Generic 2.0
Develop regular expressions in small pieces
If you write code, you can write regular expressions
Move beyond the fear
Photo by Leonardo Pallotta, Creative Commons Attribution Generic 2.0
Nell Shamrell, Software Development Engineer
Blue Box Inc
@nellshamrell
Resources: https://gist.github.com/nellshamrell/6031738