Beneath the Surface: Regular Expressions in Ruby

Posted on 10-Jul-2015


DESCRIPTION

Many of us approach regular expressions with a certain fear and trepidation, using them only when absolutely necessary. We can get by when we need to use them, but we hesitate to dive any deeper into their cryptic world. Ruby has so much more to offer us. This talk showcases the power of Ruby and of Onigmo, the regex library Ruby runs on (a fork of Oniguruma). It takes you on a journey beneath the surface, exploring the beauty, elegance, and power of regular expressions. You will discover flexible, dynamic, and expressive ways to harness that power in your own code.

TRANSCRIPT

Photo by Mr. Christopher Thomas, Creative Commons Attribution-ShareAlike 2.0 Generic License

Beneath the Surface

Embracing the True Power of Regular Expressions in Ruby

@nellshamrell

^4[0-9]{12}(?:[0-9]{3})?$

Source: regular-expressions.info
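One way to make the pattern above less intimidating is to run it. The sketch below assumes the pattern is meant to recognize 13- or 16-digit numbers that start with 4 (the Visa format described on regular-expressions.info); the sample numbers are made-up test values, and the `\A`/`\z` variant is our own addition, because in Ruby `^` and `$` match at line boundaries rather than string boundaries. `Regexp#match?` needs Ruby 2.4 or later.

```ruby
# The pattern from the slide: a 4, then 12 more digits, then optionally 3 more.
VISA = /^4[0-9]{12}(?:[0-9]{3})?$/

# Illustrative variant (our assumption, not from the talk): \A and \z anchor
# to the whole string instead of to a single line.
VISA_STRICT = /\A4[0-9]{12}(?:[0-9]{3})?\z/

samples = {
  "4" + "1" * 15 => "16 digits starting with 4",
  "4" + "2" * 12 => "13 digits starting with 4",
  "5" + "1" * 15 => "starts with 5",
  "4" + "1" * 13 => "14 digits"
}

samples.each do |number, note|
  result = VISA.match?(number) ? "match" : "no match"
  puts format("%-17s %-9s (%s)", number, result, note)
end

# ^ and $ work per line, so an extra line of text slips past VISA.
sneaky = "4" * 16 + "\nnot a card number"
p VISA.match?(sneaky)        # => true
p VISA_STRICT.match?(sneaky) # => false
```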

We fear what we do not understand

Regular Expressions + Ruby

Photo by Shayan, Creative Commons Attribution-ShareAlike 2.0 Generic License

Regex Matching in Ruby

Ruby methods → Onigmo

Onigmo is a fork of Oniguruma.

Onigmo reads the regex, parses it into an abstract syntax tree, and compiles that tree into a series of instructions.
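For reference, here are a few everyday Ruby methods that run a pattern through the regex engine. This is a small illustrative sample, not a complete list; `Regexp#match?` needs Ruby 2.4 or later.

```ruby
re = /force/
string = "Use the force, Luke. Feel the force."

first_index = re =~ string           # index where the first match starts
first_match = re.match(string)       # MatchData for the first match
all_matches = string.scan(re)        # every non-overlapping match
replaced    = string.gsub(re, "Force") # replace every match

p first_index     # => 8
p first_match[0]  # => "force"
p all_matches     # => ["force", "force"]
p replaced        # => "Use the Force, Luke. Feel the Force."
p re.match?(string) # => true, without building a MatchData object
```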

A finite state machine shows how something works.

Annie the Dog: Annie is always in one of two states, In the House or Out of House. The door is the transition between them: going through the door takes her from In the House to Out of House, and going back through the door returns her to In the House.

Finite State Machine

Multiple States
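The Annie example translates directly into code. This is a minimal sketch that assumes a hash-based transition table; the state and event names (`:in_house`, `:out_of_house`, `:door`) and the `run_machine` helper are our own hypothetical labels for the slide's states and transition.

```ruby
# Transition table: for each state, the state each event leads to.
TRANSITIONS = {
  in_house:     { door: :out_of_house },
  out_of_house: { door: :in_house }
}.freeze

# Apply a sequence of events, starting from the given state.
# fetch raises KeyError if a state has no transition for an event.
def run_machine(start, events)
  events.reduce(start) do |state, event|
    TRANSITIONS.fetch(state).fetch(event)
  end
end

p run_machine(:in_house, [:door])               # => :out_of_house
p run_machine(:in_house, [:door, :door])        # => :in_house
p run_machine(:in_house, [:door, :door, :door]) # => :out_of_house
```

A regex engine works on the same principle, with each state describing which input characters move the machine forward.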

/force/

    re = /force/
    string = "Use the force"
    re.match(string)

The state machine for /force/ is a single path: f → o → r → c → e. The engine starts at the first character of "Use the force". "U" doesn't match the path. Moving on, "s" still doesn't match. Fast-forwarding to the "f" in "force", the path matches, and the engine follows it through o, r, c, and e.

We have a match!

    re.match(string)
    => #<MatchData "force">
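The MatchData returned above records more than the matched text. This short sketch (our own addition, using standard `MatchData` methods) shows where in the string the path finally matched and what came before it.

```ruby
re = /force/
string = "Use the force"

m = re.match(string)
p m[0]          # => "force"
p m.begin(0)    # => 8, no match starts at offsets 0 through 7
p m.pre_match   # => "Use the "
p m.post_match  # => ""

p re.match("Use the spoon") # => nil, no starting position leads to a match
```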

/Y(olk|oda)/

The pipe (|) separates alternatives.

    re = /Y(olk|oda)/
    string = "Yoda"
    re.match(string)

After the Y state, the machine branches into two paths: o → l → k and o → d → a. The engine matches "Y" and then has to choose which branch to follow. It tries the first branch, olk, and saves the other branch to the backtrack stack. The "o" matches, but "l" does not match "d": uh oh, no match. The engine backtracks to the saved point and tries the oda branch, which matches "o", "d", and "a".

We have a match!

    re.match(string)
    => #<MatchData "Yoda" 1:"oda">
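Because the engine tries alternatives from left to right and stops at the first one that lets the whole pattern succeed, the order of alternatives can change the result. A small sketch; the patterns other than /Y(olk|oda)/ are our own illustrations.

```ruby
m = /Y(olk|oda)/.match("Yoda")
p m[0] # => "Yoda"
p m[1] # => "oda", the branch that succeeded after backtracking

# The leftmost alternative wins, even when a later one would match more text.
short_first = "Yoda"[/Y(o|oda)/]
long_first  = "Yoda"[/Y(oda|o)/]
p short_first # => "Yo"
p long_first  # => "Yoda"

# A non-capturing group (?:...) alternates without saving a capture.
plain = /Y(?:olk|oda)/.match("Yoda")
p plain.captures # => []
```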

/No+/

The plus quantifier (+) matches one or more of the preceding item.

    re = /No+/
    string = "Noooo"
    re.match(string)

The machine has an N state followed by an o state that loops back on itself. After matching "N" and the first "o", the engine must decide: return the match, or keep looping? A greedy quantifier keeps looping.

Greedy quantifiers match as much as possible.

Greedy quantifiers use maximum effort for maximum return.

The loop continues through each remaining "o" until the string runs out.

We have a match!

    re.match(string)
    => #<MatchData "Noooo">
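The greedy loop on the o state can be imitated by hand. The helper below, `greedy_no_plus`, is hypothetical: it handles only this one pattern, anchored at the start of the string, and exists to show the "keep looping" decision. On these inputs it agrees with Ruby's own `/\ANo+/`.

```ruby
# Hypothetical hand-rolled version of /\ANo+/: match "N", then loop on "o"
# for as long as possible (greedy), and only then return the match.
def greedy_no_plus(str)
  return nil unless str.start_with?("No") # need "N" plus at least one "o"

  finish = 2
  finish += 1 while str[finish] == "o"    # greedy: keep looping while o's remain
  str[0, finish]
end

%w[Noooo No Nooo! Yes].each do |s|
  p [s, greedy_no_plus(s), s[/\ANo+/]]
end
```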

Lazy Quantifiers

Lazy quantifiers match as little as possible.

Lazy quantifiers use minimum effort for minimum return.

/No+?/

The ? after the quantifier makes it lazy.

    re = /No+?/
    string = "Noooo"
    re.match(string)

The engine matches "N" and then a single "o". Return the match, or keep looping? A lazy quantifier returns as soon as the rest of the pattern is satisfied.

We have a match!

    re.match(string)
    => #<MatchData "No">
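Lazy does not mean "always as short as possible": a lazy quantifier still expands when the rest of the pattern needs it to. The sketch below compares greedy and lazy forms on a few strings of our own choosing.

```ruby
p "Noooo"[/No+/]    # => "Noooo"  greedy: as many o's as possible
p "Noooo"[/No+?/]   # => "No"     lazy: one o is enough to finish the pattern
p "Noooo!"[/No+?!/] # => "Noooo!" lazy, but it must expand to reach the "!"

html = "<b>bold</b>"
p html[/<.+>/]      # => "<b>bold</b>"  greedy: runs to the last ">"
p html[/<.+?>/]     # => "<b>"          lazy: stops at the first ">"
```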

Greedy quantifiers are greedy but reasonable.

/.*moon/

The star quantifier (*) matches zero or more of the preceding item, and the dot (.) matches any character except a newline.

    re = /.*moon/
    string = "That's no moon"
    re.match(string)

The machine has a . state that loops on itself, followed by m → o → o → n. At each character the engine can either keep looping on . or try to start "moon". Which to match? The greedy star keeps looping, fast-forwarding all the way to the end of the string.

No more characters? Now the engine must backtrack or fail. It backtracks, giving back one character at a time ("n", then "o", then "o", then "m"), until the . loop stops just before "moon". Huzzah! Now m, o, o, and n match in turn.

We have a match!

    re.match(string)
    => #<MatchData "That's no moon">
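Because the greedy star gives back only as much as it must, /.*moon/ ends its match at the last "moon" it can reach, while the lazy form stops at the first. A small sketch with strings of our own:

```ruby
p "That's no moon"[/.*moon/]        # => "That's no moon"
p "moon? That's no moon"[/.*moon/]  # => "moon? That's no moon" (greedy: last moon)
p "moon? That's no moon"[/.*?moon/] # => "moon" (lazy: first moon)

# The dot does not match a newline unless the /m flag is given.
p "That's no\nmoon"[/.*moon/]       # => "moon"
p "That's no\nmoon"[/.*moon/m]      # => "That's no\nmoon"
```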

Backtracking = Slow

/No+w+/

    re = /No+w+/
    string = "Noooo"
    re.match(string)

The machine is N → o (looping) → w (looping). The engine matches "N", then the greedy o+ loops through every "o". Uh oh: there is no "w" after the last "o". Backtrack or fail? The engine backtracks, giving back one "o" at a time and looking for a "w" after each one, until o+ holds only a single "o" and there is still no "w".

Match FAILS

    re.match(string)
    => nil
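To make the wasted work visible, here is a hypothetical step counter, `simulate_no_plus_w_plus`, that imitates the engine's decisions for /No+w+/ at the first position only (a real engine also retries at later positions and applies optimizations of its own). On "Noooo" it gives back three o's before failing; with the `possessive:` flag, which is our own parameter, it fails without backtracking.

```ruby
# Hypothetical simulation of /No+w+/ tried at position 0 only.
# Returns [matched?, number_of_backtracks].
def simulate_no_plus_w_plus(str, possessive: false)
  return [false, 0] unless str[0] == "N"

  # o+ (greedy or possessive): first consume every "o" available.
  pos = 1
  pos += 1 while str[pos] == "o"
  return [false, 0] if pos == 1 # o+ needs at least one "o"

  backtracks = 0
  loop do
    return [true, backtracks] if str[pos] == "w" # w+ can start here
    # Possessive never gives back; greedy stops once o+ holds a single "o".
    break if possessive || pos == 2
    pos -= 1 # give back one "o" and try w+ again
    backtracks += 1
  end
  [false, backtracks]
end

p simulate_no_plus_w_plus("Noooo")                   # => [false, 3]
p simulate_no_plus_w_plus("Noooo", possessive: true) # => [false, 0]
p simulate_no_plus_w_plus("Nooow")                   # => [true, 0]
p "Noooo" =~ /No+w+/                                 # => nil
```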

Possessive Quantifiers

Possessive quantifiers do not backtrack.

/No++w+/

The extra + makes the quantifier possessive.

The engine matches "N", then o++ loops through every "o", exactly as before. Uh oh: still no "w". Backtrack or fail? A possessive quantifier never gives back what it has matched, so the engine fails right away instead of backtracking.

Match FAILS

Possessive quantifiers fail faster by controlling backtracking.
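Possessive quantifiers do more than fail faster: because they never give characters back, they can also turn a match into a non-match. A small sketch with patterns of our own; `(?>...)` is an atomic group, a longer spelling of the same no-backtracking behavior that Onigmo also supports.

```ruby
p "Noooo"[/No+o/]     # => "Noooo" o+ gives back one "o" so the final "o" can match
p "Noooo"[/No++o/]    # => nil     o++ keeps every "o", leaving none for the final "o"
p "Noooo"[/N(?>o+)o/] # => nil     atomic group: same behavior as o++

# Where no backtracking is needed, possessive and greedy agree.
p "Nooooww"[/No++w+/] # => "Nooooww"
p "Noooo" =~ /No++w+/ # => nil
```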

snake_case to CamelCase

    1. Find the first letter of the string and capitalize it
    2. Find any character that follows an underscore and capitalize it
    3. Remove underscores

snake_case to CamelCase: find the first letter of the string and capitalize it

case_converter_spec.rb

    describe CaseConverter do
      before(:each) do
        @case_converter = CaseConverter.new
      end

      it "capitalizes the first letter" do
        result = @case_converter.upcase_chars("method")
        result.should == "Method"
      end
    end

/^/

The ^ anchor matches at the beginning of a line; for a single-line string such as "method", that is the beginning of the string. (\A matches only at the very start of the string.)

/^\w/

\w matches any word character: letters, digits, and underscores.

case_converter.rb

    def upcase_chars(string)
      re = /^\w/
      string.gsub(re) { |char| char.upcase }
    end

Spec Passes!

case_converter_spec.rb

    it "capitalizes the first letter" do
      result = @case_converter.upcase_chars("_method")
      result.should == "_Method"
    end

Spec Fails!

Spec Failure:
    Expected: "_Method"
    Got: "_method"

Problem: \w matches letters AND underscores, so /^\w/ matches the leading "_", and upcasing "_" leaves it unchanged.

/^\w/

/^[a-z]/
[a-z] matches only lowercase letters.

/^[^a-z][a-z]/
[^a-z] matches everything BUT lowercase letters.

/^[^a-z]?[a-z]/
The ? makes the character class optional.

case_converter.rb

    def upcase_chars(string)
      re = /^[^a-z]?[a-z]/
      string.gsub(re) { |char| char.upcase }
    end

Spec Passes!
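The difference between these character classes is easy to check directly. This sketch uses inputs of our own choosing and a hypothetical helper, `upcase_first`, to show why `/^\w/` leaves "_method" unchanged while `/^[^a-z]?[a-z]/` reaches past the underscore to the first lowercase letter.

```ruby
# Hypothetical helper: upcase whatever the given regex matches.
def upcase_first(string, re)
  string.gsub(re) { |match| match.upcase }
end

p "_method"[/^\w/]                         # => "_" (\w includes the underscore)
p "_method"[/^[a-z]/]                      # => nil (the string starts with "_")
p "_method"[/^[^a-z]?[a-z]/]               # => "_m"
p upcase_first("_method", /^\w/)           # => "_method"
p upcase_first("_method", /^[^a-z]?[a-z]/) # => "_Method"
p upcase_first("method", /^[^a-z]?[a-z]/)  # => "Method" ([^a-z]? matches nothing here)
```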

snake_case to CamelCase: find any character that follows an underscore and capitalize it

case_converter_spec.rb

    it "capitalizes letters after an underscore" do
      result = @case_converter.upcase_chars("some_method")
      result.should == "Some_Method"
    end

/^[^a-z]?[a-z]/

The pipe (|) adds an alternative:

/^[^a-z]?[a-z]|[a-z]/

A look-behind, (?<=_), requires the letter to follow an underscore without including the underscore in the match:

/^[^a-z]?[a-z]|(?<=_)[a-z]/

case_converter.rb

    def upcase_chars(string)
      re = /^[^a-z]?[a-z]|(?<=_)[a-z]/
      string.gsub(re) { |char| char.upcase }
    end

Spec Passes!
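A look-behind checks the character before the current position without consuming it, so only the letter reaches the block. The sketch below also builds the combined pattern from its two pieces with `Regexp.union`; that composition, and the constant names, are our own suggestion rather than part of the original code.

```ruby
FIRST_LETTER     = /^[^a-z]?[a-z]/
AFTER_UNDERSCORE = /(?<=_)[a-z]/

p "some_method".scan(AFTER_UNDERSCORE) # => ["m"] (the underscore is not included)
p "some_method".scan(/_[a-z]/)         # => ["_m"] (without look-behind it is)

# Regexp.union joins the pieces with |, like the hand-written pattern.
combined = Regexp.union(FIRST_LETTER, AFTER_UNDERSCORE)
p "some_method".gsub(combined) { |char| char.upcase } # => "Some_Method"
p "a_b_c".gsub(combined) { |char| char.upcase }       # => "A_B_C"
```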

snake_case to CamelCase: remove underscores

case_converter_spec.rb

    it "removes underscores" do
      result = @case_converter.rmv_underscores("some_method")
      result.should == "somemethod"
    end

/_/ matches an underscore.

case_converter.rb

    def rmv_underscores(string)
      re = /_/
      string.gsub(re, "")
    end

Spec Passes!
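When the pattern is a single literal character, Ruby's `String#delete` gives the same result without a regex. The comparison below is our own aside; on these inputs both forms agree.

```ruby
def rmv_underscores(string)
  string.gsub(/_/, "")
end

["some_method", "a_b_c", "__init__", "plain"].each do |s|
  p [rmv_underscores(s), s.delete("_")]
end
```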

snake_case to CamelCase: combine the results of two methods

case_converter_spec.rb

    it "converts snake_case to CamelCase" do
      result = @case_converter.snake_to_camel("some_method")
      result.should == "SomeMethod"
    end

case_converter.rb

    def snake_to_camel(string)
      rmv_underscores(upcase_chars(string))
    end

Spec Passes!
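The two-method design keeps each regex small and separately tested. For comparison, the whole conversion can also be done in a single pass with a capture group. `snake_to_camel_one_pass` below is a hypothetical alternative, not the talk's code; on these inputs it agrees with the two-method version, which is reproduced at top level so the sketch runs on its own.

```ruby
# Hypothetical single-pass alternative: match the start of the string or an
# underscore, followed by a lowercase letter, and replace the whole match with
# just the upcased letter (which drops the underscore).
def snake_to_camel_one_pass(string)
  string.gsub(/(?:^|_)([a-z])/) { Regexp.last_match(1).upcase }
end

# The two-method approach from the talk, for comparison.
def upcase_chars(string)
  string.gsub(/^[^a-z]?[a-z]|(?<=_)[a-z]/) { |char| char.upcase }
end

def rmv_underscores(string)
  string.gsub(/_/, "")
end

def snake_to_camel(string)
  rmv_underscores(upcase_chars(string))
end

%w[some_method method _method a_b_c].each do |s|
  p [s, snake_to_camel(s), snake_to_camel_one_pass(s)]
end
```

The single regex is shorter, but it packs three ideas into one pattern; the talk's version builds the same behavior from pieces that can each be specced on their own.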

Develop regular expressions in small pieces

If you write code, you can write regular expressions

Move beyond the fear
