parboiled explained

Parboiled 2 explained

Upload: paul-popoff

Post on 09-Feb-2017


Category:

Software



TRANSCRIPT

Page 1: Parboiled explained

Parboiled2 explained

Page 2: Parboiled explained

Covered

● Why Parboiled2
● Library basics
● Performance optimizations
● Best practices
● Migration

Page 3: Parboiled explained

Features

● PEG
● No lexer required
● Flexible typesafe EDSL
● Compile-time optimizations
● Decent error reporting
● scala.js support

Page 4: Parboiled explained

When regexes fail

Parsing arbitrary HTML with regexes is like asking Paris Hilton to write an operating system (c)

Page 5: Parboiled explained

When regexes fail

Page 6: Parboiled explained

Performance (regex)

[Bar chart: Parboiled2 vs. Regex, with Parsing and Warmup series. Values shown: 620.38 and 621.95. Lower is better.]

Data is taken from here: http://bit.ly/1XHAJaA

Page 7: Parboiled explained

Performance (json), lower is better

Parboiled1: 85.64
Parboiled2: 13.17
Argonaut: 7.01
Json4SNative: 8.06
Json4SJackson: 4.09

Data is taken from here: http://myltsev.name/ScalaDays2014/#/

Page 8: Parboiled explained

Performance (json), lower is better

Parser combinators: 2385.78
Parboiled1: 85.64
Parboiled2: 13.17
Argonaut: 7.01
Json4SNative: 8.06
Json4SJackson: 4.09

Data is taken from here: https://groups.google.com/forum/#!topic/parboiled-user/bGtdGvllGgU

Page 9: Parboiled explained

Alternatives

● Grappa [java]
● ANTLR
● Regexps
● Parser-combinators
● Language Workbenches (xtext, MPS)

Page 10: Parboiled explained

<dependency>

<groupId>org.parboiled</groupId>

<artifactId>parboiled_2.11</artifactId>

<version>2.1.0</version>

</dependency>

Page 11: Parboiled explained

import org.parboiled2._

class MyParser(val input: ParserInput) extends Parser {
  // Your grammar
}

Page 12: Parboiled explained

Rule DSL

Page 13: Parboiled explained

Basic match

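def CaseDoesntMatter = rule { ignoreCase("string") }

// Char and String literals are converted to rules implicitly
def MyCharRule = rule { 'a' }
def MyStringRule = rule { "string" }

// The same rules written explicitly
def MyCharRule = rule { ch('a') }
def MyStringRule = rule { str("string") }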

Page 14: Parboiled explained

Basic match

def CaseDoesntMatter: Rule0 = rule { ignoreCase("string") }

def MyCharRule: Rule0 = rule {'a'}

def MyStringRule: Rule0 = rule { "string" }

Page 15: Parboiled explained

Syntactic predicates

● ANY – matches any character except EOI
● EOI – virtual character that represents the end of input

val EOI = '\uFFFF'

Put EOI at the end of the main/root rule; otherwise the parser also accepts input with unparsed trailing characters
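The effect of a missing EOI can be sketched with two variants of the same rule. This is a minimal illustration, assuming parboiled2 2.1.x on the classpath; `GreetingParser`, `Prefix` and `Whole` are hypothetical names that are not part of the slides.

```scala
import org.parboiled2._

// Hypothetical parser used only for this illustration.
class GreetingParser(val input: ParserInput) extends Parser {
  // Succeeds as soon as the prefix "hello" has been matched.
  def Prefix: Rule0 = rule { "hello" }

  // Succeeds only if "hello" is the whole input.
  def Whole: Rule0 = rule { "hello" ~ EOI }
}

object EoiDemo extends App {
  val text = "hello, world"
  // Without EOI the trailing ", world" is silently ignored.
  println(new GreetingParser(text).Prefix.run().isSuccess)
  // With EOI the trailing input makes the run fail.
  println(new GreetingParser(text).Whole.run().isSuccess)
}
```

Only the root rule needs the trailing EOI; adding it to inner rules would prevent them from being reused inside larger rules.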

Page 16: Parboiled explained

Syntactic predicates

● anyOf – matches any single character from the given set
● noneOf – matches any single character except those in the given set

def Digit = rule { anyOf("1234567890")}

def Visible = rule { noneOf(" \n\t")}

Page 17: Parboiled explained

Character ranges

def Digit = rule { '0' - '9' }
def AlphaLower = rule { 'a' - 'z' }

Good, but not flexible (the main issue of parboiled1)

● Sometimes you don't need ANY character

● You have a range of characters

Page 18: Parboiled explained

Character predicates

There is a set of predefined char predicates:

● CharPredicate.All
● CharPredicate.Digit
● CharPredicate.Digit19
● CharPredicate.HexDigit

Of course, you can define your own

Page 19: Parboiled explained

def AllButQuotes = rule {

CharPredicate.Visible -- "\"" -- "'"

}

def ValidIdentifier = rule {

CharPredicate.AlphaNum ++ "_"

}

CharPredicate from (_.isSpaceChar)

Character predicates

Page 20: Parboiled explained

def ArithmeticOperation = rule {

anyOf("+-*/^")

}

// noneOf matches every character that is NOT listed, so this rule matches non-whitespace
def NonWhiteSpaceChar = rule { noneOf(" \t\n") }

anyOf/noneOf

Page 21: Parboiled explained

def cows = rule { 1000 times "cow" }

def PRI = rule { 1 to 3 times Digit }

N times

Page 22: Parboiled explained

def OptWs = rule {
  zeroOrMore(Whitespace) // Whitespace.*
}

def UInt = rule {
  oneOrMore(Digit) // Digit.+
}

def CommaSeparatedNumbers = rule {
  oneOrMore(UInt).separatedBy(",")
}

0+/1+

Page 23: Parboiled explained

import CharPredicate.Digit

// "yyyy-mm-dd"
def SimplifiedRuleForDate = rule {
  Year ~ "-" ~ Month ~ "-" ~ Day
}

def Year = rule {
  Digit ~ Digit ~ Digit ~ Digit
}

def Month = rule { Digit ~ Digit }
def Day = rule { Digit ~ Digit }

Sequence

Page 24: Parboiled explained

// zeroOrOne
def Newline = rule { optional('\r') ~ '\n' }

def Newline = rule { '\r'.? ~ '\n'}

Optional

Page 25: Parboiled explained

def Signum = rule { '+' | '-' }

def bcd = rule { 'b' ~ 'c' | 'b' ~ 'd'}

Ordered choice

Page 26: Parboiled explained

// why order matters
def Operator = rule {
  "+=" | "-=" | "*=" | "++" | "--" | "+" | "-" | "*" | "/" ...
}

def Operators = rule {
  ("+" ~ ("=" | "+").?) | ("-" ~ ("=" | "-").?) | ...
}

Order matters
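Why the longer operators must come first can be sketched as follows. In a PEG the first alternative that matches wins and the remaining ones are never tried. The example assumes parboiled2 2.1.x; `OperatorParser`, `ShortFirst` and `LongFirst` are hypothetical names introduced for illustration.

```scala
import org.parboiled2._

// Hypothetical parser contrasting two orderings of the same alternatives.
class OperatorParser(val input: ParserInput) extends Parser {
  // Shorter alternative first: on "+=" the "+" branch wins,
  // "=" is left unconsumed and the following EOI fails.
  def ShortFirst: Rule0 = rule { ("+" | "+=") ~ EOI }

  // Longer alternative first: "+=" is tried before falling back to "+".
  def LongFirst: Rule0 = rule { ("+=" | "+") ~ EOI }
}

object OrderDemo extends App {
  println(new OperatorParser("+=").ShortFirst.run().isSuccess)
  println(new OperatorParser("+=").LongFirst.run().isSuccess)
}
```

Once the choice has succeeded with "+", a later failure of EOI does not make the parser go back and try "+=", which is why `ShortFirst` rejects "+=".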

Page 27: Parboiled explained

Running the parser

class MyParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule {
    ignoreCase("match") ~ EOI
  }
}

Page 28: Parboiled explained

Running the parser

val p1 = new MyParser("match")
val p2 = new MyParser("much")

p1.MyStringRule.run() // Success

p2.MyStringRule.run() // Failure

Different delivery schemes are also available
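A minimal sketch of handling the result, assuming parboiled2 2.1.x: with the default delivery scheme `run()` returns a `scala.util.Try`, and importing `Parser.DeliveryScheme.Either` switches the result type to `Either`. `MatchParser`, `check` and `asEither` are hypothetical names for illustration.

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

// Same rule as on the slide, under a hypothetical class name.
class MatchParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule { ignoreCase("match") ~ EOI }
}

object RunDemo extends App {
  // Default scheme: run() yields a Try.
  def check(text: String): String = {
    val parser = new MatchParser(text)
    parser.MyStringRule.run() match {
      case Success(_)             => "ok"
      case Failure(e: ParseError) => parser.formatError(e) // human-readable message
      case Failure(e)             => s"unexpected error: ${e.getMessage}"
    }
  }

  // Alternative scheme: run() yields an Either instead of a Try.
  def asEither(text: String): Either[ParseError, Unit] = {
    import Parser.DeliveryScheme.Either
    new MatchParser(text).MyStringRule.run()
  }

  println(check("match"))
  println(check("much"))
  println(asEither("much").isLeft)
}
```

A third scheme, `Parser.DeliveryScheme.Throw`, throws the `ParseError` instead of wrapping it.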

Page 30: Parboiled explained

BKV

server.name = "webserver"
server {
  port = "8080"
  address = "192.168.88.88"

  settings {
    greeting_message = "Hello!\n It's me!"
  }
}

Page 31: Parboiled explained

Performance

Page 32: Parboiled explained

Unroll n.times for n <= 4

// Slower
rule { 4 times Digit }

// Faster
rule { Digit ~ Digit ~ Digit ~ Digit }

Page 33: Parboiled explained

Faster stack operations

// Much faster
def Digit4 = rule {
  Digit ~ Digit ~ Digit ~ Digit ~ push(
    // (c - '0') converts a matched digit char to its numeric value
    (charAt(-4) - '0') * 1000 + (charAt(-3) - '0') * 100 + (charAt(-2) - '0') * 10 + (lastChar - '0')
  )
}

Page 34: Parboiled explained

Do not recreate CharPredicate

class MyParser(val input: ParserInput) extends Parser {
  val Uppercase = CharPredicate.from(_.isUpper)
}

Page 35: Parboiled explained

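Use predicates

// Fast: a single character-level predicate
def foo = rule { capture(zeroOrMore(noneOf("\n"))) }

// Wrong: !'\n' consumes no input, so zeroOrMore loops forever
def foo = rule { capture(zeroOrMore(!'\n')) }

// Works: the negative lookahead is followed by ANY, which consumes a character
def foo = rule { capture(zeroOrMore(!'\n' ~ ANY)) }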

Page 36: Parboiled explained

Best Practices

Page 37: Parboiled explained

Best Practices

● Unit tests
● Small rules
● Decomposition
● Case objects instead of strings
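The first three points can be sketched together: small, separately named rules are composed into a root rule, and each expectation is checked with a plain assertion. This assumes parboiled2 2.1.x; `DateParser` and `DateParserChecks` are hypothetical names, and a real project would more likely use a test framework than bare `assert` calls.

```scala
import org.parboiled2._

// Hypothetical parser built from small rules, following the date example.
class DateParser(val input: ParserInput) extends Parser {
  import CharPredicate.Digit

  def Year  = rule { Digit ~ Digit ~ Digit ~ Digit }
  def Month = rule { Digit ~ Digit }
  def Day   = rule { Digit ~ Digit }

  // Root rule: "yyyy-mm-dd" and nothing else.
  def Date: Rule0 = rule { Year ~ "-" ~ Month ~ "-" ~ Day ~ EOI }
}

object DateParserChecks extends App {
  // Well-formed input is accepted.
  assert(new DateParser("2017-02-09").Date.run().isSuccess)
  // Too few digits in month and day.
  assert(new DateParser("2017-2-9").Date.run().isFailure)
  // Trailing whitespace is rejected because of EOI.
  assert(new DateParser("2017-02-09 ").Date.run().isFailure)
}
```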

Page 38: Parboiled explained

Push case objects

def LogLevel = rule {
  capture("info" | "warning" | "error")
}

def LogLevel = rule {
    "info" ~ push(LogLevel.Info)
  | "warning" ~ push(LogLevel.Warning)
  | "error" ~ push(LogLevel.Error)
}

Page 39: Parboiled explained

Simple syntax for object capture

case class Text(s: String)

def charsAST: Rule1[AST] = rule {

capture(Chars) ~> ((s: String) => Text(s))

}

def charsAST = rule {

capture(Chars) ~> Text

}

Page 40: Parboiled explained

Named rules

def Header: Rule1[Header] =
  rule("I am header") { ... }

def Header: Rule1[Header] = namedRule("header") {...}

def UserName = rule {

Prefix ~ oneOrMore(NameChar).named("username")

}

Page 41: Parboiled explained

Migration

Page 42: Parboiled explained

Migration

● Separate classpath org.parboiled vs org.parboiled2

● Grammar is hard to break
● Composition: trait → abstract class
● Removing primitives library

Page 43: Parboiled explained

Drawbacks

Page 44: Parboiled explained

Drawbacks

● PEG (absence of lexer)
● No support for left-recursive grammars
● No error recovery mechanism
● No IDE support
● No support for indentation-based grammars
● Awful, non-informative error messages

Page 45: Parboiled explained
Page 46: Parboiled explained

Q/A