9/10/2015 1 text processing with boost or, "anything you can do, i can do better"

Download 9/10/2015 1 Text Processing With Boost Or,

Post on 27-Dec-2015

216 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • Slide 1
  • 9/10/2015 1 Text Processing With Boost Or, "Anything you can do, I can do better"
  • Slide 2
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 2 Talk Overview Goal: Become productive C++ string manipulators with the help of Boost. 1. The Simple Stuff Boost.Lexical_cast Boost.String_algo 2. The Interesting Stuff Boost.Regex Boost.Spirit Boost.Xpressive
  • Slide 3
  • 9/10/2015 3 Part 1: The Simple Stuff Utilities for Ad Hoc Text Manipulation
  • Slide 4 >> int('123') 123 >>> str(123) '123' C++: int i = atoi("123");">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 4 A Legacy of Inadequacy Python: >>> int('123') 123 >>> str(123) '123' C++: int i = atoi("123"); char buff[10]; itoa(123, buff, 10); No error handling! Complicated interface Not actually standard!
  • Slide 5 > i; // OK, i == 789 }">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 5 Stringstream: A Better atoi() { std::stringstream sout; std::string str; sout > str; // OK, str == "123" } { std::stringstream sout; int i; sout > i; // OK, i == 789 }
  • Slide 6
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 6 Boost.Lexical_cast // Approximate implementation... template Target lexical_cast(Source const & arg) { std::stringstream sout; Target result; if(!(sout > result)) throw bad_lexical_cast( typeid(Source), typeid(Target)); return result; } Kevlin Henney
  • Slide 7
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 7 Boost.Lexical_cast int i = lexical_cast ( "123" ); std::string str = lexical_cast ( 789 ); Ugly name Sub-par performance No i18n Clean Interface Error Reporting, Yay! Extensible
  • Slide 8
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 8 Boost.String_algo Extension to std:: algorithms Includes algorithms for: trimming case-conversions find/replace utilities ... and much more! Pavol Droba
  • Slide 9
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 28 Iterator: regex_token_iterator Tokenizes input according to pattern Example: scan HTML for email addresses std::string html = ; regex mailto(" ", regex_constants::icase); sregex_token_iterator begin(html.begin(), html.end(), mailto, 1), end; std::ostream_iterator out(std::cout, "\n"); std::copy(begin, end, out); // write all email addresses to std::cout std::string html = ; regex mailto(" ", regex_constants::icase); sregex_token_iterator begin(html.begin(), html.end(), mailto, 1), end; using namespace boost::lambda; std::for_each(begin, end, std::cout > expr >> ')'; fact = spirit::int_p | group; term = fact >> *(('*' >> fact) | ('/' >> fact)); expr = term >> *(('+' >> term) | ('-' >> term));">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 34 group ::= '(' expr ')' fact ::= integer | group; term ::= fact (('*' fact) | ('/' fact))* expr ::= term (('+' term) | ('-' term))* Infix Calculator Grammar In Boost.Spirit spirit::rule group, fact, term, expr; group = '(' >> expr >> ')'; fact = spirit::int_p | group; term = fact >> *(('*' >> fact) | ('/' >> fact)); expr = term >> *(('+' >> term) | ('-' >> term));
  • Slide 35
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 35 Balanced, Nested Braces In Extended Backus-Naur Form: In Boost.Spirit: braces ::= '{' ( not_brace+ | braces )* '}' not_brace ::= not '{' or '}' rule braces, not_brace; braces = '{' >> *( +not_brace | braces ) >> '}'; not_brace = ~chset_p("{}");
  • Slide 36
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 36 Spirit Parser Primitives SyntaxMeaning ch_p('X')Match literal character 'X' range_p('a','z')Match characters in the range 'a' through 'z' str_p("hello")Match the literal string "hello" chseq_p("ABCD")Like str_p, but ignores whitespace anychar_pMatches any single character chset_p("1234")Matches any of '1', '2', '3', or '4' eol_pMatches end-of-line (CR/LF and combinations) end_pMatches end of input nothing_pMatches nothing, always fails
  • Slide 37
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 37 Spirit Parser Operations SyntaxMeaning x >> yMatch x followed by y x | yMatch x or y ~xMatch any char not x (x is a single-char parser) x - yDifference: match x but not y *xMatch x zero or more times +xMatch x one or more times !xx is optional x[ f ]Semantic action: invoke f when x matches
  • Slide 38 r2 = ch_p('a') >> 'b';// OK! rule r3 = + str_p("hello");// OK!">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 38 Static DSEL Gotcha! Operator overloading woes At least one operand must be a parser! rule r1 = anychar_p >> 'b'; // OK rule r2 = 'a' >> 'b';// Oops! rule r3 = + "hello";// Oops! rule r2 = ch_p('a') >> 'b';// OK! rule r3 = + str_p("hello");// OK!
  • Slide 39 storable = not_brace.copy();">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 39 Storing rules spirit::rule polymorphic rule holder Caveat: spirit::rule can be tricky Assignment operator has unusual semantics. Cannot put them in std containers! spirit::stored_rule for value semantics spirit::rule not_brace = ~chset_p("{}"); spirit::stored_rule storable = not_brace.copy();
  • Slide 40 struct.">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 40 Algorithm: spirit::parse #include using namespace boost; int main() { spirit::rule group, fact, term, expr; group = '(' >> expr >> ')'; fact = spirit::int_p | group; term = fact >> *(('*' >> fact) | ('/' >> fact)); expr = term >> *(('+' >> term) | ('-' >> term)); assert( spirit::parse("2*(3+4)", expr).full ); assert( ! spirit::parse("2*(3+4", expr).full ); } Parse strings as an expr (start symbol = expr ). spirit::parse returns a spirit::parse_info struct.
  • Slide 41 struct.">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 41 Algorithm: spirit::parse spirit::rule group, fact, term, expr; //... expr = term >> *(('+' >> term) | ('-' >> term)); spirit::parse_info info = spirit::parse("1+3", expr); if(info.full) { std::cout
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 42 Algorithm: spirit::parse spirit::rule group, fact, term, expr; //... spirit::parse_info info = spirit::parse("1 + 3", expr, spirit::space_p); if(info.full) { std::cout
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 43 Spirit Grammars Group rules together Decouple tokenizing/lexing/scanning struct calculator : spirit::grammar { template struct definition { spirit::rule group, fact, term, expr; definition(calculator const & self) { //... expr = term >> *(('+' >> term) | ('-' >> term)); } rule const & start() const { return expr; } };
  • Slide 44
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 44 Algorithm: spirit::parse struct calculator : spirit::grammar { //... }; // This works! spirit::parse("1+3", calculator()); // This works, too! spirit::parse("1 +3", calculator(), spirit::space_p); Infix Calculator, revolutions
  • Slide 45 > (*alpha_p)[&write] >> '}'); Match alphabetic characters, call write() with range of characters that matched.">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 45 Semantic Actions Action to take when part of your grammar succeeds void write(char const *begin, char const *end) { std::cout.write(begin, end begin); } // This prints "hi" to std::cout spirit::parse("{hi}", '{' >> (*alpha_p)[&write] >> '}'); Match alphabetic characters, call write() with range of characters that matched.
  • Slide 46 > int_p[&write] >> ')'); int_p "returns" an int. using namespace lambda; // This prints "42" to std::cout spirit::parse("(42)", '(' >> int_p[cout > ')'); We can use a Boost.Lambda expression as a semantic action!">
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 46 Semantic Actions A few parsers process input first void write(int d) { std::cout > int_p[&write] >> ')'); int_p "returns" an int. using namespace lambda; // This prints "42" to std::cout spirit::parse("(42)", '(' >> int_p[cout > ')'); We can use a Boost.Lambda expression as a semantic action!
  • Slide 47
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 47 Closures Data associated with a rule. struct calc_closure : spirit::closure { member1 val; }; rule expr; A calc_closure stack frame contains a double named val. When parsing rule expr, its calc_closure stack frame is accessed as members of expr.
  • Slide 48
  • copyright 2006 David Abrahams, Eric Niebler 9/10/2015 48 Closures and Phoenix Phoenix: the next version of Lambda struct calculator : spirit::grammar { template struct definition { spirit::rule group, fact, term, expr; spirit::rule top; definition(calculator const & self) { using namespace phoenix; top = expr[ self.val = arg1 ]; //... expr = term[ expr.val = arg1 ] >> *(('+' >> term[ expr.val += arg1 ]) | ('-' >> term[ expr.val -= arg1 ])); } spirit::rule const & start() const { return top; } }; struct calculator : spirit::grammar { template struct definition { spirit::rule group, fact, term, expr; spirit::rule top; definition(calculator const & self) { using namespace phoenix; top = expr[ self.val = arg1 ]; //... expr = term[ expr.val = arg1 ] >> *(('+