lex(1) and flex(1)

Upload: terence-robinson

Post on 17-Dec-2015

Lex public interface

• FILE *yyin;      /* set before calling yylex() */
• int yylex();     /* call once per token */
• char yytext[];   /* chars matched by yylex() */
• int yywrap();    /* end-of-file handler */
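To make the calling protocol concrete, here is a minimal hand-written stand-in for the four names in this interface. It is a sketch, not what lex generates: the whitespace-separated "word" tokens, the WORD code, the YYTEXT_MAX buffer size, and the count_tokens() driver are all assumptions made for illustration. Like a generated scanner, it returns 0 at end of input.

```c
#include <ctype.h>
#include <stdio.h>

#define WORD 257          /* hypothetical token category */
#define YYTEXT_MAX 256    /* assumed buffer size for this sketch */

FILE *yyin;               /* set before calling yylex() */
char yytext[YYTEXT_MAX];  /* chars matched by the last yylex() call */

/* End-of-file handler: returning 1 means "no more input". */
int yywrap(void)
{
    return 1;
}

/* Stand-in scanner: returns WORD for each word, 0 at end of input. */
int yylex(void)
{
    int c;
    size_t n = 0;

    for (;;) {
        /* Skip whitespace; at EOF ask yywrap() whether to stop. */
        while ((c = getc(yyin)) != EOF && isspace(c))
            ;
        if (c != EOF)
            break;
        if (yywrap())
            return 0;
    }
    /* Collect the word, silently truncating if it overflows yytext. */
    do {
        if (n < YYTEXT_MAX - 1)
            yytext[n++] = (char)c;
    } while ((c = getc(yyin)) != EOF && !isspace(c));
    yytext[n] = '\0';
    return WORD;
}

/* Typical driver loop: call yylex() once per token until it returns 0. */
int count_tokens(FILE *in)
{
    int n = 0;

    yyin = in;
    while (yylex() != 0)
        n++;
    return n;
}
```

A caller would open a file, pass it to count_tokens(), and could inspect yytext after each yylex() call to see the text of the token just returned.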

.l file format

header
%%
body
%%
helper functions

Lex header

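• C code inside %{ … %}
  – prototypes for helper functions
  – #includes that #define integer token categories
• Macro definitions, e.g.
    letter  [a-zA-Z]
    digit   [0-9]
    ident   {letter}({letter}|{digit})*
• Warning: macros are fraught with peril (classic lex pastes the definition in textually, without parentheses, so an operator written after {name} may apply to only the last part of it; flex parenthesizes the expansion)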

Lex body

• Regular expressions with semantic actions
    " "      { /* discard */ }
    {ident}  { return IDENT; }
    "*"      { return ASTERISK; }
    "."      { return PERIOD; }
• Match the longest r.e. possible
• Break ties with whichever appears first
• If it fails to match: copy unmatched to stdout
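The two disambiguation rules (longest match, then earliest rule) can be sketched in plain C. This is not how lex works internally (it builds a single automaton); it only illustrates the selection policy. The IF keyword rule and all token codes are hypothetical additions, included so that a tie actually occurs.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; IF is not on the slide. */
enum { NO_MATCH = 0, IF = 258, IDENT, ASTERISK, PERIOD };

/* Length of the identifier at s, or 0: {letter}({letter}|{digit})* */
static size_t match_ident(const char *s)
{
    size_t n = 0;

    if (!isalpha((unsigned char)s[0]))
        return 0;
    while (isalnum((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of lit if s starts with it, else 0. */
static size_t match_literal(const char *s, const char *lit)
{
    size_t n = strlen(lit);

    return strncmp(s, lit, n) == 0 ? n : 0;
}

/* Try every rule in order; keep the longest match, and on equal
 * lengths keep the earlier rule. Stores the match length in *len. */
int pick_rule(const char *s, size_t *len)
{
    const int token[] = { IF, IDENT, ASTERISK, PERIOD };
    const size_t length[] = {
        match_literal(s, "if"),
        match_ident(s),
        match_literal(s, "*"),
        match_literal(s, "."),
    };
    int best = NO_MATCH;

    *len = 0;
    for (size_t i = 0; i < sizeof token / sizeof token[0]; i++) {
        if (length[i] > *len) {   /* strictly longer, so ties keep the earlier rule */
            *len = length[i];
            best = token[i];
        }
    }
    return best;
}
```

With these rules, the input "if" is a two-character tie that the earlier IF rule wins, while "iffy" goes to IDENT because the identifier match is longer.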

Lex helper functions

• Follows rules of ordinary C code
• Compute lexical attributes
• Do stuff the regular expressions can't do
• Write a yywrap() to switch files on EOF
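As one possible shape for the last bullet, here is a sketch of a yywrap() that advances through a list of file names. The next_file queue and set_input_files() are hypothetical helpers, not part of lex. In a real .l file yyin is already defined by the generated scanner, so the definition here would become an extern declaration.

```c
#include <stdio.h>

FILE *yyin;   /* assumption: defined here only so the sketch stands alone */

/* Hypothetical queue of input files still to be scanned, NULL-terminated. */
static const char **next_file;

void set_input_files(const char **names)
{
    next_file = names;
}

/* Called by yylex() at end of file.
 * Return 0 after pointing yyin at more input, 1 when none is left. */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (next_file != NULL && *next_file != NULL) {
        const char *name = *next_file++;

        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;     /* keep scanning from the new file */
        perror(name);     /* skip files that cannot be opened */
    }
    return 1;             /* no more input */
}
```

A main() would typically call set_input_files() with the remaining command-line arguments, call yywrap() once to open the first file, and then enter its yylex() loop.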

Lex regular expressions

• \c      escapes for most operators
• "s"     match C string as-is (superescape)
• r{m,n}  match r between m and n times
• r/s     match r when s follows
• ^r      match r when at beginning of line
• r$      match r when at end of line
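Several of these operators (\c, r{m,n}, ^r, r$) also exist in POSIX extended regular expressions, so they can be tried out from C with regcomp(3) without running lex. This is only an approximation: the quoted-string form "s" and the trailing-context form r/s are lex-specific and are not covered. The helper name ere_matches() is made up for the sketch.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if the POSIX extended regex `pattern` matches somewhere in
 * `text`, 0 if it does not, and -1 if the pattern fails to compile. */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NEWLINE makes ^ and $ match at line boundaries, not just at
     * the ends of the whole string. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | REG_NEWLINE) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, ere_matches("^[0-9]{2,3}$", "123") succeeds, while the same pattern rejects "1" and "1234" because of the {2,3} bound combined with the two anchors.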

struct token

struct token {
    int category;
    char *text;
    int linenumber;
    int column;
    char *filename;
    union literal value;
};
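A helper function might fill in this structure each time yylex() recognizes a token. The sketch below is one way to do it. The members of union literal are not shown on the slide, so the ones here are invented, as are the names make_token(), free_token(), and copy_string(). The lexeme is copied because yytext is overwritten by the next match.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical members; the slide does not define union literal. */
union literal {
    long ival;
    double dval;
    char *sval;
};

struct token {
    int category;
    char *text;
    int linenumber;
    int column;
    char *filename;
    union literal value;
};

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Build a heap-allocated token; returns NULL if any allocation fails. */
struct token *make_token(int category, const char *lexeme,
                         int line, int column, const char *filename)
{
    struct token *t = calloc(1, sizeof *t);

    if (t == NULL)
        return NULL;
    t->category = category;
    t->linenumber = line;
    t->column = column;
    t->text = copy_string(lexeme);
    t->filename = copy_string(filename);
    if (t->text == NULL || t->filename == NULL) {
        free(t->text);
        free(t->filename);
        free(t);
        return NULL;
    }
    return t;
}

/* Release a token created by make_token(). */
void free_token(struct token *t)
{
    if (t == NULL)
        return;
    free(t->text);
    free(t->filename);
    free(t);
}
```

Inside a rule action this would be called with yytext as the lexeme, with the line and column counters maintained by the scanner's own actions.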

“string removal tool”

%%
"zap me"

whitespace trimmer

%%
[ \t]+     putchar(' ');
[ \t]+$    /* drop entirely */
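For comparison, here is roughly what a whitespace trimmer does, written by hand in C. It assumes the intent is to squeeze each run of blanks and tabs to a single space and to drop runs at the end of a line. The function name trim_whitespace() is invented, and unlike a lex rule anchored with $, this sketch also drops a blank run at the very end of input that has no trailing newline.

```c
#include <stdio.h>

/* Copy `in` to `out`, squeezing each run of blanks/tabs to one space
 * and dropping runs that end a line or end the input. */
void trim_whitespace(FILE *in, FILE *out)
{
    int c;
    int pending = 0;   /* a blank run has been seen but not yet written */

    while ((c = getc(in)) != EOF) {
        if (c == ' ' || c == '\t') {
            pending = 1;
        } else {
            if (pending && c != '\n')
                putc(' ', out);
            pending = 0;
            putc(c, out);
        }
    }
}
```

Calling trim_whitespace(stdin, stdout) gives a filter with the same overall behavior as the two-rule lex program, at the cost of tracking the "pending blank run" state by hand.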

string replacement

%%
username   printf("%s", getlogin());

Line/character counter

%{
#include <stdio.h>
int lines = 0, chars = 0;
%}
%%
\n      { ++lines; ++chars; }
.       { ++chars; }
%%
int main(void)
{
    yylex();
    printf("lines: %d chars: %d\n", lines, chars);
    return 0;
}

Example: C reals

• Is it: [0-9]*.[0-9]*
• Is it: ([0-9]+.[0-9]* | [0-9]*.[0-9]+)
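One way to compare the two candidates is to code each as a whole-string test in C and try edge cases. The sketch assumes the dot is meant as a literal decimal point (in lex an unescaped . matches any character except newline, so it would be written \. or "."). The function names are made up, and exponents and suffixes are not handled by either version.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the leading decimal digits of s. */
static size_t digits(const char *s)
{
    size_t n = 0;

    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* First candidate, dot read as literal: [0-9]*\.[0-9]* */
int is_real_loose(const char *s)
{
    s += digits(s);
    if (*s != '.')
        return 0;
    s++;
    s += digits(s);
    return *s == '\0';
}

/* Second candidate: ([0-9]+\.[0-9]*|[0-9]*\.[0-9]+)
 * At least one digit is required on one side of the point. */
int is_real_strict(const char *s)
{
    size_t before = digits(s);
    size_t after;

    s += before;
    if (*s != '.')
        return 0;
    s++;
    after = digits(s);
    s += after;
    return *s == '\0' && (before > 0 || after > 0);
}
```

The difference shows up on a lone ".": the first candidate accepts it, the second does not, while both accept "3.", ".5", and "3.14".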