LL Parser Generator Assignment


Upload: tgldr0511

Post on 05-Jan-2016


Page 1: LL Parser Generator Assignment

CS440 Programming Languages -Project 1

Page 1 of 4

Scheme: LL Parser Generator

In our unit on syntax analysis we’ve learned how LL(1) PREDICT sets are constructed from FIRST and FOLLOW sets. In the current project you will build, in a purely functional subset of Scheme, a parser generator that implements these constructions.
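For reference, the constructions can be summarized as follows, writing c for a token, S for the start symbol, and α, β for possibly empty strings of grammar symbols. These are the usual textbook definitions, stated (as an assumption about notation, not the handout's own wording) so that ε is never a member of FIRST or FOLLOW; whether a string can derive ε is tracked separately.

```latex
\begin{aligned}
\mathrm{FIRST}(\alpha) &= \{\, c \mid \alpha \Rightarrow^{*} c\,\beta \,\} \\
\mathrm{FOLLOW}(A) &= \{\, c \mid S \Rightarrow^{+} \alpha\, A\, c\, \beta \,\} \\
\mathrm{PREDICT}(A \to \alpha) &=
  \begin{cases}
    \mathrm{FIRST}(\alpha) \cup \mathrm{FOLLOW}(A) & \text{if } \alpha \Rightarrow^{*} \varepsilon \\
    \mathrm{FIRST}(\alpha) & \text{otherwise}
  \end{cases}
\end{aligned}
```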

To get you started, I’m providing a 300-line skeleton file. You will want to study the code in this file carefully.

The key function you are to implement is the following:

(define parse-table
  (lambda (grammar)
    ;;; your code here; my version is about 15 lines long,
    ;;; (but it calls other functions described below)
    ))

The input grammar must consist of a list of lists, one per non-terminal in the grammar. The first element of each sub-list should be the non-terminal; the remaining elements should be the right-hand sides of the productions for which that non-terminal is the left-hand side. An ε-production is written as an empty right-hand side, (). The sub-list for the start symbol must come first. Every grammar symbol must be represented as a quoted string. As an example, here is our familiar LL(1) calculator grammar in the required format:

(define calc-gram
  '(("P" ("SL" "$$"))
    ("SL" ("S" "SL") ())
    ("S" ("id" ":=" "E") ("read" "id") ("write" "E"))
    ("E" ("T" "TT"))
    ("T" ("F" "FT"))
    ("TT" ("ao" "T" "TT") ())
    ("FT" ("mo" "F" "FT") ())
    ("ao" ("+") ("-"))
    ("mo" ("*") ("/"))
    ("F" ("id") ("num") ("(" "E" ")"))))
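To make the representation concrete, here is a minimal sketch of a few accessor functions over a grammar in this format. The names start-symbol, non-terminals, productions, and terminal? are hypothetical; the skeleton file may provide its own helpers under different names.

```scheme
;; Hypothetical accessors for a grammar in the list-of-lists format.
;; These names are illustrative; the skeleton may use different ones.

;; The start symbol heads the first sub-list.
(define start-symbol
  (lambda (grammar) (car (car grammar))))

;; Each non-terminal heads exactly one sub-list.
(define non-terminals
  (lambda (grammar) (map car grammar)))

;; Right-hand sides of all productions for non-terminal A;
;; an epsilon production appears as the empty list ().
(define productions
  (lambda (A grammar) (cdr (assoc A grammar))))

;; Any symbol that does not head a sub-list is a terminal (token).
(define terminal?
  (lambda (x grammar) (not (assoc x grammar))))
```

With calc-gram, (productions "TT" calc-gram) returns (("ao" "T" "TT") ()), the second element being the ε-production. Note that "ao" and "mo" are non-terminals even though each of their productions is a single token.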

The parse table, as returned by function parse-table, must have the same format, except that every right-hand side is replaced by a pair (a 2-element list) whose first element is the predict set for the corresponding production, and whose second element is the right-hand side. If you type

(parse-table calc-gram)

the Scheme interpreter should respond

(("P" (("$$" "id" "read" "write") ("SL" "$$")))
 ("SL" (("id" "read" "write") ("S" "SL"))
       (("$$") ()))
 ("S" (("id") ("id" ":=" "E"))
      (("read") ("read" "id"))
      (("write") ("write" "E")))
 ("E" (("(" "id" "num") ("T" "TT")))
 ("T" (("(" "id" "num") ("F" "FT")))
 ("TT" (("+" "-") ("ao" "T" "TT"))
       (("$$" ")" "id" "read" "write") ()))
 ("FT" (("*" "/") ("mo" "F" "FT"))
       (("$$" ")" "+" "-" "id" "read" "write") ()))
 ("ao" (("+") ("+"))
       (("-") ("-")))
 ("mo" (("*") ("*"))
       (("/") ("/")))
 ("F" (("id") ("id"))
      (("num") ("num"))
      (("(") ("(" "E" ")"))))
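The table is the grammar with each right-hand side wrapped into a (predict-set right-hand-side) pair. The sketch below shows only that reshaping step, assuming a caller-supplied procedure predict that takes a non-terminal and one of its right-hand sides and returns that production's predict set. Both build-table and predict are hypothetical names; computing real predict sets is the substance of the assignment.

```scheme
;; Hypothetical sketch of the reshaping only: `predict` is any procedure
;; taking a non-terminal and one of its right-hand sides and returning
;; that production's predict set.
(define build-table
  (lambda (grammar predict)
    (map (lambda (entry)              ; entry = (A rhs1 rhs2 ...)
           (cons (car entry)
                 (map (lambda (rhs) (list (predict (car entry) rhs) rhs))
                      (cdr entry))))
         grammar)))
```

Passing (lambda (A rhs) rhs) as predict turns the ao entry of calc-gram into ("ao" (("+") ("+")) (("-") ("-"))), which happens to match the real table because each of those productions predicts exactly its own token.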

A parse function is provided that accepts a grammar and an input string as arguments. It calls the parse-table function and then uses the resulting table to parse the input, printing a trace of its actions as it does so, in a manner reminiscent of the -Dparse output from the PL/0 compiler. You can use this function to test your code.

A possible implementation strategy

There are many ways to implement parse-table. Feel free to choose whatever strategy appeals to you. If you’re not sure where to start, there is a skeleton of a few routines that may provide some guidance. These don’t necessarily embody the best strategy.

This code uses two main data structures: a “right context” structure and a “knowledge” structure. A right-context function is provided to generate the former for any given symbol B. The function returns a list of pairs. Each pair consists of a symbol A and a list of symbols β such that, for some α, A → α B β. As an example, if you type

(right-context "SL" calc-gram)

the Scheme interpreter should respond

(("P" ("$$")) ("SL" ()))

This tells us that SL appears on the right-hand side of two productions in the grammar: one with P on the left-hand side and one with SL on the left-hand side. In the former, the portion of the right-hand side after the SL is $$. In the latter, the portion of the right-hand side after the SL is empty (that is, SL is the last thing on the right-hand side). In a similar vein, if you type

(right-context "mo" calc-gram)

the Scheme interpreter should respond

(("FT" ("F" "FT")))

This tells us there is only one production with a mo on the right-hand side. It has FT on the left-hand side, and F FT after the mo on the right-hand side.

The right-context information is useful for constructing FOLLOW sets.
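The skeleton already provides right-context, so the following is only a hypothetical re-implementation that illustrates the definition: for every production A → α B β it emits the pair (A β), once per occurrence of B. It walks the productions in grammar order, which is consistent with the two example responses above.

```scheme
;; Hypothetical re-implementation of right-context, for illustration only.
;; Returns ((A beta) ...) for every production A -> alpha B beta.
(define right-context
  (lambda (B grammar)
    ;; each suffix of rhs that follows an occurrence of B
    (define after
      (lambda (rhs)
        (cond ((null? rhs) '())
              ((equal? (car rhs) B) (cons (cdr rhs) (after (cdr rhs))))
              (else (after (cdr rhs))))))
    (apply append
           (map (lambda (e)                  ; e = (A rhs1 rhs2 ...)
                  (apply append
                         (map (lambda (rhs)
                                (map (lambda (beta) (list (car e) beta))
                                     (after rhs)))
                              (cdr e))))
                grammar))))
```

If B appears twice in one right-hand side, as in a hypothetical production X → B c B, this version returns two pairs for it: ("X" ("c" "B")) and ("X" ()).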

Assuming you use the suggested strategy, you will need to compute the “knowledge” structure recursively. This structure consists of a list of 4-element sub-lists, one per non-terminal. Each sub-list contains (1) the non-terminal itself (call it A), (2) a Boolean indicating whether we currently think that A →* ε, (3) our current estimate of FIRST(A) − {ε}, and (4) our current estimate of FOLLOW(A) − {ε}. It is much easier to keep track of ε separately, rather than to include it in the FIRST and FOLLOW sets.
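A minimal sketch of the structure's layout and of lookups into it follows. The handout says initial-knowledge is provided, and its follow routine calls symbol-knowledge; the versions here are hypothetical re-implementations that may differ from the skeleton's, and knows-epsilon?, first-estimate, and follow-estimate are illustrative names.

```scheme
;; Knowledge entry for non-terminal A:  (A  eps?  FIRST(A)-{eps}  FOLLOW(A)-{eps})

;; Hypothetical version of the provided initial-knowledge: nothing known yet.
(define initial-knowledge
  (lambda (grammar)
    (map (lambda (entry) (list (car entry) #f '() '()))
         grammar)))

;; Hypothetical symbol-knowledge: the entry for A, or #f for a token.
(define symbol-knowledge
  (lambda (A knowledge) (assoc A knowledge)))

;; Illustrative accessors for the three remaining fields.
(define knows-epsilon?
  (lambda (A knowledge) (cadr (symbol-knowledge A knowledge))))
(define first-estimate
  (lambda (A knowledge) (caddr (symbol-knowledge A knowledge))))
(define follow-estimate
  (lambda (A knowledge) (cadddr (symbol-knowledge A knowledge))))
```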

The function to generate the knowledge structure is

(define get-knowledge
  (lambda (grammar)
    ;;; your code here; my version is a little under 30 lines
    ))

If you type

(get-knowledge calc-gram)

the interpreter should respond

(("P" #f ("$$" "id" "read" "write") ())
 ("SL" #t ("id" "read" "write") ("$$"))
 ("S" #f ("id" "read" "write") ("$$" "id" "read" "write"))
 ("E" #f ("(" "id" "num") ("$$" ")" "id" "read" "write"))
 ("T" #f ("(" "id" "num") ("$$" ")" "+" "-" "id" "read" "write"))
 ("TT" #t ("+" "-") ("$$" ")" "id" "read" "write"))
 ("FT" #t ("*" "/") ("$$" ")" "+" "-" "id" "read" "write"))
 ("ao" #f ("+" "-") ("(" "id" "num"))
 ("mo" #f ("*" "/") ("(" "id" "num"))
 ("F" #f ("(" "id" "num") ("$$" ")" "*" "+" "-" "/" "id" "read" "write")))

This tells us, for example, that FT generates epsilon, but F does not, and that FOLLOW(mo) = {(, id, num}.

As the base of its recursion, get-knowledge uses an initial, empty structure generated by function initial-knowledge, which is provided. At each step of the recursion the function makes use of utility routines that extract information from the current structure:

(define generates-epsilon?
  (lambda (w knowledge grammar)
    ;;; your code here; my version is 7 lines long
    ))
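A minimal sketch of generates-epsilon?, assuming w is a list of grammar symbols and knowledge uses the 4-element entries described above; non-terminal? is a hypothetical helper. It illustrates the logic only and is not the author's 7-line version.

```scheme
;; Minimal sketch; non-terminal? is a hypothetical helper.
;; knowledge entries have the form (A eps? FIRST FOLLOW).
(define non-terminal?
  (lambda (x grammar) (if (assoc x grammar) #t #f)))

;; w is a list of grammar symbols; the empty list trivially derives epsilon.
(define generates-epsilon?
  (lambda (w knowledge grammar)
    (cond ((null? w) #t)
          ;; a token can never vanish
          ((not (non-terminal? (car w) grammar)) #f)
          ;; the first symbol can vanish, so the answer depends on the rest
          ((cadr (assoc (car w) knowledge))
           (generates-epsilon? (cdr w) knowledge grammar))
          ;; the first symbol is a non-terminal not (yet) known to vanish
          (else #f))))
```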

(define first
  (lambda (w knowledge grammar)
    ;;; your code here; my version is 10 lines long
    ))
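A minimal sketch of first under the same assumptions (w is a list of symbols; knowledge holds the current estimates). It returns FIRST(w) − {ε} only: whether w itself can vanish is left to generates-epsilon?. The helpers non-terminal? and union are hypothetical, and the result is not sorted.

```scheme
;; Minimal sketch; non-terminal? and union are hypothetical helpers.
(define non-terminal?
  (lambda (x grammar) (if (assoc x grammar) #t #f)))

(define union                       ; set union on lists, no duplicates
  (lambda (a b)
    (cond ((null? a) b)
          ((member (car a) b) (union (cdr a) b))
          (else (cons (car a) (union (cdr a) b))))))

;; FIRST(w) - {epsilon} for a list of symbols w, under the current knowledge.
(define first
  (lambda (w knowledge grammar)
    (cond ((null? w) '())
          ;; a token starts only itself
          ((not (non-terminal? (car w) grammar)) (list (car w)))
          (else
           (let ((entry (assoc (car w) knowledge)))   ; (A eps? FIRST FOLLOW)
             (if (cadr entry)
                 ;; A may vanish, so tokens that start the rest can start w too
                 (union (caddr entry) (first (cdr w) knowledge grammar))
                 (caddr entry)))))))
```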

(define follow
  (lambda (A knowledge)
    (cadddr (symbol-knowledge A knowledge))))
; This is simpler than the other two functions, because it only needs
; to work for individual non-terminals, not for lists of symbols.
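One possible way to realize the recursion in get-knowledge is a fixed-point iteration: start from initial-knowledge and recompute every entry from the previous estimate until nothing changes. Each step can only add ε flags and set members, and there are finitely many of each, so the loop terminates. The sketch is self-contained, so it repeats minimal versions of the routines above; union, union-all, sort-strings, non-terminal?, first-then-follow, and refine are hypothetical helpers, and initial-knowledge and right-context are re-implemented rather than taken from the skeleton. Sets are kept sorted with string<? so that equal? can detect convergence, and the code stays within the functional subset (a named let, no set!).

```scheme
;; Hypothetical sketch: get-knowledge as a fixed-point iteration.
(define union                       ; set union on lists of strings
  (lambda (a b)
    (cond ((null? a) b)
          ((member (car a) b) (union (cdr a) b))
          (else (cons (car a) (union (cdr a) b))))))
(define union-all                   ; union of a list of sets
  (lambda (sets)
    (if (null? sets) '() (union (car sets) (union-all (cdr sets))))))
(define sort-strings                ; insertion sort by string<?
  (lambda (lst)
    (define insert
      (lambda (x l)
        (cond ((null? l) (list x))
              ((string<? x (car l)) (cons x l))
              (else (cons (car l) (insert x (cdr l)))))))
    (if (null? lst) '() (insert (car lst) (sort-strings (cdr lst))))))

(define non-terminal?
  (lambda (x grammar) (if (assoc x grammar) #t #f)))
(define initial-knowledge           ; every entry starts as (A #f () ())
  (lambda (grammar) (map (lambda (e) (list (car e) #f '() '())) grammar)))
(define right-context               ; ((A beta) ...) for every A -> alpha B beta
  (lambda (B grammar)
    (define after
      (lambda (rhs)
        (cond ((null? rhs) '())
              ((equal? (car rhs) B) (cons (cdr rhs) (after (cdr rhs))))
              (else (after (cdr rhs))))))
    (apply append
           (map (lambda (e)
                  (apply append
                         (map (lambda (rhs)
                                (map (lambda (beta) (list (car e) beta)) (after rhs)))
                              (cdr e))))
                grammar))))

(define generates-epsilon?          ; can the symbol list w derive epsilon?
  (lambda (w knowledge grammar)
    (cond ((null? w) #t)
          ((not (non-terminal? (car w) grammar)) #f)
          ((cadr (assoc (car w) knowledge)) (generates-epsilon? (cdr w) knowledge grammar))
          (else #f))))
(define first                       ; FIRST(w) - {epsilon} for a symbol list w
  (lambda (w knowledge grammar)
    (cond ((null? w) '())
          ((not (non-terminal? (car w) grammar)) (list (car w)))
          ((cadr (assoc (car w) knowledge))
           (union (caddr (assoc (car w) knowledge)) (first (cdr w) knowledge grammar)))
          (else (caddr (assoc (car w) knowledge))))))
(define follow
  (lambda (A knowledge) (cadddr (assoc A knowledge))))

;; FIRST(beta), plus FOLLOW(A) if beta can vanish. With A -> alpha X beta this
;; is X's contribution to FOLLOW(X); with beta a whole right-hand side of A it
;; is PREDICT(A -> beta).
(define first-then-follow
  (lambda (beta A knowledge grammar)
    (sort-strings
     (if (generates-epsilon? beta knowledge grammar)
         (union (first beta knowledge grammar) (follow A knowledge))
         (first beta knowledge grammar)))))

(define refine                      ; recompute every entry from the previous estimate
  (lambda (knowledge grammar)
    (map (lambda (e)                ; e = (X rhs1 rhs2 ...)
           (list (car e)
                 (if (memq #t (map (lambda (r) (generates-epsilon? r knowledge grammar))
                                   (cdr e)))
                     #t
                     #f)
                 (sort-strings
                  (union-all (map (lambda (r) (first r knowledge grammar)) (cdr e))))
                 (sort-strings
                  (union-all
                   (map (lambda (ctx) ; ctx = (A beta) with A -> alpha X beta
                          (first-then-follow (cadr ctx) (car ctx) knowledge grammar))
                        (right-context (car e) grammar))))))
         grammar)))

(define get-knowledge
  (lambda (grammar)
    (let loop ((k (initial-knowledge grammar)))
      (let ((next (refine k grammar)))
        (if (equal? next k) k (loop next))))))
```

If the sketch is correct, (get-knowledge calc-gram) reproduces the response shown earlier. With k the computed knowledge, (first-then-follow rhs A k grammar) is PREDICT(A → rhs), which is the remaining ingredient of parse-table.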

If you work in pairs on this assignment, one possible division of labor is for one partner to write generates-epsilon?, first, and parse-table, while the other partner writes get-knowledge. A better strategy, however, may be to start by having one partner write generates-epsilon? while the other writes first. Then sit down together and write get-knowledge and parse-table. This is one of those assignments where two heads may work better than one.

Important: you are required to use only the functional features of Scheme; functions with an exclamation point in their names (e.g. set!) and input/output mechanisms other than load and the regular read-eval-print loop are not allowed. (You may find imperative features useful for debugging. That’s ok, but get them out of your code before you hand anything in.)


Extra Credit suggestions

1. Modify your parse-table function to print a helpful error message if the input grammar is not LL(1).

2. Modify your parse-table function to print warning messages if there are useless symbols in the grammar: symbols that can’t appear in any valid sentential form (i.e. in any derivation of a string of terminals from the start symbol).

3. Modify the parse function so that it builds and then displays the parse tree.

4. (Hard) Implement syntactic error recovery.
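For extra-credit item 1, one common test is that a grammar is LL(1) exactly when, for every non-terminal, the predict sets of its productions are pairwise disjoint. The sketch below checks a table in the parse-table format and returns the conflicts instead of printing them, leaving the wording of the message to the caller. The names pair-conflicts, row-conflicts, and ll1-conflicts are hypothetical.

```scheme
;; Hypothetical sketch: returns a list of conflicts (A token rhs1 rhs2);
;; the empty list means no two productions of any non-terminal share a token.

;; All conflicts between one (predict rhs) pair and a list of later pairs.
(define pair-conflicts
  (lambda (A pair others)
    (apply append
           (map (lambda (other)
                  (apply append
                         (map (lambda (tok)
                                (if (member tok (car other))
                                    (list (list A tok (cadr pair) (cadr other)))
                                    '()))
                              (car pair))))
                others))))

;; Compare every production of one row with every later production.
(define row-conflicts
  (lambda (A pairs)
    (if (null? pairs)
        '()
        (append (pair-conflicts A (car pairs) (cdr pairs))
                (row-conflicts A (cdr pairs))))))

(define ll1-conflicts
  (lambda (table)
    (apply append
           (map (lambda (row) (row-conflicts (car row) (cdr row)))
                table))))
```

Applied to the calculator table shown earlier, this should return (), since every row's predict sets are disjoint.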

Quiz 2

Before the beginning of the next class, complete Quiz 2 on Moodle by answering the following questions:

1. What does the following code do? Explain your answer.

(apply * (map + '(1 2 3) '(4 5 6) '(7 8 9)))

2. The sort routine in the skeleton file implements a simple version of the classic quicksort algorithm. Which element does this version use as a pivot (the value around which to partition the list)?

3. When you open a program in DrScheme/Racket, what color does it use to display quoted character strings?