dr. philip cannata 1 lexical and syntactic analysis chomsky grammar hierarchy lexical analysis –...

36
Dr. Philip Cannata Lexical and Syntactic Analysis • Chomsky Grammar Hierarchy • Lexical Analysis Tokenizing • Syntactic Analysis – Parsing • Hmm Concrete Syntax • Hmm Abstract Syntax Programming Languages Noam Chomsky

Upload: lizbeth-park

Post on 04-Jan-2016

238 views

Category:

Documents


2 download

TRANSCRIPT

Dr. Philip Cannata 1

Lexical and Syntactic Analysis

• Chomsky Grammar Hierarchy

• Lexical Analysis – Tokenizing

• Syntactic Analysis – Parsing

• Hmm Concrete Syntax

• Hmm Abstract Syntax

Programming Languages

Noam Chomsky

Dr. Philip Cannata 2

• Regular grammar – used for tokenizing

• Context-free grammar (BNF) – used for parsing

• Context-sensitive grammar – not really used for programming languages

Chomsky Hierarchy

Dr. Philip Cannata 3

• Simplest; least powerful

• Equivalent to:– Regular expression (think of perl)– Finite-state automaton

• Right regular grammar: Terminal*,

A and B Nonterminal

A → B

A →

• Example:Integer → 0 Integer | 1 Integer | ... | 9 Integer |

0 | 1 | ... | 9

Regular Grammar

Dr. Philip Cannata 4

• Less powerful than context-free grammars

• The following is not a regular language

{ aⁿ bⁿ | n ≥ 1 }

i.e., cannot balance: ( ), { }, begin end

Regular Grammar

Dr. Philip Cannata 5

Regular Expressions

x a character x \x an escaped character, e.g., \n{ name } a reference to a nameM | N M or NM N M followed by NM* zero or more occurrences of MM+ One or more occurrences of MM? Zero or one occurrence of M[aeiou] the set of vowels[0-9] the set of digits. any single character

Dr. Philip Cannata 6

Regular Expressions

Dr. Philip Cannata 7

Regular Expressions

Dr. Philip Cannata 8

(S, a2i$) ├ (I, 2i$)

├ (I, i$)

├ (I, $)

├ (F, )

Thus: (S, a2i$) ├* (F, )

Finite State Automaton for Identifiers

Dr. Philip Cannata 9

Deterministic Finite State Automaton Examples

Dr. Philip Cannata 10

Production:

α → β

α Nonterminal

β (Nonterminal Terminal)*

ie, lefthand side is a single nonterminal, and righthand side is a string of nonterminals and/or terminals (possibly empty).

Context-Free Grammar

Dr. Philip Cannata 11

Production:

α → β |α| ≤ |β|

α, β (Nonterminal Terminal)*

ie, lefthand side can be composed of strings of terminals and nonterminals

Context-Sensitive Grammar

Dr. Philip Cannata 12

• The syntax of a programming language is a precise description of all its grammatically correct programs.

• Precise syntax was first used with Algol 60, and has been

used ever since.

• Three levels:– Lexical syntax - all the basic symbols of the language

(names, values, operators, etc.)– Concrete syntax - rules for writing expressions,

statements and programs.– Abstract syntax - internal representation of the program,

favoring content over form.

Syntax

Dr. Philip Cannata 13

GrammarsGrammars: Metalanguages used to define the concrete syntax of a language.

Backus Normal Form – Backus Naur Form (BNF)• Stylized version of a context-free grammar (cf. Chomsky hierarchy)

• First used to define syntax of Algol 60

• Now used to define syntax of most major languages Production: α → β α Nonterminal β (Nonterminal Terminal)*ie, lefthand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty).

• ExampleInteger Digit | Integer Digit

Digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Dr. Philip Cannata 14

Extended BNF (EBNF)

Additional metacharacters{ } a series of zero or more( ) must pick one from a list[ ] pick none or one from a list

ExampleExpression -> Term { ( + | - ) Term }IfStatement -> if ( Expression ) Statement [ else Statement ]

EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.

Javacc EBNF

( … )* a series of zero or more( … )+ a series of one or more[ … ] optional

Dr. Philip Cannata 15

For more details, see Chapter 2 of“Programming Language Pragmatics, Third Edition (Paperback)”Michael L. Scott (Author)

Dr. Philip Cannata 16

Internal Parse Tree

Abstract Syntax

int main ()

{

return 0 ;

}

Program (abstract syntax): Function = main; Return type = int params = Block: Return: Variable: return#main, LOCAL addr=0 IntValue: 0

Instance of a Programming Language:

Dr. Philip Cannata 17

Now we’ll focus on the internal parse tree

Dr. Philip Cannata 18

Parse Trees

Integer Digit | Integer DigitDigit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse Tree for 352 as an Integer

Dr. Philip Cannata 19

Arithmetic Expression Grammar

Expr Expr + Term | Expr – Term | TermTerm 0 | ... | 9 | ( Expr )

Parse of 5 - 4 + 3

Dr. Philip Cannata 20

• A grammar can be used to define associativity and precedence among the operators in an expression.

E.g., + and - are left-associative operators in mathematics;

* and / have higher precedence than + and - .

• Consider the following grammar:Expr -> Expr + Term | Expr – Term | Term

Term -> Term * Factor | Term / Factor | Term % Factor | Factor

Factor -> Primary ** Factor | Primary

Primary -> 0 | ... | 9 | ( Expr )

Associativity and Precedence

Dr. Philip Cannata 21

Associativity and Precedence

Parse of 4**2**3 + 5 * 6 + 7

Dr. Philip Cannata 22

Precedence Associativity Operators3 right **2 left * / %1 left + -

Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.

Associativity and Precedence

Dr. Philip Cannata 23

• A grammar is ambiguous if one of its strings has two or more diffferent parse trees.

• Example:Expr -> Expr Op Expr | ( Expr ) | IntegerOp -> + | - | * | / | % | **

• Equivalent to previous grammar but ambiguous

Ambiguous Grammars

Dr. Philip Cannata 24

Ambiguous Parse of 5 – 4 + 3

Ambiguous Grammars

Dr. Philip Cannata 25

Dangling Else Ambiguous Grammars

IfStatement -> if ( Expression ) Statement |

if ( Expression ) Statement else Statement

Statement -> Assignment | IfStatement | Block

Block -> { Statements }

Statements -> Statements Statement | Statement

With which ‘if’ does the following ‘else’ associate

if (x < 0)if (y < 0) y = y - 1;else y = 0;

Dr. Philip Cannata 26

Dangling Else Ambiguous Grammars

Dr. Philip Cannata 27

Program : {[ Declaration ]|retType Identifier Function | MyClass | MyObject}

Function : ( ) Block

MyClass: Class Idenitifier { {retType Identifier Function}Constructor {retType Identifier Function } }

MyObject: Identifier Identifier = create Identifier callArgs

Constructor: Identifier ([{ Parameter } ]) block

Declaration : Type Identifier [ [Literal] ]{ , Identifier [ [ Literal ] ] }

Type : int|bool| float | list |tuple| object | string | void

Statements : { Statement }

Statement : ; | Declaration| Block |ForEach| Assignment |IfStatement|WhileStatement|CallStatement|ReturnStatement

Block : { Statements }

ForEach: for( Expression <- Expression ) Block

Assignment : Identifier [ [ Expression ] ]= Expression ;

Parameter : Type Identifier

IfStatement: if ( Expression ) Block [elseifStatement| Block ]

WhileStatement: while ( Expression ) Block

Hmm BNF (i.e., Concrete Syntax)

Dr. Philip Cannata 28

Expression : Conjunction {|| Conjunction }

Conjunction : Equality {&&Equality }

Equality : Relation [EquOp Relation ]

EquOp: == | !=

Relation : Addition [RelOp Addition ]

RelOp: <|<= |>|>=

Addition : Term {AddOp Term }

AddOp: + | -

Term : Factor {MulOp Factor }

MulOp: * | / | %

Factor : [UnaryOp]Primary

UnaryOp: - | !

Primary : callOrLambda|IdentifierOrArrayRef| Literal |subExpressionOrTuple|ListOrListComprehension|

ObjFunction

callOrLambda : Identifier callArgs|LambdaDef

callArgs : ([Expression |passFunc { ,Expression |passFunc}] )

passFunc : Identifier (Type Identifier { Type Identifier } )

LambdaDef : (\\ Identifier { ,Identifier } -> Expression)

Hmm BNF (i.e., Concrete Syntax)

Dr. Philip Cannata 29

Hmm BNF (i.e., Concrete Syntax)

IdentifierOrArrayRef : Identifier [ [Expression] ]

subExpressionOrTuple : ([ Expression [,[ Expression { , Expression } ] ] ] )

ListOrListComprehension: [ Expression {, Expression } ] | | Expression[<- Expression ] {, Expression[<-

Expression ] } ]

ObjFunction: Identifier . Identifier . Identifier callArgs

Identifier : (a |b|…|z| A | B |…| Z){ (a |b|…|z| A | B |…| Z )|(0 | 1 |…| 9)}

Literal : Integer | True | False | ClFloat | ClString

Integer : Digit { Digit }

ClFloat: 0 | 1 |…| 9 {0 | 1 |…| 9}.{0 | 1 |…| 9}

ClString: ” {~[“] }”

Dr. Philip Cannata 30

Clite Operator AssociativityUnary - ! none* / left+ - left< <= > >= none== != none&& left|| left

Associativity and Precedence for Hmm

Dr. Philip Cannata 31

Hmm Parse Tree Example

z = x + 2 * y;

Dr. Philip Cannata 32

Now we’ll focus on the Abstract Syntax

Dr. Philip Cannata 33

Hmm Parse Tree

z = x + 2 * y;

=

Dr. Philip Cannata 34

Very Approximate Hmm Abstract Syntax

Dr. Philip Cannata 35

Assignment = Variable target; Expression source

Expression = VariableRef | Value | Binary | Unary

VariableRef = Variable | ArrayRef

Variable = String id

ArrayRef = String id; Expression index

Value = IntValue | BoolValue | FloatValue | CharValue

Binary = Operator op; Expression term1, term2

Unary = UnaryOp op; Expression term

Operator = ArithmeticOp | RelationalOp | BooleanOp

IntValue = Integer intValue…

Very Approximate Hmm Abstract Syntax

Dr. Philip Cannata 36

Binary

BinaryOperator

Operator

Variable

VariableValue

+

2 y*

x

Hmm Abstract Syntax – Binary Example

z = x + 2 * y

=