Regular Grammars: Non-terminals (arbitrary names), Terminals (characters)


Upload: kristen-dotson

Post on 30-Dec-2015


8 January 2004 Department of Software & Media Technology 1

Scanning, or Lexical Analysis

Regular Grammars
– Non-terminals (arbitrary names)
– Terminals (characters)
– Productions limited to the following:
  • Non-terminal ::= terminal
  • Non-terminal ::= terminal Non-terminal
  • Treat character class (e.g. digit) as terminal
– Regular grammars cannot count: cannot express size limits on identifiers, literals
– Cannot express proper nesting (parentheses)

Regular Grammars

Grammar for real literals with no exponent:
• digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• REALVAL ::= digit REALVAL1
• REALVAL1 ::= digit REALVAL1 (arbitrary size)
• REALVAL1 ::= . INTEGERVAL
• INTEGERVAL ::= digit INTEGERVAL (arbitrary size)
• INTEGERVAL ::= digit
– Start symbol is ?
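The productions above can be read directly as a recognizer, one function per non-terminal. The following is a minimal sketch in Python, assuming REALVAL is the start symbol (the slide leaves that as a question); the function names are illustrative and not part of the original slides.

```python
DIGITS = set("0123456789")  # the character class "digit", treated as one terminal

def integerval(s, i):
    # INTEGERVAL ::= digit INTEGERVAL | digit
    if i >= len(s) or s[i] not in DIGITS:
        return False
    return i + 1 == len(s) or integerval(s, i + 1)

def realval1(s, i):
    # REALVAL1 ::= digit REALVAL1 | . INTEGERVAL
    if i >= len(s):
        return False
    if s[i] in DIGITS:
        return realval1(s, i + 1)
    if s[i] == ".":
        return integerval(s, i + 1)
    return False

def realval(s):
    # REALVAL ::= digit REALVAL1 (assumed start symbol)
    return len(s) > 0 and s[0] in DIGITS and realval1(s, 1)
```

With this reading, `realval("3.14")` is accepted, while `"3."`, `".5"` and `"12"` are rejected because each production consumes exactly one terminal and the grammar requires at least one digit on both sides of the point.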

Regular Expressions

RE are defined by an alphabet (terminal symbols) and three operations:
– Alternation RE1 | RE2
– Concatenation RE1 RE2
– Repetition RE* (zero or more RE’s)

Languages described by RE’s = languages generated by regular grammars
– Regular expressions are more convenient for some applications
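As an illustration of the three operations, the sketch below writes real literals with no exponent using only alternation, concatenation and repetition, then checks strings with Python's standard `re` module. The names `DIGIT`, `REAL` and `is_real` are assumptions made for this example.

```python
import re

# Alternation: a digit is 0 or 1 or ... or 9.
DIGIT = "(0|1|2|3|4|5|6|7|8|9)"

# Concatenation and repetition: digit digit* . digit digit*
REAL = DIGIT + DIGIT + "*" + r"\." + DIGIT + DIGIT + "*"

REAL_RE = re.compile(REAL)

def is_real(text):
    # fullmatch requires the whole string to belong to the language
    return REAL_RE.fullmatch(text) is not None
```

In practice the same language is usually written with a character class, for example `[0-9]+\.[0-9]+`, which is shorthand built from the same three operations.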

Finite State Machines or Finite Automata (FSM or FA)

A language defined by a grammar is a (possibly infinite) set of strings

An automaton is a computation that determines whether a given string belongs to a specified language

A finite state machine (FSM) is an automaton that recognizes regular languages (the languages described by regular expressions)

Simplest automaton: memory is a single number (state)

Specifying a Finite State Machine (FA)

A set of labeled states, directed arcs between states labeled with a character
One or more states may be terminal (accepting)
Start is a distinguished state
Automaton makes transition from state S1 to S2
– If and only if arc from S1 to S2 is labeled with next character in input
Token is legal if automaton stops on terminal state
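The specification above maps naturally onto a transition table. The sketch below assumes a small automaton for identifiers (a letter followed by letters, digits or underscores); the state names `S` and `identifier` and the wiring of the arcs are assumptions chosen for illustration.

```python
import string

def char_class(ch):
    # Treat a character class as a single terminal.
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"

# Arcs: (state, label) -> next state.
ARCS = {
    ("S", "letter"): "identifier",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("identifier", "underscore"): "identifier",
}
START = "S"
ACCEPTING = {"identifier"}

def accepts(text):
    state = START
    for ch in text:
        key = (state, char_class(ch))
        if key not in ARCS:       # no arc labeled with the next character
            return False
        state = ARCS[key]
    return state in ACCEPTING     # legal only if we stop on a terminal state
```

The only memory the automaton keeps between characters is the value of `state`.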

FA from Grammar

One state for each non-terminal
A rule of the form
– Nt1 ::= terminal
– Generates transition from the state for Nt1 to a final state
A rule of the form
– Nt1 ::= terminal Nt2
– Generates transition from state 1 to state 2 on an arc labeled by the terminal
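The two rule forms can be turned into arcs mechanically. The sketch below applies them to the grammar for real literals with no exponent, assuming REALVAL as the start state and an extra state named `FINAL` (a hypothetical name). Because INTEGERVAL has two rules on `digit`, the resulting machine is non-deterministic, so the check tracks a set of possible states.

```python
FINAL = "FINAL"  # hypothetical name for the single added final state

# (left-hand non-terminal, terminal, right-hand non-terminal or None)
RULES = [
    ("REALVAL", "digit", "REALVAL1"),
    ("REALVAL1", "digit", "REALVAL1"),
    ("REALVAL1", ".", "INTEGERVAL"),
    ("INTEGERVAL", "digit", "INTEGERVAL"),
    ("INTEGERVAL", "digit", None),
]

def build_fa(rules):
    # One state per non-terminal; each rule contributes one labeled arc.
    arcs = {}
    for lhs, terminal, rhs in rules:
        target = FINAL if rhs is None else rhs
        arcs.setdefault((lhs, terminal), set()).add(target)
    return arcs

def label(ch):
    return "digit" if ch in "0123456789" else ch

def accepts(arcs, start, text):
    current = {start}
    for ch in text:
        # Follow every arc labeled with the next character.
        current = set().union(*(arcs.get((s, label(ch)), set()) for s in current))
    return FINAL in current
```

For example, `accepts(build_fa(RULES), "REALVAL", "3.14")` holds, while `"3."` is rejected because no arc reaches `FINAL` without a digit after the point.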

Graphic representation of FA

[Figure: state diagram of an FA for identifiers, with start state S and state identifier; arcs labeled letter, digit, and underscore.]

FA from RE

Each RE corresponds to a grammar
For all REs
– A natural translation to FSM exists
– Alternation often leads to non-deterministic machines

Deterministic Finite Automata (DFA)

For all states S
– For all characters C
  • There is at most one arc from any state S that is labeled with C
Easier to implement
No backtracking

Conventions for DFA:
Error transitions are not explicitly shown
Input symbols that result in the same transition are grouped together (this set can even be given a name)
Still not displayed: stopping conditions and actions

Non-Deterministic Finite Automata (NFA)

A non-deterministic FA
– Has at least one state
  • With two arcs to two distinct states
  • Labeled with the same character
– Example: from start state, a digit can begin an integer literal or a real literal
– Implementation requires backtracking
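The integer-or-real example can be sketched as a small backtracking simulation. The state names and the `classify` helper below are assumptions made for illustration: from `start`, a digit leads to two distinct states, and the simulation tries one choice, then falls back to the other if the first fails.

```python
# (state, label) -> list of possible next states
NFA = {
    ("start", "digit"): ["in_int", "in_real"],   # two arcs, same label
    ("in_int", "digit"): ["in_int"],
    ("in_real", "digit"): ["in_real"],
    ("in_real", "."): ["after_dot"],
    ("after_dot", "digit"): ["in_frac"],
    ("in_frac", "digit"): ["in_frac"],
}
ACCEPTING = {"in_int": "INT", "in_frac": "REAL"}

def label(ch):
    return "digit" if ch in "0123456789" else ch

def classify(text, state="start", pos=0):
    """Return "INT", "REAL", or None if the text is not a legal token."""
    if pos == len(text):
        return ACCEPTING.get(state)
    for nxt in NFA.get((state, label(text[pos])), []):
        kind = classify(text, nxt, pos + 1)
        if kind is not None:
            return kind
        # This choice failed: backtrack and try the next arc.
    return None
```

On `"4.2"` the simulation first guesses `in_int`, gets stuck at the point, backtracks to `start`, and succeeds through `in_real`.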

Lookahead & Backtracking in NFA

[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id".]

Implementation of FA

[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id".]

From RE to DFA & RE to NFA

[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id".]

NFA to DFA

There is an algorithm for converting a non-deterministic machine to a deterministic one
Result may have exponentially more states
– Intuitively: need new states to express uncertainty about token: int or real
Other algorithms for minimizing number of states of FSM, for showing equivalence, etc.
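One standard conversion algorithm is the subset construction, in which each DFA state is a set of NFA states. The sketch below assumes an NFA without empty (epsilon) arcs, stored as a dictionary; the example NFA for integer and real literals and all names in it are illustrative.

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepting):
    """nfa maps (state, label) -> set of next states; no epsilon arcs."""
    labels = {lab for (_, lab) in nfa}
    start_set = frozenset([start])
    dfa, seen, work = {}, {start_set}, deque([start_set])
    while work:
        current = work.popleft()
        for lab in labels:
            target = frozenset(t for s in current for t in nfa.get((s, lab), ()))
            if not target:
                continue  # error transitions are left implicit
            dfa[(current, lab)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    # A DFA state accepts if it contains any accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, start_set, dfa_accepting

NFA = {
    ("start", "digit"): {"in_int", "in_real"},
    ("in_int", "digit"): {"in_int"},
    ("in_real", "digit"): {"in_real"},
    ("in_real", "."): {"after_dot"},
    ("after_dot", "digit"): {"in_frac"},
    ("in_frac", "digit"): {"in_frac"},
}
DFA, DFA_START, DFA_ACCEPTING = nfa_to_dfa(NFA, "start", {"in_int", "in_frac"})
```

Here the DFA state `{in_int, in_real}` is the "new state to express uncertainty": after reading only digits, the machine does not yet know whether the token is an int or a real. This small example yields four DFA states; in the worst case the number of subsets grows exponentially.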

Example DFA

[Figure: DFA with states start, accept, and error over the input symbols a, b, and c; arcs labeled a, b, c, and "a or b or c".]

Another view of the same DFA

[Figure: the same DFA with states start, accept, and error; arcs grouped under the labels a, b|c, and a|b|c.]

Yet another view of the same DFA

[Figure: the same DFA showing only states start and accept; arcs labeled a and b|c.]

State Minimization in DFA

[Figure: DFA with states start and accept; arcs labeled a and b|c.]

TINY DFA:

[Figure: DFA for the TINY scanner. States: START, INNUM, INID, INASSIGN, INCOMMENT, DONE. Arc labels: white space, digit, letter, :, =, {, }, other, and [other].]
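A DFA with these states can be coded as a loop over a state variable. The figure's arcs did not survive in this transcript, so the wiring in the sketch below is an assumption based on the usual design of such a scanner: bracketed labels such as [other] are treated as lookahead that is not consumed, comments run from { to }, and the token kind names (`ID`, `NUM`, `ASSIGN`, `SYMBOL`, `ERROR`) are hypothetical.

```python
def tiny_tokens(source):
    """Yield (kind, lexeme) pairs; arc wiring and kind names are assumed."""
    pos, n = 0, len(source)
    while True:
        state, lexeme, kind = "START", "", None
        while state != "DONE":
            ch = source[pos] if pos < n else ""   # "" marks end of input
            consume, keep = True, True
            if state == "START":
                if ch == "":
                    return
                if ch.isspace():                  # white space: stay in START
                    keep = False
                elif ch == "{":
                    keep, state = False, "INCOMMENT"
                elif ch.isdigit():
                    state = "INNUM"
                elif ch.isalpha():
                    state = "INID"
                elif ch == ":":
                    state = "INASSIGN"
                else:                             # other: single-character token
                    state, kind = "DONE", "SYMBOL"
            elif state == "INCOMMENT":
                keep = False
                if ch == "":
                    return                        # unterminated comment: stop
                if ch == "}":
                    state = "START"
            elif state == "INNUM":
                if not ch.isdigit():              # [other]: do not consume
                    consume, state, kind = False, "DONE", "NUM"
            elif state == "INID":
                if not ch.isalpha():              # [other]: do not consume
                    consume, state, kind = False, "DONE", "ID"
            elif state == "INASSIGN":
                if ch == "=":
                    state, kind = "DONE", "ASSIGN"
                else:                             # [other]: lone ":" is an error
                    consume, state, kind = False, "DONE", "ERROR"
            if consume and ch != "":
                pos += 1
                if keep:
                    lexeme += ch
        yield kind, lexeme
```

Leaving the lookahead character unconsumed is what lets the next token start at the right position without backtracking further than one character.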

Lex for Scanner

– Lex Conventions for RE
– Format of a Lex Input File