8 January 2004 Department of Software & Media Technology 1
Scanning, or Lexical Analysis
Regular Grammars
– Non-terminals (arbitrary names)
– Terminals (characters)
– Productions limited to the following:
  • Non-terminal ::= terminal
  • Non-terminal ::= terminal Non-terminal
  • Treat character class (e.g. digit) as terminal
– Regular grammars cannot count: cannot express size limits on identifiers, literals
– Cannot express proper nesting (parentheses)
Regular Grammars
Grammar for real literals with no exponent:
  • digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • REALVAL ::= digit REALVAL1
  • REALVAL1 ::= digit REALVAL1 (arbitrary size)
  • REALVAL1 ::= . INTEGERVAL
  • INTEGERVAL ::= digit INTEGERVAL (arbitrary size)
  • INTEGERVAL ::= digit
– Start symbol is ?
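The productions above can be read as a recognizer, one function per non-terminal. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes REALVAL is the start symbol.

```python
DIGITS = set("0123456789")

def integerval(s, i):
    """INTEGERVAL ::= digit INTEGERVAL | digit"""
    if i >= len(s) or s[i] not in DIGITS:
        return False
    # On the last character use INTEGERVAL ::= digit, otherwise recurse.
    return i + 1 == len(s) or integerval(s, i + 1)

def realval1(s, i):
    """REALVAL1 ::= digit REALVAL1 | . INTEGERVAL"""
    if i >= len(s):
        return False
    if s[i] in DIGITS:
        return realval1(s, i + 1)
    if s[i] == ".":
        return integerval(s, i + 1)
    return False

def realval(s):
    """REALVAL ::= digit REALVAL1 (assumed here to be the start symbol)"""
    return len(s) > 0 and s[0] in DIGITS and realval1(s, 1)
```

Under these assumptions `realval("3.14")` is accepted, while `"3."`, `".5"` and `"12"` are rejected, because the grammar requires at least one digit on each side of the point.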
Regular Expressions
REs are defined by an alphabet (terminal symbols) and three operations:
– Alternation: RE1 | RE2
– Concatenation: RE1 RE2
– Repetition: RE* (zero or more REs)
The languages described by REs are exactly the languages generated by regular grammars
– Regular expressions are more convenient for some applications
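The three operations are enough to write the real-literal language as an RE. The sketch below uses Python's `re` module as an illustration only; the names `DIGIT`, `REAL` and `is_real` are hypothetical, and the slides do not prescribe any particular tool.

```python
import re

# Alternation: a digit is 0 or 1 or ... or 9.
DIGIT = r"(0|1|2|3|4|5|6|7|8|9)"

# Concatenation and repetition: digit digit* . digit digit*
REAL = re.compile(rf"{DIGIT}{DIGIT}*\.{DIGIT}{DIGIT}*")

def is_real(text):
    """True if the whole text is a real literal with no exponent."""
    return REAL.fullmatch(text) is not None
```

`is_real("3.14")` holds, while `is_real("3.")` and `is_real("42")` do not. In practice the shorthand `[0-9]+\.[0-9]+` describes the same language.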
Finite State Machines or Finite Automata (FSM or FA)
A language defined by a grammar is a (possibly infinite) set of strings
An automaton is a computation that determines whether a given string belongs to a specified language
A finite state machine (FSM) is an automaton that recognizes regular languages (those described by regular expressions)
Simplest automaton: its memory is a single number (the state)
Specifying a Finite State Machine (FA)
A set of labeled states, with directed arcs between states, each labeled with a character
One or more states may be terminal (accepting)
Start is a distinguished state
Automaton makes a transition from state S1 to S2
– If and only if the arc from S1 to S2 is labeled with the next character in the input
Token is legal if the automaton stops on a terminal state
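A specification of this kind maps directly onto a table of arcs plus a loop over the input. The sketch below is an illustration under assumptions: the identifier automaton, the state names and the helper `char_class` are hypothetical choices, not taken from the slides.

```python
def char_class(ch):
    """Group characters so that one arc label covers a whole class."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"

START = "S"                     # the distinguished start state
ACCEPTING = {"identifier"}      # terminal (accepting) states
ARCS = {                        # (state, label) -> next state
    ("S", "letter"): "identifier",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("identifier", "underscore"): "identifier",
}

def accepts(text):
    """Follow one arc per input character; a missing arc is an error."""
    state = START
    for ch in text:
        key = (state, char_class(ch))
        if key not in ARCS:
            return False
        state = ARCS[key]
    return state in ACCEPTING
```

With this table `accepts("x_1")` holds, and `accepts("1x")` fails because no arc leaves S on a digit.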
FA from Grammar
One state for each non-terminal
A rule of the form
– Nt1 ::= terminal
– Generates a transition from the state for Nt1 to a final state
A rule of the form
– Nt1 ::= terminal Nt2
– Generates a transition from the state for Nt1 to the state for Nt2 on an arc labeled by the terminal
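The two rules can be applied mechanically to the real-literal grammar. This is a sketch under assumptions: the production encoding, the state name `FINAL` and both function names are hypothetical. Because INTEGERVAL has two productions on digit, the result is non-deterministic, so the simulation tracks a set of possible states.

```python
DIGITS = "0123456789"
FINAL = "FINAL"  # hypothetical name for the single final state

# (non-terminal, terminal, next non-terminal or None)
PRODUCTIONS = [
    ("REALVAL", "digit", "REALVAL1"),
    ("REALVAL1", "digit", "REALVAL1"),
    ("REALVAL1", ".", "INTEGERVAL"),
    ("INTEGERVAL", "digit", "INTEGERVAL"),
    ("INTEGERVAL", "digit", None),
]

def build_arcs(productions):
    """Nt1 ::= t gives Nt1 -t-> FINAL; Nt1 ::= t Nt2 gives Nt1 -t-> Nt2."""
    arcs = {}
    for source, terminal, target in productions:
        arcs.setdefault((source, terminal), set()).add(target or FINAL)
    return arcs

def accepts(arcs, start, text):
    """Simulate the automaton, keeping every state it could be in."""
    current = {start}
    for ch in text:
        label = "digit" if ch in DIGITS else ch
        current = set().union(*(arcs.get((s, label), set()) for s in current))
        if not current:
            return False
    return FINAL in current
```

For example, `accepts(build_arcs(PRODUCTIONS), "REALVAL", "3.14")` holds, while the same call on `"3."` does not.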
Graphic representation of FA
[Figure: state diagram for identifiers, with states S and identifier and arcs labeled letter, digit, and underscore]
FA from RE
Each RE corresponds to a grammar
For all REs
– A natural translation to FSM exists
– Alternation often leads to non-deterministic machines
Deterministic Finite Automata (DFA)
For all states S
– For all characters C
  • There is at most one arc from S that is labeled with C
Easier to implement
No backtracking
Conventions for DFA:
– Error transitions are not explicitly shown
– Input symbols that result in the same transition are grouped together (this set can even be given a name)
– Still not displayed: stopping conditions and actions
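The "at most one arc per state and character" condition is easy to check on a list of arcs. A minimal sketch, assuming arcs are given as hypothetical (source, label, target) triples and that grouped labels such as "digit" do not overlap with each other:

```python
def is_deterministic(arcs):
    """True if no state has two arcs with the same label to different states."""
    seen = {}
    for source, label, target in arcs:
        key = (source, label)
        if key in seen and seen[key] != target:
            return False
        seen[key] = target
    return True
```

A machine with arcs `("start", "digit", "in_int")` and `("start", "digit", "in_real")` fails the check, since a digit read in the start state leaves two choices.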
Non-Deterministic Finite Automata (NFA)
A non-deterministic FA
– Has at least one state
  • With two arcs to two distinct states
  • Labeled with the same character
– Example: from start state, a digit can begin an integer literal or a real literal
– Implementation requires backtracking
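The integer-versus-real example can be simulated with explicit backtracking. This is a sketch under assumptions: the state names, the arc table and the function `classify` are hypothetical, chosen only to illustrate the idea.

```python
ARCS = {
    ("start", "digit"): ["in_int", "in_real"],  # the non-deterministic choice
    ("in_int", "digit"): ["in_int"],
    ("in_real", "digit"): ["in_real"],
    ("in_real", "."): ["in_frac0"],
    ("in_frac0", "digit"): ["in_frac"],
    ("in_frac", "digit"): ["in_frac"],
}
ACCEPTING = {"in_int": "int", "in_frac": "real"}

def classify(text, state="start", pos=0):
    """Return 'int', 'real', or None, trying each arc in turn."""
    if pos == len(text):
        return ACCEPTING.get(state)
    label = "digit" if text[pos] in "0123456789" else text[pos]
    for target in ARCS.get((state, label), []):
        result = classify(text, target, pos + 1)
        if result is not None:
            return result
        # Dead end: back up to this position and try the next arc.
    return None
```

On `"3.14"` the first guess (integer) gets stuck at the point, so the simulation backs up and succeeds with the real-literal branch.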
Lookahead & Backtracking in NFA
[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id" on the arc to finish]
Implementation of FA
[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id" on the arc to finish]
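One common way to implement such a machine is a loop with an explicit state variable. The sketch below is an assumption about how the diagram reads: letter takes start to in_id, in_id loops on letter and digit, and [other] leads to finish without consuming the character. The function name and return convention are hypothetical.

```python
def scan_identifier(text, pos=0):
    """Return (lexeme, next_pos) for an identifier at pos, or None."""
    state = "start"
    begin = pos
    while state != "finish":
        ch = text[pos] if pos < len(text) else ""  # "" stands for end of input
        if state == "start":
            if ch.isalpha():
                state = "in_id"
                pos += 1
            else:
                return None          # no arc for this character: error
        elif state == "in_id":
            if ch.isalpha() or ch.isdigit():
                pos += 1             # stay in in_id
            else:
                state = "finish"     # [other]: lookahead only, not consumed
    return text[begin:pos], pos
```

For the input `"x1+y"` this yields `("x1", 2)`: the `+` ends the identifier but stays in the input for the next token.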
From RE to DFA & RE to NFA
[Figure: state diagram with states start, in_id, and finish; arcs labeled letter, digit, and [other]; action "return id" on the arc to finish]
NFA to DFA
There is an algorithm for converting a non-deterministic machine to a deterministic one
Result may have exponentially more states
– Intuitively: need new states to express uncertainty about the token: int or real
Other algorithms exist for minimizing the number of states of an FSM, for showing equivalence, etc.
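The slides do not name the conversion algorithm; the usual choice is the subset construction, where each DFA state is a set of NFA states. A minimal sketch, assuming an NFA without epsilon arcs given as a hypothetical `{(state, label): set_of_states}` table:

```python
from collections import deque

def nfa_to_dfa(nfa_arcs, start, accepting):
    """Subset construction: DFA states are frozensets of NFA states."""
    labels = {label for (_, label) in nfa_arcs}
    start_set = frozenset([start])
    dfa_arcs = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for label in labels:
            # All NFA states reachable from any member of current on label.
            target = frozenset(
                t for s in current for t in nfa_arcs.get((s, label), ())
            )
            if not target:
                continue  # error transition, not stored
            dfa_arcs[(current, label)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_arcs, start_set, dfa_accepting
```

For an int-or-real NFA whose start state has digit arcs to both `in_int` and `in_real`, the construction produces the combined state `{in_int, in_real}`, which is exactly the "uncertainty about the token" mentioned above.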
Example DFA
[Figure: DFA with states start, accept, and error; arcs labeled a, b, c, and "a or b or c"]
Another view of the same DFA
[Figure: the same DFA with states start, accept, and error; arcs labeled a, b|c, and a|b|c]
Yet another view of the same DFA
[Figure: the same DFA with states start and accept; arcs labeled a and b|c]
State Minimization in DFA
[Figure: DFA with states start and accept; arcs labeled a and b|c]
TINY DFA:
[Figure: TINY DFA with states START, INNUM, INID, INASSIGN, INCOMMENT, and DONE; arcs labeled digit, letter, :, =, {, }, white space, other, and [other]]
Lex for Scanner
– Lex Conventions for RE
– Format of a Lex Input File