Posted on 30-Jan-2016


So far ...

A language is a set of strings over an alphabet.

We have defined languages by:

(i) regular expressions
(ii) finite state automata

Both (i) and (ii) give us exactly the same class of languages.

Languages serve two purposes in computing:

(a) communicating instructions or information
(b) defining valid communications

What about languages outwith this class?

Specifying Non-Regular Languages

We have already seen a number of languages that are not regular. In particular,

$\{a^n b^n : n \ge 0\}$
the language of matched round brackets
arithmetic expressions
standard programming languages

are not regular. However, these languages are all systematic constructions, and can be clearly and explicitly defined.

Consider $L = \{a^n b^n : n \ge 0\}$:

(i) $\varepsilon \in L$
(ii) if $x \in L$, then $axb \in L$
(iii) nothing else is in L

This is a clear and concise specification of L.

Can we use it to generate members of L?

Generating Languages

Using the previous definition of L, and the notion of string substitution, we can give a generative definition of L. Let X be a new symbol.

1) X -> ε
2) X -> aXb

This definition says that if we have a symbol X, we can replace it by the empty string, or by aXb. We now define L to be all strings over {a,b} formed by starting with X and applying rules 1) and 2) until we get a string with no X's.

Example:

X => aXb => aaXbb => aabb

X => ε

X => aXb => aaXbb => aaaXbbb => aaabbb
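The generative definition can be mimicked directly by string substitution. A minimal Python sketch, assuming one character per symbol and using a hypothetical helper `derive_anbn` (not part of the notes), that applies rule 2) n times and then rule 1):

```python
# Illustrative sketch; derive_anbn is a hypothetical name, not from the notes.
def derive_anbn(n: int) -> list[str]:
    """Return the sentential forms of the derivation of a^n b^n from X.

    Rule 2 (X -> aXb) is applied n times, then rule 1 (X -> ε) once.
    """
    forms = ["X"]
    current = "X"
    for _ in range(n):
        current = current.replace("X", "aXb", 1)  # rule 2
        forms.append(current)
    current = current.replace("X", "", 1)  # rule 1: X -> ε
    forms.append(current)
    return forms


for step in derive_anbn(3):
    print(step if step else "ε")
```

For n = 3 this prints the forms X, aXb, aaXbb, aaaXbbb and aaabbb, matching the last example.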

Grammar

Formalising the previous notion of a generative definition based on string substitution, we get:

A grammar is a 4-tuple, $G = (N, T, S, P)$, where

N is a finite alphabet, called the non-terminals;
T is a finite alphabet, called the terminals;
$N \cap T = \emptyset$;
$S \in N$ is the start symbol; and
P is a finite set of productions of the form $\alpha \to \beta$, where $\alpha \in (N \cup T)^+$, $\alpha$ has at least one member from N, and $\beta \in (N \cup T)^*$.

Thus the previous example is a grammar where

N = {X}
T = {a, b}
S = X
P = {X -> ε, X -> aXb}

so G = ({X}, {a, b}, X, {X -> ε, X -> aXb})
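The 4-tuple can be written down as a data structure and its side conditions checked mechanically. A sketch, assuming single-character symbols and the empty string standing for ε; `Grammar` and `is_well_formed` are hypothetical names chosen for illustration:

```python
from typing import NamedTuple


# Illustrative sketch; Grammar and is_well_formed are hypothetical names.
class Grammar(NamedTuple):
    """G = (N, T, S, P); each production is a pair (alpha, beta) of strings."""
    N: frozenset[str]
    T: frozenset[str]
    S: str
    P: frozenset[tuple[str, str]]


def is_well_formed(g: Grammar) -> bool:
    """Check the conditions in the definition of a grammar."""
    if g.N & g.T:                  # N ∩ T must be empty
        return False
    if g.S not in g.N:             # the start symbol must be a member of N
        return False
    symbols = g.N | g.T
    for alpha, beta in g.P:
        if not alpha or any(c not in symbols for c in alpha):
            return False           # alpha ∈ (N ∪ T)+
        if not any(c in g.N for c in alpha):
            return False           # alpha contains at least one non-terminal
        if any(c not in symbols for c in beta):
            return False           # beta ∈ (N ∪ T)*; "" stands for ε
    return True


G = Grammar(
    N=frozenset({"X"}),
    T=frozenset({"a", "b"}),
    S="X",
    P=frozenset({("X", ""), ("X", "aXb")}),
)
print(is_well_formed(G))                  # True
print(is_well_formed(G._replace(S="Y")))  # False: start symbol not in N
```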

Definitions and Notation

Let G = (N,T,S,P) be a grammar.

If s, t, x, y, u and v are strings such that s = xuy, t = xvy and $(u \to v) \in P$, then s directly derives t, written s => t.

If there is a sequence of strings $s_0, s_1, \ldots, s_n$ such that $s_0 \Rightarrow s_1 \Rightarrow \cdots \Rightarrow s_{n-1} \Rightarrow s_n$, then $s_0$ derives $s_n$, written $s_0 \Rightarrow^* s_n$.

A sentential form of G is a string $w \in (N \cup T)^*$ such that $S \Rightarrow^* w$.

A sentence of G is a sentential form $w \in T^*$, i.e. one with no non-terminals.

The language defined by G is the set of all sentences of G, denoted L(G).

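For example, take the grammar of the previous example with S in place of X, i.e. with productions S -> ε and S -> aSb. Then:

aaaSbbb => aaaaSbbbb.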

S =>* aaaabbbb.

aaaSbbb is a sentential form of G

aaaabbbb is a sentence of G.

L(G) = {ε, ab, aabb, aaabbb, ...}, which is $\{a^n b^n : n \ge 0\}$.
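The definition of s => t is a string-substitution step, so it can be computed by trying every production at every position. A sketch for the grammar S -> ε, S -> aSb used in these examples (upper-case characters as non-terminals, "" as ε, hypothetical helper names), which also collects the short sentences of L(G) by searching over s =>* t:

```python
from collections import deque

# Illustrative sketch: helper names are hypothetical, not from the notes.
# One character per symbol, upper case = non-terminal, "" stands for ε.
P = [("S", ""), ("S", "aSb")]


def directly_derives(s: str, productions) -> set[str]:
    """All t with s => t: s = xuy, t = xvy for some production u -> v."""
    results = set()
    for u, v in productions:
        start = s.find(u)
        while start != -1:
            results.add(s[:start] + v + s[start + len(u):])
            start = s.find(u, start + 1)
    return results


def sentences(start: str, productions, max_len: int) -> set[str]:
    """Sentences of length <= max_len, by breadth-first search over s =>* t.

    Pruning on the number of terminals assumes terminals are never rewritten
    (true for context-free grammars such as this one) and that only finitely
    many sentential forms stay within the bound (true for this grammar).
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            found.add(form)  # no non-terminals left: a sentence of G
            continue
        for nxt in directly_derives(form, productions):
            terminals = sum(1 for c in nxt if not c.isupper())
            if terminals <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print("aaaaSbbbb" in directly_derives("aaaSbbb", P))  # True
print(sorted(sentences("S", P, 6), key=len))          # ['', 'ab', 'aabb', 'aaabbb']
```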

Definitions and Notation (cont.)

Notation: we normally order the set of productions, and assign them numbers. If x => y by using rule number i, then we write x =>i y

$\alpha \to \beta_1 \mid \beta_2 \mid \cdots \mid \beta_n$ is shorthand for

$\alpha \to \beta_1$
$\alpha \to \beta_2$
$\vdots$
$\alpha \to \beta_n$

In general, non-terminals will be uppercase, while terminals will be lowercase.

A context-free grammar (CFG) is one in which all productions are of the form $\alpha \to \beta$, where $\alpha \in N$, i.e. the left-hand side is a single non-terminal.

A context-free language (CFL) is one that can be defined by a context-free grammar.
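Checking whether a grammar is context-free only needs the left-hand sides. A hypothetical helper, assuming productions are given as (left-hand side, right-hand side) string pairs; the production aSb -> ab below is made up purely to show a non-context-free case:

```python
# Hypothetical helper for illustration; productions are (lhs, rhs) pairs.
def is_context_free(productions, nonterminals) -> bool:
    """True if every left-hand side is a single non-terminal."""
    return all(lhs in nonterminals for lhs, _ in productions)


print(is_context_free([("S", ""), ("S", "aSb")], {"S"}))      # True
print(is_context_free([("S", "aSb"), ("aSb", "ab")], {"S"}))  # False
```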

Context-Free Grammars

A CFG is called context-free because the left-hand side of every production is a single non-terminal, so a production can be applied to that non-terminal without needing to consider its context (the symbols around it).

We only consider context-free grammars in this course.

Some languages are not context-free.

Example: $\{a^n b^n c^n : n \ge 0\}$

Some languages cannot be defined by any grammar.

It is believed that these are the same languages that cannot be defined by any algorithm or effective procedure.

Example CFG

G = ({S}, {a, +, *, (, )}, S, { S -> S+S | S*S | (S) | a} )

This is a grammar of algebraic expressions.

The productions are:
1) S -> S + S
2) S -> S * S
3) S -> (S)
4) S -> a

Example derivation: S => S * S => a * S => a * (S) => a * (S + S)

=> a * (a + S) => a * (a + a).

Note that there are many other ways of deriving the same string.
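Whether a claimed derivation is valid can be checked one step at a time against the four productions. A sketch with hypothetical helper names, which ignores spaces and also accepts a second derivation of the same string that expands the right-hand S first:

```python
# Illustrative sketch; helper names are hypothetical, not from the notes.
# Productions of the expression grammar, written without spaces.
P = [("S", "S+S"), ("S", "S*S"), ("S", "(S)"), ("S", "a")]


def directly_derives(s: str, t: str) -> bool:
    """True if s => t: t is s with one S replaced by a right-hand side."""
    for i, ch in enumerate(s):
        if ch == "S":
            for _, rhs in P:
                if s[:i] + rhs + s[i + 1:] == t:
                    return True
    return False


def is_derivation(steps) -> bool:
    """True if each consecutive pair of forms is a one-step derivation."""
    forms = [step.replace(" ", "") for step in steps]
    return all(directly_derives(a, b) for a, b in zip(forms, forms[1:]))


first = ["S", "S * S", "a * S", "a * (S)", "a * (S + S)",
         "a * (a + S)", "a * (a + a)"]
second = ["S", "S * S", "S * (S)", "S * (S + S)", "a * (S + S)",
          "a * (a + S)", "a * (a + a)"]
print(is_derivation(first))                        # True
print(is_derivation(second))                       # True
print(is_derivation(["S", "S * S", "a * a + a"]))  # False: not a single step
```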

Why Grammar?

In English, the grammar is the set of conventions defining the structure of sentences, e.g.

a sentence must have a subject and a verb (and, if the verb is transitive, an object)

verbs must agree with their subjects, e.g. "John walks" and "John and Mary walk"

adjectives come before nouns, e.g. "the red car" and not "the car red"

We have shown a formalisation of this notion.

We can now write explicit, clear statements of which sentences are in a language.

Grammars can be used in the processing of natural language by computer (4th year option), in formalising design, in pattern recognition, and many other areas.

A grammar for a small part of English

S -> NP VP
NP -> Det NP1 | PN
NP1 -> Adj NP1 | N
Det -> a | the
PN -> peter | paul | mary
Adj -> large | black
N -> dog | cat | horse
VP -> V NP
V -> is | likes | hates

Can you derive:

peter is a large black cat

Example derivations:

S => NP VP => PN VP => mary VP => mary V NP => mary hates NP => mary hates Det NP1 => mary hates the NP1 => mary hates the N => mary hates the dog

S => NP VP => NP V NP => NP V Det NP1 => NP V a NP1 => NP V a Adj NP1 => NP is a Adj NP1 => NP is a Adj Adj NP1 => NP is a large Adj NP1 => NP is a large Adj N => NP is a large black N => NP is a large black cat => PN is a large black cat => peter is a large black cat
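Deriving a sentence can be turned around into recognising one: search over leftmost derivations, matching terminals against the words of the sentence. A minimal backtracking sketch (the function names are hypothetical, and the approach relies on this grammar having no left-recursive rule such as A -> Ab):

```python
# Illustrative sketch; parse and is_sentence are hypothetical names.
# The small English grammar, one list of symbols per alternative.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NP1"], ["PN"]],
    "NP1": [["Adj", "NP1"], ["N"]],
    "Det": [["a"], ["the"]],
    "PN": [["peter"], ["paul"], ["mary"]],
    "Adj": [["large"], ["black"]],
    "N": [["dog"], ["cat"], ["horse"]],
    "VP": [["V", "NP"]],
    "V": [["is"], ["likes"], ["hates"]],
}


def parse(symbols, words):
    """Yield each remainder of `words` left after deriving a prefix from `symbols`.

    The leftmost symbol is always expanded first, so this searches over
    leftmost derivations; it terminates because no rule is left-recursive.
    """
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                  # non-terminal: try each alternative
        for alternative in GRAMMAR[first]:
            yield from parse(alternative + rest, words)
    elif words and words[0] == first:     # terminal: must match the next word
        yield from parse(rest, words[1:])


def is_sentence(text: str) -> bool:
    """True if S derives exactly the words of `text`."""
    return any(rest == [] for rest in parse(["S"], text.split()))


print(is_sentence("peter is a large black cat"))  # True
print(is_sentence("mary hates the dog"))          # True
print(is_sentence("mary likes the cat black"))    # False
```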

Regular Grammars

A grammar is regular if each production is of the form:

(i) A -> t, or
(ii) A -> tB, or
(iii) A -> ε

where $A, B \in N$ and $t \in T$.
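The three permitted forms are easy to test mechanically. A hypothetical helper, assuming single-character symbols and "" for ε, applied to the example grammar that follows and to the non-regular grammar X -> aXb | ε:

```python
# Hypothetical helper for illustration; productions are (lhs, rhs) pairs.
def is_regular(productions, nonterminals, terminals) -> bool:
    """True if every production has the form A -> t, A -> tB or A -> ε."""
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        if rhs == "":                                    # (iii) A -> ε
            continue
        if len(rhs) == 1 and rhs in terminals:           # (i)  A -> t
            continue
        if len(rhs) == 2 and rhs[0] in terminals and rhs[1] in nonterminals:
            continue                                     # (ii) A -> tB
        return False
    return True


# S -> aA | bB,  A -> aS | a,  B -> bS | b
P = [("S", "aA"), ("S", "bB"), ("A", "aS"), ("A", "a"), ("B", "bS"), ("B", "b")]
print(is_regular(P, {"S", "A", "B"}, {"a", "b"}))                # True
print(is_regular([("X", "aXb"), ("X", "")], {"X"}, {"a", "b"}))  # False
```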

Example:

S -> aA | bB
A -> aS | a
B -> bS | b

Is this a sentence of the language?

aaaabb

S => aA => aaS => aaaA => aaaaS => aaaabB => aaaabb

The language generated by this grammar is the language denoted by (aa + bb)+
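A regular grammar can be read as a finite state automaton: each non-terminal is a state, A -> tB is a transition from A to B on t, and A -> t is a transition on t into an accepting state. A sketch of that reading for this grammar (the state name FINAL and the helper name are hypothetical; this grammar has no A -> ε rules, which would otherwise make A itself accepting), checked against (aa + bb)+ on all strings up to length 8:

```python
import re
from itertools import product

# Illustrative sketch; FINAL and generated are hypothetical names.
# S -> aA | bB,  A -> aS | a,  B -> bS | b
# A -> tB becomes an edge A --t--> B; A -> t becomes an edge A --t--> FINAL.
EDGES = {
    ("S", "a"): {"A"}, ("S", "b"): {"B"},
    ("A", "a"): {"S", "FINAL"},
    ("B", "b"): {"S", "FINAL"},
}


def generated(word: str) -> bool:
    """Simulate the grammar as a nondeterministic automaton started in S."""
    states = {"S"}
    for ch in word:
        states = set().union(*(EDGES.get((q, ch), set()) for q in states))
    return "FINAL" in states


print(generated("aaaabb"))  # True
print(generated("aab"))     # False

# Compare with (aa + bb)+, written (aa|bb)+ in Python's re syntax.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert generated(w) == bool(re.fullmatch(r"(aa|bb)+", w))
```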

Regular Grammars and Regular Languages

Thus we now have three different definitions of the one class of languages:

regular expressions

finite state automata

regular grammars

Theorem: (stated here without proof)

A language is regular iff it can be defined by a regular grammar.

All three are useful in Computing Science

Example CFG (2)

1) S -> XaaX
2) X -> aX
3) X -> bX
4) X -> ε

S =>1 XaaX =>3 bXaaX =>2 baXaaX =>3 babXaaX =>4 babaaX =>2 babaaaX =>3 babaaabX =>4 babaaab

(the number after each => is the rule applied at that step)

This grammar defines the language

(a + b)*aa(a + b)*
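Because rule numbers pin down each step, a numbered derivation can be replayed mechanically. A sketch with a hypothetical `replay` helper, assuming each rule is applied to the leftmost occurrence of its left-hand side, as it is in every step of the derivation above:

```python
# Illustrative sketch; replay is a hypothetical name, not from the notes.
# Example CFG (2): rule number -> (left-hand side, right-hand side); "" is ε.
RULES = {1: ("S", "XaaX"), 2: ("X", "aX"), 3: ("X", "bX"), 4: ("X", "")}


def replay(rule_numbers, start="S"):
    """Apply each numbered rule to the leftmost occurrence of its left-hand side."""
    forms = [start]
    for i in rule_numbers:
        lhs, rhs = RULES[i]
        current = forms[-1]
        pos = current.find(lhs)
        if pos == -1:
            raise ValueError(f"rule {i} does not apply to {current!r}")
        forms.append(current[:pos] + rhs + current[pos + len(lhs):])
    return forms


print(" => ".join(replay([1, 3, 2, 3, 4, 2, 3, 4])))
# S => XaaX => bXaaX => baXaaX => babXaaX => babaaX => babaaaX => babaaabX => babaaab
```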

...as a Regular Grammar

1) S -> aS
2) S -> bS
3) S -> aM
4) M -> aB
5) B -> aB
6) B -> bB
7) B -> ε

S =>2 bS =>1 baS =>2 babS =>3 babaM =>4 babaaB =>5 babaaaB =>6 babaaabB =>7 babaaab

S =>2 bS =>3 baM =>4 baaB =>7 baa

(the number after each => is the rule applied at that step)
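Equality of two languages cannot be established by testing, but the two grammars can at least be compared on all short strings. A sketch (hypothetical helper names; sentences found by exploring leftmost derivations with terminal-count pruning) comparing both grammars with (a + b)*aa(a + b)*, written (a|b)*aa(a|b)* in Python's `re` syntax, up to length 7:

```python
import re
from itertools import product


# Illustrative sketch; language, CFG2 and REG are hypothetical names.
def language(productions, start, max_len):
    """Sentences of length <= max_len, found via leftmost derivations only.

    Non-terminals are upper-case letters and "" stands for ε. Pruning on the
    number of terminals is safe because terminals are never rewritten here.
    """
    found, seen, stack = set(), {start}, [start]
    while stack:
        form = stack.pop()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)  # no non-terminal left: a sentence
            continue
        for lhs, rhs in productions:
            if lhs == form[i]:
                nxt = form[:i] + rhs + form[i + 1:]
                if sum(not c.isupper() for c in nxt) <= max_len and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return found


CFG2 = [("S", "XaaX"), ("X", "aX"), ("X", "bX"), ("X", "")]
REG = [("S", "aS"), ("S", "bS"), ("S", "aM"), ("M", "aB"),
       ("B", "aB"), ("B", "bB"), ("B", "")]

n = 7
expected = {"".join(p) for k in range(n + 1) for p in product("ab", repeat=k)
            if re.fullmatch(r"(a|b)*aa(a|b)*", "".join(p))}
print(language(CFG2, "S", n) == language(REG, "S", n) == expected)  # True
```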

Backus-Naur Form

A notation devised for defining the language Algol 60. PASCAL syntax rules are often presented in this form.

Example:

<simple decl> ::= <type> <id list>
<type> ::= real | integer | boolean
<id list> ::= identifier | <id list> identifier

This formalism is equivalent to CFGs, where names enclosed in <...> are non-terminals, names in bold (here real, integer, boolean and identifier) are terminals, and ::= is the same as the -> notation.
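Since BNF is just another notation for a CFG, the rules can be translated into productions mechanically. A minimal sketch, assuming one rule per line and treating every word outside <...> as a terminal (bold type cannot be detected in plain text); `bnf_to_productions` is a hypothetical name:

```python
import re

# Illustrative sketch; bnf_to_productions is a hypothetical name.
BNF = """
<simple decl> ::= <type> <id list>
<type> ::= real | integer | boolean
<id list> ::= identifier | <id list> identifier
"""


def bnf_to_productions(text: str):
    """Turn BNF rules into CFG productions (lhs, [symbols]).

    A name in <...> is a non-terminal; any other whitespace-separated
    word is treated as a terminal. Each | starts a new alternative.
    """
    productions = []
    for line in text.strip().splitlines():
        lhs, rhs = (part.strip() for part in line.split("::="))
        for alternative in rhs.split("|"):
            symbols = re.findall(r"<[^>]+>|\S+", alternative)
            productions.append((lhs, symbols))
    return productions


for lhs, symbols in bnf_to_productions(BNF):
    print(lhs, "->", " ".join(symbols))
```

Each alternative becomes its own production, e.g. `<id list> -> <id list> identifier`, which is exactly the `|` shorthand expanded.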

Constructing Grammars

Suppose we wanted to construct a grammar for the language of all strings of the form accc...cb or abab...ab cc...c abab...ab, where ab is repeated the same number of times (n times) on each side.

We need to find rules to create:
(i) sequences of strings - ccc...c
(ii) bracketed strings - accc...cb, and
(iii) nested strings - abab...ab<...>abab...ab

Sequencing
A -> aA | ε   or   A -> Aa | ε

e.g. A => aA => aaA => ... => aaaaaA => aaaaa
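Both sequencing rules produce the same strings; they differ in which side of the growing string A stays on. A sketch with a hypothetical `sequence_forms` helper that lists the sentential forms for each choice:

```python
# Illustrative sketch; sequence_forms is a hypothetical name.
def sequence_forms(n: int, right_recursive: bool = True) -> list[str]:
    """Sentential forms deriving a^n from A.

    right_recursive=True uses A -> aA | ε (the a's build up left of A);
    right_recursive=False uses A -> Aa | ε (the a's build up right of A).
    """
    step = "aA" if right_recursive else "Aa"
    forms = ["A"]
    for _ in range(n):
        forms.append(forms[-1].replace("A", step))  # A -> aA (or A -> Aa)
    forms.append(forms[-1].replace("A", ""))        # A -> ε
    return forms


print(" => ".join(sequence_forms(3)))                         # A => aA => aaA => aaaA => aaa
print(" => ".join(sequence_forms(3, right_recursive=False)))  # A => Aa => Aaa => Aaaa => aaa
```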

Bracketing
A -> aBb, B -> xB | ε   or   A -> Bb, B -> ax | Bx

e.g. A => aBb => axBb => axxBb => ... => axxxxxb

Constructing Grammars (cont.)

Nesting

A -> aAb | B
B -> xB | ε

e.g. A => aAb => aaAbb => aaaAbbb => ... => aaaaaAbbbbb => aaaaaBbbbbb => ... => aaaaaxxxBbbbbb => aaaaaxxxbbbbb
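The nesting rules translate directly into a recursive recogniser: one function per non-terminal, each mirroring its productions. A sketch with hypothetical names, relying on the observation that B only ever produces x's, so a leading a forces the rule A -> aAb:

```python
# Illustrative sketch; derives_A and derives_B are hypothetical names.
def derives_A(s: str) -> bool:
    """A -> aAb | B.

    B produces only x's, so if s starts with 'a' the rule A -> aAb must be
    used: s must end with 'b' and its middle must again derive from A.
    """
    if s.startswith("a"):
        return s.endswith("b") and derives_A(s[1:-1])
    return derives_B(s)


def derives_B(s: str) -> bool:
    """B -> xB | ε: any (possibly empty) run of x's."""
    return all(ch == "x" for ch in s)


print(derives_A("aaaaaxxxbbbbb"))  # True
print(derives_A("aaxbbb"))         # False: unequal numbers of a's and b's
```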

Example:

S -> abSab | abBab
B -> cB | c

What language does this generate? The language $\{(ab)^n c^m (ab)^n : n > 0,\ m > 0\}$.

Example derivations:

S => abBab => abcBab => ... => abccccab

S => abSab => ababSabab => abababBababab => abababcBababab => ... => abababccccababab
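The derivations above follow a fixed pattern, so the sentence for given n and m can be built by applying the rules in that order. A sketch with a hypothetical `derive_example` helper:

```python
# Illustrative sketch; derive_example is a hypothetical name.
def derive_example(n: int, m: int) -> str:
    """Follow S -> abSab | abBab,  B -> cB | c  to build (ab)^n c^m (ab)^n.

    Applies S -> abSab n-1 times, then S -> abBab, then B -> cB m-1 times,
    then B -> c. The grammar only allows n >= 1 and m >= 1.
    """
    if n < 1 or m < 1:
        raise ValueError("this grammar needs n > 0 and m > 0")
    form = "S"
    for _ in range(n - 1):
        form = form.replace("S", "abSab")
    form = form.replace("S", "abBab")
    for _ in range(m - 1):
        form = form.replace("B", "cB")
    return form.replace("B", "c")


print(derive_example(1, 4))  # abccccab
print(derive_example(3, 4))  # abababccccababab
```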
