formal grammars and abstract machines

Post on 05-Dec-2021

Formal Grammars and Abstract Machines

Sahar Al Seesi

What are Formal Languages

• Describing the sentence structure of a language in a formal way

• Used in

– Natural Language Processing applications (translators, grammar checking tools, etc.)

   • Language: English, French, Spanish, Chinese, etc.

– RNA/Protein Structure Analysis

   • RNA in general, ribosomal RNA, protein, etc.

– Compilers for programming languages

   • C, Java, Python, Linux shell script, Assembly, etc.

• To build a program for any of the above applications, the language rules must be described in a formal, inclusive way.

Formal Languages and Grammars

• G={ Σ, R, S }

• Σ : Non-terminals (NT) and Terminals (T)

   Non-terminals (NT)             Terminals (T)
   {S, VERB, SUBJECT, OBJECT}     {children, sam, play, eat, ball}
   {S, A1, A2}                    {a, c, g, t} or {a, c, g, u}

• R : Production rules, e.g. {S → SUBJECT VERB OBJECT}

• S : Starting symbol

• L(G) : The language defined by G; a finite or infinite set of strings (words/sentences)

• L(G) ⊆ T*
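
The definition above can be made concrete with a small sketch. It stores the sentence grammar as plain Python data and enumerates L(G). Only the rule S → SUBJECT VERB OBJECT comes from the slide; the rules that expand SUBJECT, VERB and OBJECT into words are assumptions made for illustration, as are the names `RULES` and `expand`.

```python
from itertools import product

TERMINALS = {"children", "sam", "play", "eat", "ball"}
RULES = {
    "S": [["SUBJECT", "VERB", "OBJECT"]],   # rule given on the slide
    "SUBJECT": [["children"], ["sam"]],     # assumed for illustration
    "VERB": [["play"], ["eat"]],            # assumed for illustration
    "OBJECT": [["ball"]],                   # assumed for illustration
}
START = "S"

def expand(symbol):
    """Return every terminal string (tuple of words) derivable from symbol.

    Only safe for grammars without recursion, where L(G) is finite.
    """
    if symbol in TERMINALS:
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        parts = [expand(sym) for sym in rhs]
        for combo in product(*parts):
            results.append(tuple(word for part in combo for word in part))
    return results

language = {" ".join(words) for words in expand(START)}
for sentence in sorted(language):
    print(sentence)
```

Every sentence printed is a string over the terminals only, which is the sense in which L(G) ⊆ T*.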

Chomsky Hierarchy

Regular

Context-free

Context-sensitive

Recursively Enumerable Languages (Unrestricted Grammars)

[Figure: the four classes drawn as a hierarchy, with axis labels "Power of expression", "Rule complexity", and "Parsing time complexity"]

Parsing/Accepting Abstract Machines

Grammar                        Parsing Automaton

Regular grammars               Finite State Machine (FSM)

Context free grammars          Push-Down Automaton (PDA)

Context sensitive grammars     Linear-Bounded Automaton (LBA)

Unrestricted grammars          Turing Machine (TM)

Regular Languages & Regular Expressions

• A regular language can be represented by a regular expression

• Let Σ = {a,b}

• Let Lr be the language defined by regular expression r.

r      Lr

Σ*     the set of all strings over Σ of length 0 or more (includes the empty string, ε)

Σ+     the set of all strings over Σ of length 1 or more (does not include ε)

a+     the set of all strings of 1 or more a’s {a, aa, aaa, …}

b*     the set of all strings of 0 or more b’s {ε, b, bb, bbb, …}

Combining Regular Languages

• Concatenation

– Let r and s be 2 regular expressions, rs corresponds to the language LrLs

– Example:

• r = a*, s = b+

• LrLs : the set of strings consisting of 0 or more a’s followed by 1 or more b’s

• {b, bb, ab, aabbbb} ⊂ LrLs

Combining Regular Languages

• Union

– Let r and s be 2 regular expressions, r+s corresponds to the language Lr∪Ls

– Example:

• r = a*, s = b+

• Lr∪Ls : the set containing every string of 0 or more a’s and every string of 1 or more b’s

• {b, bb, a, aa, bbbb} ⊂ Lr∪Ls

Combining Regular Languages

• Closure

– Let r be a regular expression, r* corresponds to the language Lr*

– Example:

• r = ab

• Lr* : the set of strings consisting of 0 or more “ab”s, (ab)*

• {ε, ab, abab, ababab} ⊂ Lr*
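
The three operations can be tried out with Python's `re` module. This is a minimal sketch: Python writes union as `|` where the slides write `+`, and the helper name `in_language` is introduced here for illustration.

```python
import re

def in_language(pattern: str, s: str) -> bool:
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

R = "a*"   # r = a*
S = "b+"   # s = b+

CONCAT = f"(?:{R})(?:{S})"    # rs   -> LrLs
UNION = f"(?:{R})|(?:{S})"    # r+s  -> Lr ∪ Ls
CLOSURE = "(?:ab)*"           # (ab)*

checks = [
    ("concatenation", CONCAT, ["b", "bb", "ab", "aabbbb", "a", "ba"]),
    ("union", UNION, ["b", "bb", "a", "aa", "bbbb", "ab"]),
    ("closure", CLOSURE, ["", "ab", "abab", "ababab", "aba"]),
]
for name, pattern, samples in checks:
    for s in samples:
        print(f"{name:13} {s!r:10} {in_language(pattern, s)}")
```

`re.fullmatch` is used rather than `re.search` so that the whole string, not just a substring, has to belong to the language.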

Example

• R = (a+c+t)yyk(p+q)*vdt(l+z+ε)pq

• Strings that belong to the language defined by R

ayykppvdtlpq

cyykpqppqvdtpq

tyykqvdtzpq
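
The membership of these strings can be checked mechanically. The sketch below rests on two stated assumptions: the middle literal is taken as `yyk`, the form that appears in all three listed strings, and the last group is read as (l+z+ε), so the `l` or `z` is optional. The names `PATTERN` and `matches` are illustrative.

```python
import re

# Python's `|` plays the role of the slide's `+` (union);
# `(?:l|z)?` encodes the optional l or z, i.e. (l+z+ε).
PATTERN = re.compile(r"(?:a|c|t)yyk(?:p|q)*vdt(?:l|z)?pq")

def matches(s: str) -> bool:
    """True if the whole string s is in the language of PATTERN."""
    return PATTERN.fullmatch(s) is not None

for s in ["ayykppvdtlpq", "cyykpqppqvdtpq", "tyykqvdtzpq", "gyykvdtpq"]:
    print(s, matches(s))
```

The last string is a counterexample added here: it starts with `g`, which is not one of the allowed first symbols.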

Regular Grammars

• Can be represented by a regular expression

• Grammar rules are of the form

   NT → T NT

   NT → T

• Example: The set of all DNA strings

• Regular Expression: {a,c,g,t}+

• G= { Σ, {S}, R, S }

• Σ = {S, a, c, g, t}

• R = {S → aS | cS | gS | tS | a | c | g | t}

Finite State Machine

• M={Q, Σ, δ, q0, F}

• Q: Finite set of states

• Σ: Language alphabet

• δ: Transition function (Q x Σ → Q)

• q0 : Starting state

• F : Set of final states

Finite State Machine Example

• M={Q, Σ, δ, q0, F}

• A FSM for

R = {S → aS | cS | gS | tS | a | c | g | t}

Q = {1,2} Σ = {a, c, g, t}

q0 = 1 F = {2}

δ(1,a) = 2 δ(1,c) = 2

δ(1,g) = 2 δ(1,t) = 2

δ(2,a) = 2 δ(2,c) = 2

δ(2,g) = 2 δ(2,t) = 2

[State diagram: state 1 goes to state 2 on a, c, g, t; state 2 loops to itself on a, c, g, t]
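
The machine above is small enough to simulate directly. The sketch below stores the transition function as a dictionary; the names `DELTA` and `accepts` are introduced here for illustration, and a missing entry in the dictionary is treated as rejection.

```python
Q = {1, 2}
SIGMA = set("acgt")
START = 1
FINAL = {2}

# Transition function: from either state, any DNA symbol leads to state 2.
DELTA = {(state, ch): 2 for state in Q for ch in SIGMA}

def accepts(s: str) -> bool:
    """Run the FSM on s and report whether it stops in a final state."""
    state = START
    for ch in s:
        if (state, ch) not in DELTA:
            return False  # symbol outside the alphabet: no transition
        state = DELTA[(state, ch)]
    return state in FINAL

for s in ["gattaca", "a", "", "acgu"]:
    print(repr(s), accepts(s))
```

The empty string is rejected because the start state 1 is not final, which matches the regular expression {a,c,g,t}+ (one or more symbols).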

Another Example

L = The set of all strings in {0,1}* that either begin or end (or both) with 01

R = (01(0+1)*)+((0+1)*01)

[State diagram: six states S, A, B, C, D, E; transitions labeled 0, 1, and 0,1]

Input1: 0100

Input2: 1101

Input3: 11011
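
Because the transitions of the slide's diagram are not recoverable from this transcript, the sketch below builds its own six-state deterministic machine for the same language and cross-checks it against the regular expression R. The state names are hypothetical and are not claimed to correspond to S, A, B, C, D, E.

```python
import re
from itertools import product

# Hypothetical state names; each describes what has been read so far.
DELTA = {
    ("start", "0"): "first0",      ("start", "1"): "other",
    ("first0", "0"): "last0",      ("first0", "1"): "prefix01",
    ("prefix01", "0"): "prefix01", ("prefix01", "1"): "prefix01",
    ("other", "0"): "last0",       ("other", "1"): "other",
    ("last0", "0"): "last0",       ("last0", "1"): "last01",
    ("last01", "0"): "last0",      ("last01", "1"): "other",
}
FINAL = {"prefix01", "last01"}

def accepts(s: str) -> bool:
    """Run the machine on a string over {0,1}."""
    state = "start"
    for ch in s:
        state = DELTA[(state, ch)]
    return state in FINAL

def agrees_with_regex(max_len: int) -> bool:
    """Compare the machine with R on every string up to max_len symbols."""
    pattern = re.compile(r"01[01]*|[01]*01")
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            s = "".join(symbols)
            if accepts(s) != (pattern.fullmatch(s) is not None):
                return False
    return True

for s in ["0100", "1101", "11011"]:
    print(s, accepts(s))
```

On the three inputs from the slides, 0100 is accepted because it begins with 01, 1101 because it ends with 01, and 11011 is rejected.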

Context Free Grammars (CFG) and Languages

• CFGs can represent nested pair-wise correlation between terminal symbols in the string

• Famous example: palindrome language (ww^R), e.g. abaaaaba with w = abaa

• Can you write a regular grammar for ww^R?

• Grammar rules are of the form

– NT → (T+NT)+

CFG and Push Down Automata

• M={Q, Σ, Γ, δ, q0, F}

• Q: Finite set of states

• Σ: Language alphabet

• Γ: Stack alphabet

• δ: Transition function (Q x Σ x Γ → Q x Γ*)

• q0 : Starting state

• F : Set of final states

http://epsilonvectorplusplus.wordpress.com

Grammar ww^R

• G={ Σ, V, R, S }

• Σ = {a, b} , V = {S}

• R = {S → aSa | bSb | aa | bb}

Parse tree for string: abbbba

         S
       / | \
      a  S  a
       / | \
      b  S  b
        / \
       b   b
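
A parse tree like this one can be produced by a short recursive function that mirrors the rules S → aSa | bSb | aa | bb. This is a sketch of a hand-written recognizer for this one grammar, not a general CFG parser; the name `parse` and the tuple representation of the tree are choices made here.

```python
def parse(s: str):
    """Return a parse tree for s under S -> aSa | bSb | aa | bb, or None."""
    # Base rules: S -> aa and S -> bb.
    if s in ("aa", "bb"):
        return ("S", s[0], s[1])
    # Recursive rules: S -> aSa and S -> bSb peel one matching pair.
    if len(s) >= 4 and s[0] == s[-1] and s[0] in "ab":
        inner = parse(s[1:-1])
        if inner is not None:
            return ("S", s[0], inner, s[-1])
    return None

print(parse("abbbba"))
print(parse("abab"))
```

Each nested tuple is one application of a rule, so the nesting depth equals the height of the parse tree.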

Context Free Grammar for an RNA stem loop

• Language : wvw^cr (a stem w, a loop v, then w^cr, the reverse complement of w)

• G={ Σ, R, S }

• Σ = {S, L, a, c, g, u}

• R = {S → aSu | uSa | gSc | cSg | L,

       L → aL | cL | gL | uL | a | c | g | u}

Durbin et al., Biological Sequence Analysis, adapted
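
One way to see what this grammar describes is to apply the pairing rules (a with u, g with c) from both ends of a sequence for as long as they fit, and hand whatever remains to L as the loop. The sketch below does that greedily. The grammar is ambiguous, since S → L may be applied at any point, so this finds only the derivation with the longest stem; `stem_loop` is a name introduced here.

```python
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g")}

def stem_loop(seq: str):
    """Split seq into (stem pairs, loop), or return None if seq is invalid."""
    if not seq or any(ch not in "acgu" for ch in seq):
        return None
    left, right = 0, len(seq) - 1
    stem = []
    # Keep at least one base between the ends: L derives a non-empty loop.
    while right - left >= 2 and (seq[left], seq[right]) in PAIRS:
        stem.append((seq[left], seq[right]))
        left += 1
        right -= 1
    return stem, seq[left:right + 1]

print(stem_loop("gcaaagc"))
print(stem_loop("ggaucc"))
```

For `gcaaagc` the outer g pairs with the final c and the next c pairs with the g before it, leaving `aaa` as the loop.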

Context Sensitive Grammars and Languages

• Can represent crossing pair-wise correlation between terminal symbols in the string

• Famous example: copy language (ww), e.g. aabbaabb with w = aabb

• Grammar rules are of the form:

– (T+NT)*NT (T+NT)* → (T+NT)+

– |LHS| <= |RHS| (the string being derived cannot shrink from one production step to the next)

CSG and Linear Bounded Automata

SKIP FOR NOW

Non-deterministic and stochastic models

• A stochastic grammar has a probability associated with each rule in the grammar

• Similarly, in automata, a probability would be associated with each transition

Unrestricted Grammars and Recursively Enumerable Languages

• Grammar rules are of the form:

- (T+NT)*NT (T+NT)* → (T+NT)*

- The only rule is that the left hand side must contain at least one variable

• A recursively enumerable language is one that can be represented by an unrestricted grammar

Turing Machines

• M={Q, Σ, Γ, δ, q0, B, F}

• Q: Finite set of states

• Σ: Language alphabet

• Γ: Tape alphabet (Σ ⊆ Γ)

• δ: Transition function (Q x Γ → Q x Γ x {L,R})

• q0 : Starting state

• B: The blank symbol

• F : Set of final states

Example

[State diagram: states q0, q1, q2, q3, q4; transition labels a/X,R; Y/Y,R; b/Y,L; a/a,R Y/Y,R; a/a,L Y/Y,L; X/X,R; Y/Y,R; #/#,R]

Tape contents during the run on input aabb:

# # # # a a b b # # # # # # #

# # # # X a b b # # # # # # #

# # # # X a Y b # # # # # # #

# # # # X X Y b # # # # # # #

# # # # X X Y Y # # # # # # #

What is the language this TM accepts?

Example -cont. (input 2)

# # # # a b a b # # # # # # #

# # # # X b a b # # # # # # #

# # # # X Y a b # # # # # # #

Language: a^n b^n
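
The run can be reproduced with a small simulator. The transcript keeps the edge labels of the diagram but not which states they connect, so the transition table below is an assumption: it attaches each label to the states of the usual textbook machine for this language, with q4 as the accepting state. The names `DELTA` and `run` are introduced here.

```python
BLANK = "#"

# (state, symbol read) -> (next state, symbol written, head move)
# Assumed assignment of the diagram's labels to edges.
DELTA = {
    ("q0", "a"): ("q1", "X", +1),   # mark the next unmatched a
    ("q0", "Y"): ("q3", "Y", +1),   # no a left: check that only Ys remain
    ("q1", "a"): ("q1", "a", +1),
    ("q1", "Y"): ("q1", "Y", +1),
    ("q1", "b"): ("q2", "Y", -1),   # mark the matching b
    ("q2", "a"): ("q2", "a", -1),
    ("q2", "Y"): ("q2", "Y", -1),
    ("q2", "X"): ("q0", "X", +1),   # back at the last marked a
    ("q3", "Y"): ("q3", "Y", +1),
    ("q3", BLANK): ("q4", BLANK, +1),
}
ACCEPT = {"q4"}

def run(word: str, max_steps: int = 10_000):
    """Return (accepted, final contents of the input cells)."""
    tape = dict(enumerate(word))
    state, head = "q0", 0
    for _ in range(max_steps):
        key = (state, tape.get(head, BLANK))
        if state in ACCEPT or key not in DELTA:
            break  # halt: accepting state reached or no move defined
        state, symbol, move = DELTA[key]
        tape[head] = symbol
        head += move
    cells = "".join(tape.get(i, BLANK) for i in range(len(word)))
    return state in ACCEPT, cells

print(run("aabb"))
print(run("abab"))
```

Under this assignment the final tapes agree with the snapshots on the slides: X X Y Y for aabb, and X Y a b for abab, where the machine halts without accepting.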

Computing with Turing Machines

Example: A TM that accepts a number x divisible by 3 in unary format and outputs the result of the computation x/3

Divide by 3 TM

[State diagram: states q0 to q8; transition labels 1/X,R; 1/1,R; #/#,R; #/1,L; 1/1,L; #/#,L; X/X,R]

Tape contents during the run on input 111111:

# # # # 111111 # # # # # # #

# # # # X11111 # # # # # # #

# # # # XX1111 # # # # # # #

# # # # XXX111 # # # # # # #

# # # # XXX111 # 1 # # # # #

# # # # XXXX11 # 1 # # # # #

# # # # XXXXX1 # 1 # # # # #

# # # # XXXXXX # 1 # # # # #

# # # # XXXXXX # 11 # # # #

Try to parse 1111
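
The marking strategy visible in the tape snapshots (cross off three 1s, then write one 1 after the separator) can be sketched at the level of tape contents. This is not a simulation of the state diagram, whose edges cannot be recovered from the transcript; the function name and the snapshot format are assumptions, and an empty input (x = 0) is treated as divisible, which the slides do not address.

```python
def divide_by_three(unary: str):
    """Return (accepted, snapshots) for a unary number written as 1s."""
    if set(unary) - {"1"}:
        raise ValueError("input must consist of 1s only")
    cells = list(unary)
    quotient = 0

    def snapshot() -> str:
        out = " # " + "1" * quotient if quotient else ""
        return "".join(cells) + out

    frames = [snapshot()]
    pos = 0
    while pos < len(cells):
        # Cross off one group of three 1s, one cell at a time.
        for _ in range(3):
            if pos == len(cells):
                return False, frames   # ran out of 1s inside a group
            cells[pos] = "X"
            pos += 1
            frames.append(snapshot())
        # A full group was crossed off: add one 1 to the output.
        quotient += 1
        frames.append(snapshot())
    return True, frames

ok, frames = divide_by_three("111111")
print(ok, frames[-1])
print(divide_by_three("1111")[0])
```

On 1111 the second group runs out after a single 1, so the input is rejected, which is the outcome the "Try to parse 1111" exercise points toward.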

More complex TM models

• Several tapes

• Several read/write heads

A Turing machine can simulate a computer.

Back to Linear Bounded Automata

[Figure: a control unit labeled "state" with a head on a tape; the input is enclosed by two boundary markers, $ … $]

• LBA is a TM whose read/write head never moves off the portion of the tape occupied by the input string
