
Theory of Computation

Unit I

N. Sairam, School of Computing, SASTRA University

August 12, 2014 1/83

Course Objectives

- Understand and develop mathematical models to simulate complex systems
- Analyze and examine alternate solutions to a problem
- Identify the suitable hardware and software to improve efficiency
- Determine the correctness and efficiency of a system design and implementation

Overview of the Syllabus

- Unit I
  - Introduction to basic concepts
  - Finite automata
  - Regular languages
  - Lex in syntactical analysis
  - Properties of regular languages
- Unit II
  - Context-free languages
  - Syntactical analysis using Yacc
  - Simplification and normal forms
- Unit III
  - Pushdown automata
  - Properties of CFLs
  - Turing machines
  - Other models of TM
- Unit IV
  - Hierarchy of formal languages and automata: RL, RE, CSG and the Chomsky hierarchy
  - Limits of algorithmic computation: undecidable problems, PCP
  - Other models of computation: recursive functions, rewriting systems
  - Overview of computational complexity


- Text Books
  - Peter Linz, An Introduction to Formal Languages and Automata, 5th Edition, Jones and Bartlett Learning International, United Kingdom, 2011.
  - Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, & Tools, Pearson Education, 2007.
  - John R. Levine, Tony Mason, Doug Brown, Lex and Yacc, O'Reilly Media, 1992.

Mathematical Preliminaries

- Sets
  - A set is a collection of elements without any structure other than membership
  - Example: $S = \{0, 1, 2\}$
  - Example: Ellipses are used whenever the meaning is clear, as in $\{a, b, c, \ldots, z\}$. A set can also be given by a defining property: $S = \{i : i > 0,\ i \text{ is even}\}$
- If x is an element of the set S, we write $x \in S$. The statement that x is not in S is written $x \notin S$.




Set Operations and Laws

- Union, Intersection and Difference
- Empty set: a set with no elements, denoted by φ
- De Morgan's laws:
  1. $\overline{S_1 \cup S_2} = \overline{S_1} \cap \overline{S_2}$
  2. $\overline{S_1 \cap S_2} = \overline{S_1} \cup \overline{S_2}$




Subset, disjoint sets, finite and infinite sets

- Subset: $S_1 \subseteq S$
- Proper subset: $S_1 \subset S$
- Disjoint sets: $S_1 \cap S_2 = \phi$
- Finite set: a set having a finite number of elements
- Cardinality of a set: the number of elements of the set, denoted by $|S|$
- Power set: the set of all subsets of a set, denoted by $2^S$
- If S is finite then $|2^S| = 2^{|S|}$
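The power set identity for finite sets can be checked on a small example. The following is a minimal sketch; the helper name `power_set` and the sample set are illustrative assumptions, not part of the slides.

```python
from itertools import chain, combinations

def power_set(s):
    """Return the set of all subsets of s, each represented as a frozenset."""
    items = list(s)
    # Enumerate subsets of every size r = 0, 1, ..., |s|.
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

S = {0, 1, 2}
P = power_set(S)
# For a finite set the power set has 2^|S| elements.
print(len(P) == 2 ** len(S))
```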

Cartesian Product and Partition

- Cartesian product: $S_1 \times S_2 = \{(x, y) : x \in S_1,\ y \in S_2\}$
- Partition: A set can be divided by separating it into a number of subsets. Suppose that $S_1, S_2, \ldots, S_n$ are subsets of a given set S and that the following hold:
  1. The subsets $S_1, S_2, \ldots, S_n$ are mutually disjoint
  2. $S_1 \cup S_2 \cup \cdots \cup S_n = S$
  3. None of the $S_i$ is empty
- Then $S_1, S_2, \ldots, S_n$ is called a partition of S
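The three partition conditions translate directly into a check on finite sets. This is a small sketch; the function name `is_partition` is a hypothetical helper introduced only for illustration.

```python
def is_partition(blocks, s):
    """Check whether the sets in `blocks` form a partition of the set `s`."""
    blocks = [set(b) for b in blocks]
    # Condition 3: none of the subsets is empty.
    if any(len(b) == 0 for b in blocks):
        return False
    # Condition 2: the union of the subsets is S.
    union = set().union(*blocks)
    if union != set(s):
        return False
    # Condition 1: mutually disjoint, i.e. no element is counted twice.
    return sum(len(b) for b in blocks) == len(union)

print(is_partition([{1, 2}, {3}], {1, 2, 3}))
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))
```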



Binary relation and its properties

- Relation: a subset of a Cartesian product
- Reflexive, antisymmetric and transitive: partial order relation
- Reflexive, symmetric and transitive: equivalence relation

Example

- Let us consider a relation ≡ on X
- Since x ≡ x ∀ x ∈ X, ≡ is reflexive
- Let x ≡ y for x, y ∈ X. x ≡ y ⇒ y ≡ x. ∴ ≡ is symmetric
- Let x ≡ y and y ≡ z for x, y, z ∈ X. x ≡ y and y ≡ z ⇒ x ≡ z. Hence ≡ is transitive
- Since ≡ is reflexive, symmetric and transitive, ≡ is an equivalence relation
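For a relation on a finite set, the three properties can be tested mechanically. The sketch below assumes a concrete relation chosen purely for illustration (congruence modulo 3 on {0, ..., 5}); the slides do not fix a particular ≡.

```python
def is_equivalence(X, R):
    """Check that R (a set of ordered pairs over X) is an equivalence relation."""
    reflexive = all((x, x) in R for x in X)
    symmetric = all((y, x) in R for (x, y) in R)
    transitive = all(
        (x, z) in R for (x, y) in R for (w, z) in R if y == w
    )
    return reflexive and symmetric and transitive

X = range(6)
# Illustrative relation: x ≡ y iff x and y leave the same remainder mod 3.
R = {(x, y) for x in X for y in X if x % 3 == y % 3}
print(is_equivalence(X, R))
```

A relation such as ≤ fails the symmetry test and is therefore not an equivalence relation, although it is a partial order.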

Functions

- Function: a rule that assigns to elements of one set a unique element of another set. If f denotes a function, then the first set is called the domain of f and the second set is its range. We write $f: S_1 \to S_2$ to indicate that the domain of f is a subset of $S_1$ and that the range of f is a subset of $S_2$
- If the domain of f is all of $S_1$, we say that f is a total function on $S_1$. Otherwise f is said to be a partial function

Big Oh, omega and theta notations

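- Example: $f(n) = 2n^2 + 3n$, $g(n) = n^3$ and $h(n) = 10n^2 + 100$
- $f(n) = O(g(n))$, $g(n) = \Omega(h(n))$, $f(n) = \Theta(h(n))$
- Order-of-magnitude expressions cannot be manipulated like ordinary algebraic expressions: $O(n) + O(n) = O(n)$, so writing $O(n) + O(n) = 2O(n)$ is not meaningful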

Graphs

- Definition of a graph
- Directed and undirected graphs
- Walk: a sequence of edges $(v_i, v_j), (v_j, v_k), \ldots, (v_m, v_n)$ is called a walk from $v_i$ to $v_n$
- Length of a walk: the total number of edges traversed in going from the initial vertex to the final one
- Path: a walk with no edge repeated is said to be a path
- Simple path: a path is simple if no vertex is repeated
- Cycle: a walk from $v_i$ to itself with no repeated edges is called a cycle with base $v_i$
- Loop: an edge from a vertex to itself is called a loop

Trees

- Connected graph: a graph G(V, E) is said to be connected if there is a path between any two vertices of G
- Tree: a tree is a connected graph with no cycles
- Root of a tree: a specially designated vertex with no incoming edges
- Leaves: vertices without outgoing edges are called leaves
- Parent of a node: if there is an edge from $v_i$ to $v_j$ then $v_i$ is said to be the parent of $v_j$, and $v_j$ is the child of $v_i$

Trees

- Level of a vertex: the level associated with each vertex is the number of edges in the path from the root to the vertex
- Height of the tree: the largest level number of any vertex
- Sometimes an ordering is associated with the nodes at each level

Figure: Example

Proof Techniques

- Proof: a sequence of steps, each justified by accepted rules of deductive reasoning
- Types of proof techniques: direct proof, method of induction and method of contradiction
- Proof by induction
  - Let $P_1, P_2, \ldots$ be a sequence of statements we want to prove to be true. Furthermore, suppose that the following hold:
    - For some $k \geq 1$, we know that $P_1, P_2, \ldots, P_k$ are true
    - The problem is such that for $n \geq k$, the truths of $P_1, P_2, \ldots, P_n$ imply the truth of $P_{n+1}$
  - We can then use induction to show that every statement in this sequence is true

Proof by Induction

- The statements $P_1, P_2, \ldots, P_k$ are called the basis of the induction
- The step connecting $P_n$ with $P_{n+1}$ is called the inductive step

Example

Theorem. A binary tree is a tree in which no parent can have more than two children. Prove that a binary tree of height n has at most $2^n$ leaves.

Proof. Let l(n) denote the maximum number of leaves of a binary tree of height n. We want to show that $l(n) \leq 2^n$.

Basis: $l(0) = 1 = 2^0$, since a tree of height 0 can have no nodes other than the root.

Inductive assumption: Let $l(i) \leq 2^i$ for i = 0, 1, ..., n.

Inductive step: We want to prove the result for i = n+1. To get a binary tree of height n+1 from one of height n, we can create at most two leaves in place of each previous one. ∴ $l(n+1) = 2\,l(n) \leq 2 \times 2^n = 2^{n+1}$


Example for Proof by Contradiction

Theorem. $\sqrt{2}$ is irrational.

Proof. Let us assume that $\sqrt{2}$ is rational. ∴ $\sqrt{2} = \frac{n}{m}$ where m and n are integers without a common factor other than 1 ⇒ $2m^2 = n^2$. ∴ $n^2$ is even ⇒ n is even. We can write n as n = 2k. ∴ $2m^2 = 4k^2$ and $m^2 = 2k^2$. ∴ m is even. This contradicts the assumption that m and n have no common factors other than 1. Our assumption is wrong. Hence $\sqrt{2}$ is irrational.




Additional Exercises

1. Show that if $S_1$ and $S_2$ are finite sets with $|S_1| = n$ and $|S_2| = m$, then $|S_1 \cup S_2| \leq n + m$
2. Consider the relation between two sets defined by $S_1 \equiv S_2$ iff $|S_1| = |S_2|$. Show that this is an equivalence relation.
3. Show that if $f(n) = O(g(n))$ and $g(n) = O(f(n))$, then $f(n) = \Theta(g(n))$
4. Show that $2^n = O(3^n)$
5. Show that $n^2 + 5 \log n = O(n^2)$ and $3^n = O(n!)$
6. Show that if $f(n) = O(n^2)$ and $g(n) = O(n^3)$ then $f(n) + g(n) = O(n^3)$
7. Show that for all $n \geq 4$ the inequality $2^n < n!$ holds

Three Basic Concepts

1. Languages

2. Grammars

3. Automata


Languages

- Alphabet: a non-empty set $\Sigma$ of symbols
- String: a finite sequence of symbols from the alphabet. Example: if the alphabet is $\Sigma = \{a, b\}$, then abab and aaabbba are strings on $\Sigma$
- Note: We use a, b, c, ... for elements of $\Sigma$ and u, v, w, ... for string names. We write w = abaaa to indicate that the string named w has the specific value abaaa
- Concatenation of two strings: the concatenation of two strings w and v is the string obtained by appending the symbols of v to the right end of w. That is, if $w = a_1 a_2 \cdots a_n$ and $v = b_1 b_2 \cdots b_m$, then the concatenation of w and v, denoted by wv, is $wv = a_1 a_2 \cdots a_n b_1 b_2 \cdots b_m$

Language Contd..

- Reverse of a string: the string obtained by writing the symbols in reverse order. If $w = a_1 a_2 \cdots a_n$, then its reverse is denoted by $w^R$ and is defined as $w^R = a_n \cdots a_2 a_1$
- Length of a string: denoted by |w|, it is the number of symbols in the string
- Empty string: a string with no symbols at all is called the empty string. It will be denoted by λ. The following simple relations hold for all w:
  - |λ| = 0
  - λw = wλ = w
- Substring of w: any string of consecutive symbols in w is said to be a substring of w. If w = vu, then the substrings v and u are said to be a prefix and a suffix of w respectively. For example, if w = abbab, then {λ, a, ab, abb, abba, abbab} is the set of all prefixes of w, while bab, ab, b are some of its suffixes
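The string operations above map directly onto ordinary string slicing. The following sketch uses Python strings, with the empty string `""` standing in for λ; the helper names `prefixes` and `suffixes` are illustrative assumptions.

```python
def prefixes(w):
    """All prefixes of w, from the empty string up to w itself."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w):
    """All suffixes of w, from w itself down to the empty string."""
    return [w[i:] for i in range(len(w) + 1)]

w = "abbab"
print(prefixes(w))   # ['', 'a', 'ab', 'abb', 'abba', 'abbab']
print(suffixes(w))   # ['abbab', 'bbab', 'bab', 'ab', 'b', '']
print(w[::-1])       # the reverse of w: 'babba'
print(len(""))       # |λ| = 0
```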

Language Contd..

- $w^n$ stands for the string obtained by repeating w n times
- $w^0 = \lambda$ for all w
- If $\Sigma$ is an alphabet, then we use $\Sigma^*$ to denote the set of strings obtained by concatenating zero or more symbols from $\Sigma$. The set $\Sigma^*$ always contains λ
- To exclude the empty string, we define $\Sigma^+ = \Sigma^* - \{\lambda\}$
- A language is a subset of $\Sigma^*$
- A string in a language L will be called a sentence of L

Language Contd..

- Complement of L: $\overline{L} = \Sigma^* - L$
- Reverse of L: $L^R = \{w^R : w \in L\}$
- Concatenation of two languages: $L_1 L_2 = \{xy : x \in L_1,\ y \in L_2\}$
- $L^n$ is defined as L concatenated with itself n times
- $L^0 = \{\lambda\}$ and $L^1 = L$
- $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$
- $L^+ = L^1 \cup L^2 \cup \cdots$
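For finite languages these operations can be tried out with sets of strings. The sketch below is an illustration only: the function names are assumptions, λ is represented by the empty string, and since L* is infinite in general it is approximated by the finite union of L^0 through L^k.

```python
def concat(L1, L2):
    """Concatenation L1 L2 = { xy : x in L1, y in L2 }."""
    return {x + y for x in L1 for y in L2}

def power(L, n):
    """L^n, with L^0 = {λ} (λ is the empty string here)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result

def star_up_to(L, k):
    """Finite approximation of L*: the union of L^0, L^1, ..., L^k."""
    out = set()
    for n in range(k + 1):
        out |= power(L, n)
    return out

print(concat({"10", "1"}, {"011", "11"}))
print(sorted(star_up_to({"a"}, 3), key=len))
```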

Grammars

Definition. A grammar G is defined as a quadruple G = (N, T, P, S) where

- N is a set of symbols called non-terminals
- T is a set of symbols called terminals
- P is a non-empty set of rules called production rules of the form α → β, where $\alpha \in (N \cup T)^+$ contains at least one non-terminal and $\beta \in (N \cup T)^*$
- S ∈ N is called the start symbol
- It will be assumed that the sets N and T are non-empty and disjoint

Grammars Contd..

- → denotes replacement
- Given a string w of the form w = uxv, we may use the production x → y, thereby obtaining a new string z = uyv
- The above can be written as w ⇒ z
- We say that w derives z
- Successive strings are derived by applying the productions of the grammar in arbitrary order
- If $w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n$, we write $w_1 \Rightarrow^* w_n$

Language Generated by a grammar G

Definition. Let G = (N, T, P, S) be a grammar. Then the set $L(G) = \{w \in T^* : S \Rightarrow^* w\}$ is the language generated by G

- If w ∈ L(G), then the sequence $S \Rightarrow w_1 \Rightarrow \cdots \Rightarrow w_n \Rightarrow w$ is a derivation of the sentence w.
- The strings $S, w_1, w_2, \ldots, w_n$, which contain variables as well as terminals, are called sentential forms of the derivation

Examples

1. Consider the grammar G = ({S}, {a, b}, P, S), with P given by S → aSb, S → λ. What language does the grammar with the above productions generate?
2. Find a grammar that generates $L = \{a^n b^{n+1} : n > 0\}$
3. Let $\Sigma = \{a, b\}$. Construct a grammar for the language $L = \{w : n_a(w) = n_b(w)\}$ where $w \in \Sigma^*$

- Two grammars $G_1$ and $G_2$ are equivalent if they generate the same language, that is, if $L(G_1) = L(G_2)$
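One way to explore the first example is to enumerate short sentences by applying productions mechanically. The following is a rough sketch under stated assumptions: variables are single uppercase letters, terminals are lowercase, λ is the empty string, only the leftmost variable is expanded, and the search stops once a sentential form has more than `max_len` terminals. It terminates for grammars such as this one where every rule either adds a terminal or removes a variable; the function name `generate` is hypothetical.

```python
from collections import deque

def generate(productions, start, max_len):
    """Terminal strings with at most max_len symbols derivable from start."""
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            results.add(form)          # no variables left: a sentence
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

P = {"S": ["aSb", ""]}                 # S -> aSb | λ
print(sorted(generate(P, "S", 6), key=len))
```

The enumeration suggests that every generated sentence consists of some number of a's followed by the same number of b's.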


Automata

Definition. Automaton: an abstract model of a digital computer

- Has a mechanism for reading input
- It is assumed that the input is a string over a given alphabet, written on an input file, which the automaton can read but not change
- The input file is divided into cells, each of which can hold one symbol
- Input is read from left to right, one symbol at a time

Automata

- The input mechanism can also detect the end of the input string
- Can produce output in some form
- May have a temporary storage device, consisting of an unlimited number of cells, each capable of holding a single symbol from an alphabet (not necessarily the same one as the input alphabet)
- The automaton can read and change the contents of the storage cells
- The automaton has a control unit, which can be in any one of a finite number of internal states and which can change state in some defined manner

Automata

- An automaton is assumed to operate in a discrete time frame
- At any given time, the control unit is in some internal state, and the input mechanism is scanning a particular symbol on the input file
- The internal state of the control unit at the next time step is determined by the next-state or transition function
- It gives the next state in terms of the current state, the current input symbol and the information currently in the temporary storage
- During the transition from one time interval to the next, output may be produced or the content of the temporary storage changed

Automata

Definition. Configuration: refers to a particular state of the control unit, input file and temporary storage

Definition. Move: the transition of the automaton from one configuration to the next is called a move

Definition. Deterministic automaton: each move is uniquely determined by the current configuration

Definition. Nondeterministic automaton: an automaton that is not deterministic, that is, the current configuration may allow more than one possible move

Automata

Definition. An automaton whose output response is limited to a simple "yes" or "no" is called an accepter. An automaton that produces strings of symbols as output is called a transducer

Assignment II

- Submission date: on or before 30.07.2014

1. Find a grammar over $\Sigma = \{a, b\}$ that generates the set of all strings with exactly one a
2. What language does the grammar with the productions given below generate? S → aA, A → B, B → Aa
3. Find a grammar for the language $L = \{w : n_a(w) = n_b(w) + 1\}$. Assume $\Sigma = \{a, b\}$

Deterministic Finite Accepters

- Definition of DFA
- Transition graph
- Extended transition function
  - $\delta^* : Q \times \Sigma^* \to Q$
  - The second argument of $\delta^*$ is a string, rather than a single symbol, and its value gives the state the automaton will be in after reading the string
  - For example, if $\delta(q_0, a) = q_1$ and $\delta(q_1, b) = q_2$, then $\delta^*(q_0, ab) = q_2$
  - Formally we can define $\delta^*$ recursively by

    $\delta^*(q, \lambda) = q$ (1)

    $\delta^*(q, wa) = \delta(\delta^*(q, w), a)$ (2)

  - for all $q \in Q$, $w \in \Sigma^*$, $a \in \Sigma$
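The recursive definition of the extended transition function unrolls into a simple loop over the input string. This is a minimal sketch: the dictionary encoding of δ is an assumption made for illustration, and it contains only the two transitions mentioned in the example, so any other (state, symbol) pair raises a `KeyError`.

```python
def delta_star(delta, q, w):
    """Extended transition function: the state reached from q after reading w."""
    for a in w:
        # Rule (2): delta*(q, wa) = delta(delta*(q, w), a)
        q = delta[(q, a)]
    # Rule (1): for the empty string the state is unchanged.
    return q

delta = {("q0", "a"): "q1", ("q1", "b"): "q2"}
print(delta_star(delta, "q0", "ab"))   # q2
print(delta_star(delta, "q0", ""))     # q0
```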

Languages and DFA’s

Definition. Language accepted by a DFA M: the set of all strings on $\Sigma$ accepted by M. Formally, $L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \in F\}$

Definition. Language not accepted by a DFA: $\overline{L(M)} = \{w \in \Sigma^* : \delta^*(q_0, w) \notin F\}$

Definition. Trap state: a state q from which the DFA can never escape once it has entered it is called a trap state

Example 1

- Construct a DFA for the language $L = \{a^n b : n \geq 0\}$

Figure: transition graph of the DFA with states q0 (start), q1, q2

Example 2

- Find a deterministic finite accepter that recognizes the set of all strings on $\Sigma = \{a, b\}$ starting with the prefix ab

Figure: transition graph of the DFA with states q0 (start), q1, q2, q3

Example 3

- Find a DFA that accepts all the strings on {0, 1}, except those containing the substring 001

Figure: transition graph of the DFA with states labeled λ (start), 0, 00, 001

Regular Languages

Definition. A language L is called regular if and only if there exists some DFA M such that L = L(M)

- Show that the language $L = \{awa : w \in \{a, b\}^*\}$ is regular

Figure: transition graph of a DFA for L with states q0 (start), q1, q2, q3

Non Deterministic Finite Accepters

- Definition
- Construct an NFA that accepts the strings λ, 1010 and 101010 but not 110 and 10100

Figure: transition graph of the NFA with states q0 (start), q1, q2

Language accepted by a NFA

- $L(M) = \{w \in \Sigma^* : \delta^*(q_0, w) \cap F \neq \phi\}$
- Dead configuration: if a transition from a state on an input is undefined, the situation is called a dead configuration
- Why nondeterminism?

Why Non Determinism?

- Many deterministic algorithms require that one make a choice at some stage
- Example: a game-playing program
  - Frequently, the best move is not known, but can be found using an exhaustive search with backtracking
  - When several alternatives are possible, we choose one and follow it until it becomes clear whether or not it was best
  - If not, we retreat to the last decision point and explore the other choices
  - A nondeterministic algorithm that can make the best choice would be able to solve the problem without backtracking, but a deterministic one can simulate nondeterminism with some extra work
- Nondeterministic machines can serve as models of search-and-backtrack algorithms

Equivalence of NFA and DFA

Definition. Two finite accepters $M_1$ and $M_2$ are equivalent if $L(M_1) = L(M_2)$

- Let $M = (\{q_0, q_1\}, \{0, 1\}, \delta, q_0, \{q_1\})$ be an NFA where $\delta(q_0, 0) = \{q_0, q_1\}$, $\delta(q_0, 1) = \{q_1\}$, $\delta(q_1, 0) = \phi$, $\delta(q_1, 1) = \{q_0, q_1\}$. Construct an equivalent DFA
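The construction asked for here can be carried out mechanically by treating each set of NFA states as one DFA state. The sketch below is an illustration, not the worked solution from the lecture: it assumes the NFA has no λ-transitions (true for this M), encodes δ as a dictionary with missing entries meaning the empty set, and uses the hypothetical function name `nfa_to_dfa`.

```python
def nfa_to_dfa(nfa_delta, alphabet, start, finals):
    """Subset construction for an NFA without lambda-transitions."""
    start_set = frozenset([start])
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        x = todo.pop()
        for a in alphabet:
            # Union of the NFA moves on symbol a from every state in x.
            y = frozenset(t for s in x for t in nfa_delta.get((s, a), ()))
            dfa_delta[(x, a)] = y
            if y not in seen:
                seen.add(y)
                todo.append(y)
    # A DFA state is final if it contains at least one NFA final state.
    dfa_finals = {x for x in seen if x & set(finals)}
    return seen, dfa_delta, start_set, dfa_finals

nfa_delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q1"},
    ("q1", "1"): {"q0", "q1"},   # delta(q1, 0) is empty, so it is omitted
}
states, dfa_delta, start_set, dfa_finals = nfa_to_dfa(nfa_delta, "01", "q0", {"q1"})
for (x, a), y in sorted(dfa_delta.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(x), a, "->", sorted(y))
```

For this M the reachable DFA states are {q0}, {q1}, {q0, q1} and the empty set, which acts as a trap state.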

Regular Languages and Regular Grammars

Definition. Regular expression: Let $\Sigma$ be a given alphabet. Then

1. φ, λ and $a \in \Sigma$ are all regular expressions. These are called primitive regular expressions
2. If $r_1$ and $r_2$ are regular expressions, so are $r_1 + r_2$, $r_1 r_2$, $r_1^*$ and $(r_1)$
3. A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in step 2

Examples

1. If $L_1 = \{10, 1\}$ and $L_2 = \{011, 11\}$, then $L_1 L_2 = \{10011, 1011, 111\}$. Also $\{10, 11\}^* = \{\lambda, 10, 11, 1010, 1011, 1110, 1111, \ldots\}$
2. The regular expression 00 denotes the language {00}
3. The regular expression (0+1)* denotes the language of all strings of 0's and 1's
4. The regular expression (0+1)*00(0+1)* denotes the language of strings over 0's and 1's having at least two consecutive zeros
5. The regular expression (1+10)* denotes the language of strings over 0's and 1's beginning with 1 and not having two consecutive zeros, together with the empty string λ


Examples Contd..

6. The regular expression (0+1)*011 denotes the language of strings over 0's and 1's ending with 011
7. 0*1*2* denotes any number of 0's followed by any number of 1's followed by any number of 2's
8. a+ba* denotes the set of strings consisting of either a single a or a b followed by zero or more a's
9. aa+ab+ba+bb denotes all strings over a and b of length two
10. (aa+ab+ba+bb)* denotes all strings over a and b of even length
11. In the programming language C, a variable name consists of letters, digits and the underscore symbol, and it must begin with a letter or underscore. This can be described by the regular expression [a-zA-Z_][a-zA-Z_0-9]*

Examples Contd..

12. An integer constant is an optional sign followed by a non-empty sequence of digits: [+-]?[0-9]+
13. A floating point constant is denoted by [+-]?((([0-9]+\.[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?)|[0-9]+[eE][+-]?[0-9]+)
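These lexical patterns can be tried out with a regular expression library. The sketch below uses Python's `re` module rather than Lex, and writes the patterns with the underscore, the Kleene stars and the escaped decimal points spelled out; that reading of the patterns is an assumption, since the characters are easily lost in transcription. The helper `matches` is hypothetical.

```python
import re

IDENT = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")
INTEGER = re.compile(r"[+-]?[0-9]+")
FLOAT = re.compile(
    r"[+-]?((([0-9]+\.[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?)|[0-9]+[eE][+-]?[0-9]+)"
)

def matches(pattern, text):
    """True if the whole of text is described by the pattern."""
    return pattern.fullmatch(text) is not None

print(matches(IDENT, "_tmp1"), matches(IDENT, "1abc"))
print(matches(INTEGER, "-42"), matches(INTEGER, "+"))
print(matches(FLOAT, "3.14"), matches(FLOAT, "1e10"), matches(FLOAT, "42"))
```

Note that a plain digit string such as 42 is an integer constant but not a floating point constant under these patterns, because the floating point pattern requires a decimal point or an exponent.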

Laws of Regular Expressions

- Two regular expressions R and S are said to be equal if they denote the same language
- R + S = S + R
- R + (S + T) = (R + S) + T
- R(ST) = (RS)T
- R(S + T) = RS + RT
- (S + T)R = SR + TR
- λR = Rλ = R

Languages Associated with Regular Expressions

- Exhibit the language L(a*(a+b)) in set notation

  L(a*(a+b)) = L(a*)L(a+b) = {a, aa, aaa, ..., b, ab, aab, ...}

- If r = (aa)*(bb)*b then find L(r)

  $L(r) = \{a^{2n} b^{2m+1} : n \geq 0,\ m \geq 0\}$



RE to NFA

- Thompson's algorithm
  - Let r be a regular expression. Then there exists some NFA that accepts L(r).
- Subset construction algorithm:

    while there is an unmarked state x = {s1, s2, ..., sn} of D do
    begin
        mark x
        for each input symbol a do
        begin
            let T be the set of states to which there is a transition on a from some state si in x
            y = λ-CLOSURE(T)
            if y has not yet been added to the set of states of D then
                make y an "unmarked" state of D
            add a transition from x to y labeled a if not already present
        end
    end

Minimizing the Number of States

- Input: a DFA M with set of states K, inputs I, transitions defined for all states and inputs, initial state q0 and set of final states F
- Output: a DFA M' accepting the same language as M and having as few states as possible
- Method: construct a partition π of the set of states. Initially π consists of two groups, the final states F and the non-final states K − F. Then we construct a new partition $\pi_{new}$ by the following procedure

Algorithm Contd..

    for each group G of π do
    begin
        partition G into subgroups such that two states s and t of G are in the
            same subgroup iff for all input symbols a, states s and t have
            transitions to states in the same group of π
        place all subgroups so formed in π_new
    end
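The refinement procedure can be sketched in a few lines. The code below is an illustration under stated assumptions: δ is a dictionary defined for every (state, symbol) pair, the function names are hypothetical, and the pass is repeated until the partition stops changing, a stopping rule that the slide text leaves implicit.

```python
def refine(partition, delta, alphabet):
    """One pass: split each group by the groups its states move to."""
    group_of = {s: i for i, g in enumerate(partition) for s in g}
    new_partition = []
    for g in partition:
        buckets = {}
        for s in g:
            # States stay together iff they agree on the target group for every symbol.
            signature = tuple(group_of[delta[(s, a)]] for a in alphabet)
            buckets.setdefault(signature, set()).add(s)
        new_partition.extend(frozenset(b) for b in buckets.values())
    return new_partition

def minimize_partition(states, finals, delta, alphabet):
    """Groups of equivalent states, starting from {F, K - F}."""
    finals = frozenset(finals)
    partition = [g for g in (finals, frozenset(states) - finals) if g]
    while True:
        new_partition = refine(partition, delta, alphabet)
        # Refinement only splits groups, so an equal count means no change.
        if len(new_partition) == len(partition):
            return set(new_partition)
        partition = new_partition

# Illustrative DFA over {a, b}: states 1 and 2 are both final and behave identically.
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2, (1, "b"): 0, (2, "a"): 1, (2, "b"): 0}
print(minimize_partition({0, 1, 2}, {1, 2}, delta, "ab"))
```

Each group of the final partition becomes one state of the minimized DFA.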

Regular Expressions for Regular Languages

Definition. A generalized transition graph (GTG) is a transition graph whose edges are labeled with regular expressions. The label of any walk from the initial state to a final state is the concatenation of several regular expressions and hence itself a regular expression. The strings denoted by such regular expressions are a subset of the language accepted by the generalized transition graph, with the full language being the union of all such generated subsets

Example

- The figure given below represents a generalized transition graph. The language accepted by it is L(a* + a*(a+b)c*)
- The edge (q0, q0) labeled a is a cycle that can generate any number of a's, that is, it represents L(a*)

Figure: generalized transition graph with states q0 (start) and q1 and edge labels a, a+b, c*

GTG Contd..

- The graph of any nondeterministic finite accepter can be considered a generalized transition graph if the edge labels are interpreted properly
- An edge labeled with a single symbol a is interpreted as an edge labeled with the expression a, while an edge labeled with multiple symbols a, b, ... is interpreted as an edge labeled with the expression a+b+...
- From the above observation, it follows that for every regular language, there exists a generalized transition graph that accepts it
- Conversely, every language accepted by a generalized transition graph is regular
- A complete GTG is a graph in which all edges are present
- If a GTG, after conversion from an nfa, has some edges missing, put them in and label them with φ

GTG Contd..

- A complete GTG with |V| vertices has exactly $|V|^2$ edges

Example

- The GTG given below is not complete

Figure: GTG with states q0, q1, q2 and edge labels a, b, c, d, e

Example Contd..

- The figure given below shows how it is completed

Figure: completed GTG with states q0, q1, q2; the original edges a, b, c, d, e are kept and the missing edges are added with label φ

Example

- Let us consider the simple two-state complete GTG shown below

Figure: complete GTG with states q0 (start) and q1 and edge labels r1, r2, r3, r4

- Mentally tracing through this GTG we can convince ourselves that the regular expression $r = r_1^* r_2 (r_4 + r_3 r_1^* r_2)^*$ covers all possible paths and so is the correct regular expression associated with the graph
- When a GTG has more than two states, we can find an equivalent graph by removing one state at a time

NFA to rex

1. Start with an nfa with states $q_0, q_1, \ldots, q_n$ and a single final state, distinct from its initial state
2. Convert the nfa into a complete generalized transition graph. Let $r_{ij}$ stand for the label of the edge from $q_i$ to $q_j$
3. If the GTG has only two states, with $q_i$ as its initial state and $q_j$ its final state, its associated regular expression is $r = r_{ii}^* r_{ij} (r_{jj} + r_{ji} r_{ii}^* r_{ij})^*$
4. If the GTG has three states, with initial state $q_i$, final state $q_j$ and third state $q_k$, introduce new edges labeled $r_{pq} + r_{pk} r_{kk}^* r_{kq}$ for p = i, j and q = i, j. When this is done, remove vertex $q_k$ and its associated edges

NFA to rex

5. If the GTG has four or more states, pick a state $q_k$ to be removed. Apply rule 4 for all pairs of states $(q_i, q_j)$, i ≠ k, j ≠ k. At each step apply the simplifying rules

   r + φ = r

   rφ = φ

   φ* = λ

   wherever possible. When this is done, remove state $q_k$
6. Repeat steps 3 to 5 until the correct regular expression is obtained

Regular Grammars

Definition. Right linear grammar: a grammar G = (N, T, P, S) is said to be right linear if all the productions are of the form A → xB or A → x, where A, B ∈ N and $x \in T^*$.

Definition. Left linear grammar: a grammar is said to be left linear if all productions are of the form A → Bx or A → x.

Definition. A regular grammar is one that is either right linear or left linear

Example

- A grammar with productions S → abS | a is right linear
- A grammar with productions $S \to S_1 ab$, $S_1 \to S_1 ab \mid S_2$, $S_2 \to a$ is left linear
- A grammar with productions S → A, A → aB | λ, B → Ab is neither right linear nor left linear and therefore is not regular. The grammar is called a linear grammar

Definition. A linear grammar is a grammar in which at most one variable can occur on the right side of any production, without restriction on the position of this variable. A regular grammar is always linear, but not all linear grammars are regular

Regular grammar to Finite automaton

- $V_0 \to aV_1$, $V_1 \to abV_0 \mid b$, where $V_0$ is the start variable

Figure: automaton with states V0 (start), V1, Vf and edge labels a, b, ab

FA to regular grammar

- Consider the NFA with the following transitions: $\delta(q_0, a) = q_1$, $\delta(q_1, a) = q_2$, $\delta(q_2, b) = q_2$, $\delta(q_2, a) = q_f$, with $q_f \in F$
- The production based on the first transition is $q_0 \to a q_1$
- The production based on the second transition is $q_1 \to a q_2$. Similarly, $q_2 \to b q_2$, $q_2 \to a q_f$, $q_f \to \lambda$
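The conversion follows a fixed pattern: one production per transition, plus a λ-production for each final state. The following sketch shows this on the transitions above; the triple encoding of transitions and the function name `fa_to_right_linear` are assumptions made for illustration.

```python
def fa_to_right_linear(transitions, finals):
    """Right-linear productions from (state, symbol, next_state) triples."""
    # Each transition delta(q, a) = p gives the production q -> a p.
    productions = [f"{q} -> {a}{p}" for (q, a, p) in transitions]
    # Each final state q gives the production q -> λ.
    productions += [f"{q} -> λ" for q in sorted(finals)]
    return productions

transitions = [
    ("q0", "a", "q1"),
    ("q1", "a", "q2"),
    ("q2", "b", "q2"),
    ("q2", "a", "qf"),
]
for rule in fa_to_right_linear(transitions, {"qf"}):
    print(rule)
```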

Properties of Regular Languages

- Closure properties of regular languages
  - If $L_1$ and $L_2$ are regular languages, then so are $L_1 \cup L_2$, $L_1 \cap L_2$, $L_1 L_2$, $\overline{L_1}$ and $L_1^*$
  - Proof: If $L_1$ and $L_2$ are regular, then there exist regular expressions $r_1$ and $r_2$ such that $L_1 = L(r_1)$ and $L_2 = L(r_2)$. By definition $r_1 + r_2$, $r_1 r_2$ and $r_1^*$ are regular expressions denoting the languages $L_1 \cup L_2$, $L_1 L_2$ and $L_1^*$, respectively
  - Thus closure under union, concatenation and star-closure is immediate

Closure under complementation and Intersection

- To show closure under complementation, let $M = (K, I, \delta, q_0, F)$ be a dfa that accepts $L_1$. Then the dfa $\overline{M} = (K, I, \delta, q_0, K - F)$ accepts $\overline{L_1}$
- To show closure under intersection use $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$
- To show closure under difference use $L_1 - L_2 = L_1 \cap \overline{L_2}$

Closure under other operations

Definition. Suppose $\Sigma$ and $\Gamma$ are alphabets. Then a function $h: \Sigma \to \Gamma^*$ is called a homomorphism

- A homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function h is extended to strings: if $w = a_1 a_2 \cdots a_n$, then $h(w) = h(a_1) h(a_2) \cdots h(a_n)$
- If L is a language on $\Sigma$, then its homomorphic image is defined as $h(L) = \{h(w) : w \in L\}$

Example for homomorphism

- Let $\Sigma = \{a, b\}$ and $\Gamma = \{a, b, c\}$ and define h by h(a) = ab, h(b) = bbc. Then h(aba) = abbbcab
- The homomorphic image of L = {aa, aba} is the language h(L) = {abab, abbbcab}
- Let $\Sigma = \{a, b\}$ and $\Gamma = \{b, c, d\}$. Define h by h(a) = dbcc, h(b) = bdc. If L is the regular language denoted by $r = (a + b^*)(aa)^*$, then $r_1 = (dbcc + (bdc)^*)(dbccdbcc)^*$ denotes the regular language h(L)
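A homomorphism on strings is a symbol-by-symbol substitution, which makes the first example easy to reproduce. This is a minimal sketch; the dictionary encoding of h and the function names `apply_h` and `image` are illustrative assumptions.

```python
def apply_h(h, w):
    """Extend h from symbols to strings: h(a1 a2 ... an) = h(a1) h(a2) ... h(an)."""
    return "".join(h[a] for a in w)

def image(h, L):
    """Homomorphic image of a finite language L."""
    return {apply_h(h, w) for w in L}

h = {"a": "ab", "b": "bbc"}
print(apply_h(h, "aba"))            # abbbcab
print(image(h, {"aa", "aba"}))
```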

Exercises

1. Let $\Sigma = \{a, b\}$. Find a grammar that generates
   1.1 $L_1 = \{a^n b^m : n \geq 0,\ m > n\}$
   1.2 $L_2 = \{a^n b^{2n} : n \geq 0\}$
   1.3 $L_1 L_2$
   1.4 $L_1 \cup L_2$
2. Find a grammar for the language $L = \{w : |w| \bmod 3 = 0,\ w \in \{a\}^*\}$
3. Find a grammar that generates the language $L = \{w w^R : w \in \{a, b\}^+\}$
4. Are the two grammars with respective productions S → aSb | ab | λ, and S → aAb | ab, A → aAb | λ equivalent? Assume S is the start symbol.

Exercises Contd..

5. Give dfa for the language L=ban:n≥1,n6=56. Convert the following nfa into an equivalent dfa

q0start q1 q2

λ,0

1

0,1

0

0

1

7. Find an nfa that accepts the language L(aa*(a+b))

8. Find a regular expression for the set {a^n b^m : (n+m) is even}

9. Prove that (r1∗)∗ ≡ r1∗

10. Construct a dfa that accepts the language generated by the grammar S → abA, A → baB, B → aA | bb

Right Quotient

Definition
Let L1 and L2 be languages on the same alphabet. Then the right quotient of L1 with L2 is defined as L1/L2 = {x : xy ∈ L1 for some y ∈ L2}.

Theorem
If L1 and L2 are regular languages, then L1/L2 is also regular. We say that the family of regular languages is closed under right quotient with a regular language.

Right Quotient Contd..

Proof.
Let L1 = L(M), where M = (K, I, δ, q0, F) is a dfa. We construct another dfa M' = (K, I, δ, q0, F') as follows.
For each qi ∈ K, determine whether there exists a y ∈ L2 such that δ∗(qi, y) = qf ∈ F.
This can be done by using the dfa's Mi = (K, I, δ, qi, F). The automaton Mi is M with the initial state q0 replaced by qi.

Now let us determine whether there exists a y in L(Mi) that is also in L2.

Find the transition graph for L2 ∩ L(Mi). If there is any path between its initial vertex and any final vertex, then L2 ∩ L(Mi) is not empty. In this case, add qi to F'. Repeating this for every qi ∈ K, we determine F' and thereby construct M'.

Proof Continued

Proof (Cont.)

To prove that L(M') = L1/L2, let x be any element of L1/L2. Then there must exist a y ∈ L2 such that xy ∈ L1.
⇒ δ∗(q0, xy) ∈ F, so that there must be some q ∈ K such that δ∗(q0, x) = q and δ∗(q, y) ∈ F.
∴ by construction, q ∈ F', and M' accepts x because δ∗(q0, x) is in F'.
Conversely, for any x accepted by M', we have δ∗(q0, x) = q ∈ F'.
Again by construction, this implies that there exists a y ∈ L2 such that δ∗(q, y) ∈ F.
∴ xy is in L1 and x is in L1/L2.
∴ we conclude that L(M') = L1/L2, and from this that L1/L2 is regular.
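
The construction in the proof can be sketched as follows, assuming both languages are given as complete dfas stored as tuples (K, I, delta, q0, F). The emptiness test for L2 ∩ L(Mi) is done by a breadth-first search over pairs of states. The function names right_quotient, meets_l2 and accepts are hypothetical, and the transition tables for a*baa* and ab* are assumptions written for this sketch, not copied from a figure.

```python
from collections import deque

def right_quotient(m1, m2):
    """Return a dfa for L(m1)/L(m2): same dfa as m1 with new final states."""
    k1, sigma, d1, s1, f1 = m1
    _, _, d2, s2, f2 = m2

    def meets_l2(qi):
        """True if some y in L(m2) takes m1 from qi into F."""
        start = (qi, s2)
        seen, queue = {start}, deque([start])
        while queue:
            p, q = queue.popleft()
            if p in f1 and q in f2:
                return True
            for a in sigma:
                nxt = (d1[p, a], d2[q, a])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    new_finals = {q for q in k1 if meets_l2(q)}
    return (k1, sigma, d1, s1, new_finals)

def accepts(m, w):
    _, _, delta, state, finals = m
    for symbol in w:
        state = delta[state, symbol]
    return state in finals

# Assumed dfa for L1 = L(a*baa*); q3 is a trap state
M1 = ({"q0", "q1", "q2", "q3"}, {"a", "b"},
      {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "a"): "q2", ("q1", "b"): "q3",
       ("q2", "a"): "q2", ("q2", "b"): "q3", ("q3", "a"): "q3", ("q3", "b"): "q3"},
      "q0", {"q2"})
# Assumed dfa for L2 = L(ab*); pt is a trap state
M2 = ({"p0", "p1", "pt"}, {"a", "b"},
      {("p0", "a"): "p1", ("p0", "b"): "pt", ("p1", "a"): "pt", ("p1", "b"): "p1",
       ("pt", "a"): "pt", ("pt", "b"): "pt"},
      "p0", {"p1"})

MQ = right_quotient(M1, M2)
print(sorted(MQ[4]))   # ['q1', 'q2']
```

Only the set of final states changes; states and transitions of the first dfa are reused unchanged, exactly as in the proof.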

Example

I Find L1/L2 for L1=L(a*baa*) and L2=L(ab*)

I Solution: The automaton for L1 is given below.

[Figure: transition graph of the dfa for L1, with states q0 (start), q1, q2 and q3; edge labels a, b, a, b, b, a and a,b]

Solution Contd..

I From the previous diagram we can construct the automaton shown below.

[Figure: transition graph with states q0 (start), q1, q2 and q3; edge labels a, b, a, b, b, a and a,b]

Solution Contd..

I From the first figure it is quite evident that
  I L(M0) ∩ L2 = ∅
  I L(M1) ∩ L2 = {a} ≠ ∅
  I L(M2) ∩ L2 = {a} ≠ ∅
  I L(M3) ∩ L2 = ∅

I ∴ the automaton accepting L1/L2 is obtained by making q1 and q2 the final states

I It is shown in the second figure

I The language accepted by the second automaton is L(a*b + a*baa*), which is equal to L(a*ba*)

I ∴ L1/L2 = L(a*ba*)

Example 2

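I If L1 = {a^n b^m : n ≥ 1, m ≥ 0} ∪ {ba} and L2 = {b^m : m ≥ 1}, then compute L1/L2

[Figure: transition graph of the dfa for L1, with states q0 (start), q1, q2, q3, q4 and q5; edge labels a, b, a, b, b, a, a, b, a,b and a,b]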

Example 2

I The automaton M' accepting L1/L2 is given below

[Figure: transition graph with states q0 (start), q1, q2, q3, q4 and q5; edge labels a, b, a, b, b, a, a, b, a,b and a,b]

Example 2

I L1/L2 = {a^n b^m : n ≥ 1, m ≥ 0}

Lexers and lexer generators

I What is a lexer?
  I A program that does lexical analysis

I Functions of a Lexer:
  I A lexer has to distinguish between several different types of tokens, for example numbers, variables and keywords. Each of these is described by its own regular expression.
  I A lexer does not check if its entire input is included in the languages defined by the regular expressions. Instead, it has to cut the input into pieces (tokens), each of which is included in one of the languages.
  I If there are several ways to split the input into legal tokens, the lexer has to decide which of these it should use.

What is a lexer generator?

I A program that takes a set of token definitions (each consisting of a regular expression and a token name) and generates a lexer is called a lexer generator.

I Simplest Approach:
  I Tokens are defined by regular expressions r1, r2, ..., rn
  I The regular expression r1 | r2 | ... | rn describes the union of the languages of r1, r2, ..., rn
  I The DFA constructed from this combined regular expression will scan for all token types at the same time.

How to distinguish between different token types?

1. Construct NFAs N1, N2, ..., Nn for each of r1, r2, ..., rn.

2. Mark the accepting states of the NFAs by the name of the tokens they accept.

3. Combine the NFAs to a single NFA by adding a new starting state which has epsilon-transitions to each of the starting states of the NFAs.

4. Convert the combined NFA to a DFA. Each accepting state of the DFA consists of a set of NFA states, some of which are accepting states which we marked by token type in step 2. These marks are used to mark the accepting states of the DFA, so each of these will indicate the token types it accepts.
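
The steps above can be sketched as follows. This is a minimal illustration, not how any particular lexer generator is implemented. The NFAs are written by hand instead of being built from regular expressions, the three-letter alphabet is an assumption made to keep the example small, and all names (combine, eps_closure, to_dfa, marks_for, IF_NFA, ID_NFA) are hypothetical.

```python
EPS = ""  # label used for epsilon-transitions

def combine(nfas):
    """Steps 2 and 3: mark accepting states, add one new start state."""
    trans, marks, starts = {}, {}, set()
    for idx, (start, delta, accepting, name) in enumerate(nfas):
        starts.add((idx, start))          # (idx, s) keeps state names disjoint
        for (s, a), targets in delta.items():
            trans.setdefault(((idx, s), a), set()).update((idx, t) for t in targets)
        for s in accepting:
            marks[(idx, s)] = name
    trans[("start", EPS)] = starts
    return "start", trans, marks

def eps_closure(states, trans):
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in trans.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def to_dfa(start, trans, marks, alphabet):
    """Step 4: subset construction; DFA states inherit the NFA marks."""
    d_start = eps_closure({start}, trans)
    d_trans, d_marks, todo, seen = {}, {}, [d_start], {d_start}
    while todo:
        current = todo.pop()
        found = {marks[s] for s in current if s in marks}
        if found:
            d_marks[current] = found
        for a in alphabet:
            moved = set()
            for s in current:
                moved |= trans.get((s, a), set())
            if moved:
                target = eps_closure(moved, trans)
                d_trans[current, a] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return d_start, d_trans, d_marks

def marks_for(dfa, w):
    """Token names marked on the DFA state reached by w (empty if none)."""
    state, d_trans, d_marks = dfa
    for ch in w:
        state = d_trans.get((state, ch))
        if state is None:
            return set()
    return d_marks.get(state, set())

ALPHABET = "ifx"  # assumed tiny alphabet
IF_NFA = (0, {(0, "i"): {1}, (1, "f"): {2}}, {2}, "IF")
ID_NFA = (0, {**{(0, c): {1} for c in ALPHABET},
              **{(1, c): {1} for c in ALPHABET}}, {1}, "ID")

DFA = to_dfa(*combine([IF_NFA, ID_NFA]), ALPHABET)
print(sorted(marks_for(DFA, "if")))   # ['ID', 'IF']
```

The string if reaches a DFA state that carries both marks, which is exactly the situation in which the lexer generator needs a rule for choosing between token types.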

What if the same accepting state accepts different tokens?

I Let the lexer generator generate an error and require the user to make sure the tokens are disjoint.

I Let the user of the lexer generator choose which of the tokens is preferred.

Note

I It can be quite difficult (though always possible) with regular expressions to define, e.g., the set of names that are not keywords.

I It is therefore common to let the lexer choose according to a prioritised list.

I The order in which tokens are defined in the input to the lexer generator indicates priority.

I Keywords are usually defined before variable names, which means that, for example, the string if is recognised as a keyword and not a variable name.

I When an accepting state in a DFA contains accepting NFA states with different marks, the mark corresponding to the highest-priority token is used.

Splitting the tokens

I The string if17 can be split in many different ways:
  I As one token, which is the variable name if17.
  I As the variable name if1 followed by the number 7.
  I As the keyword if followed by the number 17.
  I As the keyword if followed by the numbers 1 and 7.
  I As the variable name i followed by the variable name f17.
  I And several more.

I A common convention is that the longest prefix of the input that matches any token is chosen. Hence, the first of the above possible splittings of if17 will be chosen.

I Note that the principle of the longest match takes precedence over the order of definition of tokens, so even though the string starts with the keyword if, which has higher priority than variable names, the variable name is chosen because it is longer.
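
The two rules (longest match first, then order of definition) can be illustrated with a small sketch. It tries each pattern with Python's re module instead of running a generated DFA, so it only shows the selection rule. The token names, the patterns in TOKEN_SPEC and the function tokenize are assumptions. The patterns avoid alternation, so each re match is also the longest match for its pattern.

```python
import re

# Token definitions in priority order (earlier entries have higher priority).
TOKEN_SPEC = [
    ("KEYWORD", r"if"),
    ("ID",      r"[a-z][a-z0-9]*"),
    ("NUM",     r"[0-9]+"),
]

def tokenize(text):
    compiled = [(name, re.compile(rx)) for name, rx in TOKEN_SPEC]
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():       # whitespace only separates tokens
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, rx in compiled:
            m = rx.match(text, pos)
            # strict '>' keeps the earlier (higher-priority) rule on ties
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("if17"))    # [('ID', 'if17')]
print(tokenize("if 17"))   # [('KEYWORD', 'if'), ('NUM', '17')]
```

For the input if17 the ID rule wins because its match is longer. For the input if both KEYWORD and ID match two characters, and KEYWORD wins because it is defined first.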

Example

Lex: A Lexical Analyzer Generator

I What is Lex?
  I Lex is a tool that takes as input a set of regular expressions that describe tokens, creates a DFA that recognizes that set of tokens, and then creates C code that implements that DFA.
  I A lex file consists of (regular expression, action) pairs, where actions are represented by blocks of C code.
  I Given a lex file, lex creates a definition of the C function int yylex(void)

What is the function of yylex()?

I When the function yylex is called, the input file is examined to see which regular expression matches the next characters in the input file.

I The action associated with that regular expression is performed, and then lex continues looking for more regular expression matches.

Structure of a Lex File

Various Sections in the structure

1. Section One contains #includes and C definitions that can be used in the rest of the file.

2. Section Two contains simple name definitions and state declarations.

3. Section Three contains the lex rules, in the form of regular expression / action pairs. This is where we define the tokens we want the lexer to recognize, and what to do when we recognize a token.

Lex Contd.

I What happens if more than one regular expression matches the input?

I When there is more than one match, lex uses the following strategy:

1. Always match the longest possible string.
2. If two different rules match the same longest string, use the regular expression that appears first in the input file.

Named Regular Expressions

I Lex allows us to break regular expressions into smaller pieces, and give those regular expression fragments symbolic names.

I We can then use the symbolic names in other regular expressions, to make our rules more readable.

I Format: name regular expression

I Once a name is defined, it can be used in a regular expression for a lex rule by enclosing it in braces { and }.

I Example: DIGIT [0-9] defines DIGIT as a symbolic name for the regular expression [0-9]; a rule can then refer to it as {DIGIT}.
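
The effect of a named definition can be imitated outside lex with a short sketch. This is a Python analogue, not lex's own implementation. The function expand and the definitions LETTER and IDENT are hypothetical, with only DIGIT taken from the example above. Each {NAME} is replaced by its definition wrapped in a group so that operators following the braces apply to the whole fragment.

```python
import re

def expand(pattern, definitions):
    """Replace each {NAME} in pattern by its grouped definition."""
    def substitute(match):
        name = match.group(1)
        # definitions may themselves use earlier names, so expand recursively
        return "(?:" + expand(definitions[name], definitions) + ")"
    return re.sub(r"\{([A-Za-z_][A-Za-z0-9_]*)\}", substitute, pattern)

DEFINITIONS = {
    "DIGIT": "[0-9]",
    "LETTER": "[a-zA-Z]",              # hypothetical extra definition
    "IDENT": "{LETTER}({LETTER}|{DIGIT})*",
}

number = expand("{DIGIT}+", DEFINITIONS)
ident = expand("{IDENT}", DEFINITIONS)

print(number)                                    # (?:[0-9])+
print(re.fullmatch(number, "2014") is not None)
print(re.fullmatch(ident, "x27") is not None)
```

Repetition counts such as a{2,3} are left alone because the substitution only matches braces that contain an identifier.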

Tokens with Values

I How can we have yylex return both which token was matched, and the value of the token?

I We can use a global variable to communicate the extra information.

I yylex() can set the value of the global variable before returning the token type.

I The function that calls yylex() can then examine the value of this global variable to determine the value of the token.

I We can make use of yylval.

I Structure of yylval:
  union {
    int integer_value;
    char *string_value;
  } yylval;

I Why do we call this variable yylval instead of a more meaningful name like tokenValue?

I When we use lex in conjunction with another tool (yacc), we have to follow yacc's rather odd naming convention for the two tools to work together.
