cits2211 discrete structures lectures for semester 2 2017 ......1, 01, 001 rejected: 0, 00, 10 we...

21
Regular Expressions Kleene’s theorem CITS2211 Discrete Structures Lectures for Semester 2 2017 Regular Languages October 12, 2017

Upload: others

Post on 06-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

CITS2211 Discrete StructuresLectures for Semester 2 2017

Regular Languages

October 12, 2017

Page 2: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Highlights

What we know so far:

FSMs accept certain input strings

The set of all input strings accepted by an FSM is called thelanguage recognised by that FSM

So we can specify languages using techniques we alreadyknow for sets

But that is not very convenient.

Today we will study:

a compact way to describe the language recognised by a FSMusing regular expressions.

Page 3: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Why (study regular expressions)

Regular expressions (REs) are widely used in ComputerScience

REs are used to search text for strings that match certainpatterns

Linux grep, awk, Perl, text editors all provide regularexpression mechanisms

Programming languages such as Java, Python provide regularexpression libraries

Page 4: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Reading

Introduction to the Theory of Computation by Michael SipserChapter 1: Regular LanguagesSection 1.3 Regular Expressions

Page 5: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Lecture Outline: Regular Expressions

In this class we will study

Regular expressions

Regular languages

We will see that not all languages are regular

Page 6: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Regular Expressions

We need a compact way to describe the languages that can berecognized by finite state machines.

Define a regular expression over an alphabet Σ as follows:

1 The symbol ∅ and the symbol ε are regular expressions

2 Any symbol i ∈ Σ is a regular expression

3 If A and B are regular expressions, then so are (AB), (A + B)and (A)∗

Notice that this is a recursive definition, very similar to thedefinition of a well-formed formula. Brackets are only for groupingand can be omitted if the resulting expression is unambiguous.

Page 7: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Regular Languages

Associated with every regular expression is a collection of stringsknown as a regular set or regular language.

1 The regular expression ∅ represents the empty language ∅2 The regular expression ε represents the set {ε} containing only

of the empty string

3 The regular expression i represents {i}4 The regular expression AB represents the set of all strings of

the form αβ where α ∈ A and β ∈ B.

5 The regular expression A + B represents the union of theregular sets represented by A and B

6 The regular expression (A)∗ represents the set of allconcatenations of strings in the regular set represented by A.

Page 8: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Examples of regular languages

1 0∗ represents the set {ε, 0, 00, 000, 0000, . . .}2 0 + 1∗ represents the set {0, ε, 1, 11, 111, 1111, . . .}3 0∗1∗ represents the set

ε 1 11 111 1111 . . .0 01 011 0111 01111 . . .00 001 0011 00111 001111 . . ....

......

......

...

4 (0∗)(0 + 1∗) represents the set

0 ε 1 11 111 1111 . . .00 0 01 011 0111 01111 . . .000 00 001 0011 00111 001111 . . ....

......

......

...

Page 9: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Non-regular languages

Q: Are all languages regular ?A: No

We can use a diagonalization argument from earlier in the courseto show that there are non-regular languages. In fact, there areeven some non-regular languages over the alphabet {0, 1}.

Firstly, notice that the set of regular expressions is countable— wecan arrange them in lexicographical order. Therefore we can forma complete list R1, R2, R3, . . ., of regular expressions.

Let L1, L2, . . ., be the corresponding list of regular languagesdescribed by the above regular expressions.

Now we will form a new language L and show that it is not in thelist above.

Page 10: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

The diagonalization argument

Recall that the earlier diagonalization arguments formed a newobject that was definitely different from every object on thesupposedly complete list.We will copy this strategy as follows:Consider the set of strings 1, 11, 111, 1111, and form the newlanguage L as follows:If 1 /∈ L1, then add it to L (else do nothing)If 11 /∈ L2, then add it to LIf 111 /∈ L3, then add it to LIf 1111 /∈ L4, then add it to L and so on...Then clearly L is different from L1, L2, . . . and thus L is notregular.This is an example of a non-constructive proof. We have shownthe existence of a non-regular language, but have not actuallyspecified one.

Page 11: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

FSMs and regular languages

Consider the following recognizer with start state s0 and acceptingstate s1:

s0 s1

1

0

0 1

What language does this machine recognize?

Page 12: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

We can start by looking at some input strings and seeing if there isa pattern...Accepted:1, 01, 001Rejected:0, 00, 10We hypothesize that this FSM accepts strings ending in 1, and amoment’s thought shows that this is indeed correct.Therefore the language accepted by this FSM is

L(M) = (0 + 1)∗1

Page 13: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Another example

Write down some accepted strings for this FSM. And somerejected ones. Can you see a pattern?

s0 s1

s2s3

0

1 00

1

10, 1

Page 14: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

For this example we can see that if the FSM ever reaches state s3then it can never reach the accepting states (s0 or s2). Thereforeany accepted string must start with 0.

After than, another 0 will change the state to s3, and therefore thesecond character must be a 1.

We continue this argument to see that the FSM will only be in anaccepting state provided it has received a sequence of 01 pairs. Assoon as this pattern is broken, the machine changes to state s3.

Therefore the language accepted by this FSM is

L(M) = (01)∗

Page 15: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Kleene’s theorem

The languages recognized by theFSMs given earlier could all bedescribed with regular expressions

Kleene’s theorem shows that this isno coincidence

Stephen Kleene, 1909-94, American

Amongst many achievements, Kleeneinvented regular expressions

Photo source: wikipedia

Page 16: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

An application of Kleene’s theorem

Theorem: Kleene’s Theorem A language is regular if and only ifit is the language recognized by some finite state machine.

Before we consider the proof of Kleene’s theorem, we consider anapplication of it:We will prove that the language

L = {01, 0011, 000111, . . .} = {0n1n : n ∈ N ∧ n > 0}

is not regular.

Page 17: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

We will use a proof by contradiction, whereby we assume that thelanguage is regular, and show that this leads to a contradiction.

Proof: If L is regular, then there is some finite state machine Mthat recognizes L. Suppose that M has m states. Now considerthe operation of M on the input strings

0, 00, 000, . . . , 0m+1

As M has only m states, the pigeonhole principle implies that (atleast) two of these strings (of length v and w), say 0v , 0w leave Min the same state. But then because M accepts the string 0v1v itmust also accept the string 0w1v which is not in L since thenumbers of 0s and 1s are different since w 6= vTherefore there is no finite state machine that recognizes L. QED

Page 18: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Proof of Kleene’s theorem

It is necessary to show two things in order to prove Kleene’stheorem:

(1) Given a regular expression, we can find a finite statemachine to accept the strings described by that regularexpression(2) Given a finite state machine, we can find a regularexpression to describe the strings accepted by thatmachine

Page 19: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Regular expression → FSM

Given a regular expression, we wish to find a finite state machinethat can recognize the corresponding language.This is hard to do directly, and so is done in two steps:

• Finding a non-deterministic finite state machine thatrecognizes the appropriate language• Converting the non-deterministic finite state machineinto a deterministic one.

The concept of non-deterministic computation is surprisinglypowerful in theoretical computer science, even though impossiblein reality (barring quantum computing).

Page 20: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Finite state machine → regular expression

This is also a two-stage process:

• Convert the finite state machine into an equivalentgeneralized non-deterministic finite state machine• Reduce the number of states of this machine down to asingle state labelled with the corresponding regularexpression.

This procedure is tedious, but mechanical and so we omit thedetails. See Sipser (text).

Page 21: CITS2211 Discrete Structures Lectures for Semester 2 2017 ......1, 01, 001 Rejected: 0, 00, 10 We hypothesize that this FSM accepts strings ending in 1, and a moment’s thought shows

Regular Expressions Kleene’s theorem

Summary

What we know now

Regular expressions are a compact notation for specifyinglanguages (sets of strings)

Every regular expression determines a regular language

Not all languages are regular

A language is regular iff it is the language recognised by someFSM (Kleene’s theorem)