lexical analysis - compiler design

LEXICAL ANALYSIS & ITS ROLE. Jeena Thomas, Asst Professor, CSE, SJCET Palai

Upload: muhammed-afsal-villan

Post on 14-Feb-2017

Category:

Software


TRANSCRIPT

Page 1: Lexical analysis - Compiler Design

LEXICAL ANALYSIS & ITS ROLE

Page 2: Lexical analysis - Compiler Design

Lexical analysis

» The scanning (lexical analysis) phase of a compiler reads the source program as a file of characters and divides it into tokens.

» It is usually implemented as a subroutine or coroutine of the parser.

» It is part of the front end of the compiler.

Page 3: Lexical analysis - Compiler Design

Tokens

» Each token is a sequence of characters that represents a unit of information in the source program.

Page 4: Lexical analysis - Compiler Design

Example tokens

» Keywords, which are fixed strings of letters, e.g. “if”, “while”.

» Identifiers, which are user-defined strings composed of letters and digits.

» Special symbols, such as arithmetic symbols.

Page 5: Lexical analysis - Compiler Design

Applications

» Scanners perform a pattern-matching process.

» The techniques used to implement lexical analyzers can also be applied to other areas such as query languages and information retrieval systems.

» Since pattern-directed programming is widely useful, a pattern-action language called Lex was introduced for specifying lexical analyzers.

» In Lex, patterns are specified by regular expressions, and a compiler for Lex can generate an efficient finite-automaton recognizer for the regular expressions.

Page 6: Lexical analysis - Compiler Design

» A software tool that automates the construction of lexical analyzers allows people with different backgrounds to use pattern matching in their own areas.

» Jarvis [1976] used a lexical analyzer generator to create a program that recognizes imperfections in printed circuit boards.

» The circuits are digitally scanned and converted into “strings” of line segments at different angles.

» The “lexical analyzer” looked for patterns corresponding to imperfections in the string of line segments.

Page 7: Lexical analysis - Compiler Design

Advantage of a lexical analyzer generator

» It can utilize the best-known pattern-matching algorithms and thereby create efficient lexical analyzers for people who are not experts in pattern-matching techniques.

Page 8: Lexical analysis - Compiler Design

The Role of Lexical Analyzer

» The lexical analyzer is the first phase of a compiler.

» Its main task is to read input characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
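A minimal sketch of this producer/consumer relationship, written in Python. The names `tokens` and `count_identifiers` are illustrative and not from the slides, and the scanner is assumed to recognize only identifiers, integers, and single-character symbols. The generator plays the role of a coroutine: the consumer pulls one token at a time and never looks at raw characters.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (token name, lexeme)

def tokens(source: str) -> Iterator[Token]:
    """Yield one token at a time; the caller pulls tokens on demand."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                    # white space separates tokens and yields nothing
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield ("ID", source[start:i])
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            yield ("NUM", source[start:i])
        else:
            yield ("SYM", ch)         # any other character is a one-character symbol
            i += 1

def count_identifiers(source: str) -> int:
    """Stand-in for a parser: it consumes the token stream, not characters."""
    return sum(1 for name, _ in tokens(source) if name == "ID")
```

For the input `a = b1 + 42`, the stream contains two `ID` tokens, two `SYM` tokens, and one `NUM` token.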

Page 9: Lexical analysis - Compiler Design

A Simple Lexical Analyzer

Page 10: Lexical analysis - Compiler Design

Example Tokens

Type     Examples
ID       foo   n_14   last
NUM      73   00   517   082
REAL     66.1   .5   10.   1e67   5.5e-10
IF       if
COMMA    ,
NOTEQ    !=
LPAREN   (
RPAREN   )
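The table above can be turned into a small pattern-driven classifier. This is a sketch in Python using the standard `re` module; the regular expressions are assumptions chosen so that every example in the table is accepted, not patterns given in the slides. The order of the entries matters: `REAL` is tried before `NUM` and `IF` before `ID`, so that `5.5e-10` is not split and `if` is not reported as an identifier.

```python
import re

# (token type, assumed pattern); earlier entries win when several could match
TOKEN_SPEC = [
    ("REAL",   r"(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?|\d+[eE][+-]?\d+"),
    ("NUM",    r"\d+"),
    ("IF",     r"if\b"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("COMMA",  r","),
    ("NOTEQ",  r"!="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def classify(text):
    """Return (token type, lexeme) pairs for text, dropping white space."""
    out = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out
```

With these patterns, `iffy` is classified as a single `ID` because `if\b` requires a word boundary after the keyword.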

Page 11: Lexical analysis - Compiler Design

Example NonTokens

Type                     Examples
comment                  /* ignored */
preprocessor directive   #include <foo.h>
                         #define NUMS 5, 6
macro                    NUMS
whitespace               \t \n \b

Page 12: Lexical analysis - Compiler Design

Tasks of the lexical analyzer

» Separation of the input source code into tokens.

» Stripping out the unnecessary white spaces from the source code.

» Removing the comments from the source text.

» Keeping track of line numbers while scanning the newline characters. These line numbers are used by the error handler to print the error messages.

» Preprocessing of macros.
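Three of these tasks (skipping white space, removing comments, and tracking line numbers) can be sketched together in Python. The function name `scan_with_lines`, the C-style comment syntax, and the simplified lexeme pattern are assumptions for illustration; macro preprocessing is not covered here.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)   # C-style comment, may span lines
SPACE = re.compile(r"\s+")
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|.")      # identifier, integer, or any single character

def scan_with_lines(source):
    """Return (line number, lexeme) pairs, skipping white space and comments."""
    result = []
    line = 1
    pos = 0
    while pos < len(source):
        skipped = COMMENT.match(source, pos) or SPACE.match(source, pos)
        if skipped:
            # newlines inside skipped text still advance the line counter
            line += skipped.group().count("\n")
            pos = skipped.end()
            continue
        m = LEXEME.match(source, pos)
        result.append((line, m.group()))
        pos = m.end()
    return result
```

Because the line counter is updated even inside comments, a token that follows a multi-line comment is still reported on the correct line, which is what an error handler needs.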

Page 13: Lexical analysis - Compiler Design

Issues in Lexical Analysis

» There are several reasons for separating the analysis phase of compiling into lexical analysis and parsing:

» It leads to a simpler design of the parser, as unnecessary elements such as comments and white space can be eliminated by the scanner.

» Efficiency of the compilation process is improved. Lexical analysis can take a large share of compilation time, since it is the phase that reads the source character by character, so specialized buffering techniques are used to improve the speed of compilation.

» Portability of the compiler is enhanced, as the specialized symbols and characters (language and machine specific) are isolated during this phase.

Page 14: Lexical analysis - Compiler Design

Tokens, Patterns, Lexemes

» Connected with lexical analysis are three important, closely related terms:

» Lexeme

» Token

» Pattern

Page 15: Lexical analysis - Compiler Design

Tokens, Patterns, Lexemes

» A token is a pair consisting of a token name and an optional attribute value. Token names: keywords, operators, identifiers, constants, literal strings, punctuation symbols (such as commas and semicolons).

» A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token. E.g. the lexemes of the token Relation: {<, <=, >, >=, ==, <>}

Page 16: Lexical analysis - Compiler Design

» A pattern is a description of the form that the lexemes of a token may take.

» It gives an informal or formal description of a token.

» E.g. identifier

» It serves 2 purposes:

» Gives a precise description/specification of tokens.

» Used to automatically generate a lexical analyzer.

Page 17: Lexical analysis - Compiler Design

Example of tokens

» const pi = 3.1416;

» The substring pi is a lexeme for the token “identifier”.

Page 18: Lexical analysis - Compiler Design

Identify tokens and lexemes?

» x=x*(acc+123)
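One possible answer to this exercise, as a Python sketch. The token names (`id`, `num`, `assign`, `mul`, `add`, `lparen`, `rparen`) and the choice to store the lexeme as the attribute value are assumptions for illustration; the slides do not fix a naming scheme. Each token is a pair of a name and an optional attribute, as defined earlier.

```python
import re
from typing import List, NamedTuple, Optional

class Token(NamedTuple):
    name: str                 # token name, e.g. "id"
    attribute: Optional[str]  # optional attribute value; here simply the lexeme

# (assumed token name, pattern describing its lexemes)
PATTERNS = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("num",    r"[0-9]+"),
    ("assign", r"="),
    ("mul",    r"\*"),
    ("add",    r"\+"),
    ("lparen", r"\("),
    ("rparen", r"\)"),
]

def tokenize(text: str) -> List[Token]:
    result = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, pattern in PATTERNS:
            m = re.compile(pattern).match(text, pos)
            if m:
                # operators and punctuation carry no attribute; the name is enough
                attr = m.group() if name in ("id", "num") else None
                result.append(Token(name, attr))
                pos = m.end()
                break
        else:
            raise ValueError(f"no pattern matches at offset {pos}")
    return result
```

Under these assumptions, `x=x*(acc+123)` yields nine tokens: the lexemes `x`, `x`, and `acc` are instances of `id`, `123` is an instance of `num`, and `=`, `*`, `(`, `+`, `)` each form a token of their own.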

Page 19: Lexical analysis - Compiler Design

Lexical Analysis

Page 20: Lexical analysis - Compiler Design

Lexical errors

» 1.) Consider the statement “fi(a==f)”. Here “fi” is a misspelled keyword. This error is not detected in lexical analysis, as “fi” is taken as an identifier. It is detected later, in other phases of compilation.

» 2.) If the lexical analyzer is unable to continue with the process of compilation, it resorts to panic-mode error recovery:

• Deleting successive characters from the remaining input until a well-formed token is detected.

» Other possible error-recovery actions are:

• Deleting an extraneous character.

Page 21: Lexical analysis - Compiler Design

• Inserting a missing character.

• Replacing an incorrect character by a correct character.

• Transposing two adjacent characters.
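Panic-mode recovery can be sketched as follows in Python. The token pattern (identifiers, integers, and a few operators) and the function name `scan_panic_mode` are assumptions for illustration. When no token can start at the current position, the scanner deletes characters until one can, records what it skipped, and then carries on.

```python
import re

# Assumed token patterns for illustration: identifiers, integers, a few operators.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/=()]")

def scan_panic_mode(text):
    """Tokenize text; on an illegal character, skip ahead until a token matches."""
    lexemes, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m:
            lexemes.append(m.group())
            pos = m.end()
        else:
            start = pos
            # panic mode: delete successive characters until a token can start here
            while (pos < len(text) and not text[pos].isspace()
                   and not TOKEN.match(text, pos)):
                pos += 1
            errors.append((start, text[start:pos]))
    return lexemes, errors
```

The scanner keeps going after the error, so a single illegal character does not prevent later tokens from being reported.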

Page 22: Lexical analysis - Compiler Design

Minimum distance error correction

» It is a general strategy that a lexical analyzer could follow to correct errors in lexemes.

» It finds the minimum number of corrections (insertions, deletions, replacements, transpositions) needed to convert an invalid lexeme into a valid lexeme.

» However, it is not generally used in practice because it is too costly to implement.
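To make the idea of a minimum number of corrections concrete, here is a Python sketch of an edit distance that counts the four error-recovery actions listed earlier (insert, delete, replace, transpose). It uses the optimal string alignment variant of Damerau-Levenshtein distance, which is one reasonable choice and not necessarily the method the slides have in mind. The keyword list is an assumption.

```python
def edit_distance(a, b):
    """Fewest single-character insertions, deletions, replacements, and
    adjacent transpositions that turn a into b (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete all i characters of a
    for j in range(cols):
        d[0][j] = j                      # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

KEYWORDS = ["if", "while", "else"]       # assumed keyword set

def nearest_keyword(lexeme):
    """Keyword reachable from lexeme with the fewest corrections."""
    return min(KEYWORDS, key=lambda kw: edit_distance(lexeme, kw))
```

The misspelled `fi` from the earlier example is one transposition away from `if`. The table costs time proportional to the product of the two string lengths for every candidate, which hints at why the strategy is considered expensive.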

Page 23: Lexical analysis - Compiler Design

SPECIFICATION OF TOKENS USING REGULAR EXPRESSION

Page 24: Lexical analysis - Compiler Design

Specification of tokens

» Scanners are special pattern-matching processors.

» For representing patterns of strings of characters, Regular Expressions (RE) are used.

» A regular expression (r) is defined by the set of strings that match it.

» This set is called the language generated by the regular expression and is represented as L(r).

» The set of symbols in the language is called the alphabet of the language and is represented as ∑.

Page 25: Lexical analysis - Compiler Design

» An alphabet is a finite set of symbols.

» Example

» A set of alphabetic characters is represented as L={A,…,Z,a,…,z} and a set of digits is represented as D={0,1,…,9}.

» L∪D is a language.

» Strings over L∪D: Begin, Max1, max1, 123, ε, …

Page 26: Lexical analysis - Compiler Design

Operations on Languages

Page 27: Lexical analysis - Compiler Design

» Intersection

» L∩M = { s | s is in L and s is in M }

» Exponentiation

» L^i = L L^(i-1), with L^0 = {ε}
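For finite languages, these operations map directly onto Python sets of strings. The sketch below is illustrative only: the sample languages and the helper names `concat` and `power` are assumptions, and the empty Python string stands in for ε.

```python
def concat(L, M):
    """Concatenation LM = { st | s in L, t in M }."""
    return {s + t for s in L for t in M}

def power(L, i):
    """Exponentiation: L^0 = {ε}, L^i = L L^(i-1)."""
    result = {""}                 # the empty string plays the role of ε
    for _ in range(i):
        result = concat(L, result)
    return result

L = {"a", "b"}
M = {"b", "c"}

union = L | M            # strings in L or in M
intersection = L & M     # strings in both L and M
```

Here `power(L, 2)` contains the four strings of length two over {a, b}, and `power(L, 0)` contains only the empty string.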

Page 28: Lexical analysis - Compiler Design

Regular expression operations

» Choice among alternates

» Concatenation

» Repetition

Page 29: Lexical analysis - Compiler Design

1. CHOICE AMONG ALTERNATES

» Indicated by the metacharacter ‘|’ (vertical bar)

» r|s

» An R.E. that matches any string that is matched by either r or s.

» L(r|s) = L(r) ∪ L(s)

Page 30: Lexical analysis - Compiler Design

Example

» Consider L(r)={a}, L(s)={b}, L(t)={ε}, L(u)={}.

» What do the following R.E.s represent?

» (i) r|s

» (ii) r|t

» (iii) r|u

Page 31: Lexical analysis - Compiler Design

2. CONCATENATION

» rs

» It matches any string that is a concatenation of 2 strings, the first of which matches r and the second of which matches s.

» L(rs) = L(r) L(s)

Page 32: Lexical analysis - Compiler Design

Example

» Consider L(r)={a}, L(s)={b}, L(t)={ε}, L(u)={ }, L(v)={c}. What do the following R.E.s represent?

» (i) rs

» (ii) rt

» (iii) ru

» (iv) (r|s)v
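The choice and concatenation exercises can be checked mechanically by treating each language as a Python set, with the empty string standing in for ε. This is a sketch for checking answers, not part of the original slides; the helper `concat` and the dictionary names are assumptions.

```python
EPS = ""                       # the empty string ε

def concat(X, Y):
    """L(xy) = { st | s in L(x), t in L(y) }."""
    return {x + y for x in X for y in Y}

# the languages given in the two exercises
r, s, t, u, v = {"a"}, {"b"}, {EPS}, set(), {"c"}

choice_answers = {
    "r|s": r | s,               # {a, b}
    "r|t": r | t,               # {a, ε}
    "r|u": r | u,               # {a}: the empty language adds nothing
}
concat_answers = {
    "rs": concat(r, s),         # {ab}
    "rt": concat(r, t),         # {a}: ε is the identity for concatenation
    "ru": concat(r, u),         # {}: nothing can be drawn from the empty language
    "(r|s)v": concat(r | s, v), # {ac, bc}
}
```

Note the difference between L(t)={ε}, which contains one string, and L(u)={}, which contains none: concatenating with the first leaves a language unchanged, while concatenating with the second yields the empty language.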

Page 33: Lexical analysis - Compiler Design

3. REPETITION

» Also called Kleene closure

» Represents any finite concatenation of strings, each of which matches a string from L(r).

» r*

» Let S={a}; then L(a*)={ε, a, aa, aaa, …}

» S* = {ε} ∪ S ∪ SS ∪ SSS ∪ …

Page 34: Lexical analysis - Compiler Design

Example

» Consider L(r)={a}, L(s)={b}.

» What do the following R.E.s represent?

» (i) r*

» (ii) (rs)*

» (iii) (r|s)*

» (iv) (r|ss)*
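The languages in this exercise are infinite, so they cannot be listed in full, but their shortest members can be enumerated. The Python sketch below uses a hypothetical helper, `closure_up_to`, that returns every string of L* up to a given length; the empty string stands in for ε.

```python
def closure_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}              # L^0 = {ε}
    frontier = {""}
    while frontier:
        # extend every newly found string by one more string from L
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

r_star = closure_up_to({"a"}, 3)              # r*
rs_star = closure_up_to({"ab"}, 4)            # (rs)*
r_or_s_star = closure_up_to({"a", "b"}, 2)    # (r|s)*
r_or_ss_star = closure_up_to({"a", "bb"}, 3)  # (r|ss)*
```

Read off the pattern from the samples: r* is any number of a's, (rs)* is any number of repetitions of ab, (r|s)* is every string over {a, b}, and (r|ss)* is every string over {a, b} in which the b's occur in runs of even length.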

Page 35: Lexical analysis - Compiler Design

Precedence of operators

» Repetition (highest)

» Concatenation (left associative)

» Choice (lowest)

» For example, a|bc* is read as a|(b(c*)).
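The effect of these precedence rules can be observed with Python's `re` module, whose `*`, concatenation, and `|` follow the same relative precedence. The sample strings and the helper name `matches` are assumptions for illustration; agreement on a handful of samples is evidence of equivalence, not a proof.

```python
import re

def matches(pattern, text):
    """True when the whole of text is matched by pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ["a", "b", "bccc", "ac", "bcbc", ""]

# a|bc* with the grouping left implicit
implicit = [matches(r"a|bc*", s) for s in samples]
# the grouping that the precedence rules prescribe
explicit = [matches(r"a|(b(c*))", s) for s in samples]
# a different grouping, which describes a different language
regrouped = [matches(r"(a|b)c*", s) for s in samples]
```

The first two lists agree on every sample, while the regrouped pattern additionally accepts `ac`, showing that parentheses are needed whenever the intended grouping differs from the default.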

Page 36: Lexical analysis - Compiler Design

» One or more instances: (r)+

» Zero or one instance: r?

» Zero or more instances: r*
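The first two operators are shorthand: r+ describes the same language as rr*, and r? the same language as r|ε. A small Python sketch can compare the patterns on sample strings using the `re` module. The helper `same_language` and the sample list are assumptions, and the comparison is evidence over those samples only.

```python
import re

def same_language(p, q, samples):
    """Check that two patterns agree on every sample string (evidence, not proof)."""
    return all((re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
               for s in samples)

samples = ["", "a", "aa", "aaa", "b", "ab"]

plus_ok = same_language(r"a+", r"aa*", samples)         # r+ behaves like rr*
optional_ok = same_language(r"a?", r"(?:a|)", samples)  # r? behaves like r|ε
```

The only sample on which `a+` and `a*` disagree is the empty string, which `a*` accepts and `a+` rejects.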

Page 37: Lexical analysis - Compiler Design

Examples of Regular Definitions

Page 38: Lexical analysis - Compiler Design

Example-unsigned numbers
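The body of this slide did not survive in the transcript. As a stand-in, the Python sketch below assumes a form commonly used in compiler textbooks: an unsigned number is a sequence of digits, followed by an optional fraction and an optional exponent. The names `DIGITS`, `OPTIONAL_FRACTION`, `OPTIONAL_EXPONENT`, and `is_unsigned_number` are illustrative, and the slide's own definition may differ.

```python
import re

# Each name mirrors one line of a regular definition; later lines reuse earlier ones.
DIGITS = r"[0-9]+"                               # digits -> digit digit*
OPTIONAL_FRACTION = rf"(?:\.{DIGITS})?"          # optional_fraction -> . digits | ε
OPTIONAL_EXPONENT = rf"(?:E[+-]?{DIGITS})?"      # optional_exponent -> E (+|-|ε) digits | ε
NUMBER = re.compile(DIGITS + OPTIONAL_FRACTION + OPTIONAL_EXPONENT)

def is_unsigned_number(text):
    """True when the whole of text is an unsigned number under the assumed definition."""
    return NUMBER.fullmatch(text) is not None
```

Under this assumed definition, `5280`, `39.37`, and `6.336E4` are accepted, while `1.` and `.5` are rejected because at least one digit is required on each side of the decimal point.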

Page 39: Lexical analysis - Compiler Design

SUMMARY

» Regular Expression (RE): represents a pattern of strings of characters.

» Language (L(r)): set of strings

» Alphabet (∑): set of symbols

» Meta character: a special character (not a part of ∑) used in an R.E., e.g. *, |

» Basic R.E.: an R.E. consisting of only one character

» Regular language: a language defined by an RE