3. A grammar is left-recursive if and only if there exists a nonterminal symbol that can derive a sentential form with itself as the leftmost symbol.

A → Aa

Yes, it is left-recursive.
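
For instance, with the productions A → Aa | b (a terminating alternative such as A → b is assumed here for illustration), the derivation A ⇒ Aa ⇒ Aaa ⇒ baa keeps A as the leftmost symbol at every step, which is why a top-down parser can loop forever on such a grammar.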

4. A “handle” of a string is a substring that matches the RHS of a production and whose reduction to the non-terminal (on the LHS of the production) represents one step along the reverse of a rightmost derivation toward reducing to the start symbol.

If S →* αAw →* αβw, then A → β in the position following α is a handle of αβw.

If the position of β and the corresponding production are clear, it suffices to say that the substring β is a handle of αβw.

Handle Pruning: A rightmost derivation in reverse can be obtained by "handle pruning" (illustrated below). This raises two problems:

1. Locating the substring to be reduced in a right-sentential form.

2. Determining which production, having that substring as its right-hand side, should be chosen.
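
A short worked illustration (assuming the usual toy grammar E → E + E | id): the rightmost derivation E ⇒ E + E ⇒ E + id ⇒ id + id, read in reverse, is obtained by pruning handles. In id + id, the leftmost id is the handle for E → id, giving E + id; the remaining id is then the handle, giving E + E; finally E + E is the handle for E → E + E, giving E.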

6. There are four common error-recovery strategies that can be implemented in the parser to deal with errors in the code.

Panic mode

When a parser encounters an error anywhere in a statement, it ignores the rest of the statement: it discards input from the erroneous token up to a delimiter, such as a semicolon. This is the easiest method of error recovery, and it also prevents the parser from developing infinite loops. A minimal sketch follows.
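
A minimal C sketch of panic-mode recovery in a hand-written parser (the token kinds, the input array, and the next_token() helper are hypothetical, invented for illustration):

#include <stdio.h>

/* Hypothetical token kinds; a real parser would receive these from its lexer. */
enum token { TOK_ID, TOK_NUM, TOK_SEMI, TOK_BAD, TOK_EOF };

static enum token input[] = { TOK_ID, TOK_BAD, TOK_NUM, TOK_SEMI, TOK_ID, TOK_EOF };
static int pos = 0;

static enum token next_token(void) { return input[pos++]; }

/* Panic mode: discard tokens up to and including the next ';' so that
   parsing can resume at the start of the following statement. */
static void synchronize(enum token *tok)
{
    while (*tok != TOK_SEMI && *tok != TOK_EOF)
        *tok = next_token();
    if (*tok == TOK_SEMI)
        *tok = next_token();
}

int main(void)
{
    enum token tok = next_token();
    /* ... suppose a syntax error is detected at `tok` ... */
    synchronize(&tok);
    printf("resumed at token kind %d\n", (int) tok);   /* prints 0, i.e. TOK_ID */
    return 0;
}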

Statement mode

When a parser encounters an error, it tries to take corrective measures so that the rest of the statement allows the parser to parse ahead, for example by inserting a missing semicolon or replacing a comma with a semicolon. Parser designers have to be careful here, because one wrong correction may lead to an infinite loop.

Error productions

Some common errors that may occur in the code are known to compiler designers. The designers can therefore create an augmented grammar containing productions that generate the erroneous constructs, so these errors are recognized when encountered.

Global correction

The parser considers the program as a whole, tries to figure out what the program is intended to do, and finds the closest error-free match for it. When an erroneous input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This may allow the parser to make minimal changes in the source code, but due to the time and space complexity of this strategy, it has not yet been implemented in practice.

7. Static Allocation

In this allocation scheme, the compilation data is bound to a fixed location in the memory and it does not change when the program executes. As the memory requirement and storage locations are known in advance, runtime support package for memory allocation and de-allocation is not required.

Stack Allocation

Procedure calls and their activations are managed by means of stack memory allocation. It works in a last-in-first-out (LIFO) manner, and this allocation strategy is very useful for recursive procedure calls.

8.

Syntax-directed translation (SDT) augments the grammar rules in a way that facilitates semantic analysis. SDT involves passing information bottom-up and/or top-down the parse tree in the form of attributes attached to the nodes. Syntax-directed translation rules use (1) lexical values of nodes, (2) constants, and (3) attributes associated with the nonterminals in their definitions.

The general approach to Syntax-Directed Translation is to construct a parse tree or syntax tree and compute the values of attributes at the nodes of the tree by visiting them in some order. In many cases, translation can be done during parsing without building an explicit tree.
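
For example (a standard illustration, not specific to this text), the production E → E1 + T can be augmented with the semantic rule E.val = E1.val + T.val; the val attribute of each node is then synthesized from the attributes of its children as the tree is visited, or computed on the fly during parsing.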

9. The characteristics of basic blocks are:

They do not contain any kind of jump statements.

There is no possibility of branching or halting in the middle.

All the statements execute in the same order in which they appear.

They do not lose the flow of control of the program.

Example of a Basic Block:

The three-address code for the expression a = b + c + d (the original figure is not reproduced; this is the standard translation) is:

t1 = b + c
t2 = t1 + d
a = t2

All three statements form a single basic block: control enters at the first statement and leaves after the last, with no jump in between.

Flow Graphs-

A flow graph is a directed graph with control-flow information added to the basic blocks.

The basic blocks serve as the nodes of the flow graph.

There is a directed edge from block B1 to block B2 if control can flow from B1 to B2, i.e., if B2 immediately follows B1 in the code and B1 does not end in an unconditional jump, or if B1 ends in a jump to the beginning of B2.

10. Optimization is a program transformation technique, which tries to improve the code by making it consume less resources (i.e. CPU, Memory) and deliver high speed.

In optimization, high-level general programming constructs are replaced by very efficient low-level programming codes. A code optimizing process must follow the three rules given below:

The output code must not, in any way, change the meaning of the program.

Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.

Optimization should itself be fast and should not delay the overall compiling process.

Efforts to optimize the code can be made at various levels of the compilation process.

At the beginning, users can change/rearrange the code or use better algorithms to write the code.

After generating intermediate code, the compiler can modify the intermediate code by improving address calculations and loops.

While producing the target machine code, the compiler can make use of memory hierarchy and CPU registers.

Optimization can be categorized broadly into two types: machine independent and machine dependent.

Machine-independent Optimization

In this optimization, the compiler takes in the intermediate code and transforms a part of the code that does not involve any CPU registers and/or absolute memory locations. For example:

do { item = 10; value = value + item; } while (value < 100);

This code involves repeated assignment of the identifier item; if we rewrite it this way:

item = 10; do { value = value + item; } while (value < 100);

it should not only save CPU cycles but can also be used on any processor.

Machine-dependent Optimization

Machine-dependent optimization is done after the target code has been generated, when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references. Machine-dependent optimizers try to take maximum advantage of the memory hierarchy.

11.a) In computer science, a context-free grammar is said to be ambiguous if there exists a string that can be generated by the grammar in more than one way (i.e., the string admits more than one parse tree or, equivalently, more than one leftmost derivation). A context-free language is inherently ambiguous if all context-free grammars generating that language are ambiguous.

Some programming languages have ambiguous grammars; in this case, semantic information is needed to select the intended parse tree of an ambiguous construct. For example, in C the statement x * y; can be interpreted as either:

the declaration of an identifier named y of type pointer-to-x, or

an expression in which x is multiplied by y and then the result is discarded.

To correctly choose between the two possible interpretations, a compiler must consult its symbol table to find out whether x has been declared as a typedef name that is visible at this point.
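
A standard illustration (separate from the C example above): with the grammar E → E + E | E * E | id, the string id + id * id admits two distinct parse trees, one grouping it as id + (id * id) and the other as (id + id) * id, so the grammar is ambiguous.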

11. b) A compiler operates in various phases; each phase transforms the source program from one representation to another. Every phase takes its input from the previous stage and feeds its output to the next phase of the compiler.

There are six phases in a compiler. Each of these phases helps convert the high-level language into machine code. The phases of a compiler are:

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Intermediate code generator

5. Code optimizer

6. Code generator

Together, these phases convert the source code by dividing it into tokens, creating parse trees, and optimizing it stage by stage.

Phase 1: Lexical Analysis

Lexical analysis is the first phase, in which the compiler scans the source code. It reads the code from left to right, character by character, and groups these characters into tokens.

Here, the character stream from the source program is grouped into meaningful sequences by identifying the tokens. The lexical analyzer enters the corresponding tokens into the symbol table and passes each token on to the next phase.

The primary functions of this phase are:

Identify the lexical units in the source code

Classify lexical units into classes like constants and reserved words, and enter them in different tables

Ignore comments in the source program

Identify tokens that are not part of the language

Example:

x = y + 10

Tokens:

x → identifier
= → assignment operator
y → identifier
+ → addition operator
10 → number

Phase 2: Syntax Analysis

Syntax analysis is all about discovering structure in code. It determines whether or not a text follows the expected format. The main aim of this phase is to make sure that the source code written by the programmer is correct.

Syntax analysis applies the rules of the specific programming language, constructing the parse tree with the help of the tokens. It also determines the structure of the source language and the grammar or syntax of the language.

Here is a list of tasks performed in this phase:

Obtain tokens from the lexical analyzer

Check whether the expression is syntactically correct

Report all syntax errors

Construct a hierarchical structure known as a parse tree

Example

Any identifier/number is an expression

If x is an identifier and y + 10 is an expression, then x = y + 10 is a statement.

Consider the parse tree for the following example:

(a+b)*c
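
A sketch of the tree (the original figure is not reproduced; interior nodes hold operators, leaves hold operands):

        *
       / \
      +   c
     / \
    a   b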

In the parse tree:

Interior node: a record with an operator field and two fields for its children

Leaf: a record with two or more fields: one for the token and the others for information about the token

Ensure that the components of the program fit together meaningfully

Gather type information and check for type compatibility

Check that operands are permitted by the source language

Phase 3: Semantic Analysis

Semantic analysis checks the semantic consistency of the code. It uses the syntax tree of the previous phase along with the symbol table to verify that the given source code is semantically consistent. It also checks whether the code is conveying an appropriate meaning.

The semantic analyzer will check for type mismatches, incompatible operands, functions called with improper arguments, undeclared variables, etc.

Functions of the semantic analysis phase are:

Store the gathered type information and save it in the symbol table or syntax tree

Perform type checking

Report a semantic error in the case of a type mismatch for which no type-correction rule satisfies the desired operation

Collect type information and check for type compatibility

Check whether the operands are permitted by the source language

Example

float x = 20.2;

float y = x*30;

In the above code, the semantic analyzer will typecast the integer 30 to the float 30.0 before the multiplication.

Phase 4: Intermediate Code Generation

Once the semantic analysis phase is over, the compiler generates intermediate code for the target machine. It represents a program for some abstract machine.

Intermediate code is between the high-level and machine level language. This intermediate code needs to be generated in such a manner that makes it easy to translate it into the target machine code.

Functions of intermediate code generation:

It should be generated from the semantic representation of the source program

It holds the values computed during the process of translation

It helps to translate the intermediate code into the target language

It maintains the precedence ordering of the source language

It holds the correct number of operands for each instruction

Example

total = count + rate * 5

The intermediate code, using the three-address code method, is:

t1 := int_to_float(5)

t2 := rate * t1

t3 := count + t2

total := t3

Phase 5: Code Optimization

The next phase is code optimization, performed on the intermediate code. This phase removes unnecessary code lines and arranges the sequence of statements to speed up program execution without wasting resources. The main goal of this phase is to improve on the intermediate code so as to generate code that runs faster and occupies less space.

The primary functions of this phase are:

It helps to establish a trade-off between execution speed and compilation speed

It improves the running time of the target program

It generates streamlined code, still in intermediate representation

It removes unreachable code and gets rid of unused variables

It moves loop-invariant statements out of the loop

Example:

Consider the following code

a = intofloat(10)

b = c * a

d = e + b

f = d

Can become

b = c * 10.0

f = e + b

Phase 6: Code Generation

Code generation is the last and final phase of a compiler. It takes its input from the code optimization phase and produces the target code or object code as a result. The objective of this phase is to allocate storage and generate relocatable machine code.

It also allocates memory locations for the variables. The instructions in the intermediate code are converted into machine instructions. This phase converts the optimized intermediate code into the target language.

The target language is machine code. Therefore, all the memory locations and registers are selected and allotted during this phase. The code generated by this phase is executed to take inputs and generate the expected outputs.

Example:

a = b + 60.0

This might be translated into register code as:

MOVF b, R1 ; R1 ← b

ADDF #60.0, R1 ; R1 ← R1 + 60.0

MOVF R1, a ; a ← R1

12. a) First and Follow sets are needed so that the parser can properly apply the needed production rule at the correct position.

First(α) is the set of terminal symbols that begin strings derived from α.

Consider the production rule-

A → abc / def / ghi

Then, we have-

First(A) = { a , d , g }

Rules For Calculating First Function-

Rule-01:

For a production rule X → ∈,

First(X) = { ∈ }

Rule-02:

For any terminal symbol ‘a’,

First(a) = { a }

Rule-03:

For a production rule X → Y1Y2Y3,

Calculating First(X)

If ∈ ∉ First(Y1), then First(X) = First(Y1)

If ∈ ∈ First(Y1), then First(X) = { First(Y1) – ∈ } ∪ First(Y2Y3)

Calculating First(Y2Y3)

If ∈ ∉ First(Y2), then First(Y2Y3) = First(Y2)

If ∈ ∈ First(Y2), then First(Y2Y3) = { First(Y2) – ∈ } ∪ First(Y3)

Similarly, we can expand this for any production rule X → Y1Y2Y3…Yn.
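
A short worked example (a hypothetical grammar, for illustration): for S → AB, A → a | ∈, B → b, we get First(A) = { a, ∈ } and First(B) = { b }; since ∈ ∈ First(A), First(S) = { First(A) – ∈ } ∪ First(B) = { a, b }.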

Follow Function-

Follow(α) is the set of terminal symbols that can appear immediately to the right of α in some sentential form.

Rules For Calculating Follow Function-

Rule-01:

For the start symbol S, place $ in Follow(S).

Rule-02:

For any production rule A → αB,

Follow(B) = Follow(A)

Rule-03:

For any production rule A → αBβ,

If ∈ ∉ First(β), then Follow(B) = First(β)

If ∈ ∈ First(β), then Follow(B) = { First(β) – ∈ } ∪ Follow(A)
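
Continuing the same illustrative grammar (S → AB, A → a | ∈, B → b): Follow(S) = { $ } by Rule-01; from S → AB, Rule-03 gives Follow(A) = First(B) = { b }, and Rule-02 gives Follow(B) = Follow(S) = { $ }.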

12. b)

A context-free grammar G = (VT, VN, S, P) whose parsing table has no multiply-defined entries is said to be LL(1). In the name LL(1), the first L stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step to make parsing action decisions.

A language is said to be LL(1) if it can be generated by an LL(1) grammar. It can be shown that LL(1) grammars are not ambiguous and not left-recursive.

A context-free grammar G = (VT, VN, S, P) is LL(1) if and only if for every nonterminal A and every pair of distinct productions A → α and A → β we have:

1. FIRST(α) ∩ FIRST(β) = ∅, and

2. if α ⇒* ∈, then FIRST(β) ∩ FOLLOW(A) = ∅.
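
As a quick worked check (an illustrative grammar): for S → aSb | ∈, FIRST(aSb) = { a } and FIRST(∈) = { ∈ }, so condition 1 holds. Since S ⇒ ∈, condition 2 requires FIRST(aSb) ∩ FOLLOW(S) = ∅; FOLLOW(S) = { b, $ } contains no a, so the grammar is LL(1).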

14. a) Storage Allocation Strategies:

There are three different storage allocation strategies based on runtime storage. They are:

Static Allocation

Stack Allocation

Heap Allocation

Static Allocation:

Storage is allocated at compile time

Static storage has fixed allocation that does not change during program execution

As bindings do not change at runtime, no runtime support is required

At compile time, the compiler can fill in the addresses at which the target code can find the data it operates on

FORTRAN uses the static allocation strategy

Limitations

Size of data objects should be known at compile time

Recursion is not supported

Data structures cannot be created at runtime

Stack Allocation

Stack allocation manages the runtime storage as a stack, i.e., control stack

Activation records are pushed and popped as activations begin and end, respectively

Locals are always bound to fresh storage in each activation, because a new activation record is pushed onto the stack when a call is made

Values of locals are deleted when the activation ends

Data structures can be created dynamically for stack allocation

Limitations

Values of locals cannot be retained once activation ends

The memory addressing can be done using pointers and indexed registers

This type of allocation is slower than static allocation

Heap allocation

Storage can be allocated and deallocated in any order

If the values of non-local variables must be retained even after the activation record ends, such retention is not possible with stack allocation

Heap allocation is therefore used to retain values beyond the activation that created them

Heap allocation allocates a contiguous block of memory when required for the storage of activation records; this allocated memory can be deallocated when the activation ends

Free space can be further reused by the heap manager

It supports recursion, and data structures can be created at runtime

Limitation

Heap management overhead.
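
A minimal C sketch contrasting the three strategies (the names counter, factorial, and buf are illustrative):

#include <stdio.h>
#include <stdlib.h>

int counter = 0;                 /* static allocation: one fixed location for the whole run */

int factorial(int n)             /* stack allocation: n and r live in the activation */
{                                /* record of each recursive call                    */
    int r = (n <= 1) ? 1 : n * factorial(n - 1);
    return r;
}

int main(void)
{
    int *buf = malloc(4 * sizeof *buf);   /* heap allocation: lifetime is independent */
    if (buf == NULL)                      /* of any single activation                 */
        return 1;
    buf[0] = factorial(4);
    printf("%d\n", buf[0]);               /* prints 24 */
    free(buf);                            /* explicit de-allocation back to the heap manager */
    return 0;
}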


14. b) Storage Allocation

Runtime environment manages runtime memory requirements for the following entities:

Code: This is known as the text part of a program, and it does not change at runtime. Its memory requirements are known at compile time.

Procedures: Their text part is static, but they are called in an unpredictable order. That is why stack storage is used to manage procedure calls and activations.

Variables: Variables are known only at runtime, unless they are global or constant. The heap memory allocation scheme is used for managing allocation and de-allocation of memory for variables at runtime.

Static Allocation

In this allocation scheme, the compilation data is bound to a fixed location in the memory and it does not change when the program executes. As the memory requirement and storage locations are known in advance, runtime support package for memory allocation and de-allocation is not required.

Stack Allocation

Procedure calls and their activations are managed by means of stack memory allocation. It works in last-in-first-out (LIFO) method and this allocation strategy is very useful for recursive procedure calls.

Heap Allocation

Variables local to a procedure are allocated and de-allocated only at runtime. Heap allocation is used to dynamically allocate memory to the variables and claim it back when the variables are no longer required.

Unlike the statically allocated memory area, both stack and heap memory can grow and shrink dynamically and unpredictably. Therefore, they cannot be given a fixed amount of memory in the system.

The text part of the code is allocated a fixed amount of memory, while stack and heap memory are arranged at the extremes of the total memory allocated to the program; the two shrink and grow toward each other. (The original figure is not reproduced.)

15. a) Directed Acyclic Graphs

Nodes in a syntax tree represent constructs in the source program. A DAG is used to identify common subexpressions, e.g. in a + a*(b-c) + (b-c)*d. By doing so, it gives the compiler important hints on how to generate efficient code to evaluate the expressions.

DAG for a + a*(b-c) + (b-c)*d (figure not reproduced): the two occurrences of a share one leaf, and the single (b-c) node feeds both multiplications.

15 b) 1. Quadruples-

In the quadruples representation, each instruction is split into the following four fields:

op, arg1, arg2, result

Here-

The op field is used for storing the internal code of the operator.

The arg1 and arg2 fields are used for storing the two operands used.

The result field is used for storing the result of the expression.

Exceptions

There are the following exceptions-

Exception-01:

To represent the statement x = op y, we place-

op in the operator field

y in the arg1 field

x in the result field

arg2 field remains unused

Exception-02:

To represent a statement like param t1, we place-

param in the operator field

t1 in the arg1 field

Neither arg2 field nor result field is used

Exception-03:

To represent the unconditional and conditional jump statements, we place label of the target in the result field.

2. Triples-

In the triples representation,

results are referred to by the positions of the instructions that compute them, and

temporary variables are not used.

3. Indirect Triples-

This representation is an enhancement over triples representation.

It uses an additional instruction array to list the pointers to the triples in the desired order.

Thus, instead of position, pointers are used to store the results.

It allows the optimizers to easily reposition sub-expressions to produce optimized code.

Problem-01:

Translate the following expression to quadruple, triple and indirect triple-

a + b x c / e ↑ f + b x c

Solution-

Three Address Code for the given expression is-

T1 = e ↑ f

T2 = b x c

T3 = T2 / T1

T4 = b x c

T5 = a + T3

T6 = T5 + T4

Now, we write the required representations-

Quadruple Representation-

Location Op Arg1 Arg2 Result

(0) ↑ e f T1

(1) x b c T2

(2) / T2 T1 T3

(3) x b c T4

(4) + a T3 T5

(5) + T5 T4 T6

Triple Representation-

Location Op Arg1 Arg2

(0) ↑ e f

(1) x b c

(2) / (1) (0)

(3) x b c

(4) + a (2)

(5) + (4) (3)

Indirect Triple Representation-

Statement

35 (0)

36 (1)

37 (2)

38 (3)

39 (4)

40 (5)

Location Op Arg1 Arg2

(0) ↑ e f

(1) x b c

(2) / (1) (0)

(3) x b c

(4) + a (2)

(5) + (4) (3)

16 a) In the code generation phase, various issues can arise:

1. Input to the code generator

2. Target program

3. Memory management

4. Instruction selection

5. Register allocation

6. Evaluation order

1. Input to the code generator

o The input to the code generator consists of the intermediate representation of the source program and the information in the symbol table, both produced by the front end.

o The intermediate representation has several choices:

a) Postfix notation

b) Syntax tree

c) Three-address code

o We assume the front end produces a low-level intermediate representation, i.e., one in which the values of names can be directly manipulated by machine instructions.

o The code generation phase requires complete, error-free intermediate code as its input.

2. Target program:

The target program is the output of the code generator. The output can be:

a) Assembly language: It allows subprograms to be compiled separately.

b) Relocatable machine language: It makes the process of code generation easier.

c) Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.

3. Memory management

o During the code generation process, the symbol table entries have to be mapped to actual addresses, and labels have to be mapped to instruction addresses.

o Mapping names in the source program to addresses of data is done cooperatively by the front end and the code generator.

o Local variables are stack-allocated in the activation record, while global variables reside in a static area.

4. Instruction selection:

o The nature of the instruction set of the target machine should be complete and uniform.

o When considering the efficiency of the target machine, instruction speed and machine idioms are important factors.

o The quality of the generated code can be determined by its speed and size.

Example:

The Three address code is:

1. a:= b + c

2. d:= a + e

Inefficient assembly code is:

1. MOV b, R0 ; R0 ← b

2. ADD c, R0 ; R0 ← c + R0

3. MOV R0, a ; a ← R0

4. MOV a, R0 ; R0 ← a

5. ADD e, R0 ; R0 ← e + R0

6. MOV R0, d ; d ← R0
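
For comparison (using the same illustrative instruction set), a better selection notices that the value of a is already in R0 after instruction 3, so the reload in instruction 4 is redundant:

MOV b, R0 ; R0 ← b
ADD c, R0 ; R0 ← c + R0
MOV R0, a ; a ← R0
ADD e, R0 ; R0 ← e + R0
MOV R0, d ; d ← R0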

5. Register allocation

Registers can be accessed faster than memory. Instructions involving register operands are shorter and faster than those involving memory operands.

The following subproblems arise when we use registers:

Register allocation: In register allocation, we select the set of variables that will reside in registers.

Register assignment: In register assignment, we pick the specific register in which each variable will reside.

Certain machines require even-odd register pairs for some operands and results.

For example:

Consider the following division instruction of the form:

1. D x, y

where,

x is the dividend, held in the even register of an even/odd register pair

y is the divisor

The even register is used to hold the remainder.

The odd register is used to hold the quotient.

6. Evaluation order

The efficiency of the target code can be affected by the order in which the computations are performed. Some computation orders need fewer registers to hold intermediate results than others.

16 b) Lex is a program designed to generate scanners, also known as tokenizers, which recognize lexical patterns in text. Lex is an acronym that stands for "lexical analyzer generator." It is intended primarily for Unix-based systems.

Lex can perform simple transformations by itself but its main purpose is to facilitate lexical analysis, the processing of character sequences such as source code to produce symbol sequences called tokens for use as input to other programs such as parsers. Lex can be used with a parser generator to perform lexical analysis. It is easy, for example, to interface Lex and Yacc, an open source program that generates code for the parser in the C programming language.

Use of Lex

• lex.l is an input file written in a language that describes the generation of a lexical analyzer. The Lex compiler transforms lex.l into a C program known as lex.yy.c.

• lex.yy.c is compiled by the C compiler to a file called a.out.

• The output of the C compiler is the working lexical analyzer, which takes a stream of input characters and produces a stream of tokens.

• yylval is a global variable which is shared by lexical analyzer and parser to return the name and an attribute value of token.

• The attribute value can be numeric code, pointer to symbol table or nothing.
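
A minimal lex.l sketch (the pattern set is illustrative, not from this text); it can be built with lex scan.l && cc lex.yy.c -o scanner:

%{
#include <stdio.h>
%}
%%
[0-9]+                  { printf("NUMBER(%s)\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*  { printf("ID(%s)\n", yytext); }
"+"|"-"|"*"|"/"|"="     { printf("OP(%s)\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { printf("UNKNOWN(%s)\n", yytext); }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }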

17 a) LALR refers to the lookahead LR. To construct the LALR (1) parsing table, we use the canonical collection of LR (1) items.

In LALR(1) parsing, LR(1) items that have the same productions but different lookaheads are combined to form a single set of items.

LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.

Example

LALR ( 1 ) Grammar

1. S → AA

2. A → aA

3. A → b

Add the augment production, insert the '•' symbol at the first position of every production in G, and also add the lookahead.

1. S` → •S, $

2. S → •AA, $

3. A → •aA, a/b

4. A → •b, a/b

I0 State:

Add the augment production to the I0 state and compute the closure:

I0 = Closure (S` → •S)

Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So, the I0 State becomes

I0 = S` → •S, $ S → •AA, $

Add all productions starting with A in modified I0 State because "•" is followed by the non-terminal. So, the I0 State becomes.

I0= S` → •S, $ S → •AA, $ A → •aA, a/b A → •b, a/b

I1 = Go to (I0, S) = Closure (S` → S•, $) = S` → S•, $

I2 = Go to (I0, A) = Closure (S → A•A, $)

Add all productions starting with A in I2 State because "•" is followed by the non-terminal. So, the I2 State becomes

I2= S → A•A, $ A → •aA, $ A → •b, $

I3= Go to (I0, a) = Closure ( A → a•A, a/b )

Add all productions starting with A in I3 State because "•" is followed by the non-terminal. So, the I3 State becomes

I3= A → a•A, a/b A → •aA, a/b A → •b, a/b

Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)

Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)

I4 = Go to (I0, b) = Closure (A → b•, a/b) = A → b•, a/b

I5 = Go to (I2, A) = Closure (S → AA•, $) = S → AA•, $

I6 = Go to (I2, a) = Closure (A → a•A, $)

Add all productions starting with A in I6 State because "•" is followed by the non-terminal. So, the I6 State becomes

I6 = A → a•A, $ A → •aA, $ A → •b, $

Go to (I6, a) = Closure (A → a•A, $) = (same as I6)

Go to (I6, b) = Closure (A → b•, $) = (same as I7)

I7 = Go to (I2, b) = Closure (A → b•, $) = A → b•, $

I8 = Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b

I9 = Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $

If we analyze the LR(0) items of I3 and I6, they are the same; they differ only in their lookaheads.

I3 = { A → a•A, a/b A → •aA, a/b A → •b, a/b }

I6= { A → a•A, $ A → •aA, $ A → •b, $ }

Clearly, I3 and I6 have the same LR(0) items but differ in their lookaheads, so we can combine them and call the result I36.

I36 = { A → a•A, a/b/$ A → •aA, a/b/$ A → •b, a/b/$ }

I4 and I7 are the same but differ only in their lookaheads, so we can combine them and call the result I47.

I47 = {A → b•, a/b/$}

I8 and I9 are the same but differ only in their lookaheads, so we can combine them and call the result I89.

I89 = {A → aA•, a/b/$}

Drawing the DFA (figure not reproduced):

LALR (1) Parsing table:
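
The table figure is not reproduced here, but it can be read off the merged states above. Numbering the productions as above (1. S → AA, 2. A → aA, 3. A → b), and writing sN for "shift and go to state N" and rk for "reduce by production k", the LALR(1) table works out to:

State   a       b       $        S    A
I0      s36     s47              1    2
I1                      accept
I2      s36     s47                   5
I36     s36     s47                   89
I47     r3      r3      r3
I5                      r1
I89     r2      r2      r2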

17 b) Peephole optimization is a type of code optimization performed on a small part of the code, i.e., on a very small set of instructions in a segment of code. The small set of instructions or small part of code on which peephole optimization is performed is known as the peephole or window.

It basically works on the principle of replacement: a part of the code is replaced by shorter and faster code without any change in output.

Peephole optimization is machine dependent.

Objectives of Peephole Optimization:

The objective of peephole optimization is:

1. To improve performance

2. To reduce the memory footprint

3. To reduce code size

Peephole Optimization Techniques:

1. Redundant load and store elimination: the redundancy is eliminated.

Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;

Optimized code:
y = x + 5;
i = y;
w = y * 3;

2. Constant folding: code that can be simplified by the user itself is simplified.

Initial code:
x = 2 * 3;

Optimized code:
x = 6;

3. Strength reduction: operators that consume more execution time are replaced by operators that consume less execution time.

Initial code:
y = x * 2;

Optimized code:
y = x + x; or y = x << 1;

Initial code:
y = x / 2;

Optimized code:
y = x >> 1;

4. Null sequences: useless operations are deleted.

5. Combine operations: several operations are replaced by a single equivalent operation.

17 c) The symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a compiler.

A symbol table may serve the following purposes depending upon the language in hand:

To store the names of all entities in a structured form at one place.

To verify if a variable has been declared.

To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.

To determine the scope of a name (scope resolution).

A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for each name in the following format:

<symbol name, type, attribute>

For example, if a symbol table has to store information about the following variable declaration:

static int interest;

then it should store the entry such as:

<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation

If a compiler is to handle a small amount of data, the symbol table can be implemented as an unordered list, which is easy to code but suitable only for small tables. A symbol table can be implemented in one of the following ways:

Linear (sorted or unsorted) list

Binary Search Tree

Hash table

Among all, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as a key for the hash function and the return value is the information about the symbol.

Operations

A symbol table, either linear or hash, should provide the following operations.

insert()

This operation is used more frequently by the analysis phase, i.e., the first half of the compiler, where tokens are identified and names are stored in the table. It is used to add information about unique names occurring in the source code to the symbol table. The format or structure in which the names are stored depends upon the compiler at hand.

An attribute for a symbol in the source code is the information associated with that symbol. This information contains the value, state, scope, and type about the symbol. The insert() function takes the symbol and its attributes as arguments and stores the information in the symbol table.

For example:

int a;

should be processed by the compiler as:

insert(a, int);

lookup()

lookup() operation is used to search a name in the symbol table to determine:

if the symbol exists in the table.

if it is declared before it is being used.

if the name is used in the scope.

if the symbol is initialized.

if the symbol is declared multiple times.

The format of lookup() function varies according to the programming language. The basic format should match the following:

lookup(symbol)

This method returns 0 (zero) if the symbol does not exist in the symbol table. If the symbol exists in the symbol table, it returns its attributes stored in the table.
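
A minimal C sketch of a hash-table-based symbol table with insert() and lookup() (the structure, the hash function, and the bucket count are illustrative choices, not prescribed by the text):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211

struct symbol {
    const char *name, *type, *attribute;   /* <symbol name, type, attribute> */
    struct symbol *next;                   /* chains colliding entries       */
};

static struct symbol *table[BUCKETS];

static unsigned hash(const char *s)        /* the symbol itself is the key */
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char) *s++;
    return h % BUCKETS;
}

void insert(const char *name, const char *type, const char *attribute)
{
    struct symbol *sym = malloc(sizeof *sym);
    sym->name = name;
    sym->type = type;
    sym->attribute = attribute;
    unsigned h = hash(name);
    sym->next = table[h];                  /* prepend to the bucket's chain */
    table[h] = sym;
}

struct symbol *lookup(const char *name)
{
    for (struct symbol *s = table[hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;                      /* found: return its attributes */
    return NULL;                           /* 0: the symbol does not exist */
}

int main(void)
{
    insert("interest", "int", "static");
    struct symbol *s = lookup("interest");
    if (s != NULL)
        printf("<%s, %s, %s>\n", s->name, s->type, s->attribute);
    return 0;
}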

Scope Management

A compiler maintains two types of symbol tables: a global symbol table which can be accessed by all the procedures and scope symbol tables that are created for each scope in the program.

To determine the scope of a name, symbol tables are arranged in a hierarchical structure as shown in the example below:

. . .
int value = 10;

void pro_one()
{
    int one_1;
    int one_2;
    {                    /* inner scope 1 */
        int one_3;
        int one_4;
    }
    int one_5;
    {                    /* inner scope 2 */
        int one_6;
        int one_7;
    }
}

void pro_two()
{
    int two_1;
    int two_2;
    {                    /* inner scope 3 */
        int two_3;
        int two_4;
    }
    int two_5;
}
. . .

The above program can be represented in a hierarchical structure of symbol tables:

The global symbol table contains names for one global variable (int value) and two procedure names, which should be available to all the child nodes shown above. The names mentioned in the pro_one symbol table (and all its child tables) are not available for pro_two symbols and its child tables.

This hierarchy of symbol tables is maintained by the semantic analyzer. Whenever a name needs to be looked up in a symbol table, it is searched using the following algorithm (a sketch follows the list):

First, the symbol is searched in the current scope, i.e., the current symbol table.

If the name is found, the search is complete; otherwise, it is searched in the parent symbol table,

and so on, until either the name is found or the global symbol table has been searched for the name.
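
A short C sketch of that search order (each scope's table keeps a pointer to its enclosing scope; the linear entry list stands in for the per-scope table sketched earlier):

#include <string.h>
#include <stddef.h>

struct entry { const char *name; struct entry *next; };

struct scope {
    struct scope *parent;    /* enclosing scope; NULL above the global table */
    struct entry *entries;   /* this scope's own names                       */
};

/* Search the current scope first, then walk outward through the parent
   chain until the name is found or the global table has been searched. */
struct entry *lookup_scoped(struct scope *sc, const char *name)
{
    for (; sc != NULL; sc = sc->parent)
        for (struct entry *e = sc->entries; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e;
    return NULL;
}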
