CPSC 325 - Compiler Tutorial 9 Review of Compiler

Post on 18-Dec-2015

CPSC 325 - Compiler

Tutorial 9

Review of Compiler

Compiler and compilation

A high-level programming language is usually described in terms of a grammar

– The grammar specifies the form, or syntax, of legal statements in the language

– Compilation = matching statements written by the programmer to structures defined by the grammar and generating the appropriate object code for each statement

We can see a source program as a sequence of tokens

– Keywords, variables, block, etc.

Lexical analysis/scanner

The task of scanning the source statement, recognizing and classifying the various tokens

The scanner is the part of the compiler that performs lexical analysis

It helps the parser and makes the parser run more efficiently

Parser

Each statement in the program is recognized as some language construct, such as a declaration or an assignment statement, described by the grammar.

Symbol table/Analyzer (Optional)

Builds the symbol table and assigns the memory locations for the program. It can be a very messy task.

There are many different ways to implement it

The symbol table is used throughout the whole program

Once the memory locations have been assigned, we do NOT need the symbol table anymore
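As a rough illustration of "building the symbol table and assigning memory locations", here is a minimal sketch. The class name `SymbolTable`, the method names, and the type sizes are all assumptions made for this example; a real analyzer would also handle scopes, alignment, and more types.

```python
class SymbolTable:
    """Maps each declared identifier to its type and an assigned memory offset."""

    # Assumed sizes in bytes, for illustration only.
    SIZES = {"int": 4, "float": 8}

    def __init__(self):
        self.entries = {}
        self.next_offset = 0

    def declare(self, name, type_name):
        """Record a new identifier and give it the next free memory offset."""
        if name in self.entries:
            raise KeyError(f"duplicate declaration of {name!r}")
        entry = {"type": type_name, "offset": self.next_offset}
        self.entries[name] = entry
        self.next_offset += self.SIZES[type_name]
        return entry

    def lookup(self, name):
        """Return the entry for an identifier (raises KeyError if undeclared)."""
        return self.entries[name]


table = SymbolTable()
table.declare("a", "int")
table.declare("b", "float")
table.declare("c", "int")
print(table.lookup("c")["offset"])  # prints 12 (after a 4-byte int and an 8-byte float)
```

Once every use of `c` has been replaced by its offset, the table itself is no longer needed, which matches the last point above.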

Code generator

Generates the object/target code

Sometimes the object/target code is then optimized by the optimizer

Note1: the optimizer is totally optional

Note2: It is possible to compile a program in a single pass.

Note3: Compilers that perform code optimization generally make several passes

Compiler ideas

Compilers divide their problem into steps or passes to conquer it

Initial pass takes the source program as input

The last pass outputs the code for execution

Compiler

A program or set of programs that translates one language into another

Passes

Pass 1: Preprocessor

– Macro substitution

– Strips comments from the source code

Pass 2: Lexical analyzer, Parser, Code generator

– Heart of the compiler

– Translates the source into a platform-independent language much like assembler (intermediate code)

Passes (cont.)

Pass 3: Optimizer

– Improves the quality of the intermediate code

Pass 4: Back end

– Translates the optimized code to real assembler language or directly to some binary executable code

– Provides target independence for earlier phases
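To make the idea of separate passes more concrete, here is a small sketch of two of them: a Pass 1 style preprocessor and a Pass 3 style optimizer that does constant folding. Everything here is an assumption for illustration: the function names, the `//` comment syntax, the macro dictionary, and the four-field tuple format used for intermediate code are not taken from any particular compiler.

```python
import operator
import re


def preprocess(source, macros):
    """Pass 1 sketch: strip // comments, then substitute macro names."""
    lines = []
    for line in source.splitlines():
        # Naive comment stripping: does not handle // inside string literals.
        line = line.split("//", 1)[0].rstrip()
        if line:
            lines.append(line)
    text = "\n".join(lines)
    for name, value in macros.items():
        # \b keeps the macro MAX from matching inside a longer name like MAXIMUM.
        text = re.sub(rf"\b{re.escape(name)}\b", lambda m, v=value: v, text)
    return text


FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(code):
    """Pass 3 sketch: replace an operation on two literal ints by its result."""
    folded = []
    for op, dest, left, right in code:
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            folded.append(("const", dest, FOLDABLE[op](left, right), None))
        else:
            folded.append((op, dest, left, right))
    return folded


print(preprocess("x = MAX + 1; // limit\n// only a comment\ny = MAXIMUM;", {"MAX": "100"}))
print(fold_constants([("mul", "t1", 2, 3), ("add", "t2", "t1", "x")]))
```

Each pass takes the output of the previous one as its input, so the passes can be developed and tested separately.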

Lexical analyzer/Scanner

Scanning the program to be compiled and recognizing the tokens that make up the source statements

Converts the incoming source into a series of basic language elements

– A = B + 3 has 5 tokens

Tokens have meaning and are indivisible

– In C, “while” is one token; you can’t split it into “wh” and “ile”

– Can be placed into the symbol table and have information associated with them: type, value, name, relationship to other structures

– Can be referenced by a unique integer for later usage

Lexical Analyzer/Scanner (cont)

Scanners are usually designed to recognize keywords, operators, and identifiers as well as integers, floating-point numbers and others

The “Longest Match Rule”: unless otherwise stated, the scanner matches the longest token it knows. (For example, >> is ONE token, NOT > followed by >)

A variable name is recognized as ONE token instead of many characters
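The points above can be sketched as a tiny scanner. This is only an illustration under assumptions: the token kinds, the operator list, and the keyword set are made up for the example. Note that ">" is deliberately listed before ">>" so that it is the longest match rule, not the order of the list, that picks ">>".

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
IDENT = re.compile(r"[A-Za-z_]\w*")
OPERATORS = [">", ">>", ">=", "=", "+", "-", "*", "/"]  # assumed operator set
KEYWORDS = {"while", "if", "else"}                      # assumed keyword set


def scan(source):
    """Split source text into (kind, text) tokens using the longest match rule."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        # Collect every token that could start at this position.
        candidates = []
        for kind, pattern in (("NUMBER", NUMBER), ("IDENT", IDENT)):
            match = pattern.match(source, pos)
            if match:
                candidates.append((kind, match.group()))
        for op in OPERATORS:
            if source.startswith(op, pos):
                candidates.append(("OP", op))
        if not candidates:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        # Longest match rule: keep the candidate with the most characters.
        kind, text = max(candidates, key=lambda c: len(c[1]))
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
        pos += len(text)
    return tokens


print(len(scan("A = B +3")))  # prints 5
print(scan("x >> 2"))         # ">>" comes out as one OP token
```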

Lexical Analyzer/Scanner (cont)

The output of the scanner is a sequence of token codes

Token specifier: gives the identifier name, value, etc., that was found by the scanner

– Some scanners are designed to enter identifiers directly into a symbol table

– The token specifier for an identifier might be a pointer to the symbol-table entry for that identifier
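One way to picture the coded output is as a list of (token code, token specifier) pairs. The sketch below assumes a made-up set of integer codes (`Code`) and a symbol table stored as a plain list, so the "pointer" to an entry is just its index. None of these names come from the course material.

```python
from enum import IntEnum


class Code(IntEnum):
    """Hypothetical integer codes for the token classes used in this example."""
    IDENT = 1
    NUMBER = 2
    ASSIGN = 3
    PLUS = 4


def encode(lexemes):
    """Turn lexemes into (code, specifier) pairs, entering identifiers into a symbol table."""
    symbol_table = []  # entry i holds the information for identifier number i
    index_of = {}      # identifier name -> index in symbol_table
    coded = []
    for lexeme in lexemes:
        if lexeme == "=":
            coded.append((Code.ASSIGN, None))
        elif lexeme == "+":
            coded.append((Code.PLUS, None))
        elif lexeme.isdigit():
            coded.append((Code.NUMBER, int(lexeme)))  # specifier is the value
        else:
            if lexeme not in index_of:
                index_of[lexeme] = len(symbol_table)
                symbol_table.append({"name": lexeme})
            coded.append((Code.IDENT, index_of[lexeme]))  # specifier points at the entry
    return coded, symbol_table


coded, table = encode(["A", "=", "B", "+", "A"])
print(coded)
print(table)
```

Both occurrences of `A` get the same specifier, so later phases can attach type or location information to a single entry.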

Parser

The parser analyses the source grammatically to determine whether it meets the language specification and to develop a representation better suited to code generation

The parser invokes the lexical analyzer to get the next token (a reference into the symbol table) and its corresponding lexeme

Checks the syntax of a sentence

Parser (cont)

To summarize

– The parser breaks the token stream into a parse tree

– The parse tree is a structural representation of the sentence or program being parsed
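The summary above can be sketched with a small recursive-descent parser. It assumes a simple expression grammar (expr, term, factor) chosen for this example, takes the tokens as a ready-made list of strings, and builds a simplified tree of nested tuples that keeps only operators and operands rather than every grammar rule.

```python
class Parser:
    """Recursive-descent parser for an assumed grammar:

    expr   ::= term { ("+" | "-") term }
    term   ::= factor { ("*" | "/") factor }
    factor ::= NUMBER | IDENT | "(" expr ")"
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next_token(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.next_token()
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.next_token()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.next_token()
        if token == "(":
            node = self.expr()
            if self.next_token() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token.isalnum():
            return token
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser(["B", "+", "3", "*", "C"]).parse())  # prints ('+', 'B', ('*', '3', 'C'))
```

The shape of the tree shows that the multiplication is grouped before the addition, which is exactly the structural information the code generator needs.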

Analyzer and Symbol Table

Omitted, since not everyone in the class does it

The analyzer generates the symbol table for later use

Code Generator

The last task of compilation is the generation of object code

Most compilers generate the output of the code generator as the parse progresses instead of leaving it until after a parse tree is built

Small parts of the parse tree fill in code templates that are generated by the code generator
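Here is a minimal sketch of generating code "as the parse progresses": each time the parser recognizes a small construct, it fills in a template and emits an instruction right away, without building a tree first. The target is a hypothetical stack machine; the instruction names (`PUSH`, `LOAD`, `ADD`, `MUL`), the `TEMPLATES` table, and the tiny grammar are assumptions for this example.

```python
# Code templates for a hypothetical stack machine.
TEMPLATES = {
    "number": "PUSH {value}",
    "ident": "LOAD {value}",
    "+": "ADD",
    "*": "MUL",
}


def compile_expr(tokens):
    """Parse a list of tokens and emit code as each construct is recognized."""
    code = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return token

    def factor():
        token = advance()
        if token.isdigit():
            code.append(TEMPLATES["number"].format(value=token))
        elif token.isidentifier():
            code.append(TEMPLATES["ident"].format(value=token))
        else:
            raise SyntaxError(f"unexpected token {token!r}")

    def term():
        factor()
        while peek() == "*":
            advance()
            factor()
            code.append(TEMPLATES["*"])  # emitted once both operands are on the stack

    def expr():
        term()
        while peek() == "+":
            advance()
            term()
            code.append(TEMPLATES["+"])

    expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return code


print(compile_expr(["B", "+", "3", "*", "C"]))
# prints ['LOAD B', 'PUSH 3', 'LOAD C', 'MUL', 'ADD']
```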

Code generator (cont)

The code generator can generate

– An executable

– Advantage: fast

– Some aspects of optimization can still take place by observing the final linear instruction stream

OR

– An intermediate language representation that is close to assembler but has additional information

– Makes it easier for optimizers to perform further optimizations to generate faster code

Intermediate Language

All code generation is machine dependent as we must know the instruction set of a computer to generate code for it

Intermediate form: syntax and semantics of the source statements have been completely analyzed, but the actual translation into machine code has not yet been performed.

Transportable: can be moved from one machine to another (Intel, Motorola, etc.)

Processed by interpreters (TM, JM – byte code)
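As an illustration of intermediate code being processed by an interpreter, the sketch below runs a program written for a hypothetical stack machine. The instruction names (`PUSH`, `LOAD`, `STORE`, `ADD`, `MUL`) and the text format of the instructions are assumptions for this example, not the byte code of any real machine.

```python
def run(program, variables):
    """Interpret a list of stack-machine instructions and return the variables."""
    stack = []
    for instruction in program:
        op, *args = instruction.split()
        if op == "PUSH":
            stack.append(int(args[0]))        # push a literal value
        elif op == "LOAD":
            stack.append(variables[args[0]])  # push the value of a variable
        elif op == "STORE":
            variables[args[0]] = stack.pop()  # pop the top of the stack into a variable
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {instruction!r}")
    return variables


# A = B + 3, with B already holding 4
print(run(["LOAD B", "PUSH 3", "ADD", "STORE A"], {"B": 4}))  # prints {'B': 4, 'A': 7}
```

The same instruction list could be run by this interpreter on any machine, or handed to a back end that translates it to real machine code.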

BNF – Backus Naur Form

Describes the grammar of a language

– Set of tokens called terminal symbols

For things like numbers, keywords, predefined symbols

– Set of definitions called non-terminal symbols

For example: a ::= b | c (a is either b or c)

– Definitions create a system in which every legal structure can be represented

– Grammars are typically recursive, so recursion can be used to parse the grammar
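To show how a recursive grammar maps onto recursive code, here is a sketch of a recognizer for a small grammar of nested lists. The grammar is invented for this example and is written in the comment at the top; each non-terminal becomes one function, and the recursion in the grammar (a list may contain lists) becomes recursion between the functions.

```python
# Assumed grammar for this example:
#   <list>  ::= "[" "]" | "[" <items> "]"
#   <items> ::= <item> | <item> "," <items>
#   <item>  ::= DIGIT | <list>


def is_list(tokens):
    """Return True if the whole token sequence is a <list>."""

    def parse_list(pos):
        """Return the position just after a <list> starting at pos, or -1."""
        if pos >= len(tokens) or tokens[pos] != "[":
            return -1
        pos += 1
        if pos < len(tokens) and tokens[pos] == "]":
            return pos + 1
        pos = parse_items(pos)
        if pos == -1 or pos >= len(tokens) or tokens[pos] != "]":
            return -1
        return pos + 1

    def parse_items(pos):
        pos = parse_item(pos)
        if pos == -1:
            return -1
        if pos < len(tokens) and tokens[pos] == ",":
            return parse_items(pos + 1)  # <items> refers to itself
        return pos

    def parse_item(pos):
        if pos >= len(tokens):
            return -1
        if tokens[pos].isdigit():
            return pos + 1
        return parse_list(pos)           # <item> refers back to <list>

    return parse_list(0) == len(tokens)


print(is_list(list("[1,[2,3],[]]")))  # prints True
print(is_list(list("[1,]")))          # prints False
```

Here the terminal symbols are the single characters "[", "]", "," and the digits; `<list>`, `<items>`, and `<item>` are the non-terminal symbols.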

Summary

Compilers can recognize when templates or objects are instantiated and destroyed

– These are part of the language definition

– Once the pattern is matched, the compiler can output intermediate-level code to support these operations

Template parameters can be filled in

Calls are made to the appropriate routines to construct/destroy objects

Summary (cont)

Interpreters

– Give flexibility but are slower

– Can modify the interpreted program on the fly and see the impact immediately without regenerating code

– Interpretation can be at the source program level, or at an intermediate language level