CST 320 COMPILER METHODS

Upload: ruth-rodgers

Post on 26-Dec-2015



2

Week 1

• Introduction
• Go over syllabus
• Grammar Review
• Compiler Overview
• Preprocessor
  • Symbol Table
  • Preprocessor Directives
• Adding a lexical analyzer

3

Instructor

• Sherry Yang
• [email protected] or [email protected]
• Wilsonville Room 213
• Office Hours: Mon/Thurs 4-6 or by appointment
• Class webpage: http://www.oit.edu/faculty/sherry.yang/CST320

4

Instructor Background

Professor of Software Engineering Technology

Department of Computer Systems Engineering Technology

Ph.D. in Computer Science

Senior Software Engineer

Application Software Engineer

Klamath Falls

5

Getting to Know Each Other

• Pair up with one other person.
• Find out a little more about the person:
  • Name
  • Year in program
  • Something interesting about the person
  • Any previous compiler experience
• Introduce the person to the class.

6

Course Description

• This course is designed to introduce the basic concepts of compiler design and operation. Topics include lexical and syntactical analysis, parsing, translation, semantic processing and code generation. In addition, students will implement a small compiler.

• We might use other tools (Spirit, Pargen, etc.)

7

Evaluation Methods

2 Tests 40%
Homework & Labs 35%
Project 15%
Class Participation (including in-class exercises) 10%

8

Grading

Your grade will be calculated as follows:*

90%+ = A

80%+ = B

70%+ = C

60%+ = D

Below 60% = F

* Class participation will be considered in evaluating "borderline" grades.
† You must turn in ALL of the labs and complete the project to pass the course with a C or better.
An Incomplete will be given if you fail to turn in all labs and the project.

9

Textbook

• Text:
  • Cooper, Keith D. & Linda Torczon, Engineering a Compiler, 2nd edition, Morgan Kaufmann, 2012.
• References:
  • Parsons, Introduction to Compiler Construction
  • Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986.
  • Fischer and LeBlanc, Crafting a Compiler with C, Benjamin Cummings, 1991.

10

Student Responsibilities

• Lecture and Lab Attendance:
  • Students are expected to attend all class sessions. If you know you will be absent on a certain day, please inform the instructor in advance so arrangements can be made to provide you with the materials covered. Please make every effort to attend all class sessions. There will be no make-up in-class exercises.
  • Lab sessions will be used as help sessions and to check off lab assignments.

11

Student Responsibilities

• Tests:
  • All tests are open book, open notes. No electronic devices are allowed.
  • There will be no make-up tests unless there is an emergency. If you miss a test for any reason, you can complete an additional project to make it up.
  • In case of emergency, please contact the Student Affairs office. They will inform all of your instructors.

12

Student Responsibilities

• Academic Dishonesty:
  • No plagiarism or cheating is allowed in this class. Please refer to your student handbook regarding policies on academic dishonesty. A copy of the policy is posted on the class webpage.
  • It is okay to get help on your assignments. Please acknowledge all sources of help, including them in the program documentation as appropriate.

13

Student Responsibilities

• Homework & Labs:
  • All labs are due via email by midnight on the due date. You must follow the assignment submission guidelines below.
  • All labs must be checked off by the instructor. There will be a check-off list posted for each lab.

14

Lab Submission Guidelines

• All labs are due via email by midnight on the due date. The instructor will send out an email upon receiving your lab. If you do not receive an email within 24 hours of submitting the lab, it is YOUR responsibility to contact the instructor by email or phone. If you do not contact the instructor within 48 hours after the due date, the lab is considered late.
• There will be a 20% penalty per week for late labs.
• All labs, the project, and late labs must be turned in by Wednesday of finals week to be graded.

15

Lab Submission Guidelines

1. Zip up all files required to build the lab.
2. Include a "Readme" file as appropriate.
3. The archive should also include any other deliverables called out in the assignment write-up.
4. The archive will be attached to an email with the subject line: CST320 Lab #x – first name last name

• Email the archive to [email protected]

16

• Any student with a disability who anticipates a need for accommodation in this course is encouraged to talk with the instructor about his/her needs as soon as possible.

17

Grammar Review

• Three main concepts
  • Language
  • Machine
  • Grammar
• Regular vs. Context-Free Languages
• Notation for describing languages
  • Regular Expression
  • Context-Free Grammar
• Recognizers
  • Finite Automata
  • Pushdown Automata

18

In-Class Exercise#1

Given Σ = {0, 1}

L1 = { wv | w, v ∈ Σ* and v = 00 }. Define a regular expression to describe L1.
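One possible answer is (0|1)*00: any binary string w followed by 00. As a quick check, here is a small sketch using the C++ standard <regex> library; the function name inL1 is an illustrative choice. std::regex_match succeeds only when the whole string matches the pattern.

```cpp
#include <regex>
#include <string>

// Returns true if s is in L1 = { wv | w, v in {0,1}*, v = 00 },
// i.e. s is a binary string that ends in "00".
// std::regex_match requires the ENTIRE string to match the pattern.
bool inL1(const std::string& s) {
    static const std::regex pattern("(0|1)*00");
    return std::regex_match(s, pattern);
}
```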

19

In-Class Exercise#1

Given Σ = {0, 1}

L2 = { w | w ∈ Σ* and w contains 3 consecutive 0's }. Define a deterministic finite automaton (DFA) to recognize this language.
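One possible DFA uses four states that count how many 0's in a row have just been read: q0 (none), q1 (one), q2 (two), and q3 (three seen; accepting, and it stays there on any input). A table-driven sketch, with the hypothetical function name inL2:

```cpp
#include <string>

// DFA for L2 = { w in {0,1}* | w contains 000 }.
// State k (0..2): the last k symbols were 0's and 000 has not been seen yet.
// State 3: 000 has been seen (accepting sink state).
bool inL2(const std::string& w) {
    // transition[state][symbol], where symbol index 0 is '0' and 1 is '1'
    static const int transition[4][2] = {
        {1, 0},  // q0
        {2, 0},  // q1
        {3, 0},  // q2
        {3, 3},  // q3 (accepting, loops on every input)
    };
    int state = 0;                                  // start state q0
    for (char c : w) {
        if (c != '0' && c != '1') return false;     // symbol outside the alphabet
        state = transition[state][c - '0'];
    }
    return state == 3;                              // accept only in q3
}
```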

20

In-Class Exercise#1

Given Σ = {0, 1}

Lp = { ww^R | w ∈ Σ* }, where w^R is w reversed. Define a context-free grammar for Lp.

21

In-Class Exercise#1

Given Σ = {0, 1}

Lp = { ww^R | w ∈ Σ* }, where w^R is w reversed. Define a context-free grammar for Lp.

Is Lp regular?
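One possible grammar is S -> 0S0 | 1S1 | λ. Lp is not regular: recognizing it requires remembering an unbounded w to compare against its reverse, which a finite automaton cannot do (the pumping lemma applied to 0^n 11 0^n makes this precise). The sketch below applies the grammar rules directly; each call peels off one matching outer pair, and the call stack roughly plays the role of a pushdown automaton's stack. The function names are illustrative.

```cpp
#include <cstddef>
#include <string>

// Recognizer that follows the grammar  S -> 0S0 | 1S1 | λ  directly.
// isS(s, lo, hi) reports whether the substring s[lo, hi) can be derived from S.
bool isS(const std::string& s, std::size_t lo, std::size_t hi) {
    if (lo == hi) return true;                  // S -> λ
    if (hi - lo < 2) return false;              // a single symbol: no rule fits
    char first = s[lo], last = s[hi - 1];
    if ((first == '0' || first == '1') && first == last)
        return isS(s, lo + 1, hi - 1);          // S -> 0S0 or S -> 1S1
    return false;
}

bool inLp(const std::string& s) { return isS(s, 0, s.size()); }
```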

22

In-Class Exercise#1

Find a regular expression for the following automaton. Is it a deterministic finite automaton?

23

In-Class Exercise#1

Remove lambda productions from the following grammar:

S -> ABc
A -> aaA
A -> λ
B -> Bb
B -> λ
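Working through the standard procedure: A and B can derive λ (they are nullable), while S cannot because of the terminal c. Each production is then copied with every nullable symbol optionally left out, and the λ-productions themselves are dropped, giving

S -> ABc | Ac | Bc | c
A -> aaA | aa
B -> Bb | b

The same two steps can be sketched in C++ as follows. Encoding every grammar symbol as a single character, with "" standing for λ, is an assumption made to keep the sketch short.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// A production A -> alpha is stored as {A, alpha}; λ is the empty string "".
// Assumption: every grammar symbol is one character. Symbols that never
// appear on a left-hand side (terminals) can never become nullable.
using Production = std::pair<char, std::string>;

// Step 1: find every nonterminal that can derive λ (iterate to a fixed point).
std::set<char> nullableSet(const std::vector<Production>& g) {
    std::set<char> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [lhs, rhs] : g) {
            if (nullable.count(lhs)) continue;
            bool allNullable = true;               // true for an empty rhs (λ)
            for (char sym : rhs)
                if (!nullable.count(sym)) { allNullable = false; break; }
            if (allNullable) { nullable.insert(lhs); changed = true; }
        }
    }
    return nullable;
}

// Step 2: emit every variant of each production obtained by keeping or
// dropping each nullable symbol, then discard the empty (λ) right-hand sides.
// (If the start symbol were nullable, S -> λ would have to be added back.)
std::set<Production> removeLambda(const std::vector<Production>& g) {
    std::set<char> nullable = nullableSet(g);
    std::set<Production> result;
    for (const auto& [lhs, rhs] : g) {
        std::vector<std::string> variants{""};
        for (char sym : rhs) {
            std::vector<std::string> next;
            for (const auto& v : variants) {
                next.push_back(v + sym);                    // keep the symbol
                if (nullable.count(sym)) next.push_back(v); // or drop it
            }
            variants = std::move(next);
        }
        for (const auto& v : variants)
            if (!v.empty()) result.insert({lhs, v});
    }
    return result;
}
```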

Conventional Translator

(Diagram: source program → preprocessor → modified source program → compiler → target assembly program → assembler → relocatable machine code → loader / linker, together with library and relocatable object files → absolute machine code)

Compilers

(Diagram: Source Program → Lexical Analyzer (scanner) → Tokens → Parser → Parse Tree → Semantic Analysis → Intermediate Representation → Optimizer → Code Generator → Target code, with a shared Symbol Table)

• Lexical analyzer: uses regular expressions to define tokens; is a finite automaton; the structure of tokens is regular.
• Parser: uses a context-free grammar to define program structures; is a pushdown automaton; the structure of a program is context-free.

26

Why study compilers?

• Ties lots of things you know together:
  • Theory (finite automata, grammars)
  • Data structures
  • Modularization
  • Utilization of software tools
• You might build a parser.
• The theory of computation/formal languages still applies today,
  • as long as we still program with 1-D text.
• Helps you to be a better programmer

27

One-dimensional Text

int x;
cin >> x;
if(x>5)
    cout << "Hello";
else
    cout << "BOO";

int x;cin >> x;if(x>5) cout << "Hello"; else …

The formatting has no impact on the meaning of the program.

28

What is a translator?

• Takes input (SOURCE) and produces output (TARGET)

(Diagram: SOURCE → translator → TARGET; the translator may also report ERRORs)

29

Conventional Translator

(Diagram: skeletal source program → preprocessor → source program → compiler → target assembly program → assembler → relocatable machine code → loader / linker, together with library and relocatable object files → absolute machine code)

30

Translator for Java

(Diagram: Java source code → Java compiler → Java bytecode; the bytecode is then either run by the Java interpreter or translated by a bytecode compiler into absolute machine code)

31

Types of Translators

• Compilers
  • Conventional (textual source code)
    • Imperative, ALGOL-like languages
    • Other paradigms
• Interpreters
• Macro processors
• Text formatters
• Silicon compilers

32

Types of Translators (cont.)

• Visual programming languages
• Interfaces
  • Database
  • User interface
  • Operating system


34

Structure of Compilers

(Diagram: skeletal source program → preprocessor → Modified Source Program → Lexical Analyzer (scanner) → Tokens → Syntax Analysis (Parser) → Syntactic Structure → Semantic Analysis → Intermediate Representation → Optimizer → Code Generator → Target machine code, with a shared Symbol Table)

35

Symbol Table

• What is a symbol?
  • Variable name
  • Function name
  • Type name
  • Constant
  • Class name
  • Method name
  • …
  • Any ID that you use in a program

36

Symbol Table

• Information about a symbol
  • Name
  • Type (int, double, char, string, etc.)
  • Use (variable name, constant name, type name, function name, etc.)
  • Value (e.g., the value of a constant)
  • Scope

37

Symbol table operations

• Insert a symbol into the symbol table
  • In some cases, flag an error if the symbol already exists
• Search for a symbol in the symbol table
• Delete a symbol from the symbol table
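A minimal sketch of these three operations, assuming a flat table keyed by name with scope handling left out. The SymbolInfo fields mirror the symbol information listed above; all names here are illustrative, not the lab's required interface.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical record for the information kept about each symbol.
struct SymbolInfo {
    std::string type;   // e.g. "int", "double"
    std::string use;    // e.g. "variable", "constant", "function"
    std::string value;  // e.g. the value of a constant; empty if none
};

class SymbolTable {
public:
    // Insert; returns false if the name already exists, so the caller
    // can decide whether that is an error in the current context.
    bool insert(const std::string& name, const SymbolInfo& info) {
        return table_.emplace(name, info).second;
    }
    // Search; returns nullptr if the symbol is not in the table.
    const SymbolInfo* search(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
    // Delete; returns true if a symbol was actually removed.
    bool remove(const std::string& name) { return table_.erase(name) > 0; }

private:
    std::unordered_map<std::string, SymbolInfo> table_;
};
```

A hash map gives average constant-time insert, search, and delete, which suits a table that is consulted for every identifier.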

38

Symbol table examples w/ preprocessor

#define MAX 50
#define SOMESYMBOL
#define SOMESYMBOL
#undef SOMESYMBOL
#define MIN 10
#define MAX 100

39

Code example

#define MAX 5
void main()
{
    int x;
    int y;
    x = MAX;
#define MAX 10
    y = MAX;
}
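Because the preprocessor works top to bottom, x = MAX; is replaced using MAX 5 and y = MAX; using MAX 10. Note that the C and C++ standards only allow a macro to be redefined with identical replacement text; GCC and MSVC typically accept a different body with a warning and use the new definition. The toy, line-based sketch below models this with a symbol table of macros. It is an illustration under simplifying assumptions (object-like macros only, one directive per line, no strings or comments), and ToyPreprocessor is a hypothetical name.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy line-based #define / #undef handling. Each directive updates the
// macro symbol table at the point where it appears, so later lines see
// the newest definition.
struct ToyPreprocessor {
    std::map<std::string, std::string> macros;  // symbol table: name -> value
    std::vector<std::string> warnings;          // redefinitions seen so far

    // Returns the processed line; directive lines become empty.
    std::string process(const std::string& line) {
        std::istringstream in(line);
        std::string word;
        in >> word;
        if (word == "#define") {
            std::string name, value;
            in >> name;
            std::getline(in >> std::ws, value);  // rest of line (may be empty)
            // This sketch warns on every redefinition; the standards only
            // object when the replacement text differs.
            if (macros.count(name)) warnings.push_back(name + " redefined");
            macros[name] = value;
            return "";
        }
        if (word == "#undef") {
            std::string name;
            in >> name;
            macros.erase(name);
            return "";
        }
        return substitute(line);
    }

    // Replaces each identifier that names a macro with its value.
    std::string substitute(const std::string& line) const {
        std::string out, id;
        auto flush = [&] {
            auto it = macros.find(id);
            out += (it != macros.end()) ? it->second : id;
            id.clear();
        };
        for (char c : line) {
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') id += c;
            else { flush(); out += c; }
        }
        flush();
        return out;
    }
};
```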

40

Symbol table example w/ parser (lab 2)

void main()
{
    int x;
    string str1;
    int x;
    x = 3;
    y = 10;
    str1 = 30;
    {
        double x;
        x = 4.301;
    }
}
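This example contains several situations a symbol table has to handle: the second int x; is a redeclaration in the same scope, y is never declared, str1 = 30; assigns an int to a string, and the inner double x legally hides the outer x. A common approach, sketched here as an assumption rather than the lab's required design, is a stack of per-scope tables searched from the innermost scope outward. The class and method names are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Block-structured symbol table: a stack of scopes, innermost last.
class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }          // outermost scope
    void enterScope() { scopes_.emplace_back(); }  // on '{'
    void exitScope() {                             // on '}'
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Declares in the innermost scope; false means "already declared here".
    bool declare(const std::string& name, const std::string& type) {
        return scopes_.back().emplace(name, type).second;
    }

    // Looks up from the innermost scope outward; "" means "undeclared".
    std::string lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return "";
    }

private:
    std::vector<std::map<std::string, std::string>> scopes_;  // name -> type
};
```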

41

Preprocessor

• Remove all comments
• If a language is not case sensitive, the preprocessor may change the program text to all uppercase or all lowercase.
• Process preprocessor directives.
  • C/C++ directives:
    • #include
    • #define (unlike C#'s #define, a C/C++ #define can give the symbol a value, such as a constant)
    • #if / #else / #endif
    • #undef
    • #ifdef
    • #ifndef

(Diagram: skeletal source program → preprocessor → source program)
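A minimal sketch of the "remove all comments" step, assuming C/C++ comment syntax. It has to skip over string and character literals, otherwise text such as "// not a comment" inside quotes would be deleted. Raw strings and backslash-newline line splices are ignored for brevity; stripComments is an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Removes // and /* */ comments while leaving string/char literals intact.
// A block comment is replaced by one space so tokens on each side stay apart.
std::string stripComments(const std::string& src) {
    std::string out;
    std::size_t i = 0, n = src.size();
    while (i < n) {
        if (src[i] == '"' || src[i] == '\'') {              // string or char literal
            char quote = src[i];
            out += src[i++];
            while (i < n && src[i] != quote) {
                if (src[i] == '\\' && i + 1 < n) out += src[i++];  // keep escapes
                out += src[i++];
            }
            if (i < n) out += src[i++];                      // closing quote
        } else if (src.compare(i, 2, "//") == 0) {           // line comment
            while (i < n && src[i] != '\n') ++i;             // newline is kept
        } else if (src.compare(i, 2, "/*") == 0) {           // block comment
            std::size_t end = src.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            out += ' ';
        } else {
            out += src[i++];
        }
    }
    return out;
}
```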

42

#include

a.h:

#include "b.h"
#define MIN 10
int x;
if (x < MIN) …
x = MAX;

b.h:

#define MAX 5

43

#ifdef

#ifndef A_H
#define A_H
/* contents of a.h */
#endif

44

#ifdef

#if DLEVEL == 0
#define STACK 0
#elif DLEVEL == 1
#define STACK 100
#elif DLEVEL > 5
display( debugptr );
#else
#define STACK 200
#endif
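A compilable version of the chain above, under two assumptions: DLEVEL defaults to 1 when it is not supplied on the command line, and the DLEVEL > 5 branch (which calls display( debugptr ) on the slide, where neither name is defined) is replaced by a placeholder value. The preprocessor keeps exactly one branch, so the compiler only ever sees the selected #define; compiling with -DDLEVEL=3 under GCC or Clang would select the #else branch and STACK 200.

```cpp
// Assumption: default level when not given on the command line.
#ifndef DLEVEL
#define DLEVEL 1
#endif

#if DLEVEL == 0
#define STACK 0
#elif DLEVEL == 1
#define STACK 100
#elif DLEVEL > 5
#define STACK -1   /* placeholder: the slide calls display( debugptr ) here */
#else
#define STACK 200
#endif

// Returns whichever STACK value the preprocessor selected.
int stackSize() { return STACK; }
```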

45

Standalone Preprocessor

(Diagram: input.cpp → preprocessor → temp.cpp)

input.cpp:

#define MAX 50
//this is a comment
void main()
{ int x; //more comments
x = MAX;
#define MIN 10
int y;
x = y - MIN; //blah
}

temp.cpp:

void main(){ int x; x = 50; int y; x = y - 10;}

Produces a modified source file

46

Standalone Lexical Analyzer

(Diagram: source text → Lexical Analyzer → list of tokens)

Input: void main(){ int x; x = 50; int y; x = y - 10;}

Produces a list of tokens:

void    keyword
main    ID
(       symbol
)       symbol
{       symbol
int     keyword
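A minimal hand-written scanner that produces (lexeme, type) pairs in the style of the list above. The keyword set, the type name "number" for integer literals, and the rule that every other character is a one-character "symbol" are assumptions for illustration. Whitespace only separates tokens, which is why formatting does not change the token list.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Token = std::pair<std::string, std::string>;  // (lexeme, type)

std::vector<Token> tokenize(const std::string& src) {
    // Illustrative keyword set only.
    static const std::set<std::string> keywords = {"void", "int", "double", "if", "else"};
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }          // whitespace separates tokens
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {               // identifier or keyword
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            tokens.push_back({word, keywords.count(word) ? "keyword" : "ID"});
        } else if (std::isdigit(c)) {                    // integer literal
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({src.substr(start, i - start), "number"});
        } else {                                         // any other single character
            tokens.push_back({std::string(1, src[i++]), "symbol"});
        }
    }
    return tokens;
}
```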

47

Preprocessor & Lexical Analyzer

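(Diagram: the original source text goes through both the preprocessor and the lexical analyzer, which together produce a list of tokens)

Input:

#define MAX 50
//this is a comment
void main()
{ int x; //more comments
x = MAX;
#define MIN 10
int y;
x = y - MIN; //blah
}

Produces a list of tokens:

void    keyword
main    ID
(       symbol
)       symbol
{       symbol
int     keyword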

48

Output from Lab1

List of tokens:

void    keyword
main    ID
(       symbol
)       symbol
{       symbol
int     keyword

Print out of tokens:

void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
…

49

Preprocessor

• Preprocessor symbols
  • Defined by #define
    • #define MYHEADER_H
    • #define LARGEST 10
  • Defined in the compilation process
    • Command line (/D in MSVC; GCC and Clang use -D)
    • Preprocessor Definitions

50

In-Class Exercise #2

#include <iostream>
//comment
#define LARGEST 100
void main()
{   int x, y;
    x = 10;
    y = LARGEST;
#ifdef MYSYMBOL
    cout << "X=" << x;
#endif
#if TEST == 1
    cout << "1" << endl;
#elif TEST == 2
    cout << "2" << endl;
#else
    cout << "Blah" << endl;
#endif
    cout << "The end" << endl;
}

Show the result of the preprocessor.

What's left in the file?

What's changed in the file?


54

#include

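• #include "myfile.h"
  • Searches first in the directory of the file containing the #include (typically the current directory), then in the standard include directories
• #include "c:\somedirectory\myfile.h"
  • Absolute path
• #include <array>
  • Searches the compiler's standard include directories (e.g., the include folder of the compiler installation under Program Files)
• #include <sys/types.h>
  • types.h is found in the sys subdirectory of an include directory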


57

Structure of Compilers

(Diagram: Source Program → Lexical Analyzer (scanner) → Tokens)

int x;
cin >> x;
if(x>5)
    cout << "Hello";
else
    cout << "BOO";

int x ;
cin >> x ;
if ( x > 5 )
cout << "Hello" ;
else
cout << "BOO" ;

What about whitespace? Does it matter?

58

Tokenize First or as needed?

int x;
cin >> x;
if(x>5)
    cout << "Hello";
else
    cout << "BOO";

int     datatype
x       ID
;       symbol
cin >>  …

Tokens = meaningful units in a program

Value/Type pairs

59

Tokenize First or as needed?

Array<Array<int>> someArray;

Array<Array<int> > someArray;

(Diagram: how each declaration is split into tokens; the question is whether the closing >> is one token or two > tokens.)
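With the longest-match (maximal munch) rule most scanners use, the first declaration yields a single >> (shift operator) token where the parser expects two closing > tokens, while the space in the second declaration splits them. (Since C++11, the language special-cases a >> that closes nested template argument lists.) The sketch below applies the rule to just enough lexical syntax for this example; lexemes is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits text into lexemes using the longest-match rule: identifiers are
// read as long as possible, and ">>" is preferred over ">".
// Assumption: only identifiers, whitespace and single-character punctuation
// such as <, > and ; need to be handled.
std::vector<std::string> lexemes(const std::string& src) {
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            out.push_back(src.substr(start, i - start));
        } else if (src.compare(i, 2, ">>") == 0) {   // longest match wins
            out.push_back(">>");
            i += 2;
        } else {
            out.push_back(std::string(1, src[i++]));
        }
    }
    return out;
}
```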

60

Structure of Compilers

(Diagram: Source Program → Lexical Analyzer (scanner) → Tokens → Syntax Analysis (Parser) → Syntactic Structure (Parse Tree))

61

Parse Tree (Parser)

(Diagram: partial parse tree for int x ; cin >> …; the root is Program, and the tokens int x (datatype ID) are grouped under a DataDeclaration node)

62

Who is responsible for errors?

• int x$y;
• int 32xy;
• 45b
• 45ab
• x = x @ y;

Lexical Errors / Token Errors?

63

Who is responsible for errors?

• X = ;
• Y = x +;
• Z = [;

Syntax errors

64

Who is responsible for errors?

• 45ab
  • One illegal token?
  • Or two tokens (45 and ab)? Is whitespace required between them?
• Either way is okay.
  • The lexical analyzer can catch the illegal token (45ab), or
  • The parser can catch the syntax error: most likely, 45 followed by ab will not be syntactically correct.
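A sketch of the first option, where the scanner rejects 45ab as one illegal token. It classifies a single lexeme that has already been separated out by whitespace; the categories and their spelling rules are assumptions for illustration, and classify is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Assumptions: an ID is a letter or '_' followed by letters, digits or '_';
// a NUMBER is digits only. Anything else is reported as a lexical error.
std::string classify(const std::string& lexeme) {
    if (lexeme.empty()) return "ERROR";
    unsigned char first = lexeme[0];
    bool allDigits = true, idChars = true;
    for (unsigned char c : lexeme) {
        if (!std::isdigit(c)) allDigits = false;
        if (!std::isalnum(c) && c != '_') idChars = false;
    }
    if (allDigits) return "NUMBER";
    if (idChars && (std::isalpha(first) || first == '_')) return "ID";
    return "ERROR";   // e.g. "45ab", "32xy", "x$y"
}
```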

65

Structure of Compilers

(Diagram: Source Program → Lexical Analyzer (scanner) → Tokens → Syntax Analysis (Parser) → Syntactic Structure → Semantic Analysis, with a shared Symbol Table)

int x;
cin >> x;
if(x>5)
    x = "SHERRY";
else
    cout << "BOO";
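The assignment x = "SHERRY"; is syntactically valid, so only semantic analysis, using the type recorded in the symbol table, can reject it. A minimal sketch of such a check follows; types are plain strings, and the compatibility rule (identical types, or int assigned to double) is an assumption for illustration. assignmentOk is a hypothetical name.

```cpp
#include <map>
#include <string>

// Checks whether a value of type rhsType may be assigned to variable var,
// using the declared types stored in the symbol table.
bool assignmentOk(const std::map<std::string, std::string>& symbols,
                  const std::string& var, const std::string& rhsType) {
    auto it = symbols.find(var);
    if (it == symbols.end()) return false;          // undeclared variable
    const std::string& varType = it->second;
    return varType == rhsType || (varType == "double" && rhsType == "int");
}
```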

66

Structure of Compilers

(Diagram: Source Program → Lexical Analyzer (scanner) → Tokens → Syntax Analysis (Parser) → Syntactic Structure → Semantic Analysis → Intermediate Representation → Optimizer → Code Generator → Target machine code, with a shared Symbol Table)


68

Translation Steps:

• Recognize when input is available.
• Break input into individual components.
• Merge individual pieces into meaningful structures.
• Process structures.
• Produce output.

69

Translation (Compilers) Steps:

• Break input into individual components. (lexical analysis)
• Merge individual pieces into meaningful structures. (parsing)
• Process structures. (semantic analysis)
• Produce output. (code generation)

70

Compilers

• Two major tasks:
  • Analysis of source
  • Synthesis of target
• Syntax-directed translation
  • Compilation process driven by the syntactic structure of the source being translated
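Syntax-directed translation can be illustrated with the classic infix-to-postfix translator: each grammar rule becomes a function, and actions embedded in the rules emit output while the input is being parsed. This sketch handles only single digits, + and -; the class name is illustrative, and the tail-recursive rest rule is written as a loop.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Grammar with embedded translation actions:
//   expr -> term rest
//   rest -> + term { emit '+' } rest | - term { emit '-' } rest | λ
//   term -> digit { emit digit }
class PostfixTranslator {
public:
    explicit PostfixTranslator(std::string input) : in_(std::move(input)) {}

    std::string translate() {
        expr();
        if (pos_ != in_.size()) throw std::runtime_error("syntax error");
        return out_;
    }

private:
    void expr() { term(); rest(); }
    void rest() {                                   // loop replaces tail recursion
        while (pos_ < in_.size() && (in_[pos_] == '+' || in_[pos_] == '-')) {
            char op = in_[pos_++];
            term();
            out_ += op;                             // action: operator after operands
        }
    }
    void term() {
        if (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            out_ += in_[pos_++];                    // action: emit the digit
        else
            throw std::runtime_error("syntax error: digit expected");
    }

    std::string in_, out_;
    std::size_t pos_ = 0;
};
```

For example, PostfixTranslator("9-5+2").translate() yields "95-2+".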

71

Interpreters

• Executes the source program without explicitly translating it to target code.
• Control and memory management reside in the interpreter, not the user program.
• Allows:
  • Modification of the program as it executes
  • Dynamic typing of variables
  • Portability
• Huge overhead (time & space)

72

Structure of Interpreters

(Diagram: Source Program and Data → Interpreter → Program Output)
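A toy interpreter in the sense above: the source program (an expression over single-letter variables and digits, an assumed mini-language) is evaluated directly against data supplied at run time, and no target code is generated. interpret is a hypothetical name.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Evaluates expressions such as "a+b-3" left to right, reading variable
// values from data. Only + and - and one-character operands are supported.
int interpret(const std::string& program, const std::map<char, int>& data) {
    std::size_t pos = 0;
    auto operand = [&]() -> int {
        if (pos >= program.size()) throw std::runtime_error("operand expected");
        char c = program[pos++];
        if (std::isdigit(static_cast<unsigned char>(c))) return c - '0';
        auto it = data.find(c);
        if (it == data.end()) throw std::runtime_error("unknown variable");
        return it->second;
    };
    int result = operand();
    while (pos < program.size()) {
        char op = program[pos++];
        int value = operand();
        if (op == '+') result += value;
        else if (op == '-') result -= value;
        else throw std::runtime_error("unknown operator");
    }
    return result;
}
```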

73

Misc. Compiler Discussions

• History of Modern Compilers
• Front and Back ends
• One pass vs. Multiple passes
• Compiler Construction Tools
  • Compiler-Compilers, Compiler-generators, Translator-writing Systems
  • Scanner generator
  • Parser generator
  • Syntax-directed engines
  • Automatic code generator
  • Dataflow engines