1
CST 320 Compiler Methods
2
Week 1
• Introduction
• Go over syllabus
• Grammar Review
• Compiler Overview
• Preprocessor
  • Symbol Table
  • Preprocessor Directives
• Adding a lexical analyzer
3
Instructor
• Sherry Yang
• [email protected] or [email protected]
• Wilsonville Room 213
• Office Hours: Mon/Thurs 4-6 or by appointment
• Class webpage: http://www.oit.edu/faculty/sherry.yang/CST320
4
Instructor Background
Professor of Software Engineering Technology
Department of Computer Systems Engineering Technology
Ph.D. in Computer Science
Senior Software Engineer
Application Software Engineer
Klamath Falls
5
Getting to Know Each Other
• Pair up with one other person.
• Find out a little more about the person.
  • Name
  • Year in program
  • Something interesting about the person
  • Any previous compiler experience
• Introduce the person to the class.
6
Course Description
• This course is designed to introduce the basic concepts of compiler design and operation. Topics include lexical and syntactical analysis, parsing, translation, semantic processing and code generation. In addition, students will implement a small compiler.
• We might use other tools (Spirit, Pargen, etc.)
7
Evaluation Methods
2 Tests 40%
Homework & Labs 35%
Project 15%
Class Participation (including in-class exercises) 10%
8
Grading
Your grade will be calculated as follows:*
90%+ = A
80%+ = B
70%+ = C
60%+ = D
59%- = F
* Class participation will be considered in evaluating "borderline" grades.
† You must turn in ALL of the labs and complete the project to pass the course with a C or better.
• Incompletes will be given if you fail to turn in all labs and the project.
9
Textbook
• Text:
  • Cooper, Keith D. & Linda Torczon, Engineering a Compiler, 2nd edition, Morgan Kaufmann, 2012.
• References:
  • Parsons, Introduction to Compiler Construction
  • Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986.
  • Fischer and LeBlanc, Crafting a Compiler with C, Benjamin Cummings, 1991.
10
Student Responsibilities
• Lecture and Lab Attendance:
  • Students are expected to attend all class sessions. If you know you will be absent on a certain day, please inform the instructor in advance so arrangements can be made to provide you with the materials covered. Please make every effort to attend all class sessions. There will be no make-up in-class exercises.
  • Lab sessions will be used as help sessions and to check off lab assignments.
11
Student Responsibilities
• Tests:
  • All tests are open book, open notes. No electronic devices are allowed.
  • There will be no make-up tests unless there is an emergency. If you miss a test for any reason, you can do an additional project to make it up.
  • In case of emergency, please contact the Student Affairs office. They will inform all of your instructors.
12
Student Responsibilities
• Academic Dishonesty:
  • No plagiarism or cheating is allowed in this class. Please refer to your student handbook regarding policies on academic dishonesty. A copy of the policy is posted on the class webpage.
  • It is okay to get help on your assignments. Please acknowledge all sources of help, including them in the program documentation as appropriate.
13
Student Responsibilities
• Homework & Labs:
  • All labs are due via email by midnight on the due date. You must follow the assignment submission guidelines below.
  • All labs must be checked off by the instructor. There will be a check-off list posted for each lab.
14
Lab Submission Guidelines
• All labs are due via email by midnight on the due date. The instructor will send out an email upon receiving your lab. If you do not receive an email within 24 hours of submitting the lab, it is YOUR responsibility to contact the instructor by email or phone. If you do not contact the instructor within 48 hours after the due date, the lab is considered late.
• There will be a 20% penalty per week for late labs.
• All labs, project and late labs must be turned in by Wednesday of Finals week to be graded.
15
Lab Submission Guidelines
• 1. Zip up all files required to build the lab.
• 2. Include a “Readme” file as appropriate.
• 3. The archive should also include any other deliverables as called out in the assignment write-up.
• 4. The archive will be attached to an email with subject line: CST320 Lab #x – first name last name
• Email the archive to
16
• Any student with a disability who anticipates a need for accommodation in this course is encouraged to talk with the instructor about his/her needs as soon as possible.
17
Grammar Review
• Three main concepts
  • Language
  • Machine
  • Grammar
• Regular vs. Context-Free Languages
• Notation for describing languages
  • Regular Expression
  • Context-Free grammar
• Recognizers
  • Finite automata
  • Pushdown Automata
18
In-Class Exercise#1
Given Σ = {0, 1}
L1 = { wv | w, v ∈ Σ* and v = 00 }. Define a regular expression to describe L1.
19
In-Class Exercise#1
Given Σ = {0, 1}
L2 = { w | w ∈ Σ* and w contains 3 consecutive 0's }. Define a deterministic finite automaton (DFA) to recognize this language.
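One possible DFA for L2 can be sketched in code as a transition table. This is only an illustration under assumed names (`acceptsL2`, `delta`): states 0 to 2 count the current run of consecutive 0's, and state 3 is the accepting state that the machine never leaves.

```cpp
#include <cassert>
#include <string>

// Simulates a 4-state DFA over the alphabet {0, 1}.
// delta[state][symbol]: symbol 0 extends the run of 0's, symbol 1 resets it.
// State 3 (three consecutive 0's seen) is accepting and loops to itself.
bool acceptsL2(const std::string& w) {
    static const int delta[4][2] = {
        {1, 0},   // state 0: no 0's in the current run
        {2, 0},   // state 1: one 0 in the current run
        {3, 0},   // state 2: two 0's in the current run
        {3, 3}    // state 3: accepting, absorbing
    };
    int state = 0;
    for (char c : w) {
        if (c != '0' && c != '1') return false;   // symbol outside the alphabet
        state = delta[state][c - '0'];
    }
    return state == 3;
}
```

Calling `acceptsL2("1000101")` follows the states 0, 0, 1, 2, 3, 3, 3, 3 and accepts, while `acceptsL2("00100")` never reaches state 3.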
21
In-Class Exercise#1
Given Σ = {0, 1}
Lp = { ww^R | w ∈ Σ* }, where w^R is the reverse of w. Define a context-free grammar for Lp.
Is Lp regular?
22
In-Class Exercise#1
Find the regular expressions for the following automata. Is this a deterministic finite automaton?
23
In-Class Exercise#1
Remove lambda productions from the following grammar:
S -> ABc
A -> aaA
A -> λ
B -> Bb
B -> λ
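A common first step when removing lambda productions is to find which nonterminals can derive λ (the "nullable" ones). The sketch below shows only that step, not the full rewrite of the grammar. The representation is an assumption made for illustration: uppercase letters are nonterminals, lowercase letters are terminals, and an empty right-hand side stands for λ.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A production is (left-hand nonterminal, right-hand side); "" means lambda.
using Production = std::pair<char, std::string>;

// Fixed-point computation: a nonterminal is nullable when some production
// for it has a right-hand side made up only of nullable symbols.
std::set<char> nullableSymbols(const std::vector<Production>& grammar) {
    std::set<char> nullable;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const Production& p : grammar) {
            if (nullable.count(p.first)) continue;
            bool allNullable = true;
            for (char sym : p.second) {
                // Terminals are never in the set, so they block nullability.
                if (!nullable.count(sym)) { allNullable = false; break; }
            }
            if (allNullable) {
                nullable.insert(p.first);
                changed = true;
            }
        }
    }
    return nullable;
}
```

For the grammar above, A and B are nullable, and S is not because of the terminal c. Each production that mentions A or B then needs variants with those symbols omitted before the λ rules can be dropped.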
Conventional Translator
source program
→ preprocessor → modified source program
→ compiler → target assembly program
→ assembler → relocatable machine code
→ loader / linker (also takes library, relocatable object files) → absolute machine code
Compilers
Source Program
→ Lexical Analyzer (scanner) → Tokens
→ Parser → Parse Tree
→ Semantic Analysis → Intermediate Representation
→ Optimizer
→ Code Generator → Target code
Symbol Table
Lexical Analyzer:
• Uses Regular Expressions to define tokens
• Is a Finite Automaton
• Structure of tokens is Regular
Parser:
• Uses Context-Free Grammar to define program structures
• Is a Pushdown Automaton
• Structure of program is Context-Free
26
Why study compilers?
• Ties lots of things you know together:
  • Theory (finite automata, grammars)
  • Data structures
  • Modularization
  • Utilization of software tools
• You might build a parser.
• The theory of computation/formal language still applies today.
  • As long as we still program with 1-D text.
• Helps you to be a better programmer
27
One-dimensional Text
int x;
cin >> x;
if(x>5)
cout << "Hello";
else
cout << "BOO";
int x;cin >> x;if(x>5) cout << "Hello"; else …
The formatting has no impact on the meaning of the program.
29
Conventional Translator
skeletal source program
→ preprocessor → source program
→ compiler → target assembly program
→ assembler → relocatable machine code
→ loader / linker (also takes library, relocatable object files) → absolute machine code
30
Translator for Java
Java source code → Java compiler → Java bytecode
Java bytecode → Java interpreter
Java bytecode → Bytecode compiler → absolute machine code
31
Types of Translators
• Compilers
  • Conventional (textual source code)
    • Imperative, ALGOL-like languages
    • Other paradigms
• Interpreters
• Macro processors
• Text formatters
• Silicon compilers
32
Types of Translators (cont.)
• Visual programming language
• Interface
  • Database
  • User interface
  • Operating System
33
Conventional Translator
skeletal source program
→ preprocessor → source program
→ compiler → target assembly program
→ assembler → relocatable machine code
→ loader / linker (also takes library, relocatable object files) → absolute machine code
34
Structure of Compilers
skeletal source program → preprocessor → Modified Source Program
→ Lexical Analyzer (scanner) → Tokens
→ Syntax Analysis (Parser) → Syntactic Structure
→ Semantic Analysis → Intermediate Representation
→ Optimizer
→ Code Generator → Target machine code
Symbol Table
35
Symbol Table
• What is a symbol?
  • Variable name
  • Function name
  • Type name
  • Constant
  • Class name
  • Method name
  • ….
  • Any ID that you use in a program
36
Symbol Table
• Information about a symbol
  • Name
  • Type (int, double, char, string, etc.)
  • Use (variable name, constant name, type name, function name, etc.)
  • Value (i.e. value of constant)
  • Scope
37
Symbol table operations
• Insert a symbol into the symbol table
  • Flag as error if symbol already exists in some cases
• Search for a symbol in the symbol table
• Delete a symbol from the symbol table
38
Symbol table examples w/ preprocessor
#define MAX 50
#define SOMESYMBOL
#define SOMESYMBOL
#undef SOMESYMBOL
#define MIN 10
#define MAX 100
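The directive sequence above exercises all three symbol table operations: #define inserts, #undef deletes, and a repeated #define has to search first. The following is a minimal sketch, with a hypothetical `MacroTable` class that is not part of the lab handout. It assumes every redefinition is flagged, although a real preprocessor may accept an identical redefinition.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical symbol table for preprocessor symbols: name -> replacement text.
class MacroTable {
public:
    // #define: insert. Returns false (flag as error) if the name already exists;
    // the existing entry is left unchanged.
    bool define(const std::string& name, const std::string& value = "") {
        return table_.emplace(name, value).second;
    }
    // #undef: delete. Returns false if the name was not defined.
    bool undef(const std::string& name) {
        return table_.erase(name) > 0;
    }
    // Search. Returns a pointer to the replacement text, or nullptr if absent.
    const std::string* lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    std::unordered_map<std::string, std::string> table_;
};
```

Run against the slide's sequence, the second `#define SOMESYMBOL` and the final `#define MAX 100` both return false from `define`, and `MAX` keeps the value 50 under this sketch's policy.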
40
Symbol table example w/ parser (lab 2)
void main()
{
int x;
string str1;
int x;
x = 3;
y = 10;
str1 = 30;
{ double x;
x = 4.301;
}
}
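The program above needs a symbol table that understands scope: the second `int x` is a redeclaration in the same scope, `y` is never declared, and `double x` inside the inner braces is legal because it lives in a new scope. A minimal sketch follows, assuming a stack of per-scope maps; the class and method names are illustrative and not taken from the lab.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scoped symbol table: one name -> type map per open scope.
class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }               // outermost scope
    void enterScope() { scopes_.emplace_back(); }       // on '{'
    void exitScope() {                                  // on '}'
        if (scopes_.size() > 1) scopes_.pop_back();
    }
    // Declares a name in the innermost scope.
    // Returns false if that scope already declares the name.
    bool declare(const std::string& name, const std::string& type) {
        return scopes_.back().emplace(name, type).second;
    }
    // Searches from the innermost scope outward.
    // Returns the declared type, or an empty string if the name is undeclared.
    std::string lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return "";
    }
private:
    std::vector<std::unordered_map<std::string, std::string>> scopes_;
};
```

With this table, the assignment `str1 = 30` can be checked by comparing `lookup("str1")`, which yields `string`, against the type of the literal; reporting that mismatch is the job of semantic analysis rather than of the table itself.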
41
Preprocessor
• Remove all comments
• If a language is not case sensitive, preprocessor may change the program text to all uppercase or all lowercase.
• Process preprocessor directives.
  • C/C++ directives:
    • #include
    • #define (unlike C#’s #define, C/C++ can define a constant value)
    • #if / #else / #endif
    • #undef
    • #ifdef
    • #ifndef
skeletal source program → preprocessor → source program
44
#ifdef
#if DLEVEL == 0
#define STACK 0
#elif DLEVEL == 1
#define STACK 100
#elif DLEVEL > 5
display( debugptr );
#else
#define STACK 200
#endif
45
Standalone Preprocessor
input.cpp → preprocessor → temp.cpp
input.cpp:
#define MAX 50
//this is a comment
void main()
{ int x; //more comments
    x = MAX;
#define MIN 10
    int y;
    x = y - MIN; //blah
}
temp.cpp:
void main()
{ int x;
    x = 50;
    int y;
    x = y - 10;
}
Produces a modified source file
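The transformation from input.cpp to temp.cpp can be sketched as a line-by-line pass. This is a simplified illustration, not the lab solution: the function name `preprocess` is hypothetical, only `//` comments and `#define NAME VALUE` are handled, and a `//` inside a string literal would be treated as a comment.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

std::string preprocess(const std::string& source) {
    std::map<std::string, std::string> macros;   // symbol table: name -> value
    std::istringstream in(source);
    std::string line, out;
    while (std::getline(in, line)) {
        // Remove a // comment, if any.
        std::size_t slash = line.find("//");
        if (slash != std::string::npos) line.erase(slash);

        // Record #define directives; directive lines are not copied to the output.
        std::istringstream words(line);
        std::string first;
        words >> first;
        if (first == "#define") {
            std::string name, value;
            words >> name >> value;
            macros[name] = value;
            continue;
        }

        // Copy the line, replacing each identifier that names a macro.
        std::string result;
        std::size_t i = 0;
        while (i < line.size()) {
            unsigned char c = line[i];
            if (std::isalpha(c) || c == '_') {
                std::size_t start = i;
                while (i < line.size() &&
                       (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_'))
                    ++i;
                std::string id = line.substr(start, i - start);
                auto it = macros.find(id);
                result += (it != macros.end()) ? it->second : id;
            } else {
                result += line[i++];
            }
        }
        // Drop lines that are blank once comments are removed.
        if (result.find_first_not_of(" \t") != std::string::npos) out += result + "\n";
    }
    return out;
}
```

Because symbols are recorded as the directives are encountered, `MIN` is only replaced on lines that follow its `#define`, which matches the order-dependent behavior of the C/C++ preprocessor.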
46
Standalone Lexical Analyzer
Lexical Analyzer
void main(){ int x; x = 50; int y; x = y - 10;}
Produces a list of tokens
void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
47
Preprocessor & Lexical Analyzer
both
#define MAX 50
//this is a comment
void main()
{ int x; //more comments
    x = MAX;
#define MIN 10
    int y;
    x = y - MIN; //blah
}
Produces a list of tokens
void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
48
Output from Lab1
List of tokens
void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
Print out of tokens:
void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
…..
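A token list like the one above can be produced by a small hand-written scanner. The sketch below is one possible shape, not the required Lab 1 design: the `Token` struct, the keyword set, and the extra `number` type are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// A token is a value/type pair.
struct Token {
    std::string value;
    std::string type;   // "keyword", "ID", "number", or "symbol"
};

std::vector<Token> tokenize(const std::string& src) {
    static const std::set<std::string> keywords = {"void", "int", "if", "else"};
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }      // whitespace only separates tokens
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier or keyword: letter or underscore, then letters, digits, underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            std::string word = src.substr(start, i - start);
            tokens.push_back({word, keywords.count(word) ? "keyword" : "ID"});
        } else if (std::isdigit(c)) {
            // Integer literal: a run of digits.
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({src.substr(start, i - start), "number"});
        } else {
            // Anything else is treated as a one-character symbol.
            ++i;
            tokens.push_back({std::string(1, src[start]), "symbol"});
        }
    }
    return tokens;
}
```

Printing each token's `value` and `type` on one line reproduces the "void keyword", "main ID", "( symbol" listing.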
49
Preprocessor
• Preprocessor symbols
  • Defined by #define
    • #define MYHEADER_H
    • #define LARGEST 10
  • Defined in the compilation process
    • Command Line (/D)
    • Preprocessor Definitions
50
In-Class Exercise #2
#include <iostream>
//comment
#define LARGEST 100
void main()
{ int x, y;
    x = 10;
    y = LARGEST;
#ifdef MYSYMBOL
    cout << "X=" << x;
#endif
#if TEST == 1
    cout << "1" << endl;
#elif TEST == 2
    cout << "2" << endl;
#else
    cout << "Blah" << endl;
#endif
    cout << "The end" << endl;
}
Show result of preprocessor
What’s left in the file?
What’s changed in the file?
54
#include
• #include "myfile.h"
  • Assumes that myfile.h is in the current directory
• #include "c:\\somedirectory\myfile.h"
  • Absolute path
• #include <array>
  • Will look for array in the include folder in the program files folder
• #include <sys/types.h>
  • types.h file will be in the sys subdirectory of include
56
Standalone Lexical Analyzer
Lexical Analyzer
void main(){ int x; x = 50; int y; x = y - 10;}
Produces a list of tokens
void keyword
main ID
( symbol
) symbol
{ symbol
int keyword
57
Structure of Compilers
Source Program → Lexical Analyzer (scanner) → Tokens
Source Program:
int x;
cin >> x;
if(x>5)
    cout << "Hello";
else
    cout << "BOO";
Tokens:
int x ;
cin >> x ;
if ( x > 5 )
cout << "Hello" ;
else
cout << "BOO" ;
What about white spaces? Do they matter?
58
Tokenize First or as needed?
int x;
cin >> x;
if(x>5)
    cout << "Hello";
else
    cout << "BOO";
int datatype
x ID
; symbol
cin >>
Tokens = Meaningful units in a program
Value/Type pairs
59
Tokenize First or as needed?
Array<Array<int>> someArray;
Array < int
>
Array<Array<int> > someArray;
Array < int >
>>
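The two declarations differ only by a space, yet a scanner that tokenizes everything up front with a longest-match rule sees them differently: `>>` becomes a single shift token, while `> >` becomes two closing brackets. This is why C++ before C++11 required the space in nested template arguments. The sketch below illustrates the longest-match behavior; the function name `scan` is hypothetical and only the `>>` operator is special-cased.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Tokenizes eagerly, always preferring the longest match for '>'.
std::vector<std::string> scan(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
        } else if (c == '>' && i + 1 < src.size() && src[i + 1] == '>') {
            tokens.push_back(">>");               // longest match: one shift token
            i += 2;
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

A scanner that produces tokens as needed can instead be asked by the parser for a single `>` while a template argument list is open, which is one way to accept the first declaration.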
60
Structure of Compilers
Source Program
→ Lexical Analyzer (scanner) → Tokens
→ Syntax Analysis (Parser) → Syntactic Structure (Parse Tree)
62
Who is responsible for errors?
• int x$y;
• int 32xy;
• 45b
• 45ab
• x = x @ y;
Lexical Errors / Token Errors?
64
Who is responsible for errors?
• 45ab
  • One wrong token?
  • Two tokens (45 & ab)? Are whitespaces needed?
• Either way is okay.
  • Lexical analyzer can catch the illegal token (45ab)
  • Parser can catch the syntax error. Most likely 45 followed by ab will not be syntactically correct.
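The first option, where the lexical analyzer rejects 45ab as a single illegal token, can be sketched as a classifier over whitespace-delimited lexemes. The `Kind` enum and `classify` function are hypothetical names, and the rules assumed here are the usual ones: a number is all digits, and an identifier starts with a letter or underscore.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class Kind { Number, Identifier, Illegal };

// Classifies one lexeme as a whole instead of splitting it into smaller tokens.
Kind classify(const std::string& lexeme) {
    if (lexeme.empty()) return Kind::Illegal;
    unsigned char first = lexeme[0];
    if (std::isdigit(first)) {
        for (unsigned char c : lexeme)
            if (!std::isdigit(c)) return Kind::Illegal;               // e.g. 45ab, 32xy
        return Kind::Number;
    }
    if (std::isalpha(first) || first == '_') {
        for (unsigned char c : lexeme)
            if (!std::isalnum(c) && c != '_') return Kind::Illegal;   // e.g. x$y
        return Kind::Identifier;
    }
    return Kind::Illegal;                                             // e.g. @
}
```

Under the second option the scanner would instead emit 45 and ab as two valid tokens and leave the error to the parser.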
65
Structure of Compilers
Source Program
→ Lexical Analyzer (scanner) → Tokens
→ Syntax Analysis (Parser) → Syntactic Structure
→ Semantic Analysis
Symbol Table
int x;
cin >> x;
if(x>5)
    x = "SHERRY";
else
    cout << "BOO";
66
Structure of Compilers
Source Program
→ Lexical Analyzer (scanner) → Tokens
→ Syntax Analysis (Parser) → Syntactic Structure
→ Semantic Analysis → Intermediate Representation
→ Optimizer
→ Code Generator → Target machine code
Symbol Table
68
Translation Steps:
• Recognize when input is available.
• Break input into individual components.
• Merge individual pieces into meaningful structures.
• Process structures.
• Produce output.
69
Translation (Compilers) Steps:
• Break input into individual components. (lexical analysis)
• Merge individual pieces into meaningful structures. (parsing)
• Process structures. (semantic analysis)
• Produce output. (code generation)
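The steps above can be seen end to end in a very small translator. The sketch below is an illustration only: it handles just integer expressions with + and *, it emits instructions for a made-up stack machine (`PUSH`, `ADD`, `MUL`), and it has no semantic analysis step because there are no types or symbols to check. All names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Lexical analysis: break the input into numbers and one-character symbols.
std::vector<std::string> lex(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i++]));
        }
    }
    return tokens;
}

// Parsing and code generation for the grammar
//   expr -> term { '+' term }
//   term -> NUMBER { '*' NUMBER }
// Code is emitted as each structure is recognized.
class Translator {
public:
    explicit Translator(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}
    std::vector<std::string> translate() {
        expr();
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected token");
        return code_;
    }
private:
    void expr() {
        term();
        while (peek() == "+") { ++pos_; term(); code_.push_back("ADD"); }
    }
    void term() {
        number();
        while (peek() == "*") { ++pos_; number(); code_.push_back("MUL"); }
    }
    void number() {
        std::string t = peek();
        if (t.empty() || !std::isdigit(static_cast<unsigned char>(t[0])))
            throw std::runtime_error("number expected");
        code_.push_back("PUSH " + t);
        ++pos_;
    }
    // Returns the current token, or an empty string at end of input.
    std::string peek() const { return pos_ < tokens_.size() ? tokens_[pos_] : std::string(); }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
    std::vector<std::string> code_;
};
```

For the input `1 + 2 * 3`, `Translator(lex("1 + 2 * 3")).translate()` yields `PUSH 1`, `PUSH 2`, `PUSH 3`, `MUL`, `ADD`: the grammar's structure, not the left-to-right order of the text, decides that the multiplication is emitted first.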
70
Compilers
• Two major tasks:
  • Analysis of source
  • Synthesis of target
• Syntax-directed translation
  • Compilation process driven by syntactic structure of the source being translated
71
Interpreters
• Executes source program without explicitly translating to target code.
• Control and memory management reside in interpreter, not user program.
• Allow:
  • Modification of program as it executes.
  • Dynamic typing of variables
  • Portability
• Huge overhead (time & space)
73
Misc. Compiler Discussions
• History of Modern Compilers
• Front and Back ends
• One pass vs. Multiple passes
• Compiler Construction Tools
  • Compiler-Compilers, Compiler-generators, Translator-writing Systems
  • Scanner generator
  • Parser generator
  • Syntax-directed engines
  • Automatic code generator
  • Dataflow engines