410/510 1 of 31 Week 1 – Lecture 1 Introduction The Textbook Assessment Overview Compiler Construction

Upload: brendan-glenn

Post on 13-Jan-2016


Week 1 – Lecture 1

• Introduction
• The Textbook
• Assessment
• Overview

Compiler Construction


The Big Picture

• In this course we will be constructing a compiler!

• Moving from a High Level Language to a Low Level Language

• Compilers are complex programs
  – More than 10,000 lines of code

• Integrate aspects from many different areas of CS
  – Formal language theory, algorithms, data structures, HLL & LLL (obviously), user interaction (error reporting)


What is a compiler?

• A specialization of a language translator
• Usually in CS:
  – the Source is a high level programming language
  – the Target is the machine code for a microprocessor

[Diagram: a translator from language L1 (the Source, e.g. C) to language L2 (the Target, e.g. an x86 processor)]


Applications of Compiler Techniques

• Potential Source languages include:
  – Natural languages (English, French, …)
  – Circuit layout languages
  – Mark-up languages (HTML, XML, …)
  – Command line languages (SQL interface)

• Potential Target languages include:
  – Natural languages
  – Printer drivers
  – Markup languages

• e.g. HTML to RTF converter
  – Could involve many of the aspects we will cover in compiler construction


Compilers for Programming Languages

• If we had 1 compiler for each {Source,Target} pair then we would have a lot of compilers!

[Diagram: Source Languages (C, Prolog, Java, Lisp, Haskell, C++, C#, Fortran, Pascal, Sather) connected through individual Compilers to Target Languages (x86 (MMX), JVM, PowerPC 750 (G3), ARM, SPARC, AMD K6)]


Modularity for Code Generation

[Diagram: Source → Intermediate Representation → Compilers for the x86, ARM and G4 targets]

Compiler portability (man gcc – lists different target machines)


Modularity for Source Languages?

[Diagram: Sources (C, Java, Prolog) → Compilers → Intermediate Representation → Targets]

Typically compilers only compile one source language
  – but the techniques used are very similar and are shared across different compilers


Typical Compiler

[Diagram: Source → Front-end (Analysis) → Intermediate Representation → Back-end (Synthesis) → Target. The Intermediate Representation is independent of the Source and Target languages. A course timeline under the diagram is marked "now" and "week 6".]

Ideally:
For a new Source language – we can add a new front-end to an existing back-end
For a new Target language – we can add a new back-end to an existing front-end


Front End

• Knowledge about the source language
  – Lexical structure (tokens)
  – Syntax
    • Programming constructs
      – Conditionals, iteration etc
  – Semantics
    • Type checking

• Error-reporting
  – UI component
    • Often basic (and unhelpful!)
    • May vary if part of an IDE or standalone

[Diagram: Source program → Lexical analyser → Syntax analyser → Semantic analyser, with the Symbol table and the Error Handler alongside the three phases]


Lexical Analysis

Lexical tasks the compiler has to perform:
• group together the 3 characters ‘max’ to form the single variable identifier max
• group together the 2 characters ‘<=’ to form the single relational operator <= (less than or equal to)

int max = 20, x;
read(x);
if ( x <= max )
    print(‘ok’);
else
    print(‘too big’);
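The grouping of characters into tokens can be sketched as a small hand-written scanner. This is only an illustration under assumptions: the names `TokenKind`, `Token` and `next_token` are hypothetical and do not come from the slides, and the course uses lex / flex for this phase rather than hand-written code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for this sketch only. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_RELOP, TOK_OTHER, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* the characters grouped into this token */
} Token;

/* Read one token starting at *src and advance *src past it. */
Token next_token(const char **src)
{
    Token tok = { TOK_EOF, "" };
    const char *p = *src;
    size_t n = 0;

    while (isspace((unsigned char)*p))          /* skip white space */
        p++;

    if (*p == '\0') {
        /* end of input: tok already has kind TOK_EOF */
    } else if (isalpha((unsigned char)*p)) {    /* 'm','a','x' -> identifier max */
        tok.kind = TOK_IDENT;
        while (isalnum((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {    /* '2','0' -> number 20 */
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p) && n < sizeof tok.text - 1)
            tok.text[n++] = *p++;
    } else if (*p == '<' || *p == '>') {        /* '<','=' -> operator <= */
        tok.kind = TOK_RELOP;
        tok.text[n++] = *p++;
        if (*p == '=')
            tok.text[n++] = *p++;
    } else {                                    /* any other single character */
        tok.kind = TOK_OTHER;
        tok.text[n++] = *p++;
    }
    tok.text[n] = '\0';
    *src = p;
    return tok;
}
```

Calling `next_token` repeatedly on the text `x <= max` yields an identifier, a relational operator and another identifier, followed by the end-of-input token.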


Syntactic Analysis

• Recognise the if .. then … else structure
• Group the x <= max into a single expression with a relational operator
• Recognise the format of the variable declaration list
  – Such that x is correctly declared to be an int
• Loops, program blocks (begin…end)
• Arithmetic expressions, etc


Semantic analysis

• Check that x <= max is a sensible thing to do
  – If x was a boolean and max a string then we would have a type error

• Check that the ‘20’ is in fact an integer and so can be assigned to an int

• And also (can be split over several phases)
  – Keep a note of all the variables used, so that every use of a variable refers to the same value (in memory)
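A type check of this kind can be sketched in a few lines. The `Type` tags and the functions `check_relational` and `can_assign` are hypothetical names chosen for this illustration; a real semantic analyser would work over the whole syntax tree and handle many more types.

```c
#include <stdbool.h>

/* Hypothetical type tags, for this sketch only. */
typedef enum { TYPE_INT, TYPE_BOOL, TYPE_STRING, TYPE_ERROR } Type;

/* Type of "left <= right": a boolean if both operands are integers,
   otherwise a type error. */
Type check_relational(Type left, Type right)
{
    if (left == TYPE_INT && right == TYPE_INT)
        return TYPE_BOOL;
    return TYPE_ERROR;
}

/* A value may initialise a variable only if its type matches
   the declared type of the variable. */
bool can_assign(Type declared, Type value)
{
    return declared != TYPE_ERROR && declared == value;
}
```

With x and max both declared as int, `check_relational` accepts x <= max; with a boolean x and a string max it reports the type error described above.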


Data Structures

• Stream of text as the source file
• Group together text into larger units from a limited set
• Nearly all programming constructs can be represented as tree structures

[Diagram: tree for an If statement. The root "If statement" has the children: if, Boolean expression, statement, else, statement. The Boolean expression has the children: expression, Relational operator, expression.]
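One possible way to hold such a tree in C is sketched below. The names (`Node`, `NodeKind`, `make_node`, `build_example`) are assumptions made for this illustration, and the sketch stores an abstract tree that drops the if and else keywords rather than the full tree drawn on the slide.

```c
#include <stdlib.h>

/* Hypothetical node kinds, for this sketch only. */
typedef enum { NODE_IF, NODE_RELOP, NODE_IDENT, NODE_PRINT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *text;        /* operator, identifier or message; may be NULL */
    struct Node *child[3];   /* unused children are NULL */
} Node;

/* Allocate a node with up to three children; returns NULL if out of memory. */
Node *make_node(NodeKind kind, const char *text, Node *a, Node *b, Node *c)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->kind = kind;
    n->text = text;
    n->child[0] = a;
    n->child[1] = b;
    n->child[2] = c;
    return n;
}

/* Count the nodes of a tree by visiting every child recursively. */
int count_nodes(const Node *n)
{
    if (n == NULL)
        return 0;
    int total = 1;
    for (int i = 0; i < 3; i++)
        total += count_nodes(n->child[i]);
    return total;
}

/* Release a tree built with make_node. */
void free_tree(Node *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < 3; i++)
        free_tree(n->child[i]);
    free(n);
}

/* Tree for: if ( x <= max ) print('ok'); else print('too big');
   Allocation failures are not handled carefully in this sketch. */
Node *build_example(void)
{
    Node *cond = make_node(NODE_RELOP, "<=",
                           make_node(NODE_IDENT, "x", NULL, NULL, NULL),
                           make_node(NODE_IDENT, "max", NULL, NULL, NULL),
                           NULL);
    return make_node(NODE_IF, NULL, cond,
                     make_node(NODE_PRINT, "ok", NULL, NULL, NULL),
                     make_node(NODE_PRINT, "too big", NULL, NULL, NULL));
}
```

The if node has three children: the condition and the two branch statements. Later phases such as type checking and code generation can then be written as recursive walks over this structure, in the same style as `count_nodes`.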


Data Structures

• Lexical Analyzer → Stream of tokens (enumerated type)
  – NUMBER OPERATOR NUMBER

• Syntax Analyzer / Parser → Tree of program structure

[Diagram: tree with the root "program" and the children assignment, if_statement, while_loop, output_statement]


Back-end

• Knowledge about target processor / virtual machine
  – Instruction set
    • ‘costs’ of different:
      – op-codes
      – instructions
  – Registers
  – Memory

[Diagram: Semantic analyser → Intermediate code generator → Code optimiser → Code generator, with the Symbol table manager and the Error handler alongside]


Putting it together

[Diagram: the Compiler. Source program → Lexical analyser → Syntax analyser → Semantic analyser → Intermediate code generator → Code optimiser → Code generator, with the Symbol table and the Error Handler alongside]

[Diagram: a language-processing system. Skeletal source program → preprocessor → Source program → compiler → Target assembly program → assembler → Relocatable machine code → Loader / link-editor → Absolute machine code]


Grammars

• We define/describe HL languages with grammars

• A Grammar consists of:
  – T, set of Terminals
  – N, set of Non-terminals
    • N ∩ T = ∅
  – P, set of Productions α → β
    • Where α and β are members of (T ∪ N)*
  – S, special member of N, the Start symbol

• G = {T, N, P, S}


Chomsky’s Grammar Hierarchy

Type 3 Regular Grammar

Type 2 Context Free Grammar

Type 1 Context-Sensitive Grammar

Type 0 Unrestricted Grammar


Grammars

• Type 0 (unrestricted)
  – α → β, where α and β are unrestricted sequences and α is not null
  – languages formed from Type 0 grammars can be recognised by non-deterministic Turing machines

• Type 1 (context sensitive)
  – αAβ → αBβ
  – A becomes B in the context of α … β
  – Complex for computer analysis


Grammars

• Type 2 (context free)
  – A → α
    • A is a Non-terminal
    • α is a member of (T ∪ N)* (can be empty)
  – Equivalent to a push-down automaton

• Type 3 (regular)
  – A → wB, A → w (right linear)
    • w is a string of Terminals
    • A and B are Non-Terminals
  – Finite state automata
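The link between regular grammars and finite state automata can be illustrated with identifiers. Assume, for this sketch only, that an identifier is a letter followed by any number of letters or digits, which the right-linear productions I → letter R, I → letter, R → letter R, R → digit R, R → letter, R → digit would describe. The state names and the function `is_identifier` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* States of a small deterministic finite automaton. */
typedef enum { STATE_START, STATE_IN_IDENT, STATE_REJECT } State;

/* Transition function: the next state for the current state and input character. */
static State step(State s, char c)
{
    unsigned char u = (unsigned char)c;
    switch (s) {
    case STATE_START:    return isalpha(u) ? STATE_IN_IDENT : STATE_REJECT;
    case STATE_IN_IDENT: return isalnum(u) ? STATE_IN_IDENT : STATE_REJECT;
    default:             return STATE_REJECT;   /* no way out of the reject state */
    }
}

/* Run the automaton over the whole string; STATE_IN_IDENT is the only accepting state. */
bool is_identifier(const char *s)
{
    State state = STATE_START;
    for (; *s != '\0'; s++)
        state = step(state, *s);
    return state == STATE_IN_IDENT;
}
```

The automaton needs only its current state and no stack or other memory, which is why regular grammars are enough for recognising tokens. Tools such as lex / flex generate tables for automata of this kind from regular expressions.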


In a compiler

• Use the minimum complexity grammars that let us successfully cope with HL programming languages (and process them efficiently)

• Regular grammars (= regular expressions) in the Lexical Analysis phase
  – ‘recognise the words’

• Context-free grammars in the Syntax Analysis phase
  – ‘recognise the phrases’
  – define our HLL as a grammar based on the output of the Lexical Analysis

• Deal with context sensitivity in the Semantic Analysis phase


Overall Front-End View

[Diagram: Source program (text file) → Lexical Analyser (regular grammar, Flex) → tokens → Syntax Analyser (context-free grammar, Bison) → tree structure → Semantic Analyser → type-safe tree structure → Intermediate Representation → tree / linearized tree → Back-end]


The Textbook

Compilers: Principles, Techniques & Tools

Aho, Sethi & Ullman
Addison-Wesley
(‘The Dragon Book’)


Assessment

• Building a compiler for a new language
• Front-end
  – Lexical analysis
  – Parsing
• Back-end
  – Generating assembler code
• Some formal and some practical
  – Formal more at the front-end


Programming & Tools

• Lexical analyser generator – lex / flex
• Parser generator – yacc / bison
• C / C++
  – To implement the remainder of the compiler
• Unix environment
  – makefiles will be useful for coordinating lex and yacc


Instant Compilation

• Consider the program:

main()
{ int a = 3; a = a + 1; }

Given a reasonably sensible assembly language, a hand-compilation might be:

LDA #3
STA 1
LDA 1
ADD a, #1
STA 1


& an Instant Compiler could look like …

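switch (source_code_construct) {
case INT_DEC:                          /* e.g. int a = 3 */
    printf("LDA #%d\n", INT.value);    /* load the initial value */
    printf("STA 1\n");                 /* store it at memory location 1 */
    break;
case INT_ADD:                          /* e.g. a = a + 1 */
    printf("LDA 1\n");                 /* load the variable */
    printf("ADD a,#%d\n", ADD.value);  /* add the constant */
    printf("STA 1\n");                 /* store the result back */
    break;
} /* end switch */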


The Problems ….

• Not efficient (the whole program could be just LDA #4; STA 1)
• Only works for 1 variable
• Only works at one location in memory
  – (usually let assembler deal with symbolic addresses)
• Only has 2 programming constructs!
• Not even slightly portable:
  – 1 instruction set & 1 source language
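The restriction to one variable at one memory location is what a symbol table removes. The sketch below is a deliberately simple illustration, assuming a fixed-size table and a linear search; `SymbolTable`, `MAX_SYMBOLS` and `address_of` are hypothetical names, not part of the slides.

```c
#include <string.h>

#define MAX_SYMBOLS 64   /* arbitrary limit for this sketch */

typedef struct {
    const char *name[MAX_SYMBOLS];   /* variable names in order of first use */
    int count;                       /* number of names stored so far */
} SymbolTable;

/* Return the memory address assigned to a variable, assigning the next
   free address the first time the name is seen. Addresses start at 1 to
   match the "STA 1" of the hand-compiled example. Returns -1 if the
   table is full. Only the pointer is stored, so the caller must keep
   the name string alive. */
int address_of(SymbolTable *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->name[i], name) == 0)
            return i + 1;
    if (t->count == MAX_SYMBOLS)
        return -1;
    t->name[t->count] = name;
    return ++t->count;
}
```

A code generator could then emit the store for a variable with something like `printf("STA %d\n", address_of(&table, "a"))` instead of the hard-coded STA 1.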


More problems…

• No error reporting
  – type checking?

• Assumes:
  – Program is correct
  – Recognition of programming language constructs
    • int a = 3 → INT_DEC
  – Access to values
    • INT.value, ADD.value
  – 1:1 relationship between integers and memory locations


Solutions

• We can view compilers as a solution to all of these problems

• E.g.
  – Only compile correct programs to object code
  – Recognise all constructs in the language
  – Improve the efficiency of code
    • Execution speed
    • Memory usage
  – Meaningful error messages to the user
  – Cope with different target architectures


Why are compilers called compilers?

• In early compilers one of the main tasks was connecting the object program to
  – standard library functions, I/O devices

• collecting information from different sources (e.g. libraries)
  – OS and processor dependent

• This is now performed by ‘linkers’
• Compile – ‘construct by collecting from different sources’