Chapter 3 Language Translation Issues & Program Verifications

Upload: madlyn-reeves

Post on 27-Dec-2015


To specify the syntax of a language, context-free grammars (equivalently, Backus-Naur form grammars) were developed, but it was soon realized that syntax alone is insufficient. (What about the semantics?)

3.1 Programming language syntax

Syntax ::= “the arrangement of words as elements in a sentence to show their relationship.” It describes the sequence of symbols that make up valid programs.

X := Y + Z (valid for a Pascal program)

let X = Y + Z (Basic)

X = Y + Z (C language)

Add Y , Z to X (COBOL)

How much is 2 + 3 * 4? (14 or 20? It depends on the syntax.)

In a statement like X = 2.45 + 3.67, syntax cannot tell us whether variable X was declared, or whether it was declared as type real. Results of X=5, X=6, and X=6.12 are all possible. We need more than just syntactic structures for the full description of a programming language. Other attributes, under the general term semantics, such as the use of declarations, operations, sequence control, and referencing environments, affect a variable and are not always determined by syntax rules.

3.1.1 General Syntactic Criteria

The details of syntax are chosen largely on the basis of secondary criteria, such as readability, writability, ease of verifiability, ease of translation, and lack of ambiguity, which are unrelated to the primary goal of communicating information to the language processor.

Readability: It is enhanced by such language features as natural statement formats, structured statements, liberal use of keywords and noise (optional) words, provision for embedded comments, unrestricted-length identifiers, mnemonic operator symbols, free-field format, and complete data declarations.

Writability: The syntactic features that make a program easy to write are often in conflict with those features that make it easy to read.

Writability is enhanced by the use of concise and regular syntactic structures, whereas for readability a variety of more verbose (tedious) constructs are helpful.

Implicit syntactic conventions that allow declarations and operations to be left unspecified make programs shorter and easier to write but harder to read.

A syntax is redundant if it communicates the same item of information in more than one way. The disadvantage is that redundancy makes programs more verbose and thus harder to write.

Ease of verifiability: (to be covered in Sec. 4-2-4)

Ease of translation: The LISP syntax provides an example of a program structure that is neither particularly readable nor particularly writable but that is extremely simple to translate.

Lack of ambiguity: for example, the “dangling else” problem in Pascal, or a reference to A(I, J) in FORTRAN, which may be an array element or a function call. How can these be resolved?

Delimiters & brackets: Brackets are paired delimiters.

Free- and fixed-field formats: high-level languages now mostly use free-field format.

Expressions: the units from which statements are built.

Statements: SNOBOL4 has only one basic statement syntax, while COBOL provides a different syntactic structure for each statement type. APL and SNOBOL4 do not allow “embedded statements”.

3.1.2 Syntactic Elements of a Language

Character set: from 6-bit to 8-bit characters, and now up to 16-bit characters.

Identifiers: a string beginning with a letter, and …, length ?

Operator symbols: **, ^, sqr, sqrt, =, EQ, …

Keywords & reserved words: FORTRAN VS. COBOL

Noise words: GO [TO]

Comments: allow comments in several ways; …

Blanks (spaces): In SNOBOL4, a blank is used as the concatenation operator.

3.1.3 Overall Program-Subprogram Structure

Separate subprogram definitions: each subprogram is a separate syntactic unit (C).

Separate data definitions: the class mechanism in Java and C++.

Nested subprogram definitions: Pascal allows nesting to any depth.

Separate interface definitions: if subprograms may be compiled separately, their interfaces must be specified with care; without separate compilation, however, the cost of recompilation is considerable.

Data descriptions separated from executable statements: as in COBOL.

Unseparated subprogram definitions: SNOBOL4’s lack of organization.

3.2 Stages in Translation

Source program → Compiler → Object program

Position := initial + rate * 60
    ↓ Lexical analyzer
id1 := id2 + id3 * 60
    ↓ Syntax analyzer
Syntax tree (1)
    ↓ Semantic analyzer
Syntax tree (2)
    ↓ Intermediate code generator
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
    ↓ Code optimizer
temp1 := id3 * 60.0
id1 := id2 + temp1
    ↓ Code generator
Object code

3.2.1 Analysis of the Source Program

Lexical analysis (scanning): It reads successive lines of the input program and breaks them down into individual lexical items (tokens).

The formal model used to design lexical analyzers is the finite-state automaton.

Consider FORTRAN, where blanks are ignored: DO 10 I = 1, 5 is a loop header, but DO 10 I = 1.5 is an assignment to the variable DO10I.
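The DO example can be made concrete with a small sketch. Because blanks carry no meaning, a scanner cannot tell the loop from the assignment until it has looked ahead for a comma. The function name `classify_fortran_statement` and the simplified patterns are assumptions made for illustration; this is not a full FORTRAN scanner.

```python
import re

def classify_fortran_statement(line: str) -> str:
    """Classify a statement as a DO loop or an assignment (simplified)."""
    compact = line.replace(" ", "").upper()  # blanks are not significant
    # DO <label> <variable> = <expr> , <expr>: the comma marks a loop header
    if re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,.+", compact):
        return "do-loop"
    # <variable> = <expr>: without the comma, DO10I is just a variable name
    if re.fullmatch(r"[A-Z][A-Z0-9]*=.+", compact):
        return "assignment"
    return "unknown"

print(classify_fortran_statement("DO 10 I = 1, 5"))
print(classify_fortran_statement("DO 10 I = 1.5"))
```

The two calls differ only in one character of input, yet the first is classified as a loop and the second as an assignment.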

Syntactic analysis (parsing): Here the larger program structures are identified with the help of grammars.

Semantic analysis: The bridge between the analysis and synthesis parts of translation. It includes the following functions:

1. Symbol-table maintenance: The symbol-table entry contains more than just the identifier. It contains additional data concerning the attributes of that identifier: its type, type of values, referencing environment, and whatever other information is available from the input program through declarations and usage. The semantic analyzers enter this information into the symbol table as they process declarations, subprogram headers, and program statements.

2. Insertion of implicit information: for example, the default typing of undeclared variables in FORTRAN.

3. Error detection: The semantic analyzer must not only recognize those errors but determine the appropriate way to continue with syntactic analysis of the remainder of the program.

4. Macro processing and compile-time operations: A macro is a piece of program text that has been separately defined and that is to be inserted into the program during translation. A compile-time operation is an operation to be performed during translation to control the translation of the source program.

3.2.2 Synthesis of the Object Program

1. Optimization: Much research has been done on program optimization, and many sophisticated techniques are known.

2. Code generation: The output code may be directly executable, or there may be other translation steps to follow (e.g., assembly or linking and loading).

3. Linking and loading: In the optional final stage of translation, the pieces of code resulting from separate translation of subprograms are coalesced (merged) into the final executable program.

Q1: Can a compiler for the C language be written in C?

Ans: Well, yes, if you know how to bootstrap it!

[Figure: T-diagrams illustrating the bootstrapping of a C compiler; the labels are C, C’, and the machine language M.]

3.3 Formal Translation Models

The syntactic recognition parts of compiler theory are fairly standard and generally based on the context-free theory of languages. We briefly summarize that theory in the next few pages.

The two classes of grammars useful in compiler technology include the BNF grammar (or context-free grammar) and the regular grammar.

3.3.1 BNF Grammars

The BNF and context-free grammar forms are equivalent in power; the differences are only in notation. For this reason, the terms BNF grammar and context-free grammar are usually interchangeable in discussion of syntax.

Parse trees: Given a grammar, we can use a single-replacement rule to generate strings in our language. For example, the following grammar generates all sequences of balanced parentheses:

S → SS | (S) | ( )

Try to work out the parse tree of a string generated by this grammar.

Now, we have figure 3.4, with the grammar depicted in fig. 3.5.
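As a rough sketch of what the balanced-parentheses grammar generates, the code below closes the set {"()"} under the two other productions (wrapping and concatenation) up to a length bound, and compares the result with a simple counter-based recognizer. The function names `generate` and `is_balanced` are assumptions made for this illustration.

```python
def generate(max_len):
    """All strings derivable from S -> SS | (S) | () with length <= max_len."""
    found = {"()"}                       # S -> ()
    changed = True
    while changed:
        changed = False
        for u in list(found):
            candidates = ["(" + u + ")"]                  # S -> (S)
            candidates += [u + v for v in list(found)]    # S -> SS
            for w in candidates:
                if len(w) <= max_len and w not in found:
                    found.add(w)
                    changed = True
    return found

def is_balanced(s):
    """Recognize nonempty balanced strings with a depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ')' with no matching '('
                return False
        else:
            return False         # only parentheses belong to the language
    return depth == 0 and len(s) > 0

print(sorted(generate(4)))
print(all(is_balanced(w) for w in generate(6)))
```

Note that a single counter suffices here only because there is one kind of bracket; the counter plays the role of a stack.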


Ambiguity: In natural language, we have

They are flying planes.

Of the two grammars below, which generate the same language, the first is ambiguous and the second is not:

S → SS | 0 | 1
T → 0T | 1T | 0 | 1

Extensions to BNF Notation:

Square brackets, parentheses, and an asterisk.


Try to process 100101 yourself.

Nondeterministic Finite Automata:

The NFA is a useful concept in proving theorems. Also, the concept of nondeterminism plays a central role in both the theory of languages and the theory of computation, and it is useful to understand this notion fully in a very simple context initially.

A finite automaton (FA) consists of a finite set of states and a set of transitions from state to state that occur on input symbols chosen from an alphabet Σ. For each input symbol there is exactly one transition out of each state (possibly back to the state itself). One state, usually denoted q0, is the initial state, in which the automaton starts. Some states are designated as final or accepting states.

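[Figure: transition diagram of a finite automaton with states q0 and q1 and edges labeled a, a, b, b; associated regular expression: (a*ba*ba*)*]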
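A deterministic finite automaton is easy to simulate with a transition table. The sketch below assumes one plausible reading of the lost diagram: a loops on each state, b switches between q0 and q1, and q0 is both the start state and the only accepting state. Under that assumption the machine accepts exactly the strings with an even number of b's. The names `DELTA` and `dfa_accepts` are made up for the example.

```python
# (state, symbol) -> next state; exactly one entry per state and symbol.
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
START = "q0"
ACCEPTING = {"q0"}   # assumption: the diagram's final state is q0

def dfa_accepts(word):
    state = START
    for symbol in word:
        if (state, symbol) not in DELTA:
            return False          # symbol outside the alphabet {a, b}
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

print(dfa_accepts("abab"))   # two b's
print(dfa_accepts("ab"))     # one b
```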

Consider modifying the finite automaton model to allow zero, one, or more transitions from a state on the same input symbol. This new model is called a nondeterministic finite automaton (NFA). Note that the FA (i.e., the DFA, for emphasis) is a special case of the NFA in which for each state there is a unique transition on each symbol.

We may extend our model of the NFA to include transitions on the empty input ε. Therefore, our NFA may be depicted as:

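[Figure: transition diagram of an NFA with states q0, q1, and q2 and edges labeled a, a, b; associated regular expression: (ab+ | ab+a)*]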
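An NFA with ε-transitions can be simulated by tracking the set of states it could be in. The three-state machine below is a hypothetical one chosen for illustration (it accepts zero or more repetitions of ab or aba); it is not a reconstruction of the lost figure. The names `NFA`, `eps_closure`, and `nfa_accepts` are likewise assumptions.

```python
EPS = ""  # label used here for ε-transitions

# (state, symbol) -> set of possible next states
NFA = {
    ("q0", "a"): {"q1"},
    ("q1", "b"): {"q2"},
    ("q2", "a"): {"q0"},
    ("q2", EPS): {"q0"},
}
START = "q0"
ACCEPTING = {"q0"}

def eps_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(word):
    current = eps_closure({START})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= NFA.get((q, symbol), set())
        current = eps_closure(moved)
    return bool(current & ACCEPTING)

print(nfa_accepts("abaab"))
print(nfa_accepts("abb"))
```

Tracking sets of states in this way is the idea behind converting an NFA into an equivalent DFA.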

Regular expression (RE)
    ↓
Nondeterministic finite automaton (NFA)
    ↓
Deterministic finite automaton (DFA)
    ↓
Minimized DFA (MDFA) = transition diagram

Source program → MDFA + driver → token

Lex by Lesk in 1975: RE → NFA → DFA

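(1) Regular Expression → NFA

[1] If the RE is ø (the empty set), its NFA has a start state and a final state with no transition between them.

[2] If the RE is ε (the empty string), its NFA is a single ε-transition from the start state to the final state.

[3] If the RE is a (a single input symbol), its NFA is a single transition labeled a from the start state to the final state.

[4] If the NFAs of S and T are MS and MT respectively, then

{i} the NFA of S|T has a new start state with ε-transitions into MS and MT, and ε-transitions from MS and MT into a new final state;

{ii} the NFA of S•T links MS to MT with ε-transitions;

{iii} the NFA of S* adds ε-transitions around MS so that MS can be skipped or repeated.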

Computational Power of an FSA: The set of languages that finite-state automata can recognize is limited (the pumping lemma can be used to show that a given language is not regular).

A Push-Down Automaton (PDA) is a septuple P = (Q, Σ, Γ, δ, q0, z, F), where

Q is a finite set of states,
Σ is a finite input alphabet,
Γ is a finite stack alphabet,
δ maps elements of Q × (Σ ∪ {ε}) × Γ into finite subsets of Q × Γ*,
q0 ∈ Q is the start state,
z ∈ Γ is the start stack symbol,
F ⊆ Q is the set of final states.

Example: Let P = ({q0, q1, q2}, {0, 1}, {Z, 0}, δ, q0, Z, {q0}), where

δ(q0, 0, Z) = {(q1, 0Z)}
δ(q1, 0, 0) = {(q1, 00)}
δ(q1, 1, 0) = {(q2, ε)}
δ(q2, 1, 0) = {(q2, ε)}
δ(q2, ε, Z) = {(q0, ε)}

L(P) = {0^n 1^n | n ≥ 0}? Why?
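One way to explore the question is to simulate the example PDA. The sketch below encodes the five transitions of the example and searches over configurations, accepting when the input is exhausted in a final state. The representation (stack as a string with the top first, ε as the empty string) and the names `DELTA` and `pda_accepts` are assumptions made for illustration.

```python
EPS = ""  # stands for ε

# (state, input symbol or EPS, stack top) -> set of (new state, string pushed)
# The pushed string replaces the stack top; its first character is the new top.
DELTA = {
    ("q0", "0", "Z"): {("q1", "0Z")},
    ("q1", "0", "0"): {("q1", "00")},
    ("q1", "1", "0"): {("q2", "")},
    ("q2", "1", "0"): {("q2", "")},
    ("q2", EPS, "Z"): {("q0", "")},
}
START, START_STACK, FINAL = "q0", "Z", {"q0"}

def pda_accepts(word):
    # A configuration is (state, remaining input, stack).
    todo = [(START, word, START_STACK)]
    seen = set()
    while todo:
        config = todo.pop()
        if config in seen:
            continue
        seen.add(config)
        state, rest, stack = config
        if not rest and state in FINAL:
            return True               # accept by final state
        if not stack:
            continue                  # no move is possible on an empty stack
        top, below = stack[0], stack[1:]
        for nxt, push in DELTA.get((state, EPS, top), ()):
            todo.append((nxt, rest, push + below))
        if rest:
            for nxt, push in DELTA.get((state, rest[0], top), ()):
                todo.append((nxt, rest[1:], push + below))
    return False

for w in ["", "01", "0011", "001", "0101"]:
    print(repr(w), pda_accepts(w))
```

Each 0 pushes a 0, each 1 pops one, and the ε-move back to q0 is possible only when Z is exposed again, which is why the counts must match.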

(I) The earliest parsing method

1. Written with recursive procedures

2. May require backtracking over tokens

Example:
S ::= cAd
A ::= ab
A ::= a

If the input string is cad, top-down parsing proceeds as follows:

(1) Expand S: S → c A d
(2) Try A ::= ab: S → c (a b) d; the b does not match the input d, so backtrack
(3) Try A ::= a: S → c (a) d; the parse succeeds

Procedure S( )
begin
    if input symbol = ‘c’ then
        ADVANCE( )
        if A( ) then
            if input symbol = ‘d’ then
                ADVANCE( )
                return true
            end if
        end if
    end if
    return false
end

Procedure A( )
begin
    isave = input-point
    if input symbol = ‘a’ then
        ADVANCE( )
        if input symbol = ‘b’ then
            ADVANCE( )
            return true
        end if
        input-point = isave    // could not match ab; backtrack //
        if input symbol = ‘a’ then
            ADVANCE( )
            return true
        end if
    else
        return false
    end if
end
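The two procedures translate almost directly into a runnable form. The following is a sketch in Python rather than the slide's pseudocode; the class name `Parser`, the helper `accepts`, and the final check that all input has been consumed are additions made for this illustration.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0                      # plays the role of input-point

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def advance(self):
        self.pos += 1

    def parse_S(self):
        # S ::= c A d
        if self.peek() == "c":
            self.advance()
            if self.parse_A() and self.peek() == "d":
                self.advance()
                return True
        return False

    def parse_A(self):
        saved = self.pos
        # First alternative: A ::= a b
        if self.peek() == "a":
            self.advance()
            if self.peek() == "b":
                self.advance()
                return True
        self.pos = saved                  # could not match ab; backtrack
        # Second alternative: A ::= a
        if self.peek() == "a":
            self.advance()
            return True
        return False

def accepts(text):
    parser = Parser(text)
    # Added check: the whole input must be consumed.
    return parser.parse_S() and parser.pos == len(text)

print(accepts("cad"))
print(accepts("cabd"))
print(accepts("cd"))
```

For the input cad, `parse_A` first consumes a, fails to find b, restores the saved position, and then succeeds with the second alternative, mirroring steps (2) and (3) above.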


Copyright © 2009 Addison-Wesley. All rights reserved. 1-34

- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics

Introduction

- Syntax: the form or structure of the expressions, statements, and program units
- Semantics: the meaning of the expressions, statements, and program units
- Syntax and semantics provide a language’s definition
- Users of a language definition:
  - Language designers
  - Implementers
  - Programmers (the users of the language)

The General Problem of Describing Syntax: Terminology

- A sentence is a string of characters over some alphabet
- A language is a set of sentences
- A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)
- A token is a category of lexemes (e.g., identifier); see the lexeme and token table that follows

Statement: Index = 2 * count + 17 ;

Lexemes    Tokens
Index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
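The lexeme and token pairs for this statement can be produced by a small regular-expression scanner. This is a minimal sketch: the token names come from the slide, while `TOKEN_SPEC`, `MASTER`, and `tokenize` are names invented for the example, and only the lexemes that occur in this one statement are handled.

```python
import re

TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),      # blanks separate lexemes and are discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.group(), match.lastgroup))
        pos = match.end()
    return tokens

for lexeme, token in tokenize("Index = 2 * count + 17 ;"):
    print(f"{lexeme:<8}{token}")
```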

Formal Definition of Languages

- Recognizers
  - A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language
  - Example: the syntax analysis part of a compiler (detailed discussion of syntax analysis appears in Chapter 4)
- Generators
  - A device that generates sentences of a language
  - One can determine if the syntax of a particular sentence is syntactically correct by comparing it to the structure of the generator

BNF and Context-Free Grammars

- Context-Free Grammars
  - Developed by Noam Chomsky in the mid-1950s
  - Language generators, meant to describe the syntax of natural languages
  - Define a class of languages called context-free languages (see the grammar hierarchy that follows)
- Backus-Naur Form (1959)
  - Invented by John Backus to describe Algol 58
  - BNF is equivalent to context-free grammars

Type 0: Unrestricted Grammars
    any α → β

Type 1: Context-Sensitive Grammars (CSG)
    for all α → β, |α| ≤ |β|

Type 2: Context-Free Grammars (CFG)
    for all α → β, α ∈ N (i.e., A → β)

Type 3: Right- (or Left-) Linear Grammars
    all productions are of the form A → x or A → xB

G2 = ({S, B, C}, {a, b, c}, P, S)
P:  S → aSBC
    S → abC
    CB → BC
    bB → bb
    bC → bc
    cC → cc

Which type?

G3: S → S + S
    S → S * S
    S → (S)
    S → a

Which type?

BNF Fundamentals

- In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols, or just nonterminals)
- Terminals are lexemes or tokens
- A rule has a left-hand side (LHS), which is a nonterminal, and a right-hand side (RHS), which is a string of terminals and/or nonterminals
- Nonterminals are often enclosed in angle brackets
- Examples of BNF rules:
    <ident_list> → identifier | identifier, <ident_list>
    <if_stmt> → if <logic_expr> then <stmt>
- Grammar: a finite non-empty set of rules
- A start symbol is a special element of the nonterminals of a grammar

BNF Rules

An abstraction (or nonterminal symbol) can have more than one RHS

<stmt> → <single_stmt> | begin <stmt_list> end

Describing Lists

- Syntactic lists are described using recursion
    <ident_list> → ident | ident, <ident_list>
- A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

An Example Grammar

<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const

An Example Derivation

<program> => <stmts> => <stmt>

=> <var> = <expr>

=> a = <expr>

=> a = <term> + <term>

=> a = <var> + <term>

=> a = b + <term>

=> a = b + const
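The derivation above can be replayed mechanically: at each step the leftmost nonterminal is replaced by one of its right-hand sides. In this sketch the grammar is the example grammar, stored as a dictionary, and the list of rule choices is picked by hand to reproduce the derivation. The names `GRAMMAR` and `leftmost_derivation` are assumptions made for illustration.

```python
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>":   [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>":    [["<var>", "=", "<expr>"]],
    "<var>":     [["a"], ["b"], ["c"], ["d"]],
    "<expr>":    [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>":    [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Apply the chosen alternative to the leftmost nonterminal at each step."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost nonterminal in the current sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# One choice per derivation step; e.g. the seventh choice picks <var> -> b.
for step in leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1]):
    print("=>", step)
```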

Derivations

- Every string of symbols in a derivation is a sentential form
- A sentence is a sentential form that has only terminal symbols
- A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
- A derivation may be neither leftmost nor rightmost

Parse Tree

A hierarchical representation of a derivation

              <program>
                  |
               <stmts>
                  |
               <stmt>
             /    |    \
        <var>     =     <expr>
          |            /   |   \
          a      <term>    +    <term>
                   |              |
                 <var>          const
                   |
                   b

Ambiguity in Grammars

A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

An Ambiguous Expression Grammar

<expr> → <expr> <op> <expr> | const
<op> → / | -

[Figure: two distinct parse trees for the sentence const - const / const. In one tree the right operand of - is the subtree for const / const; in the other the left operand of / is the subtree for const - const.]
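Why the two parse trees matter becomes clear once values are attached to the leaves. The sketch below evaluates both trees for the same sentence; the tuple encoding of trees, the function `evaluate`, and the sample values 8, 4, and 2 are made up for the illustration.

```python
def evaluate(tree):
    """Evaluate a parse tree given as a number or a tuple (left, op, right)."""
    if not isinstance(tree, tuple):
        return tree
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a - b if op == "-" else a / b

# Two parse trees for the same sentence: 8 - 4 / 2
tree_1 = (8, "-", (4, "/", 2))    # grouped as 8 - (4 / 2)
tree_2 = ((8, "-", 4), "/", 2)    # grouped as (8 - 4) / 2

print(evaluate(tree_1))
print(evaluate(tree_2))
```

The same string of tokens yields two different results, so a compiler working from an ambiguous grammar cannot know which meaning was intended.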

An Unambiguous Expression Grammar

If we use the grammar to indicate precedence levels of the operators, we cannot have ambiguity

<expr> → <expr> - <term> | <term>
<term> → <term> / const | const

[Figure: the single parse tree for const - const / const under this grammar; const / const is grouped under <term>, below the - operator.]

Associativity of Operators

Operator associativity can also be indicated by a grammar

<expr> → <expr> + <expr> | const (ambiguous)
<expr> → <expr> + const | const (unambiguous)

[Figure: parse tree for const + const + const under the unambiguous grammar; the left-recursive rule groups the leftmost + lowest in the tree.]

Extended BNF

- Optional parts are placed in brackets [ ]
    <proc_call> → ident [(<expr_list>)]
- Alternative parts of RHSs are placed inside parentheses and separated via vertical bars
    <term> → <term> (+|-) const
- Repetitions (0 or more) are placed inside braces { }
    <ident> → letter {letter|digit}

BNF and EBNF

BNF:
    <expr> → <expr> + <term> | <expr> - <term> | <term>
    <term> → <term> * <factor> | <term> / <factor> | <factor>

EBNF:
    <expr> → <term> {(+ | -) <term>}
    <term> → <factor> {(* | /) <factor>}
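The EBNF form maps naturally onto code: each brace pair { } becomes a while loop. The sketch below evaluates expressions this way. The slide does not define <factor>, so the rule <factor> → number | ( <expr> ) is an assumption, as are the function names; error handling is deliberately minimal.

```python
import re

def evaluate(source):
    # Characters other than digits, operators, and parentheses are ignored.
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():
        # Assumed rule: <factor> -> number | ( <expr> )
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(token)

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result

print(evaluate("2 + 3 * 4"))
print(evaluate("10 - 3 - 2"))
```

Because <term> sits below <expr> in the grammar, multiplication binds tighter than addition, and the loops apply operators from left to right, giving left associativity.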

Recent Variations in EBNF

- Alternative RHSs are put on separate lines
- Use of a colon instead of =>
- Use of opt for optional parts
- Use of oneof for choices

Static Semantics

- Nothing to do with (actual) meaning
- Context-free grammars (CFGs) cannot describe all of the syntax of programming languages (CSGs might be?!)
- Categories of constructs that are trouble:
  - Context-free, but cumbersome (e.g., types of operands in expressions)
  - Non-context-free (e.g., variables must be declared before they are used)
- The analysis required to check these specifications (e.g., type compatibility) can be done at compile time.

Copyright © 2004 Pearson Addison-Wesley. All rights reserved. 3-56

Attribute Grammars (AGs) (Knuth, 1968)

- CFGs cannot describe all of the syntax of programming languages
- Additions to CFGs to carry some semantic information along through parse trees
- Primary value of AGs:
  - Static semantics specification
  - Compiler design (static semantics checking)
- Static semantics: including data types and forward-branching locations

Attribute Grammars

Def: An attribute grammar is a CFG G = (S, N, T, P) with the following additions:

- For each grammar symbol x there is a set A(x) of attribute values
- Each rule has a set of functions that define certain attributes of the nonterminals in the rule
- Each rule has a (possibly empty) set of predicates to check for attribute consistency

Attribute grammars are grammars to which have been added attributes, attribute computation functions, and predicate functions.

Attribute Grammars

- Let X0 → X1 ... Xn be a rule
- Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes
- Functions of the form I(Xj) = f(A(X0), ..., A(Xj-1)), for 1 <= j <= n, define inherited attributes
- Initially, there are intrinsic attributes on the leaves

[Figure: parse-tree fragment with parent X0 and children X1, X2, ..., Xn]

Attribute Grammars

Example: expressions of the form id + id

- id's can be either int_type or real_type
- types of the two id's must be the same
- type of the expression must match its expected type (from the top down)

BNF:
    <expr> → <var> + <var>
    <var> → id

Attributes:
    actual_type - synthesized for <var> and <expr>
    expected_type - inherited for <expr>

1. <assign> → <var> = <expr>
2. <expr> → <var>1 + <var>2
3. <expr> → <var>
4. <var> → A | B

Rule 1
Syntax rule: <assign> → <var> = <expr>
Semantic rule: <expr>.expected_type ← <var>.actual_type

Rule 2
Syntax rule: <expr> → <var>1 + <var>2
Semantic rule: <expr>.actual_type ← if (<var>1.actual_type = int) and (<var>2.actual_type = int) then int else real end if
Predicate: <expr>.actual_type = <expr>.expected_type

Rule 3
Syntax rule: <expr> → <var>
Semantic rule: <expr>.actual_type ← <var>.actual_type
Predicate: <expr>.actual_type = <expr>.expected_type

Rule 4
Syntax rule: <var> → A | B
Semantic rule: <var>.actual_type ← look-up(<var>.string)

To compute the attribute values for A = A + B, the following is a possible sequence:

1. <var>.actual_type ← look-up(A) (Rule 4)

2. <expr>.expected_type ← <var>.actual_type (Rule 1)

3. <var>1.actual_type ← look-up(A) (Rule 4)

   <var>2.actual_type ← look-up(B) (Rule 4)

4. <expr>.actual_type ← either int or real (Rule 2)

5. <expr>.expected_type = <expr>.actual_type is either True or False (Rule 2)
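The evaluation sequence can be followed step by step in a short sketch. The code below decorates the tree for an assignment of the form target = left + right, following Rules 1, 2, and 4. The symbol table contents and the names `SYMBOL_TABLE`, `look_up`, and `check_assignment` are hypothetical and chosen only for illustration.

```python
SYMBOL_TABLE = {"A": "int", "B": "real"}   # hypothetical declarations

def look_up(name):
    return SYMBOL_TABLE[name]

def check_assignment(target, left, right):
    """Compute the attributes for `target = left + right` and test the predicate."""
    attrs = {}
    attrs["<var>.actual_type"] = look_up(target)                  # Rule 4
    attrs["<expr>.expected_type"] = attrs["<var>.actual_type"]    # Rule 1, inherited
    attrs["<var>1.actual_type"] = look_up(left)                   # Rule 4
    attrs["<var>2.actual_type"] = look_up(right)                  # Rule 4
    both_int = (attrs["<var>1.actual_type"] == "int"
                and attrs["<var>2.actual_type"] == "int")
    attrs["<expr>.actual_type"] = "int" if both_int else "real"   # Rule 2, synthesized
    attrs["predicate"] = (attrs["<expr>.actual_type"]
                          == attrs["<expr>.expected_type"])       # Rule 2
    return attrs

for name, value in check_assignment("A", "A", "B").items():
    print(name, "=", value)
```

With A declared int and B declared real, the expression's actual type is real while its expected type is int, so the predicate fails; this is the kind of static semantic error the attribute grammar is meant to catch.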

[Figure: fully attributed parse tree for A = A + B. <assign> has the children <var>, =, and <expr>; <expr> has the children <var>1, +, and <var>2. Each <var> carries actual_type; <expr> carries expected_type and actual_type, and its predicate evaluates to T or F. Circled numbers mark the order in which the attributes are computed.]

Now you may take a close look at a “fully attributed” parse tree.

Attribute Grammars

How are attribute values computed?

- If all attributes were inherited, the tree could be decorated in top-down order.
- If all attributes were synthesized, the tree could be decorated in bottom-up order.
- In many cases, both kinds of attributes are used, and it is some combination of top-down and bottom-up that must be used.

Semantics

- There is no single widely acceptable notation or formalism for describing semantics
- Several needs for a methodology and notation for semantics:
  - Programmers need to know what statements mean
  - Compiler writers must know exactly what language constructs do
  - Correctness proofs would be possible
  - Compiler generators would be possible
  - Designers could detect ambiguities and inconsistencies

Although much is known about programming language syntax, we have less knowledge of how to correctly define the semantics of a language. The problem of semantic definition has been the object of theoretical study for as long as the problem of syntactic definition, but a satisfactory solution has been much more difficult to find. Many different methods (five, indeed) for the formal definition of semantics have been developed:

1. Grammatical models (attribute grammars; Knuth 1968): By adding attributes to each rule in a grammar. (covered already)

2. Operational models: The Vienna Definition Language (VDL) is an operational approach from the 1970s. Typically the definition of the virtual computer is described as an automaton. (state-machine type)

3. Denotational models: (functional model type)

4. Axiomatic models: This method extends the predicate calculus to include programs. (by Hoare)

5. Specification model: The algebraic data type is a form of formal specification. For example

pop( push (S, x))= S.
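The axiom pop(push(S, x)) = S can be checked against a concrete implementation. The sketch below uses immutable tuples as stacks; this representation and the sample values are assumptions chosen for illustration, and checking a few cases is only a spot test, not a proof of the axiom.

```python
def push(stack, x):
    """Return a new stack with x on top; the original is left unchanged."""
    return stack + (x,)

def pop(stack):
    """Return a new stack without its top element."""
    if not stack:
        raise IndexError("pop from empty stack")
    return stack[:-1]

# Spot-check the axiom pop(push(S, x)) = S on a few stacks and values.
for s in [(), (1,), (1, 2, 3)]:
    for x in [0, "a", None]:
        assert pop(push(s, x)) == s
print("axiom holds on all sampled cases")
```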

No single semantic definition method has been found useful for both user and implementor of a language.

We can use some of the previous discussion on language semantics to aid in correctness issues in three ways:

1. Given Program P, what does it mean? That is, what is its Specification S? (semantic modeling issue)

2. Given Specification S, develop Program P that implements that specification. (the central problem in SE today)

3. Do Specification S and Program P perform the same function? (the central problem in program verification)

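Assigning meanings to programs (R. W. Floyd, 1967)

[Flowchart: integer division by repeated subtraction, with an assertion attached to each edge. Assertions are shown in braces.]

        { X>=0 ^ Y>0 }
Q := 0
        { X>=0 ^ Y>0 ^ Q=0 }
R := X
        { X>=0 ^ Y>0 ^ Q=0 ^ R=X }
loop:   { X>=0 ^ Y>0 ^ Q>=0 ^ R>=0 ^ X=Q*Y+R }
R>=Y ?
  yes:  { X>=0 ^ Y>0 ^ R>=0 ^ Q>=0 ^ X=Q*Y+R ^ R>=Y }
  R := R - Y
        { X>=0 ^ Y>0 ^ R>=0 ^ Q>=0 ^ X=(Q+1)*Y+R }
  Q := Q + 1
        { X>=0 ^ Y>0 ^ R>=0 ^ Q>=1 ^ X=Q*Y+R }
  go to loop
  no:   { X>=0 ^ Y>0 ^ Q>=0 ^ X=Q*Y+R ^ 0<=R<Y }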
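The annotated flowchart corresponds to a short program in which each assertion becomes a runtime check. The sketch below transcribes the division algorithm and its main assertions into Python `assert` statements; the function name `divide` is an assumption. Running it tests the assertions on particular inputs, which is weaker than the proof that the assertion method is meant to support.

```python
def divide(x, y):
    """Integer division by repeated subtraction, with Floyd-style assertions."""
    assert x >= 0 and y > 0                            # entry assertion
    q = 0
    r = x
    while True:
        assert q >= 0 and r >= 0 and x == q * y + r    # loop invariant
        if not r >= y:
            break
        r = r - y
        assert x == (q + 1) * y + r                    # after R := R - Y
        q = q + 1
    assert x == q * y + r and 0 <= r < y               # exit assertion
    return q, r

print(divide(17, 5))
print(divide(4, 4))
```

The exit assertion is exactly the specification of division with remainder, so if the invariant holds on every iteration the returned pair is the quotient and remainder.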

Modeling Language Properties

1. Formal Grammars
2. Language Semantics
3. Program Verification

Formal Grammars

Recall the four grammar types (Type 0 to Type 3) defined earlier, and classify the two grammars below.

G2 = ({S, B, C}, {a, b, c}, P, S)
P:  S → aSBC
    S → abC
    CB → BC
    bB → bb
    bC → bc
    cC → cc

Which type? What is the language?

G3: S → S + S
    S → S * S
    S → (S)
    S → a

Which type? What is the language?

Q: Is there a limit to what we can compute with a computer?

The halting problem helps answer this question.

Turing machines define the class of all computable functions.

1. A finite-state automaton consists of a finite-state graph and a one-way tape. For each operation, the automaton reads the next symbol from the tape and enters a new state.

2. A pushdown automaton adds a stack to the finite automaton. For each operation, the automaton reads the next tape symbol and the stack symbol, writes a new stack symbol, and enters a new state.

3. A linear-bounded automaton is similar to the finite-state automaton with the additions that it can read and write to the tape for each input symbol and it can move the tape in either direction.

4. A Turing machine is similar to a linear-bounded automaton except that the tape is infinite in either direction.

1. Grammatical models (attribute grammars; Knuth 1968): attributes are added to each rule in a grammar. Please refer to Fig. 4.3 for more details.

2+4*(1+2)

E → E + T | T

T → T * P | P

P → ( E ) | digit

2. Imperative or operational models: The Vienna Definition Language (VDL) is an operational approach from the 1970s. Typically the definition of the virtual computer is described as an automaton. (state-machine type)

3. Applicative models: (functional model type; denotational semantics)

4. Axiomatic models: This method extends the predicate calculus to include programs. (by Hoare)

5. Specification model: The algebraic data type is a form of formal specification. For example

pop(push(S, x)) = S.

No single semantic definition method has been found useful for both user and implementor of a language.


Summary

- BNF and context-free grammars are equivalent meta-languages
  - Well-suited for describing the syntax of programming languages
- An attribute grammar is a descriptive formalism that can describe both the syntax and the semantics of a language
- Three primary methods of semantics description (as of 2009): operational, axiomatic, denotational