i. programming domains and paradigms in software developme… · represents meaning of language...

Lecture Notes Alex Cummaudo COS30023—LSD 13/08/2014

I. Programming Domains and Paradigms

Programming Domains ▸ Languages are designed for a specific purpose▸ There isn’t a one-size-fits-all approach to every problem (i.e., no universal language)

- Scientific Applications (Fortran, Argol): Floating Point Arithmetic- Business Applications (COBOL): Report Generation, Decimal Arithmetic- Artificial Intelligence (Lisp, Prolog): Symbolic Computation- System Programming (C, Pascal): Operating Systems- Scripting Languages (sh, awk, Perl, Tcl): System Configuration

Programming Domains Imperative Style: program = algorithms + data

▸ Oldest style of programming▸ Instructions such as assignments, branches, control flow etc.▸ Data is held in variables so instructions within algorithm can vary▸ Fortran, Assembly, Algol, Pascal, C, Ada

Functional Style: program = function · function ▸ Mathematical approach to programming based on Lambda Calculus ▸ No such thing as variables:

- There is no such thing as state, memory or side affects- Everything is conceptually a function

▪ We push arguments to functions to alter the output▪ The number of arguments we can push is not fixed

▪ We can push as many as we desire

▪ The result of the pure functional program only depends on the arguments supplied ▸ Haskel, Scheme

Logic Style: program = facts + rules ▸ Similar to functional programming in that it takes a mathematical approach via Formal

Logic- Deduce falsehoods and truths from predicates of the facts supplied- Program is devised of predicates, which are the rules that govern a problem - Uses Logical Inference to deduce conclusions from the rules started from the axioms

and theorems already provided▸ Satisfaction of formulae by finding solutions▸ Describe WHAT the problem is, NOT HOW it should be solved (declarative language)▸ Prolog

� / �1 48


Object Oriented Style: program = objects + messages ▸ Programs build on types and inheritance

- Inheritance: realising new functionality from existing classes▸ Invoke messages on instances of objects

- The object responds by calling a method- The object could respond by throwing a runtime error should it not understand the

message it received

Prolog▸ Atoms: think string constant

- lowercase

fred, true, emacs

▸ Variable: substitutions Prolog can make with this variable to satisfy a rule- Uppercase- Think of as placeholders that can be wired-up with each other within the problem

editor(zen, SomeEditor)

▸ Facts/Compound Terms: functors or rules that has one or more arguments provided- Characterised by their name and arity (think number of arguments)

editor(zen, emacs). editor/2append(List1, List2, List3). append/3

▸ Goals/Predicates: When prolog is asked a query, it will try and solve each of its goals such that all predicates result in true- Prolog will try to find a set of substitutions for any variables to make all goals true- Think comma as an AND (rule1 AND rule2 AND rule3)

editor(Person1, vim),editor(Person2, vim),Person1 \== Person2

- In the above example, Prolog will try and find Person 1 and Person 2 such that they are not the same person and that two people exist in its rule base where both people use vim as their editor:

editor(ben, vim). editor(fred, vim). editor(fred, emacs).

� / �2 48


▸ Rules: Condenses a set of queries to one line statements:- Note how we can make Editor variable on the fly in the example below- This just wires up the two predicates such that their editor must be the same editor

pair(Person1, Person2) :- editor(Person1, Editor), editor(Person2, Editor), Person1 \== Person2.

▸ Lists: A list contains a number of atoms, numeric values or Variables- The bar or pipe symbol cuts the head of the list from its tail- We can use recursion of the same rule to iterate through the list

H { T } // Read as H bar T[alex, declan, rueben]Member ExampleSomething is a member of a list if it’s the first element (head) in listmember(H, [H|_]) :- true.Something is a member of a list if it’s somewhere in the tail of the listmember(X,[_|T]) :- member(X,T).

Syntax and Semantics

Syntax ▸ Concerned with the form of the programming language

- How do we put together expressions, commands and declarations to form a program?- Is it syntactically correct to say x++?

▸ Syntax guides semantics - Syntax alone isn’t enough, it just defines what makes valid sentences- Syntax is used to guide programmers into proper semantics

Semantics ▸ Concerned with the meaning of the programming language

- How does it behave when a program is executed? Why is it so?- Semantics assign meaning to every sentence that is of valid syntax

▸ We can use Axiomatic Semantics to prove the correctness of a programming language- That is, we know that our programming language has executed correctly since we have

mathematically proved that the logic is right based on the outcomes we see

� / �3 48


Hilbert-Style Proofs ▸ Axioms of proof systems are the formulae that are provable by definition

- Basic building blocks of proof- E.g., true

▸ Inference rules are assertions that infer the correctness of the conclusion of one formula from another formula

▸ Proofs are a structured object build from formulae according to constraints of axioms and inference rules - Axioms + Inference = Proof- A Proof Tree exists with a premise at the top of the bar and the conclusion at the bottom

▸ For example, if we think in prolog standards:

Anakin is the parent of Lukeparent(anakin,luke).Anakin is a malemale(anakin).All males that are parents are fathersfather(X,Y) :- parent(X,Y), male(X).As Anakin is a male and he is Luke’s parent, he is therefore Luke’s father

Axiomatic Semantics ▸ Assess the correctness of programs based on its output▸ Use Axiomatic Semantics to decompose a conclusion back into its premises

- It is a backwards analysis of conclusions- Work backwards to discover the preconditions that made the program tick to its

conclusion

� / �4 48


Hoare Triples ▸ An axiomatic method of defining meaning of a language using logical assertions

{ E1 } C { E2 }The precondition E1 is true, then the C command happens, now the postcondition E2 is true▸ The boolean expression E1 holds before the computation C, then E2 is true must hold too

� Proving the behaviour of the increment statement

� SEQUENCE RULE: Proving the behaviour of a C1;C2 statement

� ASSIGNMENT STATEMENT: — C = C.Target := C.Source

� / �5 48


� IF-THEN-ELSE STATEMENT — C = if C.Test then C.Then else C.Else

Note that ALL possible cases (Then or Else) lead to Q

� RULES OF CONSEQUENCE

We can discover that P -> C -> Q even though we are doing this via subtypes primes

� / �6 48


Worked Example

[ P ] { true }if ( a >= b ) then m := a [ Cthen ] else m := b [ Celse ]end[ Q ] {m = MAX(a,b)}

▸ Full Proof Tree needs - DECOMPOSE: Cthen and Celse Proof Tree using Rules of Consequence- As we have decomposed both branches, we have explored all possible outcomes for Q

� / �7 48


Operational Semantics ▸ Uses reduction to see how state evolves in the program

- Keep reducing the program to get to its normal form - Sometimes, we can never get to normal form (e.g., recursion)… which means program

won't be able to evaluate its output▸ Provides the means to display the computational steps undertaken when a program is

evaluated to its output.

Assignment Statement

C = C.Target := C.Source

If then else statement

C = if C.Test then C.Then else C.Else

� / �8 48


Denotational Semantics ▸ Represents meaning of language concepts as mathematical objects (functions)

- Maps well-typed derivation trees to their mathematical meanings- Th method does not maintain states, but the meaning if the program is given as a

function that interprets all language elements of a corresponding set of values- Small-step semantics; reducing the concepts to partial functions

▪ Function: the mapping of all elements from a source domain to a dst. co-domain▪ Partial Function: function is not defined for all elements (i.e., a restriction of the

domain to the co-domain)

Assignment Statement

C = C.Target := C.Source

If then else statement

C = if C.Test then C.Then else C.Else

We define the If-Then-Else statement concept (S2) using If-Then-Else

Bootstrap Mathematics (S1)… S1 ≠ S2!

� / �9 48


Types and Type Systems ▸ Type Systems

- Restricts the number of expressions to prevent runtime errors(i.e., which expressions can be assigned a valid type using the underlying type system)

- Types are collections of values (think sets) that share some common properties(i.e., v ∈ T — all values are subsets of a type)

- Most programming languages have a type system that will:

▪ check the type during compilation (C, C++, C#)—statically typed

▪ check the type during runtime (Objective-C, Ruby)—dynamically typed ▸ Values

- Values: an expression used solely to denote a value- Constants: a named abstraction in the program; refers to an immutable value- Primitive Values: values which cannot be further decomposed (e.g., int, string etc.)- Composite Values: values built up from primitives (e.g., records, arrays, lists, enums)- Pointers: denote the location of values

� / �10 48


Inductive Sets of Data

Interpreters and Compilers Interpeter

▸ An interpreter consists of:- a front end

▪ converts the program text to an abstract syntax tree (AST)

▪ the AST is the internal representation of the program text

▪ the AST will check for soundness of the program text; AST parsers will ensure that the AST is sound and valid

- an evaluator

▪ receives at the AST and performs associated actions, dependent on the AST

▪ doesn't actually see or work with the original program input▪ converts the AST + other input => program answer

�

� / �11 48


Compiler ▸ A compiler translates program text into another language

- e.g., C to Machine Code, to C# to ByteCode etc.▸ Compilers consist of:

- a front end

▪ converts the program text to an abstract syntax tree (AST) - a set of compiler phases that do particular tasks▪ Analysis Phases: ▹ Semantic Analysis▹ Optimisation

▪ Translator Phases: ▹ Optimisation▹ Code Emission

▸ An evaluator for a compiler may be an interpreter (JVM) or a machine (von Neumann)▸ The process may repeat as many times as it needs in order to resolve dependencies

- That is, it might take one compilation process, and use the output of that compilation process as the input for another compilation process and so on…

� / �12 48


�

� / �13 48


Programming Language Artefacts ▸ There are always two sets of values:

- Expressed Values ▪ Specified via literal expressions

▪ string, numbers, characters - Denoted Values ▪ Value is bound to a name

▪ x := 123 ▸ Kinds of languages

- Source

▪ Language used to write programs▪ Evaluated by compiler or interpreter

▪ Java, C, Ruby - Host ▪ Language used to specify the compiler or interpreter

▪ Java, C - Target ▪ Language that is executed

▪ We translate to target languages

▪ C, Machine Code, JVM - E.g.: Scheme (Source) written in Java (Host) written in C (Target)

▸ Lexical Tokens - Building blocks of the language—atoms- A sequence of characters, as one unit, in the programming language definition- When compiling/interpreting, we digest the raw input stream into these tokens- Examples:▪ Identifiers

Give meaning to denoted values

foo, N14, $dbctn, symbol?

▪ Integers▪ Real

▪ KeywordsReserved identifiers with specific meaningif, while, downto, return, final, sealed

▪ Operators

� / �14 48


▪ PunctuationAid in the reading if your program, both as a human and as a computer( ) . ; { }

Recognising Tokens ▸ Lexical tokens can be defined using the formal language of regular expressions ▸ Lexical tokens can be recognised using finite deterministic automata (DFA) ▸ DFAs is a quintuple with the following elements:

- Σ — a set of actions or alphabet of the state machine- Q = { q0 , q1 … qn } — a set of states (with each state being q)▪ q0 — designated start state

- F ⊆ Q — F is a subset of Q, where F is the accepting states- σ ⊆ Q × Σ × Q — a subset representing transitions between states (over Σ)

▪ Transitions describe behaviour within the DFA

▪ (q, a, q') ⊆ σ — note the state change from q to q' with automaton a

▸ Automaton A is finite if Q is finite ▸ Can be represented as a

transition graph (right)- The automaton A0 represents

transitions over state q0, q1, q2, q3…

- T h e a u t o m a t o n h a s a n alphabet Σ = { a, b, c }

- In the event we never end up at the final state q1, then we have invalid input

- In the event we approach the black hole state q3, we have invalid input▪ This will lead to recursion or infinite loop affect

▸ Automaton Language - A string is provided as input over an automaton for the alphabet Σ▪ if w is a string over Σ, then length of w is considered to be | w |

▪ a string with | w | = 0 is known as the empty string, or ε▹ an empty string will stop the automaton

- Consider inputs for Σ = { a, b, c } over A0:

▪ a, b, c, a is valid as we end up at q1 (q0 -> q2 -> q3 -> q1)

� / �15 48


▪ b, c, a, b, c ad inf. is invalid as we are stuck in the black hole q3

▪ a, b is invalid as we do not end up at the final state q1

Inductive Specification

▸ A mechanism to formally specify possible infinite sets of values▸ Define an infinite set in terms of itself: recursion

- Clauses▪ Base Clause: A candidate (usually the smallest possible set) to represent some truth

about the set

▪ Inductive Clause: A recursive statement which can be decomposed from the base clause

- E.g.

▪ 0 ∈ S, where S is the smallest set of all natural numbers

▪ Hence: x ∈ S ⇒ x + 3 ∈ S

▪ Hence, as we truly said 0 is a subset of S, have defined x is a subset of S, we

therefore know x + 3 is part of

Mathematical Induction ▸ Once the sets have been defined inductively:

- we can use the inductive definition to deduce the properties about members in the set- we write down an infinite proof in a finite way

▸ Take the mathematical concept of reason on numbering applied to reason the structure of languages

▸ Well-formed formulae:- Base form:▪ true and false

▪ p where p is a propositional value- Induced (Inductive specification) from base form:▪ ( ¬ p ) "not p"

is well-formed, given p is well-formed

▪ ( p ∧ q ) "p and q" is well formed, given p and q are well-formed

▪ ( p ∨ v ) "p or q" is well formed, given p and q are well-formed

� / �16 48


▪ ( p ⇒ q ) "if p then q" ( p implies q )

is well formed, given p and q are well-formed

▪ ( p ⇔ q ) "p if and only if q; q if and only if p" ( p is equivalent to q )

is well formed, given p and q are well-formed

Example I ▸ Proof by mathematical induction that P is true for every positive integer, n (i.e. P(n))▸ Form a base clause that is wholly true

P(1) ⇒ true

"The integer 1 is a positive one"▸ Form the inductive step which can induct all other values for n

P(n) ⇒ P(n+1)

"Given P of n is true (that is, n is positive as implied from the base clause), this will imply that one added to whatever value n is is also true"

Example II ▸ Proof by mathematical induction that sum(n) = n(n+1) / 2 ▸ Form a base clause that is wholly true

sum(0) = 0 ( 0 + 1 ) / 2 = 0 ⇒ true

▸ Form the inductive step which can induct all other values for nAssume base clause holdsWe must show that n ∖ n + 1 will hold (i.e., swap n with n + 1) sum ( n ) = n ( n + 1 ) / 2∴ sum ( n + 1 ) = ( n + 1 ) ( ( n + 1 ) + 1 ) / 2

∴ sum ( n + 1 ) = ( n + 1 ) ( n + 2 ) / 2Now use inference to apply sum ( n ) on RHSWe must show that the sum of the first n numbers is ( n + 1 ) sum ( n + 1 ) = sum ( n ) + ( n + 1 )We must show that sum (n)∖n ( n + 1) / 2 will hold∴ sum ( n + 1 ) = ( n ( n + 1 ) / 2 ) + ( n + 1 )

∴ sum ( n + 1 ) = ( n ( n + 1 ) + 2( n + 1 ) ) / 2∴ sum ( n + 1 ) = ( n2 + n + 2n + 2 ) / 2

∴ sum ( n + 1 ) = ( n2 + 3n + 2 ) / 2∴ sum ( n + 1 ) = ( n + 1 ) ( n + 2 ) / 2

Q.E.D� / �17 48


Structural Induction ▸ Uses the fact that it is possible to break up a larger substructure into smaller substructures

- Think onions: we infer the inner shells, but look at the outer shell to peel back▸ Similar to mathematical induction

- Base step: Simple structures (without the substructures) are wholly true- Induction step: If the induction hypothesis on the substructures of a given object, then

it is true on the overall structure itself

Example I ▸ Prove the following BNF specification

- <exp> is a set of expressions- <idf> is a set of identifiers

<exp> ::= <idf> | ( lambda ( <idf> ) <exp> ) | ( <exp> <exp> )

▸ Use induction prove that the structure of expression <e> if e ∈ <e>, that e has the same

number of left and right parentheses - e ≡ id

"e is identical to id"#( = 0#) = 0

∴ #( = #) Q.E.D

- e ≡ ( lambda (id) e1 )

Assume induction hypothesis (base clause)Since e1 ∈ <e>, #( e1 = #) e1

let m#( = #( e1

let k#( = #) e1 m = kHence e1 has been proved. Time for e… (why 2? look at # braces!!!!!) #( e = 2 + m #) e = 2 + k∴ #( e = 2 + m = 2 + k = #) e Q.E.D

� / �18 48


Arden's Rule ▸ The following base operations exist for sets of strings S

- AlternationS1 | S2 = { s | s ∈ S1 ⋁ s ∈ S2 }

The string s is S1 or S2 - Concatenation

S1 · S2 = { s1s2 | s ∈ S1, s ∈ S2 }

The string is a combination of both strings - Iteration

S* = { ε | S | S · S | S · S · S | S · S · S · S … }We iterate from an empty string, then keep iterating…

▸ The following equations hold for strings S- (S1 · S2) · S3 = S1 · (S2 · S3) - (S1 | S2) · T = S1 · T | S2 · T- T · (S1 | S2) = T · S1 | T · S2- S · ε = S- S · ∅ = ∅

- S · (T · S)* = (S · T)* · S

▪ Note, ∅ means “no path” (think infinite loop state with DFA)

▹ When we have no path, we cancel out that path from out automata calculations

▪ ε means “empty path" (think no more paths to continue from with DFA—dead end)▸ Therefore, Arden's Rule is defined as:

X = S · X | T ⇒ X = S* · T

▸ That is, if X is recursively defined with a concatenation for S or T, then the automaton accepts S and continues to accept S until we find T

Regular Expression Notation

�

Regular Expression Notation

79

a An ordinary character stands for itself.Empty string (we write ε)

M | N Alternation, choosing one M or one N.M N Concatenation, one M followed by one N. M* Repetition, zero or more times M.M+ Repetition, one or more times M.M? Optional, zero or one occurrence of M.[a-zA-Z] Character set alternation. A period stands for any single character (except newline).“while” Quotation, a string in quotes stands for itself.

� / �19 48


Regular Expressions to Finite Deterministic Automata ▸ Use epsilon for end nodes (double circle)

- However, epsilon may be used in other transitions… (e.g., 1->2 via epsilon)

Disambiguation Rules ▸ In the event that two DFAs can accept the same string, they may be able to parse the

same thing. ▸ Ambiguities arise in token recognition which are disambiguated using the following rules.▸ Longest Match

- Consume as much as you possibly can - The longest initial substring of the input that can match any regular expression is taken

as the next token

Finite Automata for Tokens

81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n

a-zA-Z1 2 a-Z-A-Z0-9

0-91 2 0-9


81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n


0-91 2 0-9


81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n


0-91 2 0-9


81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n


0-91 2 0-9


81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n


0-91 2 0-9


81

i f1 2 3

1 2.

2 4.

1 3.

0-90-9

0-9 0-9

2 3/1 4/.

" ",\t, \n

\n

5" ",\t, \n


0-91 2 0-9

Input DFA

if keyword 'if'exactly i then f

1. Consume i2. Consume f3. Consume ε to stop

[0-9]+integer token one or more sequential occurrences of 0-9

1. Consume 0-92. Consume 0-9 and repeat

stop repeat on ε

[a-zA-Z][a-zA-Z0-9]*identifier token start with letter, continue with 0 or more letters or numbers

1. Consume a-zA-Z2. Consume a-zA-Z0-9 and repeat

stop repeat on ε

([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+) real number token start with one or more numbers, then a period then continue with zero or more numbers

OR start with zero or more numbers, then a period then continue with one or more numbers

1. goto state 2 if 0-9 consumed goto state 3 if period consumed

2. Consuming 0-9 and repeatstop repeat on period

3. Consume 0-94. Consume 0-9 and repeat

stop repeat on ε

("//".*"\n")|(" "|"\n"|"\t") whitespace (inc. //) start with //, then zero or more any chars then newline

OR space or newline or tab

1. goto state 2 if / consumed goto state 5 if space, \t or \n cons.

2. Consume /3. Consume anything and repeat

stop repeat on \n4. Consume ε5. Consume space, \t or \n and

repeatstop repeat on ε

. else input is error any sequence of chars that do not satisfy any of the DFAs above

1. Consume anything2. Consume ε

� / �20 48


▪ "596.354"⇒ real▪ "85893a" ⇒ integer as 85893

▸ Rule Priority - For a longest initial substring, the first regular expression that can match determines its

token type- E.g.: for the text input if and the following regular expressions:

RegEx1: ifRegEx2: [a-zA-Z][a-zA-Z0-9]*

- The order of definitions of regular expressions determines its priority▪ Because the if keyword RegEx (1) is defined before the identifier RegEx (2) then (1)

gets priority over (2) for input of 'if'

Combining DFAs ▸ To turn all the individual machines into one machine, we create:

- Finite Nondeterministic Automata (NFA) - It is nondeterministic because it can decide which path to go to even if there are

multiple same-transition paths- It is a refinement of the NDA:

▪ It still satisfies the quintuple (Σ, Q, q0, σ, F) butσ ⊆ Q × Σ × 2Q

That is, transitions (σ) are now subsets of:1. a relation starting at state Q2. the FDA looks at a label in subset Σ

3. adds in a state that is a subset of a power set of all the states (2Q)In other words, we can end up at multiple states at once

DFA NFA

Discrete labels from one state to another; all labels labelled differently and branch off discretely.

Can have two transitions out of one state with the same label (q0 and a1 goes to both q1 and q2)

� / �21 48


▸ NFA's can be expressed into equivalent DFA'sWorked Example:

DFA NFA

q0 = (a · (b | a · c))*

is the regular expression to express both automata

� / �22 48


▸ A note on pruning- q3 = ( 0 | 1 ) · q3 where Σ = { 0, 1 }

Above is an empty path since it is recursively infinite…"Consume 0 or 1 and then goto q3 again"Hence since this is a "black hole": q3 = ( 0 | 1 ) · q3∴ q3 = ∅ Now all occurrences of q3 can be pruned and substituted with nothing

- a | ε ≙ [a]

For all expressions where a or ε, replace with optional a = [a]

Syntax Analysis & Grammar ▸ While DFA/NFAs describe valid input, it doesn't mean it helps with the form of the input

- The DFA/NFA will sort the whole stream of input into tokens▸ The parser will ensure that the tokens have appropriate grammar

- grammar describes all possible sequences that have correct formi.e., does it make sense to have if while for then?

▸ Unfortunately, regular expressions can only go so far- Regular expressions are stateless and do not keep track of what has been processed- For example

we can't keep track of the number of { and the number of } to see if they match - We can use Pumping Lemma for regular languages to achieve this

any sufficiently long string in the language contains a section that can be removed, or repeated any number of times, with the resulting string remaining in that language.

▸ Use Grammar to validate form- Grammar is a quadruple

G = (V, T, S, P) - V = vocabulary

set of all terms used - T ⊂ V

T is a true subset of VT is the terminal tokens

- S ∈ (V - T)

start symbols - P ⊆ (V* - T*) ⨉ V*

productions over the vocabulary and tokens

� / �23 48


▸ Language of a grammar is defined as the set of all strings of terminals that are derivable from the start string for the particular grammar:L(G) = { w ∈ T* | S ⇒+ w }

- w ∈ T* All strings of the terminal sets

- S ⇒+ w

Apply the productions of the grammar at least once from start to end ▸ Types of grammar

- Type 0: Free

▪ No restrictions on productions

▪ Just a simple, arbitrary rule set▪ Prolog

- Type 1: Context-Sensitive

▪ Only ok if we have context; can only apply the rule if its in context- Type 2: Context-Free

▪ A grammar can only have productions

▪ One, and one only, non-terminal symbol- Type 3: Regular

▪ A grammar can only have productions of the form:a is a terminalaB, Ba, a, ε, where B is a non-terminal

Backus-Naur Form (BNF) ▸ Specify the grammar of a language using a context-free grammar, BNF▸ Follows the format:

lhs ::= rhs

- Where lhs is a nonterminal and rhs is a list (separated by |) of terminals and nonterminals

< ListOfNumbers > ::= "()" | "(" < Number > "." < ListOfNumbers > ")"Bolded = nonterminalUnderlined = terminal

A list of numbers is either an empty list ( ) OR in the format (H.T)

� / �24 48


Concrete Syntax Tree ▸ Represents the external representation of

statements, expressions and values▸ We can construct a syntax tree from the

tokens that are made available to us▸ Remember:

- 1. Input text stream undergoes a lexical analysis using a DFA/NFA

- 2. Tokens are then consumed using the syntax tree to their terminals

- 3. Once we are at the leafs of the syntax tree, we have made the syntax tree- Goal: For all possible input strings, we get exactly one syntax tree

▸ The problem is we may have ambiguous grammars:- Ambiguous Grammars:

The same input and achieve multiple syntax trees- Non-Ambiguous Grammars:

One input string to get one syntax tree (i.e., this string produces a unique syntax tree)- Consider the following ambiguous grammar:

< Expression > ::= < Identifier >< Expression > ::= < Number >< Expression > ::= < Expression > + < Expression >< Expression > ::= < Expression > - < Expression >< Expression > ::= < Expression > * < Expression >< Expression > ::= < Expression > / < Expression >< Expression > ::= ( < Expression > )

▸ Precedence Problem - The higher up the syntax rules the lower the priority- An operator has tighter binding (higher priority) if it's defined deeper in the hierarchy- In the above example:

▪ Followed BODMAS since the order of the rules are defined with higher priorities last ▸ Associativity Problem

- If an operator uses left-recursive structure, then it is associated to the left of that rule

< Expression > ::= < Expression > + < Expression >

- If an operator uses right-recursive structure, then it is associated to the right of that rule

< Expression > ::= - < Expression >

� / �25 48

Low Priority

Higher Priority


Consider the following Syntax Rules…

< Expression > ::= < Identifier >< Expression > ::= < Number >< Expression > ::= < Expression > + < Expression >< Expression > ::= < Expression > - < Expression >< Expression > ::= < Expression > * < Expression >< Expression > ::= < Expression > / < Expression >< Expression > ::= ( < Expression > )

Priority In ActionSince the * operator is defined after the + operator, the RHS tree is taken

Associativity In ActionSince the + operator is defined with a left-recursive structure, the left route is taken

� / �26 48


Abstract Syntax Tree ▸ Represents the internal representation of the statements, expressions and values▸ In the abstract syntax tree, we have no terminals

- The building blocks are tokens, rather than terminals.▸ Abstract trees generate ambiguous trees, but isn't meant for parsing (so this is okay)

Concrete Tree Define in terms of actual terminal characters (bolded)

< Lines > ::= ( < Expression > "\n" )*< Expression > ::= < Term > ( ( "+" | "-" ) < Term > )*< Term > ::= < PrimaryExpression > ( ( "x" | "/" ) < Primary > )*< PrimaryExpression > ::= "(" < Expression > ")" | "-" < Primary > | < Number >

Abstract Tree Define in abstract terms, without terminals

< Lines > ::= ε | < Expression > < Lines >< Expression > ::= < Expression > - < Expression > | < Expression > + < Expression > | < Expression > * < Expression > | < Expression > / < Expression > | - < Expression > | < Number >

� / �27 48


Converting Concrete Trees to JavaCC code

▸ Map up each of the BNF rules to the respective JavaCC code- Production names (LHS) mapped to the respective production declaration- Expand each of the internal RHS rules to the production body

� / �28 48


Converting Abstract Trees to JavaCC code ▸ Start by creating a class hierarchy for the Category on the RHS of the BNF

- i.e., the make the superclass the AST root node ▸ For each sub-node, map to a subclass of the root node that will implement an evaluation

in the way it needs to…

▸ Implement each subclass with its appropriate implementation- Binary operators should take two objects in the constructor- Unary operators should take one objects in the constructors- Literals will take the token supplied at the creation of the AST and will use that token's

image for the creation of its node

▪ The token's image is just the literal string of the token input…

� / �29 48


� ▸ Map the classes together in JavaCC

Concrete Syntax<Lines> ::= ( <Expression> “\n” )*!!

<Expression> ::= <Term> ( (“+”| “-”) <Term> )*!!

<Term> ::= <Primary> ( (“x” | “/”) <Primary> )*!!

<Primary> ::= “(” <Expression> “)”! | “-” <Primary>! | <Number>

123

� / �30 48





123





123


Turing Machines and Effective Compatibility ▸ A Turing machine is an abstract computer software that solves problems

- Consists of a readwrite head that runs along a tape as it passes in and out the machine- The tape can run bidirectionally- The tape usually reads binary only, but can have any alphabet as desired- The machine is usually fed a table of instructions - It is NOT a Finite Deterministic Automata!!!

▸ Effective Computability - Church’s Thesis proves that it is not possible to build a machine that is more powerful

than a Turing machine—if an algorithm is expressible in Lambda Calculus then it is effectively computable

▸ If an algorithm can pass the turing machine, then it is “turing computable”▸ Halting Problem is an example of what is not computable

- “The Control-C problem” - How will the program eventually produce a result?- There is no way to write a program that will check if your program will eventually halt!

� / �31 48


Lambda Calculus λ▸ “The mother of all computer programming languages!”

- A language that uses operational and denotational semantics to express algorithms- A language that can denote mathematical proofs- A language that can assign and specify type systems- A medium to represent mathematics on computers with the aim to exchange and store

reliable mathematical knowledge- A mathematical formalism for expressing computation by functions- A pure functional programming language

▸ Lambda Expression Syntax:

▸ We say that lambda functions will consume as many arguments as it needs to- Any arguments left over will be left over for he next function

▸ Examples: - Identify Function

The λx introduces x as a new argument in inner scope, which is immediately returned

- Pair FunctionThe λx introduces x, λx introduces y; this is mapped into λz, that introduces z, where z is mapped to a pair expression made of x and y

� / �32 48


References vs. Declarations ▸ Declaration: λx.e

The occurrence of x in the λ abstraction introduces the variable x as a placeholderThink of it like a prolog variable; eventually this variable will be bound to a concrete value

▸ Reference: f x yHere, all 3 variables (f, x, y) appear as references whose meanings are defined in the enclosing declaration (i.e., are concretely defined by the context by which they are in)

▸ Binding & Declarations - The value which the variable names is known as its meaning or denotation- The denotation comes from some declaration▪ We say that the variable is bound by that declaration (or refers to the declaration)

- The declaration have a limited scope (area where the variable is applicable)- As such, we can have multiple, different variables with the same name

▸ Scoping Rules - Static Scoping

▪ Determine variable declaration and its scope by looking at program text alone▪ Think most normal languages—we put our finger on the paper and hand execute it!

- Dynamic Scoping

▪ Determine variable declaration and its scope at runtime▹ Look at the program trace to see what’s going on▹ No such thing as inner functions or “final/sealed” keyword

▸ Binding Rules - In λx.e, x is a declaration that binds all occurrences of that variable in e

Wherever we see an x in e, it was bound in λx.e

Look for the binding at λx.

� / �33 48


▸ Free Occurrence and Bound Occurrence - Free Occurrence ▪ We use a variable that has a does NOT have a previous binder

▪ A variable x occurs free in an expression e if and only if there is:▹ a use of x in e▹ x is not bound to any declaration in e

▪ E.g. λx.x yy is used but not bound anywhere in the expression

▪ Think of it like cout in C++; it’s freely used but not defined anywhere (in your scope) - Bound Occurrence

▪ We use a variable that has a previous binder▪ A variable x occurs free in an expression e if and only if there is:▹ a use of x in e▹ x is bound to a declaration in e

▪ E.g. λf.λx.fxboth f and x are being used, and they have been defined

� / �34 48


Scope, Blocks and Visibility ▸ Blocks

- We can have as many nested regions as we need- The scope of the variable can include inner regions that hide the variable - Where we use a variable of the same name, the innermost use of the variable hides the

outermost variable

▸ Contour Diagrams - Regions can also be expressed as a contour diagram

Currying ▸ A way in which we can model functions with multiple parameters as curried functions

- Non-curried to curry = curry- Curry on non-curried = uncurry

� / �35 48


Lexical Addresses ▸ A way in which we can refer to variables without using the name

- We use an [d, p] expression, where:

▪ d is the relative depth up in the region to use and

▪ p is the position of the variable

▪ Going to d=n means we move n region’s out (down and out)

▪ Going to p=s means we move s places right (after moving down and out)

Free and Bound Variables ▸ In λx.e, x is bound by the enclosing λ▸ A variable that is not bound by λ is free ▸ Expressions with

- no free variables are closed: λx.x- free variables are open: λy.x y

▸ Lambda expressions with no free variables are called combinators - Procedures when applied to all arguments (i.e., a procedure call) is a combinator

Variable fv(x) = {x} The free variables of x is the singleton x

Function fv(λx.e) = fv(e) \ {x}The free variables of λx.e is all of the free vars of e minus x; because x is bound by λ and is therefore NOT free!

Application fv(e1.e2) = fv(e1) ∪ fv(e2) The free variables of an application is just a union of both free variables in both exprs.

Variable bv(x) = ∅ There are no bound variables in x; empty set

Function bv(λx.e) = bv(e) ∪ {x}x is bound by λ; therefore the bound variables of a function λx.e is all of the bound variables of e union with that of x singleton

Application bv(e1.e2) = bv(e1) ∪ bv(e2) The free bound of an application is just a union of both bound variables in both exprs.

� / �36 48


Lambda Calculus Reduction Rules ▸ Describe transitional relations from one state to another

α-conversion ▸ Allows the renaming of bound variables

- A bound name x in λx.e may be renamed to y as long as there are no FREE occurrences of y in the expression e

- We say that y is “fresh” in e

β-reduction (parameter passing!) ▸ The computational engine for lambda calculus

- We reduce a term in order to “run or execute the term”

▸ Applicable in applications only- Evolution of applications

▸ Avoid name capture; parameter passing- It is the mathematical

representation of the stack!!- We swap e2 (the argument expression) with a concrete binding (e.g., x)

� / �37 48


η-reduction ▸ Allows removal of “redundant lambdas”▸ Whenever x does not occur free in f, then we can

rewrite:

Substitution Rules ▸ What parameter passing entails in Lambda Calculus▸ We define substitution carefully to avoid name capture:

- [e/x]x = eReplace x with e in x; results in e (x singleton)

- [e/x]y = yReplace x with e in y; results in y (y does not contain any bound x variables)

- [e/x](e1e2) = ([e/x]e1 [e/x]e2)Replace x with e in both expressions; leave as an application but distribute the x into each of the subexpressions

- [e/x](λx.e1) = (λx.e1)Replace x with e in the expression; introduces a new binder in x and therefore no changes are made

- [e/x](λy.e1) = (λy.[e/x]e1) if x ≠ y and y ∉ fv(e)Replace x with e in the expression; introduces a new binder y and therefore we can therefore replace y GIVEN y is not in the free variables of the expression e and not = x

- [e/x](λy.e1) = (λz.[e/x][z/y]e1) if x ≠ y and z ∉ (fv(e) ∪ fv(e1))

Replace x with e in the expression; introduces a new binder z and therefore we can therefore swap e for x with y GIVEN z is not in the free variables of the expression e and e1 and not = x

� / �38 48


Representing ADTs as Lambda Functions ▸ Booleans

▸ Pairs

▸ Church Numbers

SETS: See lecture notes…

� / �39 48

Booleans• We can represent Boolean values and operations that manipulate Boolean values as functions: ! TRUE ≡ λx.λy. x!

FALSE ≡ λx.λy. y!

not ≡ λb.((b FALSE) TRUE)!

if then else ≡ λb.λx.λy.((b x) y)!

• Example: ! if TRUE then 1 else 2 ≡ λb.λx.λy.((b x) y) TRUE 1 2!

= ((TRUE 1) 2)! = 1

181

Pairs• Although tuples are not supported by the lambda calculus, they can easily be modeled as higher-order functions that “wrap” pairs of values. N-tuples can be modeled by composing pairs:! pair ≡ λx.λy.λz.((z x) y)!

first ≡ λp.(p TRUE)!

second ≡ λp.(p FALSE) !

• Example: ! first (pair 1 2)! ➔ 1! second (pair 1 2)! ➔ 2

182

succ 0

1 = succ 0 = λn.λs.λz.(s (n s z)) λs.λz.z !

→ λs.λz.(s ((λs.λz.z) s z))!

→ λs.λz.(s ((λz.z) z))!

→ λs.λz.(s z)

184


Evaluation Order ▸ A lambda expression is in normal form if it can no longer be reduced by β- or η-reduction▸ We can keep down-reducing our expressions until they are no longer reducible▸ We say that normal form is the output of our program (we evaluate the program to

normal form) - though not all lambda expressions have a normal form, like the Ω expression:

- This shows that there exists non-normal-form expressions

▪ e.g., programs that cannot reduce—endless loop!

Strict Evaluation (Applicative Order Reduction) ▸ Evaluates the expression before control is passed over to the function▸ i.e., call by value; always evaluate/reduce

expressions, regardless of if it’s used later on- This means that a function that is

applied to a nontermintating expression will cause no output: f ⊥ = ⊥ A function f is applied to expression that is non-reductive; this means that the function will not reduce too

▸ This models call by value—i.e., evaluate the arguments, then invoke the function ▸ This means:

- in the application (e1 e2):

▪ reduce e2 (args) first using applicative order reduction, then

▪ reduce e1 (func) second using applicative order reduction, then▪ if e1 is a function, then apply beta reduction and reduce the result to normal form

▪ else return a new application consisting of reduced e1 and e2 � / �40 48


▸ Using fully parenthesised notation, always perform the RIGHTMOST and INNERMOST beta reduction

▸ That is, work from top right to bottom left—repeatedly scan for rightmost innermost (left parenthesis) occurrence of ((λx.e1) e2) terms

Lazy Evaluation (Normal Order Reduction) ▸ Evaluates the expression only when they are needed ▸ This means we keep on passing along the unevaluated expression until it gets to a point

where we need to evaluate the expression to move onto the next reduction- This means that a function that is applied to a nontermintating expression will still be

reducible: f ⊥ ≠ ⊥ A function f is applied to expression that is non-reductive; this means that the function will still be able to be reduced as we have never evaluated the non-reductive expression as it is not needed.

� / �41 48


Confluence ▸ If an expression has a normal form, then it doesn't matter

how you get to that normal form- There can be multiple ways of evaluating to a normal

form- 2 programs are the same if they both evaluate to ∃T

(i.e., normalised)

▪ Can go from M ➝ P ➝ ∃T orCan go from M ➝ Q ➝ ∃T

▪ You still end up at a normalised ∃T

▸ Evaluation order does not matter in lambda calculus- That said applicative-order reduction may not terminate even if a normal form exists- Consider the Ω expression = ( λ x.y ) (( λ.x x x ) ( λ x.x x))

▪ Applicative-order reduction does not reduce this to a normal form since it attempts to reduce itself, which leads to a reduction into itself (i.e., self replicating)

▪ Normal-order reduction will not ever evaluate Ω over again since it doesn't need to

Recursion ▸ Recession is fundamental to programming, but it is not a primitive▸ We need to find a way in which we can “program” recursion▸ Consider:

plus = λn m.(if n then (plus (dec n)(inc m)) else m)

- Here we are trying to define plus in terms of itself- We can’t do that because we haven’t defined what plus actually is yet

▸ Instead, we define a closed expression (i.e., one with no free variables—without the recursive plus) by abstracting over plus- Redefine the function plus into rplus, where:

rplus = λplus n m.(if n then (plus (dec n)(inc m)) else m)

- Now, we can redefine the actual addition function we want as fplus - We pass rplus as a parameter before we perform any operations:

(rplus fplus) - Hence:

rplus fplus ⟷ fplus

- This means that plus is the fixed point of rplus - Fixed points are functions to an argument that just return the argument

� / �42 48


Type Systems▸ Type systems ensures that there will be:

- documentation in our code- prevention of runtime errors if enforced- ensure the expected result is met

▸ We have a whole bunch of types at our disposal

< TYPE > ::= int primitive integer | char primitive char | string primitive string | < IDENTIFIER > type variable | ( < TYPE > ) lists | ( { < TYPE > }*(✕)) tuples | ( { < TYPE > }*(✕) -> < TYPE >) functions< TYPED EXPRESSION > ::= ( < EXPRESSION > < TYPE > )

▸ Note that the superscript (✕) refers to a separation of types by ✕ (i.e., cross factor)

Function Types ( { < TYPE > }*(✕) -> < TYPE >) ▸ Deduces the types of expressions without the need to evaluate them

List Types ( { < TYPE > } ) ▸ Lists ::= ( { < EXPRESSION > }* )▸ A list of type a has the type "(a)"

((1 2 3 4 5)(int)) valid int list(("Hello" "World")(string)) valid string list(("Hello" 123 'a')(??)) invalid list (3 different types)

Tuples ( { < TYPE > }*(✕) ) ▸ Tuples ::= ( { < EXPRESSION > }* )▸ A tuple must have matching type

specifications- A tuple (x1, x2, x3… xn) has a type (a1 ✕ a2 ✕ a3 … an)

� / �43 48


▸ Static Typing - Possible to determine the static type of an object from the programming text alone- WYSIWYG approach—runtime is the same as definition- Type check at compile time

▸ Dynamic Typing - Types are determined at runtime- Type checking before variable is used- Type changes or determined at runtime

▸ Type Consistency - Strongly typed languages ensure every expression is type-consistent from program text- Type consistency is ensured using

▪ compile-time type-checks

▪ type interference▪ dynamic type-checks

▸ Kinds of types - Primitive: bool, int, char, float- Composite: functions, lists, tuples- User-Defined: enumerations, recursive types, generic types

Lambda Calculus and Types ▸ Three type checking mechanisms:

- Type Assignment

▪ Just like any assignment▪ We bind an expression, M, to a type, τ

M : τ

- Type Contexts/Assumption/Environment

▪ A type environment, Γ, is a partial function

Γ : V → V

▪ It defines a 'hash-map' of possibly empty set of variables that are bound to types

▪ That way, we can lookup the variable name and see the type definition

Γ = { int32: int, string: char* }

▪ Hence, now we can say that:Γ(int32) = int, Γ(string) = char* i.e., in our type environment int32 maps to int etc.

� / �44 48


- Type Judgement

▪ The triple (Γ, M, τ) specify a type judgement▪ It is a formal specification of the type environment in our program

Γ ⊢ M : τunder type gamma, there is an expression M which is assigned type tau

{ int32 : int, string : char* } ⊢ string : char*

▸ Subject Reduction - Reduction of lambda calculus preserves the type- If a program is assigned the type τ and M is reduced to normal form N, N has type τ:

If Γ ⊢ M : τ and M →β N, then Γ ⊢ N : τ

- If this is the case, then there is no need for runtime type checks!▸ Type Safety

- Does the reduction of M lead to stages that are type-safe?- We can ensure this if the term M has the type τ—then it is safe- This helps us reduce mismatch of the:▪ declared type of a variable and

▪ the application type when that variable is used

▪ i.e., int foo = 12; foo + "hi";▸ Typed Lambda Calculus

� / �45 48


▸ Type Lambda Calculus Reduction Rules - Simply the same; with preservation of types

▸ Type Lambda Calculus Type Rules

ɑAlpha Conversion

βBeta Reduction

η Eta Reduction

�

e ::= xτVariable

e ::= (λxσ.eτ)(σ→τ)

Function

e ::= e1(σ→τ) e2σ

Application

� / �46 48


- Polymorphism and Type Systems

▪ Monomorphic type system ensure a definition of generic abstraction remains hidden

▪ Polymorphic type expressions describe families of types

- We can deduce the types of expressions using polymorphic type functions - Polymorphism types:

▪ Parametric: NSObject in ObjC, object in C#, void* in C

▪ Inclusion: sub-typing ▪ Overloading: Operator + applied to both integers and floating point

▪ Coercion: Integer values being used where floating points are expected

Type Inference

▸ The set of untyped lambda-terms are defined as

� ▸ Whereas the set of typed lambda-terms are defined as

� ▸ This means that et being the set of typed lambda-terms is obtained by modifying the

second clause in the definition of the untyped lambda terms

�

Typed Lambda-Terms• The set e of untyped lambda-terms are defined as follows:! e ::= x! | (λx.e)!

| (e1 e2)! • The set T of types is defined as follows:! t ::= K - basic types! | (t1 → t2) - function types!

• The set eT of typed lambda-terms is obtained by modifying the second clause in the definition of the untyped lambda-terms:! (λx : t.e)

246




246




246

� / �47 48

Polymorphic Types• We can deduce the types of expressions using polymorphic functions by simply binding type variables to concrete types:!Consider:!

(length ((a) -> int))!

(stringlength (string -> int))!

(map ((a -> b) -> (a) -> (b)))!

Then:!((map stringlength) ((string) -> (int)))!((“Hello” “World” “!”) (string))!((map stringlength (“Hello” “World” “!”)) (int))

239


▸ Type inference for closed terms can be stated as:

Given a closed lambda-term e, find all types t such that (∅, e, t).

▸ Wand’s algorithm

� Action Table:

�

Wand’s Algorithm - SkeletonInput: !

• A lambda-term e0.!

Initialization:!• Set E = ∅ and G = {(Γ0, e0, t0)}, where t0 is a type variable and Γ0 maps the free variables of e0 to other distinct type variables.!

Loop Step:!• If G = ∅, then halt and return E. Otherwise, choose a subgoal (Γ, e, t) from G, delete it from G, and add to E and G new verification conditions and subgoals, as specified in an action table.!

End of Skeleton

249

The Action Table(Γ, x, t):!

Generate the equation t = Γ(x).!

(Γ, (λx . e ), t):!Let τ1 and τ2 be fresh type variables. Generate the equation t = (τ1 → τ2) and the subgoal (Γ; x : τ1, e, τ2).!

(Γ, (e1 e2), t):!Let τ1 be a fresh type variable. Then generate the subgoals (Γ, e1, τ1 → t) and (Γ, e2, τ1).

250

� / �48 48

i. programming domains and paradigms in software developme… · represents meaning of language...

Documents