
FUNCTIONAL AND LOGIC PROGRAMS

Contents

Language Specific Compilation

• Context Handling

• Identification

• Scope

• Overloading

• Imported Scope

• Type Checking

• Type table

• Type equivalence

• Coercions

• Casts and conversions

Object Oriented Language Issues

Routines and Activation

Code Generation and Control Flow

Contents

Functional Programming Introduction

Basic Compilation

Polymorphic type checking

Compiling to register oriented architectures

JavaCC

Language Specific Compilation

The concept of a compiler is similar for all languages, but its implementation differs from paradigm to paradigm (and from language to language).

This is due to the syntactic and semantic differences between the languages.

The main difference between paradigms lies in code generation : compilers for OO and structured languages usually generate assembler or other low-level code, whereas many compilers for functional, parallel and distributed languages generate C or C++ code.

Context Handling

Context handling means, in short, semantic analysis.

It is concerned with type checking : it relates the type of a variable in its declaration to the way the variable is used.

It is also concerned with identification.

Context Handling consists of two parts : Identification and Type Checking

Identification

Identification is the process of finding the defining occurrence of a given applied occurrence of an identifier or operator.

The defining occurrence of an identifier is the place where it is declared. It supplies the information about the identifier : its initial value, whether it is a constant or a variable, a module or a class etc.

The applied occurrences of an identifier are the consumers of that information, i.e. its uses.

Some languages like LISP or Prolog have no declarations of identifiers, i.e. no defining occurrences.

The identification process records the occurrences of an identifier in the symbol table.

Identification

int month;                          // 1
...
month = 1;                          // 2
do {
    cout << month_name[month];      // 3
} while (++month <= 12);            // 4

In the code above, 1 is the defining occurrence and 2, 3 and 4 are applied occurrences of the identifier month.

Identification

Some languages, like ‘C’, provide forward declarations, so an identifier can have more than one defining occurrence.

extern int x;                       // 1

int get( )
{
    return x;                       // 2
}

int x = 5;                          // 3

int main()
{
    int x = 3;                      // 4
    cout << "Value of x = " << get();
}

In the code above, 1, 3 and 4 are defining occurrences and 2 is an applied occurrence of the identifier x. Occurrence 1 is the forward declaration of the global x that is fully defined at 3. Occurrence 4 declares a separate local x inside main, so get() still returns the global value 5.

Identification

Identifiers are looked up in the symbol table according to the namespace they belong to.

The general namespace is the one in which identifiers like variable names, structure names and function names are stored.

A special namespace is used for field selectors like struct members, class members etc.

Identification

struct one_int

{

int i;

} i;

.. ..

i.i = 3;

The first ‘i’, before the dot, is looked up in the general namespace, whereas the second is looked up in the special namespace.

Identification

Labels, Module names can also live in special namespaces

Namespaces depend upon the syntax of the various languages

In ‘C’, apart from the member namespace of each struct or union, there are three namespaces : one for the tags of enums, structs and unions, a second for labels, and the last containing the ordinary identifiers (variable names, function names, type names)

Identification ….Scope

Some namespaces are scope- structured

Scopes are arranged in the form of stack. For each scope there is an entry in the stack.

Rules for creating stack of scopes

• Upon scope entry, a new empty scope element is pushed onto the stack.

• Declared identifiers are entered in the top scope element

• For each applied occurrence, search scope elements from top to bottom

• Upon scope exit, remove the top scope element and all its declarations.
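The four rules above can be sketched in C++ as a stack of hash maps. This is only a minimal illustration under simplifying assumptions: the names ScopeStack, enter, leave, declare and lookup are invented for this sketch, and a declaration is reduced to a descriptive string.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// One scope element: maps an identifier name to a description of its declaration.
using Scope = std::unordered_map<std::string, std::string>;

class ScopeStack {
public:
    // Scope entry: push a new, empty scope element.
    void enter() { scopes_.emplace_back(); }

    // Scope exit: remove the top element together with all its declarations.
    void leave() {
        assert(!scopes_.empty());
        scopes_.pop_back();
    }

    // Defining occurrence: enter the identifier in the top scope element.
    void declare(const std::string& name, const std::string& decl) {
        assert(!scopes_.empty());
        scopes_.back()[name] = decl;
    }

    // Applied occurrence: search the scope elements from top to bottom.
    // Returns nullptr when no defining occurrence is visible.
    const std::string* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<Scope> scopes_;
};
```

With two nested scopes that both declare x, lookup("x") finds the inner declaration; after leave() it finds the outer one again.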

A scoped hash-based symbol table

Name of identifier

Pointer to declaration

Other Information

Properties of the identifier

Level at which identifier is present

Pointer to the next node

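A scoped hash-based symbol table : example

aap( int noot)
{ int mies, aap;
....
}

[Figure: a hash table with buckets 0 to 3 holding the entries ”mies”, ”aap” and ”noot”, each with a pointer to its declaration (decl). The scope stack has levels 0, 1 and 2: ”aap” has declarations at levels 0 and 2, ”noot” at level 1 and ”mies” at level 2.]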

Identification ….Overloading

The ambiguity caused by overloading is resolved by considering the context in which the name to be identified occurs.

An overloaded identifier comes with a set of definitions. The candidates are found through the pointer-to-declaration field of its symbol table entry, and the context selects among them.

Identification ….Imported Scopes

C++ scope resolution operator x::

Modula FROM module IMPORT ...

Solution: stack (or merge) the new scope

Type checking

Operators and functions impose restrictions on the types of the arguments

Types

• basic types

• structured types

• type names

Some languages support forward references, i.e. referring to an identifier that has not been declared yet

Type information in a compiler must be represented in such a way that all these checks can be performed conveniently

Forward Referencing Example

class A;
class B {
    A a;    // forward reference
    ...
}
class A {
    ...
}

Type checking

To resolve forward references, an identifier that is used before its declaration is added to the symbol table and marked as a forward reference

When the type declaration for this forward reference is met, its symbol table entry is modified to represent the actual type instead of the forward reference

A check must be added for loose ends, i.e. forward references that were never resolved

A check must also be made for circularity, e.g.

TYPE x = y;
TYPE y = x;

Type checking ... Type table

All Type information for each type is stored in a type table

The entry contains following :

• Its type constructor (‘basic’, ‘record’, ‘array’ , ‘pointer’ , and others);

• The size and alignment requirement of a variable of the type

• The types of components

Depending on the type constructor, further information is recorded:

• For basic types : its precise type(integer, real etc.)

• For record type : The list of record fields, their names and types

• For Array : the number of dimension, index type(s) and element type

• For pointer : the referenced type

Type table Example

TYPE a = b;

TYPE b = Pointer to a;

TYPE c = d;

TYPE d = c;

Type table Example

For TYPE a = b;

TYPE TABLE                    SYMBOL TABLE
TYPE 0 : INTEGER              “integer” : TYPE 0
TYPE 1 : ID_REF “b”           “a” : TYPE 1
                              “b” : UNDEFINED TYPE

Type table Example

For TYPE a = b; TYPE b = Pointer to a;

TYPE TABLE                    SYMBOL TABLE
TYPE 1 : ID_REF “b”           “a” : TYPE 1
TYPE 2 : ID_REF “a”           “b” : TYPE 3
TYPE 3 : Pointer to TYPE 2

Type table Example

After all four declarations we have,

TYPE TABLE                    SYMBOL TABLE
TYPE 1 : ID_REF “b”           “a” : TYPE 1
TYPE 2 : ID_REF “a”           “b” : TYPE 3
TYPE 3 : Pointer to TYPE 2    “c” : TYPE 4
TYPE 4 : ID_REF “d”           “d” : TYPE 5
TYPE 5 : ID_REF “c”

Type table Example

After the identifier references have been resolved, we have,

TYPE TABLE

TYPE 0 : INTEGER

TYPE 1 : TYPE 3

TYPE 2 : TYPE 1

TYPE 3 : Pointer to TYPE 2

TYPE 4 : TYPE 5

TYPE 5 : TYPE 4
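One possible way to run the circularity check on this resolved type table is sketched below in C++. The names Kind, TypeEntry, isCircular and exampleTable are assumptions made for the sketch; only the table contents come from the example. An entry of the form TYPE n : TYPE m is treated as a pure alias, and a type is reported as circular when following aliases never reaches a real type constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { Basic, Alias, Pointer };

struct TypeEntry {
    Kind kind;
    std::size_t target;  // index of the referenced type; unused for Basic
};

// Returns true when following alias entries from `start` runs in a circle
// without ever reaching a real type constructor (basic, pointer, ...).
bool isCircular(const std::vector<TypeEntry>& table, std::size_t start) {
    assert(start < table.size());
    std::size_t current = start;
    // A chain of pure aliases that is longer than the table must repeat an entry.
    for (std::size_t steps = 0; steps <= table.size(); ++steps) {
        if (table[current].kind != Kind::Alias) return false;
        current = table[current].target;
        assert(current < table.size());
    }
    return true;
}

// The resolved type table of the example above.
std::vector<TypeEntry> exampleTable() {
    return {
        {Kind::Basic, 0},    // TYPE 0 : INTEGER
        {Kind::Alias, 3},    // TYPE 1 : TYPE 3
        {Kind::Alias, 1},    // TYPE 2 : TYPE 1
        {Kind::Pointer, 2},  // TYPE 3 : Pointer to TYPE 2
        {Kind::Alias, 5},    // TYPE 4 : TYPE 5
        {Kind::Alias, 4},    // TYPE 5 : TYPE 4
    };
}
```

Under these assumptions TYPE 1 and TYPE 2 (for a and b) are accepted, because their alias chains end at the pointer constructor of TYPE 3, while TYPE 4 and TYPE 5 (for c and d) only refer to each other and are reported as circular.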

Type checking ... Type Equivalence

Whenever an expression is type checked, or the actual parameters of a routine call are matched against the formal parameters, two types must be checked for equality.

When two types are equivalent, values of these types usually have the same representation

• name equivalence [all types get a unique name]

VAR a : ARRAY [Integer 1..10] OF Real;
VAR b : ARRAY [Integer 1..10] OF Real;

Under name equivalence a and b have different types, because each of the two anonymous array types gets its own unique name.

Type checking ... Type Equivalence

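• structural equivalence [difficult to check]

TYPE c = RECORD
    i : Integer;
    p : POINTER TO c;
END RECORD;

TYPE d = RECORD
    i : Integer;
    p : POINTER TO RECORD
        i : Integer;
        p : POINTER TO c;
    END RECORD;
END RECORD;

The types c and d are structurally equivalent : unfolding the recursive reference to c gives the same structure for both.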

Type Checking : Coercions

• implicit data and type conversion to match operand (argument) type

• coercions complicate identification (ambiguity)

• two phase approach

– expand a type to a set by applying coercions

– reduce type sets based on constraints imposed by (overloaded) operators and language semantics

VAR a : Real;
...
a := 5;        (the integer 5 is coerced to Real)
3.14 + 7       (7 is coerced to Real; real addition)
8 + 9          (no coercion; integer addition)

Variable: value or location? ...Kind checking

• two usages of variables

rvalue: value

lvalue: location

• insert coercion to dereference variable

• checking rules:

                    expected
found         lvalue                        rvalue
lvalue        -                             deref
rvalue        ERROR: l-value required       -

VAR p : Real;
VAR q : Real;
...
p := q;

The assignment is represented as the tree := ( (location of) p , deref ( (location of) q ) )
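The checking rules can be written down as a small function. The sketch below is in C++ and uses names invented for illustration (ValueKind, checkKind); the three result strings simply restate the three outcomes of the rules.

```cpp
#include <cassert>
#include <string>

enum class ValueKind { LValue, RValue };

// Compares the kind that was found with the kind the context expects and
// reports what the compiler has to do.
std::string checkKind(ValueKind found, ValueKind expected) {
    if (found == expected) return "ok";                      // nothing to do
    if (found == ValueKind::LValue) return "insert deref";   // location found, value expected
    return "error: l-value required";                        // value found, location expected
}
```

For p := q the left operand p is found as an lvalue where an lvalue is expected, and the right operand q is found as an lvalue where an rvalue is expected, so a deref is inserted above q.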

Object Oriented Source language Issues

Basic types of data

Enumeration : can be copied, compared, incremented and decremented

Pointer : Typed, Generic

Structure and Union

Array

Object : Inheritance, Polymorphism, Dynamic Binding, Method Overriding, Multiple Inheritance

Object Oriented Source language Issues : Pointer

The run-time representation of a pointer is typically an unsigned integer holding a memory address

Operations include copy, assignment, comparison, increment, decrement and dereferencing

Dereferencing means obtaining the value of the data structure that the pointer refers to, e.g. ptr->data or (*ptr).data.

Pointers are usually of two types:

• Typed : a pointer to a specific data type, e.g. int *p;

• Generic : one that can be coerced to any other pointer type, e.g. void *p;

Object Oriented Source language Issues : Pointer Issues

No.   Issue                                               Solution
1     Pointer never initialized                           Automatic initialization
2     Null pointer : a pointer that refers to no          Dereferencing of null pointers must be checked
      location
3     Dangling pointer : arises when a pointer refers     Use a garbage collector
      to a location and that location is freed

Object Oriented Source language Issues : Pointer Issues

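A pointer must not outlive the location into which it points : its useful scope is that of the location.

int main()
{
    int x, *ptr, a;
    x = 5; ptr = &x; a = x;
    cout << *ptr; cout << a;
    {
        int y = 20;
        ptr = &y; a = y;
        cout << *ptr; cout << a;
    }
    cout << *ptr;    // ptr is dangling here : y no longer exists
    cout << a;
}

o/P : 5 5 20 20 (undefined) 20

The fifth value is undefined because ptr dangles once the inner block has ended (many implementations happen to print 20). The last value is 20, not 5, because a was assigned the value of y inside the block.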

Object Oriented Source language Issues : Structure

struct emp
{
    int empid;        // requires 4 bytes
    double salary;    // requires 8 bytes
};

• We might expect the memory arrangement of the structure to be as follows :

empid (4 bytes) | salary (8 bytes)        total : 12 bytes

Object Oriented Source language Issues : Structure

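In practice, a gap must be inserted so that every structure member starts at an address that is a multiple of its alignment requirement

The alignment requirement of the structure as a whole is the Least Common Multiple (LCM) of the alignment requirements of its data members; here that is 8, so salary must start at offset 8.

Thus the actual representation, on a typical implementation, is as follows :

empid (4 bytes) | gap (4 bytes) | salary (8 bytes)

The size now is 16 bytes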
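The gap calculation can be sketched as follows in C++. The names Member, Layout, alignUp and layOut are made up for this sketch, and the sizes and alignments are passed in explicitly (4 and 8 bytes for the example) rather than taken from a particular machine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Member {
    std::size_t size;       // size of the member in bytes
    std::size_t alignment;  // required alignment in bytes (assumed a power of two)
};

struct Layout {
    std::vector<std::size_t> offsets;  // offset of each member in the structure
    std::size_t size;                  // total size, including all gaps
};

// Rounds `value` up to the next multiple of `alignment`.
std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Places the members in declaration order, inserting gaps where needed.
Layout layOut(const std::vector<Member>& members) {
    Layout result{{}, 0};
    std::size_t maxAlignment = 1;  // equals the LCM for power-of-two alignments
    for (const Member& m : members) {
        result.size = alignUp(result.size, m.alignment);  // gap before the member
        result.offsets.push_back(result.size);
        result.size += m.size;
        if (m.alignment > maxAlignment) maxAlignment = m.alignment;
    }
    result.size = alignUp(result.size, maxAlignment);     // gap at the end
    return result;
}
```

For the members {4, 4} and {8, 8} of struct emp this gives offsets 0 and 8 and a total size of 16 bytes. Declaring salary before empid gives offsets 0 and 8 as well, with the 4-byte gap moved to the end.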

Object Oriented Source language Issues : Arrays

There’s no disagreement about how plain arrays are stored in memory

Any programming language that supports a plain (not associative) array type just stores the elements sequentially.

Once a language supports multidimensional arrays, it needs to decide how to squeeze the 2D arrangement of data into a 1D arrangement in memory, typically a 1D array. One classical use case for multidimensional arrays is matrices.

Array copying can be done element by element or by a block copy. Comparison can be carried out similarly.

Arrays can be static as well as dynamic

Object Oriented Source language Issues : Arrays

• Given a matrix, there are two “canonical” ways to store it in memory :

i. Row-major

ii. Column - major

Object Oriented Source language Issues : Arrays : Row Major

Storage traverses the matrix by rows then within each row enumerates the columns.

A 2 × 3 matrix A with elements a11 ... a23 would be stored in memory as a11, a12, a13, a21, a22, a23

The position of the element at row i, column j (both counted from 0) in the underlying 1D array is computed as : i*stride + j,

• where, stride is the number of elements stored per row, usually the width of the 2D array, but it can also be larger.

Object Oriented Source language Issues : Arrays : Column Major

Storage traverses the matrix by columns, then enumerates the rows within each column.

The same 2 × 3 matrix A would be stored in memory as a11, a21, a12, a22, a13, a23

The position of the element at row i, column j (both counted from 0) in the underlying 1D array is computed as : j*stride + i,

• where, stride is the number of elements stored per column, usually the height of the 2D array (its number of rows), but it can also be larger.
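Both index formulas can be checked with a few lines of C++. The function names rowMajorIndex and columnMajorIndex are chosen for this sketch; rows and columns are counted from 0, and the stride is passed in by the caller.

```cpp
#include <cassert>
#include <cstddef>

// Position of element (row i, column j) when rows are stored one after another.
// `stride` is the number of elements stored per row.
std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t stride) {
    assert(j < stride);
    return i * stride + j;
}

// Position of element (row i, column j) when columns are stored one after another.
// `stride` is the number of elements stored per column.
std::size_t columnMajorIndex(std::size_t i, std::size_t j, std::size_t stride) {
    assert(i < stride);
    return j * stride + i;
}
```

For a 2 × 3 matrix the row-major stride is 3 and the column-major stride is 2. Element a21 (i = 1, j = 0) then sits at position 3 in row-major order and at position 1 in column-major order, which matches the two storage sequences listed above.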


Routines and Activations : Activation record

• When a function / subroutine is called, it will create an activation record.

• This record contains the local variables, the return address etc.


Activation Record - Example : for the call chain foo()->bar()->baz(), determine the return addresses

int bar();
int baz(int b);

int foo()
{
    int b;
    b = bar();
    return b;
}

int bar()
{
    int b = 0;
    b = baz(b);
    return b;
}

int baz(int b)
{
    if (b < 1)
        return baz(b + 1);
    else
        return b;
}


An example – bar()


Activation Record

• temporaries: used in expression evaluation

• local data: field for local data

• saved machine status: holds info about machine status before procedure call

• access link : to access non local data

• control link : points to activation record of caller

• actual parameters: field to hold actual parameters

• returned value: field for holding value to be returned

Layout of the record, in the order shown in the figure :

Temporaries
Local data
Machine status
Access link
Control link
Parameters
Return value

Routines

Kinds of routines : Classical, Iterator, Co-routines

Routines : Iterator

An iterator is a routine that can suspend itself temporarily and return to its parent without losing its activation record

After suspension it can be resumed at the point where it left off. C# supports this with the keyword yield.
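Without language support such as yield, the suspend and resume behaviour can be imitated by hand. In the C++ sketch below, an object keeps the state that would otherwise live in the suspended activation record. The class name CountUp and its member resume are invented for this illustration; it is not how a compiler implements yield.

```cpp
#include <cassert>
#include <optional>

// Imitates an iterator routine that yields first, first+1, ..., last.
// The data members play the role of the activation record that survives
// between two resumptions.
class CountUp {
public:
    CountUp(int first, int last) : next_(first), last_(last) {}

    // Resume the iterator: returns the next value, or nothing when it is finished.
    std::optional<int> resume() {
        if (next_ > last_) return std::nullopt;
        return next_++;  // "suspend" after handing the value to the caller
    }

private:
    int next_;  // where the iterator left off
    int last_;
};
```

Each call to resume() continues from the point where the previous call stopped, because next_ is kept in the object and not on the call stack.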

Routines : Coroutines

Like an iterator, a co-routine can suspend itself temporarily, but control does not return to its parent; it goes to another co-routine.

This transfer is called resume.

Co-routines were offered by Simula

Standard ‘C’ has no built-in resume( ); co-routines there need library or hand-written support

Operations on Routines

Define

Call

Return

Pass

Lecture 18

INTRODUCTION TO FUNCTIONAL PROGRAMMING

Advantages :

• Have uniform view of programs

• Treat functions as data

• Automatic memory management

• Greatly flexible and has simple semantics

Disadvantage :

• Implementations are often interpreted, which results in a substantial loss in execution speed.


KEY PROPERTIES OF FUNCTIONAL PROGRAMMING LANGUAGES

Lazy function evaluation

First – class objects

All programs and procedures are functions

Lack of variable and assignment

Lack of loop and iteration

Referential transparency

Dynamic memory environment

Garbage collection

Side – effect freedom


KEY FEATURES OF FUNCTIONAL PROGRAMMING LANGUAGE

1. Lazy function evaluation :

• Unnecessary function evaluation is avoided : an expression is evaluated only when its value is needed

2. First – class objects : Functions are treated as values that can be passed, returned and stored like any other object

3. All programs and procedures are functions :

• Due to this feature the programs are considered as data and can be changed at runtime. This distinguishes ...

4. Lack of variables and assignment

5. Lack of loops and iteration : replaced by recursive calls.


KEY FEATURES OF FUNCTIONAL PROGRAMMING LANGUAGE

6. Referential Transparency (Side Effect Free) : property of a function whereby its value depends only upon its parameters, and not upon previous computations.

7.Dynamic Memory Management : Done automatically

• Maintaining free space

• Reclamation of storage

8.Garbage collection :

• Methods to collect and return unreferenced storage

Factorial in Haskell vs. C

fac 0 = 1
fac n = n * fac (n - 1)

int fac(int n)
{
    int product = 1;
    while (n > 0) {
        product *= n;
        n--;
    }
    return product;
}

Offside rule

• Layout characters matter to parsing

divide x 0 = inf
divide x y = x / y

• Everything below and to the right of = in an equation defines a new scope

• Applied recursively

fac n = if (n == 0) then 1
        else prod n (n-1)
        where
            prod acc n = if (n == 0) then acc
                         else prod (acc * n) (n - 1)

• Lexical analyzer maintains a stack

Lists

• Part of all functional languages since Lisp

• Empty list [] = Nil

• [1]

• [1, 2, 3, 4]

• [4, 3, 7, 7, 1]

• [“red”, “yellow”, “green”]

• [1 .. 10] => arithmetic sequence

• Can be constructed using “:” infix operator

– [1, 2, 3] is equivalent to (1 : (2 : ( 3 : [])))

– range n m = if n > m then [ ] else ( n: range (n+1) m)

Constructs arithmetic sequence [n…m] dynamically
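To make the cell structure behind the “:” operator concrete, here is a rough C++ counterpart. It is only an analogy to the Haskell list: the names Cell, List, cons, range and toVector are chosen for this sketch, elements are restricted to int, and a null pointer stands for the empty list [ ].

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Cell;
using List = std::shared_ptr<const Cell>;  // nullptr plays the role of []

// One cons cell: a head element and the rest of the list.
struct Cell {
    int head;
    List tail;
};

// The ":" operator: puts x in front of an existing list.
List cons(int x, List xs) {
    return std::make_shared<Cell>(Cell{x, xs});
}

// range n m = if n > m then [] else (n : range (n+1) m)
List range(int n, int m) {
    if (n > m) return nullptr;
    return cons(n, range(n + 1, m));
}

// Copies the elements into a vector so that they are easy to inspect.
std::vector<int> toVector(const List& xs) {
    std::vector<int> out;
    for (const Cell* c = xs.get(); c != nullptr; c = c->tail.get())
        out.push_back(c->head);
    return out;
}
```

Here cons(1, cons(2, cons(3, nullptr))) builds the same three cells as (1 : (2 : (3 : []))), and range(1, 4) builds the cells of [1 .. 4] one recursive call at a time.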

List Comprehension

• Inspired by set comprehension

S = {n² | n ∈ {1, …, 100} ∧ odd n}

• Haskell code

s = [n^2 | n <- [1..100], odd n]

“n square such that n is an element of [1..100] and n is odd”

• Qsort in Haskell

qsort [] = []
qsort (x: xs) = qsort [y | y <- xs, y < x]
                ++ [x]
                ++ qsort [y | y <- xs, y >= x]
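Read operationally, a comprehension is a generator, a filter and a result expression. The C++ sketch below spells the two examples out as explicit loops. The function names oddSquares and quickSort and the use of std::vector<int> are choices made for this illustration; it is not what a Haskell compiler generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// [n^2 | n <- [lo..hi], odd n]
std::vector<int> oddSquares(int lo, int hi) {
    std::vector<int> result;
    for (int n = lo; n <= hi; ++n)      // generator: n <- [lo..hi]
        if (n % 2 != 0)                 // filter: odd n
            result.push_back(n * n);    // result expression: n^2
    return result;
}

// qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]
std::vector<int> quickSort(const std::vector<int>& list) {
    if (list.empty()) return {};        // qsort [] = []
    int x = list.front();
    std::vector<int> smaller, larger;
    for (std::size_t i = 1; i < list.size(); ++i)   // y <- xs
        (list[i] < x ? smaller : larger).push_back(list[i]);
    std::vector<int> result = quickSort(smaller);
    result.push_back(x);
    std::vector<int> rest = quickSort(larger);
    result.insert(result.end(), rest.begin(), rest.end());
    return result;
}
```

The Haskell version states the same thing in three lines, which is the point of the notation.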

Pattern Matching

• Convenient way to define recursive functions

• A simple example

fac 0 = 1
fac n = n * fac (n-1)

• Equivalent code

fac n = if (n == 0) then 1 else n * fac (n - 1)

• Another example

length [ ] = 0
length (x: xs) = 1 + length xs

• Equivalent code

length list = if (list == []) then 0
              else let x  = head list
                       xs = tail list
                   in 1 + length xs

Polymorphic Typing

• Basic types like Int and Char are said to be monomorphic types

• A polymorphic type means that an expression can have many types.

• E.g. the empty list [ ] can have many types : list of characters, list of numbers etc.

• Benefits:

– Code reuse

– Guarantee consistency

Polymorphic Typing

• The compiler infers that in

length [ ] = 0
length (x: xs) = 1 + length xs

– length has the type [a] -> Int

length :: [a] -> Int

• Example expressions

– length [1, 2, 3] + length [“red”, “yellow”, “green”]

– length [1, 2, “green” ] -- invalid list

• The user can optionally declare types

• The compiler infers the most general type for every expression

Structure of a functional compiler

1. High-level language

2. Polymorphic type checking and de-sugaring (1. Pattern matching 2. List to pairs 3. List comprehension 4. Lambda lifting) translate it to the functional core

3. Optimizations are applied to the functional core

4. Code generation produces C code, which is combined with the runtime system

Compiling Functional Languages

• The table below shows which compiler phase handles which aspect of Haskell:

Compiler Phase        Language Aspect
Lexical Analyzer      Offside rule
Parser                List notation, list comprehension, pattern matching
Context Handling      Polymorphic type checking
Run Time System       Referential transparency, higher order functions, lazy evaluation

Polymorphic Type Checking

• These are the rules that ML uses to infer the type correctness of polymorphic code :

1. All occurrences of the same identifier in a given scope must have the same type.

2. In an if/then/else expression, the condition must be of type bool, and the then and else clauses must be of the same type.

3. A programmer-defined function has type 'a -> 'b where 'a is the type of the function’s parameter and 'b is the type of its result. (Functions have tuple arguments.)

4. When a function is applied, the type of the argument passed to the function must be the same as the parameter type in the function’s definition and the type of the application is the same as the type of the result in the function’s definition.

• The Hindley-Milner algorithm traverses a program’s abstract syntax tree (AST), assigning type variables at each node. It then applies rules corresponding to syntactic constructs to attempt to resolve the assigned type variables against specific type information deduced from program contents and context.


• For example, consider checking the following Standard ML-ish function, which returns a string depending on the sign of its integer argument:

fun sign 0 = "zero" |
    sign n = if (> n 0) then "positive" else "negative"

This function has type int -> string, i.e. it takes an integer argument and returns a string result


• Prior to type checking, it is assumed that:

0 : int i.e. 0 is an integer
"zero", "positive", "negative" : string i.e. string literals are strings
> : int -> int -> bool i.e. > compares two integers to return a boolean

• The rule for a function:

• Checks the first case: fun sign 0 = "zero"
    • concludes 0 is int from assumptions
    • checks the body i.e. deduces "zero" is string from assumptions
    • concludes that the first case is int -> string

• Checks the second case: sign n = if (> n 0) then "positive" else "negative"
    • assumes parameter n has type α
    • checks the body i.e. checks the if expression:
        • checks the condition i.e. checks the application of > n to 0
            • checks the application of > to n
                • concludes that > is int -> int -> bool from assumptions
                • assigns the type variable β to the result of the application
                • concludes that n is α from assumptions
                • unifies the anticipated type α -> β with the type of >, int -> int -> bool
                • concludes that α is int and β is int -> bool
            • assigns the type variable γ to the result of the application
            • concludes that 0 is int from assumptions
            • unifies the anticipated type int -> γ with the type of > n, int -> bool
            • concludes that γ is bool
        • unifies the condition type bool with the required type for a condition, bool
        • checks the then branch i.e. concludes that "positive" is string from assumptions
        • checks the else branch i.e. concludes that "negative" is string from assumptions
        • unifies the then and else branches, which must have the same type
        • concludes that the if has type string
    • concludes that the second case has type int -> string
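The unification steps used in this trace can be sketched in C++. This is a minimal illustration and not a full Hindley-Milner implementation: the representation (Type, TypePtr, Substitution) and the helper names var, basic, fun, resolve, occurs and unify are assumptions of the sketch, and only type variables, basic types and function types are modelled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Type;
using TypePtr = std::shared_ptr<Type>;

// A type is a type variable, a basic type, or a function type "from -> to".
struct Type {
    enum Tag { Var, Basic, Fun } tag;
    std::string name;  // name of the variable or basic type
    TypePtr from, to;  // only used for Fun
};

TypePtr var(const std::string& n) { return std::make_shared<Type>(Type{Type::Var, n, nullptr, nullptr}); }
TypePtr basic(const std::string& n) { return std::make_shared<Type>(Type{Type::Basic, n, nullptr, nullptr}); }
TypePtr fun(TypePtr a, TypePtr b) { return std::make_shared<Type>(Type{Type::Fun, "", a, b}); }

// The bindings found so far: type variable name -> type.
using Substitution = std::map<std::string, TypePtr>;

// Follows variable bindings until an unbound variable or a constructor is reached.
TypePtr resolve(TypePtr t, const Substitution& s) {
    while (t->tag == Type::Var) {
        auto it = s.find(t->name);
        if (it == s.end()) break;
        t = it->second;
    }
    return t;
}

// Occurs check: does variable v appear inside t? Prevents infinite types.
bool occurs(const std::string& v, TypePtr t, const Substitution& s) {
    t = resolve(t, s);
    if (t->tag == Type::Var) return t->name == v;
    if (t->tag == Type::Fun) return occurs(v, t->from, s) || occurs(v, t->to, s);
    return false;
}

// Tries to make a and b equal by binding type variables; false on a type error.
bool unify(TypePtr a, TypePtr b, Substitution& s) {
    a = resolve(a, s);
    b = resolve(b, s);
    if (a->tag == Type::Var) {
        if (b->tag == Type::Var && a->name == b->name) return true;
        if (occurs(a->name, b, s)) return false;
        s[a->name] = b;
        return true;
    }
    if (b->tag == Type::Var) return unify(b, a, s);
    if (a->tag != b->tag) return false;
    if (a->tag == Type::Basic) return a->name == b->name;
    return unify(a->from, b->from, s) && unify(a->to, b->to, s);
}
```

Unifying α -> β with int -> int -> bool binds α to int and β to int -> bool; unifying int -> γ with β then binds γ to bool, as in the trace above. Unifying string with int fails, which is how a type error would be detected.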


Desugaring

Translating a high-level construct into a simpler one is called desugaring

Desugaring consists of :

• Translation to Lists

• Translation of Pattern Matching

• Translation of List Comprehension

Translation to Lists

List notation has three operators in a functional language : i) , ii) .. iii) :

A list of the form x : xs is transformed to (cons x xs)

• The list [1,2,3] or [1..3] is transformed to (cons 1 (cons 2 (cons 3 [ ])))

Translation OF Pattern Matching

Pattern matching on constructors desugars to case statements

Translation OF Pattern Matching

Pattern matching on numeric or string literals desugars to equality tests:

Translation OF List Comprehension

List comprehensions are equivalent to ’do’ notation:


JAVA CC …JAVA COMPILER COMPILER

Parser and Lexical Analyzer Generator

i.e. it produces lexical analyzers and parsers in Java

It generates a top-down parser and hence cannot handle left-recursive grammars

A .jj file, containing the regular expressions and the translation grammar, is the input to JavaCC, which produces Java code from it.

Steps for Execution

> javacc eg.jj //Generates eg.java

> javac eg.java //Generates eg.class

> java eg

Java CC Specification : ClassName.jj

options {
    // Java CC options
}

PARSER_BEGIN(ClassName)
    // Code
PARSER_END(ClassName)

// Lexical Analyzer
SKIP : { " " }
SKIP : { "\n" | "\r" | "\r\n" }
TOKEN : { < PLUS : "+" > }
TOKEN : { < NUMBER : (["0"-"9"])+ > }

// Parser
void Start() :
{}
{
    <NUMBER> ( <PLUS> <NUMBER> )* <EOF>
}

JavaCC Compilation : javacc ClassName.jj

Compiling this file generates seven files

• TokenMgrError is a simple error class; it is used for errors detected by the lexical analyser and is a subclass of Throwable.

• ParseException is another error class; it is used for errors detected by the parser and is a subclass of Exception and hence of Throwable.

• Token is a class representing tokens.

• SimpleCharStream is an adapter class that delivers characters to the lexical analyser.

• ClassNameConstants is an interface that defines a number of constants used in both the lexical analyser and the parser.

• ClassNameTokenManager is the lexical analyser

• ClassName is the Parser

Java CC Example

PARSER_BEGIN(Calc0) // must define parser class
public class Calc0 {
    public static void main (String args []) {
        Calc0 parser = new Calc0(System.in);
        for (;;)
            try {
                if (parser.expr() == -1)
                    System.exit(0);
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
    }
}
PARSER_END(Calc0)

Java CC Example

SKIP: { " " | "\r" | "\t" } // defines input to be ignored

TOKEN: // defines token names
{ < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}

int expr() : // expr: sum \n
{ } // -1 at eof, 0 at eol
{ sum() <EOL> { return 1; }
| <EOL> { return 0; }
| <EOF> { return -1; }
}

Java CC Example

// sum: product { + - product }

void sum(): { } { product() ( ( "+" | "-" ) product() )* }

// product: term { *%/ term }

void product():{} { term() (( "*" | "%" | "/" ) term() )* }

// term: +term | -term | (sum) | number

void term(): {} { "+" term() | "-" term() | "(" sum() ")" | <CONSTANT> }
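The three grammar rules map one-to-one onto the routines of a top-down (recursive descent) parser. JavaCC itself generates Java; the hand-written C++ sketch below only illustrates the same structure. The class name Calc, the helper names and the decision to compute a value while parsing are additions made for this sketch; the input is taken from a string and the EOL token is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

class Calc {
public:
    explicit Calc(const std::string& input) : text_(input), pos_(0) {}

    // expr: sum followed by the end of the input
    long expr() {
        long value = sum();
        skipBlanks();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected input");
        return value;
    }

private:
    // sum: product { + - product }
    long sum() {
        long value = product();
        for (;;) {
            if (accept('+')) value += product();
            else if (accept('-')) value -= product();
            else return value;
        }
    }

    // product: term { *%/ term }
    long product() {
        long value = term();
        for (;;) {
            if (accept('*')) value *= term();
            else if (accept('/')) value /= nonZero(term());
            else if (accept('%')) value %= nonZero(term());
            else return value;
        }
    }

    // term: +term | -term | (sum) | number
    long term() {
        if (accept('+')) return term();
        if (accept('-')) return -term();
        if (accept('(')) {
            long value = sum();
            if (!accept(')')) throw std::runtime_error("')' expected");
            return value;
        }
        return constant();
    }

    // CONSTANT: one or more digits
    long constant() {
        skipBlanks();
        if (pos_ >= text_.size() || !std::isdigit(static_cast<unsigned char>(text_[pos_])))
            throw std::runtime_error("number expected");
        long value = 0;
        while (pos_ < text_.size() && std::isdigit(static_cast<unsigned char>(text_[pos_])))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    static long nonZero(long v) {
        if (v == 0) throw std::runtime_error("division by zero");
        return v;
    }

    // Consumes the character c if it is next in the input.
    bool accept(char c) {
        skipBlanks();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // SKIP: blanks, carriage returns and tabs are ignored.
    void skipBlanks() {
        while (pos_ < text_.size() &&
               (text_[pos_] == ' ' || text_[pos_] == '\r' || text_[pos_] == '\t'))
            ++pos_;
    }

    std::string text_;
    std::size_t pos_;
};
```

Because every rule starts by calling the rule below it and then loops over operators, none of the routines calls itself before consuming input, so the grammar needs no left recursion. With this sketch, Calc("1 + 2 * 3").expr() evaluates to 7.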
