FUNCTIONAL AND LOGIC PROGRAMS
Contents
Language Specific Compilation
• Context Handling
• Identification
• Scope
• Overloading
• Imported Scope
• Type Checking
• Type table
• Type equivalence
• Coercions
• Casts and conversions
Object Oriented Language Issues
Routines and Activation
Code Generation and Control Flow
Contents
Functional Programming Introduction
Basic Compilation
Polymorphic type checking
Compiling to register oriented architectures
JavaCC
Language Specific Compilation
The concept of a compiler is similar for all languages; however, its implementation differs from paradigm to paradigm (or language to language).
This is due to syntactic and semantic differences.
The main difference between paradigms lies in code generation: compilers for object-oriented and structured languages usually generate assembly or other low-level code, whereas many compilers for functional, parallel, and distributed languages generate C or C++ code.
Context Handling
In short, context handling means semantic analysis.
It is concerned with type checking: it relates the type of a variable in its declaration to its use.
It is also related to identification.
Context Handling consists of: Identification and Type Checking
Identification
Identification is the process of finding the defining occurrence of a given applied occurrence of an identifier or operator.
The defining occurrence of an identifier is the place where it is declared. It provides the information about the identifier: its initial value, whether it is a constant or a variable, a module or a class, etc.
Applied occurrences of an identifier are the consumers of that information, i.e. its uses.
Some languages, like LISP or Prolog, have no declarations of identifiers, i.e. no defining occurrences.
The identification process records the occurrences of an identifier in the symbol table.
Identification
int month; ………………..1
……
month = 1; ………………………2
do {
    cout << month_name[month]; …………….3
} while (month <= 12); ……………4
In the above code, 1 is the defining occurrence; 2, 3, and 4 are applied occurrences of the identifier month.
Identification
Some languages, such as C, allow forward declarations, which means an identifier can have more than one defining occurrence.
extern int x; ………………..1
int get( )
{
    return x; ………………………2
}
int x = 5; …………………….3
int main()
{
    int x = 3; ………………………………4
    cout << "Value of x = " << get();
}
In the above code, 1, 3, and 4 are defining occurrences and 2 is an applied occurrence of the identifier x. Line 1 is the forward declaration of the x defined at 3; line 4 declares a separate local x, so get() still returns 5.
Identification
Identifiers are looked up in the symbol table according to their namespace.
The general namespace is the one in which identifiers like variable names, structure names, and function names are stored.
The special namespace is the one for field selectors such as struct members, class members, etc.
Identification
struct one_int
{
int i;
} i;
.. ..
i.i = 3;
The first ‘i’, before the dot, is looked up in the general namespace, whereas the second is looked up in the special namespace.
Identification
Labels and module names can also live in special namespaces.
Namespaces depend on the syntax of the language.
In C there are four kinds of namespaces: one for labels; one for the tags of structs, unions, and enums; one per struct or union for its members; and one for ordinary identifiers (variable names, function names, typedef names, and enumeration constants).
Identification ….Scope
Some namespaces are scope-structured.
Scopes are arranged in a stack, with one entry on the stack for each open scope.
Rules for maintaining the stack of scopes:
• Upon scope entry, push a new, empty scope element onto the stack.
• Enter declared identifiers in the top scope element.
• For each applied occurrence, search the scope elements from top to bottom.
• Upon scope exit, remove the top scope element and all its declarations.
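The four rules above can be sketched as a small C++ class. This is only an illustration under stated assumptions: the names ScopeStack, Decl, enterScope, exitScope, declare, and lookup are hypothetical, and each scope element is modelled as its own hash map rather than the single shared hash table discussed on the following slides.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical declaration record; a real compiler would store much more.
struct Decl {
    std::string kind;  // e.g. "function", "parameter", "variable"
    int level;         // scope level at which the name was declared
};

class ScopeStack {
public:
    // Scope entry: push a new, empty scope element.
    void enterScope() { scopes_.emplace_back(); }

    // Scope exit: pop the top element together with all its declarations.
    void exitScope() {
        assert(!scopes_.empty());
        scopes_.pop_back();
    }

    // Defining occurrence: enter the name in the top scope element.
    // Returns false if the name is already declared in that scope.
    bool declare(const std::string& name, const std::string& kind) {
        assert(!scopes_.empty());
        int level = static_cast<int>(scopes_.size()) - 1;
        return scopes_.back().emplace(name, Decl{kind, level}).second;
    }

    // Applied occurrence: search the scope elements from top to bottom.
    // Returns nullptr for an undeclared identifier.
    const Decl* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, Decl>> scopes_;
};
```

With this sketch, declaring aap at level 0 and again at level 2 makes lookup("aap") return the level 2 declaration until that scope is exited, after which the level 0 declaration becomes visible again.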
A scoped hash-based symbol table
Name of identifier
Pointer to declaration
Other Information
Properties of the identifier
Level at which identifier is present
Pointer to the next node
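A scoped hash-based symbol table (example)
aap( int noot)
{ int mies, aap;
....
}
[Figure: a hash table with buckets 0 to 3 holding the entries "mies", "aap", and "noot", each with a declaration pointer and a list of property records tagged with scope levels: "aap" at levels 0 and 2, "noot" at level 1, "mies" at level 2. A scope stack with levels 0, 1, and 2 links to the properties declared at each level.]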
Identification ….Overloading
The ambiguity caused by overloading is resolved by considering the context in which the name to be identified occurs.
An overloaded identifier yields a set of definitions, reached through the pointer-to-declaration field of its symbol table entry; the context selects one definition from this set.
Identification ….Imported Scopes
C++ scope resolution operator x::
Modula FROM module IMPORT ...
Solution: stack (or merge) the new scope
Type checking
Operators and functions impose restrictions on the types of the arguments
Types
• basic types
• structured types
• type names
Some languages support forward references, i.e. referring to an identifier that is not yet declared.
Type information in a compiler must be implemented in such a way that all these checks can be performed conveniently.
Forward Referencing Example
class A;
class B {
    A a; // forward reference
    ……………..
}
class A {
    ………..
}
Type checking
To resolve forward references, whenever a not-yet-declared type name is met, it is added to the symbol table and marked as a forward reference.
When the type declaration for this forward reference is met, its symbol table entry is modified to represent the actual type instead of the forward reference.
At the end, a check must be made for loose ends, i.e. forward references that were never resolved.
A check must also be made for circularity, e.g.
TYPE x = y;
TYPE y = x;
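The two checks can be sketched as follows. This is a minimal illustration, not the book's algorithm: the names Kind, TypeDef, TypeTable, hasLooseEnd, and isCircular are hypothetical, and only three kinds of definition (basic, plain alias, pointer) are modelled. The assumption is that a pointer constructor legitimately breaks a cycle, while a cycle made only of plain aliases is an error.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class Kind { Basic, Alias, Pointer };

struct TypeDef {
    Kind kind;
    std::string target;  // referenced type name; unused for Basic
};

using TypeTable = std::map<std::string, TypeDef>;

// A loose end is a referenced type name that was never declared.
bool hasLooseEnd(const TypeTable& table) {
    for (const auto& entry : table) {
        const TypeDef& def = entry.second;
        if (def.kind != Kind::Basic && table.count(def.target) == 0) return true;
    }
    return false;
}

// A definition is circular if following plain aliases (TYPE x = y) leads
// back to a name that was already visited. A pointer constructor ends the
// chain, so TYPE b = POINTER TO a may refer back to a.
bool isCircular(const TypeTable& table, const std::string& name) {
    std::set<std::string> seen;
    std::string current = name;
    while (true) {
        auto it = table.find(current);
        if (it == table.end() || it->second.kind != Kind::Alias) return false;
        if (!seen.insert(current).second) return true;  // visited before
        current = it->second.target;
    }
}
```

Applied to the declarations TYPE a = b; TYPE b = POINTER TO a; TYPE c = d; TYPE d = c;, the sketch accepts a and b and reports c and d as circular.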
Type checking ... Type table
All Type information for each type is stored in a type table
Each entry contains the following:
• Its type constructor (‘basic’, ‘record’, ‘array’ , ‘pointer’ , and others);
• The size and alignment requirement of a variable of the type
• The types of components
Additional information is recorded depending on the kind of type:
• For basic types : its precise type(integer, real etc.)
• For record type : The list of record fields, their names and types
• For Array : the number of dimension, index type(s) and element type
• For pointer : the referenced type
Type table Example
TYPE a = b;
TYPE b = Pointer to a;
TYPE c = d;
TYPE d = c;
Type table Example
For TYPE a = b;
TYPE TABLE                    SYMBOL TABLE
TYPE 0 : INTEGER              "integer" : TYPE 0
TYPE 1 : ID_REF "b"           "a" : TYPE 1
                              "b" : UNDEFINED TYPE
Type table Example
TYPE a = b;
TYPE b = Pointer to a;
TYPE TABLE                    SYMBOL TABLE
TYPE 0 : INTEGER              "integer" : TYPE 0
TYPE 1 : ID_REF "b"           "a" : TYPE 1
TYPE 2 : ID_REF "a"           "b" : TYPE 3
TYPE 3 : Pointer to TYPE 2
Type table Example
Finally, after all four declarations, we have:
TYPE TABLE                    SYMBOL TABLE
TYPE 0 : INTEGER              "integer" : TYPE 0
TYPE 1 : ID_REF "b"           "a" : TYPE 1
TYPE 2 : ID_REF "a"           "b" : TYPE 3
TYPE 3 : Pointer to TYPE 2    "c" : TYPE 4
TYPE 4 : ID_REF "d"           "d" : TYPE 5
TYPE 5 : ID_REF "c"
Type table Example
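After replacing each ID_REF by the type it refers to, we have:
TYPE TABLE
TYPE 0 : INTEGER
TYPE 1 : TYPE 3
TYPE 2 : TYPE 1
TYPE 3 : Pointer to TYPE 2
TYPE 4 : TYPE 5
TYPE 5 : TYPE 4
TYPE 4 and TYPE 5 refer only to each other, which exposes the circular definition of c and d.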
Type checking ... Type Equivalence
Whenever type checking an expression, or the formal and actual parameters of a routine call, the compiler must check whether two types are equal.
When two types are equivalent, values of these types usually have the same representation.
• name equivalence [all types get a unique name]
VAR a : ARRAY [Integer 1..10] OF Real;
VAR b : ARRAY [Integer 1..10] OF Real;
Under name equivalence a and b have different types, because each use of a type constructor creates a new, uniquely named type.
Type checking ... Type Equivalence
• structural equivalence [difficult to check]
TYPE c = RECORD
    i : Integer;
    p : POINTER TO c;
END RECORD;
TYPE d = RECORD
    i : Integer;
    p : POINTER TO RECORD
        i : Integer;
        p : POINTER TO c;
    END RECORD;
END RECORD;
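One common way to make the structural check terminate on recursive types such as c and d is to assume that a pair of types already under comparison is equivalent. The sketch below illustrates that idea; the names Ctor, Field, Type, TypeTable, and structurallyEquivalent are hypothetical, and only integer, pointer, and record constructors are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class Ctor { Integer, Pointer, Record };

struct Field {
    std::string name;
    int type;  // index into the type table
};

struct Type {
    Ctor ctor;
    int target;                 // referenced type, used by Pointer only
    std::vector<Field> fields;  // used by Record only
};

using TypeTable = std::vector<Type>;

// Two types are structurally equivalent if they have the same constructor
// and equivalent components. A pair that is already being compared is
// assumed equivalent, which makes the check terminate on recursive types.
bool equivalent(const TypeTable& t, int a, int b,
                std::set<std::pair<int, int>>& assumed) {
    if (a == b) return true;
    if (!assumed.insert({a, b}).second) return true;
    const Type& x = t[a];
    const Type& y = t[b];
    if (x.ctor != y.ctor) return false;
    switch (x.ctor) {
        case Ctor::Integer:
            return true;
        case Ctor::Pointer:
            return equivalent(t, x.target, y.target, assumed);
        case Ctor::Record:
            if (x.fields.size() != y.fields.size()) return false;
            for (std::size_t i = 0; i < x.fields.size(); ++i) {
                if (x.fields[i].name != y.fields[i].name) return false;
                if (!equivalent(t, x.fields[i].type, y.fields[i].type, assumed))
                    return false;
            }
            return true;
    }
    return false;
}

bool structurallyEquivalent(const TypeTable& t, int a, int b) {
    std::set<std::pair<int, int>> assumed;
    return equivalent(t, a, b, assumed);
}
```

Encoding c and d from the slide as entries of such a table, the sketch reports them as structurally equivalent, even though d unfolds the record one level further than c.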
Type Checking : Coercions
• implicit data and type conversion
to match operand (argument) type
• coercions complicate identification
(ambiguity)
• two phase approach
– expand a type to a set by applying coercions
– reduce type sets based on constraints imposed by (overloaded) operators and language semantics
VAR a : Real;
...
a := 5;
3.14 + 7
8 + 9
Variable: value or location? ...Kind checking
• two usages of variables
rvalue: value
lvalue: location
• insert coercion to dereference variable
• checking rules:
VAR p : Real;
VAR q : Real;
...
p := q;
                 expected lvalue              expected rvalue
found lvalue     -                            deref
found rvalue     ERROR (lvalue required)      -
Resulting tree for p := q:
:=
    (location of) p
    deref
        (location of) q
Object Oriented Source language Issues
Basic types of data
Enumeration : can be copied, compared, incremented, and decremented
Pointer : typed, generic
Structure and Union
Array
Object : inheritance, polymorphism, dynamic binding, method overriding, multiple inheritance
Object Oriented Source language Issues : Pointer
The run-time representation of a pointer is typically an unsigned integer (a memory address).
Operations include copy, assignment, comparison, increment, decrement, and dereferencing.
Dereferencing means obtaining the value of the data structure that the pointer refers to, e.g. ptr->data or (*ptr).data.
Pointers are usually of two types:
• Typed : a pointer to a specific data type, e.g. int *p;
• Generic : one that can be coerced to any other pointer type, e.g. void *p;
Object Oriented Source language Issues : Pointer Issues
No.  Issue                                                  Solution
1    Pointer never initialized                              Automatic initialization
2    Null pointer: the pointer refers to no value           Dereferences of null pointers must be checked
3    Dangling pointer: the location it refers to is freed   Use a garbage collector
Object Oriented Source language Issues : Pointer Issues
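A pointer is only valid during the lifetime of the location into which it points.
int x, *ptr, a;
x = 5; ptr = &x; a = x;
cout << *ptr; cout << a;
{
    int y = 20;
    ptr = &y; a = y;
    cout << *ptr; cout << a;
}
cout << *ptr; cout << a; // ptr now dangles: y no longer exists
Typical o/p : 5 5 20 20 20 20, but the last *ptr reads a location whose lifetime has ended, so its value is not guaranteed.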
Object Oriented Source language Issues : Structure
struct emp {
    int empid; // requires 4 bytes
    double salary; // requires 8 bytes
};
• Naively, we might expect the structure to be laid out as follows: empid (4 bytes) followed by salary (8 bytes), 12 bytes in total.
Object Oriented Source language Issues : Structure
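In practice, a gap (padding) must be inserted so that each structure member starts at an address that satisfies its alignment requirement.
The alignment requirement of the whole structure is the least common multiple (LCM) of the alignment requirements of its members; since these are usually powers of two, it equals the largest of them.
Thus, on a typical machine where a double must be aligned on 8 bytes, the actual representation is as follows:
empid (4 bytes), gap (4 bytes), salary (8 bytes)
The size is now 16 bytes.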
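The padding computation can be sketched as a small layout routine. This is an illustration only: the names Member, Layout, roundUp, and layoutStruct are hypothetical, and the sketch assumes every alignment is a power of two, so that the LCM of the alignments is simply their maximum. Real compilers follow the platform ABI, which may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Member {
    std::size_t size;   // size in bytes
    std::size_t align;  // alignment requirement in bytes (a power of two)
};

struct Layout {
    std::vector<std::size_t> offsets;  // offset of each member
    std::size_t size;                  // total size including padding
    std::size_t align;                 // alignment of the whole structure
};

// Round n up to the next multiple of align.
std::size_t roundUp(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

Layout layoutStruct(const std::vector<Member>& members) {
    Layout result{{}, 0, 1};
    for (const Member& m : members) {
        assert(m.align > 0);
        // Insert a gap, if needed, so the member starts on its alignment.
        std::size_t offset = roundUp(result.size, m.align);
        result.offsets.push_back(offset);
        result.size = offset + m.size;
        result.align = std::max(result.align, m.align);
    }
    // Trailing padding keeps every element of an array of structs aligned.
    result.size = roundUp(result.size, result.align);
    return result;
}
```

For a 4-byte int followed by an 8-byte double with 8-byte alignment, the sketch places the members at offsets 0 and 8 and reports a total size of 16 bytes, matching the slide.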
Object Oriented Source language Issues : Arrays
There is no disagreement about how plain arrays are stored in memory.
Any programming language that supports a plain (not associative) array type just stores the elements sequentially.
Once a language supports multidimensional arrays, it needs to decide how to squeeze the 2D arrangement of data into a 1D arrangement in memory, typically as a 1D array. One classical use case for multidimensional arrays is matrices.
Array copying can be done element by element or by block copy. Comparison can be carried out similarly.
Arrays can be static as well as dynamic.
Object Oriented Source language Issues : Arrays
• Given a matrix, there are two “canonical” ways to store it in memory :
i. Row-major
ii. Column - major
Object Oriented Source language Issues : Arrays : Row Major
Storage traverses the matrix by rows, and within each row enumerates the columns.
A 2×3 matrix A would be stored in memory as a11, a12, a13, a21, a22, a23
The position of the element at row i, column j (both counted from 0) in the underlying 1D array is computed as : i*stride + j,
• where stride is the number of elements stored per row, usually the width of the 2D array, but it can also be larger.
Object Oriented Source language Issues : Arrays : Column Major
Storage traverses the matrix by columns, then enumerates the rows within each column.
A 2×3 matrix A would be stored in memory as a11, a21, a12, a22, a13, a23
The position of the element at row i, column j (both counted from 0) in the underlying 1D array is computed as : j*stride + i,
• where stride is the number of elements stored per column, usually the height (number of rows) of the 2D array, but it can also be larger.
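Both index formulas can be tried out with a short sketch. The function names rowMajorIndex, colMajorIndex, and flatten are hypothetical, indices are 0-based, and the stride is assumed to equal the number of columns (row-major) or the number of rows (column-major), i.e. no extra padding per row or column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major: the elements of one row are adjacent in memory.
std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t stride) {
    return i * stride + j;  // stride >= number of columns
}

// Column-major: the elements of one column are adjacent in memory.
std::size_t colMajorIndex(std::size_t i, std::size_t j, std::size_t stride) {
    return j * stride + i;  // stride >= number of rows
}

// Copy a rectangular matrix into a 1D array using one of the two orders.
std::vector<int> flatten(const std::vector<std::vector<int>>& m, bool rowMajor) {
    std::size_t rows = m.size();
    std::size_t cols = rows ? m[0].size() : 0;
    std::vector<int> flat(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            std::size_t k = rowMajor ? rowMajorIndex(i, j, cols)
                                     : colMajorIndex(i, j, rows);
            flat[k] = m[i][j];
        }
    }
    return flat;
}
```

Flattening the matrix {{11, 12, 13}, {21, 22, 23}} gives 11, 12, 13, 21, 22, 23 in row-major order and 11, 21, 12, 22, 13, 23 in column-major order, the same sequences as on the two slides.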
Routines and Activations : Activation record
• When a function / subroutine is called, it will create an activation record.
• This record contains the location of local variable, return address etc.
Activation Record - Example – foo()->bar()->baz(),
determine the return address
int foo() {
    int b;
    b = bar();
    return b;
}
int bar() {
    int b = 0;
    b = baz(b);
    return b;
}
int baz(int b) {
    if (b < 1)
        return baz(b + 1);
    else
        return b;
}
An example – bar()
Activation Record
• temporaries: used in expression evaluation
• local data: field for local data
• saved machine status: holds info about
machine status before procedure call
• access link : to access non local data
• control link :points to activation record of caller
• actual parameters: field to hold actual parameters
• returned value: field for holding value to be returned
[Figure: layout of an activation record, showing the fields temporaries, local data, machine status, access links, control links, parameters, and return value]
Routines
[Figure: kinds of routines: classical routines, iterators, and coroutines]
Routines : Iterator
An iterator is a routine that can suspend itself temporarily and return to its parent without losing its activation record.
After suspension it can be resumed at the point where it left off. C# provides this through the keyword yield.
Routines : Coroutines
Like an iterator, a coroutine can suspend itself temporarily, but control does not return to its parent; it goes to another coroutine.
This is called resume.
This was offered by Simula.
Standard C has no built-in resume( ); coroutine support in C comes from libraries.
Operations on Routines
Define
Call
Return
Pass
Lecture 18
INTRODUCTION TO FUNCTIONAL PROGRAMMING
Advantages :
• Have uniform view of programs
• Treat functions as data
• Automatic memory management
• Greatly flexible and has simple semantics
Disadvantage :
• Functional languages are often interpreted, which can result in a substantial loss of execution speed.
KEY PROPERTIES OF FUNCTIONAL PROGRAMMING LANGUAGES
Lazy function evaluation
First – class objects
All programs and procedures are functions
Lack of variable and assignment
Lack of loop and iteration
Referential transparency
Dynamic memory environment
Garbage collection
Side – effect freedom
KEY FEATURES OF FUNCTIONAL PROGRAMMING LANGUAGE
1.Lazy function evaluation:
• Unnecessary function evaluation is avoided: an argument is evaluated only when its value is needed
2. First – class objects : Functions are treated as objects
3. All programs and procedures are functions :
• Due to this feature the programs are considered as data and can be changed at run- time. This distinguishes
4. Lack of Variable Assignment
5. Lack of loops and iteration : replaced by recursive calls.
KEY FEATURES OF FUNCTIONAL PROGRAMMING LANGUAGE
6. Referential transparency (side-effect free): the property of a function whereby its value depends only upon its parameters, and not upon previous computations.
7.Dynamic Memory Management : Done automatically
• Maintaining free space
• Reclamation of storage
8.Garbage collection :
• Methods to collect and return unreferenced storage
Factorial in Haskell vs. C
Haskell:
fac 0 = 1
fac n = n * fac (n - 1)
C:
int fac(int n) {
    int product = 1;
    while (n > 0) {
        product *= n;
        n--;
    }
    return product;
}
Offside rule
• Layout characters matter to parsing
divide x 0 = inf
divide x y = x / y
• Everything below and right of = in equations defines a new scope
• Applied recursively
fac n = if (n == 0) then 1
        else prod n (n-1)
  where
    prod acc n = if (n == 0) then acc
                 else prod (acc * n) (n-1)
• Lexical analyzer maintains a stack
Lists
• Part of all functional programs since Lisp
• Empty list [] = Nil
• [1]
• [1, 2, 3, 4]
• [4, 3, 7, 7, 1]
• ["red", "yellow", "green"]
• [1 .. 10] => arithmetic sequence
• Can be constructed using “:” infix operator
– [1, 2, 3] is equivalent to (1 : (2 : ( 3 : [])))
– range n m = if n > m then [ ] else ( n: range (n+1) m)
Constructs arithmetic sequence [n…m] dynamically
List Comprehension
• Inspired by set comprehension: $S = \{\, n^2 \mid n \in \{1, \ldots, 100\} \wedge \mathrm{odd}(n) \,\}$
• Haskell code
s = [n^2 | n <- [1..100], odd n]
"n squared such that n is an element of [1..100] and n is odd"
• Qsort in Haskell
qsort [] = []
qsort (x: xs) = qsort [y | y <- xs, y < x]
                ++ [x]
                ++ qsort [y | y <- xs, y >= x]
Pattern Matching
• Convenient way to define recursive functions
• A simple example
fac 0 = 1
fac n = n * fac (n-1)
• Equivalent code
fac n = if (n == 0) then 1 else n * fac (n-1)
• Another example
length [ ] = 0
length (x: xs) = 1 + length xs
• Equivalent code
length list = if (list == []) then 0
              else let x = head list
                       xs = tail list
                   in 1 + length xs
Polymorphic Typing
• Basic types like Int and Char are said to be monomorphic types
• A polymorphic type is one that stands for many types.
• For example, the empty list [ ] can have many types: list of characters, list of numbers, etc.
• Benefits:
– Code reuse
– Guarantee consistency
Polymorphic Typing
• The compiler infers that in
length [ ] = 0
length (x: xs) = 1 + length xs
– length has the type [a] -> Int
length :: [a] -> Int
• Example expressions
– length [1, 2, 3] + length ["red", "yellow", "green"]
– length [1, 2, "green"] -- invalid list
• The user can optionally declare types
• Every expression is assigned its most general type
Structure of a functional compiler
High-level language
→ Polymorphic type checking and de-sugaring: 1. Pattern matching 2. List to pairs 3. List comprehension 4. Lambda lifting
→ Functional core
→ Optimizations
→ Code generation
→ C code + Runtime system
Compiling Functional Languages
• The table below shows which compiler phase handles which aspect of Haskell:
Compiler Phase        Language Aspect
Lexical analyzer      Offside rule
Parser                List notation, list comprehension, pattern matching
Context handling      Polymorphic type checking
Run-time system       Referential transparency, higher-order functions, lazy evaluation
Polymorphic Type Checking
• ML uses the following rules to infer the types of polymorphic code and check its type correctness:
1. All occurrences of the same identifier in a given scope must have the same
type.
2. In an if/then/else expression, the condition must be of type bool, and the
then and else clauses must be of the same type.
3. A programmer-defined function has type 'a -> 'b where 'a is the type of
the function’s parameter and 'b is the type of its result. (Functions have
tuple arguments.)
4. When a function is applied, the type of the argument passed to the function
must be the same as the parameter type in the function’s definition and the
type of the application is the same as the type of the result in the function’s
definition.
• The Hindley-Milner algorithm traverses a program’s abstract syntax tree (AST), assigning type variables at each node. It then applies rules corresponding to syntactic constructs to attempt to resolve the assigned type variables against specific type information deduced from program contents and context.
Polymorphic Type Checking
• For example, consider checking the following Standard ML-ish function, which returns a string depending on the sign of its integer argument:
fun sign 0 = "zero" |
sign n = if (> n 0) then "positive" else "negative"
This function has type int -> string, i.e. it takes an integer argument and returns a string result
Polymorphic Type Checking
• Prior to type checking, it is assumed that:
0 : int i.e. 0 is an integer
"negative" : string
> : int -> int -> bool i.e. > compares two integers to return a boolean
• The rule for a function:
• Checks the first case: fun sign 0 = "zero"
• concludes 0 is int from assumptions
• checks the body i.e. deduces "zero" is string from assumptions
• concludes that the first case is int -> string
Polymorphic Type Checking
• Checks the second case: sign n = if (> n 0) then "positive" else "negative"
• Assumes parameter n has type α
• checks the body i.e. checks the if expression:
• checks the condition i.e. checks the application of > n to 0
• checks the application of > to n
• concludes that > is int -> int -> bool from assumptions
• assigns the type variable β to the result of the application
• concludes that n is α from assumptions
• unifies the anticipated type α -> β with the type of >, int -> int -> bool
• concludes that α is int and β is int -> bool
Polymorphic Type Checking
• assigns the type variable γ to the result of the application
• concludes that 0 is int from assumptions
• unifies the anticipated type int -> γ with the type of > n, int -> bool
• concludes that γ is bool
• unifies the condition type bool with the required type for a condition, bool
• checks the then branch i.e. concludes that "positive" is string from assumptions
• checks the else branch i.e. concludes that "negative" is string from assumptions
• unifies the then and else branches, which must have the same type
• concludes that the if has type string
• concludes that the second case has type int -> string
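The unification steps used in this walk-through can be sketched in C++. This is a minimal illustration of unification with type variables, not a full Hindley-Milner checker: the names Type, TypePtr, var, con, fn, prune, occurs, unify, and show are all hypothetical, and generalisation of polymorphic types is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Type;
using TypePtr = std::shared_ptr<Type>;

// A type is either a type variable or a constructor such as int, bool,
// string, or "->" applied to argument types.
struct Type {
    std::string name;           // constructor name, or variable name
    std::vector<TypePtr> args;  // constructor arguments (empty for variables)
    bool isVar;
    TypePtr binding;            // set once a variable has been unified
};

TypePtr var(const std::string& name) {
    return std::make_shared<Type>(Type{name, {}, true, nullptr});
}
TypePtr con(const std::string& name, std::vector<TypePtr> args = {}) {
    return std::make_shared<Type>(Type{name, std::move(args), false, nullptr});
}
TypePtr fn(TypePtr from, TypePtr to) { return con("->", {from, to}); }

// Follow variable bindings to the type a variable currently stands for.
TypePtr prune(TypePtr t) {
    while (t->isVar && t->binding) t = t->binding;
    return t;
}

// Occurs check: does variable v appear inside type t?
bool occurs(const TypePtr& v, TypePtr t) {
    t = prune(t);
    if (t == v) return true;
    for (const TypePtr& a : t->args)
        if (occurs(v, a)) return true;
    return false;
}

// Unify two types, binding variables as needed. Returns false on a clash.
bool unify(TypePtr a, TypePtr b) {
    a = prune(a);
    b = prune(b);
    if (a == b) return true;
    if (a->isVar) {
        if (occurs(a, b)) return false;  // would build an infinite type
        a->binding = b;
        return true;
    }
    if (b->isVar) return unify(b, a);
    if (a->name != b->name || a->args.size() != b->args.size()) return false;
    for (std::size_t i = 0; i < a->args.size(); ++i)
        if (!unify(a->args[i], b->args[i])) return false;
    return true;
}

// Render a type after resolving all bound variables.
std::string show(TypePtr t) {
    t = prune(t);
    if (!t->isVar && t->name == "->")
        return "(" + show(t->args[0]) + " -> " + show(t->args[1]) + ")";
    return t->name;
}
```

Unifying α -> β with int -> int -> bool binds α to int and β to int -> bool; unifying int -> γ with β then binds γ to bool, mirroring the steps listed above.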
Polymorphic Type Checking
Desugaring
Desugaring means translating a high-level construct into a simpler one.
Desugaring
[Figure: desugaring comprises translation of lists, translation of pattern matching, and translation of list comprehensions]
Translation to Lists
List notation in a functional language has three operators : i) , ii) .. iii) :
A list of the form x : xs is transformed to (cons x xs)
• list[1,2,3] or list(1:3) is transformed to (cons 1 (cons 2 (cons 3 [ ])))
Translation OF Pattern Matching
Pattern matching on constructors desugars to case statements
Translation OF Pattern Matching
Pattern matching on numeric or string literals desugars to equality tests:
Translation OF List Comprehension
List comprehensions are equivalent to ’do’ notation:
JAVA CC …JAVA COMPILER COMPILER
Parser and Lexical Analyzer Generator
i.e. it produces lexical analyzers and parsers in Java
It generates a top-down parser and hence cannot handle left-recursive grammars
JAVA CC …JAVA COMPILER COMPILER
[Figure: a .jj file containing regular expressions and a translation grammar is given to JavaCC, which produces Java code]
Steps for Execution
> javacc eg.jj // generates eg.java
> javac eg.java // generates eg.class
> java eg
Java CC Specification : ClassName.jj
options {
    // JavaCC options
}
PARSER_BEGIN(ClassName)
// Code
PARSER_END(ClassName)
// Lexical analyzer
SKIP : { " " }
SKIP : { "\n" | "\r" | "\r\n" }
TOKEN : { < PLUS : "+" > }
TOKEN : { < NUMBER : (["0"-"9"])+ > }
// Parser
void Start() : {}
{
    <NUMBER> ( <PLUS> <NUMBER> )* <EOF>
}
JavaCC Compilation : javacc ClassName.jj
Compiling this file generates seven files:
• TokenMgrError is a simple error class; it is used for errors detected by the lexical analyser and is a subclass of Throwable.
• ParseException is another error class; it is used for errors detected by the parser and is a subclass of Exception and hence of Throwable.
• Token is a class representing tokens.
• SimpleCharStream is an adapter class that delivers characters to the lexical analyser.
• ClassNameConstants is an interface that defines a number of constants used in both the lexical analyser and the parser.
• ClassNameTokenManager is the lexical analyser
• ClassName is the Parser
Java CC Example
PARSER_BEGIN(Calc0) // must define parser class
public class Calc0 {
    public static void main (String args []) {
        Calc0 parser = new Calc0(System.in);
        for (;;)
            try {
                if (parser.expr() == -1)
                    System.exit(0);
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
    }
}
PARSER_END(Calc0)
Java CC Example
SKIP: { " " | "\r" | "\t" } // defines input to be ignored
TOKEN: // defines token names
{
  < EOL: "\n" >
| < CONSTANT: ( <DIGIT> )+ > // re: 1 or more
| < #DIGIT: ["0" - "9"] > // private re
}
int expr() : // expr: sum \n
{ } // -1 at eof, 0 at eol
{
  sum() <EOL> { return 1; }
| <EOL> { return 0; }
| <EOF> { return -1; }
}
Java CC Example
// sum: product { + - product }
void sum(): { } { product() ( ( "+" | "-" ) product() )* }
// product: term { *%/ term }
void product():{} { term() (( "*" | "%" | "/" ) term() )* }
// term: +term | -term | (sum) | number
void term(): {} { "+" term() | "-" term() | "(" sum() ")" | <CONSTANT> }
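To show what kind of top-down parser such a grammar corresponds to, the three rules can be written by hand as one recursive function per rule. This C++ sketch is not JavaCC output: the class name Calc and its members are hypothetical, and unlike the grammar above, which only recognizes input, the sketch also evaluates the expression using integer arithmetic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

class Calc {
public:
    explicit Calc(const std::string& text) : text_(text), pos_(0) {}

    // Parse and evaluate a complete expression; leftover input is an error.
    long parse() {
        long value = sum();
        skipBlanks();
        if (pos_ != text_.size()) throw std::runtime_error("unexpected input");
        return value;
    }

private:
    // sum: product { + - product }
    long sum() {
        long value = product();
        for (;;) {
            if (accept('+')) value += product();
            else if (accept('-')) value -= product();
            else return value;
        }
    }

    // product: term { *%/ term }
    long product() {
        long value = term();
        for (;;) {
            if (accept('*')) value *= term();
            else if (accept('/')) value /= nonZero(term());
            else if (accept('%')) value %= nonZero(term());
            else return value;
        }
    }

    // term: +term | -term | (sum) | number
    long term() {
        if (accept('+')) return term();
        if (accept('-')) return -term();
        if (accept('(')) {
            long value = sum();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skipBlanks();
        if (!atDigit()) throw std::runtime_error("expected number");
        long value = 0;
        while (atDigit()) value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    static long nonZero(long v) {
        if (v == 0) throw std::runtime_error("division by zero");
        return v;
    }

    bool atDigit() const {
        return pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_]));
    }

    // Mirrors the SKIP rule: blanks, carriage returns, and tabs are ignored.
    void skipBlanks() {
        while (pos_ < text_.size() &&
               (text_[pos_] == ' ' || text_[pos_] == '\r' || text_[pos_] == '\t'))
            ++pos_;
    }

    // Consume the character c if it is next in the input.
    bool accept(char c) {
        skipBlanks();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    std::string text_;
    std::size_t pos_;
};
```

Because each loop in sum and product consumes operators iteratively, the rules need no left recursion, which is the form a top-down parser generator such as JavaCC requires.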