ASSUMPTION UNIVERSITY Faculty of Science and Technology
Low Level Virtual Machine C# Compiler
Senior Project Proposal
In partial fulfillment of the course SC4299 Senior Project
Semester 1 / Year 2009
GROUP MEMBERS
Prabir Shrestha (4915302) Myo Min Zin (4845411)
Napaporn Wuthongcharernkun (4846824)
COMMITTEE MEMBERS
Dr. Songsak Channarukul A. Se Won Kim
ADVISOR
Dr. Kwankamol Nongpong
Table of Contents
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Objectives
2 Literature Review
2.1 Source Language Background
2.2 LLVM Description
2.3 Contributions to C#
3 Scope
3.1 Keywords
3.2 Operators and Special Characters
4 The Framework
4.1 Scanner
4.2 Parser
4.3 Semantic Analyzer
4.4 Code Generator
4.5 Assembling and Linking
5 Gantt Chart
6 References
7 Appendix
7.1 LLVM C# Compiler EBNF
List of Figures
Figure 1-A: Compilation Phases
Figure 4-A: Overall Process of LLVM C# Compiler
Figure 4-B: Custom Coco/R function
Figure 4-C: Sample AST Nodes
Figure 4-D: Sample AST Binary Nodes
Figure 4-E: Sample AST Loop Nodes
Figure 4-F: Semantic Error Code Fragment
Figure 4-G: Sample C# Code Fragment
Figure 4-H: LLVM IR Equivalent of the C# Code Fragment
1 Introduction
Modern programming languages give developers many different ways to express
the behaviour of an application.
A developer's choice of language for constructing an application can, first and
foremost, convey information about the purpose of the system design.
There are many styles of programming language, including logical, imperative,
functional and object-oriented. We believe the wide mainstream use and
popularity of object-oriented languages is due to their ability to model
real-world objects and their functionality, effectively and easily, in a way
that machines can understand.
Modern high-level languages such as C#, the source language of this project,
more often than not combine several of the paradigms listed above. Newer
versions of C# have added features that improve ease of use in several areas,
such as generics, Language Integrated Query (LINQ) and anonymous functions, to
name a few.
However, the focus of our project will be primarily on the basic object-oriented
elements of the language, which capture the core constructs of the syntax and
semantics of our source language.
Diversity of tooling is another important factor when a language has a large
community of users. For this reason, our primary objective in this project is
to provide an alternative method of compiling and deploying a C# application.
Large compiler frameworks such as Microsoft's .NET and Mono are already widely
used for the C# language. However, these systems are sometimes bulky because of
the sets of features they provide, including features that developers will
never use. We therefore see the practical value of our project as a small,
portable tool for developers of C# applications.
The core objective of this project is to create a compiler for the C# language
that generates a portable, low-level intermediate representation, which can
then be used across a wide variety of architectures and operating systems with
minimal or no modification to the original source code. To accomplish this, Low
Level Virtual Machine Intermediate Representation (LLVM IR) has been chosen as
the target code generated by the compiler because it is independent of any
particular machine.
Figure 1-A: Compilation Phases
1.1 Motivation
Various contributions to, and the evolution of, compiler technologies and
programming paradigms motivated us to create a new C# compiler.
Distributing the binaries created by existing C# compilers requires the bulky
.NET Framework to be installed on the target machine. Even a traditional
"Hello, World" program requires the entire .NET Framework to be installed. To
solve this problem we have taken the approach of C and C++, in which only the
libraries that the program requires are linked into it.
The D language has also been one of our major inspirations. It provides
programmers with features of modern languages, such as interfaces and automatic
memory management by garbage collection, yet produces high-performance code
suitable for system programming [1], such as device drivers and even operating
systems.
Operating systems have evolved over the past decades from being written in
assembly code to being written in high-level languages such as C and C++.
Projects such as SharpOS [2], Cosmos and even Microsoft's research operating
system, Singularity [3], have taken a different approach by writing the kernel,
device drivers and applications in managed code. The compilers of these
operating systems motivated us to create a C# compiler that produces native
code.
The "Write Once, Run Anywhere" (WORA) slogan from Sun Microsystems made us
think about generating portable code that can be used across a wide variety of
operating systems and computer architectures.
1.2 Problem Statement
The way we write programs has been evolving since the beginning of the
stored-program concept and continues to evolve because of advances in hardware
and software. Since the introduction of Java, and now the .NET Framework, the
concepts of the virtual stack machine and Just-in-Time (JIT) compilation have
been gaining popularity. C# is one of the notable languages whose compilers use
this concept. It allows programmers to write compiled, machine-independent code
that can be executed on virtually any architecture.
Even though Java bytecode and the Common Language Infrastructure (CLI) consist
of highly machine-independent code, they have not been candidates for system
programming because of performance issues, such as lower speed compared with
languages such as C and C++, and because of the JIT. LLVM has a similar concept
of JIT: the code is converted to compiled LLVM bitcode, which can then be
executed on other architectures and operating systems. To gain better
performance on a particular architecture or operating system, the bitcode can
be compiled further to native code. At the time of writing, LLVM's retargetable
code generator supports most of the popular architectures, such as x86, x86-64,
PowerPC, PowerPC-64, ARM, Thumb, SPARC, Alpha, CellSPU, PIC16, MIPS, MSP430,
SystemZ and XCore [4].
While languages such as C and C++ provide better execution speed than C# and
Java, their programmers have to deal with unsafe code, such as manual memory
management, which can lead to memory leaks or dangling pointers. C# and Java
usually solve this memory problem by using garbage collection. C# also
introduces delegates, which avoid the use of unsafe function pointers.
As developers have written code, a set of common principles about how code
should be written has evolved. For example, in the object-oriented world it has
become common to access variables through accessors and mutators rather than
through public variables. The C# language addresses many of these practices
directly.
Because of features such as memory management and adherence to these principles
of writing programs, we have chosen C# as the input language for our compiler.
Migrating to a different platform forces programmers to write
architecture-specific code for each platform. Languages such as C and C++ do
not have a straightforward way to know the length of an integer (32-bit or
64-bit), whereas C# provides an easier way by using the built-in Int32 type.
1.3 Objectives
The objective of our project is to create a compiler for the C# language whose
target language is Low Level Virtual Machine Intermediate Representation (LLVM
IR), a low-level, machine-independent language similar to assembly code.
The focus of our project will be primarily on each phase of the compilation
process, from scanning the source language to generating the target code. These
phases are lexical analysis, syntax analysis, semantic analysis and
intermediate code generation. Other phases, such as assembling and linking,
will be handled by the LLVM tools.
The expected outcome of the project is a functional compiler for the subset of
the C# language specification defined in our scope.
The basic requirements for the compiler include the following:
1. The compiler will properly recognize the lexical structures of the C#
language.
2. The compiler will check the syntax against the grammar of the language
specification, as well as the semantics of the program, and generate errors
accordingly.
2 Literature Review
2.1 Source Language Background
C# is a high-level, object-oriented programming language that is part of the
.NET language family developed by Microsoft. Although the language is
considered to be primarily object-oriented, a closer look reveals that it is in
fact a multi-paradigm language that also includes aspects of the functional and
imperative programming styles.
It is currently designed to function within the Common Language Infrastructure
(CLI), which provides a Common Type System (CTS) and a Common Language
Specification (CLS). When a C# program is compiled, it is translated into the
Common Intermediate Language (CIL).
2.2 LLVM Description
Low Level Virtual Machine (LLVM) is a compiler infrastructure that consists of
two primary components, an optimizer and a code generator. It is designed so
that programs can be optimized at different phases of their lifetime, such as
compile time, link time and run time [5].
LLVM IR (Intermediate Representation) is a low-level language similar to
assembly language. It has a RISC-like instruction set that effectively captures
the operations of the processor while avoiding machine-specific constraints
such as pipelines, physical registers and low-level calling conventions. By
adding a layer of abstraction over the hardware specifics, LLVM IR is, in a
sense, platform independent and can be used on a variety of machines with
different hardware specifications.
The common code representation used throughout all phases of the LLVM
compilation strategy is based on Static Single Assignment (SSA) form. It
provides type safety and low-level operations, and it is flexible enough to
represent high-level languages clearly and efficiently.
A key factor contributing to the productivity of the LLVM system is its virtual
instruction set. LLVM code is a low-level representation, yet its design allows
it to carry high-level information.
2.3 Contributions to C#
Other C# compiler projects apart from Microsoft's .NET Framework are discussed
briefly here to give an overview of the relevant developments in this field.
These include Mono, Cosmos (IL2CPU) [6], Bartok and Ensemble.
Mono is an open-source implementation of the .NET Framework. It contains a C#
compiler that is written in C# and can be run on several operating systems,
such as Linux, UNIX, Mac OS X and Solaris. The C# code is first compiled into
MSIL; the Mono JIT then translates the MSIL into native code at run time, which
is similar to Microsoft's original implementation of the .NET Framework.
Cosmos (C# Open Source Managed Operating System) is an operating system written
entirely in C#. It makes use of IL2CPU, an ahead-of-time (AOT) compiler that
translates CIL into machine code by outputting raw assembly files, which are
then processed by NASM (Netwide Assembler).
Bartok was originally made for Singularity, the operating system developed by
Microsoft Research. It translates CIL into native code using three intermediate
representations: HIR (High-level IR), MIR (Medium-level IR) and LIR (Low-level
IR). Starting from the high-level IR, it gradually lowers the code
representation at each phase until it reaches the lowest level, which is
essentially assembly. A standard linker then puts the objects together to
create native x86 executables.
3 Scope
The scope of our project is determined by the keywords and operators listed
below, which form a subset of C# version 1.0. We have chosen version 1.0 rather
than a newer version of C# because we will not be supporting most of the newer
features, such as generics and Language Integrated Query (LINQ).
3.1 Keywords
base        bool        break       char        class
const       continue    do          else        enum
explicit    extern      false       float       for
get         if          implicit    in          int
namespace   new         null        operator    object
public      override    private     protected   return
sealed      set         sizeof      static      string
struct      this        true        typeof      using
virtual     void        while       value       is
3.2 Operators and Special Characters
Category                        Operators
Primary                         x.y   f(x)   a[x]   x++   x--   new
Unary                           +   -   !   ++x   --x   (T)x
Multiplicative & Conditional    *   /   &&   ||
Relational and type testing     <   >   <=   >=   ==   !=
Assignment                      =   +=   -=   *=   /=
Based on the ECMA-334 C# Language Specification [7], a value of type char in C#
is a Unicode character. Microsoft's implementation of the .NET Framework
implements it as a 16-bit character, which can represent most of the known
written languages in the world. Our C# compiler will not follow this; instead,
char will be 8 bits wide, the same as in standard C and C++. The same holds for
the string type.
The C# Language Specification (ECMA-334) allows an enumeration to have a
distinct underlying type, such as byte, sbyte, short, ushort or int. Our
compiler will support only the 32-bit integer (int) as the underlying type of
an enumeration.
Microsoft .NET provides base class libraries (classes, structures, enumerations
and delegates) that C# programmers use for tasks such as I/O and database
access. Our compiler will provide a subset of these libraries:
System.Array     System.Char     System.Random
System.Console   System.Byte     System.Boolean
System.Enum      System.String
We will provide our own libraries for the end user to assemble and link with
the output LLVM IR. The provided libraries will perform most of the
functionality of the .NET libraries listed above. Any exceptional cases will be
documented in a user manual, which will also be provided.
In our implementation of the compiler, using directives can appear only at the
top of the file and cannot be placed inside a namespace block.
Only single-dimensional arrays will be supported.
Optimization will not be taken into consideration during the generation of
LLVM IR.
4 The Framework
The compiler will be written in the C# language using Microsoft Visual Studio
and the Microsoft .NET Framework.
The scanner and parser are implemented using Coco/R, an automatic scanner and
parser generator that is also written in C#. To make generating the scanner and
parser easier, we have also created a Coco/R plugin that can be used directly
from Visual Studio.
Figure 4-A: Overall Process of LLVM C# Compiler
4.1 Scanner
Coco/R takes an attributed grammar of the source language and generates a
scanner and a recursive descent parser for that language. The scanner generated
by Coco/R reads the input stream and returns a stream of tokens to the parser.
In a traditional overview of compilation, scanning and parsing are seen as two
distinct processes that occur one after the other. With Coco/R, however, the
scanner and the parser are generated at the same time, because the scanner and
parser specifications are written in the same attributed grammar file, which
usually has the .atg extension.
The purpose of the generated scanner is to perform lexical analysis on the
source program. It takes the text of the program as input, tokenizes it and
checks for lexical errors. Tokenization is the process of splitting the program
text into tokens and categorizing them. Tokens usually include identifiers,
keywords, numbers and symbols; they are the fundamental building blocks of a
program.
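To make the idea of tokenization concrete, the following is a minimal sketch in Python rather than the Coco/R-generated C# scanner used in the project. The keyword set, the token categories and the function name tokenize are assumptions chosen for illustration; they are not the project's actual token definitions.

```python
import re

# Hypothetical keyword set for a tiny C#-like subset (not the project's real list).
KEYWORDS = {"class", "int", "if", "else", "while", "return"}

# Each entry is (token kind, regular expression); earlier entries win.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("symbol", r"==|!=|<=|>=|\+\+|--|&&|\|\||[-+*/=<>!(){}\[\];,.]"),
    ("skip", r"\s+"),
    ("error", r"."),  # any other single character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into (kind, text) pairs; raise on a lexical error."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"  # keywords are identifiers with a reserved spelling
        tokens.append((kind, text))
    return tokens
```

For example, tokenize("int x = 10;") yields a keyword, an identifier, a symbol, a number and a closing symbol, while a stray character such as @ is reported as a lexical error.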
4.2 Parser
The generated parser handles syntax analysis for the source language. The
concern of the syntax analysis phase is to check that the input program adheres
to the grammatical rules of the source language. There are two major techniques
for parsing: table-driven and recursive descent. Coco/R uses the recursive
descent technique.
Recursive descent parsing is a well-known top-down parsing technique that is
simple and convenient, and that accomplishes the task efficiently so that the
next phase, semantic analysis, can begin. As the name suggests, top-down
parsing starts constructing the parse tree from the root and works its way
downwards, predicting from each next input token which production rule to use
and adding it to the parse tree. Each nonterminal of the grammar is implemented
as a subroutine whose control flow follows the right-hand side of its
production. These subroutines call one another recursively, which is the
primary characteristic of recursive descent parsing.
In general, however, a basic requirement of this parsing technique is that the
grammar should be in LL(1) form.
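The technique can be illustrated with a minimal sketch, written in Python rather than generated by Coco/R. The three-rule expression grammar, the Parser class and the tuple-shaped tree are assumptions made for this example only. Each grammar rule becomes one method, and every decision is made by inspecting a single look-ahead token, which is what the LL(1) requirement permits.

```python
class Parser:
    """Recursive descent parser for a small LL(1) expression grammar:

    Expr   = Term   { ("+" | "-") Term } .
    Term   = Factor { ("*" | "/") Factor } .
    Factor = number | "(" Expr ")" .
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]
        self.pos = 0

    @property
    def la(self):
        # The single look-ahead token on which every decision is based.
        return self.tokens[self.pos]

    def expect(self, text):
        if self.la != text:
            raise SyntaxError(f"expected {text!r}, found {self.la!r}")
        self.pos += 1

    def expr(self):
        node = self.term()
        while self.la in ("+", "-"):
            op = self.la
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.la in ("*", "/"):
            op = self.la
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.la == "(":
            self.pos += 1
            node = self.expr()  # recursion back into the top-level rule
            self.expect(")")
            return node
        if self.la.isdigit():
            value = int(self.la)
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected token {self.la!r}")


def parse(tokens):
    """Parse a list of token strings and return a nested-tuple syntax tree."""
    parser = Parser(tokens)
    tree = parser.expr()
    parser.expect("<eof>")
    return tree
```

Calling parse(["1", "+", "2", "*", "3"]) returns ("+", 1, ("*", 2, 3)), so operator precedence falls out of the way the rules call one another.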
LL(1) means that the input is read from Left to right, that the parser produces
a Leftmost derivation, and that only 1 look-ahead symbol is used. The grammar
that we have written for our source language, however, is not in LL(1) form.
Coco/R offers a number of solutions for grammars that are not in LL(1) form.
They are typically termed 'conflict resolvers' and include the following:
1. Multi-symbol Look ahead
2. Resolver Symbols
Multi-symbol Look ahead
In this technique the parser generated by Coco/R uses two global variables that
store the last recognized terminal and the current look-ahead symbol. When
there is a need to look ahead more than one symbol, this is done with the
ResetPeek() and Peek() methods of the generated scanner. The ResetPeek() method
sets peeking to begin from the symbol after the current look-ahead symbol. The
Peek() method returns the next symbol as a Token but does not remove it from
the input stream, so these symbols will be delivered again by the scanner when
parsing resumes.
To make it easier to look more than one token ahead, we have created a custom
function called Peek, built on the ResetPeek() and Peek() functions of Coco/R,
which returns the n-th token after the current look-ahead token.
Figure 4-B: Custom Coco/R function
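The behaviour described for the custom look-ahead helper can be sketched as follows. This is an illustration in Python, not the project's C# code: the Scanner class is a stand-in that only mimics the scan, peek and reset-peek behaviour described above over a list of tokens, and the name peek_nth is hypothetical.

```python
class Scanner:
    """Stand-in for a Coco/R-style scanner over a pre-tokenized list (assumption)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._next = 0  # index of the next token scan() will deliver
        self._peek = 0  # index of the next token peek() will deliver

    def _at(self, index):
        return self._tokens[index] if index < len(self._tokens) else "<eof>"

    def scan(self):
        """Consume and return the next token."""
        token = self._at(self._next)
        self._next = min(self._next + 1, len(self._tokens))
        self._peek = self._next
        return token

    def peek(self):
        """Return the next peeked token without consuming it."""
        token = self._at(self._peek)
        self._peek = min(self._peek + 1, len(self._tokens))
        return token

    def reset_peek(self):
        """Restart peeking from the token after the current look-ahead."""
        self._peek = self._next


def peek_nth(scanner, n):
    """Return the n-th token after the current look-ahead token (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    scanner.reset_peek()
    token = None
    for _ in range(n):
        token = scanner.peek()
    scanner.reset_peek()  # leave the scanner ready for the next resolver
    return token
```

With the tokens a, b, c, d and a as the current look-ahead, peek_nth(scanner, 2) returns c, and the following scan() still delivers b because peeking consumes nothing.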
Resolver Symbols
These are artificial tokens that are added in a separate section of the grammar
to help direct the parser the correct way. They are inserted on the fly at
parse time, when the resolution routine used by Coco/R considers it necessary.
These resolution routines are automatically put into the generated parser by
Coco/R.
During the parsing phase, an Abstract Syntax Tree (AST) is generated. All AST
nodes inherit from a common class called AstNode. Some nodes implement
IAstExpression, indicating that the node is an expression, while some implement
IAstHasExpression, which allows the expressions contained in that node to be
retrieved.
Figure 4-C: Sample AST Nodes
For simplification, AstBinaryExpression was created. It contains LeftOperand
and RightOperand, each of which returns an object implementing IAstExpression.
Other binary expressions, such as binary arithmetic and logical expressions,
derive from AstBinaryExpression.
Figure 4-D: Sample AST Binary Nodes
Figure 4-E: Sample AST Loop Nodes
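The relationships between the node types named above can be sketched as a small class hierarchy. The sketch below is in Python rather than the project's C#, so interfaces are modelled as abstract base classes. AstNode, IAstExpression, IAstHasExpression and AstBinaryExpression are the names used in the text; AstIntegerLiteral, AstBinaryArithmeticExpression, AstWhileLoop and the expressions() method are hypothetical names introduced only for this illustration.

```python
from abc import ABC, abstractmethod


class AstNode:
    """Common base class of every AST node."""


class IAstExpression(ABC):
    """Marker interface: the node is an expression."""


class IAstHasExpression(ABC):
    """Interface for nodes that expose the expressions they contain."""

    @abstractmethod
    def expressions(self):
        """Return the list of contained expressions (hypothetical method name)."""


class AstIntegerLiteral(AstNode, IAstExpression):  # hypothetical leaf node
    def __init__(self, value):
        self.value = value


class AstBinaryExpression(AstNode, IAstExpression, IAstHasExpression):
    """Shared base for all binary expressions."""

    def __init__(self, left_operand, right_operand):
        self.left_operand = left_operand
        self.right_operand = right_operand

    def expressions(self):
        return [self.left_operand, self.right_operand]


class AstBinaryArithmeticExpression(AstBinaryExpression):  # hypothetical subclass
    def __init__(self, operator, left_operand, right_operand):
        super().__init__(left_operand, right_operand)
        self.operator = operator


class AstWhileLoop(AstNode, IAstHasExpression):  # hypothetical loop node
    """A statement, not an expression, that still contains a condition expression."""

    def __init__(self, condition, body):
        self.condition = condition
        self.body = body

    def expressions(self):
        return [self.condition]
```

Keeping the operands in one shared base class means a later phase can walk any binary node the same way, whether it is arithmetic or logical.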
4.3 Semantic Analyzer
Semantic analysis is the phase of the compilation process that follows the
parsing phase.
Once the scanning and parsing phases have been completed, the source code has
been checked for lexical and syntax errors. The next step is to check that the
program is semantically correct as well, because not all properties of a
program can be expressed in a context-free grammar (CFG).
This task is aided by semantic actions, which are added to the grammar in a
format that Coco/R supports.
The checks performed during this phase include type checking, scoping of
variables, ensuring that constant values are not changed, ensuring that classes
and methods are not redefined within their respective scopes, and
initialization of variables and fields.
Moreover, the source language C# does not allow an identifier to be used before
it is declared. Since C# is a strongly typed language in which type errors are
detected at compile time, the compiler has to know the type of an identifier
before the identifier is used.
When the compiler encounters the declaration of an identifier, it stores the
type information assigned to that identifier. Later in the program, when the
compiler examines an expression containing this identifier, the expression is
verified against that type information. For example, the following fragment of
C# is syntactically correct but semantically wrong and will give a compiler
error.
Figure 4-F: Semantic Error Code Fragment
In this example, the identifier x is used without being declared. When the
compiler encounters the expression x = 10, it compares the types of the
operands and checks whether the identifier x is assignable. Since x is not
declared before this expression, the compiler does not have the type
information of x and cannot perform either check. It therefore reports a
compile-time error to the programmer.
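The check described here is commonly implemented with a symbol table. The following Python sketch is an assumption about one possible structure, not the project's actual semantic analyzer: SymbolTable, SemanticError and check_assignment are hypothetical names, and types are represented simply as strings.

```python
class SemanticError(Exception):
    """Raised when a program is syntactically valid but semantically wrong."""


class SymbolTable:
    """Nested scopes that map identifier names to type names."""

    def __init__(self):
        self._scopes = [{}]  # the innermost scope is last

    def enter_scope(self):
        self._scopes.append({})

    def leave_scope(self):
        self._scopes.pop()

    def declare(self, name, type_name):
        scope = self._scopes[-1]
        if name in scope:
            raise SemanticError(f"'{name}' is already declared in this scope")
        scope[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"'{name}' is used before it is declared")


def check_assignment(symbols, name, value_type):
    """Verify that a value of value_type may be assigned to the identifier name."""
    target_type = symbols.lookup(name)
    if target_type != value_type:
        raise SemanticError(
            f"cannot assign {value_type} to '{name}' of type {target_type}"
        )
    return target_type
```

With an empty table, check_assignment(symbols, "x", "int") raises SemanticError because x has no recorded type, which mirrors the x = 10 example. After symbols.declare("x", "int") the same call succeeds.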
Once the semantic analysis process has been completed the source program is ready
to move on to the code generation phase.
4.4 Code Generator
After the AST has been created and has passed semantic analysis without errors,
the appropriate LLVM IR is generated.
Figure 4-G: Sample C# Code Fragment
The above C# code fragment would be translated to LLVM IR similar to the
following code.
Figure 4-H: LLVM IR Equivalent of the C# Code Fragment
    ; Declares a global string constant
Comments in LLVM begin with a semicolon and terminate at the end of the line.
    declare i32 @printf(i8*, ...) nounwind
This line, at the end of the sample generated LLVM IR, declares the function
printf, whose first parameter is a pointer to an 8-bit integer, followed by a
variable number of arguments.
Our generated code needs system calls to the operating system in order to print
text with the C# function Console.WriteLine, so the LLVM IR needs some
mechanism for asking the operating system to write the text. Other features,
such as returning the exit code to the operating system, also require system
calls. This could be achieved by hardcoding architecture-specific and
operating-system-specific assembly code in the LLVM IR. To achieve portability
among different systems, however, the code generator will make use of the
Standard C Library, which can be linked to the generated LLVM code during the
link phase.
Because the printf function exists in the Standard C Library, its body is not
defined in the LLVM IR. Just as a method in C# can be called before its
declaration appears in the source, LLVM IR allows a function to be called
before it is declared. This is why the declaration of printf can be appended to
the end of the generated LLVM IR.
    @.str = internal constant [4 x i8] c"%d\0A\00"
This code creates a global variable called .str, an array of 8-bit integers
whose size is 4. @ denotes a global variable in LLVM. Since LLVM supports
arbitrary bit widths for integers, ranging from 1 bit to 2^23 - 1 bits
(approximately 8 million), the size must be stated explicitly in the integer
type. (LLVM code generation does not support large integer types as function
return types. The specific limit on how large a return type the code generator
can currently handle is target dependent; currently it is often 64 bits for
32-bit targets and 128 bits for 64-bit targets [8].) The elements of the string
are 8-bit integers because the size of 'char' in standard C is 8 bits.
"%d\0A\00" is the LLVM equivalent of the C string "%d\n\0". Special characters
in LLVM are escaped using "\xx", where xx is the ASCII code of the character in
hexadecimal.
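The escaping rule can be sketched as a small helper that turns a source string into a global string constant of the form shown above. This is an illustration in Python, not the project's code generator; the function name llvm_string_constant is hypothetical, and it assumes ASCII-only text, in line with the 8-bit char decision in the scope.

```python
def llvm_string_constant(name, text):
    """Return an LLVM IR definition of a NUL-terminated 8-bit string constant."""
    data = text.encode("ascii") + b"\x00"  # append the terminating NUL byte
    escaped = "".join(
        # Printable characters are kept, except the double quote (0x22) and
        # the backslash (0x5C); everything else becomes \xx in hexadecimal.
        chr(byte) if 0x20 <= byte < 0x7F and byte not in (0x22, 0x5C) else f"\\{byte:02X}"
        for byte in data
    )
    return f'@{name} = internal constant [{len(data)} x i8] c"{escaped}"'
```

Calling llvm_string_constant(".str", "%d\n") reproduces the line discussed above: the array length 4 counts the two printable characters, the newline and the terminating NUL.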
    define void @PrintSquare(i32 %n) nounwind
    {
        ; code omitted for brevity
    }
The above block of code contains the definition of the PrintSquare function,
which accepts a 32-bit integer as its parameter and whose return type is void.
The 'nounwind' keyword states that the function never returns through unwind or
exception control flow. If the function does unwind, its runtime behavior is
undefined.
    %n_addr = alloca i32
The above statement allocates memory for a 32-bit integer in the stack frame of
the function; this memory is released automatically when the function returns
to its caller. The instruction returns a pointer to the allocated memory, which
is stored in the local variable n_addr. The '%' sign indicates that the
variable is local.
    store i32 %n, i32* %n_addr
This statement copies the integer value of the local value n to the memory
location pointed to by n_addr.
    %0 = load i32* %n_addr
The above code fragment copies the integer value stored at the memory location
pointed to by n_addr into a local variable named 0 (zero). Variables whose
names are numeric are referred to as unnamed temporaries in LLVM.
    %2 = mul i32 %0, %1
The mul i32 instruction multiplies the 32-bit integers in unnamed temporaries 0
and 1 and stores the result in unnamed temporary 2.
    %4 = call i32 (i8*, ...)* @printf(i8*
        getelementptr ([4 x i8]* @.str, i32 0, i32 0),
        i32 %3) nounwind
The "getelementptr" instruction calculates the address of the first element of
the global variable .str and does not access memory. The "call" instruction
calls the function named printf, passing it the calculated address of .str
along with the integer value stored in unnamed temporary 3.
4.5 Assembling and Linking
After the LLVM IR has been generated, it is the user's responsibility to
assemble and link it into the appropriate binary executable. The LLVM IR
generated by our compiler can be compiled to LLVM bitcode. With the help of GNU
binutils, it can be compiled further to native code, or architecture-specific
assembly code can be generated from it. These tools are open source and can be
executed on a wide variety of architectures and operating systems. For Windows,
we will be using the official LLVM tools, while for GNU binutils we will be
using the version from MinGW, as it provides direct compatibility with Windows.
5 Gantt Chart
6 Refer ences
[1]. Intro - D Programming Language - Digital Mars. D Programming Lauage -
Digital Mars. [Online] [Cited: August 5, 2009.] http://www.digitalmars.com/d/.
[2]. SharpOS Wiki. [Online] [Cited: August 14, 2009.] http://www.sharpos.org.
[3]. Singularity - Microsoft Research. [Online] [Cited: August 14, 2009.]
http://research.microsoft.com/en-us/projects/singularity/.
[4]. LLVM Compiler Infrastructure Project. LLVM Compiler Infrastructure Project.
[Online] [Cited: August 5, 2009.] http://llvm.org/Features.html.
[5]. Lattner, Chris and Adve, Vikram. The LLVM Compiler Infrastructure Project.
The LLVM Compiler Infrastructure Project. [Online] March 2004. [Cited: August 8,
2009.] http://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf.
[6]. Cosmos. [Online] [Cited: August 14, 2009.] http://www.gocosmos.org.
[7]. [Online] [Cited: August 14, 2009.]
http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-334,%201st%20edition,%20December%202001.pdf.
[8]. LLVM Assembly Language Reference Manual. The LLVM Compiler
Infrastructure Project. [Online] [Cited: August 10, 2009.]
http://llvm.org/docs/LangRef.html.
[9]. Standard ECMA-334 - C# Language Specification. [Online] [Cited: August 12, 2009.]
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-334.pdf.
7 Appendix
7.1 LLVM C# Compiler EBNF
LLVMCSharp = {UsingDeclaritive} {NamespaceMember}.
UsingDeclaritive = "using" Qualident ";".
NamespaceMember = ( "namespace" Qualident "{" {NamespaceMember} "}"
    | {TypeModifiers} TypeDecl ).
Qualident = ident {"."} ident.
TypeModifiers = "public" | "protected" | "private" | "sealed".
TypeDecl = ( "class" ident [ClassBase] ClassBody [";"]
    | "struct" ident [Base] StructBody [";"]
    | "enum" ident [":" IntType] EnumBody [";"] ).
ClassBase = ":" ClassType.
ClassType = Qualident | "object" | "string".
ClassBody = "{" { {MemberModifier} ClassMember } "}".
MemberModifier = "override" | "private" | "sealed" | "static" | "extern" | "virtual".
ClassMember = StructMember.
StructBody = "{" { {MemberModifier} StructMember } "}".
StructMember = "const" Type ident "=" Expr {"," ident "=" Expr} ";"
    | ident "(" [FormalParams] ")" [ConstructorCall] (Block | ";")
    | ("implicit" | "explicit") "operator" Type "(" Type ident ")" (Block | ";")
    | TypeDecl
    | Type "operator" OverloadableOp "(" Type ident ("," Type ident | ) ")" (Block | ";")
    | Field {"," Field} ";"
    | Qualident "(" [FormalParams] ")" (Block | ";")
    | "{" Accessors "}".
Base = ":" Qualident.
IntType = "int" | "char".
EnumBody = "{" EnumMember {"," EnumMember} "}".
EnumMember = ident ["=" Expr].
Type = SimpleType | ClassType.
SimpleType = IntType | "bool" | "float".
ConstructorCall = ":" ("base" | "this") "(" [Argument {"," Argument}] ")".
Block = "{" {Statement} "}".
Statement = ( "const" Type ident "=" Expr {"," ident "=" Expr}
    | LocalVarDecl ";"
    | EmbeddedStatement ).
LocalVarDecl = Type LocalVar {"," LocalVar}.
LocalVar = ident ["=" Init].
EmbeddedStatement = Block
    | ";"
    | StatementExpr ";"
    | "if" "(" Expr ")" EmbeddedStatement ["else" EmbeddedStatement]
    | "while" "(" Expr ")" EmbeddedStatement
    | "do" EmbeddedStatement "while" "(" Expr ")" ";"
    | "for" "(" [ForInit] ";" [Expr] ";" [ForInc] ")" EmbeddedStatement
    | "break" ";"
    | "continue" ";"
    | "return" [Expr] ";".
ForInit = LocalVarDecl | StatementExpr {"," StatementExpr}.
ForInc = StatementExpr {"," StatementExpr}.
StatementExpr = Unary AssignOp Expr.
AssignOp = "=" | "+=" | "-=" | "*=" | "/=".
Expr = Unary (OrExpr | AssignOp Expr).
OrExpr = AndExpr {"||" Unary AndExpr}.
AndExpr = EqlExpr {"&&" Unary EqlExpr}.
EqlExpr = RelExpr {("!=" | "==") Unary RelExpr}.
RelExpr = AddExpr {("<" | ">" | "<=" | ">=")} Unary AddExpr | "is" Type.
AddExpr = MulExpr {("+" | "-") MulExpr}.
MulExpr = {("*" | "/") Unary}.
Unary = {("+" | "-" | "!" | "++" | "--" | "(" Type ")")} Primary.
Primary = ( ident
    | Literal
    | "(" Expr ")"
    | ("bool" | "char" | "float" | "int" | "object" | "string") "." ident
    | "this"
    | "base" ("." ident | "[" Expr "]")
    | "new" Type ("(" [Argument {"," Argument}] ")" | ArrayInit)
    | "typeof" "(" Type ")"
    | "sizeof" "(" Type ")" )
    {"++" | "--" | "." ident | "(" [Argument {"," Argument}] ")"}.
Literal = integerConstant | realConstant | characterConstant | stringConstant
    | "true" | "false" | "null".
OverloadableOp = "+" | "-" | "!" | "++" | "--" | "true" | "false" | "*" | "/"
    | "==" | "!=" | ">" | "<" | ">=" | "<=".
Field = ident ["=" Init].
FormalParams = Par ["," FormalParams].
Par = Type ident. // ref and out not supported
Accessors = GetAccessor | SetAccessor.
GetAccessor = ident (Block | ";").
SetAccessor = ident (Block | ";").
Argument = Expr.
Init = Expr | ArrayInit.
ArrayInit = "{" [Expr {"," Expr}] "}".
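To illustrate how productions in this grammar could map onto a hand-written recursive-descent parser, the following Python sketch parses only the UsingDeclaritive production. It is an illustration under stated assumptions and not the parser of the proposed compiler: the names tokenize and Parser are hypothetical, the tokenizer recognises only identifiers, "." and ";", and Qualident is read here as ident {"." ident}, which is an assumed interpretation of the production above.

```python
import re

# One token: an identifier/keyword, or one of the symbols "." and ";".
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[.;])")


def tokenize(text):
    """Split source text into identifiers/keywords and the symbols '.' and ';'."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        # Consume one terminal symbol or report a syntax error.
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    def ident(self):
        token = self.peek()
        is_name = token is not None and (token[0].isalpha() or token[0] == "_")
        if not is_name or token == "using":
            raise SyntaxError(f"expected identifier, found {token!r}")
        self.pos += 1
        return token

    def qualident(self):
        # Qualident, read here (assumption) as: ident {"." ident}
        parts = [self.ident()]
        while self.peek() == ".":
            self.expect(".")
            parts.append(self.ident())
        return parts

    def using_declaritive(self):
        # UsingDeclaritive = "using" Qualident ";".
        self.expect("using")
        name = self.qualident()
        self.expect(";")
        return name


parser = Parser(tokenize("using System.Collections;"))
print(parser.using_declaritive())   # prints ['System', 'Collections']
```

Each nonterminal becomes one method, a repetition in braces becomes a while loop, and each quoted terminal becomes a call to expect.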