practical interpreter construction - leanpubsamples.leanpub.com/pic-sample.pdf ·...

114

Upload: others

Post on 16-Oct-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper
Page 2: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Practical Interpreter ConstructionLearn How to Craft An Interpreter Practical Way

Mehmet Emin Coşkun

This book is for sale at http://leanpub.com/pic

This version was published on 2015-07-19

This is a Leanpub book. Leanpub empowers authors and publishers with the LeanPublishing process. Lean Publishing is the act of publishing an in-progress ebookusing lightweight tools and many iterations to get reader feedback, pivot until youhave the right book and build traction once you do.

©2015 Mehmet Emin Coşkun

Page 3: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Tweet This Book!Please help Mehmet Emin Coşkun by spreading the word about this book onTwitter!

The suggested hashtag for this book is #learnhowinterpreterswork.

Find out what other people are saying about the book by clicking on this link tosearch for this hashtag on Twitter:

https://twitter.com/search?q=#learnhowinterpreterswork

Page 4: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Contents

Chapter 1 - Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1Interpreters Are Everywhere . . . . . . . . . . . . . . . . . . . . . . . . . 2Is It That Difficult to Write an Interpreter? . . . . . . . . . . . . . . . . . 4How We Will Make it Easy . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Chapter 2 - Parsing and Calculating Expressions . . . . . . . . . . . . . . 8Parsing Arithmetic Expressions . . . . . . . . . . . . . . . . . . . . . . . 9

The Magic in The Calculator . . . . . . . . . . . . . . . . . . . . . . 9PacMan Eating Characters: The Tokenizer . . . . . . . . . . . . . . . 29Better Token Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 55

Chapter 4 - Parsing Programming Language Constructs . . . . . . . . . . 65Statements and Code Blocks . . . . . . . . . . . . . . . . . . . . . . . . . 66While Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

Page 5: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation

1

Page 6: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 2

Interpreters Are Everywhere

• Ever thought of how your mobile phone gets the number from an SMS orWhatApp message? It parse the text and get the number from. Similar to this,smilies work the same way: an interpreter gets the text, parses it and interpretsthat there is a smiley in the text and then shows you an image of that smiley

• Do you use a spreadsheet software like Excel? If so, you do some calculations.It calculates the cell values first by parsing and then interpreting that sourceaccording to math rules and according to Excel functions

• If you use any sort of command line, like Linux’s Bash shell or Mac OS’sterminal, you know it takes the commands first and then understands them,then finally executes them. The command line does that by interpreting thecommands. Every operating system shell is an interpreter. Some of them arecapable of executing scripts, such as bash, tcsh, zsh…

• I am writing this book using Markdown markup language. To produce thefinal html I use one of the Markdown interpreters. It takes the text writtenusing Markdown syntax and produces the html output that I can use for thisbook’s web site

• Almost every application has a configuration file. You can change the systems’behaviour by changing that file. The system takes the configuration file,processes it according to commands in it. This component in the system isan interpreter.

• The web browser you use every single day interpretes HTML, JavaScript andCSS to present you beautiful web pages.

• A server that exchanges messages has a fixed structure. You need to parsemessages and take actions on the server side according to the message payload.

• Pattern recognition software that highlights plagerism must also use aninterpreter to highlight problems.

• All template engines are interpreters. If you need to modify and write a newtemplate engine for any purpose either for web development or any otherenvironment, you will need an interpreter

Apart from above examples, languages like Python, PHP, Ruby, JavaScript, CSS,HTML, etc. all include a parser and interpreter that can be use more effectively

Page 7: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 3

once you learn how parsers and interpreters work. With hardware getting cheaperand cheaper everyday, interpreters are becomming more popular as they allowdevelopers to use more abstract languages that are easier to produce complexprograms in. Like every abstraction in the computer world, interpreters help us toget the problem at a more understandable level with interpreted languages.

I think you have got better understanding about interpreters now, and more impor-tantly, you know the benefits of learning how interpreters work.

Page 8: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 4

Is It That Difficult to Write an Interpreter?

The answer is: No.

It is not difficult to write your own interpreter and your own programming language.

So, fear not if you have studied this area before but found yourself lost in the plethoraof daunting and complex theory.

If you decide to study this area, then you will notice there is a lot of theory in it.Like anything and everything, there are more ways than one to get to a goal. So, itincreases the complexity by adding the dilemma of too many choice to this theoryheavy subject.

Books in this area are generally compiler-centered books. Of course most of themtells you how to write an interpreter and include certain chapters regarding differentcomponents of interpreters and compilers. There is no problem with this approach,but the fact is, it makes the books longer. Let’s all admit it, the longer, the less you’dlike to read it.

Life is always busy and you have a life apart from computers. I have kept that inmind while writing this book and focused on simple and concise explainations.

Furthermore, many of these books are filled with dense complicated calculus withlittle explaination. They assume you’ve had prior exposure to the theory beforehandling the text, and fail to explain many core concepts.

But a lot of programmers learn not by reading, they learn by doing. The mostimportant and valuable learning tool for them is readingwell written software sourcecode.

I am the same. I learn by doing, by reading interesting code. So, I am not into readingtheory centered, long and difficult to understand books.

This book is different. I learned by writing my own interpreter from the ground up.This knowledge has let me cut out the trouble and confusion that I had to contendwith.

This book is more than an interpreter book. It is a motivational book that shows youthat things do not have to be complicated to study, learn and apply.

Page 9: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 5

HowWe Will Make it Easy

• I am a pragmatic programmer and I am into practical things. Theory isimportant of course, nobody can neglect it. Lets be very aware of that theoryand practice are intertwined, we shouldn’t break them up.But let’s accept that we have deadlines, we have families, we have hobbies,things to do, and most importantly, a life. For writing an interpreter or acompiler, I don’t think that we have to read long books or to study everysingle detail of them. So, I am going to show you how to write a workinginterpreter by writing real code. We will learn by doing along with this bookusing concrete examples. Plus, there are a lot of programmers who are self-taught or who have been away from computer science theory for long time.Weneed to use a language that they can understand. Programming is an intuitivething that one must practice it understand. So, let’s become less formal andmore practical.

• Anything can be improved but we have to be aware of that there is no endto editing. So, if we keep changing things to improve then we can not reach areasonable end.True artists know when and where to stop by identifying a goal or target.We are going to write an interpreter that is working and useful, but we arenot going to labor endlessly to make it perfect. We need to start somewhereto reach our goals. If we let ourselves dwell on the size and complexity ofcompilers and interpreters, we’ll never get started actually making one. Mybook makes you start, learning enough theory and practice to write your owninterpreter. Besides, you will learn to imrprove your arsenal of skills in termsof tokenizers, parsers, interpreters and compilers. We are not going to optimizeand focus on beautifying our code for style sake.

• Compilers and interpreters are such a deep topic. So, it is easy to get lost if youdo not define the topic set that you will study. I made this book with the intentof focusing on interpreters since I want you to learn how to build tokenizers,parsers and interpreters. I consciously did not incorporate compilers to keepthis book short, and the content concise. Those topics I include in this book areessential and when you learn them, you will be in a good shape to continueinto more elaborate topics.

• I will show you working examples in every single chapter. So, you don’t have

Page 10: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 6

to wait to comprehend through example or to produce working code. Gettinginstant results is important to staying motivated.

• The method that I will show you to parse the source code is referred to asrecursive descent parsing.Recursive descent parsing is the easiest and most intuitive method of parsing.One can claim that there are more effective methods than recursive descentparsing but don’t rush to believe it is so. There are many real world examplecompilers and interpreters that are built using recursive descent parsing andthey work quite effectively.One of the most famous example is the GNU Compiler Collection that waswritten from scratch using recursive descent parsing tecnique. You can find thedetails about it here: GCC 3.4 Release Series, Changes, New Features, and FixesCheck out the section that starts with “A hand-written recursive-descent”. Plus,every technique has its own advantages according to place that it is used. Ourtechnique -recursive descent parsing- is elegant enough for us.

• I am using Java since most modern programmers are familiar with it or asimilar syntax and structure. Most programmers are able to be get used toit quickly too. I consciously have not used C as it can be very difficult toread programs written in C. Plus, Java has features that will save us writingsome bare minimums and memory management early on. You will see Lispderivative languages in many places when you start to study compilers. Someprogrammers find it easy to parse and interpret things using list processing. Imay agree with them but there is a fact that, most programmers do not preferusing Lisp. Many of them never programmed in Lisp. It may be Common Lisp,Scheme or any similar in the functional programming world.We had interpreters and compilers without functional languages, like Lisp, andwe will have them in the future.Let’s be realistic, you can build working and smart interpreters in C, JAVA, C#,Ruby, Perl or in similar languages that are “Turing Complete”.

• I am not concerned with using object oriented programming techniques forthis book, as they would add complications and cloud the clarity of the lesson.As Joel Spolsky says, “Don’t Let Architecture Astronauts Scare You” Objectoriented programming is a useful paradigm, but it is not a must-have forevery project. Plus, imperative techniques are easier to understand and createdemonstrations in.

Page 11: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 1 - Motivation 7

Of course, I use classes and objects, but I am not over extending their use orforcing inheritance and polymorphism. Let’s remember, we are here to learn,not to develop perfectly to style constraints.

• The grammar of the programming language that I will write an interpreter forin this book is fixed.We’re not trying to parse every possible grammar for everypossible language. It will save us from digressing and also it will help focus us.Some may say that there is no point in an interpreter for a single grammar, butI pose that learning, parsing, and interpreting one language readies you to doso with more. With recursive descent parsing, you can write interpreters formany sorts of grammars. Once we’ve completed this interpreter, you will beable to extend it for more languages and grammars.

• I have written this book with the language I would use if I were to be thereteaching you these lessons in person.

• We need all type of programmers. It is an excellent reason for learning parsing,interpreters, and compilers. Nobody has to be an academic or an expert inthis field. Even if you were to never write a compiler or interpreter as part ofproduction code, it will help you become a better programmer through betterunderstanding. Big or small, better or best, any program, script or applicationthat improve daily life is useful. We are not seeking perfection, we are learning.

• Finally, I am not including any compiler tools in this book. There are somebooks that tell you how to construct a compiler and interpreter using yacc, lex,antlr, llvm or any one ofmany compiler construction kits. Try to search for howto construct an interpreter or ask on forums like Stackoverflow and the like. Allyou can get is advise that you should use this tool, that tool, this compiler kit,that compiler library. Instead of showing you full working examples, peopletend to point out deeper things; far from practical advice. Besides, one shouldlearn essentials first. Without essentials, the abstractions that tools provide youhide the important aspects that are worth your time to know. Plus, recursivedescent parser is most intuitive technique to implement among other parsingtechniques. So, you will craft your interpreter without having to learn anyother extra tools.

We are clear now. Let’s get started with a simple interpreter: A Calculator.

Page 12: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing andCalculating Expressions

8

Page 13: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 9

Parsing Arithmetic Expressions

The Magic in The Calculator

We all use calculators. they are in our computers, mobile phones, software that weuse or a handy one in our desk that some of use use every single day. Have you evernoticed that some calculators are a bit different than others? that they can understandand calculate things like this:

(3+7*8)-1

Normally, in an ordinary calculator you need to enter first 7*8 and then add 3 to itand then subtract 1 from that. I know some of them some memory functions thatyou can save some interim results but you need to enter the expression anyway.

For the above expression, the difference is that you can enter the entire expression atonce, then get the calculation result. Scientific calculators can handle full expressionsat a time. Plus, some of them compute expressions with variables like:

x = a * b + 2

In this expression, the first thing to calculate is multiplication, then addition, andfinally subtraction.

We call it “operator precedence.” It is a very important concept that can be seeneverywhere in arithmetic expressions. While simple calculators do arithmatic theway I explained above, the scientific calculators take the above expression in whole,then give you the result directly so that you don’t need to remember any operatorprecedence or interim values.

It is because those smart calculators are able to interpret what you enter. Congrat-ulations; you have met one of the most used interpreters.

In this chapter, I am going to teach you how to program an intelligent calculator likethose ones. With our calculator, we will enter any expresssion and it will give us theresult directly.

Page 14: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 10

Our method is iterative. That means we will start with simple steps and wewill advance gradually. Let’s start with addition and subtraction and then addmultiplication and division.

To program this, here is what we need:

• A method for getting (reading) numbers from an input string• A method that recognizes the operator and calculates the result according tothat operator

• A method presents the result

We are going to use one string, one integer and and one character as global variables.These are for respectively:

• expression (String): The string that contains our expression• currentCharPosition (int): The index number of the character that we arehandling from expression string

• Look (char): The character variable that holds the character that we read fromexpression, one by one

Let’s write our class and first method. We will also write a main method todemonstrate it. Our first class is for reading a character from experssion string.

Here we are:

class Calculator

{

public static String expression = "";

public static int currentCharPosition = 0;

public static char Look;

public static void GetChar()

{

if (currentCharPosition < expression.length())

Look = expression.charAt(currentCharPosition);

currentCharPosition++;

Page 15: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 11

}

public static void main(String args[])

{

expression = "1";

GetChar();

System.out.println(Look);

}

}

Lets compile and run our first program. Here is the result:

1

It works. We have got a working program that reads a single number from ourexpression.

I want to explain the only method in the class: GetChar. It reads a single characterfrom the string that we then put it to Look global variable. Next, it increments thecurrentCharPosition variable. So, as we read more chars using GetChar method, wewill be able to know whether or not there are more characters to read by comparingthe currentCharPosition with the expression string’s length.

Now, lets put an operator into our expression string and read one more characterfrom it. Here is our new main method:

public static void main(String args[])

{

expression = "1+";

GetChar();

System.out.println(Look);

GetChar();

System.out.println(Look);

}

Let’s compile and execute the program.

Page 16: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 12

1

+

Our program is able to read characters from the expression string one by one. Let’sadd one more number to the expression string and compile & run.

Here is the new main method:

public static void main(String args[])

{

expression = "1+2";

GetChar();

System.out.println(Look);

GetChar();

System.out.println(Look);

GetChar();

System.out.println(Look);

}

1

+

2

So far so good. We are able to read the characters in the expression. Now, it’s timeto convert them something meaningful. Let’s write a method that gets a digit fromthe expression and make a number from it. Here is our GetNum method:

public static int GetNum()

{

int number = new Integer(Look+"").intValue();

GetChar();

return number;

}

Let’s add it to Calculator class.

Page 17: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 13

All it does is taking the character that we stored into global Look variable andconverting it to an integer. Note that it calls GetChar to read next character in theexpression string and put it into Look variable.

Before using it, we need an initializer method that reads first digit in the expressionstring and put it to Look variable. Otherwise GetNum method gets a null value sincethere is nothing in the Look variable in the beginning. Here is our Init method, all itdoes is call GetChar:

public static void Init()

{

GetChar();

}

And here is the new main method:

public static void main(String args[])

{

int result = 0;

expression = "1+2";

System.out.println("Expression: " + expression);

Init();

int firstNumber = GetNum();

char operator = Look;

GetChar();

int secondNumber = GetNum();

System.out.println("First Number: " + firstNumber);

System.out.println("Operator: " + operator);

System.out.println("Second Number: " + secondNumber);

int sum = firstNumber + secondNumber;

System.out.println("SUM of Those Two Number: " + sum);

}

Update the program with these changes and then compile & run it.

Here is the output:

Page 18: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 14

Expression: 1+2

First Number: 1

Operator: +

Second Number: 2

SUM of Those Two Number: 3

We are able to read first number, then read addition operator and then secondnumber, and our calculator presents us the sum of those two numbers.

We can apply same technique for subtraction.

I think you’ve noticed a problem here: what if we need to calculate expressions thatincludes arbitrary amount of numbers and operators? For example:

expression = 1+2+5+7+3

We need a better approach.

I think you can guess that we need to use a loop since we have got more things toread and calculate. But then we will face with another need: in the loop, we needto check out if the operator is addition or subtraction. Then, when we decide to addmultiplication and division features to our calculator, yet again it needs to be checkedout multiplication and division operators in the loop.

There is a better approach, but before getting into that, let me tell you something im-portant that will help us a lot when calculating mathematical expressions: anythingand everything in nature has a structure and pattern. Look around carefully, you willsee that every single thing hides a structure and pattern inside itself.

Look petals on flower, scales on a fish, spirals on a snail’s shell, words and sentencesin the languages and like many other things then you will see patterns.

Mathematical expressions are no different. They obey a pattern. Actually, mathe-matics is how we model patterns we see in the nature, so we can calculate and makepredictions based on these models. For now, that is as far as we will go into the theorybehind pattern recognition.

So, our simple expressions have a pattern as well. Lets have a look this pattern:

Page 19: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 15

1+2

This simple expression tells us:

• First a number• Then an operator• Then a number again

Lets have a look subtraction

5-2

Pattern is same:

• First a number• Then an operator• Then a number again

There is a formula here. This formula is the same for both addition and subtraction.We can input any number to this formula; the numbers and operators may changebut the formula remains the same.

It gives us an idea as to how we can model mathematical expressions using thisformula.

Let’s refer to the first and second numbers as “Terms”. So, our formula becomes:

ArithmeticExpression = Term + Term

or

ArithmeticExpression = Term - Term

Lets express it in one single formula:

Page 20: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 16

ArithmeticExpression = Term +- Term

Developing a formula is good, but how are we going to program it? Let me helpyou by giving you this tip: Let’s write a method for each element in the formula.That means we will write methods for handling ArithmeticExpression, Term andoperators (+-).

Here are our new methods:

public static void MatchAndEat(char chr)

{

if (Look==chr) GetChar(); else

{

System.out.println("Error: Unexpected character.");

System.exit(0);

}

}

public static int Term()

{

return GetNum();

}

public static int Add()

{

MatchAndEat('+');

return Term();

}

public static int Subtract()

{

MatchAndEat('-');

return Term();

}

public static int ArithmeticExpression()

{

int result = Term();

switch(Look)

{

Page 21: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 17

case '+':

result = result + Add();

break;

case '-':

result = result - Subtract();

break;

}

return result;

}

public static void main(String args[])

{

expression = "1+9";

Init();

System.out.println(ArithmeticExpression());

}

Lets compile & runn it.

10

We input 1+9 and it returned 10. It has calculated correctly. Let’s try it with 5-2,change the expression string to

expression = "5-2";

and compile & run it:

3

The calculation is correct again. Before diving into the explaination, I want to explainthe MatchAndEat method. I choose this name because it preforms two importantfunctions:

• Checks if the global Look variable contains the expected character given as themethod parameter

Page 22: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 18

• If it matches, it reads the next character from the expression string, otherwiseit terminatesthe program

We will use this method to compare what is coming up in the string with what weexpect to see in the expression. This way, we will be able to check if the expressionhas the correct structure in accordance with our formula.

The technique in the MatchAndEat method is important. We will use it throughoutthe book for verifying and enforcing the formula and pattern that we define.

Now, let’s get back to our first working calculator. The progam code is very closeto formula, but there is a bit more to do. In the Expression method, we get the firstTerm, then we expect one of + or - operators. Depending on if the Operator is + or -will determine whether we will call Subtract or Add, then we get second Term.

Here is how the ArithmeticExpression() method works:

• Call Term method to get first Term• Check out global Look variable with switch statement• If it is +, call Add• If it is -, call Subtract• Return result

In the Add and Subtract methods: MatchAndEat() method checks if it is expectedoperator. If so, it takes to second Term of the expression.

Let me ask, what if we have an expression like this:

9+2+5-3

or

9-7+3

Page 23: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 19

You will remember that I proposed using loop to process more numbers than two,but then we digressed to find out a way that we can use a general formula for eitheraddition or subtraction.

The trick is, since we have got the formula for calculating expressions, we can clonethat formula in a loop, so we can calculate any amount of numbers.

Think of our formula as a black box that we use several times to calculate anexpression.

Where are we going to place this loop? I think we will get first number, that meanswe will place Term method first and then we will place a loop to execute addition orsubtraction according to upcoming operator.

I think it is good idea to put that loop before switch statement. So, after getting firstTerm, we can get arbitrary amount of Terms by calling Add() or Subtract() methodsin the loop.

Here is our new ArithmeticExpression method:

public static int ArithmeticExpression()

{

int result = Term();

while ( (Look == '+') || (Look == '-') )

{

switch(Look)

{

case '+':

result = result + Add();

break;

case '-':

result = result - Subtract();

break;

}

}

return result;

}

And here is our new main method:

Page 24: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 20

public static void main(String args[])

{

expression = "9+2+5-3";

Init();

int result = ArithmeticExpression();

System.out.println("Result: " + result);

}

Lets compile & run the program.

Result: 13

It works like a charm.

We can calculate any expression with addition or subtraction or both and with anyamount of numbers.

What about multiplication and division? Let’s recall the formula for addition andsubtraction:

ArithmeticExpression = Term +- Term

You will notice that all arithmetic expressions are based on this formula. The Termcan be a number, a variable, a function or something similar that you can calculateit first than add or subtract to or from another Term.

This feature makes us thinking that we can put a multiplication or division resultinto a Term. Here is an example:

ArithmeticExpression = 8 + 3

T = 8

T = 3

or

Page 25: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 21

T = 8 = 2 * 4

T = 3 = 6 / 2

So, as we did for ArithmeticExpression before, we can formulate the Term as well,like this:

Term = Number * Number

or

Term = Number / Number

Let’s make it short:

Term = Number */ Number

We can name numbers in the Term formula as “Factor”. So, our formula becomes

T = Factor */ Factor

Here is forumations for both ArithmeticExpression and Term:

ArithmeticExpression = Term +- Term

Term = Factor */ Factor

ArithmeticExpression comprises Terms, and Terms comprise Factors. That meansthe ArithmeticExpression method will include terms and +- operators, the Termmethod will include factors and */ operators, and the Factor method will consistof the GetNum method call that it takes a character from expression and put it intoLook.

You are probably wondering how our calculator will know that multiplicationand division have a higher order of precident than addition and subtraction sincewe never told the calculator this. In truth, our formula makes it guaranteed thatmultiplication and division operators have higher precedence than addition anddivision since the method call chain is

Page 26: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 22

ArithmeticExpression --> Term --> Factor

By using the right formula, we were able to write our code in a way that alreadyreflects the correct order of operations.

Here are our new methods, let’s preform these changes in our program, compile, andrun it.

Factor, Multiply and Divide methods:

public static int Factor()

{

return GetNum();

}

public static int Multiply()

{

MatchAndEat('*');

return Factor();

}

public static int Divide()

{

MatchAndEat('/');

return Factor();

}

New version of Term method that is with Factor:

public static int Term()

{

int result = Factor();

while ( (Look == '*') || (Look == '/') )

{

switch(Look)

{

case '*':

result = result * Multiply();

break;

case '/':

result = result / Divide();

Page 27: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 23

break;

}

}

return result;

}

New main method:

public static void main(String args[])

{

expression = "9*3-1+8*5-7";

System.out.println("Expression: " + expression);

Init();

int result = ArithmeticExpression();

System.out.println("Calculation Result: " + result);

}

The rest is the same. Add new methods to the Calculator.java, replace the Termmethod with the new one, compile, and run it. Here is the result:

Expression: 9*3-1+8*5-7

Calculation Result: 59

The result is correct. We have a calculator that we can ask it to calculate anyexpression that includes addition, subtraction, multiplication, division. Plus, we cancalculate expressions that include arbitrary amounts of numbers.

If we have a pattern/formula that we can apply it to the expressions that have thesame structure, then we can program the same kind of needs without much hassle.We are going to keep this attitude throughout this book. So, our programs will beeasy to write once we get the pattern.

Now, you should be confident to learn the basics of parsing and compiling jargon.As you see, we used methods/functions that call each other to calculate arithmeticexpressions. Also, reading characters and parsing them will be the same, we just calla couple of methods to read and check characters and branch out according to theresults.

Page 28: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 24

This method is recursive descent parsing. It is easiest and most intuitive parsingmethod. It is also quite well know and documented. Let me point out something im-portant: recursive descent parsing is NOT only for parsing and calculating arithmeticexpressions. You can use recursive descent parsing to parse source code. We will bedoing that in the next chapters, which is why I chose this tactic.

Before closing this chapter, let’s add support for paranthesis to our calculator. So, ourexpressions will be tidier and more handsome. To do this, lets re-think and re-defineour formula.

Here is what we have at the moment:

ArithmeticExpression = Term +- Term

Term = Factor */ Factor

As I told you that our terminal is factor, we can formulate it this way:

Factor = Number

So, how about re-formulating it this way:

Factor = Number or (ArithmeticExpression)

We can do that because every expression has a numerical result. So, the Factor caneither be a literal number or the result of an expression. The only thing that we needto support is surrounding the ArithmeticExpression with left and right paranthesisto separate it from a literal number.

We will change the Factor method according to this new formula, here we go:

Page 29: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 25

public static int Factor()

{

int result = 0;

if (Look == '(')

{

MatchAndEat('(');

result = ArithmeticExpression();

MatchAndEat(')');

} else result = GetNum();

return result;

}

Please replace the Factor method with this new versoin, compile, and run yourprogram with this expression:

(9*3-1+8)*5-7

Expression: (9*3-1+8)*5-7

Calculation Result: 163

Pretty nice that we were able to make such a huge improvement by simply surround-ing an Arithmetic Expression with parenthesis. In this expression, it calculates

9*3-1+8

first, then multiplies it by 5 and subtracts 7.

Let me explain how new Factor method works:

Notice that there are two calls to the MatchAndEat method. These are for checkingthe left and right paranthesis that covering expression, which is ArithmeticExpres-sion() method call.

Here another important point is, we have gotmethods that call each other recursively.It needs to be implemented this way since our formula has an ArithmeticExpressioninside of Factor, and Factors are in the ArithmeticExpression as well.

Page 30: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 26

This exemplifies another feature of recursive descent parsing that functions call eachother for parsing the program source. It is important concept in recursive descentparsing.

So far so good. We have got a full working calculator/arithmetic expression parserthat we programmed using recursive descent parsing method. I do recommend youstudy this code very well and improve it according to your taste and style. You canwrite a simple calculator that you get the expression string from keyboard or evenfrom a file. Also, feel free to try and tweak the presentation of the result.

I am stressing that you need to study this chapter and the programs we have written,because they are the very essential basis of what we are going to do in the nextchapters. Using the mehods that we learned in this chapter, you will be able to parseand interpret much more complex expressions, and soon, whole source code.

Here is the whole program so that you can copy-paste, compile, run, and study.

class Calculator

{

public static String expression = "";

public static int currentCharPosition = 0;

public static char Look;

public static void Init()

{

GetChar();

}

public static void GetChar()

{

if (currentCharPosition < expression.length())

Look = expression.charAt(currentCharPosition);

currentCharPosition++;

}

public static int GetNum()

{

int number = new Integer(Look+"").intValue();

GetChar();

return number;

}

Page 31: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 27

public static void MatchAndEat(char chr)

{

if (Look==chr) GetChar(); else

{

System.out.println("Error: Unexpected character.");

System.exit(0);

}

}

public static int Factor()

{

int result = 0;

if (Look == '(')

{

MatchAndEat('(');

result = ArithmeticExpression();

MatchAndEat(')');

} else result = GetNum();

return result;

}

public static int Multiply()

{

MatchAndEat('*');

return Factor();

}

public static int Divide()

{

MatchAndEat('/');

return Factor();

}

public static int Term()

{

int result = Factor();

while ( (Look == '*') || (Look == '/') )

{

switch(Look)

{

case '*':

result = result * Multiply();

break;

case '/':

result = result / Divide();

break;

}

Page 32: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 28

}

return result;

}

public static int Add()

{

MatchAndEat('+');

return Term();

}

public static int Subtract()

{

MatchAndEat('-');

return Term();

}

public static int ArithmeticExpression()

{

int result = Term();

while ( (Look == '+') || (Look == '-') )

{

switch(Look)

{

case '+':

result = result + Add();

break;

case '-':

result = result - Subtract();

break;

}

}

return result;

}

public static void main(String args[])

{

expression = "(9*3-1+8)*5-7";

System.out.println("Expression: " + expression);

Init();

int result = ArithmeticExpression();

System.out.println("Calculation Result: " + result);

}

}

Page 33: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 29

PacMan Eating Characters: The Tokenizer

In the last section, we used the methods GetChar and GetNum. The second one,GetNum, uses the GetChar method. There are also two variables, Look and cur-rentCharPosition, which accompany those methods.

In this chapter, we are going to talk about currentCharPosition and Look variables.We are going to see their importance in compiler and interpreter jargon. Knowingthe general jargon and definitions in any area makes your understanding of a topicmuch more thorough, whether from this book or other sources. Those sources areother compiler books, articles, compiler and interpreter source code, etc. that youwill read when you advance your study.

The Look variable is named as such as it functions as a look ahead character inparsing jargon. It is used to store the current character we are evaluating in theexpression. The look variable gets its new value when GetChar and its subsidiarymethods gets called. For example, GetNum retrieves the char from the Look variable,converts it a number to return, and then calls GetChar to advance the look aheadcharacter.

When method calls advance the look ahead character, another important thinghappens: the currentCharPosition variable needs to be increased at every callof GetChar. Otherwise we would be stuck reading the same character forever.currentCharPosition is a number that holds where we are: the number it holds isthe character index in the expression string. Here is an example:

expression = "5+6";

Here, when we call Init method, you know there is a call to GetChar in the Initmethod, then the Look variable becomes ‘5’ and currentCharPosition becomes 1. Wethen call GetNum, the Look variable becomes ‘+’ and currentCharPosition becomes2. It starts from 0 and goes untill 2 since 2 is the last index in the string.

Here, we take advantage of Java strings that we can read them as character arrays.So, this array starts from zero and goes to 2. So, currentCharPosition starts with 0and goes until 2.

The logic and themethod calls here is very important as it is fundamental structure ofall tokenizers. ATokenizer is a program that reads the source, splits it into collectionsof characters, called tokens and give those tokens to us as we decide to handle them.

Page 34: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 30

You are going to recall that wasn’t a tokenizer in the last chapter. There wasn’t aseparate one, but the program itself was a tokenizer that read the expression. Andthen, our program took some actions based on that character, if it is a number, thenfetch it; if it is an operator, then perform calculation according to that operator.

Tokenizers are not too intelligent themselves, they just get tokens and put them intoa data structure, like an array or linked list, and hand the collection over to parser.

The parser is the part of the program that takes tokens from a data structure, checksthem, and takes some action according to the token’s value. The actions that parsertakes can be directly interpreted, producing machine code or building another datastructure to give to another component of the program. The parser take those actionsbased on the rules of the programming language it was built for.

The advantage in separating the tokenizer and parser is important to recognize. If wedesign our interpreter this way, like separate black boxes, our program will be moreconsistent. This way, if tokenizer breaks, then we only need to check the tokenizerside, since its duty is only giving us the tokens that parser needs. Or similarly, ifthe parser breaks, then we only need to fix the parser. They can be tested separately,which makes your program more modular and saner to manage.

There are many other advantages that you will realize as we advance, but I won’tspoil them here.

Here, we can depict our program according to this new jargon:

(Program Source Code) ---> |TOKENIZER| ---> (Tokens) ---> |PARSER| ---> |INTERPRETER|

So, what good is knowing this? Let me answer that question by asking anotherimportant question: Our program calculates expressions that consists of numbersthat have only one digit, right? So, what if we want to use multi-digit numbers? Likethis one:

853+92*10-20/2+771

We need a technique that handles the numbers that are 853, 92, 10, 20, 2 and 771.

Lets meet use the token concept! These numbers are the tokens that our parser needsto handle as whole parts, not by single numbers.

Page 35: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 31

The solution is reading the characters one by one, merging them in a string, andputting it into a token until we see a character that signifies we need to stop. Forexample, if we read 853+92, then we need to read 8,5, and 3 before we see a + signand put 853 into a token, then we need to re-start reading to read 9 and 2 and put92 into another token. Digits are stored in the characters one by one. All we do isgetting them one by one and merging them in a string.

Lets get start to implement this plan.

We need to change our GetNum method. Here is the new one:

public static int GetNum()

{

int number = 0;

String tempStr = "";

if (!Character.isDigit(Look))

{

System.out.println("Error: Numbers expected."); System.exit(0);

}

while (Character.isDigit(Look))

{

tempStr = tempStr + Look;

GetChar();

}

number = new Integer(tempStr).intValue();

return number;

}

Now, change the old GetNum method to this new version. Then use this expressionstring:

expression = "(9*3-1+8)*5-70";

compile & run the program:

Expression: (9*3-1+8)*5-70

What happens here?

It just writes and waits. The program does not end. There must be an infinite loopsomewhere. I know what happens here, let me give you a tip to find out; change theexpression string by adding at the end of it a space like this:

Page 36: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 32

expression = "(9*3-1+8)*5-70 ";

Then compile-run the program.

Expression: (9*3-1+8)*5-70

Calculation Result: 100

It works. Interesting. All we have done is just added a space character to the endof the expression string. That means, there must be a character at the end of theexpression string. Why? We only changed GetNum method, so the problem must bethere.

Change the expression string to previous one to reproduce the problem again:

expression = "(9*3-1+8)*5-70";

It is the one without space at the end. The program gets into an infinite loop withthis expression.

Lets go to GetNum method and have a look the while loop there:

while (Character.isDigit(Look))

{

tempStr = tempStr + Look;

GetChar();

}

Lets put these two lines into the while loop to write the value in the Look variablein every single step of while loop. The sleep call is only for slowing down the outputso we can read it more clearly. This is a poor man’s debugger that you can use forsmall programs. If you have a debugger you’re used to using, that is fine too.

Page 37: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 33

while (Character.isDigit(Look))

{

System.out.println(Look);

try {Thread.sleep(300);} catch (Exception e) {}

tempStr = tempStr + Look;

GetChar();

}

And run the program. Here we go:

Expression:(9*3-1+8)*5-70

9

3

1

8

5

7

0

0

0

0

0

0

0

...

There is a zero at the end of expression string and the program writes it infinitemanner. It does not stop unless we break it with CTRL-C. There is no result line thatwe normally see when program ends.

The zero is the last digit that it reads in the expression string, and it does not advanceanymore. That digit remains in the Look variable.

The problem is, while loop become infinite since it needs to see a characterdifferent than digit in the Look variable.

This while loop ends if Look gets something different than digit type of characterlike space, new line or anything else. Even a letter is fine. Try this one for example:

Page 38: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 34

expression = "(9*3-1+8)*5-70A";

Expression:(9*3-1+8)*5-70A

9

3

1

8

5

7

0

Calculation Result: 100

It works. The last line written is Calculation Result: 100. It is the correct result.

So, the reason for the infinite loop earlier was because there was no terminatingcharacter to tell us we were at the end of the expression. Since the while conditionmust see a different character, we’ll use a space character for our examples.

We know the reason and the fix now. I prefer doing this in code rather than in everyexpression:

expression += " ";

Here is our new main method:

public static void main(String args[])

{

expression = "(9*3-1+8)*5-70";

expression += " ";

System.out.println("Expression:" + expression);

Init();

int result = ArithmeticExpression();

System.out.println("Calculation Result: " + result);

}

So, it adds a space to end of expression string every time and it is fixed.

Page 39: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 35

I pointed out this small thing because it is very important. While writing a tokenizer,parser, interpreter or compiler, even a single character can break everything. Youneed to think through this sort of case.

Since we fixed the issue, we can remove our debugger from the GetNum method.

We are able to handle multi-digit numbers. So, lets try this one:

expression = "853+92*10-20/2+771";

2534

It gives us the correct result. We successfully added the multi-digit number handlingfeature to our calculator without much effort and without changing the coremethods like Factor, Term and Expression. Did you notice that we only changethe tokenizer code, not the parser? The parser methods are Factor, Term andExpression. We have not touched them.

So far so good. We are progressing well.

Now, lets write a pretty printer for our calculator, so our output is more detailed, andeasier to debug. Like this:

Number..: 853

Operator: ADD

Number..: 92

Operator: MULTIPLY

Number..: 10

Operator: SUBTRACT

Number..: 20

Operator: DIVISION

Number..: 2

Operator: ADD

Number..: 771

You have got 6 different number and 5 operators.

Calculation result is 2534.

The first option is, in the program code, I can print things as I come across them.But it really ends up being a messy solution that makes the program crowded and

Page 40: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 36

difficult to maintain. For only having a pretty print or something similar, I wouldn’twant to complicate the program. As our program grows, it becomes more and moredifficult to read, understand and maintain if we keep adding that way.

I have a second option. after each Getnum call, I can store the number into a datastructure. I would also need to do same thing for all the operators that I come across.Then, after having my array list filled, I can print it in any format I choose. I don’tthink that it is an effective solution since it’s similar to the first option, and sharesit’s pitfalls.

There is another option, which is my favorite; like we did in GetNum, we can getoperators as though they are separate entities. Say for example, we name themoperator tokens and number tokens.

The byproduct of this approach is that once the two types of tokens are separatedout, we can perform computation on them in a simpler way.

Our intention was having a more structured printing facility, but I claim that we canuse this new structure in our parser method (Factor, Term, and Expression).

Let me explain it using an analogy. Remember, we were calculating single digitnumbers including expressions by checking out Look variable and by advancingcurrentCharPosition as we read the characters in the expression string. In thatexpression string, our single smallest element was character.

Then, we changed it to make our single smallest elements to numbers.

Now, our smallest elements are tokens instead of numbers. Tokens are java stringsthat includes whole multi-digit numbers or operators.

Our calculator will get those tokens and pass them to parsing methods– Factor, Term,and Expression. The calculation part of the program does not need know that thenumber is single digit or multi-digit. All it knows is its type. So, according to its typeit will add, subtract, multiply and divide according to the upcoming operator.

That explaination should be clear enough to introduce you to the tokenizer.

We need a new class: Token. Its objects will be for storing a token’s value and type.Here it is:

Page 41: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 37

class Token

{

public String text;

public String type;

public Token(String text, String type)

{

this.text = text;

this.type = type;

}

public String toString()

{

return "Text: " + text + " Type: " + type;

}

}

We need a new global variable for storing token index. It is like currentCharPositionthat we keep track of the current characters index in the expression. Now, current-TokenPositionwill keep track of the index of current token in the java array list. weare going to advance currentTokenPosition as we get a new token from java arraylist.

Here it is currentTokenPosition:

public static int currentTokenPosition = 0;

We will have three methods that we split the expression into tokens.

• IsOp• FindOpType• Tokenize• Isop is for to check out if a character is an operator like + - / *• FindOpType is to find out what type of operator given character is• Tokenize is to split up the expression into tokens and to store them into anarray list

Here is complete program with those methods:

Page 42: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 38

import java.util.List;

import java.util.ArrayList;

class Tokenizer

{

public String expression = "";

public int currentCharPosition = 0;

public char Look;

public void Init()

{

GetChar();

}

public void GetChar()

{

if (currentCharPosition < expression.length())

Look = expression.charAt(currentCharPosition);

currentCharPosition++;

}

public boolean IsOp(char chr)

{

return (chr == '+') || (chr == '-') ||

(chr == '*') || (chr == '/') ||

(chr == '(') || (chr == ')');

}

public String FindOpType(char chr)

{

String type = "UNKNOWN";

switch(chr)

{

case '+':

type = "ADD";

break;

case '-':

type = "SUBTRACT";

break;

case '*':

type = "MULTIPLY";

break;

Page 43: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 39

case '/':

type = "DIVIDE";

break;

case '(':

type = "LEFT_PAREN";

break;

case ')':

type = "RIGHT_PAREN";

break;

}

return type;

}

public List<Token> Tokenize(String source)

{

List<Token> tokens = new ArrayList<Token>();

String token = "";

String state = "DEFAULT";

for (int index = 0; index < source.length(); index++)

{

char chr = source.charAt(index);

switch(state)

{

case "DEFAULT":

String opType = FindOpType(chr);

if (IsOp(chr))

{

tokens.add(new Token(Character.toString(chr), opType));

}

else if (Character.isDigit(chr))

{

token += chr;

state = "NUMBER";

}

break;

case "NUMBER":

if (Character.isDigit(chr))

{

token += chr;

}

else

Page 44: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 40

{

tokens.add(new Token(token, "NUMBER"));

token = "";

state = "DEFAULT";

index--;

}

break;

}

}

return tokens;

}

public static void main(String args[])

{

String expression = "219+341+19";

expression += " ";

Tokenizer tokenizer = new Tokenizer();

List<Token> tokens = tokenizer.Tokenize(expression);

for (Token token: tokens)

{

System.out.println("Type:" + token.type + " Text: " + token.text);

}

}

}

I have written its main method to test our tokenizer. Lets compile & run this program:

Type: NUMBER Text: 219

Type: ADD Text: +

Type: NUMBER Text: 341

Type: ADD Text: +

Type: NUMBER Text: 19

Nice. We have got a pretty printer. Plus, we have the expression elements in an arraylist. We can use this array list and the tokens inside it for calculating the expression.

Before getting to the calculation part, I want give you more proper version ofthe pretty print method. Lets add below PrettyPrint method to our program inTokenizer.java. Also replace the main method with below one.

Page 45: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 41

public void PrettyPrint(List<Token> tokens)

{

int numberCount = 0;

int opCount = 0;

for (Token token: tokens)

{

if (token.type.equals("NUMBER"))

{

System.out.println("Number....: " + token.text);

numberCount++;

}

else

{

System.out.println("Operator..: " + token.type);

opCount++;

}

}

System.out.println("You have got "+ numberCount +

" different number and " + opCount

+" operators.");

}

public static void main(String args[])

{

String expression = "219+341+19";

expression += " ";

Tokenizer tokenizer = new Tokenizer();

List<Token> tokens = tokenizer.Tokenize(expression);

System.out.println("--------------");

tokenizer.PrettyPrint(tokens);

}

Lets compile & run. Here is the output:

Page 46: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 42

--------------

Number....: 219

Operator..: ADD

Number....: 341

Operator..: ADD

Number....: 19

You have got 3 different number and 2 operators.

It really is a pretty printout of our expression.

As I said before, once we have got a tokenizer, we can do many cool things withit. Yet another example: I have 40-50 complex arithmetic expressions and I ampreparing a report that summarizes them, showing numbers greater than 100 andonly multiplication operators. The report would be presented in HTML format and Iwill present numbers and operators using different colors. I assume that you have asolution in your mind as you read. Tokenizer allow you to use a lot of different toolsto customize how your program will function. Just widen your horizon.

Before calculating expressions with the tokenizer we have written, I want to talkabout the tokenizer. In some books, you will see descriptions like a tokenizer is a statemachine that parses regular expressions. A State Machine exists only in explicitfinite positions, such as “I am reading a number”, “I am reading an operator”, “I amtransitioning from an operator to a number”, or finally, “I am transitioning from anumber to an operator.” Regular Expressions we will go into more detail on later, butfor now suffice it to say that they are a way for describing all expressions a languagecontains. We developed one when defining Terms, Factors, and Expressions.

There is a string variable in the Tokenize method of Tokenizer class above. Its nameis state:

String state = "DEFAULT";

We use this variable to keep state of the tokenizer. We have got only two states:default and number. Our state machine works this way:

Its default state isDEFAULT that it takes the operators like + - * / there in the defaultstate. If it come across any number, it gets into the NUMBER state and it stays thereuntil there are no numbers coming out. How does it understand there is no morenumbers? It checks if it is a number in every step of the loop, if there are no more

Page 47: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 43

numbers, it puts the all numbers it collected into the new Token object and getsthe tokenizer to the DEFAULT state to make it continue getting the next upcomingtokens from the source string. This process continues until nomore characters remainin the source string.

As I said, we have two states at the moment: default and number. I think you canguess that we will have more states for strings, language keywords, etc. This way,we will recognize and tokenize source string elements. Then, we will give it to ourparser to produce the data structure that the interpreter needs.

State machines are very useful to tokenize strings that follow regular structure.Regular means: There is either one or more characters of same type, like onecharacter, or couple of characters, that make the expression, or one or more digitto comprise a number. Here, regularity is non nested structure. So there will be norecursive nested code blocks or a natural language sentences.

For that reason, this type of structures is named as regular expressions. Almostevery programming language has one or more regular expression parsing facilitiesthat you use in your daily programming activities to parse, say for example, a url,input text, etc.

For parsing the structures like recursive nested code blocks or a natural languagesentences, the state machine is not enough, it needs different parsing techniques,like revursive descent parsing.

We will learn how to parse programming language source using recursive descentparsing in later chapters.

Alright, we have printed our expression elements (tokens) pretty. Now, it is timeto calculate our expression. All we need to do is change our calculation methods sothat they will use tokens instead of characters. Also, we are not going to use GetNummethod now that we have tokens in the array list as separate tokens. For getting thattoken’s value, all we need to do is convert the token text to an integer.

The logical change in the calculation method is that we will check token type insteadof checking the Look variable value. That means character checking will be tokenchecking.

Here is the new version of the ArithmeticExpression() method:

Page 48: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 44

public static int ArithmeticExpression()

{

int result = Term();

while ( CurrentToken().type.equals("ADD") ||

CurrentToken().type.equals("SUBTRACT") )

{

switch(CurrentToken().type)

{

case "ADD":

result = result + Add();

break;

case "SUBTRACT":

result = result - Subtract();

break;

}

}

return result;

}

Term is similar. I am going to give you full source code soon, I want to explain someimportant points first.

The CurrentToken() method we use here is to get the token that we are on at themoment. It is similar to Look variable’s duty that it holds the current token. Now,instead of using Look variable and checking its value, we use GetCurrentToken()method.

Here it is source code:

public Token CurrentToken()

{

return GetToken(0);

}

We use the GetToken() method the get one token at a time from the array list herewe store the tokens. For example if you give it 0 as parameter, it gets current tokenaccording to currentTokenPosition value.

The other methods are:

• EatToken(int offset)

Page 49: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 45

• MatchAndEat(String type)

I want to explain them one by one.

EatToken(int offset): It just eats the token(s) given in the offset. Eating means, itincreases the currentTokenPosition value, so index of the array list shifts that it meansit skips the eaten tokens in the array list. It is similar to technique that we appliedusing the GetChar and MatchAndEat methods as well as the Look variable. GetCharwas getting the next character from expression string, put it into Look variable,then increase the currentCharPosition value. MatchAndEat was calling GetChar toforward the stream of characters.

MatchAndEat(String type): It gets the current token and checks it out if its type isgiven type. If so, it eats one token since it is the token we expect. By eating it, it givesthe parser the opportunity to continue. If its type is not the expected one, it just stopsthe program since there is something in the expression breaking the rules.

MatchAndEat method is very important, and one of the essentials in the book. Wewill use it to check out tokens and make a decision according to MatchAndEat’sresult. The decision I mention is branching. The greatest example of this behaviorwould be the use of an if/else block in Java.

The logic is the same, the only difference is we use tokens instead of characters. Therest is same in terms of increasing the index, checking out the token we have anddeciding according to its contents.

You will notice that our calculation methods have very few changes. Here is the fullsource code, please save, compile, and execute it.

Token.java:

Page 50: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 46

class Token

{

public String text;

public String type;

public Token(String text, String type)

{

this.text = text;

this.type = type;

}

public String toString()

{

return "Text: " + text + " Type: " + type;

}

}

Tokenizer.java:

import java.util.List;

import java.util.ArrayList;

class Tokenizer

{

public String expression = "";

public int currentCharPosition = 0;

public char Look;

public void Init()

{

GetChar();

}

public void GetChar()

{

if (currentCharPosition < expression.length())

Look = expression.charAt(currentCharPosition);

currentCharPosition++;

}

public boolean IsOp(char chr)

{

Page 51: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 47

return (chr == '+') || (chr == '-') ||

(chr == '*') || (chr == '/') ||

(chr == '(') || (chr == ')');

}

public String FindOpType(char chr)

{

String type = "UNKNOWN";

switch(chr)

{

case '+':

type = "ADD";

break;

case '-':

type = "SUBTRACT";

break;

case '*':

type = "MULTIPLY";

break;

case '/':

type = "DIVIDE";

break;

case '(':

type = "LEFT_PAREN";

break;

case ')':

type = "RIGHT_PAREN";

break;

}

return type;

}

public List<Token> Tokenize(String source)

{

List<Token> tokens = new ArrayList<Token>();

String token = "";

String state = "DEFAULT";

for (int index = 0; index < source.length(); index++)

{

char chr = source.charAt(index);

switch(state)

{

Page 52: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 48

case "DEFAULT":

String opType = FindOpType(chr);

if (IsOp(chr))

{

tokens.add(new Token(Character.toString(chr), opType));

}

else if (Character.isDigit(chr))

{

token += chr;

state = "NUMBER";

}

break;

case "NUMBER":

if (Character.isDigit(chr))

{

token += chr;

}

else

{

tokens.add(new Token(token, "NUMBER"));

token = "";

state = "DEFAULT";

index--;

}

break;

}

}

return tokens;

}

public void PrettyPrint(List<Token> tokens)

{

int numberCount = 0;

int opCount = 0;

for (Token token: tokens)

{

if (token.type.equals("NUMBER"))

{

System.out.println("Number....: " + token.text);

numberCount++;

}

else

Page 53: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 49

{

System.out.println("Operator..: " + token.type);

opCount++;

}

}

System.out.println("You have got "+ numberCount +

" different number and " + opCount

+" operators.");

}

public static void main(String args[])

{

String expression = "219+341+19";

expression += " ";

Tokenizer tokenizer = new Tokenizer();

List<Token> tokens = tokenizer.Tokenize(expression);

System.out.println("--------------");

tokenizer.PrettyPrint(tokens);

}

}

Calculator.java:

import java.util.List;

import java.util.ArrayList;

class Calculator

{

public int currentTokenPosition = 0;

public List<Token> tokens;

//~~~~Token Manipulation Methods Start~~~~

public Token GetToken(int offset)

{

if (currentTokenPosition + offset >= tokens.size())

{

return new Token("", "NO_TOKEN");

}

return tokens.get(currentTokenPosition + offset);

Page 54: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 50

}

public Token CurrentToken()

{

return GetToken(0);

}

// Just eats the token(s) given in the offset

public void EatToken(int offset)

{

currentTokenPosition = currentTokenPosition + offset;

}

// Eats the token given type and returns eaten token

public Token MatchAndEat(String type)

{

Token token = CurrentToken();

if (!CurrentToken().type.equals(type))

{

System.out.println("Saw " + token.type +

" but " + type +

" expected");

System.exit(0);

}

EatToken(1);

return token;

}

//~~~~Token Manipulation Methods End~~~~

public int Multiply()

{

MatchAndEat("MULTIPLY");

return Factor();

}

public int Divide()

{

MatchAndEat("DIVIDE");

return Factor();

}

public int Add()

{

MatchAndEat("ADD");

return Term();

}

public int Subtract()

{

Page 55: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 51

MatchAndEat("SUBTRACT");

return Term();

}

public int Factor()

{

int result = 0;

if (CurrentToken().type.equals("LEFT_PAREN"))

{

MatchAndEat("LEFT_PAREN");

result = ArithmeticExpression();

MatchAndEat("RIGHT_PAREN");

}

else if (CurrentToken().type.equals("NUMBER"))

{

result = new Integer(CurrentToken().text).intValue();

MatchAndEat("NUMBER");

}

return result;

}

public int Term()

{

int result = Factor();

while ( CurrentToken().type.equals("MULTIPLY") ||

CurrentToken().type.equals("DIVIDE") )

{

switch(CurrentToken().type)

{

case "MULTIPLY":

result = result * Multiply();

break;

case "DIVIDE":

result = result / Divide();

break;

}

}

return result;

}

public int ArithmeticExpression()

{

int result = Term();

while ( CurrentToken().type.equals("ADD") ||

CurrentToken().type.equals("SUBTRACT") )

Page 56: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 52

{

switch(CurrentToken().type)

{

case "ADD":

result = result + Add();

break;

case "SUBTRACT":

result = result - Subtract();

break;

}

}

return result;

}

public void PrettyPrint(List<Token> tokens)

{

int numberCount = 0;

int opCount = 0;

for (Token token: tokens)

{

if (token.type.equals("NUMBER"))

{

System.out.println("Number....: " + token.text);

numberCount++;

}

else

{

System.out.println("Operator..: " + token.type);

opCount++;

}

}

System.out.println("You have got " + numberCount +

" different number and " + opCount + " operators.");

}

public static void main(String args[])

{

String expression = "((853+92*5)*10-20/2+771)";

expression += " ";

Calculator calc = new Calculator();

Tokenizer tokenizer = new Tokenizer();

System.out.println("Expression: " + expression);

System.out.println("--------------------------");

calc.tokens = tokenizer.Tokenize(expression);

Page 57: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 53

calc.PrettyPrint(calc.tokens);

System.out.println("--------------------------");

System.out.println("Expression Result: " + calc.ArithmeticExpression());

}

}

Here is the results when I run it:

Expression: ((853+92*5)*10-20/2+771)

--------------------------

Operator..: LEFT_PAREN

Operator..: LEFT_PAREN

Number....: 853

Operator..: ADD

Number....: 92

Operator..: MULTIPLY

Number....: 5

Operator..: RIGHT_PAREN

Operator..: MULTIPLY

Number....: 10

Operator..: SUBTRACT

Number....: 20

Operator..: DIVIDE

Number....: 2

Operator..: ADD

Number....: 771

Operator..: RIGHT_PAREN

You have got 7 different number and 10 operators.

--------------------------

Expression Result: 13891

Such nice output! We have got the numbers and operators printed pretty and we haveexpression result.

Have you noticed that we have got two token types other than numbers andoperators? LEFT_PAREN and RIGHT_PAREN.

Our recognition method IsOp can identify them. FindOpType performs a lookup aswell. These two functions are referred to as recognizer methods.

Page 58: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 54

It is another nice feature of our tokenizer and calculator. We are able to recognizeand pretty print parenthesis as well as calculate the expressions that are withinparenthesis!

Before closing this chapter I want to talk about the MatchAndEat method again. Youknow, this is the method that we use to match and then eat tokens. Here its sourcecode:

public Token MatchAndEat(String type)

{

Token token = CurrentToken();

if (!CurrentToken().type.equals(type))

{

System.out.println("Saw " + token.type +

" but " + type +

" expected");

System.exit(0);

}

EatToken(1);

return token;

}

In some books, similar sources, or even in some source code of interpreters you willsee, that our equivalent is consume, match, or something similar. I deliberatelynamed it “Match and Eat” because it matches and then eats/consumes the token.Using onlyMatch can be confusing, because its duty is not only matching, but eatingthe next token as well. You will see only Match in some other books or interpreters,but when you have a look, it does both. That being said, Match is very popular incompiler theory, and you will see it a lot while studying, so I wanted to make thatconnection here.

Alright. We have a tokenizer as of this chapter, plus, we are able to calculatearithmetic expressions by using this tokenizer. The tokenizer and calculator methodsare in same program for now. We are going to separate them in the next chapters.

Also, in the next chapter, besides having separate parser for our calculator, we aregoing to add a boolean expression calculation feature. You may wonder what thegood of calculating boolean expressions is, so let me explain briefly. Being able to

Page 59: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 55

calculating boolean expressions will allow us to use them in language constructs likeif-else cases, while loops and similar constructs that need any type of comparison.

Our calculator is becoming a programming language interpreter. Keep on reading!

Better Token Handling

Before getting into details of parsing arithmetic and boolean expressions together, Iwant to change something in the token handling.

As you all rememeber, we were using the String type for the token type, in the Tokenclass.

Besides, we were taking advantage of the switch-case structure that it supportsStrings.

Even so, I am sure you noticed this approach and wondered “Couldn’t that be moreeffective if we use a number based typing for our token type?”

Yes, it may bemore effective to use numbers.We can use integer or something similar,but, the problem with using these types for this sort of need is that they are lessreadable and are disorganised. You would need to name every type in advance andassign that particular type a number value, a considerable overhead for a little speed.

Luckily, we have another option which is far more elegant for our use case: the Enumtype.

Java supports the Enum (Enumeration) type to give us the best of both worlds. Now,we will go back and change the type of TokenType and update all the related sectionsaccording to this change. We are doing this now, so that we are not worried abouthaving to do a massive change in the next chapters.

First thing we will change is Token class, here its new version:

Page 60: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 56

class Token

{

public final String text;

public final TokenType type;

public Token(String text, TokenType type)

{

this.text = text;

this.type = type;

}

public String toString()

{

return "Text: " + text + " Type: " + type;

}

}

Save it to Token.java.

And here is the TokenType enum that we will use for token types:

public enum TokenType

{

NUMBER, NEWLINE, OPERATOR, EOF, UNKNOWN,

ADD, SUBTRACT, MULTIPLY, DIVIDE, LEFT_PAREN, RIGHT_PAREN

}

We will keep the Enums in a separate java file. Save this into TokenType.java.

Remember that we used a variable to keep the tokenizer state, and it too was String.We will change it to an Enum as well. Here it is:

enum TokenizeState

{

DEFAULT, OPERATOR, NUMBER

}

Save this file as TokenizeState.java. Afterwards, compile all three files. Now, weneed to change all places in the Tokenizer and the Calculator that we used Stringcomparasion. You know, we were using equals for comparing tokens since the tokenwas a String. But now, we will use == since the token type is Enum.

Page 61: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 57

Since we have updated Tokenizer.java and Calculator.java, I will reproduce thecomplete source files below.

Tokenizer.java:

import java.util.List;

import java.util.ArrayList;

class Tokenizer

{

public boolean IsOp(char chr)

{

boolean aritOp = chr == '+' || chr == '-' ||

chr == '*' || chr == '/';

return aritOp;

}

public TokenType FindOpType(char firstOperator)

{

TokenType type = TokenType.UNKNOWN;

switch(firstOperator)

{

case '+':

type = TokenType.ADD;

break;

case '-':

type = TokenType.SUBTRACT;

break;

case '*':

type = TokenType.MULTIPLY;

break;

case '/':

type = TokenType.DIVIDE;

break;

}

return type;

}

public boolean IsParen(char chr)

{

boolean prntOp = chr == '(' || chr == ')';

return prntOp;

Page 62: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 58

}

public TokenType FindParenType(char chr)

{

TokenType type = TokenType.UNKNOWN;

switch(chr)

{

case '(':

type = TokenType.LEFT_PAREN;

break;

case ')':

type = TokenType.RIGHT_PAREN;

break;

}

return type;

}

public List<Token> Tokenize(String source)

{

List<Token> tokens = new ArrayList<Token>();

String token = "";

TokenizeState state = TokenizeState.DEFAULT;

for (int i = 0; i < source.length(); i++)

{

char chr = source.charAt(i);

switch(state)

{

case DEFAULT:

TokenType opType = FindOpType(chr);

if (IsOp(chr))

{

tokens.add( new Token(Character.toString(chr), opType) );

}

else if (IsParen(chr))

{

TokenType parenType = FindParenType(chr);

tokens.add(new Token(Character.toString(chr), parenType));

}

else if (Character.isDigit(chr))

{

token += chr;

state = TokenizeState.NUMBER;

Page 63: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 59

}

break;

case NUMBER:

if (Character.isDigit(chr))

{

token += chr;

}

else

{

tokens.add(new Token(token, TokenType.NUMBER));

token = "";

state = TokenizeState.DEFAULT;

i--;

}

break;

}

}

return tokens;

}

}

Calculator.java:

import java.util.List;

import java.util.ArrayList;

class Calculator

{

public int currentTokenPosition = 0;

public List<Token> tokens;

//~~~~Token Manipulation Methods Start~~~~

public Token GetToken(int offset)

{

if (currentTokenPosition + offset >= tokens.size())

{

return new Token("", TokenType.EOF);

}

return tokens.get(currentTokenPosition + offset);

}

public Token CurrentToken()

Page 64: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 60

{

return GetToken(0);

}

public Token NextToken()

{

return GetToken(1);

}

public void EatToken(int offset)

{

currentTokenPosition = currentTokenPosition + offset;

}

public Token MatchAndEat(TokenType type)

{

Token token = CurrentToken();

if (CurrentToken().type != type)

System.out.println("Saw " + token.type + " but " + type + " expected");

EatToken(1);

return token;

}

//~~~~Token Manipulation Methods End~~~~

public int Multiply()

{

MatchAndEat(TokenType.MULTIPLY);

return Factor();

}

public int Divide()

{

MatchAndEat(TokenType.DIVIDE);

return Factor();

}

public int Add()

{

MatchAndEat(TokenType.ADD);

return Term();

Page 65: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 61

}

public int Subtract()

{

MatchAndEat(TokenType.SUBTRACT);

return Term();

}

public int Factor()

{

int result = 0;

if (CurrentToken().type == TokenType.LEFT_PAREN)

{

MatchAndEat(TokenType.LEFT_PAREN);

result = ArithmeticExpression();

MatchAndEat(TokenType.RIGHT_PAREN);

}

else if (CurrentToken().type == TokenType.NUMBER)

{

result = new Integer(CurrentToken().text).intValue();

MatchAndEat(TokenType.NUMBER);

}

return result;

}

public int Term()

{

int result = Factor();

while (CurrentToken().type == TokenType.MULTIPLY ||

CurrentToken().type == TokenType.DIVIDE)

{

switch(CurrentToken().type)

{

case MULTIPLY:

result = result * Multiply();

break;

case DIVIDE:

result = result / Divide();

break;

}

}

return result;

Page 66: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 62

}

public int ArithmeticExpression()

{

int result = Term();

while (CurrentToken().type == TokenType.ADD ||

CurrentToken().type == TokenType.SUBTRACT)

{

switch(CurrentToken().type)

{

case ADD:

result = result + Add();

break;

case SUBTRACT:

result = result - Subtract();

break;

}

}

return result;

}

public void PrettyPrint(List<Token> tokens)

{

int numberCount = 0;

int opCount = 0;

for (Token token: tokens)

{

if (token.type == TokenType.NUMBER)

{

System.out.println("Number....: " + token.text);

numberCount++;

}

else

{

System.out.println("Operator..: " + token.type);

opCount++;

}

}

System.out.println("You have got "+

numberCount +

" different number and "

+ opCount +

" operators.");

Page 67: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 63

}

public static void main(String args[])

{

String expression = "(600+3+2)*5";

expression += " ";

Calculator calc = new Calculator();

Tokenizer tokenizer = new Tokenizer();

System.out.println("Expression: " + expression);

System.out.println("--------------------------");

calc.tokens = tokenizer.Tokenize(expression);

calc.PrettyPrint(calc.tokens);

System.out.println("--------------------------");

int result = calc.ArithmeticExpression();

System.out.println( result );

}

}

Save and compile them. And run Calculator.java, you should be seeing this resultsame as me:

Expression: (600+3+2)*5

--------------------------

Operator..: LEFT_PAREN

Number....: 600

Operator..: ADD

Number....: 3

Operator..: ADD

Number....: 2

Operator..: RIGHT_PAREN

Operator..: MULTIPLY

Number....: 5

You have got 4 different number and 5 operators.

--------------------------

3025

Let me explain why I introduced you to a string based tokenizer first before givingyou this new version. I wanted to ensure that the transition from characters to tokens

Page 68: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 2 - Parsing and Calculating Expressions 64

was as smooth as possible, and the most common way to deal with collections ofcharacters in Java is with Strings.

I did not jump directly to the Enums since they would seem like hiding complexityin the details. In the end, what we do is compare values and we used characters andStrings in the beginning.

And now, instead of comparing Strings, we realize that we can use and so compareEnum types.

We have an Enum based tokenizer as of this section. We can continue our work onparsing arithmetic and boolean expressions.

Page 69: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - ParsingProgramming LanguageConstructs

65

Page 70: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 66

Statements and Code Blocks

In the last chapter, I told you that I am going to show you how to parse and interpretcode blocks, while loops and if-else constructs.

First we are going to learn how to parse code blocks in this chapter.

Lets write a sample script first. We are going to have our interpreter run it:

println 1

wait 2000

println 2

This simple script consists of just three commands:

• println 1• wait 2000• println 2

Let’s refer to these commands as “statements.”

Now, I want you to recall the last section. We learned and demonstrated how to putexpressions into a linked list and evaluate those expressions by traversing the linkedlist.

Letme ask, what is the difference between statements and expressions?We are able tostore both of them in linked lists, and call an evaluation function on each element. Inthe end, the linked list does not have to knowwhat we store inside each node, be theyexpressions or statements. It only needs to know the object type of the expressionsand the statements, nothing more.

So, if we use the same container object for statements as we used for expressionsthen we can store statements in the same linked list.

And once the statements are in the linked list, evaluating them follows the samepattern we used for expressions. We will traverse the linked list and invoke the evalmethod of the statement objects.

Page 71: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 67

This is the logic behind our interpreter. The only remaining part is how we will parsestatements and store them in the linked list. Another term we will use for this linkedlist that we will use to store statements is Abstract Syntax Tree.

We learned how to parse expressions from the previous chapter. We will usesame techniques and structure to parse statements. We will use the match and eattechnique again and create node objects for the statements before finally puttingthem into our Abstract Syntax Tree.

Writing out the grammar we will model can help to plan how we will parse ourlanguage. Here is the grammar for a statement:

<statement> ::= <print> | <println> | <wait>

For now, we only have three statements: print, println and wait.

Let’s get the demo program from the previous section. I’ve reproduced it here:

public static void main(String args[])

{

Tokenizer tokenizer = new Tokenizer();

Calculator calc = new Calculator();

List<String> expresssionList = new LinkedList<String>();

// Remember, we use a trailing white space as a terminating character!

expresssionList.add("(100*2+2)*2+5>=500 ");

expresssionList.add("((5+1)*100-2+3) ");

expresssionList.add("100-30/2+13>=10 ");

expresssionList.add("(853+92*5)*10-20/2+771 ");

expresssionList.add("(5)*2 ");

List<Node> commandList = new LinkedList<Node>();

for (String expression: expresssionList)

{

System.out.println("Expression: "+ expression);

calc.currentTokenPosition = 0;

calc.tokens = tokenizer.Tokenize(expression);

Node node = calc.BooleanExpression();

if (node != null)

{

Page 72: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 68

commandList.add(node);

}

}

System.out.println("");

System.out.println("Now, lets calculate expressions...");

for (Node command: commandList)

System.out.println("Expression Result: " + command.eval());

}

In this program, there are a few expressions. We first put their node objects into alinked list. In this first linked list, there are node objects. They are not evaluated, theyjust are objects in the Abstract Syntax Tree.

Once we get that node objects into the linked list named commandList, we traversecommandList, get the node objects one by one, and execute their eval methods.Whenthe traversal ends, all expressions will have been evaluated.

We will do same thing for statements. To do that, we need a way to encapsulatestatements in node objects as we did for expressions so that we can put them intoAbstract Syntax Tree.

Here is the Node subclasses for the print, println and wait statements:

PrintNode:

public class PrintNode extends Node

{

public Node expression;

public String type;

public PrintNode() {}

public PrintNode(Node expression, String type)

{

this.expression = expression;

this.type = type;

}

public Object eval()

{

Object writee = expression.eval();

Page 73: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 69

if (type.equals("sameline")) System.out.print(writee);

else if (type.equals("newline")) System.out.println(writee);

return writee;

}

}

WaitNode:

public class WaitNode extends Node

{

public Node interval;

public WaitNode() {}

public WaitNode(Node interval)

{

this.interval = interval;

}

public Object eval()

{

Integer waitAmount = (Integer) interval.eval();

try

{

Thread.sleep(waitAmount.intValue());

}

catch (Exception e)

{

System.out.println("Error in WaitNode.eval() method" + e);

}

return waitAmount;

}

}

Please note that the PrintNode class is for both print and println statements. We useit for both by using a second parameter in its constructor. If the “type” parameter is“sameline”, it is a print statement, if the type is “newline” then it is a println statement.We will use only println in this section but print statement will be necessary forupcoming sections.

Page 74: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 70

These classes adhere to our standard of being able to stored in the Abstract SyntaxTree like expression nodes:

• They are Node subclasses• They have eval methods

I want to demonstrate their usage now. Please take the below example and save itinto the StatementsDemo.java:

import java.util.List;

import java.util.LinkedList;

public class StatementsDemo

{

public static void main(String args[])

{

Node firstMsg = new PrintNode(new NumberNode(1), "newline");

Node secondMsg = new PrintNode(new NumberNode(2), "newline");

Node wait = new WaitNode(new NumberNode(new Integer(2000)));

List<Node> script = new LinkedList<Node>();

script.add(firstMsg);

script.add(wait);

script.add(secondMsg);

for (Node statement:script)

statement.eval();

}

}

Compile and run this class. What do you see? It writes 1 and waits two seconds andthen writes 2.

What it does is execute our three statements:

• A print statement, its parameter is 1• A wait statement, its parameter is 2000

Page 75: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 71

• A print statement, its parameter is 2

Encapsulating the statements in the Node type classes is really useful. This way, wecan put these statements into the Abstract Syntaxt Tree.

But we need something more to be able to parse them. At the moment, as you see inabove example, we put the statements into the Abstract Syntax Tree manually, butinterpreters are for automating that process. We can’t know all scripts ahead of time,parse them in our minds and then create statement objects one by one manually.That’s what interpreters are for.

Our scripts need to be read from a file and fed into the interpreter.

Here is our script again:

println 1

wait 2000

println 2

Save this script into a file named script.con

To parse this script, we need a method like Expression, ArithmeticExpression,BooleanExpression, Term, Factor, etc. that we used for parsing expressions.

Before writing that method, I want to do some tidying up first. So far in the book,we’ve used Calculator.java that was designed to calculate arithmetic and booleanexpressions. All parsing methods are presently in Calculator.java.

I think it’s time we changed that from a calculator to an actual parser that we canuse more generally.

Time to back up all source files from the previous section and rename Calcu-lator.java to Parser.java. We are going to add all parsing methods to Parser.javamoving forward. Calculator.java will no longer be referenced.

After renaming it to the Parser.java, lets add a constructor and a method for gettingtokens of the parser.

Here is the constructor methods:

Page 76: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 72

public Parser() {}

public Parser(List<Token> tokens)

{

this.tokens = tokens;

}

And here is the getTokens() method:

public List<Token> getTokens()

{

return tokens;

}

Please add them to the Parser.java at the top after the currentTokenPosition andtokens variables declarations.

After this tidying up, let’s write a new method named Statement():

public Node Statement()

{

Node node = null;

TokenType type = CurrentToken().type;

if (type == TokenType.PRINT)

{

MatchAndEat(TokenType.PRINT);

node = new PrintNode(Expression(), "sameline");

}

else if (type == TokenType.PRINTLN)

{

MatchAndEat(TokenType.PRINTLN);

node = new PrintNode(Expression(), "newline");

}

else if (type == TokenType.WAIT)

{

MatchAndEat(TokenType.WAIT);

node = new WaitNode(Expression());

}

else

{

Page 77: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 73

Util.Writeln("Unknown language construct: "

+ CurrentToken().text);

System.exit(0);

}

return node;

}

Add this method to Parser.java.

I am going to explain how this method works soon, but first I want to run the simplescript from the beginning of this lesson using this method to show it works!

We need amethod to read the source from file, then give the file to the tokenizer. Aftergetting the tokens from the tokenizer, our parser will take over and parse statementsone by one. It will put node objects in the Abstract Syntax Tree. The rest is justevaluating the statements in the Abstract Syntax Tree as we’ve done before.

I have collected all of these steps into a new class: Interpreter.java:

import java.io.BufferedReader;

import java.io.FileInputStream;

import java.io.IOException;

import java.io.InputStreamReader;

import java.io.Reader;

import java.io.FileNotFoundException;

import java.nio.charset.Charset;

import java.util.List;

import java.util.LinkedList;

public class Interpreter

{

public static void main(String args[])

{

boolean debug = false;

if (args.length < 1) { Util.Writeln("Usage: Demo <script>"); return; }

if (args.length > 1) { if (args[1].equals("debug")) debug = true; }

Interpreter interpreter = new Interpreter();

String sourceCode = interpreter.ReadFile(args[0]);

interpreter.Interpret(sourceCode, debug);

Page 78: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 74

}

public void Interpret(String source, boolean debug)

{

Tokenizer tokenizer = new Tokenizer();

Parser parser = new Parser(tokenizer.Tokenize(source));

if (debug) DumpTokens(parser);

List<Node> script = new LinkedList<Node>();

script.add(parser.Statement());

script.add(parser.Statement());

script.add(parser.Statement());

for (Node statement:script)

statement.eval();

}

public void DumpTokens(Parser parser)

{

for (Token token: parser.getTokens())

Util.Writeln("Type: " + token.type + " Text: " + token.text+" ");

Util.Writeln();

}

private String ReadFile(String path)

{

FileInputStream stream = null;

InputStreamReader input = null;

try

{

stream = new FileInputStream(path);

input = new InputStreamReader(stream, Charset.defaultCharset());

Reader reader = new BufferedReader(input);

StringBuilder builder = new StringBuilder();

char[] buffer = new char[8192];

int read;

while ((read = reader.read(buffer, 0, buffer.length)) > 0)

{

builder.append(buffer, 0, read);

}

//Keep the space in the end

Page 79: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 75

builder.append(" ");

return builder.toString();

}

catch (FileNotFoundException e)

{

String errMsg = "FILE NOT FOUND. ";

String sourceInfo = "Error in Interpreter.java->"

+ "ReadFile(String path) method. ";

Util.Writeln(sourceInfo + errMsg);

System.exit(0);

}

catch (IOException e)

{

String errMsg = "Error while reading the script. ";

String sourceInfo = "Error in Interpreter.java->"

+ "ReadFile(String path) method. ";

Util.Writeln(sourceInfo + errMsg);

System.exit(0);

}

catch (Exception e)

{

String errMsg = "Error while reading the script. ";

String sourceInfo = "Error in Interpreter.java->"

+ "ReadFile(String path) method. ";

Util.Writeln(sourceInfo + errMsg + e);

System.exit(0);

}

finally

{

try

{

input.close();

stream.close();

}

catch (Exception e)

{

String errMsg = "Error while closing a stream or a stream reader. ";

String sourceInfo = "Error in Interpreter.java->"

+ "ReadFile(String path) method. ";

Util.Writeln(sourceInfo + errMsg + e);

System.exit(0);

}

}

Page 80: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 76

return null;

}

}

The Interpreter class has three methods apart from the main method:

• ReadFile• Interpret• DumpTokens

ReadFile reads the script source code and returns the source code as a string.TheInterpret method creates a new Tokenizer instance and new Parser instance with thescript tokens in the constructor.

These parts tokenize the source and creates a Parser object with those tokens.

Then the second part of the Interpret method takes over:

if (debug) DumpTokens(parser);

List<Node> script = new LinkedList<Node>();

script.add(parser.Statement());

script.add(parser.Statement());

script.add(parser.Statement());

for (Node statement:script)

statement.eval();

It calls the DumpTokens if the debug parameter is true. Debug will be true if we addthe “debug” keyword to the command line parameters while we are running “javaInterpeter” command to run it.

The DumpTokens method is for debugging. It prints all the tokens in the script.

After that, we call the Statement() method three times since we have three differentstatements. The Statement() method parses the current statement at every call andreturns a node object for that particular statement. They get added to Abstract SyntaxTree named script.

Page 81: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 77

With that, we have completed the parsing part.

It is not enough to just run scripts. There must be an update in the tokenizer as wellsince we have new token types: PRINT, PRINTLN and WAIT and KEYWORD.

We need to add them to TokenType.java; here its new version:

public enum TokenType

{

NUMBER, NEWLINE, OPERATOR, EOF, UNKNOWN,

ADD, SUBTRACT, MULTIPLY, DIVIDE, LEFT_PAREN,

RIGHT_PAREN, LESS, GREATER, EQUAL, NOTEQUAL,

LESSEQUAL, GREATEREQUAL, ASSIGNMENT, NOT, OR, AND,

KEYWORD, PRINT, PRINTLN, WAIT, END

}

MAIN, END, PRINT, PRINTLN, WAIT explains themselves. We use the KEYWORDtype for a new state in the tokenizer. Lets remember the states our tokenizer can bein so far:

• DEFAULT• OPERATOR• NUMBER

And now, we have a new one: KEYWORD

Let’s add it to TokenizeState first; here is the new version of the TokenizeState enum:

enum TokenizeState

{

DEFAULT, OPERATOR, NUMBER, KEYWORD

}

Let me explain the KEYWORD tokenizer state type. When the tokenizer sees a tokenother than a number or operator in the source code of our scripts, it will treat it asthough it is a language keyword. It needs to continue in a new state to get whole ofthat keyword. Here is the control we will add just below the parenthesis control inThe Tokenizer.java:

Page 82: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 78

else if (Character.isLetter(chr))

{

tokenText += chr;

state = TokenizeState.KEYWORD;

}

It checks if chr is a letter. If so, it adds it to the tokenText variable and get the tokenizerto the KEYWORD state. So, we need a case for KEYWORD state, here it is:

case KEYWORD:

if (Character.isLetterOrDigit(chr))

{

tokenText += chr;

}

else

{

TokenType type = FindStatementType(tokenText);

tokens.add(new Token(tokenText, type));

tokenText = "";

state = TokenizeState.DEFAULT;

index--;

}

break;

You can add it below NUMBER state case. Here in this state case, we have used anew method: FindStatementType(String str) It finds the token type of given string:

public TokenType FindStatementType(String str)

{

TokenType type = TokenType.UNKNOWN;

switch(str)

{

case "print":

type = TokenType.PRINT;

break;

case "println":

type = TokenType.PRINTLN;

break;

case "wait":

type = TokenType.WAIT;

Page 83: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 79

break;

default:

type = TokenType.KEYWORD;

}

return type;

}

Do these changes in the Tokenizer.java and save it. Now, we are ready to run our testscript. Compile all files with this command:

javac *.java command and then run this command:

java Interpreter script.con

Here is the output:

1

2

It writes 1 and waits two seconds and writes 2. Same as the earlier output with theStatementsDemo class.

We have not added these statements in the script manually, though. We have justgiven the source code to the parser. It took care of the rest by parsing statements sothat the interpreter could evaluate them.

I can explain the Statement() method now, recall:

public Node Statement()

{

Node node = null;

TokenType type = CurrentToken().type;

if (type == TokenType.PRINT)

{

MatchAndEat(TokenType.PRINT);

node = new PrintNode(Expression(), "sameline");

}

if (type == TokenType.PRINTLN)

Page 84: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 80

{

MatchAndEat(TokenType.PRINTLN);

node = new PrintNode(Expression(), "newline");

}

else if (type == TokenType.WAIT)

{

MatchAndEat(TokenType.WAIT);

node = new WaitNode(Expression());

}

else

{

Util.Writeln("Unknown language construct: "

+ CurrentToken().text);

System.exit(0);

}

return node;

}

Here is what it does:

• It gets the current token and checks if it matches any of the statements• It matches-eats print keyword if the statement is print and create a PrintNodeobject. Its parameter is obtained by calling the Expression() method. You mayremember that we parse expressions with the Expression() method, and anumber is an expression

• It matches-eats the println keyword if the statement is println and creates aPrintNode object. Its parameter is obtained by calling the Expression() method.You will remember that we parsed expressions with the Expression() method,and a number is an expression

• It matches-eats wait keyword if the statement is wait and create a WaitNodeobject. Its parameter is taken by calling Expression() method. You will remem-ber that we parse expressions with Expression() method and a number is anexpression

• And finally, if the current token does not match any statement keyword, itexits the program

• If none of these cases are triggered, it is necessary to exit beacuse the keywordis unknown.

Page 85: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 81

So far so good. We can parse statements and have them interpreted, but there is aslight problem. Let me show what it is:

public void Interpret(String source, boolean debug)

{

Tokenizer tokenizer = new Tokenizer();

Parser parser = new Parser(tokenizer.Tokenize(source));

if (debug) DumpTokens(parser);

List<Node> script = new LinkedList<Node>();

script.add(parser.Statement());

script.add(parser.Statement());

script.add(parser.Statement());

for (Node statement:script)

statement.eval();

}

This is the Interpret method of the Interpreter.java. The part we parse the statementsis:

script.add(parser.Statement());

script.add(parser.Statement());

script.add(parser.Statement());

It only parses three different statements. Since not all the scripts have three state-ments, we need to adjust that. They can have arbitrary amount of statements.

The solution is, we can get all statements at once using a loop. Lets add this methodto Parser.java:

Page 86: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 82

public List Block()

{

List<Node> statements = new LinkedList<Node>();

while ( CurrentToken().type != TokenType.END)

{

statements.add(Statement());

}

MatchAndEat(TokenType.END);

return statements;

}

When we call this method, it starts a loop and continue it till the current tokenbecomes “END”. We use END keyword to understand that we reach the end of thescript. This way, we can put any amount of statements before the END keyword, weare not limited to three statements or something like this.

This way, we automatize statement parsing. It calls the Statement() method in thewhile loop and so in every step of the loop, a statement gets parsed.

Let’s perform the necessary updates in the tokenizer for our next new keyword END.First, add END keyword to TokenType.java. Then update the FindStatementType()method in the Tokenizer.java, here its new version:

public TokenType FindStatementType(String str)

{

TokenType type = TokenType.UNKNOWN;

switch(str)

{

case "end":

type = TokenType.END;

break;

case "print":

type = TokenType.PRINT;

break;

case "println":

type = TokenType.PRINTLN;

break;

case "wait":

type = TokenType.WAIT;

break;

default:

Page 87: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 83

type = TokenType.KEYWORD;

}

return type;

}

We can update the Interpret method of Interpreter.java now:

public void Interpret(String source, boolean debug)

{

Tokenizer tokenizer = new Tokenizer();

Parser parser = new Parser(tokenizer.Tokenize(source));

if (debug) DumpTokens(parser);

List<Node> script = parser.Block();

for (Node statement:script)

statement.eval();

}

We need to update the script.con as well since we need to put the END keyword atthe end of every script:

println 1

wait 2000

println 2

end

Compile all files with javac *.java command and run this command:

java Interpreter script.con

Here is the output:

1

2

Page 88: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 84

It is the same output as above. But there is a huge difference in terms of techniquesbetween the two implementations.

We are able to parse arbitrary amount of statements, this is a huge achievement.

We can refer to this “arbitrary amount of statements” as a “statement block” or “codeblock”. Since code blocks, is more common in programming languages, we’ll usethat. Here is the grammar for code blocks:

<block> ::= ( <statement> )+

It reflects to the Block() method we have written above. In a code block, there areone or more statements.

Before closing this chapter, I want to add a bit of syntactic sugar to our parser. Wehave added END keyword to our parser to be able to understand that we are at theend of the script. There are statements before the END keyword. But it seems a bitunbalanced. It is good idea to put a starter keyword to the beginning of all scripts:

script

println 1

wait 2000

println 2

end

It is more tidy. Save the script.con this way that there is script keyword at thebeginning.

As we have done for END keyword, we will update the tokenizer for script keyword.Add SCRIPT token type to TokenType.java, here its new version:

Page 89: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 85

public enum TokenType

{

NUMBER, NEWLINE, OPERATOR, EOF, UNKNOWN,

ADD, SUBTRACT, MULTIPLY, DIVIDE, LEFT_PAREN,

RIGHT_PAREN, LESS, GREATER, EQUAL, NOTEQUAL,

LESSEQUAL, GREATEREQUAL, ASSIGNMENT, NOT, OR, AND,

KEYWORD, PRINT, PRINTLN, WAIT, END, SCRIPT

}

And then update the FindStatementType()method in the Tokenizer.java, by adding

case "script":

type = TokenType.END;

break;

case. Here its new version:

public TokenType FindStatementType(String str)

{

TokenType type = TokenType.UNKNOWN;

switch(str)

{

case "script":

type = TokenType.SCRIPT;

break;

case "end":

type = TokenType.END;

break;

case "print":

type = TokenType.PRINT;

break;

case "println":

type = TokenType.PRINTLN;

break;

case "wait":

type = TokenType.WAIT;

break;

default:

type = TokenType.KEYWORD;

}

return type;

}

Page 90: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 86

Finally, we need to update the Interpret() method in Interpreter.java. We also needto match and eat the script keyword before calling the Block() method:

Here is the new version of the Interprer() method:

public void Interpret(String source, boolean debug)

{

Tokenizer tokenizer = new Tokenizer();

Parser parser = new Parser(tokenizer.Tokenize(source));

if (debug) DumpTokens(parser);

parser.MatchAndEat(TokenType.SCRIPT);

List<Node> script = parser.Block();

for (Node statement:script)

statement.eval();

}

Compile and run the script.con as we have done above. Here is the output:

1

2

It is the same output as before.

We have got the script keyword at the beginning and the end keyword at the end ofall scripts as syntactic sugar. Later, you’ll see why this can be so useful for visualizingcode blocks.

As of this chapter, we are able to parse arbitrary amount of statements, referred toas code blocks. We will use it in the upcoming chapters to parse while loops body,if-else cases and more! Keep on reading!

Page 91: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 87

While Loop

We have learned how to parse code blocks and variables in the previous sections.

Code blocks are necessary for parsing the body section of while loops. Variables arenecessary because they are used in the conditional section of while loops.

Lets write a simple script first:

script

count = 1

while count <= 5

println "Hello World!"

wait 500

count = count + 1

end

end

Save it as while-demo.con.

The first question I expect you to ask is How can I evaluate println, wait and thecount = count + 1 statements in the while loop body?We can’t evaluate them theway we have done in the previous sections. Since they don’t belong to whole script,they get evaluated only if the while condition evaluates to true.

Additionally, the code within the body of the while loop should have no impact onother parts of the script. We need to ensure that, otherwise very odd behavior canmanifest without warning.

The key thing here is, the block belongs to the while loop.

This type of code blocks must be treated as like separate scripts inside the main script.They have an owner like a while loop.

Lets remember what we used to evaluate a whole script in the previous section:

• We parse statements using the Block() method that parses and puts nodeobjects into a linked list

• We traverse this linked list and call the eval method of the statements in eachstep of the traversing loop

Page 92: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 88

But this time, we can not put the statements into our list. Because while loop is yetanother statement as well and it has two parts:

• Condition• Body

It is not like print, println and wait statements because its body is a code block itself.To be able to handle a while loop with its condition and body, we need to encapsulateboth condition and body.

For encapsulating a while loop body, we need a new class for encapsulating codeblocks. We can use the BlockNode class for that:

import java.util.List;

import java.util.LinkedList;

public class BlockNode extends Node

{

private List<Node> statements;

public BlockNode(List<Node> statements)

{

this.statements = statements;

}

public Object eval()

{

Object ret = null;

for (Node statement : statements)

{

ret = statement.eval();

}

return ret;

}

public Node get(int index)

{

return statements.get(index);

}

Page 93: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 89

protected List<Node> getStatements()

{

return statements;

}

public String toString()

{

String str = "";

for (Node statement: statements)

str = str + statement + "\n";

return str;

}

}

Its member variable statements is for storing the statements that belong to aparticular code block. Once we get them in this variable, we can evaluate them bycalling the eval() method of this class on that code block object.

Remember, we use Block() method for parsing code blocks. It returns linked list atthe moment. We need to update it so that it returns a BlockNode object. Here its newversion:

public BlockNode Block()

{

List<Node> statements = new LinkedList<Node>();

while ( CurrentToken().type != TokenType.END)

{

statements.add(Statement());

}

MatchAndEat(TokenType.END);

return new BlockNode(statements);

}

Add this method to Parser.java.

This method parses any code block and returns a BlockNode object.

Lets update the Interpret method of Interpreter.java to use new version of the Block()method. Here is the new version:

Page 94: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 90

public void Interpret(String source, boolean debug)

{

Tokenizer tokenizer = new Tokenizer();

Parser parser = new Parser(tokenizer.Tokenize(source));

if (debug) DumpTokens(parser);

parser.MatchAndEat(TokenType.SCRIPT);

Node script = parser.Block();

script.eval();

}

The difference is, we do not need this loop below:

for (Node statement:script)

statement.eval();

to traverse the Abstract Syntax Tree. We removed this code snippet and putscript.eval() now that we have a BlockNode object that encapsulates all statementsof the script. The only thing to do next is just call the eval() method on this object.It evaluates all statements one by one by traversing the linked list that it holds, fullof the scripts statements.

Now we can get back to parsing while loops.

else if (IsWhile())

{

node = While();

}

Add this control to Statement() method in Parser.java just below the assignmentcontrol.

The recognizer method that we use is IsWhile(), add this method to Parser.java aswell.

Page 95: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 91

public boolean IsWhile()

{

return CurrentToken().type == TokenType.WHILE;

}

The While() method we used above in the control is for parsing the whole whileloop:

public Node While()

{

Node condition, body;

MatchAndEat(TokenType.WHILE);

condition = Expression();

body = Block();

return new WhileNode(condition, body);

}

It first matches and eats thewhile keyword. Then it gets the while loop condition bycalling the Expression() method that it does parse the boolean expressions. A notethere: while loop conditions are a boolean expression.

Then, it calls the Block() method to parse while loop body. Our Block() method isable to parse any code block and get that code block statements into a BlockNodeobject. So, we use it for while loop body as well.

Then, finally, we need to put this condition and body into an object so that we canstore it in the Abstract Syntax Tree.

An interesting takeaway from this technique is that we are creating scope for anyobject that holds a Block() call. So not only does the entire script get contained ina Block, but while loop bodies are contained in Blocks as well. If you’re thinkingforward, then you may also be considering using Blocks for if-else body sections aswell.

It is a very good example to the structure of the programming languages that theyare nested. A while loop can include an if case and then that particular if case caninclude a while loop. All of their bodies are code blocks and we use Block method toparse that code blocks.

Here is the WileNode class that we use for encapsulating the while loops:

Page 96: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 92

import java.util.List;

import java.util.LinkedList;

public class WhileNode extends Node

{

public Node condition;

public Node body;

public WhileNode() {}

public WhileNode(Node condition, Node body)

{

this.condition = condition;

this.body = body;

}

public Object eval()

{

Object ret = null;

while ( ((Boolean) condition.eval()).booleanValue() )

{

ret = body.eval();

}

return ret;

}

}

We need this class to put WhileNode objects to the Abstract Syntax Tree.

The last part we need to update is tokenizer. Because we have a new keyword: while.

Add WHILE token type to TokenType.java. Then update the FindStatementType()method in the Tokenizer.java. Here its new version:

Page 97: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 93

public TokenType FindStatementType(String str)

{

TokenType type = TokenType.UNKNOWN;

switch(str)

{

case "script":

type = TokenType.SCRIPT;

break;

case "end":

type = TokenType.END;

break;

case "while":

type = TokenType.WHILE;

break;

case "print":

type = TokenType.PRINT;

break;

case "println":

type = TokenType.PRINTLN;

break;

case "wait":

type = TokenType.WAIT;

break;

default:

type = TokenType.KEYWORD;

}

return type;

}

Now, it is able to recognize the while keyword. Compile all files and run thiscommand:

java Interpreter while-demo.con

Here is the output:

Page 98: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 94

Hello World!

Hello World!

Hello World!

Hello World!

Hello World!

Congratulations. You can loop!

script count = 1 while count <= 5 println “Hello World!” wait 500 count = count + 1end end

The script starts with assingning 1 to count variable. Then it loops in the while looptill the count variable gets to 6 that we increase it in each step of the loop. It printsHello World! in each step of the loop and waits 0.5 seconds after printing.

Please take a second to notice that count’s final value is 6 because we used <= insteadof strictly <.

We have completed parsing and interpreting while loops. Now it is time to add if-elsecases to our interpreter. Keep on reading!

Page 99: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 95

Arrays

We have implemented variable parsing in the last section. Arrays function similarlyto variables.

The difference is that they hold multiple values, instead of just one. The differentvalues can be accessed by their index in the array.

Parsing arrays is based on variables. We can say, once you add index support tovariables, they become arrays.

As usual, I am going to start by defining the structure of what we are parsing.

Arrays will have two functionalities, definition and access

Example array definitions:

fruits = ["Apple", "Orange"]

orders = [1234, 1235]

The first one is an array of strings, the second one is an array of numbers.

As stated earlier, we access elements of an array by their index:

myFavoriteFruit = fruits[0]

fruits[1] = "Banana"

yourlastOrder = orders[index]

println orders[index-1]

Page 100: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 96

updateRecord(recordList[5])

By looking at the structutures above, we are able to conclude that we will need tosupport two kinds of parsing for arrays:

• Parsing array definition• Parsing array access

And here is the grammar for array definition:

<array> ::= (<left_bracket> (<number> | <string> | <variable>)

[<comma> (<number> | <string> | <variable>)]* <right_bracket>)

With our grammar defined, we can get started parsing array defintion.

Array definition is really just a form of assignment, so we will handle it inthe assignment section of the parser. Remember that we are handling variableassignments in the Statement() method using below control:

if (IsAssignment())

{

node = Assignment();

}

Weare going to update it to allow it to parse an array definition aswell.Whatwe needto do is put an appropriate code snippet into the Assignment() method. Here is thenew version of the Assignment() method, remove the previous one from Parser.javaand add this version:

Page 101: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 97

public Node Assignment()

{

Node node = null;

String name = MatchAndEat(TokenType.KEYWORD).text;

MatchAndEat(TokenType.ASSIGNMENT);

if (CurrentToken().type == TokenType.LEFT_BRACKET)

{

node = ArrayDefinition(name);

}

else

{

Node value = Expression();

node = new AssignmentNode(name, value, this);

}

return node;

}

What we added is an extra control to understand if the assignment is an arraydefinition. Here is what we added:

if (CurrentToken().type == TokenType.LEFT_BRACKET)

{

node = ArrayDefinition(name);

}

It checks if there is left bracket after the equals sign, and if so, it creates the arraydefinition by calling the ArrayDefinition() method. If it does not find a left bracketafter the equals sign, we know that we’re dealing with a variable.

Here it is the ArrayDefinition method, invoked above:

Page 102: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 98

public Node ArrayDefinition(String name)

{

List<Node> elements = new ArrayList<Node>();

MatchAndEat(TokenType.LEFT_BRACKET);

if (CurrentToken().type != TokenType.RIGHT_BRACKET)

{

elements.add(Expression());

while (CurrentToken().type != TokenType.RIGHT_BRACKET)

{

MatchAndEat(TokenType.COMMA);

elements.add(Expression());

}

}

MatchAndEat(TokenType.RIGHT_BRACKET);

return new AssignmentNode(name, new ArrayNode(elements), this);

}

Add this method to Parser.java.

In this method, we eat left bracket in the beginning and get first element of the arrayusing the Expression()method. Then we get the array elements in the loop the sameway. We keep array elements in an array list to allow for dynamic expansion andconstant speed index access. The loop ends when it sees a right bracket. Afterwards,we create a new AssignmentNode object for this array. It’s worth noting that everyarray definition is an assignment as well. In the constructor of AssignmentNode,you will notice that there is an ArrayNode object creation.

The ArrayNode object is for encapsulating array definitions. Here its source code:

import java.util.ArrayList;

import java.util.List;

public class ArrayNode extends Node

{

private List<Node> elements;

public ArrayNode(List<Node> elements)

{

this.elements = elements;

}

Page 103: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 99

public Object eval()

{

List<Object> items = new ArrayList<Object>(elements.size());

for (Node node : elements)

{

items.add(node.eval());

}

return items;

}

}

Besides the obvious changes above, we need to also update the tokenizer since wehave used new token types in our Assignment() and ArrayDefinition() methods.

Let’s add these token types to the TokenType enum: COMMA, LEFT_BRACKETand RIGHT_BRACKET.

Then, we need to update the IsParen and FindParenType methods of Tokenizer.java.Here is their new version:

public boolean IsParen(char chr)

{

boolean prntOp = chr == '(' || chr == ')';

boolean brktOp = chr == '[' || chr == ']';

boolean puncOp = chr == ',';

return prntOp || brktOp || puncOp;

}

public TokenType FindParenType(char chr)

{

TokenType type = TokenType.UNKNOWN;

switch(chr)

{

case '(':

type = TokenType.LEFT_PAREN;

break;

case ')':

type = TokenType.RIGHT_PAREN;

break;

case '[':

type = TokenType.LEFT_BRACKET;

Page 104: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 100

break;

case ']':

type = TokenType.RIGHT_BRACKET;

break;

case ',':

type = TokenType.COMMA;

break;

}

return type;

}

Don’t forget to delete previous versions and replace them with these new ones in theTokenizer.java rather than simply adding them to the file. After these updates, ourtokenizer is able to recognize brackets.

Also we need two new methods to recognize the COMMA token type:

The first one is IsPunc() to recognize if the token is a comma.

public boolean IsPunc(char chr)

{

boolean puncOp = chr == ',';

return puncOp;

}

Second, FindPuncType() to find out the token type if it is a comma:

public TokenType FindPuncType(char firstOperator)

{

TokenType type = TokenType.UNKNOWN;

switch(firstOperator)

{

case ',':

type = TokenType.COMMA;

break;

}

return type;

}

Add these methods to Tokenizer.java.

Page 105: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 101

We only support one punctuation type at the moment. But we designed these meth-ods thinking that we can eventually expand our tokenizer to recognize additionalpunctuation types like dot, colon, semicolon, etc.

After adding these two methods, we need to update the Tokenize() method of theTokenizer.java to make it able to tokenize punctuation types. Add below control toTokenize() method in the Tokenizer.java:

else if (IsPunc(chr))

{

TokenType puncType = FindPuncType(chr);

tokens.add(new Token(Character.toString(chr), puncType));

}

It must be in the DEFAULT case of the switch control for parsing..

Before compiling all of our files with the new additions, remember to add importjava.util.ArrayList to Parser.java since we have started using array lists.

Now, compile all files with the javac *.java command. There shouldn’t be any errors.

We have completed parsing array definition. We need to implement array accessnow. Let’s recall the different array access examples we saw in the beginning of thissection:

myFavoriteFruit = fruits[0]

fruits[1] = "Banana"

yourlastOrder = orders[index]

println orders[index-1]

Page 106: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 102

updateRecord(recordList[5])

Arrays are accessed using their indices. This is the first thing we need to parse.Secondly, we need to parse assignment when there is an array accessed on the leftor right side of the equals sign.

As I said before, arrays are like variables, only difference is indices. So, we can treatthem like variables. This gives us a clue: we are going to update theVariable()methodfor handling array access. Remember, we are parsing variables using the Variable()method that is called in Factor().

This clue brings us some simplicity: since the Factor() method is a part of a chainof expression parsing, once we get array access working, we can use array elementsanywhere that we use expressions.

A quick example:

result = sum[0] + sum[1] + sum[2]

See that? The right side is an expression. And whole line is an assignment.

So, after having the necessary logic, it is new version of the Variable() method:

public Node Variable()

{

Token token = MatchAndEat(TokenType.KEYWORD);

Node varNode = new VariableNode(token.text, this);

// Handle array access here

if (CurrentToken().type == TokenType.LEFT_BRACKET)

{

MatchAndEat(TokenType.LEFT_BRACKET);

Node key = Expression();

MatchAndEat(TokenType.RIGHT_BRACKET);

return new LookupNode((VariableNode) varNode, key);

}

else return varNode;

}

Page 107: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 103

Please delete the old version and add this new one to the Parser.java.

Let me explain how the array access works here.

We added a new control that it utilizes to understand if a given variable is an arrayelement. If it is an array, it matchs-eats left bracket, gets the expression inside bracketsand matchs-eats the right bracket. Then, it creates a new object of LookupNode.

I want to point out something important here: we get the expression betweenbrackets, right? It is the another example of the usage of the expressions! We useexpression parsing the get the expression that is given as array indice. Nice.

Now, here is the LookupNode class’ source code first, then I will explain why weneed it:

import java.util.List;

public class LookupNode extends Node

{

private VariableNode varNode;

private Node keyNode;

public LookupNode(VariableNode varNode, Node keyNode)

{

this.varNode = varNode;

this.keyNode = keyNode;

}

public Object eval()

{

Object var = varNode.eval();

int index = ((Integer) keyNode.eval()).intValue();

Object ret = ((List) var).get(index);

return ret;

}

}

It is a class for encapsulating access to a pairing using key and value. It has twomembers:

• keyNode is the key of the element we access

Page 108: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 104

• varNode is the value of the element we access

Both members are Node subclasses. When its eval method is called, it evaluates thekey and the value and returns the element that is accessed with this key and value.

As you remember, variable node gets elements from the symbol table. So, theLookupNode object reads from the symbol table.

Now, let’s compile TokenType.java, Tokenizer.java and Parser.java with the newchanges. There shouldn’t be any error when you compile.

Let’s make a demonstration of what we have done so far, save below script in array-access-demo.con:

script

fruits = ["apple", "orange"]

count = 0

while count < 2

println fruits[count]

wait 300

count = count + 1

end

end

You will see below output when you run this script:

apple

orange

Nice. Our script gets the array and prints its elements.We have successfuly completedthe parsing array definition and array access.

What’s left? We need to implement array updates. Here is a quick example:

Page 109: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 105

fruits[1] = "Banana"

It updates the fruits array by puting “Banana” into the array at index 1. Array updateis a form of assignment as well, and since we know every assignment is a statement,we need to update the Statement() method.

Let’s start with putting a control into the Statement() method so we can determine ifan operation is array access. Put this control in the Statement() method in Parser.java:

else if (IsArrayAccess())

{

node = ArrayUpdate();

}

So, here is the new version of the Statement() method:

public Node Statement()

{

Node node = null;

TokenType type = CurrentToken().type;

if (IsAssignment())

{

node = Assignment();

}

else if (IsArrayAccess())

{

node = ArrayUpdate();

}

else if (IsWhile())

{

node = While();

}

else if (IsIfElse())

{

node = If();

}

else if (type == TokenType.PRINT)

{

MatchAndEat(TokenType.PRINT);

Page 110: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 106

node = new PrintNode(Expression(), "sameline");

}

else if (type == TokenType.PRINTLN)

{

MatchAndEat(TokenType.PRINTLN);

node = new PrintNode(Expression(), "newline");

}

else if (type == TokenType.WAIT)

{

MatchAndEat(TokenType.WAIT);

node = new WaitNode(Expression());

}

else

{

Util.Writeln("Unknown language construct: "

+ CurrentToken().text + CurrentToken().type);

System.exit(0);

}

return node;

}

There is a new recognizer method here, IsArrayAccess:

public boolean IsArrayAccess()

{

TokenType type = CurrentToken().type;

return type == TokenType.KEYWORD &&

NextToken().type == TokenType.LEFT_BRACKET;

}

Don’t forget to add it to Parser.java as well. It’s for checking if the operation is anattempt to access an array, like:

fruits[1] = "grape"

You see that there is a call to the ArrayUpdate() method:

Page 111: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 107

public Node ArrayUpdate()

{

String arrayName = MatchAndEat(TokenType.KEYWORD).text;

Node array = new VariableNode(arrayName, this);

MatchAndEat(TokenType.LEFT_BRACKET);

Node indexExpr = Expression();

MatchAndEat(TokenType.RIGHT_BRACKET);

MatchAndEat(TokenType.ASSIGNMENT);

Node rightSideExpr = Expression();

return new ArrayUpdateNode(array, indexExpr, rightSideExpr);

}

This method handles assignments like this:

myArray[index] = yourArray[yourIndex]

First we get the array name, then the expression for the left index. The index itself isan expression that might be literal number, or an expression, like 1+2, or a variablejust as it is in the above example.

Then, it matches and eats the “=” assignment operator. Afterwards, it gets the rightside. This time, the Expression() method handles it because we added the facility forExpression to take care of array access.

Finally, at the end of this method, we create a new ArrayUpdateNode object. TheArrayUpdateNode object is for encapsulating array updates. The array element getsupdated when the interpreter executes the ArrayUpdateNode object’s eval method.

Here is the ArrayUpdateNode class:

Page 112: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 108

import java.util.List;

public class ArrayUpdateNode extends Node

{

private Node array;

private Node indexExpression;

private Node rightSideExpression;

public ArrayUpdateNode(Node array, Node indexExpression,

Node rightSideExpression)

{

this.array = array;

this.indexExpression = indexExpression;

this.rightSideExpression = rightSideExpression;

}

public int ToInt(Node node)

{

Object res = node.eval();

return ((Integer) res).intValue();

}

public Object eval()

{

Object arrayVariable = array.eval();

int index = ToInt(indexExpression);

Object newValue = rightSideExpression.eval();

Object ret = ((List) arrayVariable).set(index, newValue);

return ret;

}

}

Save and compile all the classes with the new additions and updates.

Let’s write a new script to demo the array update functionality. Try the below scriptand save it to file array-update-demo.con:

Page 113: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 109

script

fruits = ["apple", "orange"]

println "Fruits BEFORE Update"

println "------------"

count = 0

while count < 2

println fruits[count]

wait 300

count = count + 1

end

# Lets leave a space here

println ""

grp = "Grape"

sberry = "Strawbery"

fruits[0] = grp

fruits[1] = sberry

println "Fruits AFTER Update"

println "------------"

count = 0

while count < 2

println fruits[count]

wait 300

count = count + 1

end

end

Here is the output when we run it:

Page 114: Practical Interpreter Construction - Leanpubsamples.leanpub.com/pic-sample.pdf · Chapter1-Motivation 3 onceyoulearnhowparsersandinterpreterswork.Withhardwaregettingcheaper and cheaper

Chapter 4 - Parsing Programming Language Constructs 110

Fruits BEFORE Update

------------

apple

orange

Fruits AFTER Update

------------

Grape

Strawbery

With this demo script, we have demonstrated:

• Array definition• Array access• Array update• Variables• And even comments!

Things are becoming pretty exciting! Good.

We have arrays functioning in our interpreter now.

In the next chapter, things will be evenmore exciting once we are going to implementfunctions. Keep on reading!