Computer Science 210
Computer Organization

Building an Assembler

Part IV: The Second Pass, Syntax Analysis, and Code Generation

The Second Pass

[Diagram of the assembler's components: text file, character I/O, line stream, scanner, token stream, first pass, second pass, tools, symbol table, opcode table, sym file, bin file, and the source program listing with error messages (file and/or terminal)]

What the Second Pass Does

• Scans through the lines of code and performs syntax analysis

• Translates each line of code to a 16-bit binary instruction (or to one or more 16-bit data words for .FILL, .STRINGZ, and .BLKW)

Implementation: The Data

#define FILL_ZERO "0000000000000000"
#define SIX_ZEROS "000000"

static int spAddress;            // The address counter
static int spNotDone;            // More instructions?
static token spToken;            // The current token
static FILE* binfile;            // The output file
static char outputBuffer[17];    // The current binary instruction

The Top-Level Function

// Initializes the data, gets the first instruction, gets the
// first token, and calls program()
void secondPass(FILE* infile, FILE* outfile, FILE* bfile){
    binfile = bfile;
    outputBuffer[16] = 0;
    initScanner(infile, outfile);
    spAddress = DEFAULT_START_ADDRESS;
    spNotDone = nextInstruction();
    spToken = nextToken();
    program();
}

Implementation: Second Pass Tools

• Define some utility functions to

– Output a line of binary code

– Process a label reference

– Finish an instruction (scans to end of line, increments the address counter, gets the next token)

– Check a token’s type and output an error message if it’s unexpected
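The token-checking utility is only described here, not shown. A minimal sketch of how accept might look, assuming a token struct with a type field and an error reporter called putError. The enum values, the errorCount counter, and the int return value are illustrative assumptions, not the course's actual definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical token types; the real scanner defines its own. */
typedef enum { TC_REG, TC_COMMA, TC_INT, TC_NEWLINE } tokenType;

typedef struct {
    tokenType type;
} token;

static token spToken;        /* the current token, as in the slides */
static int errorCount = 0;   /* hypothetical: number of errors reported */

/* Hypothetical stand-in for the assembler's error reporter. */
static void putError(const char* message) {
    assert(message != NULL);
    fprintf(stderr, "Error: %s\n", message);
    errorCount++;
}

/* Reports an error if the current token is not of the expected type.
   Returns 1 on a match and 0 otherwise; the token is not consumed. */
static int accept(tokenType expected, const char* message) {
    if (spToken.type == expected)
        return 1;
    putError(message);
    return 0;
}
```

Because this accept does not advance the scanner, callers fetch the next token themselves, which matches the nextToken / accept pairs in the parsing functions.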

Finishing an Instruction

// The purported end of an instruction has been reached, so
// check for the newline, get the next instruction, get
// its first token, and increment the address counter.
void finishInstruction(){
    spToken = nextToken();
    accept(TC_NEWLINE, "Too many tokens in instruction.");
    spNotDone = nextInstruction();
    if (spNotDone){
        spAddress++;
        spToken = nextToken();
    }
}

The Parsing Functions

• Each syntax rule in the EBNF grammar translates to a parsing function

• Each function assumes that the current token is its start symbol

• Each function calls finishInstruction as its last step

The Prototypes

// Parsing function prototypes
void program();
void instruction();
void orig_ins();
void add_or_and_ins();
void blkw_ins();
void br_ins();
void fill_ins();
void jmp_ins();
void jsr_ins();
void jsrr_ins();
void ld_ldi_lea_st_sti_ins();
void ldr_or_str_ins();
void not_ins();
void ret_or_rti_ins();
void stringz_ins();
void trap_ins();

Instructions with the same format differ only in the leading token
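The instruction() function is not shown, but the grouping by format suggests a dispatch on the leading token. A rough sketch, assuming one token type per mnemonic. The enum names and the parserFor helper are hypothetical; it returns the parsing function's name as a string so the example stays self-contained.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token types for leading tokens; the real scanner's
   names may differ. */
typedef enum {
    TC_ADD, TC_AND, TC_LD, TC_LDI, TC_LEA, TC_ST, TC_STI,
    TC_LDR, TC_STR, TC_NOT, TC_RET, TC_RTI, TC_OTHER
} tokenType;

/* Returns the name of the parsing function that would handle an
   instruction starting with the given token type. */
static const char* parserFor(tokenType leading) {
    assert(leading >= TC_ADD && leading <= TC_OTHER);
    switch (leading) {
    case TC_ADD: case TC_AND:
        return "add_or_and_ins";
    case TC_LD: case TC_LDI: case TC_LEA: case TC_ST: case TC_STI:
        return "ld_ldi_lea_st_sti_ins";
    case TC_LDR: case TC_STR:
        return "ldr_or_str_ins";
    case TC_NOT:
        return "not_ins";
    case TC_RET: case TC_RTI:
        return "ret_or_rti_ins";
    default:
        return "unknown";
    }
}
```

In a real instruction() each case would call the parsing function directly. The opcode bits come from the leading token's binary field, so one function can serve every mnemonic in its group.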

Parsing with the Top-Level Rule

// program = [ orig-directive ] { [ label ] instruction } ".END"

void program(){
    orig_ins();
    while (spNotDone && spToken.type != TC_END)
        instruction();
    accept(TC_END, ".END expected.");
}

We stop when .END is reached or there are no more instructions

accept checks the current token’s type and reports an error if it is not the expected one

Parsing the NOT Instruction

// not-ins = "NOT" register "," register

void not_ins(){
    strcpy(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_COMMA, "Comma expected.");
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);
    strcat(outputBuffer, "111111");    // The low six bits of NOT are always 1
    outputBinary();
    finishInstruction();
}
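A worked example may help show how the fields add up to 16 bits. In the LC-3 encoding, NOT is the opcode 1001, then the 3-bit destination register, the 3-bit source register, and six 1 bits. The helper below is a hypothetical stand-alone sketch in the strcpy/strcat style used by the parsing functions. It takes the register fields as strings instead of reading tokens.

```c
#include <assert.h>
#include <string.h>

/* Builds the 16-character bit string for NOT from two 3-bit register
   fields. out must have room for 17 chars (16 bits plus the null). */
static void encodeNot(const char* dr, const char* sr, char out[17]) {
    assert(strlen(dr) == 3 && strlen(sr) == 3);
    strcpy(out, "1001");     /* opcode for NOT */
    strcat(out, dr);         /* destination register */
    strcat(out, sr);         /* source register */
    strcat(out, "111111");   /* fixed low six bits */
}
```

For NOT R1, R2 the fields are 1001, 001, 010, and 111111, so the buffer ends up holding 1001001010111111.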

Parsing the .FILL Directive

// fill-ins = ".FILL" integer-literal

void fill_ins(){
    spToken = nextToken();
    accept(TC_INT, "Integer literal expected.");
    strcpy(outputBuffer, signedBinary(spToken.intValue, 16));
    outputBinary();
    finishInstruction();
}

Should add a check on the bounds of an integer fill value!
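One possible form of the bounds check, as a sketch only. It assumes a fill value is acceptable if it fits in 16 bits read either as signed or as unsigned, so that a literal like xFFFF is allowed. That policy and the name fitsInWord are assumptions, not part of the course code.

```c
#include <assert.h>

/* Returns 1 if value can be stored in one 16-bit word, read either as
   a signed value (-32768..32767) or an unsigned one (0..65535). */
static int fitsInWord(long value) {
    return value >= -32768L && value <= 65535L;
}

/* Usage sketch: the boundary values on each side of the range. */
static void fitsInWordExamples(void) {
    assert(fitsInWord(-32768L));
    assert(fitsInWord(65535L));
    assert(!fitsInWord(-32769L));
    assert(!fitsInWord(65536L));
}
```

In fill_ins the check would go after the TC_INT test and before the value is converted, with an error message when it fails.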

Parsing the .BLKW Directive

// blkw-ins = ".BLKW" integer-literal

void blkw_ins(){
    strcpy(outputBuffer, FILL_ZERO);
    spToken = nextToken();
    accept(TC_INT, "Integer literal expected.");
    int i;
    for (i = 1; i <= spToken.intValue; i++)
        outputBinary();
    spAddress += spToken.intValue - 1;
    finishInstruction();
}

Should add a check on the memory available for the given number of words!
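A sketch of what the memory check could look like, assuming the only limit is the 16-bit address space. MAX_ADDRESS and blockFits are hypothetical names, and a real assembler might use a lower limit to protect reserved regions.

```c
#include <assert.h>

#define MAX_ADDRESS 0xFFFF   /* hypothetical: last word of a 16-bit address space */

/* Returns 1 if count words starting at address fit at or below
   MAX_ADDRESS, and 0 otherwise (including when count is not positive). */
static int blockFits(int address, int count) {
    assert(address >= 0 && address <= MAX_ADDRESS);
    if (count <= 0)
        return 0;
    return count <= MAX_ADDRESS - address + 1;
}
```

Rejecting a count of zero or less also guards the spAddress adjustment in blkw_ins, which would otherwise move the address counter backwards.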

Parsing the .STRINGZ Directive

// stringz-ins = ".STRINGZ" string-literal

void stringz_ins(){
    spToken = nextToken();
    accept(TC_STRING_LIT, "String literal expected.");
    char* lit = spToken.source;
    int i;
    for (i = 0; i < spToken.intValue; i++){
        char ch = lit[i];
        strcpy(outputBuffer, unsignedBinary(ch, 16));
        outputBinary();
    }
    strcpy(outputBuffer, FILL_ZERO);
    outputBinary();
    spAddress += spToken.intValue - 2;
    finishInstruction();
}

Should add a check on the memory available for the characters!
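To make the memory cost concrete: a string of n characters occupies n + 1 words, one per character plus the zero terminator. The sketch below writes those words into an array and refuses when they do not fit. Both unsignedBinary (a guess at the tool's behavior, using a static buffer) and encodeString are illustrative, not the course's code.

```c
#include <assert.h>
#include <string.h>

/* Sketch of unsignedBinary: value as numBits characters of '0'/'1',
   most significant bit first. Static buffer, overwritten each call. */
static char* unsignedBinary(unsigned int value, int numBits) {
    static char bits[17];
    int i;
    assert(numBits >= 1 && numBits <= 16);
    for (i = 0; i < numBits; i++)
        bits[i] = ((value >> (numBits - 1 - i)) & 1u) ? '1' : '0';
    bits[numBits] = '\0';
    return bits;
}

/* Writes one 16-bit word per character of text, followed by an
   all-zero terminator word, into words. Returns the number of words
   written, or -1 if more than maxWords would be needed. */
static int encodeString(const char* text, char words[][17], int maxWords) {
    int length = (int) strlen(text);
    int i;
    if (length + 1 > maxWords)
        return -1;
    for (i = 0; i < length; i++)
        strcpy(words[i], unsignedBinary((unsigned char) text[i], 16));
    strcpy(words[length], "0000000000000000");
    return length + 1;
}
```

With room for four words, the string Hi takes three of them, while Hello needs six and is rejected.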

Parsing the LD Instruction

// ld-ins = "LD" register "," label

void ld_ldi_lea_st_sti_ins(){
    strcpy(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_COMMA, "Comma expected.");
    spToken = nextToken();
    processLabel(9);
    outputBinary();
    finishInstruction();
}

LD, LDI, LEA, ST, and STI all have the same format

Processing a Reference to a Label

// Looks up the label's address, converts its PC-relative offset
// to a signed bit string, and appends that to the output buffer
void processLabel(int numBits){
    accept(TC_LABEL, "Label expected.");
    if (spToken.type == TC_LABEL){
        int labelAddress = findSymbol(spToken.source);
        if (labelAddress == -1)
            putError("Undeclared label.");
        else{
            int offset = labelAddress - (spAddress + 1);
            strcat(outputBuffer, signedBinary(offset, numBits));
        }
    }
}

Make sure there is a label, make sure it’s declared, and compute the numBits-bit offset as the label’s address minus (PC + 1), where PC is the address of the current instruction
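The offset arithmetic can be tried out in isolation. The sketch below assumes signedBinary produces a two's complement bit string in a static buffer; the tool itself is not shown in the slides, so this version is a guess at its behavior. pcOffset and appendOffset are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Sketch of signedBinary: value as a two's complement bit string of
   numBits characters. Static buffer, overwritten by the next call. */
static char* signedBinary(int value, int numBits) {
    static char bits[17];
    unsigned int pattern = (unsigned int) value;  /* keeps the low-order bits */
    int i;
    assert(numBits >= 1 && numBits <= 16);
    for (i = 0; i < numBits; i++)
        bits[i] = ((pattern >> (numBits - 1 - i)) & 1u) ? '1' : '0';
    bits[numBits] = '\0';
    return bits;
}

/* PC-relative offset from the instruction at instrAddress to the
   label, measured from the incremented PC. */
static int pcOffset(int labelAddress, int instrAddress) {
    return labelAddress - (instrAddress + 1);
}

/* Appends the numBits-bit offset to buffer, in the manner of
   processLabel(). buffer must have room for the extra bits. */
static void appendOffset(char* buffer, int labelAddress,
                         int instrAddress, int numBits) {
    strcat(buffer, signedBinary(pcOffset(labelAddress, instrAddress), numBits));
}
```

For an LD R1 at x3001 that refers to a label at x3005, the offset is x3005 - x3002 = 3, so 000000011 is appended to the opcode and register bits 0010001. A label at x3000 gives an offset of -2, which is 111111110 in nine bits.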

Should add a check on the limits of the offset!
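A sketch of the limit check: an offset fits in a two's complement field of numBits bits when it lies between -2^(numBits-1) and 2^(numBits-1) - 1. The name offsetFits is hypothetical.

```c
#include <assert.h>

/* Returns 1 if offset fits in a two's complement field of numBits
   bits, and 0 otherwise. */
static int offsetFits(int offset, int numBits) {
    int limit;
    assert(numBits >= 2 && numBits <= 16);
    limit = 1 << (numBits - 1);
    return offset >= -limit && offset <= limit - 1;
}
```

For the 9-bit field passed in by ld_ldi_lea_st_sti_ins, the valid range is -256 to 255. processLabel could report an error instead of appending the bits when the check fails.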
