Computer Science 210
Computer Organization
Building an Assembler
Part IV: The Second Pass, Syntax Analysis, and Code Generation
The Second Pass
[Figure: assembler data flow. Components: text file, character I/O, line stream, scanner, token stream, first pass, second pass, tools, symbol table, opcode table, sym file, bin file, and the source program listing with error messages (file and/or terminal).]
What the Second Pass Does
• Scans through the lines of code and performs syntax analysis
• Translates each line of code to a 16-bit binary instruction (or data values when .FILL, .STRINGZ, and .BLKW appear)
Implementation: The Data
#define FILL_ZERO "0000000000000000"
#define SIX_ZEROS "000000"
static int spAddress;            // The address counter
static int spNotDone;            // More instructions?
static token spToken;            // The current token
static FILE* binfile;            // The output file
static char outputBuffer[17];    // The current binary instruction
The Top-Level Function
// Initializes the data, gets the first instruction, gets the
// first token, and calls program()
void secondPass(FILE* infile, FILE* outfile, FILE* bfile){
    binfile = bfile;
    outputBuffer[16] = 0;
    initScanner(infile, outfile);
    spAddress = DEFAULT_START_ADDRESS;
    spNotDone = nextInstruction();
    spToken = nextToken();
    program();
}
Implementation: Second Pass Tools
• Define some utility functions to
– Output a line of binary code
– Process a label reference
– Finish an instruction (scans to end of line, increments the address counter, gets the next token)
– Check a token’s type and output an error message if it’s unexpected
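The slides do not show the bodies of these utilities. The following is a minimal sketch of two of them, the type check and the binary output, under stated assumptions: the token struct is reduced to a single field, putError is a stand-in for the scanner's error reporter, and errorCount is a hypothetical tally added only so the behavior can be observed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the scanner's types. */
typedef enum { TC_REG, TC_COMMA, TC_NEWLINE, TC_END } tokenType;
typedef struct { tokenType type; } token;

static token spToken;            /* The current token */
static FILE *binfile;            /* The output file */
static char outputBuffer[17];    /* The current binary instruction */
static int errorCount = 0;       /* Assumed tally, not in the slides */

/* Stand-in for the scanner's error reporter. */
void putError(const char *message) {
    fprintf(stderr, "Error: %s\n", message);
    errorCount++;
}

/* Reports message if the current token is not of the expected type. */
void accept(tokenType expected, const char *message) {
    if (spToken.type != expected)
        putError(message);
}

/* Writes the 16-character bit string in outputBuffer as one line. */
void outputBinary(void) {
    assert(strlen(outputBuffer) == 16);
    fprintf(binfile, "%s\n", outputBuffer);
}
```

Note that accept in this sketch only reports the error and does not advance the scanner, which matches how the parsing functions call nextToken themselves before each accept.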
Finishing an Instruction
// The purported end of an instruction has been reached, so
// check for the newline, get the next instruction, get
// its first token, and increment the address counter.
void finishInstruction(){
    spToken = nextToken();
    accept(TC_NEWLINE, "Too many tokens in instruction.");
    spNotDone = nextInstruction();
    if (spNotDone){
        spAddress++;
        spToken = nextToken();
    }
}
The Parsing Functions
• Each syntax rule in the EBNF grammar translates to a parsing function
• Each function assumes that the current token is its start symbol
• Each function calls finishInstruction as its last step
The Prototypes
// Parsing function prototypes
void program();
void instruction();
void orig_ins();
void add_or_and_ins();
void blkw_ins();
void br_ins();
void fill_ins();
void jmp_ins();
void jsr_ins();
void jsrr_ins();
void ld_ldi_st_sti_ins();
void ldr_or_str_ins();
void lea_ins();
void not_ins();
void ret_or_rti_ins();
void stringz_ins();
void trap_ins();
Instructions with the same format differ only in the leading token
Parsing with the Top-Level Rule
// program = [ orig-directive ] { [ label ] instruction } ".END"
void program(){
    orig_ins();
    while (spNotDone && spToken.type != TC_END)
        instruction();
    accept(TC_END, ".END expected.");
}
We stop when .END is reached or there are no more instructions
accept checks the current token’s type for possible error
Parsing the NOT Instruction
// not-ins = "NOT" register "," register
void not_ins(){
    strcpy(outputBuffer, spToken.binary);    // Opcode
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);    // Destination register
    spToken = nextToken();
    accept(TC_COMMA, "Comma expected.");
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);    // Source register
    strcat(outputBuffer, "111111");          // Bits 5-0 of NOT are all ones
    outputBinary();
    finishInstruction();
}
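As a worked example of the field layout, the LC-3 NOT instruction is the opcode 1001, a 3-bit destination register, a 3-bit source register, and six 1 bits. The sketch below assembles those fields for given register numbers. The helper encodeNot and the table REG_BITS are hypothetical names for illustration and are not part of the assembler in the slides, which takes each field from spToken.binary instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical table: 3-bit strings for R0 through R7. */
static const char *REG_BITS[8] = {
    "000", "001", "010", "011", "100", "101", "110", "111"
};

/* Builds the 16-bit string for NOT DR, SR in buffer. */
void encodeNot(int dr, int sr, char buffer[17]) {
    assert(dr >= 0 && dr <= 7 && sr >= 0 && sr <= 7);
    strcpy(buffer, "1001");          /* Opcode */
    strcat(buffer, REG_BITS[dr]);    /* Destination register */
    strcat(buffer, REG_BITS[sr]);    /* Source register */
    strcat(buffer, "111111");        /* Bits 5-0 are all ones */
}
```

For NOT R1, R2 this yields 1001 001 010 111111, that is, the string 1001001010111111.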
Parsing the .FILL Directive
// fill-ins = ".FILL" integer-literal
void fill_ins(){
    spToken = nextToken();
    accept(TC_INT, "Integer literal expected.");
    strcpy(outputBuffer, signedBinary(spToken.intValue, 16));
    outputBinary();
    finishInstruction();
}
Should add a check on the bounds of an integer fill value!
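One possible form of that bounds check is sketched below. The accepted range is an assumption: it allows any value that fits in 16 bits when read either as a signed number (down to -32768) or as an unsigned number (up to 65535). The name fitsInWord is hypothetical.

```c
#include <assert.h>

/* Returns 1 if value can be stored in one 16-bit word, read as
   either a signed or an unsigned quantity (an assumed policy). */
int fitsInWord(int value) {
    return value >= -32768 && value <= 65535;
}
```

In fill_ins, such a test could guard the call to signedBinary, with putError reporting a value that is out of range.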
Parsing the .BLKW Directive
// blkw-ins = ".BLKW" integer-literal
void blkw_ins(){
    strcpy(outputBuffer, FILL_ZERO);
    spToken = nextToken();
    accept(TC_INT, "Integer literal expected.");
    int i;
    for (i = 1; i <= spToken.intValue; i++)
        outputBinary();
    spAddress += spToken.intValue - 1;
    finishInstruction();
}
Should add a check on the memory available for the given number of words!
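A sketch of such a memory check follows. It assumes the block must lie entirely within the LC-3's 16-bit address space and that a block holds at least one word. MAX_ADDRESS and blockFits are hypothetical names, and a real assembler might reserve part of that space for other purposes.

```c
#include <assert.h>

#define MAX_ADDRESS 0xFFFF   /* Assumed top of the 16-bit address space */

/* Returns 1 if count words starting at address fit in memory. */
int blockFits(int address, int count) {
    assert(address >= 0 && address <= MAX_ADDRESS);
    return count >= 1 && count <= MAX_ADDRESS - address + 1;
}
```

In blkw_ins, the call would be blockFits(spAddress, spToken.intValue), made before the output loop.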
Parsing the .STRINGZ Directive
// stringz-ins = ".STRINGZ" string-literal
void stringz_ins(){
    spToken = nextToken();
    accept(TC_STRING_LIT, "String literal expected.");
    char* lit = spToken.source;
    int i;
    for (i = 0; i < spToken.intValue; i++){
        char ch = lit[i];
        strcpy(outputBuffer, unsignedBinary(ch, 16));
        outputBinary();
    }
    strcpy(outputBuffer, FILL_ZERO);
    outputBinary();
    spAddress += spToken.intValue - 2;
    finishInstruction();
}
Should add a check on the memory available for the characters!
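To illustrate both the layout of a .STRINGZ block and one way to check the available memory, the sketch below converts a C string to words, one zero-extended character per word followed by a zero terminator, and refuses if the words would not fit. The function stringToWords and its capacity parameter are hypothetical; the slides' version writes bit strings through outputBinary rather than filling an array.

```c
#include <assert.h>
#include <string.h>

/* Converts text to 16-bit words: one character per word, then a
   zero terminator. Returns the number of words written, or -1 if
   they would not fit in capacity words. */
int stringToWords(const char *text, unsigned short words[], int capacity) {
    assert(text != NULL && capacity >= 0);
    size_t length = strlen(text);
    if (length + 1 > (size_t) capacity)
        return -1;                       /* Not enough room */
    for (size_t i = 0; i < length; i++)
        words[i] = (unsigned char) text[i];
    words[length] = 0;                   /* The terminating zero word */
    return (int) length + 1;
}
```

A string of n characters therefore occupies n + 1 words, which is the amount by which the address counter has to advance in total.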
Parsing the LD Instruction
// ld-ins = "LD" register "," label
void ld_ldi_lea_st_sti_ins(){
    strcpy(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_REG, "Register expected.");
    strcat(outputBuffer, spToken.binary);
    spToken = nextToken();
    accept(TC_COMMA, "Comma expected.");
    spToken = nextToken();
    processLabel(9);
    outputBinary();
    finishInstruction();
}
LD, LDI, LEA, ST, and STI all have the same format
Processing a Reference to a Label
// Checks that the current token is a declared label, converts its
// PC-relative offset to a signed bit string of numBits bits, and
// appends that to the output buffer
void processLabel(int numBits){
    accept(TC_LABEL, "Label expected.");
    if (spToken.type == TC_LABEL){
        int labelAddress = findSymbol(spToken.source);
        if (labelAddress == -1)
            putError("Undeclared label.");
        else{
            int offset = labelAddress - (spAddress + 1);
            strcat(outputBuffer, signedBinary(offset, numBits));
        }
    }
}
Make sure there is a label, make sure it’s declared, and use its address and PC + 1 to compute the offset of length numBits
Should add a check on the limits of the offset!
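The limit check could look like the sketch below: an offset fits in a two's complement field of numBits bits when it lies between -2^(numBits-1) and 2^(numBits-1) - 1, which is -256 to 255 for the 9-bit field used by LD. The name offsetFits is hypothetical, and the signedBinary shown here is only a stand-in with the same call shape as the function used in the slides, whose body is not shown.

```c
#include <assert.h>

/* Returns 1 if offset fits in a two's complement field of numBits bits. */
int offsetFits(int offset, int numBits) {
    assert(numBits >= 1 && numBits <= 16);
    int limit = 1 << (numBits - 1);      /* 256 for a 9-bit field */
    return offset >= -limit && offset <= limit - 1;
}

/* Stand-in: returns the low numBits bits of value as a bit string.
   The result lives in a static buffer that the next call overwrites. */
char *signedBinary(int value, int numBits) {
    static char bits[17];
    assert(numBits >= 1 && numBits <= 16);
    unsigned int u = (unsigned int) value;
    for (int i = 0; i < numBits; i++)
        bits[numBits - 1 - i] = ((u >> i) & 1u) ? '1' : '0';
    bits[numBits] = '\0';
    return bits;
}
```

For example, an LD at address x3000 that refers to a label at x3010 has offset x3010 - (x3000 + 1) = 15, which fits in 9 bits and is encoded as 000001111.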