

Edgar H. Sibley, Panel Editor

By carefully tuning computer and compiler, it is possible to avoid the otherwise inevitable compromises between complex compiling algorithms and less-than-optimal compiled code, where the key to performance appears to lie neither in sophisticated nor drastically reduced architectures, but in the key concepts of regularity and completeness.

MICROPROCESSOR ARCHITECTURES: A COMPARISON BASED ON CODE GENERATION BY COMPILER

NIKLAUS WIRTH

To a programmer using a high-level language, computer and compiler appear as a unit. More importantly, they must not only be regarded, but also designed, as a unit. However, many computers display a structure and an instruction set (an architecture) that mirrors the metaphor of programming by assembling individual instructions. More recent designs feature characteristics that are oriented toward the use of high-level languages and automatic code generation by compilers.

Comparing the suitability of different architectures is problematic because many variables are involved and even the criteria by which they are judged are controversial. Ultimately, it is the entire system's effectiveness in terms of speed and storage economy that counts. We chose two criteria for the comparison that we consider relevant: code density and compiler complexity, although they are not the only indicators of overall effectiveness.

• Code density. Densely encoded information requires less memory space and fewer accesses for its interpretation. Density is increased by providing appropriate resources (e.g., fast address registers), suitable instructions and addressing modes, and an encoding that takes into account the instructions' relative frequency of occurrence.

• Simplicity of compilation. A simple, compact compiler is not only faster, but also more reliable. It is made feasible by regularity of the instruction set, simplicity of instruction formats, and sparsity of special features.

In this article, we make an attempt to measure and analyze the suitability of three different processors in terms of the above criteria. In general, three variables are involved, namely, the computer architecture, the compiler, and the programming language. If we fix the latter two, we have isolated the influence of the architecture, the item to be investigated. Accordingly, we shall involve a single language only, namely, Modula-2 [8]. Unfortunately, fixing the compiler variable is not as easy: Compilers for different processor architectures differ inherently.

Nevertheless, a fair approximation to the ideal is obtained if we use as compilers descendants of the same ancestor, that is, variants differing in their code-generating modules only. To this end, we have designed compilers that use the same scanner, parser, symbol table, and symbol file generator, and, most importantly, that feature the same degree of sophistication in code optimization.
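To make the shared structure concrete, the following is a minimal sketch of how such a compiler family might be decomposed in Modula-2. The module and procedure names are illustrative assumptions, not the actual interfaces of the ETH compilers; each target architecture supplies its own implementation of this definition module, while the other modules are shared unchanged.

    DEFINITION MODULE CodeGen;

    TYPE Item;  (* opaque descriptor of a recognized entity;
                   compare Table III below *)

    PROCEDURE LoadItem(VAR x: Item);    (* bring x onto the stack or
                                           into a register *)
    PROCEDURE EmitAdd(VAR x, y: Item);  (* generate code for x := x + y *)
    PROCEDURE EmitStore(VAR x, y: Item);(* generate code for x := y *)

    END CodeGen.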



It is reasonable to expect that a simple and regular architecture with a complete set of elementary operations, corresponding to those of the source language, will yield a straightforward compiling algorithm. However, the resulting code sequences may be less than optimally dense. The observation that certain quantities (such as frame addresses) occur frequently may motivate a designer to introduce special registers and addressing modes (implying references to these registers). Or the observation that certain short sequences of instructions (such as fetching, adding, and storing) occur frequently may spur the introduction of special instructions combining elementary operators. The evolution of the more complex architectures is driven primarily by the desire to obtain higher code density and thereby increased performance. The price is usually not only a more complex processor, but also a more complicated compiling algorithm that includes sophisticated searches for the applicability of any of the abbreviating instructions. Hence, the compiler becomes both larger and slower.

The microprocessor architectures chosen for this investigation are the Lilith [4, 7], National Semiconductor 32000 [3], and Motorola 68000 [2]. (To denote the latter two, we shall use the abbreviations NS and MC, respectively.) Lilith is a computer with a stack architecture designed specifically to suit a high-level language compiler (i.e., to obtain both a straightforward compiling algorithm and a high code density). Both the MC and in particular the NS are said to have been designed with the same goals, although they both feature considerably more complex instruction sets.

All three processors are microcoded; that is, every instruction invokes a sequence of microinstructions. These sequences differ in length considerably, and therefore the execution time for the instructions also varies. (On the MC and NS, the microinstructions are stored in a ROM that is included in the processor chip.)

Whereas for decades the future was seen to lie in the more baroque architectures, the pendulum now appears to be swinging back toward the opposite extreme. The ideal machine is now said to have only a few, simple instructions [5], where the key distinction (e.g., for RISC architectures) is that execution time is the same for all instructions, namely, one basic cycle. Quite likely the optimal solution is to be found neither in extremely Spartan nor in lavishly baroque approaches.

First we present an overview of the compared processor organizations, pointing out a few relevant differences, and then we consider the compiler and its strategy for code generation. By means of a few examples of language constructs, we illustrate the influence of the architecture on the complexity of the code-generation process, which is reflected in turn in the size of the compilers (source program length).

Finally, we use the compilers themselves as test cases to measure the overall density of generated code.

THE PROCESSOR ARCHITECTURES AND THEIR INSTRUCTION FORMATS

In this section, we compare briefly the essential and relevant features of the three architectures. For more detail, the reader is referred to specific descriptions of the individual processors. All three processors mirror a run-time organization tailored for high-level languages involving a stack of procedure activation records. Lilith and NS feature three dedicated address registers for pointing to the frame of global variables, to the frame of the most recently activated procedure, and to the top of the stack. In the MC, three of the seven general-purpose address registers are dedicated to this purpose.

For expression evaluation and storing intermediate results, Lilith features a so-called expression stack, a set of fast registers that are implicitly addressed by an up/down counter whose value is automatically adjusted when data are fetched or stored. The expression stack logically constitutes an extension of the stack of procedure activation records. Since it is empty at the end of the interpretation of each statement, the difficulties inherent in any scheme involving two levels of storage are minimized: The expression stack need be unloaded (from the registers) into the main stack (in memory) only when context is changed within a statement (i.e., only upon calling a function procedure). In contrast, the other processors offer a set of explicitly numbered data registers. The run-time organizations of the three processors used by the Modula-2 system are shown in Figure 1.

The processors' instruction formats are shown in Figures 2-4. Lilith and NS instructions form byte streams, whereas the MC instructions form a stream of 16-bit units. Lilith is a pure stack machine in the sense that load and store instructions have a single operand address, and actual operators have none, referring implicitly to the stack. Instructions for the NS and MC mostly have two explicit operands. Their primary instruction word contains fields a1 and a2 indicating the addressing mode (and a register number) and frequently requires one or two extension fields containing the actual offset value (called displacement). In the case of indexed addressing modes, the extensions include an additional index byte specifying the register to be used as index.

The so-called external addressing mode of Lilith and the NS deserves special mention. It is used to refer to objects declared in other, separately compiled modules. These objects are accessed indirectly via a table of linking addresses.



FIGURE 1. Run-Time Organizations. (Panels for Lilith, NS32000, and MC68000 showing the parameter and variable frames on the stack of activation records, the current code and global frames, the module link table, and the dedicated registers: G, S, PC, and the expression stack AE for Lilith; SB, FP, SP, MOD, PC for the NS; FP, SP for the MC. Diagrams not reproduced.)

The external addressing mode, when used properly, makes program linking as a separate operation superfluous, a definite advantage whose value cannot be overestimated.
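As an illustration (with hypothetical module and identifier names, not taken from the paper), a reference such as the following is translated using the external addressing mode, so the loader only has to fill in a link table rather than patch individual instructions:

    MODULE Client;
      FROM Display IMPORT cursor;  (* variable declared in a separately
                                      compiled module; names assumed *)
    BEGIN
      cursor := cursor + 1         (* access goes indirectly via the
                                      table of linking addresses *)
    END Client.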

In the case of Lilith, the use of a single, global table of module references makes it necessary to modify the instructions upon loading; module numbers generated by the compiler must be mapped into those defined by the module table. The NS system eliminates the need for code modification by retaining a local link table for each module; the loader then merely generates this link table.

Another architectural difference worth mentioning relates to the relative facilities for evaluating conditions. Lilith allows Boolean expressions to be treated in the same way as other expressions, where each relational operator is uniquely represented in the instruction set and leaves a Boolean result on top of the stack.



FIGURE 2. Instruction Formats of Lilith. (Opcode-only operator instructions without operands; single-operand instructions with operand fields of various lengths; a two-field format for external addressing. Diagram not reproduced.)

FIGURES 3-4. Instruction Formats of the NS and MC. (Formats include F0 conditional jumps, F1 procedure calls, F2 double-operand instructions where operand c is a small integer, F3 single-operand instructions, and F4, F6, F8, F11 double-operand instructions; each operand field may require additional displacement and/or index bytes. Diagrams not reproduced.)

In addition, there are conditional jumps, corresponding to the AND and OR operators, which are suitable for the abbreviated evaluation of expressions: If the first operand has the value FALSE (TRUE), this value is left on the stack, and the processor skips evaluation of the second operand.

By contrast, the NS and MC architectures offer a single comparison instruction, which leaves its result in a special condition code register. The distinction between the various relational operators is established by the use of different condition masks in a subsequent instruction that converts the condition code into a Boolean value. As a result, the compilation of Boolean expressions differs significantly from that of arithmetic expressions and is more complicated. The condition code register is an exceptional feature to be treated differently from all other registers.

The primary characteristics discussed thus far are summarized in Tables I and II.

THE CODE-GENERATION STRATEGY

The three compilers we are comparing not only have the same scanner, parser, table handler, and symbol file generator modules, they also share the same method for code generation. The parser uses the top-down, recursive descent method, which implies that each syntactic entity is represented by a procedure recognizing that entity. The procedure is then augmented with statements that generate code, and it has a result parameter that describes the recognized entity in the form of attribute values. The computation of both the code and the attribute values is context free in the following sense: Given a syntactic rule

S0 = S1 S2 ... Sn, the attribute values of S0 are determined by a function F whose arguments are the attribute values of S1 ... Sn.
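As a sketch of this scheme (using illustrative names such as Item, Term, and EmitAdd rather than the compilers' actual identifiers), a parsing procedure augmented with code generation might look as follows; the treatment of constant operands described below is visible in the conMd case:

    PROCEDURE SimpleExpression(VAR x: Item);
      (* recognizes T1 + T2 + ... + Tn and computes the attributes of
         the whole expression from the attributes of its terms *)
      VAR y: Item;
    BEGIN
      Term(x);
      WHILE sym = plus DO
        GetSym; Term(y);
        IF (x.mode = conMd) & (y.mode = conMd) THEN
          x.value := x.value + y.value  (* both terms constant: fold now *)
        ELSE
          LoadItem(x); LoadItem(y);     (* bring operands onto the stack *)
          EmitAdd(x, y);                (* emit add operator *)
          x.mode := stkMd               (* result lies on the stack *)
        END
      END
    END SimpleExpression;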


TABLE I. Summary of Architectural Characteristics

                            Lilith        NS                      MC
Instruction lengths (bits)  8, 16, 24     8, 16, 24, 32, 40, ...  16, 32, 48, 64, 80
Address lengths (bits)      4, 8, 16      8, 16, 32               16, 32
Addresses per instruction   0, 1          1, 2                    1, 2
External addressing         Yes           Yes                     No
Condition code              No            Yes                     Yes
Data registers              Stack (16)    R0-R7                   D0-D7
Address registers           G, L, S       SB, FP, SP, MOD         A0-A6, SP

TABLE II. Data Addressing Modes

Mode               Lilith             NS                             MC
Register           T (stack)          R[n]                           D[n]
Address register                                                     A[n]
Register indirect                     M[R[n]]                        M[A[n]]
Autoincrement                         M[SP]; INC(SP)                 M[A[n]]; INC(A[n])
Autodecrement                         DEC(SP); M[SP]                 DEC(A[n]); M[A[n]]
Direct             M[G + d]           M[SB + d]                      M[A[n] + d]
                   M[L + d]           M[FP + d]
                                      M[SP + d]
Indirect           M[T + d]           M[M[SB + d1] + d2]
                                      M[M[FP + d1] + d2]
                                      M[M[SP + d1] + d2]
Indexed            M[T + T]           M[SB + d + R[x] x s]           M[A[n] + d + D[x]]
                                      M[FP + d + R[x] x s]           M[A[n] + d + A[x]]
Indirect indexed                      M[M[SB + d1] + d2 + R[x] x s]
                                      M[M[FP + d1] + d2 + R[x] x s]
External           M[M[t + d1] + d2]  M[M[M[MOD + 8] + d1] + d2]
Immediate          M[PC]              M[PC]                          M[PC]

Capital letters denote resources of the processor; small letters the parameters of the instruction. n, x are register numbers (0-7); d, d1, d2 are displacements (offsets). Autoincrement and autodecrement modes are called stack mode on the NS and apply to the SP register only. s denotes a scale factor of 1, 2, 4, or 8. The MC's term for direct is register indirect with offset.

The attributes indicate, for example, whether an expression represents a constant or a variable, because, if an addition is compiled, the addition is performed directly if both terms are constants; otherwise, an add operator is emitted, and the attribute indicates that the result is placed on the stack. In order to allow (constant) expressions to occur in declarations, the compiler's ability to evaluate expressions is indispensable. In essence, we wish to distinguish between all modes of operands for which the eventual code might differ. Code is emitted whenever a further deferment of code release could bring no advantage. Table III displays the modes of item descriptors and their attributes as chosen for the three processors.

TABLE III. Item Descriptor Modes and Their Attributes

Lilith                NS                        MC
conMd   value         conMd   value             conMd   value
dirMd   adr           dirMd   adr
indMd   offset        indMd   adr, offset       indAMd  adr, A
                      indRMd  R                 inxAMd  adr, DX
inxMd                 inxMd   adr, RX
                      inxiMd  adr, offset, RX
                      inxRMd  R, offset, RX
stkMd                 stkMd                     stkMd
                      regMd   R                 AregMd  A
                                                DregMd  D
                      cocMd   cc, Tjmp, Fjmp    cocMd   cc, Tjmp, Fjmp
typMd   type          typMd   type              typMd   type
procMd  prm           procMd  prm               procMd  prm

Note: The value adr is actually a triple consisting of module number, level, and offset.
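In Modula-2, such an item descriptor is naturally represented as a variant record. The following is a schematic rendering of the Lilith column of Table III, not the compiler's actual declaration; the types Type and Address are assumed helpers.

    TYPE
      ItemMode = (conMd, dirMd, indMd, inxMd, stkMd, typMd, procMd);
      Item = RECORD
               typ: Type;                  (* data type of the entity *)
               CASE mode: ItemMode OF
                 conMd:  value: INTEGER |  (* known at compile time *)
                 dirMd:  adr: Address |    (* module, level, offset *)
                 indMd:  offset: INTEGER | (* offset after dereference *)
                 typMd:  base: Type |
                 procMd: prm: CARDINAL     (* procedure entry information *)
               END                         (* inxMd, stkMd: no attributes *)
             END;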



The original modes are conMd, dirMd, indMd, typMd, and procMd: They are the modes given to the newly created constant factor, variable, var-parameter, type transfer function, or procedure call, respectively. The other modes emerge when appropriate constructs are recognized: For instance, an item is given inxMd (or inxiMd) when it is combined with an index expression to form an indexed designator. Or an item obtains indMd if a pointer variable (dirMd), followed by a dereferencing operator and a field identifier, has been parsed. In general, the more complicated modes originate from the reduction of composite object designators.

Evidently, the set of available modes is determined largely by the addressing modes of the processor: the more addressing modes, the more attribute modes, the larger the state space of the items to be compiled, and the more complicated the transformation and code selection routines. Complex instruction sets and large varieties of addressing modes distinctly increase the complexity of a compiler.

Tables IV-VI give examples of the code generated for several typical constructs: procedure parameters, indexed variables, and arithmetic expressions. The three columns display the code for Lilith, NS, and MC, respectively.

Procedure Parameters
Procedure parameters (Table IV) are passed via the stack of activation records. The NS/MC processors deposit the parameters' values or addresses on top of the stack (allocated in memory) before control is transferred to the procedure. Since parameters are addressed relative to the local frame base, they are already in their proper place when the procedure is entered. In the Lilith computer, parameters are also put on the stack. However, because the top of the stack is represented by fast registers (the expression stack) and because this stack is reused in the procedure for expression evaluation, the parameters have to be unstacked into the memory frame immediately after procedure entry. This complicates code generation somewhat, but generally also shortens the generated code, because the unstack operations occur in the procedure's code once, and not in each call. The fact that the NS/MC architectures include a move instruction compensates for this advantage of Lilith, because in the NS/MC machines the move instruction bypasses registers (which play a role corresponding to the Lilith expression stack).

Indexed Variables
Because indexed variables (Table V) occur very frequently, the resulting code should be short.

All three processors therefore include special instructions for indexed address computation, including the validation of array bounds; that is, they check whether the index value lies within the bounds specified by the array variable's declaration. In the case of Lilith, the code differs when the low bound is zero. Although this may seem an insignificant peculiarity, it contributes to the effectiveness of the architecture because of the high frequency of occurrence of the zero low bound.

Arithmetic Expressions
To compute arithmetic expressions (Table VI), the NS/MC compilers utilize the data registers in a manner similar to a stack. Since the compiler does not keep track of what was loaded into these registers, it is clear that the registers are not used in an optimal fashion; however, any further improvement increases the compiler's complexity considerably. Nonetheless, multiplications and divisions by integral powers of two are (easily) recognized and represented by shift instructions.
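A minimal sketch of this strength reduction, assuming the item conventions above and hypothetical emitter routines (EmitShift, EmitMul):

    PROCEDURE Log2(n: INTEGER): INTEGER;
      (* returns e if n = 2^e, and -1 if n is not a power of two *)
      VAR e: INTEGER;
    BEGIN
      IF n <= 0 THEN RETURN -1 END;
      e := 0;
      WHILE ~ODD(n) DO n := n DIV 2; INC(e) END;
      IF n = 1 THEN RETURN e ELSE RETURN -1 END
    END Log2;

    PROCEDURE GenMul(VAR x, y: Item);
      VAR e: INTEGER;
    BEGIN
      IF y.mode = conMd THEN e := Log2(y.value) ELSE e := -1 END;
      IF e >= 0 THEN
        EmitShift(x, e)  (* multiplication by 2^e becomes a left shift *)
      ELSE
        EmitMul(x, y)    (* general multiply instruction *)
      END
    END GenMul;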

Boolean Expressions
Boolean expressions require special attention because, although they are specified by the same syntax as other expressions, their evaluation rules differ. In fact, the definition of the semantics of Boolean expressions is inconsistent with their syntax, at least if one adheres to the notion that a syntax must faithfully reflect the semantic structure. This anomaly is due to the fact that the syntax of expressions is defined regardless of type, even though arithmetic operators are left-associative, whereas logical operators are right-associative. For example, x + y + z is understood to be equivalent to (x + y) + z, and p&q&r is equivalent to p&(q&r). The logical connectives are defined in Modula in terms of conditional expressions, namely,

    p AND q = if p then q else false
    p OR q  = if p then true else q

Consequently,

    p OR q OR r = if p then true else (if q then true else r),

which is obviously right-associative. The Boolean connectives are implemented not by logical operators, but by conditional jumps. And, since Boolean expressions occur most frequently as constituents of if and while statements, a further complication arises: An efficient implementation must unify conditional jumps within expressions with those occurring in statements, thus effectively breaching the syntactic structure of the language. Figure 5 shows a few schemata of Boolean connectives represented by conditional jumps.
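The practical consequence of these conditional (short-circuit) rules is that the second operand is evaluated only when necessary, which makes guards like the following safe. This example is illustrative and does not appear in the paper:

    MODULE ShortCircuit;

    TYPE List = POINTER TO Node;
         Node = RECORD key: INTEGER; next: List END;

    VAR p: List; k: INTEGER;

    BEGIN
      p := NIL; k := 7;
      (* With strict evaluation, p^.key would be dereferenced even when
         p = NIL; compiled as conditional jumps, the test is skipped. *)
      WHILE (p # NIL) AND (p^.key # k) DO
        p := p^.next
      END
    END ShortCircuit.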


TABLE IV. Assignment and Procedures (Lilith column; the NS and MC columns are not legible in this copy)

x := y + z      LLW y
                LLW z
                ADD
                STW x

x := 3 + 5      LIT 8
                STW x

x := r^.f       LLW r
                LSW f
                STW x

a[i] := b[j]    LLA a
                LLW i
                LLA b
                LLW j
                LXW
                SXW

x := y          (remaining rows not legible)


The complications of compiling Boolean expressions are modestly reflected in the introduction of a new item mode (cocMd), meaning the item's value is represented by the condition code register. The item's attributes are the mask value cc, appropriately transforming the register value into a Boolean value, and the two sequences of locations of branch instructions that require updating once their destination address is known. These sequences designate the branches taken when the Boolean result is TRUE or FALSE, respectively.
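A sketch of how such an item might be consumed when compiling an if statement, assuming hypothetical emitter routines (EmitFalseJump, FixLinks); Tjmp and Fjmp are the branch-location sequences just described:

    PROCEDURE IfStatement;
      VAR x: Item;
    BEGIN
      GetSym;            (* skip IF *)
      Expression(x);     (* leaves x.mode = cocMd: cc, Tjmp, Fjmp *)
      EmitFalseJump(x);  (* branch on the negated condition mask cc;
                            the branch location is appended to x.Fjmp *)
      FixLinks(x.Tjmp);  (* all TRUE branches of & and OR land here *)
      StatementSequence; (* THEN part *)
      FixLinks(x.Fjmp)   (* FALSE branches land here (no ELSE part) *)
    END IfStatement;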

In summary, we observe that, as expected, the NS/MC architectures lead to a smaller number of generated instructions compared to Lilith's pure stack architecture. However, the gain is made at the cost of more complicated compiling algorithms, which can be seen in Table VIII and Figure 6, indicating the size of the compiler modules in terms of source and object code length.

These results not only run counter to all intuitive expectations, but they are also highly disappointing with regard to the commercial microprocessors. Because of the complex instruction set, the hardware for the MC and NS microprocessors is considerably more intricate than that of Lilith, and its cost has been felt severely in terms of long development delays. Another consequence of complex instruction sets is the need for more sophisticated code generators to fully tap the power of the instruction set.

TABLE V. Indexed Variables

u := a[9]
  Lilith: LGW a; LSW 9; SGW u
  NS:     MOVW a-18(SB) u(SB)
  MC:     MOVW a-18(A5) u(A5)

u := a[i]
  Lilith: LGW a; LGW i; LIB HIGH(a); CHKZ; LXW; SGW u
  NS:     CHECKW R0 [0, 99] i(SB); FLAG; MOVW [R0:W] a(SB) u(SB)
  MC:     MOVW i(A5) D0; CHK 99 D0; ASLW 1 D0; LEA a(A5) A4; MOVW 0(A4, D0.W) u(A5)

u := b[i]
  Lilith: LGW b; LGW i; LIW -10; ISUB; LIB 20; CHKZ; LXW; SGW u
  NS:     CHECKW R0 [-10, +10] i(SB); FLAG; MOVW [R0:W] b(SB) u(SB)
  MC:     MOVW i(A5) D0; ADDW 10 D0; CHK 20 D0; ASLW 1 D0; LEA b(A5) A4; MOVW 0(A4, D0.W) u(A5)

u := c[9, 9]
  Lilith: LGW c; LSA 216; LSW 9; SGW u
  NS:     MOVB c-450(SB) u(SB)
  MC:     MOVB c-450(A5) u(A5)

u := c[i, j]
  Lilith: LGW c; LGW i; LIB 103; CHKZ; LIB 24; UMUL; UADD; LGW j; LIB 23; CHKZ; LXW; SGW u
  NS:     CHECKW R0 [0, 99] i(SB); FLAG; CHECKW R1 [0, 23] j(SB); FLAG; INDEXW R0 23 R1; MOVB [R0:W] c(SB) u(SB)
  MC:     MOVW i(A5) D0; CHK 99 D0; ASLW 4 D0; LEA c(A5) A4; MOVW j(A5) D2; CHK 15 D2; LEA 0(A4, D0.W) A4; MOVB 0(A4, D2.W) u(A5)
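The declarations underlying Table V are not reproduced in this copy; the following Modula-2 reconstruction is merely consistent with the bound checks visible in the generated code (e.g., [0, 99] for a, [-10, +10] for b) and is otherwise an assumption:

    VAR u, i, j: INTEGER;
        a: ARRAY [0..99] OF INTEGER;       (* zero low bound: shorter
                                              Lilith code *)
        b: ARRAY [-10..10] OF INTEGER;     (* nonzero low bound: the
                                              index must be rebased *)
        c: ARRAY [0..99], [0..23] OF CHAR; (* two-dimensional array *)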


TABLE VI. Arithmetic Expression

Lilith      NS                  MC
LGW a       MOVW a(SB) R0       MOVW a(A5) D0
LI 10       ADDW 10 R0          ADDIW 10 D0
UADD        MOVW b(SB) R1       MOVW b(A5) D2
LLW i       MULW 5 R1           MULS 5 D2
LGW b       ADDW i(FP) R1       ADDW i(A6) D2
LI 5        MOVW j(FP) R2       MOVW j(A6) D4
UMUL        LSHW 1 R2           ASLW 1 D4
UADD        ADDW R2 R1          ADDW D4 D2
LLW j       LSHW -2 R1          EXTL D2
LI 1        SUBW R1 R0          DIVS 4 D2
SHL                             SUBW D2 D0
UADD
LI 2
SHR
USUB

TABLE VII. Boolean Expressions Compiled for the Lilith, NS, and MC

Statement (inferred from the code): IF ((x < y) OR (z <= x)) & ((u < v) OR (w <= u)) THEN x := y ELSE u := v END

Lilith          NS (partly illegible)   MC
LGW x           CMPW y x                MOVW x(A5) D0
LGW y           BGT L1                  CMPW y(A5) D0
LSS             CMPW x z                BLT L1
ORJ L1          BLS L3                  MOVW z(A5) D0
LGW z           L1: CMPW v u            CMPW x(A5) D0
LGW x           BGT L2                  BGT L3
LEQ             (further lines          L1: MOVW u(A5) D0
L1: AJP L2       illegible)             CMPW v(A5) D0
LGW u           L4: ...                 BLT L2
LGW v                                   MOVW w(A5) D0
LSS                                     CMPW u(A5) D0
ORJ L2                                  BGT L3
LGW w                                   L2: MOVW y(A5) x(A5)
LGW u                                   BRA L4
LEQ                                     L3: MOVW v(A5) u(A5)
L2: JPC L3                              L4: ...
LGW y
SGW x
JP L4
L3: LGW v
SGW u
L4: ...

FIGURE 5. Boolean Connectives Represented by Conditional Jumps for the NS and MC Architectures. (Jump schemata for IF p & q THEN S0 ELSE S1 END, IF p OR q THEN S0 ELSE S1 END, IF (p & q) OR (r & s) THEN S0 ELSE S1 END, and IF (p OR q) & (r OR s) THEN S0 ELSE S1 END. Diagrams not reproduced.)


TABLE VIII. Size of Compiler Modules

                           Source  Source      Object code (bytes)           Ratio to Lilith
Module                     lines   characters  Lilith     NS       MC        NS     MC
Scanner                      410    11,200      2,640      4,180    5,580    1.58   2.11
Parser                     1,300    39,520      8,190     11,340   18,500    1.38   2.26
Table handler                270     8,400      1,350      2,460    4,450    1.82   3.29
Symbol file generator        530    20,850      3,680      5,730    9,240    1.55   2.51
Code generator for Lilith  1,490    50,200     10,190
Code generator for NS      2,050    69,000    (15,340)    22,960             1.50
Code generator for MC      3,780   150,000    (21,550)             48,630           2.26
Total                                          26,050     46,670   86,400    1.79   3.32

(Parenthesized figures are the sizes of the NS and MC code generators when themselves compiled into Lilith code, the basis for the ratios in those rows.)

FIGURE 6. Overall Size of Compilers. (Bar charts of source code in lines (0-3,500) and object code in bytes (20K-80K) for the scanner, parser, table handler, symbol file generator, and code generator modules of the Lilith, NS, and MC compilers. Charts not reproduced.)

As a result, the compiler program is 14 percent longer for NS, and 56 percent longer for MC, than for Lilith. If we consider the code-generator parts only, the respective figures are 37 percent and 154 percent. But the most disappointing thing of all is that the reward for all these efforts and expenses appears negative: For the same programs, the compiled code is about 50 percent longer for NS, and 130 percent longer for MC, than for Lilith. The cumulative effect of having a more complicated compiling algorithm applied to a less-effective architecture results in a compiler for the NS that is 1.8 times more voluminous than that for Lilith, whereas the compiler for the MC is 3.3 times as long. Quite obviously, the value of a megabyte of memory strongly depends on the computer in which it is installed.

ANALYSIS

Naturally, one wonders where the architects may have miscalculated. Measurements shed some light

on this question, but there is no single contributing factor to the poor result, and there is no single, simple answer.

In Figure 7, we give the relative frequencies of occurrence of various instruction lengths and types for the three microprocessors. As objects of this investigation, we use the modules of the compilers themselves. Admittedly, this introduces some bias, for example, against long operands (real numbers), but other measurements have largely confirmed these results.

For Lilith, the average length of an instruction is 1.52 bytes, where 16 percent of all instructions are operators without explicit operand fields that refer implicitly to operands on the expression stack; 50 percent have a single operand field 4 bits long; and 17 percent require a 1-byte and 10 percent a 2-byte operand field. The 4-bit field is packed together with the operator field into a single byte.


FIGURE 7. Frequency of Instruction Lengths and Types. (Histograms of instruction frequencies: for Lilith by operand size (operators without operand field, 4-bit, 8-bit, and 16-bit operand fields) and total length (1-3 bytes); for the NS and MC by format (F0 jumps, F1 calls, F2 two operands with short immediate field, F4 two operands, F6 single operand) and base length (1-3 bytes for the NS; 2, 4, or 6 bytes for the MC). Charts not reproduced.)

This facility with short operand fields, an idea that stems from the Mesa instruction set of Xerox's D-machines [1, 6], holds the major key to Lilith's high code density.

In Figure 7, the relative frequencies of instructions generated for the NS and MC architectures are classified according to their formats and base lengths. These 1, 2, or 3 bytes are usually followed by further bytes containing the addresses, operands, and indexing information. In rare cases, a single instruction may consist of a dozen bytes or even more. The average total instruction length is about 3.6 bytes for the NS versus 1.5 bytes for Lilith, and almost 6 for the MC (3.5 not counting the displacements). The number of generated instructions, however, is only 1.6 times higher for Lilith.

The NS and MC architectures feature a particularly rich set of data addressing modes that are designed to reduce the number of instructions and to increase code density. The relative frequencies of usage of these modes are tabulated in Figure 8.

Fourteen percent of the references are to registers directly. This percentage corresponds roughly to the implicit stack references of Lilith. In the case of NS


and MC, the stack mode is used exclusively for placing procedure parameters into the stack of activation records and therefore has no relationship to Lilith's stack usage. The frequency of stack references is nevertheless surprisingly high (over 20 percent).

A noteworthy but not surprising result of this rich set of addressing modes is that local objects are accessed considerably more frequently (via FP) than global ones (via SB). Surprisingly frequent are indirect accesses (20 percent), which use two displacements, a reflection of the preponderance of access to record fields via pointers. This addressing mode is present in the NS, but not in the MC, architecture.

Looking at constants that are represented as immediate mode data placed in the instruction stream immediately following the instruction, one recognizes the predominance of 16-bit operands.

FIGURE 8. Distribution of Addressing Modes and Displacement Sizes for the NS and MC Architectures. (Relative frequencies of the register, register indirect, direct (FP- and SB-based), indirect, immediate, external, and stack modes, and of displacement sizes in bytes. Charts not reproduced.)


In light of the data size distribution measured for Lilith, one realizes that a major flaw of the NS/MC designs lies in their requirement that the length of an immediate operand be exactly as defined by the operator; no automatic lengthening (with either zero or sign extension) is provided, as is the case with addresses (displacements).

This brings us to a final investigation of the frequencies of the various displacement sizes (also shown in Figure 8). The NS architecture provides sizes of 1, 2, or 4 bytes. The length is not dictated by the operator code, but instead is encoded in the displacement value itself, a solution that is equally desirable from the point of view of code generation. As expected, the 1-byte displacements dominate strongly. The average displacement size is 1.32 bytes. Particularly noteworthy is the fact that, for the MC, 94 percent of all displacement values could be placed into a single byte instead of a 16-bit word.
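A sketch of such a length-in-the-value encoding, with a 1-byte form covering -64 <= d < 64 and a 2-byte form covering -8192 <= d < 8192 (the scheme described for the NS; the routine EmitByte is an assumed helper):

    PROCEDURE EmitDisplacement(d: INTEGER);
      (* the high bits of the first byte select the length:
         0xxxxxxx = 1 byte, 10xxxxxx = 2 bytes, 11xxxxxx = 4 bytes *)
      VAR v: INTEGER;
    BEGIN
      IF (d >= -64) & (d < 64) THEN
        IF d < 0 THEN v := d + 128 ELSE v := d END;   (* 7-bit value *)
        EmitByte(v)                                   (* 0xxxxxxx *)
      ELSIF (d >= -8192) & (d < 8192) THEN
        IF d < 0 THEN v := d + 16384 ELSE v := d END; (* 14-bit value *)
        EmitByte(128 + v DIV 256);                    (* 10xxxxxx *)
        EmitByte(v MOD 256)                           (* low byte *)
      ELSE
        (* 4-byte form, 11xxxxxx followed by three bytes, analogous *)
      END
    END EmitDisplacement;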

CONCLUSIONS

The NS and MC architectures have been compared with the Lilith architecture, a prototype of a regular, stack-oriented design. The increased complexity of the NS and MC resources, instruction sets, and addressing modes not only fails to lead to a simpler compiler, but actually requires a more complicated one. Regrettably, it also results in longer and often less-efficient code. On average, code for the NS is about 50 percent longer, and code for the MC 130 percent longer, than that for Lilith. Among the commercial products, this puts NS far ahead of MC.

Although these two microprocessors are the best architectures widely available, the results of this investigation suggest that they also leave room for considerable improvement. Between the two, the NS yields markedly better results, particularly when judged by the compiler designer. In the author's opinion, both designs could have avoided some serious miscalculations if their compilers (for some high-level language) had been implemented before the designs of the processors were fixed. The analysis presented here reveals the two main pinpointable causes of the low code density to be

1. the lack of short (less than 1 byte) address or operand fields, and
2. the use of explicitly addressed registers for intermediate results.

Nonetheless, the principal underlying syndrome is a misguided belief in complexity as a way to achieve better performance. Both the NS and MC architectures feature complicated instruction sets and addressing modes: Obviously, these architectures are compromises in an attempt to satisfy many requirements, but they are also products of an unbounded belief in the possibilities of VLSI. However, not everything that can be done should be done.

This criticism of overly complex architectures would seem to favor the development of architectures featuring a simple structure, a small set of simple instructions, and only a few basic addressing modes, designs that have become known as RISC architectures [5]. However, one should be cautious not to rush from one extreme to the other. In fact, some recent RISC schemes propose facilities, such as a register bank effectively implying a two-level store, that require complicated code-generation algorithms to achieve optimal performance. Once again, the designers are primarily, if not exclusively, concerned with speed. But there is no reason why features could not be added to a design to cater to specific, genuine problems posed by the implementation of high-level languages. Under no circumstances, however, should such additions involve a complicated mechanism or infringe on the regular structure of the existing scheme. Regularity of design emerges as the key. Features must solve problems, not create them. In order to promise genuine progress, the acronym RISC should stand for regular (not reduced) instruction set computer.

Regularity alone, however, is not sufficient. It must be accompanied by completeness. The instruction set must closely mirror the complete set of basic operators available in the language. In this respect, the NS architecture represents a significant improvement over earlier products, whereas the MC design gives rise to innumerable grievances and is poorly suited for effective compiler design. Irregularity is the chief culprit for the complexity of the MC code generator, which is more than twice as long (in terms of source code lines) as Lilith's. (In the case of the Intel 8086, the factor lies between 3 and 4.*) This observation is particularly relevant in view of the widespread acceptance of these architectures and the likelihood of their becoming de facto standards. Recognizing that regularity and completeness have been pivotal concepts in mathematics for centuries, it is high time that they be taken into account by engineers designing what are in effect mathematical machines.

* The size of the code generator for the Intel 8086 is 4,880 lines (cf. Figure 6), and the object code size for the 8086 compiler is 149,000 bytes.

Although this analysis may have given the impression of undue concern for code density and code efficiency, there is actually a much more profound reason to strive for regularity of design than efficiency: and that is reliability. Reliability unquestionably suffers whenever unnecessary complexity creeps in. Reliability grows at least proportionally to the regularity of a device's specification, let alone its implementation, a law that applies equally well to hardware and software.

Reliability (and not convenience of programming) was also the primary motivation behind the development of high-level, structured languages, which are


supposed to provide suitable abstractions for formulating data definitions and algorithms in forms amenable to precise, mathematical reasoning. But these abstractions are useless unless they are properly supported by a correct, watertight implementation. This postulate implies that all violations of the axioms governing an abstraction must be detected and reported; moreover, that checks against violations must be performed by the compiler whenever possible, and otherwise by additional instructions interpreted at run time. It is therefore a primary characteristic of architectures designed for high-level languages that they support these abstractions by suitable facilities and efficient instructions in order to make the overhead minimal.

Consistent support for such checking is perhaps the most commendable characteristic of Lilith. The following violations lead to immediate termination of a computation (see the sketch after this passage):

• access to an array with an invalid index;
• access to a variable via a pointer variable with value NIL;
• overflow in integer, cardinal, and real-number arithmetic;
• selection of an invalid case in a case statement;
• lack of data space on procedure call (stack overflow).

All these violations except the first are detected without the need for additional instructions. The checks are built into the address computation, arithmetic, case selection, and procedure entry instructions. Index values are validated by additional instructions inserted before the address calculation.

Above anything else, it is these features that have characterized Lilith as high-level language oriented. During five years of intensive use, they have proved not only valuable, but also indispensable, and have made possible a truly effective environment for program development. Guaranteeing the validity of a language's abstractions is not a luxury, it is a necessity. It is as vital to inspiring confidence in a system as the correctness of its arithmetic and its memory access. However, a processor must be designed in such a manner that the overhead caused by the guards is unnoticeable.
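A hypothetical Modula-2 fragment (not from the paper) in which each marked statement violates one of the abstractions listed above; on Lilith, each is caught by the checks built into the corresponding instructions:

    MODULE Violations;

    TYPE Ptr = POINTER TO INTEGER;

    VAR a: ARRAY [0..9] OF INTEGER;
        p: Ptr;
        i: INTEGER;

    BEGIN
      i := 10;
      a[i] := 0;         (* invalid index: 10 exceeds the high bound 9 *)
      p := NIL;
      p^ := 1;           (* access via a pointer with value NIL *)
      i := MAX(INTEGER);
      i := i + 1         (* overflow in integer arithmetic *)
    END Violations.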

By these standards, both the NS and MC architectures can be called only halfheartedly high-level language oriented, even though they both represent a tremendous improvement over all earlier commercial processor designs. Both processors feature convenient index bound checks, but unfortunately, tests for invalid access via NIL values, or for stack overflow, are available only at the cost of cumbersome instruction sequences to which most programmers, too confident in their art, are unwilling to submit. Even tests for arithmetic overflow require additional instructions. It is incomprehensible that instructions specifically designed for reserving space for local variables upon procedure entry can be designed without the inclusion of a limit check.

Acknowledgments. The author gratefully acknowledges the valuable contributions by W. Heiz and H. Seiler, who ported the compiler to the MC68000, designed the new code generator, and provided the data concerning that architecture.

REFERENCES

1. Johnsson, R.K., and Wick, J.D. An overview of the Mesa processor architecture. In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Mar.), 1982. (Also published in SIGARCH Comput. Archit. News 10, 2, and in SIGPLAN Not. 17, 4.)
2. Motorola Corp. MC68020 32-Bit Microprocessor User's Manual. Prentice-Hall, Englewood Cliffs, N.J., 1984.
3. National Semiconductor Corp. Series 32000 Instruction Set Reference Manual. National Semiconductor Corporation, 1984.
4. Ohran, R.S. Lilith and Modula-2. BYTE 9, 8 (Aug. 1984), 181-192. A description of the structure of the Lilith computer, and its orientation toward Modula-2.
5. Patterson, D.A. Reduced instruction set computers. Commun. ACM 28, 1 (Jan. 1985), 8-21. A thorough presentation of the concept of the RISC.
6. Sweet, R.E., and Sandman, J.G. Empirical analysis of the Mesa instruction set. In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Mar.), 1982. (Also published in SIGARCH Comput. Archit. News 10, 2, and in SIGPLAN Not. 17, 4.)
7. Wirth, N. The personal computer Lilith. In Proceedings of the 5th International Conference on Software Engineering (San Diego, Calif., Mar.). IEEE Computer Society Press, 1981. A presentation of the combined hardware/software design of the workstation Lilith.
8. Wirth, N. Programming in Modula-2. Springer-Verlag, New York, 1982. An introduction to the use of Modula-2. Includes the defining report.

CR Categories and Subject Descriptors: C.0 [Computer Systems Organization]: General: hardware/software interfaces, instruction set design; D.3.4 [Programming Languages]: Processors: code generation, compilers, optimization

General Terms: Design, Performance

Additional Key Words and Phrases: code density, compiler complexity, design regularity and completeness, high-level language orientation of processor architecture, Lilith, MC68000, Modula-2, NS32000

Received 1/86; accepted 4/86

Author's Present Address: Niklaus Wirth, Institut für Informatik, ETH, CH-8092 Zürich, Switzerland.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
