

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-34, NO. 9, SEPTEMBER 1985

Efficient Storage Management for Temporary Values in Concurrent Programming Languages

DONNA QUAMMEN, JOHN PHILIP KEARNS, AND MARY LOU SOFFA

Abstract - An evaluation stack, used exclusively to store temporary values in expression evaluation, is known to be an effective mechanism in the implementation of high level languages. This work considers the efficient management of evaluation stacks for concurrent programming languages. Techniques for sharing a single evaluation stack among many processes, without copying on process switches, are developed. The best strategy for managing the evaluation stack is shown to depend strongly upon the scheduling paradigm adopted by the run-time support of the language. Simulation studies, driven by synthetic workloads, show that the techniques described in this paper exhibit substantial performance improvements over traditional temporary storage management schemes for concurrent languages.

Index Terms - Concurrency, high level language architecture, registers, scheduling, storage management.

I. INTRODUCTION

HIGH level languages which support concurrency as an intrinsic control form have experienced a dramatic upsurge of interest in the past few years. Concurrent Pascal [6], Modula [21], Mesa [13], and Ada¹ [2] are perhaps the best-known examples of such concurrent languages. Certain algorithms, particularly those for applications in real-time control and embedded systems, are most naturally expressed as a set of communicating computations (at least conceptually) proceeding in parallel. The practical acceptance of such languages will, to a great extent, depend upon the development of efficient implementations. This work proposes storage management techniques for temporary values created during program execution that support the efficient implementation of concurrent languages.

The hardware being considered is a multiprocessor shared memory system in which each processor has some form of local memory. We develop techniques that allow temporaries from concurrent processes to be housed in a single LIFO stack. This stack can reside in a register bank, such as those existing in RISC [16], or in slower main memory. The activation records of program units are not stored in the temporary stack but are managed in another structure. The techniques are applicable in the implementation of a statement-oriented concurrent programming language with static priority assignment to the concurrent units. In the development of the techniques, we consider three types of scheduling policies provided by the run-time support: nonpreemptive, preemptive priority, and general preemptive.

The paper is organized as follows: in Section II, the problems associated with storage management for temporaries in concurrent programming languages are discussed. Section III considers how scheduling strategies impact upon schemes for managing temporaries. In Section IV, a methodology is presented for the generation of synthetic workloads which can be used to evaluate storage management schemes. The section concludes with performance results obtained using the methodology. Concluding remarks are made in Section V.

Manuscript received August 1, 1984; revised January 9, 1985. This work was supported in part by the National Science Foundation under Grant MCS-8119341. The authors are with the Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260.

¹Ada is a registered trademark of the Ada Joint Program Office, U.S. Government.

II. DIFFICULTIES WITH TEMPORARIES

Any storage management scheme that is used in the implementation of a high level programming language must be able to handle the storage for activation records (AR's) of program units (e.g., procedures, functions, and processes) and the temporaries that are generated during a program's execution. A sequential language which supports at most recursion is efficiently implemented using a stack, since the control flow of program units is LIFO by nature. As the executing AR is always located at the top of the stack, storage for temporaries can be dynamically allocated above the AR. Thus, due to the stackability of temporaries and their transient nature, the handling of temporaries in a sequential setting requires no special attention.
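The LIFO discipline of evaluation temporaries in a sequential setting can be illustrated with a small sketch (our illustration, not code from the paper): the expression (a + b) * (c - d) evaluated with an explicit stack whose storage is assumed to sit above the executing AR. All names here are hypothetical.

```c
#include <assert.h>

/* Minimal sketch: evaluation temporaries for (a + b) * (c - d) live in
 * a small LIFO stack; every temporary pushed during the statement is
 * popped before the statement completes. */
typedef struct { int slots[16]; int top; } EvalStack;

static void push(EvalStack *s, int v) { s->slots[s->top++] = v; }
static int  pop(EvalStack *s)         { return s->slots[--s->top]; }

int eval_example(int a, int b, int c, int d) {
    EvalStack s = { .top = 0 };
    push(&s, a); push(&s, b);
    { int r = pop(&s), l = pop(&s); push(&s, l + r); }  /* a + b */
    push(&s, c); push(&s, d);
    { int r = pop(&s), l = pop(&s); push(&s, l - r); }  /* c - d */
    { int r = pop(&s), l = pop(&s); push(&s, l * r); }  /* product */
    int result = pop(&s);
    assert(s.top == 0);  /* all temporaries are gone at statement end */
    return result;
}
```

Because the stack is empty again at statement end, the space above the AR can be reused by the next statement with no bookkeeping.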

However, when moving to an environment supporting logical concurrent control forms such as backtracking or coroutines, the management of temporaries becomes more difficult and expensive. A stack is no longer adequate due to the non-LIFO control flow of the program instances. The AR being resumed may not be at the top of the stack, and the allocation of temporary storage above the AR may not be possible. Another complication that must be considered in this retentive environment is the handling of temporaries that still exist when an instance suspends voluntarily, e.g., through a coroutine operation.

When a programming language supports concurrency, the storage management scheme is further complicated by two issues. The first is that there will be, in general, more than one active instance at any time. Also, temporaries become long-lived not only due to the voluntary suspension of a process but also due to preemptions of active processes by the scheduler. Thus, storage structures must be used that can accommodate the retention of both AR's and temporaries. Examples of such retentive storage structures include heaps, cactus stacks, and generalized stacks [4], [11]. Retentive storage structures experience an interesting tradeoff. If one wishes to allow the dynamic growth and shrinkage of temporary storage, but avoid internal fragmentation, the storage management scheme must, in general, copy. Such a copying overhead is paid by the generalized stack of Bobrow and Wegbreit [4]. An alternative to copying is the acceptance of some internal fragmentation caused by preallocating space to accommodate at least the maximum possible storage demand of each instance. Empirical studies [9], [10] have shown that copying imposes substantial penalties upon the storage management scheme in terms of both space and time, and at least for coroutines, preallocation of space for temporaries is preferable.

0018-9340/85/0900-0832$01.00 © 1985 IEEE

In order to avoid both copying and preallocation in the realm of concurrent systems, we distinguish between evaluation temporaries and persistent temporaries. Persistent temporaries are long-lived in that their existence crosses statement boundaries in the programming language. Such persistent temporaries include:

1) those created by the optimization phase of a compiler to hold common subexpression values for a sequence of statements;
2) loop control or case selection values; and
3) intermediate values which exist when control forms, such as functions, execute within an expression evaluation.

Persistent temporaries may exist when a process voluntarily relinquishes control of the processor, such as by executing a SUSPEND operation. All other temporaries generated during the execution of a program are called evaluation temporaries. In a sequential environment, evaluation temporaries are generated during the execution of a statement containing a functionless expression, and no other statements execute while they exist. However, evaluation temporaries in a concurrent environment can be active when a process is preempted in the middle of computing an expression and, thus, can exist during the execution of other statements. Thus, both persistent and evaluation temporaries are more difficult to handle in a concurrent environment.

A. Techniques for the Management of Temporaries

Three basic methods to manage temporaries which remain when a process is no longer executing include the following.

1) Preallocation: Space to accommodate the maximum number of temporaries for each program unit is preallocated in its activation record upon creation. This is essentially the mechanism used in a cactus-stack implementation, which is the structure suggested for the implementation of Ada in the Preliminary Reference Manual [1]. Furthermore, the cactus stack requires that space be preallocated not only for the temporaries but for all the storage needed for each instance called directly or indirectly by a process. This bound must either be supplied by the programmer, or the programmer must accept a system-specified bound. Of course, the storage management scheme will then be stacklike within a particular process' preallocated area. No copying will occur with preallocation. However, a drawback will be fragmentation and possible stack overflow, if the bound is not sufficient.

2) Copy on Suspension or Preemption: During execution, all temporaries use a common area (e.g., registers), and whenever a process voluntarily suspends or is preempted, any live temporaries are copied to another structure, such as the activation record of the suspended process. However, because suspension could occur in a procedure directly or indirectly called by the process, preallocation of the maximum number of storage locations ever needed for temporaries by any sequence of instances called by the process would be necessary. The structure could also be a common area in which active temporaries from some or all suspended processes are kept. This is essentially the approach that Mesa takes [12], [8]. In Mesa, a stack is used for expression evaluation. Upon a preemption, the contents of the stack are copied to a "state vector" assigned to the process being preempted. Besides the expense of copying and preallocation of the state vectors, an additional overhead is incurred since state vectors are allocated in real memory. In addition, deadlock could occur if not enough memory is available for the creation of a state vector.
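The copy-on-suspension discipline can be sketched as follows (our illustration in the spirit of Mesa's state vectors; the data layout and names are our own assumptions, not Mesa's actual implementation): live temporaries are copied out of the common area on preemption and copied back before resumption.

```c
#include <string.h>

/* Sketch of copy on suspension: the common area stands in for the
 * register bank used during expression evaluation; a preallocated
 * per-process StateVector receives the live temporaries on preemption. */
#define MAXTEMPS 16

typedef struct { int save[MAXTEMPS]; int count; } StateVector;

static int common_area[MAXTEMPS];   /* shared evaluation area */
static int live = 0;                /* number of live temporaries */

static void on_preempt(StateVector *sv) {
    memcpy(sv->save, common_area, (size_t)live * sizeof(int));
    sv->count = live;
    live = 0;                       /* common area freed for next process */
}

static void on_resume(const StateVector *sv) {
    memcpy(common_area, sv->save, (size_t)sv->count * sizeof(int));
    live = sv->count;
}

int copy_on_suspension_demo(void) {
    common_area[0] = 42; common_area[1] = 7; live = 2;
    StateVector sv;
    on_preempt(&sv);
    if (live != 0) return 0;        /* area must be empty after copy-out */
    common_area[0] = 99;            /* another process reuses the area */
    on_resume(&sv);
    return live == 2 && common_area[0] == 42;
}
```

The two memcpy calls are exactly the copying overhead the paper seeks to avoid; they occur on every preemption and resumption of a process with live temporaries.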

3) Copy on Resumption: It is only when an instance is ready to execute and its temporaries are not in an active position (e.g., at the top of a stack, if a stack is being used) that temporaries are copied. This is, in essence, the approach taken by the generalized stack of Bobrow and Wegbreit [4]. This is an optimistic approach, for copying is delayed in the hope that when a process resumes execution, its temporaries are located such that execution can continue without any copying. Copying is done only when it is absolutely necessary to do so.

Since there are inefficiencies in both preallocation and copying, this work examines conditions under which temporaries do not need to be copied or storage preallocated. We consider the scheduling strategy provided by the run-time support system, as well as the semantics of the control structures defined by the concurrent language.

By examining different methods for scheduling processes, we develop efficient storage management techniques for evaluation temporaries. These temporaries are stored in a stack structure that can be shared by many processes, even over a process switch. This evaluation stack could be stored in a common area of memory or, most beneficially, could be implemented using a register bank. The use of fast register banks for the evaluation stack is reasonable given recent work [19], [5] which suggests that procedure calls can be implemented efficiently by using large register banks, such as found in RISC-1 [16]. These schemes have entire or partial activation records residing in registers. Such schemes, although shown to be extremely effective for sequential high level languages [16], have difficulty generalizing to concurrent languages. Copying large register banks to memory on a process switch, or even on a procedure/function call, is expensive [12], [3]. In addition, there are serious problems and overhead involved with maintaining an activation record sometimes in memory and sometimes in registers. These problems do not exist when the registers are used as the evaluation stack.

B. Storage Management for Concurrent Languages

The storage management techniques developed in this paper use two storage structures for the memory demands of a concurrent high level language program. Any one of several storage structures which handle retention, such as a heap or cactus stack, can be used to store the activation records of processes, procedures, and functions. Space for persistent temporaries is preallocated in the AR. On the other hand, storage for evaluation temporaries is not preallocated within the AR, but is kept in a separate structure, whose best management scheme will be shown in Section III to depend upon the process scheduling strategy.

This approach is intended to offer better space and time performance than traditional designs. It should be noted that experimental results for coroutines [18] have shown that a heap in which temporaries are allocated separately in register banks requires only about 44 percent of the space-time product [3], [10] required by the simple heap when presented with the same workload. (The space-time product is a single metric which, in a sense, expresses the policy's combined demand for computational activity and memory.)

III. SPECIFIC IMPLEMENTATIONS

The local memory of each processor will be utilized as a stack to store temporaries of program units executed by the processor. We refer to this as a register stack, and will prove that stackability is possible. All processes can run on any processor, but the storage management scheme may be restricted to either dynamic or static binding of processes to processors. In static binding, a process is bound to a processor at process creation, and the binding does not change for the entire execution of the process. Thus, if the process suspends and subsequently resumes, it always resumes on the same processor. In dynamic binding, the binding of the process to a processor can change during the execution of the process. Any time the process resumes, a new binding with a processor is established. Thus, a process may execute on a number of different processors.

We use P to represent the set of all processors {π1, π2, ..., πk} in the system. T = {t1, t2, ..., tn} denotes the set of all processes at any point in the execution. These processes may be actively executing on a processor, ready to execute and residing in a ready queue, terminated, or temporarily suspended. Evaluation temporaries used during the execution of a single process can be utilized in a stacklike fashion. This follows immediately from the fact that a process supports precisely one operating chain, and that a given operating chain will contain exactly one process instance.

We assume that a process' evaluation temporaries will be allocated in the register bank associated with the processor on which the process is executing. Let S = {s1, s2, ..., sk} denote the set of all register banks in the system, where sj is the bank associated with processor πj, 1 ≤ j ≤ k. If th is to execute on πj, then the evaluation temporaries for the operating chain of th are stored in sj. Recall that the AR's associated with instances will be allocated separately from the evaluation temporaries. The following discussion serves to explain how processes' evaluation temporaries may coexist in a stacklike fashion in sj, depending on the scheduling policy.

We consider three types of scheduling policies: nonpreemptive, priority preemptive, and general preemptive. The priority of a process th is designated by an integer ph. The larger the value of ph, the greater the priority of th. This study considers only static priority values. We also require that voluntary suspensions occur at statement level, which is the usual language design. Thus, when a process suspends voluntarily, it possesses no evaluation temporaries. The only time that evaluation temporaries exist for a nonexecuting process is when it is preempted during statement evaluation.

A. Systems with Nonpreemptive Scheduling

In a nonpreemptive scheduling environment, a process is suspended only at its own request, and thus, no active evaluation temporaries exist. The following property holds regardless of whether processes are statically or dynamically bound to processors.

Implementation Policy 1: In nonpreemptive scheduling, for any πj, sj may be managed as a simple stack.

Discussion: Assume th is executing on πm. When th suspends, sm is empty. Therefore, when tk is assigned to πm, it may use sm as it pleases. In particular, tk can use sm as a stack.
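The invariant behind Policy 1 can be checked with a small sketch (our illustration; the workload and names are hypothetical): since suspension is voluntary and occurs at statement level, the shared bank is empty at every process switch.

```c
/* Sketch of Policy 1: under nonpreemptive scheduling, a process leaves
 * the processor only voluntarily, at statement level, so all of its
 * evaluation temporaries have been popped.  The shared bank s_j is
 * therefore empty at every switch and works as a simple stack. */
typedef struct { int slots[32]; int top; } RegStack;

/* Run one statement of a process: push temporaries, then consume them. */
static void run_statement(RegStack *s, int n_temps) {
    for (int i = 0; i < n_temps; i++) s->slots[s->top++] = i;
    while (s->top > 0) s->top--;         /* statement ends: temps dead */
}

/* Returns 1 if the bank was empty at every voluntary suspension. */
int nonpreemptive_run(const int temps_per_process[], int n_processes) {
    RegStack s = { .top = 0 };
    for (int p = 0; p < n_processes; p++) {
        run_statement(&s, temps_per_process[p]);
        if (s.top != 0) return 0;        /* would violate Policy 1 */
    }
    return 1;
}
```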

B. Preemptive Priority Scheduling

Preemptive priority is a frequently used scheduling policy in which the execution of a process can be suspended by the logical activation of a process of higher priority. This scheduling policy is suitable for concurrent languages such as Mesa and Ada. The storage management schemes described here are dependent on the type of process-to-processor binding. We first consider static binding.

1) Static Binding of Process to Processor: As we are binding the processes to a processor, it is only necessary to consider a single processor, say πj. Without loss of generality, restrict T = {t1, t2, ..., tr} to be the set of processes that have been bound to the processor πj. In Implementation Policy 2, we require that each process has a unique priority. This restriction is relaxed in Implementation Policy 3.

Implementation Policy 2: If all t ∈ T assigned to πj have different priorities, then sj may be managed as a single stack. That is, the evaluation temporaries of all t ∈ T can be stacked in a single structure sj.

Discussion: It must be shown that whenever tm resumes on πj, the temporaries of tm are at the top of sj. Assume that tm is currently executing at priority pm. If tk with priority pk interrupts tm, then pk > pm. tk executes and then either suspends itself, and thus leaves no temporaries on the stack, or tk is interrupted by a process with higher priority. In either case, only after all processes of higher priority are no longer contending for πj will tm be executed on πj. This implies that all processes of higher priority are either suspended or terminated, leaving no temporaries on the stack above those of tm. □
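The argument for Policy 2 can be exercised with a short sketch (our illustration, not the paper's code; all names are hypothetical): higher-priority processes push above the preempted process's temporaries and leave nothing behind when they suspend, so the resumed process always finds its temporaries on top.

```c
/* Sketch of Policy 2: one processor, unique static priorities, one
 * shared bank.  Each slot records which process owns the temporary. */
#define MAX 64

typedef struct { int owner[MAX]; int top; } Bank;

static void push_temps(Bank *b, int pid, int n) {
    for (int i = 0; i < n; i++) b->owner[b->top++] = pid;
}

/* A higher-priority process runs to voluntary suspension: by the
 * statement-level rule it leaves no temporaries behind. */
static void run_higher_priority(Bank *b, int pid, int n) {
    int base = b->top;
    push_temps(b, pid, n);
    b->top = base;                   /* suspended: temps all popped */
}

int policy2_demo(void) {
    Bank b = { .top = 0 };
    push_temps(&b, /*pid=*/1, 3);    /* t1 mid-expression: 3 temps  */
    run_higher_priority(&b, 2, 5);   /* t2 (p2 > p1) preempts t1    */
    run_higher_priority(&b, 3, 2);   /* t3 (p3 > p1) preempts too   */
    /* t1 resumes: its temporaries must again be at the top of s_j. */
    return b.top == 3 && b.owner[b.top - 1] == 1;
}
```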

In order to accommodate the existence of processes of equal priority, a tie-breaking discipline must be defined which determines which process is given preference if several processes of the same priority are eligible for execution. The particular tie breaker which is used in Policies 3 and 5 is last preempted first (LPF). Under LPF tie breaking, if several processes of the same priority are logically capable of executing, that process which was preempted most recently will be scheduled. If there are no preempted processes at that priority level, any discipline (such as FIFO, LIFO, or random) may be employed to choose a process for execution. The fact that LPF does not guarantee fairness is not inconsistent with schedulers currently being used in programming languages, e.g., Mesa and Ada.

Implementation Policy 3: Assume P and T as defined above, and assume that preemptive priority scheduling is used with m distinct priority levels. If, for all processes within a priority level, the tie-breaking policy is LPF, then sj may be managed as a single stack.

Discussion: Implementation Policy 2 guarantees that processes of unequal priority will not interfere with each other. Therefore, it is only left to demonstrate that, within a priority level, processes also will not interfere with each other. We do this by showing that temporaries from at most one process from each priority level exist on sj, and LPF tie breaking guarantees that this process will execute. Note that under priority scheduling, a process cannot be interrupted by a process of the same priority.

Assume th is the first process to execute in priority class n; i.e., ph = n. th either suspends, leaving no temporaries on the stack, or is interrupted by a process of higher priority, leaving its temporaries on the stack. Eventually, priority class n will be the highest priority waiting for the processor. If an LPF policy is used, and if two processes th and tk such that ph = pk are waiting for a processor, then th will be chosen by the scheduler, as it was the last (and only) one to execute at that priority. Its temporaries are at the top of the stack. It should be noted that there is at most one preempted process from each priority level at any time under LPF tie-breaking scheduling. □
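The LPF tie breaker reduces to a simple selection rule, sketched below (our illustration; the struct fields are hypothetical): take the highest ready priority, and within it prefer the most recently preempted process, since its temporaries sit highest on the shared stack.

```c
/* Sketch of LPF tie breaking.  preempt_time orders preemptions:
 * 0 = never preempted, larger = preempted more recently.  Processes
 * never preempted lose ties to preempted ones, as LPF requires. */
typedef struct {
    int pid;
    int priority;
    int preempt_time;
} Proc;

int lpf_select(const Proc ready[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (ready[i].priority > ready[best].priority ||
            (ready[i].priority == ready[best].priority &&
             ready[i].preempt_time > ready[best].preempt_time))
            best = i;
    }
    return ready[best].pid;
}
```

When no process at the winning level was ever preempted, this particular sketch falls back to first-listed order; any discipline would do, per the text.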

2) Dynamic Binding of Processes to Processors: We now consider the management of evaluation temporaries for processes when dynamic binding is used. An interrupted process with active temporaries will not, in general, be assigned to the same processor upon its reactivation, and thus its evaluation temporaries may reside on an inaccessible sj. In order to preclude this and support the free migration of processes, the following strategy must be adopted.

1) Upon the preemption of a process executing on πj, its evaluation temporaries must be copied out of sj into a storage structure (denoted R) commonly accessible by all processors.

2) Upon the resumption of the preempted process, say onto processor πm, its temporaries must be copied from R into sm.

It is shown in Implementation Policy 4 that R can be accessed as a stack. Also, it should be noted that because of copying, sj is always empty when a new process is assigned to πj, and can always function as a stack. We will first make the restriction that there is at most one process per priority level. This is relaxed in Implementation Policy 5.

Implementation Policy 4: For all processes {t1, t2, ..., tk}, assume preemptive priority scheduling and only one process per priority level. Then only one storage area, R, is needed to hold the interrupted temporary stacks of all πj ∈ P. Furthermore, R may be managed as a stack.

Discussion: Processes are preempted in order of lowest priority. Assume at some point in the execution that all π ∈ P are active, and a process tk with priority greater than some process tj on πj is put on the ready queue. tj is interrupted, and tk is set to execute on πj. The active temporaries of tj must be saved. Assume that they are pushed onto a storage stack R.

At a later time tj is reassigned a processor. If tj's temporaries are not at the top of R, the temporaries above them must belong to another process, say tm. Then either 1) tm has higher priority than tj, and should be assigned the processor, or 2) tm has lower priority than tj. In the latter case, either tm and tj were both executing when tj was interrupted, which is forbidden by the priority scheduling, or else tm was chosen to execute after tj was suspended. Again, this violates the priority scheduling policy.

Thus, when tj is assigned a processor, its temporaries must be at the top of R. □
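The stack discipline of R under Policy 4 can be sketched as follows (our illustration; names and layout are hypothetical): substacks are copied into R in preemption order, and the policy guarantees the resuming process's substack is always on top.

```c
/* Sketch of Policy 4: dynamic binding, one process per priority level.
 * Each R entry records a preempted process's substack (here just its
 * size); a real implementation would also copy the temporary values. */
#define RMAX 64

typedef struct { int pid[RMAX]; int ntemps[RMAX]; int top; } RStack;

static void preempt_to_R(RStack *r, int pid, int ntemps) {
    r->pid[r->top]    = pid;
    r->ntemps[r->top] = ntemps;
    r->top++;
}

/* Copy the resuming process's temporaries back out of R into some s_m.
 * Returns the substack size, or -1 if the substack is not on top
 * (which Policy 4 says cannot happen). */
static int resume_from_R(RStack *r, int pid) {
    if (r->top == 0 || r->pid[r->top - 1] != pid) return -1;
    return r->ntemps[--r->top];
}

int policy4_demo(void) {
    RStack r = { .top = 0 };
    preempt_to_R(&r, 10, 4);   /* lowest priority is preempted first */
    preempt_to_R(&r, 20, 2);   /* then the next lowest, and so on    */
    /* Resumptions occur in the reverse order of the preemptions:    */
    return resume_from_R(&r, 20) == 2 && resume_from_R(&r, 10) == 4;
}
```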

Implementation Policy 5: Let P = {π1, π2, ..., πn}, and assume preemptive priority scheduling with LPF tie breaking within a priority. Then only one storage area, R, is needed to hold copies of temporary stacks for all π in P. R may be managed as a stack.

Discussion: The argument for the validity of this implementation policy is similar to the previous argument. It should be noted that all interrupted evaluation temporary stacks for processes of the same priority are grouped together on R, with the top process in the group being the last one preempted. The group with the highest priority in the stack resides at the top of the stack. The last process interrupted within the highest priority group must be that process with temporaries at the top of the stack. This process is selected to execute when a processor is free. □

C. Nonpriority Preemptive Scheduling

In a general preemptive scheduling system, any process can be interrupted by another process, leaving temporaries in its sj. This implies that either a stack storage structure has to be allocated for each process, or that the active temporaries must be copied into a heap structure upon suspension, as in copy on suspension. It is not clear at this point which strategy would yield the best performance. Although coroutine studies indicate that copying is prohibitively expensive, a firm resolution of this issue would require empirical studies using representative samples of concurrent programs running under a general preemptive scheduler.


D. Summary

Table I is a summary of the management strategies for sj and R as a function of the system scheduler and the binding of processes to processors. It is important to note that there is a sharp division between the difficulty of managing the stacks sj and R and the management of a general retentive storage structure. If an implementor is willing to forego a general preemptive scheduler or non-LPF preemptive priority scheduling, the above arguments present an opportunity to have many processes share, without copying, a reasonably large register bank as the single temporary evaluation stack for all those processes statically assigned to a particular processor. Even in the presence of processes which move from processor to processor (at the time of resumption of execution), we may utilize a memory stack R as the intermediate storage facility through which the process evaluation temporary stacks migrate. Of course, this migration imposes the overhead of copying.

IV. PERFORMANCE STUDY OF THE REGISTER STACK

To demonstrate the effectiveness of the storage management schemes described, we experimentally evaluate the performance of Policy 2. A methodology for statistically simulating a concurrent workload is developed and then applied to Implementation Policy 2. A register bank is used to handle the evaluation temporaries. We assume the existence of a reasonably large register bank, which is in keeping with some modern designs such as RISC-1 [16] and the IBM 801 [17]. As the process proceeds through its execution, the procedure instances on its call chain are allocated AR space in shared memory; their evaluation temporaries are stored in stacklike fashion in the register bank associated with each processor. The operation PUSH-REG is intended to be the mechanism by which values are loaded into registers when the register bank is being used as a stack. In order for PUSH-REG to work correctly (placing the new value into the next sequential register location), one pointer register must be supplied to record the address of the top of stack.

When process tj begins its execution on its processor, having preempted th, there is no way to know the contents of the register bank. In particular, there is no a priori knowledge of the identity of the registers which will serve as the evaluation temporary substack for this phase of tj's execution. However, by Implementation Policy 2, tj's temporaries can be placed at the top of the current stack (above th's temporaries) by the instruction PUSH-REG. In effect, the mechanism for stack maintenance under preemptive priority scheduling is the creation of a "virtual stack" on a per-process basis (Fig. 1). An alternative view of this structure would be as a

stack of per-process substacks.

The collection of empirical data requires either the actual

execution of concurrent programs or the use of simulators. Due to the lack of large samples of concurrent programs, this research was conducted using a simulator. The simulator gathers statistics on the run-time memory and time requirements of a statistically generated program, taking into account the low level overhead details of all system operations and the

TABLE I
ORGANIZATION OF PER-PROCESSOR STORAGE STRUCTURE (REGISTER BANK) AND MEMORY STRUCTURE FOR PROCESS MIGRATION

SCHEDULING / BINDING                                  STRUCTURE OF sj / MIGRATION
non-preemptive / either                               Single stack per processor. / Task migration is trivial.
preemptive priority (LPF tiebreaking) / static        Single stack per processor. / No migration.
preemptive priority (LPF tiebreaking) / dynamic       Single stack per processor. / Through common stack R.
preemptive priority (non-LPF tiebreaking) / either    Heap of stacks per processor. / Through common heap.
general preemptive / either                           Heap of stacks per processor. / Through common heap.

[Figure: the register bank holds th's temporaries at the bottom with tj's temporaries stacked above them; TOS marks the top of stack.]
Fig. 1. Stacking of evaluation temporaries for multiprocessor.
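The virtual-stack discipline of Fig. 1 can be modeled in a few lines. This is only a sketch: the class name, the 32-register size, and the integer values are illustrative.

```python
# Model of the per-process "virtual stack": one physical register stack
# per processor, a single top-of-stack pointer (TOS), and PUSH-REG
# placing each value in the next sequential register. A preempting
# process pushes above its predecessor's temporaries, so no copying
# is needed at the process switch.

class VirtualStack:
    def __init__(self, nregs=32):
        self.regs = [None] * nregs
        self.tos = 0                   # pointer register recording top of stack

    def push_reg(self, value):         # PUSH-REG
        self.regs[self.tos] = value
        self.tos += 1

    def pop_reg(self):
        self.tos -= 1
        return self.regs[self.tos]

s = VirtualStack()
s.push_reg(1); s.push_reg(2)   # low-priority th's temporaries
mark = s.tos                   # tj preempts: its substack starts here
s.push_reg(7); s.push_reg(8)   # tj's temporaries sit above th's
s.pop_reg(); s.pop_reg()       # tj's expression completes...
assert s.tos == mark           # ...leaving th's substack intact below
```

Note that neither process needs to know where the other's substack lies; the single TOS pointer is sufficient.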

access times of different forms of memory. During execution, the temporaries and activation records are allocated and reclaimed according to the specifications of the storage manager.

A. The Synthetic Workload Model

The workload of the system is assumed to be generated by a number of concurrent processes. The behavior of each process is based statistically upon the results derived by Tanenbaum [20]. That is, the execution of a process tj may be expressed as a sequence of high level language statements:

ωtj = h1, h2, . . . , hi, . . .

where hi represents an assignment statement, a procedure call, an if-statement, etc., and ωtj is generated by randomly choosing each successive hi according to the relative frequencies of occurrence of the statement types from the Tanenbaum study. Tanenbaum's statement types were augmented for this study by the addition of a "SUSPEND" statement. The probability of a SUSPEND statement executing is expressed parametrically. A SUSPEND will cause a process to wait for an average of INTERVAL time units (also parametric) before it becomes "ready." If, on becoming ready, the process finds its priority to be higher than that of the process currently executing, it preempts that process.

The probability of procedure call is also expressed parametrically. As the depth of a procedure chain increases, the probability that a procedure will not call another increases. For this reason the probability of procedure call is inversely proportional to the procedure depth. Experimentation shows that this rather ad hoc approach produces an intuitively acceptable procedure call behavior, with few long chains of procedures (peaking around 14) and an average depth of approximately three procedures. The number of locals and actual parameters for the called procedure (and hence the size of its AR) is also obtained from the Tanenbaum results.
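The generation scheme described above can be sketched as a small generator. The statement mix, the SUSPEND weight, and the 0.5 return probability below are illustrative stand-ins, not the actual Tanenbaum frequencies or the simulator's parameters.

```python
import random

# Sketch of the synthetic workload generator: each statement is drawn
# from a frequency table (augmented with SUSPEND), and the probability
# of a CALL is scaled inversely with the current procedure depth.
# The frequencies below are illustrative.

FREQ = {"assign": 0.45, "if": 0.20, "call": 0.20, "loop": 0.10, "suspend": 0.05}

def next_statement(depth, rng):
    weights = dict(FREQ)
    weights["call"] = FREQ["call"] / max(depth, 1)   # deeper -> fewer calls
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[k] for k in kinds])[0]

def generate_process(n, rng):
    """Generate the statement sequence h1, h2, ... for one process."""
    depth, trace = 1, []
    for _ in range(n):
        s = next_statement(depth, rng)
        trace.append(s)
        if s == "call":
            depth += 1
        elif depth > 1 and rng.random() < 0.5:       # called procedure returns
            depth -= 1
    return trace

trace = generate_process(1000, random.Random(1))
```

Dividing the CALL weight by the current depth is one simple way to realize "inversely proportional to the procedure depth" and keeps long call chains rare.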



In essence, the form of ωi is determined by choosing the parameterized probabilities and then generating a random sequence of high level statements. The number of processes in the system is also specified as a parameter.

Each high level statement type is mapped into a sequence

of more primitive low level operations. For example, an assignment statement is expressed as the evaluation of an expression (the right-hand side of the assignment) and then a store into a memory location (the left-hand side variable). The allocation of temporaries and the complexity of the arithmetic/logical operations involved in the evaluation of the expression are in accordance with the Tanenbaum statistics. A note of interest from Tanenbaum's work is that the probability of no temporaries at all being required in an expression is 50 percent. As the synthetic program executes its low level operations, a counter representing system time is updated. The overhead associated with a particular storage management scheme is included in the system time.

There are two major difficulties with the above model.

1) Although each process passes through a sequence of

active and suspended phases, preempting lower priority processes as the need arises, there is no explicit interaction between the processes. Such would be the case if these processes had been assigned to the processor by a "coscheduling" allocator [15]. In such an allocator, processes which interact heavily are assigned to separate processors in order to avoid the overhead of a context switch on an interaction between two processes. Coscheduling was an implementation policy in the CM*-Medusa project [14]. If interacting processes are assigned to the same processor, however, it is probable that the interaction is explicitly reflected in the workload presented to the storage management system.

2) The processes obey a distribution of statement types for "ordinary" programs. However, it is conceivable that multiprocess applications intended for execution on multiprocessor systems may exhibit substantially different behavior from that observed by Tanenbaum.

Resolution of these difficulties and validation of the workload generation model depend upon the accumulation of a sample of production concurrent programs. As such a sample is not currently available, we have no alternative except to make a reasonable estimate of a typical workload expected to be presented to the storage manager.

B. The Storage Management Schemes

The synthetic workload described above was presented to the storage management schemes in order to compute three performance metrics.

1) Time used: this includes both program time and the time expended in implementing the various storage management demands. For example, scanning a free space list, creating and deleting AR's, and allocating and deallocating temporaries all consume time.

2) Memory high-water mark: the maximum amount of memory required for execution of the synthetic program under a given storage management scheme. For a scheme which utilizes the register bank, this is the sum of the register and memory usage. Space demands include the free space internal to the storage structure.

3) Space-time integral: the area under the space versus time curve, where space and time are as defined above. This is the primary performance metric, reflecting a program/scheme's joint demand for space and computational facilities.
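Since the simulator's space usage changes only at allocation and deallocation events, the space-time integral reduces to a sum of rectangles. A minimal sketch (the event format is an assumption for illustration):

```python
# Sketch: the space-time integral accumulated from a piecewise-constant
# space profile. Each event is (time, words_in_use_after_event); the
# integral is the area under the resulting step curve.

def space_time_integral(events):
    total = 0.0
    t_prev, s_prev = events[0]
    for t, s in events[1:]:
        total += s_prev * (t - t_prev)   # space held over [t_prev, t)
        t_prev, s_prev = t, s
    return total

# 10 words held for 5 time units, then 30 words for 2 units: 50 + 60
profile = [(0, 10), (5, 30), (7, 0)]
print(space_time_integral(profile))      # 110.0
```
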

Simulators for each different storage management scheme were coded. They were executed with a synthetic workload in order to demonstrate the effectiveness of Implementation Policy 2.

There are three methods for utilizing the register stack. In each, the evaluation temporaries are allocated exclusively in the stack. The three schemes are

CAR  copies the temporaries from registers into the AR of the executing instance, possibly a procedure, at process preemption;
TAR  copies the temporaries from registers into the AR of the executing process instance at process preemption;
R    no copying of temporaries occurs; they are left in the register stack at preemption.

The third scheme should be recognized as that proposed in Implementation Policy 2. It is being compared to CAR, which is a modified "preallocation" scheme, and also to TAR, a "copy on preemption" scheme similar to that used in a Mesa implementation [13].

These three methods of using the register stack are coupled with three retentive storage management structures which store the AR's. They are

1) the heap,
2) the cactus stack, and
3) an ideal method.

The details of the possible combinations of storage management schemes for AR's and temporaries follow. They are grouped by the method in which they manage AR's. The cactus stack, when coupled with "CAR" or "TAR," acts identically; that is, temporaries will be copied to the top of the cactus stack. Therefore, only one of these two combinations will be addressed.

1) The Heap: All AR's are allocated on demand, and when a procedure returns to its caller, its deallocated space is placed on the free list. Adjacent free "holes" are coalesced whenever possible. Standard first-fit allocation is employed for AR allocation. All three methods of handling temporaries are incorporated with the heap. They are called HR, heap with register stack; HCAR, heap with temporaries stored in the current AR; and HTAR, heap with temporaries stored in a process area.
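A minimal model of this heap discipline, first-fit allocation with coalescing of adjacent holes, follows. Addresses and sizes are in words; the free-list representation is an illustrative assumption.

```python
# Sketch of the heap AR manager: first-fit allocation from a free list,
# with adjacent free holes coalesced on deallocation.

class FirstFitHeap:
    def __init__(self, size):
        self.free = [(0, size)]              # sorted list of (addr, len) holes

    def alloc(self, n):
        for i, (addr, length) in enumerate(self.free):
            if length >= n:                  # first hole that fits
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (addr + n, length - n)
                return addr
        raise MemoryError("no hole large enough")

    def dealloc(self, addr, n):
        self.free.append((addr, n))
        self.free.sort()
        merged = [self.free[0]]
        for a, l in self.free[1:]:           # coalesce adjacent holes
            pa, pl = merged[-1]
            if pa + pl == a:
                merged[-1] = (pa, pl + l)
            else:
                merged.append((a, l))
        self.free = merged

h = FirstFitHeap(1024)
a = h.alloc(100)                             # AR for a procedure instance
b = h.alloc(50)
h.dealloc(a, 100)                            # return: hole goes on free list
h.dealloc(b, 50)                             # adjacent holes coalesce
assert h.free == [(0, 1024)]
```

The free-list scan and the coalescing pass are the "extra processing" charged against the heap schemes in the time metric.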

2) Cactus Stack: On process creation, space for the process' AR plus the AR's of the deepest or longest chain of procedures called by the process is allocated in a single contiguous block. Procedure AR's are then stacked in this preallocated area. This scheme is basically a heap of stacks, one stack per process. Note that storage allocation only takes place at process creation, and deallocation only takes place at process termination. Only two methods of allocating temporaries are used with the cactus stack. The first, CR, uses the



register stack. In the second, CTAR, the temporaries are

copied, in a stacklike fashion, to the same structure which houses AR's. The CAR method is not included because of the conflict of this method with the LIFO behavior of the cactus stack.
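The cactus stack's per-process preallocation can be sketched as follows; names and sizes are illustrative, and the overflow check corresponds to the inherent overflow danger this scheme carries.

```python
# Sketch of the cactus stack: at process creation a contiguous block
# sized for the process's deepest procedure chain is reserved once;
# procedure ARs then stack LIFO inside it until process termination.

class CactusStack:
    def __init__(self):
        self.blocks = {}                 # process -> (capacity, [AR sizes])

    def create_process(self, proc, max_chain_words):
        self.blocks[proc] = (max_chain_words, [])   # one-time preallocation

    def call(self, proc, ar_words):      # procedure call: push an AR
        cap, ars = self.blocks[proc]
        if sum(ars) + ar_words > cap:
            raise MemoryError("cactus stack overflow")
        ars.append(ar_words)

    def ret(self, proc):                 # procedure return: pop the AR
        self.blocks[proc][1].pop()

cs = CactusStack()
cs.create_process("t1", 200)             # reserve the whole branch up front
cs.call("t1", 80); cs.call("t1", 60)
cs.ret("t1")
```

Allocation work happens only in create_process; calls and returns merely move within the reserved block, which is why the cactus stack trades space (the often-unused preallocation) for time.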

3) Ideal: Both the heap and the cactus stack have intrinsic inefficiencies. With the heap, extra processing is required to concatenate adjoining holes, and extra space is consumed by unused holes. In the cactus stack the maximum amount of space is preallocated but often left unused. Because of this, an ideal but unimplementable scheme was also simulated. Here all AR's are placed in an idealized "structure." This structure increases in size by the size of the requested AR whenever an AR is allocated, and decreases in size whenever an AR is deallocated. This method requires the absolute minimum space to house AR's. The administrative overhead required here is the same as that for a heap element of quantized size, as proposed by Lampson and used by Mesa implementations [12], in which lists of free heap segments of variable sizes are used for AR's. Only a few instructions are required to allocate or deallocate AR's. The space required will be the same as an idealized quantized heap where a heap element of the correct size is always available, and there is no overhead for heap elements not currently in use. All three temporary management schemes are used in conjunction with the ideal retentive storage: IR, ICAR, and ITAR. In summary, Fig. 2 lists the eight storage management schemes.

4) Comments on the Schemes: For fairness, design decisions were made in all cases to favor any scheme over the register stack scheme. The schemes which copy temporaries to AR's include only the maximum dynamic number of temporaries needed (i.e., the preallocation will be fully used if copying occurs when the maximum number of temporaries exist); in an actual implementation, a static worst case preallocation could be calculated by a compiler. As such, the heap AR sizes represent a lower bound on what they might be in practice.

The cactus stack, suggested as an implementation policy

for preliminary Ada [7], and employed in the implementations of Modula [22] and Concurrent Pascal [6], has an inherent danger of overflow. The elimination of overflow requires either that recursion be eliminated, so that a compiler can determine the size of the maximum procedure call chain, or that the programmer specify accurately the size of the stack preallocation. In our simulator, a preliminary scan of the simulated execution determines the exact size required for each process' branch of the cactus stack. This implementation never preallocates more than is necessary, and is never forced to deal with a procedure call chain which overflows the preallocation.

C. Results

The performance of the storage management schemes using a register stack surpasses that of the other methods of storing temporaries in all three performance metrics. Four series of simulated program behaviors were performed. Each series altered one of the following parameters, while keeping the others constant:

SCHEME   AR STORAGE   AR INCLUDES ETs   TIME OF ETs ALLOCATION   ETs COPIED   ETs COPIED TO
HR       heap         no                instance creation        no           -
HTAR     heap         no                instance creation        yes          task AR
HCAR     heap         yes               instance creation        yes          current AR
CR       cactus       no                task creation            no           -
CTAR     cactus       no                task creation            yes          top of cactus stack
IR       ideal        no                instance creation        no           -
ITAR     ideal        no                instance creation        yes          task AR
ICAR     ideal        yes               instance creation        yes          current AR

Fig. 2. Summary of storage management schemes considered.

1) probability of generating a procedure CALL,
2) number of active tasks,
3) probability of generating a SUSPEND,
4) the INTERVAL of the SUSPEND duration.

The CPU time demands of the low level operations were chosen as follows:

1) arithmetic/logical operation: 0.1 units of "execution time,"
2) memory access: 1.0 unit of time,
3) register access: 0.1 units of time.

The time demand for register access was also varied to be 0.2 and 1.0. The latter represents a stack resident in memory, not registers.

The representative results from one series, that of varying

the probability of a procedure call, are included here. The probability that any statement is a procedure call ranges from 10 to 40 percent. Fig. 3 plots the ratio of the space-time of each storage management scheme to the space-time of IR. Table II presents the real time and maximum storage used. The real-time statistics are expressed as a ratio to the real time of IR. The time for register access was 0.1; however, corresponding graphs for register access times of 0.2 and 1.0 do not vary in any significant manner, indicating that even if the evaluation stack resides in main memory, significant savings can be derived from using Policy 2.

It can be seen that the register stack increases the efficiency of any retentive storage system. A substantial savings is realized by the technique of not copying temporaries on preemption. The simulated results answered other questions about the feasibility of implementing this scheme.

1) In a system with 40 processes and 0.3 probability of procedure call, the maximum number of registers allocated in the register stack was 24. This indicates that the total size needed for the register stack is surprisingly small. A bank of 32 registers should handle a register stack safely.

2) The number of register accesses slightly surpasses the number of memory accesses. This indicates that although the number of evaluation temporaries needed for most expressions is small, when an evaluation temporary exists it is used more than once. In fact, an evaluation temporary is accessed at least two times in our simulator: once on loading a value and once to execute an operation on the value. Therefore, efforts to efficiently store evaluation temporaries are well placed.



[Figure: ratio of the space-time of each scheme (HCAR, CTAR, CR, HTAR, ICAR, ITAR) to that of IR, plotted against probability of procedure call (0.1 to 0.4); parameters: number of tasks 15, probability of procedure call varied, probability of suspend 0.05, suspend duration 0-10,000.]
Fig. 3. Relative space-time performance.

TABLE II
RELATIVE PERFORMANCE VARYING PROBABILITY OF PROCEDURE CALL

PROB. OF PROCEDURE CALL      0.1     0.2     0.3     0.4

Real-Time Ratio   HR         1.11    1.21    1.34    1.46
                  HTAR       1.12    1.23    1.35    1.47
                  HCAR       1.13    1.26    1.41    1.54
                  CR         0.98    0.97    0.96    0.96
                  CTAR       0.99    0.98    0.97    0.97
                  IR         1.00    1.00    1.00    1.00
                  ITAR       1.01    1.01    1.01    1.01
                  ICAR       1.01    1.01    1.01    1.01

Max Heap          HR         368     557     662     841
                  HTAR       513     707     815     996
                  HCAR       636     979     1132    1459
                  CR         670     872     989     1134
                  CTAR       697     899     1025    1155
                  IR         345     533     642     799
                  ITAR       490     683     795     954
                  ICAR       605     928     1075    1359

3) It should be recognized that the time saved by the use of a register stack is realized at the point of process preemption. This means that the delay incurred when a high priority process interrupts a lower priority process is reduced. This is important to real-time programmers.

Given the current development of both concurrent high level languages and architectures with large register banks, we have demonstrated that the development of a small separate register stack to house evaluation temporaries in a preemptive priority scheduling context is advantageous.

V. CONCLUDING REMARKS

The main advantage of the schemes advocated is that the register sets of the processors are used to contain heavily accessed program data locations (the evaluation temporaries) without incurring the overhead of copying the registers into memory locations on a process switch. Conventional wisdom is that this overhead becomes more frequent as the modularization facilities presented by structured languages and modern software construction methodologies become more prevalent. In addition, the ability to utilize the registers as a stack greatly simplifies the determination of register assignments by the compiler.

The quantitative evaluation of the performance of this technique vis-a-vis that of the more conventional storage management schemes will be ultimately experimental in nature, requiring a sample of real concurrent programs written in a high level language. Such studies are currently in progress. We also believe that the semantics of concurrent languages can provide information for further storage management optimizations. The development of those optimizations depends upon subtle interactions among language features. The correctness of those optimizations requires the more formal specification of the semantics of the language features and a rigorous proof technique.

Given the joint emergence of concurrent high level languages and architectures with large register sets, the storage management algorithm developed here would seem to be a natural means of using the architectures to best advantage in the implementation of concurrent languages.

ACKNOWLEDGMENT

The authors express their gratitude to the referees for the detailed comments and numerous suggestions which led to improvements in the paper.

REFERENCES

[1] "Preliminary Ada reference manual," ACM SIGPLAN Notices, vol. 14,no. 6, 1979.

[2] "Reference manual for the Ada programming language," ACM SIGPLAN AdaTEC Special Publ., draft proposed ANSI standard doc., 1982.

[3] L. A. Belady and C. J. Kuehner, "Dynamic space sharing in computer systems," Commun. Ass. Comput. Mach., vol. 12, no. 5, pp. 282-288, 1969.

[4] D. Bobrow and B. Wegbreit, "A model and stack implementation of multiple environments," Commun. Ass. Comput. Mach., vol. 16, no. 10, pp. 591-603, 1973.

[5] D. R. Ditzel and H. R. McLellan, "Register allocation for free: The C machine stack cache," in Proc. Symp. Architectural Support Programming Lang. Oper. Syst., 1982, pp. 48-56.

[6] P. B. Hansen, The Architecture of Concurrent Programs. Englewood Cliffs, NJ: Prentice-Hall, 1977.

[7] J. D. Ichbiah, J. G. P. Barnes, J. C. Heliard, B. Krieg-Brueckner, O. Roubine, and B. A. Wichmann, "Rationale for the design of the Ada programming language," ACM SIGPLAN Notices, vol. 14, no. 6, 1979.
[8] R. K. Johnsson and J. D. Wick, "An overview of the Mesa processor architecture," in Proc. Symp. Architectural Support Programming Lang. Oper. Syst., 1982, pp. 20-29.

[9] J. P. Kearns and M. L. Soffa, "Performance comparison of copy-less coroutine implementations," in Proc. 5th IEEE COMPSAC, 1981, pp. 213-218.

[10] J. P. Kearns, C. J. Meier, and M. L. Soffa, "Performance evaluation of control implementations," IEEE Trans. Software Eng., vol. SE-8, pp. 89-96, 1982.

[11] B. W. Lampson, J. G. Mitchell, and E. H. Satterthwaite, "On the transfer of control between contexts," in Lecture Notes in Computer Science 19. New York: Springer-Verlag, 1974.

[12] B. W. Lampson, "Fast procedure calls," in Proc. Symp. Architectural Support Programming Lang. Oper. Syst., 1982, pp. 66-76.


[13] J. G. Mitchell, W. Maybury, and R. Sweet, Mesa Language Manual, Xerox Corp., Palo Alto, CA, 1979.

[14] J. K. Ousterhout, D. A. Scelza, and P. S. Sindhu, "Medusa: An experiment in distributed operating system structure," Commun. Ass. Comput. Mach., vol. 23, no. 2, pp. 92-105, 1980.

[15] J. K. Ousterhout, "Scheduling techniques for concurrent systems," in Proc. 3rd Int. Conf. Distrib. Comput. Syst., 1982.

[16] D. A. Patterson and C. H. Sequin, "RISC I: A reduced instruction set VLSI computer," in Proc. 8th Int. Symp. Comput. Architecture, 1981, pp. 443-457.

[17] G. Radin, "The 801 minicomputer," in Proc. Symp. Architectural Support Programming Lang. Oper. Syst., 1982, pp. 39-47.

[18] R. D. Reichard, "The impact of a register set upon storage management for retentive control," Dep. Comput. Sci., Univ. Pittsburgh, Pittsburgh, PA, Tech. Rep. 83-3, 1983.

[19] R. L. Sites, "How to use 1000 registers," in Proc. Caltech Conf. VLSI, 1979, pp. 527-532.

[20] A. S. Tanenbaum, "Implications of structured programming for machine architecture," Commun. Ass. Comput. Mach., vol. 21, no. 3, pp. 237-246, 1978.

[21] N. Wirth, "Modula: A language for modular multiprogramming," Software Practice Experience, vol. 7, no. 1, pp. 3-36, 1977.

[22] N. Wirth, "Design and implementation of Modula," Software Practice Experience, vol. 7, no. 1, pp. 67-84, 1977.

Donna Quammen received the M.S. degree in computer science from the University of Pittsburgh, Pittsburgh, PA, in 1981 and is currently working on the Ph.D. degree in computer science, also at the University of Pittsburgh.
Her main areas of interest are the implementation of concurrency and real-time systems.
Ms. Quammen is a member of the Association for Computing Machinery and the IEEE Computer Society.

John Philip Kearns received the M.S. and Ph.D. degrees in computer science from the University of Virginia, Charlottesville, in 1976 and 1979, respectively.
He is currently an Assistant Professor of Computer Science at the University of Pittsburgh, Pittsburgh, PA. His research interests include the implementation of programming languages, performance evaluation, program behavior, and operating systems.
Dr. Kearns is a member of Phi Eta Sigma and the Association for Computing Machinery.

Mary Lou Soffa received the M.S. degree in mathematics from The Ohio State University, Columbus, and the Ph.D. degree in computer science from the University of Pittsburgh, Pittsburgh, PA, in 1977.
She is currently an Associate Professor of Computer Science at the University of Pittsburgh. The implementation of programming languages is her primary area of research, in particular the implementation of sequential and concurrent control forms. A current interest is the incremental generation of optimized code. Her other research areas include programming tools and language interfaces.
Dr. Soffa is a member of the Association for Computing Machinery, SIGPLAN, and SIGSOFT.
