Dynamic Improvement of Locality in Virtual Memory Systems



IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL SE-2, NO. 1, MARCH 1976


Dynamic Improvement of Locality in Virtual Memory Systems

JEAN-LOUP BAER, MEMBER, IEEE, AND GARY R. SAGER

Abstract: Replacement algorithms for virtual memory systems are typically based on temporal measures of locality, while predictive loading and program restructuring are based on spatial measures of locality. This paper suggests some techniques for dynamically improving the spatial locality of a program via predictive loading and virtual space restructuring, and presents the results of applying these techniques to actual programs. Bounds are derived for the performance of the methods.

Index Terms: Virtual memory, paging, locality, replacement algorithm, predictive loading, dynamic restructuring.

Manuscript received March 24, 1975; revised July 7, 1975.
J.-L. Baer is with the Department of Computer Science, University of Washington, Seattle, WA.
G. R. Sager is with the Department of Computer Science, University of Waterloo, Waterloo, Ont., Canada.

I. INTRODUCTION

VIRTUAL memory, and paging in particular, is now widely accepted as a viable solution to the necessity of a hierarchical organization of storage in computer systems. However, the performance of programs running in a paging system leaves room for substantial improvement. Components of the system have been studied and important results have been obtained in such areas as replacement algorithms [1], dynamic versus static allocation of primary memory [4], and a posteriori restructuring of programs [6], [7]. It should be noted that the first two techniques are constrained by the available hardware (for example, it is impossible to implement a stack algorithm [12] or the working set model [4] correctly on a machine without hardware which records page usage statistics at every reference), while the third requires preprocessing of memory reference streams, which is expensive due to the large amount of data involved.

The purpose of this paper is to investigate some execution-time techniques which might either supplement or supplant those just mentioned to yield a reasonable gain in performance by improving the "locality" of the subject program as it runs. To this end, we begin Section II with a semiformal definition of the concept of locality. We then consider the problem of predictive loading and show that predictive loading can be used not only to decrease the number of page exceptions but also to decrease the tendency of the subject program to generate "bursts" of page exceptions. These results are substantiated by experiments on actual subject programs. In Section III we investigate a technique for virtual space restructuring which could prove to be useful in a memory hierarchy with more than two levels. The common motivating theme of these techniques is the modification of the virtual space arrangement of pages based on observations of locality of reference; thus, pages which neighbor in a locality are made neighbors in virtual space. In the case of preloading, this involves the mutual loading of neighboring pages; in the restructuring experiments, changes are made in the structure of virtual space at levels above the executable store.

Before continuing our discussion, we present a short explanation of the context and terminology of the study; the reader who finds this explanation to be inadequate may wish to refer to the comprehensive survey [5].

In a paging system, the memory as perceived by the user (virtual space) is divided into equal size units of information (pages) which are stored in equal size units of storage (frames). The hardware of a paging system maps the user's virtual space addresses into real space addresses by substituting the appropriate primary memory frame number for the page number. Loading, or the reading of pages from secondary into primary memory frames, occurs when a reference is made to a page which is not resident in primary memory. This situation is called a page fault or exception; typically, loading must be completed before execution continues. Since loading occurs only when the program explicitly demands the information, this arrangement is known as demand loading. Replacement, or the writing of pages from primary to secondary memory frames, occurs when it is determined that a page has a low probability of reference in the near future. It should be noted that a frame can be simply marked "empty" if no change was made in the page it contained, thereby avoiding the cost of a write.

If the primary memory allocation (as measured by the number of primary memory frames assigned for the program's use) is fixed, the loading and replacement mechanisms are necessarily closely related: a replacement must occur to make room for the page being loaded. Examples of this close relationship are first in first out (FIFO) (pages are replaced in the order of loading) and LRU (the least recently used page is replaced). If the primary memory allocation is allowed to vary, loading and replacement may become disjoint activities, as in Denning's Working Set model (pages not referenced within the latest τ references are replaced).

As is the usual practice in paging system studies, we use the memory reference as the unit of time.
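As a minimal sketch of demand loading under a fixed allocation, the following Python counts page faults for the two fixed-allocation policies named above. The function name `count_faults` and the short reference string are illustrative assumptions and do not come from the paper; the Working Set model is not covered.

```python
from collections import OrderedDict

def count_faults(refs, frames, policy="LRU"):
    """Count page faults under demand loading with a fixed allocation.

    policy is "LRU" (replace the least recently used page) or
    "FIFO" (replace pages in the order of loading).
    """
    resident = OrderedDict()              # oldest entry first
    faults = 0
    for page in refs:
        if page in resident:
            if policy == "LRU":
                resident.move_to_end(page)    # only LRU refreshes recency on a hit
            continue
        faults += 1                           # page exception: demand load
        if len(resident) == frames:
            resident.popitem(last=False)      # replace the oldest entry
        resident[page] = True
    return faults

# Hypothetical reference string, with time measured in memory references.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(count_faults(refs, 3, "LRU"), count_faults(refs, 3, "FIFO"))
```

The same dictionary serves both policies: only LRU moves a page to the recent end on a hit, so the oldest entry is the least recently used page under LRU and the earliest loaded page under FIFO.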

II. PREDICTIVE LOADING

A. Locality

Informally, locality has been defined as the property of a program to favor a relatively small subset of its pages during any process-time interval of its execution. It has been suggested, however, that locality can be defined in two components [11].

1) Temporal locality is the tendency of a program to reference during the process-time interval (t, t + τ) those pages which were referenced during the interval (t - τ, t). Loops, constants, and temporary variables or working stacks are constructs which lead to this concept.

2) Spatial locality is the tendency of a program to make a reference to the virtual space adjacent to the last reference. More specifically, if a is the virtual address referenced at time t, spatial locality is the tendency to reference the (virtual) space (a - k, a + k) during time (t, t + τ). Sequential portions of code, placement, and traversals of arrays lead to this concept.

Naturally, temporal and spatial locality coexist during the execution of a program.

It is worthwhile to note that traditional replacement algorithms have been based on temporal measures: FIFO on the longest time of residence, LRU on the longest time to the last reference, and Working Set on the time to the last reference. In all cases, these algorithms attempt to take advantage of temporal locality to approximate the "longest time to next reference" replacement strategy of MIN, Belady's unrealizable minimal fault algorithm [1].

On the other hand, restructuring of code and data either a priori [13] (i.e., at compile and/or load time with no information from an execution), or a posteriori [6], [7] (i.e., based on examination of one or more executions) is aimed at improving spatial locality. We contend that our techniques for dynamic improvement of locality operate on spatial locality.
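The "longest time to next reference" rule that the temporal algorithms approximate can be sketched in a few lines of Python. This is an illustration only: the function name `min_faults` is an assumption, and the rule is unrealizable in practice because it looks ahead in the complete reference string.

```python
def min_faults(refs, frames):
    """Count faults when the page whose next reference is furthest away is replaced."""
    resident = set()
    faults = 0
    for t, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(candidate):
                # Position of the next reference, or past the end if there is none.
                try:
                    return refs.index(candidate, t + 1)
                except ValueError:
                    return len(refs)
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults

# Hypothetical reference string: 7 faults here with 3 frames.
print(min_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))
```

Because the rule needs the future of the reference string, it serves only as a lower bound against which realizable policies such as LRU can be compared.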

B. The Predictive Strategy

As previously mentioned, in a demand paging system, a page is loaded into primary memory only when an explicit reference is made to it. However, predictive loading, that is, the loading of a page in anticipation of a reference to it, has been advocated in the design of "intelligent" systems [2]. Studies of predictive loading have been reported by Joseph [8] (whose method inspired the OBL algorithm described in Section II-D) and by Madnick [11] (whose tuple-coupling technique encompasses more than predictive loading). Both of these studies present a priori techniques: they assume a static preconceived definition of spatial contiguity based on contiguity of the virtual addresses. But, because of separation of code from data, the modular subdivision of a program into scattered subroutine and data areas, or the various bindings of subroutines to their parameters, the actual "contiguities" dynamically warp the virtual space in a manner which defies static description and may even escape the comprehension of the program's author.

The rationale behind predictive loading is not to save on the number of pages fetched from secondary to primary memory and hence to save transfer time (an impossible feat if no access time is required in addition to transfer time [3]), but to reduce the number of times this operation is called for by transferring possibly more than one page at once. Thus the number of unscheduled program interruptions will be diminished. This will present obvious advantages in a multiprogramming system.

C. Analysis of the Performance of Predictive Loading

We consider three preloading schemes used in conjunction with a replacement algorithm having the stack property [12]. The demanded page will be placed at the top of the stack in the usual manner; when preloading occurs, an additional replacement is done and the preloaded page is placed in the Rth position of the stack (where R is the number of frames allocated to the program). Thus, if another page fault occurs before the preloaded page is referenced, the preloaded page is replaced. By restricting our attention to this form of preloading, we obtain an advantage in analyzing the performance of the system.

We must assume that page faults are a renewal process; hence the lengths of successive execution intervals between faults are independent and also independent of whether a prediction, correct or incorrect, was made (assumptions of this type are discussed in [10]). Let e(R) be the expected length of execution intervals when a subject program is executed with R frames of primary memory. Note that the number of faults during the execution will be f(R) = n/e(R), where n is the number of references made during the execution.

Given a probability h of a correct prediction when a page fault occurs, the expected execution interval length will be

$$e_1(R, h) = (1 - h) \cdot e(R - 1) + h \cdot [e(R - 1) + e(R)]. \tag{1}$$

The term (1 - h) · e(R - 1) is due to the fact that the erroneously preloaded page effectively reduces the primary memory allocation by one frame. On the other hand, if the prediction is correct one expects e(R - 1) references to elapse before the preloaded page is needed, then e(R) references before the next fault.

Since (1) simplifies to e(R - 1) + h · e(R) and f = n/e, converting (1) from execution interval lengths to number of faults yields

$$f_1(R, h) = \frac{1}{1/f(R - 1) + h/f(R)}. \tag{2}$$

As a test of the validity and accuracy of this analysis, we performed the following experiment on two subject programs: using LRU replacement as our basis, runs were made for preselected values of 0 ≤ h ≤ 1; at each fault, the Rth position of the LRU ordering was replaced by a special marker. With probability h the marker was chosen to represent a correct preload, while with probability 1 - h it was chosen to represent an incorrect preload. Whenever the correct preload marker is encountered by the simulated address mapper, it is changed to represent the page being referenced (thereby preventing a fault); the incorrect preload marker will never match a legal page number. Table I presents a comparison of experimental results with (2), demonstrating agreement within 3 percent. We may also note that in both cases, h > 0.2 implies $f_1(R, h) < f(R)$, thus representing an improvement. However, we will see that the value of h necessary to obtain an improvement is a function of f(R) and f(R - 1).

Equation (2) can be extended to account for the fact that preloading may not always occur, either due to insufficient information to make a prediction or because the predicted page is already resident in primary memory. If p is the proportion of faults for which prediction occurs, then, reasoning as for (1), we have

$$e_2(R, h, p) = (1 - p) \cdot e(R) + p \cdot e_1(R, h)$$

where the first term corresponds to no prediction at all. Substituting for $e_1(R, h)$ and converting to faults gives

$$f_2(R, h, p) = \frac{1}{p/f(R - 1) + (1 - p + p \cdot h)/f(R)}. \tag{3}$$

The reader may wish to apply this formula to later empirical results (see Table III) and assure himself or herself that any biases introduced by the invalidity of the assumptions result in conservative appraisals by (3).
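Equations (2) and (3) are easy to evaluate numerically. The sketch below is only an illustration: the function names `f1` and `f2` and their argument order are assumptions, while the fault counts passed in are the Trace I and Trace II values quoted in Tables I and III.

```python
def f1(f_r, f_r_minus_1, h):
    """Equation (2): faults when every fault triggers a preload that is correct with probability h."""
    return 1.0 / (1.0 / f_r_minus_1 + h / f_r)

def f2(f_r, f_r_minus_1, h, p):
    """Equation (3): faults when only a proportion p of the faults triggers a preload."""
    return 1.0 / (p / f_r_minus_1 + (1.0 - p + p * h) / f_r)

# Trace I: f(20) = 486 and f(19) = 591.
for h in (0.0, 0.1, 0.5, 1.0):
    print(h, round(f1(486, 591, h)))

# SL on Trace I: h = 0.36 and p = 0.55.
print(round(f2(486, 591, 0.36, 0.55)))
```

For Trace I, (2) gives about 527 faults at h = 0.1 and about 267 at h = 1. With the SL values of h and p, (3) gives about 442 faults, which is above the 421 faults measured for SL, in line with the conservative appraisal mentioned above.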

TABLE I
COMPARISON OF EXPERIMENTAL RESULTS WITH EQUATION (2)

Trace I (R = 20), f(20) = 486, f(19) = 591     Trace II (R = 52), f(52) = 581, f(51) = 682

h      f measured   f1(R,h)                    h      f measured   f1(R,h)
0      591          591                        0      682          682
.1     525          527                        .1     612          610
.18    476          485                        .2     549          552
.28    436          441                        .3     505          504
.38    396          404                        .41    459          460
.47    369          376                        .5     429          430
.58    346          347                        .57    409          409
.71    317          317                        .67    370          382
.8     298          300                        .79    349          354
.9     280          282                        .89    330          334
1      266          267                        1      309          314

D. Predictive Loading Techniques

We now present three predictive loading techniques which have been tested on three subject programs using an LRU replacement policy. These techniques are described in detail in Appendix A. Our desire to obtain results which are quantitatively comparable to the preceding analysis imposes the following restrictions:

a) a stack algorithm must be used as the basis for replacement;
b) the primary memory allocation must be static;
c) preloading is limited to a single page (our experience and later analysis show that this is probably desirable);
d) preloading is done only at page faults.

The first two restrictions are easily removed; one can easily adapt the techniques to nonstack algorithms such as FIFO, or extend the technique to aid dynamic allocation policies such as Denning's Working Set or a dynamic LRU [14]. We expect that adaptations or extensions will produce results qualitatively similar to our own.

The three algorithms are as follows.

1) One Block Lookahead (OBL): This technique was studied, in a different form, by Joseph in 1970 [8]. The predictive function is defined by PRED[i] = i + 1. When a fault occurs on a reference to page q then, if PRED[q] is not already in primary memory, it is loaded with q and takes the Rth position in the LRU ordering. Of course, the page in the Rth position is first copied to secondary memory whenever preloading occurs.

2) Spatial Lookahead (SL): If we assume that a program typically executes for long periods within a locality, then generates a series of faults at short intervals (i.e., a burst), we may conjecture that recurrent patterns may exist which tend to identify the locality being entered: namely, the same sequence of faults should occur as that locality gathers up its space. This is quite a reasonable assumption according to previous experiments [10]. Therefore, we propose to redefine contiguity in the virtual address space by dynamically updating the PRED function in an attempt to account for warps. The algorithm is as follows: initially, PRED[i] = i + 1 and LAST = b; thereafter, when a fault occurs to a page q, the LRU ordering process leaves LRU[1] = q and removes LRU[R]. If preloading did not occur on the previous fault, or if preloading occurred but the page has not been referenced, then update PRED[LAST] = q. If PRED[q] is not already resident in primary memory, then do a preload (i.e., LRU[R] = PRED[q]). Set LAST = q. It should be noted that if preloading occurred and the preloaded page was referenced, then updating PRED[LAST] would destroy correct predictive information.

3) Temporal Lookahead (TL): Whereas SL is designed to take advantage of spatial localities, this technique attempts to update PRED on a temporal basis by recording a connection between the page referenced at time t - 1 and one causing a fault at time t. The actual operation is similar to that of SL: initially, PRED[i] = i + 1. Thereafter, when a fault occurs to a page q, the LRU ordering process leaves LRU[1] = q and removes LRU[R], then updates PRED[LRU[2]] = q. If PRED[q] is not already resident in primary memory, then do a preload (i.e., LRU[R] = PRED[q]).

As a further illustration of the three techniques, Table II gives an example of their operation on the execution of a program referencing 8 pages {1, 2, ..., 8}. With LRU replacement and an allocation of 4 frames, 15 page faults result. Under OBL, many faults to sequentially numbered pages are eliminated and only 10 interruptions result, while the number of loads increases from 15 to 16. Only one incorrect preload was made (on the first reference to page 6). This circumstance is rectified by SL, which sets PRED[6] = 5 when the error is detected and realizes an advantage when the sequence recurs. The TL technique appears to update PRED more frequently, yet yields the same performance as SL. However, these examples are designed solely to illustrate the operation of the techniques; experiments involving real programs are required before conclusions can be drawn.
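The three techniques can be sketched as a single Python simulation, with the scheme selected by name. This is a reading of the prose descriptions rather than the authors' Appendix A, and it rests on two labeled assumptions: the residency of PRED[q] is judged at the moment the fault occurs (before the demand replacement), which is what reproduces the load counts quoted for the Table II example, and the initial blank value of LAST is modeled as `None`. The names `simulate`, `pending`, and `preload_used` are illustrative.

```python
def simulate(refs, frames, scheme="LRU"):
    """Return (interruptions, loads) for LRU, optionally with one-page preloading."""
    assert frames >= 2 and scheme in ("LRU", "OBL", "SL", "TL")
    stack = []             # stack[0] is LRU[1], the most recently used page
    pred = {}              # learned overrides of the default PRED[i] = i + 1
    last = None            # SL: page that caused the previous fault
    pending = None         # page preloaded at the previous fault, if any
    preload_used = False   # SL: has that preloaded page been referenced?
    faults = loads = 0
    for q in refs:
        if q in stack:                       # hit: ordinary LRU reordering
            stack.remove(q)
            stack.insert(0, q)
            if q == pending:
                preload_used = True
            continue
        faults += 1
        loads += 1
        # Assumption: residency is judged as of the moment the fault occurs.
        resident = set(stack) | {q}
        stack.insert(0, q)
        if len(stack) > frames:
            stack.pop()                      # demand replacement of LRU[R]
        if scheme == "SL" and last is not None and not preload_used:
            pred[last] = q                   # previous prediction absent or unused
        if scheme == "TL" and len(stack) > 1:
            pred[stack[1]] = q               # link the page used just before the fault
        pending, preload_used, last = None, False, q
        if scheme == "LRU":
            continue
        p = pred.get(q, q + 1)
        if p not in resident:
            if len(stack) == frames:
                stack.pop()                  # additional replacement for the preload
            stack.append(p)                  # preloaded page takes the Rth position
            loads += 1
            pending = p
    return faults, loads

stream = [1, 2, 1, 3, 4, 6, 5, 4, 6, 5, 7, 8, 3, 4, 6, 5, 4, 1, 2, 1, 3, 4]
for name in ("LRU", "OBL", "SL", "TL"):
    print(name, simulate(stream, 4, name))
```

On the reference stream of Table II with 4 frames, this sketch gives 15 faults and 15 loads for LRU, 10 interruptions and 16 loads for OBL, and 9 interruptions and 16 loads for both SL and TL, matching the totals quoted for the example.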

E. Experimental Results

The techniques described above were tested on traces of three subject programs: (I) 1.1 × 10^6 references generated by a Fortran compiler, (II) 1.25 × 10^6 references generated by an XPL compiler, and (III) 0.55 × 10^6 references generated by a single pass assembler. For each trace, R was chosen to correspond approximately to the knee of the parachor (i.e., f versus R) curve as observed by running the program with LRU replacement. The knee represents a tradeoff point, as a slight decrease in R results in a large increase in the number of faults, but an increase in R does not significantly decrease the number of faults. Refer to Fig. 1 for a typical parachor curve. All experiments were run with a page size of 256 words. Table III summarizes the number of faults, h (accuracy of prediction), and p (proportion of predictive loads) for MIN, LRU, OBL, SL, and TL strategies.

The following observations can be made from Table III.

First, OBL yields a 5:1 range in the accuracy of its prediction; TL is more consistent than OBL, but is still erratic; SL appears to have an accuracy of prediction which is consistently close to 0.4, and prediction occurs half of the time. SL was always an improvement over LRU, by 16 percent, 14 percent, and 3 percent, respectively. It should be noted that the ratio f(R - 1)/f(R) is a determining factor in the calculation of improvement:

$$\text{improvement} = 100\ \text{percent} \times \frac{f(R) - f_2(R, h, p)}{f(R)}$$

or, substituting from (3),

$$\text{improvement} = 100\ \text{percent} \times \left(1 - \frac{1}{1 - p + p \cdot h + p/[f(R - 1)/f(R)]}\right). \tag{4}$$

Since p and h are approximately equal for all three experiments, the small improvement for Trace III can be directly attributed to the fact that the f(R - 1)/f(R) ratio is significantly larger than for Traces I and II. A larger value of R (hence, a move further to the right of the knee) could have been beneficial in this case. Equation (4) makes it apparent that predictive loading can improve performance only when already operating in the shallow part of the parachor curve.

In order to determine the extent to which redefinition of the predictive function occurs, at the end of each simulation we calculated the proportion of pages referenced for which the value of the predictive function had changed from its original (OBL) value. These departure ratios are presented in the bottom two rows of Table III and indicate that SL extensively redefines the concept of spatial contiguity, while TL represents much less of a departure from OBL. This appears to be due to the tendency of TL to redefine PRED for a few code pages which make many references to several data pages.

Finally, the tendency of a program to generate bursts of faults is measured by the distribution of execution interval lengths. If it is in fact true that locality transitions cause bursts to identifiable sequences of pages as is suggested in [10], then SL should tend to reduce the density of short execution intervals by combining pairs of consecutive short execution intervals to create one longer interval. This phenomenon is substantiated in Figs. 2 and 3, which show the cumulative distribution functions of execution intervals for Traces I and II under LRU and LRU with SL: evidently, the density for LRU with SL is less biased toward short execution intervals than is LRU alone (note that the density for LRU alone grows faster below 10^3 references between faults, while LRU with SL grows faster above 10^3 references between faults). In addition, a slight reduction in the coefficient of variation was noted when predictive loading was used. Thus, the use of predictive loading could have beneficial effects on the queueing for secondary memory service [3].
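The dependence of Equation (4) on the ratio f(R - 1)/f(R) can be made concrete with a short calculation. The sketch below is an illustration: the function names are assumptions, and the break-even accuracy is a rearrangement of (4), which is positive exactly when h > 1 - f(R)/f(R - 1) for p > 0. The numbers are the SL rows of Table III.

```python
def improvement(ratio, h, p):
    """Equation (4): percentage reduction in faults, with ratio = f(R - 1) / f(R)."""
    return 100.0 * (1.0 - 1.0 / (1.0 - p + p * h + p / ratio))

def break_even_accuracy(ratio):
    """Accuracy h above which Equation (4) predicts any improvement (for p > 0)."""
    return 1.0 - 1.0 / ratio

# (f(R - 1), f(R), h, p) for SL, taken from Table III.
traces = {"I": (591, 486, 0.36, 0.55),
          "II": (682, 581, 0.40, 0.55),
          "III": (998, 578, 0.44, 0.45)}
for name, (f_prev, f_cur, h, p) in traces.items():
    ratio = f_prev / f_cur
    print(name, round(improvement(ratio, h, p), 1), round(break_even_accuracy(ratio), 2))
```

Under these assumptions the break-even accuracy is roughly 0.18 for Trace I and 0.15 for Trace II, but about 0.42 for Trace III, so an accuracy of 0.44 leaves almost no margin there. The predicted improvements (about 9, 12, and 1 percent) fall below the measured ones, again consistent with the analysis being conservative.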

III. DYNAMIC RESTRUCTURING OF THE VIRTUAL SPACE

A. A Posteriori Restructuring

Hatfield and Gerald [7] and Ferrari [6] have shown that substantial improvements can be made in the performance of programs running in a paging environment by rearranging, or duplicating and rearranging, relocatable sectors of programs. Their goal was to insure that those portions of the subject program which are needed within a relatively short time of one another are located contiguously in virtual space; that is, they attempt to remove warps from the virtual space in such a way that demand paging will tend to account for spatial and temporal locality. By considering program sectors of one-tenth to one-third the page size, improvements in the page fault rate from 3:1 for page sizes of 512 words up to 10:1 for page sizes of 2048 words were obtained [7].

The disadvantage of this technique is that it requires the preprocessing of a "representative" memory trace of each subject program, thus making its use expensive and restricting it to production programs which are not highly data dependent. Therefore, it is natural to consider whether restructuring the program during execution could be done profitably.

TABLE II
EXAMPLE APPLICATIONS OF THE PREDICTIVE METHODS

Reference stream: 1 2 1 3 4 6 5 4 6 5 7 8 3 4 6 5 4 1 2 1 3 4

Method   Count             Cumulative value at each fault
LRU      # faults          1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
         # loads           1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
OBL      # interruptions   1 2 3 4 5 6 7 8 9 10
         # loads           2 4 6 7 9 11 12 13 15 16
SL       # interruptions   1 2 3 4 5 6 7 8 9
         # loads           2 4 6 7 9 11 13 15 16
TL       # interruptions   1 2 3 4 5 6 7 8 9
         # loads           2 4 6 7 9 11 13 14 16

Fig. 1. Parachor curve: f (# faults) versus R (# primary frames), with the "knee" of the curve marked.

B. Dynamic Restructuring

Unfortunately, it is not possible to dynamically restructure programs in a straightforward manner since this implies virtual address modifications. For example, the replication of heavily used pieces of code is trivial with a priori or a posteriori restructuring, whereas too much binding has occurred to do this at execution time. However, a restricted form of dynamic, or execution time, restructuring can be realized whenever two contiguous levels of memories in a hierarchy have a different frame size: assuming that the higher the level, the larger the frame, then pages must coalesce as they move up the hierarchy (further from the executable store). Likewise, as pages move down the hierarchy, they pull their "partners" down with them. A limited form of this is discussed by Madnick [11].

With this arrangement, a movement down the hierarchy is in fact preloading the lower levels.

At all but the lowest levels of the hierarchy, the coalescing of pages can be handled entirely in the software manipulation of the directories relating pages to the frames in which they reside. At lower levels, especially in the case of cache memory, the hardware requirements become prohibitive because construction, manipulation, and use of the directories must be done at speeds which compare to that of the memory.

For simplicity of exposition, we describe the restructuring technique in terms of pages and frames at two contiguous levels of the memory hierarchy, M1 and M2. In addition, given page (frame) sizes s1 and s2, respectively, we restrict our attention to

$$s_2 = 2^i = k \cdot s_1 \quad \text{and} \quad s_1 = 2^j, \quad \text{for some } i > j.$$

C. Description ofRestructuringIf a program is allocated k * R frames at level Ml and LRU

replacement is used, the LRU stack for level M1 will be k Rpositions deep and ordered from the most recently used pageof size sl (hereafter called a minipage) at the top to the leastrecently used at the bottom. When a page fault for minipageq occurs, k minipages will be loaded, q and its partners residingin an M2 frame. From the viewpoint ofMI, this correspondsto a preload of (k - 1) pages. In order to make room for the

58

Page 6: Dynamic Improvement of Locality in Virtual Memory Systems

BAER AND SAGER: IMPROVEMENT OF LOCALITY IN MEMORY SYSTEMS

TABLE IIIPERFORMANCE OF THE PREDICTIVE TECHNIQUES

TRACE I TRACE II TRACE IIIFORTRAN XPL assembler(R=20) (R=52) (R=11)

faults h p faults h p faults h p

MIN 268 - - 248 - - 383 - _

LRU@R-1 591 - - 682 - - 998 - _@R 486 - - 581 - - 578 - _

OBL 449 .29 .59 576 .15 .42 768 .06 .80

SL 421 .36 .55 498 .40 .55 558 .44 .45

TL 436 .32 .59 560 .20 .40 668 .18 .68

departure ratio departure ratio departure ratio

SL 26/33 65/76 15/17

TL 10/33 35/76 9/17

Note that p and h tend to be negatively correlated; that is, goodpredictors tend to agree with the replacement algorithm more orten, andtend to disagree only when they have a good chance of being correct.

[Figure 2: Trace I. Curves: LRU alone; LRU with SL. Horizontal axis: execution interval length (number of memory references), with ticks at powers of ten.]

Fig. 2. Comparison of cumulative density of execution interval lengths for LRU with SL versus LRU alone.

incoming minipages, the k minipages at the bottom of the stack are written into a single M2 frame. An entry for minipage q is made at the top of the stack, and the accompanying minipages are entered in the bottom (k - 1) positions. This

[Figure 3: Trace II. Curves: LRU alone; LRU with SL. Horizontal axis: execution interval length (number of memory references), with ticks at powers of ten.]

Fig. 3. Comparison of cumulative density of execution interval lengths for LRU with SL versus LRU alone.

mode of operation allows us to apply much of the same reasoning as in the development of (2).

As with the preloading techniques, we have imposed several restrictions:


TABLE IV
DYNAMIC RESTRUCTURING OF TRACE I

  R     s2    k    worst-case gain    actual gain    fRES    fLRU
  5    1024   4         35               43           830     486
  5    1024   2         10               17          3361    1821
 10     512   2          3                3.3         557     486
  6    1024   4         20               24           332     191
  6    1024   2          8                8.5         960     667
 12     512   2          2.5              3           217     191
  7    1024   4         18               24            83      39
  7    1024   2          5                6.5         305     229
 14     512   2          5.5              6.7          35      39

TABLE V
DYNAMIC RESTRUCTURING OF TRACE II

  R     s2    k    worst-case gain    actual gain    fRES    fLRU
 13    1024   4          1.6              2           839     581
 13    1024   2          1.33             1.5        1173     909
 26     512   2          1.25             1.4         653     581
 14    1024   4          2.3              2.8         388     264
 14    1024   2          1.6              1.8         617     497
 28     512   2          1.4              1.7         297     264
 15    1024   4          2.9              4.1         148     133
 15    1024   2          1.8              2.1         287     183
 30     512   2          1.4              1.5         121     133

a) a stack algorithm must be used as the basis for replacement;
b) the primary memory allocation must be static;
c) restructuring is done only at page faults.

Again, the first two restrictions are easily removed.

Our desire to obtain experimental results susceptible to some rudimentary analytical treatment has imposed yet another restriction which may not be immediately apparent: since we always load the demanded page and its k - 1 partners, the replacement rule must make sure that subsequent preloads will not result in duplicate pages at level M1. The various advantages and disadvantages of allowing duplicate pages at one or more levels of the hierarchy add a degree of complexity which would be inappropriate at this stage of the research.

Several variations of the restructuring technique were tested, consisting mostly of different placements of the k - 1 partner pages in the LRU ordering. For the most part, performance was the same as placing the partner pages at the bottom of the ordering. This is probably due to the tendency of the technique to restructure by coalescing unused minipages into large pages and writing them to M2, where they remain unreferenced but out of the way.
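The restructuring rule described in this section can be sketched as a small trace-driven simulation. The Python below is a minimal illustration, not the authors' implementation: the function name `restructuring_faults`, the dictionary `frame_of`, and the assumption that minipages are initially grouped into M2 frames by consecutive address are choices made for this example.

```python
def restructuring_faults(trace, R, k):
    """Count faults for LRU over k*R miniframes with k minipages per M2 frame.

    trace is a sequence of integer minipage numbers.
    """
    capacity = k * R
    stack = []      # index 0 is the most recently used minipage
    frame_of = {}   # evicted minipage -> tuple of minipages sharing its M2 frame
    faults = 0
    for q in trace:
        if q in stack:
            # Hit: ordinary LRU reordering.
            stack.remove(q)
            stack.insert(0, q)
            continue
        faults += 1
        frame = frame_of.get(q)
        if frame is None:
            # Assumption: a minipage never loaded before still sits in its
            # original, address-contiguous M2 frame.
            base = (q // k) * k
            frame = tuple(range(base, base + k))
        for m in frame:
            frame_of.pop(m, None)
        if len(stack) + k > capacity:
            # Coalesce the k bottom minipages into a single M2 frame.
            victims = tuple(stack[-k:])
            del stack[-k:]
            for m in victims:
                frame_of[m] = victims
        partners = [m for m in frame if m != q]
        # q goes to the top; its partners fill the bottom (k - 1) positions.
        stack = [q] + stack + partners
    return faults


print(restructuring_faults([0, 2, 4, 0, 3], R=2, k=2))  # prints 4
```

With k = 1 the sketch reduces to demand paging with LRU replacement over R frames, which gives a convenient sanity check. Because whole frames are loaded and whole groups of k minipages are evicted, no minipage is ever resident twice, in line with the no-duplicates restriction stated above.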

D. Experimental Results

Although we do not necessarily advocate this method for M1 being primary memory (core) and M2 being secondary memory (paging drum), we use these two levels as an example of the application and performance of the method. Reasoning as in the previous preloading experiments (see (1) and (2)), we see that if restructuring is always the worst possible, that is, if none of the (k - 1) minipages placed at the bottom of the stack are referenced before the next fault, then the number of faults incurred will be identical to demand paging with LRU replacement and k · R - (k - 1) miniframes. On the other hand, if we do demand paging and fix the association between minipages (i.e., no restructuring), the number of faults incurred will be identical to demand paging with LRU replacement with R frames of size s2. These two observations can be used to compute the "worst case" gain expected from restructuring:

$$f^{s_2}_{LRU}(R) \,/\, f^{s_1}_{LRU}(k \cdot R - (k - 1))$$

where the superscripts indicate the page size used and subscripts indicate the replacement algorithm.

Tables IV and V show the worst case and actual gain for Traces I and II. Actual gain is computed as

$$f^{s_2}_{LRU}(R) \,/\, f_{RES}(R, s_2, k)$$

where $f_{RES}$ is the number of faults (or, more correctly, program interruptions) observed when restructuring with k · R miniframes and k minipages per M2 frame.

As can be seen, the gains for Trace I are considerably larger

than those for Trace II. This can be attributed to the fact that Trace I tends to reference only small portions of its pages, while Trace II references large portions of its pages; thus, as previously noted, restructuring tends to "regroup" the minipages for Trace I by associating the unused portions of large pages with other unused portions and eventually moving them to M2 to be forgotten; this allowed the intrinsic locality of Trace I to exhibit itself.

No such regrouping was possible for Trace II. These results

seem to indicate that the effectiveness of dynamic restructuring should be closely related to the superfluity measure suggested by Kuck and Lawrie [9].

Tables IV and V also give the figures for $f_{RES}(R, s_2, k)$ and $f^{s_1}_{LRU}(k \cdot R)$ (i.e., the number of faults for LRU replacement using k · R miniframes). In all cases but two, we see that $f_{RES}(R, s_2, k) > f^{s_1}_{LRU}(k \cdot R)$; the two departures represent cases in which the total number of minipages referenced is essentially k · R. Thus, dynamic restructuring does not perform as well as LRU replacement with the smaller page size and cannot be justified on that basis. However, the economics of memory hierarchies dictate that the higher, slower levels of the hierarchy should transfer larger pages than the lower, faster levels [5], [11]. In this case, transferring pages between levels of differing frame size becomes a necessity, and restructuring will improve performance at those levels.
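The comparison between restructuring and LRU with the small page size can be checked directly against the fRES and fLRU columns of Tables IV and V. The following Python sketch is illustrative only; the tuple layout and the names `ROWS`, `departures`, and `overhead` are assumptions of this example, and the numbers are transcribed from the two tables.

```python
# Rows from Tables IV and V: (trace, R, s2, k, f_RES, f_LRU), where f_LRU is
# the fault count for LRU replacement with k*R miniframes.
ROWS = [
    ("I", 5, 1024, 4, 830, 486), ("I", 5, 1024, 2, 3361, 1821),
    ("I", 10, 512, 2, 557, 486), ("I", 6, 1024, 4, 332, 191),
    ("I", 6, 1024, 2, 960, 667), ("I", 12, 512, 2, 217, 191),
    ("I", 7, 1024, 4, 83, 39), ("I", 7, 1024, 2, 305, 229),
    ("I", 14, 512, 2, 35, 39),
    ("II", 13, 1024, 4, 839, 581), ("II", 13, 1024, 2, 1173, 909),
    ("II", 26, 512, 2, 653, 581), ("II", 14, 1024, 4, 388, 264),
    ("II", 14, 1024, 2, 617, 497), ("II", 28, 512, 2, 297, 264),
    ("II", 15, 1024, 4, 148, 133), ("II", 15, 1024, 2, 287, 183),
    ("II", 30, 512, 2, 121, 133),
]


def departures(rows):
    """Return the configurations in which f_RES does not exceed f_LRU."""
    return [(trace, R, s2, k)
            for trace, R, s2, k, f_res, f_lru in rows
            if f_res <= f_lru]


def overhead(f_res, f_lru):
    """Ratio f_RES / f_LRU: fault cost of restructuring relative to small pages."""
    return f_res / f_lru


print(departures(ROWS))  # [('I', 14, 512, 2), ('II', 30, 512, 2)]
```

The two rows returned are the departures mentioned in the text. For the first row of Table IV, `overhead(830, 486)` is roughly 1.7, meaning restructuring incurred about 70 percent more faults than LRU with 20 miniframes in that configuration.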

IV. CONCLUSIONS

We have presented several techniques for dynamically improving the locality of programs running in a paging environment. By restricting the manner in which preloading could interact with the replacement algorithm, we developed analytical arguments concerning the performance of the techniques; in particular, it was determined that preloading could not be effective unless the ratio f(R - 1)/f(R) is close to 1; this corresponds to that part of the parachor curve to the right of the knee. The analytical arguments were substantiated by experiments on actual programs.

The techniques based strictly on preloading could be implemented with minimal additions to the software and no additional hardware; the advantage of these techniques is the reduction of the number of times the loading and replacement rules are invoked. Dynamic restructuring was shown to yield significant improvements when large portions of pages were unused; however, the hardware requirements for use at the level of executable store could not be justified in terms of performance improvement over use of the small page size alone. Thus, restructuring should be considered for use at the higher levels of the hierarchy, where directories could be maintained by the software or by microprocessors.

Further research is needed in the area of dynamic allocation with preloading, and on the use of preloading in hierarchies consisting of three or more levels. Better restructuring rules are perhaps possible if the problem of duplication of pages is overcome so that the loading and replacement rules can operate with less interdependence.

APPENDIX A

PROGRAMMATIC DESCRIPTIONS OF THE PREDICTIVE TECHNIQUES

Notation

t       Counter indicating the reference being processed.
PT      Total process time, or the number of references issued.
r_i     i-th page reference.
PRED    Predictive function.
LRU     LRU ordering of the R resident pages.
order   Procedure which does LRU ordering and replacement.
fault   Set by order to indicate whether or not a fault occurred.

One Block Lookahead (OBL)

initially, PRED[i] = i + 1 ∀ i, LRU[i] = 0 ∀ 1 ≤ i ≤ R, NFAULT = 0

for t ← 1 step 1 until PT do
begin q ← r_t;
    order(LRU, q, fault);
    if fault then
    begin NFAULT ← NFAULT + 1;
        if PRED[q] ∉ LRU then LRU[R] ← PRED[q]
    end
end
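The OBL description above can be turned into a short runnable sketch. The Python below is a hedged illustration rather than the authors' code: the helper `order` is a guess at the behavior the notation attributes to the procedure of the same name, `None` stands in for the paper's 0 as the empty-frame marker, and page numbers are assumed to be integers.

```python
def order(lru, q):
    """LRU ordering and replacement: move q to the top; return True on a fault."""
    if q in lru:
        lru.remove(q)
        lru.insert(0, q)
        return False
    lru.pop()            # discard the least recently used entry
    lru.insert(0, q)
    return True


def obl_faults(trace, R):
    """Count page faults for One Block Lookahead with R >= 2 resident pages."""
    if R < 2:
        raise ValueError("R must be at least 2 so a preload cannot evict q itself")
    lru = [None] * R     # index 0 is the most recently used page
    nfault = 0
    for q in trace:
        if order(lru, q):
            nfault += 1
            predicted = q + 1             # PRED[q] = q + 1 never changes in OBL
            if predicted not in lru:
                lru[R - 1] = predicted    # preload into the LRU position
    return nfault


print(obl_faults([1, 2, 3, 4], 2))   # prints 2
print(obl_faults([5, 3, 5, 3], 2))   # prints 4
```

The first trace is purely sequential, so every second reference hits a preloaded page. The second trace alternates between two nonadjacent pages, and each wrong preload evicts the page needed next, so OBL incurs 4 faults where demand paging with LRU and two frames would incur 2.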

Spatial Lookahead (SL)

initially, PRED[i] = i + 1 ∀ i, LRU[i] = 0 ∀ 1 ≤ i ≤ R, LOAD = REF = true, LAST = 0, NFAULT = 0

for t ← 1 step 1 until PT do
begin q ← r_t;
    order(LRU, q, fault);
    if fault then
    begin NFAULT ← NFAULT + 1;
        if ¬LOAD ∨ (LOAD ∧ ¬REF) then PRED[LAST] ← q;
        REF ← false; LAST ← q;
        LOAD ← PRED[q] ∉ LRU;
        if LOAD then LRU[R] ← PRED[q]
    end
    else REF ← REF ∨ (q = LRU[R])
end
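A runnable sketch of SL follows, under stated assumptions. The helper `order` is a guess at the procedure named in the notation, `None` replaces 0 as the empty marker, and learned predictions are kept in a dictionary that falls back to q + 1. One interpretation is made explicit: the test q = LRU[R] is evaluated before `order` moves q to the top of the stack, since afterwards a referenced page is no longer at the bottom.

```python
def order(lru, q):
    """LRU ordering and replacement: move q to the top; return True on a fault."""
    if q in lru:
        lru.remove(q)
        lru.insert(0, q)
        return False
    lru.pop()            # discard the least recently used entry
    lru.insert(0, q)
    return True


def sl_faults(trace, R):
    """Count page faults for Spatial Lookahead with R >= 2 resident pages."""
    if R < 2:
        raise ValueError("R must be at least 2")
    lru = [None] * R     # index 0 is the most recently used page
    pred = {}            # learned predictions; PRED[i] defaults to i + 1
    load = ref = True
    last = None
    nfault = 0
    for q in trace:
        # Assumption: inspect the bottom entry before order() reorders the stack.
        touched_bottom = (q == lru[R - 1])
        if order(lru, q):
            nfault += 1
            # The previous prediction was skipped or went unused: relearn it.
            if (not load) or (load and not ref):
                pred[last] = q
            ref = False
            last = q
            candidate = pred.get(q, q + 1)
            load = candidate not in lru
            if load:
                lru[R - 1] = candidate
        else:
            ref = ref or touched_bottom
    return nfault


print(sl_faults([0, 2, 4, 0, 2, 4, 0, 2, 4], 2))  # prints 6
```

On this cyclic trace the default predictions are wrong at first, but after one pass SL has learned that 0 is followed by 2, 2 by 4, and 4 by 0, so later passes fault only on every second reference.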

Temporal Lookahead (TL)

initially, PRED[i] = i + 1 ∀ i, LRU[i] = 0 ∀ 1 ≤ i ≤ R, NFAULT = 0

for t ← 1 step 1 until PT do
begin q ← r_t;
    order(LRU, q, fault);
    if fault then
    begin NFAULT ← NFAULT + 1;
        if PRED[q] ∉ LRU then LRU[R] ← PRED[q];
        PRED[LRU[2]] ← q
    end
end
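TL can be sketched in the same way. As before, this is an illustration under assumptions, not the authors' program: `order` is a guess at the named procedure, `None` marks an empty frame, and predictions live in a dictionary defaulting to q + 1. The sketch reads LRU[2] (index 1 here) before the preload is written, which matches the pseudocode whenever R > 2 and avoids learning from the just-preloaded page when R = 2.

```python
def order(lru, q):
    """LRU ordering and replacement: move q to the top; return True on a fault."""
    if q in lru:
        lru.remove(q)
        lru.insert(0, q)
        return False
    lru.pop()            # discard the least recently used entry
    lru.insert(0, q)
    return True


def tl_faults(trace, R):
    """Count page faults for Temporal Lookahead with R >= 2 resident pages."""
    if R < 2:
        raise ValueError("R must be at least 2")
    lru = [None] * R     # index 0 is the most recently used page
    pred = {}            # learned predictions; PRED[i] defaults to i + 1
    nfault = 0
    for q in trace:
        if order(lru, q):
            nfault += 1
            # Page referenced most recently before q (LRU[2] in the paper).
            previous = lru[1]
            candidate = pred.get(q, q + 1)
            if candidate not in lru:
                lru[R - 1] = candidate    # preload into the LRU position
            if previous is not None:
                pred[previous] = q        # a fault on q followed `previous`
    return nfault


print(tl_faults([0, 2, 4, 0, 2, 4, 0, 2, 4], 2))  # prints 6
```

Unlike SL, TL updates its predictor at every fault, whether or not the previous preload was useful; on this short cyclic trace both variants happen to incur 6 faults.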

REFERENCES

[1] L. A. Belady, "A study of replacement algorithms for a virtual storage computer," IBM Syst. J., vol. 5, no. 2, pp. 78-101, 1966.
[2] T. C. Chen, "Distributed intelligence for user-oriented computing," 1972 Fall Joint Computer Conf., AFIPS Conf. Proc., vol. 41, pt. II. Montvale, NJ: AFIPS Press, 1972, pp. 1049-1056.
[3] E. G. Coffman and P. Denning, Operating System Theory. Englewood Cliffs, NJ: Prentice-Hall, 1973.
[4] P. J. Denning, "The working set model for program behavior," Commun. ACM, vol. 11, no. 5, pp. 323-333, 1968.
[5] P. J. Denning, "Virtual memory," Comput. Surveys, vol. 2, no. 3, pp. 153-189, 1973.
[6] D. Ferrari, "Improving locality by critical working sets," Commun. ACM, vol. 17, pp. 614-621, Nov. 1974.
[7] D. J. Hatfield and J. Gerald, "Program restructuring for virtual memory," IBM Syst. J., vol. 10, no. 3, pp. 168-192, 1971.
[8] M. Joseph, "An analysis of paging and program behavior," Comput. J., vol. 13, no. 1, pp. 48-54, 1970.
[9] D. J. Kuck and D. H. Lawrie, "The use and performance of memory hierarchies: A survey," in Software Engineering, J. Tou, Ed. 1969, pp. 45-77.
[10] P. Lewis and G. Shedler, "Empirically derived micromodels for sequences of page exceptions," IBM J. Res. Develop., vol. 17, pp. 86-100, Mar. 1973.
[11] S. E. Madnick, "Storage hierarchy systems," Mass. Inst. Technol., Cambridge, Tech. Rep. MAC-TR-107, 1973.
[12] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger, "Evaluation techniques for storage hierarchies," IBM Syst. J., vol. 9, no. 2, pp. 78-117, 1970.
[13] J. E. Morrison, "User program performance in virtual storage systems," IBM Syst. J., vol. 12, no. 3, pp. 216-237, 1973.
[14] G. Sager, "Dynamic storage allocation in a paged virtual memory," Ph.D. dissertation, Univ. Washington, Seattle, 1972.

Jean-Loup Baer (S'66-M'69) received the Diplome d'Ingenieur in electrical engineering and the Doctorate 3e Cycle in computer science from the University of Grenoble, Grenoble, France, in 1961 and 1963, respectively, and the Ph.D. degree from the University of California, Los Angeles, in 1968.

From 1961 to 1963 he was a Research Engineer with the Laboratoire de Calcul, University of Grenoble. From 1965 to 1969 he was a member of the Digital Technology Group at U.C.L.A. Since 1969 he has been on the faculty of the University of Washington, Seattle, where he is currently an Associate Professor. His present interests are in parallel processing, the management of memory hierarchies, and in scheduling theory.

Dr. Baer is a member of the Association for Computing Machinery and the Association Francaise de Cybernetique Economique et Technique. He served as an IEEE Computer Society Distinguished Visitor during 1973-75 and is an Associate Editor of the Journal of Computer Languages.

Gary R. Sager was born in Shawnee, OK, on September 13, 1946. He received the Ph.D. degree in computer science from the University of Washington, Seattle, in 1972.

He was an engineer for Honeywell, Inc. from 1967 to 1970. He was an Assistant Professor of Computer Science at Colorado State University from 1972 to 1974. He is now an Assistant Professor of Computer Science at the University of Waterloo, Waterloo, Ont., Canada. He has published several papers in the field of operating systems. His current research interests include file system performance, operating system portability, and debugging techniques.

Dr. Sager is a member of the ACM and SIAM.

A Picture-Building System

ROBIN WILLIAMS, MEMBER, IEEE, AND GARY M. GIDDINGS

Abstract-The picture-building system (PBS) deals with the problem of creating and manipulating data structures for applications using computer graphics. PBS has a data definition facility that enables one to specify structures needed for a given application, a manipulation facility for loading and editing the data, and displaying the image stored in the structures. The structures are defined in a relational data base and allow graphical attributes to be specified in the same way as nongraphical attributes. There is a deliberate attempt to separate data from programs which is in accord with other data-base developments. It is also possible to mix text and vector data with raster scan data which we have found to be very useful in some graphical applications. Some examples and a brief description of our color display system will be given.

Index Terms-Computer graphics, data structures, interactive systems, relational data base.

Manuscript received July 2, 1975. This paper was presented at the 1975 UCLA, IEEE, ACM Conference on Computer Graphics, Pattern Recognition, and Data Structure, Los Angeles, CA, May 14-16, 1975.
The authors are with the IBM Research Laboratory, San Jose, CA 95193.

I. CONCEPT OF PICTURE BUILDING

MANY applications deal with pictorial representations of objects and data. Line drawings and images are presented to the user, together with menus of commands, and the user interacts with the graphical presentations to specify computations. The application program frequently employs a graphic subroutine package to perform the graphical I/O, and in many cases this works quite well with simple arrays of coordinates and text strings. However, in many applications there is a need to build more complex structures than simple lists of x, y, z coordinates. Imagine trying to locate a carburetor in an engine from a large array of coordinate values! It has been argued in earlier papers [1]-[3] that the important thing to do is to build a model in the computer to represent the parts, their attributes, and their interrelationships so that a whole variety of processing can be performed on the model. However, many attributes of data are not graphical. It would be beneficial, therefore, to store graphical data together with nongraphical data in a manner that is consistent with current, or future, standard data bases. A major aim is to separate data from application programs so that reorganizations of the data do not affect the validity of application programs and so that many users (application programs) can share the same data easily. A general mechanism to achieve this data independence for graphical data is explained. It is also shown how raster data can be incorporated and used.

The picture-building system (PBS) consists of a mechanism for creating structures for graphical and nongraphical data, a