
WSO: III Workshop de Sistemas Operacionais

July 14 to 20, 2006, Campo Grande, MS

Proceedings of the XXVI Congresso da SBC

New Visual Characterization Graphs for Memory System Analysis and Evaluation

Edson T. Midorikawa, Hugo Henrique Cassettari

Departamento de Engenharia de Computação e Sistemas Digitais Escola Politécnica – Universidade de São Paulo

Av. Prof. Luciano Gualberto, travessa 3, 158, 05508-900 Cidade Universitária, São Paulo – SP – Brasil


Abstract. The characterization of workloads used in memory system analysis and evaluation is important for detecting the factors in programs that may or may not favor the performance of each candidate system configuration. In this paper, we propose a set of simple and functional graphs and demonstrate that they make it possible to obtain visually, with considerable precision, a good deal of relevant information about memory access characteristics. The proposed graphs are useful both for documenting experiments, in terms of the memory access patterns and locality of references of workloads, and for predicting the performance of distinct memory systems. Applications of locality analysis include the improvement of memory management algorithms and of program optimization techniques.

1. Introduction

Computational processes tend to exploit, to a greater or lesser degree, the property known as locality of references [Denning 1980]. This property asserts that only a subset of the pages that compose the virtual address space of a program is actually needed for its execution within a specific time frame. In other words, the data processed during a time interval are usually concentrated in a few memory pages.

The working set concept [Denning 1968] was also formalized from this property. The working set of a running process is the subset of pages required for its execution during some time interval. In a sense, it is the concrete expression of the locality property: the subset can itself be interpreted as the locality of the process at that moment of its execution.

This property matters because the degree of locality of a program can directly affect its processing time, given its influence on the reuse of resident pages and, consequently, on memory system performance. An example of temporal locality can be observed when some memory pages are continuously referenced over a period of time. On the other hand, if the memory accesses are restricted to a specific portion of the virtual memory space (that is, the referenced pages are located at nearby addresses), we have an example of spatial locality.

Workload characterization may be essential, in scientific experiments, for the correct interpretation of obtained results. In memory management studies, this characterization includes, among other factors, both the quantity and order of the accesses to each virtual memory address, pointing out locality occurrences along the program execution.


The degree of locality presented by an application (or by part of it) can be determined using mathematical and/or visual methods. Mathematical methods quantify locality through an algorithm – online or offline – developed from either an analytical or an empirical model [Ding and Zhong 2003, Phalke 1995]. Visual methods illustrate, through interpretative graphs, relevant memory access characteristics collected offline. Naturally, results obtained with mathematical methods may also be presented graphically, depending on the purpose of the research.

The use of visual resources in studies of locality of references is interesting both for performance prediction and for the documentation and didactic description of experimental workloads. In order to enhance the analysis and evaluation of memory systems, we present six new visual characterization graphs. They are first introduced by analyzing the memory access behavior of some selected applications and then used in a case study on the performance of modern page replacement algorithms.

This paper is organized in the following way. Section 2 presents the main visual techniques applied to locality analysis, addressing their deficiencies. These deficiencies are dealt with in Section 3 by using new graphs: memory access surfaces, IRR/IRG graphs and IRR histograms. Such graphs are illustrated and explained by representative examples, complemented by a case study in Section 4. Finally, Section 5 concludes the paper, underlining its contributions.

2. Visual Techniques for Locality Analysis

In research aimed at analyzing locality of references in a clear and didactic way, visual interpretation of the memory access characteristics observed in programs is the most common method of identifying patterns. Graphs that provide a general view of program behavior are widely used, especially access graphs, as they allow empirical inferences.

A memory access graph (or map) is the visual representation of a matrix of points that shows which memory pages are referenced by a process during its execution. The horizontal axis corresponds to the process virtual time, measured in number of accesses performed, while the program virtual address space is distributed along the vertical axis. Hatfield and Gerald [Hatfield and Gerald 1971] used graphs of this kind more than 30 years ago to analyze program optimizations that improve the performance of virtual memory systems. Several recent works also use memory access graphs to illustrate techniques for pattern recognition [Choi et al 1999, Choi et al 2000, Kim et al 2000] or to examine workload behaviors [Glass and Cao 1997, Markatos 1997, Phalke 1995].

Figure 1 shows a typical example of this kind of graph, where the behavior of Espresso (a circuit simulator application) is exhibited.1 The existence of temporal locality is intuitively identified by a horizontal concentration of adjacent points. In other words, it occurs when a page (or a page subset) is continuously referenced in an execution interval, originating a horizontal line, a blur, or a trace in the graph. The longer the horizontal length of an identified graphical element, the longer the temporal locality will be. Spatial locality can be identified in a similar way, by observing the presence of a vertical concentration of points. This indicates that pages with nearby virtual addresses are accessed in a short amount of time.

1 The respective trace file [Uhlig and Mudge 1997], from which the graph was generated, was collected by Smaragdakis, Kaplan and Wilson [Smaragdakis et al 1999].


Diagonal lines reveal the occurrence of sequential accesses: the addresses touched by a diagonal line are referenced sequentially, one by one. A continuous repetition of point patterns in the horizontal direction – whether sequential, temporally clustered or otherwise – suggests a loop, in which accesses to a page subset repeat the same pattern at consecutive intervals. The program working set at a given time can also be estimated by using two arbitrary vertical lines in the graph to select a time interval. All pages associated with the points between those lines belong to the process working set in that interval.
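The working-set estimate described above can be sketched in a few lines. This is only an illustration under simple assumptions: the trace is a list of page numbers whose index is the virtual time, and the function name and toy trace are hypothetical.

```python
def working_set(trace, start, end):
    """Pages referenced at virtual times start .. end-1.

    `trace` is a list of page numbers; the list index plays the role of
    the virtual time (one tick per memory access), i.e. the horizontal
    axis of a memory access graph. `start` and `end` correspond to the
    two vertical lines that delimit the chosen interval.
    """
    return set(trace[start:end])


# Hypothetical toy trace: a loop over pages 0..2 followed by a sequential sweep.
trace = [0, 1, 2, 0, 1, 2, 3, 4, 5, 6]
loop_ws = working_set(trace, 0, 6)    # interval covering the loop
sweep_ws = working_set(trace, 6, 10)  # interval covering the sweep
```

In this toy case the loop phase touches three pages, while the sweep touches four pages that are never reused, which is the kind of distinction the two vertical lines make visible in the graph.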

Another relevant kind of graph illustrates the so-called locality surfaces [Grimsrud et al 1996]. Proposed as a technique to quantify temporal and spatial locality in programs, these surfaces are mainly applied to cache memory studies [Sorenson and Flanagan 2001, Sorenson and Flanagan 2002]. Their goal is to detect locality in applications, according to both the reuse of pages and the accessed addresses, by building representative 3D surfaces with specific rules of interpretation.

Locality surfaces characterize memory accesses from a macroscopic viewpoint, taking the complete program into account. The third dimension of the graph, represented by the z-axis, indicates the number of occurrences of specific delay (stack distance) and stride (difference between memory addresses) values obtained among “inter-referenced” pages. From the shape of the surface, it is possible to conclude, for instance, whether memory access patterns are predominantly sequential or looping, or whether memory accesses exhibit high temporal locality.

Figure 2 illustrates the locality surface of Espresso. The concentration of points at the origin of the y-axis shows the presence of strong temporal locality in the most referenced pages, while the distribution along the x-axis denotes a certain spatial distance between some interleaved references. The peak around the origin of the surface – coordinate (0, 0, 0) – illustrates, however, that spatial locality is also intense on average.

2.1. Limitations of traditional visual resources

Although such graphs are valuable tools for understanding and exploring memory access characteristics in workloads, they do not offer enough data for a deeper and more detailed analysis of the impact of these characteristics on memory system performance. Traditional graphs are undoubtedly useful resources, but they are limited when applied alone.

The size of access graphs is generally reduced by a scale factor so that they can be visualized in their entirety. Thus, many points are superimposed on the graphs, which may cause a great loss of data. The greater the number of memory accesses and the larger the program address space, the greater the visual compression of the graph and, consequently, the lower its precision.

Figure 1. Memory access graph (Espresso)

Despite being an objective and interesting resource, locality surfaces, in turn, do not inform the order in which memory references occur, mixing different execution phases – which may present unique memory access characteristics.

Besides the problems mentioned above, none of these graph models provides detailed data about the reuse of memory pages. The superficiality of the available data makes it difficult to develop reliable predictions – for example, correct estimates of the page fault rate in some cases – and could lead to misinterpretations. There is a lack of visual resources that offer a wider range of relevant data, as accurately as possible, while keeping the simplicity required for their purpose. The next section introduces new graphs intended to help meet this demand.

3. New Proposed Graphs

We propose six new graphs in order to improve the analysis of memory access behaviors. To illustrate these graphs, three trace files are used as workloads:

• Compress [Lee et al 1998]: SPEC95 version of the UNIX compress utility;2
• Grobner [Smaragdakis et al 1999]: a formula-rewrite program;
• Sprite [Jiang and Zhang 2002, Lee et al 1999]: Sprite network file system.

The new graphs, described below, intend to complement the data obtained through traditional visual resources, increasing precision in an empirical analysis (without decreasing its speed) and qualitatively enriching the documentation of experiments about memory management, regarding characterization of workloads. Furthermore, they allow making direct and reliable estimates related to the performance of applications in specific memory configurations.

3.1. Memory access surfaces

The global visualization presented by conventional access graphs is very practical. However, the superimposition of points makes them imprecise. Each point in the graph may represent anything from a single memory access to a great number of accesses, with no possibility of visually distinguishing between the two cases.

2 Collected using the Etch instrumentation tool [Romer et al 1997], it presents a greater number of memory accesses than the others.

Figure 2. Locality surface (Espresso)

In order to overcome this limitation, we propose the so-called memory access surfaces, or three-dimensional access graphs. Their main advantage is to inform the number of condensed accesses in every point of a two-dimensional graph. This additional information allows identifying sections with high locality of references, i.e. permits distinguishing which pages are more accessed than others at each interval of the program execution.

Figure 3 shows examples of both the conventional memory access graph and the proposed three-dimensional graph of Grobner. The access surface provides more useful information about the distribution of accesses over the referenced memory positions. The x and y axes remain unchanged from the 2D graph, while the z-axis presents the access density at every point. Several view angles may be used to complement the standard (frontal) visualization; these variations help deal with the problem of hidden areas in 3D surfaces. However, despite their advantages over the equivalent two-dimensional graphs, such surfaces are not employed in related works.
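The data behind such a surface can be sketched as a grid of access counts. The code below is a minimal illustration, not the tool used to produce Figure 3: the function name, the cell sizes and the toy trace are assumptions made for the example.

```python
from collections import Counter


def access_density(trace, time_bin, page_bin):
    """Count the accesses that fall into each (time cell, page cell).

    A scaled 2D access graph draws one point per non-empty cell; the
    count stored here is the extra z value that the access surface adds.
    """
    grid = Counter()
    for t, page in enumerate(trace):
        grid[(t // time_bin, page // page_bin)] += 1
    return grid


# Hypothetical toy trace of page numbers, indexed by virtual time.
trace = [0, 0, 0, 1, 7, 0, 0, 8, 9, 9]
grid = access_density(trace, time_bin=5, page_bin=4)
# Four cells are non-empty, so the 2D graph would show four identical points,
# while the counts (4, 1, 2 and 3) tell the cells apart.
```

Plotting the counts as heights over the (time, page) plane with any 3D plotting tool then yields a surface of the kind described in this section.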

3.2. IRR/IRG graphs and surfaces

Focusing on the temporal aspect, Ding and Zhong [Ding and Zhong 2003] analyze locality as being a function of the number of distinct pages referenced between two references to the same page. This metric is known as reuse distance, stack distance [Mattson et al 1970], or IRR (Inter-Reference Recency) [Jiang and Zhang 2002]. Some authors prefer to consider the virtual time interval between accesses to a memory page, that is, the temporal distance between consecutive references to the same page. This metric is also known as IRG (Inter-Reference Gap) [Phalke 1995] and, despite not being very significant for replacement policies based on recency, it can be valuable for policies based on access frequency.

The IRR graph and the IRR surface are, respectively, the two- and three-dimensional versions of a kind of graph that presents the LRU stack position of the pages accessed during a program execution, assuming unlimited available memory. Consequently, they also indicate the number of distinct pages referenced between consecutive accesses to the same page, and can also be called stack distance maps. In the LRU model, this metric corresponds to the position that the page occupies in the LRU stack – counted from zero – when it is reused.
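To make the definition concrete, the following sketch computes the IRR of every access by maintaining an unbounded LRU stack. It is a deliberately simple list-based version, adequate for short traces only; the function name and the toy trace are hypothetical.

```python
def irr_values(trace):
    """LRU stack position (counted from zero) of each access.

    First accesses to a page have no previous reference, so they are
    reported as None. The stack is unbounded, as assumed in the text.
    """
    stack = []   # most recently used page first
    result = []
    for page in trace:
        if page in stack:
            position = stack.index(page)  # distinct pages seen since last use
            stack.pop(position)
            result.append(position)
        else:
            result.append(None)
        stack.insert(0, page)             # page becomes the most recent one
    return result


trace = ["a", "b", "c", "a", "a", "b"]
irr = irr_values(trace)
# irr == [None, None, None, 2, 0, 2]
```

For instance, the second access to "a" has IRR 2 because two distinct pages ("b" and "c") were referenced since its previous access. Plotting these values against virtual time gives the data of an IRR graph.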

Figure 3. Memory access graph and memory access surface (Grobner)


IRG graphs exhibit the temporal distance between consecutive accesses to the same page. Virtual time is measured in number of memory accesses. This graph therefore presents, for every page, the time interval until it is reused during the program execution.
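A minimal sketch of the IRG computation follows. It assumes that the gap is the difference between the virtual times of two consecutive accesses to the same page, so that an immediate re-reference has a gap of 1; the function name and the toy trace are hypothetical.

```python
def irg_values(trace):
    """Virtual-time gap between each access and the previous access to the same page.

    First accesses are reported as None. Virtual time is the index of
    the access in the trace.
    """
    last_seen = {}  # page -> virtual time of its latest access
    result = []
    for t, page in enumerate(trace):
        if page in last_seen:
            result.append(t - last_seen[page])
        else:
            result.append(None)
        last_seen[page] = t
    return result


trace = ["a", "b", "c", "a", "a", "b"]
irg = irg_values(trace)
# irg == [None, None, None, 3, 1, 4]
```

Unlike IRR, the gap counts every intervening access rather than only distinct pages, which is why it relates more naturally to access frequency than to recency.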

Figure 4 includes, respectively, the IRG graph, the IRR graph and the IRR surface of the Sprite program. The first graph shows that most accesses occur at virtual time intervals between 1 and 5,000 references, although there are some cases of reuse after more than 120,000 references. In other words, although most accesses to a given page are repeated after a relatively short interval, some pages are accessed irregularly, especially in the final execution phase. The next two graphs – IRR graph and surface – point to the same conclusion: most accesses are to pages that occupy the first 1,000 positions in the LRU stack (only 15% of the footprint of Sprite), although many references occur at higher stack positions.

3.3. IRR histograms

The idea behind IRR histograms is not a novelty [Ding and Zhong 2003]. This visual resource allows easily determining the number of page faults – and consequently, predicting the performance – that a workload will present when submitted to a memory system with LRU replacement policy. Such histograms are a global and condensed representation of the same data visualized in IRR surfaces, and can also be drawn in form of a cumulative access graph.

The IRR histograms consider an unlimited stack. Given a memory size of M pages, every memory access at an LRU stack position of M or greater (counted from zero, i.e. with a low degree of recency) necessarily results in a page fault under the LRU replacement algorithm; for FIFO, which is not a stack algorithm, this holds only as an approximation. For the LRU algorithm it is also possible to establish the minimum memory size that keeps the page fault rate below a given limit. This only requires a prior analysis of the workload under the LRU model.
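The prediction described above can be sketched as follows. The example assumes a list of per-access IRR values in which first accesses are marked as None and counted as faults; the function names and the sample values are hypothetical, and the limit is treated as inclusive.

```python
def lru_faults(irr, memory_size):
    """Page faults of LRU with `memory_size` frames, derived from IRR values.

    An access faults if it is a first access (None) or if its LRU stack
    position, counted from zero, is memory_size or greater.
    """
    return sum(1 for d in irr if d is None or d >= memory_size)


def min_memory_for_fault_rate(irr, limit):
    """Smallest memory size whose LRU fault rate does not exceed `limit`.

    Returns None when even a memory holding every distinct page cannot
    meet the limit (first accesses always fault).
    """
    total = len(irr)
    distinct_pages = sum(1 for d in irr if d is None)
    for size in range(1, distinct_pages + 1):
        if lru_faults(irr, size) / total <= limit:
            return size
    return None


# IRR values of the toy trace a, b, c, a, a, b (None marks a first access).
irr = [None, None, None, 2, 0, 2]
faults_2 = lru_faults(irr, 2)                  # 5 faults with two frames
faults_3 = lru_faults(irr, 3)                  # 3 faults with three frames
needed = min_memory_for_fault_rate(irr, 0.5)   # 3 frames
```

Because LRU is a stack algorithm, a single pass that records IRR values is enough to answer the question for every memory size, without rerunning a simulation per size.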

By grouping IRR values into power-of-two intervals, we obtain the IRR histogram and cumulative graph exhibited in Figure 5. The first graph in the figure represents the source of the data: the IRR surface of Compress, in which the access distribution and the critical processing phases can be visualized (back view). Table 1, in turn, presents the histogram data in tabular form. The first column consists of intervals of LRU stack positions that, except for 0 and 1, follow the notation “a – b”, which must be interpreted as “from a (inclusive) to b (exclusive)”. Thus, line “2^5 – 2^6” contains the accesses in which the referenced page occupied an LRU stack position between 32 and 63. LRU stack positions start at zero, so a position is effectively the number of distinct pages referenced between two consecutive accesses to the same page.

Figure 4. IRG graph, IRR graph and IRR surface (Sprite)


Table 1 indicates that a memory size equal to 8 pages is enough to maintain the page fault rate below 20% for Compress, considering the LRU replacement algorithm. On the other hand, if the memory size is at least 64 pages, it is possible to guarantee that the program will cause a page fault rate as low as 1.5%.

Table 1. IRR data in tabular form (Compress)

Compress IRR (a≤x<b)    Accesses    Accum. accesses    % Accesses    % Accum.
0                              0                  0          0.00        0.00
1                       34010633           34010633         26.34       26.34
2^1 - 2^2               48619837           82630470         37.66       64.00
2^2 - 2^3               22342281          104972751         17.30       81.30
2^3 - 2^4               14729136          119701887         11.41       92.71
2^4 - 2^5                4575908          124277795          3.54       96.25
2^5 - 2^6                2966067          127243862          2.30       98.55
2^6 - 2^7                1865244          129109106          1.44       99.99
2^7 - 2^8                   3390          129112496         0.003       99.99
2^8 - 2^9                   3284          129115780         0.003      100.00
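The fault rates quoted for Compress can be checked directly from the Table 1 counts. The sketch below copies the access counts from the table, keyed by the lower bound of each interval; the helper names are hypothetical, and the calculation assumes that the tabulated accesses are the whole population of interest and that the memory size is a power of two.

```python
def irr_bin(position):
    """Lower bound of the power-of-two interval that contains an LRU stack position."""
    return 0 if position == 0 else 1 << (position.bit_length() - 1)


# Access counts from Table 1 (Compress), keyed by interval lower bound.
accesses = {
    0: 0, 1: 34010633, 2: 48619837, 4: 22342281, 8: 14729136,
    16: 4575908, 32: 2966067, 64: 1865244, 128: 3390, 256: 3284,
}
total = sum(accesses.values())


def lru_fault_rate(memory_size):
    """Percentage of tabulated accesses at stack positions >= memory_size.

    Only meaningful when memory_size is an interval boundary (a power of
    two), because the table does not resolve positions inside an interval.
    """
    missed = sum(n for low, n in accesses.items() if low >= memory_size)
    return 100.0 * missed / total


rate_8 = lru_fault_rate(8)    # about 18.7, i.e. below 20%
rate_64 = lru_fault_rate(64)  # about 1.45, i.e. roughly 1.5%
```

Both values agree with the complement of the cumulative percentages in the table (100 − 81.30 and 100 − 98.55).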

Other kinds of histogram can also be extracted from the three-dimensional graphs. For example, a histogram of the number of references to each memory page can be useful to determine how balanced the accesses are across the program virtual address space. If the virtual addresses are grouped, it is possible to predict the performance of cache memories.

A joint analysis of the three forms of representation presented in this section – 2D graphs, 3D graphs and histograms – can enrich the conclusions of any memory access characterization effort.

4. Case Study: Performance of Page Replacement Algorithms

A page fault occurs when a process accesses a virtual page that is not currently in the memory. In this case, the page needs to be immediately allocated for the continuity of execution. If the memory is full, the problem consists in deciding which resident page must be moved back to disk, that is, which page must be replaced by that which is coming from disk. The operating system is responsible for such decision, implementing a page replacement algorithm.

The replacement policies try to exploit, each in its own way, the locality characteristics found in programs, mainly temporal locality. Apart from being an intuitive way of understanding how memory accesses happen, the visual observation of the graphs described in the previous sections makes it possible to discuss and demonstrate, in a simple way, the implications of the access patterns that influence the performance of page replacement algorithms. In this section, we empirically analyze apparent characteristics of Compress, Espresso, Grobner and Sprite, through the respective graphs in Figures 1 to 5, and compare this analysis with the performance results obtained in simulation experiments with the following algorithms:

Figure 5. IRR surface, IRR histogram and IRR cumulative graph (Compress)

• Traditional: FIFO (First-In, First-Out), LRU (Least Recently Used), FBR (Frequency-Based Replacement) [Robinson and Devarakonda 1990], and 2Q (Two Queue) [Johnson and Shasha 1994];

• Adaptive: ARC (Adaptive Replacement Cache) [Megiddo and Modha 2003], EELRU (Early Eviction LRU) [Smaragdakis et al 1999], LIRS (Low Inter-Reference Recency Set) [Jiang and Zhang 2002], and LRU-WAR (LRU with Working Area Restriction) [Cassettari and Midorikawa 2004b];

• Online (feasible solutions): Clock [Corbató 1969], CAR (Clock with Adaptive Replacement) [Bansal and Modha 2004], CART (CAR with Temporal filtering) [Bansal and Modha 2004], and 3P (Three Pointers) [Cassettari and Midorikawa 2005].

4.1. Empirical analysis of program behaviors

Spatial locality is positive, in relation to the main memory management, when it is restricted to a reduced page subset – limited by the available memory size –, such as a loop in which a few pages are referenced. A memory access pattern like this one determines that the process working set also remains reduced, leading to a desired reuse of the resident pages. Therefore, if accompanied by strong temporal locality, the spatial locality is much more interesting. Otherwise, the presence or not of spatial locality makes no difference to the large majority of the traditional policies for main memory management. Even so, it is very relevant for the efficient management of cache memories, for example.

The presence of temporal locality, on the other hand, is invariably beneficial because it involves the prompt reuse of the resident pages. LRU and its approximations (like Clock) are examples of page replacement algorithms that try to predict and exploit this property in order to reduce page faults: their criterion is always to replace memory pages that have not been recently referenced; in other words, pages that do not show a tendency toward continuous access at the current moment of execution. The greater the degree of temporal locality in a program, the closer the performance of LRU-based algorithms will be to the optimum case [Belady 1966]; sequential access patterns, by contrast, can degrade LRU performance. However, temporal locality is not always high in generic and/or scientific applications.

The graphs in Figures 1 and 2 indicate that Espresso has a medium degree of locality. More precisely, some pages are massively referenced and exhibit high temporal locality, while other pages are referenced much less often, presenting different access patterns. Despite this non-uniformity, LRU tends to achieve a satisfactory performance due to the apparent importance of the most accessed pages.

In the example of Figure 3, the three-dimensional graph shows a concentration of accesses when pages enter the program working set, i.e. near their first accesses. After a short period of time, pages continue to be accessed, but less frequently. This behavior holds for all referenced pages except those with addresses below 100, which are heavily accessed throughout the Grobner execution. Another important result can be obtained from the 3D surface: most pages enter the program working set sequentially, forming a diagonal “wall” in the graph. This behavior (hidden in the two-dimensional graph) is better handled by adaptive algorithms that deal efficiently with sequential accesses, such as EELRU, LIRS and LRU-WAR, among others.

The memory access characteristics of Sprite, visualized in Figure 4, favor the performance of LRU and LRU-based algorithms, since low IRR values mean high temporal locality. Frequency-based algorithms, such as FBR, may also have a good performance in this case due to the predominance of short reuse intervals in the IRG graph.

As for Compress, strong temporal locality can be noted in the IRR surface presented in Figure 5. The “wall” on the left side of the surface points to a high concentration of memory accesses at the lower positions of the LRU stack. This fact is confirmed and detailed by the IRR histogram and the IRR cumulative graph, which show, for instance, that 64% of the accesses occur in the first four positions of the stack.

4.2. Simulation results

Table 2 presents the performance results obtained from the simulation experiments listed in Table 3. The percentages in Table 2 correspond to the average increase in page faults that each algorithm generates in relation to the optimum case [Belady 1966]. Table 3, in turn, details the simulations performed with the four workloads considered in this study: each row describes a set of simulations for one workload, in which the memory size varies, whose results were used to calculate the average performance percentages of the algorithms. The higher the percentage, the worse the average performance of the algorithm.
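As an illustration of the metric reported in Table 2, the sketch below counts page faults for LRU and for the optimum policy on a toy trace and derives the percentage increase for a single memory size. The function names and the trace are hypothetical and the code is not the simulator used in the experiments; Table 2 additionally averages this percentage over all the memory sizes listed in Table 3.

```python
def lru_fault_count(trace, frames):
    """Page faults of LRU with a fixed number of frames."""
    resident = []  # least recently used page first
    faults = 0
    for page in trace:
        if page in resident:
            resident.remove(page)
        else:
            faults += 1
            if len(resident) == frames:
                resident.pop(0)           # evict the least recently used page
        resident.append(page)             # page is now the most recent one
    return faults


def opt_fault_count(trace, frames):
    """Page faults of the optimum policy: evict the page reused farthest in the future."""
    resident = set()
    faults = 0
    for t, page in enumerate(trace):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            def next_use(p):
                try:
                    return trace.index(p, t + 1)
                except ValueError:
                    return float("inf")   # never referenced again
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults


def percent_increase(faults, optimum):
    """Increase in page faults relative to the optimum case, in percent."""
    return 100.0 * (faults - optimum) / optimum


# Hypothetical loop over three pages with only two frames: a bad case for LRU.
trace = [0, 1, 2] * 3
lru = lru_fault_count(trace, 2)        # 9: every access faults
opt = opt_fault_count(trace, 2)        # 6
increase = percent_increase(lru, opt)  # 50.0
```

The loop that does not fit in memory is the classic pattern on which LRU misses every access while the optimum policy keeps part of the loop resident.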

Table 2. Average results obtained from the simulations detailed in Table 3

Average Percentage Increase in Page Faults from the Optimum Case

Trace     FIFO     LRU      FBR      2Q       ARC      EELRU    LIRS     LRU-WAR  CLOCK    CAR      CART     3P
Compress  191.97%  121.07%  96.24%   94.79%   120.81%  72.63%   122.41%  79.46%   145.99%  127.91%  103.95%  94.66%
Espresso  174.85%  67.73%   62.88%   80.27%   68.19%   52.24%   66.84%   66.88%   114.98%  114.14%  102.99%  80.60%
Grobner   233.98%  139.12%  115.66%  116.53%  139.17%  72.91%   113.86%  125.40%  139.20%  139.34%  118.21%  108.06%
Sprite    33.29%   17.70%   17.90%   25.21%   22.93%   19.04%   40.72%   21.36%   18.67%   23.68%   26.27%   25.13%

The data in Table 2 demonstrate that the expected effects of certain access patterns on the performance of page replacement algorithms have generally been confirmed. Algorithms which deal effectively with sequential accesses tend to achieve better performances in the Grobner execution, although this tendency does not always hold. The algorithms ARC and CAR, for example, obtain worse results than expected. Among other factors, the co-existence of sequential accesses and constant page reuse (see Figure 3) may explain such phenomenon.

Sprite presents pages with strong temporal locality alternating with less-accessed pages. This behavior does not favor the performance of adaptive algorithms, since such algorithms usually expect regular and easily identifiable access patterns.

Table 3. Details of the experiments

Trace     Number of Distinct Pages  Memory Sizes Considered in Simulations  Number of Simulations
Compress  396                       10, 15, 20, ..., 385, 390, 395          78
Espresso  77                        10, 11, 12, ..., 73, 74, 75             66
Grobner   67                        10, 11, 12, ..., 63, 64, 65             56
Sprite    7075                      100, 200, 300, ..., 6800, 6900, 7000    70

Finally, Compress and Espresso also exhibit high temporal locality. However, the alternation with less-accessed pages happens in a more predictable way. In other words, it happens only occasionally, allowing adaptive algorithms to distinguish more accurately among different processing phases. Furthermore, the virtual address space of these workloads is smaller and their execution time is longer than those of Sprite, suggesting a higher access clustering.

FBR outperforms LRU in three of the four cases. This probably occurs because there is, in all of them, a clear concentration of accesses in a few pages, which is not a universal rule. The best average performance is achieved by EELRU, while 3P is, on average, the best online algorithm for these workloads.

5. Conclusion

In this paper, we propose the use of new graphs in studies of locality of references. Among other aspects related to the use of main memory, the graphs make it possible to characterize visually:

• Memory access patterns;
• Temporal and spatial localities, along with their amplitude and density;
• The real distribution of memory accesses in the virtual address space used by the programs;
• The reuse frequency of memory pages (temporal distance between accesses);
• The LRU stack position that pages occupy when they are accessed.

In addition to the above aspects, the graphs can supply enough data to predict the performance of some page replacement algorithms. This kind of analysis is very important not only for better understanding the dynamic behavior of workloads with respect to memory use, but also for inspiring the development of new memory management techniques that can deal with the diversity of access patterns presented by applications.

To generate the graphs in this paper, we used data collected by tools of the Elephantools toolkit [Cassettari and Midorikawa 2004a]. We hope that the suggested visual resources can be applied both in new research on memory management and in the clear and didactic documentation of workload behavior regarding memory utilization.

Future work includes studies of new techniques and metrics for workload characterization. One example is the study of a new metric called IRR–n, which is the number of distinct pages referenced among “n+1” consecutive accesses to the same page. This study requires the development of new tools, along with the enhancement of the tools available in Elephantools.

6. Acknowledgments

We would like to thank everyone who has contributed to our studies. We especially thank Scott Kaplan for making available most of the trace files used in our experiments and Yannis Smaragdakis for providing us with a simulator of the EELRU adaptive page replacement algorithm. We also want to thank Elizabeth Sorenson for kindly sending us her locality surface generator.


7. References

Bansal, S. and Modha, D. S. (2004) “CAR: Clock with Adaptive Replacement”, In Proc. of the USENIX Conference on File and Storage Technologies (FAST’04), San Francisco, pp.187-200.

Belady, L.A. (1966) “A Study of Replacement Algorithms for a Virtual Storage Computer”, IBM Systems Journal, 5(2), pp.78-101.

Cassettari, H.H. and Midorikawa, E.T. (2004a) “Characterization of Workloads for Studies on Virtual Memory Management”, In Proc. of the SBC Workshop on Computing Systems and Communication Performance (III WPerformance), Salvador, CD-ROM. In Portuguese.

Cassettari, H.H. and Midorikawa, E.T. (2004b) “The LRU-WAR Adaptive Page Replacement Algorithm: Exploring the LRU Model with Sequential Access Detection”, In Proc. of the SBC Workshop on Operating Systems (I WSO), Salvador, CD-ROM. In Portuguese.

Cassettari, H.H. and Midorikawa, E.T. (2005) “The 3P Page Replacement Algorithm: Making Clock Adaptive”, In Proc. of the SBC Workshop on Operating Systems (II WSO), São Leopoldo. In Portuguese.

Choi, J. et al. (1999) “An Implementation Study of a Detection-Based Adaptive Block Replacement”, In Proc. of the USENIX Annual Technical Conference (USENIX’99), Monterey, pp.239-252.

Choi, J. et al. (2000) “Towards Application/File-Level Characterization of Block References: a Case for Fine-Grained Buffer Management”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’00), Santa Clara, pp.286-295.

Corbató, F. J. (1969) “A Paging Experiment with the Multics System”, In Festschrift in Honor of P.M. Morse, MIT Press, Cambridge, pp.217-228.

Denning, P. J. (1968) “The Working Set Model for Program Behavior”, Communications of the ACM, 11(5), pp.323-333.

Denning, P.J. (1980) “Working Sets: Past and Present”, IEEE Transactions on Software Engineering, SE-6(1), pp.64-84.

Ding, C. and Zhong, Y. (2003) “Predicting Whole-Program Locality through Reuse Distance Analysis”, In Proc. of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’03), San Diego, pp.245-257.

Glass, G. and Cao, P. (1997) “Adaptive Page Replacement Based on Memory Reference Behavior”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’97), Seattle, pp.115-126.

Grimsrud, K. et al. (1996) “Locality as a Visualization Tool”, IEEE Transactions on Computers, 45(11), pp.1319-1326.

Hatfield, D.J. and Gerald, J. (1971) “Program Restructuring for Virtual Memory”, IBM Systems Journal, 10(3), pp.168-192.

Jiang, S. and Zhang, X. (2002) “LIRS: An Efficient Low Inter-Reference Recency Set Replacement Policy to Improve Buffer Cache Performance”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’02), Marina Del Rey, pp.31-42.


Johnson, T. and Shasha, D. (1994) “2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm”, In Proc. of the International Conference on Very Large Data Bases (VLDB’94), Santiago, pp.439-450.

Kim, J. M. et al. (2000) “A Low-Overhead High-Performance Unified Buffer Management Scheme that Exploits Sequential and Looping References”, In Proc. of the USENIX Symposium on Operating System Design and Implementation (OSDI’00), San Diego, pp.119-134.

Lee, D. et al. (1999) “On the Existence of a Spectrum of Policies that Subsumes the Least Recently Used (LRU) and Least Frequently Used (LFU) Policies”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’99), Atlanta, pp.134-143.

Lee, D. C. et al. (1998) “Execution Characteristics of Desktop Applications on Windows NT”, In Proc. of the Annual International Symposium on Computer Architecture (ISCA’98), Barcelona, pp.27-38.

Markatos, E. P. (1997) “Visualizing Working Sets”, ACM SIGOPS Operating Systems Review, 31(4), pp.3-11.

Mattson, R. L. et al. (1970) “Evaluation Techniques and Storage Hierarchies”, IBM Systems Journal, 9(2), pp.78-117.

Megiddo, N. and Modha, D. S. (2003) “ARC: A Self-Tuning, Low Overhead Replacement Cache”, In Proc. of the USENIX Conference on File and Storage Technologies (FAST’03), San Francisco, pp.115-130.

Phalke, V. (1995) Modeling and Managing Program References in a Memory Hierarchy, Ph.D. Thesis, Rutgers University, New Brunswick.

Robinson, J. T. and Devarakonda, M. V. (1990) “Data Cache Management Using Frequency-Based Replacement”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’90), Boulder, pp.134-142.

Romer, T. et al. (1997) “Instrumentation and Optimization of Win32/Intel Executables Using Etch”, In Proc. of the USENIX Windows NT Workshop, Seattle, pp.1-8.

Smaragdakis, Y., Kaplan, S. and Wilson, P. (1999) “EELRU: Simple and Effective Adaptive Page Replacement”, In Proc. of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’99), Atlanta, pp.122-133.

Sorenson, E. S. and Flanagan, J. K. (2001) “Cache Characterization Surfaces and Predicting Workload Miss Rates”, In Proc. of the IEEE Annual Workshop on Workload Characterization (WWC-4), Austin, pp.129-139.

Sorenson, E. S. and Flanagan, J. K. (2002) “Evaluating Synthetic Trace Models Using Locality Surfaces”, In Proc. of the IEEE Annual Workshop on Workload Characterization (WWC-5), Austin, pp.23-33.

Uhlig, R. A. and Mudge, T.N. (1997) “Trace-Driven Memory Simulation: a Survey”, ACM Computing Surveys, 29(2), pp.128-170.
